Home Page | Pachyderm

Transform your Company

Pachyderm is used across a variety of industries and use cases. Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.

Healthcare & Life Sciences

Patient Records
X-Rays
Genomics

Modernized Care
Improved Diagnostics
Genomic Sequencing
Optimized Treatment

The reproducibility of the scientific method gave birth to the scientific revolution. As biotech firms turn to AI/ML to drive next-generation drug discovery, they need a new kind of reproducibility: Pachyderm's data lineage gives data scientists the rigor of the scientific method, so they can create scalable, repeatable experiments. Read more...

Financial Services

Call Recordings
Financial Transactions
Security Logs

Fraud Detection
Customer Sentiment Analysis
Automated Trading
Risk Assessment

Machine learning has made a large impact in financial services, with applications ranging from fraud detection to improved customer service and robo-investing tools. Pachyderm's immutable data lineage ensures that all processes can be reproduced, reducing risk and enforcing compliance. Read more...
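To make the idea of reproducible lineage concrete, here is a minimal sketch in Python. It is not Pachyderm's implementation: the `provenance_id` helper and the step names are hypothetical, and the only assumption is that each result can be identified by hashing its code version together with the IDs of its inputs.

```python
import hashlib
import json


def provenance_id(code, input_ids):
    """Derive a deterministic ID from a code version and the IDs of its inputs.

    Hypothetical helper for illustration only.
    """
    payload = json.dumps({"code": code, "inputs": sorted(input_ids)})
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


# A three-step chain: raw transactions -> features -> fraud model.
raw = provenance_id("ingest-v1", [])
features = provenance_id("featurize-v3", [raw])
model = provenance_id("train-v7", [features])

# Re-running with identical code and inputs yields the same ID.
rerun = provenance_id("train-v7", [features])

# Changing any upstream step yields a different ID downstream.
changed = provenance_id("train-v7", [provenance_id("featurize-v4", [raw])])

print(model == rerun, model == changed)
```

Because every ID depends on everything upstream of it, an auditor can tell whether two results came from exactly the same code and data.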

Natural Language Processing (NLP)

Documents
Text Files
Recordings

Sentiment Analysis
Automatic classification
Actionable insights
Automatic redaction

Detect customer sentiment and analyze customer interactions. Gain insight into efficient customer escalation handling and analyze keywords and phrases. With Pachyderm's complete pipelining and lineage, the model can be refined until it reproducibly yields truly actionable insights. Read more...

Video and Image Processing

Videos
Images
Metadata

Object Detection
Motion tracking
Feature recognition
Color correction

Video and imaging ETL is characterized by large unstructured data sets that can create bottlenecks for teams as they look to productionize and scale. Pachyderm provides an autoscaling, deduplicating platform that scales to large datasets and avoids storing duplicate data, saving costs. Read more...
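As a rough illustration of what deduplication means here, the Python sketch below stores each distinct piece of content once, keyed by its hash. The `DedupStore` class and the file paths are hypothetical and are not Pachyderm's storage layer; the sketch only assumes that identical bytes can be recognized by their SHA-256 digest.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}  # content hash -> bytes
        self.paths = {}  # logical path -> content hash

    def put(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store the bytes only if new
        self.paths[path] = digest
        return digest

    def get(self, path):
        return self.blobs[self.paths[path]]


store = DedupStore()
frame = b"\x00" * 1024
store.put("/videos/a/frame1.raw", frame)
store.put("/videos/b/frame1.raw", frame)  # same content, different path
store.put("/videos/b/frame2.raw", b"\x01" * 1024)

print(len(store.paths), "paths,", len(store.blobs), "stored blobs")
# prints: 3 paths, 2 stored blobs
```

Three logical files occupy the space of two, and the saving grows with the amount of repeated content in the data set.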

Use Any Language, Transform Any Data

Built for Data Engineers

Pachyderm is container-native: it runs with standard containerized tooling and gives engineers complete autonomy to use whatever languages or libraries are best for the job.

Pachyderm is data-agnostic, supporting both unstructured data such as videos and images as well as tabular data from data warehouses.

Explore the Docs


import os

import cv2
from matplotlib import pyplot as plt


# edges.py reads an image and writes its edge-detected version to /pfs/out
def make_edges(image):
    img = cv2.imread(image)
    tail = os.path.split(image)[1]
    edges = cv2.Canny(img, 100, 200)
    plt.imsave(os.path.join("/pfs/out", os.path.splitext(tail)[0] + ".png"), edges, cmap="gray")


# walk /pfs/images and call make_edges on every file found
for dirpath, dirs, files in os.walk("/pfs/images"):
    for file in files:
        make_edges(os.path.join(dirpath, file))
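
The script above reads from /pfs/images and writes to /pfs/out, so it needs a pipeline definition that mounts an input repository named images. The sketch below builds such a definition in Python and prints it as JSON. The pipeline name, the Docker image `example/edges:latest`, and the script path `/edges.py` are illustrative assumptions; check the field names against the pipeline specification for your Pachyderm version.

```python
import json

# Hypothetical pipeline spec for edges.py. The pipeline name, Docker image,
# and script path are assumptions made for this example.
edges_pipeline = {
    "pipeline": {"name": "edges"},
    "transform": {
        "image": "example/edges:latest",  # assumed image that contains edges.py
        "cmd": ["python3", "/edges.py"],
    },
    "input": {
        "pfs": {
            "repo": "images",  # mounted in the container at /pfs/images
            "glob": "/*",      # treat each top-level file as its own unit of work
        }
    },
}

spec_json = json.dumps(edges_pipeline, indent=2)
print(spec_json)
```

Saved to a file such as edges.json, a spec like this would typically be submitted with `pachctl create pipeline -f edges.json`.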


// Assumes the opencv4nodejs package as the OpenCV binding; the original
// snippet mirrored the Python example and did not name a Node.js library.
const cv = require('opencv4nodejs');
const fs = require('fs');
const path = require('path');

// make_edges reads an image from /pfs/images and outputs the result of running
// edge detection on that image to /pfs/out. Note that /pfs/images and
// /pfs/out are special directories that Pachyderm injects into the container.
function make_edges(image) {
  const img = cv.imread(image);
  const tail = path.parse(image).name;
  const edges = img.canny(100, 200);
  cv.imwrite(path.join('/pfs/out', tail + '.png'), edges);
}

// walk /pfs/images and call make_edges on every file found
function walk(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      walk(full);
    } else {
      make_edges(full);
    }
  }
}

walk('/pfs/images');


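package it.polito.elite.teaching.cv;

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;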

/**
 * @author luigi.derussis@polito.it Luigi De Russis
 */
public class HelloCV
{
	public static void main(String[] args)
	{
		// load the OpenCV native library
		System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
		
		// create and print on screen a 3x3 identity matrix
		System.out.println("Create a 3x3 identity matrix...");
		Mat mat = Mat.eye(3, 3, CvType.CV_8UC1);
		System.out.println("mat = " + mat.dump());
		
		// prepare to convert a RGB image in gray scale
		String location = "resources/Poli.jpg";
		System.out.print("Convert the image at " + location + " in gray scale... ");
		// get the jpeg image from the internal resource folder
		Mat image = Imgcodecs.imread(location);
		// convert the image in gray scale
		Imgproc.cvtColor(image, image, Imgproc.COLOR_BGR2GRAY);
		// write the new image on disk
		Imgcodecs.imwrite("resources/Poli-gray.jpg", image);
		System.out.println("Done!");
	}
}


import scala.io.Source
import java.io.PrintWriter

object RemoveNan {
  def main(args: Array[String]): Unit = {
    // Read the csv file and store it as a list of strings, closing the file afterwards
    val source = Source.fromFile("/pfs/input_repo/input.csv")
    val lines = try source.getLines().toList finally source.close()

    // Filter out rows containing Nan values
    val cleanRows = lines.filter(line => !line.contains("Nan"))

    // Write the filtered rows to a new csv file
    val writer = new PrintWriter("/pfs/out/output.csv")
    try cleanRows.foreach(line => writer.write(line + "\n"))
    finally writer.close()
  }
}

Transform your Company

Learn how companies around the world are using Pachyderm to automate complex pipelines at scale.

Pachyderm helped us modernize our healthcare records. It supports our ML efforts to provide actionable medical insights from millions of member records and terabytes of clinical data. The joy as a data engineer was setting this up once and letting Pachyderm take care of it automatically.

Read Case Study

Pachyderm helps us build automotive-grade maps for use in the automated-driving vehicles of today and the autonomous-driving vehicles of tomorrow. Developing this level of granularity requires processing voluminous amounts of data at scale with cutting-edge machine learning solutions.

Read Case Study

Pachyderm helps us convert our existing data science pipelines from manually managed scripts to scalable, repeatable end-to-end workflows; enabling us to focus more on developing transformative technology to drive agriculture forward instead of wrangling infrastructure.

Read Case Study

Automate Data Transformations

Data-driven pipelines for machine learning, healthcare, automotive services, unstructured data, and any data & language

Automate Complex Pipelines with sophisticated data transformations

Original Data Sets

Transformation Code

New Data Sets

Automatic Detection

Version Control

Autoscaling

Automatic Deduplication

Cloud & On-prem

Structured & Unstructured

Transform Any Data

Data Science & ML Ops

Integrates into your workflow

Choose Any Data Type

Integrate with your favorite tools

automate your data pipeline

Automate Pipelines Easily

Easy as 1-2-3

Any Scale

Process petabytes of data, thousands of jobs, hundreds of models.

Transform your data pipeline
