kedro

Maintainable
data science solved

Kedro is a toolbox for production-ready data science.

Why Kedro?

Machine Learning Engineering

Kedro is the foundation for clean data science code. It borrows concepts from software engineering and applies them to machine-learning projects.

Handles Complexity

A Kedro project provides scaffolding for complex data and machine-learning pipelines. You spend less time on tedious "plumbing" and focus instead on solving new problems.

Standardisation

Kedro standardises how data science code is created and makes it easier for teams to collaborate on solving problems.

Production-Ready

Move seamlessly from development to production by turning exploratory code into reproducible, maintainable, and modular experiments.

Features

Pipeline Visualisation

Kedro-Viz is a blueprint of your data and machine-learning workflows. It provides data lineage, keeps track of machine-learning experiments, and makes it easier to collaborate with business stakeholders.
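Data lineage comes from each pipeline step declaring which datasets it reads and writes. The sketch below shows the idea in plain Python. It is an illustration only, not Kedro-Viz's implementation or API: the `STEPS` list, the dataset names, and the `upstream_datasets` helper are all hypothetical.

```python
# Hypothetical pipeline description: each step declares the datasets it
# reads (inputs) and writes (outputs). This is not Kedro's own data model.
STEPS = [
    {"name": "clean", "inputs": ["raw_orders"], "outputs": ["orders"]},
    {"name": "featurise", "inputs": ["orders", "customers"], "outputs": ["features"]},
    {"name": "train", "inputs": ["features"], "outputs": ["model"]},
]


def upstream_datasets(target, steps):
    """Return every dataset that `target` depends on, directly or indirectly."""
    # Map each dataset to the step that produces it.
    producers = {out: step for step in steps for out in step["outputs"]}
    seen = set()
    stack = [target]
    while stack:
        current = stack.pop()
        step = producers.get(current)
        if step is None:
            continue  # a source dataset: no step produces it
        for parent in step["inputs"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


lineage = upstream_datasets("model", STEPS)
print(sorted(lineage))
```

Because the dependencies are declared rather than buried in code, a tool can answer "where did this model come from?" without running anything.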

Data Catalog

A series of lightweight data connectors for saving and loading data across many file formats and file systems. The Data Catalog supports S3, GCP, Azure, SFTP, DBFS, and local filesystems. Supported data formats and libraries include Pandas, Spark, Dask, NetworkX, Pickle, Plotly, Matplotlib, and many more. The Data Catalog also provides data and model snapshots for file-based systems.
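The idea of named datasets behind one load/save interface can be sketched with the Python standard library. This is a conceptual stand-in and not Kedro's actual API: `TextConnector`, `JSONConnector`, and `MiniCatalog` are hypothetical names invented for the illustration.

```python
import json
import tempfile
from pathlib import Path


class TextConnector:
    """Hypothetical connector: loads and saves plain text files."""

    def __init__(self, filepath):
        self.filepath = Path(filepath)

    def load(self):
        return self.filepath.read_text(encoding="utf-8")

    def save(self, data):
        self.filepath.write_text(data, encoding="utf-8")


class JSONConnector(TextConnector):
    """Hypothetical connector: same interface, but stores JSON on disk."""

    def load(self):
        return json.loads(super().load())

    def save(self, data):
        super().save(json.dumps(data))


class MiniCatalog:
    """Maps dataset names to connectors so callers never handle file paths."""

    def __init__(self, connectors):
        self._connectors = dict(connectors)

    def load(self, name):
        return self._connectors[name].load()

    def save(self, name, data):
        self._connectors[name].save(data)


# Register two datasets in a temporary directory and round-trip some data.
workdir = Path(tempfile.mkdtemp())
catalog = MiniCatalog({
    "params": JSONConnector(workdir / "params.json"),
    "notes": TextConnector(workdir / "notes.txt"),
})

catalog.save("params", {"learning_rate": 0.1})
catalog.save("notes", "first run")
loaded_params = catalog.load("params")
loaded_notes = catalog.load("notes")
print(loaded_params, loaded_notes)
```

Code that uses the catalog refers to datasets by name only, so changing a storage location or file format means changing the registration, not the analysis code. In a Kedro project this mapping is typically declared in a YAML configuration file rather than in Python.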

Integrations

Amazon SageMaker, Apache Airflow, Apache Spark, Azure ML, Dask, Databricks, Docker, fsspec, Jupyter Notebook, Kubeflow, Matplotlib, MLflow, Plotly, Pandas, VertexAI, and more.

Project Template

You can standardise how configuration, source code, tests, documentation, and notebooks are organised with an adaptable, easy-to-use project template. Create your own Cookiecutter project templates with Starters.

Dedicated IDE support

The Kedro extension for Visual Studio Code integrates Kedro projects with the editor, providing features such as enhanced code navigation and autocompletion.

FAQs

You can find the Kedro community on Slack.

We also maintain a list of extensions, plugins, articles, podcasts, talks, and Kedro showcase projects in the awesome-kedro repository.

What is Kedro?

Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro uses software engineering best practices to help you build production-ready data science code.

What does Kedro do?

Is Kedro an orchestrator?

I'm a data scientist. Why should I use Kedro?

I'm a Machine-Learning Engineer/Data Engineer. Why should I be interested in Kedro?

I'm a Product Lead, and my team wants to use Kedro. Why?

What's Kedro's origin story?

How can I find out more about Kedro?

Our community

Case studies

Kedro in production at Telkomsel

Learn how Kedro is used in production at Telkomsel, Indonesia's largest telecommunications company. Kedro is used to help consume tens of TBs of data, run hundreds of feature engineering tasks, and serve dozens of ML models.

Kedro in production at Beamery

Data scientists at Beamery, a fast-growing talent lifecycle management company, explain how Kedro helps them write "production code". They describe a workflow that brings in Kedro when they want to progress their proofs of concept (POCs).

Testimonials

Eduardo Ohe, Principal Data Engineer

Tremendously valuable

"Kedro has streamlined our workflow process, avoiding a lot of back and forth with debugging. It allowed our company to deliver more value to our customers quickly."

Ghifari Dwiki Ramadhan, Data Engineering

We heavily use Kedro

"We use Kedro in our production environment which consumes tens of TBs of data, runs hundreds of feature engineering tasks, and serves dozens of ML models."

Ready to start?

Kedro is an open-source project. Go ahead and install it with pip or conda:

pip install kedro

conda install -c conda-forge kedro
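After installing, one way to confirm that the package is visible to your Python environment is to query its installed version. This is a minimal sketch using only the standard library (`importlib.metadata`); the `kedro_version` variable name is just for illustration.

```python
from importlib import metadata

# Look up the installed distribution named "kedro" without importing it.
try:
    kedro_version = metadata.version("kedro")
except metadata.PackageNotFoundError:
    kedro_version = None

if kedro_version is None:
    print("kedro is not installed in this environment")
else:
    print(f"kedro {kedro_version} is installed")
```

If the package is reported as missing, check that the interpreter running the script is the same one that pip or conda installed into.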

For more details, see the setup documentation or watch the video.
