Maintainable data science solved

Kedro is a toolbox for production-ready data science.


Why Kedro?

Machine Learning Engineering
Kedro is the foundation for clean data science code. It borrows concepts from software engineering and applies them to machine-learning projects.
Handles Complexity
A Kedro project provides scaffolding for complex data and machine-learning pipelines. You spend less time on tedious "plumbing" and focus instead on solving new problems.
Standardisation
Kedro standardises how data science code is created, making it easier for teams to collaborate on solving problems.
Production-Ready
Move seamlessly from development to production: exploratory code can be turned into reproducible, maintainable, and modular experiments.

Features

Pipeline Visualisation

Kedro-Viz is a blueprint of your data and machine-learning workflows. It provides data lineage, keeps track of machine-learning experiments, and makes it easier to collaborate with business stakeholders.
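Data lineage follows naturally once each step is described by the datasets it reads and writes. The sketch below is a hypothetical, standard-library-only illustration (the `Step` class and `upstream_datasets` function are invented for this example and are not Kedro or Kedro-Viz APIs): given a list of steps, it walks backwards from one dataset to find everything that dataset was derived from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical pipeline step: a name plus its input and output dataset names."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def upstream_datasets(steps: list[Step], target: str) -> set[str]:
    """Return every dataset that `target` depends on, directly or indirectly."""
    # Map each dataset to the step that produces it.
    producers = {out: step for step in steps for out in step.outputs}
    seen: set[str] = set()
    stack = [target]
    while stack:
        dataset = stack.pop()
        step = producers.get(dataset)
        if step is None:
            continue  # a raw input: no step produces it
        for inp in step.inputs:
            if inp not in seen:
                seen.add(inp)
                stack.append(inp)
    return seen


steps = [
    Step("preprocess", ("raw_companies",), ("companies",)),
    Step("join", ("companies", "raw_reviews"), ("model_input",)),
    Step("train", ("model_input", "params"), ("model",)),
]
print(sorted(upstream_datasets(steps, "model")))
# ['companies', 'model_input', 'params', 'raw_companies', 'raw_reviews']
```

The same dataset-to-step mapping is what a visual tool needs to draw the graph; the traversal simply follows it in reverse.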


Data Catalog

A series of lightweight data connectors used to save and load data across many different file formats and file systems. The Data Catalog supports S3, GCP, Azure, SFTP, DBFS, and local file systems. Supported data types and formats include Pandas, Spark, Dask, NetworkX, Pickle, Plotly, Matplotlib, and many more. For file-based systems, the Data Catalog can also keep versioned snapshots of data and models.
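To illustrate the idea of named connectors, here is a minimal sketch using only the Python standard library. `JSONFile` and `Catalog` are hypothetical classes written for this example, not Kedro's `DataCatalog` or its dataset classes, and the versioned layout `<filepath>/<timestamp>/<filename>` is an assumption chosen to show what a snapshot might look like. In a Kedro project, datasets are typically declared in a `catalog.yml` file rather than in Python.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


class JSONFile:
    """Hypothetical connector: loads and saves one JSON file on the local file system."""

    def __init__(self, filepath: str, versioned: bool = False):
        self.filepath = Path(filepath)
        self.versioned = versioned

    def _save_path(self) -> Path:
        if not self.versioned:
            return self.filepath
        # Each save goes to <filepath>/<UTC timestamp>/<filename>, keeping older snapshots.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")
        return self.filepath / stamp / self.filepath.name

    def _load_path(self) -> Path:
        if not self.versioned:
            return self.filepath
        # Fixed-width timestamps sort chronologically, so the largest is the latest.
        latest = max(p for p in self.filepath.iterdir() if p.is_dir())
        return latest / self.filepath.name

    def save(self, data: Any) -> None:
        path = self._save_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data))

    def load(self) -> Any:
        return json.loads(self._load_path().read_text())


class Catalog:
    """Hypothetical registry that maps dataset names to connectors."""

    def __init__(self, datasets: dict[str, JSONFile]):
        self._datasets = datasets

    def save(self, name: str, data: Any) -> None:
        self._datasets[name].save(data)

    def load(self, name: str) -> Any:
        return self._datasets[name].load()


with tempfile.TemporaryDirectory() as tmp:
    catalog = Catalog({
        "companies": JSONFile(f"{tmp}/companies.json"),
        "model": JSONFile(f"{tmp}/model.json", versioned=True),
    })
    catalog.save("companies", [{"id": 1, "name": "Acme"}])
    catalog.save("model", {"weights": [0.1, 0.2]})
    catalog.save("model", {"weights": [0.3, 0.4]})  # a second snapshot
    companies = catalog.load("companies")
    latest_model = catalog.load("model")  # loads the most recent snapshot
```

Code that uses the catalog only refers to names such as `"model"`, so changing the connector behind a name (another format or file system) does not touch that code.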


Integrations

Amazon SageMaker, Apache Airflow, Apache Spark, Azure ML, Dask, Databricks, Docker, fsspec, Jupyter Notebook, Kubeflow, Matplotlib, MLflow, Plotly, Pandas, Vertex AI, and more.


Project Template

Standardise how configuration, source code, tests, documentation, and notebooks are organised with an adaptable, easy-to-use project template. Create your own Cookiecutter project templates with Starters.


Pipeline Abstraction

You never have to specify the running order of tasks in your pipeline: Kedro's dataset-driven workflow automatically resolves the dependencies between pure Python functions.
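The following sketch shows how a running order can be inferred purely from dataset names. It is a hypothetical illustration built on the standard library's `graphlib.TopologicalSorter` (Python 3.9+), not Kedro's own scheduler; in Kedro itself you would declare steps with `node(...)` from `kedro.pipeline` and group them with `pipeline(...)`. Here each node is assumed to be a tuple of a function, its input dataset names, and a single output name, to keep the example short.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical node spec: (function, input dataset names, output dataset name).
Node = tuple[Callable[..., Any], list[str], str]


def run(nodes: list[Node], data: dict[str, Any]) -> dict[str, Any]:
    """Run nodes in a dependency order inferred from dataset names alone."""
    # Which node produces each dataset?
    producer = {output: i for i, (_, _, output) in enumerate(nodes)}
    # Node i depends on every node that produces one of its inputs.
    graph = {
        i: {producer[name] for name in inputs if name in producer}
        for i, (_, inputs, _) in enumerate(nodes)
    }
    for i in TopologicalSorter(graph).static_order():
        func, inputs, output = nodes[i]
        data[output] = func(*(data[name] for name in inputs))
    return data


def clean(raw: list[int]) -> list[int]:
    return [x for x in raw if x >= 0]


def total(values: list[int]) -> int:
    return sum(values)


def report(cleaned: list[int], summed: int) -> str:
    return f"{len(cleaned)} values, total {summed}"


# Deliberately listed out of order: the order is recovered from the data flow.
nodes: list[Node] = [
    (report, ["cleaned", "summed"], "report"),
    (total, ["cleaned"], "summed"),
    (clean, ["raw"], "cleaned"),
]
result = run(nodes, {"raw": [3, -1, 4, -5, 9]})
print(result["report"])  # 3 values, total 16
```

Because the nodes are pure functions connected only through named datasets, the order they are listed in does not matter; `TopologicalSorter` raises `graphlib.CycleError` if the nodes form a circular dependency.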


Coding Standards

Practise test-driven development with pytest, produce well-documented code with Sphinx, lint and format code with flake8, isort, and black, and use the standard Python logging library.
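As a small, hypothetical example of these conventions working together, the function below is a pure data-cleaning step that logs through the standard `logging` module, followed by a test written in pytest's style (a plain function whose name starts with `test_`, using bare `assert`). The name `remove_duplicates` and the `"id"` field are invented for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def remove_duplicates(rows: list[dict]) -> list[dict]:
    """Drop rows whose "id" has already been seen, keeping the first occurrence."""
    seen: set = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    logger.info("Removed %d duplicate rows", len(rows) - len(unique))
    return unique


# pytest-style test: pytest discovers functions prefixed with "test" in test files.
def test_remove_duplicates_keeps_first_occurrence():
    rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
    assert remove_duplicates(rows) == [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    test_remove_duplicates_keeps_first_occurrence()
```

Saved in a file named like `test_cleaning.py`, the test would be collected by pytest automatically; the `__main__` block simply runs it without pytest.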


Flexible Deployment

Deployment strategies include single-machine and distributed-machine deployment, with additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch, Amazon SageMaker, Databricks, Dask, and more.


Experiment Tracking

Experiment tracking records all the information you need to recreate and analyse a data science experiment.
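As a rough, hypothetical sketch of what recording the information for an experiment can look like, the code below stores each run's parameters and metrics as a JSON file with a unique run ID and a timestamp, then picks the best run by a chosen metric. It uses only the standard library and is not how Kedro's experiment tracking is implemented; `record_run` and `best_run` are invented names.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def record_run(run_dir: Path, params: dict[str, Any], metrics: dict[str, float]) -> Path:
    """Write one run's parameters and metrics to <run_dir>/<run_id>.json."""
    run_id = uuid.uuid4().hex
    record = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": params,
        "metrics": metrics,
    }
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{run_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


def best_run(run_dir: Path, metric: str) -> dict[str, Any]:
    """Return the recorded run with the highest value of `metric`."""
    runs = [json.loads(p.read_text()) for p in run_dir.glob("*.json")]
    return max(runs, key=lambda r: r["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp) / "runs"
    record_run(runs, {"max_depth": 3}, {"accuracy": 0.81})
    record_run(runs, {"max_depth": 5}, {"accuracy": 0.86})
    best = best_run(runs, "accuracy")
    print(best["params"])  # {'max_depth': 5}
```

Keeping parameters and metrics together in one record per run is what makes a run reproducible and comparable later.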


FAQs

You can find the Kedro community on Slack.

We also maintain a list of extensions, plugins, articles, podcasts, talks, and Kedro showcase projects in the awesome-kedro repository.


What is Kedro?

Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro uses software engineering best practices to help you build production-ready data science code.


What does Kedro do?


Is Kedro an orchestrator?


I'm a data scientist. Why should I use Kedro?


I'm a Machine-Learning Engineer/Data Engineer. Why should I be interested in Kedro?


I'm a Product Lead, and my team wants to use Kedro. Why?


What's Kedro's origin story?


How can I find out more about Kedro?

Our community
QuantumBlack, Belfius, Leapfrog, Beamery, XP, GMO, Augment Partners, AI Singapore, GetInData, NHS AI Lab, Indicium, Telkomsel, McKinsey & Company, NASA, Sber, Helvetas

Case studies

Kedro in production at Telkomsel


Learn how Kedro is used in production at Telkomsel, Indonesia's largest telecommunications company. Kedro is used to help consume tens of TBs of data, run hundreds of feature engineering tasks, and serve dozens of ML models.

Kedro in production at Beamery


Data scientists at Beamery, a fast-growing talent lifecycle management company, explain how Kedro helps them write production code. They describe a workflow in which they turn to Kedro when they want to move their proofs of concept (POCs) forward.

Testimonials


Eduardo Ohe, Principal Data Engineer

Tremendously valuable

"Kedro has streamlined our workflow process, avoiding a lot of back and forth with debugging. It allowed our company to deliver more value to our customers quickly."


Ghifari Dwiki Ramadhan, Data Engineering

We heavily use Kedro

"We use Kedro in our production environment which consumes tens of TBs of data, runs hundreds of feature engineering tasks, and serves dozens of ML models."

Ready to start?

Kedro is an open-source project. Go ahead and install it with pip or conda:

pip install kedro

or

conda install -c conda-forge kedro

For more details, see the setup documentation or watch the video.
