Maintainable data science, solved

Kedro is an open-source Python framework for creating maintainable and modular data science code.


Why Kedro?

Kedro puts the "engineering" back into data science: it borrows concepts from software engineering and applies them to machine-learning code, providing the foundation for clean data science code.

Features

Pipeline Visualisation

Kedro's pipeline visualisation plugin shows a blueprint of your data and machine-learning workflows as you develop them, provides data lineage, tracks machine-learning experiments, and makes it easier to collaborate with business stakeholders.


Data Catalog

A series of lightweight data connectors used to save and load data across many different file formats and filesystems. Data can be handled through libraries including Pandas, Spark, Dask, NetworkX, Pickle, Plotly, Matplotlib and many more, and the Data Catalog supports S3, GCP, Azure, SFTP, DBFS and local filesystems. It also provides versioned snapshots of data and models on file-based systems.
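Datasets are declared in a YAML catalog file. A minimal, illustrative sketch (the dataset names and paths here are hypothetical; class names follow the `kedro-datasets` naming convention):

```yaml
# conf/base/catalog.yml -- illustrative entries, not from a real project
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

model_input:
  type: spark.SparkDataset
  filepath: s3://my-bucket/model_input.parquet
  file_format: parquet

regressor:
  type: pickle.PickleDataset
  filepath: data/06_models/regressor.pkl
  versioned: true   # keeps a timestamped snapshot of each save
```

Nodes then refer to these datasets by name, and Kedro handles the loading and saving.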


Integrations

Apache Spark, Pandas, Dask, Matplotlib, Plotly, fsspec, Apache Airflow, Jupyter Notebook and Docker.


Project Template

You can standardise how configuration, source code, tests, documentation, and notebooks are organised with an adaptable, easy-to-use project template. Create your own cookiecutter project templates with Starters.


Pipeline Abstraction

You never have to specify the running order of tasks in your pipeline, because Kedro supports a dataset-driven workflow that automatically resolves the dependencies between pure Python functions.
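The idea can be sketched in plain Python (this is a conceptual illustration of dataset-driven ordering, not Kedro's actual API): each node declares which named datasets it reads and writes, and a runner executes whichever node's inputs are already available.

```python
def run(nodes, catalog):
    """Execute nodes in dependency order.

    nodes: list of (func, input_names, output_name) tuples.
    catalog: dict mapping dataset names to their values.
    """
    pending = list(nodes)
    while pending:
        # A node is runnable once all of its input datasets exist.
        runnable = [n for n in pending if all(i in catalog for i in n[1])]
        if not runnable:
            raise ValueError("circular or missing dependency")
        for func, inputs, output in runnable:
            catalog[output] = func(*(catalog[i] for i in inputs))
            pending.remove((func, inputs, output))
    return catalog

# Nodes are declared out of order; the run order is inferred
# from the dataset names alone.
nodes = [
    (lambda clean: sum(clean) / len(clean), ("clean_data",), "mean_value"),
    (lambda raw: [x for x in raw if x is not None], ("raw_data",), "clean_data"),
]
result = run(nodes, {"raw_data": [1, None, 3]})
print(result["mean_value"])  # 2.0
```

In Kedro itself, the same wiring happens when you build a `Pipeline` from `node(...)` objects that name their inputs and outputs.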


Coding Standards

Practise test-driven development using pytest, produce well-documented code using Sphinx, keep code linted with support for flake8, isort and black, and make use of the standard Python logging library.
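For example, because pipeline steps are pure Python functions, they can be unit-tested directly with pytest. A small sketch (the function and its name are illustrative, not part of Kedro):

```python
import logging

logger = logging.getLogger(__name__)


def drop_nulls(rows):
    """Remove None entries from a list of records."""
    cleaned = [r for r in rows if r is not None]
    # Standard-library logging, as encouraged by Kedro's conventions.
    logger.info("Dropped %d null rows", len(rows) - len(cleaned))
    return cleaned


def test_drop_nulls():
    assert drop_nulls([1, None, 2]) == [1, 2]
    assert drop_nulls([]) == []
```

Running `pytest` on a file like this exercises the node in isolation, with no pipeline or data infrastructure required.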

Flexible Deployment

Deployment strategies include single-machine and distributed deployment, with additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch, AWS SageMaker, Databricks, Dask and more.


Experiment Tracking

Experiment tracking records all the information you need to recreate and analyse a data science experiment.


Case Studies


Kedro in Production at Telkomsel

Learn how Kedro is used in production at Telkomsel, Indonesia's largest telecommunications company. Kedro is used to help consume tens of TBs of data, run hundreds of feature engineering tasks, and serve dozens of ML models.


Creating Robust ML Products at Beamery

Data scientists at Beamery, a fast-growing talent lifecycle management company, explain how Kedro helps them write production code. They describe a workflow that brings in Kedro when they want to move their proofs of concept towards production.

Our community
  • QuantumBlack
  • Belfius
  • Leapfrog
  • Beamery
  • XP
  • GMO
  • Augment Partners
  • AI Singapore
  • GetInData
  • NHS AI Lab
  • Indicium
  • Telkomsel
  • McKinsey & Company
  • NASA
  • Sber
  • Helvetas


Ready to start?

You are ready to get going with the Kedro workflow. But first, head to our documentation to learn how to install Kedro, then get up to speed with concepts like nodes, pipelines, and the Data Catalog in our introductory tutorial.
