Experiment tracking — 10 min read

Track machine learning experiments with Kedro

This article explains why experiment tracking is valuable to data science projects and describes the experiment tracking features recently released for Kedro projects.

16 Mar 2023 (last updated 4 May 2023)

Like a laboratory notebook for a scientific project, experiment tracking records everything you need to compare machine learning experiments and recreate them. The recorded information includes experiment data such as parameters, metrics, models, plots, and other dataset types, and each run is timestamped so you can search for it later.
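To make the idea concrete, here is a minimal sketch of what a tracker records, written in plain Python rather than against Kedro's API. `LocalTracker`, `RunRecord`, the one-JSON-file-per-run layout and the metric values are all hypothetical, chosen only for illustration; Kedro's own implementation differs. The timestamp format is also an assumption, picked so that string order matches chronological order and the value is safe to use in a filename.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

# Hypothetical timestamp format: sorts chronologically as a string and is filename-safe
TS_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


@dataclass
class RunRecord:
    """One experiment run (hypothetical schema): when it ran, what went in, what came out."""
    timestamp: str
    parameters: dict
    metrics: dict


class LocalTracker:
    """Hypothetical minimal tracker that stores one JSON file per run in a local folder."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, parameters: dict, metrics: dict, timestamp: Optional[str] = None) -> RunRecord:
        # Default to the current UTC time so every run is keyed by when it happened
        ts = timestamp or datetime.now(timezone.utc).strftime(TS_FORMAT)
        record = RunRecord(ts, dict(parameters), dict(metrics))
        (self.root / f"{ts}.json").write_text(json.dumps(asdict(record)))
        return record

    def find(self, start: str, end: str) -> List[RunRecord]:
        """Return runs whose timestamp lies in [start, end], oldest first."""
        runs = []
        for path in sorted(self.root.glob("*.json")):
            data = json.loads(path.read_text())
            if start <= data["timestamp"] <= end:
                runs.append(RunRecord(**data))
        return runs


# Usage: log two runs with illustrative values, then search for runs made on 1 March 2023
tracker = LocalTracker(Path(tempfile.mkdtemp()))
tracker.log_run({"learning_rate": 0.1}, {"r2": 0.71}, timestamp="2023-03-01T10.00.00.000000Z")
tracker.log_run({"learning_rate": 0.01}, {"r2": 0.78}, timestamp="2023-03-02T09.30.00.000000Z")
march_first = tracker.find("2023-03-01T00.00.00.000000Z", "2023-03-01T23.59.59.999999Z")
print([run.metrics for run in march_first])  # [{'r2': 0.71}]
```

Because the parameters are stored next to the metrics, any run found this way carries what you need to recreate it as well as compare it.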

What is Kedro?

Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro standardises how data science code is created to ensure it is reproducible, maintainable, and modular; it uses software engineering best practices to help you build production-ready data science code.

Kedro-Viz enables you to visualise the data, nodes and data pipelines of your Kedro project. To see it in action, check out our hosted demo.

Experiment tracking with Kedro

We shipped the first iteration of experiment tracking in Kedro-Viz in November 2021, enabling users to see and compare different metrics (tracked datasets) from their Kedro runs. We did this in response to feedback from existing Kedro users, product metrics from the Kedro-MLflow plugin (developed by our open-source community), and an earlier experiment tracking tool called PerformanceAI.

Kedro experiment tracking solves a key pain point for Kedro users: it provides a lightweight, instant way to compare experiments and observe how metrics change over time, with no additional dependencies to manage.

Kedro experiment tracking enables Kedro users to select, plot, and compare how multiple metrics change and identify the best-performing ML experiment without needing additional infrastructure or approval. 

Kedro’s new experiment tracking features

We recently released two additional features to address critical pain points in experiment tracking. 

Feature 1: Track and visualise plots. Users can now track plots as part of their experiments and view them in Kedro-Viz experiment tracking.

Feature 2: Parallel coordinate and time-series plots of experiment metrics. Users can plot the metrics from their pipeline runs as a time series or as a parallel coordinate plot, then select, plot, and compare multiple metrics to find the best-performing experiment.
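Both views start from the same history of tracked metrics. The sketch below shows one way to prepare that data in plain Python: order runs chronologically for a time series, min-max scale each metric to [0, 1] so that metrics with different units share a common axis range in a parallel coordinate plot, and pick the best run by a chosen metric. The helper names (`time_series`, `parallel_coordinates`, `best_run`) and the metric values are hypothetical; this is an illustration under those assumptions, not Kedro-Viz's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical run history: timestamp -> tracked metrics (illustrative values, not real results)
runs: Dict[str, Dict[str, float]] = {
    "2023-03-02T09.30.00.000000Z": {"r2": 0.78, "mae": 3.1},
    "2023-03-01T10.00.00.000000Z": {"r2": 0.71, "mae": 3.6},
    "2023-03-03T14.15.00.000000Z": {"r2": 0.74, "mae": 2.9},
}


def time_series(runs: Dict[str, Dict[str, float]], metric: str) -> List[Tuple[str, float]]:
    """(timestamp, value) pairs in chronological order, ready for a line chart."""
    return [(ts, m[metric]) for ts, m in sorted(runs.items()) if metric in m]


def parallel_coordinates(runs: Dict[str, Dict[str, float]], metrics: List[str]) -> Dict[str, List[float]]:
    """One polyline per run: each metric min-max scaled to [0, 1] across all runs."""
    scaled: Dict[str, List[float]] = {ts: [] for ts in sorted(runs)}
    for metric in metrics:
        values = [runs[ts][metric] for ts in scaled]
        lo, hi = min(values), max(values)
        span = hi - lo or 1.0  # avoid dividing by zero when every run has the same value
        for ts, value in zip(scaled, values):
            scaled[ts].append((value - lo) / span)
    return scaled


def best_run(runs: Dict[str, Dict[str, float]], metric: str, higher_is_better: bool = True) -> str:
    """Timestamp of the run with the best value for `metric`."""
    pick = max if higher_is_better else min
    return pick(runs, key=lambda ts: runs[ts][metric])


print(best_run(runs, "r2"))  # 2023-03-02T09.30.00.000000Z
print(best_run(runs, "mae", higher_is_better=False))  # 2023-03-03T14.15.00.000000Z
```

The `higher_is_better` flag matters because "best" depends on the metric: a higher R² is better, while a lower mean absolute error is better, so the two metrics can point to different runs.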

"I use experiment tracking to understand all my experiments and find the best one. When I'm iterating on a model, I'm probably going to test different combinations of parameters from my Kedro pipeline and maybe even different datasets. And for each of those, I'm going to log different success metrics. I use experiment tracking to log and visualise all those metrics so I don't need to go through each dataset in CSV or YAML and compare them manually." 

— Kedro user testimonial

Experiment tracking use cases with Kedro

Several experiment tracking solutions are available, and choosing between them can be challenging. The right tool depends on your use case:

  • Kedro - If you need experiment tracking and improved metrics visualisation, and want a lightweight tool that builds on existing Kedro functionality. Experiment tracking in Kedro is tightly coupled to the Kedro workflow.

  • MLflow - If you require experiment tracking, model registry and model serving capabilities, or have access to Managed MLflow within the Databricks ecosystem, you can use MLflow with Kedro through the Kedro-MLflow plugin.

  • Neptune.ai - If you require experiment tracking, model registry functionality, improved metrics visualisation and support for collaborative data science, you can use Neptune.ai with Kedro through the Kedro-Neptune plugin. 

What's next for experiment tracking in Kedro? 

Our next step is to address a pain point identified in recent user research by enabling users to write experiment data to a remote server. Currently, experiment data can only be stored on a user's local machine, but users want to write it to shared remote storage that other team members can access. Supporting this would encourage collaboration between multiple users within a team.

Existing Kedro users have indicated that a solution would transform their workflow: 

"If we could write our metrics files to an S3 bucket and then run experiment tracking pointing at that S3 bucket, that simplifies our workflow in many different ways and would be really helpful. And it would make Kedro experiment tracking just as easy, if not easier, than MLFlow for us."

— Kedro user feedback

Summary

Kedro experiment tracking makes it easy to view and compare your experiments, and to see how your metrics have changed, in the Kedro-Viz web app.

This article has described recent developments in ML experiment tracking by the Kedro team. Our documentation explains how to set up and use experiment tracking in Kedro today, so you can consider adopting it in your next project. 

You can also join the Slack community, visit our website, and keep up to date with the latest developments in Kedro.


Nero Okwa
Product Manager, Kedro
