Experiment tracking, Kedro-Viz — 10 min read

Track machine learning experiments with Kedro

This article focuses on experiment tracking, explaining why it is valuable to data science projects and describing the features recently released for experiment tracking in Kedro projects.

16 Mar 2023 (last updated 16 Nov 2023)

Like a laboratory notebook for a scientific project, experiment tracking is a way to record everything you need to compare machine-learning experiments and recreate them. The information recorded includes experiment data such as parameters, metrics, models, plots, and other dataset types, which can be searched using a timestamp.
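To make the laboratory notebook analogy concrete, here is a minimal sketch of recording runs and looking them up by timestamp. It uses only the Python standard library and is not Kedro's API: the `RunRecord` class, the `record_run` and `find_runs` helpers, the file layout, and the example values are all hypothetical, chosen for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


@dataclass
class RunRecord:
    """One notebook entry: when the run happened, what was tried, what came out."""

    timestamp: str
    parameters: dict
    metrics: dict


def record_run(
    store: Path, parameters: dict, metrics: dict, timestamp: Optional[str] = None
) -> RunRecord:
    """Write one run to its own JSON file, named after the run's timestamp."""
    ts = timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")
    record = RunRecord(ts, parameters, metrics)
    store.mkdir(parents=True, exist_ok=True)
    (store / f"{ts}.json").write_text(json.dumps(asdict(record), indent=2))
    return record


def find_runs(store: Path, timestamp_prefix: str) -> List[RunRecord]:
    """Return every run whose timestamp starts with the given prefix, oldest first."""
    paths = sorted(store.glob(f"{timestamp_prefix}*.json"))
    return [RunRecord(**json.loads(path.read_text())) for path in paths]


# Hypothetical runs with made-up values, stored in a temporary directory.
store = Path(tempfile.mkdtemp()) / "tracking"
record_run(store, {"learning_rate": 0.01}, {"accuracy": 0.91}, "2023-03-16T10.00.00")
record_run(store, {"learning_rate": 0.10}, {"accuracy": 0.87}, "2023-03-17T09.30.00")

march_16 = find_runs(store, "2023-03-16")
print(march_16[0].parameters, march_16[0].metrics)
```

Keeping one file per run, named by timestamp, means a search is just a filename match, and each record holds enough to repeat the run with the same parameters.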

What is Kedro?

Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro standardises how data science code is created to ensure it is reproducible, maintainable, and modular; it uses software engineering best practices to help you build production-ready data science code.

Kedro-Viz enables you to visualise the data, nodes and data pipelines of your Kedro project. To see it in action, check out our hosted demo.

Experiment tracking with Kedro

We shipped the first iteration of experiment tracking on Kedro-Viz in November 2021, enabling users to see and compare different metrics (tracked datasets) from their Kedro runs. We did this in response to feedback from existing Kedro users, product metrics from the Kedro-MLflow plugin (developed by our open-source community), and an earlier experiment tracking tool called PerformanceAI.

The main pain point that Kedro experiment tracking addresses is the lack of a lightweight, instant way to compare experiments and observe how metrics change over time without additional dependencies to manage.

Kedro experiment tracking enables Kedro users to select, plot, and compare how multiple metrics change and identify the best-performing ML experiment without needing additional infrastructure or approval.

Kedro’s new experiment tracking features

We recently released two additional features to address critical pain points in experiment tracking.

Feature 1: Track and visualise plots as part of experiment tracking. This feature enables users to track plots alongside their experiments and view them in Kedro-Viz experiment tracking.

Feature 2: Parallel coordinate and time-series plots for experiment metrics. This feature enables users to plot metrics from their pipeline runs as a time series or on a parallel coordinates plot, so they can select and compare multiple metrics and identify the best-performing experiment.
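The data behind these two views can be sketched in a few lines of plain Python. This is an illustration of the idea rather than how Kedro-Viz implements it: the `runs` dictionary, its metric values, and the helper names `time_series`, `parallel_coordinates`, and `best_run` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]

# Hypothetical tracked metrics keyed by run timestamp (illustrative values only).
runs: Dict[str, Metrics] = {
    "2023-03-14T09.00.00": {"accuracy": 0.87, "f1": 0.80, "loss": 0.41},
    "2023-03-15T09.00.00": {"accuracy": 0.91, "f1": 0.85, "loss": 0.33},
    "2023-03-16T09.00.00": {"accuracy": 0.89, "f1": 0.88, "loss": 0.36},
}


def time_series(runs: Dict[str, Metrics], metric: str) -> List[Tuple[str, float]]:
    """Time-ordered (timestamp, value) pairs for one metric."""
    return [(ts, runs[ts][metric]) for ts in sorted(runs) if metric in runs[ts]]


def parallel_coordinates(
    runs: Dict[str, Metrics], metrics: List[str]
) -> Dict[str, List[float]]:
    """Scale each metric to [0, 1] on its own axis and return one row per run.

    Assumes every run reports every metric listed in `metrics`.
    """
    rows: Dict[str, List[float]] = {ts: [] for ts in runs}
    for metric in metrics:
        values = [m[metric] for m in runs.values()]
        low, span = min(values), max(values) - min(values)
        for ts, m in runs.items():
            # A metric that never varies sits in the middle of its axis.
            rows[ts].append(0.5 if span == 0 else (m[metric] - low) / span)
    return rows


def best_run(runs: Dict[str, Metrics], metric: str, higher_is_better: bool = True) -> str:
    """Timestamp of the run with the best value for the chosen metric."""
    scores = {ts: m[metric] for ts, m in runs.items() if metric in m}
    choose = max if higher_is_better else min
    return choose(scores, key=scores.get)


accuracy_over_time = time_series(runs, "accuracy")
axes = parallel_coordinates(runs, ["accuracy", "f1", "loss"])
best_by_accuracy = best_run(runs, "accuracy")
best_by_loss = best_run(runs, "loss", higher_is_better=False)
print(best_by_accuracy, best_by_loss)
```

Note that the best run depends on the metric you pick: in this made-up data, one run wins on accuracy and loss while another wins on F1, which is the kind of trade-off a parallel coordinates plot makes visible at a glance.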

"I use experiment tracking to understand all my experiments and find the best one. When I'm iterating on a model, I'm probably going to test different combinations of parameters from my Kedro pipeline and maybe even different datasets. And for each of those, I'm going to log different success metrics. I use experiment tracking to log and visualise all those metrics so I don't need to go through each dataset in CSV or YAML and compare them manually."
— Kedro user testimonial

Experiment tracking use cases with Kedro

Several experiment tracking solutions are available, and choosing between them can be challenging. The right tool depends on your use case:

Kedro - If you need experiment tracking and improved metrics visualisation, and want a lightweight tool that builds on existing Kedro functionality. Experiment tracking in Kedro is tightly coupled to the Kedro workflow.
MLflow - If you require experiment tracking, a model registry, and model serving capabilities, or have access to Managed MLflow within the Databricks ecosystem, you can use MLflow with Kedro through the Kedro-MLflow plugin.
Neptune.ai - If you require experiment tracking, model registry functionality, improved metrics visualisation and support for collaborative data science, you can use Neptune.ai with Kedro through the Kedro-Neptune plugin.

What's next for experiment tracking in Kedro?

Our next step is to address a pain point identified in recent user research by enabling users to write experiments to a remote server. Currently, a user can only store experiment data on their local machine, but users want to write it to shared storage so that other team members can access it. Supporting this would encourage collaboration across a team.

Existing Kedro users have indicated that a solution would transform their workflow:

"If we could write our metrics files to an S3 bucket and then run experiment tracking pointing at that S3 bucket, that simplifies our workflow in many different ways and would be really helpful. And it would make Kedro experiment tracking just as easy, if not easier, than MLFlow for us."
— Kedro user feedback

Update June 1st 2023: We are pleased to announce that we have now released a version of Kedro-Viz that supports shared experiment storage. Please see the related post about collaborative experiment tracking in Kedro-Viz.

Summary

Kedro experiment tracking makes it easy to view and compare your experiments, and to see how your metrics have changed, all from the Kedro-Viz web app.

This article has described recent developments in ML experiment tracking by the Kedro team. Our documentation explains how to set up and use experiment tracking in Kedro today, so you can consider adopting it in your next project.

You can also join the Slack community or check our website to keep up with the latest developments in Kedro.

Nero Okwa

Product Manager, Kedro
