Experiment tracking — 10 min read

Collaborative experiment tracking in Kedro-Viz

We've launched collaborative experiment tracking in Kedro-Viz 6.2.0 to enable a team of users to log and compare each other's experiments. This tutorial explains how to configure and use the feature.

1 Jun 2023 (last updated 1 Jun 2023)

When training a model in machine learning, the goal is to determine the optimal configuration of attributes such as hyper-parameters, metrics, and training data. The process of identifying the best combinations requires running a lot of experiments and comparing them. As I mentioned in my previous article, experiment tracking is a way to record all the metadata you need to compare machine-learning experiments and recreate them for your project.

What is Kedro-Viz?

Kedro-Viz is an interactive development tool for building and visualising data science pipelines with Kedro. It enables you to monitor the status of your ML project, present it to stakeholders, and easily bring new team members up to speed. You can try it out using our hosted demo.

"There's no better method to give an overview of a pipeline's structure in such an engaging, interactive, and thorough way. Our asset's pipelines are very complex, but are structured with modular pipelines, so being able to show the overall structure at the modular pipeline level before jumping into each individual pipeline helps prevent the audience from getting overwhelmed by the number of nodes and datasets shown."

Senior Data Scientist at a consultancy

What is experiment tracking in Kedro-Viz?

Experiment tracking on Kedro-Viz enables users to select, plot, and compare how multiple metrics change over time, and identify the best-performing ML experiment, with no additional dependencies to manage or infrastructure needed.
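For a run to show up in the experiment tracking view, the values to track are registered as tracking datasets in the Kedro catalog. As an illustrative sketch (the dataset name and filepath here are our own examples, not from this post), a `catalog.yml` entry might look like:

```yaml
# conf/base/catalog.yml -- illustrative entry; name and path are hypothetical
metrics:
  type: tracking.MetricsDataSet
  filepath: data/09_tracking/metrics.json
```

Any node output saved to such a dataset is then recorded per run and can be plotted and compared in Kedro-Viz.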

The video below demonstrates experiment tracking on Kedro-Viz:

During a project with multiple team members, you could end up with a scenario where the results of your experiments are spread across many machines because people are iterating on their individual computers. This makes the tracking process difficult to manage at a team level, as suggested by this feedback from our users.

"You might train one model locally on your computer. You might train another one in the cloud. Joe might run another pipeline or another experiment. Having all of those experiments in one place as a single source of truth is really powerful.

"If we could write our metrics files to an S3 bucket and then run experiment tracking pointing at that S3 bucket, that simplifies our workflow in many different ways and would be really helpful. And it would make Kedro experiment tracking just as easy, if not easier, than MLFlow for us."

"Can you use an existing database so that we can keep track of runs happening in different places?"

We have found a way to address this pain point and enable you to collaborate more easily. We are excited to announce that we've launched collaborative experiment tracking in Kedro-Viz 6.2.0. The new feature enables a team of users to log their experiments to a shared cloud storage service, then view and compare each other's experiments in their own experiment tracking view. This simplifies the workflow, providing a single ‘source of truth’, and encourages multi-user collaboration.

We are releasing this feature in stages across different versions, and the first phase is Kedro-Viz 6.2.0. This version enables users to read experiments of other users that are stored on Amazon S3 or similar storage solutions on other cloud providers, as long as they are supported by fsspec. Future versions of collaborative experiment tracking aim to improve the user experience through automatic reloading and optimisation by caching.
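Every fsspec-style storage location follows the same `protocol://bucket/prefix` shape, which is all the `remote_path` setting needs, whatever the cloud provider. A minimal stdlib sketch of how such a URI decomposes (the helper name is ours, not part of Kedro-Viz):

```python
from urllib.parse import urlsplit


def split_remote_path(remote_path: str) -> tuple:
    """Split an fsspec-style URI into (protocol, bucket, key prefix)."""
    parts = urlsplit(remote_path)
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


# "s3://my-bucket-name/path/to/experiments" -> ("s3", "my-bucket-name", "path/to/experiments")
```

The same shape works for other providers supported by fsspec, e.g. `gs://...` for Google Cloud Storage or `abfs://...` for Azure Blob Storage.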

Get started with collaborative experiment tracking

Follow these steps to set up collaborative experiment tracking in Kedro-Viz:

Step 1: Update Kedro-Viz

Ensure you have the latest version of Kedro-Viz (6.2.0 or later).

pip install kedro-viz --upgrade

Step 2: Set up cloud storage

Kedro-Viz uses fsspec to save and read session_store files from a variety of data stores, including local file systems, network file systems, cloud object stores (e.g., Amazon S3, Azure Blob Storage, Google Cloud Storage), and HDFS.

Set up a central cloud storage repository, such as an Amazon S3 bucket, to store all your team's experiments.

Step 3: Configure your Kedro project

Locate the settings.py file in your Kedro project directory and add the following:

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
from pathlib import Path

SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {
    "path": str(Path(__file__).parents[2] / "data"),
    "remote_path": "s3://my-bucket-name/path/to/experiments",
}

Step 4: Set up a unique username

Kedro-Viz saves your experiments as SQLite database files on the central cloud storage. To ensure that all users have unique filenames, set your KEDRO_SQLITE_STORE_USERNAME as an environment variable. If it is not specified, Kedro-Viz defaults to your computer username.

export KEDRO_SQLITE_STORE_USERNAME="your_unique_username"
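The fallback behaviour described above can be sketched as follows (`resolve_sqlite_store_username` is our illustrative helper, not a Kedro-Viz function):

```python
import getpass
import os


def resolve_sqlite_store_username() -> str:
    """Return the username used to name the SQLite store file:
    the KEDRO_SQLITE_STORE_USERNAME environment variable if set,
    otherwise the computer (OS) username."""
    return os.environ.get("KEDRO_SQLITE_STORE_USERNAME") or getpass.getuser()
```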

Step 5: Configure cloud storage credentials

From Kedro-Viz version 6.2, the only way to set up credentials for accessing your cloud storage is through environment variables, as shown below for Amazon S3 cloud storage.

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
export AWS_REGION="your_aws_region"
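Because a missing variable only surfaces later as an access error when Kedro-Viz reads from the bucket, it can help to check the environment up front. A small stdlib sketch (the function and variable list are ours; the variable names follow the snippet above):

```python
import os

# Variables the snippet above expects to be set.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_aws_credentials(env=None) -> list:
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```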

In the screenshot below we show an example of the session store and Kedro-Viz output for three team members (Huong, Tynan, and Rashida):

Session store showing the 3 objects for Huong, Tynan, and Rashida.
Three separate Kedro-Viz runs by Huong, Tynan, and Rashida.

This tutorial offers a swift run-through of the configuration process. For further information, check out the documentation on the experiment tracking feature, and keep up to date with the latest news about Kedro and Kedro-Viz on our Slack channels.

Many thanks to the Kedro-Viz team, especially @Rashida Kanchwala, for contributing to this post.

Nero Okwa
Product Manager, Kedro
