
Kedro news — 5 min read

Exploring our new release: Kedro 0.19

This blog post gives details of the enhancements and improvements to Kedro in the recent 0.19 release.

15 Jan 2024 (last updated 29 Jan 2024)

Towards the end of the working year, in December 2023, we made a new major release of Kedro. Kedro 0.19 contains a host of new features, bug fixes, and documentation improvements.

This blog post gives details of the release and explains how to find out more about recent enhancements and improvements to Kedro. The release also includes a few breaking changes with respect to 0.18.x because we have streamlined configuration management, dataset loading, and project structure. Below, we explain what has changed and where to get more information.

Headline news

Kedro 0.19 introduced project tools to help you create a new Kedro project, customised for your needs. You can now invoke kedro new in the CLI and generate a project that contains the code you need while omitting the tools and example code you don’t want. We added a set of new spaceflights starters (spaceflights-pandas, spaceflights-pandas-viz, spaceflights-pyspark, and spaceflights-pyspark-viz) for use in combination with the kedro new command, and substantially revised the documentation for this area. There is a new guide to help users get a new project created swiftly and a section to explain the customisation options in detail.
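As a rough illustration of the project tools, the Python sketch below assembles a non-interactive kedro new command line. The helper build_kedro_new_command is hypothetical. The flags it emits (--name, --tools, --example) and the short tool names reflect our reading of the Kedro 0.19 CLI, so check kedro new --help for your installed version. To start from one of the spaceflights starters instead, the command is kedro new --starter=spaceflights-pandas (or one of the other starter names above).

```python
from typing import List, Sequence

# Hypothetical helper: builds (but does not run) a `kedro new` command line.
# The short tool names below are assumed from the Kedro 0.19 CLI docs.
KNOWN_TOOLS = ("lint", "test", "log", "docs", "data", "pyspark", "viz")


def build_kedro_new_command(
    name: str, tools: Sequence[str] = (), example: bool = False
) -> List[str]:
    """Return the argument list for a non-interactive `kedro new` call."""
    unknown = [tool for tool in tools if tool not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"Unknown tools: {unknown}")
    command = ["kedro", "new", f"--name={name}"]
    if tools:
        # Tools are passed as a single comma-separated flag value.
        command.append("--tools=" + ",".join(tools))
    # Ask for (or skip) the example pipeline explicitly.
    command.append("--example=" + ("y" if example else "n"))
    return command
```

You could pass the result to subprocess.run(command, check=True) to create the project; keeping command construction separate from execution makes it easy to test.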

When it comes to default project structure, Kedro 0.19 now includes the build configuration and project metadata in pyproject.toml, so Kedro projects follow modern Python packaging standards and have a similar structure to any other Python library.

We have previously explained that our goal of making Kedro leaner, lightweight, and fast-evolving required us to decouple the framework code from the Kedro dataset code. In Kedro 0.19, we’ve released the kedro-datasets package and removed kedro.extras.datasets from the framework code. Alongside this change, we’ve also improved the error messages displayed when a dataset is not found: Kedro now raises a more explicit error when dependencies are missing, to distinguish it from errors caused simply by typos.
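One way to tell a missing dependency apart from a typo is to inspect which module failed to import. The sketch below is illustrative only and is not Kedro's implementation: load_class is a hypothetical resolver for dotted type paths such as kedro_datasets.pandas.CSVDataset, and it relies only on ModuleNotFoundError.name and difflib from the standard library.

```python
import difflib
import importlib


def load_class(type_path: str):
    """Hypothetical resolver for dotted paths like 'package.module.ClassName'."""
    module_path, _, class_name = type_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # The requested module (or a parent package) does not exist:
        # most likely a typo in the type path.
        if module_path == missing or module_path.startswith(missing + "."):
            raise LookupError(
                f"Cannot find module '{missing}'; check '{type_path}' for typos."
            ) from exc
        # The module exists, but one of its own imports failed:
        # a third-party dependency is missing.
        raise ImportError(
            f"'{type_path}' requires '{missing}', which is not installed."
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError:
        # The module imported fine, so suggest the closest attribute name.
        matches = difflib.get_close_matches(class_name, dir(module), n=1)
        hint = f" Did you mean '{matches[0]}'?" if matches else ""
        raise LookupError(f"'{class_name}' not found in '{module_path}'.{hint}") from None
```

Note that in this simplified version, a missing top-level package (for example kedro_datasets itself not being installed) is reported as a path problem rather than a missing dependency.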

OmegaConfigLoader is now the only configuration loader in Kedro, as we have removed the alternatives. Furthermore, Kedro 0.19 now lets you choose a merge strategy for configuration files loaded with OmegaConfigLoader: the default is a destructive merge, but there's also an option for a soft merge.
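To make the difference between the two merge strategies concrete, here is a plain-Python sketch of the semantics as we understand them from the Kedro documentation: with a destructive merge, an environment's top-level key replaces the base value wholesale, while a soft merge combines nested keys recursively. The two functions are hypothetical stand-ins; OmegaConfigLoader does the real merging, and the strategy is selected per configuration type, for example with CONFIG_LOADER_ARGS = {"merge_strategy": {"parameters": "soft"}} in settings.py.

```python
def destructive_merge(base: dict, env: dict) -> dict:
    """Top-level keys from the environment replace base values wholesale."""
    merged = dict(base)
    merged.update(env)
    return merged


def soft_merge(base: dict, env: dict) -> dict:
    """Nested dictionaries are merged key by key; other values are overridden."""
    merged = dict(base)
    for key, value in env.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = soft_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values for conf/base/parameters.yml and conf/local/parameters.yml.
base_params = {"model_options": {"test_size": 0.2, "random_state": 3}}
local_params = {"model_options": {"test_size": 0.3}}
```

Here the destructive merge drops random_state because local redefines the whole model_options key, whereas the soft merge keeps it and only overrides test_size.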

The main changes in Kedro 0.19

Here’s a short list of some of the other changes we made in this release:

We dropped Python 3.7 support.
We added the --conf-source option to %reload_kedro to enable users to specify a source for project configuration.
We added validation for the configuration file used to override run commands via the CLI.
We moved the default environments, base and local, from the config loader to _ProjectSettings. This enables the config loader to be used as a standalone class without affecting existing Kedro users’ code.
We enhanced the documentation with a new top-level navigation to easily switch between Kedro, Kedro Viz, and Kedro-Datasets documentation, and a new search-as-you-type to improve the search experience.
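With the default environments moved into project settings, a project's settings.py declares them alongside the other loader arguments. The first part of the sketch mirrors what we understand the 0.19 project template to contain (the import and CONFIG_LOADER_CLASS lines are left as comments so the snippet runs without Kedro installed). resolve_environments is a hypothetical helper showing how an environment passed with kedro run --env takes the place of the default run environment.

```python
from typing import List, Optional

# settings.py (sketch). A generated 0.19 project also contains, roughly:
#     from kedro.config import OmegaConfigLoader
#     CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "base_env": "base",  # always loaded first
    "default_run_env": "local",  # layered on top unless another env is requested
}


def resolve_environments(loader_args: dict, requested_env: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return configuration environments in load order."""
    run_env = requested_env or loader_args["default_run_env"]
    return [loader_args["base_env"], run_env]
```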

There were numerous bug fixes and tweaks in the release, such as the following:

Added a new field tools to pyproject.toml when a project is created.
Added validation to node tags to be consistent with node names.
Removed pip-tools as a dependency.
Accepted path-like filepaths more broadly for datasets.

For the complete list of changes, go to the release notes for Kedro 0.19.0 and Kedro 0.19.1.

Breaking changes in Kedro 0.19

These are the significant breaking changes in Kedro 0.19 compared to Kedro 0.18.x:

ConfigLoader and TemplatedConfigLoader have been removed.
The new datasets package (kedro-datasets) replaces kedro.extras.datasets and its tests.
PartitionedDataset and IncrementalDataset were removed from kedro.io and moved to kedro-datasets.
Logging was removed from OmegaConfigLoader in favour of the environment variable KEDRO_LOGGING_CONFIG.
Support for the layer attribute when defined at the top level within DataCatalog was removed.
Inconsistencies in the use of naming were eliminated by renaming data_set and DataSet to dataset and Dataset across the codebase.
The create_default_data_set() method in the AbstractRunner was removed in favour of using dataset factories to create default dataset instances.
The default project template now has only one pyproject.toml at the root of the project (containing both the packaging metadata and the Kedro build config).
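For code and catalog files written against 0.18, a first pass at some of these renames can be scripted. The migrate_to_019 function below is a hypothetical, deliberately naive text substitution, not an official Kedro migration tool. It covers the move from kedro.extras.datasets to kedro_datasets, single-name imports of PartitionedDataset and IncrementalDataset from kedro.io (which, as we understand it, now live in kedro_datasets.partitions), and the DataSet/data_set renames. It will also rename unrelated identifiers that happen to contain data_set, so review the diff and rely on the migration guide for everything else.

```python
import re

# Single-name imports of the partition datasets that moved out of kedro.io.
_PARTITION_IMPORT = re.compile(
    r"from kedro\.io import (Partitioned|Incremental)Data[Ss]et\b"
)


def migrate_to_019(source: str) -> str:
    """Naive, hypothetical text rewrite of common 0.18 -> 0.19 renames."""
    # 1. Datasets moved from the framework into the kedro-datasets package.
    source = source.replace("kedro.extras.datasets", "kedro_datasets")
    # 2. PartitionedDataset and IncrementalDataset moved out of kedro.io.
    source = _PARTITION_IMPORT.sub(
        r"from kedro_datasets.partitions import \1Dataset", source
    )
    # 3. Consistent naming: DataSet -> Dataset and data_set -> dataset.
    return source.replace("DataSet", "Dataset").replace("data_set", "dataset")
```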

If you are upgrading from Kedro 0.18, have a look at the migration guide for more information.

Get started with Kedro 0.19

You can install Kedro 0.19 with pip install kedro==0.19.1 or conda/mamba/micromamba install -c conda-forge kedro=0.19.1.

Note that we released Kedro 0.19.0 but then detected a problematic bug in it, so we released Kedro 0.19.1 with a fix immediately afterwards. Kedro 0.19.1 is the version you should install and use.
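If you are unsure which release an environment picked up, a small check can confirm you are on 0.19.1 or later rather than 0.19.0. This is a sketch using importlib.metadata from the standard library; parse_release is a simplified, hypothetical version parser that ignores pre-release suffixes, so packaging.version.Version from the packaging library is the more robust choice in real tooling.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile
from typing import Tuple

RECOMMENDED = (0, 19, 1)


def parse_release(text: str) -> Tuple[int, ...]:
    """Simplified parser: '0.19.1' -> (0, 19, 1); drops pre-release suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def kedro_is_recommended(minimum: Tuple[int, ...] = RECOMMENDED) -> bool:
    """True if the installed kedro distribution is at least `minimum`."""
    try:
        installed = version("kedro")
    except PackageNotFoundError:
        return False  # Kedro is not installed in this environment
    return parse_release(installed) >= minimum
```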

Find out more about Kedro

There are many ways to learn more about Kedro:

Join our Slack organisation to reach out to us directly if you have a question or want to stay up to date with news. There's an archive of past conversations on Slack too.
Read our documentation or take a look at the Kedro source code on GitHub.
Check out our video course on YouTube, Introduction to Kedro: Building Maintainable Data Pipelines.

What’s next?

At the time of writing, in January 2024, we are planning the milestones for our next releases. (You can see what we're working on right now, whenever you are reading this post, on our sprint board.)

We welcome every community contribution, large or small, so please do continue to report bugs or suggest future features over on GitHub and raise discussions on Slack.

Stay tuned for an online community session about the new release soon. We’ll announce dates just as soon as we can!


Juan Luis Cano Rodríguez

Product Manager, QuantumBlack
