In the pipeline: November 2023

From the latest news to upcoming events and interesting topics, “In the Pipeline” is overflowing with updates for the Kedro community.

7 Nov 2023 (last updated 7 Feb 2024)

There were a few releases of Kedro projects in October 2023.

Kedro Framework 0.18.14

The Kedro 0.18.14 release added support for customising the pipeline directory structure and overriding configuration keys, made substantial improvements to the documentation (including a page on how to use Kedro from Jupyter notebooks), and squashed a number of bugs. It’s the last in the 0.18.x set of releases, and preparations for 0.19.0 are underway!

Kedro-Viz 6.6.0 and 6.6.1

The Kedro-Viz 6.6.0 release added “Publish & Share”, a new feature where you can publish a pipeline visualisation on AWS S3 and share it easily with others. The release included a set of feature documentation to help you use it. A further Kedro-Viz release (6.6.1) was less substantial but squashed a number of bugs.

Kedro-Plugins

We also released kedro-datasets 1.8 in October. It included a range of improvements, among them moving PartitionedDataSet and IncrementalDataSet from the core Kedro framework repo to kedro-datasets and renaming them to PartitionedDataset and IncrementalDataset.

Finally, there were releases for the official Kedro plugins, kedro-airflow (0.7), kedro-docker (0.4) and kedro-telemetry (0.3), to remove support for Python 3.7 and add support for Python 3.11.

Prepare your codebase for Kedro 0.19.0

You may have seen this announcement on the Kedro Slack channel:

The Kedro Framework team is currently working on our next release, Kedro 0.19.0, which we’re aiming to get to you by the end of the year. There’s a lot happening behind the scenes and we wanted to give you a heads-up about a few changes that you can adopt now to be ready when 0.19.0 hits the streets.

We’ve written migration guides to help you transition from the legacy ConfigLoader and TemplatedConfigLoader, which will be removed in Kedro 0.19.0 in favour of OmegaConfigLoader.
Datasets move to their own package kedro-datasets and out of the Kedro core package in Kedro 0.19.0. You can make the change to use that package now. Further, from version 2.0.0 of kedro-datasets, which will be released together with Kedro 0.19.0, all dataset names have changed to replace the capital letter “S” in “DataSet” with a lower case “s”. For example, CSVDataSet is now CSVDataset.

If you need help to update your projects in preparation for Kedro 0.19.0 or post release, please reach out to the team; we’re more than happy to help you. We are very excited about the changes we’re bringing you in this release, and more detailed communications about all changes will go out when the release is done.

If you are curious, you can take a look at the preliminary 0.19.0 release notes!
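As a rough illustration of the dataset renaming described above, here is a minimal Python sketch that rewrites the `type` field of catalog-style entries from the old “DataSet” spelling to the new “Dataset” one. It uses only the standard library and is not part of Kedro: the helper names `rename_dataset_type` and `rename_catalog_types`, the plain-dict catalog, and the example file paths are all assumptions made for illustration. Check the migration guides before changing a real project.

```python
import re

# Matches the old "DataSet" spelling only at the end of a name,
# e.g. "CSVDataSet" or "pandas.CSVDataSet".
_OLD_SUFFIX = re.compile(r"DataSet\b")


def rename_dataset_type(type_name: str) -> str:
    """Return type_name with a trailing "DataSet" respelled as "Dataset"."""
    return _OLD_SUFFIX.sub("Dataset", type_name)


def rename_catalog_types(catalog: dict) -> dict:
    """Return a copy of a catalog-like mapping with each entry's "type" renamed."""
    updated = {}
    for name, entry in catalog.items():
        entry = dict(entry)  # shallow copy so the input mapping is left untouched
        if isinstance(entry.get("type"), str):
            entry["type"] = rename_dataset_type(entry["type"])
        updated[name] = entry
    return updated


# Hypothetical catalog entries, written as a plain dict for illustration.
old_catalog = {
    "companies": {"type": "pandas.CSVDataSet", "filepath": "data/01_raw/companies.csv"},
    "shuttles": {"type": "pandas.ExcelDataSet", "filepath": "data/01_raw/shuttles.xlsx"},
}
new_catalog = rename_catalog_types(old_catalog)
print(new_catalog["companies"]["type"])
```

Names that already use the new spelling pass through unchanged, so the helper can be run more than once on the same data.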

Contributor news: A month of Hacktoberfest

Last month we saw more community contributions than ever before. Thanks to the following Kedroids who made contributions to our various releases:

Adam Kells, Alistair McKelvie, Felix Wittmann, flpvvvv, IngerMathilde, Iñigo Hidalgo, harmonys-qb, Jason Hite, Jens Lordén, Jeroldine Akuye Oakley, Laíza Milena Scheid Parizotto, Matthias Roels, Miguel Ortiz, Mustapha Abdullahi, PtrBld, qheuristics, Richard, rxm7706, sbrugman, and Yi Kuang.

We are particularly grateful to those who made Hacktoberfest contributions, and although October is now over, there are still plenty of options if you want to contribute to us. Don’t forget to take a look at the guidance for contributors on the Kedro wiki, and ask us if you’ve any questions, either on individual issues, or over on Slack.

Meet the Kedroid: Laiza Milena Scheid Parizotto

This month, we meet Laiza Milena Scheid Parizotto who was the most prolific contributor to the Kedro project throughout Hacktoberfest. We caught up with her over Slack to find out more about her work.

Where do you work?

I'm located in Dresden, Germany, and I'm currently working as a Data Scientist with a strong focus on MLOps, for Indicium Tech, based in Brazil.

When did you start using Kedro?

I began my journey with Kedro earlier this year, and it has been a game-changer for my data science projects. The reason for adopting Kedro was to enhance project efficiency and maintainability. At Indicium Tech, we've fully embraced Kedro because of the numerous advantages it offers in developing and deploying our machine learning projects.

How have you adapted Kedro to your projects?

One of the key ways I've customized Kedro to fit our needs is by integrating it seamlessly with various cloud platforms. This integration has allowed us to harness the power of cloud services for our data-driven projects, making them more scalable and efficient.

What projects are you working on now, or will be working on soon?

I'm currently immersed in an exciting project involving pricing optimization for a global retail enterprise. Additionally, I have two more projects in the pipeline. One is focused on exploring new datasets with Kedro Streaming, and the other revolves around working with images and using the Pillow library to manipulate image data.

Where can we find you online (for example, Slack handle, GitHub repo, blog, or Stack Overflow)?

You can connect with me on GitHub or on LinkedIn and feel free to reach out if you have any questions or want to discuss data science, Kedro, or related topics. I'm always eager to connect with professionals from various backgrounds and interests, fostering meaningful connections and discussions.

Recently on the Kedro blog

We’re always looking for collaborators to write about their experiences using Kedro. Get in touch with us on our Slack workspace to tell us your story!

In other news

We've published a set of Kedro-Viz video tutorials that cover how to get started with Kedro-Viz, how to set up and use experiment tracking and the preview datasets feature in Kedro-Viz.

Last month, Python Espana saw a meeting of past and present Kedro team members. Lais and Juan Luis have both served as our developer advocate, and in the photo above you can admire them modelling Kedro T-shirts from 2020 and the present day!

Juan Luis was also invited to Ubuntu Summit in Riga, Latvia early in November, to participate in a panel about the future of AI and give his perspective on how to “cross the chasm” between data scientists and data & machine learning engineers.

What we’ve been reading

Marcin Zabłocki from GetInData | Part of Xebia, and a member of Kedro’s Technical Steering Committee, wrote a blog post about Kedro Dynamic Pipelines in October. It covers some of the common use cases and provides an approach that may be useful if you’ve had similar requirements in the past. It’s a hot topic in the Kedro TSC right now, so you can expect more about dynamic pipelines to come in 2024.

Another hot topic is making it easy to transition to Kedro if you’ve already got a project in a notebook. This month, we published a blog post about adding Kedro features to a Jupyter notebook without converting it into a fully Kedroised project. There will be more on this topic in the weeks to come, and we’ve also revised our docs on it recently.

We’ve been learning more about the [kedro-boot plugin](https://github.com/takikadiri/kedro-boot), which streamlines the integration between Kedro projects and external applications. Its features include injecting application data into the Kedro Data Catalog and orchestrating multiple Kedro pipeline runs dynamically. Takieddine Kadiri introduced it on the Kedro Slack channels last month, where you can find out more and give feedback to the plugin authors.

Finally, if you write an article, podcast or video that discusses Kedro, let us know about it on Slack, and add it to the “Awesome Kedro” repository so others can find it!

That’s it for this edition!

And that’s a wrap for this month.

Don’t forget that we toot out regular Kedro updates onto Mastodon (https://social.lfx.dev/@kedro) and across the popular channels of the Slack community. Keep an eye on the QuantumBlack LinkedIn feed too!

Don’t forget you can bookmark this blog or add our RSS feed to your favorite reader to stay in the loop. Join us next month for another update from the Kedro team.

Jo Stichbury

Technical Writer, QuantumBlack
