Write reproducible, maintainable and modular data science code

*A tribute to “Learn You a Haskell for Great Good!”*

Grab a coffee and sit down. This is a long post!

Latte art. By Yuko Honda on Flickr (CC BY-SA 2.0) https://flic.kr/p/d9KiHk

In this article, I introduce Kedro, an open-source Python framework for creating reproducible, maintainable and modular data science code. After a brief description of what it is and why it is likely to become a standard part of every data scientist’s toolchain, I describe some technical Kedro concepts and illustrate how to use them with a tutorial.

Overall, you should be able to read and digest this article in about 30 minutes. …

Don’t break up with Jupyter Notebooks. Just use Kedro too!

This article is suitable for anyone who has found themselves seduced by the ease of working with Jupyter Notebooks. Although it’s aimed at readers who are relatively new to data science, it applies equally to more experienced data scientists & engineers who are considering how to improve their daily workflow.

The convenience of a Jupyter Notebook combined with Kedro’s software best practices

I’m going to run through the reasons why we fall in and out of love with Jupyter Notebooks, and describe how the open-source Kedro framework can help you solve some of the problems that cause headaches and heartaches for so many data professionals. And there’s no breakup involved! …

Applying BERT to analyze ESG topics in financial services

In this article, I’ll introduce you to a hot topic in financial services and describe how a leading data provider is using data science and NLP to streamline how they find insights in unstructured data.

Tango with Pollock by Pitel on DeviantArt (CC BY-NC-SA 3.0)

Environmental, social, and governance (ESG) metrics measure the sustainability and societal impact of an investment in a company or business. Before committing to a company, investors want to know if there are any potential controversies brewing, or if the company shows particular leadership in an area of ESG, such as diversity in the workforce.

Refinitiv is a global provider of financial market data and infrastructure, and…

A new Python library for production-ready data pipelines

In this post, I will introduce Kedro, a new open-source tool for data scientists and data engineers. After a brief description of what it is and why it is likely to become a standard part of every professional’s toolchain, I will describe how to use it in a tutorial that you should be able to complete in fifteen minutes. Strap in for a spaceflight to the future!

Image by stan (licensed under CC BY-SA 2.0)

Suppose you are a data scientist working for a senior executive who makes key financial decisions for your company. She asks you to provide an ad-hoc analysis, and when you do, she…

*Be* the change in your online community

© Derek Brahney for Mosaic

If you hang out “below the line” in the comments section of newspapers, on Reddit, StackOverflow or similar forums, there’s a good chance you’ve seen unkindness on the Internet.

In technology, particularly, we rely upon getting answers to our questions from the online community. There are rarely books on Swizzy startup’s Esoteric API v0.9, or Funky Framework v2; if you want answers to a question or problem that you cannot find for yourself, you go online to discuss it. …

How will we offset the rapidly increasing power consumption of data centres? Hyperscale has plucked the low hanging fruit. What’s next?

BalticServers.com [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

Pop quiz: Which organ of your body uses the most energy?

It’s the brain, of course.

Your brain uses up to 20 percent of all the calories you consume each day.

Maybe that explains why we’ve readily accepted that computers will use a huge amount of energy in processing, and emit so much heat that they also create a secondary power need to cool them down?

We are becoming increasingly aware of the amount of energy the data centres supporting our favourite social media sites, film & music streaming services and online shopping are using, and are expected to use in the future:

An interview with Dr Stelios Kampakis

Image from http://thedatascientist.com/decision-makers-handbook-data-science/


In this article, I’m interviewing a veteran data scientist, Dr Stylianos (Stelios) Kampakis, about his career to date and how he helps decision makers across a range of businesses understand how data science can benefit them.

An interview with Barry Zane of Cambridge Semantics

Image by Kyle McDonald on Flickr CC BY 2.0


In this interview, I’m catching up with Barry Zane, Vice President at Cambridge Semantics. Barry is the creator of AnzoGraph™, a native, massively parallel processing (MPP) distributed graph database. Barry has had quite a journey in the database world. He served as Vice President of Technology of Netezza Corporation from 2000 to 2005, and was responsible for guiding all aspects of software architecture and implementation, from initial prototypes through volume shipments to leading telecommunications, retail and internet customers. Netezza was eventually sold to IBM, but prior to that, Barry had turned his attention elsewhere to found another company, ParAccel, which eventually became…

The hybrid model of computer-human intelligence offers a way to build on our mutual strengths and deliver efficiency

My friends at GRAKN.AI recently published an interesting article lamenting that “machines should be able to outperform humans in many more tasks than they currently can, or at least that they should be able to make truly smart predictions.”

The article makes the point that AI has cracked one of the key attributes of human intelligence — learning — but still has some way to go with logical reasoning over a representation of knowledge.

How do we help artificial intelligence to reason? It is so innate to us that we don’t even know we are doing it.

Image by Stig Nygaard on Flickr [CC BY 2.0]

Take a simple…

Where did it all go wrong?

Still from Jamiroquai’s “Virtual Insanity” from MusicLifeWord.org

In this article, I explain why I think mainstream consumers have rejected virtual reality, and suggest that its child, augmented reality, may be the Slope of Enlightenment (of the Gartner Hype Cycle) that convinces us to buy in. These are my views alone, but towards the end of the piece I share some data from software developers around the world who are working with AR and VR. Even if you don’t care about my views, you may find what they have to say interesting. …

Jo Stichbury

Rōnin technology writer and podcast host. Cat herder. Dereferences NULL.
