HubofML #10: Happy New Year
This month's newsletter features posts from GoogleAI, Amazon, DoorDash, Duolingo, Spotify, Slack, Uber, Shopify, GitHub, LinkedIn, and more.
What a year 2020 has been! For me, it's all about you, HubofML's subscribers 💯. As we head into 2021, my mission remains the same: curated links on machine learning, data science, software engineering, and tech leadership.
Whether you're a machine learning engineer, a software engineer, an architect, a tech lead, or an engineering manager, you can rest assured that there is something in each edition for you.
If there's anything I can ever do better, head here to give me anonymous feedback.
As usual, I hope this edition sparks ideas in you. If you missed the last one, you can catch up here.
Machine Learning
How DoorDash is Scaling its Data Platform to Delight Customers to Meet Their Growing Demand
Many of the fastest-growing and most successful companies are data-driven. While data often seems like the answer to many business problems, its challenging nature, spanning a wide variety of technologies, skill sets, tools, and platforms, can make it overwhelming and difficult to manage. This article sheds light on the challenges that organizations like DoorDash face and how DoorDash has charted its course so far.
How the NFL Builds Computer Vision Training Datasets At Scale
One of my favorite re:Invent tracks this year was "How the NFL builds computer vision training datasets at scale." If you have already seen the track, here is a companion post on the subject that you should read.
Things Not Strings: Understanding Search Intent with Better Recall
For every growing company using an out-of-the-box search solution, there comes a time when the corpus and query volume get so big that a system for understanding user search intent is needed to consistently show relevant results. Here is how DoorDash's team improved its search performance, achieving a 9% improvement in click-through rate, a 10% improvement in conversion rate, and a 76% reduction in null rate (searches that return no results).
Voice Separation with an Unknown Number of Multiple Speakers
Social events often have multiple voices speaking simultaneously. Here is a new method for separating a mixed audio sequence into individual voices using gated neural networks, which are trained to separate the voices at multiple processing steps while keeping the speaker assigned to each output channel fixed.
MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device
MediaPipe is an open-source framework designed specifically for complex perception pipelines that leverage accelerated inference. Last month, researchers at Google announced MediaPipe Holistic, a new MediaPipe solution that offers real-time, simultaneous perception of human pose, face landmarks, and hand tracking on mobile devices.
Building Infrastructure to Support Audio Research
Humans can tell the difference between a swooning vocal, a danceable beat, and a buzzing bee, but can we teach machines to hear those differences, too? Recently, Spotify open-sourced Klio, a framework for building smarter data pipelines for audio and other media processing. This is a good read from the Spotify team on what Klio is capable of and what went into building it.
How Foxconn Built an End-to-end Forecasting Solution in Two Months with Amazon Forecast
Foxconn's factory in Mexico assembles and ships electronics equipment to regions in North and South America. Each Foxconn product has its own seasonal variations and requires a different level of complexity and skill to build. For Foxconn, having individual forecasts for each product is important for understanding the mix of skills they need in their workforce. This article describes how they built an end-to-end forecasting solution with Amazon Forecast.
This month I came across this awesome post by @dennybritz on lessons learned from building a profitable algorithmic trading system using Reinforcement Learning techniques. It's a good read.
How Duolingo uses AI in every part of its app
Duolingo is one of the world's most popular language-learning platforms, and its number of new users grew by 101% this year alone. But the power of Duolingo lies in how it leverages machine learning: as you use the app, it builds an exceptionally detailed profile of what you know and what you don't. This is a good read on how Duolingo uses ML to enhance language learning.
Software Engineering
Uber's Real-Time Push Platform
For Uber, multiple participants can view and modify the state of an ongoing trip, so all active participants and apps must be kept in sync with real-time information. Here, Uber engineers describe how they moved from polling to refresh the app to a gRPC-based bidirectional streaming protocol that powers their app experience.
Flaky tests are puzzling, frustrating, and a waste of time. As much as we all hate to see flaky tests causing our builds to fail, they're about as common as a developer who brews pour-over coffee every morning. That's why I especially love this post by @jnraine on GitHub's approach to reducing flaky builds.
What It's Like to Be an Engineering Manager on a Product-Oriented Team
What does it feel like to be an engineering manager on a product-oriented team? This is a good read from Anna Glukhova on what the engineering manager role looks like at Grammarly.
How LinkedIn Scales Compatibility Testing
One of the significant issues with multi-repo codebases is maintaining and coordinating changes across multiple repositories. But LinkedIn has 12,000 multi-repo codebases. How do they manage changes and compatibility?
In this post, @ndini92 and @dsully describe how LinkedIn makes multi-repo codebases work at scale and the tooling surrounding it.
How to Reliably Scale Your Data Platform for High Volumes
For Shopify, Black Friday and Cyber Monday mean two things: sales and data. There will be a lot of sales, and there will be a lot of data. But how do you reliably scale your data platform when you see an average throughput increase of 150%? Here is a good read from @rbizla, a senior software engineer at Shopify.
Which Cloud Provider Is Right for Your Kubernetes Workloads
Most cloud providers have their own Kubernetes hosting environments for managing containers. Because these offerings vary widely among providers, there are several factors to consider as you shop for the right platform to manage your container workloads. Here is a good read to help you weigh your options.
Engineering Leadership
Stop Measuring the Wrong Thing
Most engineering leaders would say that they want to create high-performing teams. Unfortunately, many don't see that the way they measure, track, and report on their teams causes performance to suffer. This post lists measures that decrease agile team performance.
Knowing how to run effective meetings is likely one of the most important skills employees at all levels need to learn to thrive at work. Jennifer Phillips shares five tips for running effective meetings and making them enjoyable.
Technical Decision-Making and Alignment in a Remote Culture
Engineering organizations of all sizes make technical decisions every day. Most of the time, these decisions are on a relatively small scale: an individual engineer or a small team solving a problem in the way that makes the most sense to them. In a world of asynchronous communication, it's more important than ever to create inclusive, remote-friendly processes for collaboration, decision-making, alignment, and documentation. Here is a good read from @CoralineAda.
Thanks for reading! If you like this newsletter and want to support it, please share it with others.
Cheers 🎉,