HubofML - Newsletter #8
Predicting Ad Value at Twitter, Scaling the Data Platform at DoorDash, Building Background Features in Google Meet, Scaling Live Streaming to Millions of Viewers at Facebook, and More
Hey,
Welcome to another edition of my newsletter, the eighth this year, which I hope will spark new ideas 💡 and provide you with useful information on how tech companies tackle various engineering problems. 💯
If you missed the last one, you can catch up here.
I want to make sure each edition brings something valuable to you; that's why your feedback matters to me. If you have any ideas or requests for future editions, let me know.
I hope you enjoy this month's edition. Please forward it to a friend, colleague, or anyone that comes to mind. 🙏
Machine Learning
A highly efficient, real-time text-to-speech system on CPUs
How Facebook built and deployed a real-time neural text-to-speech system on CPU servers, delivering industry-leading compute efficiency and human-level quality.
Predicting the Value of Ad Request at Twitter
A post on how Twitter is using machine learning to predict the value of ad requests.
Scaling its Data Platform to Delight Customers and Meet our Growing Demand at DoorDash
Here is how DoorDash is able to deliver a reliable data platform that enables optimal business operations, pricing, and logistics as well as improved customer obsession, retention, and acquisition.
Background Features in Google Meet, Powered by Web ML
Google recently announced ways to blur and replace your background in Google Meet, which uses machine learning (ML) to better highlight participants regardless of their surroundings. Here is a post on how Google engineers built this feature into Google Meet.
Improving Deep Learning for Ranking Stays at Airbnb
This is how Airbnb improves the DNN architecture of its search ranking.
Pensieve: An Embedding Feature Platform
Pensieve is an embedding feature platform developed at LinkedIn to pre-compute and publish entity embeddings. This post describes how it works and its architecture.
Deep Dive into ML Models in Production Using Tensorflow Extended (TFX) and Kubeflow
A quick introduction to TFX and how to deploy an ML project to production using TFX, Google AI Platform Pipelines, and Kubeflow.
PyTorch Loss Functions: The Ultimate Guide
A comprehensive guide to loss functions in PyTorch.
How to Extract Structured Data from Invoices
A comprehensive overview of techniques for extracting structured key-value pairs from invoices. The post reviews research papers that explore data extraction and touches on how to get started implementing the methods.
Getting started with TorchServe
Learn TorchServe with examples.
10 Useful ML Practices For Python Developers
Creating ML models does not give you the freedom to write crappy code. Pratik Bhavsar wrote a post on ten best practices for Python developers.
How to Deploy PyTorch Lightning Models to Production
PyTorch Lightning provides a Python wrapper for PyTorch that lets data scientists and engineers write clean, manageable, and performant training code. Caleb Kaiser wrote a post on how to deploy PyTorch Lightning models to production.
Software Engineering
How We Scale Live Streaming for Millions of Viewers Simultaneously
How Facebook built a system capable of managing both UGC (captured on all kinds of devices at differing quality levels) and broadcast-quality high-res streaming — and working reliably for billions of people around the world.
Revolutionizing Money Movements at Scale with Strong Data Consistency
How Uber migrated hundreds of millions of customers between two asynchronous accounting systems while maintaining data consistency, with a goal of zero impact on users.
Scaling Email Infrastructure for Medium Digest
The Medium Digest contains a list of personalized stories for Medium users. This post describes how Medium scaled the infrastructure responsible for it.
Production Testing with Dark Canaries
CI/CD pipelines allow code to be written quickly and pushed to user-facing applications and services. Though they boost productivity, they have also caused problems such as site or service outages when bad code, configuration, or AI models are pushed to production. The post introduces how LinkedIn is using dark canary clusters to detect problems before they hit production.
Building Mental Models of Ideas That Don't Change
In this post, Hammad Khalid, a lead software developer at Shopify, describes some engineering and management mental models that he has found useful over the years.
7 Tips for Creating A Successful CI/CD Pipeline
Continuous integration and deployment are key to a company's ability to release software frequently. These are seven tips for creating a successful CI/CD pipeline.
Tech Leadership
Being Visible
Timing is a particular sort of luck, so in some ways you can simplify success down to just luck and work. One of the most effective ways to get luckier is to be more visible within your organization. Read more on how to achieve visibility, both internally and externally.
Meeting Everyone on a New Team
We join new teams all the time. Anna Shipman wrote about what she did to break down barriers with her new team and make them feel less intimidated about approaching her as Technical Director for FT.com.
What is Expected of an Engineering Manager
The role of an Engineering Manager can be summed up in people, delivery, and process. In this post, Rodrigo Flores explains what is expected of every engineering manager.
Engineering Onboarding Processes at Medium
Efficient onboarding is vital as it helps with employee retention rates, clarifies and sets expectations for the new hire's role, and lowers employee stress. Here is what Medium's onboarding processes look like.
Anyone Can Be a Leader
In this post, Pat Kua dismisses a common misconception about being a leader.
Thanks for reading! If you like this newsletter and want to support it, please share it with others or buy me a coffee. If you have feedback, send it to me via mail.
Cheers,
Samuel