The Kingmaker at NETFLIX: Data-Driven Predictive Modeling

Aug 3, 2021

The day was May 19 of the year 2019, and the very last episode of Game of Thrones was just about to air. Emotions ran high and wild as the opening credits streamed in for an audience of eight that night.

A few seconds in, and we realized that the audio and video were off by a couple of seconds.

It was anger, despair, and irritability wrapped in one.

Since we were too invested in the finale by then (mainly Jon Snow), we faithfully cursed the streaming service in multi-colored tones, vowed never to use it again, and decided to battle through by turning ON the subtitles.

This incident occurred way before I started subscribing to Netflix, and now I just can’t seem to get enough. Netflix is my window into new lives, cultures, and experiences. Its high-quality content, great user experience, and on-point recommendations seem nothing less than magic.

So, how did Netflix actually make all of this happen? And more importantly, how does Netflix (a comparatively costlier alternative) manage to poach subscribers amidst a sea of free, pirated, and cheaper video-on-demand alternatives?

Simple Answer: Data-driven Predictive Modeling Techniques.

Need Details? Read ahead.

Predictive modeling is a way to answer “What is most likely to happen?” based on what has already happened (historical data) and what is happening right now (current data). It essentially maps out the probable outcomes using machine learning, artificial intelligence, data mining, statistics, and whatnot.

The success or failure of such predictive analytics depends heavily on the availability of abundant, high-quality data, which is used to select, train, and test various predictive modeling techniques. And, with its millions of subscribers streaming countless hours every single day,

"Netflix is Data-rich, Very rich!"

Some of its data sources include Member Ratings, Member Information (location, language, watch duration), Video data (play duration, box office performance, critic reviews), User device data (type, features), Metadata (actors, directors, genre, parental rating), Real-time user actions (scrolls, mouse-overs, clicks, time spent on a given page, etc.), Social network data, Search terms entered by members, etc. [Sources as listed in various blogs, interviews and Quora questions answered by Netflix data engineers]. This abundant and versatile data is utilized by Netflix to understand, predict, and optimize its various product features using data-driven predictive modeling techniques, to enhance and personalize your experience.

Everyone Loves Netflix Recommendations

Netflix knows its customers’ expectations intimately!

The million-dollar Netflix Prize (launched in 2006) first brought attention to Netflix’s committed focus on improving its recommendation engine. Fast-forward to 2021, and today Netflix personalizes its product differently for every single one of its millions of subscribers using data-driven predictive modeling techniques.

This user-experience optimization, which started with member ratings, has evolved to cover content titles, the imagery used to portray these titles, synopses, metadata, trailers, the number of titles displayed in a row, etc. And all of this happens while continuously keeping up with newly added titles and changing user behavior. Given the global scale of Netflix’s audience, this engine is also fine-tuned to be in line with the cultural, racial, and overall societal sensitivities of the audience. For example, an American staple might not sit well with an Indian audience and vice versa. Xavier Amatriain, former Engineering Director at Netflix, says this in a WIRED interview:

“We know what you played, searched for, or rated, as well as the time, date, and device. We even track user interactions such as browsing or scrolling behavior. All that data is fed into several algorithms, each optimized for a different purpose. In a broad sense, most of our algorithms are based on the assumption that similar viewing patterns represent similar user tastes. We can use the behavior of similar users to infer your preferences.”
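To make the “similar viewing patterns represent similar user tastes” idea concrete, here is a minimal sketch of user-based collaborative filtering. It is only an illustration of the general technique, not Netflix’s actual algorithm: the member names, the ratings, and the helper functions `cosine_similarity` and `recommend` are all made up for this example.

```python
from math import sqrt

# Hypothetical data: member -> {title: rating on a 1-5 scale}.
ratings = {
    "ana":   {"Dark": 5, "Narcos": 4, "The Crown": 1},
    "ben":   {"Dark": 5, "Narcos": 5, "Ozark": 4},
    "chloe": {"The Crown": 5, "Bridgerton": 4, "Dark": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the titles both members rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def recommend(member, ratings, top_n=3):
    """Rank titles the member has not seen by similarity-weighted ratings."""
    seen = ratings[member]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == member:
            continue
        sim = cosine_similarity(seen, other_ratings)
        for title, rating in other_ratings.items():
            if title not in seen:
                # Ratings from more similar members count for more.
                scores[title] = scores.get(title, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana", ratings))
```

Here “ana” and “ben” rate their shared titles almost identically, so the title only “ben” has watched ranks above the one only “chloe” has watched.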

Predictive Quality Control

Butcher this and the churn rate skyrockets in no time!

When it comes to monitoring content quality (audio, video, subtitles, closed captions, etc.), Netflix sets the bar quite high. It imposes several Quality Checks (QC) consisting of various automated and manual inspections across its massive digital supply chain. Missing translations, censored content, incorrect frame rates, luminance shifts, and audio loudness problems are some of the issues identified and corrected during QC to avoid a disruptive streaming experience.

With an ever-increasing global member base and content catalog, Netflix looks to data and predictive modeling techniques to apply quality checks accurately and efficiently. A machine learning model is trained on past QC data and predicts the probability that an asset will not meet Netflix’s quality standards.

Netflix Supply Chain with Predictive Quality Control [Image Source]
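As a rough illustration of “train on past QC data, predict the probability of failure”, the sketch below fits a tiny logistic regression in plain Python. The two features (a content partner’s historical failure rate and a new-format flag), the sample records, and the function names are assumptions made for this example; the sources above do not say which features or model Netflix actually uses.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict_fail_probability(weights, features):
    """weights[0] is the bias; the rest pair up with the features."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit the weights by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_fail_probability(weights, x) - y
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical past QC records:
# [partner's historical failure rate, 1 if the asset uses a new format]
history = [[0.05, 0], [0.10, 0], [0.08, 1], [0.02, 0],
           [0.40, 0], [0.55, 1], [0.60, 1], [0.35, 1]]
failed = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = asset failed QC

weights = train_logistic(history, failed)
risky = predict_fail_probability(weights, [0.50, 1])
safe = predict_fail_probability(weights, [0.03, 0])
print(f"risky asset: {risky:.2f}, safe asset: {safe:.2f}")
```

A score like this could be used to decide which assets deserve a closer manual inspection first, which is one plausible way to apply checks “accurately and efficiently” at scale.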

Global Expansion

As of 2021, Netflix is available for streaming in over 190 countries.

As Netflix continues to expand globally, a variety of new content categories, catering to different languages (dubbed audio and subtitles) and cultural preferences, have to be added to its library every single day. Apart from the massive data and engineering effort that goes into installing and maintaining data servers throughout the world, a global expansion of such a humongous scale also adds to the diversity of the receiving networks and devices. This poses a myriad of new technical challenges. How Netflix deals with them is covered in the sections that follow.

Personalized Streaming Experience

Netflix optimizes not only “WHAT you watch” but also “HOW you watch your content”!

Apart from suggesting what to watch, Netflix also tries to optimize another key QoE (Quality of Experience) metric: “How you watch your content”. It accomplishes this by utilizing video performance data such as playback failures, rebuffer rate [Time required to replenish local client buffer from the server], bitrate, etc. This data is used to train predictive models that estimate the impact each performance parameter has on an individual member’s streaming experience. Ultimately, these predictions are used to fine-tune the algorithms governing a member’s QoE in order to deliver a high-quality streaming experience.

The Netflix Streaming Supply Chain: Opportunities to optimize the streaming experience exist at multiple points [Image Source]

Chaitanya Ekanadham, a Senior Research Scientist at Netflix, elaborates:

“As we expand rapidly to audiences with diverse viewing behavior, operating on networks and devices with widely varying capabilities, a “one size fits all” solution for streaming video becomes increasingly suboptimal. For example: Viewing/browsing behavior on mobile devices is different than on Smart TVs. Cellular networks may be more volatile and unstable than fixed broadband networks. Networks in some markets may experience higher degrees of congestion. Different device groups have different capabilities and fidelities of internet connection due to hardware differences.”

Optimizing Content Caching

Even before the credits roll, Netflix caches and plays Next!

Now, as the credits start rolling, a user always has a choice to continue with Netflix’s recommendations or let Netflix choose the ‘Next’. To reduce the latency between consecutive video plays and facilitate a smooth binge-watching experience, Netflix employs predictive caching models. These models are trained using historical member viewing data and other contextual variables to predict the content that the user is most likely to play next. Since caching is constrained by the storage and bandwidth available at an intermediate internet hop, a predictive model makes an intelligent, data-driven guess as to “What to cache next?” from the massive Netflix catalog.

For streaming one episode of The Crown, Netflix has to cache around 1,200 files! [Image Source]
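A very simplified way to picture “What to cache next?” is to count which title members historically played right after the current one, then pre-cache the most frequent follow-ups that fit in the cache. The sketch below does exactly that. The viewing histories, the `cache_slots` parameter, and the function names are hypothetical, and unlike the models described above it ignores contextual variables entirely.

```python
from collections import Counter, defaultdict

def build_transition_counts(histories):
    """Count how often title B was played right after title A."""
    transitions = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            transitions[current][following] += 1
    return transitions

def titles_to_precache(transitions, current_title, cache_slots=2):
    """Return the most frequent follow-ups, limited by cache capacity."""
    followers = transitions.get(current_title, Counter())
    return [title for title, _ in followers.most_common(cache_slots)]

# Hypothetical viewing sessions, each an ordered list of plays.
histories = [
    ["S1E1", "S1E2", "S1E3"],
    ["S1E1", "S1E2", "Trailer"],
    ["S1E1", "Trailer"],
    ["S1E1", "S1E2", "S1E3"],
]

transitions = build_transition_counts(histories)
print(titles_to_precache(transitions, "S1E2"))
```

With only two cache slots, the next episode wins the first slot because it followed “S1E2” more often than anything else in the sample histories.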

Mohit Vohra, Director, Content Delivery, says:

“Maximizing caching efficiency at the closest possible locations translates to lesser network hops. Lesser network hops directly improve user streaming quality and also reduces the cost of transporting network content for both ISP networks and Netflix. Furthermore, maximizing caching efficiency makes responsible and efficient use of the internet.”

Predictive Content Delivery

No one loves the buffer wheel!

Internet video-on-demand streaming services are akin to tunnel delivery, where both the sending and the receiving devices/networks play a crucial role. Netflix invests heavily in state-of-the-art predictive modeling algorithms and technologies to optimize the user experience from both ends. However, some of the factors that usually impede Quality of Experience (QoE) at the user end are the poor quality of a subscriber’s Internet connection, the characteristics of the delivery network, the algorithms being run on the playing device, etc. To overcome these, Netflix uses historical user data pertaining to the network (bandwidth, round-trip time, stability, predictability) and the device (firmware updates, UI changes) to better characterize the user’s device and network, and adjusts accordingly.

Illustration of the video quality adaptation problem [Image Source]
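One common heuristic for this kind of video quality adaptation is to predict the next chunk’s throughput from recent measurements and pick the highest bitrate that fits comfortably under that prediction. The sketch below uses a harmonic mean as the predictor because it is pulled down by slow samples, which makes it conservative on unstable networks. The bitrate ladder, the `safety_factor`, and the function names are assumptions for illustration only, not Netflix’s actual player logic.

```python
from statistics import harmonic_mean

# Hypothetical bitrate ladder in kbps, lowest to highest quality.
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]

def predict_throughput_kbps(recent_samples_kbps):
    """Predict upcoming throughput from recent download measurements."""
    return harmonic_mean(recent_samples_kbps)

def choose_bitrate_kbps(recent_samples_kbps, safety_factor=0.8):
    """Pick the highest bitrate within a safety margin of the prediction."""
    budget = safety_factor * predict_throughput_kbps(recent_samples_kbps)
    affordable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    # Fall back to the lowest quality rather than stalling playback.
    return max(affordable) if affordable else BITRATE_LADDER_KBPS[0]

print(choose_bitrate_kbps([4000, 4000, 4000]))  # stable broadband
print(choose_bitrate_kbps([6000, 1000, 6000]))  # volatile cellular link
```

Note how a single slow sample on the volatile link drags the chosen bitrate down, even though two of the three measurements were fast.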

Netflix has 203.66 million subscribers worldwide, with 5,415 content titles in its US library alone. One of the major factors contributing to this success is Netflix’s use of its abundant data, combined with extensive use of predictive modeling techniques, to learn, predict, and improve its member experience continuously. The various QoE (Quality of Experience) metrics discussed above impact each other as well as member interaction.

To summarize, by using predictive modeling techniques, Netflix is able to:

  • Scale various processes to cope with the growing demand.
  • Re-allocate crucial resources to further improve the member experience.
  • Personalize content even in the face of poor-quality networks or devices (important when dealing with new and uncertain markets).
  • Capture and keep user attention for as long as possible by providing a highly addictive and hyper-personalized user experience.
  • And much more, which we might not know about.

So, the next time you wonder why Netflix continues to be on top or why you simply can’t stop watching it, recall this:

Netflix is continuously using data and algorithms to outspend and outproduce its competition, and the scale of hyper-personalization it provides to your viewing experience keeps on pumping those nasty endorphins!