Netflix Prize: Competitions provoke and inspire; prizes motivate. In this spirit, Reed Hastings, co-founder, president and CEO of Netflix, launched the Netflix Prize competition in October 2006.
The Orteig Prize inspired Charles Lindbergh to make his epochal solo flight across the Atlantic in 1927. The Kremer Prize for human-powered flight led to the Gossamer Albatross crossing the English Channel. The Ansari X Prize in 1994 marked the beginning of commercial innovation in reusable spacecraft and was spectacularly won in 2004. A similar aim, to inspire and even provoke, was behind the Netflix competition. According to Hastings, it was meant to promote and popularize research on the effectiveness of recommendation mechanisms.
Netflix tried not only to improve its recommendation system but also to inspire completely new approaches. That’s why the company released what was then the largest dataset of its kind ever made public: over 100 million ratings on a scale of 1 to 5, covering 17,000 movies and nearly half a million users. The challenge? Anyone who could predict customer ratings 10% better than Cinematch (Netflix’s own algorithm) would win a $1 million prize. Netflix realized that everything about its business got better as recommendations improved. Crowdsourcing in the form of a competition was a clever additional way to innovate in recommendations.
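To make the "10% better" target concrete, here is a minimal sketch. It assumes accuracy is measured by root-mean-square error (RMSE), the metric the contest used; the ratings and both sets of predictions below are invented toy data, not Netflix figures.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate's error undercuts the baseline's."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical toy data: true ratings, a baseline model, and a challenger.
actual = [4, 3, 5, 2, 1]
baseline = [3.0, 3.0, 3.0, 3.0, 3.0]   # always predicts the middle rating
candidate = [3.5, 3.0, 4.0, 2.5, 2.0]  # errors are half as large

gain = relative_improvement(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement over baseline: {gain:.0%}")
```

On real data the gains are far smaller than in this toy case, which is why a 10% threshold took years to reach.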
The Netflix Prize was a masterpiece of public relations and of promoting technical innovation. The competition attracted thousands of participants (scientists, entrepreneurs, industry specialists) from all over the world. Along the way, it changed the shape of the global conversation about big data and analytics.
“To understand the nature of the task,” explained social scientist Scott Page, “imagine a giant spreadsheet with a row for each person and a column for each movie. If every user rated every movie, the spreadsheet would contain over 8.5 billion ratings. The data comprised only 100 million ratings. Despite being such a huge amount, they fill less than 1.2% of the cells. If you opened this sheet in Excel, you would see mostly blanks. Computer scientists call this sparse data.”
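Page's arithmetic can be checked with a short sketch. The counts below are the rounded figures quoted in this article, and the dictionary-based store is only one hypothetical way to hold sparse ratings: it keeps the filled cells and nothing else.

```python
def density(num_users, num_items, num_ratings):
    """Fraction of the user-by-item grid that actually holds a rating."""
    return num_ratings / (num_users * num_items)


# Rounded figures from the text, not the exact dataset counts.
users, movies, ratings = 500_000, 17_000, 100_000_000

cells = users * movies
print(f"{cells:,} cells, {density(users, movies, ratings):.2%} filled")

# A sparse store keeps only the known cells, keyed by (user, movie).
sparse_ratings = {(0, 2): 5, (1, 0): 3}


def get_rating(store, user, movie):
    """Return the stored rating, or None for a blank cell."""
    return store.get((user, movie))
```

Storing only the known ratings is what makes a grid of this size practical to work with.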
The prize brief, Page noted, required predictive algorithms that could fill these gaps successfully. Participants were tasked with creating a new generation of systems, based on relationships between users, that offered innovative measures of similarity between people and between movies. The underlying assumption remained that similar people should rate the same movie similarly, and that each person should rate similar movies similarly.
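The assumption that similar people rate the same movie similarly can be sketched as a neighbour-based predictor. This is an illustrative toy, not any contestant's method: the users, ratings, and the choice of cosine similarity over co-rated movies are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        weight = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[movie]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


# Hypothetical toy ratings on the 1-to-5 scale.
ratings = {
    "ann": {"Spaceballs": 5, "Star Wars": 4},
    "bob": {"Spaceballs": 5, "Star Wars": 4, "Airplane!": 5},
    "cat": {"Spaceballs": 1, "Star Wars": 5, "Airplane!": 1},
}

print(predict(ratings, "ann", "Airplane!"))
```

Because raw ratings are all positive, cosine similarity stays high even for users who disagree, so the prediction is pulled only partway toward the closest neighbour. Subtracting each user's mean rating first is a common refinement.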
However, as Page rightly pointed out, “characterizing the similarity between people or movies involves difficult choices: Is Mel Brooks’s Spaceballs closer to the comedy Airplane! or to Star Wars, the film that Spaceballs parodies?” In the beginning, the film similarity measures devised by the participants emphasized features such as genre (comedy, drama, action), box office revenue, and external rankings. Some models tested for specific actors (was Morgan Freeman or Will Smith in the movie?) or for types of events such as gruesome deaths, car chases, and erotic scenes. Later models added data on the number of days between a movie’s release on DVD and the date it was rented. The search for and testing of novel features and unexpected correlations dominated the competition.
Page concluded that winning the competition required knowledge of the most important features of films, the available information about them, ways of representing film properties in machine-readable form, good mental models of how people rate films, the ability to develop rating prediction algorithms, and experience in combining different models into working ensembles. Netflix Prize winners would have to be both holistic creators and rigorous reductionists.
The competition unleashed huge amounts of creative energy and algorithmic insight. By the end of the first year, BellKor, a team from AT&T’s research labs, had pulled ahead of the rest. The team’s best single model used 50 variables per movie and improved on Cinematch by 6.58%. And then BellKor did a lot more: it combined 50 models into one giant ensemble and improved on Cinematch by 8.43%.
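Why combining models helps can be shown with a deliberately small sketch. It assumes an equal-weight average of two hypothetical models whose errors fall on different movies. BellKor's actual blending was far more elaborate, so treat this only as an illustration of the principle.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def blend(model_predictions, weights=None):
    """Weighted average of several models' predictions, item by item."""
    n_models = len(model_predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # assumption: equal weights
    return [
        sum(w * p for w, p in zip(weights, per_item))
        for per_item in zip(*model_predictions)
    ]


# Hypothetical toy data: the two models make mistakes on different movies.
actual = [4, 2, 5, 3]
model_a = [5, 2, 4, 3]
model_b = [3, 2, 5, 4]

blended = blend([model_a, model_b])
for name, preds in [("model A", model_a), ("model B", model_b), ("blend", blended)]:
    print(f"{name}: RMSE {rmse(preds, actual):.3f}")
```

The blend wins here because the two models' errors partly cancel. When models make the same mistakes, averaging them gains little, which is why divergent approaches were so valuable to combine.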
Recognizing the enhanced predictive power of innovative model blending, BellKor adopted a Game of Thrones-style strategy: it forged algorithmic alliances with key competitors. One fruitful alliance made it easier to combine divergent models, while another offered a fresh look at viewers’ behaviour. Almost three years in, after barely fending off the attacks of the rival Dinosaur Planet (really!), the combined team BellKor’s Pragmatic Chaos triumphantly broke the 10% threshold and won the million dollars. Game over.
“The drama of the contest was amazing,” Neil Hunt, then Netflix’s chief product officer, said at the awards ceremony. “There were a lot of teams in the beginning, and they got six, seven, sometimes eight per cent improvement. But then the pace of progress slowed down, and we got to the second year of this challenge. There was a long period of time when the participants were barely making any progress, and we thought our challenge might never be met. Later, some teams rightly saw that if they combined their approaches, they would get better results. For many people, this was quite counterintuitive [because usually the two most brilliant people are picked and told to come up with a solution]. […] And yet, the madness started all over again when the teams combined their algorithms in a certain way.”
The insight into combining methods turned out to be a masterful mathematical move, although it dashed initial hopes for a brilliant and quick solution to the problem. Reliable recommendation turned out to be a highly complex phenomenon. Both humans and machines need to find new ways to learn from each other, and from data, to significantly improve their ability to collaborate and anticipate what we really want.
The long tail of the competition
The success of the competition surprised both data scientists and digital innovators. Sharing datasets publicly to attract communities looking for new challenges has become a best practice. It seems significant that Kaggle – a popular machine learning competition site acquired by Google in 2017 – was launched one year after the Netflix Prize was awarded.
Just as Lindbergh’s solo flight across the Atlantic awakened the entrepreneurial spirit of commercial aviation, the Netflix Prize breakthrough ignited digital ambitions that helped transform the future of recommendation engines. Recommenders became less a research niche or digital subsystem and more an emerging ecosystem. In 2007, the Association for Computing Machinery (ACM) inaugurated the RecSys conference, a global gathering that brings together the authors of the best academic and industry research.
“I believe the Netflix Prize has proven to be extremely important,” said Joseph Konstan, an innovator at GroupLens. “It not only put the development of recommendation systems on the research map but also attracted the attention of many excellent scientists in the field of machine learning and data mining.”
The conclusions drawn from the competition also had an enormous impact. In a 2014 presentation summing up the five years since the prize was awarded, former Netflix algorithm specialist Xavier Amatriain and his colleagues set out detailed conclusions that changed the direction of innovation:
- Implicit feedback (clicks, impressions, average time spent on a page, and other measurable user behaviour) has proven to be better and more reliable at capturing preferences than explicit ratings. In other words, users’ actions reveal more than their ratings do. Improving the mechanisms for collecting implicit feedback is crucial to improving the effectiveness of recommendations.
- “Predicting ratings” is ultimately not the best way to formalize the recommendation problem. Using machine learning algorithms that produce personalized rankings is a more effective way to suggest the best items to users. Technically speaking, learned rankings can, for example, generate better recommendation lists than algorithms that predict user ratings. In this context, you need to think about managing a portfolio of recommendations.
- A balance between exploration and exploitation is important to engage the user. What combination of recommendations should arouse curiosity and further exploration, and what is considered a “sure thing”? Variety and novelty in a recommendation can be just as important as relevance.
- Recommendations are not only a two-dimensional problem of correlating users and items but a multi-dimensional space that includes contextual elements such as time of day, day of the week or physical location. Contextual recommendations have become the subject of research and investment.
- Users decide which items to choose based on how good they think the items are and on how those choices can engage or influence their social networks. Connections between users on Facebook, LinkedIn, and Gmail can be an excellent source of data for supplementing a recommendation system. Suggestions in social media can significantly influence recommendations on other platforms. For example, during the Netflix Prize competition, Facebook’s user base grew from 12 million to over 360 million. Access to the right parts of this data can significantly reduce the data sparsity problem.
- Transparency and accessibility in the user experience (UX) are important. Intelligent algorithms that reliably select the most context-relevant items for users are not sufficient. These items must be presented in a form that users can appreciate and put to use.
- Therefore, the reasons behind a recommendation must be easy to explain. Explaining recommendations usually makes it easier for users to make decisions, increasing conversion rates and leading to greater satisfaction and trust in the system. Providing explanations leads to better understanding and softens users’ reactions when they dislike a specific recommendation. Methods for automatically generating and presenting explanations on the system side have attracted considerable interest among researchers.
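The balance between exploration and exploitation mentioned in the list above can be sketched with an epsilon-greedy rule. This is one simple, well-known policy chosen for illustration, not something the presentation prescribes; the scores, the `epsilon` value, and the function name are assumptions.

```python
import random


def recommend(scores, epsilon=0.1, rng=random):
    """Return the top-scored item, or a random other item with probability epsilon."""
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores, key=scores.get)
    if rng.random() < epsilon:
        # Explore: surface something other than the "sure thing".
        others = [item for item in scores if item != best]
        if others:
            return rng.choice(others)
    # Exploit: fall back to the item with the highest predicted relevance.
    return best


# Hypothetical predicted relevance scores for one user.
scores = {"House of Cards": 0.92, "Mad Men": 0.81, "Sons of Anarchy": 0.40}

rng = random.Random(0)  # seeded so that runs are repeatable
picks = [recommend(scores, epsilon=0.2, rng=rng) for _ in range(10)]
print(picks)
```

A higher `epsilon` trades some immediate relevance for variety and for fresh data about items the system would otherwise never show.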
Listening to what serious researchers and investors in Silicon Valley or Shanghai say, it’s hard to overestimate the impact of the Netflix Prize on the development of data-driven recommendations and thus on the development of platforms and machine learning. Seemingly bizarre machine learning theories were stress-tested in real-world contexts. Discoveries that were once contradictory and controversial have become common wisdom. The competition helped create innovative teams of researchers and entrepreneurs.
The most ironic, even perverse, thing about the Netflix Prize is that the company ultimately did not use the winning algorithms to transform Cinematch. Times and technology changed rapidly. DVD rental was fading. Advances in cloud computing and greater on-demand bandwidth prompted Netflix to distribute movies online. Subscribers loved it, but the content delivery technologies, UX, and immediacy were radically different from before. The Cinematch criteria and queue management assumptions that had made Netflix’s recommendation-driven business model successful proved too anachronistic.
“When we were a company that sent DVDs by mail, and people gave us ratings, it expressed a certain thought process,” recalls Netflix’s Amatriain. “You added something to your queue because you wanted to watch it a few days later; the decision came at a cost, and the reward was delayed. With instant streaming, you start playing something, and if you don’t like it, you just switch to something else. Users don’t see the benefit of giving explicit feedback, so they invest less effort in it.”
His Netflix colleague Carlos Gomez-Uribe confirms this: “Tests have shown that predicted ratings aren’t really super helpful, unlike what you’re actually playing. We are moving from focusing solely on ratings and their prediction to relying on a more complex ecosystem of algorithms.”
A “complex ecosystem of algorithms” sounds a lot like an ensemble. What data do these ecosystems, or ensembles, of algorithms process to improve the quality of recommendations? Online, Netflix monitors when subscribers pause or rewind. It checks which days subscribers watch which content (the company found that people usually watch TV shows during the week and save movies for the weekend). It also tracks viewing date, time, location (by zip code), device and, of course, ratings. None of this behaviour could be monitored in the DVD world.
The ability to stream video to users’ devices increased the amount of implicit feedback. While the Netflix Prize results failed to refine or revolutionize the Cinematch of the past, they did help Netflix take early advantage of its new digital video-on-demand platform. The competition gave Netflix a clear and convincing argument for investing in the future of its recommendation system.
Moreover, a new wealth of customer information, such as which shows encourage binge-watching of multiple episodes, has allowed Netflix not only to recommend content but also to create it. Recommendations turned out to be a digital double-edged sword. The same data and algorithms that Netflix used to help users decide which shows to watch can be repurposed to help producers decide which shows to produce. Recommendation engines can be as valuable for creative development as for effective distribution.
The knowledge that gives you an advantage
In 2011, Netflix outbid HBO and AMC for the rights to produce the American version of the British TV hit House of Cards. Recommendation-based analytics were key: thanks to them, the company decided to stake $100 million on launching its own production business. Netflix’s recommendations had become risk management tools. According to the Kissmetrics blog, Netflix knew that many of its subscribers had watched The Social Network, David Fincher’s film about Facebook, from start to finish. Netflix knew the British House of Cards enjoyed considerable popularity. It also knew that fans of the British House of Cards watched films starring Kevin Spacey and/or directed by Fincher.
Overall, recommendations can tell Netflix’s creators which features are most likely to attract subscribers’ interest, strengthen their loyalty, and encourage binge-watching. “Because we have a direct relationship with consumers,” said one CEO, “we know what people like to watch, and that helps us understand how much interest there will be in a particular program. It gave us the confidence that we could find an audience for a show like House of Cards.”
Until the sexual misconduct scandal surrounding Spacey, House of Cards proved highly successful, both as a financial and as an artistic investment. However, Netflix’s “recommendation effect” goes well beyond analyzing original content: it permeates every acquisition the company considers. “Netflix is looking for the most efficient content,” said a former vice president of product engineering at Netflix. “Various complex metrics are used to measure member satisfaction. How much would it increase if Netflix licensed Mad Men, and how much if it added Sons of Anarchy? Efficient titles are those that offer maximum happiness per dollar spent.”
Without a recommendation infrastructure, these questions would be largely hypothetical. However, investment in organizational culture and innovation has yielded accurate, wise and profitable answers. TV and movie producers, networks, and distributors worldwide analyze Netflix data to make their own decisions. The company’s operating model changed Hollywood and its economics.