In the second installment of the classic “Back to the Future” trilogy, Biff Tannen becomes a wealthy and high-powered thug, thanks to a book he receives from his future self, who traveled back in time from 2015.
The book, "Grays Sports Almanac," contains the outcomes of major sporting events between the years of 1950 and 2000, which allows Tannen to make millions of dollars placing bets on horse races, football and baseball games, boxing matches, and more, and become powerful in an alternate version of 1985.
Bing Predicts, a statistical modeling tool from Microsoft, is no "Grays Sports Almanac," but short of time travel, it’s the next best thing.
Bing Predicts is run by a team of about a half dozen people out of Microsoft's Redmond, Washington, headquarters. It uses machine learning and analyzes big data on the web to predict the outcomes of reality TV shows, elections, sporting events, and more.
And it's gotten pretty good at it.
For the 2014 World Cup, Bing correctly predicted the outcomes of all 15 games in the knockout round. And it was more than 67% accurate in predicting games during the 2014 NFL season, correctly forecasting around Thanksgiving that the New England Patriots would win the Super Bowl.
Bing Predicts can make these forecasts thanks to how it harnesses big data. Last year, it accurately predicted not only the outcome of the Scottish independence referendum in the UK, but also the winner every single week during the 13th season of American Idol.
Bing, the second-largest search engine in the US in terms of queries, has access to a huge trove of information: there are over 3.6 billion searches on Bing per month in the US, according to comScore.
Searches can tell a lot about what people are thinking, but searches alone don’t tell the whole story.
Bing also has access to aggregate data of what people are actually doing online — what they’re clicking on, what they're reading and watching, how much time they’re spending on certain sites, and more.
“What people do on the web, what people say on social, speaks well to how they think, what they’re going to do,” Walter Sun, the principal applied science manager at Microsoft, told Tech Insider.
(Microsoft is quick to point out that all of the data the company looks at for Bing Predicts is anonymized and aggregate — it doesn’t look at a particular search connected to a single person.)
Elections and reality shows, which are based on popularity and voting, are one type of event Bing predicts. But the outcomes in sports are based on more than just popularity. After all, just because a certain football team is popular doesn’t mean it’s going to win.
Sun’s team addresses this by using a two-stage prediction model, which is essentially an algorithm, to predict NFL games.
The first stage is a traditional statistical model that’s based on historical data at a very granular level.
That is, it looks at a team’s record over the last few years, as well as margins of victories, scoring by quarters, player data (rushing yards and passing yards, for example), where games have been played, the type of surface they’re played on, whether the stadium is covered, the weather conditions, and so much more.
The model also takes into account changes to the lineup, such as a team losing a key player.
“We look at historical statistics and see which factors contribute to strong teams and which match-ups are favorable and unfavorable for a team,” Sun wrote in a blog post earlier this year.
But analyzing these variables is only the first stage of the problem.
The second stage involves tapping into the social web. Sun’s team analyzes what people are saying online about teams, players, and games. The model can take into account public conversations happening on Facebook, Twitter, YouTube, and in forums, which help it predict with more accuracy. This part of the model picks up on what people are saying about injuries, suspensions, changes to the team’s lineup, controversies, and more.
The model might find that people in the New York area are worried about a snowstorm coming, and it can pick up on what people are saying about how well — or how poorly — a player performs in the cold, Sun said.
Sun and his team found that analyzing this so-called “wisdom of the crowd” actually increases the accuracy of Bing's predictions by 5%, a significant margin.
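To make the two-stage idea concrete, here is a minimal Python sketch. It is an illustration only, not Microsoft's actual model: the feature names, the weights, and the `social_signal` score are all invented for the example, and combining the two stages by shifting the log-odds is an assumption about how such a model could work.

```python
import math


def sigmoid(x):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of sigmoid: turn a probability into log-odds."""
    return math.log(p / (1.0 - p))


def stage_one(features, weights, bias=0.0):
    """Stage 1: win probability from historical statistics alone."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(score)


def stage_two(base_prob, social_signal, social_weight=1.0):
    """Stage 2: nudge the stage-1 estimate using a web/social signal.

    social_signal is a hypothetical score: negative for bad news
    (say, chatter about an injured star), positive for good news.
    """
    return sigmoid(logit(base_prob) + social_weight * social_signal)


# Made-up inputs for one team in one game (assumptions, not real data).
features = {"win_rate_edge": 0.2, "home_field": 1.0}
weights = {"win_rate_edge": 2.0, "home_field": 0.3}

base = stage_one(features, weights)
adjusted = stage_two(base, social_signal=-0.8)  # key player reported out
print(f"stats only: {base:.2f}, with social signal: {adjusted:.2f}")
```

In this toy setup, a negative social signal pulls the stage-one probability down, while a signal of zero leaves it unchanged.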
For example, just before Germany and Brazil met in the semifinals of the 2014 World Cup, Brazil lost two key players — one due to an injury, and another because of yellow cards.
“The web and social models picked that up, and we picked Germany” to win as a result, Sun told Tech Insider.
Some variables, or features, are more important than others, so the model assigns different weights to different features. Losing a key player, for example, has a much bigger effect on the outcome of a game than the type of surface a team plays on.
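As a rough illustration of how a model can work out such weights for itself, the sketch below fits a small logistic regression with plain gradient descent. Logistic regression is a stand-in chosen for simplicity (the article does not say which algorithm Bing uses), and the eight "games" are invented so that losing a key player matters while the playing surface does not.

```python
import math


def sigmoid(x):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def train_logistic(rows, labels, lr=0.5, steps=5000):
    """Fit logistic-regression weights and bias with batch gradient descent."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented toy games: [key_player_lost, grass_surface] -> 1 if the team won.
rows = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

weights, bias = train_logistic(rows, labels)
key_player_weight, surface_weight = weights
print(f"key player lost: {key_player_weight:.2f}, surface: {surface_weight:.2f}")
```

On this toy data, the learned weight for a lost key player comes out strongly negative while the surface weight stays close to zero, with no one having told the model which feature matters more.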
"The beauty of machines and models is that as long as we have the right features, the machine will tell you what the weights are for each category," Sun said.
And one feature can have different meanings. A high number of passing yards suggests that a team will likely win, as it signifies a high number of completed passes. But it's not that simple.
"Our model might learn that low passing yards means you’ll probably lose, but you may find out that very very high passing yards means you actually lose because you’re behind in the third or fourth quarter and you’re throwing the ball a lot," Sun explained.
In all, the Bing Predicts model considers hundreds of these different signals, or data points, for each event, like an election or game, Sun said.
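The passing-yards point can be illustrated with a short sketch: a single "more is better" weight cannot capture a signal that helps up to a point and then hurts, but splitting it into ranges can. The cutoffs (200 and 350 yards) and the game records below are invented for illustration and are not drawn from Bing's data.

```python
from collections import defaultdict


def yards_bucket(passing_yards):
    """Sort a passing-yards total into a coarse range (hypothetical cutoffs)."""
    if passing_yards < 200:
        return "low"
    if passing_yards < 350:
        return "medium"
    return "very_high"


def win_rate_by_bucket(games):
    """Return the share of games won within each passing-yards range."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [wins, games played]
    for yards, won in games:
        bucket = yards_bucket(yards)
        totals[bucket][0] += int(won)
        totals[bucket][1] += 1
    return {bucket: wins / played for bucket, (wins, played) in totals.items()}


# Invented (passing_yards, won) records, not real NFL results.
games = [
    (150, False), (180, False), (190, True),
    (240, True), (280, True), (310, False), (330, True),
    (380, False), (410, False), (450, True),
]

rates = win_rate_by_bucket(games)
for bucket in ("low", "medium", "very_high"):
    print(f"{bucket}: {rates[bucket]:.2f}")
```

Treating each range as its own feature lets a model give the middle range a positive weight and both extremes a negative one.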
So far this year, Bing is about 60% accurate in predicting NFL matchups.
And with each new season, and even as a season progresses, the model gathers more data, so Bing can improve its accuracy.
Maybe in a few years it will be as accurate as the elusive "Grays Sports Almanac."