[Interactive Map] 2016 Election Forecast Using Social Data

September 12, 2016

At Networked Insights, we spend all day helping our clients gain deep insights from social data, and much of the conversation in 2016 has been dedicated to the Presidential election.

This got us thinking: could social data prove more predictive than the polling methods used by political data powerhouses like RealClearPolitics and FiveThirtyEight? In an election cycle with such an outsized amount of highly emotional chatter around a single polarizing candidate (Donald Trump), would it even be possible to forecast the election without totally skewed results?

We think so, and we gave it a shot.

First, using our analytics platform Kairos, we built four metrics: Awareness, Positivity, Negativity and Intent. Of these, only Negativity and Intent proved valuable in predicting elections. Both are natural language processing classifiers that take advantage of sentence structure as well as keyword matching.

Then we modeled the data against survey polls and primary results to obtain weights of influence for each of the social indices. Finally, we use those weights to keep predicting the state elections as new data comes in, and we will update the forecast on a weekly basis. The map below plots the expected spread (1.3 means that Clinton is predicted to win the popular vote by 1.3 points), color codes each state by party, and computes the expected electoral votes.
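To make the modeling step concrete, here is a minimal sketch of one way it could work. It is not the Kairos implementation: it assumes a plain least-squares regression of poll spreads on the two social indices, followed by a winner-take-all electoral tally. The feature values, spreads, and function names are all hypothetical.

```python
import numpy as np

# Hypothetical training data, one row per state (values are made up).
# Columns: Negativity index and Intent index, each as Clinton minus Trump.
X = np.array([
    [-0.10, 0.05],
    [0.02, -0.03],
    [-0.04, 0.08],
    [0.06, -0.01],
])
# Hypothetical poll spreads for the same states (Clinton minus Trump, in points).
y = np.array([6.0, -3.0, 7.5, -4.0])


def fit_weights(features, spreads):
    """Ordinary least squares with an intercept.

    Returns [intercept, weight_negativity, weight_intent].
    """
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, spreads, rcond=None)
    return weights


def predict_spread(weights, features):
    """Apply fitted weights to new index values to get expected spreads."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ weights


def electoral_votes(spreads, votes):
    """Winner-take-all tally: a positive spread goes to Clinton, otherwise Trump.

    Simplification: Maine and Nebraska can split their votes, which is ignored here.
    """
    clinton = sum(v for s, v in zip(spreads, votes) if s > 0)
    trump = sum(v for s, v in zip(spreads, votes) if s <= 0)
    return clinton, trump


weights = fit_weights(X, y)
predicted = predict_spread(weights, X)
```

Each week, the same weights would be applied to fresh index values with `predict_spread`, and `electoral_votes` would turn the resulting state spreads into a projected total.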

See the results in this interactive map:

Having trouble viewing the map? Access it directly here.

When you hover over a state, you will find its rankings for each metric; the lower the rank number, the higher that state's percentage compared to other states. For example, a rank of 3rd in Trump Intent means that only two other states have a higher percentage of intentful conversations, while a rank of 1st in Trump Negativity means that state has the highest percentage of negative conversations about Trump.
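The ranking convention can be sketched in a few lines. The state labels and percentages below are hypothetical, and breaking ties by input order is an assumption rather than something the map documents.

```python
def rank_states(percent_by_state):
    """Return {state: rank}, where rank 1 is the state with the highest percentage.

    Ties keep their input order (an assumption for this sketch).
    """
    ordered = sorted(percent_by_state, key=percent_by_state.get, reverse=True)
    return {state: position + 1 for position, state in enumerate(ordered)}


# Hypothetical share of intentful Trump conversations per state.
trump_intent = {"State A": 0.12, "State B": 0.30, "State C": 0.21, "State D": 0.25}
ranks = rank_states(trump_intent)
```

Here "State C" ranks 3rd because exactly two states, B and D, have a higher percentage.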

Some surprising takeaways emerge from this data:

  • Pure conversation volume is not predictive. In states like New York and Vermont, where Clinton holds a commanding lead, Trump still drives the majority of conversation, owning 80% of conversations, as opposed to Clinton’s 20%.
  • Negativity was a fairly reliable indicator of intent.
  • While the methodology was entirely different, using social data to predict intent produced results that were surprisingly close to polling data.
  • While this model, using the most recent data, has Clinton winning the electoral vote 300 to 255, it has Ohio and Florida leaning towards Trump, and South Carolina leaning towards Clinton. The Florida results in particular reflect a difference of 3.7 points in Trump’s favor.

Could this data possibly be better than traditional polling? We think so, and here’s why:

  • Social data samples are huge, with thousands of people talking about the election every day.
  • People voluntarily express emotions about candidates on social media and use language that implies intent. That language is often full of expletives and plenty of unintelligible verbiage, so you need to be able to clean up the data to make it useful.
  • The research community has already found some success using social data as a predictive indicator of electoral success, but hasn’t yet had the tools or methodology to accomplish the task with an acceptable level of accuracy. Social data is demographically biased, a campaign’s promotional frequency might affect results, and obtaining an accurate signal about emotions from free text is not easy.

This is where Networked Insights comes in. Our platform, Kairos, automatically cleans up “dirty” data to improve the accuracy of results. It is also particularly good at uncovering the implicit meaning within comments. For example, when someone says “I’d love to give Trump a piece of my mind,” our technology can tell that they don’t love Trump (and thus don’t have a positive sentiment towards him). Rather than relying on keywords alone, we use over 25,000 different classifiers to deeply understand both the emotions and the potential intent behind a post.

What we hope all this means is that this new model of leveraging social data to predict elections more accurately has some legs. We’ll find out soon enough.

Next week we will also look at what changed and highlight the top topics for each candidate, to start understanding why the overall forecast or a specific state’s forecast changed.