Big Data Informs a Paradigm Shift in Political Analysis

Nov. 22, 2016

Do you know whether your neighbors are voting in an election? Are they Democrats or Republicans? As districts are drawn and redrawn (either to make them more or less partisan, depending on whom you ask), it used to be impossible to track definitively whether election patterns changed accordingly. With the help of PICSciE high-performance computing (HPC) resources, Kosuke Imai, Professor in the Department of Politics and the Center for Machine Learning, is changing the game in American political analysis. His work is informing a new, incredibly granular understanding of neighborhood voting patterns nationwide.

Imai’s research centers on a database containing detailed information about 180 million American voters, refreshed every six months. The data were made available by L2, a leading nonpartisan firm and the oldest organization in the United States that supplies voter data and related technology to candidates, political parties, pollsters, and consultants for use in campaigns. By merging this data set with census and other sources, Imai has developed a methodology for examining in fine detail how local voting patterns change over time. “What’s exciting about this database is that it basically has everyone in the United States who is a registered voter as well as some of those who are eligible but not registered,” Imai explains. “You can look at specific regions, even small geographical neighborhoods, and see how their voting pattern changes over time.”

To achieve this, Imai and his team have developed new computational methodologies that uncover the patterns hidden in vast pools of information. “The United States is very unique in that they provide the list of registered voters, and they also provide information on who votes,” Imai explains. Employing the Della and Big Data HPC clusters in the PICSciE and Research Computing Lab, Imai can cross-reference voting information with data from a wide range of additional sources. In so doing, he brings to light insights that would otherwise be obscured, such as the potential impact of a specific redistricting plan.

“We have basically every voter located on the map,” Imai says. “When we draw different boundaries for redistricting, we know exactly which voter will be contained in which district. As a result, we can predict what the election outcome will be on the different plans.”
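To make the idea concrete, here is a minimal Python sketch of how a plan can be scored once every voter is located. It is an illustration under stated assumptions, not Imai's method: each voter is assumed to be already geocoded to a census block, a redistricting plan is represented as a hypothetical block-to-district mapping, and party registration stands in for vote choice. Real analyses would also have to model turnout and candidate choice, which are not observed for individual voters. The names `VOTERS`, `PLAN_A`, `PLAN_B`, `tally_plan`, and `predicted_winners` are all made up for this example.

```python
from collections import Counter, defaultdict

# Toy data (hypothetical): (voter_id, census_block, registered_party).
VOTERS = [
    ("v1", "B1", "D"), ("v2", "B1", "D"),
    ("v3", "B2", "D"), ("v4", "B2", "R"),
    ("v5", "B3", "R"), ("v6", "B3", "R"),
    ("v7", "B4", "R"),
    ("v8", "B5", "R"),
    ("v9", "B6", "D"),
]

# Two hypothetical plans over the same blocks: census_block -> district number.
PLAN_A = {"B1": 1, "B6": 1, "B2": 2, "B4": 2, "B3": 3, "B5": 3}
PLAN_B = {"B1": 1, "B4": 1, "B2": 2, "B6": 2, "B3": 3, "B5": 3}


def tally_plan(voters, plan):
    """Count registered voters by party in each district of a plan."""
    counts = defaultdict(Counter)
    for _voter_id, block, party in voters:
        counts[plan[block]][party] += 1
    return counts


def predicted_winners(voters, plan):
    """Naive forecast: the party with the most registrants wins each district."""
    winners = {}
    for district, counts in sorted(tally_plan(voters, plan).items()):
        ranked = counts.most_common()  # sorted by count, largest first
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            winners[district] = "tie"
        else:
            winners[district] = ranked[0][0]
    return winners


if __name__ == "__main__":
    for name, plan in [("plan_a", PLAN_A), ("plan_b", PLAN_B)]:
        print(name, predicted_winners(VOTERS, plan))
```

With the same nine toy voters (four registered Democrats, five registered Republicans), `PLAN_A` yields one Democratic and two Republican districts, while `PLAN_B` yields two Democratic and one Republican district. Because each voter's block is known, swapping a single block between districts is enough to recompute the forecast.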

Via groundbreaking machine-learning and other computational methods, Imai and his research team have been able to address the methodological problems that typically hide the relationship between political outcomes and neighborhood locations, racial backgrounds, and other factors. While the specific candidate choices of individual voters remain unknown, Imai can trace behaviors down to the neighborhood level.

“It’s not a straightforward thing to do when you have 180 million observations in each database,” Imai explains, noting the complications that arise as voters move or change names, among other issues. “There are statistical methods that I have been working on for the matching of the voters probabilistically. Machine learning is very good at extracting the strong, robust patterns that exist, which might be hard to see in a large database like this.”
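The general idea of matching voters probabilistically can be illustrated with a Fellegi-Sunter-style score, a standard approach to probabilistic record linkage. This sketch is not presented as Imai's own method. The field names and the m- and u-probabilities below are made-up values. In practice they would be estimated from the data (for example with an EM algorithm), and candidate pairs would first be narrowed by blocking. The score also assumes that field agreements are conditionally independent given match status.

```python
import math

# Hypothetical m-probabilities: P(field agrees | the two records are the same person).
M_PROB = {"first": 0.95, "last": 0.90, "birth_year": 0.98, "zip": 0.80}
# Hypothetical u-probabilities: P(field agrees | the records are different people).
U_PROB = {"first": 0.02, "last": 0.01, "birth_year": 0.05, "zip": 0.10}


def match_weight(rec_a, rec_b):
    """Fellegi-Sunter-style score: sum of per-field log-likelihood ratios."""
    total = 0.0
    for field, m in M_PROB.items():
        u = U_PROB[field]
        if rec_a[field] == rec_b[field]:
            total += math.log(m / u)  # agreement is evidence for a match
        else:
            total += math.log((1 - m) / (1 - u))  # disagreement is evidence against
    return total


def match_probability(rec_a, rec_b, prior=1e-4):
    """Posterior match probability: prior log-odds plus the log-likelihood ratio."""
    log_odds = math.log(prior / (1 - prior)) + match_weight(rec_a, rec_b)
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    old = {"first": "Ana", "last": "Lee", "birth_year": 1980, "zip": "08540"}
    moved = dict(old, zip="10001")    # same person after a move
    renamed = dict(old, last="Park")  # same person after a name change
    stranger = {"first": "Bo", "last": "Kim", "birth_year": 1975, "zip": "10001"}
    for label, rec in [("moved", moved), ("renamed", renamed), ("stranger", stranger)]:
        print(label, round(match_probability(old, rec), 4))
```

With these invented parameters, a record that differs only in ZIP code (a move) scores higher than one that differs only in last name, and both score far above an unrelated record. Because the posterior depends heavily on the prior and on the m/u values, real systems estimate those quantities from the data rather than fixing them by hand.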

The implications of this fine-grained analysis are far-reaching. Not only can one accurately predict the effects of redistricting; one can also trace everything from the effect of political residential segregation on the fates of extreme political candidates to the realities of polarization along racial lines.

Imai envisions this work contributing, in part, to more realistic political plans. In the meantime, he and his team are working on a number of ambitious projects, including one analyzing half a billion international trade records and another tracing the evolution of legislative bills by analyzing digitized text spanning 20 years and 50 states. “If you give research assistants a million documents, there’s no way that they can compare all these different bills,” Imai says. “With machine learning methods, that’s possible.” Imai is also working with Brandon Stewart, an assistant professor in the Department of Sociology, on a social-science research interface that would allow researchers to see the connections between data sets automatically.

An advocate for the power of big-data analysis, Imai sees it as no less than transformative, and predicts increasing focus on high-performance computing as part of the social-science curriculum. “There are a lot of data about social and human behavior,” Imai says. “Data has really revolutionized the way the field is studied.”

Learn more about PICSciE’s HPC resources, and visit Professor Imai’s website to explore his work in more detail.