Text Mining Analysis: A New Tool for a New Type of Research

The drive to develop policies aimed at countering and preventing violent extremism has generated reams of commentary from think tanks, NGOs, and leaders in capitals from Bamako to Brussels. The glut of white papers, policy manifestos, and literature reviews that has accompanied the rise of violent extremist groups like Da’esh and al-Qaeda over the last decade is astounding. Information overload is not uncommon. But what do we really know about the current state of peer-reviewed research on violent extremism and violent social movements? What exactly is the state of the evidence base for theories about the drivers of extremism? Chances are big data can tell us a lot more than we think we already know.

The RESOLVE Network has launched a text-mining experiment to unearth answers to these questions. In collaboration with Stability Analytics Inc., a San Diego-based research company, we have compiled and analyzed a corpus of 3,147 academic articles, totaling 14,224,399 words, from 426 different journals. (See a partial list of peer-reviewed journals used for our text-mining analysis here.) Our approach to research on violent extremism taps into natural language processing tools and text-mining techniques to generate fresh insights into the trends, gaps, and methodological approaches that shape scholarly research into violent extremism.
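To make the corpus figures above concrete, here is a minimal sketch of how article, word, and journal counts could be tallied. It is not the pipeline used for this project: the record layout (`journal`, `text`), the toy articles, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy records standing in for corpus entries; the real corpus
# and its schema are not described in this post.
articles = [
    {"journal": "Journal A", "text": "Violent extremism research is growing."},
    {"journal": "Journal B", "text": "Social movements and political violence."},
    {"journal": "Journal A", "text": "Extremism, radicalization, and social cohesion."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def corpus_summary(records):
    """Return article, word, and distinct-journal counts plus term frequencies."""
    term_counts = Counter()
    journals = set()
    for record in records:
        term_counts.update(tokenize(record["text"]))
        journals.add(record["journal"])
    return {
        "articles": len(records),
        "words": sum(term_counts.values()),
        "journals": len(journals),
        "term_counts": term_counts,
    }


summary = corpus_summary(articles)
print(summary["articles"], summary["words"], summary["journals"])
```

The same term counts are a natural starting point for spotting which concepts dominate a body of literature, though a real analysis would also need to handle stop words, stemming, and multi-word phrases.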

Empirical analysis of the phenomenon ranges widely across disciplines. For much of the last 20 years, political scientists have dominated research in this area, deploying a mix of quantitative and qualitative methods to explore the complex dynamics that give rise to and sustain violent social movements. More recently, anthropologists, sociologists, and social psychologists have also advanced dozens of new ideas around connections between extremism, human evolution, social cohesion, and the neuroscience of moral judgment. There are multiple overlapping theories about how different factors play into extremist beliefs and sectarian violence. The question is: which theories are the most salient for analyzing the rapidly evolving threat of violent extremism? Which scholars move trend lines? Which ideas have endured the test of time through strength of evidence rather than strength of opinion?

The volume of literature on violent extremism and related areas of research, and the rate at which it is growing, make it next to impossible for researchers to keep up. Old-fashioned manual literature reviews can only take us so far. Real-time understanding of this rapidly evolving field demands innovative collection and analysis techniques. Automated and semi-automated methods of collecting, organizing, and understanding emerging research can reveal patterns and connections that are otherwise undetectable to the human eye.

The objective of our research project is to automate existing literature review tasks and to provide novel capabilities through large-scale text mining. Our data-crunching experiment will systematically collect and visualize facts and details on the impact factor of prominent theories, scholars, and methods used to study violent extremism. It will also seek to identify gaps in geographical coverage in areas of the world where research on the dynamics of violent social movements is especially thin.

Applications of text-mining analysis include those conducted by the Open Syllabus Project, which aggregated and assigned relationships to over 900,000 documents sourced from academic institutions around the world. The most taught text across subjects? “The Elements of Style” by William Strunk.

Another example comes from a Columbia University text-mining analysis of State of the Union addresses, from George Washington to Barack Obama. The goal of the undertaking was to understand how discourse around government obligations and activities has changed throughout the course of US history. The result of the effort was the creation of a “semantic network based on the occurrence of frequently used terms in each epoch.” One of the most continuous topics? The ongoing conversation on abortion.
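For readers curious what a network built from frequently used terms might look like in code, here is a minimal sketch. It is not the Columbia team's method: it simply counts how often pairs of frequent terms appear in the same document. The toy documents, the `min_doc_freq` threshold, and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized documents for illustration only.
documents = [
    ["government", "war", "economy"],
    ["government", "economy", "trade"],
    ["war", "peace", "government"],
]


def cooccurrence_edges(docs, min_doc_freq=2):
    """Count how often pairs of frequent terms appear in the same document."""
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    frequent = {term for term, n in doc_freq.items() if n >= min_doc_freq}

    edges = Counter()
    for doc in docs:
        # Sort so each pair is always stored in the same (a, b) order.
        terms = sorted(set(doc) & frequent)
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(documents)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each pair is an edge in the network and its count is the edge weight. Running the same function separately on the documents from each time period would give one network per epoch to compare.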

Over the next several months we will work to develop visualizations of the relationships between concepts and disciplines, and to see which themes and concepts are most frequently cited and considered authoritative. If this initial foray into machine learning applications proves fruitful, there are other approaches we may apply to the corpus for additional layers of analysis at a later date.

Interested in learning more about text-mining and natural language applications or want to contribute to our growing library? Email us at research@resolvenet.org.