New Frontiers in Migration Issues: A Data Science Perspective

Paper Session

Sunday, Jan. 3, 2021 3:45 PM - 5:45 PM (EST)

Hosted By: International Trade and Finance Association
  • Chair: Thierry Warin, HEC Montréal

Scalable Analysis of Political Text Using Machine Learning

Kenneth Benoit, London School of Economics and Australian National University
Patrick Chester, New York University
Michael Laver, New York University
Stefan Müller, University of Zurich

Abstract

Estimating policy positions from political text is a core element in many empirical analyses of political competition. This has traditionally been achieved using classical content analysis, which requires (costly) human experts to read and make judgements about all text in some corpus. Benoit et al. (2016) showed that crowd workers can label political texts as effectively as experts, but much faster and more cheaply. However, crowdsourced text analysis still requires judgements about every sentence in every text by multiple crowd workers, limiting its scalability to large text corpora. Unsupervised machine learning requires human “curation” of texts based on policy content, to allow ex-post human interpretation of results. Supervised machine learning methods, in contrast, leverage a relatively small training set of text labelled by humans, whether experts or crowd workers, to analyze a potentially huge volume of text out of sample, making this a much more scalable research tool. In this paper, we evaluate the effectiveness of different supervised machine learning algorithms, trained on such human-labelled sets, in analyzing both party manifestos and legislative speeches. We first replicate a widely used left-right scale derived from classical text analysis by human experts. We then exploit the flexibility of crowdsourced labels to estimate “new” policy dimensions. Our results are encouraging, suggesting that supervised machine learning based on limited training data is a viable, fast, cheap and scalable method for analyzing large political text corpora out of sample.
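The abstract does not name the algorithms evaluated, so the following is only a minimal sketch of the general workflow it describes: fit a classifier on a small human-labelled training set, then label unseen text out of sample. It assumes a multinomial Naive Bayes model with add-one smoothing, written from scratch; the training sentences, the "left"/"right" labels, and the function names are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labelled):
    """Count word and label frequencies from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for sentence, label in labelled:
        label_counts[label] += 1
        for tok in tokenize(sentence):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab


def predict(model, sentence):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(sentence):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled training sentences (illustration only).
TRAIN = [
    ("raise taxes on the wealthy to fund public services", "left"),
    ("expand welfare and public healthcare", "left"),
    ("cut taxes and reduce government spending", "right"),
    ("deregulate markets and privatise state industries", "right"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "we will expand public healthcare"))
print(predict(model, "reduce government spending and cut taxes"))
```

In practice the training set would hold expert or crowd labels for many sentences, and predictions would be aggregated per manifesto or speech to obtain a position estimate.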

Immigration-Related Discourses: Using Unstructured Data to Understand Perceptions of Facts and Their Evolution

Benoit Aubert, HEC Montréal
Jane Li, Victoria University of Wellington
Markus Luczak-Roesch, Victoria University of Wellington
Thierry Warin, HEC Montréal

Abstract

Increasingly, public discourse seems to ignore the facts. A parallel reality is created in which opinions are reinforced by sharing them with like-minded individuals or groups. We seem to avoid the confrontation of contradictory ideas. Yet the confrontation of ideas is essential to build a common understanding of a given situation, and this understanding is essential for policy makers. Any public policy or measure must be understood by citizens and businesses to be accepted. If the perception of the facts is wrong, it will be difficult for citizens to understand why a proposed policy is adequate. The paper aims to uncover the influences behind a group's perceptions when those perceptions seem disconnected from reality or facts, and it provides a better understanding of perceptions that are created within subgroups. In addition, the paper seeks to understand the evolution of these perceptions over time, analyzing their genesis, patterns, influences, and sources. Immigration is used as a topic to understand how discourse is created, how conversation evolves, and how facts or actions influence perceptions. Several events have shaped discussions about immigration and its impact, and several parallel discussions on immigration have taken place, making it a rich topic of study.

An NLP Perspective on the Refugee Crisis in Europe: The (Weak) Connection Between Political Media and Twitter Activity

Aleksandar Stojkov, Saints Cyril and Methodius University of Skopje-Macedonia

Abstract

Social networks have become the subject of many studies by social scientists. The fact that millions of people write, share and comment is interesting in itself. Indeed, writing, sharing and commenting are the three essential elements of a conversation. As such, a conversation provides interesting information about people's feelings, attitudes and behavior. The main rationale behind analyses based on social media and Twitter relies on the "wisdom of the crowds" effect: the assumption that the aggregated judgment of several people is often better than the judgment of experts or the smartest forecaster (Hogarth 1978). In this case study, we attempt to map the conversations on Twitter about the European refugee crisis. Both the data (the content of the tweets) and the metadata are of interest. The content allows us to perform a sentiment analysis, and thus to map positive and negative comments about the refugees. With the metadata, we can, for instance, map where the tweets originate based on their latitude and longitude, adding a spatial dimension to the conversations. We also join a set of additional attributes to the data and metadata of the tweets, such as the number of refugees in a country and the routes from their origin country to Europe.
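The abstract does not specify the sentiment method or the mapping tools used. As a rough sketch of the two steps it describes (scoring tweet content, then placing the scores by latitude and longitude), the example below assumes a tiny word-list sentiment score and a one-degree grid. The lexicon, the tweet records, and the function names are hypothetical and serve only as illustration.

```python
import math
import string
from collections import defaultdict

# Hypothetical mini-lexicon; a real study would use an established resource.
POSITIVE = {"welcome", "help", "support", "safe"}
NEGATIVE = {"crisis", "fear", "danger", "illegal"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


def mean_sentiment_by_cell(tweets, cell_size=1.0):
    """Average the sentiment of tweets falling in each lat/lon grid cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [score sum, tweet count]
    for tweet in tweets:
        cell = (
            math.floor(tweet["lat"] / cell_size) * cell_size,
            math.floor(tweet["lon"] / cell_size) * cell_size,
        )
        totals[cell][0] += sentiment_score(tweet["text"])
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Hypothetical geotagged tweets (made-up text and coordinates).
TWEETS = [
    {"text": "Refugees welcome, we will help!", "lat": 52.52, "lon": 13.40},
    {"text": "Support safe routes", "lat": 52.10, "lon": 13.90},
    {"text": "This crisis brings fear", "lat": 41.99, "lon": 21.43},
]

for cell, mean in sorted(mean_sentiment_by_cell(TWEETS).items()):
    print(cell, mean)
```

Country-level attributes such as refugee counts could then be joined to each cell or country key in the same way as any other tabular merge.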

Regional Migration in China: A Machine Learning Approach to the Hukou System

Marta Bengoa, City University of New York
Thierry Warin, HEC Montréal

Abstract

We use the latest RUMiC survey, which provides socioeconomic indicators such as education, income, ethnicity, and hukou registration, together with data on health indicators and outcomes. Based on this survey, we propose a machine learning protocol to extract a causal model that highlights the relevant and significant features, and their sequencing, in explaining health outcomes for regional migrants in China.

Discussant(s)

Thierry Warin, HEC Montréal
Kenneth Benoit, London School of Economics and Australian National University
Jane Li, Victoria University of Wellington
Aleksandar Stojkov, Ss. Cyril and Methodius University

JEL Classifications
  • F2 - International Factor Movements and International Business
  • R2 - Household Analysis