Distributed Supervised Sentiment Analysis of Tweets: Integrating Machine Learning and Streaming Analytics for Big Data Challenges in Communication and Audience Research

Authors

  • Carlos Arcila Calderón
  • Félix Ortega Mohedano
  • Mateo Álvarez
  • Miguel Vicente Mariño

DOI:

https://doi.org/10.5944/empiria.42.2019.23254

Keywords:

sentiment analysis, Twitter, big data, streaming, machine learning, communication and audience research, Apache Spark

Abstract

The large-scale analysis of tweets in real time using supervised sentiment analysis represents a unique opportunity for communication and audience research. Bringing together machine learning and streaming analytics in a distributed environment may help scholars obtain valuable data from Twitter and immediately classify messages according to their context, with no restrictions of time or storage, enriching cross-sectional, longitudinal and experimental designs with new inputs. Although communication and audience researchers have begun to use computational methods, most remain unfamiliar with the distributed technologies needed to face big data challenges. This paper describes the implementation of parallelized machine learning methods in Apache Spark to predict sentiment in real-time tweets and explains how this process can be scaled up using academic or commercial distributed computing when personal computers cannot support the required computation and storage. We discuss the limitations of these methods and their implications for communication, audience and media studies.

Published

2019-01-15

How to Cite

Arcila Calderón, C., Ortega Mohedano, F., Álvarez, M., & Vicente Mariño, M. (2019). Distributed Supervised Sentiment Analysis of Tweets: Integrating Machine Learning and Streaming Analytics for Big Data Challenges in Communication and Audience Research. Empiria. Revista de metodología de ciencias sociales, (42), 113–136. https://doi.org/10.5944/empiria.42.2019.23254