Master's Thesis

Modeling addiction and disease epidemiology using social media

Social media has generated an explosion of detailed behavioral data. These novel data sources provide an exceptional opportunity to devise computational methods for studying and characterizing drug use and recovery at an unprecedented scale. In this thesis, we describe how computational methods encompassing data mining, machine learning, natural language processing, causal analysis, and social network analysis can leverage social media data to understand drug use and recovery. In particular, we report the following results: (1) We employ structural equation modeling to quantify how emotional distress, physical pain, relationships, and self-development are associated with addiction recovery behavior. (2) We demonstrate how recurrent neural networks that use word embeddings and other textual features can identify and predict the stages of opioid use and recovery. (3) We describe how computational models can predict addiction recovery inclinations using propensity score matching and a logistic regression classification model. (4) Finally, we present an open-source web application that analyzes social media posts to identify individuals open to addiction recovery intervention and to characterize drug use at the individual and population levels.