Project

Pandemic disease monitoring tool using Twitter data

Project (M.S., Computer Science)--California State University, Sacramento, 2018.

For any statistical study, from predicting the winner of an election to monitoring health statistics, a large amount of real-time data is necessary. Twitter data has been used extensively in recent years for such statistical studies, and news spreads on Twitter remarkably fast. When people tweet about the symptoms of a disease at a location, these tweets can help health care professionals act appropriately and issue warnings about the likelihood of the disease becoming widespread. In this project, Twitter data is collected for a few pandemic diseases, specifically Common Cold, Influenza-like Illness with fever (ILIFever) and without fever (ILIECDC), Allergy, and Gastroenteritis. The goal is to determine disease-affected areas, identify locations that require monetary support for medication, and precisely determine disease outbreaks at a location. A tool is developed to retrieve tweets from Twitter using the generic terms people use when discussing a selected disease. Retrieved tweets are processed and stored for further analysis. The credibility of the stored tweets is determined by sentiment analysis with text classification. Other components were developed to: (1) effectively use medical vocabularies when retrieving tweets, (2) visualize disease-related tweets to predict the spread of diseases, (3) visualize credible-tweet statistics and compare them with the total tweets for each disease, (4) visualize geo-tagged tweets to determine the locations from which the most tweets about a particular disease have been posted, and (5) crowdsource disease-related data from the public and use interactive maps to locate disease-affected areas. Integrating all of these functionalities gives the user the choice to fetch tweets, visualize them, and visualize the crowdsourced data to predict the outbreak of a disease.
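The abstract does not say how retrieved tweets are matched to a disease or counted per location. As a rough illustration only, the following sketch makes three assumptions: a hypothetical per-disease term list (`DISEASE_TERMS`), tweets represented as dicts with hypothetical `text` and `location` keys, and naive case-insensitive substring matching. It is not necessarily the project's method.

```python
from collections import Counter

# Hypothetical term lists for illustration; the project's actual
# generic terms and medical vocabularies are not given in the abstract.
DISEASE_TERMS = {
    "Common Cold": {"cold", "runny nose", "sneezing"},
    "Allergy": {"allergy", "hay fever", "itchy eyes"},
    "Gastroenteritis": {"stomach flu", "diarrhea", "vomiting"},
}


def match_diseases(text, terms=DISEASE_TERMS):
    """Return diseases whose terms occur in the tweet (case-insensitive substring match)."""
    lowered = text.lower()
    return [disease for disease, words in terms.items()
            if any(word in lowered for word in words)]


def count_by_location(tweets, disease):
    """Count tweets that mention `disease`, keyed by location; tweets without a location are skipped."""
    counts = Counter()
    for tweet in tweets:
        location = tweet.get("location")
        if location and disease in match_diseases(tweet["text"]):
            counts[location] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"text": "Sneezing all day, caught a cold", "location": "Sacramento"},
        {"text": "Itchy eyes again", "location": "Davis"},
        {"text": "Runny nose", "location": "Sacramento"},
        {"text": "cold morning", "location": None},
    ]
    print(count_by_location(sample, "Common Cold"))
```

Substring matching is deliberately simple. It would also match "cold" in "colder" or "cold morning", so a real tool would likely need tokenization or the medical vocabularies the abstract mentions. The sentiment-based credibility step is not modeled here.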