Master's Thesis

Comparison of Different Techniques for Fake News Detection

Data mining is the process of analysing large datasets to find patterns and relationships that help solve problems. It often involves classifiers, which assign data to predefined categories. Fake news detection is a popular topic in data mining. Information online is easy and cheap to access, and news is available on every social media platform. As a result, people have come to rely on these platforms for news, which also makes it easier to spread intentionally fake news. Considerable research is under way to combat this. Traditional fake news detection relies on the context of the news, whereas on social media, auxiliary information, either linguistic or visual, can also be used. Natural language processing algorithms help extract this information and use it to build machine learning models that can distinguish between fake and real news. Fake news is either written intentionally to mislead readers or is simply satirical. Recent evidence shows that about 62 percent of US adults use social media for news. Fake news is shared widely because people are inclined to believe it. For many people, Reddit is a source of news, and it has many subreddits dedicated specifically to satire or to real news. In this thesis, we gather posts from these subreddits and apply data mining to them. A number of machine learning classifiers are built and compared with the aim of achieving high accuracy in fake news detection.
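
To make the idea of building and comparing classifiers concrete, the following is a minimal sketch and not the method used in the thesis. It assumes a bag-of-words representation and compares a majority-class baseline with a multinomial Naive Bayes classifier using add-one smoothing. The headlines, the labels "real" and "satire", and all function and class names are hypothetical and chosen only for illustration; the thesis itself works with posts gathered from Reddit.

```python
import math
from collections import Counter

# Hypothetical toy headlines (assumption: not data from the thesis).
TRAIN = [
    ("senate passes budget bill after long debate", "real"),
    ("city council approves new transit funding", "real"),
    ("central bank holds interest rates steady", "real"),
    ("local man declares himself king of the parking lot", "satire"),
    ("area cat elected mayor in landslide victory", "satire"),
    ("scientists confirm mondays are a government hoax", "satire"),
]
TEST = [
    ("council approves budget funding", "real"),
    ("cat declares himself king", "satire"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, samples):
        labels = Counter(label for _, label in samples)
        self.label = labels.most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.word_counts = {}
        self.vocab = set()
        for text, label in samples:
            self.class_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(model.predict(text) == label for text, label in samples)
    return correct / len(samples)


if __name__ == "__main__":
    for model in (MajorityBaseline(), NaiveBayes()):
        model.fit(TRAIN)
        print(type(model).__name__, accuracy(model, TEST))
```

Including a trivial baseline alongside a learned model is a common way to check that a reported accuracy reflects what the classifier has learned and not merely the class balance of the data.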