Automatically predicting the helpfulness of online reviews

Online shopping websites provide platforms for consumers to review products and share opinions. Reviews written by previous consumers are a major information source for both consumers and marketers. However, when a product has a large number of reviews, readers cannot realistically read through all of them to collect information. It is therefore important to classify and rank reviews based on their helpfulness so that readers can access them easily. Doing so helps consumers complete their information search and decision making more easily, and it gives product manufacturers and retailers informative and meaningful consumer feedback. Because of the lack of editorial and quality control, product reviews vary dramatically in quality, from very helpful to useless and even spam-like. The helpfulness of reviews is currently assessed manually through votes from readers. This project applies a supervised machine learning approach to data collected from Amazon to investigate the task of predicting the helpfulness of online reviews. It discusses the determinants of review helpfulness and proposes a model that automatically predicts the helpfulness of online reviews.