Graduate Project

Visualizing and Recognizing Important Trends in Housing Sales Dataset Using Hadoop

The purpose of this paper is to visualize and identify important trends and patterns in a dataset of house sale prices and home features for King County, Washington, covering sales between May 2014 and May 2015. The dataset was obtained from Kaggle, a website that shares datasets for data science and machine learning. Visualizing and finding patterns in this dataset gives us more insight into the real estate business, such as the types of homes preferred in inexpensive versus expensive neighborhoods and the factors that affect the price of a home in King County. The visualization was performed on Apache Hadoop, a platform for distributed storage and processing of big data on a cluster of computers. Through visualization, we revealed many important features associated with house sale prices in King County.