A comparative study on data mining tools

In a data mining course such as CSc 177 (Data Warehouse and Data Mining) at California State University, Sacramento, students work on a term project. In the course, students learn to implement various classification and clustering algorithms [1]. Part of the course requirement is to use one or more data mining tools to complete the project, which lets students explore different data mining algorithms with different open source tools. These tools help users make predictions and extract useful knowledge from input data sets. Popular data mining and data visualization tools such as Weka [2], Rapid Miner [3], R [4] and Tableau [5] provide a free and efficient way to apply a wide range of data mining algorithms. One challenge for students is choosing a single tool for their term project and learning to use it quickly within a limited time. Students also need to understand how each algorithm is implemented in tools such as Rapid Miner and Weka. Learning each tool's features and interpreting its results takes considerable time; for example, Rapid Miner and Weka each have unique features and their own ways of representing data. In this project, I developed a comparative study of these tools and presented it on a website that contains the following:
1. Output from Rapid Miner and Weka for a classification algorithm and a clustering algorithm run on the same data set, displayed side by side in a single window
2. Illustrations showing how to use Tableau to address a real-world health care quality problem
3. A set of useful resources to help users learn R
4. The comparative study results, added as a reference for students on the CSc 177 course website at California State University, Sacramento
5. Quizzes to check students' understanding of the data mining tools