A Quick Look at Data Mining with Weka

With an abundance of data from various sources, data mining for various purposes is the rage these days. Weka is a collection of machine learning algorithms that can be used for data mining tasks. It is open source software and can be used via a GUI, a Java API and a command line interface, which makes it very versatile.

Weka (Waikato Environment for Knowledge Analysis) is free software licensed under the GNU General Public License. It was developed by the Department of Computer Science at the University of Waikato, New Zealand. Weka offers a collection of machine learning tools, including data preprocessing tools, classification and regression algorithms, clustering algorithms and algorithms for finding association rules. It is written in Java and runs on almost any platform. Weka has three main interfaces: a GUI, a Java API and a command line interface.

The GUI has three main components, namely Explorer, Experimenter and Knowledge Flow, apart from a simple command line interface.

The components of Explorer:

Preprocess: The first component of Explorer provides options for data preprocessing. Data can be imported in various formats such as ARFF, CSV, C4.5 and binary. The Explorer also provides an option to edit the data set, if required. Weka's specific tools for data preprocessing are called filters.
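To give a feel for ARFF, the format most closely associated with Weka, here is a minimal Python sketch that renders a few rows as ARFF text. The helper name `to_arff` and the toy "weather" data are assumptions made for illustration; they are not part of Weka, and the sketch only covers numeric and nominal attributes without quoting or missing values.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Numeric (or other simple) attribute type
            lines.append("@attribute %s %s" % (name, kind))
        else:
            # Nominal attribute: allowed values listed in braces
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical toy data, used only for illustration
attributes = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", "numeric"),
    ("play", ["yes", "no"]),
]
rows = [("sunny", 85, "no"), ("overcast", 83, "yes"), ("rainy", 70, "yes")]

arff_text = to_arff("weather", attributes, rows)
print(arff_text)
```

Saving the printed text to a file with an .arff extension should give something the Preprocess panel can open.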

Classify: The next option in the Weka Explorer is Classify. A classifier is a model for predicting nominal or numeric quantities, and the panel includes various machine learning techniques such as decision trees and lists, instance-based classifiers and Bayes networks.

Cluster: The Cluster panel is similar to the Classify panel. Many techniques such as k-means, EM, Cobweb, X-means and Farthest First are implemented. When the data set has a class attribute, the output in this tab can include a confusion matrix, which shows how many errors there would be if the clusters were used instead of the true classes.
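The idea behind that confusion matrix can be sketched in a few lines of Python. This is an illustration under a simplifying assumption, not Weka's implementation: each cluster is labelled with its majority class, and every instance that disagrees with that label counts as an error. The function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, true_classes):
    """Cross-tabulate clusters against true classes and count errors.

    Simplifying assumption: a cluster stands for its majority class.
    Weka's own evaluation may map classes to clusters differently.
    """
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_classes):
        table[cluster][label] += 1

    errors = 0
    for counts in table.values():
        majority_count = counts.most_common(1)[0][1]
        errors += sum(counts.values()) - majority_count
    return table, errors


# Hypothetical cluster assignments and true class labels
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
true_classes = ["yes", "yes", "no", "no", "no", "no", "yes"]

table, errors = classes_to_clusters(cluster_ids, true_classes)
for cluster in sorted(table):
    print(cluster, dict(table[cluster]))
print("incorrectly clustered instances:", errors)
```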

Associate: To find associations in a given set of input data, 'Associate' can be used. It contains an implementation of the Apriori algorithm for learning association rules. Such algorithms can identify statistical dependencies between groups of attributes and compute all the rules that have a given minimum support and exceed a given confidence level.
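Support and confidence are easy to compute by hand for a single rule. The Python sketch below does so for one candidate rule over a hypothetical list of transactions; it does not implement the Apriori search itself, and the function names and thresholds are assumptions chosen for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    hits = sum(1 for transaction in transactions if items <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so treat confidence as zero
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})

# Hypothetical thresholds: keep the rule only if it clears both
min_support, min_confidence = 0.4, 0.6
keep_rule = rule_support >= min_support and rule_confidence >= min_confidence
print(rule_support, rule_confidence, keep_rule)
```

Here the rule "bread implies milk" appears in two of four transactions, and bread appears in three, so the rule is kept under these thresholds.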

Select Attributes: This tab can be used to identify the important attributes. It has two parts: a search method such as best-first, forward selection, random, exhaustive, genetic algorithm or ranking, and an evaluation method such as correlation-based, wrapper, information gain or chi-squared.
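As an illustration of one evaluation method combined with ranking, the following Python sketch scores nominal attributes by information gain and sorts them. It is a textbook formulation under stated assumptions (nominal attributes only, no missing values), not Weka's code, and the attribute names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(
        len(group) / total * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Hypothetical data: two nominal attributes and a class attribute
data = {
    "outlook": ["sunny", "sunny", "rainy", "rainy"],
    "windy": ["true", "false", "true", "false"],
}
play = ["no", "no", "yes", "yes"]

gains = {name: information_gain(values, play) for name, values in data.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(gains, ranked)
```

In this toy data "outlook" separates the classes perfectly while "windy" tells us nothing, so the ranking puts "outlook" first.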

Visualize: This tab can be used to visualize the data. It displays scatter plots of the attributes against one another.

The components of Experimenter:

The Experimenter option available in Weka enables the user to perform experiments on a data set by choosing different algorithms and analyzing their outputs. It has the following components.

Setup: The first tab is used to set up the data sets, algorithms, output destination, etc. We can also add more data sets and compare the outcomes using more algorithms, if required.

Run: You can use this tab to run the experiment.


Analyze: This tab can be used to analyze the results.

Weka can also be used seamlessly from Java applications, just by calling its APIs, without writing the machine learning code yourself. Weka for big data is still evolving. The latest distribution of Weka at the time of writing, 3.9.1, sometimes gives a heap size error with the standard settings.

Weka provides a very interactive interface for building and testing various machine learning models. Although there are many machine learning tools available, Weka's powerful GUI makes it quick to learn.

