Clustering before classification in Weka

The instances in my dataset have multiple numeric attributes and a binary class. Is there a way in Weka to cluster the instances first and pass the result to a classifier (such as SMO) to improve the classification results?



2 answers


One way to add cluster information to your data is the following (in the Weka Explorer):

  • Load your favorite dataset
  • Switch to the Cluster tab and choose a clusterer (in my case I used SimpleKMeans)
  • Change the clusterer's parameters as required
  • Set the cluster mode to "Use training set"
  • Start the clustering process
  • Once the clusters have been built, right-click the entry in the result list and select "Visualize cluster assignments"
  • Choose Cluster for the Y axis, then click the Save button
  • Save the data to a location of your choice.
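With SimpleKMeans, the clustering step above is a k-means clustering of the instances. As a rough illustration of what that step computes, here is a minimal plain-Java sketch of Lloyd's k-means. The `KMeansSketch` class is hypothetical and is not Weka's SimpleKMeans; it skips refinements such as attribute normalization. It assumes the feature rows contain only the numeric attributes, with the class left out so the label cannot shape the clusters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: bare-bones k-means (Lloyd's algorithm), not Weka's SimpleKMeans. */
public class KMeansSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Index of the centroid closest to x (ties go to the lower index). */
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sqDist(x, centroids[c]) < sqDist(x, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Clusters the rows of features into k groups and returns one cluster index per row.
     * features must hold only input attributes (never the class); requires k <= number of rows.
     */
    public static int[] cluster(double[][] features, int k, int maxIter, long seed) {
        int n = features.length;
        int dim = features[0].length;

        // Initialise the centroids with k distinct rows picked at random.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = features[order.get(c)].clone();
        }

        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach every row to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int c = nearest(features[i], centroids);
                if (c != assign[i]) {
                    assign[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // no row moved, so the clustering has converged
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assign[i]][j] += features[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous centroid
                    for (int j = 0; j < dim; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        // Numeric attributes only; the binary class is kept separately and is not clustered.
        double[][] features = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        int[] clusters = cluster(features, 2, 100, 42L);
        System.out.println(Arrays.toString(clusters));
    }
}
```

The returned indices play the role of the Cluster column you save from the visualization window. Appending each index to its instance gives the classifier one extra nominal attribute.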


Then you can load this file and use the cluster information in your classifier like any other attribute. Just make sure the correct attribute is set as the class and you should be good to go.

NOTE: When I ran these tests, I used J48 to predict the class, and it seemed that J48 was using only the cluster values to do so. The model was also surprisingly accurate, so either the dataset was too simple or I may have skipped a step in the clustering process.
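A tree that splits only on a nominal cluster attribute essentially predicts the majority class of each cluster. The minimal plain-Java sketch below computes that rule directly. The `ClusterMajoritySketch` class and its methods are hypothetical, not part of Weka, and labels are assumed to be coded 0/1. If this rule alone already scores about as well as your classifier, the cluster attribute is carrying almost all of the class information. In that case it is worth checking that the class attribute was not itself one of the attributes the clusterer used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: predicts the majority class of each cluster, using only the cluster id. */
public class ClusterMajoritySketch {

    /** Learns cluster id -> majority label from parallel arrays; labels must be 0 or 1. */
    public static Map<Integer, Integer> fit(int[] clusters, int[] labels) {
        Map<Integer, int[]> counts = new HashMap<>();
        for (int i = 0; i < clusters.length; i++) {
            // counts.get(cluster)[label] is incremented once per instance
            counts.computeIfAbsent(clusters[i], key -> new int[2])[labels[i]]++;
        }
        Map<Integer, Integer> rule = new HashMap<>();
        for (Map.Entry<Integer, int[]> e : counts.entrySet()) {
            int[] c = e.getValue();
            rule.put(e.getKey(), c[1] > c[0] ? 1 : 0); // ties go to class 0
        }
        return rule;
    }

    /** Fraction of instances whose label equals the majority label of their cluster. */
    public static double accuracy(Map<Integer, Integer> rule, int[] clusters, int[] labels) {
        int correct = 0;
        for (int i = 0; i < clusters.length; i++) {
            // Unseen clusters fall back to class 0.
            if (rule.getOrDefault(clusters[i], 0) == labels[i]) {
                correct++;
            }
        }
        return (double) correct / clusters.length;
    }

    public static void main(String[] args) {
        int[] clusters = {0, 0, 0, 1, 1, 1};
        int[] labels = {1, 1, 0, 0, 0, 0};
        Map<Integer, Integer> rule = fit(clusters, labels);
        System.out.println(rule + " accuracy=" + accuracy(rule, clusters, labels));
    }
}
```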

Hope it helps!



In the Weka Explorer, after loading your dataset:

  • select the Preprocess tab,
  • click the "Choose" button in the Filter panel,
  • select the filter unsupervised > attribute > AddCluster,
  • click the text field next to the button to open the filter's options and select a clusterer,
  • configure the clusterer's parameters,
  • close all dialogs.


Click the Apply button to apply the filter. It will add another attribute called "cluster" as the rightmost attribute in the list of attributes.
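Here is a minimal plain-Java sketch of the effect the filter has on the data. Every original attribute is kept, and each instance gets one new rightmost column holding the index of its nearest cluster centroid. AddCluster builds the clusterer itself when the filter is applied. To keep the sketch short, the centroids here are hypothetical, hard-coded values assumed to have been learned beforehand. The sketch also skips the class column when measuring distance, on the assumption that the label should not influence the clusters. AddCluster has an ignoredAttributeIndices option for leaving attributes out of the clustering. `AddClusterSketch` and `addClusterColumn` are illustrative names, not Weka API.

```java
import java.util.Arrays;

/** Hypothetical helper mimicking the effect of an add-cluster filter; not Weka's AddCluster. */
public class AddClusterSketch {

    /** Squared distance between a row's non-class values and a centroid (which has no class dimension). */
    static double sqDistIgnoringClass(double[] row, int classIndex, double[] centroid) {
        double s = 0;
        int j = 0; // position inside the centroid
        for (int i = 0; i < row.length; i++) {
            if (i == classIndex) {
                continue; // the class never influences which cluster a row joins
            }
            double d = row[i] - centroid[j++];
            s += d * d;
        }
        return s;
    }

    /** Returns a copy of data with the nearest-centroid index appended as the rightmost column. */
    public static double[][] addClusterColumn(double[][] data, int classIndex, double[][] centroids) {
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (sqDistIgnoringClass(data[r], classIndex, centroids[c])
                        < sqDistIgnoringClass(data[r], classIndex, centroids[best])) {
                    best = c;
                }
            }
            out[r] = Arrays.copyOf(data[r], data[r].length + 1); // keep every original attribute
            out[r][data[r].length] = best;                        // new rightmost "cluster" column
        }
        return out;
    }

    public static void main(String[] args) {
        // Two numeric attributes followed by a binary class (0/1) in column 2.
        double[][] data = {{0.0, 0.0, 1}, {0.5, 0.2, 0}, {9.0, 9.5, 1}, {10.0, 10.0, 0}};
        // Hypothetical centroids, assumed to have been learned on the two numeric columns.
        double[][] centroids = {{0.25, 0.1}, {9.5, 9.75}};
        for (double[] row : addClusterColumn(data, 2, centroids)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints each row with a fourth column: 0.0 for the two rows near the origin and 1.0 for the two rows near (10, 10), regardless of their class labels.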

Then continue with your classification experiments.







