Weka prints sparse arff file

I am trying to write an ARFF file in Weka's sparse representation. My program prints the class label "B" correctly, but for some reason it does not print "A".

    attVals = new FastVector();
    attVals.addElement("A");
    attVals.addElement("B");
    atts.addElement(new Attribute("class", attVals));

    vals[index] = attVals.indexOf("A");

      

The output of the program looks like this:

    {0 6,2 8}

I expected {0 6,2 8,3 A}.

      

But when I use

    vals[index] = attVals.indexOf("B");

I get the correct output:

    {0 6,2 8,3 B}

      

For some reason it doesn't accept index 0. Can anyone tell me why this is happening?


1 answer


This is a very common problem. The sparse format, by definition, does not store 0 values, and "A" is the first label in your list, so its internal index is 0.

The Weka ARFF page states this explicitly:

Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the ARFF file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change). To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as sparse ARFF files.



So you need to add a dummy value as the first element of the attribute's value list, which pushes "A" to index 1. Just change your code to:

    attVals = new FastVector();
    attVals.addElement("dummy");   // occupies index 0 and is never used
    attVals.addElement("A");
    attVals.addElement("B");
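If it helps to see the effect in isolation, here is a minimal sketch that does not use Weka at all. The class name and the `toSparse` helper are made up for illustration; the helper only imitates the rule that entries with an internal value of 0 are skipped, and it is not Weka's actual writer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class SparseLabelDemo {

    // Hypothetical stand-in for a sparse writer: any entry whose internal
    // value is 0 is left out of the row. Numeric values are printed as
    // integers for brevity; the class attribute is printed as its label.
    static String toSparse(double[] vals, int classIndex, List<String> labels) {
        StringJoiner row = new StringJoiner(",", "{", "}");
        for (int i = 0; i < vals.length; i++) {
            if (vals[i] == 0) {
                continue; // sparse format: zeros are implicit
            }
            String shown = (i == classIndex)
                    ? labels.get((int) vals[i])
                    : String.valueOf((int) vals[i]);
            row.add(i + " " + shown);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        double[] vals = {6, 0, 8, 0};

        // Without a dummy, "A" has index 0 and disappears from the row.
        List<String> plain = Arrays.asList("A", "B");
        vals[3] = plain.indexOf("A");
        System.out.println(toSparse(vals, 3, plain));   // {0 6,2 8}

        // With a dummy at index 0, "A" has index 1 and is written out.
        List<String> padded = Arrays.asList("dummy", "A", "B");
        vals[3] = padded.indexOf("A");
        System.out.println(toSparse(vals, 3, padded));  // {0 6,2 8,3 A}
    }
}
```

The first line printed matches the output in the question, and the second is the row you were expecting.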

      

Let me know if you need more help.
