ELKI: Running DBSCAN on Custom Objects in Java

I am trying to use ELKI from Java to run DBSCAN. For testing, I used FileBasedDatabaseConnection. Now I would like to run DBSCAN with my custom objects as input.

My objects have the following structure:

public class MyObject {
  private Long id;
  private Float param1;
  private Float param2;
  // ... and more parameters as well as getters and setters
}

      

I would like to run DBSCAN in ELKI using the List<MyObject> as the database, but only some of the parameters need to be considered (for example, running DBSCAN on the objects using only param1, param2 and param4). Ideally, the resulting clusters contain the entire objects.

Is there a way to achieve this behavior?

If not, how can I convert the objects to a format that ELKI understands and that allows me to map the resulting cluster members back to my custom objects (i.e. is there an easy way to set the label programmatically)?

The following question talks about feature vectors: Using ELKIs on custom objects and presenting the results
Could this be a possible solution to my problem? And how is the feature vector created from my List<MyObject>?



1 answer


ELKI has a modular architecture.

If you need your own data source, look at the datasource package and implement the DatabaseConnection interface (JavaDoc).

If you want to handle MyObject objects directly (the class you shared above is likely to have a significant performance impact), this is not particularly difficult. You will need a SimpleTypeInformation<MyObject> (JavaDoc) to identify your data type, and you will need to implement a PrimitiveDistanceFunction (JavaDoc) for your data type.
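To make the custom data type route more concrete, here is a minimal sketch of the distance computation itself, written in plain Java without any ELKI types. The nested MyObject uses primitive float fields, param4 is assumed from the question text (the original class elides its remaining fields), and SelectedAttributeDistance is a hypothetical name. In ELKI, this logic would presumably go into the distance method of your PrimitiveDistanceFunction implementation, whose exact signature depends on the ELKI version you use.

```java
// Sketch only: Euclidean distance restricted to selected attributes.
final class SelectedAttributeDistance {

  // Simplified stand-in for the asker's class; param4 is an assumed field.
  static final class MyObject {
    final long id;
    final float param1;
    final float param2;
    final float param4;

    MyObject(long id, float param1, float param2, float param4) {
      this.id = id;
      this.param1 = param1;
      this.param2 = param2;
      this.param4 = param4;
    }
  }

  // Uses param1, param2 and param4 only; the id is deliberately ignored.
  static double distance(MyObject a, MyObject b) {
    double d1 = a.param1 - b.param1;
    double d2 = a.param2 - b.param2;
    double d4 = a.param4 - b.param4;
    return Math.sqrt(d1 * d1 + d2 * d2 + d4 * d4);
  }

  public static void main(String[] args) {
    MyObject a = new MyObject(1L, 0f, 0f, 0f);
    MyObject b = new MyObject(2L, 3f, 4f, 0f);
    System.out.println(distance(a, b)); // prints 5.0
  }
}
```

A distance written this way works on the objects as they are, but it cannot benefit from the vector-based index structures, which is one reason the vector route is usually preferable.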



If your actual data consists of floats, I suggest using DoubleVector or FloatVector and simply using, for example, SubspaceEuclideanDistanceFunction to handle only the attributes you want to use.
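As an illustration of the vector route, the sketch below turns a list of objects into a double[][] with one row per object and one column per selected attribute. It is plain Java; FeatureMatrix, toRows and the param3/param4 fields are assumptions made for the example. Row i always corresponds to list position i, which is what makes mapping results back possible later. As far as I recall, ELKI's ArrayAdapterDatabaseConnection accepts such an array, but please check this against the JavaDoc of your ELKI version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Sketch only: project selected attributes of each object into a numeric row.
final class FeatureMatrix {

  // Simplified stand-in for the asker's class; param3 and param4 are assumed.
  static final class MyObject {
    private final long id;
    private final float param1, param2, param3, param4;

    MyObject(long id, float p1, float p2, float p3, float p4) {
      this.id = id;
      this.param1 = p1;
      this.param2 = p2;
      this.param3 = p3;
      this.param4 = p4;
    }

    long getId() { return id; }
    float getParam1() { return param1; }
    float getParam2() { return param2; }
    float getParam3() { return param3; }
    float getParam4() { return param4; }
  }

  // rows[i] holds the selected attributes of objects.get(i), in the given order.
  static double[][] toRows(List<MyObject> objects,
                           List<ToDoubleFunction<MyObject>> attributes) {
    double[][] rows = new double[objects.size()][attributes.size()];
    for (int i = 0; i < objects.size(); i++) {
      for (int j = 0; j < attributes.size(); j++) {
        rows[i][j] = attributes.get(j).applyAsDouble(objects.get(i));
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<MyObject> objects = Arrays.asList(
        new MyObject(1L, 1f, 2f, 3f, 4f),
        new MyObject(2L, 5f, 6f, 7f, 8f));
    // Select param1, param2 and param4; param3 is left out on purpose.
    List<ToDoubleFunction<MyObject>> selected =
        Arrays.<ToDoubleFunction<MyObject>>asList(
            MyObject::getParam1, MyObject::getParam2, MyObject::getParam4);
    System.out.println(Arrays.deepToString(toRows(objects, selected)));
    // prints [[1.0, 2.0, 4.0], [5.0, 6.0, 8.0]]
  }
}
```

Projecting the attributes up front is an alternative to keeping all attributes in the vector and restricting the distance function to a subspace; which one fits better depends on whether you want to reuse the same database with different attribute selections.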

R*-tree indexes can be used for these data types and many distance functions, dramatically speeding up DBSCAN.

A Cluster (JavaDoc) in ELKI never stores the point data. It only stores the point DBIDs (Wiki). You can get the point data from a database relation or, for static databases, use offsets (Wiki) to map each DBID back to a list position.
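The mapping back to list positions can be sketched in plain Java as follows. Here each cluster is represented as an int[] of offsets into the original list; in ELKI these offsets would come from the DBIDs of each cluster (to my knowledge via DBIDRange.getOffset, but verify this against your version). ClusterMapping and resolve are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: turn per-cluster offsets back into the original objects.
final class ClusterMapping {

  // Each int[] lists the positions (in 'objects') of one cluster's members.
  static <T> List<List<T>> resolve(List<T> objects, List<int[]> clusterOffsets) {
    List<List<T>> clusters = new ArrayList<>(clusterOffsets.size());
    for (int[] offsets : clusterOffsets) {
      List<T> members = new ArrayList<>(offsets.length);
      for (int offset : offsets) {
        members.add(objects.get(offset));
      }
      clusters.add(members);
    }
    return clusters;
  }

  public static void main(String[] args) {
    // Strings stand in for the custom objects to keep the demo short.
    List<String> objects = Arrays.asList("a", "b", "c", "d");
    List<int[]> offsets = Arrays.asList(new int[] {0, 2}, new int[] {1});
    System.out.println(resolve(objects, offsets)); // prints [[a, c], [b]]
  }
}
```

This only works if the list is not reordered or modified between loading the data and resolving the clusters, which is why it is suited to static databases.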
