Classification of multidimensional data

Question

I would like to classify some multidimensional data:

The input data is as follows:

Data1: [[a1,b1,f1], [a2,b2,f2], ... [an,bn,fn]] where: fn = F(an,bn) --> ClassA
Data2: [[c1,d1,g1], [c2,d2,g2], ... [cn,dn,gn]] where: gn = G(cn,dn) --> ClassB
...

Given a new record Datax, shown below, I would like to assign it to one of these classes:

Datax: [[x1,y1,z1], [x2,y2,z2], ... [xn,yn,zn]] where: zn = Z(xn,yn) --> which class?

I could flatten the array for each record and train my classifier:

Data1: [a1,b1,f1,a2,b2,f2,...,an,bn,fn]
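For illustration, the flattening step could be sketched as follows. This is a minimal sketch, not a full pipeline: the helper name `flatten` and the numeric values are made up for the example.

```python
def flatten(record):
    """Flatten [[a1, b1, f1], [a2, b2, f2], ...] into [a1, b1, f1, a2, b2, f2, ...]."""
    return [value for triplet in record for value in triplet]


# Hypothetical record with three triplets (values are illustrative only).
data1 = [[1, 2, 7], [3, 4, 17], [5, 6, 27]]
flat = flatten(data1)
print(flat)
```

Each record then becomes one fixed-length feature vector, provided every record contains the same number of triplets.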

But because the third value in each triplet is itself a function of the first two (for example, fn = F(an,bn)), I thought I should take this relationship into account during training rather than treat the record as a flat array.

Does it matter? If so, what is the best approach to this problem?

algorithm supervised-learning machine-learning classification training-data

towi_parallelism Apr 13 17 at 10:31

1 answer

PrisonGuy · Answer 1 · 2017-04-13T11:34:55+0000

If the third value of each triplet is produced by the same deterministic function (the function may differ from row to row, but must be the same for every triplet within a row), then you can simply strip out zn, because it carries no new information.

Example: z1 = 3x1 + 2y1; z2 = 3x2 + 2y2; [...]; zn = 3xn + 2yn

If not, you should keep zn.
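The rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function `known_function` (taken from the 3x + 2y example), the helper names, and the sample values are all hypothetical, and in practice the function is often not known in advance.

```python
def known_function(x, y):
    # Assumed deterministic relationship, borrowed from the example: z = 3x + 2y.
    return 3 * x + 2 * y


def z_is_redundant(record, func, tol=1e-9):
    """Return True if every z in the record equals func(x, y) within tol."""
    return all(abs(z - func(x, y)) <= tol for x, y, z in record)


def strip_z(record):
    """Drop the third value of each triplet and flatten what remains."""
    return [value for x, y, _ in record for value in (x, y)]


def flatten(record):
    """Keep all three values of each triplet and flatten them."""
    return [value for triplet in record for value in triplet]


# Made-up record in which z = 3x + 2y holds for every triplet.
record = [[1, 2, 7], [3, 4, 17], [5, 6, 27]]

if z_is_redundant(record, known_function):
    features = strip_z(record)   # z adds no information, so drop it
else:
    features = flatten(record)   # z carries information, so keep it
print(features)
```

If the check fails for any triplet, the sketch falls back to keeping all three values, which matches the advice to keep zn.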

Having said that, I think you can flatten the array anyway, because most models can pick up such dependencies between features on their own.
