Classification of multidimensional data

I would like to classify some multidimensional data:

The input data is as follows:

Data1: [[a1,b1,f1], [a2,b2,f2], ... [an,bn,fn]] where: fn = F(an,bn) --> ClassA
Data2: [[c1,d1,g1], [c2,d2,g2], ... [cn,dn,gn]] where: gn = G(cn,dn) --> ClassB
...

      

So, given Datax as follows, I would like to assign it to one of the known classes:

Datax: [[x1,y1,z1], [x2,y2,z2], ... [xn,yn,zn]] where: zn = Z(xn,yn) --> which class?

      

I could flatten the array for each record and train my classifier:

Data1: [a1,b1,f1,a2,b2,f2,...,an,bn,fn]
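For concreteness, the flattening described above might look like the following minimal sketch. The helper name `flatten_record` and the sample numbers are invented for illustration, and it assumes every record holds the same number of triples so that all flattened vectors have equal length.

```python
def flatten_record(record):
    """Turn [[a1, b1, f1], [a2, b2, f2], ...] into [a1, b1, f1, a2, b2, f2, ...]."""
    return [value for triple in record for value in triple]


# Hypothetical record with two triples (values chosen only for the example).
data1 = [[1.0, 2.0, 7.0], [3.0, 4.0, 17.0]]
flat = flatten_record(data1)
print(flat)
```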

      

But because the third value in each triple is itself a function of the first two (for example fn = F(an,bn)), I thought I should account for this relationship during training rather than just use a flat array.

Does it matter? What is the best approach to this problem?



1 answer


If the third value of each triple is produced by the same deterministic function of the first two (the function may differ from one record to another, but must be the same for every triple within a record), then you can simply strip out zn, because it carries no new information.

Example: z1 = 3x1 + 2y1; z2 = 3x2 + 2y2; [...]; zn = 3xn + 2yn



If not, you should keep zn.
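As a rough sketch of that rule, the snippet below checks whether the third value really is a known function of the first two and drops it only in that case. The function `z_func` (the 3x + 2y example), the helper names, and the tolerance are assumptions made for illustration; in practice the function may be unknown, and then this check is not available.

```python
def z_func(x, y):
    # Hypothetical deterministic function, taken from the 3x + 2y example.
    return 3 * x + 2 * y


def third_is_function_of_first_two(record, func, tol=1e-9):
    """Return True if z == func(x, y) (within tol) for every triple."""
    return all(abs(func(x, y) - z) <= tol for x, y, z in record)


def strip_third(record):
    """Keep only the first two values of every triple."""
    return [[x, y] for x, y, _z in record]


record = [[1.0, 2.0, 7.0], [3.0, 4.0, 17.0]]

if third_is_function_of_first_two(record, z_func):
    features = strip_third(record)   # z is redundant here, so drop it
else:
    features = record                # z may carry information, so keep it

print(features)
```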

Having said that, I think you can flatten the array, because many models can pick up such dependencies between features on their own.
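To illustrate the flatten-and-train route, here is a small sketch using only the standard library. It substitutes a plain 1-nearest-neighbour rule for whatever classifier you actually use; the toy records, the two generating functions (sum for ClassA, product for ClassB), and all helper names are invented for the example and are not part of the original question.

```python
import math


def flatten_record(record):
    """Flatten [[x1, y1, z1], ...] into [x1, y1, z1, ...]."""
    return [value for triple in record for value in triple]


def nearest_neighbor_predict(train_vectors, train_labels, query):
    """Return the label of the training vector closest to query (Euclidean)."""
    distances = [math.dist(vector, query) for vector in train_vectors]
    return train_labels[distances.index(min(distances))]


# Toy training data (assumed): ClassA uses z = x + y, ClassB uses z = x * y.
train_records = [
    [[1, 2, 3], [2, 3, 5]],   # ClassA
    [[0, 1, 1], [4, 1, 5]],   # ClassA
    [[1, 2, 2], [2, 3, 6]],   # ClassB
    [[0, 1, 0], [4, 1, 4]],   # ClassB
]
train_labels = ["ClassA", "ClassA", "ClassB", "ClassB"]

train_vectors = [flatten_record(r) for r in train_records]

# Unknown record whose third values follow z = x + y.
datax = [[1, 3, 4], [2, 2, 4]]
predicted = nearest_neighbor_predict(train_vectors, train_labels, flatten_record(datax))
print(predicted)
```

All records must contain the same number of triples for the flattened vectors to be comparable. The nearest-neighbour rule here is only a stand-in, and any classifier that accepts fixed-length feature vectors could be swapped in.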
