Why does TSNE in sklearn.manifold give different answers for the same values?

import numpy as np
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, init='pca', n_iter=5000)

print(tsne.fit_transform(np.array([[1,2,3],[3,4,3],[1,2,3],[3,3,3]])))

      

outputs:

[[ 547.9452404    11.31943926]
 [-152.33035505 -223.32060683]
 [  97.57201578   84.04839505]
 [-407.18939464  124.50285141]]

      

The vector [1,2,3] appears twice in the input, yet it is mapped to two different output vectors.

Why is this so?

Edit1:

The example above is just a toy example to illustrate the issue. My actual data is a multi-dimensional array of shape (500, 100), and the same problem persists there.



1 answer


This is an interesting question. TSNE maps the samples into a different space that tries to preserve the distances between them, but it does not guarantee that identical input values get identical outputs. It treats each sample as a separate point and tries to reproduce that point's distances to every other point in the new space. It looks only at a sample's distances relative to the other points, not at the value of the sample itself.

You can check that:

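>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> from sklearn.metrics import euclidean_distances
>>> a = np.array([[1,2,3],[3,4,3],[1,2,3],[3,3,3]])
>>> # fit_transform returns the embedded points; recent scikit-learn
>>> # releases require perplexity < n_samples (e.g. perplexity=2 here)
>>> b = TSNE(n_components=2).fit_transform(a)
>>> # b[[0]] keeps the row 2-D, as euclidean_distances expects
>>> print(euclidean_distances(b[[0]], b).sum())
2498.7985853798709
>>> print(euclidean_distances(b[[2]], b).sum())
2475.26750924
>>> print(b)
[[-201.41082311  361.14132525]
 [-600.23416334 -523.48599925]
 [ 180.07532649 -288.01414955]
 [ 553.42486539  538.85793453]]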

      



It keeps roughly the same distances (given the scale) from each of the two identical samples to every other sample, even though it gives them different representations.

As for why this does not work well here, I assume it is because you only have 4 samples and 3 dimensions. TSNE cannot infer a proper mapping from so few samples. It is meant to work on high-dimensional data (and a fair number of samples of it).

For lower-dimensional data, I'd say a simple PCA will do the job. Run PCA on your data and keep the top 2 components.
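As a minimal sketch of that suggestion, assuming the same toy array as in the question: PCA is a deterministic linear projection, so the two identical rows should end up at the same point. The variable names here are only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy data as in the question; rows 0 and 2 are identical.
a = np.array([[1, 2, 3], [3, 4, 3], [1, 2, 3], [3, 3, 3]], dtype=float)

# Keep the top 2 principal components.
pca = PCA(n_components=2)
projected = pca.fit_transform(a)

print(projected)
# Identical input rows get the same projection (up to floating-point error).
print(np.allclose(projected[0], projected[2]))
```

Unlike TSNE, the fitted PCA can also be reused on new samples with pca.transform, which applies the same projection to them.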
