What is the correct input in scikit-learn MDS?

Question

I hope this is the right place to post; if not, I'm happy to move this to SO.

Anyway, I'm using MDS to help me find a 2D representation of a dataset. In essence, the data are pKa values of amino acid residues across years of protein data, all decimal numbers on the same scale. There are many positions (~600 rows) and a number of years (~12 columns).

My question is this: is the correct input to MDS the data matrix (years versus positions), or can I pass in the correlation matrix (year versus year)? I ask because the API docs conflict with the written description.

The API docs say data matrix: http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html#sklearn.manifold.MDS (i.e. n_samples, n_features).

The written description says "input similarity matrix": http://scikit-learn.org/stable/modules/manifold.html

python scikit-learn

ericmjl 07 Aug 14 at 21:01

1 answer

Dougal · Accepted Answer · 2014-08-07T21:03:37+0000

If you pass dissimilarity='euclidean' to the estimator's constructor (or leave it at the default), it takes the data matrix and computes the Euclidean distance matrix for you.

If you pass dissimilarity='precomputed', it requires a dissimilarity matrix.

The docs really aren't very clear; I'm sure a pull request adding a short note to the description of the X argument and clarifying what the 'euclidean' default does (I needed to check the source) would be accepted.
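To make the two input modes concrete, here is a minimal sketch using synthetic data as a stand-in for the pKa table (the array shapes, random values, and variable names are assumptions for illustration only). It uses the dissimilarity parameter as spelled in this answer; newer scikit-learn releases may rename or deprecate it, so check the docs for your version. The last part converts a year-versus-year correlation matrix into a dissimilarity with 1 - correlation, which is one common choice and not something MDS does for you, since a correlation matrix is a similarity rather than a dissimilarity.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# Synthetic stand-in for the real data: 60 positions (rows) x 12 years (columns).
rng = np.random.RandomState(0)
X = rng.normal(loc=7.0, scale=1.0, size=(60, 12))

# Mode 1: dissimilarity='euclidean' (the default).
# Pass the data matrix of shape (n_samples, n_features); MDS computes
# the pairwise Euclidean distances internally.
mds_data = MDS(n_components=2, dissimilarity="euclidean", n_init=1, random_state=0)
emb_from_data = mds_data.fit_transform(X)

# Mode 2: dissimilarity='precomputed'.
# Pass a square (n_samples, n_samples) dissimilarity matrix that you built yourself.
D = pairwise_distances(X, metric="euclidean")
mds_pre = MDS(n_components=2, dissimilarity="precomputed", n_init=1, random_state=0)
emb_from_dist = mds_pre.fit_transform(D)

# Year-versus-year case: a correlation matrix is a similarity, so convert it
# to a dissimilarity first. Using 1 - correlation is an assumed choice here.
corr = np.corrcoef(X, rowvar=False)      # shape (12, 12)
D_years = 1.0 - corr
D_years = (D_years + D_years.T) / 2.0    # enforce exact symmetry
D_years = np.clip(D_years, 0.0, None)    # guard against tiny negative round-off
np.fill_diagonal(D_years, 0.0)           # zero self-dissimilarity
mds_years = MDS(n_components=2, dissimilarity="precomputed", n_init=1, random_state=0)
emb_years = mds_years.fit_transform(D_years)

print(emb_from_data.shape, emb_from_dist.shape, emb_years.shape)
```

Note that the first two calls embed the positions (one 2D point per row of X), while the last one embeds the years (one 2D point per column), so they answer different questions about the data.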
