What is the correct input in scikit-learn MDS?
I hope this is the right place to post - if not, I'm happy to move this to SO.
Anyway, I'm using MDS to help me find a 2D representation of a dataset. In essence, these are the pKa values of amino acid residues across several years of protein data - decimal numbers, all on the same scale. There are many positions (~600 rows) and several years (~12 columns).
My question is this: is the correct input to MDS the data matrix (years versus positions), or can I pass in the correlation matrix (year versus year)? I ask because the API docs conflict with the written description.
The API docs say data matrix: http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html#sklearn.manifold.MDS (i.e. n_samples, n_features).
The written description says "input similarity matrix": http://scikit-learn.org/stable/modules/manifold.html
If you pass dissimilarity='euclidean' when constructing the estimator (or leave it at the default), it will take the data matrix and compute the Euclidean distance matrix for you.
If you pass dissimilarity='precomputed', it requires a dissimilarity matrix.
The docs are not really very clear; I'm sure a pull request adding a short note to the description of the X argument and clarifying what the 'euclidean' default does (I needed to check the source) would be accepted.
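To make the two modes concrete, here is a minimal sketch. It assumes a scikit-learn version where the constructor argument is still called dissimilarity, as above (check the docs for your installed release), and it uses random placeholder data of the shape described in the question rather than real pKa values. The n_init and max_iter settings are only there to keep the run short. The last part is one possible way to use a year-versus-year correlation matrix: since 'precomputed' expects dissimilarities, the correlations are converted with 1 - r first, which is an assumption about what you want, not the only option.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# Placeholder data (NOT real pKa values): 600 positions x 12 years.
rng = np.random.default_rng(0)
X = rng.normal(loc=7.0, scale=1.0, size=(600, 12))

# Mode 1: dissimilarity='euclidean' (the default).
# Input is the data matrix, shape (n_samples, n_features).
mds_data = MDS(n_components=2, dissimilarity="euclidean",
               n_init=1, max_iter=100, random_state=0)
emb_data = mds_data.fit_transform(X)

# Mode 2: dissimilarity='precomputed'.
# Input is a square dissimilarity matrix, shape (n_samples, n_samples).
D = pairwise_distances(X, metric="euclidean")
mds_pre = MDS(n_components=2, dissimilarity="precomputed",
              n_init=1, max_iter=100, random_state=0)
emb_pre = mds_pre.fit_transform(D)

# Embedding the years instead of the positions: a correlation matrix is a
# similarity, so convert it to a dissimilarity first (here 1 - r, an
# illustrative choice), then symmetrize and zero the diagonal.
D_years = 1.0 - np.corrcoef(X, rowvar=False)
D_years = (D_years + D_years.T) / 2.0
np.fill_diagonal(D_years, 0.0)
mds_years = MDS(n_components=2, dissimilarity="precomputed",
                n_init=1, max_iter=100, random_state=0)
emb_years = mds_years.fit_transform(D_years)
```

Note that the first two calls give one 2D point per position (600 rows), while the last gives one point per year (12 rows), so passing the year-versus-year matrix answers a different question than passing the data matrix.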