Visualize smallest nodes of hierarchical clustering using dendrogram

I am using linkage to create an agglomerative hierarchical clustering of a dataset of about 5000 instances. I want to visualize the "bottom" merges in the hierarchy, that is, the nodes close to the leaves with the smallest linkage distances.

Unfortunately, the dendrogram plot favors the "top" nodes, i.e. the most recent merges in the algorithm. By default it shows the top 30 nodes and collapses the bottom of the tree. I can change the value of P to show more nodes, but I would have to show all 5000+ to reach the lowest clustering levels, at which point the plot is no longer readable.

MCVE

For example, starting with the linkage from the documentation example:

openExample('stats/CompareClusterAssignmentsToClustersExample')
run CompareClusterAssignmentsToClustersExample
dendrogram(Z, 'Orient', 'Left', 'Labels', species);

This produces a dendrogram with the top 30 nodes visible. The numerically labeled leaf nodes hide the collapsed lower levels of the tree.

Dendrogram with collapsed lower levels
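To see which observations were folded into each numbered leaf, the second output of dendrogram (T, the leaf node number of every observation) may help. A minimal sketch, assuming the fisheriris data and an 'average' linkage; that method is an assumption and may differ from the documentation example, so substitute your own Z:

```matlab
load fisheriris                    % meas (150x4) and species (150x1 cell)
Z = linkage(meas, 'average');      % assumed method; substitute your own Z

% Default 30-leaf view. T(i) is the displayed leaf node that contains
% observation i, i.e. the number printed on the collapsed leaf.
[H, T] = dendrogram(Z, 30, 'Orientation', 'left');

% How many observations are hidden inside each displayed leaf
leafSizes = accumarray(T(:), 1);

% Species of the observations collapsed into displayed leaf 1
hiddenInLeaf1 = species(T == 1);
```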

I can increase the number of visible nodes to include all leaves at the expense of readability.

dendrogram(Z, 0, 'Orient', 'Left', 'Labels', species);   % P = 0 draws every leaf

Dendrogram with all leaves

What I would like

I would like something like a zoomed-in version of the plot above, as in the image below, but showing only the 30 nearest (lowest) clusters.

Zoomed-in dendrogram with all leaves
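A zoom like the one pictured can be approximated by drawing every leaf and clipping the distance axis at the height of the k-th merge. A sketch, assuming the fisheriris data and an 'average' linkage (substitute your own Z); it uses the maximum of the first k heights because centroid and median linkages can produce non-monotonic heights:

```matlab
load fisheriris                    % meas (150x4) and species (150x1 cell)
Z = linkage(meas, 'average');      % assumed method; substitute your own Z

k = 30;                            % number of lowest merges of interest

% P = 0 draws every leaf; with 'left' orientation the merge height is on the x-axis
dendrogram(Z, 0, 'Orientation', 'left', 'Labels', species);

% Rows of Z are in merge order, so the first k rows are the k lowest merges.
% Clip the height axis just above the tallest of them.
xlim([0, max(Z(1:k, 3))]);
```

The leaf axis is still as long as before, so with 5000 leaves you would still need to pan or zoom along it; only the upper merges are cut away.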

What I tried

I tried passing dendrogram only the first 30 rows of Z,

dendrogram(Z(1:30, :), 'Orient', 'Left');

but that fails with the error "Index exceeds matrix dimensions." as soon as one of the rows refers to a cluster or observation index that does not exist in a 30-row linkage matrix.
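The error follows from how linkage numbers nodes: for m observations, indices 1..m in columns 1 and 2 are observations and index m+i is the cluster created in row i. A k-row slice therefore still uses indices from the full 150-leaf numbering, while a standalone k-row linkage describes only k+1 leaves and can reference indices up to 2k. The sketch below, assuming the fisheriris data and an 'average' linkage, shows that mismatch and collects the observations involved in the first k merges. Because every cluster referenced in rows 1..k was formed in an earlier row within that range, its members already appear there as observation indices:

```matlab
load fisheriris                    % meas (150x4) and species (150x1 cell)
Z = linkage(meas, 'average');      % assumed method; substitute your own Z

k = 30;                            % number of lowest merges of interest
m = size(Z, 1) + 1;                % number of observations (leaves)

ids = Z(1:k, 1:2);                 % node indices merged in the first k rows
ids = ids(:);

fprintf('Z(1:%d,:) references index %d, but a %d-row linkage only allows indices up to %d\n', ...
        k, max(ids), k, 2*k);

% Observations that take part in the first k merges
leaves = unique(ids(ids <= m));
```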

I've also tried using the dendrogram property Reorder, but I'm having a hard time finding an ordering that sorts the clusters from nearest to furthest.

% The rows of Z are ordered from the closest merge to the furthest,
% so I can use them to create an ordering of the leaves
Y = reshape(Z(:, 1:2)', 1, []);   % node indices in merge order
Y = Y(Y < 151);                   % keep only observation indices (1..150)
dendrogram(Z, 30, 'Orient', 'Left', 'Labels', species, 'Reorder', Y);

I get the error:

In the requested order of nodes, some data points belonging to the same leaf on the graph are separated by points belonging to other leaves. Try using a different order.

Perhaps no such ordering exists when the entire tree is drawn, because branches would cross, but I hope there is a better ordering if I only look at part of the tree and ignore the higher-level clusters.
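One ordering that should satisfy the Reorder check for any collapsed view is the leaf order of the fully expanded tree, returned as the third output (outperm) of dendrogram. Each collapsed node is a subtree, and a subtree's leaves sit in one contiguous block of that order, so no displayed leaf gets split. This is a sketch under that reasoning, assuming the fisheriris data and an 'average' linkage; note that it orders leaves by tree layout, not strictly from nearest to furthest merge:

```matlab
load fisheriris                    % meas (150x4) and species (150x1 cell)
Z = linkage(meas, 'average');      % assumed method; substitute your own Z

% Leaf order of the fully expanded tree (P = 0 shows every observation)
[~, ~, outperm] = dendrogram(Z, 0);

% Reuse that order for the collapsed 30-node view in a new figure
figure;
[~, T] = dendrogram(Z, 30, 'Orientation', 'left', 'Reorder', outperm);
```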

Question

How can I improve my rendering to show the lowest level clusters in the dendrogram?
