D3.js with a large number (> 500,000) of points? Clustering?

I am considering plotting a scatterplot with a large number of points (500,000 and up).

We are currently doing this in Python with Matplotlib. It displays the points and provides pan and zoom controls. I don't believe it does any clustering of the points; as far as I can tell it just paints them all, and when you zoom in they are all there.

I was looking at building the chart in JavaScript to make it a little easier to share, and at D3.js to see whether something like this is possible there. I found this example of a basic scatter plot:

http://bl.ocks.org/mbostock/3887118

First, can you plot that many points (500,000 and up)? I was under the impression that you couldn't because of the overhead of all the DOM objects. Are there any ways to get around this?

Second, is there any clustering support, a library, or even an example of this in D3.js?

Third, if anyone knows good examples of pan / zoom and clustering functions, or even just a packaged JS library that handles it, that would be awesome.

Fourth, it would be nice to have click handlers for each point - and display some text either in an overlay or even in a separate window. Any thoughts on this?

+3


2 answers


Can you draw half a million points with D3? Of course, but not with SVG. You will need to use canvas (here's a simple 10,000-point example that includes brush selection: http://bl.ocks.org/emeeks/306e64e0d687a4374bcd ), which means you no longer have a separate element for each point to attach click handlers to. You won't be able to render half a million points with SVG because, as you mentioned, that many DOM elements will bog down your interface.
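As a rough illustration of the canvas approach, here is a minimal sketch that paints every point straight onto a 2D context instead of creating one DOM element per point. The names `drawPoints` and `linearScale` are hypothetical helpers written for this example, not part of D3; in a real page `linearScale` would likely be replaced by a D3 linear scale and `ctx` would come from `canvas.getContext("2d")`.

```javascript
// Draw each point as a small filled square on a 2D canvas context.
// `ctx` is a CanvasRenderingContext2D (or any object with the same members).
// `toX` / `toY` map data values to pixel coordinates.
function drawPoints(ctx, points, toX, toY, size) {
  const half = size / 2;
  ctx.fillStyle = "steelblue";
  for (let i = 0; i < points.length; i++) {
    const p = points[i];
    // One fillRect call per point; no per-point DOM node is created.
    ctx.fillRect(toX(p.x) - half, toY(p.y) - half, size, size);
  }
  return points.length;
}

// Hypothetical stand-in for a D3 linear scale: maps [d0, d1] onto [r0, r1].
function linearScale(d0, d1, r0, r1) {
  return function (v) {
    return r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
  };
}
```

In a browser this might be called as `drawPoints(canvas.getContext("2d"), data, linearScale(0, 10, 0, 800), linearScale(0, 10, 600, 0), 2)`, assuming the data values fall between 0 and 10. The y range is inverted because canvas pixel rows grow downward.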

D3 includes quadtree support, which can be used for clustering. The example above uses it to speed up searches, but you could also use it to group nearby points together at particular zoom scales.
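To make the idea of scale-dependent grouping concrete, here is a minimal sketch that aggregates nearby points into one marker per cell. It deliberately uses a plain uniform grid rather than D3's quadtree, so it is a simpler stand-in and not the technique from the linked example. `binPoints` and `cellSizeForZoom` are hypothetical names, and the rule "cell size shrinks in proportion to the zoom factor" is an assumption chosen for illustration.

```javascript
// Group points into square cells of `cellSize` (in data units) and return one
// aggregate per non-empty cell: the centroid of its points and their count.
function binPoints(points, cellSize) {
  const cells = new Map();
  for (const p of points) {
    const key = Math.floor(p.x / cellSize) + "," + Math.floor(p.y / cellSize);
    let cell = cells.get(key);
    if (!cell) {
      cell = { sumX: 0, sumY: 0, count: 0 };
      cells.set(key, cell);
    }
    cell.sumX += p.x;
    cell.sumY += p.y;
    cell.count += 1;
  }
  // Map preserves insertion order, so clusters come back in first-seen order.
  return Array.from(cells.values(), function (c) {
    return { x: c.sumX / c.count, y: c.sumY / c.count, count: c.count };
  });
}

// Assumed rule for this sketch: zooming in by a factor k shrinks cells by k,
// so clusters split apart into finer groups as the user zooms.
function cellSizeForZoom(baseCellSize, zoomK) {
  return baseCellSize / zoomK;
}
```

On each zoom change you would recompute `binPoints(data, cellSizeForZoom(base, k))` and draw the resulting aggregates, for example sizing each marker by its `count`.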

Ultimately your options are:

1) Some other library or a custom implementation that renders to canvas and polls the mouse position to give you the data item drawn at that point.

2) A custom hybrid D3 approach that groups elements that are close together and only renders the SVG elements appropriate to the current zoom level and canvas position (pan).
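The mouse-lookup part of option 1 can be sketched roughly as follows. Because canvas pixels carry no click handlers, the code keeps a spatial index of the points and looks up the one nearest to the cursor. This sketch uses a uniform grid index, where the linked example uses D3's quadtree for the same purpose. `buildGridIndex` and `findNearest` are hypothetical names, and the points are assumed to be stored in the same coordinate space as the mouse position (for example, already projected to pixels).

```javascript
// Bucket points by grid cell so a lookup only inspects nearby points.
function buildGridIndex(points, cellSize) {
  const cells = new Map();
  for (const p of points) {
    const key = Math.floor(p.x / cellSize) + "," + Math.floor(p.y / cellSize);
    const bucket = cells.get(key);
    if (bucket) bucket.push(p);
    else cells.set(key, [p]);
  }
  return { cells: cells, cellSize: cellSize };
}

// Return the point closest to (x, y) within `radius`, or null if none.
// Assumes radius <= index.cellSize, so scanning the 3x3 block of cells
// around the query is enough to see every candidate.
function findNearest(index, x, y, radius) {
  const cx = Math.floor(x / index.cellSize);
  const cy = Math.floor(y / index.cellSize);
  let best = null;
  let bestD2 = radius * radius;
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      const bucket = index.cells.get((cx + dx) + "," + (cy + dy));
      if (!bucket) continue;
      for (const p of bucket) {
        const d2 = (p.x - x) * (p.x - x) + (p.y - y) * (p.y - y);
        if (d2 <= bestD2) {
          best = p;
          bestD2 = d2;
        }
      }
    }
  }
  return best;
}
```

In a browser, a click listener on the canvas could pass `event.offsetX` and `event.offsetY` to `findNearest` and show the returned item's text in an overlay. If the view is panned or zoomed, the mouse position would first need to be converted back into the index's coordinate space.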

+3


Yes, D3.js can be made to work with data on the scale of millions of points in two ways:



For clustering, I would use the scikits library from Python. There are many clustering libraries in JavaScript, but they are not very reliable, as they mainly cover k-means or hierarchical clustering. I would pre-compute the cluster coordinates with scikits and then render them using D3.

D3 handles panning and zooming. Click handlers and text display are also available in D3 ( http://bl.ocks.org/robschmuecker/7880033 ).

+1

