Heatmap that clearly shows both high and low density areas (python)

I have a dataset of (x, y) positions that I would like to represent as a heatmap. Several areas have a much higher density than the rest of the region, and these high-density areas completely wash out the detail in the lower-density areas.

I think a Gaussian KDE gives a better view (and looks prettier) than 2D histograms or contour plots, so I would prefer solutions that use this method.

I can't post images because this account has less than 10 rep, but here are some examples of what I've tried.

I'll describe my code in terms of the already-posted snippets that I link to below rather than reposting them (some are quite long), but I'll edit them in if asked.

The first few are based on Ivo Bosticky's code in this question: An efficient method for calculating the density of irregularly spaced points. The images have the "style" that I am after. As shown in the album above, with small grid sizes the low-density areas are hard to make out and show no real detail. The larger grid sizes show some more patchy detail, but not a smooth transition from high density to low density. Feeding in the values on a log scale washes everything out at lower resolutions; at higher resolutions it shows detail, but does not blend the grid appropriately.

The second pair in this album is based on the scipy.stats.gaussian_kde example. Changing the grid size seems to have little or no effect, and the log scale washes everything out again.

So, TL;DR: How do I create a 2D Gaussian KDE that renders detail smoothly in both high- and low-density areas?

2 answers


The most naive way to represent scattered data is a scatter plot. The problem, of course, is that once a certain point density is reached, the scatter plot saturates and conveys no additional information. In that case we fall back on histograms or heatmaps based on some KDE. However, these techniques invariably eliminate detail in the less dense regions of the dataset.

My suggestion would be to do a scatter plot anyway, colored by your KDE values. For example:

pyplot.scatter(your_x,your_y,c=your_kde_value,marker='.',linewidth=0)

      

Here your_kde_value is an array containing the value of the KDE function at each point of your scatter plot (i.e., it must have the same shape as your_x and your_y).



The result might look like this (using a 10,000-point sample from a bivariate normal distribution):

[image: scatter plot colored by KDE value]

As you can see, the color carries all the detail in the dense center, while the outlying points remain visible.
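A minimal sketch of how your_kde_value could be computed, assuming scipy.stats.gaussian_kde with its default bandwidth and a synthetic bivariate normal sample standing in for the real data. The mean, covariance, and seed are illustrative choices, and sorting the points by density before plotting is an extra step that is not part of the one-liner above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic stand-in for the real data (assumption): 10,000 points
# drawn from a correlated bivariate normal distribution.
rng = np.random.default_rng(0)
your_x, your_y = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=10_000
).T

# gaussian_kde expects an array of shape (n_dims, n_points).
points = np.vstack([your_x, your_y])
kde = gaussian_kde(points)

# Evaluate the density at the sample points themselves, not on a grid,
# so the result has the same shape as your_x and your_y.
your_kde_value = kde(points)

# Optional extra: draw the densest points last so that sparse points
# do not paint over the high-density core.
order = np.argsort(your_kde_value)

fig, ax = plt.subplots()
sc = ax.scatter(your_x[order], your_y[order], c=your_kde_value[order],
                marker='.', linewidth=0)
fig.colorbar(sc, ax=ax, label='estimated density')
```

Call plt.show() or fig.savefig(...) afterwards to view the figure. Because every point is drawn individually, isolated points stay visible no matter how the color scale is stretched by the dense regions.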



Here's an example that illustrates my suggestion - it's based on this matplotlib example:



import matplotlib.pyplot as plt
import numpy as np



# make these smaller to increase the resolution
dx, dy = 0.01, 0.01

# generate 2 2d grids for the x & y bounds
y, x = np.mgrid[slice(1, 5 + dy, dy),
                slice(1, 5 + dx, dx)]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

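# filled contours: raise the level count (20) to resolve more detail
plt.contourf(x, y, z, 20, cmap='rainbow')
# thin black contour lines on top: adjust the level count (5) here too
plt.contour(x, y, z, 5, colors='k', linewidths=.25)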

plt.show()
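The same two calls can be applied to a KDE evaluated on a grid. The following is only a sketch under stated assumptions: the data are synthetic (a tight cluster on a sparse uniform background), the density comes from scipy.stats.gaussian_kde with its default bandwidth, and the levels are spaced logarithmically over three decades below the peak. None of these choices come from the matplotlib example above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from scipy.stats import gaussian_kde

# Synthetic data (assumption): a dense cluster on a sparse background.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.3, size=(2, 2000))
sparse = rng.uniform(-4.0, 4.0, size=(2, 500))
data = np.hstack([dense, sparse])      # shape (2, n_points)

kde = gaussian_kde(data)

# Evaluate the KDE on a regular 200 x 200 grid.
y, x = np.mgrid[-4:4:200j, -4:4:200j]
z = kde(np.vstack([x.ravel(), y.ravel()])).reshape(x.shape)

# Log-spaced levels: as many contour bands per decade at low density
# as at high density. The 1e-3 floor is an arbitrary illustrative choice.
levels = np.geomspace(z.max() * 1e-3, z.max(), 20)

fig, ax = plt.subplots()
cf = ax.contourf(x, y, z, levels=levels, norm=LogNorm(),
                 cmap='rainbow', extend='min')
# Thin black lines on every fourth level.
ax.contour(x, y, z, levels=levels[::4], colors='k', linewidths=.25)
fig.colorbar(cf, ax=ax, label='estimated density')
```

With linearly spaced levels nearly all bands would fall inside the cluster. Spacing them geometrically and coloring with LogNorm spreads the bands evenly across orders of magnitude, so the background also gets several distinct colors.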

      
