Scatter plot with alpha is still opaque in areas where points are dense

I have a scatter plot that draws a very large number of points from two different datasets. In some areas there are so many points that even at a very low alpha (for example, alpha = 0.1) you cannot see through the mass, yet at that alpha you can barely see the points in the sparse regions. Is there a way to cap the alpha of stacked points, or otherwise make the background visible under dense areas without washing out the sparse areas?

The code snippet looks like this:

# Code to populate the datasets not included.
fig, ax = plt.subplots()
ax.scatter(x1, y1, s=12, color='red')
ax.scatter(x2, y2, s=12, color='blue', alpha=0.1)
# Plus code to do xlabels and such not included.

      

to produce this:

enter image description here

As you can see, it is difficult to see the boundaries of the lower red leg and still make out the upper blue leg.

Is there a way to create this effect?

Thanks in advance.

EDIT

One good suggestion is to use hexbin instead of scatter. It sounds promising, but the colors still don't blend nicely. For example,

ax.hexbin(x1, y1, cmap='Reds', mincnt=1, vmax=100)
ax.hexbin(x2, y2, cmap='Blues', mincnt=1, vmax=50, alpha=0.8, linewidths=0)

      

gives:

enter image description here

It would be really nice to have the blues and reds blended. Maybe each pixel could take its R value from one dataset and its B value from the other, or something similar? But that doesn't appear to be an option with hexbin.

EDIT

After using Thomasillo's answer:

enter image description here

Thanks, I think it looks better than the original.

1 answer


1) To improve the hexbin plot, you can use the bins='log' option. This maps each hexagon's count to a color on a logarithmic scale, so that low counts stand out better relative to high ones.

2) Calculate the density for each dataset yourself, then generate a color from both densities, for example by letting one density drive the red channel and the other drive the blue channel. Display the result using imshow.
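For the first suggestion, a minimal sketch might look like the following. The helper name hexbin_log_overlay and its gridsize default are made up for illustration, and the shared extent is an extra assumption (not part of the question's code) so that both hexagon grids line up.

```python
import matplotlib.pyplot as plt
import numpy as np

def hexbin_log_overlay(ax, x1, y1, x2, y2, gridsize=60):
    """Overlay two hexbin plots with logarithmic color scaling (illustrative helper)."""
    # Use one common extent so the hexagons of both datasets coincide.
    extent = (min(np.min(x1), np.min(x2)), max(np.max(x1), np.max(x2)),
              min(np.min(y1), np.min(y2)), max(np.max(y1), np.max(y2)))
    # bins='log' puts the counts on a log color scale; mincnt=1 hides empty hexagons.
    hb1 = ax.hexbin(x1, y1, gridsize=gridsize, extent=extent,
                    cmap='Reds', mincnt=1, bins='log')
    hb2 = ax.hexbin(x2, y2, gridsize=gridsize, extent=extent,
                    cmap='Blues', mincnt=1, bins='log',
                    alpha=0.8, linewidths=0)
    return hb1, hb2
```

Call it on an existing Axes, for example fig, ax = plt.subplots() followed by hexbin_log_overlay(ax, x1, y1, x2, y2) and plt.show(). The blue layer is still drawn on top of the red one, so this improves contrast but does not truly blend the two colors.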



For the second approach, something like:

import matplotlib.pyplot as plt
import numpy as np

# Two synthetic datasets whose centers are slightly offset in x.
x1 = np.random.binomial(5100, 0.5, 51100)
y1 = np.random.binomial(5000, 0.7, 51100)
x2 = np.random.binomial(5000, 0.5, 51100)
y2 = np.random.binomial(5000, 0.7, 51100)

# Regular grid of bin centers covering the region of interest.
xmin, xmax, xnum = 2350, 2700, 50
ymin, ymax, ynum = 3350, 3700, 50
xx, yy = np.mgrid[xmin:xmax:xnum*1j, ymin:ymax:ynum*1j]

def closest_idx(x, y):
    # Indices of the grid point nearest to (x, y).
    idcs = np.argmin((xx - x)**2 + (yy - y)**2)
    i_x, i_y = np.unravel_index(idcs, (xnum, ynum))
    return i_x, i_y

def calc_count(xdat, ydat):
    # Count how many data points are closest to each grid point.
    ct = np.zeros_like(xx)
    for x, y in zip(xdat, ydat):  # itertools.izip exists only in Python 2
        ix, iy = closest_idx(x, y)
        ct[ix, iy] += 1
    return ct

ct1 = calc_count(x1, y1)
ct2 = calc_count(x2, y2)

def color_mix(c1, c2):
    # Average the two colors channel by channel.
    cm = np.empty_like(c1)
    for i in [0, 1, 2]:
        cm[i] = (c1[i] + c2[i]) / 2.
    return cm

# Normalize the counts to densities in [0, 1].
dens1 = ct1 / np.max(ct1)
dens2 = ct2 / np.max(ct2)

# Dataset 1 fades from white to red, dataset 2 from white to blue.
ct1_color = np.array([1 + 0*dens1, 1 - dens1, 1 - dens1])
ct2_color = np.array([1 - dens2, 1 - dens2, 1 + 0*dens2])

col = color_mix(ct1_color, ct2_color)
# Reorder from (channel, x, y) to (y, x, channel), as imshow expects.
col = np.transpose(col, axes=(2, 1, 0))

plt.imshow(col, interpolation='nearest', extent=(xmin, xmax, ymin, ymax), origin='lower')
plt.show()
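The per-point loop above gets slow for large datasets. As a rough alternative, the same white-to-red and white-to-blue mixing can be sketched with np.histogram2d. This is a swapped-in technique, not the method used above: histogram2d bins by cell edges rather than by nearest grid point, so the counts can differ slightly near cell boundaries. The function name density_rgb and its parameters are hypothetical.

```python
import numpy as np

def density_rgb(x1, y1, x2, y2, xlim, ylim, bins=50):
    """Return a (bins, bins, 3) RGB image mixing two 2D densities (illustrative helper)."""
    hist_range = [xlim, ylim]
    # Count the points of each dataset on the same regular grid.
    ct1, _, _ = np.histogram2d(x1, y1, bins=bins, range=hist_range)
    ct2, _, _ = np.histogram2d(x2, y2, bins=bins, range=hist_range)
    # Normalize to [0, 1]; the max(..., 1) guards against an all-empty histogram.
    dens1 = ct1 / max(ct1.max(), 1)
    dens2 = ct2 / max(ct2.max(), 1)
    # Dataset 1 fades from white to red, dataset 2 from white to blue.
    c1 = np.stack([np.ones_like(dens1), 1 - dens1, 1 - dens1], axis=-1)
    c2 = np.stack([1 - dens2, 1 - dens2, np.ones_like(dens2)], axis=-1)
    rgb = (c1 + c2) / 2
    # histogram2d puts x on axis 0; imshow expects rows to be y, so swap them.
    return rgb.transpose(1, 0, 2)
```

The result can be displayed the same way as before, for example plt.imshow(density_rgb(x1, y1, x2, y2, (xmin, xmax), (ymin, ymax)), interpolation='nearest', extent=(xmin, xmax, ymin, ymax), origin='lower').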

      
