Running out of memory while plotting, Python

I am fetching a large amount of data from a database, which I then draw as a scatter plot. However, when I use the full data set, the program runs out of memory and aborts. It takes more than 30 minutes to run, and the data list has about 20-30 million entries.

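import sqlite3 as lite

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

map = Basemap(projection='merc',
              resolution='c', area_thresh=10,
              llcrnrlon=-180, llcrnrlat=-75,
              urcrnrlon=180, urcrnrlat=82)

map.drawcoastlines(color='black')
# map.fillcontinents(color='#27ae60')
with lite.connect('database.db') as con:
    start = 1406851200  # 2014-08-01 00:00 UTC
    end = 1409529600    # 2014-09-01 00:00 UTC
    cur = con.cursor()
    # Let sqlite3 bind the parameters instead of formatting them into the SQL string
    cur.execute('SELECT latitude, longitude FROM plot WHERE unixtime >= ? AND unixtime < ?',
                (start, end))
    data = cur.fetchall()  # list of 20-30 million (latitude, longitude) tuples
    y, x = zip(*data)      # y = latitudes, x = longitudes
    x, y = map(x, y)       # project lon/lat to map coordinates
    plt.scatter(x, y, s=0.05, alpha=0.7, color="#e74c3c", edgecolors='none')
    plt.savefig('Plot.pdf')
    plt.savefig('Plot.png')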

I suspect the problem might be the zip(*data) call, but I really don't know. How can I rewrite my existing code to use less memory, for example by splitting up the plotting process? My idea is to split the time period in half and run the same steps for each half before saving the figure, but I'm not sure whether that helps at all. I have no idea where the problem actually lies.
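
The idea of splitting the time period could look roughly like the sketch below. It is a sketch under assumptions: `plot_in_time_chunks` and its `project` parameter are hypothetical names, `project` stands in for the Basemap instance so the code runs without Basemap, and the table layout (`plot` with `latitude`, `longitude`, `unixtime`) is taken from the question. Each time window is queried with bound parameters and drawn with its own `scatter` call onto the same axes. Note that the figure still keeps every plotted point, so this mainly lowers the peak caused by the intermediate Python lists and tuples, not the memory the finished plot needs.

```python
import sqlite3

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for writing files
import matplotlib.pyplot as plt
import numpy as np


def plot_in_time_chunks(con, start, end, n_chunks, project=lambda lon, lat: (lon, lat)):
    """Query and scatter-plot the rows in [start, end) in n_chunks consecutive windows.

    `project` is a hypothetical stand-in for the Basemap instance (map(lon, lat)).
    Returns the figure and the total number of plotted points.
    """
    fig, ax = plt.subplots()
    # Window boundaries; consecutive windows share an edge, so no rows are skipped
    edges = np.linspace(start, end, n_chunks + 1).astype(np.int64)
    total = 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        rows = con.execute(
            "SELECT latitude, longitude FROM plot WHERE unixtime >= ? AND unixtime < ?",
            (int(lo), int(hi)),  # sqlite3 needs plain Python ints, not numpy.int64
        ).fetchall()
        if not rows:
            continue
        arr = np.array(rows, dtype=np.float64)  # shape (n, 2): latitude, longitude
        del rows  # free the list of tuples before the next query
        x, y = project(arr[:, 1], arr[:, 0])
        ax.scatter(x, y, s=0.05, alpha=0.7, color="#e74c3c", edgecolors="none")
        total += len(arr)
    return fig, total
```

Calling `fig.savefig('Plot.png')` once after the loop writes all windows into a single image.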

1 answer


If you think the problem is the zip function, why not use a NumPy array to reshape your data into the right format? Something like this:

import numpy

data = numpy.array(cur.fetchall())  # shape (n, 2): columns are latitude, longitude
lat = data[:,0]
lon = data[:,1]
x,y = map(lon, lat)
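
`numpy.array(cur.fetchall())` still builds the complete list of tuples first, so for a moment both the list and the array are in memory. One way to avoid that, sketched below under assumptions, is to preallocate the arrays and fill them in batches with `cursor.fetchmany()`. `fetch_lat_lon` and `batch_size` are hypothetical names, and the sketch assumes the table is not modified between the `COUNT(*)` query and the `SELECT`.

```python
import sqlite3

import numpy as np


def fetch_lat_lon(con, start, end, batch_size=100_000):
    """Load (latitude, longitude) rows into two float64 arrays without one big list.

    Counts the rows first so the arrays can be preallocated, then fills them
    batch by batch with cursor.fetchmany().
    """
    params = (start, end)
    (n,) = con.execute(
        "SELECT COUNT(*) FROM plot WHERE unixtime >= ? AND unixtime < ?", params
    ).fetchone()
    lat = np.empty(n, dtype=np.float64)
    lon = np.empty(n, dtype=np.float64)
    cur = con.execute(
        "SELECT latitude, longitude FROM plot WHERE unixtime >= ? AND unixtime < ?", params
    )
    i = 0
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size tuples alive at a time
        if not rows:
            break
        block = np.array(rows, dtype=np.float64)  # shape (len(rows), 2)
        lat[i:i + len(rows)] = block[:, 0]
        lon[i:i + len(rows)] = block[:, 1]
        i += len(rows)
    return lat[:i], lon[:i]
```

The returned arrays can be passed directly to `map(lon, lat)`.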

Also, your generated PDF will be very large and slow to render in many PDF readers, because it is saved as vector graphics by default: every one of your millions of data points is stored individually and has to be drawn when someone opens the document. I recommend adding the argument rasterized=True to your plt.scatter() call. This stores the scatter points as a bitmap inside your PDF (see the docs).
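
A minimal sketch of the `rasterized=True` suggestion, writing the PDF to an in-memory buffer. The helper name `scatter_to_pdf` and its `dpi` default are illustrative choices, not part of the original answer. With a vector output format, the `dpi` passed to `savefig` controls only the resolution of the rasterized artists; axes, ticks and labels stay vector.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for writing files
import matplotlib.pyplot as plt
import numpy as np


def scatter_to_pdf(x, y, dpi=300):
    """Draw a scatter plot whose points are embedded as a bitmap inside a PDF."""
    fig, ax = plt.subplots()
    coll = ax.scatter(x, y, s=0.05, alpha=0.7, color="#e74c3c",
                      edgecolors="none", rasterized=True)
    buf = io.BytesIO()
    # dpi sets the resolution of the rasterized scatter; the rest stays vector
    fig.savefig(buf, format="pdf", dpi=dpi)
    plt.close(fig)
    return coll, buf.getvalue()
```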

If none of this helps, I would investigate further by commenting out lines from the bottom up: first comment out plt.savefig('Plot.png') and see whether the memory problem goes away. If not, comment out the line before it, and so on.
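
Instead of commenting lines out, a different technique is to measure each step with the standard library's `tracemalloc` module. The context manager below is a hypothetical helper (`peak_memory` is not part of any library) that records the peak traced allocation for one block of code. It only sees memory allocated through Python's allocator (recent NumPy versions report their array buffers to it as well), so memory allocated directly in C or C++ code, such as the Agg renderer's pixel buffers, may not show up.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def peak_memory(label, report):
    """Store the peak traced allocation (in bytes) of the with-block in report[label].

    Not reentrant: it starts and stops tracemalloc itself, so don't nest it.
    """
    tracemalloc.start()
    try:
        yield
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[label] = peak


# Example: wrap the suspicious steps one at a time, e.g.
# report = {}
# with peak_memory("fetchall", report):
#     data = cur.fetchall()
# with peak_memory("zip", report):
#     y, x = zip(*data)
# print(report)
```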
