Scatter plot for non-numeric data

I am learning to use matplotlib with pandas and I am having a few problems with it. I have a DataFrame with districts as the row labels and coffee shops as the column labels. The values are the dates on which each coffee shop opened in the respective district.

          starbucks    cafe-cool      barista   ........    60 shops
dist1     2008-09-18  2010-05-04     2007-02-21   ...............
dist2     2007-06-12  2011-02-17       
dist3
.
.
100 districts

      

I want to draw a scatter plot with the dates on the x-axis and the coffee shops on the y-axis. Since I couldn't figure out a direct way to plot this, I extracted the coffee shops as one list and the dates as another.

shops = list(df.columns.values)
dt = pd.DataFrame(df.ix['dist1'])
dates = dt.set_index('dist1')

      

I first tried plt.plot(dates, shops) and got a ZeroDivisionError: integer division or modulo by zero. I could not understand the reason for this. I saw in some posts that the data should be numeric, so I tried numbers for the y values, intending to relabel them with the ytick function.

y = [1, 2, 3, 4, 5, 6,...60] 

      

Still, plt.plot(dates, y) threw the same ZeroDivisionError. If I could get past this, perhaps I could label the plot using the tick function. Source: http://matplotlib.org/examples/ticks_and_spines/ticklabels_demo_rotation.html

I am trying to plot only the first row (dist1). For this I selected the first row as a DataFrame, df1 = df.ix[1], and then used the following:

for badges, dates in df.iteritems():
    date = dates
    ax.plot_date(date, yval)

    # Record the number and label of the coffee shop
    label_ticks.append(yval)
    label_list.append(badges)
    yval+=1 

      

I got an error on the line ax.plot_date(date, yval) saying that x and y must have the same first dimension. Since I plot the coffee shops for dist1 one by one, shouldn't x and y always have the same length? PS: date is a datetime.date object.



1 answer


To do this, you need to convert the date strings to datetime objects. As mentioned, you also need to map the coffee shops onto some numbering system and then change the tick labels accordingly.
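On the "x and y must have the same first dimension" error from the question: as written, the loop iterates over the columns of df, so dates is a whole column while yval is a single number, and the two lengths differ. A minimal sketch of the mismatch and the fix follows. The sample dates, the mismatch_error name and the Agg backend are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime

# Hypothetical sample: two opening dates for one coffee shop
dates = [datetime(2008, 9, 18), datetime(2007, 6, 12)]
yval = 1

fig, ax = plt.subplots()

# Two x values against a single scalar y: matplotlib rejects the shape mismatch
try:
    ax.plot(dates, yval, "o")
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

# Repeating yval once per date gives x and y the same length
yvals = np.full(len(dates), yval)
line, = ax.plot(dates, yvals, "o")
```

This is why the numbering has to be expanded into one y value per date before plotting.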

Here's an attempt:



import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime

def get_datetime(string):
    "Converts string '2008-05-04' to datetime"
    return datetime.strptime(string, "%Y-%m-%d")

# Generate dataframe (None marks a missing date so every column has the same length)
df = pd.DataFrame(dict(
             starbucks=["2008-09-18", "2007-06-12"],
             cafe_cool=["2010-05-04", "2011-02-17"],
             barista=["2007-02-21", None]),
             index=["dist1", "dist2"])

ax = plt.subplot(111)

label_list = []
label_ticks = []
yval = 1 # numbering system

# Iterate through coffee shops (df.items() replaces df.iteritems() in recent pandas)
for coffee_shop, dates in df.items():

    # Convert strings into datetime list, skipping missing entries
    datetimes = [get_datetime(date) for date in dates.dropna()]

    # Create list of yvals [yval, yval, ...] to plot against
    yval_list = np.zeros(len(datetimes))+yval

    # Markers only; ax.plot accepts datetimes, so plot_date is not needed
    ax.plot(datetimes, yval_list, "o")

    # Record the number and label of the coffee shop
    label_ticks.append(yval)
    label_list.append(coffee_shop)

    yval+=1 # Change the number so they don't all sit at the same y position

# Now set the yticks appropriately
ax.set_yticks(label_ticks)
ax.set_yticklabels(label_list)

# Set the limits so we can see everything
ax.set_ylim(ax.get_ylim()[0]-1,
            ax.get_ylim()[1]+1)

plt.show()
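The same idea can probably be expressed without an explicit loop by reshaping the table to long format first. This is only a sketch of an alternative, not part of the original answer: it swaps in pandas melt and factorize, and the names long_df, shop_codes and shop_names as well as the Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as above; None marks a missing opening date
df = pd.DataFrame(
    {
        "starbucks": ["2008-09-18", "2007-06-12"],
        "cafe_cool": ["2010-05-04", "2011-02-17"],
        "barista": ["2007-02-21", None],
    },
    index=["dist1", "dist2"],
)

# Long format: one row per (district, shop) pair, missing dates dropped
long_df = (
    df.rename_axis("district")
      .reset_index()
      .melt(id_vars="district", var_name="shop", value_name="opened")
      .dropna(subset=["opened"])
)
long_df["opened"] = pd.to_datetime(long_df["opened"], format="%Y-%m-%d")

# Map each shop name to an integer code to use as its y position
shop_codes, shop_names = pd.factorize(long_df["shop"])

fig, ax = plt.subplots()
ax.scatter(long_df["opened"], shop_codes + 1)

# Label the integer positions with the shop names
ax.set_yticks(range(1, len(shop_names) + 1))
ax.set_yticklabels(shop_names)
ax.set_ylim(0, len(shop_names) + 1)
```

With all 100 districts in the frame, every opening date lands on the row of its shop in a single scatter call.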

      
