How do I create a histogram plot for my dataset?
I have the following DataFrame df:
time_diff avg_trips_per_day
631 1.0
231 1.0
431 1.0
7031 1.0
17231 1.0
20000 20.0
21000 15.0
22000 10.0
I want to create a histogram with time_diff on the x-axis and avg_trips_per_day on the y-axis, to see the distribution of time_diff values. So the y-axis should not be the frequency of the x values in df, but avg_trips_per_day. The problem is I don't know how to put time_diff into bins in order to treat it as a continuous variable.
This is what I'm trying, but it puts every individual time_diff value on the x-axis.
import matplotlib.pyplot as plt
import seaborn as sns

norm = plt.Normalize(df["avg_trips_per_day"].values.min(), df["avg_trips_per_day"].values.max())
colors = plt.cm.spring(norm(df["avg_trips_per_day"]))
plt.figure(figsize=(12,8))
ax = sns.barplot(x="time_diff", y="avg_trips_per_day", data=df, palette=colors)
plt.xticks(rotation='vertical', fontsize=12)
ax.grid(True, which='major', color='#d3d3d3', linewidth=1.0)
ax.grid(True, which='minor', color='#d3d3d3', linewidth=0.5)
plt.show()
import pandas as pd
import seaborn as sns
from io import StringIO
data = pd.read_table(StringIO("""time_diff avg_trips_per_day
631 1.0
231 1.0
431 1.0
7031 1.0
17231 1.0
20000 20.0
21000 15.0
22000 10.0"""), sep=r"\s+")
data['timegroup'] = pd.qcut(data['time_diff'], 3)
sns.barplot(x='timegroup', y='avg_trips_per_day', data=data)
Is this what you want?
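Note that pd.qcut makes quantile-based bins (roughly the same number of rows in each), while pd.cut makes equal-width bins over the range of time_diff. Here is a minimal sketch comparing the two on the data from the question. The names quantile_bins, width_bins, quantile_means and width_means are only illustrative, and the choice of 3 bins is taken from the snippet above rather than being a recommendation.

```python
from io import StringIO

import pandas as pd

data = pd.read_csv(StringIO("""time_diff avg_trips_per_day
631 1.0
231 1.0
431 1.0
7031 1.0
17231 1.0
20000 20.0
21000 15.0
22000 10.0"""), sep=r"\s+")

# qcut: quantile-based bins, so each bin holds roughly the same number of rows
quantile_bins = pd.qcut(data["time_diff"], 3)

# cut: equal-width bins over the observed range of time_diff
width_bins = pd.cut(data["time_diff"], 3)

# Mean of avg_trips_per_day per bin; observed=True drops bins with no rows
quantile_means = data.groupby(quantile_bins, observed=True)["avg_trips_per_day"].mean()
width_means = data.groupby(width_bins, observed=True)["avg_trips_per_day"].mean()

print(quantile_means)
print(width_means)
```

With equal-width bins the middle bin is empty for this data, so which function fits better depends on whether you want the x-axis to reflect the actual scale of time_diff or an even spread of rows.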
As you explained yourself, you don't need a histogram but a simple bar plot. From what I understood, though, you want to bin time_diff to build it.
The following should help you bin your data and group it by bin:
import numpy as np
import pandas as pd
n_bins = 10
# bin indexes, in case you want to use them for the x axis
x_bins = np.arange(n_bins)
# create bins
_, bins = pd.cut(df['time_diff'], bins=n_bins, retbins=True, right=False)
# regroup your data by the computed bin indexes and average avg_trips_per_day per bin
binned_data = df['avg_trips_per_day'].groupby(np.digitize(df['time_diff'], bins)).mean()
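To put the pieces together, here is a minimal end-to-end sketch of the binning idea, assuming the y value you want is the mean of avg_trips_per_day in each equal-width time_diff bin. The helper plot_binned and the variables edges, bin_idx, means, centers and widths are hypothetical names introduced for illustration, and filling empty bins with 0 is just one possible choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time_diff": [631, 231, 431, 7031, 17231, 20000, 21000, 22000],
    "avg_trips_per_day": [1.0, 1.0, 1.0, 1.0, 1.0, 20.0, 15.0, 10.0],
})

n_bins = 10  # assumed bin count; tune it for your data

# Equal-width bin edges spanning the observed range of time_diff
edges = np.linspace(df["time_diff"].min(), df["time_diff"].max(), n_bins + 1)

# Integer bin index per row; include_lowest keeps the minimum in the first bin
bin_idx = pd.cut(df["time_diff"], bins=edges, labels=False, include_lowest=True)

# Mean of avg_trips_per_day per bin; bins with no rows are filled with 0 here
means = (
    df.groupby(bin_idx)["avg_trips_per_day"]
    .mean()
    .reindex(range(n_bins), fill_value=0.0)
)

# Bar positions and widths on the continuous time_diff axis
centers = (edges[:-1] + edges[1:]) / 2
widths = np.diff(edges)


def plot_binned(centers, means, widths):
    """Draw one bar per bin, positioned on a continuous time_diff axis."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 8))
    ax.bar(centers, means, width=widths, edgecolor="white")
    ax.set_xlabel("time_diff")
    ax.set_ylabel("mean avg_trips_per_day")
    plt.show()
```

Calling plot_binned(centers, means, widths) should then draw the bars at the bin centers, so the x-axis stays numeric instead of showing one tick per distinct time_diff value. If a bar of height 0 for empty bins is misleading for your data, dropping those bins instead of reindexing may be a better fit.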