Using DataFrame.plot to create a subplot chart - how to use the ax parameter
I cannot wrap my head around the axes parameter, what it contains and how to use it to create subplots.
It would be very helpful if someone could explain what is going on in the following example
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))
for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", subplots=True, kind="scatter", ax=axes[idx / 4, idx % 4])
Here is the data (UCI Bike Sharing dataset): Here is the output of the code snippet (pairwise comparison of the features and the end result):
To be more specific, here are the parts that I understand (at least I think I do):
- plt.subplots returns a tuple containing a figure and an axes object
- enumerate() returns tuples containing the index of each feature and its name
- df.plot uses the column names to put the data into the subplots in fig
Here's what I don't understand
- What does the axes object contain? Again, based on the documentation and this, I understand that the axes contain "Axis, Tick, Line2D, Text, Polygon, etc.", but:
  - What are we accessing with axes[x, y]?
  - Why did the author of this example choose [idx / 4, idx % 4] as the index values?
Regarding the last question, about indexing the array as [idx / 4, idx % 4]:
The idea is to iterate over all subplots and all data columns at the same time. The problem is that the axes array is two-dimensional while the list of columns is one-dimensional. You therefore have to decide which of the two the loop follows, and map the loop index (or indices) onto the other one.
An intuitive way would be to use two loops
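for i in range(axes.shape[0]):
    for j in range(axes.shape[1]):
        df.plot(df.columns[i*axes.shape[1]+j], "cnt", ... , ax=axes[i,j])
Here i*axes.shape[1]+j maps the two dimensions of the NumPy array to the single dimension of the column list: the row index is multiplied by the number of columns (axes.shape[1]), so each row starts where the previous one ended.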
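As a quick check of the two-loop mapping, here is a minimal sketch in plain Python. It assumes the 3 by 4 grid from the question; the helper name flat_index is made up for illustration and is not part of pandas or matplotlib.

```python
# Assumed grid size, matching plt.subplots(nrows=3, ncols=4) in the question.
nrows, ncols = 3, 4


def flat_index(i, j, ncols):
    """Map a (row, column) position to a position in a flat, row-major list."""
    return i * ncols + j


# Visit the grid row by row, exactly like the two nested loops.
flat = [flat_index(i, j, ncols) for i in range(nrows) for j in range(ncols)]

# Every position from 0 to 11 should appear exactly once, in order.
print(flat)
```

Multiplying by the number of columns (not the number of rows) is what keeps the positions unique: with 4 columns, row 1 has to start at position 4.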
In the example from the question, the loop is over the columns, which means that we have to map a one-dimensional index to two dimensions. This is what [idx / 4, idx % 4] does, or rather should do. It only works in Python 2, because in Python 3 the / operator returns a float, which cannot be used as an array index. To make it clearer and independent of the Python version, you should use [idx // 4, idx % 4] instead. The // makes it explicit that integer division is being used. So for the first 4 values of idx (0,1,2,3), idx // 4 is 0, for the next 4 values it is 1, and so on. idx % 4 calculates the index modulo 4, so (0,1,2,3) maps to (0,1,2,3), then (4,5,6,7) maps to (0,1,2,3) again, etc.
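The index arithmetic can be tried out without any plotting. The following is a small sketch in plain Python, assuming 4 columns as in the question; grid_position is a hypothetical helper name used only for this illustration.

```python
NCOLS = 4  # assumed number of subplot columns, as in the question


def grid_position(idx, ncols=NCOLS):
    """Return the (row, column) of the idx-th subplot in a row-major grid."""
    return idx // ncols, idx % ncols


positions = [grid_position(idx) for idx in range(12)]
for idx, (row, col) in enumerate(positions):
    print(idx, "->", (row, col))

# In Python 3, true division gives a float, which is not a valid array index.
print(5 / 4, 5 // 4)
```

The built-in divmod(idx, 4) returns the same (row, column) pair in a single call, which some readers may find easier to recognise.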
An alternative solution using a single loop would be to flatten the axes array:
for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=axes.flatten()[idx])
or, perhaps most Pythonic:
for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=ax)
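For a version of the zip approach that runs end to end, here is a sketch under stated assumptions: the DataFrame is filled with random numbers as a stand-in for the bike data, and the column names feature_0 to feature_11 are invented for the example. Only "cnt" comes from the question.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data: 12 hypothetical feature columns plus the target "cnt".
rng = np.random.default_rng(0)
feature_names = [f"feature_{k}" for k in range(12)]
df = pd.DataFrame(rng.random((50, 12)), columns=feature_names)
df["cnt"] = rng.integers(0, 100, size=50)

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

# axes.flatten() turns the 3x4 array into 12 Axes in row-major order,
# so zip pairs each feature column with its own subplot.
for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(x=feature, y="cnt", kind="scatter", ax=ax)

fig.tight_layout()
```

Because zip stops at the shorter of its two inputs, this form also copes with fewer feature columns than subplots; the unused subplots simply stay empty.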
The axes object in your code is a 2D NumPy array of matplotlib Axes objects. Since the call to subplots() asked for 3 rows and 4 columns, the array will be 3 by 4. Indexing into the array like axes[r, c] gives you the Axes object corresponding to row r and column c, and you can pass that object as the ax keyword argument to a plotting method to draw the plot on those axes. For example, if you want to draw something in the second row and second column, you would call plot(..., ax=axes[1,1]).
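To see what the array holds, it can help to inspect it directly. This is a minimal sketch assuming the same 3 by 4 call as in the question; the plotted numbers and the title text are arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

# The second return value is a NumPy array with one entry per subplot.
print(type(axes).__name__, axes.shape)  # ndarray (3, 4)

# Indexing with [row, column] picks out a single matplotlib Axes object.
second_row_second_col = axes[1, 1]
print(isinstance(second_row_second_col, Axes))  # True

# Anything drawn through this object lands in that one subplot only.
second_row_second_col.plot([0, 1, 2], [0, 1, 4])
second_row_second_col.set_title("row 1, column 1")
```

Passing ax=axes[1, 1] to df.plot does the same thing: pandas draws into the Axes it is given.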
The code uses [idx / 4, idx % 4] as a way of converting indices (numbers from 0 to 11) into locations on a 3 by 4 grid. Try evaluating this expression yourself for each value of idx from 0 to 11, and you will see how it works.