Using DataFrame.plot to create a subplot chart - how to use the ax parameter

I cannot wrap my head around the axes parameter, what it contains and how to use it to create subplots.

It would be very helpful if someone could explain what is going on in the following example

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))
for idx, feature in enumerate(df.columns[:-1]):
  df.plot(feature, "cnt", subplots=True, kind="scatter", ax=axes[idx / 4, idx % 4])

      

Here is the data (UCI Bike Sharing dataset): [table with 5 rows of raw data]

Here is the output of the code snippet (pairwise comparison of each feature with the end result): [grid of scatter plots with titles]

To be more specific, here are the parts that I do understand (or at least think I do):

  • plt.subplots returns a tuple containing a figure and the axes object(s)
  • enumerate() yields tuples containing the index of the feature and its name
  • df.plot uses the column names to put the data into the subplots of fig

Here's what I don't understand:

  • What does the axes object contain? Again, based on the documentation and another reference, I understand that an axes contains "Axis, Tick, Line2D, Text, Polygon, etc.", but
    • what are we accessing with axes[x, y]?
    • why did the author of this example choose [idx / 4, idx % 4] as the index values?


2 answers


Regarding the last question, about indexing the array as [idx / 4, idx % 4]:

The idea is to iterate over all subplots and all data columns at the same time. The problem is that the axes array is two-dimensional while the list of columns is one-dimensional. You therefore have to decide which of the two the loop follows, and map the loop index (or indices) onto the other one.

An intuitive way would be to use two loops

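for i in range(axes.shape[0]):
    for j in range(axes.shape[1]):
        # row i, column j -> flat column index i * (number of columns) + j
        df.plot(df.columns[i*axes.shape[1]+j], "cnt", ... , ax=axes[i,j])

Here i*axes.shape[1]+j maps the two dimensions of the NumPy array to the single dimension of the column list. The multiplier is the number of columns, axes.shape[1], because each complete row accounts for that many entries.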
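As a quick sanity check of that row-major mapping, here is a minimal sketch in plain Python, assuming the same 3 by 4 grid as in the question. The variable names are only for illustration. It also shows why the multiplier has to be the number of columns rather than the number of rows.

```python
nrows, ncols = 3, 4  # same grid as plt.subplots(nrows=3, ncols=4)

# Row-major order: every complete row contributes ncols entries
# before the next row starts.
flat_indices = [i * ncols + j for i in range(nrows) for j in range(ncols)]
print(flat_indices)  # every index from 0 to 11 exactly once, in order

# Multiplying by nrows instead yields repeated indices and never
# reaches the last ones.
wrong = [i * nrows + j for i in range(nrows) for j in range(ncols)]
missing = sorted(set(range(nrows * ncols)) - set(wrong))
print(missing)  # [10, 11]
```

With a square grid the two multipliers coincide, so a mistake like this only shows up when the number of rows and columns differ.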



In the example from the question, the loop is over the columns, which means that we must somehow map a one-dimensional index to two dimensions. This is what [idx / 4, idx % 4] does, or rather should do. It will only work in Python 2. To make it clearer and independent of the Python version, you should actually use [idx // 4, idx % 4]. The // makes it clear that integer division is being used. So for the first 4 values of idx (0,1,2,3), idx // 4 is 0, for the next set of 4 values it is 1, and so on. idx % 4 calculates the index modulo 4, so (0,1,2,3) maps to (0,1,2,3), then (4,5,6,7) maps to (0,1,2,3) again, etc.
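To make the mapping concrete, here is a small sketch in plain Python that evaluates the expression for every idx from 0 to 11, assuming the 3 by 4 grid from the question. The built-in divmod returns the same pair in one call.

```python
ncols = 4  # number of columns in the subplot grid

for idx in range(12):
    row, col = idx // ncols, idx % ncols  # same pair as divmod(idx, ncols)
    print(idx, "->", (row, col))

positions = [divmod(idx, ncols) for idx in range(12)]

# In Python 3, "/" is true division and returns a float,
# which cannot be used as an array index.
print(5 / 4, 5 // 4)  # 1.25 1
```

The first four indices land in row 0, the next four in row 1, and the last four in row 2, each time walking through columns 0 to 3.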

An alternative solution using a single loop would be to flatten the axes array:

for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=axes.flatten()[idx])

      

or, perhaps the most Pythonic way:

for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=ax)
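To see that flattening gives the same placement as the index arithmetic, here is a small check using a NumPy array of strings as a stand-in for the axes array. The labels are made up for illustration. It relies on flatten() returning the elements in row-major order, which is NumPy's default.

```python
import numpy as np

nrows, ncols = 3, 4

# Stand-in for the axes array: each cell holds a label naming its position.
grid = np.array([[f"r{r}c{c}" for c in range(ncols)] for r in range(nrows)])

flat = grid.flatten()  # 1D copy, rows laid out one after another
print(flat.shape)      # (12,)

# Element idx of the flattened array is the cell at [idx // ncols, idx % ncols].
matches = [flat[idx] == grid[idx // ncols, idx % ncols]
           for idx in range(nrows * ncols)]
print(all(matches))    # True
```

Note that zip stops at the shorter of its inputs, so if there are fewer features than subplots, the remaining axes are simply left empty.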

      



The axes object in your code is a 2D NumPy array of matplotlib Axes objects. Since the call to subplots() asked for 3 rows and 4 columns, the array will be 3 by 4. Indexing into the array with axes[r, c] gives you the Axes object corresponding to row r and column c, and you can pass that object as the ax keyword argument to a plotting method to draw the plot on that Axes. For example, if you want to draw something in the second row and second column, you would call plot(..., ax=axes[1,1]).
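Here is a minimal sketch of that idea. The DataFrame is a made-up stand-in (the column name temp and all values are assumptions, not the real bike-sharing data), and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data standing in for the real dataset.
df = pd.DataFrame({"temp": [0.2, 0.4, 0.6, 0.8], "cnt": [10, 25, 40, 30]})

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

print(type(axes).__name__, axes.shape)  # ndarray (3, 4)
print(isinstance(axes[1, 1], Axes))     # True

# Draw only into the subplot in the second row, second column.
df.plot("temp", "cnt", kind="scatter", ax=axes[1, 1])

print(axes[1, 1].has_data())  # True: this subplot now holds the scatter
print(axes[0, 0].has_data())  # False: the others are still empty
```

Every other subplot stays blank until some plotting call receives it as ax, which is why the loop in the question has to hand out a different element of the array on each iteration.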



The code uses [idx / 4, idx % 4] as a way of converting indices (the numbers from 0 to 11) into locations on a 3 by 4 grid. Try evaluating that expression yourself for each value of idx from 0 to 11, and you will see how it works.
