Creating multiple arrays in a for loop (Python)

I am currently having a problem with NumPy arrays. If this question has already been asked elsewhere, I am sorry, but I feel like I have looked all over the place.

My initial problem was that I was trying to create an array and fill it with multiple sets of station data of different sizes. Since I cannot fill the same array with datasets that vary in size, I figured I needed to create a new array for each station dataset by defining an array inside a for loop that I use to iterate over each station dataset. The problem is, when looping through, each dataset will overwrite the previous dataset, returning only the last instance of the for loop.

Then I tried using + and then join operations to concatenate a new header for each array, but it turns out that this is not allowed when defining arrays. Below is the version of the program in which each data array overwrites the previous one. Please note that not all of the code is included and that this is part of a function definition.

for k in range(len(stat_id)):

    ## NOTE - more code precedes this final portion of the for loop, but was
    ## not included as it is unrelated to the issue at hand.

    # Bring all the data into one big array.
    metar_dat = np.zeros((len(stat_id),len(temp),7), dtype='object')
    for i in range(len(temp)):
        metar_dat[k,i] = np.dstack((stat_id[k], yr[i], month[i], day[i], time[i], temp[i], dwp[i]))
    #print np.shape(metar_dat[k])
    #print metar_dat[k]

#print np.shape(metar_dat) # Confirm success with shape read.
return metar_dat

      

After running the function and printing the array it returns, I get this (two empty arrays and the final filled array):

[[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
..., 
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]

[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
..., 
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]

[[\TZR 2015 7 ..., 2342 58 48]
[\TZR 2015 7 ..., 2300 59 47]
[\TZR 2015 7 ..., 2200 60 48]
..., 
[\TZR 2015 7 ..., 0042 56 56]
[\TZR 2015 7 ..., 0022 56 56]
[\TZR 2015 7 ..., 0000 56 56]]]

      

My question is:

How do I create an array for each station dataset so that I don't overwrite any previous data?

Or

How can I create a single array containing datasets with different numbers of rows?

I'm still new to Python (and new to posting here) and any ideas would be much appreciated.



3 answers


You are re-creating your array, filled with zeros, inside your k-loop every time, so each iteration throws away what the previous ones stored. Create it once outside of your nested loop (with np.zeros, or with np.empty if every element gets filled, as in your case) and you should be fine:



metar_dat = np.empty((len(stat_id),len(temp),7), dtype='object')
for k in range(len(stat_id)):
    for i in range(len(temp)):
        metar_dat[k,i] = np.dstack((stat_id[k], yr[i], month[i], day[i], time[i], temp[i], dwp[i]))
return metar_dat
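
To see the fix in isolation, here is a minimal self-contained sketch. The station ids and readings are made up for illustration, only three of the seven columns are kept, and each row is assigned directly from a list rather than through np.dstack. The point is only that the array is allocated once, before the outer loop.

```python
import numpy as np

# Hypothetical sample data: three stations sharing the same four observations.
stat_id = ["AAA", "BBB", "CCC"]
yr = [2015, 2015, 2015, 2015]
temp = [58, 59, 60, 56]

# Allocate once, before the loops, so earlier stations are not wiped out.
metar_dat = np.empty((len(stat_id), len(temp), 3), dtype=object)
for k in range(len(stat_id)):
    for i in range(len(temp)):
        metar_dat[k, i] = [stat_id[k], yr[i], temp[i]]

print(metar_dat[0, 0])  # first station, first observation
print(metar_dat[2, 3])  # last station, last observation
```

This only works as long as every station has the same number of observations; otherwise the rectangular shape no longer fits the data.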

      



You end up with an array metar_dat that is mostly 0 because it is the one you created in the last k iteration. It is len(stat_id) long (in the first dimension), but you only inserted data for the last k. You have discarded the results for the earlier values of k.

I would suggest collecting data in a dictionary rather than an array of objects.

metar_dat = dict()  # dictionary rather than object array
for id in stat_id:
    # Bring all the data into one big array.
    data = np.column_stack([yr, month, day, time, temp, dwp])
    # should produce a (len(temp), 6) integer array,
    # or a float array if one or more of the inputs is float
    metar_dat[id] = data
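
As a runnable illustration of the dictionary approach, the sketch below uses made-up readings in a hypothetical raw mapping, with a different number of rows per station. The names raw, station and cols are assumptions for the example and do not come from the question.

```python
import numpy as np

# Hypothetical raw readings: each station has a different number of rows.
raw = {
    "AAA": {"yr": [2015, 2015, 2015], "temp": [58, 59, 60], "dwp": [48, 47, 48]},
    "BBB": {"yr": [2015, 2015], "temp": [56.5, 56.0], "dwp": [56, 56]},
}

metar_dat = {}
for station, cols in raw.items():
    # One 2D array per station; row counts are free to differ.
    metar_dat[station] = np.column_stack([cols["yr"], cols["temp"], cols["dwp"]])

for station, data in metar_dat.items():
    print(station, data.shape, data.dtype)
```

Each station's array keeps its own shape and dtype, so nothing has to be padded or overwritten.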

      

If len(temp) changes for each id, you cannot create a meaningful 3d array with the shape (len(stat_id), len(temp), 7), unless you pad each one to the same maximum length. When thinking of arrays, think rectangles, not ragged lists.
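
If a rectangular array really is needed, padding could look roughly like the sketch below. The series data is made up, and using NaN as the fill value is an assumption that only suits float columns.

```python
import numpy as np

# Hypothetical per-station temperature series of unequal length.
series = {"AAA": [58.0, 59.0, 60.0], "BBB": [56.0, 56.0]}

max_len = max(len(v) for v in series.values())
# Start from NaN so missing slots are distinguishable from real readings.
padded = np.full((len(series), max_len), np.nan)
for row, values in enumerate(series.values()):
    padded[row, :len(values)] = values

print(padded)
print(np.nanmean(padded, axis=1))  # per-station mean, ignoring the padding
```

The cost is that every later computation has to be NaN-aware, which is one reason a dictionary of separate arrays is often simpler.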

A Python dictionary is a much better way to gather information with some sort of unique identifier.



Arrays of objects allow you to generalize the concept of numeric arrays, but they don't offer much extra power over lists or dictionaries. For example, you cannot add values to the "id" dimension.

You need to describe what you hope to do with this data once you have collected it. That will help guide our suggestions for how to represent it.

There are other ways to define the data structure for each id. It looks like yr, time, temp, etc. are arrays of equal length. If they are all numbers, they can be collected into a 6-column array. If it is important to store some as integers while others are floats (or even strings), you can use a structured array.
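
A structured array for one station might be built as in the following sketch. The field names and types in obs_dtype are assumptions chosen for illustration, and the values are made up.

```python
import numpy as np

# Field names and types are assumptions chosen for illustration.
obs_dtype = np.dtype([("yr", "i4"), ("time", "U4"), ("temp", "f4"), ("dwp", "f4")])

yr = [2015, 2015, 2015]
time = ["2342", "2300", "2200"]
temp = [58.0, 59.0, 60.0]
dwp = [48.0, 47.0, 48.0]

# Build from a list of tuples, one tuple per observation.
obs = np.array(list(zip(yr, time, temp, dwp)), dtype=obs_dtype)

print(obs["temp"])  # one column, as a plain float array
print(obs[0])       # one record with mixed types
```

Keeping time as a string field preserves leading zeros such as 0042, which an integer column would lose.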

Structured arrays are often created by reading column data from a csv file. Some columns will have string data (ids), others integers or even dates, and others floating point data. np.genfromtxt is a good tool for loading such a file.
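
For instance, assuming a csv layout like the made-up one below (the column names and rows are hypothetical, and io.StringIO stands in for a real file), np.genfromtxt can infer one type per column and return a structured array:

```python
import io
import numpy as np

# A small in-memory stand-in for a CSV file; the column names are made up.
csv_text = """station,yr,time,temp,dwp
TZR,2015,2342,58,48
TZR,2015,2300,59,47
ABC,2015,2200,60.5,48
"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,        # take field names from the header row
    dtype=None,        # infer a type per column
    encoding="utf-8",  # read strings as str rather than bytes
)

print(data.dtype.names)
print(data["temp"])
print(data[data["station"] == "ABC"])  # select one station's rows
```

With all stations in one structured array, a boolean mask on the station field replaces the need for a separate array per station.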



You can also take a look at this post:

How can I make multiple empty arrays in python?

Look up list comprehensions:

listOfLists = [[] for i in range(N)]

Now listOfLists has N empty lists in it.
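
Applied to the station problem, that idea might look like the sketch below, where N and the appended rows are made-up values. Each station gets its own list, and each list is converted to an array once collection is done.

```python
import numpy as np

N = 3  # hypothetical number of stations

# One independent empty list per station.
list_of_lists = [[] for _ in range(N)]

# Append a different number of rows to each; the values are made up.
list_of_lists[0].append([2015, 58])
list_of_lists[0].append([2015, 59])
list_of_lists[2].append([2015, 56])

# Convert each station's rows to its own array once collection is done.
arrays = [np.array(rows) for rows in list_of_lists]
print([a.shape for a in arrays])
```

Note that [[]] * N is not equivalent: it repeats one shared list N times, so appending to one entry appears to change them all.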
