Pandas MemoryError on a server with a lot of memory

Question


I have a method for working with DataFrames in pandas that behaves differently on two systems. When loading and working with a specific CSV source, I get MemoryErrors on a Windows Server machine with 16 GB of RAM, but not on my local machine, which has only 12 GB in total.

import os

import pandas as pd


def load_table(self, name, source_folder="", columns=None):
    """Load a table from memory or csv by name.

    loads a table from memory or csv. if loaded from csv saves the result
    table to the temporary list. An explicit call to save_table is
    necessary if the results want to survive clearing temporary storage
    @param string name the name of the table to load
    @param string source_folder the folder to look for the csv if the table
        is not already in memory
    @return DataFrame returns a DataFrame representing the table if found.
    @raises IOError if table cannot be loaded
    """
    #using copy in these first two to avoid modification of existing data
    #without an explicit save_table
    if name in self.tables:
        result = self.tables[name].copy()
    elif name in self.temp_tables:
        result = self.temp_tables[name].copy()
    elif os.path.isfile(name+".csv"):
        data_frame = pd.read_csv(name+".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(name+".xlsx"):
        data_frame = pd.read_excel(name+".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder+name+".csv"):
        data_frame = pd.read_csv(source_folder+name+".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder+name+".xlsx"):
        data_frame = pd.read_excel(source_folder+name+".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
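        result = data_frame
    else:
        # Nothing matched: raise the documented IOError instead of failing
        # later with an UnboundLocalError on `result`.
        raise IOError("Table %s could not be loaded" % name)
    return result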

And save_temp:

def save_temp(self, data_frame, name):
    """ save a table to the temporary storage

    @param DataFrame data_frame, the data frame we are storing
    @param string name, the key to index this value
    @throws ValueError throws an error if the data frame is empty
    """
    if data_frame.empty:
        raise ValueError("The data frame passed was empty", name, data_frame)
    self.temp_tables[name] = data_frame.copy()

Sometimes the MemoryError occurs in read_csv. I tried loading the file manually in an interactive interpreter, which worked, and then saved it into the tables dictionary referenced here. Calling load_table then fails on the copy instead.

Taking a manually loaded DataFrame and calling .copy() on it also raises a MemoryError, with no message, on the server, but not locally.

The server is running Windows Server 2012 R2, while my local computer is running Windows 7.

Both are 64-bit machines.

The server has 2 processors at 2.20 GHz and 16 GB of RAM; my local computer has a 3.4 GHz processor and 12 GB of RAM.

Changing .copy() to .copy(False), i.e. a shallow copy with deep=False, lets the code run on the server, but it doesn't explain why a machine with more memory raises a MemoryError in the first place.
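A likely reason .copy(False) avoids the error: a shallow copy builds a new DataFrame object around the same underlying NumPy buffers, so it allocates almost nothing, while the default deep copy allocates a second full set of buffers. The sketch below checks this with numpy.shares_memory on a small single-dtype float64 frame; the frame, its column names and the helper shares_buffer are arbitrary illustrations, and it uses to_numpy(), which needs a newer pandas (0.24+) than the 0.16.0 in the question.

```python
import numpy as np
import pandas as pd

# Arbitrary example frame: a single float64 block, so each column is a view into it.
df = pd.DataFrame(np.zeros((1000, 3)), columns=["a", "b", "c"])

deep = df.copy()               # default deep=True: allocates new data buffers
shallow = df.copy(deep=False)  # new DataFrame object that reuses the same buffers


def shares_buffer(left, right, column="a"):
    """Hypothetical helper: True if `column` of both frames uses the same memory."""
    return bool(np.shares_memory(left[column].to_numpy(), right[column].to_numpy()))


print(shares_buffer(df, deep))     # deep copy duplicated the data
print(shares_buffer(df, shallow))  # shallow copy points at the original data
```

The trade-off is that with a shallow copy the "copy" is no longer independent of the stored table (under pandas' copy-on-write mode it is protected from modification, in older versions it is not), which defeats the purpose of the copies in load_table. So .copy(False) hides the symptom rather than fixing the cause.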

Edit: both machines use pandas 0.16.0, numpy 1.9.2 and Python 2.7.8. It turns out the server is running 32-bit Python, while my local machine runs 64-bit Python.
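A 64-bit OS can still run a 32-bit interpreter, so the check has to be done on the running Python process itself. A minimal sketch that works on both Python 2.7 and 3: the pointer size reported by struct.calcsize("P") gives the interpreter's bitness, and sys.maxsize > 2**32 is an equivalent test. The function name python_bitness is just an illustration.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64: the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


# The OS being 64-bit says nothing about the interpreter; inspect the process.
print(sys.version)
print("interpreter: %d-bit" % python_bitness())
print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2 ** 32))
```

Running this on both machines would have shown the 32-bit versus 64-bit difference directly.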


python pandas memory deep-copy windows-server-2012-r2

lathomas64 May 01 '15 at 14:31


1 answer

EdChum · Accepted Answer · 2015-05-01T16:59:48+0000

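So your problem was that, despite both machines having the same pandas version and a 64-bit operating system, the server was running 32-bit Python. A 32-bit process on Windows is limited to 2 GB of address space by default, no matter how much RAM is installed. A deep .copy() needs room for a second full copy of the data on top of the original, which is why it failed on the server, while .copy(False), which reuses the existing data, did not.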
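To get a rough idea in advance of whether a DataFrame can be deep-copied inside that limit, you can compare its in-memory size against the address space. This is a sketch under stated assumptions: the 2 GiB figure is the default for a 32-bit Windows process that is not large-address-aware, the interpreter and loaded libraries already use part of it, and fragmentation can make large contiguous allocations fail even below it. The names ADDRESS_SPACE_32BIT, frame_bytes and deep_copy_might_fit are hypothetical, and memory_usage(deep=True) requires pandas 0.17.1 or later, newer than the 0.16.0 in the question.

```python
import numpy as np
import pandas as pd

# Assumption: ~2 GiB of user address space for a 32-bit Windows process
# without the large-address-aware flag. The usable amount is lower in practice.
ADDRESS_SPACE_32BIT = 2 * 1024 ** 3


def frame_bytes(df):
    """Approximate in-memory size of a DataFrame, including object contents."""
    return int(df.memory_usage(index=True, deep=True).sum())


def deep_copy_might_fit(df, limit=ADDRESS_SPACE_32BIT):
    """Rough check: the original plus one deep copy must both fit under `limit`."""
    return 2 * frame_bytes(df) < limit


df = pd.DataFrame(np.zeros((1000000, 10)))  # 10 float64 columns, about 80 MB
print("frame size: %.1f MB" % (frame_bytes(df) / 1e6))
print("deep copy might fit in a 32-bit process: %s" % deep_copy_might_fit(df))
```

Note that in load_table the CSV path already holds two copies at once (the frame returned by read_csv and the copy made in save_temp), so a table only needs to approach a third of the usable address space before a later copy fails.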
