Pandas MemoryError on a server with a lot of memory
I have a method for working with pandas DataFrames that behaves differently on two systems. When loading and working with one particular CSV source, I get memory errors on a Windows Server machine with 16 GB of RAM, but not on my local machine, which has only 12 GB.
def load_table(self, name, source_folder="", columns=None):
    """Load a table from memory or csv by name.

    loads a table from memory or csv. if loaded from csv saves the result
    table to the temporary list. An explicit call to save_table is
    necessary if the results need to survive clearing temporary storage
    @param string name the name of the table to load
    @param string source_folder the folder to look for the csv if the table
    is not already in memory
    @return DataFrame returns a DataFrame representing the table if found.
    @raises IOError if table cannot be loaded
    """
    #using copy in these first two to avoid modification of existing data
    #without an explicit save_table
    if name in self.tables:
        result = self.tables[name].copy()
    elif name in self.temp_tables:
        result = self.temp_tables[name].copy()
    elif os.path.isfile(name+".csv"):
        data_frame = pd.read_csv(name+".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(name+".xlsx"):
        data_frame = pd.read_excel(name+".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder+name+".csv"):
        data_frame = pd.read_csv(source_folder+name+".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder+name+".xlsx"):
        data_frame = pd.read_excel(source_folder+name+".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
and save_temp:
def save_temp(self, data_frame, name):
    """ save a table to the temporary storage
    @param DataFrame data_frame, the data frame we are storing
    @param string name, the key to index this value
    @throws ValueError throws an error if the data frame is empty
    """
    if data_frame.empty:
        raise ValueError("The data frame passed was empty", name, data_frame)
    self.temp_tables[name] = data_frame.copy()
Sometimes the MemoryError happens on read_csv. I tried loading this file manually in an interactive interpreter, which worked, and then saved it to the tables dictionary mentioned here. Calling load_table after that fails on the copy instead.
Just taking a manually loaded DataFrame and calling .copy() on it also raises a MemoryError with no message text on the server box, but not locally.
The server is running Windows Server 2012 R2, while my local machine runs Windows 7.
Both are 64-bit machines.
The server is 2.20 GHz with 2 processors and my local machine is 3.4 GHz.
Server: 16 GB of RAM
Local: 12 GB of RAM
Changing .copy() to .copy(False) allows the code to run on the server machine, but that doesn't answer the question of why it would get a MemoryError on a machine with more memory in the first place.
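To illustrate why the shallow copy gets through, here is a minimal sketch of the difference between the two calls. It is not taken from my actual code: the small frame and the helper name shares_buffer are made up for the example, and it assumes a recent pandas where Series.to_numpy exists (it does not in 0.16.0). A deep copy allocates a second buffer for the data, while copy(deep=False), which is the same call as .copy(False), only creates a new object over the existing buffer.

```python
import numpy as np
import pandas as pd


def shares_buffer(original, copied, column):
    """Return True if both frames back `column` with the same memory."""
    return np.shares_memory(original[column].to_numpy(),
                            copied[column].to_numpy())


# Hypothetical stand-in for the real table loaded from csv.
frame = pd.DataFrame({"a": np.arange(1000, dtype="int64")})

deep = frame.copy()               # duplicates the underlying data
shallow = frame.copy(deep=False)  # new object, same underlying data

deep_shares = shares_buffer(frame, deep, "a")
shallow_shares = shares_buffer(frame, shallow, "a")
print("deep copy shares memory: %s" % deep_shares)
print("shallow copy shares memory: %s" % shallow_shares)
```

So the deep copy needs roughly as much free memory again as the frame itself, which is presumably where the server runs out.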
Edited to add: both use pandas 0.16.0 and numpy 1.9.2. Apparently the server is running 32-bit Python while my local machine runs 64-bit Python; the version is 2.7.8 on both.
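For anyone who wants to check the same thing on their own machines, this is a small sketch of one way to find out whether the interpreter itself is 32-bit or 64-bit, independent of the operating system. It uses only the standard library; the function name interpreter_bits is just something I made up for the example.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python interpreter in bits."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print("Python %d.%d, %d-bit interpreter"
      % (sys.version_info[0], sys.version_info[1], bits))

# A 32-bit process is limited to a 4 GB address space (by default often
# only about 2 GB is usable on Windows), no matter how much RAM is installed.
if bits == 32:
    print("32-bit interpreter: installed RAM beyond the address space is not usable")
```

Running this on both machines should show whether the two installs really differ.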