How do I make sure the netcdf file is closed in python?
It's probably simple, but I haven't been able to find a solution online. I'm working with a set of datasets stored as netCDF files. I open each one, read some data points, and then move on to the next file. I keep running into mmap errors, and the script slows down as more files are read. This may be because the netCDF files are not being closed properly by the .close() call.
I tested this:
from scipy.io.netcdf import netcdf_file as ncfile
f=ncfile(netcdf_file,mode='r')
f.close()
Then if I try
>>> f
<scipy.io.netcdf.netcdf_file object at 0x24d29e10>
and
>>> f.variables['temperature'][:]
array([ 1234.68034431, 1387.43136567, 1528.35794546, ..., 3393.91061952,
3378.2844357 , 3433.06715226])
So it seems the file is still open? What does close() actually do? How do I know it worked? Is there a way to close or clear all open files from Python?
Software: Python 2.7.6, scipy 0.13.2, netcdf 4.0.1
The code for f.close:
Definition: f.close(self)
Source:
def close(self):
    """Closes the NetCDF file."""
    if not self.fp.closed:
        try:
            self.flush()
        finally:
            self.fp.close()
f.fp is a file object. So:
In [451]: f.fp
Out[451]: <open file 'test.cdf', mode 'wb' at 0x939df40>
In [452]: f.close()
In [453]: f.fp
Out[453]: <closed file 'test.cdf', mode 'wb' at 0x939df40>
But playing with f, I can see that I can still create dimensions and variables after closing it. Calling f.flush(), however, raises an error.
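To answer "how do I know it worked?", one option is to look at the underlying file object directly. This is only a sketch: it relies on the fp attribute seen in the source above, which is an internal detail rather than documented API, and the file name and variable are made up for the example.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

# Throwaway file so the example is self-contained (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "test.nc")

f = netcdf_file(path, "w")
f.createDimension("x", 3)
var = f.createVariable("temperature", "d", ("x",))
var[:] = np.arange(3, dtype="d")

# f.fp is the plain Python file object that netcdf_file wraps.
closed_before = f.fp.closed
f.close()
closed_after = f.fp.closed
```

Here closed_before is False and closed_after is True, which matches the open/closed file reprs in the session above.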
It appears that mmap is not used when writing data, only when reading.
def _read_var_array(self):
    ....
    if self.use_mmap:
        mm = mmap(self.fp.fileno(), begin_+a_size, access=ACCESS_READ)
        data = ndarray.__new__(ndarray, shape, dtype=dtype_,
                buffer=mm, offset=begin_, order=0)
    else:
        pos = self.fp.tell()
        self.fp.seek(begin_)
        data = fromstring(self.fp.read(a_size), dtype=dtype_)
        data.shape = shape
        self.fp.seek(pos)
I don't have much experience with mmap. It looks like it sets up an mmap object based on a block of bytes in the file and uses it as the data buffer for the variable. I don't know what happens to that access if the underlying file is closed. I wouldn't be surprised if it produced some kind of mmap error.
If the file is opened with mmap=False, the entire variable is read into memory and behaves like a regular numpy array.
mmap : None or bool, optional
    Whether to mmap `filename` when reading. Default is True
    when `filename` is a file name, False when `filename` is a
    file-like object
My guess is that if you open a file without specifying mmap, read a variable from it, and close the file, it is unsafe to refer to that variable and its data afterwards. Any reference that requires more data to be loaded may result in an mmap error.
But if you open the file with mmap=False, you will be able to slice that variable even after the file is closed.
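A minimal sketch of the mmap=False case, assuming a reasonably recent scipy where netcdf_file can be imported from scipy.io and used in a with statement. The file name and the 'temperature' values are invented for the example.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

# Write a small file to read back (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "example.nc")
with netcdf_file(path, "w") as out:
    out.createDimension("x", 5)
    var = out.createVariable("temperature", "d", ("x",))
    var[:] = np.arange(5, dtype="d")

# mmap=False: variables are read into ordinary memory, not mapped from disk.
f = netcdf_file(path, "r", mmap=False)
temperature = f.variables["temperature"]
f.close()

# The file is closed, but the variable's data is still usable.
first_three = temperature[:3]
```

The trade-off is that each variable is loaded in full when the file is opened, so this is less attractive for very large files where only a few points are needed.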
I don't see how an mmap for one file or variable could interfere with access to other files and variables. But I would have to read more about mmap to be sure.
And from the netcdf docs:
Note that when netcdf_file is used to open a file with mmap=True (the default for read-only), the arrays it returns refer to data directly on disk. The file should not be closed, and cannot be cleanly closed when asked, if such arrays are alive. You may want to copy data arrays obtained from an mmapped netCDF file if they are to be processed after the file is closed; see the example below.
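Applied to the loop over many files from the question, the copy approach might look roughly like this. It is a sketch, not a tested fix for the original problem: write_example and read_temperature are hypothetical helpers, the file names and data are invented, and it assumes a scipy version where netcdf_file works as a context manager.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def write_example(path, values):
    """Write a tiny file with one 'temperature' variable (test data only)."""
    with netcdf_file(path, "w") as f:
        f.createDimension("x", len(values))
        var = f.createVariable("temperature", "d", ("x",))
        var[:] = values


def read_temperature(path):
    """Return an in-memory copy of 'temperature' that outlives the file."""
    # Default mmap=True here; the with block closes the file on exit.
    with netcdf_file(path, "r") as f:
        # .copy() detaches the array from the mmapped file before it closes.
        return f.variables["temperature"][:].copy()


tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmpdir, "file%d.nc" % i)
    write_example(p, np.arange(4, dtype="d") + i)
    paths.append(p)

# Each file is opened, read, and closed before the next one is touched.
results = [read_temperature(p) for p in paths]
```

Because only copies leave read_temperature, no array keeps a reference to the mmap, so each file can be closed cleanly before the next is opened.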