How do I make sure the netcdf file is closed in python?

It's probably simple, but I haven't been able to find a solution online. I'm working with a set of datasets stored as netCDF files. I open each one, read values at some points, and then move on to the next file. I keep running into mmap errors, and the script slows down as more files are read. This may be because the netCDF files are not being closed properly with .close().

I tested this:

from scipy.io.netcdf import netcdf_file as ncfile
f=ncfile(netcdf_file,mode='r')
f.close()

Then, if I try:

>>>f
<scipy.io.netcdf.netcdf_file object at 0x24d29e10>

and

>>>f.variables['temperature'][:]
array([ 1234.68034431,  1387.43136567,  1528.35794546, ...,  3393.91061952,
    3378.2844357 ,  3433.06715226])

So it seems the file is still open? What does close() actually do, and how do I know it worked? Is there a way to close or clear all open files from Python?

Software: Python 2.7.6, scipy 0.13.2, netcdf 4.0.1

1 answer


Here is the code for f.close:

Definition: f.close(self)
Source:
    def close(self):
        """Closes the NetCDF file."""
        if not self.fp.closed:
            try:
                self.flush()
            finally:
                self.fp.close()

f.fp is a file object, so:

In [451]: f.fp
Out[451]: <open file 'test.cdf', mode 'wb' at 0x939df40>

In [452]: f.close()

In [453]: f.fp
Out[453]: <closed file 'test.cdf', mode 'wb' at 0x939df40>
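If you want a programmatic way to tell whether close() worked, one minimal sketch is to check the closed flag of f.fp, just as the session above does by inspecting f.fp. This leans on the fp attribute from the source shown above, which is an implementation detail rather than documented API. It also assumes the top-level scipy.io.netcdf_file import (newer SciPy releases deprecate the scipy.io.netcdf path) and uses a hypothetical helper, make_demo_file, to write a tiny test file so the example runs on its own.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def make_demo_file(path):
    """Hypothetical helper: write a tiny netCDF file with one 'temperature' variable."""
    with netcdf_file(path, 'w') as out:
        out.createDimension('x', 3)
        var = out.createVariable('temperature', 'f8', ('x',))
        var[:] = np.array([1234.5, 1387.25, 1528.0])


def close_status(path):
    """Open the file, close it, and report the file object's state before and after."""
    f = netcdf_file(path, 'r', mmap=False)
    before = f.fp.closed   # False: the underlying file object is open
    f.close()
    after = f.fp.closed    # True: close() closed the underlying file object
    return before, after


demo_path = os.path.join(tempfile.mkdtemp(), 'demo.nc')
make_demo_file(demo_path)
print(close_status(demo_path))  # (False, True)
```

Note that fp.closed only tells you about the file object; it says nothing about memory maps that arrays read earlier may still be using.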

But by playing with f, I can see that I can still create dimensions and variables after closing it. However, f.flush() returns an error.

It appears that mmap is not used when writing data, only when reading:

def _read_var_array(self):
            ....
            if self.use_mmap:
                mm = mmap(self.fp.fileno(), begin_+a_size, access=ACCESS_READ)
                data = ndarray.__new__(ndarray, shape, dtype=dtype_,
                        buffer=mm, offset=begin_, order=0)
            else:
                pos = self.fp.tell()
                self.fp.seek(begin_)
                data = fromstring(self.fp.read(a_size), dtype=dtype_)
                data.shape = shape
                self.fp.seek(pos)

I don't have much experience with mmap. It looks like it creates an mmap object over a block of bytes in the file and uses it as the data buffer for the variable's array. I don't know what happens to that access if the underlying file is closed. I wouldn't be surprised if that leads to some mmap error.
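To get a feel for what happens to such a buffer when the file is closed, here is a small experiment with the standard-library mmap module and NumPy that mimics the mmap branch above. read_mapped_after_close is a hypothetical helper, not SciPy code. On a typical CPython setup it suggests that closing the Python file object does not unmap the data, which would be consistent with f.variables['temperature'][:] still working after f.close() in the question. It does not prove that SciPy's reader can never raise an mmap error.

```python
import mmap
import os
import tempfile

import numpy as np


def read_mapped_after_close(values):
    """Hypothetical helper: map data like the mmap branch above, then close the file."""
    arr = np.asarray(values, dtype='>f8')      # netCDF stores data big-endian
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as out:
        out.write(arr.tobytes())

    fp = open(path, 'rb')
    mm = mmap.mmap(fp.fileno(), arr.nbytes, access=mmap.ACCESS_READ)
    data = np.ndarray(arr.shape, dtype=arr.dtype, buffer=mm, offset=0)
    fp.close()                 # roughly what netcdf_file.close() does in the source above

    # The mapping does not depend on fp staying open, so the array is still readable.
    result = (fp.closed, data.tolist())

    del data                   # drop the array before releasing the mapping
    mm.close()
    os.remove(path)
    return result


print(read_mapped_after_close([1.0, 2.0, 3.0]))  # (True, [1.0, 2.0, 3.0])
```

The del data before mm.close() matters: releasing a mapping while an array still points into it is roughly the situation the SciPy docs warn about.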



If the file is opened with mmap=False, the entire variable is read into memory and behaves like a regular numpy array.

mmap : None or bool, optional
    Whether to mmap `filename` when reading.  Default is True
    when `filename` is a file name, False when `filename` is a
    file-like object

My guess is that if you open a file without specifying mmap (so it defaults to True for a file name), read a variable from it, and then close the file, it is unsafe to refer to that variable and its data afterwards. Any access that needs more data to be loaded may result in an mmap error.

But if you open the file with mmap=False, you will be able to slice that variable even after the file is closed.
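A minimal sketch of the mmap=False route, assuming current SciPy's top-level netcdf_file import. read_in_memory is a hypothetical helper, and the test file and its values are made up so the snippet runs on its own.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def read_in_memory(path, name):
    """Hypothetical helper: open with mmap=False, keep the array, close the file."""
    f = netcdf_file(path, 'r', mmap=False)
    data = f.variables[name][:]   # read fully into memory, no mmap involved
    f.close()
    return data                   # still a normal, usable numpy array


# Build a small made-up test file.
path = os.path.join(tempfile.mkdtemp(), 'demo.nc')
with netcdf_file(path, 'w') as out:
    out.createDimension('x', 4)
    var = out.createVariable('temperature', 'f8', ('x',))
    var[:] = np.arange(4.0)

temperature = read_in_memory(path, 'temperature')
print(temperature[1:3].tolist())  # [1.0, 2.0]
```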

I don't see how mmapping one file or variable could prevent access to other files and variables, but I would have to read more about mmap to be sure.

And from the netcdf_file docs:

Note that when netcdf_file is used to open a file with mmap=True (the default for reading), the arrays it returns refer to data directly on disk. The file should not be closed, and cannot be cleanly closed when asked, if such arrays are alive. You may want to copy data arrays obtained from an mmapped NetCDF file if they are to be processed after the file is closed, see the example below.
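Applied to the workflow in the question (open each file, read some points, move on), the docs' advice amounts to copying the values you need before each file closes. A minimal sketch, assuming a hypothetical read_points helper and made-up test files; it relies on netcdf_file's support for with blocks, which current SciPy releases provide (I haven't checked 0.13.2).

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def read_points(paths, name, indices):
    """Hypothetical helper: read selected points of one variable from each file.

    Copying before the file closes means no returned array refers to mmapped
    data, as the SciPy docs advise for files opened with mmap=True.
    """
    results = []
    for path in paths:
        with netcdf_file(path, 'r') as f:      # default (mmap) reading mode
            values = f.variables[name][indices].copy()
        results.append(values)                 # the file is closed at this point
    return results


# Build three small made-up test files.
tmpdir = tempfile.mkdtemp()
paths = []
for k in range(3):
    p = os.path.join(tmpdir, 'file%d.nc' % k)
    with netcdf_file(p, 'w') as out:
        out.createDimension('x', 5)
        var = out.createVariable('temperature', 'f8', ('x',))
        var[:] = np.arange(5.0) + 10 * k
    paths.append(p)

points = read_points(paths, 'temperature', [0, 4])
print([pts.tolist() for pts in points])  # [[0.0, 4.0], [10.0, 14.0], [20.0, 24.0]]
```

Because only the copies leave the loop, each file should be closable cleanly, and no mapping from an earlier file needs to stay alive while the next one is read.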
