How is the file implemented?

I am curious how files work in Python. How is the file object implemented so that it can be looped over like this:

csv_file = open("filename.csv", "r")
for line in csv_file:
    pass  # do something with line

      



3 answers


If you're using Python 2, the details are a bit muddy; alexmcf's answer covers the basics and you can find more information from there.

If you are using Python 3, everything is documented in detail in the io module, which comes with an easy-to-read pure-Python implementation in the stdlib (the _pyio module), all built on top of nothing more than a very simple "raw file" interface (which FileIO implements on top of native POSIX file descriptors on Unix).

The IOBase ABC/mixin provides an __iter__ method based on the readline method. From the docs:

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes) or a text stream (yielding character strings). See readline() below.
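To make the bytes-versus-text distinction concrete, here is a minimal sketch using the in-memory streams io.BytesIO and io.StringIO in place of real files. The variable names are only illustrative.

```python
import io

# In-memory stand-ins for a binary file and a text file.
binary_stream = io.BytesIO(b"first\nsecond\n")
text_stream = io.StringIO("first\nsecond\n")

# Iterating a binary stream yields bytes objects, one per line.
binary_lines = list(binary_stream)

# Iterating a text stream yields str objects, one per line.
text_lines = list(text_stream)

print(binary_lines)
print(text_lines)
```

In both cases the line terminator stays attached to each item, just as it does with readline().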

And if you look inside the 3.5 source, it is as simple as you would expect:



def __iter__(self):
    self._checkClosed()
    return self

def __next__(self):
    line = self.readline()
    if not line:
        raise StopIteration
    return line
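
Because iteration is defined purely in terms of readline, a subclass only has to supply readline to become iterable. The following is a rough sketch of that idea, not anything taken from the stdlib: CountdownStream is a hypothetical class invented for illustration.

```python
import io


class CountdownStream(io.IOBase):
    """Hypothetical stream producing the lines "3\n", "2\n", "1\n", ..."""

    def __init__(self, start):
        super().__init__()
        self._remaining = start

    def readline(self, size=-1):
        # An empty string signals end of stream, which is what makes
        # the inherited __next__ stop the iteration.
        if self._remaining <= 0:
            return ""
        line = "{}\n".format(self._remaining)
        self._remaining -= 1
        return line


# No __iter__ or __next__ defined above: both are inherited from IOBase.
lines = list(CountdownStream(3))
print(lines)
```

The class never mentions the iterator protocol, yet it works in a for loop, because IOBase calls whatever readline the instance provides.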

      

Of course, there is a C accelerator in CPython 3.1+ which is used instead of this Python code if possible, but it looks pretty similar:

static PyObject *
iobase_iter(PyObject *self)
{
    if (_PyIOBase_check_closed(self, Py_True) == NULL)
        return NULL;

    Py_INCREF(self);
    return self;
}

static PyObject *
iobase_iternext(PyObject *self)
{
    PyObject *line = PyObject_CallMethodObjArgs(self, _PyIO_str_readline, NULL);

    if (line == NULL)
        return NULL;

    if (PyObject_Size(line) == 0) {
        Py_DECREF(line);
        return NULL;
    }

    return line;
}

      

The file objects returned by open and automatically created for things like sys.stdout are instances of TextIOWrapper (for text files) or BufferedRandom, BufferedReader, or BufferedWriter (for binary files), and most or all file objects created elsewhere in the stdlib (GzipFile, etc.) are likewise built on the io classes, so they all inherit this behavior from IOBase. Nothing stops another file class from overriding __iter__ (or registering with IOBase as an ABC instead of inheriting from it), but I don't know of any that do.
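These claims are easy to check interactively. The sketch below assumes Python 3 and writes a small throwaway file (the name example.csv is arbitrary) so that open has something to read.

```python
import gzip
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w") as f:
        f.write("a,b\n1,2\n")

    # Text mode: open returns a TextIOWrapper, which is its own iterator.
    with open(path, "r") as text_file:
        text_type = type(text_file)
        is_own_iterator = iter(text_file) is text_file
        rows = [line for line in text_file]

    # Binary read mode (with default buffering): a BufferedReader.
    with open(path, "rb") as binary_file:
        binary_type = type(binary_file)

# GzipFile is not one of those classes, but it still derives from IOBase.
gzip_is_iobase = issubclass(gzip.GzipFile, io.IOBase)

print(text_type, binary_type, is_own_iterator, gzip_is_iobase)
```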



For Python 2 ...



  • How files are opened. From the docs:

    File objects are implemented using C's stdio package and can be created with the built-in open() function.

  • The file object is its own iterator. From the docs:

    A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly.

  • How the file object is iterated over. The iterator splits the file on newline characters, and how newlines are recognized depends on the mode passed to open(). With universal newlines support (mode 'U' or 'rU'), Python recognizes '\n', '\r', and '\r\n' as line endings.
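The effect of newline handling on iteration can be tried out with a small sketch. Note that this uses the Python 3 io module rather than the Python 2 stdio-based file object described above, so it illustrates the idea and not the Python 2 implementation. The sample bytes are made up.

```python
import io

# The same data with Windows, old Mac, and Unix line endings mixed together.
raw = b"a\r\nb\rc\n"

# newline=None (the default): universal newlines, every ending becomes "\n".
translated = list(
    io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
)

# newline="": all three endings still end a line, but are left untranslated.
untranslated = list(
    io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="")
)

print(translated)
print(untranslated)
```

Either way the iterator yields three lines; the newline argument only controls which terminators are recognized and whether they are rewritten.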
