Combining In-Place Filtering and Encoding Settings in the fileinput Module

I am trying to use the in-place filtering feature of the fileinput module to overwrite the input file in place.

I need to set the encoding (for both reading and writing) to latin-1, so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but the call fails with this error:

ValueError: FileInput cannot use an opening hook in inplace mode
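For reference, a minimal reproduction might look like the sketch below. The temporary file and the `message` variable are only there for illustration and are not part of my real code; the exception is raised as soon as the `FileInput` object is constructed, before any file is opened.

```python
import fileinput
import os
import tempfile

# Throwaway file so the example does not touch real data.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # Combining inplace=True with an openhook is rejected up front.
    fileinput.input(path, inplace=True,
                    openhook=fileinput.hook_encoded("latin-1"))
except ValueError as exc:
    message = str(exc)
else:
    message = None
finally:
    os.remove(path)

print(message)
```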

      

On closer inspection, I see that the fileinput documentation clearly states that you cannot use inplace and openhook together.

How can I get around this?

+6




3 answers


As far as I know, there is no way to do this with the fileinput module. You can accomplish the same task with a combination of the codecs module, os.rename() and os.remove():

import os
import codecs

input_name = 'some_file.txt'
tmp_name = 'tmp.txt'

with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
     codecs.open(tmp_name, 'w', encoding='latin-1') as fo:

    for line in fi:
        new_line = do_processing(line) # do your line processing here
        fo.write(new_line)

os.remove(input_name) # remove original
os.rename(tmp_name, input_name) # rename temp to original name

      



You can also specify a different encoding when you open the output file if you want to change it, or keep latin-1 there if you don't.

I know this is not the in-place modification you were looking for, but it will do the task you are trying to do and is very flexible.
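On Python 3 the same idea can be written with the built-in open() and os.replace(), which overwrites the destination in one step, so no separate os.remove() is needed. The sketch below is one possible variant, not the only way to do it: `rewrite_in_place` and `transform` are hypothetical names, and copying the permission bits with shutil.copymode() is an extra step that the code above does not perform.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path, transform, encoding="latin-1"):
    """Rewrite `path` line by line through a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # reopened by name below, with the desired encoding
    try:
        # newline="" keeps the original line endings untouched.
        with open(path, "r", encoding=encoding, newline="") as src, \
             open(tmp_path, "w", encoding=encoding, newline="") as dst:
            for line in src:
                dst.write(transform(line))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # overwrite the original in one step
    except BaseException:
        os.remove(tmp_path)              # do not leave the temp file behind
        raise
```

A call such as `rewrite_in_place('some_file.txt', do_processing)` would then take the place of the loop and the remove/rename pair.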

+5




This is very similar to the other answer, just done in function form so that it can be called multiple times with ease:

import codecs
import os

def inplace(orig_path, encoding='latin-1'):
    """Modify a file in-place, with a consistent encoding."""
    new_path = orig_path + '.modified'
    with codecs.open(orig_path, encoding=encoding) as orig:
        with codecs.open(new_path, 'w', encoding=encoding) as new:
            for line in orig:
                yield line, new
    # Runs only once the caller's loop has consumed every line.
    os.rename(new_path, orig_path)

      

And this is what it looks like in action:



for line, new in inplace(path):
    line = do_processing(line)  # Use your imagination here.
    new.write(line)

      

This works on both Python 2 and Python 3 and does the right thing with your data as long as you specify the correct encoding (in my case I actually needed utf-8 everywhere, but your needs are clearly different).

+1




I am not too keen on the existing solutions using rename/remove because they oversimplify some of the file handling that the inplace flag does, such as preserving the file mode, chmod attributes, etc.

In my case, since I have control over the environment my code will run in, I figured that the only sensible solution would be to set my locale to use UTF8:

export LC_ALL=en_US.UTF-8

      

Effect:

sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
    line = self._readline()
  File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
    return self._readline()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)

sh-4.2> export LC_ALL=en_US.UTF-8
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
done

sh-4.2# 

      

A potential side effect is that the locale change also affects how other files are read and written, but that doesn't bother me.

+1








