Combining In-Place Filtering and Encoding Settings in the fileinput Module
I use the fileinput module to filter a file and overwrite the input file in place.
I need the encoding for both reading and writing to be latin-1, so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but fileinput.input fails with this error:
ValueError: FileInput cannot use an opening hook in inplace mode
On closer inspection, I see that the fileinput documentation clearly states that inplace and openhook cannot be used together.
How can I get around this?
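A minimal reproduction of the error, assuming a throwaway file named example.txt (the snippet creates it). The check happens when the FileInput object is constructed, before any file is opened, so the file is left untouched:

```python
import fileinput

# Hypothetical sample file, created only so the snippet is self-contained.
with open('example.txt', 'w', encoding='latin-1') as f:
    f.write('caf\xe9\n')

try:
    # inplace=True combined with an openhook is rejected in the constructor.
    fileinput.input('example.txt', inplace=True,
                    openhook=fileinput.hook_encoded('latin-1'))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # FileInput cannot use an opening hook in inplace mode
```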
As far as I know, there is no way to do this with the fileinput module. You can accomplish the same task with a combination of the codecs module, os.rename() and os.remove():
import os
import codecs

def do_processing(line):
    # Placeholder: replace with your own line transformation.
    return line

input_name = 'some_file.txt'
tmp_name = 'tmp.txt'

with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
        codecs.open(tmp_name, 'w', encoding='latin-1') as fo:
    for line in fi:
        new_line = do_processing(line)  # do your line processing here
        fo.write(new_line)

os.remove(input_name)  # remove original
os.rename(tmp_name, input_name)  # rename temp to original name
If you want to change the encoding of the output file, specify the new encoding when you open it; otherwise leave it as latin-1.
I know this is not the in-place modification you were looking for, but it does what you are trying to do and is very flexible.
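For Python 3 only, here is a sketch of the same approach using the built-in open(), which accepts encoding= directly, with separate input and output encodings. It also uses os.replace() to overwrite the original in one step and shutil.copymode() to carry over the original's permission bits, which the temporary-file approach otherwise loses. The function name rewrite_file and its transform argument are hypothetical, not part of any library:

```python
import os
import shutil
import tempfile


def rewrite_file(path, transform, in_encoding='latin-1', out_encoding=None):
    """Rewrite `path` line by line through a temporary file (hypothetical helper).

    `transform` is a per-line callback, a stand-in for do_processing.
    `out_encoding` defaults to `in_encoding`, so the encoding stays the same
    unless you ask for a different one.
    """
    if out_encoding is None:
        out_encoding = in_encoding
    # Create the temporary file next to the original so os.replace() is a
    # rename within one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(os.path.abspath(path)), suffix='.tmp')
    try:
        # os.fdopen() takes ownership of fd; newline='' keeps line endings as-is.
        with os.fdopen(fd, 'w', encoding=out_encoding, newline='') as fo, \
                open(path, 'r', encoding=in_encoding, newline='') as fi:
            for line in fi:
                fo.write(transform(line))
        # mkstemp() creates the file readable/writable by the owner only;
        # copy the original's permission bits (not owner or timestamps).
        shutil.copymode(path, tmp_path)
        # os.replace() overwrites an existing file on both POSIX and Windows.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the temporary file and keep the original.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Example: create a latin-1 file, upper-case it in place, then transcode it
# to UTF-8 without changing the text.
with open('some_file.txt', 'w', encoding='latin-1') as f:
    f.write('caf\xe9\nna\xefve\n')
rewrite_file('some_file.txt', str.upper)
rewrite_file('some_file.txt', lambda line: line, out_encoding='utf-8')
```

Passing newline='' on both sides mirrors codecs.open(), which works on the underlying bytes and does not translate line endings.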
This is very similar to the other answer, just written as a function so that it can easily be called multiple times:
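import codecs
import os

def inplace(orig_path, encoding='latin-1'):
    """Modify a file in-place, with a consistent encoding."""
    new_path = orig_path + '.modified'
    with codecs.open(orig_path, encoding=encoding) as orig:
        with codecs.open(new_path, 'w', encoding=encoding) as new:
            for line in orig:
                yield line, new
    # Both files are closed here, so the rewritten copy can take the
    # original's name. On Windows, os.rename() fails if orig_path exists.
    os.rename(new_path, orig_path)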
And this is what it looks like in action:
for line, new in inplace(path):  # path: the file you want to modify
    line = do_processing(line)  # Use your imagination here.
    new.write(line)
This works in both Python 2 and Python 3 and does the right thing with your data as long as you specify the correct encoding (in my case I actually needed utf-8 everywhere, but your needs may differ).
I am not too keen on the existing solutions using rename/remove, because they skip some of the file handling that the inplace flag does for you, such as preserving the file mode (chmod) and other attributes.
In my case, since I control the environment my code will run in, I decided that the only sensible solution was to set my locale to use UTF-8:
export LC_ALL=en_US.UTF-8
The effect, before and after setting the locale:
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
line = self._readline()
File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
return self._readline()
File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)'
sh-4.2> export LC_ALL=en_US.UTF-8
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
done
sh-4.2#
A potential side effect is that the locale change also affects how other files are read and written, but that doesn't bother me.