Python: process multiple files iteratively without an explicit loop
I have a script that uses a large chunk of text to train a model. Now that it's written, I can read either from a file or from stdin:
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('-i', help='input_file', default=sys.stdin)
... # do a bunch of other stuff
args = parser.parse_args()
if args.i is sys.stdin:
    m.train(args.i)
else:
    m.train(open(args.i, 'r'))
Then I can call my script as follows:
python myscript.py -i trainingdata.txt
or
cat trainingdata.txt | python myscript.py
The second version is especially useful if I want to search the file system and use multiple files to train the model. However, because of the pipe, it gets tricky if I also want to profile the script with cProfile at the same time, i.e.
python -m cProfile myscript.py ...
I know I can pass multiple files with the -i parameter and iterate over them, but then I'd have to change the behavior of the train() method so the data isn't overwritten (a rough sketch of what I mean is below).
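For concreteness, here's roughly what that multi-file -i variant would look like; the nargs='+' flag and the per-file loop are just a sketch, and m stands for the model object not shown above:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i', nargs='+', help='one or more input files')
args = parser.parse_args()

# Each file triggers a separate call, so train() would have to accumulate
# rather than start over on every file.
for path in args.i:
    with open(path, 'r') as f:
        m.train(f)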
Is there a good way to open an IO channel, for lack of a better expression, that concatenates the input without explicitly reading and writing line by line?
You can chain the open files together, using a generator to yield open file objects from the filenames:
from itertools import chain

def yield_open(filenames):
    # Open each file lazily; the 'with' block keeps it open only while
    # its lines are being consumed, then closes it before the next one.
    for filename in filenames:
        with open(filename, 'r') as file:
            yield file

def train(file):
    for line in file:
        print(line, end='')
    print()

files = chain.from_iterable(yield_open(filenames=['file1.txt', 'file2.txt']))
train(files)
This has the added advantage that only one of your files is open at a time.
You can also use this as a "data pipeline" (it might be more readable):
file_gen = yield_open(filenames=['file1.txt', 'file2.txt'])
files = chain.from_iterable(file_gen)
train(files)
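To connect this back to your script, here's a hedged sketch of how the chained iterator could slot into the stdin-or-files choice; the argument handling is an assumption, and it reuses yield_open() and train() from above:

import sys
from itertools import chain

filenames = sys.argv[1:]  # e.g. file paths collected from '-i'
if filenames:
    files = chain.from_iterable(yield_open(filenames))
else:
    files = sys.stdin  # piped input; both branches yield lines when iterated
train(files)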