Parsing command-line syntax like that of typical filter programs
I spent a few hours reading the argparse tutorials and was able to learn how to use normal parameters. The official documentation is not very readable. I am new to Python. I am trying to write a program that can be called in the following ways:
cat inFile | program [options] > outFile
- If no inFile or outfile is specified, read from stdin and print to stdout.
program [options] inFile outFile
program [options] inFile > outFile
- If only one file is given, it is the input, and output should go to stdout.
cat inFile | program [options] - outFile
- If '-' is specified instead of inFile, read from stdin.
program [options] /path/to/folder outFile
- process all files from /path/to/folder and its subdirectories.
I want it to behave like a normal cli program under GNU / Linux.
It would be nice if the program could be called:
program [options] inFile0 inFile1 ... inFileN outFile
- the first path / file is always interpreted as input, the last one is always interpreted as output. Any paths in between are interpreted as additional inputs.
Maybe I could write dirty code that accomplishes this, but this will be used, so someone will end up maintaining it (and he will know where I live ...).
Any help / suggestions are greatly appreciated.
Combining answers and some additional knowledge from the internet, I managed to write this (it doesn't accept multiple inputs, but that's enough):
import sys, argparse, os.path, glob

def inputFile(path):
    # '-' means read from stdin
    if path == "-":
        return [sys.stdin]
    elif os.path.exists(path):
        if os.path.isfile(path):
            return [path]
        else:
            # directory: collect the *.dat files in it and in its subdirectories
            return [y for x in os.walk(path) for y in glob.glob(os.path.join(x[0], '*.dat'))]
    else:
        sys.exit(2)

def main(argv):
    cmdArgsParser = argparse.ArgumentParser()
    cmdArgsParser.add_argument('inFile', nargs='?', default='-', type=inputFile)
    cmdArgsParser.add_argument('outFile', nargs='?', default='-', type=argparse.FileType('w'))
    # parse the list that was passed in instead of silently ignoring it
    cmdArgs = cmdArgsParser.parse_args(argv)
    print(cmdArgs.inFile)
    print(cmdArgs.outFile)

if __name__ == "__main__":
    main(sys.argv[1:])
Thanks!
You need positional arguments (names that do not start with a dash) that are optional (nargs='?') and have a default value (default='-'). In addition, argparse.FileType is a convenience factory that returns sys.stdin or sys.stdout when '-' is passed (depending on the mode).
Together:
#!/usr/bin/env python
import argparse

# the first argument is prog; by default it is derived from sys.argv[0]
parser = argparse.ArgumentParser('foo')
parser.add_argument('in_file', nargs='?', default='-', type=argparse.FileType('r'))
parser.add_argument('out_file', nargs='?', default='-', type=argparse.FileType('w'))

def main():
    # parse_args() defaults to sys.argv[1:]; explicit lists are passed here to
    # show each case (FileType opens the files, so 'bar' must exist)
    args = parser.parse_args(['bar', 'baz'])
    print(args)
    args = parser.parse_args(['bar', '-'])
    print(args)
    args = parser.parse_args(['bar'])
    print(args)
    args = parser.parse_args(['-', 'baz'])
    print(args)
    args = parser.parse_args(['-', '-'])
    print(args)
    args = parser.parse_args(['-'])
    print(args)
    args = parser.parse_args([])
    print(args)

if __name__ == '__main__':
    main()
I'll give you a script to start with. It uses optionals instead of positionals, and only one input file, but it should give a taste of what you can do.
import argparse

parser = argparse.ArgumentParser()
inarg = parser.add_argument('-i','--infile', type=argparse.FileType('r'), default='-')
outarg = parser.add_argument('-o','--outfile', type=argparse.FileType('w'), default='-')
args = parser.parse_args()
print(args)

cnt = 0
for line in args.infile:
    print(cnt, line)
    args.outfile.write(line)
    cnt += 1
When called with no arguments, it just echoes your input (after the ^D). It bothers me a little that it doesn't exit until I issue another ^D.
FileType is handy, but it has a major flaw: it opens the files, but you have to close them yourself, or let Python do so on exit. There is also the complication that you don't want to close stdin / stdout.
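One way to handle the closing problem could look like the sketch below. It assumes the file objects come from FileType as in the script above, and it closes everything except the standard streams. The names close_unless_std, build_parser and run are made up for this illustration; they are not part of argparse.

```python
import argparse
import sys


def close_unless_std(fileobj):
    """Close fileobj unless it is a standard stream.

    Hypothetical helper, not part of argparse.
    Returns True if the file was closed, False if it was left alone.
    """
    if any(fileobj is std for std in (sys.stdin, sys.stdout, sys.stderr)):
        return False
    fileobj.close()
    return True


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--infile', type=argparse.FileType('r'), default='-')
    parser.add_argument('-o', '--outfile', type=argparse.FileType('w'), default='-')
    return parser


def run(argv=None):
    # argv=None makes parse_args fall back to sys.argv[1:]
    args = build_parser().parse_args(argv)
    try:
        for line in args.infile:
            args.outfile.write(line)
    finally:
        # real files get closed here; stdin / stdout are left open
        close_unless_std(args.infile)
        close_unless_std(args.outfile)
    return args
```

Calling run(['-i', 'some_input', '-o', 'some_output']) copies one file to the other and closes both; calling it with no file options falls back to stdin / stdout and closes nothing.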
The best argparse questions include a basic script and specific questions on how to fix or improve it. Your specifications are clear enough, but it would be nice if you gave us more to work with.
To handle the subdirectory option, I would skip the FileType bit. Use argparse to get 2 lists of strings (or a list and a name), and then do the necessary chdir and/or glob to find and iterate over the files. Don't expect argparse to do the actual work. Use it to parse the strings on the command line. Here's a sketch of such a script, leaving most of the details for you.
import argparse
import os
import sys  # for stdin/stdout

def open_output(outfile):
    # open a file for writing; '-' means stdout
    return sys.stdout if outfile == '-' else open(outfile, 'w')

def glob_dir(adir):
    # return a list of files ready to open
    # (os.walk is used here so that subdirectories are included)
    return [os.path.join(root, name)
            for root, _dirs, files in os.walk(adir)
            for name in files]

def open_forread(afilename):
    # open a file for reading; '-' means stdin
    return sys.stdin if afilename == '-' else open(afilename)

def walkdirs(alist):
    outlist = []
    for name in alist:
        if name == '-' or os.path.isfile(name):
            # '-' is passed through so open_forread can map it to stdin
            outlist.append(name)
        elif os.path.isdir(name):
            outlist.extend(glob_dir(name))
        else:
            sys.exit('no such file or directory: %s' % name)
    return outlist

def cat(infile, outfile):
    # do your thing here; this placeholder just copies the lines
    for line in infile:
        outfile.write(line)

def main(args):
    # handle args options
    filelist = walkdirs(args.inlist)
    fout = open_output(args.outfile)
    for name in filelist:
        fin = open_forread(name)
        cat(fin, fout)
        if fin is not sys.stdin:
            fin.close()
    if fout is not sys.stdout:
        fout.close()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('inlist', nargs='*')
    parser.add_argument('outfile')
    # add options
    args = parser.parse_args()
    main(args)
The parser here requires an outfile name, even if it is just '-'. I could define it with nargs='?' to make it optional, but that doesn't play well with the '*' of inlist.
Consider
myprog one two three
Is it
namespace(inlist=['one','two','three'], outfile=default)
or
namespace(inlist=['one','two'], outfile='three')
With a '*' positional followed by a '?' positional, the identity of the last string is ambiguous: is it the last entry for inlist, or the optional entry for outfile? argparse chooses the former, and never assigns the string to outfile.
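A quick way to see this for yourself is a sketch like the following, which assumes the same inlist / outfile names as above and a default of '-' for outfile (the default value is an assumption made for the illustration):

```python
import argparse

parser = argparse.ArgumentParser(prog='myprog')
parser.add_argument('inlist', nargs='*')
parser.add_argument('outfile', nargs='?', default='-')

args = parser.parse_args(['one', 'two', 'three'])
print(args.inlist)   # the '*' positional takes all three strings
print(args.outfile)  # the '?' positional is left with its default
```

The '*' positional is greedy, so nothing is left over for outfile no matter how many strings are given.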
With the --infile and --outfile definitions, the allocation of these strings is clear.
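For comparison, here is a minimal sketch of the flagged version with several inputs. It assumes that --infile is given nargs='*' and that both options default to '-'; neither choice comes from the script above, which takes a single input file.

```python
import argparse

parser = argparse.ArgumentParser(prog='myprog')
# assumption: allow several inputs after -i, defaulting to stdin ('-')
parser.add_argument('-i', '--infile', nargs='*', default=['-'])
parser.add_argument('-o', '--outfile', default='-')

flagged = parser.parse_args(['-i', 'one', 'two', '-o', 'three'])
print(flagged.infile)   # the strings that follow -i
print(flagged.outfile)  # the string that follows -o

no_output = parser.parse_args(['-i', 'one', 'two', 'three'])
print(no_output.infile)   # all three strings belong to -i
print(no_output.outfile)  # no -o given, so the default is used
```

Because each string follows its flag, there is no guessing about which one is the output.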
In one sense, this problem is too complex for argparse: there is nothing in it to handle things like directories. In another sense, it is too simple. You could just as easily split sys.argv[1:] between inlist and outfile without the help of argparse.
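A manual split could be as small as the sketch below. It follows the rules from the question (no paths: stdin to stdout; one path: that is the input; otherwise the last path is the output) and assumes that any options have already been removed from the list. The function name split_args is made up for this illustration.

```python
def split_args(argv):
    """Split a list of path strings into (inlist, outfile).

    Hypothetical helper; assumes argv holds only paths, no options.
    '-' stands for stdin (as input) or stdout (as output).
    """
    if not argv:
        # nothing given: read stdin, write stdout
        return ['-'], '-'
    if len(argv) == 1:
        # a single path is the input; output goes to stdout
        return list(argv), '-'
    # the last path is the output, everything before it is input
    return list(argv[:-1]), argv[-1]
```

In a script you would call it as inlist, outfile = split_args(sys.argv[1:]) and then open the files yourself, as in the sketch above.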