Parsing command-line syntax like typical filters

I spent a few hours reading the argparse tutorials and was able to learn how to use normal parameters. The official documentation is not very readable. I am new to Python. I am trying to write a program that can be called in the following ways:

cat inFile | program [options] > outFile

- If no inFile or outFile is specified, read from stdin and write to stdout.

program [options] inFile outFile

program [options] inFile > outFile

- If only one file is given, input is read from that file and output goes to standard output.

cat inFile | program [options] - outFile

- If '-' is specified instead of inFile, read from stdin.

program [options] /path/to/folder outFile

- Process all files in /path/to/folder and its subdirectories.

I want it to behave like a normal CLI program under GNU/Linux.

It would be nice if the program could also be called like this:

program [options] inFile0 inFile1 ... inFileN outFile

- The first path/file is always interpreted as input and the last one as output; any additional paths in between are also interpreted as inputs.

Maybe I could write dirty code that accomplishes this, but this code will actually be used, so someone will end up maintaining it (and they will know where I live...).

Any help / suggestions are greatly appreciated.


Combining answers and some additional knowledge from the internet, I managed to write this (it doesn't accept multiple inputs, but that's enough):

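import sys, argparse, os.path, glob

def inputFile(path):
    # argparse 'type' callback: turn one input path into a list of inputs
    if path == "-":
        return [sys.stdin]
    elif os.path.exists(path):
        if os.path.isfile(path):
            return [path]
        else:
            # directory: collect *.dat files from it and all its subdirectories
            return [y for x in os.walk(path) for y in glob.glob(os.path.join(x[0], '*.dat'))]
    else:
        # let argparse report a proper usage error instead of exiting silently
        raise argparse.ArgumentTypeError("no such file or directory: %r" % path)

def main(argv):
    cmdArgsParser = argparse.ArgumentParser()
    # string defaults are also passed through 'type', so '-' becomes [sys.stdin]
    cmdArgsParser.add_argument('inFile', nargs='?', default='-', type=inputFile)
    cmdArgsParser.add_argument('outFile', nargs='?', default='-', type=argparse.FileType('w'))
    cmdArgs = cmdArgsParser.parse_args(argv)

    print(cmdArgs.inFile)
    print(cmdArgs.outFile)

if __name__ == "__main__":
    main(sys.argv[1:])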


Thanks!


2 answers


You need positional arguments (names that do not start with a dash) that are made optional (nargs='?') and given a default (default='-'). Also, argparse.FileType is a convenience factory that returns sys.stdin or sys.stdout when '-' is passed (depending on the mode).

Together:



#!/usr/bin/env python

import argparse

# first argument is prog; by default it is os.path.basename(sys.argv[0])
parser = argparse.ArgumentParser('foo')
parser.add_argument('in_file', nargs='?', default='-', type=argparse.FileType('r'))
parser.add_argument('out_file', nargs='?', default='-', type=argparse.FileType('w'))

def main():
    # default argument is sys.argv[1:]
    # note: 'bar' must be an existing readable file; 'baz' is created (or truncated)
    args = parser.parse_args(['bar', 'baz'])
    print(args)
    args = parser.parse_args(['bar', '-'])
    print(args)
    args = parser.parse_args(['bar'])
    print(args)
    args = parser.parse_args(['-', 'baz'])
    print(args)
    args = parser.parse_args(['-', '-'])
    print(args)
    args = parser.parse_args(['-'])
    print(args)
    args = parser.parse_args([])
    print(args)

if __name__ == '__main__':
    main()
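The calls above that name 'bar' only work if a readable file called bar exists in the current directory (and 'baz' gets created or truncated). A minimal sketch that checks just the '-' handling without touching the filesystem, assuming Python 3; the prog name 'filter' is illustrative:

```python
import argparse
import sys

# Same two optional positionals as above; 'filter' is only an illustrative prog name.
parser = argparse.ArgumentParser(prog='filter')
parser.add_argument('in_file', nargs='?', default='-', type=argparse.FileType('r'))
parser.add_argument('out_file', nargs='?', default='-', type=argparse.FileType('w'))

# No arguments: the string defaults ('-') are passed through FileType,
# which maps '-' to the standard streams.
args = parser.parse_args([])
print(args.in_file is sys.stdin, args.out_file is sys.stdout)

# An explicit '-' for both behaves the same way.
args = parser.parse_args(['-', '-'])
print(args.in_file is sys.stdin, args.out_file is sys.stdout)
```

Both lines print `True True`: FileType picks stdin for a read mode and stdout for a write mode.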



I'll give you a script to start with. It uses optionals instead of positionals, and only one input file, but it should give a taste of what you can do.

import argparse

parser = argparse.ArgumentParser()
inarg = parser.add_argument('-i','--infile', type=argparse.FileType('r'), default='-')
outarg = parser.add_argument('-o','--outfile', type=argparse.FileType('w'), default='-')

args = parser.parse_args()

print(args)
cnt = 0
for line in args.infile:
    print(cnt, line)
    args.outfile.write(line)
    cnt += 1


When called with no arguments, it just echoes your input (after the ^D). I'm a little puzzled that it doesn't exit until I press ^D a second time.

FileType is handy, but it has a major drawback: it opens the files, and you have to close them yourself or let Python do it on exit. There's also the complication that you don't want to close stdin/stdout.
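One way to deal with both issues is to close the files yourself in a `finally` block while skipping the standard streams. A minimal sketch built on the script above, assuming Python 3; `close_unless_std` is a hypothetical helper name:

```python
import argparse
import sys

def close_unless_std(f):
    """Close a file object opened by FileType, but leave stdin/stdout open."""
    if f is not sys.stdin and f is not sys.stdout:
        f.close()

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--infile', type=argparse.FileType('r'), default='-')
    parser.add_argument('-o', '--outfile', type=argparse.FileType('w'), default='-')
    args = parser.parse_args(argv)
    try:
        # copy input to output, as in the script above
        for line in args.infile:
            args.outfile.write(line)
    finally:
        # runs even if the copy fails, so real files are always closed
        close_unless_std(args.infile)
        close_unless_std(args.outfile)
    return args
```

Call `main()` from your `if __name__ == '__main__':` block; passing a list (e.g. `main(['-i', 'in.txt', '-o', 'out.txt'])`) is handy for testing.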

The best argparse questions include a basic script and specific questions on how to fix or improve it. Your specifications are clear enough, but it would be nice if you gave us more to work with.


To handle the subdirectory option, I would skip FileType. Use argparse to get two lists of strings (or a list and a name), then do the necessary chdir and/or glob to find and iterate over the files. Don't expect argparse to do the actual work; use it to parse the strings on the command line. Here's a sketch of such a script, leaving most of the details to you.

import argparse
import glob
import os
import sys  # for stdin/stdout
# ... (other imports/setup)

def open_output(outfile):
    # open a file for writing; '-' means stdout
    if outfile == '-':
        return sys.stdout
    return open(outfile, 'w')

def glob_dir(adir):
    # glob a dir recursively (Python 3.5+);
    # return a list of files ready to open
    pattern = os.path.join(adir, '**', '*')
    return [f for f in glob.glob(pattern, recursive=True) if os.path.isfile(f)]

def open_forread(afilename):
    # open a file for reading; '-' means stdin
    if afilename == '-':
        return sys.stdin
    return open(afilename, 'r')

def walkdirs(alist):
    outlist = []
    for name in alist:
        if name == '-' or os.path.isfile(name):
            outlist.append(name)
        elif os.path.isdir(name):
            outlist.extend(glob_dir(name))
        else:
            sys.exit('error: not a file or directory: %s' % name)
    return outlist

def cat(infile, outfile):
    # do your thing here; this placeholder just copies lines through
    for line in infile:
        outfile.write(line)

def main(args):
    # handle args options
    filelist = walkdirs(args.inlist or ['-'])  # no inputs given: read stdin
    fout = open_output(args.outfile)
    for name in filelist:
        fin = open_forread(name)
        cat(fin, fout)
        if fin is not sys.stdin:
            fin.close()
    if fout is not sys.stdout:
        fout.close()

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('inlist', nargs='*')
    parser.add_argument('outfile')
    # add options
    args = parser.parse_args()
    main(args)


Here the parser requires the outfile name, even if it is just '-'. I could define it with nargs='?' to make it optional, but that doesn't go well with the nargs='*' of inlist.



Consider

myprog one two three

Is it

Namespace(inlist=['one', 'two', 'three'], outfile=default)

or

Namespace(inlist=['one', 'two'], outfile='three')

With a '*' positional followed by a '?' positional, the role of the last string is ambiguous: is it the last entry for inlist, or the optional entry for outfile? argparse chooses the first interpretation and never assigns a value to outfile (it keeps its default).
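The ambiguity is easy to check directly. A minimal sketch, assuming Python 3; `star_opt` and `star_req` are illustrative parser names, and '-' is used as the outfile default:

```python
import argparse

# inlist='*' followed by an optional outfile='?': the greedy '*' takes everything.
star_opt = argparse.ArgumentParser(prog='myprog')
star_opt.add_argument('inlist', nargs='*')
star_opt.add_argument('outfile', nargs='?', default='-')

# inlist='*' followed by a required outfile: argparse leaves the last string for outfile.
star_req = argparse.ArgumentParser(prog='myprog')
star_req.add_argument('inlist', nargs='*')
star_req.add_argument('outfile')

print(star_opt.parse_args(['one', 'two', 'three']))
# Namespace(inlist=['one', 'two', 'three'], outfile='-')
print(star_req.parse_args(['one', 'two', 'three']))
# Namespace(inlist=['one', 'two'], outfile='three')
```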

With --infile and --outfile definitions (optionals), the assignment of these strings is clear.
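A minimal sketch of that idea, assuming outfile becomes an -o/--outfile optional while the inputs stay positional (a mix of the two answers, not code from either):

```python
import argparse

# With outfile as an optional, only an explicit -o/--outfile can set it,
# so every bare string goes to inlist.
parser = argparse.ArgumentParser(prog='myprog')
parser.add_argument('inlist', nargs='*')
parser.add_argument('-o', '--outfile', default='-')

print(parser.parse_args(['one', 'two', 'three']))
# Namespace(inlist=['one', 'two', 'three'], outfile='-')
print(parser.parse_args(['one', 'two', '-o', 'three']))
# Namespace(inlist=['one', 'two'], outfile='three')
```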

In one sense, this problem is too complex for argparse: there is nothing in it to handle things like directories. In another sense, it is too simple: you can just as easily split sys.argv[1:] between inlist and outfile without argparse's help.
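A minimal sketch of that manual split, following the rules in the question. `split_args` and `parse` are hypothetical helper names, and `-v/--verbose` is only a placeholder for the real [options]; `parse_known_args` deals with the options and hands back the leftover strings:

```python
import argparse

def split_args(strings):
    """Split the non-option strings into (inputs, output):
      no strings   -> read stdin, write stdout
      one string   -> that input, write stdout
      two or more  -> the last one is the output, the rest are inputs
    '-' is passed through unchanged; opening files is left to the caller."""
    if not strings:
        return ['-'], '-'
    if len(strings) == 1:
        return list(strings), '-'
    return list(strings[:-1]), strings[-1]

def parse(argv=None):
    parser = argparse.ArgumentParser()
    # placeholder option, standing in for the real [options]
    parser.add_argument('-v', '--verbose', action='store_true')
    # argparse handles the options; the leftover strings come back in 'rest'
    args, rest = parser.parse_known_args(argv)
    inlist, outfile = split_args(rest)
    return args, inlist, outfile
```

For example, `parse(['-v', 'a', 'b', 'out'])` gives inputs `['a', 'b']` and output `'out'`. Directories in the input list still need expanding, as in walkdirs.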
