Pyparsing delimitedList only returns the first element

Here is my code:

from pyparsing import Word, alphanums, delimitedList

l = "1.3E-2   2.5E+1"
parser = Word(alphanums + '+-.')
grammar = delimitedList(parser, delim='\t ')
print(grammar.parseString(l))

      

It returns:

['1.3E-2']

      

I want both values, not just the first one. What is going on?



2 answers


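It works if you switch to raw strings:

from pyparsing import Word, alphanums, delimitedList

# Raw strings: both contain a literal backslash followed by 't', not a tab character.
l = r"1.3E-2\t2.5E+1"
parser = Word(alphanums + '+-.')
grammar = delimitedList(parser, delim=r'\t')
print(grammar.parseString(l))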

      

This prints:



['1.3E-2', '2.5E+1']

      

Typically, delimitedList works with something like PDPDP, where P is the expression being parsed and D is the delimiter or separating sequence.

You have delim='\t '. That is specifically a delimiter of one tab followed by one space; it does not mean "either a tab or a space".
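
As a rough sketch of the PDPDP pattern, here is the same Word expression with a non-whitespace delimiter. The comma-separated input string is made up for illustration and is not the original poster's data.

```python
from pyparsing import Word, alphanums, delimitedList

# Hypothetical input: the same two values, separated by a comma (the D in PDPDP).
text = "1.3E-2, 2.5E+1"
value = Word(alphanums + "+-.")

# delimitedList uses "," as its default delimiter and drops it from the results.
comma_values = delimitedList(value).parseString(text).asList()
print(comma_values)
```

The whitespace after the comma is skipped implicitly, so only the two values remain in the result.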



As @dawg explains, delimitedList is intended for cases where you have an expression separated by non-whitespace delimiters, usually commas. Pyparsing implicitly skips whitespace, so in the pyparsing world, what you are really looking at is not a delimitedList, but OneOrMore(realnumber). Also, parseString internally calls str.expandtabs on the supplied input string unless you have called parseWithTabs() on the grammar. Expanding tabs into spaces helps keep data columns aligned when the input is in tabular form, and when I originally wrote pyparsing this was a common use case.

If you have control over this data, you can use a separator other than <TAB>, perhaps commas or semicolons. If you are stuck with this format but still want to use pyparsing, use OneOrMore.
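
A minimal sketch of the OneOrMore approach, keeping the question's loose Word expression for now. The input string here is an assumption that mixes a tab and spaces to show that both are skipped as ordinary whitespace.

```python
from pyparsing import OneOrMore, Word, alphanums

# Hypothetical input: values separated by a tab and by runs of spaces.
line = "1.3E-2\t2.5E+1   4.0E0"
token = Word(alphanums + "+-.")

# No delimiter is declared; pyparsing skips the whitespace between tokens.
tokens = OneOrMore(token).parseString(line).asList()
print(tokens)
```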

As you move forward, you will also want to be more precise in the expressions you define and the variable names you use. The name "parser" is not very informative, and the pattern Word(alphanums+'+-.') will match a lot of things besides valid real values in scientific notation. I understand that if you are just trying to get something to work, this is a reasonable first cut, and you can go back and tighten it up once you have something working. If you are actually going to parse real numbers, here is an expression that might be useful:



realnum = Regex(r'[+-]?\d+\.\d*([eE][+-]?\d+)?').setParseAction(lambda t: float(t[0]))

      

Then you can define your grammar as OneOrMore(realnum), which is also much clearer. The parse action converts your strings to floats at parse time, which saves you a step later when you actually work with the parsed values.
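
Putting those pieces together, a small sketch might look like this. It uses the input string from the question; the variable names other than realnum are chosen here for illustration.

```python
from pyparsing import OneOrMore, Regex

# Real number with a required decimal point and an optional exponent,
# converted to float by the parse action.
realnum = Regex(r'[+-]?\d+\.\d*([eE][+-]?\d+)?').setParseAction(lambda t: float(t[0]))
real_grammar = OneOrMore(realnum)

data = "1.3E-2   2.5E+1"
floats = real_grammar.parseString(data).asList()
print(floats)
```

Note that this regex requires a decimal point, so a bare integer such as "25" would not match.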

Good luck!
