Python string handling and processing

I have some codes that I need to process. They come in several different formats, and I need to convert them all into the format I want:

Examples of codes:

ABC1.12 - correct format
ABC 1.22 - space between letters and numbers
ABC1.12/13 - two codes joined together, with the leading "1." missing from 13; this should become ABC1.12 and ABC1.13
ABC 1.12 / 1.13 - two codes joined together, with spaces


I know how to remove spaces, but I'm not sure how to handle codes that have been joined and shortened. I know I can use the split function to generate two codes, but I don't know how to add the letters (and the first number) to the second code. These are the third and fourth examples in the list above.


    val = "ABC1.12/13".replace(" ", "")  # code to process, spaces already removed
    retList = [val]
    if "/" in val:
        code1, code2 = val.split("/", 1)

        # for 'ABC1.12' this gives initial_letters = 'ABC1'
        initial_letters, numbers = code1.split(".", 1)
        if initial_letters not in code2:
            code2 = initial_letters + "." + code2

        # reset the list so that it returns both values
        retList = [code1, code2]


This does not handle the fourth example: there code2 is 1.13, so it becomes ABC1.1.13.
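One way to fix this approach, sketched under the assumption that every code is letters followed by number.number: check whether the second part already contains a ".". If it does, only the letters are missing; if not, both the letters and the leading number are missing. The function name `split_codes` is hypothetical.

```python
import string

def split_codes(val):
    """Expand a possibly joined code such as 'ABC1.12/13' into a list of full codes."""
    val = val.replace(" ", "")
    first, *rest = val.split("/")
    # Leading letters only, e.g. 'ABC' from 'ABC1.12'
    letters = first[: len(first) - len(first.lstrip(string.ascii_letters))]
    # Everything before the last '.', e.g. 'ABC1' from 'ABC1.12'
    prefix = first.rpartition(".")[0]
    codes = [first]
    for part in rest:
        if "." in part:
            # '1.13' already has its leading number; only the letters are missing
            codes.append(letters + part)
        else:
            # '13' is missing both the letters and the leading '1.'
            codes.append(prefix + "." + part)
    return codes

print(split_codes("ABC 1.12 / 1.13"))  # ['ABC1.12', 'ABC1.13']
```

The same function also handles the single-code examples, because `rest` is simply empty when there is no "/".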



5 answers

You can use a regex for this purpose.

A possible implementation would be as follows:

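    >>> import re
    >>> def foo(st):
    ...     parts = st.replace(' ', '').split("/")
    ...     # separate the letters from the first number: 'ABC1.12' -> ['ABC', '1.12']
    ...     parts = list(re.findall(r"^([A-Za-z]+)(.*)$", parts[0])[0]) + parts[1:]
    ...     parts = parts[0:1] + [x.split('.') for x in parts[1:]]
    ...     # a number without '.' (e.g. '13') borrows the leading number of the first code
    ...     parts = parts[0:1] + ['.'.join(x) if len(x) > 1 else '.'.join([parts[1][0], x[0]]) for x in parts[1:]]
    ...     return [parts[0] + p for p in parts[1:]]
    ...
    >>> foo('ABC1.12')
    ['ABC1.12']
    >>> foo('ABC 1.22')
    ['ABC1.22']
    >>> foo('ABC1.12/13')
    ['ABC1.12', 'ABC1.13']
    >>> foo('ABC 1.12 / 1.13')
    ['ABC1.12', 'ABC1.13']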




Are you familiar with regular expressions? They are worth exploring here. Also, account for codes separated by spaces, not just by the forward slash and the decimal point.
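A small illustration of the point about spaces, assuming codes only ever contain letters, digits, ".", "/" and spaces: a tokenizing regex makes the spacing irrelevant, because `re.findall` skips any character that none of the alternatives match. `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# One token per run of letters, per number (with an optional decimal part),
# or per '/' separator. Spaces match no alternative, so findall skips them.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|/")

def tokenize(code):
    return TOKEN_RE.findall(code)

print(tokenize("ABC 1.12 / 1.13"))  # ['ABC', '1.12', '/', '1.13']
print(tokenize("ABC1.12/13"))       # ['ABC', '1.12', '/', '13']
```

Note that this also silently skips any other unexpected character, so it tokenizes but does not validate the input.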



I suggest you write a regex for each part of the code pattern, and then form a larger regex that is a concatenation of the individual ones.
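A sketch of that idea, assuming a code is letters, then number.number, then any number of "/"-separated extras that are either full (1.13) or shortened (13). The sub-pattern names, `CODE_RE` and `expand` are all hypothetical; spaces are allowed between every piece.

```python
import re

# One small regex per piece of a code (illustrative names)
LETTERS = r"(?P<letters>[A-Za-z]+)"
FIRST = r"(?P<major>\d+)\.(?P<minor>\d+)"
EXTRA = r"(?:\d+\.)?\d+"  # either '1.13' or the shortened '13'

# The larger regex is the concatenation of the pieces, with optional spaces
CODE_RE = re.compile(
    rf"^\s*{LETTERS}\s*{FIRST}(?P<extras>(?:\s*/\s*{EXTRA})*)\s*$"
)

def expand(code):
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError(f"unrecognised code: {code!r}")
    letters, major = m.group("letters"), m.group("major")
    result = [f"{letters}{major}.{m.group('minor')}"]
    # EXTRA has no capturing groups, so findall returns each whole extra number
    for extra in re.findall(EXTRA, m.group("extras")):
        # A shortened extra such as '13' borrows the leading number of the first code
        result.append(f"{letters}{extra}" if "." in extra else f"{letters}{major}.{extra}")
    return result

for c in ["ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"]:
    print(expand(c))
```

Because the whole line must match, an input in an unexpected format raises an error instead of being silently mangled.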



Using PyParsing

The answer from @Abhijit is good, and there might be a regex for this simple problem. However, when dealing with parsing problems, you often need a more extensible solution that can grow with your problem. I find pyparsing great for this: you write a grammar that describes the input:

    from pyparsing import Combine, Group, OneOrMore, Optional, Word, ZeroOrMore, alphas, nums

    # The letter prefix of a code, e.g. "ABC"
    index = Combine(Word(alphas))

    # Define what a number is and convert it to a float
    number = Combine(Word(nums) + Optional('.' + Optional(Word(nums))))
    number.setParseAction(lambda x: float(x[0]))

    # What do extra numbers look like? A "/" (discarded) followed by a number
    marker = Word('/').suppress()
    extra_numbers = marker + number

    # Define what a possible line could be
    line_code = Group(index + number + ZeroOrMore(extra_numbers))
    grammar = OneOrMore(line_code)


With this grammar we can parse several lines at once:

    S = '''ABC1.12
    ABC 1.22
    XXX1.12/13/77/32
    XYZ 1.12 / 1.13'''
    print(grammar.parseString(S))

which prints:

    [['ABC', 1.12], ['ABC', 1.22], ['XXX', 1.12, 13.0, 77.0, 32.0], ['XYZ', 1.12, 1.13]]



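The numbers are now in a uniform format, because they are converted to floats during parsing. The grammar also copes with more than one extra number: look at the "XXX" entry, where 1.12, 13, 77 and 32 are all parsed whether or not they contain a decimal point. Note that a shortened code such as 13 comes out as 13.0; it is not expanded to 1.13.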
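To get from grouped results to full codes, a post-processing step is still needed. A minimal sketch, assuming the numbers are kept as strings (that is, the grammar is used without the float parse action, so "13" can still be told apart from "1.13") and that the groups look like plain lists; `expand_groups` is a hypothetical helper, and the sample data simply mirrors the output above.

```python
def expand_groups(groups):
    """Turn groups like ['ABC', '1.12', '13'] into full codes such as 'ABC1.13'."""
    codes = []
    for letters, first, *extras in groups:
        major = first.split(".", 1)[0]  # leading number, e.g. '1' from '1.12'
        codes.append(letters + first)
        for extra in extras:
            # A number without '.' is shortened and borrows the leading number
            codes.append(letters + (extra if "." in extra else major + "." + extra))
    return codes

groups = [["ABC", "1.12"], ["XXX", "1.12", "13", "77", "32"], ["XYZ", "1.12", "1.13"]]
print(expand_groups(groups))
# ['ABC1.12', 'XXX1.12', 'XXX1.13', 'XXX1.77', 'XXX1.32', 'XYZ1.12', 'XYZ1.13']
```

Keeping the numbers as strings also avoids a float side effect: a code such as 1.10 would otherwise come back as 1.1.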



Take a look at this approach; it may be the simplest way to do it.

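    # Python 3: input() replaces raw_input(), and print is a function
    val = input()

    # Find the position of the first digit; everything before it is the letter prefix
    for i, aChar in enumerate(val):
        if aChar.isnumeric():
            firstIndex = i
            break

    part1 = val[:firstIndex].strip()
    part2 = val[firstIndex:]

    if "/" not in part2:
        print(part1 + part2)
    elif " " not in part2:
        # e.g. '1.12/13': reuse the leading '1.' for every part after it
        codes = []
        divPart2 = part2.split(".")
        partCodes = divPart2[1].split("/")
        for aPart in partCodes:
            codes.append(part1 + divPart2[0] + "." + aPart)
        print(codes)
    else:
        # e.g. '1.12 / 1.13': every part already has its leading number
        codes = []
        divPart2 = part2.split("/")
        for aPart in divPart2:
            aPart = aPart.strip()
            codes.append(part1 + aPart)
        print(codes)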



