In Python, how to strip dollar signs and commas from dollar fields only

I am reading in a large text file with lots of columns, some dollar-related and some not, and I am trying to figure out how to strip the $ and , characters from the dollar fields only.

So say I have:

a|b|c
$1,000|hi,you|$45.43
$300.03|$MS2|$55,000

      

where a and c are dollar fields and b is not. The output should be:

a|b|c
1000|hi,you|45.43
300.03|$MS2|55000

      

I thought regex would be the way to go, but I can't figure out how to express the replacement:

f=open('sample1_fixed.txt','wb')

for line in open('sample1.txt', 'rb'):
    new_line = re.sub(r'(\$\d+([,\.]\d+)?k?)',????, line)
    f.write(new_line)

f.close()

      

Anyone have an idea?

Thanks in advance.

+3




8 answers


Simple approach:



>>> import re
>>> exp = r'\$\d+(,|\.)?\d+'
>>> s = '$1,000|hi,you|$45.43'
>>> '|'.join(i.replace('$', '').replace(',', '') if re.match(exp, i) else i for i in s.split('|'))
'1000|hi,you|45.43'

      

+3




If you're not tied to the idea of using a regex, I would suggest doing something simple, straightforward, and generally easy to read:

def convert_money(inval):
    if inval.startswith('$'):
        test_val = inval[1:].replace(",", "")
        try:
            _ = float(test_val)
        except ValueError:
            pass
        else:
            inval = test_val

    return inval


def convert_string(s):
    return "|".join(map(convert_money, s.split("|")))


a = '$1,000|hi,you|$45.43'
b = '$300.03|$MS2|$55,000'

print(convert_string(a))
print(convert_string(b))

      



OUTPUT

1000|hi,you|45.43
300.03|$MS2|55000
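
To apply the same per-field check to the file from the question, it can be run line by line. This is only a minimal sketch: `strip_dollar_field` and `clean_lines` are hypothetical helper names, and the in-memory sample stands in for `sample1.txt`.

```python
import io


def strip_dollar_field(field):
    """Return the field without '$' and ',' if it is a dollar amount, else unchanged."""
    if field.startswith("$"):
        candidate = field[1:].replace(",", "")
        try:
            float(candidate)  # only used to check that the rest is numeric
        except ValueError:
            return field
        return candidate
    return field


def clean_lines(lines, sep="|"):
    """Yield each line with its dollar fields cleaned, keeping the line ending."""
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        yield sep.join(strip_dollar_field(f) for f in body.split(sep)) + ending


# Stand-in for open('sample1.txt'); any iterable of text lines works.
sample = io.StringIO("a|b|c\n$1,000|hi,you|$45.43\n$300.03|$MS2|$55,000\n")
cleaned = "".join(clean_lines(sample))
print(cleaned)
```

Because the generator handles one line at a time, a large file never has to be held in memory. Keep in mind that `float()` also accepts strings such as `1e5` or `nan`, so a stricter check may be needed for messy data.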

      

+4




Use this regex:

((?<=\d),(?=\d))|(\$(?=\d))

      

e.g.,

>>> import re
>>> x = "$1,000|hi,you|$45.43"
>>> re.sub(r'((?<=\d),(?=\d))|(\$(?=\d))', r'', x)
'1000|hi,you|45.43'
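
One thing worth checking with this pattern is that the lookarounds only look at neighbouring digits, not at which field they are in. A small sketch on both sample rows, plus a third row that I made up to show a comma between digits in a non-dollar field:

```python
import re

# ',' between two digits, or '$' directly before a digit
pattern = re.compile(r"(?<=\d),(?=\d)|\$(?=\d)")

rows = [
    "$1,000|hi,you|$45.43",
    "$300.03|$MS2|$55,000",
    "$7|item 1,2|ok",  # made-up row: non-dollar field containing digit,digit
]
results = [pattern.sub("", row) for row in rows]
for row, result in zip(rows, results):
    print(row, "->", result)
```

The two rows from the question come out as wanted, but in the made-up row `item 1,2` becomes `item 12`. If the non-dollar columns can contain text like that, splitting on `|` first is probably the safer route.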

      

0




Try the following regex and then replace each match with \1\2\3

\$(\d+(?:\.\d+)?)(?:(?:,(\d{2}))*(?:,(\d{3})))?

      

DEMO
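
In Python, groups 2 and 3 of this pattern often take no part in a match, and Python versions before 3.5 raise an error when a replacement template refers to such a group. A hedged sketch that sidesteps this with a replacement function (`join_groups` is a hypothetical name, not part of the answer above):

```python
import re

pattern = re.compile(r"\$(\d+(?:\.\d+)?)(?:(?:,(\d{2}))*(?:,(\d{3})))?")


def join_groups(match):
    # Groups that did not take part in the match are None; treat them as "".
    return "".join(group or "" for group in match.groups())


rows = ["$1,000|hi,you|$45.43", "$300.03|$MS2|$55,000"]
results = [pattern.sub(join_groups, row) for row in rows]
for result in results:
    print(result)
```

I have only traced this against the two sample rows from the question, so other amount formats should be tested before relying on it.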

0




Try this regex if needed.

\$(\d+)[\,]*([\.]*\d*)

      

WATCH DEMO: http://regex101.com/r/wM0zB6/2
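
The answer does not say what to replace the match with. Assuming the intent is to keep the two captured groups, `\1\2` seems to be the natural replacement; a minimal sketch on the sample rows:

```python
import re

pattern = re.compile(r"\$(\d+)[\,]*([\.]*\d*)")

rows = ["$1,000|hi,you|$45.43", "$300.03|$MS2|$55,000"]
# Keep the digits before and after the optional comma, drop '$' and ','.
results = [pattern.sub(r"\1\2", row) for row in rows]
for result in results:
    print(result)
```

Note that the pattern only skips commas at one position, so an amount with more than one comma such as `$1,000,000` would keep its second comma.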

0




It looks like you are processing an entire line of text at once. I think your first task should be to split each line column by column into a list or some other variables. Once you have done that, your solution for converting currency strings to numbers won't have to bother with the other fields.

Once you have done that, I think there is perhaps an easier way to accomplish this task than with regular expressions. You can start from this SO question.
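
The column-by-column idea could be sketched with the standard `csv` module. This assumes you already know which columns hold dollar amounts (`a` and `c` in the question); `DOLLAR_COLUMNS` and `clean_amount` are names invented for the example, and the in-memory buffers stand in for the real input and output files.

```python
import csv
import io

DOLLAR_COLUMNS = {"a", "c"}  # assumption: dollar columns are known by header name


def clean_amount(value):
    """Drop '$' and ',' from a value that is known to be a dollar amount."""
    return value.replace("$", "").replace(",", "")


# Stand-ins for the input and output files.
source = io.StringIO("a|b|c\n$1,000|hi,you|$45.43\n$300.03|$MS2|$55,000\n")
target = io.StringIO()

reader = csv.DictReader(source, delimiter="|")
writer = csv.DictWriter(
    target, fieldnames=reader.fieldnames, delimiter="|", lineterminator="\n"
)
writer.writeheader()
for row in reader:
    for name in DOLLAR_COLUMNS:
        row[name] = clean_amount(row[name])
    writer.writerow(row)

output = target.getvalue()
print(output)
```

Since only the named columns are touched, `hi,you` and `$MS2` in column `b` stay exactly as they are. With real files, open both with `newline=""` as the `csv` documentation recommends.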

If you really want to use a regular expression, then this pattern should work for you:

/[$,]/g

      

Demo on regex101

Replace matches with an empty string. The pattern gets a little more complex if you have other currencies.

0




Defining a blacklist and checking for characters in it is an easy way to do it (note that this removes the characters from every field, not just the dollar fields):

blacklist = ("$", ",")  # characters to remove
with open('sample1.txt', 'r') as src, open('sample1_fixed.txt', 'w') as f:
    for line in src:
        # keep every character that is not in the blacklist
        clean_line = "".join(c for c in line if c not in blacklist)
        f.write(clean_line)

      

0




\$(?=(?:[^|]+,)|(?:[^|]+\.))

      

Try this. Replace with an empty string. Use the re.M option. See demo:

http://regex101.com/r/gT6kI4/6

0








