In Python, how to strip dollar signs and commas from dollar fields only
I am reading in a large text file with many columns, some of which hold dollar amounts and some of which do not, and I am trying to figure out how to strip the $ and , characters from the dollar fields only.
So, say I have:
a|b|c
$1,000|hi,you|$45.43
$300.03|$MS2|$55,000
where a and c are dollar fields and b is not. The output should be:
a|b|c
1000|hi,you|45.43
300.03|$MS2|55000
I thought regex would be the way to go, but I can't figure out how to express the replacement:
f=open('sample1_fixed.txt','wb')
for line in open('sample1.txt', 'rb'):
    new_line = re.sub(r'(\$\d+([,\.]\d+)?k?)',????, line)
    f.write(new_line)
f.close()
Anyone have an idea?
Thanks in advance.
If you're not tied to the idea of using a regex, I would suggest doing something simple, straightforward, and generally easy to read:
def convert_money(inval):
    # Only fields that start with "$" are candidates for conversion.
    if inval.startswith('$'):
        test_val = inval[1:].replace(",", "")
        try:
            # Check that what remains is numeric; "$MS2" fails here.
            float(test_val)
        except ValueError:
            pass
        else:
            inval = test_val
    return inval

def convert_string(s):
    return "|".join(map(convert_money, s.split("|")))

a = '$1,000|hi,you|$45.43'
b = '$300.03|$MS2|$55,000'
print(convert_string(a))
print(convert_string(b))
OUTPUT
1000|hi,you|45.43
300.03|$MS2|55000
It looks like you are processing an entire line of text at once. I think your first task should be to split each line into its columns, as a list or as separate variables. Once you have done that, your code for converting currency strings to numbers no longer has to worry about the other fields.
With the columns separated, I think there is perhaps an easier way to accomplish this task than with regular expressions. You can start from this SO question.
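The column-splitting step could be sketched with the standard csv module, which accepts | as a delimiter. This is only a sketch under assumptions not stated in the question: the dollar columns are known in advance (here a and c), every row has all columns, and no field contains double quotes. DOLLAR_COLUMNS and clean_rows are hypothetical names.

```python
import csv
import io
import sys

# Assumption: the names of the dollar columns are known up front.
DOLLAR_COLUMNS = {"a", "c"}

def clean_rows(infile, outfile):
    """Copy pipe-delimited rows, stripping "$" and "," from dollar columns only."""
    reader = csv.DictReader(infile, delimiter="|")
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames,
                            delimiter="|", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name in DOLLAR_COLUMNS:
            row[name] = row[name].replace("$", "").replace(",", "")
        writer.writerow(row)

sample = io.StringIO("a|b|c\n$1,000|hi,you|$45.43\n$300.03|$MS2|$55,000\n")
clean_rows(sample, sys.stdout)
# a|b|c
# 1000|hi,you|45.43
# 300.03|$MS2|55000
```

Because only the named columns are touched, values such as hi,you and $MS2 in column b pass through unchanged without any numeric check.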
If you really want to use a regular expression, then this pattern should work for you:
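[$,]
Replace every match with an empty string, and apply it only to the dollar fields, since it would also strip the comma from a field like hi,you. The pattern gets a little more complex if you have other currencies.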
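For the re.sub call in the question, one option is to pass a function as the replacement, so the commas can be removed from each matched amount. The following is a minimal sketch, assuming fields are separated by | and that a dollar field is a $ followed by digits, optional commas, and an optional decimal part. MONEY and strip_money are hypothetical names, and the pattern is my own, not the one from the question.

```python
import re

# A whole field of the form $1,000 or $45.43:
#   (?<![^|])  the "$" must be at the start of the line or right after "|"
#   (?=\||$)   the amount must run up to the next "|" or the end of the line
MONEY = re.compile(r'(?<![^|])\$(\d[\d,]*(?:\.\d+)?)(?=\||$)')

def strip_money(line):
    # The replacement function receives each match and returns the amount
    # without the "$" (it is outside group 1) and without commas.
    return MONEY.sub(lambda m: m.group(1).replace(",", ""), line)

print(strip_money("$1,000|hi,you|$45.43"))  # 1000|hi,you|45.43
print(strip_money("$300.03|$MS2|$55,000"))  # 300.03|$MS2|55000
```

Fields such as $MS2 are left alone because the pattern requires a digit right after the $. Lines ending in "\r\n" would need the line ending stripped first, since $ only matches before a bare trailing "\n".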
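Defining a blacklist and checking each character against it is an easy way to do it. Note that this strips the characters from every field, including non-dollar ones such as hi,you: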
blacklist = ("$", ",")  # characters to remove
# Text mode, so that iterating over a line yields characters rather than bytes.
with open('sample1.txt', 'r') as src, open('sample1_fixed.txt', 'w') as dst:
    for line in src:
        clean_line = "".join(c for c in line if c not in blacklist)
        dst.write(clean_line)