Python - find a customer ID in a .txt file, then return a value from a line below it
In Python, I'm trying (very badly) to read a .txt file, find the last occurrence of a line referencing a specific client, and read a few lines below that to get their current point balance.
A snapshot of the .txt file:
Customer ID:123
Total sale amount:2345.45
Points from sale:23
Points until next bonus: 77
I can find a specific customer ID, but I can't figure out how to search for the last occurrence of that ID or how to return the "Points until next bonus" value. I am sorry if this is a basic question, but any help would be greatly appreciated!
My code so far ...
def reward_points ():
    #current points total
    rewards = open('sales.txt', 'r')
    line = rewards.readlines()
    search = (str('Customer ID:') + str(Cust_ID))
    print(search) #Customer ID:123
    while line != ' ':
        if line.startswith(search):
            find('Points until next bonus:')
            current_point_total = line[50:52]
            cust_record = rewards.readlines()
            print(current_point_total)
    rewards.close()

reward_points ()
I think you'd be better off parsing the file into structured data rather than searching a file whose format is not particularly convenient to work with.
Here's the suggested approach:
Iterate over the file line by line
Split each line into a label and a value at the ':'
Put the labels and values describing the customer in a dict
Place the dict representing the customer in another dict, keyed by customer ID
Then you have an in-memory database that you can query with ordinary dict lookups,
e.g. customers['123']['Points until next bonus']
Here's simplified example code for this approach:
#!/usr/bin/env python
import re

# dictionary with all the customers in it
customers = dict()
current_id = None

with open("sales.txt") as f:
    # one line at a time
    for line in f:
        # pattern match on 'key:value'
        field_match = re.match(r'^(.*):(.*)$', line)
        if field_match:
            # store the fields in variables, trimming surrounding whitespace
            key, value = (s.strip() for s in field_match.groups())
            # Customer ID means a new record
            if key == "Customer ID":
                # set a key for the 'customers database'
                current_id = value
                # if we have never seen this id before, make a record for it
                if current_id not in customers:
                    customers[current_id] = []
                # make the record an ordered list of dicts, one per block
                customers[current_id].append(dict())
            # store the key and value in the dictionary at the end of the list
            # (lines before the first Customer ID are ignored)
            if current_id is not None:
                customers[current_id][-1][key] = value

# now customers is a "database" indexed on customer id
# where the values are a list of dicts of each data block
#
# -1 indexes the last of the list
# so the last customer record for "123" is
print(customers["123"][-1]["Points until next bonus"])
Update
I didn't realize that you have multiple blocks per customer and that their order matters, so I reworked the sample code to keep an ordered list of the data blocks processed for each customer ID.
This is a good use case for itertools.groupby(); your data fits this pattern well:
Example:
from itertools import groupby

def search(d):
    """Key function used to group our dataset"""
    return d[0] == "Customer ID"

def read_customer_records(filename):
    """Read customer records and return a nicer data structure"""
    data = {}
    with open(filename, "r") as f:
        # clean and remove blank lines
        # (Python 3: built-in filter/map replace itertools.ifilter/imap)
        lines = filter(None, map(str.strip, f))
        # split each line on the first ':' token
        lines = (line.split(":", 1) for line in lines)
        # iterate through each customer and their records
        for newcustomer, records in groupby(lines, search):
            if newcustomer:
                # we've found a new customer
                # create a new dict against their customer id
                # (a later block for the same id replaces the earlier one)
                customer_id = list(records)[0][1]
                data[customer_id] = {}
            else:
                # we've found customer records
                # add each key/value pair (split on ':')
                # to the customer record from above
                for k, v in records:
                    data[customer_id][k] = v
    return data
Output:
>>> read_customer_records("foo.txt")
{'123': {'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}, '124': {'Total sale amount': '245.45', 'Points until next bonus': ' 79', 'Points from sale': '27'}}
Then you can look up customers directly, e.g.:
>>> data = read_customer_records("foo.txt")
>>> data["123"]
{'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}
>>> data["123"]["Points until next bonus"]
' 77'
Basically, what we're doing here is "grouping" the dataset on the Customer ID: rows. Then we build a data structure (a dict) in which any customer can be looked up in O(1) time.
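To see what groupby() hands back inside that loop, here is a small sketch that runs the same key function over a hard-coded list of rows that have already been split on the first ':' (the sample rows are made up for illustration). Consecutive rows with the same key value form one group, so ID rows and record rows alternate:

```python
from itertools import groupby

def search(d):
    """Key function: True for 'Customer ID' rows, False for all others."""
    return d[0] == "Customer ID"

# hypothetical sample rows, already split on the first ':'
rows = [
    ["Customer ID", "123"],
    ["Total sale amount", "2345.45"],
    ["Points until next bonus", " 77"],
    ["Customer ID", "124"],
    ["Points until next bonus", " 79"],
]

# materialize each run with list() before groupby advances to the next group
groups = [(is_id, list(run)) for is_id, run in groupby(rows, search)]
for is_id, run in groups:
    print(is_id, run)
```

Each True group holds a single ID row and the False group after it holds that customer's fields, which is exactly the pairing read_customer_records relies on.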
Note: as long as the customer records in your dataset are separated by Customer ID lines, this will work regardless of how many fields each record contains. This implementation also tries to cope with "messy" data by stripping whitespace and skipping blank lines in the input.
I would approach this a little more broadly. If I am not mistaken, you have a file of records in a fixed format, where each record starts and ends with **. Why not do the following?
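with open("sales.txt") as f:
    file_content = f.read()
search = "Customer ID:123"
customer_dictionary = {}
for record in file_content.split("**"):
    first_line = record.strip().split("\n")[0]
    if first_line == search:
        # later records overwrite earlier ones, so the latest one is kept
        customer_id = first_line.split(":", 1)[1]
        customer_dictionary[customer_id] = record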
This maps the customer_id to that customer's latest record. You can then parse the record to get the information you need.
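One way to do that parsing, as a sketch that assumes every line inside a record is a `label:value` pair like the snapshot in the question (parse_record is a hypothetical helper name, not something from the answer above):

```python
def parse_record(record):
    """Turn a record's 'label:value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in record.strip().splitlines():
        if ":" in line:  # skip separator or other non-field lines
            label, value = line.split(":", 1)
            fields[label.strip()] = value.strip()
    return fields

# sample record copied from the question's snapshot
record = """
Customer ID:123
Total sale amount:2345.45
Points from sale:23
Points until next bonus: 77
"""
fields = parse_record(record)
print(int(fields["Points until next bonus"]))
```

Stripping both label and value means the leading space in "Points until next bonus: 77" does not leak into the result, so the value converts cleanly with int().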
EDIT: Since there are always 9 lines per record, you can get a list of all lines in the file and create a list of records where the record will be represented as a list of 9 lines. You can use the answer posted here:
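Independently of that linked answer, a minimal sketch of the fixed-size grouping, assuming (as stated above) that every record has exactly the same number of lines; the function names and the placeholder lines are illustrative only:

```python
def split_into_records(lines, lines_per_record=9):
    """Group a flat list of lines into consecutive fixed-size records.

    Assumes every record has the same number of lines; a short
    trailing chunk, if any, is returned as-is.
    """
    return [lines[i:i + lines_per_record]
            for i in range(0, len(lines), lines_per_record)]

def latest_record(records, search):
    """Return the last record that contains the exact line `search`, or None."""
    for record in reversed(records):
        if search in record:
            return record
    return None

# with a real file, e.g.:
#   with open("sales.txt") as f:
#       lines = f.read().splitlines()
lines = ["line %d" % n for n in range(18)]
records = split_into_records(lines)
print(len(records), records[1][0])
```

Walking the records in reverse means the first match is the customer's most recent record.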
All you have to do is find the lines equal to Customer ID:123; when you find one, loop over the file object in an inner loop until you reach the line starting with Points until, then extract the points. points will end up holding the value from the last occurrence of the customer with that ID.
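with open("test.txt") as f:
    points = ""
    for line in f:
        # found a block for this customer
        if line.rstrip() == "Customer ID:123":
            # keep reading the same file object until the points line
            for line in f:
                if line.startswith("Points until"):
                    # last whitespace-separated token, e.g. '77'
                    points = line.rsplit(None, 1)[1]
                    break
print(points)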
77
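The inner loop works because both for statements pull from the same iterator: after the inner loop breaks, the outer loop resumes just past the Points line instead of starting over. Here is the same idea wrapped in a function that accepts any iterable of lines, so it can be tried without a file; io.StringIO stands in for the file, and the function name and sample data are illustrative assumptions:

```python
import io

def last_points(lines, customer_id):
    """Return the 'Points until' value from the last block for customer_id."""
    target = "Customer ID:" + str(customer_id)
    lines = iter(lines)  # one shared iterator for both loops
    points = None
    for line in lines:
        if line.rstrip() == target:
            # the inner loop consumes lines from the same iterator
            for inner in lines:
                if inner.startswith("Points until"):
                    points = inner.rsplit(None, 1)[1]
                    break
    return points

sample = io.StringIO(
    "Customer ID:123\nPoints until next bonus: 77\n"
    "Customer ID:124\nPoints until next bonus: 79\n"
    "Customer ID:123\nPoints until next bonus: 54\n"
)
print(last_points(sample, 123))
```

Because each later match overwrites points, the value returned comes from the customer's last block (54 in this made-up sample).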
def get_points_until_next_bonus(filename, customerID):
    with open(filename) as f:
        content = f.read()
    # text after the last "Customer ID:<id>" line (matching the newline stops 123 matching 1234)
    chunks = content.split('Customer ID:' + str(customerID) + '\n')
    if len(chunks) < 2:
        return None  # customer ID not found
    # first "Points until next bonus: 77" line after it -> '77'
    return chunks[-1].split('Points until next bonus: ')[1].split('\n')[0]