Python - search a .txt file for an ID, then return a value from a line below it

In Python, I'm trying (very badly) to read a .txt file, find the last occurrence of a line referencing a specific client, and read a few lines below that to get their current point balance.

A snapshot of the .txt file:

Customer ID:123
Total sale amount:2345.45

Points from sale:23
Points until next bonus: 77

      

I can find a specific customer ID, but I can't figure out how to search for the last occurrence of that ID or how to return the "Points until next bonus" value ... I am sorry if this is a basic question, but any help would be greatly appreciated!

My code so far ...

def reward_points ():

    #current points total
    rewards = open('sales.txt', 'r')

    line = rewards.readlines()
    search = (str('Customer ID:') + str(Cust_ID))
    print(search) #Customer ID:123

    while line != ' ':
        if line.startswith(search):
            find('Points until next bonus:')
            current_point_total = line[50:52]
            cust_record = rewards.readlines()
            print(current_point_total)


    rewards.close()

      

reward_points ()



5 answers


I think you'd be better off parsing the file into structured data rather than trying to search a file that is not in a particularly usable format.

Here's the suggested approach:

Iterate over the file one line at a time

Split each line into a label and a value by matching on ':'

Put the labels and values that describe one customer block in a dict

Place that dict in another dict, keyed on the customer ID



Then you have an in-memory database that you can query with plain dict lookups,

e.g. customers['123'][-1]['Points until next bonus']

Here's simplified example code for this approach:

#!/usr/bin/env python
import re

# dictionary holding all the customers
customers = {}
# ID of the block currently being read (None until the first Customer ID line)
current_id = None

with open("sales.txt") as f:
    # one line at a time
    for line in f:
        # pattern match on 'key:value'
        field_match = re.match(r'^(.*):(.*)$', line)
        if not field_match:
            continue

        # store the fields in variables, dropping surrounding whitespace
        key, value = (part.strip() for part in field_match.groups())

        # Customer ID means a new record
        if key == "Customer ID":
            # set a key for the 'customers database'
            current_id = value
            # the first time we see this id, create an empty list for it,
            # then append a new dict for this block
            customers.setdefault(current_id, []).append({})

        # ignore any fields that appear before the first Customer ID line
        if current_id is None:
            continue

        # store the key and value in the dict at the end of the list
        customers[current_id][-1][key] = value

# now customers is a "database" indexed on customer id
#  where the values are a list of dicts, one per data block
#
# -1 indexes the last item of the list
# so the last customer record for "123" is

print(customers["123"][-1]["Points until next bonus"])

      

Update

I didn't realize that a customer can have multiple blocks and that their order matters, so I reworked the sample code to keep an ordered list of the data blocks seen for each customer ID.
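To illustrate why keeping a list per customer helps, here is a small self-contained sketch of the same parsing idea run against an in-memory sample instead of sales.txt. The helper name parse_blocks and the sample data (in particular the second block for customer 123) are made up for illustration.

```python
import io
import re

# Hypothetical sample data: customer 123 appears twice, customer 124 once.
SAMPLE = """\
Customer ID:123
Total sale amount:2345.45

Points from sale:23
Points until next bonus: 77

Customer ID:124
Total sale amount:245.45

Points from sale:27
Points until next bonus: 79

Customer ID:123
Total sale amount:100.00

Points from sale:1
Points until next bonus: 76
"""


def parse_blocks(lines):
    """Return {customer_id: [block_dict, ...]} with blocks in file order."""
    customers = {}
    current_id = None
    for line in lines:
        match = re.match(r"^(.*?):(.*)$", line)
        if not match:
            continue  # blank line or a line without a ':'
        key, value = (part.strip() for part in match.groups())
        if key == "Customer ID":
            current_id = value
            customers.setdefault(current_id, []).append({})
        if current_id is not None:
            customers[current_id][-1][key] = value
    return customers


customers = parse_blocks(io.StringIO(SAMPLE))
first_block = customers["123"][0]   # oldest block for customer 123
last_block = customers["123"][-1]   # most recent block for customer 123
print(len(customers["123"]), first_block["Points until next bonus"],
      last_block["Points until next bonus"])
```

Indexing with [-1] gives the most recent block, while the earlier blocks stay available if the history is ever needed.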



This is a good use case for itertools.groupby(), and the problem fits well with this pattern.

Example:

from itertools import groupby


def search(d):
    """Key function used to group our dataset"""

    return d[0] == "Customer ID"


def read_customer_records(filename):
    """Read customer records and return a nicer data structure"""

    data = {}

    with open(filename, "r") as f:
        # clean and remove blank lines
        # (in Python 3 the built-in map and filter are lazy iterators)
        lines = filter(None, map(str.strip, f))

        # split each line on the first ':' token
        lines = (line.split(":", 1) for line in lines)

        # iterate through each customer and their records
        for newcustomer, records in groupby(lines, search):
            if newcustomer:
                # we've found a new customer
                # create a new dict against their customer id
                customer_id = list(records)[0][1]
                data[customer_id] = {}
            else:
                # we've found customer records
                # add each key/value pair (split on ':')
                # to the customer record from above
                for k, v in records:
                    data[customer_id][k] = v

    return data

      

Output:

>>> read_customer_records("foo.txt")
{'123': {'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}, '124': {'Total sale amount': '245.45', 'Points until next bonus': ' 79', 'Points from sale': '27'}}

      



Then you can look customers up directly, e.g.:

>>> data = read_customer_records("foo.txt")
>>> data["123"]
{'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}
>>> data["123"]["Points until next bonus"]
' 77'

      

Basically, what we're doing here is "grouping" the dataset on the Customer ID: lines. Then we build a data structure (a dict) that we can easily search in O(1).

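Note: as long as the "customer records" in your "dataset" are separated by Customer ID: lines, this will work regardless of the number of customer "records". If the same ID appears more than once, the later block replaces the earlier one, so a lookup returns the last occurrence. This implementation also tries to cope with "messy" data as far as possible by cleaning the input a little as it is read.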
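To make the "grouping" step more concrete, here is a minimal sketch of what groupby() yields for this kind of data. The pairs list is a made-up stand-in for the already-split lines, and is_customer_line is a hypothetical name playing the role of the key function above.

```python
from itertools import groupby

# Hypothetical, already-split lines in file order: [label, value] pairs.
pairs = [
    ["Customer ID", "123"],
    ["Total sale amount", "2345.45"],
    ["Points from sale", "23"],
    ["Customer ID", "124"],
    ["Total sale amount", "245.45"],
]


def is_customer_line(pair):
    """Key function: True for a 'Customer ID' pair, False for anything else."""
    return pair[0] == "Customer ID"


# groupby starts a new group every time the key changes, so the result
# alternates between a header group (True) and that customer's fields (False).
runs = [(key, list(group)) for key, group in groupby(pairs, is_customer_line)]

for key, group in runs:
    print(key, group)
```

Because groupby only groups consecutive items, the input must stay in file order; sorting it first would separate each customer's fields from their ID line.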



I would approach this a little more broadly. If I am not mistaken, you have a file of records in a fixed format, where each record starts and ends with **. Why not do the following?

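search = "Customer ID:123"  # the ID line to look for (example value)
customer_dictionary = {}
records = file_content.split("**")  # file_content: the whole file as one string
for record in records:
    first_line = record.strip().split("\n")[0]
    if first_line == search:
        customer_id = first_line.split(":", 1)[1]
        # a later record overwrites an earlier one for the same customer
        customer_dictionary[customer_id] = record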

      

This maps the customer_id to its latest record. You can then parse that record to get the information you need.

EDIT: Since there are always 9 lines per record, you can get a list of all lines in the file and create a list of records where the record will be represented as a list of 9 lines. You can use the answer posted here:

Convert list to list of python tuples
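A minimal sketch of that chunking idea, assuming every record really does have the same number of lines. The function names are made up for illustration, and the demo uses 3-line records with invented data only to stay short; for the real file you would pass 9 and build lines with something like [line.rstrip("\n") for line in f].

```python
def chunk_records(lines, lines_per_record):
    """Split a flat list of lines into consecutive fixed-length records."""
    return [
        lines[i:i + lines_per_record]
        for i in range(0, len(lines), lines_per_record)
    ]


def last_record_for(records, id_line):
    """Return the last record containing id_line, or None if there is none."""
    for record in reversed(records):
        if id_line in record:
            return record
    return None


# Hypothetical demo data: three records of three lines each.
lines = [
    "Customer ID:123", "Points from sale:23", "Points until next bonus: 77",
    "Customer ID:124", "Points from sale:27", "Points until next bonus: 79",
    "Customer ID:123", "Points from sale:1", "Points until next bonus: 76",
]

records = chunk_records(lines, 3)
last = last_record_for(records, "Customer ID:123")
print(last)
```

Walking the records in reverse means the search can stop at the first match, which is the last occurrence in the file.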



All you have to do is find the lines that match Customer ID:123. When you find one, loop over the file object in an inner loop until you reach the line starting with Points until, then extract the points. After the outer loop finishes, points holds the value from the last occurrence of the customer with that ID.

with open("test.txt") as f:
    points = ""
    for line in f:
        if line.rstrip() == "Customer ID:123":
            for line in f:
                if line.startswith("Points until"):
                    points = line.rsplit(None, 1)[1]
                    break

print(points)
77

      



def get_points_until_next_bonus(filename, customerID):
    with open(filename, 'r') as f:
        content = f.read()
    # get everything after the last "Customer ID:<id>" line; the trailing
    # newline stops ID 123 from also matching 1234
    last_id = content.split('Customer ID:' + str(customerID) + '\n')[-1]
    # get the value from the first "Points until next bonus:" line after it
    return last_id.split('Points until next bonus:')[1].split('\n')[0].strip()

      
