How to extract text in front of a sample using Python 3?

Question

Here is a sample entry that I have.

Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never

I would like to extract the data that follows each field label, without the label itself. The output should look like this:

9211
Administrator first
Administrator
first
Administrator@example.com
1999-12-23 3:8:52
2000-06-10 4:8:55
Never

The value Administrator first should be extracted and also split into its words, as shown above.
I tried the following in order to extract User name from the sample, but didn't get any output.

re.findall(r'User name:           (\w+)', i)

Please let me know how I can achieve this. The output should contain only the extracted data, not the spaces that precede it.

python python-3.x regex text-processing python-textprocessing

Jaffer wilson May 11 '17 at 8:32

4 answers

Jan · Answer 1 · 2017-05-11T08:40:27+0000

You can use a dict comprehension:

import re

string = """
Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never
"""

rx = re.compile(r'^(?P<key>[^:\n]+):\s*(?P<value>.+)', re.MULTILINE)
result = {m.group('key'): m.group('value') for m in rx.finditer(string)}
print(result)

Then just query your dict, e.g. result['User name']. See the demo at ideone.com.
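To get the exact output the question asks for, the same pattern can be reused to collect the values in order and add the individual words of the user name. This is only a minimal sketch: the helper name extract_values and its split_key parameter are made up for illustration and are not part of the answer above.

```python
import re

# Sample record copied from the question.
text = """\
Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never
"""

# Same pattern as in the answer: key up to the first colon, then the value.
rx = re.compile(r'^(?P<key>[^:\n]+):\s*(?P<value>.+)', re.MULTILINE)

def extract_values(record, split_key='User name'):
    """Return the values in order; the split_key value is followed by its words.

    Hypothetical helper, written only for this illustration.
    """
    values = []
    for m in rx.finditer(record):
        value = m.group('value').strip()
        values.append(value)
        if m.group('key') == split_key:
            # 'Administrator first' -> 'Administrator', 'first'
            values.extend(value.split())
    return values

output = extract_values(text)
print('\n'.join(output))
```

Keeping the values in a list rather than a dict preserves the line order of the record, which is what the requested output relies on.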

If you have multiple records, and the records always have the same format (i.e. they start with Record ID and end with Account expires), you can wrap another expression and a class around it, which gives you a list of dictionaries:

import re
string = """
Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never

Record ID:           9390
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never
"""

class Analyzer:
    ''' Parses the input string and returns matched entries '''
    rx_parts = re.compile(r'^Record ID:(?s:.+?)^Account expires:.+', re.MULTILINE)
    rx_entries = re.compile(r'^(?P<key>[^:\n]+):\s*(?P<value>.+)', re.MULTILINE)
    result = list()

    def __init__(self, input_string=''):
        # Build one dict per record: split into records first, then into key/value pairs
        self.result = [{m.group('key'): m.group('value')
                        for m in self.rx_entries.finditer(part.group(0))}
                       for part in self.rx_parts.finditer(input_string)]

    def query(self, key=None, value=None):
        try:
            subset = [item for item in self.result if item[key] == value]
        except KeyError:
            subset = []
        return subset

a = Analyzer(string)
admin = a.query(key = 'Record ID', value='9390')
print(admin)

user3598726 · Answer 2 · 2017-05-11T08:42:02+0000

You can use a naive approach:

text = """Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never"""

# cut text at newline chars
for line in text.splitlines():
    # find the first ':'
    idx=line.index(':')
    # remove spaces from the start
    strippedLine = line[idx+1:].lstrip()
    if 'User name' in line:
        print (strippedLine)
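A variation on the same idea, as a rough sketch: str.partition splits at the first ':' like line.index(':') does, but it does not raise an error on lines without a colon (blank lines, for example). The function name parse_record is an assumption made for this example, not something from the answer.

```python
text = """Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never"""

def parse_record(record):
    """Map each field label to its value (hypothetical helper for illustration)."""
    fields = {}
    for line in record.splitlines():
        # partition splits at the FIRST ':' only, so times like 3:8:52 stay intact
        key, sep, value = line.partition(':')
        if not sep:
            # no colon on this line, so there is nothing to extract
            continue
        fields[key.strip()] = value.strip()
    return fields

fields = parse_record(text)
print(fields['User name'])
```

Because every field ends up in the dict, the if 'User name' in line check is no longer needed; any label can be looked up by name.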

mazjin · Answer 3 · 2017-05-11T08:48:38+0000

Using r'User name:\s*(\w+\s*\w*)' as the regex string works. It looks like the problem was the run of spaces between the field name and the value, as well as the space between the first and last words in the value (not every value has one, hence the * quantifiers).
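For reference, here is a small sketch of that pattern applied to a single line from the sample. The variable names are only illustrative.

```python
import re

# One line from the sample in the question.
line = "User name:           Administrator first"

# \s* absorbs the padding after the colon; the group captures one or two words.
match = re.search(r'User name:\s*(\w+\s*\w*)', line)

# Guard against a missing match instead of assuming the field is present.
full_name = match.group(1) if match else None
parts = full_name.split() if full_name else []

print(full_name)
print(parts)
```

One caveat when running this against the whole multi-line record instead of a single line: \s also matches newlines, so for a one-word user name the optional second \w* could pick up the first word of the next line. Applying the pattern line by line, as here, avoids that.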

JohnyNich · Answer 4 · 2017-05-11T08:39:42+0000

What you can do is use the .split() method on each string to split it into a list with two separate elements. For example, if I were to split the phrase "Good people" by " " (a space), I would get a list with two elements: "Good" at index 0 and "people" at index 1.

I probably explained it badly, so you can check other posts on the split method.
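A short sketch of what that looks like in practice, first on the example phrase and then on one line of the sample. Splitting on ':' with a limit of 1 is an assumption added here so that values containing colons are not cut apart.

```python
# Splitting the example phrase on a space gives a two-element list.
phrase = "Good people"
words = phrase.split(" ")
print(words[0])  # first word
print(words[1])  # second word

# The same idea applied to one line of the sample record.
line = "User name:           Administrator first"

# Split once at the first ':' to separate the label from the value.
label, value = line.split(":", 1)
value = value.strip()  # drop the padding spaces before the data

# split() with no argument splits on any run of whitespace.
first, last = value.split()
print(value)
print(first)
print(last)
```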
