RegEx template for parsing a working time string

I am writing a Python library to parse a working time string and create a standard clock format. I am stuck with the following case:

My regex should return groups for "Mon - Fri 7am - 5pm Sat 9am - 3pm" as ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'], but if there is a comma between the first and the second entry, it should return [].

A comma can appear anywhere else, just not between two weekday-and-duration entries. E.g. "Mon - Fri 7am - 5pm Sat 9am - 3pm and available upon email, phone call" should return ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'].

This is what I have tried:

import re
pattern = """(
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Separator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?  # End weekday
\s*[from]*\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)"""

regEx = re.compile(pattern, re.IGNORECASE|re.VERBOSE)

print re.findall(regEx, "Mon - Fri 7am - 5pm Sat 9am - 3pm")
# output ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']
print re.findall(regEx, "Mon - Fri 7am - 5pm Sat - Sun 9am - 3pm")
# output ['Mon - Fri 7am - 5pm ', 'Sat - Sun 9am - 3pm']
print re.findall(regEx, "Mon - Fri 7am - 5pm, Sat 9am - 3pm")
# expected output []
# but I get ['Mon - Fri 7am - 5pm,', 'Sat 9am - 3pm']
print re.findall(regEx, "Mon - Fri 7am - 5pm , Sat 9am - 3pm")
# expected output []
# but I get ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']

      

I also tried adding a negative lookahead to the end of my regex:

pattern = """(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs)
\s*[-|to]+\s*
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?
\s*[from]*\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
\s*[-|to]+\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
(?![^,])
)"""

      

But this did not give the result I expected. Do I have to explicitly write code to check the condition? Is there a way to just change my regex instead of writing an explicit condition checker?

Another approach I am considering is to insert a comma between the two weekday entries when it is missing, and then change my regex to split on commas: "Mon - Fri 7am - 5pm Sat 9am - 3pm" => "Mon - Fri 7am - 5pm, Sat 9am - 3pm"
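A note on the lookahead tried above: (?![^,]) only succeeds when the next character is a comma or the string has ended, which is the opposite of "not followed by a comma". The unescaped `.?` after `m` can also consume the comma itself before the lookahead is even tested. A minimal sketch of the difference, using a made-up toy string rather than the schedule pattern, and assuming Python 3:

```python
import re

text = "1 2,3"

# (?![^,]) fails whenever the next character is something other than a comma,
# so a digit only matches right before a comma or at the end of the string.
before_comma_or_end = re.findall(r"\d(?![^,])", text)

# (?!,) fails only when the next character is a comma.
not_before_comma = re.findall(r"\d(?!,)", text)

print(before_comma_or_end)  # ['2', '3']
print(not_before_comma)     # ['1', '3']
```

Even with (?!,) in place, findall would still return the entry after the comma, so a lookahead alone probably cannot produce an empty list for the whole string.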

+3




3 answers


I think you can do it simply by anchoring the pattern to the whole string, so that commas (and any other characters) between entries are not allowed:

pattern = """^(
(
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Separator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?  # End weekday
\s*[from]*\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)
)+$"""

      

This will output:



[('Sat 9am - 3pm', 'Sat 9am - 3pm')]
[('Sat - Sun 9am - 3pm', 'Sat - Sun 9am - 3pm')]
[]
[]
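Only the last entry shows up in the output above because a repeated capturing group keeps just its final repetition. A minimal sketch of a two-step alternative, assuming Python 3 and a deliberately simplified entry pattern: first validate the whole string, then extract every entry with findall. The names DAY, TIME, ENTRY and parse_strict are made up for illustration, and only three-letter day names are covered.

```python
import re

# Simplified building blocks (an assumption, not the full pattern from the question).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
ENTRY = rf"{DAY}(?:\s*(?:-|to)\s*{DAY})?\s*{TIME}\s*(?:-|to)\s*{TIME}"

def parse_strict(text):
    """Return all entries, or [] unless the string is entries separated by whitespace only."""
    # Step 1: the whole string must be one or more entries with nothing else between them.
    if re.fullmatch(rf"\s*(?:{ENTRY}\s*)+", text, flags=re.IGNORECASE) is None:
        return []
    # Step 2: ENTRY has no capturing groups, so findall returns each full entry.
    return re.findall(ENTRY, text, flags=re.IGNORECASE)

print(parse_strict("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
print(parse_strict("Mon - Fri 7am - 5pm, Sat 9am - 3pm"))
```

Like the anchored pattern, this sketch returns [] as soon as any other text follows the entries, for example "and available upon email, phone call".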

      

Hope it helps,

+1




I wrote some lines of code that check each pair of adjacent weekday entries and insert a comma after the first one if it is missing. That way I always get the same format, "Mon - Fri 7am - 5pm, Sat 9am - 3pm", and I can proceed further.
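The code itself was not posted. One way such a normalization step might look is sketched below, assuming Python 3 and a simplified entry pattern; the names DAY, TIME, ENTRY, insert_commas and split_entries are hypothetical and not taken from the original code.

```python
import re

# Simplified building blocks (an assumption; only three-letter day names are handled).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
ENTRY = rf"{DAY}(?:\s*(?:-|to)\s*{DAY})?\s*{TIME}\s*(?:-|to)\s*{TIME}"

def insert_commas(text):
    """Insert ', ' between two entries that are separated by whitespace only."""
    # The lookahead checks for the next entry without consuming it,
    # so that entry can itself be followed by an inserted comma.
    return re.sub(rf"({ENTRY})\s+(?={ENTRY})", r"\1, ", text, flags=re.IGNORECASE)

def split_entries(text):
    """Normalize the separators, then split on commas."""
    return [part.strip() for part in insert_commas(text).split(",")]

print(insert_commas("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
# Mon - Fri 7am - 5pm, Sat 9am - 3pm
print(split_entries("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
# ['Mon - Fri 7am - 5pm', 'Sat 9am - 3pm']
```

A string that already contains the comma is left unchanged. Any trailing free text containing commas would also be split, so the pieces may still need to be checked against the entry pattern.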



0




I could not figure out a way to do this in a single regex, and it is a good question. I did manage to do what you need, but keep in mind that I am not proud of it.

Assuming that you have a function to do this:

import re

def sample_funct(unparsed_schedule):
    result = []

    # Day pattern (verbose mode: whitespace and # comments are ignored)
    pattern = r"""
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
    \s*[-|to]+\s* # Separator
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?  # End weekday
    \s*[from]*\s* # Separator
    (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][\.]?m\.?) # Start hour
    \s*[-|to]+\s* # Separator
    (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][\.]?m\.?) # Close hour
    """

    # No-commas pattern: two schedules with no comma between them
    pattern2 = r"%s\s*[^,]\s*%s" % (pattern, pattern)

    # Actual regex pattern items
    schedule     = re.compile(pattern, re.IGNORECASE | re.VERBOSE)
    remove_comma = re.compile(pattern2, re.IGNORECASE | re.VERBOSE)

    # Check we have no commas in the middle
    valid_result = remove_comma.search(unparsed_schedule)
    if valid_result:
        # Positive result, return the list with schedules
        result = schedule.findall(valid_result.group(0))

    # If there is no valid result, return the empty list
    return result
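The function above only recognizes exactly two schedules in a row. A variant that might generalize to any number of entries is sketched below, assuming Python 3 and a simplified entry pattern: find every entry with finditer, then reject the input if the text between two neighbouring entries contains a comma. The names DAY, TIME, ENTRY and parse_schedule are made up for illustration.

```python
import re

# Simplified building blocks (an assumption; only three-letter day names are handled).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
ENTRY = re.compile(
    rf"{DAY}(?:\s*(?:-|to)\s*{DAY})?\s*{TIME}\s*(?:-|to)\s*{TIME}",
    re.IGNORECASE,
)

def parse_schedule(text):
    """Return all entries, or [] if a comma separates two neighbouring entries."""
    matches = list(ENTRY.finditer(text))
    # Inspect the gap between each pair of consecutive matches.
    for prev, nxt in zip(matches, matches[1:]):
        if "," in text[prev.end():nxt.start()]:
            return []
    return [m.group(0) for m in matches]

print(parse_schedule("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
print(parse_schedule("Mon - Fri 7am - 5pm, Sat 9am - 3pm"))
print(parse_schedule("Mon - Fri 7am - 5pm Sat 9am - 3pm and available upon email, phone call"))
```

Commas after the last entry are ignored, which matches the "available upon email, phone call" requirement in the question. Unlike the function above, a string with a single entry returns that entry rather than an empty list.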

      

0

