RegEx template for parsing a working time string
I am writing a Python library that parses a working-time string and converts it to a standard clock format. I am stuck on the following case:
My regex should return the groups for "Mon - Fri 7am - 5pm Sat 9am - 3pm" as ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'], but if there is a comma between the first and the second schedule, it should return [].
Also, a comma may appear elsewhere in the string, just not between two weekday-and-duration groups. For example, "Mon - Fri 7am - 5pm Sat 9am - 3pm and available upon email, phone call" should return ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'].
This is what I have tried:
import re
pattern = r"""(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Separator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)? # End weekday
\s*[from]*\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)"""
regEx = re.compile(pattern, re.IGNORECASE | re.VERBOSE)
print(re.findall(regEx, "Mon - Fri 7am - 5pm Sat 9am - 3pm"))
# output ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']
print(re.findall(regEx, "Mon - Fri 7am - 5pm Sat - Sun 9am - 3pm"))
# output ['Mon - Fri 7am - 5pm ', 'Sat - Sun 9am - 3pm']
print(re.findall(regEx, "Mon - Fri 7am - 5pm, Sat 9am - 3pm"))
# expected output []
# but I get ['Mon - Fri 7am - 5pm,', 'Sat 9am - 3pm']
print(re.findall(regEx, "Mon - Fri 7am - 5pm , Sat 9am - 3pm"))
# expected output []
# but I get ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']
I also tried adding a negative lookahead to my regex:
pattern = r"""(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs)
\s*[-|to]+\s*
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?
\s*[from]*\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
\s*[-|to]+\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
(?![^,])
)"""
But this did not give the result I expected. Do I have to write explicit code to check for the comma? Is there a way to just change my regex instead of writing an explicit condition checker?
Another approach I am considering is to insert a comma between the two weekday groups when none exists, and then change my regex to a split-by-comma approach:
"Mon - Fri 7am - 5pm Sat 9am - 3pm" => "Mon - Fri 7am - 5pm, Sat 9am - 3pm"
I think you can do it simply by anchoring the pattern so that it has to match the whole string, so that a comma (or any other stray character) is not allowed:
pattern = r"""^(
(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Separator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)? # End weekday
\s*[from]*\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)
)+$"""
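For the four test strings from the question, this will output the following (a repeated capture group keeps only its last repetition, so only the final schedule shows up in each tuple):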
[('Sat 9am - 3pm', 'Sat 9am - 3pm')]
[('Sat - Sun 9am - 3pm', 'Sat - Sun 9am - 3pm')]
[]
[]
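As a possible variation on the anchored approach, one could validate the whole string first and then extract the individual schedules with a second, unanchored pattern, which avoids losing the earlier repetitions. The sketch below assumes Python 3 and uses a simplified, hypothetical pattern (three-letter weekday names only); the names DAY, TIME, SCHEDULE and split_schedules are illustrative and do not come from the question:

```python
import re

# Simplified building blocks (an assumption, not the question's full pattern).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"  # escaped dots cannot swallow a comma
SCHEDULE = rf"{DAY}(?:\s*(?:-|to)\s*{DAY})?\s*{TIME}\s*(?:-|to)\s*{TIME}"

# The whole string must be one or more schedules separated only by whitespace.
WHOLE_RE = re.compile(rf"\s*(?:{SCHEDULE}\s*)+", re.IGNORECASE)
# A single schedule, used for extraction once the string is known to be valid.
ITEM_RE = re.compile(SCHEDULE, re.IGNORECASE)


def split_schedules(text):
    """Return every schedule in text, or [] if anything else is present."""
    if WHOLE_RE.fullmatch(text) is None:
        return []
    return ITEM_RE.findall(text)


print(split_schedules("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
print(split_schedules("Mon - Fri 7am - 5pm, Sat 9am - 3pm"))
```

Because the whole string has to consist of schedules, this strict form also returns [] when free text such as "and available upon email, phone call" follows the schedules.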
Hope it helps.
This is a good question, and I could not figure out a way to do it in a single regex. I came up with something that does what you need, but keep in mind that I am not proud of it.
Assuming you have a function to do this:
import re

def sample_funct(unparsed_schedule):
    result = []
    # Day pattern
    pattern = r"""
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
    \s*[-|to]+\s* # Separator
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)? # End weekday
    \s*[from]*\s* # Separator
    (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][\.]?m\.?) # Start hour
    \s*[-|to]+\s* # Separator
    (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][\.]?m\.?) # Close hour
    """
    # No-commas pattern: two schedules separated by one non-comma character
    pattern2 = r"%s\s*[^,]\s*%s" % (pattern, pattern)
    # Compiled regex objects
    schedule = re.compile(pattern, re.IGNORECASE | re.VERBOSE)
    remove_comma = re.compile(pattern2, re.IGNORECASE | re.VERBOSE)
    # Check that there is no comma between the two schedules
    valid_result = re.search(remove_comma, unparsed_schedule)
    if valid_result:
        # Positive result: return the list of schedules
        result = re.findall(schedule, valid_result.group(0))
    # If there was no valid result, this returns an empty list
    return result
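A more general form of the same idea, not limited to exactly two schedules, is to collect every match and then inspect the text between consecutive matches for a comma. The sketch below assumes Python 3 and a simplified, hypothetical pattern (three-letter weekday names only); DAY, TIME, SCHEDULE_RE and parse_schedules are illustrative names. The dots in "a.m." and "p.m." are escaped here, because the unescaped m.? in the question's pattern can itself consume the comma:

```python
import re

# Simplified building blocks (an assumption, not the question's full pattern).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
SCHEDULE_RE = re.compile(
    rf"{DAY}(?:\s*(?:-|to)\s*{DAY})?\s*{TIME}\s*(?:-|to)\s*{TIME}",
    re.IGNORECASE,
)


def parse_schedules(text):
    """Return all schedules, or [] if a comma separates two of them."""
    matches = list(SCHEDULE_RE.finditer(text))
    # Compare each match with the next one and look at the text in between.
    for prev, nxt in zip(matches, matches[1:]):
        if "," in text[prev.end():nxt.start()]:
            return []
    return [m.group(0) for m in matches]


print(parse_schedules("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
print(parse_schedules("Mon - Fri 7am - 5pm , Sat 9am - 3pm"))
```

Only the gaps between matches are checked, so a comma in trailing free text ("and available upon email, phone call") does not invalidate the result.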
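The alternative raised in the question, inserting a comma between adjacent schedules and then splitting on commas, could be sketched roughly as follows. This assumes Python 3 and a simplified, hypothetical pattern (three-letter weekday names only); insert_commas and NORMALIZE_RE are illustrative names, and the lookahead is what keeps the following schedule available for the next substitution:

```python
import re

# Simplified building blocks (an assumption, not the question's full pattern).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
SCHEDULE = rf"{DAY}(?:\s*(?:-|to)\s*{DAY})?\s*{TIME}\s*(?:-|to)\s*{TIME}"

# A schedule, then whitespace, then (without consuming it) another schedule.
NORMALIZE_RE = re.compile(rf"({SCHEDULE})\s+(?={SCHEDULE})", re.IGNORECASE)


def insert_commas(text):
    """Put ', ' between schedules that are separated only by whitespace."""
    return NORMALIZE_RE.sub(r"\1, ", text)


normalized = insert_commas("Mon - Fri 7am - 5pm Sat 9am - 3pm")
print(normalized)
# Splitting on commas then yields one schedule per item.
print([part.strip() for part in normalized.split(",")])
```

A string that already contains the comma is left unchanged, so this normalization step cannot by itself tell an inserted comma from an original one.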