Python regex to check for start and end of a word in a string
I am working on a script to rename files. There are three cases:
1. The file does not exist: create a new file.
2. The file exists: create a new file with the occurrence count in parentheses appended to the name, e.g. filename (1).
3. A numbered duplicate already exists: create a new file with the next occurrence count, e.g. filename (2).
I have one filename per line. I can check the last character of a filename with a regex, but how can I check whether the name ends with a number between '(' and ')' and extract that number?
You just need something like this:
(?<=\()(\d+)(?=\)[^()]*$)
Explanation:
- `(?<=\()` must be preceded by a literal `(`
- `(\d+)` captures one or more digits
- `(?=\)[^()]*$)` must be followed by a literal `)` and then no further `(` or `)` up to the end of the line
Example: for the filename Foo (Bar) Baz (23).jpg, the regex matches 23.
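A minimal sketch of applying this regex with Python's `re` module. The helper name `occurrence_number` and the sample filenames are illustrative, not from the question. `re.search` is used rather than `re.match` because the number does not sit at the start of the name.

```python
import re

# The regex from this answer: digits inside the last "(...)" group of the name
NUMBER_IN_PARENS = re.compile(r"(?<=\()(\d+)(?=\)[^()]*$)")

def occurrence_number(filename):
    """Return the number inside the trailing '(n)' as an int, or None if absent."""
    m = NUMBER_IN_PARENS.search(filename)
    return int(m.group(1)) if m else None

for name in ["Foo (Bar) Baz (23).jpg", "filename (2)", "filename", "report (draft).txt"]:
    print(name, "->", occurrence_number(name))
```

Names with no numeric "(n)" group, such as "filename" or "report (draft).txt", return None, which corresponds to case 1 in the question.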
Here is code, with tests, that picks the new filename based on the existing filenames:
```python
import re

def get_name(filename, existing_names):
    exist = False
    index = 0
    # Match "filename" optionally followed by "(n)"; re.escape stops characters
    # such as "." or "(" in the filename from being read as regex syntax
    p = re.compile(r"^%s(\((?P<idx>\d+)\))?$" % re.escape(filename))
    for name in existing_names:
        m = p.match(name)
        if m:
            exist = True
            idx = m.group('idx')
            # Keep track of the highest occurrence number seen so far
            if idx and int(idx) > index:
                index = int(idx)
    if exist:
        return "%s(%d)" % (filename, index + 1)
    else:
        return filename

# test data
exists = ["abc(1)", "ab", "abc", "abc(2)", "ab(1)", "de", "ab(5)"]
tests = ["abc", "ab", "de", "xyz"]
expects = ["abc(3)", "ab(6)", "de(1)", "xyz"]

print(exists)
for name, exp in zip(tests, expects):
    new_name = get_name(name, exists)
    print("%s -> %s" % (name, new_name))
    assert new_name == exp
```

The regex that gets the number inside (...) is on this line:
`p = re.compile(r"^%s(\((?P<idx>\d+)\))?$" % re.escape(filename))`
It uses a named capture group, `(?P<idx>\d+)`, for the number `\d+`, and later retrieves it with `m.group('idx')`.
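A small sketch of how the optional named group behaves on its own. `parse_name` is a hypothetical helper, not part of the answer's code. It shows that `m.group("idx")` returns None when the optional "(n)" suffix is absent, which is why `get_name` checks `if idx` before calling `int()`.

```python
import re

def parse_name(base, name):
    """Return (matched, idx): whether name is base or base(n), and n as a string or None."""
    # Same pattern shape as get_name; re.escape makes base match literally
    pattern = re.compile(r"^%s(\((?P<idx>\d+)\))?$" % re.escape(base))
    m = pattern.match(name)
    if m is None:
        return False, None
    # The "(...)?" group is optional, so group("idx") is None for a bare base name
    return True, m.group("idx")

print(parse_name("abc", "abc"))     # (True, None)
print(parse_name("abc", "abc(2)"))  # (True, '2')
print(parse_name("abc", "abcd"))    # (False, None)
```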