Extract substring from String Python
I am trying to extract the following substring from this string:
-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
String I want to extract: $Revision: 1.14 (or just 1.14)
My code looks like this:
from sys import *
from os.path import *
import re
script, filename = argv
print "Filename: %s\n" % filename
def check_string():
    found = False
    with open(filename) as f:
        for line in f:
            if re.search("(?<=\$Revision: ) 1.14", line):
                print line
                found = True
    if not found:
        print "No Header exists in %s" % filename
check_string()
This does not work.
Any suggestions?
Thanks!
If I understand you correctly, split should do what you want:
if "$Revision:" in line:
    print(line.split("$Revision: ")[1].split()[0])
1.14
In [6]: line ="""
...: -- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
...: ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
...: """
In [7]: line.split("$Revision: ") # split the line at $Revision:
Out[7]:
['\n-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\nls,v $, ',
'1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n']
# we use indexing to get the first element after $Revision: in the string
In [8]: line.split("$Revision: ")[1]
# which becomes the substring below
Out[8]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'
# if we call split again we split that substring on whitespace into individual strings
In [10]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()
Out[10]: ['1.14', '$,', '$Author:', '$,', '$Date:', '2014/09/23', '21:41:15', '$']
# using indexing again we extract the first element which is the revision number
In [11]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()[0]
Out[11]: '1.14'
The same works for $Date:
date = line.split("$Date: ")[1].split()[0]
Or just use in if you only want to check for a substring in a string:
if "$Revision: 1.14" in line:
    print line
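The split approach above can be wrapped in a small helper so that a missing keyword does not raise an IndexError. This is only a sketch: the name extract_keyword and its parameters are hypothetical, not part of the original answer, and it assumes the value never contains whitespace.

```python
def extract_keyword(text, keyword):
    """Return the first whitespace-delimited token after '$<keyword>: ', or None."""
    marker = "$%s: " % keyword
    if marker not in text:
        return None
    # Split once at the marker, then split the remainder on whitespace.
    tokens = text.split(marker, 1)[1].split()
    return tokens[0] if tokens else None

header = ("-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
          "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $")

revision = extract_keyword(header, "Revision")
date = extract_keyword(header, "Date")
print(revision)  # 1.14
print(date)      # 2014/09/23
```

Note that an empty keyword such as $Author: $ gives back the next token ('$,' here), so the caller may want to reject values that start with '$'.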
>>> import re
>>> string="""-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
... ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""
>>> re.findall(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL) # if more than one such value is to be searched
['1.14']
>>> re.search(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL).group(1) # if only one such value needs to be found
'1.14'
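For comparison, here is a sketch of why the pattern in the question finds nothing. The lookbehind already requires the space after the colon, and the pattern then asks for a second space before 1.14, which the header does not contain. The variable names below are illustrative only.

```python
import re

line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"

# Pattern from the question: the space inside the lookbehind plus the space
# after it together demand two spaces between the colon and the number.
broken = re.search(r"(?<=\$Revision: ) 1.14", line)

# Same lookbehind without the extra space, capturing digits and dots
# instead of a hard-coded version number.
fixed = re.search(r"(?<=\$Revision: )[0-9.]+", line)

print(broken)          # None
print(fixed.group(0))  # 1.14
```

The re.DOTALL flag used above only changes what '.' matches, so it has no effect on patterns like these that contain no unescaped dot outside a character class.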
import sys
def check_string(f,target):
    for line in f:
        if line.find(target)>=0:
            return line
script, filename = sys.argv
f = open(filename)
rev_line = check_string(f,'Revision: 1.14')
if rev_line:
    ...
else:
    ...
Function check_string
- No regex needed
- line.find(target) returns -1 on failure and the index of target in line on success; if the index is at least 0, we have a match, so we return line
- If we don't find a match, we fall off the end of the function, which implicitly returns None
Caller
After the usual boilerplate, we assign what check_string returns to the variable rev_line. If we have not found 'Revision: 1.14', rev_line is None; otherwise it is the complete line containing the target. Carry on with whatever should be done in each case.
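A runnable sketch of the caller logic, assuming Python 3 and using io.StringIO as a stand-in for the opened file so the example does not depend on a file on disk. The message strings are illustrative, not part of the original answer.

```python
import io

def check_string(f, target):
    for line in f:
        if line.find(target) >= 0:
            return line
    # No match: falling off the end returns None implicitly.

# In-memory stand-in for open(filename); the text is the header from the question.
sample = io.StringIO(
    "-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
    "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n"
)

rev_line = check_string(sample, "Revision: 1.14")
if rev_line:
    message = "Found header line: %s" % rev_line.strip()
else:
    message = "No Header exists"
print(message)
```

Because a non-empty line is always truthy, testing rev_line directly is enough to distinguish the two cases.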
Edit
If the version number is not known at the time of writing the program, you have two cases:
- the revision number is read from the file or computed otherwise, and is known at runtime
target = 'Revision: %d.%d' % (major, minor)
rev_line = check_string(f, target)
- the version number is not fully known when the check is performed, in which case you create a string target containing the regex and change the internals of check_string: instead of if line.find(target)>=0: you write if re.search(target, line):
which is very similar to what you wrote in the first place, but the regex is no longer hard-coded into the function and you can define it in the main body of the program.
In general, the second approach is better, because you can always create a "constant" regex ...
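A sketch of the regex-based variant, assuming Python 3. The second parameter is renamed to pattern here for clarity, and io.StringIO again stands in for the opened file. When the version is known at runtime, re.escape turns the literal text into a "constant" regex; without it, the dot in 1.14 would match any character.

```python
import io
import re

def check_string(f, pattern):
    """Return the first line that matches the regex pattern, or None."""
    regex = re.compile(pattern)
    for line in f:
        if regex.search(line):
            return line
    return None

header = (
    "-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
    "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n"
)

# Version not known in advance: match any major.minor revision.
any_rev = check_string(io.StringIO(header), r"\$Revision: \d+\.\d+")

# Version known at runtime: build a literal ("constant") regex from it.
major, minor = 1, 14
exact = check_string(io.StringIO(header), re.escape("Revision: %d.%d" % (major, minor)))

# A revision that is not in the header yields None.
missing = check_string(io.StringIO(header), re.escape("Revision: 2.0"))
```

Both any_rev and exact hold the second header line, while missing is None.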