Extract a substring from a string in Python

I am trying to extract a substring from the following string:

-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $

      

String I want to extract: $Revision: 1.14 (or just 1.14)

My code looks like this:

from sys import *
from os.path import *
import re 

script, filename = argv

print "Filename: %s\n" % filename

def check_string():
    found = False
    with open(filename) as f:
        for line in f:
            if re.search("(?<=\$Revision: ) 1.14", line):
                print line
                found = True
        if not found:
            print "No Header exists in %s" % filename

check_string()

      

This does not work.

Any suggestions?

Thanks!

+3




5 answers


If I understand you correctly, split should do what you want:

if "$Revision:" in line:
    print(line.split("$Revision: ")[1].split()[0])
1.14


In [6]: line ="""
   ...: -- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
   ...: ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
   ...: """

In [7]: line.split("$Revision: ")  # split the line at $Revision: 
Out[7]: 
['\n-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\nls,v $, ',
 '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n']

# we use indexing to get the first element after $Revision:  in the string
In [8]: line.split("$Revision: ")[1] 
# which becomes the substring below
Out[8]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'

# if we call split again we split that substring on whitespace into individual strings
In [10]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()
Out[10]: ['1.14', '$,', '$Author:', '$,', '$Date:', '2014/09/23', '21:41:15', '$']

# using indexing again we extract the first element which is the  revision number
In [11]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()[0]
Out[11]: '1.14'

      

The same works for $Date:

date = line.split("$Date: ")[1].split()[0]
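Chained split calls raise an IndexError when the keyword is missing from the line. A minimal sketch of a more defensive variant, written for Python 3 and assuming a hypothetical helper named keyword_value (not part of the answer above), could look like this:

```python
def keyword_value(line, keyword):
    """Return the first whitespace-delimited token after '$keyword: ', or None.

    keyword_value is a hypothetical helper name used only for illustration.
    """
    marker = "$%s: " % keyword
    # partition() returns (before, separator, after); the separator is empty
    # when the marker does not occur in the line.
    _, sep, rest = line.partition(marker)
    if not sep:
        return None
    tokens = rest.split()
    return tokens[0] if tokens else None


line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"
print(keyword_value(line, "Revision"))  # 1.14
print(keyword_value(line, "Date"))      # 2014/09/23
print(keyword_value(line, "Name"))      # None
```

Returning None instead of raising lets the caller decide how to report a file without a header.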

      

Or just use in if you only want to check for a substring in a string:

if "$Revision: 1.14" in line:
    print line

      

+1




if re.search("(?<=\$Revision: ) 1.14", line):

      

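Your pattern won't match because it requires two spaces between the colon and 1.14 (one inside the lookbehind and one before 1.14), while the input has only one. Try one of the following; the dot is escaped so that it matches only a literal period:

if re.search(r"(?<=\$Revision: )1\.14", line):

or

if re.search(r"\$Revision:\s+1\.14", line):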
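To see the difference, the original pattern and the two fixes can be run against the header line from the question. This is a small Python 3 sketch; the variable names are illustrative, and the patterns are written as raw strings with the dot escaped.

```python
import re

line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"

# Original pattern: one space inside the lookbehind plus one before 1.14,
# so it needs two spaces after the colon.
two_spaces = re.search(r"(?<=\$Revision: ) 1\.14", line)

# The lookbehind consumes nothing, so the match is just the version number.
lookbehind = re.search(r"(?<=\$Revision: )1\.14", line)

# \s+ tolerates one or more whitespace characters after the colon.
flexible = re.search(r"\$Revision:\s+1\.14", line)

print(two_spaces)            # None
print(lookbehind.group(0))   # 1.14
print(flexible.group(0))     # $Revision: 1.14
```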

      

+2




Your regex requires two spaces between the colon and the version number, but the input has only one.

+1




>>> import re
>>> string="""-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
... ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""
>>> re.findall(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL) # use findall if there may be more than one such value
['1.14']
>>> re.search(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL).group(1) # use search if only one such value needs to be found
'1.14'
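Note that re.search returns None when the pattern is absent, so calling .group(1) directly raises AttributeError on text without a header. A hedged Python 3 sketch that guards against this follows; the function name find_revision is hypothetical, and the slightly stricter pattern (digits separated by dots) is an assumption chosen for illustration.

```python
import re

# Assumed pattern: digits separated by dots, e.g. 1.14 or 1.14.2.1.
REVISION_RE = re.compile(r"\$Revision:\s*(\d+(?:\.\d+)*)")


def find_revision(text):
    """Return the revision number as a string, or None if no header is found."""
    match = REVISION_RE.search(text)
    return match.group(1) if match else None


header = """-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""

print(find_revision(header))            # 1.14
print(find_revision("no header here"))  # None
```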

      

0




import sys

def check_string(f, target):
    # Return the first line that contains target; fall through (None) otherwise.
    for line in f:
        if line.find(target) >= 0:
            return line

script, filename = sys.argv

f = open(filename)
rev_line = check_string(f, 'Revision: 1.14')
if rev_line:
    ...
else:
    ...

      

Function check_string

  • No regex is needed.
  • line.find(target) returns -1 on failure and the index of target in line on success.
  • If the index is at least 0, we have a match, so we return line.
  • If we don't find a match, we fall off the end of the function, which returns None.

Caller

After the usual boilerplate, we assign what check_string returns to the variable rev_line. If 'Revision: 1.14' was not found, rev_line is None; otherwise it is the complete line containing the target. Then do whatever needs to be done in each case.
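Put together, the function and its caller might look like the sketch below in Python 3. It uses the in operator, which is equivalent to find(...) >= 0 here, and io.StringIO stands in for the real file so the example runs on its own. The printed messages are placeholders for whatever each branch should actually do.

```python
import io


def check_string(f, target):
    """Return the first line of f that contains target, or None."""
    for line in f:
        if target in line:
            return line
    return None


# io.StringIO is a stand-in for open(filename) in this sketch.
sample = io.StringIO(
    "-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
    "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n"
)

rev_line = check_string(sample, "Revision: 1.14")
if rev_line:
    print("Header found: %s" % rev_line.strip())
else:
    print("No header found")
```

With a real file, wrapping the call in with open(filename) as f: closes the file as soon as the search finishes.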

Edit

If the version number is not known at the time of writing the program, you have two cases:

  • the revision number is read from a file or computed some other way, and is known at runtime

    target = 'Revision: %d.%d' % (major, minor)
    rev_line = check_string(f, target)
    
          

  • the version number is not fully known even when the check runs, in which case you create a string target containing a regex and change the internals of check_string: instead of if line.find(target)>=0: you write if re.search(target, line): which is very similar to what you wrote in the first place, but the regex is no longer hard-coded into the function and you can define it in the main body of the program.

In general, option 2 is better because you can always create a "constant" regex ...
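A sketch of the regex-driven variant in Python 3: the search function takes a compiled pattern instead of a literal string. The names check_pattern, revision_pattern, and ANY_REVISION are hypothetical; re.escape is used when the version is known at runtime so that the dot is matched literally.

```python
import io
import re


def check_pattern(f, pattern):
    """Return the first line of f matched by the compiled regex, or None."""
    for line in f:
        if pattern.search(line):
            return line
    return None


def revision_pattern(major, minor):
    """Build a regex for one specific revision known at runtime."""
    version = re.escape("%d.%d" % (major, minor))
    # (?!\d) keeps 1.1 from matching the start of 1.14.
    return re.compile(r"\$Revision:\s*" + version + r"(?!\d)")


# Any revision: defined once in the main body, as a "constant" regex.
ANY_REVISION = re.compile(r"\$Revision:\s*\d+\.\d+")

text = "ls,v $, $Revision: 1.14 $, $Author: $\n"
print(check_pattern(io.StringIO(text), ANY_REVISION) is not None)            # True
print(check_pattern(io.StringIO(text), revision_pattern(1, 14)) is not None)  # True
print(check_pattern(io.StringIO(text), revision_pattern(1, 1)) is not None)   # False
```

Because the pattern is passed in, the same function serves both the fixed-version and the any-version case.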

0

