Python regex for diffstat output

I would like to match the following lines using a Python regex and extract the numbers.

1 file changed, 1 insertion(+), 1 deletion(-)
2 files changed, 10 insertions(+), 10 deletions(-)
1 file changed, 1 insertion(+)
1 file changed, 2 deletions(-)

      

I'm using named groups in a Python regex to pull the values out of these lines, but it doesn't work as expected.

#!/usr/bin/python
import re
pat='\s*(\d+).*changed,\s+(\d*)(?P<in>=\s+insertion).*(\d+)(?P<del>=\s+deletion.*')
diff_stats = re.compile(pat)
obj = diff_stats.match(line)

      

+3




2 answers


Remove the = from the named capture groups. Also, your last group is not closed!

\s*(\d+).*changed,\s+(\d*)(?P<in>\s+insertion).*(\d+)(?P<del>\s+deletion).*
                                 ↑                           ↑          ↑
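To see what this corrected pattern actually does, here is a minimal sketch that runs it over the four sample lines from the question. The names fixed and samples are only illustrative. With this pattern the named groups capture the words rather than the numbers, the greedy .* before the last (\d+) leaves only the final digit of the deletion count (so "10" comes back as "0"), and lines that lack either the insertion or the deletion part do not match at all.

```python
import re

# Corrected pattern from this answer: '=' removed and the last group closed.
fixed = re.compile(
    r'\s*(\d+).*changed,\s+(\d*)(?P<in>\s+insertion).*(\d+)(?P<del>\s+deletion).*'
)

# The four sample lines from the question.
samples = [
    "1 file changed, 1 insertion(+), 1 deletion(-)",
    "2 files changed, 10 insertions(+), 10 deletions(-)",
    "1 file changed, 1 insertion(+)",
    "1 file changed, 2 deletions(-)",
]

for line in samples:
    m = fixed.match(line)
    # groups() lists every capture in order; None means the line did not match.
    print(line, '->', m.groups() if m else None)
```

So the pattern compiles after the fix, but it is not yet enough to extract all three numbers from every line.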

      




Edit: Improved regex that also checks the (+) and (-) markers, with the named groups now capturing the digits:

\s*(\d+)\s+files?\s+changed,\s*((?P<in>\d+)\s*(insertions?)\([+-]\))?,?\s*((?P<del>\d+)\s*(deletions?)\([+-]\))?
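One way to use the improved pattern from Python might look like the sketch below. The helper name parse_diffstat and the choice to report a missing part as 0 are my own assumptions, not something the question asked for.

```python
import re

# Improved pattern from the edit above, split across lines for readability.
diff_stats = re.compile(
    r'\s*(\d+)\s+files?\s+changed,'
    r'\s*((?P<in>\d+)\s*(insertions?)\([+-]\))?,?'
    r'\s*((?P<del>\d+)\s*(deletions?)\([+-]\))?'
)

def parse_diffstat(line):
    """Return (files, insertions, deletions) as ints, or None if no match."""
    m = diff_stats.match(line)
    if m is None:
        return None
    # Optional groups that did not take part in the match are None; use 0.
    return (int(m.group(1)), int(m.group('in') or 0), int(m.group('del') or 0))

for line in [
    "1 file changed, 1 insertion(+), 1 deletion(-)",
    "2 files changed, 10 insertions(+), 10 deletions(-)",
    "1 file changed, 1 insertion(+)",
    "1 file changed, 2 deletions(-)",
]:
    print(parse_diffstat(line))
```

If you need to tell "no insertions" apart from "0 insertions", return m.group('in') as is and check it for None instead of defaulting to 0.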

      


+1




You need to add the end-of-line anchor so that you get a complete match, and you also need to make some parts optional.

^\s*(\d+).*\bchanged,\s+(?:(\d*)(?P<in>\s+insertion).*?)?(?:(\d+)(?P<del>\s+deletion.*))?$
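As a rough sketch of how this anchored pattern could be used from Python: the numbers end up in the unnamed groups 1, 2 and 4, while the named groups in and del still hold the matched words. The helper name extract_counts and the fallback to 0 for a missing part are assumptions made for illustration.

```python
import re

# Anchored pattern from this answer, with both count sections optional.
anchored = re.compile(
    r'^\s*(\d+).*\bchanged,\s+'
    r'(?:(\d*)(?P<in>\s+insertion).*?)?'
    r'(?:(\d+)(?P<del>\s+deletion.*))?$'
)

def extract_counts(line):
    """Return (files, insertions, deletions) as ints, or None if no match."""
    m = anchored.match(line)
    if m is None:
        return None
    # Groups 2 and 4 are None (or empty) when that section is absent; use 0.
    return (int(m.group(1)), int(m.group(2) or 0), int(m.group(4) or 0))

for line in [
    "1 file changed, 1 insertion(+), 1 deletion(-)",
    "2 files changed, 10 insertions(+), 10 deletions(-)",
    "1 file changed, 1 insertion(+)",
    "1 file changed, 2 deletions(-)",
]:
    print(extract_counts(line))
```

The lazy .*? after insertion matters here: it stops as soon as the deletion part can match, so the full deletion count is captured rather than just its last digit.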

      




+1

