Regular expressions in unicode strings

Question

Regular expressions in unicode strings

I have a unicode text that I want to clean up using regular expressions. For example, I have cases where u '(2'. This exists because for formatting reasons the closing paragraph ends in an adjacent html cell. My original solution to this problem was to look ahead at the content of the next cell and use the string function to determine if he is a closure wig. I knew it was not a great solution but it worked. Now I want to fix this but I can’t get the regex to work.

missingParen=re.compile(r"^\(\d[^\)]$")

My understanding of what I think I am doing is:
^ at the beginning of the line I want to find (open groove, guy should be back asleep because it is a special character \ d I also want to find a single digit
[I create a special class characters
^ I don't want to find what follows,), which is the close $ at the end of the line

And of course the plot thickens, I made a silly assumption that since I put a \ d I wouldn't find (33, but I'm wrong, so I added {1} to my regex and it didn't help, it matched ( 3333, so my problem is harder than I thought. I want the string to be just an open pair and one digit. This is a smarter approach

missingParen=re.compile(r"^\(\d$")

And note that S Lott _I has already tagged it as a newbie so you can't pick up any cheap glasses. It's not that I don't appreciate your ideas. I read your book constantly, it probably has an answer.

+1

python regex

PyNEwbie Dec 11. '08 at 19:37

source to share

1 answer

PyNEwbie · Accepted Answer · 2008-12-11T20:13:47+0000

Okay, sorry to use this stream of mind stimulant mind, but it seems that outlining my original question got me going down the path. It seems to me that this is the solution for what I am trying to do:

  missingParen=re.compile(r"^\(\d$")

Regular expressions in unicode strings

More articles: