Regex operator for Python

Question

Regex operator for Python

I would like to create a regex statement in Python 2.7.8 that will replace characters. It will work like this ...

ó -> o
ú -> u
é -> e
á -> a
í -> i
ù,ú  -> u

These are the only Unicode characters that I would like to change. Unicode characters like ë, ä

, I don't want to change. Thus, the word thójlà

will become tholja

. I'm sure there is a way so that I don't have to create all the regexes separately as shown below.

word = re.sub(ur'ó', ur'o', word)
word = re.sub(ur'ú', ur'u', word)
word = re.sub(ur'é', ur'e', word)
....

I tried to figure it out but no luck. Any help is appreciated!

+3

python regex

user2743 08 dec. At 23:10

source to share

3 answers

chapelo · Answer 1 · 2014-12-08T23:39:19+0000

Try with str.translate and maketrans

...

print('thójlà'.translate(str.maketrans('óúéáíùú', 'oueaiuu')))
# thojlà

This way you guarantee the only replacements you want to make.

If you had many lines to change, you should assign your maketrans to a variable, for example

table = str.maketrans('óúéáíùú', 'oueaiuu')

and then each line can be translated as

s.translate(table)

RPGillespie · Answer 2 · 2014-12-08T23:18:53+0000

With the String function, replace()

you can do something like:

x = "thójlà"                  
>>> x
'thójlà'
>>> x = x.replace('ó','o')
'thojlà'
>>> x = x.replace('à','a')
'thojla'

Generalized way:

# -*- coding: utf-8 -*-

replace_dict = {
    'á':'a',
    'à':'a',
    'é':'e',
    'í':'i',
    'ó':'o',
    'ù':'u',
    'ú':'u'
}

str1 = "thójlà"

for key in replace_dict:
    str1 = str1.replace(key, replace_dict[key])

print(str1) #prints 'thojla'

Third way if your list of character collations gets too big:

# -*- coding: utf-8 -*-

replace_dict = {
    'a':['á','à'],
    'e':['é'],
    'i':['í'],
    'o':['ó'],
    'u':['ù','ú']
}

str1 = "thójlà"

for key, values in replace_dict.items():
    for character in values:
        str1 = str1.replace(character, key)

print(str1)

Marcin · Answer 3 · 2014-12-08T23:17:14+0000

If you can use external packages, the easiest way I think would be using unidecode . For example:

from unidecode import unidecode

print(unidecode('thójlà'))
# prints: thojla

Regex operator for Python

More articles: