Regex operator for Python
I would like to create a regex statement in Python 2.7.8 that will replace characters. It will work like this ...
ó -> o ú -> u é -> e á -> a í -> i ù,ú -> u
These are the only Unicode characters that I would like to change. I don't want to change other Unicode characters such as ë and ä. Thus, the word thójlà will become thojla. I'm sure there is a way to do this without writing a separate regex for each character, as shown below.
word = re.sub(ur'ó', ur'o', word)
word = re.sub(ur'ú', ur'u', word)
word = re.sub(ur'é', ur'e', word)
....
I tried to figure it out but no luck. Any help is appreciated!
Try str.translate with str.maketrans (note that str.maketrans exists only in Python 3, not in Python 2.7):
...
print('thójlà'.translate(str.maketrans('óúéáíùú', 'oueaiuu')))
# thojlà
This way, only the replacements you list are made; every other character is left untouched.
If you have many lines to change, build the table once and assign it to a variable, for example
table = str.maketrans('óúéáíùú', 'oueaiuu')
and then each line can be translated as
s.translate(table)
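Since the question targets Python 2.7, where str.maketrans is not available, here is a minimal sketch of the same idea that should run on both Python 2.7 and Python 3. It builds the table as a plain dict from code points to replacement strings, which is a form that translate accepts for Unicode strings. The names src, dst and word are illustrative, and à is included only on the assumption that the question's example (thójlà -> thojla) implies it.

```python
# -*- coding: utf-8 -*-
# Sketch: a translation table built as a dict of code points.
# Assumption: 'à' is added because the question's example expects it.
src = u'óúéáíùà'
dst = u'oueaiua'

# Map each source character's code point to its replacement.
table = {ord(s): d for s, d in zip(src, dst)}

word = u'thójlà'
print(word.translate(table))
```

Characters that are not keys in the table, such as ë and ä, pass through unchanged.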
With the string method replace() you can do something like:
>>> x = "thójlà"
>>> x
'thójlà'
>>> x = x.replace('ó','o')
>>> x
'thojlà'
>>> x = x.replace('à','a')
>>> x
'thojla'
Generalized way:
# -*- coding: utf-8 -*-
replace_dict = {
'á':'a',
'à':'a',
'é':'e',
'í':'i',
'ó':'o',
'ù':'u',
'ú':'u'
}
str1 = "thójlà"
for key in replace_dict:
    str1 = str1.replace(key, replace_dict[key])
print(str1) #prints 'thojla'
A third way, if your list of character mappings gets too big, is to group the accented characters by their replacement:
# -*- coding: utf-8 -*-
replace_dict = {
'a':['á','à'],
'e':['é'],
'i':['í'],
'o':['ó'],
'u':['ù','ú']
}
str1 = "thójlà"
for key, values in replace_dict.items():
    for character in values:
        str1 = str1.replace(character, key)
print(str1)
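The question asked for a single regex rather than one call per character. A flat mapping dictionary like the one in the generalized way can drive that: the sketch below joins the keys into one character class and passes a function as the replacement argument of re.sub, so each match is looked up in the dictionary. The function name strip_accents is hypothetical, and the sketch assumes every key is a single character with no special meaning inside a character class.

```python
# -*- coding: utf-8 -*-
import re

# Flat mapping from accented character to its replacement.
replace_dict = {
    u'á': u'a',
    u'à': u'a',
    u'é': u'e',
    u'í': u'i',
    u'ó': u'o',
    u'ù': u'u',
    u'ú': u'u',
}

# One character class containing every key, e.g. [áàéíóùú].
# Assumption: keys are single characters that need no escaping.
pattern = re.compile(u'[' + u''.join(replace_dict) + u']')


def strip_accents(word):
    """Replace each listed character in one pass; leave all others alone."""
    return pattern.sub(lambda match: replace_dict[match.group(0)], word)


print(strip_accents(u'thójlà'))
```

Because the pattern is compiled once, the same function can be applied to many words without rebuilding anything, and characters such as ë and ä never match the class.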