You can extract the name and value attributes from your HTML string with a regular expression, as in this example:
import re
a = '<table><tbody>\n<tr><td width="90%">Vidya<audio controls><source src="Vidya .mp3" type="audio/mpeg">Your browser does not support the audio element. </audio> <input type="checkbox" name="W" value="w"></td></tr>\n\n<tr><td width="90%">Yeh<audio controls><source src="Yeh .mp3" type="audio/mpeg">Your browser does not support the audio element. </audio> <input type="checkbox" name="X" value="x"></td></tr>\n\n<tr><td width="90%">Jawaani<audio controls><source src="Jawaani.mp3" type="audio/mpeg">Your browser does not support the audio element.</audio> <input type="checkbox" name="Y" value="y"></td></tr>\n\n</tbody>\n</table>'
values = re.findall('name="(.*?)" value="(.*?)"', a)
print(values)
Output:
[('W', 'w'), ('X', 'x'), ('Y', 'y')]
Alternatively, you can remove the name and value attributes from the tags using re.sub(), as in this example:
# Raw string so that \w reaches the regex engine instead of being read as a string escape
new_a = re.sub(r' (?:name|value)="\w+"', '', a)
print(repr(new_a))
Output:
'<table><tbody>\n<tr><td width="90%">Vidya<audio controls><source src="Vidya .mp3" type="audio/mpeg">Your browser does not support the audio element. </audio> <input type="checkbox"></td></tr>\n\n<tr><td width="90%">Yeh<audio controls><source src="Yeh .mp3" type="audio/mpeg">Your browser does not support the audio element. </audio> <input type="checkbox"></td></tr>\n\n<tr><td width="90%">Jawaani<audio controls><source src="Jawaani.mp3" type="audio/mpeg">Your browser does not support the audio element.</audio> <input type="checkbox"></td></tr>\n\n</tbody>\n</table>'
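Both patterns above assume that name comes directly before value inside each tag. If the attribute order may vary, a parser-based variant is one option. The following is only a minimal sketch using the standard library's html.parser; the class name InputAttrCollector and the shortened sample string are made up for illustration and are not part of the original question.

```python
from html.parser import HTMLParser


class InputAttrCollector(HTMLParser):
    """Hypothetical helper: collects (name, value) pairs from <input> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples; the tag name is lowercased
        if tag == "input":
            attr_map = dict(attrs)
            if "name" in attr_map and "value" in attr_map:
                self.pairs.append((attr_map["name"], attr_map["value"]))


# Shortened sample; note that the first tag lists value before name
sample = (
    '<tr><td>Vidya <input type="checkbox" value="w" name="W"></td></tr>'
    '<tr><td>Yeh <input type="checkbox" name="X" value="x"></td></tr>'
)

collector = InputAttrCollector()
collector.feed(sample)
collector.close()
pairs = collector.pairs
print(pairs)  # [('W', 'w'), ('X', 'x')]
```

Because the attributes are read into a dictionary, the pairs come out the same regardless of the order in which name and value appear in the tag.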