Replace HTML tag content with Regex
I want to encrypt the text content of an HTML document without changing its layout. The content is stored in tag pairs, for example: <span style ...> text_to_get </SPAN>. My idea is to use a regex to (1) extract each text part and (2) replace it with its ciphertext. Step (1) works, but step (2) does not. Here is the code I am working on:
private string encryptSpanContent(string text, string passPhrase, string salt, string hash, int iteration, string initialVector, int keySize)
{
    string resultText = text;
    string pattern = "<span style=(?<style>.*?)>(?<content>.*?)</span>";
    Regex regex = new Regex(pattern);
    MatchCollection matches = regex.Matches(resultText);
    foreach (Match match in matches)
    {
        string replaceWith = "<span style=" + match.Groups["style"] + ">"
            + AESEncryption.Encrypt(match.Groups["content"].Value, passPhrase, salt, hash, iteration, initialVector, keySize)
            + "</span>";
        resultText = regex.Replace(resultText, replaceWith);
    }
    return resultText;
}
Is this line the problem? It ends up replacing the text of every span with the last replaceWith value:
    resultText = regex.Replace(resultText, replaceWith);
Can anyone help me fix this?
If you are going to work with HTML, it is recommended to use the HTML Agility Pack, because regular expressions run into problems, especially with nested tags or malformed HTML.
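Assuming your HTML is well formed and you still decide to use a regular expression, use the Regex.Replace overload that accepts a MatchEvaluator. Your loop calls regex.Replace(resultText, replaceWith) on every iteration, and that overload replaces all matches with the same string, so after the last iteration every span contains the last replaceWith value. With a MatchEvaluator, a single Replace call invokes the evaluator once per match and replaces each occurrence with its own result.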
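To make the difference between the two overloads concrete, here is a minimal sketch that uses digits instead of HTML so it runs on its own. The names ReplaceOverloadsDemo, LoopReplace and EvaluatorReplace are illustrative only; LoopReplace mirrors the structure of the loop in the question, where each computed value is written over every match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceOverloadsDemo
{
    static readonly Regex Digits = new Regex(@"\d+");

    // Mirrors the loop in the question: Replace(string, string) rewrites EVERY
    // match on each pass, so the value computed last ends up everywhere.
    public static string LoopReplace(string input)
    {
        string result = input;
        foreach (Match m in Digits.Matches(input))
        {
            string replaceWith = (int.Parse(m.Value) * 10).ToString();
            result = Digits.Replace(result, replaceWith);
        }
        return result;
    }

    // MatchEvaluator overload: the lambda is called once per match and its
    // return value replaces only that match.
    public static string EvaluatorReplace(string input) =>
        Digits.Replace(input, m => (int.Parse(m.Value) * 10).ToString());

    public static void Main()
    {
        Console.WriteLine(LoopReplace("a1 b2 c3"));      // a30 b30 c30
        Console.WriteLine(EvaluatorReplace("a1 b2 c3")); // a10 b20 c30
    }
}
```

The loop version turns "a1 b2 c3" into "a10 b10 c10", then "a20 b20 c20", then "a30 b30 c30", which is the same "last value everywhere" symptom described in the question.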
Try this approach:
string input = @"<div><span style=""color: #000;"">hello, world!</span></div>";
string pattern = @"(?<=<span style=""[^""]+"">)(?<content>.+?)(?=</span>)";
string result = Regex.Replace(input, pattern,
    m => AESEncryption.Encrypt(m.Groups["content"].Value, passPhrase, salt, hash, iteration, initialVector, keySize));
Here I use a lambda expression as the MatchEvaluator and refer to the "content" group as shown above. I also use lookarounds (a lookbehind for the opening span tag and a lookahead for the closing one), so the tags are not part of the match and are therefore left untouched by the replacement.
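If you prefer to keep the pattern from the question instead of lookarounds, the same fix applies: one Replace call with an evaluator that rebuilds each span from its own "style" group. In this sketch the encryption step is passed in as a Func<string, string> so it compiles without the AESEncryption class; SpanEncryptor is an illustrative name, and the Base64 stand-in in Main is a hypothetical placeholder that provides no secrecy at all.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SpanEncryptor
{
    // Same idea as encryptSpanContent in the question, but the encryption step is
    // a delegate so the sketch runs without the AESEncryption class.
    public static string EncryptSpanContent(string text, Func<string, string> encrypt)
    {
        var regex = new Regex("<span style=(?<style>.*?)>(?<content>.*?)</span>");

        // One Replace call: the evaluator runs once per match and rebuilds only that span.
        return regex.Replace(text, m =>
            "<span style=" + m.Groups["style"].Value + ">" +
            encrypt(m.Groups["content"].Value) +
            "</span>");
    }

    public static void Main()
    {
        // Hypothetical placeholder: Base64 is NOT encryption; swap in the real
        // AESEncryption.Encrypt(...) call with your own parameters.
        Func<string, string> fakeEncrypt = s => Convert.ToBase64String(Encoding.UTF8.GetBytes(s));

        string html = @"<p><span style=""color: red;"">hello</span> <span style=""color: blue;"">world</span></p>";
        Console.WriteLine(EncryptSpanContent(html, fakeEncrypt));
        // <p><span style="color: red;">aGVsbG8=</span> <span style="color: blue;">d29ybGQ=</span></p>
    }
}
```

In the original method you would pass s => AESEncryption.Encrypt(s, passPhrase, salt, hash, iteration, initialVector, keySize). A side benefit of the evaluator: its return value is inserted literally, whereas the replacement string in Replace(string, string) is a substitution pattern in which $ has special meaning.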