Problem matching data outside of HTML tags
I'm trying to find a way to match content that is not inside any XML or HTML tags. I've read that regex is fundamentally a poor fit for parsing XML/HTML, so I'm open to any solution that solves my problem, but if regex works, that's fine too.
Here's an example of what I'm looking for:
the lazy fox jumped <span>over</span> the brown fence.
I want to get back:
the lazy fox jumped the brown fence
Any ideas?
2 answers
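This is probably a naive technique, but my first instinct would be to run a regex that matches each tagged element, find the text it matches in your input string, and delete that text from the string, returning the remainder. It will not cope with nested tags of the same name, and it leaves a double space where the element used to be. In C#, roughly:
using System.Text.RegularExpressions;

string input = "the lazy fox jumped <span>over</span> the brown fence.";
// [^>]* keeps each tag match inside one tag; \1 requires the closing tag to repeat the opening tag name.
MatchCollection matches = Regex.Matches(input, @"<(\w+)[^>]*>.*?</\1>");
foreach (Match m in matches)
{
    // string.Remove takes indices, so delete the matched text with Replace instead.
    input = input.Replace(m.Value, "");
}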
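Since the question points out that regex is a poor fit for parsing HTML, the same "drop everything inside a tag" idea can also be sketched with a real parser. The following is a minimal Python illustration using only the standard library's html.parser, under stated assumptions rather than a definitive implementation: the names OutsideTagText, text_outside_tags and VOID_TAGS are hypothetical, and the void-tag list is a partial one picked for the example. It keeps text seen while no element is open and discards anything nested inside one.

```python
import re
from html.parser import HTMLParser

# Assumption: a partial list of elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class OutsideTagText(HTMLParser):
    """Collects only the text that is not nested inside any element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open elements
        self.chunks = []  # text seen while depth == 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements do not open a nesting level
        self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Guard against stray closing tags so depth never goes negative.
        self.depth = max(self.depth - 1, 0)

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def text_outside_tags(markup):
    parser = OutsideTagText()
    parser.feed(markup)
    parser.close()
    # Collapse the whitespace left behind where an element was dropped.
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


print(text_outside_tags("the lazy fox jumped <span>over</span> the brown fence."))
```

Unlike the regex version, this handles nested elements, because it tracks how many elements are open rather than trying to pair tags textually. It is still only a sketch: malformed markup with unclosed non-void tags would cause the text that follows to be dropped.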