Select and replace multiple lines in Notepad++ with regex
I have a very large HTML file containing the results of a security scan, and I need to strip useless information out of it. An example of what I need to remove looks like this:
<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>
After editing, the text above should simply be gone. However, a standard find won't work, because parts of the block vary. Here's another example of what needs to be removed from the document:
<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>
The id 10395 needs to be treated as a variable, although its length always stays the same. In addition, "Microsoft Windows SMB Shares Enumeration" should be treated as a variable, since it changes throughout the document.
I tried putting something like this into Find/Replace, but I think I am completely missing the mark.
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>
Maybe I need a different tool altogether?
I assume that by repeating \1 several times you mean a placeholder for a single character, but that is not how it works: \1 is a backreference that repeats whatever the first capture group matched. What you are trying to achieve looks something like this:
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=(\d+)" target="_blank"> \1</a>
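To illustrate how the capture group and the backreference interact, here is a minimal Python sketch. It uses Python's `re` module rather than Notepad++, and the names `pattern`, `same_id`, and `different_id` are made up for the example; the sample lines are adapted from the question.

```python
import re

# (\d+) captures the whole id once; \1 then requires the same digits again.
pattern = re.compile(r'view=single&id=(\d+)" target="_blank"> \1</a>')

same_id = ('<td width="10%" valign="top" class="classcell"> '
           '<a href="http://www.nessus.org/plugins/index.php?view=single&id=10395" '
           'target="_blank"> 10395</a>')

# Hypothetical line where the link text differs from the id in the URL.
different_id = same_id.replace('> 10395</a>', '> 11219</a>')

match = pattern.search(same_id)
print(match.group(1))                 # 10395
print(pattern.search(different_id))   # None
```

The second line does not match because the backreference only accepts a repeat of the captured id, not any five digits.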
To match all six lines:
<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>
Then you can replace it with an empty string.
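If a scripted approach is an option instead of Notepad++, the same pattern can be applied with Python's `re.sub`. This is only a sketch under assumptions: the helper name `strip_info_rows` and the surrounding `<table>` and "keep me" row in `sample` are invented for the example, while the Info row is copied from the question.

```python
import re

# Same pattern as above, split across raw-string pieces for readability.
ROW = re.compile(
    r'<tr>\s*<td width="20%" valign="top" class="classcell0">'
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>'
    r'\s*<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&id=(\d+)" target="_blank"> \1</a>'
    r'\s*</td>\s*<td width="70%" valign="top" class="classcell">'
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>'
)


def strip_info_rows(html: str) -> str:
    """Replace every matching Info row with an empty string."""
    return ROW.sub("", html)


sample = '''<table>
<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>
<tr><td>keep me</td></tr>
</table>'''

cleaned = strip_info_rows(sample)
print(cleaned)
```

Because the title is matched with `.*?` and no `re.DOTALL` flag, it cannot span a line break here, which keeps one match from running into the next row.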
These regexes are ordered from least to most specific, but they all get the job done:
<a.*>.*\d.*</a>
<a.*>.*\d{5}.*</a>
<a.*id=\d{5}.*>.*\d{5}.*</a>
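A quick way to see what these three patterns cover is to run them against one of the link lines from the question. The sketch below uses Python's `re` module as a stand-in for Notepad++; `line`, `patterns`, and `matches` are names chosen for the example.

```python
import re

line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&id=10395" '
        'target="_blank"> 10395</a>')

patterns = [
    r'<a.*>.*\d.*</a>',
    r'<a.*>.*\d{5}.*</a>',
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',
]

# Collect the text each pattern matches on the sample line.
matches = [re.search(p, line).group(0) for p in patterns]

for p, m in zip(patterns, matches):
    print(p, '->', m)
```

On this sample, each pattern matches the `<a>` element from `<a href` through `</a>` and nothing else, so the surrounding `<tr>` and `<td>` lines would still need to be handled separately.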
Disclaimer: be careful. HTML cannot be reliably parsed with regex.