Select and replace multiple lines in Notepad++ with regex

I have a very large HTML file containing the results of a security scan, and I need to strip useless information out of it. An example of what I need to remove looks like this:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>

      

After editing, the block above should simply be gone. However, a plain find and replace won't work, because parts of the block vary. Here's another example of what needs to be removed from the document:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>

      

I need to treat the id 10395 as a variable, although its length always stays the same. The description, Microsoft Windows SMB Shares Enumeration, also needs to be treated as a variable, since it changes throughout the document.

I tried putting something like this into Replace, but I think I am completely missing the mark.

<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>

      

Maybe I need a different tool altogether?

+3




2 answers


I assume that by repeating \1 several times you meant a placeholder for a single character, but that is not what it does: \1 is a backreference that matches whatever the first capture group matched. What you are trying to achieve looks something like this:

<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>
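
As a rough illustration of how the capture group and the backreference interact, here is a small sketch in Python rather than Notepad++ (Python's `re` module is a swapped-in tool, and the shortened pattern is only an assumption for demonstration). The `(\d+)` group captures the id from the href, and `\1` then requires the link text to repeat exactly the same digits.

```python
import re

# Sample link line taken from the question.
line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
        'target="_blank"> 10395</a>')

# (\d+) captures the id; \1 must then repeat exactly the same digits.
# This is a shortened pattern for illustration, not the full one from the answer.
pattern = re.compile(r'id=(\d+)" target="_blank"> \1</a>')

match = pattern.search(line)
captured_id = match.group(1) if match else None

# Hypothetical variant: the link text no longer equals the id in the href,
# so the backreference cannot match.
mismatched = line.replace('> 10395</a>', '> 99999</a>')
mismatch_found = pattern.search(mismatched) is not None

print(captured_id, mismatch_found)
```

Writing `\1\1\1\1\1`, as in the question, would instead demand five consecutive copies of whatever group 1 captured, which is why that attempt cannot work.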

      

To match all six lines of the block at once, use \s* to allow whitespace (including line breaks) between the tags:



<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>

      

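Then you can replace it with an empty string: in the Replace dialog, set Search Mode to Regular expression, put the pattern in "Find what", and leave "Replace with" empty.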
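
If a different tool turns out to be preferable, the same pattern can be tried outside Notepad++. The following is a minimal sketch in Python, assuming the report is available as a string; `ROW_TEMPLATE`, `ROW_PATTERN`, and the third "High" row are hypothetical names and data made up for the demonstration, not part of the original report.

```python
import re

# Template reproducing the row layout from the question.
ROW_TEMPLATE = '''<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">{severity}</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id={plugin_id}" target="_blank"> {plugin_id}</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">{title}</span></td>
</tr>
'''

# The six-line pattern from the answer, split across string literals for readability.
ROW_PATTERN = re.compile(
    r'<tr>\s*<td width="20%" valign="top" class="classcell0">'
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>'
    r'\s*<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>'
    r'\s*</td>\s*<td width="70%" valign="top" class="classcell">'
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>'
)

sample = (
    ROW_TEMPLATE.format(severity='Info', plugin_id='10395',
                        title='Microsoft Windows SMB Shares Enumeration')
    + ROW_TEMPLATE.format(severity='Info', plugin_id='11219',
                          title='Nessus SYN scanner')
    # Hypothetical non-Info row that should survive the cleanup.
    + ROW_TEMPLATE.format(severity='High', plugin_id='12345',
                          title='Hypothetical finding to keep')
)

# Replace every matching "Info" row with an empty string.
cleaned = ROW_PATTERN.sub('', sample)
print(cleaned)
```

For a real report, the file contents would be read into a string first and the cleaned result written back out; only rows whose first cell reads Info are removed.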

+1




These regexes are ordered from least to most complex; each of them matches the <a> link line:

<a.*>.*\d.*</a>

<a.*>.*\d{5}.*</a>

<a.*id=\d{5}.*>.*\d{5}.*</a>
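
To see what these three patterns actually select, here is a small sketch in Python (used here only as a convenient test bench; the two-link line is a hypothetical input, not taken from the report). On the sample link line all three match the same `<a>` element, but the greedy `.*` will also swallow everything between two links that share a line.

```python
import re

# The three patterns from the answer, in the same order.
patterns = [
    r'<a.*>.*\d.*</a>',
    r'<a.*>.*\d{5}.*</a>',
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',
]

# Sample link line taken from the question.
line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
        'target="_blank"> 10395</a>')

# Each pattern selects only the <a>...</a> element, not the surrounding row.
matched = [re.search(p, line).group(0) for p in patterns]
expected = line[line.index('<a '):]

# Hypothetical line with two links: greedy .* spans from the first <a to the last </a>.
two_links = '<a href="x">1</a> keep me <a href="y">2</a>'
greedy_match = re.search(patterns[0], two_links).group(0)

print(matched[0] == expected, greedy_match)
```

Because the match covers only the link element, these patterns on their own leave the surrounding `<tr>` and `<td>` lines in place.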

      



Disclaimer: be careful. HTML cannot be reliably parsed with regex.

+1

