Count the most common occurrences of unknown lines in a file

I have a large file with lines like this ...

19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5
19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5
19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5

      

I do not need any of the other data, only the part after "Response sent:". I need a sorted list of the most common domain names. The problem is that I won't know all the domain names beforehand, so I can't just search for a fixed string.

Using the example above, I would like the result to be along these lines:

ns1.example.com (2)
www.example.com (1)

      

... where the number in () is the number of occurrences of that domain name.

What can I use for this on Windows? The input file is a .txt file; the output file can be anything. Ideally this would be a command-line process, but I am really lost, so I would be happy with anything.



4 answers


The cat is out of the bag, so I'll try to help a little. This is a PowerShell solution. If you have trouble understanding how it works, I recommend researching the individual parts.

If the text file were "D:\temp\test.txt", you could do something like this:

$results = Select-String -Path D:\temp\test.txt -Pattern "(?<=sent: ).+(?= type)" | Select -Expand Matches | Select -Expand Value
$results | Group-Object | Select-Object Name,Count | Sort-Object Count -Descending

      

Using your input, you get this output:



Name             Count
----             -----
ns1.example.com.     2
www.example.com.     1

      

Since a regex is involved, I saved a link that explains how it works.
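
As a rough illustration of what the pattern does, here is a small Python sketch. Python is only my choice for the demo and is not part of the PowerShell solution, and `extract_domain` is a hypothetical helper name. The lookbehind `(?<=sent: )` starts the match right after "sent: " and the lookahead `(?= type)` stops it before " type", so only the domain name is returned.

```python
import re

# Same pattern as in the Select-String call above.
PATTERN = re.compile(r"(?<=sent: ).+(?= type)")

def extract_domain(line):
    """Return the text between 'sent: ' and ' type', or None if absent."""
    match = PATTERN.search(line)
    return match.group(0) if match else None

sample = "19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5"
print(extract_domain(sample))  # prints: www.example.com.
```

Lines that do not contain the "sent: ... type" shape simply yield no match, which is why the PowerShell pipeline does not need a separate filter step.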

Please keep in mind that SO is a site for helping programmers and programming enthusiasts. We give our free time to this, while some people get paid to do it.



Can you do this in PHP?



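<?php
// Path to the log file (placeholder; adjust as needed).
$filename = 'input.txt';
$lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$domainarr = [];
foreach ($lines as $value) {
   // The domain name is the sixth space-separated field.
   $arr = explode(' ', $value);
   if (isset($arr[5])) {
      $domainarr[] = $arr[5];
   }
}

// Count each domain, then sort by count, highest first.
$occurrence = array_count_values($domainarr);
arsort($occurrence);

print_r($occurrence);
?>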

      



This is in batch:

@echo off
setlocal enabledelayedexpansion
if exist temp.txt del temp.txt
for /f "tokens=6" %%a in (input.txt) do (Echo %%a >> temp.txt)
for /f %%a in (temp.txt) do (
set /a count=0
set v=%%a
if "!%%a!" EQU "" (
for /f %%b in ('findstr /L "%%a" "temp.txt"') do set /a count+=1
set %%a=count
Echo !v:~0,-1! ^(!count!^)
)
)
del temp.txt

      

This currently prints the result to the screen. To redirect it to a text file instead, replace:

Echo !v:~0,-1! ^(!count!^)

      

with:

Echo !v:~0,-1! ^(!count!^) >> output.txt

      

With the sample data, this outputs:

www.example.com (1)
ns1.example.com (2)



This batch file solution should run faster:

@echo off
setlocal

rem Accumulate each occurrence in its corresponding array element
for /F "tokens=6" %%a in (input.txt) do set /A "count[%%a]+=1"

rem Show the result
for /F "tokens=2,3 delims=[]=" %%a in ('set count[') do echo %%a (%%b)

      

Output:

ns1.example.com. (2)
www.example.com. (1)

      

To save the result to a file, change the last line as follows:

(for /F "tokens=2,3 delims=[]=" %%a in ('set count[') do echo %%a (%%b^)) > output.txt
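
For comparison, the same accumulate-then-list idea can be sketched in Python, with the result sorted by count as the question asks. This is only a sketch: it assumes the domain is always the sixth whitespace-separated field, as in the sample lines, and `count_domains` is a hypothetical helper name.

```python
from collections import Counter

def count_domains(lines):
    """Count the sixth whitespace-separated field of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 6:
            # Drop the trailing dot of the fully qualified name.
            counts[fields[5].rstrip(".")] += 1
    # most_common() returns (name, count) pairs, highest count first.
    return counts.most_common()

sample = [
    "19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5",
    "19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5",
    "19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5",
]
for name, count in count_domains(sample):
    print(f"{name} ({count})")
```

To run it on a real log, pass an open file instead of the sample list, for example `count_domains(open("input.txt"))`.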

      
