How do I write a wrapper script to take the first record for a specific column?
Below is a small file for demonstration. There are two columns, and I would like to write a shell script that keeps only the first occurrence of each name.
--- input.txt ---
Name,Count
Linux,2
Unix,10
Linux,10
Unix,4
Windows,6
--- desired output.txt ---
Name,Count
Linux,2
Unix,10
Windows,6
The real input.txt is much larger (several GB), so something that can scale would be great.
Also, I apologize if similar questions have been asked before (I could not find a solution to this by searching).
This would do it:
awk -F, '!seen[$1]++' input.txt
-F, sets the input field separator to a comma, so $1 is the part of each line before the comma (Name, Linux, Unix, etc.). seen is an array that counts how many times each value of $1 has already been seen. The pattern !seen[$1]++ tests seen[$1] and then increments it, so a line is printed only when seen[$1] is still 0, which is true only the first time a key appears.
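Since the question asks for a wrapper script, here is a minimal sketch of the same one-liner wrapped in a POSIX shell function. The function name first_by_column and its argument order are hypothetical and not part of the original answer; the column number is passed into awk with -v so that any comma-separated column can serve as the key.

```shell
#!/bin/sh
# first_by_column COLUMN [FILE...]
# Hypothetical helper (name and argument order are assumptions):
# print only the first line seen for each distinct value of the given
# 1-based, comma-separated column. Reads stdin when no FILE is given.
first_by_column() {
    col=$1
    shift
    # -v col=... hands the column number to awk. Inside the awk program,
    # $col means "the field whose number is stored in col".
    awk -F, -v col="$col" '!seen[$col]++' "$@"
}
```

Once the function is defined (for example by sourcing the file), first_by_column 1 input.txt > output.txt reproduces the command above. The file is read in a single pass, and the seen array holds one entry per distinct key, so memory use grows with the number of distinct names rather than with the size of the file.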