Comparing two lists with a shell script
Suppose I have two lists of numbers in files f1, f2, each number one per line. I want to see how many numbers in the first list are not in the second and vice versa. I am currently using grep -f f2 -v f1 and then iterating over this with a shell script. It's pretty slow (quadratic time hurts). Is there a better way to do this?
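If you stay with grep, a cheaper variant is to tell it that the patterns are fixed strings that must match whole lines. The sketch below wraps this in a hypothetical helper, only_in. The options -F, -x, -v and -f are standard grep options. How much faster this runs depends on the grep implementation: many can match a set of fixed strings in a single pass instead of treating every line as a separate regular expression.

```shell
# only_in FILE_A FILE_B (hypothetical helper): print the lines of FILE_A that
# do not appear as a whole line anywhere in FILE_B.
#   -F  treat each line of FILE_B as a fixed string, not a regex
#   -x  require a whole-line match, so "1" does not match "10"
#   -v  invert the match (keep lines that match nothing)
#   -f  read the patterns from FILE_B
only_in() {
  grep -Fxv -f "$2" "$1"
}
```

For example, only_in f1 f2 | wc -l counts the numbers in f1 that are missing from f2. Swap the arguments to get the other direction.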
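I like "comm" for this kind of thing (the files need to be sorted). It prints three columns: lines only in the first file, lines only in the second file, and lines in both. The options -1, -2 and -3 suppress the corresponding column.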
$ cat f1
1
2
3
$ cat f2
1
4
5
$ comm f1 f2
		1
2
3
	4
	5
$ comm -12 f1 f2
1
$ comm -23 f1 f2
2
3
$ comm -13 f1 f2
4
5
$
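The transcript above can be wrapped into one function that sorts the inputs itself and prints both counts. This is a minimal sketch. The name count_only is hypothetical, and the sketch assumes you want to count distinct numbers, so duplicates are dropped with sort -u. Because comm expects lexicographic order, the inputs are sorted with plain sort rather than sort -n. Setting LC_ALL=C makes sort and comm use the same collation.

```shell
# count_only FILE1 FILE2 (hypothetical helper): print "N1 N2", where N1 is the
# number of distinct lines only in FILE1 and N2 the number only in FILE2.
# The inputs do not need to be sorted beforehand.
count_only() {
  tmp1=$(mktemp) || return 1
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
  # comm needs lexicographic order, so use plain sort (not sort -n);
  # LC_ALL=C keeps sort and comm on the same byte-wise collation.
  LC_ALL=C sort -u "$1" > "$tmp1"
  LC_ALL=C sort -u "$2" > "$tmp2"
  # -23 suppresses columns 2 and 3, leaving lines unique to FILE1.
  n1=$(LC_ALL=C comm -23 "$tmp1" "$tmp2" | wc -l | tr -d ' ')
  # -13 suppresses columns 1 and 3, leaving lines unique to FILE2.
  n2=$(LC_ALL=C comm -13 "$tmp1" "$tmp2" | wc -l | tr -d ' ')
  rm -f "$tmp1" "$tmp2"
  echo "$n1 $n2"
}
```

With the example files above, count_only f1 f2 prints 2 2.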
Could you just put each number on its own line and then use diff(1)? You will need to sort the lists first for this to work correctly.
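A sketch of the diff approach, assuming both files hold one number per line. The name diff_counts is hypothetical. After sorting, lines that diff marks with < occur only in the first file, and lines marked with > occur only in the second. Duplicate lines are counted individually.

```shell
# diff_counts FILE1 FILE2 (hypothetical helper): sort both files, diff them,
# and print "N1 N2": the count of "<" lines (only in FILE1) and of ">" lines
# (only in FILE2).
diff_counts() {
  tmp1=$(mktemp) || return 1
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
  sort "$1" > "$tmp1"
  sort "$2" > "$tmp2"
  # diff exits with status 1 when the files differ; that is expected here.
  out=$(diff "$tmp1" "$tmp2") || :
  rm -f "$tmp1" "$tmp2"
  # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
  n1=$(printf '%s\n' "$out" | grep -c '^<') || :
  n2=$(printf '%s\n' "$out" | grep -c '^>') || :
  echo "$n1 $n2"
}
```

This assumes diff reports a minimal set of changes. GNU diff offers --minimal (-d) if you want to guarantee that on large inputs.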
In the special case where one file is a subset of the other, the following:
cat f1 f2 | sort | uniq -u
will list only the lines that are in the larger file but not in the smaller one. And, of course, appending | wc -l to the pipeline will give the count.
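A sketch of the counting pipeline as a shell function, using the hypothetical name sym_diff_count. The plain pipeline assumes that neither file repeats a line on its own, because uniq -u would drop such a line. De-duplicating each file with sort -u first removes that assumption. Without the subset condition, the result is the number of lines that occur in exactly one of the two files, with no indication of which file they came from.

```shell
# sym_diff_count FILE1 FILE2 (hypothetical helper): count the lines that occur
# in exactly one of the two files. Each file is de-duplicated first, so
# uniq -u only discards lines present in both files.
sym_diff_count() {
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -u | wc -l | tr -d ' '
}
```

When every line of f2 also appears in f1, sym_diff_count f1 f2 gives the number of lines that f2 lacks.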
However, this is not exactly what you described.
This one-liner often covers my particular needs, but I'd like to see a more general solution.
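One possible generalization of the same uniq -u trick gives per-file results. Feed the second file in twice, so that every line it contains occurs at least twice. uniq -u then keeps only the lines from the first file that the second file lacks. A sketch, using the hypothetical name only_in_first:

```shell
# only_in_first FILE1 FILE2 (hypothetical helper): print the distinct lines
# that occur in FILE1 but not in FILE2.
# After de-duplication, a line only in FILE1 occurs once in the merged stream,
# a line only in FILE2 occurs twice, and a line in both occurs three times;
# uniq -u keeps just the lines that occur once.
only_in_first() {
  { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
}
```

only_in_first f1 f2 | wc -l and only_in_first f2 f1 | wc -l then give the two counts the question asks for.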