Comparing two lists with a shell script

Suppose I have two lists of numbers in files f1 and f2, one number per line. I want to see how many numbers in the first list are not in the second, and vice versa. I am currently using grep -f f2 -v f1 and then iterating over the result with a shell script. It's pretty slow (the quadratic time hurts). Is there a better way to do this?
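One likely cause of both the slowness and some wrong counts: grep -f treats every line of f2 as a regular expression and matches it anywhere in a line, so a 1 in f2 also removes 12 from f1. Here is a minimal sketch of the fixed-string, whole-line variant. It assumes a grep that supports -F, -x and -f (GNU and BSD grep both do), and count_missing is a hypothetical helper name, not a standard command.

```shell
#!/usr/bin/env bash
# count_missing FILE1 FILE2 (hypothetical helper):
# print how many lines of FILE1 do not occur as a whole line in FILE2.
#   -F  treat each line of FILE2 as a fixed string, not a regex
#   -x  require the pattern to match the entire line ("1" no longer hits "12")
#   -v  keep only the lines of FILE1 that match none of the patterns
count_missing() {
    grep -F -x -v -f "$2" "$1" | wc -l
}
```

For example, count_missing f1 f2 counts the numbers in f1 that are absent from f2, and count_missing f2 f1 counts the reverse. With -F, GNU grep uses a fixed-string matcher designed for large pattern lists, which is usually much faster than the regex form. I haven't measured it against your data, though.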

3 answers


I like comm for this kind of thing (the files need to be sorted first).



$ cat f1
1
2
3
$ cat f2
1
4
5
$ comm f1 f2
        1
2
3
    4
    5
$ comm -12 f1 f2
1
$ comm -23 f1 f2
2
3
$ comm -13 f1 f2
4
5
$ 
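To get the two counts the question asks for, the comm calls above can be piped into wc -l. This is a sketch that assumes bash (for the <( ) process substitution) and counts distinct values. compare_lists is a hypothetical name. Note that comm expects the files in the same lexicographic (collation) order, so sort them with plain sort rather than sort -n.

```shell
#!/usr/bin/env bash
# compare_lists FILE1 FILE2 (hypothetical helper):
# print "<only in FILE1> <only in FILE2>" as two counts of distinct values.
# comm requires both inputs sorted in the same collation order, so each file
# is sorted (and deduplicated) with plain sort -u, not sort -n.
# <( ... ) is bash process substitution, not POSIX sh.
compare_lists() {
    local only1 only2
    only1=$(comm -23 <(sort -u "$1") <(sort -u "$2") | wc -l)  # column 1 only
    only2=$(comm -13 <(sort -u "$1") <(sort -u "$2") | wc -l)  # column 2 only
    # $(( )) strips the padding some wc implementations add
    printf '%d %d\n' "$((only1))" "$((only2))"
}
```

It prints the number of values found only in the first file, then the number found only in the second. For the f1 and f2 shown above that is 2 2. The cost is dominated by the sorts, so it is O(n log n) rather than quadratic.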


Could you just put each number on its own line and then use diff(1)? You may need to sort the lists first for this to work correctly.
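Here is a sketch of the diff approach. It assumes bash and duplicate-free values, and diff_counts is a hypothetical name. In diff's default ("normal") output, lines that exist only in the first file are prefixed with "< " and lines that exist only in the second are prefixed with "> ". Counting those prefixes gives the two numbers.

```shell
#!/usr/bin/env bash
# diff_counts FILE1 FILE2 (hypothetical helper):
# print "<only in FILE1> <only in FILE2>" by counting diff's "<" and ">" lines.
# Both inputs are sorted and deduplicated first; header lines such as
# "2,3c2,3" and the "---" separator start with neither prefix.
diff_counts() {
    local d
    d=$(diff <(sort -u "$1") <(sort -u "$2"))
    printf '%d %d\n' "$(grep -c '^<' <<<"$d")" "$(grep -c '^>' <<<"$d")"
}
```

For sorted, duplicate-free lists, the longest common subsequence is exactly the set of shared values, so the counts should match comm's. On very large inputs, though, GNU diff may settle for a non-minimal diff unless it is given --minimal, and that would inflate both counts. comm has no such caveat.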





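In the special case where one file is a subset of the other, the following:

cat f1 f2 | sort | uniq -u

will list only the lines that appear in the larger file and not in the smaller one (assuming neither file contains duplicate lines). And, of course, piping that into wc -l will give you the count.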

However, this is not exactly what you described.

This one-liner often serves my special needs, but I'd like to see a more general solution.
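For the general case, one variation on the same trick is to deduplicate each file and feed the second file in twice. This is a sketch assuming bash, not something from the answer above. Every value of the second file then occurs at least twice in the sorted stream, so uniq -u keeps only the values that are in the first file and not in the second. only_in_first is a hypothetical name.

```shell
#!/usr/bin/env bash
# only_in_first FILE1 FILE2 (hypothetical helper):
# print the distinct lines of FILE1 that do not appear in FILE2.
# After sort -u, each value occurs once per file. FILE2 is passed twice, so
# its values occur 2 or 3 times in the merged stream, while values found only
# in FILE1 occur exactly once; uniq -u keeps just those single occurrences.
only_in_first() {
    sort <(sort -u "$1") <(sort -u "$2") <(sort -u "$2") | uniq -u
}
```

only_in_first f1 f2 | wc -l and only_in_first f2 f1 | wc -l then give the two counts, with no subset assumption.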
