"grep -c" versus "wc -l"

I am processing several large text files, i.e. converting them from one format to another. The original file formats differ slightly, but, with a bit of preprocessing in a few cases, most of them are converted successfully by the Bash shell script I wrote.

So far so good, but one thing puzzled me. At some point, the script sets a variable named $iterations so that it knows how many times to execute a particular for loop. This value is determined by the number of blank lines in the temporary file generated by the script.

So the original version of my script had the line:

    iterations=$(cat tempfile | grep '^$' | wc -l)

      

This worked fine with all but one of the text files. For that file, $iterations did not seem to be set correctly: it got the value "1", even though it turned out there were over 20,000 blank lines in tempfile.

However, upon discovering grep -c, I changed the line to:

    iterations=$(cat tempfile | grep -c '^$')

      

and, unexpectedly, the script worked, i.e. $iterations was set correctly.

Can anyone explain why the two versions give different results? And why would the first version work on some files and not others? Is there some kind of upper limit which, when exceeded, causes wc -l to default to 1? The file that won't work with the first version is one of the largest, but not the largest in the set (the largest was converted correctly the first time).



1 answer


If the input is not a text file, then grep will print the single line "Binary file (standard input) matches", and wc -l will count that line! But grep -c will happily count the number of matches in the file.
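The difference can be reproduced with a small throwaway file. The following is only a sketch, assuming GNU grep: the demo file and the variable names are made up for illustration, and the -a option (treat the input as text) is an addition not mentioned above. The value printed for the plain pipeline depends on the grep version, since some versions write the "Binary file" notice to standard error or emit a few matching lines before it, so no particular number is promised for it.

```shell
#!/bin/sh
# Hypothetical demo file: three blank lines, plus one line containing a
# NUL byte so that grep treats the input as binary rather than text.
demo=$(mktemp)
printf 'bin\000ary\n\nfoo\n\nbar\n\n' > "$demo"

# Pipeline from the question: for binary input grep may replace some or all
# of the matching lines with a one-line notice, so this count is unreliable.
piped=$(grep '^$' "$demo" | wc -l)

# grep -c counts the matching lines itself, binary or not.
counted=$(grep -c '^$' "$demo")

# -a (assumed available, as in GNU grep) forces text mode, so the
# pipeline sees every matching line again.
as_text=$(grep -a '^$' "$demo" | wc -l)

echo "grep | wc -l:    $piped"
echo "grep -c:         $counted"
echo "grep -a | wc -l: $as_text"

rm -f "$demo"
```

If a file is unexpectedly detected as binary, a stray NUL byte or a character that is invalid in the current locale's encoding is a likely cause, which would also explain why file size alone does not predict which files fail.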


