Csplit on OS X doesn't recognize "$" as end-of-line character?

(I am using Mac OS X and this question may be specific to this Unix variant)

I am trying to split a file using csplit with a regex. The file consists of various articles combined into one long text file. Each article ends with "All Rights Reserved", and this is at the end of the line: grep Reserved$ finds them all. Only csplit claims that there is no match.

csplit filename /Reserved$/

gives

csplit: Reserved$: no match

which is a clear and obvious lie. If I leave out the $ it works, but I want to be sure that I don't get any false matches on "Reserved" in the middle of the text. I've tried another word with the start-of-line character ^ and it seems to work. Other words that appear at the end of a line in the data (for example and$) also fail to match.

Is this a known bug with OS X?

[Update: I made sure this is not a DOS/Unix end-of-line problem by removing all carriage returns.]

1 answer


I downloaded the csplit source code from http://www.opensource.apple.com/source/text_cmds/text_cmds-84/csplit/csplit.c and tested it in the debugger.

The pattern is compiled with

if (regcomp(&cre, re, REG_BASIC|REG_NOSUB) != 0)
    errx(1, "%s: bad regular expression", re);

      

and the lines are matched with

/* Read and output lines until we get a match. */
first = 1;
while ((p = csplit_getline()) != NULL) {
    if (fputs(p, ofp) == EOF)
        break;
    if (!first && regexec(&cre, p, 0, NULL, 0) == 0)
        break;
    first = 0;
}

      

The problem is that the lines returned by csplit_getline() still have a terminating newline \n. Therefore, "Reserved" is not the last text in the string, and the pattern "Reserved$" does not match.
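The effect can be reproduced outside csplit with a few lines of C. The following is a minimal sketch, not code from csplit: the helper name line_matches and its extra_cflags parameter are made up for illustration, and REG_BASIC is left out because basic regular expressions are already the default when REG_EXTENDED is not given (and REG_BASIC is not defined on every platform).

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of csplit): compile `pattern` as a POSIX
 * basic regular expression and test it against `line`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 * `extra_cflags` is OR-ed into the regcomp() flags, e.g. 0 or REG_NEWLINE.
 */
int line_matches(const char *pattern, const char *line, int extra_cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB | extra_cflags) != 0)
        return -1;

    /* Without REG_NEWLINE, "$" only matches at the very end of `line`;
     * a trailing '\n' is treated as an ordinary character. */
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

With this helper, line_matches("Reserved$", "All Rights Reserved\n", 0) returns 0, while the same call on "All Rights Reserved" without the newline returns 1. Passing REG_NEWLINE as extra_cflags also makes the first call return 1, because "$" is then allowed to match just before a newline. Whether that flag would be a suitable fix inside csplit itself is an assumption that was not tested here.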

After a quick and dirty insert



    p[strlen(p)-1] = 0;

      

to remove the trailing newline from the input string, the "Reserved$" pattern worked as expected.
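The one-liner is fine for a debugger experiment, but it writes to p[-1] if the string is empty and cuts off a real character if the last line of the file has no newline. A slightly more careful variant could look like the sketch below; strip_newline is a hypothetical name, not a function in the csplit source.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of csplit): cut the string at its first
 * '\n', if there is one. For a buffer holding a single input line this
 * removes the trailing newline. Empty strings and lines without a
 * newline are left unchanged.
 */
void strip_newline(char *p)
{
    /* strcspn() returns the index of the first '\n', or strlen(p)
     * when there is none, so this write is always inside the buffer. */
    p[strcspn(p, "\n")] = '\0';
}
```

In the matching loop shown earlier, the line is written with fputs(p, ofp) before it is tested, so the newline would have to be stripped after that call (or on a copy of the line). Otherwise the output files would lose their line breaks.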

There seem to be more problems with csplit on Mac OS X; see the comments on the answer to "Finding the correct regex for csplit" (the repetition count {*} also doesn't work).

Note: You can match "Reserved" at the end of the line with the following trick:

csplit filename /Reserved<Ctrl-V><Ctrl-J>/

      

where you actually press the control keys to enter a literal newline character on the command line.
