Couldn't use memmem to search for strings

I don't like just dumping a load of code here and asking people to debug it for me, but I'm a little inexperienced with C and I'm completely stumped.

The overall goal is to do a little cleanup on a very large log file (11 GB+). I read 2048 bytes at a time, then go through the individual lines, writing them to the output file. I originally used strstr to find line endings; however, I found that this doesn't work with the partial line at the end of the read buffer. I think it's because the "line" I'm reading from the file doesn't have a \0 at the end, so strstr gets confused.

So after a little googling, I thought I'd try memmem, which appears to be a "binary safe" replacement for strstr. This is where I'm stuck: my program crashes during the memmem call.

#include <stdio.h>
#include <string.h>

#define BUFF_LEN 2048

int main (void)
{
    char file_buff[BUFF_LEN], prev_line[BUFF_LEN], curr_line[BUFF_LEN];
    char *p_line_start, *p_lf;
    int bytes_consumed, bytes_read;
    FILE *in_fp, *out_fp;

    in_fp = fopen("208.log", "r");
    out_fp = fopen("expanded.log", "w+");

    int sane = 0;
    while (1) {
        bytes_read = fread(file_buff, 1, BUFF_LEN, in_fp);
        if (bytes_read == 0) {
            break;
        }

        // Set the pointer to the beginning of the file buffer
        p_line_start = file_buff;
        bytes_consumed = 0;

        // Chomp lines
        while (bytes_consumed < bytes_read) {
            printf("Read to go with bytes_read = %d, bytes_consumed = %d\n",
                bytes_read, bytes_consumed);
            p_lf = (char *) memmem(p_line_start, bytes_read - bytes_consumed,
                "\n", 1);
            if (p_lf == NULL) {
                // No newline left in file_buff; store what's left in
                // curr_line and break out to read more from the file.
                printf("At loop exit I have chomped %ld of %d\n",
                    p_line_start - file_buff, bytes_read);
                //break;
                goto cleanup;
            }
            // Copy the line to our current line buffer (including the newline)
            memcpy(curr_line, p_line_start, p_lf - p_line_start + 1);
            printf("Chomped a line of length %ld\n", p_lf - p_line_start + 1);
            fwrite(curr_line, 1, p_lf - p_line_start + 1, out_fp);
            p_line_start = p_lf + 1;
            bytes_consumed += p_lf - p_line_start + 1;
        }

      

Can anyone point me in the right direction here? Advice on how best to debug this myself is also appreciated.

1 answer


From one of your comments:

I am casting the return value because gcc was throwing warnings: "warning: assignment makes pointer from integer without a cast".

You are just hiding the problem by casting the return value.

memmem returns a pointer. On a typical 64-bit platform today, a pointer is 64 bits. If you haven't declared a function, the compiler doesn't know that it returns a pointer and instead assumes that it returns an int. Typically an int is 32 bits. The generated code looks where that int would have been returned and takes 32 bits from there, which is actually just half of the returned pointer.



Try adding this line right after your call to memmem, and see if the printouts are different if you declare or don't declare memmem:

printf("[p_lf = %p]\n", (void*)p_lf);

      

When I ran it with your original program (no declaration), it printed 0xffffffffffffda67 and then crashed, because that was an invalid pointer. With the declaration (using #define _GNU_SOURCE) it printed 0x7fffffffda67 and didn't crash. Note that if you take only the low 32 bits of 0x7fffffffda67, you get 0xffffda67, and if you then sign-extend that to 64 bits, you get 0xffffffffffffda67, the pointer from your original program. (Address space layout randomization was disabled.)

This is why you shouldn't be casting return values.
