Reading lines with spaces from a file

I am working on a project and I just ran into a really annoying problem. I have a file that stores all messages received by my account. A message is a data structure defined as follows:

typedef struct _message{
    char dest[16];
    char text[512];
}message;

      

dest is a string that cannot contain spaces, unlike the other fields. Strings are read with fgets(), so dest and text may have a "dynamic" length (from 1 character up to the buffer size minus 1). Note that I manually remove the newline character after I read each string from stdin.

The Inbox uses the following syntax to store messages:

dest
text

      

So, for example, if I have a message from Marco that says "Hello, how are you?" and another message from Tarma that says "Are you going to the gym today?", my inbox would look like this:

Marco
Hello, how are you?

Tarma
Are you going to the gym today?

      

I would like to read the username from the file and store it in a string s1, then do the same for the message and store it in a string s2 (and repeat the operation until EOF), but since the text field allows spaces, I cannot use fscanf()...

I tried to use fgets(), but as I said, the size of each line is dynamic. For example, if I use fgets(username, 16, my_file), it ends up reading unwanted characters. I just need to read the first line until \n is reached, and then read the second line until the next \n is reached, this time including spaces.

Any idea on how I can solve this problem?


3 answers


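#include <stdio.h>

int main(void){
    char username[16];
    char text[512];
    int ch;
    size_t i;
    FILE *my_file = fopen("inbox.txt", "r");

    if(my_file == NULL){
        perror("inbox.txt");
        return 1;
    }
    /* %15s reads the space-free name; %*c discards the newline after it */
    while(1==fscanf(my_file, "%15s%*c", username)){
        i=0;
        /* copy the text up to a blank line (two consecutive newlines) or EOF */
        while (i < sizeof(text)-1 && EOF!=(ch=fgetc(my_file))){
            if(ch == '\n' && i && text[i-1] == '\n')
                break;
            text[i++] = (char)ch;
        }
        text[i] = 0;
        printf("user:%s\n", username);
        printf("text:\n%s\n", text);
    }
    fclose(my_file);
    return 0;
}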

      




Since the length of each line is dynamic, I would first read through the file to find the length of each line and store those lengths in a dynamic array.

Suppose your file is:

A long time ago
in a galaxy far,
far away....

The first line is 15 characters long, the second 16, and the third 12.

Create a dynamic array to hold these values. Then, as you read the lines, pass the corresponding array element to fgets() as its second argument, for example fgets(string, arrStringLength[i++], f);. Keep in mind that fgets() reads at most one character fewer than that argument and also stores the newline, so each stored size needs to be the line length plus 2 (one for the newline, one for the terminating null).

Of course, this way you have to read the file twice.
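
The two-pass idea could be sketched roughly as follows. This is only an illustration under my own assumptions: the helper names measure_lines() and read_sized_line() are made up for this example and are not from the answer, and the stream must be seekable so it can be rewound between passes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append one value to a growable array of lengths. */
static int push_len(size_t **lens, size_t *n, size_t *cap, size_t value)
{
    if (*n == *cap) {
        size_t new_cap = *cap ? *cap * 2 : 8;
        size_t *tmp = realloc(*lens, new_cap * sizeof **lens);
        if (tmp == NULL)
            return -1;
        *lens = tmp;
        *cap = new_cap;
    }
    (*lens)[(*n)++] = value;
    return 0;
}

/* First pass: return a malloc'd array with the length of every line
 * (newline excluded) and store the number of lines in *count.
 * Returns NULL if the file has no lines or memory runs out. */
static size_t *measure_lines(FILE *fp, size_t *count)
{
    size_t *lens = NULL, n = 0, cap = 0, cur = 0;
    int ch, in_line = 0;

    *count = 0;
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n') {
            if (push_len(&lens, &n, &cap, cur) != 0) {
                free(lens);
                return NULL;
            }
            cur = 0;
            in_line = 0;
        } else {
            cur++;
            in_line = 1;
        }
    }
    /* A last line without a trailing newline still counts. */
    if (in_line && push_len(&lens, &n, &cap, cur) != 0) {
        free(lens);
        return NULL;
    }
    *count = n;
    return lens;
}

/* Second pass: read one line of known length into an exactly sized buffer.
 * The caller frees the result. */
static char *read_sized_line(FILE *fp, size_t len)
{
    char *buf = malloc(len + 2);            /* text + '\n' + '\0' */
    if (buf == NULL)
        return NULL;
    if (fgets(buf, (int)(len + 2), fp) == NULL) {
        free(buf);
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';         /* drop the newline, if any */
    return buf;
}
```

A caller would run measure_lines(), call rewind() on the stream, and then call read_sized_line() once per stored length.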


You can use fgets() fairly easily as long as you're careful. This code works:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_MESSAGES = 20 };

typedef struct Message
{
    char dest[16]; 
    char text[512];
} Message;

static int read_message(FILE *fp, Message *msg)
{
    char line[sizeof(msg->text) + 1];
    msg->dest[0] = '\0';
    msg->text[0] = '\0';
    while (fgets(line, sizeof(line), fp) != 0)
    {
        //printf("Data: %zu <<%s>>\n", strlen(line), line);
        if (line[0] == '\n')
            continue;
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';    /* strip the newline only if fgets() stored one */
        if (msg->dest[0] == '\0')
        {
            if (len < sizeof(msg->dest))
            {
                memmove(msg->dest, line, len + 1);
                //printf("Name: <<%s>>\n", msg->dest);
            }
            else
            {
                fprintf(stderr, "Error: name (%s) too long (%zu vs %zu)\n",
                        line, len, sizeof(msg->dest)-1);
                exit(EXIT_FAILURE);
            }
        }
        else
        {
            if (len < sizeof(msg->text))
            {
                memmove(msg->text, line, len + 1);
                //printf("Text: <<%s>>\n", msg->text);
                return 0;
            }
            else
            {
                fprintf(stderr, "Error: text for %s too long (%zu vs %zu)\n",
                        msg->dest, len, sizeof(msg->text)-1);
                exit(EXIT_FAILURE);
            }
        }
    }
    return EOF;
}

int main(void)
{
    Message mbox[MAX_MESSAGES];
    int n_msgs;

    for (n_msgs = 0; n_msgs < MAX_MESSAGES; n_msgs++)
    {
        if (read_message(stdin, &mbox[n_msgs]) == EOF)
            break;
    }

    printf("Inbox (%d messages):\n\n", n_msgs);
    for (int i = 0; i < n_msgs; i++)
        printf("%d: %s\n   %s\n\n", i + 1, mbox[i].dest, mbox[i].text);

    return 0;
}

      

The reader handles (multiple) blank lines before the first name, between the name and the text, and after the last message. It is slightly unusual in the way it decides whether to store the line it has just read in the dest or the text part of the message. It uses memmove() because it knows exactly how much data needs to be moved and the data is null terminated. You can replace it with strcpy() if you like, but that should be slower (perhaps not measurably slower) because strcpy() has to check every byte as it is copied, whereas memmove() does not. I use memmove() because it is always correct; memcpy() could be used here, but it only works when you can guarantee that the source and destination do not overlap. Better safe than sorry; there are enough software bugs without risking additional ones. You can decide whether exiting on error is appropriate: it is fine for test code, but not necessarily a good idea in production code. You can also decide how to handle "0 messages" versus "1 message" versus "2 messages" and so on.
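
To make the memmove() point concrete, here is a minimal sketch of the same copy pulled out into a helper. The name copy_field() is my own invention for illustration and is not part of the code above; it assumes src[len] is the terminating null, as it is after the newline has been stripped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes plus the terminator from src to dst.
 * Returns 0 on success, -1 if the string would not fit in dst_size bytes. */
static int copy_field(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        return -1;                 /* no room for the text and its '\0' */
    memmove(dst, src, len + 1);    /* well defined even if src and dst overlap */
    return 0;
}
```

Because memmove() tolerates overlap, the helper stays correct even when src points into the same buffer as dst; with memcpy() that case would be undefined behavior.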

You can easily revise the code to use heap allocation for the message array. It would be easy to read each message into a simple Message variable in main() and copy it into a dynamic array once the message is complete. The alternative is to "risk" over-allocating the array, although that is unlikely to be a major problem (you would not grow the array one entry at a time anyway, to avoid quadratic behavior when the memory has to be moved during each allocation).
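
One possible shape for that heap-allocated version is sketched below. The Inbox type and the inbox_append() and inbox_free() functions are hypothetical names chosen for this example; the capacity doubles when the array is full, so the array is not reallocated for every message.

```c
#include <stdlib.h>

typedef struct Message
{
    char dest[16];
    char text[512];
} Message;

/* Hypothetical growable container; not part of the answer's code. */
typedef struct Inbox
{
    Message *items;
    size_t   count;
    size_t   capacity;
} Inbox;

/* Append a copy of *msg, doubling the capacity when the array is full.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int inbox_append(Inbox *box, const Message *msg)
{
    if (box->count == box->capacity)
    {
        size_t new_cap = box->capacity ? box->capacity * 2 : 4;
        Message *tmp = realloc(box->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;             /* the old array is still valid */
        box->items = tmp;
        box->capacity = new_cap;
    }
    box->items[box->count++] = *msg;
    return 0;
}

/* Release the array and reset the container to its empty state. */
static void inbox_free(Inbox *box)
{
    free(box->items);
    box->items = NULL;
    box->count = 0;
    box->capacity = 0;
}
```

In main(), the loop would read into a local Message with read_message() and call inbox_append() only when a complete message came back, starting from an Inbox initialized to { NULL, 0, 0 }.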

If there were multiple fields to process for each message (for example, a received date and a read date), you would need to reorganize the code further, probably with another function.

Note that the code avoids the reserved namespace. A name such as _message is reserved for "the implementation". Code like this is not part of the implementation (the C compiler and its support system), so you should not create names that start with an underscore. (That oversimplifies the rule, but only slightly, and it is much easier to understand than the more nuanced version.)

The code is careful not to write any magic number more than once.

Output example:

Inbox (2 messages):

1: Marco
   Hello, how are you?

2: Tarma
   Are you going to the gym today?

      
