Should I keep the original files in memory when parsing?

Question

I am writing the front end of an interpreter, and at first I didn't like the idea of just dumping all the source files into memory and then having tokens refer directly to that text. So my tokenizer reads one char at a time from a buffer and produces a stream of tokens.

However, once I got to the parser, it hit me that I would like to output good errors and warnings that show the offending line of source code. I guess I could store column numbers in the tokens, but then the error messages would be like getting directions over the phone: “It's in file X, on line Y, in column Z, next to the curly brace, you know the one. Past the semicolon? You've gone too far.”
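For reference, the "store positions in the tokens" bookkeeping can be done in the char-reading layer. This is a minimal sketch, assuming the tokenizer pulls characters through a wrapper around a stdio `FILE*` and that each token copies the reader's `pos` when it starts. `SrcLoc`, `Reader`, `reader_init` and `reader_getc` are hypothetical names. Columns count bytes, so a tab or each byte of a multi-byte UTF-8 character advances the column by one.

```c
#include <stdio.h>

/* Hypothetical source location attached to every token. */
typedef struct {
    const char *file; /* file name, shared by all tokens from that file */
    unsigned line;    /* 1-based line number */
    unsigned col;     /* 1-based column, counted in bytes */
    long offset;      /* byte offset from the start of the file */
} SrcLoc;

/* Hypothetical char reader that keeps the position of the next char current. */
typedef struct {
    FILE *fp;
    SrcLoc pos;
} Reader;

void reader_init(Reader *r, FILE *fp, const char *name) {
    r->fp = fp;
    r->pos.file = name;
    r->pos.line = 1;
    r->pos.col = 1;
    r->pos.offset = 0;
}

/* Read one char and advance the line/column/offset bookkeeping. */
int reader_getc(Reader *r) {
    int c = fgetc(r->fp);
    if (c == EOF)
        return EOF;
    r->pos.offset++;
    if (c == '\n') {
        r->pos.line++;
        r->pos.col = 1;
    } else {
        r->pos.col++;
    }
    return c;
}
```

Keeping the byte offset alongside line and column costs one `long` per token, and it lets an error reporter seek straight back to the right place in the file later.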

I seem to be in a situation where I want to have my cake and eat it too. I want nice messages, but I don't want to keep all the source loaded in memory.

Is there something I am missing? Or is loading the source into memory the way to go?

c parsing

that_individual 09 Aug 17 at 2:08

2 answers

Best idea: mmap your sources first, if you can. Fall back to reading the whole file into memory if you are reading from a pipe or something similar.

After parsing, you can call madvise(MADV_DONTNEED) (but only if the buffer was originally mmap'd) to tell the kernel it may drop those pages (the mapping stays valid, so the text can still be read back for errors) ... but this is probably not necessary, and might not even be a good idea, depending on your compiler design (e.g. whether identifiers still point into the source buffer or are interned into one separate allocation).
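A minimal POSIX sketch of that approach, assuming a Unix-like system: regular, non-empty files are mapped read-only with `mmap`, anything else (pipes, terminals, empty files, or a failed `mmap`) is read whole into a `malloc`'d buffer, and `madvise(MADV_DONTNEED)` is only ever applied to the mapped case. `SourceBuf` and the `source_*` functions are hypothetical names. The buffer is not NUL-terminated, so a tokenizer working on it has to respect `len`.

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise() and MADV_DONTNEED */
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical holder for one source file's bytes (not NUL-terminated). */
typedef struct {
    const char *data;
    size_t len;
    int mapped; /* 1: data came from mmap, 0: data is a malloc'd copy */
} SourceBuf;

/* Load the whole input behind fd. Returns 0 on success, -1 on error. */
int source_open(SourceBuf *sb, int fd) {
    struct stat st;
    sb->data = NULL;
    sb->len = 0;
    sb->mapped = 0;

    /* Regular, non-empty files: map them read-only. */
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            sb->data = p;
            sb->len = (size_t)st.st_size;
            sb->mapped = 1;
            return 0;
        }
    }

    /* Fallback (pipe, terminal, empty file, mmap failure): read everything. */
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;
    for (;;) {
        if (len == cap) {
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            free(buf);
            return -1;
        }
        if (n == 0)
            break; /* end of input */
        len += (size_t)n;
    }
    sb->data = buf;
    sb->len = len;
    return 0;
}

/* After parsing: hint that the mapped pages may be dropped. The mapping stays
   valid; touching it later (e.g. to quote a line in an error) reads the pages
   back in from the file. Does nothing for the malloc'd fallback. */
void source_release_hint(SourceBuf *sb) {
    if (sb->mapped)
        madvise((void *)sb->data, sb->len, MADV_DONTNEED);
}

void source_close(SourceBuf *sb) {
    if (sb->mapped)
        munmap((void *)sb->data, sb->len);
    else
        free((void *)sb->data);
    sb->data = NULL;
    sb->len = 0;
}
```

Whether `source_release_hint` is worth calling depends on the point made above: if tokens or identifiers still point into `data`, dropping the pages just means they get faulted back in on the next access.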

o11c 09 Aug '17 at 3:11

Ira Baxter · Accepted Answer · 2017-08-09T02:58:02+0000

When an error message is shown to the user, it hardly matters how many milliseconds it took to report it.

I would keep your token stream in memory to keep your interpreter fast. (In fact, to improve execution speed you should probably move to a faster interpreter design, even if that costs an extra translation pass.)

If an error occurs, go back to the disk, fetch the line of interest, and show it to the user. If the user makes no mistakes, this costs you nothing. If they make a few mistakes, it may be marginally inefficient, but the user won't notice. If they make a large number of errors, the OS will have pulled the files containing those errors into its file cache, which is larger than your program's memory anyway, so access will be about as efficient as if you had kept the entire source in memory yourself.
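A minimal sketch of this "go back to disk only on error" approach, using nothing but stdio: tokens carry just a file name, line and column, and the line text is re-read only when a diagnostic is printed. `fetch_source_line` and `print_diagnostic` are hypothetical names, and the `file:line:col: error:` layout merely imitates GCC/Clang-style messages. If the file has been edited since it was parsed, the line shown may no longer match.

```c
#include <stdio.h>

/* Re-open `path`, find 1-based line `line`, and copy it (without the newline)
   into buf, truncated to cap-1 bytes. Returns the copied length, or -1 if the
   file cannot be opened or has fewer lines. */
long fetch_source_line(const char *path, unsigned line, char *buf, size_t cap) {
    if (cap == 0)
        return -1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    unsigned cur = 1;
    int c;
    /* Skip everything before the requested line. */
    while (cur < line && (c = fgetc(fp)) != EOF)
        if (c == '\n')
            cur++;
    if (cur != line) {
        fclose(fp);
        return -1;
    }

    /* Copy the requested line. */
    size_t n = 0;
    while (n + 1 < cap && (c = fgetc(fp)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    fclose(fp);
    return (long)n;
}

/* Print "file:line:col: error: msg", then the offending line and a caret. */
void print_diagnostic(FILE *out, const char *path, unsigned line,
                      unsigned col, const char *msg) {
    char text[256];
    fprintf(out, "%s:%u:%u: error: %s\n", path, line, col, msg);
    long len = fetch_source_line(path, line, text, sizeof text);
    if (len < 0)
        return; /* source unavailable: the header line alone has to do */
    fprintf(out, "  %s\n  ", text);
    /* Reuse tabs from the source so the caret lines up under tabbed code. */
    for (unsigned i = 0; i + 1 < col; i++)
        fputc(((long)i < len && text[i] == '\t') ? '\t' : ' ', out);
    fputs("^\n", out);
}
```

This scans from the start of the file on every error, which is fine for the occasional diagnostic; if tokens also recorded the byte offset where their line starts, `fseek` could jump straight to it instead.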
