Reading binary files without buffering the whole file in memory in C++

To build a binary file comparator, I am trying to read the contents of two files using the CreateFileW function. However, this results in the entire file being buffered into memory, which becomes a problem for large (500 MB) files.

I have looked at other functions that would let me read only part of a file at a time, but I have not found any documentation explaining how these functions buffer data (I am a bit new to this, so I may be missing something obvious).

So far, the best match I seem to have found is ReadFile. It reads into a buffer that I supply, but I'm not entirely sure there isn't another buffer implemented behind the scenes, as seems to happen with CreateFileW.

Does anyone know which function would be a good choice here?



4 answers


You can use memory-mapped files for this: open the file with CreateFile, create a mapping object with CreateFileMapping, then call MapViewOfFile to get a pointer to the data.





I'm not sure what you mean by CreateFile buffering the file. CreateFile does not read in the contents of the file at all, and in any case you need to call CreateFile before you can call ReadFile.

ReadFile will do what you want. The operating system may read a little ahead of the requested data for opportunistic caching, but it won't read in the entire 500 MB file.



If you really want to avoid buffering, pass FILE_FLAG_NO_BUFFERING to CreateFile, and make sure your file offsets and read sizes are multiples of the volume's sector size and that your buffers are sector-aligned in memory. I strongly recommend against this: the system file cache exists for a reason and improves performance. Caching file data in memory should not hurt overall system memory availability, because under memory pressure the system file cache shrinks.

As mentioned, you can also use memory-mapped files. The difference between memory-mapped files and ReadFile is mostly one of interface: ultimately the system services both kinds of request in a similar way, including some caching. The mapped interface looks a little more intuitive, but keep in mind that an I/O error while accessing the mapped memory is reported as a structured exception (EXCEPTION_IN_PAGE_ERROR) rather than an error code, and it must be handled or it will crash your program.



Calling CreateFile() does not itself buffer or read the contents of the target file. After calling CreateFile(), you must call ReadFile() to get whatever parts of the file you want. For example, to read the first kilobyte of the file:

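#include <windows.h>

DWORD cbRead = 0;
BYTE buffer[1024];
// Opening the file reads none of its contents.
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
    // Reads at most sizeof(buffer) bytes; cbRead receives the count actually read.
    if (!::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL))
    {
        // handle the error; GetLastError() gives the reason
    }
    ::CloseHandle(hFile);
}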

Also, if you want to read an arbitrary part of a file, you can call SetFilePointer() before calling ReadFile(). For example, to read one kilobyte starting one megabyte into the file:

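#include <windows.h>

DWORD cbRead = 0;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
    // Move the file pointer 1 MB from the start of the file, then read 1 KB there.
    if (::SetFilePointer(hFile, 1024 * 1024, NULL, FILE_BEGIN) != INVALID_SET_FILE_POINTER &&
        ::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL))
    {
        // The first cbRead bytes of buffer are valid (fewer than 1024 near end of file).
    }
    ::CloseHandle(hFile);
}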

You can of course call SetFilePointer() and ReadFile() as many times as you like while the file is open. Each call to ReadFile() also advances the file pointer to the byte immediately after the last byte it read.

In addition, you should read the documentation for the File Management Functions you are using and check their return values properly to catch any errors that may arise.

Windows may, at its discretion, use available system memory to cache the contents of open files, but data cached this way is discarded when a running program needs the memory (after all, cached data can simply be re-read from disk if needed).



I believe you want MapViewOfFile.







