Java NIO: how to read characters from a memory-mapped file with the correct encoding

For a new project, I have to read the characters of a file (with a custom encoding) to process the input. Since some of these files can be quite large (> 100 MB), I would like to test the capabilities of Java NIO's memory-mapped files for faster access.

However, I have not been able to figure out how to create something like a Reader that reads from a MappedByteBuffer and decodes the bytes with the correct encoding.

To create a MappedByteBuffer I am currently using:

    RandomAccessFile raFile = new RandomAccessFile("myFile.bla", "r");
    FileChannel channel = raFile.getChannel();
    MappedByteBuffer mappedByteBuffer = channel.map(MapMode.READ_ONLY, 0, channel.size());

I know I can use getChar() to get a character from the MappedByteBuffer, but how can I specify the encoding? The Javadoc says that it always reads two bytes and combines them into one char, but what about ASCII-encoded files?

I also found the Channels.newReader(...) methods, which, however, only accept a channel and not a memory-mapped buffer. Is there something similar for a MappedByteBuffer?

Just to be clear, I know that memory mapping is somewhat expensive and therefore only worthwhile for large files. I have not decided yet whether to use it, but I want to evaluate it for my specific use case.

Thanks a lot in advance and best wishes, Andreas

1 answer


You can use a CharsetDecoder, obtained from the Charset of your choice with Charset#newDecoder().

    CharBuffer charBuffer = StandardCharsets.UTF_8.newDecoder().decode(mappedByteBuffer);

This returns a CharBuffer, from which you can get the char values.
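As a rough illustration, the whole flow from mapping to decoding could look like the sketch below. The class name MappedDecodeSketch and the method decodeWholeFile are made up for this example, and it assumes the file is small enough to map in one piece (a single mapping is limited to Integer.MAX_VALUE bytes) and that the decoded text fits on the heap.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of the JDK.
final class MappedDecodeSketch {

    // Maps the whole file read-only and decodes it with the given charset.
    static String decodeWholeFile(Path path, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // decode() reads from the buffer's position to its limit and
            // throws CharacterCodingException on malformed or unmappable input.
            CharBuffer chars = charset.newDecoder().decode(mapped);
            return chars.toString();
        }
    }
}
```

For an ASCII-encoded file you would pass StandardCharsets.US_ASCII; for a custom encoding, whatever Charset.forName(...) returns for it, provided that charset is installed in the JVM.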

Please note that this consumes the full MappedByteBuffer. If you only want to decode part of it, create a new ByteBuffer that covers just those bytes of the original MappedByteBuffer and decode that.
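One way to decode only part of the buffer is to work on a duplicate, so the position and limit of the original stay untouched. This is only a sketch: SliceDecodeSketch and decodeRange are hypothetical names, and offset and length are assumed to lie within the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK.
final class SliceDecodeSketch {

    // Decodes `length` bytes starting at `offset` without moving the
    // position or limit of `source`.
    static CharBuffer decodeRange(ByteBuffer source, int offset, int length, Charset charset)
            throws CharacterCodingException {
        // duplicate() shares the content but has its own position and limit.
        ByteBuffer view = source.duplicate();
        view.limit(offset + length);
        view.position(offset);
        // slice() yields a buffer that covers exactly the selected bytes.
        return charset.newDecoder().decode(view.slice());
    }
}
```

With a multi-byte encoding such as UTF-8, the range has to start and end on character boundaries; otherwise decode() throws a MalformedInputException. For single-byte encodings such as ASCII, any range works.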
