How do I convert mmap'd memory into a vector of custom data types?
Setup
I recently implemented mmap-based file reading and immediately ran into strange behavior. Relevant code:
-- | map whole aedat file into memory and return it as a vector of events
-- TODO what are the finalizing semantics of this?
mmapAERData :: S.Storable a => FilePath -> IO (S.Vector (AER.Event a))
mmapAERData name = do
  -- mmap file into memory and find the offset behind the header
  bs <- dropHeader <$> mmapFileByteString name Nothing
  -- some conversion is necessary to get the 'ForeignPtr' from
  -- a 'ByteString'
  B.unsafeUseAsCString bs $ \ptr -> do
    fptr <- newForeignPtr_ ptr
    let count = B.length bs `div` 8 -- sizeof one event
    return $ S.unsafeFromForeignPtr0 (castForeignPtr fptr) count
Some explanation: the AEDat format is basically a long list of pairs of Word32s. One encodes the address, the other the timestamp. Before that, there are some lines of header text, which I skip with the function dropHeader. I could do this directly on the ForeignPtr if absolutely necessary, but I prefer to use a generic function that works on ByteStrings.
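The question does not show dropHeader. As an illustration only, here is a hypothetical dropHeaderSketch that assumes every header line starts with '#' and ends with '\n' (I believe this is how AEDat 2.0 headers look, but check your files), with the binary event data starting right after the last header line. Since B.drop only adjusts the offset and length, the result still shares the original (mmap'd) buffer. One caveat of this sketch: if the first event's first byte happens to be 0x23 ('#'), it would be mistaken for a header line.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical stand-in for the question's dropHeader.
-- Assumes: every header line starts with '#' and ends with '\n',
-- and the binary event data begins right after the last header line.
dropHeaderSketch :: B.ByteString -> B.ByteString
dropHeaderSketch bs
  | BC.pack "#" `B.isPrefixOf` bs =
      -- skip up to and including the next '\n', then inspect the next line
      dropHeaderSketch (B.drop 1 (BC.dropWhile (/= '\n') bs))
  | otherwise = bs
```

In the question's function it would take the place of dropHeader, e.g. dropHeaderSketch <$> mmapFileByteString name Nothing.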
The Storable instances can be found here and here. I'm not sure about the alignment, but I suspect an alignment of 8 should be correct.
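For comparison, here is a minimal Storable instance for a record of two Word32s. RawEvent is a hypothetical type standing in for the question's AER.Event, whose definition is not shown, and the words are read in host byte order (if the file's byte order differs, byteSwap32 from Data.Word can fix each word). On the alignment question: a C struct of two uint32_t fields has alignment 4, so the alignment of Word32 is the natural choice; 8 is stricter and should be harmless. As far as I know, Storable's alignment is only consulted when allocating memory (alloca, mallocForeignPtr and friends), so it does not affect reading events out of an already mapped buffer.

```haskell
import Data.Word (Word32)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical raw event: two consecutive Word32s (address, then timestamp),
-- read in host byte order.
data RawEvent = RawEvent
  { rawAddress   :: !Word32
  , rawTimestamp :: !Word32
  } deriving (Eq, Show)

instance Storable RawEvent where
  sizeOf _    = 8                                -- two 4-byte words, no padding
  alignment _ = alignment (undefined :: Word32)  -- 4, like the equivalent C struct
  peek p = do
    let q = castPtr p :: Ptr Word32
    RawEvent <$> peekElemOff q 0 <*> peekElemOff q 1
  poke p (RawEvent a t) = do
    let q = castPtr p :: Ptr Word32
    pokeElemOff q 0 a
    pokeElemOff q 1 t
```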
Problem
Reading the data works well enough, but after a while the memory seems to be corrupted somehow:
>>> es <- DVS.mmapDVSData "dataset.aedat"
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 6, posY = 50}, timestamp = 74.771407s}
>>> :type es
es :: S.Vector (DVS.Event DVS.Address)
>>> _ <- evaluate (V.convert es :: V.Vector (DVS.Event DVS.Address))
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 0, posY = 44}, timestamp = 0s}
Apparently, accessing all the elements of es somehow corrupts my memory. Or does the garbage collector reclaim it? It's weird either way. What can I do about it?
mmapFileByteString performs the mmap, which creates a ForeignPtr, and wraps that ForeignPtr in a ByteString. unsafeUseAsCString unwraps the ForeignPtr into a plain Ptr, from which you then create a new ForeignPtr with newForeignPtr_. You then take that second ForeignPtr and use it with S.unsafeFromForeignPtr0 to create the vector.
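Having two ForeignPtrs pointing to the same memory is not okay. The GHC runtime treats them as two separate objects and knows nothing about the relationship between them. Once all references to the ByteString are gone, the finalizer of its ForeignPtr runs, which unmaps the file and releases the underlying memory. The second ForeignPtr has no finalizer of its own and does not keep the first one alive, so it is left pointing at an invalid region. Converting the whole vector allocates a lot and therefore triggers a garbage collection, which is presumably why the corruption shows up right after V.convert.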
The solution is to use Data.ByteString.Internal.toForeignPtr to extract and reuse the ForeignPtr from the ByteString itself. Replace the unsafeUseAsCString block with the following:
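-- requires: import qualified Data.ByteString.Internal
--           import GHC.ForeignPtr (plusForeignPtr)
let (fptr, offset, len) = Data.ByteString.Internal.toForeignPtr bs
    -- dropHeader may leave the data at a nonzero byte offset into the buffer;
    -- plusForeignPtr skips it while sharing the original finalizer (no second owner)
    count = len `div` 8 -- sizeof one event
return $ S.unsafeFromForeignPtr0 (fptr `plusForeignPtr` offset) count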
IMHO, the real solution is not to mess with any of this at all. Just read the file into a ByteString the normal way, pull the 8-byte substrings out of it, and convert them to Events manually. All this mmap and ForeignPtr business is dangerous, and not much faster than doing things safely and correctly. If you want absolute maximum performance without regard for safety, write a C program.
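A sketch of that safe approach, under two assumptions: the event words are stored big-endian (I believe this holds for AEDat 2.0 files, but check against your data), and decoding stops at plain (address, timestamp) pairs because the question's Event type is not shown. word32BE and decodeEvents are hypothetical helper names.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- Decode a Word32 from the first 4 bytes of a ByteString.
-- Assumption: big-endian byte order; reverse the index list for little-endian.
word32BE :: B.ByteString -> Word32
word32BE b = foldl step 0 [0 .. 3]
  where
    step acc i = (acc `shiftL` 8) .|. fromIntegral (B.index b i)

-- Split the event data into 8-byte records and decode each one into an
-- (address, timestamp) pair. Trailing bytes that do not form a full record
-- are ignored.
decodeEvents :: B.ByteString -> [(Word32, Word32)]
decodeEvents bs
  | B.length bs < 8 = []
  | otherwise =
      let (record, rest) = B.splitAt 8 bs
      in (word32BE record, word32BE (B.drop 4 record)) : decodeEvents rest
```

For example, decodeEvents . dropHeader <$> B.readFile name yields the pairs, which can then be mapped to the real Event constructor and, if a Storable vector is still wanted, packed with S.fromList. The memory is ordinary GC-managed heap, so there is no finalizer to outlive.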