How do I convert IO memory to custom data types?

Setup

I recently implemented reading files based on mmap and immediately ran into strange behavior. Relevant code:

-- | map whole aedat file into memory and return it as a vector of events
-- TODO what are the finalizing semantics of this?
mmapAERData :: S.Storable a => FilePath -> IO (S.Vector (AER.Event a))
mmapAERData name = do
    -- mmap file into memory and find the offset behind the header
    bs <- dropHeader <$> mmapFileByteString name Nothing
    -- some conversion is necessary to get the 'ForeignPtr' from
    -- a 'ByteString'
    B.unsafeUseAsCString bs $ \ptr -> do
      fptr <- newForeignPtr_ ptr
      let count = B.length bs `div` 8 -- sizeof one event
      return $ S.unsafeFromForeignPtr0 (castForeignPtr fptr) count

      


Some explanation: the AEDat format is basically a long list of pairs of Word32s. One encodes the address, the other the timestamp. Before that, there are some lines of header text that I strip off with the function dropHeader. I could do this directly on the ForeignPtr if absolutely necessary, but I prefer to use a generic function that works on ByteStrings.

The Storable instances can be found here and here. I'm not sure about the alignment, but I suspect an alignment of 8 should be correct.
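For readers without access to the linked instances, here is a minimal sketch of what a Storable instance for such a record could look like. RawEvent and roundTrip are hypothetical names, not the question's actual types. The field order (address first, then timestamp) is an assumption, and the values are read in host byte order. The sketch uses the alignment of Word32, which is what a record of two Word32 fields needs; 8 would be a stricter requirement.

```haskell
import Data.Word (Word32)
import Foreign.Marshal.Utils (with)
import Foreign.Storable (Storable (..))

-- Hypothetical stand-in for the event type: two 32-bit words.
-- Assumption: the address comes first, the timestamp second.
data RawEvent = RawEvent
  { rawAddress   :: !Word32
  , rawTimestamp :: !Word32
  } deriving (Eq, Show)

instance Storable RawEvent where
  -- Two Word32 fields, no padding.
  sizeOf _ = 8
  -- A record of two Word32s needs no more alignment than a Word32.
  alignment _ = alignment (undefined :: Word32)
  -- Fields are read and written in host byte order.
  peek p = RawEvent <$> peekByteOff p 0 <*> peekByteOff p 4
  poke p (RawEvent a t) = pokeByteOff p 0 a >> pokeByteOff p 4 t

-- Write an event to temporary memory and read it back.
roundTrip :: RawEvent -> IO RawEvent
roundTrip e = with e peek
```

Note that a buffer obtained by dropping a text header from a mapped file starts at an arbitrary byte offset, so neither alignment is guaranteed to hold for it.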

Problem

Reading the data works well enough, but after a while the memory seems to be corrupted somehow:

>>> es <- DVS.mmapDVSData "dataset.aedat" 
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 6, posY = 50}, timestamp = 74.771407s}
>>> :type es
es :: S.Vector (DVS.Event DVS.Address)
>>> _ <- evaluate (V.convert es :: V.Vector (DVS.Event DVS.Address))
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 0, posY = 44}, timestamp = 0s}

      

Obviously, accessing all the elements of es is somehow corrupting my memory. Or does the garbage collector recycle it? It's weird anyway. What can I do about it?



1 answer


mmapFileByteString performs the mmap, which creates a ForeignPtr, and wraps that ForeignPtr in a ByteString. unsafeUseAsCString coerces the ForeignPtr to a Ptr, from which you then create a new ForeignPtr. You then take that second ForeignPtr and use it with S.unsafeFromForeignPtr0 to create a vector.

Having two ForeignPtrs pointing to the same memory is not safe. The GHC runtime treats them as two separate entities, and a ForeignPtr made with newForeignPtr_ has no finalizer and does nothing to keep the original alive. Once all references to the ByteString have disappeared, the finalizer for its ForeignPtr will be called, freeing the mmap and reclaiming the underlying memory. This causes the second ForeignPtr to point to an invalid area.
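The aliasing problem can be reproduced in isolation. The following is only a sketch: aliasDemo is a hypothetical name, malloc'd memory stands in for the mapped file, and finalizeForeignPtr is called explicitly to stand in for the garbage collector dropping the ByteString, so that the outcome is deterministic.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (finalizeForeignPtr, newForeignPtr_, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

-- Returns (was the memory released?, does the alias still hold the old address?).
aliasDemo :: IO (Bool, Bool)
aliasDemo = do
  released <- newIORef False
  raw <- mallocBytes 8 :: IO (Ptr Word8)
  -- The "owner" plays the role of the ForeignPtr inside the ByteString:
  -- its finalizer releases the memory.
  owner <- newForeignPtr raw (free raw >> writeIORef released True)
  -- The alias is built from the bare Ptr, as in the question.
  -- It has no finalizer and no link to the owner.
  alias <- newForeignPtr_ raw
  -- Stand-in for the GC collecting the owner once it is unreachable.
  finalizeForeignPtr owner
  -- The alias still carries the old address, but the memory behind it is gone.
  -- We only compare addresses here and never dereference the dangling pointer.
  sameAddr <- withForeignPtr alias (\p -> return (p == raw))
  wasReleased <- readIORef released
  return (wasReleased, sameAddr)
```

Holding on to alias does not delay the owner's finalizer in any way, which is what happens to the vector in the question.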

The solution is to use Data.ByteString.Internal.toForeignPtr to extract and reuse the ForeignPtr from the ByteString. Replace the block starting with unsafeUseAsCString with the following:



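let (fptr, offset, len) = Data.ByteString.Internal.toForeignPtr bs
    -- offset is in bytes and can be nonzero once dropHeader has sliced the
    -- ByteString, so advance the pointer (plusForeignPtr, base >= 4.10) before casting
    count = len `div` 8 -- sizeof one event
return $ S.unsafeFromForeignPtr0 (castForeignPtr (fptr `plusForeignPtr` offset)) count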

      

IMHO, the real solution here is not to mess with any of this at all. Just read the file into a ByteString normally, pull the 8-byte substrings out of it, and manually convert them to Events. All of this mmap and ForeignPtr business is dangerous, and not much faster than doing things safely and correctly. If you want absolute maximum performance without regard for safety, write a C program.
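A minimal sketch of that safe approach, under some loud assumptions: RawEvent, word32BE, parseEvents and readEvents are hypothetical names, the words are assumed to be stored big-endian with the address first, and the header-stripping function is passed in as an argument because its definition is not shown in the question.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- Hypothetical stand-in for the event type: two 32-bit words.
data RawEvent = RawEvent
  { rawAddress   :: !Word32
  , rawTimestamp :: !Word32
  } deriving (Eq, Show)

-- Decode a big-endian Word32 from a 4-byte ByteString.
-- Assumption: the file stores its words big-endian.
word32BE :: B.ByteString -> Word32
word32BE = B.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Split the payload into 8-byte chunks and convert each one by hand.
-- Trailing bytes that do not fill a whole event are ignored.
parseEvents :: B.ByteString -> [RawEvent]
parseEvents bs
  | B.length bs < 8 = []
  | otherwise =
      let (chunk, rest) = B.splitAt 8 bs
          (a, t)        = B.splitAt 4 chunk
      in RawEvent (word32BE a) (word32BE t) : parseEvents rest

-- Read the whole file strictly, strip the header, and parse the rest.
readEvents :: (B.ByteString -> B.ByteString) -> FilePath -> IO [RawEvent]
readEvents dropHeader path = parseEvents . dropHeader <$> B.readFile path
```

Nothing here touches a raw pointer, so the garbage collector cannot free the data while it is still in use. The resulting list can be turned into a vector with fromList if indexed access is needed.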
