How can I lazily process an XML document using hexpat?
My processing function formatted the document before handling the parse error, following a comment in my code:

    -- Process document before handling error, so we get lazy processing.

For testing purposes, I redirected the output to /dev/null and dumped a 300 MB file into it. Memory consumption kept rising until I had to kill the process.
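The error-handling version of the code did not survive in this copy of the question; judging from the comment above and the error-free version shown below, it presumably looked roughly like this (a reconstruction, not the original):

```haskell
import qualified Data.ByteString.Lazy as L
import System.IO
import Text.XML.Expat.Format (format)
import Text.XML.Expat.Tree

process :: String -> IO ()
process filename = do
    inputText <- L.readFile filename
    -- hexpat's lazy parse returns the tree and a possible error together.
    let (xml, mErr) = parse defaultParseOptions inputText
                        :: (UNode String, Maybe XMLParseError)
    -- Process document before handling error, so we get lazy processing.
    hFile <- openFile "/dev/null" WriteMode
    L.hPutStr hFile (format xml)
    hClose hFile
    -- Handle error, if any.
    case mErr of
        Nothing  -> return ()
        Just err -> hPutStrLn stderr ("XML parse error: " ++ show err)
```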
Then I removed the error handling from the process function:
```haskell
import qualified Data.ByteString.Lazy as L
import System.IO
import Text.XML.Expat.Format (format)
import Text.XML.Expat.Tree

process :: String -> IO ()
process filename = do
    inputText <- L.readFile filename
    let (xml, mErr) = parse defaultParseOptions inputText
                        :: (UNode String, Maybe XMLParseError)
    hFile <- openFile "/dev/null" WriteMode
    L.hPutStr hFile (format xml)  -- mErr is now simply ignored
    hClose hFile
    return ()
```
As a result, the function now runs in constant memory. Why does the error handling cause such massive memory consumption?
As I understand it, xml and mErr are two separate unevaluated thunks after the call to parse. Does forcing mErr also build the entire node tree? If so, is there a way to handle the error while still running in constant memory?
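The intuition can be demonstrated without hexpat. In the toy parser below (purely illustrative; toyParse is an invented name), the result can be consumed lazily, but the error component can only be decided after inspecting the entire input, so forcing it first traverses everything before any output is produced:

```haskell
-- Toy stand-in for hexpat's parse: a lazily built result list paired
-- with an error summary that depends on the *entire* input.
toyParse :: [Int] -> ([Int], Maybe String)
toyParse xs =
  ( map (* 2) xs  -- lazy: can be consumed element by element
  , if any (< 0) xs  -- forcing this traverses the whole input
      then Just "negative input"
      else Nothing
  )
-- Consuming the list first lets the runtime stream it and garbage-collect
-- as it goes; pattern matching on the Maybe first forces a traversal of
-- the whole input, and keeps the list alive if it is still needed later.
```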
I can't speak with authority on hexpat specifically, but in general this kind of error handling will force the entire file into memory: if you only want to produce output when the input contains no errors, you must read the whole input before emitting anything.
As I said, I don't really know hexpat, but with xml-conduit you could do something like:

```haskell
try $ runResourceT $
    parseFile def inputFile $$ renderBytes def =$ sinkFile outputFile
```
This runs in constant memory and throws an exception on any processing error (which the try will catch). The downside is that the output file may be left corrupted. It's probably best to write to a temporary file and, once the whole process has completed successfully, rename the temporary file to the real output file. On error, just delete the temporary file.
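The temporary-file pattern can be sketched independently of any XML library (safeProcess and its arguments are invented names for this sketch; it assumes the writer action creates the temporary file before it can fail):

```haskell
import Control.Exception (SomeException, try)
import System.Directory (removeFile, renameFile)

-- Run a writer action against a temporary path and only move the file
-- into place if the action completed without throwing an exception.
safeProcess :: (FilePath -> IO ()) -> FilePath -> IO ()
safeProcess writer outputFile = do
    let tmp = outputFile ++ ".tmp"
    result <- try (writer tmp) :: IO (Either SomeException ())
    case result of
        Right () -> renameFile tmp outputFile  -- success: publish the output
        Left _   -> removeFile tmp             -- failure: discard partial output
```

With the xml-conduit pipeline above, the writer would be something like \path -> runResourceT $ parseFile def inputFile $$ renderBytes def =$ sinkFile path.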