An efficient way to generate a lot of text in Haskell
The full source code and profiling report is here: https://gist.github.com/anonymous/92334d00859c3db0ba8a
I am trying to create a large text file as a PPM image text file and I am facing some performance issues. I could have used a different image format, but I was interested in the text generation performance as I have other similar situations where I do not have the flexibility to choose an alternative format.
I started by concatenating String
together to generate my text file and quickly found that it took almost 90% of the execution time. So, I switched to Data.Text
, but found that the performance was not significantly improved. I ended up creating a test file to try and isolate the problem by comparing two functions:
ppmS (Matrix mx) = unlines ["P3", show w', show w', "255", pixels]
where
w' = mxSize mx * scale
pixels = unlines $ map row [0..w'-1]
row j = intercalate " " $ map (pix j) [0..w'-1]
pix j i = color (mx ! (div i scale, div j scale))
ppmT (Matrix mx) = T.unlines ["P3", T.pack (show w'), T.pack (show w'), "255", pixels]
where
w' = mxSize mx * scale
pixels = T.unlines $ map row [0..w'-1]
row j = T.intercalate " " $ map (pix j) [0..w'-1]
pix j i = color (mx ! (div i scale, div j scale))
Run through the profiler with the following commands:
ghc -O2 --make -prof -auto-all -caf-all -fforce-recomp test.hs
./test +RTS -p
I see the following:
total time = 0.60 secs (597 ticks @ 1000 us, 1 processor)
total alloc = 1,162,898,488 bytes (excludes profiling overheads)
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 96 0 0.0 0.0 100.0 100.0
main Main 193 0 24.1 14.7 24.1 14.7
CAF:main3 Main 188 0 0.0 0.0 39.9 37.0
main Main 225 0 0.0 0.0 39.9 37.0
main.ppmFromText Main 226 0 0.0 0.0 39.9 37.0
ppmT Main 227 0 8.5 9.3 39.9 37.0
ppmT.row Main 252 0 0.0 0.0 0.0 0.0
ppmT.pixels Main 250 1 8.7 9.3 31.3 27.7
ppmT.row Main 251 500 20.6 18.4 22.6 18.4
ppmT.pix Main 253 250000 1.8 0.0 2.0 0.0
color Main 254 250000 0.2 0.0 0.2 0.0
CAF:main6 Main 171 0 0.0 0.0 35.8 48.3
main Main 198 0 0.0 0.0 35.8 48.3
main.ppmFromString Main 199 0 0.0 0.0 35.8 48.3
ppmS Main 200 0 9.4 14.4 35.8 48.3
ppmS.pixels Main 216 1 8.5 14.5 26.5 33.9
ppmS.row Main 217 500 13.9 19.4 17.9 19.4
ppmS.pix Main 218 250000 3.5 0.0 4.0 0.0
color Main 219 250000 0.5 0.0 0.5 0.0
which tells me both versions Text
both String
take significant time and allocate significant memory.
What is the best way to generate this text that is more time efficient and memory efficient?
source to share
Update: As it turns out, just using ByteStrings instead of Text with:
import qualified Data.ByteString.Char8 as T
import qualified Data.ByteString as TI
achieves the same or even better performance than the Blaze approach I originally tried. See the updated statistics table at the end.
Original answer:
You can achieve better results with the flame constructor monoid.
Here's your algorithm, adapted for use Blaze.ByteString.Builder
:
import Blaze.ByteString.Builder
import Blaze.ByteString.Builder.Char8
import Data.Monoid
import Data.List (intersperse)
munlines = mconcat . map ( <> (fromChar '\n') )
mintercalate s xs = mconcat $ intersperse s xs
ppmB (Matrix mx) = munlines [ fromString "P3",
fromString (show w'),
fromString (show w'),
fromString "255",
pixels ]
where
w' = mxSize mx * scale
pixels = munlines $ map row [0..w'-1]
row j = mintercalate (fromString " ") $ map (pix j) [0..w'-1]
pix j i = fromString $ color (mx ! (div i scale, div j scale))
main = do
let m = makeMatrix
let ppmFromString = toLazyByteString $ ppmB m
LBS.writeFile "output.ppm" ppmFromString
Full source is available here .
On my machine, I get the following RTS stats for four versions:
Allocated Time %GC
string 561 MB 0.40 s 56.6 %
text 601 MB 0.25 s 6.9 %
blaze 95 MB 0.07 s 3.0 %
bytestring 91 MB 0.06 s 10.1 %
Another option is to use a monad Put
from a binary package.
source to share