An efficient way to generate a lot of text in Haskell

The full source code and profiling report is here: https://gist.github.com/anonymous/92334d00859c3db0ba8a

I am trying to create a large text file as a PPM image text file and I am facing some performance issues. I could have used a different image format, but I was interested in the text generation performance as I have other similar situations where I do not have the flexibility to choose an alternative format.

I started by concatenating String

together to generate my text file and quickly found that it took almost 90% of the execution time. So, I switched to Data.Text

, but found that the performance was not significantly improved. I ended up creating a test file to try and isolate the problem by comparing two functions:

ppmS (Matrix mx) = unlines ["P3", show w', show w', "255", pixels]
  where
    w'      = mxSize mx * scale
    pixels  = unlines $ map row [0..w'-1]
    row j   = intercalate " " $ map (pix j) [0..w'-1]
    pix j i = color (mx ! (div i scale, div j scale))

ppmT (Matrix mx) = T.unlines ["P3", T.pack (show w'), T.pack (show w'), "255", pixels]
  where
    w'      = mxSize mx * scale
    pixels  = T.unlines $ map row [0..w'-1]
    row j   = T.intercalate " " $ map (pix j) [0..w'-1]
    pix j i = color (mx ! (div i scale, div j scale))

      

Run through the profiler with the following commands:

ghc -O2 --make -prof -auto-all -caf-all -fforce-recomp test.hs
./test +RTS -p

      

I see the following:

total time  =        0.60 secs   (597 ticks @ 1000 us, 1 processor)
total alloc = 1,162,898,488 bytes  (excludes profiling overheads)

                                                     individual     inherited
COST CENTRE              MODULE  no.     entries  %time %alloc   %time %alloc

MAIN                     MAIN     96           0    0.0    0.0   100.0  100.0
 main                    Main    193           0   24.1   14.7    24.1   14.7
 CAF:main3               Main    188           0    0.0    0.0    39.9   37.0
  main                   Main    225           0    0.0    0.0    39.9   37.0
   main.ppmFromText      Main    226           0    0.0    0.0    39.9   37.0
    ppmT                 Main    227           0    8.5    9.3    39.9   37.0
     ppmT.row            Main    252           0    0.0    0.0     0.0    0.0
     ppmT.pixels         Main    250           1    8.7    9.3    31.3   27.7
      ppmT.row           Main    251         500   20.6   18.4    22.6   18.4
       ppmT.pix          Main    253      250000    1.8    0.0     2.0    0.0
        color            Main    254      250000    0.2    0.0     0.2    0.0
 CAF:main6               Main    171           0    0.0    0.0    35.8   48.3
  main                   Main    198           0    0.0    0.0    35.8   48.3
   main.ppmFromString    Main    199           0    0.0    0.0    35.8   48.3
    ppmS                 Main    200           0    9.4   14.4    35.8   48.3
     ppmS.pixels         Main    216           1    8.5   14.5    26.5   33.9
      ppmS.row           Main    217         500   13.9   19.4    17.9   19.4
       ppmS.pix          Main    218      250000    3.5    0.0     4.0    0.0
        color            Main    219      250000    0.5    0.0     0.5    0.0

      

which tells me both versions Text

both String

take significant time and allocate significant memory.

What is the best way to generate this text that is more time efficient and memory efficient?

+3


source to share


1 answer


Update: As it turns out, just using ByteStrings instead of Text with:

import qualified Data.ByteString.Char8 as T
import qualified Data.ByteString as TI

      

achieves the same or even better performance than the Blaze approach I originally tried. See the updated statistics table at the end.

Original answer:

You can achieve better results with the flame constructor monoid.

Here's your algorithm, adapted for use Blaze.ByteString.Builder

:



import Blaze.ByteString.Builder
import Blaze.ByteString.Builder.Char8
import Data.Monoid
import Data.List (intersperse)

munlines = mconcat . map ( <> (fromChar '\n') )
mintercalate s xs = mconcat $ intersperse s xs

ppmB (Matrix mx) = munlines [ fromString "P3",
                              fromString (show w'),
                              fromString (show w'),
                              fromString "255",
                              pixels ]
  where
    w'      = mxSize mx * scale
    pixels  = munlines $ map row [0..w'-1]
    row j   = mintercalate (fromString " ") $ map (pix j) [0..w'-1]
    pix j i = fromString $ color (mx ! (div i scale, div j scale))

main = do
  let m = makeMatrix
  let ppmFromString = toLazyByteString $ ppmB m
  LBS.writeFile "output.ppm" ppmFromString

      

Full source is available here .

On my machine, I get the following RTS stats for four versions:

             Allocated   Time      %GC
  string     561 MB      0.40 s    56.6 %
  text       601 MB      0.25 s     6.9 %
  blaze       95 MB      0.07 s     3.0 %
  bytestring  91 MB      0.06 s    10.1 %

      

Another option is to use a monad Put

from a binary package.

+3


source







All Articles