Performance of IORef

I’ve tried to see how slower can ghc be compared to gcc in compiling a simple
imperative program. I’ve written this small test program in C:

/* loop.c */
#include 

int
main(int argc, char **argv)
{
    int i, j, k, t = 0;
    for(i = 0; i < 1000; i++)
        for(j = 0; j < 1000; j++)
            for(k = 0; k < 1000; k++)
                if(i * i + j * j + k * k % 7 == 0)
                    t++;
    printf("%d\n", t);
    return 0;
}

And a similar version in imperative Haskell, being as similar as possible to
the original C code:

-- loop.hs
import Control.Applicative
import Control.Monad
import Data.IORef

main :: IO ()
main
  = do
    i <- int
    j <- int
    k <- int
    t <- int
    for (i =: 0) (i <: 1000) (inc i)
      $ for (j =: 0) (j <: 1000) (inc j)
        $ for (k =: 0) (k <: 1000) (inc k)
          $ do
            i_ <- readIORef i
            j_ <- readIORef j
            k_  a -> IO ()
(=:) = writeIORef

( IORef a -> a -> IO Bool
i <: v = (< v)  readIORef i

inc :: Enum a => IORef a -> IO ()
inc i = modifyIORef i succ

for :: IO () -> IO Bool -> IO () -> IO () -> IO ()
for preface condition step code
  = preface >> for_ condition step code

for_ :: IO Bool -> IO () -> IO () -> IO ()
for_ condition step code
  = condition >>= flip when (code >> step >> for_ condition step code)

A functional version:

-- functional.hs
main :: IO ()
main
  = print
    $ length
      [i
        | i <- [0 :: Int .. 999]
          , j <- [0 .. 999]
          , k <- [0 .. 999]
          , i * i + j * j + k * k `mod` 7 == 0]

And a version using Template Haskell.

{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH.Syntax

main :: IO ()
main
  = print
    $ $(lift $ length
      [i
        | i <- [0 :: Int .. 999]
          , j <- [0 .. 999]
          , k <- [0 .. 999]
          , i * i + j * j + k * k `mod` 7 == 0])

The results:

$ time gcc -O3 -o loop -Wall loop.c

real    0m0.256s
user    0m0.148s
sys     0m0.040s
$ time ./loop
143

real    0m13.652s
user    0m11.629s
sys     0m0.052s
$ time ghc --make -O3 loop.hs

real    0m0.209s
user    0m0.160s
sys     0m0.020s
$ time ./loop
143

real    3m46.834s
user    3m24.313s
sys     0m0.712s
$ time ghc --make -O3 functional.hs
[1 of 1] Compiling Main             ( functional.hs, functional.o )
Linking functional ...

real    0m1.420s
user    0m1.016s
sys     0m0.220s
$ time ./functional
143

real    1m29.057s
user    1m25.809s
sys     0m0.224s
$ time ghc --make -O3 th.hs
[1 of 1] Compiling Main             ( th.hs, th.o )
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package containers-0.3.0.0 ... linking ... done.
Loading package pretty-1.0.1.1 ... linking ... done.
Loading package template-haskell ... linking ... done.

real    90m54.226s
user    87m32.024s
sys     0m14.785s
marcot@zezinho:~/codigo/haskell$ time ./th
143

real    0m0.005s
user    0m0.000s
sys     0m0.004s

The time to build the code with TH is not the best, but the execution time
surely is. Maybe we could use this technic to improve the execution time of all
of the codes, interpreting a part of the code in build time automaticly,
without user indication.

The results in a table:

Version Build time (s) Execution time (s)
C 0.256 13.652
Imperative Haskell (IORefs) 0.209 136.834
Functional Haskell 1.420 89.057
Template Haskell 5454.226 0.005
About these ads

7 Respostas to “Performance of IORef”

  1. Anônimo Says:

    What’s the size of the template binary?

  2. Anonymous Says:

    Your first Haskell code sample seems a bit mangled.

  3. marcotmarcot Says:

    Yes, they’re all messed up, sorry by that. A link to them:

    http://people.debian.org/~marcot/blog/loop.c

    http://people.debian.org/~marcot/blog/loop.hs

    http://people.debian.org/~marcot/blog/functional.hs

    http://people.debian.org/~marcot/blog/th.hs

    The sizes of the binary are: 6.6K (C), 675K (IORef), 676K (Functional) and 689K
    (TH).

  4. Dale King Says:

    given the run time for template haskell it looks like the compiler unrolled the loops. I doubt you would get the same results with dynamic data instead of static.

  5. Anonymous Says:

    Can you say what compiler versions and hardware you’re using?

  6. marcotmarcot Says:

    Dale, you’re right, this can only be used with static data.

    $ gcc –version
    gcc (Debian 4.4.3-3) 4.4.3
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $ ghc –version
    The Glorious Glasgow Haskell Compilation System, version 6.12.1

    $ cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz
    stepping : 13
    cpu MHz : 1994.810
    cache size : 2048 KB
    physical id : 0
    siblings : 2
    core id : 0
    cpu cores : 2
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm
    bogomips : 3989.62
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

    processor : 1
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz
    stepping : 13
    cpu MHz : 1994.810
    cache size : 2048 KB
    physical id : 0
    siblings : 2
    core id : 1
    cpu cores : 2
    apicid : 1
    initial apicid : 1
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm
    bogomips : 3990.09
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

    $ free -m
    total used free shared buffers cached
    Mem: 3014 1566 1447 0 102 786
    -/+ buffers/cache: 677 2336
    Swap: 2999 0 2999
    $ uname -a
    Linux zezinho 2.6.32-2-amd64 #1 SMP Fri Feb 12 00:01:47 UTC 2010 x86_64 GNU/Linux

    I hope this answers the question.

  7. Another interesting code using Template Haskell « Blog do Marcot Says:

    […] interesting code using Template Haskell While writing the post about performance, I got interested in Template Haskell, specially in the $(lift) construction.  Investigating a […]

Deixe um comentário

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s


Seguir

Obtenha todo post novo entregue na sua caixa de entrada.