As a part of my Google Summer of Code project, I should take a look at criterion, a benchmark library for Haskell. As I’d have to use it, and it seems to be a useful library for a lot of people, I packaged it to Debian, with all its dependencies. The haskell-criterion package is waiting on the NEW queue.
Ok, so now I had to test it with something. I came up with a test related to my reading about the Haskell execution model. I noticed that I never have played with multi-threaded Haskell code to achieve better performance, but only to make things that should be in parallel work. So I thought this was a good oportunity to try both. I wrote this code:
import Control.Applicative ((<$>), (<*>))
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Criterion.Main (Benchmark, bench, bgroup, defaultMain, whnf, whnfIO)
import GHC.Conc (forkIO, par)
import System.Environment (getArgs)
value :: Int
value = 32
threadsL :: [Int]
threadsL = [1, 2, 3, 4, 8, 16, 32, 64]
main :: IO ()
main
= defaultMain
[bgroup "fib" $ [bench "fib" $ whnf fib value]
, bgroup "pfib" $ map pbench threadsL
, bgroup "ifib" $ map ibench threadsL]
pbench :: Int -> Benchmark
pbench i = bench (show i) $ whnf (pfib i) value
ibench :: Int -> Benchmark
ibench i = bench (show i) $ whnfIO $ ifib i value
fib :: Int -> Int
fib n
| n < 2 = n
| otherwise = fib (n – 1) + fib (n – 2)
pfib :: Int -> Int -> Int
pfib threads n
| n < 2 = n
| threads < 2 = fib n
| otherwise = par n1 $ seq n2 $ n1 + n2
where
n1 = pfib (threads – 1) $ n – 1
n2 = fib $ n – 2
ifib :: Int -> Int -> IO Int
ifib threads n
| n < 2 = evaluate n
| threads < 2 = evaluate $ fib n
| otherwise
= do
f1 <- newEmptyMVar
f2 <- newEmptyMVar
forkIO $ ifib (threads – 1) (n – 1) >>= putMVar f1
(+) <$> evaluate (fib $ n – 2) <*> takeMVar f1
My point is to benchmark the naïve implementation of the Fibonacci function. I’ve done three implementations:
- fib
- which is the simple sequential algorithm
- pfib
- which call par threads times
- ifib
- which call forkIO trheads times
I have chosen the value to be found to 32, because it took long enought to make the algorithms show a difference, and short enough to allow me to wait for the benchmarks. I built with:
ghc --make -threaded fib
And runned with:
./fib --summary=fib.csv +RTS -N
The results were surprising for me:
Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB
"fib/fib",1.7197746428132055,1.5104790505051613,2.031735854971409,1.2921598378638277,0.9514102096387,1.7188536150257019
"pfib/1",1.246479478704929,1.1805571945786482,1.3496261318802836,0.41570319371618925,0.29446316107198395,0.5823241217769125
"pfib/2",0.709402781355381,0.6850755818963056,0.7553855475068096,0.1650893842889147,0.10136189180677138,0.26853312094492815
"pfib/3",0.6876571162819864,0.6694172366738321,0.7332831247925761,0.136836636420341,5.5142145935712775e-2,0.2581805596992922
"pfib/4",0.6940202649712561,0.6684220250725745,0.746035416948795,0.17779646229098522,9.15862482237111e-2,0.2804203635281818
"pfib/8",0.5887631162285805,0.5625207790017128,0.637939236986637,0.17967127569396815,0.10516397607491562,0.2680488710546616
"pfib/16",0.6578513725876809,0.6107277926087382,0.7372924216866497,0.3070809683902726,0.20679305873839168,0.44571891526087154
"pfib/32",0.8900135167717932,0.8123333438515664,0.9942556460976597,0.45942103747727,0.36818132854060703,0.6233813540278427
"pfib/64",0.6894888337731359,0.6386897596001626,0.7766118057847021,0.33329115700425754,0.22490082647959764,0.5323965317477072
"ifib/1",1.3290104445099833,1.2501559790253634,1.4700443919777875,0.5277373265779313,0.35208278898795436,0.9179652474421286
"ifib/2",0.7626163347840308,0.7239342554688452,0.8381309326767921,0.26785654248385327,0.1679981089775395,0.44853454282169086
"ifib/3",0.7440206035256386,0.6907905062317847,0.8396386941552162,0.3549675945489581,0.22865300516283724,0.536883746220313
"ifib/4",0.6503480085015301,0.6130640967965126,0.711405030119419,0.239522235814317,0.1712396398224678,0.3616636557596486
"ifib/8",0.6300301631569865,0.6085795434594155,0.699028070795536,0.17776228649867923,6.103918021418795e-2,0.39508078831864085
"ifib/16",0.6735970553040505,0.6393461784005164,0.7405966480851172,0.2366561170051713,0.13725890735483465,0.38273830961617655
"ifib/32",0.6162510784745214,0.5983195599198344,0.6736035832047466,0.14678572822644564,5.335959472427259e-2,0.3231734735693349
"ifib/64",0.7217859562516212,0.6820346101403237,0.7857911285042766,0.2534368199920431,0.18073576275895306,0.37636526159654315
I’m only analyzing the first column.
How come pfib/1 and ifib/1, that is, pfib and ifib called with just one thread, got faster than fib/fib? All they do is calling fib! There’s certainly something wrong here. So I reran the tests in a single user environment (recovery-mode on GRUB). And also learned that I should not trust blindly these results in my usual user environment. The new tests were also surprising:
Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB
"fib/fib",0.4979443445852824,0.4978961411169597,0.49802434593949996,3.1186229071682967e-4,2.1763919616793833e-4,4.8569259617552764e-4
"pfib/1",0.4980979338339395,0.4980578795126507,0.49816361099992496,2.58930038115918e-4,1.7011790805380493e-4,4.0161943492611696e-4
"pfib/2",0.30187882572923386,0.3018276229551859,0.3019332328489849,2.7088498930449747e-4,2.3482715228451473e-4,3.21989842698397e-4
"pfib/3",0.3025411692312786,0.3024852743795938,0.3026018276861736,2.975260476849551e-4,2.6907713057828055e-4,3.3200184242397705e-4
"pfib/4",0.3045588889769145,0.3024707928350993,0.31463312775407504,2.0236423610294995e-2,3.5108499279216634e-4,4.8255009076253054e-2
"pfib/8",0.27702674538408,0.27158569247041425,0.2846977105787821,3.2790225595837955e-2,2.5215205278321072e-2,4.6554023411818295e-2
"pfib/16",0.26909642369065967,0.2650135436705178,0.2758063927343913,2.615724515115467e-2,1.8039283932648264e-2,4.0956608339390285e-2
"pfib/32",0.2726816359213419,0.26952718407426557,0.27636309534822184,1.7444576031246224e-2,1.4227057669027803e-2,2.2935079362453085e-2
"pfib/64",0.28010295302186694,0.27646669537339885,0.28658743054185587,2.4301648337333875e-2,1.3360828345387364e-2,3.9546300072208655e-2
"ifib/1",0.4984497705153056,0.4984218493155071,0.4984962406805583,1.8040276986757106e-4,1.242720342516858e-4,2.870577387900989e-4
"ifib/2",0.3603615990332194,0.34868544012818997,0.3741170039824078,6.496311767153333e-2,5.652406802832432e-2,7.448312251646773e-2
"ifib/3",0.31676718384538366,0.3109451881102153,0.32481522709642136,3.4887028192202074e-2,2.744363202200484e-2,4.8060797483303504e-2
"ifib/4",0.31541500956330976,0.30635046393190096,0.32643843323503213,5.109215651574664e-2,4.339074296796736e-2,5.995395740457435e-2
"ifib/8",0.32055945069108693,0.3132960501364299,0.3302640953711101,4.2886292109838166e-2,3.422708481199234e-2,5.4196317284934535e-2
"ifib/16",0.32268204123292654,0.31607536703859057,0.33141186625276287,3.858016286486453e-2,3.0959817731033304e-2,4.801572113400926e-2
"ifib/32",0.32020341069017116,0.3151469627073833,0.32664997250352584,2.92266723352992e-2,2.3352577028246314e-2,3.5913110786596904e-2
"ifib/64",0.3183670273474284,0.3136345234564373,0.3248856058767864,2.834147907711714e-2,2.1803978149062524e-2,3.8088053144371074e-2
I didn’t thought the improvement in performance on a single-user would be so dramatic. The execution of the program in a single-user environment used 41% of the time to run in my usual environment. About the comparative results, they were more close to what I’d expect than the old ones. fib/fib, pfib/1 and ifib/1 had a very similar execution time. What surprised me was that I was expecting to have the best result in the benchmarks with 2 threads, since I ran the tests in a Core 2 Duo T5750 @ 2.00 GHz with 2 cores. But in pfib the execution time has reduced from 4 threads to 8, and in ifib, from 2 to 3. I can’t think of where these improvements come from.
Unfortunately the creation of charts is currently broken in GHC 6.12, so no graphics by now.