Archive for maio \31\UTC 2010

Summer of Code weekly report #3

31/5/2010 (segunda-feira)

My project

This week I followed an advice of my mentor, Simon Marlow, and started to play with the code including print statements in interesting places in order to understand it’s flow. It’s indeed much better than just reading the source code and the documentation on the wiki. As usual, I had a lot of doubts about some parts of the code, and noted them down. But now, I’m seeing that I’m resolving most of them just by playing more with the code, without need to ask in the mailing list. There are still some left, and I’m planning to send them to the list in a next oportunity.

The first thing I noticed was the high number of objects, from my perspective, that are created and collected in:

main = return ()

I measured how much times evacuate() was called. This function moves an object to the next generation of the Garbage Collector. It’s called 775 times, in 1 call to GarbageCollect(), with 479 different objects (with different pointers to their Info Tables). Also, there were pinned objects and large objects in this count. This was surprising to me.

I also made a small draw of the memory allocation in lines using diagrams. The draw:

The source code:

import Prelude hiding (tan)
import Control.Monad
import Data.Colour.RGBSpace
import Data.List
import Graphics.Rendering.Diagrams
import System.Random

objectMaxLength :: Double
objectMaxLength = 50

objectMinLength :: Double
objectMinLength = 10

nObjects :: Int
nObjects = 10

main :: IO ()
main
= do
— objects <- replicateM nObjects $ randomRIO (objectMinLength, objectMaxLength)
let objects = [18,50,12,46,16,41,14,30,49,23]
renderAs PNG "mem.png" Auto
$ vsep
10
[text 40 "(1)" <> continuous objects
, text 40 "(2)" <> lineByLine objects
, text 40 "(3)" <> allocated]

continuous :: [Double] -> Diagram
continuous = memory colors

lineByLine :: [Double] -> Diagram
lineByLine = memory (intersperse white colors ++ [white]) . objectsLine

allocated :: Diagram
allocated
= memory
(intersperse white colors ++ [white])
[18, 82, 50, 0, 12, 0, 46, 0, 16, 0, 41, 35, 14, 0, 30, 0, 49, 7, 23]

objectsLine :: [Double] -> [Double]
objectsLine [] = []
objectsLine (x : xs) = x : objectMaxLength – x : objectsLine xs

memory :: Color color => [color] -> [Double] -> Diagram
memory colours objects
= unionA
top
left
[hcat
$ map (uncurry fc)
$ zip (cycle colours)
$ map (lw 0 . flip rect objectMaxLength) objects
, lines_]

lines_ :: Diagram
lines_ = hcat $ replicate nObjects $ rect objectMaxLength 50

colors :: (Ord colour, Floating colour) => [Colour colour]
colors
= [red, green, blue, yellow, magenta, cyan
, darkred, darkgreen, darkblue, darkmagenta, darkcyan]

Anúncios

Playing with criterion

23/5/2010 (domingo)

As a part of my Google Summer of Code project, I should take a look at criterion, a benchmark library for Haskell. As I’d have to use it, and it seems to be a useful library for a lot of people, I packaged it to Debian, with all its dependencies. The haskell-criterion package is waiting on the NEW queue.

Ok, so now I had to test it with something. I came up with a test related to my reading about the Haskell execution model. I noticed that I never have played with multi-threaded Haskell code to achieve better performance, but only to make things that should be in parallel work. So I thought this was a good oportunity to try both. I wrote this code:

import Control.Applicative ((<$>), (<*>))
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Criterion.Main (Benchmark, bench, bgroup, defaultMain, whnf, whnfIO)
import GHC.Conc (forkIO, par)
import System.Environment (getArgs)

value :: Int
value = 32

threadsL :: [Int]
threadsL = [1, 2, 3, 4, 8, 16, 32, 64]

main :: IO ()
main
= defaultMain
[bgroup "fib" $ [bench "fib" $ whnf fib value]
, bgroup "pfib" $ map pbench threadsL
, bgroup "ifib" $ map ibench threadsL]

pbench :: Int -> Benchmark
pbench i = bench (show i) $ whnf (pfib i) value

ibench :: Int -> Benchmark
ibench i = bench (show i) $ whnfIO $ ifib i value

fib :: Int -> Int
fib n
| n < 2 = n
| otherwise = fib (n – 1) + fib (n – 2)

pfib :: Int -> Int -> Int
pfib threads n
| n < 2 = n
| threads < 2 = fib n
| otherwise = par n1 $ seq n2 $ n1 + n2
where
n1 = pfib (threads – 1) $ n – 1
n2 = fib $ n – 2

ifib :: Int -> Int -> IO Int
ifib threads n
| n < 2 = evaluate n
| threads < 2 = evaluate $ fib n
| otherwise
= do
f1 <- newEmptyMVar
f2 <- newEmptyMVar
forkIO $ ifib (threads – 1) (n – 1) >>= putMVar f1
(+) <$> evaluate (fib $ n – 2) <*> takeMVar f1

My point is to benchmark the naïve implementation of the Fibonacci function. I’ve done three implementations:

fib
which is the simple sequential algorithm
pfib
which call par threads times
ifib
which call forkIO trheads times

I have chosen the value to be found to 32, because it took long enought to make the algorithms show a difference, and short enough to allow me to wait for the benchmarks. I built with:

ghc --make -threaded fib

And runned with:

./fib --summary=fib.csv +RTS -N

The results were surprising for me:

Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB
"fib/fib",1.7197746428132055,1.5104790505051613,2.031735854971409,1.2921598378638277,0.9514102096387,1.7188536150257019
"pfib/1",1.246479478704929,1.1805571945786482,1.3496261318802836,0.41570319371618925,0.29446316107198395,0.5823241217769125
"pfib/2",0.709402781355381,0.6850755818963056,0.7553855475068096,0.1650893842889147,0.10136189180677138,0.26853312094492815
"pfib/3",0.6876571162819864,0.6694172366738321,0.7332831247925761,0.136836636420341,5.5142145935712775e-2,0.2581805596992922
"pfib/4",0.6940202649712561,0.6684220250725745,0.746035416948795,0.17779646229098522,9.15862482237111e-2,0.2804203635281818
"pfib/8",0.5887631162285805,0.5625207790017128,0.637939236986637,0.17967127569396815,0.10516397607491562,0.2680488710546616
"pfib/16",0.6578513725876809,0.6107277926087382,0.7372924216866497,0.3070809683902726,0.20679305873839168,0.44571891526087154
"pfib/32",0.8900135167717932,0.8123333438515664,0.9942556460976597,0.45942103747727,0.36818132854060703,0.6233813540278427
"pfib/64",0.6894888337731359,0.6386897596001626,0.7766118057847021,0.33329115700425754,0.22490082647959764,0.5323965317477072
"ifib/1",1.3290104445099833,1.2501559790253634,1.4700443919777875,0.5277373265779313,0.35208278898795436,0.9179652474421286
"ifib/2",0.7626163347840308,0.7239342554688452,0.8381309326767921,0.26785654248385327,0.1679981089775395,0.44853454282169086
"ifib/3",0.7440206035256386,0.6907905062317847,0.8396386941552162,0.3549675945489581,0.22865300516283724,0.536883746220313
"ifib/4",0.6503480085015301,0.6130640967965126,0.711405030119419,0.239522235814317,0.1712396398224678,0.3616636557596486
"ifib/8",0.6300301631569865,0.6085795434594155,0.699028070795536,0.17776228649867923,6.103918021418795e-2,0.39508078831864085
"ifib/16",0.6735970553040505,0.6393461784005164,0.7405966480851172,0.2366561170051713,0.13725890735483465,0.38273830961617655
"ifib/32",0.6162510784745214,0.5983195599198344,0.6736035832047466,0.14678572822644564,5.335959472427259e-2,0.3231734735693349
"ifib/64",0.7217859562516212,0.6820346101403237,0.7857911285042766,0.2534368199920431,0.18073576275895306,0.37636526159654315

I’m only analyzing the first column.

How come pfib/1 and ifib/1, that is, pfib and ifib called with just one thread, got faster than fib/fib? All they do is calling fib! There’s certainly something wrong here. So I reran the tests in a single user environment (recovery-mode on GRUB). And also learned that I should not trust blindly these results in my usual user environment. The new tests were also surprising:

Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB
"fib/fib",0.4979443445852824,0.4978961411169597,0.49802434593949996,3.1186229071682967e-4,2.1763919616793833e-4,4.8569259617552764e-4
"pfib/1",0.4980979338339395,0.4980578795126507,0.49816361099992496,2.58930038115918e-4,1.7011790805380493e-4,4.0161943492611696e-4
"pfib/2",0.30187882572923386,0.3018276229551859,0.3019332328489849,2.7088498930449747e-4,2.3482715228451473e-4,3.21989842698397e-4
"pfib/3",0.3025411692312786,0.3024852743795938,0.3026018276861736,2.975260476849551e-4,2.6907713057828055e-4,3.3200184242397705e-4
"pfib/4",0.3045588889769145,0.3024707928350993,0.31463312775407504,2.0236423610294995e-2,3.5108499279216634e-4,4.8255009076253054e-2
"pfib/8",0.27702674538408,0.27158569247041425,0.2846977105787821,3.2790225595837955e-2,2.5215205278321072e-2,4.6554023411818295e-2
"pfib/16",0.26909642369065967,0.2650135436705178,0.2758063927343913,2.615724515115467e-2,1.8039283932648264e-2,4.0956608339390285e-2
"pfib/32",0.2726816359213419,0.26952718407426557,0.27636309534822184,1.7444576031246224e-2,1.4227057669027803e-2,2.2935079362453085e-2
"pfib/64",0.28010295302186694,0.27646669537339885,0.28658743054185587,2.4301648337333875e-2,1.3360828345387364e-2,3.9546300072208655e-2
"ifib/1",0.4984497705153056,0.4984218493155071,0.4984962406805583,1.8040276986757106e-4,1.242720342516858e-4,2.870577387900989e-4
"ifib/2",0.3603615990332194,0.34868544012818997,0.3741170039824078,6.496311767153333e-2,5.652406802832432e-2,7.448312251646773e-2
"ifib/3",0.31676718384538366,0.3109451881102153,0.32481522709642136,3.4887028192202074e-2,2.744363202200484e-2,4.8060797483303504e-2
"ifib/4",0.31541500956330976,0.30635046393190096,0.32643843323503213,5.109215651574664e-2,4.339074296796736e-2,5.995395740457435e-2
"ifib/8",0.32055945069108693,0.3132960501364299,0.3302640953711101,4.2886292109838166e-2,3.422708481199234e-2,5.4196317284934535e-2
"ifib/16",0.32268204123292654,0.31607536703859057,0.33141186625276287,3.858016286486453e-2,3.0959817731033304e-2,4.801572113400926e-2
"ifib/32",0.32020341069017116,0.3151469627073833,0.32664997250352584,2.92266723352992e-2,2.3352577028246314e-2,3.5913110786596904e-2
"ifib/64",0.3183670273474284,0.3136345234564373,0.3248856058767864,2.834147907711714e-2,2.1803978149062524e-2,3.8088053144371074e-2

I didn’t thought the improvement in performance on a single-user would be so dramatic. The execution of the program in a single-user environment used 41% of the time to run in my usual environment. About the comparative results, they were more close to what I’d expect than the old ones. fib/fib, pfib/1 and ifib/1 had a very similar execution time. What surprised me was that I was expecting to have the best result in the benchmarks with 2 threads, since I ran the tests in a Core 2 Duo T5750 @ 2.00 GHz with 2 cores. But in pfib the execution time has reduced from 4 threads to 8, and in ifib, from 2 to 3. I can’t think of where these improvements come from.

Unfortunately the creation of charts is currently broken in GHC 6.12, so no graphics by now.

Google Summer of Code weekly report #1

17/5/2010 (segunda-feira)

Some students of the Google Summer of Code are required to make a weekly report describing their activities.  As far as I know, I’m not, but I think it’s a good idea in general, to comunicate to my mentor and other interested folks what I’ve been doing.  So I’ll start to do them here.

My project is Implementing the Immix Garbage Collector Algorithm on GHC.

Last week I was most of the time involved in my school activities, so I didn’t have much time to work on the project.  I went to the bureaucratic services of my university to get a Proof of Enrollment in English, print/sign/scanned the Foreign Certification Form and sent them to Google on Friday afternoon.  I asked them to confirm the receival, which they didn’t. I’m planning to ask again for a confirmal today in the afternoon, if I still get no reply.

I also tried to build the current version of GHC, from the darcs repository. I was getting an error with my old (and updated) repository, so I donwnloaded a new one and could build nicely. Then I noticed that with make distclean I could also build with my older one, so I just removed the new. I downloaded the nofib benchmark suite and made an initial benchmark.

While I was reading the documentation of the build process of GHC, I noticed a bug, and submitted a patch to the cvs-ghc mailing list. The patch was applied, with some changes.

This week I plan to focus more on the project, and spend the weekly 36 hours working on it. I’m planning to start reading the Immix Technical Report to see if there’s anything new there with regard to the article. After that I’ll read the wiki documentation on the GHC current GC, together with the GHC source code. I’ll also experiment with Criterion.

While I was writing this post I noticed that the GC benchmark in nofib is failing:

==nofib== fibheaps: time to run fibheaps follows...
/usr/bin/time ../../runstdtest/runstdtest ./fibheaps  -o1 fibheaps.stdout -o1 fibheaps.stdout  -ghc-timing     300000;  /usr/bin/time ../../runstdtest/runstdtest ./fibheaps  -o1 fibheaps.stdout -o1 fibheaps.stdout  -ghc-timing     300000;  /usr/bin/time ../../runstdtest/runstdtest ./fibheaps  -o1 fibheaps.stdout -o1 fibheaps.stdout  -ghc-timing     300000;  /usr/bin/time ../../runstdtest/runstdtest ./fibheaps  -o1 fibheaps.stdout -o1 fibheaps.stdout  -ghc-timing     300000;  /usr/bin/time ../../runstdtest/runstdtest ./fibheaps  -o1 fibheaps.stdout -o1 fibheaps.stdout  -ghc-timing     300000;
Command exited with non-zero status 2
0.72user 0.05system 0:00.77elapsed 99%CPU (0avgtext+0avgdata 277664maxresident)k
0inputs+32outputs (0major+17695minor)pagefaults 0swaps
././fibheaps 300000 < /dev/null
**** expected exit status 0 not seen ; got 2
././fibheaps 300000 < /dev/null
expected stderr not matched by reality
--- /tmp/no_stderr3559  2010-05-17 07:35:09.000000000 -0300
+++ /tmp/runtest3559.2  2010-05-17 07:35:10.000000000 -0300
@@ -0,0 +1,2 @@
+Stack space overflow: current size 8388608 bytes.
+Use `+RTS -Ksize -RTS' to increase it.
Command exited with non-zero status 1
0.72user 0.06system 0:00.85elapsed 92%CPU (0avgtext+0avgdata 277664maxresident)k
368inputs+48outputs (1major+22085minor)pagefaults 0swaps
Command exited with non-zero status 2
0.71user 0.04system 0:00.78elapsed 97%CPU (0avgtext+0avgdata 277664maxresident)k
0inputs+32outputs (0major+17694minor)pagefaults 0swaps
././fibheaps 300000 < /dev/null
**** expected exit status 0 not seen ; got 2
././fibheaps 300000 < /dev/null
expected stderr not matched by reality
--- /tmp/no_stderr3578  2010-05-17 07:35:10.000000000 -0300
+++ /tmp/runtest3578.2  2010-05-17 07:35:11.000000000 -0300
@@ -0,0 +1,2 @@
+Stack space overflow: current size 8388608 bytes.
+Use `+RTS -Ksize -RTS' to increase it.
Command exited with non-zero status 1
0.72user 0.06system 0:00.80elapsed 98%CPU (0avgtext+0avgdata 277664maxresident)k
0inputs+48outputs (0major+22093minor)pagefaults 0swaps
Command exited with non-zero status 2
0.72user 0.06system 0:00.80elapsed 98%CPU (0avgtext+0avgdata 277680maxresident)k
0inputs+32outputs (0major+17696minor)pagefaults 0swaps
././fibheaps 300000 < /dev/null
**** expected exit status 0 not seen ; got 2
././fibheaps 300000 < /dev/null
expected stderr not matched by reality
--- /tmp/no_stderr3597  2010-05-17 07:35:11.000000000 -0300
+++ /tmp/runtest3597.2  2010-05-17 07:35:12.000000000 -0300
@@ -0,0 +1,2 @@
+Stack space overflow: current size 8388608 bytes.
+Use `+RTS -Ksize -RTS' to increase it.
Command exited with non-zero status 1
0.72user 0.07system 0:00.83elapsed 96%CPU (0avgtext+0avgdata 277680maxresident)k
0inputs+48outputs (0major+22095minor)pagefaults 0swaps
Command exited with non-zero status 2
0.68user 0.08system 0:00.78elapsed 97%CPU (0avgtext+0avgdata 277680maxresident)k
0inputs+32outputs (0major+17695minor)pagefaults 0swaps
././fibheaps 300000 < /dev/null
**** expected exit status 0 not seen ; got 2
././fibheaps 300000 < /dev/null
expected stderr not matched by reality
--- /tmp/no_stderr3616  2010-05-17 07:35:12.000000000 -0300
+++ /tmp/runtest3616.2  2010-05-17 07:35:12.000000000 -0300
@@ -0,0 +1,2 @@
+Stack space overflow: current size 8388608 bytes.
+Use `+RTS -Ksize -RTS' to increase it.
Command exited with non-zero status 1
0.70user 0.09system 0:00.80elapsed 98%CPU (0avgtext+0avgdata 277680maxresident)k
0inputs+48outputs (0major+22089minor)pagefaults 0swaps
Command exited with non-zero status 2
0.71user 0.06system 0:00.93elapsed 82%CPU (0avgtext+0avgdata 277680maxresident)k
0inputs+32outputs (0major+17695minor)pagefaults 0swaps
././fibheaps 300000 < /dev/null
**** expected exit status 0 not seen ; got 2
././fibheaps 300000 < /dev/null
expected stderr not matched by reality
--- /tmp/no_stderr3635  2010-05-17 07:35:12.000000000 -0300
+++ /tmp/runtest3635.2  2010-05-17 07:35:13.000000000 -0300
@@ -0,0 +1,2 @@
+Stack space overflow: current size 8388608 bytes.
+Use `+RTS -Ksize -RTS' to increase it.
Command exited with non-zero status 1
0.73user 0.06system 0:00.95elapsed 83%CPU (0avgtext+0avgdata 277680maxresident)k
0inputs+48outputs (0major+22089minor)pagefaults 0swaps
make[1]: ** [runtests] Erro 1
Failed making all in fibheaps: 1
make: ** [all] Erro 1

Erro is error in Portuguese. I’ll investigate on that.

Google Summer of Code

3/5/2010 (segunda-feira)

It’s kind of late now to talk about this, but I was accepted as a Student in the Google Summer of Code student. My project is about making the binaries produced by GHC faster by improving its Garbage Collector algorithm. I’ll implement the techniques described in the Immix Garbage Collector.  It’ll be great to work on GHC, specially in the RTS, and to produce code that can contribute to a wide number of users, like the users of darcs, XMonad, pandoc and other softwares built with GHC.

I’ll be blogging here about the status of my project.  Coding starts at May 24th, until then I’ll read the GHC source code documentation and experimenting with performance measurements using Criterion in a system with no daemons enabled.

Good Summer to everyone.