This post assumes that the reader has read my last post.
This weekly report is late, because I was too interested in the project to stop working on it to write. Next week it have to be earlier, since I’m going to New York on Sunday, and will have to take a time from the project.
First of all, I need to say that I thought all problems were gone in my last post, but I noticed after a lot of testing that there is a segfault (or memory corruption or etc) that happens about one of five times I ran the real/maillist benchmark. I tried a lot of combinations of parameters to the RTS (-C0 -A4k -i0 -DS and others) to make it behave deterministically, and it did, without segfaults. I could not reproduce the segfault deterministically, because all of these parameters made the segfault disappear. My old debugging technique, of printing a log to see where the program has went before the segfault happened, also is causing the segfault to disappear, so this one is being much harder to debug than the last ones. I’m still looking at it.
In a moment of this week I thought of pulling the latests patches from GHC, which created the need of rebuilding the whole system. I probably rebuilt with a different mk/build.mk, so I got some very distorted benchmark results when comparing before executions from before this build and after them. Because this distortion only made itself visible in a benchmark where I’ve introduced some improvements which I was very positive about, I naively thought that the -46% results were actually real. After a reasonable disbelief of my mentor, I remembered the complete rebuild and thought that was the cause. I reran the pre-rebuild benchmarks, and got the updated results.
This whole week I kept on benchmarking. At first I was using my usual system, but I noticed some very unexpected results, so I decided to run them in single-user mode, running nothing in parallel. In this condition, I belive the distortions are minimized. I organized my changes in sequential patches to the repository, to make it easier to measure. It takes a long time to run nofib, specially gc/gc_bench. While I organized the code, I notice the segfault was gone. I don’t know still what was causing it, but I’m not worrying very much, since it’s gone.
I’m planning to write a bigger report next week with the complete results of the benchmarks.