While working on some Rails apps for RollStream and using Mauricio Fernandez's excellent rcov plugin, we started to encounter the "[BUG] rb_gc_mark(): unknown data type" problem. We only saw this when we ran our controller tests; just running the unit tests wouldn't trigger it. It was a bummer, though, because we couldn't see where we were coverage-wise.
I poked around rcov for a while using Valgrind - there's no Mac OS X port, but I had a Linux VMware Fusion instance handy. After some flailing around I finally hit paydirt. This Valgrind invocation:
valgrind --tool=memcheck --error-limit=no --leak-check=no \
--leak-resolution=low \
--log-file=valgrind.out /usr/local/bin/rcov --rails \
--aggregate coverage.data --text-summary -Ilib --html \
[... lots of controller names here ...]
turned up this problem report:
==13390== Invalid write of size 4
==13390== at 0x784BE8E: coverage_event_coverage_hook (rcovrt.c:103)
==13390== by 0x416E85: rb_eval (eval.c:4127)
[... stack elided ...]
==13390== Address 0x7e419e8 is not stack'd, malloc'd or (recently) free'd
rcovrt.c line 103 uses a cov_array struct; I added some bounds checking like so:
$ diff -Naur rcovrt.c ~/new.rcovrt.c
--- rcovrt.c 2008-08-28 17:50:16.000000000 -0400
+++ /Users/tom/new.rcovrt.c 2008-08-28 17:52:15.000000000 -0400
@@ -64,7 +64,9 @@
if(!carray->ptr[sourceline])
carray->ptr[sourceline] = 1;
} else {
+ if (carray && carray->len > sourceline) {
carray->ptr[sourceline]++;
+ }
}
return carray;
@@ -98,7 +100,7 @@
static void
coverage_increase_counter_cached(char *sourcefile, int sourceline)
{
- if(cached_file == sourcefile && cached_array) {
+ if(cached_file == sourcefile && cached_array && cached_array->len > sourceline) {
cached_array->ptr[sourceline]++;
return;
}
I rebuilt the gem, reran the coverage task, and huzzah! It completes!
This isn't a great fix, of course - I'd much rather figure out what's wrong with the allocation of cached_array. Perhaps someone cleverer than I can come up with a better fix.
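To make the trade-off concrete, here is a minimal standalone sketch of the bounds-check idea from the patch. It is not rcov's actual code: the struct below is a hypothetical stand-in for cov_array (only the ptr and len field names come from the diff; the types are assumed), and increase_counter_checked and example_recorded_hits are names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for rcov's cov_array. Only the field names
 * (ptr, len) come from the patch; the types are an assumption. */
struct cov_array {
    unsigned int *ptr;  /* per-line hit counters */
    unsigned int len;   /* number of counters allocated */
};

/* Increment the counter for sourceline only when it lies inside the
 * allocated array. Returns 1 if the hit was counted, 0 if skipped. */
static int
increase_counter_checked(struct cov_array *carray, unsigned int sourceline)
{
    if (carray && carray->ptr && carray->len > sourceline) {
        carray->ptr[sourceline]++;
        return 1;
    }
    return 0;
}

/* Example: a 4-line array, one hit in range and one line number past
 * the end. Returns how many hits were actually recorded. */
static unsigned int
example_recorded_hits(void)
{
    struct cov_array carray;
    unsigned int recorded = 0;

    carray.len = 4;
    carray.ptr = calloc(carray.len, sizeof(*carray.ptr));
    assert(carray.ptr != NULL);

    recorded += increase_counter_checked(&carray, 2);   /* in range: counted */
    recorded += increase_counter_checked(&carray, 10);  /* past the end: skipped */

    free(carray.ptr);
    return recorded;
}
```

The out-of-range hit is dropped silently instead of being written past the end of the buffer. That stops the invalid write Valgrind reported, but it also shows why this is only a workaround: whatever produced an array too short for the line number is still there.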
Updated 8/28/08: Modified to document a better fix - check the cached_array->len attribute and compare it to the sourceline.
Wow. Nicely done! I salute you!
Posted by: Ryan Owens | August 25, 2008 at 05:59 PM
@ryan, many thanks!
Posted by: tomcopeland | August 25, 2008 at 07:47 PM
Just because cached_array is true doesn't mean cached_array->ptr is true, nor cached_array->ptr[sourceline]. I'm guessing one of those is invalid for one reason or another.
I think it should probably be:
if(cached_file == sourcefile && cached_array->ptr)
You could always inspect the ptr and sourceline individually to make sure they're valid, too.
Posted by: Daniel Berger | August 25, 2008 at 07:52 PM
@dan, yeah, I should email Mauricio, he could probably fix whatever the real problem is in about 5 minutes... the change I made seems to skip using a cached version of the source file contents, so it probably slows things down considerably. I need to dig into that extension a little more.
Posted by: tomcopeland | August 25, 2008 at 08:15 PM
This post may interest you:
http://rspec.lighthouseapp.com/projects/5645/tickets/309-fix-for-rcov-segfault
Posted by: Daniel Berger | August 26, 2008 at 11:50 AM
@dan, yup, that turned up in my Googlings, but seems inconclusive... kind of peters out with everyone saying "yeah, rb_gc_mark()" here too.
Posted by: tomcopeland | August 26, 2008 at 05:37 PM
@tom - nice find. The crashes were completely random, works now, doesn't in 5 minutes, works again.
Unfortunately I've been too deep into work and severely lacking in C knowledge to try to track it down.
Hopefully a fix gets rolled into the master repo so we can all be anal about our coverage again :-)
Thanks,
Michael
Posted by: UnderpantsGnome | August 26, 2008 at 11:33 PM
Thanks for digging into this, Tom. Looks like Scott Barron has been busy too:
http://github.com/spicycode/rcov/commit/66909fb17cce40e3cf2e1312b16f7ab97b9fe559
I hope we'll see a new rcov gem soon...
Posted by: Aslak Hellesøy | August 27, 2008 at 01:17 AM
@aslak - hm, I tried Scott's change but am still seeing segfaults; this time with just a "[BUG] Segmentation fault". I need to fire up Valgrind again and try to get to the bottom of those double frees...
Posted by: tomcopeland | August 28, 2008 at 03:11 PM
@all, I've updated this post with a better fix that does some bounds checking... still not great, but better.
Posted by: tomcopeland | August 28, 2008 at 05:57 PM
I applied your patch on a rcov fork on GitHub. You can install it as a gem.
Hope this helps:
http://mergulhao.info/2008/8/29/rcov-with-segfault-bug-patched
http://github.com/mergulhao/rcov/tree/master
Posted by: Sylvestre Mergulhao | August 29, 2008 at 01:02 AM
@sylvestre, cool!
Posted by: tomcopeland | August 29, 2008 at 08:17 AM
Thank you so much for this!
Just ran into this problem, and had no idea how to fix it.
Applied the noted GitHub gem and it worked perfectly.
Posted by: John | September 08, 2008 at 09:49 PM
I installed the mergulhao-rcov gem and ran 'rake spec:rcov' again. It worked well. Thanks guys!
Posted by: ram | March 21, 2009 at 02:30 PM