Now in stock!

Tom Copeland's Recent Posts

RSS Feeds

« 20000 users on RubyForge | Main | An LOLCODE interpreter using JJTree »

Tracking down a Rails app memory leak

Recently on thenewsroom we've noticed a memory leak; the process size would just grow and grow up over 1 GB per Mongrel.  Restarting them regularly kept things more or less under control, but that's silly.  Much better to find the actual problem.

I tried out various memory profilers, like mongrel -B and Scott Laird's MemoryProfiler.  But the results just didn't jump out at me; I spent quite a while poking around Objectspace dump files and String diffs and such without really making much progress.

Then I tried Evan Weaver's BleakHouse.  This was nicer; the Gruff graphs are cool and it showed that the leak was in the way we were using Ruby and Rails.  So we didn't have a class Hash in a controller or anything like that.  But I still couldn't pin it down; I was just seeing charts with upwardly sloping numbers of String objects.

Finally I backed down to just poking around the app while watching the output of:

watch -n 1 "ps -o rss,vsz -p 12345"

Nothing interesting surfaced at first.  OK, what if I look at the differences between my dev and our production database?  Ah ha!  A particular table in my DB has 200 rows, the production one has 10K rows, and a particular page hit by a particular user causes 1500 of them to be fetched.  And that corresponds with a huge leap in memory size. 

Here was the problem.  Our architecture involves various Rails apps passing data around - and we were using Marshal'd OpenStruct objects as the transfer mechanism.  This is a bad idea.  Here's a demo of why.  First, let's write 10K Marshal'd OpenStruct objects to a file:

require 'ostruct'
structs = []
10000.times do |x|
  structs << OpenStruct.new(
   :first_name => "Fred#{x}",
   :last_name => "Fredson#{x}",
   :hatsize => 12+x,
   :country => "FooLand#{x}")
end
File.open("data.dat", "w") { |f| f.syswrite(Marshal.dump(structs)) }

This creates a data.dat file of about 500K.  Now let's read them back in, but check the process size at various points.  Here's a read.rb:

require 'ostruct'
puts "Hit enter to read data"
gets
puts "Reading data..."
data = Marshal.load(File.read("data.dat"))
puts "Hit enter to set to nil"
gets
data = nil
GC.start
puts "It's nil, back from GC.start"
sleep 500

When we run read.rb, the process is about 3 MB:

$ ps -o rss,vsz -p 30079
RSS    VSZ
1516   3048

Now, after we read in the data:

$ ps -o rss,vsz -p 30079
RSS    VSZ
80600  82104

Great scott, 82 MB!   But surely that GC.start will clean them up, right?

$ ps -o rss,vsz -p 30079
RSS    VSZ
80620  82104

And, no.  Booooo.  But replacing that OpenStruct with a Hash causes the memory size to only go from 3 MB to 7 MB, which is a bit of an improvement over rising from 3 MB to 80 MB. 

What would have been nice in this situation would be to have some sort of heap analyzer that showed the size of memory occupied by each object graph subtree.  It'd be worth looking at Evan's patch to Ruby 1.8.6 and his profiler to see if we could somehow gather and dump enough information to create something like that.  What a cool project that would be for someone much smarter than I :-)

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/t/trackback/512521/21776539

Listed below are links to weblogs that reference Tracking down a Rails app memory leak:

Comments

The mem_inspect gets you most of the way there. Bleakhouse uses it to do its object counts. Its fairly time consuming to walk the heap and figure out how big everything is, though.

"But replacing that OpenStruct with a Hash causes the memory size to only go from 3 MB to 7 MB, which is a bit of an improvement over rising from 3 MB to 80 MB."

Does the memory go back down with a Hash, or not? What about a

class Fred
attr_accessor :first_name, :last_name, :hat_size, :country
end

which being a normal class might be better?
(I'm aware that your real OpenStruct may not suit a regular class, but I'm wondering if this tells us something useful about GC in Ruby, or just about OpenStruct.)

While trying to challenge assumptions, as one needs to at times like this: what happens if you use YAML instead of Marshal? I'd hope it makes no difference...

Check out LiteSpeed too, it will enforce hard and soft memory limits for your rails apps, and auto spawns/kills handlers as needed

Also check out Dike, which might be able to give you information down to the lines of code that cause the leak.

http://www.codeforpeople.com/lib/ruby/dike/

So...my question is why doesn't GC.start get it?

Roger - exactly! I have no idea... something for ruby-core, I suppose...

You could try making the heap chunk size small (and not grow exponentially) and see if ruby is able to reclaim that unused space

Roger - yup, I haven't been watching ruby-core lately, not sure what the status of this one is...

Post a comment

Comments are moderated, and will not appear on this weblog until the author has approved them.

If you have a TypeKey or TypePad account, please Sign In