Tom Copeland's Recent Posts

Dovecot: Auth process died too early - shutting down

I upgraded a MySQL installation today and broke the dovecot POP/IMAP daemon. Fortunately, the fix wasn't too hard, and hopefully this will be useful to someone.

The MySQL upgrade was from 4.1 to 5.1, so it was a pretty big move. I compiled the new release, shut down the web server, dumped all the databases, removed the RPMs, installed the new version, restarted things and restored the databases. Things were pretty much working... although I had had to uninstall the old RPMs with rpm -e --nodeps since the dovecot POP3 daemon had MySQL as a dependency. But running ldd on /usr/sbin/dovecot showed no dependency on the MySQL client library, so I figured maybe that was just a spec file thing. Wrong!

I restarted dovecot and the process died leaving this in the log:

Jun 29 17:01:04 fiddle dovecot: Dovecot starting up
Jun 29 17:01:05 fiddle dovecot: Auth process died too early - shutting down
Jun 29 17:01:05 fiddle dovecot: child 6206 (auth) returned error 127

Not good. One of the dovecot options is to get passwords from a database. I use PAM, but dovecot still depended on MySQL. Blah!

Fortunately, the fix was pretty easy. I hunted down the source RPM, installed it with rpm -i dovecot-0.99.11-4.EL4.src.rpm, and edited /usr/src/redhat/SPECS/dovecot.spec. Here are the changes I made:

$ diff dovecot.spec.orig dovecot.spec
25,26d24
< BuildRequires: mysql-devel
< BuildRequires: postgresql-devel
68,69c66,67
< 	--with-pgsql                 \
< 	--with-mysql                 \
---
> 	--without-pgsql                 \
> 	--without-mysql                 \

Then a quick build and install:

rpmbuild -ba /usr/src/redhat/SPECS/dovecot.spec
rpm -e dovecot
rpm -i /usr/src/redhat/RPMS/i386/dovecot-0.99.11-4.EL4.i386.rpm

And edit /etc/dovecot.conf:

$ diff /etc/dovecot.conf.rpm.orig /etc/dovecot.conf
14a15
> protocols = imap imaps pop3 pop3s
197a199
> default_mail_env = mbox:/var/spool/mail/%u

Enable dovecot to start on boot with /sbin/chkconfig --level 345 dovecot on and you're back in business. Hope this helps someone!

Incidentally, if you need a Postfix book, try The Book of Postfix: State-of-the-Art Message Transport by Ralf Hildebrandt and Patrick Koetter. It's a few years old but still very relevant... and of course Postfix is a great mail server. Enjoy!

Lovely SVG railroad diagrams revisited

Last week I blogged about Julian Hyde's Clapham utility which generates nice SVG graphs from JavaCC grammars. One painful bit was that JavaCC didn't output a clean BNF format of a grammar, so you had to do some copying and pasting and general munging to get a grammar in shape for Clapham to use it.

Well, no more. I've added a BNF option to JJDoc (here's an updated javacc.jar) so that it outputs plain old BNF. So now you can easily go from a JavaCC grammar to a Clapham-generated graph:

$ /Users/tom/java.net/javacc/bin/jjdoc -bnf \
-output_file=java.bnf \
/Users/tom/java.net/javacc/examples/JavaGrammars/1.5/Java1.5.jj
Java Compiler Compiler Version 4.3 (Documentation Generator Version 0.1.4)
(type "jjdoc" with no arguments for help)
Reading from file /Users/tom/java.net/javacc/examples/JavaGrammars/1.5/Java1.5.jj . . .
Grammar documentation generated successfully in java.bnf
$ java -cp lib/batik-awt-util.jar:lib/batik-bridge.jar:lib/batik-css.jar\
:lib/batik-dom.jar:lib/batik-ext.jar:lib/batik-gvt.jar:lib/batik-parser.jar\
:lib/batik-script.jar:lib/batik-svg-dom.jar:lib/batik-svggen.jar\
:lib/batik-transcoder.jar:lib/batik-util.jar:lib/batik-xml.jar\
:lib/clapham.jar:lib/javacc.jar:lib/junit.jar:lib/xercesImpl.jar net.hydromatic.clapham.Clapham \
-d grammar java.bnf 
Created output directory grammar
Symbol AdditiveExpression 
[... etc ...]

Here's the svg version of one of the nonterminals in the output (using an embed tag):

And the png version:

Fun stuff!

Lovely SVG railroad diagrams with JavaCC and Clapham

Julian Hyde posted to the javacc-users mailing list a few weeks ago about his utility Clapham which generates railroad diagrams. A picture is worth 1K words, so here's the Java grammar rendered using Clapham. Pretty nifty!

The only problem I had was some difficulty in building the utility - so to shortcut that for anyone else, here's a clapham.jar that I built from source. To generate the Java grammar diagrams I downloaded Clapham, built that jar file (which you'll want to put in lib/), munged JJDoc's output to create java.bnf and then did this:

java -cp lib/batik-awt-util.jar:lib/batik-bridge.jar:lib/batik-css.jar:lib/batik-dom.jar:lib/batik-ext.jar:lib/batik-gvt.jar:lib/batik-parser.jar:lib/batik-script.jar:lib/batik-svg-dom.jar:lib/batik-svggen.jar:lib/batik-transcoder.jar:lib/batik-util.jar:lib/batik-xml.jar:lib/clapham.jar:lib/javacc.jar:lib/junit.jar:lib/xercesImpl.jar net.hydromatic.clapham.Clapham -d grammar java.bnf

Voila, nice diagrams. Good one, Julian!

This process would be a little easier without the intervening jjdoc step... would be nice to be able to parse a grammar file directly, which would probably mean skipping all the token definitions. Either that or we should write a jjdoc output format that just outputs plain old BNF.

As always, for more info on JavaCC, check out my JavaCC book!

Updating the PassengerMemLimit patch

A few months back schmalowsky (not sure of real name) posted to the Passenger issue tracker about a patch to limit Passenger memory usage. It's kind of a brutal patch in that it uses setrlimit, so if the process tries to allocate too much memory it just dies. But hey, it keeps one Passenger process from gobbling up all the memory on the box, and you don't need Monit to watch it, so, good enough.

That said, the patch is a few months old and the Passenger source code has moved around, so the patch no longer applies cleanly. I've brought it up to date and you can get the diff here. It seems to work fine - when I set it low enough the process gets put down as soon as it exceeds the threshold.

To try this patch out, do something like this:

git clone git://github.com/FooBarWidget/passenger.git
cd passenger
wget http://infoether.com/~tom/passenger_memlimit.diff
git apply passenger_memlimit.diff
rake package:gem
sudo gem install pkg/passenger-2.2.2.gem --no-rdoc --no-ri
sudo /usr/local/bin/passenger-install-apache2-module --auto
# now edit your httpd.conf and add "PassengerMemLimit 192M"
/sbin/service httpd configtest
sudo /sbin/service httpd restart

I fiddled with this patch for quite a while on my Macbook - but setrlimit(RLIMIT_RSS, nnn) doesn't seem to be taking effect. The function call is returning 0, and a subsequent getrlimit returns the value that I just set, but the process size grows and grows regardless. Not sure what's up with that. This is on OS X 10.5.7, Darwin Kernel Version 9.7.0. If anyone has insights into that please send them my way, thanks! Update: Eric Hodel tweets "setrlimit wasn't fully implemented on OS X, RSS and others are missing." Thanks Eric!

Also, mad props to schmalowsky for writing the original code. All I did was juggle it around a bit; the real work was already done. And of course a huge thanks as always to the Phusion guys for Passenger!

Better Subversion to Git documentation on RubyForge

I should have done this about 6 months ago... but, nonetheless, I've improved the RubyForge documentation for converting a project from Subversion to Git. The docs now show how to use git svn clone followed by git gc and then the initial push to get the Git repository rolling. Suggestions/corrections/comments are welcome, of course.

Note that I do the git svn clone using the http:// Subversion URL rather than svn:// or svn+ssh://. Using svn:// will fail for a repository with more than a couple of commits since xinetd is set up to block IPs that hammer in a lot of requests, and using svn+ssh:// just seems wasteful - no need to encrypt all that data.

Doing git gc --aggressive certainly made a difference for the project I just converted; the disk space dropped from 8.1 MB to 760 KB.

I was also happy to see that although I'm running Git 1.6.3 on my laptop and 1.5 on the server it made no difference. I should upgrade RubyForge's Git installation... anyone know if there's anything particularly hairy involved in upgrading Git from 1.5 to 1.6? Googling around suggests it's no problem, but if anyone knows something different, please let me know, thanks!

Auto-approving RubyForge projects

The subject line pretty much sums it up - we're auto-approving RubyForge projects now. So, if you submit a new project, it should get approved and provisioned within five minutes or so. Gregory Brown and I will be keeping an eye on the support forums and trackers and whatnot; hopefully everything is in place to run smoothly. Enjoy!

Oh, and RubyForge is now running Subversion 1.6.2. Just a minor version upgrade so I don't think anyone will notice... but if you see something awry please let me know, thanks!

Less PHP, more Ruby

Way back when we started RubyForge (summer 2003) there was kind of a clunky project approval and provisioning process - I'd ssh in, run a script, click some buttons, run another script, and it would be done. Only took 5-6 minutes or so, and no biggie, I'll automate all that any day now.

A few years later I read a posting by Zed Shaw saying "if you're ssh'ing in to a machine more than once a week you're doing it wrong." Boy, that hit a nerve... he was right on the money. Approving projects was just painful enough that I only wanted to do it 2-3 times a week, not every day.

Now, only a few years after that, I finally got around to rewriting stuff so that it all happens automatically. As part of that I replaced a bunch of straight SQL with named scopes, optimized some other stuff, and generally made everything a lot more readable. The optimizations were fun; for example, updating the gitosis config file went from 4m30s to 13s and thus the cronjob now runs every 10 minutes vs every 30 minutes. It was also nice to replace a lot of system calls with FileUtils.chmod and such.
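
As a rough illustration of that last point, here's a minimal sketch of swapping a shelled-out chmod for FileUtils.chmod. The restrict_permissions helper, the scratch file, and the 0640 mode are all made up for the example; none of them come from the RubyForge code.

```ruby
require 'fileutils'
require 'tempfile'

# Hypothetical helper: tighten permissions on a file without shelling out.
def restrict_permissions(path, mode = 0640)
  # Before: system("chmod 640 #{path}") forks a process and needs shell quoting.
  # After: FileUtils.chmod stays in-process and raises an exception on failure.
  FileUtils.chmod(mode, path)
  File.stat(path).mode & 0777   # return the resulting permission bits
end

file = Tempfile.new('perm-demo')
begin
  printf("%o\n", restrict_permissions(file.path))
ensure
  file.close
  file.unlink
end
```

The main win is that a failure surfaces as an exception instead of an exit status that's easy to forget to check.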

There's still more work to be done - I'm still using the pure Ruby PostgreSQL driver, and I've got some clumsy templating stuff that would be nicer in ERB, and I'm doing some stuff in Ruby that could be better done in SQL. The code is here and no doubt there's a lot of room for improvement. Still, it's nice to be moving forward. And Zed was right!

RubyForge running Subversion 1.6.1

I just upgraded RubyForge to Subversion 1.6.1; release notes for that version are here. All seems well, but if anyone notices anything awry, please let me know.

One thing I saw in the release notes was the filesystem storage improvements. I'm wondering if it'd be worth it to run svnadmin pack on every Subversion repository on RubyForge... sounds like it could save some disk space and give us faster response time too. Anyone have any experiences with using that?

The Eclipse JavaCC plugin and Mac OS X

Lately I've been trying to get the Eclipse JavaCC plugin to work on my Mac OS X 10.5.6 laptop with Eclipse 3.4.2. I was having all sorts of problems... basically the ones which folks are discussing here. After fiddling around a bit I ended up checking out the plugin and rebuilding the plugin.jar file using Java 1.5 and that did the trick. So, to get the Eclipse JavaCC plugin v1.5.13 working, try this:

  • Install the JavaCC plugin using the instructions outlined here.
  • Download this new plugin.jar file.
  • Shut down Eclipse and copy the plugin.jar file into the plugin directory. On my Mac I've got Eclipse installed in /Users/tom/eclipse, so I copied the jar file to /Users/tom/eclipse/plugins/sf.eclipse.javacc_1.5.13/plugin.jar
  • Restart Eclipse and create a new grammar file (File => New => Java Project, File => New => File, name it foo.jj)
  • Paste in some boilerplate JavaCC grammar contents, e.g.:
    options {
      JDK_VERSION="1.5";
    }
    PARSER_BEGIN(Foo)
    public class Foo {}
    PARSER_END(Foo)
    TOKEN : {
      <HI : "hello">
    }
    void Foo() : {} {
      <HI>
    }
    
  • The file icon should now look like a little "JJ" and the outline view should contain the token definitions and whatnot.

Hopefully that helps someone. And if you're working with JavaCC, check out my JavaCC book!

developer.com Java tool of the year award

Just a quick post to note that JavaCC was apparently in the running for the Developer.com Product of the Year awards for 2009... it's in the "Java Tool" section along with Hudson, Glassfish, and Eclipse SQL Explorer. In the end, it looks like Netbeans won in that category... Tor for the win!

Sphinx, Riddle, and will_paginate

I'm a big fan of the excellent Sphinx full text engine, and I have some projects that use UltraSphinx and others that just use the Ruby API, Riddle, directly. Riddle supports limits and offsets but (understandably) doesn't do Railsy pagination - so Rich Kilmer wrote some code to do that.

The basic idea is to implement enough of will_paginate's WillPaginate::Collection methods to make things work. In this case we're searching a bunch of Book objects from my military reading list site:

class SearchResults

   def initialize(query_results, page, page_size)
     @query_results = query_results
     @page = page
     @page_size = page_size
     @books = Book.find(@query_results[:matches].map{|match| match[:doc]})
   end

   def previous_page
     @page == 1 ? @page : @page - 1
   end

   def next_page
     (@page == total_pages ? total_pages : @page + 1)
   end

   def current_page
     @page
   end

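   def total_pages
     # Round up so a partial last page counts, without adding an extra
     # empty page when total_found is an exact multiple of the page size.
     [(@query_results[:total_found].to_f / @page_size).ceil, 1].max
   end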

   def each(&block)
     @books.each(&block)
   end

   def empty?
     @books.empty?
   end

 end

Here's the searching code; we can just put this in a class method on Book:

 def self.search(terms, options = {})
   client = Riddle::Client.new("localhost", 3312)
   # nil.to_i is 0, and 0 is truthy in Ruby, so "|| 1" would never kick in
   page = [options[:page].to_i, 1].max
   page_size = options[:page_size] || 20
   client.offset = (page - 1) * page_size
   client.limit = page_size
   SearchResults.new(client.query(terms), page, page_size)
 end
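
The page arithmetic here is easy to get subtly wrong, so here's a minimal standalone sketch of it. The helper names (normalize_page, offset_for, page_count) are mine, for illustration only; they aren't part of Riddle or will_paginate.

```ruby
# Hypothetical helpers illustrating the offset/limit arithmetic.

# Treat a missing or junk page parameter as page 1.
def normalize_page(raw)
  [raw.to_i, 1].max
end

# Zero-based row offset for a one-based page number.
def offset_for(page, page_size)
  (page - 1) * page_size
end

# Number of pages needed for total_found rows, rounding up.
def page_count(total_found, page_size)
  (total_found.to_f / page_size).ceil
end

puts normalize_page(nil)   # a missing parameter falls back to page 1
puts offset_for(3, 20)     # page 3 starts at row 40
puts page_count(41, 20)    # 41 hits need 3 pages
```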

We also need a simple controller action:

def search
  @books = Book.search(params[:term], params)
end

And a route to get us there with nice URLs:

map.search "/search/:term/:page", :controller => 'books', :action => 'search'

And, finally, the standard will_paginate view code:

<%= will_paginate(@books)%>
<% @books.each do |b| %>
	<br/><%= b.title %>
<% end %>

That's about it! I usually test this stuff by using Mocha to replace Riddle::Client.query with a stub that returns a Hash of search result information. Pretty standard stuff, really. Enjoy!

Rails, ActiveRecord, and time ranges

On my military reading list site I wanted to be able to find records created in a particular month. That let me do URLs like /revisions/navy/jan-2009 and then look up the proper revision - e.g., the 2009 revision of the Navy reading list.

So, here are the functions, the first of which I got from some place on the internet which now eludes my Googling:

def days_in_month(d)
  (Date.new(d.year,12,31) << (12-d.month)).day
end

def beginning_to_end_of_month(date)
  from = date - ((date.day-1).days)
  to = from + (days_in_month(from)-1).days + 1.day
  from..to
end

And here's how to use them to look up revisions created in January 2009:

>> Revision.find(:all, 
 :conditions => 
  {:created_at => 
  beginning_to_end_of_month(Date.parse("jan 2009"))})
=> [#<Revision id: 1, reading_list_id: 3, 
 created_at: "2009-01-10 21:25:17", 
 updated_at: "2009-01-10 21:25:17">, 
 #<Revision id: 2, reading_list_id: 4, 
 created_at: "2009-01-14 21:49:56", 
 updated_at: "2009-01-14 21:49:56">]

The coolest thing about this is that conditions takes a Range object, so it's easy to look up things within a date range. It gets translated to a SQL BETWEEN clause... very handy!
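
For anyone not on Rails, here's a minimal sketch of the same idea using only the standard Date class. The month_range and between_clause helpers are hypothetical; between_clause only mimics the shape of the generated SQL and is not what ActiveRecord does internally.

```ruby
require 'date'

# First day of the month through the first day of the next month,
# which is the same span the helper above produces.
def month_range(date)
  first = Date.new(date.year, date.month, 1)
  first..(first >> 1)   # Date#>> advances by whole months
end

# Hypothetical helper: render a range as a BETWEEN fragment, for illustration.
def between_clause(column, range)
  "#{column} BETWEEN '#{range.first}' AND '#{range.last}'"
end

range = month_range(Date.parse('2009-01-14'))
puts between_clause('created_at', range)
```

One thing to keep in mind: BETWEEN is inclusive on both ends, so a row stamped at exactly midnight on the first of the following month would match too.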

Rails gem download numbers

Mark had me crunch some of the download numbers for Rails gems; then Rich took the spreadsheet I came up with and summarized it nicely:

Release 2.0.0 = 1382 downloads in 428 days
Release 2.0.1 = 42205 downloads in 428 days
Release 2.0.2 = 345076 downloads in 428 days
Release 2.0.4 = 1074 downloads in 153 days
Release 2.0.5 = 1147 downloads in 123 days
Release 2.1.0 = 203446 downloads in 245 days
Release 2.1.1 = 102040 downloads in 153 days
Release 2.1.2 = 73896 downloads in 123 days
Release 2.2.2 = 156763 downloads in 92 days

Looks like the pace of downloads is picking up! Here are the monthly totals for Rails 2.2.2 so far:

Nov 2008 - 20773
Dec 2008 - 65960
Jan 2009 - 63171
Feb 2009 - 6859

Note that there are many caveats here - this doesn't account for folks who run local gem servers, for folks who get the gems from other places, for folks who roll their own, etc., etc. But, still. Also, since my SQL skills are weak, here's a reminder for me of how it was done so I can Google this up and run it again next year:

select gem_name, date_trunc('month', downloaded_at), count(*) 
 from gem_downloads where gem_name like 'rails-%.gem' 
 group by date_trunc('month', downloaded_at), gem_name 
 order by gem_name, date_trunc('month', downloaded_at)

Some military reading list trivia: A search over all the reading lists turns up only three hits for insurgency. Kind of surprising!

Compiling Subversion with a custom APR installation

I recently had to upgrade Subversion on an RHEL 4 machine; Subversion 1.1.4 was installed and that's getting a little long in the tooth. So I downloaded and tried to compile Subversion, but ran into problems because the RHEL 4 version of APR is pretty old too. Specifically, I got this error:

checking APR version... 0.9.4
wanted regexes are 0\.9\.[7-9] 0\.9\.1[0-9] 1\.
configure: error: invalid apr version found

So I poked around for a bit and figured out how the --with-apr and --with-apr-util options work. This works out well, actually, since I've got an upgraded version of Apache and now I can point Subversion to it. For posterity, here's the invocation:

./configure  \
  --with-ssl \
  --enable-shared \
  --without-berkeley-db \
  --with-apxs=/usr/local/apache2/bin/apxs \
  --with-apr=/usr/local/apache2/bin/apr-1-config \
  --with-apr-util=/usr/local/apache2/bin/apu-1-config

Hope this helps someone!

Military reading list trivia: Robert Kaplan's Imperial Grunts is on both the Navy and the Marine Corps reading lists.

RubyForge now on PostgreSQL 8.3

PostgreSQL 8.3 has been out for a while and has all sorts of nifty improvements... and I've finally gotten around to upgrading RubyForge to use it, huzzah! I did the upgrade last night and all seems well so far, but if anyone notices anything awry please let me know.

Also, if there are any PostgreSQL gurus reading this, if you have a moment please take a look at the RubyForge postgresql.conf and let me know if you see anything crazy there. The RubyForge server has around 8 GB of RAM and I figure that 3 GB or so can be safely dedicated to PostgreSQL. I've made a couple changes - bumping up shared_buffers and maintenance_work_mem and whatnot - but tuning suggestions would be welcome.

When I did this upgrade I did the usual major version dump/load... this took quite a while since the DB has around 70M records in it. Maybe next time I can do the upgrade with Slony; that would reduce the downtime window. On the other hand, it still only took 10 minutes to dump and 30 minutes to load, so, meh.

More military reading list trivia: John Keegan has three books on the various lists: Face of Battle, The Mask of Command, and Fields of Battle: The Wars for North America. I've only read the first of those; it's a great book. Ugly times at Agincourt... fighting hand to hand in a plowed field, yikes.

Sphinx, Riddle, and escaping special characters

If you're using Sphinx and Riddle, you'll notice that special characters don't get escaped. This means that if you do an extended mode search for apples -oranges, the dash in -oranges will be treated as a NOT operator. If you're accepting search terms from your users, this will lead to surprises unless you escape that and other special characters.

This functionality is built in to the Sphinx PHP API, but I didn't find it in Riddle. But here it is thanks to backreferences and the block form of gsub:

def self.escape_string(s)
  (s || "").gsub(/(:|@|-|!|~|&|"|\(|\)|\\|\|)/) { "\\#{$1}" }
end

I think that covers all the cases, but if you notice anything missing here please let me know, thanks! Feb 27 2009 update: Added : and ), thanks Brian!
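
If the alternation in that regex is hard to read, here's a sketch of an equivalent way to write it with Regexp.union. The constant and method names are my own for this example, and the character list is just the one from the regex above; I'm not claiming it's the complete set that Sphinx treats as special.

```ruby
# Same character list as the regex above, spelled out one per entry.
SPHINX_SPECIAL_CHARS = [':', '@', '-', '!', '~', '&', '"', '(', ')', '\\', '|'].freeze

# Regexp.union escapes each string, so no hand-written backslashes are needed.
SPHINX_SPECIAL_PATTERN = Regexp.union(SPHINX_SPECIAL_CHARS)

def escape_sphinx(query)
  # to_s turns nil into "" so callers can pass params straight through.
  query.to_s.gsub(SPHINX_SPECIAL_PATTERN) { |char| "\\#{char}" }
end

puts escape_sphinx('apples -oranges')
puts escape_sphinx('@title "hello world"')
```

Adding a character later is then a one-entry change to the array rather than another escaped alternative in the pattern.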

More military reading list trivia: the 2009 Air Force reading list has been released! Note that it now includes David Galula's "Counterinsurgency Warfare: Theory and Practice" which is also on the Marine Corps reading list.

Who owns what RubyGem on RubyForge?

Prompted by a support request - here's a dump of the current gem namespace ownership in the RubyForge gem index. If folks find this useful, let me know and I'll set this up to be regenerated via a daily cronjob.

More military reading list trivia: The Kite Runner is on both the Coast Guard and the Navy reading lists.

Advanced ActiveRecord screencast

I just watched the EnvyCast Advanced ActiveRecord screencast and even though it's a few months old it's still great stuff. Jason and Gregg do a nice tour of the various AR topics (STI, complicated includes, bulk data loads, etc) and they give enough detail to make each section useful. Usually I delete a screencast after I've watched it, but I think I'll keep this one around and watch it again in a few weeks just to make sure everything sank in.

The only nit I would pick is that at one point they refer to STI returning an instance of the desired subclass as "casting"; I think that's not quite what's happening there. The object isn't being implicitly cast to another type; rather, the object is created as a certain type (Song/Video/Film) and then we're calling the class method on it. Actually... I'm not sure if you _can_ cast stuff in Ruby... some classes can convert themselves to a different type (via to_i or to_s or whatever), but that's a conversion, not a cast.
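
Here's a small sketch of the distinction I'm getting at. The Media base class and the instantiate helper are hypothetical stand-ins, and the lookup is a simplified picture of the STI idea, not ActiveRecord's actual code.

```ruby
# Conversion builds a new object; the receiver keeps its own class.
text = '42'
number = text.to_i
puts text.class     # still String
puts number.class   # Integer on current Rubies

# Hypothetical stand-ins for the STI models in the screencast.
class Media; end
class Song < Media; end
class Video < Media; end

# Simplified sketch of the STI idea: look the class up by the name stored
# in the type column and instantiate that class directly. Nothing is cast.
def instantiate(type_name)
  Object.const_get(type_name).new
end

record = instantiate('Song')
puts record.class          # Song
puts record.is_a?(Media)   # true
```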

Anyhow, that's a minor point, and I definitely plan on watching their Rails 2.2 screencast. And of course the RailsEnvy podcast is awesome!

More military reading list trivia: Malcolm Gladwell has two books on the various lists - "Blink" on the Coast Guard list and "The Tipping Point" on the Navy list.

libxml-ruby and extconf failure: need libm

I was installing libxml-ruby today and got a weird error:

extconf failure: need libm

Hm, ok, libm is supplied by the glibc-devel RPM, which is installed and there it is:

$ ls -l  /usr/lib/libm.a 
-rw-r--r--  1 root root 449902 Apr 15  2008 /usr/lib/libm.a
$ rpm -qf /usr/lib/libm.a
glibc-devel-2.3.4-2.41

After flailing around for a bit I start actually looking at libxml-ruby's ext/libxml/extconf.rb. Here's the relevant bit around line 50:

unless have_library('m', 'atan')
  # more checks
end

Ok, so, have_library comes from mkmf.rb, and that method does this:

try_func(func, libs, headers, &b)

And what does try_func do? It writes out a small C program that calls that specific function and tries to compile and link it. So that should work unless.... doh, GCC isn't installed! A quick "yum install gcc" and everything works.

I think it took me so long to figure out what was happening because I was expecting a "libxml2-devel not found" kind of error, and if GCC wasn't installed I'd expect to get a "no C compiler found" error. Anyhow, hope this helps someone.
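
In hindsight, a guard along these lines would have given me the error message I was expecting. This is only a sketch: executable_on_path? is a hypothetical helper of mine, not something mkmf provides, and I'm assuming the compiler named in RbConfig is the one an extension build would use.

```ruby
require 'rbconfig'

# Hypothetical helper: is an executable with this name on the search path?
def executable_on_path?(name, path = ENV['PATH'].to_s)
  # A name that already contains a directory is checked as-is.
  return File.executable?(name) if name.include?(File::SEPARATOR)

  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# RbConfig records the compiler Ruby itself was built with, e.g. "gcc".
# The value may include flags, so keep only the first word.
compiler = RbConfig::CONFIG['CC'].to_s.split.first || 'cc'

if executable_on_path?(compiler)
  puts "found C compiler: #{compiler}"
else
  puts "no C compiler found (looked for #{compiler}); install one before building extensions"
end
```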

On a completely unrelated note, Ender's Game appears on both the Marine Corps and Navy reading lists. Who knew?

A small fix for JavaCC 4.2's code generation

JavaCC 4.2 was released a few weeks ago, but there was a small problem (spotted by Xavier Le Vourch and patch submitted by rafiyr) which results in the generated JJTree code not using Java 1.5 language constructs.

rafiyr's patch looks good to me, the tests all pass, and a rebuilt javacc.jar file is here if anyone wants to give it a try. I'll commit this in a couple days unless any problems crop up...

Another happy UltraSphinx user

I wanted to set up searching on my military reading lists site, and while the dataset is small enough that a SQL LIKE query would have done the trick, I wanted to introduce a real search engine in the mix. Thanks to a couple of helpful posts from Neeraj Kumar and Rein Henrichs, it was simple to get the basic functionality up and running. I'm using the latest UltraSphinx code from GitHub and as far as I can tell everything is working fine with Ruby 1.8.6, Passenger 2.0.5, and Rails 2.2.2.

The only thing I would add to Rein's post is a variation on his Capistrano tasks. He uses this:

namespace :sphinx do
  desc "Generate the ThinkingSphinx configuration file"
  task :configure do
    run "cd #{release_path} && rake thinking_sphinx:configure"
  end
end
after "deploy:update_code", "sphinx:configure"

and I used this:

def run_remote_rake(rake_cmd)
  run "cd #{current_path} && /usr/local/bin/rake RAILS_ENV=production #{rake_cmd}"
end
after "deploy", "ultrasphinx:configure"

My technique is slightly shorter, but his method would allow you to run that task independently of a deploy. I reckon both ways work.

RubyForge gem publishing time reduced from 1 hour to 5 minutes

The RubyForge gem indexer has a fair number of gems to deal with - upwards of 17,000. This means that a complete rebuild of the various indexes can take a while; recently it's been taking 20-25 minutes and gobbling lots of I/O resources as it runs. This slows down everything else on that machine, of course.

But happily all that's changed. Eric Hodel made some changes to the indexer that allow it to do more incremental builds of the newer indexes, while the very old index - e.g., the uncompressed 27 MB YAML file - is now only rebuilt once a day. Adding a new gem to the index and making it available to download (thanks to those rewrite rules that Jeremy Kemper wrote) can now be done in around 10 seconds!

All that to say that instead of running the cronjob that deploys newly released gems once an hour, we're now running it every five minutes. So your new gems on RubyForge will become available much more quickly than before - and the server load is actually reduced. Good times indeed!

Upgrading your RubyForge project repository to Subversion 1.5

Last week I posted about upgrading RubyForge's installed Subversion version to 1.5. All seems well with that, and new repositories are being created as Subversion 1.5 repos, but that still leaves all the old repositories at 1.4. I don't want to yank the rug out from under anyone, so if you want your repo upgraded please file a support request. I've added a new category for that type of request, and I'll run svn-populate-node-origins-index on any converted repos so you should see decent performance.

Many thanks to Emiel van de Laar for filing a request and pointing me to the Subversion documentation on this!

RubyForge email ham to spam rates

I was looking at the logwatch output from RubyForge the other day and noticed this in the Postfix log section:

   34652   Accepted                                   5.30%
  618877   Rejected                                  94.70%
--------   ------------------------------------------------
  653529   Total                                    100.00%

Almost 95% of all SMTP connections are spam! My goodness. Here are the detailed stats:

    1824   Reject relay denied                        0.29%
   24225   Reject HELO/EHLO                           3.91%
  264808   Reject unknown user                       42.79%
  149903   Reject recipient address                  24.22%
  178112   Reject sender address                     28.78%
       2   Reject server configuration error          0.00%
       3   Reject VRFY                                0.00%

Mostly unknown users - spammers doing dictionary attacks, I reckon.
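
For the record, here's the arithmetic behind those percentages as a quick sketch. The percent helper is just something I wrote for this example; the counts are copied from the logwatch output above.

```ruby
# Figures copied from the logwatch output above.
accepted = 34_652
rejected = 618_877
total    = accepted + rejected

# Share of part in whole as a percentage, rounded to two places.
def percent(part, whole)
  (part.to_f / whole * 100).round(2)
end

puts total                        # matches the Total row
puts percent(rejected, total)     # rejected share of all connections
puts percent(264_808, rejected)   # unknown-user share of the rejects
```

Note that the detailed rows are percentages of the rejected total, not of all connections; as a share of everything, the unknown-user rejects come out closer to 40.5%.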

Speaking of spam, we use the Postgrey greylisting daemon to, well, do greylisting. A few weeks ago I noticed that it was constantly blocked on I/O and was taking up a good bit of CPU as well - I guess that's to be expected when you've got a 70 MB postgrey data file. Anyhow, I moved the daemon over to another box, twiddled the Postfix configuration on one end and the iptables configuration on the other end and the Rubyforge load average dropped a good bit. Good times.

RubyForge now running Subversion 1.5

A few months ago Jonathan Rochkind noticed that we were running Subversion 1.4 on RubyForge even though 1.5 has been out for a while. After my usual period of procrastination, I buckled down to it this evening and upgraded. Like all these things, it wasn't as painful as I thought it might be. Of course, if you notice anything awry, please let me know!

The Subversion documentation has some pages which might be useful for folks wondering what Subversion 1.5 brings; there's the 'Considerations when upgrading to Subversion 1.5' document which has some general notes on the changes, and the release notes give tons of details on all the changes.

The release notes link off to a post by Malcolm Rowe about the new sharded directory structure. I'm thinking of running the reshard script on all the RubyForge Subversion repositories just on general principle... has anyone had any bad experiences with running that?

JavaCC 4.1 released!

Well, I'm about two months late in blogging about this, but, still: JavaCC 4.1 is out! It's been close to three years since the last release (JavaCC 4.0 came out in Jan 2006), and this new version has a bunch of bug fixes and code cleanups.

The lion's share of the work for this release (and over the past few years in general) was done by Paul Cager, with contributions in the form of patches, discussions, and bug reports from many others. You can download this release here, and feedback is always welcome on the mailing lists.

Finally, I must plug my JavaCC book. It's a perfect Christmas present for your friends who enjoy reading EBNF!

A small JJDoc bug fixed

Michael Iber reported a bug in JavaCC - the documentation generator, JJDoc, was erroring out on tokens defined with repetition ranges. For example, a token definition that contained <FOO>{2,3} would result in this error:

Java Compiler Compiler Version 4.1d1 (Documentation Generator Version 0.1.4)
(type "jjdoc" with no arguments for help)
Reading from file error.jj . . .
Oops: Unknown regular expression type.
Exception in thread "main" java.lang.NullPointerException
        at org.javacc.jjdoc.HTMLGenerator.print(Unknown Source)
        at org.javacc.jjdoc.HTMLGenerator.println(Unknown Source)
        at org.javacc.jjdoc.HTMLGenerator.nonterminalsStart(Unknown Source)
        at org.javacc.jjdoc.JJDoc.emitNormalProductions(Unknown Source)
        at org.javacc.jjdoc.JJDoc.start(Unknown Source)
        at org.javacc.jjdoc.JJDocMain.mainProgram(Unknown Source)
        at org.javacc.jjdoc.JJDocMain.main(Unknown Source)
        at jjdoc.main(Unknown Source)

This is fixed in CVS now (v1.16 of JJDoc.java) and I've posted an updated javacc.jar for those who encounter this problem.

Kirk Pepperdine's nice review of Generating Parsers with JavaCC

I just noticed that Kirk Pepperdine had posted a kind note about my JavaCC book. He talks about having to write new grammars rarely enough that it's helpful to have a book handy to fire up the neurons again when the task comes up, and I think that's probably a pretty common case. Sounds like the book has been useful to him there.

One thing he says is that Generating Parsers with JavaCC "...has been an invaluable source of information when I've been trying to sort out some of the more obscure error messages that I'm prone to generate." Yup, as I wrote all the grammars for the book I captured lots of error messages as they arose, looked them up in the JavaCC source code, and then added an explanation for each one. I also added those error messages to the index so that you can flip to the back of the book and look up "Multiply defined lexical tokens" or whatever.

Kirk's blog is pretty well known, I think, but I hadn't gone there for a while so it was interesting to catch up with some of his recent posts. I especially enjoyed the one on concurrent mark and sweep garbage collection. I notice in that post's comments that Greg Sporar points over to GCHisto, which has a handrolled parser for GC log files. I wonder if a JavaCC-generated parser would be much faster? Anyhow, good stuff!

Some excellent Ruby metaprogramming screencasts

Everyone probably knows about these already, but, just in case, I just watched the first episode of Dave Thomas' excellent Ruby metaprogramming screencast and it was quite nice. Dave explains and diagrams singleton (or 'ghost', as he calls them) classes very clearly; I don't think I had ever really thought through the concepts of "current class" or how self changes when expressions are being evaluated.
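As a rough illustration of those two ideas (this is not taken from the screencast; the names greeting, Example and SEEN are made up for the sketch), here is how a singleton class shows up when a method is defined on a single object, and how self differs between a class body, an instance method and a class << self block:

```ruby
# String.new gives an ordinary, unfrozen object to hang a method on.
greeting = String.new("hello")

# Defining a method on one object puts it in that object's singleton class.
def greeting.shout
  upcase + "!"
end

singleton = greeting.singleton_class
puts greeting.shout
puts singleton.instance_methods(false).inspect    # methods owned by the singleton class
puts String.instance_methods.include?(:shout)     # String itself is untouched

# Record what self is in three different places.
SEEN = {}

class Example
  SEEN[:class_body] = self          # in a class body, self is the class

  def who_am_i
    self                            # in an instance method, self is the receiver
  end

  class << self
    SEEN[:singleton_body] = self    # here self is Example's singleton class
  end
end

puts SEEN[:class_body].inspect
puts SEEN[:singleton_body].inspect
puts Example.new.who_am_i.class
```

Note that singleton_class is available on current Rubies; older interpreters would need the class << self; self; end idiom instead.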

The screencasts are reasonably priced - $5 - and the first one was only 30 minutes, so they're manageable. I think I view these in the same way I view a book - if I learn _anything_ at all from it, it's worth the price.

Dave Thomas is pretty amazing - after 7 (8? 9?) years of doing Ruby he's still soldiering away on ruby-core, working through the nuts and bolts of various Ruby 1.9 features, Ruby 1.8 bugs, and so forth. Dave, you rock!

rcov crashing with [BUG] rb_gc_mark()

While working on some Rails apps for RollStream and using Mauricio Fernandez's excellent rcov plugin we started to encounter the [BUG] rb_gc_mark(): unknown data type problem. We only saw this when we ran our controller tests; just running the unit tests wouldn't trigger it. It was a bummer, though, because we couldn't see where we were coverage-wise.

I poked around rcov for a while using Valgrind - there's no Mac OS X port, but I had a Linux VMware Fusion instance handy. After some flailing around I finally hit paydirt. This Valgrind invocation:

valgrind --tool=memcheck --error-limit=no --leak-check=no \
--leak-resolution=low \
--log-file=valgrind.out /usr/local/bin/rcov --rails \
--aggregate coverage.data --text-summary -Ilib --html \
[... lots of controller names here ...]

turned up this problem report:

==13390== Invalid write of size 4
==13390==    at 0x784BE8E: coverage_event_coverage_hook (rcovrt.c:103)
==13390==    by 0x416E85: rb_eval (eval.c:4127)
[... stack elided ...]
==13390==  Address 0x7e419e8 is not stack'd, malloc'd or (recently) free'd

rcovrt.c line 103 involves usage of a cov_array struct; I added some bounds checking like so:

$ diff -Naur rcovrt.c ~/new.rcovrt.c 
--- rcovrt.c 2008-08-28 17:50:16.000000000 -0400
+++ /Users/tom/new.rcovrt.c 2008-08-28 17:52:15.000000000 -0400
@@ -64,7 +64,9 @@
           if(!carray->ptr[sourceline])
                   carray->ptr[sourceline] = 1;
   } else {
+   if (carray && carray->len > sourceline) {
          carray->ptr[sourceline]++;
+   }
   }

   return carray;
@@ -98,7 +100,7 @@
static void
coverage_increase_counter_cached(char *sourcefile, int sourceline)
{
- if(cached_file == sourcefile && cached_array) {
+ if(cached_file == sourcefile && cached_array && cached_array->len > sourceline) {
          cached_array->ptr[sourceline]++;
          return;
  }


I rebuilt the gem, reran the coverage task, and huzzah!  It completes!

This isn't a great fix, of course - I'd much rather figure out what's wrong with the allocation of cached_array.  Perhaps someone cleverer than I can come up with a better fix.

Updated 8/27/08: Modified to document a better fix - check the cached_array->len attribute and compare it to the sourceline. 

RubyForge newscaster

You may notice news items being promoted to the RubyForge front page more rapidly in the upcoming days - and that's thanks to Ben Bleything, who has agreed to take on the role of RubyForge newscaster for a while.  Thanks Ben!

Browsing git repositories on RubyForge

Thanks to gitweb and an assist from Garry Dolley, you can now browse the Git repositories that are hosted on RubyForge.  For example, Garry's EBay4R project's scm page now links to a browsable repo.  Good times.

The speed is decent, especially considering it's a CGI script.  The GitHub browser is much nicer, but, still, 'tis a start.  Enjoy!

Faster RubyGem deploys thanks to Jeremy Kemper

A long while back Jeremy Kemper did up some nice rewrite rules for RubyForge.  The idea was that the filenames of new gems will be tracked and RubyForge will serve those gems locally rather than redirecting the request out to the mirrors.  After a long delay (sorry!) I finally implemented the PHP side of things - i.e., modified the GForge code to write out the filename when a gem is released.

So today, for example, Jamis Buck released Capistrano 2.4.1 at 1:26.  The "new gem check" cronjob ran at 1:40 and placed the gem file in the main gems directory, and the index was rebuilt by around 1:47.  And at 1:48, what should appear in the gems.rubyforge.org virtual host log:

gems.rubyforge.org 213.119.94.76 - - [27/Jun/2008:13:48:11 -0400] "GET /gems/capistrano-2.4.1.gem HTTP/1.1" 200 109056 "-" "RubyGems/1.0.1 universal-darwin-9" 

Huzzah!  The gem is being served without waiting for all the mirrors to be updated.  Now I just need to write a little code (update 6/29, done) to remove file names from the list after 24 hours or so; by that time the gem will have been sync'd out. 
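The 24-hour cleanup could look something like the sketch below. This is only an illustration in Ruby, not the actual RubyForge code (which lives on the PHP/cron side); the hash of file names to release times, the prune_served_gems helper and the second gem name are all assumptions made up for the example.

```ruby
# Hypothetical stand-in for the list of locally served gem files.
# Keys are gem file names; values are the time each gem was released.
ONE_DAY = 24 * 60 * 60  # seconds

# Keep only entries younger than 24 hours; anything older should have
# been synced out to the mirrors by then.
def prune_served_gems(released_at, now)
  released_at.select { |_name, time| now - time < ONE_DAY }
end

now = Time.utc(2008, 6, 29, 12, 0, 0)
released_at = {
  "capistrano-2.4.1.gem" => Time.utc(2008, 6, 27, 17, 26, 0),
  "example-0.0.1.gem"    => Time.utc(2008, 6, 29, 9, 0, 0)  # made-up gem
}

still_local = prune_served_gems(released_at, now)
puts still_local.keys.inspect
```

Passing the current time in as an argument, rather than calling Time.now inside the helper, keeps the pruning rule easy to check against fixed dates.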

So, to summarize, gems will now be available very soon after they're released.  Thanks Jeremy!

RubyForge email statistics

I came across the Lire log analysis tool yesterday and just had to run it on RubyForge's mail server (yay Postfix!) logs.   Some notes:

  • A week of Postfix log files for RubyForge is 300 MB, so Lire took around an hour and a half (on a dual Xeon workstation) to process the log and create the reports.
  • RubyForge sent 111,000 messages during that week; this totalled 900 MB of data.  Another 45,000 messages were either queued or were undeliverable, so the total number of messages handled in one way or another was 156,000.
  • The busiest day in terms of email count was May 23 with 31,000 emails sent.
  • The busiest day in terms of data volume was May 20 with 20,000 emails sent comprising 231 MB of data.

I'd post the entire report, but it's got a bunch of email addresses in it and I don't want to make those available to spammers - not that spammers can't figure them out anyway, but still. 

This is more or less a stock Postfix install with just a few tweaks.  I've got default_process_limit set to 500, the bounce_size_limit is 10K, and smtpd_client_connection_count_limit is set to 5.  We've also got postgrey for greylisting - the verified_senders.db file is 251 MB! - and a few header and body checks.  We run Mailman with 24 OutgoingRunners and 12 RetryRunners; all other runners are just singletons.

Things seem to be working fine, but if anyone has any suggestions on optimizing Postfix or Mailman, please pass them on to me, thanks!

Capistrano, local_repository, and deploy_via :copy

Today I was working on a Rails app that's running on the same machine that hosts the Subversion repo.  So I figured I'd use local_repository to make the deployments faster.  You know, that's where you have this in your config/deploy.rb:

set :repository,  "file:///path/to/svn/#{application}/trunk"
set :local_repository, "svn+ssh://my.server.com/path/to/#{application}/trunk"

With these settings, Capistrano checks out the code using the file:// URI which is much faster than going over svn+ssh.  Anyhow, it was failing with a svn: Unable to open an ra_local session to URL error and I couldn't get it working... until I noticed that I had left the set :deploy_via, :copy line in from a previous setup.  Doh!  Once I removed that, all was well.

We could check for this with something like this in lib/capistrano/recipes/deploy/scm/base.rb:

$ diff -Naur base.rb base.new.rb 
--- base.rb     2008-05-21 19:50:07.000000000 -0400
+++ base.new.rb 2008-05-21 19:49:59.000000000 -0400
@@ -43,6 +43,7 @@
         # Creates a new SCM instance with the given configuration options.
         def initialize(configuration={})
           @configuration = configuration
           # The next line is broken into multiple lines only for blog formatting
+          logger.info "WARNING: You probably don't want to use 'set :deploy_via, :copy'
and a :local_repository together; remove 'set :deploy_via, :copy' and things will
probably work as expected" if @configuration.variables[:deploy_via] == :copy &&
@configuration.variables[:local_repository]
         end

         # Returns a proxy that wraps the SCM instance and forces it to operate

Are there cases in which you'd actually want to use those together?  I'm not sure.  We'd probably want a better message than that, too :-)

mod_rails and Capistrano

Here's the Capistrano code I've been using with mod_rails:

namespace :deploy do
  desc "Restarting mod_rails with restart.txt"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with mod_rails"
    task t, :roles => :app do ; end
  end
end

I think that's all there is to it... as far as I can tell, start and stop don't make much sense in a mod_rails context.  I suppose you could use them to disable the app altogether somehow, although that kind of relates to this issue about maintenance.html.

Credit for the restart task goes to Jim Neath; the change I made was just to put it in the deploy namespace.  Works either way, though.

mod_rails for your staging environment

This is more of a twitter than a proper blog entry, but if you're using mod_rails for an environment other than production, don't forget to set RailsEnv staging or whatever your environment name is in your Apache configuration file.

mod_rails and "Apache 2... not found"

This is probably too easy a fix to even blog about, but, hey.  I was installing mod_rails for the first (second?) time, and got this message when I ran passenger-install-apache2-module:

 * Apache 2... not found
 * Apache 2 development headers... not found

I had Apache installed in /usr/local/apache2 and /usr/local/apache2/bin wasn't on the PATH, so passenger's PlatformInfo.find_httpd method couldn't find it.  A quick PATH=$PATH:/usr/local/apache2/bin/ && export PATH, rerun passenger-install-apache2-module, and all was well.

5/14/2008 update: If you do this you'll probably also want to add that directory to your PATH setting in /etc/profile; otherwise you'll get similar error messages when you run passenger-memory-stats.
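For anyone wondering why the PATH matters here, a command lookup of this kind boils down to walking the PATH directories in order. The sketch below is a generic illustration, not Passenger's find_httpd implementation; find_on_path is a made-up helper name.

```ruby
# Return the full path of the first executable called `name` found in the
# given PATH-style string, or nil when no listed directory contains it.
def find_on_path(name, path_string = ENV["PATH"].to_s)
  path_string.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# With the default PATH, httpd may well be missing...
puts find_on_path("httpd").inspect

# ...while appending the Apache bin directory gives the lookup a chance.
extended = [ENV["PATH"].to_s, "/usr/local/apache2/bin"].join(File::PATH_SEPARATOR)
puts find_on_path("httpd", extended).inspect
```

What the two calls print depends entirely on the machine they run on, which is the whole point: the binary is only "found" if its directory is listed.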

Problem trends with PMDReports

Ben Northrup emailed me about his PMDReports project.  To quote from his project page, "Whereas PMD generates and displays code quality statistics at a source code level, PMDReports persists and aggregates these statistics so that code quality can be viewed from a more macro, component level."

I got the latest version (0.8.2) and ran it on PMD's util package; here's the resulting "Component Quality Snapshot Today" report.  You can drill down into the rule violations, and although I just ran it on one component I could see how this could be the makings of a nice dashboard.  It would also be interesting to see those trend graphs after a few days.

Ben's past postings are worth a read; I enjoyed Why Code Quality Tools Work and Why our Programming Gods are so Unkempt?.  Especially the latter one!

RubyForge now has Git support

Folks using RubyForge have been requesting alternatives to CVS and Subversion for a while - there are feature requests in for Mercurial, Monotone, Darcs, and Git.  Of those, Git seems to be the most popular at the moment, so thanks to Garry Dolley's excellent tutorial on Gitosis, RubyForge now supports Git as one of the SCM choices.    Huzzah!   Garry not only put up the tutorial, he also volunteered EBay4R as the first project to use Git and helped me work through the initial configuration.  Thanks, Garry! 

This Git support is still pretty new, so I'm not quite sure if we've got all the right things set up.  But you should be able to start a project, select Git as the SCM, and push to and pull from a repository on RubyForge, and the Git repos are part of the nightly backup job.  There are nine projects so far that have established Git repositories, so something must be working.  Also, I've put up some notes in the RubyForge FAQ on getting started with a repository. 

Dr. Nic Williams posted some helpful notes on using RubyForge Git repositories.  He's also got some ideas on RubyForge supporting both Svn and Git for a project which are being discussed in a support ticket.

Another thing I'm thinking about is providing a sort of pseudo-SCM that lets you say "the source for this project is hosted on GitHub or Gitorious or some other place".  Does anyone think that'd be useful?

Comments and suggestions are always welcome, and as they say on the Rails Envy podcast, let's git 'er done!   

RubyForge virtual host definitions

This is a little silly, but, hey.  Until this evening, all 5500+ RubyForge virtual host definitions have been embedded in one massive httpd.conf.  So httpd.conf was up to around 1.5 MB, or in terms of line count:

$ wc -l /usr/local/apache2/conf/bkp.httpd.conf 
71045 /usr/local/apache2/conf/bkp.httpd.conf

Nice, huh?  Yikes.  Anyhow, I finally buckled down to moving all the vhosts into separate files in a subdirectory and doing an Include *.conf.  Although I've put this off for a _long_ time, it only took about 15 minutes, makes the project management code simpler, and also makes a couple things - like project deletion - a bit easier to automate.  The only tricky bit is backing up all those files... but come to think of it, 99% of them are exactly the same (or could be generated) and thus don't need to be backed up.  The only outliers are vhosts like gems.rubyforge.org, which has a RewriteMap to support gem mirroring.

So often this seems to be the case... if I can just buckle down to a job it ends up being quicker and more enjoyable than I thought.  "In every job that must be done there is an element of fun...."

All that said, if you notice something awry with a RubyForge virtual host, please drop me a line - I might have missed something, thanks!
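Since most of those vhost files are identical apart from the project name, regenerating them is straightforward. Here's a minimal sketch of the idea in Ruby; the real RubyForge code is part of GForge, and the example.org domain, the /var/www/vhosts layout and both helper names are assumptions for illustration only.

```ruby
# Render one virtual host definition. The directives are standard Apache
# ones; the domain suffix and directory layout are made up for this sketch.
def vhost_conf(project)
  <<~CONF
    <VirtualHost *:80>
      ServerName #{project}.example.org
      DocumentRoot /var/www/vhosts/#{project}
    </VirtualHost>
  CONF
end

# One file per project, so that an Include of *.conf picks them all up.
def vhost_file_name(project)
  "#{project}.conf"
end

%w[alpha beta].each do |project|
  puts "# #{vhost_file_name(project)}"
  puts vhost_conf(project)
end
```

With a generator like this, only the hand-edited outliers (such as a vhost carrying a RewriteMap) would need to go into the backup.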

Rails, Capistrano, and Ruby extension dependencies

Sometimes if I hose up a Ruby install it won't have zlib or openssl or readline compiled in, which produces problems later when I try to get a Rails app running on there.  But with a little code we can get Capistrano to check for Ruby extensions (or, really, Ruby libraries in general) via the deploy:check task. 

Here's the new RemoteDependency implementation; you can just put this in your deploy.rb:

module ::Capistrano
  module Deploy
    class RemoteDependency
      def ruby_extension(extension_name, options={})
        output_per_server = {}
        try("ruby -r#{extension_name} -e ''", options) do |ch, stream, out|
          output_per_server[ch[:server]] ||= ''
          output_per_server[ch[:server]] += out
        end
        @success = true
        errored_hosts = []
        output_per_server.each_pair do |server, output|
          next if output.empty?
          errored_hosts << server
        end
        if errored_hosts.any?
          @hosts = errored_hosts.join(', ')
          output = output_per_server[errored_hosts.first]
          @message = "The Ruby interpreter was unable to load the extension #{extension_name}"
          @success = false
        end
        self
      end
    end
  end
end

Here's how to declare instances of this new dependency type; these go in deploy.rb also.  Note that we're only checking for these dependencies on the app server; you may not even have Ruby installed on your web servers:

depend :remote, :ruby_extension, :zlib, :roles => :app
depend :remote, :ruby_extension, :openssl, :roles => :app
depend :remote, :ruby_extension, :readline, :roles => :app

And here's a sample run:

$ cap deploy:check
    triggering start callbacks for `deploy:check'
  * executing `multistage:ensure'
*** Defaulting to `production'
  * executing `production'
  * executing `deploy:check'
  * executing "test -d /var/www/myapp/releases"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "test -w /var/www/myapp"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "test -w /var/www/myapp/releases"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "which svn"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "which gem"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "ruby -rzlib -e ''"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "ruby -ropenssl -e ''"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "ruby -rreadline -e ''"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
You appear to have all necessary dependencies installed

Note the boilerplate deploy:check commands up front followed by the custom checks for the libraries.  As always, many thanks to Jamis Buck for Capistrano!

Awesome new RubyForge hardware

Last fall I posted about some sweet new hardware that Sun had sent us.  Well, they've outdone themselves this time - they sent us both a T5120 and an X4500 with 24 TB (yup, I had "GB" here at first, doh!) of storage.  Great Scott!  Wow, does that X4500 weigh a ton.

Pictures are below.... and I'll post more details on the uses of these machines as we get them set up.  Thanks Sun!!

The T5120

The X4500

Lots of RubyForge traffic

Back in Sep 2005 I posted a note about RubyForge getting 200K hits per day.  I'm not sure how I was calculating hits back then - was I including images and vhosts and gems and all that?  Dunno. 

Well, anyhow, I did some basic "grep -c" calculations and RubyForge got 350K hits on Mar 4 - about four hits per second.  That's not counting all the vhosts, not counting RubyGem downloads, not counting images and other static content stuff.  So those are pretty much all dynamic requests, where dynamic == something that hits the DB.  Some of those are bots, of course, but, meh.  Also, some of those hits result in redirects to other sites - e.g., when someone downloads a file the download is recorded in the database and then the request is redirected off to a mirror.
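The per-second figure is just the daily count divided by the number of seconds in a day; a quick sketch (hits_per_second is a made-up helper name) for both the 2005 and the current numbers:

```ruby
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Average request rate for a given daily hit count.
def hits_per_second(hits_per_day)
  hits_per_day / SECONDS_PER_DAY.to_f
end

puts hits_per_second(350_000).round(2)   # the Mar 4 figure
puts hits_per_second(200_000).round(2)   # the Sep 2005 figure
```

These are averages over the whole day, so peak-hour rates would be higher.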

So once again let me seize this opportunity to thank the RubyForge mirror providers - without you guys serving up all the long requests RubyForge would be swamped.  Thanks to Gregory Brown for monitoring the user's forum and answering questions.  Thanks to Dennis Oelkers for keeping the mirroring infrastructure running.  And thanks to Tim Bray (and Sun) for the lovely X4200 that RubyForge is currently running on.  Good times!

mod_security woes

A customer had an unfortunate experience with mod_security recently. They were getting occasional HTTP 500 responses from their Rails app on certain large pages.  ExceptionNotifier wasn't reporting any stacktraces, and a check of the actual log file didn't show any problems either.  Even connecting to the production machines and running the same request using script/console's app.get worked fine!

Finally they took an entire slice out of their production architecture and made the request while watching the Apache logs.  And lo and behold - mod_security was seeing a large response and returning a 500 code.  This was a surprise since mod_security had been (we thought) configured in "logging-only" mode.

Lessons learned are 1) load up the staging environment with lots of data to shake out any such issues, 2) study the mod_security settings to ensure it'll do what's expected, and 3) if ExceptionNotifier and app.get tell you no exception is happening in the Rails app, widen your search.

TeaClipse and JavaCC

I recently came across Matthew Hind's TeaClipse project, which he announced on his blog.  TeaClipse is an open source compiler for the Tea programming language.  You can read Matthew's thesis on the project here.  Note that if you want to use the grammar you'll need to make a couple of small modifications to SimpleNode that he describes in the thesis - and if you don't make them, the compiler will tell you about them.

I ran the grammar through JJDoc and here is the HTML'ized grammar.   It looks good and is pretty straightforward.  One minor suggestion I have is that the OPTIMIZE_TOKEN_MANAGER option is no longer available in JavaCC 4.0, so it can be removed from the grammar.  You may want to use CACHE_TOKENS instead, especially in this case since the Tea editor is reading from a file rather than processing new characters as they're typed.

Another small suggestion - JavaCC now has a facility for inlining assignment code more effectively.  For example, rather than doing this:

void foo():
{ Token t; }
{
  t = <TYPE>
  { jjtThis.firstValue = t.image; }
}

You can now do this:

void foo():
{}
{
  jjtThis.firstValue = <TYPE>.image
}

That is, you can just make the assignment inline rather than first assigning to a temporary variable and then making another assignment in a syntactic action.  This saves some space, is arguably easier to read, and is more efficient to boot.    The method_declaration nonterminal in tea.jjt is one place where this might be useful.

I'm reading through the rest of Matthew's thesis now and learning a bit about quadruples.  Good stuff!   And of course, as the author of a JavaCC book I'm always happy to see new grammars surface!

Rails and ProxyPreserveHost

This will probably be a "duh" thing for the Apache gurus out there, but it was a learning experience for me.  The other day I ran into a little Apache/Rails problem.  I had a front end server running Apache, and a back end server running Apache and proxying to a Mongrel cluster.  When the Rails app controller code would call do_redirect the client browser would be redirected to the back end server's hostname.   The solution was two-fold:

  • I added a ProxyPreserveHost On directive to the front end server

  • I added a ServerAlias frontend directive on the backend server to the definition for the virtual host that I was proxying to.  This might not have been necessary if we only had one vhost on the backend, or if we had no vhosts on the backend and were just proxying to an IP address.    But we had multiple vhosts on that backend machine, so, bob's your uncle.

Once those directives were in place the redirects worked as expected and all was well.  Apache for the win!
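To see why the Host header matters, here's a toy model in Ruby. It is not Apache or Rails code: it just assumes the app builds absolute redirect URLs from whatever Host header it receives, and that the proxy substitutes the back end's host name unless told to preserve the original. The host names and both helper methods are hypothetical.

```ruby
# Build an absolute redirect URL from the Host header the app received.
def redirect_location(host_header, path)
  "http://#{host_header}#{path}"
end

# Model of the Host header the back end sees: the client's original host
# when preserve_host is true, otherwise the back end's own host name.
def host_seen_by_backend(client_host, backend_host, preserve_host)
  preserve_host ? client_host : backend_host
end

client_host  = "frontend.example.com"  # hypothetical front end name
backend_host = "backend.internal"      # hypothetical back end name

[false, true].each do |preserve|
  host = host_seen_by_backend(client_host, backend_host, preserve)
  setting = preserve ? "On" : "Off"
  puts "ProxyPreserveHost #{setting}: #{redirect_location(host, '/login')}"
end
```

With the setting off, the redirect points at the back end host, which is the symptom described above; with it on, the browser stays on the front end name.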

More JavaCC optimizations

Paul Cager has been improving JavaCC again - this time he reduced the amount of object allocation done by a JavaCC-generated lexer. 

This began with a nicely detailed bug filed by s_fuhrm that showed that a new StringBuffer was being created for every token that's parsed when we could really just reuse one StringBuffer and clear it out after each match.  The change that Paul implemented is especially nice in that it also eliminates an if statement (a null check), so that's an extra performance boost.

The only gotcha is that if you've been using the image variable in your lexical actions, you'll start getting different results.  For example, suppose you had a lexical specification like this:

TOKEN_MGR_DECLS : {
  static StringBuffer lastB = new StringBuffer();
  static void p() {
    System.out.println("lastB is : " + lastB);
  }
}
TOKEN : {
  <A : "a"> { p(); }
  | <B : "b" (["1"-"9"])* > { p() ; lastB = image; }
}

With JavaCC 4.0, the image would never be reused and with input data of b12 a b42 you'd get output like this:

lastB is : 
lastB is : b12
lastB is : b12

In other words, that image object that lastB is referencing would stick around.  With this change in place, image (like the Matrix) is reloaded and you'll get this:

lastB is : 
lastB is : a
lastB is : b42

One solution is to use matchedToken.image instead - or you could just call toString on the image reference to get a copy of the String.    You can see an example of this on page 59 of Generating Parsers with JavaCC.  Finally, if you want to give your grammar a whirl with this change, I've posted a new javacc.jar built from the latest code in CVS here.  Enjoy!
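The underlying issue is plain old aliasing of a mutable buffer, and it can be reproduced outside JavaCC. The Ruby sketch below is only an analogue of the lexer's behavior (Ruby strings are mutable, like a StringBuffer); run_lexer and its copy_image flag are made up for the illustration, with copy_image: true standing in for taking a copy via toString.

```ruby
# Simulate a lexer that reuses one buffer for every token. Before each
# token's action we record last_b, then the "B" action stores the image,
# either as a reference to the shared buffer or as a copy.
def run_lexer(tokens, copy_image:)
  image   = String.new   # the single reused buffer
  last_b  = String.new
  printed = []
  tokens.each do |text|
    image.clear
    image << text
    printed << "lastB is : #{last_b}"
    if text.start_with?("b")
      last_b = copy_image ? image.dup : image
    end
  end
  printed
end

aliased = run_lexer(%w[b12 a b42], copy_image: false)
copied  = run_lexer(%w[b12 a b42], copy_image: true)

puts "Keeping a reference to the shared buffer:"
puts aliased
puts "Keeping a copy:"
puts copied
```

Holding the reference tracks whatever the buffer currently contains, while holding a copy preserves the value that was matched at the time.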

RubyForge - 5000 projects

Just a quick post to note that RubyForge just went over the 5K project mark - ruby-reddit was #5000. Good times, and here's to the next 5000 projects!

RubyForge and transparent proxies

Some folks have reported problems logging into RubyForge from networks behind certain ISPs' not-so-transparent proxies - Telkom and TPG users seem to have this problem, and folks in Singapore have reported it as well.  So there's been a long-standing bug on this problem. 

I think we've got a workaround for this now since I've hacked some IP subnet exclusions into the session tracking code.  It seems to be working; thanks to Sau Sheong Chang and Andy Shen for helping to test out the fix.

So anyhow, I'm closing that bug. If you have problems logging into the RubyForge site and keep getting cookie-related errors, please drop me a note and we'll sort through it.  Thanks!
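For the curious, a subnet exclusion of this sort can be sketched in a few lines. This is a Ruby illustration rather than the actual GForge PHP change, and it assumes the session code compares the client address recorded at login with the address on later requests, which is a guess on the details. The subnets are documentation-only ranges and the method names are hypothetical.

```ruby
require "ipaddr"

# Hypothetical exclusion list; these are reserved documentation ranges,
# not the real ISP subnets.
EXCLUDED_SUBNETS = [
  IPAddr.new("192.0.2.0/24"),
  IPAddr.new("198.51.100.0/24")
].freeze

# True when the client address falls inside any excluded subnet.
def excluded_from_ip_check?(client_ip)
  address = IPAddr.new(client_ip)
  EXCLUDED_SUBNETS.any? { |subnet| subnet.include?(address) }
end

# Skip the address comparison for clients behind an excluded proxy range;
# everyone else must come from the address recorded at login.
def session_valid?(session_ip, request_ip)
  return true if excluded_from_ip_check?(request_ip)
  session_ip == request_ip
end

puts session_valid?("192.0.2.10", "192.0.2.77")    # proxy range, address changed
puts session_valid?("203.0.113.5", "203.0.113.6")  # ordinary client, address changed
```

The trade-off is that sessions from the excluded ranges lose the extra address check, so the list should stay as narrow as possible.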