Tom Copeland's Recent Posts

mod_rails and Capistrano

Here's the Capistrano code I've been using with mod_rails:

namespace :deploy do
  desc "Restarting mod_rails with restart.txt"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with mod_rails"
    task t, :roles => :app do ; end
  end
end

I think that's all there is to it... as far as I can tell, start and stop don't make much sense in a mod_rails context.  I suppose you could use them to disable the app altogether somehow, although that kind of relates to this issue about maintenance.html.

Credit for the restart task goes to Jim Neath; the change I made was just to put it in the deploy namespace.  Works either way, though.

mod_rails for your staging environment

This is more of a twitter than a proper blog entry, but if you're using mod_rails for an environment other than production, don't forget to set RailsEnv staging or whatever your environment name is in your Apache configuration file.

mod_rails and "Apache 2... not found"

This is probably too easy a fix to even blog about, but, hey.  I was installing mod_rails for the first (second?) time, and got this message when I ran passenger-install-apache2-module:

 * Apache 2... not found
 * Apache 2 development headers... not found

I had Apache installed in /usr/local/apache2 and /usr/local/apache2/bin wasn't on the PATH, so passenger's PlatformInfo.find_httpd method couldn't find it.  A quick PATH=$PATH:/usr/local/apache2/bin/ && export PATH, rerun passenger-install-apache2-module, and all was well.

5/14/2008 update: If you do this you'll probably also want to add that directory to your PATH setting in /etc/profile; otherwise you'll get similar error messages when you run passenger-memory-stats.
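
The failure above boils down to a PATH search coming up empty. Here's a rough Ruby sketch of that kind of lookup. It assumes nothing about how Passenger's PlatformInfo.find_httpd is really written; find_on_path is a made-up helper for illustration only.

```ruby
# Hypothetical helper: search each directory in a PATH-style string for an
# executable with the given name. This only illustrates the idea of a PATH
# lookup; it is not Passenger's PlatformInfo code.
def find_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Prints the full path to httpd if some PATH entry contains it, or nil if
# none does (the situation described above).
puts find_on_path('httpd').inspect
```

Running it before and after adjusting PATH is a quick way to confirm that the directory change actually took.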

Problem trends with PMDReports

Ben Northrup emailed me about his PMDReports project.  To quote from his project page, "Whereas PMD generates and displays code quality statistics at a source code level, PMDReports persists and aggregates these statistics so that code quality can be viewed from a more macro, component level."

I got the latest version (0.8.2) and ran it on PMD's util package; here's the resulting "Component Quality Snapshot Today" report.  You can drill down into the rule violations, and although I just ran it on one component I could see how this could be the makings of a nice dashboard.  It would also be interesting to see those trend graphs after a few days.

Ben's past postings are worth a read; I enjoyed Why Code Quality Tools Work and Why our Programming Gods are so Unkempt?.  Especially the latter one!

RubyForge now has Git support

Folks using RubyForge have been requesting alternatives to CVS and Subversion for a while - there are feature requests in for Mercurial, Monotone, Darcs, and Git.  Of those, Git seems to be the most popular at the moment, so thanks to Garry Dolley's excellent tutorial on Gitosis, RubyForge now supports Git as one of the SCM choices.    Huzzah!   Garry not only put up the tutorial, he also volunteered EBay4R as the first project to use Git and helped me work through the initial configuration.  Thanks, Garry! 

This Git support is still pretty new, so I'm not quite sure if we've got all the right things set up.  But you should be able to start a project, select Git as the SCM, and push to and pull from a repository on RubyForge, and the Git repos are part of the nightly backup job.  There are nine projects so far that have established Git repositories, so something must be working.  Also, I've put up some notes in the RubyForge FAQ on getting started with a repository. 

Dr. Nic Williams posted some helpful notes on using RubyForge Git repositories.  He's also got some ideas on RubyForge supporting both Svn and Git for a project, which are being discussed in a support ticket.

Another thing I'm thinking about is providing a sort of pseudo-SCM that lets you say "the source for this project is hosted on GitHub or Gitorious or some other place".  Does anyone think that'd be useful?

Comments and suggestions are always welcome, and as they say on the Rails Envy podcast, let's git 'er done!   

RubyForge virtual host definitions

This is a little silly, but, hey.  Until this evening, all 5500+ RubyForge virtual host definitions have been embedded in one massive httpd.conf.  So httpd.conf was up to around 1.5 MB, or in terms of line count:

$ wc -l /usr/local/apache2/conf/bkp.httpd.conf 
71045 /usr/local/apache2/conf/bkp.httpd.conf

Nice, huh?  Yikes.  Anyhow, I finally buckled down to moving all the vhosts into separate files in a subdirectory and doing an Include *.conf.  Although I've put this off for a _long_ time, it only took about 15 minutes, makes the project management code simpler, and also makes a couple things - like project deletion - a bit easier to automate.  The only tricky bit is backing up all those files... but come to think of it, 99% of them are exactly the same (or could be generated) and thus don't need to be backed up.  The only outliers are vhosts like gems.rubyforge.org, which has a RewriteMap to support gem mirroring.
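
Since nearly all of those vhost files differ only by project name, regenerating them from a template is one way to avoid backing them up. A minimal sketch, assuming a made-up template: the directives, the example.org domain, and the paths are placeholders, not RubyForge's real settings.

```ruby
# Render a vhost definition for one project. The directives and paths are
# placeholders for illustration, not RubyForge's actual configuration.
def vhost_for(project)
  <<~CONF
    <VirtualHost *:80>
      ServerName #{project}.example.org
      DocumentRoot /var/www/projects/#{project}
    </VirtualHost>
  CONF
end

# Write one .conf file per project into dir, ready to be picked up by an
# Include of that directory's *.conf files.
def write_vhosts(projects, dir)
  projects.each do |project|
    File.write(File.join(dir, "#{project}.conf"), vhost_for(project))
  end
end

puts vhost_for('myproject')
```

Outliers with hand-written directives (a RewriteMap, say) would still need their own files and their own backups.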

So often this seems to be the case... if I can just buckle down to a job it ends up being quicker and more enjoyable than I thought.  "In every job that must be done there is an element of fun...."

All that said, if you notice something awry with a RubyForge virtual host, please drop me a line - I might have missed something, thanks!

Rails, Capistrano, and Ruby extension dependencies

Sometimes if I hose up a Ruby install it won't have zlib or openssl or readline compiled in, which produces problems later when I try to get a Rails app running on there.  But with a little code we can get Capistrano to check for Ruby extensions (or, really, Ruby libraries in general) via the deploy:check task. 

Here's the new RemoteDependency implementation; you can just put this in your deploy.rb:

module ::Capistrano
  module Deploy
    class RemoteDependency
      def ruby_extension(extension_name, options={})
        output_per_server = {}
        try("ruby -r#{extension_name} -e ''", options) do |ch, stream, out|
          output_per_server[ch[:server]] ||= ''
          output_per_server[ch[:server]] += out
        end
        # Any output from "ruby -r<ext> -e ''" means the require failed.
        @success = true
        errored_hosts = []
        output_per_server.each_pair do |server, output|
          next if output.empty?
          errored_hosts << server
        end
        if errored_hosts.any?
          @hosts = errored_hosts.join(', ')
          @message = "The Ruby interpreter was unable to load the extension #{extension_name}"
          @success = false
        end
        self
      end
    end
  end
end

Here's how to declare instances of this new dependency type; these go in deploy.rb also.  Note that we're only checking for these dependencies on the app server; you may not even have Ruby installed on your web servers:

depend :remote, :ruby_extension, :zlib, :roles => :app
depend :remote, :ruby_extension, :openssl, :roles => :app
depend :remote, :ruby_extension, :readline, :roles => :app

And here's a sample run:

$ cap deploy:check
    triggering start callbacks for `deploy:check'
  * executing `multistage:ensure'
*** Defaulting to `production'
  * executing `production'
  * executing `deploy:check'
  * executing "test -d /var/www/myapp/releases"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "test -w /var/www/myapp"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "test -w /var/www/myapp/releases"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "which svn"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "which gem"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "ruby -rzlib -e ''"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "ruby -ropenssl -e ''"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
  * executing "ruby -rreadline -e ''"
    servers: ["myserver.com"]
    [myserver.com] executing command
    command finished
You appear to have all necessary dependencies installed

Note the boilerplate deploy:check commands up front followed by the custom checks for the libraries.  As always, many thanks to Jamis Buck for Capistrano!
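
The same "ruby -r<name> -e ''" trick is handy for checking a local install before deploying anything. A rough sketch follows; note that it looks at the exit status of the child interpreter rather than at its output, which is a different test from the one the Capistrano code above uses, and loadable? is a made-up helper name.

```ruby
require 'open3'
require 'rbconfig'

# Run a fresh Ruby interpreter that does nothing but require the library,
# and report whether it exited cleanly. Uses the exit status instead of the
# captured output to decide.
def loadable?(library, ruby = RbConfig.ruby)
  _output, status = Open3.capture2e(ruby, "-r#{library}", '-e', '')
  status.success?
end

%w[zlib openssl readline].each do |library|
  puts "#{library}: #{loadable?(library) ? 'ok' : 'MISSING'}"
end
```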

Awesome new RubyForge hardware

Last fall I posted about some sweet new hardware that Sun had sent us.  Well, they've outdone themselves this time - they sent us both a T5120 and an X4500 with 24 TB (yup, I had "GB" here at first, doh!) of storage.  Great scott!  Wow, does that X4500 weigh a ton.

Pictures are below.... and I'll post more details on the uses of these machines as we get them set up.  Thanks Sun!!

The T5120

The X4500

Lots of RubyForge traffic

Back in Sep 2005 I posted a note about RubyForge getting 200K hits per day.  I'm not sure how I was calculating hits back then - was I including images and vhosts and gems and all that?  Dunno. 

Well, anyhow, I did some basic "grep -c" calculations and RubyForge got 350K hits on Mar 4 - about four hits per second.  That's not counting all the vhosts, not counting RubyGem downloads, not counting images and other static content stuff.  So those are pretty much all dynamic requests, where dynamic == something that hits the DB.  Some of those are bots, of course, but, meh.  Also, some of those hits result in redirects to other sites - e.g., when someone downloads a file the download is recorded in the database and then the request is redirected off to a mirror.
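
The arithmetic behind "about four hits per second" is just the daily count divided by the seconds in a day. A small sketch, with a line counter in the spirit of "grep -c"; the date string you'd match on depends on your log format, so that part is an assumption.

```ruby
# Average requests per second for a daily hit count.
def hits_per_second(hits_per_day)
  hits_per_day / 86_400.0   # 24 * 60 * 60 seconds in a day
end

# Count lines containing a date string, the same idea as "grep -c".
def count_hits(lines, date)
  lines.count { |line| line.include?(date) }
end

puts hits_per_second(350_000).round(2)   # 4.05
```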

So once again let me seize this opportunity to thank the RubyForge mirror providers - without you guys serving up all the long requests RubyForge would be swamped.  Thanks to Gregory Brown for monitoring the user's forum and answering questions.  Thanks to Dennis Oelkers for keeping the mirroring infrastructure running.  And thanks to Tim Bray (and Sun) for the lovely X4200 that RubyForge is currently running on.  Good times!

mod_security woes

A customer had an unfortunate experience with mod_security recently. They were getting occasional HTTP 500 responses from their Rails app on certain large pages.  ExceptionNotifier wasn't reporting any stacktraces, and a check of the actual log file didn't show any problems either.  Even connecting to the production machines and running the same request using script/console's app.get worked fine!

Finally they took an entire slice out of their production architecture and made the request while watching the Apache logs.  And lo and behold - mod_security was seeing a large response and returning a 500 code.  This was a surprise since mod_security had been (we thought) configured in "logging-only" mode.

Lessons learned are 1) load up the staging environment with lots of data to shake out any such issues, 2) study the mod_security settings to ensure it'll do what's expected, and 3) if ExceptionNotifier and app.get tell you no exception is happening in the Rails app, widen your search.

TeaClipse and JavaCC

I recently came across Matthew Hind's TeaClipse project, which he announced on his blog.  TeaClipse is an open source compiler for the Tea programming language.  You can read Matthew's thesis on the project here.  Note that if you want to use the grammar you'll need to make a couple of small modifications to SimpleNode that he describes in the thesis - and if you don't make them, the compiler will tell you about them.

I ran the grammar through JJDoc and here is the HTML'ized grammar.  It looks good and is pretty straightforward.  One minor suggestion I have is that the OPTIMIZE_TOKEN_MANAGER option is no longer available in JavaCC 4.0, so it can be removed from the grammar.  You may want to use CACHE_TOKENS instead, especially in this case since the Tea editor is reading from a file rather than processing new characters as they're typed.

Another small suggestion - JavaCC now has a facility for inlining assignment code more effectively.  For example, rather than doing this:

void foo():
{ Token t; }
{
  t = <TYPE>
  { jjtThis.firstValue = t.image; }
}

You can now do this:

void foo():
{}
{
  jjtThis.firstValue = <TYPE>.image
}

That is, you can just make the assignment inline rather than first assigning to a temporary variable and then making another assignment in a syntactic action.  This saves some space, is arguably easier to read, and is more efficient to boot.    The method_declaration nonterminal in tea.jjt is one place where this might be useful.

I'm reading through the rest of Matthew's thesis now and learning a bit about quadruples.  Good stuff!   And of course, as the author of a JavaCC book I'm always happy to see new grammars surface!

Rails and ProxyPreserveHost

This will probably be a "duh" thing for the Apache gurus out there, but it was a learning experience for me.  The other day I ran into a little Apache/Rails problem.  I had a front end server running Apache, and a back end server running Apache and proxying to a Mongrel cluster.  When the Rails app controller code would call do_redirect the client browser would be redirected to the back end server's hostname.   The solution was two-fold:

  • I added a ProxyPreserveHost On directive to the front end server

  • I added a ServerAlias frontend directive on the backend server to the definition for the virtual host that I was proxying to.  This might not have been necessary if we only had one vhost on the backend, or if we had no vhosts on the backend and were just proxying to an IP address.    But we had multiple vhosts on that backend machine, so, bob's your uncle.

Once those directives were in place the redirects worked as expected and all was well.  Apache for the win!

More JavaCC optimizations

Paul Cager has been improving JavaCC again - this time he reduced the amount of object allocation done by a JavaCC-generated lexer. 

This began with a nicely detailed bug filed by s_fuhrm, which showed that a new StringBuffer was being created for every token parsed when we could really just reuse one StringBuffer and clear it out after each match.  The change that Paul implemented is especially nice in that it also eliminates an if statement (a null check), so that's an extra performance boost.

The only gotcha is that if you've been using the image variable in your lexical actions, you'll start getting different results.  For example, suppose you had a lexical specification like this:

TOKEN_MGR_DECLS : {
  static StringBuffer lastB = new StringBuffer();
  static void p() {
    System.out.println("lastB is : " + lastB);
  }
}
TOKEN : {
  <A : "a"> { p(); }
  | <B : "b" (["1"-"9"])* > { p() ; lastB = image; }
}

With JavaCC 4.0, the image would never be reused and with input data of b12 a b42 you'd get output like this:

lastB is : 
lastB is : b12
lastB is : b12

In other words, that image object that lastB is referencing would stick around.  With this change in place, image (like the Matrix) is reloaded and you'll get this:

lastB is : 
lastB is : a
lastB is : b42

One solution is to use matchedToken.image instead - or you could just call toString on the image reference to get a copy of the String.    You can see an example of this on page 59 of Generating Parsers with JavaCC.  Finally, if you want to give your grammar a whirl with this change, I've posted a new javacc.jar built from the latest code in CVS here.  Enjoy!
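
If the aliasing is hard to picture, here's the same effect in miniature using a mutable Ruby String in place of the StringBuffer. This is only an analogy, not JavaCC code: run_lexer is a made-up stand-in for the generated token manager, and the copy option plays the role of calling toString on image before keeping it.

```ruby
# Simulate a lexer that hands the same mutable buffer to every lexical
# action (reuse: true) or a fresh one per token (reuse: false).
# copy: true mimics taking a copy of the image before storing it.
def run_lexer(tokens, reuse:, copy: false)
  seen = []
  last_b = String.new('')
  image = String.new('')
  tokens.each do |token|
    if reuse
      image.clear
      image << token
    else
      image = String.new(token)
    end
    seen << last_b.dup                 # what p() would print
    last_b = (copy ? image.dup : image) if token.start_with?('b')
  end
  seen
end

p run_lexer(%w[b12 a b42], reuse: false)             # ["", "b12", "b12"]
p run_lexer(%w[b12 a b42], reuse: true)              # ["", "a", "b42"]
p run_lexer(%w[b12 a b42], reuse: true, copy: true)  # ["", "b12", "b12"]
```

The middle line is the surprise: the stored reference now points at a buffer the lexer keeps rewriting, so it shows whatever token was matched most recently.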

RubyForge - 5000 projects

Just a quick post to note that RubyForge just went over the 5K project mark - ruby-reddit was #5000. Good times, and here's to the next 5000 projects!

RubyForge and transparent proxies

Some folks have reported problems logging into RubyForge from networks behind certain ISPs' not-so-transparent proxies - Telkom and TPG users seem to have this problem, and folks in Singapore have reported it as well.  So there's been a long-standing bug on this problem. 

I think we've got a workaround for this now since I've hacked some IP subnet exclusions into the session tracking code.  It seems to be working; thanks to Sau Sheong Chang and Andy Shen for helping to test out the fix.
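
For the curious, a subnet exclusion check can be quite small. The sketch below is in Ruby rather than the site's own code, uses documentation-only address ranges, and assumes the exclusion means "don't tie the session to the client IP for these networks", which this post doesn't actually spell out. All names are hypothetical.

```ruby
require 'ipaddr'

# Hypothetical exclusion list using documentation-only address ranges; the
# real subnets in the RubyForge session code are not shown in this post.
def excluded_subnets
  [IPAddr.new('192.0.2.0/24'), IPAddr.new('198.51.100.0/24')]
end

# True if the client address falls inside any excluded subnet. The assumed
# use is to skip an IP comparison for sessions coming from those networks.
def ip_check_excluded?(client_ip, subnets = excluded_subnets)
  address = IPAddr.new(client_ip)
  subnets.any? { |subnet| subnet.include?(address) }
end

puts ip_check_excluded?('192.0.2.17')    # true
puts ip_check_excluded?('203.0.113.5')   # false
```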

So anyhow, I'm closing that bug. If you have problems logging into the RubyForge site and keep getting cookie-related errors, please drop me a note and we'll sort through it.  Thanks!

Updates for the Ruby PostgreSQL native driver

UPDATE: Jeff Davis has released a new PostgreSQL native driver gem, so just doing a gem install postgres should work fine again; no need for the hackery that I had suggested in this post.  Thanks to Jeff for getting a new gem out there so quickly!

Josh Berkus let me and some other folks know that the gem that you get with "gem install postgres" won't work with PostgreSQL 8.3.  That gem (postgres-0.7.1 gem) has been languishing for a while and is finally, well, gone.  So if you do a "gem install postgres" you'll see this:

$ gem install postgres
ERROR:  While executing gem ... (Gem::GemNotFoundException)
    Could not find postgres (> 0) in the repository

OK, so, now what?  Well, Jeff Davis (who originally noticed the breakage) has been maintaining the driver in a different RubyForge project, ruby-pg.  So to use his Postgres library, go to the ruby-pg files page, download the latest, untar it, and do a ruby extconf.rb && make && sudo make install.  Huzzah!

I noticed that when I built and installed the library, I ran it with Rails and got the error MissingSourceFile: no such file to load -- postgres.  I had to tweak ActiveRecord to pick up the new library, specifically, I edited line 7 of  activerecord-some_version_number_like_1.14.4/lib/active_record/connection_adapters/postgresql_adapter.rb to read require_library_or_gem 'pg' unless self.class.const_defined?(:PGconn).  There's probably a better way to do this, please feel free to chime in with a comment or just email me or whatever... thanks!
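
One gentler shape for that require line is "try the new name, fall back to the old one". A sketch of the idea in plain Ruby; require_first_available is a made-up helper, not part of ActiveRecord or RubyGems, and the adapter file would still need editing to call it.

```ruby
# Try each library name in order and return the first one that loads.
# Raises LoadError if none are available.
def require_first_available(*names)
  names.each do |name|
    begin
      require name
      return name
    rescue LoadError
      next
    end
  end
  raise LoadError, "none of these libraries could be loaded: #{names.join(', ')}"
end

# In the situation above the call would be something like:
#   require_first_available('pg', 'postgres')
puts require_first_available('no_such_library_for_demo', 'set')   # set
```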

As a last resort, the "postgres-pr" gem is still available.  That's the pure Ruby driver, so it's very slow.  I haven't tried it with PostgreSQL 8.3 either, so, who knows.  And as a last last resort, I stuck a gzip'd copy of postgres-0.7.1.gem here, so you can always just grab that too.

Thanks to Jeff for keeping this driver going, and thanks to Josh for the prodding!

Finding copy/pasted code in a Rails app

Ryan Owens and I were looking at a Rails app a few days ago; we knew that there was some view code that had been copied and pasted but we weren't sure exactly where it was.  When I was doing Java fulltime I had worked on a copy/paste detector, CPD; this handy utility has support for several languages including - thanks to Zev Blut - Ruby!

So we went to the CPD web page, fired it up via the Java Web Start link, had it scan app/controllers/ as a trial run, and lo and behold, it found a couple of duplicate methods!  Then we twiddled the settings to check the .rhtml files - we selected the "by extension" setting from the "language" dropdown, and put "rhtml" in the "extension" text field.  We kicked it off and what do you know - it found the exact duplication that we were looking for.  A few minutes later we had cleaned all that up and checked it in.  Good times.

The Ruby support isn't as good as it could be; it'd be nicer if we had a real JavaCC tokenizer for Ruby and it'd also be nice if we had one for ERB.  Right now we just skip spaces and comments and pretty much every other character is seen as a token.  So someone should get my JavaCC book and do this work.  If I get motivated, perhaps that someone will be me....
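
To give a feel for what duplicate detection involves, here's a deliberately naive sketch that looks for repeated runs of lines. It is not how CPD works internally (CPD compares tokens, not lines), and the file names and window size are made up for the example.

```ruby
# Find runs of `window` consecutive lines that occur more than once.
# Lines are stripped and blank lines skipped. The reported index counts
# non-blank lines, not original line numbers. This is a naive sketch,
# not CPD's actual algorithm.
def duplicate_blocks(sources, window: 3)
  seen = Hash.new { |hash, key| hash[key] = [] }
  sources.each do |name, text|
    lines = text.lines.map(&:strip).reject(&:empty?)
    lines.each_cons(window).with_index do |chunk, index|
      seen[chunk] << [name, index]
    end
  end
  seen.select { |_chunk, places| places.size > 1 }
end

sources = {
  'a.rhtml' => "<p>\n<%= name %>\n</p>\n<hr>\n",
  'b.rhtml' => "<div>\n<p>\n<%= name %>\n</p>\n"
}
duplicate_blocks(sources).each do |chunk, places|
  puts "#{chunk.inspect} appears at #{places.inspect}"
end
```

A real tokenizer would catch duplicates that differ only in layout, which a line-based comparison like this one misses.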

PMD 4.1 released!

Xavier Le Vourch has released PMD 4.1!   PMD is a Java static code analysis utility - it finds unused code, design issues, duplicated code, and much more.  I posted earlier about the release candidate, and the 4.1 final is pretty much the same code - 14 new rules, a bunch of bug fixes, fewer false positives, and the IntelliJ IDEA integration works again.  You can download this release or view the changelog.

On the CPD (copy/pasted code detector) side of the house there's better Ruby support and a new Microsoft Visual Studio output format.  I used the Ruby CPD checker to scan a Rails app just yesterday and it worked great - it found a couple of duplicated methods across some helpers and controllers.  You can even check .rhtml files using the "Any language" support and find duplicated code there.

PMD continues to develop nicely; I hope to post on the type resolution inner workings soon.  Go PMD!   And of course, go PMD book!  Now a mere $20.

JDeveloper PMD plugin release

The PMD plugin for JDeveloper has been brought up to date thanks to Torsten Kleiber. Here are the release notes, and you can download the plugin here. From the release notes, it looks like this version of the plugin should work in JDeveloper 11.1.1 Developer Preview 2. Also, you may be able to upgrade the plugin from within JDeveloper itself - there's a new zip file on the update site (http://pmd.sf.net), but I haven't tried that. I'll try to give it a whirl later.

PMD 4.1 rc1 - 14 new rules

Xavier Le Vourch has released PMD 4.1 rc1!    PMD is a Java code analysis utility that includes a bunch of rules and makes it easy to write custom rules to meet your needs.

This release has a whopping 14 new rules, including some EJB sanity checks like DoNotCallSystemExit and StaticEJBFieldShouldBeFinal, a design ruleset addition with EmptyMethodInAbstractClassShouldBeAbstract, and a check for a specific embedded literal type with AvoidUsingHardCodedIP.

PMD 4.1rc1 includes a pile of bug fixes and features.  False positives have been eliminated from BooleanInstantiation, UnusedPrivateField, SingularField, and various other rules.  CloneMethodMustImplementCloneable no longer throws exceptions when checking enums.  There's a new "nicehtml" renderer, Jaxen has been upgraded to 1.1.1, and CPD's Ruby support is better.

Xavier, Allan, Wouter, and Romain have also made good progress on type resolution... but I'll write that up in another post.  Go PMD!    And of course, go PMD book!  Now a mere $20.

RubyForge on Rails

Since I've been doing a lot of Rails lately, I've been talking with Rich Kilmer and Ryan Owens about porting RubyForge to Rails.  Right now RubyForge runs on a somewhat customized version of GForge.  Some noodlings on what a rewrite would involve:

  • One cost would be that we would have our own codebase that we'd then have to support.  And one benefit is that we'd have our own codebase that we'd then be more able to support :-)  The performance would probably be about the same; the current site is pretty snappy. 
  • We're currently running GForge 4.0.2.  It would make sense to upgrade to the newest release (4.5.x or even 4.6 beta) in case any other GForge users wanted to follow suit.
  • We could probably port to Rails by doing an ActiveRecord model on the existing schema.  Doing an AR model first would let us do some of the peripheral parts - like the RSS feeds and the cronjobs and such - without affecting the core code.
  • This would take a fair bit of effort, and a lot of the work would be stuff that I'm not good at - user interface, layouts, CSS.   Maybe we could make the job simpler by just supporting one theme and then re-adding the other themes as we had time.  Ryan could do this stuff better than I, but it'd be in his spare time, which is already in high demand.

A rewrite would mean a lot of work in people's spare time and no one would see any initial benefit.  But it would make it easier for us to add more AJAXy niftiness and other features as needed.  But also.... it would probably be fun.  That might tip the balance :-)

JavaCC/JJTree file generation bug fix

For a long time there's been an annoying bug with JJTree and Windows.  JJTree generates source files with the directory path in a comment header, and if one of the subdirectory names starts with a "u" it ends up being read as a Unicode escape.  This causes problems about ten seconds down the road when you try to compile the code.

For example, suppose you're generating the JJTree files into my\utils\dir.  You'll get a comment header like this in your Constants file:

/* Generated By:JJTree: Do not edit this line.  my\utils\dir\SomeParserTreeConstants.java */

A \u begins a Java Unicode escape sequence, and \utils is, of course, an invalid escape sequence.  Thus the compiler chokes on this invalid sequence when it tries to lex this file. 

Well, thanks to Paul Cager, this bug is fixed in CVS.  He actually just avoided the problem by just printing the generated file name in the comment header rather than including the whole directory path.  I've uploaded an updated JavaCC 4.0 jar file here if you want to give it a try.   Enjoy, and thanks to Paul for cleaning this up!
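
The shape of that fix is easy to show: keep only the file name, so no directory name can smuggle a \u into the comment. This is a Ruby sketch along those lines, not JJTree's actual Java code, and both helper names are made up.

```ruby
# Return just the file name from a path that may use either "/" or "\"
# as a separator, so no "\u..." sequence from a directory name can end up
# in the generated comment.
def header_file_name(path)
  path.split(%r{[\\/]}).last
end

def generated_header(path)
  "/* Generated By:JJTree: Do not edit this line.  #{header_file_name(path)} */"
end

puts generated_header('my\utils\dir\SomeParserTreeConstants.java')
```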

Using JavaCC?  Check out my JavaCC book!

Better JJTree Visitors

JavaCC comes with a built in tree builder, JJTree.  One of the nice bits about JJTree is that it will generate a visitor implementation so you can easily traverse the abstract syntax tree.  However, the interface that JJTree generates looks like this:

public interface FooVisitor {
  public Object visit(SimpleNode node, Object data);
  public Object visit(ASTBar node, Object data);
}

Since that second parameter is an Object, you're always downcasting if you want to use it.  For example, if you're passing around a Map, you'll need to do something like this in your visitor:

public class MyVisitor extends MyVisitorAdapter {
  public Object visit(ASTBar node, Object data) {
    Map myMap = (Map)data;
    // do stuff with the Map
    return data;
  }
}

But thanks to Paul Cager, JJTree has a new VISITOR_DATA_TYPE option.  Just set it like this:

options {
  VISITOR=true;
  MULTI=true;
  VISITOR_DATA_TYPE="java.util.Map";
}

With this option in place, the generated visitor interface looks like this:

public interface FooVisitor {
  public Object visit(SimpleNode node, java.util.Map data);
  public Object visit(ASTBar node, java.util.Map data);
}

And now your visitor implementation can look like this - no cast necessary!

public class MyVisitor extends MyVisitorAdapter {
  public Object visit(ASTBar node, java.util.Map data) {
    // do stuff with the Map
    return data;
  }
}

Note that you'll want to declare a fully-qualified type, otherwise you'll need to go back and fill in the import statements.  So it's more readable, more type-safe, and it probably yields a small runtime performance improvement due to the eliminated cast.  You can grab a javacc.jar built from CVS here if you want to give it a whirl.  Props to Paul for this and his other recent bug fixes!

Using JavaCC or JJTree?  Get the JavaCC book!

A JavaCC/JJTree bug fixed

Paul Cager has been working on various JavaCC and JJTree bugs lately.  Just recently he fixed a rather annoying bug; in JavaCC 4.0, the OUTPUT_DIRECTORY option setting wasn't copied from the .jjt file into the .jj file.  So you'd have a grammar like this:

$ cat foo.jjt 
options {
  OUTPUT_DIRECTORY="foobar";
}
PARSER_BEGIN(Foo)
public class Foo {}
PARSER_END(Foo)
void a() : {} {" a" }

And running JJTree on it would result in an error like this:

$ jjtree foo.jjt 
Java Compiler Compiler Version 4.0 (Tree Builder)
(type "jjtree" with no arguments for help)
Reading from file foo.jjt . . .
File "foobar/Node.java" does not exist.  Will create one.
Exception in thread "main" java.lang.Error: java.io.FileNotFoundException:
foobar/Node.java (No such file or directory)
        at org.javacc.jjtree.NodeFiles.ensure(Unknown Source)
        at org.javacc.jjtree.NodeFiles.ensure(Unknown Source)
        at org.javacc.jjtree.NodeScope.insertOpenNodeCode(Unknown Source)

And if you did create the "foobar" directory and run JJTree again, the "foo.jj" file would be created in the working directory and wouldn't have the OUTPUT_DIRECTORY option.  Booo.  However, with Paul's changes, it now works fine:

$ ~/javacc/bin/jjtree foo.jjt 
Java Compiler Compiler Version 4.1d1 (Tree Builder)
(type "jjtree" with no arguments for help)
Reading from file foo.jjt . . .
Warning: Output directory "foobar" does not exist. Creating the directory.
File "foobar/Node.java" does not exist.  Will create one.
File "foobar/SimpleNode.java" does not exist.  Will create one.
Annotated grammar generated successfully in foobar/foo.jj

And the OUTPUT_DIRECTORY option is preserved:

$ grep OUTPUT_DIRECTORY foobar/foo.jj 
  OUTPUT_DIRECTORY="foobar";

Good stuff!  I think we're getting close to a 4.1 release, which will be nice since it'll have lots of improvements to the Java 1.5 code that JavaCC generates, and it'll be the first official release that's BSD licensed.     Hopefully we can get in a few more bug fixes and then get this release out the door.

And, of course, here's a gratuitous plug for my JavaCC book!

A new domain specific language book - with JavaCC

I came across a post on Warner Onstine's blog; he's working on a book on DSLs for the Pragmatic Programmers.  Sounds like a great project, and I'm glad to see that they're including a chapter on what are usually known as "external DSLs" with ANTLR and JavaCC.

He has a comment about how well PragProg's book production system is working.  I used what I think is a similar system - Docbook plus a bunch of customized XSLT and Ruby utilities - to put together Generating Parsers with JavaCC and it's worked out quite nicely too.  It's great to be able to run all the code examples and have both the code and the output plugged into the book's content - no more copying and pasting code!  I was even able to do callouts, which I only used in one or two places, but it's handy to have the option.

New hardware for RubyForge.

Thanks to our friends at Sun, we've got a spiffy new X4200 M2 server (with a prodigious 8 GB of RAM) for RubyForge.  It's racked up and ready to go; we're migrating it this evening.  This will mean a bit of downtime, but hopefully not too much.  I'll post updates here.

0246 EDT: Most things should be in place.  If you notice anything awry, please let me know, thanks!

More reliable gem installs

You may have seen this message from "gem install" before:

$ gem install rails
ERROR:  While executing gem ... (Gem::GemNotFoundException)
    Could not find rails (> 0) in any repository

Then when you run "gem install rails" five minutes later, it installs just fine.  This was due to the way we were rebuilding the gem index on RubyForge - we were doing it "in place", so that the current index would be overwritten and then populated over the course of the build.  These take a fair while - 10 minutes or so - and during that time the index was essentially empty.  Booooo.

Well, no longer.  Eric Hodel has twiddled the gem index builder to build it in a temporary directory and then move it in place.  So those gem index outages should be a thing of the past.  Thanks Eric!
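
The general pattern is "build somewhere else, then swap". Here's a single-file simplification in Ruby; Eric's actual change builds into a temporary directory, so treat this as an illustration of the idea rather than the indexer's code. publish_index is a made-up name, and the atomic rename assumes a POSIX system with both paths on the same filesystem.

```ruby
require 'fileutils'
require 'tmpdir'

# Build the new index contents off to the side, then rename it over the
# live file. On POSIX systems File.rename is atomic when both paths are on
# the same filesystem, so readers see either the old index or the new one.
def publish_index(live_path, contents)
  temp_path = "#{live_path}.tmp.#{Process.pid}"
  File.write(temp_path, contents)     # the slow "build" step happens here
  File.rename(temp_path, live_path)   # the swap is a single quick step
ensure
  FileUtils.rm_f(temp_path) if temp_path
end

Dir.mktmpdir do |dir|
  live = File.join(dir, 'index.txt')
  File.write(live, "old index\n")
  publish_index(live, "new index\n")
  puts File.read(live)   # new index
end
```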

Generating Parsers With JavaCC reviewed

I just noticed that Andy Glover has posted a nice review of my JavaCC book. He's got some kind words to say about the book, which is especially meaningful to me since I know that Andy travels all over the place giving talks and so he's got a good overview of the kinds of problems that folks are dealing with these days. As he says, DSLs are the new hotness and tools like JavaCC can help make the front end part of a DSL easier to build. Thanks Andy!

An LOLCODE interpreter using JJTree

Here's something that's lingered in my blog TODO queue for far too long.  Brian Egge, formerly of ThoughtWorks and now of Macquarie Bank Limited, wrote an LOLCODE interpreter with JavaCC and JJTree.  Behold:

$ bin/lol.sh test/samples/hello_world.LOL 
HAI WORLD!

I salute his efforts in this crucial field and humbly submit a small patch:

Index: src/com/lolcode/parser/LolCode.jjt
===================================================================
--- src/com/lolcode/parser/LolCode.jjt  (revision 20)
+++ src/com/lolcode/parser/LolCode.jjt  (working copy)
@@ -141,7 +141,7 @@
{}
{
   <IM> <IN> <YR> <LOOP> <EOL>
-  ( LOOKAHEAD(2147483647) Statement() )+
+  ( LOOKAHEAD(Statement()) Statement() )+
   <IM> <OUTTA> <YR> <LOOP> <EOL>
}

@@ -157,7 +157,7 @@
{}
{
   IncrFunction()
-|  LOOKAHEAD(2147483647) BreakFunction()
+|  LOOKAHEAD(BreakFunction()) BreakFunction()
|  <IZ> BoolExpression() <O_RLY> <QUESTION> <EOL> then() ( Else() )? <KTHX> <EOL>
| <OPEN> <IDENT> <QUOTED_STRING>
}
@@ -245,4 +245,4 @@
    return t.image.substring(1, t.image.length() - 1);
  }
}

I'm just happy to be able to contribute anything to this work.  Go Brian!

Tracking down a Rails app memory leak

Recently on thenewsroom we've noticed a memory leak; the process size would just grow and grow, up to over 1 GB per Mongrel.  Restarting the Mongrels regularly kept things more or less under control, but that's silly.  Much better to find the actual problem.

I tried out various memory profilers, like mongrel -B and Scott Laird's MemoryProfiler.  But the results just didn't jump out at me; I spent quite a while poking around ObjectSpace dump files and String diffs and such without really making much progress.

Then I tried Evan Weaver's BleakHouse.  This was nicer; the Gruff graphs are cool and it showed that the leak was in the way we were using Ruby and Rails.  So we didn't have a class-level Hash in a controller or anything like that.  But I still couldn't pin it down; I was just seeing charts with upwardly sloping numbers of String objects.

Finally I backed down to just poking around the app while watching the output of:

watch -n 1 "ps -o rss,vsz -p 12345"

Nothing interesting surfaced at first.  OK, what if I look at the differences between my dev and our production database?  Ah ha!  A particular table in my DB has 200 rows, the production one has 10K rows, and a particular page hit by a particular user causes 1500 of them to be fetched.  And that corresponds with a huge leap in memory size. 

Here was the problem.  Our architecture involves various Rails apps passing data around - and we were using Marshal'd OpenStruct objects as the transfer mechanism.  This is a bad idea.  Here's a demo of why.  First, let's write 10K Marshal'd OpenStruct objects to a file:

require 'ostruct'
structs = []
10000.times do |x|
  structs << OpenStruct.new(
   :first_name => "Fred#{x}",
   :last_name => "Fredson#{x}",
   :hatsize => 12+x,
   :country => "FooLand#{x}")
end
# Marshal output is binary, so open the file in binary mode
File.open("data.dat", "wb") { |f| f.write(Marshal.dump(structs)) }

This creates a data.dat file of about 500K.  Now let's read them back in, but check the process size at various points.  Here's a read.rb:

require 'ostruct'
puts "Hit enter to read data"
gets
puts "Reading data..."
data = File.open("data.dat", "rb") { |f| Marshal.load(f) }
puts "Hit enter to set to nil"
gets
data = nil
GC.start
puts "It's nil, back from GC.start"
sleep 500

When we run read.rb, the process is about 3 MB:

$ ps -o rss,vsz -p 30079
RSS    VSZ
1516   3048

Now, after we read in the data:

$ ps -o rss,vsz -p 30079
RSS    VSZ
80600  82104

Great scott, 82 MB!   But surely that GC.start will clean them up, right?

$ ps -o rss,vsz -p 30079
RSS    VSZ
80620  82104

And, no.  Booooo.  But replacing that OpenStruct with a Hash causes the memory size to only go from 3 MB to 7 MB, which is a bit of an improvement over rising from 3 MB to 80 MB. 
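For comparison, here is roughly what the Hash-based version of the writer and reader might look like.  It's a sketch using the same made-up fields as above; the data_hash.dat file name and its location in the system temp directory are my own choices for the example, and the memory numbers you see will depend on your Ruby version.

```ruby
require 'tmpdir'

# Same records as before, but as plain Hashes rather than OpenStructs.
records = []
10000.times do |x|
  records << {
    :first_name => "Fred#{x}",
    :last_name  => "Fredson#{x}",
    :hatsize    => 12 + x,
    :country    => "FooLand#{x}" }
end

path = File.join(Dir.tmpdir, "data_hash.dat")

# Marshal output is binary, so write and read in binary mode.
File.open(path, "wb") { |f| f.write(Marshal.dump(records)) }

# No require 'ostruct' is needed on the reading side; Hash is a core class.
loaded = File.open(path, "rb") { |f| Marshal.load(f) }
puts loaded.size
puts loaded.first[:first_name]
```

The receiving app then uses loaded.first[:first_name] where it used to call first_name on an OpenStruct, which is a small price for the smaller footprint.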

What would have been nice in this situation would be to have some sort of heap analyzer that showed the size of memory occupied by each object graph subtree.  It'd be worth looking at Evan's patch to Ruby 1.8.6 and his profiler to see if we could somehow gather and dump enough information to create something like that.  What a cool project that would be for someone much smarter than I :-)
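Short of a real heap analyzer, one crude thing you can do from inside the process is count live objects by class before and after a suspect operation.  This is only a sketch: it counts objects rather than bytes, it says nothing about which subtree owns what, and the live_counts and growth helpers are names I made up here rather than anything from BleakHouse.

```ruby
require 'ostruct'

# Count live objects per class, after forcing a GC run.
def live_counts
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |o| counts[o.class] += 1 }
  counts
end

# Run the block and report which classes have more live instances
# afterwards than before, biggest increase first.
def growth
  before = live_counts
  result = yield          # hold on to the result so it stays alive
  after  = live_counts
  diff   = after.map { |klass, n| [klass, n - before[klass]] }
  [result, diff.select { |_, d| d > 0 }.sort_by { |_, d| -d }]
end

kept, grown = growth do
  Array.new(1000) { |x| OpenStruct.new(:first_name => "Fred#{x}") }
end
grown.first(5).each { |klass, d| puts "#{klass}: +#{d}" }
```

Wrapping the Marshal.load call from read.rb in growth would at least show which classes pile up, even though it can't say how many bytes each one holds.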

20000 users on RubyForge

Looks like RubyForge just went up over the 20K user accounts mark; now it's at 20,188.  Some of these accounts are inactive, of course, but, still.  Here's a chart of RubyForge growth since we stood it up a couple of years ago - click on the chart to see the full-size image:

The growth seems to have levelled out in the last nine months or so, but it's still pretty impressive - around 35 new users each day.  And of course that doesn't include the tons of people who do "gem install rails" without having a RubyForge account.  Good times!

Get the JavaCC book and a free copy of the PMD book

My publisher, Centennial Books, is running a promotional deal - if you buy my JavaCC book you also get a copy - free! - of my PMD book.  The two books should complement each other pretty well - the JavaCC book will give you all the fundamentals on JavaCC and JJTree, and the PMD book will provide an example of an application written on top of those fine tools.  Enjoy!

Speaking of JavaCC, I came across an interesting post a week or so ago about a Java to Delphi converter that apparently uses JavaCC. There are some code samples there that show the result of the conversion process.  Makes me wonder if they do anything to translate object usages and such - is there a Delphi equivalent for ArrayList?  That seems to be one of the hard parts of a source to source translation like that... although even if you can't do object usage mappings, a translation tool might get you 90% of the way there.  But that last 10% might have to be a manual process.

RubyForge per-project RSS news feeds

Quite a while ago Hugh Sasse asked if an individual RubyForge project news page could have an RSS feed.  That capability was built into GForge, but I couldn't figure out how to get Firefox to surface it.  Hugh had posted a suggestion, but it slipped off my radar for a while.

Anyhow, I added the code he suggested and lo and behold, there it is.  So now if you go to, say, the Ruby Windows installer project news page and click on the orange RSS icon in your Firefox address bar, it'll give you two options.  Pick the second one and that'll be the project news feed.  Huzzah!

I'm still not sure how to twiddle that link tag so that it shows a different name for each feed - when I change the "title" attribute to anything other than "RSS" it seems to confuse Firefox.  Suggestions welcome. Update: Hugh suggested a fix, it worked fine, now the feeds are properly labelled. Thanks Hugh!

An excellent Continuous Integration book

Paul Duvall has a new book out: Continuous Integration.  It's a good one - it has lots of well stated reasons for doing a continuous integration system of some sort, including all sorts of little conversational snippets that help make the point.  For example, "Jim: hey, the test server isn't working.  Joe:  Ah yes... I forgot to 'svn add' one of my source files."  Doh! 

He suggests using one of those ambient orb gizmos to provide visual feedback on failed builds.  That's probably a better idea than a sound alert - I found that having one of those go off every once in a while was just annoying.  You can cut an email to everyone when a build fails, although that risks folks just routing them to a folder and letting them stack up.

Of course I was happy to see that the section on static analysis included lots of suggestions for using PMD and CPD.  CPD, especially, is a handy tool; the ability to find duplicated code across a large codebase is very nice. 

One of the hard things about a book on CI is that CI touches so many different parts of a system - the database, scripting, code, tests, code analysis, etc.  It's not the sort of book that lends itself to one big example; instead, you have to have a bunch of little config files and scripts and examples to help a person get rolling down the CI road.  I think they did a nice job here of having a good mix of theory and practical code samples.

So anyhow, I think this book is a good 'un.  Go get it! 

As an aside, I was a technical reviewer for this book, so there's probably some bias there.  But not too much!

More JavaCC and Java 1.5

Last week I posted about some improvements to JavaCC's Java 1.5 code output.  We've had some discussion on javacc-dev since then and now Sreenivasa has made the call - we're moving to Java 1.5 for JavaCC internally!

This means, of course, that you'll need a Java 1.5 runtime in order to run future releases of JavaCC.  But if your project is still using a Java 1.4 VM, no worries, JavaCC will continue to generate Java 1.4-compliant code for the foreseeable future.  As mentioned in my last post, just use the option setting JDK_VERSION="1.4".

I'm getting ready to check in a patch that migrates one of JavaCC's internal classes to Java 1.5.  Look at all the code that goes away - autoboxing sure makes a nice difference.  And using generics in the collections declarations makes the code more self-documenting, too.  I know this is old hat for folks who have been using this stuff for a while, but I'm still pretty jazzed about it.  Good times.

For more on JavaCC, get my JavaCC book!

RubyForge annoyance fix, 4000 projects

If you have several people in your RubyForge project you may have hit this problem - when one person uploads files to the project's virtual host, the files get written with non-group-writeable permissions, and so when someone else tries to upload files it fails with an error.  Boooo.  Well, there's a fix in place, or at least a workaround, and it's documented in the RubyForge FAQ - see #25.  Or just see this screenshot and you'll get the gist of it.  The nice thing is that PostgreSQL seems to be fast enough - or the groups table has few enough records - that I've scheduled that cronjob to check for the setting once a minute and it doesn't seem to be impacting things at all.  Good times.

As long as I'm writing, if you've got a RubyForge project, please scan this list of tips for keeping your project tidy.

And I just noticed that RubyForge just went over the 4000 project mark.  Very cool!

JavaCC, Java 1.5, and StringBuilder

I've just checked in a small patch to JavaCC - now it defaults to emitting Java 1.5 source code when generating a parser.  You'll still be able to generate code for Java 1.4 (and earlier) via the JDK_VERSION option, but the default will be 1.5. 

Along the same lines, I checked in another small patch which causes JavaCC to generate tokenizers that use StringBuilder rather than StringBuffer to accumulate token images.  That closes this issue, which I opened last May - the wheels of JavaCC grind slowly but exceedingly fine :-)

Anyhow, I'm glad that we're keeping JavaCC current with Java development; the underlying principles of JavaCC are as useful today as they were when Dr. Sankar and Sreenivasa wrote JavaCC 10 years ago. Just needs an occasional tweak to bring things up to date.

For more on JavaCC, check out my JavaCC book!

JavaCC and fixed width data

Most JavaCC grammars are written to handle input that's variable-width - e.g., source code.  But sometimes you need to be able to parse fixed width data, and JavaCC can be good for that too.  I just wrote an example JJTree grammar that parses well data logs in the Canadian Well Logging Society's LAS format.  You can see the grammar, the source code, example input data, and notes on the grammar's techniques here.

For much more on JavaCC and JJTree, get the book - Generating Parsers with JavaCC!

Zoho Creator uses JavaCC

As a big fan of JavaCC, this is good to hear - the application builder Zoho Creator uses JavaCC internally!  The JavaCC mention is near the end of this interview with lead architect Suganyas.

There aren't any real details about how they use JavaCC, but I bet it's used to parse their script code shown in the screenshot below - click the screenshot for a larger view:

Always happy to hear about the JavaCC book market expanding!

Back to JavaCC project work

Now that my JavaCC book is done I've been able to get back to doing some day to day JavaCC work.  For example:

It's nice to get back in the swing of things after being heads down on the book for so long.  Good times!

Rails + Facebook, good times

We're doing a lot of Rails programming for thenewsroom; I blogged about how it's working out really well a while back.

Well, now Rich Kilmer and Chad Fowler have written a Rails app that surfaces thenewsroom's video content through Facebook.  To give it a try, log in to Facebook and go to thenewsroom app page and add the app.  Then you can browse around the various videos (like this one from E3) and comment on them and whatnot.  The videos list gets updated every fifteen minutes or so with the latest content from thenewsroom, so it's always pretty new stuff.   

More importantly, the app is built using Rails, Mongrel, and lots of other Ruby goodness.  Sweet!

JavaCC example code posted

I've posted up the code examples for Generating Parsers with JavaCC; you can find a zip file containing everything on the example code page.

Next up: getting nicely HTMLized example source code via another handy JavaCC utility - the Java HTMLizer!

svn+http on RubyForge

I mentioned this on ruby-lang already, but just so more folks get the word, svn+http is now available to provide read-only access to Subversion repositories on RubyForge.  For example:

$ svn list http://mongrel.rubyforge.org/svn/trunk
COPYING
LICENSE
README
Rakefile
bin/
doc/
examples/
ext/
lib/
[ ... etc ... ]
$ svn co http://mongrel.rubyforge.org/svn/trunk mongrel
A  mongrel/test
A  mongrel/test/test_command.rb
A  mongrel/test/test_redirect_handler.rb
A  mongrel/test/test_response.rb
[ ... etc ... ]

This should be convenient for folks who can't use svn+svnserve due to firewall issues - e.g., your firewall blocks outbound port 3690. 

Next up - getting svn+https working.  Paul Duncan suggested looking into mod_pam; I've got mod_auth_pgsql2 working so that may come in handy also.   We shall see.  Anyhow, I think svn+http should do the trick for lots of folks.

PMD 4.0 released!

PMD 4.0 is out!  This release (our first major release since Dec 2006) includes a shift to Java 1.5 both internally and in terms of PMD's defaults; you can still process Java 1.4 code but you'll need to use the targetjdk parameter.  Other changes since PMD 3.9 include ten new rules, prodigious XPath rule speedups, better memory usage, and lots of bugfixes all around.

You can download this release or view the full changelog.

I'm pretty excited about some of the stuff we're looking at for PMD these days.  I think we're finally going to buckle down and get PMD to do full type resolution, so a rule won't just match on typeName.equals("StringBuffer") but will actually resolve the types to ensure it's checking something with StringBuffer.class.  This will eliminate a ton of false positives and should find a lot more problems, too.  On a different note, Ryan Gustafson is working on an optimized XPath 2.0 engine for PMD that would make XPath rules even faster - and would give us more advanced XPath capabilities.  Should be a fun next few months!

If you're using PMD, get the PMD book!  Or, if you're interested in the JavaCC parser generator that powers PMD, get the JavaCC book!

"Generating Parsers With JavaCC" now available!

My JavaCC book, Generating Parsers With JavaCC, is now in stock and available!  Here's where to order it.  It covers lots of stuff that I'm sure will be useful to those working with JavaCC grammars, including sections on:

  • Installing and running JavaCC from the command line, with Ant, and with Maven
  • Building tokenizers with regular expressions, lexical states, and using MORE/SKIP/SPECIAL_TOKEN tokens
  • Creating parsers, using multiple token/syntactic/semantic lookahead, and resolving ambiguities and choice conflicts, plus a chapter on JJDoc.
  • Working with Abstract Syntax Trees (ASTs), using conditional node descriptors, and building Visitors with JJTree.  Even a section on JTB!
  • Handling Unicode characters in a JavaCC grammar
  • Trapping and recovering from errors within a tokenizer, parser, and tree builder
  • Testing tokenizers and parsers with JUnit, and testing ASTs with XPath
  • Installing and using the excellent JavaCC Eclipse plugin
  • A flurry of small examples showing JavaCC in various contexts: parsing Apache logs, a simple Logo parser, and several others.

This book was a lot of fun to write and with any luck it'll make JavaCC a much more accessible parser generator.  Enjoy!

PMD 4.0rc2 released - Java 1.5 by default

Just to shake out any more bugs we've released PMD 4.0rc2.  The big change for this release is that PMD now expects to process Java 1.5 source code.  You can still analyze Java 1.4 (and earlier) source code by using the "targetjdk" parameter, but 1.5 is the default now.  There are also a few bugfixes from rc1, so things keep getting better. 

You can download this release and see the release notes on the project download page.  Enjoy!

Using PMD?  Get the PMD book - now only $20!

Finding PMD rules with multiple examples

I was working on some stuff before releasing PMD 4.0rc2 tonight and was trying to find any rulesets which had rules with more than one example.  A Ruby one-liner to the rescue!

$ ruby -rrexml/document -e '
  include REXML
  Dir.glob("*.xml").each do |f|
    r = Document.new(File.read(f)).elements["//ruleset/rule[count(example)>1]"]
    puts f if r
  end'
j2ee.xml
basic.xml

XPath and Ruby are a pretty sweet combination!
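Spelled out as a small script, the same check looks something like this.  The ruleset XML here is a made-up fragment, not one of PMD's real ruleset files, and the rule names are invented; the XPath expression is the one from the one-liner.  The difference is that it lists the matching rules by name rather than just the files that contain them.

```ruby
require 'rexml/document'

# Made-up ruleset fragment, for illustration only.
xml = <<XML
<ruleset name="demo">
  <rule name="OneExample">
    <example>class A {}</example>
  </rule>
  <rule name="TwoExamples">
    <example>class B {}</example>
    <example>class C {}</example>
  </rule>
</ruleset>
XML

doc = REXML::Document.new(xml)

# XPath.match returns every matching element, where elements[...]
# in the one-liner returns only the first match (or nil).
rules = REXML::XPath.match(doc, "//ruleset/rule[count(example)>1]")
names = rules.map { |rule| rule.attributes["name"] }
puts names
```

Swapping the heredoc for File.read(f) inside a Dir.glob loop gets you back to the one-liner's behavior, with rule names thrown in.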

A Java to HTML converter

Over on javacc-users we've been discussing syntax highlighting; I wrote a little highlighting app last week, and I've been thinking about this a bit since then. 

So, along a similar line of thought, here's a simple Java to HTML converter.  It's not as fancy as java2html, but the grammar has some interesting parts, including no SPECIAL_TOKEN definitions, a conversion of Java to HTML within a COMMON_TOKEN_ACTION, and bona fide token definitions (vs SKIP tokens) for whitespace characters.  It was fun to write and hopefully it'll be useful for someone.

My JavaCC book will be in stock near the end of this week, yay!

Syntax highlighting and JavaCC

A recent post to the javacc-users list asked about using the token definitions in JavaCC for syntax highlighting language elements.  In response I put together a small JNLP-launched example app that builds a little Map of colors to use for each kind of token.   This example is just using a JTextPane, getting each token, and calling setSelection and setSelectionColor in turn to highlight each different token type.

How would we do this for real?  I haven't fiddled with the more advanced JEditorPane usage, but it looks like StyledDocument and AttributeSets with the various colors would do the trick.  You would need to have two modes - one, when a new file was opened, to add the proper colors to all the keywords, and another, as the user was typing, to analyze the text and hopefully make the color change without having to retokenize the entire document.  It seems like JTextComponent supports this sort of thing pretty well since it uses MVC; you could implement highlighting in the background without blocking the user's input.  Actually, JavaCC's tokenizers are fast enough that you could retokenize all but the largest documents without the user noticing, I bet.

Along these same lines, over on the NetBeans project they're doing interesting stuff with Schliemann.  It's sort of like a DSL for language highlighting/folding/indentation; very nifty!

My JavaCC book should be in stock in less than ten days... huzzah!

A shortcut for PMD XPath rules

The normal path for experimenting with a PMD XPath rule is to run the rule designer, paste in some Java source code, paste in the XPath expression, and see what it returns.  But if you just want to try out an XPath expression and see what it returns in a particular file, there's some code in Subversion to do that now. 

Here's how to use it.  You need a Java source file to run it on:

package foo;
public class Test {
  private int x;
  private int y;
}

Then you can run the XPathTest class and feed it the XPath expression and the filename:

$ java net.sourceforge.pmd.util.XPathTest -xpath "//FieldDeclaration" -filename ~/tmp/Test.java 
Match at line 3 column 11; package name 'foo'; variable name 'x'
Match at line 4 column 11; package name 'foo'; variable name 'y'

This accepts any valid XPath expression, so you can limit the results to fields named 'x' by specifying an attribute predicate:

$ java net.sourceforge.pmd.util.XPathTest -xpath "//FieldDeclaration[@VariableName='x']" -filename ~/tmp/Test.java 
Match at line 3 column 11; package name 'foo'; variable name 'x'

You can download this feature in this updated pmd-4.0rc1.jar file.  Enjoy!  And if you're using PMD, get the PMD book!