Junior developer

puts "Hello world!"

Tom Copeland's Recent Posts

  • How to use JavaCC with JRuby
  • Book review - Web Operations
  • Rails test data not getting rolled back
  • Leveraging config.gem in Capistrano's deploy:check
  • Things to keep from RubyForge
  • Generating Parsers with JavaCC, Second Edition now available
  • Transferring gem namespaces on RubyForge
  • Upgrading PostgreSQL with pg_migrator
  • The rubyforge gem and the RubyForge REST API
  • Applying the Rails XSS vulnerability patch

How to use JavaCC with JRuby

If you've done language hacking with Java you're probably familiar with the parser generator JavaCC. You can find a JavaCC grammar for just about anything; there are a bunch of them listed on the JavaCC site. With the parsers generated from these grammars you can do all sorts of nifty language processing stuff - checking Java code for problems, optimizing inefficient CSS, minifying JavaScript, and so on.

I'm doing mostly Ruby these days, but all those JavaCC grammars are still accessible and useful through the magic of JRuby. With JRuby I can write a Ruby script that loads up a JavaCC-generated parser and rips right through whatever data I need to manage. Here's how.

Let's use the Java grammar as an example. Download this Java grammar and build it into a jar file - basically, you'll do this:

wget https://javacc.dev.java.net/files/documents/17/44514/Java1.5_parser_and_AST.zip
unzip Java1.5_parser_and_AST.zip
cd JavaParser
javac src/japa/parser/*.java src/japa/parser/ast/*.java \
src/japa/parser/ast/expr/*.java \
src/japa/parser/ast/visitor/*.java \
src/japa/parser/ast/body/*.java \
src/japa/parser/ast/type/*.java \
src/japa/parser/ast/stmt/*.java
jar -cvf grammar.jar -C src/ japa/

Or, if you're in a hurry, just download grammar.jar which has all that stuff in it. Now, install JRuby if you don't already have it somewhere on your system - rvm is probably the best path for this, or you can just download the latest binary and untar it somewhere on your computer. Finally, add a little test source file to the current directory - call it Hello.java and put this code in it:

public class Hello {
  public void hi() {
    System.out.println("Hello world!");
  }
}

With that setup in place, the nicest way to explore JavaCC and JRuby is to use JRuby's interactive interpreter, jirb:

$ jirb
>> VERSION
=> "1.8.7"

Great, we're in. Let's try to use that JavaParser class:

>> JavaParser
NameError: uninitialized constant JavaParser
	from (irb):2

Oops, we need to require java and grammar (the jar we just built) first:

>> require 'java'
=> true
>> require 'grammar'
=> true

Now we'll import JavaParser to save some typing:

>> java_import 'japa.parser.JavaParser'
=> Java::JapaParser::JavaParser

OK, let's load up that Hello.java file. First we'll create a Java File object:

>> f = Java::Java::io::File.new("Hello.java")
=> #<Java::JavaIo::File:0x743c86e9>

Now we parse the file contents!

>> root = JavaParser.parse(f)                
=> #<Java::JapaParserAst::CompilationUnit:0x442982d8>

We now have a reference to the root of the abstract syntax tree (AST) that the parser has built from that source file. What can we do with it? Well, we can show the name of the class:

>> root.types.first.name
=> "Hello"

We can also do something a little more interesting - we can use a Visitor implementation that comes with this grammar to visit each node of the AST and print out the source:

>> java_import 'japa.parser.ast.visitor.DumpVisitor' 
=> Java::JapaParserAstVisitor::DumpVisitor
>> d = DumpVisitor.new
=> #<Java::JapaParserAstVisitor::DumpVisitor:0x962e703>
>> d.visit(root, nil)       
=> nil
>> d.getSource
=> "public class Hello {\n\n    public void hi() {\n        System.out.println(\"Hello world!\");\n    }\n}\n"

We can also just use the tokenizer (i.e., the JavaParserTokenManager) if that's all we need. Here's a little program to do that - put this in a file called tokenize.rb:

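require 'java'
require 'grammar'   # grammar.jar, built or downloaded earlier

java_import 'japa.parser.JavaParserTokenManager'
java_import 'japa.parser.JavaCharStream'
java_import 'java.io.FileReader'

# Wrap the source file in the character stream that the token manager reads
file_reader = FileReader.new("Hello.java")
jcs = JavaCharStream.new(file_reader)
jptm = JavaParserTokenManager.new(jcs)

# Keep pulling tokens until we get one with an empty image
while (t = jptm.next_token).image.size != 0
  puts t.image
end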

When you run it with jruby tokenize.rb you'll see this:

$ jruby tokenize.rb 
public
class
Hello
{
public
void
[ etc etc] 

This gives us the ability to use any JavaCC grammar's tokenizer to lex any data file. Very handy!

There's a lot more we can do with JRuby and JavaCC, but this should give you a feel for the possibilities. Enjoy!

Check out my JavaCC book for a much deeper dive into JavaCC, JJTree, and all that.

November 12, 2010 in Java, Ruby | Permalink | Comments (0) | TrackBack (0)

Book review - Web Operations

I just finished reading a great book - Web Operations by John Allspaw and Jesse Robbins. This isn't a book of code samples; it's a book to make you think more about system administration in the large, infrastructure as code, and other big picture items for the web systems that we deal with today. Some notes:

  • Chapters 6 and 11 talk about the difference between monitoring and metrics and the need for both. I've always done the monitoring part - mostly on the system level, though, gathering load average and disk usage and such. What I haven't done enough of is monitoring on the app level - e.g., if 100 users have signed up in the last day, alert me. These sorts of things are very doable, but they require effort. For metrics, doing those on the application level is very informative - how many users signed up today? How many tickets were opened in the past week? These chapters talk about using Ganglia for that... sounds like a great tool, and would be an improvement over the low-ceremony daily summary emails. At least those are a start, though; better than nothing. Gathering those metrics can lead to interesting discussions around trends and forecasting - if we did get 100 users in a day, what would that mean for our system? How would that change our hardware breakdown? And there are business-level questions too - are those users coming back, or are they creating accounts and never logging in again?
  • Chapter 7 is a reprint of the excellent "How Complex Systems Fail" essay by Dr Richard Cook. Well worth reading and re-reading.
  • Chapter 10 is a good discussion of dev/ops collaboration - or, sadly and more often, the conflict between the two. It's a tragedy and a waste when dev folks and ops folks don't get along. The ops guys have so much to teach the devs - they spend all day dealing with and are experts in stuff that devs usually touch once every 3 months (network configs, DNS, SMTP, backup/restore). And vice versa, of course - my impression is that there are a lot of ops shops that aren't using SCM for their scripts and don't have server builds automated. Tools like Puppet can help to bridge this gap... generally, both dev and ops need to be considerate of each other's responsibilities and needs. Lots to learn here.
  • Chapter 12 had a fun discussion of the lure of DB clustering. I won't spoil it for you, but it really rang true for me.
  • One thing I liked about the whole book was the general assumption that you do want failover and redundancy and are willing to work for that. That is to say, that you actually care about the app you are working on, the people you are working with, and the customers you are serving! I don't quite know how to put my finger on it... but there's sort of an underlying thoughtfulness about it all. There's a feeling that I don't want to blame others for the system going down, rather, I want to build in sufficient checks and balances so that when a server goes down the system continues to tick along without anyone having to make the 2 AM drive to the colo. It reminds me of the Cassandra project fellows saying that if a DB node goes offline in the middle of the night they can say "meh" and let it go until the next day. Good stuff.
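The app-level alert from the first note ("if 100 users have signed up in the last day, alert me") doesn't take much code. Here's a minimal sketch, assuming you can get at the signup timestamps somehow; the constant and method names are made up for illustration and aren't from the book.

```ruby
# Hypothetical threshold and window; tune both for your own application.
SIGNUP_ALERT_THRESHOLD = 100
SIGNUP_WINDOW_SECONDS  = 24 * 60 * 60

# signup_times: an array of Time objects, e.g. pulled from a users table.
def recent_signup_count(signup_times, now = Time.now)
  cutoff = now - SIGNUP_WINDOW_SECONDS
  signup_times.count { |t| t > cutoff && t <= now }
end

# True when the number of signups in the window reaches the threshold.
def signup_alert?(signup_times, now = Time.now)
  recent_signup_count(signup_times, now) >= SIGNUP_ALERT_THRESHOLD
end
```

The same count, recorded every day instead of compared to a threshold, is the metrics side of the story.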

Obviously, I heartily recommend this book. If you've done much devops at all you'll find yourself enjoying (and sympathizing with the folks suffering through) the anecdotes, and you'll come away from this book with a list of things to do to lower the stress level around running your web app. Enjoy!

August 17, 2010 in Ruby | Permalink | Comments (2) | TrackBack (0)

Rails test data not getting rolled back

Here begins a MySQL newbie tale of woe. I was working on a Rails app, writing tests, all was well. Then a co-worker told me the tests were failing for him. I couldn't reproduce it; they were working fine for me.

Later I noticed odd behavior - I would create records in the setup method and they would stick around. I flailed around for a bit, poking myself in the eye and such, and then finally got down to business and did a show create table my_table_name and whoa, it's MyISAM! That's not good. So I set this option in /etc/my.cnf:

default-storage-engine = innodb

Restarted MySQL - what's this, it wouldn't start! Checked the logs:

100303 18:37:37 [ERROR] Unknown/unsupported table type: INNODB
100303 18:37:37 [ERROR] Aborting

Now we're getting somewhere. The clincher:

mysql> show engines;
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                   | Transactions | XA   | Savepoints |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| CSV        | YES     | CSV storage engine                                        | NO           | NO   | NO         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                     | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance    | NO           | NO   | NO         |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
4 rows in set (0.00 sec)

Notice what's not there? InnoDB! This was a MySQL 5.1.44 install that I had merrily compiled from source, feeling like quite the l33t h@x0r. But what I hadn't realized was that the InnoDB storage engine was now a plugin. After some more blundering this way and that, I settled on:

./configure --prefix=/usr/local/mysql51 \
--with-unix-socket-path=/usr/local/mysql51/run/mysql_socket \
--with-mysqld-user=mysql --enable-thread-safe-client --with-plugins=innobase

A make && make install and a MySQL restart later I'm up and running; show engines now displays InnoDB. Best of all, I could see and fix those test failures. Huzzah!
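A check like this could have saved me some flailing. Here's a rough sketch that reads the text of a show engines listing and reports which engines are both available and transactional. The transactional_engines helper is made up for this post, and it assumes the column order shown in the output above.

```ruby
# Returns the names of engines that are usable (Support is YES or DEFAULT)
# and transactional (Transactions is YES), given `show engines` text output.
def transactional_engines(show_engines_output)
  rows = show_engines_output.lines.map(&:strip).select { |line| line.start_with?("|") }
  data_rows = rows.drop(1) # the first "|" row is the column header
  usable = data_rows.map { |line| line.split("|").map(&:strip) }.select do |cols|
    %w[YES DEFAULT].include?(cols[2]) && cols[4] == "YES"
  end
  usable.map { |cols| cols[1] }
end

# The listing from the broken install described above
ENGINES_FROM_POST = <<~TABLE
  +------------+---------+-----------------------------------------------------------+--------------+------+------------+
  | Engine     | Support | Comment                                                   | Transactions | XA   | Savepoints |
  +------------+---------+-----------------------------------------------------------+--------------+------+------------+
  | CSV        | YES     | CSV storage engine                                        | NO           | NO   | NO         |
  | MRG_MYISAM | YES     | Collection of identical MyISAM tables                     | NO           | NO   | NO         |
  | MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO   | NO         |
  | MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance    | NO           | NO   | NO         |
  +------------+---------+-----------------------------------------------------------+--------------+------+------------+
TABLE

engines = transactional_engines(ENGINES_FROM_POST)
puts engines.empty? ? "No transactional engine available!" : engines.join(", ")
```

With no transactional engine there is nothing for the test framework to roll back, which is why the records created in setup were sticking around.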

In retrospect I should have dug into the problem as soon as it was reported to me... live and learn.

March 03, 2010 in Ruby | Permalink | Comments (2) | TrackBack (0)

Leveraging config.gem in Capistrano's deploy:check

Capistrano lets you enumerate your Rails application's dependencies so you can check them at deploy time. Mislav Marohnić wrote a good description of it a while back; here are some example depend entries:

depend :remote, :gem, "tzinfo", ">=0.3.3"
depend :local, :command, "svn"
depend :remote, :directory, "/u/depot/files"

The problem with depend :remote, :gem, though, is that it duplicates the config.gem entries that you already have in config/environment.rb. It'd be much nicer if you could just reuse those.

So, here's some code that you can paste in your deploy.rb to do just that:

# Add dependencies on gems listed in config/environment.rb
class Collecter
  attr_accessor :dependencies

  def initialize
    @dependencies = {}
    lines = File.read(File.join("config", "environment.rb")).split("\n")
    # Evaluate each config.gem line against this object so gem() records it
    lines.select { |line| line.match(/^\s*config\.gem/) }.each do |line|
      instance_eval(line)
    end
  end

  # Stands in for config.gem; accepts any version if none is given
  def gem(name, args = {})
    args[:version] = ">=0.0.1" unless args.include?(:version)
    @dependencies[name] = args
  end

  # Makes "config.gem ..." call the gem method above
  def config
    self
  end
end

Collecter.new.dependencies.each do |name, args|
  depend :remote, :gem, name, args[:version]
end

after "deploy:setup", "deploy:check"

This parses your config/environment.rb, extracts the config.gem calls, and evaluates them in the context of an object that gathers up the dependency arguments. Then it declares an after hook for deploy:setup that runs deploy:check, so when you set up the application on a new server it'll ensure that the right stuff is in place.
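If you want to poke at the instance_eval trick outside of a deploy, here's a standalone sketch of the same idea that reads from a string instead of config/environment.rb. The GemCollector name and the sample environment text (including the :lib option) are made up for illustration.

```ruby
# Same approach as the Collecter in deploy.rb, but fed from a string so it
# can be run anywhere. Class name and sample input are hypothetical.
class GemCollector
  attr_reader :dependencies

  def initialize(environment_source)
    @dependencies = {}
    environment_source.each_line do |line|
      # Only evaluate lines that are config.gem calls
      instance_eval(line) if line =~ /^\s*config\.gem\b/
    end
  end

  # Receives config.gem "name", :version => "...", and so on
  def gem(name, args = {})
    @dependencies[name] = { :version => ">=0.0.1" }.merge(args)
  end

  # Lets "config.gem ..." resolve to the gem method above
  def config
    self
  end
end

SAMPLE_ENVIRONMENT = <<~RUBY
  Rails::Initializer.run do |config|
    config.gem "tzinfo", :version => ">=0.3.3"
    config.gem "mislav-will_paginate", :lib => "will_paginate"
    # config.gem "commented-out"
  end
RUBY

GemCollector.new(SAMPLE_ENVIRONMENT).dependencies.each do |name, args|
  puts "#{name} #{args[:version]}"
end
```

This prints tzinfo with its declared version and mislav-will_paginate with the ">=0.0.1" fallback. Note that the line-at-a-time approach only handles config.gem calls that fit on a single line.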

Here's a sample run from my military reading list app:

$ cap deploy:setup
* executing `deploy:setup'
* executing "mkdir -p /var/www/militaryprofessionalreadinglists.com/"
[blah blah blah]
command finished
triggering after callbacks for `deploy:setup'
* executing `deploy:check'
* executing "test -d /var/www/militaryprofessionalreadinglists.com/releases"
servers: ["militaryprofessionalreadinglists.com"]
[militaryprofessionalreadinglists.com] executing command
command finished
[blah blah blah]
* executing "gem specification --version '>=0.0.1' mislav-will_paginate 2>&1 | awk 'BEGIN { s = 0 } /^name:/ { s = 1; exit }; END { if(s == 0) exit 1 }'"
servers: ["militaryprofessionalreadinglists.com"]
[militaryprofessionalreadinglists.com] executing command
command finished
You appear to have all necessary dependencies installed

Good times. The nice thing here is that the developer only has to list the dependencies in one place, and the hook ensures that failures are loudly proclaimed during initial setup.

I'm not sure how to integrate this into Capistrano itself; if Lee Hambley or one of the other Capistrano gurus sees this perhaps they can weigh in... thanks!

February 05, 2010 in Ruby | Permalink | Comments (0) | TrackBack (0)

Things to keep from RubyForge

Executive summary: File uploads, mailing lists, and virtual hosts

Everyone knows that the RubyForge gem index is getting merged into the GemCutter gem index to form a single massive community gem index at rubygems.org. That leaves open the question of what happens to the other RubyForge services. We've been saying that most of the RubyForge services - the ones others can do better - are going away as we move towards the community hub concept that Rich outlined.

So, which RubyForge stuff do we keep? Based on feedback and further discussion, here are the services which are making the cut:

  • mailing lists: these seem to be popular and alternatives seem to be not so good
  • file uploads: these can be done elsewhere, but it seems like keeping them with the projects is worthwhile
  • virtual hosts: these provide Ruby projects with a convenient place to have a home page
  • projects: since we need something off of which to hang the other features

These features will be part of the new community hub, albeit in a much-improved format. Stay tuned... and comments welcome!

October 30, 2009 in Ruby | Permalink | Comments (12) | TrackBack (0)

Transferring gem namespaces on RubyForge

A while back I blogged about how you could now see what gem namespaces your RubyForge project owns. That's been improved a bit; as of today you have the ability to release a namespace back into the wild. Here's a screenshot from when I was logged in to RubyForge and looking at the bottom of the codeforpeople files page:



So, if you go to the "Files" tab on your project and scroll to the bottom of the page you'll see a list of the namespaces your project owns. Each has a little "x" next to it. Just click that "x", go through a confirmation page, and your project will no longer own that namespace.

Note that you also need to delete any gems that claimed that namespace in that project. For example (from the FAQ entry on this topic), suppose you had a mygames project and released a file asteroids-0.0.1.gem. This means that mygames owns that asteroids namespace. Suppose you start an asteroids project and want to move the namespace over. To do this, you would release the gem on the new asteroids project and then delete the namespace from the mygames project. The next time RubyForge deploys gems it'll notice that the asteroids project has a new gem that uses a now-unclaimed namespace, and there you are, namespace transferred.

You can see that there is a gap here - if someone else has released an asteroids gem on their project there's a chance they might get the namespace instead. If that happens, please file a support request and we'll figure it out. Eventually I'll probably modify the user interface so that you can explicitly transfer the namespace from one project to another.

For now, though, this is better than having to file a support request every single time. Enjoy!

October 21, 2009 in Ruby | Permalink | Comments (0) | TrackBack (0)

The rubyforge gem and the RubyForge REST API

Yesterday Ryan Davis and I released v2.0.0 of the "rubyforge" gem. The big change for this version is that it no longer interacts with RubyForge by scraping HTML; instead, it uses the new RubyForge REST API. So instead of POSTing a form to log in and fetch your project list, it uses HTTP Basic authentication and hits /users/zenspider/groups.js.
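To give a rough idea of what that kind of request looks like, here's a sketch using Net::HTTP from the standard library. This is not the rubyforge gem's actual code; the groups_request helper and the password are placeholders, and the request is only built here, not sent.

```ruby
require 'net/http'
require 'uri'

# Builds (but does not send) a GET for a user's project list.
# The host and path follow this post; the credentials are placeholders.
def groups_request(username, password)
  uri = URI.parse("http://api.rubyforge.org/users/#{username}/groups.js")
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(username, password) # adds the Authorization header
  [uri, request]
end

uri, request = groups_request("zenspider", "secret")
puts "#{request.method} #{uri}"
puts request["Authorization"]

# To actually send it:
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

One request with credentials attached replaces the old login-form-then-scrape sequence, which is where the efficiency gain comes from.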

If you're using the new gem, you may see this when you run rubyforge config:

$ rubyforge config
/Library/Ruby/Gems/1.8/gems/json-1.1.7/lib/json/common.rb:122:in `parse': 
618: unexpected token at '<?xml version="1.0" encoding="UTF-8"?> (JSON::ParserError)
<!DOCTYPE html
[ ... lots more ... ]

That's happening because your ~/.rubyforge/user-config.yml file is telling the rubyforge gem to hit https://rubyforge.org, and it needs to hit http://api.rubyforge.org instead. To fix that, either run rubyforge setup, or just edit your ~/.rubyforge/user-config.yml and change the uri setting.

I'm happy to have a REST API for RubyForge available; this has been in the works for a few months and I hope folks come up with some interesting ways to use it. It's certainly much easier (and much more efficient) than the old way of using Mechanize or whatever to parse the HTML. Not all the RubyForge resources are available yet, so if you want to access something and don't see it in the API please let me know and I'll add it. The code is here and feedback is of course welcome. Enjoy!

September 22, 2009 in Ruby | Permalink | Comments (0) | TrackBack (0)

Applying the Rails XSS vulnerability patch

I'm probably making this harder than it should be... but if so, the Internet will correct me :-)

Anyhow, I wanted to apply the Rails XSS vulnerability patch on a machine that was running Rails 2.3.2. The gems weren't frozen to the app, though, they were just out there in /usr/local/lib/ruby/gems/1.8/gems/.

I moved over to the gems directory and tried to apply the patch, but I got the "which file do you want to apply the patch to" message. It makes sense; the patch wants to make the change to (for example) activesupport/lib/active_support/multibyte.rb and all the gem directories have the version numbers attached - e.g., activesupport-2.3.2. I was getting ready to kind of pick the patch apart but then thought "hold on, symlinks!" So I did this:

$ cd /usr/local/lib/ruby/gems/1.8/gems/
$ sudo ln -s activesupport-2.3.2 activesupport
$ sudo ln -s actionpack-2.3.2 actionpack
$ sudo patch -p1 < /home/tom/2-3-CVE-2009-3009.patch
patching file activesupport/lib/active_support/multibyte.rb
patching file activesupport/lib/active_support/multibyte/chars.rb
Hunk #2 succeeded at 283 (offset -15 lines).
Hunk #4 succeeded at 622 (offset -15 lines).
patching file activesupport/lib/active_support/multibyte/utils.rb
patching file activesupport/test/multibyte_utils_test.rb
patching file actionpack/lib/action_view/helpers/tag_helper.rb
$ sudo rm -f actionpack
$ sudo rm -f activesupport

Restart the app, and huzzah! All's well.
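If you'd rather script the symlink dance than type it, here's a rough Ruby sketch of the same create-links, do-the-work, remove-links sequence. The with_unversioned_links helper is made up, and the demo runs against a throwaway temp directory rather than the real gems directory (which would need root, as with the sudo commands above).

```ruby
require 'fileutils'
require 'tmpdir'

# Creates unversioned symlinks (activesupport -> activesupport-2.3.2), yields,
# then removes the links again, even if the block raises.
def with_unversioned_links(gems_dir, gem_names, version)
  links = gem_names.map { |name| File.join(gems_dir, name) }
  gem_names.zip(links).each do |name, link|
    FileUtils.ln_s("#{name}-#{version}", link)
  end
  yield
ensure
  links.each { |link| FileUtils.rm_f(link) } if links
end

# Demo against a temporary directory standing in for the gems directory
Dir.mktmpdir do |gems_dir|
  FileUtils.mkdir_p(File.join(gems_dir, "activesupport-2.3.2", "lib"))
  FileUtils.mkdir_p(File.join(gems_dir, "actionpack-2.3.2", "lib"))
  with_unversioned_links(gems_dir, %w[activesupport actionpack], "2.3.2") do
    # This is where patch -p1 would run, from inside gems_dir
    puts File.directory?(File.join(gems_dir, "activesupport", "lib"))
  end
  puts File.exist?(File.join(gems_dir, "activesupport"))
end
```

The ensure clause is the main reason to bother: the temporary links get cleaned up even if the patch step blows up halfway through.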

September 04, 2009 in Ruby | Permalink | Comments (7) | TrackBack (0)

What gem namespaces does that RubyForge project own?

Back in January I posted a list of which RubyForge projects own what RubyGem. That's all well and good, but it's a one-time dump of data and it's more or less hidden off in a corner.

So now to fix that, you can see the gem namespaces each project owns at the bottom of the "files" page. For example, here's the seattlerb project "files" page with their massive list of namespaces.

This list gets updated daily, so it should stay fresh.

Next task: the ability to transfer gem namespaces to another project. That's been a long time coming, but this gets us a step closer.

August 25, 2009 in Ruby | Permalink | Comments (1) | TrackBack (0)

How much disk space is my RubyForge project using?

To find out, just log in to RubyForge, go to your project, and click the Admin tab. The totals are split out for "SCM" (source code mgmt, e.g., cvs/git/svn), space used in your project's virtual host, and space used by the files you've released.

Here are the top 10 projects in terms of SCM usage:

gforge=> select g.unix_group_name, d.scm_space_used/1000 \
as MB from disk_usages d, groups g where g.group_id = d.group_id \
 order by d.scm_space_used desc limit 10;
 unix_group_name | mb  
-----------------+-----
 blacklight      | 598
 cougar          | 576
 instantrails    | 547
 easygameengine  | 249
 ogrerb          | 228
 rubyes          | 215
 fxruby          | 187
 dojo-pkg        | 177
 restore         | 174
 tubix           | 150

And for released files:

gforge=> select g.unix_group_name, d.released_files_space_used/1000 \
as MB from disk_usages d, groups g where g.group_id = d.group_id \
order by d.released_files_space_used desc limit 10;
 unix_group_name |  mb  
-----------------+------
 backlog         | 1130
 instantrails    |  960
 rubyinstaller   |  726
 rhodes          |  666
 rmagick         |  603
 wxruby          |  535
 rails           |  369
 linnet          |  326
 rubricks        |  323
 fxruby          |  316

And for virtual host space used:

gforge=> select g.unix_group_name, d.virtual_host_space_used/1000 \
as MB from disk_usages d, groups g where g.group_id = d.group_id \
order by d.virtual_host_space_used desc limit 10;
 unix_group_name | mb  
-----------------+-----
 instantrails    | 328
 funfx           | 171
 rubyworks       | 116
 fxruby          | 109
 roby            |  97
 scrubyt         |  95
 freeride        |  91
 twitter4r       |  84
 rpa-base        |  76
 rubyhackerblog  |  74

Not surprising that InstantRails would be the leader or close to it in all three categories, I guess; such is the way of projects with large binaries. "backlog" has a bunch of war files - targeting JRuby, I reckon.

This information is populated via a cronjob, so if you clean up some stuff it'll be a while before the numbers get updated. Right now I've got it scheduled to run once a week. Mechanics-wise, it's very low-ceremony - it just iterates over the active projects and runs du -sk on various directories.
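The per-directory measurement boils down to something like the sketch below. This is not the actual cron script; the method names and the idea of passing in a hash of directories are made up for illustration, and it assumes a du that understands -sk.

```ruby
require 'open3'

# Returns the size of a directory in kilobytes, as reported by `du -sk`.
def disk_usage_kb(dir)
  output, status = Open3.capture2("du", "-sk", dir)
  raise "du failed for #{dir}" unless status.success?
  output.split.first.to_i # du prints "<kilobytes>\t<path>"
end

# dirs_by_kind: hypothetical mapping such as
#   { :scm => "...", :virtual_host => "...", :released_files => "..." }
def project_usage(dirs_by_kind)
  dirs_by_kind.each_with_object({}) do |(kind, dir), usage|
    usage[kind] = disk_usage_kb(dir)
  end
end

# Example: measure the current directory
puts project_usage(:current => Dir.pwd).inspect
```

The numbers come back in kilobytes, which is why the queries above divide by 1000 to get a rough MB figure.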

August 21, 2009 in Ruby | Permalink | Comments (0) | TrackBack (0)
