While working on my JavaCC book I came across a JavaCC grammar for parsing Cobol programs. It's a pretty hefty grammar file - over 50 KB - with a ton of productions. I ran it through JJDoc and you can see that report as well.
I thought a couple of things were interesting:
- The global lookahead setting is 4, which is a lot more than you usually see.
- There are no lexical states (other than DEFAULT, of course).
- Whitespace seems to be handled entirely through SPECIAL_TOKEN - no SKIP or MORE. I think that results in a lot of extra Token creation... but maybe that sort of thing is not a big deal these days.
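To make the lookahead and whitespace points concrete, here is a minimal sketch of a grammar that sets the same global lookahead and handles whitespace with SPECIAL_TOKEN, with the SKIP alternative shown commented out. The names WhitespaceDemo, WS, WORD and Words are made up for illustration; none of this is taken from Bernard's grammar.

```javacc
options {
  LOOKAHEAD = 4;   // global lookahead; JavaCC's default is 1
}

PARSER_BEGIN(WhitespaceDemo)
public class WhitespaceDemo {}
PARSER_END(WhitespaceDemo)

// Option A: SPECIAL_TOKEN. Each whitespace run still becomes a Token
// object. The parser never sees it directly, but it is reachable through
// the specialToken field of the next regular token.
SPECIAL_TOKEN : { < WS: ( " " | "\t" | "\r" | "\n" )+ > }

// Option B: SKIP. The matched text is simply discarded and no Token is
// created for it. Uncomment this and remove Option A to compare.
// SKIP : { " " | "\t" | "\r" | "\n" }

// A stand-in for the grammar's real tokens (hypothetical).
TOKEN : { < WORD: ( ["a"-"z","A"-"Z"] )+ > }

// Accepts any sequence of words separated by whitespace.
void Words() : {}
{
  ( <WORD> )* <EOF>
}
```

With Option A, every whitespace run between words allocates a Token, which is the extra creation mentioned above. The upside is that the original spacing stays available to anything that needs to reproduce the source text, which may well be why the grammar does it that way.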
I tried to contact the author to see if he would mind me putting the grammar on the JavaCC grammars page, but the email bounced. Bernard Pinon, if you see this, nice work, please drop me a line and let me know if this grammar can live on the JavaCC site :-)
Update 3/30/09: Got an email from Bernard, grammar is here now, and he gave me permission to repost it on the JavaCC example grammars page. Thanks Bernard!
Hi Tom,
I just purchased your javacc book, specifically because I need to learn more about javacc so I can use this cobol grammar that you reference in your blog better.
I'm building a Cobol debugger (Yeah, what fun, I'm a lunatic...), and I need to extend that cobol grammar to support embedded CICS commands and embedded SQL. I fiddled around with the grammar, but I just don't understand javacc yet, so I didn't get that far. Any chance you might be able to send me an electronic version of the book so I don't have to wait for the paper copy? I can email you a copy of my receipt if you'd like.
Thanks!
jr
Posted by: Calphool | May 04, 2008 at 10:35 PM
Calphool!! We need to talk!
I am also trying to extend the cobol grammar to support CICS calls and SQL, as well as some other company-standard things. I'm after semantic searching rather than a debugger, more for impact analysis.
Problem is, I run into trouble whenever I hit an all-letters cobol word: DFHRESP fails, but if I change it to DFHRESP1 or DFHRESP-A it gets past it.
I thought for sure...
| )*)*
(["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
( ()+ (["a"-"z","0"-"9"])+)*
>
Catered to all-letter words, with no numbers or minus signs.
Posted by: ultimav | August 05, 2009 at 02:28 PM
@ultimav, shoot me an email at tom@infoether.com if you want to work through it, always happy to see new JavaCC grammar stuff!
Posted by: tomcopeland | August 07, 2009 at 12:02 AM