AN ANTLR TUTORIAL FOR THE CRIMINALLY ASININE
So if you're like me, you opened up ANTLR and initially said "WOW, this is so cool! I can write my own programming language!" This was most likely followed by the realization that you had no idea what you were doing, why nothing you did had a "proper output," why a header you declared only showed up in one spot, and why someone would name an LL(*) parser generator ANTLR. Well, worry no more! I am here to give you everything you need to know in order to get ANTLR running with all the features you need to make a basic language.
Step 1: Getting Started
Well in order to get started, you need to go to ANTLR's site and do what they say. I'm not here to tell you that! (P.S. Look at their goofy hand thing... It's supposed to be ANTLRS!!! GET IT!?!?)
ANTLR's Website (if you can't find the spot for new people, go here: Getting Started)
If you didn't already, make sure you get ANTLRWorks. It is your friend... Anyway, you should be able to get all of that set up. Make sure you have it all working, then copy and paste the most basic grammar you can find on the site and make sure that it compiles... I recommend this one: Expression evaluator
If you're like me, example code helps quite a bit, and you can usually get it for free from really good books online. Like "The Definitive ANTLR Reference". It's a great book, and it's like the only way open-source code writers make money. If you are strapped for cash, it's got some great free code examples. And they're available here: Pragmatic Programmers' definitive guide to ANTLR. Since I'm using their code to learn ANTLR, I should probably give a plug for their book. Make your employer buy it for you or something. I promise it has more information than this website.
Well, from here on out you're ready to go, but if you are like me, you will struggle with a couple of things. First of all, your Eclipse plugin will suck and not parse your code well, so make sure you aren't using it to debug your grammar. That's what ANTLRWorks is for, and trust me, you want ANTLRWorks. Second of all, I use the ANTLR IDE plugin, and I find it works pretty decently. Here's a link: ANTLR v3 IDE
Step 2: Lexing
With other tools (lex paired with yacc or bison), the lexer and parser are two separate entities that are also developed separately. NOT SO IN ANTLR! Well, not exactly. I mean, they are still separate, and the way you tell them apart is by the first letter of the rule name. If it is capital, it's a lexer rule. If it's lowercase, it's a parser rule.
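Here is a minimal sketch of that naming rule in ANTLR v3 syntax. The grammar name Tiny and every rule name in it are made up for illustration, so swap in your own:

```antlr
grammar Tiny;   // hypothetical name; the file would be saved as Tiny.g

// Parser rule: the name starts with a lowercase letter.
assignStatement : ID '=' INT ';' ;

// Lexer rules: the names start with an uppercase letter.
ID  : ('a'..'z' | 'A'..'Z')+ ;
INT : ('0'..'9')+ ;

// Whitespace is matched by the lexer and thrown away.
WS  : (' ' | '\t' | '\r' | '\n')+ { skip(); } ;
```

The '=' and ';' literals inside the parser rule quietly become lexer tokens too, which is one reason a single combined file is so convenient.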
Most people seem to put all their lexer statements and parser statements in the same file. Also, lexer rules are usually done in ALL CAPS while parser rules are done in camelCase. Not being one to stray from the pack, I shall do the same! Also, I would recommend you follow the pack, which isn't necessarily a bad thing in this situation.
i + d = k
No statement (lexer or parser) may be left-recursive. If a rule is recursive at all, the recursive call has to sit on the right-hand side (more on that in the parsing section). On top of that, lexer statements can only use other lexer statements. Your lexer rules also largely decide what your 'k' will be. 'k' is the lookahead depth: how many characters the lexer must read ahead before it knows which token it is matching. It's like golf, you want a low score. Strive for 1, but it's OK if your k is 2. (It will be anyway if you have both the '=' and '==' operators.) Specify your 'k' in the options block that should have been made apparent to you from the ANTLR site.
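As a rough sketch, this is where the k option goes in an ANTLR v3 grammar. The grammar name and rule names are invented for the example, and k = 2 is only there because '=' and '==' share their first character:

```antlr
grammar Tiny;   // hypothetical name

options {
    k = 2;      // fixed lookahead depth; leave the option out to let ANTLR decide
}

statement : ID (ASSIGN | EQUALS) INT ';' ;

// The lexer has to peek at a second character to tell these two apart.
ASSIGN : '=' ;
EQUALS : '==' ;

ID  : ('a'..'z' | 'A'..'Z')+ ;
INT : ('0'..'9')+ ;
WS  : (' ' | '\t' | '\r' | '\n')+ { skip(); } ;
```

If ANTLR complains that it cannot tell alternatives apart at your chosen k, raising the number (or removing the option) is usually the first thing to try.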
Illegal option output!?
Now that you're using all the same things I use, you will probably get the same bugs as me. First of all, whenever you see "Illegal option output," it's referring to something in your grammar, not your output option. Also, the only good way I currently have to fix these problems is to comment out parts of the grammar until the error goes away (I was trying to use the '!' operator, btw).
Why is it not finding all of my lexer token?
That actually brings me to my next point... the '!' operator is intended to hide things from being sent to the parent node, but BY DEFAULT the last token grouping specified is automatically returned to the parent node unless otherwise specified (we're getting there!). So if you notice that not all of your token is being sent up, it's probably because you need to group it using the '(' ')' operators.
Parsing
Don't worry, parsing is way less confusing than lexing, just remember a couple of things: don't make anything left recursive, and use 'single' quotes for strings you are trying to match.
Recursion? Left, Right, huh?
You might be thinking "Left-recursive means write left-recursive grammar, right?" You're wrong! Write your grammar so that any statement that calls itself has the call on the right side, after it has already matched something. This lets the parser make progress on the rule instead of getting into one of those infinite loops, which would be a really cool roller coaster, btw.
I'm not going to teach you recursion, use Wikipedia and Google. Look up something like "left recursion." God, are you that lazy? Fine! Here's the link: Wikipedia on Left recursion and why it's a bad idea.
If you needed me to tell you that, you are probably a lot more lost than I think you are.
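To make that concrete, here is a small sketch in ANTLR v3 syntax. The rule name expr and the INT token are just placeholders for whatever your grammar uses:

```antlr
// Left-recursive: the first thing expr does is call expr again,
// so ANTLR v3 rejects the rule. Left commented out on purpose.
// expr : expr '+' INT
//      | INT
//      ;

// Right-recursive: match an INT first, then (maybe) call expr on the right.
expr : INT ('+' expr)? ;

// Same language written as a loop, with no recursion at all.
// expr : INT ('+' INT)* ;
```

The loop version is the one you will see most often in example grammars, since it reads more like what it matches.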
Why do some of these examples have "double quotes"?
When I was learning from examples, I ran into some that had double quotes. Most likely those were written for ANTLR v2, which used double quotes for string literals. They don't work in v3. Use single quotes unless you buy the book and see a reason to go the other way.
Building ASTs
Well, now that you have a working language parser, you probably want to implement the cool part that actually does stuff... That isn't what this tutorial is about, but I'll tell you that a lot of people prefer to get this parsed data in something called an AST (Abstract Syntax Tree). The way you usually do this is using ANTLR's rewrite rules. It's way easy, but I couldn't find any documentation that would actually get me to stop getting compiler errors. The trick to these guys is the "->" and "^(" operators.
-> ^('awesome' AST rewrite syntax)
The title pretty much says it all. Basically, rewrite rules start with the "->" symbol, and you make a node by using the '^( )' operator. The first token after the '^(' becomes the root of the new node, everything after it becomes that node's children, and the whole thing gets attached below whatever the parent node is. You need to specify that first token as the type of the node you're creating, otherwise you will get errors. You can use 'single quotes' or a TOKEN as the type. Specify extra tokens at the top of your file, after your options { }, using tokens { COOLNAME; }. COOLNAME can be replaced with whatever you want the token to be called. You can specify as many tokens as you want, you idiot.
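Putting those pieces together, a bare-bones sketch might look like this in ANTLR v3. The grammar name, the ASSIGNMENT token, and the rule names are all invented for the example:

```antlr
grammar Tiny;   // hypothetical name

options {
    output = AST;       // tell ANTLR to build a tree
}

tokens {
    ASSIGNMENT;         // extra token used only as a node type
}

// Matches "x = 5;" and rewrites it into a node with root ASSIGNMENT
// and two children, ID and INT. The '=' and ';' are dropped from the tree.
assignStatement
    : ID '=' INT ';' -> ^(ASSIGNMENT ID INT)
    ;

ID  : ('a'..'z' | 'A'..'Z')+ ;
INT : ('0'..'9')+ ;
WS  : (' ' | '\t' | '\r' | '\n')+ { skip(); } ;
```

Anything you match on the left but leave out on the right of the "->" simply never shows up in the tree, which is handy for punctuation.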
Sample grammar with AST
I bet you all just skipped down here, I know I would have... Here's a language I wrote for the University of Nebraska-Lincoln for their CEENBoT project. You can check them out here: CEENBoT. It was designed to be simple for children. Did I succeed? You be the judge.
I can't show you this right now due to some restrictions... it will be up shortly...
Well that's it!
Yep, that's all I felt like writing on this for now... Maybe I'll add more, maybe I'll never get around to it. Enjoy!