Build your own programming language with ANTLR
There are numerous programming languages available nowadays, and each one has its own syntax. In compiler design, the set of rules that describes the syntax of a particular language is called its grammar.
Compiler
A compiler is a program that translates source code from a high-level language into a lower-level language.
E.g. C source code is translated into assembly.
Basic Components of a Compiler
Lexer: splits source code into tokens, such as the keywords, identifiers, and literals of a specific programming language.
Parser: identifies patterns in the token stream and builds an Abstract Syntax Tree (AST).
Generator: generates code in the target language.
Whenever the grammar changes, the components above need to change as well. That makes writing a compiler from scratch fairly difficult.
ANTLR
ANTLR (ANother Tool for Language Recognition) makes this task easier by providing a notation for describing grammars. It then generates the lexer and parser source code automatically. Awesome, right? 😍
Motivation
We are going to create a very simple language called simpler 💪
a = 100
b = 150
show 10
show a
show b
The simpler language can only store and display integer variables 😋
Output
10
100
150
Getting Started
Setting up environment
- Install Java and download the ANTLR library
- Add the ANTLR library location to the path variable
First, you need to define your language’s grammar. Create simplerlang.g4
grammar simplerlang;

program : statement+ ;
statement : let | show ;
let : VAR '=' INT ;
show : 'show' (INT | VAR) ;

VAR : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \n\t]+ -> skip ;
let and show are the two kinds of statement: let assigns a value to a variable, and show displays a value (or the value of a variable). INT means integer and VAR means variable.
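To get a feel for what the generated lexer saves you from writing, here is a rough hand-written sketch of the same token rules in plain Java. It is not what ANTLR generates. The class name HandLexerSketch and the "TYPE:text" token format are made up for illustration. Like ANTLR, it reads the longest run of letters first and only then decides whether the word is the show keyword or a VAR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the token rules in simplerlang.g4.
// Illustrative only; this is not the code ANTLR generates.
class HandLexerSketch {

    // Splits source text into tokens written as "TYPE:text".
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\n' || c == '\t') {
                i++; // WS rule: whitespace is skipped
            } else if (c == '=') {
                tokens.add("EQ:=");
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i; // INT rule: one or more digits
                while (i < src.length() && src.charAt(i) >= '0' && src.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("INT:" + src.substring(start, i));
            } else if (c >= 'a' && c <= 'z') {
                int start = i; // VAR rule: one or more lowercase letters
                while (i < src.length() && src.charAt(i) >= 'a' && src.charAt(i) <= 'z') {
                    i++;
                }
                String word = src.substring(start, i);
                // The keyword wins only when the whole word is exactly "show"
                tokens.add(word.equals("show") ? "SHOW:show" : "VAR:" + word);
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [VAR:a, EQ:=, INT:100, SHOW:show, VAR:a]
        System.out.println(tokenize("a = 100\nshow a"));
    }
}
```

Every new token type means another branch here, which is exactly the maintenance work the grammar file takes off your hands.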
Generate Lexer and Parser sources
java -cp antlr-4.7.1-complete.jar org.antlr.v4.Tool simplerlang.g4
If you need a custom package name, use the -package option 😀
java -cp antlr-4.7.1-complete.jar org.antlr.v4.Tool -package simplerlang simplerlang.g4
Do the things you want
Now you can write a handler for each kind of statement in your language. The generated simplerlangBaseListener class has methods that ANTLR calls as it enters and exits each rule, so we can go ahead and use those.
Create simplerlangCustomListener, extend simplerlangBaseListener, and override the methods as shown below.
We need a HashMap to store our variables 😎.
HashMap<String, Integer> variableMap = new HashMap<>();
Handle show statement
@Override
public void exitShow(simplerlangParser.ShowContext ctx) {
    if (ctx.INT() != null) {
        System.out.println(ctx.INT().getText());
    } else if (ctx.VAR() != null) {
        System.out.println(this.variableMap.get(ctx.VAR().getText()));
    }
}
exitShow receives a ctx that holds either an INT or a VAR. If show was given an integer, we just print it. Otherwise, if it was given a variable name, we fetch its value from the HashMap and print that.
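One thing to watch: if a program shows a variable that was never assigned, variableMap.get returns null and the listener prints the text "null". Below is a small sketch of how that lookup could report the problem instead. It is plain Java with no ANTLR types, and the class ShowLookupSketch, its method names, and the error wording are all hypothetical, not part of the original project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing one way to handle undefined variables.
// Not part of the original listener; names are illustrative.
class ShowLookupSketch {
    private final Map<String, Integer> variableMap = new HashMap<>();

    // Mirrors what the let handler does: remember the value under the name.
    void assign(String name, int value) {
        variableMap.put(name, value);
    }

    // Returns the text a show statement could print for a variable name.
    String describe(String name) {
        Integer value = variableMap.get(name);
        if (value == null) {
            return "error: variable '" + name + "' is not defined";
        }
        return value.toString();
    }

    public static void main(String[] args) {
        ShowLookupSketch store = new ShowLookupSketch();
        store.assign("a", 100);
        System.out.println(store.describe("a")); // prints 100
        System.out.println(store.describe("b")); // prints the error message
    }
}
```

Inside exitShow, the same null check could sit right before the println call.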
Handle let statement
@Override
public void exitLet(simplerlangParser.LetContext ctx) {
    this.variableMap.put(ctx.VAR().getText(),
            Integer.parseInt(ctx.INT().getText()));
}
Here ctx gives you both the VAR and the INT, so we put them in the HashMap 🤪
After that, create another Java class, Simplerlang, with a main method to act as the compiler for your own language.
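The INT rule accepts any number of digits, but Integer.parseInt throws a NumberFormatException once the value no longer fits in an int. Here is a minimal sketch of a guarded parse, assuming you would rather report the problem than crash. The class IntLiteralSketch and the method parseLiteral are hypothetical names, not part of the original code.

```java
import java.util.OptionalInt;

// Hypothetical guard around Integer.parseInt for INT literals.
// Illustrative only; the original listener calls parseInt directly.
class IntLiteralSketch {

    // Returns the parsed value, or empty if the digits do not fit in an int.
    static OptionalInt parseLiteral(String text) {
        try {
            return OptionalInt.of(Integer.parseInt(text));
        } catch (NumberFormatException ex) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLiteral("150").isPresent());         // prints true
        System.out.println(parseLiteral("99999999999").isPresent()); // prints false
    }
}
```

In exitLet you could check isPresent() and print an error for the literal instead of storing it.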
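import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public class Simplerlang {
    public static void main(String[] args) {
        try {
            // CharStreams replaces ANTLRFileStream, which is deprecated in ANTLR 4.7
            CharStream input = CharStreams.fromFileName("test.simpler");
            simplerlangLexer lexer = new simplerlangLexer(input);
            simplerlangParser parser = new simplerlangParser(new CommonTokenStream(lexer));
            // The listener is called while parsing, so statements run as they are recognized
            parser.addParseListener(new simplerlangCustomListener());
            parser.program();
        } catch (IOException ex) {
            Logger.getLogger(Simplerlang.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}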
Here test.simpler is your source file. parser.program() starts parsing at the program rule, and the listener executes each statement as it is recognized.
Create test.simpler and write something in your own language
a = 100
b = 150
show 10
show a
show b
Hey, you just played with your own language. Congrats! See the output 🔥
10
100
150
This is my first Medium article. I hope you enjoyed reading it. You can also download the source code for this activity. Since it is on GitHub, your PRs are appreciated too.
ANTLR is a very powerful tool. Ballerina uses ANTLR too.
Happy coding!