Mochaccino Sprint 2

Overview

Everybody should learn to program a computer... because it teaches you how to think.
Before we get into writing the tokeniser, let’s first take a moment to appreciate how all the moving parts come together to give us a working programming language. The diagram below may help to elucidate the various components of our language:
[Diagram: how the syntax highlighter, tokeniser, parser, and interpreter fit together]
In the last sprint, we set up our syntax highlighter. In this sprint and many subsequent ones, we are going to be working on the compiler — the actual beating heart of the entire language.
The compiler itself is made up of a tokeniser, parser, and interpreter.

Tokeniser

The first thing we want to do is take the user's source code and produce a stream of tokens. A token contains the following data:
Lexeme
Token type
Literal (if applicable)
And also, optionally:
Line number, column number
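As a rough sketch of the shape of this data, here is what a token might look like in Python (used purely for illustration; the field and token-type names are assumptions, not the final design):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

# Hypothetical token types; the real set depends on the language's grammar.
class TokenType(Enum):
    VAR = auto()
    IDENTIFIER = auto()
    STRING = auto()
    EQUALS = auto()
    SEMICOLON = auto()

@dataclass
class Token:
    lexeme: str                        # the raw source text, e.g. 'var'
    type: TokenType                    # what kind of token this is
    literal: Optional[object] = None   # parsed value, only for literals
    line: Optional[int] = None         # optional position info
    column: Optional[int] = None

# A keyword carries no literal value; a string literal does.
kw = Token("var", TokenType.VAR)
s = Token('"hello"', TokenType.STRING, literal="hello")
```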

The parser will then take this list of tokens and produce an Abstract Syntax Tree (AST), which can then be interpreted, transpiled, or compiled. But what's a lexeme anyway? A piece of source code is made up of smaller, distinct lexemes:
var a <str> = "hello";
In this case, var, <, str, >, =, etc. are lexemes because they are small, logical chunks of the language. What isn't a lexeme?
If we were to take just ar out of var, it doesn't make any sense on its own, so it isn't a lexeme. Neither is var a, because it's too long and can be broken down further into smaller chunks. So a token essentially identifies what a certain sequence of characters stands for, where it is located in the source code, and, if it's a literal, its actual value.
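To make the idea concrete, here is a deliberately minimal scanner in Python that splits the example line into lexemes. It is only an illustration under assumed token names; the real tokeniser will be hand-written and handle many more cases (escapes, numbers, comments, error reporting):

```python
import re

# Ordered patterns: earlier entries win when two could match.
TOKEN_SPEC = [
    ("STRING",  r'"[^"]*"'),        # string literals
    ("KEYWORD", r'\bvar\b'),        # keywords before identifiers
    ("IDENT",   r'[A-Za-z_]\w*'),   # identifiers and type names
    ("SYMBOL",  r'[<>=;]'),         # single-character symbols
    ("SKIP",    r'\s+'),            # whitespace, discarded
]
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source: str):
    """Return (token_type, lexeme) pairs, skipping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in PATTERN.finditer(source)
        if m.lastgroup != "SKIP"
    ]

print(scan('var a <str> = "hello";'))
# [('KEYWORD', 'var'), ('IDENT', 'a'), ('SYMBOL', '<'), ('IDENT', 'str'),
#  ('SYMBOL', '>'), ('SYMBOL', '='), ('STRING', '"hello"'), ('SYMBOL', ';')]
```

Note how var a never appears as one lexeme: the scanner always emits the smallest meaningful chunks.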

Parser

The parser then takes this linear stream of tokens and produces an Abstract Syntax Tree (AST). The AST arranges the tokens in a hierarchical manner and serves a few functions, such as giving later stages a structure to traverse during a static analysis pass.
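As a rough sketch, the hierarchy the parser might build for var a &lt;str&gt; = "hello"; could look like the following (Python again for illustration; the node shapes and field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Literal:
    value: object          # the literal's runtime value

@dataclass
class VarDecl:
    name: str              # the variable being declared
    type_annotation: str   # the declared type, e.g. 'str'
    initialiser: Literal   # the expression it is bound to

# The flat token stream becomes a tree: the declaration node owns its parts,
# so an interpreter or analysis pass can walk it top-down.
tree = VarDecl(name="a", type_annotation="str", initialiser=Literal("hello"))
```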

Interpreter
