Overview

What goes into creating a programming language?


Language Design

This step matters most if you’re building a serious programming language that will actually be released to the public. If that is your goal, do not skip it.

Note

The Insidion Engine is meant for simple, small-scale engines that parse relatively simple grammars. It can handle complex grammars, but if you’re aiming for a production-grade engine, you are better off with a different architecture.


The 5 Tasks of the Engine

Your source code goes through 5 stages of processing before it can run.

Scanning

The first step is scanning. The scanner takes your source code, splits it up into characters, and strips out comments.

func myAmazingFunc() {
    // single comment
    /* multi
       line
       comment */
}
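As a rough illustration of the scanning step, here is a minimal Python sketch. It assumes the `//` and `/* ... */` comment styles used in the example, assumes block comments do not nest, and ignores string literals entirely. The function name `scan` is made up for this sketch and is not part of the Insidion Engine.

```python
def scan(source: str) -> list[str]:
    """Return the characters of `source` with comments stripped out.

    Assumptions of this sketch: `//` starts a line comment, `/* ... */`
    is a block comment that does not nest, and there are no string literals.
    """
    chars: list[str] = []
    i = 0
    while i < len(source):
        if source.startswith("//", i):
            # Skip to the end of the line, but keep the newline itself.
            while i < len(source) and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated block comment")
            i = end + 2
        else:
            chars.append(source[i])
            i += 1
    return chars


# Usage: the comments disappear, everything else survives character by character.
print("".join(scan("var a = b; // note\nvar c /* inline */ = d;")))
```

Keeping the newline after a line comment means later stages can still count lines for error messages.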

Lexing

The next step is lexing, also known as lexical analysis or tokenisation. The lexer takes the list of characters and produces tokens. It does so superficially, recording the shape of each statement without validating its parts. Take the following statement, for instance:

var myVar = !anotherVar;

produces the following tokens:

{
    "type": "variable.declaration",
    "name": "myVar",
    "value": {
        "type": "operator.unary.complement",
        "operand": {
            "type": "variable.reference",
            "name": "anotherVar"
        }
    }
}

Essentially, the lexer produces a rudimentary abstract syntax tree (AST), showing the rough outline of the program. If a math operation is in play, it ensures there are expressions on both the left- and right-hand sides. Validating those expressions is the parser's job. Hence, the lexer catches most syntax errors, while the parser catches semantic errors such as undefined variables and mismatched types.
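To make the lexing step concrete, here is a minimal Python sketch that turns a `var <name> = <expression>;` statement into a rough tree. It follows this document's usage, where the lexer builds an outline rather than a flat token list. It only understands identifiers and the `!` operator, and the helper names (`split_words`, `lex_expression`, `lex_declaration`) are hypothetical, not the Insidion Engine's API.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
# One "word" is an identifier or one of the symbols =, !, ; (sketch grammar only).
WORD_RE = re.compile(r"\s*([A-Za-z_]\w*|[=!;])")


def split_words(source: str) -> list[str]:
    """Split the source into identifiers and symbols, skipping whitespace."""
    source = source.rstrip()
    words, pos = [], 0
    while pos < len(source):
        match = WORD_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        words.append(match.group(1))
        pos = match.end()
    return words


def lex_expression(words: list[str]) -> dict:
    """Consume words from the front of the list and build an expression node."""
    if not words:
        raise SyntaxError("expected an expression")
    head = words.pop(0)
    if head == "!":
        return {"type": "operator.unary.complement",
                "operand": lex_expression(words)}
    if IDENT_RE.fullmatch(head):
        return {"type": "variable.reference", "name": head}
    raise SyntaxError(f"unexpected {head!r} in expression")


def lex_declaration(source: str) -> dict:
    """Lex a single `var <name> = <expression>;` statement."""
    words = split_words(source)
    if (len(words) < 5 or words[0] != "var"
            or not IDENT_RE.fullmatch(words[1])
            or words[2] != "=" or words[-1] != ";"):
        raise SyntaxError("expected: var <name> = <expression>;")
    rest = words[3:-1]
    value = lex_expression(rest)
    if rest:
        raise SyntaxError(f"unexpected {rest[0]!r} after expression")
    return {"type": "variable.declaration", "name": words[1], "value": value}


tree = lex_declaration("var myVar = !anotherVar;")
print(tree["value"]["operand"]["name"])
```

Note that this sketch never asks whether `anotherVar` exists or what type it has. It only checks that the statement has the right shape.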

Parsing

The parser checks the validity of each token in the basic syntax tree produced by the lexer. Take the following token, for example:

{
    "type": "operator.unary.complement",
    "operand": {
        "type": "variable.reference",
        "name": "anotherVar"
    }
}

The parser sees that the operand is a variable reference, and looks it up among its known variables to check whether it is defined. It also checks whether the variable is of the boolean type, which the complement operation requires. If there’s an issue with any part of the validation, the parser can respond in a few different ways, discussed later on.
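The two checks described here can be sketched in a few lines of Python. The symbol table is assumed to be a plain dictionary mapping variable names to type names, and the function name `check_expression` is hypothetical. Raising an exception is only the simplest possible response, chosen to keep the sketch short.

```python
def check_expression(node: dict, symbols: dict[str, str]) -> str:
    """Validate an expression node and return the name of its type.

    `symbols` is an assumed symbol table: variable name -> type name.
    """
    kind = node["type"]
    if kind == "variable.reference":
        name = node["name"]
        if name not in symbols:
            raise NameError(f"undefined variable {name!r}")
        return symbols[name]
    if kind == "operator.unary.complement":
        operand_type = check_expression(node["operand"], symbols)
        if operand_type != "boolean":
            raise TypeError(f"'!' needs a boolean operand, got {operand_type}")
        return "boolean"
    raise ValueError(f"unknown node type {kind!r}")


token = {
    "type": "operator.unary.complement",
    "operand": {"type": "variable.reference", "name": "anotherVar"},
}
print(check_expression(token, {"anotherVar": "boolean"}))
```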

Compiling

The compiler then takes a valid AST and transforms it into a lower-level form. Targets include assembly for a specific processor, or an intermediate representation such as LLVM IR, which the LLVM toolchain then turns into machine code. You can also create your own bytecode VM, which is a virtual machine that ships together with your scanner, lexer, parser, and compiler. For a bytecode VM, your compiler needs to convert the AST into a bytecode format.
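For the bytecode route, the conversion can be sketched as a walk over the tree that emits instructions for a stack machine. The instruction names `LOAD`, `NOT`, and `STORE` are invented for this sketch, as is the function name `compile_node`. A real bytecode format would usually be a compact binary encoding rather than Python tuples.

```python
def compile_node(node: dict) -> list[tuple]:
    """Translate an AST node into a list of stack-machine instructions.

    Hypothetical instruction set: ("LOAD", name) pushes a variable's value,
    ("NOT",) replaces the top of the stack with its complement, and
    ("STORE", name) pops the top of the stack into a variable.
    """
    kind = node["type"]
    if kind == "variable.reference":
        return [("LOAD", node["name"])]
    if kind == "operator.unary.complement":
        # Emit the operand first so its value is on the stack for NOT.
        return compile_node(node["operand"]) + [("NOT",)]
    if kind == "variable.declaration":
        return compile_node(node["value"]) + [("STORE", node["name"])]
    raise ValueError(f"cannot compile node type {kind!r}")


ast = {
    "type": "variable.declaration",
    "name": "myVar",
    "value": {
        "type": "operator.unary.complement",
        "operand": {"type": "variable.reference", "name": "anotherVar"},
    },
}
for instruction in compile_node(ast):
    print(instruction)
```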

Executing

If you have a bytecode VM, things are easier: the VM runs your bytecode directly, and the same bytecode is portable across every platform the VM itself runs on.
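A bytecode VM can be as small as a loop over instructions and a stack. The sketch below assumes a made-up three-instruction set (`LOAD`, `NOT`, `STORE`) and stores variables in a dictionary. The function name `run` is hypothetical and nothing here is the Insidion Engine's actual design.

```python
def run(program: list[tuple], variables: dict) -> dict:
    """Execute a list of instructions on a tiny stack machine.

    Hypothetical instruction set: ("LOAD", name), ("NOT",), ("STORE", name).
    Returns the variables dictionary after execution.
    """
    stack = []
    for instruction in program:
        op = instruction[0]
        if op == "LOAD":
            stack.append(variables[instruction[1]])
        elif op == "NOT":
            stack.append(not stack.pop())
        elif op == "STORE":
            variables[instruction[1]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return variables


# Bytecode for `var myVar = !anotherVar;` in the made-up instruction set.
program = [("LOAD", "anotherVar"), ("NOT",), ("STORE", "myVar")]
print(run(program, {"anotherVar": True}))
```

Because the loop is ordinary Python, the same `program` list runs anywhere Python does, which is the portability benefit in miniature.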

