Building an interpreter in Go: Scanner

2020-04-20

I recently discovered a book that walks through the nitty-gritty of creating a programming language (Crafting Interpreters). So far it's been excellent!

The book walks through implementing a language called Lox in two different ways:

As a tree-walk interpreter written in Java (easier, but slower at runtime)
As a compiler written in C (more involved, but faster at runtime)

I've started following along with the first implementation, but writing in Go rather than Java, since I already know Java and have wanted to try Go for some time.

Then I realized I could compile my Go program to WebAssembly and call it from a web browser!

Here's the first milestone of my project:

This text box accepts Lox code, sends it to WebAssembly, and the page animates the list of tokens returned by the scanner. Try changing the input to see the resulting tokens!

The scanner is the simplest part of a programming language interpreter, so this first post is really about connecting a few different technologies.

A summary of the parts:

Scanner

The scanner takes the input code and creates a list of tokens that will be passed to the parser.
The parser will turn the tokens into an Abstract Syntax Tree representing the code. More on the parser in a future post.
Hover over the output above to see information about each token.
The blue highlights represent each token.
The red underline represents the character currently being evaluated.
The green cursor represents the start of the current token.
The scanner doesn't interpret the tokens -- you could put in var x = "hi""there" and it would produce tokens.
My scanner is written in Go.
Here's a link to the scanning chapter in Crafting Interpreters

Compilation to WebAssembly

WebAssembly (wasm) is a binary instruction format that browsers know how to execute. Many languages compile to wasm, including Go!
The Go binary produced is absurdly large (2.4MB!). The tinygo compiler will make smaller binaries, but I had issues with it and haven't set it up yet.
Go support for wasm felt clunky to me. The Go code creates new functions on window which just feels wrong.
I treat the Go binary as a library, and call it from javascript to get a list of the every step performed by the scanner. I log these steps to the console if you want to take a look!
Asking for every step is unsustainably slow for anything but a toy example, but it works for the visuals I wanted to make.

Visualization in d3.js

This was my first time playing with d3, a very popular visualization library for javascript.
It took a few hours to understand d3's selection api, but now that I do, I find it really intuitive and flexible. This blog post was the most helpful resource I found while figuring things out.
The basic idea is to select some group of objects (for example, a list of rectangles in an svg), and bind it to a list of data points (for example, a list of tokens to display). Then you tell d3 how to draw each data point at three different moments:
- When the data point is entering. Here we draw the object for the first time.
- When the data point is updating. Here we update any attributes of the object that need updating -- e.g. the position of the data point may have changed.
- When the data point is exiting. Typically we would just remove the object in this step.
- d3 decides which objects are entering/updating/exiting based on the size of the selection vs. the size of the data points. If the dataset has more elements than the selection, we enter the new ones and update the existing ones!
Using my code as an example, a selection might be the highlights -- svg rect elements. The data points are the tokens for the current step of the scanner, and I change the current step every 150ms.
The entire source code is less than 200 lines of javascript (that I had to write) including comments!

My next project is to visualize the parser as it's parsing the tokens into an AST!