Building a VM in Ruby
Codegram Lightning Talks
Txus (@txustice) - Twitter, GitHub
What is a VM anyway?
- A program that executes other programs
- Portable across platforms
- Generally written in a low-level language
- May be stack or register-based
Stack-based vs Register-based
Stack-based VMs
- JVM, Rubinius, most of the VMs
- Easier to implement
Register-based VMs
- Lua, Perl 6 (Parrot)
- More similar to actual CPUs
- Can be faster (less instructions to execute)
VMs are (practically) language-agnostic
- Language compilers target a particular VM
- VM features may suit better a particular programming paradigm (OO, functional)
- Own bytecode format
Bytecode format
- Compact representation of a program
- Fast to parse and read
Let's build our own!
But first, let's define what a program for our VM will look like.
Typical procedural program: many routines, one of them being the main
entry point.
Now, what does one of these functions look like?
Literals are all user-entered values (strings or fixnums)
used inside this function code.
Locals are all local variables used in the function
(including the arguments passed to it).
The instructions are formed by an opcode and one
operand (in other VMs there may be more operands).
Instructions
As an example, a function that adds 42 and 5 and returns the result will
have 0 locals, its literal table will have two elements
(42 and 5) and its instructions will be the following:
push_literal 0
push_literal 1
add
ret
We push literals 0 and 1 because those are indexes to the literals table
(the 0 is the first element and the 1 is the second).
The final "ret" is essential to every function. It tells it to return to
the caller the topmost value on the stack (in this case the result of adding
42 and 5).
Introducing MC: our bytecode format
_main_
:42,5,"add"
0.0
0.1
0.2
7.2
8
_add_
1.0
1.1
3
8
Instruction set
- 0:push_literal [literal_idx]
- 1:push_local [local_idx]
- 3:add
- 7:call [num_args]
- 8:ret
_foo_ indicates the start of a function called "foo", and :elem,elem2,elem3
declares a literal table of three elements.
Execution steps
- Load a program file.
- Parse it and create all the functions.
- Find the main function and execute it in a new call frame.
- Apply each instruction to the call frame stack. If we find...
- "call": We execute a function in a new call frame and wait for it
to return.
- "ret": Return the topmost element on the stack to our caller. If
we've consumed all the call frames, program terminates printing the
returned value.
You are now aware that...
our newly built VM is called MicroVM.
Sunday afternoon. One hour. Less than 150LOC (< 4kb).
$ wget https://raw.github.com/txus/microvm/master/microvm
$ chmod +x microvm
$ wget https://raw.github.com/txus/microvm/master/functions.mc
$ ./microvm functions.mc
Go check out the code!
https://github.com/txus/microvm
Problem? (I mean, questions?)
By Txus (@txustice) - Twitter, GitHub