Bytecode

Bytecode, also known as p-code (portable code), is a machine-language-like representation of a program. Unlike actual machine code, it works at a higher level of abstraction: bytecode is commonly bespoke to the language it is meant to be used with, so it can directly support high-level operations such as closures, memory management and dynamic typing.

Bytecode is usually meant to be run by a virtual machine. This is common for scripting languages as a way to achieve usable performance: being a compact, linear representation, bytecode is more cache-friendly than an AST and so allows faster execution.
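To make the contrast with an AST concrete, here is a minimal sketch in Python. It is not tied to any real virtual machine: the opcodes `PUSH`, `ADD` and `MUL` and the helpers `compile_expr` and `run` are invented for this illustration. The same expression is stored once as a nested tree and once as a flat instruction list, and a small dispatch loop executes the flat form.

```python
# Hypothetical mini instruction set, invented for this example.
PUSH, ADD, MUL = range(3)
BINARY = {"+": ADD, "*": MUL}

def compile_expr(node, code=None):
    """Flatten a nested-tuple AST into a flat list of (opcode, operand) pairs."""
    code = [] if code is None else code
    if isinstance(node, tuple):
        operator, left, right = node
        compile_expr(left, code)    # operands are emitted first (post-order walk)
        compile_expr(right, code)
        code.append((BINARY[operator], None))
    else:
        code.append((PUSH, node))   # a leaf is a numeric constant
    return code

def run(code):
    """Execute the flat instruction list using an operand stack."""
    stack = []
    for opcode, operand in code:
        if opcode == PUSH:
            stack.append(operand)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == ADD else left * right)
    return stack.pop()

ast = ("+", 1, ("*", 2, 3))         # 1 + 2 * 3 as a tree
bytecode = compile_expr(ast)
print(len(bytecode), run(bytecode))
```

The tree has to be walked through nested references, while the instruction list is a single contiguous sequence visited front to back, which is the property the cache-friendliness argument relies on.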

In other cases, bytecode is used as an intermediate representation for compiled languages, before it is lowered to more machine-specific code.

Examples

PL/0 p-code

  • lit (load constant)
  • opr (operation; this single opcode packs several arithmetic operations, selected by its operand)
  • lod (load variable)
  • sto (store variable)
  • cal (call procedure)
  • int (increment the stack-top register, reserving stack space)
  • jmp (unconditional jump)
  • jpc (conditional jump)
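As a rough illustration of how these opcodes fit together, the following Python sketch interprets a simplified subset of them. It is not Wirth's interpreter: `cal` and static nesting levels are left out, variables live at absolute stack addresses, `jpc` is assumed to jump when the popped value is 0, and the sub-operations of `opr` are named by strings (`"add"`, `"mul"`, `"leq"`) chosen for this example rather than by their original numeric codes.

```python
# Sub-operations of `opr`; the string names are an assumption of this sketch.
OPERATIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "leq": lambda a, b: int(a <= b),
}

def run_pcode(code):
    """Run a list of (opcode, argument) pairs and return the final stack."""
    stack = []   # variables sit at the bottom, temporaries on top
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "int":      # reserve `arg` stack slots
            stack.extend([0] * arg)
        elif op == "lit":    # push a constant
            stack.append(arg)
        elif op == "lod":    # push the variable at address `arg`
            stack.append(stack[arg])
        elif op == "sto":    # pop into the variable at address `arg`
            stack[arg] = stack.pop()
        elif op == "opr":    # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[arg](left, right))
        elif op == "jmp":
            pc = arg
        elif op == "jpc":    # jump when the popped condition is false (0)
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack

# Factorial of 5: address 0 holds `res`, address 1 holds `i`.
FACTORIAL = [
    ("int", 2),
    ("lit", 1), ("sto", 0),                               # res = 1
    ("lit", 1), ("sto", 1),                               # i = 1
    ("lod", 1), ("lit", 5), ("opr", "leq"), ("jpc", 18),  # while i <= 5
    ("lod", 0), ("lod", 1), ("opr", "mul"), ("sto", 0),   # res = res * i
    ("lod", 1), ("lit", 1), ("opr", "add"), ("sto", 1),   # i = i + 1
    ("jmp", 5),
]

print(run_pcode(FACTORIAL)[0])
```

Even in this reduced form the pattern is visible: every statement becomes a short run of loads, one operation and a store, and control flow is expressed only through jump targets.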

TODO more info

Lua bytecode

Lua uses a register-based virtual machine. It was originally stack-based; the switch to registers was made in version 5.0 and improved performance.
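One way to see why a register design can help: a statement such as `res = res * i` takes several instructions on a stack machine but a single three-operand instruction on a register machine, so the dispatch loop runs fewer times. The Python sketch below is only an illustration with invented opcode names (`LOAD`, `STORE`, `MUL`) and helper functions; it does not reproduce either generation of the Lua VM.

```python
def run_stack(code, variables):
    """Stack machine: operands are pushed and popped around each operation."""
    stack = []
    for op, name in code:
        if op == "LOAD":
            stack.append(variables[name])
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "STORE":
            variables[name] = stack.pop()
    return len(code)   # number of instructions dispatched

def run_register(code, registers):
    """Register machine: each instruction names its operands directly."""
    for op, dest, left, right in code:
        if op == "MUL":
            registers[dest] = registers[left] * registers[right]
    return len(code)   # number of instructions dispatched

# res = res * i, with res = 6 and i = 4
variables = {"res": 6, "i": 4}
stack_code = [("LOAD", "res"), ("LOAD", "i"), ("MUL", None), ("STORE", "res")]

registers = [6, 4]                    # R0 = res, R1 = i
register_code = [("MUL", 0, 0, 1)]    # R0 = R0 * R1

print(run_stack(stack_code, variables), variables["res"])
print(run_register(register_code, registers), registers[0])
```

Both versions compute the same value; the register form trades wider instructions for fewer of them.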

Factorial program in bytecode form

function fact(n)
  local res = 1
  for i = 1, n do
    -- Lua has no compound assignment operators (such as *=)
    res = res * i
  end
  return res
end
print(fact(5))

We compile to bytecode and dump it with:

luac -p -l -l fact.lua

Annotated (by me) bytecode output:

Toplevel function:                                                         main <fact.lua:0,0> (9 instructions at 0x58e78aa2ecc0)
                                                                           0+ params, 3 slots, 1 upvalue, 0 locals, 2 constants, 1 function
Varargs, 0 fixed arguments expected                                         1   [1] VARARGPREP  0
Create a function                                                           2   [8] CLOSURE     0 0     ; 0x58e78aa2ef20
Bind it to the global name "fact" in _ENV                                   3   [1] SETTABUP    0 0 0   ; _ENV "fact"
Load the `print` function from _ENV into R0                                 4   [9] GETTABUP    0 0 1   ; _ENV "print"
Load the `fact` function from _ENV into R1                                  5   [9] GETTABUP    1 0 0   ; _ENV "fact"
Load 5 into R2 (first argument for `fact`)                                  6   [9] LOADI       2 5
Call function at R1 with 2-1 args, results kept up to `top` ^1              7   [9] CALL        1 2 0   ; 1 in all out
Call function at R0 with args up to `top`, no return value                  8   [9] CALL        0 0 1   ; all in 0 out
Return from toplevel (exit program)                                         9   [9] RETURN      0 1 1   ; 0 out
                                                                           constants (2) for 0x58e78aa2ecc0:
The function name, as a string constant                                     0       S       "fact"
Ditto for the `print` function                                              1       S       "print"
                                                                           locals (0) for 0x58e78aa2ecc0:
                                                                           upvalues (1) for 0x58e78aa2ecc0:
_ENV is the table that contains the global environment                      0       _ENV    1       0
                                                

The `fact` function:                                                       function <fact.lua:1,8> (10 instructions at 0x58e78aa2ef20)
                                                                           1 param, 6 slots, 0 upvalues, 6 locals, 0 constants, 0 functions
Load 1 into R1 (`res`)                                                      1   [2] LOADI       1 1
Load 1 into R2, the for loop's initial value                                2   [3] LOADI       2 1
Move R0 (`n`) into R3, the for loop limit                                   3   [3] MOVE        3 0
Load 1 into R4, the for loop step (1 by default)                            4   [3] LOADI       4 1
Initialize the for loop ^2                                             .--- 5   [3] FORPREP     2 2     ; exit to 9
Multiply: R1 (`res`) = R1 * R5 (`i`)                                   | .> 6   [5] MUL         1 1 5
Attempt to execute the metamethod `__mul` ^3                           | |  7   [5] MMBIN       1 5 8   ; __mul
Do a for loop iteration                                                | `- 8   [3] FORLOOP     2 3     ; to 6
Return from function with 1 value, R1 (`res`)                          `--> 9   [7] RETURN1     1
Return with no values (redundant bytecode) ^4                               10  [8] RETURN0
                                                                           constants (0) for 0x58e78aa2ef20:
Local bindings for this function, with their registers:                    locals (6) for 0x58e78aa2ef20:
                                                                            0       n               1       11
                                                                            1       res             2       11
                                                                            2       (for state)     5       9
                                                                            3       (for state)     5       9
                                                                            4       (for state)     5       9
                                                                            5       i               6       8
                                                                           upvalues (0) for 0x58e78aa2ef20:

Notes:

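  1. The `top` is a special marker in the virtual machine, originally meaning the "top" of the stack (when Lua had stack-based bytecode). When a call returns a variable number of results ("all out"), they are left in the registers up to `top`, and the next call ("all in") takes everything up to `top` as its arguments. Lua still uses a stack model for the C API.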
  2. The first argument to this opcode is a base register A in the register file; FORPREP works on the four registers starting there. R(A) holds the initial value and then the loop's internal state, R(A + 1) the loop limit, R(A + 2) the loop step value (1 here) and R(A + 3) the external state visible to the loop body, `i` in this case. The second argument is the jump offset to the instruction after the end of the loop, taken when the body should not run at all.
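  3. This lookup is a fallback rather than redundant work. In Lua 5.4 every arithmetic opcode is followed by an MMBIN instruction: when both operands are numbers, the VM performs the operation and skips the MMBIN; only when the arithmetic opcode cannot handle its operands does execution fall through to MMBIN, which looks up and calls the metamethod (`__mul` here).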
  4. This redundant bytecode is meant for functions that return nothing. It probably exists here because the compiler always appends it to the end of every function and does not bother to check afterwards whether it is redundant, which keeps the bytecode compiler simple and fast.
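To check this reading of the listing, the ten instructions of `fact` can be replayed by a small register machine. The Python sketch below is a simplified model written for this article, not a port of the Lua VM: it handles only integers, assumes a positive step, keeps a plain counter in R(A) instead of precomputing an iteration count, and applies jump offsets so that they land where the `luac` comments say (`exit to 9`, `to 6`). The names `FACT` and `run_fact` are made up for the example.

```python
# The `fact` function's instructions, copied from the dump (operands only).
FACT = [
    ("LOADI", 1, 1),
    ("LOADI", 2, 1),
    ("MOVE", 3, 0),
    ("LOADI", 4, 1),
    ("FORPREP", 2, 2),
    ("MUL", 1, 1, 5),
    ("MMBIN", 1, 5, 8),
    ("FORLOOP", 2, 3),
    ("RETURN1", 1),
    ("RETURN0",),
]

def run_fact(n):
    reg = [0] * 6          # the dump reports 6 slots
    reg[0] = n             # the parameter `n` arrives in R0
    pc = 0                 # 0-based index into FACT
    while True:
        op, *args = FACT[pc]
        pc += 1            # pc now points at the next instruction
        if op == "LOADI":
            reg[args[0]] = args[1]
        elif op == "MOVE":
            reg[args[0]] = reg[args[1]]
        elif op == "FORPREP":
            a, offset = args
            if reg[a] > reg[a + 1]:      # loop would not run: skip past it
                pc += offset + 1
            else:
                reg[a + 3] = reg[a]      # expose the counter as `i`
        elif op == "MUL":
            a, b, c = args
            reg[a] = reg[b] * reg[c]
            pc += 1                      # integer operands: skip the MMBIN
        elif op == "MMBIN":
            raise NotImplementedError("metamethods are outside this sketch")
        elif op == "FORLOOP":
            a, offset = args
            reg[a] += reg[a + 2]         # advance the counter by the step
            if reg[a] <= reg[a + 1]:
                reg[a + 3] = reg[a]
                pc -= offset             # jump back to the loop body
        elif op == "RETURN1":
            return reg[args[0]]
        elif op == "RETURN0":
            return None

print(run_fact(5))
```

In this model the MMBIN instruction is never reached and RETURN0 is never executed, which matches notes 3 and 4.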

TODO