OVM
Warning
OVM is a brand-new tool which started being developed in 2024. It is currently in a period of transition. Bear in mind that bugs may appear, and that some features are intended to evolve.
We recommend you to read HAVM Documentation (possibly use it), the legacy tool, which can complement OVM’s functionality & technical understanding. Take note of known bugs.
OVM is a Tree (HIR or LIR, see Intermediate representation TREE)
programs interpreter. OVM is written and maintained by Damien Assire,
Mael Cravero and Ghiles Ziat. Its development has been generously supported
by IRILL.
- Its features are:
evaluate (with tracing) both HIR and LIR
transform the Tree representation (linearize)
perform static checks on the Tree representation
OVM aims to modernize and replace legacy HAVM so that EPITA students could exercise their compiler projects before the final jump to assembly code. It is implemented in OCaml, an industrial-strength functional programming language with an emphasis on expressiveness and safety. OVM was coined on both OCaml, and VM standing for Virtual Machine.
- Resources:
Required version is OVM 0.1
Feedback can be sent to LRE’s Projects Address.
Documentation
Here is a brief summary of the documentation for OVM.
Table of Contents
Invoking OVM
- To invoke
ovmrun:ovm OPTIONS FILE. Where
FILEis a simple text file, andOPTIONSis any combination of the following options:--show-ast: Enable the printing of the produced AST before actually evaluating the program.--quiet: Disable detailed error output.--memory-size: Set the memory size (default :81920).--trace: Print the trace execution.--lin: Linearize the program before evaluation.--lir: Check if the program is low-level (reject high level constructs).--unsafe: Do not perform the static checks.--ssa: Check if the program is written using Static Single Assignment.--version: Display the program version.-help,--help: Display this list of options and exit successfully.
Steps of OVM
- OVM works in three steps:
it scans and parses the file (lexical and syntactical analysis), producing an abstract representation of a program;
it performs requested static checks and transformations;
it loads the resolved program in the virtual machine, search for an entry point labeled
mainand start evaluation.
OVM vs HAVM
OVM & HAVM serve the same purpose, however, there are some notable design/technical differences.
- Nested
jumpin recursive structure handling: One of the problems HAVM had, was the
jumps and how they were handled. Two main problems occurred on compiled programs.- Ineffective break:
The first problem with
jumpin HAVM is with the break in an expression, like this example:ineffective-break.tigwhile 1 do print_int((break; 1))
![graph "Ineffective break"
{
fontname="Source Sans Pro"
node [fontname="consolas"]
node [fontsize=10, shape=none, height=0.25, margin=0]
label="Ineffective break AST"
nodesep=0
n002 ;
n002 [label="seq", width=0.30] ;
n002 -- n003 ;
n003 [label="label l1"] ;
n002 -- n004 ;
n004 [label="cjump ne"] ;
n004 -- n005 ;
n005 [label="const 1"] ;
n004 -- n006 ;
n006 [label="const 0"] ;
n004 -- n007 ;
n007 [label="name l2"] ;
n004 -- n008 ;
n008 [label="name l0"] ;
n002 -- n009 ;
n009 [label="label l2"] ;
n002 -- n010 ;
n010 [label="sxp"] ;
n010 -- n011 ;
n011 [label="call"] ;
n011 -- n012 ;
n012 [label="name print_int", width=1] ;
n011 -- n013 ;
n013 [label="eseq"] ;
n013 -- n014 ;
n014 [label=<<b>jump</b>>] ;
n014 -- n015 ;
n015 [label=<<b>name l0</b>>] ;
n013 -- n016 ;
n016 [label="const 1"] ;
n002 -- n017 ;
n017 [label="jump"] ;
n017 -- n018 ;
n018 [label="name l1"] ;
n002 -- n019 ;
n019 [label="label l0"] ;
}](../../_images/graphviz-6638715a156c876dff1b8b5f2b1a6054ec355caa.png)
ineffective-break.png
In this example, HAVM evaluates the
eseqby evaluating thejumpand the statement after it. However, it will then return to theeseqand will continue its evaluation by evaluating theconst 1and returning1, doing it indefinitely because of thewhile.OVM correctly handles this case:
tc -H ineffective-break.tig > ineffective-break.hir$ tc -H ineffective-break.tig > ineffective-break.hir $ echo $? 0
ovm ineffective-break.hir$ ovm ineffective-break.hir $ echo $? 0
- Ineffective boolean operator:
Another means to generate a
jumpthat breaks the recursive evaluation is using optimized if, like the following example.ineffective-if.tigprint_int (if 0 | 0 then 0 else 1)
![graph "Ineffective break"
{
fontname="Source Sans Pro"
node [fontname="consolas"]
node [fontsize=10, shape=none, height=0.25, margin=0]
label="Ineffective boolean operator AST"
nodesep=0
n001 ;
n001 [label="call"]
n001 -- n002
n002 [label="name print_int", width=1]
n001 -- n003
n003 [label="eseq"]
n003 -- n004
n004 [label="seq", width=0.30]
n004 -- n005
n005 [label="seq", width=0.30]
n005 -- n006
n006 [label="cjump ne"]
n006 -- n007
n007 [label="const 0"]
n006 -- n008
n008 [label="const 0"]
n006 -- n009
n009 [label="name l3"]
n006 -- n010
n010 [label="name l4"]
n005 -- n032
n032 [label="label l3"]
n005 -- n011
n011 [label="cjump ne"]
n011 -- n012
n012 [label="const 1"]
n011 -- n013
n013 [label="const 0"]
n011 -- n014
n014 [label="name l0"]
n011 -- n015
n015 [label="name l1"]
n005 -- n033
n033 [label="label l4"]
n005 -- n016
n016 [label="cjump ne"]
n016 -- n017
n017 [label="const 0"]
n016 -- n018
n018 [label="const 0"]
n016 -- n019
n019 [label="name l0"]
n016 -- n020
n020 [label="name l1"]
n004 -- n021
n021 [label="label l0"]
n004 -- n022
n022 [label="move", width=0.45]
n022 -- n023
n023 [label="temp t0"]
n022 -- n024
n024 [label="const 0"]
n004 -- n025
n025 [label="jump", width=0.45]
n025 -- n026
n026 [label="name l2"]
n004 -- n027
n027 [label="label l1"]
n004 -- n028
n028 [label="move", width=0.45]
n028 -- n029
n029 [label="temp t0"]
n028 -- n030
n030 [label="const 1"]
n004 -- n031
n031 [label="label l2"]
}](../../_images/graphviz-ffe008b21301c79312cc36f5c7ceca700031a1e7.png)
ineffective-if.png
OVM correctly handles this case, compare the traces below:
tc -H ineffective-if.tig > ineffective-if.hir$ tc -H ineffective-if.tig > ineffective-if.hir $ echo $? 0
ovm --trace ineffective-if.hir | ansi2txt$ ovm --trace ineffective-if.hir | ansi2txt Parsing ineffective-if.hir eval seq eval sxp evalCall print_int evalEseq eval seq eval seq evalCjump const 0 const 0 ne 0 0 Jump to label l4 label l4 evalCjump const 0 const 0 ne 0 0 Jump to label l1 label l1 eval move const 1 move : temp t0 <- 1 label l2 end seq temp t0 = 1 Eseq -> 1 call print_int 1 print_int 1 -> void sxp : void eval sxp const 0 sxp : 0 end seq label end Exit with code : 0 Temporaries state : fp : 81916 sp : 81916 rv : 0 t0 : 1 1 $ echo $? 0havm --trace ineffective-if.hir$ havm --trace ineffective-if.hir checkingLow plaining unparsing checking evaling call ( name main ) [] 10.6-43.15: eseq 11.6-42.13: seq 12.8-30.15: seq 14.12-14.19: const 0 15.12-15.19: const 0 13.10-17.19: cjump ne 0 0 ( name l3 ) ( name l4 ) 26.12-26.19: const 0 27.12-27.19: const 0 25.10-29.19: cjump ne 0 0 ( name l0 ) ( name l1 ) 40.10-40.17: const 1 38.8-40.17: move ( temp t0 ) 1 41.8-41.16: label l2 12.8-30.15: seq end 31.8-31.16: label l0 34.10-34.17: const 0 32.8-34.17: move ( temp t0 ) 0 35.8-36.17: jump ( name l2 ) 11.6-42.13: seq end 43.8-43.15: temp t0 = 0 8.4-44.12: call ( name print_int ) [0] 8.4-44.12: end call ( name print_int ) [0] = () 7.2-44.12: sxp () 46.4-46.11: const 0 45.2-46.11: sxp 0 end call ( name main ) [] = 0 0 $ echo $? 0
- Proposed solutions:
To address these problems, we chose to use exceptions, a mechanism that is not idiomatic in the Haskell programming language for managing control flow. The approach involves raising an exception when encountering a
jump. This exception is caught at a point where the code for thejump’s destination can be evaluated, effectively bypassing the control flow of the previously executing code.One of the encountered problems was getting to the destination quickly. A possible way was scanning all the sequences until finding the destination label. But this method is particularly slow.
First approach:
For the first approach, we use a map which maps to a label a destination.
Unlike HAVM, we don’t “plain” the sequences of the program. Instead, we use a stack of sequences. Each cell of the stack contains the end of a sequence, after the destination label or the sub-sequence containing it, which has to be evaluated. The advantage of this is that we can restore the sub-sequences without loosing information for the traces.
In this example, the stack contain two sequences, in red:
[C, (seq [D, E])][F, G]
![graph {
fontname="Source Sans Pro"
node [fontname="consolas"]
node [fontsize=10, shape=none, height=0.25, margin=0]
n001 [label="seq", width=0.3] ;
n001 -- n002 ;
n002 [label="A", width=0.2]
n001 -- n003
n003 [label="seq", width=0.3]
n003 -- n004
n004 [label="seq", width=0.3]
n004 -- n005
n005 [label="jump", width=0.45]
n005 -- n006
n006 [label="name l0"]
n004 -- n007
n007 [label="B", width=0.2]
n003 -- n008
n008 [label="label l0"]
subgraph {
cluster=true ;
shape=box
node [fontcolor="red2"]
n003 -- n009
n009 [label="C", width=0.2]
n003 -- n010
n010 [label="seq", width=0.3]
n010 -- n011
n011 [label="D", width=0.2]
n010 -- n012
n012 [label="E", width=0.2]
n001 -- n013
n013 [label="F", width=0.2]
n001 -- n014
n014 [label="G", width=0.2]
}
}](../../_images/graphviz-74a1dd9b5de9d2641f3a985a8608e51016e7b1ef.png)
first-approach.png
Note that
DandEare not in their own cell in the stack, because the sequence they belong to is not currently being evaluated.The exception is caught at the beginning of the program’s evaluation, and then the corresponding stack is fetched and then evaluated.
The problem of this approach is the
eseq. The stack doesn’t take account of aneseq, and will not restore it nor evaluate it, even if thejumpdoesn’t go out of its statements.Second approach:
In the second approach, instead of catching the
jumpat the beginning of the evaluation, it is also caught by theeseq.To do so, instead of having a unique map, we have one associated to each
eseq, with a physical equality. Each map has the labels of the statements in its statement child, excluding those in childeseq’s statements. When ajumpis caught, if the destination label is not in theeseq’s map, it is propagated.The exception is also caught at the beginning of the program, but not propagated in this case.