Hereās a very simple benchmark: re2ocaml vs ragel-ocaml on recognizing a decimal integer (not parsing, no file input, no buffers - simply running a very primitive lexer function on the same string many times in a loop). Itās just to get the general idea, not a real benchmark yet.
x.rl (ragel code, one action leaving the state machine for integer):
%%{
machine atoi;
write data;
}%%
let atoi data =
let cs = ref 0 in
let p = ref 0 in
let pe = ref (String.length data) in
let res = ref false in
%%{
main := digit+ %{ res := true; } '\n';
write init;
write exec;
}%%
!res
let () =
let s = "1234567890\n" in
(* if not (atoi s) then raise (Failure "error"); *)
for i = 1 to (10 * 1000 * 1000) do
ignore (atoi s)
done
y.re (re2ocaml code):
open String
type state = {
yyinput: string;
mutable yycursor: int;
mutable yymarker: int;
}
%{
re2c:YYFN = ["lex;bool", "yyrecord;state"];
re2c:yyfill:enable = 0;
[0-9]+ [\n] { true }
* { false }
%}
let main () =
let s = "1234567890\n" in
(* let st = {yyinput = s; yycursor = 0; yymarker = 0}
in if not (lex st) then raise (Failure "error"); *)
for i = 1 to (10 * 1000 * 1000) do
let st = {yyinput = s; yycursor = 0; yymarker = 0}
in ignore (lex st)
done
let _ = main ()
Compilation (re2ocaml version 4.0, ragel version 7.0.4 and I used the fastest available mode -F1
):
$ ragel-ocaml -F1 x.rl -o x.ml && ocamlopt x.ml -o x
$ re2ocaml -i y.re -o y.ml && ocamlopt y.ml -o y
Results:
$ time ./x
real 0m0.684s
user 0m0.683s
sys 0m0.000s
$ time ./y
real 0m0.172s
user 0m0.171s
sys 0m0.001s
So re2ocaml is quite a bit faster. Itās easy to explain though, ragelās real fast mode -G2
is not implemented for ocaml, so it generates a table-based lexer which has one extra dereference to make on each character (indirect jump in the table). Perf stat confirms this (look at branch counts):
$ perf stat ./x
Performance counter stats for './x':
671.07 msec task-clock:u # 0.999 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
124 page-faults:u # 184.781 /sec
2,523,955,805 cycles:u # 3.761 GHz
10,350,306,836 instructions:u # 4.10 insn per cycle
2,270,066,024 branches:u # 3.383 G/sec
9,109 branch-misses:u # 0.00% of all branches
0.671469889 seconds time elapsed
0.669477000 seconds user
0.001995000 seconds sys
$ perf stat ./y
Performance counter stats for './y':
172.91 msec task-clock:u # 0.996 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
650 page-faults:u # 3.759 K/sec
649,827,957 cycles:u # 3.758 GHz
2,671,586,889 instructions:u # 4.11 insn per cycle
580,292,838 branches:u # 3.356 G/sec
7,586 branch-misses:u # 0.00% of all branches
0.173548209 seconds time elapsed
0.172154000 seconds user
0.001000000 seconds sys
I also find the code re2ocaml generates easier to read, but it may be my familiarity bias:
Ragel can definitely improve; thereās no reason in principle why it canāt implement fast mode for OCaml (it would have to be based on tail recursion as in re2ocaml, as thereās no goto in OCaml). In general ragerās fast mode and re2c are very close in speed and both very fast (for C, which one is faster often depends on the C compiler, e.g. GCC vs Clang may give different results).