utop # int_of_string "0b001";;
- : int = 1
utop # int_of_string "0xff";;
- : int = 255
int_of_string parses hexadecimal, binary, and octal notations in addition to the usual decimal notation. This is often surprising and can lead to subtle errors when the function is used to validate input syntax. Should the Int module contain conversion functions that are more explicit about what they accept and that have an efficient implementation?
Agreed. I think validation of input syntax should be done, and often is done, before calling int_of_string. For example, a lexer naturally checks the syntax of integer literals before converting them to int with int_of_string. The same goes for validating input fields in HTML forms.
I am aware that this behaviour is documented, but I think it would have been better to relegate such flexible behaviour to a function inside a module that is not open by default. It is too late for that now, hence my suggestion to add a stricter function to the Int module. We can probably agree that most code does not perform syntactic checks before calling int_of_string and so leaves itself open to surprises.
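To make the suggestion concrete, here is a minimal sketch of what a stricter conversion could look like. The name strict_int_of_string_opt is hypothetical and not part of the standard library, and the accepted syntax (an optional leading '-' followed by one or more ASCII digits) is just one possible choice. It checks the syntax first and then delegates to the existing int_of_string_opt, so it still returns None on overflow.

```ocaml
(* Hypothetical helper, not part of Stdlib: accepts only an optional
   leading '-' followed by one or more ASCII decimal digits. *)
let strict_int_of_string_opt (s : string) : int option =
  let len = String.length s in
  (* Skip a single leading minus sign, if present. *)
  let start = if len > 0 && s.[0] = '-' then 1 else 0 in
  let is_digit c = c >= '0' && c <= '9' in
  (* True when every character from index i onwards is a digit. *)
  let rec all_digits i =
    i >= len || (is_digit s.[i] && all_digits (i + 1))
  in
  if start >= len || not (all_digits start) then None
  else int_of_string_opt s (* still rejects values that do not fit in int *)

let () =
  List.iter
    (fun s ->
      match strict_int_of_string_opt s with
      | Some n -> Printf.printf "%S -> %d\n" s n
      | None -> Printf.printf "%S -> rejected\n" s)
    [ "42"; "-7"; "0xff"; "0b001"; "1_000"; "" ]
```

This sketch scans the string twice (once to validate, once to convert), so it is not the efficient implementation asked for above; a version inside the Int module could presumably do both in a single pass.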