Redefining the string index operator

bluddy · January 15, 2023, 8:03am

I know that we’ve allowed indexing operators to be redefined. However, most of them are quite awkward with the extra syntax (e.g. .%[...]). We already need to ‘pay’ the extra cost of the dot in OCaml syntax (i.e. val.[...] vs just val[...]).

Would it be possible to also allow redefinition of the string indexing operators themselves? Using these operators on ASCII string indexing seems like a waste of prime real-estate that could be used for many more important tasks:

let x = Hashtbl.create 10
open Hashtbl.Infix
x.["test"] <- "hello";
print_endline x.["test"];

let x = [|1; 2; 3|]
open Array.Infix
x.[1] <- 2;
Printf.sprintf "x[1] is %d" x.[1];

Frederic_Loyer · January 15, 2023, 12:11pm

We can write:

let ( .%{} ) tabl index = Hashtbl.find tabl index
let ( .%{}<- ) tabl index value = Hashtbl.add tabl index value

You can choose between {}, [] and (), but the operator must include a special character such as % after the dot.

Then you can use t.%{i}. I know: that is 2 extra characters compared with Python.
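
For reference, a minimal sketch of the three bracket kinds in use. The StringMap module and the pairing of each bracket with a particular container are assumptions made for illustration; they are not part of the post above.

```ocaml
module StringMap = Map.Make (String)

(* Braces: lookup in a mutable hash table. *)
let ( .%{} ) tbl key = Hashtbl.find tbl key

(* Parentheses: lookup in an immutable map (StringMap is a name chosen here). *)
let ( .%() ) map key = StringMap.find key map

(* Brackets: positional access in a list. *)
let ( .%[] ) lst i = List.nth lst i

let tbl : (string, int) Hashtbl.t = Hashtbl.create 4
let () = Hashtbl.replace tbl "a" 1
let map = StringMap.singleton "b" 2
let lst = [ 10; 20; 30 ]

let () =
  (* Each operator is an ordinary function, so the usual typing rules apply. *)
  Printf.printf "%d %d %d\n" tbl.%{"a"} map.%("b") lst.%[1]
```

All three lookups raise an exception on a missing key or index, exactly like the functions they wrap.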

bluddy · January 15, 2023, 12:28pm

I’m aware of this, but I’m questioning the need to use the most natural operator ([]) on ASCII string indexing alone. This seems like a waste in our day and age. Also, the readability of .%[...] is quite awful IMO.

Oh also, a nitpick regarding your examples – you used Hashtbl.add rather than Hashtbl.replace, which is probably not what you wanted (another OCaml gotcha).
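
To make the gotcha concrete, here is a small sketch comparing the two definitions of the assignment operator. The helper names add_demo and replace_demo are hypothetical; the Hashtbl functions are the standard library ones.

```ocaml
let ( .%{} ) tbl key = Hashtbl.find tbl key

(* Assignment via Hashtbl.add: each write stacks a new binding on top of
   the previous one instead of overwriting it. *)
let add_demo () =
  let ( .%{}<- ) tbl key value = Hashtbl.add tbl key value in
  let t = Hashtbl.create 4 in
  t.%{"k"} <- 1;
  t.%{"k"} <- 2;
  let bindings = Hashtbl.length t in
  let visible = t.%{"k"} in
  Hashtbl.remove t "k";
  (* After removing the newest binding, the older one reappears. *)
  (bindings, visible, t.%{"k"})

(* Assignment via Hashtbl.replace: at most one binding per key. *)
let replace_demo () =
  let ( .%{}<- ) tbl key value = Hashtbl.replace tbl key value in
  let t = Hashtbl.create 4 in
  t.%{"k"} <- 1;
  t.%{"k"} <- 2;
  let bindings = Hashtbl.length t in
  let visible = t.%{"k"} in
  Hashtbl.remove t "k";
  (bindings, visible, Hashtbl.find_opt t "k")
```

With add, the table holds two bindings for the same key and the old value comes back after a remove; with replace, the key is simply gone.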

thierry-martinez · January 16, 2023, 4:24am

You may override String.get and String.set.

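module Hashtbl = struct
  include Hashtbl
  module Infix = struct
    (* x.[k] and x.[k] <- v desugar to String.get and String.set, so
       shadowing String inside Infix redirects both to the hash table. *)
    module String = struct
      include String
      let get = Hashtbl.find
      let set = Hashtbl.replace
    end
  end
end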

let x = Hashtbl.create 10

open Hashtbl.Infix

let () =
  x.["test"] <- "hello";
  print_endline x.["test"]

(EDIT: this was already discussed in "Syntaxic sugar: String.set -> Bytes.set?", post #17.)

nojb · January 16, 2023, 8:28am

Somewhat off topic, but in my experience, overloading operators is not a good idea 99% of the time.

Code is read much more than written, and such overloading harms readability. When reading a piece of code, if one sees e.[i] you immediately know that e is of type string, which helps to make sense of the code. If this operator is overloaded, you lose this aid and understanding the code in front of your eyes becomes more difficult.

For a similar reason I use infix operators very rarely, and only in the smallest possible scope.

Cheers,
Nicolas
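
One way to keep such operators in "the smallest possible scope" is to define them in a dedicated module and open it locally. This is only a sketch: the module name Tbl_ops and the function count_words are invented for the example.

```ocaml
(* The operators live in their own module and are not visible by default. *)
module Tbl_ops = struct
  let ( .%{} ) tbl key = Hashtbl.find tbl key
  let ( .%{}<- ) tbl key value = Hashtbl.replace tbl key value
end

(* Count occurrences of each word; the operators are only in scope
   inside the body of the anonymous function. *)
let count_words words =
  let counts = Hashtbl.create 16 in
  List.iter
    (fun w ->
      let open Tbl_ops in
      let n = try counts.%{w} with Not_found -> 0 in
      counts.%{w} <- n + 1)
    words;
  counts

(* A single expression can also be wrapped in a local open. *)
let lookup counts w = Tbl_ops.(counts.%{w})
```

Outside the local opens, t.%{k} is not defined at all, so a reader only has to check the nearest open to know what the operator means.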

bluddy · January 16, 2023, 9:48am

I don’t entirely disagree, but:

1. Accessing ASCII characters of a string is very dated as well as being rare. Losing the most universal syntax for member access ([]) to strings of all things is a huge waste IMO.
2. The readability of .[] is very high. Compare this to .(), which can easily be confused for plenty of other things (module access of different sorts), or the redefinable operators, which have serious readability issues as I mentioned earlier.
3. Knowing the exact types is a tradeoff vs other things. It’s nice to have concise syntax for certain operations that are extremely common, such as Hashtbl or Map access. IMO the readability improves dramatically when you have concise, easy to understand infix operators despite the fact that there is a little bit of ambiguity type-wise.

@thierry-martinez That’s a very cool hack that I didn’t know was possible, and I’m definitely going to start using it right away, but it would also be nice to ‘un-hack’ it and make it official.

emillon · January 16, 2023, 10:49am

Note that in OCaml >= 5.0, the String.set sugar has been removed from the parser, so I think you’ll always get a parse error on x.[i] <- v even if a String.set function is in scope.
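
For in-place character updates, a sketch that avoids the s.[i] <- c form altogether by calling Bytes.set explicitly, so it should not depend on how a given compiler version treats the assignment sugar. The function name capitalize_first is made up for this example, and reading with s.[0] is assumed to remain available, since the point above only concerns assignment.

```ocaml
(* Return a copy of s with its first character upper-cased (ASCII only). *)
let capitalize_first (s : string) : string =
  if s = "" then s
  else begin
    (* Strings are immutable, so work on a mutable copy. *)
    let b = Bytes.of_string s in
    Bytes.set b 0 (Char.uppercase_ascii s.[0]);
    Bytes.to_string b
  end
```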

bluddy · January 16, 2023, 1:03pm

Well that’s disappointing.

gasche · January 16, 2023, 2:31pm

Personally, I think that using t.%[x] instead of t.[x] is okay. There are other improvements to OCaml or the tooling ecosystem that I would work on before spending effort on removing one character there.

Making t.[x] or t.(x) available on user-defined types using type-directed disambiguation is something that we have discussed in the past (proposed by @lpw25), and I think it is more interesting than merely a syntactic question because it changes how lookup is done to become more flexible. The usual caveats on ad-hoc polymorphism / overloading apply (but for constructor and field names we seem to like it), and it’s also work to implement.
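
For comparison, this is roughly how type-directed disambiguation already behaves for field and constructor names today. It is only an illustration of the existing mechanism, not of the proposal for t.[x]; all type and function names below are invented.

```ocaml
type point = { x : int; y : int }
type vec = { x : float; y : float }

(* Without any type information, x and y resolve to the most recent
   definition, which is vec. *)
let norm1 v = abs_float v.x +. abs_float v.y

(* The annotation tells the type checker to use the fields of point. *)
let sum (p : point) = p.x + p.y

type color = Red | Green
type light = Red | Amber

(* The same applies to constructors: the expected type selects which
   Red is meant. *)
let describe (c : color) =
  match c with
  | Red -> "red"
  | Green -> "green"
```

The same name means different things depending on the known type, which is the flexibility (and the ambiguity) mentioned above.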
