Redefining the string index operator

I know that we’ve allowed indexing operators to be redefined. However, most of them are quite awkward with the extra syntax (e.g. .%[...]). We already need to ‘pay’ the extra cost of the dot in OCaml syntax (i.e. val.[...] vs just val[...]).

Would it be possible to also allow redefinition of the string indexing operators themselves? Using these operators on ASCII string indexing seems like a waste of prime real-estate that could be used for many more important tasks:

(* Wished-for syntax: Hashtbl.Infix and Array.Infix are hypothetical modules. *)
let x = Hashtbl.create 10
open Hashtbl.Infix
let () =
  x.["test"] <- "hello";
  print_endline x.["test"]

let x = [|1; 2; 3|]
open Array.Infix
let () =
  x.[1] <- 2;
  Printf.printf "x[1] is %d\n" x.[1]

We can write:

let ( .%{} ) tabl index = Hashtbl.find tabl index
let ( .%{}<- ) tabl index value = Hashtbl.add tabl index value

You can choose between {}, [] and () for the brackets, but you must put an operator character such as % between the dot and the bracket.

Then you can write t.%{i}. I know: that is 2 extra characters compared with Python.
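For completeness, here is a minimal self-contained sketch of these operators in use. The name tbl and the key/value strings are made up for illustration, and the assignment operator in this sketch is defined with Hashtbl.replace (an assumption on my part) so that assigning to the same key twice overwrites the value.

```ocaml
(* User-defined indexing operators for Hashtbl; '%' is an arbitrary choice. *)
let ( .%{} ) tbl key = Hashtbl.find tbl key

(* Assumption for this sketch: replace, so a second assignment overwrites. *)
let ( .%{}<- ) tbl key value = Hashtbl.replace tbl key value

let () =
  let tbl : (string, string) Hashtbl.t = Hashtbl.create 10 in
  tbl.%{"test"} <- "hello";
  print_endline tbl.%{"test"}
```

Indexing binds tighter than function application, so print_endline tbl.%{"test"} needs no parentheses.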

I’m aware of this, but I’m questioning the need to reserve the most natural operator ([]) for ASCII string indexing alone. This seems like a waste in this day and age. Also, the readability of .%[...] is quite awful IMO.

Oh also, a nitpick regarding your examples: you used Hashtbl.add rather than Hashtbl.replace, which is probably not what you wanted, since add keeps the previous binding for the key and merely hides it (another OCaml gotcha).
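To make the gotcha concrete, here is a small sketch comparing the two functions. The helper names add_twice and replace_twice are hypothetical and exist only for this illustration; the Hashtbl calls are the standard library ones.

```ocaml
(* Returns (value found after two adds, binding count, value found after one remove). *)
let add_twice () =
  let tbl = Hashtbl.create 10 in
  Hashtbl.add tbl "k" 1;
  Hashtbl.add tbl "k" 2;            (* hides the first binding, does not drop it *)
  let seen = Hashtbl.find tbl "k" in
  let count = Hashtbl.length tbl in
  Hashtbl.remove tbl "k";           (* removes only the newest binding *)
  (seen, count, Hashtbl.find tbl "k")

(* Same steps with replace; the final lookup uses find_opt because the key is gone. *)
let replace_twice () =
  let tbl = Hashtbl.create 10 in
  Hashtbl.replace tbl "k" 1;
  Hashtbl.replace tbl "k" 2;        (* overwrites the existing binding *)
  let seen = Hashtbl.find tbl "k" in
  let count = Hashtbl.length tbl in
  Hashtbl.remove tbl "k";
  (seen, count, Hashtbl.find_opt tbl "k")

let () =
  let (seen, count, after) = add_twice () in
  Printf.printf "add:     find=%d length=%d after remove=%d\n" seen count after;
  let (seen, count, after) = replace_twice () in
  Printf.printf "replace: find=%d length=%d after remove=%s\n" seen count
    (match after with Some v -> string_of_int v | None -> "none")
```

With add, the table ends up holding two bindings for the same key, and removing the key once brings the old value back.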

You may shadow String.get and String.set, which is what the x.[i] and x.[i] <- v syntax expands to.

module Hashtbl = struct
  include Hashtbl
  module Infix = struct
    module String = struct
      include String
      let get = Hashtbl.find
      let set = Hashtbl.replace
    end
  end
end

let x = Hashtbl.create 10

open Hashtbl.Infix

let () =
  x.["test"] <- "hello";
  print_endline x.["test"]

(EDIT: this was already discussed in "Syntaxic sugar: String.set -> Bytes.set?", post #17 by thierry-martinez.)

Somewhat off topic, but in my experience, overloading operators is not a good idea 99% of the time.

Code is read much more often than it is written, and such overloading harms readability. When you read a piece of code and see e.[i], you immediately know that e is of type string, which helps to make sense of the code. If this operator is overloaded, you lose this aid, and understanding the code in front of you becomes more difficult.

For a similar reason I use infix operators very rarely, and only in the smallest possible scope.
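As a rough sketch of what "smallest possible scope" can look like, the operators can live in their own module and be brought in with a local open. The module name Tbl_ops and the function count_words are hypothetical names chosen for this example.

```ocaml
(* Indexing operators kept in a module so they are never in scope by default. *)
module Tbl_ops = struct
  let ( .%{} ) tbl key = Hashtbl.find tbl key
  let ( .%{}<- ) tbl key value = Hashtbl.replace tbl key value
end

(* Counts occurrences of each element of the list. *)
let count_words words =
  let counts = Hashtbl.create 16 in
  List.iter
    (fun w ->
      (* The operators are visible only inside this function body. *)
      let open Tbl_ops in
      let n = (try counts.%{w} with Not_found -> 0) in
      counts.%{w} <- n + 1)
    words;
  counts

let () =
  let counts = count_words ["a"; "b"; "a"] in
  Printf.printf "a=%d b=%d\n" (Hashtbl.find counts "a") (Hashtbl.find counts "b")
```

Outside the let open, a reader never has to wonder what .%{} means, because it is not defined there.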

Cheers,
Nicolas

I don’t entirely disagree, but:

  • Accessing ASCII characters of a string is very dated as well as being rare. Losing the most universal syntax for member access ([]) to strings of all things is a huge waste IMO.
  • The readability of .[] is very high. Compare this to .(), which can easily be confused for plenty of other things (module access of different sorts), or the redefinable operators, which have serious readability issues as I mentioned earlier.
  • Knowing the exact types is a tradeoff vs other things. It’s nice to have concise syntax for certain operations that are extremely common, such as Hashtbl or Map access. IMO the readability improves dramatically when you have concise, easy to understand infix operators despite the fact that there is a little bit of ambiguity type-wise.

That’s a very cool hack which I didn’t know was possible and I’m definitely going to start using it right away, but it also would be nice to ‘un-hack’ it and make it official.

Note that in OCaml >= 5.0, the String.set sugar has been removed from the parser, so I think you’ll always get a parse error even if a String.set function is in scope.

Well that’s disappointing.

Personally, I think that using t.%[x] instead of t.[x] is okay. There are other improvements to OCaml or the tooling ecosystem that I would work on before spending effort on removing one character there.

Making t.[x] or t.(x) available on user-defined types using type-directed disambiguation is something that we have discussed in the past (proposed by @lpw25), and I think it is more interesting than merely a syntactic question because it changes how lookup is done to become more flexible. The usual caveats on ad-hoc polymorphism / overloading apply (but for constructor and field names we seem to like it), and it’s also work to implement.
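For readers who have not met it, this is roughly what type-directed disambiguation already looks like for field and constructor names. All type and function names below are made up for the illustration; it shows the existing feature, not the proposed extension to t.[x].

```ocaml
type point = { x : int; y : int }
type vec = { x : float; y : float }   (* reuses the field names of point *)

(* Without an annotation, x resolves to the most recent definition (vec). *)
let vec_x v = v.x

(* With an annotation, the known type selects the field of point instead. *)
let point_x (p : point) = p.x

type shape = Circle | Square
type token = Circle | Dot             (* reuses the constructor name Circle *)

(* The annotation on s makes Circle mean the shape constructor here. *)
let is_round (s : shape) = match s with Circle -> true | Square -> false

let () =
  Printf.printf "%d %.1f %b\n"
    (point_x ({ x = 1; y = 2 } : point))
    (vec_x { x = 1.5; y = 0.0 })
    (is_round (Circle : shape))
```

The same name is looked up differently depending on the expected type, which is the kind of flexibility the proposal would extend to the indexing syntax.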
