Syntactic sugar: String.set -> Bytes.set?

$ cat s.ml

let () =
  let s = Bytes.of_string "Hello" in
  s.[4] <- 'a';
  print_endline (Bytes.to_string s)

$ ocaml s.ml
File "./s.ml", line 4, characters 2-14:
4 |   s.[4] <- 'a';
      ^^^^^^^^^^^^
Alert deprecated: Stdlib.String.set
Use Bytes.set/BytesLabels.set instead.
Hella

I am trying to figure out how to update my scripts that use this syntactic sugar.
Is it planned to be replaced by Bytes.set in the next release?
How can I already make this syntactic sugar point to Bytes.set with 4.14, in a forward-compatible way?
Or is this syntactic sugar just planned to disappear?

It doesn’t answer your question directly (I don’t know where the Stdlib is going w.r.t. Bytes/String and syntactic sugar), but something that might help is that you can use

let ( .%[]<- ) = Bytes.set

and then

  s.%[4] <- 'a';
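
For completeness, here is a minimal self-contained sketch of that suggestion, applied to the script from the question. The `%` in the operator name is an arbitrary choice, and the read operator `.%[]` is added only for illustration. Custom indexing operators need OCaml 4.06 or later, as far as I know.

```ocaml
(* Custom indexing operators on bytes; the '%' character is an arbitrary choice. *)
let ( .%[] ) = Bytes.get
let ( .%[]<- ) = Bytes.set

let () =
  let s = Bytes.of_string "Hello" in
  s.%[4] <- 'a';               (* same as Bytes.set s 4 'a' *)
  assert (s.%[4] = 'a');       (* same as Bytes.get s 4 *)
  print_endline (Bytes.to_string s)
```

Since these are ordinary definitions, they can live in a small helper module that is opened where needed.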

BTW, the .[]<- syntax has been removed in "Remove (.[]<-)" by nojb · Pull Request #11345 · ocaml/ocaml · GitHub, so it will be gone in 5.0 I think.

Removed as in, removed from just the String module or removed as in no longer valid OCaml syntax?

As I understand it, the syntax itself has been removed. So the old trick of “module String = Bytes” doesn’t work either.

The syntax has been removed from OCaml 5.0 altogether. The reason is that s.[i] <- c was an alias for String.set which was also removed. It was briefly discussed to make s.[i] <- c an alias for Bytes.set but it was concluded that this would be weird, since s.[i] continues to be an alias for String.get.

Cheers,
Nicolas


Reminder that custom indexing operators incur some overhead and aren’t as inliner-friendly as the built-in syntactic sugar for Array and String. Also, -unsafe doesn’t apply to custom indexing operators.


That’s false since the custom indexing operators are functions, and thus they are obviously equivalent to the desugaring of a.(x) to Array.get a x. In other words, there is no difference between

external get : 'a array -> int -> 'a = "%array_safe_get"

and

external ( .%() ) : 'a array -> int -> 'a = "%array_safe_get"

I would say this is a feature, since -unsafe is a syntactic flag that makes a.(x) desugar to Array.unsafe_get a x. And it is possible to define unsafe custom operators if you need them.
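
A sketch of what such unsafe custom operators could look like. The operator name `.!()` is a hypothetical choice of mine; the primitives "%array_unsafe_get" and "%array_unsafe_set" are the ones behind Array.unsafe_get and Array.unsafe_set. No bounds check is performed, so an out-of-range index is undefined behaviour.

```ocaml
(* Hypothetical operator names; no bounds checking is done by these. *)
external ( .!() ) : 'a array -> int -> 'a = "%array_unsafe_get"
external ( .!()<- ) : 'a array -> int -> 'a -> unit = "%array_unsafe_set"

(* Sum an int array; the loop bounds guarantee every index is valid. *)
let sum (a : int array) : int =
  let total = ref 0 in
  for i = 0 to Array.length a - 1 do
    total := !total + a.!(i)
  done;
  !total

let () =
  let a = [| 1; 2; 3 |] in
  a.!(0) <- 10;
  assert (sum a = 15)
```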


That’s false since the custom indexing operators are functions

I made that claim based on output like this: Compiler Explorer

the .%() is called with the target subroutine being a whole bunch of instructions, still polymorphic, etc… while .() is just a cmp + mov and not polymorphic (according to lambda array.get[addr] ...).

It seems there’s no difference between two definitions of the same external function, but there is a difference when aliasing them at the ocaml level.
I may be reading this wrong though, but I guess I never expected needing to reach out for external compiler builtins for a harmless change like using a custom operator instead of the normal one, to get back that performance loss.

Hi Nicolas,

For me String.set was not removed, but moved to the Bytes module as Bytes.set.
So it was logical for me to expect that s.[i] <- c would work with the bytes type.
What do other people here think about it?

If it was only briefly discussed, maybe we could discuss it here and now, and ask for opinions about it?

I do agree that it is weird. But I think it is even weirder if it’s just removed.

Also, to update current code, I only need to add Bytes.to_string/Bytes.of_string, which is usually not that bad.

Rewriting all the s.[i] <- c everywhere with Bytes.set is more time consuming (but I didn’t know I could also just add a ‘%’ in the middle, so it’s not that bad with this solution).

The warning message tells us to use Bytes.set; it could also mention the custom s.%[i] <- c operator as a possible replacement. That would have saved me time, and maybe it’s not too late for 5.0.

I had some hope that s.[i] would work with both string and bytes types, just like > works with both int and float. There is no need to use >. with floats.

We could also imagine something like:

  Bytes.( s.[i] );
  String.( s.[i] );

If we’re talking about expectations, I would have loved [] to be removed from String and shifted to either Array instead of .() or a new Vector type, where it would fit the expectations of the user.


yeah string should just not have indexing sugar, it encourages treating strings as read-only byte arrays and perpetuates the whole byte = char = grapheme misconception.

Can’t custom string operators be defined to work on graphemes?

this would be very inefficient with utf8 (it would have to walk the string from the start) so i wonder if indexing-like syntax would be misleading
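
A rough sketch of why: to reach the n-th character you have to decode everything before it. The function below (the name `nth_uchar` is made up) indexes by Unicode scalar value, not by grapheme cluster, which would need a Unicode segmentation library on top. It assumes OCaml 4.14 or later for String.get_utf_8_uchar.

```ocaml
(* Return the n-th Unicode scalar value of a UTF-8 string, or None if the
   string is too short. This is O(n): it decodes from the start every time. *)
let nth_uchar (s : string) (n : int) : Uchar.t option =
  let len = String.length s in
  let rec go byte_pos k =
    if byte_pos >= len then None
    else
      let d = String.get_utf_8_uchar s byte_pos in
      if k = 0 then Some (Uchar.utf_decode_uchar d)
      else go (byte_pos + Uchar.utf_decode_length d) (k - 1)
  in
  if n < 0 then None else go 0 n

let () =
  (* "h\xC3\xA9llo" is "héllo" in UTF-8: 6 bytes, 5 scalar values. *)
  let s = "h\xC3\xA9llo" in
  assert (String.length s = 6);
  assert (nth_uchar s 2 = Some (Uchar.of_char 'l'));
  (* Byte indexing lands in the middle of the two-byte 'é'. *)
  assert (s.[2] = '\xA9')
```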

Then we are back to square one if it turns out that everyone really just wants to be able to index into specific bytes in strings and not graphemes.

It was only briefly discussed because there was quick consensus on removing the syntax, see "str.[i] <- c" is desugared into "String.set str i c" which does not exist anymore · Issue #11342 · ocaml/ocaml · GitHub

The syntax s.[i] <- c (aka String.set) has been deprecated since 4.02 so codebases should have had plenty of time to adapt.

OCaml does not support operator overloading. The comparison with > is not a good one, because this operation is parametrically polymorphic and can be applied to arguments of any type. The right comparison would be with addition (+, +.), which indeed uses different operators for int and float.
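
To illustrate the distinction with a small sketch (the helper `max_of` is a made-up name): the comparison operators have a single polymorphic type, 'a -> 'a -> bool, while arithmetic uses one operator per type.

```ocaml
(* (<) is one function of type 'a -> 'a -> bool, so this helper is polymorphic. *)
let max_of (x : 'a) (y : 'a) : 'a = if x < y then y else x

let () =
  assert (max_of 1 2 = 2);
  assert (max_of 1.0 2.0 = 2.0);
  assert (max_of "a" "b" = "b");
  (* Arithmetic is not overloaded: (+) is for int, (+.) is for float. *)
  assert (1 + 2 = 3);
  assert (1.0 +. 2.0 = 3.0)
  (* 1.0 + 2.0 would be rejected by the type checker. *)
```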

This is an interesting proposal (and has been discussed in the context of array and floatarray), but this requires designing a new language feature. RFC/PRs welcome :)

Cheers,
Nicolas


Don’t we already have this? ;)

module Bytes = struct
  include Bytes

  module String = struct
    let get = get
  end
end

module String = struct
  include String

  module String = struct
    let get = get
  end
end

let s : string = "Hello, world!"

let () = assert String.(s.[0] = 'H')

let b : bytes = Bytes.of_string "Hello, world!"

let () = assert Bytes.(b.[0] = 'H')

Amusingly this also works with any module named Array in context:

module Array = Bytes
let s = Bytes.of_string "test"
let _ = s.(0) <- 'r'; print_char s.(1)