Warning when using `a.[b] <- c` operator with bytes

Prior to -safe-string, the a.[b] <- c operator was used to modify characters in a string:

let s = String.make 3 '\000' in
s.[0] <- 'f';
s.[1] <- 'o';
s.[2] <- 'o';

With safe-string, only bytes may be modified, not strings. However, I get the following warnings when I try to reproduce the above example under -safe-string:

utop # let b = Bytes.make 3 '\000';;
val b : bytes = Bytes.of_string "\000\000\000"

utop # b.[0] <- 'f';;
Characters 0-12:
Warning 3: deprecated: String.set
Use Bytes.set instead.
Characters 0-12:
Warning 3: deprecated: String.set
Use Bytes.set instead.
- : unit = ()

Is the a.[b] <- c operator being deprecated altogether? Or will it eventually be an alias for Bytes.set instead of String.set?

Though it doesn’t answer your question (and I would also like to know what our benevolent compiler developers plan for the behavior of the indexing operators), there is a partial solution (a workaround) that enables the indexing operators for bytes.

The indexing operators are not actually operators but syntax (syntactic sugar, if you like). The parser translates a construct x.[n] <- v into String.set x n v; delimiting the index with () or {} instead changes the module to Array or Bigarray, respectively (and the number of indices inside the curly braces picks the matching Bigarray submodule, i.e., Array1 to Array3, or Genarray).
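A quick way to see that this translation is purely by name is to shadow Array and watch a.(i) pick up the new definition. This is a minimal sketch, assuming OCaml >= 4.07 (for Stdlib) and compilation without -unsafe (which would make the syntax call Array.unsafe_get instead); the counter reads and the shadowing module are hypothetical, for illustration only.

```ocaml
(* Hypothetical counter, used only to observe which get is called. *)
let reads = ref 0

(* Shadow Array: re-export everything, but replace get with a version
   that counts its calls before delegating to the standard one. *)
module Array = struct
  include Stdlib.Array
  let get a i = incr reads; Stdlib.Array.get a i
end

let () =
  let a = [| 10; 20; 30 |] in
  let x = a.(1) in              (* parsed as Array.get a 1 *)
  Printf.printf "x = %d, reads = %d\n" x !reads   (* x = 20, reads = 1 *)
```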

Thus, if you shadow the String module with your own definition whose set and get operations are remapped to the bytes type, you can use the convenient indexing syntax on bytes. The main problem is that this disables the same syntax for strings (which might be a good idea). Another problem is that the shadowing definition has to re-export the rest of the String module.
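A minimal sketch of this shadowing, assuming OCaml >= 4.07 (for Stdlib): include Stdlib.String takes care of re-exporting the rest of the module, and get/set (plus their unsafe_ variants) are then overridden with the Bytes versions. As described above, s.[i] on an ordinary string no longer type-checks in the scope of this module.

```ocaml
(* Shadow String so that the s.[i] syntax works on bytes. *)
module String = struct
  (* Re-export the rest of the standard String module. *)
  include Stdlib.String
  (* Remap the indexing primitives to bytes. The unsafe_* pair is what
     the syntax uses under -unsafe, so it is remapped as well. *)
  let get = Stdlib.Bytes.get
  let set = Stdlib.Bytes.set
  let unsafe_get = Stdlib.Bytes.unsafe_get
  let unsafe_set = Stdlib.Bytes.unsafe_set
end

let () =
  let b = Bytes.make 3 '\000' in
  b.[0] <- 'f';
  b.[1] <- 'o';
  b.[2] <- 'o';
  print_endline (Bytes.to_string b);   (* foo *)
  (* Re-exported functions still work on ordinary strings. *)
  print_int (String.length "hello")    (* 5 *)
```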

My personal solution would be to add a module that has only the get/set functions, and to open it as locally as possible.

module Bytes : sig
  ....
  module String : sig
    val get : bytes -> int -> char
    val set : bytes -> int -> char -> unit
  end

  module Syntax : sig
    module String : sig
      val get : bytes -> int -> char
      val set : bytes -> int -> char -> unit
    end
  end
end
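One way the signature above might be implemented, as a sketch assuming OCaml >= 4.07 and assuming the elided part stands for the rest of the standard Bytes module (re-exported here with include). Following the -unsafe advice given later in the thread, the sketch also defines the unsafe_get/unsafe_set pair, which the signature above does not list.

```ocaml
module Bytes = struct
  (* Stand-in for the elided part: re-export the standard Bytes module. *)
  include Stdlib.Bytes

  (* Indexing primitives on bytes, picked up by the .[ ] syntax when this
     module is opened. *)
  module String = struct
    let get = Stdlib.Bytes.get
    let set = Stdlib.Bytes.set
    let unsafe_get = Stdlib.Bytes.unsafe_get
    let unsafe_set = Stdlib.Bytes.unsafe_set
  end

  (* Narrow module to open when only the notation is wanted. *)
  module Syntax = struct
    module String = String
  end
end

let () =
  let b = Bytes.make 3 '\000' in
  Bytes.(b.[0] <- 'f');              (* whole module opened locally *)
  let open Bytes.Syntax in           (* only the notation opened *)
  b.[1] <- 'o';
  b.[2] <- 'o';
  print_endline (Bytes.to_string b)  (* foo *)
```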

Note that I’ve exported it twice so that the notation is available when opening the whole Bytes module, e.g.,

Bytes.(foo.[0] <- 'f')

But opening the whole module pollutes the namespace with many definitions, which is not nice, especially if you want to extend the scope of the bytes notation a little further. In that case, we can do:

let test s =
  let open Bytes.Syntax in
  s.[0] <- 'f';
  s.[1] <- 'o';
  s.[2] <- 'o'

Note that within the scope of Bytes.Syntax (or Bytes) the standard String module is shadowed. That could be considered either a bane or a boon, depending on whom you ask.

I would advise to use the new extended indexing operators (added in 4.06.0):

let ( .%[]<- ) = Bytes.set
let ( .%[] ) = Bytes.get

rather than the old trick of redefining String.set/get, especially since it is easy to forget to also redefine the unsafe get/set variants in String and end up with code that breaks under the -unsafe option.

Disclaimer: I am certainly very biased towards the new extended indexing operator syntax
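A short usage sketch of the two definitions above (OCaml >= 4.06). Because .%[] is a separate, user-defined operator, it coexists with the built-in s.[i] syntax on plain strings instead of replacing it.

```ocaml
(* Extended indexing operators on bytes. *)
let ( .%[] ) = Bytes.get
let ( .%[]<- ) = Bytes.set

let () =
  let b = Bytes.make 3 '\000' in
  b.%[0] <- 'f';                  (* Bytes.set b 0 'f' *)
  b.%[1] <- 'o';
  b.%[2] <- 'o';
  (* Plain strings keep the built-in s.[i] syntax. *)
  let s = "bar" in
  Printf.printf "%s %c\n" (Bytes.to_string b) s.[0]   (* foo b *)
```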

I would agree, but it is not always an option, especially if you’re trying to write code that works on older versions of the compiler.

Can you elaborate a little more?

P.S. It should be Bytes.get, not Byte.get; same for set.

When the -unsafe option is enabled, a.[x] (resp. a.[x] <- y) is translated to String.unsafe_get a x (resp. String.unsafe_set a x y) rather than String.get (resp. String.set), and the same happens for array and bigarray indexing.

Consequently, it is probably safer to always redefine the pairs get/unsafe_get and set/unsafe_set together. Otherwise, you can get surprising bugs that depend on the compiler flags.
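One way to make forgetting the unsafe pair impossible is to seal the shadowing module with a signature that lists all four functions, so the compiler rejects a definition that omits unsafe_get or unsafe_set. This is a sketch; the names BYTES_INDEXING and Bytes_syntax are hypothetical.

```ocaml
(* Hypothetical signature requiring both the safe and the unsafe pair. *)
module type BYTES_INDEXING = sig
  val get : bytes -> int -> char
  val set : bytes -> int -> char -> unit
  val unsafe_get : bytes -> int -> char
  val unsafe_set : bytes -> int -> char -> unit
end

(* Hypothetical module to open locally for the bytes notation. *)
module Bytes_syntax = struct
  module String : BYTES_INDEXING = struct
    let get = Bytes.get
    let set = Bytes.set
    let unsafe_get = Bytes.unsafe_get
    let unsafe_set = Bytes.unsafe_set
  end
end

let () =
  let b = Bytes.of_string "bar" in
  (* In-bounds accesses behave the same with or without -unsafe. *)
  let open Bytes_syntax in
  b.[0] <- 'f';
  print_endline (Bytes.to_string b)   (* far *)
```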

P.S.: I have fixed the Byte typo, thanks!

This is all yet more evidence that Modular Implicits will be a really big deal…

I don’t remember if something was decided for a.[foo] <- bar. Would someone be willing to open a Mantis issue to propose/ask to rebind it to Bytes.set instead of String.set?

There was a PR to remap (.[]<-) to Bytes.set in the presence of -safe-string, but it was closed and lost during the chaotic history of the initial user-defined indexing operator PR.

Defining an operator looks like a reasonable solution. I’ve been trying to port my codebase to 4.06.1; a lot of bytes-related changes have to be made to old code, and sometimes they can break it, of course. A lot of old libraries have some kind of wrapper around String / IO functions that you’d have to change to bytes, or make even more changes… :confused: