How to decode a hex string to a byte sequence?

I am trying to figure out how to work with byte sequences by decoding a hex string into a byte sequence. My goal is to encode the decoded byte sequence as a base64 string.
The OCaml manual doesn’t give any clues about this (Bytes module docs), and the mirage-base64 library takes strings as input, which is weird given that it’s explicitly a bytes-to-ASCII encoding.

For example, in Python I would do the following:

>>> import binascii
>>> input = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"
>>> as_bytes = binascii.unhexlify(input)
>>> got = binascii.b2a_base64(as_bytes)
>>> got
b'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t\n'
>>> got.decode("utf-8")
'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t\n'

This is taken from the cryptopals challenge.

I would NOT recommend this, but there’s always the handy-dandy int_of_string:

# int_of_string "0xff";;
- : int = 255

I wouldn’t be surprised to find about a thousand different implementations of “hex string” to “byte string” (I know I’ve implemented the equivalent of Perl’s “unpack/pack” at least twice in different contexts that needed different properties) but offhand, I don’t know where I’d lay my hands on one in a core library at the moment. Here’s another:

# Scanf.sscanf "ff" "%02x" (fun n -> n) ;;
- : int = 255

Not sure if this helps.
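
For what it’s worth, here is a minimal sketch of a whole-string decoder that uses only the standard library. The names hex_digit and bytes_of_hex are made up for this example and do not come from any library. It decodes each digit by hand rather than going through int_of_string, which would also let through things like underscores.

```ocaml
(* Hypothetical helper: value of one hexadecimal digit, 0 to 15. *)
let hex_digit (c : char) : int =
  match c with
  | '0' .. '9' -> Char.code c - Char.code '0'
  | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
  | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
  | _ -> invalid_arg "hex_digit: not a hexadecimal character"

(* Hypothetical helper: decode a hex string such as "49276d" into bytes. *)
let bytes_of_hex (s : string) : bytes =
  let n = String.length s in
  if n mod 2 <> 0 then invalid_arg "bytes_of_hex: odd number of digits";
  (* Each output byte is built from two consecutive hex digits. *)
  Bytes.init (n / 2) (fun i ->
    Char.chr ((hex_digit s.[2 * i] lsl 4) lor hex_digit s.[2 * i + 1]))
```

Bytes.to_string (bytes_of_hex "49276d") should then give the three characters I'm, and malformed input raises Invalid_argument.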

What you’re doing here is decoding a base16 string into octets (binascii.unhexlify), then encoding the octets as a base64 string (binascii.b2a_base64). As you’ve noticed, the Mirage library only supports base64, not the full range of Base-N encodings in RFC 4648.

My Orsetto library contains a more extensive implementation of RFC 4648. You could use Cf_base16.Std.decode_string to convert to octets, then use Cf_base64.Std.encode_string to convert back to base64 encoded text.

(NOTE: Orsetto is not yet available from the community OPAM repository. You have to get it from my personal repo.)
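
To make the second step (octets to base64) concrete, here is a rough standard-library-only sketch of the RFC 4648 base64 encoding with padding. It is not the Orsetto or Mirage implementation, and the names alphabet and base64_of_string are invented for the example. Octets are passed as an OCaml string, which is just an immutable byte sequence.

```ocaml
(* The standard base64 alphabet from RFC 4648. *)
let alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

(* Hypothetical helper: encode raw octets as padded base64 text. *)
let base64_of_string (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create ((n + 2) / 3 * 4) in
  (* Octets past the end of the input count as zero. *)
  let byte i = if i < n then Char.code s.[i] else 0 in
  let i = ref 0 in
  while !i < n do
    (* Pack up to three octets into one 24-bit group. *)
    let group =
      (byte !i lsl 16) lor (byte (!i + 1) lsl 8) lor byte (!i + 2) in
    let remaining = n - !i in
    (* Emit four 6-bit symbols, using '=' where no input octet contributed. *)
    Buffer.add_char buf alphabet.[(group lsr 18) land 63];
    Buffer.add_char buf alphabet.[(group lsr 12) land 63];
    Buffer.add_char buf
      (if remaining > 1 then alphabet.[(group lsr 6) land 63] else '=');
    Buffer.add_char buf
      (if remaining > 2 then alphabet.[group land 63] else '=');
    i := !i + 3
  done;
  Buffer.contents buf
```

With the RFC 4648 test vectors, base64_of_string "fo" should produce "Zm8=" and base64_of_string "foo" should produce "Zm9v".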

Dear @actuallyachraf,

if I understand you correctly, you get a string of hexadecimal digits as input and want to output the bytes it encodes as base64? To achieve that, I can see the following solution, which uses the hex package and the base64 package (both available via opam):

let binary = Hex.to_string (`Hex "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d")
let base64 = Base64.encode_string binary

In an interactive utop session:

# #require "hex";;
# #require "base64";;
# let binary = Hex.to_string (`Hex "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d");;
val binary : string = "I'm killing your brain like a poisonous mushroom"
# Base64.encode_string binary;;
- : string = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"

Oh yes! Thank you, I didn’t know about those packages.
Can you point me to the Base64 one? The one in opam, https://opam.ocaml.org/packages/base64/, doesn’t seem to have the Base64.encode_string you used there, but it works nonetheless.
Can you also please explain what the `Hex prefix does here?

(`Hex "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d");;

I’m not sure which version of base64 you have installed. According to its latest documentation, the function is there:
https://docs.mirage.io/base64/Base64/index.html#val-encode_string

According to its CHANGES.md, it’s available since 3.1.0.

If you look into the documentation of the hex package, the type t is defined as type t = [ `Hex of string ] (a polymorphic variant with the single constructor `Hex), so `Hex "f00d" is a value of that type (Hex.t). Since val to_string : t -> string, a t is needed as the argument.
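
As a small illustration, the following sketch defines a local type of the same shape, so it runs without the hex package installed. The names hex and digits_of are made up for the example and are not part of the library.

```ocaml
(* Local stand-in with the same shape as Hex.t (an assumption for this demo). *)
type hex = [ `Hex of string ]

(* Hypothetical helper: pattern matching unwraps the string inside `Hex. *)
let digits_of (h : hex) : string =
  match h with
  | `Hex s -> s

(* `Hex "f00d" is a value of the variant type; a bare "f00d" would not typecheck. *)
let value : hex = `Hex "f00d"

let () = assert (digits_of value = "f00d")
```

The tag only marks the string as containing hex digits. It does not convert anything by itself.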

hope that helps,

hannes

My mistake, I was running version 3.0.0.
Thanks for your help, I figured out a lot now :smile: