"OCaml -- first impressions"

dbuenzli · July 11, 2017, 8:58am

Indeed, you can now iterate over code points, but your indices should range over scalar values. Since apparently, in 2017, they still haven’t fixed c), this is the mess you get:

> python3
Python 3.6.1 (default, Apr  4 2017, 09:40:21) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD800' # unpaired surrogate
'\ud800'
>>> '\uD800'[0]
'\ud800'
>>> '\uD800'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
>>> '\uD83D\uDC2B' # paired surrogates representing U+1F42B
'\ud83d\udc2b'
>>> '\uD83D\uDC2B'[0]
'\ud83d'
>>> '\uD83D\uDC2B'[1]
'\udc2b'
>>> '\uD83D\uDC2B'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> '\U0001F42B'
'🐫'
>>> '\U0001F42B'[0]
'🐫'

So your Python 3 string doesn’t represent a sequence of Unicode scalar values, which is the minimal model that you, as a programmer, would like to be able to work with (Swift made a bolder, more programmer-friendly move for certain scripts, since it indexes by grapheme cluster).
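
To make the distinction concrete, here is a minimal sketch of how you could check whether a given `str` is actually a sequence of scalar values, i.e. contains no code point in the surrogate range U+D800..U+DFFF. The helper names (`is_surrogate`, `is_scalar_value_sequence`, `join_surrogate_pairs`) are my own for illustration and are not part of the standard library. The last helper assumes that a round trip through UTF-16 with the `surrogatepass` error handler is an acceptable way to merge paired surrogates; it raises `UnicodeDecodeError` on an unpaired one.

```python
def is_surrogate(ch):
    """True if the single code point `ch` lies in the surrogate range."""
    return 0xD800 <= ord(ch) <= 0xDFFF


def is_scalar_value_sequence(s):
    """True if every code point of `s` is a Unicode scalar value."""
    return not any(is_surrogate(ch) for ch in s)


def join_surrogate_pairs(s):
    """Merge paired surrogates into the scalar value they stand for.

    Assumption: round-tripping through UTF-16 is acceptable here.
    'surrogatepass' lets the surrogates through on encoding; decoding
    then combines well-formed pairs and fails on unpaired surrogates.
    """
    return s.encode('utf-16', 'surrogatepass').decode('utf-16')


if __name__ == '__main__':
    for s in ['\uD800', '\uD83D\uDC2B', '\U0001F42B']:
        print(ascii(s), len(s), is_scalar_value_sequence(s))
```

With the strings from the session above, only `'\U0001F42B'` passes the check, and `join_surrogate_pairs('\uD83D\uDC2B')` gives back a one-element string equal to it.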
