Indeed you can now iterate over code points, but your indices should iterate over scalar values. Since apparently in 2017 they didn’t fix c) that’s the mess you get:
> python3
Python 3.6.1 (default, Apr 4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD800' # unpaired surrogate
'\ud800'
>>> '\uD800'[0]
'\ud800'
>>> '\uD800'.encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
>>> '\uD83D\uDC2B' # paired surrogates representing U+1F42B
'\ud83d\udc2b'
>>> '\uD83D\uDC2B'[0]
'\ud83d'
>>> '\uD83D\uDC2B'[1]
'\udc2b'
>>> '\uD83D\uDC2B'.encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> '\U0001F42B'
'🐫'
>>> '\U0001F42B'[0]
'🐫'
So your python3 string doesn’t represent a sequence of Unicode scalar values, which as a programmer, is the minimal model you’d like to be able to work with (Swift
made a bolder, more programmer friendly move for certain scripts as you index by grapheme clusters).