List operation a += b, a = a+b is different in Python

I just found this interesting feature of Python list operation:

In [1]: a1 = range(3)

In [2]: a2 = a1

In [3]: a2 += [3]

In [4]: a1
Out[4]: [0, 1, 2, 3]

In [5]: a2
Out[5]: [0, 1, 2, 3]

When we use +=, the value of a1 will also be changed, but if we use + as follows:

In [1]: a1 = range(3)

In [2]: a2 = a1

In [3]: a2 = a2 + [3]

In [4]: a1
Out[4]: [0, 1, 2]

In [5]: a2
Out[5]: [0, 1, 2, 3]

a1 will not be touched.

Python is considered a flexible, agile programming language, but I think the semantic above is ridiculous, I’d like to prefer OCaml in real world project:

utop # let a = [0; 1; 2];;
val a : int list = [0; 1; 2]

utop # let b = a;;
val b : int list = [0; 1; 2]

utop # let b = b @ [3];;
val b : int list = [0; 1; 2; 3]

utop # a;;
- : int list = [0; 1; 2]
1 Like

What you’re comparing here is two different language semantics, and these semantics have to be tied to data structure representation for performance reasons.

Python is generally a reference-semantics language, as are most popular languages, including Java and Ruby. This means that when we run

a = range(3)
b = a

b is just a reference to a ie. they’re pointing to the same thing in memory. The same thing is true for

def foo(a):
    a += [1]

b = range(3)
foo(b)

b will be modified directly in this case, because we’re passing a reference to b to the function foo(). OCaml and other functional languages tend to have value semantics, at least as far as immutable values are concerned, meaning that in

let a = [1; 2; 3]
let b = 0::a

b theoretically refers to an unrelated value than a. However, this is an illusion – b is really pointing to a new 0 element of the list, and that element then points to the elements of a. This is a useful illusion to be sure, because it’s really nice that we can pretend not to worry about sharing data between a and b, but this illusion breaks down as soon as we make things mutable:

let a = [ref 1; ref 2; ref 3] (* list of mutable references *)
let b = (ref 0)::a
(List.nth b 2) := 10 (* modifies a and b *)

So much for our pretend value semantics. So long as we avoid mutability though, we get to keep this nice notion of value semantics, allowing us to ignore the fact that there’s sharing going on ‘behind the scenes’. This is really nice, because references to mutable data structures cause a lot of complexity and bugs.

Notice, btw, that your example,

let a = [0; 1; 2]
let b = a
let b = b @ [3] (* copies the whole list *)

isn’t exactly a fair comparison. The semantics of immutable lists mean that adding to the end of the list involves rebuilding the whole list. This is like doing

a = range(3)
b = a[:] (* copy the whole python 'list' *)
b += [3] (* a is unmodified *)

ie. it’s quite inefficient, since it’s copying the whole data structure. In fact, efficiency is at the heart of a lot of this. Value semantics (having data structures be unrelated to each other) is wonderful in theory, but with most data structures, this involves heavy copying, which is why Python doesn’t do it. Python’s ‘list’ is a misnomer –
it’s really just a vector – a resizable, mutable array. OCaml’s lists are immutable by default and they use a linked list data structure, which means that it makes sense to share them when possible, pretending that they’re not just sharing values in memory. Notice that OCaml’s array can’t pull off this trick:

let a = [|1; 2; 3|]
let b = a (* points to a in memory *)
b.(2) <- 10 (* affects a and b *)

Again, due to mutability, we have to do the same sharing that python does. OCaml’s arrays don’t even have a reasonable method of expansion, since they’re not vectors: they don’t resize, and we have to create a new array every time we want to add to them:

let a = [|1; 2; 3|]
let b = a (* points to a in memory *)
let b = Array.append b [|4|]
    (* create a new array. No sharing, but inefficient *)

All of which is to say, different languages have different semantics, and those semantics are there for a reason. Python has reference semantics for lists and objects. OCaml has pretend-value semantics for immutable values, which is really cool (but has its own interesting limitations), but as soon as you’re dealing with mutable values, the pretense breaks down and you’re back in the land of reference semantics (for the sake of efficiency).

5 Likes

You are right, but I disagree. What makes this point of view correct is that fact that the polymorphic structural (in)equality functions compares references by their current content, and equality has a tendency of dictating semantics. I would prefer to think of the 'a ref list as pure list containing mutable objects, that is

(List.nth b 2) := 10 (* modifies neither a nor b, only the shared contained object *)

This is similar to passing around a file descriptor without affecting a file or a future without running it. We can purely transform the structures while avoiding dereferencing. Moreover, a sufficiently complete description of the algorithm should tell us what sharing exists between and within the input and output, and the sharing should be no more magical than to integers being the same or not. So, the “illusion” is still perfect to me.

1 Like

@liweijian, I apologize for parsing your question incorrectly. Honestly, my brain read it differently off of a phone.

Your observation is still related to the things I pointed out: the semantics of python (pass by reference, using mutable vector data structures) make the + and += operation work differently.

In the case of

b = b + [3]

the (+) operator doesn’t necessarily ‘know’ you’ll be assigning the result back to the same data structure, and so the only logical action is to create a full copy of the original b with 3 tacked on to the end. However, in

b += [3]

it’s clear that the user wants to add to the same data structure, and so to make this operation at all useful given the semantics of the language, we enlarge the size of b and modify it in-place. While this does imply that

(x += y) != (x = x + y)

there’s no law requiring this identity. At the end of the day, these are just very rough algebraic approximations for actions on data structures.

OCaml doesn’t choose to overload these operators for data structures, and given the lack of mutability, (+=) wouldn’t really make much sense. Had OCaml had a Vector type (as exists in Containers, for example), and had it been easy to resolve function in OCaml, the same choices would likely have been made. e.g.

let (+) x y =
    let z = CCVector.create () in
    CCVector.append z x;
    CCVector.append z y;
    y

let (+=) x y = CCVector.append x y