Consider the following program:

```
let selection_sort cmp a left right =
for i=left to right-2 do
let leftmost = Array.get a i in
let minimum_pos = ref i
and minimum = ref leftmost in
for j=i+1 to right-1 do
let current = Array.get a j in
if (cmp[@inlined always]) current !minimum < 0 then
( minimum := current;
minimum_pos := j)
done;
Array.set a !minimum_pos leftmost;
Array.set a i !minimum
done;;
let mycmp x y = x - y;;
let n = 107 in
let t = Array.init n (fun i -> i lxor 0x45) in
selection_sort mycmp t 0 n;
Array.iter (fun x -> Printf.printf "%d " x) t;;
```

Neither the classic inliner neither flambda at -O3 seem to be able to inline the call to cmp.

```
File "polymorph.ml", line 8, characters 9-48:
8 | if (cmp[@inlined always]) current !minimum < 0 then
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Warning 55 [inlining-impossible]: Cannot inline: [@inlined] attribute was not used on this function application (the optimizer did not know what function was being applied)
```

I would have thought flambda would specialize the selection sort function for that particular cmp?!

Try adding an inline annotation to `selection_sort`

as well. The compiler canâ€™t really inline `cmp`

without inlining `selection_sort`

as a whole, but Iâ€™m guessing the inner annotation alone wonâ€™t trigger that. So the optimizer is deciding not to inline `selection_sort`

and warns you it canâ€™t satisfy the `inline always`

constraint.

Right. Merely specifying that `selection_sort `

should be inlined is not enough to make the warning disappear, because the compiler still emits it because of the generic version of the code. But if I make the `selection_sort `

function private by using a module interface, indeed, it seems `mycmp`

is not emitted.

Moving on now to the (more interesting) case of nested calls to be specialized.

Now for this example:

```
let selection_sort cmp a left right =
for i=left to right-2 do
let leftmost = Array.get a i in
let minimum_pos = ref i
and minimum = ref leftmost in
for j=i+1 to right-1 do
let current = Array.get a j in
if (cmp [@inlined hint]) current !minimum < 0 then
( minimum := current;
minimum_pos := j)
done;
Array.set a !minimum_pos leftmost;
Array.set a i !minimum
done
[@@inline always];;
let pivot (cmp : 'a -> 'a -> int) (a : 'a array) (left : int) (right : int) =
assert(0 <= left);
assert(left <= right);
assert(right <= Array.length a);
let mid_start = ref left
and mid_end = ref (left + 1)
and current_pos = ref (right - 1)
and pivot = a.(left) in
while !mid_end <= !current_pos do
assert (left <= !mid_start);
assert (!mid_start < !mid_end);
assert (!current_pos < right);
let current = a.(!current_pos) in
match (cmp [@inlined hint]) current pivot with
| 0 ->
a.(!current_pos) <- a.(!mid_end);
mid_end := !mid_end + 1
| n when n < 0 ->
a.(!mid_start) <- current;
a.(!current_pos) <- a.(!mid_end);
mid_start := !mid_start+1;
mid_end := !mid_end+1
| _ ->
current_pos := !current_pos - 1
done;
for i= !mid_start to !mid_end-1 do
a.(i) <- pivot
done;
(!mid_start, !mid_end)
[@@inline always];;
let selection_thresh = 10
let rec quicksort (cmp : 'a -> 'a -> int) (a : 'a array) (left : int) (right : int) =
assert(0 <= left);
assert(left <= right);
assert(right <= Array.length a);
if right - left <= selection_thresh
then selection_sort cmp a left right
else
let (mid_start, mid_end) = pivot cmp a left right in
let len_left = mid_start - left
and len_right = right - mid_end in
if len_left < len_right then
(quicksort cmp a left mid_start;
quicksort cmp a mid_end right)
else
(quicksort cmp a mid_end right;
quicksort cmp a left mid_start)
[@@inline always];;
```

It seems that `quicksort`

is not specialized (one sees it being called with a closure).

Note that for Flambda at least, you can pass the `-inlining-report`

flag to the compiler, and it will generate some human-readable description of the inlining decisions. In particular, it will show that the size of the `selection_sort`

function is significantly bigger than any benefit from specialising the comparison, which is why itâ€™s not inlined (or specialised, which is almost the same for non-recursive functions).

We have thought about a few features that would have helped in this case (the initial one), most notably an annotation on parameters that triggers specialisation/inlining if the parameter is â€śknownâ€ť at the call site (in your case, you could have added the annotation on the `cmp`

parameter), but itâ€™s not implemented yet.

We do have in internship proposition still open on the subject (in french),

In the `quicksort`

example, what would be needed would be the ability to specialize functions without inlining. Is this feasible?

Yes, it only works for recursive functions but you can use `[@@specialise always]`

instead of `[@@inline always]`

.

`[@@specialise always]`

does not seem to have an effect on `quicksort,`

or maybe I used it wrong?

Did you check what happens when the function is used, outside of its definition ? The definition itself cannot be specialised, since you donâ€™t know its parameters yet. The `specialise`

annotation should ensure that whenever you call `quicksort`

(outside its definition), a new version is created which is specialised to the comparison function you gave.

My bad, I took the wrong version of the .s file, and indeed the` mycmp`

function disappeared, meaning it was inlined.

Now getting the same problem with this version. Normally `quicksort`

and its sub-calls should be specialized, but they are not (the `mycmp`

function is passed as a closure). Is this an issue with cross-module specialization/inlining?

It looks like a silly typo: the attribute is `[@@specialise]`

, not `[@@specialize]`

. Could you check again with this fix ?

OH DEAR. American vs British spelling, and I tend to write American. Thanks!

Indeed, it seems that `[@@specialise]`

works. Iâ€™ll see if I can make the big arrays polymorphic as well.

1 Like

Let us consider now the specialization of accesses to big arrays.

```
open Bigarray
type array_type = (int, int_elt, c_layout) Array1.t
let selection_sort (cmp : 'a -> 'a -> int)
(a : array_type) left right =
for i=left to right-2 do
let leftmost = Array1.get a i in
let minimum_pos = ref i
and minimum = ref leftmost in
for j=i+1 to right-1 do
let current = Array1.get a j in
if (cmp[@inlined hint]) current !minimum < 0 then
( minimum := current;
minimum_pos := j)
done;
Array1.set a !minimum_pos leftmost;
Array1.set a i !minimum
done
[@@inline always];;
let mycmp x y = x - y;;
let n = 107 in
let t = Array1.init Int C_layout n (fun i -> i lxor 0x45) in
selection_sort mycmp t 0 n;
for i=0 to n-1 do
Printf.printf "%d " (Array1.get t i)
done;;
```

Compiling this code yields no `caml_ba`

function calls. Now compile a empty `.mli`

file (to ensure that no generic version of the sorting code is produced), and remove the type annotation in the signature of `selection_sort`

. The code then produced contains `caml_ba`

calls. It looks like function specialization does not specialize Bigarray accesses regarding to layout and kind.

Indeed. Array and Bigarray specialisation relies on the typing information, while inlining and specialisation of functions occur after most of the types have been discarded. So even when the function is specialised, we donâ€™t have access to the typing information that would allow us to specialise the Bigarray accesses.

Weâ€™ve managed to rebuild some amount of Array specialisation after inlining in our experimental Flambda 2 branch, but we havenâ€™t done the same for Bigarrays yet. Hopefully by the time Flambda 2 is merged Bigarrays will be handled too, but this is likely not going to happen anytime soon.

Thank you for taking the time to answer my naive questions.

By the way, if you want your Bigarray example to work, you can cheat by passing the get/set functions as parameters:

```
let selection_sort get set (cmp : 'a -> 'a -> int)
a left right =
for i=left to right-2 do
let leftmost = get a i in
let minimum_pos = ref i
and minimum = ref leftmost in
for j=i+1 to right-1 do
let current = get a j in
if (cmp[@inlined hint]) current !minimum < 0 then
( minimum := current;
minimum_pos := j)
done;
set a !minimum_pos leftmost;
set a i !minimum
done
[@@inline always];;
let mycmp x y = x - y;;
open Bigarray
type array_type = (int, int_elt, c_layout) Array1.t
let () =
let n = 107 in
let t = Array1.init Int C_layout n (fun i -> i lxor 0x45) in
let[@inline] get (a : array_type) i = Array1.get a i in
let[@inline] set (a : array_type) i v = Array1.set a i v in
selection_sort get set mycmp t 0 n;
for i=0 to n-1 do
Printf.printf "%d " (Array1.get t i)
done;;
```

This trick means that you never generate non-specialised bigarray operations, so you should get the result you want. (And as a bonus, you can reuse the same code for arrays, bigarrays, and whatever other datatypes you might think of that implements `get`

and `set`

.)

Oh, indeed. (For the record: I was trying to evaluate the how `Domainslib`

could help us speed up some computations, and I tried implementing parallel quicksort first as a test, then I got sucked into issues that did not have anything to do with concurrency.)

1 Like