We don't have sincos in the Stdlib?

If I am right, we have sin and cos but not the combination of the two.
On modern hardware, I think there are special instructions to compute both at once.
This matters because sometimes you need both the sin and the cos of the same angle.

There is an ongoing discussion about this here: Stdlib.sincos [feature request] · Issue #12463 · ocaml/ocaml · GitHub

AFAIK only when using the x87 FPU (i.e. when compiling with ‘-m32’, not by default), and even then those instructions are so inaccurate that the compiler never uses them by default. See D36344 [X86] Don't use fsin/fcos/fsincos instructions ever and Intel Underestimates Error Bounds by 1.3 quintillion | Random ASCII – tech blog of Bruce Dawson.

Also, when using SSE there are no hardware instructions to compute these AFAIK; they are computed in software in libm. That software implementation is apparently faster than calling fsincos (see assembly - Calling fsincos instruction in LLVM slower than calling libc sin/cos functions? - Stack Overflow).

Having said that, even the software implementation of sincos (if available) might be faster than calling sin and cos separately.
YMMV; I’d suggest doing some measurements (the results may also differ depending on which libm implementation is used, or which architecture the code is compiled for).
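A minimal, stdlib-only sketch of such a measurement follows. The names `time_it` and `sin_plus_cos` are made up for illustration; `Sys.time` measures processor time, so repeat runs and treat small differences with suspicion. A binding to the C `sincos` function could be timed the same way by passing a function that calls it.

```ocaml
(* Minimal micro-benchmark sketch (hypothetical helper): time [n] calls of
   [f] using processor time from Sys.time (stdlib only, no Unix needed).
   Results are noisy; repeat runs before drawing conclusions. *)
let time_it ~name ~n (f : float -> float) =
  let acc = ref 0.0 in
  let t0 = Sys.time () in
  for i = 0 to n - 1 do
    (* Accumulate results so the calls cannot be optimized away. *)
    acc := !acc +. f (float_of_int i *. 1e-3)
  done;
  let dt = Sys.time () -. t0 in
  Printf.printf "%s: %.3f s (checksum %g)\n" name dt !acc;
  dt

(* Baseline: two separate libm calls per angle. *)
let sin_plus_cos x = sin x +. cos x

let () =
  let n = 1_000_000 in
  ignore (time_it ~name:"sin +. cos" ~n sin_plus_cos)
```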


I will try to benchmark a binding to the C sincos function.

What sort of use case do you have for sincos?

You could probably get decent speed ups if you need to compute lots of them and can use some sort of vectorized / data parallel interface. Something like:

val sincos : radians_in:float array
             -> sin_out:float array
             -> cos_out:float array
             -> unit

This should allow an implementation that could potentially take advantage of SIMD instructions and a good level of ILP (and even exploit multicore).
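A hypothetical pure-OCaml version of that signature, shown only to pin down the semantics (equal-length arrays, results written in place). It simply calls `sin` and `cos` in a loop, so by itself it gains nothing; the point of the interface is that a faster backend, such as a vectorized C routine, could later be swapped in behind it.

```ocaml
(* Hypothetical reference implementation of the array interface:
   sin_out.(i) <- sin radians_in.(i) and cos_out.(i) <- cos radians_in.(i). *)
let sincos ~radians_in ~sin_out ~cos_out =
  let n = Array.length radians_in in
  if Array.length sin_out <> n || Array.length cos_out <> n then
    invalid_arg "sincos: arrays must have the same length";
  for i = 0 to n - 1 do
    let x = radians_in.(i) in
    sin_out.(i) <- sin x;
    cos_out.(i) <- cos x
  done

(* Usage: caller allocates the output arrays once and reuses them. *)
let example () =
  let xs = [| 0.0; Float.pi /. 2.0 |] in
  let s = Array.make 2 0.0 and c = Array.make 2 0.0 in
  sincos ~radians_in:xs ~sin_out:s ~cos_out:c;
  (s, c)
```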

Of course, even if the sincos computation were sped up, using a plain array-based interface like this may be cumbersome, and it may be difficult to actually realize any performance benefits. What you’d really want is a way to compose data parallel computations that minimizes the number of intermediate data structures and allows the compiler to generate fused inner loops. Stream Fusion is an old paper on the topic. Here is a newer one: Stream Fusion, to Completeness.

The main trick is that the combinators do not have closed loops. I’m not sure how well the OCaml compiler inlines loopless higher-order functions. MLton used to be pretty good at it. My toy Fωμ compiler can also do it. Hopefully Flambda 2 will open up these sorts of things for OCaml.
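To illustrate what "combinators without closed loops" means, here is a simplified push-style stream sketch (not the exact representation used in either paper; all names are illustrative). Only the producer `of_array` contains a loop, while `map` and `filter` merely wrap the consumer. Whether ocamlopt actually turns such a pipeline into one fused loop depends on how aggressively it inlines the closures, which is exactly the concern raised above.

```ocaml
(* A push stream calls the consumer [k] on each element. *)
type 'a stream = ('a -> unit) -> unit

(* Producer: the only combinator that contains a loop. *)
let of_array (a : 'a array) : 'a stream =
 fun k -> for i = 0 to Array.length a - 1 do k a.(i) done

(* Transformers: loopless, they only wrap the downstream consumer. *)
let map (f : 'a -> 'b) (s : 'a stream) : 'b stream =
 fun k -> s (fun x -> k (f x))

let filter (p : 'a -> bool) (s : 'a stream) : 'a stream =
 fun k -> s (fun x -> if p x then k x)

(* Consumer: folds the stream into an accumulator. *)
let fold (f : 'acc -> 'a -> 'acc) (init : 'acc) (s : 'a stream) : 'acc =
  let acc = ref init in
  s (fun x -> acc := f !acc x);
  !acc

(* Sum of sin x +. cos x over the positive angles, with no intermediate
   arrays; after full inlining this is a single loop over [angles]. *)
let sum_sin_plus_cos angles =
  of_array angles
  |> filter (fun x -> x > 0.0)
  |> map (fun x -> sin x +. cos x)
  |> fold ( +. ) 0.0
```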

That is the point. Most modern software implementations of sine and cosine rely on the following trigonometric identities: sin (a + b) = sin a * cos b + cos a * sin b and cos (a + b) = cos a * cos b - sin a * sin b. So, if your mathematical library has computed one (meaning it has already computed the four values cos a, sin a, cos b, and sin b), it is only two multiplications and one addition away from computing the other.
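The identities can be checked numerically with a small sketch. `sincos_add` is a hypothetical helper, and here the sines and cosines of `a` and `b` come straight from `sin` and `cos`, whereas a libm following this scheme would obtain them from, e.g., a table lookup and a polynomial approximation. Given those four values, the sine and the cosine of `a + b` each cost two multiplications and one addition or subtraction.

```ocaml
(* Angle-addition step: from sin/cos of a and b, derive both
   sin (a + b) and cos (a + b) with shared inputs. *)
let sincos_add ~sin_a ~cos_a ~sin_b ~cos_b =
  let s = sin_a *. cos_b +. cos_a *. sin_b in (* sin (a + b) *)
  let c = cos_a *. cos_b -. sin_a *. sin_b in (* cos (a + b) *)
  (s, c)

(* Usage: compare against direct calls on a + b. *)
let () =
  let a = 0.3 and b = 0.4 in
  let s, c =
    sincos_add ~sin_a:(sin a) ~cos_a:(cos a) ~sin_b:(sin b) ~cos_b:(cos b)
  in
  Printf.printf "sin: %.15f vs %.15f\ncos: %.15f vs %.15f\n"
    s (sin (a +. b)) c (cos (a +. b))
```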

That is why both GCC and Clang optimize the following code into a single call to sincos(x) followed by an addition. (For Clang, you might need to pass -ffast-math.)

double foo(double x) { return sin(x) + cos(x); }

Perhaps one way to implement it would be to add a configure test for sincos (it seems to be available at least on GNU libc, musl libc and FreeBSD).
(Some other platforms, like Android, implement it too, but just as calls to sin and cos in sequence.)

If the function is not available, then an ifdef macro could choose to call sin and cos sequentially. To make ‘noalloc’ direct C calls from OCaml possible (without intermediate stubs), perhaps there could be an ‘ml_sincos’ function that is defined either as ‘sincos’ itself or as the above fallback implementation. This could be prototyped as a small library on ‘opam’ first, and its performance benefits and usefulness measured.
(Although ‘sincos’ needs to return 2 values, so it is not immediately obvious to me whether a ‘noalloc’ implementation would be possible here, unless you’ve preallocated an array or bigarray to store the results in)
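A pure-OCaml sketch of the preallocated-buffer calling convention: the results go into a caller-provided float array instead of a freshly allocated pair. `sincos_into` and `sum_sin_cos` are hypothetical names. With OCaml's default flat float arrays, storing into the buffer should not box the floats, so the loop need not allocate per call; a `noalloc` C binding to `sincos` could adopt the same signature, but that stub is not shown here.

```ocaml
(* Hypothetical fallback: out.(0) receives sin x, out.(1) receives cos x.
   [out] must have length >= 2 (otherwise Invalid_argument is raised). *)
let sincos_into (x : float) (out : float array) =
  out.(0) <- sin x;
  out.(1) <- cos x

(* Usage: reuse a single buffer across many calls. *)
let sum_sin_cos (xs : float array) =
  let buf = Array.make 2 0.0 in
  Array.fold_left
    (fun acc x -> sincos_into x buf; acc +. buf.(0) +. buf.(1))
    0.0 xs
```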

So does FreeBSD having it mean that Mac OS X also has it?
And any BSD, in fact?

On MacOS 13.4, the function is called __sincos.

What about Windows?

If you have any numerical code dealing with rotations, as soon
as you have an angle, you are usually interested in both its sine and its cosine.
This is useful for computer graphics, but also for molecular simulation software.
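A minimal 2D illustration of the rotation case (the record type `vec2` and the function `rotate` are hypothetical): the sine and cosine of the angle are computed once and reused for every point, which is where a combined `sincos` call would replace two separate calls. Molecular simulation code works in 3D, but the same pattern applies.

```ocaml
type vec2 = { x : float; y : float }

(* Rotate every point counterclockwise by [theta] radians.
   sin and cos are evaluated once, then shared by all points. *)
let rotate (theta : float) (points : vec2 array) : vec2 array =
  let s = sin theta and c = cos theta in
  Array.map
    (fun p -> { x = (c *. p.x) -. (s *. p.y); y = (s *. p.x) +. (c *. p.y) })
    points
```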

One recent example I plan to use:

@UnixJunkie for anything complex (not as in complex numbers, lol) I would recommend just using more advanced C libraries and OCaml bindings to them: