[ANN] dkml-c-probe: Cross-compiler friendly definitions for C compiling

Summary: dkml-c-probe is a new package for maintainers who compile or link C code. Install it with opam install dkml-c-probe. Full docs are at https://github.com/diskuv/dkml-c-probe#readme

Problem

You are creating an OCaml package that has foreign C code. Perhaps you need special C headers or libraries when you are targeting Apple users, or perhaps you need to execute custom OCaml code for Android users. More generally you need a way to determine whether your OCaml or C code is compiling for a Linux AMD/Intel 64-bit, Android ARM 32-bit, or any other ABI target.

Solution

A user of your OCaml package may, for example, be on a 64-bit AMD/Intel Linux machine using a 32-bit OCaml system compiled with gcc -m32; additionally they have a 32-bit Android ARM cross-compiler toolchain. dkml-c-probe will tell you the target operating system is Linux and the target ABI is Linux_x86 except when the cross-compiler toolchain is invoked. With the cross-compiler toolchain dkml-c-probe will tell you the target operating system is Android and the target ABI is Android_arm32v7a.

How it works

dkml-c-probe uses C preprocessor definitions (e.g. #if TARGET_CPU_X86_64, #if __ANDROID__) to determine which ABI the C compiler is targeting. The C compiler in question is, for example, the one reported by ocamlopt -config | grep native_c_compiler.

This isn’t a new idea. The pattern is used in Esy and Mirage code as well. dkml-c-probe just codifies the pattern for use in your own code.
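
As a rough illustration of the pattern (not the actual dkml-c-probe source), the sketch below asks the compiler which target it is building for, using only macros that compilers predefine. The function names probe_os_name and probe_cpu_name are made up for this example.

```c
/* Hypothetical helper (not part of dkml-c-probe): names the target OS
   using only macros the compiler predefines for its target. */
static const char *probe_os_name(void) {
#if defined(__ANDROID__)   /* checked before __linux__: Android defines both */
    return "Android";
#elif defined(__linux__)
    return "Linux";
#elif defined(__APPLE__)   /* covers both macOS and iOS */
    return "Darwin";
#elif defined(_WIN32)
    return "Windows";
#else
    return "Unknown";
#endif
}

/* Hypothetical helper: names the target CPU family the same way.
   The _M_* macros are the MSVC spellings. */
static const char *probe_cpu_name(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm32";
#else
    return "unknown";
#endif
}
```

Because the answer comes from the preprocessor, it describes whatever target the compiler was invoked for, so a cross-compiler reports its target rather than the build machine.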

Usage

In OCaml code you can use the versioned module:

module V2 :
  sig
    type t_os = Android | IOS | Linux | OSX | Windows
    type t_abi =
        Android_arm64v8a
      | Android_arm32v7a
      | Android_x86
      | Android_x86_64
      | Darwin_arm64
      | Darwin_x86_64
      | Linux_arm64
      | Linux_arm32v6
      | Linux_arm32v7
      | Linux_x86_64
      | Linux_x86
      | Windows_x86_64
      | Windows_x86
      | Windows_arm64
      | Windows_arm32
    val get_os : (t_os, string) result Lazy.t
    val get_abi : (t_abi, string) result Lazy.t
    val get_abi_name : (string, string) result Lazy.t
  end

Edit: The docs wrongly showed (*, Rresult.R.msg) result Lazy.t. The (*, string) result Lazy.t type is actually used in the API

In C code you can use the provided dkml_compiler_probe.h header from within Dune or Opam. Here is a snippet that handles part of the Linux introspection:

#elif __linux__
#   if __ANDROID__
#       ...
#   else
#       define DKML_OS_NAME "Linux"
#       define DKML_OS_Linux
#       if __aarch64__
#           define DKML_ABI "linux_arm64"
#           define DKML_ABI_linux_arm64
#       elif __arm__
#           if defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__)
#               define DKML_ABI "linux_arm32v6"
#               define DKML_ABI_linux_arm32v6
#           elif defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__)
#               define DKML_ABI "linux_arm32v7"
#               define DKML_ABI_linux_arm32v7
#           endif /* __ARM_ARCH_6__ || ...,  __ARM_ARCH_7__ || ... */
#       elif __x86_64__
#           define DKML_ABI "linux_x86_64"
#           define DKML_ABI_linux_x86_64
#       elif __i386__
#           define DKML_ABI "linux_x86"
#           define DKML_ABI_linux_x86
#       elif defined(__ppc64__) || defined(__PPC64__)
#           define DKML_ABI "linux_ppc64"
#           define DKML_ABI_linux_ppc64
#       elif __s390x__
#           define DKML_ABI "linux_s390x"
#           define DKML_ABI_linux_s390x
#       endif /* __aarch64__, __arm__, __x86_64__, __i386__, __ppc64__ || __PPC64__, __s390x__ */
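
A minimal sketch of how C code might consume these macros, assuming dkml_compiler_probe.h is on the include path in a real build. The fallback definitions exist only so the sketch compiles standalone, and the two function names are hypothetical.

```c
/* In a real build this would be: #include "dkml_compiler_probe.h"
   (assumption: the header is on the include path). The fallbacks below
   only exist so this sketch compiles without the header. */
#ifndef DKML_OS_NAME
#  define DKML_OS_NAME "unknown"
#endif
#ifndef DKML_ABI
#  define DKML_ABI "unknown"
#endif

/* Hypothetical function: describes the target as "<os>/<abi>".
   Adjacent string literals are concatenated at compile time. */
static const char *target_description(void) {
    return DKML_OS_NAME "/" DKML_ABI;
}

/* Hypothetical function: selects a code path at compile time from the
   per-ABI macros shown in the header snippet. */
static int target_is_32bit_arm_linux(void) {
#if defined(DKML_ABI_linux_arm32v6) || defined(DKML_ABI_linux_arm32v7)
    return 1;
#else
    return 0;
#endif
}
```

The string macros are handy for logging, while the DKML_ABI_... flag macros let you use #if to include ABI-specific code.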

Versioning and Contributing

Whenever a new ABI is added, it goes into a new version (ex. module V3). Your existing code that uses module V2 will be unaffected.

But each new ABI needs to have its own maintainer because I don’t have access to every hardware platform on the planet!

For example, PowerPC (ppc64) and Linux on IBM Z (s390x) are supported in the C header but not in the OCaml module because there are no PowerPC or s390x maintainers.

Please consider contributing, especially if you want others to have an easier compilation story for your favorite hardware platform.

Fantastic! Thanks for doing it. I was doing something similar “by hand” until now (and less robustly). I had thought of blogging about it but having a library makes so much more sense.

Can I ask you why you decided to go for the full platform identifier instead of separating the OS from the architecture in the type?

I would also suggest having an Unknown variant instead of a hard failure, and letting the users of the library decide whether they want to fail or use some defaults, instead of raising an error in the library itself. Or at least create an Unknown exception and document the behaviour clearly and explicitly. I usually get annoyed when my code fails due to undocumented or hidden exceptions.

Yeah, your by-hand code is a pretty similar use of the pattern.

  1. I’ve never had a need for the architecture by itself. For example, in your eigen code you have:
    @ (match arch, os with
      | `arm64, `mac -> [ "-mcpu=apple-m1" ]
      | `x86_64, _   -> [ "-march=native"; "-mfpmath=sse"; "-msse2" ]
      | _            -> [])

In my own code the `x86_64, _ clause of that (arch, os) match would not work because I’m using the MSVC compiler on Windows (and clang on mac and gcc on Linux), and MSVC does not accept GCC-style flags like -march=native.

Edit: That wasn’t the best example. match arch,ccomp_type,os would be a better choice than match abi,os

(Unrelated coincidence: A couple days ago I tried to compile Owl with MSVC and ran into that same code block. It would be nice if the above code worked with MSVC)

  2. There are some pretty hairy architectures, and not all OS + architecture choices are valid. The 32-bit arm architecture has v6/v7/hard/soft ABI variants that aren’t compatible, and Linux supports all the variants. But on Windows only one flavor of ARM32 is supported. So I’ve found it easier and less error-prone to only enumerate valid ABIs.

Regardless, if someone wants to add a getter for an architecture, it definitely can be added.
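
To make the compiler-type point concrete, here is a small sketch of choosing flags from the ABI name plus the compiler type. It assumes ABI names spelled like the DKML_ABI strings (e.g. "linux_x86_64") and OCaml's ccomp_type values "cc" and "msvc"; opt_flags is a made-up helper, not dkml-c-probe API.

```c
#include <string.h>

/* Hypothetical helper: choose optimization flags from the ABI name and
   OCaml's ccomp_type ("cc" for GCC/clang-style compilers, "msvc" for MSVC). */
static const char *opt_flags(const char *abi, const char *ccomp_type) {
    /* GCC-style -m flags are not valid for MSVC, so emit nothing there. */
    if (strcmp(ccomp_type, "msvc") == 0)
        return "";
    if (strcmp(abi, "darwin_arm64") == 0)
        return "-mcpu=apple-m1";
    if (strcmp(abi, "linux_x86_64") == 0 || strcmp(abi, "darwin_x86_64") == 0)
        return "-march=native -mfpmath=sse -msse2";
    /* Unlisted ABI: no special flags. */
    return "";
}
```

Keying on the full ABI means an invalid OS + architecture combination can never be matched by accident, and the compiler type keeps GCC-only flags away from MSVC.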

Looking back at the code, it is a bug that I’m using a result type but doing a failwith in a couple places. I can open a bug for that.

Are you proposing:

type t_abi = ... | Unknown of string
val get_abi : t_abi

compared to the existing

type t_abi = ...
val get_abi : (t_abi, Rresult.R.msg) result Lazy.t

?

That is an improvement that can go into V3. (I was not convinced when I started the project that the ABI (and architecture and OS) could be fully probed by the C compiler, so I allowed for future runtime probing with result Lazy.t)

Oh I see. This was what made me raise the point in the first instance. I do still think that there is value in the unknown variant (see below).

Yes, you could actually keep the result to encapsulate other errors (e.g. compiler issues) but use the Unknown variant to deal with unsupported systems (e.g. BSD family systems, which right now don’t appear in the list). I think it is easier and more robust than having to parse the error message. By the way, does the library work on BSD, or do you get an error similar to the one in the issue I linked in this paragraph?

If you have a fix for it, I’d be happy to integrate it in the owlbarn/owl pull request #609, “Support Arm64 (and apple M1)”, and patch eigen as well if necessary. Please let me know.

I plan to send you a PR to drop some external dependencies in the near future, if it is welcome. I’d like to replace my handwritten code with this library but I would prefer to keep extra build dependencies to a minimum.

Made an issue to track all of this: “Task list for V3” (issue #1 in diskuv/dkml-c-probe on GitHub). Please watch that issue, @mseri and anybody else who would like to request an API change or submit a PR. After the new V3 is available I’ll repost on this ANN thread.

For the eigen patch, yes I can send a PR.

Thanks a lot, both for the library and the responsiveness!

V3 is available. Its C_abi module has some big enhancements:

  • cleaner API (thanks @mseri!)
  • recognizes the BSD family: OpenBSD, FreeBSD, NetBSD and DragonFly on x86_64 hardware
  • integration testing now includes OpenBSD, FreeBSD and one cross-compiling toolchain (macOS x86_64 host that targets arm64)

V3 also has a new module C_conf which occupies the same problem space as findlib / ocamlfind and pkg-config:

  • Unlike findlib which is a specification+tool for 3rd party OCaml libraries, C_conf is a specification+tool for foreign C libraries
  • Unlike pkg-config which is a specification+tool for system (host ABI) C libraries, C_conf is a specification+tool for the multiple ABIs that are present when you cross-compile OCaml or C code
  • Unlike pkg-config which is designed for Unix, C_conf is designed for Windows and Unix where paths may have spaces, backslashes and colons
  • For now the specification is based on environment variables. If it proves useful the specification can be extended.

Examples and doc links for V3 are available at https://github.com/diskuv/dkml-c-probe#dkml-c-probe

Thanks a lot for the update! Can you say a bit more about how C_conf works?

C_conf has a detailed problem statement and spec on its documentation page (dkml-c-probe.Dkml_c_probe.C_conf), which is linked from the dkml-c-probe README.

I probably shouldn’t regurgitate the doc here, so I’ll take a few key pieces from the doc and then post some things here that I didn’t put on that doc page …

  1. Here is my configuration for locating the “gmp” library on my Apple Silicon host machine that cross-compiles to x86_64:

    CP_GMP_CC_DEFAULT                 = -IZ:/build/darwin_arm64/vcpkg_installed/arm64-osx/include
    CP_GMP_CC_DEFAULT_DARWIN_X86_64   = -IZ:/build/darwin_x86_64/vcpkg_installed/x64-osx/include
    CP_GMP_LINK_DEFAULT               = -LZ:/build/darwin_arm64/vcpkg_installed/arm64-osx/lib;-lgmp
    CP_GMP_LINK_DEFAULT_DARWIN_X86_64 = -LZ:/build/darwin_x86_64/vcpkg_installed/x64-osx/lib;-lgmp
    
     • The other direction may be more interesting, since the free tier of GitHub Actions only supports x86_64. The scenario of taking a macOS x86_64 GitHub host and cross-compiling to Apple Silicon is implemented and partially tested.

  2. I am using a C package manager (vcpkg) to give me cross-compiled libraries and the flags for the target ABI (in this case darwin_x86_64 is the target ABI). In general it doesn’t matter where you get your target-ABI-compatible libraries from. Example: When I’m cross-compiling to Android on a Windows x86_64 host, the Android Studio environment gives me some libraries for an Android Emulator (host ABI) and also prebuilt libraries for 4 Android device ABIs:

    Directory: C:\Users\xxx\AppData\Local\Android\Sdk\ndk\23.1.7779620\prebuilt

    Mode                 LastWriteTime         Length Name
    ----                 -------------         ------ ----
    d-----        10/20/2021   8:27 PM                android-arm
    d-----        10/20/2021   8:27 PM                android-arm64
    d-----        10/20/2021   8:27 PM                android-x86
    d-----        10/20/2021   8:26 PM                android-x86_64
    d-----        10/20/2021   8:27 PM                windows-x86_64

  3. The CP_clibrary_CC_DEFAULT_abi configuration relies on abi (the ocamlfind toolchain name) being standardized. The gmp library, for example, is used by many OCaml packages; I wanted one configuration for gmp, not one configuration for each (gmp, OCaml package) combination. In fact, getting a consistent abi naming was one of my motivations for releasing dkml-c-probe. I don’t think the prior art got this right … the very stale opam-cross-android project uses abi = "android" which is insufficient to differentiate the 5+ sets of libraries available in Android Studio.

  4. The “gmp” (etc.) configuration is done once in a familiar syntax (-L, -I, -l). However, the C_conf library will parse and print the configuration in the appropriate C compiler syntax. When the MSVC compiler is used you get MSVC-style linking:

    [
      "-LIBPATH:Z:/build/darwin_x86_64/vcpkg_installed/x64-osx/lib";
      "gmp.lib"
    ]

    MSVC and GCC conventions are supported today in C_conf.

  5. A real example of using C_conf is in my customization of the zarith library. It checks C_conf first to see whether the user has the host/target ABI configuration; if not, it falls back to pkg-config.
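
To give a feel for the points above, here is a rough C sketch of the naming convention and the MSVC translation. It is not the C_conf implementation (which is OCaml). It assumes the ABI-specific variable takes precedence over the generic one, and it leaves out splitting the value on ";". All three function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "CP_<LIB>_<KIND>_DEFAULT[_<ABI>]" in upper case.
   Pass abi = NULL for the generic variable. Returns 0 on success, -1 if the
   name does not fit in the buffer. */
static int conf_var_name(const char *lib, const char *kind, const char *abi,
                         char *out, size_t n) {
    int len = abi ? snprintf(out, n, "CP_%s_%s_DEFAULT_%s", lib, kind, abi)
                  : snprintf(out, n, "CP_%s_%s_DEFAULT", lib, kind);
    if (len < 0 || (size_t)len >= n)
        return -1;
    for (char *p = out; *p; p++)
        *p = (char)toupper((unsigned char)*p);
    return 0;
}

/* Assumed precedence: the ABI-specific variable first, then the generic one.
   Returns NULL when neither variable is set. */
static const char *conf_lookup(const char *lib, const char *kind,
                               const char *abi) {
    char name[128];
    const char *value = NULL;
    if (abi && conf_var_name(lib, kind, abi, name, sizeof name) == 0)
        value = getenv(name);
    if (!value && conf_var_name(lib, kind, NULL, name, sizeof name) == 0)
        value = getenv(name);
    return value;
}

/* Translate one GCC-style link flag into the MSVC form shown above:
   -L<dir> becomes -LIBPATH:<dir>, -l<name> becomes <name>.lib, and anything
   else is copied unchanged. Returns 0 on success, -1 on truncation. */
static int to_msvc_link_flag(const char *flag, char *out, size_t n) {
    int len;
    if (strncmp(flag, "-L", 2) == 0)
        len = snprintf(out, n, "-LIBPATH:%s", flag + 2);
    else if (strncmp(flag, "-l", 2) == 0)
        len = snprintf(out, n, "%s.lib", flag + 2);
    else
        len = snprintf(out, n, "%s", flag);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

With the gmp configuration above, conf_lookup("gmp", "link", "darwin_x86_64") would read CP_GMP_LINK_DEFAULT_DARWIN_X86_64, and under this sketch's assumed precedence it would fall back to CP_GMP_LINK_DEFAULT if that variable were unset.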

The trend of using pkg-config in OCaml packages makes both native Windows and cross-compilation difficult. At the moment we unintentionally shoot ourselves in the foot because Dune documentation encourages pkg-config for understandable reasons. I hope dkml-c-probe can break that trend.