Hi! optiml-transport was just released on opam. This library binds C++ primitives to solve the optimal transportation problem between finite weighted point clouds (i.e. finite measures). Concretely, this lets you lift any metric on a base space to a metric on finitely supported probability measures over that base space. (In fact, the library works with cost functions more general than those satisfying the metric axioms.) The library also outputs an optimal coupling between any two such measures. Optimal transportation has many applications in statistics, graphics, optimization, etc.
The library consists of bindings to nbonneel/network_simplex (fast optimal transport code) on GitHub.
Do you have any concrete use case example?
I can tell you about my current use case, which is probably not very representative.
I’m using it to assess the variability of a bunch of benchmarks of some piece of code over various experimental parameters (such as hardware configuration). Each benchmark yields an empirical time distribution, i.e. a finitely supported measure on the positive reals. Given N such measures, I compute the empirical variance in the metric space of probability measures as my variability score.
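To make the variability score concrete, here is a minimal sketch in plain OCaml. It does not use the optiml-transport API. It assumes one possible reading of "empirical variance in a metric space", namely half the mean squared pairwise distance (which coincides with the usual variance for real numbers), and it assumes a simplified 1D distance between samples of equal size. The names `w1_samples` and `variability` are made up for illustration.

```ocaml
(* W1 distance between two empirical samples of the same size on the real
   line: sort both and average the absolute differences of matched order
   statistics. This shortcut is specific to the 1D, equal-size case. *)
let w1_samples xs ys =
  let xs = Array.copy xs and ys = Array.copy ys in
  Array.sort compare xs;
  Array.sort compare ys;
  let n = Array.length xs in
  assert (n > 0 && n = Array.length ys);
  let acc = ref 0.0 in
  Array.iteri (fun i x -> acc := !acc +. Float.abs (x -. ys.(i))) xs;
  !acc /. float_of_int n

(* Assumed definition of the score: half the mean squared pairwise distance
   over all ordered pairs. [dist] can be any distance between measures. *)
let variability ~dist measures =
  let n = Array.length measures in
  let acc = ref 0.0 in
  Array.iter
    (fun mu ->
      Array.iter
        (fun nu ->
          let d = dist mu nu in
          acc := !acc +. (d *. d))
        measures)
    measures;
  !acc /. (2.0 *. float_of_int (n * n))

(* Three hypothetical benchmark runs, each a small sample of timings. *)
let runs = [| [| 1.0; 2.0; 3.0 |]; [| 2.0; 3.0; 4.0 |]; [| 1.0; 2.0; 3.0 |] |]
let score = variability ~dist:w1_samples runs
```

Since `variability` takes the distance as an argument, a transport distance computed by a library could be passed in place of `w1_samples`.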
One could use other similarity measures for histograms instead of the optimal transportation one, say the Euclidean distance or KL divergence. The issue is that these kinds of distances disregard the topology of the underlying space and are rather brittle with respect to binning (i.e. the distance can vary a lot depending on the coarseness of the binning grid). There’s a good explanation there.
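A small sketch of the point about topology, in plain OCaml and independent of the library: a unit mass is moved either to the neighbouring bin or to the far end of the grid. The Euclidean distance cannot tell the two cases apart, while the 1D transport distance can. The sketch assumes unit-width bins and equal total mass, in which case the 1D transport cost is the sum of absolute differences of the cumulative sums. The function names are illustrative.

```ocaml
(* Euclidean distance between two histograms over the same bins. *)
let euclidean p q =
  let acc = ref 0.0 in
  Array.iteri
    (fun i pi ->
      let d = pi -. q.(i) in
      acc := !acc +. (d *. d))
    p;
  sqrt !acc

(* 1D transport (W1) distance between two histograms with equal total mass
   over unit-width bins: sum of |difference of cumulative sums|. *)
let w1 p q =
  let cum = ref 0.0 and acc = ref 0.0 in
  Array.iteri
    (fun i pi ->
      cum := !cum +. pi -. q.(i);
      acc := !acc +. Float.abs !cum)
    p;
  !acc

let a = [| 1.0; 0.0; 0.0; 0.0; 0.0 |]
let near = [| 0.0; 1.0; 0.0; 0.0; 0.0 |] (* mass moved by one bin *)
let far = [| 0.0; 0.0; 0.0; 0.0; 1.0 |] (* mass moved by four bins *)

let () =
  Printf.printf "euclidean: near=%g far=%g\n" (euclidean a near) (euclidean a far);
  Printf.printf "w1:        near=%g far=%g\n" (w1 a near) (w1 a far)
```

Here both Euclidean distances are equal, whereas `w1` grows with the number of bins the mass has to travel.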
The simple-minded chemoinformatician that I am finds that the Tanimoto score is pretty good for comparing histograms (Ts = sum_of_mins / sum_of_maxs over all bins).
Using 1 - Ts, that gives you a distance.
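For reference, the formula above can be sketched in a few lines of OCaml. This assumes both histograms are nonnegative float arrays over the same bins. Returning a score of 1 when both histograms are empty is a convention chosen here, not something stated above, and the function names are illustrative.

```ocaml
(* Tanimoto score between two nonnegative histograms over the same bins:
   sum of bin-wise minima divided by sum of bin-wise maxima. *)
let tanimoto p q =
  assert (Array.length p = Array.length q);
  let mins = ref 0.0 and maxs = ref 0.0 in
  Array.iteri
    (fun i pi ->
      mins := !mins +. Float.min pi q.(i);
      maxs := !maxs +. Float.max pi q.(i))
    p;
  (* Assumed convention: two all-zero histograms are considered identical. *)
  if !maxs = 0.0 then 1.0 else !mins /. !maxs

(* The associated distance, 1 - Ts. *)
let tanimoto_distance p q = 1.0 -. tanimoto p q

let p = [| 2.0; 1.0; 0.0 |]
let q = [| 1.0; 1.0; 2.0 |]

(* mins = 1 + 1 + 0 = 2, maxs = 2 + 1 + 2 = 5, so Ts = 0.4 *)
let ts = tanimoto p q
```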