Hello,
Raven was recently announced for modern scientific computing in OCaml.
Combining provable guarantees with machine learning has started to appear in both research and industry, but it has not yet matured for everyday production. Since type-driven programming is a natural fit for logic-based approaches, it is worthwhile to highlight these as future directions.
Probabilistic Models
Story. A company decided to design a simple score for assessing loyal customers to be targeted for promotions. The score should be adapted according to the feedback they’ll get from consumers.
ML Limitation. The score must be fully interpretable and explainable, so that the team can assess customer feedback against it. Many modern ML models do not offer that.
Decision. Ad hoc rules are implemented in a SQL query to compute the scores.
Probabilistic Models. A fully interpretable probabilistic model could be designed from the rules the team hypothesized.
Think of a graph in which each edge denotes a rule together with a probability. If the client is female, then with a 70% chance she moves along that edge to the next vertex. Each client traces a path through the graph, and the path predicts whether the client is recommended for a promotion.
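To make the graph idea concrete, here is a minimal sketch in OCaml. It is only an illustration under assumptions: the `client` fields, the thresholds, and all probabilities except the 70% figure are made up, and instead of sampling one path it computes the total probability that a client ends at a "promote" vertex.

```ocaml
(* Hypothetical client record; the fields are illustrative assumptions. *)
type client = { female : bool; orders : int }

(* A vertex is either terminal or a node with outgoing edges. An edge
   carries a rule, the probability of taking the edge when the rule
   holds, and the vertex it leads to. *)
type vertex =
  | Promote
  | Skip
  | Node of edge list
and edge = { rule : client -> bool; prob : float; target : vertex }

(* Probability that a client ends at [Promote]: the sum, over edges whose
   rule holds, of edge probability times the probability from the target.
   Probability mass not covered by a matching edge counts as no promotion. *)
let rec promote_prob (c : client) (v : vertex) : float =
  match v with
  | Promote -> 1.0
  | Skip -> 0.0
  | Node edges ->
    List.fold_left
      (fun acc e ->
         if e.rule c then acc +. e.prob *. promote_prob c e.target else acc)
      0.0 edges

(* Example model; every number except the 70% is an assumption. *)
let model =
  Node [
    { rule = (fun c -> c.female); prob = 0.7;
      target =
        Node [ { rule = (fun c -> c.orders >= 5); prob = 0.5;
                 target = Promote } ] };
    { rule = (fun c -> not c.female); prob = 0.4; target = Promote };
  ]
```

Because every edge is a named rule with an explicit probability, the score of any client can be explained by listing the edges that fired, which is the interpretability the story asks for.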
Why OCaml? Many probabilistic models are based on logical axioms, so we can run inference using logic. A stakeholder would appreciate a guarantee that no more than 20% of clients will get a discount of at least 50%. I have seen probability theory formalized in type theory, as in Kachapova’s paper. That work is done in the Coq theorem prover (implemented in OCaml).
I am not aware of any extension of those frontiers, from theorem proving to type-driven probabilistic model design. Knowledge-based systems seem to be dead in all modern languages.
Learning-augmented Algorithms
Story. A ride-hailing company decided to design a new matching algorithm to ensure fairness among all demanding areas. Serving a segment of customers at the expense of others causes bad feedback.
ML Limitation. The data is biased, so any machine learning model trained on it will be biased as well. The company rejects any learning from data, as it does not conform to its policies.
Decision. The engineering team found an open source matching engine, and after some testing and tuning, deployed it.
Learning-augmented Algorithms. A matching algorithm could be designed to ensure fairness among demanding areas while preferring profitable clients within each area.
Think of binary search, but with the pivot element chosen by a predictive black box. If the prediction accuracy is 100%, then we find the target after one comparison. If the accuracy is bad, then we are no worse than the worst case of binary search, finding the target in O(log n).
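The binary search analogy can be sketched as follows. This is one common way to realize it, not a reference implementation: the search starts at the predicted index, gallops outward with doubling steps, and finishes with ordinary binary search inside the bracket it found. The `predict` argument is a hypothetical black box; any function from a key to an index guess will do.

```ocaml
(* Search a sorted int array for [target], starting from a predicted index.
   A perfect prediction costs one comparison; a bad one still costs only
   O(log n) comparisons, because the gallop doubles its step each round. *)
let search ~(predict : int -> int) (a : int array) (target : int) : int option =
  let n = Array.length a in
  if n = 0 then None
  else begin
    (* Clamp the prediction into the valid index range. *)
    let p = max 0 (min (n - 1) (predict target)) in
    (* Plain binary search on the inclusive range [lo, hi]. *)
    let rec bsearch lo hi =
      if lo > hi then None
      else
        let mid = lo + (hi - lo) / 2 in
        if a.(mid) = target then Some mid
        else if a.(mid) < target then bsearch (mid + 1) hi
        else bsearch lo (mid - 1)
    in
    (* Gallop right: everything left of [lo] is known to be too small. *)
    let rec right lo step =
      let hi = min (n - 1) (p + step) in
      if hi = n - 1 || a.(hi) >= target then bsearch lo hi
      else right (hi + 1) (step * 2)
    in
    (* Gallop left: everything right of [hi] is known to be too large. *)
    let rec left hi step =
      let lo = max 0 (p - step) in
      if lo = 0 || a.(lo) <= target then bsearch lo hi
      else left (lo - 1) (step * 2)
    in
    if a.(p) = target then Some p
    else if a.(p) < target then right (p + 1) 1
    else left (p - 1) 1
  end
```

Since `predict` is a labeled argument, partial application gives a search specialized to one predictor, for example `let search_with_model = search ~predict:my_model`, where `my_model` stands for whatever predictor is available.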
Why OCaml? Since fairness has an axiomatic logical foundation, OCaml’s property-based testing is well suited to expressing such logical properties. Partial function application enables a clean parametrization of the algorithm.
I am not aware of any library for learning-augmented algorithms in any language.
Neuro-symbolic AI
Story. A legal consultancy company is designing a chatbot. It has strict policies the chatbot should follow.
ML Limitation. Text generation can hallucinate. This is more problematic in sensitive domains like legal consultancy.
Decision. Retrieval-augmented generation, where related trusted sources are retrieved and the text is generated under their guidance.
Neuro-symbolic AI. Instead of just retrieving, a symbolic engine could ask the user questions, then reason and infer new conclusions using logic. The text generation process is then guided by these conclusions.
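As a rough illustration of the symbolic side only, the sketch below encodes facts as a variant type and derives new conclusions by forward chaining. The facts and rules are hypothetical examples, not taken from any real legal policy, and the text generation step is left out entirely.

```ocaml
(* Hypothetical facts; a real system would model its own domain. *)
type fact =
  | Is_tenant
  | Has_written_lease
  | Notice_given
  | Protected_tenancy
  | Eviction_needs_court_order

(* A rule derives [conclusion] once every premise is known. *)
type rule = { premises : fact list; conclusion : fact }

(* Illustrative rules only. *)
let rules = [
  { premises = [ Is_tenant; Has_written_lease ];
    conclusion = Protected_tenancy };
  { premises = [ Protected_tenancy; Notice_given ];
    conclusion = Eviction_needs_court_order };
]

(* Forward chaining: fire any rule whose premises all hold and whose
   conclusion is new, and repeat until nothing more can be derived.
   Each step adds a new fact, so the recursion terminates. *)
let rec infer (rules : rule list) (known : fact list) : fact list =
  let fires r =
    List.for_all (fun p -> List.mem p known) r.premises
    && not (List.mem r.conclusion known)
  in
  match List.find_opt fires rules with
  | None -> known
  | Some r -> infer rules (r.conclusion :: known)
```

The answers collected from the user would form the initial `known` list, and the derived facts are what a generator could then be constrained by. Because `fact` is a closed variant, the compiler flags any rule that mentions a fact outside the declared vocabulary.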
Why OCaml? It is more native to express symbolic components in terms of types.
I am not aware of any library in any language for building symbolic components using types.
Bonus. Data Engineering
Data engineering is critical for any scientific computing work. Functional programming is already recognized for that. See TU Delft’s course.
Discussion
We are calling on the community to try similar case studies.
- Start with ad hoc or imperative integration techniques in any language.
- If promising, look for architectural patterns in research.
- Estimate whether OCaml’s ecosystem adds value.
Contributing to OCaml is not a hobbyist’s decision. It should be driven by a business model, and that is what we are trying to figure out now. Multi-disciplinary volunteers are needed, across research, engineering, and business.
I am happy to learn from your feedback and suggestions.