Nnpchecker error–how to debug?

I am getting an nnpchecker error in a build: [new release] dream-html (0.0.2) by yawaramin · Pull Request #23786 · ocaml/opam-repository · GitHub

The message is:

#=== ERROR while compiling dream-html.0.0.2 ===================================#
# context              2.2.0~alpha~dev | linux/x86_64 | ocaml-options-only-nnpchecker.1 | pinned(https://github.com/yawaramin/dream-html/releases/download/v0.0.2/dream-html-0.0.2.tbz)
# path                 ~/.opam/4.14/.opam-switch/build/dream-html.0.0.2
# command              ~/.opam/opam-init/hooks/sandbox.sh build dune build -p dream-html -j 127 @install @runtest
# exit-code            1
# env-file             ~/.opam/log/dream-html-7-5933ac.env
# output-file          ~/.opam/log/dream-html-7-5933ac.out
### output ###
# File "test/dune", line 2, characters 7-22:
# 2 |  (name dream_html_test)
#            ^^^^^^^^^^^^^^^
# (cd _build/default/test && ./dream_html_test.exe)
# <!DOCTYPE html>
# <html lang="en"><head><title>Dream_html Test</title>
# </head>
# <body id="test-content"><main spellcheck="true"><article id="article-1" class="story"><p data-hx-get="/p1?a%20b" data-hx-target="closest article > p">Test para 1.</p>
# <p>Test para 2.</p>
# <a href="/a?b=cd:efg/hij">cd:efg/hij</a>
# <a href="/%F0%9F%98%89">wink</a>
# </article>
# <input type="text" autocomplete="name" onblur="if (1 > 0) alert(this.value)"><!-- oops --&gt;&lt;script&gt;alert(&#x27;lol&#x27;)&lt;/script&gt; -->
# <dialog open><div></div>
# </dialog>
# <template id="idtmpl"><p>Template</p>
# </template>
# <div translate="no"><p translate="yes"></p>
# </div>
# <textarea required data-hx-trigger="keyup[target.value.trim() != '']" autocapitalize="words">super</textarea>
# <hr class="super"><p id="greet-Bob">Hello, Bob!</p>
# </main>
# </body>
# </html>
# 
# Out-of-heap pointer at 0x5616480e3210 of value 0x561647f87180 has non-black head (tag=65)
# 
# Out-of-heap pointers were detected by the runtime.
# The process would otherwise have terminated normally.
# File "test/dune", line 5, characters 0-65:
# 5 | (rule
# 6 |  (with-stdout-to
# 7 |   got.html
# 8 |   (run ./dream_html_test.exe)))
# (cd _build/default/test && ./dream_html_test.exe) > _build/default/test/got.html
# Out-of-heap pointer at 0x55dd1c32e210 of value 0x55dd1c1d2180 has non-black head (tag=65)
# 
# Out-of-heap pointers were detected by the runtime.
# The process would otherwise have terminated normally.

The test is defined here: dream-html/dune at v0.0.2 · yawaramin/dream-html · GitHub

It is a dune diff action: the test binary writes to standard output, dune captures it in a file and diffs it against an expected file. If they match, the test passes. This works for me locally with stock OCaml 4.14, but it fails in opam CI with the ‘no naked pointers’ check.

Any ideas how to debug?

This is easiest to debug if you have access to the binary and rr. See: [ANN] A dynamic checker for detecting naked pointers

Thanks. It looks like the binary is dune itself so I guess I can scan it with the nnpchecker. Kinda surprising that dune would be messing with C pointers though. I’ll try later today.

If the binary at fault is dune, it would be great to open an issue on the dune issue tracker. Naked pointers are bugs in OCaml 5.

According to the log, I don’t think the issue is in dune itself. It’s in dream_html_test.exe, and dune is just displaying the command that failed:

# (cd _build/default/test && ./dream_html_test.exe) > _build/default/test/got.html
# Out-of-heap pointer at 0x55dd1c32e210 of value 0x55dd1c1d2180 has non-black head (tag=65)

Thanks for the analysis. If you have rr installed on your machine, it is fairly easy to get the source of this out-of-heap pointer.

Thanks for the pointer (no pun intended). Although, that is puzzling because the test is just building an HTML data structure, converting it to a string, and printing it to standard output:

let node = ...
let () = node |> to_string |> print_endline

In fact, we can see that it ran successfully because we see the entire test HTML output in the logs:

<!DOCTYPE html>
<html ...>
...
</html>

Only after it finishes printing does it report the out-of-heap pointer error. So I am hesitant to conclude that the issue is even in my test executable. Will look into it further.

Your executable is probably fine, but one of the libraries you’re linking against might use naked pointers.

Here is an incomplete method. It catches only some uses of naked pointers, but ones that seem to be common.

  1. Download the source of all dependencies, with something like:
    opam install --download-only dream
    
  2. Grep the sources for all possible (value) casts in C files:
    grep -r --include \*.h --include \*.c -e "(value)" _opam/.opam-switch/sources/
    
  3. Carefully audit the results to see if any of those creates a naked pointer.
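
Steps 1 and 2 could be wrapped in a small shell helper. This is only a sketch, assuming a GNU- or BSD-style grep that supports --include. The function name list_value_casts is made up for illustration. It adds -n for line numbers and -F so that "(value)" is matched as a literal string, which makes the audit in step 3 easier to track.

```shell
# Hypothetical helper (not part of opam or any package): list every
# "(value)" cast in the C sources and headers under a directory.
list_value_casts() {
  src_dir="$1"
  # -r: recurse into subdirectories
  # -n: print the line number of each hit
  # -F: treat "(value)" as a literal string, not a regex
  grep -rnF --include='*.h' --include='*.c' -e '(value)' "$src_dir"
}
```

For example, `list_value_casts _opam/.opam-switch/sources/` prints one `file:line:text` entry per cast. Each hit still needs a manual read, since some (value) casts are valid uses.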

Here it seems that ocaml-ssl creates a naked pointer in the C function ocaml_ssl_get_current_cipher. This does not guarantee that it is the source of your issue, though. It may be possible to cross-check with other clues at your disposal.

This method is not exhaustive, of course, and neither is the nnp checker, but so far it has been successful in finding code that is incompatible with OCaml 5. Unlike the nnp checker, it is limited neither to code paths that actually run nor by the whims of GC scheduling, and the (value) cast seems to be an invariant of some of the techniques that are now deprecated. It does require understanding the code in order to exclude valid uses of the (value) cast, though.

For context, this method has proven useful in identifying many packages that use naked pointers. See the OCaml Workshop 2022 paper, which contains the most up-to-date discussion of the nnp issue. While some have been fixed for OCaml 5, it is likely that many have not yet been fixed, even though they still show as green. Unfortunately, I had to stop when I ran out of time, because the audit proved more tedious and time-consuming than expected (e.g. compared to code audits I did in the past). But I thought the core developers who are working on fixing deprecations in opam packages might be interested in this approach.

Thanks very much for that analysis! After looking at rr I realized it’s Linux only so I can’t use it on Mac. I filed an issue with ocaml-ssl: Naked pointer - ocaml_ssl_get_current_cipher · Issue #143 · savonet/ocaml-ssl · GitHub

EDIT: looks like you had already found this in the analysis for your paper 🙂 other/async_audit/value · master · gadmm / stdlib-experiment · GitLab [EDIT EDIT: not in an accusatory tone btw, just amused! haha]

As this is marked with an x, it looks like I actually missed it 🙂 That’s probably a side-effect of this being tedious and time-consuming.

I just found another interesting tidbit: other pure-OCaml libraries built on Dream seem to be exposing the same nnpchecker error, e.g. [new release] dream-htmx (0.1.0) by beajeanm · Pull Request #21954 · ocaml/opam-repository · GitHub

So…clearly it’s not specific to my library. I will try submitting dream-html to opam again soon.