Eliom runtime errors, zero compile-time errors

I am at my wits’ end with eliom and ocsigenserver. I have tried three switches from 4.08.1 through 4.10.1, used the newer versions of eliom-distillery in each case, and still get errors at run time but zero compile-time errors. My eliomc is not too complicated, but it does depend on 3 local projects. All 3 have META files and opam files, and I double-checked that the byte and native cma and cmxa files are present and listed in the META files. This used to work and now it doesn’t. I don’t know why.

When I try to install and run.byte, a bigstringaf dependency jumps out of nowhere. Installing bigstringaf with opam and including it in the server dependencies causes, under some switches, still more errors about a missing dlljsoo_runtime_stubs.so, which leads me down a bottomless rabbit hole trying to guess which opam package provides that. Eventually it breaks the build process for one of the dependencies.

sudo PATH=$PATH OCAMLPATH=$OCAMLPATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH make run.byte

Fatal error: cannot load shared library dllbigstringaf_stubs
Reason: dllbigstringaf_stubs.so: cannot open shared object file: No such file or directory
Aborted

Installing and then running make run.opt fails to find a core-related symbol. I’m not sure why. I added core_unix as an explicit dependency in Makefile.options and in the dune library stanza of the dependent library, but no luck so far.

sudo PATH=$PATH OCAMLPATH=$OCAMLPATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH make run.opt

ocsigenserver.opt: main: Fatal - While loading … .opam/4.10.1/lib/core/core.cmxs: error loading shared library: error loading shared library: (Failure ocsigenserver.opt: main: “/home/admin/large_ebs/.opam/4.10.1/lib/core/core.cmxs: undefined symbol: core_kernel_time_ns_format_tm”)

Does anyone have a clue why either of these errors keeps happening? Thx in advance…

I think someone already reported (and fixed) a similar problem on MacOS but I don’t know the solution.
Are you on MacOS?

May be related to: make test.byte fails on MacOS · Issue #511 · ocsigen/ocsigen-start · GitHub

Not on macOS. Running Devuan.

I’ve made some progress trying to brute-force a solution, but it’s not pretty.
I noticed that on a new box I had forgotten to add the user to the www-data group, so I fixed that.
Contrary to what the eliom and ocsigen documentation says, the command line (if using sudo) should also preserve CAML_LD_LIBRARY_PATH, which I wasn’t doing at first. Adding that got me further at run time: no more complaints about bigstringaf or whatever that was.
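
For reference, a minimal sketch of one way to carry the opam environment across sudo. It assumes `eval $(opam env)` has already been run in the calling shell; the helper name `run_with_opam_env` is made up for illustration, and whether sudo would pass these variables through on its own depends on the local sudoers policy.

```shell
# Hypothetical helper: run a command under sudo while re-exporting the
# variables that the OCaml toolchain and ocsigenserver rely on.
# `env` runs as root and sets each variable again before starting the
# command, so they survive sudo's environment reset.
run_with_opam_env() {
  sudo env \
    "PATH=${PATH:-}" \
    "OCAMLPATH=${OCAMLPATH:-}" \
    "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}" \
    "CAML_LD_LIBRARY_PATH=${CAML_LD_LIBRARY_PATH:-}" \
    "$@"
}
```

Usage would then be something like `run_with_opam_env make run.byte`.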

make run.opt still fails as before, complaining about core_kernel_time_ns_format_tm, which is not listed under ocamlrun -p but does exist in the core_kernel package, specifically in time_ns_stubs.{c,h}. I am lost as to why it’s not being picked up.

make run.byte gets further now, but regardless of whether I use dbm or sqlite, I get this error:

ocsigenserver: ocsigen:config: While parsing config file, tag : No defaulthostname, assuming it is
ocsigenserver: main: Fatal - Unix.Unix_error(Unix.EACCES, “bind”, “”), make: *** [Makefile:58: test.byte] Error 9

I confirmed that the sqlite or dbm persistent store does indeed create files under /usr/local/var/data/. I initially thought it was just an ownership or permissions issue, but loosening ownership or permissions doesn’t help at all.

Interestingly, the EACCES error does not appear if I invoke make test.byte (with sudo and preserved paths): everything works ok. If I change the port to 80 in Makefile.options, recompile, don’t install, and just invoke make test.byte with sudo (preserving paths), it still works ok, and the project works ok. A bit strange from my standpoint.

Where was the MacOS fix? Git issue tracker? Ocaml.discuss?

Thx.

Ok thank you for documenting the solution. It can help other people.
Do not hesitate to do a PR if you think you can make this more robust.

About your second problem: I think there is currently a bug in recent versions of Ocsigen Server (the ones using cohttp): if I’m right, running Ocsigen Server on ports 80/443 as a user other than root is broken …
A workaround is to run as root (i.e. do not configure another user in the config file), or to run on a high port number (like 8080) with a reverse proxy on port 80 …

Is it your problem?

It’s not exactly what I would call a solution. Perhaps a piece of the puzzle.

I had no idea there is a bug in Ocsigen Server preventing use with any user other than root. How old is that bug? I just confirmed the behavior you described is true. I tried running as root (defined as such in Makefile.options) and it works just fine. Perhaps I should downgrade Ocsigenserver? I forgot to mention I am on Ocsigen 5.0.1 and Eliom 10.0.0 under an opam switch of 4.10.1. Such are the joys of blindly upgrading. Sometimes newer is not better. This is a serious bug that chewed up a lot of my time, and had you not told me I would never have discovered what was wrong, and I might have been forced to dump eliom and ocsigenserver completely.

Do you work on Ocsigenserver? Or exclusively Eliom? Any idea when that bug will be fixed? Run a reverse proxy? So run another server process in order to run Ocsigenserver…sounds terrible. I’m not going that route. With what? Apache? Then what’s the point of running Ocsigenserver, I ask myself. But anyway, I suppose I could make a PR but it’s so tiny…and I still have no idea why make run.opt is still failing.

I have no idea what the problem is, so it’s difficult to say, but I’ll have a look (probably tomorrow).

About a reverse proxy as a temporary solution (if you have an emergency): I think setting up nginx is lightweight and usually quick.
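
A minimal sketch of what such a proxy could look like, written as a shell helper that generates the nginx server block. It assumes Ocsigen Server listens on 127.0.0.1:8080; the helper name `write_proxy_conf` and the output path are hypothetical, and a real deployment would also need TLS and whatever else the site requires.

```shell
# Hypothetical helper: write an nginx server block that forwards port 80
# to a backend on 127.0.0.1:8080 (assumed to be Ocsigen Server).
# $1 is the file to write. The quoted 'EOF' keeps $host and the other
# nginx variables literal instead of letting the shell expand them.
write_proxy_conf() {
  cat > "$1" <<'EOF'
server {
    listen 80;

    location / {
        # Assumed backend address; adjust to the port set in Makefile.options.
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF
}
```

The generated file would then have to be included from the http block of the nginx configuration, e.g. after `write_proxy_conf ./ocsigen-proxy.conf`.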

I spent some time this morning on your problems.

Can you try the fixrunningasroot branch of Ocsigen Server to see if it fixes the EACCES error? You can pin this branch in opam like this:
opam pin add ocsigenserver https://github.com/ocsigen/ocsigenserver.git#fixrunningasroot
If it works for you we will release it in opam.

About “cannot load shared library dllbigstringaf_stubs”: I was able to reproduce the problem by unsetting CAML_LD_LIBRARY_PATH. Can you check your opam configuration? Maybe you forgot to run eval $(opam env) in your root environment?

Thanks for looking at the problem.

I now realize CAML_LD_LIBRARY_PATH was most likely unset: I was using tmux, and I forgot to run eval every time I started up a new instance of tmux. So that most likely explains that.
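
A small guard along these lines could catch the missing environment in a fresh tmux pane before the server is started. It is only a sketch, and the function name `check_opam_env` is hypothetical.

```shell
# Hypothetical guard: succeed and print the stub search path if it is set,
# otherwise print a hint on stderr and return a non-zero status.
check_opam_env() {
  if [ -z "${CAML_LD_LIBRARY_PATH:-}" ]; then
    echo "CAML_LD_LIBRARY_PATH is unset; run: eval \$(opam env)" >&2
    return 1
  fi
  echo "CAML_LD_LIBRARY_PATH=$CAML_LD_LIBRARY_PATH"
}
```

It could be chained in front of the usual command, for example `check_opam_env && make test.byte`.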

Pinning to the fixrunningasroot branch does not work. I made sure to eval, clean, make, install, and run byte and opt. It just hangs when invoking ocsigenserver and never launches, and nothing is written to the log files. The same is true when invoking test opt and byte.

Are you sure the server is not launched?
The new version seems to be more silent than before (it starts without a confirmation).

Confirmed. I unpinned, downgraded to ocsigenserver 5.0.1, went through all the motions, and now it launches and works. I also tested loading the main landing page locally with w3m pointed at localhost before and after I removed the pin. w3m would fail to load the page with the pinned version, but after the pin was removed, w3m loaded the page successfully.

Ok. I probably missed something. Can you send me by email (first name . last name @ ocsigen.org) your config file and the result of the following command?
strace ocsigenserver -f your_config_file.conf

Update: The problem comes from cohttp. If we want to run on port 80/443, we need to launch the server as root, then switch the process to another user (for security reasons). But, if I’m right, cohttp does not provide a way to execute something just after binding the port (or to be notified when the port is bound and it is safe to switch user).
Is anyone from cohttp reading this?

Can you open an issue on the cohttp repository? We can discuss there how this could be dealt with and whether we can address it before the next release.

Yes of course. Thanks!
