Web Analytics on OCaml.org

As you’ve witnessed, the OCaml.org team has been hard at work making the site the best resource to learn OCaml and discover the ecosystem.

Since the launch of V3 in April last year, we’ve revamped the centralised documentation site based on community feedback, and we’re currently doing the same for the Learn area and the documentation. We’re planning to revisit the Blog and Community sections next.

While we’re receiving tons of qualitative feedback indicating that we’re moving in the right direction, it’s been a challenge to measure the impact of the decisions we make. Are users of the site able to find the Standard Library documentation more easily? How many users who install OCaml end up reading the documentation? Are people using the new OCaml Changelog and the Job board? Many questions like these are currently difficult to answer, and answering them would help us make better decisions to improve the experience on the site.

When we launched the site, we made a strong commitment to protect users’ privacy. We refuse to use cookies, we don’t use any external service that might collect your data, we vendor all JavaScript and other assets rather than loading them from an external CDN, and we don’t run any web analytics.

We’re still unwaveringly committed to protecting OCaml.org’s visitors’ privacy. To address our lack of data on the site’s usage while respecting the principles we’ve adopted, we’ve selected Plausible as a possible way to get usage statistics.

Plausible is a privacy-focused web analytics service. It doesn’t use cookies, doesn’t collect any personal data, and is fully compliant with GDPR, CCPA and PECR.

We plan on rolling out Plausible for OCaml.org in the coming weeks.

Do you have any questions or concerns with using Plausible on OCaml.org?


What will the analytics strategy be, exactly? What data will be collected and how? E.g., will there be query parameters in links so that a user’s browsing path can be tracked?

Just to get an idea of the community feeling about this, here’s a little poll:

  • I’m for adding analytics
  • I’m against adding analytics
  • I don’t care

Plausible’s website documents in detail which data it collects. We can self-host a Plausible instance (it’s open source, of course) alongside the ocaml.org website, which means that we won’t ever send even this anonymised aggregate data outside of the ocaml.org infrastructure, and we can implement a policy of resetting our database at regular intervals. All in all, I’m in support: a deployment like this gives the maintainers more day-to-day knowledge of what’s working on the new site and what isn’t.

There have also been a lot of people over the years asking for usage data of opam packages. While we do not (and likely never will) gather this data from the CLI itself, the documentation that is now hosted on ocaml.org may serve as a reasonable usage proxy via the aggregate Plausible statistics.

If people are against, I’d appreciate them chiming in on this thread with alternative approaches to solving the problems above too.


Query parameters are discarded, except for these special query parameters: ref=, source=, utm_source=, utm_medium=, utm_campaign=, utm_content= and utm_term=.

This seems reasonable to me. Thanks!
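To illustrate the rule quoted above, here is a minimal OCaml sketch of an allowlist filter over a raw query string. It is a hypothetical, stdlib-only illustration (the names `allowed_params`, `parse_query` and `filter_query` are made up for this example), not Plausible’s actual implementation, and it does no URL-decoding.

```ocaml
(* Hypothetical sketch: the only query parameters kept, per the rule
   quoted above. Everything else is dropped before anything is stored. *)
let allowed_params =
  [ "ref"; "source"; "utm_source"; "utm_medium";
    "utm_campaign"; "utm_content"; "utm_term" ]

(* Split a raw query string such as "a=1&utm_source=x" into
   (key, value) pairs. A parameter without '=' gets an empty value. *)
let parse_query (query : string) : (string * string) list =
  query
  |> String.split_on_char '&'
  |> List.filter (fun s -> s <> "")
  |> List.map (fun pair ->
         match String.index_opt pair '=' with
         | Some i ->
             ( String.sub pair 0 i,
               String.sub pair (i + 1) (String.length pair - i - 1) )
         | None -> (pair, ""))

(* Rebuild the query string, keeping only allowlisted parameters. *)
let filter_query (query : string) : string =
  parse_query query
  |> List.filter (fun (k, _) -> List.mem k allowed_params)
  |> List.map (fun (k, v) -> k ^ "=" ^ v)
  |> String.concat "&"

let () =
  (* prints "utm_source=discuss": session and page are discarded *)
  print_endline (filter_query "session=abc123&utm_source=discuss&page=2")
```

An allowlist (rather than a blocklist) means any parameter nobody thought about, such as a session token, is dropped by default.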


Have you considered running a server-side analytics service? (It’s not clear to me how Plausible works, but since their web page mentions JS size, I assume it’s a client-side thing.)

Since you already control the infrastructure, it looks like there could “just” be something like a Dream middleware that sends the relevant data to a self-hosted service (maybe Plausible itself: most analytics services have an option to send analytics in bulk instead of letting the JS widget do it).

I think this would have some huge benefits compared to a client-side approach:

  • the developer population probably has a higher-than-average usage of adblockers, so you’d get more accurate data
  • by design, this can’t be set up to track purely client-side behavior like mouse coordinates, timing data, etc.
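A minimal sketch of the middleware shape suggested above, using only the OCaml standard library. The `request`, `response`, `handler` and `middleware` types and the in-memory `page_views` table are hypothetical stand-ins: Dream middleware has the same `handler -> handler` shape, but this does not use Dream’s actual API, and a real deployment would forward the counts to the self-hosted analytics service instead of keeping them in memory.

```ocaml
(* Hypothetical, stdlib-only stand-ins for a web framework's types.
   Only the request path is modelled; nothing else about the visitor. *)
type request = { path : string }
type response = { status : int; body : string }
type handler = request -> response
type middleware = handler -> handler

(* Aggregate store: page path -> number of views. No IP address,
   user agent or cookie is recorded, only a per-path counter. *)
let page_views : (string, int) Hashtbl.t = Hashtbl.create 64

let record_page_view (path : string) : unit =
  let n = Option.value (Hashtbl.find_opt page_views path) ~default:0 in
  Hashtbl.replace page_views path (n + 1)

(* Run the inner handler first, then count the view only if the
   response succeeded, so 404s do not inflate the statistics. *)
let analytics : middleware =
 fun inner req ->
  let resp = inner req in
  if resp.status = 200 then record_page_view req.path;
  resp

(* Toy application handler used to exercise the middleware. *)
let app : handler =
 fun req ->
  if req.path = "/missing" then { status = 404; body = "not found" }
  else { status = 200; body = "page " ^ req.path }

let () =
  let handler = analytics app in
  List.iter
    (fun path -> ignore (handler { path }))
    [ "/docs"; "/docs"; "/install"; "/missing" ];
  Hashtbl.iter (fun path n -> Printf.printf "%s: %d\n" path n) page_views
```

Because the counting happens on the server, it runs regardless of adblockers, and it only ever sees what the request carries, which is the point made in the two bullets above.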

There have also been a lot of people over the years asking for usage data of opam packages. While we do not (and likely never will) gather this data from the CLI itself, the documentation that is now hosted on ocaml.org may serve as a reasonable usage proxy via the aggregate Plausible statistics.

Are you going to give public access to the Plausible statistics? Or will they be available only to the maintainers, i.e. Tarides?

If people are against, I’d appreciate them chiming in on this thread with alternative approaches to solving the problems above too.

Just because you believe there’s a problem doesn’t mean we have to believe it too…


Thanks for the feedback and the participation in the survey!

Seeing that there aren’t major concerns, we’ll be moving forward with a trial of Plausible.

As @avsm said, we plan on self-hosting it on the OCaml.org infrastructure to respect our commitment not to use any third-party service. This means that not only will we not collect any personal data, but even the aggregate data will never leave the OCaml.org infrastructure.

Roughly a third of the respondents to the survey above are against adding analytics to OCaml.org. We strongly believe that Plausible is aligned with our commitment to protect OCaml.org visitors’ privacy, but I’ll echo @avsm in saying that if people believe this is not the case, I’d love to hear their specific concerns and ideas for alternatives.

To answer some questions above:

Are you going to give public access to the Plausible statistics? Or will they be available only to the maintainers, i.e. Tarides?

The analytics dashboard will be public.

Have you considered running a server-side analytics service?

Yes, @JiaeK actually worked on a server-side analytics service as part of her Outreachy internship in 2021 and made fantastic progress. The WIP dashboard is available at https://ocaml.org/dashboard.

It currently doesn’t collect any data beyond logging unique page accesses.

We had planned on building on top of this, but as you can imagine this is a large project, and the OCaml.org team has been prioritising improvements to the site itself.

I found the following to be a good read on the pros and cons of server-side vs client-side analytics: “Client side vs server side analytics: What’s the gap in data?” (Plausible Analytics).

TL;DR: for all its benefits, server-side analytics comes with a lot of drawbacks and isn’t fundamentally more privacy-friendly than a privacy-oriented client-side analytics solution.

That being said, if someone would like to contribute to the Dream analytics dashboard to make it usable as an alternative to other analytics solutions, I’d be more than happy to move towards this! Don’t hesitate to reach out to me or other OCaml.org maintainers about that.

I’m pleased to announce that we’ve rolled out the Plausible instance on the OCaml.org infrastructure.

The public dashboard is currently accessible at Plausible · ocaml.org, and we plan to update the OCaml.org DNS to provide a plausible.ocaml.org URL.

This is already showing very interesting results (the new Getting Started documentation pages are the most visited pages on the site!), and we can’t wait to see how the improvements we’re making to OCaml.org are reflected in the usage of the site.

As a reminder, Plausible is a privacy-focused web analytics service, which we self-host on the OCaml.org infrastructure. No personal data is collected, and we remain fully compliant with GDPR, CCPA and PECR. The information you see on the public dashboard is exactly the information we have, and as you can see, it is all aggregated and never traced to individuals. Don’t hesitate to read more about what Plausible does to respect your privacy at “Plausible: Privacy focused Google Analytics alternative” (Plausible Analytics).

Thank you all!
