What's up with ocamlwiki?

MarcCoquand · December 10, 2023, 3:36pm

Hey!

I don’t know if anyone has noticed, but at least when I google errors or how-tos, I am often directed to ocamlwiki.

I suspect 99% of the articles are just generated with ChatGPT: the writing style is very similar and the links seem to be hallucinated. They were all created by “ocamlfan” around the same time, and they contain a bunch of formatting errors and inaccuracies.

One example is here: Dune Performance Optimization Techniques - Ocaml Wiki
Another one is here: XML Parsing - Ocaml Wiki

The formatting is off, and it mentions non-existent dune commands like dune build --jobs <num_jobs> (I’ve now corrected it). The way the wiki pages write their conclusions is 100% ChatGPT’s writing style.

The end result is that I just don’t trust it…

hyphenrf · December 10, 2023, 3:50pm

It also redirected me to some sketchy websites when I was curious and tried it out. I haven’t checked whether that’s still the case.
I know the creator’s intentions must be good, but this is very misguided and should be taken down or at least repurposed to be a proper community wiki.

shonfeder · December 10, 2023, 4:01pm

Definitely looks like LLM spam to me.

Navigating to Bitnami MediaWiki:About - Ocaml Wiki, I found the contact email for the site: info@theresawikiforthat.com

And googling this I was able to find a few other LLM spam “wikis” (includes sites for Julia and Elixir): "theresawikiforthat.com" - Google Search

I expect there are many more that just haven’t been indexed yet.

Unsurprisingly the ICANN registration for the domain has redacted ownership info, but there is at least the contact address via cloudflare: Email: https://domaincontact.cloudflareregistrar.com/ocamlwiki.com

I hope you are right, but I am not so optimistic

amongonz · December 10, 2023, 4:18pm

I was actually writing a post with this exact title just now, and the forum gave me a duplicate-title error! I’ll copy-paste the post I had prepared:

I won’t link it so its search rank isn’t boosted further, but a site called ocamlwiki, created this September by a content farm called “theresawikiforthat”, has been ranking very high in DuckDuckGo/Bing search results for OCaml recently (no idea about Google), yet most of its contents are, to put it politely, trash:

Made up concepts, fallacious definitions, confusing sitemap, broken formatting in every page, syntax errors in half the examples, vague explanations, outdated guides…

I share my concern here because I’ve seen a couple of people already being misdirected or confused by its contents in online discussions, and I suspect this will increasingly be the case as the next batch of students try to learn OCaml for college.

Let me paraphrase some great pearls I’ve found:

OCaml achieves dynamic typing through the use of Hindley-Milner type inference, but it obviously introduces the risk of type errors at runtime.
Dereferencing a null pointer in OCaml raises the Null_pointer_dereference exception. If using unsafe code to bypass type safety, code may also raise Type_mismatch.
Memory in OCaml is dynamically allocated with let. The statement let x = 5 allocates memory for x.
Dynamic arrays are created with Array.create and resized with Array.append.
OCaml supports list comprehensions, in the form of List.map.
Lists are preferable over arrays specifically for long collections.
OCaml supports in-place reversal of lists as a memory management optimization.
Types of the form type enum_name = Enum1 | .. | EnumN are known as enums; to emulate the associated integer constants that enums have in other languages, use Enum1 of int | .. | EnumN of int.
Marshal offers a reliable and efficient way to exchange data with type information between programs or persist it to storage.
Dynlink provides support for JIT compilation, which can greatly enhance the performance of OCaml programs. OCaml uses Metacircular JIT Compilation.
Lwt is a metaprogramming library.
Libevent is widely used in the field of OCaml programming (used by 1 package).
The Core library leverages Lwt.
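For anyone who landed here after reading those pages, here is a minimal OCaml sketch of what the language and standard library actually do for a few of the claims above. It only uses stdlib functions (Array.make, Array.append, List.rev, List.map, List.find_opt); the names color, color_to_int and first_even are hypothetical, chosen just for illustration.

```ocaml
(* Typing is static: the compiler infers types and rejects mismatches at
   compile time. Uncommenting the next line makes the file fail to compile:
   let bad = 1 + "one" *)

(* Arrays have a fixed length. Array.make builds one; Array.append does not
   resize in place, it allocates a fresh array holding both inputs. *)
let a = Array.make 3 0
let b = Array.append a [| 4 |]
let () = assert (Array.length a = 3 && Array.length b = 4)

(* Lists are immutable: List.rev returns a new list and leaves the input alone. *)
let l = [ 1; 2; 3 ]
let r = List.rev l
let () = assert (l = [ 1; 2; 3 ] && r = [ 3; 2; 1 ])

(* List.map is an ordinary higher-order function, not comprehension syntax. *)
let squares = List.map (fun x -> x * x) [ 1; 2; 3 ]
let () = assert (squares = [ 1; 4; 9 ])

(* Constant constructors carry no integer; if you need one, write the mapping
   yourself instead of adding "of int" to every constructor. *)
type color = Red | Green | Blue
let color_to_int = function Red -> 0 | Green -> 1 | Blue -> 2
let () = assert (color_to_int Blue = 2)

(* There is no null pointer to dereference: absence is an option value. *)
let first_even xs = List.find_opt (fun x -> x mod 2 = 0) xs
let () = assert (first_even [ 1; 3; 5 ] = None)
let () = assert (first_even [ 1; 4; 6 ] = Some 4)

let () = print_endline "all checks passed"
```

Marshal, for its part, is explicitly not type-safe: its documentation warns that reading back a value at the wrong type can make anything happen at run time, which is the opposite of "exchanging data with type information".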

And I’m only stopping because I’m feeling my brain rotting right now. Can we do anything to sink this site’s SEO? Is there maybe a lack of more “relevant” search results for some queries that could be solved with actual learning materials?

shonfeder · December 10, 2023, 5:07pm

This only helps with the quality of individual search results (and maybe systemically, in that it moves traffic off Google), but I’ve been a happy user of Kagi for several months. AFAIR, this site has never come up in a search result for me, and now it never will.

I would love to know what more we can do for the google problem. I’ve given “feedback” to google about the result via the ... context menu on their result, but I am not super optimistic here.

As we enter the age of widespread LLM spam, we will have to find effective tools for cultivating high-quality, open, but curated and protected, information ecosystems.

Two vague ideas in this direction for ocaml:

Develop or configure high-quality, domain-specific search for the ecosystem, restricted to vetted domains (could we augment it with semantic search!?).
Setting our house in order, by cleaning up and deprecating outdated material (e.g., A lot of pages redirect to v2.ocaml.org · Issue #465 · ocaml/ocaml.org · GitHub)

I look forward to reading other thoughts on what more we can do.

hyphenrf · December 10, 2023, 10:52pm

Haha, it looks like I was the one in the wrong, since it appears to be a content farm.

My rationale was: OCaml devs fill a rather obscure niche in the grand scheme of things. As far as spammers are concerned, we are an infinitesimal fraction of the web’s demographics and more tech-literate than average, so it doesn’t make sense from an effort/profit point of view to target us with fishy stuff… therefore it must be the individual work of some overly enthusiastic OCaml fan :P

lambda_foo · December 10, 2023, 11:11pm

I was also seeing this LLM-generated spam from ocamlwiki last week, consistently filling the first page of results. But now it seems to have been purged from DuckDuckGo and Google.

yawaramin · December 11, 2023, 2:25am

LLM-generated content is an existential threat to Google. Let’s hope they come to their senses and act fast to stamp it out.

emillon · December 11, 2023, 9:22am

To restate what others have been saying, I would recommend just ignoring this website. Not only is most of it inaccurate, but some of it is also unnecessary or plain inefficient. Don’t contribute to it.

Mhouh · December 18, 2023, 11:50am

This really starts to annoy me.
How can we shut down this site? There must be a way…
AI (and people that use it) shouldn’t get away with spreading misinformation to others.

MarcCoquand · December 18, 2023, 4:10pm

I reported it to Bing, which I think DuckDuckGo uses: https://www.microsoft.com/en-us/concern/bing. Maybe it helps if everyone reports it?

amaro · December 18, 2023, 4:41pm

How did you report it? Like most websites, I don’t see any “this is garbage/spam” category. From the available categories, I’m not sure which one fits it… “Unlawful content”? “Malicious websites”? “Unexpected offensive or harmful material”?

yawaramin · December 18, 2023, 5:53pm

Is it still showing up in Google results? I can’t find any. Seems like Google removed it?

amongonz · December 18, 2023, 6:21pm

I don’t see it on Google, but some of us probably default to using DuckDuckGo, and that pulls most of its results from Bing, which must be showing it too.

As for solutions, in the short term either the affected search engines downrank the website or some actual OCaml docs compete in SEO for the same top spots.

In the long term, if similar behaviour continued, the OCaml Foundation could register a trademark for the OCaml name and logo (I don’t see any) and enforce a trademark policy, potentially disputing the ocamlwiki domain name. These policies are sometimes controversial in open source, but a simple “you can’t appropriate the OCaml name or logo for a content farm/misinformation” would be enough, with no need to exclude good-faith usage like, say, ocamlverse.

amongonz · December 18, 2023, 6:39pm

The “spam” category shows after selecting “malicious websites”.

Mhouh · December 19, 2023, 4:09pm

Did the same. Thanks for the suggestion.

amaro · December 20, 2023, 6:45am

It seems Bing won’t do anything, here’s their reply:

The information in our index is discovered through an organic crawling process. A complex software algorithm determines how to present the results. These processes work best when we avoid manual manipulation of the data and instead let our software do the work. While this may occasionally produce non-relevant results, our system will adjust itself over time to remove irrelevant search term associations.

So spam will “magically” get sorted out by Bing over time, after it has already done enough damage.

Honestly, the “spam” category just seems to serve the purpose of telling them which automated message should be sent to the complainer.
