I don’t know if anyone has noticed, but at least when I google errors or how-tos, I am often directed to ocamlwiki.
I suspect 99% of the articles are just generated with ChatGPT: the style of writing is very similar and the links seem to be hallucinated. They were all created by “ocamlfan” around the same time, and they contain a bunch of formatting errors and inaccuracies.
The formatting is off, and it mentions non-existent dune commands like dune build --jobs <num_jobs> (I’ve now corrected it). The way the wiki pages write their conclusions is 100% the writing style of ChatGPT.
It also redirected to some sketchy websites when I was curious and tried it out. I haven’t checked whether that’s still the case.
I know the creator’s intentions must be good, but this is very misguided and should be taken down or at least repurposed to be a proper community wiki.
I was actually creating a post with this exact same title right now, and the forum gave me a duplicate-title error! I’ll copy-paste the post I had prepared:
I won’t link it so its search rank isn’t boosted further, but a site called ocamlwiki, created this September by a content farm called “theresawikiforthat”, has been ranking very high in DuckDuckGo/Bing search results for OCaml recently (no idea about Google), yet most of its content is, to put it politely, trash:
Made up concepts, fallacious definitions, confusing sitemap, broken formatting in every page, syntax errors in half the examples, vague explanations, outdated guides…
I share my concern here because I’ve seen a couple of people already being misdirected or confused by its contents in online discussions, and I suspect this will increasingly be the case as the next batch of students try to learn OCaml for college.
Let me paraphrase some great pearls I’ve found:
OCaml achieves dynamic typing through the use of Hindley-Milner type inference, but it obviously introduces the risk of type errors at runtime.
Dereferencing a null pointer in OCaml raises the Null_pointer_dereference exception. If using unsafe code to bypass type safety, code may also raise Type_mismatch.
Memory in OCaml is dynamically allocated with let. The statement let x = 5 allocates memory for x.
Dynamic arrays are created with Array.create and resized with Array.append.
OCaml supports list comprehensions, in the form of List.map.
Lists are preferable over arrays specifically for long collections.
OCaml supports in-place reversal of lists as a memory management optimization.
Types of the form type enum_name = Enum1 | .. | EnumN are known as enums; to emulate the associated integer constants that enums have in other languages, use Enum1 of int | .. | EnumN of int.
Marshal offers a reliable and efficient way to exchange data with type information between programs or persist it to storage.
Dynlink provides support for JIT compilation, which can greatly enhance the performance of OCaml programs. OCaml uses Metacircular JIT Compilation.
Lwt is a metaprogramming library.
Libevent is widely used in the field of OCaml programming (used by 1 package).
The Core library leverages Lwt.
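For contrast, here is a minimal sketch of what the standard library actually does for a few of the claims above. The names xs, ys, a, b, color and to_int are mine, chosen purely for illustration, and this only covers the list, array and "enum" items:

```ocaml
(* Lists are immutable: List.rev builds a new list and leaves the
   original untouched, so there is no in-place reversal. *)
let xs = [1; 2; 3]
let ys = List.rev xs
let () = assert (xs = [1; 2; 3] && ys = [3; 2; 1])

(* Arrays have a fixed length: Array.append returns a fresh array
   rather than resizing either of its arguments. *)
let a = Array.make 2 0
let b = Array.append a [| 7 |]
let () = assert (Array.length a = 2 && Array.length b = 3)

(* List.map is an ordinary higher-order function, not comprehension syntax. *)
let doubled = List.map (fun x -> 2 * x) xs
let () = assert (doubled = [2; 4; 6])

(* Constant constructors carry no payload; if an integer per case is
   wanted, an explicit function does the job. [color] and [to_int]
   are hypothetical names. *)
type color = Red | Green | Blue
let to_int = function Red -> 0 | Green -> 1 | Blue -> 2
let () = assert (to_int Blue = 2)

(* Type errors are rejected at compile time. Uncommenting the next
   line makes the whole file fail to compile:
   let _ = 1 + "two" *)
```
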
And I’m only stopping because I’m feeling my brain rotting right now. Can we do anything to sink this site’s SEO? Is there maybe a lack of more “relevant” search results for some queries that could be solved with actual learning materials?
This only helps with individual quality of search results (and maybe systemically, in that it moves traffic off Google), but I’ve been a happy user of Kagi for several months. AFAIR, this site has never come up in a search result for me, and now it never will:
I would love to know what more we can do about the Google problem. I’ve given “feedback” to Google about the result via the ... context menu on their result, but I am not super optimistic here.
As we enter the age of widespread LLM spam, we will have to find effective tools for cultivating high-quality, open, but curated and protected, information ecosystems.
Two vague ideas in this direction for ocaml:
Develop or configure high-quality, domain-specific search for the ecosystem, with vetted domains (could we augment it with semantic search!?).
haah it looks like I was the one in the wrong since it appears to be a content farm.
my rationale was: ocaml devs rather fill an obscure niche in the grand scheme of things, as far as spammers are concerned anyway, we are an infinitesimal fraction of the web demographics and we’re more tech-literate than the average, so it doesn’t make sense for us to be a target of fishy stuff from an effort/profit point of view… therefore it must be the individual work of some overly enthusiastic ocaml fan :P
I was also seeing this LLM-generated spam from ocamlwiki last week, consistently filling the first page of results. But now it seems to be purged from DuckDuckGo and Google.
To restate what others have been saying, I would recommend just ignoring this website. Not only is most of it inaccurate, but there are also things that are unnecessary or plain inefficient. Don’t contribute to it.
This really starts to annoy me.
How can we shut down this site? There must be a way…
AI (and people that use it) shouldn’t get away with spreading misinformation to others.
How did you report it? Like most websites, I don’t see any “this is garbage/spam” category. From the available categories, I’m not sure which one fits it… “Unlawful content”? “Malicious websites”? “Unexpected offensive or harmful material”?
I don’t see it on Google, but some of us probably default to using DuckDuckGo, and that pulls most of its results from Bing, which must be showing it too.
As for solutions, in the short term either the affected search engines downrank the website or some actual OCaml docs compete in SEO for the same top spots.
In the long term, if similar behaviour continues, the OCaml Foundation could file for a trademark on the OCaml name and logo (I don’t see any), and enforce a trademark policy, potentially disputing the ocamlwiki domain name. These policies are sometimes controversial in open source, but a simple “can’t appropriate the OCaml name or logo for a content farm/misinformation” would be enough, no need to exclude good faith usage like, say, ocamlverse.
It seems Bing won’t do anything, here’s their reply:
The information in our index is discovered through an organic crawling process. A complex software algorithm determines how to present the results. These processes work best when we avoid manual manipulation of the data and instead let our software do the work. While this may occasionally produce non-relevant results, our system will adjust itself over time to remove irrelevant search term associations.
So spam will “magically” get sorted out by Bing over time, after it has already done enough damage.
Honestly, the “spam” category just seems to serve the purpose of telling them which automated message should be sent to the complainer.