There are also quite a few publications on recursive modules, though not all that is written there got into the language, so I’m not sure whether it is relevant.
Hmm, we should really create a repository for module bibliography like the effect and meta-programming one that Jeremy Yallop did.
There are really two module papers by Leroy, POPL 1994, which show you can make it compatible with separate compilation, and POPL 1995, which shows why it’s really usable for abstraction. There is also a JFP 1996 paper presenting the type theory behind it. This one is less useful from a practical point of view.
For first class modules packages types, The right reference is “First-Class Structures for Standard ML” by Claudio V. Russo.
There is a really nice comparison of module systems in the chapter 1 and 2 of Derek Dreyer’s thesis, “Understanding and Evolving the ML Module System”. The rest of the thesis is dedicated to pushing recursive modules to the limit, and is a bit harder to get into.
Now, for some sci-fi (aka, features that are not in OCaml):
Rossberg’s papers (MixML, F-Ing modules) are very nice.
“An Expressive Language of Signatures” (Ramsey et al) presents some extensions of the signature language that are also useful.
Backpack (“Backpack: retrofitting haskell with interfaces”, Kilpatrick et al) is interesting, although a bit alien for us ML programmers.
The work on the DOT language (a core language for Scala) is also relevant. See Tiark Rompf’s papers the last two years
Finally, for some proofs, a modern take on soudness for modules in the context of a very rich extension of SML was done by Karl Crary in “Modules, abstraction, and parametric polymorphism”, which is fairly nice and readable.
Tofte’s thesis, “Operational Semantics and Polymorphic Type Inference” is older and much harder to get into, but cover a simpler system.
Finally, and in order to plug in my own stuff, my thesis has lot’s of modules!
Chapter 3 is dedicated to summarizing the basic definition of the language found in XL’s papers in a way that is a bit more accessible (and with lot’s of examples). There are also some pointers to other related works.
My personal preference is for Mark Lillibridge’s thesis: “Translucent Sums:A Foundation for Higher-Order Module Systems”. In addition to the papers listed by Drup, Xavier’s “A modular module system” is quite readable.