Code review: Org mode tangler

Hi folks, I’m taking my first steps in OCaml. As an exercise I implemented a restricted Org mode tangler, because Org’s own is painfully slow. Is anybody keen to review my code? Any feedback, including but not limited to more idiomatic expressions, code style and performance optimizations, is welcome! I also included comments in the places that puzzled me or that I’m not satisfied with. The code is in this gist: https://gist.github.com/ul/c9d9a0f9996a9a2339d20f7a04ab6e8f

Any pointers on what an Org mode tangler does, for people not familiar with Org mode?

Sorry, I should have explained that from the beginning! Emacs’ Org mode supports a literate programming style: you can embed blocks like this into a document:

#+NAME: signal.ss
#+BEGIN_SRC scheme :tangle ad-libitum/signal.ss :noweb yes :mkdirp yes :padline no
  #!chezscheme
  (library (ad-libitum signal (1))
    (export signal ~< <~ define-signal define~
            make-channel-vector channel-ref channel-set!
            constant silence ∅ unit
            mono
            live-signal live-value
            signal-sum signal-prod signal-diff signal-div
            +~ *~ -~ /~ ∑ ∏ mix pan phase->interval amplitude->phase)
    (import (chezscheme)
            (srfi s26 cut)
            (ad-libitum common))
    <<signal>>
    <<channel>>
    <<constant>>
    <<silence>>
    <<unit>>
    <<deref-signal>>
    <<mono>>
    <<live-signal>>
    <<live-value>>
    <<signal-operators>>
    )
#+END_SRC

When you run the tangler, it looks for all source blocks that have a :tangle argument in their header and writes their contents to the file at the path given in that argument. When the tangler encounters a <<macro>> pattern, it looks for a source block named macro and recursively expands it in place of the <<macro>>. E.g. I have a block named silence

#+NAME: silence
#+BEGIN_SRC scheme
  (define~ silence 0.0)
  (alias ∅ silence)
#+END_SRC

and its 2 lines between #+BEGIN_SRC and #+END_SRC will be substituted for <<silence>> in the contents of the ad-libitum/signal.ss file. An example of a document full of source blocks can be found here https://raw.githubusercontent.com/ul/ad-libitum/master/README.org and the tangled content is checked into the repo https://github.com/ul/ad-libitum
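A minimal sketch of that recursive expansion in OCaml might look like the following. This is not the gist’s parser-combinator implementation; it uses the Str library, and the names `expand`, `blocks` and the macro-name regexp are illustrative assumptions.

```ocaml
(* Hypothetical sketch of noweb expansion: [blocks] is an association
   list from block name to block body. *)
let macro_re = Str.regexp "<<\\([A-Za-z0-9_-]+\\)>>"

(* Replace every <<name>> with the named block, expanded recursively;
   unknown macros are left in place. *)
let rec expand blocks text =
  Str.global_substitute macro_re
    (fun whole ->
       let name = Str.matched_group 1 whole in
       match List.assoc_opt name blocks with
       | Some body -> expand blocks body
       | None -> Str.matched_string whole)
    text

let () =
  let blocks = [ "silence", "(define~ silence 0.0)\n(alias \xE2\x88\x85 silence)" ] in
  print_string (expand blocks "<<silence>>\n")
```

Compile with the `str` library linked in, e.g. `ocamlfind ocamlopt -package str -linkpkg`.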

As a side note, I have an implementation in Nim which uses regular expressions and a simple state machine instead of a proper parser: https://gist.github.com/ul/484666a57bcd798da47811fb0bba7c88

I know that comparing it to my OCaml version is comparing apples to oranges, but I want to know which is tastier =) The Nim version definitely wins on binary size: it’s only 106K vs 7.4M for OCaml (thanks, Core). It is also a little bit shorter (69 LOC vs 86 LOC), and faster (MacBook Pro (15-inch, 2017), macOS 10.13.1, 2.8 GHz Intel Core i7):

> bench './tangle-ocaml README.org' './tangle-nim README.org'

benchmarking bench/./tangle-ocaml README.org
time                 26.68 ms   (25.94 ms .. 27.27 ms)
                     0.994 R²   (0.983 R² .. 0.999 R²)
mean                 27.78 ms   (27.16 ms .. 28.92 ms)
std dev              1.825 ms   (1.137 ms .. 2.600 ms)
variance introduced by outliers: 26% (moderately inflated)

benchmarking bench/./tangle-nim README.org
time                 10.26 ms   (9.521 ms .. 11.09 ms)
                     0.965 R²   (0.929 R² .. 0.993 R²)
mean                 10.22 ms   (9.813 ms .. 10.96 ms)
std dev              1.439 ms   (693.1 μs .. 2.577 ms)
variance introduced by outliers: 72% (severely inflated)

But using parser combinators and having all the functional benefits of OCaml feels so much better! It’s definitely worth the trade-off, especially taking into account that the overhead doesn’t multiply as the application grows, and the OCaml version looks much more ready for evolution.

You should be able to reduce the executable size some by switching to Base. The only real change is that you’ll need to adjust your use of Map idioms a little, e.g., from

String.Map.empty
String.t String.Map.t

to

Map.empty (module String)
Map.M(String).t

Another small observation: you use operations like String.Map.add in various places. Even in Core, you only really need the String.Map module when creating new maps. For other operations, you can use the polymorphic functions in Map itself.
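To illustrate the point, here is a small sketch with Base (the key and value strings are invented for the example):

```ocaml
open Base

(* Create the map once with a first-class comparator module... *)
let m = Map.empty (module String)

(* ...then stick to the polymorphic Map functions everywhere else. *)
let m = Map.set m ~key:"silence" ~data:"(define~ silence 0.0)"

let () =
  match Map.find m "silence" with
  | Some body -> Stdlib.print_endline body
  | None -> Stdlib.print_endline "not found"
```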

Also, you locally open the String.Map module over a large match statement. My view is that the extra concision from using an open in this case isn’t worth the loss in explicitness and readability: I would just prefix the operations in question with Map.

y

Thanks, Yaron!

You should be able to reduce executable size some by switching to Base.

I think I depend on Core for the super-handy In/Out_channel and mkdir_p as well, so switching to Base would mean re-implementing those on my side. I can totally afford those 7 megabytes; I was just surprised by the overhead. Perhaps one day we will have whole-program dead code elimination in OCaml (like proposed here) to get lean executables when they are needed.
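For reference, a simplified mkdir_p is only a few lines on top of the Unix library. This is a sketch, not the code from the gist, and it deliberately ignores permissions beyond a fixed mode:

```ocaml
(* Hypothetical minimal mkdir_p: create parents first, and ignore the
   case where the directory already exists (e.g. created concurrently). *)
let rec mkdir_p dir =
  if not (Sys.file_exists dir) then begin
    mkdir_p (Filename.dirname dir);
    try Unix.mkdir dir 0o755
    with Unix.Unix_error (Unix.EEXIST, _, _) -> ()
  end
```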

For other operations, you can use the polymorphic functions in Map itself.

Yes, I completely overlooked that, though I guess that’s where OCaml’s abstraction power lies!

Also, you locally open the String.Map module over a large match statement. My view is that the extra concision from using an open in this case isn’t worth the loss in explicitness and readability: I would just prefix the operations in question with Map.

Sure, especially since it’s now just Map, not String.Map as I used before.

The In/Out_channels are available from Stdlib, for what it’s worth. I don’t think mkdir_p is in there, though.

Just for the record: switching to base + stdio and using my own simplified implementation of mkdir_p reduces the binary size to 2.7M and improves performance a little:

> bench './tangle-ocaml README.org' './tangle-nim README.org'

benchmarking bench/./tangle-ocaml README.org
time                 21.76 ms   (20.98 ms .. 22.95 ms)
                     0.979 R²   (0.936 R² .. 0.998 R²)
mean                 21.94 ms   (21.42 ms .. 23.31 ms)
std dev              1.679 ms   (818.6 μs .. 3.015 ms)
variance introduced by outliers: 33% (moderately inflated)

benchmarking bench/./tangle-nim README.org
time                 10.01 ms   (9.216 ms .. 10.73 ms)
                     0.975 R²   (0.964 R² .. 0.989 R²)
mean                 10.54 ms   (10.00 ms .. 11.51 ms)
std dev              1.857 ms   (1.161 ms .. 2.579 ms)
variance introduced by outliers: 79% (severely inflated)

Actually, running the code under Instruments’ Time Profiler shows (if I read it properly) that the “business logic” is executed by both binaries in about 1 ms, which is super exciting, because the OCaml version of the parser is much less hacky and corresponds much more closely to the structure of the parsed document. The difference in total execution time comes from the different timing of the _dyld_start descendants:

OCaml:

Weight	Self Weight		Symbol Name
4.00 ms  100.0%	0 s	 	tangle-ocaml (20258)
4.00 ms  100.0%	0 s	 	 <Unnamed Thread> 0x16a65b6
3.00 ms   75.0%	0 s	 	  _dyld_start
3.00 ms   75.0%	0 s	 	   dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*)
3.00 ms   75.0%	0 s	 	    dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*)
2.00 ms   50.0%	0 s	 	     dyld::link(ImageLoader*, bool, bool, ImageLoader::RPathChain const&, unsigned int)
1.00 ms   25.0%	0 s	 	      ImageLoader::link(ImageLoader::LinkContext const&, bool, bool, bool, ImageLoader::RPathChain const&, char const*)
1.00 ms   25.0%	0 s	 	       ImageLoader::recursiveRebase(ImageLoader::LinkContext const&)
1.00 ms   25.0%	1.00 ms	 	        ImageLoaderMachOCompressed::rebase(ImageLoader::LinkContext const&, unsigned long)
1.00 ms   25.0%	0 s	 	      <Unknown Address>
1.00 ms   25.0%	1.00 ms	 	       ImageLoader::recursiveLoadLibraries(ImageLoader::LinkContext const&, bool, ImageLoader::RPathChain const&, char const*)
1.00 ms   25.0%	0 s	 	     dyld::initializeMainExecutable()
1.00 ms   25.0%	0 s	 	      ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&)
1.00 ms   25.0%	0 s	 	       ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   25.0%	0 s	 	        ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   25.0%	0 s	 	         ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   25.0%	0 s	 	          ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&)
1.00 ms   25.0%	0 s	 	           ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&)
1.00 ms   25.0%	0 s	 	            libSystem_initializer
1.00 ms   25.0%	0 s	 	             libdispatch_init
1.00 ms   25.0%	0 s	 	              _os_object_init
1.00 ms   25.0%	0 s	 	               _objc_init
1.00 ms   25.0%	0 s	 	                _dyld_objc_notify_register
1.00 ms   25.0%	0 s	 	                 dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*))
1.00 ms   25.0%	0 s	 	                  dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool)
1.00 ms   25.0%	0 s	 	                   map_images
1.00 ms   25.0%	0 s	 	                    map_images_nolock
1.00 ms   25.0%	1.00 ms	 	                     _read_images
1.00 ms   25.0%	0 s	 	  0x5
1.00 ms   25.0%	1.00 ms	 	   camlMap__Make_1243

Nim:

Weight	Self Weight		Symbol Name
2.00 ms  100.0%	0 s	 	tangle-nim (20251)
2.00 ms  100.0%	0 s	 	 Main Thread  0x16a63ef
1.00 ms   50.0%	0 s	 	  start
1.00 ms   50.0%	0 s	 	   main
1.00 ms   50.0%	0 s	 	    NimMainModule
1.00 ms   50.0%	0 s	 	     re_5YObgtpedTBKZLpqEDsMpg
1.00 ms   50.0%	0 s	 	      extractOptions_9aLyPIoKa7QTA03XJxQTIeA
1.00 ms   50.0%	0 s	 	       copyStrLast
1.00 ms   50.0%	0 s	 	        rawNewObj_QBPo5VW2O56Eeh8hPbQ7Pg
1.00 ms   50.0%	0 s	 	         rawAlloc_yn9c8RLaS8vgVBeMBfmkdUg
1.00 ms   50.0%	0 s	 	          getBigChunk_z9bCNjXTYllZ3pI24nEsw2g
1.00 ms   50.0%	1.00 ms	 	           splitChunk_8QXhiy717OAl8WNA2X27EA
1.00 ms   50.0%	0 s	 	  _dyld_start
1.00 ms   50.0%	0 s	 	   dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*)
1.00 ms   50.0%	0 s	 	    dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*)
1.00 ms   50.0%	0 s	 	     dyld::initializeMainExecutable()
1.00 ms   50.0%	0 s	 	      ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&)
1.00 ms   50.0%	0 s	 	       ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   50.0%	0 s	 	        ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   50.0%	0 s	 	         ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   50.0%	0 s	 	          ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
1.00 ms   50.0%	0 s	 	           ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&)
1.00 ms   50.0%	0 s	 	            ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&)
1.00 ms   50.0%	0 s	 	             _GLOBAL__sub_I_iostream.cpp
1.00 ms   50.0%	0 s	 	              std::__1::ios_base::Init::Init()
1.00 ms   50.0%	0 s	 	               std::__1::__stdinbuf<char>::__stdinbuf(__sFILE*, __mbstate_t*)
1.00 ms   50.0%	0 s	 	                std::__1::basic_streambuf<char, std::__1::char_traits<char> >::basic_streambuf()
1.00 ms   50.0%	0 s	 	                 std::__1::locale::locale()
1.00 ms   50.0%	0 s	 	                  std::__1::locale::__global()
1.00 ms   50.0%	0 s	 	                   std::__1::locale::classic()
1.00 ms   50.0%	1.00 ms	 	                    std::__1::locale::__imp::__imp(unsigned long)

But I might be completely wrong in my interpretation, because I’m a novice at native profiling as well. Could anybody explain that difference? Both executables depend on the same single dylib:

> otool -L tangle-ocaml
tangle-ocaml:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)

> otool -L tangle-nim
tangle-nim:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)

Channels themselves are available in OCaml’s own standard library, but using Stdio is much more convenient, so I was talking about the Stdio.In_channel and Stdio.Out_channel modules.
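For anyone following along, the whole-file helpers in question look roughly like this (a self-contained sketch; the temp file and its contents are made up for the example):

```ocaml
(* Round-trip a file with Stdio's whole-file helpers. *)
let () =
  let path = Filename.temp_file "tangle" ".org" in
  Stdio.Out_channel.write_all path ~data:"#+NAME: silence\n";
  let contents = Stdio.In_channel.read_all path in
  Stdio.printf "read %d bytes\n" (String.length contents)
```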

I’m surprised at the size. Is the reason that the linker can’t get rid of unused code effectively?