[ANN] soupault: a static website generator based on HTML rewriting

https://baturin.org/projects/soupault/

Soupault is the first (to my knowledge) website generator that exploits the fact that well-formed HTML is machine readable and transformable (and thanks to @aantron’s lambdasoup it’s quite easy to do).

It can do things like “use the first <h1> for the page title” or “insert the output of date -R into the <time> element, no matter where it is in the page”.
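
To make the “first <h1> becomes the page title” idea concrete, here is a rough Python sketch of the extraction step. It is only an illustration: soupault itself is written in OCaml on top of lambdasoup, and the `FirstH1` class and `first_h1_text` function are hypothetical names, not part of soupault.

```python
from html.parser import HTMLParser


class FirstH1(HTMLParser):
    """Collects the text inside the first <h1> element of a page."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while we are inside the first <h1>
        self.done = False    # True once the first <h1> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        # Text of nested tags such as <em> is collected too; the tags are dropped.
        if self.inside:
            self.parts.append(data)


def first_h1_text(html, default=None):
    """Return the text of the first <h1>, or `default` if there is none."""
    parser = FirstH1()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())  # normalize whitespace
    return text or default
```

The point is that the page stays plain HTML: the title is derived from markup that is already there, rather than from front matter.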

Features:

  • No templates, no themes, no front matter. You tell it where to insert stuff or what to extract using CSS selectors.
  • Built-in ToC, footnotes, and breadcrumbs.
  • Directories are site sections and can be nested.
  • Extracted metadata can be exported to JSON and fed to external scripts for creating section indices or custom taxonomies.
  • Configurable preprocessors for pages in formats other than HTML.

Soupault can be a drop-in automation tool for existing websites: the directory structure is fully configurable, clean URLs are optional, and it can preserve paths down to file extensions.

With soupault, it’s possible to take advantage of the full HTML markup and even make every page on your website look different rather than built from the same template, and still have an automated workflow.

This is awesome. I have been looking for an SSG that can serve the unique needs of artists’ and designers’ portfolio websites, and this finally sounds like the one. For this purpose I think a macOS binary would be pretty critical, since my clients use Apple hardware almost exclusively.

I’ve been experimenting with using Travis CI for OS X builds. Here’s an artifact: https://baturin.org/tmp/soupault-1.0.1-osx.zip

I can confirm that it’s a Mach-O executable, but that’s about it: I can’t actually test it since I don’t have a working Apple machine. If you’ve got time, please test it.

Works here (macOS 10.14.6).

> ./soupault --version
[WARNING] Configuration file soupault.conf not found, using default settings
soupault 1.0.2
Copyright 2019 Daniil Baturin, licensed under MIT
Visit https://baturin.org/projects/soupault for documentation

Cool, then it passes the smoke test at least. Thanks for testing.

There’s nothing but basic POSIX stuff, so there’s little potential for incompatibilities… or so I hope.

Nice! I wrote a navigation injector (in Ruby) based on the machine-readability of HTML a few years ago.

Writing the final markup has quite some appeal.

@hanjiexi I’ve made a 1.1 release with some updates and “official” macOS binaries. https://github.com/dmbaturin/soupault/releases/tag/1.1

@mro That’s pretty much my motivation. If those things require an HTML parser anyway, why keep a Markdown-centric workflow?

I’ve made a 1.2 release, now with Lua plugin support thanks to Lua-ML: https://baturin.org/projects/soupault/#plugins

1.3 release with some improvements.

  • Invalid config options cause warnings now. There are also “did you mean” suggestions for mistyped options, thanks to @c-cube’s spelll library.
  • Footnotes now keep their original IDs for handy hotlinking, and you can add a suffix or prefix to footnote IDs to give them a separate “namespace”.
  • Some minor bugfixes.
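
The “did you mean” behaviour for mistyped options can be sketched roughly as follows. This is not how soupault does it: soupault uses the spelll library, while this sketch swaps in Python's standard `difflib` similarity matching. The `KNOWN_OPTIONS` list and `check_option` function are hypothetical and exist only for illustration.

```python
import difflib

# Hypothetical set of valid option names for one widget (illustration only).
KNOWN_OPTIONS = ["widget", "selector", "command", "action"]


def check_option(name, known=KNOWN_OPTIONS):
    """Return a warning string for an unknown option, or None if it is valid."""
    if name in known:
        return None
    # Pick the single closest known name, if any is similar enough.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    if matches:
        return f'unknown option "{name}", did you mean "{matches[0]}"?'
    return f'unknown option "{name}"'
```

For example, `check_option("selectr")` suggests `selector`, while a name that resembles nothing in the list produces a warning without a suggestion.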

Made a 1.7.0 release.

The first improvement is that you can now pipe the content of any element through any external program with the preprocess_element widget (PR by Martin Karlsson).
For example, you can insert inline SVG versions of all Graphviz graphs from <pre class="language-graphviz"> elements and also highlight the Dot source itself with highlight (or any other tool of your choice):

[widgets.graphviz-svg]
  widget = 'preprocess_element'
  selector = 'pre.language-graphviz'
  command = 'dot -Tsvg'
  action = 'insert_after'

[widgets.highlight]
  after = "graphviz-svg"
  widget = "preprocess_element"
  selector = '*[class^="language-"]'
  command = 'highlight -O html -f --syntax=$(echo $ATTR_CLASS | sed -e "s/language-//")'
  action = "replace_content" # default

[Image: graphviz_sample]

Two other improvements are multiple index “views” and a default value option for custom index fields, for example:

[index.custom_fields]
  category = { selector = "span#category", default = "Misc" }

soupault 1.8.0 is released along with Lua-ML 0.9.1.
Lua-ML now raises Failure when Lua code execution fails. There’s much room for improvement in that area; for now I’ve just done something that is better than displaying errors on stderr while otherwise letting syntax and runtime errors pass silently.
If you have any ideas about how perfect interpreter error reporting should work, please share!

As for improvements in soupault itself, there are now:

  • A way for plugins to specify their minimum supported soupault version, like Plugin.require_version("1.8.0").
  • A TARGET_DIR environment variable and a target_dir Lua global that contain the directory where the rendered page will be written, to make it easier for plugins and scripts to place processed assets together with pages.
  • “Build profiles”: if you add profile = "production" or similar to a widget config, that widget will be ignored unless you run soupault --profile production.
  • A bunch of new utility functions for plugins.
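
The build profile rule from the list above amounts to a simple filter over the widget table. The Python sketch below models it under the assumption that widgets without a profile option always run. The `active_widgets` function and the widget names `page-title` and `minify` are hypothetical, and this is not soupault's actual code.

```python
def active_widgets(widgets, profile=None):
    """Keep widgets with no profile option, plus those whose profile matches.

    `widgets` maps widget names to their option tables, and `profile` is the
    value that would be passed on the command line with --profile, if any.
    """
    return {
        name: options
        for name, options in widgets.items()
        if "profile" not in options or options["profile"] == profile
    }


# Hypothetical widget table, shaped like a parsed [widgets] config section.
widgets = {
    "page-title": {"widget": "title"},
    "minify": {"widget": "preprocess_element", "profile": "production"},
}
```

With this table, `active_widgets(widgets)` keeps only `page-title`, while `active_widgets(widgets, "production")` keeps both widgets.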

1.9.0 release is now available.

  • An --index-only option that makes soupault dump the site metadata to JSON and stop there.
  • Metadata extraction and index generation can now be limited to specific pages, sections, or path regexes, just like widgets.
  • The preprocess_element widget now supports a list of selectors, e.g. selector = ["code", "pre code"].
  • Plugin API now has functions for running external programs, and some more element tree access functions.
  • CSS selector parse errors are now handled gracefully (lambdasoup PR#31).
  • The title widget now correctly removes HTML tags from the supposed title string and doesn’t add extra whitespace (fixes by Thomas Letan).

1.10.0 release is available.

Bug fixes:

  • Files without extensions are handled correctly.

New features:

  • Plugin discovery: if you save a plugin to plugins/my-plugin.lua, it’s automatically loaded as a widget named my-plugin. The list of plugin directories is configurable.
  • New plugin API functions: HTML.get_tag_name, HTML.select_any_of, HTML.select_all_of.
  • The HTML module is now “monadic”: giving a nil to a function that expects an element gives you a nil back, rather than causing a runtime error.
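
The “monadic” nil handling can be illustrated with a small Python analogy, where `None` plays the role of Lua's nil. Everything here is a hypothetical stand-in: elements are modelled as plain dicts, and the `nil_safe` decorator, this `get_tag_name`, and `first_child` are not the real plugin API.

```python
import functools


def nil_safe(func):
    """Make `func` return None when its element argument is None."""
    @functools.wraps(func)
    def wrapper(element, *args, **kwargs):
        if element is None:
            return None  # propagate the missing value instead of failing
        return func(element, *args, **kwargs)
    return wrapper


@nil_safe
def get_tag_name(element):
    return element["tag"]


@nil_safe
def first_child(element):
    children = element.get("children", [])
    return children[0] if children else None
```

Calls then chain without intermediate checks: `get_tag_name(first_child(element))` yields `None` when there is no child, instead of raising an error halfway through.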

soupault 2.5.0 offers some features that are unique among SSGs.

There are two new built-in widgets for rewriting internal links, which is useful if you don’t host your website at the server root. For example, if you host it at example.com/~user, you cannot just write <img src="/header.png">: it will point to example.com/header.png while you want example.com/~user/header.png instead.

The relative_links widget will convert all internal links to relative links according to their depth in the directory tree. For example, suppose you have <img src="/header.png"> in your page template. Then in about/index.html that link will become <img src="../header.png">; in books/magnetic-fields/index.html it will be <img src="../../header.png"> and so on. This way you can move the website to a subdirectory and it will still work.
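
The depth rule described above can be sketched in a few lines of Python. This is an illustration of the idea, not the widget's implementation. It makes the simplifying assumption that only links starting with a single “/” count as internal, and `relativize` is a hypothetical name.

```python
from pathlib import PurePosixPath


def relativize(link, page_path):
    """Rewrite a root-relative link for a page at `page_path`.

    `page_path` is the page's path relative to the site root,
    e.g. "about/index.html".
    """
    # Leave external, protocol-relative, and already relative links alone.
    if not link.startswith("/") or link.startswith("//"):
        return link
    # The number of directories between the site root and the page.
    depth = len(PurePosixPath(page_path).parent.parts)
    return ("../" * depth + link.lstrip("/")) or "./"
```

Applied to the examples in the text, `relativize("/header.png", "about/index.html")` gives `../header.png` and `relativize("/header.png", "books/magnetic-fields/index.html")` gives `../../header.png`.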

The absolute_links widget prepends a prefix to every internal link. It is conceptually similar to the site URL option in other SSGs and CMSes, but it works for all links, not only links generated by the SSG itself.
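
A minimal Python sketch of the prefixing idea follows, again assuming for simplicity that only links starting with a single “/” are internal. The `absolutize` function is a hypothetical name, and this is not the widget's actual logic.

```python
def absolutize(link, prefix):
    """Prepend `prefix` to a root-relative link and leave other links alone."""
    if not link.startswith("/") or link.startswith("//"):
        return link
    # Avoid a doubled slash if the prefix was written with a trailing "/".
    return prefix.rstrip("/") + link
```

With the earlier example, `absolutize("/header.png", "https://example.com/~user")` produces `https://example.com/~user/header.png`.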

soupault 3.0.0 is now available.

It now uses the new OTOML library for loading the configs, which has some positive side effects, e.g. keys in the output of soupault --show-effective-config (that shows your config plus default values you didn’t set explicitly) now come in the same order as in your config file.

It also provides TOML and YAML parsing functions to Lua plugins and has colored log headers (which can be disabled with the NO_COLOR environment variable).

It’s in the news on Hacker News: “Soupault is a tool that helps you create and manage static websites”.

soupault 3.2.0 offers some new functionality for plugin writers.

First, there’s a new persistent_data built-in variable (of type table) that allows plugins to keep a persistent state. It can be used to either avoid running expensive operations multiple times, or to accumulate data from multiple pages and output it to an index page (e.g. page count, total reading time, tag cloud, etc.).

Second, it’s now possible to check whether an element matches a certain selector using the HTML.matches_selector(element_tree, element, selector) function or its sibling, HTML.matches_any_of_selectors(element_tree, element, selector_list) (thanks to @aantron for making the lambdasoup 0.7.3 release).

Finally, page file paths are now correctly quoted when running pages through external preprocessors, so file paths with spaces work as expected now.
