document asset generation (#986)

oscarlevin · web-flow · commit 3b2504493acf · 2025-06-14T13:56:01.000-06:00
diff --git a/README.md b/README.md
@@ -305,6 +305,10 @@ poetry run python scripts/release_stable.py minor # +0.+1.0
 poetry run python scripts/release_stable.py major # +1.0.0
 ```
 
+### Asset generation
+
+Asset generation and caching can be complicated; see [docs/asset-generation.md](docs/asset-generation.md) for details.
+
 ---
 
 ## About
diff --git a/docs/asset-generation.md b/docs/asset-generation.md
@@ -0,0 +1,47 @@
+# Asset Generation and Caching
+
+One of the more complex but useful features of the CLI is that it tries to help the user manage *generated assets* for their project.  Big picture: some images and other assets are described in the author's source and must be generated using core PreTeXt.  These assets are placed in `generated-assets`, which is copied to the correct place to deploy HTML or to build a PDF with LaTeX.
+
+Since asset generation can be time-intensive, the CLI tries to rebuild assets only when necessary.  The following describes how it does this and which CLI options control this behavior.
+
+## Strategy
+
+For each _target_ in the project, we hash the contents of all assets of each type (`sageplot`, `latex-image`, etc.) and store these hashes in a dictionary saved as `.cache/[targetname]_assets.json`, where each key is an asset type and its value is the hash for that type.  This lets us detect changes to the source: if the source of any asset of a particular type changes, we regenerate assets of that type by calling the appropriate function from core.
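
The per-type hashing can be sketched as follows.  This is a hypothetical illustration, not the CLI's code: the function names `hash_asset_sources`, `changed_asset_types`, and `save_asset_hashes`, the use of SHA-256, and passing asset sources in as plain strings are all assumptions.  The file name follows the `.cache/[targetname]_assets.json` pattern described above.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path

# Hypothetical sketch: names and the choice of SHA-256 are assumptions.


def hash_asset_sources(sources_by_type: dict[str, list[str]]) -> dict[str, str]:
    """Return one hash per asset type, computed over every source of that type."""
    hashes = {}
    for asset_type, sources in sources_by_type.items():
        h = hashlib.sha256()
        for source in sources:
            h.update(source.encode("utf-8"))
            h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
        hashes[asset_type] = h.hexdigest()
    return hashes


def changed_asset_types(cache_dir: Path, target: str, sources_by_type: dict[str, list[str]]) -> list[str]:
    """Return the asset types whose current hash differs from the stored dictionary."""
    cache_file = cache_dir / f"{target}_assets.json"
    stored = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    current = hash_asset_sources(sources_by_type)
    return [t for t, digest in current.items() if stored.get(t) != digest]


def save_asset_hashes(cache_dir: Path, target: str, hashes: dict[str, str]) -> None:
    """Record the hashes (ideally only after generation succeeds) for the next build."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{target}_assets.json").write_text(json.dumps(hashes, indent=2))
```

In this model, the first example below corresponds to `changed_asset_types` returning an empty list.  In the third example it would return only `latex-image`.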
+
+The function in core processes the source using an `xsl` template to extract the XML for each asset (this is generally fast).  Then, for each extracted asset, it calls the appropriate (often external) routines to convert that source into the desired output format (this can be slow).
+
+However, for select assets (currently `asymptote`, `latex-image`, `prefigure`, and `sageplot`), the CLI intercepts the conversion of individual assets and checks whether a version of the output is already available in the appropriate subdirectory of `.cache/`.  Specifically, we hash the source of the extracted XML and store the output as `[hash].[ext]` when it is first generated.  If this file exists on a later run, we copy it instead of regenerating the individual asset.
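
The per-asset cache can be sketched as a lookup keyed on the hash of each asset's extracted XML.  This is a hypothetical illustration: `generate_with_cache`, the `convert` callback (standing in for core's external conversion routine), and SHA-256 are assumptions, not the CLI's actual implementation.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical sketch of per-asset caching; names and hash function are assumptions.


def generate_with_cache(
    asset_xml: str,
    ext: str,
    cache_subdir: Path,
    dest: Path,
    convert: Callable[[str, Path], None],
) -> bool:
    """Copy a cached output if one exists for this source; otherwise convert and cache it.

    Returns True on a cache hit and False if `convert` had to run.
    """
    digest = hashlib.sha256(asset_xml.encode("utf-8")).hexdigest()
    cached = cache_subdir / f"{digest}.{ext}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    if cached.exists():
        shutil.copy2(cached, dest)  # cache hit: skip the slow conversion
        return True
    convert(asset_xml, dest)  # cache miss: run the (slow) conversion
    cache_subdir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(dest, cached)  # keep a copy named by its hash for next time
    return False
```

Because the cache key depends only on the asset's own source, it is shared across targets.  This is why a second `html` target can reuse the `latex-image` outputs from the first.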
+
+### Examples
+
+1. Target `web` is built.  The author edits content but not any asset code, then rebuilds the target.  Since `.cache/.web_assets.json` holds the current hash for every asset type, no calls are made to core to generate assets, and the contents of `generated-assets` are unchanged.
+
+2. Target `web` is built.  The author then builds target `runestone`, another `format="html"` target with a slightly different publication file.  No `.cache/.runestone_assets.json` exists, so generation of each asset type is requested from core.  Each `latex-image` asset is copied from `.cache/latex-image` instead of being regenerated.  Assets without individual caching (`webwork`, `mermaid`, etc.) are regenerated.
+
+3. Target `web` is built.  The author edits the source of one of five `latex-image` elements and builds `web` again.  The hash in `.cache/.web_assets.json` for key `latex-image` no longer matches, so we request generation of latex-images from core.  For each of the five latex-images, we check whether the hash of its source matches the stem of an SVG in `.cache/latex-image`.  Four of them do, so they are copied.  The fifth does not exist in the cache folder, so core generates it (and a copy with its hash as the filename stem is placed in the cache folder).
+
+### Pitfalls
+
+1. The caching mechanism currently in place does not check whether the required generated assets exist in the `generated-assets` folder.  If a user deletes some or all of these assets, the build will break, since no new assets will be generated (assuming no changes are made to the source).
+
+2. If the software to generate assets is improved (which happens with `prefigure` assets, for example), the user will not get new versions of these assets (assuming no changes are made to source).
+
+## User Interface
+
+Assets are generated (sometimes using the cache, sometimes skipping it) depending on what CLI command a user enters.
+
+- `pretext build`. Assets are generated only for asset types whose source has changed, and cached output is copied if present.
+- `pretext build -g` (`pretext build --generate`).  Generation of every asset type in the source is requested from core, regardless of whether the source has changed.  Cached versions of individual assets are copied where possible.
+- `pretext build -q` (`pretext build --no-generate`).  No assets will be generated (or copied from cache), even if source has changed (or hasn't been successfully generated before).
+- `pretext generate`. Assets of all (or the specified) types are generated, even if the source has not changed.  Cached versions of individual assets are copied if possible (CHANGE?).  Identical to `pretext build -g`, except that it allows limiting generation by asset type and does not run a build.
+- `pretext generate -q` (`pretext generate --only-changed`). Generation is limited to asset types whose source has changed since the last call to generate.  Equivalent to `pretext build`, except that no build is run (only assets are generated) and generation can be limited by asset type.
+- `pretext generate -f` (`pretext generate --force`).  Generates all assets, even if source has not changed, and does NOT copy assets from cache even if available.
+
+There is also `pretext generate --clean`, which deletes the `.cache` directory.
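
The command options listed above can be summarized as a small decision function.  This is a hypothetical model built only from the descriptions in this document, not code from the CLI.  `Plan` and `plan_generation` are illustrative names.  The assumption that `generate -f` accepts the same asset-type filter as `generate` is also unverified.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of the documented options; not the CLI's actual code.


@dataclass(frozen=True)
class Plan:
    types_to_generate: frozenset[str]
    use_cache: bool  # copy individually cached assets when available


def plan_generation(
    command: str,
    all_types: set[str],
    changed_types: set[str],
    requested: set[str] | None = None,
) -> Plan:
    """Decide which asset types to request from core and whether to reuse cached assets."""
    all_types = set(all_types)
    changed = set(changed_types) & all_types
    # Only `generate` accepts an asset-type filter; `build` always considers every type.
    scope = all_types if requested is None else all_types & set(requested)
    if command == "build":
        return Plan(frozenset(changed), use_cache=True)
    if command == "build -g":  # --generate
        return Plan(frozenset(all_types), use_cache=True)
    if command == "build -q":  # --no-generate: nothing generated or copied
        return Plan(frozenset(), use_cache=False)
    if command == "generate":
        return Plan(frozenset(scope), use_cache=True)
    if command == "generate -q":  # --only-changed
        return Plan(frozenset(scope & changed), use_cache=True)
    if command == "generate -f":  # --force: bypass the per-asset cache (type filter assumed)
        return Plan(frozenset(scope), use_cache=False)
    raise ValueError(f"unknown command: {command!r}")
```

Read this way, pitfall 1 is avoided by any plan that includes every asset type.  Pitfall 2 additionally requires `use_cache=False`.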
+
+### Consequences
+
+To avoid pitfall number 1, run `pretext build -g`.
+
+To avoid pitfall number 2, run `pretext generate -f`.
+