Skip to content

Add doc tests.#12216

Merged
trivialfis merged 13 commits into
dmlc:masterfrom
trivialfis:doctest
May 24, 2026
Merged

Add doc tests.#12216
trivialfis merged 13 commits into
dmlc:masterfrom
trivialfis:doctest

Conversation

@trivialfis
Copy link
Copy Markdown
Member

LLMs make authoring examples easier, but we need a way to validate them in CI. This PR adds tests for some snippets in the doc, while others are ignored for now. See notes in the docs.rst for more info.

A small sphinx extension is used to extract and execute R code blocks. We will be using it to check #12196 .

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces CI validation for selected Sphinx documentation snippets (Python + R) by running the Sphinx doctest builder and adding a small custom Sphinx extension that executes R code-tab blocks via a persistent R session per document.

Changes:

  • Add a new CI job + pipeline script to build XGBoost, install the R package, and run Sphinx doctests.
  • Add a custom Sphinx extension (doc/xgboost_doc_doctest.py) to mark .. code-tab:: snippets for execution, including R snippet execution through an embedded R process.
  • Update docs environment/dependency installation to support the new doctest flow and reduce runtime of a heavy tutorial snippet.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
R-package/tests/helper_scripts/install_deps.R Adds dependency “scopes” (including doc-test) for targeted dependency installation in CI.
ops/pipeline/test-docs.sh New script to build libxgboost, install R deps + R package, then run Sphinx doctests.
doc/xgboost_doc.yml Adds Sphinx/doc dependencies needed for doctest + tabs/issue linking.
doc/xgboost_doc_doctest.py New Sphinx extension to mark code-tab blocks for doctest, including R execution support.
doc/tutorials/advanced_custom_obj.rst Reduces expensive sampling in a tutorial snippet and stabilizes assertions for doctest runs.
doc/contrib/docs.rst Documents how doctests work and how to skip snippets via no-doctest.
doc/conf.py Enables Sphinx doctest + wires in the new extension and doctest setup.
.github/workflows/misc.yml Adds a new test-docs job and gates RTD trigger on its success.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread doc/conf.py Outdated
Comment thread doc/conf.py
Comment thread doc/conf.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 11 changed files in this pull request and generated 3 comments.

Files not reviewed (2)
  • R-package/man/xgb.plot.deepness.Rd: Language not supported
  • R-package/man/xgb.plot.shap.Rd: Language not supported

Comment thread doc/conf.py
if name == "xgboost" or name.startswith("xgboost."):
del sys.modules[name]
"""
doctest_test_doctest_blocks = ""
Comment thread doc/contrib/docs.rst Outdated
Comment thread R-package/tests/helper_scripts/install_deps.R
@trivialfis trivialfis requested review from RAMitchell and hcho3 May 19, 2026 05:57
@trivialfis
Copy link
Copy Markdown
Member Author

trivialfis commented May 20, 2026

cc @jameslamb not sure if this is useful for you, if so, maybe we could extract something for reusing.

@trivialfis
Copy link
Copy Markdown
Member Author

I will merge this if there's no objection.

Copy link
Copy Markdown
Contributor

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the @, yeah this seems pretty useful! Registering as a Sphinx builder is a great idea.

I have another idea for your consideration, that might not require this much code... what about checking example code into the repo directly, as regular scripts (.py, .R, etc.) and including them into the docs with literalinclude?

.. literalinclude:: ./_example-scripts/custom_obj.R
    :language: r

I use this approach to keep pydistcheck's documented default config up to date while also making it machine-readable for unit tests:

With a setup like that, no Sphinx builder required. And testing in CI could be something like this:

find \
  ./docs/_example_scripts \
  -type f \
  -name '*.R' \
  -exec Rscript --vanilla {} \;

Would also make it easier to run the project's linters over that code, to help catch other correctness issues and keep the style consistent.

language = language.lower()
if language in {"py", "python", "python3"}:
return "python"
if language in {"r", "s", "rlang"}:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handling S language code blocks, wow this is thorough 😅

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, GPT suggested it. I can't take credit for this. My reaction to its suggestion is basically "why not"...

@trivialfis
Copy link
Copy Markdown
Member Author

@jameslamb Thank you for taking a look and the suggestion. Yes, the literal include could work. I don't have a strong preference.

@trivialfis
Copy link
Copy Markdown
Member Author

Let's go with the doc tests for now since the PR is mostly to aid LLMs writing/translating examples, keeping the snippets in the doc seems easier.

@trivialfis trivialfis merged commit 8968e2d into dmlc:master May 24, 2026
87 of 88 checks passed
@trivialfis trivialfis deleted the doctest branch May 24, 2026 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants