Commit 2676297

chore: add internal markdown link check (#21831)
## Which issue does this PR close?

- Closes #21747

## Rationale for this change

DataFusion did not have a CI check for broken links in markdown content. The docs workflows build and deploy the docs, and the dev workflow checks formatting and spelling, but none of them validates link targets.

This PR adds a dedicated check for internal markdown links so that broken references fail early in PRs. I kept the scope internal-only to avoid flaky CI failures caused by external websites and rate limits. Rust doc comments remain covered by the existing rustdoc CI job.

## What changes are included in this PR?

- Added a new Dev workflow job, **Check Markdown Links**, in `dev.yml`.
- Added a `LYCHEE_VERSION` pin in `tool_versions.sh`.
- Added `markdown_link_check.sh` to run lychee on the selected markdown paths.
- Added `lychee.toml` with the internal-link policy and exclusions.
- Added **Check Markdown Links** to the required status checks in `.asf.yaml`.
- Updated the contributor testing docs with the new local command and a note on scope.
- Fixed internal markdown links that failed under the new check in:
  - `roadmap.md`
  - `49.0.0.md`
  - `overview.md`
  - `dataframe.md`
  - `format_options.md`

## Are these changes tested?

Yes:

- `python3 ci/scripts/check_asf_yaml_status_checks.py` passed.
- `bash -n ci/scripts/markdown_link_check.sh` passed.
- `bash ci/scripts/markdown_link_check.sh` passed with 0 errors.
- `cargo fmt --all --check` passed.

Output:

    OK: All 5 required_status_checks match existing GitHub Actions jobs.
    🔍 12824 Total (in 0s) ✅ 490 OK 🚫 0 Errors 👻 12334 Excluded

## Are there any user-facing changes?

No. There is one contributor-facing CI change: PRs now fail when internal markdown links break in the checked markdown files.

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
1 parent bff0ffb commit 2676297

11 files changed

Lines changed: 120 additions & 6 deletions

.asf.yaml

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ github:
         contexts:
           - "Check License Header"
           - "Use prettier to check formatting of documents"
+          - "Check Markdown Links"
           - "Validate required_status_checks in .asf.yaml"
           - "Spell Check with Typos"
     # needs to be updated as part of the release process
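
Each entry under `contexts` has to match the display name of a GitHub Actions job, which is why this commit adds the string "Check Markdown Links" in two places. The project validates this with `ci/scripts/check_asf_yaml_status_checks.py`, which is not part of this diff. The following is only a rough shell sketch of the same idea, not that script: the two stand-in files are trimmed down, the job id `license-header-check` is hypothetical, and the `sed` patterns are a simplification that would not handle arbitrary YAML.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Stand-in for the contexts list in .asf.yaml (two entries taken from the hunk above).
cat > "${workdir}/asf.yaml" <<'EOF'
contexts:
  - "Check License Header"
  - "Check Markdown Links"
EOF

# Stand-in for a workflow file; only the job display names matter here.
# The job id "license-header-check" is hypothetical.
cat > "${workdir}/dev.yml" <<'EOF'
jobs:
  license-header-check:
    name: Check License Header
  markdown-link-check:
    name: Check Markdown Links
EOF

# Extract the quoted list entries and the job names, one per line.
required="$(sed -n 's/^ *- "\(.*\)"$/\1/p' "${workdir}/asf.yaml")"
available="$(sed -n 's/^ *name: \(.*\)$/\1/p' "${workdir}/dev.yml")"

# Count required checks that have no job with exactly the same name.
missing=0
while IFS= read -r context; do
  if ! grep -Fxq -- "${context}" <<< "${available}"; then
    echo "missing job for required check: ${context}"
    missing=$((missing + 1))
  fi
done <<< "${required}"

echo "missing=${missing}"
rm -rf "${workdir}"
```

Renaming the job in the workflow without updating `.asf.yaml` would make the loop report a missing check, which is the kind of drift the real validation job is meant to catch.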

.github/workflows/dev.yml

Lines changed: 19 additions & 0 deletions
@@ -23,6 +23,9 @@ on:
   pull_request:
   merge_group:
 
+permissions:
+  contents: read
+
 concurrency:
   group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
   cancel-in-progress: true
@@ -51,6 +54,22 @@ jobs:
         # if you encounter error, see instructions inside the script
         run: ci/scripts/doc_prettier_check.sh
 
+  markdown-link-check:
+    name: Check Markdown Links
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - name: Load tool versions
+        run: |
+          source ci/scripts/utils/tool_versions.sh
+          echo "LYCHEE_VERSION=${LYCHEE_VERSION}" >> "$GITHUB_ENV"
+      - name: Install lychee
+        uses: taiki-e/install-action@055f5df8c3f65ea01cd41e9dc855becd88953486 # v2.75.18
+        with:
+          tool: lychee@${{ env.LYCHEE_VERSION }}
+      - name: Run markdown link check
+        run: bash ci/scripts/markdown_link_check.sh
+
   asf-yaml-check:
     name: Validate required_status_checks in .asf.yaml
     runs-on: ubuntu-latest
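
The "Load tool versions" step exists because shell variables set in one step do not carry over to the next. Appending a `KEY=value` line to the file named by `GITHUB_ENV` is what makes `env.LYCHEE_VERSION` available to the later "Install lychee" step. A minimal local sketch of that hand-off, assuming a scratch directory in place of the runner: the `tool_versions.sh` here is a stand-in containing only the two pins from this commit, and `GITHUB_ENV` is pointed at a temporary file because it is normally provided by GitHub Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Stand-in for ci/scripts/utils/tool_versions.sh, using the pins from this commit.
cat > "${workdir}/tool_versions.sh" <<'EOF'
PRETTIER_VERSION="2.7.1"
LYCHEE_VERSION="0.23.0"
EOF

# On a real runner GitHub Actions provides GITHUB_ENV; here it is a scratch file.
GITHUB_ENV="${workdir}/github_env"
: > "${GITHUB_ENV}"

# The same two commands as the "Load tool versions" step.
source "${workdir}/tool_versions.sh"
echo "LYCHEE_VERSION=${LYCHEE_VERSION}" >> "${GITHUB_ENV}"

# Show what later steps would receive as environment variables.
exported="$(cat "${GITHUB_ENV}")"
echo "${exported}"
rm -rf "${workdir}"
```

Keeping the pin in `tool_versions.sh` and loading it this way means CI and the local instructions in the contributor guide read the version from the same place.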

ci/scripts/markdown_link_check.sh

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -euo pipefail
+
+ROOT_DIR="$(git rev-parse --show-toplevel)"
+
+cd "${ROOT_DIR}"
+
+MARKDOWN_FILES=()
+while IFS= read -r file; do
+  MARKDOWN_FILES+=("${file}")
+done < <(
+  git -C "${ROOT_DIR}" ls-files 'README.md' 'CONTRIBUTING.md' 'docs/**/*.md' 'datafusion-cli/README.md' 'datafusion-examples/README.md' 'dev/**/*.md'
+)
+
+lychee --no-progress --config "${ROOT_DIR}/lychee.toml" "${MARKDOWN_FILES[@]}"
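
The `while IFS= read -r` loop in this script fills an array one line at a time, which avoids `mapfile` (a Bash 4 builtin) and keeps paths containing spaces intact. The sketch below isolates that pattern so it can be run without git or lychee. `list_markdown_files` is a hypothetical stand-in for the `git ls-files` call, and the path with a space in it is made up for illustration. The empty-list guard is an addition that is not in the committed script; it is included on the understanding that Bash versions before 4.4 treat the expansion of an empty array as an unbound variable under `set -u`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for `git ls-files ...`: prints one path per line.
list_markdown_files() {
  printf '%s\n' 'README.md' 'docs/source/user guide/intro.md' 'dev/release/README.md'
}

# Collect the lines into an array without mapfile.
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal.
FILES=()
while IFS= read -r file; do
  FILES+=("${file}")
done < <(list_markdown_files)

# Guard (not in the committed script): stop early if nothing matched,
# instead of passing an empty argument list to the checker.
if [ "${#FILES[@]}" -eq 0 ]; then
  echo "no markdown files matched" >&2
  exit 1
fi

printf 'checking %d file(s)\n' "${#FILES[@]}"
printf '  %s\n' "${FILES[@]}"
```

Quoting the expansion as `"${FILES[@]}"` passes each path as a single argument, so the entry containing a space stays one element rather than being split in two.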

ci/scripts/utils/tool_versions.sh

Lines changed: 1 addition & 0 deletions
@@ -21,3 +21,4 @@
 # It is intended to be sourced by other scripts and should not be executed directly.
 
 PRETTIER_VERSION="2.7.1"
+LYCHEE_VERSION="0.23.0"

docs/source/contributor-guide/roadmap.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ under the License.
 
 # Roadmap and Improvement Proposals
 
-The [project introduction](../user-guide/introduction) explains the
+The [project introduction](../user-guide/introduction.md) explains the
 overview and goals of DataFusion, and our development efforts largely
 align to that vision.
 

docs/source/contributor-guide/testing.md

Lines changed: 28 additions & 0 deletions
@@ -186,6 +186,34 @@ tested in the same way using the [doc_comment] crate. See the end of
 [doc_comment]: https://docs.rs/doc-comment/latest/doc_comment
 [core/src/lib.rs]: https://github.com/apache/datafusion/blob/main/datafusion/core/src/lib.rs#L583
 
+## Documentation Link Checks
+
+Run the internal markdown link check locally:
+
+```shell
+source ci/scripts/utils/tool_versions.sh
+cargo install lychee --locked --version "${LYCHEE_VERSION}"
+bash ci/scripts/markdown_link_check.sh
+```
+
+Notes:
+
+- The script is run with `bash` and is compatible with the default Bash on macOS (no `mapfile` dependency).
+- The CI configuration currently checks internal markdown links only. External `http(s)` and `mailto` links are excluded to avoid flaky failures.
+
+When a link is broken, lychee prints the file and URL/path that failed. For example:
+
+```text
+[docs/source/user-guide/cli/overview.md]:
+[ERROR] file:///.../docs/source/user-guide/cli/missing-page.md | Cannot find file: File not found. Check if file exists and path is correct
+```
+
+Rust doc comments are validated by rustdoc in CI and can be checked locally with:
+
+```shell
+bash ci/scripts/rust_docs.sh
+```
+
 ## Benchmarks
 
 ### Criterion Benchmarks

docs/source/library-user-guide/upgrading/49.0.0.md

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ Or via SQL:
 SET datafusion.execution.spill_compression = 'zstd';
 ```
 
-For more details about this configuration option, including performance trade-offs between different compression codecs, see the [Configuration Settings](../../user-guide/configs) documentation.
+For more details about this configuration option, including performance trade-offs between different compression codecs, see the [Configuration Settings](../../user-guide/configs.md) documentation.
 
 ### Deprecated `map_varchar_to_utf8view` configuration option
 

docs/source/user-guide/cli/overview.md

Lines changed: 2 additions & 2 deletions
@@ -41,5 +41,5 @@ DataFusion CLI v37.0.0
 Elapsed 1.969 seconds.
 ```
 
-For more information, see the [Installation](installation), [Usage Guide](usage)
-and [Data Sources](datasources) sections.
+For more information, see the [Installation](installation.md), [Usage Guide](usage.md)
+and [Data Sources](datasources.md) sections.
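
All of the link fixes in this commit add a `.md` suffix to an extensionless relative target. A plausible reading, based on the "Cannot find file" error quoted in the testing guide, is that the checker resolves each relative target against the directory of the file that contains it and looks for that path on disk. The sketch below shows that resolution rule only; it is not lychee's implementation. `link_status` is a hypothetical helper, and the directory tree is a throwaway copy of two file names from this diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway tree with the page and one of its link targets.
workdir="$(mktemp -d)"
mkdir -p "${workdir}/docs/source/user-guide/cli"
touch "${workdir}/docs/source/user-guide/cli/overview.md" \
      "${workdir}/docs/source/user-guide/cli/installation.md"

# Hypothetical helper: resolve a relative link target against the directory of
# the file that contains it, and report whether that path exists on disk.
link_status() {
  local source_file="$1" target="$2"
  if [ -e "$(dirname "${source_file}")/${target}" ]; then
    echo "ok"
  else
    echo "broken"
  fi
}

page="${workdir}/docs/source/user-guide/cli/overview.md"
old_status="$(link_status "${page}" "installation")"
new_status="$(link_status "${page}" "installation.md")"

echo "installation    -> ${old_status}"
echo "installation.md -> ${new_status}"
rm -rf "${workdir}"
```

Under this rule the old target `installation` names no file next to `overview.md`, while `installation.md` does, which matches the direction of every link change in the commit.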

docs/source/user-guide/dataframe.md

Lines changed: 1 addition & 1 deletion
@@ -122,4 +122,4 @@ async fn main() -> Result<()> {
 [`collect`]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.collect
 [library users guide]: ../library-user-guide/using-the-dataframe-api.md
 [api reference on docs.rs]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html
-[expressions reference]: expressions
+[expressions reference]: expressions.md

docs/source/user-guide/sql/format_options.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Format-related options can be specified in three ways, in decreasing order of pr
 - `COPY` option tuples
 - Session-level config defaults
 
-For a list of supported session-level config defaults, see [Configuration Settings](../configs). These defaults apply to all operations but have the lowest level of precedence.
+For a list of supported session-level config defaults, see [Configuration Settings](../configs.md). These defaults apply to all operations but have the lowest level of precedence.
 
 If creating an external table, table-specific format options can be specified when the table is created using the `OPTIONS` clause:
 