Skip to content

Address EditorConfig violations#158550

Open
BaumiCoder wants to merge 28 commits into
rust-lang:mainfrom
BaumiCoder:fix-editorconfig-violations
Open

Address EditorConfig violations#158550
BaumiCoder wants to merge 28 commits into
rust-lang:mainfrom
BaumiCoder:fix-editorconfig-violations

Conversation

@BaumiCoder

Copy link
Copy Markdown

This Pull Request is part of an academic research project. More preciously it is for my master's thesis, more background information here.

Definition: EditorConfig violations

The goal of this Pull Request is to address all EditorConfig violations in the project. With EditorConfig you have set for example insert_final_newline = true for most files, but there were files without a final newline. I call this an EditorConfig violation as a file content violates the respective EditorConfig property. They can have different kind of impacts. For example trailing spaces, which get remove with the next save, may only make the next diff / Pull Request to this file less readable. Some other may lead to bugs, such as adding final newlines to test files, which are not allowed to have them.

How to address them?

To check for such violations and fix them I developed the CLI tool ecformat. However, there are different ways to address the violations:

  1. Adjust EditorConfig: The EditorConfig properties for a file can be wrong. For example some of your test files have no final newline, but it is intentional as comments in the files tells us. Therefore, it would be wrong to insert final newlines in them as this may even break some tests. The correct way to address that is to set insert_final_newline = false for them, which avoids automatic insert of a final newline on save.
  2. Fix file content: Another way is to fix the content of the file. Which means for a missing final newline to insert it.
    To do so ecformat fix can be helpful.
  3. Ignore files: For some files it is not meaningful to consider the EditorConfig violations, which ecformat check finds. I added a .ecformat_ignore file for this. It has the common ignore syntax such as a .gitignore.
    Do use such ignore files, provide the name of them with an CLI switch, ecformat check --ignore-file .ecformat_ignore. (The .gitignore files are automatically considered inside a git repository.) There are different reasons to ignore files:
    1. The files belong the external repositories, such as submodules and subtrees.
    2. As EditorConfig is only for text files, considering binary files does not make sense.
    3. EditorConfig supports to set the property on the file level. However, there can be files with intentional differences inside. A common example is that there is indent_size = 4 set, but to align some code with the previous line an indentation / alignment uses for example 5 spaces. Detecting such cases as alignment and not as indentation needs language specific parsing, which is out of scope for a generic EditorConfig utility. Such files need to be ignored fully by ecformat as it currently (version 0.2.0) has no support to ignore only a single property for some files.

Changes in this Pull Request

For this Pull Request, I addressed all violations in the project in the way, which looks the most correct one to me. (I focused on the Properties of the current EditorConfig specification 0.17.2.)
If I fixed file contents, I manually reviewed all of them, making adjustments in the ecformat fix changes were necessary. Furthermore, ecformat detects the charset wrong for some files. Therefore, I would suggest disabling this property for whole project, with --disable-charset or -c. (I plan to improve the charset detection in a later version.)

If I picked the wrong way to address the violations for some files, please let me know with a respective Review comment.
Then I will address it in the correct way. I add Review comments myself for special parts, which are especially worthy of discussion. To support the Review of this big Pull Request, I tried to group the changes in meaningful commits with respective commit messages. Therefore, may take a look at the commit history.

Avoid EditorConfig violations in the future

Such EditorConfig violations can creep in again. Possible reasons can be that the editor of some contributors has not EditorConfig support, or it would be required to install a respective plugin for the support.

If you want to avoid them in the future, I would suggest integrating a respective utility. For example as part of the CI, some pre-commit scripts or similar places. One possibility is to use ecformat with ecformat check --disable-charset --ignore-file .ecformat_ignore. Another one is editorconfig-checker, which is a more sophisticated tool without the possibility to fix violations. It provides more configuration options, such as ignoring violations in single lines with marker comments.

Let me know if you are interested using one of them. I could add one in this Pull Request or a follow-up Pull Request.
If you not consider to use ecformat in your project, the .ecformat_ignore file is maybe not worth to merge into main. I can remove it if desired.

As Cargo.lock and yarn.lock are normally created automatically,
we do not want using a different format when editing it manually.
The auxiliary vector files are binary files.
They use all two spaces for indentation. The intrinsic.natvis differed
in two ways from the others:
 - It had a UTF-8 BOM.
 - It had two multiline comments in a different format.
   (No new line after <!--)
There are two file extensions for YAML and the '.yaml' currently also
use two spaces for indentation.
All JSON and JSON5 files in the repostiory
use two spaces for indentation.
Mostly fixing the indentation size (use 4 spaces everywhere and not
in some file 2 and some 4 for the same situations).
Furthermore, insert final newline for one of them.
In several file types the mixing of different formats
(mostly indent_size) seems to be intentional.
All other Cargo.toml files already use 4 spaces for indentation.
There is often alignment which "breaks" the indentation rules.
Furthermore, some files use 4 spaces and some 2 spaces,
but these indentation errors cannot be separated from the alignments.
All three nix file use 2 spaces for indentation.
The most shell scripts use 4 spaces for indentation. The set of scripts
using (only) 2 spaces is smaller.
When git creates or modifies this file, it uses tab characters.
./x test failed early due to too long line
Only one of them had already a final newline. All tests still pass after
these insertions (executed with ./x test).
Adjust shell scripts, without .sh extension, to 4 spaces,
ignore remaining mixed files and adjust EditorConfig for some files.
@rustbot

rustbot commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @lolbinarycat

@rustbot

rustbot commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the pull request, and welcome! The Rust Project is excited to review your changes, and you should hear from @jdno (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue
Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: infra-ci
  • infra-ci expanded to Kobzol, Mark-Simulacrum, jdno, jieyouxu, marcoieni
  • Random selection from Mark-Simulacrum, jdno, jieyouxu, marcoieni

@rustbot rustbot added the A-CI Area: Our Github Actions CI label Jun 29, 2026
@rustbot rustbot added A-meta Area: Issues & PRs about the rust-lang/rust repository itself A-run-make Area: port run-make Makefiles to rmake.rs A-rustdoc-js Area: Rustdoc's JS front-end A-testsuite Area: The testsuite used to check the correctness of rustc O-SGX Target: SGX S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Jun 29, 2026
Comment thread .editorconfig
# For example, the indentation in lists depend on the length of the list marker
# (such as '- ' or '100. '), at least in the GitHub Markdown flavor:
# https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#nested-lists
# Therefore, pressing tab should insert a single space to encourage situative accurate indentations.

@BaumiCoder BaumiCoder Jun 29, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is maybe a matter of taste if inserting only one space with a tab is a nice solution for this or not. Another one would be to keep the default indent_size = 4 for Markdown files and ignore them when checking for EditorConfig violations.

View changes since the review

Comment thread .editorconfig
# these specific files have no trailing newline to test on these
[tests/ui/{lint/unused_parens_multibyte_recovery.rs,parser/{macro/macro-missing-right-paren.rs,missing_right_paren.rs}}]
insert_final_newline = false
[tests/{pretty/issue-74745.rs,ui/{parser/issues/issue-{62524,68730,58094-missing-right-square-bracket}.rs,type/issue-91268.rs}}]

@BaumiCoder BaumiCoder Jun 29, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files all have the name pattern issue-*.rs. Using the pattern would reduce the maintenance effort, if new such files without final newlines gets added to the project. However, not all files with that pattern are without a final newline.

View changes since the review

Comment thread .ecformat_ignore
tests/run-make-cargo/thumb-none-qemu/example/memory.x
src/bootstrap/mk/Makefile.in
src/etc/rust_analyzer_eglot.el
src/librustdoc/html/static/fonts/README.txt

@BaumiCoder BaumiCoder Jun 29, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content of this README looks almost like Markdown. Only some formatting bugs may need to be fixed. Should that be changed to a Markdown file? If the content and file ending would be changed, it would not be necessary to ignore it here.

View changes since the review

Comment thread .ecformat_ignore
library/std/src/sys/pal/sgx/abi/entry.S
tests/pretty/block-comment-wchar.pp
tests/run-make-cargo/thumb-none-qemu/example/memory.x
src/bootstrap/mk/Makefile.in

@BaumiCoder BaumiCoder Jun 29, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Makefile uses two styles of indentation. First, for targets of a Makefile you need to use tab characters for indentation. Otherwise, it is syntactically wrong. Second, there is some further code at the beginning of this file and there 2 spaces are in use for indentation. I do not know if using only tab character for indentation would be fine for this file or not.

View changes since the review

@BaumiCoder BaumiCoder Jun 29, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the *.check files had no final newline. Although, tests (./x test) still pass on my machine after adding these final newlines, I want to ask if you may prefer to have no final newlines in these files. If so I could remove them from all the *.check files and set insert_final_newline = false for them.

View changes since the review

@rust-log-analyzer

This comment has been minimized.

Remove all added final newlines as test like
tests/ui/argfile/commandline-argfile-badutf8.rs
fail in the CI job aarch64-gnu-llvm-21-1
Therefore, set insert_final_newline = false for all the args files
for the tests as they seem to have no final newline intentional.
@rust-log-analyzer

This comment has been minimized.

@bjorn3

bjorn3 commented Jun 29, 2026

Copy link
Copy Markdown
Member

In tidy we ignore all test files for style lints. At the same time it is still useful to not include it in the editorconfig ignore to increase the chance of new or changed tests having a proper format. Maybe the test changes should be reverted?

Comment thread .editorconfig
Comment on lines +27 to +36
# these specific files have an UTF-8 BOM to test with that charset
[tests/ui/codemap_tests/utf8-bom.rs]
charset = utf-8-bom
# these specific files have CRLF line endings to test on them
[tests/{rustdoc-ui/intra-doc/warning-crlf.rs,ui/{frontmatter/frontmatter-crlf.rs,lexer/lexer-crlf-*.rs}}]
end_of_line = crlf
# these specific files have an UTF-8 BOM and CRLF line endings to test on these
[tests/ui/{include-macros/data.bin,json/json-bom-plus-crlf*.rs}]
charset = utf-8-bom
end_of_line = crlf

@bjorn3 bjorn3 Jun 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now a third location (next to .gitattributes and // ignore-tidy-cr in the test itself) where CRLF containing tests are placed. This is bound to go out of sync: #157184

View changes since the review

Removing the final newlines in this two files did not undo all changes
in these files, which lead to similar errors as before.
Comment thread tests/rustdoc-ui/argfile/commandline-argfile-badutf8.args
Comment thread tests/ui/argfile/commandline-argfile-badutf8.args
@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<?xml version="1.0" encoding="utf-8"?>

@bjorn3 bjorn3 Jun 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here? Github is showing mojibake in the original file. Was that a UTF-8 BOM or something?

View changes since the review

@BaumiCoder BaumiCoder Jun 29, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file had the charset utf-8-bom instead of utf-8. Therefore, I removed the BOM, which is not visible in the diff here. (I do not see any mojibake here. GitHub shows me no visible difference in this line.)

@BaumiCoder

Copy link
Copy Markdown
Author

In tidy we ignore all test files for style lints. At the same time it is still useful to not include it in the editorconfig ignore to increase the chance of new or changed tests having a proper format. Maybe the test changes should be reverted?

This could be an approach, but the options insert_final_newline = true, trim_trailing_whitespace and maybe even charset could lead to unintentional changes in the test files. Should they be unset for all test files?

@bjorn3

bjorn3 commented Jun 29, 2026

Copy link
Copy Markdown
Member

Editor autoformat can also result in unintended changes as well as editor defaults that already insert a final newline and trim trailing whitespace like I have myself. You have to be careful what changes you commit either way. Unsetting insert_final_newline and trim_trailing_whitespace would make new tests deviate from preferred formatting.

@BaumiCoder

Copy link
Copy Markdown
Author

You are right, better would be to set trim_trailing_whitespace = false and insert_final_newline = false to disable this automation explicitly. Only charset = unset is maybe the only option to avoid problems, because the editor may want to fix the charset for an file on save.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CI Area: Our Github Actions CI A-meta Area: Issues & PRs about the rust-lang/rust repository itself A-run-make Area: port run-make Makefiles to rmake.rs A-rustdoc-js Area: Rustdoc's JS front-end A-testsuite Area: The testsuite used to check the correctness of rustc O-SGX Target: SGX S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants