Commit c2c486a

tests: add whitespace tests for vertical tab behavior
* tests: add whitespace tests for vertical tab behavior

  Add two small tests to highlight how vertical tab is handled differently.

  - vertical_tab_lexer.rs checks that the lexer treats vertical tab as whitespace
  - ascii_whitespace_excludes_vertical_tab.rs shows that split_ascii_whitespace does not split on it

  This helps document the difference between the Rust parser (which accepts vertical tab) and the standard library’s ASCII whitespace handling.

  See: rust-lang/rust-project-goals#53

* tests: add ignore-tidy-tab directive to whitespace tests
* tests: expand vertical tab lexer test to cover all Pattern_White_Space chars
* tests: add whitespace/ README entry explaining lexer vs stdlib mismatch
* Update ascii_whitespace_excludes_vertical_tab.rs
* Update ascii_whitespace_excludes_vertical_tab.rs
  make sure tabs and spaces are well checked
* Update ascii_whitespace_excludes_vertical_tab.rs
* fix tidy: add whitespace README entry
* Update README.md with missing full stop
* Update ascii_whitespace_excludes_vertical_tab.rs
* fix tidy: use full path format for whitespace README entry
* fix tidy: README order, trailing newlines in whitespace tests
* fix: add run-pass directive and restore embedded whitespace bytes
* fix tidy: remove duplicate whitespace README entry
* Add failing UI test for invalid whitespace (zero width space)
  This adds a //@ check-fail test to ensure that disallowed whitespace characters like ZERO WIDTH SPACE are rejected by the Rust lexer.
* git add tests/ui/whitespace/invalid_whitespace.rs git commit -m "Fix tidy: add trailing newline" git push
* Fix tidy: add trailing newline
* Update invalid_whitespace.rs
* Update invalid_whitespace.rs
* Clean up whitespace in invalid_whitespace.rs
  Remove unnecessary blank lines in invalid_whitespace.rs
* Update invalid_whitespace.rs
* Clarify ZERO WIDTH SPACE usage in test
  Update comment to clarify usage of ZERO WIDTH SPACE.
* Improve error messages for invalid whitespace
  Updated error messages to clarify the issue with invisible characters.
* Modify invalid_whitespace test for clarity
  Update test to check for invalid whitespace characters.
* Resolve unknown token error in invalid_whitespace.rs
  Fix whitespace issue causing unknown token error.
* Remove invisible character from variable assignment
  Fix invisible character issue in variable assignment.
* Improve error message for invalid whitespace
  Updated error message to clarify invisible characters.
* Improve error handling for invisible characters
  Updated error message for invisible characters in code.
* Document error for unknown token due to whitespace
  Add error message for invalid whitespace in code
* Update error message for invalid whitespace handling
* Modify invalid_whitespace.rs for whitespace checks
  Updated the test to check for invalid whitespace handling.
* Correct whitespace in variable declaration
  Fix formatting issue by adding space around '=' in variable declaration.
* Update error message for invalid whitespace
* Update invalid_whitespace.stderr
* Refine error handling for invalid whitespace test
  Update the error messages for invalid whitespace in the test.
* Update invalid_whitespace.rs
* Fix whitespace issues in invalid_whitespace.rs
* Update invalid_whitespace.stderr file
* Clean up whitespace in invalid_whitespace.rs
  Removed unnecessary blank lines from the test file.
* Update invalid_whitespace.stderr
1 parent bf4fbfb commit c2c486a

5 files changed

Lines changed: 118 additions & 0 deletions

tests/ui/README.md

Lines changed: 15 additions & 0 deletions
@@ -1582,6 +1582,21 @@ Tests on various well-formedness checks, e.g. [Type-checking normal functions](h

 Tests on `where` clauses. See [Where clauses | Reference](https://doc.rust-lang.org/reference/items/generics.html#where-clauses).

+## `tests/ui/whitespace/`
+
+Tests for whitespace handling in the Rust lexer. The Rust language
+defines whitespace as Unicode Pattern_White_Space, which is not the
+same as what the standard library gives you:
+
+- `is_ascii_whitespace` follows the WhatWG Infra Standard and skips
+vertical tab (`\x0B`)
+- `is_whitespace` matches Unicode White_Space, which is a broader set
+
+These tests make that gap visible and check that the lexer accepts
+all 11 Pattern_White_Space characters correctly.
+
+See: https://github.com/rustfoundation/interop-initiative/issues/53
+
 ## `tests/ui/windows-subsystem/`: `#![windows_subsystem = ""]`

 See [the `windows_subsystem` attribute](https://doc.rust-lang.org/reference/runtime.html#the-windows_subsystem-attribute).
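
The three whitespace definitions named in the README entry can be compared character by character. The following is a minimal sketch and not part of the commit: `is_pattern_white_space` and `classify` are hypothetical helpers written for illustration (the standard library has no Pattern_White_Space predicate), while `char::is_ascii_whitespace` and `char::is_whitespace` are the real standard library methods.

```rust
/// Hypothetical helper (not a std API): true for the 11 code points
/// that carry the Unicode Pattern_White_Space property.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{09}'..='\u{0D}'
            | '\u{20}'
            | '\u{85}'
            | '\u{200E}'
            | '\u{200F}'
            | '\u{2028}'
            | '\u{2029}'
    )
}

/// Returns (Pattern_White_Space, is_ascii_whitespace, is_whitespace) for `c`.
fn classify(c: char) -> (bool, bool, bool) {
    (
        is_pattern_white_space(c),
        c.is_ascii_whitespace(),
        c.is_whitespace(),
    )
}

fn main() {
    // Vertical tab, left-to-right mark, no-break space, plain space.
    for c in ['\x0B', '\u{200E}', '\u{A0}', ' '] {
        let (pattern, ascii, unicode) = classify(c);
        println!(
            "U+{:04X}: pattern={} ascii={} unicode={}",
            c as u32, pattern, ascii, unicode
        );
    }
}
```

Under these assumptions, vertical tab counts as whitespace for the lexer and for `is_whitespace` but not for `is_ascii_whitespace`, and U+200E counts only for the lexer, so White_Space overlaps with Pattern_White_Space without containing it.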
tests/ui/whitespace/ascii_whitespace_excludes_vertical_tab.rs

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+//@ run-pass
+// This test checks that split_ascii_whitespace does NOT split on
+// vertical tab (\x0B), because the standard library uses the WhatWG
+// Infra Standard definition of ASCII whitespace, which excludes
+// vertical tab.
+//
+// See: https://github.com/rust-lang/rust-project-goals/issues/53
+
+fn main() {
+    let s = "a\x0Bb";
+
+    let parts: Vec<&str> = s.split_ascii_whitespace().collect();
+
+    assert_eq!(parts.len(), 1,
+        "vertical tab should not be treated as ASCII whitespace");
+
+    let s2 = "a b";
+    let parts2: Vec<&str> = s2.split_ascii_whitespace().collect();
+    assert_eq!(parts2.len(), 2,
+        "regular space should split correctly");
+
+}
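
For contrast with the test above, the Unicode-aware `str::split_whitespace` does split on vertical tab, because U+000B has the White_Space property. The sketch below is an illustration only and not part of the commit; `count_ascii_parts` and `count_unicode_parts` are hypothetical wrapper names.

```rust
/// Number of pieces produced by the ASCII-only splitter.
fn count_ascii_parts(s: &str) -> usize {
    s.split_ascii_whitespace().count()
}

/// Number of pieces produced by the Unicode White_Space splitter.
fn count_unicode_parts(s: &str) -> usize {
    s.split_whitespace().count()
}

fn main() {
    let s = "a\x0Bb";
    // Vertical tab is not ASCII whitespace, but it is Unicode White_Space.
    println!("ascii parts:   {}", count_ascii_parts(s));
    println!("unicode parts: {}", count_unicode_parts(s));
}
```

Form feed (`\x0C`) is included in the ASCII definition, so both splitters agree on `"a\x0Cb"`; vertical tab is the ASCII control character on which they differ.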
tests/ui/whitespace/invalid_whitespace.rs

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+// This test ensures that the Rust lexer rejects invalid whitespace
+// characters such as ZERO WIDTH SPACE.
+
+//@ check-fail
+
+fn main() {
+    let x = 5;
+    let y = 10;
+
+    let a=​x + y;
+    //~^ ERROR unknown start of token
+    //~| HELP invisible characters like
+}
tests/ui/whitespace/invalid_whitespace.stderr

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+error: unknown start of token: \u{200b}
+  --> $DIR/invalid_whitespace.rs:10:11
+   |
+LL |     let a=​x + y;
+   |           ^
+   |
+   = help: invisible characters like '\u{200b}' are not usually visible in text editors
+
+error: aborting due to 1 previous error
+
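
ZERO WIDTH SPACE (U+200B) is not in Pattern_White_Space, which is why the lexer reports it as an unknown token rather than skipping it. The sketch below shows one way such characters could be located in source text before compiling. It is an illustration under stated assumptions, not how rustc finds them: `find_invisible_chars` is a hypothetical helper, the `INVISIBLE_NON_WHITESPACE` list is hand-picked and not exhaustive, and columns are counted in `char`s starting at 1.

```rust
/// Hand-picked invisible code points that are NOT Pattern_White_Space.
/// Illustrative only; this list is an assumption and is not exhaustive.
const INVISIBLE_NON_WHITESPACE: [char; 4] =
    ['\u{200B}', '\u{200C}', '\u{200D}', '\u{2060}'];

/// Returns (line, column, character) for every listed invisible character.
/// Lines and columns are 1-based; columns count chars, not bytes.
fn find_invisible_chars(source: &str) -> Vec<(usize, usize, char)> {
    let mut hits = Vec::new();
    for (line_idx, line) in source.lines().enumerate() {
        for (col_idx, c) in line.chars().enumerate() {
            if INVISIBLE_NON_WHITESPACE.contains(&c) {
                hits.push((line_idx + 1, col_idx + 1, c));
            }
        }
    }
    hits
}

fn main() {
    // Same shape as the failing statement in the test, with U+200B after `=`.
    let source = "fn main() {\n    let a=\u{200B}x + y;\n}\n";
    for (line, col, c) in find_invisible_chars(source) {
        println!("{}:{}: U+{:04X}", line, col, c as u32);
    }
}
```

With four spaces of indentation, the character sits at column 11 of its line, the same column the `.stderr` file records.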
tests/ui/whitespace/vertical_tab_lexer.rs

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+//@ run-pass
+// ignore-tidy-tab
+//
+// Tests that the Rust lexer accepts Unicode Pattern_White_Space characters.
+//
+// Worth noting: the Rust reference defines whitespace as Pattern_White_Space,
+// which is not the same as what is_ascii_whitespace or is_whitespace give you.
+//
+// is_ascii_whitespace follows WhatWG and skips vertical tab (\x0B).
+// is_whitespace uses Unicode White_Space, which is a broader set.
+//
+// The 11 characters that actually count as whitespace in Rust source:
+// \x09 \x0A \x0B \x0C \x0D \x20 \u{85} \u{200E} \u{200F} \u{2028} \u{2029}
+//
+// Ref: https://github.com/rustfoundation/interop-initiative/issues/53
+
+#[rustfmt::skip]
+fn main() {
+    // tab (\x09) between let and the name
+    let _ws1 = 1_i32;
+
+    // vertical tab (\x0B) between let and the name
+    // this is the one is_ascii_whitespace gets wrong
+    let _ws2 = 2_i32;
+
+    // form feed (\x0C) between let and the name
+    let _ws3 = 3_i32;
+
+    // plain space (\x20), here just so every character is represented
+    let _ws4 = 4_i32;
+
+    // NEL (\u{85}) between let and the name
+    let…_ws5 = 5_i32;
+
+    // left-to-right mark (\u{200E}) between let and the name
+    let‎_ws6 = 6_i32;
+
+    // right-to-left mark (\u{200F}) between let and the name
+    let‏_ws7 = 7_i32;
+
+    // \x0A, \x0D, \u{2028}, \u{2029} are also Pattern_White_Space but they
+    // act as line endings, so you can't stick them in the middle of a statement.
+    // The lexer still handles them correctly at line boundaries.
+
+    // These are Unicode White_Space but NOT Pattern_White_Space:
+    // \u{A0}   no-break space        \u{1680} ogham space mark
+    // \u{2000} en quad               \u{2001} em quad
+    // \u{2002} en space              \u{2003} em space
+    // \u{2004} three-per-em space    \u{2005} four-per-em space
+    // \u{2006} six-per-em space      \u{2007} figure space
+    // \u{2008} punctuation space     \u{2009} thin space
+    // \u{200A} hair space            \u{202F} narrow no-break space
+    // \u{205F} medium math space     \u{3000} ideographic space
+
+    // add them up so the compiler doesn't complain about unused variables
+    let _sum = _ws1 + _ws2 + _ws3 + _ws4 + _ws5 + _ws6 + _ws7;
+    println!("{}", _sum);
+}
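
The two character lists in the test's comments can be cross-checked against `char::is_whitespace`, which implements Unicode White_Space. The sketch below is an illustration and not part of the commit: the constant and function names are hypothetical, and the code points are copied from the comments above.

```rust
/// The 11 Pattern_White_Space code points listed in the test's header comment.
const PATTERN_WHITE_SPACE: [char; 11] = [
    '\u{09}', '\u{0A}', '\u{0B}', '\u{0C}', '\u{0D}', '\u{20}',
    '\u{85}', '\u{200E}', '\u{200F}', '\u{2028}', '\u{2029}',
];

/// The 16 code points the test lists as White_Space but not Pattern_White_Space.
const WHITE_SPACE_ONLY: [char; 16] = [
    '\u{A0}', '\u{1680}', '\u{2000}', '\u{2001}', '\u{2002}', '\u{2003}',
    '\u{2004}', '\u{2005}', '\u{2006}', '\u{2007}', '\u{2008}', '\u{2009}',
    '\u{200A}', '\u{202F}', '\u{205F}', '\u{3000}',
];

/// How many Pattern_White_Space characters are also Unicode White_Space.
fn shared_count() -> usize {
    PATTERN_WHITE_SPACE.iter().filter(|c| c.is_whitespace()).count()
}

/// Pattern_White_Space characters that `is_whitespace` does not accept.
fn pattern_only() -> Vec<char> {
    PATTERN_WHITE_SPACE
        .iter()
        .copied()
        .filter(|c| !c.is_whitespace())
        .collect()
}

fn main() {
    println!("shared with White_Space: {}", shared_count());
    for c in pattern_only() {
        println!("Pattern_White_Space only: U+{:04X}", c as u32);
    }
    // Every character in the second list is White_Space and not in the first list.
    for c in WHITE_SPACE_ONLY.iter() {
        assert!(c.is_whitespace() && !PATTERN_WHITE_SPACE.contains(c));
    }
}
```

The left-to-right and right-to-left marks are the two Pattern_White_Space characters that `is_whitespace` rejects, so neither standard library predicate can stand in for the lexer's definition.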
