Commit 483317f
Add an InlineWhitespace token to the HTML tokeniser so that normalised
whitespace-only DefaultMode text is tracked distinctly from real text
content. In the tree builder, an InlineWhitespace token is turned into a
HtmlText " " node only when its nearest previous and next tokens are
both inline content; otherwise it is silently discarded.
This preserves the significant inter-element space in:
<span>Hello,</span> <span>World</span> -> "Hello, World"
< > -> "< >"
while still dropping insignificant inter-block whitespace (e.g. newlines
between <head> and <body>), so existing tests continue to pass.
Whitespace produced by character references ( , , 	 ...) goes
through the CharRefMode path and is never turned into InlineWhitespace, so
it is never filtered.
Three regression tests added covering the cases from the issue.
Co-authored-by: Repo Assist <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Don Syme <dsyme@users.noreply.github.com>
1 parent 73f6dfe commit 483317f
2 files changed
Lines changed: 121 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
49 | 49 | | |
50 | 50 | | |
51 | 51 | | |
| 52 | + | |
52 | 53 | | |
53 | 54 | | |
54 | 55 | | |
| |||
60 | 61 | | |
61 | 62 | | |
62 | 63 | | |
| 64 | + | |
63 | 65 | | |
64 | 66 | | |
65 | 67 | | |
| |||
237 | 239 | | |
238 | 240 | | |
239 | 241 | | |
240 | | - | |
| 242 | + | |
241 | 243 | | |
242 | 244 | | |
243 | 245 | | |
| |||
935 | 937 | | |
936 | 938 | | |
937 | 939 | | |
| 940 | + | |
| 941 | + | |
| 942 | + | |
| 943 | + | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
| 947 | + | |
| 948 | + | |
| 949 | + | |
| 950 | + | |
| 951 | + | |
| 952 | + | |
| 953 | + | |
| 954 | + | |
| 955 | + | |
| 956 | + | |
| 957 | + | |
| 958 | + | |
| 959 | + | |
| 960 | + | |
| 961 | + | |
| 962 | + | |
| 963 | + | |
| 964 | + | |
| 965 | + | |
| 966 | + | |
| 967 | + | |
| 968 | + | |
| 969 | + | |
| 970 | + | |
| 971 | + | |
| 972 | + | |
| 973 | + | |
| 974 | + | |
| 975 | + | |
| 976 | + | |
| 977 | + | |
| 978 | + | |
| 979 | + | |
| 980 | + | |
| 981 | + | |
| 982 | + | |
| 983 | + | |
| 984 | + | |
| 985 | + | |
| 986 | + | |
| 987 | + | |
| 988 | + | |
| 989 | + | |
| 990 | + | |
| 991 | + | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
938 | 1002 | | |
939 | 1003 | | |
940 | 1004 | | |
| |||
1050 | 1114 | | |
1051 | 1115 | | |
1052 | 1116 | | |
| 1117 | + | |
| 1118 | + | |
| 1119 | + | |
| 1120 | + | |
| 1121 | + | |
| 1122 | + | |
| 1123 | + | |
| 1124 | + | |
| 1125 | + | |
| 1126 | + | |
| 1127 | + | |
| 1128 | + | |
| 1129 | + | |
| 1130 | + | |
| 1131 | + | |
| 1132 | + | |
| 1133 | + | |
| 1134 | + | |
| 1135 | + | |
| 1136 | + | |
1053 | 1137 | | |
1054 | 1138 | | |
1055 | 1139 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
983 | 983 | | |
984 | 984 | | |
985 | 985 | | |
| 986 | + | |
| 987 | + | |
| 988 | + | |
| 989 | + | |
| 990 | + | |
| 991 | + | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
| 1015 | + | |
| 1016 | + | |
| 1017 | + | |
| 1018 | + | |
| 1019 | + | |
| 1020 | + | |
| 1021 | + | |
0 commit comments