r[input]
r[input.syntax]
CHAR -> [U+0000-U+D7FF U+E000-U+10FFFF] // a Unicode scalar value
ASCII -> [U+0000-U+007F]
NUL -> U+0000
EOF -> !CHAR // End of file or input
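The `CHAR` and `ASCII` ranges above correspond closely to Rust's own `char` type, which holds exactly one Unicode scalar value. The following is a minimal sketch that checks membership in those ranges; the helper names `is_char` and `is_ascii` are hypothetical and chosen only for illustration.

```rust
/// Hypothetical helper: true if `code_point` falls in the `CHAR` range,
/// i.e. it is a Unicode scalar value (not a surrogate, not above U+10FFFF).
fn is_char(code_point: u32) -> bool {
    // `char::from_u32` returns `None` for surrogates and out-of-range values.
    char::from_u32(code_point).is_some()
}

/// Hypothetical helper: true if `c` falls in the `ASCII` range U+0000-U+007F.
fn is_ascii(c: char) -> bool {
    c.is_ascii()
}
```

Under this sketch, `is_char(0xD800)` is false because U+D800 is a surrogate, while `is_char(0xE000)` and `is_char(0x10FFFF)` are true.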
r[input.intro] This chapter describes how a source file is interpreted as a sequence of tokens.
See Crates and source files for a description of how programs are organised into files.
r[input.encoding]
r[input.encoding.utf8] Each source file is interpreted as a sequence of Unicode characters encoded in UTF-8.
r[input.encoding.invalid] It is an error if the file is not valid UTF-8.
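As an illustration of the encoding rule, a tool reading a source file could reject invalid UTF-8 before doing anything else. This is only a sketch: `decode_source` is a hypothetical name, and the way a real compiler reports the error is not shown here.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: interpret raw file bytes as UTF-8 source text.
/// Returns an error if the bytes are not valid UTF-8.
fn decode_source(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}
```

For example, `decode_source(b"fn main() {}")` succeeds, while `decode_source(&[0x66, 0xFF])` fails because `0xFF` can never appear in a UTF-8 byte sequence.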
r[input.byte-order-mark]
If the first character in the sequence is U+FEFF (BYTE ORDER MARK), it is removed.
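A minimal sketch of this step, assuming the input has already been decoded to a `&str`; the function name `strip_bom` is hypothetical.

```rust
/// Hypothetical helper: remove a single leading U+FEFF, if present.
fn strip_bom(input: &str) -> &str {
    // Only the first character is examined; later U+FEFF characters are kept.
    input.strip_prefix('\u{FEFF}').unwrap_or(input)
}
```

Only one byte order mark is removed, so an input beginning with two U+FEFF characters still begins with one afterwards.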
r[input.crlf]
Each pair of characters U+000D (CR) immediately followed by U+000A (LF) is replaced by a single U+000A (LF). This happens once, not repeatedly, so after the normalization, there can still exist U+000D (CR) immediately followed by U+000A (LF) in the input (e.g. if the raw input contained "CR CR LF LF").
Other occurrences of the character U+000D (CR) are left in place (they are treated as whitespace).
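The single-pass behaviour can be sketched with `str::replace`, which scans the input once and does not rescan its own output. The name `normalize_crlf` is hypothetical.

```rust
/// Hypothetical helper: replace each CR LF pair with a single LF, in one pass.
fn normalize_crlf(input: &str) -> String {
    // Matches are non-overlapping and the result is not rescanned,
    // so a CR LF pair created by the replacement itself is left alone.
    input.replace("\r\n", "\n")
}
```

With this sketch, the raw input CR CR LF LF becomes CR LF LF: the middle pair is collapsed to LF, and the CR that now precedes it is left in place, as is any other lone CR.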
r[input.shebang]
r[input.shebang.removal] If a shebang is present, it is removed from the input sequence (and is therefore ignored).
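The sketch below illustrates shebang removal under simplifying assumptions that are not stated in this section: a shebang is taken to be a first line starting with `#!`, unless the next non-whitespace character is `[` (which would make it the start of an inner attribute such as `#![allow(unused)]`), and the line feed that ends the shebang line is kept. Comments between `#!` and `[` are not handled. The name `strip_shebang` is hypothetical.

```rust
/// Hypothetical, simplified helper: drop a leading shebang line.
/// Assumption: `#!` followed (after optional whitespace) by `[` is an
/// inner attribute, not a shebang, and is left untouched.
fn strip_shebang(input: &str) -> &str {
    let rest = match input.strip_prefix("#!") {
        Some(rest) => rest,
        None => return input, // no `#!` at the very start
    };
    if rest.trim_start().starts_with('[') {
        return input; // looks like `#![...]`
    }
    // Assumption: remove up to, but not including, the first line feed.
    match input.find('\n') {
        Some(newline) => &input[newline..],
        None => "",
    }
}
```

Keeping the line feed means the line numbers of the remaining source are unchanged in this sketch.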
r[input.frontmatter]
r[input.frontmatter.removal] If the remaining input begins with a frontmatter fence, optionally preceded by lines containing only whitespace, the frontmatter and any preceding whitespace are removed.
For example, given the following file:
--- cargo
package.edition = "2024"
---

fn main() {}

The first three lines (the opening fence, body, and closing fence) would be removed, leaving an empty line followed by fn main() {}.
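The example above can be reproduced with a simplified sketch. It assumes details this section does not spell out: the opening fence is a line starting with three or more `-` characters (optionally followed by an info string such as `cargo`), and the closing fence is a line made of the same number of `-` characters. The name `strip_frontmatter` is hypothetical, and an unclosed fence is simply left untouched here rather than diagnosed.

```rust
/// Hypothetical, simplified helper: remove leading frontmatter and any
/// whitespace-only lines before it. Not the compiler's implementation.
fn strip_frontmatter(input: &str) -> &str {
    let mut offset = 0; // bytes consumed so far, always at a line boundary
    let mut dashes = 0; // number of `-` in the opening fence
    let mut lines = input.split_inclusive('\n');

    // Skip whitespace-only lines, then expect the opening fence.
    for line in lines.by_ref() {
        offset += line.len();
        if line.trim().is_empty() {
            continue;
        }
        dashes = line.chars().take_while(|&c| c == '-').count();
        break;
    }
    if dashes < 3 {
        return input; // no frontmatter fence at the start
    }

    // Consume body lines until a closing fence with the same dash count.
    for line in lines {
        offset += line.len();
        let trimmed = line.trim_end();
        if trimmed.len() == dashes && trimmed.bytes().all(|b| b == b'-') {
            return &input[offset..];
        }
    }
    input // assumption: leave input unchanged if the fence is never closed
}
```

Applied to the four-line file above, this sketch returns a string consisting of an empty line followed by `fn main() {}`.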
r[input.tokenization]
The resulting sequence of characters is then converted into tokens as described in the remainder of this chapter.
Note
The standard library [include!] macro applies the following transformations to the file it reads:
- Byte order mark removal.
- CRLF normalization.
- Shebang and frontmatter removal when invoked in an item context (as opposed to expression or statement contexts).
The [include_str!] and [include_bytes!] macros do not apply these transformations.