|
64 | 64 |
|
65 | 65 | Programs consist of variable declarations via assignment, expressions, and control-flow constructs such as `IF`, `ELSEIF`, `ELSE`, `WHILE`, and `FOR`. |
66 | 66 |
|
67 | | -Prefix has seven runtime data types: binary integers (`INT`), binary floating-point numbers (`FLT`, IEEE754), strings (`STR`), non-scalar tensors (`TNS`), first-class user-defined functions (`FUNC`), associative maps (`MAP`), and thread handles (`THR`). |
| 67 | +Prefix has seven runtime data types: based integers (`INT`), based floating-point numbers (`FLT`, IEEE754), strings (`STR`), non-scalar tensors (`TNS`), first-class user-defined functions (`FUNC`), associative maps (`MAP`), and thread handles (`THR`). |
68 | 68 |
|
69 | 69 | Identifiers, function parameters, and return values are statically typed; the type of every symbol MUST be declared when it is first introduced. Computation proceeds by evaluating expressions and executing statements in sequence, with explicit constructs for branching and looping. Input and output are modeled through built-in operators, in particular `INPUT` and `PRINT`. |
70 | 70 |
|
|
86 | 86 |
|
87 | 87 | Line continuation: The character `^` serves as a line-continuation marker. When a caret `^` appears in the source and is followed immediately by a newline, both the `^` and the newline are ignored by the lexer (that is, the logical line continues on the next physical line). The lexer also accepts a caret immediately before a code note (a comment beginning with `!`); in this case the `^`, the comment text up to the line terminator, and the terminating newline are treated as if they were not present. If a `^` is present in a string, it does not count as a line continuation. If a caret appears and is not immediately followed by a newline, a code note, or the platform's single-character newline sequence, the lexer MUST raise a syntax error. |
88 | 88 |
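The continuation rule can be sketched as a small pre-pass over the source text. This is an illustrative Python sketch, not the reference lexer. The names `join_continuations` and `PrefixSyntaxError` are hypothetical, and the sketch assumes that string literals contain no escaped quote characters and that an ordinary `!` comment runs to the end of its line.

```python
class PrefixSyntaxError(Exception):
    """Hypothetical error type standing in for the lexer's syntax error."""


def join_continuations(source):
    out = []
    quote = None  # active string delimiter, or None when outside a string
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if quote is not None:
            # Inside a string: a caret is ordinary text.
            out.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in "\"'":
            quote = ch
            out.append(ch)
            i += 1
        elif ch == "!":
            # Ordinary comment: copy it through unchanged (assumption).
            end = source.find("\n", i)
            end = n if end < 0 else end
            out.append(source[i:end])
            i = end
        elif ch == "^":
            if source.startswith("\n", i + 1):
                i += 2  # drop the caret and the newline
            elif source.startswith("!", i + 1):
                end = source.find("\n", i)
                if end < 0:
                    raise PrefixSyntaxError("continuation comment lacks a newline")
                i = end + 1  # drop caret, comment text, and newline
            else:
                raise PrefixSyntaxError("'^' must be followed by a newline or a comment")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, `join_continuations("ab^\ncd")` yields `abcd`, while a caret followed by any other character raises the error.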
|
89 | | -The character `-` primarily serves as the leading sign of a numeric literal (Section 3.1). When `-` appears immediately before OPTIONAL whitespace and then binary digits, it is parsed as part of the numeric literal (that is, a signed literal). In other contexts (for example inside an index expression) a single `-` token is recognized as a dash used for slice notation `lo-hi`. If `-` appears in any other unsupported context the lexer MUST raise a syntax error. |
| 89 | +The character `-` primarily serves as the leading sign of a numeric literal (Section 3.1). A signed numeric literal MUST place the sign before the base prefix (for example, `-0xA`, `-0d10.5`). In other contexts (for example inside an index expression) a single `-` token is recognized as a dash used for slice notation `lo-hi`. If `-` appears in any other unsupported context the lexer MUST raise a syntax error. |
90 | 90 |
|
91 | 91 | Identifiers denote variables and user-defined functions. They MUST be non-empty and case-sensitive. An identifier MUST NOT contain non-ASCII characters, nor any of the following characters: `{`, `}`, `[`, `]`, `(`, `)`, `=`, `,`, `!`. The first character of an identifier MUST NOT be the digit `0` or `1` (these digits are used to begin binary integer literals). However, the characters `0` and `1` are permitted in subsequent positions within an identifier (for example, `a01` and `X10Y` are valid identifiers, while `0foo` and `1bar` are not). The namespace is flat: variables and functions share a single identifier space, so a given name cannot simultaneously denote both. A user-defined function name MUST NOT conflict with the name of any built-in operator or function (see Section 13). |
92 | 92 |
|
|
115 | 115 |
|
116 | 116 | As noted above, non-ASCII characters remain disallowed, and the delimiter characters `{`, `}`, `(`, `)`, `=`, `,`, and `!` are never permitted inside identifiers. |
117 | 117 |
|
118 | | -This deliberately-permissive identifier character set preserves an unambiguous lexical distinction between binary integer literals (which MUST begin with `0` or `1`) and identifiers, while allowing module-qualified names and other symbolic conventions to be expressed directly as plain identifiers in source code. |
| 118 | +This deliberately-permissive identifier character set preserves an unambiguous lexical distinction between numeric literals (which MUST begin with a base prefix starting with `0`) and identifiers, while allowing module-qualified names and other symbolic conventions to be expressed directly as plain identifiers in source code. |
119 | 119 |
|
120 | 120 | ### Pointer operator |
121 | 121 |
|
|
124 | 124 |
|
125 | 125 | ## 3. Data Model |
126 | 126 |
|
127 | | -Prefix supports seven runtime data types: binary integers, binary floating-point numbers, strings, non-scalar tensors, first-class functions, associative maps, and thread handles. |
| 127 | +Prefix supports seven runtime data types: based integers, based floating-point numbers, strings, non-scalar tensors, first-class functions, associative maps, and thread handles. |
128 | 128 |
|
129 | | -Binary integer literal: an unsigned non-empty sequence of `{0,1}` (for example, `0`, `1`, `1011`), or a signed literal formed by a leading `-` (the dash is part of the literal, not an operator) followed by OPTIONAL spaces, tabs, or carriage returns and then a non-empty sequence of `{0,1}`. A `-` that does not immediately introduce a literal is a syntax error. |
| 129 | +Numeric literals (`INT` and `FLT`) MUST include an explicit base prefix. Valid prefixes are: |
130 | 130 |
|
131 | | -Binary floating-point literal: an IEEE754 floating-point value written in binary fixed-point notation `n.n`, where both sides of the radix point are non-empty sequences of `{0,1}`. Examples: |
| 131 | +- `0b` = base 2 digits `0-1` |
| 132 | +- `0o` = base 8 digits `0-7` |
| 133 | +- `0d` = base 10 digits `0-9` |
| 134 | +- `0x` = base 16 digits `0-9` and `A-F` |
| 135 | +- `0t` = base 32 digits `0-9` and `A-V` |
| 136 | +- `0c` = base 58 digits `1-9`, `A-H`, `J-N`, `P-Z`, `a-k`, `m-z` |
| 137 | +- `0s` = base 64 digits `0-9`, `A-Z`, `a-z`, `+`, `_` |
| 138 | +- `0rNN` = base `NN` where `NN` is two decimal digits and `2 <= NN <= 64` |
132 | 139 |
|
133 | | -- `0.1` denotes one-half. |
| 140 | +Signed numeric literals MUST place the sign before the prefix (for example, `-0d10`, `-0xA.8`). `FLT` literals MUST include both integer and fractional parts around the radix point. `FLT` infinities and NaN are written as `INF`, `-INF`, and `NaN`, and do not carry a numeric base prefix (they are considered base-NaN). |
134 | 141 |
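The prefix table and sign rule above can be illustrated with a short parser for `INT` literals. This is a minimal Python sketch under stated assumptions: `ALPHABETS` and `parse_int_literal` are hypothetical names, a digit's value is assumed to be its position in the order the ranges are listed, no whitespace is accepted between the sign and the prefix, and the `0rNN` form is left out because its digit alphabet is not given here.

```python
import string

_ALNUM = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Digit alphabets per prefix; a digit's value is its index (assumption).
ALPHABETS = {
    "0b": "01",
    "0o": "01234567",
    "0d": string.digits,
    "0x": string.digits + "ABCDEF",
    "0t": string.digits + string.ascii_uppercase[:22],      # 0-9, A-V
    "0c": "".join(c for c in _ALNUM if c not in "0OIl"),    # 58 symbols
    "0s": _ALNUM + "+_",                                    # 64 symbols
}


def parse_int_literal(text):
    """Return (value, base) for a base-prefixed INT literal."""
    sign = 1
    if text.startswith("-"):
        # The sign must come before the prefix, e.g. -0xA.
        sign, text = -1, text[1:]
    prefix, digits = text[:2], text[2:]
    if prefix not in ALPHABETS or not digits:
        raise ValueError("missing base prefix or digits")
    alphabet = ALPHABETS[prefix]
    base = len(alphabet)
    value = 0
    for ch in digits:
        digit = alphabet.find(ch)
        if digit < 0:
            raise ValueError(f"invalid digit {ch!r} for prefix {prefix}")
        value = value * base + digit
    return sign * value, base
```

Returning the base alongside the value mirrors the requirement that numeric values carry their base at runtime. A sign placed after the prefix (for example `0x-A`) is rejected because `-` is not a digit in any of these alphabets.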
|
135 | | -- `0.01` denotes one-quarter. |
| 142 | +`INT` and `FLT` values store their base at runtime. The result of a mathematical operation MUST use the highest base present in its numeric operands. Bases below 2 or above 64 are invalid and MUST raise an error when requested (for example via `CONVERT`). |
136 | 143 |
|
137 | | -- `0.11` denotes three-quarters. |
| 144 | +When numeric values are converted to `STR` (including by `PRINT`), they MUST be rendered in their own base and MUST include the base prefix. `INF`, `-INF`, and `NaN` are rendered exactly as shown. |
138 | 145 |
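The highest-base rule and the prefixed rendering can be sketched together. The Python below is an illustration only: `add` and `render_int` are hypothetical helpers, an `INT` is modeled as a `(value, base)` pair, and only bases 2, 8, 10, 16, and 32 are covered, since the digit ordering for the other bases is not pinned down here.

```python
import string

# Fixed prefixes for the bases handled by this sketch.
PREFIX_FOR_BASE = {2: "0b", 8: "0o", 10: "0d", 16: "0x", 32: "0t"}
DIGITS = string.digits + string.ascii_uppercase  # 0-9 then A-Z


def add(a, b):
    """Add two (value, base) pairs; the result takes the higher base."""
    (a_value, a_base), (b_value, b_base) = a, b
    return a_value + b_value, max(a_base, b_base)


def render_int(value, base):
    """Render an INT in its own base, with sign before the prefix."""
    prefix = PREFIX_FOR_BASE[base]
    sign = "-" if value < 0 else ""
    remaining = abs(value)
    digits = ""
    while True:
        remaining, digit = divmod(remaining, base)
        digits = DIGITS[digit] + digits
        if remaining == 0:
            break
    return sign + prefix + digits
```

Under these assumptions, adding a decimal ten to a hexadecimal ten gives a hexadecimal result: `render_int(*add((10, 10), (10, 16)))` produces `0x14`.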
|
139 | | -`FLT` literals MUST NOT begin with the radix point (so `.1` is invalid). A leading `-` MAY prefix a `FLT` literal using the same rules as for integers (the dash is part of the literal and is not an operator). |
| 146 | +The built-in operators `CONVERT(num, base)` and `BASE(num)` are provided. `CONVERT` returns `num` represented in `base` (2..64). `BASE` returns the numeric base for non-NaN-base values. |
140 | 147 |
|
141 | | -In addition to binary fixed-point forms, the language also recognizes two special `FLT` literal tokens: `INF` and `NaN`. These tokens are matched case-sensitively and are part of the `FLT` literal class. `INF` denotes IEEE-754 infinity and `NaN` denotes a quiet Not-a-Number. `NaN` MUST NOT be negative; `-INF` (the dash token followed by `INF`) is permitted and denotes negative infinity. When `FLT` values are rendered as source, printed via `PRINT`, or serialized by the interpreter, `INF`, `-INF`, and `NaN` MUST appear exactly as shown. Arithmetic and comparison involving these values follow IEEE-754 semantics. |
| 148 | +Bitwise shift/logic operators `BAND`, `BOR`, `BXOR`, `SHL`, and `SHR` accept only binary (`0b`) `INT` operands. |
142 | 149 |
|
143 | 150 | String literal: a sequence of characters enclosed in either double quotation marks (`"`) or single quotation marks (`'`). A string opened with one delimiter MUST be closed with the same delimiter. Newlines are not permitted inside string literals. |
144 | 151 |
|
|
218 | 225 |
|
219 | 226 | When shared, aliased containers are required, developers should use pointer references (with `@`). |
220 | 227 |
|
221 | | -Every runtime value has a static type: `INT`, `FLT`, `STR`, `TNS`, `MAP`, `THR`, or `FUNC`. Integers are conceptually unbounded mathematical integers. Floats are IEEE754 binary floating-point numbers. Strings are sequences of characters (source text is ASCII, but escape codes MAY denote non-ASCII code points). Tensors are non-scalar aggregates whose elements MAY be `INT`, `FLT`, `STR`, `MAP`, `THR`, `FUNC`, or `TNS`. Maps are associative containers mapping scalar keys (`INT`, `FLT`, or `STR`) to values of a single static type. Threads are handles to parallel code blocks. Functions are user-defined code blocks with lexical closures. |
| 228 | +Every runtime value has a static type: `INT`, `FLT`, `STR`, `TNS`, `MAP`, `THR`, or `FUNC`. Integers are conceptually unbounded mathematical integers with an attached base in `[2,64]`. Floats are IEEE754 floating-point numbers with an attached base in `[2,64]` or base-NaN for `INF`/`NaN`. Strings are sequences of characters (source text is ASCII, but escape codes MAY denote non-ASCII code points). Tensors are non-scalar aggregates whose elements MAY be `INT`, `FLT`, `STR`, `MAP`, `THR`, `FUNC`, or `TNS`. Maps are associative containers mapping scalar keys (`INT`, `FLT`, or `STR`) to values of a single static type. Threads are handles to parallel code blocks. Functions are user-defined code blocks with lexical closures. |
222 | 229 |
|
223 | 230 | Function value (`FUNC`): a reference to a user-defined function body (including its lexical closure). A `FUNC` value can be stored in variables or tensors, passed as an argument, or returned from a function. The call syntax applies to any expression that evaluates to `FUNC`; for example, `alias()` calls the function bound to `alias`, and `tns[1]()` calls the function stored in that tensor element. `FUNC` values are always truthy; equality compares object identity (two references are equal only if they refer to the same function definition). String rendering produces an implementation-defined placeholder such as `<func name>`. |
224 | 231 |
|
225 | | -When a Boolean interpretation is required, `INT` treats 0 as false and non-zero as true; `FLT` treats 0.0 as false and any non-zero value as true; `STR` treats the empty string as false and any non-empty string as true; `TNS` is true if any contained element is true by these rules, otherwise false. Control-flow conditions (`IF`, `ELSEIF`, `WHILE`) and `ASSERT` convert strings to integers using the same rules as the `INT` built-in; tensors are first reduced to their Boolean truth value (1 or 0). |
| 232 | +When a Boolean interpretation is required, `INT` treats 0 as false and non-zero as true; `FLT` treats 0.0 as false and any non-zero value as true; `STR` treats the empty string as false and any non-empty string as true; `TNS` is true if any contained element is true by these rules, otherwise false. Control-flow conditions (`IF`, `ELSEIF`, `WHILE`) and `ASSERT` convert strings to integers using the same rules as the `INT` built-in (base-prefixed parse), and tensors are first reduced to their Boolean truth value (1 or 0). |
226 | 233 |
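The truthiness rules can be expressed as one recursive function. The Python sketch below is illustrative: `truthy` is a hypothetical name, Python `int`, `float`, `str`, and nested `list` values stand in for `INT`, `FLT`, `STR`, and `TNS`, and any callable stands in for `FUNC`. `MAP` and `THR` are rejected because their Boolean interpretation is not described in this passage.

```python
def truthy(value):
    """Boolean interpretation of a stand-in Prefix value."""
    if isinstance(value, (int, float)):
        # INT: non-zero is true; FLT: anything other than 0.0 is true.
        return value != 0
    if isinstance(value, str):
        # STR: only the empty string is false.
        return value != ""
    if isinstance(value, list):
        # TNS: true if any contained element is true, recursively.
        return any(truthy(item) for item in value)
    if callable(value):
        # FUNC values are always truthy.
        return True
    raise TypeError("truthiness for this type is not covered by the sketch")
```

Note that this covers only the general Boolean interpretation. The string-to-integer conversion applied by `IF`, `ELSEIF`, `WHILE`, and `ASSERT` is a separate step and is not modeled.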
|
227 | 234 | `INT` and `FLT` are not interoperable: no implicit conversion occurs. Operators that accept both types require that all numeric arguments have the same numeric type. |
228 | 235 |
|
|