Commit 57d8fd8

gh-57: Add support for multiple bases to Prefix.
1 parent 2f74cdb commit 57d8fd8

21 files changed

Lines changed: 1413 additions & 1327 deletions

docs/SPECIFICATION.html

Lines changed: 20 additions & 13 deletions
@@ -64,7 +64,7 @@

 Programs consist of variable declarations via assignment, expressions, and control-flow constructs such as `IF`, `ELSEIF`, `ELSE`, `WHILE`, and `FOR`.

-Prefix has seven runtime data types: binary integers (`INT`), binary floating-point numbers (`FLT`, IEEE754), strings (`STR`), non-scalar tensors (`TNS`), first-class user-defined functions (`FUNC`), associative maps (`MAP`), and thread handles (`THR`).
+Prefix has seven runtime data types: based integers (`INT`), based floating-point numbers (`FLT`, IEEE754), strings (`STR`), non-scalar tensors (`TNS`), first-class user-defined functions (`FUNC`), associative maps (`MAP`), and thread handles (`THR`).

 Identifiers, function parameters, and return values are statically typed; the type of every symbol MUST be declared when it is first introduced. Computation proceeds by evaluating expressions and executing statements in sequence, with explicit constructs for branching and looping. Input and output are modeled through built-in operators, in particular `INPUT` and `PRINT`.

@@ -86,7 +86,7 @@

 Line continuation: The character `^` serves as a line-continuation marker. When a caret `^` appears in the source and is followed immediately by a newline, both the `^` and the newline are ignored by the lexer (that is, the logical line continues on the next physical line). The lexer also accepts a caret immediately before a code note (a comment beginning with `!`); in this case the `^`, the comment text up to the line terminator, and the terminating newline are treated as if they were not present. If a `^` is present in a string, it does not count as a line continuation. If a caret appears and is not immediately followed by a newline, a code note, or the platform's single-character newline sequence, the lexer MUST raise a syntax error.

-The character `-` primarily serves as the leading sign of a numeric literal (Section 3.1). When `-` appears immediately before OPTIONAL whitespace and then binary digits, it is parsed as part of the numeric literal (that is, a signed literal). In other contexts (for example inside an index expression) a single `-` token is recognized as a dash used for slice notation `lo-hi`. If `-` appears in any other unsupported context the lexer MUST raise a syntax error.
+The character `-` primarily serves as the leading sign of a numeric literal (Section 3.1). A signed numeric literal MUST place the sign before the base prefix (for example, `-0xA`, `-0d10.5`). In other contexts (for example inside an index expression) a single `-` token is recognized as a dash used for slice notation `lo-hi`. If `-` appears in any other unsupported context the lexer MUST raise a syntax error.

 Identifiers denote variables and user-defined functions. They MUST be non-empty and case-sensitive. An identifier MUST NOT contain non-ASCII characters, nor any of the following characters: `{`, `}`, `[`, `]`, `(`, `)`, `=`, `,`, `!`. The first character of an identifier MUST NOT be the digit `0` or `1` (these digits are used to begin binary integer literals). However, the characters `0` and `1` are permitted in subsequent positions within an identifier (for example, `a01` and `X10Y` are valid identifiers, while `0foo` and `1bar` are not). The namespace is flat: variables and functions share a single identifier space, so a given name cannot simultaneously denote both. A user-defined function name MUST NOT conflict with the name of any built-in operator or function (see Section 13).

@@ -115,7 +115,7 @@

 As noted above, non-ASCII characters remain disallowed, and the delimiter characters `{`, `}`, `(`, `)`, `=`, `,`, and `!` are never permitted inside identifiers.

-This deliberately-permissive identifier character set preserves an unambiguous lexical distinction between binary integer literals (which MUST begin with `0` or `1`) and identifiers, while allowing module-qualified names and other symbolic conventions to be expressed directly as plain identifiers in source code.
+This deliberately-permissive identifier character set preserves an unambiguous lexical distinction between numeric literals (which MUST begin with a base prefix starting with `0`) and identifiers, while allowing module-qualified names and other symbolic conventions to be expressed directly as plain identifiers in source code.

 ### Pointer operator

@@ -124,21 +124,28 @@

 ## 3. Data Model

-Prefix supports seven runtime data types: binary integers, binary floating-point numbers, strings, non-scalar tensors, first-class functions, associative maps, and thread handles.
+Prefix supports seven runtime data types: based integers, based floating-point numbers, strings, non-scalar tensors, first-class functions, associative maps, and thread handles.

-Binary integer literal: an unsigned non-empty sequence of `{0,1}` (for example, `0`, `1`, `1011`), or a signed literal formed by a leading `-` (the dash is part of the literal, not an operator) followed by OPTIONAL spaces, tabs, or carriage returns and then a non-empty sequence of `{0,1}`. A `-` that does not immediately introduce a literal is a syntax error.
+Numeric literals (`INT` and `FLT`) MUST include an explicit base prefix. Valid prefixes are:

-Binary floating-point literal: an IEEE754 floating-point value written in binary fixed-point notation `n.n`, where both sides of the radix point are non-empty sequences of `{0,1}`. Examples:
+- `0b` = base 2 digits `0-1`
+- `0o` = base 8 digits `0-7`
+- `0d` = base 10 digits `0-9`
+- `0x` = base 16 digits `0-9` and `A-F`
+- `0t` = base 32 digits `0-9` and `A-V`
+- `0c` = base 58 digits `1-9`, `A-H`, `J-N`, `P-Z`, `a-k`, `m-z`
+- `0s` = base 64 digits `0-9`, `A-Z`, `a-z`, `+`, `_`
+- `0rNN` = base `NN` where `NN` is two decimal digits and `2 <= NN <= 64`

-- `0.1` denotes one-half.
+Signed numeric literals MUST place the sign before the prefix (for example, `-0d10`, `-0xA.8`). `FLT` literals MUST include both integer and fractional parts around the radix point. `FLT` infinities and NaN are written as `INF`, `-INF`, and `NaN`, and do not carry a numeric base prefix (they are considered base-NaN).

-- `0.01` denotes one-quarter.
+`INT` and `FLT` values store their base at runtime. The result of a mathematical operation MUST use the highest base present in its numeric operands. Bases below 2 or above 64 are invalid and MUST raise an error when requested (for example via `CONVERT`).

-- `0.11` denotes three-quarters.
+When numeric values are converted to `STR` (including by `PRINT`), they MUST be rendered in their own base and MUST include the base prefix. `INF`, `-INF`, and `NaN` are rendered exactly as shown.

-`FLT` literals MUST NOT begin with the radix point (so `.1` is invalid). A leading `-` MAY prefix a `FLT` literal using the same rules as for integers (the dash is part of the literal and is not an operator).
+The built-in operators `CONVERT(num, base)` and `BASE(num)` are provided. `CONVERT` returns `num` represented in `base` (2..64). `BASE` returns the numeric base for non-NaN-base values.

-In addition to binary fixed-point forms, the language also recognizes two special `FLT` literal tokens: `INF` and `NaN`. These tokens are matched case-sensitively and are part of the `FLT` literal class. `INF` denotes IEEE-754 infinity and `NaN` denotes a quiet Not-a-Number. `NaN` MUST NOT be negative; `-INF` (the dash token followed by `INF`) is permitted and denotes negative infinity. When `FLT` values are rendered as source, printed via `PRINT`, or serialized by the interpreter, `INF`, `-INF`, and `NaN` MUST appear exactly as shown. Arithmetic and comparison involving these values follow IEEE-754 semantics.
+Bitwise shift/logic operators `BAND`, `BOR`, `BXOR`, `SHL`, and `SHR` accept only binary (`0b`) `INT` operands.

 String literal: a sequence of characters enclosed in either double quotation marks (`"`) or single quotation marks (`'`). A string opened with one delimiter MUST be closed with the same delimiter. Newlines are not permitted inside string literals.

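The prefix table and the sign-before-prefix rule in the hunk above can be illustrated with a small scanner. This is a minimal sketch, not the lexer from this commit: the function name `scan_literal_head` and its out-parameters are hypothetical, and it assumes no whitespace is permitted between the sign and the prefix, which the new text does not state either way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the commit): inspects the start of a numeric
 * literal. Returns the base named by its prefix, or 0 if the text does not
 * begin with an optional '-' followed by a valid base prefix.
 * On success, *negative is 1 for a leading '-', and *len is the number of
 * characters consumed by the sign and prefix together. */
static int scan_literal_head(const char *s, int *negative, size_t *len) {
    size_t i = 0;
    size_t prefix_len = 2;
    int base = 0;

    assert(s != NULL && negative != NULL && len != NULL);

    *negative = (s[0] == '-');
    if (*negative) i = 1;            /* the sign comes before the prefix */

    if (s[i] != '0' || s[i + 1] == '\0') return 0;

    switch (s[i + 1]) {
        case 'b': base = 2;  break;
        case 'o': base = 8;  break;
        case 'd': base = 10; break;
        case 'x': base = 16; break;
        case 't': base = 32; break;
        case 'c': base = 58; break;
        case 's': base = 64; break;
        case 'r':                    /* 0rNN: exactly two decimal digits */
            if (!isdigit((unsigned char)s[i + 2]) ||
                !isdigit((unsigned char)s[i + 3])) return 0;
            base = (s[i + 2] - '0') * 10 + (s[i + 3] - '0');
            prefix_len = 4;
            break;
        default:
            return 0;
    }

    if (base < 2 || base > 64) return 0;   /* out-of-range 0rNN is invalid */
    *len = i + prefix_len;
    return base;
}
```

A caller would then validate the remaining characters against the digit alphabet for the returned base. The sketch deliberately stops at the prefix, because the hunk does not define a digit alphabet for arbitrary `0rNN` bases.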
@@ -218,11 +225,11 @@

 When shared, aliased containers are required, developers should use pointer references (with `@`).

-Every runtime value has a static type: `INT`, `FLT`, `STR`, `TNS`, `MAP`, `THR`, or `FUNC`. Integers are conceptually unbounded mathematical integers. Floats are IEEE754 binary floating-point numbers. Strings are sequences of characters (source text is ASCII, but escape codes MAY denote non-ASCII code points). Tensors are non-scalar aggregates whose elements MAY be `INT`, `FLT`, `STR`, `MAP`, `THR`, `FUNC`, or `TNS`. Maps are associative containers mapping scalar keys (`INT`, `FLT`, or `STR`) to values of a single static type. Threads are handles to parallel code blocks. Functions are user-defined code blocks with lexical closures.
+Every runtime value has a static type: `INT`, `FLT`, `STR`, `TNS`, `MAP`, `THR`, or `FUNC`. Integers are conceptually unbounded mathematical integers with an attached base in `[2,64]`. Floats are IEEE754 floating-point numbers with an attached base in `[2,64]` or base-NaN for `INF`/`NaN`. Strings are sequences of characters (source text is ASCII, but escape codes MAY denote non-ASCII code points). Tensors are non-scalar aggregates whose elements MAY be `INT`, `FLT`, `STR`, `MAP`, `THR`, `FUNC`, or `TNS`. Maps are associative containers mapping scalar keys (`INT`, `FLT`, or `STR`) to values of a single static type. Threads are handles to parallel code blocks. Functions are user-defined code blocks with lexical closures.

 Function value (`FUNC`): a reference to a user-defined function body (including its lexical closure). A `FUNC` value can be stored in variables or tensors, passed as an argument, or returned from a function. The call syntax applies to any expression that evaluates to `FUNC`; for example, `alias()` calls the function bound to `alias`, and `tns[1]()` calls the function stored in that tensor element. `FUNC` values are always truthy; equality compares object identity (two references are equal only if they refer to the same function definition). String rendering produces an implementation-defined placeholder such as `<func name>`.

-When a Boolean interpretation is required, `INT` treats 0 as false and non-zero as true; `FLT` treats 0.0 as false and any non-zero value as true; `STR` treats the empty string as false and any non-empty string as true; `TNS` is true if any contained element is true by these rules, otherwise false. Control-flow conditions (`IF`, `ELSEIF`, `WHILE`) and `ASSERT` convert strings to integers using the same rules as the `INT` built-in; tensors are first reduced to their Boolean truth value (1 or 0).
+When a Boolean interpretation is required, `INT` treats 0 as false and non-zero as true; `FLT` treats 0.0 as false and any non-zero value as true; `STR` treats the empty string as false and any non-empty string as true; `TNS` is true if any contained element is true by these rules, otherwise false. Control-flow conditions (`IF`, `ELSEIF`, `WHILE`) and `ASSERT` convert strings to integers using the same rules as the `INT` built-in (base-prefixed parse), and tensors are first reduced to their Boolean truth value (1 or 0).

 `INT` and `FLT` are not interoperable: no implicit conversion occurs. Operators that accept both types require that all numeric arguments have the same numeric type.

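Two of the specification rules changed in this file, "the result of a mathematical operation MUST use the highest base present in its numeric operands" and "rendered in their own base and MUST include the base prefix", can be sketched as follows. The names `result_base`, `prefix_letter`, and `render_int` are hypothetical and not taken from the interpreter. The renderer is an assumption-laden subset: it covers only `0b`, `0o`, `0d`, `0x`, and `0t`, whose digit sets all come from `0-9A-V`, and it works on `int64_t` rather than unbounded integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest-base rule: the result carries the larger operand base. */
static int result_base(int lhs_base, int rhs_base) {
    return lhs_base > rhs_base ? lhs_base : rhs_base;
}

/* Prefix letter for the bases this sketch supports; '\0' otherwise. */
static char prefix_letter(int base) {
    switch (base) {
        case 2:  return 'b';
        case 8:  return 'o';
        case 10: return 'd';
        case 16: return 'x';
        case 32: return 't';
        default: return '\0';
    }
}

/* Renders value in its own base with sign before the prefix, e.g. "-0xA".
 * Returns 0 on success, -1 for an unsupported base or a too-small buffer. */
static int render_int(int64_t value, int base, char *out, size_t cap) {
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
    char tmp[64];                    /* a 64-bit magnitude has <= 64 digits */
    size_t n = 0, i = 0;
    char letter = prefix_letter(base);
    uint64_t mag;

    assert(out != NULL);
    if (letter == '\0') return -1;

    /* Unsigned negation avoids overflow for INT64_MIN. */
    mag = value < 0 ? (uint64_t)0 - (uint64_t)value : (uint64_t)value;
    do {
        tmp[n++] = digits[mag % (uint64_t)base];
        mag /= (uint64_t)base;
    } while (mag != 0);

    if ((size_t)(value < 0) + 2 + n + 1 > cap) return -1;

    if (value < 0) out[i++] = '-';
    out[i++] = '0';
    out[i++] = letter;
    while (n > 0) out[i++] = tmp[--n];   /* digits were produced in reverse */
    out[i] = '\0';
    return 0;
}
```

Under these assumptions, adding a `0b` operand to a `0x` operand would render its result through `render_int(sum, result_base(2, 16), buf, sizeof buf)`, producing a `0x`-prefixed string.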

src/ast.c

Lines changed: 7 additions & 4 deletions
@@ -17,21 +17,24 @@ static void* ast_alloc(size_t size) {
     return ptr;
 }

-Expr* expr_int(int64_t value, int line, int column) {
+Expr* expr_int(int64_t value, int base, int line, int column) {
     Expr* expr = ast_alloc(sizeof(Expr));
     expr->type = EXPR_INT;
     expr->line = line;
     expr->column = column;
-    expr->as.int_value = value;
+    expr->as.int_value.value = value;
+    expr->as.int_value.base = base;
     return expr;
 }

-Expr* expr_flt(double value, int line, int column) {
+Expr* expr_flt(double value, int base, int base_is_nan, int line, int column) {
     Expr* expr = ast_alloc(sizeof(Expr));
     expr->type = EXPR_FLT;
     expr->line = line;
     expr->column = column;
-    expr->as.flt_value = value;
+    expr->as.flt_value.value = value;
+    expr->as.flt_value.base = base;
+    expr->as.flt_value.base_is_nan = base_is_nan;
     return expr;
 }
3740

src/ast.h

Lines changed: 4 additions & 4 deletions
@@ -56,8 +56,8 @@ struct Expr {
     int line;
     int column;
     union {
-        int64_t int_value;
-        double flt_value;
+        struct { int64_t value; int base; } int_value;
+        struct { double value; int base; int base_is_nan; } flt_value;
         char* str_value;
         char* ident;
         char* ptr_name;
@@ -150,8 +150,8 @@ struct Stmt {
     } as;
 };

-Expr* expr_int(int64_t value, int line, int column);
-Expr* expr_flt(double value, int line, int column);
+Expr* expr_int(int64_t value, int base, int line, int column);
+Expr* expr_flt(double value, int base, int base_is_nan, int line, int column);
 Expr* expr_str(char* value, int line, int column);
 Expr* expr_ptr(char* name, int line, int column);
 Expr* expr_ident(char* name, int line, int column);
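For reviewers updating call sites, the sketch below shows how the widened constructors might be used for literals such as `-0xA`, `0d10.5`, and `INF`. It is a reduced stand-in, not the project's code: the `Expr` type here keeps only the two variants touched by this commit, `calloc` replaces `ast_alloc`, and passing `0` as the base when `base_is_nan` is set is an assumption, since the diff does not show what callers pass in that case.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for the real Expr: only the INT and FLT variants. */
typedef enum { EXPR_INT, EXPR_FLT } ExprType;

typedef struct Expr {
    ExprType type;
    int line;
    int column;
    union {
        struct { int64_t value; int base; } int_value;
        struct { double value; int base; int base_is_nan; } flt_value;
    } as;
} Expr;

static Expr *expr_int(int64_t value, int base, int line, int column) {
    Expr *expr = calloc(1, sizeof *expr);   /* stand-in for ast_alloc */
    assert(expr != NULL);
    expr->type = EXPR_INT;
    expr->line = line;
    expr->column = column;
    expr->as.int_value.value = value;
    expr->as.int_value.base = base;
    return expr;
}

static Expr *expr_flt(double value, int base, int base_is_nan,
                      int line, int column) {
    Expr *expr = calloc(1, sizeof *expr);   /* stand-in for ast_alloc */
    assert(expr != NULL);
    expr->type = EXPR_FLT;
    expr->line = line;
    expr->column = column;
    expr->as.flt_value.value = value;
    expr->as.flt_value.base = base;
    expr->as.flt_value.base_is_nan = base_is_nan;
    return expr;
}

/* Hypothetical call-site helper: INF carries no numeric base, so the base
 * field is set to 0 (an assumption) and base_is_nan is set to 1. */
static Expr *make_inf_literal(int negative, int line, int column) {
    return expr_flt(negative ? -INFINITY : INFINITY, 0, 1, line, column);
}
```

With these definitions, `expr_int(-10, 16, line, column)` would represent `-0xA` and `expr_flt(10.5, 10, 0, line, column)` would represent `0d10.5`. Every existing caller of the old three-argument forms needs the extra arguments, which is consistent with the large number of files touched by this commit.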
