
Commit 8cef90e

Decode arrays given as Constant operands (#674)
In preparation for decoding allocated constants:

* created a new module `RT-DECODING` in `rt/decoding.md`
* rewriting of `#decodeConstant` for allocated constants now goes through `#decodeValue` in this module
* added a decoder for arrays of values (iteratively calls `#decodeValue` for each array element, consuming the given bytes)

The added test program contains code with arrays inlined into the function calls as constant operands.
1 parent 9f57b55 commit 8cef90e

13 files changed

Lines changed: 8963 additions & 77 deletions

kmir/src/kmir/kdist/mir-semantics/rt/data.md

Lines changed: 35 additions & 31 deletions
````diff
@@ -9,6 +9,7 @@ requires "../ty.md"
 requires "./types.md"
 requires "./value.md"
 requires "./numbers.md"
+requires "./decoding.md"
 
 module RT-DATA
 imports INT
@@ -24,8 +25,10 @@ module RT-DATA
 
 imports RT-VALUE-SYNTAX
 imports RT-NUMBERS
+imports RT-DECODING
 imports RT-TYPES
 imports KMIR-CONFIGURATION
+
 ```
 
 ## Operations on local variables
````
````diff
@@ -74,20 +77,21 @@ To ensure the sort coercions above do not cause any harm, some definedness-relat
 
 ### Evaluating Items to `Value`s
 
-Some built-in operations (`RValue` or type casts) use constructs that will evaluate to a value of sort `Value`.
-The basic operations of reading and writing those values can use K's "heating" and "cooling" rules to describe their evaluation.
-Other uses of heating and cooling are to _read_ local variables as operands.
-A `TypedValue` stored as a local is trivially rewritten to `Value` by projecting out the value.
+Many rules for MIR constructs in this module use heating and cooling
+to evaluate expressions to results or read local variables as operands.
+The `Evaluation` sort gathers all constructs that can evaluate to a `Value`, defined together with `Value`.
+
+First, a `TypedValue` stored in a local is trivially rewritten to `Value` by projecting out the value.
 It is an error to read `NewLocal` or `Moved`.
 
 ```k
-syntax Evaluation ::= TypedValue | Value // other sorts are added at the first use site
-
-syntax KResult ::= Value
+syntax Evaluation ::= TypedValue
 
 rule <k> typedValue(VAL, _, _) => VAL ... </k> [priority(100)]
 ```
 
+Other subsorts of `Evaluation` are defined when first used.
+
 ### `thunk`
 
 We also create a subsort of `Value` that is a `thunk` which takes an `Evaluation` as an argument.
````
````diff
@@ -1232,47 +1236,47 @@ What can be supported without additional layout consideration is trivial casts b
 
 | CastKind | Description |
 |------------------------------|-------------|
-| PointerExposeProvenance | |
+| PointerExposeAddress | |
 | PointerWithExposedProvenance | |
 | FnPtrToPtr | |
 
 ## Decoding constants from their bytes representation to values
 
 The `Value` sort above operates at a higher level than the bytes representation found in the MIR syntax for constant values.
 The bytes have to be interpreted according to the given `TypeInfo` to produce the higher-level value.
-This is currently only defined for `PrimitiveType`s (primitive types in MIR).
 
 ```k
 syntax Evaluation ::= #decodeConstant ( ConstantKind, Ty, TypeInfo )
+```
 
-//////////////////////////////////////////////////////////////////////////////////////
-// decoding the correct amount of bytes depending on base type size
+For allocated constants without provenance, the decoder works directly with the bytes.
 
-// Boolean: should be one byte with value one or zero
-rule <k> #decodeConstant(constantKindAllocated(allocation(BYTES, _, _, _)), _TY, typeInfoPrimitiveType(primTypeBool))
-=> BoolVal(false) ... </k>
-requires 0 ==Int Bytes2Int(BYTES, LE, Unsigned) andBool lengthBytes(BYTES) ==Int 1
+```k
+rule <k> #decodeConstant(
+constantKindAllocated(allocation(BYTES, provenanceMap(.ProvenanceMapEntries), _, _)),
+_TY,
+TYPEINFO
+)
+=> #decodeValue(BYTES, TYPEINFO, TYPEMAP)
+...
+</k>
+<types> TYPEMAP </types>
+```
 
-rule <k> #decodeConstant(constantKindAllocated(allocation(BYTES, _, _, _)), _TY, typeInfoPrimitiveType(primTypeBool))
-=> BoolVal(true) ... </k>
-requires 1 ==Int Bytes2Int(BYTES, LE, Unsigned) andBool lengthBytes(BYTES) ==Int 1
+Zero-sized types can be decoded trivially into their respective representation.
 
-// Integer: handled in separate module for numeric operation_s
-rule <k> #decodeConstant(constantKindAllocated(allocation(BYTES, _, _, _)), _TY, TYPEINFO)
-=> #decodeInteger(BYTES, #intTypeOf(TYPEINFO)) ... </k>
-requires #isIntType(TYPEINFO)
-andBool lengthBytes(BYTES) ==K #bitWidth(#intTypeOf(TYPEINFO)) /Int 8
-[preserves-definedness]
+**FIXME test the new cases for tuple and array/slice**
 
-// zero-sized struct types
+```k
+// zero-sized struct
 rule <k> #decodeConstant(constantKindZeroSized, _TY, typeInfoStructType(_, _, _))
 => Aggregate(variantIdx(0), .List) ... </k>
-
-// TODO Char type
-// rule #decodeConstant(constantKindAllocated(allocation(BYTES, _, _, _)), typeInfoPrimitiveType(primTypeChar)) => typedValue(Str(...), TY, mutabilityNot)
-// TODO Float decoding: not supported natively in K
-
-// unimplemented cases stored as thunks
+// zero-sized tuple
+rule <k> #decodeConstant(constantKindZeroSized, _TY, typeInfoTupleType(_))
+=> Aggregate(variantIdx(0), .List) ... </k>
+// zero-sized array
+rule <k> #decodeConstant(constantKindZeroSized, _TY, typeInfoArrayType(_, _))
+=> Range(.List) ... </k>
 ```
 
 ## Primitive operations on numeric data
````
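The reworked `#decodeConstant` rules dispatch on the kind of constant. Allocated constants with an empty provenance map are handed to `#decodeValue` together with their bytes. Zero-sized structs and tuples become an empty `Aggregate` with variant index 0, and zero-sized arrays become an empty `Range`. The Rust sketch below models only that dispatch. The enums `ConstantKind`, `TypeInfo`, `Value` and `Decoded` are hypothetical stand-ins, not KMIR's data types, and `Unhandled` simply marks inputs that none of the rules shown in the diff cover.

```rust
/// Hypothetical model of a MIR constant: raw bytes plus the byte offsets
/// that carry pointer provenance, or a zero-sized constant.
#[derive(Debug, Clone, PartialEq)]
pub enum ConstantKind {
    Allocated { bytes: Vec<u8>, provenance: Vec<usize> },
    ZeroSized,
}

/// Hypothetical, heavily simplified type information.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TypeInfo {
    Struct,
    Tuple,
    Array,
    Int,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    /// Struct or tuple value: variant index and fields.
    Aggregate { variant: usize, fields: Vec<Value> },
    /// Array or slice value.
    Range(Vec<Value>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Decoded {
    /// Value produced directly, without looking at any bytes.
    Done(Value),
    /// Bytes handed on to the byte decoder (`#decodeValue` in the K rules).
    DecodeBytes(Vec<u8>, TypeInfo),
    /// None of the rules modelled here applies.
    Unhandled,
}

pub fn decode_constant(kind: &ConstantKind, ty: TypeInfo) -> Decoded {
    match (kind, ty) {
        // Only allocations without provenance are decoded from their bytes.
        (ConstantKind::Allocated { bytes, provenance }, _) if provenance.is_empty() => {
            Decoded::DecodeBytes(bytes.clone(), ty)
        }
        // Zero-sized structs and tuples: empty aggregate with variant index 0.
        (ConstantKind::ZeroSized, TypeInfo::Struct | TypeInfo::Tuple) => {
            Decoded::Done(Value::Aggregate { variant: 0, fields: Vec::new() })
        }
        // Zero-sized arrays: empty range.
        (ConstantKind::ZeroSized, TypeInfo::Array) => Decoded::Done(Value::Range(Vec::new())),
        _ => Decoded::Unhandled,
    }
}
```

Keeping the provenance check in the match guard mirrors the K rule. There, the empty `provenanceMap(.ProvenanceMapEntries)` is part of the left-hand side, so allocations that contain pointers never reach the byte decoder.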
rt/decoding.md (new file)

Lines changed: 173 additions & 0 deletions
# Allocation Decoding in MIR-Semantics

This module provides functions for decoding byte representations of various allocations into
high-level `Value` representations used by the MIR semantics.

When Rust code contains constants (arrays, structs, enums, etc.), the compiler stores these as
byte sequences in the SMIR JSON output.
The semantics needs to decode these bytes back into structured values that can be operated on at
runtime.
This module contains the decoding functions for different allocation types, handling the conversion
from raw bytes to typed `Value` objects according to Rust's memory layout rules.

```k
requires "../ty.md"
requires "value.md"
requires "numbers.md"

module RT-DECODING
imports BOOL
imports MAP

imports TYPES
imports RT-VALUE-SYNTAX
imports RT-NUMBERS
imports RT-TYPES
```

## Element Decoding Interface to turn bytes into a `Value`

This recursive decoder function checks the byte length and decodes the bytes into a `Value` of the given type.

This is currently only defined for Boolean and integer `PrimitiveType`s (primitive types in MIR)
and for arrays (where layout is trivial); `char` and floating-point values are not decoded yet.

### Decoding `PrimitiveType`s

```k
syntax Evaluation ::= #decodeValue ( Bytes , TypeInfo , Map ) [function, total]
| UnableToDecode( Bytes , TypeInfo )

// Boolean: should be one byte with value one or zero
rule #decodeValue(BYTES, typeInfoPrimitiveType(primTypeBool), _TYPEMAP) => BoolVal(false)
requires 0 ==Int Bytes2Int(BYTES, LE, Unsigned) andBool lengthBytes(BYTES) ==Int 1

rule #decodeValue(BYTES, typeInfoPrimitiveType(primTypeBool), _TYPEMAP) => BoolVal(true)
requires 1 ==Int Bytes2Int(BYTES, LE, Unsigned) andBool lengthBytes(BYTES) ==Int 1

// Integer: handled in separate module for numeric operations
rule #decodeValue(BYTES, TYPEINFO, _TYPEMAP) => #decodeInteger(BYTES, #intTypeOf(TYPEINFO))
requires #isIntType(TYPEINFO) andBool lengthBytes(BYTES) ==Int #elemSize(TYPEINFO)
[preserves-definedness]

// TODO Char type
// rule #decodeConstant(constantKindAllocated(allocation(BYTES, _, _, _)), typeInfoPrimitiveType(primTypeChar)) => typedValue(Str(...), TY, mutabilityNot)

// TODO Float decoding: not supported natively in K
```
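For Booleans and integers, decoding amounts to checking the byte length and reading the bytes as a little-endian number. The Rust sketch below illustrates this under two assumptions. First, that `#decodeInteger` (defined in the numbers module, not shown in this commit) interprets little-endian bytes according to the signedness of the integer type. Second, that widths are limited to 64 bits to keep the arithmetic simple. `IntTy`, `Value`, `decode_bool` and `decode_int` are hypothetical names.

```rust
/// Hypothetical integer type descriptor: bit width and signedness.
#[derive(Debug, Clone, Copy)]
pub struct IntTy {
    pub bits: u32,
    pub signed: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int(i128),
}

/// A `bool` must be exactly one byte holding 0 or 1.
pub fn decode_bool(bytes: &[u8]) -> Option<Value> {
    match bytes {
        [0] => Some(Value::Bool(false)),
        [1] => Some(Value::Bool(true)),
        _ => None, // wrong length or invalid bit pattern
    }
}

/// Decode a little-endian integer of exactly `ty.bits / 8` bytes (widths 8..=64 only).
pub fn decode_int(bytes: &[u8], ty: IntTy) -> Option<Value> {
    let valid_width = ty.bits > 0 && ty.bits <= 64 && ty.bits % 8 == 0;
    if !valid_width || bytes.len() != (ty.bits / 8) as usize {
        return None; // length check, like `lengthBytes(BYTES) ==Int #elemSize(TYPEINFO)`
    }
    // Byte i contributes bits 8*i .. 8*i+7 (little-endian).
    let mut raw: u128 = 0;
    for (i, byte) in bytes.iter().enumerate() {
        raw |= (*byte as u128) << (8 * i);
    }
    let value = if ty.signed {
        // Sign-extend: move the sign bit to bit 127, then shift back arithmetically.
        let shift = 128 - ty.bits;
        ((raw << shift) as i128) >> shift
    } else {
        raw as i128
    };
    Some(Value::Int(value))
}
```

For example, the single byte `0xFE` decodes to `-2` as an `i8` and to `254` as a `u8`. A byte sequence of the wrong length is rejected rather than truncated or padded.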

### Array decoding

Arrays are decoded iteratively. If the array type carries a constant length (`someTyConst`), exactly that many elements are decoded;
otherwise (`noTyConst`) the array is decoded like a slice, taking its length from the number of bytes.
Both rules require the element type's `TypeInfo` to be present in the type map.

```k
rule #decodeValue(BYTES, typeInfoArrayType(ELEMTY, someTyConst(tyConst(LEN, _))), TYPEMAP)
=> #decodeArrayAllocation(BYTES, {TYPEMAP[ELEMTY]}:>TypeInfo, readTyConstInt(LEN, TYPEMAP))
requires ELEMTY in_keys(TYPEMAP)
andBool isTypeInfo(TYPEMAP[ELEMTY])
andBool isInt(readTyConstInt(LEN, TYPEMAP))
[preserves-definedness]

rule #decodeValue(BYTES, typeInfoArrayType(ELEMTY, noTyConst), TYPEMAP)
=> #decodeSliceAllocation(BYTES, {TYPEMAP[ELEMTY]}:>TypeInfo)
requires ELEMTY in_keys(TYPEMAP)
andBool isTypeInfo(TYPEMAP[ELEMTY])
```
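The two array rules differ only in where the length comes from. Both require the element type to be found in the type map. If that lookup fails, neither rule applies, and the module's default `[owise]` rule produces `UnableToDecode`. The following is a hypothetical Rust model of this case split. Type ids are plain `u32` keys, and `TypeInfo`, `ArrayPlan` and `plan_array` are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical type id used as the key of the type map.
type TyId = u32;

#[derive(Debug, Clone, PartialEq)]
pub enum TypeInfo {
    I8,
    I32,
    /// Element type id and, if known, the constant length.
    Array { elem: TyId, len: Option<usize> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum ArrayPlan {
    /// Decode exactly `len` elements.
    Fixed { elem: TypeInfo, len: usize },
    /// Derive the length from the number of bytes.
    Slice { elem: TypeInfo },
    /// Element type unknown, or not an array at all (the `[owise]` case).
    UnableToDecode,
}

pub fn plan_array(ty: &TypeInfo, type_map: &HashMap<TyId, TypeInfo>) -> ArrayPlan {
    match ty {
        TypeInfo::Array { elem, len } => match (type_map.get(elem), len) {
            (Some(elem_info), Some(n)) => ArrayPlan::Fixed { elem: elem_info.clone(), len: *n },
            (Some(elem_info), None) => ArrayPlan::Slice { elem: elem_info.clone() },
            (None, _) => ArrayPlan::UnableToDecode,
        },
        _ => ArrayPlan::UnableToDecode,
    }
}
```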

### Error marker (becomes thunk) for other (unimplemented) cases

All unimplemented cases will become thunks by way of this default rule:

```k
rule #decodeValue(BYTES, TYPEINFO, _TYPEMAP) => UnableToDecode(BYTES, TYPEINFO) [owise]
```

## Helper function to determine the expected byte length for a type

```k
// TODO: this function should go into the rt/types.md module
syntax Int ::= #elemSize ( TypeInfo ) [function]
```

Element sizes are currently defined for `bool` and the integer types:

```k
rule #elemSize(typeInfoPrimitiveType(primTypeBool)) => 1
rule #elemSize(TYPEINFO) => #bitWidth(#intTypeOf(TYPEINFO)) /Int 8
requires #isIntType(TYPEINFO)

rule 0 <=Int #elemSize(_) => true [simplification, preserves-definedness]
```

## Array Allocations

Array allocations contain homogeneous elements stored contiguously in memory.
The main function `#decodeArrayAllocation` takes the raw bytes of an array allocation along with
the element type information and the array length, and converts them into a `Range` value containing the decoded elements.

The decoding process:
1. Takes the byte array, element type information, and array length
2. Iteratively consumes elements from the front of the byte array
3. Decodes each element according to its type using `#decodeValue`
4. Accumulates the decoded elements into a list
5. Returns a `Range` value containing all elements

The byte-consumption approach doubles as validation: if there are surplus bytes, or too few
bytes for the declared array length, no rule applies and the function gets stuck rather than
producing an incorrect result.

```k
syntax Value ::= #decodeArrayAllocation ( Bytes, TypeInfo, Int ) [function]
// bytes, element type info, array length

rule #decodeArrayAllocation(BYTES, ELEMTYPEINFO, LEN)
=> Range(#decodeArrayElements(BYTES, ELEMTYPEINFO, LEN, .List))

syntax List ::= #decodeArrayElements ( Bytes, TypeInfo, Int, List ) [function]
// bytes, elem type info, remaining length, accumulated list

rule #decodeArrayElements(BYTES, _ELEMTYPEINFO, LEN, ACC)
=> ACC
requires LEN <=Int 0
andBool lengthBytes(BYTES) ==Int 0 // exact match - no surplus bytes
[preserves-definedness]

rule #decodeArrayElements(BYTES, ELEMTYPEINFO, LEN, ACC)
=> #decodeArrayElements(
substrBytes(BYTES, #elemSize(ELEMTYPEINFO), lengthBytes(BYTES)),
ELEMTYPEINFO,
LEN -Int 1,
ACC ListItem(#decodeValue(
substrBytes(BYTES, 0, #elemSize(ELEMTYPEINFO)),
ELEMTYPEINFO,
.Map // HACK
))
)
requires LEN >Int 0
andBool lengthBytes(BYTES) >=Int #elemSize(ELEMTYPEINFO) // enough bytes remaining
[preserves-definedness]
```
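The iterative consumption and its exact-match check can be sketched in Rust as follows. `decode_array_elements` is a hypothetical helper. It returns `None` in the situations where the K function has no applicable rule (surplus or missing bytes). Otherwise it splits off `elem_size` bytes per element from the front and appends each decoded element, preserving order. As in the K rule, a failing element decoder does not abort the loop: whatever the closure returns for a chunk is stored, just as an `UnableToDecode` marker would be stored in the list.

```rust
/// Hypothetical helper: decode `len` elements of `elem_size` bytes each,
/// consuming `bytes` from the front. Returns `None` for surplus or missing
/// bytes, where the K function has no applicable rule.
pub fn decode_array_elements<T>(
    mut bytes: &[u8],
    elem_size: usize,
    mut len: usize,
    decode_elem: impl Fn(&[u8]) -> T,
) -> Option<Vec<T>> {
    let mut acc = Vec::new();
    while len > 0 {
        if bytes.len() < elem_size {
            return None; // not enough bytes for the remaining elements
        }
        let (head, rest) = bytes.split_at(elem_size);
        acc.push(decode_elem(head)); // append, preserving element order
        bytes = rest;
        len -= 1;
    }
    // Exact match: every byte must have been consumed.
    if bytes.is_empty() {
        Some(acc)
    } else {
        None
    }
}
```

With the bytes `[0x01, 0xFE, 0x03]` and a one-byte `i8` decoder, a declared length of 3 yields `[1, -2, 3]`. Declared lengths of 2 (one surplus byte) or 4 (one byte missing) are both rejected.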

## Slice Allocations

Slices are arrays with dynamic length.
The `#decodeSliceAllocation` function computes the array length by dividing the total byte length
by the element size, then uses the same element-by-element decoding approach as arrays.
The rule only applies if the element size is positive and divides the byte length evenly.

```k
syntax Value ::= #decodeSliceAllocation ( Bytes, TypeInfo ) [function]
// -------------------------------------------------------------------
rule #decodeSliceAllocation(BYTES, ELEMTYPEINFO)
=> Range(#decodeArrayElements(BYTES, ELEMTYPEINFO,
lengthBytes(BYTES) /Int #elemSize(ELEMTYPEINFO), .List))
requires lengthBytes(BYTES) %Int #elemSize(ELEMTYPEINFO) ==Int 0 // element size divides cleanly
andBool 0 <Int #elemSize(ELEMTYPEINFO)
[preserves-definedness]
```
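The slice case reduces to the array case once the length is known. The minimal Rust sketch below, built around a hypothetical `decode_slice` helper, enforces the same side conditions as the rule: the element size must be positive and must divide the byte length. It then decodes fixed-size chunks. `chunks_exact` stands in for the repeated front-consumption of `#decodeArrayElements`. Because the divisibility check has already passed, no remainder can be left over.

```rust
/// Hypothetical helper: decode a slice whose length follows from the byte count.
pub fn decode_slice<T>(
    bytes: &[u8],
    elem_size: usize,
    decode_elem: impl Fn(&[u8]) -> T,
) -> Option<Vec<T>> {
    // Same side conditions as the K rule: positive element size that divides the byte length.
    if elem_size == 0 || bytes.len() % elem_size != 0 {
        return None;
    }
    // chunks_exact yields bytes.len() / elem_size chunks of exactly elem_size bytes each.
    Some(bytes.chunks_exact(elem_size).map(|chunk| decode_elem(chunk)).collect())
}
```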

```k
endmodule
```

kmir/src/kmir/kdist/mir-semantics/rt/value.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -105,6 +105,17 @@ The local variables may be actual values (`typedValue`) or uninitialised (`NewLo
 rule valueOf(typedValue(V, _, _)) => V
 ```
 
+## Evaluating Items to `Value`s
+
+Many built-in operations (`RValue` or type casts) use `Operand`s that will evaluate to a value of sort `Value`.
+The basic operations of reading and writing those values can use K's "heating" and "cooling" rules to describe their evaluation to `Value`s.
+
+```k
+syntax Evaluation ::= Value // other sorts are added at the first use site
+
+syntax KResult ::= Value
+```
+
 ## A generic MIR Error sort
 
 ```k
````
New test program

Lines changed: 18 additions & 0 deletions
```rust
const I8_ARRAY: [i8; 3] = [1, -2, 3];
const I32_ARRAY: [i32; 4] = [10, -20, 30, -40];

fn main() {

    // Product of first two elements
    let i8_product = I8_ARRAY[0] * I8_ARRAY[1];
    let i32_product = I32_ARRAY[0] * I32_ARRAY[1];

    // Assertions

    // these constants get allocated, which is not supported yet
    // assert_eq!(i8_product, -2); // 1 * (-2) = -2
    // assert_eq!(i32_product, -200); // 10 * -20 = -200

    // therefore using a computation instead of constants
    assert_eq!(i8_product as i32 * 100, i32_product);
}
```
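To see what the decoder has to handle for this program, the sketch below computes the bytes of the two constant arrays and decodes the `i32` bytes back. It assumes a little-endian target, where integer array elements are stored contiguously with no padding. The helper names are hypothetical. The final assertion restates the test's own arithmetic: (1 * -2) * 100 = -200 = 10 * -20.

```rust
/// Bytes of an `i8` array: one two's-complement byte per element.
pub fn i8_array_bytes(values: &[i8]) -> Vec<u8> {
    values.iter().map(|v| *v as u8).collect()
}

/// Bytes of an `i32` array on a little-endian target: four bytes per element, no padding.
pub fn i32_array_bytes(values: &[i32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes().to_vec()).collect()
}

/// Decode little-endian `i32` elements back from their bytes.
pub fn decode_i32_array(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

Under these assumptions, `I8_ARRAY` occupies the three bytes `01 FE 03`, and `I32_ARRAY` occupies 16 bytes starting with `0A 00 00 00 EC FF FF FF`.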
