Commit f60ead8

author
rocky
committed
More explicit comments and ...
add ## comments, and document better module and method information
1 parent 8165db4 commit f60ead8

3 files changed

+83
-46
lines changed

HOW-TO-USE.rst

Lines changed: 45 additions & 17 deletions
@@ -11,7 +11,7 @@ skeletal. Then use ``pydisasm --xasm`` to convert to the Python source
1111
code assembler format. From this then modify the results and run
1212
``pyc-xasm``.
1313

14-
In normal python disassembly code (and in the bytecode file), the main
14+
In normal Python disassembly code (and in the bytecode file), the main
1515
function appears first; it contains constants which contain code to
1616
other functions and so on. In Python disassembly these are linearized
1717
so that from top down you have a topological sort of the dependencies.
@@ -32,39 +32,66 @@ preferable right now.
3232
Format of assembly file
3333
=======================
3434

35-
Again, easiest to consult the ``pydisasm --xasm`` output ``.pyasm``-file that is
36-
produced. Even easier, just to look in the test directory_ for files that end
35+
Again, easiest to consult the ``pydisasm --xasm`` output ``.pyasm``-file that is
36+
produced. Even easier, just to look in the test directory_ for files that end
3737
with ``.pyasm``.
3838

39-
In general, lines that start with "#" in column one are comments or code or function
40-
objects other than bytecode instructions.
39+
Lines that start with ``##`` are comments.
4140

42-
Necessary fields that are in Python code object and function objects
43-
are here. These include stuff like the Python "magic" number which
44-
determines which Python bytecode opcodes to use and which Python
45-
interpreter can be used to run the resulting program.
41+
Examples:
42+
43+
## This line is a comment
44+
## Method Name: GameSheet
45+
46+
Lines that start with ``#`` in column one are used to indicate a code
47+
or function object that is not a bytecode instruction. However, this
48+
is only true if the rest of the line matches one of the code or
49+
function objects mentioned below. If instead the rest of a line
50+
does not match a function or code object, that line too will be
51+
treated tacitly as a comment.
4652

4753
Module-level info
4854
------------------
4955

5056

57+
The only module-level information that is needed is the
58+
Python "magic" number which determines which Python bytecode opcodes
59+
to use and which Python interpreter can be used to run the resulting
60+
program.
61+
62+
Optional information includes:
63+
64+
* Timestamp of code
65+
* Source code size modulo 2**32 or a SIP hash
66+
5167
Here is an example of the module-level information:
5268

5369
::
5470

5571
# Python bytecode 2.2 (60717)
5672
# Timestamp in code: 1499156389 (2017-07-04 04:19:49)
73+
# Source code size mod 2**32: 577 bytes
5774

58-
The bytecode is necessary. However the timestamp is not. In Python 3
59-
there is also a size modulo 2**32 that is recorded.
75+
Again, the bytecode number is necessary. However the timestamp is not. In Python 3
76+
there is also a size modulo 2**32 that is recorded, and in later Python this can be a
77+
SIP hash.
6078

6179
::
6280

63-
# Source code size mod 2**32: 577 bytes
6481

6582
Method-level info
6683
------------------
6784

85+
Method-level information starts with ``#`` in column one. Here is some
86+
method-level information:
87+
88+
* The method name of the code object (``Method Name``)
89+
* Number of local variables used in module or function (``Number of locals``)
90+
* The name of the file the code came from (``Filename``)
91+
* Maximum Stack Size needed to run code (``Stack Size``)
92+
* Code flags which indicate properties of the code (``Flags``)
93+
* Line number for the first line of the code (``First Line``)
94+
6895
Here is an example:
6996

7097
::
@@ -73,9 +100,10 @@ Here is an example:
73100
# Filename: exec
74101
# Argument count: 2
75102
# Kw-only arguments: 0
76-
# Number of locals: 2 # Stack size: 3
103+
# Number of locals: 2
104+
# Stack size: 3
77105
# Flags: 0x00000043 (NOFREE | NEWLOCALS | OPTIMIZED)
78-
# First Line: 11
106+
# First Line: 11
79107
# Constants:
80108
# 0: ' GCD. We assume positive numbers'
81109
# 1: 0
@@ -105,14 +133,14 @@ the last sentence means is that
105133

106134
LOAD_CONST 3
107135

108-
would be invalid if the size of the constant array is less than 4, or `constant[3]` wasn't defined by adding it to the `Constants` section. However when you put a value in parenthesis, that indicate a value rather than an index into a list.
136+
would be invalid if the size of the constant array is less than 4, or `constant[3]` wasn't defined by adding it to the `Constants` section. However when you put a value in parenthesis, that indicate a value rather than an index into a list.
109137
So you could instead write:
110138

111139
::
112140

113141
LOAD_CONST (1)
114142

115-
which in this case does the same thing since `1 = constant[3]`. If the value 1 does not appear anywhere in the constants list, the assembler would append the value 1 to the end of the list of the constants list. When writing the final bytecode file an appropriate constant index will be inserted into that instruction.
143+
which in this case does the same thing since `1 = constant[3]`. If the value 1 does not appear anywhere in the constants list, the assembler would append the value 1 to the end of the list of the constants list. When writing the final bytecode file an appropriate constant index will be inserted into that instruction.
116144

117145
Line Numbers and Labels
118146
-----------------------
@@ -187,7 +215,7 @@ parenthesis. For example:
187215
::
188216

189217
LOAD_CONST (3) # loads number 3
190-
LOAD_CONST 3 # load Constants[3]
218+
LOAD_CONST 3 # load Constants[3]
191219
JUMP_ABSOLUTE 10 # Jumps to offset 10
192220
JUMP_ABSOLUTE L10 # Jumps to label L10
193221
LOAD_CONSTANT (('load_entry_point',)) # Same as: tuple(['load_entry_point'])
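
The operand rule described in HOW-TO-USE.rst (a bare number is an index into the constants list, a parenthesized operand is a literal value that is looked up and appended if missing) can be sketched roughly as follows. This is only an illustration: `resolve_const_operand` is a hypothetical helper, not a function in xasm, and it assumes any trailing `#` comment has already been stripped from the operand.

```python
import ast


def resolve_const_operand(operand: str, constants: list) -> int:
    """Return the constants-list index that a LOAD_CONST operand refers to.

    Hypothetical helper for illustration; not part of xasm.
    """
    operand = operand.strip()
    if operand.startswith("(") and operand.endswith(")"):
        # Parenthesized operand: a literal value, not an index.
        value = ast.literal_eval(operand[1:-1])
        for index, existing in enumerate(constants):
            # Compare types too, so that True is not mistaken for 1.
            if type(existing) is type(value) and existing == value:
                return index
        # Value not present yet: append it and use the new index.
        constants.append(value)
        return len(constants) - 1
    # Bare operand: an index that must already exist.
    index = int(operand)
    if not 0 <= index < len(constants):
        raise ValueError(f"constant index {index} is out of range")
    return index


constants = [" GCD. We assume positive numbers", 0, None, 1]
print(resolve_const_operand("3", constants))    # index 3
print(resolve_const_operand("(1)", constants))  # value 1, found at index 3
print(resolve_const_operand("(7)", constants))  # value 7 is appended
```

With this constants list, `3` and `(1)` resolve to the same slot, which matches the case in the text where `1 = constant[3]`.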

xasm/assemble.py

Lines changed: 35 additions & 26 deletions
@@ -62,12 +62,12 @@ def get_opname_operand(opc, fields):
6262
return opname, None
6363

6464

65-
class Assembler(object):
65+
class Assembler:
6666
def __init__(self, python_version, is_pypy):
6767
self.opc = get_opcode(python_version, is_pypy)
6868
self.code_list = []
6969
self.codes = [] # FIXME use a better name
70-
self.status = "unfinished"
70+
self.status: str = "unfinished"
7171
self.size = 0 # Size of source code. Only relevant in version 3 and above
7272
self.python_version = python_version
7373
self.timestamp = None
@@ -144,6 +144,9 @@ def asm_file(path):
144144
while i < len(lines):
145145
line = lines[i]
146146
i += 1
147+
if line.startswith("##"):
148+
# comment line
149+
continue
147150
if line.startswith(".READ"):
148151
match = re.match("^.READ (.+)$", line)
149152
if match:
@@ -197,7 +200,9 @@ def asm_file(path):
197200
asm.timestamp = int(time_str)
198201
elif line.startswith("# Method Name: "):
199202
if method_name:
200-
co = create_code(asm, label, backpatch_inst)
203+
co, is_valid = create_code(asm, label, backpatch_inst)
204+
if not is_valid:
205+
return
201206
asm.update_lists(co, label, backpatch_inst)
202207
label = {}
203208
backpatch_inst = set([])
@@ -336,7 +341,7 @@ def asm_file(path):
336341

337342
if match:
338343
line_no = int(match.group(1))
339-
linetable_field = "co_linotab" if python_version_pair < (3, 10) else "co_linetable"
344+
linetable_field = "co_lnotab" if python_version_pair < (3, 10) else "co_linetable"
340345
assert asm is not None
341346
linetable = getattr(asm.code, linetable_field)
342347
linetable[offset] = line_no
@@ -348,27 +353,31 @@ def asm_file(path):
348353
if num_fields == 1 and line_no is not None:
349354
continue
350355

351-
if num_fields > 1:
352-
if fields[0] == ">>":
353-
fields = fields[1:]
354-
num_fields -= 1
355-
if match_lineno(fields[0]) and is_int(fields[1]):
356-
line_no = int(fields[0][:-1])
357-
opname, operand = get_opname_operand(asm.opc, fields[2:])
358-
elif match_lineno(fields[0]):
359-
line_no = int(fields[0][:-1])
360-
fields = fields[1:]
356+
try:
357+
if num_fields > 1:
361358
if fields[0] == ">>":
362359
fields = fields[1:]
363-
if is_int(fields[0]):
360+
num_fields -= 1
361+
if match_lineno(fields[0]) and is_int(fields[1]):
362+
line_no = int(fields[0][:-1])
363+
opname, operand = get_opname_operand(asm.opc, fields[2:])
364+
elif match_lineno(fields[0]):
365+
line_no = int(fields[0][:-1])
366+
fields = fields[1:]
367+
if fields[0] == ">>":
364368
fields = fields[1:]
365-
opname, operand = get_opname_operand(asm.opc, fields)
366-
elif is_int(fields[0]):
367-
opname, operand = get_opname_operand(asm.opc, fields[1:])
369+
if is_int(fields[0]):
370+
fields = fields[1:]
371+
opname, operand = get_opname_operand(asm.opc, fields)
372+
elif is_int(fields[0]):
373+
opname, operand = get_opname_operand(asm.opc, fields[1:])
374+
else:
375+
opname, operand = get_opname_operand(asm.opc, fields)
368376
else:
369-
opname, operand = get_opname_operand(asm.opc, fields)
370-
else:
371-
opname, _ = get_opname_operand(asm.opc, fields)
377+
opname, _ = get_opname_operand(asm.opc, fields)
378+
except Exception as e:
379+
print(f"Line {i}: {e}")
380+
raise
372381

373382
if opname in asm.opc.opname:
374383
inst = Instruction()
@@ -688,13 +697,13 @@ def create_code(asm: Assembler, label, backpatch):
688697
is_code_ok(asm)
689698

690699
# Stamp might be added here
691-
# if asm.python_version[:2] == PYTHON_VERSION_TRIPLE[:2]:
692-
# code = asm.code.to_native()
693-
# else:
694-
code = asm.code.freeze()
700+
if asm.python_version[:2] == PYTHON_VERSION_TRIPLE[:2]:
701+
code = asm.code.to_native()
702+
else:
703+
code = asm.code.freeze()
695704

696705
# asm.print_instructions()
697706

698707
# print (*args)
699708
# co = self.Code(*args)
700-
return code
709+
return code, is_valid
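
For readers comparing the `##` handling added to `asm_file` with the rules in HOW-TO-USE.rst, the line classification can be sketched as below. This is a simplified illustration, not the assembler's actual logic: `classify_line` and `KNOWN_FIELDS` are hypothetical names, and the field list is taken only from the examples shown in the documentation hunks of this commit.

```python
# Field labels that appear in the HOW-TO-USE.rst examples (assumed list).
KNOWN_FIELDS = (
    "Python bytecode",
    "Timestamp in code",
    "Source code size mod 2**32",
    "Method Name",
    "Filename",
    "Argument count",
    "Kw-only arguments",
    "Number of locals",
    "Stack size",
    "Flags",
    "First Line",
    "Constants",
)


def classify_line(line: str) -> str:
    """Classify one assembly-file line (hypothetical helper)."""
    if line.startswith("##"):
        # Always a comment, even if the rest looks like a field.
        return "comment"
    if line.startswith("#"):
        rest = line[1:].strip()
        if any(rest.startswith(field) for field in KNOWN_FIELDS):
            return "metadata"
        # A "#" line that matches no known field is tacitly a comment.
        return "comment"
    if not line.strip():
        return "blank"
    return "instruction"


for sample in (
    "## Method Name: GameSheet",
    "# Method Name: gcd",
    "# a stray note",
    "LOAD_CONST (1)",
):
    print(f"{sample!r}: {classify_line(sample)}")
```

Checking for `##` first is what lets `## Method Name: GameSheet` be ignored while `# Method Name: gcd` still starts a new code object.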

xasm/pyc_convert.py

Lines changed: 3 additions & 3 deletions
@@ -142,7 +142,7 @@ def transform_33_32(inst, new_inst, i, n, offset, instructions, new_asm):
142142

143143

144144
def transform_asm(asm, conversion_type, src_version, dest_version):
145-
new_asm = Assembler(dest_version)
145+
new_asm = Assembler(dest_version, is_pypy=False)
146146
for field in "code size".split():
147147
setattr(new_asm, field, copy(getattr(asm, field)))
148148

@@ -177,10 +177,10 @@ def transform_asm(asm, conversion_type, src_version, dest_version):
177177
i += 1
178178
pass
179179

180-
co = create_code(new_asm, new_asm.label[-1], new_asm.backpatch[-1])
180+
co, is_valid = create_code(new_asm, new_asm.label[-1], new_asm.backpatch[-1])
181181
new_asm.code_list.append(co)
182182
new_asm.code_list.reverse()
183-
new_asm.finished = "finished"
183+
new_asm.status = "finished" if is_valid else "invalid"
184184
return new_asm
185185

186186
