IR: Refactoring and introduction of explicitly names "statement nodes"#656
IR: Refactoring and introduction of explicitly names "statement nodes"#656mlange05 wants to merge 19 commits into
Conversation
|
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/656/index.html |
cf50b27 to
c705e00
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #656 +/- ##
==========================================
- Coverage 94.20% 94.16% -0.04%
==========================================
Files 288 286 -2
Lines 47546 47274 -272
==========================================
- Hits 44790 44516 -274
- Misses 2756 2758 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
51a2cda to
da12fe7
Compare
da12fe7 to
3055d61
Compare
|
@reuterbal I've rebased this over recent main (0.3.6) for now and added a few more (less commonly used) statement node types. Seeing this, it might make sense to wait for PR #665 and then expand the explicit frontend use of these uncommon types? |
57f25ff to
0fef097
Compare
|
Ok, @reuterbal I've rebased and added some bits to the new frontend test structure. There's more to be done there, but this one was already getting big, so I'll keep it to what it is now. Ready for another look. |
reuterbal
left a comment
There was a problem hiding this comment.
Thanks!
The refactoring of IR nodes into multiple files is a very welcome change. Unfortunately, the Github diff view doesn't make it very easy to see where things have simply moved or where other changes have been incorporated, too - but I trust that this has been executed with caution and that tests would have failed if something was out of the ordinary.
I have flagged a few cases where I think the frontends are not correctly translating to the new statement node types but instead create "old-fashioned" intrinsic/generic nodes. That should be straightforward to fix.
The only conceptual question for me is the fact, that the new statement nodes are not capturing execution/programming concepts in the way we used to do this as much as possible in the Loki IR, but are very much aligned with the specific Fortran-way of expressing such concepts. I flagged this in the respective places and would be interested in your thoughts on this. I am not fundamentally opposed to doing it in the way this is implement now, but wanted to point out the change this constitutes and ensure that it is a concious decision to design it that way.
| code = module.to_fortran() | ||
| code_lines = code.splitlines() | ||
| assert len(code_lines) in (35, 36) # OMNI produces an extra line | ||
| assert len(code_lines) in (35, 37) # OMNI produces an extra line |
|
|
||
|
|
||
| @dataclass_strict(frozen=True) | ||
| class PublicStmt(GenericStmt): |
There was a problem hiding this comment.
Not sure if I 100% like the very Fortran-string-equivalent representation of statement nodes, as it abandons the user-friendly view onto execution concepts that Loki encapsulated so far. It removes some of the higher-level approach over an AST representation that we tried to maintain so far.
Specifically, I'm concerned about PublicStmt vs. PrivateStmt - both of which could be classified as a VisibilityStmt with a positive vs. negative default. A transformation or lint rule that is interested in visibility of symbols will now have to look for both node types instead of searching directly for one type only.
|
|
||
|
|
||
| @dataclass_strict(frozen=True) | ||
| class ContainsStmt(GenericStmt): |
There was a problem hiding this comment.
Similarly for ContainsStmt: Semantically, this does not add any additional information other than delineating the beginning of the contains part. Loki's IR already captures this through the contains property and I'm wondering if a ContainsSection as an overload of the internal node type Section wouldn't be a more intuitive way of representing this concept without the need to explicitly represent this as a single-word-string-only node?
|
|
||
|
|
||
| @dataclass_strict(frozen=True) | ||
| class PrintStmt(GenericStmt): |
There was a problem hiding this comment.
I suppose we'd want a WriteStmt, too, at some point, but that would constitute a larger impact due to it's use in transformations?
Also here, wondering if a generic OutputStmt might not be a better fit?
There was a problem hiding this comment.
As discussed offline, in principle yes, but due to the various flavours of string and output formatting it's quite hard to generalise. So, we might look into generalising output (and file handling) intrinsics at a later stage, but defer to a future PR for this.
| return ir.StopStmt(**kwargs) | ||
|
|
||
| visit_Intrinsic_Stmt = visit_Generic_Stmt | ||
| visit_Format_Stmt = visit_Generic_Stmt |
There was a problem hiding this comment.
Did I not spot a FormatStmt class in stmt_nodes.py?
There was a problem hiding this comment.
Yes, it's more involved than expected, but yes - please see top commit after rebase.
| # the usual `Output_Item_List` entity. | ||
| return ir.Intrinsic(text=f'PRINT {", ".join(str(i) for i in o.items if i is not None)}', | ||
| source=kwargs.get('source'), label=kwargs.get('label')) | ||
| return ir.GenericStmt( |
There was a problem hiding this comment.
Should this not generate a PrintStmt?
| args = f", {args}" if values else "" | ||
| fmt = o.attrib['format'] | ||
| return ir.Intrinsic(text=f'print {fmt}{args}', source=kwargs['source']) | ||
| return ir.GenericStmt(text=f'print {fmt}{args}', source=kwargs['source']) |
There was a problem hiding this comment.
Shouldn't this be a PrintStmt?
| def visit_FformatDecl(self, o, **kwargs): | ||
| fmt = f'FORMAT{o.attrib["format"]}' | ||
| return ir.Intrinsic(text=fmt, source=kwargs['source']) | ||
| return ir.GenericStmt(text=fmt, source=kwargs['source']) |
There was a problem hiding this comment.
Shouldn't this be a FormatStmt?
This was a misnomer all along, and now allows for better specialisation.
Adds node types for COMMON, CONTINUE, STOP, PRINT and FORMAT stmts.
7e27b16 to
8851324
Compare
Instead of taking the `text` argument in the constructor, we suppress it and derived the string from a tuple of expression/str arguments in the `values` argument. This allows us to cleanly use sub-expressions as part of output statements.
0abab4d to
a2cfbb7
Compare
Make OMNI FORMAT declaration parsing quote-aware so commas inside quoted format strings are retained as part of the string literal instead of splitting into separate descriptors. Preserve the leading numeric FORMAT statement label from source metadata when OMNI does not attach it directly to the declaration node. Add regression coverage for labelled FORMAT statements with quoted comma-containing items in the OMNI frontend, update frontend output statement assertions to check typed values, and cover PRINT/FORMAT string literal generation in fgen tests.
Note: This PR technically introduces an externally API-changing change, as it renames
ir.Intrinsic->ir.GenericStmtThis PR refactors the
loki.ir.nodessource into a dedicated sub-package and introduces a new set of statement nodes that all inherit fromloki.ir.nodes.GenericStmt. These replace previousir.Intrinsicnodes (a misnomer from the early days 😉 ), which in turn allows us to refer to specific statement nodes explicitly by type (eg.ir.ImplicitStmtorir.ContainsStmt). It is worth pointing out that I've only specialisedCONTAINS,IMPLICIT,CYCLE,RETURN,GO TO,PUBLIC,PRIVATEthus far, but others could follow.Pleas note also that the latter two now exist as dedicated node types, but I've not changed (yet) the internal mechanics of the
Modulethat use this information.In some more detail:
dataclass_stricttoloki.toolsloki.ir.nodesinto sub-package with sources forabstract_nodes,internal_nodesandleaf_nodesloki.ir.Intrinsic=>loki.ir.GenericStmtstmt_nodestoloki.ir.nodesand implementedImplicitStmt,ContainsStmt,ReturnStmt,CycleStmt,GotoStmt,PrivateStmtandPublicStmtir.nodes.teststo capture new node types and avoid JIT