-
Notifications
You must be signed in to change notification settings - Fork 82
Expand file tree
/
Copy pathHiFiTreeDiff.rsc
More file actions
529 lines (456 loc) · 26.8 KB
/
HiFiTreeDiff.rsc
File metadata and controls
529 lines (456 loc) · 26.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
@license{
Copyright (c) 2018-2023, NWO-I Centrum Wiskunde & Informatica
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this litst of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
}
@synopsis{Infer ((TextEdit)) from the differences between two parse ((ParseTree::Tree))s}
@description{
This module provides an essential building block for creating high-fidelity source-to-source code transformations.
It is common for industrial use cases of source-to-source transformation to extract
a list of text edits programmatically using parse tree pattern matching. This way the
changes are made on the textual level, with less introduction of noise and fewer removals
of valuable layout (indentation) and source code comments.
The construction of such high-fidelity edit lists can be rather involved because it tangles
and scatters a number of concerns:
1. syntax-directed pattern matching
2. string substitution; construction of the rewritten text
* retention of layout and in particular indentation
* retention of source code comments
* retention of specific case-insensitive keyword style
* syntactic correctness of the result; especially in relation to list separators there are many corner-cases to thing of
On the other hand, ParseTree to ParseTree rewrites are much easier to write and get correct.
They are "syntax directed" via the shape of the tree that follows the grammar of the language.
Some if not all of the above aspects are tackled by the rewriting mechanism with concrete patterns.
Especially the corner cases w.r.t. list separators are all handled by the rewriting mechanisms.
Also the rules are in "concrete syntax", on both the matching and the substition side. So they are
readable for all who know the object language. The rules guarantee syntactic correctness of the
rewritten source code. However, rewrite rules do quite some noisy damage to the layout, indentation
and comments, of the result.
With this module we bring these two modalities of source-to-source transformations together:
1. The language engineer uses concrete syntax rewrite rules to derive a new ParseTree from the original;
2. We run ((treeDiff)) to obtain a set of minimal text edits;
3. We apply the text edits to the editor contents or the file system.
}
@benefits{
* Because the derived text edits change fewer characters, the end result is more "hifi" than simply
unparsing the rewritten ParseTree. More comments are retained and more indentation is kept the same. More
case-insensitive keywords retain their original shape.
* At the same time the rewrite rules are easier to maintain as they remain "syntax directed".
* Changes to the grammar will be picked up when checking all source and target patterns.
* The diff algorithm uses cross-cutting information from the parse tree (what is layout and what not,
what is case-insensitive, etc.) which would otherwise have to be managed by the language engineer in _every rewrite rule_.
* The diff algoritm understands what indentation is and brings new sub-trees to the original level
of indentation (same as the sub-trees they are replacing)
* Typically the algorithm's run-time is lineair in the size of the tree, or better. Same for memory usage.
}
@pitfalls{
* ((treeDiff)) only works under the assumption that the second tree was derived from the first
by applying concrete syntax rewrite rules in Rascal. If there is no origin relation between the two
then its heuristics will not work. The algorithm could degenerate to substituting the entire file,
or worse it could degenerate to an exponential search for commonalities in long lists.
* ((treeDiff))'s efficiency is predicated on the two trees being derived from each other in main memory of the currently running JVM.
This way both trees will share pointers where they are the same, which leads to very efficient equality
testing. If the trees are first independently serialized to disk and then deserialized again, and then ((treeDiff)) is called,
this optimization is not present and the algorithm will perform (very) poorly.
* Substitution patterns should be formatted as best as possible. The algorithm will not infer
spacing or relative indentation inside of the substituted subtree. It will only infer indentation
for the entire subtree. Another way of resolving this is using a code formatter on the subsituted patterns.
}
module analysis::diff::edits::HiFiTreeDiff
extend analysis::diff::edits::TextEdits;
import List;
import Location;
import ParseTree;
import String;
import util::Math;
@synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.}
@description{
This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script that applies the differences on
the textual level, _with minimal collatoral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary
noise introduction to the original file. It also tries to conserve source code comments; where still possible.
The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files.
They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features.
This top-down diff algorithm takes two arguments:
1. an _original_ parse tree for a text file,
2. and a _derived_ parse tree that is mostly equal to the original but has pieces of it substituted or rewritten.
From the tree node differences between these two trees, ((TextEdit))s are derived such that:
* when the edited source text is parsed again, the resulting tree would match the derived tree.
However, the parsed tree could be different from the derived tree in terms of whitespace, indentation and case-insensitive literals (see below).
* when tree nodes (grammar rules) are equal, smaller edits are searched by pair-wise comparison of the children
* differences between respective layout or (case insensitve) literal nodes are always ignored
* when lists have changed, careful editing of possible separators ensures syntactic correctness
* when new sub-trees are inserted, the replacement will be at the same indentation level as the original.
* when case-insensitive literals have been changed under a grammar rule that remained the same, no edits are produced.
The function comes in handy when we use Rascal to rewrite parse trees, and then need to communicate the effect
back to the IDE (for example using ((util::IDEServices)) or `util::LanguageServer` interfaces). We use
((ExecuteTextEdits)) to _test_ the effect of ((TextEdits)) while developing a source-to-source transformation.
}
@benefits{
* This function allows the language engineer to work in terms of abstract and concrete syntax trees while manipulating source text. The
((TextEdit))s intermediate representation bridge the gap to the minute details of IDE interaction such as "undo" and "preview" features.
* Text editing is fraught with details of whitespace, comments, list separators; all of which are handled here by
the exactness of syntactic and semantic knowledge of the parse trees.
* Where possible the algorithm also retains the capitalization of case-insensitive literals.
* The algorithm retrieves and retains indentation levels from the original tree, even if sub-trees in the
derived tree have mangled indentation. This allows us to ignore the indentation concern while thinking of rewrite
rules for source-to-souce transformation, and focus on the semantic effect.
* The algorithm inherits source code comments from the original, wherever sub-trees of the original and the
rewritten tree still line up.
}
@pitfalls{
* If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits.
* If the second argument is not derived from the original, then the algorithm will produce a single text edit to replace the entire source text.
* If the second argument was not produced from the first in the same JVM memory, it will not share many pointers to equal sub-trees
and the performance of the algorithm will degenerate quickly.
* If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm.
* If the original tree is not annotated with source locations, the algorithm fails.
* Both parse trees must be type correct, e.g. the number of symbols in a production rule, must be equal to the number of elements of the argument list of ((appl)).
* This algorithm does not work with ambiguous (sub)trees.
* When large sub-trees or sub-lists are moved to other parts of the tree, comment inheritance is not possible anymore.
}
@examples{
If we rewrite parse trees, this can be done with concrete syntax matching.
The following example swaps the if-branch with the else-branch in Pico:
```rascal-shell
import lang::pico::\syntax::Main;
import IO;
import analysis::diff::edits::ExecuteTextEdits;
import analysis::diff::edits::TextEdits;
import analysis::diff::edits::HiFiTreeDiff;
// an example Pico program:
writeFile(|tmp:///example.pico|,
"begin
' declare
' a : natural,
' b : natural;
' if a then
' a := b
' else
' b := a
' fi
'end");
import ParseTree;
original = parse(#start[Program], |tmp:///example.pico|);
// match and replace all conditionals
rewritten = visit(original) {
case (Statement) `if <Expression e> then <{Statement ";"}* ifBranch> else <{Statement ";"}* elseBranch> fi`
=> (Statement) `if <Expression e> then
' <{Statement ";"}* elseBranch>
'else
' <{Statement ";"}* ifBranch>
'fi`
}
// Check the result as a string. It worked, but we see some collatoral damage in whitespace (indentation).
"<rewritten>"
// Now derive text edits from the two parse trees:
edits = treeDiff(original, rewritten);
// Wrap them in a single document edit
edit = changed(original@\loc.top, edits);
// Apply the document edit on disk:
executeDocumentEdits([edit]);
// and when we read the result back, we see the transformation succeeded, and indentation was not lost:
readFile(tmp://example.pico|);
// It's also possible to directly rewrite the original string, for debugging purposes:
executeTextEdits("<original>", treeDiff(original, rewritten))
```
}
// equal trees generate empty diffs (note this already ignores whitespace differences because non-linear matching ignores layout nodes)
list[TextEdit] treeDiff(Tree a, a) = [];
// skip production labels of original rules when diffing, to be able to focus on the Symbol constructor for downstream case-distinction
list[TextEdit] treeDiff(
Tree t:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args),
Tree u)
= treeDiff(appl(prod(s, syms, attrs), args)[@\loc=t@\loc?|bla:///|], u);
// skip production labels of original rules when diffing, to be able to focus on the Symbol constructor for downstream case-distinction
list[TextEdit] treeDiff(
Tree t,
Tree u:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args))
= treeDiff(t, appl(prod(s, syms, attrs), args)[@\loc=u@\loc?|bla:///|]);
// matched layout trees generate empty diffs such that the original is maintained
list[TextEdit] treeDiff(
appl(prod(layouts(_), _, _), list[Tree] _),
appl(prod(layouts(_), _, _), list[Tree] _))
= [];
// matched literal trees generate empty diffs
list[TextEdit] treeDiff(
appl(prod(lit(str l), _, _), list[Tree] _),
appl(prod(lit(l) , _, _), list[Tree] _))
= [];
// matched case-insensitive literal trees generate empty diffs such that the original is maintained
list[TextEdit] treeDiff(
appl(prod(cilit(str l), _, _), list[Tree] _),
appl(prod(cilit(l) , _, _), list[Tree] _))
= [];
// different lexicals generate small diffs even if the parent is equal. This avoids extremely small edits within the boundaries of single identifiers.
list[TextEdit] treeDiff(
t:appl(prod(lex(str l), _, _), list[Tree] _),
r:appl(prod(lex(l) , _, _), list[Tree] _))
= [replace(t@\loc, learnIndentation(t@\loc, "<r>", "<t>"))]
when t != r;
// Several ways of changing recursive expressions can be exploited to create very small diffs.
// This is comparable to finding large common sublists in the case of lists. If you look at
// unary and binary associative operators as a kind of operator-separated lists, this makes sense.
// remove a unary prefix operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [lit(_), _, sort(exp)], _), [Tree op, _, Tree r]),
r)
= [delete(fromUntil(op@\loc, r@\loc))];
// add a unary prefix operator
list[TextEdit] treeDiff(
Tree r,
appl(prod(sort(str exp), [lit(_), _, sort(exp)], _), [Tree op, Tree sp, r]))
= [replace(beginOf(r@\loc), "<op><sp>")];
// remove a unary postfix operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [sort(str exp), _, lit(_)], _), [Tree l, _, Tree r]),
r)
= [delete(cover([endOf(l@\loc), r@\loc]))];
// add a unary postfix operator
list[TextEdit] treeDiff(
Tree l,
appl(prod(sort(str exp), [sort(str exp), _, lit(_)], _), [l, Tree sp, Tree op]))
= [replace(endOf(l), "<sp><op>")];
// remove the left hand side of a binary operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [Tree l, _, _, _, Tree r]),
r)
= [delete(fromUntil(l@\loc, r@\loc))];
// add the left hand side of a binary operator
list[TextEdit] treeDiff(
Tree r,
appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [Tree l, Tree sp1, Tree op, Tree sp2, r]))
= [replace(beginOf(r@\loc), "<l><sp1><op><sp2>")];
// remove the right hand side of a binary operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [Tree l, _, _, _, Tree r]),
l)
= [delete(cover([endOf(l@\loc), r@\loc]))];
// add the right hand side of a binary operator
list[TextEdit] treeDiff(
Tree l,
appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [l, Tree sp1, Tree op, Tree sp2, Tree r]))
= [replace(endOf(l@\loc), "<sp1><op><sp2><r>")];
// change a binary operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [sort(exp), _, lit(str x), _, sort(exp)], _), [Tree l, _, Tree op, _, Tree r]),
appl(prod(sort(str exp), [sort(exp), _, lit(str y), _, sort(exp)], _), [l, _, Tree newOp, _, r]))
= [replace(op@\loc, "<newOp>")] when x != y;
// change a prefix operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [lit(str x), _, sort(exp)], _), [Tree op, _, Tree r]),
appl(prod(sort(str exp), [lit(str y), _, sort(exp)], _), [Tree newOp, _, r]))
= [replace(op@\loc, "<newOp>")] when x != y;
// change a postfix operator
list[TextEdit] treeDiff(
appl(prod(sort(str exp), [sort(exp), _, lit(str x)], _), [Tree l, _, Tree op]),
appl(prod(sort(str exp), [sort(exp), _, lit(str y)], _), [l, _, Tree newOp]))
= [replace(op@\loc, "<newOp>")] when x != y;
// remove brackets
list[TextEdit] treeDiff(
t:appl(prod(sort(str exp), [lit(_), _, sort(exp), _, lit(_)], {*_, \bracket()}), [_, _, Tree e, _, _]),
e)
= [
delete(fromUntil(beginOf(t@\loc), e@\loc)),
delete(fromUntil(endOf(e@\loc), endOf(t@\loc)))
];
// add brackets
list[TextEdit] treeDiff(
Tree e,
t:appl(prod(sort(str exp), [lit(_), _, sort(exp), _, lit(_)], {*_, \bracket()}), [Tree open, Tree sp1, Tree e, Tree sp2, Tree close]))
= [
replace(beginOf(e@\loc), "<open><sp1>"),
replace(endOf(e@\loc), "<sp2><close>")
];
// When the productions are different, we've found an edit, and there is no need to recurse deeper.
default list[TextEdit] treeDiff(
t:appl(Production p:prod(_,_,_), list[Tree] _),
r:appl(Production q:!p , list[Tree] _))
= [replace(t@\loc, learnIndentation(t@\loc, "<r>", "<t>"))];
// If list production are the same, then the element lists can still be of different length
// and we switch to listDiff which has different heuristics than normal trees to detect large identical sublists.
list[TextEdit] treeDiff(
Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems),
appl(p, list[Tree] bElems))
= listDiff(t@\loc, seps(reg), aElems, bElems);
// When the productions are equal, but the children may be different, we dig deeper for differences
default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
= [*treeDiff(a, b) | <a,b> <- zip2(argsA, argsB)];
@synopsis{decide how many separators we have}
private int seps(\iter-seps(_, list[Symbol] s)) = size(s);
private int seps(\iter-star-seps(_, list[Symbol] s)) = size(s);
private default int seps(Symbol _) = 0;
@synopsis{List diff finds minimal differences between the elements of two lists.}
@description{
This algorithm uses heuristics to avoid searching for the largest common sublist all too often.
Also it minimized the sublists that largest common sublist is executed on.
1. Since many patches to parse tree lists typically only change a prefix or a postfix, and we
can detect this quickly, we first extract patches for those instances.
2. It is also fast and easy to detect unchanged prefixes and postfixes, so by focusing
on the changes parts in the middle we generate more instances of case 1.
3. Another simple and quick case is when simply all elements are different (the prefix==the list==the postfix)
3. What we are left with is either an empty list and we are done, or a more complex situation
where we apply the "findEqualSubList" algorithm, which splits the list in three parts:
* two unequal prefixes
* two equal sublists in the middle
* two unequal postfixes
4. the algorithm then concatenates the diffs by recursing to step 1 on the prefixes and the diffs by recursing to step 1. on the postfixes
5. two empty lists terminate the recursion,
}
list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) {
edits = [];
// this algorithm isolates commonalities between the two lists
// by handling different special cases. It continues always with
// what is left to be different. By maximizing commonalities,
// the edits are minimized. Note that we float on source location parameters
// not only for the edit locations but also for sub-tree identity.
<span, originals, replacements> = trimEqualElements(span, originals, replacements);
<specialEdits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements);
edits += specialEdits;
equalSubList = findEqualSubList(originals, replacements);
// by using the (or "a") largest common sublist as a pivot to divide-and-conquer
// to the left and right of it, we minimize the number of necessary
// edit actions for the entire list.
if (equalSubList != [],
[*preO, *equalSubList, *postO] := originals,
[*preR, *equalSubList, *postR] := replacements) {
// TODO: what about the separators?
// we align the prefixes and the postfixes and
// continue recursively.
return edits
+ listDiff(beginCover(span, preO), seps, preO, preR)
+ listDiff(endCover(span, postO), seps, postO, postR)
;
}
else if (originals == [], replacements == []) {
return edits;
}
else {
// here we know there are no common elements anymore, only a common amount of different elements
common = min([size(originals), size(replacements)]);
return edits
// first the minimal length pairwise replacements, essential for finding accidental commonalities
+ [*treeDiff(a, b) | <a, b> <- zip2(originals[..common], replacements[..common])]
// then we either remove the tail that became shorter:
+ [replace(cover([after(last@\loc), cover(originals[common+1..])]), "") | size(originals) > size(replacements), [*_, last] := originals[..common]]
// or we add new elements to the end, while inheriting indentation from the originals:
+ [replace(after(span), learnIndentation(span, yield(replacements[common..]), yield(originals))) | size(originals) < size(replacements)]
;
}
}
@synopsis{Finds the largest sublist that occurs in both lists}
@description{
Using list matching and backtracking, this algorithm detects which common
sublist is the largest. It assumes ((trimEqualElements)) has happened already,
and thus there are interesting differences left, even if we remove any equal
sublist.
Note that this is not a general algorithm for Largest Common Subsequence (LCS), since it
uses particular properties of the relation between the original and the replacement list:
* New elements are never equal to old elements (due to source locations)
* Equal prefixes and postfixes may be assumed to be maximal sublists as well (see above).
* Candidate equal sublists always have consecutive source locations from the origin.
}
list[Tree] findEqualSubList([*Tree sub], [*_, *sub, *_]) = sub;
list[Tree] findEqualSubList([*_, *Tree sub, *_], [*sub]) = sub;
list[Tree] findEqualSubList([*_, p, *Tree sub, q, *_], [*_, !p, *sub, !q, *_]) = sub;
default list[Tree] findEqualSubList(list[Tree] _orig, list[Tree] _repl) = [];
@synopsis{trips equal elements from the front and the back of both lists, if any.}
tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span,
[Tree a, *Tree aPostfix], [ a, *Tree bPostfix])
= trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix);
tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span,
[*Tree aPrefix, Tree a], [*Tree bPrefix, a])
= trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix);
default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span,
list[Tree] a, list[Tree] b)
= <span, a, b>;
// only one element removed in front, then we are done
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 0, [Tree a, *Tree tail], [*tail])
= <[replace(a@\loc, "")], [], []>;
// only one element removed in front, plus 1 separator, then we are done because everything is the same
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 1,
[Tree a, Tree _sep, Tree tHead, *Tree tail], [tHead, *tail])
= <[replace(fromUntil(a, tHead), "")], [], []>;
// only one element removed in front, plus 1 separator, then we are done because everything is the same
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3,
[Tree a, Tree _l1, Tree _sep, Tree _l2, Tree tHead, *Tree tail], [tHead, *tail])
= <[replace(fromUntil(a, tHead), "")], [], []>;
// singleton replacement
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _,
[Tree a], [Tree b])
= <treeDiff(a, b), [], []>;
default tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, list[Tree] a, list[Tree] b)
= <[], a, b>;
@synopsis{convenience overload for shorter code}
private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc);
@synopsis{Compute location span that is common between an element and a succeeding element}
@description{
The resulting loc is including the `from` but excluding the `until`. It goes right
up to `until`.
```ascii-art
[from] gap [until]
<--------->
````
}
private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset);
private int end(loc src) = src.offset + src.length;
private loc endOf(loc src) = src.top[offset=src.offset + src.length][length=0];
private loc beginOf(loc src) = src.top[offset=src.offset][length=0];
private loc after(loc src) = src(end(src), 0);
private loc endCover(loc span, []) = span(span.offset + span.length, 0);
private loc endCover(loc span, [Tree x]) = x@\loc;
private default loc endCover(loc span, list[Tree] l) = cover(l);
private loc beginCover(loc span, []) = span(span.offset, 0);
private loc beginCover(loc span, [Tree x]) = x@\loc;
private default loc beginCover(loc span, list[Tree] l) = cover(l);
private loc cover(list[Tree] elems:[_, *_]) = cover([e@\loc | Tree e <- elems, e@\loc?]);
@synopsis{yield a consecutive list of trees}
private str yield(list[Tree] elems) = "<for (e <- elems) {><e><}>";
@synopsis{Make sure the subtitution is at least as far indented as the original}
@description{
This algorithm ignores the first line, since the first line is always preceeded by the layout of a parent node.
Then it measures the depth of indentation of every line in the original, and takes the minimum.
That minimum indentation is stripped off every line that already has that much indentation in the replacement,
and then _all_ lines are re-indented with the discovered minimum.
}
private str learnIndentation(loc span, str replacement, str original) {
list[str] indents(str text) = [indent | /^<indent:[\t\ ]*>[^\ \t]/ <- split("\n", text)];
origIndents = indents(original);
replLines = split("\n", replacement);
if (replLines == []) {
return "";
}
minIndent = "";
if ([_] := origIndents) {
// only one line. have to invent indentation from span
minIndent = "<for (_ <- [0..(span.begin?) ? span.begin.column : 0]) {> <}>";
}
else {
// we skip the first line for learning indentation, because that one would typically be embedded in a previous line.
minIndent = sort(origIndents[1..])[0]? "";
}
// we remove the leading spaces _up to_ the minimal indentation of the original,
// keep the rest of the indentation from the replacement (if any is left), and then the actual content.
// that entire multiline result is then lazily indented with the minimal indentation we learned from the original.
return indent(
minIndent,
"<for (/^<theInd:[\t\ ]*><rest:[^\t\ ]+.*>$/ <- replLines) {><theInd[..max(size(minIndent), span.begin? ? span.begin.column : 0)]><rest>
'<}>"[..-1])
;
}