rascal/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc at 3f4842503f4b2e76a8307c3d287f3da21927bb4e · usethesource/rascal

529 lines (456 loc) · 26.8 KB
Copyright (c) 2018-2023, NWO-I Centrum Wiskunde & Informatica
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this litst of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
@synopsis{Infer ((TextEdit)) from the differences between two parse ((ParseTree::Tree))s}
@description{
This module provides an essential building block for creating high-fidelity source-to-source code transformations.
It is common for industrial use cases of source-to-source transformation to extract
a list of text edits programmatically using parse tree pattern matching. This way the 
changes are made on the textual level, with less introduction of noise and fewer removals 
of valuable layout (indentation) and source code comments. 
The construction of such high-fidelity edit lists can be rather involved because it tangles
and scatters a number of concerns:
1. syntax-directed pattern matching
2. string substitution; construction of the rewritten text
   * retention of layout and in particular indentation
   * retention of source code comments
   * retention of specific case-insensitive keyword style
   * syntactic correctness of the result; especially in relation to list separators there are many corner-cases to thing of
On the other hand, ParseTree to ParseTree rewrites are much easier to write and get correct. 
They are "syntax directed" via the shape of the tree that follows the grammar of the language.
Some if not all of the above aspects are tackled by the rewriting mechanism with concrete patterns.
Especially the corner cases w.r.t. list separators are all handled by the rewriting mechanisms.
Also the rules are in "concrete syntax", on both the matching and the substition side. So they are 
readable for all who know the object language. The rules guarantee syntactic correctness of the 
rewritten source code. However, rewrite rules do quite some noisy damage to the layout, indentation 
and comments, of the result.
With this module we bring these two modalities of source-to-source transformations together:
1. The language engineer uses concrete syntax rewrite rules to derive a new ParseTree from the original;
2. We run ((treeDiff)) to obtain a set of minimal text edits;
3. We apply the text edits to the editor contents or the file system.
* Because the derived text edits change fewer characters, the end result is more "hifi" than simply
unparsing the rewritten ParseTree. More comments are retained and more indentation is kept the same. More
case-insensitive keywords retain their original shape.
* At the same time the rewrite rules are easier to maintain as they remain "syntax directed". 
* Changes to the grammar will be picked up when checking all source and target patterns. 
* The diff algorithm uses cross-cutting information from the parse tree (what is layout and what not,
 what is case-insensitive, etc.) which would otherwise have to be managed by the language engineer in _every rewrite rule_.
* The diff algoritm understands what indentation is and brings new sub-trees to the original level
of indentation (same as the sub-trees they are replacing)
* Typically the algorithm's run-time is lineair in the size of the tree, or better. Same for memory usage.
* ((treeDiff)) only works under the assumption that the second tree was derived from the first
by applying concrete syntax rewrite rules in Rascal. If there is no origin relation between the two
then its heuristics will not work. The algorithm could degenerate to substituting the entire file,
or worse it could degenerate to an exponential search for commonalities in long lists. 
* ((treeDiff))'s efficiency is predicated on the two trees being derived from each other in main memory of the currently running JVM.
This way both trees will share pointers where they are the same, which leads to very efficient equality
testing. If the trees are first independently serialized to disk and then deserialized again, and then ((treeDiff)) is called, 
this optimization is not present and the algorithm will perform (very) poorly.
* Substitution patterns should be formatted as best as possible. The algorithm will not infer
spacing or relative indentation inside of the substituted subtree. It will only infer indentation
for the entire subtree. Another way of resolving this is using a code formatter on the subsituted patterns.
module analysis::diff::edits::HiFiTreeDiff
extend analysis::diff::edits::TextEdits;
import List;
import Location;
import ParseTree;
import String;
import util::Math;
@synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.}
@description{
This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script that applies the differences on 
the textual level, _with minimal collatoral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary
noise introduction to the original file. It also tries to conserve source code comments; where still possible.
The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. 
They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features. 
This top-down diff algorithm takes two arguments:
1. an _original_ parse tree for a text file, 
2. and a _derived_ parse tree that is mostly equal to the original but has pieces of it substituted or rewritten. 
From the tree node differences between these two trees, ((TextEdit))s are derived such that:
* when the edited source text is parsed again, the resulting tree would match the derived tree. 
However, the parsed tree could be different from the derived tree in terms of whitespace, indentation and case-insensitive literals (see below).
* when tree nodes (grammar rules) are equal, smaller edits are searched by pair-wise comparison of the children
* differences between respective layout or (case insensitve) literal nodes are always ignored 
* when lists have changed, careful editing of possible separators ensures syntactic correctness
* when new sub-trees are inserted, the replacement will be at the same indentation level as the original. 
* when case-insensitive literals have been changed under a grammar rule that remained the same, no edits are produced.
The function comes in handy when we use Rascal to rewrite parse trees, and then need to communicate the effect
back to the IDE (for example using ((util::IDEServices)) or `util::LanguageServer` interfaces). We use
((ExecuteTextEdits)) to _test_ the effect of ((TextEdits)) while developing a source-to-source transformation. 
* This function allows the language engineer to work in terms of abstract and concrete syntax trees while manipulating source text. The
((TextEdit))s intermediate representation bridge the gap to the minute details of IDE interaction such as "undo" and "preview" features.
* Text editing is fraught with details of whitespace, comments, list separators; all of which are handled here by 
the exactness of syntactic and semantic knowledge of the parse trees. 
* Where possible the algorithm also retains the capitalization of case-insensitive literals.
* The algorithm retrieves and retains indentation levels from the original tree, even if sub-trees in the
derived tree have mangled indentation. This allows us to ignore the indentation concern while thinking of rewrite
rules for source-to-souce transformation, and focus on the semantic effect. 
* The algorithm inherits source code comments from the original, wherever sub-trees of the original and the
rewritten tree still line up. 
* If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits.
* If the second argument is not derived from the original, then the algorithm will produce a single text edit to replace the entire source text.
* If the second argument was not produced from the first in the same JVM memory, it will not share many pointers to equal sub-trees
and the performance of the algorithm will degenerate quickly.
* If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm. 
* If the original tree is not annotated with source locations, the algorithm fails.
* Both parse trees must be type correct, e.g. the number of symbols in a production rule, must be equal to the number of elements of the argument list of ((appl)).
* This algorithm does not work with ambiguous (sub)trees.
* When large sub-trees or sub-lists are moved to other parts of the tree, comment inheritance is not possible anymore. 
If we rewrite parse trees, this can be done with concrete syntax matching.
The following example swaps the if-branch with the else-branch in Pico:
```rascal-shell
import lang::pico::\syntax::Main;
import analysis::diff::edits::ExecuteTextEdits;
import analysis::diff::edits::TextEdits;
import analysis::diff::edits::HiFiTreeDiff;
// an example Pico program:
writeFile(|tmp:///example.pico|,
    '   declare
    '       a : natural,
    '       b : natural;
    '   if a then 
    '       a := b
    '   else
    '       b := a
    'end");
import ParseTree;
original = parse(#start[Program], |tmp:///example.pico|);
// match and replace all conditionals
rewritten = visit(original) {
    case (Statement) `if <Expression e> then <{Statement ";"}* ifBranch> else <{Statement ";"}* elseBranch> fi`
      => (Statement) `if <Expression e> then 
                     '  <{Statement ";"}* elseBranch> 
                     '  <{Statement ";"}* ifBranch> 
// Check the result as a string. It worked, but we see some collatoral damage in whitespace (indentation).
"<rewritten>"
// Now derive text edits from the two parse trees:
edits = treeDiff(original, rewritten);
// Wrap them in a single document edit
edit = changed(original@\loc.top, edits);
// Apply the document edit on disk:
executeDocumentEdits([edit]);
// and when we read the result back, we see the transformation succeeded, and indentation was not lost:
readFile(tmp://example.pico|);
// It's also possible to directly rewrite the original string, for debugging purposes:
executeTextEdits("<original>", treeDiff(original, rewritten))
// equal trees generate empty diffs (note this already ignores whitespace differences because non-linear matching ignores layout nodes)
list[TextEdit] treeDiff(Tree a, a) = [];
// skip production labels of original rules when diffing, to be able to focus on the Symbol constructor for downstream case-distinction
list[TextEdit] treeDiff(
    Tree t:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args), 
    Tree u)
    = treeDiff(appl(prod(s, syms, attrs), args)[@\loc=t@\loc?|bla:///|], u);
// skip production labels of original rules when diffing, to be able to focus on the Symbol constructor for downstream case-distinction
list[TextEdit] treeDiff(
    Tree t,
    Tree u:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args))
    = treeDiff(t, appl(prod(s, syms, attrs), args)[@\loc=u@\loc?|bla:///|]);
// matched layout trees generate empty diffs such that the original is maintained
list[TextEdit] treeDiff(
    appl(prod(layouts(_), _, _), list[Tree] _), 
    appl(prod(layouts(_), _, _), list[Tree] _))
// matched literal trees generate empty diffs 
list[TextEdit] treeDiff(
    appl(prod(lit(str l), _, _), list[Tree] _), 
    appl(prod(lit(l)    , _, _), list[Tree] _))
// matched case-insensitive literal trees generate empty diffs such that the original is maintained 
list[TextEdit] treeDiff(
    appl(prod(cilit(str l), _, _), list[Tree] _), 
    appl(prod(cilit(l)    , _, _), list[Tree] _))
// different lexicals generate small diffs even if the parent is equal. This avoids extremely small edits within the boundaries of single identifiers.
list[TextEdit] treeDiff(
    t:appl(prod(lex(str l), _, _), list[Tree] _), 
    r:appl(prod(lex(l)    , _, _), list[Tree] _))
    = [replace(t@\loc, learnIndentation(t@\loc, "<r>", "<t>"))]
    when t != r;
// Several ways of changing recursive expressions can be exploited to create very small diffs.
// This is comparable to finding large common sublists in the case of lists. If you look at 
// unary and binary associative operators as a kind of operator-separated lists, this makes sense.
// remove a unary prefix operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [lit(_), _, sort(exp)], _), [Tree op, _, Tree r]), 
    = [delete(fromUntil(op@\loc, r@\loc))];
// add a unary prefix operator
list[TextEdit] treeDiff(
    Tree r,
    appl(prod(sort(str exp), [lit(_), _, sort(exp)], _), [Tree op, Tree sp, r]))
    = [replace(beginOf(r@\loc), "<op><sp>")];
// remove a unary postfix operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [sort(str exp), _, lit(_)], _), [Tree l, _, Tree r]), 
    = [delete(cover([endOf(l@\loc), r@\loc]))];
// add a unary postfix operator
list[TextEdit] treeDiff(
    Tree l,
    appl(prod(sort(str exp), [sort(str exp), _, lit(_)], _), [l, Tree sp, Tree op]))
    = [replace(endOf(l), "<sp><op>")];
// remove the left hand side of a binary operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [Tree l, _, _, _, Tree r]), 
    = [delete(fromUntil(l@\loc, r@\loc))];
// add the left hand side of a binary operator
list[TextEdit] treeDiff(
    Tree r,
    appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [Tree l, Tree sp1, Tree op, Tree sp2, r]))
    = [replace(beginOf(r@\loc), "<l><sp1><op><sp2>")];
// remove the right hand side of a binary operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [Tree l, _, _, _, Tree r]), 
    = [delete(cover([endOf(l@\loc), r@\loc]))];
// add the right hand side of a binary operator
list[TextEdit] treeDiff(
    Tree l,
    appl(prod(sort(str exp), [sort(exp), _, lit(_), _, sort(exp)], _), [l, Tree sp1, Tree op, Tree sp2, Tree r]))
    = [replace(endOf(l@\loc), "<sp1><op><sp2><r>")];
// change a binary operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [sort(exp), _, lit(str x), _, sort(exp)], _), [Tree l, _, Tree op, _, Tree r]), 
    appl(prod(sort(str exp), [sort(exp), _, lit(str y), _, sort(exp)], _), [l, _, Tree newOp, _, r]))
    = [replace(op@\loc, "<newOp>")] when x != y;
// change a prefix operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [lit(str x), _, sort(exp)], _), [Tree op, _, Tree r]), 
    appl(prod(sort(str exp), [lit(str y), _, sort(exp)], _), [Tree newOp, _, r]))
    = [replace(op@\loc, "<newOp>")] when x != y;
// change a postfix operator
list[TextEdit] treeDiff(
    appl(prod(sort(str exp), [sort(exp), _, lit(str x)], _), [Tree l, _, Tree op]), 
    appl(prod(sort(str exp), [sort(exp), _, lit(str y)], _), [l, _, Tree newOp]))
    = [replace(op@\loc, "<newOp>")] when x != y;
// remove brackets
list[TextEdit] treeDiff(
    t:appl(prod(sort(str exp), [lit(_), _, sort(exp), _, lit(_)], {*_, \bracket()}), [_, _, Tree e, _, _]), 
        delete(fromUntil(beginOf(t@\loc), e@\loc)),
        delete(fromUntil(endOf(e@\loc), endOf(t@\loc)))
// add brackets
list[TextEdit] treeDiff(
    Tree e,
    t:appl(prod(sort(str exp), [lit(_), _, sort(exp), _, lit(_)], {*_, \bracket()}), [Tree open, Tree sp1, Tree e, Tree sp2, Tree close]))
        replace(beginOf(e@\loc), "<open><sp1>"),
        replace(endOf(e@\loc), "<sp2><close>")
// When the productions are different, we've found an edit, and there is no need to recurse deeper.
default list[TextEdit] treeDiff(
    t:appl(Production p:prod(_,_,_), list[Tree] _), 
    r:appl(Production q:!p         , list[Tree] _))
    = [replace(t@\loc, learnIndentation(t@\loc, "<r>", "<t>"))];
// If list production are the same, then the element lists can still be of different length
// and we switch to listDiff which has different heuristics than normal trees to detect large identical sublists.
list[TextEdit] treeDiff(
    Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), 
    appl(p, list[Tree] bElems))
    = listDiff(t@\loc, seps(reg), aElems, bElems);
// When the productions are equal, but the children may be different, we dig deeper for differences
default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
    = [*treeDiff(a, b) | <a,b> <- zip2(argsA, argsB)];
@synopsis{decide how many separators we have}
private int seps(\iter-seps(_, list[Symbol] s))      = size(s);
private int seps(\iter-star-seps(_, list[Symbol] s)) = size(s);
private default int seps(Symbol _)                   = 0;
@synopsis{List diff finds minimal differences between the elements of two lists.}
@description{
This algorithm uses heuristics to avoid searching for the largest common sublist all too often.
Also it minimized the sublists that largest common sublist is executed on. 
1. Since many patches to parse tree lists typically only change a prefix or a postfix, and we 
can detect this quickly, we first extract patches for those instances.
2. It is also fast and easy to detect unchanged prefixes and postfixes, so by focusing
on the changes parts in the middle we generate more instances of case 1.
3. Another simple and quick case is when simply all elements are different (the prefix==the list==the postfix)
3. What we are left with is either an empty list and we are done, or a more complex situation
where we apply the "findEqualSubList" algorithm, which splits the list in three parts:
   * two unequal prefixes
   * two equal sublists in the middle
   * two unequal postfixes
4. the algorithm then concatenates the diffs by recursing to step 1 on the prefixes and the diffs by recursing to step 1. on the postfixes 
5. two empty lists terminate the recursion,
list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) {
    edits = [];
    // this algorithm isolates commonalities between the two lists
    // by handling different special cases. It continues always with
    // what is left to be different. By maximizing commonalities,
    // the edits are minimized. Note that we float on source location parameters
    // not only for the edit locations but also for sub-tree identity.
    <span, originals, replacements> = trimEqualElements(span, originals, replacements);
    <specialEdits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements);
    edits += specialEdits;
    equalSubList = findEqualSubList(originals, replacements);
    // by using the (or "a") largest common sublist as a pivot to divide-and-conquer
    // to the left and right of it, we minimize the number of necessary 
    // edit actions for the entire list.
    if (equalSubList != [],
        [*preO, *equalSubList, *postO] := originals,
        [*preR, *equalSubList, *postR] := replacements) {
        // TODO: what about the separators?
        // we align the prefixes and the postfixes and
        // continue recursively.
        return edits 
            + listDiff(beginCover(span, preO), seps, preO, preR)   
            + listDiff(endCover(span, postO), seps, postO, postR)
    else if (originals == [], replacements == []) {
        return edits;
    else { 
        // here we know there are no common elements anymore, only a common amount of different elements
        common = min([size(originals), size(replacements)]);
        return edits 
            // first the minimal length pairwise replacements, essential for finding accidental commonalities
            + [*treeDiff(a, b) | <a, b> <- zip2(originals[..common], replacements[..common])]
            // then we either remove the tail that became shorter:
            + [replace(cover([after(last@\loc), cover(originals[common+1..])]), "") | size(originals) > size(replacements), [*_, last] := originals[..common]]
            // or we add new elements to the end, while inheriting indentation from the originals:
            + [replace(after(span), learnIndentation(span, yield(replacements[common..]), yield(originals))) | size(originals) < size(replacements)]
@synopsis{Finds the largest sublist that occurs in both lists}
@description{
Using list matching and backtracking, this algorithm detects which common
sublist is the largest. It assumes ((trimEqualElements)) has happened already,
and thus there are interesting differences left, even if we remove any equal
Note that this is not a general algorithm for Largest Common Subsequence (LCS), since it
uses particular properties of the relation between the original and the replacement list:
* New elements are never equal to old elements (due to source locations)
* Equal prefixes and postfixes may be assumed to be maximal sublists as well (see above).
* Candidate equal sublists always have consecutive source locations from the origin.
list[Tree] findEqualSubList([*Tree sub], [*_, *sub, *_]) = sub;
list[Tree] findEqualSubList([*_, *Tree sub, *_], [*sub]) = sub;
list[Tree] findEqualSubList([*_, p, *Tree sub, q, *_], [*_, !p, *sub, !q, *_]) = sub;
default list[Tree] findEqualSubList(list[Tree] _orig, list[Tree] _repl) = [];
@synopsis{trips equal elements from the front and the back of both lists, if any.}
tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, 
    [Tree a, *Tree aPostfix], [ a, *Tree bPostfix])
    = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix);
tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, 
    [*Tree aPrefix, Tree a], [*Tree bPrefix, a])
    = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix);
default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, 
    list[Tree] a, list[Tree] b)
    = <span, a, b>;
// only one element removed in front, then we are done
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 0, [Tree a, *Tree tail], [*tail])
    = <[replace(a@\loc, "")], [], []>;
// only one element removed in front, plus 1 separator, then we are done because everything is the same
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 1, 
    [Tree a, Tree _sep, Tree tHead, *Tree tail], [tHead, *tail])
    = <[replace(fromUntil(a, tHead), "")], [], []>;
// only one element removed in front, plus 1 separator, then we are done because everything is the same
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3, 
    [Tree a, Tree _l1, Tree _sep, Tree _l2, Tree tHead, *Tree tail], [tHead, *tail])
    = <[replace(fromUntil(a, tHead), "")], [], []>;
// singleton replacement
tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, 
    [Tree a], [Tree b])
    = <treeDiff(a, b), [], []>;
default tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, list[Tree] a, list[Tree] b)
    = <[], a, b>;
@synopsis{convenience overload for shorter code}
private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc);
@synopsis{Compute location span that is common between an element and a succeeding element}
@description{
The resulting loc is including the `from` but excluding the `until`. It goes right
up to `until`.
```ascii-art
 [from] gap [until]
 <--------->
private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset);
private int end(loc src) = src.offset + src.length;
private loc endOf(loc src) = src.top[offset=src.offset + src.length][length=0];
private loc beginOf(loc src) = src.top[offset=src.offset][length=0];
private loc after(loc src) = src(end(src), 0);
private loc endCover(loc span, []) = span(span.offset + span.length, 0);
private loc endCover(loc span, [Tree x]) = x@\loc;
private default loc endCover(loc span, list[Tree] l) = cover(l);
private loc beginCover(loc span, []) = span(span.offset, 0);
private loc beginCover(loc span, [Tree x]) = x@\loc;
private default loc beginCover(loc span, list[Tree] l) = cover(l);
private loc cover(list[Tree] elems:[_, *_]) = cover([e@\loc | Tree e <- elems, e@\loc?]);
@synopsis{yield a consecutive list of trees}
private str yield(list[Tree] elems) = "<for (e <- elems) {><e><}>";
@synopsis{Make sure the subtitution is at least as far indented as the original}
@description{
This algorithm ignores the first line, since the first line is always preceeded by the layout of a parent node.
Then it measures the depth of indentation of every line in the original, and takes the minimum.
That minimum indentation is stripped off every line that already has that much indentation in the replacement,
and then _all_ lines are re-indented with the discovered minimum.
private str learnIndentation(loc span, str replacement, str original) {
    list[str] indents(str text) = [indent | /^<indent:[\t\ ]*>[^\ \t]/ <- split("\n", text)];
    origIndents = indents(original);
    replLines   = split("\n", replacement);
    if (replLines == []) {
        return "";
    minIndent = "";
    if ([_] := origIndents) {
        // only one line. have to invent indentation from span
        minIndent = "<for (_ <- [0..(span.begin?) ? span.begin.column : 0]) {> <}>";
        // we skip the first line for learning indentation, because that one would typically be embedded in a previous line.
        minIndent = sort(origIndents[1..])[0]? "";
    // we remove the leading spaces _up to_ the minimal indentation of the original, 
    // keep the rest of the indentation from the replacement (if any is left), and then the actual content.
    // that entire multiline result is then lazily indented with the minimal indentation we learned from the original.
    return indent(
        minIndent, 
        "<for (/^<theInd:[\t\ ]*><rest:[^\t\ ]+.*>$/ <- replLines) {><theInd[..max(size(minIndent), span.begin? ? span.begin.column : 0)]><rest>
        '<}>"[..-1])
Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

HiFiTreeDiff.rsc

Latest commit

History

HiFiTreeDiff.rsc

File metadata and controls