Static analysis, part II#28816
Open
alexduf wants to merge 14 commits into
Open
Conversation
This simplifies a lot of things
- naming - infinite recursion protection - better model - faster semantic db access
- types - naming
Contributor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The frontend repository for the Guardian's website was created in February 2012: 14 years ago.
It has had an increasing number of features and services added, leading this repo becoming pretty big.
Parallel to that, the DCAR project was started with the objective of centralising all the rendering happening at the Guardian.
We're getting closer to all the rendering happening in that new project, but at this stage we find ourselves unable to easily answer a question: what rendering is still happening from within the frontend repo?
More generally, this is the beginning of a larger piece of work: removing unused or undesirable code from this repo, at scale.
Measuring, Measuring, Measuring
The first step to understanding which endpoints are being used, is to look at live traffic data.
This work has been done already and is using the ALB access logs as well as the routes definitions of this application to undesrstand which endpoints are actively in use.
Digging deeper
If we know which endpoints are being used, and to what extent, we can now start looking at which endpoints are rendering HTML content by looking at the code.
On a smaller application, this is something that could have been done manually by reading the code, helped with an IDE.
But this project is at the scale where it makes sense to spend a few days writing the tool to help us do that for us, which explain the existence of this module.
This module leverages the scala-meta library to map all the twirl templates to their corresponding controller(s).
This is done by constructing "call hierarchies" (also known as call graphs), where the root of the tree is a twirl template, and we find all the places where this template is called, recusrively.
The end result is a tree data-structure that describes all the possible code paths to the twirl template across the application. Here's an edited down version to illustrate it:
But why pushing this to the frontend repo?
The work that went into writing this static analysis tool for this one specific question is likely to be re-used when we continue cleaning the codebase from unused features.
For instance, now that we're able to create call hierarchies we could also look at identifying which method could be safely removed once we've decided which controller entrypoints can be dropped.
Once the cleanup is finished, then this module can be entirely dropped, or spun into its own repo.
End result and next steps
This PR proposes a module that is capable of generating a csv that contains the mapping between endpoints (controller methods), and twirl templates.
The next step is to ingest this CSV as well as the aggregated live traffic data into a small local database (sqlite) and to understand:
About the review of this code
This code does not come with the usual unit tests and general rigour that is expected in production, simply because it is NOT production code. It is only meant to run locally by engineers.