srcQL Queries
srcQL (source Query Language) is a query language for searching source code for snippets that match a syntactic or semantic pattern. srcQL is integrated into the srcML toolkit, which processes a given srcQL query and produces a corresponding XPath search.
The language is composed of three main components: source code expressions, scoping operators, and set operators.
A source code expression is an expression used by srcQL to specify what syntactic or semantic structure the query should search for.
A source code pattern (previously referred to as srcPat) is a small fragment of source code that is used to search for matches within the target source code. Source code patterns use the syntax of the language of the source code that the query is being run on. For example, the following C++ pattern,
int x
will match any int x found within the target source code. Specifically, this query looks for the following srcML structure:
<decl><type><name>int</name></type> <name>x</name></decl>
Any fragment of source code can be used as a pattern. For example,
if () {}
is a valid pattern that will match any if statement in C++.
Because source code fragments work as patterns, any scoping within a fragment will be taken into consideration while searching. For example,
if () { while() {} }
will match any if statement that has a while loop directly inside its block.
The order of syntax will also be taken into consideration. For example, the following query,
if () {
    while () {}
    switch () {}
}
will match any if statement that directly contains a while loop and a switch statement, with the while loop appearing before the switch statement.
if (true) { // Will Match
    while (true) { }
    switch (val) { }
}
if (true) { // Will NOT Match
    switch (val) { }
    while (true) { }
}
if (true) { // Will Match
    while (true) { }
    int x = 0;
    switch (val) { }
}
As shown in the third example above, statements between the while and the switch do not affect the match, as long as the general ordering and scoping match.
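The ordering rule described above can be pictured as an ordered-subsequence check over the statements directly inside a block. The following is a minimal sketch of that idea only, not how srcQL is implemented; the function name matches_in_order and the use of plain statement-kind strings are assumptions made for the illustration.

```python
def matches_in_order(pattern, children):
    """Return True if every item of `pattern` appears in `children`
    in the same relative order (other items may sit in between)."""
    remaining = iter(children)
    # `kind in remaining` advances the iterator past the first match,
    # so later pattern items can only match later children.
    return all(kind in remaining for kind in pattern)


pattern = ["while", "switch"]  # statements inside the pattern's if block

print(matches_in_order(pattern, ["while", "switch"]))               # True
print(matches_in_order(pattern, ["switch", "while"]))               # False
print(matches_in_order(pattern, ["while", "decl_stmt", "switch"]))  # True
```

The three calls mirror the three if statements in the examples: extra statements between the while and the switch are skipped over, but a reversed order is rejected.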
In addition to source code fragments, srcQL also allows the presence of Syntactic Variables to replace certain names and expressions within the source code.
Looking at the first example again,
int x
This pattern will only return declarations of type int with name x, which is very limited. To solve this, x can be replaced with a syntactic variable:
int $N
This pattern will match any int variable declaration, regardless of the name of the variable.
Syntactic variables can also replace the type in the declaration, giving,
$T $N
which will match any variable declaration of any type and name.
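To make the effect of syntactic variables on declarations concrete, here is a rough sketch using Python's standard xml.etree.ElementTree module on a hand-written, simplified srcML fragment. It is not srcQL's matching engine; the helper find_decls and its treatment of any pattern part starting with $ as a wildcard are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified srcML for: int x; int count; double x;
# (real srcML output also places these elements in an XML namespace)
SRCML = """
<unit>
  <decl><type><name>int</name></type> <name>x</name></decl>
  <decl><type><name>int</name></type> <name>count</name></decl>
  <decl><type><name>double</name></type> <name>x</name></decl>
</unit>
"""


def find_decls(root, type_pat, name_pat):
    """Return (type, name) pairs for declarations matching the pattern.

    A pattern part starting with '$' is treated as a syntactic variable
    and accepts any text; anything else must match exactly.
    """
    def accepts(pat, text):
        return pat.startswith("$") or pat == text

    found = []
    for decl in root.iter("decl"):
        type_name = decl.findtext("type/name")  # name nested inside <type>
        var_name = decl.findtext("name")        # direct <name> child of <decl>
        if accepts(type_pat, type_name) and accepts(name_pat, var_name):
            found.append((type_name, var_name))
    return found


root = ET.fromstring(SRCML)
print(find_decls(root, "int", "x"))   # [('int', 'x')]
print(find_decls(root, "int", "$N"))  # [('int', 'x'), ('int', 'count')]
print(find_decls(root, "$T", "$N"))   # all three declarations
```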
Syntactic variables can also act as a wildcard for expressions,
$E
is a pattern that will match any expression in the source code. srcML defines what constitutes an expression in this case.
Syntactic variables can also act as a wildcard for generic text, with strings being the primary example:
"" // Will only match with empty strings
"x" // Will only match with strings that exactly match "x"
"$T" // Will match any string
"prefix$T" // Will match any string that starts with "prefix"
Because a source code pattern only requires the minimum structure it specifies, a pattern that contains a series (such as a parameter list) will also match series longer than its own. For example,
$TYPE $FUNC();
is a pattern that will match any function declaration,
$TYPE $FUNC($A);
will match any function declaration that has at least one parameter, and
$TYPE $FUNC($A,$B);
will match any function declaration that has at least two parameters, and so on.
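The "at least this many" behaviour of the three function declaration patterns can be sketched as a lower bound on the number of parameters. The example below uses hand-written, simplified srcML and Python's standard xml.etree.ElementTree module; the helper decls_with_at_least is hypothetical and only illustrates the counting rule, not how srcQL evaluates these patterns.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified srcML for three C++ function declarations:
#   void f();   int g(int a);   int h(int a, int b);
SRCML = """
<unit>
  <function_decl><type><name>void</name></type> <name>f</name><parameter_list>()</parameter_list>;</function_decl>
  <function_decl><type><name>int</name></type> <name>g</name><parameter_list>(<parameter><decl><type><name>int</name></type> <name>a</name></decl></parameter>)</parameter_list>;</function_decl>
  <function_decl><type><name>int</name></type> <name>h</name><parameter_list>(<parameter><decl><type><name>int</name></type> <name>a</name></decl></parameter>, <parameter><decl><type><name>int</name></type> <name>b</name></decl></parameter>)</parameter_list>;</function_decl>
</unit>
"""


def decls_with_at_least(root, min_params):
    """Names of function declarations with at least `min_params` parameters."""
    names = []
    for func in root.iter("function_decl"):
        params = func.findall("parameter_list/parameter")
        if len(params) >= min_params:
            names.append(func.findtext("name"))
    return names


root = ET.fromstring(SRCML)
print(decls_with_at_least(root, 0))  # like $TYPE $FUNC();       -> ['f', 'g', 'h']
print(decls_with_at_least(root, 1))  # like $TYPE $FUNC($A);     -> ['g', 'h']
print(decls_with_at_least(root, 2))  # like $TYPE $FUNC($A,$B);  -> ['h']
```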
If the same syntactic variable is used multiple times in a pattern, the query will only return code fragments in which every occurrence of that variable matches the same code. For more information, see the Unification section.
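As a rough picture of what a repeated syntactic variable means, the sketch below matches a pattern against a flat list of tokens and records one binding per variable. This is an assumption-laden toy, not srcQL's unification: the function unify, the token-list representation, and the sample pattern $X = $X + $Y; are all made up for the illustration.

```python
def unify(pattern, tokens):
    """Match `pattern` against `tokens` position by position.

    Items starting with '$' are syntactic variables. The first time a
    variable is seen it is bound to the token at that position; every
    later use must see the same token. Returns the bindings, or None.
    """
    if len(pattern) != len(tokens):
        return None
    bindings = {}
    for pat, tok in zip(pattern, tokens):
        if pat.startswith("$"):
            if bindings.setdefault(pat, tok) != tok:
                return None  # same variable, different code: no match
        elif pat != tok:
            return None      # literal token mismatch
    return bindings


pattern = ["$X", "=", "$X", "+", "$Y", ";"]
print(unify(pattern, ["a", "=", "a", "+", "b", ";"]))  # {'$X': 'a', '$Y': 'b'}
print(unify(pattern, ["a", "=", "c", "+", "b", ";"]))  # None
```

In the second call $X would have to be both a and c, so the fragment is rejected even though each position matches on its own.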
Instead of providing a source code pattern for an expression, a valid XPath expression can be substituted. XPath can specify a wider range of hierarchical structure within a source code expression, but cannot make use of syntactic variables or unification.
An example of a valid XPath is,
//src:function
which is equivalent to
$T $U() { }
The provided XPath must start with // and must return a node set.
Lastly, a bare srcML tag can be used, which acts as a shortcut to simple XPaths.
The following 3 source expressions are equivalent:
$T $U() { }
//src:function
src:function
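To see what the //src:function form selects, the XPath can be tried against srcML output outside of srcQL. The sketch below uses Python's standard xml.etree.ElementTree module on a hand-written, simplified srcML document; the namespace URI is an assumption and should be checked against your own srcML output, and the sample functions are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed srcML namespace URI; check it against your own srcML output.
NS = {"src": "http://www.srcML.org/srcML/src"}

# Hand-written, simplified srcML for: int main() { }  and  void f();
SRCML = """
<unit xmlns="http://www.srcML.org/srcML/src" language="C++">
  <function><type><name>int</name></type> <name>main</name><parameter_list>()</parameter_list> <block>{ }</block></function>
  <function_decl><type><name>void</name></type> <name>f</name><parameter_list>()</parameter_list>;</function_decl>
</unit>
"""

root = ET.fromstring(SRCML)

# ElementTree needs a leading '.' for a document-wide search, so the
# document's //src:function is written here as .//src:function.
functions = root.findall(".//src:function", NS)
names = [fn.findtext("src:name", namespaces=NS) for fn in functions]
print(names)  # ['main']
```

Only the function with a body is selected; in this simplified markup the declaration void f(); is a function_decl element, so it is not returned.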