srcQL Queries
srcQL (source Query Language) is a query language for searching source code for snippets that match a syntactic or semantic pattern. srcQL is integrated into the srcML toolkit, which processes any given srcQL query and produces a corresponding XPath search.
The language is composed of three main components: source code expressions, scoping operators, and set operators.
A source code expression is an expression used by srcQL to specify what syntactic or semantic structure the query should search for.
A source code pattern (previously referred to as srcPat) is a small fragment of source code that is used to search for matches within the target source code. Source code patterns use the syntax of the language of the source code that the query is run on. For example, the following C++ pattern will match with any int x that is found within the target source code.
int x
Specifically, this query looks for the following srcML structure:
<decl><type><name>int</name></type> <name>x</name></decl>
All of the examples on this page are in C++, but source code patterns work for all languages that srcML supports.
Any fragment of source code can be used as a pattern. For example, the following pattern will match with any if statement within C++.
if () {}
Because source code fragments work as patterns, any scoping within a fragment will be taken into consideration while searching. For example, the following pattern will match with any if statement that has a while loop directly inside of its block.
if () { while() {} }
The order of syntax will also be taken into consideration. For example, the following query will match with any if statement that directly contains a while loop and a switch statement, with the while loop appearing before the switch statement.
if () {
while () {}
switch () {}
}
This pattern could be applied to the following source code, and will match only some of the if statements:
if (true) { // Will Match
while (true) { }
switch (val) { }
}
if (true) { // Will NOT Match
switch (val) { }
while (true) { }
}
if (true) { // Will Match
while (true) { }
int x = 0;
switch (val) { }
}
As shown in the third example above, the presence of something between the while and switch does not affect the match, so long as the general ordering and scoping match.
In addition to source code fragments, srcQL also allows logical variables to replace certain names and expressions within the source code.
Looking at the first example again, this pattern will only return declarations of type int with name x:
int x
This is very limited in what it will return. To expand the capabilities of patterns, x can be replaced with a logical variable:
int $N
This pattern will match any integer variable declaration, regardless of the name of the variable.
Logical variables can also replace the type in the declaration. The following pattern will match with any variable declaration of any type and name.
$T $N
Logical variables can also act as a wildcard for expressions. The following is a pattern that will match with any expression in the source code.
$E
srcML defines what constitutes an expression in this case. To view what valid expression subelements are, visit this page.
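As a rough illustration of how int x, int $N, and $T $N differ, consider the C++ fragment below. The function and variable names are invented for this sketch, and the comments only restate the matching rules described above; they are not output from the tool.

```cpp
#include <string>

// Hypothetical function that exists only to hold a few declarations.
int declaration_examples() {
    int x;              // matched by: int x, int $N, $T $N
    int count;          // matched by: int $N, $T $N
    double ratio;       // matched by: $T $N only
    std::string label;  // matched by: $T $N only

    x = 1;
    count = 2;
    ratio = 0.5;
    label = "id";
    return x + count + static_cast<int>(ratio * 2) + static_cast<int>(label.size());
}
```

Each step from a literal name to a logical variable widens the set of declarations the pattern is expected to return.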
Logical variables can also act as a wildcard for generic text, with strings being the primary example:
"" // Will only match with empty strings
"x" // Will only match with strings that exactly match "x"
"$T" // Will match any string
"prefix$T" // Will match any string that starts with "prefix"
Because a source code pattern specifies only the minimum that must be present, patterns that contain a series (such as a parameter list) will also match series longer than themselves. For example, the following pattern will match with any function declaration:
$TYPE $FUNC();
The following pattern will match with any function declaration that has at least one parameter:
$TYPE $FUNC($A);
The following pattern will match with any function declaration that has at least two parameters, and so on:
$TYPE $FUNC($A,$B);
Subsumption is an intrinsic part of the source code pattern. For example, the if pattern from earlier will match with any if statement, regardless of what is in the condition or block.
if () {}
The following pattern from earlier matches with any if statement that has at least one while loop, though it may have many. It also allows any other statements to appear within the block. In addition, the while loop can have anything inside of its condition and block.
if () { while() {} }
Nesting this further yields the same results.
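The subsumption rules above can be sketched against a small piece of C++. All names here are hypothetical, and the comments describe which patterns should match according to the rules on this page, not verified tool output.

```cpp
// Declarations: each comment lists which of the three function patterns apply.
int zero_params();                      // $TYPE $FUNC();
int one_param(int a);                   // $TYPE $FUNC(); and $TYPE $FUNC($A);
int three_params(int a, int b, int c);  // all three, including $TYPE $FUNC($A,$B);

// The if statement below should match  if () { while() {} }  even though the
// block holds an extra declaration and a second while loop.
int drain(int n) {
    int steps = 0;
    if (n > 0) {
        int half = n / 2;
        while (n > half) { --n; ++steps; }
        while (n > 0) { --n; ++steps; }
    }
    return steps;
}
```

In both cases the pattern states a lower bound: extra parameters, extra statements, and non-empty conditions or blocks do not prevent a match.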
If the same logical variable is used multiple times, the query will only return code fragments in which every occurrence of that variable matches the same thing. This process is called Unification, and it allows patterns to search for more specific matches.
The following simple unification pattern will return any expression that adds two copies of a value together:
$X + $X
1 + 1 // Will Match
1 + 2 // Will NOT Match
x + x // Will Match
foo() + foo() // Will Match
bar(1) + bar(2) // Will NOT Match
Unification can occur on different variables at once. For example, the following pattern will find all functions that assign their parameter to a new variable:
$FTYPE $FUNC($TYPE $PARAM) { $TYPE $NEW = $PARAM; }
In that example, only the $TYPE and $PARAM variables undergo unification, because they appear twice. $FTYPE, $FUNC, and $NEW just act as wildcards since they only appear once.
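To make the roles of the unified variables concrete, here is a minimal C++ sketch with one function that should satisfy the pattern and one that should not. The function names are invented for illustration, and the match expectations follow from the unification rule above rather than from running the tool.

```cpp
// Should match: $TYPE binds to int for both the parameter and the new
// variable, and $PARAM (value) is used as the initializer.
long widen(int value) {
    int copy = value;
    return static_cast<long>(copy) * 2;
}

// Should NOT match: the new variable is a long while the parameter is an int,
// so $TYPE cannot bind to the same thing in both places.
long widen_directly(int value) {
    long copy = value;
    return copy * 2;
}
```

Both functions compute the same result, which shows that the pattern distinguishes them purely by structure.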
Unification can be used to search for highly specific patterns. For example:
$TYPE $FUNC($TYPE $PARAM) { $TYPE $RTN = $CALL($PARAM); return $RTN; }
The above pattern will match with all functions that:
- Have a parameter of the same type as their return type
- Create a new variable of that type, initialized from a function call that passes the parameter
- Return the new variable
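The three criteria above can be checked by hand against a short C++ sketch. The functions are hypothetical, and the comments state what the pattern is expected to do under the rules described on this page.

```cpp
// Not expected to match: it never creates a new variable from a call.
int square(int v) {
    return v * v;
}

// Expected to match: int parameter and int return type, a new int initialized
// from a call that passes the parameter, and that variable returned.
int squared_copy(int input) {
    int result = square(input);
    return result;
}

// Not expected to match: the return type (long) differs from the parameter
// type (int), so $TYPE cannot unify across all three positions.
long squared_wide(int input) {
    int result = square(input);
    return result;
}
```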
Instead of providing a source code pattern for an expression, a valid XPath can be substituted. XPaths can be used to specify a wider range of hierarchical structure within a source code expression, but cannot make use of logical variables or unification.
An example of a valid XPath is
//src:function
which will match the same things as
$T $U() { }
The provided XPath must start with either / or //, and cannot return anything other than a node set.
Lastly, a bare srcML tag can be used, which acts as a shortcut to simple XPaths.
The following three source expressions will match the same source code:
$T $U() { } -- Pattern
//src:function -- XPath
src:function -- srcML Tag
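As a small sketch of the kind of C++ these three expressions describe, consider the fragment below. The function names are made up, and the remark about declarations is an assumption about how srcML marks up C++, not something stated on this page.

```cpp
// A definition with a body: the kind of code all three expressions describe.
int answer() {
    return 42;
}

// Assumption: srcML marks up a declaration without a body as function_decl
// rather than function, so //src:function would not be expected to select it.
int answer_later();
```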