Skip to content

Commit b6077a6

Browse files
authored
Add the Statis Analysis Blog Post (#18)
1 parent 1fb92ad commit b6077a6

1 file changed

Lines changed: 243 additions & 0 deletions

File tree

Lines changed: 243 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,243 @@
1+
---
2+
layout: post
3+
title: "Cross-Translation Unit Taint Analysis for Firefox"
4+
date: 2025-12-16 10:06:37 +0100
5+
author: Balázs Benics, Tom Ritter, Frederik Braun
6+
---
7+
8+
## Preface
9+
10+
Browser security is a cutting edge frontier for exploit mitigations,
11+
addressing bug classes holistically, and identifying vulnerabilities. Not
12+
everything we try works, and we think it's important to document our
13+
shortcomings in addition to our successes. A responsible project uses all
14+
available tools to find bugs and vulnerabilities before you ship. Besides
15+
many other tools and techniques, Firefox uses Clang Tidy and the Clang
16+
Static Analyzer, including many customized checks for enforcing the coding
17+
conventions in the project. To extend these tools, Mozilla contacted Balázs,
18+
as one of the maintainers of the Clang Static Analyzer, to help address
19+
problems encountered when exploring Cross Translation Unit (CTU) Static
20+
Analysis. Ultimately, we weren't able to make as much headway with this
21+
project as we hoped, but we wanted to contribute our experience to the
22+
community and hopefully inspire future work. Be warned, this is a highly
23+
technical blog post.
24+
25+
The following sections describe some fundamental concepts, such as taint
26+
analysis, CTU, the Clang Static Analyzer engine. This will be followed by
27+
the problem statement and the solution. Finally, some closing words.
28+
29+
Disclaimer: The work described here was sponsored by Mozilla.
30+
31+
## Static Analysis Fundamentals
32+
33+
### Taint analysis
34+
35+
Vulnerabilities often root from using untrusted data in some way. Data from
36+
such sources is called "tainted" in static analysis, and "taint analysis" is the
37+
technique that tracks how such "tainted" values propagate or "flow"
38+
throughout the program.
39+
40+
In short, "Taint sources" introduce a flow, such as reading from a socket. If
41+
a tainted value reaches a "taint sink" then we should report an error. These
42+
"sources" and "sinks" are often configurable.
43+
44+
A [YAML configuration](https://clang.llvm.org/docs/analyzer/user-docs/TaintAnalysisConfiguration.html) file can be used with the Clang Static Analyzer
45+
configuring the taint rules.
46+
47+
### Cross Translation Unit (CTU) analysis
48+
49+
The steps involved in bugs or vulnerabilities might cross file boundaries.
50+
Conventional static analysis tools that operate on a translation-unit basis
51+
would not find the issue. Luckily, the Clang Static Analyzer offers [CTU
52+
mode](https://clang.llvm.org/docs/analyzer/user-docs/CrossTranslationUnit.html)
53+
that loads the relevant pieces of the required translation units to
54+
enhance the contextual view of the analysis target, thus increasing the
55+
covered execution paths. Running CTU needs a bit of setup, but luckily
56+
tools like [scan-build](https://clang.llvm.org/docs/analyzer/user-docs/CommandLineUsage.html#scan-build)
57+
or [CodeChecker](https://github.com/Ericsson/codechecker) have built-in support.
58+
59+
### Path-sensitive analysis
60+
61+
The Clang Static Analyzer implements a path-sensitive symbolic execution.
62+
[Here is an excellent talk](https://www.youtube.com/watch?v=g0Mqx1niUi0)
63+
but let us give a refresher here.
64+
65+
Basically, it interprets the abstract syntax tree (AST) of the analyzed C/C++
66+
program and builds up program facts statement by statement as it
67+
simulates different execution paths of the program. If it sees an `if`
68+
statement, it splits into two execution paths: one where the condition is
69+
assumed to be false, and another one where it's assumed to be true. Loops
70+
are handled slightly differently, but that’s not the point of this post today.
71+
72+
When the engine sees a function call, it will jump to the definition of the
73+
callee (if available) and continue the analysis there with the arguments we
74+
had at the call-site. We call this "inlining" in the Clang Static Analyzer. This
75+
makes this engine inter-procedural, in other words, reason across
76+
functions. Of course, this only works if it knows the callee. This means that
77+
without knowing the pointee of a function pointer or the dynamic type of a
78+
polymorphic object (that has virtual functions), it cannot "inline" the callee,
79+
which in turn means that the engine must conservatively relax the program
80+
facts it gathered so far because they might be changed by the callee.
81+
82+
For example, if we have some allocated memory, and we pass that pointer
83+
to such a function, then the engine must assume that the pointer was
84+
potentially released, and not raise leak warnings after this point.
85+
86+
The conclusion here is that following the control-flow is critical, and virtual
87+
functions limit our ability to reason about this if we don’t know the dynamic
88+
type of objects.
89+
90+
## So, taint analysis for Firefox?
91+
92+
### Firefox has a lot of virtual functions!
93+
94+
We discussed that control-flow is critical for taint analysis, and virtual
95+
functions ruin the control-flow. A browser has almost every code pattern
96+
you can imagine, and it so happens that many of the motivating use cases
97+
for this analysis involve virtual functions that also happen to cross file
98+
boundaries.
99+
100+
### Once upon a time...
101+
102+
It all started by Tom creating a couple of GitHub issues, like
103+
[#114270](https://github.com/llvm/llvm-project/issues/114270)
104+
(which prompted a couple smaller fixes that are not the subject of this port),
105+
and [#62663](https://github.com/llvm/llvm-project/issues/62663).
106+
107+
This latter one was blocked by not being able to follow the callees of virtual
108+
functions, kicking off this whole subject and the prototype.
109+
110+
### Plotting against virtual functions
111+
112+
#### The idea
113+
Let’s just look at the AST and build the inheritance graph. After that, if we
114+
see a virtual call to `data()`, we could check who overrides this method.
115+
116+
Let’s say only class `A` and `B` overrides this method in the translation unit.
117+
This means we could split the path into 2 and assume that on one path we
118+
call `A::data()` and on the other one `B::data()`.
119+
120+
```c++
121+
// class A... class B deriving from Base
122+
void func(Base *p) {
123+
p->data(); // ‘p’ might point to an object A or B here.
124+
}
125+
```
126+
127+
This looks nice and simple, and the core of the idea is solid. However, there
128+
are a couple of problems:
129+
130+
1. One translation unit (TU) might define a class `Derived`, overriding
131+
`data()`, and then pass a `Base` pointer to the other translation unit.
132+
And when that TU is analyzed, it shouldn’t be sure that only class `A`
133+
and `B` overrides `data()` just because it didn’t see `Derived` from the
134+
other TU.
135+
This is the problem with inheritance, which is an "open-set" relation.
136+
One cannot be sure to see the whole inheritance graph all at once.
137+
138+
2. It’s not only that `Derived` might be in a different TU, but it might be in
139+
a 3rd party library, and dynamically loaded at runtime. In this case,
140+
assuming a finite set of callees for a virtual function would be wrong.
141+
142+
#### Refining the idea
143+
144+
Fixing problem (2) is easy, as we should just assume that the list of
145+
potential callees always has an extra unknown callee, to have an execution
146+
path where the call is conservatively evaluated and do the invalidations -
147+
just like before.
148+
149+
Fixing problem (1) is more challenging because we need whole-program
150+
analysis. We need to create the inheritance graphs of each TU and then
151+
merge them into a unified graph. Once we’ve built that, we can run the
152+
Clang Static Analyzer and start reasoning about the overriders of virtual
153+
functions in the whole project.
154+
Consequently, in the example we discussed before, we would know that
155+
class `A`, `B` and (crucially) `Derived` overrides `data()`. So after the call,
156+
we would have 4 execution paths: `A`, `B`, `Derived` and the last path is for
157+
the unknown case (like potentially dynamically loading some library that
158+
overrides this method).
159+
160+
#### It sounds great, but does it work?
161+
162+
It does! The analysis gives a list of the potential overriders of a virtual
163+
function. The Clang Static Analyzer was modified to do the path splits we
164+
discussed and remember the dynamic type constraints we learn on the
165+
way. There is one catch though.
166+
167+
Some taint flows cross file boundaries, and the Clang Static Analyzer has
168+
CTU to counter this, right?
169+
170+
CTU uses the "[ASTImporter](https://clang.llvm.org/docs/LibASTImporter.html)",
171+
which is known to have infinite recursion,
172+
crashes and incomplete implementation in terms of what constructs it can
173+
import. There are plenty of examples, but one we encountered
174+
was [#123093](https://github.com/llvm/llvm-project/issues/123093).
175+
176+
Usually fixing one of these is time consuming and needs a deep
177+
understanding of the ASTImporter. And even if you fix one of these, there
178+
will be plenty of others to follow.
179+
180+
This patch for "devirtualizing" virtual function calls didn’t really help with the
181+
reliability of the ASTImporter. As the interesting taint flows cross file
182+
boundaries, the benefits of this new feature are unfortunately limited by the
183+
ASTImporter for Firefox.
184+
185+
### Is it available in the Clang Static Analyzer already?
186+
187+
Unfortunately no, and as the contract was over, it is unlikely that these
188+
patches would merge upstream without others splitting up the patches and
189+
doing the labor of proposing them upstream. Note that this whole program
190+
analysis is a brand new feature and it was just a quick prototype to check
191+
the viability.
192+
193+
Upstreaming would likely also need some wider consensus about the
194+
design.
195+
196+
Apparently, whole-project analyses could be important for other domains
197+
besides bug-finding, such as code rewriting tools, which was the motivation
198+
for a
199+
[recently posted RFC](https://discourse.llvm.org/t/rfc-scalable-static-analysis-framework/88678).
200+
The proposed framework in that RFC could
201+
potentially also work for the use-case described in this blog post, but it’s
202+
important to highlight that this prototype was built before that RFC and
203+
framework, consequently it’s not using that.
204+
205+
Balázs shared that working on the prototype was really motivating at first,
206+
but as he started to hit the bugs in the ASTImporter - effectively blocking
207+
the prototype - development slowed down. All in all, the prototype proved
208+
that using project-level information, such as "overriders", could enable
209+
better control-flow modeling, but CTU analysis as we have it in Clang today
210+
will show its weaknesses when trying to resolve those calls. Without
211+
resolving these virtual calls, we can’t track taint flows across file boundaries
212+
in the Clang Static Analyzer.
213+
214+
### What does this mean for Firefox?
215+
216+
Not much, unfortunately. If the ASTImporter would work as expected, then
217+
finalizing the prototype would meaningfully improve taint analysis on code
218+
using virtual functions.
219+
220+
You can find the source code at Balázs’ GitHub repo as
221+
[steakhal/llvm-project/devirtualize-for-each-overrider](https://github.com/steakhal/llvm-project/tree/devirtualize-for-each-overrider),
222+
which served well for
223+
exploring and rapid prototyping but it is far from production quality.
224+
225+
### Bonus: We need to talk about the ASTImporter
226+
From the cases Balázs looked at, it seems like qualified names, such as
227+
`std` in `std::unique_ptr` for example, will trigger the import of a `std`
228+
[DeclContext](https://clang.llvm.org/doxygen/classclang_1_1DeclContext.html),
229+
which in turn triggers the import of all the declarations within
230+
that lexical declaration context. In other words, we start importing a lot
231+
more things than strictly necessary to make the `std::` qualification work.
232+
This in turn increases the chances of hitting something that causes a crash
233+
or just fails to import, poisoning the original AST we wanted to import into.
234+
This is likely not how it should work, and might be a good subject to discuss
235+
in the future.
236+
237+
Note that the ASTImporter can be configured to do so-called "[minimal
238+
imports](https://clang.llvm.org/docs/LibASTImporter.html#api)" which is
239+
probably what we should have for the Clang Static
240+
Analyzer, however,
241+
[this is not set](https://github.com/llvm/llvm-project/blob/4a5692d6b3a6276ef6a8b6a62ef187a16dd3f983/clang/lib/CrossTU/CrossTranslationUnit.cpp#L799-L801),
242+
and setting it would lead to even more crashes. Balázs didn’t investigate this
243+
further, but it might be something to explore in the future.

0 commit comments

Comments
 (0)