|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Cross-Translation Unit Taint Analysis for Firefox" |
| 4 | +date: 2025-12-16 10:06:37 +0100 |
| 5 | +author: Balázs Benics, Tom Ritter, Frederik Braun |
| 6 | +--- |
| 7 | + |
| 8 | +## Preface |
| 9 | + |
| 10 | +Browser security is a cutting edge frontier for exploit mitigations, |
| 11 | +addressing bug classes holistically, and identifying vulnerabilities. Not |
| 12 | +everything we try works, and we think it's important to document our |
| 13 | +shortcomings in addition to our successes. A responsible project uses all |
| 14 | +available tools to find bugs and vulnerabilities before you ship. Besides |
| 15 | +many other tools and techniques, Firefox uses Clang Tidy and the Clang |
| 16 | +Static Analyzer, including many customized checks for enforcing the coding |
| 17 | +conventions in the project. To extend these tools, Mozilla contacted Balázs, |
| 18 | +as one of the maintainers of the Clang Static Analyzer, to help address |
| 19 | +problems encountered when exploring Cross Translation Unit (CTU) Static |
| 20 | +Analysis. Ultimately, we weren't able to make as much headway with this |
| 21 | +project as we hoped, but we wanted to contribute our experience to the |
| 22 | +community and hopefully inspire future work. Be warned, this is a highly |
| 23 | +technical blog post. |
| 24 | + |
| 25 | +The following sections describe some fundamental concepts, such as taint |
| 26 | +analysis, CTU, the Clang Static Analyzer engine. This will be followed by |
| 27 | +the problem statement and the solution. Finally, some closing words. |
| 28 | + |
| 29 | +Disclaimer: The work described here was sponsored by Mozilla. |
| 30 | + |
| 31 | +## Static Analysis Fundamentals |
| 32 | + |
| 33 | +### Taint analysis |
| 34 | + |
| 35 | +Vulnerabilities often root from using untrusted data in some way. Data from |
| 36 | +such sources is called "tainted" in static analysis, and "taint analysis" is the |
| 37 | +technique that tracks how such "tainted" values propagate or "flow" |
| 38 | +throughout the program. |
| 39 | + |
| 40 | +In short, "Taint sources" introduce a flow, such as reading from a socket. If |
| 41 | +a tainted value reaches a "taint sink" then we should report an error. These |
| 42 | +"sources" and "sinks" are often configurable. |
| 43 | + |
| 44 | +A [YAML configuration](https://clang.llvm.org/docs/analyzer/user-docs/TaintAnalysisConfiguration.html) file can be used with the Clang Static Analyzer |
| 45 | +configuring the taint rules. |
| 46 | + |
| 47 | +### Cross Translation Unit (CTU) analysis |
| 48 | + |
| 49 | +The steps involved in bugs or vulnerabilities might cross file boundaries. |
| 50 | +Conventional static analysis tools that operate on a translation-unit basis |
| 51 | +would not find the issue. Luckily, the Clang Static Analyzer offers [CTU |
| 52 | +mode](https://clang.llvm.org/docs/analyzer/user-docs/CrossTranslationUnit.html) |
| 53 | +that loads the relevant pieces of the required translation units to |
| 54 | +enhance the contextual view of the analysis target, thus increasing the |
| 55 | +covered execution paths. Running CTU needs a bit of setup, but luckily |
| 56 | +tools like [scan-build](https://clang.llvm.org/docs/analyzer/user-docs/CommandLineUsage.html#scan-build) |
| 57 | +or [CodeChecker](https://github.com/Ericsson/codechecker) have built-in support. |
| 58 | + |
| 59 | +### Path-sensitive analysis |
| 60 | + |
| 61 | +The Clang Static Analyzer implements a path-sensitive symbolic execution. |
| 62 | +[Here is an excellent talk](https://www.youtube.com/watch?v=g0Mqx1niUi0) |
| 63 | +but let us give a refresher here. |
| 64 | + |
| 65 | +Basically, it interprets the abstract syntax tree (AST) of the analyzed C/C++ |
| 66 | +program and builds up program facts statement by statement as it |
| 67 | +simulates different execution paths of the program. If it sees an `if` |
| 68 | +statement, it splits into two execution paths: one where the condition is |
| 69 | +assumed to be false, and another one where it's assumed to be true. Loops |
| 70 | +are handled slightly differently, but that’s not the point of this post today. |
| 71 | + |
| 72 | +When the engine sees a function call, it will jump to the definition of the |
| 73 | +callee (if available) and continue the analysis there with the arguments we |
| 74 | +had at the call-site. We call this "inlining" in the Clang Static Analyzer. This |
| 75 | +makes this engine inter-procedural, in other words, reason across |
| 76 | +functions. Of course, this only works if it knows the callee. This means that |
| 77 | +without knowing the pointee of a function pointer or the dynamic type of a |
| 78 | +polymorphic object (that has virtual functions), it cannot "inline" the callee, |
| 79 | +which in turn means that the engine must conservatively relax the program |
| 80 | +facts it gathered so far because they might be changed by the callee. |
| 81 | + |
| 82 | +For example, if we have some allocated memory, and we pass that pointer |
| 83 | +to such a function, then the engine must assume that the pointer was |
| 84 | +potentially released, and not raise leak warnings after this point. |
| 85 | + |
| 86 | +The conclusion here is that following the control-flow is critical, and virtual |
| 87 | +functions limit our ability to reason about this if we don’t know the dynamic |
| 88 | +type of objects. |
| 89 | + |
| 90 | +## So, taint analysis for Firefox? |
| 91 | + |
| 92 | +### Firefox has a lot of virtual functions! |
| 93 | + |
| 94 | +We discussed that control-flow is critical for taint analysis, and virtual |
| 95 | +functions ruin the control-flow. A browser has almost every code pattern |
| 96 | +you can imagine, and it so happens that many of the motivating use cases |
| 97 | +for this analysis involve virtual functions that also happen to cross file |
| 98 | +boundaries. |
| 99 | + |
| 100 | +### Once upon a time... |
| 101 | + |
| 102 | +It all started by Tom creating a couple of GitHub issues, like |
| 103 | +[#114270](https://github.com/llvm/llvm-project/issues/114270) |
| 104 | +(which prompted a couple smaller fixes that are not the subject of this port), |
| 105 | +and [#62663](https://github.com/llvm/llvm-project/issues/62663). |
| 106 | + |
| 107 | +This latter one was blocked by not being able to follow the callees of virtual |
| 108 | +functions, kicking off this whole subject and the prototype. |
| 109 | + |
| 110 | +### Plotting against virtual functions |
| 111 | + |
| 112 | +#### The idea |
| 113 | +Let’s just look at the AST and build the inheritance graph. After that, if we |
| 114 | +see a virtual call to `data()`, we could check who overrides this method. |
| 115 | + |
| 116 | +Let’s say only class `A` and `B` overrides this method in the translation unit. |
| 117 | +This means we could split the path into 2 and assume that on one path we |
| 118 | +call `A::data()` and on the other one `B::data()`. |
| 119 | + |
| 120 | +```c++ |
| 121 | +// class A... class B deriving from Base |
| 122 | +void func(Base *p) { |
| 123 | +p->data(); // ‘p’ might point to an object A or B here. |
| 124 | +} |
| 125 | +``` |
| 126 | +
|
| 127 | +This looks nice and simple, and the core of the idea is solid. However, there |
| 128 | +are a couple of problems: |
| 129 | +
|
| 130 | + 1. One translation unit (TU) might define a class `Derived`, overriding |
| 131 | +`data()`, and then pass a `Base` pointer to the other translation unit. |
| 132 | +And when that TU is analyzed, it shouldn’t be sure that only class `A` |
| 133 | +and `B` overrides `data()` just because it didn’t see `Derived` from the |
| 134 | +other TU. |
| 135 | +This is the problem with inheritance, which is an "open-set" relation. |
| 136 | +One cannot be sure to see the whole inheritance graph all at once. |
| 137 | +
|
| 138 | + 2. It’s not only that `Derived` might be in a different TU, but it might be in |
| 139 | +a 3rd party library, and dynamically loaded at runtime. In this case, |
| 140 | +assuming a finite set of callees for a virtual function would be wrong. |
| 141 | +
|
| 142 | +#### Refining the idea |
| 143 | +
|
| 144 | +Fixing problem (2) is easy, as we should just assume that the list of |
| 145 | +potential callees always has an extra unknown callee, to have an execution |
| 146 | +path where the call is conservatively evaluated and do the invalidations - |
| 147 | +just like before. |
| 148 | +
|
| 149 | +Fixing problem (1) is more challenging because we need whole-program |
| 150 | +analysis. We need to create the inheritance graphs of each TU and then |
| 151 | +merge them into a unified graph. Once we’ve built that, we can run the |
| 152 | +Clang Static Analyzer and start reasoning about the overriders of virtual |
| 153 | +functions in the whole project. |
| 154 | +Consequently, in the example we discussed before, we would know that |
| 155 | +class `A`, `B` and (crucially) `Derived` overrides `data()`. So after the call, |
| 156 | +we would have 4 execution paths: `A`, `B`, `Derived` and the last path is for |
| 157 | +the unknown case (like potentially dynamically loading some library that |
| 158 | +overrides this method). |
| 159 | +
|
| 160 | +#### It sounds great, but does it work? |
| 161 | +
|
| 162 | +It does! The analysis gives a list of the potential overriders of a virtual |
| 163 | +function. The Clang Static Analyzer was modified to do the path splits we |
| 164 | +discussed and remember the dynamic type constraints we learn on the |
| 165 | +way. There is one catch though. |
| 166 | +
|
| 167 | +Some taint flows cross file boundaries, and the Clang Static Analyzer has |
| 168 | +CTU to counter this, right? |
| 169 | +
|
| 170 | +CTU uses the "[ASTImporter](https://clang.llvm.org/docs/LibASTImporter.html)", |
| 171 | +which is known to have infinite recursion, |
| 172 | +crashes and incomplete implementation in terms of what constructs it can |
| 173 | +import. There are plenty of examples, but one we encountered |
| 174 | +was [#123093](https://github.com/llvm/llvm-project/issues/123093). |
| 175 | +
|
| 176 | +Usually fixing one of these is time consuming and needs a deep |
| 177 | +understanding of the ASTImporter. And even if you fix one of these, there |
| 178 | +will be plenty of others to follow. |
| 179 | +
|
| 180 | +This patch for "devirtualizing" virtual function calls didn’t really help with the |
| 181 | +reliability of the ASTImporter. As the interesting taint flows cross file |
| 182 | +boundaries, the benefits of this new feature are unfortunately limited by the |
| 183 | +ASTImporter for Firefox. |
| 184 | +
|
| 185 | +### Is it available in the Clang Static Analyzer already? |
| 186 | +
|
| 187 | +Unfortunately no, and as the contract was over, it is unlikely that these |
| 188 | +patches would merge upstream without others splitting up the patches and |
| 189 | +doing the labor of proposing them upstream. Note that this whole program |
| 190 | +analysis is a brand new feature and it was just a quick prototype to check |
| 191 | +the viability. |
| 192 | +
|
| 193 | +Upstreaming would likely also need some wider consensus about the |
| 194 | +design. |
| 195 | +
|
| 196 | +Apparently, whole-project analyses could be important for other domains |
| 197 | +besides bug-finding, such as code rewriting tools, which was the motivation |
| 198 | +for a |
| 199 | +[recently posted RFC](https://discourse.llvm.org/t/rfc-scalable-static-analysis-framework/88678). |
| 200 | +The proposed framework in that RFC could |
| 201 | +potentially also work for the use-case described in this blog post, but it’s |
| 202 | +important to highlight that this prototype was built before that RFC and |
| 203 | +framework, consequently it’s not using that. |
| 204 | +
|
| 205 | +Balázs shared that working on the prototype was really motivating at first, |
| 206 | +but as he started to hit the bugs in the ASTImporter - effectively blocking |
| 207 | +the prototype - development slowed down. All in all, the prototype proved |
| 208 | +that using project-level information, such as "overriders", could enable |
| 209 | +better control-flow modeling, but CTU analysis as we have it in Clang today |
| 210 | +will show its weaknesses when trying to resolve those calls. Without |
| 211 | +resolving these virtual calls, we can’t track taint flows across file boundaries |
| 212 | +in the Clang Static Analyzer. |
| 213 | +
|
| 214 | +### What does this mean for Firefox? |
| 215 | +
|
| 216 | +Not much, unfortunately. If the ASTImporter would work as expected, then |
| 217 | +finalizing the prototype would meaningfully improve taint analysis on code |
| 218 | +using virtual functions. |
| 219 | +
|
| 220 | +You can find the source code at Balázs’ GitHub repo as |
| 221 | +[steakhal/llvm-project/devirtualize-for-each-overrider](https://github.com/steakhal/llvm-project/tree/devirtualize-for-each-overrider), |
| 222 | +which served well for |
| 223 | +exploring and rapid prototyping but it is far from production quality. |
| 224 | +
|
| 225 | +### Bonus: We need to talk about the ASTImporter |
| 226 | +From the cases Balázs looked at, it seems like qualified names, such as |
| 227 | +`std` in `std::unique_ptr` for example, will trigger the import of a `std` |
| 228 | +[DeclContext](https://clang.llvm.org/doxygen/classclang_1_1DeclContext.html), |
| 229 | +which in turn triggers the import of all the declarations within |
| 230 | +that lexical declaration context. In other words, we start importing a lot |
| 231 | +more things than strictly necessary to make the `std::` qualification work. |
| 232 | +This in turn increases the chances of hitting something that causes a crash |
| 233 | +or just fails to import, poisoning the original AST we wanted to import into. |
| 234 | +This is likely not how it should work, and might be a good subject to discuss |
| 235 | +in the future. |
| 236 | +
|
| 237 | +Note that the ASTImporter can be configured to do so-called "[minimal |
| 238 | +imports](https://clang.llvm.org/docs/LibASTImporter.html#api)" which is |
| 239 | +probably what we should have for the Clang Static |
| 240 | +Analyzer, however, |
| 241 | +[this is not set](https://github.com/llvm/llvm-project/blob/4a5692d6b3a6276ef6a8b6a62ef187a16dd3f983/clang/lib/CrossTU/CrossTranslationUnit.cpp#L799-L801), |
| 242 | +and setting it would lead to even more crashes. Balázs didn’t investigate this |
| 243 | +further, but it might be something to explore in the future. |
0 commit comments