|
| 1 | +# Temporal Workflow Check for Java |
| 2 | + |
| 3 | +Temporal workflowcheck is a utility scans Java bytecode looking for workflow implementation methods that do invalid |
| 4 | +things. This mostly centers around |
| 5 | +[workflow logic constraints](https://docs.temporal.io/dev-guide/java/foundations#workflow-logic-requirements) that |
| 6 | +require workflows are deterministic. Currently it will catch when a workflow method does any of the following: |
| 7 | + |
| 8 | +* Invokes a method that is configured as invalid (e.g. threading, IO, random, system time, etc) |
| 9 | +* Accesses a static field configured as invalid (e.g. `System.out`) |
| 10 | +* Accesses a non-final static field |
| 11 | +* Invokes a method that itself violates any of the above rules |
| 12 | + |
| 13 | +With the last rule, that means this analyzer is recursive and gathers information transitively to ensure |
| 14 | +non-deterministic calls aren't made indirectly. |
| 15 | + |
| 16 | +⚠️ BETA |
| 17 | + |
| 18 | +This software is beta quality. We are gathering feedback before considering it stable. |
| 19 | + |
| 20 | +## Running |
| 21 | + |
| 22 | +### Prerequisites |
| 23 | + |
| 24 | +* JDK 8+ |
| 25 | + |
| 26 | +### Running manually |
| 27 | + |
| 28 | +The all-in-one JAR is best for running manually. Either download the latest version `-all.jar` from |
| 29 | +https://repo1.maven.org/maven2/io/temporal/temporal-workflowcheck or build via `gradlew :temporal-workflowcheck:build` |
| 30 | +then obtain `-all.jar` in `temporal-workflowcheck/build/libs`. |
| 31 | + |
| 32 | +Simply running the following will show help text: |
| 33 | + |
| 34 | + java -jar path/to/temporal-workflowcheck-<version>-all.jar --help |
| 35 | + |
| 36 | +Replace `<version>` with the actual version. The `check` call runs the workflow check and it accepts classpath entries |
| 37 | +as arguments, for example: |
| 38 | + |
| 39 | + java -jar path/to/temporal-workflowcheck-<version>-all.jar check path/to/my.jar path/to/my/classes/ |
| 40 | + |
| 41 | +The `check` command accepts the following arguments: |
| 42 | + |
| 43 | +* `--config <config>` - Path to a `.properties` configuration file. Multiple `--config` arguments can be provided with |
| 44 | + the later overriding the earlier. See the [Configuration](#configuration) section for details. |
| 45 | +* `--no-default-config` - If present, the default configuration file will not be the implied first configuration file. |
| 46 | +* `--show-valid` - In addition to showing invalid workflow methods, also show which workflow methods are valid. |
| 47 | +* `<classpath...>` - All other arguments are classpath entries. This accepts the same values as `-cp` on `java` |
| 48 | + commands. Each entry can be a set of entries separated by platform-specific path separator (i.e. `;` for Windows or |
| 49 | + `:` for Nix), or prefixed with an `@` symbol saying it's a file with entries one per line, or just as separate |
| 50 | + arguments. They are all combined to one large classpath when running. |
| 51 | + |
| 52 | +### Running in a Gradle project |
| 53 | + |
| 54 | +See the [Gradle sample](samples/gradle). |
| 55 | + |
| 56 | +### Running in a Maven project |
| 57 | + |
| 58 | +See the [Maven sample](samples/maven). |
| 59 | + |
| 60 | +### Running programmatically |
| 61 | + |
| 62 | +The workflowcheck utility is also a library. The `io.temporal.workflowcheck.WorkflowCheck` class can be instantiated |
| 63 | +with a `io.temporal.workflowcheck.Config` and then `findWorkflowClasses` can be run with classpath entries. This will |
| 64 | +return details about every workflow method implementation found, including invalid pieces. |
| 65 | + |
| 66 | +## Usage |
| 67 | + |
| 68 | +To use workflowcheck effectively, users may have to add configuration and warning-suppression to properly handle false |
| 69 | +positives. |
| 70 | + |
| 71 | +### Configuration |
| 72 | + |
| 73 | +workflowcheck configuration is done via `.properties` file(s). The main use of configuration is to configure what the |
| 74 | +system considers an "invalid" method or field. Each property is in the format: |
| 75 | + |
| 76 | +``` |
| 77 | +temporal.workflowcheck.invalid.[[some/package/]ClassName.]memberName[(Lmethod/Descriptor;)V]=true|false |
| 78 | +``` |
| 79 | + |
| 80 | +The key names after `temporal.workflowcheck.invalid.` are known as "descriptor patterns" and these patterns are checked |
| 81 | +to see whether a call or field access is invalid. If the value is `true` the pattern is considered invalid and if it is |
| 82 | +`false` it is considered valid. When supplying properties files as configuration, the later-provided configuration keys |
| 83 | +overwrite the earlier keys. During checking, the more-specific patterns are checked first and the first, most-specific |
| 84 | +one to say whether it is valid or invalid is what is used. This means that, given the following two properties: |
| 85 | + |
| 86 | +``` |
| 87 | +temporal.workflowcheck.invalid.my/package/MyUnsafeClass=true |
| 88 | +temporal.workflowcheck.invalid.my/package/MyUnsafeClass.safeMethod=false |
| 89 | +``` |
| 90 | + |
| 91 | +Every method and static field on `my.package.MyUnsafeClass` is considered invalid _except_ for `safeMethod`. |
| 92 | + |
| 93 | +The [implied default configuration](src/main/resources/io/temporal/workflowcheck/workflowcheck.properties) contains a |
| 94 | +good set of default invalid/valid configurations to catch most logic mistakes. Additional configurations can be more |
| 95 | +specific. For example, the default configuration disallows any calls on `java.lang.Thread`. But if, say, a failure is |
| 96 | +reported for `java.lang.Thread.getId()` but it is known to be used safely/deterministically by Temporal's definition, |
| 97 | +then a configuration file with the following will make it valid: |
| 98 | + |
| 99 | +``` |
| 100 | +temporal.workflowcheck.invalid.java/lang/Thread.getId=false |
| 101 | +``` |
| 102 | + |
| 103 | +When the system checks for valid/invalid, it checks the most-specific to least-specific (kinda), trying to find whether |
| 104 | +there is a key present (regardless of whether it is `true` or `false`) and it uses that value. For example, when the |
| 105 | +system encounters a call to `myString.indexOf("foo", 123)`, it will check for the following keys in order (the |
| 106 | +`temporal.workflowcheck.invalid.` prefix is removed for brevity): |
| 107 | + |
| 108 | +* `java/lang/String.indexOf(Ljava/lang/String;I)` |
| 109 | +* `java/lang/String.indexOf` |
| 110 | +* `String.indexOf(Ljava/lang/String;I)` |
| 111 | +* `String.indexOf` |
| 112 | +* `indexOf(Ljava/lang/String;I)` |
| 113 | +* `indexOf` |
| 114 | +* `String` |
| 115 | +* `java/lang/String` |
| 116 | +* `java/lang` |
| 117 | +* `java` |
| 118 | + |
| 119 | +The class name is the binary class name as defined by the JVM spec. The method descriptor is the method descriptor as |
| 120 | +defined by the JVM spec but with the return type removed (return types can be covariant across interfaces and therefore |
| 121 | +not useful for our strict checking). |
| 122 | + |
| 123 | +Note, in order to support superclass/superinterface checking, if nothing is found for the type, the same method is |
| 124 | +checked against the superclass and superinterfaces. So technically `java/lang/Object.indexOf` would match even though |
| 125 | +that method does not exist. This is by intention to allow marking entire hierarchies of methods invalid (e.g. |
| 126 | +`Map.forEach=true` but `LinkedHashMap.forEach=false`). |
| 127 | + |
| 128 | +There is advanced logic with inheritance and how the proper implementation of a method is determined including resolving |
| 129 | +interface default methods, but that is beyond this documentation. Users are encouraged to write tests confirming |
| 130 | +behavior of configuration keys. |
| 131 | + |
| 132 | +### Suppressing warnings |
| 133 | + |
| 134 | +Usually in Java when wanting to suppress warnings on source code, the `@SuppressWarnings` annotation in `java.lang` is |
| 135 | +used. However, workflowcheck operates on bytecode and that annotation is not preserved in bytecode. As an alternative, |
| 136 | +the `@WorkflowCheck.SuppressWarnings` annotation is available in `io.temporal.workflowcheck` that will ignore errors. |
| 137 | +For instance, one could have: |
| 138 | + |
| 139 | +```java |
| 140 | +@WorkflowCheck.SuppressWarnings |
| 141 | +public long getCurrentMillis() { |
| 142 | + return System.currentTimeMillis(); |
| 143 | +} |
| 144 | +``` |
| 145 | + |
| 146 | +This will now consider `getCurrentMillis` as valid regardless of what's inside it. Since the retention policy on the |
| 147 | +`@WorkflowCheck.SuppressWarnings` annotation is `CLASS`, it is not even required to be present at runtime. So the |
| 148 | +`workflowcheck` library can just be a compile-only dependency (i.e. `provided` scope in Maven or `compileOnly` in |
| 149 | +Gradle), the library is not needed at runtime. |
| 150 | + |
| 151 | +the `@WorkflowCheck.SuppressWarnings` annotation provides an `invalidMembers` field that can be a set of the descriptor |
| 152 | +patterns mentioned in the [Configuration](#configuration) section above. When not set, every invalid piece is accepted, |
| 153 | +so users are encouraged to at least put the method/field name they want to allow so accidental suppression is avoided. |
| 154 | +That means the above snippet would become: |
| 155 | + |
| 156 | +```java |
| 157 | +@WorkflowCheck.SuppressWarnings(invalidMembers = "currentTimeMillis") |
| 158 | +public long getCurrentMillis() { |
| 159 | + return System.currentTimeMillis(); |
| 160 | +} |
| 161 | +``` |
| 162 | + |
| 163 | +_Technically_ there is an inline suppression approach that is a runtime no-op that is `WorkflowCheck.suppressWarnings()` |
| 164 | +invocation followed by `WorkflowCheck.restoreWarnings()` later. So the above _could_ be: |
| 165 | + |
| 166 | +```java |
| 167 | +public long getCurrentMillis() { |
| 168 | + WorkflowCheck.suppressWarnings("currentTimeMillis"); |
| 169 | + var l = System.currentTimeMillis(); |
| 170 | + WorkflowCheck.restoreWarnings(); |
| 171 | + return l; |
| 172 | +} |
| 173 | +``` |
| 174 | + |
| 175 | +However this is hard to use for a couple of reasons. First, the methods are evaluated when they are seen in bytecode, |
| 176 | +not in the order they appear in logic. `javac` bytecode ordering is not the same as source ordering. Second, this does |
| 177 | +require a runtime dependency on the workflowcheck library. Users are discouraged from ever using this and should use the |
| 178 | +annotation instead. |
| 179 | + |
| 180 | +### Best practices |
| 181 | + |
| 182 | +#### False positives |
| 183 | + |
| 184 | +When encountering a false positive in a commonly used or third-party library, decide how far up the call stack the call |
| 185 | +is considered deterministic by Temporal's definition. Then configure the method as "valid". |
| 186 | + |
| 187 | +When encountering a specific false positive in workflow code, consider moving it to its own method and adding |
| 188 | +`@WorkflowCheck.SuppressWarnings` for just that method (or just add that annotation on the method but target the |
| 189 | +specific call). Annotations can be better than using configuration files for small amounts of local workflow code |
| 190 | +because the configuration file can get really cluttered with single-workflow-specific code and using configuration makes |
| 191 | +it hard for code readers to see that it is intentionally marked as valid. |
| 192 | + |
| 193 | +#### Collection iteration |
| 194 | + |
| 195 | +By default, iterating any `Iterable` is considered unsafe with specific exceptions carved out for `LinkedHashMap`, |
| 196 | +`List`, `SortedMap`, and `SortedSet`. But in many cases, static analysis code cannot detect that something is safe. For |
| 197 | +example: |
| 198 | + |
| 199 | +``` |
| 200 | +var map = new TreeMap<>(Map.of("a", "b")); |
| 201 | +for (var entry : map.entrySet()) { |
| 202 | + // ... |
| 203 | +} |
| 204 | +``` |
| 205 | + |
| 206 | +The implicit `Set.iterator` call on the `entrySet` will be considered invalid, because `entrySet`'s type is `Set`. The |
| 207 | +same thing happens when a higher level collection type is used, for example: |
| 208 | + |
| 209 | +``` |
| 210 | +Collection<String> strings = new TreeSet<>(List.of("foo", "bar")); |
| 211 | +for (var string : strings) { |
| 212 | + // ... |
| 213 | +} |
| 214 | +``` |
| 215 | + |
| 216 | +In cases where the higher-level type can be used, try to use that. So in the above sample change to |
| 217 | +`SortedSet<String> strings`. If that is not available, wrapping as a list just for iteration is acceptable. Workflow |
| 218 | +performance is not the same as general Java code performance, so it is often totally reasonable to accept the hit on |
| 219 | +iteration. So for the first example, it could be written like so: |
| 220 | + |
| 221 | +``` |
| 222 | +var map = new TreeMap<>(Map.of("a", "b")); |
| 223 | +for (var entry : new ArrayList<>(map.entrySet())) { |
| 224 | + // ... |
| 225 | +} |
| 226 | +``` |
| 227 | + |
| 228 | +In advanced situations, warning-suppression approaches can be applied. |
| 229 | + |
| 230 | +## Internals |
| 231 | + |
| 232 | +The following sections give some insight into the development of workflowcheck. |
| 233 | + |
| 234 | +### How it works |
| 235 | + |
| 236 | +Workflowcheck works by scanning all non-standard-library classes on the classpath. When scanning, in addition to some |
| 237 | +other details, the following bits of information are collected for every method: |
| 238 | + |
| 239 | +* Whether the method is a workflow declaration (e.g. interface methods with `@WorkflowMethod`) |
| 240 | +* Unsuppressed/unconfigured method invocations |
| 241 | +* Field accesses configured as invalid |
| 242 | +* Unsuppressed/unconfigured static field access |
| 243 | + |
| 244 | +This intentionally, to avoid eager recursion issues, does not traverse the call graph eagerly. |
| 245 | + |
| 246 | +Then for every method of every scanned class, it is checked whether it is a workflow method. This is done by checking if |
| 247 | +it contains a body and overrides any super interface workflow declaration at any level. For every method that is a |
| 248 | +workflow implementation, it is processed for invalidity. |
| 249 | + |
| 250 | +The invalidity processor is a recursive call that checks a method for whether it is invalid. Specifically, it: |
| 251 | + |
| 252 | +* Considers all invalid field accesses as invalid member accesses |
| 253 | +* Resolves target of all static field accesses and if the fields are non-final static fields, considers them invalid |
| 254 | + member accesses |
| 255 | +* Checks all method calls to see if they are invalid by: |
| 256 | + * Finding the most-specific configured descriptor pattern, using advanced most-specific logic when encountering |
| 257 | + ambiguous interface depth. If it is configured invalid, mark as such. Regardless of whether invalid or valid, if it |
| 258 | + was configured at all, do not go to the next step. |
| 259 | + * Resolve the most specific implementation of a method. Just because `Foo.bar()` is the method invocation doesn't mean |
| 260 | + `Foo` declares `bar()`, it may inherited. Advanced virtual resolution logic is used to find the first implementation |
| 261 | + in the hierarchy that it refers to. If/when resolved, that method is recursively checked for invalidity via this |
| 262 | + same processor (storing itself to prevent recursion) and if it's invalid, then so is this call. |
| 263 | + |
| 264 | +This algorithm ensures that configuration can apply at multiple levels of hierarchy but transitive code-based method |
| 265 | +invalidity is only on the proper implementation. So if `Foo.bar()` is bad but `ExtendsFoo.bar()` is ok, the former does |
| 266 | +not report a false positive (unless of course `ExtendsFoo.bar()` invokes `super.bar()` which would transitively mark it |
| 267 | +as invalid). |
| 268 | + |
| 269 | +During this resolution, the call graph is constructed with access to the class/method details for each transitive |
| 270 | +non-recursive invocation. Once complete, all the valid methods are trimmed to relieve memory pressure and all classes |
| 271 | +with workflow implementations properly contain their direct and indirect invalid member accesses. |
| 272 | + |
| 273 | +The printer then prints these out. |
| 274 | + |
| 275 | +### FAQ |
| 276 | + |
| 277 | +**Why not use static analysis library X?** |
| 278 | + |
| 279 | +One of the primary features of workflowcheck is to find whether a method is invalid transitively (i.e. building a call |
| 280 | +graph) across existing bytecode including the Java standard library. During research, no tool was found to be able to do |
| 281 | +this without significant effort or performance penalties. Approaches researched: |
| 282 | + |
| 283 | +* Checkstyle, ErrorProne, PMD, etc - not built for transitive bytecode checking |
| 284 | +* Custom annotation processor - Bad caching across compilation units, JDK compiler API hard to use (have to add-opens |
| 285 | + for modules for sun compiler API, or have to use third party) |
| 286 | +* Soot/SootUp - Soot is too old, SootUp is undergoing new development but was still a bit rough when tried (e.g. failed |
| 287 | + when an annotation wasn't on the classpath) |
| 288 | +* ClassGraph - Does not say which methods call other methods (so not a call graph) |
| 289 | +* SemGrep - Does not seem to support recursive call-graph analysis on bytecode to find bad calls at arbitrary call |
| 290 | + depths |
| 291 | +* CodeQL - Too slow |
| 292 | +* Doop, jQAssistant, java-callgraph, etc - not up to date |
| 293 | + |
| 294 | +Overall, walking the classpath using traditional, high-performance bytecode visiting via OW2 ASM is a good choice for |
| 295 | +this project's needs. |
| 296 | + |
| 297 | +**Why use `.properties` files instead of a better configuration format?** |
| 298 | + |
| 299 | +A goal of the workflowcheck project is to have as few dependencies as possible. |
| 300 | + |
| 301 | +**Why not use more modern Java features in the code?** |
| 302 | + |
| 303 | +The code is optimized for performance, so direct field access instead of encapsulation, looping instead of streaming, |
| 304 | +mutable objects instead of records, etc may be present. But the user-facing API does follow proper practices. |
| 305 | + |
| 306 | +### Open Issues |
| 307 | + |
| 308 | +https://github.com/temporalio/sdk-java/issues?q=is%3Aissue%20state%3Aopen%20workflowcheck |
0 commit comments