Skip to content

Commit fecc156

Browse files
tconley1428cretzclaude
authored
workflowcheck - initial support for Java static analyzer (#2767)
* workflowcheck - initial support for Java static analyzer * Configure temporal-workflowcheck for separate jar publishing Add java plugin and shadow publication configuration to enable publishing the static analyzer as a standalone executable jar. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add shadow jar dpendency * Remove copyright notices from temporal-workflowcheck files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix formatting * Revert accidental submodule updates * Publish both regular and shadow jars for workflowcheck - Regular jar for library usage and compile-time annotations - Shadow jar with -all suffix for standalone executable tool - Allows users to choose appropriate artifact based on needs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Replace readme todos with issue link * Update temporal-workflowcheck/build.gradle Co-authored-by: Chad Retz <chad.retz@gmail.com> * Revert "Update temporal-workflowcheck/build.gradle" This reverts commit e9c6754. * Fix temporal-workflowcheck test failure Update test expectation to match actual line number in Suppression.java and fix Gradle build to properly copy test source files as resources. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Allow concurrent locks in workflow check Change from blanket disallowing all java/util/concurrent/locks to allowing them since they are widely used throughout the standard library in safe ways and the blanket restriction was impractical. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Update gradle build * Fix temporal-workflowcheck CI failures - Update ASM from 9.6 to 9.7 to support Java 23 class files (major version 67) - Fix test resource loading by properly copying Java source files and properties - Replace unreliable sourceSets configuration with explicit Copy task - Ensure both JDK8 and latest Java CI environments work correctly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Chad Retz <chad@temporal.io> Co-authored-by: Chad Retz <chad.retz@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent 87b07f5 commit fecc156

File tree

50 files changed

+4137
-1
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

50 files changed

+4137
-1
lines changed

settings.gradle

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,4 +10,5 @@ include 'temporal-spring-boot-autoconfigure'
1010
include 'temporal-spring-boot-starter'
1111
include 'temporal-remote-data-encoder'
1212
include 'temporal-shaded'
13-
include 'temporal-envconfig'
13+
include 'temporal-workflowcheck'
14+
include 'temporal-envconfig'

temporal-workflowcheck/README.md

Lines changed: 308 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,308 @@
1+
# Temporal Workflow Check for Java
2+
3+
Temporal workflowcheck is a utility scans Java bytecode looking for workflow implementation methods that do invalid
4+
things. This mostly centers around
5+
[workflow logic constraints](https://docs.temporal.io/dev-guide/java/foundations#workflow-logic-requirements) that
6+
require workflows are deterministic. Currently it will catch when a workflow method does any of the following:
7+
8+
* Invokes a method that is configured as invalid (e.g. threading, IO, random, system time, etc)
9+
* Accesses a static field configured as invalid (e.g. `System.out`)
10+
* Accesses a non-final static field
11+
* Invokes a method that itself violates any of the above rules
12+
13+
With the last rule, that means this analyzer is recursive and gathers information transitively to ensure
14+
non-deterministic calls aren't made indirectly.
15+
16+
⚠️ BETA
17+
18+
This software is beta quality. We are gathering feedback before considering it stable.
19+
20+
## Running
21+
22+
### Prerequisites
23+
24+
* JDK 8+
25+
26+
### Running manually
27+
28+
The all-in-one JAR is best for running manually. Either download the latest version `-all.jar` from
29+
https://repo1.maven.org/maven2/io/temporal/temporal-workflowcheck or build via `gradlew :temporal-workflowcheck:build`
30+
then obtain `-all.jar` in `temporal-workflowcheck/build/libs`.
31+
32+
Simply running the following will show help text:
33+
34+
java -jar path/to/temporal-workflowcheck-<version>-all.jar --help
35+
36+
Replace `<version>` with the actual version. The `check` call runs the workflow check and it accepts classpath entries
37+
as arguments, for example:
38+
39+
java -jar path/to/temporal-workflowcheck-<version>-all.jar check path/to/my.jar path/to/my/classes/
40+
41+
The `check` command accepts the following arguments:
42+
43+
* `--config <config>` - Path to a `.properties` configuration file. Multiple `--config` arguments can be provided with
44+
the later overriding the earlier. See the [Configuration](#configuration) section for details.
45+
* `--no-default-config` - If present, the default configuration file will not be the implied first configuration file.
46+
* `--show-valid` - In addition to showing invalid workflow methods, also show which workflow methods are valid.
47+
* `<classpath...>` - All other arguments are classpath entries. This accepts the same values as `-cp` on `java`
48+
commands. Each entry can be a set of entries separated by platform-specific path separator (i.e. `;` for Windows or
49+
`:` for Nix), or prefixed with an `@` symbol saying it's a file with entries one per line, or just as separate
50+
arguments. They are all combined to one large classpath when running.
51+
52+
### Running in a Gradle project
53+
54+
See the [Gradle sample](samples/gradle).
55+
56+
### Running in a Maven project
57+
58+
See the [Maven sample](samples/maven).
59+
60+
### Running programmatically
61+
62+
The workflowcheck utility is also a library. The `io.temporal.workflowcheck.WorkflowCheck` class can be instantiated
63+
with a `io.temporal.workflowcheck.Config` and then `findWorkflowClasses` can be run with classpath entries. This will
64+
return details about every workflow method implementation found, including invalid pieces.
65+
66+
## Usage
67+
68+
To use workflowcheck effectively, users may have to add configuration and warning-suppression to properly handle false
69+
positives.
70+
71+
### Configuration
72+
73+
workflowcheck configuration is done via `.properties` file(s). The main use of configuration is to configure what the
74+
system considers an "invalid" method or field. Each property is in the format:
75+
76+
```
77+
temporal.workflowcheck.invalid.[[some/package/]ClassName.]memberName[(Lmethod/Descriptor;)V]=true|false
78+
```
79+
80+
The key names after `temporal.workflowcheck.invalid.` are known as "descriptor patterns" and these patterns are checked
81+
to see whether a call or field access is invalid. If the value is `true` the pattern is considered invalid and if it is
82+
`false` it is considered valid. When supplying properties files as configuration, the later-provided configuration keys
83+
overwrite the earlier keys. During checking, the more-specific patterns are checked first and the first, most-specific
84+
one to say whether it is valid or invalid is what is used. This means that, given the following two properties:
85+
86+
```
87+
temporal.workflowcheck.invalid.my/package/MyUnsafeClass=true
88+
temporal.workflowcheck.invalid.my/package/MyUnsafeClass.safeMethod=false
89+
```
90+
91+
Every method and static field on `my.package.MyUnsafeClass` is considered invalid _except_ for `safeMethod`.
92+
93+
The [implied default configuration](src/main/resources/io/temporal/workflowcheck/workflowcheck.properties) contains a
94+
good set of default invalid/valid configurations to catch most logic mistakes. Additional configurations can be more
95+
specific. For example, the default configuration disallows any calls on `java.lang.Thread`. But if, say, a failure is
96+
reported for `java.lang.Thread.getId()` but it is known to be used safely/deterministically by Temporal's definition,
97+
then a configuration file with the following will make it valid:
98+
99+
```
100+
temporal.workflowcheck.invalid.java/lang/Thread.getId=false
101+
```
102+
103+
When the system checks for valid/invalid, it checks the most-specific to least-specific (kinda), trying to find whether
104+
there is a key present (regardless of whether it is `true` or `false`) and it uses that value. For example, when the
105+
system encounters a call to `myString.indexOf("foo", 123)`, it will check for the following keys in order (the
106+
`temporal.workflowcheck.invalid.` prefix is removed for brevity):
107+
108+
* `java/lang/String.indexOf(Ljava/lang/String;I)`
109+
* `java/lang/String.indexOf`
110+
* `String.indexOf(Ljava/lang/String;I)`
111+
* `String.indexOf`
112+
* `indexOf(Ljava/lang/String;I)`
113+
* `indexOf`
114+
* `String`
115+
* `java/lang/String`
116+
* `java/lang`
117+
* `java`
118+
119+
The class name is the binary class name as defined by the JVM spec. The method descriptor is the method descriptor as
120+
defined by the JVM spec but with the return type removed (return types can be covariant across interfaces and therefore
121+
not useful for our strict checking).
122+
123+
Note, in order to support superclass/superinterface checking, if nothing is found for the type, the same method is
124+
checked against the superclass and superinterfaces. So technically `java/lang/Object.indexOf` would match even though
125+
that method does not exist. This is by intention to allow marking entire hierarchies of methods invalid (e.g.
126+
`Map.forEach=true` but `LinkedHashMap.forEach=false`).
127+
128+
There is advanced logic with inheritance and how the proper implementation of a method is determined including resolving
129+
interface default methods, but that is beyond this documentation. Users are encouraged to write tests confirming
130+
behavior of configuration keys.
131+
132+
### Suppressing warnings
133+
134+
Usually in Java when wanting to suppress warnings on source code, the `@SuppressWarnings` annotation in `java.lang` is
135+
used. However, workflowcheck operates on bytecode and that annotation is not preserved in bytecode. As an alternative,
136+
the `@WorkflowCheck.SuppressWarnings` annotation is available in `io.temporal.workflowcheck` that will ignore errors.
137+
For instance, one could have:
138+
139+
```java
140+
@WorkflowCheck.SuppressWarnings
141+
public long getCurrentMillis() {
142+
return System.currentTimeMillis();
143+
}
144+
```
145+
146+
This will now consider `getCurrentMillis` as valid regardless of what's inside it. Since the retention policy on the
147+
`@WorkflowCheck.SuppressWarnings` annotation is `CLASS`, it is not even required to be present at runtime. So the
148+
`workflowcheck` library can just be a compile-only dependency (i.e. `provided` scope in Maven or `compileOnly` in
149+
Gradle), the library is not needed at runtime.
150+
151+
the `@WorkflowCheck.SuppressWarnings` annotation provides an `invalidMembers` field that can be a set of the descriptor
152+
patterns mentioned in the [Configuration](#configuration) section above. When not set, every invalid piece is accepted,
153+
so users are encouraged to at least put the method/field name they want to allow so accidental suppression is avoided.
154+
That means the above snippet would become:
155+
156+
```java
157+
@WorkflowCheck.SuppressWarnings(invalidMembers = "currentTimeMillis")
158+
public long getCurrentMillis() {
159+
return System.currentTimeMillis();
160+
}
161+
```
162+
163+
_Technically_ there is an inline suppression approach that is a runtime no-op that is `WorkflowCheck.suppressWarnings()`
164+
invocation followed by `WorkflowCheck.restoreWarnings()` later. So the above _could_ be:
165+
166+
```java
167+
public long getCurrentMillis() {
168+
WorkflowCheck.suppressWarnings("currentTimeMillis");
169+
var l = System.currentTimeMillis();
170+
WorkflowCheck.restoreWarnings();
171+
return l;
172+
}
173+
```
174+
175+
However this is hard to use for a couple of reasons. First, the methods are evaluated when they are seen in bytecode,
176+
not in the order they appear in logic. `javac` bytecode ordering is not the same as source ordering. Second, this does
177+
require a runtime dependency on the workflowcheck library. Users are discouraged from ever using this and should use the
178+
annotation instead.
179+
180+
### Best practices
181+
182+
#### False positives
183+
184+
When encountering a false positive in a commonly used or third-party library, decide how far up the call stack the call
185+
is considered deterministic by Temporal's definition. Then configure the method as "valid".
186+
187+
When encountering a specific false positive in workflow code, consider moving it to its own method and adding
188+
`@WorkflowCheck.SuppressWarnings` for just that method (or just add that annotation on the method but target the
189+
specific call). Annotations can be better than using configuration files for small amounts of local workflow code
190+
because the configuration file can get really cluttered with single-workflow-specific code and using configuration makes
191+
it hard for code readers to see that it is intentionally marked as valid.
192+
193+
#### Collection iteration
194+
195+
By default, iterating any `Iterable` is considered unsafe with specific exceptions carved out for `LinkedHashMap`,
196+
`List`, `SortedMap`, and `SortedSet`. But in many cases, static analysis code cannot detect that something is safe. For
197+
example:
198+
199+
```
200+
var map = new TreeMap<>(Map.of("a", "b"));
201+
for (var entry : map.entrySet()) {
202+
// ...
203+
}
204+
```
205+
206+
The implicit `Set.iterator` call on the `entrySet` will be considered invalid, because `entrySet`'s type is `Set`. The
207+
same thing happens when a higher level collection type is used, for example:
208+
209+
```
210+
Collection<String> strings = new TreeSet<>(List.of("foo", "bar"));
211+
for (var string : strings) {
212+
// ...
213+
}
214+
```
215+
216+
In cases where the higher-level type can be used, try to use that. So in the above sample change to
217+
`SortedSet<String> strings`. If that is not available, wrapping as a list just for iteration is acceptable. Workflow
218+
performance is not the same as general Java code performance, so it is often totally reasonable to accept the hit on
219+
iteration. So for the first example, it could be written like so:
220+
221+
```
222+
var map = new TreeMap<>(Map.of("a", "b"));
223+
for (var entry : new ArrayList<>(map.entrySet())) {
224+
// ...
225+
}
226+
```
227+
228+
In advanced situations, warning-suppression approaches can be applied.
229+
230+
## Internals
231+
232+
The following sections give some insight into the development of workflowcheck.
233+
234+
### How it works
235+
236+
Workflowcheck works by scanning all non-standard-library classes on the classpath. When scanning, in addition to some
237+
other details, the following bits of information are collected for every method:
238+
239+
* Whether the method is a workflow declaration (e.g. interface methods with `@WorkflowMethod`)
240+
* Unsuppressed/unconfigured method invocations
241+
* Field accesses configured as invalid
242+
* Unsuppressed/unconfigured static field access
243+
244+
This intentionally, to avoid eager recursion issues, does not traverse the call graph eagerly.
245+
246+
Then for every method of every scanned class, it is checked whether it is a workflow method. This is done by checking if
247+
it contains a body and overrides any super interface workflow declaration at any level. For every method that is a
248+
workflow implementation, it is processed for invalidity.
249+
250+
The invalidity processor is a recursive call that checks a method for whether it is invalid. Specifically, it:
251+
252+
* Considers all invalid field accesses as invalid member accesses
253+
* Resolves target of all static field accesses and if the fields are non-final static fields, considers them invalid
254+
member accesses
255+
* Checks all method calls to see if they are invalid by:
256+
* Finding the most-specific configured descriptor pattern, using advanced most-specific logic when encountering
257+
ambiguous interface depth. If it is configured invalid, mark as such. Regardless of whether invalid or valid, if it
258+
was configured at all, do not go to the next step.
259+
* Resolve the most specific implementation of a method. Just because `Foo.bar()` is the method invocation doesn't mean
260+
`Foo` declares `bar()`, it may inherited. Advanced virtual resolution logic is used to find the first implementation
261+
in the hierarchy that it refers to. If/when resolved, that method is recursively checked for invalidity via this
262+
same processor (storing itself to prevent recursion) and if it's invalid, then so is this call.
263+
264+
This algorithm ensures that configuration can apply at multiple levels of hierarchy but transitive code-based method
265+
invalidity is only on the proper implementation. So if `Foo.bar()` is bad but `ExtendsFoo.bar()` is ok, the former does
266+
not report a false positive (unless of course `ExtendsFoo.bar()` invokes `super.bar()` which would transitively mark it
267+
as invalid).
268+
269+
During this resolution, the call graph is constructed with access to the class/method details for each transitive
270+
non-recursive invocation. Once complete, all the valid methods are trimmed to relieve memory pressure and all classes
271+
with workflow implementations properly contain their direct and indirect invalid member accesses.
272+
273+
The printer then prints these out.
274+
275+
### FAQ
276+
277+
**Why not use static analysis library X?**
278+
279+
One of the primary features of workflowcheck is to find whether a method is invalid transitively (i.e. building a call
280+
graph) across existing bytecode including the Java standard library. During research, no tool was found to be able to do
281+
this without significant effort or performance penalties. Approaches researched:
282+
283+
* Checkstyle, ErrorProne, PMD, etc - not built for transitive bytecode checking
284+
* Custom annotation processor - Bad caching across compilation units, JDK compiler API hard to use (have to add-opens
285+
for modules for sun compiler API, or have to use third party)
286+
* Soot/SootUp - Soot is too old, SootUp is undergoing new development but was still a bit rough when tried (e.g. failed
287+
when an annotation wasn't on the classpath)
288+
* ClassGraph - Does not say which methods call other methods (so not a call graph)
289+
* SemGrep - Does not seem to support recursive call-graph analysis on bytecode to find bad calls at arbitrary call
290+
depths
291+
* CodeQL - Too slow
292+
* Doop, jQAssistant, java-callgraph, etc - not up to date
293+
294+
Overall, walking the classpath using traditional, high-performance bytecode visiting via OW2 ASM is a good choice for
295+
this project's needs.
296+
297+
**Why use `.properties` files instead of a better configuration format?**
298+
299+
A goal of the workflowcheck project is to have as few dependencies as possible.
300+
301+
**Why not use more modern Java features in the code?**
302+
303+
The code is optimized for performance, so direct field access instead of encapsulation, looping instead of streaming,
304+
mutable objects instead of records, etc may be present. But the user-facing API does follow proper practices.
305+
306+
### Open Issues
307+
308+
https://github.com/temporalio/sdk-java/issues?q=is%3Aissue%20state%3Aopen%20workflowcheck
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
plugins {
2+
id 'application'
3+
id 'com.gradleup.shadow' version '8.3.3'
4+
id 'java'
5+
}
6+
7+
description = 'Temporal Java WorkflowCheck Static Analyzer'
8+
9+
dependencies {
10+
implementation 'org.ow2.asm:asm:9.7'
11+
compileOnly 'com.google.code.findbugs:jsr305:3.0.2'
12+
testImplementation project(":temporal-sdk")
13+
testImplementation "junit:junit:${junitVersion}"
14+
// Only for testing external-JAR-based bad calls
15+
testImplementation "com.google.guava:guava:$guavaVersion"
16+
}
17+
18+
application {
19+
mainClass = 'io.temporal.workflowcheck.Main'
20+
}
21+
22+
// Need all-in-one JAR
23+
shadowJar {
24+
relocate 'org.objectweb.asm', 'io.temporal.workflowcheck.shaded.org.objectweb.asm'
25+
archiveClassifier = ''
26+
}
27+
build.dependsOn shadowJar
28+
29+
// Configure publishing to publish both regular library jar and shadow executable jar
30+
publishing {
31+
publications {
32+
// Regular library jar for programmatic usage and compile-time annotations
33+
maven(MavenPublication) {
34+
from components.java
35+
}
36+
// Fat executable jar with shaded dependencies
37+
shadow(MavenPublication) { publication ->
38+
project.shadow.component(publication)
39+
artifactId = "${project.name}-all"
40+
}
41+
}
42+
}
43+
44+
// Fix dependency issue with shadow publication metadata generation
45+
tasks.named('generateMetadataFileForShadowPublication').configure {
46+
dependsOn 'jar'
47+
}
48+
49+
// Copy Java test source files to resources so they can be loaded at runtime
50+
tasks.register('copyJavaSourcesToResources', Copy) {
51+
from('src/test/java') {
52+
include '**/*.java'
53+
include '**/*.properties'
54+
}
55+
into 'build/resources/test'
56+
}
57+
processTestResources.dependsOn copyJavaSourcesToResources
58+
59+
60+
spotless {
61+
java {
62+
toggleOffOn()
63+
}
64+
}
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
gradle/build
2+
gradle-multi-project/project-app/build
3+
gradle-multi-project/project-workflows/build

0 commit comments

Comments
 (0)