Fix thread safety in dissect processor by eliminating mutable shared state#6858
Fix thread safety in dissect processor by eliminating mutable shared state#6858shik-25 wants to merge 1 commit into
Conversation
…state Signed-off-by: Shikhar Dhawale <shikhar_dhawale@apple.com>
Benchmark: GC allocation pressure before vs after thread-safety fixSetup: JMH,
What changed: The new code allocates local Verdict: The +408 bytes/op is the honest cost of the fix. The +25% allocation rate increase is proportional - 6.7% more throughput × 17.5% more per op. GC pause time per cycle is slightly lower (-4%), meaning the JVM handles the extra allocation efficiently. Throughput improves because shared-state contention is eliminated under concurrent execution |
| public List<Field> getDissectedFields(){ | ||
| final List<Field> dissectedFields = new ArrayList<>(); | ||
| Map<String, AppendField> appendFieldMap = getAppendedFields(unAppendedFieldsMap); | ||
| private Map<String, String> getDissectedFields(Map<Field, String> localValues) { |
There was a problem hiding this comment.
Design note on Map<Field, String> localValues:
localValues uses Field objects as map keys without overriding
equals/hashCode, relying on identity semantics. This is intentional:
AppendFieldallows multiple instances with the same logical key (e.g.,{+field1}appearing twice in a pattern). Implementingequals/hashCodeby key would cause silent overwrites inlocalValues.- Field objects are never copied or rebuilt — the same instances from
normalFieldMap,indirectFieldMap, etc. are used for bothputandget, so identity-based lookup is always correct.
Identity semantics are the right tradeoff here.
dlvenable
left a comment
There was a problem hiding this comment.
Thank you @shik-25 for this contribution! This looks like a good improvement, but I do see a possible increase in memory pressure from your benchmarks.
I left a few comments on the design and some ideas to try to reduce memory pressure as well.
| @@ -12,13 +12,8 @@ | |||
| public abstract class Field { | |||
| boolean stripTrailing = false; | |||
| private String key; | |||
There was a problem hiding this comment.
Let's remove setKey and instead make a package-protected constructor. Right now, only the inheriting constructors (e.g. AppendField) call setKey. We can pass it in the constructor and also make the field final to ensure this doesn't get used incorrectly.
| if(fieldString==null) { | ||
| return; | ||
| } | ||
| Field field = fieldHelper.getField(fieldString); |
There was a problem hiding this comment.
I think this code should become something like this:
Field lastField = fieldsList.size() > 0 ? fieldsList.getLast() : null;
Field field = fieldHelper.getField(fieldString, lastField);
fieldsList.addLast(field);
You add a nullable Field to the FieldHelper. This has the benefit of allow us to make Field::setNext() package protected.
Also, you can add a check on setField so that it throws if it is called after having received a next value. This can help us find subtle errors that may be introduced.
| } catch (Exception e) { | ||
| return false; | ||
| if (head == null) { | ||
| return null; |
There was a problem hiding this comment.
This is a good defensive check, but maybe it could hide some subtle errors. Perhaps add a good ERROR log about this.
| useJUnitPlatform() | ||
| } | ||
|
|
||
| tasks.withType(Zip).configureEach { |
| @Setup | ||
| public void setUp() { | ||
| dissector = new Dissector("%{timestamp} %{level} %{message}"); | ||
| input = "2024-01-15 ERROR service crashed"; |
There was a problem hiding this comment.
It would probably be ideal to also have a benchmark that includes more scenarios like append fields, indirect fields, and strip-trailing. But this isn't necessary for the PR.
| if (!computeDelimiterPositions(text, delimStarts, delimEnds, n)) { | ||
| return null; | ||
| } | ||
| Map<Field, String> localValues = new HashMap<>(); // keyed by identity — same Field instances used for put and get |
There was a problem hiding this comment.
I believe you can set this with an initial size: new HashMap(delimiterList.size()). This may help in some cases to reduce memory and allocations.
| final List<Field> dissectedFields = new ArrayList<>(); | ||
| Map<String, AppendField> appendFieldMap = getAppendedFields(unAppendedFieldsMap); | ||
| private Map<String, String> getDissectedFields(Map<Field, String> localValues) { | ||
| final Map<String, String> results = new HashMap<>(); |
There was a problem hiding this comment.
This also looks like you can gauge the size in advance:
new HashMap<>(normalFieldMap.size() + appendFieldMap.size() + unAppendedFieldsMap.size())
There are some conditions on null for normalFieldMap. But this is probably OK.
| for (Map.Entry<String, List<AppendField>> entry : unAppendedFieldsMap.entrySet()) { | ||
| List<AppendField> copy = new ArrayList<>(entry.getValue()); | ||
| Collections.sort(copy); | ||
| String value = copy.stream() |
There was a problem hiding this comment.
The copy up to this point is going to be the same for all iterations since it is based on unAppendedFieldsMap. Can we also pre-sort this in the constructor?
Description
Fixes a thread-safety bug in the dissect processor by eliminating shared mutable state in
Dissector. Previously,dissectText()mutated sharedDelimiter.start/endandField.valueon every invocation, causing data races when multiple threads processed events concurrently. The@SingleThreadannotation was a workaround that limited throughput unnecessarily.Fix
dissectText()now uses localint[]arrays for delimiter positions and a localMap<Field, String>for field values instead of mutating shared objects.normalFieldMap,indirectFieldMap, etc.) are read-only after construction and wrapped inCollections.unmodifiableMap().dissectText()returnsMap<String, String>directly, replacing the separategetDissectedFields()call.@SingleThreadis removed.Issues Resolved
Resolves #5546
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.