Commit ac909a0
fix(operator): correct regex escaping in WordCloud operator (#4261)
### What changes were proposed in this PR?
Fixed two issues in `WordCloudOpDesc.scala`:
1. **Regex escaping bug**: The `pyb` refactor in #4189 changed
`manipulateTable()` from `s"..."` to `pyb"""..."""`, but the regex `\\w`
was not adjusted. In `s"..."`, `\\w` is an escape sequence producing
`\w`. In triple-quoted `pyb"""..."""`, backslashes are literal, so `\\w`
stays as `\\w` and the generated Python contains `r'\\w'`, which matches a
literal backslash followed by `w` instead of a word character. This caused
all rows to be filtered out, resulting in the error *"text column does not
contain words or contains only nulls."* Fixed by changing the pattern to `\w`.
2. **Duplicate statement**: Removed a duplicate `Map(...)` line in
`getOutputSchemas`.
Added unit tests to verify the regex pattern is correct.
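To illustrate why the doubled backslash filtered out the rows, here is a minimal Python sketch comparing the two patterns. The sample rows and the `keep_rows` helper are hypothetical and are not taken from the operator's generated code; they only mimic a "keep rows that contain a word character" filter.

```python
import re

# Pattern the operator is meant to emit: matches any single word character.
intended = re.compile(r'\w')
# Pattern emitted after the regression: a literal backslash followed by "w".
regressed = re.compile(r'\\w')

# Hypothetical sample rows standing in for the text column.
# The last entry contains one real backslash followed by "w".
rows = ["hello world", "42", "   ", "a literal \\w here"]


def keep_rows(pattern, values):
    """Return the values that contain at least one match for pattern."""
    return [v for v in values if pattern.search(v)]


kept_intended = keep_rows(intended, rows)
kept_regressed = keep_rows(regressed, rows)

print(kept_intended)   # rows with at least one word character
print(kept_regressed)  # only rows containing a backslash followed by "w"
```

With ordinary text, which rarely contains a backslash followed by `w`, the regressed pattern keeps nothing, which is consistent with the error reported above.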
### Any related issues, documentation, discussions?
Regression introduced by #4189.
### How was this PR tested?
Added `WordCloudOpDescSpec` with tests that verify:
- `manipulateTable()` uses `r'\w'` (not `r'\\w'`)
- Text column name appears in generated code
All tests pass.
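The checks listed above amount to substring assertions on the generated Python source. The sketch below shows that idea in Python rather than in the Scala spec itself; the two code strings and both helper functions are hypothetical stand-ins, not the actual output of `manipulateTable()` or the contents of `WordCloudOpDescSpec`.

```python
# Hypothetical stand-ins for generated Python source (raw strings keep the
# backslashes exactly as written).
fixed_code = r"""table = table[table['text'].str.contains(r'\w')]"""
broken_code = r"""table = table[table['text'].str.contains(r'\\w')]"""


def uses_word_char_regex(code):
    """True if the code contains r'\\w' with a single backslash only."""
    single = r"r'\w'"    # the intended pattern literal
    doubled = r"r'\\w'"  # the regressed pattern literal
    return single in code and doubled not in code


def mentions_column(code, column):
    """True if the column name appears somewhere in the generated code."""
    return column in code


print(uses_word_char_regex(fixed_code))
print(uses_word_char_regex(broken_code))
print(mentions_column(fixed_code, "text"))
```

Checking for the absence of the doubled form as well as the presence of the single form guards against a future change that emits both.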
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.6)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 206f3f3 commit ac909a0
2 files changed
Lines changed: 58 additions & 2 deletions
File tree
- common/workflow-operator/src
- main/scala/org/apache/texera/amber/operator/visualization/wordCloud
- test/scala/org/apache/texera/amber/operator/visualization/wordCloud
Lines changed: 1 addition & 2 deletions
Lines changed: 57 additions & 0 deletions