OBSDA-1383: Make Splunk output sourcetype configurable in CLF #3251
marpears wants to merge 8 commits into openshift:master
Conversation
/hold
jcantrill
left a comment
There is no functional test to support this addition. Also note we do not backport features to earlier releases.
jcantrill
left a comment
Please add e2e validation tests and at least one functional test
/ok-to-test
@marpears: This pull request references OBSDA-1383 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target the "4.8.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest
@marpears You will need to rebase and force push.
Force-pushed 1fc8d94 to b04ac06
/retest
/label tide/squash-merge-method
@jcantrill: The label(s) could not be applied.
/label tide/merge-method-squash
Force-pushed b04ac06 to c55ba44
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jcantrill, marpears
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment
- … of sourceType use to docs
- …e defined. Docs updated. Added api validation, e2e and functional tests.
- …abel. Added an API validation test where sourceType is templated, and fixed single quote bug in API validation tests during oc apply.
Force-pushed c55ba44 to 48c7394
Walkthrough: This PR adds support for an optional `sourceType` field for the Splunk output.
Changes: Splunk SourceType Feature
Sequence Diagram
sequenceDiagram
participant User as User/Admin
participant APIServer as Kubernetes API Server
participant Validator as Field Validator
participant Generator as Vector Generator
participant Templates as Transform Templates
participant Vector as Vector Agent
participant Splunk as Splunk HEC
User->>APIServer: Create ClusterLogForwarder with sourceType + payloadKey
APIServer->>Validator: Validate XValidation rule
Validator-->>APIServer: ✓ sourceType allowed (payloadKey exists)
APIServer-->>User: Resource created
Generator->>Generator: Check PayloadKey & SourceType config
alt Has PayloadKey + SourceType
Generator->>Templates: Select payloadKeysourceTypeTmpl
else Has PayloadKey only
Generator->>Templates: Select payloadKeyTmpl
else No PayloadKey
Generator->>Templates: Use default ("_json")
end
Templates->>Vector: Deploy transforms & sink config
Vector->>Vector: Parse timestamp
Vector->>Vector: Extract source from log_type/log_source
Vector->>Vector: Derive sourcetype (label or config)
Vector->>Vector: Restructure event around payloadKey
Vector->>Splunk: Send JSON with templated source/sourcetype
Splunk-->>Vector: ✓ Event indexed
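The template-selection branches in the diagram above can be sketched in Go. The template file names are taken from this PR's file list, but `selectSinkTemplate` itself is a hypothetical simplification of the generator's branch logic, not the operator's actual code:

```go
package main

import "fmt"

// Template names mirror the sequence diagram; file names come from the PR's
// changed-files list. The selection function below is an illustrative sketch.
const (
	payloadKeySourceTypeTmpl = "splunk_sink_with_payloadkey_and_sourcetype.toml"
	payloadKeyTmpl           = "splunk_sink_payloadkey.toml"
	defaultSourceType        = "_json"
)

// selectSinkTemplate picks a sink template based on which optional Splunk
// output fields are set, following the alt branches in the diagram.
func selectSinkTemplate(payloadKey, sourceType string) string {
	switch {
	case payloadKey != "" && sourceType != "":
		// Both fields set: sourcetype is templated from config or a pod label.
		return payloadKeySourceTypeTmpl
	case payloadKey != "":
		// payloadKey only: the existing payload-key template is used.
		return payloadKeyTmpl
	default:
		// No payloadKey: the existing default sourcetype "_json" is kept.
		return defaultSourceType
	}
}

func main() {
	fmt.Println(selectSinkTemplate(".log", "my_sourcetype"))
	fmt.Println(selectSinkTemplate(".log", ""))
	fmt.Println(selectSinkTemplate("", ""))
}
```

This only illustrates the decision order; the real generator also handles the templated-vs-static sourcetype distinction shown in the diagram.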
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 10 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/observability/v1/output_types.go`:
- Around line 1359-1362: The kubebuilder validation pattern on the SourceType
field (the struct tag on SourceType string `json:"sourceType,omitempty"`) omits
':' in the static character class causing valid colon-separated sourcetypes to
be rejected; update the +kubebuilder:validation:Pattern annotation to include
':' inside the static class (e.g. add : to `[a-zA-Z0-9-_.\/]`) so values like
"my:custom:sourcetype" validate, and also correct the doc comment/example that
shows `"log4j"` (remove the stray quotes) so examples match the intended valid
values.
In `@bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml`:
- Line 3912: The example value currently shows a quoted string `"log4j"` which
violates the declared regex expecting an unquoted literal; update the example
list item to use an unquoted entry (log4j) so it matches the validation pattern
and remove the surrounding quotes from the value shown in the manifest example
for the ClusterLogForwarder configuration.
- Line 3919: The sourceType validation pattern is too restrictive and disallows
colons (:) causing valid Splunk sourcetypes like "my:custom:sourcetype" to be
rejected; update the regex used for the sourceType field (the pattern shown) to
include ":" in the allowed character class (i.e., add ":" to the character class
that currently contains a-zA-Z0-9-_.\/) so static values with colons pass API
validation while preserving existing escaping/grouping logic.
In `@config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml`:
- Around line 3910-3913: The example value for the sourceType field contains
quotes which violate the field's validation regex; update the example in the
ClusterLogForwarder CRD (the `sourceType` example lines shown as `2. "log4j"`)
to remove the quotes so it matches the pattern (e.g., change `2. "log4j"` to `2.
log4j`) and ensure any other static-literal examples in that `sourceType`
example block follow the same unquoted format.
In `@docs/features/logforwarding/outputs/splunk-forwarding.adoc`:
- Around line 57-60: Update the splunk-forwarding docs to clarify that
sourceType defaults to `_json` only when payloadKey is not set and otherwise
resolves to `generic_single_line` (adjust text near `sourceType` and the
conditional description around `payloadKey`), standardize the naming to use
`splunk/sourcetype` everywhere (replace `splunk_sourcetype` occurrences), and
fix the grammar in the `compression` line to read “available are: `none`,
`gzip`.” Ensure references to `payloadKey`, `sourceType`, and `compression` are
consistent across the other referenced sections (lines 76–78, 97–98, 245–249).
In `@test/e2e/collection/apivalidations/splunk-sourcetype.yaml`:
- Around line 22-23: The fixture in splunk-sourcetype.yaml sets sourceType
(sourceType: log4j) but omits the required cross-field payloadKey, causing CRD
validation to fail; update the manifest to include a payloadKey entry paired
with the existing sourceType (i.e., add a payloadKey field alongside sourceType)
so it satisfies the Splunk cross-field validation rule.
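The fixture fix suggested above amounts to pairing the two fields. A hypothetical fragment of what the updated manifest could look like (output name and payload key are illustrative, only `sourceType: log4j` is from the original fixture):

```yaml
# Hypothetical fragment of test/e2e/collection/apivalidations/splunk-sourcetype.yaml
outputs:
  - name: splunk-receiver        # illustrative name
    type: splunk
    splunk:
      payloadKey: .log           # added: sourceType requires payloadKey
      sourceType: log4j
```

Note this assumes the fixture is meant to pass validation; if it is a negative test, the missing `payloadKey` may be intentional.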
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 03c4129b-d332-454e-b2ff-da63fd0e1a1f
📒 Files selected for processing (18)
- api/observability/v1/output_types.go
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
- config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml
- config/manifests/bases/cluster-logging.clusterserviceversion.yaml
- docs/features/logforwarding/outputs/splunk-forwarding.adoc
- internal/generator/vector/output/splunk/splunk.go
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_test.go
- test/e2e/collection/apivalidations/api_validations_test.go
- test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey.yaml
- test/e2e/collection/apivalidations/splunk-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
- test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`
SourceType pattern validation rejects colon — blocks the primary Splunk naming convention.
The pattern's static character class [a-zA-Z0-9-_.\/] does not include : (colon). The old Splunk style uses underscore separators (e.g., access_combined) while the new style uses colon separators (e.g., ibm:ldap:audit), and the tradition is to use a single colon to denote the hierarchical levels from least specific to most specific — the software product is listed first, then the specific component of the product. Real-world examples like cisco:esa:textmail, zeek:conn:json, and the PR description's own example my:custom:sourcetype would all be rejected by the current pattern.
A user wanting a static colon-separated sourcetype has no clean workaround: my:custom:sourcetype fails validation, and the template syntax cannot encode a bare static string with colons either.
🐛 Proposed fix — add : to the static character class
- // +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
+ // +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
Also note that example 2 in the doc comment ("log4j" with literal double-quotes) would not match the pattern's static portion either. The intended value is simply log4j (no quotes), which does pass. The comment is misleading and should be corrected.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`
Suggested:
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`
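The reported gap can be checked directly with Go's RE2 engine (used here as a rough stand-in for API-server pattern validation; the exact engine the apiserver applies is an assumption). The test values come from the review comment itself:

```go
package main

import (
	"fmt"
	"regexp"
)

// currentPattern is copied from the PR's kubebuilder annotation; proposedPattern
// adds ':' to the static character class as the review suggests.
var (
	currentPattern  = regexp.MustCompile(`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`)
	proposedPattern = regexp.MustCompile(`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`)
)

func main() {
	for _, v := range []string{
		"log4j",                // static value, no colons: accepted by both
		`"log4j"`,              // the misleading doc example: rejected by both
		"my:custom:sourcetype", // colon-separated Splunk style: current rejects it
		`{.kubernetes.labels."splunk/sourcetype"||"_json"}`, // templated value
	} {
		fmt.Printf("%-55q current=%-5v proposed=%v\n",
			v, currentPattern.MatchString(v), proposedPattern.MatchString(v))
	}
}
```

Running this confirms the claim: only the colon case differs between the two patterns, and the quoted `"log4j"` example fails both.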
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/observability/v1/output_types.go` around lines 1359 - 1362, The
kubebuilder validation pattern on the SourceType field (the struct tag on
SourceType string `json:"sourceType,omitempty"`) omits ':' in the static
character class causing valid colon-separated sourcetypes to be rejected; update
the +kubebuilder:validation:Pattern annotation to include ':' inside the static
class (e.g. add : to `[a-zA-Z0-9-_.\/]`) so values like "my:custom:sourcetype"
validate, and also correct the doc comment/example that shows `"log4j"` (remove
the stray quotes) so examples match the intended valid values.
✅ Addressed in commit 17bcba0
@vparfonov @Clee2691 This pattern matcher is restricted by VRL. Do we have any idea if VRL would allow us to add a colon for this type? Given our template utilities it may complicate the other fields and we may need to think about how we would accept this suggestion?
4. {.foo||.bar||"missing"}
5. foo.{.bar.baz||.qux.quux.corge||.grault||"nil"}-waldo.fred{.plugh||"none"}
pattern: ^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$
sourceType regex is too restrictive for common Splunk sourcetypes
Line 3919 blocks static values containing :, so values like my:custom:sourcetype (the core use case) will be rejected by API validation.
Suggested fix
- pattern: ^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$
+ pattern: ^(([a-zA-Z0-9\-_.\/:])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- pattern: ^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$
+ pattern: ^(([a-zA-Z0-9\-_.\/:])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml` at
line 3919, The sourceType validation pattern is too restrictive and disallows
colons (:) causing valid Splunk sourcetypes like "my:custom:sourcetype" to be
rejected; update the regex used for the sourceType field (the pattern shown) to
include ":" in the allowed character class (i.e., add ":" to the character class
that currently contains a-zA-Z0-9-_.\/) so static values with colons pass API
validation while preserving existing escaping/grouping logic.
/retest
Hi @jcantrill, I think it would be worth extending this feature with a special option to not set the source type in the collector at all. According to the Vector docs, Vector does not mandate a sourcetype, and Splunk will default the sourcetype to httpevent if none is set, so using such an option would let Splunk assign its default. Do you agree?
@marpears: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What does it mean to enforce it on the receiver side? What is the receiver behavior? Does it still accept the logs forwarded by the collector?
Hi @jcantrill, I've done a test which confirmed that when the collector doesn't define a source type, Splunk accepts the log event and uses a default source type of httpevent. But on reflection, I think the complexities and use case of a special option to not set the source type in the collector need some further thought. I do not want to detract from the core objective of this PR, which was to allow for a user-defined source type in the collector, so I'll park this idea.
We discussed this in a team meeting and I also don't like the idea of a "magic" word. My suggested alternative was to test an empty string. What does that mean to Splunk? I am assuming that Vector will send an empty string if configured, which is different from configuring nothing and getting 'httpevent'.
I've given that scenario a try: with the source type set to an empty string in the collector, Splunk used an empty string as the source type. I did another test to confirm the source type needs to be completely omitted from the sink output config in the collector for Splunk to assign a source type. My test was done with a simple unstructured log message, and Splunk assigned the default source type of httpevent.
Description
This PR allows for configuration of the Splunk output source type in the CLF using a new `sourceType` field. This is so that we can support users who have defined custom source types in Splunk.
The `sourceType` field can be used only when `payloadKey` is defined, and allows for a templated value so that it can be retrieved from a pod label. If `sourceType` is not defined, the current logic is preserved where it defaults to `_json`. If using `payloadKey` without `sourceType`, the source type used will be either `_json` or `generic_single_line`, depending on the structure of the final event payload.
CLF configuration example:
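The example block itself appears to have been stripped in rendering. A plausible sketch based on the fields this PR adds, with the output name, URL, and label key chosen for illustration (only `payloadKey`, `sourceType`, and the template syntax are from this PR):

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
spec:
  outputs:
    - name: splunk-receiver                   # illustrative name
      type: splunk
      splunk:
        url: https://splunk.example.com:8088  # illustrative endpoint
        payloadKey: .log                      # sourceType requires payloadKey
        # Templated: read from a pod label, falling back to a static value.
        sourceType: '{.kubernetes.labels."splunk/sourcetype"||"my_sourcetype"}'
  pipelines:
    - name: to-splunk
      inputRefs: [application]
      outputRefs: [splunk-receiver]
```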
/cc @Clee2691 @cahartma
/assign @jcantrill
Links