OBSDA-1383: Make Splunk output sourcetype configurable in CLF#3251
Conversation
|
/hold |
jcantrill
left a comment
There was a problem hiding this comment.
There is no functional test to support this addition. Also note we do not backport features to earlier releases
jcantrill
left a comment
There was a problem hiding this comment.
Please add e2e validation tests and at least one functional test
|
/ok-to-test |
|
@marpears: This pull request references OBSDA-1383 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target the "4.8.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
@marpears You will need to rebase and force push. |
1fc8d94 to
b04ac06
Compare
|
/retest |
|
/label tide/squash-merge-method |
|
@jcantrill: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/label tide/merge-method-squash |
b04ac06 to
c55ba44
Compare
|
/approve |
|
Hi @jcantrill, I think it would be worth extending this feature with a special option of According to the Vector docs, it does not mandate sourcetype. Splunk will use a sourcetype to httpevent if not set, so using Do you agree? |
What does it mean to enforce it on the the receiver side? What is the receiver behavior? Does it still accept the logs forwarded by the collector? |
Hi @jcantrill, I've done a test which confirmed that when the collector doesn't define a source type, Splunk accepts the log event and uses a default source type of But on reflection, I think the complexities and use case of a special option to not set the source type in the collector needs some further thought. I do not want to detract from the core objective of this PR, which was to allow for a user-defined source type in the collector, so I'll park this idea. |
We discussed this in a team meeting and I do also don't like the idea of a "magic" word. My suggested alternative was to test an empty string. What does that mean to splunk? I am assuming that vector will send an empty string if configured which is different than configurating nothing and getting 'httpevent' |
I've given that scenario a try, with the source type set to an empty string in the collector, it resulted in Splunk using a source type of an empty string. I did another test to confirm the source type needs to be completely omitted from the sink output config in the collector for Splunk to assign a source type. My test was done with a simple unstructured log message, and Splunk assigned the source type of |
I believe I interpret your test as:
I am having difficulty finding reference documentation that may explicitly state whether sourceType is required but did find some documentation that advises to use the enterprise interface to modify the receiver config. My interpretation is an admin can fix it after the fact. Given my opinions regarding the need to set this value in context of various workloads on a kubernetes cluster, I would lean towards disregarding the collector configuration you proposed regarding 'none'. Admins have several options with the existing ClusterLogForwarder solution that should meet the broader needs:
The more urgent issue is determining if we can allow |
4e7fcdf to
0b9d504
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/features/logforwarding/outputs/splunk-forwarding.adoc`:
- Line 76: Fix the typo "stuctured" to "structured" in the sentence that
describes the `sourceType` field being defined in the CLF as a templated field
(the line mentioning "The `sourceType` field can be defined in the CLF as a
templated field to get its value from the stuctured log event, such as a pod
label."). Update that single word so the sentence reads "structured log event".
In `@test/e2e/collection/apivalidations/splunk-templated-source-1.yaml`:
- Line 22: Update the test fixture to use the new Splunk output field name:
replace the templated key "source" with "sourceType" (i.e., change source:
'{.kubernetes.labels."splunk/source"||"generic_single_line"}' to use sourceType)
and update the label reference to match the intended field ("splunk/sourcetype")
so the fixture validates the templated sourceType behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 12db8fff-31b0-44fb-ac08-6209b61c0ac0
📒 Files selected for processing (22)
api/observability/v1/output_types.gobundle/manifests/cluster-logging.clusterserviceversion.yamlbundle/manifests/observability.openshift.io_clusterlogforwarders.yamlconfig/crd/bases/observability.openshift.io_clusterlogforwarders.yamlconfig/manifests/bases/cluster-logging.clusterserviceversion.yamldocs/features/logforwarding/outputs/splunk-forwarding.adocinternal/generator/vector/output/splunk/splunk.gointernal/generator/vector/output/splunk/splunk_sink_payloadkey.tomlinternal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.tomlinternal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.tomlinternal/generator/vector/output/splunk/splunk_test.gotest/e2e/collection/apivalidations/api_validations_test.gotest/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yamltest/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-1.yamltest/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-2.yamltest/e2e/collection/apivalidations/splunk-payloadkey.yamltest/e2e/collection/apivalidations/splunk-source.yamltest/e2e/collection/apivalidations/splunk-sourcetype.yamltest/e2e/collection/apivalidations/splunk-templated-source-1.yamltest/e2e/collection/apivalidations/splunk-templated-source-2.yamltest/e2e/collection/apivalidations/splunk-templated-sourcetype.yamltest/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
🚧 Files skipped from review as they are similar to previous changes (12)
- internal/generator/vector/output/splunk/splunk_test.go
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
- config/manifests/bases/cluster-logging.clusterserviceversion.yaml
- test/e2e/collection/apivalidations/api_validations_test.go
- api/observability/v1/output_types.go
- test/e2e/collection/apivalidations/splunk-payloadkey.yaml
- internal/generator/vector/output/splunk/splunk.go
- test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
- bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
Hi @jcantrill, Just to clarify the previous test:
That aside, I have made further changes to allow |
… of sourceType use to docs
…e defined. Docs updated. Added api validation, e2e and functional tests.
…abel. Added an API validation test where sourceType is templated, and fixed single quote bug in API validation tests during oc apply.
…api validation and functional tests
22bf7fe to
99d3e3f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/generator/vector/output/splunk/splunk.go`:
- Around line 125-127: Replace the nested fmt.Sprintf + builder.WriteString
calls with direct writes using fmt.Fprintf to avoid the temporary allocation: in
the block that currently calls
builder.WriteString(fmt.Sprintf(payloadKeysourceTypeTmpl, quotedPathArray,
commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType))) and the else
that calls builder.WriteString(fmt.Sprintf(payloadKeyTmpl, quotedPathArray)),
call fmt.Fprintf(&builder, ...) with the same format and args (use
payloadKeysourceTypeTmpl and payloadKeyTmpl and pass quotedPathArray and the
result of commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType) where
appropriate) so the strings.Builder is written to directly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3148a056-dd00-496e-a035-fda17d1219f9
📒 Files selected for processing (22)
api/observability/v1/output_types.gobundle/manifests/cluster-logging.clusterserviceversion.yamlbundle/manifests/observability.openshift.io_clusterlogforwarders.yamlconfig/crd/bases/observability.openshift.io_clusterlogforwarders.yamlconfig/manifests/bases/cluster-logging.clusterserviceversion.yamldocs/features/logforwarding/outputs/splunk-forwarding.adocinternal/generator/vector/output/splunk/splunk.gointernal/generator/vector/output/splunk/splunk_sink_payloadkey.tomlinternal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.tomlinternal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.tomlinternal/generator/vector/output/splunk/splunk_test.gotest/e2e/collection/apivalidations/api_validations_test.gotest/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yamltest/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-1.yamltest/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-2.yamltest/e2e/collection/apivalidations/splunk-payloadkey.yamltest/e2e/collection/apivalidations/splunk-source.yamltest/e2e/collection/apivalidations/splunk-sourcetype.yamltest/e2e/collection/apivalidations/splunk-templated-source-1.yamltest/e2e/collection/apivalidations/splunk-templated-source-2.yamltest/e2e/collection/apivalidations/splunk-templated-sourcetype.yamltest/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
✅ Files skipped from review due to trivial changes (3)
- test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-2.yaml
- test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-1.yaml
🚧 Files skipped from review as they are similar to previous changes (16)
- test/e2e/collection/apivalidations/splunk-sourcetype.yaml
- test/e2e/collection/apivalidations/api_validations_test.go
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-source.yaml
- test/e2e/collection/apivalidations/splunk-templated-source-2.yaml
- test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
- internal/generator/vector/output/splunk/splunk_test.go
- test/e2e/collection/apivalidations/splunk-payloadkey.yaml
- api/observability/v1/output_types.go
- test/e2e/collection/apivalidations/splunk-templated-source-1.yaml
- docs/features/logforwarding/outputs/splunk-forwarding.adoc
- bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
- config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml
| builder.WriteString(fmt.Sprintf(payloadKeysourceTypeTmpl, quotedPathArray, commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType))) | ||
| } else { | ||
| builder.WriteString(fmt.Sprintf(payloadKeyTmpl, quotedPathArray)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and read the splunk.go file
cat -n internal/generator/vector/output/splunk/splunk.go | sed -n '120,135p'Repository: openshift/cluster-logging-operator
Length of output: 934
🏁 Script executed:
# Check the entire file length to understand context
wc -l internal/generator/vector/output/splunk/splunk.goRepository: openshift/cluster-logging-operator
Length of output: 132
🏁 Script executed:
# Search for all fmt.Sprintf + WriteString patterns in the file
rg -n "fmt\.Sprintf.*Sprintf|WriteString" internal/generator/vector/output/splunk/splunk.go -A 1Repository: openshift/cluster-logging-operator
Length of output: 713
🏁 Script executed:
# Verify QF1012 is a real staticcheck rule
web_search
QF1012 staticcheck Go lint rule fmt.Sprintf WriteStringRepository: openshift/cluster-logging-operator
Length of output: 174
🌐 Web query:
QF1012 staticcheck Go fmt.Sprintf WriteString unnecessary allocation
💡 Result:
Staticcheck's QF1012 diagnostic identifies an inefficient pattern where a program calls a method like WriteString or Write on an object (such as a bytes.Buffer or strings.Builder) using the result of fmt.Sprintf [1][2][3]. The diagnostic recommends replacing this with fmt.Fprintf [1][2][3]. The primary reason for this recommendation is performance [4][5]. Using fmt.Sprintf inside a WriteString call causes several inefficiencies: 1. Unnecessary Allocation: fmt.Sprintf creates a new, temporary string in memory to hold the formatted output [6][4][5]. This string is then immediately copied into the buffer by WriteString, resulting in an extra allocation and unnecessary memory usage [6][4][5]. 2. Overhead: fmt.Sprintf must parse the format string and handle variable arguments (interface{}), which incurs additional CPU overhead compared to writing directly to the destination buffer [6][4][5]. By using fmt.Fprintf(&buf,...), the formatting logic writes directly into the buffer's underlying storage, avoiding the creation of the intermediate string and reducing pressure on the garbage collector [7][4][5]. Example of the fix: // Before (flagged by QF1012) buf.WriteString(fmt.Sprintf("Value: %d", val)) // After (optimized) fmt.Fprintf(&buf, "Value: %d", val) This diagnostic is part of the standard Staticcheck suite and is enabled by default in most Go development environments, including gopls [1][2][3].
Citations:
- 1: https://staticcheck.dev/docs/checks
- 2: https://go.googlesource.com/tools/+/master/gopls/doc/analyzers.md
- 3: https://go.dev/gopls/analyzers
- 4: https://www.rickyspears.com/technology/the-hidden-cost-of-fmt-sprintf-why-this-simple-function-can-burn-a-hole-in-your-go-programs-performance/
- 5: https://dev.to/yudaph/fmtsprintf-looks-simple-but-will-burn-a-hole-in-your-pocket-h4h
- 6: https://medium.com/swlh/bad-go-frivolous-sprintf-2ad28fedf1a0
- 7: ctreminiom/go-atlassian@fd28380
🏁 Script executed:
# Verify builder is a strings.Builder
rg -n "var.*builder|builder.*:=" internal/generator/vector/output/splunk/splunk.go | head -5Repository: openshift/cluster-logging-operator
Length of output: 111
🏁 Script executed:
# Check the imports to confirm builder type
head -20 internal/generator/vector/output/splunk/splunk.goRepository: openshift/cluster-logging-operator
Length of output: 1237
Replace nested fmt.Sprintf + WriteString with fmt.Fprintf to satisfy staticcheck QF1012.
Lines 125 and 127 use an inefficient pattern that allocates an unnecessary temporary string. Use fmt.Fprintf(&builder, ...) instead to write directly to the strings.Builder, avoiding the intermediate allocation and improving performance.
Suggested change
- builder.WriteString(fmt.Sprintf(payloadKeysourceTypeTmpl, quotedPathArray, commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType)))
+ fmt.Fprintf(&builder, payloadKeysourceTypeTmpl, quotedPathArray, commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType))
} else {
- builder.WriteString(fmt.Sprintf(payloadKeyTmpl, quotedPathArray))
+ fmt.Fprintf(&builder, payloadKeyTmpl, quotedPathArray)
}🧰 Tools
🪛 golangci-lint (2.12.2)
[error] 125-125: QF1012: Use fmt.Fprintf(...) instead of WriteString(fmt.Sprintf(...))
(staticcheck)
[error] 127-127: QF1012: Use fmt.Fprintf(...) instead of WriteString(fmt.Sprintf(...))
(staticcheck)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/generator/vector/output/splunk/splunk.go` around lines 125 - 127,
Replace the nested fmt.Sprintf + builder.WriteString calls with direct writes
using fmt.Fprintf to avoid the temporary allocation: in the block that currently
calls builder.WriteString(fmt.Sprintf(payloadKeysourceTypeTmpl, quotedPathArray,
commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType))) and the else
that calls builder.WriteString(fmt.Sprintf(payloadKeyTmpl, quotedPathArray)),
call fmt.Fprintf(&builder, ...) with the same format and args (use
payloadKeysourceTypeTmpl and payloadKeyTmpl and pass quotedPathArray and the
result of commontemplate.TransformUserTemplateToVRL(o.Splunk.SourceType) where
appropriate) so the strings.Builder is written to directly.
|
/hold cancel |
|
/lgtm |
Thanks @jcantrill , can this be included in the 6.6 release? |
|
@marpears: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
This PR allows for configuration of the Splunk output source type in the CLF using a new
sourceTypefield. This is so that we can support users who have defined custom source types in Splunk.The
sourceTypefield can be used only whenpayloadKeyis defined, and allows for a templated value so that it can be retrieved from a pod label. IfsourceTypeis not defined, the current logic is preserved where it defaults to _json. If usingpayloadKeywithoutsourceType, the source type used will be either _json or generic_single_line, depending on the structure of the final event payload.CLF configuration example:
/cc @Clee2691 @cahartma
/assign @jcantrill
Links
Summary by CodeRabbit
New Features
Validation
Documentation
Tests