Add Prometheus scrape/pull source to prometheus plugin [Adding in the same Prometheus Remote write source] #6743
Conversation
f327b15 to
d118aa1
Compare
dlvenable
left a comment
There was a problem hiding this comment.
Thanks @srikanthpadakanti .
At a high-level would a scraper be expected to get data from the same Prometheus nodes pushing? I'm wondering if it makes sense to have a separate source or not.
| @JsonProperty("http_basic") | ||
| private HttpBasicConfig httpBasic; | ||
|
|
||
| @JsonProperty("bearer_token") |
There was a problem hiding this comment.
Maybe this should be an oauth2 group with only bearer_token for consistency.
scrape:
oauth2:
bearer_token:
There was a problem hiding this comment.
Bearer token here is a static token passed as an HTTP Authorization header, not an OAuth2 flow. Nesting it under oauth2 would imply token endpoint and refresh support that we do not have. Kept it flat under authentication to match the actual behavior.
Happy to add an oauth2 group later when we support the full flow with token exchange.
| final HttpHeadersBuilder builder = HttpHeaders.builder() | ||
| .add(HttpHeaderNames.ACCEPT, ACCEPT_HEADER_VALUE); | ||
|
|
||
| if (config.getAuthentication() != null) { |
There was a problem hiding this comment.
It would probably be ideal to have small classes and a common interface for updating the request based on the authentication. This could allow more complicated auth schemes like SigV4 in the future.
I had some similar comments on an http sink PR that is in progress currently. #6747 (comment). Though my comments went the other way. Have the code more amenable to basic auth instead of SigV4. 😄
There was a problem hiding this comment.
Extracted into a ScrapeRequestAuthenticator interface with BasicAuthenticator and BearerTokenAuthenticator as separate classes. Factory method handles selection. Adding SigV4 or other schemes in the future is just a new implementation without touching the scraper.
| @JsonProperty("flatten_labels") | ||
| private boolean flattenLabels = false; | ||
|
|
||
| @JsonProperty("insecure") |
There was a problem hiding this comment.
If any of the URLs in targets has a non-HTTPS scheme we should throw a validation error unless insecure is true.
You can add a JSR @AssertTrue to handle this in this class.
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
d118aa1 to
713b2d2
Compare
A scraper would not typically target the same nodes pushing via remote write. They serve different use cases: remote write is Prometheus servers configured to push, while scrape pulls from any endpoint exposing the Prometheus text exposition format, including application instances that are not Prometheus servers themselves. The reason they share a plugin is the precedent from kkondaka's feedback on PR #6627 where the recommendation was to keep it as a single prometheus source with config options for different modes, similar to how That said, if you prefer a separate source I am open to splitting it out. |
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
@srikanthpadakanti , @kkondaka , I think this should be a different source and that the S3 analogy doesn't quite fit here. The S3 source is entirely pull-based. Whether it is scan or SQS or EventBridge, it is pulling the data from S3. In no case does it open a port. The scan and the SQS are just two different ways to discover the content. The S3 source pulls the data from S3 the exact same way and uses the same code path regardless of the discovery. Additionally we had also once wanted the ability to scan and then start using SQS for a CDC-like behavior, similar to what we do for DynamoDB/RDS. In this case, there is a big difference between opening a port and going out to an external system to make requests. The source configuration will have a security difference depending on the configuration and may not be clear. I recommend that we split this into a |
|
@dlvenable Whether it is pull or push is not something users/customers know about or care about, right? Why make it difficult for the users? |
I think this is not at all the case. Users must know whether it is push or pull because that is how they get the data there right? If it is pull-based, they must enable a scrape URI and provide that. If it is push-based they must configure their remote-write protocol. I think this simplifies it for users because these are in reality two different sources with disjoint configurations. The documentation will need to say for just about every field whether it is remote-write or pull/scape. It seems more straightforward as a user to start with one model and see the configurations only for that model. In what way would two sources make the the configuration more complicated for users? |
|
@kkondaka , I suggest that we accept this PR and discuss further on the source. |
| public class ScrapeAuthenticationConfig { | ||
|
|
||
| @Valid | ||
| @JsonProperty("http_basic") |
There was a problem hiding this comment.
Looks like these auth mechanisms are common for all http. We should move them to a common location.
There was a problem hiding this comment.
acknowledge, offer to do it as a follow-up
There was a problem hiding this comment.
I have created a new issue for the followup - #6766
…ource Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
dlvenable
left a comment
There was a problem hiding this comment.
Thank you @srikanthpadakanti for this contribution!
Description
Add Prometheus scrape/pull source to the existing prometheus plugin. The source periodically HTTP GETs target
/metricsendpoints, parses the Prometheus text exposition format, and converts metrics into Data Prepper events (Gauge, Sum, Histogram, Summary). Supports static target URLs, configurable scrape interval/timeout, HTTP Basic Auth, Bearer Token, and TLS certificates. Both scrape and remote write modes can run simultaneously under the same plugin.Three usage modes:
Issues Resolved
#1997
Resolves #1997
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.