Skip to content

Agent initialization api#945

Merged
LikeTheSalad merged 8 commits into
open-telemetry:mainfrom
LikeTheSalad:agent-initialization-api
May 5, 2025
Merged

Agent initialization api#945
LikeTheSalad merged 8 commits into
open-telemetry:mainfrom
LikeTheSalad:agent-initialization-api

Conversation

@LikeTheSalad
Copy link
Copy Markdown
Contributor

These changes aim to add an agent initialization API. The current way of initializing RUM is quite flexible, as it should be able to adapt to different use-cases, however, that flexibility also makes it more verbose and complicated to use. This API aims to provide "the recommended way" of initializing RUM, where some configuration decisions are made internally in a way that should properly cover what's needed for most applications. By taking responsibility for several initialization decisions, this API also aims to make the whole initialization process much friendlier, as shown below:

Before

// Minimal config initialization
val rum = OpenTelemetryRum
            .builder(application)
            .addSpanExporterCustomizer {
                OtlpHttpSpanExporter
                    .builder()
                    .setEndpoint("http://10.0.2.2:4318/v1/traces")
                    .build()
            }.addLogRecordExporterCustomizer {
                OtlpHttpLogRecordExporter
                    .builder()
                    .setEndpoint("http://10.0.2.2:4318/v1/logs")
                    .build()
            }.addMetricExporterCustomizer {
                OtlpHttpMetricExporter
                    .builder()
                    .setEndpoint("http://10.0.2.2:4318/v1/metrics")
                    .build()
            }.build()

After

// Minimal config initialization
val rum = OpenTelemetryRumInitializer.initialize(application, "http://10.0.2.2:4318")

@LikeTheSalad LikeTheSalad requested a review from a team as a code owner April 15, 2025 11:53
.addInstrumentation(SessionInstrumentation())
try {
rum = otelRumBuilder.build()
rum = OpenTelemetryRumInitializer.initialize(this, "http://10.0.2.2:4318")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenTelemetryRumInitializer is an object (singleton) but looks like it should be just a function or extension?
the single class holds no state or anything.

Copy link
Copy Markdown
Contributor Author

@LikeTheSalad LikeTheSalad Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The reason I went with an object is to try and provide a consistent experience for users calling it either from Kotlin or Java code. Though I'm open to other approaches. If we turned it into a standalone function, for example, would it be a way for people from Java to still call it this way: OpenTelemetryRumInitializer.initialize(...)?

We could also decide not to think/worry about Java usage apis, though I'd like to get some feedback on whether that could cause issues or not.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a kotlin object, i think Java users have to do eg OpenTelemetryRumInitializer.INSTANCE.initialize
I think for OpenTelemetryRumInitializer.initialize you'd need a @JvmStatic annotation within a companion object

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, though it seems to work with objects with @JvmStatic annotations too, without having to be inside a companion object (at least that was the case when I tried it locally).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i was not sure about being within the companion object anyway, if it works, all good

@SenNeonoveNoci
Copy link
Copy Markdown

SenNeonoveNoci commented Apr 15, 2025

Hi @LikeTheSalad,

To follow up on what I mentioned during the SIG meeting, I'll write down the API we currently have for initializing the Agent we're working on (built on top of opentelemetry-android). Maybe it can serve as a source of inspiration.

The agent gets installed like this:

val agent: SplunkRum = SplunkRum.install(
    application: Application,
    agentConfiguration: AgentConfiguration,
    vararg moduleConfigurations: ModuleConfiguration
)

The AgentConfiguration, as the name suggests, lets clients configure the agent. Only a few items are mandatory to keep the configuration process as simple as possible:

class AgentConfiguration(
    val endpoint: EndpointConfiguration,
    val appName: String? = null,
    var appVersion: String? = null,
    var enableDebugLogging: Boolean = false,
    var globalAttributes: Attributes = Attributes.empty(),
    var spanInterceptor: ((SpanData) -> SpanData?)? = null,
    var user: UserConfiguration = UserConfiguration(),
    var session: SessionConfiguration = SessionConfiguration(),
)

The EndpointConfiguration allows clients to configure the endpoint in several ways (specific to the Splunk Agent) while keeping it really simple for the most basic case:

class EndpointConfiguration {

    ..

    constructor(realm: String, rumAccessToken: String) {
        ..
    }
    constructor(trace: URL) {
        ..
    }
    constructor(trace: URL, sessionReplay: URL) {
        ..
    }
}

The array (vararg) of ModuleConfiguration lets clients configure specific instrumentations. An example module configuration might look like this:

class SlowRenderingModuleConfiguration(
    val isEnabled: Boolean = true,
    val detectionPollInterval: Long
) : ModuleConfiguration

On agent install, the AgentConfiguration is used for general configuration, and the ModuleConfigurations are passed to the instrumentation modules. A minimal installation might look like this:

val agent: SplunkRum = SplunkRum.install(
    application,
    AgentConfiguration(EndpointConfiguration("http://www.exmple.com"))
)

You can find the code here:

If you find this helpful and have follow-up questions, feel free to keep them coming.

Also, if we aim to make the installation/initialization process as smooth as possible, note that application: Application is not strictly necessary and can be replaced with a ContentProvider.

cc: @jzwc

@bidetofevil
Copy link
Copy Markdown
Contributor

So there are two things that something like this can achieve:

  1. Convenience API to make initialization with defaults easier - yes it's duplicative as @cleverchuk pointed out, but that's by design. Use this easy thing if it works for you - if not, go deeper. I think that's a good pattern to maintain. And if it allows the user to not have to have an explicit dependency to core, then all the better.
  2. A shared Agent API that can be abstracted away without dependencies on the specific Agent implementation.

The latter is something I think we can put a pin in for this PR as I don't want to hold it up. As for the former, I think settling one a few optional parameters and providing defaults would be useful. Having a raw address seems oddly explicit though - maybe just the host and port of the collector and we can construct the URL based on that? This can be used with gRPC similarly if we want to make that an option.

@bidetofevil
Copy link
Copy Markdown
Contributor

@SenNeonoveNoci your approach seems like one of many that we explored a few months back (i.e. use a config object). I think it was decided then that it was a bit too verbose so individual optional parameters with defaults were preferred. Here's some background on it.

I think as a means of abstracting related details away in a specific interface so that the agent API for init doesn't have a tonne of concrete dependencies is a good idea. Even more than one if too many concepts are mixed in. But that feels like something we should do to flesh out an Agent API (2 in my previous) rather than something we need to do immediately if we only want to do 1.

@LikeTheSalad
Copy link
Copy Markdown
Contributor Author

Thank you for your feedback @SenNeonoveNoci @bidetofevil @marandaneto (and @cleverchuk from your comments on the SIG meeting) - I'm really excited about these discussions because, as @bidetofevil mentioned, we've had them in one form or another in the past, although it always felt to me like we were trying to make guesses, which is a little scary tbh, as we didn't have too many people involved to ensure we didn't miss something important. Now that we've got a nice quorum, I'm hopeful that we will be able to start shaping the agent for its first stable API.

Now, before diving into the fun part, I'd like to share with you my philosophical view on how to proceed, and hopefully it's something that doesn't sound bad to you (and if it does, please let me know so that we can try to find some common ground to move forward). My general idea regarding API design is that there are often many ways to do the same thing, so most of the time and effort that's put into it is to try and find the "best approach". I love that idea, although in practice, I've noticed that it can be a bit paralyzing to try and make sure that the next decision/PR is the "right one", because that causes some pressure that might make the creation of PRs (and even the PR reviews) to feel like they have to be perfect, where ultimately it all becomes too overwhelming to make any progress. On top of that, I think it's usually the user's feedback the one that weighs the most when it comes to design decisions, but for us to be able to get that feedback, we should provide something for them to comment upon, which is why I think the best way of making a nice API is to be able of "failing fast" in a way that we can quickly try things and adjust as needed, iteratively. The fail-fast approach is not ideal in some cases, although for this project, given that it's not stable and that people are willing to try it in its "preview state", I think we are in an ideal scenario to do so.

If the above doesn't sound bad to you, I'd like to propose the following way of moving forward with the agent API design:

  • Define a high-level ultimate goal for this project. I think this was already mentioned by @breedx-splk and @bidetofevil based on discussions in the previous Kubecon. My understanding is that this project should provide a solution for both vendors to be able to fully customize it as needed, as well as end users to be able to use it with minimal configuration based on our recommendations, though please correct me if I missed something.
  • Anybody should feel free to create PRs with the additions/refactors (as atomic as possible) that they think could enhance the apis, no matter where, even if it can cause breaking changes.
  • If there are no obvious issues with a design decision from a PR, but we're concerned (we might only have a "hunch") that it might cause troubles in the future, but we don't know exactly which ones and/or if those issues will ever arise, then I don't think we should block that PR, because it could discourage further ideas from ever get a PR unless people are certain that that's the "right step" to make. Plus, if the ideas are bad, once they're merged, I'm sure some user will let us know.
  • If the idea proposed is good and it means that the agent will get an overall enhancement, but the execution is not ideal, I think regular PR reviews can, in some cases, help get the PR in the right shape. However, if the execution is not ideal, and to make it ideal means to do some refactoring/big changes elsewhere, then I think we should merge the PR and work on a better adaptation in a further one, which will allow users to get a better API that what they had available before, even though the implementation needs to get better. This will allow us to make incremental improvements.

That's my proposal, I'm looking forward to knowing your thoughts on it. Also, please let me know if you think some wording should change or if something is missing from that list.


Now, regarding your comments on this PR:

On agent install, the AgentConfiguration is used for general configuration, and the ModuleConfigurations are passed to the instrumentation modules.

I like the idea of having "module configurations" to prevent the agent configuration from getting too big. In general, I like the overall approach, though that type of configuration structure seems like it's a big enough topic to deserve its own PR 😅 as it seems like it might overlap with the current OtelRumConfig object, so I'm not sure if that'd mean we should keep both or go with a more powerful config abstraction (like the one @bidetofevil was mentioning), but for now I just added the available config object as that way we can add the incremental progress of providing a convenience API, which I think it's one of the somethings I mentioned earlier that we still don't have, to at least having something to work upon. So, if you don't mind, I think the config object should be addressed separately because it's a big topic on its own.

Also, if we aim to make the installation/initialization process as smooth as possible, note that application: Application is not strictly necessary and can be replaced with a ContentProvider.

I'm curious about this approach, by replacing application with ContentProvider do you mean something like having a content provider that captures the application object so that it's available later on during initialization without the user having to pass it when manually calling the initialization? Or to wrap the agent initialization around a content provider so that users don't need to initialize it manually? Or maybe something else that I might have missed.

Having a raw address seems oddly explicit though - maybe just the host and port of the collector and we can construct the URL based on that?

I think I see what you mean, although a couple of concerns come to my mind about that approach, so I'd like to get your thoughts on them:

  • I think that for HTTP collectors, the path can be customized. So for example, typically for traces the path would be v1/traces, but my understanding is that that's not mandatory, so if we try and guess the full URL based on that assumption, we might get it wrong in case some user decided to use something else.
  • Probably some users might have a collector in their local network, so they might not have tls enabled for it. How can we know whether we should build the URL with http or https?

@jzwc
Copy link
Copy Markdown

jzwc commented Apr 19, 2025

I'd like to contribute my perspective (worth about 0.02€) here, too as I am one of the architects of the approach together with @SenNeonoveNoci.

Our modular approach extends beyond simply modularizing the configuration—it's part of our broader vision to consistently modularize the entire architecture of the agent.

The idea of enabling raw (full) collector URLs provisioning is exactly as @LikeTheSalad guesses, to allow collectors with non-default paths to be used, something that is allowed by the specification: Non-default URL paths for requests MAY be configured on the client and server sides.: https://opentelemetry.io/docs/specs/otlp/#otlphttp-request

Regarding the ContentProvider discussion - while my Android development expertise is limited, I believe the intention is not to substitute the Application class with a ContentProvider, but rather to leverage the fact that a ContentProvider can be implicitly accessed within Agent code to serve the same functional purpose as an Application instance, i.e., have no such (Application, ContentProvider) attribute in the initialization anymore. Is this understanding correct, @SenNeonoveNoci?

@breedx-splk
Copy link
Copy Markdown
Contributor

  • If there are no obvious issues with a design decision from a PR, but we're concerned (we might only have a "hunch") that it might cause troubles in the future, but we don't know exactly which ones and/or if those issues will ever arise, then I don't think we should block that PR, because it could discourage further ideas from ever get a PR unless people are certain that that's the "right step" to make. Plus, if the ideas are bad, once they're merged, I'm sure some user will let us know.

Yes! This! More of this! I want to sing this from the mountaintops.

@SenNeonoveNoci
Copy link
Copy Markdown

I'm curious about this approach — by replacing the application with a ContentProvider, do you mean something like:

  • Having a ContentProvider that captures the Application object so it's available later during initialization without the user having to pass it manually?
  • Wrapping the agent initialization in a ContentProvider so users don’t have to initialize it at all?
  • Or maybe something else I might have missed?

@LikeTheSalad Sorry for the late response — we had some national holidays (yay!) here in the Czech Republic 🇨🇿

Let me try to portray the possible solution with an example:

In the “Agent” module’s AndroidManifest.xml, you'd register a provider like this:

<application>
    <provider
        android:name=".AgentInstaller"
        android:authorities="${applicationId}.agent-installer"
        android:enabled="true"
        android:exported="false" />
    ...
</application>

The AgentInstaller class can then look like this:

internal class AgentInstaller : ContentProvider() {

    override fun onCreate(): Boolean {
        val application = context?.applicationContext as Application
        Agent.install(application) // Pass the context to where it's needed
        return true
    }

    override fun query(
        uri: Uri,
        projection: Array<out String>?,
        selection: String?,
        selectionArgs: Array<out String>?,
        sortOrder: String?
    ): Cursor? = null

    override fun getType(uri: Uri): String? = null
    override fun insert(uri: Uri, values: ContentValues?): Uri? = null
    override fun delete(uri: Uri, selection: String?, selectionArgs: Array<out String>?): Int = 0
    override fun update(uri: Uri, values: ContentValues?, selection: String?, selectionArgs: Array<out String>?): Int = 0
}

This way, you can access the Application instance before the client manually initializes the Agent — because the provider’s onCreate() will be triggered before the Application’s onCreate(), which is where initialization code is usually placed.

@bidetofevil
Copy link
Copy Markdown
Contributor

The ability to get the Application object from a ContentProvider allows you to initialize the SDK sooner than Application.onCreate(). It's nice if the latter isn't early enough for you, but not strictly necessary if it is, i.e. the trade off of needing to stash the application object instead of having passed in is debatable since it's pretty easy for an SDK user to provide that.

But overall, since the APIs aren't stable yet, I agree with @LikeTheSalad and @breedx-splk - I also prefer to add them in unless there are serious and immediate objections, and we can fix in place later, especially when we aren't introducing new concepts that can get ingrained and be hard to remove later.

@LikeTheSalad
Copy link
Copy Markdown
Contributor Author

Thanks for the detailed explanation, @SenNeonoveNoci. It does sound like an enhancement to avoid users having to pass their app object, although I agree with @bidetofevil that it's usually not too difficult for them to provide it, so I was wondering that maybe in a (distant) future, should we wanted to make the overall process as zero-code as possible, one way to do so might be to do the whole agent initialization in a content provider. It should be an opt-in feature I think, as usually some people prefer to do these initializations manually, but nevertheless I think we can play with some nice tools in the future that make use of a plain-old boring manual API on behalf of our users, as the one that's proposed in this PR.

Copy link
Copy Markdown
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LikeTheSalad thanks I think this is a good first start at fleshing out the agent as the go-to place for opinionated and simple/quick configuration of OpenTelemetryRum. 👍🏻

I can't offer an approval just yet, however, because you've removed the OtelRumConfigExtension functions without any explanation here. This seems to greatly reduce some of the convenience that was here before without any replacement. Can you either address this with an explanation or put it back until we can figure out what the replacement(s) look like? Thanks!

@LikeTheSalad
Copy link
Copy Markdown
Contributor Author

LikeTheSalad commented Apr 30, 2025

@LikeTheSalad thanks I think this is a good first start at fleshing out the agent as the go-to place for opinionated and simple/quick configuration of OpenTelemetryRum. 👍🏻

I can't offer an approval just yet, however, because you've removed the OtelRumConfigExtension functions without any explanation here. This seems to greatly reduce some of the convenience that was here before without any replacement. Can you either address this with an explanation or put it back until we can figure out what the replacement(s) look like? Thanks!

Got it. The reason I removed those convenience methods is to avoid having overlapping apis in the agent, as I added them some time ago as a quick workaround to avoid the verbosity of configuring the instrumentations using the loader api directly, but now that we're introducing a proper agent api whose job is to make things easier for users in general, I don't think the workaround is needed anymore, as the new api should be the place for those configs too. The reason why I didn't provide a replacement in this PR is to keep it small, because I know these kinds of changes tend to be contentious (though ironically that decision made it contentious 😅 ), so I try to make them as small as possible to avoid blocking progress on things that can be added independently.

I plan to create a follow-up PR focused on instrumentation configs, where we can add these configs as well as an additional one to allow people to disable instrumentations based on a set of enum values, as we discussed some time ago. I wouldn't like to make this PR bigger so what I'll do regarding your concern about avoiding breaking existing apis without replacement, is to keep the convenience methods for now and then remove them in the PR where the replacements will be added.

Update: I've just applied the changes, so it should be sorted now, @breedx-splk.

Copy link
Copy Markdown
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@LikeTheSalad LikeTheSalad merged commit cc6dcd0 into open-telemetry:main May 5, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants