Use Case
On a VM with 8 GB of RAM and 4 cores I'm measuring container startup times of >45s. It sometimes gets down to almost 35s if I've got:
- a warm cache,
- an existing CA,
- and I'm not doing anything interesting with my mounts.
In either case that is wayyyyy tooooo slowwww.
Why is that too slow? Well, ideally, in an orchestrated environment (Kubernetes/Docker Swarm/Nomad/ECS/etc.) you want your stateless services to scale up and down quickly and responsively as load demands it.
Yes, I know OpenVox Server isn't stateless... yet. That's a closely related goal I plan on discussing with you all.
(It's related because separating out the stateful bits (CA/code/config) eliminates a lot of preprocessing at container startup, which makes it go faster. I'll be opening a separate FR in the future.)
Either way, the grand vision is an HPA that reacts to workload and tells a Deployment to scale its ReplicaSet. But if a new replica takes too long to start, the delay between demand and capacity can make the autoscaler over- and under-shoot, which is what's called the Bullwhip Effect. A user shouldn't have to worry whether there are enough replicas at a given time, only that it isn't overloading their cluster.
Describe the solution you would like
My first goal is getting startup under 10s (relative to my previously mentioned benchmarks). Faster would be ideal, but that may take some Java expertise (which I don't have in abundance).
What's making it slow? When checking `podman logs -t` and running through the timestamps I can see that the entrypoint shell phase takes up about 2/3 of that time. Once the entrypoint scripts are done the JVM takes over, which in my experience takes up the remaining third.
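Reading gaps off timestamps by eye gets tedious, so here is a minimal sketch of a filter that prints the time elapsed since the previous log line. It assumes each line starts with an RFC 3339 timestamp followed by a space, which is my understanding of what `podman logs -t` emits; `log_deltas` is a made-up name.

```sh
#!/bin/sh
# Hypothetical helper: prefix every log line with the gap to the previous line.
# Assumed input shape: "2024-01-01T12:00:01.500000000+00:00 message text"
log_deltas() {
  awk '
    {
      split($1, dt, "T")        # dt[2] = "HH:MM:SS.fraction+offset"
      split(dt[2], hms, ":")
      # awk converts only the leading numeric part of hms[3],
      # so the zone suffix after the seconds is ignored.
      now = hms[1] * 3600 + hms[2] * 60 + hms[3]
      if (NR > 1) {
        gap = now - prev
        if (gap < 0) gap += 86400   # the log crossed midnight
        printf "+%.3fs\t%s\n", gap, $0
      } else {
        printf "start\t%s\n", $0
      }
      prev = now
    }'
}

# Usage (container name is a placeholder):
#   podman logs -t <container> 2>&1 | log_deltas
```

Large gaps in the first column point at the slow steps. Only the time of day is compared, so a log spanning more than 24 hours would need real date parsing.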
Thankfully logback.xml has some pretty sensible defaults and we can see what happens at which times. Here's a quick and dirty breakdown from my tests. No warm cache, no mounts, fresh image.
| Phase | Time |
|-----------------------------------------------------|-----------------|
| Shell (entrypoint, incl. CA setup on cold start) | ~36s |
| JVM + Clojure + Trapperkeeper class loading | ~4s |
| JRuby instance creation (loading Puppet into JRuby) | ~10s |
| Jetty bind (ready) | ~1s |
| TOTAL | ~51s |
That first bit is what I'd like to tackle first. When I start adding `set -x` options to the scripts and prepending commands with `time` I can see various culprits.
- `puppet config` commands take up ~1s each, each booting a separate ruby interpreter. A few dozen of those are littered about. This is by far and away the most time consuming part.
  - I'm planning on replacing it with a grab-all `puppet config get <XYZ>` and then supply those settings using a batching helper script.
- The `chown -R` from `87-ca-permissions.sh` is a temporary workaround I'm not happy with.
  - Until a future date when we can get rid of it, I can probably ensure it only runs on files with a different `uid` (as opposed to its current shotgun approach).
- `90-ca.sh` has a few `hocon set` commands, each needing to spend half a second booting the gem.
  - We can probably load the library in its own ruby script and consolidate those.
- The `puppetserver ca generate` command takes a long while, but since that's only on first boot it shouldn't matter. I can ignore that.
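To make the first two ideas in the list concrete, here is a rough sketch, not the actual entrypoint code. All function names are hypothetical. It also assumes that `puppet config print` with no setting names prints every setting as `name = value`, and that `server` is the right section; both would need checking against the image.

```sh
#!/bin/sh
# Hypothetical helpers; names and paths are illustrative only.

# Boot Ruby once and dump a whole config section to a cache file.
#   $1 = section name, $2 = cache file
# Assumption: output is one "name = value" line per setting.
cache_puppet_settings() {
  puppet config print --section "$1" > "$2"
}

# Read one setting back from the cache without starting Ruby again.
#   $1 = cache file, $2 = setting name (letters, digits, underscores)
cached_setting() {
  sed -n "s/^$2 = //p" "$1" | head -n 1
}

# Fix ownership only where it is wrong, instead of `chown -R` on everything.
#   $1 = directory, $2 = owner (name or uid), $3 = group (name or gid)
# find selects entries whose owner or group differs; if nothing matches,
# chown is never run. -h changes a symlink itself rather than its target.
fix_ownership() {
  find "$1" \( ! -user "$2" -o ! -group "$3" \) \
    -exec chown -h "$2:$3" {} +
}

# Usage (paths and names are placeholders):
#   cache_puppet_settings server /tmp/puppet-settings
#   ssldir=$(cached_setting /tmp/puppet-settings ssldir)
#   fix_ownership "$ssldir" puppet puppet
```

The trade-off is that the cache is a snapshot: any script that changes a setting afterwards has to refresh it, otherwise later lookups return stale values.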
As for the JVM, I've tinkered with some of the startup parameters to no avail. Messing with `TieredStopAtLevel` was disastrous; JRuby relies heavily on the C2 JIT.
Buuuut I think we can pre-seed classes by creating a CDS (Class Data Sharing) archive. The JVM memory-maps an archive of pre-parsed classes instead of loading and parsing them class-by-class. This is already provided for the default JDK classes, but not for the application JARs (Clojure, JRuby, puppetserver). I'm still investigating.
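As a starting point for that investigation, here is a small sketch of choosing the CDS flag. It assumes JDK 13 or newer, where `-XX:ArchiveClassesAtExit` writes a dynamic archive when the JVM exits and `-XX:SharedArchiveFile` maps an existing one. The function name and archive path are made up, and how the flag reaches the server's JVM arguments is left open.

```sh
#!/bin/sh
# Hypothetical helper: print the CDS flag to add to the JVM arguments.
#   $1 = path of the application class-data archive
cds_java_args() {
  if [ -r "$1" ]; then
    # Archive exists: map it at startup.
    printf '%s' "-XX:SharedArchiveFile=$1"
  else
    # No archive yet: record loaded classes and write it at JVM exit.
    printf '%s' "-XX:ArchiveClassesAtExit=$1"
  fi
}

# Usage (path is a placeholder):
#   extra_args=$(cds_java_args /opt/cds/app.jsa)
```

The archive is only usable with the same JVM build and a matching classpath, so creating it at image build time seems like the natural fit. Whether it covers enough of the Clojure and JRuby classes to matter is something only measurement will tell.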
I think that's all I've got so far? If anyone else has any ideas or knows of any additional bottlenecks we might be able to clear, please let me know!
Describe alternatives you've considered
No response
Additional context
No response