Version
The latest release at the time of writing: 4.5.13.
Context
We have encountered a metrics issue while using the non-blocking Postgres DB driver with Vert.x.
Two issues are reported in one ticket because I believe they are tightly coupled, and fixing one potentially fixes the other.
Issue 1: vertx_pool_queue_pending doesn't decrease after connection loss
When we have pending queries (vertx_pool_queue_pending{pool_type="sql"}) and the database connection is lost (due to a DB restart, a network glitch, etc.), the vertx_pool_queue_pending metric remains stuck: it never goes below the value recorded at the time of the connection loss.
As a result, the metrics graph suggests there are always pending queries waiting for a connection, even when the database connection is restored immediately. The only way to resolve this is to restart the service.
I've reviewed VertxPoolMetrics and related classes, but it's unclear where the issue lies. Notably, any queries that were pending when the connection was lost are never executed after reconnection.
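For Issue 1, the loss can be simulated by restarting the database while queries are queued. A minimal sketch, assuming the pgPool and postgresContainer fields from the reproducer below (the docker-java restart keeps the container and its mapped port alive, unlike stopping and re-starting it):

    // Sketch only: queue up work, then restart Postgres in place.
    for (int i = 0; i < 1_000; i++) {
        pgPool.withTransaction(conn -> conn.query("SELECT PG_SLEEP(5)").execute());
    }
    // Restart the server process inside the same container so the host/port
    // stay valid and the pool can reconnect afterwards.
    postgresContainer.getDockerClient()
            .restartContainerCmd(postgresContainer.getContainerId())
            .exec();
    // vertx_pool_queue_pending now stays stuck at its pre-restart value,
    // even though the database is reachable again.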
Issue 2: vertx_pool_queue_pending freezes with high load
We also observed that when sending a high volume of requests, the vertx_pool_queue_pending metric does not decrease correctly.
Do you have a reproducer?
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.core.json.JsonObject;
import io.vertx.junit5.VertxExtension;
import io.vertx.micrometer.MetricsService;
import io.vertx.micrometer.MicrometerMetricsOptions;
import io.vertx.micrometer.VertxPrometheusOptions;
import io.vertx.pgclient.PgBuilder;
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.sqlclient.Pool;
import io.vertx.sqlclient.PoolOptions;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.testcontainers.containers.PostgreSQLContainer;

@ExtendWith(VertxExtension.class)
public class PostgresTest {

    private static PostgreSQLContainer<?> postgresContainer;
    private static Vertx vertx;
    private static Pool pgPool;

    @BeforeAll
    static void setup() {
        postgresContainer = new PostgreSQLContainer<>("postgres")
                .withDatabaseName("testdb")
                .withUsername("user")
                .withPassword("password");
        postgresContainer.start();

        // Enable Micrometer metrics with the embedded Prometheus server on port 8081
        MicrometerMetricsOptions metricsOptions = new MicrometerMetricsOptions()
                .setPrometheusOptions(new VertxPrometheusOptions()
                        .setEnabled(true)
                        .setStartEmbeddedServer(true)
                        .setEmbeddedServerOptions(new HttpServerOptions().setPort(8081))
                        .setPublishQuantiles(true))
                .setEnabled(true);
        vertx = Vertx.vertx(new VertxOptions().setMetricsOptions(metricsOptions));

        PgConnectOptions connectOptions = new PgConnectOptions()
                .setPort(postgresContainer.getFirstMappedPort())
                .setHost(postgresContainer.getHost())
                .setDatabase(postgresContainer.getDatabaseName())
                .setUser(postgresContainer.getUsername())
                .setPassword(postgresContainer.getPassword());
        PoolOptions poolOptions = new PoolOptions().setMaxSize(5);
        pgPool = PgBuilder.pool()
                .with(poolOptions)
                .connectingTo(connectOptions)
                .using(vertx)
                .build();
    }

    @Test
    void testDatabaseConnection() throws InterruptedException {
        // Fire-and-forget: the returned Futures are intentionally ignored,
        // so all 300,000 queries pile up in the pool queue at once.
        for (int i = 0; i < 300_000; i++) {
            pgPool.withTransaction(sqlConnection ->
                    sqlConnection.query("SELECT PG_SLEEP(5)").execute()
            );
        }
        // Print the pool metrics every second, effectively forever.
        MetricsService metricsService = MetricsService.create(vertx);
        for (int i = 0; i < 1_000_000; i++) {
            Thread.sleep(1000);
            JsonObject metricsSnapshot = metricsService.getMetricsSnapshot();
            System.out.println("vertx.pool.in.use: " + metricsSnapshot.getValue("vertx.pool.in.use"));
            System.out.println("vertx.pool.queue.pending: " + metricsSnapshot.getValue("vertx.pool.queue.pending"));
            System.out.println("=======");
        }
    }

    @AfterAll
    static void tearDown() {
        if (pgPool != null) {
            pgPool.close();
        }
        if (vertx != null) {
            vertx.close();
        }
        if (postgresContainer != null) {
            postgresContainer.stop();
        }
    }
}
Steps to reproduce
Please run the test above and take a look at the logs.
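Alternatively, the same gauges can be watched through the embedded Prometheus server the test starts on port 8081. A small standalone scraper, assuming the default /metrics endpoint of VertxPrometheusOptions (the class name is mine, for illustration):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // One-off scrape of the embedded Prometheus endpoint started by the test.
    // Port 8081 matches the HttpServerOptions in setup(); "/metrics" is assumed
    // to be the default VertxPrometheusOptions endpoint.
    public class ScrapePendingGauge {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://localhost:8081/metrics")).build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            body.lines()
                    .filter(line -> line.startsWith("vertx_pool_queue_pending"))
                    .forEach(System.out::println);
        }
    }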
Observed Behavior
- We create 300,000 requests, which immediately fill up vertx.pool.queue.pending (except for the 5 connections actively processing queries).
- Once all requests are added to the queue, we start printing metrics every second.
- After about a minute, vertx.pool.in.use drops to 0, meaning no queries are actively being processed.
- However, vertx.pool.queue.pending freezes at around 299,970 and never decreases.
- Any new requests increase the pending count from this frozen value, rather than resetting it.
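To make the suspected accounting gap concrete: a queue-pending gauge only stays correct if every enqueue is eventually balanced by a dequeue, whether the task runs, is rejected, or is dropped on connection loss. A toy model of that invariant (names are mine, not Vert.x's):

    import java.util.concurrent.atomic.AtomicLong;

    // Toy model of a queue-pending gauge; illustrative only, not Vert.x code.
    class PendingGauge {
        private final AtomicLong pending = new AtomicLong();

        void taskQueued()   { pending.incrementAndGet(); } // gauge goes up
        void taskStarted()  { pending.decrementAndGet(); } // normal path
        void taskRejected() { pending.decrementAndGet(); } // pool full / timeout
        // The path that appears to be missing: tasks dropped when the
        // connection is lost must also decrement, or the gauge sticks forever.
        void taskEvicted()  { pending.decrementAndGet(); }

        long value() { return pending.get(); }
    }

Both observed behaviors are consistent with one of the decrement paths never firing for tasks that leave the queue abnormally.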