Skip to content

Commit 4175bf6

Browse files
authored
Audit log all endpoints + coverage test (#9467)
Closes #8819 Closes #8820 1. An ergonomic helper `audit_and_time` that a. Wraps up the two annoying audit log calls — init and complete — into a single function that takes the handler logic as a callback b. Combines that with the latency timing function, because it's ugly to do both 2. A test that uses the `VERIFY_ENDPOINTS` list to make sure every non-get endpoint gets an audit log entry when you call it * Two exceptions: `system_timeseries_query` and `timeseries_query`, which are POSTs because of the query body but are read operations 3. Add audit log calls to every relevant endpoint. Logging IDs of created resource (#8811) was discussed in here and prototyped, but I took it out because it requires some decision-making. The idea with the helper is that however if we do the resource IDs by implementing a trait on response structs (I will post a draft PR showing how this looks), it shouldn’t require changing all the callsites. Then again, even if it did, it's not a huge deal. <details> <summary>Original PR description</summary> This is a draft because I want to get reactions to the approach. Would address #8820, #8811, and #8819. The idea is: 1. An ergonomic helper `audit_and_time` that a. Wraps up the two annoying audit log calls — init and complete — into a single function that takes the handler logic as a callback b. Combines that with the latency timing function, because it's ugly to do both 2. A test that uses the `VERIFY_ENDPOINTS` list to make sure every non-get endpoint gets an audit log entry when you call it The only added functionality here is the use of a trait `MaybeHasResourceId` (could rename to `AuditResponse` or something) to extract the ID of created resources so we can log them. Note that this PR doesn't do any of the DB work yet to actually store that ID — I want to make sure people like the callsite ergonomics first. ### Concerns and future plans * Here I am specifically pulling out the ID of the created resource, but in #8821 we discuss wanting to log the entire response body, and that would include the ID and be strictly more general. So we might want to skip straight to that if we expect to do it eventually. Added some notes on handling giant responses [here](#8821 (comment)). * This doesn't address #8813 but that issue doesn't have anything to do with the helper — we'll have to store the credential ID in the opctx and pull it off of there in the audit log entry init call </details>
1 parent b7beb65 commit 4175bf6

4 files changed

Lines changed: 966 additions & 2296 deletions

File tree

nexus/src/context.rs

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,9 +28,12 @@ use oximeter::types::ProducerRegistry;
2828
use oximeter_instruments::http::{HttpService, LatencyTracker};
2929
use slog::Logger;
3030
use std::env;
31+
use std::future::Future;
3132
use std::sync::Arc;
3233
use uuid::Uuid;
3334

35+
use dropshot::{HttpError, HttpResponse};
36+
3437
/// Indicates the kind of HTTP server.
3538
#[derive(Clone, Copy)]
3639
pub enum ServerKind {
@@ -334,6 +337,42 @@ impl ServerContext {
334337
}
335338
}
336339

340+
/// Execute an external API handler with audit logging and latency tracking.
341+
///
342+
/// This helper:
343+
/// 1. Creates an OpContext via authentication
344+
/// 2. Initializes an audit log entry
345+
/// 3. Runs the handler
346+
/// 4. Completes the audit log entry with result info
347+
/// 5. Wraps everything in latency instrumentation
348+
pub async fn audit_and_time<F, Fut, R>(
349+
rqctx: &dropshot::RequestContext<ApiContext>,
350+
handler: F,
351+
) -> Result<R, HttpError>
352+
where
353+
F: FnOnce(Arc<OpContext>, Arc<Nexus>) -> Fut,
354+
Fut: Future<Output = Result<R, HttpError>>,
355+
R: HttpResponse,
356+
{
357+
let apictx = rqctx.context();
358+
let nexus = Arc::clone(&apictx.context.nexus);
359+
let handler = async {
360+
let opctx = Arc::new(op_context_for_external_api(rqctx).await?);
361+
let audit = nexus.audit_log_entry_init(&opctx, rqctx).await?;
362+
let result = handler(Arc::clone(&opctx), Arc::clone(&nexus)).await;
363+
// Ignore error: unlike the init line, audit log failures cannot cause
364+
// the request to fail because the primary operation has already taken
365+
// place. The complete function retries internally and logs on failure.
366+
let _ = nexus.audit_log_entry_complete(&opctx, &audit, &result).await;
367+
result
368+
};
369+
apictx
370+
.context
371+
.external_latencies
372+
.instrument_dropshot_handler(rqctx, handler)
373+
.await
374+
}
375+
337376
/// Authenticates an incoming request to the external API and produces a new
338377
/// operation context for it
339378
pub(crate) async fn op_context_for_external_api(

0 commit comments

Comments
 (0)