| 1 | +--- |
| 2 | +name: akka-net-best-practices |
| 3 | +description: Critical Akka.NET best practices including EventStream vs DistributedPubSub, supervision strategies, error handling, Props vs DependencyResolver, work distribution patterns, and cluster/local mode abstractions for testability. |
| 4 | +invocable: false |
| 5 | +--- |
| 6 | + |
| 7 | +# Akka.NET Best Practices |
| 8 | + |
| 9 | +## When to Use This Skill |
| 10 | + |
| 11 | +Use this skill when: |
| 12 | +- Designing actor communication patterns |
| 13 | +- Deciding between EventStream and DistributedPubSub |
| 14 | +- Implementing error handling in actors |
| 15 | +- Understanding supervision strategies |
| 16 | +- Choosing between Props patterns and DependencyResolver |
| 17 | +- Designing work distribution across nodes |
| 18 | +- Creating testable actor systems that can run with or without cluster infrastructure |
| 19 | +- Abstracting over Cluster Sharding for local testing scenarios |
| 20 | + |
| 21 | +## Reference Files |
| 22 | + |
| 23 | +- [work-distribution-patterns.md](work-distribution-patterns.md): Database queues, Akka.Streams throttling, outbox pattern |
| 24 | +- [cluster-local-abstractions.md](cluster-local-abstractions.md): GenericChildPerEntityParent, IPubSubMediator, execution mode wiring |
| 25 | +- [async-cancellation-patterns.md](async-cancellation-patterns.md): Actor-scoped CancellationToken, linked CTS, timeout handling |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +## 1. EventStream vs DistributedPubSub |
| 30 | + |
| 31 | +### Critical: EventStream is LOCAL ONLY |
| 32 | + |
| 33 | +`Context.System.EventStream` is **local to a single ActorSystem process**. It does NOT work across cluster nodes. |
| 34 | + |
| 35 | +```csharp |
| 36 | +// BAD: This only works on a single server |
| 37 | +// When you add a second server, subscribers on server 2 won't receive events from server 1 |
| 38 | +Context.System.EventStream.Subscribe(Self, typeof(PostCreated)); |
| 39 | +Context.System.EventStream.Publish(new PostCreated(postId, authorId)); |
| 40 | +``` |
| 41 | + |
| 42 | +**When EventStream is appropriate:** |
| 43 | +- Logging and diagnostics within a single process |
| 44 | +- Local event bus for truly single-process applications |
| 45 | +- Development/testing scenarios |
| 46 | + |
| 47 | +### Use DistributedPubSub for Multi-Node |
| 48 | + |
| 49 | +For events that must reach actors across multiple cluster nodes, use `Akka.Cluster.Tools.PublishSubscribe`: |
| 50 | + |
| 51 | +```csharp |
| 52 | +using Akka.Cluster.Tools.PublishSubscribe; |
| 53 | + |
| 54 | +public class TimelineUpdatePublisher : ReceiveActor |
| 55 | +{ |
| 56 | + private readonly IActorRef _mediator; |
| 57 | + |
| 58 | + public TimelineUpdatePublisher() |
| 59 | + { |
| 60 | + // Get the DistributedPubSub mediator |
| 61 | + _mediator = DistributedPubSub.Get(Context.System).Mediator; |
| 62 | + |
| 63 | + Receive<PublishTimelineUpdate>(msg => |
| 64 | + { |
| 65 | + // Publish to a topic - reaches all subscribers across all nodes |
| 66 | + _mediator.Tell(new Publish($"timeline:{msg.UserId}", msg.Update)); |
| 67 | + }); |
| 68 | + } |
| 69 | +} |
| 70 | +``` |
| 71 | + |
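The publisher above needs a subscriber on the receiving side. The following is a minimal sketch, not the author's implementation: `TimelineSubscriber` and the `TimelineUpdate` payload type are hypothetical names introduced here, standing in for whatever `msg.Update` carries. Only the mediator, `Subscribe`, and `SubscribeAck` come from `Akka.Cluster.Tools.PublishSubscribe`.

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Event;

// Hypothetical payload type; stands in for whatever msg.Update carries.
public sealed record TimelineUpdate(string UserId, string Text);

public sealed class TimelineSubscriber : ReceiveActor
{
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public TimelineSubscriber(string userId)
    {
        var mediator = DistributedPubSub.Get(Context.System).Mediator;

        // Register this actor for the per-user topic; the mediator replies with SubscribeAck.
        mediator.Tell(new Subscribe($"timeline:{userId}", Self));

        Receive<SubscribeAck>(ack =>
            _log.Info("Subscribed to {Topic}", ack.Subscribe.Topic));

        // Published messages arrive as the raw payload, not wrapped in Publish.
        Receive<TimelineUpdate>(update =>
            _log.Info("Timeline update for {UserId}: {Text}", update.UserId, update.Text));
    }
}
```

Subscriptions are replicated between nodes asynchronously, so a message published immediately after subscribing may not reach a subscriber on another node.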
| 72 | +### Akka.Hosting Configuration for DistributedPubSub |
| 73 | + |
| 74 | +```csharp |
| 75 | +builder.WithDistributedPubSub(role: null); // Available on all roles, or specify a role |
| 76 | +``` |
| 77 | + |
| 78 | +### Topic Design Patterns |
| 79 | + |
| 80 | +| Pattern | Topic Format | Use Case | |
| 81 | +|---------|--------------|----------| |
| 82 | +| Per-user | `timeline:{userId}` | Timeline updates, notifications | |
| 83 | +| Per-entity | `post:{postId}` | Post engagement updates | |
| 84 | +| Broadcast | `system:announcements` | System-wide notifications | |
| 85 | +| Role-based | `workers:rss-poller` | Work distribution | |
| 86 | + |
| 87 | +--- |
| 88 | + |
| 89 | +## 2. Supervision Strategies |
| 90 | + |
| 91 | +### Key Clarification: Supervision is for CHILDREN |
| 92 | + |
| 93 | +A supervision strategy defined on an actor dictates **how that actor supervises its children**, NOT how the actor itself is supervised. |
| 94 | + |
| 95 | +```csharp |
| 96 | +public class ParentActor : ReceiveActor |
| 97 | +{ |
| 98 | + // This strategy applies to children of ParentActor, NOT to ParentActor itself |
| 99 | + protected override SupervisorStrategy SupervisorStrategy() |
| 100 | + { |
| 101 | + return new OneForOneStrategy( |
| 102 | + maxNrOfRetries: 10, |
| 103 | + withinTimeRange: TimeSpan.FromSeconds(30), |
| 104 | + localOnlyDecider: ex => ex switch |
| 105 | + { |
| 106 | + ArithmeticException => Directive.Resume, |
| 107 | + NullReferenceException => Directive.Restart, |
| 108 | + ArgumentException => Directive.Stop, |
| 109 | + _ => Directive.Escalate |
| 110 | + }); |
| 111 | + } |
| 112 | +} |
| 113 | +``` |
| 114 | + |
| 115 | +### Default Supervision Strategy |
| 116 | + |
| 117 | +The default `OneForOneStrategy` already includes rate limiting: |
| 118 | +- **10 restarts within 1 second** = actor is permanently stopped |
| 119 | +- This prevents infinite restart loops |
| 120 | + |
| 121 | +**You rarely need a custom strategy** unless you have specific requirements. |
| 122 | + |
| 123 | +### When to Define Custom Supervision |
| 124 | + |
| 125 | +**Good reasons:** |
| 126 | +- Actor throws exceptions indicating irrecoverable state corruption -> Restart |
| 127 | +- Actor throws exceptions that should NOT cause restart (expected failures) -> Resume |
| 128 | +- Child failures should affect siblings -> Use `AllForOneStrategy` |
| 129 | +- Need different retry limits than the default |
| 130 | + |
| 131 | +**Bad reasons:** |
| 132 | +- "Just to be safe" - the default is already safe |
| 133 | +- You don't yet understand what the actor does - understand it first |
| 134 | + |
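For the "child failures should affect siblings" case, an `AllForOneStrategy` could look like the sketch below. `PipelineCoordinator`, the retry limits, and the choice of exception types are assumptions made for illustration. The arguments are passed positionally: retry count, time window, then the decider function.

```csharp
using System;
using Akka.Actor;

// Hypothetical parent whose children depend on each other,
// so a failure in one invalidates the others.
public sealed class PipelineCoordinator : ReceiveActor
{
    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new AllForOneStrategy(
            3,                        // maxNrOfRetries
            TimeSpan.FromMinutes(1),  // withinTimeRange
            ex => ex switch
            {
                // Treated here as an expected failure: children keep their current state
                TimeoutException => Directive.Resume,
                // Anything else: the directive is applied to all children, not just the failed one
                _ => Directive.Restart
            });
    }
}
```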
| 135 | +--- |
| 136 | + |
| 137 | +## 3. Error Handling: Supervision vs Try-Catch |
| 138 | + |
| 139 | +### When to Use Try-Catch (Most Cases) |
| 140 | + |
| 141 | +**Use try-catch when:** |
| 142 | +- The failure is **expected** (network timeout, invalid input, external service down) |
| 143 | +- You know **exactly why** the exception occurred |
| 144 | +- You can handle it **gracefully** (retry, return error response, log and continue) |
| 145 | +- Restarting would **not help** (same error would occur again) |
| 146 | + |
| 147 | +```csharp |
| 148 | +public class RssFeedPollerActor : ReceiveActor // _httpClient (HttpClient) and _log (ILoggingAdapter) are fields, omitted here for brevity |
| 149 | +{ |
| 150 | + public RssFeedPollerActor() |
| 151 | + { |
| 152 | + ReceiveAsync<PollFeed>(async msg => |
| 153 | + { |
| 154 | + try |
| 155 | + { |
| 156 | + var feed = await _httpClient.GetStringAsync(msg.FeedUrl); |
| 157 | + var items = ParseFeed(feed); |
| 158 | + // Process items... |
| 159 | + } |
| 160 | + catch (HttpRequestException ex) |
| 161 | + { |
| 162 | + // Expected failure - log and schedule retry |
| 163 | + _log.Warning("Feed {Url} unavailable: {Error}", msg.FeedUrl, ex.Message); |
| 164 | + Context.System.Scheduler.ScheduleTellOnce( |
| 165 | + TimeSpan.FromMinutes(5), Self, msg, Self); |
| 166 | + } |
| 167 | + catch (XmlException ex) |
| 168 | + { |
| 169 | + // Invalid feed format - log and mark as bad |
| 170 | + _log.Error("Feed {Url} has invalid format: {Error}", msg.FeedUrl, ex.Message); |
| 171 | + Sender.Tell(new FeedPollResult.InvalidFormat(msg.FeedUrl)); |
| 172 | + } |
| 173 | + }); |
| 174 | + } |
| 175 | +} |
| 176 | +``` |
| 177 | + |
| 178 | +### When to Let Supervision Handle It |
| 179 | + |
| 180 | +**Let exceptions propagate (trigger supervision) when:** |
| 181 | +- You have **no idea** why the exception occurred |
| 182 | +- The actor's **state might be corrupt** |
| 183 | +- A **restart would help** (fresh state, reconnect resources) |
| 184 | +- It's a **programming error** (NullReferenceException, InvalidOperationException from bad logic) |
| 185 | + |
| 186 | +### Anti-Pattern: Swallowing Unknown Exceptions |
| 187 | + |
| 188 | +```csharp |
| 189 | +// BAD: Swallowing exceptions hides problems |
| 190 | +catch (Exception ex) |
| 191 | +{ |
| 192 | + _log.Error(ex, "Error processing work"); |
| 193 | + // Actor continues with potentially corrupt state |
| 194 | +} |
| 195 | + |
| 196 | +// GOOD: Handle known exceptions, let unknown ones propagate |
| 197 | +catch (HttpRequestException ex) |
| 198 | +{ |
| 199 | + // Known, expected failure - handle gracefully |
| 200 | + _log.Warning("HTTP request failed: {Error}", ex.Message); |
| 201 | + Sender.Tell(new WorkResult.TransientFailure()); |
| 202 | +} |
| 203 | +// Unknown exceptions propagate to supervision |
| 204 | +``` |
| 205 | + |
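Replies such as `FeedPollResult.InvalidFormat` and `WorkResult.TransientFailure` imply a result type that the caller can match on instead of catching exceptions. One possible shape, assuming C# 9 records, is sketched here. The `Success` case, the property names, and the `Describe` helper are hypothetical.

```csharp
using System;

// Hypothetical closed result hierarchy for feed polling.
public abstract record FeedPollResult
{
    // Private constructor: only the nested cases below can derive from this type.
    private FeedPollResult() { }

    public sealed record Success(string FeedUrl, int ItemCount) : FeedPollResult;
    public sealed record TransientFailure(string FeedUrl, string Reason) : FeedPollResult;
    public sealed record InvalidFormat(string FeedUrl) : FeedPollResult;
}

public static class FeedPollResultExtensions
{
    // The receiver pattern-matches on the outcome; no exception crosses the actor boundary.
    public static string Describe(this FeedPollResult result) => result switch
    {
        FeedPollResult.Success s => $"{s.FeedUrl}: {s.ItemCount} items",
        FeedPollResult.TransientFailure f => $"{f.FeedUrl}: retry later ({f.Reason})",
        FeedPollResult.InvalidFormat i => $"{i.FeedUrl}: invalid feed",
        _ => throw new ArgumentOutOfRangeException(nameof(result))
    };
}
```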
| 206 | +--- |
| 207 | + |
| 208 | +## 4. Props vs DependencyResolver |
| 209 | + |
| 210 | +### When to Use Plain Props |
| 211 | + |
| 212 | +**Use `Props.Create()` when:** |
| 213 | +- Actor doesn't need `IServiceProvider` or `IRequiredActor<T>` |
| 214 | +- All dependencies can be passed via constructor |
| 215 | +- Actor is simple and self-contained |
| 216 | + |
| 217 | +```csharp |
| 218 | +// Simple actor with no DI needs |
| 219 | +public static Props Props(PostId postId, IPostWriteStore store) |
| 220 | + => Akka.Actor.Props.Create(() => new PostEngagementActor(postId, store)); |
| 221 | +``` |
| 222 | + |
| 223 | +### When to Use DependencyResolver |
| 224 | + |
| 225 | +**Use `resolver.Props<T>()` when:** |
| 226 | +- Actor needs `IServiceProvider` to create scoped services |
| 227 | +- Actor uses `IRequiredActor<T>` to get references to other actors |
| 228 | +- Actor has many dependencies that are already in DI container |
| 229 | + |
| 230 | +```csharp |
| 231 | +// Registration with DI |
| 232 | +builder.WithActors((system, registry, resolver) => |
| 233 | +{ |
| 234 | + var actor = system.ActorOf(resolver.Props<OrderProcessorActor>(), "order-processor"); |
| 235 | + registry.Register<OrderProcessorActor>(actor); |
| 236 | +}); |
| 237 | +``` |
| 238 | + |
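An actor created through `resolver.Props<T>()` can take `IServiceProvider` in its constructor and open a scope per message. The sketch below assumes a hypothetical `ProcessOrder` message and `IOrderRepository` service; neither appears in the original, and the actor body is illustrative only.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical message and repository, for illustration only.
public sealed record ProcessOrder(string OrderId);

public interface IOrderRepository
{
    Task MarkProcessedAsync(string orderId);
}

public sealed class OrderProcessorActor : ReceiveActor
{
    public OrderProcessorActor(IServiceProvider serviceProvider)
    {
        ReceiveAsync<ProcessOrder>(async msg =>
        {
            // Actors are long-lived, so scoped services get a fresh scope per message
            using var scope = serviceProvider.CreateScope();
            var repository = scope.ServiceProvider.GetRequiredService<IOrderRepository>();
            await repository.MarkProcessedAsync(msg.OrderId);
        });
    }
}
```

The scope is disposed when the handler finishes, so scoped services such as a `DbContext` do not accumulate state across messages.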
| 239 | +### Remote Deployment Considerations |
| 240 | + |
| 241 | +**You almost never need remote deployment.** If you're not doing remote deployment (and you probably aren't): |
| 242 | +- `Props.Create(() => new Actor(...))` with closures is fine |
| 243 | +- The warning that `Props` built from closures cannot be serialized doesn't apply, because the `Props` never leave the process |
| 244 | + |
| 245 | +For most applications, use **cluster sharding** instead of remote deployment - it handles distribution automatically. |
| 246 | + |
| 247 | +--- |
| 248 | + |
| 249 | +## 5. Work Distribution Patterns |
| 250 | + |
| 251 | +When you have many background jobs (RSS feeds, email sending, etc.), don't process them all at once - this causes thundering herd problems. |
| 252 | + |
| 253 | +**Three patterns to solve this:** |
| 254 | +1. **Database-Driven Work Queue** - Use `FOR UPDATE SKIP LOCKED` for natural cross-node distribution |
| 255 | +2. **Akka.Streams Rate Limiting** - Throttle processing within a single node |
| 256 | +3. **Durable Queue (Outbox Pattern)** - Database-backed outbox for reliable processing |
| 257 | + |
| 258 | +See [work-distribution-patterns.md](work-distribution-patterns.md) for full code samples. |
| 259 | + |
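As a rough illustration of the second pattern, the sketch below throttles a list of feed URLs to at most 10 per second before handing them to a poller actor. `ThrottledPolling`, the rate, and sending the bare URL as the message are assumptions; the referenced file holds the author's actual samples.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Streams;
using Akka.Streams.Dsl;

public static class ThrottledPolling
{
    // Hypothetical helper: sends one message per URL to the poller, at most 10 per second.
    public static Task RunAsync(ActorSystem system, IActorRef poller, string[] feedUrls)
    {
        return Source.From(feedUrls)
            // 10 elements per second, bursts of up to 10; Shaping delays elements instead of failing
            .Throttle(10, TimeSpan.FromSeconds(1), 10, ThrottleMode.Shaping)
            .RunForeach(url => poller.Tell(url), system.Materializer());
    }
}
```

This only limits the rate on the node that runs the stream; spreading work across nodes still needs one of the database-backed patterns.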
| 260 | +--- |
| 261 | + |
| 262 | +## 6. Common Mistakes Summary |
| 263 | + |
| 264 | +| Mistake | Why It's Wrong | Fix | |
| 265 | +|---------|----------------|-----| |
| 266 | +| Using EventStream for cross-node pub/sub | EventStream is local only | Use DistributedPubSub | |
| 267 | +| Defining supervision to "protect" an actor | A strategy governs the actor's children, not the actor itself | Define the strategy on the parent of the actor you want supervised |
| 268 | +| Catching all exceptions | Hides bugs, corrupts state | Only catch expected errors | |
| 269 | +| Always using DependencyResolver | Adds unnecessary complexity | Use plain Props when possible | |
| 270 | +| Processing all background jobs at once | Thundering herd, resource exhaustion | Use database queue + rate limiting | |
| 271 | +| Throwing exceptions for expected failures | Triggers unnecessary restarts | Return result types, use messaging | |
| 272 | + |
| 273 | +--- |
| 274 | + |
| 275 | +## 7. Quick Reference |
| 276 | + |
| 277 | +### Communication Pattern Decision Tree |
| 278 | + |
| 279 | +``` |
| 280 | +Need to communicate between actors? |
| 281 | +├── Same process only? -> EventStream is fine |
| 282 | +├── Across cluster nodes? |
| 283 | +│ ├── Point-to-point? -> Use ActorSelection or known IActorRef |
| 284 | +│ └── Pub/sub? -> Use DistributedPubSub |
| 285 | +└── Fire-and-forget to external system? -> Consider outbox pattern |
| 286 | +``` |
| 287 | + |
| 288 | +### Error Handling Decision Tree |
| 289 | + |
| 290 | +``` |
| 291 | +Exception occurred in actor? |
| 292 | +├── Expected failure (HTTP timeout, invalid input)? |
| 293 | +│ └── Try-catch, handle gracefully, continue |
| 294 | +├── State might be corrupt? |
| 295 | +│ └── Let supervision restart |
| 296 | +├── Unknown cause? |
| 297 | +│ └── Let supervision restart |
| 298 | +└── Programming error (null ref, bad logic)? |
| 299 | + └── Let supervision restart, fix the bug |
| 300 | +``` |
| 301 | + |
| 302 | +### Props Decision Tree |
| 303 | + |
| 304 | +``` |
| 305 | +Creating actor Props? |
| 306 | +├── Actor needs IServiceProvider? |
| 307 | +│ └── Use resolver.Props<T>() |
| 308 | +├── Actor needs IRequiredActor<T>? |
| 309 | +│ └── Use resolver.Props<T>() |
| 310 | +├── Simple actor with constructor params? |
| 311 | +│ └── Use Props.Create(() => new Actor(...)) |
| 312 | +└── Remote deployment needed? |
| 313 | + └── Probably not - use cluster sharding instead |
| 314 | +``` |
| 315 | + |
| 316 | +--- |
| 317 | + |
| 318 | +## 8. Cluster/Local Mode Abstractions |
| 319 | + |
| 320 | +For applications that need to run both in clustered production and local/test environments, use abstraction patterns to toggle between implementations: |
| 321 | + |
| 322 | +- **`AkkaExecutionMode` enum** - Controls which implementations are used (LocalTest vs Clustered) |
| 323 | +- **`GenericChildPerEntityParent`** - Mimics sharding behavior locally using the same `IMessageExtractor` |
| 324 | +- **`IPubSubMediator`** - Abstracts DistributedPubSub for swappable local/cluster implementations |
| 325 | + |
| 326 | +See [cluster-local-abstractions.md](cluster-local-abstractions.md) for complete implementation code. |
| 327 | + |
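To make the idea concrete, here is one guess at what a swappable pub/sub abstraction could look like. The interface members and both implementations are hypothetical and may differ from the code in cluster-local-abstractions.md, which remains the authoritative version.

```csharp
using System.Collections.Concurrent;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;

// Hypothetical shape; the real interface lives in cluster-local-abstractions.md.
public interface IPubSubMediator
{
    void Subscribe(string topic, IActorRef subscriber);
    void Publish(string topic, object message);
}

// Clustered mode: delegate to the DistributedPubSub mediator.
public sealed class ClusterPubSubMediator : IPubSubMediator
{
    private readonly IActorRef _mediator;

    public ClusterPubSubMediator(ActorSystem system)
        => _mediator = DistributedPubSub.Get(system).Mediator;

    // The subscriber is passed as sender so that it receives the SubscribeAck.
    public void Subscribe(string topic, IActorRef subscriber)
        => _mediator.Tell(new Subscribe(topic, subscriber), subscriber);

    public void Publish(string topic, object message)
        => _mediator.Tell(new Publish(topic, message));
}

// Local/test mode: in-memory topic registry, no cluster required.
public sealed class LocalPubSubMediator : IPubSubMediator
{
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<IActorRef, byte>> _topics = new();

    public void Subscribe(string topic, IActorRef subscriber)
        => _topics.GetOrAdd(topic, _ => new ConcurrentDictionary<IActorRef, byte>())[subscriber] = 0;

    public void Publish(string topic, object message)
    {
        if (!_topics.TryGetValue(topic, out var subscribers)) return;
        foreach (var subscriber in subscribers.Keys)
            subscriber.Tell(message);
    }
}
```

Actors that depend on the interface rather than on `DistributedPubSub` directly can then be tested without starting a cluster.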
| 328 | +--- |
| 329 | + |
| 330 | +## 9. Actor Logging |
| 331 | + |
| 332 | +### Use ILoggingAdapter, Not ILogger<T> |
| 333 | + |
| 334 | +In actors, use `ILoggingAdapter` from `Context.GetLogger()` instead of DI-injected `ILogger<T>`: |
| 335 | + |
| 336 | +```csharp |
| 337 | +public class MyActor : ReceiveActor |
| 338 | +{ |
| 339 | + private readonly ILoggingAdapter _log = Context.GetLogger(); |
| 340 | + |
| 341 | + public MyActor() |
| 342 | + { |
| 343 | + Receive<MyMessage>(msg => |
| 344 | + { |
| 345 | + _log.Info("Processing message for user {UserId}", msg.UserId); |
| 346 | + // Inside a catch block, pass the exception first: _log.Error(ex, "Failed to process {MessageType}", msg.GetType().Name); |
| 347 | + }); |
| 348 | + } |
| 349 | +} |
| 350 | +``` |
| 351 | + |
| 352 | +**Why ILoggingAdapter:** |
| 353 | +- Integrates with Akka's logging pipeline and supervision |
| 354 | +- Supports semantic/structured logging as of v1.5.57 |
| 355 | +- Method names: `Info()`, `Debug()`, `Warning()`, `Error()` (not `Log*` variants) |
| 356 | +- No DI required - obtained directly from actor context |
| 357 | + |
| 358 | +**Don't inject ILogger<T> into actors** - it bypasses Akka's logging infrastructure. |
| 359 | + |
| 360 | +### Semantic Logging (v1.5.57+) |
| 361 | + |
| 362 | +```csharp |
| 363 | +// Named placeholders for better log aggregation and querying |
| 364 | +_log.Info("Order {OrderId} processed for customer {CustomerId}", order.Id, order.CustomerId); |
| 365 | + |
| 366 | +// Prefer named placeholders over positional |
| 367 | +// Good: {OrderId}, {CustomerId} |
| 368 | +// Avoid: {0}, {1} |
| 369 | +``` |
| 370 | + |
| 371 | +--- |
| 372 | + |
| 373 | +## 10. Managing Async Operations with CancellationToken |
| 374 | + |
| 375 | +When actors launch async operations via `PipeTo`, those operations can outlive the actor if not properly managed. Key practices: |
| 376 | + |
| 377 | +- **Actor CTS in PostStop** - Always cancel and dispose in `PostStop()` |
| 378 | +- **New CTS per operation** - Cancel previous before starting new work |
| 379 | +- **Pass token everywhere** - EF Core queries, HTTP calls, etc. |
| 380 | +- **Linked CTS for timeouts** - External calls get short timeouts to prevent hanging |
| 381 | +- **Graceful handling** - Distinguish timeout vs shutdown in catch blocks |
| 382 | + |
| 383 | +See [async-cancellation-patterns.md](async-cancellation-patterns.md) for complete implementation code. |
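
The practices listed above can be combined roughly as follows. This is a sketch under stated assumptions: `ReportActor`, its three message types, and the 10 second timeout are hypothetical, and `HttpClient.GetStringAsync(string, CancellationToken)` requires .NET 5 or later.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical messages for illustration.
public sealed record FetchReport(string Url);
public sealed record ReportFetched(string Body);
public sealed record ReportFailed(string Reason);

public sealed class ReportActor : ReceiveActor
{
    // Cancelled in PostStop so in-flight work cannot outlive the actor
    private readonly CancellationTokenSource _shutdownCts = new();

    public ReportActor(HttpClient http)
    {
        Receive<FetchReport>(msg =>
        {
            FetchAsync(http, msg.Url, _shutdownCts.Token).PipeTo(
                Self,
                success: body => new ReportFetched(body),
                failure: ex => new ReportFailed(ex.Message));
        });

        Receive<ReportFetched>(_ => { /* use the result */ });
        Receive<ReportFailed>(_ => { /* log, retry, or reply */ });
    }

    private static async Task<string> FetchAsync(HttpClient http, string url, CancellationToken actorToken)
    {
        // Linked CTS: cancels on actor shutdown or after the 10 second timeout
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(actorToken);
        timeoutCts.CancelAfter(TimeSpan.FromSeconds(10));
        try
        {
            return await http.GetStringAsync(url, timeoutCts.Token);
        }
        catch (OperationCanceledException) when (!actorToken.IsCancellationRequested)
        {
            // The actor token is not cancelled, so this was the timeout rather than shutdown
            throw new TimeoutException($"Request to {url} timed out");
        }
    }

    protected override void PostStop()
    {
        _shutdownCts.Cancel();
        _shutdownCts.Dispose();
        base.PostStop();
    }
}
```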