Pattern
Cold Start Mitigation
Serverless agent runtimes have a 1-3 second cold start; design around it.
When to use
Production deployments where p99 latency matters. Strategies: keep-warm pings every N minutes, provisioned concurrency for predictable load, or fallback to a smaller / cached model on cold-start detection.
When not to use
When the workload is batched and latency-insensitive. Cold start adds cost in keep-warm strategies; only pay it when latency is the bottleneck.