Performance docs
Latency and performance, using real logs.
These numbers come from analysis_logs.latency_ms in production D1, not invented marketing claims. They are observed telemetry for the 30 days ending 2026-05-23, computed with the nearest-rank percentile method over successful logged requests. They are not a contractual SLA.
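For readers who want to reproduce the percentiles from their own exported latencies, here is a minimal sketch of the nearest-rank method. The function name and the sample values are illustrative assumptions, not part of the production pipeline.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method.

    The result is always one of the observed values: the sample at
    1-based rank ceil(p/100 * n) in the sorted list.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing so integer percentiles avoid float rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative latencies in milliseconds (made-up sample, not production data).
sample_latencies_ms = [5200, 4900, 7300, 6100, 8800]
p50 = nearest_rank_percentile(sample_latencies_ms, 50)
p95 = nearest_rank_percentile(sample_latencies_ms, 95)
```

With this method a small sample pushes high percentiles onto the largest observations. For n = 25 the p99 rank is ceil(0.99 × 25) = 25, the maximum, which is why p99 and the maximum coincide for the image row in the table.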
Observed latency
| Modality | n | p50 | p95 | p99 | min–max |
|---|---|---|---|---|---|
| image | 25 | 6861 ms | 9115 ms | 10488 ms | 4683–10488 ms |
| text | 199 | 8616 ms | 11180 ms | 12912 ms | 2704–14454 ms |
Source: Remote D1 analysis_logs.latency_ms. Observed production telemetry, not a contractual SLA. Public demos and provider outages can differ from these values.
Headers
JSON API and demo responses expose Server-Timing: total;dur=<ms> and Access-Control-Expose-Headers: X-Request-Id, Server-Timing so browser clients can measure actual end-to-end request duration.
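As a rough illustration of reading the timing header on the client side, the sketch below extracts the dur values from a Server-Timing header string. The function name is hypothetical, and the parser is deliberately simplified: it assumes no commas or semicolons inside quoted desc parameters.

```python
def parse_server_timing(header_value):
    """Map each metric name in a Server-Timing header to its dur value in ms.

    Simplified sketch: splits on "," and ";" without handling quoted
    strings that contain those characters.
    """
    durations = {}
    for entry in header_value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        for param in parts[1:]:
            key, sep, value = param.partition("=")
            if sep and key.strip().lower() == "dur":
                try:
                    durations[name] = float(value.strip().strip('"'))
                except ValueError:
                    # Ignore a malformed duration instead of failing the whole parse.
                    pass
    return durations


# Example header in the shape documented above; 8616 is an illustrative value.
timings = parse_server_timing("total;dur=8616")
total_ms = timings.get("total")
```

A client could log total_ms together with the X-Request-Id value to correlate slow requests with server-side records.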
Integration guidance
- Use a 30s timeout for text.
- Use a 60s timeout for image.
- Use async queues for image moderation when batching at scale.
- Retry transient 503 responses with exponential backoff; do not retry validation failures.
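The timeout and retry guidance above can be sketched as follows. This is a minimal sketch under stated assumptions: send is a hypothetical caller-supplied function that performs one request with the given timeout and returns a (status, body) pair, and the attempt count and base delay are illustrative defaults, not values recommended by the API.

```python
import time

# Timeouts in seconds, taken from the guidance above.
TIMEOUT_SECONDS = {"text": 30, "image": 60}


def call_with_retry(send, modality, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(timeout) and retry only on HTTP 503, doubling the delay each time.

    Any other status, including validation failures, is returned immediately
    without a retry. After max_attempts the last 503 response is returned.
    """
    timeout = TIMEOUT_SECONDS[modality]
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send(timeout)
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing sleep as a parameter keeps the function easy to test without real waiting. In production, adding random jitter to each delay is a common refinement so that many clients do not retry in lockstep.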