When a zstd-compressed request (e.g. from Codex CLI) hits /v1/responses
with a cloud model the request failed.
Fix by decompressing zstd bodies before
model extraction, so cloud models are detected and proxied directly
without the writer being wrapped.
writeError in both OpenAI and Anthropic middleware writers would return
a raw json.SyntaxError when the error payload wasn't valid JSON (e.g.
"invalid character 'e' looking for beginning of value"). Fall back to
using the raw bytes as the error message instead.
Also use the actual HTTP status code rather than hardcoding 500, so
error types map correctly
The Codex runner was not setting OPENAI_BASE_URL or OPENAI_API_KEY, this prevents Codex from sending requests to api.openai.com instead of the local Ollama server. This mirrors the approach used by the Claude runner.
Codex v0.98.0 sends zstd-compressed request bodies to the /v1/responses endpoint. Add decompression support in ResponsesMiddleware with an 8MB max decompressed size limit to prevent resource exhaustion.