Zima Labs API
API Documentation
An OpenAI-compatible API with zero data retention and hardware-level encryption. Drop in your existing SDK: change the base URL and key, nothing else.
Getting Started
Access state-of-the-art AI models through simple REST endpoints. Fast responses, per-token billing, production-ready infrastructure.
22+ Models
- GPT-5, GPT-4o, o1/o3
- Claude Sonnet, Opus, Haiku
- DeepSeek, Qwen, Llama
Reasoning Models
- o1, o3, o4, GPT-5
- DeepSeek-R1 with reasoning_content
- Chain-of-thought, clean answers
Enterprise Ready
- Real-time token billing
- Sub-second responses
- Automatic failover
Authentication
All requests require your API key in the Authorization header as a Bearer token. Generate keys from your dashboard.
curl -X GET https://api.zimalabs.io/api/v1/usage \
-H "Authorization: Bearer YOUR_API_KEY" \
--max-time 10
Security: Keep your API key confidential. Keys start with zima- and never expire unless rotated.
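As a minimal sketch of the header described above, the Python helper below builds the Authorization header and rejects keys that lack the zima- prefix. The name build_headers is an assumption for illustration and is not part of the API.

```python
KEY_PREFIX = "zima-"  # per the documentation, keys start with this prefix


def build_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the key as a Bearer token."""
    if not api_key.startswith(KEY_PREFIX):
        # Catch a pasted key from another provider before any request is made.
        raise ValueError("API key should start with 'zima-'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Checking the prefix locally is a convenience only; the server remains the authority on whether a key is valid.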
API Endpoints
Base URL: https://api.zimalabs.io/api/v1
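One way to assemble a chat completion call against this base URL, using only the Python standard library, is sketched below. The helper name chat_request is hypothetical. The request is built but not sent, so the snippet runs without a network connection.

```python
import json
import urllib.request

BASE_URL = "https://api.zimalabs.io/api/v1"


def chat_request(api_key: str, model: str, prompt: str,
                 max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send the request, pass it to urllib.request.urlopen(req, timeout=10)
# and decode the JSON body of the response.
req = chat_request("zima-YOUR_API_KEY", "MODEL_ID", "Hello!")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any tokens are billed.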
/api/v1/chat/completions
Generate chat completions with any supported model.
curl -X POST https://api.zimalabs.io/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "MODEL_ID",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50,
"temperature": 0.7
}' \
--max-time 10
/api/v1/models
List all available models, providers, and access requirements.
curl -X GET https://api.zimalabs.io/api/v1/models \
-H "Authorization: Bearer YOUR_API_KEY" \
--max-time 10
/api/v1/usage
Check your balance, token usage, and billing info.
curl -X GET https://api.zimalabs.io/api/v1/usage \
-H "Authorization: Bearer YOUR_API_KEY" \
--max-time 10
Response Format
Responses follow the OpenAI format, extended with a billing object on every completion.
{
"id": "chatcmpl-1234567890",
"object": "chat.completion",
"model": "MODEL_ID",
"choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 12, "completion_tokens": 20, "total_tokens": 32},
"billing": {"cost_usd": 0.0003, "remaining_credits": 10.9997}
}
Billing Fields
- cost_usd: Total cost for this request
- remaining_credits: Your balance after the request
- api_key_name: Name of the key used
Usage Fields
- prompt_tokens: Input tokens consumed
- completion_tokens: Output tokens generated
- total_tokens: Sum of prompt + completion
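To show how these fields might be read in client code, here is a small Python sketch that pulls the answer, token count, and billing data out of a decoded response. The summarize function and its return shape are illustrative assumptions; the field names come from the response example above. api_key_name is read with .get because the sample response does not include it.

```python
import json


def summarize(response: dict) -> dict:
    """Flatten the fields most clients care about into one dict."""
    usage = response["usage"]
    billing = response.get("billing", {})
    return {
        "answer": response["choices"][0]["message"]["content"],
        "total_tokens": usage["total_tokens"],
        "cost_usd": billing.get("cost_usd"),
        "remaining_credits": billing.get("remaining_credits"),
        "api_key_name": billing.get("api_key_name"),  # may be absent
    }


# The sample response from the documentation, decoded from JSON.
sample = json.loads("""
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "model": "MODEL_ID",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 20, "total_tokens": 32},
  "billing": {"cost_usd": 0.0003, "remaining_credits": 10.9997}
}
""")
summary = summarize(sample)
```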
Error Handling
Standard HTTP status codes with structured error messages.
{
"error": {
"type": "invalid_request_error",
"message": "Model 'gpt-5' not found or inactive"
}
}
Status Codes
- 200: Success
- 400: Bad Request
- 401: Unauthorized
- 402: Insufficient Credits
- 408: Request Timeout (8s)
- 500: Server Error
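A client might turn these status codes and the structured error body into exceptions along the following lines. This is a sketch only: the ZimaAPIError class, the check_response helper, and the choice to treat 408 and 500 as retryable are assumptions, not behavior documented by the API.

```python
import json


class ZimaAPIError(Exception):
    """Hypothetical exception carrying the HTTP status and error payload."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: timeouts and server errors are worth retrying;
        # 400, 401, and 402 need a fix on the caller's side first.
        return self.status in (408, 500)


def check_response(status: int, body: str) -> dict:
    """Return the decoded body on 200, otherwise raise ZimaAPIError."""
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}  # non-JSON body, e.g. from an intermediate proxy
    if status == 200:
        return payload
    error = payload.get("error", {})
    raise ZimaAPIError(status,
                       error.get("type", "unknown_error"),
                       error.get("message", ""))
```

Callers can then catch ZimaAPIError once and branch on retryable instead of comparing status codes throughout the codebase.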
For reasoning models, the chain-of-thought is returned in message.reasoning_content. The clean answer is always in message.content. Zero code changes needed for existing clients.
Model Types & Parameters
Models fall into two categories: reasoning and standard. For reasoning models, the API excludes temperature automatically and applies higher token limits.
Reasoning Models
Temperature excluded automatically. Higher token limits applied.
- GPT-5, o1, o3, o4 series
- DeepSeek-R1, gpt-oss-120b
- kimi-k2-thinking
Returns: reasoning_content (chain-of-thought) + content (clean answer)
Standard Models
Full parameter support: temperature, max_tokens, top_p.
- Claude Sonnet, Haiku, Opus
- GPT-4o, GPT-4o-mini
- DeepSeek-V3, Qwen, GLM
Returns: content only (no reasoning_content)
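The difference between the two categories can be handled in one place on the client. The sketch below uses a hypothetical helper, split_reasoning, to return the chain-of-thought and the clean answer from a message object. It treats a missing reasoning_content as None so that the same code path serves standard models.

```python
from typing import Optional, Tuple


def split_reasoning(message: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a choices[].message object."""
    reasoning = message.get("reasoning_content")  # present on reasoning models only
    answer = message["content"]                   # always the clean answer
    return reasoning, answer


# Made-up sample messages, shaped like choices[0].message in a response.
reasoning_msg = {"role": "assistant", "content": "345",
                 "reasoning_content": "15 x 23 = 15 x 20 + 15 x 3 = 345"}
standard_msg = {"role": "assistant", "content": "Hello!"}
```

Code that only reads content keeps working unchanged; the helper is useful when the reasoning trace should be logged or displayed separately.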
4th Wall Switching
An optional feature (off by default) that lets users switch models mid-conversation using natural language: “switch to GPT-5”, “use Claude instead”.
curl -X POST https://api.zimalabs.io/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Switch to Claude and write a poem"}],
"enable4thWall": true,
"announceSwitches": true,
"max_tokens": 200
}'
enable4thWall
- Type: boolean
- Default: false
- Enables conversational model switching
announceSwitches
- Type: boolean
- Default: true
- Notifies when a model switch occurs
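For illustration, the two flags can be added to an ordinary chat payload as in this Python sketch. The function name with_model_switching is hypothetical; the parameter names and defaults are the ones documented above.

```python
def with_model_switching(payload: dict, announce: bool = True) -> dict:
    """Return a copy of payload with 4th Wall switching turned on."""
    updated = dict(payload)  # shallow copy; the caller's dict is left untouched
    updated["enable4thWall"] = True         # off by default, so opt in explicitly
    updated["announceSwitches"] = announce  # documented default is true
    return updated


base = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Switch to Claude and write a poem"}],
    "max_tokens": 200,
}
switched = with_model_switching(base)
```

Because the feature is opt-in per request, a wrapper like this keeps the flags out of calls that should stay pinned to a single model.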
Quick Testing
Ready-to-use commands to verify your integration. Replace YOUR_API_KEY with a key from your dashboard.
1. Check auth & balance
curl -X GET https://api.zimalabs.io/api/v1/usage \
-H "Authorization: Bearer YOUR_API_KEY" \
--max-time 10
2. Reasoning model (GPT-5), no temperature
curl -X POST https://api.zimalabs.io/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5",
"messages": [{"role": "user", "content": "What is 15 × 23?"}],
"max_tokens": 200
}'
3. Standard model (Claude Sonnet), full params
curl -X POST https://api.zimalabs.io/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-3.7",
"messages": [{"role": "user", "content": "Write a short story about AI."}],
"max_tokens": 150,
"temperature": 0.8
}'
4. DeepSeek-R1: inspect reasoning_content
curl -X POST https://api.zimalabs.io/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "Explain quantum computing simply."}],
"max_tokens": 300
}'
5. List all models
curl -X GET https://api.zimalabs.io/api/v1/models \
-H "Authorization: Bearer YOUR_API_KEY" \
--max-time 10
Note: Reasoning models automatically exclude temperature and receive higher token limits. Their chain-of-thought is in reasoning_content; content is always the clean answer. Your existing code needs zero changes.