UnifiedAI Documentation

UnifiedAI is a universal AI proxy that provides a single, unified API to access multiple AI providers. It supports both OpenAI v1 and Ollama API formats, making it compatible with existing tools and SDKs.

Key Features

One API, many providers: route requests to OpenRouter and GLM (Z.ai) through a single endpoint
OpenAI v1 and Ollama API compatibility, so existing tools and SDKs work unchanged
Real-time streaming via SSE (OpenAI format) and NDJSON (Ollama format)
Prompt caching support to cut latency and cost on repeated prompt segments
Detailed, OpenAI-style error reporting

Quick Start

Get started with UnifiedAI in under 2 minutes using your preferred programming language.

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9000/v1",
    api_key="sk-test"  # Any key starting with 'sk-' works
)

response = client.chat.completions.create(
    model="openrouter-deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

JavaScript/TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:9000/v1',
  apiKey: 'sk-test'
});

const response = await client.chat.completions.create({
  model: 'openrouter-qwen/qwen-2.5-72b-instruct:free',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);

cURL

curl -X POST http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test" \
  -d '{
    "model": "openrouter-deepseek/deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI v1 API

UnifiedAI implements the OpenAI v1 API specification, making it compatible with any tool or SDK that supports OpenAI.

Base URL

http://localhost:9000/v1

Endpoints

List Models

GET /v1/models

Returns a list of all available models from all providers.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "openrouter-deepseek/deepseek-chat",
      "object": "model",
      "created": 1234567890,
      "owned_by": "openrouter"
    },
    {
      "id": "zai-glm-4.6",
      "object": "model",
      "created": 1234567890,
      "owned_by": "zai"
    }
  ]
}
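As a sketch, the model list can be fetched with the OpenAI SDK and grouped by provider. The helpers `group_by_owner` and `list_models` below are illustrative, not part of UnifiedAI:

```python
from collections import defaultdict

def group_by_owner(models):
    """Group model dicts (as returned by GET /v1/models) by their 'owned_by' field."""
    groups = defaultdict(list)
    for m in models:
        groups[m["owned_by"]].append(m["id"])
    return dict(groups)

def list_models(base_url="http://localhost:9000/v1", api_key="sk-test"):
    # Requires `pip install openai`; any key starting with "sk-" is accepted.
    from openai import OpenAI
    client = OpenAI(base_url=base_url, api_key=api_key)
    return [m.model_dump() for m in client.models.list().data]

# Using the documented response shape:
sample = [
    {"id": "openrouter-deepseek/deepseek-chat", "object": "model", "owned_by": "openrouter"},
    {"id": "zai-glm-4.6", "object": "model", "owned_by": "zai"},
]
print(group_by_owner(sample))
# {'openrouter': ['openrouter-deepseek/deepseek-chat'], 'zai': ['zai-glm-4.6']}
```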

Create Chat Completion

POST /v1/chat/completions

Creates a chat completion (streaming or non-streaming).

Headers:

Header           Value              Required
Authorization    Bearer sk-*        Yes
Content-Type     application/json   Yes

Request Body:

{
  "model": "openrouter-deepseek/deepseek-chat",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 1000
}

Parameters:

Parameter          Type           Description
model              string         Model ID in the format provider-modelname
messages           array          Array of message objects
stream             boolean        Enable streaming (default: false)
temperature        number         Sampling temperature (0.0 to 2.0)
max_tokens         integer        Maximum tokens to generate
top_p              number         Nucleus sampling parameter
frequency_penalty  number         Penalize repetition (-2.0 to 2.0)
presence_penalty   number         Encourage new topics (-2.0 to 2.0)
tools              array          Tools/functions for function calling
tool_choice        string/object  Control tool selection behavior
n                  integer        Number of completions to generate
stop               string/array   Stop sequences
response_format    object         Output format (text/json_object/json_schema)

Full OpenAI v1 Compatibility

UnifiedAI supports all OpenAI v1 API parameters including advanced features like function calling, JSON mode, and prompt caching. Additional parameters like logprobs, seed, and logit_bias are also supported.
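As a sketch, a request body combining several of these parameters can be assembled and sent as-is. The helper `build_chat_payload` below is illustrative, not part of any SDK:

```python
import json

def build_chat_payload(model, user_content, **params):
    """Assemble a /v1/chat/completions request body; **params may include
    temperature, max_tokens, stop, response_format, and so on."""
    payload = {"model": model, "messages": [{"role": "user", "content": user_content}]}
    payload.update(params)
    return payload

payload = build_chat_payload(
    "openrouter-deepseek/deepseek-chat",
    "List three French cities as JSON.",
    temperature=0.2,
    max_tokens=200,
    response_format={"type": "json_object"},
)
print(json.dumps(payload, indent=2))
# POST this body to http://localhost:9000/v1/chat/completions with
# Authorization: Bearer sk-test and Content-Type: application/json.
```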

Ollama API

UnifiedAI also provides Ollama-compatible endpoints, allowing you to use it with tools like Continue.dev, Cursor, and other Ollama clients.

Base URL

http://localhost:9000/api

Endpoints

List Models (Tags)

GET /api/tags

Returns available models in Ollama format.

Chat

POST /api/chat

Generate a chat completion in Ollama format.

Request:

{
  "model": "deepseek-chat",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": false
}
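The same request can be sent from Python with only the standard library; no Authorization header is needed on the Ollama API. The helpers below are illustrative, and the assumed response field `message.content` follows the standard Ollama chat format:

```python
import json
import urllib.request

def build_chat_body(model, messages, stream=False):
    """Request body for POST /api/chat (Ollama format)."""
    return {"model": model, "messages": messages, "stream": stream}

def ollama_chat(messages, model="deepseek-chat", base="http://localhost:9000"):
    req = urllib.request.Request(
        f"{base}/api/chat",
        data=json.dumps(build_chat_body(model, messages)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

body = build_chat_body("deepseek-chat", [{"role": "user", "content": "Hello!"}])
print(json.dumps(body))
# With a running instance:
# reply = ollama_chat([{"role": "user", "content": "Hello!"}])
# print(reply["message"]["content"])
```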

Generate

POST /api/generate

Generate a completion from a prompt.

Request:

{
  "model": "qwen-2.5-72b",
  "prompt": "Tell me a joke",
  "stream": false
}

Version

GET /api/version

Returns the Ollama API version.

Authentication

UnifiedAI uses API key authentication for the OpenAI v1 API.

API Key Format

Any API key starting with sk- will be accepted (e.g., sk-test, sk-anything).

This is intentional for development and testing purposes.

How to Authenticate

Include the API key in the Authorization header:

Authorization: Bearer sk-test

The Ollama API does not require authentication.

OpenRouter Provider

OpenRouter is the primary provider, giving access to 200+ AI models from various vendors.

Provider Name

openrouter

Available Models

OpenRouter provides access to models from many vendors, including Anthropic, Google, OpenAI, DeepSeek, Qwen, NVIDIA, Mistral, and Moonshot.

Free Models

UnifiedAI pre-configures several free models:

openrouter-qwen/qwen-2.5-72b-instruct:free
openrouter-deepseek/deepseek-chat-v3.1:free
openrouter-nvidia/nemotron-nano-9b-v2:free
openrouter-mistralai/devstral-small-2505:free
openrouter-moonshotai/kimi-dev-72b:free

Model Verification

UnifiedAI can verify if a model exists on OpenRouter before routing the request. This prevents errors for non-existent models.
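That verification happens server-side, but a client can perform a similar pre-check against GET /v1/models before sending a request. `model_available` is a hypothetical helper, not part of UnifiedAI:

```python
def model_available(model_id, available_ids):
    """Client-side pre-check of a model ID against GET /v1/models results."""
    return model_id in set(available_ids)

# The ID list could be fetched from a live instance, e.g.:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:9000/v1", api_key="sk-test")
# ids = [m.id for m in client.models.list().data]
ids = ["openrouter-deepseek/deepseek-chat", "zai-glm-4.6"]
print(model_available("zai-glm-4.6", ids))               # True
print(model_available("openrouter-unknown/model", ids))  # False
```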

GLM (Z.ai) Provider

GLM (also known as Z.ai) provides ChatGLM models including GLM-4.6, GLM-4.5, and the Z1 series.

Provider Name

zai

Available Models

zai-glm-4.6
zai-glm-4.5
zai-glm-z1-series

Features

Streaming

UnifiedAI supports real-time streaming using Server-Sent Events (SSE) for both OpenAI and Ollama APIs.

OpenAI Streaming

Set stream: true in your request:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:9000/v1',
  apiKey: 'sk-test'
});

const stream = await client.chat.completions.create({
  model: 'openrouter-deepseek/deepseek-chat',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Ollama Streaming

{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Write a story"}],
  "stream": true
}

The response will be sent as newline-delimited JSON (NDJSON).
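A minimal sketch of consuming that NDJSON stream: each non-empty line is one JSON object. The field names (`message.content`, `done`) follow the standard Ollama streaming format, and `iter_ndjson` is an illustrative helper:

```python
import json

def iter_ndjson(lines):
    """Yield one parsed object per non-empty NDJSON line."""
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)

# Against a live server (assumed local instance):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:9000/api/chat",
#     data=json.dumps({"model": "deepseek-chat",
#                      "messages": [{"role": "user", "content": "Write a story"}],
#                      "stream": True}).encode(),
#     headers={"Content-Type": "application/json"}, method="POST")
# with urllib.request.urlopen(req) as resp:
#     for chunk in iter_ndjson(line.decode() for line in resp):
#         print(chunk.get("message", {}).get("content", ""), end="")

sample = ['{"message": {"content": "Once"}, "done": false}',
          '{"message": {"content": " upon"}, "done": false}',
          '{"done": true}']
text = "".join(c.get("message", {}).get("content", "") for c in iter_ndjson(sample))
print(text)  # Once upon
```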

Prompt Caching

UnifiedAI supports Prompt Caching, a powerful optimization technique that caches repeated prompt segments to reduce latency and costs.

Benefits of Prompt Caching

Lower cost: cached input tokens are billed at a steep discount (up to 90% with Claude)
Lower latency: cached prompt segments are not reprocessed on repeat requests

Supported Providers

Provider                           Support          Cache Duration  Discount
Anthropic Claude (via OpenRouter)  ✅ Excellent      5 minutes       90%
Google Gemini (via OpenRouter)     ✅ Good           Up to 1 hour    ~80%
OpenAI GPT (via OpenRouter)        ❌ Not available  N/A             N/A
GLM (Z.ai)                         ❌ Not available  N/A             N/A

About Provider Support

UnifiedAI currently supports OpenRouter and GLM (Z.ai) as providers. When using OpenRouter, you get access to models from Anthropic, Google, OpenAI, and many others.

Prompt caching availability depends on the underlying model provider, not OpenRouter itself.

How to Use

Add the cache_control field to messages you want to cache:

{
  "model": "openrouter-anthropic/claude-3.5-sonnet",
  "messages": [
    {
      "role": "system",
      "content": "You are an expert programmer. Here is the complete codebase... [10,000 tokens]",
      "cache_control": {"type": "ephemeral"}
    },
    {
      "role": "user",
      "content": "Explain this function"
    }
  ]
}

Using OpenRouter

UnifiedAI accesses Anthropic Claude models through OpenRouter. Use the format: openrouter-anthropic/claude-3.5-sonnet

Prompt caching will work automatically when using Claude models via OpenRouter.

First Request: Processes all tokens and creates cache (normal cost)

Second Request (within 5 min): Cache hit! Only processes new query (90% discount)

Ideal Use Cases

Large system prompts that repeat across requests (e.g. a full codebase sent as context)
Multi-turn conversations that resend the same long prefix on every turn
Document Q&A where many queries share the same document

Cache Expiration Handling

If the cache expires, the provider automatically recreates it on the next request. No error is returned; that request simply pays the normal, uncached price once to rebuild the cache.

Response with Cache Stats

{
  "id": "msg_123",
  "choices": [...],
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 200,
    "cache_creation_input_tokens": 1000,  // Cache created (miss)
    "cache_read_input_tokens": 0          // Tokens from cache (hit)
  }
}
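The usage fields above can be inspected to tell hits from misses. `cache_summary` is a hypothetical helper based on the field names in the example response (a hit means cache_read_input_tokens is non-zero):

```python
def cache_summary(usage):
    """Summarize prompt-cache activity from a response 'usage' object."""
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read:
        return f"cache hit: {read} tokens read from cache"
    if created:
        return f"cache miss: {created} tokens written to cache"
    return "no caching applied"

print(cache_summary({"prompt_tokens": 1000, "completion_tokens": 200,
                     "cache_creation_input_tokens": 1000,
                     "cache_read_input_tokens": 0}))
# cache miss: 1000 tokens written to cache
```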

For more details, see the Prompt Caching Guide.

Model Naming Convention

UnifiedAI uses a specific naming convention to route requests to the correct provider.

Format

provider-modelname

Examples

Model ID                                Provider    Description
openrouter-deepseek/deepseek-chat       OpenRouter  DeepSeek Chat via OpenRouter
zai-glm-4.6                             GLM (Z.ai)  GLM 4.6 model
openrouter-anthropic/claude-3.5-sonnet  OpenRouter  Claude 3.5 Sonnet via OpenRouter
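The convention can be illustrated with a small parser that splits the ID on its first hyphen. `split_model_id` and `KNOWN_PROVIDERS` are hypothetical helpers; UnifiedAI's actual routing logic is internal:

```python
KNOWN_PROVIDERS = ("openrouter", "zai")  # the providers documented above

def split_model_id(model_id):
    """Split 'provider-modelname' on the first hyphen and validate the prefix."""
    provider, sep, name = model_id.partition("-")
    if not sep or provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown provider in model ID: {model_id!r}")
    return provider, name

print(split_model_id("openrouter-deepseek/deepseek-chat"))
# ('openrouter', 'deepseek/deepseek-chat')
print(split_model_id("zai-glm-4.6"))
# ('zai', 'glm-4.6')
```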

Ollama Model Names

When using the Ollama API, you can use simplified model names without the provider prefix:

POST /api/chat
{
  "model": "deepseek-chat",
  "messages": [...]
}

The short name is resolved to the matching provider automatically.

Error Handling

UnifiedAI provides detailed error messages following OpenAI's error format.

Error Response Format

{
  "error": {
    "message": "Error description",
    "type": "error_type"
  }
}

Common Error Types

Error Type             HTTP Status  Description
invalid_request_error  400          Invalid request parameters or missing required fields
invalid_request_error  401          Missing or invalid API key
provider_error         502          Error from the upstream provider API
server_error           500          Internal server error

Examples

Missing API Key

{
  "error": {
    "message": "Missing or invalid Authorization header",
    "type": "invalid_request_error"
  }
}

Invalid Model

{
  "error": {
    "message": "Provider 'unknown' is not supported",
    "type": "invalid_request_error"
  }
}

Provider API Error

{
  "error": {
    "message": "OpenRouter API error: 429 - Rate limit exceeded",
    "type": "provider_error"
  }
}
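A sketch of handling these errors from Python with only the standard library; since the error body follows the format above, it can be parsed from non-2xx responses too. `post_chat` and `describe_error` are illustrative helpers:

```python
import json
import urllib.error
import urllib.request

def post_chat(payload, base="http://localhost:9000"):
    """POST to /v1/chat/completions, returning (status, parsed JSON body)."""
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer sk-test"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as e:
        return e.code, json.loads(e.read())  # error bodies use the format above

def describe_error(status, body):
    """Return a readable summary, or None if the response is not an error."""
    err = body.get("error")
    if err is None:
        return None
    return f"{status} {err.get('type', 'unknown')}: {err.get('message', '')}"

print(describe_error(502, {"error": {"message": "OpenRouter API error: 429 - Rate limit exceeded",
                                     "type": "provider_error"}}))
# 502 provider_error: OpenRouter API error: 429 - Rate limit exceeded
```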

Support

For issues, feature requests, or questions, please visit our GitHub repository or contact support.

UnifiedAI Documentation • Version 1.1.0 • Last updated: January 2025