API Usage

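The server exposes an OpenAI-compatible endpoint at http://HOST:8000/v1 and requires no authentication. Replace HOST with the server's hostname or IP address.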

Chat Completions

curl http://HOST:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Disable Thinking Mode

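Add chat_template_kwargs to the request body to disable Qwen3 thinking. This field is a server-side extension, not part of the OpenAI schema. It matters for hybrid Qwen3 checkpoints; Qwen3-30B-A3B-Instruct-2507 is a non-thinking model, so the flag is harmless there but not required: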

curl http://HOST:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
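
The same request can be sketched in Python. The openai client does not know about chat_template_kwargs, so the sketch passes it through the client's extra_body argument, which is merged into the JSON body. The helper names no_thinking_request and ask are hypothetical, chosen for illustration only.

```python
MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507"


def no_thinking_request(prompt: str) -> dict:
    """Keyword arguments for client.chat.completions.create with thinking disabled."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # extra_body is merged into the JSON request body by the openai client,
        # which is how fields outside the OpenAI schema reach the server.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
    }


def ask(client, prompt: str) -> str:
    """Send one prompt through an openai.OpenAI client and return the reply text."""
    response = client.chat.completions.create(**no_thinking_request(prompt))
    return response.choices[0].message.content
```

To use it, construct the client as OpenAI(base_url="http://HOST:8000/v1", api_key="x") and call ask(client, "Hello"). Keeping the request builder separate from the network call makes the payload easy to inspect before anything is sent.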

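Batch Inference

Pass a list of strings as prompt to /v1/completions to run several prompts in one request. This endpoint takes raw text and does not apply the chat template: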

curl http://HOST:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "prompt": ["Hello", "Hi", "Hey"],
    "max_tokens": 100
  }'
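
A rough Python equivalent using only the standard library is sketched below. It assumes the response follows the OpenAI completions format, with one entry in choices per prompt and an index field matching the prompt's position; verify this against your server before relying on the ordering. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://HOST:8000/v1"  # replace HOST with your server address
MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507"


def build_batch_request(prompts, max_tokens=100):
    """Create (but do not send) a POST request carrying several prompts."""
    payload = {"model": MODEL, "prompt": list(prompts), "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def texts_in_prompt_order(response_body):
    """Return completion texts sorted by each choice's index field."""
    # Assumption: index corresponds to the position of the prompt in the batch.
    choices = sorted(response_body["choices"], key=lambda c: c["index"])
    return [c["text"] for c in choices]


def complete_batch(prompts, max_tokens=100):
    """Send the batch and return one completion string per prompt."""
    with urllib.request.urlopen(build_batch_request(prompts, max_tokens)) as resp:
        return texts_in_prompt_order(json.load(resp))
```

Calling complete_batch(["Hello", "Hi", "Hey"]) would return a list of three strings. Sorting by index guards against a server that returns choices out of order.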

Python

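from openai import OpenAI

# The server enforces no authentication, but the client still requires an
# api_key value, so any placeholder string works.
client = OpenAI(base_url="http://HOST:8000/v1", api_key="x")

r = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "Hello"}],
)
# The reply text is in the first (and here only) choice.
print(r.choices[0].message.content)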