OpenAI-compatible endpoint (no authentication)
curl http://HOST:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
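To disable Qwen3 thinking, add chat_template_kwargs to the request body. This matters for the hybrid-thinking Qwen3 checkpoints; the Instruct-2507 models are non-thinking only, so the flag is not needed for the model shown here: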
curl http://HOST:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
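From Python, the same option can be sent through the openai client's extra_body argument, which adds extra fields to the JSON request body. This is a minimal sketch, assuming the server reads chat_template_kwargs from the request body as in the curl call above; build_request and ask are hypothetical helper names, not part of any library.

```python
MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507"


def build_request(prompt: str, enable_thinking: bool = False) -> dict:
    """Keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # extra_body fields are merged into the JSON payload, so this ends up
        # as "chat_template_kwargs": {...}, matching the curl example.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }


def ask(prompt: str, base_url: str = "http://HOST:8000/v1") -> str:
    """Send one prompt with thinking disabled and return the reply text."""
    # Imported here so build_request() can be used without the package installed.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="x")  # key is a placeholder
    response = client.chat.completions.create(**build_request(prompt))
    return response.choices[0].message.content
```

Calling ask("Hello") with HOST replaced by a real hostname should behave like the curl request above.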
Batch several prompts in one /v1/completions request by passing a list as prompt:
curl http://HOST:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "prompt": ["Hello", "Hi", "Hey"],
    "max_tokens": 100
  }'
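When prompt is a list, the response carries one entry in choices per prompt, each with an index field. Below is a sketch of matching completions back to their prompts, assuming the server follows the OpenAI completions response shape and n is left at its default of 1. pair_completions is a hypothetical helper, and the sample response is made up for illustration rather than real server output.

```python
def pair_completions(prompts: list[str], response: dict) -> list[tuple[str, str]]:
    """Match each prompt with its completion text using choice["index"]."""
    texts = {choice["index"]: choice["text"] for choice in response["choices"]}
    return [(prompt, texts[i]) for i, prompt in enumerate(prompts)]


# Illustrative response body (not real output); the choices are deliberately
# listed out of order to show that the index field, not position, is used.
sample_response = {
    "choices": [
        {"index": 2, "text": " there"},
        {"index": 0, "text": " world"},
        {"index": 1, "text": " all"},
    ]
}
pairs = pair_completions(["Hello", "Hi", "Hey"], sample_response)
```

Keying on index rather than list position avoids relying on the order in which the server returns the choices.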
Python client:
from openai import OpenAI

# No authentication: the client requires an api_key value, but any placeholder works.
client = OpenAI(base_url="http://HOST:8000/v1", api_key="x")
r = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)
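If the openai package is not available, the chat request can also be made with the Python standard library alone. This is a sketch under the same assumptions as the curl examples (HOST is a placeholder, no authentication header); make_chat_request and chat are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://HOST:8000/v1"  # HOST is a placeholder, as in the examples above
MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507"


def make_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the first curl example."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(make_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Splitting request construction from sending keeps the payload easy to inspect before anything goes over the network.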