iApp vLLM Gateway

Healthy   Uptime 64h 45m   1797722 requests
| Model | Alias | Status | GPUs | Requests |
|---|---|---|---|---|
| Qwen/Qwen3.5-122B-A10B-FP8 | qwen3.5-122b | Healthy | 2, 3 | 389599 |
| Qwen/Qwen3.5-35B-A3B-FP8 | qwen3.5-35b | Healthy | 4 | 19752 |
| Qwen/Qwen3.5-35B-A3B-FP8 | qwen3.5-35b-multi | Healthy | 6, 7 | 1388371 |
- Models endpoint: `GET /v1/models`
- Reload config: `POST /gateway/reload`
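A `/v1/models` response follows the OpenAI list format. The sketch below extracts the served model IDs from a sample payload; the sample is illustrative (IDs taken from the alias table on this page), not a live response:

```python
import json

# Illustrative /v1/models payload in the OpenAI list format.
# A live response from the gateway may include extra fields.
sample = json.loads("""
{"object": "list",
 "data": [{"id": "qwen3.5-122b", "object": "model"},
          {"id": "qwen3.5-35b", "object": "model"},
          {"id": "qwen3.5-35b-multi", "object": "model"}]}
""")

served = [m["id"] for m in sample["data"]]
print(served)  # ['qwen3.5-122b', 'qwen3.5-35b', 'qwen3.5-35b-multi']
```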

Available Aliases

| Alias | Model |
|---|---|
| qwen3.5-122b | Qwen/Qwen3.5-122B-A10B-FP8 |
| qwen3.5-35b | Qwen/Qwen3.5-35B-A3B-FP8 |
| qwen3.5-35b-multi | Qwen/Qwen3.5-35B-A3B-FP8 |

Use either the alias or full model name in the model field.
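Since several aliases can point at the same underlying model, it can help to resolve aliases client-side. A minimal sketch, with the mapping copied from the table above:

```python
# Alias -> full model name, mirroring the alias table above.
ALIASES = {
    "qwen3.5-122b": "Qwen/Qwen3.5-122B-A10B-FP8",
    "qwen3.5-35b": "Qwen/Qwen3.5-35B-A3B-FP8",
    "qwen3.5-35b-multi": "Qwen/Qwen3.5-35B-A3B-FP8",
}

def resolve(name: str) -> str:
    """Return the full model name; full names pass through unchanged."""
    return ALIASES.get(name, name)

print(resolve("qwen3.5-122b"))  # Qwen/Qwen3.5-122B-A10B-FP8
```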

Basic Request

curl:
curl http://api3-siamai.aieat.or.th/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-122b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Python:
from openai import OpenAI

client = OpenAI(base_url="http://api3-siamai.aieat.or.th/v1", api_key="none")

response = client.chat.completions.create(
    model="qwen3.5-122b",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Streaming

curl:
curl http://api3-siamai.aieat.or.th/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-122b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Python:
stream = client.chat.completions.create(
    model="qwen3.5-122b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Thinking Mode

Qwen3.5 models think by default. Disable thinking with `enable_thinking: false` (passed via `chat_template_kwargs`) or by appending `/no_think` to your prompt.

Default (on):
# Thinking is enabled by default
response = client.chat.completions.create(
    model="qwen3.5-122b",
    messages=[{"role": "user", "content": "Solve: 2x + 3 = 7"}]
)
# Response includes reasoning in <think>...</think> tags

Disable:
# Option 1: API parameter
response = client.chat.completions.create(
    model="qwen3.5-122b",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}}
)

# Option 2: Prompt suffix
messages=[{"role": "user", "content": "Hello /no_think"}]
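When thinking stays on, the reasoning arrives wrapped in `<think>...</think>` tags inside `message.content`. A small sketch for stripping those blocks before showing the answer to end users (assumes the tags appear as plain text in the content, as described above):

```python
import re

# Match a <think>...</think> block plus any whitespace that follows it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks, leaving only the final answer."""
    return THINK_RE.sub("", text)

raw = "<think>2x = 4, so x = 2</think>x = 2"
print(strip_think(raw))  # x = 2
```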

Structured JSON Output

Force valid JSON output using response_format.

JSON mode:
response = client.chat.completions.create(
    model="qwen3.5-122b",
    messages=[{"role": "user", "content": "List 3 countries as JSON"}],
    response_format={"type": "json_object"}
)

JSON schema:
response = client.chat.completions.create(
    model="qwen3.5-122b",
    messages=[{
        "role": "user",
        "content": "Extract: John is 30 and lives in Tokyo"
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"}
                },
                "required": ["name", "age", "city"]
            }
        }
    }
)
# {"name": "John", "age": 30, "city": "Tokyo"}
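With `json_schema`, generation is constrained to the schema, but the content still arrives as a string and must be parsed before use. A sketch using the schema above (the sample string stands in for `response.choices[0].message.content`):

```python
import json

# Stand-in for response.choices[0].message.content under the schema above.
content = '{"name": "John", "age": 30, "city": "Tokyo"}'

person = json.loads(content)
assert {"name", "age", "city"} <= person.keys()  # required fields present
print(person["age"] + 1)  # "integer" fields parse as real ints: 31
```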

Parameters

| Parameter | Default | Description |
|---|---|---|
| temperature | 0.7 | Sampling randomness (0.0–2.0) |
| max_tokens | – | Response length limit |
| top_p | 0.9 | Nucleus sampling threshold (0.0–1.0) |
| top_k | 20 | Restrict sampling to the top-k tokens |
| frequency_penalty | 0.0 | Penalizes token repetition |
| stop | – | Stop sequences |
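With the OpenAI Python SDK, the standard parameters above are plain keyword arguments, while vLLM-specific ones such as `top_k` must go through `extra_body`. A sketch of a fully parameterized request; the call itself is commented out since it needs a live gateway (the parameter values are just the defaults from the table):

```python
# Standard OpenAI parameters are plain keyword arguments; vLLM-only
# sampling parameters such as top_k are passed via extra_body.
params = {
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "stop": ["\n\n"],
    "extra_body": {"top_k": 20},
}

# response = client.chat.completions.create(
#     model="qwen3.5-122b",
#     messages=[{"role": "user", "content": "Hello"}],
#     **params,
# )
print(sorted(params))
```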