Models

A LanguageModel configures LLM access for a LanguageCluster. The operator reads all LanguageModel resources in the namespace and registers them with the cluster's shared LiteLLM gateway — agents never hold API credentials or connect to model providers directly.

How It Works

One LiteLLM proxy (gateway) runs per LanguageCluster. When you add or remove a LanguageModel, the gateway restarts with the updated model list — no agent redeploy required.

Credential Management

API keys are never injected into agent pods. Store them in a Secret:

kubectl create secret generic anthropic-credentials \
  --from-literal=api-key=sk-ant-your-key-here

Reference the Secret from the model spec:

apiVersion: langop.io/v1alpha1
kind: LanguageModel
metadata:
  name: claude-sonnet
spec:
  provider: anthropic
  modelName: claude-sonnet-4-5
  apiKeySecretRef:
    name: anthropic-credentials
    key: api-key

The gateway pod mounts the Secret and presents a single OpenAI-compatible endpoint to agents. Rotating a key is a Secret update (note that kubectl create secret fails if the Secret already exists, so re-render and apply instead) — the gateway restarts with the new credentials and agents are unaffected.
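Assuming a Secret named anthropic-credentials as above, one common rotation pattern is to render the manifest locally and apply it over the existing Secret:

```shell
# --dry-run=client renders the Secret manifest without contacting the API
# server; kubectl apply then updates the existing Secret in place.
kubectl create secret generic anthropic-credentials \
  --from-literal=api-key=sk-ant-new-key-here \
  --dry-run=client -o yaml | kubectl apply -f -
```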

Agent Integration

The operator injects two environment variables into every agent container:

Variable        Value
MODEL_ENDPOINT  http://gateway.<namespace>.svc.cluster.local:8000
LLM_MODEL       Comma-separated list of model names from spec.models[].name

The same model configuration is also written to /etc/agent/config.yaml under the models: key:

models:
  claude-sonnet:
    role: primary
    provider: anthropic
    model: claude-sonnet-4-5
    endpoint: http://gateway.my-cluster.svc.cluster.local:8000

Agents call the gateway with the model name they want. The gateway routes to the correct upstream provider.
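From the agent's side, a call is an ordinary OpenAI-compatible request to the injected endpoint. A minimal sketch, assuming the gateway exposes the standard /v1/chat/completions route (the helper name here is illustrative, not part of the operator):

```python
import json
import os
import urllib.request

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request against the shared gateway."""
    endpoint = os.environ.get("MODEL_ENDPOINT", "http://localhost:8000")
    body = json.dumps({
        "model": model,  # the gateway routes to the upstream provider by this name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{endpoint}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Pick the first model advertised in LLM_MODEL.
model = os.environ.get("LLM_MODEL", "claude-sonnet").split(",")[0]
req = build_chat_request(model, "Hello")
```

Because the gateway terminates provider authentication, no API key appears anywhere in this code.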

Supported Providers

Provider                   Value
Anthropic                  anthropic
OpenAI                     openai
Azure OpenAI               azure
AWS Bedrock                bedrock
Google Vertex AI           vertex
Any OpenAI-compatible API  openai-compatible
Custom LiteLLM config      custom

Self-hosted models (Ollama, vLLM)

spec:
  provider: openai-compatible
  modelName: llama3.2
  endpoint: http://ollama.default.svc.cluster.local:11434/v1

No apiKeySecretRef needed for unauthenticated endpoints.

Multiple models

Agents can reference multiple models. Each model is registered with the same gateway; the agent chooses which to call at runtime:

# LanguageAgent
spec:
  models:
    - name: claude-sonnet   # primary
    - name: llama3          # fallback / secondary

Rate Limiting

spec:
  rateLimits:
    requestsPerMinute: 100
    tokensPerMinute: 50000

Limits are enforced by the shared gateway across all agents. Per-agent limits are not currently supported.
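Because limits are enforced centrally, an agent that exceeds them sees an HTTP 429 from the gateway rather than from the provider. A sketch of client-side backoff for that case, assuming exponential backoff with full jitter (the function is illustrative, not part of the operator):

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds before retry `attempt` (0-based): exponential growth,
    capped, with full jitter to avoid synchronized retries across agents."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

An agent would sleep for backoff_delay(attempt) after each 429 before retrying the same request.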