Models¶
A LanguageModel configures LLM access for a LanguageCluster. The operator reads all LanguageModel resources in the namespace and registers them with the cluster's shared LiteLLM gateway — agents never hold API credentials or connect to model providers directly.
How It Works¶
One LiteLLM proxy (gateway) runs per LanguageCluster. When you add or remove a LanguageModel, the gateway restarts with the updated model list — no agent redeploy required.
Credential Management¶
API keys are never injected into agent pods. Store them in a Kubernetes Secret.
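For example, a minimal Secret might look like the following (the name and key are chosen to match the spec below; the value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-credentials
type: Opaque
stringData:
  api-key: <your-anthropic-api-key>
```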
Reference the Secret from the model spec:
```yaml
apiVersion: langop.io/v1alpha1
kind: LanguageModel
metadata:
  name: claude-sonnet
spec:
  provider: anthropic
  modelName: claude-sonnet-4-5
  apiKeySecretRef:
    name: anthropic-credentials
    key: api-key
```
The gateway pod mounts the Secret and presents a single OpenAI-compatible endpoint to agents. Rotating a key is a single kubectl Secret update: the gateway restarts with the new credentials, and agents are unaffected.
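One common way to update the Secret in place, sketched here with a placeholder value, is the dry-run-and-apply idiom:

```shell
kubectl create secret generic anthropic-credentials \
  --from-literal=api-key=<new-key> \
  --dry-run=client -o yaml | kubectl apply -f -
```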
Agent Integration¶
The operator injects two environment variables into every agent container:
| Variable | Value |
|---|---|
| MODEL_ENDPOINT | http://gateway.&lt;namespace&gt;.svc.cluster.local:8000 |
| LLM_MODEL | Comma-separated list of model names from spec.models[].name |
The same information is also available in /etc/agent/config.yaml under the models: key:
```yaml
models:
  claude-sonnet:
    role: primary
    provider: anthropic
    model: claude-sonnet-4-5
    endpoint: http://gateway.my-cluster.svc.cluster.local:8000
```
Agents call the gateway with the model name they want. The gateway routes to the correct upstream provider.
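As a minimal sketch of what this looks like from inside an agent, the snippet below reads the injected environment variables and assembles a standard chat-completions request. It assumes the gateway exposes LiteLLM's usual OpenAI-compatible /v1/chat/completions route; the payload is only built here, not sent, and the defaults exist so the sketch runs outside a cluster:

```python
import os

# Values the operator injects into every agent container; the defaults
# are illustrative so this runs outside a cluster too.
endpoint = os.environ.get(
    "MODEL_ENDPOINT", "http://gateway.my-cluster.svc.cluster.local:8000"
)
model_names = os.environ.get("LLM_MODEL", "claude-sonnet").split(",")

# The gateway speaks the OpenAI wire format, so the request body is a
# standard chat-completions payload; the gateway routes on "model".
url = f"{endpoint}/v1/chat/completions"
payload = {
    "model": model_names[0],
    "messages": [{"role": "user", "content": "Hello"}],
}
```

An agent would POST `payload` to `url` with any OpenAI-compatible HTTP client; no API key for the upstream provider is needed, since the gateway holds the credentials.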
Supported Providers¶
| Provider | Value |
|---|---|
| Anthropic | anthropic |
| OpenAI | openai |
| Azure OpenAI | azure |
| AWS Bedrock | bedrock |
| Google Vertex AI | vertex |
| Any OpenAI-compatible API | openai-compatible |
| Custom LiteLLM config | custom |
Self-hosted models (Ollama, vLLM)¶
```yaml
spec:
  provider: openai-compatible
  modelName: llama3.2
  endpoint: http://ollama.default.svc.cluster.local:11434/v1
```
No apiKeySecretRef is needed for unauthenticated endpoints.
Multiple models¶
Agents can reference multiple models. Each model is registered with the same gateway, and the agent chooses which one to call at runtime.
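For example, a LanguageAgent referencing two models might look like the sketch below. The spec.models[].name and role fields match the injected config shown above; the fallback role value is illustrative, not a documented enum:

```yaml
apiVersion: langop.io/v1alpha1
kind: LanguageAgent
metadata:
  name: my-agent
spec:
  models:
    - name: claude-sonnet
      role: primary
    - name: llama3.2
      role: fallback
```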
Rate Limiting¶
Limits are enforced by the shared gateway across all agents. Per-agent limits are not currently supported.
Related¶
- LanguageCluster — owns the shared gateway
- LanguageAgent — references models via spec.models
- LanguageModel API Reference — full field documentation