A lightweight proxy server that enables OpenClaw (and other OpenAI-compatible clients) to use Google Vertex AI's Gemini 2.5 Flash with granular control over reasoning effort levels.
OpenClaw doesn't natively support passing custom API parameters like reasoning_effort to Vertex AI models. This proxy solves that by:
- Intercepting OpenAI-compatible API requests from OpenClaw
- Parsing model IDs to determine desired reasoning level
- Adding the reasoning_effort parameter to Vertex AI requests
- Forwarding modified requests to Vertex AI
- Returning responses back to OpenClaw
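The steps above can be sketched as a single function. This is a minimal illustration, not the code in proxy.py: `handle_chat_completion`, `EFFORT_SUFFIXES`, and the injected `forward` callable are hypothetical names, and stripping the suffix from the model ID before forwarding is an assumption.

```python
import copy
from typing import Callable

EFFORT_SUFFIXES = ("low", "medium", "high")
DEFAULT_EFFORT = "medium"  # the README lists medium as the default level


def handle_chat_completion(body: dict, forward: Callable[[dict], dict]) -> dict:
    """Rewrite an OpenAI-style request body and pass it upstream."""
    upstream_body = copy.deepcopy(body)  # leave the caller's dict untouched
    base, _, suffix = upstream_body["model"].rpartition("-")
    if base and suffix in EFFORT_SUFFIXES:
        upstream_body["model"] = base  # assumption: the suffix is stripped
        upstream_body["reasoning_effort"] = suffix
    else:
        upstream_body["reasoning_effort"] = DEFAULT_EFFORT
    # The upstream response is returned to the client unchanged.
    return forward(upstream_body)


# Usage with a stand-in for the real Vertex AI call:
def fake_forward(upstream_body: dict) -> dict:
    return {"echo": upstream_body}


result = handle_chat_completion(
    {"model": "google/gemini-2.5-flash-low", "messages": []}, fake_forward
)
```

Injecting `forward` keeps the rewrite logic testable without network access or credentials.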
- ✅ Three-tier thinking system: Low (1K tokens) / Medium (8K tokens) / High (24K tokens)
- ✅ OpenAI-compatible API: Works with OpenClaw and other clients
- ✅ Automatic token refresh: Uses gcloud for fresh tokens
- ✅ Zero configuration needed in OpenClaw: Just change the base URL
- ✅ Lightweight: FastAPI-based, minimal dependencies
- ✅ Easy deployment: Run as systemd/launchd service
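One way the automatic token refresh could work is to shell out to gcloud and cache the result for a while. The sketch below is an assumption about the approach, not the proxy's actual code: `TokenCache`, `fetch_token_from_gcloud`, and the 3000-second lifetime are hypothetical choices.

```python
import subprocess
import time
from typing import Callable, Optional


def fetch_token_from_gcloud() -> str:
    """Ask gcloud for an Application Default Credentials access token."""
    result = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud fails
    )
    return result.stdout.strip()


class TokenCache:
    """Reuse a token until an assumed lifetime has elapsed, then refetch."""

    def __init__(
        self,
        fetch: Callable[[], str] = fetch_token_from_gcloud,
        ttl_seconds: float = 3000.0,  # assumption: refresh well before expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token


# Usage with a fake fetcher, so the example runs without gcloud installed:
cache = TokenCache(fetch=lambda: "fake-token", ttl_seconds=60)
token = cache.get()
```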
┌──────────┐         ┌───────────────┐         ┌─────────────┐
│ OpenClaw │────────>│ Proxy Server  │────────>│  Vertex AI  │
└──────────┘         │ localhost:8000│         │   Gemini    │
                     └───────────────┘         └─────────────┘
                             │
                     Model ID Parsing:
                     ├─ *-low    → reasoning_effort: "low"
                     ├─ *-medium → reasoning_effort: "medium"
                     └─ *-high   → reasoning_effort: "high"
- Python 3.9+
- Google Cloud SDK (gcloud)
- Vertex AI API enabled
- Application Default Credentials configured
# Install gcloud
# See: https://cloud.google.com/sdk/docs/install
# Authenticate
gcloud auth application-default login
# Set project
gcloud config set project YOUR_PROJECT_ID

git clone https://github.com/danzam98/vertexai-proxy.git
cd vertexai-proxy

pip install -r requirements.txt

cp .env.example .env
# Edit .env with your Vertex AI project details

python proxy.py

The server will start on http://127.0.0.1:8000
# Health check
curl http://localhost:8000/health
# Test completion (low reasoning)
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash-low",
"messages": [{"role": "user", "content": "What is 2+2?"}]
}'

Edit ~/.openclaw/openclaw.json:
{
"models": {
"mode": "merge",
"providers": {
"vertexai-proxy": {
"baseUrl": "http://127.0.0.1:8000/v1",
"apiKey": "dummy-key-not-used",
"api": "openai-completions",
"models": [
{
"id": "google/gemini-2.5-flash-low",
"name": "Gemini 2.5 Flash (Low)",
"reasoning": true,
"contextWindow": 1048576,
"maxTokens": 65536
},
{
"id": "google/gemini-2.5-flash-medium",
"name": "Gemini 2.5 Flash (Medium)",
"reasoning": true,
"contextWindow": 1048576,
"maxTokens": 65536
},
{
"id": "google/gemini-2.5-flash-high",
"name": "Gemini 2.5 Flash (High)",
"reasoning": true,
"contextWindow": 1048576,
"maxTokens": 65536
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "vertexai-proxy/google/gemini-2.5-flash-medium"
},
"heartbeat": {
"model": "vertexai-proxy/google/gemini-2.5-flash-low"
},
"subagents": {
"model": "vertexai-proxy/google/gemini-2.5-flash-high",
"maxConcurrent": 8
}
}
}
}

Create ~/Library/LaunchAgents/com.vertexai.proxy.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.vertexai.proxy</string>
<key>ProgramArguments</key>
<array>
<string>/usr/bin/python3</string>
<string>/Users/YOUR_USERNAME/vertexai-proxy/proxy.py</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/vertexai-proxy.log</string>
<key>StandardErrorPath</key>
<string>/tmp/vertexai-proxy.err.log</string>
</dict>
</plist>

Load the service:
launchctl load ~/Library/LaunchAgents/com.vertexai.proxy.plist

Create /etc/systemd/system/vertexai-proxy.service:
[Unit]
Description=Vertex AI Reasoning Proxy
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/vertexai-proxy
ExecStart=/usr/bin/python3 /home/YOUR_USERNAME/vertexai-proxy/proxy.py
Restart=always
[Install]
WantedBy=multi-user.target

Enable and start:
sudo systemctl enable vertexai-proxy
sudo systemctl start vertexai-proxy

The proxy parses model IDs with the following pattern:
- google/gemini-2.5-flash-low → reasoning_effort: "low" (1K tokens)
- google/gemini-2.5-flash-medium → reasoning_effort: "medium" (8K tokens)
- google/gemini-2.5-flash-high → reasoning_effort: "high" (24K tokens)
- google/gemini-2.5-flash → reasoning_effort: "medium" (default)
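The pattern above amounts to a suffix lookup with a fallback. The following sketch shows one way to express it; `ReasoningLevel`, `LEVELS`, and `reasoning_level_for` are hypothetical names, and the token figures are the README's approximate values rounded to thousands.

```python
from typing import NamedTuple


class ReasoningLevel(NamedTuple):
    effort: str
    approx_tokens: int  # approximate reasoning budget quoted in the README


LEVELS = {
    "low": ReasoningLevel("low", 1_000),
    "medium": ReasoningLevel("medium", 8_000),
    "high": ReasoningLevel("high", 24_000),
}
DEFAULT_EFFORT = "medium"


def reasoning_level_for(model_id: str) -> ReasoningLevel:
    """Map a model ID to a level using the text after its last hyphen."""
    suffix = model_id.rsplit("-", 1)[-1]
    # Unrecognized suffixes (such as "flash") fall back to the default.
    return LEVELS.get(suffix, LEVELS[DEFAULT_EFFORT])


level = reasoning_level_for("google/gemini-2.5-flash-high")
```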
Routing different agent types to appropriate reasoning levels reduces reasoning token costs:
| Agent Type | Reasoning Level | Reasoning Tokens | Use Case |
|---|---|---|---|
| Heartbeat | Low | ~1K | Quick checks, simple queries |
| Main Agent | Medium | ~8K | Standard conversations |
| Subagents | High | ~24K | Complex tasks, deep reasoning |
Estimated savings: 40-60% on reasoning token costs vs. always using high reasoning.
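To see how a figure in this range can arise, the sketch below compares a blended reasoning budget against sending everything at the high level. The request mix is purely hypothetical, and the calculation uses the approximate budget ceilings from the table rather than measured token usage, so real savings depend on your own traffic.

```python
# Approximate reasoning budgets per level, taken from the table above.
BUDGET = {"low": 1_000, "medium": 8_000, "high": 24_000}


def budget_savings(request_mix: dict) -> float:
    """Fraction of reasoning budget saved versus always using 'high'.

    request_mix maps a level name to its share of requests; shares are
    normalized, so they do not need to sum to exactly 1.
    """
    total = sum(request_mix.values())
    blended = sum(BUDGET[level] * share for level, share in request_mix.items()) / total
    return 1 - blended / BUDGET["high"]


# Hypothetical mix: 20% heartbeat, 40% main agent, 40% subagents.
example_mix = {"low": 0.2, "medium": 0.4, "high": 0.4}
print(f"{budget_savings(example_mix):.0%}")  # prints 46%
```

With this mix the blended budget is 13K tokens per request against 24K for always-high, which is where the 46% comes from.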
Ensure gcloud is authenticated:
gcloud auth application-default login

Change the port in .env or proxy.py:
PROXY_PORT=8001

Verify the proxy is running:
curl http://localhost:8000/health

uvicorn proxy:app --reload --host 127.0.0.1 --port 8000

python proxy.py 2>&1 | tee proxy.log

Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License - see LICENSE file for details
- Built for OpenClaw
- Uses Google Cloud Vertex AI
- Inspired by the need for granular reasoning control
- Issues: https://github.com/danzam98/vertexai-proxy/issues
- Discussions: https://github.com/danzam98/vertexai-proxy/discussions
Built with ❤️ for the OpenClaw community