LLM Provider Setup
Configure Cluster Code to work with your preferred AI provider - from cloud-based models to local, self-hosted options.
Table of Contents
- Overview
- Quick Setup
- Anthropic (Claude)
- OpenAI (GPT)
- Google (Gemini)
- Ollama (Local Models)
- LM Studio (Alternative Local Option)
- OpenAI-Compatible Providers
- Managing Multiple Providers
- Advanced Configuration
- Privacy & Security Considerations
- Troubleshooting
- Recommended Configurations
- Next Steps
- Additional Resources
Overview
Cluster Code supports multiple LLM providers through the Vercel AI SDK, giving you flexibility in choosing between:
- GitHub Copilot - Access multiple models (GPT-4o, Claude, Gemini) via your GitHub account (Recommended)
- Cloud Providers - Anthropic, OpenAI, Google (powerful, but sends data externally)
- Local Models - Ollama, LM Studio (complete privacy, no API costs)
- Custom Providers - Any OpenAI-compatible endpoint
At least one LLM provider must be configured for Cluster Code to function. Choose based on your privacy, cost, and performance requirements.
Quick Setup
Using GitHub Copilot (Recommended)
The easiest way to get started is with GitHub Copilot:
# Authenticate with GitHub
cluster-code github login
# Select your preferred model
cluster-code github model
This gives you access to GPT-4o, Claude, Gemini, o1, and more through a single authentication.
See the GitHub Copilot Setup Guide for detailed instructions.
Using Environment Variables
For direct API access, use environment variables:
# Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-api03-..."
# OpenAI (GPT)
export OPENAI_API_KEY="sk-..."
# Google (Gemini)
export GOOGLE_GENERATIVE_AI_API_KEY="..."
Cluster Code will automatically detect and use these credentials.
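To make a key survive new shell sessions, append the export to your shell profile (standard shell practice, nothing Cluster Code-specific; use ~/.zshrc if you run zsh):
# Persist the key for future sessions, then reload the profile
echo 'export ANTHROPIC_API_KEY="sk-ant-api03-..."' >> ~/.bashrc
source ~/.bashrc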
Using Configuration Commands
For persistent configuration:
# Add a provider
cluster-code config provider add anthropic
# Set as active provider
cluster-code config provider set anthropic
# Verify configuration
cluster-code config provider show
Anthropic (Claude)
Claude models from Anthropic provide excellent reasoning and code understanding capabilities.
Models Available
| Model | Context | Best For |
|---|---|---|
| claude-3-5-sonnet-20241022 | 200K | Recommended - best balance of speed and intelligence |
| claude-3-opus-20240229 | 200K | Most capable, slower, more expensive |
| claude-3-haiku-20240307 | 200K | Fastest, most economical |
Setup Steps
- Get API Key
  - Visit Anthropic Console
  - Sign up or log in
  - Go to the API Keys section
  - Create a new API key
- Configure Cluster Code
  Option A: Environment Variable
  export ANTHROPIC_API_KEY="sk-ant-api03-..."
  Option B: Interactive Setup
  cluster-code config provider add anthropic
  # Provider type: Anthropic (Claude)
  # API Key: sk-ant-api03-...
  # Model: claude-3-5-sonnet-20241022
  Option C: Direct Configuration
  cluster-code config provider add anthropic \
    --api-key "sk-ant-api03-..." \
    --model "claude-3-5-sonnet-20241022"
- Set as Active Provider
  cluster-code config provider set anthropic
- Verify
  cluster-code chat "Hello, test my Anthropic connection"
Configuration File
Your config will look like this in ~/.cluster-code/config.json:
{
"llm": {
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022",
"maxTokens": 4096
},
"providers": {
"anthropic": {
"type": "anthropic",
"name": "Anthropic",
"apiKey": "sk-ant-api03-..."
}
}
}
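Because this file stores your API key in plain text, it is worth locking the permissions down (plain POSIX file permissions, not specific to Cluster Code):
# Make the config readable and writable by your user only
chmod 600 ~/.cluster-code/config.json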
OpenAI (GPT)
OpenAI’s GPT models are widely used and well-supported.
Models Available
| Model | Context | Best For |
|---|---|---|
| gpt-4-turbo | 128K | Latest GPT-4, faster and cheaper |
| gpt-4 | 8K | Original GPT-4, most tested |
| gpt-3.5-turbo | 16K | Fast and economical |
Setup Steps
- Get API Key
  - Visit OpenAI Platform
  - Sign up or log in
  - Go to API Keys
  - Create a new secret key
- Configure Cluster Code
  Option A: Environment Variable
  export OPENAI_API_KEY="sk-..."
  Option B: Interactive Setup
  cluster-code config provider add openai
  # Provider type: OpenAI (GPT)
  # API Key: sk-...
  # Model: gpt-4-turbo
- Set as Active Provider
  cluster-code config provider set openai
- Verify
  cluster-code diagnose
Google (Gemini)
Google’s Gemini models offer competitive performance with a generous free tier.
Models Available
| Model | Context | Best For |
|---|---|---|
| gemini-1.5-pro | 2M | Largest context, best for complex tasks |
| gemini-1.5-flash | 1M | Fast and efficient |
| gemini-pro | 32K | General purpose |
Setup Steps
- Get API Key
  - Visit Google AI Studio
  - Sign in with your Google account
  - Click “Create API Key”
- Configure Cluster Code
  Option A: Environment Variable
  export GOOGLE_GENERATIVE_AI_API_KEY="..."
  Option B: Interactive Setup
  cluster-code config provider add google
  # Provider type: Google (Gemini)
  # API Key: ...
  # Model: gemini-1.5-pro
- Set as Active Provider
  cluster-code config provider set google
Ollama (Local Models)
Run models locally for complete privacy and no API costs.
Popular Models
| Model | Size | Best For |
|---|---|---|
| llama3:70b | 70B | Best quality, requires powerful hardware |
| llama3:8b | 8B | Good balance, runs on most systems |
| deepseek-coder-v2:16b | 16B | Code-focused, excellent for Kubernetes |
| mistral:7b | 7B | Fast and efficient |
| codellama:13b | 13B | Code generation and analysis |
Setup Steps
- Install Ollama
  macOS/Linux:
  curl -fsSL https://ollama.ai/install.sh | sh
  Windows:
  - Download from ollama.ai
  - Run the installer
- Pull a Model
  # For best results (requires 48GB+ RAM)
  ollama pull llama3:70b
  # For a good balance (requires 8GB+ RAM)
  ollama pull llama3:8b
  # For code-focused tasks (requires 16GB+ RAM)
  ollama pull deepseek-coder-v2:16b
- Verify Ollama is Running
  # Start Ollama (if not already running)
  ollama serve
  # Test the model
  ollama run llama3:8b "Hello"
- Configure Cluster Code
  Option A: Interactive Setup
  cluster-code config provider add ollama
  # Provider type: Ollama (Local)
  # Base URL: http://localhost:11434/v1
  # Model: llama3:8b
  # API Key: (leave empty - press Enter)
  Option B: Environment Variables
  export OLLAMA_BASE_URL="http://localhost:11434/v1"
  export OLLAMA_MODEL="llama3:8b"
- Set as Active Provider
  cluster-code config provider set ollama
- Verify
  cluster-code chat "Test local model connection"
Performance Tips
# Use GPU acceleration (NVIDIA)
# Ollama automatically uses GPU if available
# Check GPU usage
nvidia-smi
# Adjust context window for better performance
cluster-code config set llm.maxTokens 2048
# Use smaller model for faster responses
ollama pull llama3:8b
cluster-code config provider set ollama --model llama3:8b
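On recent Ollama versions, ollama ps reports whether a loaded model landed on the GPU or spilled to CPU, which is the usual culprit behind slow local inference:
# Show loaded models; the PROCESSOR column indicates GPU vs CPU placement
ollama ps
# If it reports CPU, the model doesn't fit in VRAM - use a smaller model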
LM Studio (Alternative Local Option)
LM Studio provides a user-friendly GUI for running local models.
Setup Steps
- Install LM Studio
- Download from lmstudio.ai
- Install and launch the application
- Download a Model
- In LM Studio, go to the “Discover” tab
- Search for models like “Llama 3”, “DeepSeek Coder”, etc.
- Download your preferred model
- Start Local Server
- Click “Local Server” in LM Studio
- Load your model
- Start server (default: http://localhost:1234/v1)
- Configure Cluster Code
cluster-code config provider add lmstudio
# Provider type: OpenAI-compatible (Custom)
# Base URL: http://localhost:1234/v1
# Model: <model-name-from-lmstudio>
# API Key: (leave empty)
- Set as Active Provider
cluster-code config provider set lmstudio
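Before adding the provider, you can confirm the server is up and grab the exact model name it exposes; LM Studio serves the standard OpenAI /v1/models route:
# List models served by LM Studio; use the returned "id" field as the model name
curl http://localhost:1234/v1/models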
OpenAI-Compatible Providers
Use any OpenAI-compatible API endpoint.
Supported Providers
- LM Studio - Local GUI for running models
- LocalAI - Self-hosted OpenAI alternative
- Text Generation WebUI - Gradio-based interface
- vLLM - High-performance inference server
- Together.ai - Cloud-hosted open models
- Anyscale - Hosted LLM endpoints
Generic Setup
cluster-code config provider add custom
# Provider type: OpenAI-compatible (Custom)
# Base URL: https://your-api-endpoint.com/v1
# API Key: your-api-key (if required)
# Model: your-model-name
Example: Together.ai
cluster-code config provider add together \
--type openai-compatible \
--base-url "https://api.together.xyz/v1" \
--api-key "your-together-api-key" \
--model "meta-llama/Llama-3-70b-chat-hf"
Managing Multiple Providers
List All Configured Providers
cluster-code config provider list
Output:
Configured LLM Providers:
anthropic (Anthropic Claude) [ACTIVE]
Model: claude-3-5-sonnet-20241022
openai (OpenAI GPT)
Model: gpt-4-turbo
ollama (Ollama Local)
Model: llama3:8b
Switch Between Providers
# Switch to OpenAI
cluster-code config provider set openai
# Switch to local Ollama
cluster-code config provider set ollama
# Switch to Anthropic
cluster-code config provider set anthropic
Remove a Provider
cluster-code config provider remove openai
Update Provider Settings
# Update API key
cluster-code config provider update anthropic --api-key "new-key"
# Update model
cluster-code config provider update anthropic --model "claude-3-opus-20240229"
# Update base URL (for custom providers)
cluster-code config provider update custom --base-url "https://new-url.com/v1"
Advanced Configuration
Custom Model Parameters
Edit ~/.cluster-code/config.json to customize model behavior:
{
"llm": {
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022",
"maxTokens": 4096,
"temperature": 0.7,
"topP": 0.9,
"stream": true
}
}
Parameters:
- maxTokens: Maximum response length (default: 4096)
- temperature: Randomness (0-1, default: 0.7)
- topP: Nucleus sampling (0-1, default: 0.9)
- stream: Enable streaming responses (default: true)
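If you prefer not to edit the file by hand, the cluster-code config set command used elsewhere in this guide should take the same dotted keys (assuming it supports all of the parameters above, not just maxTokens):
# Lower temperature for more deterministic diagnostic answers (assumed key)
cluster-code config set llm.temperature 0.2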
Provider-Specific Options
Anthropic:
{
"providers": {
"anthropic": {
"type": "anthropic",
"apiKey": "sk-ant-api03-...",
"maxRetries": 3,
"timeout": 60000
}
}
}
OpenAI:
{
"providers": {
"openai": {
"type": "openai",
"apiKey": "sk-...",
"organization": "org-...",
"baseURL": "https://api.openai.com/v1"
}
}
}
Ollama:
{
"providers": {
"ollama": {
"type": "ollama",
"baseURL": "http://localhost:11434/v1",
"keepAlive": "5m",
"numCtx": 4096
}
}
}
Privacy & Security Considerations
Cloud Providers (Anthropic, OpenAI, Google)
Data sent to cloud providers:
- Cluster metadata (names, namespaces, labels)
- Pod logs and error messages
- Resource configurations
- Your questions and commands
Privacy measures:
- Use environment variables instead of config files for API keys
- Regularly rotate API keys
- Review provider privacy policies
- Consider data residency requirements
Local Models (Ollama, LM Studio)
Benefits:
- Complete data privacy - nothing leaves your machine
- No API costs
- Works offline
- Full control over model and data
Trade-offs:
- Requires local compute resources
- May be slower than cloud models
- Limited to available model capabilities
Best Practices
# Use environment variables for sensitive data
export ANTHROPIC_API_KEY="..."
# Don't commit this to git
# Exclude project-local config from version control
# (the global ~/.cluster-code/config.json lives outside your repo already)
echo ".cluster-code/" >> .gitignore
# Use different providers for different sensitivity levels
# Use local models for sensitive clusters
cluster-code config provider set ollama
# Use cloud models for development/testing
cluster-code config provider set anthropic
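As a quick audit, you can also scan a repository's tracked files for key prefixes before pushing (a minimal sketch using plain git grep; extend the pattern for the other providers you use):
# git grep exits non-zero when nothing matches, so || gives a clean "all clear"
git grep -n "sk-ant-api03-" || echo "No Anthropic-style keys in tracked files"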
Troubleshooting
Authentication Errors
# Verify API key is set
echo $ANTHROPIC_API_KEY # Should show your key
# Test provider directly
cluster-code config provider show
# Re-configure provider
cluster-code config provider remove anthropic
cluster-code config provider add anthropic
Ollama Connection Issues
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Start Ollama service
ollama serve
# Check model is pulled
ollama list
# Test model directly
ollama run llama3:8b "Test"
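If the native commands above work but Cluster Code still fails, test the OpenAI-compatible route Cluster Code is actually configured against (Ollama serves it under /v1):
# Exercise the same /v1 endpoint Cluster Code uses
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b", "messages": [{"role": "user", "content": "Test"}]}'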
Rate Limiting
# Add retry logic (in config.json)
{
"llm": {
"maxRetries": 5,
"retryDelay": 1000
}
}
# Or switch to local model temporarily
cluster-code config provider set ollama
Model Performance Issues
# Reduce token limit for faster responses
cluster-code config set llm.maxTokens 2048
# Use a faster model
cluster-code config provider update anthropic --model claude-3-haiku-20240307
# For Ollama, use smaller model
ollama pull llama3:8b
cluster-code config provider set ollama --model llama3:8b
Recommended Configurations
For Production Clusters (Privacy-Focused)
# Use local models only
ollama pull deepseek-coder-v2:16b
cluster-code config provider add ollama --model deepseek-coder-v2:16b
cluster-code config provider set ollama
For Development (Best Performance)
# Use Anthropic Claude for best results
export ANTHROPIC_API_KEY="sk-ant-api03-..."
cluster-code config provider add anthropic --model claude-3-5-sonnet-20241022
cluster-code config provider set anthropic
For Cost-Conscious Users
# Use Ollama for free local inference
ollama pull llama3:8b
cluster-code config provider add ollama --model llama3:8b
cluster-code config provider set ollama
For Maximum Capability
# Use Claude Opus for most complex tasks
cluster-code config provider add anthropic --model claude-3-opus-20240229
cluster-code config provider set anthropic
Next Steps
Now that you’ve configured your LLM provider:
- Getting Started Guide - Start using Cluster Code
- Configuration Reference - Advanced configuration options
- Diagnostics Guide - Learn AI-powered troubleshooting
Or try it out:
cluster-code chat "Show me all pods in my cluster"