
LeanProxy-MCP Documentation

Welcome to the LeanProxy-MCP user documentation. This documentation is intended for developers and technical users who want to understand and use LeanProxy-MCP.

What is LeanProxy-MCP?

LeanProxy-MCP is a lightweight, local CLI proxy designed to sit between your IDE and MCP (Model Context Protocol) servers. It acts as a "Token Firewall" — reducing token consumption and redacting sensitive data before it reaches LLM providers.

Target Audience

This documentation is designed for:

- Developers who use IDEs with MCP support (Claude Desktop, Cursor, OpenCode, Windsurf)
- Technical users who want to optimize token usage and protect sensitive data
- DevOps engineers who need to manage MCP server configurations

| Guide | Description |
|---|---|
| Installation | Download and install LeanProxy-MCP |
| Quick Start | Get up and running in minutes |
| Commands Reference | Complete CLI command documentation |
| Configuration | Customize LeanProxy-MCP behavior |
| Architecture | Understanding the internal design |
| Security | Security hardening features |
| Graceful Shutdown | Proper shutdown patterns and best practices |
| Troubleshooting | Common issues and solutions |
| FAQ | Frequently asked questions |

The Economics of MCP: Why LeanProxy Saves Money

The AI provider market has shifted from monthly flat-rate plans to pay-per-use pricing (May 2026). Every token sent to an LLM now costs real money, which makes token efficiency critical.

The MCP Schema Tax

When you run multiple MCP servers, each adds tool schemas to every LLM request. We measured this live with our own MCP configuration:

| MCP Server | Tools | Tokens per Request |
|---|---|---|
| Garmin | 100 | ~10,000 |
| GitHub | 41 | ~4,100 |
| Stitch | 12 | ~1,200 |
| Intervals.icu | 10 | ~1,000 |
| All 4 combined | 163 | ~16,300+ |

These tool counts come from live MCP servers queried via LeanProxy. Each tool adds ~100 tokens of schema + arguments. With 163 tools configured, that's the "schema tax" on every prompt.

For a 7-prompt mixed session where all 4 MCP servers are configured but only 2-3 actually invoked, Native MCP wastes ~16,300 tokens on schemas never used.
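As a back-of-the-envelope illustration of this arithmetic, here is a small Python sketch using the tool counts measured above and the assumed ~100 tokens per tool schema:

```python
# Back-of-the-envelope estimate of the per-request "schema tax".
# Assumes ~100 tokens of schema + argument definitions per tool, as noted above.
TOKENS_PER_TOOL = 100

servers = {
    "Garmin": 100,
    "GitHub": 41,
    "Stitch": 12,
    "Intervals.icu": 10,
}

total_tools = sum(servers.values())          # 163 tools
schema_tax = total_tools * TOKENS_PER_TOOL   # ~16,300 tokens per prompt

print(f"{total_tools} tools -> ~{schema_tax:,} tokens prepended to every request")
```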

Real Examples: Working Sessions

Based on live MCP tool invocations:

| Session | Description | Prompts | Native MCP (tokens) | LeanProxy (tokens) | Savings |
|---|---|---|---|---|---|
| A | Sport (Garmin + Intervals.icu) | 4 | ~21,000 | ~2,000 | 90%+ |
| B | Dev (GitHub + Stitch) | 5 | ~10,600 | ~2,400 | 77%+ |
| C | Full Day (all 4) | 7 | ~49,600 | ~3,500 | 93%+ |

Session A: Morning Sport (Garmin + Intervals.icu)

| Prompt | Tool Invoked | Native MCP | LeanProxy |
|---|---|---|---|
| 1 | garmin_get_stats | 10,000 | ~500 |
| 2 | intervals_get_events | 11,000 | ~500 |
| 3 | intervals_get_activity_intervals | cached | ~500 |
| 4 | intervals_add_or_update_event | cached | ~500 |
| Total | | ~21,000 | ~2,000 |

Session B: Dev Session (GitHub + Stitch)

| Prompt | Tool Invoked | Native MCP | LeanProxy |
|---|---|---|---|
| 1 | github_search_repositories | 4,100 | ~600 |
| 2 | github_get_file_contents | cached | cached |
| 3 | stitch_list_projects | 5,300 | ~600 |
| 4 | stitch_generate_screen_from_text | cached | ~600 |
| 5 | github_create_pull_request | cached | ~600 |
| Total | | ~10,600 | ~2,400 |

The Cache Read Cost Fallacy

Providers advertise prompt caching as "free" or "90% savings" — but cache reads aren't free.

When a prompt cache hit occurs, you still pay for reading from the cache:

- OpenAI: cache reads at 0.25x input token price
- Anthropic: cache reads at 0.25x input token price
- DeepSeek: cache reads at 0.25x input token price
- Google Gemini: cache reads at ~0.25x input token price

This means a 100% cache hit rate doesn't mean 100% free. A 16,300-token MCP schema served entirely from cache still costs:

16,300 tokens × 0.25 = 4,075 "effective" full-price input tokens
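As a quick sanity check, here is a minimal sketch of that arithmetic, assuming a flat 0.25x cache-read multiplier and the ~110-token router schema described later on this page:

```python
# Effective cost of a fully cached MCP schema, assuming a 0.25x cache-read multiplier.
CACHE_READ_MULTIPLIER = 0.25

schema_tokens = 16_300                                       # 4 servers, 163 tools
effective_tokens = schema_tokens * CACHE_READ_MULTIPLIER     # 4,075 "effective" tokens

router_tokens = 110                                          # LeanProxy router schema
router_effective = router_tokens * CACHE_READ_MULTIPLIER     # 27.5 "effective" tokens

savings = 1 - router_effective / effective_tokens
print(f"Native: {effective_tokens:,.0f} vs LeanProxy: {router_effective:.1f} "
      f"effective tokens ({savings:.1%} savings)")
```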

Real Comparison: Native MCP vs LeanProxy

| MCP Servers | Tools | Native MCP (100% cache hit) | LeanProxy | Savings |
|---|---|---|---|---|
| 1 (GitHub) | 41 | 1,025 tokens | 27.5 tokens | 97.3% |
| 2 (GitHub + Stitch) | 53 | 1,325 tokens | 27.5 tokens | 97.9% |
| 3 (+ Intervals.icu) | 63 | 1,575 tokens | 27.5 tokens | 98.2% |
| 4 (all) | 163 | 4,075 tokens | 27.5 tokens | 99.3% |

Native MCP re-sends all tool schemas with every prompt, billed at the 0.25x cache-read rate. LeanProxy sends only its ~110-token router schema, no matter how many backend servers are configured.

The key insight: With Native MCP + caching, you pay for every tool schema on every request (at 0.25x). LeanProxy sends only the router schema — the backend tool schemas only load when actually invoked.

Provider Caching on "Same Input Context"

For MCP tool schemas that are identical every request, caching only reduces cost by 75% — you're still paying for the read. The "same input context" scenario:

| Scenario | Input Tokens | Cache Rate | Cache Cost (0.25x) | LeanProxy | Savings |
|---|---|---|---|---|---|
| 1 server (GitHub) | 4,100 | 100% hit | 1,025 | 27.5 | 97% |
| 2 servers | 5,300 | 100% hit | 1,325 | 27.5 | 98% |
| 3 servers | 15,200 | 100% hit | 3,800 | 27.5 | 99% |
| 4 servers (all) | 16,300 | 100% hit | 4,075 | 27.5 | 99.3% |

Critical insight: With "same input context" caching, 100% cache hit STILL costs at 0.25x. LeanProxy sends only ~110 tokens, making cache read cost negligible (27.5 tokens). This is the real advantage.

Monthly Total Token Savings (100 sessions/month)

Native MCP sends all tool schemas with every request (billed at 0.25x cache read); LeanProxy sends only the router schema.

| Servers | Tools | GPT-4o-mini ($0.0375/M) | Anthropic Sonnet ($0.40/M) |
|---|---|---|---|
| 1 | 41 | $1.03 → $1.02 saved | $10.93 → $10.90 saved |
| 2 | 53 | $1.33 → $1.32 saved | $14.13 → $14.10 saved |
| 4 | 163 | $4.08 → $4.07 saved | $43.47 → $43.44 saved |

Formula: schema tokens × 100 sessions × cache-read price per 1M tokens ($0.0375/M for GPT-4o-mini, $0.40/M for Sonnet)

Should You Use Caching with MCP?

| Scenario | Cache Hit | Recommendation |
|---|---|---|
| MCP tool schemas (100% identical) | 100% | ❌ Still costs 0.25x; use LeanProxy |
| Conversation history (growing) | 90%+ | ✅ Caching saves money |
| Codebase/RAG context | 80%+ | ✅ Caching saves money |
| MCP schemas in a short session | 100% | ❌ Cache read cost > savings |

Key insight: MCP tool schemas that are identical on every request still bill at 0.25x even with a perfect cache hit rate; LeanProxy eliminates that overhead entirely. See "Provider Caching on Same Input Context" above for the math.

How LeanProxy Achieves This

LeanProxy uses a gateway pattern with JIT (Just-In-Time) schema loading:

  1. Single router schema: Only 2 tools (invoke_tool, list_tools) = ~110 tokens vs 16,300+ for Native MCP
  2. On-demand tool registration: Backend server schemas only load when actually needed (~500 tokens per invocation)
  3. Session-aware caching: Tool schemas persist across the session without per-request overhead
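To make the pattern concrete, here is a conceptual Python sketch of a JIT gateway. It is not LeanProxy's actual implementation: only the invoke_tool and list_tools names come from the router schema described above, while the Gateway class and the backend client interface (tool_names, fetch_schema, call) are hypothetical.

```python
# Conceptual sketch of the JIT gateway pattern (not the LeanProxy source).
# Only the two router tools are advertised to the LLM; backend schemas are
# fetched the first time a tool is actually invoked.

ROUTER_SCHEMA = [
    {"name": "list_tools", "description": "List tools available behind the gateway"},
    {"name": "invoke_tool", "description": "Invoke a backend tool by name with JSON args"},
]

class Gateway:
    def __init__(self, backends):
        self.backends = backends      # name -> MCP server client (hypothetical interface)
        self.schema_cache = {}        # session-scoped cache of loaded tool schemas

    def list_tools(self):
        # Cheap summary (names only), not full schemas.
        return [name for backend in self.backends.values() for name in backend.tool_names()]

    def invoke_tool(self, tool_name, arguments):
        backend = self._backend_for(tool_name)
        if tool_name not in self.schema_cache:
            # JIT load: ~500 tokens once per session instead of ~16,300 on every prompt.
            self.schema_cache[tool_name] = backend.fetch_schema(tool_name)
        return backend.call(tool_name, arguments)

    def _backend_for(self, tool_name):
        for backend in self.backends.values():
            if tool_name in backend.tool_names():
                return backend
        raise KeyError(f"Unknown tool: {tool_name}")
```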

Decision Framework

| Service Usage (G/N ratio) | Recommendation |
|---|---|
| > 40% (every prompt) | Native MCP justified |
| 5-40% (regular use) | LeanProxy Gateway |
| < 5% (rare use) | CLI or on-demand skill |

For most developers, GitHub has G/N ≈ 5-10% (fetch issue + create PR), making LeanProxy the cost-efficient choice.
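As an illustration, the sketch below encodes these thresholds, assuming the G/N ratio means the fraction of prompts in a session that actually invoke the service's tools; the recommend function is hypothetical, not part of the CLI.

```python
# Sketch of the decision framework above. Assumes the G/N ratio is the fraction
# of prompts in a session that actually invoke the service's tools.

def recommend(invoking_prompts: int, total_prompts: int) -> str:
    ratio = invoking_prompts / total_prompts
    if ratio > 0.40:
        return "Native MCP justified"
    if ratio >= 0.05:
        return "LeanProxy Gateway"
    return "CLI or on-demand skill"

# Example: GitHub invoked in 2 of 30 prompts (fetch issue + create PR), ~6.7%.
print(recommend(2, 30))   # -> "LeanProxy Gateway"
```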

Key Features

| Feature | Description |
|---|---|
| Token Firewall | Pre-configured redaction engine that intercepts secrets, API keys, and PII |
| Shadow Manifesting | Merges global and project-local MCP configurations |
| JIT Discovery | On-demand tool registration to minimize context overhead |
| Dry-Run Mode | Simulate proxy behavior without live execution |
| POSIX CLI | Simple commands for server management |
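For a sense of what a Token Firewall does conceptually, here is an illustrative redaction sketch. The patterns and the redact helper are hypothetical examples of such rules, not LeanProxy's actual redaction engine.

```python
import re

# Illustrative sketch of redaction rules a "Token Firewall" might apply before
# content reaches an LLM provider. These patterns are examples only.
REDACTION_RULES = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),        # OpenAI-style keys
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "[REDACTED_GITHUB_TOKEN]"),   # GitHub PATs
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED_EMAIL]"),  # e-mail addresses
]

def redact(text: str) -> str:
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text

print(redact("Contact dev@example.com, token ghp_" + "a" * 36))
```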

Getting Started

New to LeanProxy-MCP? Start here:

  1. Installation Guide - Download and install
  2. Quick Start - Basic usage
  3. Commands Reference - Full command documentation

Need Help?