80% fewer tokens. Zero code changes. One config line.
KWJ intercepts log floods, repeated reads, and redundant fetches before they reach your LLM. Real result: 2.62M tokens → 254K.
No credit card. 1-hour free trial. $19/month after.
{"mcpServers":{"kwj":{"command":"custom-mcp","args":["serve"]}}}
Works with Claude Code, Cursor, Windsurf, OpenAI SDK + more. See all integrations → · View MCP manifest →
Works with any AI agent — 10 integration examples →
* Measured on the fleet-evolve STAR50 benchmark. Individual results vary by workflow.
import kwj
client = kwj.Client(api_key="kwj_your_key")
# Web cache: zero tokens on repeat fetches (TTL 3600s)
docs = client.web_read("https://docs.anthropic.com/en/api")
# Output compressor: 500 lines -> 30 essential lines
compressed = client.digest(raw_build_log)
# Fuzzy answer cache: skip analysis if already computed
result = client.cache_get("uniswap oracle audit findings")
if not result.hit:
    result = run_expensive_analysis()
    client.cache_put("uniswap oracle audit findings", result)
# Code slicer: 10,000-line file -> one function (98% token cut)
fn_body = client.slice("src/main.rs", symbol="handle_request")
import { KWJClient } from '@kwj/sdk';
const client = new KWJClient({ apiKey: 'kwj_your_key' });
// Web cache: zero tokens on repeat fetches (TTL 3600s)
const docs = await client.webRead('https://docs.anthropic.com/en/api');
// Output compressor: 500 lines -> 30 essential lines
const compressed = await client.digest(rawBuildLog);
// Fuzzy answer cache: skip analysis if already computed
const cached = await client.cacheGet('security audit ERC-20');
if (!cached.hit) {
  const result = await runExpensiveAnalysis();
  await client.cachePut('security audit ERC-20', result);
}
// Code slicer: one symbol from a 10k-line file
const fnBody = await client.slice('src/index.ts', 'handleRequest');
# Web cache - zero tokens on repeat fetches
curl "https://kwj.ai/api/v1/web_read?url=https://docs.anthropic.com&api_key=kwj_xxx"
# Output compressor - 500 lines -> 30 essential lines
curl "https://kwj.ai/api/v1/digest?input=$(cat build.log | python3 -c 'import sys,urllib.parse; print(urllib.parse.quote(sys.stdin.read()))')&api_key=kwj_xxx"
# Fuzzy answer cache lookup (Jaccard 0.35 threshold)
curl "https://kwj.ai/api/v1/cache_get?q=security+audit+uniswap&api_key=kwj_xxx"
# {"ok":true,"hit":true,"value":"...cached analysis..."}
# Document extraction - PDF without full-load token waste
curl "https://kwj.ai/api/v1/doc_extract?file=/tmp/report.pdf&api_key=kwj_xxx"
pip install kwj
npm install @kwj/sdk
export KWJ_API_KEY=kwj_your_key
Live upgrades
The fleet runs an autonomous build loop. New token-saving improvements ship every hour. You get them automatically — no upgrade, no action needed.
The case
Every LLM plan charges you for tokens. KWJ intercepts the waste before it reaches any model — Claude, OpenAI, DeepSeek, or any other.
Measurable savings guarantee: KWJ automatically tracks token reduction via your API key. If verified savings fall below 80% in your first 30 days, your first month is refunded. No questions asked.
Real numbers
No synthetic benchmarks. These are measured reductions from live AI agent sessions.
The same git status runs 300+ times per session without caching. One TTL-keyed hash returns the result instantly on every subsequent call.
300 calls → 1 real call · cache hit rate: 98.7%
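How a TTL-keyed command cache might work, as a minimal sketch rather than KWJ's actual implementation: the command line is hashed into a key, and the stored output is reused until the TTL expires. The `CommandCache` class, its `runner` and `clock` parameters, and the `cwd_tag` argument are all hypothetical names introduced for illustration.

```python
import hashlib
import subprocess
import time
from typing import Callable, Dict, List, Optional, Tuple


class CommandCache:
    """Cache command output under a hash of the command line, with a TTL."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        runner: Optional[Callable[[List[str]], str]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.run = runner or self._run_subprocess
        self.clock = clock
        self.real_calls = 0  # how many times the command actually ran
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _run_subprocess(argv: List[str]) -> str:
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout

    def get(self, argv: List[str], cwd_tag: str = "") -> str:
        # Key = sha256 over the working-directory tag plus every argument.
        raw_key = "\x00".join([cwd_tag, *argv]).encode()
        key = hashlib.sha256(raw_key).hexdigest()
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: no process is spawned
        output = self.run(argv)
        self.real_calls += 1
        self._store[key] = (now, output)
        return output


# A stub runner stands in for git so the example runs anywhere.
cache = CommandCache(ttl_seconds=60, runner=lambda argv: "clean\n")
for _ in range(300):
    cache.get(["git", "status", "--short"])
print(cache.real_calls)  # 1
```

A cache like this trades freshness for tokens: a production version would also need to invalidate entries when the working tree changes, which this sketch does not attempt.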
Any agent reads entire 2,000-line files when it needs one function. Symbol-slicing sends only the relevant span — 40 lines instead of 2,000.
2,000 lines → 40 lines per read · 98% token cut
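Symbol slicing can be illustrated for Python sources with the standard library alone. This is a sketch under assumptions, not the KWJ slicer: the `slice_symbol` helper is hypothetical, it handles only Python, and it ignores decorators and nested-scope ambiguity.

```python
import ast
from typing import Optional


def slice_symbol(source: str, symbol: str) -> Optional[str]:
    """Return the source text of a function or class named `symbol`, if any."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_def = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if is_def and node.name == symbol:
            # get_source_segment returns exactly the span the node covers.
            return ast.get_source_segment(source, node)
    return None


SAMPLE = '''\
def helper(x):
    return x + 1


def handle_request(req):
    """Handle one request."""
    return helper(req)
'''

print(slice_symbol(SAMPLE, "handle_request"))
```

Only the requested definition goes into the prompt; the rest of the file never reaches the model.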
Build logs bloat context. The compressor collapses repeated lines, elides middles, and always rescues error/warning lines. Errors are never lost.
500-line log → 30 lines · errors/warnings always preserved
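The three compressor steps (collapse repeats, elide the middle, rescue errors and warnings) can be sketched in a few lines. The `digest` function below, its `head` and `tail` parameters, and the `KEEP` pattern are illustrative assumptions, not the KWJ compressor itself.

```python
import re
from itertools import groupby

# Assumed pattern for lines that must survive compression.
KEEP = re.compile(r"\b(error|warning|failed|panic)\b", re.IGNORECASE)


def digest(log: str, head: int = 10, tail: int = 10) -> str:
    # 1. Collapse runs of identical consecutive lines into one counted line.
    collapsed = []
    for line, run in groupby(log.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  [repeated x{count}]")

    # 2. Short logs pass through untouched.
    if len(collapsed) <= head + tail:
        return "\n".join(collapsed)

    # 3. Elide the middle, but rescue every error/warning line inside it.
    split = len(collapsed) - tail
    middle = collapsed[head:split]
    rescued = [line for line in middle if KEEP.search(line)]
    out = collapsed[:head]
    out.append(f"... [{len(middle) - len(rescued)} lines elided] ...")
    out.extend(rescued)
    out.extend(collapsed[split:])
    return "\n".join(out)
```

The rescued lines are emitted in their original order, so a stack of related errors still reads top to bottom after compression.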
Full memory files load on every turn. TF-IDF indexing pulls only the 3–5 relevant fact chunks instead of loading the entire knowledge base.
TF-IDF · per-file mtime indexing · auto-reindex on change
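A toy TF-IDF ranker shows the idea of pulling only the top few chunks. This is a sketch, assuming whitespace-level tokenization and a smoothed IDF; the `recall` and `tokenize` helpers are hypothetical, and the per-file mtime reindexing mentioned above is left out.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def recall(query: str, chunks: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank chunks by summed TF-IDF of the query terms; return the top_k."""
    docs = [Counter(tokenize(chunk)) for chunk in chunks]
    n_docs = len(docs)
    # Document frequency: in how many chunks does each term appear?
    df = Counter(term for doc in docs for term in doc)

    def idf(term: str) -> float:
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed

    query_terms = set(tokenize(query))
    scored = []
    for chunk, doc in zip(chunks, docs):
        total = sum(doc.values()) or 1
        score = sum((doc[t] / total) * idf(t) for t in query_terms if t in doc)
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


facts = [
    "The deploy pipeline uses GitHub Actions and Docker.",
    "Database migrations run with Alembic before each release.",
    "The team lunch is on Fridays.",
]
for score, chunk in recall("how do database migrations run", facts):
    print(chunk)
```

Chunks that share no term with the query score zero and are dropped, so an unrelated knowledge base costs no tokens at all.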
Expensive analysis gets re-derived from scratch. Jaccard shingle similarity at 0.35 threshold matches near-identical queries and returns the stored answer immediately.
Jaccard 0.35 threshold · sha256 content-address · $0 on hit
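The fuzzy lookup can be approximated as follows. This sketch assumes word-level shingles of size `k` (default 1) and keys each entry by the sha256 of the normalized query; the real shingle size and addressing scheme are not stated here, so treat `AnswerCache`, `shingles`, and `jaccard` as hypothetical.

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 1) -> Set[Shingle]:
    """Set of k-word windows over the lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class AnswerCache:
    def __init__(self, threshold: float = 0.35, k: int = 1) -> None:
        self.threshold = threshold
        self.k = k
        self._entries: Dict[str, Tuple[Set[Shingle], str]] = {}

    def put(self, query: str, value: str) -> str:
        key = hashlib.sha256(query.lower().encode()).hexdigest()
        self._entries[key] = (shingles(query, self.k), value)
        return key

    def get(self, query: str) -> Optional[str]:
        wanted = shingles(query, self.k)
        best_score, best_value = 0.0, None
        for stored, value in self._entries.values():
            score = jaccard(wanted, stored)
            if score > best_score:
                best_score, best_value = score, value
        return best_value if best_score >= self.threshold else None


cache = AnswerCache()
cache.put("uniswap oracle audit findings", "cached analysis")
# Shared words {uniswap, audit} over 5 distinct words: 2 / 5 = 0.4 >= 0.35
print(cache.get("security audit uniswap"))  # cached analysis
```

A threshold of 0.35 is permissive: two shared words out of five are enough for a hit, so it suits workloads where near-identical questions really do deserve the same answer.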
Multi-model
80% fewer tokens sent to any LLM. Same output. Works with Claude, OpenAI, DeepSeek, and 7 more.
Add to mcp.json — tools auto-appear. Native MCP protocol, zero extra setup.
MCPServerStdio(params={"command": "custom-mcp", "args": ["serve"]}) in Python. All 50 tools available instantly.
CLI tools pipe slim context to any model endpoint. No SDK dependency required.
MultiServerMCPClient adapter — all tools available to any LangChain agent or graph node.
StdioMcpToolAdapter with custom-mcp serve — works with any AutoGen agent pattern.
Add to .cursor/mcp.json or .continue/config.json — one line, all tools appear.
Any agent that supports MCP or can run shell commands gets the full tool suite. See all 10 integration examples →
The tools
Every tool targets a measured token sink. The fleet compounds: each tool makes every other tool cheaper to call.
Runs shell commands once, reuses the result for hours. Eliminates 300+ repeated identical calls per session via TTL-keyed hash.
Reads only the function you need. Extracts one symbol from a 10,000-line file. Rust, Python, TypeScript, Go, JavaScript.
Fetches web pages once, serves from cache for days. TTL-keyed by URL hash. Eliminates redundant documentation fetches entirely.
Shrinks 500-line logs to 30 essential lines. Collapses noise, always rescues error and warning lines. Failures are never dropped.
Finds relevant facts without loading all memory. TF-IDF pulls only the 3–5 fact chunks you need. Auto-reindexes on file change.
Never recomputes the same analysis twice. Fuzzy Jaccard shingle similarity at 0.35 threshold catches near-identical queries.
PDF, CSV, and XLSX extraction without token waste. 3-tier PDF pipeline: native text, OCR, Rust fallback. Structured output.
One call to 25+ tools. No repeated tool loading. Web read, vault, cache, recall — all one hop away via the MCP protocol.
Git cache, job scheduler, task queue, code auditor, search engine. All included at $19/month.
Built for
Smart Reader slices only the function under review. Output Compressor strips test noise from Slither/Echidna output. Answer Cache reuses prior invariant analysis.
98% token cut on symbol reads · tools: custom-read, custom-digest, custom-cache
Web Cache serves repeated URL fetches at $0. Memory Search pulls only the 3–5 relevant fact chunks from a large knowledge base using TF-IDF. Answer Cache short-circuits repeated analysis.
95% memory recall cut · tools: custom-websearch, custom-recall, custom-cache
Context Map gives a symbol outline of any directory. Symbol Slicer sends only the function you asked about. Command Cache deduplicates git status, git log, and build commands.
300 git calls → 1 real call · tools: custom-context, custom-read, custom-bash
Web Cache TTLs price feeds to avoid redundant fetches. Output Compressor trims verbose API responses. Vault stores API credentials with AES-256-GCM so secrets never hit the prompt.
$0 per cache hit · tools: custom-websearch, custom-digest, custom-vault
Your savings
Based on 90% average token reduction applied to your plan cost.
Setup
01
Enter your email below. Key issued instantly. No credit card required. 1-hour trial, then $19/month.
02
{"mcpServers":{"kwj":{"command":"custom-mcp","args":["serve"]}}}
All 50 tools register as MCP endpoints automatically. Works with Claude Code, Cursor, Windsurf, OpenAI SDK + more. See all integrations →
03
Track token reduction live. Most users see measurable cuts in the first session. The fleet compounds with use.
Common questions
A token is roughly 4 characters of text — a word, a symbol, a fragment of code. Every API call to any LLM is priced per token in and per token out. A 500-line log file is about 8,000 tokens. If your agent reads it once per turn and you have 20 turns in a session, that one file costs 160,000 tokens — before you've done any real work. KWJ intercepts that before it reaches any model.
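The arithmetic above, worked through as a small sketch. The 4-characters-per-token rule and the 8,000-token log come from the text; the assumption that all log lines are the same length, and the helper names, are ours.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def session_cost(tokens_per_read: int, turns: int) -> int:
    return tokens_per_read * turns


LOG_LINES, LOG_TOKENS, TURNS = 500, 8_000, 20

raw = session_cost(LOG_TOKENS, TURNS)

# If the log is cut to 30 lines of average length (an assumption):
tokens_per_line = LOG_TOKENS / LOG_LINES
digested = session_cost(round(30 * tokens_per_line), TURNS)

print(raw, digested, f"{1 - digested / raw:.0%}")  # 160000 9600 94%
```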
Enter your email and you get an API key immediately — no credit card needed. The key is valid for 1 hour and includes 100 API calls across all 50 tools. That's enough to run a real session and see the token savings first-hand. After the trial expires, you choose whether to subscribe at $19/month.
Your trial key stops accepting new requests. Your agent session continues normally — KWJ tools simply return an auth error and your agent falls back to its default behavior. None of your work is lost. Subscribe at any point to reactivate the same key or get a new one.
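For agents that call KWJ from their own code rather than through MCP, the same fallback can be made explicit. This is a minimal sketch: `KwjAuthError` is a hypothetical stand-in for whatever error an expired key actually produces, and `with_fallback` is an illustrative helper, not part of any SDK.

```python
from typing import Callable


class KwjAuthError(Exception):
    """Hypothetical stand-in for the error raised on an expired key."""


def with_fallback(
    tool: Callable[[str], str], default: Callable[[str], str]
) -> Callable[[str], str]:
    """Wrap a tool so an auth failure falls back to the default behavior."""

    def call(arg: str) -> str:
        try:
            return tool(arg)
        except KwjAuthError:
            return default(arg)  # e.g. send the uncompressed input instead

    return call


def expired_digest(log: str) -> str:
    raise KwjAuthError("trial key expired")


digest_or_raw = with_fallback(expired_digest, default=lambda log: log)
print(digest_or_raw("build ok"))  # build ok
```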
KWJ works with any agent that supports MCP (Claude Code, OpenAI Agents SDK, Cursor, Windsurf, Continue.dev, LangChain, LangGraph, AutoGen) or can run shell commands (any Python/Node.js/bash-based agent). The token savings are LLM-agnostic — compressing context before sending to any model saves tokens. See /agents for integration examples.
Integration
Drop KWJ into any agent workflow. No framework lock-in. Works with LangChain, OpenAI SDK, Anthropic SDK, raw HTTP, or any MCP client. See all 10 integration examples →
import openai # or: anthropic, deepseek, groq — same pattern
import kwj
# Works with any LLM provider — swap openai.OpenAI() for any other client
llm = openai.OpenAI(api_key="sk-your-key") # or Anthropic(), DeepSeek(), etc.
kwj_client = kwj.Client(api_key="kwj_your_key")
def run_agent_turn(user_message: str) -> str:
    # 1. Shrink context before sending to any LLM
    compressed_logs = kwj_client.digest(read_build_logs())
    relevant_facts = kwj_client.recall("project architecture decisions")
    code_symbol = kwj_client.slice("src/main.rs", symbol="handle_request")

    # 2. Check answer cache: skip the LLM entirely on a hit (any model)
    cache_hit = kwj_client.cache_get(user_message)
    if cache_hit.hit:
        return cache_hit.value

    # 3. Call your LLM with lean, pre-shrunk context (90% fewer tokens)
    response = llm.chat.completions.create(
        model="gpt-4o",  # or claude-sonnet-4-5, deepseek-chat, llama-3.1-70b…
        max_tokens=2048,
        messages=[
            {"role": "user", "content": (
                f"Context:\n{relevant_facts}\n\n"
                f"Recent build output:\n{compressed_logs}\n\n"
                f"Relevant code:\n{code_symbol}\n\n"
                f"Question: {user_message}"
            )}
        ]
    )
    result = response.choices[0].message.content

    # 4. Store in cache: future turns skip the LLM entirely on a match
    kwj_client.cache_put(user_message, result)
    return result
# Works with Claude Code, Cursor, Windsurf, Continue.dev, OpenAI Agents SDK,
# LangChain, LangGraph, AutoGen — any MCP-compatible client.
# Add to ~/.claude/mcp.json, .cursor/mcp.json, or .continue/config.json:
{
  "mcpServers": {
    "kwj": {
      "command": "custom-mcp",
      "args": ["serve"],
      "env": {
        "KWJ_API_KEY": "kwj_your_key_here"
      }
    }
  }
}
# All 50 KWJ tools register automatically:
# kwj_web_read — cached web fetch (TTL 3600s)
# kwj_digest — shrink log/command output
# kwj_slice — extract one symbol from a file
# kwj_cache_get — fuzzy answer cache lookup
# kwj_cache_put — store result for future turns
# kwj_recall — TF-IDF memory search
# kwj_doc_extract — PDF / XLSX / CSV extraction
# kwj_git — cached git wrapper
# ... and 42 more
# See /agents for Python, TypeScript, bash, and per-framework examples.
$ kwj ping --api-key kwj_your_key_here
# {"ok":true,"tools":50,"plan":"trial","expires_in":"58m"}
Architecture
kwj sits between your agent framework and the LLM. It intercepts every prompt, compresses it, checks the cache, and routes the slim version — so your agent pays for 20% of the tokens it used to burn.
Early users
“KWJ cut our Claude Code bill in half. We were burning 8M tokens a day on build log reads alone. custom-digest + custom-bash took that to under 400K. Nothing else we tried came close.”
Senior Platform Engineer
“I run 50+ agent sessions a day auditing smart contracts. Before KWJ, my OpenAI bill was brutal. Now the answer cache short-circuits 80% of repeat analysis. $19/month is a rounding error on what I was spending.”
Smart Contract Auditor
“MCP integration took 2 minutes. Added it to my .claude/mcp.json and every tool just appeared. The symbol slicer alone stopped me reading 9,000-line files 30 times a day. Immediate, measurable.”
indie developer, Claude Code user
Pricing
One plan. Everything included. No per-seat, no per-call, no overages.
Monthly
$19 per month · 1-hour free trial
Annual
$15 per month, billed $180/yr · Save $48/yr (2 months free)
Start now
Key arrives instantly. No credit card. Unlock all 50 tools free for 1 hour — most users see 80%+ reduction on their first session.
No credit card required. 1-hour trial, 100 free API calls across all 50 tools. No strings attached.