KWJ is an AI agent infrastructure API providing 50 token-saving tools for $19/month. Works with Claude, OpenAI, DeepSeek, LangChain, AutoGen, and any MCP-compatible agent.

How much does KWJ cost?

$19/month for all 50 tools with unlimited cache hits.

How does KWJ reduce token usage?

KWJ intercepts expensive patterns: repeated shell calls are cached with TTL, large files are sliced to the needed symbol only, logs are compressed before entering context, and prior analysis is reused via fuzzy answer caching. Combined, these cut token usage by up to 90.3%.

Does KWJ work with Claude Code?

Yes. KWJ is designed for Claude Code. Set KWJ_API_KEY in your environment and add the KWJ MCP server to your MCP configuration file (claude_desktop_config.json is the Claude Desktop file; Claude Code reads a project-level .mcp.json). All 50 tools register automatically as MCP tools, with no SDK changes required.

What is an MCP server for AI agents?

MCP (Model Context Protocol) is an open standard that lets AI agents like Claude, OpenAI Agents SDK, Cursor, and Windsurf call external tools. KWJ ships as an MCP server, so any MCP-compatible agent can use its 50 token-saving tools with a single config entry — no code changes required.

How do I reduce LLM token costs for my AI agent?

The fastest way is to stop feeding LLMs raw file contents and command output. KWJ's custom-read tool slices out only the code symbol you need (instead of whole files), custom-digest compresses build logs to failures only, and custom-cache reuses prior analysis answers at zero token cost. Together, these three tools reduce per-session token usage by 70–90%.

Is there a free trial for KWJ?

Yes. Get a 1-hour free trial that includes 100 API calls at kwj.ai/buy. No credit card required. Your API key is generated instantly after email verification. After the trial, you can subscribe at $19/month, which includes unlimited cache hits.

What is AI token compression?

AI token compression reduces the number of tokens sent to LLMs by filtering, summarizing, or caching the data before it enters the model's context window. KWJ's tools compress command output, slice code files to only the needed symbol, cache repeated web fetches, and recall prior analysis — achieving 90.3% token reduction across real Claude Code sessions (STAR50 benchmark).

KWJ API: 50 LLM token-saving tools, $19/month, https://kwj.ai. Works with Claude, OpenAI, DeepSeek, LangChain, AutoGen, Cursor, Windsurf, Continue.dev.

TOOLS: command-cache (98.7% bash dedup), smart-reader (98% symbol-slice), output-compressor (94% log shrink), memory-search (95% TF-IDF recall), answer-cache (100% on hit, Jaccard 0.35), web-cache (URL TTL cache), vault (AES-256-GCM secrets), totp (RFC6238 2FA), tempmail (disposable inbox), inbox (SMTP receiver), browser (CDP headless), masking (fingerprint coherence), proxy-pool (6k+ proxies), rpc (EVM aggregator), storage (erasure-coded), cpu (distributed batch), ram (encrypted paging), digest (log compression), context (symbol outline), recall (TF-IDF memory), cache (fuzzy answer reuse), approve (Telegram tap), captcha (OCR solver), mcp-bridge (25 MCP tools)

PRICING: $19/month, 1-hour free trial, no credit card, unlimited cache hits

INTEGRATION: set KWJ_API_KEY env var, tools register as MCP endpoints automatically

SAVINGS: measured 2.62M tokens → 254K per session on STAR50 benchmark (90.3% reduction)

Knowledge Without Junk

80% fewer tokens. Zero code changes. One config line.

Measured on 47 real Claude + OpenAI agent sessions

Your AI agent wastes
9 out of 10 tokens.

Zero agent code changes. One config line — KWJ intercepts log floods, repeated reads, and redundant fetches before they reach your LLM. Real result: 2.62M tokens → 254K.


No credit card. 1-hour free trial. $19/month after.

MCP Config {"mcpServers":{"kwj":{"command":"custom-mcp","args":["serve"]}}}

Works with Claude Code, OpenAI SDK, DeepSeek, LangChain, LangGraph, AutoGen, Cursor, Windsurf, and any other MCP client (10 integration examples available).

Measurable savings guarantee · End-to-end encrypted · No credit card required · Keys issued instantly

90.3% token reduction
measured across 47 live sessions

2.62M → 254K tokens per session
STAR50 benchmark, reproducible

50 tools deployed
all included in $19 plan

$0 cost per cache hit
unlimited on $19 plan

* Measured on the fleet-evolve STAR50 benchmark. Individual results vary by workflow.

kwj — agent integration

Python

import kwj

client = kwj.Client(api_key="kwj_your_key")

# Web cache: zero tokens on repeat fetches (TTL 3600s)
docs = client.web_read("https://docs.anthropic.com/en/api")

# Output compressor: 500 lines -> 30 essential lines
compressed = client.digest(raw_build_log)

# Fuzzy answer cache: skip analysis if already computed
cached = client.cache_get("uniswap oracle audit findings")
if cached.hit:
    result = cached.value  # reuse the stored answer
else:
    result = run_expensive_analysis()
    client.cache_put("uniswap oracle audit findings", result)

# Code slicer: 10,000-line file -> one function (98% token cut)
fn_body = client.slice("src/main.rs", symbol="handle_request")

TypeScript

import { KWJClient } from '@kwj/sdk';

const client = new KWJClient({ apiKey: 'kwj_your_key' });

// Web cache: zero tokens on repeat fetches (TTL 3600s)
const docs = await client.webRead('https://docs.anthropic.com/en/api');

// Output compressor: 500 lines -> 30 essential lines
const compressed = await client.digest(rawBuildLog);

// Fuzzy answer cache: skip analysis if already computed
const cached = await client.cacheGet('security audit ERC-20');
if (!cached.hit) {
  const result = await runExpensiveAnalysis();
  await client.cachePut('security audit ERC-20', result);
}

// Code slicer: one symbol from a 10k-line file
const fnBody = await client.slice('src/index.ts', 'handleRequest');

curl

# Web cache - zero tokens on repeat fetches
curl "https://kwj.ai/api/v1/web_read?url=https://docs.anthropic.com&api_key=kwj_xxx"

# Output compressor - 500 lines -> 30 essential lines
curl "https://kwj.ai/api/v1/digest?input=$(cat build.log | python3 -c 'import sys,urllib.parse; print(urllib.parse.quote(sys.stdin.read()))')&api_key=kwj_xxx"

# Fuzzy answer cache lookup (Jaccard 0.35 threshold)
curl "https://kwj.ai/api/v1/cache_get?q=security+audit+uniswap&api_key=kwj_xxx"
# {"ok":true,"hit":true,"value":"...cached analysis..."}

# Document extraction - PDF without full-load token waste
curl "https://kwj.ai/api/v1/doc_extract?file=/tmp/report.pdf&api_key=kwj_xxx"
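
The curl calls above place a full URL or free text inside a query string, so the value needs percent-encoding or any `&`, `?`, or space in it would be misread. A minimal Python sketch of building such URLs safely follows. The base URL, endpoint names, and parameter names are copied from the curl examples; the `build_url` helper is hypothetical and only constructs the URL, it does not send a request.

```python
from urllib.parse import urlencode

BASE = "https://kwj.ai/api/v1"  # taken from the curl examples above


def build_url(endpoint: str, **params: str) -> str:
    """Return a request URL with every parameter value percent-encoded."""
    return f"{BASE}/{endpoint}?{urlencode(params)}"


# The nested docs URL is encoded, so its own ':' and '/' cannot break the query.
print(build_url("web_read", url="https://docs.anthropic.com", api_key="kwj_xxx"))
print(build_url("cache_get", q="security audit uniswap", api_key="kwj_xxx"))
```

Passing the key as a query parameter mirrors the examples; query strings often end up in logs, so treat those URLs as secrets.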

pip: pip install kwj

npm: npm install @kwj/sdk

env: export KWJ_API_KEY=kwj_your_key

Live upgrades

Self-improving every hour.

The fleet runs an autonomous build loop. New token-saving improvements ship every hour. You get them automatically — no upgrade, no action needed.

fleet-evolve — latest deployments

06-20 14:00 NEW custom-bash: TTL-keyed shell cache — eliminates 300+ repeated git/cargo calls per session (98.7% hit rate)

06-20 13:00 DEEPEN custom-digest: incremental tail mode — only digests new log lines since last poll, O(1) cost on long builds

06-20 12:00 INTEGRATE custom-mcp: fleet_plan + cache_get wired together — tool routing is now cache-first by default

06-20 11:00 DEEPEN custom-read: symbol-slice now supports Go + Python dataclasses, 4 new language parsers added

06-20 10:00 NEW custom-recall: TF-IDF auto-reindex on file mtime change — no manual index step ever needed
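
The "incremental tail mode" entry above describes digesting only the log lines added since the last poll. One way to sketch that idea in Python is to remember a byte offset between polls. This is an illustration under assumptions, not the custom-digest implementation: the `LogTail` class and its method names are hypothetical.

```python
import os


class LogTail:
    """Return only the complete lines appended to a file since the last poll."""

    def __init__(self, path: str):
        self.path = path
        self.offset = 0  # bytes already consumed

    def read_new(self) -> list[str]:
        # If the file shrank (rotation or truncation), start again from the top.
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
        # Consume only up to the last newline so a half-written line is
        # left for the next poll instead of being split in two.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

Each poll then costs time proportional to the new bytes rather than to the whole log, which is what keeps long builds cheap to follow.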

WEEK OF JUN 16

custom-bash: TTL-keyed shell cache, 98.7% hit rate on repeated git/cargo calls (−98.7%)

custom-read: symbol-slice Go + Python dataclass support, 4 new language parsers (−97%)

custom-mcp: fleet_plan + cache_get wired together, routing now cache-first by default (−60%)

custom-digest: incremental tail mode, O(1) cost on growing logs (−94%)

custom-recall: auto-reindex on mtime change, no manual index step (−95%)

WEEK OF JUN 9

custom-git: cached git wrapper, eliminates 300+ repeated git calls per session (−99%)

custom-queue: durable SQLite task queue with DAG + retry/backoff (new)

custom-cron: self-healing cron scheduler with Telegram alerts on failure (new)

custom-doc: PDF/CSV/XLSX extraction with table/search/convert (new)

custom-audit: slither/echidna/foundry/cargo-audit unified findings (new)

The case

Why pay more for tokens you don't need to burn?

Every LLM plan charges you for tokens. KWJ intercepts the waste before it reaches any model — Claude, OpenAI, DeepSeek, or any other.

| Feature | Raw LLM plan alone | KWJ + any LLM plan |
|---|---|---|
| Token capacity | Fixed plan limit | Same limit + 90% savings = 10x effective capacity |
| Infrastructure tools | None | 50 tools: cache, compress, slice, recall |
| Self-improving | No | Yes: improvements ship every hour, auto-delivered |
| Cache hit cost | Still burns tokens | $0: answer served from cache, zero LLM calls |
| Works with | One provider | Claude, OpenAI, DeepSeek, LangChain, AutoGen + more |
| Savings guarantee | None | 80% verified reduction or first month refunded (measured, not self-reported) |
| Monthly savings | n/a | Up to 90% fewer tokens. Same output. |
| API usage tracking | No per-call visibility | Per-endpoint usage dashboard included |
| Tool updates | No self-improvement | Auto-delivered every hour: no upgrade, no action needed |
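
The "10x effective capacity" figure in the comparison follows directly from the savings rate: if a fraction r of tokens is removed, the same plan limit covers 1 / (1 - r) times as much work. A short worked check in Python, using the session figures quoted on this page (the function names are just for illustration):

```python
def reduction(before: float, after: float) -> float:
    """Fraction of tokens removed, e.g. 0.9 for a 90% cut."""
    return 1 - after / before


def capacity_multiplier(r: float) -> float:
    """How much further a fixed token budget stretches at reduction r."""
    return 1 / (1 - r)


star50 = reduction(2_620_000, 254_000)       # the 2.62M -> 254K session figure
print(f"{star50:.1%}")                       # 90.3%
print(f"{capacity_multiplier(0.90):.1f}x")   # 10.0x
print(f"{capacity_multiplier(0.80):.1f}x")   # 5.0x at the guaranteed 80% floor
```

The multiplier is steep near the top of the range: the guaranteed 80% floor corresponds to 5x, while 90% corresponds to 10x.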

Measurable savings guarantee: KWJ automatically tracks token reduction via your API key. If verified savings fall below 80% in your first 30 days, your first month is refunded. No questions asked.

Real numbers

Every number is a real session measurement.

No synthetic benchmarks. These are measured reductions from live AI agent sessions.

Command Cache — shell call deduplication 98.7%

The same git status runs 300+ times per session without caching. One TTL-keyed hash returns the result instantly on every subsequent call.

300 calls → 1 real call · cache hit rate: 98.7%
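
A TTL-keyed command cache of this kind can be sketched in a few lines of Python. This is a minimal illustration, not KWJ's implementation: the `CommandCache` class, its `ttl_seconds` parameter, and the choice of a SHA-256 hash over the argument vector as the cache key are all assumptions made for the example.

```python
import hashlib
import subprocess
import sys
import time


class CommandCache:
    """Run a command once and reuse its stdout until the entry expires."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.real_calls = 0
        self._entries = {}  # key -> (timestamp, stdout)

    def run(self, argv: list[str]) -> str:
        key = hashlib.sha256("\0".join(argv).encode()).hexdigest()
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: no process is spawned
        self.real_calls += 1
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        self._entries[key] = (now, result.stdout)
        return result.stdout


if __name__ == "__main__":
    cache = CommandCache(ttl_seconds=300)
    for _ in range(300):
        cache.run([sys.executable, "-c", "print('nothing to commit')"])
    print(cache.real_calls)  # 1
```

A production cache would also need to key on the working directory and invalidate when the underlying state changes (for example after a commit); both are left out here to keep the sketch short.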

Smart Reader — symbol slicing 98%

Any agent reads entire 2,000-line files when it needs one function. Symbol-slicing sends only the relevant span — 40 lines instead of 2,000.

2,000 lines → 40 lines per read · 98% token cut
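
For Python sources, symbol slicing can be approximated with the standard-library `ast` module. The sketch below is an assumption about how such a slicer might work, not the Smart Reader implementation, and it handles Python only; the `slice_symbol` name is hypothetical.

```python
import ast


def slice_symbol(source: str, symbol: str) -> str:
    """Return the source text of the first function or class named `symbol`."""
    tree = ast.parse(source)
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(tree):
        if isinstance(node, wanted) and node.name == symbol:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(f"symbol not found: {symbol}")
```

Only the returned span would be placed in the prompt. Note that `ast.get_source_segment` starts at the `def` or `class` keyword, so decorators above the symbol are not included in this version.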

Output Compressor — log noise removal 94%

Build logs bloat context. The compressor collapses repeated lines, elides middles, and always rescues error/warning lines. Errors are never lost.

500-line log → 30 lines · errors/warnings always preserved
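
The three behaviours described above (collapse repeats, elide the middle, rescue error and warning lines) can be sketched as follows. This is a simplified illustration under assumptions: the marker words in `KEEP`, the `head` and `tail` sizes, and the annotation format are all invented for the example and are not taken from KWJ.

```python
import re
from itertools import groupby

# Lines matching this pattern are always kept (the marker words are an assumption).
KEEP = re.compile(r"\b(error|warning|failed|panic)\b", re.IGNORECASE)


def compress_log(text: str, head: int = 5, tail: int = 5) -> str:
    """Collapse repeated lines, elide the middle, and rescue error/warning lines."""
    # Step 1: collapse runs of identical consecutive lines into one annotated line.
    collapsed = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  [repeated x{count}]")

    if len(collapsed) <= head + tail:
        return "\n".join(collapsed)

    # Step 2: keep the head and tail, and rescue important lines from the middle.
    middle = collapsed[head:len(collapsed) - tail]
    rescued = [line for line in middle if KEEP.search(line)]
    elided = len(middle) - len(rescued)

    out = collapsed[:head]
    if elided:
        out.append(f"... [{elided} lines elided] ...")
    out.extend(rescued)
    out.extend(collapsed[len(collapsed) - tail:])
    return "\n".join(out)
```

Because rescued lines are appended after the elision marker, their original position in the log is lost; keeping line numbers alongside each rescued line would be a natural extension.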

Memory Search — TF-IDF fact retrieval 95%

Full memory files load on every turn. TF-IDF indexing pulls only the 3–5 relevant fact chunks instead of loading the entire knowledge base.

TF-IDF · per-file mtime indexing · auto-reindex on change
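
A small pure-Python TF-IDF ranker shows the retrieval idea: score each fact chunk against the query and return only the top few. The tokenizer, the smoothed IDF formula, and the `top_chunks` name are assumptions for this sketch; KWJ's actual scoring and its mtime-based indexing are not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return up to k chunks with a non-zero TF-IDF score, best first."""
    docs = [Counter(tokenize(chunk)) for chunk in chunks]
    n_docs = len(docs)
    # Document frequency: in how many chunks each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)

    def idf(term: str) -> float:
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    query_terms = set(tokenize(query))
    scored = []
    for index, doc in enumerate(docs):
        length = sum(doc.values()) or 1
        score = sum((doc[t] / length) * idf(t) for t in query_terms if t in doc)
        if score > 0:
            scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[index] for _, index in scored[:k]]
```

Chunks that share no term with the query are dropped entirely, so an unrelated question adds nothing to the prompt.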

Answer Cache — fuzzy analysis reuse 100% on hit

Expensive analysis gets re-derived from scratch. Jaccard shingle similarity at 0.35 threshold matches near-identical queries and returns the stored answer immediately.

Jaccard 0.35 threshold · sha256 content-address · $0 on hit
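
Fuzzy matching with Jaccard similarity over shingles can be sketched as below. The 0.35 threshold comes from the text above; everything else is an assumption for illustration, including the use of word bigrams as shingles and the `AnswerCache` class. The sha256 content-addressing mentioned above is not modelled.

```python
def shingles(text: str, k: int = 2) -> set:
    """Set of overlapping k-word tuples from the lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class AnswerCache:
    def __init__(self, threshold: float = 0.35):
        self.threshold = threshold
        self._entries = []  # list of (shingle set, stored answer)

    def put(self, query: str, answer: str) -> None:
        self._entries.append((shingles(query), answer))

    def get(self, query: str):
        """Return the best-matching stored answer, or None below the threshold."""
        probe = shingles(query)
        best_score, best_answer = 0.0, None
        for stored, answer in self._entries:
            score = jaccard(probe, stored)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

With word bigrams, "uniswap v3 oracle audit findings" and the same query with "summary" appended share 4 of 5 shingles (0.8), so the second is a hit. A threshold as low as 0.35 favours hit rate over precision, so questions that are merely related can also match.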

Multi-model

Works across every major AI agent.

80% fewer tokens sent to any LLM. Same output. Works with Claude, OpenAI, DeepSeek, and 7 more.

NATIVE MCP

Claude Code

Add to mcp.json — tools auto-appear. Native MCP protocol, zero extra setup.

PYTHON

OpenAI Agents SDK

MCPServerStdio(params={"command": "custom-mcp", "args": ["serve"]}) in Python. All 50 tools available instantly.

CLI

DeepSeek / any OpenAI-compat API

CLI tools pipe slim context to any model endpoint. No SDK dependency required.

PYTHON

LangChain / LangGraph

MultiServerMCPClient adapter — all tools available to any LangChain agent or graph node.

PYTHON

AutoGen (Microsoft)

StdioMcpToolAdapter with custom-mcp serve — works with any AutoGen agent pattern.

ONE LINE

Cursor / Windsurf / Continue

Add to .cursor/mcp.json or .continue/config.json — one line, all tools appear.

Any agent that supports MCP or can run shell commands gets the full tool suite. See all 10 integration examples →

The tools

50 tools. One API key.

Every tool targets a measured token sink. The fleet compounds: each tool makes every other tool cheaper to call.

Command Cache (cache): Runs shell commands once, reuses the result for hours. Eliminates 300+ repeated identical calls per session via TTL-keyed hash.

Smart Reader + Code Slicer (slice): Reads only the function you need. Extracts one symbol from a 10,000-line file. Rust, Python, TypeScript, Go, JavaScript.

Web Cache (cache): Fetches web pages once, serves from cache for days. TTL-keyed by URL hash. Eliminates redundant documentation fetches entirely.

Output Compressor (compress): Shrinks 500-line logs to 30 essential lines. Collapses noise, always rescues error and warning lines. Failures are never dropped.

Memory Search (recall): Finds relevant facts without loading all memory. TF-IDF pulls only the 3–5 fact chunks you need. Auto-reindexes on file change.

Answer Cache (cache): Never recomputes the same analysis twice. Fuzzy Jaccard shingle similarity at 0.35 threshold catches near-identical queries.

Document Parser (parse): PDF, CSV, and XLSX extraction without token waste. 3-tier PDF pipeline: native text, OCR, Rust fallback. Structured output.

MCP Bridge (bridge): One call to 25+ tools. No repeated tool loading. Web read, vault, cache, recall: all one hop away via the MCP protocol.

+42 more: Browser, proxy, email, TOTP, compute, storage, vault… Git cache, job scheduler, task queue, code auditor, search engine. All included at $19/month.

Built for

Every agent type covered.

Security Auditor

Audit a 40k-line codebase without melting context

Smart Reader slices only the function under review. Output Compressor strips test noise from Slither/Echidna output. Answer Cache reuses prior invariant analysis.

98% token cut on symbol reads · tools: custom-read, custom-digest, custom-cache

Research Agent

Fetch, index, and recall without re-downloading

Web Cache serves repeated URL fetches at $0. Memory Search pulls only the 3–5 relevant fact chunks from a large knowledge base using TF-IDF. Answer Cache short-circuits repeated analysis.

95% memory recall cut · tools: custom-websearch, custom-recall, custom-cache

Coding Assistant

Navigate large repos without reading whole files

Context Map gives a symbol outline of any directory. Symbol Slicer sends only the function you asked about. Command Cache deduplicates git status, git log, and build commands.

300 git calls → 1 real call · tools: custom-context, custom-read, custom-bash

Finance Data Agent

Pull market data once, use it all session long

Web Cache TTLs price feeds to avoid redundant fetches. Output Compressor trims verbose API responses. Vault stores API credentials with AES-256-GCM so secrets never hit the prompt.

$0 per cache hit · tools: custom-websearch, custom-digest, custom-vault

Setup

Three steps. No new code.

1. Get your free API key

Enter your email. Key issued instantly. No credit card required. 1-hour trial, then $19/month.

2. Add KWJ to your agent

{"mcpServers":{"kwj":{"command":"custom-mcp","args":["serve"]}}}

All 50 tools register as MCP endpoints automatically. Works with Claude Code, Cursor, Windsurf, OpenAI SDK + more.

3. Watch the savings

Track token reduction live. Most users see measurable cuts in the first session. The fleet compounds with use.

Common questions

FAQ answered simply.

What is a token and why does it matter?

A token is roughly 4 characters of text — a word, a symbol, a fragment of code. Every API call to any LLM is priced per token in and per token out. A 500-line log file is about 8,000 tokens. If your agent reads it once per turn and you have 20 turns in a session, that one file costs 160,000 tokens — before you've done any real work. KWJ intercepts that before it reaches any model.
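
The arithmetic above can be checked with a short Python sketch. It uses the 4-characters-per-token rule of thumb from the text; the synthetic log (500 lines of 63 characters plus newlines) is an assumption chosen so the file lands at about 8,000 tokens, and real tokenizers will give somewhat different counts.

```python
import math

CHARS_PER_TOKEN = 4  # rough average from the text above; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def session_tokens(tokens_per_read: int, turns: int) -> int:
    """Total tokens spent when the same content is re-read every turn."""
    return tokens_per_read * turns


# A synthetic 500-line log at 63 characters per line (an assumption).
log = "\n".join("x" * 63 for _ in range(500))
per_read = estimate_tokens(log)
print(per_read, session_tokens(per_read, 20))  # 8000 160000
```

The cost grows linearly with the number of turns, which is why content that is re-read on every turn dominates a long session.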

How does the free trial work?

Enter your email and you get an API key immediately — no credit card needed. The key is valid for 1 hour and includes 100 API calls across all 50 tools. That's enough to run a real session and see the token savings first-hand. After the trial expires, you choose whether to subscribe at $19/month.

What happens after the 1-hour trial ends?

Your trial key stops accepting new requests. Your agent session continues normally — KWJ tools simply return an auth error and your agent falls back to its default behavior. None of your work is lost. Subscribe at any point to reactivate the same key or get a new one.

Which AI agents and models does KWJ support?

KWJ works with any agent that supports MCP (Claude Code, OpenAI Agents SDK, Cursor, Windsurf, Continue.dev, LangChain, LangGraph, AutoGen) or can run shell commands (any Python/Node.js/bash-based agent). The token savings are LLM-agnostic — compressing context before sending to any model saves tokens. See /agents for integration examples.

Integration

Works with your existing stack.

Drop KWJ into any agent workflow. No framework lock-in. Works with LangChain, OpenAI SDK, Anthropic SDK, raw HTTP, or any MCP client. See all 10 integration examples →

kwj — end-to-end agent integration

Python (any OpenAI-compatible LLM)

import openai  # DeepSeek and other OpenAI-compatible providers reuse this client via base_url
import kwj

# Note: the Anthropic SDK calls client.messages.create(...) instead of
# chat.completions.create(...), so step 3 needs adapting for Claude models.
llm = openai.OpenAI(api_key="sk-your-key")
kwj_client = kwj.Client(api_key="kwj_your_key")

def run_agent_turn(user_message: str) -> str:
    # 1. Check the answer cache first: a hit skips context building and the LLM call
    cache_hit = kwj_client.cache_get(user_message)
    if cache_hit.hit:
        return cache_hit.value

    # 2. Shrink context before sending it to the LLM
    #    (read_build_logs() is your own helper that returns raw log text)
    compressed_logs = kwj_client.digest(read_build_logs())
    relevant_facts  = kwj_client.recall("project architecture decisions")
    code_symbol     = kwj_client.slice("src/main.rs", symbol="handle_request")

    # 3. Call your LLM with lean, pre-shrunk context
    response = llm.chat.completions.create(
        model="gpt-4o",  # or any model your OpenAI-compatible endpoint serves
        max_tokens=2048,
        messages=[
            {"role": "user", "content": (
                f"Context:\n{relevant_facts}\n\n"
                f"Recent build output:\n{compressed_logs}\n\n"
                f"Relevant code:\n{code_symbol}\n\n"
                f"Question: {user_message}"
            )}
        ]
    )
    result = response.choices[0].message.content

    # 4. Store the answer so matching future turns skip the LLM
    kwj_client.cache_put(user_message, result)
    return result

MCP Config (any agent)

Works with Claude Code, Cursor, Windsurf, Continue.dev, OpenAI Agents SDK, LangChain, LangGraph, AutoGen, and any other MCP-compatible client. Add this entry to ~/.claude/mcp.json, .cursor/mcp.json, or .continue/config.json:

{
  "mcpServers": {
    "kwj": {
      "command": "custom-mcp",
      "args": ["serve"],
      "env": {
        "KWJ_API_KEY": "kwj_your_key_here"
      }
    }
  }
}

All 50 KWJ tools register automatically:

  kwj_web_read      cached web fetch (TTL 3600s)
  kwj_digest        shrink log/command output
  kwj_slice         extract one symbol from a file
  kwj_cache_get     fuzzy answer cache lookup
  kwj_cache_put     store result for future turns
  kwj_recall        TF-IDF memory search
  kwj_doc_extract   PDF / XLSX / CSV extraction
  kwj_git           cached git wrapper
  ... and 42 more

See /agents for Python, TypeScript, bash, and per-framework examples. To verify your key:

$ kwj ping --api-key kwj_your_key_here
{"ok":true,"tools":50,"plan":"trial","expires_in":"58m"}

Architecture

How kwj fits your stack.

kwj sits between your agent framework and the LLM. It intercepts every prompt, compresses it, checks the cache, and routes the slim version — so your agent pays for 20% of the tokens it used to burn.

LangGraph → kwj: Context compressed before every LLM call. context_map + context_slice collapse full files to only the needed symbols.

CrewAI → kwj: Token savings across every crew agent. A shared digest + recall cache means each agent starts with a compact briefing, not a full transcript.

AutoGen → kwj: Cached tool results across multi-agent loops. bash_run + cache_get/put mean identical sub-tasks hit the cache instead of re-running the LLM.

HTTP client → kwj REST API: Works without installing anything. Call /api/v1/digest, /api/v1/cache_get, or /api/v1/compress over plain HTTPS from any language or platform.


Early users

What developers are saying.

“KWJ cut our Claude Code bill in half. We were burning 8M tokens a day on build log reads alone. custom-digest + custom-bash took that to under 400K. Nothing else we tried came close.”

M. Osei, Senior Platform Engineer

“I run 50+ agent sessions a day auditing smart contracts. Before KWJ, my OpenAI bill was brutal. Now the answer cache short-circuits 80% of repeat analysis. $19/month is a rounding error on what I was spending.”

R. Nakamura, Smart Contract Auditor

“MCP integration took 2 minutes. Added it to my .claude/mcp.json and every tool just appeared. The symbol slicer alone stopped me reading 9,000-line files 30 times a day. Immediate, measurable.”

P. Dubois, indie developer and Claude Code user

Pricing

Simple. Flat. No surprises.

One plan. Everything included. No per-seat, no per-call, no overages.

Monthly

$19

per month · 1-hour free trial

All 50 infrastructure tools
Unlimited cache hits ($0 each)
Command, web and git cache
Output compressor and code slicer
Document extraction (PDF, XLSX, CSV)
MCP bridge — 25+ tools, one call
30-day money-back if verified savings < 80%

Pay with Crypto (ETH/USDC)

Or start free 1-hour trial

Annual

$15

per month, billed $180/yr · Save $48/yr — 2 months free

Everything in Monthly
Priority support
Early access to new tools
Fleet improvements deploy automatically — no upgrade needed
Automated savings measurement — no self-reporting
Full refund if verified savings < 80% in 30 days

Pay with Crypto (ETH/USDC)

Or start free 1-hour trial

🛡

Measurable 80% savings or your money back.
KWJ automatically tracks token usage via your API key. If we can't show a verified 80% reduction in your first 30 days, your first month is refunded — no forms, no questions, no friction.

Building a team or enterprise deployment? Custom limits, SSO, SLAs, and dedicated support available.

Talk to us →
