Building AI-Friendly CLIs
JSON-First Design with Schema Commands
These days, AI agents are writing code, calling tools, and even handling deployments — and with that shift, the CLI, which had been quietly sitting in the background, is getting attention again. GUIs and web dashboards are great for humans, but from an AI agent's perspective, CLIs are a much easier interface to work with.
The thing is, most existing CLIs were designed for humans. Pretty table outputs, color codes, shorthand flags — all nice for human eyes, but quite painful for agents to parse. Output formats subtly change between versions, and figuring out the exact usage often means reading separate documentation.
This week, I did a major overhaul of our team's internal CLI at work, rebuilding it around JSON I/O and schema commands. The agent's task success rate jumped noticeably. Based on that experience, I want to share some thoughts on how to build AI-friendly CLIs.
Why JSON-First Input / Output Matters for AI Agents
Human vs AI CLI Usage Patterns
When humans use a CLI, they check --help, read man pages, eyeball error messages, and iterate through trial and error. Even if the output changes a bit, they adapt by reading the context. AI agents, on the other hand, take the output as a raw string and process it literally. They have to parse table-formatted output with regex, and the moment column order shifts or line breaks change, things break immediately.
# This is convenient for humans, but...
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-7d4b8c6f5-x2k9z 1/1 Running 0 3d
# This is what agents need
$ kubectl get pods -o json
{
"items": [{
"metadata": {"name": "my-app-7d4b8c6f5-x2k9z"},
"status": {"phase": "Running", "containerStatuses": [{"ready": true}]}
}]
}
The Pain of Unstructured Text Output
The pain that unstructured text output causes for agents is bigger than you'd think. Here are some patterns I actually ran into:
- Inconsistent parsing: sometimes the output has headers, sometimes it doesn't
- Locale dependency: date/number formats change based on system locale
- Color code pollution: ANSI escape codes sneak in and break string comparisons
- Progress bar collisions: stderr and stdout get mixed up, garbling the output
- Silent truncation: long values get clipped to "..." with no way to detect it
When you start handling these edge cases one by one, your agent code ends up buried in CLI parsing logic. You end up spending more time interpreting output than doing the actual work.
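To make the contrast concrete, here's a minimal Python sketch (with made-up output strings) of the two parsing paths an agent faces:

```python
import json
import re

# Human-oriented table output: parsing is position-dependent and fragile.
text_output = """NAME                     READY   STATUS    RESTARTS   AGE
my-app-7d4b8c6f5-x2k9z   1/1     Running   0          3d"""

def parse_table(text):
    # Breaks the moment columns are reordered, headers change,
    # or a value happens to contain whitespace.
    lines = text.strip().splitlines()[1:]  # skip the header row
    return [re.split(r"\s+", line) for line in lines]

# JSON output: one call, stable regardless of formatting changes.
json_output = (
    '{"items": [{"metadata": {"name": "my-app-7d4b8c6f5-x2k9z"},'
    ' "status": {"phase": "Running"}}]}'
)

def parse_json(text):
    return [item["metadata"]["name"] for item in json.loads(text)["items"]]
```

The table parser silently encodes assumptions about column order and spacing; the JSON parser encodes none of that.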
Advantages of JSON
Switching to JSON-centric I/O makes most of these problems disappear.
- Type safety: numbers are numbers, strings are strings. You can distinguish "3" from 3
- Schema-based validation: define and validate input/output shapes upfront with JSON Schema
- Easy chaining: instantly parseable by jq, pipelines, and any programming language
- Consistency: identical output regardless of locale or terminal settings
- Structured errors: return errors as JSON so agents can identify error types and respond appropriately
{
"error": {
"code": "RESOURCE_NOT_FOUND",
"message": "Pod 'my-app' not found in namespace 'default'",
"suggestions": ["Check namespace with --namespace flag"]
}
}
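With errors in this shape, agent-side handling becomes a simple branch on the error code. A hedged sketch (the codes and the retry/abort policy here are illustrative, not a fixed vocabulary):

```python
import json

def handle_cli_error(stderr_json: str) -> str:
    """Decide the agent's next move from a structured error payload."""
    err = json.loads(stderr_json)["error"]
    if err["code"] == "RESOURCE_NOT_FOUND":
        # Recoverable: try the CLI's own suggestions before giving up.
        return "retry: " + "; ".join(err.get("suggestions", []))
    return "abort: " + err["message"]

payload = '''{
  "error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "Pod 'my-app' not found in namespace 'default'",
    "suggestions": ["Check namespace with --namespace flag"]
  }
}'''
```

Compare that to classifying errors by grepping free-form stderr text, where every new message wording is a potential breakage.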
Real-World Test Results from Our Project
Here's a quick summary of what changed after adding a --json flag to the CLI and having agents perform the same tasks:
| Metric | Text Output | JSON Output |
|---|---|---|
| Task success rate | Around 60% | Around 90% |
| Average retries | 2.3 times | 0.4 times |
| Parsing-related errors | 41% of all errors | Nearly 0% |
Of course, these numbers are based on specific tasks I tested, so it's hard to generalize. But the fact that just switching to JSON made this much of a difference was a pretty meaningful result.
The Schema Command – Letting AI Learn and Adapt at Runtime
JSON I/O alone is a huge improvement, but there's a way to take it one step further: the schema command.
Inspiration from Google Workspace CLI (gws)
This idea was inspired by Google Workspace CLI (gws). gws has a structure that lets you query schema information per resource at runtime. Looking at that, I thought: "Why not let the agent ask the CLI directly instead of reading documentation?"
How the Schema Subcommand Works
The concept is simple. Add a schema subcommand to your CLI — specify a resource and action, and it returns the JSON Schema for that command's input and output.
$ mytool schema user.create
{
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "Create a new user",
"input": {
"type": "object",
"required": ["email", "role"],
"properties": {
"email": {"type": "string", "format": "email"},
"role": {"type": "string", "enum": ["admin", "member", "viewer"]},
"name": {"type": "string", "maxLength": 100}
}
},
"output": {
"type": "object",
"properties": {
"id": {"type": "string", "format": "uuid"},
"email": {"type": "string"},
"role": {"type": "string"},
"created_at": {"type": "string", "format": "date-time"}
}
}
}
Even better if you can also query the list of available resources:
$ mytool schema --list
["user.create", "user.delete", "user.get", "user.list", "project.create", ...]
Benefits for Agents
The benefits this structure gives agents are substantial.
- No external docs needed: agents can query the CLI directly and construct accurate inputs
- Auto-adapts to API changes: when the CLI updates, the schema updates with it, so agents always work against the latest spec
- Pairs with dry-run: build input from the schema, validate with --dry-run, then execute, for a safe workflow
- Self-describing: the CLI can describe itself without needing a separate AGENTS.md or tool description
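To sketch the dry-run pairing: once the agent has fetched a schema like the user.create one above, even a tiny validator catches missing required fields and bad enum values before anything executes. This is illustrative only; a real agent would use a full JSON Schema validator such as the jsonschema package:

```python
# A response shaped like the user.create schema shown earlier.
schema = {
    "input": {
        "type": "object",
        "required": ["email", "role"],
        "properties": {
            "email": {"type": "string", "format": "email"},
            "role": {"type": "string", "enum": ["admin", "member", "viewer"]},
            "name": {"type": "string", "maxLength": 100},
        },
    }
}

def validate_input(payload: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid.
    Only checks `required` and `enum`; a real validator covers far more."""
    spec = schema["input"]
    problems = [f"missing required field: {f}"
                for f in spec["required"] if f not in payload]
    for field, value in payload.items():
        allowed = spec["properties"].get(field, {}).get("enum")
        if allowed and value not in allowed:
            problems.append(f"{field} must be one of {allowed}")
    return problems
```

Returning all problems at once, rather than failing on the first, matters for agents: one round trip fixes everything.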
Our Implementation Overview
In our project, we implemented it with the following structure:
- Define input/output schemas on each command handler (based on Pydantic models)
- The schema subcommand serializes these into JSON Schema and returns them
- A --list option allows browsing the full resource/action tree
- Include an examples field in schema responses so agents have something to reference
The implementation itself wasn't particularly difficult. Since we were already defining input/output models with Pydantic, most of it was solved just by calling .model_json_schema().
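A minimal sketch of that pattern, assuming Pydantic v2 (the model and field names are hypothetical, not our actual handlers):

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field

class UserCreateInput(BaseModel):
    """Input model for a hypothetical `user.create` command.
    The same model validates input at runtime AND serves as the
    single source of truth for the `schema` subcommand."""
    email: str
    role: Literal["admin", "member", "viewer"]
    name: Optional[str] = Field(default=None, max_length=100)

# What the `schema` subcommand would serialize and print:
schema = UserCreateInput.model_json_schema()
```

Because the schema is derived from the model rather than written by hand, required fields and enum values can never drift from what the code actually enforces.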
Example Agent Workflow Using Schema
Here's what the actual flow looks like when an agent uses the schema:
1. Agent: mytool schema --list
→ Check available commands
2. Agent: mytool schema user.create
→ Check input schema (required fields: email, role)
3. Agent: mytool user create --json '{"email":"new@example.com","role":"member"}' --dry-run
→ Validate before execution
4. Agent: mytool user create --json '{"email":"new@example.com","role":"member"}'
→ Execute, receive JSON response
5. Agent: Use the id field from the response for the next task
Throughout this entire flow, the agent never references documentation once. The CLI itself serves as the documentation.
Practical Design Patterns
Here are some patterns worth considering when actually applying JSON I/O and Schema.
Input Design Choices
There are roughly three input approaches, and you can choose based on the situation:
| Approach | Example | Best For |
|---|---|---|
| stdin JSON | echo '{"key":"val"}' \| mytool create | Large payloads, pipeline chaining |
| argument JSON | mytool create --json '{"key":"val"}' | Single command execution, keeping it in shell history |
| Mixed | mytool create --name foo --json '{"extra":"opts"}' | Frequently used options as flags, the rest as JSON |
Personally, I'd recommend argument JSON as the default with stdin support as well. From an agent's perspective, a self-contained single command is the easiest to work with.
Output Design Best Practices
There are a few important principles for output design:
- A --json flag is a must: keep the default human-readable, but return structured output when --json is passed
- NDJSON support: for streaming scenarios (logs, events, etc.), support line-delimited JSON
- Errors in JSON too: in --json mode, errors should also be returned as JSON, alongside exit codes
- Include metadata: pagination info, request IDs, and timestamps should be part of the response
# Normal mode
$ mytool user list
EMAIL ROLE CREATED
alice@example.com admin 2026-01-15
bob@example.com member 2026-02-20
# JSON mode
$ mytool user list --json
{
"data": [
{"email": "alice@example.com", "role": "admin", "created_at": "2026-01-15T00:00:00Z"},
{"email": "bob@example.com", "role": "member", "created_at": "2026-02-20T00:00:00Z"}
],
"meta": {"total": 2, "page": 1, "per_page": 50}
}
In the end, I actually went with JSON output as the default and added a --no-json flag instead. If the tool isn't meant for human use, unifying all I/O as JSON gave us the best hit rate.
Versioning & Backward Compatibility
Versioning JSON output is something you need to pay attention to. A few strategies:
- Adding fields is fine, removing/changing requires caution: adding new fields doesn't break backward compatibility, but removing fields or changing types can break agents
- Include a version field: putting something like "api_version": "v1" in the response lets agents branch based on version
- Deprecation warnings: fields slated for removal should be flagged in a separate warnings array
Using Pydantic / Zod / JSON Schema for Validation
Here are some tools you can use for schema definition:
- Python: Pydantic is the most convenient. Model definition → automatic JSON Schema generation → input validation, all in one
- TypeScript/Node: Define schemas with Zod and convert using zod-to-json-schema
- Go/Rust etc.: Write JSON Schema files directly or use code-generation libraries
The key point is that the type definitions used in your code and the schema returned by the schema command must come from the same source. If these are managed separately, they will inevitably drift out of sync.
Truth be told, we didn't pay much attention to this in our project. And it led to quite a few failures.
AI-Friendly Helper Flags
Beyond schema and JSON, there are some agent-friendly flags worth adding:
- --dry-run: preview results without actually executing. Lets agents safely test things out
- --explain: describe what the command will do in natural language. Helps with agent planning
- --output-format: choose between json, yaml, csv, etc.
- --quiet: strip unnecessary banners and warnings, return only essential output
- --no-color: remove ANSI escape codes (honestly, every CLI should have this)
Results, Lessons, and Caveats After Adoption
Quantitative & Qualitative Outcomes
I shared the JSON transition results earlier. Here's what changed after adding the schema command on top of that:
| Metric | JSON Only | JSON + Schema |
|---|---|---|
| Task success rate | Around 90% | ~97% (almost all succeeded) |
| Agent first-try accuracy | ~70% | ~90% (honestly, this was the biggest improvement) |
| Doc references needed | Avg 1.2 per task | Nearly 0 |
The sample size wasn't huge, so these numbers might not be statistically significant. But what I felt more strongly on a qualitative level was how much the agent code's complexity dropped. The parsing logic disappeared and we could focus on business logic.
Common Failure Patterns We Observed
Of course, it's not a silver bullet. Here are the common failure patterns and their fixes:
- Oversized JSON responses: when a list API returns thousands of items, it blows past the agent's context window → pagination and filtering are essential
- Deeply nested structures: JSON nested 5+ levels deep is hard for agents to navigate accurately → keep it flat when possible
- Enum value errors: even with enums defined in the schema, agents sometimes insert similar but incorrect values → input validation + clear error messages
- optional vs required confusion: agents sometimes skip required fields → mark required fields clearly in the schema and tell them which fields are missing in error messages
Remaining Challenges
There are still some unsolved problems:
- Complex pagination: getting agents to handle cursor-based pagination smoothly remains tricky
- Binary data: file uploads/downloads and other binary data are hard to express cleanly in JSON
- Long-running operations: tracking status and handling timeouts for tasks that take several minutes
I still haven't found great answers for these. The long-running operations problem is particularly interesting — most agents end up polling with their own sleep loops, which seems pretty inefficient. Ideally, agents should be able to receive callbacks, but that's not easy to pull off.
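For what it's worth, even the sleep-loop approach improves with exponential backoff, so agents don't hammer the CLI while a long task runs. A sketch, where check stands in for invoking a hypothetical status command and returning its phase:

```python
import time

def poll_until_done(check, timeout=300.0, base_delay=1.0,
                    max_delay=30.0, sleep=time.sleep):
    """Poll a status callable with exponential backoff until it reports
    a terminal state or the timeout elapses. `check` stands in for
    something like a `mytool job status --json` call (hypothetical)."""
    delay, waited = base_delay, 0.0
    while waited < timeout:
        status = check()
        if status in ("succeeded", "failed"):
            return status
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, capped
    raise TimeoutError(f"operation still running after {timeout}s")
```

Injecting the sleep function keeps the loop testable; it doesn't solve the underlying inefficiency, but it bounds it.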
Conclusion
To sum it up, the two essentials for building AI-friendly CLIs are:
- JSON-First I/O: structure your inputs and outputs so agents can parse and use them reliably
- Schema Command: let the CLI describe its own interface, eliminating the dependency on external docs
Even just having these two things in place made a noticeable difference in agent performance — I saw it firsthand. If you want to start right away, the easiest entry point is adding a single --json flag to your existing CLI. That alone makes integration with agents dramatically smoother.
The age of the CLI has come around once again.
