These days, AI agents are writing code, calling tools, and even handling deployments — and with that shift, the CLI, which had been quietly sitting in the background, is getting attention again. GUIs and web dashboards are great for humans, but from an AI agent's perspective, CLIs are a much easier interface to work with.

The thing is, most existing CLIs were designed for humans. Pretty table outputs, color codes, shorthand flags — all nice for human eyes, but quite painful for agents to parse. Output formats subtly change between versions, and figuring out the exact usage often means reading separate documentation.

This week, I did a major overhaul of our team's internal CLI at work, rebuilding it around JSON I/O and schema commands. The agent's task success rate jumped noticeably. Based on that experience, I want to share some thoughts on how to build AI-friendly CLIs.

Why JSON-First Input / Output Matters for AI Agents

Human vs AI CLI Usage Patterns

When humans use a CLI, they check --help, read man pages, eyeball error messages, and iterate through trial and error. Even if the output changes a bit, they adapt by reading the context. AI agents, on the other hand, take the output as a raw string and process it literally. They have to parse table-formatted output with regex, and the moment column order shifts or line breaks change, things break immediately.

# This is convenient for humans, but...
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
my-app-7d4b8c6f5-x2k9z  1/1     Running   0          3d

# This is what agents need
$ kubectl get pods -o json
{
  "items": [{
    "metadata": {"name": "my-app-7d4b8c6f5-x2k9z"},
    "status": {"phase": "Running", "containerStatuses": [{"ready": true}]}
  }]
}

The Pain of Unstructured Text Output

The pain that unstructured text output causes for agents is bigger than you'd think. Here are some patterns I actually ran into:

  • Inconsistent parsing: sometimes the output has headers, sometimes it doesn't
  • Locale dependency: date/number formats change based on system locale
  • Color code pollution: ANSI escape codes sneak in and break string comparisons
  • Progress bar collisions: stderr and stdout get mixed up, garbling the output
  • Silent truncation: long values get clipped to ... with no way to detect it

When you start handling these edge cases one by one, your agent code ends up buried in CLI parsing logic. You end up spending more time interpreting output than doing the actual work.

Advantages of JSON

Switching to JSON-centric I/O makes most of these problems disappear.

  • Type safety: numbers are numbers, strings are strings. You can distinguish "3" from 3
  • Schema-based validation: define and validate input/output shapes upfront with JSON Schema
  • Easy chaining: instantly parseable by jq, pipelines, and any programming language
  • Consistency: identical output regardless of locale or terminal settings
  • Structured errors: return errors as JSON so agents can identify error types and respond appropriately
{
  "error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "Pod 'my-app' not found in namespace 'default'",
    "suggestions": ["Check namespace with --namespace flag"]
  }
}
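With errors in that shape, the agent side becomes a simple dispatch on the error code instead of string-matching message text. A minimal sketch (the error shape follows the example above; the handler behavior is illustrative, not from any real tool):

```python
import json

def handle_cli_error(raw_output: str) -> str:
    """Dispatch on a structured error code instead of grepping message text."""
    error = json.loads(raw_output)["error"]
    if error["code"] == "RESOURCE_NOT_FOUND":
        # Recoverable: follow the CLI's own suggestion instead of giving up.
        return "retry: " + error["suggestions"][0]
    # Unknown codes surface as-is so the agent can escalate.
    return "fail: " + error["message"]

raw = json.dumps({"error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "Pod 'my-app' not found in namespace 'default'",
    "suggestions": ["Check namespace with --namespace flag"],
}})
print(handle_cli_error(raw))
```

The point is that the branch condition is a stable identifier, not a human-readable sentence that may be reworded in the next release.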

Real-World Test Results from Our Project

Here's a quick summary of what changed after adding a --json flag to the CLI and having agents perform the same tasks:

Metric                  Text Output         JSON Output
Task success rate       Around 60%          Around 90%
Average retries         2.3                 0.4
Parsing-related errors  41% of all errors   Nearly 0%

Of course, these numbers are based on specific tasks I tested, so it's hard to generalize. But the fact that just switching to JSON made this much of a difference was a pretty meaningful result.

The Schema Command – Letting AI Learn and Adapt at Runtime

JSON I/O alone is a huge improvement, but there's a way to take it one step further: the schema command.

Inspiration from Google Workspace CLI (gws)

This idea was inspired by Google Workspace CLI (gws). gws has a structure that lets you query schema information per resource at runtime. Looking at that, I thought: "Why not let the agent ask the CLI directly instead of reading documentation?"

How the Schema Subcommand Works

The concept is simple. Add a schema subcommand to your CLI — specify a resource and action, and it returns the JSON Schema for that command's input and output.

$ mytool schema user.create
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Create a new user",
  "input": {
    "type": "object",
    "required": ["email", "role"],
    "properties": {
      "email": {"type": "string", "format": "email"},
      "role": {"type": "string", "enum": ["admin", "member", "viewer"]},
      "name": {"type": "string", "maxLength": 100}
    }
  },
  "output": {
    "type": "object",
    "properties": {
      "id": {"type": "string", "format": "uuid"},
      "email": {"type": "string"},
      "role": {"type": "string"},
      "created_at": {"type": "string", "format": "date-time"}
    }
  }
}

Even better if you can also query the list of available resources:

$ mytool schema --list
["user.create", "user.delete", "user.get", "user.list", "project.create", ...]

Benefits for Agents

The benefits this structure gives agents are substantial.

  • No external docs needed: agents can query the CLI directly and construct accurate inputs
  • Auto-adapts to API changes: when the CLI updates, the schema updates with it, so agents always work against the latest spec
  • Pairs with dry-run: build input from the schema, validate with --dry-run, then execute — a safe workflow
  • Self-describing: the CLI can describe itself without needing a separate AGENTS.md or tool description

Our Implementation Overview

In our project, we implemented it with the following structure:

  1. Define input/output schemas on each command handler (based on Pydantic models)
  2. The schema subcommand serializes these into JSON Schema and returns them
  3. A --list option allows browsing the full resource/action tree
  4. Include an examples field in schema responses so agents have something to reference

The implementation itself wasn't particularly difficult. Since we were already defining input/output models with Pydantic, most of it was solved just by calling .model_json_schema().
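As a rough sketch of that setup (Pydantic v2; the model and field names here are illustrative, not our actual ones):

```python
import json
from typing import Literal, Optional
from pydantic import BaseModel, Field

class CreateUserInput(BaseModel):
    """Input model for `user.create` — the same class also validates CLI input."""
    email: str
    role: Literal["admin", "member", "viewer"]
    name: Optional[str] = Field(default=None, max_length=100)

# The schema subcommand can serialize and return this dict directly.
schema = CreateUserInput.model_json_schema()
print(json.dumps(schema, indent=2))
```

Because the model that validates input is also the model that emits the schema, the two cannot drift apart.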

Example Agent Workflow Using Schema

Here's what the actual flow looks like when an agent uses the schema:

1. Agent: mytool schema --list
   → Check available commands

2. Agent: mytool schema user.create
   → Check input schema (required fields: email, role)

3. Agent: mytool user create --json '{"email":"new@example.com","role":"member"}' --dry-run
   → Validate before execution

4. Agent: mytool user create --json '{"email":"new@example.com","role":"member"}'
   → Execute, receive JSON response

5. Agent: Use the id field from the response for the next task

Throughout this entire flow, the agent never references documentation once. The CLI itself serves as the documentation.
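The pre-flight check at step 3 doesn't require a full validator; even a check of required fields against the fetched schema catches the most common agent mistake. A minimal sketch (the schema shape follows the `user.create` example above):

```python
import json

def missing_required(schema: dict, payload: dict) -> list[str]:
    """Return required input fields the agent forgot, before spending a dry-run."""
    required = schema.get("input", {}).get("required", [])
    return [field for field in required if field not in payload]

schema = {"input": {"required": ["email", "role"]}}
payload = json.loads('{"email": "new@example.com"}')
print(missing_required(schema, payload))  # the agent should add "role" before step 3
```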

Practical Design Patterns

Here are some patterns worth considering when actually applying JSON I/O and Schema.

Input Design Choices

There are roughly three input approaches, and you can choose based on the situation:

Approach       Example                                             Best For
stdin JSON     echo '{"key":"val"}' | mytool create                Large payloads, pipeline chaining
argument JSON  mytool create --json '{"key":"val"}'                Single commands, kept in shell history
Mixed          mytool create --name foo --json '{"extra":"opts"}'  Common options as flags, the rest as JSON

Personally, I'd recommend argument JSON as the default with stdin support as well. From an agent's perspective, a self-contained single command is the easiest to work with.
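Supporting both forms costs very little: take the `--json` argument when present, otherwise fall back to stdin. A sketch of that dispatch (the flag name just follows the convention used in this post):

```python
import json
import sys

def read_payload(argv: list[str]) -> dict:
    """Prefer an inline --json argument; fall back to stdin for large payloads."""
    if "--json" in argv:
        return json.loads(argv[argv.index("--json") + 1])
    return json.load(sys.stdin)

print(read_payload(["create", "--json", '{"key": "val"}']))
```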

Output Design Best Practices

There are a few important principles for output design:

  • --json flag is a must: keep the default human-readable, but return structured output when --json is passed
  • NDJSON support: for streaming scenarios (logs, events, etc.), support line-delimited JSON
  • Errors in JSON too: in --json mode, errors should also be returned as JSON, alongside exit codes
  • Include metadata: pagination info, request IDs, timestamps should be part of the response
# Normal mode
$ mytool user list
EMAIL              ROLE     CREATED
alice@example.com  admin    2026-01-15
bob@example.com    member   2026-02-20

# JSON mode
$ mytool user list --json
{
  "data": [
    {"email": "alice@example.com", "role": "admin", "created_at": "2026-01-15T00:00:00Z"},
    {"email": "bob@example.com", "role": "member", "created_at": "2026-02-20T00:00:00Z"}
  ],
  "meta": {"total": 2, "page": 1, "per_page": 50}
}
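The dual-mode output above can come down to a single branch at the render layer, so the command logic never needs to know which format is active. A sketch using argparse (the `data`/`meta` envelope follows the example above; field widths are arbitrary):

```python
import argparse
import json

def render(users: list[dict], as_json: bool) -> str:
    """One data structure, two renderings: a table for humans, an envelope for agents."""
    if as_json:
        return json.dumps({"data": users, "meta": {"total": len(users)}})
    rows = [f"{u['email']:<20}{u['role']}" for u in users]
    return "\n".join(["EMAIL               ROLE"] + rows)

parser = argparse.ArgumentParser(prog="mytool-user-list")
parser.add_argument("--json", action="store_true")
args = parser.parse_args(["--json"])

users = [{"email": "alice@example.com", "role": "admin"}]
print(render(users, args.json))
```

Keeping the branch at the very edge also means the JSON path can be tested without a terminal in the loop.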

In the end, I actually went with JSON output as the default and added a --no-json flag instead. If the tool isn't primarily meant for human use, unifying all I/O as JSON produced the best results for us.

Versioning & Backward Compatibility

Versioning JSON output is something you need to pay attention to. A few strategies:

  • Adding fields is fine, removing/changing requires caution: adding new fields doesn't break backward compatibility, but removing fields or changing types can break agents
  • Include a version field: putting something like "api_version": "v1" in the response lets agents branch based on version
  • Deprecation warnings: fields slated for removal should be flagged in a separate warnings array
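On the consumer side, the version and deprecation checks are only a few lines. A sketch, assuming the `api_version` and `warnings` fields described above:

```python
import json

def check_response(raw: str) -> list[str]:
    """Collect anything the agent should react to before trusting the data."""
    body = json.loads(raw)
    notes = []
    if body.get("api_version") != "v1":
        notes.append(f"unexpected version: {body.get('api_version')}")
    notes.extend(body.get("warnings", []))  # deprecation notices, etc.
    return notes

raw = '{"api_version": "v1", "warnings": ["field `name` is deprecated"], "data": []}'
print(check_response(raw))
```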

Using Pydantic / Zod / JSON Schema for Validation

Here are some tools you can use for schema definition:

  • Python: Pydantic is the most convenient. Model definition → automatic JSON Schema generation → input validation, all in one
  • TypeScript/Node: Define schemas with Zod and convert using zod-to-json-schema
  • Go/Rust etc.: Write JSON Schema files directly or use code-generation libraries

The key point is that the type definitions used in your code and the schema returned by the schema command must come from the same source. If these are managed separately, they will inevitably drift out of sync.

Truth be told, we didn't pay enough attention to this in our project, and it led to quite a few failures.

AI-Friendly Helper Flags

Beyond schema and JSON, there are some agent-friendly flags worth adding:

  • --dry-run: preview results without actually executing. Lets agents safely test things out
  • --explain: describe what the command will do in natural language. Helps with agent planning
  • --output-format: choose between json, yaml, csv, etc.
  • --quiet: strip unnecessary banners and warnings, return only essential output
  • --no-color: remove ANSI escape codes (honestly, every CLI should have this)

Results, Lessons, and Caveats After Adoption

Quantitative & Qualitative Outcomes

I shared the JSON transition results earlier. Here's what changed after adding the schema command on top of that:

Metric                    JSON Only        JSON + Schema
Task success rate         Around 90%       ~97% (almost all tasks succeeded)
Agent first-try accuracy  ~70%             ~90% (honestly, the biggest improvement)
Doc references per task   1.2 on average   Nearly 0

The sample size wasn't huge, so these numbers might not be statistically significant. But what I felt more strongly on a qualitative level was how much the agent code's complexity dropped. The parsing logic disappeared and we could focus on business logic.

Common Failure Patterns We Observed

Of course, it's not a silver bullet. Here are the common failure patterns and their fixes:

  • Oversized JSON responses: when a list API returns thousands of items, it blows past the agent's context window → pagination and filtering are essential
  • Deeply nested structures: JSON nested 5+ levels deep is hard for agents to navigate accurately → keep it flat when possible
  • Enum value errors: even with enums defined in the schema, agents sometimes insert similar but incorrect values → input validation + clear error messages
  • optional vs required confusion: agents sometimes skip required fields → mark required fields clearly in the schema and tell them which fields are missing in error messages

Remaining Challenges

There are still some unsolved problems:

  • Complex pagination: getting agents to handle cursor-based pagination smoothly remains tricky
  • Binary data: file uploads/downloads and other binary data are hard to express cleanly in JSON
  • Long-running operations: tracking status and handling timeouts for tasks that take several minutes

I still haven't found great answers for these. The long-running operations problem is particularly interesting — most agents end up polling with their own sleep loops, which seems pretty inefficient. Ideally, agents should be able to receive callbacks, but that's not easy to pull off.

Conclusion

To sum it up, the two essentials for building AI-friendly CLIs are:

  1. JSON-First I/O: structure your inputs and outputs so agents can parse and use them reliably
  2. Schema Command: let the CLI describe its own interface, eliminating the dependency on external docs

Even just having these two things in place made a noticeable difference in agent performance — I saw it firsthand. If you want to start right away, the easiest entry point is adding a single --json flag to your existing CLI. That alone makes integration with agents dramatically smoother.

The age of the CLI has come around once again.