Tools vs. Skills vs. CLI vs. MCP vs. A2A
Why these approaches get conflated
“Connecting agents” is overloaded. People use the phrase to mean at least four different integration problems—often in the same system—so it’s easy to compare apples to oranges.
One problem is how an LLM triggers an action (e.g., “call this function with these arguments”). That’s the territory of tools (tool calling / function calling) in most commercial LLM APIs.
A second problem is how you package repeatable expertise/workflows so an agent can do a task reliably without you hardcoding every step in your orchestration layer. That’s where filesystem-based skills (typically SKILL.md + optional scripts/resources) fit: they’re an instruction-and-assets distribution mechanism with progressive disclosure to control context usage.
A third problem is how an agent talks to external systems in a vendor-neutral way, so multiple agent hosts/clients can reuse the same connectors. That’s what MCP targets: standardizing access to tools, resources, and prompts over a JSON-RPC-based protocol with defined transports and security guidance/specs.
A fourth problem is how agents talk to other agents as peers (not as “just another tool”), including discovery, task lifecycle, and modality negotiation. That’s what A2A is designed for, and its own docs explicitly position it as complementary to MCP (MCP = tool/context access; A2A = agent collaboration).
Finally, “API specs” (usually OpenAPI) are not a connectivity layer by themselves; they’re a contract surface you can compile into tool definitions (for direct tool calling) or into an MCP server (manual or generated).
The upshot: these are not mutually exclusive. Many production stacks end up using A2A for agent-to-agent coordination, MCP for agent-to-system access, tools for atomic actions, and skills for workflow packaging and token-efficient instruction loading, with OpenAPI as a “source of truth” feeding one or more of those layers.
A practical comparison framework
A useful comparison is less “which is best?” and more “which layer does this solve, and what are the trade-offs?” The aspects below are the ones that repeatedly show up in real-world docs, security guidance, and postmortems:
- Primary integration target (system tools vs workflows vs agent peers)
- Token footprint drivers (tool schema preload, instruction load, intermediate results, caching/deferral)
- Determinism and type-safety (strict schemas, stable error handling, validation)
- Security model (credential isolation, supply-chain risk, prompt injection exposure, auth standards)
- Interop and governance (open standard vs vendor feature; portability across hosts)
- Observability and operational fit (auditing, rate limits, approvals, idempotency, long-running tasks)
A compact “layer map” that keeps comparisons honest:
| Approach | What it standardizes (core promise) | Typical failure mode if misused |
|---|---|---|
| Tools | Structured action invocation from an LLM (functions and built-in tools) | Too many tools → token/latency tax, tool confusion, brittle multi-step chains |
| Skills | Instruction + assets packaging with progressive disclosure via filesystem/shell/scripts | Supply-chain and shell/script risk; “instructions as code” non-determinism |
| API specs (OpenAPI) | Machine-readable contract to generate tool surfaces or clients | Endpoints ≠ agent intents; auto-wrapping yields massive low-level tool catalogs |
| MCP | Vendor-neutral protocol for tools/resources/prompts + transports + auth guidance/spec | Context bloat from tool definitions/results; unsafe remote servers or prompt injection via external content |
| A2A | Vendor-neutral protocol for agent discovery + task lifecycle + modality negotiation + updates | Using it as a generic RPC layer without task/identity discipline; unclear trust boundaries without proper security schemes |
This table is synthesized directly from the normative docs/specs plus ecosystem writeups that document the recurring operational issues (schema bloat, multi-step brittleness, security exposure).
Tools and “direct API” integration
Tools are the lowest-latency conceptual bridge between a model and an action: the model outputs a structured tool call; something executes it; results come back; the model continues.
Where tools run matters
A persistent design fork is client-executed vs provider-hosted tool execution.
With classic function/tool calling, the model emits a call and your application executes it, then sends tool output back to the model in a subsequent request (or a later step in an agent loop).
By contrast, some “built-in tools” are executed inside the provider’s orchestration environment (web search, file search, code interpreter, computer/shell environments, or remote MCP calling depending on the platform). For example, one vendor describes an orchestrated loop where the API forwards commands into a container runtime and streams output back into the model context, including controls like bounded output and parallel sessions.
This execution choice affects:
- Security boundaries (credentials stay with your app vs handled by a hosted connector/server).
- Latency and caching (hosted systems can optimize the loop; client loops give you full control but require more engineering).
- Observability and approvals (client-side is easier to gate; hosted surfaces often provide their own approval/allowlisting interfaces).
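The client-executed flow described above can be sketched as a small loop. Everything here is illustrative: `TOOLS`, `execute_tool_call`, and the message shapes are hypothetical stand-ins for an agent loop, not any provider’s actual API.

```python
import json

# Hypothetical tool registry: the app, not the model, owns execution.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def execute_tool_call(tool_call: dict) -> str:
    """Run a model-proposed call in the application and serialize the
    result for the next model turn. The model never executes anything."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return json.dumps(fn(**args))

def agent_step(model_output: dict, messages: list) -> list:
    """One loop iteration: if the model proposed tool calls, execute
    them and append their results for the follow-up request."""
    for tc in model_output.get("tool_calls", []):
        messages.append({
            "role": "tool",
            "tool_call_id": tc["id"],
            "content": execute_tool_call(tc),
        })
    return messages
```

Because execution stays in your code, credential handling, approvals, and logging all live at this boundary rather than inside the provider’s environment.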
“Tool sets” in practice: deferring and searching tools
A modern response to “too many tools” is deferred loading + tool search, which keeps only a high-level searchable surface in context and loads detailed schemas on demand.
One vendor’s tool-search guide is explicit about the motivation (reduce upfront token/cost), the mechanism (mark functions/namespaces/MCP servers as defer_loading), and the practical guidance (“use namespaces or MCP servers” as the search surface; keep namespaces small).
This is a key bridge between “tools” and “MCP”: once you defer entire MCP servers as the searchable unit, you’re treating each MCP server as a tool set that can be lazily expanded by the model.
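Deferred loading can be sketched host-side as a two-level registry. The `CATALOG` structure and function names below are hypothetical; the point is that only one-line summaries sit in context until the model asks to expand a namespace.

```python
# Hypothetical host-side registry illustrating deferred loading:
# only summaries enter the prompt; full schemas load on demand.
CATALOG = {
    "github": {
        "summary": "Repo, issue, and PR operations",
        "schemas": {"create_issue": {"params": {"title": "string", "body": "string"}}},
    },
    "jira": {
        "summary": "Ticket tracking operations",
        "schemas": {"create_ticket": {"params": {"summary": "string"}}},
    },
}

def searchable_surface() -> str:
    """What the model sees up front: namespace names + summaries only."""
    return "\n".join(f"{name}: {s['summary']}" for name, s in CATALOG.items())

def expand(namespace: str) -> dict:
    """Load full tool schemas for one namespace only when requested."""
    return CATALOG[namespace]["schemas"]
```

The two-level split is the whole trick: the upfront cost is a few lines per namespace instead of a full schema per tool.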
When “direct API calls” are the right answer
“Direct API calls” can mean two different things:
- Model proposes a tool call; your code calls the API. This is still “tools,” but you keep the API knowledge in your app, not in the model context.
- Model reasons over an API contract (OpenAPI) and calls endpoints via generated functions/tools.
The first approach is often best when you want high-level, intention-shaped operations (e.g., refund_customer_by_email) rather than exposing low-level endpoints. Multiple ecosystem critiques point out that “atomic, discoverable APIs” are great for humans but expensive for agents because each choice and each multi-step chain imposes token + latency costs.
The second approach is attractive when you already have a robust OpenAPI spec and want fast prototyping. OpenAI has long published examples converting OpenAPI specs into function/tool definitions.
But multiple practitioners warn that doing this naively (whether as direct function tools or as auto-generated MCP tools) tends to create huge catalogs of low-level operations, which increases context load and can worsen tool-selection reliability.
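The contrast between endpoint-wrapping and intent-shaping is easy to see in code. This is a minimal sketch with a hypothetical `BillingAPI`; the three-step composition an agent would otherwise attempt across separate tool calls lives in one function instead.

```python
class BillingAPI:
    """Hypothetical client with three low-level endpoints the agent
    would otherwise have to chain: lookup -> latest charge -> refund."""
    def find_customer(self, email: str) -> dict:
        return {"id": "c_1", "email": email}
    def latest_charge(self, customer_id: str) -> dict:
        return {"id": "ch_9", "amount": 4200}
    def refund(self, charge_id: str) -> dict:
        return {"charge": charge_id, "status": "refunded"}

def refund_customer_by_email(api: BillingAPI, email: str) -> dict:
    """One intent-shaped tool: the composition lives in code, so the
    model makes a single call instead of a brittle three-step chain."""
    customer = api.find_customer(email)
    charge = api.latest_charge(customer["id"])
    return api.refund(charge["id"])
```

Exposing only `refund_customer_by_email` as the tool removes two model round-trips and two opportunities for the model to pick the wrong endpoint.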
Skills as workflow packaging
Skills are best understood as a distribution format for agent competence: instructions plus optional scripts/resources, loaded only when relevant.
The core mechanism: progressive disclosure
In the Claude skills docs, progressive disclosure is described as a three-level system: load small metadata for each skill at startup; load the SKILL.md body only when triggered; and keep larger resources in the filesystem, where scripts can be executed and only their outputs enter context.
A separate vendor’s “agent skills” docs describe essentially the same idea: metadata is visible first; the full SKILL.md is loaded only when the skill is selected; skills are directories containing SKILL.md plus optional scripts/references/assets; and the ecosystem is positioned as an open standard.
This approach is fundamentally about token economics and instruction reliability:
- Token economics: load only what you need.
- Instruction reliability: skills can encode a stable SOP (checklists, guardrails, “when to trigger”) that a generic base agent might not infer consistently.
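The three-level mechanism can be sketched as follows, assuming skills are scanned from disk at startup; `SKILLS` here is an in-memory stand-in for that filesystem scan, and the names are illustrative.

```python
# Stand-in for scanning <skills-dir>/<name>/SKILL.md at startup.
SKILLS = {
    "pdf-forms": {
        "description": "Fill and flatten PDF forms",
        "body": "## Steps\n1. Run scripts/fill.py against the input PDF ...",
        "resources": ["scripts/fill.py", "reference.md"],
    },
}

def startup_metadata() -> list[str]:
    """Level 1: only name + description are always in context."""
    return [f"{name}: {s['description']}" for name, s in SKILLS.items()]

def load_skill(name: str) -> str:
    """Level 2: the SKILL.md body enters context only when the skill
    triggers. Level 3 resources stay on disk; scripts are executed
    there and only their outputs ever enter context."""
    return SKILLS[name]["body"]
```

The asymmetry is the point: dozens of skills cost a line each at startup, and the full instructions are paid for only by the task that needs them.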
“Leanest implementation” really means “filesystem + shell”
A skill system becomes particularly lean when the agent already has filesystem access and a shell: you can package workflows as scripts and docs, and let the model run commands through a shell tool to fetch live data, call CLIs, and transform results.
That same vendor’s shell-tool article emphasizes a crucial point: the model only proposes commands; an orchestrator executes them and loops results back, and it highlights controls like bounded output to keep logs from consuming context.
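A minimal sketch of that orchestrator-side discipline, with a hypothetical `MAX_OUTPUT` cap standing in for the bounded-output control such articles describe:

```python
import subprocess

MAX_OUTPUT = 2000  # cap on characters flowing back into model context

def bound_output(text: str, limit: int = MAX_OUTPUT) -> str:
    """Truncate command output so long logs cannot flood the context."""
    if len(text) > limit:
        return text[:limit] + "\n[output truncated]"
    return text

def run_proposed_command(command: list[str]) -> str:
    """The model only proposes; this orchestrator executes with a
    timeout and returns bounded output for the next model turn."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    return bound_output(result.stdout + result.stderr)
```

A real orchestrator would add an allowlist and approval gate before `subprocess.run`; the sketch shows only the proposal/execution split and the output bound.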
The hard trade-off: flexibility vs attack surface
Skills intentionally blur “prompt” and “code.” That power shows up directly in threat modeling and supply-chain research.
Snyk’s “ToxicSkills” research (February 2026) reports scanning thousands of publicly available skills and finding a meaningful fraction with critical security issues, including malicious payloads, prompt injection risk, and exposed secrets, explicitly framing skills as a supply-chain security problem because they can inherit shell/filesystem/API access from the host agent.
Separate Snyk writeups also focus on how trivially a SKILL.md plus shell execution can become a remote code execution pathway in poorly controlled environments.
This security posture is not an argument against skills; it’s an argument for treating them like packages with capabilities, requiring the same kinds of controls you’d apply to plugin ecosystems: provenance, sandboxing, least privilege, and auditing.
MCP for tool and context interoperability
MCP is explicitly framed as a “USB-C for AI applications” style standard: a way for AI hosts/clients to connect to external tools, data sources, and workflows through a shared protocol surface.
What MCP standardizes
At the protocol level, MCP uses JSON-RPC as its message encoding and defines standard transports including stdio and streamable HTTP.
The spec’s overview enumerates scope such as lifecycle management, authorization for HTTP transports, and server features (resources/prompts/tools).
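Concretely, an MCP `tools/call` interaction is plain JSON-RPC 2.0. The helper below builds and parses such messages; it is a sketch of the wire shape, not a full client (no transport, lifecycle, or capability negotiation).

```python
import json

def make_tool_call(req_id: int, tool: str, arguments: dict) -> str:
    """Encode a tools/call request the way MCP frames it: a standard
    JSON-RPC 2.0 request with the tool name and arguments as params."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

def parse_response(raw: str) -> dict:
    """Decode a JSON-RPC response, returning the result or error body."""
    msg = json.loads(raw)
    assert msg.get("jsonrpc") == "2.0", "not a JSON-RPC 2.0 message"
    return msg["result"] if "result" in msg else msg["error"]
```

Because the encoding is ordinary JSON-RPC, the interesting parts of MCP live above this layer: which tools/resources/prompts a server advertises, and how the transport and authorization are handled.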
The adoption narrative is now also institutional: Linux Foundation press announcements describe MCP being anchored inside the Agentic AI Foundation, with founding contributions including MCP and other agent-adjacent standards.
Anthropic also describes donating MCP into that foundation, co-founded with Block and OpenAI, with support from Google, Microsoft, Amazon Web Services, Cloudflare, and Bloomberg.
That governance shift matters in practice because MCP is increasingly treated as an integration substrate across ecosystems, including major AI platforms’ “remote MCP server” support.
The “N×M” promise vs the “token bloat” reality
MCP’s architectural pitch is to reduce custom one-off connectors (many agent hosts × many systems) into a standard protocol interaction pattern (clients talk to servers via MCP).
However, MCP’s most visible operational critique is that naïve clients/hosts often load all tool definitions up front, and that intermediate results are repeatedly streamed through the model context, driving cost, latency, and sometimes failure on large artifacts.
Anthropic’s own engineering writeup sketches a concrete token-cost scenario: routing a long transcript through the model loop can add tens of thousands of tokens and can exceed context limits.
Mitigations that actually work
A pattern that is repeatedly recommended (by protocol implementers and platform vendors) is some form of lazy or selective tool exposure:
- Tool search / deferred loading at the host/tooling layer, so the model sees only a high-level server/namespace description until it needs details.
- Curated, intent-level tools instead of 1:1 endpoint exposure, because agents are weak at brittle multi-step flows and pay a high tax per tool call.
- Compute-side filtering/aggregation (code execution modes) to keep large intermediate results out of the model context and return only summaries or selected slices.
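The compute-side filtering pattern can be sketched in a few lines, assuming a hypothetical `fetch_transcript` tool that returns the full text of a meeting:

```python
def summarize_transcript(fetch_transcript, meeting_id: str,
                         max_chars: int = 500) -> dict:
    """Run the tool on the compute side and return only a compact
    summary, so the full transcript never streams through the model
    context. fetch_transcript is a hypothetical tool callable."""
    text = fetch_transcript(meeting_id)
    return {
        "chars": len(text),
        "lines": text.count("\n") + 1,
        "head": text[:max_chars],  # only this slice reaches the model
    }
```

The same idea generalizes: filter, aggregate, or diff large tool results in code, and hand the model a few hundred tokens of structure instead of the raw artifact.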
These mitigations also show up as platform guidance. For example, OpenAI’s remote MCP server documentation explicitly frames MCP servers as a powerful extension mechanism but spends substantial space on risks (prompt injection via untrusted content, malicious servers, data leakage) and recommends strong authentication (OAuth, dynamic client registration) and careful trust decisions (prefer official servers, minimize sensitive data exposure in tool metadata).
Benchmarks and empirical signals about MCP in the wild
A 2025 research paper on “making REST APIs agent-ready” provides unusually concrete data: it reports mining GitHub and identifying 22,722 MCP-tagged repositories in the six months after MCP’s release, but only 1,164 that contained functional server implementations, suggesting both rapid interest and substantial boilerplate/implementation-effort friction.
The same work introduces a compiler (AutoMCP) that generates MCP servers from OpenAPI specs and evaluates it on 50 APIs (>5,000 endpoints), finding recurring failure modes largely driven by incomplete/inconsistent specs.
This aligns with practitioner guidance: auto-generation is useful for bootstrapping, but production-quality MCP tool surfaces usually require curation and intent-shaping.
A2A for agent-to-agent collaboration
A2A is an open protocol explicitly designed to let agents collaborate “as agents,” even when they do not share internal memory, tools, or context. The original Google announcement frames it as complementary to MCP and oriented toward large-scale multi-agent deployments.
The Linux Foundation launch announcement similarly emphasizes secure agent-to-agent communication, discovery, and collaboration across platforms/vendors/frameworks.
What A2A does that MCP can’t
MCP can expose an “agent-like service” as a tool server, but MCP’s normative surface is still tools/resources/prompts for a single host-model loop; it does not standardize peer-agent collaboration semantics like task objects, agent cards, or modality negotiation.
A2A’s spec and reference materials make these peer semantics first-class:
Agent discovery and capability advertisement (AgentCard).
A2A standardizes discovery via “Agent Cards” that describe capabilities and connection information, and the spec includes explicit capability validation rules (e.g., how clients should interpret streaming/push-notification capability flags) plus security scheme declarations in the AgentCard.
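A sketch of an AgentCard and the capability check a client should perform. The card values are illustrative, and the field names follow the spec’s general shape rather than reproducing it exhaustively.

```python
# Illustrative AgentCard as a plain dict; values are hypothetical.
AGENT_CARD = {
    "name": "travel-planner",
    "url": "https://agents.example.com/travel",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["text/plain", "application/json"],
    "skills": [{"id": "plan-trip", "description": "Plan multi-leg trips"}],
}

def client_can_stream(card: dict) -> bool:
    """Capability validation in the spec's spirit: a client should not
    open a streaming connection unless the card advertises support."""
    return bool(card.get("capabilities", {}).get("streaming", False))
```

The same pattern applies to push notifications and security schemes: the card is the contract the client validates before choosing an interaction mode.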
Task-oriented collaboration, not single function calls.
A2A defines a Task object with lifecycle/state, and treats collaboration as task fulfillment. The Hugging Face explainer highlights task lifecycle and names the output object (Artifact).
By contrast, MCP tools are invoked as discrete operations (even if the tool itself triggers a long-running process), and the protocol’s “unit” is a tool/resource/prompt interaction rather than a standardized, cross-agent task lifecycle.
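The task lifecycle can be sketched as a small state model. The state names below are a representative subset of what the spec defines; the design point is the terminal/non-terminal split, where input-required hands the turn back to the client rather than ending the task.

```python
from enum import Enum

class TaskState(Enum):
    """Representative subset of A2A task states (illustrative)."""
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"

TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED}

def is_terminal(state: TaskState) -> bool:
    """Terminal states end the collaboration; non-terminal states
    mean the client should keep listening for status/artifact updates."""
    return state in TERMINAL
```

This is the semantic gap with a bare tool call: a tool invocation returns once, while a task carries state that both sides can observe across updates.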
Modality and UX negotiation.
A2A messages are explicitly built from “parts” with content types, allowing agents/clients to negotiate formats and UI features.
This is materially different from MCP’s focus on tool schemas and resource/prompt retrieval; MCP does not define a peer negotiation mechanism for “what UI modalities do we both support for this task?” as a first-class interoperable concept.
Streaming updates and asynchronous delivery as protocol objects.
A2A specifies streaming events for task status and artifact updates and supports push notification configuration.
MCP can stream via its transport mechanisms, but it does not define a standardized cross-agent “task status update event” model or a native push-notification control plane the way A2A does.
Multiple bindings beyond JSON-RPC.
A2A defines multiple standard protocol bindings, including JSON-RPC and gRPC, plus an HTTP+JSON/REST binding with SSE streaming described in the spec.
MCP’s spec similarly defines transports/bindings, but the point here is what is being bound: in A2A it’s a task-and-agent-collaboration model; in MCP it’s a tool/resource/prompt model.
Summarizing the boundary in one line: MCP answers “how do I give a model access to tools and context?” while A2A answers “how do I make agents discover each other and collaborate on tasks without becoming each other’s tools?”
When direct API calls make sense vs wrapping in MCP vs describing in skills
These choices are most coherent when you treat them as different points on a spectrum of intent-shaping vs reuse vs operational overhead.
Direct API calls (via your orchestration code) usually win when:
You want to expose a small number of high-level actions with strict validation and clear approvals, and you don’t need cross-host interoperability for that integration. This matches the standard “tool calling flow” where the model emits a call and your application executes it.
It also aligns with repeated critiques that agents do poorly when forced to chain many low-level API endpoints; putting the composition burden in your code can reduce tool calls and context pollution.
Wrapping APIs in MCP servers tends to make sense when:
You want one connector surface that multiple hosts/agent frameworks can reuse, you want to standardize discovery/metadata/security policies around the integration, or you need to expose not just actions but also resources/prompts in a consistent way.
The empirical AutoMCP work suggests there is real engineering cost in manual MCP development (boilerplate, low-churn single-maintainer implementations), which is why MCP server generation from OpenAPI is attractive; but it also highlights that spec quality becomes a limiting factor.
A practical “middle way” that shows up across practitioner guidance is: use OpenAPI-to-MCP to bootstrap, then curate into intent-level tools and implement selective exposure so the agent doesn’t ingest thousands of endpoints.
Describing API usage in SKILL.md (or skills in general) tends to make sense when:
You have a shell/CLI-capable agent environment and the integration is best expressed as a workflow (“run these commands; parse results; apply policy checks”), especially when you can hide heavy logic in scripts whose outputs (not source code) enter context.
This can be extremely token-efficient for broad competencies because only metadata is always loaded and detailed instructions/scripts are pulled in on demand.
But the security posture is fundamentally different: skills can become a supply-chain and shell-execution risk surface, and recent ecosystem scanning suggests a non-trivial rate of vulnerable or malicious skills in public registries.
So, skills are most appropriate when you can enforce provenance and sandboxing (enterprise policy, curated registries, least privilege), not as an unvetted “download and run” ecosystem.
The “A2A + MCP + tools/skills” architecture that actually scales
Many sources converge on a layered architecture:
- Use A2A to find and coordinate specialized agents (task delegation, updates, artifacts).
- Inside each specialized agent, use MCP (or native tool calling) to access external systems with standardized controls.
- Use tool search / deferred loading and progressive disclosure (skills) to keep context lean as the integration surface grows.
This is also consistent with A2A’s own reference materials, which explicitly teach A2A–MCP complementarity and emphasize preserving opacity (agents collaborate without exposing internal tools/state).