January 31, 2026
Agentic Frameworks: A First-Principles Deep Dive
Part 1: The 6 Core Abstractions Every Framework Provides
An LLM alone is a stateless function: f(prompt) → text. Agentic frameworks exist to turn this into a stateful, tool-using, goal-pursuing loop. Every framework — LangGraph, CrewAI, AutoGen, Semantic Kernel, OpenAI Agents SDK, etc. — essentially provides some combination of the same ~6 core abstractions.
1. The Agent Loop (Reasoning-Action Cycle)
This is the foundational primitive. At its core:
```python
while not done:
    observation = perceive(environment)
    thought = llm(system_prompt + memory + observation)
    action = parse_action(thought)
    result = execute(action)
    memory.append((observation, thought, action, result))
```
This is the ReAct pattern (Reason + Act). Every framework implements some variant. The differences are in how much structure they impose on the loop — LangGraph gives you a full state machine with explicit edges, while lighter approaches like OpenAI's Agents SDK keep it more implicit.
Tradeoff: More structure = more predictable behavior but less flexibility. A rigid graph is easier to debug but harder to adapt to unexpected situations. A free-form loop is more capable but harder to make reliable.
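To make the pseudocode concrete, here is a minimal runnable sketch of the loop with a stub standing in for the model call. `fake_llm`, `parse_action`, and the tool table are illustrative, not any framework's API:

```python
# Minimal ReAct-style loop; fake_llm stands in for a real model call.
TOOLS = {
    "add": lambda a, b: a + b,
}

def fake_llm(transcript: str) -> str:
    # A real model would reason over the transcript; this stub emits
    # one tool call, then finishes once it sees a tool result.
    if "Result:" in transcript:
        return "FINAL: 5"
    return "ACTION: add 2 3"

def parse_action(thought: str):
    if thought.startswith("FINAL:"):
        return ("final", thought.removeprefix("FINAL:").strip())
    _, name, *args = thought.split()
    return (name, [int(a) for a in args])

def run_agent(goal: str, max_steps: int = 5) -> str:
    memory = [f"Goal: {goal}"]
    for _ in range(max_steps):            # always bound the loop
        thought = fake_llm("\n".join(memory))
        action, payload = parse_action(thought)
        if action == "final":
            return payload
        result = TOOLS[action](*payload)  # execute the chosen tool
        memory.append(f"Result: {result}")
    raise RuntimeError("step budget exhausted")

print(run_agent("add 2 and 3"))  # → 5
```

Swapping `fake_llm` for a real completion call and `parse_action` for the provider's tool-call parsing is, structurally, all a framework's inner loop does.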
2. Tool / Function Abstraction
A tool is a function with a schema the LLM can understand:
```
Tool = {
    name: string,
    description: string,     # natural language for the LLM
    parameters: JSONSchema,  # structured input spec
    execute: (params) → result  # actual implementation
}
```
The framework handles: serializing the schema into the prompt or function-calling API, parsing the LLM's output into a valid function call, executing it, and feeding the result back. This is largely commoditized now that most model providers support native tool calling.
Tradeoff: Rich tool descriptions improve selection accuracy but consume context window. Too many tools degrade selection performance — empirically, accuracy drops significantly beyond ~15-20 tools, which is why some frameworks introduce tool routing or hierarchical tool selection.
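That record can be sketched in plain Python. The parameter spec follows JSON Schema, but the `Tool` class and registry here are illustrative, not any framework's types:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str          # natural language, read by the LLM
    parameters: dict          # JSON Schema for the input
    execute: Callable[..., Any]

get_weather = Tool(
    name="get_weather",
    description="Return the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    execute=lambda city: f"Sunny in {city}",
)

# The framework's job: match the model's emitted call to a registry
# entry and invoke it with the parsed arguments.
registry = {get_weather.name: get_weather}
call = {"name": "get_weather", "arguments": {"city": "Paris"}}
result = registry[call["name"]].execute(**call["arguments"])
print(result)  # → Sunny in Paris
```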
3. Memory / State Management
This is where frameworks diverge most. There are three layers:
- Working memory — the current context window. Trivial but bounded by context length.
- Short-term memory — conversation history management. The key question: when context gets too long, what do you summarize/drop? This is a lossy compression problem with no clean solution.
- Long-term memory — persisted knowledge across sessions. Vector stores, knowledge graphs, structured databases.
The math matters here. If your context window is C tokens and each turn averages t tokens, you get roughly C/t turns before you must evict. Summarization compresses at some ratio r, but every compression loses signal. The real engineering challenge is the curation policy: what's worth remembering vs. forgetting.
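The arithmetic above can be checked in a few lines; the window size, per-turn cost, and compression ratio below are illustrative numbers, not measurements:

```python
# Back-of-envelope context budgeting: a C-token window, t tokens per
# turn, and summarization compressing the window by ratio r.
def turns_before_eviction(C: int, t: int) -> int:
    return C // t

def turns_after_one_compression(C: int, t: int, r: float) -> int:
    # Compress the full window once, then keep filling the freed space.
    compressed = int(C * r)
    return turns_before_eviction(C, t) + (C - compressed) // t

print(turns_before_eviction(128_000, 2_000))             # → 64
print(turns_after_one_compression(128_000, 2_000, 0.1))  # → 121
```

Each compression buys roughly `(1 − r) · C / t` more turns, but the summarized turns are now lossy — which is why the curation policy, not the arithmetic, is the hard part.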
Tradeoff: Richer memory = better coherence over long tasks, but higher latency per step (retrieval cost), more tokens consumed ($$), and more failure modes (stale or contradictory memories).
4. Orchestration / Multi-Agent Coordination
When you have multiple agents, you need a coordination pattern:
- Sequential/Pipeline — agent A's output feeds agent B. Simple, predictable.
- Hierarchical — a "manager" agent delegates to specialist agents. Mimics org charts.
- Collaborative/Debate — agents critique each other's work. Can improve quality but multiplies cost.
- Graph-based — arbitrary DAG of agent interactions (LangGraph's core selling point).
First-principles question: do you actually need multiple agents? Often a single agent with good tools and prompting outperforms a multi-agent setup. Multi-agent adds coordination overhead, error propagation between agents, and debugging complexity.
Tradeoff: Multi-agent can decompose complex tasks and enable specialization, but each agent-to-agent handoff is a lossy communication channel (natural language). You're trading compute cost and latency for potential quality gains that are highly task-dependent.
5. Planning / Task Decomposition
Some frameworks add explicit planning:
```python
plan = llm("Break this goal into subtasks: " + goal)
for subtask in plan:
    result = agent.execute(subtask)
    if needs_replan(result):
        plan = revise_plan(plan, result)
```
This ranges from simple prompt-based decomposition to tree search (Tree of Thoughts), to full MCTS-style exploration. The sophistication varies enormously.
Tradeoff: Explicit planning helps on complex multi-step tasks but adds latency (extra LLM calls), can over-decompose simple tasks, and the plan itself can be wrong — leading to confidently executing the wrong steps. "Plan and execute" vs. "just start doing" is genuinely task-dependent.
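A runnable sketch of the plan-and-execute loop, with stub functions standing in for the LLM calls (`make_plan` and `execute` are illustrative; the replanning here is the simplest possible policy — retry the failed step):

```python
# Plan-and-execute with replanning; stubs replace the LLM calls.
def make_plan(goal: str) -> list[str]:
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

def execute(subtask: str) -> dict:
    # Simulate a transient failure on the draft step, first time only.
    ok = not subtask.startswith("draft") or execute.retried
    if subtask.startswith("draft"):
        execute.retried = True
    return {"subtask": subtask, "ok": ok}
execute.retried = False

def run_plan(goal: str) -> list[dict]:
    plan, done = make_plan(goal), []
    while plan:
        result = execute(plan.pop(0))
        done.append(result)
        if not result["ok"]:
            plan.insert(0, result["subtask"])  # replan: retry failed step
    return done

history = run_plan("the article")
print([r["ok"] for r in history])  # → [True, False, True, True]
```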
6. Guardrails / Control Flow
The safety and reliability layer:
- Input/output validation — schema checking, content filtering
- Human-in-the-loop gates — pause for approval before irreversible actions
- Retry/fallback logic — handle tool failures, malformed outputs
- Budget constraints — max iterations, token limits, cost caps
This is often underweighted in framework marketing but critical in production. An agent without guardrails is a while True loop with your credit card attached.
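These constraints compose naturally as a wrapper around the loop. A minimal sketch — the `step` contract (returning a done-flag and a per-step cost) and the budget figures are illustrative:

```python
# Budget guardrail around an arbitrary step function: caps iterations
# and cumulative cost, retries transient failures.
class BudgetExceeded(Exception):
    pass

def guarded_run(step, max_iters=10, max_cost=1.0, max_retries=2):
    cost = 0.0
    for _ in range(max_iters):
        for attempt in range(max_retries + 1):
            try:
                done, step_cost = step()
                break
            except RuntimeError:          # transient failure: retry
                if attempt == max_retries:
                    raise
        cost += step_cost
        if cost > max_cost:
            raise BudgetExceeded(f"spent {cost:.2f}")
        if done:
            return cost
    raise BudgetExceeded("iteration cap hit")

# A step that fails once, then finishes on its next successful call.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient tool failure")
    return calls["n"] >= 3, 0.1   # (done?, cost of this step)

print(guarded_run(flaky_step))  # → 0.2
```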
What Frameworks Actually Differ On
| Dimension | Lightweight (Agents SDK, Smolagents) | Heavyweight (LangGraph, CrewAI) |
|---|---|---|
| Loop structure | Implicit, model-driven | Explicit graphs/workflows |
| Multi-agent | Optional, simple handoffs | First-class, structured |
| State management | Minimal, bring your own | Built-in persistence |
| Learning curve | Low | High |
| Debuggability | Harder (implicit flow) | Easier (visible graph) |
| Flexibility | High | Constrained by framework |
Alternatives to Consider
Before adopting any framework, ask: what if I just wrote the loop myself? A basic agent is ~50 lines of code:
- Call the model with tools
- If it returns a tool call, execute it, feed result back
- If it returns text, you're done
- Add retry logic and a max-iteration cap
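The four bullets above fit comfortably in a page of Python. A sketch with a stub in place of a real chat-completions call — `fake_model` and `lookup_gdp` are illustrative, and a real implementation would hit a provider API:

```python
import json

# Hand-rolled tool-calling loop: call the model, dispatch tool calls,
# stop on plain text, cap iterations.
def fake_model(messages: list[dict]) -> dict:
    # Stands in for a chat-completions call with tools attached.
    if any(m["role"] == "tool" for m in messages):
        return {"text": "France's GDP is about $3 trillion."}
    return {"tool_call": {"name": "lookup_gdp",
                          "arguments": json.dumps({"country": "France"})}}

TOOLS = {"lookup_gdp": lambda country: f"GDP({country}) ≈ $3T"}

def agent(prompt: str, max_iters: int = 8) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_iters):                 # iteration cap
        reply = fake_model(messages)
        if "text" in reply:                    # plain text → done
            return reply["text"]
        call = reply["tool_call"]              # tool call → execute it
        args = json.loads(call["arguments"])
        result = TOOLS[call["name"]](**args)
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("max iterations reached")

print(agent("What's the GDP of France?"))
```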
Frameworks earn their keep when you need persistent state across sessions, complex multi-agent coordination, observability/tracing, or human-in-the-loop workflows. If you don't need those, the framework is overhead you're paying for in complexity, abstraction leakage, and version churn.
The honest assessment: the agentic framework space is still immature and churning fast. The abstractions aren't fully settled yet. Betting heavily on any single framework's specific API is risky — betting on understanding the underlying patterns is not.
Part 2: Top 5 Frameworks Deep Dive
The Contenders
Based on current adoption, momentum, architectural distinctiveness, and production readiness:
| # | Framework | v1.0 / Stable | Philosophy |
|---|---|---|---|
| 1 | LangGraph | v1.0 (Oct 2025) | Graph-based agent runtime — maximum control |
| 2 | OpenAI Agents SDK | Stable (Mar 2025) | Minimalist primitives — fewest abstractions |
| 3 | CrewAI | Stable | Role-based multi-agent — team metaphor |
| 4 | PydanticAI | v1.0 (Sep 2025) | Type-safe, FastAPI-inspired — validation-first |
| 5 | Google ADK | v0.5 (2025) | Event-driven, multi-language — Google-ecosystem-optimized |
Why these 5? AutoGen (Microsoft) is notable but has fragmented between versions (0.2 → 0.4 rewrite → AG2 fork), creating ecosystem confusion. These 5 represent the clearest, most architecturally distinct approaches with active momentum going into 2026.
Dimension 1: The Agent Loop (Reasoning-Action Cycle)
This is the most fundamental question: how does the framework structure the think → act → observe cycle?
LangGraph: Explicit State Machine
LangGraph gives you the most control. The loop is a directed graph you define explicitly:
```python
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]
    next_action: str

graph = StateGraph(State)
graph.add_node("reason", reason_node)
graph.add_node("act", act_node)
graph.add_node("evaluate", evaluate_node)
graph.add_edge(START, "reason")
graph.add_conditional_edges("reason", route_decision)
graph.add_edge("act", "evaluate")
graph.add_conditional_edges("evaluate", should_continue)
app = graph.compile()
```
You choose exactly which nodes execute, in what order, under what conditions. The execution model is inspired by Google's Pregel (bulk synchronous parallel): nodes fire in "super-steps," each processing messages from the previous step.
The math: Your agent's behavior is a function f: State × Node → State applied iteratively. The graph topology constrains the iteration space. A recursion limit (default 1000 in v1.0.6) bounds total super-steps.
What this means in practice: You can implement ReAct, plan-and-execute, tree-of-thought, or any custom loop topology. The graph is the interface. But you must _design_ the graph — there's no default "just go do things" mode (though create_react_agent provides a one-liner for the standard ReAct pattern).
OpenAI Agents SDK: Implicit Model-Driven Loop
The SDK takes the opposite approach — the loop is hidden:
```python
from agents import Agent, Runner

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    tools=[search_tool, calc_tool],
)
result = await Runner.run(agent, "What's the GDP of France?")
```
The Runner internally runs a loop:
1. Send the prompt + tools to the LLM.
2. If the LLM returns a tool call → execute the tool → feed the result back → go to step 1.
3. If the LLM returns text → done.
That's it. The loop is the standard ReAct cycle with the model deciding when to stop. You don't define edges or nodes — the LLM _is_ the router.
What this means in practice: Fast to ship, easy to understand. But if you need the model to follow a strict multi-step workflow (e.g., "always validate before submitting"), you're encoding that in the prompt, not the framework. This works surprisingly well with capable models but gives you fewer structural guarantees.
CrewAI: Task-Driven Sequential/Hierarchical Loop
CrewAI structures the loop around tasks assigned to roles:
```python
from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Researcher", goal="Find accurate data", ...)
writer = Agent(role="Writer", goal="Write compelling content", ...)

research_task = Task(description="Research AI trends", agent=researcher)
write_task = Task(description="Write article", agent=writer)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,  # or Process.hierarchical
)
result = crew.kickoff()
```
The loop is implicit within each task (agent reasons + acts until task is complete), and explicit between tasks (sequential or managed by a "manager" agent in hierarchical mode).
What this means in practice: The abstraction is high-level — you think in roles and deliverables, not graph edges. Great for "assign work to specialists" patterns. Less suitable for fine-grained control over individual reasoning steps.
PydanticAI: Typed Agent Loop with Validation Gates
PydanticAI's loop is structurally similar to OpenAI's but wrapped in strict type validation:
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):
    name: str
    population: int
    country: str

agent = Agent(
    'openai:gpt-4o',
    result_type=CityInfo,
    system_prompt="Extract city information",
)
result = agent.run_sync("Tell me about Paris")
# result.data is guaranteed to be a valid CityInfo
```
The loop runs until the LLM produces output that passes Pydantic validation. If validation fails, the error is fed back to the LLM for self-correction. This is a validation-gated ReAct loop — the framework won't return until the output schema is satisfied.
The math: This adds a constraint satisfaction layer: the output space is restricted to {o ∈ LLM_outputs | validate(o, Schema) = True}. The retry mechanism is essentially rejection sampling with feedback.
Google ADK: Event-Driven Runtime
ADK structures everything as events flowing through an event loop:
```python
from google.adk.agents import Agent
from google.adk.tools import FunctionTool

def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 22°C"

weather_agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash",
    instruction="Help users check the weather",
    tools=[FunctionTool(get_weather)],
)
```
Internally, the runtime processes a stream of events: UserInput → ModelCall → ToolCall → ToolResult → ModelCall → FinalResponse. Each event is a discrete unit that can be inspected, logged, replayed. ADK also provides workflow agents (SequentialAgent, ParallelAgent, LoopAgent) for deterministic control flow alongside LLM-driven agents.
What this means in practice: The event-driven model is excellent for debugging and observability (you can inspect every event). The dual approach of LLM agents + workflow agents means you can mix deterministic and non-deterministic control flow in the same system.
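The event-loop shape can be sketched generically — the event names mirror the stream described above, but the queue, handlers, and log here are a toy model, not ADK's runtime:

```python
from collections import deque

# Event-driven runtime sketch: a queue of events, handlers that emit
# follow-up events, and a log recording everything for inspection/replay.
def model_handler(event: dict) -> dict:
    if event["type"] == "UserInput":
        return {"type": "ToolCall", "tool": "get_weather", "arg": "Paris"}
    return {"type": "FinalResponse", "text": "It's sunny in Paris."}

def tool_handler(event: dict) -> dict:
    return {"type": "ToolResult", "value": "Sunny, 22°C"}

HANDLERS = {"UserInput": model_handler, "ToolCall": tool_handler,
            "ToolResult": model_handler}

def run_turn(user_input: str) -> list[dict]:
    queue = deque([{"type": "UserInput", "text": user_input}])
    log = []
    while queue:
        event = queue.popleft()
        log.append(event)                 # every event is recorded
        handler = HANDLERS.get(event["type"])
        if handler:
            queue.append(handler(event))
    return log

log = run_turn("Weather in Paris?")
print([e["type"] for e in log])
# → ['UserInput', 'ToolCall', 'ToolResult', 'FinalResponse']
```

Because the log is just data, "inspect every event" and "replay a turn" fall out for free — which is the observability argument for this architecture.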
Comparative Assessment
| Aspect | LangGraph | OpenAI SDK | CrewAI | PydanticAI | Google ADK |
|---|---|---|---|---|---|
| Loop visibility | Fully explicit | Hidden | Task-level | Hidden + validation | Event stream |
| Custom topologies | Arbitrary graphs | Prompt-only | Sequential/Hierarchical | Linear + retry | LLM + Workflow agents |
| Default behavior | Must define | ReAct auto | Task execution | Validated ReAct | Event-driven ReAct |
| Best when | You need precise control | Standard patterns suffice | Role delegation works | Output correctness critical | Mixed deterministic + LLM |
First-principles verdict: The "right" loop structure depends on your failure mode tolerance. If wrong outputs are expensive (financial, safety), you want explicit graphs (LangGraph) or validation gates (PydanticAI). If speed-to-ship matters more and your model is capable, the implicit loops (OpenAI SDK, ADK) are pragmatic.
Dimension 2: Tool / Function Abstraction
LangGraph
Tools are LangChain Tool objects or plain functions decorated with @tool. LangGraph itself is tool-agnostic — it just executes nodes. Tools live inside nodes. MCP support via adapters that auto-discover and convert MCP tools to LangChain format.
Strength: Largest existing tool ecosystem via LangChain integrations. Weakness: The adapter layer for MCP adds abstraction cost vs. native MCP frameworks.
OpenAI Agents SDK
Tools are Python functions decorated with @function_tool, or MCP servers (hosted or local). Automatic schema generation. Minimal boilerplate.
Strength: Native MCP support (both hosted and local). Minimal boilerplate. Weakness: Best performance with OpenAI models; tool calling quality varies with other providers.
CrewAI
Tools inherit from BaseTool or use the @tool decorator. CrewAI also supports MCP servers in agent configuration.
Strength: Tools can be assigned per-agent, matching the role metaphor. Weakness: Tool ecosystem is smaller than LangChain's. Tool routing across agents can be opaque.
PydanticAI
Tools are plain Python functions with type annotations — Pydantic infers the schema automatically. The dependency injection via RunContext makes tools testable and type-safe. Native MCP support.
Strength: The dependency injection is genuinely elegant — it makes tools testable (you can mock deps) and type-safe. The type system catches schema mismatches at write time, not runtime. Weakness: Python-only for tool definitions.
Google ADK
Tools are FunctionTool wrappers, with built-in support for Google services, OpenAPI specs, and MCP. Also supports AgentTool — using another agent as a tool.
Strength: Rich pre-built tools for Google ecosystem. AgentTool concept is powerful. Multi-language support (Python, TypeScript, Go, Java). Weakness: Non-Google integrations require more work.
Comparative Assessment
| Aspect | LangGraph | OpenAI SDK | CrewAI | PydanticAI | Google ADK |
|---|---|---|---|---|---|
| Schema generation | Manual + decorator | Auto from types | Decorator | Auto from types (best) | Auto from types |
| MCP support | Adapter layer | Native (hosted + local) | Config-based | Native | Native |
| Testability | Standard | Standard | Standard | Excellent (DI) | Good |
| Pre-built tools | Largest (LangChain) | OpenAI built-ins | Moderate | Minimal | Google ecosystem |
| Multi-language | Python, JS | Python, TypeScript | Python | Python | Python, TS, Go, Java |
First-principles verdict: Tool abstraction is largely commoditized — they all wrap functions with schemas. The real differentiators are: PydanticAI's dependency injection (testability), Google ADK's multi-language breadth, and LangChain's existing integration ecosystem. MCP is converging as the universal protocol, which will further commoditize this layer.
Dimension 3: Memory / State Management
This is where frameworks diverge most significantly.
LangGraph: First-Class State with Checkpointing
LangGraph's state management is its defining feature. State is a typed schema that flows through the graph. Every node reads and writes to it. Reducers control how updates merge:
```python
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # append reducer
    documents: list[str]                     # overwrite reducer (default)
    iteration_count: int
```
Checkpointing saves state at every super-step to a persistence backend (SQLite, Postgres, etc.). This enables durable execution, time travel, and human-in-the-loop pauses.
The math: State transitions are S_{t+1} = reducer(S_t, node_output). Checkpointing creates a DAG of states. Time-travel is graph traversal on this DAG. The cost is O(n) storage per super-step × state size.
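The mechanism can be sketched independently of LangGraph — a toy `Checkpointer` (not the library's API) that snapshots state after every super-step; a list models the simple linear case, and forking from a restored state is what produces the DAG:

```python
import copy

class Checkpointer:
    def __init__(self):
        self.history = []

    def save(self, state: dict) -> int:
        # Deep-copy so later mutation can't corrupt old snapshots.
        self.history.append(copy.deepcopy(state))
        return len(self.history) - 1       # checkpoint id

    def restore(self, checkpoint_id: int) -> dict:
        return copy.deepcopy(self.history[checkpoint_id])

ckpt = Checkpointer()
state = {"messages": [], "iteration_count": 0}
for step in range(3):                      # three super-steps
    state["messages"].append(f"step-{step}")
    state["iteration_count"] += 1
    ckpt.save(state)

rewound = ckpt.restore(0)                  # time-travel back to step 0
print(rewound)  # → {'messages': ['step-0'], 'iteration_count': 1}
```

The deep copies are exactly the O(n) storage cost noted above — one full state per super-step.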
Tradeoff: Most powerful state management of any framework, but also most complex. You need to design your state schema, choose reducers, configure checkpointers.
OpenAI Agents SDK: Session-Level, BYO Long-Term
The SDK provides Sessions for automatic conversation history. Long-term memory is not included — you integrate external solutions (Mem0, your own vector store, etc.).
Tradeoff: Clean and simple for conversational agents. But durable execution, state rollback, or cross-session memory are your responsibility.
CrewAI: Memory as a Feature Set
CrewAI provides multiple memory types as configuration: short-term, long-term (persisted across crew runs via embeddings), entity memory (structured knowledge about entities), and user memory (per-user preferences).
Tradeoff: Most "batteries-included" memory system. But you don't control the curation policy (what gets stored, what gets forgotten).
PydanticAI: Message History + Deps for State
PydanticAI manages conversation history via message_history and delegates broader state to the dependency injection system.
Tradeoff: Clean, explicit, type-safe. But memory management is your responsibility — no built-in long-term memory, embeddings, or curation.
Google ADK: Session + Memory Services
ADK separates state into two layers: Session state (per-session, managed by SessionService) and Memory service (cross-session recall, managed by MemoryService). Also supports session rewind — rolling back to before a previous invocation.
Tradeoff: Good built-in support, especially within Google Cloud. But full memory service capabilities depend on the Google ecosystem.
Comparative Assessment
| Aspect | LangGraph | OpenAI SDK | CrewAI | PydanticAI | Google ADK |
|---|---|---|---|---|---|
| Working memory | State schema + reducers | Auto session | Per-crew context | Message history | Session state |
| Short-term (conversation) | Built-in via messages | Sessions | Built-in | Manual message passing | SessionService |
| Long-term (cross-session) | Persistent checkpointers | BYO | Built-in (entity, user) | BYO via deps | MemoryService |
| Durable execution | ✅ Checkpointing | ❌ | ❌ | ❌ (external via Temporal) | Partial |
| Time travel / rollback | ✅ Full DAG | ❌ | ❌ | ❌ | ✅ Session rewind |
| State complexity | High (you design it) | Low | Medium (config-based) | Medium (DI-based) | Medium |
First-principles verdict: LangGraph is the clear winner if state management is your core requirement. CrewAI is easiest to "just turn on." PydanticAI and OpenAI SDK are honest about what they don't provide, which is respectable.
Dimension 4: Orchestration / Multi-Agent Coordination
LangGraph: Arbitrary Graph Topologies
Agents are nodes. Coordination is edges. Supports sequential, parallel (via parallel super-steps), hierarchical, cyclic, and any custom topology. Sub-graphs can be composed into larger graphs.
Strength: Maximum flexibility. If you can draw the coordination pattern, you can build it. Weakness: You _must_ draw it. No sensible defaults. Graph design is a skill.
OpenAI Agents SDK: Handoffs
Multi-agent coordination via a single primitive: handoffs. An agent can transfer control to another agent. The receiving agent takes over completely.
Strength: Dead simple. One concept to learn. Weakness: Only supports delegation/transfer patterns. No parallel execution, no debate, no complex coordination topologies.
CrewAI: Role-Based Teams
Two modes: sequential (assembly line) and hierarchical (manager delegates and reviews).
Strength: The role/team metaphor is intuitive. Good for decomposable workflows. Weakness: Limited to these two coordination patterns. Can't do arbitrary topologies without workarounds.
PydanticAI: Agent-as-Tool Delegation
No native multi-agent orchestration. You compose agents by using one agent as a dependency or tool for another.
Strength: Explicit, type-safe, composable. No magic. Weakness: No built-in coordination patterns. Multi-agent workflows are hand-rolled.
Google ADK: Hierarchical Agents + Workflow Agents
Structured multi-agent via agent hierarchies and workflow agents (SequentialAgent, ParallelAgent, LoopAgent). Also supports AgentTool and the A2A protocol for cross-framework interoperability.
Strength: Clean separation of deterministic and dynamic orchestration. A2A protocol is forward-looking. Weakness: Still v0.5 — solid primitives but younger ecosystem.
Comparative Assessment
| Pattern | LangGraph | OpenAI SDK | CrewAI | PydanticAI | Google ADK |
|---|---|---|---|---|---|
| Sequential | ✅ | Via handoffs | ✅ Native | Manual | ✅ SequentialAgent |
| Parallel | ✅ Super-steps | ❌ | ❌ | ❌ | ✅ ParallelAgent |
| Hierarchical | ✅ Sub-graphs | Handoff chains | ✅ Native | Manual | ✅ Agent hierarchy |
| Cyclic / Debate | ✅ | ❌ | ❌ | Manual | ✅ LoopAgent |
| Custom topology | ✅ Arbitrary | ❌ | ❌ | Manual | Workflow + LLM combo |
| Cross-framework | ❌ | ❌ | ❌ | A2A support | A2A protocol |
First-principles verdict: Do you actually need multi-agent? If yes, LangGraph gives you the most expressive power. CrewAI gives you the fastest path to role-based delegation. Google ADK's mix of deterministic + LLM orchestration is architecturally the most interesting.
Dimension 5: Planning / Task Decomposition
| Framework | Approach | Verdict |
|---|---|---|
| LangGraph | No built-in. Implement as graph nodes. Can build plan-and-execute, tree-of-thought, MCTS as topology. | Maximum flexibility, zero built-in convenience. |
| OpenAI SDK | No explicit planning. Relies on model's internal chain-of-thought. | Works well with strong reasoning models; fragile with weaker ones. |
| CrewAI | Task lists _are_ the plan. In hierarchical mode, the manager does dynamic planning. | Good for known workflows; less flexible for dynamic decomposition. |
| PydanticAI | No built-in. Structured outputs can encode plans as typed objects. | You can build planning, but the framework doesn't provide it. |
| Google ADK | Workflow agents provide deterministic planning. LLM agents do dynamic planning. | The static/dynamic split is pragmatic. Pre-plan the structure, let LLMs handle details. |
Dimension 6: Guardrails / Control Flow
| Guardrail Type | LangGraph | OpenAI SDK | CrewAI | PydanticAI | Google ADK |
|---|---|---|---|---|---|
| Output schema enforcement | Manual | Manual | Basic | Best-in-class | Good |
| Input validation | Node-based | First-class | Basic | Via deps | Callbacks |
| HITL interrupts | Best-in-class | ❌ | Per-task | ❌ | Tool confirmation |
| Cost/token limits | Via LangSmith | Max turns | Max iterations | Usage tracking | Via Cloud |
| Observability | LangSmith | Built-in tracing | Basic logging | Logfire integration | Dev UI + Cloud |
| Evaluation tooling | Via LangSmith | Via OpenAI evals | ❌ | pydantic_evals | Built-in CLI |
Summary: When to Use What
Choose LangGraph when: You need durable execution, human-in-the-loop as a core requirement, custom coordination topologies, long-running agents (hours/days), or time-travel debugging. Cost: Steep learning curve. Graph design is a new skill.
Choose OpenAI Agents SDK when: You want to ship fast with minimal abstractions, standard ReAct + handoff patterns suffice, and you're primarily using OpenAI models. Cost: You'll build everything beyond the basics yourself.
Choose CrewAI when: Your problem maps naturally to roles and tasks, you want multi-agent out of the box, and built-in memory matters. Cost: Less control over individual agent behavior.
Choose PydanticAI when: Output correctness is your #1 priority, you want type safety and IDE support, and you value testability. Cost: Limited orchestration — you build coordination yourself.
Choose Google ADK when: You're in the Google Cloud ecosystem, want multi-language support, the deterministic + LLM agent split matches your architecture, and A2A interoperability matters. Cost: Still pre-1.0. Google-ecosystem bias.
The Honest Meta-Assessment
None of these frameworks have settled on truly stable abstractions yet. Some patterns are emerging:
- MCP is winning as the universal tool protocol. All 5 now support it.
- The "just write the loop" approach (OpenAI SDK, PydanticAI) is gaining ground as models get more capable.
- State management remains the hardest unsolved problem. LangGraph's checkpointing is the most mature but also most complex.
- Multi-agent is oversold. Most production systems use 1-3 agents with good tools, not swarms of 10+ specialists.
- Interoperability (A2A, MCP, AG-UI) is the real next frontier.
The safest strategy: understand the primitives, write your core agent logic in a way that's framework-portable, and treat the framework as the deployment/infrastructure layer rather than the business logic layer.