How to Write Your First AI Agent in One Evening

You use ChatGPT every day. You paste text in, get text back, feel productive. But every tech conference in 2026 keeps throwing around one word: agents. Your PM says "we need an agent." Your CTO says "agents are the future." LinkedIn is drowning in agent thinkpieces. And you're sitting there thinking: "I don't even know what that means."

Here's the gap. A chatbot waits for you to type and types back, like texting a smart friend. An AI agent is different. An agent has a goal, picks its own tools, handles errors, and keeps working until the job is done or it decides the job can't be done. Nobody holds its hand. It's the difference between asking someone a question and hiring someone to do a job.

Tonight, you close that gap. You'll build a real agent — in Python, from scratch, no frameworks — that searches the web, analyzes information, makes decisions, and saves a research report to disk. By bedtime, you'll understand the exact pattern that powers Claude Code, Codex, Devin, and every other agent product charging $200/month. The pattern itself takes about 30 lines.

What we're building

A Research Agent that:

Takes a topic from you
Searches the web for relevant information
Reads and analyzes what it finds
Writes a structured research summary
Saves the result to a file

This isn't a toy demo. This is the same architecture used by production agents — tool use, reasoning loops, structured output. The only difference between this and a "production agent" is error handling and scale.

Step 1: Set up the project (10 minutes)

mkdir research-agent && cd research-agent
python3 -m venv venv
source venv/bin/activate

pip install anthropic httpx

Two dependencies:

anthropic — the Python SDK (software development kit — a pre-built library for talking to Claude's API)
httpx — for making web requests from Python

You also need an Anthropic API key — essentially a password that lets your code talk to Claude. Grab one at console.anthropic.com. New accounts get $5 in free credits, which is enough to run this agent hundreds of times.

export ANTHROPIC_API_KEY=sk-ant-...
touch agent.py
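
Before going further, it can save a confusing stack trace later to confirm that Python can actually see the key. A minimal sketch, assuming the key lives in the ANTHROPIC_API_KEY environment variable as exported above; the helper name api_key_is_set is made up for this example:

```python
import os


def api_key_is_set(env=None) -> bool:
    """Return True if ANTHROPIC_API_KEY is present and non-empty.

    `env` defaults to the real environment; passing a plain dict makes
    the check easy to try out without touching your shell.
    """
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if api_key_is_set():
    print("API key found.")
else:
    print("ANTHROPIC_API_KEY is not set. Run the export command first.")
```

If this prints the second message in a fresh terminal, remember that export only applies to the shell session where you ran it.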

Step 2: Define the tools (15 minutes)

An agent without tools is just a chatbot. Tools are functions that let the agent interact with the real world — search the web, read files, call APIs (how programs yell at each other across the internet — think machine-to-machine texting).

We'll give our agent two tools:

# agent.py

import anthropic
import httpx
import json
import os
from datetime import datetime

client = anthropic.Anthropic()
MODEL = "claude-haiku-4-5"  # Model IDs use hyphens, not dots

# Tool definitions — these tell Claude what's available
tools = [
    {
        "name": "web_search",
        "description": "Search the web for information on a topic. Returns results with titles, URLs, and snippets.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "save_file",
        "description": "Save text content to a file on disk.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": "Name of the file to save"
                },
                "content": {
                    "type": "string",
                    "description": "Content to write to the file"
                }
            },
            "required": ["filename", "content"]
        }
    }
]

These definitions work like a restaurant menu. Claude reads the descriptions and decides when to use each tool — you don't hard-code the order. The input_schema part uses JSON Schema — a standard format for describing what data looks like, so Claude knows exactly what parameters each tool expects. Yes, you describe your data format using yet another data format. Welcome to programming.
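
To make the "required" part of the schema concrete, here is a small sketch that checks a tool input against a definition before running anything. It only looks at the required list, not types, and the helper name missing_required is an assumption for illustration, not part of the Anthropic SDK:

```python
def missing_required(tool_def: dict, tool_input: dict) -> list:
    """Return the required parameter names that tool_input lacks."""
    required = tool_def["input_schema"].get("required", [])
    return [name for name in required if name not in tool_input]


# A trimmed copy of the save_file definition from above.
save_file_def = {
    "name": "save_file",
    "input_schema": {
        "type": "object",
        "properties": {
            "filename": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["filename", "content"],
    },
}

print(missing_required(save_file_def, {"filename": "notes.md"}))
# prints ['content']
```

A check like this is optional, but it turns a KeyError deep inside a tool into a clear message you can send back to the model.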

Step 3: Implement the tools (15 minutes)

Definitions tell Claude what exists. Now we write the code that actually runs when Claude calls a tool. This is where the rubber meets the road — or, more accurately, where your beautiful abstractions meet the ugly reality of parsing HTML like it's 2003:

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name == "web_search":
        return do_web_search(tool_input["query"])
    elif tool_name == "save_file":
        return do_save_file(tool_input["filename"], tool_input["content"])
    else:
        return f"Error: Unknown tool '{tool_name}'"


def do_web_search(query: str) -> str:
    """Search using DuckDuckGo's HTML endpoint. No API key needed."""
    try:
        response = httpx.get(
            "https://html.duckduckgo.com/html/",
            params={"q": query},
            headers={"User-Agent": "ResearchAgent/1.0"},
            timeout=10.0,
        )
        response.raise_for_status()

        text = response.text
        results = []
        parts = text.split('class="result__snippet"')
        for part in parts[1:6]:  # Grab up to 5 results
            snippet_end = part.find("</a>")
            if snippet_end > 0:
                snippet = part[:snippet_end]
                clean = snippet.replace("<b>", "").replace("</b>", "")
                clean = clean.split(">")[-1] if ">" in clean else clean
                if clean.strip():
                    results.append(clean.strip())

        if results:
            return "Search results:\n" + "\n".join(
                f"- {r}" for r in results
            )
        return f"Search completed but no clear results for: {query}"

    except Exception as e:
        return f"Search error: {str(e)}"


def do_save_file(filename: str, content: str) -> str:
    """Save content to the output directory."""
    os.makedirs("output", exist_ok=True)
    # Keep only the final path component so the model cannot write
    # outside output/ (for example with "../agent.py").
    filepath = os.path.join("output", os.path.basename(filename))
    try:
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(content)
        return f"File saved successfully: {filepath}"
    except Exception as e:
        return f"Error saving file: {str(e)}"

The web search uses DuckDuckGo's HTML endpoint — no API key, no signup, no cost. The HTML parsing is held together with duct tape and optimism (we're scraping raw page markup rather than using a proper data feed), but it works. For production, you'd swap in Brave Search API (2,000 free queries/month) or self-hosted SearXNG.
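
If the string splitting feels too fragile, one possible alternative is Python's built-in html.parser. The sketch below assumes, as the code above does, that each snippet sits inside a link with the class result__snippet; that markup can change at any time, and the names SnippetParser and extract_snippets are made up for this example:

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collect the text inside <a class="result__snippet"> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.snippets = []
        self._buffer = None  # None means "not inside a snippet link"

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "result__snippet" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.snippets.append(text)
            self._buffer = None


def extract_snippets(page: str, limit: int = 5) -> list:
    """Return up to `limit` snippet strings found in the page markup."""
    parser = SnippetParser()
    parser.feed(page)
    parser.close()
    return parser.snippets[:limit]


sample = (
    '<div><a class="result__snippet" href="https://example.com">'
    "Agents &amp; <b>tools</b> explained</a></div>"
)
print(extract_snippets(sample))
# prints ['Agents & tools explained']
```

Unlike the split-based version, this also decodes HTML entities such as &amp; and ignores nested tags like <b> without special cases.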

Step 4: Build the agent loop (20 minutes)

This is the heart of the whole thing. Every agent product with a flashy landing page and a $50M valuation runs some version of these 30 lines:

def run_agent(topic: str, max_turns: int = 10) -> str:
    """Run the research agent on a topic."""
    print(f"\n{'='*60}")
    print(f"Research Agent — Topic: {topic}")
    print(f"{'='*60}\n")

    system_prompt = """You are a research agent. Your job is to research a topic
thoroughly and produce a well-structured summary.

Your process:
1. Search for information on the topic (multiple searches with different angles)
2. Analyze what you find
3. Write a comprehensive research summary
4. Save the summary to a file

Be thorough — do at least 3 different searches to cover the topic well.
Be critical — evaluate sources and note conflicting information.
When done, save the final summary as a markdown file.

Current date: """ + datetime.now().strftime("%Y-%m-%d")

    messages = [
        {
            "role": "user",
            "content": f"Research this topic and produce a detailed summary: {topic}"
        }
    ]

    # The agent loop
    for turn in range(max_turns):
        print(f"--- Turn {turn + 1} ---")

        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )

        print(f"Stop reason: {response.stop_reason}")

        if response.stop_reason == "tool_use":
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input
                    tool_id = block.id

                    print(f"  Tool: {tool_name}")
                    print(f"  Input: {json.dumps(tool_input, indent=2)[:200]}")

                    result = execute_tool(tool_name, tool_input)
                    print(f"  Result: {result[:200]}...")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_id,
                        "content": result,
                    })

            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

        elif response.stop_reason == "end_turn":
            final_text = ""
            for block in response.content:
                if hasattr(block, "text"):
                    final_text += block.text

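            print(f"\nAgent completed in {turn + 1} turns.")
            return final_text

        else:
            # Any other stop reason (for example "max_tokens") is not
            # handled by this loop. Stop here instead of resending the
            # same messages on every remaining turn.
            return f"Agent stopped early: {response.stop_reason}"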

    return "Agent did not complete within turn limit."

Let me break down this loop, because it's the entire magic trick:

Send the task to Claude along with the tool definitions
Claude thinks and decides to either use a tool or respond with final text
If tool_use — we execute the tool and send the result back as a new message
Claude sees the result and decides the next move
Repeat until Claude says end_turn — meaning it's done

The critical insight: your Python never hard-codes "search first, then analyze, then write." The system prompt describes a process, but Claude chooses each tool call and its arguments turn by turn. That's what separates an agent from a script. A script follows your instructions. An agent follows its own.

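The stop_reason field is key. The two values this loop cares about are "tool_use" (I want to call a tool) and "end_turn" (I'm finished). The API can also return other values, such as "max_tokens" when a response hits the token limit, so a robust loop should treat anything else as a signal to stop rather than go around again.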
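
To watch the control flow without spending API credits, the same loop can be run against a scripted stand-in for the model. This is a sketch under stated assumptions: FakeBlock, FakeResponse, and run_loop are hypothetical names, and the fake objects only mimic the few fields the loop reads (stop_reason, content, and each block's type, name, input, id, and text):

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlock:
    type: str
    text: str = ""
    name: str = ""
    input: dict = field(default_factory=dict)
    id: str = ""


@dataclass
class FakeResponse:
    stop_reason: str
    content: list


def run_loop(call_model, execute_tool, task: str, max_turns: int = 10) -> str:
    """Same control flow as run_agent, with the model passed in as a function."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        response = call_model(messages)
        if response.stop_reason == "tool_use":
            results = [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": execute_tool(block.name, block.input),
                }
                for block in response.content
                if block.type == "tool_use"
            ]
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": results})
        elif response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        else:
            # For example "max_tokens": stop rather than resend the same messages.
            return f"Stopped early: {response.stop_reason}"
    return "Turn limit reached."


# A scripted stand-in for Claude: one tool call, then a final answer.
script = iter([
    FakeResponse("tool_use", [FakeBlock("tool_use", name="web_search",
                                        input={"query": "agents"}, id="t1")]),
    FakeResponse("end_turn", [FakeBlock("text", text="Done: 1 search.")]),
])

answer = run_loop(
    call_model=lambda messages: next(script),
    execute_tool=lambda name, args: f"stub result for {name}",
    task="Research agents",
)
print(answer)  # prints "Done: 1 search."
```

Passing the model in as a function is a design choice for this sketch only; it makes the loop easy to exercise in isolation before pointing it at the real client.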

Step 5: Add the entry point (5 minutes)

The boring part. But even boring parts need to exist, otherwise nothing runs — a lesson half the AI demo repos on GitHub still haven't learned:

if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        topic = " ".join(sys.argv[1:])
    else:
        topic = input("Enter research topic: ")

    result = run_agent(topic)

    print(f"\n{'='*60}")
    print("Research complete. Check the output/ directory.")
    print(f"{'='*60}")

Step 6: Run it (5 minutes)

python agent.py "current state of MCP protocol adoption in 2026"

Watch the terminal. You'll see the agent think through the problem on its own:

============================================================
Research Agent — Topic: current state of MCP protocol adoption in 2026
============================================================

--- Turn 1 ---
Stop reason: tool_use
  Tool: web_search
  Input: {"query": "MCP model context protocol adoption 2026"}
  Result: Search results: - The MCP ecosystem has grown...

--- Turn 2 ---
Stop reason: tool_use
  Tool: web_search
  Input: {"query": "MCP servers enterprise production 2026"}
  Result: Search results: - Amazon Bedrock AgentCore...

--- Turn 3 ---
Stop reason: tool_use
  Tool: web_search
  Input: {"query": "MCP protocol limitations challenges 2026"}
  Result: Search results: - Stateful sessions fight with...

--- Turn 4 ---
Stop reason: tool_use
  Tool: save_file
  Input: {"filename": "mcp-research-2026.md", "content": "# MCP Protocol..."}
  Result: File saved successfully: output/mcp-research-2026.md

--- Turn 5 ---
Stop reason: end_turn

Agent completed in 5 turns.

Five turns. Three searches, one file save, one final summary. The system prompt asked for at least three searches from different angles, but the agent chose the actual queries, their order, and when to stop searching and start writing. That's agency, not scripting.

Step 7: Make it smarter (30 minutes)

The basic agent works. Now let's add three upgrades that turn it from a demo into something you'd actually keep using.

Memory between sessions

Right now, every run starts from zero. Let's give the agent a simple memory — a JSON file (a structured text format that programs can easily read and write) that stores what it researched before:

from pathlib import Path

MEMORY_FILE = "memory.json"

def load_memory() -> list:
    """Load previous research topics and findings."""
    if Path(MEMORY_FILE).exists():
        with open(MEMORY_FILE) as f:
            return json.load(f)
    return []

def save_memory(topic: str, summary: str):
    """Save this research session to memory."""
    memory = load_memory()
    memory.append({
        "date": datetime.now().isoformat(),
        "topic": topic,
        "summary": summary[:500],
    })
    memory = memory[-20:]  # Keep last 20 entries
    with open(MEMORY_FILE, "w") as f:
        json.dump(memory, f, indent=2)

Inject the memory into the system prompt, the instruction text that shapes how Claude behaves. This snippet goes inside run_agent, right after system_prompt is built:

memory = load_memory()
if memory:
    memory_context = "\n\nPrevious research sessions:\n"
    for m in memory[-5:]:
        memory_context += f"- [{m['date'][:10]}] {m['topic']}: {m['summary'][:100]}...\n"
    system_prompt += memory_context

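Now the agent knows what it researched before. It can reference past findings, avoid duplicate searches, and build on previous work. For that to happen, each session has to be recorded: call save_memory(topic, result) in the entry point after run_agent returns.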
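
Here is the memory round trip as one self-contained sketch you can run on its own. It differs from the code above in two labeled ways: the file path is passed in as a parameter instead of using the module-level MEMORY_FILE, and the prompt text is built by a hypothetical helper called build_memory_context:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def load_memory(path: Path) -> list:
    """Return the stored sessions, or an empty list if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []


def save_memory(path: Path, topic: str, summary: str, keep: int = 20) -> None:
    """Append one session and keep only the most recent `keep` entries."""
    memory = load_memory(path)
    memory.append({
        "date": datetime.now().isoformat(),
        "topic": topic,
        "summary": summary[:500],
    })
    path.write_text(json.dumps(memory[-keep:], indent=2), encoding="utf-8")


def build_memory_context(memory: list, recent: int = 5) -> str:
    """Format the most recent entries as text for the system prompt."""
    if not memory:
        return ""
    lines = [
        f"- [{m['date'][:10]}] {m['topic']}: {m['summary'][:100]}..."
        for m in memory[-recent:]
    ]
    return "\n\nPrevious research sessions:\n" + "\n".join(lines) + "\n"


# Try it against a temporary file so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    memory_path = Path(tmp) / "memory.json"
    save_memory(memory_path, "MCP adoption", "Adoption grew during the year.")
    context = build_memory_context(load_memory(memory_path))
    print(context)
```

Only the first 100 characters of each summary reach the prompt, so this memory is a reminder of past topics rather than a full record of past findings.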

A thinking tool

This one's a trick. Add a tool that does literally nothing:

tools.append({
    "name": "think",
    "description": "Use this tool to think through your approach before acting. Write out your reasoning and what you need to find out next.",
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "Your reasoning and plan"
            }
        },
        "required": ["thought"]
    }
})

In the tool executor, it just returns a confirmation:

elif tool_name == "think":
    print(f"  Thinking: {tool_input['thought'][:300]}")
    return "Thought recorded. Continue with your plan."

Why add a tool that does nothing? Because it gives the agent a structured place to reason before acting. Without it, Claude tends to go straight to tool calls. With it, Claude can pause, plan, and then execute, which often helps on multi-step tasks. Anthropic has published a description of this "think" tool technique for complex tool-use situations.
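
As the tool count grows, the if/elif chain in execute_tool gets long. One possible way to slot the think tool in is a dictionary that maps tool names to handlers. This is a sketch, not the structure used earlier in this article: TOOL_HANDLERS and do_think are hypothetical names, and the two real tools are left as comments so the block runs on its own:

```python
def do_think(thought: str) -> str:
    """The think tool has no side effects; it only returns a confirmation."""
    print(f"  Thinking: {thought[:300]}")
    return "Thought recorded. Continue with your plan."


# Hypothetical registry: tool name -> function taking the tool's input dict.
TOOL_HANDLERS = {
    "think": lambda tool_input: do_think(tool_input["thought"]),
    # "web_search": lambda tool_input: do_web_search(tool_input["query"]),
    # "save_file": lambda tool_input: do_save_file(**tool_input),
}


def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Look up the handler for a tool and run it, or report an unknown tool."""
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return f"Error: Unknown tool '{tool_name}'"
    return handler(tool_input)


print(execute_tool("think", {"thought": "Start with a broad search."}))
```

With a registry, adding a tool means adding one definition and one dictionary entry, and the dispatch function never changes.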

Error recovery

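def execute_tool_safe(tool_name: str, tool_input: dict) -> str:
    """Execute a tool with automatic retries."""
    # Our tools report failures as strings with these prefixes. Checking
    # prefixes avoids retrying just because a search result mentions "error".
    failure_prefixes = ("Search error:", "Error saving file:")
    for attempt in range(3):
        try:
            result = execute_tool(tool_name, tool_input)
            if result.startswith(failure_prefixes) and attempt < 2:
                print(f"  Retry {attempt + 1}...")
                continue
            return result
        except Exception as e:
            if attempt < 2:
                print(f"  Error, retrying: {e}")
                continue
            return f"Tool failed after 3 attempts: {str(e)}"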

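Web requests fail. APIs go down. Timeouts happen. A production agent retries before giving up. To use this wrapper, call execute_tool_safe instead of execute_tool inside the agent loop. Three attempts, then handing the error message back to Claude so it can try something else, is a reasonable baseline.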
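
Retrying immediately often hits the same outage, so a common refinement is to wait a little longer after each failure. The sketch below shows that idea with exponential backoff. It assumes the wrapped function raises an exception on failure, which is different from the tools above (they return error strings), and call_with_retries is a made-up name:

```python
import time


def call_with_retries(func, attempts: int = 3, base_delay: float = 0.5,
                      sleep=time.sleep):
    """Call func(); on an exception, wait and retry up to `attempts` times.

    The delay doubles after each failure (0.5s, then 1s, and so on).
    `sleep` is a parameter only so the wait can be skipped when experimenting.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts - 1:
                return f"Tool failed after {attempts} attempts: {exc}"
            sleep(base_delay * (2 ** attempt))


# A stand-in tool that fails twice, then succeeds.
calls = {"count": 0}


def flaky_search():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "Search results:\n- ok"


result = call_with_retries(flaky_search, sleep=lambda seconds: None)
print(result)
```

Returning the final failure as a string, rather than raising, keeps the agent loop alive so the model can see what went wrong and pick a different approach.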

The final structure

research-agent/
├── agent.py          # ~150 lines of Python
├── memory.json       # Auto-created, stores session history
├── output/           # Auto-created, stores research reports
│   └── *.md
└── requirements.txt  # anthropic, httpx

Under 200 lines. Two dependencies. No frameworks.

Why no framework?

You might be wondering: "Why not use LangChain or LlamaIndex?" (Both are popular Python frameworks that add pre-built abstractions around LLM calls.)

Because the core of the agent loop above is about 30 lines. A framework like LangChain would add a tree of extra dependencies and several layers of abstraction for the same result.

Use a framework when:

You need 10+ tools with complex routing logic
You need conversation memory that scales across thousands of users
You need multiple agents coordinating on the same task
You've outgrown simple Python and need someone else's architecture

Skip the framework when:

You're building your first agent
Your agent has 2–5 tools
You want to understand every line of what's running
"It works and I understand it" beats "it works and I trust the abstraction"

As of March 2026, the Anthropic SDK documentation shows the same bare-bones loop pattern we just built. The official recommendation is to start without a framework.

What you built tonight

Let's take inventory:

A working AI agent — takes a goal, pursues it autonomously
Tool use — the agent calls external systems (web search, file I/O)
A reasoning loop — Claude decides the next action based on results, not a hard-coded script
Memory — the agent remembers past sessions and builds on them
Error handling — retries before failing
Output persistence — results land in actual files on your disk

This is the same core architecture as Claude Code, Devin, OpenAI Codex, and other agent products. They have better tools, more error handling, and bigger context windows (the amount of text the model can "see" at once, like its working memory). But the loop at the center is essentially the one you just wrote.

Where to go next

You now understand the fundamental pattern. Everything else is engineering on top of it:

More tools — a calculator, a web scraper, a database connector, a code executor
Better memory — vector databases (systems that store text by meaning, not just keywords) for semantic search across past sessions
Parallel tool calls — run multiple searches at once instead of sequentially
Multi-agent systems — a second agent that reviews the first agent's work, like a code review
MCP integration — Model Context Protocol, a standard for connecting AI agents to external tools, like USB but for data sources
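
As a taste of the "parallel tool calls" item in the list above, here is one way it could look using threads from the standard library. It only pays off when Claude returns more than one tool_use block in a single response. The function name run_tools_in_parallel and the shape of the calls list are assumptions for this sketch:

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools_in_parallel(calls: list, execute_tool, max_workers: int = 4) -> list:
    """Run several tool calls at once and return tool_result dicts in order.

    Each item in `calls` is a dict with "id", "name", and "input" keys,
    mirroring the fields read from a tool_use block in the agent loop.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns outputs in the same order as the inputs.
        outputs = list(pool.map(
            lambda call: execute_tool(call["name"], call["input"]),
            calls,
        ))
    return [
        {"type": "tool_result", "tool_use_id": call["id"], "content": output}
        for call, output in zip(calls, outputs)
    ]


# Stub executor so the example runs without network access.
def fake_execute(name: str, tool_input: dict) -> str:
    return f"{name} -> {tool_input['query']}"


results = run_tools_in_parallel(
    [
        {"id": "t1", "name": "web_search", "input": {"query": "MCP adoption"}},
        {"id": "t2", "name": "web_search", "input": {"query": "MCP limits"}},
    ],
    fake_execute,
)
print([r["content"] for r in results])
# prints ['web_search -> MCP adoption', 'web_search -> MCP limits']
```

Threads suit this job because the tools spend most of their time waiting on the network, not computing.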

You're not learning someone's framework that might be dead in six months. You're learning the pattern. The same pattern that worked in 2024, works in 2026, and will work in 2028 — because the underlying mechanic (model decides → tool executes → result feeds back) is how all agent systems function, regardless of what marketing name they use.

The "AI agent" industry wants you to believe building agents requires a PhD and a $100M Series B. It requires understanding one loop: call the model, check if it wants a tool, execute the tool, send the result back, repeat. That's it. Everything else is engineering around that loop — and now you know enough to do that engineering yourself.