Building Custom AI Agents for Your Development Pipeline

Adrian Saycon

February 16, 2026 · 4 min read

AI agents are moving beyond chatbots. In development, an agent is an AI system that can take actions autonomously: read files, run commands, make API calls, and make decisions based on results. I’ve been building custom agents for my dev pipeline over the past few months, and they’ve become genuinely useful team members.

What AI Agents Are in a Dev Context

Think of an AI agent as a junior developer who never sleeps and follows instructions precisely. Unlike a simple AI chat where you ask a question and get a response, an agent operates in a loop: it receives a task, plans steps, executes them, observes results, and adjusts its approach. The key difference is autonomy. You give it a goal, not step-by-step instructions.

In practice, dev agents handle tasks like:

Reviewing pull requests for security issues and code quality
Running test suites and diagnosing failures
Generating migration scripts from schema changes
Monitoring deployments and rolling back on errors

Building a Simple Code Review Agent

Let’s build something practical: an agent that reviews PRs for common security issues. We’ll use Node.js and the Anthropic SDK.

import Anthropic from "@anthropic-ai/sdk";
import { execSync } from "child_process";

const client = new Anthropic();

async function reviewPR(prNumber) {
  // Get the diff
  const diff = execSync(
    `gh pr diff ${prNumber} --repo your-org/your-repo`
  ).toString();

  // Get changed file paths, then fetch each file for full context
  const files = execSync(
    `gh pr view ${prNumber} --repo your-org/your-repo --json files --jq '.files[].path'`
  ).toString().split("\n").filter(Boolean);

  // Caveat: without a ?ref= query, the contents API returns the default-branch
  // version of each file, so files added in the PR fall through to the catch.
  const fileContents = files.map(f => {
    try {
      return { path: f, content: execSync(`gh api repos/your-org/your-repo/contents/${f} -q .content | base64 -d`).toString() };
    } catch { return { path: f, content: "[binary or unavailable]" }; }
  });

  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 4096,
    messages: [{
      role: "user",
      content: `Review this PR diff for security issues. Focus on:
- SQL injection vulnerabilities
- XSS risks in user-facing output
- Hardcoded secrets or API keys
- Insecure dependency usage
- Authentication/authorization gaps

Diff:\n${diff}\n\nFull file contexts:\n${JSON.stringify(fileContents, null, 2)}`
    }]
  });

  return response.content[0].text;
}

This is a basic version. A real agent adds a feedback loop: it posts review comments, waits for changes, and re-reviews.
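
A first step toward that feedback loop is posting the review back to the PR. Here is a minimal sketch, assuming the `gh` CLI is authenticated and run from a checkout of the repository. The helper name `buildCommentCommand`, the comment header, and the 60000-character cap are my own placeholders; check GitHub's current comment size limit before relying on the cap.

```javascript
// Hypothetical helper: turns review text into arguments for `gh pr comment`.
// Nothing here calls GitHub; the caller decides how to run the command.
function buildCommentCommand(prNumber, reviewText, maxLength = 60000) {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`Invalid PR number: ${prNumber}`);
  }

  const header = "## AI Security Review\n\n";
  const notice = "\n\n_[review truncated]_";
  let body = header + reviewText.trim();

  // Assumed cap (not an official figure): trim oversized reviews and say so.
  if (body.length > maxLength) {
    body = body.slice(0, maxLength - notice.length) + notice;
  }

  return {
    command: "gh",
    // `--body-file -` makes gh read the comment body from stdin,
    // so the review text never passes through a shell.
    args: ["pr", "comment", String(prNumber), "--body-file", "-"],
    input: body,
  };
}

// Usage (assumes execFileSync from "child_process" and reviewPR from above):
//   const { command, args, input } = buildCommentCommand(prNumber, await reviewPR(prNumber));
//   execFileSync(command, args, { input });
```

Passing the body over stdin instead of interpolating it into a command string matters here, because the model's output is untrusted text.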

Integrating with CI/CD

The real power comes from running agents in your pipeline. Here’s a GitHub Actions workflow that triggers the review agent on every PR:

name: AI Security Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  security-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: node scripts/review-agent.mjs ${{ github.event.pull_request.number }}
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
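
The workflow passes the PR number as a command-line argument, so the script needs an entry point that reads it. The sketch below is one way to do that; `parsePrNumber` is a hypothetical helper, not something from the SDK or the `gh` CLI. Validating the argument is worth the few lines because `reviewPR` interpolates it into shell commands.

```javascript
// Hypothetical argument parser for scripts/review-agent.mjs.
function parsePrNumber(argv) {
  const raw = argv[2]; // argv[0] is the node binary, argv[1] is the script path
  // Accept only a positive integer with no extra characters.
  if (typeof raw !== "string" || !/^[1-9]\d*$/.test(raw)) {
    throw new Error("Usage: node scripts/review-agent.mjs <pr-number>");
  }
  return Number(raw);
}

// Entry point sketch (assumes reviewPR from the previous section):
//   const prNumber = parsePrNumber(process.argv);
//   console.log(await reviewPR(prNumber));
```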

Making Agents Agentic: The Tool Loop

The Claude API supports tool use, which is what turns a simple prompt into a real agent. You define tools the agent can call, and it decides when to use them:

const tools = [
  {
    name: "read_file",
    description: "Read contents of a file in the repository",
    input_schema: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path relative to repo root" }
      },
      required: ["path"]
    }
  },
  {
    name: "run_test",
    description: "Run a specific test file and return results",
    input_schema: {
      type: "object",
      properties: {
        testFile: { type: "string", description: "Test file to run" }
      },
      required: ["testFile"]
    }
  },
  {
    name: "post_comment",
    description: "Post a review comment on the PR",
    input_schema: {
      type: "object",
      properties: {
        body: { type: "string" },
        path: { type: "string" },
        line: { type: "number" }
      },
      required: ["body"]
    }
  }
];

// Agent loop (initialPrompt and executeTool are assumed to be defined elsewhere)
const messages = [{ role: "user", content: initialPrompt }];

while (true) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 4096,
    tools,
    messages
  });

  // Stop unless the model is asking for tools (covers end_turn, max_tokens, etc.)
  if (response.stop_reason !== "tool_use") break;

  // Process tool calls
  const toolResults = await Promise.all(
    response.content
      .filter(block => block.type === "tool_use")
      .map(async (toolCall) => ({
        type: "tool_result",
        tool_use_id: toolCall.id,
        content: await executeTool(toolCall.name, toolCall.input)
      }))
  );

  messages.push({ role: "assistant", content: response.content });
  messages.push({ role: "user", content: toolResults });
}
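
The loop calls `executeTool`, which isn't shown. Below is one possible shape for it, as a sketch rather than a reference implementation. `createExecuteTool`, `assertRepoRelative`, and the injected `readFile`, `runTest`, and `postComment` functions are all hypothetical names; you would wire them to `fs`, your test runner, and the `gh` CLI yourself.

```javascript
// Rejects absolute paths and ".." segments so a tool call cannot leave the repo.
function assertRepoRelative(filePath) {
  const isAbsolute = /^([\\/]|[A-Za-z]:)/.test(filePath);
  const segments = filePath.split(/[\\/]+/);
  if (isAbsolute || segments.includes("..")) {
    throw new Error(`Path escapes repository: ${filePath}`);
  }
  return filePath;
}

// Hypothetical factory: the caller supplies the real implementations.
function createExecuteTool({ readFile, runTest, postComment }) {
  // A Map avoids accidental matches on inherited keys such as "constructor".
  const handlers = new Map([
    ["read_file", ({ path }) => readFile(assertRepoRelative(path))],
    ["run_test", ({ testFile }) => runTest(assertRepoRelative(testFile))],
    ["post_comment", ({ body, path, line }) => postComment({ body, path, line })],
  ]);

  return async function executeTool(name, input) {
    const handler = handlers.get(name);
    if (!handler) return `Error: unknown tool "${name}"`;
    try {
      const result = await handler(input);
      if (result === undefined) return "ok";
      return typeof result === "string" ? result : JSON.stringify(result);
    } catch (err) {
      // Report failures as text so the model can see them and adjust.
      return `Error: ${err.message}`;
    }
  };
}
```

Returning errors as strings instead of throwing keeps the conversation going: the model sees that a path was rejected or a test crashed and can try something else.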

Architecture Considerations

A few lessons from building production agents:

Set token budgets. Agents can loop indefinitely. Cap the number of tool-call iterations (I use 10-15 for most tasks) and set max_tokens appropriately.
Log everything. Agent decisions are non-deterministic. Log every message, tool call, and response so you can debug unexpected behavior.
Scope permissions tightly. Only give agents access to what they need. A review agent shouldn’t be able to merge PRs.
Use cheaper models for simple tasks. Not every agent needs the most capable model. Route simple checks to faster, cheaper models and reserve the heavy hitters for complex reasoning.
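
The first two lessons can be sketched as a bounded loop with structured logging. This is an illustration under assumptions, not the code I run in production: `runAgent` is a hypothetical name, `createMessage` stands in for a wrapper around `client.messages.create` that fills in the model and `max_tokens`, and `executeTool` is whatever tool dispatcher you use.

```javascript
// Hypothetical bounded agent loop with per-iteration logging.
async function runAgent({
  createMessage,      // e.g. (params) => client.messages.create({ model, max_tokens, ...params })
  executeTool,        // async (name, input) => string
  tools,
  prompt,
  maxIterations = 10, // hard cap on tool-call rounds
  log = console.log,
}) {
  const messages = [{ role: "user", content: prompt }];

  for (let i = 1; i <= maxIterations; i++) {
    const response = await createMessage({ tools, messages });
    log(JSON.stringify({ iteration: i, stop_reason: response.stop_reason, usage: response.usage }));

    const toolCalls = response.content.filter((block) => block.type === "tool_use");
    if (response.stop_reason !== "tool_use" || toolCalls.length === 0) {
      return { status: "done", iterations: i, response };
    }

    const toolResults = [];
    for (const call of toolCalls) {
      log(JSON.stringify({ iteration: i, tool: call.name, input: call.input }));
      const content = await executeTool(call.name, call.input);
      toolResults.push({ type: "tool_result", tool_use_id: call.id, content });
    }

    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: toolResults });
  }

  // Budget exhausted: stop instead of looping forever.
  return { status: "max_iterations", iterations: maxIterations, response: null };
}
```

Returning a status instead of throwing lets the caller decide what a blown budget means, for example failing the CI job or posting a "review incomplete" comment.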

Agents are the most exciting application of AI in development right now. Start with a simple, well-scoped task like PR review or test diagnosis, get it working reliably, and expand from there. The tooling is mature enough for production use if you build in the right guardrails.