AI for Code Reviews: What Works and What Doesn’t

Six months ago I started running every pull request through an AI review before requesting human review. My expectation was that AI would catch the easy stuff so humans could focus on architecture. That’s roughly what happened — but the details were surprising.
Where AI Reviews Excel
Consistency checks. AI catches every missing semicolon, inconsistent naming convention, unused import, and mismatched bracket. I stopped writing comments about code style entirely.
Missing edge cases. AI is genuinely good at spotting missing null checks, unhandled error states, and boundary conditions:
// AI caught that this function doesn't handle the empty array case
function getLatestPost(posts: Post[]): Post {
  return posts.sort((a, b) =>
    new Date(b.date).getTime() - new Date(a.date).getTime()
  )[0]; // undefined if posts is empty
}
Documentation gaps. It notices when a public function lacks JSDoc, when a complex regex has no explanation, or when config options are undocumented.
Security basics. SQL injection patterns, hardcoded secrets, missing input sanitization, exposed API keys in client code — AI catches these reliably.
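As an illustration of the SQL injection pattern (the function names here are hypothetical, and the safe variant just builds the query object a driver like node-postgres would accept, without executing anything):

```typescript
// Unsafe: string interpolation puts user input directly into SQL.
// This is the pattern AI reviewers flag reliably.
function findUserUnsafe(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// Safer: keep the SQL text static and pass values separately as
// parameters, so the driver handles escaping.
function findUserSafe(email: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}
```

The interpolated version lets an input like `x' OR '1'='1` rewrite the query; the parameterized version keeps it inert data.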
Where AI Reviews Fail
Architectural decisions. AI can’t tell you that your new service class should be a hook, or that this feature belongs in a different module. It reviews code in isolation without understanding the broader system trajectory.
Business logic validation. AI doesn’t know that your e-commerce app should never allow negative prices. It verifies the code does what it says, not that what it says matches business needs.
Performance at scale. AI might flag an O(n²) loop but can’t tell you that n is always under 10 so it doesn’t matter.
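A hypothetical example of the kind of code that triggers this false alarm (names invented for illustration):

```typescript
// An AI reviewer may flag this nested loop as O(n^2), but `plans` is a
// pricing-plan list that never exceeds ~10 entries, so the quadratic
// cost is irrelevant in practice.
function findDuplicateNames(plans: { name: string }[]): string[] {
  const dupes: string[] = [];
  for (let i = 0; i < plans.length; i++) {
    for (let j = i + 1; j < plans.length; j++) {
      if (plans[i].name === plans[j].name && !dupes.includes(plans[i].name)) {
        dupes.push(plans[i].name);
      }
    }
  }
  return dupes;
}
```

Knowing that n is bounded requires context about the data, which is exactly what the AI lacks.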
False positives. The biggest practical problem. AI suggests refactoring perfectly fine code, flags intentional patterns as mistakes, and recommends changes that would make code worse. Your team needs to learn to filter the output.
Setting Up an Effective Workflow
Step 1: AI reviews first. Run the PR through AI before requesting human review. A GitHub Action triggers on PR creation:
# .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AI Review
        uses: your-ai-review-action@v1
        with:
          model: claude-sonnet
          focus: "security,edge-cases,consistency"
          max-comments: 10
Step 2: Author addresses AI comments. Fix valid suggestions before requesting human review. Humans see cleaner code with obvious issues already resolved.
Step 3: Human review focuses on what matters. Architecture, business logic, and design decisions — a much better use of reviewer time.
Calibrating the Output
Scope the review. Don’t ask AI to review “everything.” Give it specific focus areas: security, error handling, type safety. Fewer but higher-quality comments.
Provide context. Include your coding standards and known patterns in the review prompt. AI that knows your conventions won’t flag intentional deviations.
Limit comment count. Cap at ten comments per review. Forces the AI to prioritize. Ten high-signal comments beat fifty mixed ones.
The Numbers
Across about 200 PRs:
- ~40% of AI comments led to code changes
- ~30% were valid observations the author chose not to change
- ~30% were false positives or style preferences
Human review time dropped by roughly 25%. The biggest win wasn’t time saved — it was review quality. Humans gave better feedback because they weren’t bogged down in the mundane stuff. AI code review is a complement, not a replacement.
Written by
Adrian Saycon
A developer with a passion for emerging technologies, Adrian Saycon focuses on transforming the latest tech trends into great, functional products.