AI Code Review Case Study: Building a WordPress Plugin

A real example of an AI code review: genuine WordPress security fixes, and a quiet field-name mismatch only real test data caught.

I write a fair bit about vibe coding in general terms: what it’s good for, where it falls down, how I work with it day to day. All useful, but it’s also exactly the kind of thing that’s easy to nod along to without it really landing. So here’s one real example of an AI code review instead of another general principle.

[Image: PHP security fix reviewed during an AI code review of a WordPress plugin]

Not a hypothetical. The actual back-and-forth that went into an AI code review of the JSON Importer plugin I use to bulk-publish posts to my own WordPress sites (the same tool, as it happens, that imported a recent batch of content for this site). I asked AI to review the existing code, took its findings seriously, made my own calls on what to act on, tested each fix on its own before moving to the next, and still hit a snag that taught me more about the limits of an AI code review than the review itself did.

Why This Is the AI Code Review Example I Keep Coming Back To

I already had a working plugin. Simple enough on the surface: paste a JSON array into a textarea, hit import, and it creates WordPress posts from it, pulling in Yoast SEO fields, categories, and tags along the way. It did the job. I hadn’t looked at it critically in a while.

So I handed the code over and asked for exactly that: dig through it, flag anything wrong, tell me where it could be better. If you want the broader argument this case study sits inside, I’ve laid that out in my vibe coding overview: what it’s good for, where it falls down, and the workflow I actually use. This post is the proof, not the theory.

One thing worth saying up front, because it matters for how this whole exercise went: I didn’t take a list of fixes and bolt them all on at once. Each one went in, got tested properly on its own, and only then did the next one get added. That’s not an AI habit. That’s just how you avoid ending up with three half-working changes and no idea which one broke things.

Where the AI Code Review Earned Its Keep

This is the bit that’s genuinely useful, and it’s worth being specific about why. A security review of your own code, even code you wrote yourself a while back, is exactly the kind of bounded, verifiable task vibe coding is good at. There’s a finite list of known WordPress vulnerability patterns (missing nonces, unsanitised content, capability mismatches), and checking code against that list is mechanical work, not judgement work.

A few of the things the review turned up:

No CSRF protection on the form. Nothing stopped a forged request from outside WordPress tricking a logged-in user into running an import they never intended. The fix is a standard WordPress nonce, checked on submission:

wp_nonce_field('json_importer_action', 'json_importer_nonce');

and then, before anything else runs:

if (
    isset($_POST['import_json']) &&
    isset($_POST['json_importer_nonce']) &&
    wp_verify_nonce($_POST['json_importer_nonce'], 'json_importer_action') &&
    current_user_can('edit_posts')
) {
    // ... the import only runs inside this block ...
}

Three checks, all of them required: the form was actually submitted, the nonce is present and valid, and the user has the edit_posts capability. That’s the kind of thing that’s easy to half-implement (add the nonce field but forget to verify it, say), and a careful pass catches that.

post_content going into the database raw. Titles were being sanitised. Content wasn’t. That’s a stored XSS route sitting wide open for anyone who can get content into the import. The fix uses the right function for the job, not a blunt one:

'post_content' => !empty($record['content']) ? wp_kses_post($record['content']) : '',

wp_kses_post() strips anything that shouldn’t be in post content, things like script tags and on* attributes, while leaving normal HTML markup intact. Using sanitize_text_field() here instead would have stripped all the HTML, which isn’t what anyone publishing a formatted post wants.

The Capability Gap That Took Real WordPress Experience to Spot

A capability gap that only shows up if you follow the consequence all the way through. The plugin deliberately allows Authors and Contributors to use the importer, not just Editors and Admins. Fine in principle. But nothing in the original code stopped a Contributor’s JSON from setting status to publish directly, which is exactly the kind of thing the Contributor role exists to prevent. The fix checks the actual capability for the post type rather than trusting the JSON:

$post_type_object = get_post_type_object($requested_type);
$can_publish = $post_type_object && current_user_can($post_type_object->cap->publish_posts);

if ($requested_status === 'publish' && !$can_publish) {
    $requested_status = 'pending';
}

I like this fix more than a flat rejection would have been. It doesn’t fail the whole import because someone aimed too high. It just quietly puts the post into the queue an Editor would expect to review, which is what should have happened anyway.
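The same downgrade rule can be pulled out into a small pure function, which makes it easy to test without a WordPress install. This is only a sketch: resolve_post_status() is a hypothetical helper name, not something from the plugin, and $can_publish is assumed to be the result of the current_user_can() check shown above.

```php
<?php
// Hypothetical helper (not part of the plugin): decides the final status
// for an imported post. $can_publish is assumed to come from the
// current_user_can() check against the post type's publish capability.
function resolve_post_status(string $requested_status, bool $can_publish): string
{
    // A user without the publish capability is moved to the review queue
    // instead of having the whole import rejected.
    if ($requested_status === 'publish' && !$can_publish) {
        return 'pending';
    }

    // Every other combination passes through unchanged.
    return $requested_status;
}

echo resolve_post_status('publish', false), "\n"; // pending
echo resolve_post_status('publish', true), "\n";  // publish
echo resolve_post_status('draft', false), "\n";   // draft
```

Keeping the decision separate from the WordPress calls means the one rule that matters (a Contributor never publishes directly) can be checked on its own, in the same one-fix-at-a-time spirit as the rest of the project.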

My Own Role in Getting This Right

That capability gap is the one I’d flag as the clearest example of AI earning its keep here. I’d loosened the restriction for a reasonable-sounding reason (letting more roles use a handy tool) and hadn’t followed the consequence all the way through to what a Contributor could actually do with it. The review caught the gap I’d left.

I want to be straight about my own part in all this too. The list wasn’t something I took and blindly implemented. Every point got read, understood for why it mattered, and judged on its own merits, including which behaviours stayed exactly as I wanted them (categories get created automatically if they don’t already exist, which was my instruction from the start, not something the review suggested).

Where It Quietly Went Wrong

Here’s the part that matters most, because it happened to me, not to some hypothetical careless beginner.

I’d documented my own JSON format separately: posts should use a title field and a status field. Standard stuff for me, second nature, written down in my own notes.

The plugin’s older code was still checking for post_title and post_status, the field names from an earlier version, not my documented spec. Nobody flagged the mismatch during the review, because the review was looking at whether the code was secure and sensible, not whether it matched a specification it had never been shown. Despite the mismatch, the code looked complete, ran without errors, and would have passed a quick glance with no trouble at all.

It failed the first time I actually used it on a real file. Every single record came back “skipped, no post_title,” because the file used title, exactly as I’d specified, and the code was looking for the wrong key entirely.

Why a Clean-Looking Result Isn’t the Same as a Correct One

This is precisely the failure mode I’d point to if someone asked me what’s actually risky about vibe coding. Nothing crashed and no error message pointed at the real problem. The plugin did exactly what it was told, which simply wasn’t what I needed it to do. If I hadn’t tested it against a real file in my real format, it would have sat there looking finished while quietly doing nothing useful at all.

Worth noting too: my own first guess at the cause, before I’d looked at the actual JSON, was wrong. I assumed the array was nested oddly. It wasn’t. The fix only happened once I stopped guessing and looked at the real file.

The Fix the AI Code Review Missed Until Real Data Caught It

Once the actual JSON was in front of me, the fix was straightforward. Check the documented field name first, then fall back to the old one for anything still in that format:

$record_title = $record['title'] ?? $record['post_title'] ?? '';

and the same approach for status:

$requested_status = $record['status'] ?? $record['post_status'] ?? 'pending';

Two lines, and both old and new JSON formats work from here on. It’s quick once you know what’s actually wrong; the finding-out is the part that took the testing.
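To see both formats landing in the same place, here is a minimal sketch that wraps those two lines in a function and feeds it one record of each kind. The function name normalise_record() and the sample records are made up for illustration; only the two fallback expressions come from the plugin.

```php
<?php
// Hypothetical helper (not the plugin's actual code): maps a decoded JSON
// record onto the title and status the importer works with.
function normalise_record(array $record): array
{
    return [
        // Documented key first, legacy key second, empty string if neither.
        'title'  => $record['title'] ?? $record['post_title'] ?? '',
        // Same order for status, defaulting to the review queue.
        'status' => $record['status'] ?? $record['post_status'] ?? 'pending',
    ];
}

// One record in the documented format, one in the older format.
$new_format = json_decode('{"title": "Hello", "status": "draft"}', true);
$old_format = json_decode('{"post_title": "Hello", "post_status": "draft"}', true);

var_dump(normalise_record($new_format) === normalise_record($old_format)); // bool(true)
```

When a record carries both keys, the documented one wins, because it sits first in the chain.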

The lesson isn’t really about JSON keys. It’s about where the actual risk sits in this way of working. The security review (judging code against known, documented vulnerability patterns) was reliable and saved me real time. Matching code against my own specific intent, the bit that only existed in my own notes and never made it into the conversation, was where things slipped through. Vibe coding can tell you your code is insecure against a known checklist, but it can’t tell you your code doesn’t match a spec it was never shown, or wasn’t told to check against in the first place.

That’s the whole argument in miniature: useful for the bounded, checkable work, blind to the context that lives only in your head. Right up until you run it against something real and find out the hard way.

What I’d Do Differently in My Next AI Code Review

If you’re working this way yourself, here’s what this particular case actually changes about how I’ll prompt next time:

  1. State the spec explicitly, every time, even when it feels obvious. I assumed “my JSON format” was implicit context that would carry forward. It wasn’t, because the AI was never actually given the current spec, only the old code, which had drifted from it.
  2. Paste the real data alongside the real code. A genuine sample of the JSON file would have caught the field name mismatch in seconds, the same way it eventually did once I stopped guessing and showed the actual file.
  3. Treat “looks complete and runs without errors” as a status report, not a result. The plugin satisfied both of those and still did nothing useful. Running it against real data is the only test that counts.
  4. Test each fix on its own before adding the next. This is the habit that kept the rest of this project sane. Bundling several changes together and testing once at the end means that when something breaks, you’re hunting through all of them to find out which one did it.
  5. Use vibe coding hardest for the work with a checklist, lightest for the work that depends on what’s in your head. The security review worked because there’s a known list of WordPress vulnerability patterns to check against. Matching my undocumented intent had no such list, and that’s exactly where it slipped.
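Points 1 and 2 can also be turned into a quick pre-flight check: list the documented field names, run a real sample through it, and see what is missing before any import happens. The sketch below is an assumption about how that might look, not code from the plugin; find_spec_mismatches() and the sample JSON are invented for illustration.

```php
<?php
// Hypothetical pre-flight check (names are illustrative, not from the plugin):
// returns one message per record that lacks a documented field.
function find_spec_mismatches(string $json, array $required_fields): array
{
    $records = json_decode($json, true);
    if (!is_array($records)) {
        return ['Input is not a valid JSON array.'];
    }

    $problems = [];
    foreach ($records as $index => $record) {
        if (!is_array($record)) {
            $problems[] = "Record {$index}: not an object.";
            continue;
        }

        // Documented fields that do not appear among the record's keys.
        $missing = array_diff($required_fields, array_keys($record));
        if ($missing !== []) {
            $problems[] = sprintf(
                'Record %s: missing %s (found: %s)',
                $index,
                implode(', ', $missing),
                implode(', ', array_keys($record))
            );
        }
    }

    return $problems;
}

// A sample still in the old format, checked against the documented field names.
$sample = '[{"post_title": "Hello", "post_status": "draft"}]';
print_r(find_spec_mismatches($sample, ['title', 'status']));
```

Listing the keys that were actually found is the useful part: it puts the real file in front of you instead of a guess about how it might be nested.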

None of this is a reason to avoid an AI code review for something like this. The plugin works, it’s in daily use, and it’s measurably more secure than before the review. It’s a reason to know which part of the job you’re still doing yourself.
