Building a self-learning X marketing agent
How I built an agent that runs A/B experiments, analyzes results, and rewrites its own strategy without me touching it. Here's the full architecture.
Most marketing agents automate a fixed script. Post at this time. Reply to this query. Follow these rules. That works until it doesn't. The market changes, the platform algorithm shifts, your intuitions turn out to be wrong. And the bot keeps doing what you told it regardless.
I wanted something different: an agent that figures out what works and updates itself. After several weeks of continuous runtime, the playbook is on version 14. It behaves nothing like what I originally configured. Here's how it's built.
Version 1 was bad on purpose
You can't run experiments on nothing. The agent needed a seed strategy. Version 1 of reply-playbook.json was intentionally rough: search X for tweets about AI automation, reply to ones with 200 to 5,000 views, keep it short and casual. The initial queries were generic: “AI automation,” “building AI agents,” “chatbot for business.”
Two problems showed up fast. First, generic queries return noise: “AI automation” attracted recruiter spam, corporate LinkedIn reposts, and listicle accounts. Second, the 200 to 5K view range was too low. Replies to 200-view tweets got 4 to 10 views, which is not enough signal to learn from.
That's fine. Version 1 is supposed to be wrong. The point is to get data flowing so the learner has something to analyze.
Three skills, one closed loop
The system runs on three OpenClaw skills wired together:
reply-guy
Reads the playbook. Searches X for targets using the current queries. Scores candidates by engagement rate. Spawns an Opus sub-agent to write the reply text. Posts it via browser automation. Logs everything to reply-log.json.
engagement-scrape
Navigates to the replies tab on X. Scrapes views and likes for every logged reply. Merges updated stats back into reply-log.json. Tags each reply with a performance tier (low/mid/high).
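The merge step is simple bookkeeping. A sketch, with the caveat that the tier cutoffs (20 and 100 views) are illustrative guesses; the real thresholds live in the toolkit:

```python
import json

def tier(views: int) -> str:
    """Performance tier; the 20/100 cutoffs are assumed, not the toolkit's."""
    if views >= 100:
        return "high"
    if views >= 20:
        return "mid"
    return "low"

def merge_stats(log_path: str, scraped: dict[str, dict]) -> None:
    """Merge freshly scraped views/likes into reply-log.json and
    re-tag each entry's performance tier."""
    with open(log_path) as f:
        log = json.load(f)
    for entry in log:
        stats = scraped.get(entry["tweetId"])
        if stats:
            entry.update(stats)  # newest views/likes win
            entry["tier"] = tier(entry["views"])
    with open(log_path, "w") as f:
        json.dump(log, f, indent=2)
```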
learner
Reads the experiment definitions. Checks if any hit their sample target. Analyzes results across groups. Writes conclusions. Updates the playbook rules. Bumps the version number.
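The conclude step might look something like the sketch below. The experiment shape (variants carrying a `samples` list of view counts, plus a per-variant `minSamples` floor) is my assumption about the schema, not the toolkit's exact format:

```python
def maybe_conclude(playbook: dict, experiment: dict) -> bool:
    """Conclude only once every variant has hit its sample minimum;
    the winner becomes a rule and the playbook version bumps."""
    variants = experiment["variants"]
    if any(len(v["samples"]) < experiment["minSamples"] for v in variants):
        return False  # not enough data yet; keep collecting
    winner = max(variants, key=lambda v: sum(v["samples"]) / len(v["samples"]))
    playbook["rules"].append({
        "lesson": f"{winner['name']} wins: {experiment['hypothesis']}",
        "n": sum(len(v["samples"]) for v in variants),
    })
    experiment["state"] = "concluded"
    playbook["version"] += 1
    return True
```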
Data flows in one direction: reply-guy posts and logs, engagement-scrape enriches the log, learner reads the enriched log and updates the playbook, reply-guy reads the updated playbook on the next tick. The loop is fully closed. I don't touch it between versions.
The heartbeat wakes the agent every 30 minutes. Each tick it checks what's due: a reply (every 25 minutes, 9 AM to 11 PM), or a scrape (every 3 hours). Scrape always runs first so fresh data applies to the next reply.
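The per-tick dispatch reduces to a small amount of date math. The interval constants come from the article (reply every 25 minutes between 9 AM and 11 PM, scrape every 3 hours); the function itself is a hypothetical sketch:

```python
from datetime import datetime, timedelta

REPLY_INTERVAL = timedelta(minutes=25)
SCRAPE_INTERVAL = timedelta(hours=3)

def due_actions(now: datetime,
                last_reply: datetime,
                last_scrape: datetime) -> list[str]:
    """Scrape is listed first so fresh stats inform the next reply."""
    actions = []
    if now - last_scrape >= SCRAPE_INTERVAL:
        actions.append("scrape")
    if 9 <= now.hour < 23 and now - last_reply >= REPLY_INTERVAL:
        actions.append("reply")
    return actions
```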
The Opus/Sonnet split
The reply text itself is written by Claude Opus, spawned as a sub-agent on each tick. Sonnet handles everything else: search, scoring, browser automation, logging, learning. This is intentional.
Opus writes better replies. It understands nuance, matches tone, avoids generic phrasing. Worth paying for. But you do NOT want Opus doing browser automation or scraping. That would cost 100x more per tick for zero benefit. The cost split matters:
- Opus reply write: ~$0.01/reply (~500 tokens in, ~25 tokens out)
- Sonnet orchestration: ~$0.002/tick
- Sonnet scrape: ~$0.003/scrape
- Total: well under $1/day for a 24/7 system
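The daily total checks out from the per-action costs and the tick cadence above:

```python
ticks_per_day = 24 * 60 // 30    # heartbeat every 30 minutes -> 48
replies_per_day = 14 * 60 // 25  # every 25 min, 9 AM to 11 PM -> 33
scrapes_per_day = 24 // 3        # every 3 hours -> 8

daily = (replies_per_day * 0.01    # Opus reply writes
         + ticks_per_day * 0.002   # Sonnet orchestration
         + scrapes_per_day * 0.003)  # Sonnet scrapes
print(f"${daily:.2f}/day")         # prints $0.45/day
```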
The playbook structure
reply-playbook.json has four sections. This is the whole brain of the system:
- targeting — minViews, maxViews, maxReplies, minEngagementRate, search queries. Controls what the agent replies to.
- rules — array of distilled lessons with sample sizes and dates. Every concluded experiment adds a rule here.
- experiments — active A/B tests. Hypothesis, rotation logic, current data, conclusion state.
- stats — running totals updated each scrape: total replies, avg views, best reply, current streak.
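Put together, a playbook file might look like this. The structure mirrors the four sections above; the specific values are illustrative (the version number, view range, and queries come from this post, but the thresholds, dates, and stats are placeholders, not my real file):

```json
{
  "version": 14,
  "targeting": {
    "minViews": 2000,
    "maxViews": 50000,
    "minEngagementRate": 0.01,
    "queries": ["agent orchestration", "LLM production", "OpenClaw builds"]
  },
  "rules": [
    { "lesson": "no URLs in replies", "n": 46, "date": "2026-01-10" }
  ],
  "experiments": [
    {
      "hypothesis": "short opinion takes beat quote-and-question replies",
      "variants": ["opinion", "question"],
      "minSamples": 20,
      "state": "active"
    }
  ],
  "stats": { "totalReplies": 300, "avgViews": 38, "bestReply": "...", "streak": 6 }
}
```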
The learner reads this file, runs analysis, writes back. The reply-guy reads this file and acts on it. They don't talk to me unless something breaks.
What the agent learned
URLs suppressed. Replies with URLs averaged 75% fewer views. The early-stop rule fired at n=46. Rule written, experiment closed.
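An early-stop check like the one that fired here can be sketched as follows. The 50% gap threshold and the minimum-n of 40 are my assumptions; the article only says the rule fired early, at n=46:

```python
def should_early_stop(a_views: list[int], b_views: list[int],
                      min_n: int = 40, min_gap: float = 0.5) -> bool:
    """Stop before the full sample target once the combined sample
    hits min_n and one variant trails the other by min_gap or more."""
    if not a_views or not b_views:
        return False
    if len(a_views) + len(b_views) < min_n:
        return False
    lo, hi = sorted([sum(a_views) / len(a_views),
                     sum(b_views) / len(b_views)])
    return hi > 0 and (hi - lo) / hi >= min_gap
```

A 75% gap like the URL experiment's clears any reasonable threshold long before the full sample target.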
Style A wins: short opinion takes (43v avg) outperform quote-and-question replies (23v avg). Opinion feels like conversation. Questions feel like engagement bait.
Query overhaul. Generic automation queries returned recruiter spam and listicle accounts. Switched to AI-builder focused terms: agent orchestration, LLM production, OpenClaw builds.
Target range updated to 2K to 50K views. Previous 200 to 5K range produced replies nobody saw. Higher-traffic tweets mean more reply visibility.
Removed maxReplies cap. Was filtering out perfectly good targets. The cap was a premature optimization with no data behind it.
What a bad learning cycle looks like
Not every tick produces useful signal. The most common failure is a skip list that has grown too large. The skip list is how the agent avoids replying to the same tweet twice: it stores every tweet ID it has already handled. When the list covers most of what a query returns, every candidate gets filtered out and the tick yields zero targets.
The agent detects this: if 3 consecutive search queries return empty results, it logs a NO_TARGETS event and reports to Discord. Then I can see the pattern and prune old skip list entries or widen the queries. Old tweet IDs expire after 7 days anyway since X's search rarely surfaces them, so this is mostly a self-correcting problem.
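Both mechanisms fit in a few lines. A sketch, with Discord reporting reduced to a callback stub; the 3-strike threshold and 7-day expiry come from the article, everything else is assumed:

```python
from datetime import datetime, timedelta

SKIP_TTL = timedelta(days=7)

def prune_skip_list(skip: dict[str, datetime], now: datetime) -> dict:
    """Drop tweet IDs older than a week; X search rarely resurfaces them."""
    return {tid: seen for tid, seen in skip.items() if now - seen < SKIP_TTL}

def run_searches(queries, search, notify, max_empty=3):
    """Report a NO_TARGETS event after 3 consecutive empty result sets."""
    empty_streak = 0
    for q in queries:
        results = search(q)
        if results:
            return results
        empty_streak += 1
        if empty_streak >= max_empty:
            notify("NO_TARGETS: 3 consecutive empty searches")
            return []
    return []
```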
The other failure mode: an experiment runs but the two variants never get balanced data. One style gets 30 replies and the other gets 5 because the rotation timing doesn't work out evenly. The learner handles this by waiting for the smaller variant to hit its minimum before concluding. Patience over false precision.
The key insight
Here's what makes this different from a bot. The bot does what you told it. This system questions what you told it, runs experiments, finds the parts that are wrong, and fixes them. I set the initial hypotheses. The conclusions are data-driven. The gap between what I assumed and what the agent discovered is exactly where the value is.
The Discord notifications roll in at 30-minute intervals, and each one is either a reply posted, a scrape completed, or a new playbook version. Watching them arrive and knowing the agent just updated its own strategy is the kind of thing that's hard to explain until you've seen it run.
Get the full config + 27 more skills
Everything you need to build this exact system is in the toolkit. 8 hours, 28 production skills, instant download.