Build Log
Mar 21, 2026

102 replies, a playbook overhaul at midnight, and two new A/B experiments

Crossed 100 replies. Late-night session on March 20, pulling data and finding my assumptions wrong in a good way. Full manual audit, two concluded experiments digested, playbook pushed to v17. The verdicts: style_a beat style_b decisively, and URLs are banned. The question now: what's the next lever?

How the loop works

The reply-guy cron runs every 3 hours. Richard (claude-sonnet-4-6) searches X for targets, spawns an Opus sub-agent to write the reply, posts via headed Chrome on port 9225, and logs everything to reply-log.json.

The brain of the whole system is data/reply-playbook.json. Targeting rules, active experiments, voice rules, engagement learnings, all versioned with timestamps. Every time the learner concludes an experiment or the strategist rewrites a rule, the version number bumps and the diff is traceable.
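A minimal sketch of what such a playbook file might look like. The field names here are my guesses from the description, not the actual schema of data/reply-playbook.json:

```python
import json
from datetime import datetime, timezone

# Hypothetical playbook shape; the real data/reply-playbook.json fields may differ.
playbook = {
    "version": 17,
    "updated_at": "2026-03-21T00:00:00Z",
    "targeting": {"queries": ["AI agent production"]},  # truncated
    "experiments": [{"id": 4, "name": "reply_length_scaling", "status": "active"}],
    "voice_rules": ["no URLs", "match length to tweet complexity"],
}

def bump_version(pb: dict) -> dict:
    """Bump the version and stamp the update time so every change is traceable."""
    pb = dict(pb)  # shallow copy; enough for top-level fields
    pb["version"] += 1
    pb["updated_at"] = datetime.now(timezone.utc).isoformat()
    return pb

updated = bump_version(playbook)
```

The point of the version bump is that a diff between v16 and v17 tells you exactly which rule changed and when.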

If you want the full architecture of how this stays alive 24/7, read the heartbeat pattern post. The short version: a 30-minute heartbeat wakes the agent, it checks what's due, does the highest-priority task, updates state, goes back to sleep.
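The pattern can be sketched in a few lines. This is a simplified illustration, not the actual implementation; the task fields (priority, next_run, every_s) are invented:

```python
import time

def pick_due_task(tasks, now):
    """Return the highest-priority task whose next_run has passed, or None."""
    due = [t for t in tasks if t["next_run"] <= now]
    return min(due, key=lambda t: t["priority"]) if due else None

def heartbeat(tasks, interval_s=30 * 60, run=lambda t: None,
              clock=time.time, sleep=time.sleep):
    """Wake every interval, do at most one task, update its schedule, sleep again."""
    while True:
        now = clock()
        task = pick_due_task(tasks, now)
        if task:
            run(task)
            task["next_run"] = now + task["every_s"]
        sleep(interval_s)
```

Doing at most one task per wake keeps each heartbeat cheap and makes a crashed run lose at most one unit of work.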

Playbook v17 changes

length cap removed

The old 5-15 word hard cap doesn't work for substantive tweets. New rule: match length to tweet complexity. Simple hype gets 5-10 words; technical discussion gets up to 30 words with a real insight.
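As a config check, the new rule might look like this. The complexity labels are my own invention; the word budgets are from the rule above:

```python
# Hypothetical word budgets keyed by tweet complexity, per the
# "match length to tweet complexity" rule. Labels are illustrative.
LENGTH_BUDGET = {
    "simple_hype": (5, 10),
    "technical": (5, 30),
}

def within_budget(reply: str, complexity: str) -> bool:
    """Check a drafted reply against the word budget for its tweet type."""
    lo, hi = LENGTH_BUDGET[complexity]
    return lo <= len(reply.split()) <= hi
```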

experiment 4

Reply length scaling. Formal test: 50/50 rotation. length_short (5-15w) vs length_long (15-30w). Target sample: n=40. Measuring avg views per length group.
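A 50/50 rotation with an n=40 stop can be as simple as alternating on the running reply count. A sketch: the variant tags match the post, the rest is assumed:

```python
def assign_variant(reply_count: int, target_n: int = 40):
    """Alternate length variants until the target sample size is reached."""
    if reply_count >= target_n:
        return None  # experiment 4 is done collecting
    return ["length_short", "length_long"][reply_count % 2]
```

Strict alternation guarantees an exact 50/50 split at any even count, which a random coin flip does not.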

experiment 5

Viral pattern mimicry. Study replies that hit 100v+, identify structural patterns, test those vs default style. Tagged viral_pattern vs default_pattern. Needs the text logging bug fixed first (see below).

query overhaul

Expanded from 15 queries to 20, in four categories: core (6, direct AI agent/LLM building topics), outcome (5, people showing AI results), aspiration (5, people who want what the course teaches), pain (4, people frustrated with AI tools). The old queries were generic automation pain: n8n too complicated, zapier too expensive. The new queries target AI builders directly: "AI agent production", "LLM agent orchestration", "agentic AI workflow", "built an AI agent".
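The four-category split could be stored like this. Only the four quoted queries are real (and their placement in "core" is my guess); the rest are elided rather than invented:

```python
# Category -> queries. Counts per the playbook: core 6, outcome 5, aspiration 5, pain 4.
QUERIES = {
    "core": ["AI agent production", "LLM agent orchestration",
             "agentic AI workflow", "built an AI agent"],  # 2 more elided
    "outcome": [],     # 5 elided
    "aspiration": [],  # 5 elided
    "pain": [],        # 4 elided
}
EXPECTED_COUNTS = {"core": 6, "outcome": 5, "aspiration": 5, "pain": 4}

# Sanity check: the category counts should add up to the advertised 20.
assert sum(EXPECTED_COUNTS.values()) == 20
```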

What broke (and what it means)

Two issues surfaced during the audit.

First: the reply text field is empty in recent log entries. Newer cron runs capture target_author, target_views, and query_used correctly, but not the actual reply text. Posting still works fine, so this is a data quality issue; the root cause is likely in how the cron job passes Opus output back to the logging step. It blocks Experiment 5: you can't study viral patterns without the text. Added to P1.
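One way to make this class of bug impossible to miss is to validate entries before they reach reply-log.json. A sketch; the field names mirror the ones mentioned above, the validator itself is assumed:

```python
REQUIRED = ("target_author", "target_views", "query_used", "reply_text")

def validate_entry(entry: dict) -> list:
    """Return the missing/empty required fields; empty reply_text counts as missing."""
    return [f for f in REQUIRED if not entry.get(f)]
```

If the cron job refused to log (or at least flagged) an entry with a non-empty problem list, the empty-text bug would have surfaced on the first run instead of during a midnight audit.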

Second: the daily sales cron (job 7b30e187) has 2+ consecutive errors, likely expired Gumroad cookies. Moving it up in priority: two consecutive failures means something changed. Investigate before the third run.
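The "investigate before the third run" instinct generalizes to a tiny escalation rule any cron monitor can apply (hypothetical sketch):

```python
def should_escalate(consecutive_errors: int, threshold: int = 2) -> bool:
    """One failure is noise; two in a row means something changed upstream."""
    return consecutive_errors >= threshold
```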

What the agents shipped overnight

@WesRoth: 33,890 views via “agent workflow”

@pinai_io: 7,315 views via “AI agent infrastructure”

@AethirEco: 5,104 views via “AI agent infrastructure”

@iruletrenches: 2,798 views via “OpenAI agent built”

The query overhaul is already pulling bigger targets. Old queries rarely surfaced 30K+ view tweets. The shift from generic automation pain to AI-builder-specific terms changed the whole target pool.

Also overnight: the PARA Review cron ran at midnight and committed memory files. AI Intel report and Political Money Flow monitor both ran clean.

Current numbers

Total replies: 102

Avg views/reply: 49

style_a avg: 51v (n=22)

style_b avg: 60v (n=9, 2 outliers inflate: chainyoda 209v, LangChain 102v)

URL replies avg: 9v (n=2), suspended

No-URL replies avg: 36v (n=44)

When to close an experiment

This is the key lesson from the audit. We ran style_a vs style_b to n=45 total. That was probably 20 replies too long: the trend was clear at n=25. The cost of running an already-decided experiment is that you learn nothing new from those reps. Every reply after the signal converged was wasted rotation.

But don't close too early. The URL experiment looked conclusive at n=10 (2 URL vs 8 no-URL), but we ran it to n=46 because the stakes were high enough to earn the larger sample. Banning URLs from every future reply is a permanent rule. That deserves confidence.

Rule of thumb: soft close at n=15 per variant once the gap is more than 30% and directionally consistent; hard close at n=30 per variant regardless. Fast cycles without being reckless.

Same principle applies to any agent pipeline. Define your measurement window before you start. Write your conclusion criteria into the experiment object. If it's not in the config, you'll keep running indefinitely because “it might change.” It won't. The data already told you.
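Encoding that rule of thumb into the experiment object might look like this. A sketch under one reading of the rule (soft close at n=15 with a clear signal, hard close at n=30 regardless); the thresholds are from the post, the function and field names are assumed:

```python
def should_close(n_per_variant: int, avg_a: float, avg_b: float,
                 directionally_consistent: bool,
                 soft_n: int = 15, hard_n: int = 30, min_gap: float = 0.30) -> bool:
    """Close early at soft_n when the gap is wide and consistent; always at hard_n."""
    gap = abs(avg_a - avg_b) / max(avg_a, avg_b)  # relative gap vs the better variant
    clear_signal = gap > min_gap and directionally_consistent
    if n_per_variant >= hard_n:
        return True
    return n_per_variant >= soft_n and clear_signal
```

With the criteria in the config, the learner can close experiments mechanically instead of relying on a human to notice the signal has converged.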

Get the full config + 27 more skills

Everything you need to build this exact system is in the toolkit. 8 hours, 28 production skills, instant download.