Architecture · Mar 14, 2026

The heartbeat pattern for always-on AI agents

Cron jobs fail silently. The heartbeat pattern gives your AI agent one clock, one decision tree, and state-aware scheduling that actually works.

My first version of the X marketing agent used cron jobs. Five of them: reply at 9 AM, reply at 12 PM, reply at 3 PM, scrape engagement at 6 PM, scrape again at 9 PM. Simple enough. Worked fine for a week.

Then one morning I woke up to duplicate replies. The 9 AM cron and the 12 PM cron had both fired, both found the same target tweet, and both posted. The scrape job had timed out silently the night before, so the skip list wasn't updated, so nothing knew the tweet had already been replied to. No errors logged. No alerts. Just two replies from the same account on the same tweet.

That's the cron job problem. It fires regardless of what happened last time. State is your problem to manage, separately, everywhere.

How the heartbeat pattern works

One clock. One agent. One decision tree. The gateway sends a wake event every 30 minutes (9 AM to 11 PM for social tasks). The agent wakes up, reads its state file, checks what's due, does the one highest-priority thing, updates state, and goes back to sleep.

1. Read heartbeat-state.json
2. Check each timestamp against its threshold
3. Find the highest-priority task that's due
4. Execute it, update state, report to Discord
5. Sleep until next wake event
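As a rough illustration, one tick of that loop might look like the Python below. This is a minimal sketch, not the agent's actual implementation: `run_tick`, `save_state`, the `pick_task` callback, and the `handlers` mapping are hypothetical names, and the Discord report is reduced to a returned string.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable, Optional

State = dict
# Hypothetical callback: returns the name of the most urgent due task, or None.
PickTask = Callable[[State, datetime], Optional[str]]
# Hypothetical handler: does the work and returns the updated state.
Handler = Callable[[State, datetime], State]


def save_state(path: str, state: State) -> None:
    """Write the state file via a temp file so a crash never leaves half a file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp, indent=2)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def run_tick(path: str, pick_task: PickTask, handlers: dict[str, Handler],
             now: Optional[datetime] = None) -> str:
    """One wake event: read state, do at most one task, persist, report."""
    now = now or datetime.now(timezone.utc)
    with open(path) as f:
        state = json.load(f)

    task = pick_task(state, now)
    if task is None:
        return "HEARTBEAT_OK"  # nothing due, state untouched

    state = handlers[task](state, now)
    save_state(path, state)
    return f"ran {task}"
```

The state file is only rewritten after a handler returns, so a task that raises leaves the previous timestamps in place and the next tick sees it as still due.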

The state file is the source of truth. Not a schedule. The actual timestamps of what ran last:

// heartbeat-state.json
{
  "lastReplyPosted": "2026-03-20T16:42:11Z",
  "lastEngagementScrape": "2026-03-20T13:37:24Z",
  "repliesPostedToday": 12,
  "todayDate": "2026-03-20"
}

When the agent wakes up, it subtracts lastReplyPosted from now. If it's been 25 minutes or more and the current hour is between 9 and 23, a reply is due. If it's been 3 hours or more since lastEngagementScrape, a scrape is due. If neither, it responds with HEARTBEAT_OK and goes back to sleep. Simple arithmetic, no coordination problems.
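That arithmetic is small enough to sketch directly. The version below is an assumption-laden illustration: the function and constant names are made up, "between 9 and 23" is read as hours 9 through 22 inclusive, and hours are compared in whatever timezone `now` carries (UTC here, to match the state file), where a real setup would likely convert to local time first.

```python
from datetime import datetime, timedelta, timezone

REPLY_INTERVAL = timedelta(minutes=25)
SCRAPE_INTERVAL = timedelta(hours=3)
ACTIVE_HOURS = range(9, 23)  # assumed reading: 9 AM up to, not including, 11 PM


def parse_ts(value: str) -> datetime:
    """Parse the ISO 8601 'Z' timestamps used in heartbeat-state.json."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def reply_due(state: dict, now: datetime) -> bool:
    """True when 25+ minutes have passed since the last reply, inside active hours."""
    elapsed = now - parse_ts(state["lastReplyPosted"])
    return elapsed >= REPLY_INTERVAL and now.hour in ACTIVE_HOURS


def scrape_due(state: dict, now: datetime) -> bool:
    """True when 3+ hours have passed since the last engagement scrape."""
    return now - parse_ts(state["lastEngagementScrape"]) >= SCRAPE_INTERVAL


state = {
    "lastReplyPosted": "2026-03-20T16:42:11Z",
    "lastEngagementScrape": "2026-03-20T13:37:24Z",
}
now = datetime(2026, 3, 20, 17, 0, tzinfo=timezone.utc)
print(reply_due(state, now), scrape_due(state, now))
```

With the example state file and a wake at 17:00 UTC, only about 18 minutes have passed since the last reply but more than 3 hours since the last scrape, so the scrape is due and the reply is not.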

The HEARTBEAT.md contract

The agent's task controller lives in a file called HEARTBEAT.md. It defines thresholds, the decision tree, reporting formats, and when to send HEARTBEAT_OK. Every time the agent wakes up, it reads this file and follows it.

Here's a simplified version of the decision tree inside it:

## Decision Tree

Is scrape due? (lastEngagementScrape > 3h ago)
  YES: run engagement-scrape skill
Is reply due? (lastReplyPosted > 25min AND hour 9-23 AND dailyCount < 45)
  YES: run reply-guy skill
Both due?
  Scrape first. Apply learnings, then reply.
Neither due?
  HEARTBEAT_OK
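In the real setup the agent interprets this tree from the markdown itself. Purely to make the logic concrete, here is the same tree as a small Python function. The `Tick` fields and `decide` are hypothetical names, the 45-reply cap comes from the tree above, and "hour 9-23" is assumed to mean hours 9 through 22.

```python
from dataclasses import dataclass

DAILY_REPLY_CAP = 45


@dataclass
class Tick:
    minutes_since_reply: float
    hours_since_scrape: float
    hour: int            # current hour, 0-23
    replies_today: int


def decide(t: Tick) -> list[str]:
    """Return the skills to run this tick, in order; an empty list means HEARTBEAT_OK."""
    actions = []
    if t.hours_since_scrape >= 3:
        # Scrape is checked first, so it also runs first when both are due.
        actions.append("engagement-scrape")
    if (t.minutes_since_reply >= 25
            and 9 <= t.hour < 23
            and t.replies_today < DAILY_REPLY_CAP):
        actions.append("reply-guy")
    return actions


def report(actions: list[str]) -> str:
    """Summarise the tick for the log."""
    return " -> ".join(actions) if actions else "HEARTBEAT_OK"


print(report(decide(Tick(30, 4, 10, 12))))   # both due: scrape, then reply
print(report(decide(Tick(10, 1, 10, 12))))   # nothing due
```

Ordering the checks the same way as the tree means the "both due" branch needs no special case: the list simply comes back with the scrape ahead of the reply.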

The key part: you can change agent behavior by editing a markdown file. No code changes, no redeployment. The next heartbeat picks up the new instructions. I've updated thresholds, added error escalation rules, changed reporting formats, all without touching any running code.

Why this beats cron (and n8n, Make, LangGraph)

vs cron: Cron fires on a schedule. Heartbeat fires on a condition. If the last run failed 5 minutes ago, cron fires again at the scheduled time regardless. Heartbeat checks the timestamp, sees it's not 25 minutes yet, and skips. No duplicates, no wasted calls.

vs n8n/Make: Workflow tools are great for deterministic pipelines. Less great for agents that need to decide what to do based on context. You can't easily encode “if 3 consecutive errors of the same type, skip this task type and report to Discord” in a visual workflow. In HEARTBEAT.md it's two sentences.

vs LangGraph: LangGraph is a great framework, but it's more infrastructure than you need for a single-agent heartbeat loop. The heartbeat pattern is simpler: one file defines the behavior, and the agent reads it on wake. No graph compilation, no node definitions, no state machines to configure.

Error escalation built in

The agent reads an ERRORS.md file on every wake. If the same error type appears 3 or more times in the last 24 hours, it skips that task and posts a pattern alert to Discord:

🔴 Error pattern: 4x PASTE_FAILED in 24h — reply textarea not syncing after ClipboardEvent. Skipping reply-guy this tick.
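The counting rule behind that alert can be sketched as follows. The layout of ERRORS.md isn't shown here, so the line format in this example (a bullet, a UTC timestamp, then an upper-case error type) is purely an assumption, as are the function names.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical line format: "- 2026-03-20T16:42:11Z PASTE_FAILED free-text detail"
ERROR_LINE = re.compile(r"^- (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) ([A-Z_]+)\b")
PATTERN_THRESHOLD = 3          # same error type 3 or more times...
WINDOW = timedelta(hours=24)   # ...within the last 24 hours


def recent_error_counts(errors_md: str, now: datetime) -> Counter:
    """Count error types logged within the last 24 hours."""
    counts: Counter = Counter()
    for line in errors_md.splitlines():
        match = ERROR_LINE.match(line)
        if not match:
            continue  # headings, notes, blank lines
        logged_at = datetime.strptime(
            match.group(1), "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if timedelta(0) <= now - logged_at <= WINDOW:
            counts[match.group(2)] += 1
    return counts


def error_patterns(errors_md: str, now: datetime) -> list[str]:
    """Error types that reached the threshold and should trigger a skip and an alert."""
    counts = recent_error_counts(errors_md, now)
    return sorted(kind for kind, n in counts.items() if n >= PATTERN_THRESHOLD)
```

Mapping an error type such as PASTE_FAILED back to the task it should pause (reply-guy, in the alert above) is left out; a small lookup table would be one way to do it.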

Cron jobs fail silently into the void. Every heartbeat tick either reports what it did or why it didn't do anything. My Discord channel is a natural audit log: I can scroll back and see exactly what ran, when, and what the result was. That observability is free because every tick reports by default.

HEARTBEAT_OK ticks are intentional

Most ticks do nothing. A wake event every 30 minutes from 9 AM to 11 PM comes to 28 ticks per day (14 hours, two ticks per hour). Most of those are HEARTBEAT_OK because nothing's due yet. That's by design.

Each HEARTBEAT_OK tick costs about $0.001 in Sonnet tokens. It's not wasted. It's the agent checking state cheaply and confirming everything is on track. When a task becomes due, the agent is already running and ready. No cold start, no lag. The “idle” ticks are what make the active ones fast and reliable.

The cost math

28 ticks/day at ~$0.001/tick ≈ $0.03/day for orchestration. Actual work: 2 replies plus 1 scrape on most days. Opus reply calls at ~$0.01/reply = $0.02 for replies. Scrape at ~$0.003. Total: roughly $0.05 to $0.08 per day for a 24/7 agent running a full reply operation.

Daily: ~$0.05 to $0.08

Monthly: ~$1.50 to $2.40

Annual: ~$18 to $29
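The same arithmetic as a few lines of Python, using only the approximate per-call figures quoted above. The function name and the choice of a 30-day month are assumptions for illustration.

```python
TICKS_PER_DAY = 28        # every 30 minutes, 9 AM to 11 PM
COST_PER_TICK = 0.001     # approximate cost of one heartbeat tick
COST_PER_REPLY = 0.01     # approximate cost of one reply call
COST_PER_SCRAPE = 0.003   # approximate cost of one scrape


def daily_cost(replies: int, scrapes: int) -> float:
    """Orchestration overhead plus the cost of the actual work for one day."""
    orchestration = TICKS_PER_DAY * COST_PER_TICK
    work = replies * COST_PER_REPLY + scrapes * COST_PER_SCRAPE
    return orchestration + work


typical = daily_cost(replies=2, scrapes=1)
print(f"daily:   ${typical:.3f}")
print(f"monthly: ${typical * 30:.2f}")
print(f"annual:  ${typical * 365:.2f}")
```

A typical day of 2 replies and 1 scrape lands at about $0.051, the low end of the range. At these rates, a day with 5 replies would come to about $0.08.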

Running on a $700 Mac Mini. No cloud hosting costs.

For about $2/month you get an agent that posts replies, scrapes its own engagement data, runs A/B experiments, updates its own strategy, and reports to Discord every 30 minutes. The Mac Mini it runs on is by far the most expensive part of the stack: $700 is roughly 30 times a year of token costs.

The heartbeat pattern is how you get there. One clock. One decision tree. Everything else follows from that.
