State of X (Twitter) Engagement 2026: What 983 Replies on One Account Actually Did
A field report from one indie account's real data: 983 replies, 224 follower attributions. Replies to people who engaged you first beat cold replies ~5x.
On this page · 10 sections
- The one number that should change how you reply
- Engagement score, defined (so you can audit us)
- Finding 1: Warm beats cold by ~5×. Reply where you're already wanted.
- Finding 2: Small accounts reciprocate. Big accounts mostly don't.
- Finding 3: Two clocks matter — roughly noon and 8pm.
- Finding 4: The first three words of a reply move the number.
- Finding 5: ~80% of your growth is ambient. Stop trying to attribute every follow.
- Putting it together: the playbook these numbers imply
- The honest methodology section
- A closing note on doing this at scale
The one number that should change how you reply
We logged 7,756 actions on a single X account (@thedeepflux, indie-hacker / micro-SaaS niche) over several weeks of an autonomous agent running real engagement. Of those, 983 replies had their actual engagement scraped back — likes, replies, and bookmarks counted after the fact. We also recorded 224 new-follower attributions.
Here is the finding that mattered more than any other:
Replies to people who engaged YOU first — notifications and mentions — averaged 5.8 engagement. Cold replies (you sliding into a stranger's thread) averaged 0.8 to 1.3. That is roughly a 5× gap.
Run the math both ways and it holds: against the high end of cold (1.3) it's 4.5×; against the low end (0.8) it's 7.2×; midpoint is about 5.5×. Call it 5×. Same account, same voice, same AI writing the replies, same time windows. The only variable that moved was who started the conversation — and it moved the outcome by roughly 5 to 1.
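If you want to check that arithmetic yourself, here is a minimal sketch. The constants are the averages quoted above; the function and variable names are ours, chosen for illustration only.

```python
# Averages reported in this section.
WARM_AVG = 5.8
COLD_LOW, COLD_HIGH = 0.8, 1.3


def warm_to_cold_ratio(warm: float, cold: float) -> float:
    """How many times more engagement a warm reply earned than a cold one."""
    return warm / cold


# Midpoint of the reported cold range (1.05).
cold_mid = (COLD_LOW + COLD_HIGH) / 2

ratios = {
    "vs_cold_high": warm_to_cold_ratio(WARM_AVG, COLD_HIGH),
    "vs_cold_mid": warm_to_cold_ratio(WARM_AVG, cold_mid),
    "vs_cold_low": warm_to_cold_ratio(WARM_AVG, COLD_LOW),
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x")
```

Swap in your own warm and cold averages to see whether your account shows the same shape.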
That's not a hack. It's reciprocity, and it's the most under-used lever on the platform.
One honest caveat up front, because the rest of this report leans on it: this is one account's data, n in the hundreds, in a single niche. It is a field report, not a universal law. We're publishing the actual numbers so you can argue with them, replicate them, or beat them — not so you can treat them as gospel. Where the sample is thin (it often is), we'll say so.
Engagement score, defined (so you can audit us)
Every number here uses one formula:
engagement = likes + 3×replies + 5×bookmarks
Bookmarks are weighted heaviest because a bookmark is the closest thing X gives you to "this was actually useful, I want it later" — real value capture, not a casual thumb. Replies are weighted 3× because they cost the other person effort and signal a live conversation. Retweets are excluded entirely; in our data they were too noisy to mean anything consistent. You may weight these differently — that's fine, the rank order of findings barely moves if you do.
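For anyone auditing or re-weighting the score, a minimal sketch of the formula follows. The `ReplyStats` type and the constant names are hypothetical; only the weights (1, 3, 5) come from this report.

```python
from dataclasses import dataclass

# Weights from the formula above; change them to test your own weighting.
LIKE_WEIGHT = 1
REPLY_WEIGHT = 3
BOOKMARK_WEIGHT = 5


@dataclass(frozen=True)
class ReplyStats:
    """Counts scraped back for one reply (retweets deliberately left out)."""
    likes: int
    replies: int
    bookmarks: int


def engagement_score(stats: ReplyStats) -> int:
    """engagement = likes + 3 x replies + 5 x bookmarks."""
    return (
        LIKE_WEIGHT * stats.likes
        + REPLY_WEIGHT * stats.replies
        + BOOKMARK_WEIGHT * stats.bookmarks
    )


# Made-up example: 4 likes, 1 reply, 1 bookmark.
example = ReplyStats(likes=4, replies=1, bookmarks=1)
print(engagement_score(example))  # 4 + 3*1 + 5*1 = 12
```

Keeping the weights as named constants makes it easy to rerun the same data under a different weighting and see whether the rank order of the findings changes for you.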
Finding 1: Warm beats cold by ~5×. Reply where you're already wanted.
The 5.8-vs-1.0 gap is the whole ballgame. Cold replies are the default advice — "go reply to big accounts in your niche" — and on our data they were the worst-performing replies we made. The threads where someone had just liked, replied to, or followed us were where engagement actually compounded.
The tactic: before you spend any reply budget reaching into strangers' threads, clear your notifications first. Every like, reply, and new follower is a warm door. Reply there before you go cold-prospecting. If you only have ten replies in you today, spend the first eight on people who already touched your account this week. The data says those eight are worth roughly as much as forty cold ones.
This also reframes "engagement pods" and reply-guy grinding. The value isn't volume of cold replies — it's converting the warm inbound you're probably ignoring.
Finding 2: Small accounts reciprocate. Big accounts mostly don't.
We tiered every reply target by follower count: small (0–1K), peer (1K–10K), big (10K+). Replies to small accounts formed the largest single tier (n=197) and averaged 1.2 engagement. That's above the cold-reply floor of 0.8 and, critically, above what big accounts returned.
Big accounts are a visibility play, not a relationship play: you're a grain of sand in their replies, they will rarely see you, and they are unlikely to reply back. Small accounts notice. They reply, they follow, they remember. The reciprocity that powers Finding 1 lives disproportionately in the small tier.
The tactic: stop optimizing your reply targets for the biggest possible audience. Optimize for people who can see you and will respond. A 400-follower builder shipping in your exact niche is a better reply target than a 200K-follower influencer — not because the reach is bigger (it isn't) but because the loop actually closes. Reach you can't convert is vanity.
(Sample honesty: 197 is a real n for small; peer and big tiers were thinner, so treat the direction — small reciprocates more — as more reliable than the exact decimals.)
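To replicate the tiering on your own log, something like the sketch below would do. The tier names and cutoffs come from this section; how to treat an account sitting exactly on 1K or 10K is our assumption, and the sample rows are made up, not our data.

```python
from collections import defaultdict
from statistics import mean


def follower_tier(followers: int) -> str:
    """Map a follower count to small / peer / big.

    Assumption: exactly 1,000 counts as peer and exactly 10,000 as big.
    The report does not say which side the boundaries fall on.
    """
    if followers < 1_000:
        return "small"
    if followers < 10_000:
        return "peer"
    return "big"


def engagement_by_tier(rows):
    """rows: iterable of (target_follower_count, engagement_score).

    Returns {tier: (n, mean_score)} so thin tiers are visible at a glance.
    """
    buckets = defaultdict(list)
    for followers, score in rows:
        buckets[follower_tier(followers)].append(score)
    return {tier: (len(scores), mean(scores)) for tier, scores in buckets.items()}


# Made-up rows for illustration only.
sample_rows = [(400, 2), (950, 0), (4_200, 1), (200_000, 0)]
for tier, (n, avg) in engagement_by_tier(sample_rows).items():
    print(f"{tier}: n={n}, avg={avg}")
```

Reporting n next to each average is the point: it keeps you from reading too much into a tier with a handful of replies.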
Finding 3: Two clocks matter — roughly noon and 8pm.
Reply engagement was not flat across the day. It peaked twice — around 12:00 (noon) and around 20:00 (8pm) — at roughly 1.3 average, and troughed overnight at 0.7–0.8. So the difference between your best hour and your worst hour was nearly 2× on timing alone.
Two peaks, not one, is the interesting part. The common "post at 9am" advice misses the evening window entirely. Noon catches the lunch-scroll; 8pm catches the after-work, post-dinner, lying-on-the-couch scroll. Overnight is dead — which is obvious in hindsight but worth quantifying: replies you fire at 3am are doing roughly 60% of the work a noon reply does.
The tactic: concentrate your reply sessions in two blocks — a midday one and an evening one — rather than smearing them across the day or, worse, automating them through the night. If you're scheduling, schedule into the peaks. If you're manual, the couch-scroll at 8pm is prime time, not downtime.
(Caveat: timing is timezone- and audience-specific. Our account skews toward a builder audience whose noon and evening are when they're at a keyboard or doomscrolling. Yours may peak elsewhere — but the shape worth testing is "two peaks, dead overnight," not "one magic hour.")
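If you want to test the "two peaks, dead overnight" shape on your own replies, a minimal sketch is below. It assumes you have a local-time timestamp and an engagement score per reply; the function names and sample rows are hypothetical, and only the 1.3 and 0.7–0.8 averages come from our data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_engagement_by_hour(rows):
    """rows: iterable of (posted_at, engagement_score), posted_at in local time.

    Returns {hour_of_day: mean_score}, ordered by hour.
    """
    buckets = defaultdict(list)
    for posted_at, score in rows:
        buckets[posted_at.hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(buckets.items())}


def trough_share_of_peak(trough_avg: float, peak_avg: float) -> float:
    """Fraction of peak-hour engagement that a trough-hour reply earns."""
    return trough_avg / peak_avg


# Averages reported in this section.
PEAK_AVG = 1.3
TROUGH_LOW, TROUGH_HIGH = 0.7, 0.8

low = trough_share_of_peak(TROUGH_LOW, PEAK_AVG)
high = trough_share_of_peak(TROUGH_HIGH, PEAK_AVG)
print(f"overnight replies earn {low:.0%} to {high:.0%} of a peak-hour reply")

# Made-up rows, only to show the expected input shape.
sample_rows = [
    (datetime(2026, 1, 5, 12, 5), 2.0),
    (datetime(2026, 1, 6, 12, 40), 1.0),
    (datetime(2026, 1, 5, 3, 0), 0.0),
]
print(mean_engagement_by_hour(sample_rows))
```

Convert timestamps to your audience's dominant timezone before bucketing; hours computed in UTC will shift the peaks.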
Finding 4: The first three words of a reply move the number.
We tagged openers and measured them. Phrasing — just the first few words — produced a measurable spread:
- "unpopular maybe…" → 1.8 (the best opener we logged)
- "actually curious…" → 1.4
- "real" / "ok but…" → 1.3
- pure-question openers → underperformed
The pattern: openers that signal a take or genuine curiosity beat openers that just interrogate. "Unpopular maybe" pre-frames a contrarian opinion people want to argue with. "Actually curious" reads as a real human asking a real thing, not a reply-guy fishing. A naked question ("What made you choose X?") underperformed — probably because it puts all the work on the other person and signals low investment from you.
The tactic: lead with a stance or honest curiosity, not an interrogation. "Unpopular maybe, but [opinion]" gives people something to react to. A pure question gives them a chore. If your reply opens with "What" or "How" and nothing else, you're starting in the weakest measured position.
(This is the thinnest-sliced finding — opener-level buckets are small samples, so read these as hypotheses worth A/B-ing on your own account, not settled coefficients. The robust takeaway is "openers measurably differ; lead with a take," not "1.8 is the canonical value of 'unpopular maybe.'")
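If you do A/B openers on your own account, you need a way to bucket replies by how they start. The sketch below is one way to do it, not the tagger we ran: the opener labels come from the list above, while the matching rules (prefix match on whole words, and "starts with a question word and ends with a question mark" for pure questions) are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Prefix -> label. Labels follow the list above; the rules are assumptions.
OPENER_PREFIXES = {
    "unpopular maybe": "unpopular maybe",
    "actually curious": "actually curious",
    "ok but": "real / ok but",
    "real": "real / ok but",
}
QUESTION_WORDS = ("what", "how", "why", "when", "where", "who", "which")


def _starts_with_phrase(text: str, phrase: str) -> bool:
    """True if text starts with phrase as whole words ('real', not 'really')."""
    if not text.startswith(phrase):
        return False
    rest = text[len(phrase):]
    return rest == "" or not rest[0].isalnum()


def tag_opener(reply_text: str) -> str:
    """Assign a reply to an opener bucket based on its first few words."""
    lowered = reply_text.strip().lower()
    for prefix, label in OPENER_PREFIXES.items():
        if _starts_with_phrase(lowered, prefix):
            return label
    first_word = lowered.split(maxsplit=1)[0] if lowered else ""
    if first_word.strip(",.:;!?'\"") in QUESTION_WORDS and lowered.endswith("?"):
        return "pure question"
    return "other"


def engagement_by_opener(rows):
    """rows: iterable of (reply_text, engagement_score) -> {label: (n, mean)}."""
    buckets = defaultdict(list)
    for text, score in rows:
        buckets[tag_opener(text)].append(score)
    return {label: (len(s), mean(s)) for label, s in buckets.items()}


print(tag_opener("Unpopular maybe, but pricing pages matter more"))
print(tag_opener("What made you choose X?"))
```

As with the tiers, keep n next to every average; opener buckets get small fast.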
Finding 5: ~80% of your growth is ambient. Stop trying to attribute every follow.
This one humbled us. We logged 224 new followers and tried to trace each back to a specific action — a reply, a follow, a like, a DM. Only 44 of them (about 20%, average confidence 0.21) mapped to a single identifiable cause. The other ~80% had no clean attribution at all.
That's not a measurement failure — it is the finding. Most follows didn't come from a 1:1 interaction. They came from ambient visibility: someone saw our reply sitting under a popular tweet, in a thread we never directly engaged that person in, and followed. We showed up in the right reply section and got picked up by bystanders.
This quietly demolishes the "reply to convert the author" mental model. You're not primarily replying to win over the person you're replying to. You're replying to be visible to everyone else reading that thread. The author is the door; the audience behind them is the room.
The tactic: pick threads for who's watching, not who you're replying to. A reply under a tweet with 500 engaged readers — even a tweet from someone who'll never follow you back — can out-earn a perfect reply to someone with no audience. And stop obsessing over per-action ROI dashboards: if 80% of growth is unattributable ambient visibility, the right metric is "am I consistently visible in good rooms," not "did this specific reply get me this specific follow." The compounding is real; it's just diffuse.
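The attribution split quoted in this section is simple to recompute. A minimal sketch, using the two counts reported above (the function name is ours):

```python
# Counts reported in this section.
TOTAL_NEW_FOLLOWERS = 224
ATTRIBUTED_FOLLOWERS = 44


def attribution_split(attributed: int, total: int) -> tuple[float, float]:
    """Return (attributed_share, ambient_share) as fractions of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    share = attributed / total
    return share, 1 - share


attributed_share, ambient_share = attribution_split(
    ATTRIBUTED_FOLLOWERS, TOTAL_NEW_FOLLOWERS
)
print(f"attributed: {attributed_share:.1%}, ambient: {ambient_share:.1%}")
# prints "attributed: 19.6%, ambient: 80.4%"
```

Tracking this ratio over time is more useful than tracing individual follows: if the ambient share stays high while total follows grow, the "good rooms" approach is doing its job.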
Putting it together: the playbook these numbers imply
If you did nothing but act on this one account's data, here's the stack-ranked routine:
- Clear notifications first. Warm replies beat cold ~5×. This is the highest-leverage habit on the list, full stop.
- Target small accounts in your exact niche. They reciprocate (1.2 avg, our largest tier). Big accounts are lottery tickets.
- Run two reply blocks — midday and ~8pm. Skip overnight; it does ~60% of peak.
- Open with a take or real curiosity, not a bare question. "Unpopular maybe" energy.
- Choose threads by audience size, not author size. ~80% of growth is bystanders in good reply sections, not the people you reply to.
None of this requires a tool. It requires attention and consistency — which is exactly the problem, because attention and consistency are the two things humans run out of by day nine.
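None of it needs code either, but if you would rather write the ordering down than hold it in your head, here is one way to sketch it. Everything in it is hypothetical: the `ReplyCandidate` fields, the 1,000-follower cutoff for "small," and the choice to apply the playbook's ranking as a strict sort order. It is not how our agent is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplyCandidate:
    """One possible reply target (all fields are illustrative)."""
    handle: str
    engaged_us_first: bool        # liked, replied to, or followed us recently
    author_followers: int
    thread_engaged_readers: int   # rough size of the audience in the thread


def priority_key(candidate: ReplyCandidate):
    """Sort key following the stack rank: warm, then small author, then bigger room."""
    return (
        not candidate.engaged_us_first,         # warm targets sort first
        candidate.author_followers >= 1_000,    # small authors sort next
        -candidate.thread_engaged_readers,      # larger audiences break ties
    )


def rank_candidates(candidates):
    """Return candidates in the order the playbook says to reply to them."""
    return sorted(candidates, key=priority_key)


# Made-up candidates for illustration.
queue = [
    ReplyCandidate("big_cold", False, 200_000, 500),
    ReplyCandidate("small_cold", False, 400, 20),
    ReplyCandidate("warm_peer", True, 5_000, 10),
]
for candidate in rank_candidates(queue):
    print(candidate.handle)
```

A strict sort is the bluntest possible reading of the list; a weighted score would let a very large room outrank a small cold author, which Finding 5 arguably supports.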
The honest methodology section
- Source: one X account, @thedeepflux, indie-hacker / micro-SaaS niche. Data pulled from the running agent's own action log.
- Volume: 7,756 total logged actions; 983 replies with scraped engagement outcomes; 224 follower attributions.
- Metric: engagement = likes + 3×replies + 5×bookmarks; retweets excluded as noise.
- Biggest sub-samples: small-tier replies (n=197) are the most reliable slice. Opener-level and tier-level (peer/big) buckets are smaller — treat their direction as signal and their exact decimals as soft.
- What this is not: a multi-account study, a controlled experiment, or a universal benchmark. It's a single, instrumented, real-world account reporting its actual numbers. Replicate before you bet your strategy on it.
We're publishing the real figures specifically so they can be argued with and beaten. If you run the same measurements on your account and get a different shape, especially on timing (which we flagged as audience-specific) or on openers and the peer/big tiers (which we flagged as thin), we want to hear it.
A closing note on doing this at scale
Everything above is doable by hand. The catch is that "clear notifications, reply to the right small accounts, hit both daily peaks, open with a take, pick threads for the audience" is a five-part discipline you have to execute every single day without skipping — and skipping is the default human outcome. The data in this report came from an account where that routine never skips a day, because it's run by an agent that does the warm-first, small-account, twice-daily, take-first engagement on its own — in the account's own voice. That's how we generated a dataset this size in the first place, and it's how we keep acting on it. If you want the routine without the daily willpower tax, that's the entire premise of X-Autopilot. Either way: clear your notifications first. That one's free.
Frequently asked
Is replies-to-notifications really 5x better than cold replies?
On this account's data, yes. Replies to people who engaged us first (notifications and mentions) averaged 5.8 engagement; cold replies into strangers' threads averaged 0.8 to 1.3. That's a 4.5x to 7.2x range depending on which end of cold you compare against, midpoint about 5.5x. We round it to 'roughly 5x.' It's one account, n in the hundreds, so treat it as a strong field signal, not a proven law.
What does 'engagement score' mean in this study?
engagement = likes + 3x replies + 5x bookmarks. Bookmarks are weighted heaviest because they signal real value capture (someone wants it later), replies 3x because they cost the other person effort, and retweets are excluded entirely because they were too noisy to be meaningful in our data.
When is the best time to reply on X, according to this data?
Two daily peaks: around noon (12:00) and around 8pm (20:00), both averaging about 1.3 engagement. Overnight troughs at 0.7 to 0.8 — roughly 60% of peak. The takeaway is 'two peaks, dead overnight,' not 'one magic hour.' Timing is timezone- and audience-specific; the shape is worth testing on your own account.
Should I reply to big accounts or small accounts to grow?
Small accounts reciprocated more. Replies to small accounts (0-1K followers) were our largest sample (n=197) and averaged 1.2 engagement, beating big accounts. Big accounts are a visibility lottery — you're invisible in their replies. Small accounts notice, reply, and follow back. Optimize for who can see you and respond, not for the biggest possible audience.
Why can't most new followers be attributed to a specific action?
Because most growth is ambient, not 1:1. Of 224 new followers, only about 20% (44, average confidence 0.21) traced to a specific reply, follow, or like. The other ~80% came from bystanders seeing our replies in good threads — visibility, not direct interaction. The lesson: pick threads for who's watching, not just who you're replying to, and stop chasing per-action follow attribution.
Does the opener of a reply actually change engagement?
In our data, yes, measurably. 'Unpopular maybe' openers hit 1.8, 'actually curious' 1.4, 'real'/'ok but' 1.3, while pure-question openers underperformed. Openers that signal a take or genuine curiosity beat interrogation. This is our thinnest-sliced finding (small samples), so treat the direction — lead with a stance, not a bare question — as the robust takeaway, not the exact numbers.
Is this a representative study of all X accounts?
No, and we won't pretend otherwise. It's one account (@thedeepflux) in the indie-hacker / micro-SaaS niche, n in the hundreds, instrumented in real operation. It's a field report meant to be replicated and argued with, not a universal benchmark. We publish the actual numbers precisely so you can test them against your own account.