How TripleDart Runs Keyword Research With Claude and Ahrefs

How we turn a 10,000-keyword export into an intent-scored content calendar in under an hour, the handoff between Ahrefs and Claude that makes it work, and the three judgment calls we refuse to delegate.

Manoj Palanikumar

May 26, 2026

Key Takeaways

Ahrefs and Semrush still pull the keywords. Claude does the clustering, intent classification, and prioritization that used to eat a full analyst day.
Volume, difficulty, and SERP features come from Ahrefs. Semantic grouping, buying-stage tagging, and content-type recommendations come from Claude. The handoff is the unlock.
We cluster 5,000 to 15,000 keywords per client engagement. A senior analyst can finish a first-pass content calendar in an afternoon instead of a week.
Intent classification is where Claude earns its keep. Commercial investigation versus informational intent on ambiguous MOFU queries is the single hardest call in B2B SaaS research.
The output of the research flow is not a keyword list. It is a prioritized content calendar with page types, authority deficits, and sequencing recommendations.
Three calls stay human: which wedge a client should lean into, which cluster to cut despite high volume, and when to shelve a cluster because the SERP will not reward us yet.

Why Research Is the Bottleneck, Not the Keyword Data

Keyword data has not been the bottleneck in B2B SaaS SEO for years. Ahrefs, Semrush, and the rest of the stack pull 10,000 keywords in minutes.

The bottleneck is everything that happens after the export. The clustering, the intent classification, the prioritization, the transformation of a raw keyword universe into a content calendar the team can plan against.

That work used to run three days of a senior analyst's week per client. Mechanical filtering, hand-tagging the top 300 rows, eyeballing the rest, producing a calendar that was still wrong on roughly a third of the intent calls.

Claude does not eliminate that work. It compresses it. The 80% that was mechanical runs in about an hour of Claude time. The 20% that was judgment stays with the analyst and is what the client is paying for.

This article walks the exact pipeline. The handoff between Ahrefs and Claude, the three prompts that carry most of the load, and the cluster-level decisions we insist on keeping human.

The agency-wide stack this sits inside is covered in our Claude SEO guide. This piece is the keyword-research layer.

The Split: What Ahrefs Owns, What Claude Owns

The two tools do different jobs. Ahrefs pulls data. Claude reasons over it. Forcing either tool to do both halves produces worse output than keeping the split clean.

Ahrefs is still the source for keyword universe, search volume, keyword difficulty, SERP feature presence, top-ranking URLs, and historical ranking trends.

The Ahrefs AI Content Helper and Keywords Explorer both pull cleaner data at scale than any Claude workflow we have tested. The Ahrefs AI SEO statistics compilation covers the data edge in more depth.

The starter keyword research template we publish shows the column structure we work against.

Our SaaS SEO strategy guide frames how the research output feeds the broader plan.

Claude is the reasoning layer that sits on top. It reads the export, clusters semantically, classifies intent, flags ambiguities, and recommends page types. None of that is work a keyword tool is built to do.

The mistake we watch teams make is trying to use Claude as a keyword tool or using Ahrefs as a clustering tool. Neither works. The value sits in the handoff between the two.

The Export Shape Claude Reads Most Reliably

We pull the Ahrefs Keywords Explorer export with every column enabled: keyword, volume, difficulty, CPC, SERP features, top URL, ranking history, intent tag, and global volume. Claude reads the full export.
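A minimal sketch of a pre-flight check that an export carries every column before it goes to Claude. The column names below are assumptions for illustration; real Ahrefs export headers vary by report and version, so map them to whatever your file actually contains.

```python
import csv
import io

# Hypothetical column names; real Ahrefs export headers vary by report and version.
REQUIRED_COLUMNS = {
    "keyword", "volume", "difficulty", "cpc", "serp_features",
    "top_url", "ranking_history", "intent", "global_volume",
}

def missing_columns(csv_text: str) -> set[str]:
    """Return the required columns absent from the export's header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = {name.strip().lower() for name in (reader.fieldnames or [])}
    return REQUIRED_COLUMNS - present

# A thin export (keyword + volume only) fails the check.
thin_export = "keyword,volume\nworkflow automation,5400\n"
print(sorted(missing_columns(thin_export)))
```

An empty result means the export is safe to hand off; anything else is a reason to re-pull with more columns enabled.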

Thinner exports (keyword + volume only) produce thinner output. Dropping the intent tag alone cost us about 12 points of classification accuracy when we tested against human-labeled ground truth.
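A comparison like that can be sketched as a simple accuracy calculation against analyst labels. The labels below are toy data for illustration, not our measurements, and the function names are hypothetical.

```python
def accuracy(predicted: list[str], truth: list[str]) -> float:
    """Share of predictions that match the human-labeled ground truth."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and equal length")
    hits = sum(p == t for p, t in zip(predicted, truth))
    return hits / len(truth)

# Toy labels only: four keywords, one run per export shape.
truth = ["informational", "commercial", "commercial", "transactional"]
full_export_run = ["informational", "commercial", "commercial", "transactional"]
thin_export_run = ["informational", "informational", "commercial", "transactional"]

# Accuracy lost, in percentage points, when the export is thinner.
drop_points = (accuracy(full_export_run, truth) - accuracy(thin_export_run, truth)) * 100
print(f"accuracy drop: {drop_points:.0f} points")
```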

We store exports in a Claude Code project directory and let the filesystem MCP handle the reads. No copy-paste. No "paste the first 500 rows" dance.

The Search Engine Land breakdown of Claude Code as an SEO command center frames the same data-pipeline approach.

The Cluster Count Is More Useful Than the Keyword Count

A keyword list of 10,000 rows is a keyword universe. A set of 200 intent-tagged clusters is a content plan. The research pass exists to turn the first into the second.

Content teams that plan against keywords tend to write disconnected pieces that rank in isolation. Content teams that plan against clusters write hub-and-spoke systems that reinforce each other through internal links and shared entities.

The cluster count is useful because it maps to the calendar. A 200-cluster output means roughly 40 clusters per quarter of publishing at a two-piece-per-week cadence. That math keeps the calendar honest on what the content team can realistically ship against.

Clusters are also the unit of refresh. When the market shifts, you do not re-evaluate 10,000 keywords. You re-evaluate 200 clusters. That ratio is what makes quarterly research refreshes tractable instead of avoided.

The other underrated benefit of cluster-level planning is writer onboarding. A new writer picking up a client gets a 40-cluster calendar and a 12-section brief per cluster, not a 10,000-row keyword spreadsheet. That alone cuts ramp time from weeks to days.

The Three Prompts That Carry the Load

Three prompts do most of the reasoning work. Each one runs once per research pass. Running them in order produces the clustered, intent-scored content calendar the content team plans against.

Prompt One. Semantic Clustering

The first prompt takes the full Ahrefs export and groups the keywords by user intent and semantic similarity, not lexical overlap.

Cluster these keywords by user intent and semantic similarity.

For each cluster return:
- cluster name (2-3 word topic)
- representative keyword (highest volume in cluster)
- buying stage (TOFU | MOFU | BOFU)
- page type to target (guide | comparison | alternative | tool | hub)
- cluster size (count of keywords)
- total monthly volume
- average KD across cluster

Flag any cluster where buying stage is mixed (recommend splitting).

For a 10,000-keyword export, the output runs 200 to 400 clusters. That is more than any analyst would surface manually in a day. Most of the clusters need a 15-second review and a thumbs up. The 10% that need more attention get flagged.
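Part of that quick review can be mechanical. The sketch below recomputes the numeric fields the prompt asks for (size, total volume, average KD, representative keyword) straight from the keyword rows, so the figures in the model's output can be checked rather than trusted. The `KeywordRow` record and its field names are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class KeywordRow:
    keyword: str
    volume: int
    kd: float
    cluster: str  # cluster name assigned by the clustering pass

def summarize(rows: list[KeywordRow]) -> dict[str, dict]:
    """Recompute per-cluster size, total volume, average KD, and top keyword."""
    grouped: dict[str, list[KeywordRow]] = defaultdict(list)
    for row in rows:
        grouped[row.cluster].append(row)
    summary = {}
    for name, members in grouped.items():
        summary[name] = {
            "size": len(members),
            "total_volume": sum(m.volume for m in members),
            "avg_kd": sum(m.kd for m in members) / len(members),
            # Representative keyword = highest volume in the cluster.
            "representative": max(members, key=lambda m: m.volume).keyword,
        }
    return summary

rows = [
    KeywordRow("workflow automation software", 2400, 48, "workflow automation"),
    KeywordRow("automate manual workflows", 600, 32, "workflow automation"),
    KeywordRow("zapier alternatives", 1900, 41, "zapier alternatives"),
]
stats = summarize(rows)
print(stats["workflow automation"])
```

Any cluster where the recomputed numbers disagree with the reported ones is a candidate for the flagged pile.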

Prompt Two. Intent Reclassification

Intent labels from Ahrefs are a good starting point. They are not the final answer on nuanced B2B queries. Claude reads each cluster and reclassifies intent with more granularity than the standard four-way split.

For each cluster, assign a refined intent:
- TOFU informational (broad learning)
- TOFU solution-aware (problem learning)
- MOFU commercial investigation (comparing options)
- MOFU alternative-seeking (replacing current tool)
- BOFU transactional (ready to evaluate)
- BOFU branded (existing customer or direct intent)

Flag clusters where the ranking URL at position 1 does not match the refined intent. These are conversion leaks.

The six-way split is what the content calendar plans against. A cluster tagged "MOFU alternative-seeking" needs a different page type and a different angle than a cluster tagged "MOFU commercial investigation."
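Because the calendar keys off these six labels, it helps to validate them before planning. A minimal sketch, assuming each cluster comes back as a record with hypothetical `buying_stage` and `refined_intent` fields: it rejects labels outside the six-way split and surfaces clusters where the refined intent sits in a different funnel stage than the clustering pass assigned. A stage disagreement is not necessarily an error, since reclassification is the point of the second prompt, but it is worth a look.

```python
# The six refined intent labels listed in the prompt above.
REFINED_INTENTS = {
    "TOFU informational",
    "TOFU solution-aware",
    "MOFU commercial investigation",
    "MOFU alternative-seeking",
    "BOFU transactional",
    "BOFU branded",
}

def intent_problems(cluster: dict) -> list[str]:
    """Return validation notes for one cluster record (hypothetical field names)."""
    problems = []
    intent = cluster["refined_intent"]
    if intent not in REFINED_INTENTS:
        problems.append(f"unknown intent label: {intent!r}")
    elif not intent.startswith(cluster["buying_stage"]):
        # The clustering pass and the reclassification pass disagree on stage.
        problems.append("refined intent disagrees with buying stage")
    return problems

clusters = [
    {"name": "zapier alternatives", "buying_stage": "MOFU",
     "refined_intent": "MOFU alternative-seeking"},
    {"name": "workflow automation roi", "buying_stage": "TOFU",
     "refined_intent": "MOFU commercial investigation"},
    {"name": "crm pricing", "buying_stage": "BOFU",
     "refined_intent": "BOFU purchase"},
]
report = {c["name"]: intent_problems(c) for c in clusters}
print(report)
```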

Prompt Three. Content Calendar Prioritization

The third prompt takes the clustered, intent-scored output and produces a sequenced content calendar.

For each cluster, return a priority score (1 to 10) based on:
- total monthly volume (higher weight to BOFU)
- keyword difficulty relative to client DR
- SERP opportunity (featured snippets, people also ask, presence of competitor content we can beat)
- content gap (client does not currently rank in top 30)
- conversion proximity (BOFU > MOFU alternative > MOFU commercial > TOFU)

Return the top 40 clusters, grouped into three buckets:
- ship this quarter (weeks 1-12)
- ship next quarter
- shelve until authority improves

The top-40 list is what the content lead hands to the writers. The bucket structure is what the content calendar keys off of.

The "shelve until authority improves" bucket is the one writers appreciate most. It stops teams from spending three months on clusters where the SERP will not reward them.
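To make the scoring factors concrete, here is one way a deterministic cross-check of the priority score could look. This is a sketch under stated assumptions, not the scoring Claude performs: the prompt names the factors but not their weights, so the weights and the 0-to-1 normalization below are hypothetical. Only the conversion-proximity ordering comes from the prompt.

```python
# Hypothetical weights; the prompt names the factors but not how they are weighted.
WEIGHTS = {
    "volume": 0.25,
    "difficulty_fit": 0.25,      # KD relative to client DR, 1.0 = easy to win
    "serp_opportunity": 0.15,
    "content_gap": 0.10,         # 1.0 = client not in the top 30
    "conversion_proximity": 0.25,
}

# Ordering from the prompt: BOFU > MOFU alternative > MOFU commercial > TOFU.
# The numeric values are assumptions.
PROXIMITY = {"BOFU": 1.0, "MOFU alternative": 0.75, "MOFU commercial": 0.5, "TOFU": 0.25}

def priority_score(factors: dict[str, float]) -> int:
    """Map factor values in [0, 1] to a 1-10 priority score."""
    weighted = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return max(1, min(10, round(1 + 9 * weighted)))

base = {"volume": 0.5, "difficulty_fit": 0.5, "serp_opportunity": 0.5, "content_gap": 1.0}
bofu_score = priority_score({**base, "conversion_proximity": PROXIMITY["BOFU"]})
tofu_score = priority_score({**base, "conversion_proximity": PROXIMITY["TOFU"]})
print(bofu_score, tofu_score)
```

Where a cross-check like this and the model's score diverge sharply, that cluster is a reasonable one for the analyst to look at first.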

What Our Keyword Research Data Shows

Across the keyword research engagements we run, Claude's clustering plus intent classification completes in about an hour for 10,000-keyword exports. Human review plus calendar prioritization adds another two to three hours. The same work pre-Claude ran a senior analyst for the better part of a week. The accuracy gap has closed in parallel with the speed gap, which is the part that was harder to believe before we measured it.

Intent Is the Call Claude Gets Right Most Often

Of all the research tasks, intent classification is where Claude contributes the most new value. Volume and difficulty come from Ahrefs. Clustering is useful but not novel. Intent is where the handoff changes the research output most.

The hardest intent calls in B2B SaaS are the MOFU queries that look informational but reward commercial content.

A keyword like "how does workflow automation reduce manual work" reads informational. The SERP for it is dominated by product comparison pages with buyer-intent framing. The ranking opportunity is commercial, even though the query phrasing is informational.

A junior analyst calls that informational 80% of the time. Claude, prompted against the SERP context, calls it commercial correctly about 90% of the time in our internal testing.

That gap moves roughly 15% of a typical content calendar from the wrong bucket to the right one.

The calls where Claude is still weaker are navigational versus branded-informational splits, and very-low-volume long-tail queries where SERP context is ambiguous. We let an analyst spot-check any cluster with under 50 monthly volume before it goes into the calendar.
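The spot-check rule is simple enough to automate as a routing step. A minimal sketch, assuming each cluster record carries a hypothetical `total_volume` field; the 50-volume threshold is the one stated above.

```python
SPOT_CHECK_BELOW = 50  # monthly volume threshold from the text

def split_for_review(clusters: list[dict]) -> tuple[list[str], list[str]]:
    """Return (names needing an analyst spot-check, names that pass through)."""
    review, passthrough = [], []
    for cluster in clusters:
        if cluster["total_volume"] < SPOT_CHECK_BELOW:
            review.append(cluster["name"])
        else:
            passthrough.append(cluster["name"])
    return review, passthrough

clusters = [
    {"name": "sox approval workflow audit", "total_volume": 30},
    {"name": "workflow automation", "total_volume": 3000},
    {"name": "invoice routing rules", "total_volume": 50},
]
review, passthrough = split_for_review(clusters)
print(review)
```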

The HubSpot research on AI-assisted content reaches the same pattern from a different angle. Automated workflows earn their keep when a human stays in the loop for the edge cases the model cannot yet call.

The Three Judgment Calls We Keep Human

Three decisions do not get delegated.

Which Wedge the Client Should Lean Into

A 200-cluster research output can point at three or four directional wedges the client could own.

Which one they should lean into is a business decision. It depends on product roadmap, positioning, sales conversation themes, and competitive context Claude does not have access to.

The content strategist makes that decision with the client on a strategy call, not through a prompt.

The Ahrefs study on ranking in AI Overviews makes a related point. Brand authority signals carry more weight than keyword coverage when the wedge is narrow.

Which Cluster to Cut Despite High Volume

Every research pass surfaces three or four clusters where the volume is tempting but the cluster does not fit the client's business.

A sales enablement SaaS with an enterprise ICP should not chase high-volume TOFU traffic that converts to SMB trial signups at a 12-to-1 ratio. The calendar that includes those clusters looks impressive on paper and underperforms on pipeline.

Claude does not know the ICP conversion math. The analyst does. The analyst cuts the cluster from the calendar and replaces it with a lower-volume cluster that maps to actual buying behavior.

When to Shelve a Cluster Until Authority Improves

A client with a Domain Rating of 42 cannot realistically rank for a cluster where the top 10 URLs all sit at DR 72 or higher.

Chasing it produces content that sits at position 23 for six months. The content looks fine in isolation. It fails against the ranking baseline the SERP imposes.

Claude's prioritization prompt flags these clusters. The analyst is the one who shelves them with a note to revisit when the DR gap closes. Our enterprise SEO metrics framework covers how we track that gap over time.
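The flag itself can be expressed as a simple gap check. In this sketch the 30-point threshold is an assumption read off the DR 42 versus DR 72 example above, not a rule from our prompt, and the function name is hypothetical. The decision to shelve still sits with the analyst.

```python
def flag_for_shelving(client_dr: int, top10_dr: list[int], max_gap: int = 30) -> bool:
    """Flag a cluster when even the weakest top-10 URL leads the client by max_gap or more."""
    if not top10_dr:
        raise ValueError("top10_dr must contain at least one Domain Rating")
    return min(top10_dr) - client_dr >= max_gap

# Every top-10 URL sits at DR 72 or higher against a DR 42 client: flag it.
hard_serp = [72, 74, 75, 78, 80, 81, 84, 88, 90, 91]
# One DR 55 result in the top 10 leaves an opening: do not flag.
mixed_serp = [55, 74, 75, 78, 80, 81, 84, 88, 90, 91]

print(flag_for_shelving(42, hard_serp), flag_for_shelving(42, mixed_serp))
```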

What the Calendar Looks Like Coming Out

The final deliverable is not a keyword list. It is a content calendar.

Each row in the calendar has a target cluster, a primary keyword, a representative set of secondary keywords, a recommended page type, an assigned buying stage, a priority score, and an anchor claim slot.

The brief writers build from each row feeds directly into our content writing workflow. The clustering work is also where the 12-section brief template's sections 1, 2, 6, and 12 get populated.

The calendar also feeds the technical audit pipeline. When we audit a site, the pages we write against each new cluster become candidates for internal linking, schema enrichment, and crawl-depth promotion. The pipelines connect.

Cluster-level schema planning is a downstream win we did not appreciate until we started running the pipeline.

A cluster tagged "comparison" gets Product and Offer schema from day one. A cluster tagged "guide" gets Article and FAQPage. That discipline is covered in our schema workflow.
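That mapping can live as a small lookup that emits JSON-LD stubs for each calendar row. A minimal sketch covering only the two page types named above; the function name is hypothetical, and the stubs still need their required properties filled in per page.

```python
import json

# Only the two mappings stated in the text; other page types are left unmapped.
SCHEMA_BY_PAGE_TYPE = {
    "comparison": ["Product", "Offer"],
    "guide": ["Article", "FAQPage"],
}

def schema_stubs(page_type: str) -> list[dict]:
    """Return empty JSON-LD stubs for the schema.org types a page type should carry."""
    return [
        {"@context": "https://schema.org", "@type": schema_type}
        for schema_type in SCHEMA_BY_PAGE_TYPE.get(page_type, [])
    ]

print(json.dumps(schema_stubs("guide"), indent=2))
```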

The calendar row also drives internal linking decisions at publish time. A new BOFU cluster page links from at least three existing MOFU pages at launch.

The anchor text is tied to the cluster's primary keyword. The architecture side of that discipline lives in our internal linking playbook.
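A publish-time check for that rule might look like the sketch below. The record fields are hypothetical, and "tied to the primary keyword" is interpreted here as the anchor text containing the keyword, which is an assumption; your own rule may be looser.

```python
MIN_MOFU_INLINKS = 3  # "at least three existing MOFU pages" from the text

def launch_ready(inlinks: list[dict], primary_keyword: str) -> bool:
    """True when enough MOFU pages link in with anchors containing the primary keyword."""
    keyword = primary_keyword.lower()
    qualifying = [
        link for link in inlinks
        if link["source_stage"] == "MOFU" and keyword in link["anchor_text"].lower()
    ]
    return len(qualifying) >= MIN_MOFU_INLINKS

inlinks = [
    {"source_stage": "MOFU", "anchor_text": "Workflow automation pricing"},
    {"source_stage": "MOFU", "anchor_text": "compare workflow automation pricing"},
    {"source_stage": "TOFU", "anchor_text": "workflow automation pricing"},
    {"source_stage": "MOFU", "anchor_text": "click here"},
]
print(launch_ready(inlinks, "workflow automation pricing"))
```

In this example only two links qualify (one source is TOFU, one anchor is generic), so the page is one MOFU link short of launch.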

About 75% of the rows ship inside the first quarter. The remaining 25% sit in the "next quarter" or "shelved" buckets and get re-evaluated when the ranking data comes back.

The Pattern We See on First-Quarter Calendars

Content calendars built on this pipeline tend to land roughly 70% of their first-quarter published pieces in the top 20 ranking positions within 90 days. The calendars where rankings drift more often are the ones where the analyst skipped the human-judgment layer on cluster cutting or wedge selection. The research is only as good as the calendar decisions on top of it.

The delta shows up most clearly on reruns. When a client returns for a research refresh six months after the first calendar, the clusters that shipped with full analyst judgment still hold rankings.

The clusters that were shipped with light review have usually drifted. Judgment pays back in ranking persistence, not just in ranking speed.

The research pipeline also surfaces content we can retire. Pages that targeted clusters now reclassified as BOFU commercial when the page itself reads as TOFU informational should be consolidated or rewritten.

Those calls flow into the refresh calendar alongside new content. About 15% of the refresh calendar is typically consolidation or rewrite work, not new publishing.

That ratio is what stops client archives from accumulating stale pages the way they did pre-pipeline. The writers appreciate the consolidation work more than the new-publish work, because it keeps the existing library they worked on earning rankings.

Putting This Workflow Into Practice

Keyword research is the stage where the pipeline economics change most. The time savings are immediate. The cumulative value shows up in the next quarter's ranking outcomes, not in the first week's export.

The right way to start is to run the three prompts against a recent Ahrefs export for an existing client. Compare the output against the calendar you already have. Use the delta as the evaluation.

The prompts that produce the same calendar you would have built are noise. The prompts that surface a cluster you missed are signal.
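The delta can be computed as plain set arithmetic over cluster names. A minimal sketch with hypothetical names and toy data; in practice the hard part is normalizing cluster names so the two calendars are comparable.

```python
def calendar_delta(existing: set[str], proposed: set[str]) -> dict[str, set[str]]:
    """Split the pipeline's clusters into signal (missed) and noise (already planned)."""
    return {
        "missed_by_us": proposed - existing,      # signal: clusters we never planned
        "already_planned": proposed & existing,   # noise: same calendar we had
        "not_resurfaced": existing - proposed,    # planned, but the pipeline dropped it
    }

existing = {"workflow automation", "zapier alternatives"}
proposed = {"zapier alternatives", "approval workflows"}
delta = calendar_delta(existing, proposed)
print(delta["missed_by_us"])
```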

We run this research pipeline across 250+ B2B SaaS engagements at TripleDart. The pattern has carried across WeWork, Atlas, Payoneer, and SignEasy, and across verticals that share almost nothing beyond a 20,000-keyword universe and a content calendar that needs to feed a pipeline.

The first quarter of running the pipeline is usually where the biggest delta shows up.

Calendars built on keyword volume alone start producing rankings at 3x the rate once they get replanned against intent-classified clusters.

The second quarter is when the stacking internal-linking effect kicks in. New pages reinforce the old ones through the anchor graph, and ranking velocity picks up across the site, not just on the newest pieces.

The third quarter is when the refresh discipline starts paying back. Clusters that were shelved for authority reasons in quarter one become plannable as the domain's ranking baseline improves.

That is the long-arc payoff of running the pipeline with discipline instead of as a one-time audit. It is also the reason clients stay engaged past the first calendar: the research layer gets sharper every quarter the team runs it.

To see how we would run it on your keyword universe, talk to our team.

Frequently Asked Questions

Can Claude replace Ahrefs for keyword research? No. Claude does not crawl the web, does not maintain a keyword database, and does not track ranking history. It reads the data Ahrefs (or Semrush) produces and reasons across it. The split is not interchangeable; each tool does what the other cannot.

Which Claude model handles clustering best? Claude Opus for nuanced intent classification, Claude Sonnet for the clustering step on smaller exports under 5,000 keywords. For 10,000-plus keyword exports we default to Opus on the full pipeline. The Anthropic model overview covers the tier differences.

How many keywords can Claude cluster in one pass? Comfortably up to about 15,000 keywords in a single pass on Opus with the full context window. Above that, we split the export into segments by topic seed or by root URL and run the pipeline per segment. The merged output still works.

How accurate is Claude's intent classification? About 90% first-read accuracy on the six-way split we use, when the prompt includes SERP context. That is measured against analyst ground-truth labels across our internal testing. Drops to about 80% on low-volume long-tail where SERP context is sparse.

Do we still need an analyst in the loop? Yes. The three judgment calls (which wedge to lean into, which cluster to cut, which to shelve) stay human on every engagement. Claude handles the mechanical 80%. The analyst handles the decisions the 20% requires.

How long does a full research pass take? About three to four working hours from export to calendar for a 10,000-keyword universe. Claude handles the clustering, intent classification, and prioritization in about an hour of compute time. Analyst review plus the three judgment calls take the remaining two to three hours.

How does this plug into content briefing? The output of the research pass is a 40-row calendar. Each row becomes a content brief. The brief's reader, intent, keywords, and FAQ sections are populated directly from the calendar row. The full content pipeline is covered in the content workflow linked above.

What do you do when Ahrefs data conflicts with GSC data? GSC wins for existing ranking queries on pages the client already owns. Ahrefs wins for keyword universe, competitor data, and keywords the client does not yet rank for. We cross-reference both when classifying intent on ambiguous clusters.

Can this pipeline handle Semrush exports instead of Ahrefs? Yes. The prompts do not care about the source platform. They care about the columns in the export. If your export has keyword, volume, difficulty, SERP features, top URL, and ranking history, the pipeline runs. We have switched between Ahrefs and Semrush mid-engagement when client subscriptions changed, and the output quality held steady.

How do you refresh clusters as the market shifts? Every client gets a quarterly research refresh pass. New keywords enter the universe, old keywords drift in intent, and SERP composition changes. Running the three prompts on the fresh export surfaces the drift. The refresh cadence is what keeps the calendar current.