Most marketers treat Reddit like a black box. They know their audience is in there somewhere (lurking in subreddits, complaining in comment threads, upvoting things that actually matter to them) but extracting that signal feels like work. So they skip it, guess at their targeting, and wonder why their CPCs are high and their CTRs are low.
This walkthrough is going to fix that.
By the end, you'll have a working Python script that pulls real Reddit data using the official API, runs it through Claude to extract audience intelligence, and gives you targeting inputs you can take directly into Reddit Ads Manager. No scraping. No guesswork. No wasted budget warming up an audience you don't understand yet.
Why Reddit, and Why Now
Reddit sits in a strange position in the marketing stack. It's enormous (over 100,000 active communities, hundreds of millions of users) but it's also unusually honest. People on Reddit say what they actually think. They're not performing for a following or trying to look aspirational. They're asking real questions, venting real frustrations, and recommending things they genuinely use.
That makes it one of the richest sources of unfiltered audience intelligence available to marketers. The problem is that intelligence is distributed across thousands of threads. That's where the API plus AI becomes a serious edge: you can pull that signal at scale and let the model do the pattern recognition.
Reddit's ad platform has also matured significantly. Subreddit targeting is particularly powerful because you're reaching people self-organized around a shared interest or need, not algorithmically inferred demographics that may or may not hold.
What You'll Build
A Python script that:
Authenticates with Reddit's API using PRAW
Pulls top posts and comments from one or more target subreddits
Sends that data to Claude via the Anthropic API
Returns a structured audience brief: language patterns, pain points, objections, content format preferences, and targeting recommendations
This runs locally. Your data stays yours.
Prerequisites
Python 3.8+
A Reddit account (to create an API app)
An Anthropic API key
Install the required libraries:
```bash
pip install praw anthropic
```
Step 1: Set Up Reddit API Access
Reddit's API requires you to register an application. This is a two-minute process.
Log in to Reddit and go to https://www.reddit.com/prefs/apps
Click Create App (or Create Another App if you've done this before)
Fill in the fields:
Name: Something descriptive, like audience-research-tool
App type: Select script
Redirect URI: Use http://localhost:8080 (required but not used for script-type apps)
Click Create App
You'll get a client ID (shown under your app name) and a client secret. Keep these. You'll need them in the next step.
Step 2: Configure Your Credentials
Create a file called config.py to store your credentials outside your main script. Never hardcode these or commit them to a repo.
```python
# config.py
REDDIT_CLIENT_ID = "your_client_id_here"
REDDIT_CLIENT_SECRET = "your_client_secret_here"
REDDIT_USER_AGENT = "audience-research-tool/1.0 by u/your_reddit_username"
ANTHROPIC_API_KEY = "your_anthropic_api_key_here"
```
The user_agent string is required by Reddit's API. Their guidelines ask that it follow the format platform:app_name:version (by u/username). Using a descriptive agent helps avoid rate limiting.
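If you'd rather keep secrets out of a Python file entirely, a common alternative (my variation, not part of the original setup) is to read them from environment variables and have config.py act as a thin wrapper:

```python
# config.py — environment-variable variant (an alternative sketch, not the
# only valid layout). Export REDDIT_CLIENT_ID etc. in your shell first.
import os


def require_env(name, default=None):
    """Read a credential from the environment, failing loudly if absent."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# The rest of the tutorial imports these names exactly as before:
# REDDIT_CLIENT_ID = require_env("REDDIT_CLIENT_ID")
# REDDIT_CLIENT_SECRET = require_env("REDDIT_CLIENT_SECRET")
# REDDIT_USER_AGENT = require_env("REDDIT_USER_AGENT")
# ANTHROPIC_API_KEY = require_env("ANTHROPIC_API_KEY")
```

Failing loudly at startup beats a confusing 401 from the API halfway through a run.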
Step 3: Pull Data from Reddit with PRAW
PRAW (Python Reddit API Wrapper) is the standard library for interacting with Reddit's API. It handles authentication, rate limiting, and pagination cleanly, which is why it's the right choice here over raw requests calls.
```python
# reddit_client.py
import praw

from config import REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT


def get_reddit_client():
    return praw.Reddit(
        client_id=REDDIT_CLIENT_ID,
        client_secret=REDDIT_CLIENT_SECRET,
        user_agent=REDDIT_USER_AGENT,
    )


def fetch_subreddit_data(subreddit_name, post_limit=25, comment_limit=10):
    """
    Pulls top posts and their top comments from a subreddit.
    Returns a list of dicts with post title, body, score, and top comments.
    """
    reddit = get_reddit_client()
    subreddit = reddit.subreddit(subreddit_name)

    posts = []
    for post in subreddit.top(time_filter="month", limit=post_limit):
        post_data = {
            "title": post.title,
            "body": post.selftext[:500] if post.selftext else "",
            "score": post.score,
            "num_comments": post.num_comments,
            "url": post.url,
            "comments": [],
        }

        # Pull top-level comments only, sorted by score
        post.comment_sort = "top"
        post.comments.replace_more(limit=0)  # Skip "load more" threads
        for comment in post.comments[:comment_limit]:
            if len(comment.body) > 20:  # Skip very short/low-signal comments
                post_data["comments"].append({
                    "body": comment.body[:300],
                    "score": comment.score,
                })

        posts.append(post_data)

    return posts
```
A few notes on the choices here:
time_filter="month" pulls posts from the last 30 days. You can change this to "week", "year", or "all" depending on how much historical signal you want. For fast-moving categories, "week" keeps things fresh. For evergreen research, "year" or "all" gives you more volume.
replace_more(limit=0) is important. Without it, PRAW will try to load collapsed comment threads, which burns API calls quickly. For research purposes, the top-level comments contain most of the high-signal content anyway.
Body truncation ([:500], [:300]) keeps the payload to Claude manageable without losing the substance. Adjust these if your posts tend toward long-form discussion.
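To make that trade-off concrete before you send anything, a rough rule of thumb is ~4 characters per token for English text. A small helper (my illustration, not part of the core script) estimates what a batch of post dicts will cost:

```python
def estimate_tokens(posts):
    """Roughly estimate the token cost of a list of post dicts (~4 chars/token)."""
    chars = 0
    for post in posts:
        chars += len(post["title"]) + len(post["body"])
        for comment in post["comments"]:
            chars += len(comment["body"])
    return chars // 4


# One post at the truncation limits from Step 3:
sample = [{
    "title": "How do I start budgeting?",
    "body": "x" * 500,       # body capped at 500 chars
    "comments": [{"body": "y" * 300, "score": 12}],  # comments capped at 300
}]
print(estimate_tokens(sample))  # → 206
```

At 25 posts with 10 comments each, you're typically looking at a few tens of thousands of tokens, comfortably within Claude's context window.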
Step 4: Build the Prompt and Call Claude
This is where the intelligence comes from. The quality of your audience brief lives or dies in how you frame the task for Claude. You want structured output you can act on: not a summary, but a brief.
```python
# analyzer.py
import anthropic

from config import ANTHROPIC_API_KEY


def build_prompt(subreddit_name, posts):
    """
    Formats Reddit data into a structured prompt for Claude.
    """
    content = f"Subreddit: r/{subreddit_name}\n\n"
    for i, post in enumerate(posts, 1):
        content += f"--- Post {i} (Score: {post['score']}) ---\n"
        content += f"Title: {post['title']}\n"
        if post["body"]:
            content += f"Body: {post['body']}\n"
        if post["comments"]:
            content += "Top comments:\n"
            for comment in post["comments"]:
                content += f"  [{comment['score']} pts] {comment['body']}\n"
        content += "\n"

    prompt = f"""You are an expert audience researcher and media strategist with deep experience in digital advertising.

Below is a sample of top posts and comments from r/{subreddit_name} over the past month.

{content}

Based on this data, produce a structured audience intelligence brief with the following sections:

1. WHO IS HERE: Describe the likely demographics, professional background, and mindset of this community. What stage of life or career are they in? What do they care about?

2. LANGUAGE & VOICE: List 8–12 specific words, phrases, or expressions this audience uses naturally. These are terms you'd want to mirror in ad copy and landing pages.

3. PAIN POINTS & FRUSTRATIONS: What are the recurring problems, complaints, or unmet needs surfacing in this community? Be specific; vague pain points don't help with targeting.

4. WHAT THEY RESPOND TO: Based on post formats and engagement patterns, what content types perform best here (personal stories, how-tos, data, controversy, humor)? What hooks get traction?

5. OBJECTIONS TO WATCH: What skepticism, cynicism, or objections might this audience bring to an ad or offer? What would make them scroll past?

6. AD TARGETING RECOMMENDATIONS: Based on what you've observed, suggest 3–5 additional subreddits that likely share this audience. List any interest categories or keywords that would map to Reddit Ads targeting options.

7. ONE INSIGHT WORTH ACTING ON: What's the single most useful thing a marketer could take from this community's behavior that they probably wouldn't find in a standard persona doc?

Be direct and specific. Skip generalities. This brief is going to inform real media spend."""
    return prompt


def analyze_subreddit(subreddit_name, posts):
    """
    Sends Reddit data to Claude and returns the audience brief.
    """
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    prompt = build_prompt(subreddit_name, posts)
    message = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2000,
        messages=[
            {"role": "user", "content": prompt}
        ],
    )
    return message.content[0].text
```
The prompt is doing a lot of work here. A few things worth calling out:
Section 6 (Ad Targeting Recommendations) is the most directly actionable output. Subreddit suggestions from the model are worth validating. Cross-reference them against Reddit's audience size estimates in Ads Manager before committing budget.
Section 7 (One Insight Worth Acting On) consistently produces the most interesting results. It forces the model past synthesis into genuine observation. The insight you get here often reframes the whole brief.
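Because the prompt asks for numbered sections, the response is also easy to split programmatically if you want to route different sections to different places (copy doc, targeting sheet). A small parser sketch, assuming the model keeps the "1. HEADING:" format the prompt requests:

```python
import re


def split_brief_sections(brief_text):
    """Split a numbered brief into {section_number: text} on '1. ', '2. ' markers."""
    # Split on lines that start with a number followed by a period
    parts = re.split(r"(?m)^(\d+)\.\s+", brief_text)
    # re.split yields [preamble, "1", text1, "2", text2, ...]
    sections = {}
    for number, text in zip(parts[1::2], parts[2::2]):
        sections[int(number)] = text.strip()
    return sections
```

Model output formatting can drift, so treat this as a convenience rather than a guarantee; check that all seven sections parsed before relying on it.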
Step 5: Wire It Together
```python
# main.py
from reddit_client import fetch_subreddit_data
from analyzer import analyze_subreddit


def research_subreddit(subreddit_name, post_limit=25):
    print(f"\nPulling data from r/{subreddit_name}...")
    posts = fetch_subreddit_data(subreddit_name, post_limit=post_limit)
    print(f"Retrieved {len(posts)} posts. Analyzing with Claude...\n")

    brief = analyze_subreddit(subreddit_name, posts)

    print("=" * 60)
    print(f"AUDIENCE BRIEF: r/{subreddit_name}")
    print("=" * 60)
    print(brief)

    # Save to file
    output_file = f"brief_{subreddit_name}.txt"
    with open(output_file, "w") as f:
        f.write(f"AUDIENCE BRIEF: r/{subreddit_name}\n")
        f.write("=" * 60 + "\n\n")
        f.write(brief)
    print(f"\nBrief saved to {output_file}")


if __name__ == "__main__":
    # Run against one subreddit
    research_subreddit("personalfinance")

    # Or run against multiple to compare audiences
    # for sub in ["personalfinance", "financialindependence", "frugal"]:
    #     research_subreddit(sub)
```
Run it:
```bash
python main.py
```
You'll see the brief printed to your terminal and saved as a .txt file you can bring into a doc or share with a team.
Step 6: Translating the Brief into Reddit Ads Targeting
Once you have your brief, here's how the sections map to actual targeting decisions in Reddit Ads Manager:
Section 1 (Who Is Here) informs your creative angle and landing page tone. If the community skews toward experienced practitioners, jargon-heavy copy may work. If it's people newer to a topic, plain language and reassurance win.
Section 2 (Language & Voice) feeds directly into headline and body copy. If the brief surfaces that this community says "paycheck to paycheck" rather than "cash flow constraints," use their words.
Section 3 (Pain Points) shapes your hook. Reddit audiences are banner-blind and skeptical of obvious advertising. Leads that open with a pain point they recognize perform significantly better than benefit-forward hooks.
Section 6 (Targeting Recommendations) gives you your subreddit list. In Reddit Ads, you can layer subreddit targeting with interest categories for tighter reach. Start with one or two subreddits, then refine based on performance data after 7–10 days.
Section 5 (Objections) is the one most media teams skip and then wonder why their comment sections are rough. If you know the objections going in, you can address them preemptively in the creative or in the landing page above the fold.
A Few Practical Notes
Rate limits: Reddit's free API tier allows 100 queries per minute per authenticated OAuth client, which is what a PRAW script app counts as. For a research run against 3–4 subreddits at 25 posts each, you won't come close to hitting this, and PRAW throttles itself when you do.
Subreddit size matters: Communities under ~50k subscribers often have too little recent activity for a meaningful sample. Communities over 2M can feel generic. The smaller, more specific subreddits often produce sharper audience signal.
Run this before, not after: The value of this workflow is in shaping your campaign strategy before you've committed budget. Running it mid-campaign to understand why something isn't working is useful too, but the real leverage is upstream.
Multi-subreddit comparison: The commented-out loop at the bottom of main.py is worth using. Running the same analysis across three adjacent communities (say, r/smallbusiness, r/Entrepreneur, and r/startups) and comparing the briefs shows you where audiences overlap and where they diverge. That's how you decide which subreddit gets the most budget.
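If you do run that loop, a small helper (hypothetical naming, my addition) can stitch the per-subreddit briefs into one side-by-side document that's easy to share:

```python
def combine_briefs(briefs):
    """Merge {subreddit_name: brief_text} into one comparison document."""
    sections = []
    for name, brief in sorted(briefs.items()):
        header = f"AUDIENCE BRIEF: r/{name}"
        # Underline each header so the sections are scannable in plain text
        sections.append(header + "\n" + "=" * len(header) + "\n" + brief.strip())
    return "\n\n".join(sections)
```

Feed it a dict built inside the loop ({sub: analyze_subreddit(sub, posts)}) and write the result to a single file instead of three.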
What You're Really Building
This isn't just a data-pull script. It's a research workflow that compresses what used to take hours of manual thread-reading into a ten-minute process, and it produces output that's actually structured for decision-making, not just interesting to read.
The marketers who win on Reddit are the ones who sound like they've spent time in the community before they show up trying to sell something. This workflow gets you there without the hours.
The code above is a working foundation. From here, you can extend it to pull search-specific posts using PRAW's reddit.subreddit("all").search() method, add competitor mention tracking, or pipe the output into a more structured reporting format. But run it as-is first. The brief it produces is worth reading before you start optimizing the tool.
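For reference, a keyword-search variant might look like the sketch below. The function name and shape are my own; it reuses the same trimming approach as Step 3 and takes a PRAW client as an argument so it slots in alongside fetch_subreddit_data:

```python
def fetch_search_posts(reddit, query, subreddit_name="all", limit=25):
    """Search a subreddit for posts matching a query, trimmed like Step 3's output."""
    results = []
    for post in reddit.subreddit(subreddit_name).search(query, sort="relevance", limit=limit):
        results.append({
            "title": post.title,
            "body": post.selftext[:500] if post.selftext else "",
            "score": post.score,
            "url": post.url,
        })
    return results
```

Searching r/all for a brand or competitor name, then running the same analysis prompt over the results, is the natural next step for mention tracking.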
Have questions about the setup or want to see a specific variation (multi-subreddit comparison, keyword search mode, or structured JSON output)? Drop a comment below.