AI Product Strategy & Roadmap


In 2026, every B2B SaaS has the same question: how should we integrate AI? The pressure comes from every direction — investors expecting AI features in your next pitch deck, customers asking "do you do AI?", competitors shipping AI agents that may or may not work, and a flood of frontier models (Claude Opus 4.7, GPT-5, Gemini 3) capable of dramatically reshaping product surfaces. The wrong response: panic-build "AI features" that don't actually move metrics, slap a chatbot on the homepage, ship a "summary button" that summarizes things nobody needed summarized.

This is the strategic playbook for figuring out what to build, what to skip, where to invest, and how to communicate. It is distinct from AI product positioning (the marketing layer); this is about product strategy + roadmap decisions.

What Done Looks Like

  • A clear AI thesis: where AI creates real value in your product (vs cosmetic AI features)
  • 3-5 prioritized AI feature investments with success criteria
  • Build-vs-buy decisions documented (use OpenAI / Claude / Gemini APIs vs train own models vs use vertical AI vendors)
  • Cost / margin model: AI features have variable costs that don't disappear; pricing reflects this
  • Privacy + compliance baked in (BAA / SOC 2 / customer data not training models)
  • Clear "what's AI, what isn't" inside the product (transparency builds trust)
  • Internal AI literacy: PMs + engineers + sales + CS understand what's possible + impossible
  • A specific person owns AI strategy (head of AI / CTO / a senior PM) past $5M ARR
  • Quarterly review: which AI features moved metrics; which didn't; iterate

1. The AI Thesis: Where Does AI Actually Help?

The first task requires honesty: where does AI create durable value for your customers, as opposed to generic "AI features"?

Strong AI Use Cases

These typically work:

  • Free-form text understanding: classifying / summarizing / extracting from unstructured customer input
  • Generation in low-stakes contexts: drafting emails, generating ideas, creating boilerplate
  • Search + retrieval over your data: semantic search beats keyword for many use cases
  • Conversational interfaces for self-serve support / FAQ / discovery
  • Recommendation + ranking: prioritizing what users see based on intent
  • Translation, transcription, and language tasks: massively improved
  • Code generation for technical products
  • Data extraction from documents, screenshots, PDFs
  • Anomaly detection + insights generation

Weak AI Use Cases

These usually disappoint:

  • High-stakes decisions without human review (medical diagnosis, legal advice, financial trades)
  • Real-time precision requirements (sub-100ms decisions; LLMs are slow)
  • Tasks requiring real-world reasoning beyond context window
  • Replicating a human relationship (sales, customer success — augment, don't replace)
  • Generation in contexts demanding precision (legal contracts, medical records, financial calcs)
  • Solved problems (search via Algolia is fine for keyword search; AI doesn't always help)

Test Your Thesis

For each candidate AI feature, answer:

  1. What user job is being done? (Not "we want AI"; "we want to help users do X faster.")
  2. Could a human do this in 30 seconds? If yes, AI might do it in 3 seconds reliably. If no (requires hours of analysis), AI may struggle.
  3. What's the failure cost? If AI gets it wrong 10% of the time, can the user recover?
  4. Is there an existing non-AI solution that's good enough?
  5. Will this be worth running in 2 years? (Specific to your product; not just "AI is hot")

If a feature passes all five, it's worth building. If not, skip.

2. Build vs Buy: API vs Custom Model vs Vendor

Every AI feature requires an architectural decision:

Use a Foundation Model API (Claude / GPT / Gemini)

Default in 2026 for 80%+ of AI features.

Pros:

  • Best-in-class reasoning + generation
  • No model training / hosting overhead
  • Frequent improvements (capabilities increase quarterly)
  • BAA available at enterprise tiers
  • Predictable pricing per token

Cons:

  • Variable cost (scales with usage)
  • Privacy concerns if handled naively ("our data going to OpenAI/Anthropic")
  • Vendor dependency (Anthropic raises prices, you eat it)
  • Latency: API calls add ~500ms-3s

Use foundation model APIs when:

  • The use case fits a general LLM well
  • You need quality > cost
  • BAA / privacy requirements can be met by vendor
  • You want to ship fast

Fine-Tune a Smaller Open-Source Model

For specific use cases at high volume.

Pros:

  • Lower per-inference cost at high volume
  • Domain-specific accuracy (with good training data)
  • Self-hosted = full data privacy

Cons:

  • Engineering investment (initial training + ongoing)
  • Quality may lag frontier models
  • Operational burden (GPU hosting, monitoring)
  • Requires labeled training data

Use fine-tuned models when:

  • You're at >$1M/yr in API spend (then it's worth optimizing)
  • Specific narrow task with quality bar foundation models miss
  • Privacy / data residency requires self-hosted

Use a Vertical AI Vendor

Specialized AI platforms targeting specific use cases.

Examples:

  • AI customer support: Decagon / Sierra / Ada / Intercom Fin
  • AI sales SDR: 11x / Artisan / Aisdr
  • AI moderation: Hive / OpenAI Moderation / Spectrum Labs
  • AI image / video editing: Runway / Krea / Magnific
  • AI voice: Vapi / Bland / Retell

Pros:

  • Specialized for the task; works out of the box
  • Integration time measured in days
  • Often better than DIY for the specific use case

Cons:

  • Vendor margin captures some value
  • Lock-in
  • Less customization

Use vertical vendors when:

  • The use case is well-defined + commoditized
  • Speed-to-market beats per-unit cost
  • You don't have AI engineering capacity

Decision Framework

  • New AI feature, mainstream task → foundation model API (Claude / GPT)
  • Mature use case with a specialist vendor → vertical AI vendor
  • Ultra-high volume, narrow task → fine-tuned open-source model
  • Privacy-critical → self-hosted (Llama, Mistral) on BAA infrastructure
  • Rapid prototype → foundation model API; iterate to a specialist later

3. The 3-Tier AI Feature Roadmap

A useful framing: classify AI features by ambition + risk.

Tier 1: Cosmetic AI ("AI Sprinkles")

Adding AI to existing surfaces with low risk:

  • "Summarize this" buttons
  • "Generate alt text" / "Write me a description"
  • "Tone adjustment" on outgoing messages
  • "Improve this" on user-typed content
  • Smart search (semantic vs keyword)

Time to ship: weeks. Cost: low. Differentiation: minimal. Customer expectation: yes, please.

Build these first; they're table stakes by 2026.

Tier 2: Workflow AI

AI that materially changes how a user accomplishes their job:

  • AI-drafted document templates (sales proposals, customer responses)
  • Smart classification + routing (incoming requests / leads)
  • Intelligent recommendations (next-best-action)
  • AI-assisted analytics (auto-insights from dashboards)
  • Conversational interfaces over your data

Time to ship: months. Cost: medium. Differentiation: meaningful. Customer expectation: differentiating.

These are where most B2B SaaS create real AI value.

Tier 3: AI Agents / Autonomous

AI that takes actions on the user's behalf:

  • AI customer support agents resolving tier-1 tickets autonomously
  • AI SDRs sourcing + outreaching prospects
  • AI workflow automation that triggers + routes work
  • AI coding agents that write + commit code

Time to ship: quarters to years. Cost: high. Differentiation: potentially transformative. Customer expectation: this is "real" AI.

Most companies should NOT start here. Build Tier 1 + Tier 2 first; learn; then attempt Tier 3.

4. Cost + Margin Math

AI features have variable costs that don't disappear. Plan for them.

Per-Inference Costs (2026 ranges)

  • Claude Opus 4.7: ~$15-75 per 1M input tokens; $75-150 per 1M output
  • Claude Sonnet 4.6: ~$3-15 per 1M input; $15-75 per 1M output
  • Claude Haiku 4.5: ~$0.80-1 per 1M input; $4-5 per 1M output
  • GPT-5: pricing tiers comparable to Claude Sonnet 4.6
  • GPT-4o-mini / Haiku tiers: cheap but lower quality

Calculating Per-Customer AI Cost

For a typical "AI summary" feature:

  • 5K tokens input + 1K tokens output per use
  • Customer uses 10 times/day = 60K tokens/day = 1.8M tokens/month
  • At Claude Sonnet pricing: ~$15-30/customer/month in AI costs
  • At Haiku pricing: ~$5-10/customer/month

If your pricing is $50/seat/month and the customer has 5 seats ($250/month per customer), AI costs of $50/customer/month are a 20% gross-margin hit. Significant.
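
The arithmetic above can be sketched as a small cost model. The token volumes match the example; the per-million rates are illustrative assumptions plugged in for the calculation, not quoted vendor pricing:

```python
def monthly_ai_cost(input_tokens_per_use: int, output_tokens_per_use: int,
                    uses_per_day: int, input_rate_per_m: float,
                    output_rate_per_m: float, days: int = 30) -> float:
    """Estimate per-customer monthly AI cost in dollars."""
    monthly_in = input_tokens_per_use * uses_per_day * days
    monthly_out = output_tokens_per_use * uses_per_day * days
    return (monthly_in / 1e6 * input_rate_per_m
            + monthly_out / 1e6 * output_rate_per_m)

# 5K in + 1K out per use, 10 uses/day, at assumed rates of $3/M in, $15/M out
cost = monthly_ai_cost(5_000, 1_000, 10, 3.0, 15.0)
print(f"${cost:.2f}/customer/month")  # $9.00 at these assumed rates

# Against a $250/month customer, that is a ~3.6% margin hit; rerun with
# the top of the rate range to see how quickly the hit grows.
margin_hit = cost / 250
```

Rerunning this model whenever a provider changes pricing is the cheapest way to catch margin erosion early.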

Pricing Implications

  • Bundle AI into existing pricing if cost is low + competitive: "AI summarize" doesn't need separate pricing
  • AI-specific tiers for higher cost AI features: "AI Pro plan adds AI agent"
  • Usage-based pricing for high-cost AI: per-seat with per-usage caps; overage charges
  • Hard caps + alerts for runaway usage (a single customer burning $10K/month in tokens is real)
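
Hard caps can be as simple as a per-customer monthly token budget checked before each call. A minimal sketch, where the cap, alert threshold, and in-memory store are all illustrative assumptions (a real system would persist usage and reset it monthly):

```python
from collections import defaultdict

MONTHLY_TOKEN_CAP = 2_000_000   # assumed per-customer budget
ALERT_THRESHOLD = 0.8           # warn at 80% of cap

usage: dict = defaultdict(int)  # customer_id -> tokens used this month

def check_budget(customer_id: str, estimated_tokens: int) -> bool:
    """Return True if the request fits the customer's monthly budget."""
    projected = usage[customer_id] + estimated_tokens
    if projected > MONTHLY_TOKEN_CAP:
        return False  # reject or queue; surface an overage upsell instead
    if projected > MONTHLY_TOKEN_CAP * ALERT_THRESHOLD:
        print(f"alert: {customer_id} at {projected / MONTHLY_TOKEN_CAP:.0%} of cap")
    return True

def record_usage(customer_id: str, tokens: int) -> None:
    """Record actual tokens consumed after a successful call."""
    usage[customer_id] += tokens
```

The point is that the gate runs before the expensive call, so a runaway customer is stopped at the cap rather than discovered on the invoice.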

Optimization Levers

  • Smaller models when sufficient (Haiku before Opus)
  • Caching (Anthropic prompt caching reduces input cost 90% for repeated prompts)
  • Truncation / summarization (don't send 100K tokens of context if 5K works)
  • Hybrid: keyword search first, LLM only for ambiguous cases
  • Per-customer rate limits
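
The first lever, routing to a smaller model when the task allows, can be sketched as a threshold-based router. The task categories, token thresholds, and model names below are placeholder assumptions; real routing decisions should be driven by eval results, not guesses:

```python
def pick_model(task: str, input_tokens: int) -> str:
    """Route simple, short tasks to a cheap model; escalate the rest.

    Model names are placeholders for your provider's tiers
    (e.g. Haiku-class / Sonnet-class / Opus-class).
    """
    CHEAP_TASKS = {"classify", "extract", "tag"}
    if task in CHEAP_TASKS and input_tokens < 4_000:
        return "small-model"   # cheap tier handles narrow, short tasks
    if input_tokens < 50_000:
        return "mid-model"     # workhorse tier for most generation
    return "large-model"       # reserve the frontier tier for hard cases
```

Even a crude router like this often cuts spend substantially, because the bulk of request volume tends to be the simple tasks.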

5. Privacy + Compliance

This is where AI features can break customer trust.

Customer Data Training

  • Do NOT train models on customer data without explicit consent
  • Use API providers with "do not train" guarantees (Anthropic, OpenAI Enterprise, Azure OpenAI)
  • For SaaS handling sensitive data: zero-data-retention agreements

BAA / Compliance

  • HIPAA: use Claude Enterprise, OpenAI Enterprise, AWS Bedrock with BAA
  • SOC 2: ensure AI vendors are SOC 2 certified
  • GDPR: AI processing may require additional disclosures + opt-outs

Surfacing AI Use to Customers

Best practice: be transparent.

  • "✨ AI-generated" labels on AI output
  • "How AI is used" page explaining your AI policies
  • Customer admin controls: turn off AI features per workspace
  • Audit logs: which AI features ran when
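
Workspace-level admin controls can be a simple settings check consulted before any AI code path runs. A sketch under assumptions (the in-memory store, workspace names, and feature keys are illustrative; in production the settings live in your workspace table and are editable by customer admins):

```python
# Illustrative in-memory settings; workspace and feature names are made up.
workspace_ai_settings = {
    "acme":   {"ai_enabled": True,  "disabled_features": {"ai_agent"}},
    "globex": {"ai_enabled": False, "disabled_features": set()},
}

def ai_feature_allowed(workspace: str, feature: str) -> bool:
    """Check the workspace admin's AI settings before running a feature."""
    settings = workspace_ai_settings.get(workspace)
    if settings is None or not settings["ai_enabled"]:
        return False  # default closed: no settings means no AI
    return feature not in settings["disabled_features"]
```

Gating every AI entry point through one function like this makes the "turn off AI for our workspace" enterprise ask a non-event.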

Data Leakage Risks

  • Cross-tenant prompt injection: a malicious user's input shouldn't expose another tenant's data
  • Output sanitization: AI may regurgitate training data; rare but possible
  • Logs: don't log full prompts + responses indefinitely; set a retention policy

6. Internal AI Literacy

Your team needs to understand AI to build AI products.

PM Literacy

  • Understand context windows, latency, hallucination rates
  • Know what current models can / can't do
  • Test prompts manually before specifying features
  • Write better PRDs that specify "the AI should..." with realistic expectations

Engineering Literacy

  • Familiar with prompt engineering basics
  • Know the major providers' APIs (Anthropic, OpenAI, Google)
  • Eval frameworks (does the AI feature actually work?)
  • Cost monitoring + alerting

Sales / CS Literacy

  • Realistic about AI limits to avoid overselling
  • Demo AI features without "magic" framing
  • Handle customer concerns about AI privacy + accuracy
  • Know when to escalate to product

Investment

  • Internal training: 1-2 hrs/quarter on AI updates
  • Hands-on workshops for PMs / engineers
  • Slack channel for AI sharing + experiments
  • Budget for individual experimentation ($100/mo/user in API credits)

7. AI Roadmap Process

How you plan AI features matters as much as what you build.

Quarterly AI Roadmap Review

  • What's the current AI thesis?
  • Which features are in flight, and how are they performing?
  • What new AI capabilities did frontier models release this quarter?
  • What is the competitive landscape doing?
  • What's the spend trajectory?

Per-Feature Eval Framework

Every AI feature needs a measurable eval:

  • Quality: accuracy / relevance / hallucination rate
  • Latency: median + p95 response time
  • Cost: per-request and per-customer
  • Adoption: % of eligible users using it
  • Impact: does it move retention / activation / NPS?

If a feature can't be measured, you can't iterate on it.
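
A minimal eval harness just runs a fixed set of labeled cases through the feature and reports the metrics above. A sketch under assumptions: exact-match grading is a placeholder (real evals need task-specific graders such as an LLM judge, regex checks, or human review), and the p95 computation is a simple approximation:

```python
import statistics
import time

def run_eval(feature, cases):
    """Run (input, expected) cases through `feature`; report quality + latency.

    `feature` is any callable taking a prompt and returning output.
    """
    correct, latencies = 0, []
    for prompt, expected in cases:
        start = time.perf_counter()
        output = feature(prompt)
        latencies.append(time.perf_counter() - start)
        correct += int(output == expected)  # placeholder grader: exact match
    latencies.sort()
    return {
        "accuracy": correct / len(cases),
        "latency_p50": statistics.median(latencies),
        "latency_p95": latencies[int(0.95 * (len(latencies) - 1))],
    }
```

Run the same case set on every prompt or model change; a fixed suite is what turns "the AI feels worse lately" into a number you can act on.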

Kill Criteria

Define when you'd kill an AI feature:

  • Adoption <10% of eligible users after 90 days
  • Quality below threshold despite tuning
  • AI cost exceeding 30% of the feature's attributable revenue
  • Customer trust events (visible AI failures hurting reputation)

Most teams skip kill criteria and let features drift on forever. Be willing to retire them.
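
Kill criteria are easiest to enforce when written as explicit checks against the feature's metrics. A sketch mirroring the thresholds above (the metric names and input shape are assumptions; wire them to your real analytics):

```python
def should_kill(metrics: dict, days_live: int) -> list:
    """Return the kill criteria a feature currently trips, if any."""
    reasons = []
    if days_live >= 90 and metrics["adoption"] < 0.10:
        reasons.append("adoption below 10% after 90 days")
    if metrics["quality"] < metrics["quality_threshold"]:
        reasons.append("quality below threshold")
    if metrics["ai_cost"] > 0.30 * metrics["feature_revenue"]:
        reasons.append("AI cost above 30% of feature revenue")
    return reasons

flags = should_kill(
    {"adoption": 0.06, "quality": 0.90, "quality_threshold": 0.85,
     "ai_cost": 200, "feature_revenue": 1_000},
    days_live=120,
)
# trips only the adoption criterion at these sample numbers
```

Running this in the quarterly review turns "should we kill it?" from a debate into a checklist.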

8. Common Failure Modes

"AI sprinkles" without strategic thesis. Adding AI summaries to every screen because Cursor / Notion did it. Doesn't move metrics; adds cost; complicates UX.

Underestimating AI cost. Pricing assumes negligible AI cost; reality is 20-50% of revenue. Margin crashes. Model costs early.

Overestimating AI capability. "We'll let users ask anything; the LLM will figure it out." Doesn't work; LLMs hallucinate; users bounce. Constrain AI to specific tasks.

Building Tier 3 (agents) before Tier 1 + 2. Founder excited about AI agents; tries to ship autonomous agent before any simpler AI features. Fails operationally.

Vendor lock-in to a single LLM. Anthropic raises prices 40%; you can't switch quickly. Use abstraction layers (Vercel AI Gateway, OpenRouter, AI SDK).
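
A thin abstraction layer can be as small as one interface plus per-provider adapters, so swapping vendors becomes a config change. The provider classes below are hypothetical sketches of where SDK calls would go, not real SDK signatures:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Thin interface so product code never imports a vendor SDK directly."""

    @abstractmethod
    def complete(self, prompt: str, max_tokens: int = 1024) -> str: ...

class AnthropicProvider(LLMProvider):
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        raise NotImplementedError  # wrap the Anthropic SDK call here

class OpenAIProvider(LLMProvider):
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        raise NotImplementedError  # wrap the OpenAI SDK call here

PROVIDERS = {"anthropic": AnthropicProvider, "openai": OpenAIProvider}

def get_provider(name: str) -> LLMProvider:
    """Vendor choice lives in config; product code sees only the interface."""
    return PROVIDERS[name]()
```

Hosted gateways (Vercel AI Gateway, OpenRouter) give you the same property without maintaining adapters yourself; the essential point is that product code depends on the interface, not the vendor.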

No evals. Ship AI feature; don't measure quality; gradual degradation goes unnoticed; users lose trust. Build eval suites.

Customer-data-training concerns ignored. Customer asks "do you train on our data?"; you don't have a clean answer. Lose enterprise deal. Be explicit.

Cosmetic AI features pretending to be transformative. Labeling a "summarize" button as "Revolutionary AI Summarization™" — customers see through it.

Building features the model can't reliably do. "AI will detect customer intent and route automatically" — at 70% accuracy, the routing is worse than no automation. Test before promising.

Privacy theater. "Your data is private!" without explaining what that means concretely. Be specific (no training; data residency; encryption) or don't claim privacy.

No transparency about AI use. Output looks AI-generated; users notice; trust erodes. Label "AI-generated" outputs.

Hardcoding to current model capabilities. Building around current Claude Sonnet limits; in 6 months, capability shifts and your design feels dated. Architect for capability evolution.

Not budgeting for AI ops. AI features need monitoring, eval, prompt iteration, cost optimization — ongoing investment. Treat as platform.

Treating AI as a moat. "Our AI is unique" — not unless you have proprietary data, fine-tuned models, or unique application. Most AI features can be replicated by competitors quickly.

No customer admin controls. Customer wants to disable AI features for their workspace (compliance, preference); you've made it impossible. Build admin toggles.

Confusing AI features with AI strategy. Shipping 10 disconnected AI features doesn't add up to coherent AI product. Have a thesis.

Not internal-testing with employees. Ship AI feature without dogfooding internally. Customers find embarrassing edge cases. Always dogfood first.

Sales overselling AI capability. "Our AI can do anything." Customer expectations gap creates churn. Sales training matters.

Investor-driven AI strategy. "Our investors want more AI" → ship AI features that don't move customer metrics. Customer-driven strategy wins.

No kill criteria. AI feature underperforming; nobody pulls the plug; resources stuck. Define kill criteria.

Forgetting non-AI users. Products designed entirely around AI alienate users who don't want AI. Make AI optional + valuable.

What Done Looks Like (Recap)

You've shipped AI product strategy when:

  • Documented AI thesis articulating where AI creates value in your product
  • 3-5 prioritized AI feature investments with measurable success criteria
  • Build-vs-buy decisions explicit per feature
  • Cost + margin model accounting for variable AI cost
  • Privacy + compliance approach documented + communicated
  • Internal AI literacy program (PM / Eng / Sales / CS)
  • Quarterly AI roadmap review with eval data
  • Kill criteria defined; willing to retire underperforming features
  • Customer admin controls for AI usage
  • A named owner past $5M ARR

Mistakes to Avoid

  • Cosmetic AI without strategic thesis
  • Underestimating per-inference cost / margin impact
  • Building Tier 3 agents before Tier 1 + 2
  • Vendor lock-in without abstraction
  • No evals; quality drifts unnoticed
  • Privacy theater without specifics
  • Lack of transparency on AI usage
  • Hardcoding to current model limits
  • No AI ops budget
  • Investor-driven AI roadmap vs customer-driven
  • No kill criteria; features drift forever
  • No customer admin controls
  • Forgetting non-AI users

See Also