COHORT 10 Β· MAY 2026 Β· REFERENCE GUIDE

OpenAI Models Landscape
Complete Reference & Cost Estimator

A comprehensive breakdown of every model in OpenAI’s current API portfolio β€” organized by tier, use case, and cost β€” with an interactive calculator to estimate your production spend before you build.

πŸ“… Updated: 2 May 2026
🧠 Models covered: 60+
πŸ’‘ Default recommendation: GPT-5.4 mini
πŸ”— Source: OpenAI official docs + pricing
🎯

Cohort 10 Default Stack

GPT-5.4 mini β†’ everyday chat/coding Β· GPT-5.5 β†’ high-stakes agentic/document work Β· GPT-4.1 β†’ best non-reasoning tool-calling Β· text-embedding-3-small β†’ retrieval/search Β· GPT Image 2 β†’ image generation Β· gpt-realtime-1.5 β†’ live voice agents Β· gpt-4o-transcribe β†’ speech-to-text

Portfolio Architecture

Five Layers. One Portfolio.

OpenAI’s lineup is now structured around clear tiers. Understanding which layer fits your workload is more valuable than knowing every model name.

Layer 1
Frontier General
GPT-5.5, GPT-5.4 β€” highest capability, premium pricing, 1M+ token context
Layer 2
Value Production
GPT-5.4 mini, GPT-5.4 nano β€” high-volume defaults, fast, cost-efficient
Layer 3
Deep Reasoning
o3, o4-mini, o3-deep-research β€” slow, specialist, math/science/research
Layer 4
Specialist Media
Image, video, voice, TTS, transcription β€” modality-native models
Layer 5
Compatibility
GPT-4, GPT-3.5, legacy previews β€” backward compat only, not new builds

Release Timeline

Cadence Has Accelerated

OpenAI shipped more model families in 18 months than in the prior 3 years combined.

Key Strategic Themes

What Changed vs 2024

01
Quality Tiers Are Now ExplicitPro β†’ Flagship β†’ Mini β†’ Nano pricing tiers are documented and priced clearly. Model selection is about workload shape, not just “latest vs not latest.”
02
Context Windows Exceeded 1M TokensGPT-5.5, GPT-5.4, and GPT-4.1 all support 1M+ token contexts. Long-document and multi-file agentic workflows are now native, not workarounds.
03
Batch/Flex + Cached Input PricingAsynchronous batch jobs and prompt caching materially change economics. Cached tokens on GPT-5.1 are 90% cheaper and held up to 24 hours.
04
Safety Specialization Is a First-Class FeatureOpenAI publishes system cards per family. Tool-capable models (browse, code, computer-use) require explicit deployment controls in production.
05
Modality-Native Models Win Their DomainsVoice, image, video, and transcription now have dedicated specialist models. Don’t route audio/image work through general GPT endpoints.

Shortlist Tier

Models to Actively Evaluate

These are the models Cohort 10 should consider first. Each card shows pricing, context, and best-fit role.

Frontier General Models GPT-5 family Β· 1M+ context Β· Premium pricing
Value & Production Models Mini / Nano variants Β· Cost-efficient Β· High-volume
Reasoning & Research Models o-series Β· Deep-research Β· Slower but deeper
Specialist & Media Models Image Β· Voice Β· Transcription Β· Embeddings Β· Moderation

Side-by-Side

Top-16 Models Compared

Sorted by recommended tier. Prices are standard list in USD per 1M tokens unless stated. Input cached = prompt-cache discount price.

Model Best Role Context Input $/1M Cached $/1M Output $/1M Latency

Embedding & Moderation

Utility Models

ModelRolePrice $/1MNotes
text-embedding-3-smallDefault retrieval/search$0.02Best price/performance default
text-embedding-3-largeHigh-recall / multilingual$0.13Better recall on hard tasks
text-embedding-ada-002Legacy compat only$0.10Prefer 3-small for new builds
omni-moderation-latestText + image safety filterFREEUse as pre/post-filter on all endpoints

Interactive Tool

Estimate Your Monthly Cost

Enter your workload parameters below. The calculator shows side-by-side cost across the most common models so you can make the right build vs. quality trade-off before writing a single line of code.

πŸ’° OpenAI API Cost Estimator

Based on official list pricing as of May 2026. Does not include Batch/Flex discounts.

Quick presets β†’

Monthly cost by model β†’

πŸ’‘ Cost tip: At 10k requests/month with 500 input + 1000 output tokens, GPT-5.5 costs ~β€” vs GPT-5.4 mini at ~β€” β€” a β€”Γ— difference. Use Batch/Flex for async jobs to cut costs further (typically 50% off). Enable prompt caching when your system prompt exceeds 1,024 tokens β€” cached tokens are charged at 10% of list price.

Workload β†’ Model

Task-to-Model Reference

Match your use case to the right model before you build. Primary = best quality. Alternative = lower cost or latency. Hover any row for details.

Task
Primary
Alternative
Why This Works

Decision Tree

How to Pick the Right Model

Follow this logic for every new project. When in doubt, start with GPT-5.4 mini and escalate only when needed.

Migration Paths

Upgrading Older Fleets

If you’re moving off legacy models, use these direct upgrade paths.

Architecture Pattern

Recommended Router Architecture

πŸ—οΈ

The Multi-Tier Router Stack

GPT-5.4 nano β†’ intent classification & extraction (cheapest, fastest) Β· GPT-5.4 mini β†’ most chat/workflow steps (production default) Β· GPT-5.5 β†’ escalation only (research-heavy, code-heavy, high-stakes) Β· text-embedding-3-small β†’ all retrieval Β· omni-moderation-latest β†’ safety pre/post filter (free) Β· Specialist models β†’ media/voice/image when modality demands it.

This architecture aligns with OpenAI’s current portfolio design and is where the best cost-quality frontier sits for production apps.

Cost Optimization

Top Cost-Saving Moves

01
Use Batch/Flex for Async WorkOpenAI’s batch processing offers substantially lower effective token rates. If you don’t need real-time results, this is your biggest lever.
02
Aggressively Cache Long System PromptsCached tokens cost 90% less on most models. If your system prompt exceeds 1,024 tokens, caching can cut your input bill by 80%+.
03
Cap max_tokens on Routine FlowsModels will generate up to their maximum unless you tell them to stop. For extraction, classification, and short answers β€” always set a token cap.
04
Route by Task ComplexityDon’t send simple classification tasks to GPT-5.5. A nano model handles label prediction perfectly well at 40Γ— lower cost per token.
05
Use Modality-Native SpecialistsSending audio through a general text model is expensive and less accurate than gpt-4o-transcribe. Same for images (GPT Image 2) and voice (gpt-realtime-1.5).

Leave a Reply

Your email address will not be published. Required fields are marked *