COHORT 10 · MAY 2026 · REFERENCE GUIDE

OpenAI Models Landscape
Complete Reference & Cost Estimator

A comprehensive breakdown of every model in OpenAI’s current API portfolio — organized by tier, use case, and cost — with an interactive calculator to estimate your production spend before you build.

📅 Updated: 2 May 2026

🧠 Models covered: 60+

💡 Default recommendation: GPT-5.4 mini

🔗 Source: OpenAI official docs + pricing

🎯

Cohort 10 Default Stack

GPT-5.4 mini → everyday chat/coding · GPT-5.5 → high-stakes agentic/document work · GPT-4.1 → best non-reasoning tool-calling · text-embedding-3-small → retrieval/search · GPT Image 2 → image generation · gpt-realtime-1.5 → live voice agents · gpt-4o-transcribe → speech-to-text

Portfolio Architecture

Five Layers. One Portfolio.

OpenAI’s lineup is now structured around clear tiers. Understanding which layer fits your workload is more valuable than knowing every model name.

Layer 1

Frontier General

GPT-5.5, GPT-5.4 — highest capability, premium pricing, 1M+ token context

Layer 2

Value Production

GPT-5.4 mini, GPT-5.4 nano — high-volume defaults, fast, cost-efficient

Layer 3

Deep Reasoning

o3, o4-mini, o3-deep-research — slow, specialist, math/science/research

Layer 4

Specialist Media

Image, video, voice, TTS, transcription — modality-native models

Layer 5

Compatibility

GPT-4, GPT-3.5, legacy previews — backward compat only, not new builds

Release Timeline

Cadence Has Accelerated

OpenAI shipped more model families in 18 months than in the prior 3 years combined.

Key Strategic Themes

What Changed vs 2024

Quality Tiers Are Now ExplicitPro → Flagship → Mini → Nano pricing tiers are documented and priced clearly. Model selection is about workload shape, not just “latest vs not latest.”

Context Windows Exceeded 1M TokensGPT-5.5, GPT-5.4, and GPT-4.1 all support 1M+ token contexts. Long-document and multi-file agentic workflows are now native, not workarounds.

Batch/Flex + Cached Input PricingAsynchronous batch jobs and prompt caching materially change economics. Cached tokens on GPT-5.1 are 90% cheaper and held up to 24 hours.

Safety Specialization Is a First-Class FeatureOpenAI publishes system cards per family. Tool-capable models (browse, code, computer-use) require explicit deployment controls in production.

Modality-Native Models Win Their DomainsVoice, image, video, and transcription now have dedicated specialist models. Don’t route audio/image work through general GPT endpoints.

Shortlist Tier

Models to Actively Evaluate

These are the models Cohort 10 should consider first. Each card shows pricing, context, and best-fit role.

Frontier General Models GPT-5 family · 1M+ context · Premium pricing

Value & Production Models Mini / Nano variants · Cost-efficient · High-volume

Reasoning & Research Models o-series · Deep-research · Slower but deeper

Specialist & Media Models Image · Voice · Transcription · Embeddings · Moderation

Side-by-Side

Top-16 Models Compared

Sorted by recommended tier. Prices are standard list in USD per 1M tokens unless stated. Input cached = prompt-cache discount price.

Model	Best Role	Context	Input $/1M	Cached $/1M	Output $/1M	Latency

Embedding & Moderation

Utility Models

Model	Role	Price $/1M	Notes
text-embedding-3-small	Default retrieval/search	$0.02	Best price/performance default
text-embedding-3-large	High-recall / multilingual	$0.13	Better recall on hard tasks
text-embedding-ada-002	Legacy compat only	$0.10	Prefer 3-small for new builds
omni-moderation-latest	Text + image safety filter	FREE	Use as pre/post-filter on all endpoints

Interactive Tool

Estimate Your Monthly Cost

Enter your workload parameters below. The calculator shows side-by-side cost across the most common models so you can make the right build vs. quality trade-off before writing a single line of code.

💰 OpenAI API Cost Estimator

Based on official list pricing as of May 2026. Does not include Batch/Flex discounts.

Quick presets →

Avg Input Tokens per Request

Avg Output Tokens per Request

Requests per Month

Cached Input % (0–100)

Monthly cost by model →

💡 Cost tip: At 10k requests/month with 500 input + 1000 output tokens, GPT-5.5 costs ~— vs GPT-5.4 mini at ~— — a —× difference. Use Batch/Flex for async jobs to cut costs further (typically 50% off). Enable prompt caching when your system prompt exceeds 1,024 tokens — cached tokens are charged at 10% of list price.

Workload → Model

Task-to-Model Reference

Match your use case to the right model before you build. Primary = best quality. Alternative = lower cost or latency. Hover any row for details.

Task

Primary

Alternative

Why This Works

Decision Tree

How to Pick the Right Model

Follow this logic for every new project. When in doubt, start with GPT-5.4 mini and escalate only when needed.

Migration Paths

Upgrading Older Fleets

If you’re moving off legacy models, use these direct upgrade paths.

Architecture Pattern

Recommended Router Architecture

🏗️

The Multi-Tier Router Stack

GPT-5.4 nano → intent classification & extraction (cheapest, fastest) · GPT-5.4 mini → most chat/workflow steps (production default) · GPT-5.5 → escalation only (research-heavy, code-heavy, high-stakes) · text-embedding-3-small → all retrieval · omni-moderation-latest → safety pre/post filter (free) · Specialist models → media/voice/image when modality demands it.

This architecture aligns with OpenAI’s current portfolio design and is where the best cost-quality frontier sits for production apps.

Cost Optimization

Top Cost-Saving Moves

Use Batch/Flex for Async WorkOpenAI’s batch processing offers substantially lower effective token rates. If you don’t need real-time results, this is your biggest lever.

Aggressively Cache Long System PromptsCached tokens cost 90% less on most models. If your system prompt exceeds 1,024 tokens, caching can cut your input bill by 80%+.

Cap max_tokens on Routine FlowsModels will generate up to their maximum unless you tell them to stop. For extraction, classification, and short answers — always set a token cap.

Route by Task ComplexityDon’t send simple classification tasks to GPT-5.5. A nano model handles label prediction perfectly well at 40× lower cost per token.

Use Modality-Native SpecialistsSending audio through a general text model is expensive and less accurate than gpt-4o-transcribe. Same for images (GPT Image 2) and voice (gpt-realtime-1.5).

OpenAI Models 2026

OpenAI Models Landscape
Complete Reference & Cost Estimator

Cohort 10 Default Stack

Five Layers. One Portfolio.

Cadence Has Accelerated

What Changed vs 2024

Models to Actively Evaluate

Top-16 Models Compared

Utility Models

Estimate Your Monthly Cost

💰 OpenAI API Cost Estimator

Task-to-Model Reference

How to Pick the Right Model

Upgrading Older Fleets

Recommended Router Architecture

The Multi-Tier Router Stack

Top Cost-Saving Moves

Leave a Reply Cancel reply

About Us

Follow

Like & Share

Member ID: 22197

Interested in AI/Data Science?

OpenAI Models 2026

OpenAI Models LandscapeComplete Reference & Cost Estimator

Cohort 10 Default Stack

Five Layers. One Portfolio.

Cadence Has Accelerated

What Changed vs 2024

Models to Actively Evaluate

Top-16 Models Compared

Utility Models

Estimate Your Monthly Cost

💰 OpenAI API Cost Estimator

Task-to-Model Reference

How to Pick the Right Model

Upgrading Older Fleets

Recommended Router Architecture

The Multi-Tier Router Stack

Top Cost-Saving Moves

Leave a Reply Cancel reply

About Us

Follow

Like & Share

Member ID: 22197

Interested in AI/Data Science?

OpenAI Models Landscape
Complete Reference & Cost Estimator