chore(pricing): Update vertex-ai pricing by sivadurga-d · Pull Request #185 · Portkey-AI/models

sivadurga-d · 2026-03-02T10:39:21Z

🔄 Pricing Update: vertex-ai

📊 Summary

Change Type	Count
➕ Models added	30
🔄 Prices updated	25

➕ New Models

gemini-3.1-flash-image-preview
gemini-2.5-flash-live-api
gemini-2.0-flash-image-generation
gemini-2.0-flash-live-api
gemini-1.5-flash
gemini-1.5-pro
textembedding-gecko@002
gpt-oss-120b
gpt-oss-20b
mistral-ocr-25.05
mistral-medium-3
mistral-small-3.1-25.03
codestral-2
llama-3.3-70b-instruct-maas
llama-4-scout-17b-16e-instruct-maas
llama-4-maverick-17b-128e-instruct-maas
jamba-1.5-large
jamba-1.5-mini
qwen3-next-80b-thinking
qwen3-next-80b-instruct
... and 10 more

🔄 Updated Models (any field change)

gemini-3.1-pro-preview
gemini-3-pro-preview
gemini-3-flash-preview
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash
gemini-2.0-flash-lite
gemini-1.0-pro
imagen-4.0-ultra-generate-001
imagen-4.0-generate-001
imagen-4.0-fast-generate-001
imagen-3.0-generate-001
imagen-3.0-fast-generate-001
imagegeneration@006
imagegeneration@002
veo-3.1-fast-generate-001
text-embedding-005
text-embedding-004
text-multilingual-embedding-002
textembedding-gecko@003
textembedding-gecko@001
textembedding-gecko-multilingual@001
multimodalembedding@001
llama-3.1-405b-instruct-maas

📋 Model → Pricing Page Mapping

Google Models (Gemini, Imagen, Veo, Embedding)

Model ID	Pricing Page Section	Notes
gemini-3.1-pro-preview	Gemini 3 → Standard	Input $2/1M, Output $12/1M, Cache read $0.2/1M, Image output $120/1M tokens, Web search $14/1K searches (1.4¢)
gemini-3.1-flash-image-preview	Gemini 3 → Standard	Input $0.5/1M, Output $3/1M, Image output $60/1M tokens
gemini-3-pro-preview	Gemini 3 → Standard	Input $2/1M, Output $12/1M, Cache read $0.2/1M, Image output $120/1M tokens, Web search $14/1K searches (1.4¢)
gemini-3-flash-preview	Gemini 3 → Standard	Input $0.5/1M, Output $3/1M, Cache read $0.05/1M, Web search $14/1K searches (1.4¢)
gemini-2.5-pro	Gemini 2.5 → Standard	Input $1.25/1M, Output $10/1M, Cache read $0.125/1M, Web search $35/1K grounded prompts (3.5¢)
gemini-2.5-flash	Gemini 2.5 → Standard	Input $0.3/1M, Output $2.5/1M, Cache read $0.03/1M, Batch 50% discount, Image output $30/1M tokens, Web search $35/1K (3.5¢)
gemini-2.5-flash-live-api	Gemini 2.5 → Standard	Input text $0.5/1M, Output text $2/1M, Input audio $3/1M, Output audio $12/1M, Web search $35/1K (3.5¢)
gemini-2.5-flash-lite	Gemini 2.5 → Standard	Input $0.1/1M, Output $0.4/1M, Cache read $0.01/1M, Batch 50% discount, Web search $35/1K (3.5¢)
gemini-2.0-flash	Gemini 2.0 → Token-based	Input $0.15/1M, Output $0.6/1M, Input audio $1/1M, Batch 50% discount, Web search $35/1K (3.5¢)
gemini-2.0-flash-image-generation	Gemini 2.0 → Token-based	Input $0.15/1M, Output text $0.6/1M, Input audio $1/1M, Image output $30/1M tokens
gemini-2.0-flash-live-api	Gemini 2.0 → Token-based	Input text $0.5/1M, Output text $2/1M, Input audio $3/1M, Output audio $12/1M, Web search $35/1K (3.5¢)
gemini-2.0-flash-lite	Gemini 2.0 → Token-based	Input $0.075/1M, Output $0.3/1M, Input audio $0.075/1M, Batch 50% discount, Web search $35/1K (3.5¢)
gemini-1.5-flash	Other Gemini models	Input $0.01875/1M chars, Output $0.075/1M chars (≤128K context)
gemini-1.5-pro	Other Gemini models	Input $0.3125/1M chars, Output $1.25/1M chars (≤128K context)
gemini-1.0-pro	Other Gemini models	Input $0.125/1M chars, Output $0.375/1M chars
imagen-4.0-ultra-generate-001	Imagen	$0.06/image
imagen-4.0-generate-001	Imagen	$0.04/image
imagen-4.0-fast-generate-001	Imagen	$0.02/image
imagen-3.0-generate-001	Imagen	$0.04/image
imagen-3.0-fast-generate-001	Imagen	$0.02/image
imagegeneration@006	Imagen 2, Imagen 1	$0.020/image
imagegeneration@002	Imagen 2, Imagen 1	$0.020/image
veo-3.1-generate-001	Veo 3.1	$0.20/sec (720p, 1080p)
veo-3.1-fast-generate-001	Veo 3.1 Fast	$0.10/sec (720p, 1080p)
veo-3.0-generate-001	Veo 3	$0.20/sec (720p, 1080p)
veo-3.0-fast-generate-001	Veo 3 Fast	$0.10/sec (720p, 1080p)
veo-2.0-generate-001	Veo 2	$0.50/sec (720p)
text-embedding-005	Gemini Embedding	$0.00015/1K tokens (online), $0.00012/1K (batch)
text-embedding-004	Gemini Embedding	$0.00015/1K tokens (online), $0.00012/1K (batch)
text-multilingual-embedding-002	Embeddings for Text	$0.000025/1K chars (online), $0.00002/1K (batch)
textembedding-gecko@003	Embeddings for Text	$0.000025/1K chars (online), $0.00002/1K (batch)
textembedding-gecko@002	Embeddings for Text	$0.000025/1K chars (online), $0.00002/1K (batch)
textembedding-gecko@001	Embeddings for Text	$0.000025/1K chars (online), $0.00002/1K (batch)
textembedding-gecko-multilingual@001	Embeddings for Text	$0.000025/1K chars (online), $0.00002/1K (batch)
multimodalembedding@001	Embeddings for Multimodal	Text $0.0002/1K chars, Image $0.0001/image, Video Plus $0.0020/sec

Anthropic Claude Models

Model ID	Pricing Page Section	Notes
claude-opus-4-6	Claude → Global	Input $5/1M, Output $25/1M, 5m Cache Write $6.25/1M, Cache Hit $0.5/1M, Batch 50% discount
claude-opus-4-5@20251101	Claude → Global	Input $5/1M, Output $25/1M, 5m Cache Write $6.25/1M, Cache Hit $0.5/1M, Batch 50% discount
claude-opus-4-1@20250805	Claude → Uniform pricing	Input $15/1M, Output $75/1M, 5m Cache Write $18.75/1M, Cache Hit $1.5/1M, Batch 50% discount
claude-opus-4@20250514	Claude → Uniform pricing	Input $15/1M, Output $75/1M, 5m Cache Write $18.75/1M, Cache Hit $1.5/1M, Batch 50% discount
claude-sonnet-4-6	Claude → Global	Input $3/1M, Output $15/1M, 5m Cache Write $3.75/1M, Cache Hit $0.3/1M, Batch 50% discount
claude-sonnet-4-5@20250929	Claude → Global	Input $3/1M, Output $15/1M, 5m Cache Write $3.75/1M, Cache Hit $0.3/1M, Batch 50% discount
claude-sonnet-4@20250514	Claude → Uniform pricing	Input $3/1M, Output $15/1M, 5m Cache Write $3.75/1M, Cache Hit $0.3/1M, Batch 50% discount
claude-haiku-4-5@20251001	Claude → Global	Input $1/1M, Output $5/1M, 5m Cache Write $1.25/1M, Cache Hit $0.1/1M, Batch 50% discount
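
The Claude rows above follow a consistent pattern: each 5m cache write price is 1.25× the input price, and each cache hit price is 0.1× the input price. A minimal Python sketch that checks this against a few of the table values and estimates a request cost. The `CLAUDE_PRICES` dict and both helpers are hypothetical names for illustration, not the repository's schema, and applying the batch discount to the whole total is a simplifying assumption.

```python
# Prices in USD per 1M tokens, copied from the table above.
# Tuple order: (input, output, cache_write_5m, cache_hit)
CLAUDE_PRICES = {
    "claude-opus-4-6": (5.0, 25.0, 6.25, 0.5),
    "claude-opus-4-1@20250805": (15.0, 75.0, 18.75, 1.5),
    "claude-sonnet-4-6": (3.0, 15.0, 3.75, 0.3),
    "claude-haiku-4-5@20251001": (1.0, 5.0, 1.25, 0.1),
}


def ratios_hold(prices, tol=1e-9):
    """Return True if cache write == 1.25 x input and cache hit == 0.1 x input."""
    inp, _out, write, hit = prices
    return abs(write - 1.25 * inp) < tol and abs(hit - 0.1 * inp) < tol


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate request cost in USD (hypothetical helper).

    cached_tokens are billed at the cache hit rate; batch=True halves the
    total, which assumes the 50% discount applies to every component.
    """
    inp, out, _write, hit = CLAUDE_PRICES[model]
    cost = (input_tokens * inp + output_tokens * out + cached_tokens * hit) / 1_000_000
    return cost * 0.5 if batch else cost
```

For example, `estimate_cost("claude-sonnet-4-6", 1_000_000, 1_000_000)` adds $3 of input to $15 of output.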

OpenAI Models

Model ID	Pricing Page Section	Notes
gpt-oss-120b	OpenAI's models	Input $0.09/1M, Output $0.36/1M, Batch 50% discount
gpt-oss-20b	OpenAI's models	Input $0.07/1M, Output $0.25/1M, Cache Hit $0.007/1M, Batch 50% discount

Mistral AI Models

Model ID	Pricing Page Section	Notes
mistral-ocr-25.05	Mistral AI's models	Input $0.0005/1M, Output $0.0005/1M
mistral-medium-3	Mistral AI's models	Input $0.40/1M, Output $2.00/1M
mistral-small-3.1-25.03	Mistral AI's models	Input $0.10/1M, Output $0.30/1M
codestral-2	Mistral AI's models	Input $0.30/1M, Output $0.90/1M

Meta Llama Models

Model ID	Pricing Page Section	Notes
llama-3.1-405b-instruct-maas	Meta's Llama models	Input $5.00/1M, Output $16.00/1M
llama-3.3-70b-instruct-maas	Meta's Llama models	Input $0.72/1M, Output $0.72/1M, Batch 50% discount
llama-4-scout-17b-16e-instruct-maas	Meta's Llama models	Input $0.25/1M, Output $0.70/1M, Batch 50% discount
llama-4-maverick-17b-128e-instruct-maas	Meta's Llama models	Input $0.35/1M, Output $1.15/1M, Batch 50% discount

AI21 Labs Models

Model ID	Pricing Page Section	Notes
jamba-1.5-large	AI21 Lab's models	Input $2/1M, Output $8/1M (Deprecated)
jamba-1.5-mini	AI21 Lab's models	Input $0.20/1M, Output $0.40/1M (Deprecated)

Qwen Models

Model ID	Pricing Page Section	Notes
qwen3-next-80b-thinking	Qwen's models	Input $0.15/1M, Output $1.20/1M
qwen3-next-80b-instruct	Qwen's models	Input $0.15/1M, Output $1.20/1M
qwen3-coder-480b-a35b-instruct	Qwen's models	Input $0.22/1M, Output $1.80/1M, Cache Hit $0.022/1M, Batch 50% discount
qwen3-235b-a22b-instruct-2507	Qwen's models	Input $0.22/1M, Output $0.88/1M, Batch 50% discount

Additional Partner Models

Model ID	Pricing Page Section	Notes
deepseek-v3.1	Deepseek's models	Input $0.60/1M, Output $1.70/1M, Cache Hit $0.06/1M, Batch 50% discount
deepseek-v3.2	Deepseek's models	Input $0.56/1M, Output $1.68/1M, Cache Hit $0.056/1M, Batch 50% discount
deepseek-r1-0528	Deepseek's models	Input $1.35/1M, Output $5.40/1M, Batch 50% discount
deepseek-ocr	Deepseek's models	Input $0.30/1M, Output $1.20/1M
minimax-m2	MiniMax's models	Input $0.30/1M, Output $1.20/1M, Cache Hit $0.03/1M
kimi-k2-thinking	Moonshot's models	Input $0.60/1M, Output $2.50/1M, Cache Hit $0.06/1M
glm-4.7	GLM's models	Input $0.60/1M, Output $2.20/1M
glm-5	GLM's models	Input $1/1M, Output $3.2/1M, Cache Hit $0.1/1M (Free until Feb 19, 2026)

🔍 Key Pricing Features

Web Search / Google Search: Converted from per 1,000 searches to cents per search (e.g., $14/1K → 1.4¢, $35/1K → 3.5¢)
Gemini Cache: Cache read only (no cache write for Gemini on Vertex)
Claude Cache: 5m Cache Write pricing is used when both 5m and 1h options are available
Batch API: 50% discount applied where available
Image Output: For Gemini models with image generation, stored as image_token in additional_units
Imagen: Per-image pricing stored in image_pricing structure
Veo: Per-second pricing for video generation with default duration/sample count
Embeddings: Per 1,000 tokens or per 1,000 characters (not per million), following the pricing page's unit for each text embedding model
Model IDs: Used exact Vertex API model IDs, including version suffixes (@yyyymmdd) where applicable
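
The unit conversions listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical and do not come from the pricing agent, and `veo_cost` takes duration and sample count as arguments because the defaults are not stated here.

```python
def cents_per_search(usd_per_1k_searches: float) -> float:
    """Convert USD per 1,000 searches to cents per single search."""
    # (usd / 1000 searches) * 100 cents per dollar == usd / 10
    return usd_per_1k_searches / 10


def per_1k_to_per_1m(usd_per_1k: float) -> float:
    """Rescale a per-1,000-unit price to a per-1,000,000-unit price."""
    return usd_per_1k * 1000


def batch_price(usd: float, discount: float = 0.5) -> float:
    """Apply the batch discount (50% by default) to a list price."""
    return usd * (1 - discount)


def veo_cost(usd_per_second: float, seconds: float, samples: int = 1) -> float:
    """Per-second video pricing: rate x clip duration x number of samples."""
    return usd_per_second * seconds * samples
```

With these, `cents_per_search(14)` and `cents_per_search(35)` reproduce the 1.4¢ and 3.5¢ figures used in the tables, and `batch_price(2.5)` gives the discounted gemini-2.5-flash output rate of $1.25/1M.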

📊 Coverage Summary

Google Models: 35 models (Gemini 3.x, 2.5, 2.0, 1.5, 1.0, Imagen, Veo, Embeddings)
Anthropic Claude: 8 models (Opus, Sonnet, Haiku families)
OpenAI: 2 models (gpt-oss variants)
Meta Llama: 4 models (Llama 3.x, 4 Scout, 4 Maverick)
Mistral AI: 4 models (OCR, Medium, Small, Codestral)
AI21: 2 models (Jamba variants)
Qwen: 4 models (Next-80B, Coder, Instruct variants)
Other Partners: 8 models (Deepseek, MiniMax, Moonshot, GLM)

Total: 67 models across all publishers
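
As a quick consistency check, the per-publisher counts above can be tallied against the stated total. A short Python sketch; the `COVERAGE` dict is a hypothetical structure for illustration. The summary lists 30 added and 25 updated models, so the last line computes how many of the 67 are in neither group (presumably unchanged by this PR, though the description does not say so).

```python
# Model counts per publisher, copied from the coverage summary above.
COVERAGE = {
    "Google": 35,
    "Anthropic Claude": 8,
    "OpenAI": 2,
    "Meta Llama": 4,
    "Mistral AI": 4,
    "AI21": 2,
    "Qwen": 4,
    "Other Partners": 8,
}

total = sum(COVERAGE.values())

# Counts from the summary table at the top of the PR description.
added, updated = 30, 25
neither_added_nor_updated = total - (added + updated)
```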

Generated by Pricing Agent on 2026-03-02 (update_mode: full)

chore(pricing): Update vertex-ai pricing

6b01766

sivadurga-d wants to merge 1 commit into main from pricing-update/vertex-ai-20260302103918-flves9
