chore(pricing): Update vertex-ai pricing by sivadurga-d · Pull Request #182 · Portkey-AI/models

sivadurga-d · 2026-03-01T18:56:20Z

🔄 Pricing Update: vertex-ai

📊 Summary

Change Type	Count
➕ Models added	30
🔄 Prices updated	22

➕ New Models

gemini-3.1-flash-image-preview
gemini-2.5-pro-computer-use-preview
gemini-2.5-flash-live-api
gemini-2.0-flash-image-generation
gemini-2.0-flash-live-api
gemini-1.5-flash
gemini-1.5-pro
multilingual-e5-small
multilingual-e5-large
gpt-oss-120b
gpt-oss-20b
llama-3.3-70b-instruct-maas
llama-4-scout-17b-16e-instruct-maas
llama-4-maverick-34b-16e-instruct-maas
mistral-ocr-25-05
mistral-medium-3
mistral-small-3.1-25-03
codestral-2
qwen3-next-80b-thinking
qwen3-next-80b-instruct
... and 10 more

🔄 Updated Models (any field change)

gemini-3.1-pro-preview
gemini-3-pro-preview
gemini-3-flash-preview
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash
gemini-2.0-flash-lite
gemini-1.0-pro
imagen-4.0-ultra-generate-001
imagen-4.0-generate-001
imagen-4.0-fast-generate-001
imagen-3.0-generate-001
imagen-3.0-fast-generate-001
veo-3.1-generate-001
veo-3.1-fast-generate-001
veo-3.0-generate-001
veo-3.0-fast-generate-001
text-embedding-004
textembedding-gecko@003
multimodalembedding@001
llama-3.1-405b-instruct-maas

📋 Model → Pricing Page Mapping

Google Models (Gemini, Imagen, Veo, Embeddings)

Model ID	Pricing Page Section	Notes
gemini-3.1-pro-preview	Gemini 3 > Standard	Input/output tokens + cache read + web search (1.4¢) + image output tokens
gemini-3.1-flash-image-preview	Gemini 3 > Standard	Input/output tokens + image output tokens (60¢/1M)
gemini-3-pro-preview	Gemini 3 > Standard	Input/output tokens + cache read + web search + image output tokens
gemini-3-flash-preview	Gemini 3 > Standard	Input/output/audio tokens + cache read + web search
gemini-2.5-pro	Gemini 2.5 > Standard	Input/output tokens + cache read + web search (3.5¢)
gemini-2.5-pro-computer-use-preview	Gemini 2.5 > Standard	Input/output tokens only
gemini-2.5-flash	Gemini 2.5 > Standard	Input/output tokens + cache read + web search + image output tokens
gemini-2.5-flash-live-api	Gemini 2.5 > Standard	Text/audio/image input tokens + text/audio output tokens
gemini-2.5-flash-lite	Gemini 2.5 > Standard	Input/output tokens + cache read + web search
gemini-2.0-flash	Gemini 2.0 > Standard	Input/output tokens + audio input + web search + batch API
gemini-2.0-flash-image-generation	Gemini 2.0 > Standard	Input/output tokens + audio/image input + image output tokens
gemini-2.0-flash-live-api	Gemini 2.0 > Standard	Text/audio/image input + text/audio output tokens
gemini-2.0-flash-lite	Gemini 2.0 > Standard	Input/output tokens + audio input + batch API
gemini-1.5-flash	Other Gemini models	Character-based pricing (converted to per_thousand_tokens)
gemini-1.5-pro	Other Gemini models	Character-based pricing (converted to per_thousand_tokens)
gemini-1.0-pro	Other Gemini models	Character-based pricing (converted to per_thousand_tokens)
imagen-4.0-ultra-generate-001	Imagen > Imagen 4 Ultra	$0.06 per image
imagen-4.0-generate-001	Imagen > Imagen 4	$0.04 per image
imagen-4.0-fast-generate-001	Imagen > Imagen 4 Fast	$0.02 per image
imagen-3.0-generate-001	Imagen > Imagen 3	$0.04 per image
imagen-3.0-fast-generate-001	Imagen > Imagen 3 Fast	$0.02 per image
veo-3.1-generate-001	Veo > Veo 3.1	20¢/sec video, 40¢/sec video+audio
veo-3.1-fast-generate-001	Veo > Veo 3.1 Fast	10¢/sec video, 15¢/sec video+audio
veo-3.0-generate-001	Veo > Veo 3	20¢/sec video, 40¢/sec video+audio
veo-3.0-fast-generate-001	Veo > Veo 3 Fast	10¢/sec video, 15¢/sec video+audio
veo-2.0-generate-001	Veo > Veo 2	50¢/sec video
text-embedding-004	Embedding models	$0.00015 per 1K tokens
gemini-embedding-001	Embedding models	$0.00015 per 1K tokens (with batch pricing)
textembedding-gecko@003	Embedding models	$0.000025 per 1K tokens (with batch pricing)
multimodalembedding@001	Embedding models	$0.0002 per 1K tokens + image/video pricing
multilingual-e5-small	Embedding models > Open Source	$0.000015 per 1K tokens (with batch pricing)
multilingual-e5-large	Embedding models > Open Source	$0.000025 per 1K tokens (with batch pricing)
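The per-image, per-second, and per-1K-token units in this table can be turned into request costs with simple multiplication. The sketch below is minimal and assumes that a bill is just unit price × quantity, with no minimums, rounding, or tiering. The dictionaries and helper names are hypothetical, not part of the repository. multimodalembedding@001 is left out because its image and video components are not itemized here.

```python
from decimal import Decimal

# Imagen: dollars per generated image (from the table above).
IMAGEN_USD_PER_IMAGE = {
    "imagen-4.0-ultra-generate-001": Decimal("0.06"),
    "imagen-4.0-generate-001": Decimal("0.04"),
    "imagen-4.0-fast-generate-001": Decimal("0.02"),
    "imagen-3.0-generate-001": Decimal("0.04"),
    "imagen-3.0-fast-generate-001": Decimal("0.02"),
}

# Veo: cents per second as (video only, video + audio).
# None means the table lists no video+audio rate for that model.
VEO_CENTS_PER_SECOND = {
    "veo-3.1-generate-001": (Decimal("20"), Decimal("40")),
    "veo-3.1-fast-generate-001": (Decimal("10"), Decimal("15")),
    "veo-3.0-generate-001": (Decimal("20"), Decimal("40")),
    "veo-3.0-fast-generate-001": (Decimal("10"), Decimal("15")),
    "veo-2.0-generate-001": (Decimal("50"), None),
}

# Embeddings: dollars per 1,000 input tokens (not per 1M).
EMBEDDING_USD_PER_1K_TOKENS = {
    "text-embedding-004": Decimal("0.00015"),
    "gemini-embedding-001": Decimal("0.00015"),
    "textembedding-gecko@003": Decimal("0.000025"),
    "multilingual-e5-small": Decimal("0.000015"),
    "multilingual-e5-large": Decimal("0.000025"),
}


def imagen_cost(model: str, images: int) -> Decimal:
    """Dollar cost of generating `images` images."""
    return IMAGEN_USD_PER_IMAGE[model] * images


def veo_cost(model: str, seconds: int, with_audio: bool = False) -> Decimal:
    """Dollar cost of `seconds` of generated video (cents converted to dollars)."""
    video_only, video_audio = VEO_CENTS_PER_SECOND[model]
    rate = video_audio if with_audio else video_only
    if rate is None:
        raise ValueError(f"{model} has no video+audio rate in the table")
    return rate * seconds / 100


def embedding_cost(model: str, tokens: int) -> Decimal:
    """Dollar cost of embedding `tokens` input tokens."""
    return EMBEDDING_USD_PER_1K_TOKENS[model] * tokens / 1000
```

For example, under these assumptions an 8-second veo-3.1-generate-001 clip with audio works out to 8 × 40¢ = $3.20. Decimal is used instead of float so the small per-1K embedding prices do not pick up rounding noise.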

Anthropic Models (Claude)

Model ID	Pricing Page Section	Notes
claude-opus-4-6	Anthropic's Claude models > Global	Input: $5, Output: $25, 5m Cache Write: $6.25, Cache Hit: $0.5 (with batch pricing)
claude-opus-4-5@20251101	Anthropic's Claude models > Global	Input: $5, Output: $25, 5m Cache Write: $6.25, Cache Hit: $0.5 (with batch pricing)
claude-opus-4-1@20250805	Anthropic's Claude models > Uniform	Input: $15, Output: $75, 5m Cache Write: $18.75, Cache Hit: $1.5 (with batch pricing)
claude-opus-4@20250514	Anthropic's Claude models > Uniform	Input: $15, Output: $75, 5m Cache Write: $18.75, Cache Hit: $1.5 (with batch pricing)
claude-sonnet-4-6	Anthropic's Claude models > Global	Input: $3, Output: $15, 5m Cache Write: $3.75, Cache Hit: $0.3 (with batch pricing)
claude-sonnet-4-5@20250929	Anthropic's Claude models > Global	Input: $3, Output: $15, 5m Cache Write: $3.75, Cache Hit: $0.3 (with batch pricing)
claude-sonnet-4@20250514	Anthropic's Claude models > Uniform	Input: $3, Output: $15, 5m Cache Write: $3.75, Cache Hit: $0.3 (with batch pricing)
claude-haiku-4-5@20251001	Anthropic's Claude models > Global	Input: $1, Output: $5, 5m Cache Write: $1.25, Cache Hit: $0.1 (with batch pricing)

Model IDs source: Claude on Vertex AI documentation. Canonical Vertex API model IDs were taken from its official table.
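The Cache Write and Cache Hit columns above follow fixed ratios of the input price: the 5m cache write is 1.25× input and a cache hit is 0.1× input. The sketch below derives both values and prices a single request. It assumes all prices are per 1M tokens, as stated in the processing notes, and that uncached input, cache-write, and cache-read tokens are billed separately. `cache_prices` and `request_cost` are hypothetical helpers, and batch pricing is not modeled.

```python
from decimal import Decimal

# (input, output) in dollars per 1M tokens, from the Claude table above.
CLAUDE_BASE = {
    "claude-opus-4-6": (Decimal("5"), Decimal("25")),
    "claude-opus-4-5@20251101": (Decimal("5"), Decimal("25")),
    "claude-opus-4-1@20250805": (Decimal("15"), Decimal("75")),
    "claude-opus-4@20250514": (Decimal("15"), Decimal("75")),
    "claude-sonnet-4-6": (Decimal("3"), Decimal("15")),
    "claude-sonnet-4-5@20250929": (Decimal("3"), Decimal("15")),
    "claude-sonnet-4@20250514": (Decimal("3"), Decimal("15")),
    "claude-haiku-4-5@20251001": (Decimal("1"), Decimal("5")),
}

# Ratios observed in the table: 5m cache write = 1.25x input, cache hit = 0.1x input.
CACHE_WRITE_5M_MULTIPLIER = Decimal("1.25")
CACHE_HIT_MULTIPLIER = Decimal("0.1")
PER_MILLION = Decimal(1_000_000)


def cache_prices(model: str) -> tuple[Decimal, Decimal]:
    """Return (5m cache write, cache hit) prices per 1M tokens derived from the input price."""
    input_price, _ = CLAUDE_BASE[model]
    return input_price * CACHE_WRITE_5M_MULTIPLIER, input_price * CACHE_HIT_MULTIPLIER


def request_cost(
    model: str,
    uncached_input_tokens: int,
    output_tokens: int,
    cache_write_tokens: int = 0,
    cache_read_tokens: int = 0,
) -> Decimal:
    """Dollar cost of one request; each token bucket is billed at its own rate."""
    input_price, output_price = CLAUDE_BASE[model]
    write_price, hit_price = cache_prices(model)
    total = (
        uncached_input_tokens * input_price
        + output_tokens * output_price
        + cache_write_tokens * write_price
        + cache_read_tokens * hit_price
    )
    return total / PER_MILLION
```

As an illustration, a claude-sonnet-4-6 request with 10,000 uncached input tokens, 100,000 cache-read tokens, and 2,000 output tokens comes to $0.03 + $0.03 + $0.03 = $0.09 under these assumptions.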

OpenAI Models

Model ID	Pricing Page Section	Notes
gpt-oss-120b	OpenAI's models	Input: $0.09, Output: $0.36 (with batch pricing)
gpt-oss-20b	OpenAI's models	Input: $0.07, Output: $0.25, Cache Hit: $0.007 (with batch pricing)

Partner Models

Model ID	Pricing Page Section	Notes
llama-3.1-405b-instruct-maas	Meta's Llama models	Input: $5, Output: $16
llama-3.3-70b-instruct-maas	Meta's Llama models	Input: $0.72, Output: $0.72 (with batch pricing)
llama-4-scout-17b-16e-instruct-maas	Meta's Llama models	Input: $0.25, Output: $0.70 (with batch pricing)
llama-4-maverick-34b-16e-instruct-maas	Meta's Llama models	Input: $0.35, Output: $1.15 (with batch pricing)
mistral-ocr-25-05	Mistral AI's models	Input/Output: $0.0005 per 1M tokens (or $0.0005/page)
mistral-medium-3	Mistral AI's models	Input: $0.40, Output: $2.00
mistral-small-3.1-25-03	Mistral AI's models	Input: $0.10, Output: $0.30
codestral-2	Mistral AI's models	Input: $0.30, Output: $0.90
qwen3-next-80b-thinking	Qwen's models	Input: $0.15, Output: $1.20
qwen3-next-80b-instruct	Qwen's models	Input: $0.15, Output: $1.20
qwen3-coder-480b-a35b-instruct	Qwen's models	Input: $0.22, Output: $1.80, Cache Hit: $0.022 (with batch pricing)
qwen3-235b-a22b-instruct-2507	Qwen's models	Input: $0.22, Output: $0.88 (with batch pricing)
deepseek-v3.1	Deepseek's models	Input: $0.60, Output: $1.70, Cache Hit: $0.06 (with batch pricing)
deepseek-v3.2	Deepseek's models	Input: $0.56, Output: $1.68, Cache Hit: $0.056 (with batch pricing)
deepseek-r1-0528	Deepseek's models	Input: $1.35, Output: $5.40 (with batch pricing)
deepseek-ocr	Deepseek's models	Input: $0.30, Output: $1.20 (or $0.0003/page, $0.00012/page)
minimax-m2	MiniMax's models	Input: $0.30, Output: $1.20, Cache Hit: $0.03
kimi-k2-thinking	Moonshot's models	Input: $0.60, Output: $2.50, Cache Hit: $0.06
glm-4.7	GLM's models	Input: $0.60, Output: $2.20
glm-5	GLM's models	Input: $1.00, Output: $3.20, Cache Hit: $0.10
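Several partner models list a separate Cache Hit price, which charges the cached portion of the prompt at a lower rate. The sketch below uses a few rows from the table and assumes prices are per 1M tokens, with cached tokens counted as a subset of the input tokens. `PARTNER_PRICES` and `token_cost` are hypothetical names, and batch discounts are not applied.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# (input, output, cache hit) in dollars per 1M tokens, from the Partner Models table.
# None means the table lists no cache-hit price for that model.
PARTNER_PRICES = {
    "llama-3.3-70b-instruct-maas": (Decimal("0.72"), Decimal("0.72"), None),
    "qwen3-coder-480b-a35b-instruct": (Decimal("0.22"), Decimal("1.80"), Decimal("0.022")),
    "deepseek-v3.1": (Decimal("0.60"), Decimal("1.70"), Decimal("0.06")),
    "glm-4.7": (Decimal("0.60"), Decimal("2.20"), None),
    "glm-5": (Decimal("1.00"), Decimal("3.20"), Decimal("0.10")),
}


def token_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Dollar cost of a request; `cached_tokens` is the part of `input_tokens` served from cache."""
    input_price, output_price, cache_hit_price = PARTNER_PRICES[model]
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens and cache_hit_price is None:
        raise ValueError(f"{model} has no cache-hit price in the table")
    uncached = input_tokens - cached_tokens
    total = uncached * input_price + output_tokens * output_price
    if cached_tokens:
        total += cached_tokens * cache_hit_price
    return total / PER_MILLION
```

For deepseek-v3.1, 1M input tokens (400K of them cached) plus 100K output tokens give $0.36 + $0.024 + $0.17 = $0.554 under these assumptions.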

🔑 Key Pricing Details

Grounding with Google Search: $35 per 1,000 grounded prompts (with free daily allowances for some models)
Web Grounding for Enterprise: $45 per 1,000 grounded prompts
Grounding with Google Maps: $25 per 1,000 grounded prompts
Web search for Gemini 3: $14 per 1,000 search queries → converted to 1.4¢ per search
Web search for Gemini 2.x: $35 per 1,000 grounded prompts → converted to 3.5¢ per search
Cache pricing: Applied to Claude models (5-minute cache-write rate used) and to some Gemini models (cache read only)
Batch API: 50% discount applied where available
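The conversions in this list come down to simple arithmetic. The sketch below assumes the batch discount is a flat 50% taken off the listed per-unit price. The note does not say which price components the discount covers. The helper names are hypothetical.

```python
from decimal import Decimal


def cents_per_unit(dollars_per_thousand: Decimal) -> Decimal:
    """Convert a '$X per 1,000' price to cents per single search or prompt."""
    # $X / 1,000 units is $X/1000 per unit; multiplying by 100 gives cents (X/10).
    return dollars_per_thousand * 100 / 1000


def batch_price(price: Decimal, discount: Decimal = Decimal("0.5")) -> Decimal:
    """Apply a flat batch discount (50% by default) to a unit price."""
    return price * (1 - discount)
```

This reproduces the figures above: $14 per 1,000 gives 1.4¢ per search for Gemini 3, and $35 per 1,000 gives 3.5¢ for Gemini 2.x. Under the flat-discount assumption, gpt-oss-120b's $0.09 input price would become $0.045 in batch mode.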

📝 Processing Notes

Used the Global pricing tab (not regional), as specified in the skill
Web search unit conversion: Pricing page shows "$X per 1,000 searches" → converted to cents per search (e.g., $14/1000 = 1.4¢)
Embedding models: Pricing page uses per 1,000 tokens (not per 1M), so used price_unit: "per_thousand_tokens"
Claude model IDs: Used canonical Vertex API model IDs from Claude on Vertex AI documentation
No lte/gt categories: One entry per model as specified
Veo video pricing: Includes both video-only and video+audio pricing where applicable
Image output tokens for Gemini: Used additional.image_token field (not image_pricing which is for Imagen only)
All pricing in $/1M tokens except embeddings ($/1K tokens), Imagen ($/image), and Veo (cents/second)
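Embedding prices use a different unit from the other token prices, so comparing them needs a conversion first. The sketch below normalizes token prices to dollars per 1M tokens. "per_thousand_tokens" is taken from the notes above. "per_million_tokens" is a hypothetical label for the default unit and may not match the repository's schema. Per-image and per-second prices are not token-based and are rejected.

```python
from decimal import Decimal

# Multipliers that convert a listed token price to dollars per 1M tokens.
# "per_million_tokens" is a hypothetical label, not confirmed by the source.
TO_PER_MILLION = {
    "per_thousand_tokens": Decimal(1000),
    "per_million_tokens": Decimal(1),
}


def to_per_million(price: Decimal, price_unit: str) -> Decimal:
    """Normalize a token price to dollars per 1M tokens."""
    try:
        return price * TO_PER_MILLION[price_unit]
    except KeyError:
        raise ValueError(f"not a token-based unit: {price_unit}") from None
```

For example, text-embedding-004 at $0.00015 per 1K tokens normalizes to $0.15 per 1M tokens.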

Generated by Pricing Agent on 2026-03-01 (update_mode: full)

chore(pricing): Update vertex-ai pricing

ba21b50

sivadurga-d closed this Mar 2, 2026

sivadurga-d wants to merge 1 commit into main from pricing-update/vertex-ai-20260301185618-jb0i2w