
chore(pricing): Update vertex-ai pricing #185

Open
sivadurga-d wants to merge 1 commit into main from
pricing-update/vertex-ai-20260302103918-flves9


🔄 Pricing Update: vertex-ai

📊 Summary

| Change Type | Count |
|---|---|
| ➕ Models added | 30 |
| 🔄 Prices updated | 25 |

➕ New Models

  • gemini-3.1-flash-image-preview
  • gemini-2.5-flash-live-api
  • gemini-2.0-flash-image-generation
  • gemini-2.0-flash-live-api
  • gemini-1.5-flash
  • gemini-1.5-pro
  • textembedding-gecko@002
  • gpt-oss-120b
  • gpt-oss-20b
  • mistral-ocr-25.05
  • mistral-medium-3
  • mistral-small-3.1-25.03
  • codestral-2
  • llama-3.3-70b-instruct-maas
  • llama-4-scout-17b-16e-instruct-maas
  • llama-4-maverick-17b-128e-instruct-maas
  • jamba-1.5-large
  • jamba-1.5-mini
  • qwen3-next-80b-thinking
  • qwen3-next-80b-instruct
  • ... and 10 more

🔄 Updated Models (any field change)

  • gemini-3.1-pro-preview
  • gemini-3-pro-preview
  • gemini-3-flash-preview
  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.0-flash
  • gemini-2.0-flash-lite
  • gemini-1.0-pro
  • imagen-4.0-ultra-generate-001
  • imagen-4.0-generate-001
  • imagen-4.0-fast-generate-001
  • imagen-3.0-generate-001
  • imagen-3.0-fast-generate-001
  • imagegeneration@006
  • imagegeneration@002
  • veo-3.1-fast-generate-001
  • text-embedding-005
  • text-embedding-004
  • text-multilingual-embedding-002
  • textembedding-gecko@003
  • textembedding-gecko@001
  • textembedding-gecko-multilingual@001
  • multimodalembedding@001
  • llama-3.1-405b-instruct-maas

📋 Model → Pricing Page Mapping

Google Models (Gemini, Imagen, Veo, Embedding)

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| gemini-3.1-pro-preview | Gemini 3 → Standard | Input $2/1M, Output $12/1M, Cache read $0.2/1M, Image output $120/1M tokens, Web search $14/1K searches (1.4¢) |
| gemini-3.1-flash-image-preview | Gemini 3 → Standard | Input $0.5/1M, Output $3/1M, Image output $60/1M tokens |
| gemini-3-pro-preview | Gemini 3 → Standard | Input $2/1M, Output $12/1M, Cache read $0.2/1M, Image output $120/1M tokens, Web search $14/1K searches (1.4¢) |
| gemini-3-flash-preview | Gemini 3 → Standard | Input $0.5/1M, Output $3/1M, Cache read $0.05/1M, Web search $14/1K searches (1.4¢) |
| gemini-2.5-pro | Gemini 2.5 → Standard | Input $1.25/1M, Output $10/1M, Cache read $0.125/1M, Web search $35/1K grounded prompts (3.5¢) |
| gemini-2.5-flash | Gemini 2.5 → Standard | Input $0.3/1M, Output $2.5/1M, Cache read $0.03/1M, Batch 50% discount, Image output $30/1M tokens, Web search $35/1K (3.5¢) |
| gemini-2.5-flash-live-api | Gemini 2.5 → Standard | Input text $0.5/1M, Output text $2/1M, Input audio $3/1M, Output audio $12/1M, Web search $35/1K (3.5¢) |
| gemini-2.5-flash-lite | Gemini 2.5 → Standard | Input $0.1/1M, Output $0.4/1M, Cache read $0.01/1M, Batch 50% discount, Web search $35/1K (3.5¢) |
| gemini-2.0-flash | Gemini 2.0 → Token-based | Input $0.15/1M, Output $0.6/1M, Input audio $1/1M, Batch 50% discount, Web search $35/1K (3.5¢) |
| gemini-2.0-flash-image-generation | Gemini 2.0 → Token-based | Input $0.15/1M, Output text $0.6/1M, Input audio $1/1M, Image output $30/1M tokens |
| gemini-2.0-flash-live-api | Gemini 2.0 → Token-based | Input text $0.5/1M, Output text $2/1M, Input audio $3/1M, Output audio $12/1M, Web search $35/1K (3.5¢) |
| gemini-2.0-flash-lite | Gemini 2.0 → Token-based | Input $0.075/1M, Output $0.3/1M, Input audio $0.075/1M, Batch 50% discount, Web search $35/1K (3.5¢) |
| gemini-1.5-flash | Other Gemini models | Input $0.01875/1K chars, Output $0.075/1K chars (≤128K context) |
| gemini-1.5-pro | Other Gemini models | Input $0.3125/1K chars, Output $1.25/1K chars (≤128K context) |
| gemini-1.0-pro | Other Gemini models | Input $0.125/1K chars, Output $0.375/1K chars |
| imagen-4.0-ultra-generate-001 | Imagen | $0.06/image |
| imagen-4.0-generate-001 | Imagen | $0.04/image |
| imagen-4.0-fast-generate-001 | Imagen | $0.02/image |
| imagen-3.0-generate-001 | Imagen | $0.04/image |
| imagen-3.0-fast-generate-001 | Imagen | $0.02/image |
| imagegeneration@006 | Imagen 2, Imagen 1 | $0.020/image |
| imagegeneration@002 | Imagen 2, Imagen 1 | $0.020/image |
| veo-3.1-generate-001 | Veo 3.1 | $0.20/sec (720p, 1080p) |
| veo-3.1-fast-generate-001 | Veo 3.1 Fast | $0.10/sec (720p, 1080p) |
| veo-3.0-generate-001 | Veo 3 | $0.20/sec (720p, 1080p) |
| veo-3.0-fast-generate-001 | Veo 3 Fast | $0.10/sec (720p, 1080p) |
| veo-2.0-generate-001 | Veo 2 | $0.50/sec (720p) |
| text-embedding-005 | Gemini Embedding | $0.00015/1K tokens (online), $0.00012/1K (batch) |
| text-embedding-004 | Gemini Embedding | $0.00015/1K tokens (online), $0.00012/1K (batch) |
| text-multilingual-embedding-002 | Embeddings for Text | $0.000025/1K chars (online), $0.00002/1K (batch) |
| textembedding-gecko@003 | Embeddings for Text | $0.000025/1K chars (online), $0.00002/1K (batch) |
| textembedding-gecko@002 | Embeddings for Text | $0.000025/1K chars (online), $0.00002/1K (batch) |
| textembedding-gecko@001 | Embeddings for Text | $0.000025/1K chars (online), $0.00002/1K (batch) |
| textembedding-gecko-multilingual@001 | Embeddings for Text | $0.000025/1K chars (online), $0.00002/1K (batch) |
| multimodalembedding@001 | Embeddings for Multimodal | Text $0.0002/1K chars, Image $0.0001/image, Video Plus $0.0020/sec |
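
As an illustration of how the per-1M-token rates above translate into a request cost, here is a minimal sketch using the gemini-2.5-pro row. The helper name `estimate_cost` and the billing rule (cached tokens charged at the cache-read rate instead of the input rate) are assumptions made for this example, not something this PR defines.

```python
from decimal import Decimal

# Prices copied from the gemini-2.5-pro row, in USD per 1M tokens.
GEMINI_25_PRO = {
    "input": Decimal("1.25"),
    "output": Decimal("10"),
    "cache_read": Decimal("0.125"),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(prices, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached tokens are billed at the cache-read rate and are
    not also counted as regular input tokens.
    """
    total = (
        prices["input"] * input_tokens
        + prices["output"] * output_tokens
        + prices["cache_read"] * cached_tokens
    )
    return total / TOKENS_PER_UNIT


# 200K fresh input tokens, 10K output tokens, 100K cached tokens.
cost = estimate_cost(
    GEMINI_25_PRO,
    input_tokens=200_000,
    output_tokens=10_000,
    cached_tokens=100_000,
)
print(cost)
```

Decimal is used here only to avoid floating-point rounding in small per-token amounts; the pricing store itself may use a different numeric type.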

Anthropic Claude Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| claude-opus-4-6 | Claude → Global | Input $5/1M, Output $25/1M, 5m Cache Write $6.25/1M, Cache Hit $0.5/1M, Batch 50% discount |
| claude-opus-4-5@20251101 | Claude → Global | Input $5/1M, Output $25/1M, 5m Cache Write $6.25/1M, Cache Hit $0.5/1M, Batch 50% discount |
| claude-opus-4-1@20250805 | Claude → Uniform pricing | Input $15/1M, Output $75/1M, 5m Cache Write $18.75/1M, Cache Hit $1.5/1M, Batch 50% discount |
| claude-opus-4@20250514 | Claude → Uniform pricing | Input $15/1M, Output $75/1M, 5m Cache Write $18.75/1M, Cache Hit $1.5/1M, Batch 50% discount |
| claude-sonnet-4-6 | Claude → Global | Input $3/1M, Output $15/1M, 5m Cache Write $3.75/1M, Cache Hit $0.3/1M, Batch 50% discount |
| claude-sonnet-4-5@20250929 | Claude → Global | Input $3/1M, Output $15/1M, 5m Cache Write $3.75/1M, Cache Hit $0.3/1M, Batch 50% discount |
| claude-sonnet-4@20250514 | Claude → Uniform pricing | Input $3/1M, Output $15/1M, 5m Cache Write $3.75/1M, Cache Hit $0.3/1M, Batch 50% discount |
| claude-haiku-4-5@20251001 | Claude → Global | Input $1/1M, Output $5/1M, 5m Cache Write $1.25/1M, Cache Hit $0.1/1M, Batch 50% discount |
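
The Claude rows above follow a regular pattern that is handy as a review check: in every listed row, the 5m cache write price is 1.25× the input price and the cache hit price is 0.1× the input price. The sketch below verifies this for one row per distinct price point; the dictionary layout and the `cache_ratios` helper are hypothetical and do not reflect the repository's actual pricing schema.

```python
from decimal import Decimal

# (input, 5m cache write, cache hit) in USD per 1M tokens, copied from the table.
CLAUDE_PRICES = {
    "claude-opus-4-6": ("5", "6.25", "0.5"),
    "claude-opus-4-1@20250805": ("15", "18.75", "1.5"),
    "claude-sonnet-4-6": ("3", "3.75", "0.3"),
    "claude-haiku-4-5@20251001": ("1", "1.25", "0.1"),
}


def cache_ratios(row):
    """Return (cache write / input, cache hit / input) for one price row."""
    input_price, cache_write, cache_hit = (Decimal(value) for value in row)
    return cache_write / input_price, cache_hit / input_price


ratios = {model: cache_ratios(row) for model, row in CLAUDE_PRICES.items()}

for model, (write_ratio, hit_ratio) in ratios.items():
    print(f"{model}: write x{write_ratio}, hit x{hit_ratio}")
```

A row that breaks this pattern in a future update is not necessarily wrong, but it is a reasonable signal to re-check against the pricing page.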

OpenAI Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| gpt-oss-120b | OpenAI's models | Input $0.09/1M, Output $0.36/1M, Batch 50% discount |
| gpt-oss-20b | OpenAI's models | Input $0.07/1M, Output $0.25/1M, Cache Hit $0.007/1M, Batch 50% discount |

Mistral AI Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| mistral-ocr-25.05 | Mistral AI's models | Input $0.0005/1M, Output $0.0005/1M |
| mistral-medium-3 | Mistral AI's models | Input $0.40/1M, Output $2.00/1M |
| mistral-small-3.1-25.03 | Mistral AI's models | Input $0.10/1M, Output $0.30/1M |
| codestral-2 | Mistral AI's models | Input $0.30/1M, Output $0.90/1M |

Meta Llama Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| llama-3.1-405b-instruct-maas | Meta's Llama models | Input $5.00/1M, Output $16.00/1M |
| llama-3.3-70b-instruct-maas | Meta's Llama models | Input $0.72/1M, Output $0.72/1M, Batch 50% discount |
| llama-4-scout-17b-16e-instruct-maas | Meta's Llama models | Input $0.25/1M, Output $0.70/1M, Batch 50% discount |
| llama-4-maverick-17b-128e-instruct-maas | Meta's Llama models | Input $0.35/1M, Output $1.15/1M, Batch 50% discount |

AI21 Lab Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| jamba-1.5-large | AI21 Lab's models | Input $2/1M, Output $8/1M (Deprecated) |
| jamba-1.5-mini | AI21 Lab's models | Input $0.20/1M, Output $0.40/1M (Deprecated) |

Qwen Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| qwen3-next-80b-thinking | Qwen's models | Input $0.15/1M, Output $1.20/1M |
| qwen3-next-80b-instruct | Qwen's models | Input $0.15/1M, Output $1.20/1M |
| qwen3-coder-480b-a35b-instruct | Qwen's models | Input $0.22/1M, Output $1.80/1M, Cache Hit $0.022/1M, Batch 50% discount |
| qwen3-235b-a22b-instruct-2507 | Qwen's models | Input $0.22/1M, Output $0.88/1M, Batch 50% discount |

Additional Partner Models

| Model ID | Pricing Page Section | Notes |
|---|---|---|
| deepseek-v3.1 | Deepseek's models | Input $0.60/1M, Output $1.70/1M, Cache Hit $0.06/1M, Batch 50% discount |
| deepseek-v3.2 | Deepseek's models | Input $0.56/1M, Output $1.68/1M, Cache Hit $0.056/1M, Batch 50% discount |
| deepseek-r1-0528 | Deepseek's models | Input $1.35/1M, Output $5.40/1M, Batch 50% discount |
| deepseek-ocr | Deepseek's models | Input $0.30/1M, Output $1.20/1M |
| minimax-m2 | MiniMax's models | Input $0.30/1M, Output $1.20/1M, Cache Hit $0.03/1M |
| kimi-k2-thinking | Moonshot's models | Input $0.60/1M, Output $2.50/1M, Cache Hit $0.06/1M |
| glm-4.7 | GLM's models | Input $0.60/1M, Output $2.20/1M |
| glm-5 | GLM's models | Input $1/1M, Output $3.2/1M, Cache Hit $0.1/1M (Free until Feb 19, 2026) |

🔍 Key Pricing Features

  • Web Search / Google Search: Converted from per 1,000 searches to cents per search (e.g., $14/1K → 1.4¢, $35/1K → 3.5¢)
  • Gemini Cache: Cache read only (no cache write for Gemini on Vertex)
  • Claude Cache: 5m Cache Write pricing is used when both 5m and 1h options are available
  • Batch API: 50% discount applied where available
  • Image Output: For Gemini models with image generation, stored as image_token in additional_units
  • Imagen: Per-image pricing stored in image_pricing structure
  • Veo: Per-second pricing for video generation with default duration/sample count
  • Embeddings: Priced per 1,000 tokens or per 1,000 characters (not per million), depending on the text embedding model
  • Model IDs: Used exact Vertex API model IDs, including version suffixes (@yyyymmdd) where applicable
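
The unit conversions described in the list above can be sketched as follows. All three function names are hypothetical and exist only to show the arithmetic; the sample inputs come from the tables in this PR (web search at $14/1K and $35/1K, text-embedding-005 at $0.00015/1K tokens, gemini-2.5-flash input at $0.3/1M with a 50% batch discount).

```python
from decimal import Decimal


def per_thousand_to_cents_each(usd_per_1k):
    """Convert USD per 1,000 searches to cents per single search."""
    return Decimal(usd_per_1k) / 1000 * 100


def per_1k_to_per_1m(usd_per_1k):
    """Convert a USD price per 1,000 units to USD per 1,000,000 units."""
    return Decimal(usd_per_1k) * 1000


def batch_price(usd_price, discount=Decimal("0.5")):
    """Apply the batch discount (50% by default) to a standard price."""
    return Decimal(usd_price) * (1 - discount)


print(per_thousand_to_cents_each("14"))   # web search, Gemini 3 rows
print(per_thousand_to_cents_each("35"))   # web search, Gemini 2.x rows
print(per_1k_to_per_1m("0.00015"))        # text-embedding-005, online
print(batch_price("0.3"))                 # gemini-2.5-flash input, batch
```

Prices are passed as strings so that Decimal parses them exactly rather than inheriting binary floating-point error.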

📊 Coverage Summary

  • Google Models: 35 models (Gemini 3.x, 2.5, 2.0, 1.5, 1.0, Imagen, Veo, Embeddings)
  • Anthropic Claude: 8 models (Opus, Sonnet, Haiku families)
  • OpenAI: 2 models (gpt-oss variants)
  • Meta Llama: 4 models (Llama 3.x, 4 Scout, 4 Maverick)
  • Mistral AI: 4 models (OCR, Medium, Small, Codestral)
  • AI21: 2 models (Jamba variants)
  • Qwen: 4 models (Next-80B, Coder, Instruct variants)
  • Other Partners: 8 models (Deepseek, MiniMax, Moonshot, GLM)

Total: 67 models across all publishers
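
A quick way to sanity-check the total against the per-publisher counts listed above is to sum them. The dictionary below simply restates the Coverage Summary; its name and layout are illustrative only.

```python
# Per-publisher model counts, copied from the Coverage Summary.
COVERAGE = {
    "Google Models": 35,
    "Anthropic Claude": 8,
    "OpenAI": 2,
    "Meta Llama": 4,
    "Mistral AI": 4,
    "AI21": 2,
    "Qwen": 4,
    "Other Partners": 8,
}

total = sum(COVERAGE.values())
print(f"Total: {total} models across {len(COVERAGE)} publisher groups")
```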


Generated by Pricing Agent on 2026-03-02 (update_mode: full)
