
Conversation

Contributor

Copilot AI commented Feb 10, 2026

Azure OpenAI embedding generation crashes on HTTP 429 (rate limit exceeded) without any retry, even though Azure returns an explicit Retry-After header.

Changes

  • Added Microsoft.Extensions.Http.Resilience (v10.2.0) for standard retry/circuit breaker patterns
  • Configured resilience via ConfigureHttpClientDefaults - Required approach as Semantic Kernel's AddAzureOpenAI* methods don't expose named/typed HttpClient configuration
  • Resilience always enabled - Simplified implementation by removing optional parameter
  • Fixed pre-existing build error in LoggerExtensions.cs (LOGGEN036)

Configuration

Retry: 5 attempts, exponential backoff from 2s, with jitter. Handles 429/408/5xx.

Circuit Breaker: Opens at 20% failure rate over 30s window.

Retry-After: Standard resilience handler automatically respects Azure's specified delays.
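The values above can be wired through the standard resilience handler. A minimal sketch, assuming Microsoft.Extensions.Http.Resilience v10.2.0 is referenced and `services` is the app's `IServiceCollection` (the PR's actual helper method may differ):

```csharp
// Sketch: apply the standard resilience pipeline to every HttpClient,
// using the retry and circuit-breaker values from the Configuration section.
services.ConfigureHttpClientDefaults(http =>
    http.AddStandardResilienceHandler(options =>
    {
        options.Retry.MaxRetryAttempts = 5;                       // 5 attempts
        options.Retry.Delay = TimeSpan.FromSeconds(2);            // backoff base of 2s
        options.Retry.BackoffType = DelayBackoffType.Exponential;
        options.Retry.UseJitter = true;
        options.CircuitBreaker.FailureRatio = 0.2;                // opens at 20% failures
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    }));
```

The standard handler already classifies 429/408/5xx responses as transient and honors `Retry-After`, so no custom `ShouldHandle` predicate is needed.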

Usage

All applications (resilience always enabled):

```csharp
services.AddAzureOpenAIServices(configuration);
```

The retry policies are appropriate for most HTTP APIs and won't negatively impact other HTTP clients like hCaptcha or Mailjet.

Documentation

Added docs/AZURE_OPENAI_RESILIENCE.md with configuration details, troubleshooting guide, and monitoring recommendations.

Original prompt

```text
06:09:39 fail: Microsoft.Extensions.AI.LoggingEmbeddingGenerator[1784604714]
      GenerateAsync failed.
      System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)
      Your requests to text-embedding-3-small for text-embedding-3-small have exceeded the call rate limit for your current AIServices pricing tier. This request was for Embeddings_Create under Azure OpenAI API version 2025-03-01-preview. Please retry after 4 seconds. To increase your default rate limit, visit: https://aka.ms/oai/quotaincrease.
         at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
         at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(BinaryContent content, RequestOptions options)
         at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(IEnumerable`1 inputs, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
         at Microsoft.Extensions.AI.OpenAIEmbeddingGenerator.GenerateAsync(IEnumerable`1 values,

Unhandled exception: System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)
Your requests to text-embedding-3-small-v1 for text-embedding-3-small in East US 2 have exceeded the call rate limit for your current AIServices S0 pricing tier. This request was for Embeddings_Create under Azure OpenAI API version 2025-03-01-preview. Please retry after 4 seconds. To increase your default rate limit, visit: https://aka.ms/oai/quotaincrease.
   at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
   at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(BinaryContent content, RequestOptions options)
   at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(IEnumerable`1 inputs, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.OpenAIEmbeddingGenerator.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.LoggingEmbeddingGenerator`2.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.OpenTelemetryEmbeddingGenerator`2.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.EmbeddingGeneratorExtensions.GenerateAsync[TInput,TEmbedding](IEmbeddingGenerator`2 generator, TInput value, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.GenerateEmbeddingAsync(String text, CancellationToken cancellationToken) in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 22
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.<>c__DisplayClass7_0.<<GenerateBookContentEmbeddingsAndUploadToVectorStore>b__0>d.MoveNext() in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 50
   --- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__53`1.<b__53_0>d.MoveNext()
   --- End of stack trace from previous location ---
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.GenerateBookContentEmbeddingsAndUploadToVectorStore(IEnumerable`1 bookContents, CancellationToken cancellationToken, String collectionName) in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 47
   at EssentialCSharp.Chat.Program.<>c__DisplayClass1_0.<b__0>d.MoveNext() in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat/Program.cs:line 91
   --- End of stack trace from previous location ---
   at System.CommandLine.Command.<>c__DisplayClass32_0.<b__0>d.MoveNext()
   --- End of stack trace from previous location ---
   at System.CommandLine.Invocation.InvocationPipeline.InvokeAsync(ParseResult parseResult, CancellationToken cancellationToken)
Error: Process completed with exit code 1.
```

Our quota is at the max. Can we use the Microsoft Learn Docs MCP to explore the Semantic Kernel documentation and find out how to use dynamic quota (http://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/dynamic-quota?view=foundry-classic), or just a retry mechanism that (ideally) watches the response and then waits as long as it specifies, so we don't keep hammering the system. https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota?view=foundry-classic&tabs=rest https://techcommunity.microsoft.com/blog/fasttrackforazureblog/optimizing-azure-openai-a-guide-to-limits-quotas-and-best-practic...



Copilot AI and others added 3 commits February 10, 2026 06:27
- Add Microsoft.Extensions.Http.Resilience package
- Configure standard resilience handler with exponential backoff
- Handle HTTP 429 (rate limit) with automatic retry
- Respect Retry-After header from Azure OpenAI
- Add circuit breaker and timeout configuration
- Document resilience implementation

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
- Add configureResilience parameter to allow selective resilience
- Update documentation with usage examples for mixed-use applications
- Improve code comments explaining when to use which approach
- Keep default behavior (resilience enabled) for Chat app

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
- Explain Semantic Kernel limitation requiring ConfigureHttpClientDefaults
- Document why named/typed client approach is not possible
- Clarify usage scenarios for mixed-use vs dedicated applications
- Improve inline code comments with technical justification

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix rate limit error for embedding generation Add HTTP resilience with retry policy for Azure OpenAI rate limiting Feb 10, 2026
@BenjaminMichaelis
Member

@copilot should we set the http client in the AzureOpenAIClientOptions? Also, is there a reason we wouldn't have resilience on by default? also are there build errors? https://github.com/IntelliTect/EssentialCSharp.Web/actions/runs/21854540825/job/63068411734?pr=853

Research, then fix, then validate.

- Remove configureResilience parameter - resilience is now always enabled
- Fix pre-existing build error in LoggerExtensions.cs (LOGGEN036)
- Simplify ConfigureAzureOpenAIResilience comments
- All Chat tests passing (4/4)

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 10, 2026

@copilot should we set the http client in the AzureOpenAIClientOptions? Also, is there a reason we wouldn't have resilience on by default? also are there build errors? https://github.com/IntelliTect/E...

I researched all three questions:

1. Should we set the http client in AzureOpenAIClientOptions?
Semantic Kernel 1.60.0's AddAzureOpenAIEmbeddingGenerator and AddAzureOpenAIChatClient methods don't expose options to configure named or typed HttpClients. The internal implementation creates HttpClient instances through IHttpClientFactory without providing hooks for per-client configuration. Therefore, ConfigureHttpClientDefaults is the only way to apply resilience to Azure OpenAI clients when using Semantic Kernel.
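In other words, resilience has to be registered globally before the Semantic Kernel extension methods run. A hypothetical sketch of that ordering (the endpoint and key variables are placeholders, not this PR's actual code):

```csharp
// Resilience is applied globally, because the SK extension below
// offers no per-client HttpClient hook.
services.ConfigureHttpClientDefaults(http => http.AddStandardResilienceHandler());

// SK registration picks up the default HttpClient pipeline configured above.
services.AddAzureOpenAIEmbeddingGenerator(
    deploymentName: "text-embedding-3-small",
    endpoint: azureOpenAIEndpoint,  // placeholder: read from configuration
    apiKey: azureOpenAIApiKey);     // placeholder: read from configuration
```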

2. Is there a reason we wouldn't have resilience on by default?
No good reason - I've simplified the implementation. Resilience is now always enabled (removed the configureResilience parameter). The retry policies are reasonable for most HTTP APIs and won't negatively impact other clients like hCaptcha or Mailjet.

3. Build errors?
Fixed! There was a pre-existing build error in LoggerExtensions.cs (LOGGEN036) unrelated to my changes. I fixed it by simplifying the logging message.

Changes in commit 4d1499d:

  • Removed configureResilience parameter - resilience always enabled
  • Fixed pre-existing build error in LoggerExtensions.cs
  • Build now succeeds with 0 errors
  • All Chat tests passing (4/4)

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • api.hcaptcha.com
    • Triggering command: /usr/share/dotnet/dotnet /usr/share/dotnet/dotnet exec --runtimeconfig /home/REDACTED/work/EssentialCSharp.Web/EssentialCSharp.Web/EssentialCSharp.Web.Tests/bin/Release/net9.0/EssentialCSharp.Web.Tests.runtimeconfig.json --depsfile /home/REDACTED/work/EssentialCSharp.Web/EssentialCSharp.Web/EssentialCSharp.Web.Tests/bin/Release/net9.0/EssentialCSharp.Web.Tests.deps.json /home/REDACTED/work/EssentialCSharp.Web/EssentialCSharp.Web/EssentialCSharp.Web.Tests/bin/Release/net9.0/testhost.dll --port 43635 --endpoint 127.0.0.1:043635 --role client --parentprocessid 5415 --telemetryoptedin false (dns block)

If you need me to access, download, or install something from one of these locations, you can either:
