
Conversation

Contributor

Copilot AI commented Feb 10, 2026

Azure OpenAI embedding generation crashes on HTTP 429 (rate limit exceeded) without any retry, even though Azure returns an explicit Retry-After header.

Changes

  • Added Microsoft.Extensions.Http.Resilience (v10.2.0) for standard retry/circuit breaker patterns
  • Configured resilience via ConfigureHttpClientDefaults - Required approach as Semantic Kernel's AddAzureOpenAI* methods don't expose named/typed HttpClient configuration
  • Resilience always enabled - Simplified implementation by removing optional parameter
  • Fixed pre-existing build error in LoggerExtensions.cs (LOGGEN036)

Configuration

Retry: 5 attempts, exponential backoff from 2s, with jitter. Handles 429/408/5xx.

Circuit Breaker: Opens at 20% failure rate over 30s window.

Retry-After: Standard resilience handler automatically respects Azure's specified delays.
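The values above can be wired through the standard resilience handler. A minimal sketch, assuming Microsoft.Extensions.Http.Resilience v10.2.0 is referenced and `services` is the app's `IServiceCollection` (the PR's actual helper method may differ):

```csharp
// Sketch: apply the standard resilience pipeline to every HttpClient,
// using the retry and circuit-breaker values from the Configuration section.
services.ConfigureHttpClientDefaults(http =>
    http.AddStandardResilienceHandler(options =>
    {
        options.Retry.MaxRetryAttempts = 5;                       // 5 attempts
        options.Retry.Delay = TimeSpan.FromSeconds(2);            // backoff base of 2s
        options.Retry.BackoffType = DelayBackoffType.Exponential;
        options.Retry.UseJitter = true;
        options.CircuitBreaker.FailureRatio = 0.2;                // opens at 20% failures
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    }));
```

The standard handler already classifies 429/408/5xx responses as transient and honors `Retry-After`, so no custom `ShouldHandle` predicate is needed.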

Usage

All applications (resilience always enabled):

```csharp
services.AddAzureOpenAIServices(configuration);
```

The retry policies are appropriate for most HTTP APIs and won't negatively impact other HTTP clients like hCaptcha or Mailjet.

Documentation

Added docs/AZURE_OPENAI_RESILIENCE.md with configuration details, troubleshooting guide, and monitoring recommendations.

Original prompt

```text
06:09:39 fail: Microsoft.Extensions.AI.LoggingEmbeddingGenerator[1784604714]
      GenerateAsync failed.
      System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)
      Your requests to text-embedding-3-small for text-embedding-3-small have exceeded the call rate limit for your current AIServices pricing tier. This request was for Embeddings_Create under Azure OpenAI API version 2025-03-01-preview. Please retry after 4 seconds. To increase your default rate limit, visit: https://aka.ms/oai/quotaincrease.
         at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
         at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(BinaryContent content, RequestOptions options)
         at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(IEnumerable`1 inputs, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
         at Microsoft.Extensions.AI.OpenAIEmbeddingGenerator.GenerateAsync(IEnumerable`1 values,

Unhandled exception: System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)
Your requests to text-embedding-3-small-v1 for text-embedding-3-small in East US 2 have exceeded the call rate limit for your current AIServices S0 pricing tier. This request was for Embeddings_Create under Azure OpenAI API version 2025-03-01-preview. Please retry after 4 seconds. To increase your default rate limit, visit: https://aka.ms/oai/quotaincrease.
   at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
   at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(BinaryContent content, RequestOptions options)
   at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(IEnumerable`1 inputs, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.OpenAIEmbeddingGenerator.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.LoggingEmbeddingGenerator`2.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.OpenTelemetryEmbeddingGenerator`2.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.EmbeddingGeneratorExtensions.GenerateAsync[TInput,TEmbedding](IEmbeddingGenerator`2 generator, TInput value, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.GenerateEmbeddingAsync(String text, CancellationToken cancellationToken) in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 22
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.<>c__DisplayClass7_0.<<GenerateBookContentEmbeddingsAndUploadToVectorStore>b__0>d.MoveNext() in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 50
   --- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__53`1.<b__53_0>d.MoveNext()
   --- End of stack trace from previous location ---
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.GenerateBookContentEmbeddingsAndUploadToVectorStore(IEnumerable`1 bookContents, CancellationToken cancellationToken, String collectionName) in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 47
   at EssentialCSharp.Chat.Program.<>c__DisplayClass1_0.<b__0>d.MoveNext() in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat/Program.cs:line 91
   --- End of stack trace from previous location ---
   at System.CommandLine.Command.<>c__DisplayClass32_0.<b__0>d.MoveNext()
   --- End of stack trace from previous location ---
   at System.CommandLine.Invocation.InvocationPipeline.InvokeAsync(ParseResult parseResult, CancellationToken cancellationToken)
Error: Process completed with exit code 1.
```

Our quota is at the max. Can we use the Microsoft Learn Docs MCP to explore the Semantic Kernel documentation and find out how to use dynamic quota (http://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/dynamic-quota?view=foundry-classic), or just a retry mechanism that (ideally) watches the response and then waits as long as it specifies, so we don't keep hammering the system. https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota?view=foundry-classic&tabs=rest https://techcommunity.microsoft.com/blog/fasttrackforazureblog/optimizing-azure-openai-a-guide-to-limits-quotas-and-best-practic...



Copilot AI and others added 3 commits February 10, 2026 06:27
- Add Microsoft.Extensions.Http.Resilience package
- Configure standard resilience handler with exponential backoff
- Handle HTTP 429 (rate limit) with automatic retry
- Respect Retry-After header from Azure OpenAI
- Add circuit breaker and timeout configuration
- Document resilience implementation

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
- Add configureResilience parameter to allow selective resilience
- Update documentation with usage examples for mixed-use applications
- Improve code comments explaining when to use which approach
- Keep default behavior (resilience enabled) for Chat app

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
- Explain Semantic Kernel limitation requiring ConfigureHttpClientDefaults
- Document why named/typed client approach is not possible
- Clarify usage scenarios for mixed-use vs dedicated applications
- Improve inline code comments with technical justification

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix rate limit error for embedding generation Add HTTP resilience with retry policy for Azure OpenAI rate limiting Feb 10, 2026
@BenjaminMichaelis
Member

@copilot should we set the http client in the AzureOpenAIClientOptions? Also, is there a reason we wouldn't have resilience on by default? also are there build errors? https://github.com/IntelliTect/EssentialCSharp.Web/actions/runs/21854540825/job/63068411734?pr=853

Research, then fix, then validate.

- Remove configureResilience parameter - resilience is now always enabled
- Fix pre-existing build error in LoggerExtensions.cs (LOGGEN036)
- Simplify ConfigureAzureOpenAIResilience comments
- All Chat tests passing (4/4)

Co-authored-by: BenjaminMichaelis <22186029+BenjaminMichaelis@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 10, 2026

@copilot should we set the http client in the AzureOpenAIClientOptions? Also, is there a reason we wouldn't have resilience on by default? also are there build errors? https://github.com/IntelliTect/E...

I researched all three questions:

1. Should we set the http client in AzureOpenAIClientOptions?
Semantic Kernel 1.60.0's AddAzureOpenAIEmbeddingGenerator and AddAzureOpenAIChatClient methods don't expose options to configure named or typed HttpClients. The internal implementation creates HttpClient instances through IHttpClientFactory without providing hooks for per-client configuration. Therefore, ConfigureHttpClientDefaults is the only way to apply resilience to Azure OpenAI clients when using Semantic Kernel.
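In other words, resilience has to be registered globally before the Semantic Kernel extension methods run. A hypothetical sketch of that ordering (the endpoint and key variables are placeholders, not this PR's actual code):

```csharp
// Resilience is applied globally, because the SK extension below
// offers no per-client HttpClient hook.
services.ConfigureHttpClientDefaults(http => http.AddStandardResilienceHandler());

// SK registration picks up the default HttpClient pipeline configured above.
services.AddAzureOpenAIEmbeddingGenerator(
    deploymentName: "text-embedding-3-small",
    endpoint: azureOpenAIEndpoint,  // placeholder: read from configuration
    apiKey: azureOpenAIApiKey);     // placeholder: read from configuration
```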

2. Is there a reason we wouldn't have resilience on by default?
No good reason - I've simplified the implementation. Resilience is now always enabled (removed the configureResilience parameter). The retry policies are reasonable for most HTTP APIs and won't negatively impact other clients like hCaptcha or Mailjet.

3. Build errors?
Fixed! There was a pre-existing build error in LoggerExtensions.cs (LOGGEN036) unrelated to my changes. I fixed it by simplifying the logging message.

Changes in commit 4d1499d:

  • Removed configureResilience parameter - resilience always enabled
  • Fixed pre-existing build error in LoggerExtensions.cs
  • Build now succeeds with 0 errors
  • All Chat tests passing (4/4)

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • api.hcaptcha.com
    • Triggering command: /usr/share/dotnet/dotnet /usr/share/dotnet/dotnet exec --runtimeconfig /home/REDACTED/work/EssentialCSharp.Web/EssentialCSharp.Web/EssentialCSharp.Web.Tests/bin/Release/net9.0/EssentialCSharp.Web.Tests.runtimeconfig.json --depsfile /home/REDACTED/work/EssentialCSharp.Web/EssentialCSharp.Web/EssentialCSharp.Web.Tests/bin/Release/net9.0/EssentialCSharp.Web.Tests.deps.json /home/REDACTED/work/EssentialCSharp.Web/EssentialCSharp.Web/EssentialCSharp.Web.Tests/bin/Release/net9.0/testhost.dll --port 43635 --endpoint 127.0.0.1:043635 --role client --parentprocessid 5415 --telemetryoptedin false (dns block)

If you need me to access, download, or install something from one of these locations, you can either:
