
# Fix: Occasional 502 Internal Server Error Returning Raw HTML via Python SDK #1923 #6

Open
Hellnight2005 wants to merge 3 commits into lingodotdev:main from Hellnight2005:fix/python-sdk-502-html-response

Hellnight2005 commented Feb 4, 2026

Fix: Occasional 502 Internal Server Error Returning Raw HTML via Python SDK #1923

Description

This PR fixes the issue where the Python SDK raises a RuntimeError containing raw HTML when the API returns a 502 Bad Gateway response (typically generated by an upstream proxy such as Nginx or Cloudflare). This flooded logs with HTML content and made exception handling difficult.

Implementation Details

  • Modified src/lingodotdev/engine.py to sanitize error messages for 5xx responses (specifically in _localize_chunk, recognize_locale, and whoami).
  • The SDK now attempts to parse the response as JSON to extract a structured error message.
  • If parsing succeeds and the body contains an "error" field, that value is appended to the exception message.
  • If parsing fails (e.g., an HTML response), the body content is omitted so no raw HTML is dumped into the message.
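A minimal, stdlib-only sketch of the rule above, assuming the 5xx branch has the status code, reason phrase, and raw body bytes at hand. `format_server_error` is a hypothetical name, not the SDK's actual code; the message wording is taken from the "After Fix" output in Verification Steps, and the checks on the "error" field follow the narrower parsing CodeRabbit suggests later in the review.

```python
import json


def format_server_error(status_code: int, reason_phrase: str, body: bytes) -> str:
    """Build a 5xx error message that never echoes a raw (e.g. HTML) body.

    Hypothetical helper: mirrors the behaviour described in this PR,
    not the SDK's actual implementation.
    """
    error_details = ""
    try:
        payload = json.loads(body)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError,
        # so HTML pages and undecodable bytes land here and the body is dropped.
        payload = None

    if isinstance(payload, dict):
        error_text = payload.get("error")
        # Only surface a non-empty string; ignore nested objects, numbers, etc.
        if isinstance(error_text, str) and error_text.strip():
            error_details = f" {error_text}"

    return (
        f"Server error ({status_code}): {reason_phrase}.{error_details} "
        "This may be due to temporary service issues."
    )


if __name__ == "__main__":
    html = b"<html><body><h1>502 Bad Gateway</h1></body></html>"
    print(format_server_error(502, "Bad Gateway", html))
    # -> Server error (502): Bad Gateway. This may be due to temporary service issues.
```

Keeping the 5xx path free of any body preview is what makes the HTML case safe: the only body-derived text that can reach the message is a validated, non-empty string.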

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring

Testing

  • Tests pass locally
  • New tests added for new functionality
  • Integration tests pass

Verification Steps

I created a reproduction script reproduce_502_error.py that mocks a 502 Bad Gateway response with an HTML body to verify the fix.

Before Fix:

Server error (502): Bad Gateway. 
<html>...</html>
. This may be due to temporary service issues.

After Fix:

Server error (502): Bad Gateway. This may be due to temporary service issues.

I also added a permanent regression test in tests/test_502_handling.py.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

Commit Message Format

  • fix: sanitize 502 HTML responses from error messages

Summary by CodeRabbit

  • Bug Fixes

    • Server error messages for 500-range failures now include sanitized server-provided details (the JSON "error" field) when available, without echoing the raw response body; 400-range responses produce safer, truncated, decoded previews without leaking raw HTML or binary content.
    • Malformed or non-JSON (including malformed Unicode) responses are handled more robustly to avoid exposing raw data while giving clearer error context.
  • Tests

    • Added tests covering 500/502 HTML and JSON error messaging, and malformed-unicode and 400-response handling.

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough


Enhances error handling in LingoDotDevEngine: _safe_parse_json gains a Unicode-safe fallback that decodes response text when JSON decoding raises UnicodeDecodeError; _localize_chunk, recognize_locale, and whoami append an extracted "error" field from JSON bodies for 500–599 responses when available. Adds tests for HTML/JSON 5xx responses and malformed-unicode responses (including a 400 case).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Engine error handling (src/lingodotdev/engine.py) | Broadened JSON parse error handling to catch UnicodeDecodeError and fall back to decoded response text (with replacement for undecodable bytes); for 500–599 responses attempt to extract an "error" field from JSON and append it to server-error messages in _localize_chunk, recognize_locale, and whoami; preserves previous behavior for non-JSON and non-500 responses. |
| 5xx handling tests (tests/test_502_handling.py) | Adds async tests: test_502_html_handling ensures HTML/non-JSON 502 responses produce sanitized server-error messages without leaking raw HTML; test_500_json_handling ensures 500 JSON body "error" is surfaced in raised errors. |
| Malformed-unicode tests (tests/test_unicode_handling.py) | Adds test_malformed_unicode_handling and test_unicode_error_in_400_response to simulate UnicodeDecodeError from response .json() and .text, asserting _safe_parse_json and higher-level handlers produce controlled errors that include a truncated response preview and appropriate status-specific messages. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested reviewers

  • vrcprl

Poem

🐇 I sniffed the server's cryptic rhyme,
500s hummed a secret line.
No wild HTML to sow dismay,
JSON murmurs what they say.
Hop—error trimmed and stored away.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: fixing 502 errors returning raw HTML by sanitizing error messages in the Python SDK, which aligns with the substantial modifications to error handling across multiple engine methods. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_502_handling.py`:
- Around lines 7-60: Update both tests to use the engine as an async context
manager so the underlying httpx.AsyncClient is closed: replace direct
instantiation of LingoDotDevEngine in test_502_html_handling and
test_500_json_handling with "async with LingoDotDevEngine(config) as engine"
(leveraging the class's __aenter__/__aexit__ support) before calling
engine.localize_text, ensuring the client is cleaned up and preventing
ResourceWarnings.
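A stand-in illustrating why `async with` fixes the leak: `__aexit__` runs on normal exit and when the body raises (as these tests expect), so that is where an engine can close its HTTP client. `FakeClient` and `FakeEngine` are hypothetical stand-ins for `httpx.AsyncClient` and `LingoDotDevEngine`; only the control flow is the point. In the real tests the `try`/`except` would typically be `pytest.raises(RuntimeError)`.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an HTTP client that must be closed."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class FakeEngine:
    """Hypothetical engine that owns a client and supports `async with`."""

    def __init__(self) -> None:
        self.client = FakeClient()

    async def __aenter__(self) -> "FakeEngine":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit *and* when the body raises; returning None
        # lets the exception propagate to the caller.
        await self.client.aclose()

    async def localize_text(self, text: str) -> str:
        # Simulate the 502 path the tests exercise.
        raise RuntimeError(
            "Server error (502): Bad Gateway. This may be due to temporary service issues."
        )


async def run_test() -> FakeClient:
    seen = None
    try:
        async with FakeEngine() as engine:
            seen = engine
            await engine.localize_text("hello")
    except RuntimeError as err:
        # The real tests assert on the sanitized message here.
        assert "<html>" not in str(err)
    return seen.client


if __name__ == "__main__":
    print(asyncio.run(run_test()).closed)  # True: closed despite the error
```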
🧹 Nitpick comments (1)
src/lingodotdev/engine.py (1)

187-198: Avoid broad exception swallowing in error-detail parsing.

Catching Exception with pass can mask unexpected failures, and the code may append non-string or empty error values. Narrow the exception to JSON decode errors and only append a non-empty string; apply the same refinement to the other 5xx handlers.

♻️ Suggested refactor
                 if 500 <= response.status_code < 600:
                     error_details = ""
                     try:
-                        error_json = response.json()
-                        if isinstance(error_json, dict) and "error" in error_json:
-                            error_details = f" {error_json['error']}"
-                    except Exception:
-                        pass
+                        error_json = response.json()
+                    except ValueError:
+                        error_json = None
+
+                    if isinstance(error_json, dict):
+                        error_text = error_json.get("error")
+                        if isinstance(error_text, str) and error_text.strip():
+                            error_details = f" {error_text}"
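Narrowing to `ValueError` still covers both failure modes a 5xx body can trigger: in the standard library, `json.JSONDecodeError` and `UnicodeDecodeError` are both subclasses of `ValueError`, while unrelated errors (e.g. `AttributeError`) still propagate. A quick check, calling `json.loads` on raw bytes directly; whether `response.json()` raises the same way for a given httpx version is an assumption here, and `try_parse` is a hypothetical demo function.

```python
import json

# Both parse failures derive from ValueError.
print(issubclass(json.JSONDecodeError, ValueError))  # True
print(issubclass(UnicodeDecodeError, ValueError))    # True


def try_parse(body: bytes):
    """Parse a body as JSON; on failure, return the name of the caught
    ValueError subclass (for demonstration only)."""
    try:
        return json.loads(body)
    except ValueError as exc:
        return type(exc).__name__


# An HTML error page is not JSON.
print(try_parse(b"<html>502 Bad Gateway</html>"))  # JSONDecodeError
# Invalid UTF-8 inside an otherwise JSON-looking body.
print(try_parse(b'{"error": "\xff"}'))              # UnicodeDecodeError
```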

Hellnight2005 force-pushed the fix/python-sdk-502-html-response branch from f903312 to 3feaef3 on February 4, 2026 14:04

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_502_handling.py`:
- Around lines 32-38: The test currently expects HTML preview leakage; update the
assertions in tests/test_502_handling.py to enforce sanitization: use assert
"Response:" not in error_msg and assert "<html>" not in error_msg (invert the
two checks), and remove or relax the truncation-specific assertion (the assert
comparing len(error_msg) to len(html_body) should be removed or replaced with a
non-dependent length check) so the test no longer requires an HTML preview;
refer to the variables error_msg and html_body to locate the assertions to
change.

Hellnight2005 force-pushed the fix/python-sdk-502-html-response branch 3 times, most recently from 1ac3969 to 94c68c4 on February 4, 2026 14:22

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lingodotdev/engine.py`:
- Around lines 84-113: Update LingoDotDevEngine._safe_parse_json to also catch
UnicodeDecodeError in addition to json.JSONDecodeError, use
LingoDotDevEngine._truncate_response to build the preview, and raise the same
RuntimeError message while chaining the original exception (using "from e") so
the UnicodeDecodeError context is preserved; keep the error text/format
identical but ensure the except clause handles both exception types and
references the original exception variable when raising.
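A stdlib sketch of what that comment describes, assuming the helper receives the raw body bytes. `safe_parse_json`, `truncate_response`, the 200-character limit, and the message text are hypothetical stand-ins for `LingoDotDevEngine._safe_parse_json` and `_truncate_response`, whose real signatures, limits, and wording are not shown in this thread. `raise ... from e` stores the original decode error as `__cause__`.

```python
import json


def truncate_response(text: str, max_length: int = 200) -> str:
    """Hypothetical stand-in for _truncate_response; the limit is assumed."""
    if len(text) <= max_length:
        return text
    return text[:max_length] + "..."


def safe_parse_json(body: bytes):
    """Parse JSON, raising a RuntimeError with a safe preview on failure."""
    try:
        return json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError) as e:
        # Decode with replacement so building the preview cannot raise again.
        preview = truncate_response(body.decode("utf-8", errors="replace"))
        raise RuntimeError(
            f"Failed to parse API response as JSON. Response: {preview}"
        ) from e
```

For example, `safe_parse_json(b'{"error": "\xff"}')` raises a `RuntimeError` whose `__cause__` is the original `UnicodeDecodeError`, and whose preview shows U+FFFD in place of the bad byte.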

Hellnight2005 force-pushed the fix/python-sdk-502-html-response branch from 94c68c4 to 99957f3 on February 4, 2026 14:32

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/lingodotdev/engine.py`:
- Around lines 228-231: There's a duplicated "raise RuntimeError(" token causing
a SyntaxError in the response error-handling block; remove the extra "raise
RuntimeError(" so only one raise statement remains and ensure the formatted
message uses the existing response, response.status_code, response.reason_phrase
and error_details variables (the string should be a single f-string passed to
raise RuntimeError). Locate the error block that references
response.status_code, response.reason_phrase and error_details and delete the
stray/duplicate raise so the exception is raised exactly once with the combined
message.
- Around lines 483-486: Remove the duplicated "raise RuntimeError(" line in the
error handling block so the code contains a single raise statement that
constructs the full error message; update the remaining raise to be a single
call like raise RuntimeError(f"Server error ({response.status_code}):
{response.reason_phrase}.{error_details} This may be due to temporary service
issues.") ensuring the parentheses and f-string concatenation are correct
(locate the block around the existing duplicate raise in the response/error
handling code).
🧹 Nitpick comments (1)
src/lingodotdev/engine.py (1)

518-529: Consider extracting 5xx error formatting into a helper.

The error-details parsing + message formatting logic is now duplicated across _localize_chunk, recognize_locale, and whoami. A small helper (e.g., _format_server_error(response)) would reduce repetition and keep behavior consistent.

coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lingodotdev/engine.py (1)

222-246: ⚠️ Potential issue | 🟠 Major

Guard error-path response preview decoding to avoid unhandled UnicodeDecodeError.

response.text at line 223 (and 478) is accessed immediately after checking response.is_success, before the status code branches. If the response bytes cannot be decoded (due to invalid or mislabeled encoding), UnicodeDecodeError will escape and bypass the error handling logic, preventing proper error reporting to the user. The codebase already handles this pattern in _safe_parse_json (lines 105–114) by catching and decoding with errors="replace".

Defer computing response_preview until needed within the 400/other-status branches, and use the same fallback decoding pattern applied in _safe_parse_json.

🔧 Proposed fix
@@
-            if not response.is_success:
-                response_preview = self._truncate_response(response.text)
-                if 500 <= response.status_code < 600:
+            if not response.is_success:
+                if 500 <= response.status_code < 600:
                     error_details = ""
                     try:
                         error_json = response.json()
                         if isinstance(error_json, dict) and "error" in error_json:
                             error_details = f" {error_json['error']}"
                     except Exception:
                         pass
@@
                     raise RuntimeError(
                         f"Server error ({response.status_code}): {response.reason_phrase}.{error_details} "
                         "This may be due to temporary service issues."
                     )
-                elif response.status_code == 400:
+                try:
+                    response_text = response.text
+                except UnicodeDecodeError:
+                    response_text = response.content.decode("utf-8", errors="replace")
+                response_preview = self._truncate_response(response_text)
+                if response.status_code == 400:
                     raise ValueError(
                         f"Invalid request ({response.status_code}): {response.reason_phrase}. "
                         f"Response: {response_preview}"
                     )
@@
-            if not response.is_success:
-                response_preview = self._truncate_response(response.text)
-                if 500 <= response.status_code < 600:
+            if not response.is_success:
+                if 500 <= response.status_code < 600:
                     error_details = ""
                     try:
                         error_json = response.json()
                         if isinstance(error_json, dict) and "error" in error_json:
                             error_details = f" {error_json['error']}"
                     except Exception:
                         pass
@@
                     raise RuntimeError(
                         f"Server error ({response.status_code}): {response.reason_phrase}.{error_details} "
                         "This may be due to temporary service issues."
                     )
+                try:
+                    response_text = response.text
+                except UnicodeDecodeError:
+                    response_text = response.content.decode("utf-8", errors="replace")
+                response_preview = self._truncate_response(response_text)
                 raise RuntimeError(
                     f"Error recognizing locale ({response.status_code}): {response.reason_phrase}. "
                     f"Response: {response_preview}"
                 )
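The fallback in that proposed fix can be exercised without a live server. This sketch assumes an httpx-style response exposing `.text` (which may raise `UnicodeDecodeError`) and `.content` (raw bytes); `response_preview` is a hypothetical helper both hunks could share, its 200-character limit is assumed, and the failing response is simulated with `unittest.mock.PropertyMock`, similar in spirit to what tests/test_unicode_handling.py is described as doing.

```python
from unittest.mock import MagicMock, PropertyMock


def response_preview(response, max_length: int = 200) -> str:
    """Decode an error body for display without letting decode errors escape.

    Hypothetical helper; max_length is an assumed limit.
    """
    try:
        text = response.text
    except UnicodeDecodeError:
        # Same fallback as _safe_parse_json: replace undecodable bytes.
        text = response.content.decode("utf-8", errors="replace")
    return text if len(text) <= max_length else text[:max_length] + "..."


def make_broken_response(raw: bytes):
    """Build a mock response whose .text raises, as with a mislabeled charset."""
    response = MagicMock()
    # PropertyMock must be attached to the mock's type to act as a property.
    type(response).text = PropertyMock(
        side_effect=UnicodeDecodeError("utf-8", raw, 0, 1, "invalid start byte")
    )
    response.content = raw
    return response
```

For instance, `response_preview(make_broken_response(b"\xffbad gateway"))` returns `"\ufffdbad gateway"` instead of propagating the `UnicodeDecodeError`. Because the 5xx branch raises before this helper is called, HTML from a gateway error never reaches the preview.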
