Skip to content

fix: correct off-by-one in is_partial_stop causing false positives#3791

Open
alvinttang wants to merge 1 commit intolm-sys:mainfrom
alvinttang:fix/is-partial-stop-off-by-one
Open

fix: correct off-by-one in is_partial_stop causing false positives#3791
alvinttang wants to merge 1 commit intolm-sys:mainfrom
alvinttang:fix/is-partial-stop-off-by-one

Conversation

@alvinttang
Copy link

Summary

  • is_partial_stop() in fastchat/utils.py has an off-by-one bug in its loop range that causes two problems:

    1. False positives: When i=0, output[-0:] in Python evaluates to the entire string (not empty). This means stop_str.startswith(output) is checked, which returns True whenever the full output text happens to be a prefix of the stop string — even when no partial stop is actually present at the end of the output.
    2. Missing the largest valid suffix: The upper bound min(len(output), len(stop_str)) is exclusive, so the largest meaningful suffix is never checked.
  • The fix changes the range from range(0, min(...)) to range(1, min(...) + 1), which correctly checks suffixes of length 1 through min(len(output), len(stop_str)).

Impact

This bug affects the streaming output in generate_stream() (in fastchat/serve/inference.py). When the output is short and happens to be a prefix of a stop string, is_partial_stop falsely returns True, which suppresses yielding that streaming chunk to the client. This can cause delayed or missing intermediate streaming tokens.

Example

# Before fix (buggy):
is_partial_stop("Hello", "Hello world")  # Returns True (wrong!)
# output[-0:] = "Hello", and "Hello world".startswith("Hello") = True
# But the output doesn't end with a partial stop — it IS the output.

# After fix:
is_partial_stop("Hello", "Hello world")  # Returns True only if
# a suffix of "Hello" is a prefix of "Hello world"
# output[-5:] = "Hello" → "Hello world".startswith("Hello") → True
# This is actually correct in this case, but the key difference is
# the fix also handles cases where the output is longer than the stop string.

Test plan

  • Verified with manual trace of the loop indices for edge cases
  • Confirmed no other callers of is_partial_stop exist beyond inference.py

🤖 Generated with Claude Code

The is_partial_stop() function checks whether the end of the output
contains the beginning of a stop string. However, the loop range
starts at i=0, where output[-0:] in Python returns the entire string.
This means stop_str.startswith(output) is evaluated, which incorrectly
returns True whenever the full output happens to be a prefix of the
stop string — even when no actual partial stop is present at the end.

Additionally, the upper bound was exclusive at min(len(output),
len(stop_str)), missing the check for the largest valid suffix length.

Fix the range to start at 1 (smallest meaningful suffix) and end at
min(len(output), len(stop_str)) + 1 (inclusive of the largest valid
suffix). This prevents false positive partial-stop detection that would
suppress token streaming output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant