Context based response generation workflow #327
nuwangeek wants to merge 36 commits into `buerokratt:wip`
Conversation
Get update from wip into llm-316
Get update from llm-316
Intent enrichment pipeline (buerokratt#319)
get update from wip into llm-304
Service layer validation in tool classifier (buerokratt#321)
Get update from wip
Pulling changes from BYK wip to LLM-Module WIP
Get update from wip into optimization/data-enrichment
…mance improvement
Get update from optimization/data-enrichment into optimization/vector-indexer
Optimize intent data enrichment and service classification (buerokratt#325)
Optimize first user query response generation time (buerokratt#326)
Pull request overview
Implements a Layer-2 “Context” workflow to answer greetings and conversation-history follow-ups without invoking RAG, plus updates streaming/rate-limiting behavior to support the new routing flow.
Changes:
- Add `ContextAnalyzer` + `ContextWorkflowExecutor` with two-phase detection → generation (incl. streaming via NeMo Guardrails) and multilingual greeting fallbacks.
- Adjust the RAG streaming wrapper to return an async iterator from a coroutine (to match `await workflow.execute_streaming(...)`) and allow reusing pre-initialized components from context.
- Replace token-bucket rate limiting with a sliding-window token limiter (tokens/minute) and update related configuration.
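The sliding-window token limiter described above can be sketched roughly as follows. This is a minimal illustration of the technique, not the PR's actual implementation; the class and attribute names (`SlidingWindowTokenLimiter`, `history`, `allow`) are assumptions for the example:

```python
import time
from collections import deque


class SlidingWindowTokenLimiter:
    """Illustrative sketch: track (timestamp, tokens) pairs and enforce
    a token budget over a sliding time window."""

    def __init__(self, tokens_per_minute: int = 10_000, window_seconds: int = 60):
        self.tokens_per_minute = tokens_per_minute
        self.window_seconds = window_seconds
        # Oldest entries at the left, so pruning pops from the front.
        self.history: deque = deque()  # items: (timestamp, token_count)

    def _prune(self, now: float) -> None:
        # Drop entries that have fallen out of the sliding window.
        while self.history and self.history[0][0] <= now - self.window_seconds:
            self.history.popleft()

    def allow(self, tokens: int) -> bool:
        now = time.monotonic()
        self._prune(now)
        used = sum(count for _, count in self.history)
        if used + tokens > self.tokens_per_minute:
            return False  # over budget; caller would compute retry_after here
        self.history.append((now, tokens))
        return True
```

Unlike a token bucket, usage here decays only when old entries age out of the window, which is why the `retry_after` calculation discussed below hinges on the oldest timestamp.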
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/rate_limiter.py | Switch token limiting to sliding-window history; add configurable token window. |
| src/tool_classifier/workflows/service_workflow.py | Extend LLMServiceProtocol with component init + output guardrails hooks. |
| src/tool_classifier/workflows/rag_workflow.py | Reuse components from context; make execute_streaming a coroutine returning an async iterator. |
| src/tool_classifier/workflows/context_workflow.py | Implement non-streaming + streaming Context workflow with cost tracking and guardrails integration. |
| src/tool_classifier/greeting_constants.py | Add ET/EN static greeting message mappings + helper. |
| src/tool_classifier/context_analyzer.py | Add DSPy-based Phase 1 detection + Phase 2 generation/streaming utilities. |
| src/tool_classifier/constants.py | Tune dense similarity thresholds for service routing. |
| src/tool_classifier/classifier.py | Wire orchestration_service into Context workflow; minor exception chaining updates. |
| src/llm_orchestrator_config/stream_config.py | Update rate-limit knobs to requests/min + tokens/min + token window size. |
| src/llm_orchestration_service_api.py | Initialize rate limiter using tokens/min configuration. |
| src/llm_orchestration_service.py | Add timing metrics around classifier classify/route steps in streaming flow. |
| docs/CONTEXT_WORKFLOW_GREETING_DETECTION.md | Document the new Context workflow and two-phase detection/generation design. |
| `tests/test_context_analyzer.py` | Unit tests for `ContextAnalyzer` |
| `tests/test_context_workflow.py` | Unit tests for `ContextWorkflowExecutor` |
| `tests/test_context_workflow_integration.py` | Integration tests for the full classify → route → execute chain |
The file reference lists tests/test_context_analyzer.py, tests/test_context_workflow.py, and tests/test_context_workflow_integration.py, but these test files don’t exist in the repository (currently only tests/test_query_validator.py, integration tests, etc.). Either add the referenced tests or update this section so it reflects the actual test suite.
Suggested change (replace the three nonexistent test-file rows with the suites that actually exist):

| `tests/test_query_validator.py` | Unit tests for query validation and classifier request handling |
| `integration_tests/` | Integration tests for the full classify → route → execute chain |
| `integration_tests/` | Additional end-to-end tests covering context workflow behavior within the pipeline |
```python
logger.info(
    f"[{request.chatId}] CONTEXT WORKFLOW (NON-STREAMING) | "
    f"Query: '{request.message[:100]}'"
)
costs_metric: Dict[str, Dict[str, Any]] = {}
if time_metric is None:
    time_metric = {}

language = detect_language(request.message)
history = self._build_history(request)

detection_result = await self._detect(
    request.message, history, time_metric, costs_metric
)
if detection_result is None:
    self._log_costs(costs_metric)
    return None

logger.info(
    f"[{request.chatId}] Detection: greeting={detection_result.is_greeting} "
    f"can_answer={detection_result.can_answer_from_context}"
)

if detection_result.is_greeting:
    from src.tool_classifier.greeting_constants import get_greeting_response

    greeting = get_greeting_response(language=language)
    self._log_costs(costs_metric)
    return OrchestrationResponse(
        chatId=request.chatId,
        llmServiceActive=True,
        questionOutOfLLMScope=False,
        inputGuardFailed=False,
        content=greeting,
    )

if (
    detection_result.can_answer_from_context
    and detection_result.context_snippet
):
    return await self._generate_response_async(
        request, detection_result.context_snippet, time_metric, costs_metric
    )

logger.warning(
    f"[{request.chatId}] Cannot answer from context — falling back to RAG"
)
self._log_costs(costs_metric)
return None
```
This PR introduces substantial new behavior (two-phase context detection + greeting handling + guarded streaming). There are no corresponding tests in the repo today for ContextWorkflowExecutor/ContextAnalyzer; please add unit/integration tests covering: greeting vs context fallback decisions, streaming yields (including END), and guardrail-violation termination behavior.
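As a starting point, the greeting / context-answer / RAG-fallback decision table could be pinned down with a small pure helper plus unit tests. `DetectionResult` and `route` below are illustrative stand-ins, not the PR's actual types; the branch order mirrors the `execute` body in the diff above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionResult:
    """Illustrative stand-in for the Phase 1 detection output."""
    is_greeting: bool
    can_answer_from_context: bool
    context_snippet: Optional[str] = None


def route(result: Optional[DetectionResult]) -> str:
    """Mirror the branch order in the diff: detection failure falls through
    to RAG, greetings win over context answers, and a context answer
    requires both the flag and a non-empty snippet."""
    if result is None:
        return "rag_fallback"
    if result.is_greeting:
        return "greeting"
    if result.can_answer_from_context and result.context_snippet:
        return "context_answer"
    return "rag_fallback"
```

Extracting the decision into a pure function like this would also make the requested unit tests trivial to write without mocking the LLM components.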
```python
    retry_after = (
        int(oldest_timestamp + self.token_window_seconds - current_time) + 1
    )
else:
    retry_after = 1
```
The token limiting algorithm was changed from a token bucket to a sliding-window history with a configurable window. Please add/extend tests to cover token-window pruning, retry_after calculation, and the interaction between token_window_seconds and the configured token limit.
Suggested change (replace the `retry_after` computation above):

```python
    # Time until the oldest entry falls out of the window
    time_until_window_moves = oldest_timestamp + self.token_window_seconds - current_time
    retry_after = max(1, int(time_until_window_moves) + 1)
else:
    # Single request exceeds the limit; tie retry_after to the configured window
    retry_after = max(1, int(self.token_window_seconds))
```
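The suggested `retry_after` arithmetic is easy to verify in isolation. `compute_retry_after` below is a hypothetical pure-function extraction of the first branch, not code from the PR:

```python
def compute_retry_after(oldest_timestamp: float,
                        current_time: float,
                        window_seconds: int) -> int:
    """Seconds until the oldest token entry leaves the sliding window,
    rounded up and clamped to at least one second."""
    time_until_window_moves = oldest_timestamp + window_seconds - current_time
    return max(1, int(time_until_window_moves) + 1)
```

The `max(1, ...)` clamp also guards against a stale `oldest_timestamp` that has already fallen outside the window, which would otherwise yield a negative wait.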
```python
    requests_per_minute: int = StreamConfig.RATE_LIMIT_REQUESTS_PER_MINUTE,
    tokens_per_second: int = StreamConfig.RATE_LIMIT_TOKENS_PER_SECOND,
    tokens_per_minute: int = StreamConfig.RATE_LIMIT_TOKENS_PER_MINUTE,
    cleanup_interval: int = StreamConfig.RATE_LIMIT_CLEANUP_INTERVAL,
    token_window_seconds: int = StreamConfig.RATE_LIMIT_TOKEN_WINDOW_SECONDS,
):
    """
    Initialize rate limiter.

    Args:
        requests_per_minute: Maximum requests per user per minute (sliding window)
        tokens_per_second: Maximum tokens per user per second (token bucket)
        tokens_per_minute: Maximum tokens per user per minute (sliding window)
        cleanup_interval: Seconds between automatic cleanup of old entries
        token_window_seconds: Sliding window size in seconds for token tracking
    """
    self.requests_per_minute = requests_per_minute
    self.tokens_per_second = tokens_per_second
    self.tokens_per_minute = tokens_per_minute
    self.cleanup_interval = cleanup_interval
    self.token_window_seconds = token_window_seconds
```
`tokens_per_minute` is enforced over `token_window_seconds`, but the code compares usage-in-window directly against the per-minute limit without scaling. Either constrain `token_window_seconds` to 60s (and validate it), rename/redefine the limit as `tokens_per_window`, or scale it as `tokens_per_minute * token_window_seconds / 60`.
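One of the proposed fixes, scaling the per-minute budget to the configured window length, could look like this hypothetical helper (`tokens_per_window` is an illustrative name, not existing code):

```python
def tokens_per_window(tokens_per_minute: int, token_window_seconds: int) -> int:
    """Scale a per-minute token budget to an arbitrary window length,
    so a 30 s window enforces half the per-minute limit and a 120 s
    window enforces double. Clamped to at least 1 token."""
    return max(1, int(tokens_per_minute * token_window_seconds / 60))
```

With this scaling, the in-window usage comparison stays consistent however `token_window_seconds` is configured, at the cost of the config knob no longer being literally "tokens per minute" when the window differs from 60s.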
No description provided.