Conversation
Summary of Changes: Hello @ChenZiHong-Gavin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates Ray Serve as a new backend option for managing and serving Large Language Models. This enhancement provides a flexible and scalable infrastructure for deploying various LLM implementations, allowing for distributed inference and improved resource utilization.
Code Review
This pull request successfully integrates Ray Serve as a new LLM provider, enhancing the system's ability to leverage distributed LLM deployments. However, it introduces several security concerns, primarily related to the exposure of an unauthenticated HTTP endpoint. The LLMDeployment class lacks authentication, exposes verbose error messages, and passes untrusted input to LLM wrappers that are vulnerable to prompt injection. Additionally, there are areas for improvement related to general error handling, code organization, and potential blocking operations in an asynchronous context.
    ) -> List[Token]:
        return await self.llm_instance.generate_inputs_prob(text, history, **extra)

    async def __call__(self, request: Request) -> Dict:
The __call__ method, which serves as the HTTP entry point for the Ray Serve deployment, lacks any authentication or authorization checks. This allows any user with network access to the Ray Serve port to execute LLM queries, potentially leading to unauthorized resource consumption and abuse of the service.
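As one possible mitigation, the handler could require a shared secret before dispatching any work. The sketch below is illustrative only; the `RAY_SERVE_API_TOKEN` variable and the bearer-token scheme are assumptions, not part of this PR.

```python
import os

from starlette.requests import Request
from starlette.responses import JSONResponse


class LLMDeployment:  # decorator and constructor elided; names follow this PR
    async def __call__(self, request: Request):
        # Reject callers that do not present the expected bearer token (assumed env var).
        expected = os.environ.get("RAY_SERVE_API_TOKEN", "")
        if not expected or request.headers.get("Authorization") != f"Bearer {expected}":
            return JSONResponse({"error": "unauthorized"}, status_code=401)

        data = await request.json()
        ...  # existing method dispatch (generate_answer, generate_inputs_prob, ...)
```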
        kwargs = data.get("kwargs", {})

        if method == "generate_answer":
            result = await self.generate_answer(text, history, **kwargs)
The __call__ method passes untrusted user input (text and history) directly to the LLM's generate_answer method. Several LLM backends in this repository (e.g., HuggingFaceWrapper, SGLangWrapper) use manual string concatenation to build prompts, making them highly susceptible to prompt injection attacks. An attacker could provide crafted input to manipulate the LLM's behavior or spoof conversation history.
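One way to reduce this risk, sketched below under the assumption that the underlying tokenizer exposes a Hugging Face-style chat template, is to keep user text confined to a `user` role message instead of splicing it into a hand-built prompt string. The helper name and the role whitelist are illustrative, not taken from this PR.

```python
# Illustrative helper: build prompts via the tokenizer's chat template rather than
# manual string concatenation, so user-supplied text cannot masquerade as system or
# assistant turns. Assumes a Hugging Face tokenizer with `apply_chat_template`.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_prompt(tokenizer, text: str, history: list | None = None) -> str:
    # Drop history entries with unexpected roles instead of trusting them verbatim.
    messages = [m for m in (history or []) if m.get("role") in ALLOWED_ROLES]
    messages.append({"role": "user", "content": str(text)})
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```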
except Exception:
    pass

self.handle = serve.get_deployment(deployment_name).get_handle()
except Exception:

deployment = LLMDeployment.bind(backend=serve_backend, config=kwargs)
serve.run(deployment, name=app_name, route_prefix=f"/{app_name}")
serve.run() is a blocking call by default. If RayServeClient is intended to be used within an asynchronous application, this call will block the event loop, potentially causing performance issues or deadlocks. Consider whether the Ray Serve application should be deployed out-of-band (e.g., as a separate script), or, if the client is responsible for the deployment's lifecycle within an async context, whether the deployment should be started in a non-blocking manner (e.g., via serve.start() plus a non-blocking deployment call).
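If the client does need to deploy from inside a running event loop, one option (a sketch, not the PR's approach) is to push the blocking call onto a worker thread so the loop stays responsive:

```python
import asyncio

from ray import serve


async def deploy_app(deployment, app_name: str) -> None:
    # Run the potentially long, blocking serve.run call in a worker thread so it
    # does not stall the event loop; `deployment` and `app_name` follow the PR's names.
    await asyncio.to_thread(
        serve.run, deployment, name=app_name, route_prefix=f"/{app_name}"
    )
```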
            return {"result": result}
        except Exception as e:
            return {"error": str(e)}
The current implementation exposes sensitive internal information by returning raw exception details, which could be exploited by attackers to gain deeper insights into the system's architecture and vulnerabilities. It's crucial to prevent the leakage of stack traces, file paths, or configuration details. Instead, catch specific exceptions and return generalized error messages, logging full details internally without exposing them to the client.
        except Exception:
            # Log the error internally for debugging (requires `import logging` at module level)
            logging.exception("Error in LLMDeployment")
            return {"error": "An internal error occurred while processing your request."}

import uuid

app_name = f"llm_app_{serve_backend}_{uuid.uuid4().hex[:8]}"
# Filter kwargs to avoid passing unrelated args if necessary,
# but LLMDeployment config accepts everything for now.
# Note: We need to pass kwargs as the config dict.
The comment indicates a potential issue with kwargs being passed directly. It's safer and clearer to explicitly construct a config dictionary with only the parameters relevant to LLMDeployment to avoid passing unintended arguments. This also makes the code more robust to future changes in LLMDeployment's constructor.
# Explicitly construct config for LLMDeployment
deployment_config = {
    "model": kwargs.get("model"),
    "tokenizer": kwargs.get("tokenizer"),
    # add other config parameters relevant to LLMDeployment here
}
deployment = LLMDeployment.bind(backend=serve_backend, config=deployment_config)

tokenizer_model = os.environ.get("TOKENIZER_MODEL", "cl100k_base")
if "tokenizer" not in config:
    tokenizer = Tokenizer(model_name=tokenizer_model)
    config["tokenizer"] = tokenizer
The tokenizer initialization logic is duplicated here and in LLMServiceActor. To improve maintainability and avoid redundancy, consider extracting this logic into a shared utility function or a common base class if applicable. This ensures consistent tokenizer handling across different LLM wrappers.
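A small shared helper would keep the fallback logic in one place. This is a minimal sketch; the helper name and the Tokenizer import path are assumptions based on how the class is used in this PR, not confirmed module locations.

```python
import os

from graphgen.models import Tokenizer  # assumed import path


def ensure_tokenizer(config: dict) -> dict:
    """Fill in config["tokenizer"] from TOKENIZER_MODEL when the caller did not supply one."""
    if "tokenizer" not in config:
        model_name = os.environ.get("TOKENIZER_MODEL", "cl100k_base")
        config["tokenizer"] = Tokenizer(model_name=model_name)
    return config
```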
    Usage: serve run graphgen.models.llm.local.ray_serve_deployment:app_builder backend=vllm model=...
    """
    # args comes from the command line as key=value pairs
    backend = args.pop("backend", "vllm")
The app_builder function defaults backend to "vllm" if not provided. This default might not align with the user's expectation or the LLMDeployment's intended behavior if other backends are more common or desired as a default. Consider making the backend explicit or ensuring the default is well-documented and consistent with the overall system design.
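One defensive option, sketched here with an illustrative backend list (not taken from this PR), is to validate the value and fail fast on anything unexpected rather than silently assuming vllm:

```python
# Illustrative validation; the set of supported backends is an assumption.
SUPPORTED_BACKENDS = {"vllm", "sglang", "huggingface"}


def app_builder(args: dict):
    backend = args.pop("backend", "vllm")  # keep the documented default, but validate it
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unsupported backend {backend!r}; expected one of {sorted(SUPPORTED_BACKENDS)}"
        )
    ...  # build and return the LLMDeployment application as in this PR
```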