Add micro batching and endpoints for v1 list_models and get_model by baixiac · Pull Request #41 · CogStack/CogStack-ModelServe

baixiac · 2026-03-06T17:08:05Z

feat: add endpoints for v1 models and list models
feat: add micro batching and lower CPU usage during model loading
feat: ensure the pad token for generative models
feat: use the async streamer during async generation
feat: apply timeout to text generation
fix: fix the property name for stop sequences in OpenAI requests
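
The micro batching item above could look roughly like the following. This is a minimal sketch of the general technique, not the code in this PR: `MicroBatcher`, `batch_fn`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for illustration, and `fake_model` stands in for a real batched generation call. Requests are queued, grouped until the batch is full or a short deadline passes, and then run through the model in one call off the event loop.

```python
import asyncio
from typing import Callable, List, Tuple


class MicroBatcher:
    """Groups individual requests into small batches (hypothetical helper)."""

    def __init__(
        self,
        batch_fn: Callable[[List[str]], List[str]],
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        # Assumption: batch_fn is a blocking function that maps N prompts
        # to N outputs in the same order.
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, prompt: str) -> str:
        """Enqueue one prompt and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, future))
        return await future

    async def run(self) -> None:
        """Worker loop: collect a batch, run it, resolve the futures."""
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            # Keep collecting until the batch is full or the deadline passes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self._queue.get(), remaining)
                    )
                except asyncio.TimeoutError:
                    break
            prompts = [prompt for prompt, _ in batch]
            try:
                # Run the blocking call in the default thread pool so the
                # event loop stays responsive.
                results = await loop.run_in_executor(None, self._batch_fn, prompts)
            except Exception as exc:
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)


async def demo() -> Tuple[List[str], List[int]]:
    batch_sizes: List[int] = []

    def fake_model(prompts: List[str]) -> List[str]:
        # Stand-in for a real batched model call.
        batch_sizes.append(len(prompts))
        return [p.upper() for p in prompts]

    # Construct the batcher inside a coroutine so the queue belongs to
    # the running event loop.
    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    outputs = await asyncio.gather(*(batcher.submit(p) for p in ["a", "b", "c"]))
    worker.cancel()
    return list(outputs), batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The two knobs trade latency for throughput: a larger `max_wait_s` gives more requests a chance to share a batch, at the cost of delaying the first request in each batch by up to that amount.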

baixiac wants to merge 1 commit into main from llm-gen2
