Skip to content

Conversation

@fcogidi
Copy link
Collaborator

@fcogidi fcogidi commented Jan 26, 2026

Summary

This PR converts utility modules/scripts into a package (aieng-agents-utils).

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

New Utility Package (aieng-agents-utils)

  • Created a Python package with agent development utilities
  • Agent Tools Module: Code interpreter (E2B), Gemini grounding with Google Search, Weaviate knowledge base, Wikipedia news events fetcher
  • Data Processing: PDF to HuggingFace dataset converter with OCR, dataset chunking utilities, unified dataset loader
  • Async Utilities: Progress bars for async operations, rate limiting, concurrent task management
  • Client Management: Lifecycle manager for async clients (OpenAI, Weaviate) with proper cleanup
  • Gradio Integration: Message format converters between Gradio chatbot and OpenAI SDK
  • Langfuse Integration: OpenTelemetry-based tracing and observability setup
  • Environment Config: Type-safe Pydantic-based configuration management
  • Session Management: SQLite-backed persistent conversation sessions

CLI Tools

  • pdf_to_hf_dataset: Console script for converting PDFs to chunked HuggingFace datasets
    • Multimodal OCR using OpenAI-compatible APIs
    • Smart page filtering (TOC detection, front/back matter skipping)
    • Structured output support with heading detection
    • Token-aware chunking for embedding models
  • chunk_hf_dataset: Console script for re-chunking existing HuggingFace datasets

API Key Management

  • Create user-scoped API keys for the Gemini grounding proxy
  • Delete API keys by owner with filtering options
  • Support for batch operations via --owners-file
  • JSON and CSV output formats
  • Firestore-backed storage with metadata support
  • Usage limits and tracking per key

Documentation

  • Added README with installation instructions (uv/pip)

Package Configuration

  • Proper pyproject.toml with build system (hatchling)
  • Console script entry points for CLI tools
  • Complete dependency specifications
  • MIT license
  • Author information and project metadata

Testing

  • Tests pass locally (uv run pytest tests/)
  • Linting passes (uv run ruff check src_dir/)
  • Manual testing performed (describe below)

Manual testing details:

  1. Package structure: Verified all modules are properly importable
  2. Reference implementations: Manually ran all reference implementations to verify they all still work correctly
  3. Tests: Verified that all tests still pass
  4. CLI scripts: Verified that CLI scripts work

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated
  • No sensitive information (API keys, credentials) exposed

@fcogidi fcogidi requested a review from amrit110 January 26, 2026 18:59
@fcogidi fcogidi self-assigned this Jan 26, 2026
@amrit110
Copy link
Member

@fcogidi great initiative! Was super happy to see this PR. Will review it shortly.

@fcogidi fcogidi requested a review from Copilot January 26, 2026 19:08
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the codebase by extracting utility modules into a separate installable package (aieng-agents-utils), improving code organization and reusability across the Vector Institute Agent Bootcamp implementations.

Changes:

  • Created a new Python package structure with proper build configuration, console scripts, and comprehensive documentation
  • Migrated all utility modules (tools, data processing, async utilities, Langfuse integration, etc.) to the new package namespace
  • Updated all import statements across reference implementations to use the new package structure

Reviewed changes

Copilot reviewed 63 out of 77 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyproject.toml Updated project metadata and dependencies to use the new workspace package
aieng-agents-utils/pyproject.toml Added package configuration with build system, dependencies, and console scripts
aieng-agents-utils/README.md Created comprehensive documentation for the new utility package
aieng-agents-utils/aieng/agents/init.py Created main package entry point with public API exports
aieng-agents-utils/aieng/agents/tools/init.py Organized tool exports with proper all definitions
aieng-agents-utils/aieng/agents/prompts.py Consolidated system prompts into centralized module
src//app.py, src//cli.py Updated imports to reference new package namespace
tests/tool_tests/test_integration.py Updated test imports to use new package structure
aieng-agents-utils/tests/README.md Added test documentation with updated command paths

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants