
Add GEPA Optimize_Anything skill eval and optimize Capability for Skill evaluation#166

Draft
auschoi96 wants to merge 24 commits into databricks-solutions:main from auschoi96:main

Conversation

@auschoi96
Collaborator

Adds to the .test directory to support GEPA optimization and skill evaluation. Also adds a README explaining how to use it.

@auschoi96
Collaborator Author

A significant portion of the changed lines comes from the generated YAML files: manifest, ground_truth, and candidates.

auschoi96 and others added 16 commits February 26, 2026 14:14
Add 7-lakehouse-monitoring.md reference file covering quality monitors,
profile types (Snapshot, TimeSeries, InferenceLog), MCP tool usage, and
Python SDK examples. Update SKILL.md with trigger condition and reference
table entry.

Tested against a live Databricks workspace - created and verified a
snapshot monitor on a Unity Catalog table.
- SKILL.md: updated trigger bullet and reference table to data profiling
- Renamed 7-lakehouse-monitoring.md to 7-data-profiling.md with new
  w.data_quality SDK examples
- Added new Data Quality docs and SDK references, kept legacy Lakehouse
  Monitoring SDK link for backward compatibility
…ently. This is because tools are used universally so we may not be able to optimize the two together
@auschoi96
Collaborator Author

This PR touches many files, but most are data, not code:

  • ~30 ground_truth.yaml + manifest.yaml files = test datasets for 16 skills (evaluation data, not code)
  • ~12 source files under src/skill_test/optimize/ = the actual framework
  • ~5 scripts = CLI entry points and test case generation utilities

For a full explanation of what's happening, see the README at .test/README.md.
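
For reviewers who want to check the data-versus-code split themselves, here is a minimal sketch that buckets changed file paths into the three groups listed above. The function names, the `scripts` directory rule, and the example paths are hypothetical and are not taken from the PR; only the `ground_truth.yaml` and `manifest.yaml` filenames and the `src/skill_test/optimize/` prefix come from the comment.

```python
from collections import Counter
from pathlib import PurePosixPath

# Filenames the comment describes as evaluation data, not code.
DATA_FILENAMES = {"ground_truth.yaml", "manifest.yaml"}


def classify(path: str) -> str:
    """Bucket a changed file path as data, framework, script, or other."""
    p = PurePosixPath(path)
    if p.name in DATA_FILENAMES:
        return "data"
    if "src/skill_test/optimize/" in p.as_posix():
        return "framework"
    # Assumption: CLI entry points live in a directory named "scripts".
    if "scripts" in p.parts[:-1]:
        return "script"
    return "other"


def summarize(paths):
    """Count how many changed files fall into each bucket."""
    return Counter(classify(p) for p in paths)


# Hypothetical paths for illustration only; the real PR layout may differ.
example_paths = [
    ".test/skills/example-skill/ground_truth.yaml",
    ".test/skills/example-skill/manifest.yaml",
    ".test/src/skill_test/optimize/runner.py",
    ".test/scripts/generate_cases.py",
    ".test/README.md",
]

print(summarize(example_paths))
```

The list of changed paths from the PR (for example, from `git diff --name-only`) can be passed to `summarize` to get the counts per bucket.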

3 participants