-
Notifications
You must be signed in to change notification settings - Fork 379
Add autotuner for benchmark parameter optimization with path-based configuration #736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
|
@copilot this design is not robust to changes to the benchmark framework. If the benchmark framework and params are changed, the approach in this PR can be stale and eventually flaky. Is there a different strategy that would help this stay in sync with the benchmark framework |
You're absolutely right. The current implementation hardcodes JSON paths like I'll refactor to use a JSONPath-based override mechanism that:
Working on the refactor now.
Refactored to use a path-based configuration system in commit ab17601. The new design addresses your concern by:
Example sweep configuration: {
"parameters": [
{"path": "jobs.0.content.source.max_degree", "values": [16, 32, 64]},
{"path": "jobs.0.content.source.l_build", "values": [50, 75, 100]}
]
}See updated AUTOTUNER.md for full documentation of the path-based approach. |
The new design addresses maintainability concerns by: - Using JSON paths instead of hardcoded structure assumptions - Making the tool adaptable to any benchmark configuration format - Allowing users to specify which parameters to sweep via paths - Staying in sync with benchmark framework changes automatically Users now specify parameters as path-value pairs (e.g., "jobs.0.content.source.max_degree": [16, 32, 64]) instead of relying on fixed parameter names. This makes the tool robust to changes in the benchmark framework's JSON schema. Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
The benchmark framework measures QPS/latency/recall for given configurations but lacks automated parameter tuning. This adds a tool to sweep parameters and identify optimal configurations.
Implementation
New binary:
diskann-tools/bin/autotuner"jobs.0.content.source.max_degree")CLI commands:
Path-based configuration format:
{ "parameters": [ {"path": "jobs.0.content.source.max_degree", "values": [16, 32, 64]}, {"path": "jobs.0.content.source.l_build", "values": [50, 75, 100]}, {"path": "jobs.0.content.search_phase.runs.0.search_l", "values": [[10, 20, 30], [20, 40, 60]]} ] }Design benefits:
Changes
diskann-tools/src/bin/autotuner.rs- main implementation with path-based logic and 6 testsdiskann-tools/AUTOTUNER.md- comprehensive documentation of path-based approachdiskann-benchmark/README.md- reference autotunerdiskann-tools/Cargo.toml- add serde_json dependencyAddressing Review Feedback
The initial implementation used hardcoded JSON paths which would break if the benchmark framework changed. The tool has been refactored to use a path-based configuration system where users specify JSON paths in their sweep configuration. This makes the tool robust to changes in the benchmark framework - if the schema changes, users only need to update their sweep config, not the tool itself.
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.