feat: Pluggable SQL dialect framework with ClickHouse dialect#5187
feat: Pluggable SQL dialect framework with ClickHouse dialect#5187anirudha wants to merge 2 commits intoopensearch-project:mainfrom
Conversation
6177a52 to
4a0a670
Compare
sql/src/main/java/org/opensearch/sql/sql/dialect/clickhouse/ClickHouseQueryPreprocessor.java
Fixed
Show fixed
Hide fixed
4a0a670 to
0b8c762
Compare
|
review comments/ HTTP/Parameter handling and validation (Critical) Use an enum or central constants for dialect names to avoid typo-prone string comparisons. Do a focused file-level review and prepare GitHub review comments for specific files/classes (e.g., DialectRegistry, QueryPreprocessor, ClickHouseOperatorTable, OpenSearchClickHouseSqlDialect). I can post those comments or give a per-file checklist. |
Adds a pluggable SQL dialect framework to the OpenSearch SQL plugin's Calcite-based query engine, with ClickHouse as the first dialect. Framework: - DialectPlugin interface, DialectRegistry, QueryPreprocessor in api module - Dialect routing via ?dialect= query parameter on /_plugins/_sql - Integration with existing Calcite query pipeline ClickHouse dialect: - Query preprocessor strips FORMAT, SETTINGS, FINAL clauses - ClickHouseOperatorTable translates 25+ ClickHouse functions to Calcite equivalents (time-bucketing, type-conversion, aggregates, conditionals) - OpenSearchClickHouseSqlDialect for RelNode-to-SQL unparsing - Error handling with position info, function/type identification, safe 500s Testing: - 20 property-based tests (jqwik, 100 iterations each) - Unit tests for registry, routing, error handling - Integration tests (ClickHouseDialectIT) for end-to-end pipeline - 43 ClickBench SQL compatibility queries Signed-off-by: anirudha <anirudha@nyu.edu> Signed-off-by: Anirudha Jadhav <jadhanir@amazon.com>
0b8c762 to
2af3c5a
Compare
|
Preprocessor edge cases (fixed): Rewrote ClickHouseQueryPreprocessor to use a mask-and-restore approach — string literals ('...') and line comments (-- ...) are masked with placeholder tokens before the regex runs, then restored after. This prevents false matches on FORMAT, SETTINGS, and FINAL inside string content or comments. Logging (added): Added LOG.info when routing to a dialect pipeline (includes request ID and dialect name) and LOG.debug for the preprocessed query (shows before/after). New edge case tests (added): 3 new test methods in the preprocessor property test — keywords inside string literals are preserved, keywords inside line comments are preserved, and mixed-case keyword variants are properly stripped. All tests pass across api, sql, and legacy modules. Amended commit and force-pushed to PR #5187. |
|
next review Critical Validate dialect parameter and return proper 4xx Problem: registry mutation at request-time or non-atomic registration leads to races. Problem: Calcite unparsing needs ClickHouse-compatible quoting/formatting. Action: Preprocessor edge cases: DialectRegistry: |
…st pollution The RestSQLQueryActionObservabilityTest was setting LocalClusterState to a mock in @before but never restoring it, causing NullPointerException in QueryFunctionsTest and unnecessary Mockito stubbings in SqlRequestFactoryTest when tests ran in sequence. Added @after tearDown to reset LocalClusterState to null.
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit d88e27c. 'Diff too large, requires skip by maintainers after manual review' Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
| * lock-free reads. After this call, {@link #register} will throw {@link IllegalStateException}. | ||
| */ | ||
| public synchronized void freeze() { | ||
| this.dialects = Map.copyOf(mutableDialects); |
There was a problem hiding this comment.
public synchronized void freeze() {
if (!frozen) {
this.dialects = Map.copyOf(mutableDialects);
this.frozen = true;
}
}
There was a problem hiding this comment.
Should we clear the mutableDialects here?
mutableDialects.clear()
There was a problem hiding this comment.
Q: why we duplicate the dialects map to mutableDialects and immutableDialects? IMO, the CurrentHashMap mutableDialects is sufficient and no necessary to do "freeze" operation.
| testImplementation('org.junit.jupiter:junit-jupiter:5.9.3') | ||
| testImplementation('net.jqwik:jqwik:1.9.2') |
There was a problem hiding this comment.
jqwik 1.9.2 is not compatible with JUnit 5.9.3
There was a problem hiding this comment.
[optional] #1974 suggested to eliminate JUnit5, I prefer to not introduce junit-jupiter in api module.
| PPL, | ||
| SQL | ||
| SQL, | ||
| CLICKHOUSE; |
There was a problem hiding this comment.
Q: should the CLICKHOUSE dialect be a new QueryType?
| 1. Build your ``DialectPlugin`` implementation into a JAR file. | ||
|
|
||
| 2. Create a ServiceLoader descriptor file in your JAR at:: | ||
|
|
||
| META-INF/services/org.opensearch.sql.api.dialect.DialectPlugin | ||
|
|
||
| 3. The file should contain the fully qualified class name of your implementation, one per line:: | ||
|
|
||
| com.example.dialect.MyCustomDialectPlugin | ||
|
|
||
| 4. Place the JAR on the OpenSearch SQL plugin's classpath. |
There was a problem hiding this comment.
Do we need a security review here?
| * Tests the full pipeline: REST request with dialect=clickhouse | ||
| * -> preprocessing -> Calcite parsing -> execution -> JSON response. | ||
| * | ||
| * <p>Validates Requirements 10.2, 10.3, 10.4, 10.5. |
There was a problem hiding this comment.
what Validates Requirements 10.2, 10.3, 10.4, 10.5. means?
| @@ -0,0 +1 @@ | |||
| SELECT COUNT(*) FROM hits No newline at end of file | |||
There was a problem hiding this comment.
Can we add some Clickhouse-specific queries?
| sendErrorResponse( | ||
| channel, | ||
| "Dialect parameter must be non-empty.", | ||
| RestStatus.BAD_REQUEST); |
There was a problem hiding this comment.
would this cause any failure when endpoint is localhost:9200/_plugins/_sql
| * <li>Internal errors: 500 with generic message, stack trace logged at ERROR level | ||
| * </ul> | ||
| */ | ||
| private void executeDialectQuery( |
There was a problem hiding this comment.
Not a good design to execute dialect query, especially calling Calcite here.
| * Tokenize the query into a list of tokens using a character-by-character state machine. Tracks | ||
| * string literals, block comments, line comments, and parenthesis depth. | ||
| */ | ||
| List<Token> tokenize(String query) { |
There was a problem hiding this comment.
is this method copied from ClickHouse?
#5184
Description
Adds a pluggable SQL dialect framework to the OpenSearch SQL plugin's Calcite-based query engine, with ClickHouse as the first dialect implementation. This enables Grafana dashboard migration use cases where users move from ClickHouse to OpenSearch without rewriting queries.
Framework
ClickHouse Dialect
Testing
Issues Resolved
N/A
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.