
feat: Pluggable SQL dialect framework with ClickHouse dialect #5187

Draft
anirudha wants to merge 2 commits into opensearch-project:main from anirudha:feature/clickhouse-sql-dialect

Conversation


@anirudha anirudha commented Feb 26, 2026

#5184

Description

Adds a pluggable SQL dialect framework to the OpenSearch SQL plugin's Calcite-based query engine, with ClickHouse as the first dialect implementation. This enables Grafana dashboard migration use cases where users move from ClickHouse to OpenSearch without rewriting queries.

Framework

  • DialectPlugin interface, DialectRegistry, QueryPreprocessor in api module
  • Dialect routing via ?dialect= query parameter on /_plugins/_sql
  • Integration with existing Calcite query pipeline (CalcitePlanContext -> RelNode -> execution)
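A rough sketch of how these framework pieces could fit together; the interface and method names below are assumptions for illustration, not necessarily the PR's actual API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of the dialect SPI described above.
interface DialectPlugin {
  String dialectName();               // e.g. "clickhouse"
  String preprocess(String rawQuery); // rewrite before Calcite parsing
}

// Registry keyed by lower-cased dialect name; registration happens once at
// plugin startup, lookups happen per request.
class DialectRegistry {
  private final Map<String, DialectPlugin> dialects = new ConcurrentHashMap<>();

  void register(DialectPlugin plugin) {
    dialects.put(plugin.dialectName().toLowerCase(), plugin);
  }

  Optional<DialectPlugin> lookup(String name) {
    return name == null
        ? Optional.empty()
        : Optional.ofNullable(dialects.get(name.toLowerCase()));
  }
}
```

Under this shape, a request to /_plugins/_sql?dialect=clickhouse would resolve the plugin via lookup and run its preprocess step before handing the query to the Calcite pipeline.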

ClickHouse Dialect

  • Query preprocessor strips FORMAT, SETTINGS, FINAL clauses
  • ClickHouseOperatorTable translates 25+ ClickHouse functions to Calcite equivalents (time-bucketing, type-conversion, aggregates, conditionals)
  • OpenSearchClickHouseSqlDialect for RelNode-to-SQL unparsing
  • Error handling with position info, function/type identification, safe 500s

Testing

  • 20 property-based tests (jqwik, 100 iterations each)
  • Unit tests for registry, routing, error handling
  • Integration tests (ClickHouseDialectIT) for end-to-end pipeline
  • 43 ClickBench SQL compatibility queries (SQLClickBenchIT, CalciteSQLClickBenchIT)

Issues Resolved

N/A

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed off

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@anirudha anirudha force-pushed the feature/clickhouse-sql-dialect branch from 6177a52 to 4a0a670 on February 26, 2026 at 01:23
@anirudha anirudha force-pushed the feature/clickhouse-sql-dialect branch from 4a0a670 to 0b8c762 on February 26, 2026 at 01:40
@anirudha (Collaborator, Author):

Review comments:
Risk areas, correctness, and suggestions (prioritized)

HTTP/Parameter handling and validation (Critical)
Validate dialect parameter strictly and return a clear 4xx for unknown/invalid dialects rather than allowing a backend 500. I saw mention of safe 500s in the PR, but user input errors (unknown dialect) should be a 400 with a helpful message.
Sanitize the param and/or whitelist dialect names. If the registry is extensible by plugins, ensure discovery only exposes registered names.
Thread-safety, initialization, and lifecycle (High)
Ensure DialectRegistry is thread-safe and immutable after startup or that registrations are synchronized. A registry mutated at request-time could introduce subtle race conditions.
If dialect plugins hold per-dialect state, verify they are either stateless or properly synchronized. Caching (e.g., compiled regex, prebuilt operator maps) should be read-only after init.
Error code mapping (High)
Differentiate parse/semantic errors (400) vs internal mapping/unparse failures (500). The PR mentions improved error handling; please ensure errors from dialect translation (e.g., unsupported function mapping) are surfaced with an informative error code/type and not exposed with internal stack traces.
Preprocessor correctness and robustness (High)
The ClickHouse preprocessor strips FORMAT, SETTINGS, FINAL clauses. Ensure that stripping handles edge cases:
String literals containing those tokens,
Comments containing those tokens,
Upper/lower case variants,
Clause ordering/whitespace/newlines.
Add targeted tests that include these edge cases to avoid accidental clause removal inside strings or comments.
Semantic compatibility of function mappings (High)
ClickHouse functions and Calcite/OpenSearch behavior can differ subtly (null handling, integer vs float promotion, timezone behavior for time-bucket functions, aggregate edge cases, DISTINCT semantics).
For each mapped function, add tests asserting behavior for null inputs, boundary values, and type coercions.
Document known semantic differences and, where possible, adjust mappings to match ClickHouse behavior (if the goal is exact compatibility).
SQL unparsing and quoting (Medium)
OpenSearchClickHouseSqlDialect must produce SQL with proper identifier quoting and literal formatting matching ClickHouse expectations (e.g., backticks vs double quotes, boolean/literal syntax).
Verify that string escaping and date/time literal formatting are correct for ClickHouse and that unparse output is safe to present or use.
Security considerations (Medium)
If dialect plugins can add new execution behavior or unparsed SQL is sent to other systems, consider an explicit security review. For example, unparse output should not be executed against untrusted systems without validation.
Ensure the dialect parameter cannot be abused to influence server-side class loading or plugin activation in unintended ways.
Performance and caching (Medium)
Translating functions and unparsing on every request can be CPU-intensive. Ensure operator translation lookups, compiled patterns, and unparsers are cached and reused.
Consider benchmarking request latency for queries with dialect enabled to ensure acceptable overhead.
Compatibility and feature flagging (Medium)
Ensure default behavior (no dialect param) remains exactly as before and that adding the framework doesn’t change the public API contract.
Consider adding a runtime toggle or feature flag to enable/disable dialect routing for easier rollout and rollback.
Module organization, packaging, and SPI (Low → Medium)
If dialects are intended to be pluggable by third parties, document the plugin SPI and recommended packaging and classloading practices. Using Java ServiceLoader or a documented plugin registration API helps third parties discover and register dialects safely.
Test stability / flakiness (Low → Medium)
The PR contains many integration/property tests — run full CI (including integration tests) to surface potential flakiness. For property tests (jqwik), ensure deterministic seeds or allow easier repro on failure.
Observability and logging (Low)
Add meaningful logs when dialect selection occurs and when translations fail. Consider metrics for:
number of queries routed by dialect,
translation errors,
average translation/unparse latency.
Documentation (Low)
The PR mentions documentation; ensure it includes:
how to enable/select dialects,
list of supported ClickHouse functions with a "mapping table" showing behavioral notes,
known incompatibilities and recommended workarounds.
Minor/nit suggestions

Use an enum or central constants for dialect names to avoid typo-prone string comparisons.
Add JavaDoc to the DialectPlugin interface describing thread-safety expectations and lifecycle.
Consider unit tests that simulate multiple concurrent requests selecting dialects to validate thread-safety.
If any new public APIs are introduced in api module, ensure they are covered by API docs and have stable wording for back-compat.
Concrete things I can do next if you want

Do a focused file-level review and prepare GitHub review comments for specific files/classes (e.g., DialectRegistry, QueryPreprocessor, ClickHouseOperatorTable, OpenSearchClickHouseSqlDialect). I can post those comments or give a per-file checklist.
Produce a test-gap list mapping specific ClickBench query categories to additional unit/property tests to ensure parity.
Run or help run unit/integration tests locally (if you provide access/CI outputs) and report flaky tests or failures.

Adds a pluggable SQL dialect framework to the OpenSearch SQL plugin's
Calcite-based query engine, with ClickHouse as the first dialect.

Framework:
- DialectPlugin interface, DialectRegistry, QueryPreprocessor in api module
- Dialect routing via ?dialect= query parameter on /_plugins/_sql
- Integration with existing Calcite query pipeline

ClickHouse dialect:
- Query preprocessor strips FORMAT, SETTINGS, FINAL clauses
- ClickHouseOperatorTable translates 25+ ClickHouse functions to Calcite
  equivalents (time-bucketing, type-conversion, aggregates, conditionals)
- OpenSearchClickHouseSqlDialect for RelNode-to-SQL unparsing
- Error handling with position info, function/type identification, safe 500s

Testing:
- 20 property-based tests (jqwik, 100 iterations each)
- Unit tests for registry, routing, error handling
- Integration tests (ClickHouseDialectIT) for end-to-end pipeline
- 43 ClickBench SQL compatibility queries

Signed-off-by: anirudha <anirudha@nyu.edu>
Signed-off-by: Anirudha Jadhav <jadhanir@amazon.com>
@anirudha anirudha force-pushed the feature/clickhouse-sql-dialect branch from 0b8c762 to 2af3c5a on February 26, 2026 at 01:50
@anirudha (Collaborator, Author):

Preprocessor edge cases (fixed): Rewrote ClickHouseQueryPreprocessor to use a mask-and-restore approach — string literals ('...') and line comments (-- ...) are masked with placeholder tokens before the regex runs, then restored after. This prevents false matches on FORMAT, SETTINGS, and FINAL inside string content or comments.
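A minimal sketch of that mask-and-restore approach, assuming hypothetical names — the real ClickHouseQueryPreprocessor also handles block comments and a fuller SETTINGS grammar:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative clause stripper: string literals and line comments are masked
// with placeholders before the clause regex runs, then restored afterwards.
class ClauseStripper {
  // Single-quoted literals (with '' escapes) and -- line comments.
  private static final Pattern PROTECTED = Pattern.compile("'(?:[^']|'')*'|--[^\n]*");
  // Simplified top-level clauses to drop (case-insensitive).
  private static final Pattern CLAUSES =
      Pattern.compile("(?i)\\b(FORMAT\\s+\\w+|SETTINGS\\s+[^;]*|FINAL)\\b");

  static String strip(String query) {
    List<String> saved = new ArrayList<>();
    StringBuffer masked = new StringBuffer();
    Matcher m = PROTECTED.matcher(query);
    while (m.find()) {
      saved.add(m.group());
      // NUL-delimited placeholder; cannot collide with SQL text.
      m.appendReplacement(masked, "\u0000" + (saved.size() - 1) + "\u0000");
    }
    m.appendTail(masked);
    String stripped = CLAUSES.matcher(masked).replaceAll("");
    for (int i = 0; i < saved.size(); i++) {
      stripped = stripped.replace("\u0000" + i + "\u0000", saved.get(i));
    }
    return stripped.trim();
  }
}
```

The key property is that the clause regex never sees the contents of literals or comments, so `SELECT 'FORMAT' AS a` survives intact while a trailing `FORMAT JSON` is removed.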

Logging (added): Added LOG.info when routing to a dialect pipeline (includes request ID and dialect name) and LOG.debug for the preprocessed query (shows before/after).

New edge case tests (added): 3 new test methods in the preprocessor property test — keywords inside string literals are preserved, keywords inside line comments are preserved, and mixed-case keyword variants are properly stripped.

All tests pass across api, sql, and legacy modules. Amended commit and force-pushed to PR #5187.

@anirudha (Collaborator, Author):

Next review:
Updated review (prioritized, actionable)

Critical

1) Validate dialect parameter and return proper 4xx
Problem: unknown/misspelled dialect should produce a clear client error, not an internal 500.
Action: validate the query param against registry. For unknown dialect return 400 (Bad Request) with a JSON body:
fields: error_type, message, dialect_requested
Example message: "Unknown SQL dialect 'clickhous'. Supported dialects: [clickhouse]"
Add tests: route with invalid dialect, missing value, and with malicious values (long strings / control chars).
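A hedged sketch of that validation step — DialectParamValidator is an invented name; the real handler would translate the exception into a 400 JSON body with error_type, message, and dialect_requested:

```java
import java.util.Set;

// Illustrative whitelist validation for the ?dialect= query parameter.
class DialectParamValidator {
  static String validate(String requested, Set<String> registered) {
    if (requested == null || requested.trim().isEmpty()) {
      throw new IllegalArgumentException("Dialect parameter must be non-empty.");
    }
    String name = requested.trim().toLowerCase();
    // Shape check first (rejects control chars and oversized input), then a
    // membership check against the registry's known names.
    if (!name.matches("[a-z][a-z0-9_]{0,63}") || !registered.contains(name)) {
      throw new IllegalArgumentException(
          "Unknown SQL dialect '" + requested + "'. Supported dialects: " + registered);
    }
    return name;
  }
}
```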
2) Use a parser/tokenizer for the preprocessor, not ad-hoc regex
Problem: naive string removal of FORMAT / SETTINGS / FINAL may remove tokens inside strings/comments and is fragile with whitespace/casing.
Action: implement preprocessor using a SQL tokenizer or Calcite SqlParser to identify top-level clause nodes and strip them safely. If using regex is unavoidable, add strict tests (see below).
Add tests: queries containing the tokens in string literals, comments, inside function args, mixed-case tokens, and different clause orders.
High

3) DialectRegistry thread-safety and lifecycle

Problem: registry mutation at request-time or non-atomic registration leads to races.
Action:
Make registry immutable after startup (e.g., final ImmutableMap) or guard registration with synchronized blocks.
Document lifecycle: when registration occurs (startup) and expected thread-safety contract.
Add tests: concurrent registration/lookup stress test (or simulate concurrent requests) to prove safety.
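The concurrent-lookup stress test could be sketched like this, with a plain immutable Map standing in for the frozen registry:

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Fires many lookups from several threads and counts successful hits; a
// frozen (immutable) registry should serve all of them without races.
class RegistryStressTest {
  static int concurrentLookups(Map<String, String> frozen, int threads, int perThread) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger hits = new AtomicInteger();
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < perThread; i++) {
          if (frozen.get("clickhouse") != null) {
            hits.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return hits.get();
  }
}
```

Every thread must observe the full map contents; a registry mutated mid-flight would make the hit count nondeterministic.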
4) Function mapping semantics (ClickHouse → Calcite)
Problem: mapped functions may differ in null-handling, type promotion, timezones, DISTINCT semantics.
Action:
For each mapped function, add a small unit of behavioral tests: null inputs, boundary values, type combos.
Where exact ClickHouse parity is required, add per-function notes in code comments and in docs.
Prefer explicit casts where ClickHouse does implicit promotions.
5) Error classification and sanitization
Problem: mapping/unparse failures may leak internal state or produce unclear 500s.
Action:
Map parse/semantic errors to 4xx (400/422) where appropriate.
Map internal failures to 500 but sanitize message and include:
code, short message, optional position info (line/col), and an internal_id for log lookup.
Ensure stack traces are only logged, not returned to clients.
Medium

6) SQL unparsing correctness

Problem: Calcite unparsing needs ClickHouse-compatible quoting/formatting.
Action:
Verify identifier quoting, literal escaping, and date/time literal formats from OpenSearchClickHouseSqlDialect.
Add round-trip tests: parse ClickHouse query → RelNode → unparse → re-parse (or compare to expected SQL strings).
7) Caching and performance
Problem: function mapping/unparse on every request could be expensive.
Action:
Cache operator lookups, compiled regex/patterns, unparsers if safe.
Use thread-safe caches (ConcurrentHashMap) and bounded caches if entries can grow.
Add benchmarks: cold vs warm latency with and without dialect enabled.
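One minimal shape for such a bounded, thread-safe cache — a sketch only; a production implementation would likely reach for Caffeine or Guava instead:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap that evicts its eldest entry once the cap is
// exceeded, wrapped in synchronizedMap so individual operations are
// thread-safe.
class BoundedCache<K, V> {
  private final Map<K, V> map;

  BoundedCache(int maxEntries) {
    this.map = Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
      }
    });
  }

  V get(K key) { return map.get(key); }
  void put(K key, V value) { map.put(key, value); }
  int size() { return map.size(); }
}
```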
8) Observability
Action:
Add logs at selection time (INFO/DEBUG depending on verbosity) indicating dialect in use and metric counters for routing failures and translation failures.
Expose metrics: dialect_requests_total, dialect_translation_errors_total, dialect_unparse_latency_ms.
Low / Cosmetic

9) API docs & migration guide

Action:
Add README entry with:
how to enable dialect via ?dialect=,
list of supported ClickHouse functions and known differences,
examples of clause stripping, and migration tips for Grafana users.
10) API/Code hygiene
Action:
Use central constants/enum for dialect names.
Add JavaDoc for DialectPlugin describing thread-safety and lifecycle.
Use ServiceLoader or documented SPI to enable third-party dialects safely.
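The ServiceLoader route can be sketched as follows; DialectPlugin is redeclared locally for illustration, and with no META-INF/services descriptor on the classpath the loader simply finds no providers:

```java
import java.util.ServiceLoader;

// Stand-in for the SPI type a descriptor file would name.
interface DialectPlugin {
  String dialectName();
}

class DialectDiscovery {
  // Iterates every provider listed in
  // META-INF/services/<fully.qualified.DialectPlugin> on the classpath.
  static int discoverAndCount() {
    int count = 0;
    for (DialectPlugin p : ServiceLoader.load(DialectPlugin.class)) {
      count++;
    }
    return count;
  }
}
```

A third-party JAR dropped on the plugin's classpath with a matching descriptor line would then be discovered and registered at startup without code changes.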
Concrete test list to add (explicit)

Preprocessor edge cases:
"SELECT 'FORMAT' as a FROM t" — should not strip FORMAT from string
Queries with /* comment with FORMAT */ and -- comment FORMAT
Mixed-case keywords (Format, format, FORMAT)
FORMAT ... appearing in nested subqueries or function args
Function mapping tests:
For each mapped function, test null input, empty inputs, integer overflow/underflow, and type combinations
Error handling tests:
Unknown dialect -> 400
Malformed ClickHouse query -> 400/422 with position
Unsupported function mapping -> 422 with function name and suggestion
Concurrency:
Start N threads issuing dialect-enabled queries simultaneously and assert no exceptions or registry races
Round-trip unparsing:
A small set of queries → parse → RelNode → unparse → expected SQL or acceptance by ClickBench harness
Suggested per-file review checklist (I can draft comments)

DialectRegistry:
Ensure immutability or synchronized registration
Expose a list of registered names for UI/docs
DialectPlugin interface:
Document thread-safety and init/close lifecycle
QueryPreprocessor / ClickHousePreprocessor:
Replace regex with proper parser/tokenizer
Add unit tests for edge cases
ClickHouseOperatorTable:
Add per-function tests and comments documenting semantic differences
Ensure use of immutable maps and thread-safe caches
OpenSearchClickHouseSqlDialect:
Confirm quoting/escaping and literal formatting
Add round-trip tests
Dialect routing (HTTP handler):
Validate param, ensure secure parsing, log and metric hooks

…st pollution

The RestSQLQueryActionObservabilityTest was setting LocalClusterState
to a mock in @Before but never restoring it, causing a NullPointerException
in QueryFunctionsTest and unnecessary Mockito stubbings in
SqlRequestFactoryTest when the tests ran in sequence.

Added an @After tearDown to reset LocalClusterState to null.
@github-actions (Contributor):

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d88e27c.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

* lock-free reads. After this call, {@link #register} will throw {@link IllegalStateException}.
*/
public synchronized void freeze() {
this.dialects = Map.copyOf(mutableDialects);
Member:

public synchronized void freeze() {
  if (!frozen) {
    this.dialects = Map.copyOf(mutableDialects);
    this.frozen = true;
  }
}

Member:

Should we clear the mutableDialects here?
mutableDialects.clear()

Member:

Q: why do we duplicate the dialects map into mutableDialects and immutableDialects? IMO, the ConcurrentHashMap mutableDialects is sufficient, and the "freeze" operation is unnecessary.

Comment on lines +23 to +24
testImplementation('org.junit.jupiter:junit-jupiter:5.9.3')
testImplementation('net.jqwik:jqwik:1.9.2')
Member:

jqwik 1.9.2 is not compatible with JUnit 5.9.3

Member:

[optional] #1974 suggested eliminating JUnit 5; I would prefer not to introduce junit-jupiter in the api module.

  PPL,
- SQL
+ SQL,
+ CLICKHOUSE;
Member:

Q: should the CLICKHOUSE dialect be a new QueryType?

Comment on lines +391 to +401
1. Build your ``DialectPlugin`` implementation into a JAR file.

2. Create a ServiceLoader descriptor file in your JAR at::

META-INF/services/org.opensearch.sql.api.dialect.DialectPlugin

3. The file should contain the fully qualified class name of your implementation, one per line::

com.example.dialect.MyCustomDialectPlugin

4. Place the JAR on the OpenSearch SQL plugin's classpath.
Member:

Do we need a security review here?

* Tests the full pipeline: REST request with dialect=clickhouse
* -> preprocessing -> Calcite parsing -> execution -> JSON response.
*
* <p>Validates Requirements 10.2, 10.3, 10.4, 10.5.
Member:

What does "Validates Requirements 10.2, 10.3, 10.4, 10.5" mean?

@@ -0,0 +1 @@
SELECT COUNT(*) FROM hits No newline at end of file
Member:

Can we add some ClickHouse-specific queries?

sendErrorResponse(
channel,
"Dialect parameter must be non-empty.",
RestStatus.BAD_REQUEST);
Member:

Would this cause any failure when the endpoint is localhost:9200/_plugins/_sql, i.e., no dialect parameter is supplied?

* <li>Internal errors: 500 with generic message, stack trace logged at ERROR level
* </ul>
*/
private void executeDialectQuery(
Member:

Not a good design to execute the dialect query here, especially calling Calcite directly.

* Tokenize the query into a list of tokens using a character-by-character state machine. Tracks
* string literals, block comments, line comments, and parenthesis depth.
*/
List<Token> tokenize(String query) {
Member:

Is this method copied from ClickHouse?
