Fix MAP field path resolution for `top`, `rare`, `join`, `lookup` and `streamstats` commands by dai-chen · Pull Request #5206 · opensearch-project/sql

dai-chen · 2026-03-05T20:10:46Z

Description

TODO: Follow-up to #5198. This PR extends FieldPathPreMaterializer to resolve MAP dotted paths (e.g. doc.user.name) for additional PPL commands that were not covered in the initial fix.

Related Issues

Resolves #5152

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>

github-actions · 2026-03-05T20:12:19Z

PR Reviewer Guide 🔍

(Review updated until commit `6cabe6f`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections core/src/main/java/org/opensearch/sql/calcite/MapPathPreMaterializer.java [98]: Temporary catch for AssertionError from QualifiedNameResolver. integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLMapPathIT.java [547]: Correlate path loses materialized MAP columns; needs separate investigation. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [201]: Follow-up to Fix MAP path resolution for symbol-based PPL commands #5198 mentioned in PR description.
🔀 Multiple PR themes Sub-PR theme: Fix itemName calculation bug in QualifiedNameResolver.resolveFieldAccess Relevant files: core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java Sub-PR theme: Extend MapPathPreMaterializer to support join, lookup, streamstats, top, and rare commands Relevant files: core/src/main/java/org/opensearch/sql/calcite/MapPathPreMaterializer.java core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java core/src/test/java/org/opensearch/sql/calcite/MapPathPreMaterializerTest.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLMapPathIT.java
⚡ Recommended focus areas for review Silent Error Swallowing The catch block now catches both `RuntimeException` and `AssertionError`, silently swallowing errors and only logging at DEBUG level. This could hide real bugs where fields fail to resolve for unexpected reasons unrelated to wildcards or non-MAP types. The FIXME comment acknowledges this is temporary, but there is no tracking issue or deadline, and the broad catch may mask regressions in production. } catch (RuntimeException \| AssertionError e) { // FIXME: Temporary catch for AssertionError from QualifiedNameResolver. // Skip unresolvable fields (e.g. wildcards, dotted paths on non-MAP types); // let the command itself handle them and throw its own error. log.debug("Skipping field resolution for '{}': {}", field.getField(), e.getMessage()); Bug Fix Correctness The original code used `parts.size() - length` as the second argument to `joinParts`, but the fix uses `parts.size() - remainingStart` (i.e., `parts.size() - (length + start)`). This changes the semantics of the length calculation. The fix should be validated carefully to ensure it correctly handles all combinations of `start`, `length`, and `parts.size()`, especially edge cases where `start > 0`. int remainingStart = length + start; int remainingLength = parts.size() - remainingStart; String itemName = joinParts(parts, remainingStart, remainingLength); Double Materialization `visitJoin` explicitly calls `mapPathMaterializer.materializePaths(node, context)` after analyzing children, but `visitChildren` (called from `visitRareTopN` and others) also calls `materializePaths`. For `Join`, since `visitJoin` calls `analyze` on children directly (not `visitChildren`), this is intentional. However, it's worth verifying that no code path causes `materializePaths` to be called twice for the same node, which could result in duplicate projected columns. mapPathMaterializer.materializePaths(node, context); Empty Test Body `testStreamstatsGlobalWindowByMapPath` has an empty test body with a TODO comment explaining the correlate path is broken. An empty test provides no value and may give false confidence. It should either be marked with `@Disabled` with a linked issue, or removed until the fix is implemented. public void testStreamstatsGlobalWindowByMapPath() throws IOException { // streamstats with global=true, window>0, and group-by on MAP sub-path // This exercises the correlate path: buildGroupFilter + buildRequiredLeft. // TODO: The correlate path loses materialized MAP columns when building the right-side // scan. This needs a separate investigation into how projectPlus interacts with the // correlate variable creation in buildStreamWindowJoinPlan. }

github-actions · 2026-03-05T20:12:52Z

PR Code Suggestions ✨

Latest suggestions up to 6cabe6f

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Remove redundant unwrapping of original unmodified list The `groupExprList` local variable was already transformed to wrap each `Field` in an `Alias`, but here `node.getGroupExprList()` (the original, unmodified list) is used instead. This means the unwrapping logic is redundant — the original list already contains plain `Field` expressions. Using `node.getGroupExprList()` directly (without the stream/map) would be cleaner and less error-prone. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2414-2419] -List<RexNode> partitionKeys = - rexVisitor.analyze( - node.getGroupExprList().stream() - .map(expr -> expr instanceof Alias ? ((Alias) expr).getDelegated() : expr) - .toList(), - context); +List<RexNode> partitionKeys = rexVisitor.analyze(node.getGroupExprList(), context); Suggestion importance[1-10]: 6 __ Why: The suggestion correctly points out that `node.getGroupExprList()` returns the original unmodified list (plain `Field` expressions), making the `instanceof Alias` unwrapping logic dead code. Using `node.getGroupExprList()` directly is cleaner and equivalent, reducing unnecessary complexity.	Low
	Avoid swallowing AssertionError in catch block Catching `AssertionError` is marked as a FIXME and is a code smell — `AssertionError` indicates a programming error in Calcite's internals, not a recoverable condition. Silently swallowing it may hide real bugs. The root cause in `QualifiedNameResolver` should be fixed so that it throws an `IllegalArgumentException` instead of triggering an `AssertionError`, eliminating the need to catch it here. core/src/main/java/org/opensearch/sql/calcite/MapPathPreMaterializer.java [97-102] -} catch (RuntimeException \| AssertionError e) { - // FIXME: Temporary catch for AssertionError from QualifiedNameResolver. +} catch (RuntimeException e) { // Skip unresolvable fields (e.g. wildcards, dotted paths on non-MAP types); // let the command itself handle them and throw its own error. log.debug("Skipping field resolution for '{}': {}", field.getField(), e.getMessage()); } Suggestion importance[1-10]: 5 __ Why: Catching `AssertionError` is indeed a code smell and the PR itself marks it as a FIXME. The suggestion is valid in principle, but the PR author is aware of this and the fix requires changes in `QualifiedNameResolver` to throw a proper exception instead. The suggestion doesn't provide a complete solution, just removes the catch without fixing the root cause.	Low
	Filter only dotted paths for lookup materialization `getMappingAliasMap()` maps lookup-index field names to source field names (the `AS` targets). Only the source-side names (values) that are dotted MAP paths need pre-materialization. However, if the map semantics are reversed (keys are source names, values are lookup names), this would incorrectly materialize lookup-index field names instead. Confirm the map direction matches the intent, and consider filtering to only entries whose value actually contains a dot to avoid unnecessary work. core/src/main/java/org/opensearch/sql/calcite/MapPathPreMaterializer.java [75-78] case Lookup lookup -> lookup.getMappingAliasMap().values().stream() + .filter(name -> name.contains(".")) .map(name -> new Field(QualifiedName.of(List.of(name.split("\\."))))) .toList(); Suggestion importance[1-10]: 4 __ Why: Adding a `.filter(name -> name.contains("."))` guard is a minor optimization that avoids unnecessary `Field` creation for non-dotted names. It's a valid improvement but has low impact since non-dotted fields would simply resolve to `INPUT_REF` kind and be skipped in `doMaterializeMapPaths` anyway.	Low
Possible issue	Verify field-access index arithmetic correctness The original code passed `parts.size() - length` as the length argument to `joinParts`, but the new code passes `parts.size() - remainingStart` which equals `parts.size() -` `length - start`. If `start > 0`, this produces a different (shorter) result. Verify that the intended behavior is to join all remaining parts after `remainingStart`, and confirm the `joinParts` signature interprets the second argument as a start index and the third as a count/length — if the third argument is a count, then `remainingLength` should be `parts.size() - remainingStart`, which is correct only when `start == 0`. Please add a unit test covering `start > 0` to confirm correctness. core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [291-293] int remainingStart = length + start; int remainingLength = parts.size() - remainingStart; +// Verify: joinParts(parts, fromIndex, count) — remainingLength is the number of parts after the matched prefix String itemName = joinParts(parts, remainingStart, remainingLength); Suggestion importance[1-10]: 5 __ Why: The old code used `parts.size() - length` as the length argument to `joinParts`, while the new code uses `parts.size() - remainingStart` which equals `parts.size() - length - start`. This is a genuine behavioral change that could be a bug fix or introduce a regression depending on the `joinParts` signature semantics. The suggestion correctly identifies this discrepancy and asks for verification.	Low

Previous suggestions

Suggestions up to commit 50f9c6d

Category	Suggestion	Impact
Possible issue	Remove duplicate field path materialization call The `fieldPathMaterializer.materializePaths(node, context)` call in `visitJoin` is redundant because `visitChildren` already calls `fieldPathMaterializer.materializePaths` for every `UnresolvedPlan` node, including `Join`. This double invocation could cause duplicate column projections or unexpected behavior. The `Join` case was also added to `extractFieldOperands` in `FieldPathPreMaterializer`, so it will be handled by the existing `visitChildren` flow. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1384-1385] children.forEach(c -> analyze(c, context)); -fieldPathMaterializer.materializePaths(node, context); Suggestion importance[1-10]: 7 __ Why: The `visitJoin` method calls `fieldPathMaterializer.materializePaths(node, context)` explicitly after `children.forEach(c -> analyze(c, context))`, but `visitChildren` already handles this for `UnresolvedPlan` nodes. However, `visitJoin` doesn't call `visitChildren` - it calls `analyze` directly on children, so the flow is different. The `Join` case in `extractFieldOperands` handles join fields, but the explicit call here may still be needed since `visitChildren` isn't invoked. This needs careful verification.	Medium
	Avoid catching fatal JVM errors Catching `Throwable` is overly broad and will silently swallow serious JVM errors like `OutOfMemoryError` or `StackOverflowError`. The original code caught `RuntimeException`, and the test for `AssertionError` handling can be addressed by catching `Exception` instead, which covers both checked and unchecked exceptions while still allowing fatal errors to propagate. core/src/main/java/org/opensearch/sql/calcite/FieldPathPreMaterializer.java [111-115] -} catch (Throwable e) { +} catch (Exception e) { // Skip unresolvable fields (e.g. wildcards, dotted paths on non-MAP types); // let the command itself handle them and throw its own error. log.debug("Skipping field resolution for '{}': {}", field.getField(), e.getMessage()); } Suggestion importance[1-10]: 7 __ Why: Catching `Throwable` is overly broad and will silently swallow serious JVM errors like `OutOfMemoryError`. Changing to `Exception` would cover both checked and unchecked exceptions (including `AssertionError`'s parent `Error` is excluded) while allowing fatal errors to propagate. This is a valid concern for production code quality.	Medium
	Use correct list variable for partition keys The `groupExprList` local variable was already transformed to wrap each `Field` in an `Alias`, but `node.getGroupExprList()` is used here instead of the local `groupExprList`. This means the unwrapping logic `expr instanceof Alias ? ((Alias)` `expr).getDelegated() : expr` is operating on the original (non-aliased) list, making the mapping a no-op. Use the local `groupExprList` to be consistent, or simply use `node.getGroupExprList()` directly since those are the original `Field` expressions. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2414-2419] -List<RexNode> partitionKeys = - rexVisitor.analyze( - node.getGroupExprList().stream() - .map(expr -> expr instanceof Alias ? ((Alias) expr).getDelegated() : expr) - .toList(), - context); +List<RexNode> partitionKeys = rexVisitor.analyze(node.getGroupExprList(), context); Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `node.getGroupExprList()` returns the original `Field` expressions (not wrapped in `Alias`), so the `instanceof Alias` check is always false and the mapping is a no-op. Using `node.getGroupExprList()` directly is cleaner and equivalent, making the improved code accurate.	Low
General	Verify correct map side used for field extraction The `getMappingAliasMap()` maps lookup field names to source field names (the `AS` clause). The values represent the source-side field names to materialize. However, the keys of this map (the lookup index field names) should not be materialized from the source relation. Verify that `getMappingAliasMap().values()` indeed returns the source-side field names and not the lookup-side names, as materializing lookup-index field names against the source relation would silently produce no-ops or incorrect projections. core/src/main/java/org/opensearch/sql/calcite/FieldPathPreMaterializer.java [87-90] +case Lookup lookup -> + lookup.getMappingAliasMap().values().stream() + .map(name -> new Field(QualifiedName.of(List.of(name.split("\\."))))) + .toList(); - Suggestion importance[1-10]: 2 __ Why: The `existing_code` and `improved_code` are identical, making this a verification suggestion rather than an actual code change. The suggestion asks to verify correctness but proposes no modification.	Low

Suggestions up to commit de1d42b

Category	Suggestion	Impact
Possible issue	Avoid catching fatal JVM errors silently Catching `Throwable` is overly broad and will silently swallow `OutOfMemoryError`, `StackOverflowError`, and other fatal JVM errors that should never be suppressed. The original code caught `RuntimeException`, which was already broad enough. Change the catch to `Exception` to avoid masking fatal errors while still handling `AssertionError` from Calcite (which is a checked concern here). core/src/main/java/org/opensearch/sql/calcite/FieldPathPreMaterializer.java [111-114] -} catch (Throwable e) { +} catch (Exception e) { // Skip unresolvable fields (e.g. wildcards, dotted paths on non-MAP types); // let the command itself handle them and throw its own error. log.debug("Skipping field resolution for '{}': {}", field.getField(), e.getMessage()); } Suggestion importance[1-10]: 6 __ Why: Catching `Throwable` is indeed overly broad and can mask fatal JVM errors like `OutOfMemoryError`. The test `testSkipsErrorField` uses `AssertionError` specifically to test this catch block, but changing to `Exception` would not catch `AssertionError` (which extends `Error`, not `Exception`). The improved code would break the existing test, making this suggestion partially incorrect as written.	Low
Possible issue	Use last part of qualified name as alias `f.getField().toString()` may return a fully-qualified dotted name (e.g. `"doc.user.name"`), but `Alias` typically expects a simple identifier as its name. If `Field.getField()` returns a `QualifiedName`, calling `toString()` on it may produce a dotted string that doesn't match how the column is later looked up. Consider using only the last part of the qualified name (e.g. `f.getField().getParts().get(f.getField().getParts().size()-1)`) or the same name derivation used elsewhere in the codebase. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2382-2384] List<UnresolvedExpression> fieldList = node.getFields().stream() - .map(f -> (UnresolvedExpression) new Alias(f.getField().toString(), f)) + .map(f -> { + QualifiedName qn = f.getField(); + String alias = qn.getParts().get(qn.getParts().size() - 1); + return (UnresolvedExpression) new Alias(alias, f); + }) .toList(); Suggestion importance[1-10]: 3 __ Why: The suggestion raises a valid concern about using `toString()` on a `QualifiedName` for dotted paths. However, looking at the PR context, the existing code in `doMaterializeFieldPaths` also uses `field.getField().toString()` as the column name, and the `top`/`rare` commands need the full dotted name (e.g., `doc.user.name`) to match the materialized column. Using only the last part would break this consistency.	Low
General	Materialize dotted paths from both sides of lookup mapping `getMappingAliasMap()` maps lookup field names to source field names (or vice versa). Using `.values()` may extract the wrong side of the mapping depending on the map's semantics. If the map is `lookupField -> sourceField`, then `.values()` gives source field names, but if it's `sourceField -> lookupField`, the keys should be used instead. Verify which side contains the dotted MAP paths that need materialization and use the correct side. core/src/main/java/org/opensearch/sql/calcite/FieldPathPreMaterializer.java [87-90] case Lookup lookup -> - lookup.getMappingAliasMap().values().stream() + lookup.getMappingAliasMap().entrySet().stream() + .flatMap(e -> Stream.of(e.getKey(), e.getValue())) + .filter(name -> name.contains(".")) .map(name -> new Field(QualifiedName.of(List.of(name.split("\\."))))) .toList(); Suggestion importance[1-10]: 4 __ Why: The suggestion raises a valid concern about which side of the `getMappingAliasMap()` contains the dotted MAP paths. The integration test `testLookupOnMapPath` uses `name AS doc.user.name` syntax, suggesting the value side contains the source field name. However, materializing both sides could be safer and the `improved_code` adds a `.filter(name -> name.contains("."))` guard which is a reasonable improvement.	Low

Signed-off-by: Chen Dai <daichen@amazon.com>

…visitor The Alias wrapping in getGroupByList broke V2 engine's top/rare with group-by because Alias.getChild() returns null, causing NPE in the V2 ExpressionAnalyzer. Move the wrapping to visitRareTopN in the Calcite visitor where it's only needed. Signed-off-by: Chen Dai <daichen@amazon.com>

github-actions · 2026-03-05T22:24:55Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 50f9c6d.

Path	Line	Severity	Description
core/src/main/java/org/opensearch/sql/calcite/FieldPathPreMaterializer.java	107	low	Change from catching `RuntimeException` to `Throwable` silently swallows JVM-level errors (OutOfMemoryError, AssertionError, etc.) and the stack trace is dropped from the debug log (changed from `log.debug(..., e)` to `log.debug(..., e.getMessage())`). This is a code smell that could mask serious errors, but is consistent with the stated intent to handle Calcite internal AssertionErrors during field resolution, and is covered by explicit tests.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-05T22:25:52Z

Persistent review updated to latest commit 50f9c6d

Signed-off-by: Chen Dai <daichen@amazon.com>

github-actions · 2026-03-06T00:15:13Z

Persistent review updated to latest commit 6cabe6f

dai-chen added 3 commits March 5, 2026 11:51

Fix map path resolution issue in top/rare

006df4d

Signed-off-by: Chen Dai <daichen@amazon.com>

Fix map path resolution issue in join/lookup/streamstats

92b2bf6

Signed-off-by: Chen Dai <daichen@amazon.com>

Fix assertion error issue and rename to field path

7b17202

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen self-assigned this Mar 5, 2026

dai-chen added PPL Piped processing language bugFix labels Mar 5, 2026

dai-chen added 2 commits March 5, 2026 13:11

Fix stale javadoc

4676685

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen force-pushed the fix/resolve-map-paths-for-category-b-c branch from de1d42b to 50f9c6d Compare March 5, 2026 22:24

dai-chen added 2 commits March 5, 2026 14:57

Revert the rename changes

b0d6071

Signed-off-by: Chen Dai <daichen@amazon.com>

Narrow down exception catch with fixme comment

6cabe6f

Signed-off-by: Chen Dai <daichen@amazon.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix MAP field path resolution for `top`, `rare`, `join`, `lookup` and `streamstats` commands#5206

Fix MAP field path resolution for `top`, `rare`, `join`, `lookup` and `streamstats` commands#5206
dai-chen wants to merge 7 commits intoopensearch-project:mainfrom
dai-chen:fix/resolve-map-paths-for-category-b-c

dai-chen commented Mar 5, 2026

Uh oh!

github-actions bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 5, 2026

Uh oh!

github-actions bot commented Mar 5, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dai-chen commented Mar 5, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 6cabe6f)

Uh oh!

github-actions bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 5, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 5, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Mar 5, 2026 •

edited

Loading

(Review updated until commit `6cabe6f`)

github-actions bot commented Mar 5, 2026 •

edited

Loading