Fix #5150: Fix dedup aggregation pushdown nullifying renamed fields by opensearchpplteam · Pull Request #5192 · opensearch-project/sql

opensearchpplteam · 2026-02-27T00:00:21Z

Description

Root cause: The DedupPushdownRule converts dedup operations to LITERAL_AGG top_hits aggregations. When a field is renamed before dedup (e.g., rename value as val), the top_hits response from OpenSearch still uses the original field name (value). However, the output schema expects the renamed name (val). The OpenSearchIndexEnumerator looks up fields by the renamed names, finds nothing in the top_hits response map, and returns null.

Fix: Build a field name mapping (original -> renamed) in AggregateAnalyzer for the LITERAL_AGG case by comparing inferNamedField().getRootName() with the output alias from the project args. Pass this mapping to TopHitsParser, which applies it to translate OpenSearch field names back to the expected output names before results are consumed by the enumerator.

Changes:

AggregateAnalyzer.java: Build original-to-renamed field name mapping from project args in LITERAL_AGG case, pass to TopHitsParser
TopHitsParser.java: Added fieldNameMapping field and applyFieldNameMapping() method to translate OS field names to renamed output names. Added backward-compatible 4-arg constructor; existing 3-arg constructor delegates with empty map.
Unit test: Verifies dedup top_hits parsing with field name remapping
YAML REST tests: End-to-end tests for rename-then-dedup and eval-then-dedup scenarios

Related Issues

Resolves #5150

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-02-27T00:01:46Z

PR Reviewer Guide 🔍

(Review updated until commit `fe1766a`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Add fieldNameMapping support to TopHitsParser Relevant files: opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java opensearch/src/test/java/org/opensearch/sql/opensearch/response/OpenSearchAggregationResponseParserTest.java Sub-PR theme: Wire fieldNameMapping from AggregateAnalyzer with integration tests Relevant files: opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml
⚡ Recommended focus areas for review Incomplete Mapping The field name mapping only handles `RexInputRef` args. If a project arg is a `RexCall` (e.g., an eval expression like `value * 2`), it is silently skipped. This means computed/eval fields that are also renamed will not be included in the mapping, potentially causing null values in the output for those fields. for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { String originalName = helper.inferNamedField(arg.getKey()).getRootName(); String outputName = arg.getValue(); if (!originalName.equals(outputName)) { String previousMapping = fieldNameMapping.put(originalName, outputName); if (previousMapping != null) { LOG.warn( "Field name mapping collision: field '{}' was mapped to '{}', now" + " overwritten by '{}'", originalName, previousMapping, outputName); } } } } Mapping Collision If two different original fields are mapped to the same output name (e.g., two fields both renamed to the same alias), `applyFieldNameMapping` will silently overwrite one value with the other in the result map. There is no warning or error for this case, which could lead to silent data loss. private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); result.put(mappedName, entry.getValue()); } return result; } Missing Schema Check The second test case ("Eval new field then dedup on different field retains eval values") does not include a `schema` match assertion, unlike the first test. This means the test does not verify that the output schema (column names and types) is correct for the eval scenario. "5150: Eval new field then dedup on different field retains eval values": - skip: features: - headers - allowed_warnings - do: allowed_warnings: - 'Loading the fielddata on the _id field is deprecated and will be removed in future versions. If you require sorting or aggregating on this field you should also include the id in the body of your documents, and map this field as a keyword field that has [doc_values] enabled' headers: Content-Type: 'application/json' ppl: body: query: source=test_5150 \| eval doubled = value * 2 \| dedup category \| sort category \| fields category, value, doubled - match: { total: 2 } - length: { datarows: 2 } - match: { datarows.0.0: "X" } - match: { datarows.0.1: 100 } - match: { datarows.0.2: 200 } - match: { datarows.1.0: "Y" } - match: { datarows.1.1: 300 } - match: { datarows.1.2: 600 }

github-actions · 2026-02-27T00:02:05Z

PR Code Suggestions ✨

Latest suggestions up to fe1766a

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Guard against null field resolution result The `helper.inferNamedField(arg.getKey())` call may return `null` or throw if the field cannot be resolved, leading to a `NullPointerException` at `.getRootName()`. Adding a null check before calling `getRootName()` would make this more robust. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [617-632] for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField == null) { + continue; + } + String originalName = namedField.getRootName(); String outputName = arg.getValue(); if (!originalName.equals(outputName)) { String previousMapping = fieldNameMapping.put(originalName, outputName); if (previousMapping != null) { LOG.warn( "Field name mapping collision: field '{}' was mapped to '{}', now" + " overwritten by '{}'", originalName, previousMapping, outputName); } } } } Suggestion importance[1-10]: 5 __ Why: Adding a null check for `helper.inferNamedField(arg.getKey())` is a reasonable defensive measure to prevent a potential `NullPointerException` at `.getRootName()`. The improved code accurately reflects the suggested change.	Low
General	Detect output key collisions during field remapping If two original fields map to the same output name, the second one will silently overwrite the first in the result map. This could cause data loss without any warning. Consider detecting such collisions and logging a warning or throwing an exception. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [162-172] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + LOG.warn( + "Field name mapping collision in result: output key '{}' already exists, overwriting", + mappedName); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 4 __ Why: The suggestion is valid and addresses a potential silent data loss scenario when two original fields map to the same output name. However, this is an edge case that is unlikely in practice, and the `LOG` reference used in `TopHitsParser` is not defined in that class (it's defined in `AggregateAnalyzer`), making the `improved_code` incorrect as-is.	Low

Previous suggestions

Suggestions up to commit e8a2e3a

Category	Suggestion	Impact
Possible issue	Detect output key collisions during field remapping If two original field names map to the same output name, the second one will silently overwrite the first in the result map. This could cause data loss without any warning. Consider detecting such collisions and logging a warning or throwing an exception. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [162-172] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + throw new IllegalStateException( + String.format( + "Field name mapping collision in result: key '%s' already exists when mapping '%s'", + mappedName, entry.getKey())); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 4 __ Why: The suggestion addresses a valid edge case where two original fields could map to the same output name, causing silent data loss. However, this scenario is unlikely in practice given the mapping is built from project args, and throwing an exception may be overly aggressive compared to a warning log.	Low
General	Store defensive copy of mutable map parameter The `fieldNameMapping` field is stored as a direct reference to the passed-in map, which could be mutated externally after construction. Store a defensive copy to ensure immutability of the parser's state. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [50] -this.fieldNameMapping = fieldNameMapping; +this.fieldNameMapping = Map.copyOf(fieldNameMapping); Suggestion importance[1-10]: 4 __ Why: Storing a defensive copy of `fieldNameMapping` is a good defensive programming practice to prevent external mutation. The `Map.copyOf()` change is minimal and correct, though the risk of external mutation is low given the call sites in this PR.	Low

Suggestions up to commit e9a24ad

Category	Suggestion	Impact
General	Strengthen integration test assertions with exact value checks The `is_true` assertion checks for truthiness, which would fail for falsy values like `0`. Since `value` fields contain `100` and `300` (non-zero), this works here, but it's fragile. Consider using `match` assertions to verify the exact expected values for stronger test coverage. integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml [49-53] - match: { schema: [{"name": "category", "type": "string"}, {"name": "val", "type": "bigint"}] } - length: { datarows: 2 } - # Each row should have non-null val - - is_true: datarows.0.1 - - is_true: datarows.1.1 + # Each row should have non-null val — verify exact values + - match: { datarows.0: ["X", 100] } + - match: { datarows.1: ["Y", 300] } Suggestion importance[1-10]: 5 __ Why: The suggestion improves test robustness by replacing `is_true` (truthiness check) with exact `match` assertions. This is a valid improvement since `is_true` would fail for falsy values like `0`, and exact value checks provide stronger test coverage for the renamed field behavior.	Low
General	Detect collisions when applying field name mapping If two original field names map to the same output name, the second one will silently overwrite the first in the result map. This could cause data loss without any warning. Consider detecting and logging a warning when a collision occurs during the mapping application. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [162-172] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + LOG.warn( + "Field name mapping collision in result: mapped name '{}' already exists, overwriting value", + mappedName); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 4 __ Why: The suggestion adds collision detection in `applyFieldNameMapping`, which is a valid defensive measure. However, this is a relatively rare edge case and the impact is low since the primary collision warning is already handled upstream in `AggregateAnalyzer`. The `LOG` reference would also need to be added to `TopHitsParser`.	Low

Suggestions up to commit d82a0a1

Category	Suggestion	Impact
Possible issue	Guard against null field resolution result If `helper.inferNamedField(arg.getKey())` returns `null`, calling `.getRootName()` on it will throw a `NullPointerException`. A null check should be added before accessing `getRootName()` to prevent a runtime crash when the field cannot be resolved. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [609-617] for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField == null) continue; + String originalName = namedField.getRootName(); String outputName = arg.getValue(); if (!originalName.equals(outputName)) { fieldNameMapping.put(originalName, outputName); } } } Suggestion importance[1-10]: 6 __ Why: If `helper.inferNamedField(arg.getKey())` can return null, calling `.getRootName()` would throw a `NullPointerException`. Adding a null check is a reasonable defensive measure, though it depends on whether the method contract guarantees non-null returns.	Low
General	Detect field name mapping collisions If two original field names map to the same output name, or if a mapped output name collides with another existing key in the result map, one value will silently overwrite the other. While this may be an edge case, it could lead to data loss without any warning. Consider detecting and handling such collisions explicitly. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [159-169] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + throw new IllegalStateException( + String.format("Field name mapping collision: multiple fields map to '%s'", mappedName)); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 3 __ Why: While collision detection is a valid concern, this is an edge case that is unlikely in practice since field rename mappings are derived from explicit user-defined renames. Throwing an exception here could cause unexpected failures in valid scenarios where the mapping is well-defined.	Low

Suggestions up to commit 6eb5ac5

Category	Suggestion	Impact
Possible issue	Add null check for inferNamedField The code doesn't handle potential null values from `helper.inferNamedField(arg.getKey())`. If this method returns null, calling `getRootName()` will throw a NullPointerException. Add a null check before accessing `getRootName()` to prevent runtime failures. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-617] Map<String, String> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); - String outputName = arg.getValue(); - if (!originalName.equals(outputName)) { - fieldNameMapping.put(originalName, outputName); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField != null) { + String originalName = namedField.getRootName(); + String outputName = arg.getValue(); + if (!originalName.equals(outputName)) { + fieldNameMapping.put(originalName, outputName); + } } } } Suggestion importance[1-10]: 7 __ Why: Valid defensive programming suggestion to prevent potential `NullPointerException` if `helper.inferNamedField()` returns null. However, without knowing the implementation details of `inferNamedField()`, it's unclear if this is a real issue or if the method guarantees non-null returns for `RexInputRef` instances.	Medium

Suggestions up to commit eddc6b9

Category	Suggestion	Impact
Possible issue	Add null check for inferNamedField The code assumes `helper.inferNamedField(arg.getKey())` never returns null, which could cause a NullPointerException. Add a null check before calling `getRootName()` to prevent potential runtime failures when the field cannot be inferred. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-617] Map<String, String> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); - String outputName = arg.getValue(); - if (!originalName.equals(outputName)) { - fieldNameMapping.put(originalName, outputName); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField != null) { + String originalName = namedField.getRootName(); + String outputName = arg.getValue(); + if (!originalName.equals(outputName)) { + fieldNameMapping.put(originalName, outputName); + } } } } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential `NullPointerException` if `helper.inferNamedField(arg.getKey())` returns null. Adding a null check improves robustness, though the actual likelihood of this occurring depends on the codebase's guarantees about `inferNamedField` behavior.	Medium

github-actions · 2026-02-27T00:26:33Z

Persistent review updated to latest commit eddc6b9

…g renamed fields When the DedupPushdownRule converts a dedup to an aggregation-based top_hits query, fields that were renamed (via rename or eval) would return null values. This happened because the TopHitsParser returned results using original OpenSearch field names, but the output schema expected the renamed names. Added a field name mapping to TopHitsParser so it can translate original OS field names to their renamed output names in the LITERAL_AGG (dedup) aggregation response path. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T01:46:14Z

Persistent review updated to latest commit 6eb5ac5

The testNoMvBasic and testNoMvWithEval failures in CalciteExplainIT are unrelated to this PR's dedup pushdown changes and only occur on Java 25. Re-triggering CI. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T03:25:44Z

Persistent review updated to latest commit d82a0a1

opensearchpplteam · 2026-02-27T04:34:55Z

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java

+            String originalName = helper.inferNamedField(arg.getKey()).getRootName();
+            String outputName = arg.getValue();
+            if (!originalName.equals(outputName)) {
+              fieldNameMapping.put(originalName, outputName);


When two project args reference the same original field with different output names (e.g., eval pay2 = salary | rename salary as pay | dedup dept_id), the second HashMap.put silently overwrites the first mapping. One of the renamed fields would still get null values. This is an edge case from issue #5150 differential testing. Consider either documenting this as a known limitation or using a multimap / detecting the collision.

Added a warning log when a collision is detected (when the same original field maps to multiple output names). The last mapping wins, which is documented in a code comment as a known limitation. This edge case requires two derived fields from the same original field before dedup, which is rare in practice.

opensearchpplteam · 2026-02-27T04:35:04Z

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java

                        : new LinkedHashMap<>(hit.getSourceAsMap());
                hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue()));
-                return map;
+                return applyFieldNameMapping(map);


The field name mapping is applied only in the multi-row return path (the else branch at line 140). The returnSingleValue path (used by ARG_MAX/ARG_MIN pushdown) and returnMergeValue path skip mapping entirely. If those paths are ever used with renamed fields via a future pushdown rule, they would have the same null-value bug. Worth confirming this is an intentional scope limitation for now, or if those branches should also apply the mapping defensively.

Confirmed this is intentional. The returnSingleValue and returnMergeValue paths use agg.getName() as the map key (not field names), so field name mapping is irrelevant for those paths. Added a clarifying comment in the code explaining why mapping is only applied in the multi-row return path.

…apping limitations - Add warning log when HashMap collision is detected in AggregateAnalyzer field name mapping (when same original field maps to multiple output names) - Add clarifying comments in TopHitsParser explaining why field name mapping is only needed in the multi-row return path Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T04:53:24Z

Persistent review updated to latest commit e9a24ad

penghuo · 2026-02-27T21:41:34Z

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml

+  - is_true: datarows.0.1
+  - is_true: datarows.1.1


compare with real value, not just null value

Updated — replaced is_true with match assertions for actual expected values (category and val columns).

penghuo · 2026-02-27T21:41:41Z

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml

+  - is_true: datarows.0.2
+  - is_true: datarows.1.2


compare with real value, not just null value

Updated — replaced is_true with match assertions for actual expected values (category, value, and doubled columns).

…project#5150 Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T21:57:42Z

Persistent review updated to latest commit e8a2e3a

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-03-03T17:30:02Z

Persistent review updated to latest commit fe1766a

LantaoJin · 2026-03-05T02:55:14Z

Hold on merging, the current fix is more like some kind of workaround. I'll need some time to recall the implementation, and if I can suggest a new fix.

penghuo added PPL Piped processing language bugFix labels Feb 27, 2026

opensearchpplteam force-pushed the fix/issue-5150 branch from eddc6b9 to 6eb5ac5 Compare February 27, 2026 01:45

Trigger CI rerun for flaky Java 25 integration test

d82a0a1

The testNoMvBasic and testNoMvWithEval failures in CalciteExplainIT are unrelated to this PR's dedup pushdown changes and only occur on Java 25. Re-triggering CI. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

opensearchpplteam commented Feb 27, 2026

View reviewed changes

penghuo reviewed Feb 27, 2026

View reviewed changes

Address review: assert real values in YAML REST tests for opensearch-…

e8a2e3a

…project#5150 Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

opensearchpplteam mentioned this pull request Mar 3, 2026

Dedup pushdown field name mapping collision when multiple fields reference same original field #5197

Open

Link known limitation comment to tracking issue opensearch-project#5197

fe1766a

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

penghuo approved these changes Mar 3, 2026

View reviewed changes

Conversation

opensearchpplteam commented Feb 27, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit fe1766a)

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

LantaoJin commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Feb 27, 2026 •

edited

Loading

(Review updated until commit `fe1766a`)

github-actions bot commented Feb 27, 2026 •

edited

Loading