Skip to content

Support pushdown array/collect aggregation#5072

Open
LantaoJin wants to merge 12 commits intoopensearch-project:mainfrom
LantaoJin:pr/issues/5070
Open

Support pushdown array/collect aggregation#5072
LantaoJin wants to merge 12 commits intoopensearch-project:mainfrom
LantaoJin:pr/issues/5070

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Jan 26, 2026

Description

This PR is a pushdown optimization for #5025, which introduce array/collect Calcite aggregation.

Related Issues

Resolves #5070

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 26, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • 🔍 Trigger a full review

Comment @coderabbitai help to get the list of available commands and usage tips.

@LantaoJin LantaoJin added the enhancement New feature or request label Jan 26, 2026
@LantaoJin
Copy link
Member Author

This PR includes duplicated logic of #5025, which is required to rebase after 5025 merged.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR Reviewer Guide 🔍

(Review updated until commit e4fa9b2)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Core pushdown support for COLLECT/ARRAY_AGG aggregation

Relevant files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_mvcombine.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_take.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml

Sub-PR theme: Integration tests and documentation for mvcombine index settings

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvCombineCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLMapPathIT.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
  • integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
  • docs/user/ppl/cmd/mvcombine.md

⚡ Recommended focus areas for review

Hardcoded Size

The COLLECT/ARRAY_AGG pushdown uses helper.queryBucketSize as the top_hits size. This means the result size is bounded by the query bucket size (default 10000), not by any user-specified limit. If a user expects all matching documents to be collected, this could silently truncate results without any warning or error.

case COLLECT, ARRAY_AGG -> {
  TopHitsAggregationBuilder topHitsBuilder =
      createTopHitsBuilder(
          aggCall, args, aggName, helper, helper.queryBucketSize, false, false, null, null);
  yield Pair.of(topHitsBuilder, new TopHitsParser(aggName, false, true));
}
Missing Return

In testMvCombineUnsupportedInV2, the variable result may be uninitialized if the try block throws an exception that is not a ResponseException. The finally block also does not guarantee result is assigned before verifyQuery(result) is called.

  JSONObject result;
  try {
    updateIndexSettings(
        TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 10000 + " } }");
    result =
        executeQuery(
            String.format(
                "source=%s | fields state, city, age | mvcombine age", TEST_INDEX_BANK));
  } catch (ResponseException e) {
    result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
  } finally {
    updateIndexSettings(
        TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
  }
  verifyQuery(result);
}
JUnit Annotation

The @After annotation is imported from org.junit.After (JUnit 4), but the test class uses org.junit.jupiter.api.Test (JUnit 5). In JUnit 5, the correct annotation is @AfterEach. This mismatch may cause the teardown method to never be called, leaving the index setting at 10000 after test failures.

@After
public void afterTest() throws IOException {
  updateIndexSettings(INDEX, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
}
Configuration Requirement

The documentation states that index.max_inner_result_window must be increased to plugins.query.buckets (default 10000) or larger, but does not explain what happens if this setting is not updated (e.g., silent truncation or error). It would be helpful to clarify the failure mode and whether there is any automatic validation.

The mvcombine command leverages OpenSearch's top hits aggregation pushdown, which requires increasing the index setting `index.max_inner_result_window` to `plugins.query.buckets` (default: 10000) or larger.

Change the `index.max_inner_result_window` to `10000`:

```bash ppl
curl -sS -H 'Content-Type: application/json' \
-X PUT localhost:9200/mvcombine_data/_settings \
-d '{"index" : {"max_inner_result_window" : "10000"}}'
{
  "acknowledged": true
}

</details>

</td></tr>
</table>

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR Code Suggestions ✨

Latest suggestions up to e4fa9b2

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against uninitialized variable usage

The variable result may be uninitialized if updateIndexSettings itself throws an
exception before result is assigned, and the finally block's updateIndexSettings
could also throw, masking the original exception. Additionally, result is used after
the try-catch-finally block but may not be initialized if an unexpected exception
occurs. Consider initializing result to null before the try block and handling the
case where it remains null.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [248-261]

+JSONObject result = null;
 try {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 10000 + " } }");
       result =
           executeQuery(
               String.format(
                   "source=%s | fields state, city, age | mvcombine age", TEST_INDEX_BANK));
     } catch (ResponseException e) {
       result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
     } finally {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
     }
+if (result != null) {
+  verifyQuery(result);
+}
Suggestion importance[1-10]: 3

__

Why: The result variable is declared at line 247 (JSONObject result;) before the try block, so the compiler would catch any uninitialized usage. The suggestion's concern about updateIndexSettings throwing before result is assigned is valid but minor in a test context, and the improved code changes verifyQuery(result) to a null-guarded call which alters test semantics.

Low
General
Use correct size limit for collect aggregation

Using helper.queryBucketSize as the dedupNumber (size) for the top_hits aggregation
means the result size is tied to the composite bucket size rather than the actual
query limit. For COLLECT/ARRAY_AGG, the intent is to collect all values up to the
query limit, so helper.queryBucketSize may be incorrect — it should likely use a
value derived from the requested total size or a dedicated configuration. This could
silently truncate collected arrays.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [606-611]

 case COLLECT, ARRAY_AGG -> {
       TopHitsAggregationBuilder topHitsBuilder =
           createTopHitsBuilder(
-              aggCall, args, aggName, helper, helper.queryBucketSize, false, false, null, null);
+              aggCall, args, aggName, helper, helper.maxInnerResultWindow, false, false, null, null);
       yield Pair.of(topHitsBuilder, new TopHitsParser(aggName, false, true));
     }
Suggestion importance[1-10]: 2

__

Why: The suggestion proposes using helper.maxInnerResultWindow which doesn't appear to be a field in the helper object based on the PR context. The PR intentionally uses helper.queryBucketSize for COLLECT/ARRAY_AGG, consistent with the max_inner_result_window being set to match plugins.query.buckets (default 10000), making this suggestion potentially incorrect.

Low

Previous suggestions

Suggestions up to commit 1e3656d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent potential uninitialized variable usage

The variable result may be uninitialized if updateIndexSettings throws an exception
before the assignment. If updateIndexSettings in the try block throws an exception
that is not a ResponseException, result will be uninitialized and the subsequent
verifyQuery(result) call will fail with a compile error or NPE. Consider
initializing result to null before the try block and handling the case where it
remains null.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [247-261]

+JSONObject result = null;
 try {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 1000 + " } }");
       result =
           executeQuery(
               String.format(
                   "source=%s | fields state, city, age | mvcombine age", TEST_INDEX_BANK));
     } catch (ResponseException e) {
       result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
     } finally {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
     }
Suggestion importance[1-10]: 6

__

Why: The variable result could be uninitialized if updateIndexSettings throws a non-ResponseException, causing a compile error or NPE at verifyQuery(result). Initializing result = null before the try block is a valid defensive practice, though the PR code already declares result before the try block (JSONObject result; at line 247).

Low
Suggestions up to commit adbbc79
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against uninitialized result variable

The variable result may be uninitialized if updateIndexSettings itself throws an
exception before result is assigned, and the finally block runs cleanup but result
is never set. The result variable should be initialized to a default value (e.g.,
null) before the try block, and the code after the try-catch-finally should handle
the case where result is null.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [248-261]

+JSONObject result = null;
 try {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 10000 + " } }");
       result =
           executeQuery(
               String.format(
                   "source=%s | fields state, city, age | mvcombine age", TEST_INDEX_BANK));
     } catch (ResponseException e) {
       result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
     } finally {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
     }
+if (result != null) {
+  verifyQuery(result);
+}
Suggestion importance[1-10]: 4

__

Why: The concern about result being uninitialized is valid if updateIndexSettings throws a non-ResponseException, but looking at the existing code, result is declared at line 247 (JSONObject result;) and the verifyQuery(result) call at line 261 would fail to compile if result could be uninitialized. The improved code changes the test logic by making verifyQuery conditional, which alters test semantics. The suggestion has some merit but the improved code is not quite right for a test context.

Low
General
Document size limit dependency for collect aggregation

Using helper.queryBucketSize as the size parameter for top_hits aggregation in
COLLECT/ARRAY_AGG could be very large (default 10000), potentially causing memory
pressure or exceeding max_inner_result_window. Consider using a configurable or
bounded limit, and document that the index setting max_inner_result_window must be
set accordingly, or validate that the requested size does not exceed the configured
limit before building the aggregation.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [606-611]

 case COLLECT, ARRAY_AGG -> {
+      // Requires index.max_inner_result_window >= helper.queryBucketSize
       TopHitsAggregationBuilder topHitsBuilder =
           createTopHitsBuilder(
               aggCall, args, aggName, helper, helper.queryBucketSize, false, false, null, null);
       yield Pair.of(topHitsBuilder, new TopHitsParser(aggName, false, true));
     }
Suggestion importance[1-10]: 1

__

Why: The suggestion only adds a comment to the existing code without any functional change. The existing_code and improved_code are functionally identical, differing only by an added comment, which falls into the documentation category and should receive a low score.

Low
Suggestions up to commit aed63f7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against uninitialized variable after try-catch

The variable result is assigned inside the try block but used after the
try-catch-finally block in verifyQuery(result). If updateIndexSettings throws an
exception before result is assigned, result will be uninitialized and the code won't
compile, or if it's initialized to null, verifyQuery will receive a null value.
Ensure result is initialized before the try block (e.g., JSONObject result = null;)
and handle the null case, or restructure so verifyQuery is called inside the try
block.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [248-261]

+JSONObject result = null;
 try {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 10000 + " } }");
       result =
           executeQuery(
               String.format(
                   "source=%s | fields state, city, age | mvcombine age", TEST_INDEX_BANK));
     } catch (ResponseException e) {
       result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
     } finally {
       updateIndexSettings(
           TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
     }
+if (result != null) {
+  verifyQuery(result);
+}
Suggestion importance[1-10]: 4

__

Why: The result variable is declared before the try block (JSONObject result;) and is assigned in both the try and catch branches, so the Java compiler would flag it as potentially uninitialized if verifyQuery(result) is called after. However, the suggestion's improved_code adds a null check that changes test semantics (silently skipping verification), which is not ideal for a test. The real fix is simply initializing result = null before the try block, but the null guard is debatable.

Low
General
Silent result truncation for large collect aggregations

The COLLECT/ARRAY_AGG pushdown always uses the hardcoded MAX_TOP_HITS_RESULT_WINDOW
(10000) as the size, regardless of any user-specified limit or the actual data size.
This could silently truncate results when there are more than 10000 items in a group
without any warning to the user. Consider checking if the actual required size
exceeds MAX_TOP_HITS_RESULT_WINDOW and either throwing an informative exception or
logging a warning.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-621]

 case COLLECT, ARRAY_AGG -> {
+        // Warn if results may be silently truncated
         TopHitsAggregationBuilder topHitsBuilder =
             createTopHitsBuilder(
                 aggCall,
                 args,
                 aggName,
                 helper,
                 MAX_TOP_HITS_RESULT_WINDOW,
                 false,
                 false,
                 null,
                 null);
         yield Pair.of(topHitsBuilder, new TopHitsParser(aggName, false, true));
       }
Suggestion importance[1-10]: 3

__

Why: The concern about silent truncation at MAX_TOP_HITS_RESULT_WINDOW is valid, but the improved_code only adds a comment without any actual warning or exception logic, making it essentially identical to the existing code. The suggestion is more of a design consideration than an actionable fix.

Low
Suggestions up to commit 70e2f57
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle exceptions from async settings update call

The .get() call on the future can throw checked and unchecked exceptions (e.g.,
InterruptedException, ExecutionException) that are not handled. If the future is
interrupted or the execution fails, the exception will propagate as an unchecked
exception without a meaningful error message. Wrap the call in a try-catch block
similar to how getIndexMaxResultWindows handles exceptions in the same class.

opensearch/src/main/java/org/opensearch/sql/opensearch/client/OpenSearchNodeClient.java [142-153]

-AcknowledgedResponse response =
-    client
-        .admin()
-        .indices()
-        .prepareUpdateSettings(indexExpression)
-        .setSettings(
-            settings.entrySet().stream()
-                .filter(e -> e.getKey().startsWith("index."))
-                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)))
-        .get();
-return response.isAcknowledged();
+try {
+  AcknowledgedResponse response =
+      client
+          .admin()
+          .indices()
+          .prepareUpdateSettings(indexExpression)
+          .setSettings(
+              settings.entrySet().stream()
+                  .filter(e -> e.getKey().startsWith("index."))
+                  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)))
+          .get();
+  return response.isAcknowledged();
+} catch (Exception e) {
+  throw new IllegalStateException(
+      String.format("Failed to update index settings %s for %s", settings, Arrays.toString(indexExpression)), e);
+}
Suggestion importance[1-10]: 6

__

Why: The .get() call can throw InterruptedException and ExecutionException which are not handled, unlike the pattern used in getIndexMaxResultWindows. Wrapping in a try-catch would improve robustness and provide meaningful error messages.

Low
Prevent potential uninitialized variable usage

The variable result may be uninitialized if executeQuery throws an exception other
than ResponseException (e.g., IOException). In that case, verifyQuery(result) would
fail with a compile error or use an uninitialized variable. Initialize result to
null before the try block and add a null check, or declare it with a default value.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [248-261]

+JSONObject result = null;
 try {
   updateIndexSettings(
       TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 10000 + " } }");
   result =
       executeQuery(
           String.format(
               "source=%s | fields state, city, age | mvcombine age", TEST_INDEX_BANK));
 } catch (ResponseException e) {
   result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
 } finally {
   updateIndexSettings(
       TEST_INDEX_BANK, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
 }
-verifyQuery(result);
+if (result != null) {
+  verifyQuery(result);
+}
Suggestion importance[1-10]: 4

__

Why: The result variable could be uninitialized if an exception other than ResponseException is thrown, but in practice the existing code already declares result before the try block (line 247: JSONObject result;). The suggestion adds a null check which is a minor defensive improvement.

Low
General
Respect index max inner result window setting

The COLLECT and ARRAY_AGG aggregations always use the hardcoded
MAX_TOP_HITS_RESULT_WINDOW (10000) as the size, but the actual index setting
max_inner_result_window may be lower than this value (the default is 100). If the
index setting hasn't been updated, the query will fail at runtime. Consider reading
the actual index setting and using it as the size limit, or at least
validating/documenting this requirement clearly.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-621]

 case COLLECT, ARRAY_AGG -> {
+    // Note: requires index.max_inner_result_window >= MAX_TOP_HITS_RESULT_WINDOW (10000)
+    // Ensure the index setting has been updated before using this aggregation.
+    int sizeLimit = Math.min(MAX_TOP_HITS_RESULT_WINDOW,
+        helper.getMaxInnerResultWindow()); // use actual index setting if available
     TopHitsAggregationBuilder topHitsBuilder =
         createTopHitsBuilder(
             aggCall,
             args,
             aggName,
             helper,
-            MAX_TOP_HITS_RESULT_WINDOW,
+            sizeLimit,
             false,
             false,
             null,
             null);
     yield Pair.of(topHitsBuilder, new TopHitsParser(aggName, false, true));
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes using helper.getMaxInnerResultWindow() which may not exist in the helper object's API, making the improved_code potentially incorrect. The PR intentionally uses MAX_TOP_HITS_RESULT_WINDOW and documents the requirement to update the index setting, so this is more of a design concern than a bug.

Low
Suggestions up to commit 8e5b95f
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix wrong JUnit lifecycle annotation in JUnit 5 test

The @After annotation is from JUnit 4, but the test class uses JUnit 5 annotations
(@Test from org.junit.jupiter.api.Test, Assertions from org.junit.jupiter.api). In
JUnit 5, the equivalent annotation is @AfterEach. Using @After in a JUnit 5 test
class means the teardown method will never be executed, leaving the index setting
permanently changed.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvCombineCommandIT.java [38-41]

-@After
+@AfterEach
 public void afterTest() throws IOException {
   updateIndexSettings(INDEX, "{ \"index\": { \"max_inner_result_window\":" + 100 + " } }");
 }
Suggestion importance[1-10]: 8

__

Why: The @After annotation (JUnit 4) will not execute in a JUnit 5 test class, meaning the max_inner_result_window setting will never be reset after tests, potentially affecting other tests. The correct annotation is @AfterEach from JUnit 5.

Medium
Handle exceptions from async settings update call

The .get() call on the future can throw checked and unchecked exceptions (e.g.,
ExecutionException, InterruptedException) that are not handled. If the settings
update fails, the exception will propagate as an unhandled runtime exception without
a meaningful error message. Wrap the call in a try-catch block similar to how
getIndexMaxResultWindows handles exceptions.

opensearch/src/main/java/org/opensearch/sql/opensearch/client/OpenSearchNodeClient.java [142-153]

-AcknowledgedResponse response =
-    client
-        .admin()
-        .indices()
-        .prepareUpdateSettings(indexExpression)
-        .setSettings(
-            settings.entrySet().stream()
-                .filter(e -> e.getKey().startsWith("index."))
-                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)))
-        .get();
-return response.isAcknowledged();
+try {
+  AcknowledgedResponse response =
+      client
+          .admin()
+          .indices()
+          .prepareUpdateSettings(indexExpression)
+          .setSettings(
+              settings.entrySet().stream()
+                  .filter(e -> e.getKey().startsWith("index."))
+                  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)))
+          .get();
+  return response.isAcknowledged();
+} catch (Exception e) {
+  throw new IllegalStateException(
+      String.format("Failed to update index settings %s for %s", settings, Arrays.toString(indexExpression)), e);
+}
Suggestion importance[1-10]: 7

__

Why: The .get() call can throw ExecutionException or InterruptedException which are not handled, potentially causing confusing failures. The OpenSearchRestClient counterpart already wraps its call in a try-catch with a meaningful error message, so consistency and proper error handling here is important.

Medium
Respect index max_inner_result_window setting

The COLLECT/ARRAY_AGG pushdown hardcodes MAX_TOP_HITS_RESULT_WINDOW (10000) as the
size, but the actual index setting max_inner_result_window may be lower than 10000
(the default is 100). If the index setting hasn't been updated, OpenSearch will
reject the query with an error. The code should either read the actual index setting
at query time or validate/cap the size against the configured
max_inner_result_window to avoid runtime failures.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-621]

 case COLLECT, ARRAY_AGG -> {
+    // Use the configured max_inner_result_window from the index settings, capped at MAX_TOP_HITS_RESULT_WINDOW
+    int size = Math.min(helper.getMaxInnerResultWindow(), MAX_TOP_HITS_RESULT_WINDOW);
     TopHitsAggregationBuilder topHitsBuilder =
         createTopHitsBuilder(
             aggCall,
             args,
             aggName,
             helper,
-            MAX_TOP_HITS_RESULT_WINDOW,
+            size,
             false,
             false,
             null,
             null);
     yield Pair.of(topHitsBuilder, new TopHitsParser(aggName, false, true));
   }
Suggestion importance[1-10]: 4

__

Why: While the concern about max_inner_result_window is valid, the PR's design intentionally requires users to update the index setting (as documented in the mvcombine.md), and the improved_code references helper.getMaxInnerResultWindow() which may not exist in the current codebase, making this suggestion speculative.

Low

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit d0714da

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit 8e5b95f

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit 70e2f57

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit aed63f7

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit adbbc79

@LantaoJin LantaoJin marked this pull request as ready for review March 5, 2026 09:41
@LantaoJin
Copy link
Member Author

cc @yuancu @penghuo

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit 1e3656d

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit e4fa9b2

case COLLECT, ARRAY_AGG -> {
TopHitsAggregationBuilder topHitsBuilder =
createTopHitsBuilder(
aggCall, args, aggName, helper, helper.queryBucketSize, false, false, null, null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is useSingleColumn set to false? It seems only one field will be returned in top hits. Besides, setting it to true will disable fetching _source to reduce data transferred on wire, which is never used by its TopHitsParser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support pushdown collect/array_agg aggregation

2 participants