Explicit head command should override fetch_size in PPL by ahkcs · Pull Request #5194 · opensearch-project/sql

ahkcs · 2026-02-27T01:08:41Z

Summary

Fixes bug 2 in #5191

When a user's PPL query contains an explicit head command, the fetch_size API parameter should not inject an additional Head node on top. Previously, AstStatementBuilder.visitPplStatement() always wrapped the plan with Head(fetchSize), creating a double-Head that capped results unexpectedly.

Example: source=index | head 100 with fetch_size=5 would only return 5 rows instead of the expected 100 rows, because the outer Head(5) silently overrode the user's explicit head 100.

Test plan

Manual verification via curl on local cluster:
- head 5 + fetch_size=10 → 5 rows ✓
- head 8 + fetch_size=3 → 8 rows ✓ (head overrides fetch_size)
- fetch_size=3 (no head) → 3 rows ✓ (fetch_size still works when no head)

github-actions · 2026-02-27T01:10:04Z

PR Reviewer Guide 🔍

(Review updated until commit `8c6d460`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Core fix — suppress fetch_size Head injection when explicit head exists Relevant files: ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java integ-test/src/test/resources/expectedOutput/calcite/explain_fetch_size_smaller_than_head_push.yaml integ-test/src/test/resources/expectedOutput/calcite/explain_fetch_size_with_head_push.yaml integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_fetch_size_smaller_than_head_push.yaml integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_fetch_size_with_head_push.yaml Sub-PR theme: Integration tests and documentation updates for fetch_size/head precedence Relevant files: integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java docs/user/interfaces/endpoint.rst
⚡ Recommended focus areas for review Incomplete Pipeline Walk The `containsHead` method walks only the first child of each node. If the AST has nodes with multiple children (e.g., a `Filter` or `Sort` node that stores its condition as a child alongside the pipeline child), the traversal may miss a `Head` node that is not always at index 0. Verify that all pipeline nodes consistently place the upstream plan at `children.get(0)`. private boolean containsHead(UnresolvedPlan plan) { UnresolvedPlan current = plan; while (current != null) { if (current instanceof Head) { return true; } List<? extends Node> children = current.getChild(); if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { break; } current = (UnresolvedPlan) children.get(0); } return false; } Subquery Scope The comment says subqueries in joins are not traversed, but the implementation relies solely on the first-child chain. If a `Join` node places the right-side subquery at `children.get(0)`, `containsHead` could accidentally detect a `Head` inside a subquery and suppress the outer `fetch_size` injection. Confirm that `Join` nodes always place the main pipeline (left side) at index 0, not the subquery. private boolean containsHead(UnresolvedPlan plan) { UnresolvedPlan current = plan; while (current != null) { if (current instanceof Head) { return true; } List<? extends Node> children = current.getChild(); if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { break; } current = (UnresolvedPlan) children.get(0); } return false; } Missing Test Method Body The new test `testHeadOverridesFetchSizeWhenSmaller` appears to be missing its method signature closing — the `@Test` annotation and method declaration are shown but the opening brace and body start at line 178 without a visible method declaration line. Verify the method compiles correctly and the annotation is properly attached. public void testHeadOverridesFetchSizeWhenSmaller() throws IOException { // Explicit head takes precedence over fetch_size // head 3 returns 3 rows regardless of fetch_size=10 JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 3 \| fields firstname", TEST_INDEX_ACCOUNT), 10); JSONArray dataRows = result.getJSONArray("datarows"); assertEquals(3, dataRows.length()); }

github-actions · 2026-02-27T01:10:18Z

PR Code Suggestions ✨

Latest suggestions up to 8c6d460

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Avoid hardcoded row count assumption in test The test asserts that `head 100` returns exactly 100 rows from `TEST_INDEX_ACCOUNT`, but this assumes the index contains at least 100 documents. If the index has fewer than 100 documents, the assertion will fail even when the feature works correctly. The assertion should use `assertTrue(dataRows.length() > 5)` or check against the actual document count to make the test robust. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194] public void testHeadOverridesFetchSizeWhenLarger() throws IOException { // Explicit head takes precedence over fetch_size - // head 100 should return 100 rows even though fetch_size=5 + // head 100 should return more rows than fetch_size=5 would allow JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 100 \| fields firstname", TEST_INDEX_ACCOUNT), 5); JSONArray dataRows = result.getJSONArray("datarows"); - assertEquals(100, dataRows.length()); + // fetch_size=5 is ignored; result should have more than 5 rows + assertTrue(dataRows.length() > 5); } Suggestion importance[1-10]: 6 __ Why: The test asserts exactly 100 rows from `TEST_INDEX_ACCOUNT`, which could fail if the index has fewer than 100 documents. Using `assertTrue(dataRows.length() > 5)` would make the test more robust and correctly validate that `fetch_size=5` is being ignored.	Low
General	Verify first-child traversal covers all pipeline node types The `containsHead` method walks only the first child of each node, but some pipeline nodes (e.g., `Project`) may store their child at a different index or use a different child ordering. If the pipeline structure places the main child at an index other than 0, the traversal will stop prematurely and miss a `Head` node. Consider verifying that `children.get(0)` is consistently the "upstream pipeline" child for all relevant node types, or document this assumption explicitly. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93] +/** + * Checks if the main pipeline contains a {@link Head} node by walking the first-child chain. + * Assumes that for all pipeline nodes, the upstream plan is always children.get(0). + */ private boolean containsHead(UnresolvedPlan plan) { UnresolvedPlan current = plan; while (current != null) { if (current instanceof Head) { return true; } List<? extends Node> children = current.getChild(); if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { break; } current = (UnresolvedPlan) children.get(0); } return false; } Suggestion importance[1-10]: 3 __ Why: The suggestion asks to verify an assumption about child ordering rather than fixing a concrete bug. The `improved_code` is nearly identical to `existing_code` (only adds a comment), making this a documentation suggestion rather than a code fix.	Low
Possible issue	Use safe JSON construction for request body The `query` string is interpolated directly into a JSON string using `String.format`, but it contains double quotes (e.g., around index names or string literals) and special characters that could break the JSON structure or cause a parse error. Use a `JSONObject` to build the request body safely instead of manual string formatting. integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java [1099-1100] -request.setJsonEntity( - String.format(Locale.ROOT, "{\"query\": \"%s\", \"fetch_size\": 2}", query)); +JSONObject body = new JSONObject(); +body.put("query", query); +body.put("fetch_size", 2); +request.setJsonEntity(body.toString()); Suggestion importance[1-10]: 5 __ Why: The `query` string contains special characters (brackets, pipes, equals signs) that could potentially break the manually constructed JSON. Using `JSONObject` to build the request body is safer and avoids potential JSON injection issues.	Low

Previous suggestions

Suggestions up to commit 713b095

Category	Suggestion	Impact
Possible issue	Fix fragile assertion assuming index has exactly 100 documents The test asserts that `head 100` returns exactly 100 rows from `TEST_INDEX_ACCOUNT`. However, if the index contains fewer than 100 documents, the assertion will fail even when the feature works correctly. You should either assert `<= 100` rows, or use a known dataset size, or assert that the result is greater than `fetch_size` (5) to validate the override behavior. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194] public void testHeadOverridesFetchSizeWhenLarger() throws IOException { // Explicit head takes precedence over fetch_size - // head 100 should return 100 rows even though fetch_size=5 + // head 100 should return more rows than fetch_size=5 (up to 100) JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 100 \| fields firstname", TEST_INDEX_ACCOUNT), 5); JSONArray dataRows = result.getJSONArray("datarows"); - assertEquals(100, dataRows.length()); + // Result should be more than fetch_size=5, proving head takes precedence + assertTrue(dataRows.length() > 5); + assertTrue(dataRows.length() <= 100); } Suggestion importance[1-10]: 7 __ Why: The assertion `assertEquals(100, dataRows.length())` is fragile if `TEST_INDEX_ACCOUNT` has fewer than 100 documents. The suggestion to use `assertTrue(dataRows.length() > 5)` and `assertTrue(dataRows.length() <= 100)` is more robust and correctly validates the intended behavior (head overrides fetch_size) without depending on the exact dataset size.	Medium
General	Clarify main-pipeline traversal assumption for multi-child nodes The `containsHead` method only walks the first child of each node. For nodes that have multiple children (e.g., a `Join` node where the main pipeline is not always the first child), this could incorrectly detect a `Head` in a subquery branch as being in the main pipeline, or miss the main pipeline entirely. The comment says subqueries are not traversed, but the implementation relies on the assumption that the main pipeline is always `children.get(0)`, which may not hold for all plan node types. Consider documenting or asserting this assumption, or verifying it holds for all relevant node types (e.g., `Join`). ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93] +/** + * Checks if the main pipeline contains a {@link Head} node by walking the first-child chain. + * This relies on the convention that the main pipeline child is always at index 0. + * Nodes like Join may have the main pipeline at a different index — subqueries are intentionally + * not traversed, so a Head inside a subquery will not be detected here. + */ private boolean containsHead(UnresolvedPlan plan) { UnresolvedPlan current = plan; while (current != null) { if (current instanceof Head) { return true; } List<? extends Node> children = current.getChild(); if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { break; } current = (UnresolvedPlan) children.get(0); } return false; } Suggestion importance[1-10]: 2 __ Why: The suggestion only adds a comment/documentation to the existing code without changing any logic. The existing code already has a Javadoc comment explaining the behavior, and the `improved_code` is functionally identical to `existing_code` with only comment changes, which should score low per guidelines.	Low

Suggestions up to commit 713b095

Category	Suggestion	Impact
Possible issue	Fix flaky assertion against potentially small dataset The test asserts that `head 100` returns exactly 100 rows from `TEST_INDEX_ACCOUNT`, but if the index contains fewer than 100 documents, the assertion will fail. You should either assert `<= 100` or use an index known to have at least 100 documents to make this test reliable. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194] public void testHeadOverridesFetchSizeWhenLarger() throws IOException { // Explicit head takes precedence over fetch_size - // head 100 should return 100 rows even though fetch_size=5 + // head 100 should return up to 100 rows even though fetch_size=5 JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 100 \| fields firstname", TEST_INDEX_ACCOUNT), 5); JSONArray dataRows = result.getJSONArray("datarows"); - assertEquals(100, dataRows.length()); + assertTrue(dataRows.length() <= 100); + // fetch_size=5 must not have limited the result to 5 + assertTrue(dataRows.length() > 5); } Suggestion importance[1-10]: 7 __ Why: This is a valid concern — asserting `assertEquals(100, dataRows.length())` when `TEST_INDEX_ACCOUNT` may have fewer than 100 documents would cause a flaky test. The improved assertion using `assertTrue(dataRows.length() <= 100)` combined with `assertTrue(dataRows.length() > 5)` is more robust and correctly validates the intended behavior.	Medium
General	Improve pipeline traversal robustness in head detection The `containsHead` method walks only the first child of each node, but some plan nodes (e.g., `Project`) may wrap the pipeline differently and the `Head` node might not always be reachable via `children.get(0)`. Consider verifying that all relevant plan node types expose their pipeline child as the first element, or use a more robust traversal (e.g., checking all children or using a visitor pattern) to avoid false negatives where a `Head` node exists but is missed. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93] private boolean containsHead(UnresolvedPlan plan) { UnresolvedPlan current = plan; while (current != null) { if (current instanceof Head) { return true; } List<? extends Node> children = current.getChild(); - if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { + UnresolvedPlan next = null; + for (Node child : children) { + if (child instanceof UnresolvedPlan) { + next = (UnresolvedPlan) child; + break; + } + } + if (next == null) { break; } - current = (UnresolvedPlan) children.get(0); + current = next; } return false; } Suggestion importance[1-10]: 3 __ Why: The suggestion proposes iterating all children instead of just `children.get(0)`, but the `improved_code` still only picks the first `UnresolvedPlan` child found — functionally equivalent to the original for most cases. The change adds minimal robustness and the comment about `Project` wrapping differently is speculative without concrete evidence from the codebase.	Low

Suggestions up to commit d7a93b3

Category	Suggestion	Impact
Possible issue	Fix fragile assertion assuming index has 100+ documents The test asserts that `head 100` returns exactly 100 rows from `TEST_INDEX_ACCOUNT`, but if the index contains fewer than 100 documents, the assertion will fail. You should either assert `<= 100` or use an index known to have at least 100 documents, or assert against the actual document count of the index. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194] public void testHeadOverridesFetchSizeWhenLarger() throws IOException { // Explicit head takes precedence over fetch_size - // head 100 should return 100 rows even though fetch_size=5 + // head 100 should return up to 100 rows even though fetch_size=5 JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 100 \| fields firstname", TEST_INDEX_ACCOUNT), 5); JSONArray dataRows = result.getJSONArray("datarows"); - assertEquals(100, dataRows.length()); + // Assert that fetch_size=5 did not cap the result; actual count depends on index size + assertTrue(dataRows.length() > 5); } Suggestion importance[1-10]: 7 __ Why: This is a valid concern — if `TEST_INDEX_ACCOUNT` has fewer than 100 documents, `assertEquals(100, dataRows.length())` will fail. The suggestion to use `assertTrue(dataRows.length() > 5)` is a reasonable fix to verify that `fetch_size=5` did not cap the result without assuming the exact document count.	Medium
General	Replace hardcoded document count with named constant The test hardcodes `7` as the expected row count for `TEST_INDEX_BANK`. If the index size ever changes (e.g., due to test data updates), this assertion will silently break. Consider using a named constant or fetching the total count dynamically to make the test more robust. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [231-237] public void testHeadLargerThanDatasetWithFetchSize() throws IOException { // head 1000 on a 7-row index with fetch_size=3: head takes precedence, returns all 7 rows + // TEST_INDEX_BANK is known to have exactly BANK_INDEX_DOCUMENT_COUNT documents JSONObject result = executeQueryWithFetchSize(String.format("source=%s \| head 1000", TEST_INDEX_BANK), 3); JSONArray dataRows = result.getJSONArray("datarows"); - assertEquals(7, dataRows.length()); + assertEquals(BANK_INDEX_DOCUMENT_COUNT, dataRows.length()); } Suggestion importance[1-10]: 3 __ Why: Using a named constant like `BANK_INDEX_DOCUMENT_COUNT` instead of the hardcoded `7` improves maintainability, but this assumes such a constant exists in the codebase. The suggestion is a minor style improvement with low risk.	Low

Suggestions up to commit 3039353

Category	Suggestion	Impact
Possible issue	Fix fragile assertion against dataset size The test asserts that `head 100` returns exactly 100 rows from `TEST_INDEX_ACCOUNT`, but if the index contains fewer than 100 documents, the assertion will fail. You should either assert `<= 100` rows, or use a known dataset size, or assert that the result is not capped at `fetch_size=5`. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194] public void testHeadOverridesFetchSizeWhenLarger() throws IOException { // Explicit head takes precedence over fetch_size - // head 100 should return 100 rows even though fetch_size=5 + // head 100 should return up to 100 rows even though fetch_size=5 JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 100 \| fields firstname", TEST_INDEX_ACCOUNT), 5); JSONArray dataRows = result.getJSONArray("datarows"); - assertEquals(100, dataRows.length()); + // Result should not be capped at fetch_size=5; it may be less than 100 if index has fewer docs + assertTrue(dataRows.length() > 5); } Suggestion importance[1-10]: 7 __ Why: The test asserts `assertEquals(100, dataRows.length())` but `TEST_INDEX_ACCOUNT` may not contain 100 documents, making this assertion potentially fragile. The suggestion to use `assertTrue(dataRows.length() > 5)` better captures the intent of verifying that `fetch_size=5` is not applied, without depending on the exact dataset size.	Medium
General	Robust pipeline child traversal in head detection The `containsHead` method walks only the first child of each node, but some plan nodes (e.g., `Project`) may wrap the pipeline differently and the `Head` node might not always be reachable via `children.get(0)`. For example, if `Project` stores its child at a different index or uses a different child ordering, the traversal would miss the `Head`. Consider verifying that the child traversal order matches how `UnresolvedPlan` nodes actually store their pipeline children, or use a more robust recursive search. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93] private boolean containsHead(UnresolvedPlan plan) { UnresolvedPlan current = plan; while (current != null) { if (current instanceof Head) { return true; } List<? extends Node> children = current.getChild(); - if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { + UnresolvedPlan next = null; + for (Node child : children) { + if (child instanceof UnresolvedPlan) { + next = (UnresolvedPlan) child; + break; + } + } + if (next == null) { break; } - current = (UnresolvedPlan) children.get(0); + current = next; } return false; } Suggestion importance[1-10]: 3 __ Why: The suggestion proposes iterating over all children to find the first `UnresolvedPlan` child, rather than always taking `children.get(0)`. However, the existing code already checks `children.get(0) instanceof UnresolvedPlan` before casting, and PPL pipeline nodes consistently store their main pipeline child at index 0. The improvement is marginal and the existing tests (including `buildQueryStatementWithFetchSizeAndHeadFollowedByFields`) validate the current traversal works correctly.	Low

Suggestions up to commit 4cacb79

Category	Suggestion	Impact
Possible issue	Verify dataset size assumption in test The test asserts that `head 100` returns exactly 100 rows from `TEST_INDEX_ACCOUNT`. If the account index has fewer than 100 documents, this assertion will fail. You should verify that `TEST_INDEX_ACCOUNT` has at least 100 documents, or adjust the assertion to use `assertTrue(dataRows.length() <= 100)` combined with a check that all available rows are returned. integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-193] public void testHeadOverridesFetchSizeWhenLarger() throws IOException { // Explicit head takes precedence over fetch_size // head 100 should return 100 rows even though fetch_size=5 + // (TEST_INDEX_ACCOUNT must have >= 100 documents for this assertion to hold) JSONObject result = executeQueryWithFetchSize( String.format("source=%s \| head 100 \| fields firstname", TEST_INDEX_ACCOUNT), 5); JSONArray dataRows = result.getJSONArray("datarows"); assertEquals(100, dataRows.length()); } Suggestion importance[1-10]: 6 __ Why: This is a valid concern — if `TEST_INDEX_ACCOUNT` has fewer than 100 documents, `assertEquals(100, dataRows.length())` will fail. However, the `improved_code` only adds a comment without actually changing the assertion logic, making it a verification suggestion rather than a fix.	Low
General	Use recursion for cleaner pipeline traversal The `containsHead` method walks only the first child of each node. For nodes like `Project` that wrap the pipeline, the first child is the inner plan, so this works for most cases. However, if a node has multiple children and the `Head` is not in the first-child chain (e.g., after a `Filter` that has the head as a sibling), it could be missed. Consider using `getChild()` to check all children or using a recursive traversal to ensure correctness across all pipeline shapes. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93] private boolean containsHead(UnresolvedPlan plan) { - UnresolvedPlan current = plan; - while (current != null) { - if (current instanceof Head) { - return true; - } - List<? extends Node> children = current.getChild(); - if (children.isEmpty() \|\| !(children.get(0) instanceof UnresolvedPlan)) { - break; - } - current = (UnresolvedPlan) children.get(0); + if (plan == null) return false; + if (plan instanceof Head) return true; + List<? extends Node> children = plan.getChild(); + if (children.isEmpty()) return false; + // Only traverse the main pipeline (first child chain), not subqueries + Node firstChild = children.get(0); + if (firstChild instanceof UnresolvedPlan) { + return containsHead((UnresolvedPlan) firstChild); } return false; } Suggestion importance[1-10]: 3 __ Why: The suggestion refactors the iterative traversal to a recursive one, but the logic is functionally equivalent — both only follow the first child chain. The improved code doesn't actually fix any real bug; it's a style preference with no meaningful behavioral difference.	Low

When a user's PPL query contains an explicit head command, the fetch_size parameter should not inject an additional Head node on top. Previously, fetch_size always wrapped the plan with Head(fetchSize), creating a double-Head that could cap results unexpectedly (e.g., head 100 with fetch_size=5 would only return 5 rows). This adds a containsHead() check in AstStatementBuilder to skip the Head injection when the user's query already has a head command, letting the user's explicit intent take precedence. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-02-27T01:11:32Z

Persistent review updated to latest commit ff1fd32

Remove the outer LogicalSort(fetch=[fetchSize]) from expected explain plans when an explicit head command is present, since fetch_size no longer injects a Head node in that case. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-02-27T03:42:28Z

Persistent review updated to latest commit 7f797a9

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java

Verifies that when a query has multiple heads (e.g., head 3 | head 500), fetch_size does not inject yet another Head node. The existing Head nodes are preserved as-is, with the inner head 3 limiting first. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-02-27T18:58:50Z

Persistent review updated to latest commit adea087

Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-02-27T19:05:47Z

Persistent review updated to latest commit e62de53

Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-02-27T21:55:09Z

Persistent review updated to latest commit a6e1216

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java

Change from recursive all-children traversal to iterative first-child chain walk. This ensures head commands inside join subqueries or nested structures do not incorrectly suppress fetch_size injection on the outer query. Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-03-02T20:09:52Z

Persistent review updated to latest commit 5841f08

Explain that since PPL's fetch_size has no pagination support, it acts as a default response size rather than an absolute cap. When the user specifies an explicit head command, fetch_size yields to avoid silently discarding rows with no way to retrieve them. Also document that head commands inside subqueries do not affect fetch_size for the outer query. Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-03-02T20:16:00Z

Persistent review updated to latest commit 4cacb79

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java

Verifies that containsHead correctly walks past Project (from fields) to find the Head node underneath. Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-03-02T21:46:43Z

Persistent review updated to latest commit 3039353

dai-chen · 2026-03-02T22:39:29Z

docs/user/interfaces/endpoint.rst

+-----------------------------------------------+---------------------------------------------------+
+| Query                                         | Behavior with ``fetch_size=5``                    |
+===============================================+===================================================+
+| ``source=t \| fields age``                    | ``fetch_size`` applies — returns 5 rows           |


np: is the \ required?

Nice catch! Removed

Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-03-02T22:48:34Z

Persistent review updated to latest commit d7a93b3

github-actions · 2026-03-02T22:49:41Z

Persistent review updated to latest commit 713b095

github-actions · 2026-03-02T22:50:24Z

Persistent review updated to latest commit 713b095

penghuo · 2026-03-03T19:13:31Z

docs/user/interfaces/endpoint.rst

+===============================================+===================================================+
+| ``source=t | fields age``                     | ``fetch_size`` applies — returns 5 rows           |
+-----------------------------------------------+---------------------------------------------------+
+| ``source=t | head 100 | fields age``          | ``head`` takes precedence — returns 100 rows      |


fetch_size is optional — if you don't want it, don't send it. The server should honor what the client explicitly requests. The fix belongs in the client that's injecting fetch_size by default.

From a REST API perspective, fetch_size is an explicit, optional parameter the caller chooses to send. If the caller sends fetch_size=5, they're requesting at most 5 rows — that's the API contract. Adding server-side logic to silently ignore that parameter based on query content breaks the principle of least surprise at the API layer.

I think this is a very valid point. If we fix it server-side, we're working around a client behavior by making the API semantics inconsistent. If we fix it client-side, the API contract stays clean.

Opened PR on OSD side: opensearch-project/OpenSearch-Dashboards#11430

Validates that head commands inside join subqueries do not suppress fetch_size on the outer query, locking down the scoping guarantee. Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-03-03T20:34:39Z

Persistent review updated to latest commit 8c6d460

ahkcs force-pushed the fix/fetch-size-head-total-hits branch from d2b35ee to ff1fd32 Compare February 27, 2026 01:10

dai-chen reviewed Feb 27, 2026

View reviewed changes

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java Outdated Show resolved Hide resolved

spotless

e62de53

Signed-off-by: Kai Huang <ahkcs@amazon.com>

ahkcs requested a review from dai-chen February 27, 2026 21:09

Add documentation for fetch_size and head command interaction

a6e1216

Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

dai-chen reviewed Mar 2, 2026

View reviewed changes

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java Outdated Show resolved Hide resolved

ahkcs requested a review from dai-chen March 2, 2026 18:55

dai-chen reviewed Mar 2, 2026

View reviewed changes

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java Show resolved Hide resolved

Add test case for head followed by fields with fetch_size

3039353

Verifies that containsHead correctly walks past Project (from fields) to find the Head node underneath. Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

dai-chen reviewed Mar 2, 2026

View reviewed changes

dai-chen previously approved these changes Mar 2, 2026

View reviewed changes

ahkcs dismissed dai-chen’s stale review via b1fe41b March 2, 2026 22:47

ahkcs force-pushed the fix/fetch-size-head-total-hits branch from b1fe41b to d7a93b3 Compare March 2, 2026 22:48

Remove unnecessary backslash escapes in RST table pipe characters

713b095

Signed-off-by: Kai Huang <ahkcs@amazon.com>

ahkcs force-pushed the fix/fetch-size-head-total-hits branch from d7a93b3 to 713b095 Compare March 2, 2026 22:48

ahkcs requested a review from dai-chen March 3, 2026 17:08

penghuo reviewed Mar 3, 2026

View reviewed changes

Add integration test for fetch_size with head in join subquery

8c6d460

Validates that head commands inside join subqueries do not suppress fetch_size on the outer query, locking down the scoping guarantee. Signed-off-by: ahkcs <austinhkcs@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

Conversation

ahkcs commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 8c6d460)

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

dai-chen Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

penghuo Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ahkcs commented Feb 27, 2026 •

edited

Loading

github-actions bot commented Feb 27, 2026 •

edited

Loading

(Review updated until commit `8c6d460`)

github-actions bot commented Feb 27, 2026 •

edited

Loading