Skip to content

Explicit head command should override fetch_size in PPL#5194

Open
ahkcs wants to merge 10 commits intoopensearch-project:mainfrom
ahkcs:fix/fetch-size-head-total-hits
Open

Explicit head command should override fetch_size in PPL#5194
ahkcs wants to merge 10 commits intoopensearch-project:mainfrom
ahkcs:fix/fetch-size-head-total-hits

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Feb 27, 2026

Summary

Fixes bug 2 in #5191

When a user's PPL query contains an explicit head command, the fetch_size API parameter should not inject an additional Head node on top. Previously, AstStatementBuilder.visitPplStatement() always wrapped the plan with Head(fetchSize), creating a double-Head that capped results unexpectedly.

Example: source=index | head 100 with fetch_size=5 would only return 5 rows instead of the expected 100 rows, because the outer Head(5) silently overrode the user's explicit head 100.

Test plan

  • Manual verification via curl on local cluster:
    • head 5 + fetch_size=10 → 5 rows ✓
    • head 8 + fetch_size=3 → 8 rows ✓ (head overrides fetch_size)
    • fetch_size=3 (no head) → 3 rows ✓ (fetch_size still works when no head)

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

PR Reviewer Guide 🔍

(Review updated until commit 8c6d460)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core fix — suppress fetch_size Head injection when explicit head exists

Relevant files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fetch_size_smaller_than_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fetch_size_with_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_fetch_size_smaller_than_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_fetch_size_with_head_push.yaml

Sub-PR theme: Integration tests and documentation updates for fetch_size/head precedence

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • docs/user/interfaces/endpoint.rst

⚡ Recommended focus areas for review

Incomplete Pipeline Walk

The containsHead method walks only the first child of each node. If the AST has nodes with multiple children (e.g., a Filter or Sort node that stores its condition as a child alongside the pipeline child), the traversal may miss a Head node that is not always at index 0. Verify that all pipeline nodes consistently place the upstream plan at children.get(0).

private boolean containsHead(UnresolvedPlan plan) {
  UnresolvedPlan current = plan;
  while (current != null) {
    if (current instanceof Head) {
      return true;
    }
    List<? extends Node> children = current.getChild();
    if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
      break;
    }
    current = (UnresolvedPlan) children.get(0);
  }
  return false;
}
Subquery Scope

The comment says subqueries in joins are not traversed, but the implementation relies solely on the first-child chain. If a Join node places the right-side subquery at children.get(0), containsHead could accidentally detect a Head inside a subquery and suppress the outer fetch_size injection. Confirm that Join nodes always place the main pipeline (left side) at index 0, not the subquery.

private boolean containsHead(UnresolvedPlan plan) {
  UnresolvedPlan current = plan;
  while (current != null) {
    if (current instanceof Head) {
      return true;
    }
    List<? extends Node> children = current.getChild();
    if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
      break;
    }
    current = (UnresolvedPlan) children.get(0);
  }
  return false;
}
Missing Test Method Body

The new test testHeadOverridesFetchSizeWhenSmaller appears to be missing its method signature closing — the @Test annotation and method declaration are shown but the opening brace and body start at line 178 without a visible method declaration line. Verify the method compiles correctly and the annotation is properly attached.

public void testHeadOverridesFetchSizeWhenSmaller() throws IOException {
  // Explicit head takes precedence over fetch_size
  // head 3 returns 3 rows regardless of fetch_size=10
  JSONObject result =
      executeQueryWithFetchSize(
          String.format("source=%s | head 3 | fields firstname", TEST_INDEX_ACCOUNT), 10);
  JSONArray dataRows = result.getJSONArray("datarows");
  assertEquals(3, dataRows.length());
}

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

PR Code Suggestions ✨

Latest suggestions up to 8c6d460

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Avoid hardcoded row count assumption in test

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT, but
this assumes the index contains at least 100 documents. If the index has fewer than
100 documents, the assertion will fail even when the feature works correctly. The
assertion should use assertTrue(dataRows.length() > 5) or check against the actual
document count to make the test robust.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
-  // head 100 should return 100 rows even though fetch_size=5
+  // head 100 should return more rows than fetch_size=5 would allow
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(100, dataRows.length());
+  // fetch_size=5 is ignored; result should have more than 5 rows
+  assertTrue(dataRows.length() > 5);
 }
Suggestion importance[1-10]: 6

__

Why: The test asserts exactly 100 rows from TEST_INDEX_ACCOUNT, which could fail if the index has fewer than 100 documents. Using assertTrue(dataRows.length() > 5) would make the test more robust and correctly validate that fetch_size=5 is being ignored.

Low
Verify first-child traversal covers all pipeline node types

The containsHead method walks only the first child of each node, but some pipeline
nodes (e.g., Project) may store their child at a different index or use a different
child ordering. If the pipeline structure places the main child at an index other
than 0, the traversal will stop prematurely and miss a Head node. Consider verifying
that children.get(0) is consistently the "upstream pipeline" child for all relevant
node types, or document this assumption explicitly.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93]

+/**
+ * Checks if the main pipeline contains a {@link Head} node by walking the first-child chain.
+ * Assumes that for all pipeline nodes, the upstream plan is always children.get(0).
+ */
 private boolean containsHead(UnresolvedPlan plan) {
   UnresolvedPlan current = plan;
   while (current != null) {
     if (current instanceof Head) {
       return true;
     }
     List<? extends Node> children = current.getChild();
     if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
       break;
     }
     current = (UnresolvedPlan) children.get(0);
   }
   return false;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify an assumption about child ordering rather than fixing a concrete bug. The improved_code is nearly identical to existing_code (only adds a comment), making this a documentation suggestion rather than a code fix.

Low
Possible issue
Use safe JSON construction for request body

The query string is interpolated directly into a JSON string using String.format,
but it contains double quotes (e.g., around index names or string literals) and
special characters that could break the JSON structure or cause a parse error. Use a
JSONObject to build the request body safely instead of manual string formatting.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java [1099-1100]

-request.setJsonEntity(
-    String.format(Locale.ROOT, "{\"query\": \"%s\", \"fetch_size\": 2}", query));
+JSONObject body = new JSONObject();
+body.put("query", query);
+body.put("fetch_size", 2);
+request.setJsonEntity(body.toString());
Suggestion importance[1-10]: 5

__

Why: The query string contains special characters (brackets, pipes, equals signs) that could potentially break the manually constructed JSON. Using JSONObject to build the request body is safer and avoids potential JSON injection issues.

Low

Previous suggestions

Suggestions up to commit 713b095
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix fragile assertion assuming index has exactly 100 documents

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT.
However, if the index contains fewer than 100 documents, the assertion will fail
even when the feature works correctly. You should either assert <= 100 rows, or use
a known dataset size, or assert that the result is greater than fetch_size (5) to
validate the override behavior.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
-  // head 100 should return 100 rows even though fetch_size=5
+  // head 100 should return more rows than fetch_size=5 (up to 100)
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(100, dataRows.length());
+  // Result should be more than fetch_size=5, proving head takes precedence
+  assertTrue(dataRows.length() > 5);
+  assertTrue(dataRows.length() <= 100);
 }
Suggestion importance[1-10]: 7

__

Why: The assertion assertEquals(100, dataRows.length()) is fragile if TEST_INDEX_ACCOUNT has fewer than 100 documents. The suggestion to use assertTrue(dataRows.length() > 5) and assertTrue(dataRows.length() <= 100) is more robust and correctly validates the intended behavior (head overrides fetch_size) without depending on the exact dataset size.

Medium
General
Clarify main-pipeline traversal assumption for multi-child nodes

The containsHead method only walks the first child of each node. For nodes that have
multiple children (e.g., a Join node where the main pipeline is not always the first
child), this could incorrectly detect a Head in a subquery branch as being in the
main pipeline, or miss the main pipeline entirely. The comment says subqueries are
not traversed, but the implementation relies on the assumption that the main
pipeline is always children.get(0), which may not hold for all plan node types.
Consider documenting or asserting this assumption, or verifying it holds for all
relevant node types (e.g., Join).

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93]

+/**
+ * Checks if the main pipeline contains a {@link Head} node by walking the first-child chain.
+ * This relies on the convention that the main pipeline child is always at index 0.
+ * Nodes like Join may have the main pipeline at a different index — subqueries are intentionally
+ * not traversed, so a Head inside a subquery will not be detected here.
+ */
 private boolean containsHead(UnresolvedPlan plan) {
   UnresolvedPlan current = plan;
   while (current != null) {
     if (current instanceof Head) {
       return true;
     }
     List<? extends Node> children = current.getChild();
     if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
       break;
     }
     current = (UnresolvedPlan) children.get(0);
   }
   return false;
 }
Suggestion importance[1-10]: 2

__

Why: The suggestion only adds a comment/documentation to the existing code without changing any logic. The existing code already has a Javadoc comment explaining the behavior, and the improved_code is functionally identical to existing_code with only comment changes, which should score low per guidelines.

Low
Suggestions up to commit 713b095
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix flaky assertion against potentially small dataset

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT, but
if the index contains fewer than 100 documents, the assertion will fail. You should
either assert <= 100 or use an index known to have at least 100 documents to make
this test reliable.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
-  // head 100 should return 100 rows even though fetch_size=5
+  // head 100 should return up to 100 rows even though fetch_size=5
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(100, dataRows.length());
+  assertTrue(dataRows.length() <= 100);
+  // fetch_size=5 must not have limited the result to 5
+  assertTrue(dataRows.length() > 5);
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern — asserting assertEquals(100, dataRows.length()) when TEST_INDEX_ACCOUNT may have fewer than 100 documents would cause a flaky test. The improved assertion using assertTrue(dataRows.length() <= 100) combined with assertTrue(dataRows.length() > 5) is more robust and correctly validates the intended behavior.

Medium
General
Improve pipeline traversal robustness in head detection

The containsHead method walks only the first child of each node, but some plan nodes
(e.g., Project) may wrap the pipeline differently and the Head node might not always
be reachable via children.get(0). Consider verifying that all relevant plan node
types expose their pipeline child as the first element, or use a more robust
traversal (e.g., checking all children or using a visitor pattern) to avoid false
negatives where a Head node exists but is missed.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93]

 private boolean containsHead(UnresolvedPlan plan) {
   UnresolvedPlan current = plan;
   while (current != null) {
     if (current instanceof Head) {
       return true;
     }
     List<? extends Node> children = current.getChild();
-    if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
+    UnresolvedPlan next = null;
+    for (Node child : children) {
+      if (child instanceof UnresolvedPlan) {
+        next = (UnresolvedPlan) child;
+        break;
+      }
+    }
+    if (next == null) {
       break;
     }
-    current = (UnresolvedPlan) children.get(0);
+    current = next;
   }
   return false;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes iterating all children instead of just children.get(0), but the improved_code still only picks the first UnresolvedPlan child found — functionally equivalent to the original for most cases. The change adds minimal robustness and the comment about Project wrapping differently is speculative without concrete evidence from the codebase.

Low
Suggestions up to commit d7a93b3
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix fragile assertion assuming index has 100+ documents

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT, but
if the index contains fewer than 100 documents, the assertion will fail. You should
either assert <= 100 or use an index known to have at least 100 documents, or assert
against the actual document count of the index.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
-  // head 100 should return 100 rows even though fetch_size=5
+  // head 100 should return up to 100 rows even though fetch_size=5
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(100, dataRows.length());
+  // Assert that fetch_size=5 did not cap the result; actual count depends on index size
+  assertTrue(dataRows.length() > 5);
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern — if TEST_INDEX_ACCOUNT has fewer than 100 documents, assertEquals(100, dataRows.length()) will fail. The suggestion to use assertTrue(dataRows.length() > 5) is a reasonable fix to verify that fetch_size=5 did not cap the result without assuming the exact document count.

Medium
General
Replace hardcoded document count with named constant

The test hardcodes 7 as the expected row count for TEST_INDEX_BANK. If the index
size ever changes (e.g., due to test data updates), this assertion will silently
break. Consider using a named constant or fetching the total count dynamically to
make the test more robust.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [231-237]

 public void testHeadLargerThanDatasetWithFetchSize() throws IOException {
   // head 1000 on a 7-row index with fetch_size=3: head takes precedence, returns all 7 rows
+  // TEST_INDEX_BANK is known to have exactly BANK_INDEX_DOCUMENT_COUNT documents
   JSONObject result =
       executeQueryWithFetchSize(String.format("source=%s | head 1000", TEST_INDEX_BANK), 3);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(7, dataRows.length());
+  assertEquals(BANK_INDEX_DOCUMENT_COUNT, dataRows.length());
 }
Suggestion importance[1-10]: 3

__

Why: Using a named constant like BANK_INDEX_DOCUMENT_COUNT instead of the hardcoded 7 improves maintainability, but this assumes such a constant exists in the codebase. The suggestion is a minor style improvement with low risk.

Low
Suggestions up to commit 3039353
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix fragile assertion against dataset size

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT, but
if the index contains fewer than 100 documents, the assertion will fail. You should
either assert <= 100 rows, or use a known dataset size, or assert that the result is
not capped at fetch_size=5.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
-  // head 100 should return 100 rows even though fetch_size=5
+  // head 100 should return up to 100 rows even though fetch_size=5
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(100, dataRows.length());
+  // Result should not be capped at fetch_size=5; it may be less than 100 if index has fewer docs
+  assertTrue(dataRows.length() > 5);
 }
Suggestion importance[1-10]: 7

__

Why: The test asserts assertEquals(100, dataRows.length()) but TEST_INDEX_ACCOUNT may not contain 100 documents, making this assertion potentially fragile. The suggestion to use assertTrue(dataRows.length() > 5) better captures the intent of verifying that fetch_size=5 is not applied, without depending on the exact dataset size.

Medium
General
Robust pipeline child traversal in head detection

The containsHead method walks only the first child of each node, but some plan nodes
(e.g., Project) may wrap the pipeline differently and the Head node might not always
be reachable via children.get(0). For example, if Project stores its child at a
different index or uses a different child ordering, the traversal would miss the
Head. Consider verifying that the child traversal order matches how UnresolvedPlan
nodes actually store their pipeline children, or use a more robust recursive search.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93]

 private boolean containsHead(UnresolvedPlan plan) {
   UnresolvedPlan current = plan;
   while (current != null) {
     if (current instanceof Head) {
       return true;
     }
     List<? extends Node> children = current.getChild();
-    if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
+    UnresolvedPlan next = null;
+    for (Node child : children) {
+      if (child instanceof UnresolvedPlan) {
+        next = (UnresolvedPlan) child;
+        break;
+      }
+    }
+    if (next == null) {
       break;
     }
-    current = (UnresolvedPlan) children.get(0);
+    current = next;
   }
   return false;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes iterating over all children to find the first UnresolvedPlan child, rather than always taking children.get(0). However, the existing code already checks children.get(0) instanceof UnresolvedPlan before casting, and PPL pipeline nodes consistently store their main pipeline child at index 0. The improvement is marginal and the existing tests (including buildQueryStatementWithFetchSizeAndHeadFollowedByFields) validate the current traversal works correctly.

Low
Suggestions up to commit 4cacb79
CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify dataset size assumption in test

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT. If
the account index has fewer than 100 documents, this assertion will fail. You should
verify that TEST_INDEX_ACCOUNT has at least 100 documents, or adjust the assertion
to use assertTrue(dataRows.length() <= 100) combined with a check that all available
rows are returned.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-193]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
   // head 100 should return 100 rows even though fetch_size=5
+  // (TEST_INDEX_ACCOUNT must have >= 100 documents for this assertion to hold)
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
   assertEquals(100, dataRows.length());
 }
Suggestion importance[1-10]: 6

__

Why: This is a valid concern — if TEST_INDEX_ACCOUNT has fewer than 100 documents, assertEquals(100, dataRows.length()) will fail. However, the improved_code only adds a comment without actually changing the assertion logic, making it a verification suggestion rather than a fix.

Low
General
Use recursion for cleaner pipeline traversal

The containsHead method walks only the first child of each node. For nodes like
Project that wrap the pipeline, the first child is the inner plan, so this works for
most cases. However, if a node has multiple children and the Head is not in the
first-child chain (e.g., after a Filter that has the head as a sibling), it could be
missed. Consider using getChild() to check all children or using a recursive
traversal to ensure correctness across all pipeline shapes.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [80-93]

 private boolean containsHead(UnresolvedPlan plan) {
-  UnresolvedPlan current = plan;
-  while (current != null) {
-    if (current instanceof Head) {
-      return true;
-    }
-    List<? extends Node> children = current.getChild();
-    if (children.isEmpty() || !(children.get(0) instanceof UnresolvedPlan)) {
-      break;
-    }
-    current = (UnresolvedPlan) children.get(0);
+  if (plan == null) return false;
+  if (plan instanceof Head) return true;
+  List<? extends Node> children = plan.getChild();
+  if (children.isEmpty()) return false;
+  // Only traverse the main pipeline (first child chain), not subqueries
+  Node firstChild = children.get(0);
+  if (firstChild instanceof UnresolvedPlan) {
+    return containsHead((UnresolvedPlan) firstChild);
   }
   return false;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion refactors the iterative traversal to a recursive one, but the logic is functionally equivalent — both only follow the first child chain. The improved code doesn't actually fix any real bug; it's a style preference with no meaningful behavioral difference.

Low

When a user's PPL query contains an explicit head command, the
fetch_size parameter should not inject an additional Head node on top.
Previously, fetch_size always wrapped the plan with Head(fetchSize),
creating a double-Head that could cap results unexpectedly (e.g.,
head 100 with fetch_size=5 would only return 5 rows).

This adds a containsHead() check in AstStatementBuilder to skip the
Head injection when the user's query already has a head command,
letting the user's explicit intent take precedence.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix/fetch-size-head-total-hits branch from d2b35ee to ff1fd32 Compare February 27, 2026 01:10
@github-actions
Copy link
Contributor

Persistent review updated to latest commit ff1fd32

Remove the outer LogicalSort(fetch=[fetchSize]) from expected explain
plans when an explicit head command is present, since fetch_size no
longer injects a Head node in that case.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 7f797a9

Verifies that when a query has multiple heads (e.g., head 3 | head 500),
fetch_size does not inject yet another Head node. The existing Head
nodes are preserved as-is, with the inner head 3 limiting first.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit adea087

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit e62de53

@ahkcs ahkcs requested a review from dai-chen February 27, 2026 21:09
Signed-off-by: ahkcs <austinhkcs@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit a6e1216

@ahkcs ahkcs requested a review from dai-chen March 2, 2026 18:55
Change from recursive all-children traversal to iterative first-child
chain walk. This ensures head commands inside join subqueries or nested
structures do not incorrectly suppress fetch_size injection on the
outer query.

Signed-off-by: ahkcs <austinhkcs@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Persistent review updated to latest commit 5841f08

Explain that since PPL's fetch_size has no pagination support, it acts
as a default response size rather than an absolute cap. When the user
specifies an explicit head command, fetch_size yields to avoid silently
discarding rows with no way to retrieve them. Also document that head
commands inside subqueries do not affect fetch_size for the outer query.

Signed-off-by: ahkcs <austinhkcs@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Persistent review updated to latest commit 4cacb79

Verifies that containsHead correctly walks past Project (from fields)
to find the Head node underneath.

Signed-off-by: ahkcs <austinhkcs@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Persistent review updated to latest commit 3039353

+-----------------------------------------------+---------------------------------------------------+
| Query | Behavior with ``fetch_size=5`` |
+===============================================+===================================================+
| ``source=t \| fields age`` | ``fetch_size`` applies — returns 5 rows |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: is the \ required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Removed

dai-chen
dai-chen previously approved these changes Mar 2, 2026
@ahkcs ahkcs force-pushed the fix/fetch-size-head-total-hits branch from b1fe41b to d7a93b3 Compare March 2, 2026 22:48
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Persistent review updated to latest commit d7a93b3

@ahkcs ahkcs force-pushed the fix/fetch-size-head-total-hits branch from d7a93b3 to 713b095 Compare March 2, 2026 22:48
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Persistent review updated to latest commit 713b095

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Persistent review updated to latest commit 713b095

@ahkcs ahkcs requested a review from dai-chen March 3, 2026 17:08
+===============================================+===================================================+
| ``source=t | fields age`` | ``fetch_size`` applies — returns 5 rows |
+-----------------------------------------------+---------------------------------------------------+
| ``source=t | head 100 | fields age`` | ``head`` takes precedence — returns 100 rows |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_size is optional — if you don't want it, don't send it. The server should honor what the client explicitly requests. The fix belongs in the client that's injecting fetch_size by default.

From a REST API perspective, fetch_size is an explicit, optional parameter the caller chooses to send. If the caller sends fetch_size=5, they're requesting at most 5 rows — that's the API contract. Adding server-side logic to silently ignore that parameter based on query content breaks the principle of least surprise at the API layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a very valid point. If we fix it server-side, we're working around a client behavior by making the API semantics inconsistent. If we fix it client-side, the API contract stays clean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validates that head commands inside join subqueries do not suppress
fetch_size on the outer query, locking down the scoping guarantee.

Signed-off-by: ahkcs <austinhkcs@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

Persistent review updated to latest commit 8c6d460

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants