Skip to content

Apply docs website feedback to ppl functions#5207

Open
ritvibhatt wants to merge 5 commits intoopensearch-project:mainfrom
ritvibhatt:ppl-functions-updates
Open

Apply docs website feedback to ppl functions#5207
ritvibhatt wants to merge 5 commits intoopensearch-project:mainfrom
ritvibhatt:ppl-functions-updates

Conversation

@ritvibhatt
Copy link
Contributor

Description

  • Apply feedback from PR for adding PPL functions documentation to documentation website to keep sql repo in sync (Update PPL functions documentation documentation-website#11747)
  • Update exporter script to add backtick wrapping for standalone %-prefixed format specifiers in exported CLI tables and ensured index.md gets nav_order: 1 in the functions directory.

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

PR Reviewer Guide 🔍

(Review updated until commit 93f92d5)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Update docs exporter script with backtick wrapping and index.md nav ordering

Relevant files:

  • scripts/docs_exporter/export_to_docs_website.py

Sub-PR theme: Apply documentation website feedback to PPL function docs

Relevant files:

  • docs/user/ppl/functions/math.md
  • docs/user/ppl/functions/statistical.md
  • docs/user/ppl/functions/condition.md

⚡ Recommended focus areas for review

Nested Function Definition

The backtick_percent function is defined inside a loop body (within the list comprehension context), causing it to be redefined on every iteration of the outer loop. It should be moved outside the loop to avoid unnecessary repeated function object creation.

def backtick_percent(cell):
    if "'" in cell or '"' in cell or '`' in cell:
        return cell
    if re.match(r'^%\S+$', cell):
        return f'`{cell}`'
    return cell
cells = [backtick_percent(c) for c in cells]
Missing Reference Link

The EARLIEST function description removed the reference link to the detailed relative timestamp documentation ("Read more details here") that was present in the old version. This may reduce discoverability of important documentation for users who need more detail on relative time formats.

**Relative string formats**:
1. `"now"` or `"now()"`: Uses the current system time.
2. Absolute format (`MM/dd/yyyy:HH:mm:ss` or `yyyy-MM-dd HH:mm:ss`): Converts the string to a timestamp and compares it against the field value.
3. Relative format: `(+|-)<time_integer><time_unit>[+<...>]@<snap_unit>`

**Steps to specify a relative time**:
- **Time offset**: Indicate the offset from the current time using `+` or `-`.
- **Time amount**: Provide a numeric value followed by a time unit (`s`, `m`, `h`, `d`, `w`, `M`, `y`).
- **Snap to unit**: Optionally, specify a snap unit using `@<unit>` to round the result down to the nearest unit (for example, hour, day, month).

**Examples** (assuming current time is `2025-05-28 14:28:34`):
- `-3d+2y``2027-05-25 14:28:34`.
- `+1d@m``2025-05-29 14:28:00`.
- `-3M+1y@M``2026-02-01 00:00:00`.
EXISTS Section Accuracy

The EXISTS section header now says "Use isnull(field) or isnotnull(field) to test field existence" as its usage, which is not a function signature but a recommendation. This may be confusing as it doesn't follow the same pattern as other function entries and could mislead users into thinking EXISTS is a callable function.

## EXISTS

**Usage**: Use `isnull(field)` or `isnotnull(field)` to test field existence

Since OpenSearch doesn't differentiate between null and missing values, functions like `ismissing`/`isnotmissing` are not available. Use `isnull`/`isnotnull` to test field existence instead.
MULTIPLY NULL Behavior

The old documentation for MULTIPLY noted "If y equals to 0, then returns NULL" in the return type description. This behavior note has been removed in the new version. If this is still the actual behavior, it should be documented to avoid user confusion.

## MULTIPLY

**Usage**: `MULTIPLY(x, y)`

Calculates the product of `x` and `y`.

**Parameters**:

- `x` (Required): An `INTEGER`, `LONG`, `FLOAT`, or `DOUBLE` value.
- `y` (Required): An `INTEGER`, `LONG`, `FLOAT`, or `DOUBLE` value.

**Return type**: The wider numeric type between `x` and `y`

**Synonyms**: Multiplication Symbol (`*`)

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

PR Code Suggestions ✨

Latest suggestions up to 93f92d5

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Fix incorrect return type documentation

The return type for ROUND with FLOAT/DOUBLE input is incorrectly documented as LONG.
When rounding a FLOAT or DOUBLE with a decimal place argument, the result should
remain DOUBLE, not LONG. This is a documentation accuracy issue that could mislead
users.

docs/user/ppl/functions/math.md [1184-1186]

 **Return type**:
 - `(INTEGER/LONG [,INTEGER])` -> `LONG`.
-- `(FLOAT/DOUBLE [,INTEGER])` -> `LONG`.
+- `(FLOAT/DOUBLE [,INTEGER])` -> `DOUBLE`.
Suggestion importance[1-10]: 7

__

Why: The ROUND function's return type for FLOAT/DOUBLE inputs is documented as LONG, which matches the old documentation. However, this could be a genuine documentation error if the actual behavior returns DOUBLE for decimal rounding. This is a potentially important accuracy issue for users relying on the documentation.

Medium
Fix inconsistent code formatting in description

The backtick formatting is inconsistent — multi_match query should be formatted as
multi_match query (with only the function name in backticks, not the word "query").
This is a documentation accuracy issue that could confuse readers.

docs/user/ppl/functions/relevance.md [211]

-Maps to the `multi_match query` in the OpenSearch engine.
+Maps to the `multi_match` query in the OpenSearch engine.
Suggestion importance[1-10]: 5

__

Why: The backtick formatting wraps multi_match query as a single code span, but only multi_match is the function name. The word "query" should be outside the backticks for consistency with how other functions are described in the same file.

Low
Move helper function outside the loop

Defining a function inside a loop body (which runs for every table row) is
inefficient. The backtick_percent function should be defined outside the loop,
ideally at module level or at least outside the for line in lines loop, to avoid
redefining it on every iteration.

scripts/docs_exporter/export_to_docs_website.py [108-114]

 def backtick_percent(cell):
     if "'" in cell or '"' in cell or '`' in cell:
         return cell
     if re.match(r'^%\S+$', cell):
         return f'`{cell}`'
     return cell
+
+# ... (inside the loop)
 cells = [backtick_percent(c) for c in cells]
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid - defining backtick_percent inside the loop body causes it to be redefined on every iteration, which is inefficient. However, the performance impact is minor for typical documentation processing, and the improved_code doesn't clearly show where the function should be moved to.

Low
Restore missing NULL behavior documentation

The MULTIPLY function description previously noted that if y equals 0, it returns
NULL, but this behavior note was removed in the new documentation. This is important
behavioral information that users need to know. Consider adding a note about the
behavior when multiplying by zero.

docs/user/ppl/functions/math.md [47-49]

 **Return type**: The wider numeric type between `x` and `y`
 
-**Synonyms**: Addition Symbol (`+`)
+**Synonyms**: Multiplication Symbol (`*`)
 
+> **Note**: If `y` equals `0`, the function returns `NULL`.
+
Suggestion importance[1-10]: 4

__

Why: The improved_code incorrectly applies the note to the ADD function section (which has Addition Symbol (+) synonym) rather than the MULTIPLY function section. The existing_code snippet matches the ADD function, not MULTIPLY, making this suggestion misplaced.

Low
Use consistent callout/note formatting throughout document

The blockquote format used here (>) is inconsistent with the note format used
elsewhere in the same file (e.g., {: .note}). Using a consistent callout style
throughout the document improves readability and ensures proper rendering in the
documentation framework.

docs/user/ppl/functions/aggregations.md [787-790]

-> The `plugins.ppl.values.max.limit` setting controls the maximum number of unique values returned:
-> - The default value is 0, which returns an unlimited number of values.
-> - Setting this to any positive integer limits the number of unique values.
-> - See the [PPL Settings](../admin/settings.md#plugins-ppl-values-max-limit) documentation for more details
+The `plugins.ppl.values.max.limit` setting controls the maximum number of unique values returned:
+- The default value is `0`, which returns an unlimited number of values.
+- Setting this to any positive integer limits the number of unique values.
+- See the [PPL Settings](../admin/settings.md#plugins-ppl-values-max-limit) documentation for more details.
+{: .note}
Suggestion importance[1-10]: 4

__

Why: The blockquote format (>) is inconsistent with the {: .note} callout style used elsewhere in the file. Consistent formatting improves readability and ensures proper rendering in the documentation framework.

Low
Narrow regex to avoid wrapping compound specifiers

The regex r'^%\S+$' will also match cells like %V%m (compound format specifiers) or
%% (literal percent), wrapping them in backticks even though they may already be
handled or should not be wrapped. Consider whether compound specifiers like %s%f or
%% should be excluded from this transformation to avoid double-wrapping or incorrect
rendering.

scripts/docs_exporter/export_to_docs_website.py [111-112]

-if re.match(r'^%\S+$', cell):
+if re.match(r'^%[A-Za-z%]$', cell) or re.match(r'^%[A-Za-z]{1,2}$', cell):
     return f'`{cell}`'
Suggestion importance[1-10]: 3

__

Why: The concern about compound specifiers like %s%f or %% being wrapped is valid, but looking at the PR context, the table cells in the markdown already have backtick-wrapped specifiers (e.g., `%a`), so the check if '' in cellwould prevent double-wrapping. Theimproved_codeuses a regex that would miss valid single-character specifiers like%:z, %::z, %:::z` that appear in the STRFTIME table, making it less correct than the original.

Low
Remove inconsistent extra blank lines in examples

There is an extra blank line between the closing code fence and the result table for
the regexp_match examples. This inconsistency in formatting differs from all other
examples in the document and may render unexpectedly. Remove the extra blank line to
maintain consistent formatting.

docs/user/ppl/functions/condition.md [845-852]

 ```ppl
 source=logs
 | where regexp_match(message, 'ERROR|WARN|FATAL')
 | fields timestamp, message

| timestamp | message |
+| --- | --- |
+| 2024-01-15 10:23:45 | ERROR: Connection timeout to database |
+| 2024-01-15 10:24:12 | WARN: High memory usage detected |
+| 2024-01-15 10:25:33 | FATAL: System crashed unexpectedly |

<details><summary>Suggestion importance[1-10]: 3</summary>

__

Why: The extra blank line between the code fence and result table is a minor formatting inconsistency. The `improved_code` also reformats the table structure, which goes beyond just removing the blank line.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Add missing newline at end of file</summary>

___


**The file ends without a newline character, which can cause issues with some Markdown <br>renderers and version control systems. Add a newline at the end of the file to <br>follow standard file formatting conventions.**

[docs/user/ppl/functions/condition.md [898]](https://github.com/opensearch-project/sql/pull/5207/files#diff-d96327aa4c8968cb04d38b138e14622bb2b5370283579c26145c2b0dc1adc164R898-R898)

```diff
-| Desktop Computer Tower | Computing|
-+------------------------+----------+
-```
+| Desktop Computer Tower | Computing |
+| --- | --- |
+| Dell Laptop XPS | Computing |
+| iPhone 15 Pro | Mobile |
+| Wireless Mouse | Other |
+| Desktop Computer Tower | Computing |
Suggestion importance[1-10]: 2

__

Why: The improved_code doesn't match the existing_code - it replaces the old-style table with a new-style table rather than simply adding a newline. The suggestion is also minor formatting concern with low impact.

Low
Verify anchor link resolves to correct section

The anchor link was changed from #percentile-or-percentile_approx to
#percentile-percentile_approx, but the section heading in aggregations.md is ###
PERCENTILE, PERCENTILE_APPROX. Markdown typically generates anchors by lowercasing
and replacing spaces/special characters with hyphens, so the comma and space would
produce #percentile-percentile_approx. Verify that this anchor correctly resolves to
the intended section in the rendered documentation.

docs/user/ppl/functions/index.md [21]

+- [PERCENTILE, PERCENTILE_APPROX](aggregations.md/#percentile-percentile_approx).
 
-
Suggestion importance[1-10]: 2

__

Why: The existing_code and improved_code are identical, making this purely a verification suggestion. The anchor #percentile-percentile_approx is likely correct for the heading ### PERCENTILE, PERCENTILE_APPROX in standard Markdown rendering.

Low
Possible issue
Verify return type change from INTEGER to BOOLEAN

The old documentation listed the return type of LIKE as INTEGER, but the new
documentation correctly changes it to BOOLEAN. However, ILIKE also had INTEGER as
its return type in the old docs and is now changed to BOOLEAN — this is correct.
Verify that both LIKE and ILIKE actually return BOOLEAN (not INTEGER) in the
implementation to ensure the documentation matches the actual behavior.

docs/user/ppl/functions/string.md [120-128]

+**Return type**: `BOOLEAN`
 
+### Example
+  
+```ppl
+source=people
+| eval `LIKE('hello world', '_ello%')` = LIKE('hello world', '_ello%'), `LIKE('hello world', '_ELLo%', true)` = LIKE('hello world', '_ELLo%', true), `LIKE('hello world', '_ELLo%', false)` = LIKE('hello world', '_ELLo%', false)
+| fields `LIKE('hello world', '_ello%')`, `LIKE('hello world', '_ELLo%', true)`, `LIKE('hello world', '_ELLo%', false)`
+```
Suggestion importance[1-10]: 3

__

Why: The existing_code and improved_code are identical, making this a verification suggestion rather than an actual code change. The suggestion asks to verify implementation behavior, which is valid but doesn't warrant a high score.

Low

Previous suggestions

Suggestions up to commit c578e42
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix trailing percent in format specifier

The format string for DAY_HOUR appears to have a stray % at the end: %d%H%. Based on
the pattern of other entries and the original content, it should be %d%H (without
the trailing %).

docs/user/ppl/functions/datetime.md [1097]

-| `DAY_HOUR` | `%d%H%` |
+| `DAY_HOUR` | `%d%H` |
Suggestion importance[1-10]: 7

__

Why: The DAY_HOUR format string %d%H% has a stray trailing % that appears to be a typo. The original old hunk also had %d%H% (same bug), but the correct format should be %d%H based on the pattern of other entries. This is a documentation accuracy issue.

Medium
Fix incorrect return type for float/double inputs

The return type for FLOAT/DOUBLE inputs to ROUND is listed as LONG, but rounding a
float/double should return a DOUBLE (or at minimum FLOAT), not a LONG. The original
documentation also listed this as LONG, but this appears to be a documentation
inaccuracy that should be corrected.

docs/user/ppl/functions/math.md [1184-1186]

 **Return type**:
 - `(INTEGER/LONG [,INTEGER])` -> `LONG`.
-- `(FLOAT/DOUBLE [,INTEGER])` -> `LONG`.
+- `(FLOAT/DOUBLE [,INTEGER])` -> `DOUBLE`.
Suggestion importance[1-10]: 5

__

Why: The return type for FLOAT/DOUBLE inputs to ROUND is listed as LONG, which appears incorrect since rounding a float/double with decimal places should return a DOUBLE. This is a potential documentation inaccuracy that could mislead users about the actual behavior of the function.

Low
Verify anchor link matches actual heading ID

The anchor link for PERCENTILE, PERCENTILE_APPROX was changed from
#percentile-or-percentile_approx to #percentile-percentile_approx. Verify that the
new anchor ID matches the actual heading in aggregations.md, which reads ###
PERCENTILE, PERCENTILE_APPROX, to avoid a broken link.

docs/user/ppl/functions/index.md [21]

+- [PERCENTILE, PERCENTILE_APPROX](aggregations.md/#percentile-percentile_approx).
 
-
Suggestion importance[1-10]: 3

__

Why: The existing_code and improved_code are identical, making this only a verification request. The anchor #percentile-percentile_approx should match the heading in aggregations.md, but no actual code change is proposed.

Low
General
Fix inconsistent inline code formatting

The backtick formatting is inconsistent — multi_match query should be formatted as
two separate elements or as multi_match query. The current formatting wraps both the
function name and the word "query" together in a code span, which is misleading.

docs/user/ppl/functions/relevance.md [211]

-Maps to the `multi_match query` in the OpenSearch engine.
+Maps to the `multi_match` query in the OpenSearch engine.
Suggestion importance[1-10]: 5

__

Why: The formatting multi_match query wraps both the function name and the word "query" in a single code span, which is misleading. The improved code correctly formats only multi_match as code, making the documentation clearer.

Low
Move helper function outside of loop

Defining a function inside a loop body is inefficient since it gets redefined on
every iteration. Move the backtick_percent function definition outside of the loop
(or at module level) so it is only created once.

scripts/docs_exporter/export_to_docs_website.py [108-114]

 def backtick_percent(cell):
     if "'" in cell or '"' in cell or '`' in cell:
         return cell
     if re.match(r'^%\S+$', cell):
         return f'`{cell}`'
     return cell
+
+# ... (inside the loop)
 cells = [backtick_percent(c) for c in cells]
Suggestion importance[1-10]: 4

__

Why: The backtick_percent function is defined inside the loop body and gets redefined on every iteration, which is slightly inefficient. Moving it outside the loop is a valid improvement, though the performance impact is minimal for typical documentation processing.

Low
Restore missing type compatibility warning

The old documentation noted that if expr1 and expr2 are different types, a semantic
check failure occurs. The new documentation drops this important constraint warning,
which could mislead users into thinking mixed types are always safe. Consider adding
a note about type compatibility requirements.

docs/user/ppl/functions/condition.md [293-303]

 **Usage**: `if(condition, expr1, expr2)`
 
 Returns `expr1` if the condition is `true`, otherwise returns `expr2`.
 
 **Parameters**:
 
 - `condition` (Required): The Boolean expression to evaluate.
 - `expr1` (Required): The value to return if the condition is `true`.
 - `expr2` (Required): The value to return if the condition is `false`.
 
 **Return type**: Least restrictive common type of `expr1` and `expr2`
 
+> **Note**: If `expr1` and `expr2` have incompatible types, the query will fail a semantic check.
+
Suggestion importance[1-10]: 4

__

Why: The old documentation warned that incompatible types for expr1 and expr2 would fail a semantic check. Removing this constraint warning could mislead users, but the new **Return type**: Least restrictive common type phrasing partially addresses this by implying type coercion occurs rather than failure.

Low
Restore missing type compatibility constraint warning

The old documentation noted that if the two parameters have different types, a
semantic check failure occurs. This important constraint was removed in the new
version, which could mislead users. Consider adding a note about type compatibility
requirements for ifnull.

docs/user/ppl/functions/condition.md [196-205]

 **Usage**: `ifnull(field1, field2)`
 
 Returns `field2` if `field1` is `NULL`.
 
 **Parameters**:
 
 - `field1` (Required): The field to check for `NULL` values.
 - `field2` (Required): The value to return if `field1` is `NULL`.
 
 **Return type**: Any (matches input types)
 
+> **Note**: If `field1` and `field2` have different types, the query will fail a semantic check.
+
Suggestion importance[1-10]: 4

__

Why: The old documentation warned that different types for field1 and field2 would fail a semantic check. This constraint was removed in the new version, which could mislead users. Adding this note back would improve accuracy, though the **Return type**: Any (matches input types) phrasing partially conveys the expectation of matching types.

Low
Fix inconsistent callout formatting

The blockquote formatting is inconsistent with the rest of the document, which uses
{: .note} callout syntax for similar informational notes (as seen in the LIST and
PERCENTILE sections). Using a blockquote here creates visual inconsistency.

docs/user/ppl/functions/aggregations.md [787-790]

-> The `plugins.ppl.values.max.limit` setting controls the maximum number of unique values returned:
-> - The default value is 0, which returns an unlimited number of values.
-> - Setting this to any positive integer limits the number of unique values.
-> - See the [PPL Settings](../admin/settings.md#plugins-ppl-values-max-limit) documentation for more details
+The `plugins.ppl.values.max.limit` setting controls the maximum number of unique values returned:
+- The default value is `0`, which returns an unlimited number of values.
+- Setting this to any positive integer limits the number of unique values.
+- See the [PPL Settings](../admin/settings.md#plugins-ppl-values-max-limit) documentation for more details.
+{: .note}
Suggestion importance[1-10]: 4

__

Why: The blockquote formatting is inconsistent with the {: .note} callout syntax used elsewhere in the document (e.g., in LIST and PERCENTILE sections). Standardizing to {: .note} would improve visual consistency across the documentation.

Low
Fix missing table row and trailing newline

The file ends without a newline character, which can cause issues with some Markdown
renderers and version control systems. The last row of the table is also missing the
final entry | Desktop Computer Tower | Computing |. Add the missing table row and a
trailing newline.

docs/user/ppl/functions/condition.md [898]

-| Desktop Computer Tower | Computing|
-+------------------------+----------+
-```
+| Desktop Computer Tower | Computing |
Suggestion importance[1-10]: 3

__

Why: The suggestion points out a missing table row for Desktop Computer Tower and a trailing newline. Looking at the PR diff, the new format uses a Markdown table without the +---+ borders, and the Desktop Computer Tower row is indeed missing from the new format. However, the improved_code only adds the missing row without addressing the trailing newline concern, making it a partial fix with moderate impact.

Low
Verify corrected return types match implementation

The LIKE function's documented return type was changed from INTEGER to BOOLEAN in
the new code, which is actually a correction. However, the ILIKE function's return
type was also changed from INTEGER to BOOLEAN. Verify that both LIKE and ILIKE
actually return BOOLEAN and not INTEGER to ensure the documentation matches the
implementation.

docs/user/ppl/functions/string.md [120]

+**Return type**: `BOOLEAN`
 
-
Suggestion importance[1-10]: 2

__

Why: The existing_code and improved_code are identical, so this is just a verification request. The suggestion asks to confirm the return type change from INTEGER to BOOLEAN matches the implementation, but doesn't propose any actual code change.

Low
Suggestions up to commit a650ebf
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix malformed format specifier in table

The format string for DAY_HOUR appears to have a stray % at the end (%d%H%), which
looks like a copy-paste error. It should likely be %d%H to match the pattern of the
other combined specifiers.

docs/user/ppl/functions/datetime.md [1097]

-| `DAY_HOUR` | `%d%H%` |
+| `DAY_HOUR` | `%d%H` |
Suggestion importance[1-10]: 7

__

Why: The DAY_HOUR format specifier %d%H% has a trailing % that appears to be a typo, inconsistent with the pattern of all other combined specifiers in the table. This is a documentation accuracy issue that could mislead users.

Medium
Missing table row in final example

The last example in the REGEXP_MATCH section is missing the final row "Desktop
Computer Tower" in the new Markdown table format. The table ends with "Wireless
Mouse" but the original output included a fourth row. Additionally, the file is
missing a trailing newline, which can cause issues with some Markdown renderers and
linters.

docs/user/ppl/functions/condition.md [895-898]

-| Desktop Computer Tower | Computing|
-+------------------------+----------+
-```
+| name | category |
+| --- | --- |
+| Dell Laptop XPS | Computing |
+| iPhone 15 Pro | Mobile |
+| Wireless Mouse | Other |
+| Desktop Computer Tower | Computing |
Suggestion importance[1-10]: 6

__

Why: The new Markdown table format for the last REGEXP_MATCH example is missing the "Desktop Computer Tower" row that was present in the original documentation, resulting in incomplete example output.

Low
Fix incorrect return type for ROUND with float input

The return type for ROUND with FLOAT/DOUBLE input is listed as LONG, but rounding a
float/double to decimal places should return a DOUBLE, not a LONG. The original
documentation also listed this as LONG, but this appears to be a documentation
inaccuracy that should be corrected in the new version.

docs/user/ppl/functions/math.md [1184-1186]

 **Return type**:
 - `(INTEGER/LONG [,INTEGER])` -> `LONG`.
-- `(FLOAT/DOUBLE [,INTEGER])` -> `LONG`.
+- `(FLOAT/DOUBLE [,INTEGER])` -> `DOUBLE`.
Suggestion importance[1-10]: 5

__

Why: The return type LONG for FLOAT/DOUBLE input to ROUND appears incorrect when decimal places are specified; however, this was also in the original docs, so it may reflect actual implementation behavior rather than a documentation error.

Low
General
Fix inconsistent inline code formatting

The backtick formatting is inconsistent — multi_match query should be formatted as
two separate elements or as multi_match query. The current formatting wraps both the
function name and the word "query" together in a code span, which is misleading.

docs/user/ppl/functions/relevance.md [211]

-Maps to the `multi_match query` in the OpenSearch engine.
+Maps to the `multi_match` query in the OpenSearch engine.
Suggestion importance[1-10]: 5

__

Why: The formatting multi_match query wraps both the function name and the word "query" in a single code span, which is misleading. The fix to multi_match query is accurate and improves clarity.

Low
Avoid redefining function inside loop

Defining a function inside a loop body is inefficient since it gets redefined on
every iteration of the outer loop. Move backtick_percent outside the loop (or at
module level) so it is defined only once.

scripts/docs_exporter/export_to_docs_website.py [108-114]

+# Define outside the loop, e.g. at module level or before the loop:
 def backtick_percent(cell):
     if "'" in cell or '"' in cell or '`' in cell:
         return cell
     if re.match(r'^%\S+$', cell):
         return f'`{cell}`'
     return cell
+
+# Inside the loop, just use it:
 cells = [backtick_percent(c) for c in cells]
Suggestion importance[1-10]: 4

__

Why: The backtick_percent function is defined inside a loop body, causing it to be redefined on every iteration. Moving it outside the loop is a valid optimization, though the performance impact is minimal for typical documentation processing workloads.

Low
Restore NULL return behavior for MULTIPLY by zero

The MULTIPLY function description previously noted that if y equals 0, it returns
NULL, but this important behavior detail was removed in the new documentation. This
is a meaningful behavioral note that users should be aware of.

docs/user/ppl/functions/math.md [115-117]

-**Return type**: The wider numeric type between `x` and `y`
+**Return type**: The wider numeric type between `x` and `y`. If `y` equals `0`, returns `NULL`.
 
-**Synonyms**: Addition Symbol (`+`)
+**Synonyms**: Multiplication Symbol (`*`)
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that the NULL-on-zero behavior note was removed from MULTIPLY, but the improved_code incorrectly applies this to the ADD function's return type section rather than the MULTIPLY section, and also incorrectly changes the synonym symbol.

Low
Restore type mismatch warning for IF function

The original documentation noted that if expr1 and expr2 have different types, a
semantic check failure occurs. The new text says "Least restrictive common type" but
omits the important warning about type mismatch causing a semantic check failure,
which is useful information for users.

docs/user/ppl/functions/condition.md [293-303]

 **Usage**: `if(condition, expr1, expr2)`
-...
-**Return type**: Least restrictive common type of `expr1` and `expr2`
 
+Returns `expr1` if the condition is `true`, otherwise returns `expr2`.
+
+**Parameters**:
+
+- `condition` (Required): The Boolean expression to evaluate.
+- `expr1` (Required): The value to return if the condition is `true`.
+- `expr2` (Required): The value to return if the condition is `false`.
+
+**Return type**: Least restrictive common type of `expr1` and `expr2`. If `expr1` and `expr2` have incompatible types, a semantic check error occurs.
+
Suggestion importance[1-10]: 3

__

Why: The suggestion to add a semantic check warning for type mismatches in IF is a minor documentation improvement, but the new "Least restrictive common type" wording already implies type coercion behavior, making this a low-impact change.

Low
Standardize callout/note formatting style

The blockquote formatting used here is inconsistent with the note formatting used
elsewhere in the same file (for example, the LIST function uses {: .note} callout
syntax). Using mixed formatting styles for similar advisory content reduces visual
consistency in the documentation.

docs/user/ppl/functions/aggregations.md [787-790]

-> The `plugins.ppl.values.max.limit` setting controls the maximum number of unique values returned:
-> - The default value is 0, which returns an unlimited number of values.
-> - Setting this to any positive integer limits the number of unique values.
-> - See the [PPL Settings](../admin/settings.md#plugins-ppl-values-max-limit) documentation for more details
+The `plugins.ppl.values.max.limit` setting controls the maximum number of unique values returned:
+- The default value is 0, which returns an unlimited number of values.
+- Setting this to any positive integer limits the number of unique values.
+- See the [PPL Settings](../admin/settings.md#plugins-ppl-values-max-limit) documentation for more details.
+{: .note}
Suggestion importance[1-10]: 3

__

Why: The blockquote formatting is inconsistent with the {: .note} callout syntax used elsewhere in the file (e.g., for LIST). Standardizing improves visual consistency, but this is a minor style issue.

Low
Verify corrected return types match implementation

The LIKE function's documented return type was changed from INTEGER to BOOLEAN in
the new code, which is actually a correction. However, the ILIKE function's return
type was also changed from INTEGER to BOOLEAN. Verify that both LIKE and ILIKE
actually return BOOLEAN and not INTEGER to ensure the documentation matches the
implementation.

docs/user/ppl/functions/string.md [120]

+**Return type**: `BOOLEAN`
 
-
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify that LIKE and ILIKE return BOOLEAN rather than INTEGER, but the existing_code and improved_code are identical, making this a verification request rather than an actionable code change.

Low
Verify anchor link resolves to correct section

The anchor link #percentile-percentile_approx was changed from the original
#percentile-or-percentile_approx. The section heading in aggregations.md is ###
PERCENTILE, PERCENTILE_APPROX, which would generate the anchor
#percentile-percentile_approx. Verify that this anchor correctly resolves to the
intended section, as the comma in the heading may generate a different anchor
depending on the Markdown renderer.

docs/user/ppl/functions/index.md [21]

+- [PERCENTILE, PERCENTILE_APPROX](aggregations.md/#percentile-percentile_approx).
 
-
Suggestion importance[1-10]: 2

__

Why: The existing_code and improved_code are identical, making this only a verification request. The anchor #percentile-percentile_approx should work correctly for the heading ### PERCENTILE, PERCENTILE_APPROX in most Markdown renderers.

Low
Suggestions up to commit 2306832
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect return type for ROUND function

The return type for ROUND with FLOAT/DOUBLE input is incorrectly listed as LONG.
When rounding a FLOAT or DOUBLE with a decimal precision argument, the result should
be DOUBLE, not LONG. This is a documentation accuracy issue that could mislead
users.

docs/user/ppl/functions/math.md [1184-1186]

 **Return type**:
 - `(INTEGER/LONG [,INTEGER])` -> `LONG`.
-- `(FLOAT/DOUBLE [,INTEGER])` -> `LONG`.
+- `(FLOAT/DOUBLE [,INTEGER])` -> `DOUBLE`.
Suggestion importance[1-10]: 7

__

Why: The return type for ROUND with FLOAT/DOUBLE input is listed as LONG, which appears to be incorrect - rounding a float/double with decimal precision should return DOUBLE. This is a documentation accuracy issue that could mislead users about function behavior.

Medium
Fix incorrect return type for LIKE function

The LIKE function's documented return type is listed as INTEGER, but the function
description says it returns TRUE or FALSE. The return type should be BOOLEAN to
match the actual behavior and the description. The original documentation also
incorrectly listed it as INTEGER.

docs/user/ppl/functions/string.md [120]

-**Return type**: `INTEGER`
+**Return type**: `BOOLEAN`
Suggestion importance[1-10]: 7

__

Why: The LIKE function description clearly states it returns TRUE or FALSE, but the documented return type is INTEGER. This is a factual error in the documentation that should be corrected to BOOLEAN.

Medium
Fix incorrect return type for ILIKE function

The ILIKE function's return type is listed as INTEGER, but the function description
states it returns TRUE or FALSE. The return type should be BOOLEAN to match the
actual behavior and the description.

docs/user/ppl/functions/string.md [158-162]

-**Return type**: `INTEGER`
+**Return type**: `BOOLEAN`
 
 ### Example
   
 ```ppl
 source=people
 | eval `ILIKE('hello world', '_ELLo%')` = ILIKE('hello world', '_ELLo%')
Suggestion importance[1-10]: 7

__

Why: The ILIKE function description states it returns TRUE or FALSE, but the documented return type is INTEGER. This is a factual error that should be corrected to BOOLEAN, consistent with the function's described behavior.

Medium
Fix missing table row and trailing newline

The file ends without a newline character, which can cause issues with some Markdown
renderers and version control systems. The last row of the table is also missing the
Desktop Computer Tower entry in the new table format. Add the missing row and a
trailing newline.

docs/user/ppl/functions/condition.md [895-898]

-| Desktop Computer Tower | Computing|
-+------------------------+----------+
-```
+| name | category |
+| --- | --- |
+| Dell Laptop XPS | Computing |
+| iPhone 15 Pro | Mobile |
+| Wireless Mouse | Other |
+| Desktop Computer Tower | Computing |
Suggestion importance[1-10]: 5

__

Why: The new table format for REGEXP_MATCH examples is missing the Desktop Computer Tower row that existed in the old format. This is a valid content omission that reduces documentation completeness.

Low
General
Restrict regex to match only single format specifiers

The regex ^%\S+$ matches any non-whitespace sequence after %, including compound
format strings like %Y-%m-%d. If the intent is only to wrap single format specifiers
(like %a, %b), the pattern should be more restrictive. Consider using ^%[A-Za-z]$ or
^%[^%\s]+$ depending on the intended scope.

scripts/docs_exporter/export_to_docs_website.py [108-113]

 def backtick_percent(cell):
     if "'" in cell or '"' in cell or '`' in cell:
         return cell
-    if re.match(r'^%\S+$', cell):
+    if re.match(r'^%[A-Za-z]$', cell):
         return f'`{cell}`'
     return cell
Suggestion importance[1-10]: 6

__

Why: The current regex ^%\S+$ would match compound format strings like %Y-%m-%d, wrapping them in backticks unintentionally. Restricting to ^%[A-Za-z]$ better targets single format specifiers like %a, %b, etc., which appear in the DATE_FORMAT table being processed.

Low
Fix inconsistent inline code formatting

The backtick formatting is inconsistent — multi_match query should be formatted as
two separate elements or as multi_match query. The current formatting wraps both the
function name and the word "query" together in a code span, which is incorrect.

docs/user/ppl/functions/relevance.md [211]

-Maps to the `multi_match query` in the OpenSearch engine.
+Maps to the `multi_match` query in the OpenSearch engine.
Suggestion importance[1-10]: 6

__

Why: The formatting multi_match query wraps both the function name and the word "query" in a single code span, which is inconsistent with how other functions are documented (e.g., match query, match_phrase query). This is a valid documentation quality fix.

Low
Restore important type constraint note for IFNULL

The documentation for IFNULL removed the important note that both parameters must
have the same type or the semantic check will fail. This is a behavioral constraint
that users need to know to avoid errors. The note should be preserved.

docs/user/ppl/functions/condition.md [196-205]

 **Usage**: `ifnull(field1, field2)`
 
 Returns `field2` if `field1` is `NULL`.
 
 **Parameters**:
 
 - `field1` (Required): The field to check for `NULL` values.
-- `field2` (Required): The value to return if `field1` is `NULL`.
+- `field2` (Required): The value to return if `field1` is `NULL`. Must be the same type as `field1`.
 
 **Return type**: Any (matches input types)
 
+> **Note**: If the two parameters have different types, the query will fail the semantic check.
+
Suggestion importance[1-10]: 5

__

Why: The PR removed the note that ifnull parameters must have the same type or the semantic check will fail. This is a behavioral constraint users need to know to avoid errors, and its removal reduces documentation accuracy.

Low
Avoid redefining function inside a loop

Defining a function inside a loop body (which runs for every table row) is
inefficient. The backtick_percent function should be defined outside the loop to
avoid redefining it on every iteration. Move the function definition to the module
or class level, or at least outside the loop.

scripts/docs_exporter/export_to_docs_website.py [108-114]

+# Define outside the loop (e.g., at module level or before the loop)
 def backtick_percent(cell):
     if "'" in cell or '"' in cell or '`' in cell:
         return cell
     if re.match(r'^%\S+$', cell):
         return f'`{cell}`'
     return cell
+
+# Inside the loop:
 cells = [backtick_percent(c) for c in cells]
Suggestion importance[1-10]: 4

__

Why: The backtick_percent function is defined inside the loop body, causing it to be redefined on every iteration. Moving it outside the loop is a valid optimization, though the performance impact is minor in practice for typical document sizes.

Low
Restore NULL return behavior for zero divisor

The original documentation for MULTIPLY included an important note that if y equals
0, the function returns NULL. This behavior was removed in the new version and
should be preserved in the return type description to avoid user confusion.

docs/user/ppl/functions/math.md [104-117]

 ## MULTIPLY
 
 **Usage**: `MULTIPLY(x, y)`
 
 Calculates the product of `x` and `y`.
-...
-**Return type**: The wider numeric type between `x` and `y`
+
+**Parameters**:
+
+- `x` (Required): An `INTEGER`, `LONG`, `FLOAT`, or `DOUBLE` value.
+- `y` (Required): An `INTEGER`, `LONG`, `FLOAT`, or `DOUBLE` value.
+
+**Return type**: The wider numeric type between `x` and `y`. If `y` equals `0`, returns `NULL`.
 
 **Synonyms**: Multiplication Symbol (`*`)
Suggestion importance[1-10]: 4

__

Why: The original documentation noted that MULTIPLY returns NULL when y equals 0, but this behavior was removed in the new version. However, this behavior is unusual for multiplication (unlike division), so it may have been intentionally removed if the behavior changed. The suggestion is valid but uncertain.

Low
Verify anchor link resolves correctly

The anchor link for PERCENTILE, PERCENTILE_APPROX was changed from
#percentile-or-percentile_approx to #percentile-percentile_approx, but the section
heading in aggregations.md is ### PERCENTILE, PERCENTILE_APPROX. Markdown typically
generates anchors by lowercasing and replacing spaces/commas with hyphens, which
would produce #percentile-percentile_approx. Verify that this anchor correctly
resolves to the intended section in the rendered documentation.

docs/user/ppl/functions/index.md [21]

+- [PERCENTILE, PERCENTILE_APPROX](aggregations.md/#percentile-percentile_approx).
 
-
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify that the anchor #percentile-percentile_approx resolves correctly. The existing_code and improved_code are identical, so this is just a verification request with no actual code change, warranting a low score.

Low

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit a650ebf

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Persistent review updated to latest commit c578e42

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 93f92d5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant