Adding Getting Started PPL Documentation [5200]#5201

Open
anasalkouz wants to merge 9 commits into opensearch-project:main from anasalkouz:feature/EnhancePPLDocStructure

Conversation

@anasalkouz (Member)
Description

This PR enhances the PPL documentation by adding a comprehensive "Getting Started" tutorial and reorganizing the PPL reference manual (index.md) for improved usability and discoverability.

Changes

1. New Getting Started Tutorial (docs/user/ppl/tutorials/getting-started.md)

Created a hands-on 15-minute tutorial that introduces PPL fundamentals through practical examples:

Tutorial Structure:

  • Introduction: PPL vs SQL comparison showing the intuitive pipeline approach
  • Sample Data: OpenTelemetry log structure with example documents for context
  • 8 Progressive Steps: Building from simple queries to complex aggregations
    • Step 1: Basic Query - View data with source, fields, and head
    • Step 2: Filter Data - Use where to find errors
    • Step 3: Combine Conditions - Multiple filters with AND/OR
    • Step 4: Time Range Filtering - Filter by timestamp
    • Step 5: Count Aggregation - Basic stats count()
    • Step 6: Group By - Count errors by service
    • Step 7: Multiple Aggregations - Calculate count and averages with rounding
    • Step 8: Sort Results - Order by error count
  • Real-World Example: Complex query combining filtering, aggregation, evaluation, field selection, sorting, and limiting
  • Query Building Tips: Best practices for iterative query development
  • Common Patterns: Reusable query templates for log analysis, performance monitoring, and traffic analysis

2. PPL Reference Manual Reorganization (docs/user/ppl/index.md)

Restructured the reference manual for better navigation and quick lookup:

3. Test Infrastructure

New Test Data:

  • Added doctest/test_data/otel_logs.json with 1,747 OpenTelemetry log records
  • Includes realistic fields: timestamp, severity, service name, HTTP status codes, duration, messages
  • Updated doctest/test_docs.py to map otel_logs index to the data file
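The stated record count can be cross-checked mechanically. A minimal sketch, assuming the data file is newline-delimited JSON (one record per line); the helper name is hypothetical:

```python
def count_ndjson_records(path: str) -> int:
    # Each NDJSON record occupies one line; skip blank lines.
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

# Per the PR description, running this on doctest/test_data/otel_logs.json
# should report 1747.
```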

Test Coverage:

  • All 8 tutorial steps are tested with doctest
  • Real-world example query is tested
  • All expected outputs verified against actual query results

Testing

Run the tutorial tests:

# Test the new tutorial
./gradlew :doctest:doctest -Pdocs=tutorials/getting-started

# Test all PPL docs
./gradlew :doctest:doctest

Related Issues

Resolves #5200

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@anasalkouz anasalkouz added the PPL Piped processing language label Mar 4, 2026

github-actions bot commented Mar 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 33a205f)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add otel_logs test data and infrastructure

Relevant files:

  • doctest/test_docs.py
  • doctest/test_mapping/otel_logs.json
  • docs/user/dql/metadata.rst
  • docs/category.json

Sub-PR theme: Add Getting Started PPL tutorial and reorganize PPL reference index

Relevant files:

  • docs/user/ppl/tutorials/getting-started.md
  • docs/user/ppl/index.md

⚡ Recommended focus areas for review

Hardcoded Output

The tutorial contains hardcoded expected query outputs (row counts, specific values like error counts of 523, 412, 198, 114) that are not validated against actual test data. If the test data in otel_logs.json changes or differs, the documented outputs will be misleading or incorrect. These outputs should either be verified against the actual test data or clearly marked as illustrative examples.
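One way to address this concern is a small check script that recomputes the documented counts directly from the data file. A hedged sketch: the field names (`severity`, `service.name`) are taken from the PR description, and the function name is illustrative:

```python
import json
from collections import Counter

def error_counts_by_service(jsonl_lines):
    # Tally ERROR-severity records per service, mirroring what
    # `... | where severity = "ERROR" | stats count() by service.name` returns.
    counts = Counter()
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("severity") == "ERROR":
            counts[record.get("service.name")] += 1
    return dict(counts)
```

Running this over otel_logs.json and diffing the result against the numbers quoted in the tutorial would catch drift whenever the test data changes.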

```text
fetched rows / total rows = 5/5
+---------------------+----------+------------------+---------------------------------+
| @timestamp          | severity | service.name     | message                         |
|---------------------+----------+------------------+---------------------------------|
| 2024-03-15 11:30:40 | ERROR    | checkout-service | Invalid product ID              |
| 2024-03-15 11:30:40 | ERROR    | payment-api      | Insufficient funds check failed |
| 2024-03-15 11:30:50 | ERROR    | checkout-service | Inventory check failed          |
| 2024-03-15 11:30:50 | ERROR    | payment-api      | Transaction timeout             |
| 2024-03-15 11:31:00 | ERROR    | checkout-service | Invalid product ID              |
+---------------------+----------+------------------+---------------------------------+
```

What this does:

  • source=otel_logs - Specifies which index to query
  • fields - Selects specific fields to display
  • head 5 - Returns only the first 5 results

💡 Tip: The head command is perfect for previewing data before building complex queries.


Filtering Data: Finding What Matters

Now that we've seen the data, let's find specific information. The where command filters results based on conditions.

Step 2: Find All Errors

Let's find all error logs to investigate issues:

```ppl
source=otel_logs | where severity = "ERROR" | fields @timestamp, service.name, message | head 5
```

```text
fetched rows / total rows = 5/5
+---------------------+------------------+---------------------------------+
| @timestamp          | service.name     | message                         |
|---------------------+------------------+---------------------------------|
| 2024-03-15 11:30:40 | checkout-service | Invalid product ID              |
| 2024-03-15 11:30:40 | payment-api      | Insufficient funds check failed |
| 2024-03-15 11:30:50 | checkout-service | Inventory check failed          |
| 2024-03-15 11:30:50 | payment-api      | Transaction timeout             |
| 2024-03-15 11:31:00 | checkout-service | Invalid product ID              |
+---------------------+------------------+---------------------------------+
```

What this does:

  • Filters to show only logs where severity equals "ERROR"
  • Selects specific fields to display
  • Limits results to 5 records

Step 3: Filter by Multiple Conditions

Let's narrow down to errors from a specific service:

```ppl
source=otel_logs | where severity = "ERROR" AND service.name = "checkout-service" | fields @timestamp, service.name, message
```

What this does:

  • Uses AND to combine multiple conditions
  • Shows only errors from the checkout service
Expected output:

```text
fetched rows / total rows = 523/523
+---------------------+------------------+-----------------------------+
| @timestamp          | service.name     | message                     |
|---------------------+------------------+-----------------------------|
| 2024-03-15 11:30:40 | checkout-service | Invalid product ID          |
| 2024-03-15 11:30:50 | checkout-service | Inventory check failed      |
| 2024-03-15 11:31:00 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:10 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:20 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:30 | checkout-service | Payment processing error    |
| 2024-03-15 11:31:40 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:50 | checkout-service | Order validation failed     |
| 2024-03-15 11:32:00 | checkout-service | Payment processing error    |
| 2024-03-15 11:32:10 | checkout-service | Database connection timeout |
| 2024-03-15 11:32:20 | checkout-service | Order validation failed     |
| 2024-03-15 11:32:30 | checkout-service | Database connection timeout |
| 2024-03-15 11:32:40 | checkout-service | Payment processing error    |
| 2024-03-15 11:32:50 | checkout-service | Invalid product ID          |
| 2024-03-15 11:33:00 | checkout-service | Invalid product ID          |
...
+---------------------+------------------+-----------------------------+
```

Step 4: Filter by Time Range

Most log analysis focuses on recent data. Let's find errors from a specific time period:

```ppl
source=otel_logs
| where severity = "ERROR"
  AND @timestamp >= "2024-03-15 11:30:00"
  AND @timestamp < "2024-03-15 11:31:00"
| fields @timestamp, service.name, message
| head 5
```

```text
fetched rows / total rows = 5/5
+---------------------+------------------+---------------------------------+
| @timestamp          | service.name     | message                         |
|---------------------+------------------+---------------------------------|
| 2024-03-15 11:30:40 | checkout-service | Invalid product ID              |
| 2024-03-15 11:30:40 | payment-api      | Insufficient funds check failed |
| 2024-03-15 11:30:50 | checkout-service | Inventory check failed          |
| 2024-03-15 11:30:50 | payment-api      | Transaction timeout             |
| 2024-03-15 11:30:00 | checkout-service | Inventory check failed          |
+---------------------+------------------+---------------------------------+
```

What this does:

  • Filters to errors within a specific time window
  • Uses string literals for timestamp comparison
  • Shows the first 5 matching results

💡 Tip: For relative time ranges, use functions like date_sub(now(), INTERVAL 24 HOUR) to calculate "24 hours ago"
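When queries like the one in Step 4 are assembled programmatically (for dashboards or scripts), the window boundaries can be computed instead of hardcoded. A minimal Python sketch; the query string mirrors Step 4 and the helper name is hypothetical:

```python
from datetime import datetime, timedelta

def error_window_query(end: datetime, minutes: int = 1) -> str:
    # Build the half-open [start, end) time filter used in Step 4.
    fmt = "%Y-%m-%d %H:%M:%S"
    start = end - timedelta(minutes=minutes)
    return (
        'source=otel_logs | where severity = "ERROR"'
        f' AND @timestamp >= "{start.strftime(fmt)}"'
        f' AND @timestamp < "{end.strftime(fmt)}"'
        " | fields @timestamp, service.name, message | head 5"
    )
```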


Aggregating Data: Uncover Patterns

Aggregation helps you understand trends and patterns in your data.

Step 5: Count Total Errors

How many errors occurred?

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count()
```

What this does:

  • stats count() counts all matching records
  • Returns a single number

Expected output:

```text
fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 1247    |
+---------+
```

Step 6: Count Errors by Service

Which services have the most errors?

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count() by service.name
```

What this does:

  • by service.name groups results by service
  • Counts errors for each service

Expected output:

```text
fetched rows / total rows = 4/4
+---------+------------------+
| count() | service.name     |
|---------+------------------|
| 523     | checkout-service |
| 198     | inventory-svc    |
| 412     | payment-api      |
| 114     | user-service     |
+---------+------------------+
```

Step 7: Multiple Aggregations

Let's get more insights with multiple aggregations:

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count() as error_count,
        avg(duration_ms) as avg_duration,
        max(duration_ms) as max_duration
  by service.name
| eval avg_duration = round(avg_duration, 2)
| fields service.name, error_count, avg_duration, max_duration
```

What this does:

  • Calculates multiple metrics per service
  • Uses as to name the calculated fields
  • Uses eval with round() function to limit decimal places to 2
  • Uses fields to control the column order in output
  • Shows service name, error count, average duration, and max duration

Expected output:

```text
fetched rows / total rows = 4/4
+------------------+-------------+--------------+--------------+
| service.name     | error_count | avg_duration | max_duration |
|------------------+-------------+--------------+--------------|
| checkout-service | 523         | 1267.99      | 5000         |
| inventory-svc    | 198         | 466.71       | 1800         |
| payment-api      | 412         | 894.33       | 3200         |
| user-service     | 114         | 609.54       | 1094         |
+------------------+-------------+--------------+--------------+
```

💡 Insight: The checkout-service has both the most errors and the longest durations - a clear area for investigation!


Sorting and Limiting: Prioritize Your Findings

Step 8: Sort Results

Let's find which services have the most errors:

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count() as error_count by service.name
| sort error_count desc
```

What this does:

  • sort error_count desc sorts by error count, highest first
  • desc means descending (use asc for ascending)

Expected output:

```text
fetched rows / total rows = 4/4
+-------------+------------------+
| error_count | service.name     |
|-------------+------------------|
| 523         | checkout-service |
| 412         | payment-api      |
| 198         | inventory-svc    |
| 114         | user-service     |
+-------------+------------------+
```

Putting It All Together: A Real-World Query

Let's combine everything we've learned to answer: "What are the top error patterns by service and HTTP status code?"

```ppl
source=otel_logs
| where severity = "ERROR"
  AND http.status_code >= 400
| stats count() as error_count,
        avg(duration_ms) as avg_response_time,
        max(duration_ms) as max_response_time
  by service.name, http.status_code
| eval avg_response_time = round(avg_response_time, 2)
| fields service.name, http.status_code, error_count, avg_response_time, max_response_time
| sort error_count desc
| head 10
```

What this query does:

  1. Filters to errors with HTTP status codes 400+
  2. Aggregates error counts and response times by service and status code
  3. Rounds average response time to 2 decimal places
  4. Selects fields in a logical order for analysis
  5. Sorts by error count to find the biggest problems
  6. Limits to top 10 results

Expected output:

```text
fetched rows / total rows = 10/10
+------------------+------------------+-------------+-------------------+-------------------+
| service.name     | http.status_code | error_count | avg_response_time | max_response_time |
|------------------+------------------+-------------+-------------------+-------------------|
| checkout-service | 400              | 265         | 1258.26           | 2354              |
| checkout-service | 500              | 258         | 1278.0            | 5000              |
| payment-api      | 503              | 143         | 921.71            | 3200              |
| payment-api      | 400              | 140         | 882.92            | 1593              |
| payment-api      | 500              | 129         | 876.36            | 1525              |
| inventory-svc    | 500              | 108         | 481.69            | 1800              |
| inventory-svc    | 400              | 90          | 448.73            | 785               |
| user-service     | 401              | 42          | 631.93            | 1094              |
| user-service     | 403              | 37          | 568.08            | 972               |
| user-service     | 500              | 35          | 626.49            | 1015              |
+------------------+------------------+-------------+-------------------+-------------------+
```

[**PPL Comment Syntax**](https://github.com/opensearch-project/sql/pull/5201/files#diff-1c35c2d36c5008bcd1a5cbeec3acce5af8c0c9c8317bbb86564a9dd6b8cb3cd7R34-R39)

The tutorial uses `//` as a comment syntax in a PPL code block (line 35), but PPL uses `/*` or `--` for comments, not `//`. This could confuse users trying to copy and run the example.

```ppl
// PPL: Read left to right, like a story
source=otel_logs
| where severity = "ERROR"
| stats count() by service.name
```

[**Missing Links**](https://github.com/opensearch-project/sql/pull/5201/files#diff-f7e3783015726eb3f35843294aaa812746908071b5772681ccef318b1fa16095R202-R215)

The new index references `functions/index.md` in the getting-started tutorial's "Next Steps" section, but the index.md itself does not include a link to `cmd/syntax.md` which was previously listed in the old index. The `Interfaces` and `Administration` sections (endpoint, protocol, settings, security, monitoring, datasources, connectors, cross-cluster search) that existed in the old index are completely removed with no replacement links, potentially breaking navigation for users looking for those resources.

```markdown
## Documentation & Resources

- **[Optimization Guide](../../user/optimization/optimization.rst)** - Query performance tuning
- **[Limitations](limitations/limitations.md)** - Known limitations and workarounds
- **[OpenSearch Documentation](https://opensearch.org/docs/latest/)** - Main OpenSearch docs

---

## Need Help?

- **New to PPL?** Start with the [Getting Started Tutorial](tutorials/getting-started.md)
- **Have questions or issues?**
  - Submit issues to the [OpenSearch SQL repository](https://github.com/opensearch-project/sql/issues)
  - Join our [public Slack channel](https://opensearch.org/slack.html) for community support
```

Row Count Accuracy

The total row count in the SHOW TABLES output was updated from 24 to 25 to account for the new otel_logs index. However, mvcombine_data was also reformatted (whitespace fix) in the same diff. Verify that the count of 25 is accurate and reflects all indices currently registered in the test environment.

```text
fetched rows / total rows = 25/25
```

@anasalkouz anasalkouz self-assigned this Mar 4, 2026

github-actions bot commented Mar 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 33a205f

Explore these optional code suggestions:

Category: Possible issue
Remove invalid comment syntax from PPL code block

PPL does not use // for comments — this is not valid PPL syntax. Using // as a
comment prefix may confuse readers into thinking it's valid PPL, and could cause
issues if the code block is executed. Remove the comment line or use a prose
description outside the code block instead.

docs/user/ppl/tutorials/getting-started.md [34-39]

````diff
 ```ppl
-// PPL: Read left to right, like a story
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: The `//` comment syntax is not valid PPL syntax and could mislead users into thinking it's valid PPL. Removing it improves accuracy of the documentation example.

Impact: Medium



Fix broken link to functions reference

The link `../functions/index.md` points to a file that does not appear to exist in the repository based on the diff. The existing function reference files are individual pages (e.g., `functions/aggregations.md`). This broken link should be corrected to point to a valid file, such as `../functions/aggregations.md` or the main index.

[docs/user/ppl/tutorials/getting-started.md \[464\]](https://github.com/opensearch-project/sql/pull/5201/files#diff-1c35c2d36c5008bcd1a5cbeec3acce5af8c0c9c8317bbb86564a9dd6b8cb3cd7R464-R464)

```diff
-**[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+**[Functions Reference](../functions/aggregations.md)** - Explore available functions for calculations
```

Suggestion importance: 5/10

Why: The link ../functions/index.md likely doesn't exist based on the PR diff, which only shows individual function pages. This could result in a broken link for users, though the exact file structure outside the diff is uncertain.

Impact: Low
Category: General
Standardize inconsistent timestamp formats in test data

The test data file mixes two different timestamp formats: some entries use
millisecond precision (2024-03-15T10:30:45.000Z) while others use second precision
(2024-03-15T10:34:00Z). This inconsistency could cause test failures if the
documentation examples rely on a specific timestamp format for parsing or filtering.
Standardize all timestamps to use the same format throughout the file.

doctest/test_data/otel_logs.json [1-14]

```diff
-{"@timestamp": "2024-03-15T10:30:45.000Z", ...}
-{"@timestamp": "2024-03-15T10:30:46.000Z", ...}
+{"@timestamp": "2024-03-15T10:30:45.000Z", "severity": "INFO", "service.name": "checkout-service", "message": "Order created successfully", "http.status_code": 200, "duration_ms": 150, "http.route": "/api/checkout"}
+{"@timestamp": "2024-03-15T10:30:46.000Z", "severity": "INFO", "service.name": "payment-api", "message": "Payment processed", "http.status_code": 200, "duration_ms": 320, "http.route": "/api/payment"}
 ...
-{"@timestamp": "2024-03-15T10:34:00Z", ...}
+{"@timestamp": "2024-03-15T10:34:00.000Z", "severity": "ERROR", ...}
```
Suggestion importance: 4/10

Why: The suggestion correctly identifies a real inconsistency between millisecond-precision timestamps (lines 1-9) and second-precision timestamps (line 10 onwards). However, this is test data and many parsers handle both ISO 8601 formats correctly, so the practical impact may be low. The improved_code only partially demonstrates the fix and doesn't fully reflect the scope of changes needed.

Impact: Low
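The standardization itself amounts to a one-pass rewrite of the file. A sketch of the normalization step, assuming every timestamp is ISO 8601 with a trailing Z (the function name is illustrative):

```python
from datetime import datetime

def normalize_ts(ts: str) -> str:
    # Accept both "2024-03-15T10:34:00Z" and "2024-03-15T10:30:45.000Z",
    # re-emit with fixed millisecond precision.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"
```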
Clarify inclusive/exclusive time range boundary in example

The time range filter uses >= for the lower bound "2024-03-15 11:30:00" and < (strictly less than) for the upper bound "2024-03-15 11:31:00", so the upper bound itself is excluded. The sample output is consistent with this half-open interval, but the tutorial should clarify the boundary behavior to avoid confusing readers.

docs/user/ppl/tutorials/getting-started.md [203-205]

```diff
 | where severity = "ERROR"
   AND @timestamp >= "2024-03-15 11:30:00"
-  AND @timestamp < "2024-03-15 11:31:00"
+  AND @timestamp <= "2024-03-15 11:30:59"
```

Suggestion importance: 2/10

Why: The suggestion changes < to <= with a different timestamp, but the existing code is logically consistent and the improved_code changes the semantics unnecessarily. The original boundary behavior is standard and the suggestion offers minimal improvement.

Impact: Low

Previous suggestions

Suggestions up to commit 35cc46e
Category: Possible issue
Remove invalid comment syntax from PPL code block

PPL does not support // style comments. Using // as a comment prefix in a PPL code
block is incorrect and may confuse users or cause errors if they copy and run the
query. Remove the comment line or use a prose description outside the code block
instead.

docs/user/ppl/tutorials/getting-started.md [35-40]

````diff
 ```ppl
-// PPL: Read left to right, like a story
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: PPL does not support `//` style comments, so including `// PPL: Read left to right, like a story` in a PPL code block is incorrect and could confuse users who copy and run the query. Removing it improves accuracy of the documentation.

Impact: Medium

Category: General



Standardize inconsistent timestamp formats in test data

The test data file mixes two different timestamp formats: some entries use millisecond precision (`2024-03-15T10:30:45.000Z`) while others use second precision (`2024-03-15T10:34:00Z`). This inconsistency could cause test failures if the documentation examples rely on a specific timestamp format for parsing or filtering. Standardize all timestamps to use the same format throughout the file.

[doctest/test_data/otel_logs.json \[1-2\]](https://github.com/opensearch-project/sql/pull/5201/files#diff-975ff637bd6e510d15564ca368fae81c5dfea7edd0885670802c6763751e08a1R1-R2)

```diff
-{"@timestamp": "2024-03-15T10:30:45.000Z", ...}
-{"@timestamp": "2024-03-15T10:30:46.000Z", ...}
+{"@timestamp": "2024-03-15T10:30:45.000Z", "severity": "INFO", "service.name": "checkout-service", "message": "Order created successfully", "http.status_code": 200, "duration_ms": 150, "http.route": "/api/checkout"}
+{"@timestamp": "2024-03-15T10:30:46.000Z", "severity": "INFO", "service.name": "payment-api", "message": "Payment processed", "http.status_code": 200, "duration_ms": 320, "http.route": "/api/payment"}
```

Suggestion importance: 4/10

Why: The suggestion correctly identifies that the test data file mixes timestamp formats (millisecond precision like 2024-03-15T10:30:45.000Z vs second precision like 2024-03-15T10:34:00Z). However, the improved_code only shows the first two lines unchanged and doesn't actually demonstrate the standardization fix. The impact is moderate as this inconsistency could affect timestamp parsing in tests, but both formats are valid ISO 8601 and most parsers handle both.

Impact: Low
Fix potentially broken documentation links

The links ../cmd/index.md and ../functions/index.md reference index.md files that
may not exist in the repository, as the PR only adds getting-started.md and the
existing structure uses individual command files. Broken links in documentation
degrade user experience and can cause build failures. Verify these files exist or
update the links to point to valid targets.

docs/user/ppl/tutorials/getting-started.md [463-464]

```diff
-**[PPL Command Reference](../cmd/index.md)** - Complete command documentation
-- **[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+**[PPL Command Reference](../index.md#quick-command-reference)** - Complete command documentation
+- **[Functions Reference](../index.md#functions)** - Explore available functions for calculations
```

Suggestion importance: 4/10

Why: The links ../cmd/index.md and ../functions/index.md may point to non-existent files, but this is a verification concern rather than a confirmed bug. The suggested fix to anchor links in ../index.md is a reasonable alternative, but the score is limited since existence of these files cannot be confirmed from the diff alone.

Impact: Low
Ensure sample output matches the filter conditions shown

The time range filter uses >= for the lower bound and < (strictly less than) for the upper bound at 11:31:00. The last row in the sample output (2024-03-15 11:30:00) matches the lower bound but appears out of chronological order relative to the other rows. Ensure the sample output is consistent with the filter logic to avoid confusing readers.

docs/user/ppl/tutorials/getting-started.md [203-205]

```diff
+| where severity = "ERROR"
+  AND @timestamp >= "2024-03-15 11:30:00"
+  AND @timestamp < "2024-03-15 11:31:00"

-
```

Suggestion importance: 2/10

Why: The existing_code and improved_code are identical, so no actual change is proposed. The suggestion only asks to verify consistency between the filter and sample output, which is a minor documentation quality concern but not an actionable code fix.

Impact: Low
Suggestions up to commit fdacf43
Category: Possible issue
Remove invalid comment syntax from code block

PPL does not use // for comments; this is invalid PPL syntax. The comment style
shown may confuse readers who try to run this code. Remove the comment line or use a
plain text explanation outside the code block instead.

docs/user/ppl/tutorials/getting-started.md [34-39]

````diff
 ```ppl
-// PPL: Read left to right, like a story
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: PPL does not support `//` comment syntax, so including it in a code block labeled `ppl` is misleading and could cause confusion for users who try to run the example. Removing the comment line makes the code block valid and accurate.

Impact: Medium



Verify row count matches table entries

The total row count was updated from 24 to 25 to account for the new `otel_logs` index, but the table listing must also be verified to contain exactly 25 rows. If the table does not match this count, the doctest will fail. Ensure the number of data rows in the table matches the declared total.

[docs/user/dql/metadata.rst \[38\]](https://github.com/opensearch-project/sql/pull/5201/files#diff-119b6412f52dd06e6a6ddc02bec5f72d53a0020c309d04177e3abf72dc23775dR38-R38)

```diff
+fetched rows / total rows = 25/25

-
```

Suggestion importance: 4/10

Why: The existing_code and improved_code are identical, so no actual change is suggested. The suggestion only asks to verify the count matches, which is a valid concern but doesn't warrant a high score since it's just a verification request.

Impact: Low
Category: General
Fix potentially broken documentation links

The links ../cmd/index.md and ../functions/index.md reference files that may not
exist based on the rest of the documentation structure, which uses specific
filenames like cmd/search.md and functions/aggregations.md. Broken links in
documentation degrade user experience and may cause test failures. Verify these
index files exist or replace with valid links.

docs/user/ppl/tutorials/getting-started.md [463-464]

```diff
-- **[PPL Command Reference](../cmd/index.md)** - Complete command documentation
-- **[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+- **[PPL Command Reference](../index.md#quick-command-reference)** - Complete command documentation
+- **[Functions Reference](../index.md#functions)** - Explore available functions for calculations
```

Suggestion importance: 4/10

Why: The links ../cmd/index.md and ../functions/index.md may not exist based on the documentation structure visible in the PR. However, the suggestion to replace them with anchor links to ../index.md is a reasonable workaround, though it's uncertain whether those anchors exist either.

Impact: Low
Suggestions up to commit 9c1af6e
Category: Possible issue
Fix invalid comment syntax in PPL example

PPL does not use // for comments; this is not valid PPL syntax. Using incorrect comment syntax in a tutorial could confuse users and cause query failures if they copy the code. Remove the comment line or use a valid PPL comment style (e.g., /* ... */).

docs/user/ppl/tutorials/getting-started.md [34-39]

````diff
 ```ppl
-// PPL: Read left to right, like a story
+/* PPL: Read left to right, like a story */
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: PPL does not support `//` style comments, so using this syntax in a tutorial is misleading and could cause query failures if users copy the code. The fix to use `/* ... */` block comment syntax is accurate and important for a tutorial document.

Impact: Medium



Fix broken links to non-existent documentation files

___


**These "Next Steps" links reference files that do not appear to exist in the repository (`learn-common-commands.md`, `../quick-reference.md`, `../functions/index.md`). Broken links in documentation degrade user experience and trust. Either create these files or update the links to point to existing documentation.**

[docs/user/ppl/tutorials/getting-started.md [463-465]](https://github.com/opensearch-project/sql/pull/5201/files#diff-1c35c2d36c5008bcd1a5cbeec3acce5af8c0c9c8317bbb86564a9dd6b8cb3cd7R463-R465)

```diff
-- **[Learn Common Commands](learn-common-commands.md)** - Master the 10 most-used PPL commands
-- **[Quick Reference](../quick-reference.md)** - Handy cheat sheet for all commands
-- **[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+- **[Learn Common Commands](../index.md)** - Master the most-used PPL commands
+- **[Quick Reference](../index.md)** - Handy reference for all commands
+- **[Functions Reference](../functions/aggregations.md)** - Explore available functions for calculations
```
Suggestion importance[1-10]: 5

__

Why: The links learn-common-commands.md, ../quick-reference.md, and ../functions/index.md likely don't exist yet, but the suggested replacements pointing to ../index.md or ../functions/aggregations.md are approximations that may not be ideal either. The issue is valid but the improved code is a rough workaround rather than a definitive fix.

Low
General
Fix broken command reference index link

The link ../cmd/index.md references a file that does not appear to exist in the
repository based on the diff (individual command files like cmd/search.md exist, but
not cmd/index.md). This broken link in the "Getting Help" section would leave users
unable to find the command reference. Update it to point to an existing file.

docs/user/ppl/tutorials/getting-started.md [488]

```diff
-- **[PPL Command Reference](../cmd/index.md)** - Complete command documentation
+- **[PPL Command Reference](../index.md)** - Complete command documentation
```
Suggestion importance[1-10]: 4

__

Why: The link ../cmd/index.md may not exist, but the suggested replacement ../index.md is a rough approximation. The issue is valid but the fix is uncertain without knowing the full repository structure.

Low
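Several of these suggestions flag the same failure mode: relative links in the new tutorial pointing at files that do not exist. This is easy to catch mechanically before merge. Below is a rough sketch of a relative-link checker; the script and the `find_broken_links` helper are illustrative, not part of this PR:

```python
import os
import re

# Matches markdown links like [text](target); the capture stops at ")", "#",
# or whitespace so anchors such as ../index.md#functions resolve to the file.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")

def find_broken_links(docs_root):
    """Return (markdown_file, link_target) pairs whose relative target is missing."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(docs_root):
        for name in filenames:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            for target in LINK_RE.findall(text):
                # External links are out of scope for a filesystem check.
                if target.startswith(("http://", "https://", "mailto:")):
                    continue
                # Relative targets resolve against the linking file's directory.
                resolved = os.path.normpath(os.path.join(dirpath, target))
                if not os.path.exists(resolved):
                    broken.append((path, target))
    return broken
```

Run against `docs/user/ppl`, this would have reported `../cmd/index.md` and friends before review.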

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit fdacf43

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit 35cc46e

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit 33a205f

@@ -0,0 +1,1751 @@
{"@timestamp": "2024-03-15T10:30:45.000Z", "severity": "INFO", "service.name": "checkout-service", "message": "Order created successfully", "http.status_code": 200, "duration_ms": 150, "http.route": "/api/checkout"}
Collaborator


Is it too much to have 1,751 records for doctest data? Would it make sense to trim this down to ~30-50 well-crafted records that still cover all use cases?
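One rough way to do the trimming suggested above: keep one record per (severity, service.name) combination first, so every `stats ... by` example in the tutorial still returns rows, then pad up to a small cap. A hypothetical sketch (field names come from the sample record in this diff; `trim_records` and the cap of 50 are illustrative, not part of the PR):

```python
import json

def trim_records(lines, cap=50):
    """Keep at most `cap` JSONL records, covering every (severity, service) pair first."""
    records = [json.loads(line) for line in lines if line.strip()]
    kept, seen = [], set()
    # First pass: one representative per (severity, service.name) combination,
    # so grouped aggregations in the tutorial keep returning every group.
    for rec in records:
        key = (rec.get("severity"), rec.get("service.name"))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    # Second pass: pad with remaining records up to the cap.
    for rec in records:
        if len(kept) >= cap:
            break
        if rec not in kept:
            kept.append(rec)
    return kept[:cap]
```

A stratification key covering status codes and routes as well would be needed if those fields also drive doctest assertions.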

Comment on lines +83 to +84
"trace_id": "a1b2c3d4e5f6g7h8i9j0",
"span_id": "1a2b3c4d5e6f7g8h",
Collaborator


Are these missing in otel_logs test data?


Throughout this tutorial, we'll use OpenTelemetry (OTEL) observability data. OTEL is the industry standard for collecting telemetry data (logs, metrics, and traces) from applications.

### Sample Log Structure
Collaborator


Just curious any plan to cover trace, metrics and correlation case among them in future?


Labels

documentation - Improvements or additions to documentation; PPL - Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DOC] Add Getting Started Tutorial

2 participants