Adding Getting Started PPL Documentation [5200]#5201

Open
anasalkouz wants to merge 9 commits into opensearch-project:main from anasalkouz:feature/EnhancePPLDocStructure

Conversation

@anasalkouz (Member)
Description

This PR enhances the PPL documentation by adding a comprehensive "Getting Started" tutorial and reorganizing the PPL reference manual (index.md) for improved usability and discoverability.

Changes

1. New Getting Started Tutorial (docs/user/ppl/tutorials/getting-started.md)

Created a hands-on 15-minute tutorial that introduces PPL fundamentals through practical examples:

Tutorial Structure:

  • Introduction: PPL vs SQL comparison showing the intuitive pipeline approach
  • Sample Data: OpenTelemetry log structure with example documents for context
  • 8 Progressive Steps: Building from simple queries to complex aggregations
    • Step 1: Basic Query - View data with source, fields, and head
    • Step 2: Filter Data - Use where to find errors
    • Step 3: Combine Conditions - Multiple filters with AND/OR
    • Step 4: Time Range Filtering - Filter by timestamp
    • Step 5: Count Aggregation - Basic stats count()
    • Step 6: Group By - Count errors by service
    • Step 7: Multiple Aggregations - Calculate count and averages with rounding
    • Step 8: Sort Results - Order by error count
  • Real-World Example: Complex query combining filtering, aggregation, evaluation, field selection, sorting, and limiting
  • Query Building Tips: Best practices for iterative query development
  • Common Patterns: Reusable query templates for log analysis, performance monitoring, and traffic analysis

2. PPL Reference Manual Reorganization (docs/user/ppl/index.md)

Restructured the reference manual for better navigation and quick lookup:

3. Test Infrastructure

New Test Data:

  • Added doctest/test_data/otel_logs.json with 1,747 OpenTelemetry log records
  • Includes realistic fields: timestamp, severity, service name, HTTP status codes, duration, messages
  • Updated doctest/test_docs.py to map otel_logs index to the data file
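The stated record count can be cross-checked mechanically. A minimal sketch, assuming the data file is newline-delimited JSON (one record per line); the helper name is hypothetical:

```python
def count_ndjson_records(path: str) -> int:
    # Each NDJSON record occupies one line; skip blank lines.
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

# Per the PR description, running this on doctest/test_data/otel_logs.json
# should report 1747.
```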

Test Coverage:

  • All 8 tutorial steps are tested with doctest
  • Real-world example query is tested
  • All expected outputs verified against actual query results

Testing

Run the tutorial tests:

# Test the new tutorial
./gradlew :doctest:doctest -Pdocs=tutorials/getting-started

# Test all PPL docs
./gradlew :doctest:doctest

Related Issues

Resolves #5200

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@anasalkouz anasalkouz added the PPL Piped processing language label Mar 4, 2026

github-actions bot commented Mar 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 33a205f)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add otel_logs test data and infrastructure

Relevant files:

  • doctest/test_docs.py
  • doctest/test_mapping/otel_logs.json
  • docs/user/dql/metadata.rst
  • docs/category.json

Sub-PR theme: Add Getting Started PPL tutorial and reorganize PPL reference index

Relevant files:

  • docs/user/ppl/tutorials/getting-started.md
  • docs/user/ppl/index.md

⚡ Recommended focus areas for review

Hardcoded Output

The tutorial contains hardcoded expected query outputs (row counts, specific values like error counts of 523, 412, 198, 114) that are not validated against actual test data. If the test data in otel_logs.json changes or differs, the documented outputs will be misleading or incorrect. These outputs should either be verified against the actual test data or clearly marked as illustrative examples.
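One way to address this concern is a small check script that recomputes the documented counts directly from the data file. A hedged sketch: the field names (`severity`, `service.name`) are taken from the PR description, and the function name is illustrative:

```python
import json
from collections import Counter

def error_counts_by_service(jsonl_lines):
    # Tally ERROR-severity records per service, mirroring what
    # `... | where severity = "ERROR" | stats count() by service.name` returns.
    counts = Counter()
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("severity") == "ERROR":
            counts[record.get("service.name")] += 1
    return dict(counts)
```

Running this over otel_logs.json and diffing the result against the numbers quoted in the tutorial would catch drift whenever the test data changes.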

```text
fetched rows / total rows = 5/5
+---------------------+----------+------------------+---------------------------------+
| @timestamp          | severity | service.name     | message                         |
|---------------------+----------+------------------+---------------------------------|
| 2024-03-15 11:30:40 | ERROR    | checkout-service | Invalid product ID              |
| 2024-03-15 11:30:40 | ERROR    | payment-api      | Insufficient funds check failed |
| 2024-03-15 11:30:50 | ERROR    | checkout-service | Inventory check failed          |
| 2024-03-15 11:30:50 | ERROR    | payment-api      | Transaction timeout             |
| 2024-03-15 11:31:00 | ERROR    | checkout-service | Invalid product ID              |
+---------------------+----------+------------------+---------------------------------+
```

What this does:

  • source=otel_logs - Specifies which index to query
  • fields - Selects specific fields to display
  • head 5 - Returns only the first 5 results

💡 Tip: The head command is perfect for previewing data before building complex queries.


Filtering Data: Finding What Matters

Now that we've seen the data, let's find specific information. The where command filters results based on conditions.

Step 2: Find All Errors

Let's find all error logs to investigate issues:

```ppl
source=otel_logs | where severity = "ERROR" | fields @timestamp, service.name, message | head 5
```

```text
fetched rows / total rows = 5/5
+---------------------+------------------+---------------------------------+
| @timestamp          | service.name     | message                         |
|---------------------+------------------+---------------------------------|
| 2024-03-15 11:30:40 | checkout-service | Invalid product ID              |
| 2024-03-15 11:30:40 | payment-api      | Insufficient funds check failed |
| 2024-03-15 11:30:50 | checkout-service | Inventory check failed          |
| 2024-03-15 11:30:50 | payment-api      | Transaction timeout             |
| 2024-03-15 11:31:00 | checkout-service | Invalid product ID              |
+---------------------+------------------+---------------------------------+
```

What this does:

  • Filters to show only logs where severity equals "ERROR"
  • Selects specific fields to display
  • Limits results to 5 records

Step 3: Filter by Multiple Conditions

Let's narrow down to errors from a specific service:

```ppl
source=otel_logs | where severity = "ERROR" AND service.name = "checkout-service" | fields @timestamp, service.name, message
```

What this does:

  • Uses AND to combine multiple conditions
  • Shows only errors from the checkout service
Expected output:

```text
fetched rows / total rows = 523/523
+---------------------+------------------+-----------------------------+
| @timestamp          | service.name     | message                     |
|---------------------+------------------+-----------------------------|
| 2024-03-15 11:30:40 | checkout-service | Invalid product ID          |
| 2024-03-15 11:30:50 | checkout-service | Inventory check failed      |
| 2024-03-15 11:31:00 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:10 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:20 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:30 | checkout-service | Payment processing error    |
| 2024-03-15 11:31:40 | checkout-service | Invalid product ID          |
| 2024-03-15 11:31:50 | checkout-service | Order validation failed     |
| 2024-03-15 11:32:00 | checkout-service | Payment processing error    |
| 2024-03-15 11:32:10 | checkout-service | Database connection timeout |
| 2024-03-15 11:32:20 | checkout-service | Order validation failed     |
| 2024-03-15 11:32:30 | checkout-service | Database connection timeout |
| 2024-03-15 11:32:40 | checkout-service | Payment processing error    |
| 2024-03-15 11:32:50 | checkout-service | Invalid product ID          |
| 2024-03-15 11:33:00 | checkout-service | Invalid product ID          |
...
+---------------------+------------------+-----------------------------+
```

Step 4: Filter by Time Range

Most log analysis focuses on recent data. Let's find errors from a specific time period:

```ppl
source=otel_logs
| where severity = "ERROR"
  AND @timestamp >= "2024-03-15 11:30:00"
  AND @timestamp < "2024-03-15 11:31:00"
| fields @timestamp, service.name, message
| head 5
```

```text
fetched rows / total rows = 5/5
+---------------------+------------------+---------------------------------+
| @timestamp          | service.name     | message                         |
|---------------------+------------------+---------------------------------|
| 2024-03-15 11:30:40 | checkout-service | Invalid product ID              |
| 2024-03-15 11:30:40 | payment-api      | Insufficient funds check failed |
| 2024-03-15 11:30:50 | checkout-service | Inventory check failed          |
| 2024-03-15 11:30:50 | payment-api      | Transaction timeout             |
| 2024-03-15 11:30:00 | checkout-service | Inventory check failed          |
+---------------------+------------------+---------------------------------+
```

What this does:

  • Filters to errors within a specific time window
  • Uses string literals for timestamp comparison
  • Shows the first 5 matching results

💡 Tip: For relative time ranges, use functions like date_sub(now(), INTERVAL 24 HOUR) to calculate "24 hours ago"
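When queries like the one in Step 4 are assembled programmatically (for dashboards or scripts), the window boundaries can be computed instead of hardcoded. A minimal Python sketch; the query string mirrors Step 4 and the helper name is hypothetical:

```python
from datetime import datetime, timedelta

def error_window_query(end: datetime, minutes: int = 1) -> str:
    # Build the half-open [start, end) time filter used in Step 4.
    fmt = "%Y-%m-%d %H:%M:%S"
    start = end - timedelta(minutes=minutes)
    return (
        'source=otel_logs | where severity = "ERROR"'
        f' AND @timestamp >= "{start.strftime(fmt)}"'
        f' AND @timestamp < "{end.strftime(fmt)}"'
        " | fields @timestamp, service.name, message | head 5"
    )
```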


Aggregating Data: Uncover Patterns

Aggregation helps you understand trends and patterns in your data.

Step 5: Count Total Errors

How many errors occurred?

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count()
```

What this does:

  • stats count() counts all matching records
  • Returns a single number

Expected output:

```text
fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 1247    |
+---------+
```

Step 6: Count Errors by Service

Which services have the most errors?

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count() by service.name
```

What this does:

  • by service.name groups results by service
  • Counts errors for each service

Expected output:

```text
fetched rows / total rows = 4/4
+---------+------------------+
| count() | service.name     |
|---------+------------------|
| 523     | checkout-service |
| 198     | inventory-svc    |
| 412     | payment-api      |
| 114     | user-service     |
+---------+------------------+
```

Step 7: Multiple Aggregations

Let's get more insights with multiple aggregations:

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count() as error_count,
        avg(duration_ms) as avg_duration,
        max(duration_ms) as max_duration
  by service.name
| eval avg_duration = round(avg_duration, 2)
| fields service.name, error_count, avg_duration, max_duration
```

What this does:

  • Calculates multiple metrics per service
  • Uses as to name the calculated fields
  • Uses eval with round() function to limit decimal places to 2
  • Uses fields to control the column order in output
  • Shows service name, error count, average duration, and max duration

Expected output:

```text
fetched rows / total rows = 4/4
+------------------+-------------+--------------+--------------+
| service.name     | error_count | avg_duration | max_duration |
|------------------+-------------+--------------+--------------|
| checkout-service | 523         | 1267.99      | 5000         |
| inventory-svc    | 198         | 466.71       | 1800         |
| payment-api      | 412         | 894.33       | 3200         |
| user-service     | 114         | 609.54       | 1094         |
+------------------+-------------+--------------+--------------+
```

💡 Insight: The checkout-service has both the most errors and the longest durations - a clear area for investigation!


Sorting and Limiting: Prioritize Your Findings

Step 8: Sort Results

Let's find which services have the most errors:

```ppl
source=otel_logs
| where severity = "ERROR"
| stats count() as error_count by service.name
| sort error_count desc
```

What this does:

  • sort error_count desc sorts by error count, highest first
  • desc means descending (use asc for ascending)

Expected output:

```text
fetched rows / total rows = 4/4
+-------------+------------------+
| error_count | service.name     |
|-------------+------------------|
| 523         | checkout-service |
| 412         | payment-api      |
| 198         | inventory-svc    |
| 114         | user-service     |
+-------------+------------------+
```

Putting It All Together: A Real-World Query

Let's combine everything we've learned to answer: "What are the top error patterns by service and HTTP status code?"

```ppl
source=otel_logs
| where severity = "ERROR"
  AND http.status_code >= 400
| stats count() as error_count,
        avg(duration_ms) as avg_response_time,
        max(duration_ms) as max_response_time
  by service.name, http.status_code
| eval avg_response_time = round(avg_response_time, 2)
| fields service.name, http.status_code, error_count, avg_response_time, max_response_time
| sort error_count desc
| head 10
```

What this query does:

  1. Filters to errors with HTTP status codes 400+
  2. Aggregates error counts and response times by service and status code
  3. Rounds average response time to 2 decimal places
  4. Selects fields in a logical order for analysis
  5. Sorts by error count to find the biggest problems
  6. Limits to top 10 results

Expected output:

```text
fetched rows / total rows = 10/10
+------------------+------------------+-------------+-------------------+-------------------+
| service.name     | http.status_code | error_count | avg_response_time | max_response_time |
|------------------+------------------+-------------+-------------------+-------------------|
| checkout-service | 400              | 265         | 1258.26           | 2354              |
| checkout-service | 500              | 258         | 1278.0            | 5000              |
| payment-api      | 503              | 143         | 921.71            | 3200              |
| payment-api      | 400              | 140         | 882.92            | 1593              |
| payment-api      | 500              | 129         | 876.36            | 1525              |
| inventory-svc    | 500              | 108         | 481.69            | 1800              |
| inventory-svc    | 400              | 90          | 448.73            | 785               |
| user-service     | 401              | 42          | 631.93            | 1094              |
| user-service     | 403              | 37          | 568.08            | 972               |
| user-service     | 500              | 35          | 626.49            | 1015              |
+------------------+------------------+-------------+-------------------+-------------------+
```

[**PPL Comment Syntax**](https://github.com/opensearch-project/sql/pull/5201/files#diff-1c35c2d36c5008bcd1a5cbeec3acce5af8c0c9c8317bbb86564a9dd6b8cb3cd7R34-R39)

The tutorial uses `//` as a comment syntax in a PPL code block (line 35), but PPL uses `/*` or `--` for comments, not `//`. This could confuse users trying to copy and run the example.

```ppl
// PPL: Read left to right, like a story
source=otel_logs
| where severity = "ERROR"
| stats count() by service.name
```

[**Missing Links**](https://github.com/opensearch-project/sql/pull/5201/files#diff-f7e3783015726eb3f35843294aaa812746908071b5772681ccef318b1fa16095R202-R215)

The new index references `functions/index.md` in the getting-started tutorial's "Next Steps" section, but the index.md itself does not include a link to `cmd/syntax.md` which was previously listed in the old index. The `Interfaces` and `Administration` sections (endpoint, protocol, settings, security, monitoring, datasources, connectors, cross-cluster search) that existed in the old index are completely removed with no replacement links, potentially breaking navigation for users looking for those resources.

```markdown
## Documentation & Resources

- **[Optimization Guide](../../user/optimization/optimization.rst)** - Query performance tuning
- **[Limitations](limitations/limitations.md)** - Known limitations and workarounds
- **[OpenSearch Documentation](https://opensearch.org/docs/latest/)** - Main OpenSearch docs

---

## Need Help?

- **New to PPL?** Start with the [Getting Started Tutorial](tutorials/getting-started.md)
- **Have questions or issues?**
  - Submit issues to the [OpenSearch SQL repository](https://github.com/opensearch-project/sql/issues)
  - Join our [public Slack channel](https://opensearch.org/slack.html) for community support
```

Row Count Accuracy

The total row count in the SHOW TABLES output was updated from 24 to 25 to account for the new otel_logs index. However, mvcombine_data was also reformatted (whitespace fix) in the same diff. Verify that the count of 25 is accurate and reflects all indices currently registered in the test environment.

```text
fetched rows / total rows = 25/25
```

@anasalkouz anasalkouz self-assigned this Mar 4, 2026

github-actions bot commented Mar 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 33a205f

Explore these optional code suggestions:

Category: Possible issue
Remove invalid comment syntax from PPL code block

PPL does not use // for comments — this is not valid PPL syntax. Using // as a
comment prefix may confuse readers into thinking it's valid PPL, and could cause
issues if the code block is executed. Remove the comment line or use a prose
description outside the code block instead.

docs/user/ppl/tutorials/getting-started.md [34-39]

````diff
 ```ppl
-// PPL: Read left to right, like a story
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: The `//` comment syntax is not valid PPL syntax and could mislead users into thinking it's valid PPL. Removing it improves accuracy of the documentation example.

Impact: Medium



Fix broken link to functions reference

The link `../functions/index.md` points to a file that does not appear to exist in the repository based on the diff. The existing function reference files are individual pages (e.g., `functions/aggregations.md`). This broken link should be corrected to point to a valid file, such as `../functions/aggregations.md` or the main index.

[docs/user/ppl/tutorials/getting-started.md \[464\]](https://github.com/opensearch-project/sql/pull/5201/files#diff-1c35c2d36c5008bcd1a5cbeec3acce5af8c0c9c8317bbb86564a9dd6b8cb3cd7R464-R464)

```diff
-**[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+**[Functions Reference](../functions/aggregations.md)** - Explore available functions for calculations
```

Suggestion importance: 5/10

Why: The link ../functions/index.md likely doesn't exist based on the PR diff, which only shows individual function pages. This could result in a broken link for users, though the exact file structure outside the diff is uncertain.

Impact: Low
Category: General
Standardize inconsistent timestamp formats in test data

The test data file mixes two different timestamp formats: some entries use
millisecond precision (2024-03-15T10:30:45.000Z) while others use second precision
(2024-03-15T10:34:00Z). This inconsistency could cause test failures if the
documentation examples rely on a specific timestamp format for parsing or filtering.
Standardize all timestamps to use the same format throughout the file.

doctest/test_data/otel_logs.json [1-14]

```diff
-{"@timestamp": "2024-03-15T10:30:45.000Z", ...}
-{"@timestamp": "2024-03-15T10:30:46.000Z", ...}
+{"@timestamp": "2024-03-15T10:30:45.000Z", "severity": "INFO", "service.name": "checkout-service", "message": "Order created successfully", "http.status_code": 200, "duration_ms": 150, "http.route": "/api/checkout"}
+{"@timestamp": "2024-03-15T10:30:46.000Z", "severity": "INFO", "service.name": "payment-api", "message": "Payment processed", "http.status_code": 200, "duration_ms": 320, "http.route": "/api/payment"}
 ...
-{"@timestamp": "2024-03-15T10:34:00Z", ...}
+{"@timestamp": "2024-03-15T10:34:00.000Z", "severity": "ERROR", ...}
```
Suggestion importance: 4/10

Why: The suggestion correctly identifies a real inconsistency between millisecond-precision timestamps (lines 1-9) and second-precision timestamps (line 10 onwards). However, this is test data and many parsers handle both ISO 8601 formats correctly, so the practical impact may be low. The improved_code only partially demonstrates the fix and doesn't fully reflect the scope of changes needed.

Impact: Low
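The standardization itself amounts to a one-pass rewrite of the file. A sketch of the normalization step, assuming every timestamp is ISO 8601 with a trailing Z (the function name is illustrative):

```python
from datetime import datetime

def normalize_ts(ts: str) -> str:
    # Accept both "2024-03-15T10:34:00Z" and "2024-03-15T10:30:45.000Z",
    # re-emit with fixed millisecond precision.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"
```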
Clarify inclusive/exclusive time range boundary in example

The time range filter uses >= for the lower bound "2024-03-15 11:30:00" and < (strictly less than) for the upper bound "2024-03-15 11:31:00", so the upper bound itself is excluded. The sample output is consistent with this half-open interval, but the tutorial should clarify the boundary behavior to avoid confusing readers.

docs/user/ppl/tutorials/getting-started.md [203-205]

```diff
 | where severity = "ERROR"
   AND @timestamp >= "2024-03-15 11:30:00"
-  AND @timestamp < "2024-03-15 11:31:00"
+  AND @timestamp <= "2024-03-15 11:30:59"
```

Suggestion importance: 2/10

Why: The suggestion changes < to <= with a different timestamp, but the existing code is logically consistent and the improved_code changes the semantics unnecessarily. The original boundary behavior is standard and the suggestion offers minimal improvement.

Impact: Low

Previous suggestions

Suggestions up to commit 35cc46e
Category: Possible issue
Remove invalid comment syntax from PPL code block

PPL does not support // style comments. Using // as a comment prefix in a PPL code
block is incorrect and may confuse users or cause errors if they copy and run the
query. Remove the comment line or use a prose description outside the code block
instead.

docs/user/ppl/tutorials/getting-started.md [35-40]

````diff
 ```ppl
-// PPL: Read left to right, like a story
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: PPL does not support `//` style comments, so including `// PPL: Read left to right, like a story` in a PPL code block is incorrect and could confuse users who copy and run the query. Removing it improves accuracy of the documentation.

Impact: Medium

Category: General



Standardize inconsistent timestamp formats in test data

The test data file mixes two different timestamp formats: some entries use millisecond precision (`2024-03-15T10:30:45.000Z`) while others use second precision (`2024-03-15T10:34:00Z`). This inconsistency could cause test failures if the documentation examples rely on a specific timestamp format for parsing or filtering. Standardize all timestamps to use the same format throughout the file.

[doctest/test_data/otel_logs.json \[1-2\]](https://github.com/opensearch-project/sql/pull/5201/files#diff-975ff637bd6e510d15564ca368fae81c5dfea7edd0885670802c6763751e08a1R1-R2)

```diff
-{"@timestamp": "2024-03-15T10:30:45.000Z", ...}
-{"@timestamp": "2024-03-15T10:30:46.000Z", ...}
+{"@timestamp": "2024-03-15T10:30:45.000Z", "severity": "INFO", "service.name": "checkout-service", "message": "Order created successfully", "http.status_code": 200, "duration_ms": 150, "http.route": "/api/checkout"}
+{"@timestamp": "2024-03-15T10:30:46.000Z", "severity": "INFO", "service.name": "payment-api", "message": "Payment processed", "http.status_code": 200, "duration_ms": 320, "http.route": "/api/payment"}
```

Suggestion importance: 4/10

Why: The suggestion correctly identifies that the test data file mixes timestamp formats (millisecond precision like 2024-03-15T10:30:45.000Z vs second precision like 2024-03-15T10:34:00Z). However, the improved_code only shows the first two lines unchanged and doesn't actually demonstrate the standardization fix. The impact is moderate as this inconsistency could affect timestamp parsing in tests, but both formats are valid ISO 8601 and most parsers handle both.

Impact: Low
Fix potentially broken documentation links

The links ../cmd/index.md and ../functions/index.md reference index.md files that
may not exist in the repository, as the PR only adds getting-started.md and the
existing structure uses individual command files. Broken links in documentation
degrade user experience and can cause build failures. Verify these files exist or
update the links to point to valid targets.

docs/user/ppl/tutorials/getting-started.md [463-464]

```diff
-**[PPL Command Reference](../cmd/index.md)** - Complete command documentation
-- **[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+**[PPL Command Reference](../index.md#quick-command-reference)** - Complete command documentation
+- **[Functions Reference](../index.md#functions)** - Explore available functions for calculations
```

Suggestion importance: 4/10

Why: The links ../cmd/index.md and ../functions/index.md may point to non-existent files, but this is a verification concern rather than a confirmed bug. The suggested fix to anchor links in ../index.md is a reasonable alternative, but the score is limited since existence of these files cannot be confirmed from the diff alone.

Impact: Low
Ensure sample output matches the filter conditions shown

The time range filter uses >= for the lower bound and < (strictly less than) for the upper bound at 11:31:00. The last row in the sample output (2024-03-15 11:30:00) matches the lower bound but appears out of chronological order relative to the other rows. Ensure the sample output is consistent with the filter logic to avoid confusing readers.

docs/user/ppl/tutorials/getting-started.md [203-205]

```diff
+| where severity = "ERROR"
+  AND @timestamp >= "2024-03-15 11:30:00"
+  AND @timestamp < "2024-03-15 11:31:00"

-
```

Suggestion importance: 2/10

Why: The existing_code and improved_code are identical, so no actual change is proposed. The suggestion only asks to verify consistency between the filter and sample output, which is a minor documentation quality concern but not an actionable code fix.

Impact: Low
Suggestions up to commit fdacf43
Category: Possible issue
Remove invalid comment syntax from code block

PPL does not use // for comments; this is invalid PPL syntax. The comment style
shown may confuse readers who try to run this code. Remove the comment line or use a
plain text explanation outside the code block instead.

docs/user/ppl/tutorials/getting-started.md [34-39]

````diff
 ```ppl
-// PPL: Read left to right, like a story
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: PPL does not support `//` comment syntax, so including it in a code block labeled `ppl` is misleading and could cause confusion for users who try to run the example. Removing the comment line makes the code block valid and accurate.

Impact: Medium



Verify row count matches table entries

The total row count was updated from 24 to 25 to account for the new `otel_logs` index, but the table listing must also be verified to contain exactly 25 rows. If the table does not match this count, the doctest will fail. Ensure the number of data rows in the table matches the declared total.

[docs/user/dql/metadata.rst \[38\]](https://github.com/opensearch-project/sql/pull/5201/files#diff-119b6412f52dd06e6a6ddc02bec5f72d53a0020c309d04177e3abf72dc23775dR38-R38)

```diff
+fetched rows / total rows = 25/25

-
```

Suggestion importance: 4/10

Why: The existing_code and improved_code are identical, so no actual change is suggested. The suggestion only asks to verify the count matches, which is a valid concern but doesn't warrant a high score since it's just a verification request.

Impact: Low
Category: General
Fix potentially broken documentation links

The links ../cmd/index.md and ../functions/index.md reference files that may not
exist based on the rest of the documentation structure, which uses specific
filenames like cmd/search.md and functions/aggregations.md. Broken links in
documentation degrade user experience and may cause test failures. Verify these
index files exist or replace with valid links.

docs/user/ppl/tutorials/getting-started.md [463-464]

```diff
-- **[PPL Command Reference](../cmd/index.md)** - Complete command documentation
-- **[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+- **[PPL Command Reference](../index.md#quick-command-reference)** - Complete command documentation
+- **[Functions Reference](../index.md#functions)** - Explore available functions for calculations
```

Suggestion importance: 4/10

Why: The links ../cmd/index.md and ../functions/index.md may not exist based on the documentation structure visible in the PR. However, the suggestion to replace them with anchor links to ../index.md is a reasonable workaround, though it's uncertain whether those anchors exist either.

Impact: Low
Suggestions up to commit 9c1af6e
Category: Possible issue
Fix invalid comment syntax in PPL example

PPL does not use // for comments; this is not valid PPL syntax. Using incorrect comment syntax in a tutorial could confuse users and cause query failures if they copy the code. Remove the comment line or use a valid PPL comment style (e.g., /* ... */).

docs/user/ppl/tutorials/getting-started.md [34-39]

````diff
 ```ppl
-// PPL: Read left to right, like a story
+/* PPL: Read left to right, like a story */
 source=otel_logs
 | where severity = "ERROR"
 | stats count() by service.name
````
Suggestion importance: 7/10

Why: PPL does not support `//` style comments, so using this syntax in a tutorial is misleading and could cause query failures if users copy the code. The fix to use `/* ... */` block comment syntax is accurate and important for a tutorial document.

Impact: Medium



Fix broken links to non-existent documentation files

___


**These "Next Steps" links reference files that do not appear to exist in the repository (`learn-common-commands.md`, `../quick-reference.md`, `../functions/index.md`). Broken links in documentation degrade user experience and trust. Either create these files or update the links to point to existing documentation.**

[docs/user/ppl/tutorials/getting-started.md [463-465]](https://github.com/opensearch-project/sql/pull/5201/files#diff-1c35c2d36c5008bcd1a5cbeec3acce5af8c0c9c8317bbb86564a9dd6b8cb3cd7R463-R465)

```diff
-- **[Learn Common Commands](learn-common-commands.md)** - Master the 10 most-used PPL commands
-- **[Quick Reference](../quick-reference.md)** - Handy cheat sheet for all commands
-- **[Functions Reference](../functions/index.md)** - Explore available functions for calculations
+- **[Learn Common Commands](../index.md)** - Master the most-used PPL commands
+- **[Quick Reference](../index.md)** - Handy reference for all commands
+- **[Functions Reference](../functions/aggregations.md)** - Explore available functions for calculations
```
Suggestion importance[1-10]: 5

__

Why: The links learn-common-commands.md, ../quick-reference.md, and ../functions/index.md likely don't exist yet, but the suggested replacements pointing to ../index.md or ../functions/aggregations.md are approximations that may not be ideal either. The issue is valid but the improved code is a rough workaround rather than a definitive fix.

Low
General
Fix broken command reference index link

The link ../cmd/index.md references a file that does not appear to exist in the
repository based on the diff (individual command files like cmd/search.md exist, but
not cmd/index.md). This broken link in the "Getting Help" section would leave users
unable to find the command reference. Update it to point to an existing file.

docs/user/ppl/tutorials/getting-started.md [488]

```diff
-- **[PPL Command Reference](../cmd/index.md)** - Complete command documentation
+- **[PPL Command Reference](../index.md)** - Complete command documentation
```
Suggestion importance[1-10]: 4

__

Why: The link ../cmd/index.md may not exist, but the suggested replacement ../index.md is a rough approximation. The issue is valid but the fix is uncertain without knowing the full repository structure.

Low
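Several of these suggestions flag the same failure mode: relative links in the new tutorial pointing at files that do not exist. This is easy to catch mechanically before merge. Below is a rough sketch of a relative-link checker; the script and the `find_broken_links` helper are illustrative, not part of this PR:

```python
import os
import re

# Matches markdown links like [text](target); the capture stops at ")", "#",
# or whitespace so anchors such as ../index.md#functions resolve to the file.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")

def find_broken_links(docs_root):
    """Return (markdown_file, link_target) pairs whose relative target is missing."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(docs_root):
        for name in filenames:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            for target in LINK_RE.findall(text):
                # External links are out of scope for a filesystem check.
                if target.startswith(("http://", "https://", "mailto:")):
                    continue
                # Relative targets resolve against the linking file's directory.
                resolved = os.path.normpath(os.path.join(dirpath, target))
                if not os.path.exists(resolved):
                    broken.append((path, target))
    return broken
```

Run against `docs/user/ppl`, this would have reported `../cmd/index.md` and friends before review.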

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit fdacf43

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit 35cc46e

Signed-off-by: Anas Alkouz <aalkouz@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit 33a205f

@@ -0,0 +1,1751 @@
{"@timestamp": "2024-03-15T10:30:45.000Z", "severity": "INFO", "service.name": "checkout-service", "message": "Order created successfully", "http.status_code": 200, "duration_ms": 150, "http.route": "/api/checkout"}
Collaborator


Is it too much to have 1,751 records for doctest data? Would it make sense to trim this down to ~30-50 well-crafted records that still cover all use cases?
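One rough way to do the trimming suggested above: keep one record per (severity, service.name) combination first, so every `stats ... by` example in the tutorial still returns rows, then pad up to a small cap. A hypothetical sketch (field names come from the sample record in this diff; `trim_records` and the cap of 50 are illustrative, not part of the PR):

```python
import json

def trim_records(lines, cap=50):
    """Keep at most `cap` JSONL records, covering every (severity, service) pair first."""
    records = [json.loads(line) for line in lines if line.strip()]
    kept, seen = [], set()
    # First pass: one representative per (severity, service.name) combination,
    # so grouped aggregations in the tutorial keep returning every group.
    for rec in records:
        key = (rec.get("severity"), rec.get("service.name"))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    # Second pass: pad with remaining records up to the cap.
    for rec in records:
        if len(kept) >= cap:
            break
        if rec not in kept:
            kept.append(rec)
    return kept[:cap]
```

A stratification key covering status codes and routes as well would be needed if those fields also drive doctest assertions.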

Comment on lines +83 to +84
"trace_id": "a1b2c3d4e5f6g7h8i9j0",
"span_id": "1a2b3c4d5e6f7g8h",
Collaborator


Are these missing in otel_logs test data?


Throughout this tutorial, we'll use OpenTelemetry (OTEL) observability data. OTEL is the industry standard for collecting telemetry data (logs, metrics, and traces) from applications.

### Sample Log Structure
Collaborator


Just curious any plan to cover trace, metrics and correlation case among them in future?


Labels

documentation - Improvements or additions to documentation; PPL - Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DOC] Add Getting Started Tutorial

2 participants