Update graphlookup syntax by LantaoJin · Pull Request #5209 · opensearch-project/sql

LantaoJin · 2026-03-06T08:11:05Z

Description

Update naming convention to support directional edge syntax
Unidirectional:

| graphLookup employees
  start=reportsTo
  edge=reportsTo-->name
  filter=(status='active')
  as reportingHierarchy

Bidirectional:

| graphLookup employees
  start=reportsTo
  edge=reportsTo<->name
  filter=(status='active')
  as reportingHierarchy

Related Issues

Resolves #5208

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>

github-actions · 2026-03-06T08:12:33Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Update graphLookup grammar and AST parsing for new edge syntax Relevant files: ppl/src/main/antlr/OpenSearchPPLLexer.g4 ppl/src/main/antlr/OpenSearchPPLParser.g4 ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java Sub-PR theme: Update tests and anonymizer for new graphLookup edge syntax Relevant files: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java Sub-PR theme: Update graphLookup documentation for new edge syntax Relevant files: docs/user/ppl/cmd/graphlookup.md
⚡ Recommended focus areas for review Lexer Mode Leak The custom lexer modes (EDGE_MODE, EDGE_ARROW_MODE, EDGE_TO_MODE) use `popMode` only in EDGE_TO_MODE. If parsing fails mid-way (e.g., invalid arrow or missing toField), the lexer may remain stuck in a non-default mode, causing subsequent tokens to be misidentified. There is no error recovery or explicit `popMode` on failure paths. mode EDGE_MODE; EDGE_EQUAL: '=' -> type(EQUAL); EDGE_WS: [ \t\r\n]+ -> skip; EDGE_ID: [@A-Z_]+? [A-Z_0-9]* -> type(ID), mode(EDGE_ARROW_MODE); // EDGE_ARROW_MODE: entered after fromField, handles optional trailing dashes, arrow, and toField mode EDGE_ARROW_MODE; EDGE_UNI_ARROW: '-->' -> type(UNI_ARROW), mode(EDGE_TO_MODE); EDGE_BI_ARROW: '<->' -> type(BI_ARROW), mode(EDGE_TO_MODE); TRAIL_HYPHEN: '-'+; EDGE_ARROW_WS: [ \t\r\n]+ -> skip; // EDGE_TO_MODE: entered after arrow, lexes toField then pops back to default mode EDGE_TO_MODE; EDGE_TO_WS: [ \t\r\n]+ -> skip; EDGE_TO_ID: ID_LITERAL -> type(ID), popMode; EDGE_ID Pattern The EDGE_ID rule pattern `[@A-Z_]+? [A-Z_0-9]` uses case-insensitive matching only for uppercase letters (A-Z). If the lexer grammar is not configured with case-insensitive flag, lowercase field names like `reportsTo` or `connects` would not be matched. Verify that the grammar uses `options { caseInsensitive=true; }` or that the pattern covers lowercase letters. EDGE_ID: [@A-Z_]+? [A-Z_0-9] -> type(ID), mode(EDGE_ARROW_MODE); Trailing Hyphen Handling When `trailHyphen` is present, the code appends the hyphen text to the fromField name string. However, this approach relies on `fromField.getField().toString()` returning a plain string, which may not hold for all `QualifiedName` implementations (e.g., multi-part names). Additionally, there is no validation that the resulting field name is valid. if (edgeCtx.trailHyphen != null) { String name = fromField.getField().toString() + edgeCtx.trailHyphen.getText(); fromField = new Field(new QualifiedName(name)); } Null Check Missing `startField` is parsed from `ctx.startClause()` without a null check. If the grammar allows `startClause` to be absent (e.g., due to grammar ambiguity or future changes), a NullPointerException would occur. The old code had explicit null handling for optional fields. OpenSearchPPLParser.StartClauseContext startCtx = ctx.startClause(); Field startField = (Field) internalVisitExpression(startCtx.startField);

github-actions · 2026-03-06T08:13:05Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix regex to match lowercase identifiers *The `EDGE_ID` regex `[@A-Z_]+? [A-Z_0-9]` uses a lazy quantifier followed by a space literal, which likely doesn't match the intended pattern for identifiers. It should use a character class without a space, such as `[@A-Za-z_][A-Za-z_0-9]` (or similar), to correctly match field names. The current pattern would fail to match lowercase field names like `reportsTo` or `connects`. ppl/src/main/antlr/OpenSearchPPLLexer.g4 [607] -EDGE_ID: [@A-Z_]+? [A-Z_0-9] -> type(ID), mode(EDGE_ARROW_MODE); +EDGE_ID: [@A-Za-z_][A-Za-z_0-9]* -> type(ID), mode(EDGE_ARROW_MODE); Suggestion importance[1-10]: 8 __ Why: The `EDGE_ID` pattern `[@A-Z_]+? [A-Z_0-9]*` only matches uppercase characters, which would fail to match lowercase field names like `reportsTo` or `connects` used throughout the tests. This is a critical bug that would break the new `edge=` syntax entirely.	Medium
General	Remove unnecessary null check for required field The `start` field is now a required parameter in the new syntax (not optional), but the anonymizer still guards it with a null check. Since `startField` is always required by the new grammar, this conditional may hide bugs where `startField` is unexpectedly null. Consider asserting non-null or removing the guard to match the required nature of the field. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [235-239] -if (node.getStartField() != null) { - command.append(" start=").append(MASK_COLUMN); -} +command.append(" start=").append(MASK_COLUMN); String arrow = node.getDirection() == GraphLookup.Direction.BI ? "<->" : "-->"; command.append(" edge=").append(MASK_COLUMN).append(arrow).append(MASK_COLUMN); Suggestion importance[1-10]: 4 __ Why: Since `startField` is now a required parameter in the new grammar, the null guard in the anonymizer is inconsistent with the required nature of the field. Removing it would make the anonymizer consistent with the grammar and could surface unexpected null values as bugs rather than silently omitting the field.	Low
General	Document hyphenated field name edge case The `trailHyphen` token captures one or more trailing hyphens from the `fromField` name, but the grammar rule `TRAIL_HYPHEN: '-'+` is defined in `EDGE_ARROW_MODE`, which is entered only after `EDGE_ID` is matched. This means the trailing hyphen is a separate token and the concatenation logic here is correct, but there's no validation that the resulting field name with appended hyphens is a valid field identifier. Consider adding a check or documenting this behavior clearly to avoid silent misuse. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1526-1529] if (edgeCtx.trailHyphen != null) { String name = fromField.getField().toString() + edgeCtx.trailHyphen.getText(); + // trailHyphen allows field names with trailing hyphens (e.g., "manager-") fromField = new Field(new QualifiedName(name)); } Suggestion importance[1-10]: 1 __ Why: This suggestion only adds a comment to existing code without changing any logic. Adding comments/documentation is explicitly excluded from scoring above 0, but the existing code is functionally correct.	Low

Update graphlookup syntax

8b3e24d

Signed-off-by: Lantao Jin <ltjin@amazon.com>

LantaoJin added the enhancement New feature or request label Mar 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update graphlookup syntax#5209

Update graphlookup syntax#5209
LantaoJin wants to merge 1 commit intoopensearch-project:mainfrom
LantaoJin:pr/issues/5208

LantaoJin commented Mar 6, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

LantaoJin commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 6, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Mar 6, 2026

PR Code Suggestions ✨

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

LantaoJin commented Mar 6, 2026 •

edited

Loading