Implement naming package, new IdentifierIntroduction.qll, unicode funcs. #950
MichaelRFairhurst wants to merge 8 commits into main
Conversation
Note that the Unicode data came from advanced-security/codeql-qtil#13. I should definitely finish Unicode support in qtil, publish it, and then use that here. That should likely be done before merge, but it is not strictly necessary.

Relevant qtil pull request: advanced-security/codeql-qtil#13
Pull request overview
This PR implements a comprehensive naming validation package for MISRA C++ RULE-5-10-1, which enforces proper identifier formation in C++ code. The implementation introduces a sophisticated identifier tracking system that validates identifiers against multiple constraints including Unicode normalization, reserved names, namespace restrictions, and macro naming conventions.
Key changes:
- Introduces the IdentifierIntroduction abstraction that systematically captures all identifier declarations across various C++ constructs (variables, functions, types, macros, namespaces, templates, etc.)
- Implements Unicode support with UAX#44 compliance checking and NFC normalization validation using extensible predicates with external YAML data
- Adds MISRA C++ RULE-5-10-1 query to detect poorly formed identifiers including underscore violations, lowercase in macros, reserved names, and reserved namespace usage
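To make the kinds of checks listed above concrete, here is a minimal C++ sketch of three ASCII-only shape checks. The helper names and the exact conditions are assumptions made for illustration. They are not taken from the query in this PR, and the sketch does not cover reserved names, reserved namespaces, or Unicode.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the name contains two consecutive underscores.
bool hasDoubleUnderscore(const std::string& name) {
    return name.find("__") != std::string::npos;
}

// Hypothetical helper: true if the name begins with an underscore.
bool startsWithUnderscore(const std::string& name) {
    return !name.empty() && name.front() == '_';
}

// Hypothetical helper: true if a macro name contains an ASCII lowercase letter.
bool macroHasLowercase(const std::string& name) {
    for (char c : name) {
        if (c >= 'a' && c <= 'z') {
            return true;
        }
    }
    return false;
}

// Usage example: each assertion documents one expected classification.
void demoShapeChecks() {
    assert(hasDoubleUnderscore("my__var"));
    assert(!hasDoubleUnderscore("my_var"));
    assert(startsWithUnderscore("_hidden"));
    assert(macroHasLowercase("Max_SIZE"));
    assert(!macroHasLowercase("MAX_SIZE"));
}
```

The actual query works on identifiers extracted by CodeQL rather than on raw strings, so this only approximates the intent of the checks.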
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| cpp/common/src/codingstandards/cpp/Identifiers.qll | Introduces comprehensive IdentifierIntroduction class hierarchy that systematically tracks all identifier declarations across various C++ constructs |
| cpp/common/src/codingstandards/cpp/Unicode.qll | Implements Unicode property checking (NFC_QC, XID_Start, XID_Continue) and unicode escape sequence handling for identifier validation |
| cpp/common/src/codingstandards/cpp/Macro.qll | Fixes variadic macro parameter extraction to properly exclude ellipsis and empty parameter names |
| cpp/misra/src/rules/RULE-5-10-1/PoorlyFormedIdentifier.ql | Implements the main query that validates identifiers against MISRA C++ RULE-5-10-1 constraints |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/Naming2.qll | Autogenerated metadata for Naming2 package query registration |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/RuleMetadata.qll | Registers Naming2 package in the rule metadata system |
| rule_packages/cpp/Naming2.json | Defines query metadata for RULE-5-10-1 including severity, precision, and tags |
| cpp/misra/test/rules/RULE-5-10-1/test.cpp | Comprehensive test file with 189 lines covering Unicode, normalization, underscores, macros, namespaces, and reserved names |
| cpp/misra/test/rules/RULE-5-10-1/PoorlyFormedIdentifier.expected | Expected query results showing 48 violations across various identifier validation rules |
| cpp/misra/test/rules/RULE-5-10-1/PoorlyFormedIdentifier.qlref | Query reference file for test execution |
| cpp/common/test/library/codingstandards/cpp/identifiers/* | Library test suite with 666 lines testing identifier extraction across all C++ constructs |
| cpp/common/test/includes/standard-library/utility.h | Adds pair and tuple support for structured binding tests |
| cpp/common/src/qlpack.yml | Registers unicode.yml data extension |
| change_notes/2025-08-22-function-like-macro-param-name-bug-fixes.md | Documents bug fixes in function-like macro parameter handling |

    exists(Function func | func = intro.getElement().(FunctionDeclarationEntry).getFunction() |
      isUserDefinedLiteralSuffixNonCompliant(func) and
      message = "User-defined literal suffix '" + ident + "' is malformed."
    )

This condition appears unreachable. The query checks if the element is a FunctionDeclarationEntry with a Function that has a malformed user-defined literal suffix, and then tries to use 'ident' in the message. However, for user-defined literal suffixes, the identifier extracted on line 53 via 'intro.unescapeUnicode()' will be the suffix without the 'operator ""' prefix (e.g., '_foo'), not the full function name. This means this branch would never match the conditions in 'isUserDefinedLiteralSuffixNonCompliant' which checks for patterns in the full function name like 'operator""%'. This clause should either be removed as unreachable or the logic should be corrected to properly handle this case.
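The mismatch described in this comment can be sketched in C++ with plain strings. This is only an illustration under assumptions: `matchesLiteralOperatorPattern` and `literalSuffix` are hypothetical helpers that mimic a prefix pattern such as `operator""%` and the suffix extraction, and they are not the predicates used in the query.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Prefix carried by the full function name of a literal operator.
const std::string kLiteralOperatorPrefix = "operator\"\"";

// Hypothetical helper: mimics a prefix pattern such as `operator""%`.
bool matchesLiteralOperatorPattern(const std::string& name) {
    return name.compare(0, kLiteralOperatorPrefix.size(), kLiteralOperatorPrefix) == 0;
}

// Hypothetical helper: returns the suffix after `operator""`, skipping any
// spaces, or an empty string if the name is not a literal operator.
std::string literalSuffix(const std::string& fullName) {
    if (!matchesLiteralOperatorPattern(fullName)) {
        return "";
    }
    std::size_t pos = kLiteralOperatorPrefix.size();
    while (pos < fullName.size() && fullName[pos] == ' ') {
        ++pos;
    }
    return fullName.substr(pos);
}

// Usage example: the full name matches the pattern, the bare suffix does not.
void demoSuffixMismatch() {
    const std::string fullName = "operator\"\"_foo";
    const std::string suffix = literalSuffix(fullName);
    assert(suffix == "_foo");
    assert(matchesLiteralOperatorPattern(fullName));
    assert(!matchesLiteralOperatorPattern(suffix));
}
```

A check written against the full name therefore cannot succeed when it is handed only the suffix, which is the situation the comment describes.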

    exists(Function func | func = intro.getElement().(FunctionDeclarationEntry).getFunction() |
      isUserDefinedLiteralSuffixNonCompliant(func) and
      message = "User-defined literal suffix '" + ident + "' is malformed."
    )

@MichaelRFairhurst could this explain the missing alert for test.cpp:71?
Handled, but required some offset trickery!
Thanks for catching this!
mbaluda left a comment
One regex fix and a couple of inconsistent test annotations... looks great otherwise!

    int varα = 2; // COMPLIANT - XID_Continue character
    int var_γ = 3; // COMPLIANT - underscore and XID_Continue
    int var⁺invalid = 5; // NON_COMPLIANT - U+207A not in XID_Continue class
    int var̃ = 6; // COMPLIANT - combining tilde, XID_Continue but not XID_Start

Suggested change:

    - int var̃ = 6; // COMPLIANT - combining tilde, XID_Continue but not XID_Start
    + int var̃ = 6; // NON_COMPLIANT - combining tilde, XID_Continue but not XID_Start

Interesting.
I guess this test was intended to cover combining marks that are in XID_Continue, but the combining tilde is not in NFC form.
I switched this to a combining mark that is in NFC form (one that never precomposes). This also prompted me to add a test to the NFC form section that checks a number of additional NFC form combining marks.
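As background for why a combining tilde is awkward for NFC checks, the following C++ sketch compares the precomposed and decomposed spellings of "ñ" at the UTF-8 byte level. The letter "n" is an assumption chosen for illustration (the test in this PR uses a different base letter), and the code performs no normalization. It only shows that two canonically equivalent spellings are different byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+00F1 LATIN SMALL LETTER N WITH TILDE, the precomposed form (UTF-8: C3 B1).
const std::string kPrecomposed = "\xC3\xB1";

// 'n' followed by U+0303 COMBINING TILDE (UTF-8: CC 83), the decomposed form.
const std::string kDecomposed = "n\xCC\x83";

// Counts code points by counting bytes that are not UTF-8 continuation bytes.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Usage example: the two spellings render alike but differ as code point sequences.
void demoTildeForms() {
    assert(countCodePoints(kPrecomposed) == 1);
    assert(countCodePoints(kDecomposed) == 2);
    assert(kPrecomposed != kDecomposed);
}
```

Because the tilde can combine with some base letters into a precomposed character, an identifier containing it cannot be assumed to be in NFC form without further checks, whereas a mark that never precomposes avoids that ambiguity.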
Description
Implement naming package.
Change request type
.ql, .qll, .qls or unit tests)
Rules with added or modified queries
RULE 5-10-1
Release change checklist
A change note (development_handbook.md#change-notes) is required for any pull request which modifies:
If you are only adding new rule queries, a change note is not required.
Author: Is a change note required?
🚨🚨🚨
Reviewer: Confirm that the format of shared queries (not the .qll file, the .ql file that imports it) is valid by running them within VS Code.
Reviewer: Confirm that either a change note is not required or the change note is required and has been added.
Query development review checklist
For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:
Author
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Reviewer
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.