Conversation
|
!test |
|
Review updated until commit 3cc2e10 Description
|
| Relevant files | |||||
|---|---|---|---|---|---|
| Enhancement |
| ||||
| Tests |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| ⚡ Recommended focus areas for review |
Algorithm Correctness
|
78310cd to
ba2f27d
Compare
|
!test |
f3a9608 to
f184027
Compare
|
!test |
f184027 to
b7345f7
Compare
|
!test |
aaa42fd to
12cc04e
Compare
|
!test |
| getProjectedExtent(id), commonOrConstExtent(ca_map_, id)); | ||
| } | ||
|
|
||
| void ContiguousInnerDimensionsMapper::addProjectedExtent( |
There was a problem hiding this comment.
| // Ordering of dimensions is important in this analysis, if an ordering is | ||
| // contiguous in the reference, but not the target tensor views, then we | ||
| // cannot consider that a contiguous merge dimension for vectorization. | ||
| auto projected_logical = projectId(filtered_ids, logical_domain); |
There was a problem hiding this comment.
projected_logical gives me the wrong impression that the whole logical domain is projected. In fact, it's still as filtered as filtered_ids.
Greptile SummaryRefactored Key changes:
Potential concerns:
Confidence Score: 2/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Start: projectId with from/to/leaf domains] --> B[forward_project to 'to' domain]
B --> C[Save frontier state]
C --> D[forward_project to 'leaf' allocation domain]
D --> E[Restore frontier state]
E --> F[backward_project to 'to' domain]
F --> G[Return projected IDs]
D --> H[During projection: combinePE/distributePE]
H --> I[addProjectedExtent records allocation IDs]
G --> J[getContigMergeOfInnerSize uses projected extents]
J --> K{Is alloc_id contiguous?}
K -->|Yes| L[Find logical_id from alloc_id]
L --> M{logical_id found?}
M -->|No| N[Break - LayoutOp case]
M -->|Yes| O{Matches projected_dim?}
O -->|No| N
O -->|Yes| P[getProjectedExtent alloc_id]
P --> Q{alloc_id in projected_extent_?}
Q -->|No| R[THROW: Not projected]
Q -->|Yes| S[Multiply into product]
S --> K
Last reviewed commit: 3cc2e10 |
csrc/scheduler/vectorize_helper.cpp
Outdated
| for (auto [alloc_id, cont] : | ||
| zip(alloc | std::views::reverse, contiguity | std::views::reverse)) { | ||
| auto is_treated_as_size_one = [](IterDomain* id) { | ||
| return id->isReduction() || id->isBroadcast() || id->isParallelized() || | ||
| id->extent()->isOneInt(); | ||
| }; | ||
| if (is_treated_as_size_one(alloc_id)) { | ||
| continue; | ||
| } | ||
|
|
||
| auto contiguity_i = contiguity.at(alloc_ii); | ||
| if (!contiguity_i.has_value()) { | ||
| NVF_THROW("contiguity flag at alloc_ii can't be null"); | ||
| } else { | ||
| // Not contiguous | ||
| if (!contiguity_i.value()) { | ||
| break; | ||
| } | ||
| NVF_ERROR(cont.has_value()); | ||
| if (!cont.value()) { | ||
| break; | ||
| } | ||
|
|
||
| // Get the logical ID corresponding to the allocation ID. | ||
| auto exprs = DependencyCheck::getAllExprsBetween( | ||
| {tv->getLogicalDomain().begin(), tv->getLogicalDomain().end()}, | ||
| {alloc_iid}); | ||
| IterDomain* logical_id = alloc_iid; | ||
| Val* num_devices = tv->container()->oneVal(); | ||
| bool only_valid_device_split = true; | ||
| for (Expr* expr : exprs | std::views::reverse) { | ||
| if (!isValidDeviceSplit(expr)) { | ||
| only_valid_device_split = false; | ||
| break; | ||
| } | ||
| auto* split = expr->as<Split>(); | ||
| logical_id = split->in(); | ||
| num_devices = SimplifyingIrBuilder::mulExpr(num_devices, split->factor()); | ||
| while (projected_dim != projected_dims.rend() && | ||
| is_treated_as_size_one(*projected_dim)) { | ||
| projected_dim++; | ||
| } | ||
|
|
||
| // Non device split could lead to padding, which prevents vectorization | ||
| if (!only_valid_device_split) { | ||
| break; | ||
| } | ||
| IterDomain* logical_id = [&]() { | ||
| std::vector<IterDomain*> reachable_ids = | ||
| ir_utils::getReachableIds(tv->getLogicalDomain(), {alloc_id}); | ||
| NVF_ERROR_EQ(reachable_ids.size(), 1); | ||
| return reachable_ids.front(); | ||
| }(); | ||
|
|
||
| // Mapping order isn't correct, cannot expand vectorization dimension. | ||
| if (projected_dims[--projected_dims_i] != logical_id) { | ||
| if (projected_dim == projected_dims.rend() || | ||
| *projected_dim != logical_id) { | ||
| break; | ||
| } | ||
|
|
||
| Val* sharded_extent; | ||
| if (logical_id->isDeviceDim()) { | ||
| sharded_extent = tv->container()->oneVal(); | ||
| } else { | ||
| sharded_extent = SimplifyingIrBuilder::divExpr( | ||
| getProjectedExtent(logical_id), num_devices); | ||
| } | ||
| product_of_inner_extents = | ||
| SimplifyingIrBuilder::mulExpr(product_of_inner_extents, sharded_extent); | ||
| // This assumes projected_dim can be matched only once. This assumption is | ||
| // OK for now but when we get to non-outermost sharding such as | ||
| // ``` | ||
| // [iS0] | ||
| // / \. | ||
| // iS1 iS2 | ||
| // / \. | ||
| // iDIDx3 iS4 | ||
| // ``` | ||
| // We may want to allow multiple contiguous allocation IDs to match | ||
| // projected_dim. | ||
| projected_dim++; | ||
|
|
||
| product_of_inner_extents = SimplifyingIrBuilder::mulExpr( | ||
| product_of_inner_extents, getProjectedExtent(alloc_id)); |
There was a problem hiding this comment.
Allocation extent mismatched
getContigMergeOfInnerSize now multiplies getProjectedExtent(alloc_id) (i.e., allocation ID) after matching logical_id against mappedLogicalIds(tv) (logical IDs). projected_extent_ values originate from recording logical/root projections and are not guaranteed to include allocation IDs; in those cases getProjectedExtent(alloc_id) will throw "Not projected" at runtime. Even when present, using allocation-ID extents here changes semantics vs the previous logical-ID-based computation and can incorrectly size the contig inner extent for TVs with an allocation permutation. Consider multiplying the projected extent for the matched logical_id (or ensure allocation IDs are always recorded consistently before using them here).
|
!test |
|
!test --diff |
| product_of_inner_extents = SimplifyingIrBuilder::mulExpr( | ||
| product_of_inner_extents, getProjectedExtent(alloc_id)); |
There was a problem hiding this comment.
Potential runtime error if alloc_id wasn't recorded in projected_extent_ during projection. While logical_id != nullptr check on line 812 handles disconnected allocation IDs, there may be edge cases where alloc_id is reachable from logical but wasn't successfully projected (e.g., if forward_project(leaf) didn't reach it due to complex transformations). Consider checking projected_extent_.count(alloc_id) before calling getProjectedExtent, or using logical_id instead.
| product_of_inner_extents = SimplifyingIrBuilder::mulExpr( | |
| product_of_inner_extents, getProjectedExtent(alloc_id)); | |
| product_of_inner_extents = SimplifyingIrBuilder::mulExpr( | |
| product_of_inner_extents, getProjectedExtent(logical_id)); |
Makes the code less error prone, and removes the reliance on isValidDeviceSplit to support non-outermost sharding in the future.
Should be an NFC.