Skip to content

Comments

use relative indices in post methods#5974

Merged
Priya2698 merged 11 commits intomainfrom
pm/relative_index
Feb 19, 2026
Merged

use relative indices in post methods#5974
Priya2698 merged 11 commits intomainfrom
pm/relative_index

Conversation

@Priya2698
Copy link
Collaborator

@Priya2698 Priya2698 commented Feb 18, 2026

For cases such as broadcast-based allgather in a host loop, the root index is the for-loop index, which may not be the absolute device ID. I am changing all lowering methods to use relative root index which is what the backends use as well.

@Priya2698
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Feb 18, 2026

Review updated until commit 25b081c

Description

  • Refactor communication operations to use relative indices instead of absolute device IDs

  • Move getRelativeIndex() from Communication class to utility function in utils.cpp

  • Update all lowering methods (scatter, gather, broadcast, reduce, sendrecv) to use relative indices

  • Modify post communication functions to accept relative indices and handle team-based lookups

  • Update test case to demonstrate relative index usage with SendRecv communication

Changes walkthrough

Relevant files
Enhancement
lower_to_communication.cpp
Update lowering methods to use relative indices                   

csrc/host_ir/lower_to_communication.cpp

  • Replace direct root usage with getRelativeIndex() calls in scatter,
    gather, broadcast, reduce operations
  • Update SendRecv to use relative index for sender in team-based
    communication
  • Change include from post_communication.h to communication.h
  • +8/-7     
    post_communication.cpp
    Update post communication functions for relative indices 

    csrc/multidevice/post_communication.cpp

  • Update all post functions to accept int64_t root_index instead of
    DeviceIdxType
  • Add team-based device lookups using team.at(root_index) for mesh
    validation
  • Remove getRelativeIndex() calls since indices are now relative by
    default
  • Simplify SendRecv logic to use relative team indices directly
  • +54/-63 
    utils.cpp
    Add getRelativeIndex utility function                                       

    csrc/multidevice/utils.cpp

  • Add getRelativeIndex() utility function to convert absolute device ID
    to relative index
  • Include error handling for device not found in team
  • +6/-0     
    post_communication.h
    Update function signatures for relative indices                   

    csrc/multidevice/post_communication.h

  • Update function signatures to use int64_t for root_index parameter
  • Change my_device_index parameter name to my_device for consistency
  • +3/-3     
    utils.h
    Add getRelativeIndex function declaration                               

    csrc/multidevice/utils.h

  • Add getRelativeIndex() function declaration with team and rank
    parameters
  • Include documentation comment explaining the function purpose
  • +3/-0     
    Refactoring
    communication.cpp
    Remove getRelativeIndex method from Communication class   

    csrc/multidevice/communication.cpp

  • Remove getRelativeIndex() method from Communication class
  • Keep validate() and clone functionality intact
  • +0/-7     
    communication.h
    Remove getRelativeIndex method declaration                             

    csrc/multidevice/communication.h

  • Remove getRelativeIndex() method declaration from Communication class
  • Keep all other method declarations intact
  • +0/-5     
    Tests
    test_multidevice_communications.cpp
    Update test to use relative indices                                           

    tests/cpp/test_multidevice_communications.cpp

  • Update SendRecv test to use getRelativeIndex() for creating
    communication with relative root
  • Create team explicitly and pass relative index instead of absolute
    device ID
  • +6/-1     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Parameter Type Consistency

    The PR changes parameter types from DeviceIdxType to int64_t for my_device_index and root_index in multiple post communication functions. While this works for the relative indices being passed, we should verify that all call sites are consistently updated and that the type change doesn't introduce any implicit conversion issues, especially when absolute device IDs need to be converted to relative indices.

    c10::intrusive_ptr<c10d::Work> postBroadcast(
        Communication* communication,
        int64_t my_device_index,
        int64_t root_index,
        c10d::Backend* backend,
        at::Tensor input_tensor,
        at::Tensor output_tensor) {
    Root Index Validation

    In functions like postBroadcast, postGather, postScatter, and postReduce, the code now uses team.at(root_index) to access the actual device mesh. This assumes root_index is a valid relative index (0 to team.size()-1). We should verify that all callers properly convert absolute device IDs to relative indices before calling these functions, and that error handling is adequate for out-of-bounds relative indices.

    const Team& team = communication->team();
    if (my_device_index == root_index) {
      if (communication->out()->getDeviceMesh().has(team.at(root_index))) {
    SendRecv Logic Simplification

    The postSendRecv function has been significantly simplified to assume team size is 1 or 2, and uses relative indices directly. The previous implementation had more robust error checking for different team sizes. We should verify that this simplification doesn't break existing use cases and that the relative index logic correctly handles all valid team configurations.

    c10::intrusive_ptr<c10d::Work> postSendRecv(
        Communication* communication,
        int64_t my_device_index,
        int64_t root_index,
        c10d::Backend* backend,
        at::Tensor input_tensor,
        at::Tensor output_tensor) {
      const Team& team = communication->team();
      NVF_ERROR_LE(team.size(), 2);
    
      if (team.size() == 1 || team[0] == team[1]) {
        doLocalCopy(output_tensor, input_tensor);
        return nullptr;
      }
      // All indices are relative and not absolute device IDs.
      int64_t sender_index = root_index;
      int64_t receiver_index = 1 - sender_index;
      std::vector<at::Tensor> tensors;
      if (my_device_index == root_index) {
        tensors = {input_tensor};
        return backend->send(
            tensors,
            static_cast<int>(receiver_index),
            /*tag=*/0);
      }
      NVF_ERROR_EQ(my_device_index, receiver_index);
      tensors = {output_tensor};
      return backend->recv(
          tensors,
          static_cast<int>(sender_index),
          /*tag=*/0);
    }

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Greptile Summary

    Refactored communication lowering to use relative device indices instead of absolute device IDs. The Communication::getRelativeIndex method was extracted into a standalone utility function and all lowering code now converts absolute device indices to team-relative indices at the point of Communication creation.

    Key Changes:

    • Moved getRelativeIndex from Communication class to utils.cpp as a standalone function
    • Updated all Communication creation sites in lower_to_communication.cpp to call getRelativeIndex when setting the root index
    • Modified all post* functions to accept int64_t (relative) instead of DeviceIdxType (absolute) for my_device_index and root_index parameters
    • Simplified postSendRecv to use 1 - sender_index arithmetic since indices are now guaranteed to be relative (0 or 1 for a 2-device team)
    • Updated test to properly construct team and use relative indices

    Confidence Score: 5/5

    • This PR is safe to merge with minimal risk
    • The refactoring is well-structured and addresses the stated goal of supporting broadcast-based allgather in host loops where root indices are loop variables rather than absolute device IDs. All previously identified compilation issues have been resolved (extra brace removed, header declaration added). The changes are consistent across the codebase, with proper separation of concerns between lowering (which converts absolute to relative) and posting (which uses relative). The test has been updated correctly.
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/multidevice/utils.cpp Added getRelativeIndex function to convert absolute device indices to team-relative indices
    csrc/multidevice/utils.h Added declaration for getRelativeIndex helper function
    csrc/host_ir/lower_to_communication.cpp Updated all Communication creation calls to use relative indices via getRelativeIndex
    csrc/multidevice/post_communication.cpp Refactored post methods to accept relative indices; simplified postSendRecv logic

    Flowchart

    %%{init: {'theme': 'neutral'}}%%
    flowchart TD
        A[lowerToScatter/Gather/Broadcast/SendRecv/Reduce] --> B[Create Team from DeviceMesh]
        B --> C[getRelativeIndex: Convert absolute DeviceIdxType to relative int64_t]
        C --> D[Create Communication with relative root_index]
        D --> E[postSingleCommunication called at runtime]
        E --> F[Convert my_device absolute to relative using getRelativeIndex]
        F --> G{Communication Type?}
        G -->|Broadcast| H[postBroadcast with relative indices]
        G -->|Gather| I[postGather with relative indices]
        G -->|SendRecv| J[postSendRecv with relative indices]
        G -->|Scatter| K[postScatter with relative indices]
        G -->|Reduce| L[postReduce with relative indices]
        J --> M[sender_index = root_index]
        M --> N[receiver_index = 1 - sender_index]
        N --> O[backend->send/recv with relative indices]
    
    Loading

    Last reviewed commit: 25b081c

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    5 files reviewed, 4 comments

    Edit Code Review Agent Settings | Greptile

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Additional Comments (2)

    csrc/multidevice/post_communication.cpp
    Calls removed member function

    postScatter still calls communication->getRelativeIndex(root_index), but the getRelativeIndex member function was removed from the Communication class in this PR. Since root_index is now already a relative index (converted in postSingleCommunication at line 465), this should just pass root_index directly, consistent with the pattern applied to postBroadcast, postGather, and postReduce.

          {.rootRank = root_index});
    

    csrc/multidevice/post_communication.cpp
    Uses relative index as device ID

    After the changes in postSingleCommunication (line 465), root_index is now a relative team index (0-based), not a global device ID. However, output_device_mesh.has(root_index) checks whether a device ID exists in the mesh, not a relative index. This should be output_device_mesh.has(team.at(root_index)) to first resolve the relative index back to a device ID, consistent with the pattern used in postBroadcast (line 116), postGather (line 144), and postReduce (line 267).

    You'll need to add const Team& team = communication->team(); at the beginning of this function as well.

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    5 files reviewed, 1 comment

    Edit Code Review Agent Settings | Greptile

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    7 files reviewed, 2 comments

    Edit Code Review Agent Settings | Greptile

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Additional Comments (1)

    tests/cpp/test_multidevice_communications.cpp
    Test passes absolute index as root_index

    After this PR, postSingleCommunication expects root_index to be a relative index (consistent with communication->root() now storing relative indices). However, this test passes the absolute sender (value 1) as root_index.

    This works only by coincidence because team = {0, 1} happens to have device 1 at relative position 1. If the test used different device IDs (e.g., team = {3, 5}), the absolute sender ID would not equal its relative index.

    To be consistent with the new convention, this should pass the relative index:

        auto work = postSingleCommunication(
            communication, rank, backend, input_tensor, output_tensor, getRelativeIndex(team, sender));
    

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    7 files reviewed, 1 comment

    Edit Code Review Agent Settings | Greptile

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Additional Comments (1)

    tests/cpp/test_multidevice_communications.cpp
    Test passes absolute index as root_index

    After this PR, postSingleCommunication expects root_index to be a relative index (since my_device_index is now converted to relative at line 462 of post_communication.cpp). Here, sender = 1 (absolute device index) is passed as root_index. This works only by coincidence because team = {0, 1}, so the absolute index of device 1 happens to equal its relative index in the team.

    To be consistent with the rest of the PR's intent, consider passing the relative index explicitly:

        auto work = postSingleCommunication(
            communication, rank, backend, input_tensor, output_tensor, getRelativeIndex(team, sender));
    

    @Priya2698 Priya2698 marked this pull request as draft February 18, 2026 20:08
    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    7 files reviewed, 2 comments

    Edit Code Review Agent Settings | Greptile

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Additional Comments (1)

    tests/cpp/test_multidevice_communications.cpp
    Test passes absolute ID as relative index

    postSingleCommunication now expects root_index to be a relative index (since my_device_index is converted to relative inside the function, and all comparisons are done in relative space). Here, sender is the absolute device ID (1), not the relative index. It works by coincidence because team = {0, 1} gives an identity mapping between absolute and relative indices.

    For correctness and to actually test the relative-index contract, pass the relative index:

        auto work = postSingleCommunication(
            communication, rank, backend, input_tensor, output_tensor, getRelativeIndex(team, sender));
    

    if (std::find(team.begin(), team.end(), my_device_index) == team.end()) {
    return nullptr;
    }
    my_device_index = getRelativeIndex(team, my_device_index);
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I can also do this in caller methods such that postSingleCommunication receives all relative indices.

    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Using the same name for both absolute and relative index is a bit confusing. Maybe my_index_in_team of type int64_t will be clearer?

    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Also, you may want to consider using _index for and only for relative indices. E.g. my_device vs my_device_index/my_index_in_team and root vs root_index. You sort of followed that convention in other methods already, e.g., sender vs sender_index.

    @Priya2698
    Copy link
    Collaborator Author

    !test

    @Priya2698
    Copy link
    Collaborator Author

    !test

    @Priya2698 Priya2698 marked this pull request as ready for review February 19, 2026 02:09
    @Priya2698 Priya2698 requested a review from wujingyue February 19, 2026 02:10
    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    7 files reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    if (std::find(team.begin(), team.end(), my_device_index) == team.end()) {
    return nullptr;
    }
    my_device_index = getRelativeIndex(team, my_device_index);
    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Using the same name for both absolute and relative index is a bit confusing. Maybe my_index_in_team of type int64_t will be clearer?

    if (std::find(team.begin(), team.end(), my_device_index) == team.end()) {
    return nullptr;
    }
    my_device_index = getRelativeIndex(team, my_device_index);
    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Also, you may want to consider using _index for and only for relative indices. E.g. my_device vs my_device_index/my_index_in_team and root vs root_index. You sort of followed that convention in other methods already, e.g., sender vs sender_index.

    @wujingyue
    Copy link
    Collaborator

    A note for future work: communication libraries like NCCL take absolute ranks so all relative indices here will have to be translated to absolute by accessing the device mesh. This translation can be done in ProcessGroup (as is today), HostIrEvaluator, or LLVM IR that HostIrJit generates. For SOL, we may have to avoid doing this translation for every communication. For that, we may have to unify the identical meshes (recall that meshes are per TensorView) so LLVM's CSE can kick in.

    cc @Priya2698

    @Priya2698
    Copy link
    Collaborator Author

    A note for future work: communication libraries like NCCL take absolute ranks so all relative indices here will have to be translated to absolute by accessing the device mesh.

    NCCL accepts ranks (https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#reduce) instead of device IDs.

    Channel management in NCCL is coordinated at the communicator level, where each GPU receives a unique rank between 0 and n-1, where n is the total number of GPUs participating in the communicator

    Similarly for NVSHMEM:

    An NVSHMEM job represents a single program, multiple data (SPMD) parallel execution. Each PE is assigned an integer identifier (ID), that ranges from zero to one less than the total number of PEs.

    @Priya2698
    Copy link
    Collaborator Author

    !test

    @Priya2698
    Copy link
    Collaborator Author

    Renaming to distinguish device ids from indices in team in other files will be done in a follow-up PR.

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    8 files reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    @Priya2698 Priya2698 merged commit 99b6446 into main Feb 19, 2026
    60 of 61 checks passed
    @Priya2698 Priya2698 deleted the pm/relative_index branch February 19, 2026 22:03
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants