Add Melt method to DataFrame #7578

Open
sevenzees wants to merge 2 commits into dotnet:main from sevenzees:main

@sevenzees
Contributor

Add DataFrame.Melt() method for transforming wide to long format

Description

This PR adds a Melt() method to the DataFrame class that reshapes data from wide format to long format, similar to pandas.melt(). This fundamental reshaping operation "unpivots" multiple value columns into a pair of variable and value columns.

Fixes #7577

What does this change do?

The Melt() method:

  • Accepts identifier columns that remain fixed in the output
  • Unpivots specified value columns (or auto-detects them if not specified) into two new columns: one containing the original column names and one containing the values
  • Supports customizable column names for the variable and value columns
  • Handles mixed data types across value columns by converting to string when necessary
  • Optionally filters out null or empty values from the result (dropNulls: true)
  • Includes comprehensive input validation with clear error messages
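
The behavior listed above can be sketched on plain collections. This is a minimal illustration, not the PR's implementation: MeltRows, regions, and quarters are hypothetical names, and the row order simply follows the example in this description.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wide table: one id value per row plus named value columns.
// Plain collections stand in for DataFrame columns in this sketch.
var regions = new[] { "North", "South" };
var quarters = new (string Name, int?[] Values)[]
{
    ("Q1", new int?[] { 1000, 800 }),
    ("Q2", new int?[] { 1200, null }),
};

// Unpivot: emit one (id, variable, value) row per input row and value column.
List<(string Id, string Variable, int? Value)> MeltRows(
    string[] ids, (string Name, int?[] Values)[] columns, bool dropNulls)
{
    var result = new List<(string Id, string Variable, int? Value)>();
    for (int row = 0; row < ids.Length; row++)
    {
        foreach (var column in columns)
        {
            int? value = column.Values[row];
            if (dropNulls && value is null)
            {
                continue; // skip missing cells when the caller asks for it
            }
            result.Add((ids[row], column.Name, value));
        }
    }
    return result;
}

var keepNulls = MeltRows(regions, quarters, dropNulls: false);
var withoutNulls = MeltRows(regions, quarters, dropNulls: true);

// prints "4 rows with nulls kept, 3 with nulls dropped"
Console.WriteLine($"{keepNulls.Count} rows with nulls kept, {withoutNulls.Count} with nulls dropped");
```

With dropNulls set, the single missing Q2 cell for "South" is the only row that disappears; the id value is otherwise repeated once per value column.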

Why this approach?

Performance optimizations:

  • Pre-calculates the total output size to allocate columns once upfront, eliminating expensive incremental resize operations
  • Uses direct iteration instead of creating intermediate index arrays
  • Caches column references to reduce repeated dictionary lookups
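
The pre-calculation idea can be sketched as a two-pass fill over arrays. This is an assumption-laden illustration rather than the code in the PR: CountOutputRows, wideColumns, and columnNames are hypothetical names, and plain arrays stand in for DataFrame columns.

```csharp
using System;

// Hypothetical wide data: two value columns over three rows, one missing cell.
var columnNames = new[] { "A", "B" };
var wideColumns = new int?[][]
{
    new int?[] { 1, null, 3 },
    new int?[] { 4, 5, 6 },
};

// Pass 1: count the output rows so the buffers can be allocated exactly once.
int CountOutputRows(int?[][] sources, bool dropNulls)
{
    int total = 0;
    foreach (var source in sources)
    {
        foreach (var item in source)
        {
            if (!dropNulls || item.HasValue) total++;
        }
    }
    return total;
}

int outputRows = CountOutputRows(wideColumns, dropNulls: true);
var variables = new string[outputRows]; // allocated once, never resized
var values = new int[outputRows];

// Pass 2: fill by direct iteration, without building intermediate index arrays.
int next = 0;
int rowCount = wideColumns[0].Length;
for (int row = 0; row < rowCount; row++)
{
    for (int col = 0; col < wideColumns.Length; col++)
    {
        int? current = wideColumns[col][row];
        if (!current.HasValue) continue; // dropNulls behavior in this sketch
        variables[next] = columnNames[col];
        values[next] = current.Value;
        next++;
    }
}

// prints "allocated 5 rows, filled 5"
Console.WriteLine($"allocated {outputRows} rows, filled {next}");
```

The counting pass costs one extra scan of the value columns, which is the trade made to avoid growing the output buffers while filling them.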

Design decisions:

  • Separated validation and data processing into focused helper methods for maintainability
  • Follows the Pandas API design for familiarity to users coming from Python
  • Maintains type safety by preserving column types when all value columns share the same type
  • Defaults to using all non-ID columns as value columns for convenience (matches Pandas behavior)
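
The type-preservation rule can be sketched as a small helper. ResolveValueType is a hypothetical name, and the rule shown (shared type if all value columns agree, otherwise string) is read from this description rather than from the PR's code.

```csharp
using System;
using System.Linq;

// Assumed rule: keep the shared type when every value column agrees,
// otherwise fall back to string.
Type ResolveValueType(Type[] columnTypes)
{
    if (columnTypes.Length == 0)
    {
        throw new ArgumentException("At least one value column is required.", nameof(columnTypes));
    }

    Type first = columnTypes[0];
    return columnTypes.All(t => t == first) ? first : typeof(string);
}

Type sameType = ResolveValueType(new[] { typeof(int), typeof(int) });
Type mixedType = ResolveValueType(new[] { typeof(int), typeof(double) });

// prints "Int32, String"
Console.WriteLine($"{sameType.Name}, {mixedType.Name}");
```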

API signature:

public DataFrame Melt(
    IEnumerable<string> idColumns, 
    IEnumerable<string> valueColumns = null, 
    string variableName = "variable", 
    string valueName = "value", 
    bool dropNulls = false)

Changes included

  • DataFrame.cs: Added Melt() method and supporting helper methods
    • ValidateMeltParameters(): Input validation
    • CalculateTotalOutputRows(): Pre-calculates output size for efficient allocation
    • InitializeIdColumns(): Sets up ID columns with correct size
    • CreateValueColumn(): Creates appropriately typed value column
    • FillMeltedData(): Performs the actual unpivoting operation
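
As an illustration of the kind of checks a validation helper might perform, here is a sketch under stated assumptions. The specific rules below (columns must exist, ID and value columns must not overlap, output names must be non-empty and distinct) are guesses for illustration; the description does not list what ValidateMeltParameters() actually checks, and ValidateMeltArguments is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical checks; the PR's ValidateMeltParameters() may differ.
void ValidateMeltArguments(
    IReadOnlyCollection<string> existingColumns,
    IEnumerable<string> idColumns,
    IEnumerable<string> valueColumns,
    string variableName,
    string valueName)
{
    if (idColumns is null) throw new ArgumentNullException(nameof(idColumns));

    var ids = idColumns.ToList();
    // Default to all non-ID columns when no value columns are given.
    var vals = (valueColumns ?? existingColumns.Except(ids)).ToList();

    foreach (var name in ids.Concat(vals))
    {
        if (!existingColumns.Contains(name))
            throw new ArgumentException($"Column '{name}' does not exist.");
    }

    var overlap = ids.Intersect(vals).FirstOrDefault();
    if (overlap != null)
        throw new ArgumentException($"Column '{overlap}' is both an ID and a value column.");

    if (string.IsNullOrEmpty(variableName) || string.IsNullOrEmpty(valueName))
        throw new ArgumentException("Output column names must not be empty.");

    if (variableName == valueName || ids.Contains(variableName) || ids.Contains(valueName))
        throw new ArgumentException("Output column names must be distinct from each other and from ID columns.");
}

var available = new[] { "Region", "Q1", "Q2" };

// Valid call: value columns default to Q1 and Q2.
ValidateMeltArguments(available, new[] { "Region" }, null, "variable", "value");

bool rejected = false;
try
{
    ValidateMeltArguments(available, new[] { "Region" }, new[] { "Q9" }, "variable", "value");
}
catch (ArgumentException)
{
    rejected = true;
}

// prints "unknown column rejected: True"
Console.WriteLine($"unknown column rejected: {rejected}");
```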

Example usage

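using Microsoft.Data.Analysis;

// Transform quarterly sales data from wide to long format.
// The array is typed explicitly because new[] cannot infer a common type
// for a mix of StringDataFrameColumn and Int32DataFrameColumn.
var df = new DataFrame(new DataFrameColumn[]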
{
    new StringDataFrameColumn("Region", new[] { "North", "South" }),
    new Int32DataFrameColumn("Q1", new[] { 1000, 800 }),
    new Int32DataFrameColumn("Q2", new[] { 1200, 900 }),
    new Int32DataFrameColumn("Q3", new[] { 1100, 950 })
});

var melted = df.Melt(
    idColumns: new[] { "Region" },
    valueColumns: new[] { "Q1", "Q2", "Q3" },
    variableName: "Quarter",
    valueName: "Sales"
);

// Result:
// | Region | Quarter | Sales |
// |--------|---------|-------|
// | North  | Q1      | 1000  |
// | North  | Q2      | 1200  |
// | North  | Q3      | 1100  |
// | South  | Q1      | 800   |
// | South  | Q2      | 900   |
// | South  | Q3      | 950   |

Additional notes

This implementation brings the .NET DataFrame API closer to feature parity with pandas and supports common data transformation workflows for analysis and visualization. The method allocates its output columns once up front and keeps validation and data processing in separate helper methods.

@codecov

codecov bot commented Feb 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.08%. Comparing base (3604580) to head (8508ba3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7578      +/-   ##
==========================================
+ Coverage   69.05%   69.08%   +0.02%     
==========================================
  Files        1483     1483              
  Lines      274362   274586     +224     
  Branches    28270    28286      +16     
==========================================
+ Hits       189466   189698     +232     
+ Misses      77510    77506       -4     
+ Partials     7386     7382       -4     
Flag         Coverage Δ
Debug        69.08% <100.00%> (+0.02%) ⬆️
production   63.33% <100.00%> (+0.02%) ⬆️
test         89.54% <100.00%> (+0.02%) ⬆️

Files with missing lines                                 Coverage Δ
src/Microsoft.Data.Analysis/DataFrame.cs                 93.46% <100.00%> (+1.61%) ⬆️
...st/Microsoft.Data.Analysis.Tests/DataFrameTests.cs    99.91% <100.00%> (+0.01%) ⬆️

... and 4 files with indirect coverage changes

sevenzees marked this pull request as ready for review on February 9, 2026, 15:30
@sevenzees
Contributor Author

Not sure who all would want to look at this, but I have another PR here. @tarekgh @ericstj @jeffhandley
