Is your feature request related to a problem? Please describe.
Working with DataFrames in wide format is often inconvenient for analysis and visualization. I'm frequently frustrated when I need to transform data from wide format (multiple columns representing different variables) to long format (rows representing observations). Currently, there's no built-in way to "unpivot" or "melt" a DataFrame in the .NET DataFrame API, which means I have to write complex manual loops to restructure my data. This is a common operation in data analysis workflows, especially when preparing data for charting libraries or performing grouped aggregations.
Describe the solution you'd like
I'd like a Melt() method similar to Pandas' pandas.melt() function (https://pandas.pydata.org/docs/reference/api/pandas.melt.html) that transforms a DataFrame from wide format to long format. The method should:
- Accept identifier columns (idColumns) that remain as columns in the output
- Accept value columns (valueColumns) to unpivot, or automatically use all non-ID columns if not specified
- Allow customization of the variable column name (defaults to "variable") that will contain the original column names
- Allow customization of the value column name (defaults to "value") that will contain the unpivoted values
- Support a dropNulls parameter to exclude null or empty values from the result
- Handle mixed data types across value columns by converting to string when necessary
- Validate inputs to prevent invalid configurations (overlapping ID/value columns, empty column lists, etc.)
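To make the requested semantics concrete, here is a minimal sketch of the behavior described in the list above. It is not a proposed implementation: it models rows as plain dictionaries instead of using the real DataFrame type, and the class name MeltSketch, the exact signature, and the rule that at least one ID column is required are all assumptions made for illustration. It also keeps values as object and leaves out the string conversion for mixed types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only: rows are dictionaries,
// not Microsoft.Data.Analysis.DataFrame rows.
public static class MeltSketch
{
    public static List<Dictionary<string, object?>> Melt(
        IReadOnlyList<Dictionary<string, object?>> rows,
        IReadOnlyList<string> idColumns,
        IReadOnlyList<string>? valueColumns = null,
        string variableName = "variable",
        string valueName = "value",
        bool dropNulls = false)
    {
        // Assumption: an empty ID list is treated as invalid input.
        if (idColumns == null || idColumns.Count == 0)
            throw new ArgumentException("At least one ID column is required.", nameof(idColumns));

        // Default: unpivot every column that is not an ID column.
        // Assumes all rows share the keys of the first row.
        valueColumns ??= rows.Count == 0
            ? Array.Empty<string>()
            : rows[0].Keys.Where(k => !idColumns.Contains(k)).ToList();

        if (valueColumns.Count == 0)
            throw new ArgumentException("No value columns to unpivot.", nameof(valueColumns));
        if (valueColumns.Intersect(idColumns).Any())
            throw new ArgumentException("ID and value columns must not overlap.");
        if (variableName == valueName
            || idColumns.Contains(variableName)
            || idColumns.Contains(valueName))
            throw new ArgumentException("Output column names must be unique.");

        var result = new List<Dictionary<string, object?>>();
        foreach (var row in rows)
        {
            foreach (var column in valueColumns)
            {
                object? value = row[column];

                // dropNulls skips null values and empty strings.
                if (dropNulls && (value is null || (value is string s && s.Length == 0)))
                    continue;

                // Copy the ID values, then add the (variable, value) pair.
                var outRow = new Dictionary<string, object?>();
                foreach (var id in idColumns)
                    outRow[id] = row[id];
                outRow[variableName] = column;
                outRow[valueName] = value;
                result.Add(outRow);
            }
        }
        return result;
    }
}
```

The sketch emits all value columns for one input row before moving to the next row. pandas.melt orders its output the other way (all rows for the first value column, then the next column), so the ordering is a design choice the real method would need to settle.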
Describe alternatives you've considered
I have written application-level code to do this, but melting is such a common use case that I think it belongs in DataFrame itself, where everyone can use it.
Additional context
This feature would bring the .NET DataFrame API closer to feature parity with popular data analysis libraries like Pandas (Python) and tidyr (R). The melt operation is fundamental for "tidy data" principles and is commonly used for:
- Preparing data for time series visualization
- Reshaping survey or experimental data where each column represents a measurement
- Converting measurement matrices into observation tables
- Preparing data for statistical modeling that expects long-format inputs
Example use case:
// Original wide format
// | ID | Name | Q1_Sales | Q2_Sales | Q3_Sales | Q4_Sales |
// |----|-------|----------|----------|----------|----------|
// | 1 | North | 1000 | 1200 | 1100 | 1300 |
// | 2 | South | 800 | 900 | 950 | 1000 |
var melted = df.Melt(
idColumns: new[] { "ID", "Name" },
valueColumns: new[] { "Q1_Sales", "Q2_Sales", "Q3_Sales", "Q4_Sales" },
variableName: "Quarter",
valueName: "Sales"
);
// Result: long format suitable for charting
// | ID | Name | Quarter | Sales |
// |----|-------|-----------|-------|
// | 1 | North | Q1_Sales | 1000 |
// | 1 | North | Q2_Sales | 1200 |
// | 1 | North | Q3_Sales | 1100 |
// | 1 | North | Q4_Sales | 1300 |
// | 2 | South | Q1_Sales | 800 |
// | 2 | South | Q2_Sales | 900 |
// | 2 | South | Q3_Sales | 950 |
// | 2 | South | Q4_Sales | 1000 |
This would significantly improve the DataFrame API's usability for data transformation workflows.