Pivot Transform
The Pivot transform allows you to reshape your data from a long format to a wide format by creating new columns based on unique values in a specified column. This is useful for summarizing and analyzing data in a more compact and readable format.
Basic Usage
To pivot your data:
- Select the Pivot transform from the transform menu.
- Choose the columns by which to group your rows.
- Select the column to create new headers from.
- Specify the column containing values for the new columns.
- (Optional) Configure any advanced options.
Configuration Options
Basic Options
- Group Rows By: Select one or more columns to use as the index for your pivoted data.
- Create New Columns From: Choose the column whose unique values will become new column headers.
- Values for New Columns: Select the column containing the data that will populate the new columns.
Advanced Options
-
Aggregation Method: Choose how to aggregate values when they fall into the same pivoted cell. Available options include:
- Sum
- Mean
- Count
- Min
- Max
- (And more, depending on the data type)
-
Fill Value: Specify a value to fill empty cells after pivoting.
Examples
Here are some examples demonstrating how to use the Pivot transform:
Example 1: Basic Pivot
Input Dataset:
| Date | Store | Product | Sales |
|---|---|---|---|
| 2024-09-01 | Store A | Product 1 | 100 |
| 2024-09-01 | Store A | Product 2 | 150 |
| 2024-09-02 | Store B | Product 1 | 90 |
| 2024-09-02 | Store B | Product 2 | 130 |
Configuration:
- Group Rows By:
Store - Create New Columns From:
Product - Values for New Columns:
Sales - Aggregation Method:
Sum - Fill Value:
0
Result:
| Store | Product 1 | Product 2 |
|---|---|---|
| Store A | 100 | 150 |
| Store B | 90 | 130 |
Example 2: Multi-Index Pivot
Input Dataset:
| Date | Store | Product | Sales |
|---|---|---|---|
| 2024-09-01 | Store A | Product 1 | 100 |
| 2024-09-01 | Store A | Product 2 | 150 |
| 2024-09-02 | Store B | Product 1 | 90 |
| 2024-09-02 | Store B | Product 2 | 130 |
| 2024-09-03 | Store A | Product 1 | 120 |
Configuration:
- Group Rows By:
Date,Store - Create New Columns From:
Product - Values for New Columns:
Sales - Aggregation Method:
Sum - Fill Value:
0
Result:
| Date | Store | Product 1 | Product 2 |
|---|---|---|---|
| 2024-09-01 | Store A | 100 | 150 |
| 2024-09-02 | Store B | 90 | 130 |
| 2024-09-03 | Store A | 120 | 0 |
When pivoting data with multiple entries for the same combination of index and column values, make sure to choose an appropriate aggregation method to combine the values.
Pivoting can significantly increase the number of columns in your dataset, especially if the "Create New Columns From" field has many unique values. This may impact performance and readability for very large datasets.
Best Practices
-
Choose Meaningful Indexes: Select columns for "Group Rows By" that provide a useful structure for the pivoted data.
-
Consider Data Types: The aggregation methods available may depend on the data type of the "Values for New Columns" field. Choose a method that fits your data.
-
Handle Missing Data: Use the "Fill Value" option to handle empty cells after pivoting (e.g., 0 for numerical data, "Unknown" for categorical data).
-
Limit New Columns: To keep your pivoted data manageable, limit the number of unique values in the "Create New Columns From" field.
-
Preview Results: Always preview the results of your pivot to ensure the output matches your expectations, especially when working with large datasets.