Normalize Data Transform
The Normalize Data transform scales numerical data in your dataset using a choice of methods. Scaling is important for many machine learning algorithms and statistical analyses that assume features are on a similar scale.
Basic Usage
To normalize numerical data in your dataset:
- Select the Normalize Data transform from the transform menu.
- Choose the numerical column(s) you want to normalize in the "Target Columns" dropdown.
- Select the normalization method you want to apply.
- (Optional) Configure advanced options for the selected method.
- Apply the transformation.
Only numerical columns will be available for selection in the "Target Columns" dropdown.
Configuration Options
Basic Options
- Target Columns: Select one or more numerical columns to normalize.
- Method: Choose the normalization method to apply. Available options are:
  - Min-Max Scaling
  - Z-Score Standardization
  - Robust Scaling
  - Max Absolute Scaling
  - Normalizer
  - Quantile Transformation
  - Power Transformation
Hover over each normalization method to see a brief explanation of its use case and characteristics.
Advanced Options
Each normalization method has its own set of advanced options:
Min-Max Scaling
- Range From: Lower bound of the scaling range (default: 0)
- Range To: Upper bound of the scaling range (default: 1)
Z-Score Standardization
- With Mean: Center the data before scaling (default: True)
- With Standard Deviation: Scale the data to unit variance (default: True)
Robust Scaling
- Quantile Range: IQR range for scaling (default: 25-75)
- With Centering: Center the data before scaling (default: True)
- With Scaling: Scale the data to IQR (default: True)
Normalizer
- Norm: The norm to use for normalization (options: 'l1', 'l2', 'max')
Quantile Transformation
- Number of Quantiles: Number of quantiles to compute (default: 1000)
- Output Distribution: Type of output distribution (options: 'uniform', 'normal')
Power Transformation
- Method: Power transformation method (options: 'yeo-johnson', 'box-cox')
Normalization Methods Explained
Min-Max Scaling
Scales data to a fixed range, typically between 0 and 1.
Best Use Case: When you need bounded values within a specific range, useful for algorithms that require non-negative inputs.
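For reference, the behaviour described here matches scikit-learn's MinMaxScaler; a minimal sketch with made-up values (the transform's own implementation may differ):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One numerical column, in the (n_samples, n_features) shape scikit-learn expects
x = np.array([[3.0], [7.0], [11.0], [15.0]])

# "Range From" / "Range To" are assumed to map to feature_range
scaler = MinMaxScaler(feature_range=(0, 1))
print(scaler.fit_transform(x).ravel())  # approximately [0.0, 0.33, 0.67, 1.0]
```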
Z-Score Standardization
Transforms data to have a mean of 0 and a standard deviation of 1.
Best Use Case: When your data follows a Gaussian distribution and you need to compare features with different scales.
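A minimal sketch of the same idea using scikit-learn's StandardScaler (the data values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[2.0], [4.0], [6.0], [8.0]])

# "With Mean" / "With Standard Deviation" are assumed to map to with_mean / with_std
z = StandardScaler(with_mean=True, with_std=True).fit_transform(x)
print(z.mean(), z.std())  # 0.0 and 1.0
```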
Robust Scaling
Scales data using statistics that are robust to outliers.
Best Use Case: When your dataset contains significant outliers that would distort other scaling methods.
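A sketch using scikit-learn's RobustScaler, assuming the transform's options map to its parameters:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# 500.0 is an outlier; scaling uses the median and IQR, so it does not distort the rest
x = np.array([[1.0], [2.0], [3.0], [4.0], [500.0]])

scaler = RobustScaler(with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0))
print(scaler.fit_transform(x).ravel())  # the non-outlier values stay close to 0
```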
Max Absolute Scaling
Scales each feature by its maximum absolute value.
Best Use Case: When you want to scale data without moving the zero point, particularly useful for sparse data.
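A sketch with scikit-learn's MaxAbsScaler and made-up values:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Zeros stay zero, which is why this method suits sparse data
x = np.array([[-4.0], [0.0], [2.0], [8.0]])
print(MaxAbsScaler().fit_transform(x).ravel())  # [-0.5, 0.0, 0.25, 1.0]
```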
Normalizer
Scales individual samples to a unit norm.
Best Use Case: When you're interested in the proportions of the features rather than their absolute values, often used in text classification or clustering.
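Unlike the column-wise methods above, this one works per sample (row). A sketch with scikit-learn's Normalizer:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Each row is scaled to unit L2 norm; only the proportions between features remain
X = np.array([[3.0, 4.0],
              [1.0, 1.0]])
print(Normalizer(norm='l2').fit_transform(X))  # rows [0.6, 0.8] and [0.707, 0.707]
```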
Quantile Transformation
Transforms features to follow a uniform or normal distribution.
Best Use Case: When you want to spread out the most frequent values or reduce the impact of outliers.
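A sketch with scikit-learn's QuantileTransformer on skewed, randomly generated data:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
x = rng.exponential(size=(200, 1))  # heavily right-skewed input

# n_quantiles should not exceed the number of samples
qt = QuantileTransformer(n_quantiles=200, output_distribution='normal')
x_t = qt.fit_transform(x)
print(x_t.mean(), x_t.std())  # approximately 0 and 1
```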
Power Transformation
Applies a power transformation to make data more Gaussian-like.
Best Use Case: When dealing with skewed data and you want to stabilize variance and improve the normality of features.
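A sketch with scikit-learn's PowerTransformer, which also standardizes its output by default:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
x = rng.lognormal(size=(200, 1))  # right-skewed, strictly positive data

# 'box-cox' requires strictly positive inputs; 'yeo-johnson' also accepts zero and negatives
pt = PowerTransformer(method='box-cox')
print(pt.fit_transform(x).std())  # close to 1 after the transform
```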
Examples
Here's an example of how to use the Normalize Data transform:
Example: Normalizing Student Grades
Input Dataset:
| Student_ID | Student_Name | Subject | Grade | Ranking |
|---|---|---|---|---|
| 101 | Alice | Math | 0 | 6 |
| 102 | Bob | Science | 20 | 5 |
| 103 | Charlie | English | 40 | 4 |
| 104 | David | Math | 60 | 3 |
| 105 | Eva | Science | 80 | 2 |
| 106 | Frank | Math | 100 | 1 |
Configuration:
- Target Columns: Grade
- Method: Min-Max Scaling
  - Range From: 0
  - Range To: 1
Result:
| Student_ID | Student_Name | Subject | Grade | Ranking |
|---|---|---|---|---|
| 101 | Alice | Math | 0.0 | 6 |
| 102 | Bob | Science | 0.2 | 5 |
| 103 | Charlie | English | 0.4 | 4 |
| 104 | David | Math | 0.6 | 3 |
| 105 | Eva | Science | 0.8 | 2 |
| 106 | Frank | Math | 1.0 | 1 |
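If you want to reproduce this result in code, a minimal pandas/scikit-learn sketch (assuming the transform's Min-Max Scaling behaves like MinMaxScaler) looks like this:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "Student_ID": [101, 102, 103, 104, 105, 106],
    "Grade": [0, 20, 40, 60, 80, 100],
    "Ranking": [6, 5, 4, 3, 2, 1],
})

# Only the selected target column is scaled; the other columns are left untouched
df[["Grade"]] = MinMaxScaler(feature_range=(0, 1)).fit_transform(df[["Grade"]])
print(df)  # Grade becomes 0.0, 0.2, 0.4, 0.6, 0.8, 1.0
```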
Best Practices
- Choose the Right Method: Consider the distribution of your data and the requirements of your analysis or model when selecting a normalization method.
- Handle Outliers: If your data contains significant outliers, consider Robust Scaling or another method that is less sensitive to extreme values.
- Preserve Zero: For some applications it is important to preserve the zero point. In such cases, consider methods like Max Absolute Scaling.
- Consistent Scaling: Fit the scaling on your training data and apply the same fitted parameters to your test data to keep the two consistent (see the sketch after this list).
- Check Assumptions: Some methods (like Z-Score Standardization) assume an approximately normal distribution. Verify that your data meets the assumptions of the chosen method.
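A minimal sketch of consistent train/test scaling, using scikit-learn's StandardScaler as a stand-in:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [10.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training data only
X_test_scaled = scaler.transform(X_test)        # reuse those same parameters on the test data
```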
Troubleshooting
- If you don't see a column in the "Target Columns" dropdown, check if it's correctly identified as a numerical column in your dataset.
- For methods sensitive to outliers (like Min-Max Scaling), check your data for extreme values that might skew the results.
- If using Power Transformation with the Box-Cox method, ensure all your data is strictly positive; Box-Cox does not work with zero or negative values (see the sketch below).
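A small sketch of that check, using scikit-learn's PowerTransformer as a stand-in and made-up values:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

x = np.array([[0.0], [1.5], [3.0]])  # contains a zero, so Box-Cox would raise an error

# Fall back to Yeo-Johnson when the column is not strictly positive
method = 'box-cox' if (x > 0).all() else 'yeo-johnson'
x_t = PowerTransformer(method=method).fit_transform(x)
```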