Normalize Data Transform

The Normalize Data transform allows you to scale numerical data in your dataset using various methods. This is crucial for many machine learning algorithms and statistical analyses that assume data is on a similar scale.

Basic Usage

To normalize numerical data in your dataset:

  1. Select the Normalize Data transform from the transform menu.
  2. Choose the numerical column(s) you want to normalize in the "Target Columns" dropdown.
  3. Select the normalization method you want to apply.
  4. (Optional) Configure advanced options for the selected method.
  5. Apply the transformation. (A code sketch of the equivalent operation appears after the note below.)
note

Only numerical columns will be available for selection in the "Target Columns" dropdown.
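
If you want to prototype the same operation in code before applying it in the UI, the sketch below shows the general idea. It assumes a pandas DataFrame and scikit-learn's preprocessing scalers, whose parameter names closely mirror the options documented on this page; the column names and data are hypothetical, and the transform itself may be implemented differently under the hood.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data; "height_cm" and "weight_kg" stand in for the
# numerical columns picked in the "Target Columns" dropdown.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height_cm": [150.0, 165.0, 180.0],
    "weight_kg": [50.0, 65.0, 80.0],
})

target_columns = ["height_cm", "weight_kg"]   # step 2: choose the numerical columns
scaler = StandardScaler()                     # step 3: pick a normalization method
df[target_columns] = scaler.fit_transform(df[target_columns])  # step 5: apply
print(df)
```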

Configuration Options

Basic Options

  • Target Columns: Select one or more numerical columns to normalize.
  • Method: Choose the normalization method to apply. Available options are:
    • Min-Max Scaling
    • Z-Score Standardization
    • Robust Scaling
    • Max Absolute Scaling
    • Normalizer
    • Quantile Transformation
    • Power Transformation
tip

Hover over each normalization method to see a brief explanation of its use case and characteristics.

Advanced Options

Each normalization method has its own set of advanced options:

Min-Max Scaling
  • Range From: Lower bound of the scaling range (default: 0)
  • Range To: Upper bound of the scaling range (default: 1)
Z-Score Standardization
  • With Mean: Center the data before scaling (default: True)
  • With Standard Deviation: Scale the data to unit variance (default: True)
Robust Scaling
  • Quantile Range: IQR range for scaling (default: 25-75)
  • With Centering: Center the data before scaling (default: True)
  • With Scaling: Scale the data to IQR (default: True)
Normalizer
  • Norm: The norm to use for normalization (options: 'l1', 'l2', 'max')
Quantile Transformation
  • Number of Quantiles: Number of quantiles to compute (default: 1000)
  • Output Distribution: Type of output distribution (options: 'uniform', 'normal')
Power Transformation
  • Method: Power transformation method (options: 'yeo-johnson', 'box-cox')

Normalization Methods Explained

Min-Max Scaling

Scales data to a fixed range, typically between 0 and 1.

Best Use Case: When you need bounded values within a specific range, useful for algorithms that require non-negative inputs.
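
As a sketch of the arithmetic, each value x is mapped to (x - min) / (max - min) and then rescaled to the configured range. The example below uses scikit-learn's MinMaxScaler, whose feature_range argument corresponds to the Range From / Range To options, assuming the transform follows the same convention.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])

# feature_range=(0, 1) mirrors Range From = 0, Range To = 1.
scaler = MinMaxScaler(feature_range=(0, 1))
print(scaler.fit_transform(X).ravel())  # [0.         0.33333333 0.66666667 1.        ]
```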

Z-Score Standardization

Transforms data to have a mean of 0 and a standard deviation of 1.

Best Use Case: When your data follows a Gaussian distribution and you need to compare features with different scales.
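
A minimal sketch, assuming scikit-learn's StandardScaler; its with_mean and with_std arguments correspond to the With Mean and With Standard Deviation options.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[2.0], [4.0], [6.0], [8.0]])

scaler = StandardScaler(with_mean=True, with_std=True)
Z = scaler.fit_transform(X)
print(Z.ravel())          # roughly [-1.34, -0.45, 0.45, 1.34]
print(Z.mean(), Z.std())  # mean ~0.0, standard deviation ~1.0
```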

Robust Scaling

Scales data using statistics that are robust to outliers.

Best Use Case: When your dataset contains significant outliers that would distort other scaling methods.
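
A minimal sketch, assuming scikit-learn's RobustScaler; its quantile_range, with_centering, and with_scaling arguments mirror the advanced options listed earlier.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One extreme outlier (1000) that would dominate Min-Max or Z-Score scaling.
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# quantile_range=(25.0, 75.0) mirrors the default Quantile Range of 25-75.
scaler = RobustScaler(with_centering=True, with_scaling=True,
                      quantile_range=(25.0, 75.0))
print(scaler.fit_transform(X).ravel())
# [-1.  -0.5  0.   0.5 498.5] -- the bulk of the data lands near 0,
# while the outlier stays far out instead of compressing everything else.
```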

Max Absolute Scaling

Scales each feature by its maximum absolute value.

Best Use Case: When you want to scale data without moving the zero point, particularly useful for sparse data.
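
A minimal sketch, assuming scikit-learn's MaxAbsScaler; zeros remain zero because the method only divides by the maximum absolute value.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X = np.array([[-4.0], [0.0], [2.0], [8.0]])

# Each value is divided by max(|x|) = 8, so the zero point is preserved.
print(MaxAbsScaler().fit_transform(X).ravel())  # [-0.5   0.    0.25  1.  ]
```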

Normalizer

Scales individual samples to a unit norm.

Best Use Case: When you're interested in the proportions of the features rather than their absolute values, often used in text classification or clustering.
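
A minimal sketch, assuming scikit-learn's Normalizer; unlike the other methods, it rescales each row (sample) rather than each column, and its norm argument accepts the 'l1', 'l2', and 'max' options listed above.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Each row is scaled independently to unit L2 norm.
X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

print(Normalizer(norm="l2").fit_transform(X))
# [[0.6        0.8       ]
#  [0.70710678 0.70710678]]
```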

Quantile Transformation

Transforms features to follow a uniform or normal distribution.

Best Use Case: When you want to spread out the most frequent values or reduce the impact of outliers.
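
A minimal sketch, assuming scikit-learn's QuantileTransformer; its n_quantiles and output_distribution arguments correspond to the Number of Quantiles and Output Distribution options. The data here is randomly generated for illustration.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
X = rng.exponential(size=(500, 1))  # heavily right-skewed input

# n_quantiles cannot exceed the number of samples, hence 500 here.
qt = QuantileTransformer(n_quantiles=500, output_distribution="normal")
X_out = qt.fit_transform(X)
print(X_out.mean(), X_out.std())  # roughly 0 and 1 after the transform
```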

Power Transformation

Applies a power transformation to make data more Gaussian-like.

Best Use Case: When dealing with skewed data and you want to stabilize variance and improve the normality of features.
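
A minimal sketch, assuming scikit-learn's PowerTransformer; its method argument accepts 'yeo-johnson' or 'box-cox', matching the advanced option above. The data here is randomly generated and strictly positive so Box-Cox applies.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 1))  # right-skewed, strictly positive data

# 'box-cox' requires strictly positive values; 'yeo-johnson' also accepts
# zero and negative values.
pt = PowerTransformer(method="box-cox")
X_out = pt.fit_transform(X)
print(X_out.mean(), X_out.std())  # close to 0 and 1 (output is standardized by default)
```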

Examples

Here's an example of how to use the Normalize Data transform:

Example: Normalizing Student Grades

Input Dataset:

| Student_ID | Student_Name | Subject | Grade | Ranking |
|------------|--------------|---------|-------|---------|
| 101        | Alice        | Math    | 0     | 6       |
| 102        | Bob          | Science | 20    | 5       |
| 103        | Charlie      | English | 40    | 4       |
| 104        | David        | Math    | 60    | 3       |
| 105        | Eva          | Science | 80    | 2       |
| 106        | Frank        | Math    | 100   | 1       |

Configuration:

  • Target Columns: Grade
  • Method: Min-Max Scaling
  • Range From: 0
  • Range To: 1

Result:

| Student_ID | Student_Name | Subject | Grade | Ranking |
|------------|--------------|---------|-------|---------|
| 101        | Alice        | Math    | 0.0   | 6       |
| 102        | Bob          | Science | 0.2   | 5       |
| 103        | Charlie      | English | 0.4   | 4       |
| 104        | David        | Math    | 0.6   | 3       |
| 105        | Eva          | Science | 0.8   | 2       |
| 106        | Frank        | Math    | 1.0   | 1       |
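
The scaled Grade values can be reproduced in a few lines of code. This sketch assumes Min-Max Scaling behaves like scikit-learn's MinMaxScaler; since Grade runs from 0 to 100, each value is effectively divided by 100.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "Student_ID": [101, 102, 103, 104, 105, 106],
    "Grade": [0, 20, 40, 60, 80, 100],
})

scaler = MinMaxScaler(feature_range=(0, 1))  # Range From = 0, Range To = 1
df["Grade"] = scaler.fit_transform(df[["Grade"]])
print(df["Grade"].tolist())  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```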

Best Practices

  1. Choose the Right Method: Consider the distribution of your data and the requirements of your analysis or model when selecting a normalization method.

  2. Handle Outliers: If your data contains significant outliers, consider using robust scaling or methods that are less sensitive to extreme values.

  3. Preserve Zero: For some applications, it's important to preserve the zero point. In such cases, consider methods like Max Absolute Scaling.

  4. Consistent Scaling: Apply the same scaling to both your training and test datasets: fit the scaler on the training data and reuse that fit to transform the test data (see the code sketch after this list).

  5. Check Assumptions: Some methods (like Z-Score Standardization) assume a normal distribution. Verify if your data meets the assumptions of the chosen method.
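
For point 4, the important detail is that the scaling statistics (min/max, mean/standard deviation, quantiles) must be computed from the training data only and then reused on the test data. A minimal sketch, assuming scikit-learn's fit/transform split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [10.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse them; never refit on test data
```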

Troubleshooting

  • If you don't see a column in the "Target Columns" dropdown, check if it's correctly identified as a numerical column in your dataset.
  • For methods sensitive to outliers (like Min-Max Scaling), check your data for extreme values that might skew the results.
  • If using Power Transformation with the Box-Cox method, ensure all your data is positive, as Box-Cox doesn't work with zero or negative values (a quick check is sketched below).
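
A quick way to check whether a column is eligible for Box-Cox is to look for zero or negative values before applying the transform. This sketch uses a hypothetical pandas DataFrame and column name:

```python
import pandas as pd

df = pd.DataFrame({"Grade": [0, 20, 40, 60, 80, 100]})  # hypothetical data
column = "Grade"  # replace with the column you plan to transform

if (df[column] <= 0).any():
    print(f"'{column}' has zero or negative values: choose 'yeo-johnson' instead of 'box-cox'.")
else:
    print(f"'{column}' is strictly positive: 'box-cox' is applicable.")
```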