Union Transform

The Union transform enables you to combine multiple datasets vertically by appending rows from different sources into a single dataset. This transform provides several options for aligning columns and handling discrepancies between input datasets.

Basic Usage

To union multiple datasets:

caution

The Union transform requires at least two Data Input nodes.
Make sure you have added at least two datasets to your pipeline before using this transform.

  1. Select the Union transform from the transform menu.
  2. Choose the datasets you want to union.
  3. Configure the alignment mode and additional options.
  4. (Optional) Adjust advanced settings for more control.

Configuration Options

Basic Options

  • Alignment Mode: Choose how columns from different datasets will be aligned:

    • Name: Align columns based on their names.
    • Position: Align columns based on their order in each dataset.
    • Manual: Manually map columns between datasets.
  • Output Order: Drag and drop to reorder the datasets in the final output.

  • Common Columns Only: Enable this option to include only columns present in all input datasets.
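The effect of Name alignment and the Common Columns Only option can be sketched in plain Python. This is an illustration only, not the transform's actual implementation: the function name `union_by_name`, the `common_only` parameter, and the first-seen column order are assumptions, and `None` stands in for null.

```python
def union_by_name(datasets, common_only=False):
    """Append rows from every dataset, aligning values by column name.

    datasets: list of datasets, each a list of dict rows (column name -> value).
    """
    per_dataset = []  # column names seen in each dataset
    columns = []      # all column names, in first-seen order (an assumption)
    for rows in datasets:
        names = []
        for row in rows:
            for name in row:
                if name not in names:
                    names.append(name)
        per_dataset.append(names)
        for name in names:
            if name not in columns:
                columns.append(name)

    if common_only:
        # Keep only the columns that every input dataset contains.
        columns = [c for c in columns if all(c in names for names in per_dataset)]

    # Values a dataset does not provide become None (null).
    return [{c: row.get(c) for c in columns} for rows in datasets for row in rows]


dataset_1 = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
dataset_2 = [{"B": "z", "C": 3}, {"B": "w", "C": 4}]

for row in union_by_name([dataset_1, dataset_2]):
    print(row)
for row in union_by_name([dataset_1, dataset_2], common_only=True):
    print(row)
```

With Common Columns Only off, the output keeps columns A, B, and C and fills the gaps with null. With it on, only B remains because it is the only column present in both inputs.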

Advanced Options

  • Manual Column Mapping: When using Manual alignment mode, specify how columns from each dataset should be mapped.

  • Column Configuration: Manually configure which fields to include and how they should be aligned.

  • Warn on Extra Columns: Receive warnings about columns that are not present in all input datasets.
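As a rough illustration of what an extra-column warning reports, the sketch below lists every column that is missing from at least one input. The function name `find_extra_columns` and the message format are hypothetical. The transform's real warning text may differ.

```python
def find_extra_columns(datasets):
    """Map each column that is not in every dataset to the datasets lacking it.

    datasets: dict of dataset name -> list of column names.
    """
    all_columns = []
    for columns in datasets.values():
        for column in columns:
            if column not in all_columns:
                all_columns.append(column)

    extras = {}
    for column in all_columns:
        missing_from = [name for name, cols in datasets.items() if column not in cols]
        if missing_from:
            extras[column] = missing_from
    return extras


inputs = {"Dataset 1": ["A", "B"], "Dataset 2": ["B", "C"]}
for column, missing_from in find_extra_columns(inputs).items():
    print(f"Column {column!r} is missing from: {', '.join(missing_from)}")
```

Column B appears in both inputs, so it produces no warning. A and C are each reported once.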

Examples

Here are some examples of how to use the Union transform:

Example 1: Name-based Alignment

Input Datasets:

Dataset 1:

A | B
1 | x
2 | y

Dataset 2:

B | C
z | 3
w | 4

Configuration:

  • Alignment Mode: Name
  • Common Columns Only: Off

Result:

A    | B | C
1    | x | null
2    | y | null
null | z | 3
null | w | 4

Example 2: Position-based Alignment

Input Datasets:

Dataset 1:

A | B
1 | x
2 | y

Dataset 2:

C | D
3 | z
4 | w

Configuration:

  • Alignment Mode: Position
  • Common Columns Only: Off

Result:

A | B
1 | x
2 | y
3 | z
4 | w

Example 3: Manual Alignment

Input Datasets:

Dataset 1:

Name | Age
John | 30
Mary | 25

Dataset 2:

Full Name | Years
Bob Smith | 40
Alice Lee | 35

Configuration:

  • Alignment Mode: Manual
  • Manual Mapping:
    • Name → Full Name
    • Age → Years

Result:

Full Name | Years
John      | 30
Mary      | 25
Bob Smith | 40
Alice Lee | 35

tip

When working with datasets that have different column names but represent the same information, use the Manual alignment mode for precise control over how columns are combined.
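A manual mapping amounts to renaming one dataset's columns to match the other before the rows are appended. The sketch below reproduces Example 3 under that assumption. The helper `union_with_mapping` is hypothetical and not part of the product.

```python
def union_with_mapping(source_rows, target_rows, mapping):
    """Rename columns of source_rows using mapping, then append target_rows.

    mapping: dict of source column name -> target column name.
    Columns without a mapping entry keep their original name.
    """
    renamed = [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in source_rows
    ]
    return renamed + list(target_rows)


dataset_1 = [{"Name": "John", "Age": 30}, {"Name": "Mary", "Age": 25}]
dataset_2 = [
    {"Full Name": "Bob Smith", "Years": 40},
    {"Full Name": "Alice Lee", "Years": 35},
]
mapping = {"Name": "Full Name", "Age": "Years"}

for row in union_with_mapping(dataset_1, dataset_2, mapping):
    print(row)
```

Every output row uses the Full Name and Years columns, matching the result table in Example 3.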

caution

Be cautious when using Position-based alignment with datasets that have different structures. Columns might be misaligned if their order differs across datasets.
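The misalignment risk is easy to reproduce. In the sketch below, the sample data is made up and the helpers `union_by_position` and `differing_positions` are hypothetical. The second dataset stores the same fields in the opposite order, and a position-based union mixes them without raising any error.

```python
def union_by_position(first, second):
    """Append rows by column position, keeping the first dataset's column names.

    Each dataset is a (column_names, rows) pair, with rows as tuples.
    """
    columns, rows = first
    _, other_rows = second
    return columns, list(rows) + list(other_rows)


def differing_positions(first, second):
    """List (position, name_in_first, name_in_second) where the names differ."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(first[0], second[0]))
        if a != b
    ]


people = (["Name", "Age"], [("John", 30), ("Mary", 25)])
swapped = (["Age", "Name"], [(40, "Bob"), (35, "Alice")])

columns, rows = union_by_position(people, swapped)
print(columns)    # ['Name', 'Age']
print(rows[-1])   # (35, 'Alice'): the age now sits under Name
print(differing_positions(people, swapped))
```

Differing names are only a hint, since Position alignment is often used precisely because the names differ, as in Example 2. Comparing names and types position by position before running the union is still a cheap way to catch a swapped column order.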

Best Practices

  1. Verify Column Alignment: Always double-check that columns are aligned correctly, especially when using Position-based alignment.

  2. Handle Missing Data: When Common Columns Only is disabled, any column that a dataset lacks is filled with null for that dataset's rows. Plan for these nulls in downstream steps.

  3. Use Output Order: Reorder the input datasets to control the order in which they appear in the final output.

  4. Monitor Warnings: Pay attention to warnings about extra columns to ensure no data is unintentionally omitted.

  5. Test with Samples: For large datasets, test with smaller samples first to ensure your union configuration works as expected.