Skip to main content

Split Columns Transform

The Split Columns transform allows you to divide the content of a single column into multiple new columns based on a specified separator. This is useful for parsing complex data fields, separating combined information, or restructuring your dataset for more detailed analysis.

Basic Usage

To split a column in your dataset:

  1. Select the Split Columns transform from the transform menu.
  2. Choose the column you want to split in the "Select Column" dropdown.
  3. Specify the separator in the "Separator" input field.
  4. Enter names for the new columns that will be created.
  5. (Optional) Configure advanced options for more precise control.
  6. Apply the transformation.

Configuration Options

Basic Options

  • Select Column: Choose the column you want to split.
  • Separator: Enter the character or string that separates the values you want to split. For a space, enter " " (including the quotes).
  • New Column Names: Provide names for the new columns that will be created from the split.

Advanced Options

  • Max Split: Specify the maximum number of splits to perform. Enter -1 for no limit. (Default: -1)
  • Fill Value: Enter a value to use for empty cells resulting from the split. (Default: empty string)
  • Convert to String: Toggle this option to automatically convert non-string values to strings before splitting. (Default: Off)

Examples

Here are some examples of how to use the Split Columns transform:

Example 1: Splitting Full Names

Input Dataset:

full_nameemail
John Doejohn.doe@example.com
Jane Smithjane.smith@example.com
Michael Johnsonmichael.j@example.com

Configuration:

  • Select Column: full_name
  • Separator: " " (space in quotes)
  • New Column Names: first_name, last_name

Result:

full_nameemailfirst_namelast_name
John Doejohn.doe@example.comJohnDoe
Jane Smithjane.smith@example.comJaneSmith
Michael Johnsonmichael.j@example.comMichaelJohnson
Example 2: Splitting Addresses

Input Dataset:

addresspostal_code
123 Main St, Springfield, IL62704
456 Elm St, Chicago, IL60616
789 Oak St, Naperville, IL60540

Configuration:

  • Select Column: address
  • Separator: ,
  • New Column Names: street, city, state
  • Max Split: 2
  • Fill Value: "Unknown"

Result:

addresspostal_codestreetcitystate
123 Main St, Springfield, IL62704123 Main StSpringfieldIL
456 Elm St, Chicago, IL60616456 Elm StChicagoIL
789 Oak St, Naperville, IL60540789 Oak StNapervilleIL
tip

When splitting columns, consider the consistency of your data. If some rows might have fewer elements than others, use the "Fill Value" option to handle missing data after the split.

caution

Splitting columns can significantly change your data structure. Always preview the results to ensure the split behaves as expected across all your data.

Best Practices

  1. Choose Clear Separators: Ensure your separator is consistently used throughout the column you're splitting.

  2. Handle Inconsistent Data: Use the "Max Split" and "Fill Value" options to manage cases where data might not split evenly across all rows.

  3. Name New Columns Clearly: Choose descriptive names for your new columns to maintain clarity in your dataset.

  4. Preserve Original Data: Consider keeping the original unsplit column in case you need to refer back to it or undo the split.

  5. Check Data Types: If your column contains mixed data types, use the "Convert to String" option to ensure consistent splitting.

Troubleshooting

  • If your split results in unexpected empty columns, check if your separator is correct and consistently used in your data.
  • For columns with mixed data types, try enabling the "Convert to String" option to avoid errors.
  • If you're not seeing all expected splits, check if you need to adjust the "Max Split" value.