
Regex Filter Transform

The Regex Filter transform applies a regular expression pattern to one or more text columns in your dataset and replaces each match with the text you specify. Use it to find, mask, clean up, or standardize text values that follow a recognizable pattern.

Basic Usage

To apply a regex filter to your dataset:

  1. Select the Regex Filter transform from the transform menu.
  2. Choose one or more text columns to apply the filter to.
  3. Enter your regex pattern.
  4. Specify the replacement text or function.
  5. Apply the transformation.
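Conceptually, the steps above amount to running a regex substitution over every value in the chosen columns. The sketch below models that in Python, assuming rows are dictionaries and non-string cells are left untouched. The helper `apply_regex_filter` is hypothetical and not part of the product, and Python's `re` syntax may differ in places from the regex flavor the transform actually uses.

```python
import re
from typing import Dict, List

Row = Dict[str, object]


def apply_regex_filter(rows: List[Row], columns: List[str], pattern: str, replacement: str) -> List[Row]:
    """Hypothetical model of the transform: re.sub over the selected columns."""
    compiled = re.compile(pattern)  # compile once, reuse for every cell
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in columns:
            value = new_row.get(col)
            if isinstance(value, str):  # only string cells are transformed
                new_row[col] = compiled.sub(replacement, value)
        result.append(new_row)
    return result


if __name__ == "__main__":
    data = [{"Name": "Alice", "Phone": "123-456-7890"}]
    print(apply_regex_filter(data, ["Phone"], r"\d", "X"))
    # [{'Name': 'Alice', 'Phone': 'XXX-XXX-XXXX'}]
```

Returning new rows instead of editing in place keeps the original data available, which mirrors the preview-before-apply workflow.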

Configuration Options

Basic Options

  • Select Column(s): Choose one or more text columns to apply the regex filter to. Only string-type columns are available for selection.
  • Regex Pattern: Enter the regular expression pattern to match within the selected column(s).
  • Replacement: Specify the text that replaces each match of the pattern. This can be a static string or a more complex replacement pattern.
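A "more complex replacement pattern" usually means one that reuses parts of the match through capture groups. This page does not document the transform's exact replacement syntax, so the sketch below uses Python's `re.sub`, where `\1`, `\2`, and so on insert the text captured by each group. Treat it as an illustration of the idea, not as the transform's syntax; the `normalize_phone` helper is hypothetical.

```python
import re

# Capture the three digit groups of a US-style phone number, skipping any
# non-digit separators; ^ and $ make the match cover the whole value.
PHONE = re.compile(r"^\D*(\d{3})\D*(\d{3})\D*(\d{4})$")


def normalize_phone(value: str) -> str:
    # \1, \2 and \3 refer to the captured groups; values that do not match
    # are returned unchanged by re.sub.
    return PHONE.sub(r"(\1) \2-\3", value)


for raw in ["123-456-7890", "(987) 654-3210", "555.123.4567"]:
    print(normalize_phone(raw))
# (123) 456-7890
# (987) 654-3210
# (555) 123-4567
```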

Tip

If you're not familiar with regex patterns, consider using our AI assistant to help formulate the appropriate regex for your needs.

Examples

Here are some examples of how to use the Regex Filter transform:

Example 1: Masking Phone Numbers

Input Dataset:

Name   Phone
Alice  123-456-7890
Bob    (987) 654-3210
Carol  555.123.4567

Configuration:

  • Select Column(s): Phone
  • Regex Pattern: \d
  • Replacement: X

Result:

Name   Phone
Alice  XXX-XXX-XXXX
Bob    (XXX) XXX-XXXX
Carol  XXX.XXX.XXXX
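Example 1 can be reproduced with Python's `re.sub`, which replaces every match in a value rather than only the first, matching the result shown above. This assumes the transform's `\d` behaves like Python's; note that in Python 3, `\d` in a string pattern also matches non-ASCII decimal digits.

```python
import re

phones = {"Alice": "123-456-7890", "Bob": "(987) 654-3210", "Carol": "555.123.4567"}

# \d matches a single digit and re.sub replaces every non-overlapping match,
# so each digit becomes "X" while separators and parentheses are kept.
masked = {name: re.sub(r"\d", "X", phone) for name, phone in phones.items()}

for name, phone in masked.items():
    print(name, phone)
# Alice XXX-XXX-XXXX
# Bob (XXX) XXX-XXXX
# Carol XXX.XXX.XXXX
```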

Example 2: Standardizing Email Domains

Input Dataset:

Employee  Email
John      john@olddomain.com
Sarah     sarah@anotherdomain.net
Mike      mike@olddomain.org

Configuration:

  • Select Column(s): Email
  • Regex Pattern: @.*$
  • Replacement: @newdomain.com

Result:

Employee  Email
John      john@newdomain.com
Sarah     sarah@newdomain.com
Mike      mike@newdomain.com
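The pattern in Example 2 reads as: a literal `@`, then `.*` (any characters, as many as possible), then `$` (end of the value). The match therefore runs from the first `@` to the end, and the whole domain is swapped out. The sketch below checks this with Python's `re`, assuming the transform's regex engine treats these tokens the same way. In Python, `$` also matches just before a trailing newline, which does not matter for these values.

```python
import re

emails = {
    "John": "john@olddomain.com",
    "Sarah": "sarah@anotherdomain.net",
    "Mike": "mike@olddomain.org",
}

# "@" is literal, ".*" consumes the rest of the value, "$" anchors at the end,
# so everything from the first "@" onward is replaced.
DOMAIN = re.compile(r"@.*$")

standardized = {name: DOMAIN.sub("@newdomain.com", email) for name, email in emails.items()}
for name, email in standardized.items():
    print(name, email)
# John john@newdomain.com
# Sarah sarah@newdomain.com
# Mike mike@newdomain.com
```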

Caution

Regex filters are powerful but can also be complex. Always preview your results to ensure the transformation behaves as expected, especially when working with critical data.
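One low-risk way to preview a pattern outside the tool is to compute the proposed values next to the originals and list only the values the pattern matches, without writing anything back. The helper `preview_changes` below is hypothetical. It relies on Python's `re.subn`, which returns the new string together with the number of substitutions made.

```python
import re
from typing import List, Tuple


def preview_changes(values: List[str], pattern: str, replacement: str) -> List[Tuple[str, str, int]]:
    """List (original, proposed, match_count) for each value the pattern matches."""
    compiled = re.compile(pattern)
    changes = []
    for value in values:
        proposed, count = compiled.subn(replacement, value)
        if count:  # skip values with no matches
            changes.append((value, proposed, count))
    return changes


if __name__ == "__main__":
    sample = ["john@olddomain.com", "no-email-here", "mike@olddomain.org"]
    for original, proposed, count in preview_changes(sample, r"@.*$", "@newdomain.com"):
        print(f"{original!r} -> {proposed!r} ({count} match)")
```

Values that the pattern unexpectedly skips or over-matches stand out quickly in a list like this.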

Best Practices

  1. Test Your Regex: Before applying a regex filter to your entire dataset, test it on a small sample to ensure it produces the desired results.

  2. Be Specific: Create regex patterns that are as specific as possible to avoid unintended matches.

  3. Consider Edge Cases: Think about potential edge cases in your data that might produce unexpected results with your regex pattern.

  4. Preserve Original Data: When possible, create new columns for transformed data rather than overwriting existing ones, especially when working with sensitive information.

  5. Document Your Patterns: Keep a record of the regex patterns you use and their purposes for future reference and reproducibility.
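Several of these practices can be combined in a short script. The sketch below uses a more specific phone pattern than the broad `\d` from Example 1, so other digits in the value are left alone. It checks a few edge cases with `assert` before use and writes the result to a new column instead of overwriting the original. The pattern, the `mask_digits` and `add_masked_column` helpers, and the `Phone_masked` column name are all hypothetical, and this page does not say whether the transform itself can write to a new column.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately specific phone pattern: 3-3-4 digit groups with
# optional parentheses around the area code and "-", "." or " " separators.
PHONE = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-.]\d{4}")


def mask_digits(match: re.Match) -> str:
    # Replace only the digits inside the matched phone number.
    return re.sub(r"\d", "X", match.group())


def add_masked_column(rows: List[Dict[str, str]], source: str, target: str) -> List[Dict[str, str]]:
    # Write to a new column so the original values stay available.
    return [{**row, target: PHONE.sub(mask_digits, row[source])} for row in rows]


# Edge cases checked before applying the pattern to the full dataset.
cases = {
    "123-456-7890": "XXX-XXX-XXXX",
    "(987) 654-3210": "(XXX) XXX-XXXX",
    "Order 42, call 555.123.4567": "Order 42, call XXX.XXX.XXXX",  # other digits untouched
    "no phone": "no phone",
}
for raw, expected in cases.items():
    actual = PHONE.sub(mask_digits, raw)
    assert actual == expected, (raw, actual)

rows = [{"Name": "Alice", "Phone": "123-456-7890"}]
print(add_masked_column(rows, "Phone", "Phone_masked"))
# [{'Name': 'Alice', 'Phone': '123-456-7890', 'Phone_masked': 'XXX-XXX-XXXX'}]
```

Keeping the edge cases next to the pattern also documents what the pattern is meant to match, which helps with reproducibility.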