pyspark.sql.functions.regexp_extract_all

pyspark.sql.functions.regexp_extract_all(str: ColumnOrName, regexp: ColumnOrName, idx: Union[int, pyspark.sql.column.Column, None] = None) → pyspark.sql.column.Column

Extract all substrings of str that match the Java regex regexp, returning for each match the capture group selected by the group index idx.

New in version 3.5.0.

Parameters
str : Column or str

target column to work on.

regexp : Column or str

regex pattern to apply.

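idx : int or Column, optional

index of the regex capture group to extract from each match. If omitted, group 1 is used; 0 refers to the entire match.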

Returns
Column

array of all substrings of str that match the Java regex, one element per match, each taken from the capture group selected by the group index.

Examples

>>> from pyspark.sql.functions import col, lit, regexp_extract_all
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)')).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 1).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 2).alias('d')).collect()
[Row(d=['200', '400'])]
>>> df.select(regexp_extract_all('str', col("regexp")).alias('d')).collect()
[Row(d=['100', '300'])]