pyspark.sql.functions.regexp_extract_all

pyspark.sql.functions.regexp_extract_all(str: ColumnOrName, regexp: ColumnOrName, idx: Union[int, pyspark.sql.column.Column, None] = None) → pyspark.sql.column.Column

Extracts every substring of str that matches the Java regex regexp and returns, for each match, the capture group selected by the group index idx.

New in version 3.5.0.

Parameters

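str (Column or str): target column to work on.
regexp (Column or str): regex pattern to apply.
idx (int or Column, optional): index of the regex group to extract. When omitted, group 1 is used; 0 selects the entire match.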

Returns

Column: an array of strings, one per match of the Java regex in str, each taken from the regex group selected by idx.

Examples

>>> from pyspark.sql.functions import col, lit, regexp_extract_all
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)')).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 1).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 2).alias('d')).collect()
[Row(d=['200', '400'])]
>>> df.select(regexp_extract_all('str', col("regexp")).alias('d')).collect()
[Row(d=['100', '300'])]
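
To make the role of idx in the examples above more concrete, here is a minimal pure-Python sketch of the same group-selection logic. It is only an illustration: the helper name regexp_extract_all_sketch is hypothetical and not part of PySpark, it uses Python's re module rather than the Java regex engine (the two dialects differ in places), and mapping an unmatched optional group to an empty string is an assumption of this sketch.

```python
import re
from typing import List


def regexp_extract_all_sketch(text: str, pattern: str, idx: int = 1) -> List[str]:
    """Hypothetical stand-in for illustration only; not the PySpark implementation."""
    compiled = re.compile(pattern)
    # Reject group indexes the pattern does not define.
    if idx < 0 or idx > compiled.groups:
        raise ValueError(f"group index {idx} is out of range for pattern {pattern!r}")
    # Scan left to right over non-overlapping matches and keep one group per match.
    # Assumption of this sketch: a group that did not participate becomes "".
    return [m.group(idx) or "" for m in compiled.finditer(text)]


sample = "100-200, 300-400"
first_groups = regexp_extract_all_sketch(sample, r"(\d+)-(\d+)")       # default idx=1
second_groups = regexp_extract_all_sketch(sample, r"(\d+)-(\d+)", 2)   # second group
whole_matches = regexp_extract_all_sketch(sample, r"(\d+)-(\d+)", 0)   # entire match
print(first_groups, second_groups, whole_matches)
```

With the sample string, the sketch yields one list element per match: the first captured number for the default index, the second captured number for index 2, and the full "start-end" text for index 0.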
