pyspark.sql.functions.regexp_extract_all
pyspark.sql.functions.regexp_extract_all(str: ColumnOrName, regexp: ColumnOrName, idx: Union[int, pyspark.sql.column.Column, None] = None) → pyspark.sql.column.Column
Extract every substring of str that matches the Java regex regexp, returning, for each match, the text captured by the regex group at index idx.
New in version 3.5.0.
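Parameters
str : Column or str
    target column to work on.
regexp : Column or str
    regex pattern to apply. A plain Python string is interpreted as a column name, not as a pattern, so wrap a literal pattern in lit(), as the examples below do.
idx : int or Column, optional
    index of the regex group to extract. If omitted, group 1 is used; 0 selects the entire match.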
Returns
Column
    an array of strings containing, for each match of the Java regex in str, the text captured by the group at index idx.
Examples
>>> from pyspark.sql.functions import regexp_extract_all, lit, col
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)')).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 1).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 2).alias('d')).collect()
[Row(d=['200', '400'])]
>>> df.select(regexp_extract_all('str', col("regexp")).alias('d')).collect()
[Row(d=['100', '300'])]
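To make the group-index semantics concrete without a Spark session, the following is a minimal pure-Python sketch that approximates what regexp_extract_all computes for a single string value. It is an assumption-laden approximation, not PySpark's implementation: it uses Python's re module rather than Java's java.util.regex (the two agree on simple patterns like (\d+)-(\d+) but differ in some syntax and flags), and the function name extract_all is hypothetical. It also ignores edge cases such as optional groups that do not take part in a match.

```python
import re
from typing import List, Optional


def extract_all(s: Optional[str], pattern: str, idx: int = 1) -> Optional[List[str]]:
    """Hypothetical pure-Python approximation of regexp_extract_all for one value.

    For every non-overlapping match of `pattern` in `s` (scanning left to
    right), collect the text captured by group `idx`. Group 0 is the whole
    match; the default of 1 mirrors the default used when idx is omitted.
    """
    if s is None:
        # A null input yields a null result, like SQL null propagation.
        return None
    compiled = re.compile(pattern)
    # Reject group indices the pattern cannot supply (Spark also rejects
    # out-of-range indices, though with a different error).
    if idx < 0 or idx > compiled.groups:
        raise ValueError(f"group index {idx} out of range for {compiled.groups} groups")
    return [m.group(idx) for m in compiled.finditer(s)]


if __name__ == "__main__":
    s = "100-200, 300-400"
    print(extract_all(s, r"(\d+)-(\d+)"))     # ['100', '300']
    print(extract_all(s, r"(\d+)-(\d+)", 2))  # ['200', '400']
    print(extract_all(s, r"(\d+)-(\d+)", 0))  # ['100-200', '300-400']
```

The first two calls reproduce the results of the doctest examples above for the same input; the idx=0 call shows the whole-match case that the examples do not cover.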