pyspark.sql.DataFrame.intersectAll
DataFrame.intersectAll(other: pyspark.sql.dataframe.DataFrame) → pyspark.sql.dataframe.DataFrame
Return a new DataFrame containing rows in both this DataFrame and another DataFrame while preserving duplicates.
This is equivalent to INTERSECT ALL in SQL. As standard in SQL, this function resolves columns by position (not by name).
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
other : DataFrame
Another DataFrame to intersect with.
Returns
DataFrame
Combined DataFrame.
Examples
>>> df1 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3), ("c", 4)], ["C1", "C2"])
>>> df2 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3)], ["C1", "C2"])
>>> df1.intersectAll(df2).sort("C1", "C2").show()
+---+---+
| C1| C2|
+---+---+
|  a|  1|
|  a|  1|
|  b|  3|
+---+---+
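The duplicate-preserving behaviour can be modelled in plain Python as a multiset intersection: a row that appears m times on one side and n times on the other appears min(m, n) times in the result. The sketch below is only an illustration of those semantics, not how Spark implements the operation. The helper name `intersect_all` is hypothetical, and rows are represented as tuples, so they are compared by position rather than by column name.

```python
from collections import Counter


def intersect_all(left_rows, right_rows):
    """Model INTERSECT ALL on lists of row tuples (illustrative helper, not a Spark API)."""
    # Counter & Counter keeps each row with the minimum of its two counts.
    common = Counter(left_rows) & Counter(right_rows)
    # elements() repeats each row as many times as its remaining count.
    return sorted(common.elements())


rows1 = [("a", 1), ("a", 1), ("b", 3), ("c", 4)]
rows2 = [("a", 1), ("a", 1), ("b", 3)]

result = intersect_all(rows1, rows2)
print(result)  # [('a', 1), ('a', 1), ('b', 3)]

# Three copies on the left and one on the right leave a single copy.
print(intersect_all([("a", 1)] * 3, [("a", 1)]))  # [('a', 1)]
```

The first call mirrors the DataFrame example above: ("a", 1) occurs twice on both sides and is kept twice, while ("c", 4) has no match in the second input and is dropped.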