pyspark.SparkContext.union
SparkContext.union(rdds: List[pyspark.rdd.RDD[T]]) → pyspark.rdd.RDD[T]

Build the union of a list of RDDs.
This supports unioning RDDs with different serialized formats, although doing so forces them to be reserialized using the default serializer.
New in version 0.7.0.
See also

RDD.union
Examples
>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # generate a text RDD
...     with open(os.path.join(d, "union-text.txt"), "w") as f:
...         _ = f.write("Hello")
...     text_rdd = sc.textFile(d)
...
...     # generate another RDD
...     parallelized = sc.parallelize(["World!"])
...
...     unioned = sorted(sc.union([text_rdd, parallelized]).collect())
>>> unioned
['Hello', 'World!']
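Semantically, a union simply chains the element sequences of its inputs in order, preserving duplicates (it is not a set union). A minimal plain-Python sketch of that behavior, with no Spark required and purely for illustration:

```python
# Purely illustrative: model union() semantics with plain Python lists.
# A real SparkContext.union call operates on distributed RDDs; the helper
# name union_like is hypothetical and not part of the PySpark API.
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def union_like(rdds: Iterable[Iterable[T]]) -> List[T]:
    # union keeps duplicates and concatenates the inputs in order
    out: List[T] = []
    for r in rdds:
        out.extend(r)
    return out

a = ["Hello", "World!"]
b = ["World!"]
print(union_like([a, b]))  # "World!" appears twice: duplicates are kept
```

To deduplicate after a real union, you would follow it with RDD.distinct().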