pyspark.SparkContext.runJob¶
-
SparkContext.
runJob
(rdd, partitionFunc, partitions=None, allowLocal=False)[source]¶ Executes the given partitionFunc on the specified set of partitions, returning the result as an array of elements.
If ‘partitions’ is not specified, this will run over all partitions.
Examples
>>> myRDD = sc.parallelize(range(6), 3) >>> sc.runJob(myRDD, lambda part: [x * x for x in part]) [0, 1, 4, 9, 16, 25]
>>> myRDD = sc.parallelize(range(6), 3) >>> sc.runJob(myRDD, lambda part: [x * x for x in part], [0, 2], True) [0, 1, 16, 25]