pyspark.RDD.reduce

RDD.reduce(f)

Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently, each partition is reduced locally, and the partial results are then combined on the driver.

Examples

>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
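
The operator must be commutative and associative because partitions are reduced independently before their partial results are combined. The following plain-Python sketch illustrates this without a Spark context. `reduce_by_partitions` is a hypothetical helper written for this illustration, not part of the PySpark API, and the partition layouts are made-up assumptions.

```python
from functools import reduce
from operator import add, sub


def reduce_by_partitions(f, partitions):
    """Hypothetical helper: reduce each non-empty partition locally,
    then combine the partial results, mimicking a partition-wise reduce."""
    partials = [reduce(f, part) for part in partitions if part]
    if not partials:
        raise ValueError("Can not reduce() empty RDD")
    return reduce(f, partials)


# The same five elements split in two different (assumed) ways.
two_parts = [[1, 2], [3, 4, 5]]
three_parts = [[1], [2, 3], [4, 5]]

# add is commutative and associative: the split does not affect the result.
sum_two = reduce_by_partitions(add, two_parts)
sum_three = reduce_by_partitions(add, three_parts)

# sub is neither: the result depends on how the data happens to be split.
diff_two = reduce_by_partitions(sub, two_parts)
diff_three = reduce_by_partitions(sub, three_parts)

print(sum_two, sum_three)    # 15 15
print(diff_two, diff_three)  # 5 3
```

With `add`, both layouts give 15. With `sub`, the two layouts give 5 and 3, and neither matches the sequential result of -13, which is why a non-associative or non-commutative operator makes the result depend on partitioning.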
