A datatype that can be accumulated.
Helper object defining how to accumulate values of a particular type.
A simpler version of Accumulable where the result type being accumulated is the same as the type of the elements being merged.
A simpler version of AccumulableParam where the only datatype you can add in is the same type as the accumulated value.
A set of functions used to aggregate data.
A FutureAction for actions that could trigger multiple Spark jobs.
Base class for dependencies.
A future for the result of an action.
A Partitioner that implements hash-based partitioning using Java's Object.hashCode.
An iterator that wraps around an existing iterator to provide task killing functionality.
Utility trait for classes that want to log data.
Base class for dependencies where each partition of the parent RDD is used by at most one partition of the child RDD.
Represents a one-to-one dependency between partitions of the parent and child RDDs.
A partition of an RDD.
An object that defines how the elements in a key-value pair RDD are partitioned by key.
Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
A Partitioner that partitions sortable records by range into roughly equal ranges.
Represents a dependency on the output of a shuffle stage.
The future holding the result of an action that triggers a single job.
Main entry point for Spark functionality.
Holds all the runtime environment objects for a running Spark instance (either master or worker), including the serializer, Akka actor system, block manager, map output tracker, etc.
The SparkContext object contains a number of implicit conversions and parameters for use with various Spark features.
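As a rough illustration of the hash-based partitioning entry above, the following plain-Scala sketch maps a key to a partition index by taking its hashCode modulo the partition count and shifting negative remainders back into range. SketchHashPartitioner and nonNegativeMod are hypothetical names chosen for this example, sending null keys to partition 0 is an assumption of the sketch, and nothing here depends on Spark or reproduces its source.

```scala
// Hypothetical stand-in for a hash-based partitioner; not Spark's own class.
final class SketchHashPartitioner(val numPartitions: Int) {
  require(numPartitions > 0, "numPartitions must be positive")

  // hashCode may be negative, so shift negative remainders into [0, mod).
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Assumption of this sketch: a null key is always sent to partition 0.
  def getPartition(key: Any): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }
}

object SketchHashPartitionerDemo {
  def main(args: Array[String]): Unit = {
    val partitioner = new SketchHashPartitioner(4)
    val keys = Seq[Any]("apple", "banana", -7, 42)
    // Print the partition index chosen for each key.
    keys.foreach(k => println(s"$k -> partition ${partitioner.getPartition(k)}"))
  }
}
```

Equal keys always land in the same partition because they share a hashCode, which is the property that key-based operations rely on.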
Core Spark functionality. SparkContext serves as the main entry point to Spark, while RDD is the data type representing a distributed collection, and provides most parallel operations.
In addition, PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; DoubleRDDFunctions contains operations available only on RDDs of Doubles; and SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions when you import org.apache.spark.SparkContext._.
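The implicit-conversion mechanism described above can be sketched with ordinary Scala collections. This is only an analogy under stated assumptions: PairOpsSketch and PairSeqFunctions are hypothetical names, the example enriches a local Seq rather than an RDD, and it does not use or reproduce any Spark code.

```scala
object PairOpsSketch {
  // Hypothetical enrichment: adds groupByKey to any Seq of key-value pairs.
  // Sequences whose elements are not pairs do not get this method.
  implicit class PairSeqFunctions[K, V](private val self: Seq[(K, V)]) {
    def groupByKey: Map[K, Seq[V]] =
      self.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
  }
}

object PairOpsDemo {
  // Bringing the implicits into scope is what makes groupByKey available,
  // much like a wildcard import of a companion object's members.
  import PairOpsSketch._

  def main(args: Array[String]): Unit = {
    val pairs = Seq(1 -> 10, 2 -> 20, 1 -> 30)
    val grouped = pairs.groupByKey
    println(grouped)
  }
}
```

Without the import, the call to groupByKey would not compile, which mirrors why the wildcard import is needed before the pair-specific operations appear on a collection of the right element type.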