Re-creates a StreamingContext from a checkpoint file.
Re-creates a StreamingContext from a checkpoint file.
Path either to the directory that was specified as the checkpoint directory, or to the checkpoint file 'graph' or 'graph.bk'.
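For example, a minimal sketch of re-creating a context from previously written checkpoint data (the checkpoint directory is hypothetical):

    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    // Rebuild the StreamingContext, including its DStream graph, from checkpoint data.
    JavaStreamingContext jssc = new JavaStreamingContext("hdfs://namenode:8020/spark/checkpoints");
    jssc.start();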
Creates a StreamingContext using an existing SparkContext.
Creates a StreamingContext using an existing SparkContext.
The underlying JavaSparkContext to use
The time interval at which streaming data will be divided into batches
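For example, a sketch that reuses an existing JavaSparkContext (the master, app name, and batch interval are illustrative):

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    JavaSparkContext sc = new JavaSparkContext("local[2]", "ExistingContextExample");
    // Reuse the SparkContext; streaming data is divided into 1-second batches.
    JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(1000));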
Creates a StreamingContext.
Creates a StreamingContext.
Name of the Spark Master
Name to be used when registering with the scheduler
The time interval at which streaming data will be divided into batches
The SPARK_HOME directory on the slave nodes
Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.
Environment variables to set on worker nodes
Creates a StreamingContext.
Creates a StreamingContext.
Name of the Spark Master
Name to be used when registering with the scheduler
The time interval at which streaming data will be divided into batches
The SPARK_HOME directory on the slave nodes
Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.
Creates a StreamingContext.
Creates a StreamingContext.
Name of the Spark Master
Name to be used when registering with the scheduler
The time interval at which streaming data will be divided into batches
The SPARK_HOME directory on the slave nodes
JAR file containing job code, to ship to cluster. This can be a path on the local file system or an HDFS, HTTP, HTTPS, or FTP URL.
Creates a StreamingContext.
Creates a StreamingContext.
Name of the Spark Master
Name to be used when registering with the scheduler
The time interval at which streaming data will be divided into batches
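A sketch of the simplest constructor form, using a local master and a 1-second batch interval (both values are illustrative):

    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    // Local master with two threads, an arbitrary app name, and 1-second batches.
    JavaStreamingContext jssc =
        new JavaStreamingContext("local[2]", "SimpleExample", new Duration(1000));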
Create an input stream with any arbitrary user implemented actor receiver.
Create an input stream with any arbitrary user implemented actor receiver.
Props object defining creation of the actor
Name of the actor
An important point to note: since the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. the parametrized type of the data received and that of actorStream should be the same.
Create an input stream with any arbitrary user implemented actor receiver.
Create an input stream with any arbitrary user implemented actor receiver.
Props object defining creation of the actor
Name of the actor
Storage level to use for storing the received objects
An important point to note: since the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. the parametrized type of the data received and that of actorStream should be the same.
Create an input stream with any arbitrary user implemented actor receiver.
Create an input stream with any arbitrary user implemented actor receiver.
Props object defining creation of the actor
Name of the actor
Storage level to use for storing the received objects
An important point to note: since the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. the parametrized type of the data received and that of actorStream should be the same.
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. The graph will be checkpointed every batch interval.
HDFS-compatible directory where the checkpoint data will be reliably stored
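For example (the directory is hypothetical, and jssc is assumed to be a JavaStreamingContext created as shown earlier):

    // Checkpoint the DStream graph to a reliable, HDFS-compatible location.
    jssc.checkpoint("hdfs://namenode:8020/spark/checkpoints");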
Creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. File names starting with . are ignored.
Key type for reading HDFS file
Value type for reading HDFS file
Input format for reading HDFS file
HDFS directory to monitor for new files
Creates an input stream from a Flume source.
Creates an input stream from a Flume source.
Hostname of the slave machine to which the Flume data will be sent
Port of the slave machine to which the Flume data will be sent
Creates an input stream from a Flume source.
Creates an input stream from a Flume source.
Hostname of the slave machine to which the Flume data will be sent
Port of the slave machine to which the Flume data will be sent
Storage level to use for storing the received objects
Create an input stream that pulls messages from a Kafka broker.
Create an input stream that pulls messages from a Kafka broker.
Type of RDD
Type of Kafka decoder
Map of Kafka configuration parameters. See: http://kafka.apache.org/configuration.html
Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread.
RDD storage level. Defaults to memory-only
Create an input stream that pulls messages from a Kafka broker.
Create an input stream that pulls messages from a Kafka broker.
Zookeeper quorum (hostname:port,hostname:port,...).
The group id for this consumer.
Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread.
RDD storage level. Defaults to memory-only
Create an input stream that pulls messages from a Kafka broker.
Create an input stream that pulls messages from a Kafka broker.
Zookeeper quorum (hostname:port,hostname:port,...).
The group id for this consumer.
Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread.
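A sketch of the simple variant, assuming it yields the message payloads as a JavaDStream of strings; the Zookeeper quorum, consumer group, and topic map are illustrative, and jssc is a JavaStreamingContext created as shown earlier:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.streaming.api.java.JavaDStream;

    Map<String, Integer> topics = new HashMap<String, Integer>();
    topics.put("events", 2);   // consume topic "events" with 2 threads

    // Connect through Zookeeper; each topic's partition count maps to consumer threads.
    JavaDStream<String> messages =
        jssc.kafkaStream("zk1:2181,zk2:2181", "my-consumer-group", topics);
    messages.print();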
Creates an input stream from a queue of RDDs.
Creates an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
NOTE: changes to the queue after the stream is created will not be recognized.
Type of objects in the RDD
Queue of RDDs
Whether only one RDD should be consumed from the queue in every interval
Default RDD to return from the DStream when the queue is empty
Creates an input stream from a queue of RDDs.
Creates an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
NOTE: changes to the queue after the stream is created will not be recognized.
Type of objects in the RDD
Queue of RDDs
Whether only one RDD should be consumed from the queue in every interval
Creates an input stream from a queue of RDDs.
Creates an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
NOTE: changes to the queue after the stream is created will not be recognized.
Type of objects in the RDD
Queue of RDDs
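A sketch using a queue of pre-built RDDs, which is convenient for testing (the RDD contents are arbitrary; jssc.sc() is the underlying SparkContext):

    import java.util.Arrays;
    import java.util.LinkedList;
    import java.util.Queue;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.streaming.api.java.JavaDStream;

    Queue<JavaRDD<Integer>> rddQueue = new LinkedList<JavaRDD<Integer>>();
    rddQueue.add(jssc.sc().parallelize(Arrays.asList(1, 2, 3)));
    rddQueue.add(jssc.sc().parallelize(Arrays.asList(4, 5, 6)));

    // By default one RDD is consumed from the queue in each batch interval.
    JavaDStream<Integer> numbers = jssc.queueStream(rddQueue);
    numbers.print();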
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be pushed directly into the block manager without deserializing them.
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be pushed directly into the block manager without deserializing them. This is the most efficient way to receive data.
Type of the objects in the received blocks
Hostname to connect to for receiving data
Port to connect to for receiving data
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be pushed directly into the block manager without deserializing them.
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be pushed directly into the block manager without deserializing them. This is the most efficient way to receive data.
Type of the objects in the received blocks
Hostname to connect to for receiving data
Port to connect to for receiving data
Storage level to use for storing the received objects
Registers an output stream that will be computed every interval
Sets each DStream in this context to remember the RDDs it generated in the last given duration.
Sets each DStream in this context to remember the RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration and then release them for garbage collection. This method allows the developer to specify how long to remember the RDDs (e.g. if the developer wishes to query old data outside the DStream computation).
Minimum duration that each DStream should remember its RDDs
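For example, to keep the last minute of generated RDDs available for ad-hoc queries (the duration is arbitrary):

    import org.apache.spark.streaming.Duration;

    // Retain the RDDs generated in the last 60 seconds.
    jssc.remember(new Duration(60 * 1000));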
The underlying SparkContext
Create an input stream from a network source hostname:port.
Create an input stream from a network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as objects using the given converter.
Type of the objects received (after converting bytes to objects)
Hostname to connect to for receiving data
Port to connect to for receiving data
Function to convert the byte stream to objects
Storage level to use for storing the received objects
Create an input stream from a network source hostname:port.
Create an input stream from a network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded, \n-delimited lines.
Hostname to connect to for receiving data
Port to connect to for receiving data
Create an input stream from a network source hostname:port.
Create an input stream from a network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded, \n-delimited lines.
Hostname to connect to for receiving data
Port to connect to for receiving data
Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
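A sketch that reads UTF-8 lines from a local socket (the host and port are illustrative, and jssc is assumed to be a JavaStreamingContext created as shown earlier):

    import org.apache.spark.streaming.api.java.JavaDStream;

    // Each batch contains the newline-delimited lines received during that interval.
    JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
    lines.print();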
Starts the execution of the streams.
Stops the execution of the streams.
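A typical lifecycle sketch, assuming jssc is a JavaStreamingContext whose DStreams have already been set up:

    jssc.start();   // begin receiving and processing data
    // ... run until some external condition decides the job is done ...
    jssc.stop();    // shut down the streaming computation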
Creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).
Creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). File names starting with . are ignored.
HDFS directory to monitor for new files
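For example (the monitored directory is hypothetical):

    import org.apache.spark.streaming.api.java.JavaDStream;

    // Every new file that appears in the directory is read as lines of text.
    JavaDStream<String> logLines = jssc.textFileStream("hdfs://namenode:8020/incoming/logs");
    logLines.print();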
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. The order of the JavaRDDs in the transform function parameter will be the same as the order of corresponding DStreams in the list. Note that for adding a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. The order of the JavaRDDs in the transform function parameter will be the same as the order of corresponding DStreams in the list. Note that for adding a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().
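A sketch that combines two text streams in each batch; the hosts and ports are illustrative, the per-batch logic is a plain union, and the unchecked casts reflect the List<JavaRDD<?>> parameter type:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.streaming.Time;
    import org.apache.spark.streaming.api.java.JavaDStream;

    JavaDStream<String> streamA = jssc.socketTextStream("hostA", 9999);
    JavaDStream<String> streamB = jssc.socketTextStream("hostB", 9999);

    List<JavaDStream<?>> inputs = Arrays.<JavaDStream<?>>asList(streamA, streamB);
    JavaDStream<String> combined = jssc.transform(inputs,
        new Function2<List<JavaRDD<?>>, Time, JavaRDD<String>>() {
          public JavaRDD<String> call(List<JavaRDD<?>> rdds, Time time) {
            // The RDDs arrive in the same order as the DStreams in 'inputs'.
            JavaRDD<String> a = (JavaRDD<String>) rdds.get(0);
            JavaRDD<String> b = (JavaRDD<String>) rdds.get(1);
            return a.union(b);   // any per-batch combination logic could go here
          }
        });
    combined.print();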
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication.
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret to be set.
Create an input stream that returns tweets received from Twitter.
Create an input stream that returns tweets received from Twitter.
Twitter4J Authorization
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication.
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret to be set.
Set of filter strings to get only those tweets that match them
Create an input stream that returns tweets received from Twitter.
Create an input stream that returns tweets received from Twitter.
Twitter4J Authorization
Set of filter strings to get only those tweets that match them
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication.
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret to be set.
Set of filter strings to get only those tweets that match them
Storage level to use for storing the received objects
Create an input stream that returns tweets received from Twitter.
Create an input stream that returns tweets received from Twitter.
Twitter4J Authorization object
Set of filter strings to get only those tweets that match them
Storage level to use for storing the received objects
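A sketch assuming the twitter4j.oauth.* system properties are already set and that the stream elements are twitter4j.Status objects:

    import twitter4j.Status;
    import org.apache.spark.streaming.api.java.JavaDStream;

    // Uses Twitter4J's default OAuth credentials from the system properties.
    JavaDStream<Status> tweets = jssc.twitterStream();
    tweets.count().print();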
Create a unified DStream from multiple DStreams of the same type and same slide duration.
Create a unified DStream from multiple DStreams of the same type and same slide duration.
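A sketch that unifies two streams of the same element type (the hosts and ports are illustrative):

    import java.util.Arrays;
    import org.apache.spark.streaming.api.java.JavaDStream;

    JavaDStream<String> first = jssc.socketTextStream("host1", 9999);
    JavaDStream<String> second = jssc.socketTextStream("host2", 9999);

    // Both streams must have the same element type and slide duration.
    JavaDStream<String> merged = jssc.union(first, Arrays.asList(second));
    merged.print();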
Create an input stream that receives messages pushed by a ZeroMQ publisher.
Create an input stream that receives messages pushed by a ZeroMQ publisher.
URL of the remote ZeroMQ publisher
Topic to subscribe to
A ZeroMQ stream publishes a sequence of frames for each topic, and each frame is a sequence of bytes; the converter (which might be a deserializer of bytes) is therefore needed to translate from a sequence of sequences of bytes, where a sequence refers to a frame and a subsequence refers to its payload.
Create an input stream that receives messages pushed by a ZeroMQ publisher.
Create an input stream that receives messages pushed by a ZeroMQ publisher.
URL of the remote ZeroMQ publisher
Topic to subscribe to
A ZeroMQ stream publishes a sequence of frames for each topic, and each frame is a sequence of bytes; the converter (which might be a deserializer of bytes) is therefore needed to translate from a sequence of sequences of bytes, where a sequence refers to a frame and a subsequence refers to its payload.
RDD storage level. Defaults to memory-only.
Create an input stream that receives messages pushed by a ZeroMQ publisher.
Create an input stream that receives messages pushed by a ZeroMQ publisher.
URL of the remote ZeroMQ publisher
Topic to subscribe to
A ZeroMQ stream publishes a sequence of frames for each topic, and each frame is a sequence of bytes; the converter (which might be a deserializer of bytes) is therefore needed to translate from a sequence of sequences of bytes, where a sequence refers to a frame and a subsequence refers to its payload.
Storage level to use for storing the received objects
A StreamingContext is the main entry point for Spark Streaming functionality. Besides the basic information (such as cluster URL and job name) needed to internally create a SparkContext, it provides methods used to create DStreams from various input sources.
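A minimal end-to-end sketch putting these pieces together; the master, host, port, and batch interval are all illustrative:

    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingExample {
      public static void main(String[] args) {
        // 1-second batches on a local two-thread master.
        JavaStreamingContext jssc =
            new JavaStreamingContext("local[2]", "StreamingExample", new Duration(1000));

        // Receive newline-delimited text over TCP and report how many lines arrive per batch.
        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        lines.count().print();

        jssc.start();
      }
    }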