pyspark.sql.streaming.DataStreamWriter.format
DataStreamWriter.format(source: str) → pyspark.sql.streaming.readwriter.DataStreamWriter
Specifies the underlying output data source.
New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
Parameters
source : str
    string, name of the data source, for example ‘parquet’ or ‘csv’.
Notes
This API is evolving.
Examples
>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.format("text")
<...streaming.readwriter.DataStreamWriter object ...>
This API configures the data source that the stream writes to. The example below writes CSV files from the Rate source in a streaming manner.
>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d, tempfile.TemporaryDirectory() as cp:
...     df = spark.readStream.format("rate").load()
...     q = df.writeStream.format("csv").option("checkpointLocation", cp).start(d)
...     time.sleep(5)
...     q.stop()
...     spark.read.schema("timestamp TIMESTAMP, value STRING").csv(d).show()
+...---------+-----+
|...timestamp|value|
+...---------+-----+
...