pyspark.sql.streaming.DataStreamWriter.format

DataStreamWriter.format(source: str) → pyspark.sql.streaming.readwriter.DataStreamWriter

Specifies the underlying output data source.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Parameters
source : str

Name of the output data source (sink), for example ‘parquet’, ‘csv’, or ‘text’.

Notes

This API is evolving.

Examples

>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.format("text")
<...streaming.readwriter.DataStreamWriter object ...>

This API lets you configure the data source to write to. The example below writes CSV files from the Rate source in a streaming manner.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d, tempfile.TemporaryDirectory() as cp:
...     df = spark.readStream.format("rate").load()
...     q = df.writeStream.format("csv").option("checkpointLocation", cp).start(d)
...     time.sleep(5)
...     q.stop()
...     spark.read.schema("timestamp TIMESTAMP, value STRING").csv(d).show()
+...---------+-----+
|...timestamp|value|
+...---------+-----+
...
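
The same pattern can be wrapped in a small helper so the sink format becomes a parameter. The following is only a sketch: `write_rate_stream` and its argument names are hypothetical and not part of PySpark, and it assumes the caller passes an active `SparkSession` and a file-based format such as "parquet", "csv", or "json".

```python
def write_rate_stream(spark, sink_format, out_dir, checkpoint_dir, seconds=5):
    """Stream the built-in "rate" source to a file sink for a short time.

    Hypothetical helper for illustration; `spark` is assumed to be an
    active SparkSession supplied by the caller.
    """
    df = spark.readStream.format("rate").load()
    query = (
        df.writeStream
        .format(sink_format)  # the output data source, e.g. "parquet"
        .option("checkpointLocation", checkpoint_dir)
        .start(out_dir)
    )
    try:
        # Let the query run for up to `seconds` seconds.
        query.awaitTermination(seconds)
    finally:
        # Always stop the query so it does not keep running in the background.
        query.stop()
    return query
```

With a real session, a call such as `write_rate_stream(spark, "parquet", d, cp)` inside the `tempfile.TemporaryDirectory()` block shown above would write Parquet files instead of CSV; the output could then be read back with `spark.read.parquet(d)`.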