pyspark.sql.Catalog.createTable

Catalog.createTable(tableName: str, path: Optional[str] = None, source: Optional[str] = None, schema: Optional[pyspark.sql.types.StructType] = None, description: Optional[str] = None, **options: str) → pyspark.sql.dataframe.DataFrame

Creates a table based on the dataset in a data source.

New in version 2.2.0.
Parameters
- tableName : str
  name of the table to create.
  Changed in version 3.4.0: Allow tableName to be qualified with catalog name.
- path : str, optional
  the path in which the data for this table exists. When path is specified, an external table is created from the data at the given path. Otherwise a managed table is created.
- source : str, optional
  the source of this table, such as 'parquet', 'orc', etc. If source is not specified, the default data source configured by spark.sql.sources.default will be used; a short sketch after the Returns section shows how to inspect this setting.
- schema : StructType, optional
  the schema for this table.
- description : str, optional
  the description of this table.
  Changed in version 3.1.0: Added the description parameter.
- **options : dict, optional
  extra options to specify in the table.
Returns
- DataFrame
  The DataFrame associated with the table.
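When source is omitted, the table format falls back to the session's default data source. A minimal sketch, assuming an active SparkSession named spark (as in the examples below); the built-in default for spark.sql.sources.default is 'parquet', though a deployment may override it, and the table name tbl0 is hypothetical.

>>> # Inspect the data source used when `source` is not specified.
>>> spark.conf.get("spark.sql.sources.default")
'parquet'
>>> # A table created without an explicit source uses that format.
>>> _ = spark.catalog.createTable("tbl0", schema=spark.range(1).schema)
>>> _ = spark.sql("DROP TABLE tbl0")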
Examples
Creating a managed table.
>>> _ = spark.catalog.createTable("tbl1", schema=spark.range(1).schema, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl1")
Creating an external table.
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     _ = spark.catalog.createTable(
...         "tbl2", schema=spark.range(1).schema, path=d, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl2")
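Passing a description together with data source options. A hedged sketch, not part of the original examples: the table name tbl3 is hypothetical, header is a standard CSV source option forwarded as a string via **options, and Catalog.getTable (available in recent releases) is used to read the description back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     _ = spark.catalog.createTable(
...         "tbl3", path=d, source='csv', schema=spark.range(1).schema,
...         description='a CSV-backed table', header='true')
>>> spark.catalog.getTable("tbl3").description
'a CSV-backed table'
>>> _ = spark.sql("DROP TABLE tbl3")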