pyspark.sql.DataFrame.transform

DataFrame.transform(func: Callable[[…], DataFrame], *args: Any, **kwargs: Any) → pyspark.sql.dataframe.DataFrame

Returns a new DataFrame produced by applying func to this one. Provides concise syntax for chaining custom transformations.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
func : function

a function that takes and returns a DataFrame.

*args

Positional arguments to pass to func.

New in version 3.3.0.

**kwargs

Keyword arguments to pass to func.

New in version 3.3.0.

Returns
DataFrame

Transformed DataFrame.

Examples

>>> from pyspark.sql.functions import col
>>> df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
>>> def cast_all_to_int(input_df):
...     return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])
...
>>> def sort_columns_asc(input_df):
...     return input_df.select(*sorted(input_df.columns))
...
>>> df.transform(cast_all_to_int).transform(sort_columns_asc).show()
+-----+---+
|float|int|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+
>>> def add_n(input_df, n):
...     return input_df.select([(col(col_name) + n).alias(col_name)
...                             for col_name in input_df.columns])
...
>>> df.transform(add_n, 1).transform(add_n, n=10).show()
+---+-----+
|int|float|
+---+-----+
| 12| 12.0|
| 13| 13.0|
+---+-----+
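The call semantics are simple: transform just invokes func on the DataFrame with the given positional and keyword arguments and returns the result, which is what makes left-to-right chaining read naturally. A minimal sketch of that pattern, using plain Python lists in place of DataFrames so no Spark session is needed (this standalone `transform` helper and the list-based `add_n` are illustrations, not part of PySpark):

```python
def transform(obj, func, *args, **kwargs):
    # Equivalent in spirit to df.transform(func, *args, **kwargs):
    # apply func to the object and return the result.
    return func(obj, *args, **kwargs)

def add_n(xs, n):
    # Stand-in for a DataFrame transformation: add n to every element.
    return [x + n for x in xs]

# Mirrors df.transform(add_n, 1).transform(add_n, n=10) from the example above.
result = transform(transform([1, 2], add_n, 1), add_n, n=10)
print(result)  # [12, 13]
```

Because each call returns a new object, transformations compose without intermediate variables, just as in the doctest examples above.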