pyspark.pandas.DataFrame.applymap

DataFrame.applymap(func: Callable[[Any], Any]) → pyspark.pandas.frame.DataFrame

Apply a function to a DataFrame elementwise.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.
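
The elementwise behaviour can be pictured with a plain-Python sketch. This is only an illustration on nested lists, not how pandas-on-Spark executes the call: `applymap_sketch` is a hypothetical helper, and the sample data is assumed to be stored as floats.

```python
from typing import Any, Callable, List


def applymap_sketch(rows: List[List[Any]], func: Callable[[Any], Any]) -> List[List[Any]]:
    """Hypothetical stand-in: call func on each element and keep the row/column shape."""
    return [[func(value) for value in row] for row in rows]


def str_len(x) -> int:
    # Scalar in, scalar out: the kind of function applymap expects.
    return len(str(x))


# Assumed sample data, written as floats throughout.
data = [[1.0, 2.12], [3.356, 4.567]]
result = applymap_sketch(data, str_len)
print(result)
```

The result has the same shape as the input, with each value replaced by the function's return value for that value alone.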

Note

This API executes the function once to infer the return type, which can be expensive, for instance when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in func, for instance as below:

>>> import numpy as np
>>> def square(x) -> np.int32:
...     return x ** 2

pandas-on-Spark uses the return type hint and does not try to infer the type.
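
The trade-off between a declared and an inferred return type can be sketched in plain Python. This is an illustration under stated assumptions, not the pandas-on-Spark implementation: `resolve_return_type` is a hypothetical helper that reads the annotation with the standard-library `typing.get_type_hints` and only calls the function on a sample value when no annotation is present.

```python
from typing import Any, Callable, get_type_hints


def resolve_return_type(func: Callable[[Any], Any], sample: Any) -> type:
    """Hypothetical helper: prefer the declared return type, otherwise infer it."""
    declared = get_type_hints(func).get("return")
    if declared is not None:
        # Annotation present: func does not need to be called at all.
        return declared
    # No annotation: run func once on a sample element and inspect the result.
    return type(func(sample))


def power(x) -> float:
    return x ** 2


print(resolve_return_type(power, 3))             # taken from the annotation
print(resolve_return_type(lambda x: x ** 2, 3))  # inferred from one sample call
```

In this sketch the annotated function is never executed during type resolution, which is the cost the note above describes avoiding.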

Parameters

func (callable): Python function that takes a single value and returns a single value.

Returns

DataFrame: Transformed DataFrame.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567

>>> def str_len(x) -> int:
...     return len(str(x))
>>> df.applymap(str_len)
   0  1
0  3  4
1  5  5

>>> def power(x) -> float:
...     return x ** 2
>>> df.applymap(power)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

You can omit the type hint and let pandas-on-Spark infer the return type, at the cost of the extra function execution described in the note above.

>>> df.applymap(lambda x: x ** 2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
