pyspark.pandas.melt¶
-
pyspark.pandas.
melt
(frame: pyspark.pandas.frame.DataFrame, id_vars: Union[Any, Tuple[Any, …], List[Union[Any, Tuple[Any, …]]], None] = None, value_vars: Union[Any, Tuple[Any, …], List[Union[Any, Tuple[Any, …]]], None] = None, var_name: Union[str, List[str], None] = None, value_name: str = 'value') → pyspark.pandas.frame.DataFrame[source]¶ Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
- Parameters
- frameDataFrame
- id_varstuple, list, or ndarray, optional
Column(s) to use as identifier variables.
- value_varstuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
- var_namescalar, default ‘variable’
Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
- value_namescalar, default ‘value’
Name to use for the ‘value’ column.
- Returns
- DataFrame
Unpivoted DataFrame.
Examples
>>> df = ps.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'}, ... 'B': {0: 1, 1: 3, 2: 5}, ... 'C': {0: 2, 1: 4, 2: 6}}, ... columns=['A', 'B', 'C']) >>> df A B C 0 a 1 2 1 b 3 4 2 c 5 6
>>> ps.melt(df) variable value 0 A a 1 B 1 2 C 2 3 A b 4 B 3 5 C 4 6 A c 7 B 5 8 C 6
>>> df.melt(id_vars='A') A variable value 0 a B 1 1 a C 2 2 b B 3 3 b C 4 4 c B 5 5 c C 6
>>> df.melt(value_vars='A') variable value 0 A a 1 A b 2 A c
>>> ps.melt(df, id_vars=['A', 'B']) A B variable value 0 a 1 C 2 1 b 3 C 4 2 c 5 C 6
>>> df.melt(id_vars=['A'], value_vars=['C']) A variable value 0 a C 2 1 b C 4 2 c C 6
The names of ‘variable’ and ‘value’ columns can be customized:
>>> ps.melt(df, id_vars=['A'], value_vars=['B'], ... var_name='myVarname', value_name='myValname') A myVarname myValname 0 a B 1 1 b B 3 2 c B 5