pyspark.sql.DataFrame.groupingSets#
- DataFrame.groupingSets(groupingSets, *cols)[source]#
Create multi-dimensional aggregation for the current class:DataFrame using the specified grouping sets, so we can run aggregation on them.
New in version 4.0.0.
- Parameters
- groupingSetssequence of sequence of columns or str
Individual set of columns to group on.
- cols
Column
or str Addional grouping columns specified by users. Those columns are shown as the output columns after aggregation.
- Returns
GroupedData
Grouping sets of the data based on the specified columns.
See also
Examples
Example 1: Group by city and car_model, city, and all, and calculate the sum of quantity.
>>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame([ ... (100, 'Fremont', 'Honda Civic', 10), ... (100, 'Fremont', 'Honda Accord', 15), ... (100, 'Fremont', 'Honda CRV', 7), ... (200, 'Dublin', 'Honda Civic', 20), ... (200, 'Dublin', 'Honda Accord', 10), ... (200, 'Dublin', 'Honda CRV', 3), ... (300, 'San Jose', 'Honda Civic', 5), ... (300, 'San Jose', 'Honda Accord', 8) ... ], schema="id INT, city STRING, car_model STRING, quantity INT")
>>> df.groupingSets( ... [("city", "car_model"), ("city",), ()], ... "city", "car_model" ... ).agg(sf.sum(sf.col("quantity")).alias("sum")).sort("city", "car_model").show() +--------+------------+---+ | city| car_model|sum| +--------+------------+---+ | NULL| NULL| 78| | Dublin| NULL| 33| | Dublin|Honda Accord| 10| | Dublin| Honda CRV| 3| | Dublin| Honda Civic| 20| | Fremont| NULL| 32| | Fremont|Honda Accord| 15| | Fremont| Honda CRV| 7| | Fremont| Honda Civic| 10| |San Jose| NULL| 13| |San Jose|Honda Accord| 8| |San Jose| Honda Civic| 5| +--------+------------+---+
Example 2: Group by multiple columns and calculate both average and sum.
>>> df.groupingSets( ... [("city", "car_model"), ("city",), ()], ... "city", "car_model" ... ).agg( ... sf.avg(sf.col("quantity")).alias("avg_quantity"), ... sf.sum(sf.col("quantity")).alias("sum_quantity") ... ).sort("city", "car_model").show() +--------+------------+------------------+------------+ | city| car_model| avg_quantity|sum_quantity| +--------+------------+------------------+------------+ | NULL| NULL| 9.75| 78| | Dublin| NULL| 11.0| 33| | Dublin|Honda Accord| 10.0| 10| | Dublin| Honda CRV| 3.0| 3| | Dublin| Honda Civic| 20.0| 20| | Fremont| NULL|10.666666666666666| 32| | Fremont|Honda Accord| 15.0| 15| | Fremont| Honda CRV| 7.0| 7| | Fremont| Honda Civic| 10.0| 10| |San Jose| NULL| 6.5| 13| |San Jose|Honda Accord| 8.0| 8| |San Jose| Honda Civic| 5.0| 5| +--------+------------+------------------+------------+