pyspark.pandas.DataFrame.spark.repartition
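spark.repartition(num_partitions: int) → ps.DataFrame

Returns a new DataFrame that has exactly `num_partitions` partitions. Rows are redistributed with a full shuffle, and the index is preserved. Because only a partition count is given (no partitioning columns), Spark assigns rows to partitions round-robin rather than hash partitioning them by column values.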
- Parameters
- num_partitions : int
  The target number of partitions.
- Returns
- DataFrame
  A new DataFrame with the same rows and index, spread across `num_partitions` partitions.
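Repartitioning changes only how rows are spread across partitions, not which rows exist. The plain-Python toy model below contrasts two ways of assigning rows to `num_partitions` buckets: dealing them out in turn (round-robin) and routing them by a hash of a key. The helpers `round_robin_partition` and `hash_partition` are hypothetical and simplified; this is an illustration under those assumptions, not Spark's implementation. In Spark, a count-only repartition is generally round-robin, while hash partitioning applies when partitioning columns are supplied.

```python
from typing import List, Sequence, Tuple

# Hypothetical row shape for this sketch: (index value, payload).
Row = Tuple[int, str]


def round_robin_partition(rows: Sequence[Row], num_partitions: int) -> List[List[Row]]:
    """Deal rows out to partitions in turn, ignoring their contents (simplified)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def hash_partition(rows: Sequence[Row], num_partitions: int) -> List[List[Row]]:
    """Route each row by a hash of its key, so equal keys land in the same partition."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # Python's built-in hash() stands in for Spark's own hash function here.
        parts[hash(row[0]) % num_partitions].append(row)
    return parts


rows = [(5, "Bob"), (5, "Bob"), (2, "Alice"), (2, "Alice")]
rr = round_robin_partition(rows, 7)
hp = hash_partition(rows, 7)
print([len(p) for p in rr])  # [1, 1, 1, 1, 0, 0, 0]
print([len(p) for p in hp])  # [0, 0, 2, 0, 0, 2, 0]
```

With 4 rows and 7 partitions, several partitions are necessarily empty under either scheme, yet the total row count and the rows themselves are unchanged. Round-robin tends to balance partition sizes; hash partitioning co-locates equal keys at the cost of possible skew.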
Examples
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({"age": [5, 5, 2, 2],
...                      "name": ["Bob", "Bob", "Alice", "Alice"]}).set_index("age")
>>> psdf.sort_index()
      name
age
2    Alice
2    Alice
5      Bob
5      Bob
>>> new_psdf = psdf.spark.repartition(7)
>>> new_psdf.to_spark().rdd.getNumPartitions()
7
>>> new_psdf.sort_index()
      name
age
2    Alice
2    Alice
5      Bob
5      Bob
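The example sorts by index before displaying. A likely reason is that a repartition shuffles rows, so the order in which they come back is not guaranteed to match the input, whereas sorting gives a deterministic view. The plain-Python toy below illustrates that reasoning with hypothetical helpers `collect` and `sort_index` and a made-up partition layout; it does not model where Spark actually places rows.

```python
from typing import List, Tuple

# Hypothetical row shape for this sketch: (index value, payload).
Row = Tuple[int, str]


def collect(parts: List[List[Row]]) -> List[Row]:
    """Concatenate partitions in partition order, as a simple collect would."""
    return [row for part in parts for row in part]


def sort_index(rows: List[Row]) -> List[Row]:
    """Stable sort on the index value only."""
    return sorted(rows, key=lambda r: r[0])


original = [(5, "Bob"), (5, "Bob"), (2, "Alice"), (2, "Alice")]
# Made-up placement after a shuffle into 7 partitions; real placement may differ.
shuffled = [[], [(2, "Alice")], [(5, "Bob")], [], [(2, "Alice"), (5, "Bob")], [], []]

print(collect(shuffled) == original)  # False: arrival order changed
print(sort_index(collect(shuffled)) == sort_index(original))  # True: same rows, same index
```

Comparing sorted views checks that repartitioning kept every row and its index value, independent of how rows happened to land in partitions.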