pyspark.pandas.Series.rank
Series.rank(method: str = 'average', ascending: bool = True, numeric_only: Optional[bool] = None) → pyspark.pandas.series.Series
Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
Note
The current implementation of rank uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
average: average rank of group
min: lowest rank in group
max: highest rank in group
first: ranks assigned in order they appear in the array
dense: like ‘min’, but rank always increases by 1 between groups
- ascending : bool, default True
If False, rank from high (1) to low (N)
- numeric_only : bool, optional
If set to True, rank numeric Series, or return an empty Series for non-numeric Series
- Returns
- ranks : same type as caller
Examples
>>> import pyspark.pandas as ps
>>> s = ps.Series([1, 2, 2, 3], name='A')
>>> s
0    1
1    2
2    2
3    3
Name: A, dtype: int64

>>> s.rank()
0    1.0
1    2.5
2    2.5
3    4.0
Name: A, dtype: float64
If method is set to ‘min’, each tied value gets the lowest rank in its group.

>>> s.rank(method='min')
0    1.0
1    2.0
2    2.0
3    4.0
Name: A, dtype: float64
If method is set to ‘max’, each tied value gets the highest rank in its group.

>>> s.rank(method='max')
0    1.0
1    3.0
2    3.0
3    4.0
Name: A, dtype: float64
If method is set to ‘first’, tied values are ranked in the order they appear, so no two values share a rank.

>>> s.rank(method='first')
0    1.0
1    2.0
2    3.0
3    4.0
Name: A, dtype: float64
If method is set to ‘dense’, tied values share a rank and the rank increases by exactly 1 between groups, leaving no gaps.

>>> s.rank(method='dense')
0    1.0
1    2.0
2    2.0
3    3.0
Name: A, dtype: float64
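
The tie-breaking rules and the ascending flag can also be traced by hand. The following is a minimal pure-Python sketch of those semantics, not the Spark implementation: the name rank_sketch is hypothetical, it works on a plain list rather than a Series, and it assumes the input contains no missing values.

```python
from typing import List, Sequence


def rank_sketch(values: Sequence[float], method: str = "average",
                ascending: bool = True) -> List[float]:
    """Illustrative ranking of a plain sequence (hypothetical helper)."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")

    # Stable sort of positions by value; tied values keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)

    ranks = [0.0] * len(values)
    pos = 0    # 0-based position in the sorted order
    dense = 0  # number of distinct groups seen so far
    while pos < len(order):
        # Extend `end` to cover the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        lo, hi = pos + 1, end + 1  # 1-based ordinal ranks spanned by the group
        for offset, idx in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[idx] = (lo + hi) / 2
            elif method == "min":
                ranks[idx] = float(lo)
            elif method == "max":
                ranks[idx] = float(hi)
            elif method == "first":
                ranks[idx] = float(lo + offset)
            else:  # "dense"
                ranks[idx] = float(dense)
        pos = end + 1
    return ranks
```

For the values [1, 2, 2, 3] used above, rank_sketch returns the same numbers as the corresponding s.rank(...) calls for each method, and rank_sketch([1, 2, 2, 3], ascending=False) returns [4.0, 2.5, 2.5, 1.0], so the largest value receives rank 1.
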
If numeric_only is set to True, only a numeric Series is ranked; a non-numeric Series returns an empty Series.

>>> s = ps.Series(['a', 'b', 'c'], name='A', index=['x', 'y', 'z'])
>>> s
x    a
y    b
z    c
Name: A, dtype: object

>>> s.rank(numeric_only=True)
Series([], Name: A, dtype: float64)