pyspark.pandas.Series.replace
Series.replace(to_replace: Union[Any, List, Tuple, Dict, None] = None, value: Union[List, Tuple, None] = None, regex: Union[str, bool] = False) → pyspark.pandas.series.Series
Replace values given in to_replace with value. Values of the Series are replaced with other values dynamically.
Note
For partial pattern matching, the whole string is replaced rather than only the matched portion, which differs from pandas. This follows from the behavior of the underlying Spark API.
- Parameters
- to_replace: str, list, tuple, dict, Series, int, float, or None
How to find the values that will be replaced.
* numeric, str:
    numeric: numeric values equal to to_replace will be replaced with value
    str: strings exactly matching to_replace will be replaced with value
* list of str or numeric:
    If to_replace and value are both lists or tuples, they must be the same length.
    str and numeric rules apply as above.
* dict:
    Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way the value parameter should be None.
    For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b' and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
See the examples section for examples of each of these.
- value: scalar, dict, list, tuple, or str, default None
Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- regex: bool or str, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True, to_replace must be a string. Alternatively, regex can itself be a regular expression, in which case to_replace must be None.
- Returns
- Series
Object after replacement.
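The pairing rules for to_replace and value described under Parameters can be modeled in plain Python. This is only a sketch of the documented semantics, not the library's implementation: build_mapping and apply_mapping are hypothetical helpers, ordinary lists stand in for a Series, and the choice to raise ValueError on invalid combinations is an assumption of the sketch.

```python
from typing import Any, Dict, List, Optional


def build_mapping(to_replace: Any, value: Optional[Any] = None) -> Dict[Any, Any]:
    """Hypothetical helper: turn (to_replace, value) into an old -> new mapping."""
    if isinstance(to_replace, dict):
        # Dict form: the dict already carries the replacements, so value stays None.
        # Raising here is an assumption of this sketch.
        if value is not None:
            raise ValueError("value should be None when to_replace is a dict")
        return dict(to_replace)
    if isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            # Two sequences are paired element by element and must match in length.
            if len(to_replace) != len(value):
                raise ValueError("to_replace and value must be the same length")
            return dict(zip(to_replace, value))
        # A sequence with a scalar value maps every listed item to that value.
        return {old: value for old in to_replace}
    # Scalar form: a single exact-match replacement.
    return {to_replace: value}


def apply_mapping(values: List[Any], mapping: Dict[Any, Any]) -> List[Any]:
    """Replace each element that equals a key; leave the rest untouched."""
    return [mapping.get(v, v) for v in values]


data = [0, 1, 2, 3, 4]
print(apply_mapping(data, build_mapping(0, 5)))
print(apply_mapping(data, build_mapping([0, 4], 5000)))
print(apply_mapping(data, build_mapping([1, 2, 3], [10, 20, 30])))
print(apply_mapping(data, build_mapping({1: 1000, 2: 2000})))
```

Reducing every accepted form to a single old-to-new mapping makes it easy to see why two sequences must have equal length: each entry of to_replace needs exactly one partner in value.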
Examples
Scalar to_replace and value
>>> import pyspark.pandas as ps
>>> s = ps.Series([0, 1, 2, 3, 4])
>>> s
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64
List-like to_replace
>>> s.replace([0, 4], 5000)
0    5000
1       1
2       2
3       3
4    5000
dtype: int64
>>> s.replace([1, 2, 3], [10, 20, 30])
0     0
1    10
2    20
3    30
4     4
dtype: int64
Dict-like to_replace
>>> s.replace({1: 1000, 2: 2000, 3: 3000, 4: 4000})
0       0
1    1000
2    2000
3    3000
4    4000
dtype: int64
MultiIndex is also supported
>>> import pandas as pd
>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = ps.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
...               index=midx)
>>> s
lama    speed      45.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.replace(45, 450)
lama    speed     450.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.replace([45, 30, 320], 500)
lama    speed     500.0
        weight    200.0
        length      1.2
cow     speed     500.0
        weight    250.0
        length      1.5
falcon  speed     500.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.replace({45: 450, 30: 300})
lama    speed     450.0
        weight    200.0
        length      1.2
cow     speed     300.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
Regular expression to_replace
>>> psser = ps.Series(['bat', 'foo', 'bait', 'abc', 'bar', 'zoo'])
>>> psser.replace(to_replace=r'^ba.$', value='new', regex=True)
0     new
1     foo
2    bait
3     abc
4     new
5     zoo
dtype: object
>>> psser.replace(value='new', regex=r'^.oo$')
0     bat
1     new
2    bait
3     abc
4     bar
5     new
dtype: object
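The two calls above supply the pattern in different ways: once through to_replace with regex=True, and once through regex itself with to_replace left as None. A minimal sketch of that rule follows. resolve_pattern is a hypothetical helper, the exception types are assumptions, and Python's re module stands in for Spark's regex engine.

```python
import re
from typing import Optional, Union


def resolve_pattern(to_replace: Optional[str], regex: Union[str, bool]) -> Optional[str]:
    """Hypothetical helper: return the regex pattern to use, or None for exact matching."""
    if isinstance(regex, str):
        # Pattern passed through `regex`: to_replace has to be left as None.
        if to_replace is not None:
            raise ValueError("to_replace must be None when regex is a pattern string")
        return regex
    if regex is True:
        # regex=True: to_replace itself is the pattern and must be a string.
        if not isinstance(to_replace, str):
            raise TypeError("to_replace must be a string when regex=True")
        return to_replace
    # regex=False: no pattern, values are compared for equality instead.
    return None


words = ['bat', 'foo', 'bait', 'abc', 'bar', 'zoo']

for to_replace, regex in [(r'^ba.$', True), (None, r'^.oo$')]:
    pattern = resolve_pattern(to_replace, regex)
    # Collect the strings the resolved pattern would select for replacement.
    matched = [w for w in words if re.search(pattern, w)]
    print(pattern, matched)
```

Both argument forms resolve to a single pattern string, so which one to use is largely a matter of call-site readability.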
For partial pattern matching, the replacement is against the whole string
>>> psser.replace('ba', 'xx', regex=True)
0     xx
1    foo
2     xx
3    abc
4     xx
5    zoo
dtype: object
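To make the difference from pandas concrete, the following plain-Python sketch contrasts the two behaviors on the same data. replace_whole and replace_substring are hypothetical helpers written for this illustration, with Python's re module standing in for the actual regex engines. The first models the whole-string replacement shown above, the second the substring substitution that pandas performs with regex=True.

```python
import re

words = ['bat', 'foo', 'bait', 'abc', 'bar', 'zoo']


def replace_whole(values, pattern, new):
    """Swap the entire string for `new` whenever the pattern matches anywhere in it."""
    compiled = re.compile(pattern)
    return [new if compiled.search(v) else v for v in values]


def replace_substring(values, pattern, new):
    """Substitute only the matched portion and keep the rest of the string."""
    compiled = re.compile(pattern)
    return [compiled.sub(new, v) for v in values]


whole = replace_whole(words, 'ba', 'xx')
partial = replace_substring(words, 'ba', 'xx')
print(whole)    # ['xx', 'foo', 'xx', 'abc', 'xx', 'zoo']
print(partial)  # ['xxt', 'foo', 'xxit', 'abc', 'xxr', 'zoo']
```

Code ported from pandas that relies on substring substitution may therefore need a different approach, since a partial match here discards the remainder of the string.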