pyspark.sql.DataFrame.__getitem__
DataFrame.__getitem__(item: Union[int, str, pyspark.sql.column.Column, List, Tuple]) → Union[pyspark.sql.column.Column, pyspark.sql.dataframe.DataFrame]
Returns the column as a Column, or a filtered or projected DataFrame when item is a column expression or a list or tuple of columns.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - item : int, str, Column, list or tuple
    column index, column name, column, or a list or tuple of columns
- Returns
  - Column or DataFrame
    a specified column, or a filtered or projected DataFrame.
Examples
>>> df = spark.createDataFrame([
...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])
Retrieve a column instance.
>>> df.select(df['age']).show()
+---+
|age|
+---+
|  2|
|  5|
+---+
>>> df.select(df[1]).show()
+-----+
| name|
+-----+
|Alice|
|  Bob|
+-----+
Select multiple columns by passing a list of column names.
>>> df[["name", "age"]].show()
+-----+---+
| name|age|
+-----+---+
|Alice|  2|
|  Bob|  5|
+-----+---+
Filter rows by passing a boolean column expression.
>>> df[df.age > 3].show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
>>> df[df[0] > 3].show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
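The examples above show that the type of item decides whether a column or a DataFrame comes back. The following is a minimal pure-Python sketch of that kind of type-based dispatch. It is not PySpark's implementation: ToyColumn, ToyPredicate, and ToyFrame are hypothetical stand-ins invented for illustration, and the predicate only models a simple "greater than" comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyColumn:
    """Hypothetical stand-in for a Column: it only carries a name."""
    name: str


@dataclass(frozen=True)
class ToyPredicate:
    """Hypothetical stand-in for a boolean Column such as df.age > 3."""
    column: str
    threshold: int


class ToyFrame:
    """Hypothetical stand-in for a DataFrame, holding rows as dicts."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns

    def __getitem__(self, item):
        # A column name returns a single column.
        if isinstance(item, str):
            return ToyColumn(item)
        # A column index is resolved to a name through self.columns.
        if isinstance(item, int):
            return ToyColumn(self.columns[item])
        # A boolean expression filters rows and returns a new frame.
        if isinstance(item, ToyPredicate):
            kept = [r for r in self.rows if r[item.column] > item.threshold]
            return ToyFrame(kept, self.columns)
        # A list or tuple projects the named columns into a new frame.
        if isinstance(item, (list, tuple)):
            names = [c.name if isinstance(c, ToyColumn) else c for c in item]
            projected = [{n: r[n] for n in names} for r in self.rows]
            return ToyFrame(projected, names)
        raise TypeError(f"unexpected item type: {type(item).__name__}")


df = ToyFrame(
    [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}],
    ["age", "name"],
)
by_name = df["age"]                    # a ToyColumn
by_index = df[1]                       # a ToyColumn for "name"
projected = df[["name", "age"]]        # a ToyFrame with reordered columns
filtered = df[ToyPredicate("age", 3)]  # a ToyFrame with only Bob's row
```

In this sketch, a single name or index yields a column object, while a list, tuple, or predicate yields a new frame, mirroring the Column or DataFrame return type documented above.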