CoordinateMatrix
class pyspark.mllib.linalg.distributed.CoordinateMatrix(entries: pyspark.rdd.RDD[Union[Tuple[int, int, float], pyspark.mllib.linalg.distributed.MatrixEntry]], numRows: int = 0, numCols: int = 0)
Represents a matrix in coordinate format: each stored entry is a (row index, column index, value) triple.
- Parameters
- entries : pyspark.RDD
An RDD of MatrixEntry objects or (int, int, float) tuples.
- numRows : int, optional
Number of rows in the matrix. A non-positive value means unknown, in which case the number of rows is determined by the maximum row index plus one.
- numCols : int, optional
Number of columns in the matrix. A non-positive value means unknown, in which case the number of columns is determined by the maximum column index plus one.
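The constructor accepts entries in two shapes. The following plain-Python sketch shows one way such mixed input could be normalized into a single record type. It does not use Spark: `Entry` is a hypothetical local stand-in for MatrixEntry (fields `i`, `j`, `value`), and `normalize` is a hypothetical helper, not PySpark's own conversion code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for MatrixEntry: row index i, column index j, value."""
    i: int
    j: int
    value: float


def normalize(entries: Iterable[Union[Entry, Tuple[int, int, float]]]) -> List[Entry]:
    """Turn a mix of Entry objects and (i, j, value) tuples into Entry objects."""
    out: List[Entry] = []
    for e in entries:
        if isinstance(e, Entry):
            # Already in record form: keep as-is.
            out.append(e)
        elif isinstance(e, tuple) and len(e) == 3:
            # Coerce the tuple fields to the expected types.
            i, j, v = e
            out.append(Entry(int(i), int(j), float(v)))
        else:
            raise TypeError(f"cannot convert {type(e).__name__} to an entry")
    return out


mixed = [Entry(0, 0, 1.2), (1, 0, 2), (2, 1, 3.7)]
print(normalize(mixed))
```

Normalizing up front means the later steps only ever see one record shape.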
Methods
- numCols(): Get or compute the number of columns.
- numRows(): Get or compute the number of rows.
- toBlockMatrix([rowsPerBlock, colsPerBlock]): Convert this matrix to a BlockMatrix.
- toIndexedRowMatrix(): Convert this matrix to an IndexedRowMatrix.
- toRowMatrix(): Convert this matrix to a RowMatrix.
- transpose(): Transpose this CoordinateMatrix.
Attributes
- entries: Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Methods Documentation
- numCols() → int
Get or compute the number of columns.
Examples
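>>> from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry
>>> # `sc` is an active SparkContext (e.g. the one provided by the pyspark shell).
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6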
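The sizing rule from the parameter descriptions can be sketched in plain Python as follows, assuming entries are given as (i, j, value) tuples. `infer_shape` is a hypothetical helper written for illustration, not part of PySpark.

```python
from typing import Iterable, Tuple

Triple = Tuple[int, int, float]


def infer_shape(entries: Iterable[Triple], num_rows: int = 0, num_cols: int = 0) -> Tuple[int, int]:
    """Apply the documented rule: a positive size is used as given;
    a non-positive size is replaced by (max index + 1)."""
    entries = list(entries)
    if num_rows <= 0:
        # Unknown row count: largest row index plus one (0 for no entries).
        num_rows = max((i for i, _, _ in entries), default=-1) + 1
    if num_cols <= 0:
        # Unknown column count: largest column index plus one.
        num_cols = max((j for _, j, _ in entries), default=-1) + 1
    return num_rows, num_cols


data = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
print(infer_shape(data))        # (3, 2): inferred from max indices 2 and 1
print(infer_shape(data, 7, 6))  # (7, 6): explicit sizes are kept
```

The sketch does not check that an explicit size is at least the maximum index plus one; how CoordinateMatrix handles such a mismatch is not described on this page.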
- numRows() → int
Get or compute the number of rows.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7
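When the size has to be inferred from distributed data, both maxima can be found in a single pass: each partition reports its largest row and column index, and the partial results are merged with an associative max, so the merge order does not matter. The sketch below simulates this with a hypothetical list of partitions and `functools.reduce`. It illustrates the idea and is not Spark's implementation.

```python
from functools import reduce
from typing import List, Tuple

Triple = Tuple[int, int, float]

# Hypothetical partitioning of the example entries, mimicking an RDD split.
partitions: List[List[Triple]] = [
    [(0, 0, 1.2)],
    [(1, 0, 2.0), (2, 1, 3.7)],
]


def max_indices(part: List[Triple]) -> Tuple[int, int]:
    """Per-partition step: largest row index and largest column index (-1 if empty)."""
    return (max((i for i, _, _ in part), default=-1),
            max((j for _, j, _ in part), default=-1))


def combine(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Associative, commutative merge of two partial results."""
    return (max(a[0], b[0]), max(a[1], b[1]))


max_i, max_j = reduce(combine, map(max_indices, partitions), (-1, -1))
print(max_i + 1, max_j + 1)  # 3 2 -> row and column counts for the example above
```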
- toBlockMatrix(rowsPerBlock: int = 1024, colsPerBlock: int = 1024) → pyspark.mllib.linalg.distributed.BlockMatrix
Convert this matrix to a BlockMatrix.
- Parameters
- rowsPerBlock : int, optional
Number of rows in each block (default 1024). Blocks covering the last rows of the matrix may have fewer rows than this.
- colsPerBlock : int, optional
Number of columns in each block (default 1024). Blocks covering the last columns of the matrix may have fewer columns than this.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
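Converting to a block layout amounts to assigning each entry to a block by integer division and keeping its offset inside that block. The plain-Python sketch below follows that assumption with hypothetical helpers `to_blocks` and `block_shape`. It uses deliberately small blocks (3 x 2) so the example's 7 x 5 matrix spans several of them, and it is not Spark's BlockMatrix code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]


def to_blocks(entries: List[Triple], rows_per_block: int = 1024,
              cols_per_block: int = 1024) -> Dict[Tuple[int, int], List[Triple]]:
    """Group entries by block key (i // rows_per_block, j // cols_per_block);
    each entry keeps its (row, column) offset inside the block."""
    blocks: Dict[Tuple[int, int], List[Triple]] = defaultdict(list)
    for i, j, v in entries:
        key = (i // rows_per_block, j // cols_per_block)
        blocks[key].append((i % rows_per_block, j % cols_per_block, v))
    return dict(blocks)


def block_shape(block_row: int, block_col: int, n_rows: int, n_cols: int,
                rows_per_block: int, cols_per_block: int) -> Tuple[int, int]:
    """Size of one block; blocks in the last block-row or block-column may be smaller."""
    r = min(rows_per_block, n_rows - block_row * rows_per_block)
    c = min(cols_per_block, n_cols - block_col * cols_per_block)
    return r, c


entries = [(0, 0, 1.2), (6, 4, 2.1)]  # the 7 x 5 example above
print(to_blocks(entries, 3, 2))       # {(0, 0): [(0, 0, 1.2)], (2, 2): [(0, 0, 2.1)]}
print(block_shape(2, 2, 7, 5, 3, 2))  # (1, 1): the last block is only 1 x 1
```

With the default block sizes of 1024, both example entries would fall into the single block (0, 0).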
- toIndexedRowMatrix() → pyspark.mllib.linalg.distributed.IndexedRowMatrix
Convert this matrix to an IndexedRowMatrix.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
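An IndexedRowMatrix stores one sparse row per row index, together with that index. The plain-Python sketch below shows the grouping step, assuming (i, j, value) tuples and representing each sparse row as `(size, {column: value})`. `to_indexed_rows` is a hypothetical helper, not PySpark's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]


def to_indexed_rows(entries: List[Triple], n_cols: int) -> Dict[int, Tuple[int, Dict[int, float]]]:
    """Group entries by row index. Each row becomes a sparse vector represented
    as (size, {column: value}); rows without entries are simply absent.
    In this sketch a later duplicate (i, j) overwrites an earlier one."""
    rows: Dict[int, Dict[int, float]] = defaultdict(dict)
    for i, j, v in entries:
        rows[i][j] = v
    return {i: (n_cols, cols) for i, cols in sorted(rows.items())}


entries = [(0, 0, 1.2), (6, 4, 2.1)]
indexed = to_indexed_rows(entries, n_cols=5)
print(indexed)           # {0: (5, {0: 1.2}), 6: (5, {4: 2.1})}
# The row indices survive, so the row count can still be recovered as max index + 1.
print(max(indexed) + 1)  # 7
```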
- toRowMatrix() → pyspark.mllib.linalg.distributed.RowMatrix
Convert this matrix to a RowMatrix.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5
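The row count drops to 2 because a RowMatrix keeps no row indices, so rows that have no entries cannot be represented. The plain-Python sketch below reproduces that effect with a hypothetical `to_rows` helper and dicts standing in for sparse vectors. It is not PySpark's implementation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]


def to_rows(entries: List[Triple]) -> List[Dict[int, float]]:
    """Keep one sparse row {column: value} per distinct row index, then drop the
    indices: empty rows disappear and original row positions are lost."""
    grouped: Dict[int, Dict[int, float]] = {}
    for i, j, v in entries:
        grouped.setdefault(i, {})[j] = v
    return list(grouped.values())


entries = [(0, 0, 1.2), (6, 4, 2.1)]
rows = to_rows(entries)
print(len(rows))                          # 2 rows, not 7
print(max(j for _, j, _ in entries) + 1)  # 5 columns, unchanged
```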
- transpose() → pyspark.mllib.linalg.distributed.CoordinateMatrix
Transpose this CoordinateMatrix.
New in version 2.0.0.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()
>>> print(mat_transposed.numRows())
2
>>> print(mat_transposed.numCols())
3
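In coordinate format, transposition needs no data movement beyond swapping each entry's row and column index and swapping the two dimensions. A minimal plain-Python sketch with a hypothetical `transpose` helper over (i, j, value) tuples:

```python
from typing import List, Tuple

Triple = Tuple[int, int, float]


def transpose(entries: List[Triple], n_rows: int, n_cols: int) -> Tuple[List[Triple], int, int]:
    """Swap each entry's row and column index, and swap the dimensions."""
    return [(j, i, v) for i, j, v in entries], n_cols, n_rows


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
t_entries, t_rows, t_cols = transpose(entries, 3, 2)
print(t_entries)       # [(0, 0, 1.2), (0, 1, 2.0), (1, 2, 3.7)]
print(t_rows, t_cols)  # 2 3
```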
Attributes Documentation
- entries
Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Examples
>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)
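For a small matrix, the entries can be collected to the driver and laid out densely. The sketch below assumes the collected MatrixEntry objects have already been turned into (i, j, value) tuples (for example from their `i`, `j`, and `value` attributes) and uses a hypothetical `to_dense` helper. Calling collect() on a large matrix would pull every entry onto one machine, so this only suits small inputs.

```python
from typing import List, Tuple

Triple = Tuple[int, int, float]


def to_dense(entries: List[Triple], n_rows: int, n_cols: int) -> List[List[float]]:
    """Build a dense row-major list of lists; missing coordinates are 0.0.
    Assumes no duplicate (i, j) pairs: here a later duplicate would overwrite an earlier one."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in entries:
        dense[i][j] = v
    return dense


# Hypothetical stand-in for the collected entries of the example above, as plain tuples.
collected = [(0, 0, 1.2), (6, 4, 2.1)]
dense = to_dense(collected, 7, 5)
print(dense[0])  # [1.2, 0.0, 0.0, 0.0, 0.0]
print(dense[6])  # [0.0, 0.0, 0.0, 0.0, 2.1]
```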