IDFModel
class pyspark.mllib.feature.IDFModel(java_model: py4j.java_gateway.JavaObject)
Represents an IDF model that can transform term frequency vectors.
New in version 1.2.0.
Methods
call(name, *a)    Call method of java_model.
docFreq()         Returns the document frequency.
idf()             Returns the current IDF vector.
numDocs()         Returns the number of documents evaluated to compute the IDF.
transform(x)      Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation
call(name: str, *a: Any) → Any
Call method of java_model.
idf() → pyspark.mllib.linalg.Vector
Returns the current IDF vector.
New in version 1.4.0.
transform(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, terms that occur in fewer than minDocFreq documents will have an entry of 0.
New in version 1.2.0.
Parameters
    x: pyspark.mllib.linalg.Vector or pyspark.RDD
        an RDD of term frequency vectors or a term frequency vector
Returns
    pyspark.mllib.linalg.Vector or pyspark.RDD
        an RDD of TF-IDF vectors or a TF-IDF vector
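The effect of minDocFreq on the result can be illustrated without Spark. The following is a minimal pure-Python sketch, not the library's implementation. It assumes the smoothed formula given in the MLlib documentation for IDF, idf = log((m + 1) / (d(t) + 1)), where m is the number of documents and d(t) is the document frequency of term t. The helper names idf_weights and tf_idf and the sample counts are hypothetical.

```python
import math


def idf_weights(doc_freq, num_docs, min_doc_freq=0):
    """Per-term IDF weights (hypothetical helper, assumes MLlib's smoothed formula).

    Terms whose document frequency is below min_doc_freq get a weight of 0.
    """
    return [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


def tf_idf(tf, idf):
    """Scale each term frequency by the matching IDF weight."""
    return [t * w for t, w in zip(tf, idf)]


# Made-up document frequencies for a 4-term vocabulary over 3 documents.
idf = idf_weights([0, 3, 1, 2], num_docs=3, min_doc_freq=2)

# Terms 0 and 2 appear in fewer than 2 documents, so their entries are 0.
row = tf_idf([0.0, 1.0, 2.0, 3.0], idf)
```

Note that term 1 also ends up with a weight of 0 here, but for a different reason: it occurs in every document, so the logarithm itself evaluates to 0.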
Notes
In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.