HashingTF
- class pyspark.mllib.feature.HashingTF(numFeatures=1048576)
- Maps a sequence of terms to their term frequencies using the hashing trick.
- New in version 1.2.0.
- Parameters
  - numFeatures : int, optional
    Number of features, i.e. the length of the output vector (default: 2^20 = 1048576).
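The hashing trick behind `numFeatures` can be sketched in plain Python. This is a minimal illustration, not the MLlib source: the helper name `hashing_tf` is hypothetical, it assumes each term is bucketed with Python's built-in `hash()` modulo `numFeatures`, and it returns a plain dict `{index: count}` instead of an MLlib `SparseVector`. It shows why the output length is fixed by `numFeatures` regardless of vocabulary size, and how distinct terms that collide in one bucket have their counts summed.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def hashing_tf(terms: Iterable[Hashable], num_features: int = 1 << 20) -> Dict[int, float]:
    """Hypothetical helper: count terms into num_features hash buckets.

    Assumption: each term goes to bucket hash(term) % num_features.
    Python's % with a positive modulus is never negative, so every
    index lies in [0, num_features) even when hash() is negative.
    """
    counts = Counter(hash(term) % num_features for term in terms)
    # Store frequencies as floats, as a term frequency vector would.
    return {index: float(count) for index, count in counts.items()}


doc = "a a b b c d".split(" ")
tf = hashing_tf(doc, num_features=100)
print(len(tf) <= 4, sum(tf.values()))  # True 6.0 (at most 4 distinct buckets, 6 terms in total)

# With a single bucket every term collides, so all counts are merged.
print(hashing_tf(doc, num_features=1))  # {0: 6.0}
```

Python randomizes string hashes per process unless `PYTHONHASHSEED` is fixed, so the concrete indices in a sketch like this can differ between runs even though the total count does not.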
 
- Notes
  - The terms must be hashable (they cannot be a dict, set, list, …).
- Examples
  >>> htf = HashingTF(100)
  >>> doc = "a a b b c d".split(" ")
  >>> htf.transform(doc)
  SparseVector(100, {...})
- Methods
  - indexOf(term): Returns the index of the input term.
  - setBinary(value): If True, the term frequency vector is binary, i.e. every non-zero term count is set to 1 (default: False).
  - transform(document): Transforms the input document (a list of terms) into a term frequency vector, or transforms an RDD of documents into an RDD of term frequency vectors.
- Methods Documentation
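The behavior of the three methods listed above can be mimicked locally to see how they fit together. This is a sketch under stated assumptions, not PySpark's implementation: the class `SketchHashingTF` is hypothetical, it assumes `indexOf` is `hash(term) % numFeatures`, it returns a plain dict instead of a `SparseVector`, and it omits the RDD branch of `transform`. It also demonstrates the Notes: an unhashable term such as a list raises `TypeError`.

```python
from typing import Dict, Hashable, Iterable


class SketchHashingTF:
    """Hypothetical pure-Python stand-in for indexOf/setBinary/transform.

    Returns {index: value} dicts and only handles local lists of terms.
    """

    def __init__(self, numFeatures: int = 1 << 20) -> None:
        self.numFeatures = numFeatures
        self.binary = False

    def setBinary(self, value: bool) -> "SketchHashingTF":
        # Binary mode records presence (1.0) instead of raw counts.
        self.binary = value
        return self  # returning self lets calls be chained

    def indexOf(self, term: Hashable) -> int:
        # Assumption: bucket = built-in hash modulo the vector length.
        return hash(term) % self.numFeatures

    def transform(self, document: Iterable[Hashable]) -> Dict[int, float]:
        freq: Dict[int, float] = {}
        for term in document:
            i = self.indexOf(term)
            # Overwrite with 1.0 in binary mode, otherwise accumulate.
            freq[i] = 1.0 if self.binary else freq.get(i, 0.0) + 1.0
        return freq


htf = SketchHashingTF(100)
doc = "a a b b c d".split(" ")
counts = htf.transform(doc)
# "a" occurs twice; its bucket holds more only if another term collides.
print(counts[htf.indexOf("a")] >= 2.0)  # True

binary = SketchHashingTF(100).setBinary(True).transform(doc)
print(set(binary.values()))  # {1.0}

try:
    htf.indexOf(["a", "b"])  # lists are unhashable, as the Notes warn
except TypeError as exc:
    print("TypeError:", exc)
```

Keeping `setBinary` as a separate switch rather than a second transform method means the same instance decides, once, whether downstream vectors hold counts or presence flags.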