pyspark.pandas.Series.nunique
- Series.nunique(dropna=True, approx=False, rsd=0.05)
- Return the number of unique elements in the object. Excludes NA values by default.
- Parameters
- dropna: bool, default True
- Don’t include NaN in the count. 
- approx: bool, default False
- If False, use the exact algorithm and return the exact number of unique values. If True, use the HyperLogLog approximate algorithm, which is significantly faster on large amounts of data. Note: This parameter is specific to pandas-on-Spark and is not found in pandas. 
- rsd: float, default 0.05
- Maximum estimation error allowed in the HyperLogLog algorithm. Note: Like approx, this parameter is specific to pandas-on-Spark and is not found in pandas.
 
- Returns
- int
 
- See also
- DataFrame.nunique
- Method nunique for DataFrame. 
- Series.count
- Count non-NA/null observations in the Series. 
- Examples
>>> ps.Series([1, 2, 3, np.nan]).nunique()
3

>>> ps.Series([1, 2, 3, np.nan]).nunique(dropna=False)
4

On big data, we recommend using the approximate algorithm to speed up this function. The result will be very close to the exact unique count.

>>> ps.Series([1, 2, 3, np.nan]).nunique(approx=True)
3

>>> idx = ps.Index([1, 1, 2, None])
>>> idx
Index([1.0, 1.0, 2.0, nan], dtype='float64')

>>> idx.nunique()
2

>>> idx.nunique(dropna=False)
3
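To give a feel for the trade-off behind approx and rsd, here is a minimal pure-Python sketch of a basic HyperLogLog estimator. It is an illustration only and is not the implementation Spark uses. The names hll_estimate and expected_relative_error and the parameter p are hypothetical, and the 1.04 / sqrt(m) error figure is the textbook estimate for plain HyperLogLog, not a statement about how Spark maps rsd to its internal settings.

```python
import hashlib
import math


def hll_estimate(values, p=10):
    """Estimate the number of distinct values with a basic HyperLogLog sketch.

    `p` (hypothetical parameter) is the number of index bits; the sketch
    keeps m = 2**p small registers instead of the full set of values.
    """
    m = 1 << p
    registers = [0] * m
    for value in values:
        # Deterministic 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                # top p bits select a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)       # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: use linear counting while registers are sparse.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


def expected_relative_error(p):
    """Textbook relative standard error of plain HyperLogLog with 2**p registers."""
    return 1.04 / math.sqrt(1 << p)


if __name__ == "__main__":
    data = [i % 20000 for i in range(60000)]  # 20000 distinct values, repeated
    for p in (8, 10, 12):
        print(p, round(hll_estimate(data, p)), round(expected_relative_error(p), 4))
```

More registers give a smaller expected error at the cost of more memory, which is the same kind of trade-off the rsd parameter exposes: a smaller rsd asks for a tighter estimate, while a larger one allows a cheaper sketch.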