spark_pipeline_stage {sparklyr} | R Documentation |
Helper function to create pipeline stage objects with common parameter setters.
spark_pipeline_stage(sc, class, uid, features_col = NULL, label_col = NULL, prediction_col = NULL, probability_col = NULL, raw_prediction_col = NULL, k = NULL, max_iter = NULL, seed = NULL, input_col = NULL, input_cols = NULL, output_col = NULL, output_cols = NULL)
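The helper applies "common parameter setters" to the stage it creates. The sketch below shows the idea in base R, assuming each optional argument maps to a Spark ML setter named by the Params convention (`setFeaturesCol`, `setMaxIter`, and so on), and that arguments left as `NULL` are simply skipped. The function `stage_setters()` is hypothetical and for illustration only. It is not sparklyr's internal code.

```r
# Hypothetical sketch: map each optional argument to the Spark ML setter it
# would drive, dropping the arguments left as NULL. `stage_setters()` is an
# illustrative name, not part of sparklyr.
stage_setters <- function(features_col = NULL, label_col = NULL,
                          prediction_col = NULL, probability_col = NULL,
                          raw_prediction_col = NULL, k = NULL,
                          max_iter = NULL, seed = NULL,
                          input_col = NULL, input_cols = NULL,
                          output_col = NULL, output_cols = NULL) {
  # list() keeps NULL entries, so every argument is listed here first.
  params <- list(
    setFeaturesCol      = features_col,
    setLabelCol         = label_col,
    setPredictionCol    = prediction_col,
    setProbabilityCol   = probability_col,
    setRawPredictionCol = raw_prediction_col,
    setK                = k,
    setMaxIter          = max_iter,
    setSeed             = seed,
    setInputCol         = input_col,
    setInputCols        = input_cols,
    setOutputCol        = output_col,
    setOutputCols       = output_cols
  )
  # Keep only the setters whose argument was actually supplied.
  Filter(Negate(is.null), params)
}

setters <- stage_setters(input_col = "x", output_col = "x_bin")
# names(setters) is c("setInputCol", "setOutputCol")
```

Skipping `NULL` arguments lets one helper serve many stage classes, since each Spark class only exposes the setters that apply to it. A real implementation would then call each remaining setter on the underlying Java object, for example with sparklyr's `invoke()`.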
sc |
A 'spark_connection' object. |
class |
Class name for the pipeline stage. |
uid |
A character string used to uniquely identify the ML estimator. |
features_col |
Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by |
label_col |
Label column name. The column should be a numeric column. Usually this column is output by |
prediction_col |
Prediction column name. |
probability_col |
Column name for predicted class conditional probabilities. |
raw_prediction_col |
Raw prediction (a.k.a. confidence) column name. |
k |
The number of clusters to create. |
max_iter |
The maximum number of iterations to use. |
seed |
A random seed. Set this value if you need your results to be reproducible across repeated calls. |
input_col |
The name of the input column. |
input_cols |
Names of input columns. |
output_col |
The name of the output column. |
thresholds |
Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class's threshold. |
output_cols |
Names of output columns. |
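The `thresholds` entry rescales class probabilities before choosing a prediction. In Spark ML, the predicted class is the one whose probability p divided by its threshold t is largest. The base R sketch below works through that rule. The helper name `predict_with_thresholds()` is hypothetical. Treating a zero threshold as +Inf, so that class is always chosen, is our reading of Spark's behaviour and should be checked against the Spark source.

```r
# Hypothetical helper illustrating the p / t rule for class thresholds.
predict_with_thresholds <- function(probability, thresholds) {
  stopifnot(length(probability) == length(thresholds))
  # Values must be >= 0, and at most one of them may be exactly 0.
  stopifnot(all(thresholds >= 0), sum(thresholds == 0) <= 1)
  # Assumption: a zero threshold scales to +Inf, so that class always wins.
  scaled <- ifelse(thresholds == 0, Inf, probability / thresholds)
  # which.max() returns the first index on ties; subtract 1 because
  # Spark class labels are 0-based.
  which.max(scaled) - 1L
}

p <- c(0.5, 0.3, 0.2)
predict_with_thresholds(p, c(1, 1, 1))    # equal thresholds: plain argmax, class 0
predict_with_thresholds(p, c(1, 0.5, 1))  # 0.3 / 0.5 = 0.6 beats 0.5, class 1
```

Lowering a class's threshold makes that class easier to predict. With equal thresholds, the rule reduces to picking the most probable class.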