Returns an array of optimal splits for all nodes at a given level. Splits the task into multiple groups if the level-wise training task could lead to memory overflow.
RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree
Impurities for all parent nodes for the current level
org.apache.spark.mllib.tree.configuration.Strategy instance containing parameters for constructing the DecisionTree
Level of the tree
Filters for all nodes at a given level
possible splits for all features
possible bins for all features
The deepest level at which the level-wise computation is done in a single group; deeper levels are split into multiple groups.
Array of the best splits for all nodes at the given level.
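The grouping described above can be illustrated with a small, self-contained sketch. It assumes that the nodes at a level are indexed 0 until 2^level (a full binary tree) and that, once the level exceeds maxLevelForSingleGroup, the number of groups doubles with each additional level, so no group holds more than 2^maxLevelForSingleGroup nodes. The doubling rule and the names `LevelGrouping`, `numGroups` and `groups` are assumptions of this sketch, not MLlib's API.

```scala
// Hypothetical sketch: partition the node indices of one tree level into groups
// so that no group holds more than 2^maxLevelForSingleGroup nodes.
object LevelGrouping {

  // 1 group up to maxLevelForSingleGroup, then doubling with every deeper level
  // (the doubling rule is an assumption of this sketch).
  def numGroups(level: Int, maxLevelForSingleGroup: Int): Int =
    if (level <= maxLevelForSingleGroup) 1
    else 1 << (level - maxLevelForSingleGroup)

  // Node indices 0 until 2^level, split into contiguous, equally sized groups.
  def groups(level: Int, maxLevelForSingleGroup: Int): Seq[Seq[Int]] = {
    val numNodes = 1 << level
    val n = numGroups(level, maxLevelForSingleGroup)
    val groupSize = numNodes / n // exact: both are powers of two
    (0 until n).map(g => (g * groupSize) until ((g + 1) * groupSize))
  }
}
```

For example, with maxLevelForSingleGroup = 10 the sketch uses 2 groups at level 11 and 4 groups at level 12, each holding 1024 nodes.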
Returns the splits and bins used for decision tree calculation.
RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree
org.apache.spark.mllib.tree.configuration.Strategy instance containing parameters for constructing the DecisionTree
a tuple of (splits, bins) where splits is an Array of org.apache.spark.mllib.tree.model.Split of size (numFeatures, numSplits - 1) and bins is an Array of org.apache.spark.mllib.tree.model.Bin of size (numFeatures, numSplits)
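To make the relationship between splits and bins concrete, here is a simplified sketch for a single continuous feature. It assumes a sort-based quantile strategy over a sample of feature values: numBins - 1 thresholds are read from the sorted sample at evenly spaced positions, and the numBins bins are the intervals between consecutive thresholds, with the outer bins open towards infinity. The names `SplitsAndBins`, `thresholds` and `bins` are hypothetical and do not mirror MLlib's Split and Bin classes.

```scala
// Hypothetical sketch of sort-based split/bin construction for ONE continuous feature.
object SplitsAndBins {

  // Pick numBins - 1 thresholds from the sorted sample at evenly spaced indices.
  def thresholds(sample: Array[Double], numBins: Int): Array[Double] = {
    require(numBins >= 2, "need at least two bins")
    require(sample.length >= numBins, "sample smaller than numBins")
    val sorted = sample.sorted
    val stride = sorted.length.toDouble / numBins
    // Largest index is ((numBins - 1) * stride).toInt < sorted.length.
    Array.tabulate(numBins - 1)(i => sorted(((i + 1) * stride).toInt))
  }

  // numBins bins as (lower, upper] intervals bounded by consecutive thresholds.
  def bins(ths: Array[Double]): Array[(Double, Double)] = {
    val edges = (Double.NegativeInfinity +: ths) :+ Double.PositiveInfinity
    edges.zip(edges.tail)
  }
}
```

With a sample of the values 1.0 to 100.0 and numBins = 4, the sketch yields the thresholds 26.0, 51.0 and 76.0 and four bins: k bins are always separated by k - 1 thresholds.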
Method to train a decision tree model where the instances are represented as an RDD of (label, features) pairs. The decision tree method supports binary classification and regression. For binary classification, the label for each instance should be either 0 or 1 to denote the two classes. The method also supports categorical feature inputs, where the number of categories can be specified using the categoricalFeaturesInfo option.
input RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree
classification or regression
criterion used for information gain calculation
maximum depth of the tree
maximum number of bins used for splitting features
algorithm for calculating quantiles
A map storing information about the categorical variables and the number of discrete values they take. For example, an entry (n -> k) implies the feature n is categorical with k categories 0, 1, 2, ... , k-1. It's important to note that features are zero-indexed.
a DecisionTreeModel that can be used for prediction
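The (n -> k) convention of categoricalFeaturesInfo can be checked against raw feature vectors with a small sketch. It assumes features arrive as Array[Double] and treats a value as a valid category for an entry (n -> k) only if it is a whole number in 0 until k. The object and method names are hypothetical helpers, not part of MLlib.

```scala
// Hypothetical check for the (n -> k) convention of categoricalFeaturesInfo:
// feature n (zero-indexed) must take one of the values 0.0, 1.0, ..., k - 1.
object CategoricalCheck {

  // Returns (featureIndex, value) pairs whose value is not a valid category.
  // Assumes every key n in the map is a valid index into `features`.
  def invalidCategoricals(
      features: Array[Double],
      categoricalFeaturesInfo: Map[Int, Int]): Seq[(Int, Double)] =
    categoricalFeaturesInfo.toSeq.sorted.flatMap { case (n, k) =>
      val v = features(n)
      val isCategory = v == math.floor(v) && v >= 0 && v < k
      if (isCategory) None else Some((n, v))
    }
}
```

With the map Map(0 -> 3, 2 -> 2), feature 0 may take 0, 1 or 2 and feature 2 may take 0 or 1; feature 1 is absent from the map, so this sketch treats it as continuous and never checks it.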
Method to train a decision tree model where the instances are represented as an RDD of (label, features) pairs. The method supports binary classification and regression. For the binary classification, the label for each instance should either be 0 or 1 to denote the two classes.
input RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data
algorithm type: classification or regression
impurity criterion used for information gain calculation
maximum depth of the tree
a DecisionTreeModel that can be used for prediction
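For binary classification with labels 0 and 1, the impurity criterion and the resulting information gain can be sketched as follows. This assumes Gini impurity, 1 - p0^2 - p1^2, and defines the gain of a candidate split as the parent impurity minus the count-weighted impurities of the two children. It illustrates the criterion only; it is not MLlib's Impurity implementation, and `Gain` is a hypothetical name.

```scala
// Hypothetical sketch: Gini impurity and information gain for 0/1 labels.
object Gain {

  // Gini impurity of a node holding c0 instances of label 0 and c1 of label 1.
  def gini(c0: Double, c1: Double): Double = {
    val total = c0 + c1
    if (total == 0) 0.0
    else {
      val p0 = c0 / total
      val p1 = c1 / total
      1.0 - p0 * p0 - p1 * p1
    }
  }

  // Parent impurity minus the weighted impurities of the left and right children.
  // Each child is given as (count of label 0, count of label 1).
  def infoGain(left: (Double, Double), right: (Double, Double)): Double = {
    val (l0, l1) = left
    val (r0, r1) = right
    val nLeft = l0 + l1
    val nRight = r0 + r1
    val n = nLeft + nRight
    val parent = gini(l0 + r0, l1 + r1)
    parent - (nLeft / n) * gini(l0, l1) - (nRight / n) * gini(r0, r1)
  }
}
```

A split that separates five 0-labels from five 1-labels perfectly gains the full parent impurity of 0.5, while a split whose children keep the parent's 50/50 mix gains nothing.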
Method to train a decision tree model where the instances are represented as an RDD of (label, features) pairs. The method supports binary classification and regression. For the binary classification, the label for each instance should either be 0 or 1 to denote the two classes. The parameters for the algorithm are specified using the strategy parameter.
RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree
The configuration parameters for the tree algorithm which specify the type of algorithm (classification, regression, etc.), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc.
a DecisionTreeModel that can be used for prediction
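Bundling the parameters into one object, as the strategy parameter does, can be mimicked with a plain case class. The sketch below is hypothetical: `TreeParams` and its fields only echo the options listed above (algorithm type, impurity, maximum depth, maximum bins, quantile strategy, categorical feature information) and are not MLlib's Strategy class. The default values, the `require` checks and the convention that the root sits at depth 0 are assumptions made for illustration.

```scala
// Hypothetical parameter bundle echoing the options a strategy object carries.
object TreeConfigSketch {
  sealed trait Algo
  case object Classification extends Algo
  case object Regression extends Algo

  final case class TreeParams(
      algo: Algo,
      impurity: String, // e.g. "gini", "entropy" or "variance"; a plain string here
      maxDepth: Int,
      maxBins: Int = 100, // arbitrary default chosen for this sketch
      quantileStrategy: String = "sort", // arbitrary default chosen for this sketch
      categoricalFeaturesInfo: Map[Int, Int] = Map.empty) {

    // Illustrative sanity checks, not MLlib's validation rules.
    require(maxDepth >= 0, "maxDepth must be non-negative")
    require(maxBins >= 2, "maxBins must allow at least one split")
    require(categoricalFeaturesInfo.forall { case (n, k) => n >= 0 && k >= 1 },
      "feature indices are zero-based and category counts positive")

    // Upper bound on the node count of a binary tree of this depth (root at depth 0).
    def maxNodes: Long = (1L << (maxDepth + 1)) - 1
  }
}
```

Keeping every option in one immutable value lets a training routine take a single argument, while invalid combinations fail at construction time rather than midway through training.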