- abort(WriterCommitMessage[]) - Method in interface org.apache.spark.sql.sources.v2.writer.DataSourceWriter
- abort() - Method in interface org.apache.spark.sql.sources.v2.writer.DataWriter
Aborts this writer if it has failed.
- abort(long, WriterCommitMessage[]) - Method in interface org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
- abort(WriterCommitMessage[]) - Method in interface org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
- abortJob(JobContext) - Method in class org.apache.spark.internal.io.FileCommitProtocol
Aborts a job after the writes fail.
- abortJob(JobContext) - Method in class org.apache.spark.internal.io.HadoopMapReduceCommitProtocol
- abortTask(TaskAttemptContext) - Method in class org.apache.spark.internal.io.FileCommitProtocol
Aborts a task after the writes have failed.
- abortTask(TaskAttemptContext) - Method in class org.apache.spark.internal.io.HadoopMapReduceCommitProtocol
- abs(Column) - Static method in class org.apache.spark.sql.functions
Computes the absolute value of a numeric value.
- abs() - Method in class org.apache.spark.sql.types.Decimal
- absent() - Static method in class org.apache.spark.api.java.Optional
- AbsoluteError - Class in org.apache.spark.mllib.tree.loss
:: DeveloperApi ::
Class for absolute error loss calculation (for regression).
- AbsoluteError() - Constructor for class org.apache.spark.mllib.tree.loss.AbsoluteError
- AbstractLauncher<T extends AbstractLauncher<T>> - Class in org.apache.spark.launcher
Base class for launcher implementations.
- accept(Parsers) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- accept(ES, Function1<ES, List<Object>>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- accept(String, PartialFunction<Object, U>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- accept(Path) - Method in class org.apache.spark.ml.image.SamplePathFilter
- acceptIf(Function1<Object, Object>, Function1<Object, String>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- acceptMatch(String, PartialFunction<Object, U>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- acceptSeq(ES, Function1<ES, Iterable<Object>>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- acceptsType(DataType) - Method in class org.apache.spark.sql.types.ObjectType
- accId() - Method in class org.apache.spark.CleanAccum
- accumCleaned(long) - Method in interface org.apache.spark.CleanerListener
- Accumulable<R,T> - Class in org.apache.spark
- Accumulable(R, AccumulableParam<R, T>) - Constructor for class org.apache.spark.Accumulable
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulable(T, String, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulable(R, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
- accumulable(R, String, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
- accumulableCollection(R, Function1<R, Growable<T>>, ClassTag<R>) - Method in class org.apache.spark.SparkContext
- AccumulableInfo - Class in org.apache.spark.scheduler
:: DeveloperApi ::
Information about an accumulable modified during a task or stage.
- AccumulableInfo - Class in org.apache.spark.status.api.v1
- accumulableInfoFromJson(JsonAST.JValue) - Static method in class org.apache.spark.util.JsonProtocol
- accumulableInfoToJson(AccumulableInfo) - Static method in class org.apache.spark.util.JsonProtocol
- AccumulableParam<R,T> - Interface in org.apache.spark
- accumulables() - Method in class org.apache.spark.scheduler.StageInfo
Terminal values of accumulables updated during this stage, including all the user-defined
- accumulables() - Method in class org.apache.spark.scheduler.TaskInfo
Intermediate updates to accumulables during this task.
- accumulablesToJson(Traversable<AccumulableInfo>) - Static method in class org.apache.spark.util.JsonProtocol
- Accumulator<T> - Class in org.apache.spark
- accumulator(int) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulator(int, String) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
- AccumulatorContext - Class in org.apache.spark.util
An internal class used to track accumulators by Spark itself.
- AccumulatorContext() - Constructor for class org.apache.spark.util.AccumulatorContext
- AccumulatorParam<T> - Interface in org.apache.spark
- AccumulatorParam.DoubleAccumulatorParam$ - Class in org.apache.spark
- AccumulatorParam.FloatAccumulatorParam$ - Class in org.apache.spark
- AccumulatorParam.IntAccumulatorParam$ - Class in org.apache.spark
- AccumulatorParam.LongAccumulatorParam$ - Class in org.apache.spark
- AccumulatorParam.StringAccumulatorParam$ - Class in org.apache.spark
- ACCUMULATORS() - Static method in class org.apache.spark.status.TaskIndexNames
- accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.StageData
- accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.TaskData
- AccumulatorV2<IN,OUT> - Class in org.apache.spark.util
The base class for accumulators, that can accumulate inputs of type IN, and produce output of type OUT.
- AccumulatorV2() - Constructor for class org.apache.spark.util.AccumulatorV2
- accumUpdates() - Method in class org.apache.spark.ExceptionFailure
- accumUpdates() - Method in class org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate
- accumUpdates() - Method in class org.apache.spark.TaskKilled
- accuracy() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
Returns accuracy.
- accuracy() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns accuracy (equal to the total number of correctly classified instances
out of the total number of instances).
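The ratio described above (correctly classified instances over all instances) can be sketched in plain Java without Spark. The class and method names here are hypothetical and are not part of MulticlassMetrics; this only illustrates the arithmetic under the assumption that predictions and labels are parallel arrays.

```java
final class AccuracySketch {
    // Hypothetical helper: fraction of positions where prediction equals label.
    static double accuracy(double[] predictions, double[] labels) {
        if (predictions.length != labels.length || labels.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            if (predictions[i] == labels[i]) {
                correct++;
            }
        }
        // Cast before dividing so the result is not truncated by integer division.
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        double[] predictions = {0.0, 1.0, 2.0, 1.0};
        double[] labels = {0.0, 1.0, 1.0, 1.0};
        // Three of the four predictions match their labels.
        System.out.println(accuracy(predictions, labels));
    }
}
```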
- accuracy() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
Returns accuracy
- acos(Column) - Static method in class org.apache.spark.sql.functions
- acos(String) - Static method in class org.apache.spark.sql.functions
- ActivationFunction - Interface in org.apache.spark.ml.ann
Trait for functions and their derivatives for functional layers
- active() - Static method in class org.apache.spark.sql.SparkSession
Returns the currently active SparkSession, otherwise the default one.
- active() - Method in class org.apache.spark.sql.streaming.StreamingQueryManager
Returns a list of active queries associated with this SQLContext
- active() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
- ACTIVE() - Static method in class org.apache.spark.streaming.scheduler.ReceiverState
- activeStages() - Method in class org.apache.spark.status.LiveJob
- activeTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
- activeTasks() - Method in class org.apache.spark.status.LiveExecutor
- activeTasks() - Method in class org.apache.spark.status.LiveJob
- activeTasks() - Method in class org.apache.spark.status.LiveStage
- activeTasksPerExecutor() - Method in class org.apache.spark.status.LiveStage
- add(T) - Method in class org.apache.spark.Accumulable
Add more data to this accumulator / accumulable
- add(Vector) - Method in class org.apache.spark.ml.clustering.ExpectationAggregator
Add a new training instance to this ExpectationAggregator, update the weights,
means and covariances for each distribution, and update the log likelihood.
- add(Datum) - Method in interface org.apache.spark.ml.optim.aggregator.DifferentiableLossAggregator
Add a single data point to this aggregator.
- add(AFTPoint) - Method in class org.apache.spark.ml.regression.AFTAggregator
Add a new training instance to this AFTAggregator, and update the loss and gradient
of the objective function.
- add(double[], MultivariateGaussian[], ExpectationSum, Vector<Object>) - Static method in class org.apache.spark.mllib.clustering.ExpectationSum
- add(Vector) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
Adds a new document.
- add(BlockMatrix) - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
Adds the given block matrix other to this block matrix: this + other
- add(Vector) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
Add a new sample to this summarizer, and update the statistical summary.
- add(StructField) - Method in class org.apache.spark.sql.types.StructType
- add(String, DataType) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new nullable field with no metadata.
- add(String, DataType, boolean) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field with no metadata.
- add(String, DataType, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field and specifying metadata.
- add(String, DataType, boolean, String) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field and specifying metadata.
- add(String, String) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new nullable field with no metadata where the
dataType is specified as a String.
- add(String, String, boolean) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field with no metadata where the
dataType is specified as a String.
- add(String, String, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field and specifying metadata where the
dataType is specified as a String.
- add(String, String, boolean, String) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field and specifying metadata where the
dataType is specified as a String.
- add(long, long) - Static method in class org.apache.spark.streaming.util.RawTextHelper
- add(IN) - Method in class org.apache.spark.util.AccumulatorV2
Takes the inputs and accumulates.
- add(T) - Method in class org.apache.spark.util.CollectionAccumulator
- add(Double) - Method in class org.apache.spark.util.DoubleAccumulator
Adds v to the accumulator.
- add(double) - Method in class org.apache.spark.util.DoubleAccumulator
Adds v to the accumulator.
- add(T) - Method in class org.apache.spark.util.LegacyAccumulatorWrapper
- add(Long) - Method in class org.apache.spark.util.LongAccumulator
Adds v to the accumulator.
- add(long) - Method in class org.apache.spark.util.LongAccumulator
Adds v to the accumulator.
- add(Object) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by one.
- add(Object, long) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by count.
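As a rough illustration of what "incrementing an item's count" means in a count-min sketch, here is a minimal plain-Java version. It is not Spark's CountMinSketch: the class name, the hash mixing, and the `estimateCount` helper shown here are assumptions made for the example. Each add bumps one counter per row, and the estimate is the minimum over those counters, so it can overestimate but never underestimate.

```java
final class CountMinSketchDemo {
    private final long[][] table; // one row of counters per hash function
    private final int width;

    CountMinSketchDemo(int depth, int width) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.table = new long[depth][width];
        this.width = width;
    }

    // Illustrative per-row hash: mixes the item's hashCode with a row-specific constant.
    private int bucket(Object item, int row) {
        int h = item.hashCode() * 31 + (row + 1) * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    // Increments the item's count by one.
    void add(Object item) {
        add(item, 1L);
    }

    // Increments the item's count by the given non-negative amount.
    void add(Object item, long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        for (int row = 0; row < table.length; row++) {
            table[row][bucket(item, row)] += count;
        }
    }

    // Smallest counter across rows: an upper-biased estimate of the true count.
    long estimateCount(Object item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketchDemo sketch = new CountMinSketchDemo(4, 64);
        sketch.add("spark");
        sketch.add("spark");
        sketch.add("flink", 5L);
        System.out.println(sketch.estimateCount("spark")); // at least 2
        System.out.println(sketch.estimateCount("flink")); // at least 5
    }
}
```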
- add_months(Column, int) - Static method in class org.apache.spark.sql.functions
Returns the date that is numMonths after startDate
- addAccumulator(R, T) - Method in interface org.apache.spark.AccumulableParam
Add additional data to the accumulator value.
- addAccumulator(T, T) - Method in interface org.apache.spark.AccumulatorParam
- addAppArgs(String...) - Method in class org.apache.spark.launcher.AbstractLauncher
Adds command line arguments for the application.
- addAppArgs(String...) - Method in class org.apache.spark.launcher.SparkLauncher
- addBinary(byte[]) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by one.
- addBinary(byte[], long) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by count.
- addDirectory(String, File) - Method in interface org.apache.spark.rpc.RpcEnvFileServer
Adds a local directory to be served via this file server.
- addFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Add a file to be downloaded with this Spark job on every node.
- addFile(String, boolean) - Method in class org.apache.spark.api.java.JavaSparkContext
Add a file to be downloaded with this Spark job on every node.
- addFile(String) - Method in class org.apache.spark.launcher.AbstractLauncher
Adds a file to be submitted with the application.
- addFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
- addFile(File) - Method in interface org.apache.spark.rpc.RpcEnvFileServer
Adds a file to be served by this RpcEnv.
- addFile(String) - Method in class org.apache.spark.SparkContext
Add a file to be downloaded with this Spark job on every node.
- addFile(String, boolean) - Method in class org.apache.spark.SparkContext
Add a file to be downloaded with this Spark job on every node.
- addFilters(Seq<ServletContextHandler>, SparkConf) - Static method in class org.apache.spark.ui.JettyUtils
Add filters, if any, to the given list of ServletContextHandlers
- addGrid(Param<T>, Iterable<T>) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a param with multiple values (overwrites if the input param exists).
- addGrid(DoubleParam, double[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a double param with multiple values.
- addGrid(IntParam, int[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds an int param with multiple values.
- addGrid(FloatParam, float[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a float param with multiple values.
- addGrid(LongParam, long[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a long param with multiple values.
- addGrid(BooleanParam) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a boolean param with true and false.
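The addGrid overloads above each register one parameter with a set of candidate values. A minimal plain-Java sketch of that idea follows; it assumes the builder eventually expands the grid into every combination of values (a Cartesian product). `ParamGridSketch`, its string-keyed parameters, and its `build` method are hypothetical stand-ins for illustration, not the ParamGridBuilder API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParamGridSketch {
    // Parameter name -> candidate values, kept in insertion order.
    private final Map<String, List<Object>> grid = new LinkedHashMap<>();

    // Registers a parameter with several values; overwrites if the parameter already exists.
    ParamGridSketch addGrid(String param, List<?> values) {
        grid.put(param, new ArrayList<>(values));
        return this;
    }

    // Registers a boolean parameter with both true and false.
    ParamGridSketch addGrid(String booleanParam) {
        return addGrid(booleanParam, Arrays.asList(true, false));
    }

    // Expands the grid into every combination of the registered values.
    List<Map<String, Object>> build() {
        List<Map<String, Object>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<Object>> entry : grid.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> partial : combos) {
                for (Object value : entry.getValue()) {
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> combos = new ParamGridSketch()
                .addGrid("regParam", Arrays.asList(0.1, 0.01))
                .addGrid("maxIter", Arrays.asList(10, 50, 100))
                .addGrid("fitIntercept")
                .build();
        System.out.println(combos.size()); // 2 * 3 * 2 combinations
    }
}
```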
- addInPlace(R, R) - Method in interface org.apache.spark.AccumulableParam
Merge two accumulated values together.
- addInPlace(double, double) - Method in class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
- addInPlace(float, float) - Method in class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
- addInPlace(int, int) - Method in class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
- addInPlace(long, long) - Method in class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
- addInPlace(String, String) - Method in class org.apache.spark.AccumulatorParam.StringAccumulatorParam$
- addJar(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addJar(String) - Method in class org.apache.spark.launcher.AbstractLauncher
Adds a jar file to be submitted with the application.
- addJar(String) - Method in class org.apache.spark.launcher.SparkLauncher
- addJar(File) - Method in interface org.apache.spark.rpc.RpcEnvFileServer
Adds a jar to be served by this RpcEnv.
- addJar(String) - Method in class org.apache.spark.SparkContext
Adds a JAR dependency for all tasks to be executed on this SparkContext
in the future.
- addJar(String) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Add a jar to the class loader.
- addJar(String) - Method in class org.apache.spark.sql.hive.HiveSessionResourceLoader
- addListener(SparkAppHandle.Listener) - Method in interface org.apache.spark.launcher.SparkAppHandle
Adds a listener to be notified of changes to the handle's information.
- addListener(StreamingQueryListener) - Method in class org.apache.spark.sql.streaming.StreamingQueryManager
- addListener(L) - Method in interface org.apache.spark.util.ListenerBus
Add a listener to listen for events.
- addLocalConfiguration(String, int, int, int, JobConf) - Static method in class org.apache.spark.rdd.HadoopRDD
Add Hadoop configuration specific to a single partition and attempt.
- addLong(long) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by one.
- addLong(long, long) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by count.
- addMapOutput(int, MapStatus) - Method in class org.apache.spark.ShuffleStatus
Register a map output.
- addMetrics(TaskMetrics, TaskMetrics) - Static method in class org.apache.spark.status.LiveEntityHelpers
Add m2 values to m1.
- addPartition(LiveRDDPartition) - Method in class org.apache.spark.status.RDDPartitionSeq
- addPartToPGroup(Partition, PartitionGroup) - Method in class org.apache.spark.rdd.DefaultPartitionCoalescer
- addPyFile(String) - Method in class org.apache.spark.launcher.AbstractLauncher
Adds a python file / zip / egg to be submitted with the application.
- addPyFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
- address() - Method in class org.apache.spark.BarrierTaskInfo
- address() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
- addSchedulable(Schedulable) - Method in interface org.apache.spark.scheduler.Schedulable
- addShutdownHook(Function0<BoxedUnit>) - Static method in class org.apache.spark.util.ShutdownHookManager
Adds a shutdown hook with default priority.
- addShutdownHook(int, Function0<BoxedUnit>) - Static method in class org.apache.spark.util.ShutdownHookManager
Adds a shutdown hook with the given priority.
- addSparkArg(String) - Method in class org.apache.spark.launcher.AbstractLauncher
Adds a no-value argument to the Spark invocation.
- addSparkArg(String, String) - Method in class org.apache.spark.launcher.AbstractLauncher
Adds an argument with a value to the Spark invocation.
- addSparkArg(String) - Method in class org.apache.spark.launcher.SparkLauncher
- addSparkArg(String, String) - Method in class org.apache.spark.launcher.SparkLauncher
- addSparkListener(SparkListenerInterface) - Method in class org.apache.spark.SparkContext
:: DeveloperApi ::
Register a listener to receive up-calls from events that happen during execution.
- addSparkVersionMetadata(RecordWriter<NullWritable, Writable>) - Static method in class org.apache.spark.sql.hive.orc.OrcFileFormat
Add metadata specifying the Spark version.
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.StreamingContext
- addString(String) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by one.
- addString(String, long) - Method in class org.apache.spark.util.sketch.CountMinSketch
Increments item's count by count.
- addTaskCompletionListener(TaskCompletionListener) - Method in class org.apache.spark.BarrierTaskContext
- addTaskCompletionListener(TaskCompletionListener) - Method in class org.apache.spark.TaskContext
Adds a (Java friendly) listener to be executed on task completion.
- addTaskCompletionListener(Function1<TaskContext, U>) - Method in class org.apache.spark.TaskContext
Adds a listener in the form of a Scala closure to be executed on task completion.
- addTaskFailureListener(TaskFailureListener) - Method in class org.apache.spark.BarrierTaskContext
- addTaskFailureListener(TaskFailureListener) - Method in class org.apache.spark.TaskContext
Adds a listener to be executed on task failure.
- addTaskFailureListener(Function2<TaskContext, Throwable, BoxedUnit>) - Method in class org.apache.spark.TaskContext
Adds a listener to be executed on task failure.
- addTaskSetManager(Schedulable, Properties) - Method in interface org.apache.spark.scheduler.SchedulableBuilder
- addTime() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
- addTime() - Method in class org.apache.spark.status.LiveExecutor
- addURL(URL) - Method in class org.apache.spark.util.MutableURLClassLoader
- AddWebUIFilter(String, Map<String, String>, String) - Constructor for class org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.AddWebUIFilter
- AddWebUIFilter$() - Constructor for class org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.AddWebUIFilter$
- AFTAggregator - Class in org.apache.spark.ml.regression
AFTAggregator computes the gradient and loss for an AFT loss function,
as used in AFT survival regression, for samples in sparse or dense vectors in an online fashion.
- AFTAggregator(Broadcast<DenseVector<Object>>, boolean, Broadcast<double[]>) - Constructor for class org.apache.spark.ml.regression.AFTAggregator
- AFTCostFun - Class in org.apache.spark.ml.regression
AFTCostFun implements Breeze's DiffFunction[T] for AFT cost.
- AFTCostFun(RDD<AFTPoint>, boolean, Broadcast<double[]>, int) - Constructor for class org.apache.spark.ml.regression.AFTCostFun
- AFTSurvivalRegression - Class in org.apache.spark.ml.regression
- AFTSurvivalRegression(String) - Constructor for class org.apache.spark.ml.regression.AFTSurvivalRegression
- AFTSurvivalRegression() - Constructor for class org.apache.spark.ml.regression.AFTSurvivalRegression
- AFTSurvivalRegressionModel - Class in org.apache.spark.ml.regression
- AFTSurvivalRegressionParams - Interface in org.apache.spark.ml.regression
Params for accelerated failure time (AFT) regression.
- agg(Column, Column...) - Method in class org.apache.spark.sql.Dataset
Aggregates on the entire Dataset without groups.
- agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Aggregates on the entire Dataset without groups.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Aggregates on the entire Dataset without groups.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.Dataset
(Java-specific) Aggregates on the entire Dataset without groups.
- agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.Dataset
Aggregates on the entire Dataset without groups.
- agg(TypedColumn<V, U1>) - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Computes the given aggregation, returning a Dataset of tuples for each unique key
and the result of computing this aggregation over all elements in the group.
- agg(TypedColumn<V, U1>, TypedColumn<V, U2>) - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1>, TypedColumn<V, U2>, TypedColumn<V, U3>) - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1>, TypedColumn<V, U2>, TypedColumn<V, U3>, TypedColumn<V, U4>) - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
- agg(Column, Column...) - Method in class org.apache.spark.sql.RelationalGroupedDataset
Compute aggregates by specifying a series of aggregate columns.
- agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.RelationalGroupedDataset
(Scala-specific) Compute aggregates by specifying the column names and
aggregate methods.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.RelationalGroupedDataset
(Scala-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.RelationalGroupedDataset
(Java-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.RelationalGroupedDataset
Compute aggregates by specifying a series of aggregate columns.
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
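The two-level fold described above can be sketched in plain Java, with nested lists standing in for partitions. `AggregateSketch` and its methods are hypothetical and do not use Spark; the sketch assumes the zero value seeds every partition and the final merge, which is why it needs to be neutral for both functions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

final class AggregateSketch {
    // Folds each partition with seqOp starting from zero, then merges the
    // per-partition results with combOp, again starting from zero.
    static <T, U> U aggregate(List<List<T>> partitions, U zero,
                              BiFunction<U, T, U> seqOp, BinaryOperator<U> combOp) {
        U result = zero;
        for (List<T> partition : partitions) {
            U partial = zero;
            for (T element : partition) {
                partial = seqOp.apply(partial, element);
            }
            result = combOp.apply(result, partial);
        }
        return result;
    }

    // Sums the fixed partitions [1, 2], [3], [] using the given zero value.
    static int demoSum(int zero) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3),
                Collections.<Integer>emptyList());
        return aggregate(partitions, zero, Integer::sum, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(demoSum(0)); // neutral zero: prints 6
        System.out.println(demoSum(1)); // non-neutral zero is counted four times: prints 10
    }
}
```

The second call shows why the zero value should be neutral: in this sketch it is applied once per partition and once more in the merge, so a zero of 1 inflates the sum by 4.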
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Aggregate the values of each key, using given combine functions and a neutral "zero value".
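A per-key variant of the same idea can be sketched in plain Java. This is only an illustration under stated assumptions: lists of key-value entries stand in for partitions, and `AggregateByKeySketch` is a hypothetical name unrelated to PairRDDFunctions. Within a partition each key starts from the zero value and absorbs values through the first function; the second function then merges results for the same key across partitions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

final class AggregateByKeySketch {
    static <K, V, U> Map<K, U> aggregateByKey(
            List<List<Map.Entry<K, V>>> partitions, U zero,
            BiFunction<U, V, U> seqOp, BinaryOperator<U> combOp) {
        Map<K, U> merged = new HashMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Fold values per key inside this partition, starting from zero.
            Map<K, U> partial = new HashMap<>();
            for (Map.Entry<K, V> record : partition) {
                U soFar = partial.getOrDefault(record.getKey(), zero);
                partial.put(record.getKey(), seqOp.apply(soFar, record.getValue()));
            }
            // Merge this partition's per-key results into the overall result.
            for (Map.Entry<K, U> entry : partial.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), combOp);
            }
        }
        return merged;
    }

    // Two partitions: [(a,1), (b,2)] and [(a,3)], summed per key.
    static Map<String, Integer> demo() {
        List<Map.Entry<String, Integer>> first = new ArrayList<>();
        first.add(new SimpleEntry<>("a", 1));
        first.add(new SimpleEntry<>("b", 2));
        List<Map.Entry<String, Integer>> second = new ArrayList<>();
        second.add(new SimpleEntry<>("a", 3));
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        partitions.add(first);
        partitions.add(second);
        return aggregateByKey(partitions, 0, Integer::sum, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = demo();
        System.out.println(totals.get("a")); // prints 4
        System.out.println(totals.get("b")); // prints 2
    }
}
```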
- AggregatedDialect - Class in org.apache.spark.sql.jdbc
AggregatedDialect can unify multiple dialects into one virtual Dialect.
- AggregatedDialect(List<JdbcDialect>) - Constructor for class org.apache.spark.sql.jdbc.AggregatedDialect
- aggregateMessages(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, ClassTag<A>) - Method in class org.apache.spark.graphx.Graph
Aggregates values from the neighboring edges and vertices of each vertex.
- aggregateMessagesWithActiveSet(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, Option<Tuple2<VertexRDD<?>, EdgeDirection>>, ClassTag<A>) - Method in class org.apache.spark.graphx.impl.GraphImpl
- aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
- aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.VertexRDD
Aggregates vertices in messages that have the same ids using reduceFunc, returning a
VertexRDD co-indexed with this
- AggregatingEdgeContext<VD,ED,A> - Class in org.apache.spark.graphx.impl
- AggregatingEdgeContext(Function2<A, A, A>, Object, BitSet) - Constructor for class org.apache.spark.graphx.impl.AggregatingEdgeContext
- aggregationDepth() - Method in interface org.apache.spark.ml.param.shared.HasAggregationDepth
Param for suggested depth for treeAggregate (>= 2).
- Aggregator<K,V,C> - Class in org.apache.spark
:: DeveloperApi ::
A set of functions used to aggregate data.
- Aggregator(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Constructor for class org.apache.spark.Aggregator
- aggregator() - Method in class org.apache.spark.ShuffleDependency
- Aggregator<IN,BUF,OUT> - Class in org.apache.spark.sql.expressions
:: Experimental ::
A base class for user-defined aggregations, which can be used in Dataset operations to take
all of the elements of a group and reduce them to a single value.
- Aggregator() - Constructor for class org.apache.spark.sql.expressions.Aggregator
- aic(RDD<Tuple3<Object, Object, Object>>, double, double, double) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegression.Binomial$
- aic(RDD<Tuple3<Object, Object, Object>>, double, double, double) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegression.Gamma$
- aic(RDD<Tuple3<Object, Object, Object>>, double, double, double) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegression.Gaussian$
- aic(RDD<Tuple3<Object, Object, Object>>, double, double, double) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegression.Poisson$
- aic() - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegressionSummary
Akaike Information Criterion (AIC) for the fitted model.
- Algo - Class in org.apache.spark.mllib.tree.configuration
Enum to select the algorithm for the decision tree
- Algo() - Constructor for class org.apache.spark.mllib.tree.configuration.Algo
- algo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
- algo() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
- algo() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
- algo() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
- algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
- alias(String) - Method in class org.apache.spark.sql.Column
Gives the column an alias.
- alias(String) - Method in class org.apache.spark.sql.Dataset
Returns a new Dataset with an alias set.
- alias(Symbol) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Returns a new Dataset with an alias set.
- All - Static variable in class org.apache.spark.graphx.TripletFields
Expose all the fields (source, edge, and destination).
- AllJobsCancelled - Class in org.apache.spark.scheduler
- AllJobsCancelled() - Constructor for class org.apache.spark.scheduler.AllJobsCancelled
- allocator() - Method in class org.apache.spark.storage.memory.SerializedValuesHolder
- AllReceiverIds - Class in org.apache.spark.streaming.scheduler
A message used by ReceiverTracker to ask all receiver's ids still stored in
- AllReceiverIds() - Constructor for class org.apache.spark.streaming.scheduler.AllReceiverIds
- allSources() - Static method in class org.apache.spark.metrics.source.StaticSources
The set of all static sources.
- alpha() - Method in interface org.apache.spark.ml.recommendation.ALSParams
Param for the alpha parameter in the implicit preference formulation (nonnegative).
- alpha() - Method in class org.apache.spark.mllib.random.WeibullGenerator
- ALS - Class in org.apache.spark.ml.recommendation
Alternating Least Squares (ALS) matrix factorization.
- ALS(String) - Constructor for class org.apache.spark.ml.recommendation.ALS
- ALS() - Constructor for class org.apache.spark.ml.recommendation.ALS
- ALS - Class in org.apache.spark.mllib.recommendation
Alternating Least Squares matrix factorization.
- ALS() - Constructor for class org.apache.spark.mllib.recommendation.ALS
Constructs an ALS instance with default parameters: {numBlocks: -1, rank: 10, iterations: 10,
lambda: 0.01, implicitPrefs: false, alpha: 1.0}.
- ALS.InBlock$ - Class in org.apache.spark.ml.recommendation
- ALS.LeastSquaresNESolver - Interface in org.apache.spark.ml.recommendation
Trait for least squares solvers applied to the normal equation.
- ALS.Rating<ID> - Class in org.apache.spark.ml.recommendation
:: DeveloperApi ::
Rating class for better code readability.
- ALS.Rating$ - Class in org.apache.spark.ml.recommendation
- ALS.RatingBlock$ - Class in org.apache.spark.ml.recommendation
- ALSModel - Class in org.apache.spark.ml.recommendation
Model fitted by ALS.
- ALSModelParams - Interface in org.apache.spark.ml.recommendation
Common params for ALS and ALSModel.
- ALSParams - Interface in org.apache.spark.ml.recommendation
Common params for ALS.
- alterDatabase(CatalogDatabase) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Alter a database whose name matches the one specified in `database`, assuming it exists.
- alterFunction(String, CatalogFunction) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Alter a function whose name matches the one specified in `func`, assuming it exists.
- alterPartitions(String, String, Seq<CatalogTablePartition>) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Alter one or more table partitions whose specs match the ones specified in `newParts`, assuming the partitions exist.
- alterTable(CatalogTable) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Alter a table whose name matches the one specified in `table`, assuming it exists.
- alterTable(String, String, CatalogTable) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Updates the given table with new metadata, optionally renaming the table or
moving it to a different database.
- alterTableDataSchema(String, String, StructType, Map<String, String>) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Updates the given table with a new data schema and table properties, and keeps everything else unchanged.
- am() - Method in class org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.RegisterClusterManager
- AnalysisException - Exception in org.apache.spark.sql
Thrown when a query fails to analyze, usually because the query itself is invalid.
- and(Column) - Method in class org.apache.spark.sql.Column
Boolean AND.
- And - Class in org.apache.spark.sql.sources
A filter that evaluates to `true` iff both `left` and `right` evaluate to `true`.
- And(Filter, Filter) - Constructor for class org.apache.spark.sql.sources.And
- antecedent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
- ANY() - Static method in class org.apache.spark.scheduler.TaskLocality
- AnyDataType - Class in org.apache.spark.sql.types
An AbstractDataType that matches any concrete data type.
- AnyDataType() - Constructor for class org.apache.spark.sql.types.AnyDataType
- anyNull() - Method in interface org.apache.spark.sql.Row
Returns true if there are any NULL values in this row.
- anyNull() - Method in class org.apache.spark.sql.vectorized.ColumnarRow
- ApiHelper - Class in org.apache.spark.ui.jobs
- ApiHelper() - Constructor for class org.apache.spark.ui.jobs.ApiHelper
- ApiRequestContext - Interface in org.apache.spark.status.api.v1
- appAttemptId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
- Append() - Static method in class org.apache.spark.sql.streaming.OutputMode
OutputMode in which only the new rows in the streaming DataFrame/Dataset will be
written to the sink.
- appendBias(Vector) - Static method in class org.apache.spark.mllib.util.MLUtils
Returns a new vector with `1.0` (bias) appended to the input vector.
- appendColumn(StructType, String, DataType, boolean) - Static method in class org.apache.spark.ml.util.SchemaUtils
Appends a new column to the input schema.
- appendColumn(StructType, StructField) - Static method in class org.apache.spark.ml.util.SchemaUtils
Appends a new column to the input schema.
- appendReadColumns(Configuration, Seq<Integer>, Seq<String>) - Static method in class org.apache.spark.sql.hive.HiveShim
- AppHistoryServerPlugin - Interface in org.apache.spark.status
An interface for creating history listeners (to replay event logs) defined in other modules like
SQL, and for setting up the UI of the plugin to rebuild the history UI.
- appId() - Method in interface org.apache.spark.scheduler.SchedulerBackend
- appId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
- appId() - Method in interface org.apache.spark.scheduler.TaskScheduler
- appId() - Method in interface org.apache.spark.status.api.v1.BaseAppResource
- APPLICATION_EXECUTOR_LIMIT() - Static method in class org.apache.spark.ui.ToolTips
- applicationAttemptId() - Method in interface org.apache.spark.scheduler.SchedulerBackend
Get the attempt ID for this run, if the cluster manager supports multiple attempts.
- applicationAttemptId() - Method in interface org.apache.spark.scheduler.TaskScheduler
Get an application's attempt ID associated with the job.
- applicationAttemptId() - Method in class org.apache.spark.SparkContext
- ApplicationAttemptInfo - Class in org.apache.spark.status.api.v1
- applicationEndFromJson(JsonAST.JValue) - Static method in class org.apache.spark.util.JsonProtocol
- applicationEndToJson(SparkListenerApplicationEnd) - Static method in class org.apache.spark.util.JsonProtocol
- ApplicationEnvironmentInfo - Class in org.apache.spark.status.api.v1
- applicationId() - Method in interface org.apache.spark.scheduler.SchedulerBackend
Get an application ID associated with the job.
- applicationId() - Method in interface org.apache.spark.scheduler.TaskScheduler
Get an application ID associated with the job.
- applicationId() - Method in class org.apache.spark.SparkContext
A unique identifier for the Spark application.
- ApplicationInfo - Class in org.apache.spark.status.api.v1
- applicationStartFromJson(JsonAST.JValue) - Static method in class org.apache.spark.util.JsonProtocol
- applicationStartToJson(SparkListenerApplicationStart) - Static method in class org.apache.spark.util.JsonProtocol
- ApplicationStatus - Enum in org.apache.spark.status.api.v1
- apply(T1) - Static method in class org.apache.spark.CleanAccum
- apply(T1) - Static method in class org.apache.spark.CleanBroadcast
- apply(T1) - Static method in class org.apache.spark.CleanCheckpoint
- apply(T1) - Static method in class org.apache.spark.CleanRDD
- apply(T1) - Static method in class org.apache.spark.CleanShuffle
- apply(T1, T2) - Static method in class org.apache.spark.ContextBarrierId
- apply(T1, T2, T3, T4, T5, T6, T7) - Static method in class org.apache.spark.ExceptionFailure
- apply(T1, T2, T3) - Static method in class org.apache.spark.ExecutorLostFailure
- apply(T1) - Static method in class org.apache.spark.ExecutorRegistered
- apply(T1) - Static method in class org.apache.spark.ExecutorRemoved
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.FetchFailed
- apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
Construct a graph from a collection of vertices and
edges with attributes.
- apply(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from edges, setting referenced vertices to defaultVertexAttr
- apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from vertices and edges, setting missing vertices to defaultVertexAttr
- apply(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices.
- apply(Graph<VD, ED>, A, int, EdgeDirection, Function3<Object, VD, A, VD>, Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, ClassTag<VD>, ClassTag<ED>, ClassTag<A>) - Static method in class org.apache.spark.graphx.Pregel
Execute a Pregel-like iterative vertex-parallel abstraction.
- apply(RDD<Tuple2<Object, VD>>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs.
- apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, Function2<VD, VD, VD>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- apply(DenseMatrix<Object>, DenseMatrix<Object>, Function1<Object, Object>) - Static method in class org.apache.spark.ml.ann.ApplyInPlace
- apply(DenseMatrix<Object>, DenseMatrix<Object>, DenseMatrix<Object>, Function2<Object, Object, Object>) - Static method in class org.apache.spark.ml.ann.ApplyInPlace
- apply(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
Gets an attribute by its name.
- apply(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
Gets an attribute by its index.
- apply(T1, T2) - Static method in class org.apache.spark.ml.clustering.ClusterData
- apply(T1, T2) - Static method in class org.apache.spark.ml.feature.LabeledPoint
- apply(int, int) - Method in class org.apache.spark.ml.linalg.DenseMatrix
- apply(int) - Method in class org.apache.spark.ml.linalg.DenseVector
- apply(int, int) - Method in interface org.apache.spark.ml.linalg.Matrix
Gets the (i, j)-th element.
- apply(int, int) - Method in class org.apache.spark.ml.linalg.SparseMatrix
- apply(int) - Method in class org.apache.spark.ml.linalg.SparseVector
- apply(int) - Method in interface org.apache.spark.ml.linalg.Vector
Gets the value of the ith element.
- apply(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
Gets the value of the input param or its default value if it does not exist.
- apply(GeneralizedLinearRegressionBase) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegression.FamilyAndLink$
Constructs the FamilyAndLink object from a parameter map
- apply(Split) - Method in class org.apache.spark.ml.tree.DecisionTreeModelReadWrite.SplitData$
- apply(T1, T2, T3) - Static method in class org.apache.spark.mllib.classification.impl.GLMClassificationModel.SaveLoadV1_0$.Data
- apply(T1, T2, T3) - Static method in class org.apache.spark.mllib.classification.NaiveBayesModel.SaveLoadV1_0$.Data
- apply(T1, T2, T3, T4) - Static method in class org.apache.spark.mllib.classification.NaiveBayesModel.SaveLoadV2_0$.Data
- apply(BinaryConfusionMatrix) - Method in interface org.apache.spark.mllib.evaluation.binary.BinaryClassificationMetricComputer
- apply(BinaryConfusionMatrix) - Static method in class org.apache.spark.mllib.evaluation.binary.FalsePositiveRate
- apply(BinaryConfusionMatrix) - Static method in class org.apache.spark.mllib.evaluation.binary.Precision
- apply(BinaryConfusionMatrix) - Static method in class org.apache.spark.mllib.evaluation.binary.Recall
- apply(T1) - Static method in class org.apache.spark.mllib.feature.ChiSqSelectorModel.SaveLoadV1_0$.Data
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.mllib.feature.VocabWord
- apply(int, int) - Method in class org.apache.spark.mllib.linalg.DenseMatrix
- apply(int) - Method in class org.apache.spark.mllib.linalg.DenseVector
- apply(T1, T2) - Static method in class org.apache.spark.mllib.linalg.distributed.IndexedRow
- apply(T1, T2, T3) - Static method in class org.apache.spark.mllib.linalg.distributed.MatrixEntry
- apply(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
Gets the (i, j)-th element.
- apply(int, int) - Method in class org.apache.spark.mllib.linalg.SparseMatrix
- apply(int) - Method in class org.apache.spark.mllib.linalg.SparseVector
- apply(int) - Method in interface org.apache.spark.mllib.linalg.Vector
Gets the value of the ith element.
- apply(T1, T2, T3) - Static method in class org.apache.spark.mllib.recommendation.Rating
- apply(T1, T2) - Static method in class org.apache.spark.mllib.regression.impl.GLMRegressionModel.SaveLoadV1_0$.Data
- apply(T1, T2) - Static method in class org.apache.spark.mllib.stat.test.BinarySample
- apply(int) - Static method in class org.apache.spark.mllib.tree.configuration.Algo
- apply(int) - Static method in class org.apache.spark.mllib.tree.configuration.EnsembleCombiningStrategy
- apply(int) - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
- apply(int) - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
- apply(int, Node) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.NodeData$
- apply(Row) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.NodeData$
- apply(int, Node) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.NodeData
- apply(Row) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.NodeData
- apply(Predict) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.PredictData$
- apply(Row) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.PredictData$
- apply(Predict) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.PredictData
- apply(Row) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.PredictData
- apply(Split) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.SplitData$
- apply(Row) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.SplitData$
- apply(Split) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.SplitData
- apply(Row) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.SplitData
- apply(int, Predict, double, boolean) - Static method in class org.apache.spark.mllib.tree.model.Node
Construct a node with nodeIndex, predict, impurity and isLeaf parameters.
- apply(T1, T2, T3, T4) - Static method in class org.apache.spark.mllib.tree.model.Split
- apply(int) - Static method in class org.apache.spark.rdd.CheckpointState
- apply(int) - Static method in class org.apache.spark.rdd.DeterministicLevel
- apply(long, String, Option<String>, String, boolean) - Static method in class org.apache.spark.scheduler.AccumulableInfo
- apply(long, String, Option<String>, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
- apply(long, String, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
- apply(T1, T2, T3, T4) - Static method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
- apply(T1, T2) - Static method in class org.apache.spark.scheduler.BlacklistedExecutor
- apply(String, long, Enumeration.Value, ByteBuffer) - Method in class org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.StatusUpdate$
Alternate factory method that takes a ByteBuffer directly for the data field
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.local.KillTask
- apply() - Static method in class org.apache.spark.scheduler.local.ReviveOffers
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.local.StatusUpdate
- apply() - Static method in class org.apache.spark.scheduler.local.StopExecutor
- apply(long, TaskMetrics) - Static method in class org.apache.spark.scheduler.RuntimePercentage
- apply(int) - Static method in class org.apache.spark.scheduler.SchedulingMode
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerApplicationEnd
- apply(T1, T2, T3, T4, T5, T6) - Static method in class org.apache.spark.scheduler.SparkListenerApplicationStart
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
- apply(T1, T2) - Static method in class org.apache.spark.scheduler.SparkListenerBlockManagerRemoved
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerBlockUpdated
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerEnvironmentUpdate
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.SparkListenerExecutorAdded
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.SparkListenerExecutorBlacklisted
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.scheduler.SparkListenerExecutorBlacklistedForStage
- apply(T1, T2) - Static method in class org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.SparkListenerExecutorRemoved
- apply(T1, T2) - Static method in class org.apache.spark.scheduler.SparkListenerExecutorUnblacklisted
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.SparkListenerJobEnd
- apply(T1, T2, T3, T4) - Static method in class org.apache.spark.scheduler.SparkListenerJobStart
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerLogStart
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.SparkListenerNodeBlacklisted
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.scheduler.SparkListenerNodeBlacklistedForStage
- apply(T1, T2) - Static method in class org.apache.spark.scheduler.SparkListenerNodeUnblacklisted
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerSpeculativeTaskSubmitted
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerStageCompleted
- apply(T1, T2) - Static method in class org.apache.spark.scheduler.SparkListenerStageSubmitted
- apply(T1, T2, T3, T4, T5, T6) - Static method in class org.apache.spark.scheduler.SparkListenerTaskEnd
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerTaskGettingResult
- apply(T1, T2, T3) - Static method in class org.apache.spark.scheduler.SparkListenerTaskStart
- apply(T1) - Static method in class org.apache.spark.scheduler.SparkListenerUnpersistRDD
- apply(int) - Static method in class org.apache.spark.scheduler.TaskLocality
- apply(Object) - Method in class org.apache.spark.sql.Column
Extracts a value or values from a complex type.
- apply(String) - Method in class org.apache.spark.sql.Dataset
Selects a column based on the column name and returns it as a Column.
- apply(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Creates a Column for this UDAF using the given Columns as input arguments.
- apply(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Creates a Column for this UDAF using the given Columns as input arguments.
- apply(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedFunction
Returns an expression that invokes the UDF, using the given arguments.
- apply(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedFunction
Returns an expression that invokes the UDF, using the given arguments.
- apply(LogicalPlan) - Method in class org.apache.spark.sql.hive.DetermineTableStats
- apply(T1, T2, T3, T4) - Static method in class org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand
- apply(ScriptInputOutputSchema) - Static method in class org.apache.spark.sql.hive.execution.HiveScriptIOSchema
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.sql.hive.execution.InsertIntoHiveDirCommand
- apply(T1, T2, T3, T4, T5, T6) - Static method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
- apply(T1, T2, T3, T4, T5) - Static method in class org.apache.spark.sql.hive.execution.ScriptTransformationExec
- apply(LogicalPlan) - Static method in class org.apache.spark.sql.hive.HiveAnalysis
- apply(LogicalPlan) - Method in class org.apache.spark.sql.hive.HiveStrategies.HiveTableScans$
- apply(LogicalPlan) - Static method in class org.apache.spark.sql.hive.HiveStrategies.HiveTableScans
- apply(LogicalPlan) - Method in class org.apache.spark.sql.hive.HiveStrategies.Scripts$
- apply(LogicalPlan) - Static method in class org.apache.spark.sql.hive.HiveStrategies.Scripts
- apply(T1, T2) - Static method in class org.apache.spark.sql.hive.HiveUDAFBuffer
- apply(LogicalPlan) - Method in class org.apache.spark.sql.hive.RelationConversions
- apply(LogicalPlan) - Method in class org.apache.spark.sql.hive.ResolveHiveSerdeTable
- apply(T1, T2) - Static method in class org.apache.spark.sql.jdbc.JdbcType
- apply(Dataset<Row>, Seq<Expression>, RelationalGroupedDataset.GroupType) - Static method in class org.apache.spark.sql.RelationalGroupedDataset
- apply(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i.
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.And
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.EqualNullSafe
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.EqualTo
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.GreaterThan
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.GreaterThanOrEqual
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.In
- apply(T1) - Static method in class org.apache.spark.sql.sources.IsNotNull
- apply(T1) - Static method in class org.apache.spark.sql.sources.IsNull
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.LessThan
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.LessThanOrEqual
- apply(T1) - Static method in class org.apache.spark.sql.sources.Not
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.Or
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.StringContains
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.StringEndsWith
- apply(T1, T2) - Static method in class org.apache.spark.sql.sources.StringStartsWith
- apply(String) - Static method in class org.apache.spark.sql.streaming.ProcessingTime
- apply(Duration) - Static method in class org.apache.spark.sql.streaming.ProcessingTime
- apply(DataType) - Static method in class org.apache.spark.sql.types.ArrayType
Construct an ArrayType object with the given element type.
- apply(T1) - Static method in class org.apache.spark.sql.types.CharType
- apply(double) - Static method in class org.apache.spark.sql.types.Decimal
- apply(long) - Static method in class org.apache.spark.sql.types.Decimal
- apply(int) - Static method in class org.apache.spark.sql.types.Decimal
- apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
- apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
- apply(BigInteger) - Static method in class org.apache.spark.sql.types.Decimal
- apply(BigInt) - Static method in class org.apache.spark.sql.types.Decimal
- apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
- apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
- apply(long, int, int) - Static method in class org.apache.spark.sql.types.Decimal
- apply(String) - Static method in class org.apache.spark.sql.types.Decimal
- apply(DataType, DataType) - Static method in class org.apache.spark.sql.types.MapType
Construct a MapType object with the given key type and value type.
- apply(T1, T2, T3, T4) - Static method in class org.apache.spark.sql.types.StructField
- apply(String) - Method in class org.apache.spark.sql.types.StructType
- apply(Set<String>) - Method in class org.apache.spark.sql.types.StructType
Returns a StructType containing StructFields of the given names, preserving the
original order of fields.
- apply(int) - Method in class org.apache.spark.sql.types.StructType
- apply(T1) - Static method in class org.apache.spark.sql.types.VarcharType
- apply(T1, T2, T3, T4, T5, T6, T7, T8) - Static method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
- apply(T1, T2, T3, T4, T5, T6, T7) - Static method in class org.apache.spark.status.api.v1.ApplicationInfo
- apply(T1) - Static method in class org.apache.spark.status.api.v1.StackTrace
- apply(T1, T2, T3, T4, T5, T6, T7) - Static method in class org.apache.spark.status.api.v1.ThreadStackTrace
- apply(int) - Method in class org.apache.spark.status.RDDPartitionSeq
- apply(String) - Static method in class org.apache.spark.storage.BlockId
- apply(String, String, int, Option<String>) - Static method in class org.apache.spark.storage.BlockManagerId
- apply(ObjectInput) - Static method in class org.apache.spark.storage.BlockManagerId
- apply(T1, T2) - Static method in class org.apache.spark.storage.BroadcastBlockId
- apply(T1, T2) - Static method in class org.apache.spark.storage.RDDBlockId
- apply(T1, T2, T3) - Static method in class org.apache.spark.storage.ShuffleBlockId
- apply(T1, T2, T3) - Static method in class org.apache.spark.storage.ShuffleDataBlockId
- apply(T1, T2, T3) - Static method in class org.apache.spark.storage.ShuffleIndexBlockId
- apply(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi ::
Create a new StorageLevel object.
- apply(boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi ::
Create a new StorageLevel object without setting useOffHeap.
- apply(int, int) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi ::
Create a new StorageLevel object from its integer representation.
- apply(ObjectInput) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- apply(T1, T2) - Static method in class org.apache.spark.storage.StreamBlockId
- apply(T1) - Static method in class org.apache.spark.storage.TaskResultBlockId
- apply(T1) - Static method in class org.apache.spark.streaming.Duration
- apply(long) - Static method in class org.apache.spark.streaming.Milliseconds
- apply(long) - Static method in class org.apache.spark.streaming.Minutes
- apply(T1, T2, T3, T4, T5, T6) - Static method in class org.apache.spark.streaming.scheduler.BatchInfo
- apply(T1, T2, T3, T4, T5, T6, T7) - Static method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
- apply(T1, T2, T3, T4, T5, T6, T7, T8) - Static method in class org.apache.spark.streaming.scheduler.ReceiverInfo
- apply(int) - Static method in class org.apache.spark.streaming.scheduler.ReceiverState
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerBatchStarted
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerBatchSubmitted
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerOutputOperationCompleted
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerOutputOperationStarted
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverError
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped
- apply(T1) - Static method in class org.apache.spark.streaming.scheduler.StreamingListenerStreamingStarted
- apply(long) - Static method in class org.apache.spark.streaming.Seconds
- apply(T1, T2, T3) - Static method in class org.apache.spark.TaskCommitDenied
- apply(T1, T2, T3) - Static method in class org.apache.spark.TaskKilled
- apply(int) - Static method in class org.apache.spark.TaskState
- apply(TraversableOnce<Object>) - Static method in class org.apache.spark.util.StatCounter
Build a StatCounter from a list of values.
- apply(Seq<Object>) - Static method in class org.apache.spark.util.StatCounter
Build a StatCounter from a list of values passed as variable-length arguments.
- ApplyInPlace - Class in org.apache.spark.ml.ann
Implements in-place application of functions in the arrays
- ApplyInPlace() - Constructor for class org.apache.spark.ml.ann.ApplyInPlace
- applySchema(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
- applySchema(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
- applySchema(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
- applySchema(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
- appName() - Method in class org.apache.spark.api.java.JavaSparkContext
- appName() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
- appName() - Method in class org.apache.spark.SparkContext
- appName(String) - Method in class org.apache.spark.sql.SparkSession.Builder
Sets a name for the application, which will be shown in the Spark web UI.
- approx_count_distinct(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
- approx_count_distinct(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
- approx_count_distinct(Column, double) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
- approx_count_distinct(String, double) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(Column) - Static method in class org.apache.spark.sql.functions
- approxCountDistinct(String) - Static method in class org.apache.spark.sql.functions
- approxCountDistinct(Column, double) - Static method in class org.apache.spark.sql.functions
- approxCountDistinct(String, double) - Static method in class org.apache.spark.sql.functions
- ApproxHist() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
- ApproximateEvaluator<U,R> - Interface in org.apache.spark.partial
An object that computes a function incrementally by merging in results of type U from multiple tasks.
- approxQuantile(String, double[], double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculates the approximate quantiles of a numerical column of a DataFrame.
- approxQuantile(String[], double[], double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculates the approximate quantiles of numerical columns of a DataFrame.
- appSparkVersion() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
- AppStatusUtils - Class in org.apache.spark.status
- AppStatusUtils() - Constructor for class org.apache.spark.status.AppStatusUtils
- AreaUnderCurve - Class in org.apache.spark.mllib.evaluation
Computes the area under the curve (AUC) using the trapezoidal rule.
- AreaUnderCurve() - Constructor for class org.apache.spark.mllib.evaluation.AreaUnderCurve
- areaUnderPR() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Computes the area under the precision-recall curve.
- areaUnderROC() - Method in interface org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
Computes the area under the receiver operating characteristic (ROC) curve.
- areaUnderROC() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Computes the area under the receiver operating characteristic (ROC) curve.
- argmax() - Method in class org.apache.spark.ml.linalg.DenseVector
- argmax() - Method in class org.apache.spark.ml.linalg.SparseVector
- argmax() - Method in interface org.apache.spark.ml.linalg.Vector
Find the index of a maximal element.
- argmax() - Method in class org.apache.spark.mllib.linalg.DenseVector
- argmax() - Method in class org.apache.spark.mllib.linalg.SparseVector
- argmax() - Method in interface org.apache.spark.mllib.linalg.Vector
Find the index of a maximal element.
- argString() - Method in class org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand
- array(DataType) - Method in class org.apache.spark.sql.ColumnName
Creates a new StructField of type array.
- array(Column...) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
- array(String, String...) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
- array(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
- array(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
- array() - Method in class org.apache.spark.sql.vectorized.ColumnarArray
- array_contains(Column, Object) - Static method in class org.apache.spark.sql.functions
Returns null if the array is null, true if the array contains value, and false otherwise.
- array_distinct(Column) - Static method in class org.apache.spark.sql.functions
Removes duplicate values from the array.
- array_except(Column, Column) - Static method in class org.apache.spark.sql.functions
Returns an array of the elements in the first array but not in the second array,
without duplicates.
- array_intersect(Column, Column) - Static method in class org.apache.spark.sql.functions
Returns an array of the elements in the intersection of the given two arrays,
without duplicates.
- array_join(Column, String, String) - Static method in class org.apache.spark.sql.functions
Concatenates the elements of column using the delimiter.
- array_join(Column, String) - Static method in class org.apache.spark.sql.functions
Concatenates the elements of column using the delimiter.
- array_max(Column) - Static method in class org.apache.spark.sql.functions
Returns the maximum value in the array.
- array_min(Column) - Static method in class org.apache.spark.sql.functions
Returns the minimum value in the array.
- array_position(Column, Object) - Static method in class org.apache.spark.sql.functions
Locates the position of the first occurrence of the value in the given array as long.
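A plain-Java sketch of a "first occurrence" lookup follows. It assumes 1-based positions and a result of 0 when the value is absent, which is how this function is understood to behave but is not stated in the entry itself. The class `ArrayPositionSketch` is hypothetical and works on a `java.util.List`, not on a Spark Column.

```java
import java.util.List;
import java.util.Objects;

final class ArrayPositionSketch {
    /** Returns the 1-based position of the first match, or 0 if there is none (sketch assumption). */
    static long arrayPosition(List<?> array, Object value) {
        for (int i = 0; i < array.size(); i++) {
            if (Objects.equals(array.get(i), value)) {
                return i + 1L;
            }
        }
        return 0L;
    }
}
```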
- array_remove(Column, Object) - Static method in class org.apache.spark.sql.functions
Removes all elements equal to element from the given array.
- array_repeat(Column, Column) - Static method in class org.apache.spark.sql.functions
Creates an array containing the left argument repeated the number of times given by the
right argument.
- array_repeat(Column, int) - Static method in class org.apache.spark.sql.functions
Creates an array containing the left argument repeated the number of times given by the
right argument.
- array_sort(Column) - Static method in class org.apache.spark.sql.functions
Sorts the input array in ascending order.
- array_union(Column, Column) - Static method in class org.apache.spark.sql.functions
Returns an array of the elements in the union of the given two arrays, without duplicates.
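The "without duplicates" wording shared by array_except, array_intersect, and array_union can be illustrated with ordinary Java collections. This is a sketch of the set semantics only: the class `ArraySetOpsSketch` is hypothetical, it operates on `java.util.List` values rather than Spark Columns, and keeping first-seen order is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ArraySetOpsSketch {
    /** Elements of a that are not in b, duplicates removed. */
    static <T> List<T> except(List<T> a, List<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return new ArrayList<>(result);
    }

    /** Elements present in both a and b, duplicates removed. */
    static <T> List<T> intersect(List<T> a, List<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return new ArrayList<>(result);
    }

    /** Elements present in a or b, duplicates removed. */
    static <T> List<T> union(List<T> a, List<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return new ArrayList<>(result);
    }
}
```

A `LinkedHashSet` is used so that deduplication does not reorder the surviving elements.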
- arrayLengthGt(double) - Static method in class org.apache.spark.ml.param.ParamValidators
Check that the array length is greater than lowerBound.
- arrays_overlap(Column, Column) - Static method in class org.apache.spark.sql.functions
Returns true if a1 and a2 have at least one non-null element in common.
- arrays_zip(Column...) - Static method in class org.apache.spark.sql.functions
Returns a merged array of structs in which the N-th struct contains all N-th values of the input arrays.
- arrays_zip(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Returns a merged array of structs in which the N-th struct contains all N-th values of the input arrays.
- ArrayType - Class in org.apache.spark.sql.types
- ArrayType(DataType, boolean) - Constructor for class org.apache.spark.sql.types.ArrayType
- arrayValues() - Method in class org.apache.spark.storage.memory.DeserializedValuesHolder
- ArrowColumnVector - Class in org.apache.spark.sql.vectorized
A column vector backed by Apache Arrow.
- ArrowColumnVector(ValueVector) - Constructor for class org.apache.spark.sql.vectorized.ArrowColumnVector
- as(Encoder<U>) - Method in class org.apache.spark.sql.Column
Provides a type hint about the expected return value of this column.
- as(String) - Method in class org.apache.spark.sql.Column
Gives the column an alias.
- as(Seq<String>) - Method in class org.apache.spark.sql.Column
(Scala-specific) Assigns the given aliases to the results of a table generating function.
- as(String[]) - Method in class org.apache.spark.sql.Column
Assigns the given aliases to the results of a table generating function.
- as(Symbol) - Method in class org.apache.spark.sql.Column
Gives the column an alias.
- as(String, Metadata) - Method in class org.apache.spark.sql.Column
Gives the column an alias with metadata.
- as(Encoder<U>) - Method in class org.apache.spark.sql.Dataset
:: Experimental ::
Returns a new Dataset where each record has been mapped on to the specified type.
- as(String) - Method in class org.apache.spark.sql.Dataset
Returns a new Dataset with an alias set.
- as(Symbol) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Returns a new Dataset with an alias set.
- asBinary() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
Convenient method for casting to binary logistic regression summary.
- asBreeze() - Method in interface org.apache.spark.ml.linalg.Matrix
Converts to a breeze matrix.
- asBreeze() - Method in interface org.apache.spark.ml.linalg.Vector
Converts the instance to a breeze vector.
- asBreeze() - Method in interface org.apache.spark.mllib.linalg.Matrix
Converts to a breeze matrix.
- asBreeze() - Method in interface org.apache.spark.mllib.linalg.Vector
Converts the instance to a breeze vector.
- asc() - Method in class org.apache.spark.sql.Column
Returns a sort expression based on ascending order of the column.
- asc(String) - Static method in class org.apache.spark.sql.functions
Returns a sort expression based on ascending order of the column.
- asc_nulls_first() - Method in class org.apache.spark.sql.Column
Returns a sort expression based on ascending order of the column,
and null values appear before non-null values.
- asc_nulls_first(String) - Static method in class org.apache.spark.sql.functions
Returns a sort expression based on ascending order of the column,
and null values appear before non-null values.
- asc_nulls_last() - Method in class org.apache.spark.sql.Column
Returns a sort expression based on ascending order of the column,
and null values appear after non-null values.
- asc_nulls_last(String) - Static method in class org.apache.spark.sql.functions
Returns a sort expression based on ascending order of the column,
and null values appear after non-null values.
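The difference between the nulls-first and nulls-last ascending orders can be seen with the standard `java.util.Comparator` helpers. This is an analogy in plain Java, not Spark code; the class `NullOrderingSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NullOrderingSketch {
    /** Ascending order with nulls placed before all non-null values. */
    static List<Integer> ascNullsFirst(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.nullsFirst(Comparator.<Integer>naturalOrder()));
        return sorted;
    }

    /** Ascending order with nulls placed after all non-null values. */
    static List<Integer> ascNullsLast(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
        return sorted;
    }
}
```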
- ascii(Column) - Static method in class org.apache.spark.sql.functions
Computes the numeric value of the first character of the string column, and returns the
result as an int column.
- asin(Column) - Static method in class org.apache.spark.sql.functions
- asin(String) - Static method in class org.apache.spark.sql.functions
- asIterator() - Method in class org.apache.spark.serializer.DeserializationStream
Read the elements of this stream through an iterator.
- asJavaPairRDD() - Method in class org.apache.spark.api.r.PairwiseRRDD
- asJavaRDD() - Method in class org.apache.spark.api.r.RRDD
- asJavaRDD() - Method in class org.apache.spark.api.r.StringRRDD
- asKeyValueIterator() - Method in class org.apache.spark.serializer.DeserializationStream
Read the elements of this stream through an iterator over key-value pairs.
- AskPermissionToCommitOutput - Class in org.apache.spark.scheduler
- AskPermissionToCommitOutput(int, int, int, int) - Constructor for class org.apache.spark.scheduler.AskPermissionToCommitOutput
- askRpcTimeout(SparkConf) - Static method in class org.apache.spark.util.RpcUtils
Returns the default Spark timeout to use for RPC ask operations.
- askSlaves() - Method in class org.apache.spark.storage.BlockManagerMessages.GetBlockStatus
- askSlaves() - Method in class org.apache.spark.storage.BlockManagerMessages.GetMatchingBlockIds
- asMap() - Method in class org.apache.spark.sql.sources.v2.DataSourceOptions
- asML() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
- asML() - Method in class org.apache.spark.mllib.linalg.DenseVector
- asML() - Method in interface org.apache.spark.mllib.linalg.Matrix
Convert this matrix to the new mllib-local representation.
- asML() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
- asML() - Method in class org.apache.spark.mllib.linalg.SparseVector
- asML() - Method in interface org.apache.spark.mllib.linalg.Vector
Convert this vector to the new mllib-local representation.
- asNondeterministic() - Method in class org.apache.spark.sql.expressions.UserDefinedFunction
Updates UserDefinedFunction to nondeterministic.
- asNonNullable() - Method in class org.apache.spark.sql.expressions.UserDefinedFunction
Updates UserDefinedFunction to non-nullable.
- asNullable() - Method in class org.apache.spark.sql.types.ObjectType
- asRDDId() - Method in class org.apache.spark.storage.BlockId
- assertConf(JobContext, SparkConf) - Method in class org.apache.spark.internal.io.HadoopWriteConfigUtil
- assertNotSpilled(SparkContext, String, Function0<BoxedUnit>) - Static method in class org.apache.spark.TestUtils
Run some code involving jobs submitted to the given context and assert that the jobs
did not spill.
- assertSpilled(SparkContext, String, Function0<BoxedUnit>) - Static method in class org.apache.spark.TestUtils
Run some code involving jobs submitted to the given context and assert that the jobs spilled.
- assignClusters(Dataset<?>) - Method in class org.apache.spark.ml.clustering.PowerIterationClustering
Runs the PIC algorithm and returns a cluster assignment for each input vertex.
- Assignment(long, int) - Constructor for class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
- Assignment$() - Constructor for class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment$
- assignments() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
- AssociationRules - Class in org.apache.spark.ml.fpm
- AssociationRules() - Constructor for class org.apache.spark.ml.fpm.AssociationRules
- associationRules() - Method in class org.apache.spark.ml.fpm.FPGrowthModel
Get association rules fitted using the minConfidence.
- AssociationRules - Class in org.apache.spark.mllib.fpm
Generates association rules from an RDD[FreqItemset[Item]].
- AssociationRules() - Constructor for class org.apache.spark.mllib.fpm.AssociationRules
Constructs a default instance with default parameters {minConfidence = 0.8}.
- AssociationRules.Rule<Item> - Class in org.apache.spark.mllib.fpm
An association rule between sets of items.
- ASYNC_TRACKING_ENABLED() - Static method in class org.apache.spark.status.config
- AsyncEventQueue - Class in org.apache.spark.scheduler
An asynchronous queue for events.
- AsyncEventQueue(String, SparkConf, LiveListenerBusMetrics, LiveListenerBus) - Constructor for class org.apache.spark.scheduler.AsyncEventQueue
- AsyncRDDActions<T> - Class in org.apache.spark.rdd
A set of asynchronous RDD actions available through an implicit conversion.
- AsyncRDDActions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.AsyncRDDActions
- atan(Column) - Static method in class org.apache.spark.sql.functions
- atan(String) - Static method in class org.apache.spark.sql.functions
- atan2(Column, Column) - Static method in class org.apache.spark.sql.functions
- atan2(Column, String) - Static method in class org.apache.spark.sql.functions
- atan2(String, Column) - Static method in class org.apache.spark.sql.functions
- atan2(String, String) - Static method in class org.apache.spark.sql.functions
- atan2(Column, double) - Static method in class org.apache.spark.sql.functions
- atan2(String, double) - Static method in class org.apache.spark.sql.functions
- atan2(double, Column) - Static method in class org.apache.spark.sql.functions
- atan2(double, String) - Static method in class org.apache.spark.sql.functions
- attempt() - Method in class org.apache.spark.status.api.v1.TaskData
- ATTEMPT() - Static method in class org.apache.spark.status.TaskIndexNames
- attemptId() - Method in class org.apache.spark.scheduler.StageInfo
- attemptId() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
- attemptId() - Method in interface org.apache.spark.status.api.v1.BaseAppResource
- attemptId() - Method in class org.apache.spark.status.api.v1.StageData
- attemptNumber() - Method in class org.apache.spark.BarrierTaskContext
- attemptNumber() - Method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
- attemptNumber() - Method in class org.apache.spark.scheduler.StageInfo
- attemptNumber() - Method in class org.apache.spark.scheduler.TaskInfo
- attemptNumber() - Method in class org.apache.spark.TaskCommitDenied
- attemptNumber() - Method in class org.apache.spark.TaskContext
How many times this task has been attempted.
- attempts() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
- AtTimestamp(Date) - Constructor for class org.apache.spark.streaming.kinesis.KinesisInitialPositions.AtTimestamp
- attr() - Method in class org.apache.spark.graphx.Edge
- attr() - Method in class org.apache.spark.graphx.EdgeContext
The attribute associated with the edge.
- attr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
- Attribute - Class in org.apache.spark.ml.attribute
:: DeveloperApi ::
Abstract class for ML attributes.
- Attribute() - Constructor for class org.apache.spark.ml.attribute.Attribute
- attribute() - Method in class org.apache.spark.sql.sources.EqualNullSafe
- attribute() - Method in class org.apache.spark.sql.sources.EqualTo
- attribute() - Method in class org.apache.spark.sql.sources.GreaterThan
- attribute() - Method in class org.apache.spark.sql.sources.GreaterThanOrEqual
- attribute() - Method in class org.apache.spark.sql.sources.In
- attribute() - Method in class org.apache.spark.sql.sources.IsNotNull
- attribute() - Method in class org.apache.spark.sql.sources.IsNull
- attribute() - Method in class org.apache.spark.sql.sources.LessThan
- attribute() - Method in class org.apache.spark.sql.sources.LessThanOrEqual
- attribute() - Method in class org.apache.spark.sql.sources.StringContains
- attribute() - Method in class org.apache.spark.sql.sources.StringEndsWith
- attribute() - Method in class org.apache.spark.sql.sources.StringStartsWith
- AttributeFactory - Interface in org.apache.spark.ml.attribute
Trait for ML attribute factories.
- AttributeGroup - Class in org.apache.spark.ml.attribute
:: DeveloperApi ::
Attributes that describe a vector ML column.
- AttributeGroup(String) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group without attribute info.
- AttributeGroup(String, int) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group knowing only the number of attributes.
- AttributeGroup(String, Attribute[]) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group with attributes.
- AttributeKeys - Class in org.apache.spark.ml.attribute
Keys used to store attributes.
- AttributeKeys() - Constructor for class org.apache.spark.ml.attribute.AttributeKeys
- attributes() - Method in class org.apache.spark.ml.attribute.AttributeGroup
Optional array of attributes.
- ATTRIBUTES() - Static method in class org.apache.spark.ml.attribute.AttributeKeys
- AttributeType - Class in org.apache.spark.ml.attribute
:: DeveloperApi ::
An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.
- AttributeType(String) - Constructor for class org.apache.spark.ml.attribute.AttributeType
- attrType() - Method in class org.apache.spark.ml.attribute.Attribute
Attribute type.
- attrType() - Method in class org.apache.spark.ml.attribute.BinaryAttribute
- attrType() - Method in class org.apache.spark.ml.attribute.NominalAttribute
- attrType() - Method in class org.apache.spark.ml.attribute.NumericAttribute
- attrType() - Static method in class org.apache.spark.ml.attribute.UnresolvedAttribute
- available() - Method in class org.apache.spark.io.NioBufferedFileInputStream
- available() - Method in class org.apache.spark.io.ReadAheadInputStream
- available() - Method in class org.apache.spark.storage.BufferReleasingInputStream
- Average() - Static method in class org.apache.spark.mllib.tree.configuration.EnsembleCombiningStrategy
- avg(MapFunction<T, Double>) - Static method in class org.apache.spark.sql.expressions.javalang.typed
Average aggregate function.
- avg(Function1<IN, Object>) - Static method in class org.apache.spark.sql.expressions.scalalang.typed
Average aggregate function.
- avg(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the average of the values in a group.
- avg(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the average of the values in a group.
- avg(String...) - Method in class org.apache.spark.sql.RelationalGroupedDataset
Computes the mean value of each numeric column for each group.
- avg(Seq<String>) - Method in class org.apache.spark.sql.RelationalGroupedDataset
Computes the mean value of each numeric column for each group.
- avg() - Method in class org.apache.spark.util.DoubleAccumulator
Returns the average of elements added to the accumulator.
- avg() - Method in class org.apache.spark.util.LongAccumulator
Returns the average of elements added to the accumulator.
- avgEventRate() - Method in class org.apache.spark.status.api.v1.streaming.ReceiverInfo
- avgInputRate() - Method in class org.apache.spark.status.api.v1.streaming.StreamingStatistics
- avgMetrics() - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
- avgProcessingTime() - Method in class org.apache.spark.status.api.v1.streaming.StreamingStatistics
- avgSchedulingDelay() - Method in class org.apache.spark.status.api.v1.streaming.StreamingStatistics
- avgTotalDelay() - Method in class org.apache.spark.status.api.v1.streaming.StreamingStatistics
- awaitAnyTermination() - Method in class org.apache.spark.sql.streaming.StreamingQueryManager
Wait until any of the queries on the associated SQLContext has terminated since the
creation of the context, or since resetTerminated() was called.
- awaitAnyTermination(long) - Method in class org.apache.spark.sql.streaming.StreamingQueryManager
Wait until any of the queries on the associated SQLContext has terminated since the
creation of the context, or since resetTerminated() was called.
- awaitReady(Awaitable<T>, Duration) - Static method in class org.apache.spark.util.ThreadUtils
Preferred alternative to Await.ready()
- awaitResult(Awaitable<T>, Duration) - Static method in class org.apache.spark.util.ThreadUtils
Preferred alternative to Await.result()
- awaitTermination() - Method in interface org.apache.spark.sql.streaming.StreamingQuery
Waits for the termination of this query, either by query.stop() or by an exception.
- awaitTermination(long) - Method in interface org.apache.spark.sql.streaming.StreamingQuery
Waits for the termination of this query, either by query.stop() or by an exception.
- awaitTermination() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Wait for the execution to stop.
- awaitTermination() - Method in class org.apache.spark.streaming.StreamingContext
Wait for the execution to stop.
- awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Wait for the execution to stop.
- awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.StreamingContext
Wait for the execution to stop.
- axpy(double, Vector, Vector) - Static method in class org.apache.spark.ml.linalg.BLAS
y += a * x
- axpy(double, Vector, Vector) - Static method in class org.apache.spark.mllib.linalg.BLAS
y += a * x
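The update `y += a * x` is applied element by element and modifies `y` in place. A minimal plain-Java sketch on primitive arrays follows; the class `AxpySketch` is hypothetical, it does not use the Spark Vector or BLAS classes, and the length check is an assumption of the sketch.

```java
final class AxpySketch {
    /** Adds a * x[i] to y[i] for every index, mutating y. */
    static void axpy(double a, double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        for (int i = 0; i < x.length; i++) {
            y[i] += a * x[i];
        }
    }
}
```

With a = 2, x = {1, 2}, and y = {1, 1}, the array y becomes {3, 5} after the call.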
- cache() - Method in class org.apache.spark.api.java.JavaDoubleRDD
Persist this RDD with the default storage level (MEMORY_ONLY).
- cache() - Method in class org.apache.spark.api.java.JavaPairRDD
Persist this RDD with the default storage level (MEMORY_ONLY).
- cache() - Method in class org.apache.spark.api.java.JavaRDD
Persist this RDD with the default storage level (MEMORY_ONLY).
- cache() - Method in class org.apache.spark.graphx.Graph
Caches the vertices and edges associated with this graph at the previously-specified target
storage levels, which default to MEMORY_ONLY
- cache() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
Persists the edge partitions using targetStorageLevel, which defaults to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.graphx.impl.GraphImpl
- cache() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
Persists the vertex partitions at targetStorageLevel, which defaults to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
Caches the underlying RDD.
- cache() - Method in class org.apache.spark.rdd.RDD
Persist this RDD with the default storage level (MEMORY_ONLY).
- cache() - Method in class org.apache.spark.sql.Dataset
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
- cache() - Method in class org.apache.spark.streaming.api.java.JavaDStream
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cache() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cache() - Method in class org.apache.spark.streaming.dstream.DStream
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cacheNodeIds() - Method in interface org.apache.spark.ml.tree.DecisionTreeParams
If false, the algorithm will pass trees to executors to match instances with nodes.
- cacheSize() - Method in interface org.apache.spark.SparkExecutorInfo
- cacheSize() - Method in class org.apache.spark.SparkExecutorInfoImpl
- cacheTable(String) - Method in class org.apache.spark.sql.catalog.Catalog
Caches the specified table in-memory.
- cacheTable(String, StorageLevel) - Method in class org.apache.spark.sql.catalog.Catalog
Caches the specified table with the given storage level.
- cacheTable(String) - Method in class org.apache.spark.sql.SQLContext
Caches the specified table in-memory.
- calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.regression.AFTCostFun
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
:: DeveloperApi ::
variance calculation
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
:: DeveloperApi ::
variance calculation
- calculate(double[], double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
:: DeveloperApi ::
information calculation for regression
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
:: DeveloperApi ::
variance calculation
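The two `calculate` shapes listed for the impurity classes take either per-class counts with a total, or a count, sum, and sum of squares. The textbook formulas for those inputs can be sketched in plain Java as below. This is an illustration under assumptions, not the Spark source: the class `ImpuritySketch` and its method names are hypothetical, entropy is taken in base 2, and an empty node is given impurity 0.

```java
final class ImpuritySketch {
    /** Gini impurity: 1 minus the sum of squared class frequencies. */
    static double gini(double[] counts, double totalCount) {
        if (totalCount == 0) {
            return 0.0;
        }
        double impurity = 1.0;
        for (double count : counts) {
            double freq = count / totalCount;
            impurity -= freq * freq;
        }
        return impurity;
    }

    /** Entropy in bits: minus the sum of freq * log2(freq) over non-empty classes. */
    static double entropy(double[] counts, double totalCount) {
        if (totalCount == 0) {
            return 0.0;
        }
        double impurity = 0.0;
        for (double count : counts) {
            if (count != 0) {
                double freq = count / totalCount;
                impurity -= freq * (Math.log(freq) / Math.log(2.0));
            }
        }
        return impurity;
    }

    /** Population variance from sufficient statistics: E[x^2] - (E[x])^2. */
    static double variance(double count, double sum, double sumSquares) {
        if (count == 0) {
            return 0.0;
        }
        double mean = sum / count;
        return sumSquares / count - mean * mean;
    }
}
```

For two equally sized classes, Gini impurity is 0.5 and entropy is 1 bit; for the values 1 and 3 (count 2, sum 4, sum of squares 10), the variance is 1.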
- calculateNumberOfPartitions(long, int, int) - Method in class org.apache.spark.ml.feature.Word2VecModel.Word2VecModelWriter$
Calculate the number of partitions to use in saving the model.
- CalendarIntervalType - Class in org.apache.spark.sql.types
The data type representing calendar time intervals.
- CalendarIntervalType() - Constructor for class org.apache.spark.sql.types.CalendarIntervalType
- CalendarIntervalType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the CalendarIntervalType object.
- call(K, Iterator<V1>, Iterator<V2>) - Method in interface org.apache.spark.api.java.function.CoGroupFunction
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFlatMapFunction
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFunction
- call(T) - Method in interface org.apache.spark.api.java.function.FilterFunction
- call(T) - Method in interface org.apache.spark.api.java.function.FlatMapFunction
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.FlatMapFunction2
- call(K, Iterator<V>) - Method in interface org.apache.spark.api.java.function.FlatMapGroupsFunction
- call(K, Iterator<V>, GroupState<S>) - Method in interface org.apache.spark.api.java.function.FlatMapGroupsWithStateFunction
- call(T) - Method in interface org.apache.spark.api.java.function.ForeachFunction
- call(Iterator<T>) - Method in interface org.apache.spark.api.java.function.ForeachPartitionFunction
- call(T1) - Method in interface org.apache.spark.api.java.function.Function
- call() - Method in interface org.apache.spark.api.java.function.Function0
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.Function2
- call(T1, T2, T3) - Method in interface org.apache.spark.api.java.function.Function3
- call(T1, T2, T3, T4) - Method in interface org.apache.spark.api.java.function.Function4
- call(T) - Method in interface org.apache.spark.api.java.function.MapFunction
- call(K, Iterator<V>) - Method in interface org.apache.spark.api.java.function.MapGroupsFunction
- call(K, Iterator<V>, GroupState<S>) - Method in interface org.apache.spark.api.java.function.MapGroupsWithStateFunction
- call(Iterator<T>) - Method in interface org.apache.spark.api.java.function.MapPartitionsFunction
- call(T) - Method in interface org.apache.spark.api.java.function.PairFlatMapFunction
- call(T) - Method in interface org.apache.spark.api.java.function.PairFunction
- call(T, T) - Method in interface org.apache.spark.api.java.function.ReduceFunction
- call(T) - Method in interface org.apache.spark.api.java.function.VoidFunction
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.VoidFunction2
- call() - Method in interface org.apache.spark.sql.api.java.UDF0
- call(T1) - Method in interface org.apache.spark.sql.api.java.UDF1
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) - Method in interface org.apache.spark.sql.api.java.UDF10
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11) - Method in interface org.apache.spark.sql.api.java.UDF11
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12) - Method in interface org.apache.spark.sql.api.java.UDF12
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13) - Method in interface org.apache.spark.sql.api.java.UDF13
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14) - Method in interface org.apache.spark.sql.api.java.UDF14
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15) - Method in interface org.apache.spark.sql.api.java.UDF15
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16) - Method in interface org.apache.spark.sql.api.java.UDF16
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17) - Method in interface org.apache.spark.sql.api.java.UDF17
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18) - Method in interface org.apache.spark.sql.api.java.UDF18
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19) - Method in interface org.apache.spark.sql.api.java.UDF19
- call(T1, T2) - Method in interface org.apache.spark.sql.api.java.UDF2
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20) - Method in interface org.apache.spark.sql.api.java.UDF20
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21) - Method in interface org.apache.spark.sql.api.java.UDF21
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22) - Method in interface org.apache.spark.sql.api.java.UDF22
- call(T1, T2, T3) - Method in interface org.apache.spark.sql.api.java.UDF3
- call(T1, T2, T3, T4) - Method in interface org.apache.spark.sql.api.java.UDF4
- call(T1, T2, T3, T4, T5) - Method in interface org.apache.spark.sql.api.java.UDF5
- call(T1, T2, T3, T4, T5, T6) - Method in interface org.apache.spark.sql.api.java.UDF6
- call(T1, T2, T3, T4, T5, T6, T7) - Method in interface org.apache.spark.sql.api.java.UDF7
- call(T1, T2, T3, T4, T5, T6, T7, T8) - Method in interface org.apache.spark.sql.api.java.UDF8
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9) - Method in interface org.apache.spark.sql.api.java.UDF9
- callSite() - Method in class org.apache.spark.storage.RDDInfo
- callUDF(String, Column...) - Static method in class org.apache.spark.sql.functions
Calls a user-defined function.
- callUDF(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Calls a user-defined function.
- cancel() - Method in class org.apache.spark.ComplexFutureAction
- cancel() - Method in interface org.apache.spark.FutureAction
Cancels the execution of this action.
- cancel() - Method in class org.apache.spark.SimpleFutureAction
- cancelAllJobs() - Method in class org.apache.spark.api.java.JavaSparkContext
Cancel all jobs that have been scheduled or are running.
- cancelAllJobs() - Method in class org.apache.spark.SparkContext
Cancel all jobs that have been scheduled or are running.
- cancelJob(int, String) - Method in class org.apache.spark.SparkContext
Cancel a given job if it's scheduled or running.
- cancelJob(int) - Method in class org.apache.spark.SparkContext
Cancel a given job if it's scheduled or running.
- cancelJobGroup(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Cancel active jobs for the specified group.
- cancelJobGroup(String) - Method in class org.apache.spark.SparkContext
Cancel active jobs for the specified group.
- cancelStage(int, String) - Method in class org.apache.spark.SparkContext
Cancel a given stage and all jobs associated with it.
- cancelStage(int) - Method in class org.apache.spark.SparkContext
Cancel a given stage and all jobs associated with it.
- cancelTasks(int, boolean) - Method in interface org.apache.spark.scheduler.TaskScheduler
- canCreate(String) - Method in interface org.apache.spark.scheduler.ExternalClusterManager
Check if this cluster manager instance can create scheduler components
for a certain master URL.
- canDoMerge() - Method in class org.apache.spark.sql.hive.HiveUDAFBuffer
- canEqual(Object) - Static method in class org.apache.spark.ExpireDeadHosts
- canEqual(Object) - Static method in class org.apache.spark.ml.feature.Dot
- canEqual(Object) - Static method in class org.apache.spark.Resubmitted
- canEqual(Object) - Static method in class org.apache.spark.rpc.netty.OnStart
- canEqual(Object) - Static method in class org.apache.spark.rpc.netty.OnStop
- canEqual(Object) - Static method in class org.apache.spark.scheduler.AllJobsCancelled
- canEqual(Object) - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
- canEqual(Object) - Static method in class org.apache.spark.scheduler.JobSucceeded
- canEqual(Object) - Static method in class org.apache.spark.scheduler.ResubmitFailedStages
- canEqual(Object) - Static method in class org.apache.spark.scheduler.StopCoordinator
- canEqual(Object) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
- canEqual(Object) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
- canEqual(Object) - Static method in class org.apache.spark.sql.jdbc.TeradataDialect
- canEqual(Object) - Static method in class org.apache.spark.sql.types.BinaryType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.BooleanType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.ByteType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.CalendarIntervalType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.DateType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.DoubleType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.FloatType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.IntegerType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.LongType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.NullType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.ShortType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.StringType
- canEqual(Object) - Static method in class org.apache.spark.sql.types.TimestampType
- canEqual(Object) - Static method in class org.apache.spark.StopMapOutputTracker
- canEqual(Object) - Static method in class org.apache.spark.streaming.kinesis.DefaultCredentials
- canEqual(Object) - Static method in class org.apache.spark.streaming.scheduler.AllReceiverIds
- canEqual(Object) - Static method in class org.apache.spark.streaming.scheduler.GetAllReceiverInfo
- canEqual(Object) - Static method in class org.apache.spark.streaming.scheduler.StopAllReceivers
- canEqual(Object) - Static method in class org.apache.spark.Success
- canEqual(Object) - Static method in class org.apache.spark.TaskResultLost
- canEqual(Object) - Static method in class org.apache.spark.TaskSchedulerIsSet
- canEqual(Object) - Static method in class org.apache.spark.UnknownReason
- canEqual(Object) - Method in class org.apache.spark.util.MutablePair
- canHandle(String) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.DB2Dialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
- canHandle(String) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Check if this dialect instance can handle a certain jdbc url.
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.NoopDialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.TeradataDialect
- CanonicalRandomVertexCut$() - Constructor for class org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut$
- canWrite(DataType, DataType, Function2<String, String, Object>, String, Function1<String, BoxedUnit>) - Static method in class org.apache.spark.sql.types.DataType
Returns true if the write data type can be read using the read data type.
- cartesian(JavaRDDLike<U, ?>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
- cartesian(RDD<U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
- caseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
Whether to do a case sensitive comparison over the stop words.
- cast(DataType) - Method in class org.apache.spark.sql.Column
Casts the column to a different data type.
- cast(String) - Method in class org.apache.spark.sql.Column
Casts the column to a different data type, using the canonical string representation
of the type.
- Catalog - Class in org.apache.spark.sql.catalog
Catalog interface for Spark.
- Catalog() - Constructor for class org.apache.spark.sql.catalog.Catalog
- catalog() - Method in class org.apache.spark.sql.SparkSession
Interface through which the user may create, drop, alter or query underlying
databases, tables, functions etc.
- catalogString() - Method in class org.apache.spark.sql.types.ArrayType
- catalogString() - Static method in class org.apache.spark.sql.types.BinaryType
- catalogString() - Static method in class org.apache.spark.sql.types.BooleanType
- catalogString() - Static method in class org.apache.spark.sql.types.ByteType
- catalogString() - Static method in class org.apache.spark.sql.types.CalendarIntervalType
- catalogString() - Method in class org.apache.spark.sql.types.DataType
String representation for the type saved in external catalogs.
- catalogString() - Static method in class org.apache.spark.sql.types.DateType
- catalogString() - Static method in class org.apache.spark.sql.types.DoubleType
- catalogString() - Static method in class org.apache.spark.sql.types.FloatType
- catalogString() - Static method in class org.apache.spark.sql.types.IntegerType
- catalogString() - Static method in class org.apache.spark.sql.types.LongType
- catalogString() - Method in class org.apache.spark.sql.types.MapType
- catalogString() - Static method in class org.apache.spark.sql.types.NullType
- catalogString() - Static method in class org.apache.spark.sql.types.ShortType
- catalogString() - Static method in class org.apache.spark.sql.types.StringType
- catalogString() - Method in class org.apache.spark.sql.types.StructType
- catalogString() - Static method in class org.apache.spark.sql.types.TimestampType
- CatalystScan - Interface in org.apache.spark.sql.sources
An interface for experimenting with a more direct connection to the query planner.
- Categorical() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
- categoricalCols() - Method in class org.apache.spark.ml.feature.FeatureHasher
Numeric columns to treat as categorical features.
- categoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
- CategoricalSplit - Class in org.apache.spark.ml.tree
Split which tests a categorical feature.
- categories() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$.SplitData
- categories() - Method in class org.apache.spark.mllib.tree.model.Split
- categoryMaps() - Method in class org.apache.spark.ml.feature.VectorIndexerModel
- categorySizes() - Method in class org.apache.spark.ml.feature.OneHotEncoderModel
- cause() - Method in exception org.apache.spark.sql.AnalysisException
- cause() - Method in exception org.apache.spark.sql.streaming.StreamingQueryException
- CausedBy - Class in org.apache.spark.util
Extractor Object for pulling out the root cause of an error.
- CausedBy() - Constructor for class org.apache.spark.util.CausedBy
- cbrt(Column) - Static method in class org.apache.spark.sql.functions
Computes the cube-root of the given value.
- cbrt(String) - Static method in class org.apache.spark.sql.functions
Computes the cube-root of the given column.
- ceil(Column) - Static method in class org.apache.spark.sql.functions
Computes the ceiling of the given value.
- ceil(String) - Static method in class org.apache.spark.sql.functions
Computes the ceiling of the given column.
- ceil() - Method in class org.apache.spark.sql.types.Decimal
- censorCol() - Method in interface org.apache.spark.ml.regression.AFTSurvivalRegressionParams
Param for censor column name.
- chainl1(Function0<Parsers.Parser<T>>, Function0<Parsers.Parser<Function2<T, T, T>>>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- chainl1(Function0<Parsers.Parser<T>>, Function0<Parsers.Parser<U>>, Function0<Parsers.Parser<Function2<T, U, T>>>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- chainr1(Function0<Parsers.Parser<T>>, Function0<Parsers.Parser<Function2<T, U, U>>>, Function2<T, U, U>, U) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- changePrecision(int, int) - Method in class org.apache.spark.sql.types.Decimal
Update precision and scale while keeping our value the same, and return true if successful.
- channelRead0(ChannelHandlerContext, byte[]) - Method in class org.apache.spark.api.r.RBackendAuthHandler
- CharType - Class in org.apache.spark.sql.types
Hive char type.
- CharType(int) - Constructor for class org.apache.spark.sql.types.CharType
- checkAndGetK8sMasterUrl(String) - Static method in class org.apache.spark.util.Utils
Check the validity of the given Kubernetes master URL and return the resolved URL.
- checkColumnNameDuplication(Seq<String>, String, Function2<String, String, Object>) - Static method in class org.apache.spark.sql.util.SchemaUtils
Checks if input column names have duplicate identifiers.
- checkColumnNameDuplication(Seq<String>, String, boolean) - Static method in class org.apache.spark.sql.util.SchemaUtils
Checks if input column names have duplicate identifiers.
- checkColumnType(StructType, String, DataType, String) - Static method in class org.apache.spark.ml.util.SchemaUtils
Check whether the given schema contains a column of the required data type.
- checkColumnTypes(StructType, String, Seq<DataType>, String) - Static method in class org.apache.spark.ml.util.SchemaUtils
Check whether the given schema contains a column of one of the required data types.
- checkDataColumns(RFormula, Dataset<?>) - Static method in class org.apache.spark.ml.r.RWrapperUtils
DataFrame column check.
- checkedCast() - Method in interface org.apache.spark.ml.recommendation.ALSModelParams
Attempts to safely cast a user/item id to an Int.
- checkFileExists(String, Configuration) - Static method in class org.apache.spark.streaming.util.HdfsUtils
Check if the file exists at the given path.
- checkHost(String) - Static method in class org.apache.spark.util.Utils
- checkHostPort(String) - Static method in class org.apache.spark.util.Utils
- checkNumericType(StructType, String, String) - Static method in class org.apache.spark.ml.util.SchemaUtils
Check whether the given schema contains a column of the numeric data type.
- checkpoint() - Method in interface org.apache.spark.api.java.JavaRDDLike
Mark this RDD for checkpointing.
- checkpoint() - Method in class org.apache.spark.graphx.Graph
Mark this Graph for checkpointing.
- checkpoint() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
- checkpoint() - Method in class org.apache.spark.graphx.impl.GraphImpl
- checkpoint() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
- checkpoint() - Method in class org.apache.spark.rdd.HadoopRDD
- checkpoint() - Method in class org.apache.spark.rdd.RDD
Mark this RDD for checkpointing.
- checkpoint() - Method in class org.apache.spark.sql.Dataset
Eagerly checkpoint a Dataset and return the new Dataset.
- checkpoint(boolean) - Method in class org.apache.spark.sql.Dataset
Returns a checkpointed version of this Dataset.
- checkpoint(Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.
- checkpoint(Duration) - Method in class org.apache.spark.streaming.dstream.DStream
Enable periodic checkpointing of RDDs of this DStream
- checkpoint(String) - Method in class org.apache.spark.streaming.StreamingContext
Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
- checkpointCleaned(long) - Method in interface org.apache.spark.CleanerListener
- Checkpointed() - Static method in class org.apache.spark.rdd.CheckpointState
- CheckpointingInProgress() - Static method in class org.apache.spark.rdd.CheckpointState
- checkpointInterval() - Method in interface org.apache.spark.ml.param.shared.HasCheckpointInterval
Param for setting the checkpoint interval (>= 1) or disabling checkpointing (-1).
- checkpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
- CheckpointReader - Class in org.apache.spark.streaming
- CheckpointReader() - Constructor for class org.apache.spark.streaming.CheckpointReader
- CheckpointState - Class in org.apache.spark.rdd
Enumeration to manage state transitions of an RDD through checkpointing
- CheckpointState() - Constructor for class org.apache.spark.rdd.CheckpointState
- checkSchemaColumnNameDuplication(StructType, String, boolean) - Static method in class org.apache.spark.sql.util.SchemaUtils
Checks if an input schema has duplicate column names.
- checkSingleVsMultiColumnParams(Params, Seq<Param<?>>, Seq<Param<?>>) - Static method in class org.apache.spark.ml.param.ParamValidators
Utility for Param validity checks for Transformers which have both single- and multi-column support.
- checkSpeculatableTasks(int) - Method in interface org.apache.spark.scheduler.Schedulable
- checkState(boolean, Function0<String>) - Static method in class org.apache.spark.streaming.util.HdfsUtils
- checkThresholdConsistency() - Method in interface org.apache.spark.ml.classification.LogisticRegressionParams
If threshold and thresholds are both set, ensures they are consistent.
- child() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformationExec
- child() - Method in class org.apache.spark.sql.sources.Not
- CHILD_CONNECTION_TIMEOUT - Static variable in class org.apache.spark.launcher.SparkLauncher
Maximum time (in ms) to wait for a child process to connect back to the launcher server
when using start().
- CHILD_PROCESS_LOGGER_NAME - Static variable in class org.apache.spark.launcher.SparkLauncher
Logger name to use when launching a child process.
- ChildFirstURLClassLoader - Class in org.apache.spark.util
A mutable class loader that gives preference to its own URLs over the parent class loader
when loading classes and resources.
- ChildFirstURLClassLoader(URL[], ClassLoader) - Constructor for class org.apache.spark.util.ChildFirstURLClassLoader
- chiSqFunc() - Method in class org.apache.spark.mllib.stat.test.ChiSqTest.Method
- ChiSqSelector - Class in org.apache.spark.ml.feature
Chi-Squared feature selection, which selects categorical features to use for predicting a
categorical label.
- ChiSqSelector(String) - Constructor for class org.apache.spark.ml.feature.ChiSqSelector
- ChiSqSelector() - Constructor for class org.apache.spark.ml.feature.ChiSqSelector
- ChiSqSelector - Class in org.apache.spark.mllib.feature
Creates a ChiSquared feature selector.
- ChiSqSelector() - Constructor for class org.apache.spark.mllib.feature.ChiSqSelector
- ChiSqSelector(int) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelector
This is the same as calling this() and setNumTopFeatures(numTopFeatures).
- ChiSqSelectorModel - Class in org.apache.spark.ml.feature
- ChiSqSelectorModel - Class in org.apache.spark.mllib.feature
Chi Squared selector model.
- ChiSqSelectorModel(int[]) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelectorModel
- ChiSqSelectorModel.SaveLoadV1_0$ - Class in org.apache.spark.mllib.feature
- ChiSqSelectorModel.SaveLoadV1_0$.Data - Class in org.apache.spark.mllib.feature
Model data for import/export
- ChiSqSelectorModel.SaveLoadV1_0$.Data$ - Class in org.apache.spark.mllib.feature
- ChiSqSelectorParams - Interface in org.apache.spark.ml.feature
- chiSqTest(Vector, Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's chi-squared goodness of fit test of the observed data against the
expected distribution.
- chiSqTest(Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform
distribution, with each category having an expected frequency of 1 / observed.size
- chiSqTest(Matrix) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's independence test on the input contingency matrix, which cannot contain
negative entries or columns or rows that sum up to 0.
- chiSqTest(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's independence test for every feature against the label across the input RDD.
- chiSqTest(JavaRDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
Java-friendly version of chiSqTest()
- ChiSqTest - Class in org.apache.spark.mllib.stat.test
Conduct the chi-squared test for the input RDDs using the specified method.
- ChiSqTest() - Constructor for class org.apache.spark.mllib.stat.test.ChiSqTest
- ChiSqTest.Method - Class in org.apache.spark.mllib.stat.test
param: name String name for the method.
- ChiSqTest.Method$ - Class in org.apache.spark.mllib.stat.test
- ChiSqTest.NullHypothesis$ - Class in org.apache.spark.mllib.stat.test
- ChiSqTestResult - Class in org.apache.spark.mllib.stat.test
Object containing the test results for the chi-squared hypothesis test.
- chiSquared(Vector, Vector, String) - Static method in class org.apache.spark.mllib.stat.test.ChiSqTest
- chiSquaredFeatures(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.stat.test.ChiSqTest
Conduct Pearson's independence test for each feature against the label across the input RDD.
- chiSquaredMatrix(Matrix, String) - Static method in class org.apache.spark.mllib.stat.test.ChiSqTest
- ChiSquareTest - Class in org.apache.spark.ml.stat
:: Experimental ::
- ChiSquareTest() - Constructor for class org.apache.spark.ml.stat.ChiSquareTest
- chmod700(File) - Static method in class org.apache.spark.util.Utils
JDK equivalent of chmod 700 file
- CholeskyDecomposition - Class in org.apache.spark.mllib.linalg
Compute Cholesky decomposition.
- CholeskyDecomposition() - Constructor for class org.apache.spark.mllib.linalg.CholeskyDecomposition
- cipherStream() - Method in interface org.apache.spark.security.CryptoStreamUtils.BaseErrorHandler
The encrypted stream that may get into an unhealthy state.
- classForName(String) - Static method in class org.apache.spark.util.Utils
Preferred alternative to Class.forName(className)
- Classification() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
- ClassificationLoss - Interface in org.apache.spark.mllib.tree.loss
- ClassificationModel<FeaturesType,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
:: DeveloperApi ::
- ClassificationModel() - Constructor for class org.apache.spark.ml.classification.ClassificationModel
- ClassificationModel - Interface in org.apache.spark.mllib.classification
Represents a classification model that predicts to which of a set of categories an example belongs.
- Classifier<FeaturesType,E extends Classifier<FeaturesType,E,M>,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
:: DeveloperApi ::
- Classifier() - Constructor for class org.apache.spark.ml.classification.Classifier
- classifier() - Method in interface org.apache.spark.ml.classification.OneVsRestParams
param for the base binary classifier that we reduce multiclass classification into.
- ClassifierParams - Interface in org.apache.spark.ml.classification
(private[spark]) Params for classification.
- ClassifierTypeTrait - Interface in org.apache.spark.ml.classification
- classIsLoadable(String) - Static method in class org.apache.spark.util.Utils
Determines whether the provided class is loadable in the current thread.
- className() - Method in class org.apache.spark.ExceptionFailure
- className() - Static method in class org.apache.spark.ml.linalg.JsonMatrixConverter
Unique class name for identifying JSON object encoded by this class.
- className() - Method in class org.apache.spark.sql.catalog.Function
- classpathEntries() - Method in class org.apache.spark.status.api.v1.ApplicationEnvironmentInfo
- classTag() - Method in class org.apache.spark.api.java.JavaDoubleRDD
- classTag() - Method in class org.apache.spark.api.java.JavaPairRDD
- classTag() - Method in class org.apache.spark.api.java.JavaRDD
- classTag() - Method in interface org.apache.spark.api.java.JavaRDDLike
- classTag() - Method in class org.apache.spark.sql.Dataset
- classTag() - Method in class org.apache.spark.storage.memory.DeserializedMemoryEntry
- classTag() - Method in interface org.apache.spark.storage.memory.MemoryEntry
- classTag() - Method in class org.apache.spark.storage.memory.SerializedMemoryEntry
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaDStream
- classTag() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaInputDStream
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
- clean(long, boolean) - Method in class org.apache.spark.streaming.util.WriteAheadLog
Clean all the records that are older than the threshold time.
- clean(Object, boolean, boolean) - Static method in class org.apache.spark.util.ClosureCleaner
Clean the given closure in place.
- CleanAccum - Class in org.apache.spark
- CleanAccum(long) - Constructor for class org.apache.spark.CleanAccum
- CleanBroadcast - Class in org.apache.spark
- CleanBroadcast(long) - Constructor for class org.apache.spark.CleanBroadcast
- CleanCheckpoint - Class in org.apache.spark
- CleanCheckpoint(int) - Constructor for class org.apache.spark.CleanCheckpoint
- CleanerListener - Interface in org.apache.spark
Listener class used for testing when any item has been cleaned by the Cleaner class.
- cleaning() - Method in class org.apache.spark.status.LiveStage
- CleanRDD - Class in org.apache.spark
- CleanRDD(int) - Constructor for class org.apache.spark.CleanRDD
- CleanShuffle - Class in org.apache.spark
- CleanShuffle(int) - Constructor for class org.apache.spark.CleanShuffle
- cleanupOldBlocks(long) - Method in interface org.apache.spark.streaming.receiver.ReceivedBlockHandler
Cleanup old blocks older than the given threshold time
- CleanupTask - Interface in org.apache.spark
Classes that represent cleaning tasks.
- CleanupTaskWeakReference - Class in org.apache.spark
A WeakReference associated with a CleanupTask.
- CleanupTaskWeakReference(CleanupTask, Object, ReferenceQueue<Object>) - Constructor for class org.apache.spark.CleanupTaskWeakReference
- clear(Param<?>) - Method in interface org.apache.spark.ml.param.Params
Clears the user-supplied value for the input param.
- clear() - Method in class org.apache.spark.sql.util.ExecutionListenerManager
- clear() - Static method in class org.apache.spark.util.AccumulatorContext
- clearActive() - Static method in class org.apache.spark.sql.SQLContext
- clearActiveSession() - Static method in class org.apache.spark.sql.SparkSession
Clears the active SparkSession for current thread.
- clearCache() - Method in class org.apache.spark.sql.catalog.Catalog
Removes all cached tables from the in-memory cache.
- clearCache() - Method in class org.apache.spark.sql.SQLContext
Removes all cached tables from the in-memory cache.
- clearCallSite() - Method in class org.apache.spark.api.java.JavaSparkContext
Pass-through to SparkContext.clearCallSite.
- clearCallSite() - Method in class org.apache.spark.SparkContext
Clear the thread-local property for overriding the call sites
of actions and RDDs.
- clearDefaultSession() - Static method in class org.apache.spark.sql.SparkSession
Clears the default SparkSession that is returned by the builder.
- clearDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
- clearDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
- clearDependencies() - Method in class org.apache.spark.rdd.UnionRDD
- clearJobGroup() - Method in class org.apache.spark.api.java.JavaSparkContext
Clear the current thread's job group ID and its description.
- clearJobGroup() - Method in class org.apache.spark.SparkContext
Clear the current thread's job group ID and its description.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
Clears the threshold so that predict will output raw prediction scores.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
Clears the threshold so that predict will output raw prediction scores.
- Clock - Interface in org.apache.spark.util
An interface to represent clocks, so that they can be mocked out in unit tests.
- CLogLog$() - Constructor for class org.apache.spark.ml.regression.GeneralizedLinearRegression.CLogLog$
- clone() - Method in class org.apache.spark.SparkConf
Copy this object
- clone() - Method in class org.apache.spark.sql.ExperimentalMethods
- clone() - Method in class org.apache.spark.sql.types.Decimal
- clone() - Method in class org.apache.spark.sql.util.ExecutionListenerManager
Get an identical copy of this listener manager.
- clone() - Method in class org.apache.spark.storage.StorageLevel
- clone() - Method in class org.apache.spark.util.random.BernoulliCellSampler
- clone() - Method in class org.apache.spark.util.random.BernoulliSampler
- clone() - Method in class org.apache.spark.util.random.PoissonSampler
- clone() - Method in interface org.apache.spark.util.random.RandomSampler
return a copy of the RandomSampler object
- clone(T, SerializerInstance, ClassTag<T>) - Static method in class org.apache.spark.util.Utils
Clone an object using a Spark serializer.
- cloneComplement() - Method in class org.apache.spark.util.random.BernoulliCellSampler
Return a sampler that is the complement of the range specified by the current sampler.
- cloneProperties(Properties) - Static method in class org.apache.spark.util.Utils
Create a new properties object with the same values as `props`
- close() - Method in class org.apache.spark.api.java.JavaSparkContext
- close() - Method in class org.apache.spark.io.NioBufferedFileInputStream
- close() - Method in class org.apache.spark.io.ReadAheadInputStream
- close() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
- close() - Method in interface org.apache.spark.security.CryptoStreamUtils.BaseErrorHandler
- close() - Method in class org.apache.spark.serializer.DeserializationStream
- close() - Method in class org.apache.spark.serializer.SerializationStream
- close(Throwable) - Method in class org.apache.spark.sql.ForeachWriter
Called when processing of one partition of new data stops on the executor side.
- close() - Method in class org.apache.spark.sql.hive.execution.HiveOutputWriter
- close() - Method in class org.apache.spark.sql.SparkSession
Synonym for stop()
- close() - Method in class org.apache.spark.sql.vectorized.ArrowColumnVector
- close() - Method in class org.apache.spark.sql.vectorized.ColumnarBatch
Called to close all the columns in this batch.
- close() - Method in class org.apache.spark.sql.vectorized.ColumnVector
Cleans up memory for this column vector.
- close() - Method in class org.apache.spark.storage.BufferReleasingInputStream
- close() - Method in class org.apache.spark.storage.CountingWritableChannel
- close() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
- close() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
- close() - Method in class org.apache.spark.streaming.util.WriteAheadLog
Close this log and release any resources.
- closed() - Method in interface org.apache.spark.security.CryptoStreamUtils.BaseErrorHandler
- closeWriter(TaskAttemptContext) - Method in class org.apache.spark.internal.io.HadoopWriteConfigUtil
- ClosureCleaner - Class in org.apache.spark.util
A cleaner that renders closures serializable if this can be done safely.
- ClosureCleaner() - Constructor for class org.apache.spark.util.ClosureCleaner
- closureSerializer() - Method in class org.apache.spark.SparkEnv
- cls() - Method in class org.apache.spark.sql.types.ObjectType
- cls() - Method in class org.apache.spark.util.MethodIdentifier
- clsTag() - Method in interface org.apache.spark.sql.Encoder
A ClassTag that can be used to construct an Array to contain a collection of T
- cluster() - Method in class org.apache.spark.ml.clustering.ClusteringSummary
Cluster centers of the transformed data.
- cluster() - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
- clusterCenter() - Method in class org.apache.spark.ml.clustering.ClusterData
- clusterCenters() - Method in class org.apache.spark.ml.clustering.BisectingKMeansModel
- clusterCenters() - Method in class org.apache.spark.ml.clustering.KMeansModel
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Leaf cluster centers.
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.KMeansModel
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
- ClusterData - Class in org.apache.spark.ml.clustering
Helper class for storing model data
- ClusterData(int, Vector) - Constructor for class org.apache.spark.ml.clustering.ClusterData
- clusteredColumns - Variable in class org.apache.spark.sql.sources.v2.reader.partitioning.ClusteredDistribution
The names of the clustered columns.
- ClusteredDistribution - Class in org.apache.spark.sql.sources.v2.reader.partitioning
- ClusteredDistribution(String[]) - Constructor for class org.apache.spark.sql.sources.v2.reader.partitioning.ClusteredDistribution
- clusterIdx() - Method in class org.apache.spark.ml.clustering.ClusterData
- ClusteringEvaluator - Class in org.apache.spark.ml.evaluation
:: Experimental ::
- ClusteringEvaluator(String) - Constructor for class org.apache.spark.ml.evaluation.ClusteringEvaluator
- ClusteringEvaluator() - Constructor for class org.apache.spark.ml.evaluation.ClusteringEvaluator
- ClusteringSummary - Class in org.apache.spark.ml.clustering
:: Experimental ::
Summary of clustering algorithms.
- clusterSizes() - Method in class org.apache.spark.ml.clustering.ClusteringSummary
Size of (number of data points in) each cluster.
- ClusterStats(Vector, double, long) - Constructor for class org.apache.spark.ml.evaluation.SquaredEuclideanSilhouette.ClusterStats
- ClusterStats$() - Constructor for class org.apache.spark.ml.evaluation.SquaredEuclideanSilhouette.ClusterStats$
- clusterWeights() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
- cn() - Method in class org.apache.spark.mllib.feature.VocabWord
- coalesce(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, RDD<?>) - Method in class org.apache.spark.rdd.DefaultPartitionCoalescer
Runs the packing algorithm and returns an array of PartitionGroups that if possible are
load balanced and grouped by locality
- coalesce(int, RDD<?>) - Method in interface org.apache.spark.rdd.PartitionCoalescer
Coalesce the partitions of the given RDD.
- coalesce(int, boolean, Option<PartitionCoalescer>, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.sql.Dataset
Returns a new Dataset that has exactly numPartitions partitions, when fewer partitions
are requested.
- coalesce(Column...) - Static method in class org.apache.spark.sql.functions
Returns the first column that is not null, or null if all inputs are null.
- coalesce(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Returns the first column that is not null, or null if all inputs are null.
- CoarseGrainedClusterMessage - Interface in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages() - Constructor for class org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages
- CoarseGrainedClusterMessages.AddWebUIFilter - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.AddWebUIFilter$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.GetExecutorLossReason - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.GetExecutorLossReason$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.KillExecutors - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.KillExecutors$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.KillExecutorsOnHost - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.KillExecutorsOnHost$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.KillTask - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.KillTask$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.LaunchTask - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.LaunchTask$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterClusterManager - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterClusterManager$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisteredExecutor$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterExecutor - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterExecutor$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterExecutorFailed - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterExecutorFailed$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RegisterExecutorResponse - Interface in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RemoveExecutor - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RemoveExecutor$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RemoveWorker - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RemoveWorker$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RequestExecutors - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RequestExecutors$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RetrieveLastAllocatedExecutorId$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.RetrieveSparkAppConfig$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.ReviveOffers$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.SetupDriver - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.SetupDriver$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.Shutdown$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.SparkAppConfig - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.SparkAppConfig$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.StatusUpdate - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.StatusUpdate$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.StopDriver$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.StopExecutor$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.StopExecutors$ - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.UpdateDelegationTokens - Class in org.apache.spark.scheduler.cluster
- CoarseGrainedClusterMessages.UpdateDelegationTokens$ - Class in org.apache.spark.scheduler.cluster
- code() - Method in class org.apache.spark.mllib.feature.VocabWord
- CodegenMetrics - Class in org.apache.spark.metrics.source
:: Experimental ::
Metrics for code generation.
- CodegenMetrics() - Constructor for class org.apache.spark.metrics.source.CodegenMetrics
- codeLen() - Method in class org.apache.spark.mllib.feature.VocabWord
- coefficientMatrix() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
- coefficients() - Method in class org.apache.spark.ml.classification.LinearSVCModel
- coefficients() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
A vector of model coefficients for "binomial" logistic regression.
- coefficients() - Method in class org.apache.spark.ml.regression.AFTSurvivalRegressionModel
- coefficients() - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegressionModel
- coefficients() - Method in class org.apache.spark.ml.regression.LinearRegressionModel
- coefficientStandardErrors() - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary
Standard error of estimated coefficients and intercept.
- coefficientStandardErrors() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
Standard error of estimated coefficients and intercept.
- cogroup(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
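The two-input cogroup rule described in the entries above (for every key found in either input, pair up the list of values from each side) can be sketched on in-memory lists. This is a plain-Java illustration only and does not use Spark; CogroupSketch, pair and cogroup are hypothetical names, and a Map.Entry of two lists stands in for the tuple that Spark returns.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: models cogroup on in-memory lists.
final class CogroupSketch {
    // Convenience factory for a key-value pair.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<K, V>(key, value);
    }

    // Creates the (left values, right values) holder for a newly seen key.
    private static <V, W> Map.Entry<List<V>, List<W>> emptyPair() {
        return new SimpleEntry<List<V>, List<W>>(new ArrayList<V>(), new ArrayList<W>());
    }

    // For each key in left or right, collects the values from left and from right.
    // A key missing on one side gets an empty list for that side.
    static <K, V, W> Map<K, Map.Entry<List<V>, List<W>>> cogroup(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
        Map<K, Map.Entry<List<V>, List<W>>> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : left) {
            out.computeIfAbsent(e.getKey(), k -> CogroupSketch.<V, W>emptyPair())
               .getKey().add(e.getValue());
        }
        for (Map.Entry<K, W> e : right) {
            out.computeIfAbsent(e.getKey(), k -> CogroupSketch.<V, W>emptyPair())
               .getValue().add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left = new ArrayList<>();
        left.add(pair("a", 1));
        left.add(pair("b", 2));
        List<Map.Entry<String, String>> right = new ArrayList<>();
        right.add(pair("a", "x"));
        right.add(pair("c", "y"));

        Map<String, Map.Entry<List<Integer>, List<String>>> grouped = cogroup(left, right);
        for (Map.Entry<String, Map.Entry<List<Integer>, List<String>>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().getKey()
                    + " / " + e.getValue().getValue());
        }
    }
}
```

Keys "b" and "c" each appear on one side only, so the other side's list is empty for them. The variants that take a Partitioner or an int differ in how the result is partitioned, which this single-process sketch does not model.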
- cogroup(KeyValueGroupedDataset<K, U>, Function3<K, Iterator<V>, Iterator<U>, TraversableOnce<R>>, Encoder<R>) - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Applies the given function to each cogrouped data.
- cogroup(KeyValueGroupedDataset<K, U>, CoGroupFunction<K, V, U, R>, Encoder<R>) - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Applies the given function to each cogrouped data.
- cogroup(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- CoGroupedRDD<K> - Class in org.apache.spark.rdd
:: DeveloperApi ::
An RDD that cogroups its parents.
- CoGroupedRDD(Seq<RDD<? extends Product2<K, ?>>>, Partitioner, ClassTag<K>) - Constructor for class org.apache.spark.rdd.CoGroupedRDD
- CoGroupFunction<K,V1,V2,R> - Interface in org.apache.spark.api.java.function
A function that returns zero or more output records from each grouping key and its values from 2 Datasets.
- col(String) - Method in class org.apache.spark.sql.Dataset
Selects column based on the column name and returns it as a Column.
- col(String) - Static method in class org.apache.spark.sql.functions
Returns a Column based on the given column name.
- coldStartStrategy() - Method in interface org.apache.spark.ml.recommendation.ALSModelParams
Param for strategy for dealing with unknown or new users/items at prediction time.
- colIter() - Method in class org.apache.spark.ml.linalg.DenseMatrix
- colIter() - Method in interface org.apache.spark.ml.linalg.Matrix
Returns an iterator of column vectors.
- colIter() - Method in class org.apache.spark.ml.linalg.SparseMatrix
- colIter() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
- colIter() - Method in interface org.apache.spark.mllib.linalg.Matrix
Returns an iterator of column vectors.
- colIter() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
- collect() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return an array that contains all of the elements in this RDD.
- collect() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
- collect() - Method in class org.apache.spark.rdd.RDD
Return an array that contains all of the elements in this RDD.
- collect(PartialFunction<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Return an RDD that contains all matching values by applying f
- collect() - Method in class org.apache.spark.sql.Dataset
Returns an array that contains all rows in this Dataset.
- collect_list(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a list of objects with duplicates.
- collect_list(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a list of objects with duplicates.
- collect_set(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a set of objects with duplicate elements eliminated.
- collect_set(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a set of objects with duplicate elements eliminated.
- collectAsList() - Method in class org.apache.spark.sql.Dataset
Returns a Java list that contains all rows in this Dataset.
- collectAsMap() - Method in class org.apache.spark.api.java.JavaPairRDD
Return the key-value pairs in this RDD to the master as a Map.
- collectAsMap() - Method in class org.apache.spark.rdd.PairRDDFunctions
Return the key-value pairs in this RDD to the master as a Map.
- collectAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
The asynchronous version of collect, which returns a future for retrieving an array containing all of the elements in this RDD.
- collectAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
Returns a future for retrieving all elements of this RDD.
- collectEdges(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
Returns an RDD that contains for each vertex v its local edges,
i.e., the edges that are incident on v, in the user-specified direction.
- collectionAccumulator() - Method in class org.apache.spark.SparkContext
Create and register a CollectionAccumulator, which starts with an empty list and accumulates inputs by adding them into the list.
- collectionAccumulator(String) - Method in class org.apache.spark.SparkContext
Create and register a CollectionAccumulator, which starts with an empty list and accumulates inputs by adding them into the list.
- CollectionAccumulator<T> - Class in org.apache.spark.util
- CollectionAccumulator() - Constructor for class org.apache.spark.util.CollectionAccumulator
- CollectionsUtils - Class in org.apache.spark.util
- CollectionsUtils() - Constructor for class org.apache.spark.util.CollectionsUtils
- collectNeighborIds(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
Collect the neighbor vertex ids for each vertex.
- collectNeighbors(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
Collect the neighbor vertex attributes for each vertex.
- collectPartitions(int[]) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return an array that contains all of the elements in a specific partition of this RDD.
- collectSubModels() - Method in interface org.apache.spark.ml.param.shared.HasCollectSubModels
Param for whether to collect a list of sub-models trained during tuning.
- colPtrs() - Method in class org.apache.spark.ml.linalg.SparseMatrix
- colPtrs() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
- colRegex(String) - Method in class org.apache.spark.sql.Dataset
Selects column based on the column name specified as a regex and returns it as a Column.
- colsPerBlock() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
- colStats(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
Computes column-wise summary statistics for the input RDD[Vector].
- Column - Class in org.apache.spark.sql.catalog
A column in Spark, as returned by the listColumns method in Catalog.
- Column(String, String, String, boolean, boolean, boolean) - Constructor for class org.apache.spark.sql.catalog.Column
- Column - Class in org.apache.spark.sql
A column that will be computed based on the data in a DataFrame
- Column(Expression) - Constructor for class org.apache.spark.sql.Column
- Column(String) - Constructor for class org.apache.spark.sql.Column
- column(String) - Static method in class org.apache.spark.sql.functions
Returns a Column based on the given column name.
- column(int) - Method in class org.apache.spark.sql.vectorized.ColumnarBatch
Returns the column at `ordinal`.
- ColumnarArray - Class in org.apache.spark.sql.vectorized
- ColumnarArray(ColumnVector, int, int) - Constructor for class org.apache.spark.sql.vectorized.ColumnarArray
- ColumnarBatch - Class in org.apache.spark.sql.vectorized
This class wraps multiple ColumnVectors as a row-wise table.
- ColumnarBatch(ColumnVector[]) - Constructor for class org.apache.spark.sql.vectorized.ColumnarBatch
- ColumnarMap - Class in org.apache.spark.sql.vectorized
- ColumnarMap(ColumnVector, ColumnVector, int, int) - Constructor for class org.apache.spark.sql.vectorized.ColumnarMap
- ColumnarRow - Class in org.apache.spark.sql.vectorized
- ColumnarRow(ColumnVector, int) - Constructor for class org.apache.spark.sql.vectorized.ColumnarRow
- ColumnName - Class in org.apache.spark.sql
A convenient class used for constructing schema.
- ColumnName(String) - Constructor for class org.apache.spark.sql.ColumnName
- ColumnPruner - Class in org.apache.spark.ml.feature
Utility transformer for removing temporary columns from a DataFrame.
- ColumnPruner(String, Set<String>) - Constructor for class org.apache.spark.ml.feature.ColumnPruner
- ColumnPruner(Set<String>) - Constructor for class org.apache.spark.ml.feature.ColumnPruner
- columns() - Method in class org.apache.spark.sql.Dataset
Returns all column names as an array.
- columnSchema() - Static method in class org.apache.spark.ml.image.ImageSchema
Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte])
- columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
Compute all cosine similarities between columns of this matrix using the brute-force
approach of computing normalized dot products.
- columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Compute all cosine similarities between columns of this matrix using the brute-force
approach of computing normalized dot products.
- columnSimilarities(double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Compute similarities between columns of this matrix using a sampling approach.
- columnsToPrune() - Method in class org.apache.spark.ml.feature.ColumnPruner
- columnToOldVector(Dataset<?>, String) - Static method in class org.apache.spark.ml.util.DatasetUtils
- columnToVector(Dataset<?>, String) - Static method in class org.apache.spark.ml.util.DatasetUtils
Cast a column in a Dataset to Vector type.
- ColumnVector - Class in org.apache.spark.sql.vectorized
An interface representing in-memory columnar data in Spark.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.api.java.JavaPairRDD
Generic function to combine the elements for each key using a custom set of aggregation functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
Generic function to combine the elements for each key using a custom set of aggregation functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
Simplified version of combineByKey that hash-partitions the output RDD and uses map-side aggregation.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.api.java.JavaPairRDD
Simplified version of combineByKey that hash-partitions the resulting RDD using the existing
partitioner/parallelism level and using map-side aggregation.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.rdd.PairRDDFunctions
Generic function to combine the elements for each key using a custom set of aggregation functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the
existing partitioner/parallelism level.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Combine elements of each key in DStream's RDDs using custom function.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Combine elements of each key in DStream's RDDs using custom function.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, ClassTag<C>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Combine elements of each key in DStream's RDDs using custom functions.
- combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
:: Experimental ::
Generic function to combine the elements for each key using a custom set of aggregation functions.
- combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
:: Experimental ::
Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD.
- combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
:: Experimental ::
Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the
existing partitioner/parallelism level.
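The three function arguments shared by the combineByKey signatures above (V to C, then C and V to C, then C and C to C) can be illustrated without Spark. The following plain-Java sketch assumes the usual reading of those roles: the first function creates a combiner from the first value seen for a key in a partition, the second folds further values into it, and the third merges combiners from different partitions. CombineByKeySketch and its parameter names are hypothetical, and partitions are modeled as nested lists.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper, not part of Spark: models combineByKey on nested lists.
final class CombineByKeySketch {
    static <K, V, C> Map<K, C> combineByKey(
            List<List<Map.Entry<K, V>>> partitions,
            Function<V, C> createCombiner,
            BiFunction<C, V, C> mergeValue,
            BiFunction<C, C, C> mergeCombiners) {
        Map<K, C> merged = new LinkedHashMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Step 1: combine the values of each key within one partition.
            Map<K, C> local = new LinkedHashMap<>();
            for (Map.Entry<K, V> e : partition) {
                K key = e.getKey();
                if (local.containsKey(key)) {
                    local.put(key, mergeValue.apply(local.get(key), e.getValue()));
                } else {
                    local.put(key, createCombiner.apply(e.getValue()));
                }
            }
            // Step 2: merge this partition's combiners into the overall result.
            for (Map.Entry<K, C> e : local.entrySet()) {
                K key = e.getKey();
                if (merged.containsKey(key)) {
                    merged.put(key, mergeCombiners.apply(merged.get(key), e.getValue()));
                } else {
                    merged.put(key, e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> p1 = new ArrayList<>();
        p1.add(new SimpleEntry<String, Integer>("a", 1));
        p1.add(new SimpleEntry<String, Integer>("b", 2));
        p1.add(new SimpleEntry<String, Integer>("a", 3));
        List<Map.Entry<String, Integer>> p2 = new ArrayList<>();
        p2.add(new SimpleEntry<String, Integer>("a", 4));
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        partitions.add(p1);
        partitions.add(p2);

        // Per-key sum: the combiner type C is Integer here.
        Map<String, Integer> sums = CombineByKeySketch.<String, Integer, Integer>combineByKey(
                partitions, v -> v, (c, v) -> c + v, (x, y) -> x + y);
        System.out.println(sums); // prints "{a=8, b=2}"
    }
}
```

Because the combiner type C is independent of the value type V, the same shape covers cases such as building a (sum, count) pair per key to compute an average.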
- combineCombinersByKey(Iterator<? extends Product2<K, C>>, TaskContext) - Method in class org.apache.spark.Aggregator
- combineValuesByKey(Iterator<? extends Product2<K, V>>, TaskContext) - Method in class org.apache.spark.Aggregator
- CommandLineUtils - Interface in org.apache.spark.util
Contains basic command line parsing functionality and methods to parse some common Spark CLI options.
- commit(Function0<Parsers.Parser<T>>) - Static method in class org.apache.spark.ml.feature.RFormulaParser
- commit(Offset) - Method in interface org.apache.spark.sql.sources.v2.reader.streaming.ContinuousReader
Informs the source that Spark has completed processing all data for offsets less than or
equal to `end` and will only request offsets greater than `end` in the future.
- commit(Offset) - Method in interface org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader
Informs the source that Spark has completed processing all data for offsets less than or
equal to `end` and will only request offsets greater than `end` in the future.
- commit(WriterCommitMessage[]) - Method in interface org.apache.spark.sql.sources.v2.writer.DataSourceWriter
Commits this writing job with a list of commit messages.
- commit() - Method in interface org.apache.spark.sql.sources.v2.writer.DataWriter
- commit(long, WriterCommitMessage[]) - Method in interface org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
Commits this writing job for the specified epoch with a list of commit messages.
- commit(WriterCommitMessage[]) - Method in interface org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
- commitJob(JobContext, Seq<FileCommitProtocol.TaskCommitMessage>) - Method in class org.apache.spark.internal.io.FileCommitProtocol
Commits a job after the writes succeed.
- commitJob(JobContext, Seq<FileCommitProtocol.TaskCommitMessage>) - Method in class org.apache.spark.internal.io.HadoopMapReduceCommitProtocol
- commitTask(TaskAttemptContext) - Method in class org.apache.spark.internal.io.FileCommitProtocol
Commits a task after the writes succeed.
- commitTask(TaskAttemptContext) - Method in class org.apache.spark.internal.io.HadoopMapReduceCommitProtocol
- commitTask(OutputCommitter, TaskAttemptContext, int, int) - Static method in class org.apache.spark.mapred.SparkHadoopMapRedUtil
Commits a task output.
- commonHeaderNodes(HttpServletRequest) - Static method in class org.apache.spark.ui.UIUtils
- comparator(Schedulable, Schedulable) - Method in interface org.apache.spark.scheduler.SchedulingAlgorithm
- compare(PartitionGroup, PartitionGroup) - Method in class org.apache.spark.rdd.DefaultPartitionCoalescer
- compare(Option<PartitionGroup>, Option<PartitionGroup>) - Method in class org.apache.spark.rdd.DefaultPartitionCoalescer
- compare(Decimal) - Method in class org.apache.spark.sql.types.Decimal
- compare(Decimal, Decimal) - Method in interface org.apache.spark.sql.types.Decimal.DecimalIsConflicted
- compare(RDDInfo) - Method in class org.apache.spark.storage.RDDInfo
- compareTo(SparkShutdownHook) - Method in class org.apache.spark.util.SparkShutdownHook
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.DB2Dialect
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
- compileValue(Object) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Converts value to SQL expression.
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.NoopDialect
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
- compileValue(Object) - Static method in class org.apache.spark.sql.jdbc.TeradataDialect
- Complete() - Static method in class org.apache.spark.sql.streaming.OutputMode
OutputMode in which all the rows in the streaming DataFrame/Dataset will be written
to the sink every time there are some updates.
- completed() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
- completedIndices() - Method in class org.apache.spark.status.LiveJob
- completedIndices() - Method in class org.apache.spark.status.LiveStage
- completedStages() - Method in class org.apache.spark.status.LiveJob
- completedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
- completedTasks() - Method in class org.apache.spark.status.LiveExecutor
- completedTasks() - Method in class org.apache.spark.status.LiveJob
- completedTasks() - Method in class org.apache.spark.status.LiveStage
- COMPLETION_TIME() - Static method in class org.apache.spark.status.TaskIndexNames
- completionTime() - Method in class org.apache.spark.scheduler.StageInfo
Time when all tasks in the stage completed or when the stage was cancelled.
- completionTime() - Method in class org.apache.spark.status.api.v1.JobData
- completionTime() - Method in class org.apache.spark.status.api.v1.StageData
- completionTime() - Method in class org.apache.spark.status.LiveJob
- ComplexFutureAction<T> - Class in org.apache.spark
A FutureAction for actions that could trigger multiple Spark jobs.
- ComplexFutureAction(Function1<JobSubmitter, Future<T>>) - Constructor for class org.apache.spark.ComplexFutureAction
- compressed() - Method in interface org.apache.spark.ml.linalg.Matrix
Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
major format, whichever uses less storage.
- compressed() - Method in interface org.apache.spark.ml.linalg.Vector
Returns a vector in either dense or sparse format, whichever uses less storage.
- compressed() - Method in interface org.apache.spark.mllib.linalg.Vector
Returns a vector in either dense or sparse format, whichever uses less storage.
- compressedColMajor() - Method in interface org.apache.spark.ml.linalg.Matrix
Returns a matrix in dense or sparse column major format, whichever uses less storage.
- compressedInputStream(InputStream) - Method in interface org.apache.spark.io.CompressionCodec
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.ZStdCompressionCodec
- compressedOutputStream(OutputStream) - Method in interface org.apache.spark.io.CompressionCodec
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.ZStdCompressionCodec
- compressedRowMajor() - Method in interface org.apache.spark.ml.linalg.Matrix
Returns a matrix in dense or sparse row major format, whichever uses less storage.
- CompressionCodec - Interface in org.apache.spark.io
:: DeveloperApi ::
CompressionCodec allows the customization of choosing different compression implementations
to be used in block storage.
- compute(Partition, TaskContext) - Method in class org.apache.spark.api.r.BaseRRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.EdgeRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.VertexRDD
Provides the RDD[(VertexId, VD)] equivalent output.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
Compute the gradient and loss given the features of a single data point.
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
Compute the gradient and loss given the features of a single data point,
add the gradient to a provided vector to avoid creating new objects, and return loss.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.L1Updater
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SimpleUpdater
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SquaredL2Updater
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.Updater
Compute an updated value for weights given the gradient, stepSize, iteration number and
regularization parameter.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.CoGroupedRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.HadoopRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.JdbcRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.NewHadoopRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.PartitionPruningRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.RDD
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.ShuffledRDD
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.UnionRDD
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaDStream
Generate an RDD for the given duration
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Method that generates an RDD for the given Duration
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
- compute(Time) - Method in class org.apache.spark.streaming.dstream.DStream
Method that generates an RDD for the given time
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
- compute(long, long, long, long) - Method in interface org.apache.spark.streaming.scheduler.rate.RateEstimator
Computes the number of records the stream attached to this RateEstimator
should ingest per second, given an update on the size and completion
times of the latest batch.
- computeClusterStats(Dataset<Row>, String, String) - Static method in class org.apache.spark.ml.evaluation.CosineSilhouette
The method takes the input dataset and computes the aggregated values
about a cluster which are needed by the algorithm.
- computeClusterStats(Dataset<Row>, String, String) - Static method in class org.apache.spark.ml.evaluation.SquaredEuclideanSilhouette
The method takes the input dataset and computes the aggregated values
about a cluster which are needed by the algorithm.
- computeColumnSummaryStatistics() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes column-wise summary statistics.
- computeCorrelation(RDD<Object>, RDD<Object>) - Method in interface org.apache.spark.mllib.stat.correlation.Correlation
Compute correlation for two datasets.
- computeCorrelation(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.correlation.PearsonCorrelation
Compute the Pearson correlation for two datasets.
- computeCorrelation(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.correlation.SpearmanCorrelation
Compute Spearman's correlation for two datasets.
- computeCorrelationMatrix(RDD<Vector>) - Method in interface org.apache.spark.mllib.stat.correlation.Correlation
Compute the correlation matrix S, for the input matrix, where S(i, j) is the correlation
between column i and j.
- computeCorrelationMatrix(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.correlation.PearsonCorrelation
Compute the Pearson correlation matrix S, for the input matrix, where S(i, j) is the
correlation between column i and j.
- computeCorrelationMatrix(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.correlation.SpearmanCorrelation
Compute Spearman's correlation matrix S, for the input matrix, where S(i, j) is the
correlation between column i and j.
- computeCorrelationMatrixFromCovariance(Matrix) - Static method in class org.apache.spark.mllib.stat.correlation.PearsonCorrelation
Compute the Pearson correlation matrix from the covariance matrix.
- computeCorrelationWithMatrixImpl(RDD<Object>, RDD<Object>) - Method in interface org.apache.spark.mllib.stat.correlation.Correlation
Combine the two input RDD[Double]s into an RDD[Vector] and compute the correlation using the
correlation implementation for RDD[Vector].
- computeCorrelationWithMatrixImpl(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.correlation.PearsonCorrelation
- computeCorrelationWithMatrixImpl(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.correlation.SpearmanCorrelation
- computeCost(Dataset<?>) - Method in class org.apache.spark.ml.clustering.BisectingKMeansModel
Computes the sum of squared distances between the input points and their corresponding cluster
centers.
- computeCost(Dataset<?>) - Method in class org.apache.spark.ml.clustering.KMeansModel
- computeCost(Vector) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Computes the squared distance between the input point and the cluster center it belongs to.
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Computes the sum of squared distances between the input points and their corresponding cluster
centers.
- computeCost(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Java-friendly version of computeCost()
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
Return the K-means cost (sum of squared distances of points to their nearest center) for this
model on the given data.
- computeCovariance() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the covariance matrix, treating each row as an observation.
- computeError(RDD<LabeledPoint>, DecisionTreeRegressionModel[], double[], Loss) - Static method in class org.apache.spark.ml.tree.impl.GradientBoostedTrees
Method to calculate error of the base learner for the gradient boosting calculation.
- computeError(org.apache.spark.mllib.tree.model.TreeEnsembleModel, RDD<LabeledPoint>) - Method in interface org.apache.spark.mllib.tree.loss.Loss
Method to calculate error of the base learner for the gradient boosting calculation.
- computeError(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
Method to calculate loss when the predictions are already known.
- computeFractionForSampleSize(int, long, boolean) - Static method in class org.apache.spark.util.random.SamplingUtils
Returns a sampling rate that guarantees a sample of size greater than or equal to
sampleSizeLowerBound 99.99% of the time.
- computeGradient(DenseMatrix<Object>, DenseMatrix<Object>, Vector, int) - Method in interface org.apache.spark.ml.ann.TopologyModel
Computes gradient for the network
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
Computes the Gramian matrix A^T A
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the Gramian matrix A^T A
- computeInitialPredictionAndError(RDD<LabeledPoint>, double, DecisionTreeRegressionModel, Loss) - Static method in class org.apache.spark.ml.tree.impl.GradientBoostedTrees
Compute the initial predictions and errors for a dataset for the first
iteration of gradient boosting.
- computeInitialPredictionAndError(RDD<LabeledPoint>, double, DecisionTreeModel, Loss) - Static method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
:: DeveloperApi ::
Compute the initial predictions and errors for a dataset for the first
iteration of gradient boosting.
- computePreferredLocations(Seq<InputFormatInfo>) - Static method in class org.apache.spark.scheduler.InputFormatInfo
Computes the preferred locations based on the input(s) and returns a location-to-block map.
- computePrevDelta(DenseMatrix<Object>, DenseMatrix<Object>, DenseMatrix<Object>) - Method in interface org.apache.spark.ml.ann.LayerModel
Computes the delta for back propagation.
- computePrincipalComponents(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the top k principal components only.
- computePrincipalComponentsAndExplainedVariance(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the top k principal components and a vector of proportions of
variance explained by each principal component.
- computeProbability(double) - Method in interface org.apache.spark.mllib.tree.loss.ClassificationLoss
Computes the class probability given the margin.
- computeSilhouetteCoefficient(Broadcast<Map<Object, Tuple2<Vector, Object>>>, Vector, double) - Static method in class org.apache.spark.ml.evaluation.CosineSilhouette
It computes the Silhouette coefficient for a point.
- computeSilhouetteCoefficient(Broadcast<Map<Object, SquaredEuclideanSilhouette.ClusterStats>>, Vector, double, double) - Static method in class org.apache.spark.ml.evaluation.SquaredEuclideanSilhouette
It computes the Silhouette coefficient for a point.
- computeSilhouetteScore(Dataset<?>, String, String) - Static method in class org.apache.spark.ml.evaluation.CosineSilhouette
Compute the Silhouette score of the dataset using the cosine distance measure.
- computeSilhouetteScore(Dataset<?>, String, String) - Static method in class org.apache.spark.ml.evaluation.SquaredEuclideanSilhouette
Compute the Silhouette score of the dataset using squared Euclidean distance measure.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
Computes the singular value decomposition of this IndexedRowMatrix.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes singular value decomposition of this matrix.
- computeThresholdByKey(Map<K, AcceptanceResult>, Map<K, Object>) - Static method in class org.apache.spark.util.random.StratifiedSamplingUtils
Given the result returned by getCounts, determine the threshold for accepting items to
generate exact sample size.
- concat(Column...) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input columns together into a single column.
- concat(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input columns together into a single column.
- concat_ws(String, Column...) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input string columns together into a single string column,
using the given separator.
- concat_ws(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input string columns together into a single string column,
using the given separator.
- Conf(int, int, double, double, double, double, double, double) - Constructor for class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
- conf() - Method in interface org.apache.spark.input.Configurable
- conf() - Method in class org.apache.spark.SparkEnv
- conf() - Method in class org.apache.spark.sql.hive.RelationConversions
- conf() - Method in class org.apache.spark.sql.SparkSession
Runtime configuration interface for Spark.
- confidence() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
Returns the confidence of the rule.
- confidence() - Method in class org.apache.spark.partial.BoundedDouble
- confidence() - Method in class org.apache.spark.util.sketch.CountMinSketch
- config(String, String) - Method in class org.apache.spark.sql.SparkSession.Builder
Sets a config option.
- config(String, long) - Method in class org.apache.spark.sql.SparkSession.Builder
Sets a config option.
- config(String, double) - Method in class org.apache.spark.sql.SparkSession.Builder
Sets a config option.
- config(String, boolean) - Method in class org.apache.spark.sql.SparkSession.Builder
Sets a config option.
- config(SparkConf) - Method in class org.apache.spark.sql.SparkSession.Builder
Sets a list of config options based on the given SparkConf
- config - Class in org.apache.spark.status
- config() - Constructor for class org.apache.spark.status.config
- ConfigEntryWithDefault<T> - Class in org.apache.spark.internal.config
- ConfigEntryWithDefault(String, List<String>, T, Function1<String, T>, Function1<T, String>, String, boolean) - Constructor for class org.apache.spark.internal.config.ConfigEntryWithDefault
- ConfigEntryWithDefaultFunction<T> - Class in org.apache.spark.internal.config
- ConfigEntryWithDefaultFunction(String, List<String>, Function0<T>, Function1<String, T>, Function1<T, String>, String, boolean) - Constructor for class org.apache.spark.internal.config.ConfigEntryWithDefaultFunction
- ConfigEntryWithDefaultString<T> - Class in org.apache.spark.internal.config
- ConfigEntryWithDefaultString(String, List<String>, String, Function1<String, T>, Function1<T, String>, String, boolean) - Constructor for class org.apache.spark.internal.config.ConfigEntryWithDefaultString
- ConfigHelpers - Class in org.apache.spark.internal.config
- ConfigHelpers() - Constructor for class org.apache.spark.internal.config.ConfigHelpers
- ConfigProvider - Interface in org.apache.spark.internal.config
A source of configuration values.
- configTestLog4j(String) - Static method in class org.apache.spark.TestUtils
Configures the log4j properties used for the test suite.
- Configurable - Interface in org.apache.spark.input
A trait to implement Configurable
- configuration() - Method in class org.apache.spark.scheduler.InputFormatInfo
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.HadoopRDD
Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456).
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.NewHadoopRDD
Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456).
- configureJobPropertiesForStorageHandler(TableDesc, Configuration, boolean) - Static method in class org.apache.spark.sql.hive.HiveTableUtil
- confusionMatrix() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns the confusion matrix: predicted classes are in columns, ordered by class label
ascending, as in "labels".
- connectedComponents() - Method in class org.apache.spark.graphx.GraphOps
Compute the connected component membership of each vertex and return a graph with the vertex
value containing the lowest vertex id in the connected component containing that vertex.
- connectedComponents(int) - Method in class org.apache.spark.graphx.GraphOps
Compute the connected component membership of each vertex and return a graph with the vertex
value containing the lowest vertex id in the connected component containing that vertex.
- ConnectedComponents - Class in org.apache.spark.graphx.lib
Connected components algorithm.
- ConnectedComponents() - Constructor for class org.apache.spark.graphx.lib.ConnectedComponents
- consequent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
- ConstantInputDStream<T> - Class in org.apache.spark.streaming.dstream
An input stream that always returns the same RDD on each time step.
- ConstantInputDStream(StreamingContext, RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ConstantInputDStream
- constructTree(org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0.NodeData[]) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$
Given a list of nodes from a tree, construct the tree.
- constructTrees(RDD<org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0.NodeData>) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel.SaveLoadV1_0$
- constructURIForAuthentication(URI, org.apache.spark.SecurityManager) - Static method in class org.apache.spark.util.Utils
Construct a URI containing information used for authentication.
- contains(Param<?>) - Method in class org.apache.spark.ml.param.ParamMap
Checks whether a parameter is explicitly specified.
- contains(String) - Method in class org.apache.spark.SparkConf
Does the configuration contain a given parameter?
- contains(Object) - Method in class org.apache.spark.sql.Column
Contains the other element.
- contains(String) - Method in class org.apache.spark.sql.types.Metadata
Tests whether this Metadata contains a binding for a key.
- containsDelimiters() - Method in class org.apache.spark.sql.hive.execution.HiveOptions
- containsKey(Object) - Method in class org.apache.spark.api.java.JavaUtils.SerializableMapWrapper
- containsNull() - Method in class org.apache.spark.sql.types.ArrayType
- contentType() - Method in class org.apache.spark.ui.JettyUtils.ServletParams
- context() - Method in interface org.apache.spark.api.java.JavaRDDLike
- context() - Method in class org.apache.spark.InterruptibleIterator
- context(SQLContext) - Static method in class org.apache.spark.ml.r.RWrappers
- context(SQLContext) - Method in interface org.apache.spark.ml.util.BaseReadWrite
- context(SQLContext) - Method in class org.apache.spark.ml.util.GeneralMLWriter
- context(SQLContext) - Method in class org.apache.spark.ml.util.MLReader
- context(SQLContext) - Method in class org.apache.spark.ml.util.MLWriter
- context() - Method in class org.apache.spark.rdd.RDD
- context() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
- context() - Method in class org.apache.spark.streaming.dstream.DStream
Return the StreamingContext associated with this DStream
- ContextBarrierId - Class in org.apache.spark
For each barrier stage attempt, at most one barrier() call can be active at any time, thus
we can use (stageId, stageAttemptId) to identify the stage attempt where the barrier() call is
from.
- ContextBarrierId(int, int) - Constructor for class org.apache.spark.ContextBarrierId
- Continuous() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
- Continuous(long) - Static method in class org.apache.spark.sql.streaming.Trigger
A trigger that continuously processes streaming data, asynchronously checkpointing at
the specified interval.
- Continuous(long, TimeUnit) - Static method in class org.apache.spark.sql.streaming.Trigger
A trigger that continuously processes streaming data, asynchronously checkpointing at
the specified interval.
- Continuous(Duration) - Static method in class org.apache.spark.sql.streaming.Trigger
A trigger that continuously processes streaming data, asynchronously checkpointing at
the specified interval.
- Continuous(String) - Static method in class org.apache.spark.sql.streaming.Trigger
A trigger that continuously processes streaming data, asynchronously checkpointing at
the specified interval.
- ContinuousInputPartition<T> - Interface in org.apache.spark.sql.sources.v2.reader
- ContinuousInputPartitionReader<T> - Interface in org.apache.spark.sql.sources.v2.reader.streaming
- ContinuousReader - Interface in org.apache.spark.sql.sources.v2.reader.streaming
- ContinuousReadSupport - Interface in org.apache.spark.sql.sources.v2
- ContinuousSplit - Class in org.apache.spark.ml.tree
Split which tests a continuous feature.
- conv(Column, int, int) - Static method in class org.apache.spark.sql.functions
Convert a number in a string column from one base to another.
- CONVERT_METASTORE_ORC() - Static method in class org.apache.spark.sql.hive.HiveUtils
- CONVERT_METASTORE_PARQUET() - Static method in class org.apache.spark.sql.hive.HiveUtils
- CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING() - Static method in class org.apache.spark.sql.hive.HiveUtils
- convertMatrixColumnsFromML(Dataset<?>, String...) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts matrix columns in an input Dataset to the org.apache.spark.mllib.linalg.Matrix type
from the new org.apache.spark.ml.linalg.Matrix type under the spark.ml package.
- convertMatrixColumnsFromML(Dataset<?>, Seq<String>) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts matrix columns in an input Dataset to the org.apache.spark.mllib.linalg.Matrix type
from the new org.apache.spark.ml.linalg.Matrix type under the spark.ml package.
- convertMatrixColumnsToML(Dataset<?>, String...) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts matrix columns in an input Dataset from the org.apache.spark.mllib.linalg.Matrix type
to the new org.apache.spark.ml.linalg.Matrix type under the spark.ml package.
- convertMatrixColumnsToML(Dataset<?>, Seq<String>) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts matrix columns in an input Dataset from the org.apache.spark.mllib.linalg.Matrix type
to the new org.apache.spark.ml.linalg.Matrix type under the spark.ml package.
- convertToCanonicalEdges(Function2<ED, ED, ED>) - Method in class org.apache.spark.graphx.GraphOps
Convert bi-directional edges into uni-directional ones.
- convertToOldLossType(String) - Method in interface org.apache.spark.ml.tree.GBTRegressorParams
- convertToTimeUnit(long, TimeUnit) - Static method in class org.apache.spark.streaming.ui.UIUtils
Convert milliseconds to the specified unit.
- convertVectorColumnsFromML(Dataset<?>, String...) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts vector columns in an input Dataset to the org.apache.spark.mllib.linalg.Vector type
from the new org.apache.spark.ml.linalg.Vector type under the spark.ml package.
- convertVectorColumnsFromML(Dataset<?>, Seq<String>) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts vector columns in an input Dataset to the org.apache.spark.mllib.linalg.Vector type
from the new org.apache.spark.ml.linalg.Vector type under the spark.ml package.
- convertVectorColumnsToML(Dataset<?>, String...) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts vector columns in an input Dataset from the org.apache.spark.mllib.linalg.Vector type
to the new org.apache.spark.ml.linalg.Vector type under the spark.ml package.
- convertVectorColumnsToML(Dataset<?>, Seq<String>) - Static method in class org.apache.spark.mllib.util.MLUtils
Converts vector columns in an input Dataset from the org.apache.spark.mllib.linalg.Vector type
to the new org.apache.spark.ml.linalg.Vector type under the spark.ml package.
- CoordinateMatrix - Class in org.apache.spark.mllib.linalg.distributed
Represents a matrix in coordinate format.
- CoordinateMatrix(RDD<MatrixEntry>, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
- CoordinateMatrix(RDD<MatrixEntry>) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
Alternative constructor leaving matrix dimensions to be determined automatically.
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassificationModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassifier
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LinearSVC
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LinearSVCModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegression
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayes
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayesModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRest
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRestModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.BisectingKMeans
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.BisectingKMeansModel
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.DistributedLDAModel
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.GaussianMixture
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.GaussianMixtureModel
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeans
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeansModel
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.LDA
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.LocalLDAModel
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.PowerIterationClustering
- copy(ParamMap) - Method in class org.apache.spark.ml.Estimator
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.ClusteringEvaluator
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.Evaluator
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Binarizer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.BucketedRandomProjectionLSH
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Bucketizer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ChiSqSelector
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ChiSqSelectorModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ColumnPruner
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.FeatureHasher
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.HashingTF
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDF
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDFModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Imputer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ImputerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IndexToString
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Interaction
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MaxAbsScaler
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MaxAbsScalerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinHashLSH
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinHashLSHModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScaler
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.OneHotEncoder
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.OneHotEncoderEstimator
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.OneHotEncoderModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCA
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCAModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PolynomialExpansion
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.QuantileDiscretizer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RegexTokenizer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormula
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormulaModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.SQLTransformer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScaler
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScalerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StopWordsRemover
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Tokenizer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAssembler
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAttributeRewriter
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorSizeHint
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorSlicer
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2Vec
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2VecModel
- copy(ParamMap) - Method in class org.apache.spark.ml.fpm.FPGrowth
- copy(ParamMap) - Method in class org.apache.spark.ml.fpm.FPGrowthModel
- copy(ParamMap) - Method in class org.apache.spark.ml.fpm.PrefixSpan
- copy(Vector, Vector) - Static method in class org.apache.spark.ml.linalg.BLAS
y = x
- copy() - Method in class org.apache.spark.ml.linalg.DenseMatrix
- copy() - Method in class org.apache.spark.ml.linalg.DenseVector
- copy() - Method in interface org.apache.spark.ml.linalg.Matrix
Get a deep copy of the matrix.
- copy() - Method in class org.apache.spark.ml.linalg.SparseMatrix
- copy() - Method in class org.apache.spark.ml.linalg.SparseVector
- copy() - Method in interface org.apache.spark.ml.linalg.Vector
Makes a deep copy of this vector.
- copy(ParamMap) - Method in class org.apache.spark.ml.Model
- copy() - Method in class org.apache.spark.ml.param.ParamMap
Creates a copy of this param map.
- copy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
Creates a copy of this instance with the same UID and some extra params.
- copy(ParamMap) - Method in class org.apache.spark.ml.Pipeline
- copy(ParamMap) - Method in class org.apache.spark.ml.PipelineModel
- copy(ParamMap) - Method in class org.apache.spark.ml.PipelineStage
- copy(ParamMap) - Method in class org.apache.spark.ml.Predictor
- copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALS
- copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALSModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegression
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressor
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegression
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GeneralizedLinearRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegression
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegression
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
- copy(ParamMap) - Method in class org.apache.spark.ml.Transformer
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidator
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
- copy(ParamMap) - Method in class org.apache.spark.ml.UnaryTransformer
- copy(Vector, Vector) - Static method in class org.apache.spark.mllib.linalg.BLAS
y = x
- copy() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
- copy() - Method in class org.apache.spark.mllib.linalg.DenseVector
- copy() - Method in interface org.apache.spark.mllib.linalg.Matrix
Get a deep copy of the matrix.
- copy() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
- copy() - Method in class org.apache.spark.mllib.linalg.SparseVector
- copy() - Method in interface org.apache.spark.mllib.linalg.Vector
Makes a deep copy of this vector.
- copy() - Method in class org.apache.spark.mllib.random.ExponentialGenerator
- copy() - Method in class org.apache.spark.mllib.random.GammaGenerator
- copy() - Method in class org.apache.spark.mllib.random.LogNormalGenerator
- copy() - Method in class org.apache.spark.mllib.random.PoissonGenerator
- copy() - Method in interface org.apache.spark.mllib.random.RandomDataGenerator
Returns a copy of the RandomDataGenerator with a new instance of the rng object used in the
class when applicable for non-locking concurrent usage.
- copy() - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
- copy() - Method in class org.apache.spark.mllib.random.UniformGenerator
- copy() - Method in class org.apache.spark.mllib.random.WeibullGenerator
- copy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
Returns a shallow copy of this instance.
- copy() - Method in interface org.apache.spark.sql.Row
Make a copy of the current Row object.
- copy() - Method in class org.apache.spark.sql.vectorized.ColumnarArray
- copy() - Method in class org.apache.spark.sql.vectorized.ColumnarMap
- copy() - Method in class org.apache.spark.sql.vectorized.ColumnarRow
Revisit this.
- copy() - Method in class org.apache.spark.util.AccumulatorV2
Creates a new copy of this accumulator.
- copy() - Method in class org.apache.spark.util.CollectionAccumulator
- copy() - Method in class org.apache.spark.util.DoubleAccumulator
- copy() - Method in class org.apache.spark.util.LegacyAccumulatorWrapper
- copy() - Method in class org.apache.spark.util.LongAccumulator
- copy() - Method in class org.apache.spark.util.StatCounter
Clone this StatCounter
- copyAndReset() - Method in class org.apache.spark.util.AccumulatorV2
Creates a new copy of this accumulator that holds the zero value.
- copyAndReset() - Method in class org.apache.spark.util.CollectionAccumulator
- copyFileStreamNIO(FileChannel, FileChannel, long, long) - Static method in class org.apache.spark.util.Utils
- copyStream(InputStream, OutputStream, boolean, boolean) - Static method in class org.apache.spark.util.Utils
Copy all data from an InputStream to an OutputStream.
- copyValues(T, ParamMap) - Method in interface org.apache.spark.ml.param.Params
Copies param values from this instance to another instance for params shared by them.
- cores() - Method in class org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.RegisterExecutor
- coresGranted() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
- coresPerExecutor() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
- corr(Dataset<?>, String, String) - Static method in class org.apache.spark.ml.stat.Correlation
:: Experimental ::
Compute the correlation matrix for the input Dataset of Vectors using the specified method.
- corr(Dataset<?>, String) - Static method in class org.apache.spark.ml.stat.Correlation
Compute the Pearson correlation matrix for the input Dataset of Vectors.
- corr(RDD<Object>, RDD<Object>, String) - Static method in class org.apache.spark.mllib.stat.correlation.Correlations
- corr(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the Pearson correlation matrix for the input RDD of Vectors.
- corr(RDD<Vector>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the correlation matrix for the input RDD of Vectors using the specified method.
- corr(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the Pearson correlation for the input RDDs.
- corr(JavaRDD<Double>, JavaRDD<Double>) - Static method in class org.apache.spark.mllib.stat.Statistics
Java-friendly version of corr()
- corr(RDD<Object>, RDD<Object>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the correlation for the input RDDs using the specified method.
- corr(JavaRDD<Double>, JavaRDD<Double>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
Java-friendly version of corr()
- corr(String, String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculates the correlation of two columns of a DataFrame.
- corr(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
- corr(Column, Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
- corr(String, String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
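For orientation, the Pearson correlation coefficient named in the corr entries above can be sketched in plain Java. This is only an illustration of the formula on local arrays, not Spark's implementation; the class and method names (`PearsonSketch`, `pearson`) are hypothetical.

```java
class PearsonSketch {
    // Pearson correlation coefficient: covariance of x and y divided by the
    // product of their standard deviations. The 1/n factors cancel, so plain
    // sums of centered products are enough.
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length inputs with at least 2 values");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // A constant input makes the denominator zero and the result NaN.
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        // Perfectly linear data, so the coefficient is 1.
        System.out.println(pearson(x, y));
    }
}
```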
- Correlation - Class in org.apache.spark.ml.stat
API for correlation functions in MLlib, compatible with DataFrames and Datasets.
- Correlation() - Constructor for class org.apache.spark.ml.stat.Correlation
- Correlation - Interface in org.apache.spark.mllib.stat.correlation
Trait for correlation algorithms.
- CorrelationNames - Class in org.apache.spark.mllib.stat.correlation
Maintains supported and default correlation names.
- CorrelationNames() - Constructor for class org.apache.spark.mllib.stat.correlation.CorrelationNames
- Correlations - Class in org.apache.spark.mllib.stat.correlation
Delegates computation to the specific correlation object based on the input method name.
- Correlations() - Constructor for class org.apache.spark.mllib.stat.correlation.Correlations
- corrMatrix(RDD<Vector>, String) - Static method in class org.apache.spark.mllib.stat.correlation.Correlations
- cos(Column) - Static method in class org.apache.spark.sql.functions
- cos(String) - Static method in class org.apache.spark.sql.functions
- cosh(Column) - Static method in class org.apache.spark.sql.functions
- cosh(String) - Static method in class org.apache.spark.sql.functions
- CosineSilhouette - Class in org.apache.spark.ml.evaluation
An efficient and parallel implementation of the Silhouette using the cosine distance measure.
- CosineSilhouette() - Constructor for class org.apache.spark.ml.evaluation.CosineSilhouette
- count() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
The number of edges in the RDD.
- count() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
The number of vertices in the RDD.
- count() - Method in class org.apache.spark.ml.clustering.ExpectationAggregator
- count() - Method in class org.apache.spark.ml.regression.AFTAggregator
- count(Column, Column) - Static method in class org.apache.spark.ml.stat.Summarizer
- count(Column) - Static method in class org.apache.spark.ml.stat.Summarizer
- count() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
Sample size.
- count() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
Sample size.
- count() - Method in class org.apache.spark.rdd.RDD
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.sql.Dataset
Returns the number of rows in the Dataset.
- count(MapFunction<T, Object>) - Static method in class org.apache.spark.sql.expressions.javalang.typed
Count aggregate function.
- count(Function1<IN, Object>) - Static method in class org.apache.spark.sql.expressions.scalalang.typed
Count aggregate function.
- count(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of items in a group.
- count(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of items in a group.
- count() - Method in class org.apache.spark.sql.KeyValueGroupedDataset
Returns a Dataset that contains a tuple with each key and the number of items present
for that key.
- count() - Method in class org.apache.spark.sql.RelationalGroupedDataset
Count the number of rows for each group.
- count() - Method in class org.apache.spark.status.RDDPartitionSeq
- count() - Method in class org.apache.spark.storage.ReadableChannelFileRegion
- count() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.util.DoubleAccumulator
Returns the number of elements added to the accumulator.
- count() - Method in class org.apache.spark.util.LongAccumulator
Returns the number of elements added to the accumulator.
- count() - Method in class org.apache.spark.util.StatCounter
- countApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long, double) - Method in class org.apache.spark.rdd.RDD
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApproxDistinct(double) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(int, int) - Method in class org.apache.spark.rdd.RDD
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(double) - Method in class org.apache.spark.rdd.RDD
Return approximate number of distinct elements in the RDD.
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.api.java.JavaPairRDD
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.api.java.JavaPairRDD
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(int, int, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
- countAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
The asynchronous version of count, which returns a
future for counting the number of elements in this RDD.
- countAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
Returns a future for counting the number of elements in the RDD.
- countByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
Count the number of elements for each key, and return the result to the master as a Map.
- countByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
Count the number of elements for each key, collecting the results to a local Map.
- countByKeyApprox(long) - Method in class org.apache.spark.api.java.JavaPairRDD
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.api.java.JavaPairRDD
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.rdd.PairRDDFunctions
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByValue() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
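The shape of the (value, count) map described by the countByValue entries above can be pictured with local collections. The sketch below runs on a plain `List` rather than an RDD, and `CountByValueSketch` is a hypothetical name used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CountByValueSketch {
    // Builds a local map from each distinct value to its number of occurrences.
    // A TreeMap keeps the keys sorted so the printed result is deterministic.
    public static <T extends Comparable<T>> Map<T, Long> countByValue(List<T> values) {
        Map<T, Long> counts = new TreeMap<>();
        for (T value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(countByValue(words)); // {a=3, b=1, c=1}
    }
}
```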
- countByValue() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValueAndWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
Approximate version of countByValue().
- countByValueApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
Approximate version of countByValue().
- countByValueApprox(long, double, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Approximate version of countByValue().
- countByWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a window over this DStream.
- countByWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a sliding window over this DStream.
- countDistinct(Column, Column...) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
- countDistinct(String, String...) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
- countDistinct(Column, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
- countDistinct(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
- COUNTER() - Static method in class org.apache.spark.metrics.sink.StatsdMetricType
- CountingWritableChannel - Class in org.apache.spark.storage
- CountingWritableChannel(WritableByteChannel) - Constructor for class org.apache.spark.storage.CountingWritableChannel
- countMinSketch(String, int, int, int) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Builds a Count-min Sketch over a specified column.
- countMinSketch(String, double, double, int) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Builds a Count-min Sketch over a specified column.
- countMinSketch(Column, int, int, int) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Builds a Count-min Sketch over a specified column.
- countMinSketch(Column, double, double, int) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Builds a Count-min Sketch over a specified column.
- CountMinSketch - Class in org.apache.spark.util.sketch
A Count-min sketch is a probabilistic data structure used to estimate item frequencies using
sub-linear space.
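As a rough picture of the idea, a Count-min sketch keeps a small table of counters, increments one counter per row for each added item, and answers a query with the minimum over the rows. The toy class below is a sketch under simple assumptions (string items, an ad hoc seeded hash); `ToyCountMinSketch` and its members are hypothetical and do not mirror Spark's implementation.

```java
import java.util.Random;

class ToyCountMinSketch {
    private final long[][] table; // depth rows of width counters
    private final int[] seeds;    // one hash seed per row
    private final int width;

    public ToyCountMinSketch(int depth, int width, int seed) {
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        this.width = width;
        Random rng = new Random(seed);
        for (int row = 0; row < depth; row++) {
            seeds[row] = rng.nextInt();
        }
    }

    // Ad hoc per-row hash: mixes the item's hashCode with the row seed.
    private int bucket(String item, int row) {
        int h = item.hashCode() ^ seeds[row];
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return Math.floorMod(h, width);
    }

    // Increments one counter in every row.
    public void add(String item) {
        for (int row = 0; row < table.length; row++) {
            table[row][bucket(item, row)]++;
        }
    }

    // The minimum over the rows never underestimates the true count,
    // but hash collisions can make it overestimate.
    public long estimateCount(String item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        ToyCountMinSketch sketch = new ToyCountMinSketch(4, 256, 42);
        for (int i = 0; i < 5; i++) {
            sketch.add("spark");
        }
        sketch.add("hadoop");
        System.out.println(sketch.estimateCount("spark"));
    }
}
```

A wider table lowers the chance of collisions and a deeper one lowers the chance that every row collides, which is the trade-off behind the width/depth and error/confidence parameters in the entries above.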
- CountMinSketch() - Constructor for class org.apache.spark.util.sketch.CountMinSketch
- CountMinSketch.Version - Enum in org.apache.spark.util.sketch
- countTowardsTaskFailures() - Method in class org.apache.spark.ExecutorLostFailure
- countTowardsTaskFailures() - Method in class org.apache.spark.FetchFailed
Fetch failures lead to a different failure handling path: (1) we don't abort the stage after
4 task failures, instead we immediately go back to the stage which generated the map output,
and regenerate the missing data.
- countTowardsTaskFailures() - Static method in class org.apache.spark.Resubmitted
- countTowardsTaskFailures() - Method in class org.apache.spark.TaskCommitDenied
If a task failed because its attempt to commit was denied, do not count this failure
towards failing the stage.
- countTowardsTaskFailures() - Method in interface org.apache.spark.TaskFailedReason
Whether this task failure should be counted towards the maximum number of times the task is
allowed to fail before the stage is aborted.
- countTowardsTaskFailures() - Method in class org.apache.spark.TaskKilled
- countTowardsTaskFailures() - Static method in class org.apache.spark.TaskResultLost
- countTowardsTaskFailures() - Static method in class org.apache.spark.UnknownReason
- CountVectorizer - Class in org.apache.spark.ml.feature
- CountVectorizer(String) - Constructor for class org.apache.spark.ml.feature.CountVectorizer
- CountVectorizer() - Constructor for class org.apache.spark.ml.feature.CountVectorizer
- CountVectorizerModel - Class in org.apache.spark.ml.feature
Converts a text document to a sparse vector of token counts.
- CountVectorizerModel(String, String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
- CountVectorizerModel(String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
- CountVectorizerParams - Interface in org.apache.spark.ml.feature
- cov() - Method in class org.apache.spark.ml.stat.distribution.MultivariateGaussian
- cov(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculate the sample covariance of two numerical columns of a DataFrame.
- covar_pop(Column, Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the population covariance for two columns.
- covar_pop(String, String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the population covariance for two columns.
- covar_samp(Column, Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the sample covariance for two columns.
- covar_samp(String, String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the sample covariance for two columns.
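The difference between the population and sample covariance entries above is only the divisor: n for the population form and n - 1 for the sample form. A minimal sketch on local arrays follows; the class `CovarianceSketch` and its method names are hypothetical and merely echo the function names in this index.

```java
class CovarianceSketch {
    // Sum of products of deviations from the means, shared by both variants.
    private static double centeredProductSum(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length inputs with at least 2 values");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum;
    }

    // Population covariance: divide by n.
    public static double covarPop(double[] x, double[] y) {
        return centeredProductSum(x, y) / x.length;
    }

    // Sample covariance: divide by n - 1.
    public static double covarSamp(double[] x, double[] y) {
        return centeredProductSum(x, y) / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        System.out.println(covarPop(x, y));  // 10 / 4
        System.out.println(covarSamp(x, y)); // 10 / 3
    }
}
```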
- covs() - Method in class org.apache.spark.ml.clustering.ExpectationAggregator
- crc32(Column) - Static method in class org.apache.spark.sql.functions
Calculates the cyclic redundancy check value (CRC32) of a binary column and
returns the value as a bigint.
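To see what kind of value the crc32 entry above describes, the standard CRC-32 checksum can be computed with the JDK alone. The sketch assumes the checksum is the one provided by `java.util.zip.CRC32`; `Crc32Sketch` is a hypothetical helper, and the `long` result stands in for the bigint the entry mentions.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Sketch {
    // Returns the CRC-32 checksum of the bytes as an unsigned 32-bit value
    // held in a long.
    public static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        // "123456789" is the conventional CRC check input; prints cbf43926.
        System.out.println(Long.toHexString(crc32(input)));
    }
}
```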
- CreatableRelationProvider - Interface in org.apache.spark.sql.sources
- create(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
Create a new StorageLevel object.
- create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int, Function<ResultSet, T>) - Static method in class org.apache.spark.rdd.JdbcRDD
Create an RDD that executes a SQL query on a JDBC connection and reads results.
- create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int) - Static method in class org.apache.spark.rdd.JdbcRDD
Create an RDD that executes a SQL query on a JDBC connection and reads results.
- create(RDD<T>, Function1<Object, Object>) - Static method in class org.apache.spark.rdd.PartitionPruningRDD
Create a PartitionPruningRDD.
- create(RpcEnvConfig) - Method in interface org.apache.spark.rpc.RpcEnvFactory
- create(Object, DataType, Seq<Option<ScalaReflection.Schema>>) - Static method in class org.apache.spark.sql.expressions.SparkUserDefinedFunction
- create(Object...) - Static method in class org.apache.spark.sql.RowFactory
Create a Row from the given arguments.
- create(String) - Static method in class org.apache.spark.sql.streaming.ProcessingTime
- create(long, TimeUnit) - Static method in class org.apache.spark.sql.streaming.ProcessingTime
- create(long) - Static method in class org.apache.spark.util.sketch.BloomFilter
Creates a BloomFilter with the expected number of insertions and a default expected
false positive probability of 3%.
- create(long, double) - Static method in class org.apache.spark.util.sketch.BloomFilter
Creates a BloomFilter with the expected number of insertions and expected false
positive probability.
- create(long, long) - Static method in class org.apache.spark.util.sketch.BloomFilter
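Creates a BloomFilter with the given expected number of items and number of bits; it
picks the number of hash functions that minimizes the false positive probability
for the bloom filter.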
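The relationship between expected insertions, false positive probability, bit count and hash-function count in the BloomFilter entries above can be illustrated with the standard textbook sizing formulas. That Spark uses exactly these formulas is an assumption here, and `BloomSizingSketch` with its method names is hypothetical.

```java
class BloomSizingSketch {
    // Textbook bit count for n expected items and false positive
    // probability p: m = -n * ln(p) / (ln 2)^2.
    public static long optimalNumBits(long expectedItems, double fpp) {
        return (long) (-expectedItems * Math.log(fpp) / (Math.log(2) * Math.log(2)));
    }

    // Textbook hash-function count for m bits and n items:
    // k = (m / n) * ln 2, rounded, and at least 1.
    public static int optimalNumHashFunctions(long expectedItems, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / expectedItems * Math.log(2)));
    }

    public static void main(String[] args) {
        long items = 1000;
        long bits = optimalNumBits(items, 0.03); // 3% matches the default named above
        int hashes = optimalNumHashFunctions(items, bits);
        System.out.println(bits + " bits, " + hashes + " hash functions");
    }
}
```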
- create(int, int, int) - Static method in class org.apache.spark.util.sketch.CountMinSketch
- create(double, double, int) - Static method in class org.apache.spark.util.sketch.CountMinSketch
Creates a CountMinSketch with the given relative error, confidence, and random seed.
- createArrayType(DataType) - Static method in class org.apache.spark.sql.types.DataTypes
Creates an ArrayType by specifying the data type of elements (elementType).
- createArrayType(DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
Creates an ArrayType by specifying the data type of elements (elementType) and
whether the array contains null values (containsNull).
- createAttrGroupForAttrNames(String, int, boolean, boolean) - Static method in class org.apache.spark.ml.feature.OneHotEncoderCommon
Creates an `AttributeGroup` with the required number of `BinaryAttribute`.
- createCombiner() - Method in class org.apache.spark.Aggregator
- createCommitter(int) - Method in class org.apache.spark.internal.io.HadoopWriteConfigUtil
- createCompiledClass(String, File, TestUtils.JavaSourceFromString, Seq<URL>) - Static method in class org.apache.spark.TestUtils
Creates a compiled class with the source file.
- createCompiledClass(String, File, String, String, Seq<URL>) - Static method in class org.apache.spark.TestUtils
Creates a compiled class with the given name.
- createContinuousReader(Optional<StructType>, String, DataSourceOptions) - Method in interface org.apache.spark.sql.sources.v2.ContinuousReadSupport
- createContinuousReader(PartitionOffset) - Method in interface org.apache.spark.sql.sources.v2.reader.ContinuousInputPartition
Create an input partition reader with particular offset as its startOffset.
- createCryptoInputStream(InputStream, SparkConf, byte[]) - Static method in class org.apache.spark.security.CryptoStreamUtils
Helper method to wrap InputStream with CryptoInputStream for decryption.
- createCryptoOutputStream(OutputStream, SparkConf, byte[]) - Static method in class org.apache.spark.security.CryptoStreamUtils
Helper method to wrap OutputStream with CryptoOutputStream for encryption.
- createDatabase(CatalogDatabase, boolean) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Creates a new database with the given name.
- createDataFrame(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SparkSession
:: Experimental ::
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).
- createDataFrame(Seq<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SparkSession
:: Experimental ::
Creates a DataFrame from a local Seq of Product.
- createDataFrame(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SparkSession
:: DeveloperApi ::
Creates a DataFrame from an RDD of Rows using the given schema.
- createDataFrame(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SparkSession
:: DeveloperApi ::
Creates a DataFrame from a JavaRDD of Rows using the given schema.
- createDataFrame(List<Row>, StructType) - Method in class org.apache.spark.sql.SparkSession
:: DeveloperApi ::
Creates a DataFrame from a List of Rows using the given schema.
- createDataFrame(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SparkSession
Applies a schema to an RDD of Java Beans.
- createDataFrame(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SparkSession
Applies a schema to an RDD of Java Beans.
- createDataFrame(List<?>, Class<?>) - Method in class org.apache.spark.sql.SparkSession
Applies a schema to a List of Java Beans.
- createDataFrame(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(Seq<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(List<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
- createDataFrame(List<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
- createDataset(Seq<T>, Encoder<T>) - Method in class org.apache.spark.sql.SparkSession
:: Experimental ::
Creates a Dataset from a local Seq of data of a given type.
- createDataset(RDD<T>, Encoder<T>) - Method in class org.apache.spark.sql.SparkSession
:: Experimental ::
Creates a Dataset from an RDD of a given type.
- createDataset(List<T>, Encoder<T>) - Method in class org.apache.spark.sql.SparkSession
:: Experimental ::
Creates a Dataset from a List of a given type.
- createDataset(Seq<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
- createDataset(RDD<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
- createDataset(List<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
- createDataWriter(int, long, long) - Method in interface org.apache.spark.sql.sources.v2.writer.DataWriterFactory
Returns a data writer to do the actual writing work.
- createDecimalType(int, int) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a DecimalType by specifying the precision and scale.
- createDecimalType() - Static method in class org.apache.spark.sql.types.DataTypes
Creates a DecimalType with default precision and scale, which are 10 and 0.
- createDF(RDD<byte[]>, StructType, SparkSession) - Static method in class org.apache.spark.sql.api.r.SQLUtils
- createDirectory(String, String) - Static method in class org.apache.spark.util.Utils
Create a directory inside the given parent directory.
- createdTempDir() - Method in interface org.apache.spark.sql.hive.execution.SaveAsHiveFile
- createExternalTable(String, String) - Method in class org.apache.spark.sql.catalog.Catalog
- createExternalTable(String, String, String) - Method in class org.apache.spark.sql.catalog.Catalog
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
- createExternalTable(String, String) - Method in class org.apache.spark.sql.SQLContext
- createExternalTable(String, String, String) - Method in class org.apache.spark.sql.SQLContext
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
- createFilter(StructType, Filter[]) - Static method in class org.apache.spark.sql.hive.orc.OrcFilters
- createFunction(String, CatalogFunction) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Create a function in an existing database.
- createGlobalTempView(String) - Method in class org.apache.spark.sql.Dataset
Creates a global temporary view using the given name.
- CreateHiveTableAsSelectCommand - Class in org.apache.spark.sql.hive.execution
Create table and insert the query result into it.
- CreateHiveTableAsSelectCommand(CatalogTable, LogicalPlan, Seq<String>, SaveMode) - Constructor for class org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand
- createJar(Seq<File>, File, Option<String>) - Static method in class org.apache.spark.TestUtils
Create a jar file that contains this set of files.
- createJarWithClasses(Seq<String>, String, Seq<Tuple2<String, String>>, Seq<URL>) - Static method in class org.apache.spark.TestUtils
Create a jar that defines classes with the given names.
- createJarWithFiles(Map<String, String>, File) - Static method in class org.apache.spark.TestUtils
Create a jar file containing multiple files.
- createJobContext(String, int) - Method in class org.apache.spark.internal.io.HadoopWriteConfigUtil
- createJobID(Date, int) - Static method in class org.apache.spark.internal.io.SparkHadoopWriterUtils
- createJobTrackerID(Date) - Static method in class org.apache.spark.internal.io.SparkHadoopWriterUtils
- createKey(SparkConf) - Static method in class org.apache.spark.security.CryptoStreamUtils
Creates a new encryption key.
- createListeners(SparkConf, ElementTrackingStore) - Method in interface org.apache.spark.status.AppHistoryServerPlugin
Creates listeners to replay the event logs.
- createLogForDriver(SparkConf, String, Configuration) - Static method in class org.apache.spark.streaming.util.WriteAheadLogUtils
Create a WriteAheadLog for the driver.
- createLogForReceiver(SparkConf, String, Configuration) - Static method in class org.apache.spark.streaming.util.WriteAheadLogUtils
Create a WriteAheadLog for the receiver.
- createMapType(DataType, DataType) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a MapType by specifying the data type of keys (keyType) and values (valueType).
- createMapType(DataType, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a MapType by specifying the data type of keys (keyType), the data type of
values (valueType), and whether values contain any null value (valueContainsNull).
- createMetrics(long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long, long) - Static method in class org.apache.spark.status.LiveEntityHelpers
- createMetrics(long) - Static method in class org.apache.spark.status.LiveEntityHelpers
- createMicroBatchReader(Optional<StructType>, String, DataSourceOptions) - Method in interface org.apache.spark.sql.sources.v2.MicroBatchReadSupport
Creates a MicroBatchReader to read batches of data from this data source in a streaming query.
- createModel(DenseVector<Object>) - Method in interface org.apache.spark.ml.ann.Layer
Returns the instance of the layer based on weights provided.
- createOrReplaceGlobalTempView(String) - Method in class org.apache.spark.sql.Dataset
Creates or replaces a global temporary view using the given name.
- createOrReplaceTempView(String) - Method in class org.apache.spark.sql.Dataset
Creates or replaces a local temporary view using the given name.
- createOutputOperationFailureForUI(String) - Static method in class org.apache.spark.streaming.ui.UIUtils
- createPartitionReader() - Method in interface org.apache.spark.sql.sources.v2.reader.InputPartition
Returns an input partition reader to do the actual reading work.
- createPartitions(String, String, Seq<CatalogTablePartition>, boolean) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Create one or many partitions in the given table.
- createPathFromString(String, JobConf) - Static method in class org.apache.spark.internal.io.SparkHadoopWriterUtils
- createPMMLModelExport(Object) - Static method in class org.apache.spark.mllib.pmml.export.PMMLModelExportFactory
Factory object that helps create the necessary PMMLModelExport implementation,
taking as input the machine learning model (for example KMeansModel).
- createProxyHandler(Function1<String, Option<String>>) - Static method in class org.apache.spark.ui.JettyUtils
Create a handler for proxying requests to Workers and Application Drivers
- createProxyLocationHeader(String, HttpServletRequest, URI) - Static method in class org.apache.spark.ui.JettyUtils
- createProxyURI(String, String, String, String) - Static method in class org.apache.spark.ui.JettyUtils
- createRDDFromArray(JavaSparkContext, byte[][]) - Static method in class org.apache.spark.api.r.RRDD
Create an RRDD given a sequence of byte arrays.
- createRDDFromFile(JavaSparkContext, String, int) - Static method in class org.apache.spark.api.r.RRDD
Create an RRDD given a temporary file name.
- createReadableChannel(ReadableByteChannel, SparkConf, byte[]) - Static method in class org.apache.spark.security.CryptoStreamUtils
Wrap a ReadableByteChannel for decryption.
- createReader(StructType, DataSourceOptions) - Method in interface org.apache.spark.sql.sources.v2.ReadSupport
- createReader(DataSourceOptions) - Method in interface org.apache.spark.sql.sources.v2.ReadSupport
- createRedirectHandler(String, String, Function1<HttpServletRequest, BoxedUnit>, String, Set<String>) - Static method in class org.apache.spark.ui.JettyUtils
Create a handler that always redirects the user to the given path
- createRelation(SQLContext, SaveMode, Map<String, String>, Dataset<Row>) - Method in interface org.apache.spark.sql.sources.CreatableRelationProvider
Saves a DataFrame to a destination (using data source-specific parameters)
- createRelation(SQLContext, Map<String, String>) - Method in interface org.apache.spark.sql.sources.RelationProvider
Returns a new base relation with the given parameters.
- createRelation(SQLContext, Map<String, String>, StructType) - Method in interface org.apache.spark.sql.sources.SchemaRelationProvider
Returns a new base relation with the given parameters and user defined schema.
- createSchedulerBackend(SparkContext, String, TaskScheduler) - Method in interface org.apache.spark.scheduler.ExternalClusterManager
Create a scheduler backend for the given SparkContext and scheduler.
- createSecret(SparkConf) - Static method in class org.apache.spark.util.Utils
- createServlet(JettyUtils.ServletParams<T>, org.apache.spark.SecurityManager, SparkConf) - Static method in class org.apache.spark.ui.JettyUtils
- createServletHandler(String, JettyUtils.ServletParams<T>, org.apache.spark.SecurityManager, SparkConf, String) - Static method in class org.apache.spark.ui.JettyUtils
Create a context handler that responds to a request with the given path prefix
- createServletHandler(String, HttpServlet, String) - Static method in class org.apache.spark.ui.JettyUtils
Create a context handler that responds to a request with the given path prefix
- createSink(SQLContext, Map<String, String>, Seq<String>, OutputMode) - Method in interface org.apache.spark.sql.sources.StreamSinkProvider
- createSource(SQLContext, String, Option<StructType>, String, Map<String, String>) - Method in interface org.apache.spark.sql.sources.StreamSourceProvider
- createSparkContext(String, String, String, String[], Map<Object, Object>, Map<Object, Object>) - Static method in class org.apache.spark.api.r.RRDD
- createStaticHandler(String, String) - Static method in class org.apache.spark.ui.JettyUtils
Create a handler for serving files from a static directory
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, String, String, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, String, String, String, String, String, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>, String, String, String, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
- createStream(JavaStreamingContext, String, String, String, String, int, Duration, StorageLevel, String, String, String, String, String) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
- createStreamWriter(String, StructType, OutputMode, DataSourceOptions) - Method in interface org.apache.spark.sql.sources.v2.StreamWriteSupport
Creates an optional StreamWriter to save the data to this data source.
- createStructField(String, String, boolean) - Static method in class org.apache.spark.sql.api.r.SQLUtils
- createStructField(String, DataType, boolean, Metadata) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructField by specifying the name (name), data type (dataType) and whether values of this field can be null values (nullable).
- createStructField(String, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructField with empty metadata.
- createStructType(Seq<StructField>) - Static method in class org.apache.spark.sql.api.r.SQLUtils
- createStructType(List<StructField>) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructType with the given list of StructFields (fields).
- createStructType(StructField[]) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructType with the given StructField array (fields).
- createTable(String, String) - Method in class org.apache.spark.sql.catalog.Catalog
:: Experimental ::
Creates a table from the given path and returns the corresponding DataFrame.
- createTable(String, String, String) - Method in class org.apache.spark.sql.catalog.Catalog
:: Experimental ::
Creates a table from the given path based on a data source and returns the corresponding DataFrame.
- createTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
:: Experimental ::
Creates a table based on the dataset in a data source and a set of options.
- createTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
:: Experimental ::
Creates a table based on the dataset in a data source and a set of options.
- createTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
:: Experimental ::
Create a table based on the dataset in a data source, a schema and a set of options.
- createTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.catalog.Catalog
:: Experimental ::
Create a table based on the dataset in a data source, a schema and a set of options.
- createTable(CatalogTable, boolean) - Method in interface org.apache.spark.sql.hive.client.HiveClient
Creates a table with the given metadata.
- createTaskAttemptContext(String, int, int, int) - Method in class org.apache.spark.internal.io.HadoopWriteConfigUtil
- createTaskScheduler(SparkContext, String) - Method in interface org.apache.spark.scheduler.ExternalClusterManager
Create a task scheduler instance for the given SparkContext
- createTempDir(String, String) - Static method in class org.apache.spark.util.Utils
Create a temporary directory inside the given parent directory.
- createTempView(String) - Method in class org.apache.spark.sql.Dataset
Creates a local temporary view using the given name.
- createUnsafe(long, int, int) - Static method in class org.apache.spark.sql.types.Decimal
Creates a decimal from unscaled, precision and scale without checking the bounds.
- createWorkspace(int) - Static method in class org.apache.spark.mllib.optimization.NNLS
- createWritableChannel(WritableByteChannel, SparkConf, byte[]) - Static method in class org.apache.spark.security.CryptoStreamUtils
Wrap a WritableByteChannel for encryption.
- createWriter(String, StructType, SaveMode, DataSourceOptions) - Method in interface org.apache.spark.sql.sources.v2.WriteSupport
- createWriterFactory() - Method in interface org.apache.spark.sql.sources.v2.writer.DataSourceWriter
Creates a writer factory which will be serialized and sent to executors.
- crossJoin(Dataset<?>) - Method in class org.apache.spark.sql.Dataset
Explicit cartesian join with another DataFrame
- crosstab(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Computes a pair-wise frequency table of the given columns.
- CrossValidator - Class in org.apache.spark.ml.tuning
K-fold cross validation performs model selection by splitting the dataset into a set of
non-overlapping, randomly partitioned folds, which are used as separate training and test datasets.
For example, with k=3 folds, K-fold cross validation will generate 3 (training, test) dataset pairs,
each of which uses 2/3 of the data for training and 1/3 for testing.
- CrossValidator(String) - Constructor for class org.apache.spark.ml.tuning.CrossValidator
- CrossValidator() - Constructor for class org.apache.spark.ml.tuning.CrossValidator
- CrossValidatorModel - Class in org.apache.spark.ml.tuning
CrossValidatorModel contains the model with the highest average cross-validation
metric across folds and uses this model to transform input data.
- CrossValidatorModel.CrossValidatorModelWriter - Class in org.apache.spark.ml.tuning
Writer for CrossValidatorModel.
- CrossValidatorParams - Interface in org.apache.spark.ml.tuning
- CryptoStreamUtils - Class in org.apache.spark.security
A util class for manipulating IO encryption and decryption streams.
- CryptoStreamUtils() - Constructor for class org.apache.spark.security.CryptoStreamUtils
- CryptoStreamUtils.BaseErrorHandler - Interface in org.apache.spark.security
- CryptoStreamUtils.ErrorHandlingReadableChannel - Class in org.apache.spark.security
- csv(String...) - Method in class org.apache.spark.sql.DataFrameReader
Loads CSV files and returns the result as a DataFrame
- csv(String) - Method in class org.apache.spark.sql.DataFrameReader
Loads a CSV file and returns the result as a DataFrame
- csv(Dataset<String>) - Method in class org.apache.spark.sql.DataFrameReader
Loads a Dataset[String] storing CSV rows and returns the result as a DataFrame
- csv(Seq<String>) - Method in class org.apache.spark.sql.DataFrameReader
Loads CSV files and returns the result as a DataFrame
- csv(String) - Method in class org.apache.spark.sql.DataFrameWriter
Saves the content of the DataFrame in CSV format at the specified path.
- csv(String) - Method in class org.apache.spark.sql.streaming.DataStreamReader
Loads a CSV file stream and returns the result as a DataFrame
- cube(Column...) - Method in class org.apache.spark.sql.Dataset
Create a multi-dimensional cube for the current Dataset using the specified columns,
so we can run aggregation on them.
- cube(String, String...) - Method in class org.apache.spark.sql.Dataset
Create a multi-dimensional cube for the current Dataset using the specified columns,
so we can run aggregation on them.
- cube(Seq<Column>) - Method in class org.apache.spark.sql.Dataset
Create a multi-dimensional cube for the current Dataset using the specified columns,
so we can run aggregation on them.
- cube(String, Seq<String>) - Method in class org.apache.spark.sql.Dataset
Create a multi-dimensional cube for the current Dataset using the specified columns,
so we can run aggregation on them.
- CubeType$() - Constructor for class org.apache.spark.sql.RelationalGroupedDataset.CubeType$
- cume_dist() - Static method in class org.apache.spark.sql.functions
Window function: returns the cumulative distribution of values within a window partition.
- current_date() - Static method in class org.apache.spark.sql.functions
Returns the current date as a date column.
- current_timestamp() - Static method in class org.apache.spark.sql.functions
Returns the current timestamp as a timestamp column.
- currentAttemptId() - Method in interface org.apache.spark.SparkStageInfo
- currentAttemptId() - Method in class org.apache.spark.SparkStageInfoImpl
- currentDatabase() - Method in class org.apache.spark.sql.catalog.Catalog
Returns the current default database in this session.
- currentResult() - Method in interface org.apache.spark.partial.ApproximateEvaluator
- currentRow() - Static method in class org.apache.spark.sql.expressions.Window
Value representing the current row.
- currentRow() - Static method in class org.apache.spark.sql.functions
- currPrefLocs(Partition, RDD<?>) - Method in class org.apache.spark.rdd.DefaultPartitionCoalescer
- customMetrics() - Method in class org.apache.spark.sql.streaming.StateOperatorProgress