read.jdbc {SparkR}	R Documentation

Create a SparkDataFrame representing the database table accessible via JDBC URL

Description

Additional JDBC database connection properties can be set (...)

You can find the JDBC-specific option and parameter documentation for reading tables via JDBC in Data Source Option (https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option) in the version you use.
Usage

read.jdbc(
  url,
  tableName,
  partitionColumn = NULL,
  lowerBound = NULL,
  upperBound = NULL,
  numPartitions = 0L,
  predicates = list(),
  ...
)
Arguments

url: JDBC database url of the form jdbc:subprotocol:subname

tableName: the name of the table in the external database

partitionColumn: the name of a column of numeric, date, or timestamp type that will be used for partitioning

lowerBound: the minimum value of partitionColumn used to decide partition stride

upperBound: the maximum value of partitionColumn used to decide partition stride

numPartitions: the number of partitions. This, along with lowerBound (inclusive) and upperBound (exclusive), forms partition strides for the generated WHERE clause expressions used to split the column partitionColumn evenly. This defaults to SparkContext.defaultParallelism when unset.

predicates: a list of conditions in the where clause; each one defines one partition

...: additional JDBC database connection named properties
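As a hedged illustration of the predicates argument: each element of the list becomes the WHERE condition for exactly one partition, so the conditions should be non-overlapping and together cover the rows you want. The table name (events) and column name (created_at) below are made up for the example:

```r
# Hypothetical, non-overlapping conditions; each list element maps to
# exactly one partition when passed as `predicates` to read.jdbc().
# The column "created_at" and table "events" are illustrative only.
predicates <- list(
  "created_at <  '2023-01-01'",
  "created_at >= '2023-01-01' AND created_at < '2024-01-01'",
  "created_at >= '2024-01-01'"
)

# Requires a running Spark session and a reachable database, hence not run:
# df <- read.jdbc(jdbcUrl, "events", predicates = predicates,
#                 user = "username", password = "password")
```

Rows matching none of the conditions are silently skipped, and rows matching more than one are read more than once, so check the boundary operators carefully.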
Details

Only one of partitionColumn or predicates should be set. Partitions of the table will be retrieved in parallel based on the numPartitions or by the predicates.

Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
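To make the stride mechanism concrete, here is a minimal sketch in plain R of how lowerBound, upperBound, and numPartitions could translate into one WHERE clause per partition. This is an illustration of the idea only, not Spark's actual implementation; the helper name partition_clauses and the column name "index" are invented for the example:

```r
# Sketch (not Spark's real code) of turning bounds + partition count
# into per-partition WHERE clauses for an integer column.
partition_clauses <- function(column, lowerBound, upperBound, numPartitions) {
  # Width of each partition's value range.
  stride <- floor((upperBound - lowerBound) / numPartitions)
  sapply(seq_len(numPartitions), function(i) {
    lo <- lowerBound + (i - 1) * stride
    hi <- lo + stride
    if (i == 1) {
      # First partition also picks up NULLs and anything below lowerBound.
      sprintf("%s < %d OR %s IS NULL", column, hi, column)
    } else if (i == numPartitions) {
      # Last partition is open-ended so values above upperBound are kept.
      sprintf("%s >= %d", column, lo)
    } else {
      sprintf("%s >= %d AND %s < %d", column, lo, column, hi)
    }
  })
}

partition_clauses("index", 0, 10000, 4)
# e.g. the second partition reads rows with 2500 <= index < 5000
```

Note that lowerBound and upperBound only shape these strides; they do not filter rows, which is why the first and last clauses are open-ended.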
Value

SparkDataFrame
Note

read.jdbc since 2.0.0
Examples

## Not run:
sparkR.session()
jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
df <- read.jdbc(jdbcUrl, "table", predicates = list("field<=123"), user = "username")
df2 <- read.jdbc(jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
                 upperBound = 10000, user = "username", password = "password")
## End(Not run)