
10.2.1 Creating a `DataFrame`

With a SparkSession, applications can create DataFrames from an existing RDD, from a Hive table, or from Spark data sources.

In other words, once you have a SparkSession, there are three ways to create a DataFrame through it:

From a Spark data source
From an existing RDD
By querying a Hive table

Creating from a Spark data source

Spark supports several data sources through `spark.read`. For example, JSON:

// Read a JSON file
scala> val df = spark.read.json("/opt/module/spark-local/examples/src/main/resources/employees.json")
df: org.apache.spark.sql.DataFrame = [name: string, salary: bigint]

// Show the result
scala> df.show
+-------+------+
|   name|salary|
+-------+------+
|Michael|  3000|
|   Andy|  4500|
| Justin|  3500|
|  Berta|  4000|
+-------+------+
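
As a minimal sketch of how the JSON reader relates to the generic data source API: `spark.read.json(path)` is shorthand for `spark.read.format("json").load(path)`. The temporary file, its four records (copied from the output above), and the value names `viaShortcut` and `viaFormat` are assumptions made so the snippet can run on its own. In `spark-shell` the `spark` value already exists, and `getOrCreate()` simply returns it.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Reuses the running session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder()
  .appName("ReadJsonSketch")
  .master("local[*]")
  .getOrCreate()

// Assumption: a throwaway JSON Lines file with the same records as the output above.
val records = Seq(
  """{"name":"Michael","salary":3000}""",
  """{"name":"Andy","salary":4500}""",
  """{"name":"Justin","salary":3500}""",
  """{"name":"Berta","salary":4000}"""
)
val tmp = Files.createTempFile("employees", ".json")
Files.write(tmp, records.mkString("\n").getBytes(StandardCharsets.UTF_8))
// Use an explicit file: URI so the path is not resolved against another default file system.
val path = tmp.toUri.toString

// Shortcut form, as used in the example above.
val viaShortcut = spark.read.json(path)

// Generic form: name the format, then load.
val viaFormat = spark.read.format("json").load(path)

// Both readers infer the same schema from the data.
viaShortcut.printSchema()
viaFormat.printSchema()
```

The generic `format(...).load(...)` form is handy when the format name comes from configuration rather than being fixed in the code.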

Converting from an `RDD`

This is covered in detail in a later chapter.
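
Until then, here is a rough sketch of one common route, assuming an RDD of tuples and the `toDF` conversion brought in by `import spark.implicits._`. The sample data and the value names `rdd` and `rddDf` are hypothetical and only illustrate the idea. This is not necessarily the approach the later chapter takes.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the running session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder()
  .appName("RddToDataFrameSketch")
  .master("local[*]")
  .getOrCreate()

// Brings the implicit RDD-to-DataFrame conversions (toDF) into scope.
import spark.implicits._

// Hypothetical sample data: an RDD of (name, salary) tuples.
val rdd = spark.sparkContext.parallelize(Seq(
  ("Michael", 3000L),
  ("Andy", 4500L),
  ("Justin", 3500L),
  ("Berta", 4000L)
))

// Name the columns while converting the RDD to a DataFrame.
val rddDf = rdd.toDF("name", "salary")

rddDf.printSchema()
rddDf.show()
```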

Creating by querying a Hive table

This is covered in detail in a later chapter.
