spark-sql

1. Converting an RDD to a DataFrame ----> the RDD must consist of lists or tuples
rdd1 = sc.parallelize([('a',1),('b',2)])
df = spark.createDataFrame(rdd1)
df.show()
Output:
+---+---+
| _1| _2|
+---+---+
| a| 1|
| b| 2|
+---+---+

df.first() returns the first Row:
Row(_1='a', _2=1)

df.printSchema() ===> shows the df's column names and types, like pandas' info()
root
|-- _1: string (nullable = true)
|-- _2: long (nullable = true)