How to handle spaces in DataFrame column names in Spark

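I registered a temp table from a DataFrame whose column headers contain spaces. How can I select such a column when querying with SQL through sqlContext? I tried using backticks, but it does not work: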

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)

2 Answers

You only need to put the column name itself inside the backticks, not the column name together with its alias:

Without a table alias:

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")

With a table alias:

df1 =  sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
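If the query string is assembled in code, the quoting rule above can be applied mechanically. The following is only a minimal sketch: quote_identifier and build_select are hypothetical helper names, not part of Spark, and the sketch assumes none of the names contain a backtick (it raises an error if one does rather than trying to escape it).

```python
def quote_identifier(name):
    """Wrap a single identifier in backticks (hypothetical helper, not a Spark API)."""
    if "`" in name:
        # Assumption: names containing backticks are out of scope for this sketch.
        raise ValueError("identifier must not contain a backtick: %r" % name)
    return "`" + name + "`"


def build_select(table, columns):
    """Build a SELECT statement from (source_column, alias_or_None) pairs."""
    parts = []
    for source, alias in columns:
        # Only the column name goes inside the backticks; the alias is quoted separately.
        expr = quote_identifier(source)
        if alias is not None:
            expr += " AS " + quote_identifier(alias)
        parts.append(expr)
    return "SELECT " + ", ".join(parts) + " FROM " + quote_identifier(table)


query = build_select(
    "tmp1",
    [("Company", None), ("Sector", None), ("Industry", None), ("Altman Z-score", "Z_Score")],
)
print(query)
```

The resulting string could then be passed to sqlContext.sql(query) in the same way as the hand-written queries.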

Answered on 2024-06-02T16:45:26+08:00

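There is a problem in the query: the alias part (as Z_Score) was wrapped inside the backticks together with the column name. The corrected query is: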

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)

There is also an alternative using the DataFrame API:

import pyspark.sql.functions as F
df1 =  sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
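A related option, if many columns contain spaces, is to rename them all once so that no backticks are needed later. This is only a sketch under assumptions: sanitize_column_name is a hypothetical helper, the column list is typed in by hand to mirror the question, and the replacement rule (each run of characters other than letters, digits, and underscores becomes one underscore) is just one possible choice.

```python
import re


def sanitize_column_name(name):
    """Replace each run of non-alphanumeric characters with one underscore (hypothetical helper)."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", name).strip("_")


# Column names taken from the question; in practice they would come from df.columns.
columns = ["Company", "Sector", "Industry", "Altman Z-score"]
clean_columns = [sanitize_column_name(c) for c in columns]
print(clean_columns)

# With a real DataFrame (assumed here to be named df), the renamed columns
# could be applied in one step with DataFrame.toDF:
#   df = df.toDF(*[sanitize_column_name(c) for c in df.columns])
```

Note that this rule produces Altman_Z_score rather than the Z_Score alias used above, and two different original names could collapse to the same cleaned name, so the result should be checked for duplicates before use.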

Answered on 2024-06-02T16:45:26+08:00