How to convert yyyyMMdd to yyyy-MM-dd format with appropriate schema in spark/pyspark

Question: I have a DataFrame with a date column in yyyyMMdd format. How can I convert it to yyyy-MM-dd with the appropriate schema in Spark/PySpark?

This post answers the question above.

Converting yyyyMMdd to yyyy-MM-dd format with appropriate schema

Spark

Code

%spark
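import org.apache.spark.sql.functions.{col, to_date}

// Parse the yyyyMMdd strings into a DateType column
val df = Seq("20200601","20200602","20200603").toDF("Date")
  .withColumn("Date", to_date(col("Date"), "yyyyMMdd"))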

df.show
df.printSchema

Output

+----------+
|      Date|
+----------+
|2020-06-01|
|2020-06-02|
|2020-06-03|
+----------+

root
 |-- Date: date (nullable = true)

The code above uses the following function.

  • def to_date(e: Column, fmt: String): Column
    Converts the column into a DateType with a specified format
    See Datetime Patterns for valid date and time format patterns
    • e
      A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    • fmt
      A date time pattern detailing the format of e when e is a string
    • returns
      A date, or null if e was a string that could not be cast to a date or fmt was an invalid format
    • Since
      2.2.0

pyspark

Code

%pyspark
from pyspark.sql.functions import to_date, col

df = spark.createDataFrame([("20200601",), ("20200602",), ("20200603",)]).toDF("date")
df = df.withColumn("date", to_date(col("date"), "yyyyMMdd"))

df.show()
df.printSchema()

Output

+----------+
|      date|
+----------+
|2020-06-01|
|2020-06-02|
|2020-06-03|
+----------+

root
 |-- date: date (nullable = true)

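The code uses pyspark.sql.functions.to_date(col, format), the PySpark counterpart of the Scala function described above. It converts a column to DateType using the given format.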
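The to_date contract (a date when the string matches the pattern, null otherwise) can be tried out locally without a Spark session. The following is a minimal plain-Python sketch of that contract, not Spark's implementation: `parse_yyyymmdd` is a hypothetical helper name, `%Y%m%d` is the `datetime.strptime` counterpart of the Spark pattern `yyyyMMdd`, and Spark's own parser may treat some edge cases differently depending on version and configuration.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(value: str) -> Optional[date]:
    """Return a date for an 8-digit yyyyMMdd string, or None if it does not parse.

    Hypothetical helper for illustration; it only mimics the
    "date or null" behaviour described for to_date.
    """
    # strptime also accepts single-digit months and days, so enforce
    # the fixed 8-digit width before parsing.
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # Raised for impossible dates such as June 31.
        return None


for raw in ["20200601", "2020-06-01", "20200631"]:
    parsed = parse_yyyymmdd(raw)
    print(raw, "->", parsed.isoformat() if parsed else None)

# 20200601 -> 2020-06-01
# 2020-06-01 -> None
# 20200631 -> None
```

Running a check like this over a sample of the raw values is one way to spot rows that are likely to come back as null before converting the whole column.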

That’s all. Thank you.

If you are new to Spark, I recommend O'Reilly Safari online learning.
