I receive files in real time in HDFS, and they all follow the same naming convention:
id_name_..._timestamp
Can I somehow define this naming convention in Spark (Scala), so that I can later compare the files by, for example, their ID?
Thank you
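You can use something like this: register a UDF that splits the file name according to the convention, and apply it to the path of each row's source file, which Spark exposes through input_file_name: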
import org.apache.spark.sql.functions.input_file_name
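A minimal sketch of that idea follows. It assumes the parts of the name are separated by "_", that the ID is the first field, the name the second, and the timestamp the last, and that any file extension can be dropped. FileMeta, FileNameConvention, the UDF name file_id, the HDFS directory and the ID "42" are all hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, input_file_name}

// Hypothetical model of the "id_name_..._timestamp" convention.
case class FileMeta(id: String, name: String, timestamp: String)

object FileNameConvention {
  // Assumption: fields are separated by '_', the ID comes first,
  // the name second and the timestamp last; middle fields are ignored.
  def parse(path: String): Option[FileMeta] = {
    val fileName = path.split('/').last              // drop the HDFS directory part
    val dot      = fileName.lastIndexOf('.')
    val base     = if (dot > 0) fileName.substring(0, dot) else fileName // drop extension
    val parts    = base.split('_')
    if (parts.length >= 3) Some(FileMeta(parts.head, parts(1), parts.last))
    else None                                        // name does not follow the convention
  }

  // Only the ID, or null when the name does not match (null becomes a SQL NULL).
  def idOf(path: String): String = parse(path).map(_.id).orNull
}

object Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("file-name-convention").getOrCreate()

    // Register the parser so it is usable both from SQL and the DataFrame API.
    val fileId = spark.udf.register("file_id", FileNameConvention.idOf _)

    val df = spark.read.text("hdfs:///path/to/incoming") // hypothetical directory
      .withColumn("file_id", fileId(input_file_name()))

    // Compare files by their ID.
    df.filter(col("file_id") === "42").show()
    spark.stop()
  }
}
```

If only the ID is needed, a built-in alternative that avoids the UDF is regexp_extract(input_file_name(), "/([^/_]+)_[^/]*$", 1), which captures the text between the last "/" and the first "_" of the file name.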