I have a file that contains a mix of comma-delimited lines and pipe-delimited lines that I need to import into Databricks.
Is it possible to indicate the use of two or more different separators when creating a SQL table in Databricks/Spark?
I see lots of posts about multi-character separators, but nothing on using two different separators.
I'm currently using something like this:
create table myschema.mytable (
  foo string,
  bar string
)
using csv
options (
  header = "true",
  delimiter = ","
);
One method you could try is to create a Spark DataFrame first and then make a table out of it. Below is an example for a hypothetical case, using PySpark, where the delimiters were | and -.
BEWARE: we are using split, which means it will split on every occurrence of those characters; e.g. 2000-12-31 is a single value, yet it will be split. We therefore need to be very sure that no such case can ever occur in the data. As general advice, one should avoid accepting files like these, as they are accidents waiting to happen.
How the sample data looks: in this case we have two files in our directory, with | and - occurring randomly as delimiters.
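A minimal sketch of this approach, assuming (hypothetically) that the files contain two fields per line and no header row; the directory path and column names here are illustrative, not taken from the original data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

spark = SparkSession.builder.getOrCreate()

# Read every line of every file as a single string column named "value".
# The path is a placeholder; point it at wherever the files actually live.
raw = spark.read.text("/mnt/mydata/mixed_delims/*.txt")

# Split each line on either | or - (a regex character class covering both
# delimiters), producing an array column.
parts = raw.select(split(col("value"), r"[|-]").alias("parts"))

# Pull the array elements out into named columns.
df = parts.select(
    col("parts").getItem(0).alias("foo"),
    col("parts").getItem(1).alias("bar"),
)

# Save as a table so it can be queried with SQL afterwards.
df.write.mode("overwrite").saveAsTable("myschema.mytable")

If the files do have a header row, you would need to filter it out (or handle it separately) before the split, since spark.read.text treats every line as data.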