I'm implementing a Spark (1.5.2) SQL RelationProvider for a custom data source (properties files).
Can someone please explain how the automatic schema inference algorithm should be implemented?
In general, you need to create a StructType that represents your schema. A StructType contains an Array[StructField], where each element of the array corresponds to a column in your schema. A StructField can be any supported DataType -- including another StructType for nested schemas.

Creating a schema can be as simple as building the StructType by hand.
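For a properties file, for example, a flat key/value schema might look like this (a minimal sketch; the "key" and "value" column names are illustrative, not required by Spark):

    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Flat schema: each property entry becomes one row with a string
    // key and a string value (the column names are illustrative).
    val schema = StructType(Array(
      StructField("key", StringType, nullable = false),
      StructField("value", StringType, nullable = true)
    ))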
If you want to generate a schema from a complex dataset -- one that includes nested StructTypes -- then you most likely need to create a recursive function. A good example of what such a function looks like can be found in the spark-avro integration library: its toSqlType function takes an Avro schema and converts it into a Spark StructType.
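The recursive pattern is the same regardless of the source format. Here is a minimal sketch, assuming a toy tree representation of a source schema; the Node, Leaf, and Record types are invented for illustration and are not part of Spark or spark-avro:

    import org.apache.spark.sql.types._

    // Toy source-schema tree: a node is either a typed leaf or a
    // record with named children (stands in for e.g. an Avro schema).
    sealed trait Node
    case class Leaf(typeName: String) extends Node
    case class Record(fields: Map[String, Node]) extends Node

    // Recursively map the tree onto Spark's DataType hierarchy;
    // nested Records become nested StructTypes.
    def toSqlType(node: Node): DataType = node match {
      case Leaf("string") => StringType
      case Leaf("int")    => IntegerType
      case Leaf("double") => DoubleType
      case Leaf(other)    => throw new IllegalArgumentException(s"Unsupported type: $other")
      case Record(fields) =>
        StructType(fields.map { case (name, child) =>
          StructField(name, toSqlType(child), nullable = true)
        }.toArray)
    }

For example, toSqlType(Record(Map("name" -> Leaf("string"), "address" -> Record(Map("city" -> Leaf("string")))))) yields a StructType whose "address" column is itself a nested StructType.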