In my foundry environment, I have a pyspark dataset with only one column called "data".
Each row has a string that looks like a TSV. Each row has a value like this:
ott-akamai-logs-processor srv 2023-07-29 17:46:50.134 2023-07-29 17:46:49.358 unstruct 103b9271-777 ott node-3.13.1 ssc-2.8.2-kinesis snowplow-enrich-kinesis-3.7.0 3.65.234.x 12345679 DE HE Karachi 60313 50.1188 8.6843 Malta {"schema":"iglu:com.xxx/1-0-0","data":{"schema":"xxx/hls_manifest_requested/jsonschema/1-0-1","data":{"channel":"bildtv-broadcast","session_id":"xxx","request_id":"xxx","total_bytes":351,"referrer":"^","geo_country":"DE","geo_state":"Berlin","geo_city":"-","variant_name":"6.m3u8"}}} snowplow-nodejs-tracker/3.13.1 Europe/Berlin 2023-07-29 17:46:49.281 {"schema":"xxx/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-4","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemVersionMajor":"??","layoutEngineNameVersion":"Unknown ??","operatingSystemNameVersion":"Unknown ??","agentInformationEmail":"Unknown","networkType":"Unknown","webviewAppNameVersionMajor":"Unknown ??","layoutEngineNameVersionMajor":"Unknown ??","operatingSystemName":"Unknown","agentVersionMajor":"3","layoutEngineVersionMajor":"??","webviewAppName":"Unknown","deviceClass":"Unknown","agentNameVersionMajor":"Snowplow-Nodejs-Tracker 3","operatingSystemNameVersionMajor":"Unknown ??","webviewAppVersionMajor":"??","operatingSystemClass":"Unknown","webviewAppVersion":"??","layoutEngineName":"Unknown","agentName":"Snowplow-Nodejs-Tracker","agentVersion":"3.13.1","layoutEngineClass":"Unknown","agentNameVersion":"Snowplow-Nodejs-Tracker 3.13.1","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??","agentInformationUrl":"Unknown"}},{"schema":"iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0","data":{"useragentFamily":"Other","useragentMajor":null,"useragentMinor":null,"useragentPatch":null,"useragentVersion":"Other","osFamily":"Other","osMajor":null,"osMinor":null,"osPatch":null,"osPatchMinor":null,"osVersion":"Other","deviceFamily":"Other"}}]} 2023-07-29 17:46:09.938 com.axelspringer.ott hls_manifest_requested jsonschema 1-0-1 2023-07-29 17:46:09.938
Here, things are separated by tabs. For each tab separation, I want to put the values into different columns. How can I do so?
def unnamed_1(my_df):
df = my_df
return df
You can use
pandas.read_csvfor that: