KDB split by fixed delimiter

I have a column of XML strings:

<Options TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"/>
<Options TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"/>
<Options TE="2017/09/04, 16:45:00.000" ST="2017/09/04, 09:00:00.000" TT="2017/09/04, 16:45:00.000"/>

that I am trying to split into the columns

TE, ST, TT

The type of the column is C (a list of strings).

Not being very familiar with kdb+/q, I tried to go the very manual way. First I removed the start and end tags:

x:update `$ssr[;"<Options";""] each tags from x
x:update `$ssr[;"/>";""] each string tags from x

leaving me with rows like

TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"

Then, splitting the string

select `$"\"" vs' string tags from  x

gives me, for each row, a list in which the odd-indexed entries are my times. I just can't figure out how to take that list and split it into separate columns. Any ideas?

There are 3 best solutions below

Best answer

I've taken a slightly different approach but the following should do what you want:

//Clean the tags up for separation
//(get rid of open/close tags, change ", " to "," for ease of parsing and remove quote marks) 
x:update tags:{ssr/[x;("<Options ";"/>";", ";"\"");("";"";",";"")]} each tags from x


//Parse the various tags using 0:, put the result into a dictionary,
//exec out to table form and add to x
x:x,'exec (!) ./: ("S= " 0:/: tags) from x
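
To make the parsing step easier to follow, here is a minimal sketch of what the key-value form of 0: does to a single cleaned string. The names s, kv, d and tab are illustrative only and are not part of the answer; the sample string is what the ssr step above should leave behind for the first row.

```q
// One cleaned tag, as the ssr step above is expected to produce it (illustrative sample)
s:"TE=2017/09/01,16:45:00.000 ST=2017/09/01,09:00:00.000 TT=2017/09/01,16:45:00.000"

// Key-value parse: "S" = keys as symbols, "=" separates key from value,
// " " separates one key-value pair from the next
kv:"S= " 0: s        // two-item list: (keys; values), the values stay as strings

// Build a dictionary from the (keys; values) pair
d:(!) . kv
d`ST                 // the ST value, still a string

// A list of dictionaries with the same keys is a table, one row per tag
tab:(!) ./: "S= " 0:/: (s;s)
```

This is also why the cleanup step turns ", " into ",": with a space as the pair separator, a space left inside a value would split that value in two.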

For reference here's the table I used:

x:([] tags:("<Options TE=\"2017/09/01, 16:45:00.000\" ST=\"2017/09/01, 09:00:00.000\" TT=\"2017/09/01, 16:45:00.000\"/>";
  "<Options TE=\"2017/09/01, 16:45:00.000\" ST=\"2017/09/01, 09:00:00.000\" TT=\"2017/09/01, 16:45:00.000\"/>";
  "<Options TE=\"2017/09/04, 16:45:00.000\" ST=\"2017/09/04, 09:00:00.000\" TT=\"2017/09/04, 16:45:00.000\"/>"))

Answer

Crazy thought: is your XML data regular enough that the "columns" can be selected by position? If so, and supposing the data above is held in a 3-element list of strings, you could apply some function foo to:

foo xmllist[;ind]

where ind selects the character positions of the data required. The function foo would do the necessary conversion to the timestamp datatype, perhaps by using (types;delimiter) 0: ...
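
A rough sketch of this positional idea, assuming every row has exactly the layout shown in the question (so each value is 24 characters long and starts at offset 13, 43 or 73). The offsets, the table name t and this particular foo are assumptions made for illustration; foo here builds the timestamp from the date and time parts rather than using 0:.

```q
// Two sample rows in the same fixed layout as the question's data
xmllist:("<Options TE=\"2017/09/01, 16:45:00.000\" ST=\"2017/09/01, 09:00:00.000\" TT=\"2017/09/01, 16:45:00.000\"/>";
  "<Options TE=\"2017/09/04, 16:45:00.000\" ST=\"2017/09/04, 09:00:00.000\" TT=\"2017/09/04, 16:45:00.000\"/>")

// Character positions of the three values (assumed from the sample layout)
ind:13 43 73+\:til 24

// Hypothetical foo: "2017/09/01, 16:45:00.000" -> timestamp
// first 10 chars are the date (with "/" swapped for "."), last 12 chars are the time
foo:{("D"$ssr[10#x;"/";"."])+"T"$-12#x}

// xmllist[;i] picks the same positions out of every string
t:([] TE:foo each xmllist[;ind 0]; ST:foo each xmllist[;ind 1]; TT:foo each xmllist[;ind 2])
```

This only holds up while the layout really is fixed; an extra attribute or a differently formatted time would shift the offsets, in which case the delimiter-based parsing in the best answer is the safer route.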

Answer

See if you can export the XML file as a JSON file. kdb+/q has a built-in JSON parser that does all the dirty work for you: .j.k parses JSON text into q data, and .j.j serializes q data back to JSON.

Reference: http://code.kx.com/q/cookbook/websockets/#json
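
As a small sketch of what that could look like, assume one row has already been converted to JSON outside q. The JSON text below is a made-up sample shaped like the question's data, and the names j, d and tab are illustrative.

```q
// Hypothetical JSON for one row, assumed to have been produced from the XML elsewhere
j:"{\"TE\":\"2017/09/01, 16:45:00.000\",\"ST\":\"2017/09/01, 09:00:00.000\",\"TT\":\"2017/09/01, 16:45:00.000\"}"

// .j.k turns a JSON object into a dictionary: symbol keys, string values
d:.j.k j
d`TE

// Parsing one JSON object per row gives a list of like dictionaries, i.e. a table
tab:.j.k each (j;j)

// .j.j goes the other way, from q data back to JSON text
.j.j d
```

The values still come back as strings, so a cast to timestamp would be needed afterwards.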