How to convert XML to edn in Clojure?

I'm new to Clojure and would like to convert an XML document I have into an EDN object.

The XML file reads:

<Vehicle>
  <Model>Toyota</Model>
  <Color>Red</Color>
  <Loans>
    <Reoccuring>Monthly</Reoccuring>
    <Owners>
      <Owner>Bob</Owner>
    </Owners>
  </Loans>
  <Tires>
    <Model>123123</Model>
    <Size>23</Size>
  </Tires>
  <Engine>
    <Model>30065</Model>
  </Engine>
</Vehicle>

I have saved it as test/resources/vehicle.xml.

Ultimately, I would like to have an EDN object that looks like:

:Vehicle
    :Model "Toyota"
    :Color "Red"
    :Loans
        :Reoccuring "Monthly"
        :Owners
            :Owner "Bob"
    :Tires
        :Model 123123
        :Size 23
    :Engine
        :Model 30065

So far, what I have tried in Clojure is the parse function:

(def xml-parser
  (parse "<Vehicle><Model>Toyota</Model><Color>Red</Color><Loans><Reoccuring>Monthly</Reoccuring><Owners><Owner>Bob</Owner></Owners></Loans><Tires><Model>123123</Model><Size>23</Size></Tires><Engine><Model>30065</Model></Engine></Vehicle>"))

However, this returns a Clojure map that looks like:

{:tag :Vehicle, :attrs nil, :content [{:tag :Model, :attrs nil, :content ["Toyota"]} {:tag :Color, :attrs nil, :content ["Red"]} {:tag :Loans, :attrs nil, :content [{:tag :Reoccuring, :attrs nil, :content ["Monthly"]} {:tag :Owners, :attrs nil, :content [{:tag :Owner, :attrs nil, :content ["Bob"]}]}]} {:tag :Tires, :attrs nil, :content [{:tag :Model, :attrs nil, :content ["123123"]} {:tag :Size, :attrs nil, :content ["23"]}]} {:tag :Engine, :attrs nil, :content [{:tag :Model, :attrs nil, :content ["30065"]}]}]}

I'm having trouble with this initial step of conversion. Thank you for your help in advance.

There are 2 best solutions below

The output you've gotten is the most flexible way to model XML documents in Clojure. As Alan Thompson notes, it is often called the Enlive format, since Enlive was the library that popularized this model. It's not clear what else you were hoping to get, since your expected output is just a mess of keywords with no structure. You might have been hoping for Hiccup style (again described in Alan Thompson's answer), or a nested map (called "mappy" in Alan Thompson's answer), but if so I urge you to reconsider. The only advantage of Hiccup is that it is easy to write by hand. Enlive is much easier to consume and transform.

The mappy format is very convenient to use, but sadly it does not correspond 1:1 with XML documents, because those may have repeated element names, and because mappy has no way to describe attributes, only elements. So an XML parser cannot provide that as an output format without losing fidelity. If you know your input does not have any of these issues, you can write a conversion function from Enlive yourself - it is quite easy for a fixed schema.
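As a rough illustration of such a conversion function, here is a minimal sketch. The name enlive->map is hypothetical (it is not part of any library), and the sketch assumes the input has no attributes, no repeated sibling tags, and no mixed text/element content. Since EDN is simply the printed form of plain Clojure data, pr-str on the result yields EDN text.

```clojure
(require '[clojure.edn :as edn])

;; Enlive-style data, abbreviated from the parser output in the question.
(def enlive-data
  {:tag :Vehicle, :attrs nil,
   :content [{:tag :Model, :attrs nil, :content ["Toyota"]}
             {:tag :Color, :attrs nil, :content ["Red"]}
             {:tag :Tires, :attrs nil,
              :content [{:tag :Model, :attrs nil, :content ["123123"]}
                        {:tag :Size, :attrs nil, :content ["23"]}]}]})

(defn enlive->map
  "Hypothetical helper: turns one Enlive node into a single-entry map.
  Assumes no attributes, no repeated sibling tags, no mixed content."
  [{:keys [tag content]}]
  {tag (if (every? map? content)
         ;; All children are elements: merge their single-entry maps.
         ;; An empty element ends up as an empty map.
         (reduce merge {} (map enlive->map content))
         ;; Otherwise treat the first child as the text value.
         (first content))})

(def vehicle (enlive->map enlive-data))

;; Printing plain data gives EDN text; edn/read-string reads it back.
(def edn-text (pr-str vehicle))
(def round-tripped (edn/read-string edn-text))
```

All leaf values stay strings here; turning "123123" into a number would need an extra, schema-specific coercion step.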

The data you have is in Enlive format. Use clojure.pprint/pprint to see a nicer format:

{:tag :Vehicle,
 :attrs nil,
 :content
 [{:tag :Model, :attrs nil, :content ["Toyota"]}
  {:tag :Color, :attrs nil, :content ["Red"]}
  {:tag :Loans,
   :attrs nil,
   :content
   [{:tag :Reoccuring, :attrs nil, :content ["Monthly"]}
    {:tag :Owners,
     :attrs nil,
     :content [{:tag :Owner, :attrs nil, :content ["Bob"]}]}]}
  {:tag :Tires,
   :attrs nil,
   :content
   [{:tag :Model, :attrs nil, :content ["123123"]}
    {:tag :Size, :attrs nil, :content ["23"]}]}
  {:tag :Engine,
   :attrs nil,
   :content [{:tag :Model, :attrs nil, :content ["30065"]}]}]}

The problem is that your desired output is not legal EDN. However, you can use the tupelo.forest library to convert among several data formats.

First declare the data and parse it into Enlive format:

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [tupelo.parse.xml :as xml]
    [tupelo.forest :as tf]))

(def xml-str
   "<Vehicle>
      <Model>Toyota</Model>
      <Color>Red</Color>
      <Loans>
        <Reoccuring>Monthly</Reoccuring>
        <Owners>
          <Owner>Bob</Owner>
        </Owners>
      </Loans>
      <Tires>
        <Model>123123</Model>
        <Size>23</Size>
      </Tires>
      <Engine>
        <Model>30065</Model>
      </Engine>
    </Vehicle> ")

Verify the result:

(dotest
  (let [data-enlive (xml/parse xml-str)]
    (is= data-enlive
      {:tag     :Vehicle,
       :attrs   {},
       :content [{:tag :Model, :attrs {}, :content ["Toyota"]}
                 {:tag :Color, :attrs {}, :content ["Red"]}
                 {:tag     :Loans,
                  :attrs   {},
                  :content [{:tag :Reoccuring, :attrs {}, :content ["Monthly"]}
                            {:tag     :Owners,
                             :attrs   {},
                             :content [{:tag :Owner, :attrs {}, :content ["Bob"]}]}]}
                 {:tag     :Tires,
                  :attrs   {},
                  :content [{:tag :Model, :attrs {}, :content ["123123"]}
                            {:tag :Size, :attrs {}, :content ["23"]}]}
                 {:tag     :Engine,
                  :attrs   {},
                  :content [{:tag :Model, :attrs {}, :content ["30065"]}]}]})

Convert to Hiccup format:

    (is= (tf/enlive->hiccup data-enlive)
      [:Vehicle
       [:Model "Toyota"]
       [:Color "Red"]
       [:Loans [:Reoccuring "Monthly"]
        [:Owners [:Owner "Bob"]]]
       [:Tires [:Model "123123"]
        [:Size "23"]]
       [:Engine [:Model "30065"]]])

You may also like the "bush" format:

    (is= (tf/enlive->bush data-enlive)
      [{:tag :Vehicle}
       [{:tag :Model, :value "Toyota"}]
       [{:tag :Color, :value "Red"}]
       [{:tag :Loans}
        [{:tag :Reoccuring, :value "Monthly"}]
        [{:tag :Owners} [{:tag :Owner, :value "Bob"}]]]
       [{:tag :Tires}
        [{:tag :Model, :value "123123"}]
        [{:tag :Size, :value "23"}]]
       [{:tag :Engine} [{:tag :Model, :value "30065"}]]])

Or the more detailed "tree" format:

    (is= (tf/enlive->tree data-enlive)
      {:tag :Vehicle,
       :tupelo.forest/kids
            [{:tag :Model, :value "Toyota", :tupelo.forest/kids []}
             {:tag :Color, :value "Red", :tupelo.forest/kids []}
             {:tag :Loans,
              :tupelo.forest/kids
                   [{:tag :Reoccuring, :value "Monthly", :tupelo.forest/kids []}
                    {:tag :Owners,
                     :tupelo.forest/kids
                          [{:tag :Owner, :value "Bob", :tupelo.forest/kids []}]}]}
             {:tag :Tires,
              :tupelo.forest/kids
                   [{:tag :Model, :value "123123", :tupelo.forest/kids []}
                    {:tag :Size, :value "23", :tupelo.forest/kids []}]}
             {:tag :Engine,
              :tupelo.forest/kids
                   [{:tag :Model, :value "30065", :tupelo.forest/kids []}]}]})
    ))

See the Tupelo Forest docs for full information.


The above code was run using this template project.
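If adding a library is not an option, the Enlive-to-Hiccup step shown earlier can be approximated in plain Clojure. This is only a sketch under stated assumptions: node->hiccup is a hypothetical name, it is not how tupelo.forest implements the conversion, and it keeps a non-empty :attrs map as the second vector element in the usual Hiccup convention.

```clojure
(defn node->hiccup
  "Hypothetical stand-alone helper: converts an Enlive-style node
  into a Hiccup-style vector. Strings (text nodes) pass through."
  [node]
  (if (map? node)
    (let [{:keys [tag attrs content]} node]
      ;; Start with [tag] or [tag attrs], then append converted children.
      (into (if (seq attrs) [tag attrs] [tag])
            (map node->hiccup content)))
    node))

(def sample
  {:tag :Vehicle, :attrs nil,
   :content [{:tag :Model, :attrs nil, :content ["Toyota"]}
             {:tag :Tires, :attrs nil,
              :content [{:tag :Model, :attrs nil, :content ["123123"]}
                        {:tag :Size, :attrs nil, :content ["23"]}]}]})

(def sample-hiccup (node->hiccup sample))
```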


If you are looking for a hierarchical map style output, you can kludge together something like so:

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require [clojure.walk :as walk]))

(dotest
  (let [data  [:Vehicle
               [:Model "Toyota"]
               [:Color "Red"]
               [:Loans
                [:Reoccuring "Monthly"]
                [:Owners
                 [:Owner "Bob"]]]
               [:Tires
                [:Model "123123"]
                [:Size "23"]]
               [:Engine
                [:Model "30065"]]]

        mappy (walk/postwalk
                (fn [item]
                  (if (vector? item)
                    (if (= 2 (count item))
                      (conj {} item)
                      {(first item)
                       (into {} (rest item))})
                    item))
                data)]

With the test:

    (is= mappy
      {:Vehicle
       {:Model  "Toyota",
        :Color  "Red",
        :Loans  {:Reoccuring "Monthly"
                 :Owners     {:Owner "Bob"}},
        :Tires  {:Model "123123"
                 :Size  "23"},
        :Engine {:Model "30065"}}})))

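This is pretty fragile as written, though: for example, repeated sibling tags (such as two :Owner elements) would silently overwrite each other in the resulting map.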
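One way to make the conversion a little less fragile is to collect repeated sibling tags into a vector instead of letting the last one win. The following is a sketch only: hiccup->map is a hypothetical helper, and it still assumes there are no attribute maps and no mixed text/element content.

```clojure
(defn hiccup->map
  "Hypothetical helper: [:tag child ...] -> {:tag value}.
  Repeated sibling tags are collected into a vector of values.
  Assumes no attribute maps and no mixed content."
  [[tag & children]]
  {tag (if (every? vector? children)
         (reduce (fn [acc child]
                   ;; Each converted child is a single-entry map.
                   (let [[k v] (first (hiccup->map child))]
                     (if (contains? acc k)
                       ;; Seen before: grow (or start) a vector of values.
                       (update acc k #(if (vector? %) (conj % v) [% v]))
                       (assoc acc k v))))
                 {}
                 children)
         ;; Leaf element: take the text value as-is.
         (first children))})

(def owners
  (hiccup->map [:Owners [:Owner "Bob"] [:Owner "Alice"]]))

(def vehicle-map
  (hiccup->map [:Vehicle
                [:Model "Toyota"]
                [:Tires [:Model "123123"] [:Size "23"]]]))
```

The trade-off is that a tag's value is then either a single value or a vector depending on the input, so consumers have to handle both shapes.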