I have a log file that is 1.6 GB in size and contains 2 million records. I am reading the contents of the log into a channel, performing some transformation, and writing the results onto another channel.
Finally, I am writing the contents of the second channel into a file.
My code is working fine, and the results are as expected. However, the entire operation is taking ~45 seconds, which is too long.
I need to reduce the time taken.
(require '[clojure.core.async :refer [chan go go-loop >! <!! close!]])

;; Channel that is created lazily and fed one line of the log at a time.
(def reader-channel
  (delay (let [temp (chan)]
           (go
             (with-open [reader (clojure.java.io/reader "My_Big_Log")]
               (doseq [ln (line-seq reader)]
                 (>! temp ln)))
             (close! temp))
           temp)))
(def writer-channel (chan))
;; Drain the reader channel into a vector, extracting the referrer from each line.
(defn make-collection []
  (loop [my-coll []]
    (let [item (<!! @reader-channel)]
      (if (nil? item)
        my-coll
        (let [temp (re-find #"[a-z]+\.[a-z]+\.[a-z]+" item)]
          (recur (conj my-coll temp)))))))
(def transformed-collection
  (delay (partition-by identity (remove nil? (sort (make-collection))))))
(defn transform []
  (go-loop [counter 0]
    (if (>= counter (count @transformed-collection))
      (do (close! writer-channel)
          (println "Goodbye"))
      (do (let [item (str "Referrer " (+ counter 1) ": "
                          (first (nth @transformed-collection counter)))]
            (>! writer-channel item))
          (let [item (str "Number of entries associated with this referrer: "
                          (count (nth @transformed-collection counter)))]
            (>! writer-channel item))
          (recur (inc counter))))))
(defn write-to-file []
  (with-open [wrtr (clojure.java.io/writer "Result.txt" :append true)]
    (loop []
      (when-let [temp (<!! writer-channel)]
        (.write wrtr (str temp "\n"))
        (recur)))))
I apologise for bad indentation and formatting.
`transform` is doing multiple tremendously expensive operations every time through the loop. `count` and `nth` on a lazy sequence each take O(n) time. Instead of using either of these, process the sequence lazily with `first` and `next`.
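One way that might look, as a sketch that keeps your channel names and output format and assumes the surrounding defs are unchanged, is to walk the groups with `first`/`next` instead of indexing into the whole collection:

(defn transform []
  (go-loop [counter 1
            groups  (seq @transformed-collection)]
    (if groups
      (let [group (first groups)]
        ;; group is one run of identical referrers produced by partition-by
        (>! writer-channel (str "Referrer " counter ": " (first group)))
        (>! writer-channel (str "Number of entries associated with this referrer: "
                                (count group)))
        (recur (inc counter) (next groups)))
      (do (close! writer-channel)
          (println "Goodbye")))))

Each iteration now looks only at the head of the sequence, and `count` is applied to a single group rather than the whole collection, so one pass over the data replaces the repeated O(n) scans.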