How to download 300k log lines from my application?


I am running a job on my Heroku app that generates about 300k log lines within 5 minutes. I need to extract all of them into a file. How can I do this?

The Heroku UI only shows logs in real time, starting from the moment it is opened, and only keeps 10k lines.

I attached the LogDNA add-on as a drain, but its export is also limited to 10k lines. To even get the export option, I need to apply a search filter (I typed 2020 because all the lines start with a date, but still...). I can scroll through all the logs to see them, but as I scroll up the bottom gets truncated, so I can't even copy-paste them myself.

I then attached Sumo Logic as a drain, which is better because the export limit is 100k. However, I still need to filter the logs in 30s to 60s intervals and download each interval separately. It also exports to a CSV file in reverse order (newest first, not what I want), so I still have to work on the file after it's downloaded.

Is there no option to get actual raw log files in full?


There are 2 best solutions below

0
On

Is there no option to get actual raw log files in full?

There are no actual raw log files.

Heroku's architecture requires that logging be distributed. By default, its Logplex service aggregates log output from all services into a single stream and makes it available via heroku logs. However,

Logplex is designed for collating and routing log messages, not for storage. It retains the most recent 1,500 lines of your consolidated logs, which expire after 1 week.

For longer persistence you need something else. In addition to commercial logging services like those you mentioned, you have several options:

  • Log to a database instead of files. Something like Apache Cassandra might be a good fit.
  • Send your logs to a logging server via Syslog (my preference):

    Syslog drains allow you to forward your Heroku logs to an external Syslog server for long-term archiving.

  • Send your logs to a custom logging process via HTTPS.

    Log drains also support messaging via HTTPS. This makes it easy to write your own log-processing logic and run it on a web service (such as another Heroku app).
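
As a rough sketch of that last option, a small Python receiver could accept the drain's POST requests and append each message to a file. This assumes the drain sends syslog messages with octet-counting framing (each message prefixed by its byte length and a space), which is how I understand Heroku's HTTPS drains to work. The file name, port, and function names are made up for illustration, and there is no authentication or handling of malformed bodies.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_PATH = "drain.log"  # hypothetical output file


def split_frames(body: bytes) -> list[bytes]:
    """Split an octet-counted body ("<len> <msg><len> <msg>...") into messages."""
    frames = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the decimal length prefix
        length = int(body[pos:space])      # number of bytes in this message
        start = space + 1
        frames.append(body[start:start + length])
        pos = start + length
    return frames


class DrainHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the whole request body, then append one log message per line.
        size = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(size)
        with open(LOG_PATH, "ab") as out:
            for frame in split_frames(body):
                out.write(frame.rstrip(b"\n") + b"\n")
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), DrainHandler).serve_forever()
```

Calling serve() starts the receiver; you would then point an HTTPS drain at wherever it is hosted. Because messages are appended in arrival order, the resulting file is already oldest first.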

2
On

Speaking solely from the Sumo Logic point of view, since that’s the only one I’m familiar with here, you could do this with its Search Job API: https://help.sumologic.com/APIs/Search-Job-API/About-the-Search-Job-API

The Search Job API lets you kick off a search, poll it for status, and then when complete, page through the results (up to 1M records, I believe) and do whatever you want with them, such as dumping them into a CSV file.

But this is only available to trial and Enterprise accounts.
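
The kick-off, poll, and page flow described above might look roughly like this in Python. Treat it as a sketch written from memory of the docs: the base URL (it varies by deployment), the "DONE GATHERING RESULTS" state string, the messageCount field, the offset/limit parameters, and the map/_raw layout of each message are all assumptions to verify against the linked documentation. The helper names are hypothetical.

```python
import base64
import json
import time
import urllib.request
from http.cookiejar import CookieJar

API = "https://api.sumologic.com/api/v1/search/jobs"  # assumed; varies by deployment


def make_caller(access_id, access_key):
    """Build a small JSON-over-HTTP helper with Basic auth and a cookie jar."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    token = base64.b64encode(f"{access_id}:{access_key}".encode()).decode()
    headers = {"Authorization": f"Basic {token}",
               "Content-Type": "application/json",
               "Accept": "application/json"}

    def call(method, url, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(url, data=data, headers=headers, method=method)
        with opener.open(req) as resp:
            return json.load(resp)

    return call


def export_messages(call, query, start, end, out_path, page=10000, poll_secs=5):
    """Start a search job, wait for it to finish, then write raw messages to a file."""
    job = call("POST", API,
               {"query": query, "from": start, "to": end, "timeZone": "UTC"})
    job_url = f"{API}/{job['id']}"

    # Poll until the job reports that it has gathered all results.
    while True:
        status = call("GET", job_url)
        if status["state"] == "DONE GATHERING RESULTS":
            break
        if status["state"] == "CANCELLED":
            raise RuntimeError("search job was cancelled")
        time.sleep(poll_secs)

    # Page through the messages and write one raw log line per message.
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for offset in range(0, status["messageCount"], page):
            batch = call("GET", f"{job_url}/messages?offset={offset}&limit={page}")
            for msg in batch["messages"]:
                out.write(msg["map"]["_raw"] + "\n")
                written += 1
    return written
```

Usage would be something like export_messages(make_caller(access_id, access_key), query, start, end, "logs.txt"), with the query and time range filled in for your own source. The HTTP helper is passed in as a parameter so the paging logic can be tried against a stub before it touches a real account.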

I just looked at Heroku’s docs, and it does not look like they have a native way to retrieve more than 1,500 lines, so you do have to forward those logs via syslog to a separate server or service.

I think your best solution is going to depend, however, on your use case, such as why specifically you need these logs in a CSV.