How to parse log file using python and store data in database?

Question

4.4k Views Asked by rajsinghaniaful At 04 September 2014 at 05:36

I am trying to parse a log file with the structure shown below. I want to do this with Python and store the extracted data in a database. How can I do this?

I am able to parse simple key-value pairs, but I am facing some problems:

1: How can I parse a nested structure? For example, the context field in the sample file is nested inside the main group.

2: How do I handle the case where the separator appears inside a string? For a key:value pair the separator is a colon (:), but the "site" key holds a pair like site_url:http://something.com, where the URL itself also contains a colon (:), which gives the wrong answer.

    {
        "username": "lavania",
        "host": "10.105.22.32",
        "event_source": "server",
        "event_type": "/courses/XYZ/CS101/2014_T1/xblock/i4x:;_;_XYZ;_CS101;_video;_d333fa637a074b41996dc2fd5e675818/handler/xmodule_handler/save_user_state",
        "context": {
            "course_id": "XYZ/CS101/2014_T1",
            "course_user_tags": {},
            "user_id": 42,
            "org_id": "XYZ"
        },
        "time": "2014-06-20T05:49:10.468638+00:00",
        "site":"http://something.com",
        "ip": "127.0.0.1",
        "event": "{\"POST\": {\"saved_video_position\": [\"00:02:10\"]}, \"GET\": {}}",
        "agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0",
        "page": null
    }

    {
        "username": "rihana",
        "host": "10.105.22.32",
        "event_source": "server",
        "event_type": "problem_check",
        "context": {
            "course_id": "XYZ/CS101/2014_T1",
            "course_user_tags": {},
            "user_id": 40,
            "org_id": "XYZ",
            "module": {
                "display_name": ""
            }
        },
        "time": "2014-06-20T06:43:52.716455+00:00",
        "ip": "127.0.0.1",
        "event": {
            "submission": {
                "i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
                    "input_type": "choicegroup",
                    "question": "",
                    "response_type": "multiplechoiceresponse",
                    "answer": "MenuInflater.inflate()",
                    "variant": "",
                    "correct": true
                }
            },
            "success": "correct",
            "grade": 1,
            "correct_map": {
                "i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
                    "hint": "",
                    "hintmode": null,
                    "correctness": "correct",
                    "npoints": null,
                    "msg": "",
                    "queuestate": null
                }
            },
            "state": {
                "student_answers": {},
                "seed": 1,
                "done": null,
                "correct_map": {},
                "input_state": {
                    "i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {}
                }
            },
            "answers": {
                "i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": "choice_0"
            },
            "attempts": 1,
            "max_grade": 1,
            "problem_id": "i4x://XYZ/CS101/problem/33e4aac93dc84f368c93b1d08fa984fc"
        },
        "agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
        "page": "x_module"
    }


    {
        "username": "troysa",
        "host": "localhost",
        "event_source": "server",
        "event_type": "/courses/XYZ/CS101/2014_T1/instructor_dashboard/api/list_instructor_tasks",
        "context": {
            "course_id": "XYZ/CS101/2014_T1",
            "course_user_tags": {},
            "user_id": 6,
            "org_id": "XYZ"
        },
        "time": "2014-06-20T05:49:26.780244+00:00",
        "ip": "127.0.0.1",
        "event": "{\"POST\": {}, \"GET\": {}}",
        "agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
        "page": null
    }

Original Q&A

There are 2 best solutions below

MattDMo On 04 September 2014 at 05:38

Your data is in the JSON format. Use the json module in the standard library to parse it.

However, your data seems to be several JSON dicts concatenated together. Hopefully you just pasted from several individual entries, otherwise you're going to have to do some data cleanup before you start parsing in great detail.
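
If the entries really are concatenated in one file, one way to avoid manual cleanup is the standard library's json.JSONDecoder.raw_decode(), which parses a single value starting at a given position and reports where it stopped. The following is only a minimal sketch under that assumption; the helper name iter_json_objects and the shortened sample string are made up for illustration.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found in a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not accept leading whitespace, so skip it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Returns the parsed value and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical, shortened stand-in for the contents of the log file.
sample = '{"username": "lavania", "page": null}\n\n{"username": "rihana", "page": "x_module"}\n'
users = [entry["username"] for entry in iter_json_objects(sample)]
print(users)
```

For a real log you would pass the result of reading the whole file instead of sample, which assumes the file fits comfortably in memory.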

Supposing these are individual files, I'll give an example using the "username": "rihana" set, which has been loaded into the data variable as a string:

>>> import json
>>> newdata = json.loads(data)
>>> print(newdata["context"])
{'course_id': 'XYZ/CS101/2014_T1', 'course_user_tags': {}, 'org_id': 'XYZ', 'user_id': 40, 'module': {'display_name': ''}}
>>> print(newdata["context"]["user_id"])
40

The json.loads() function takes raw JSON data (as a string) and converts it into Python data types. Typically, the outermost type is a dict, each key of which is a string, and each value can be a string, list, dict, number, or one of True, False, and None, which correspond to true, false, and null in JSON.
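
This also covers both numbered points in the question: nested groups come back as nested dicts, and a colon inside a quoted string is just part of the string, so no custom separator handling is needed. Here is a small sketch, using a hand-trimmed record based on the sample data (the variable names raw and record are only illustrative):

```python
import json

# A trimmed record modeled on the sample log; not the full original entry.
raw = '''
{
    "site": "http://something.com",
    "context": {"course_id": "XYZ/CS101/2014_T1", "user_id": 42},
    "correct": true,
    "page": null
}
'''

record = json.loads(raw)
print(record["site"])                # the colon in the URL is preserved
print(record["context"]["user_id"])  # the nested object is a nested dict
print(record["correct"], record["page"])  # JSON true/null become True/None
```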

**Michael Petch** · Accepted Answer · 2014-09-04T06:32:40.480000

As has been pointed out, this is a JSON data structure. I wrote some quick code that reads your log file line by line and attempts to find complete multi-line JSON objects. It is finished once all the lines have been read. I use pprint on the objects so that the output is human-readable and it is easy to check that the returned dict looks correct.

import json
import pprint

with open("log.txt") as infile:
    # Loop until we have parsed all the lines.
    for line in infile:
        # Skip blank lines between objects.
        if not line.strip():
            continue
        # Read lines until we find a complete object.
        while True:
            try:
                json_data = json.loads(line)
            except ValueError:
                # We don't have a complete JSON object yet;
                # read another line and try again.
                next_line = next(infile, None)
                if next_line is None:
                    # End of file reached with an incomplete object.
                    raise
                line += next_line
            else:
                # We have a complete object here.
                pprint.pprint(json_data)
                # Go and look for the next JSON object.
                break

This code is a bit of a kludge. It reads a line and checks whether we have a complete, parseable object. If not, it reads the next line and concatenates it with what has been read so far. This continues until a parseable object can be loaded. It then repeats the process until all the lines are consumed and all objects have been found.

At this point in the code you have read a complete JSON object into json_data:

pprint.pprint(json_data)

I pprint the dict, but it is a standard Python dictionary that can be processed using normal dict traversal. For example, you could retrieve the course_id with something like:

json_data['context']['course_id']

or the host via:

json_data['host']
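
Neither answer shows the database step from the question, so here is one possible sketch using the standard library's sqlite3 module. The choice of SQLite, the events table, its columns, and the save_events helper are all assumptions made for illustration; adapt them to whatever database and schema you actually use.

```python
import json
import sqlite3

def save_events(conn, records):
    """Insert parsed log records into a hypothetical `events` table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               username TEXT, host TEXT, event_type TEXT,
               course_id TEXT, user_id INTEGER, time TEXT, event TEXT
           )"""
    )
    rows = []
    for rec in records:
        context = rec.get("context", {})
        event = rec.get("event")
        # In the sample log, "event" is sometimes a JSON-encoded string and
        # sometimes a nested object; store both forms as JSON text.
        if not isinstance(event, str):
            event = json.dumps(event)
        rows.append((
            rec.get("username"), rec.get("host"), rec.get("event_type"),
            context.get("course_id"), context.get("user_id"),
            rec.get("time"), event,
        ))
    # Parameterized queries keep values containing quotes or colons safe.
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
save_events(conn, [{
    "username": "troysa",
    "host": "localhost",
    "event_type": "/courses/XYZ/CS101/2014_T1/instructor_dashboard/api/list_instructor_tasks",
    "context": {"course_id": "XYZ/CS101/2014_T1", "user_id": 6},
    "time": "2014-06-20T05:49:26.780244+00:00",
    "event": "{\"POST\": {}, \"GET\": {}}",
}])
stored = conn.execute("SELECT username, user_id FROM events").fetchall()
print(stored)
```

In the parsing loop you would collect each json_data dict into a list (or call the helper once per object) instead of passing the hard-coded record used here.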
