Pythonic way to parse huge string into fields


I have several hundred files that contain email messages as strings. The filenames look like this:

00005.34bcaad58ad5f598f5d6af8cfa0c0465 --
00250.c7603b27a45284d12b49adf767b2b6fa --
00249.b9183324a9726e8b6c8779045a921243 --
00248.9599b06d2d2c08b57ff1de06316d66c0 --
00247.42534d5df0700cb2adf240556c539947 --
00246.fdaacadac7143848978ea0af07eed070 --

The content of those files looks similar to the snippet below:

Return-Path: <[email protected]>
Received: from lockergnome.com (sprocket.lockergnome.com [130.94.96.247])
    by dogma.slashnull.org (8.11.6/8.11.6) with SMTP id g6IKksJ07017
    for <[email protected]>; Thu, 18 Jul 2002 21:46:54 +0100
X-Mailer: ListManager Web Interface
Date: Thu, 18 Jul 2002 09:55:22 -0500
Subject: [Lockergnome Windows Daily]  Sticker Courtesy
To: [email protected]
From: Lockergnome Windows Daily <[email protected]>
List-Unsubscribe: <mailto:[email protected]>
List-Subscribe: <mailto:[email protected]>
List-Owner: <mailto:[email protected]>
X-URL: <http://www.lockergnome.com/>
X-List-Host: Lockergnome <http://www.lockergnome.com/>
Reply-To: [email protected]
Sender: [email protected]
Message-Id: <LISTMANAGERSQL-2534368-1682723-2002.07.18-09.57.34--qqqqqqqqqq-lg#[email protected]>
MIME-Version: 1.0
Content-Type: text/html; charset=us-ascii
...

There aren't any rules or standards to the actual content in the files (I already had a difficult time decoding them).

Is there a way in Python to parse this into something akin to a dictionary? I'm not picky about the exact structure, although a dictionary would be lovely. Really, I just don't want to write a mammoth custom parser. I've tried a couple of standard parsers (e.g. json.loads), but I haven't found anything that works universally.
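For what it's worth, the snippet above looks like ordinary email headers (Name: value lines, with indented continuation lines), which Python's standard-library email package is designed to read. Below is a minimal sketch, assuming every file really is a raw email message; the function names headers_to_dict and parse_directory and the directory argument are hypothetical, made up for illustration. Files are read as bytes so that no text encoding has to be guessed up front.

```python
from email import message_from_bytes
from email.policy import default
from pathlib import Path


def headers_to_dict(raw_bytes):
    """Return the headers of one raw message as {name: [value, ...]}."""
    msg = message_from_bytes(raw_bytes, policy=default)
    headers = {}
    for name, value in msg.items():
        # Headers such as Received can repeat, so every key maps to a list.
        headers.setdefault(name, []).append(str(value))
    return headers


def parse_directory(directory):
    """Map each file name in `directory` (hypothetical path) to its headers."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            result[path.name] = headers_to_dict(path.read_bytes())
    return result
```

The dictionary keys keep whatever capitalization the file uses. If case-insensitive lookups matter, the message object returned by message_from_bytes already behaves like a mapping (msg["subject"] and msg["Subject"] give the same header), so it may be usable directly without building a separate dictionary.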

