Open a .webarchive, Modify It, and Save It


I'm developing an app for Lion and what I want to do is open a .webarchive file, modify a snippet of the DOM, and then write out the modified DOM to the same file.

Here is my code thus far. It opens the webarchive, modifies it, and then saves it back to the file.

    NSString *archivePath = @"/Users/tigger/Library/Mail/V2/MailData/Signatures/1216DD8D-C7E2-4DE1-9FCD-0A9A3412C788.webarchive";
    NSData *plistData = [NSData dataWithContentsOfFile:archivePath];
    NSString *error;
    NSPropertyListFormat format;
    NSMutableDictionary *plist;

    plist = (NSMutableDictionary *)[NSPropertyListSerialization propertyListFromData:plistData
                                             mutabilityOption:NSPropertyListMutableContainersAndLeaves
                                                       format:&format
                                             errorDescription:&error];
    if(!plist){
        printf("no plist");
        [error release];
    }else{
        NSString *s = [NSString stringWithUTF8String:[[[plist objectForKey:@"WebMainResource"] objectForKey:@"WebResourceData"] bytes]];
        NSString *new = [s stringByReplacingOccurrencesOfString:@"</body>" withString:@"hey there!</body>"];

        [[plist objectForKey:@"WebMainResource"] setObject:new forKey:@"WebResourceData"];
        printf("Archive: %s", [[plist description] UTF8String]);       
        NSData *data = [NSPropertyListSerialization dataFromPropertyList:plist format:NSPropertyListBinaryFormat_v1_0 errorDescription:nil];
        [data writeToURL:[NSURL fileURLWithPath:@"/Users/tigger/Library/Mail/V2/MailData/Signatures/test.webarchive"] atomically:YES];

    }

The problem is that the resulting webarchive is invalid. The original looks like this:

bplist00—_WebMainResource’  
_WebResourceTextEncodingName_WebResourceFrameName^WebResourceURL_WebResourceData_WebResourceMIMETypeUUTF-8PUdata:O<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dan Shipper</div><div>[email protected]</div><div><br></div></body></span><br class="Apple-interchange-newline">Ytext/html(F]l~îöõ°™
¥

While the resulting webarchive looks like this:

bplist00—_WebMainResource’  
^WebResourceURL_WebResourceFrameName_WebResourceMIMEType_WebResourceData_WebResourceTextEncodingNameUdata:PYtext/html_<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dan Shipper</div><div>[email protected]</div><div><br></div>hey there!</body></span><br class="Apple-interchange-newline">UUTF-8(7Ndvîöõ•∏
æ

Anyone have any ideas on why it's invalid or how to fix it? Thanks so much for your help!

I've also tried to use the textutil convert command to generate the webarchive, but it doesn't work because in my original HTML file I have an image like this:

    <img src="http://www.domainpolish.com/images/crowd.png">

But when I use textutil it downloads the image and saves it like this:

    <img src="file:///1.png">

I don't want it to download the image or change the URL. I've tried the noload, nostore, and baseurl options to no avail.

EDIT: Fixed it! The problem was that when I replaced the HTML, I was inserting it as an NSString instead of an NSData:

    NSString *s = [NSString stringWithUTF8String:[[[plist objectForKey:@"WebMainResource"] objectForKey:@"WebResourceData"] bytes]];
    NSString *new = [s stringByReplacingOccurrencesOfString:@"</body>" withString:@"hi there!</body>"];
    NSData *sourceData = [new dataUsingEncoding:NSUTF8StringEncoding];
    [[plist objectForKey:@"WebMainResource"] setObject:sourceData forKey:@"WebResourceData"];

There are 2 best solutions below

1
On

From Wikipedia:

The webarchive format is a concatenation of source files with filenames saved in the binary plist format using NSKeyedEncoder.

With that in mind, you could use NSKeyedEncoder to find the list of files, then use NSData to split them and find the HTML you're looking for.

0
On

Update: I just re-read the question and saw the solution...

You are replacing the main resource data with the wrong object in this line:

    [[plist objectForKey:@"WebMainResource"] setObject:new forKey:@"WebResourceData"];

new is an NSString, but it should be an NSData object.

After the replacement, you should convert the string content to binary data:

    [[plist objectForKey:@"WebMainResource"] setObject:[new dataUsingEncoding:NSUTF8StringEncoding] forKey:@"WebResourceData"];
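If you want to check an archive for this kind of type mismatch without loading it in WebKit, a rough Python sketch along these lines may help. It uses the standard-library plistlib module; the function name non_data_resources is made up for the example, and the "WebSubresources" key is an assumption about archives that carry extra resources (the archive in the question only has a main resource).

```python
import plistlib

def non_data_resources(archive: dict) -> list:
    """Return the URLs of resources whose WebResourceData is not binary data."""
    resources = [archive.get("WebMainResource", {})]
    # Assumption: additional resources, if any, sit under "WebSubresources".
    resources += archive.get("WebSubresources", [])
    return [r.get("WebResourceURL", "<no URL>") for r in resources
            if not isinstance(r.get("WebResourceData"), bytes)]

# Two tiny stand-in archives: one with the HTML stored as a string (the bug),
# one with it stored as binary data (the fix).
broken = plistlib.loads(plistlib.dumps(
    {"WebMainResource": {"WebResourceURL": "data:", "WebResourceData": "<body></body>"}},
    fmt=plistlib.FMT_BINARY))
fixed = plistlib.loads(plistlib.dumps(
    {"WebMainResource": {"WebResourceURL": "data:", "WebResourceData": b"<body></body>"}},
    fmt=plistlib.FMT_BINARY))

print(non_data_resources(broken))  # ['data:']
print(non_data_resources(fixed))   # []
```

On a real file you would pass the result of plistlib.load on the opened .webarchive instead of the stand-in dictionaries.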