I'm developing an app for Lion, and I want to open a .webarchive file, modify a snippet of the DOM, and then write the modified DOM back out to the same file.
Here is my code thus far. It opens the webarchive, modifies it, and then saves the result (to a test file for now).
NSString *archivePath = @"/Users/tigger/Library/Mail/V2/MailData/Signatures/1216DD8D-C7E2-4DE1-9FCD-0A9A3412C788.webarchive";
NSData *plistData = [NSData dataWithContentsOfFile:archivePath];
NSString *error;
NSPropertyListFormat format;
NSMutableDictionary *plist;
plist = (NSMutableDictionary *)[NSPropertyListSerialization propertyListFromData:plistData
                                                                mutabilityOption:NSPropertyListMutableContainersAndLeaves
                                                                          format:&format
                                                                errorDescription:&error];
if (!plist) {
    printf("no plist");
    [error release];
} else {
    NSString *s = [NSString stringWithUTF8String:[[[plist objectForKey:@"WebMainResource"] objectForKey:@"WebResourceData"] bytes]];
    NSString *new = [s stringByReplacingOccurrencesOfString:@"</body>" withString:@"hey there!</body>"];
    [[plist objectForKey:@"WebMainResource"] setObject:new forKey:@"WebResourceData"];
    printf("Archive: %s", [[plist description] UTF8String]);
    NSData *data = [NSPropertyListSerialization dataFromPropertyList:plist format:NSPropertyListBinaryFormat_v1_0 errorDescription:nil];
    [data writeToURL:[NSURL fileURLWithPath:@"/Users/tigger/Library/Mail/V2/MailData/Signatures/test.webarchive"] atomically:YES];
}
The problem is that the resulting webarchive is invalid. The original looks like this:
bplist00—_WebMainResource’
_WebResourceTextEncodingName_WebResourceFrameName^WebResourceURL_WebResourceData_WebResourceMIMETypeUUTF-8PUdata:O<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dan Shipper</div><div>[email protected]</div><div><br></div></body></span><br class="Apple-interchange-newline">Ytext/html(F]l~îöõ°™
¥
While the resulting webarchive looks like this:
bplist00—_WebMainResource’
^WebResourceURL_WebResourceFrameName_WebResourceMIMEType_WebResourceData_WebResourceTextEncodingNameUdata:PYtext/html_<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dan Shipper</div><div>[email protected]</div><div><br></div>hey there!</body></span><br class="Apple-interchange-newline">UUTF-8(7Ndvîöõ•∏
æ
Anyone have any ideas on why it's invalid or how to fix it? Thanks so much for your help!
I've also tried to use the textutil convert command to generate the webarchive, but it doesn't work because in my original HTML file I have an image like this:
<img src="http://www.domainpolish.com/images/crowd.png">
But when I use textutil it downloads the image and saves it like this:
<img src="file:///1.png">
I don't want it to download the image or change the URL. I've tried the -noload, -nostore and -baseurl options to no avail.
EDIT: Fixed it!! The problem was that when I replaced the HTML, I was inserting it as an NSString instead of an NSData:
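// WebResourceData is an NSData, so decode it by length instead of relying on a NUL terminator.
NSMutableDictionary *mainResource = [plist objectForKey:@"WebMainResource"];
NSData *htmlData = [mainResource objectForKey:@"WebResourceData"];
NSString *s = [[[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding] autorelease];
NSString *new = [s stringByReplacingOccurrencesOfString:@"</body>" withString:@"hi there!</body>"];
// Store the result back as NSData, not NSString, so the archive stays valid.
NSData *sourceData = [new dataUsingEncoding:NSUTF8StringEncoding];
[mainResource setObject:sourceData forKey:@"WebResourceData"];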
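For anyone who wants the whole round trip in one place, here is a rough sketch of the same fix as a standalone function. It is only an illustration: the function name ArchiveDataByInsertingBeforeBodyEnd is made up for this example, it assumes ARC (unlike the manual release in the question's code), and it assumes the main resource is UTF-8 HTML. It uses the NSError-based NSPropertyListSerialization methods (propertyListWithData:options:format:error: and dataWithPropertyList:format:options:error:) in place of the errorDescription: variants used in the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of any framework. Assumes ARC and a UTF-8 main resource.
// Returns new binary-plist archive data, or nil if the input does not look like a webarchive.
// outError is only filled in when the plist (de)serialization itself fails.
static NSData *ArchiveDataByInsertingBeforeBodyEnd(NSData *archiveData,
                                                   NSString *snippet,
                                                   NSError **outError) {
    NSPropertyListFormat format;
    NSMutableDictionary *plist =
        [NSPropertyListSerialization propertyListWithData:archiveData
                                                  options:NSPropertyListMutableContainersAndLeaves
                                                   format:&format
                                                    error:outError];
    if (![plist isKindOfClass:[NSDictionary class]]) {
        return nil;
    }

    NSMutableDictionary *mainResource = plist[@"WebMainResource"];
    NSData *htmlData = mainResource[@"WebResourceData"];
    if (![htmlData isKindOfClass:[NSData class]]) {
        return nil;
    }

    // Decode by length; NSData bytes are not guaranteed to be NUL-terminated.
    NSString *html = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];
    if (html == nil) {
        return nil;
    }

    NSString *replacement = [snippet stringByAppendingString:@"</body>"];
    NSString *modified = [html stringByReplacingOccurrencesOfString:@"</body>"
                                                         withString:replacement];

    // The key point: WebResourceData must go back in as NSData, not NSString.
    mainResource[@"WebResourceData"] = [modified dataUsingEncoding:NSUTF8StringEncoding];

    return [NSPropertyListSerialization dataWithPropertyList:plist
                                                      format:NSPropertyListBinaryFormat_v1_0
                                                     options:0
                                                       error:outError];
}
```

You would read the archive with [NSData dataWithContentsOfFile:], pass it through this function, and write the returned data out with writeToURL:atomically:, the same as in the code above.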
From Wikipedia:
With that in mind, you could just use NSKeyedUnarchiver to find the list of files and then use NSData to split the files and find the HTML you're looking for.