I have a Blink archive (in MHT format) saved from the Chrome browser. I'm trying to convert the section
Content-Type: image/jpeg
Content-Transfer-Encoding: binary
Content-Location: https://some_url
ÿØÿà^@^PJFIF^@^A^A^A^@`^@`^@^@ÿÛ^@C^@^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^AÿÛ^
^KÿÄ^@µ^P^@^B^A^C^C^B^D^C^E^E^D^D^@^@^A}^A^B^C^@
to an image file as follows:
string s = "\nÿØÿà^@^PJ...";
byte[] result = System.Convert.FromBase64String(s);
File.WriteAllBytes("image.jpg", result);
But I get this error message: "The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters."
How can I fix it? There are probably \n characters in the string, but replacing \n with an empty string does not help.
Because you mentioned that you want to implement your solution in Java, I developed a simple solution that can be easily converted to Java.
The following code reads the `robot.mhtml` file and dumps the content of each part to separate files in the `out/` directory:

I tested it, and it works:
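A minimal sketch of this kind of extraction, written with C# top-level statements (.NET 6 or later). The helper name `ExtractParts`, the exact pattern, and the `part_N.bin` file names are assumptions made for illustration. The sketch also assumes that `Content-Location` is the last header of every part and that every boundary line starts with `------MultipartBoundary--`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (location, raw content) for every part found.
// Assumes Content-Location is the last header of a part and that boundary
// lines start with "------MultipartBoundary--".
static List<(string Location, string Content)> ExtractParts(string mhtText)
{
    var pattern = new Regex(
        @"^Content-Location: (?<location>.+?)\r\n\r\n" +              // header, then blank line
        @"(?<content>(?:.|\n)+?)(?=\r\n------MultipartBoundary--)",   // lazy, up to the next boundary
        RegexOptions.Multiline);

    var parts = new List<(string Location, string Content)>();
    foreach (Match match in pattern.Matches(mhtText))
    {
        parts.Add((match.Groups["location"].Value, match.Groups["content"].Value));
    }
    return parts;
}

const string inputPath = "robot.mhtml";
if (File.Exists(inputPath))
{
    // ISO-8859-1 maps every byte to exactly one char, so binary parts
    // survive the bytes -> string -> bytes round trip unchanged.
    string archive = Encoding.Latin1.GetString(File.ReadAllBytes(inputPath));
    Directory.CreateDirectory("out");

    int index = 0;
    foreach (var (location, content) in ExtractParts(archive))
    {
        string target = Path.Combine("out", $"part_{index++}.bin");
        File.WriteAllBytes(target, Encoding.Latin1.GetBytes(content));
        Console.WriteLine($"{location} -> {target}");
    }
}
```

Parts are written exactly as they are stored in the archive, so a part with `Content-Transfer-Encoding: binary` (such as the JPEG in the question) can be opened directly after renaming, while a quoted-printable HTML part would still need decoding. No Base64 decoding is involved at any point.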
Let me provide a complete explanation of the Regex for you:
- … `Content-Location` header) and its content.
- `.` includes everything except `\n`. Therefore, when we intend to include everything, including new lines, we should use `(.|\n)`.
- … `\r\n` between the headers and content.
- `(?<group_name>pattern)` creates a regex group with the name `group_name` and a specified `pattern`, allowing us to request the matches to return only these specific parts from the complete match.
- `+?` signifies that it should not extend the text greedily. If you use a simple `+`, it captures content until the last `\n------MultipartBoundary--` (resulting in only one file being extracted). However, we aim to capture content until the first occurrence.
- `.+(?=sequence)` implies searching until the `sequence` is located.

Some other notes:
- … ISO-8859-1. So, you should read and write files using this encoding.

In addition to coding and logging, you can test your customized regex in this dotnet-specific regex tester to observe the results and captured groups.
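The difference between `+` and `+?` in front of a lookahead can also be checked with a few lines of code. This is only a sketch: the input string is made up, and just the boundary prefix mirrors the archive layout.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input with two "parts" separated by boundary lines.
string text =
    "\n------MultipartBoundary--\nfirst\nline" +
    "\n------MultipartBoundary--\nsecond" +
    "\n------MultipartBoundary--";

const string boundary = @"\n------MultipartBoundary--";

// Lazy quantifier: each match stops in front of the nearest following boundary.
var lazy = new Regex(boundary + @"\n(?<content>(?:.|\n)+?)(?=" + boundary + ")");
// Greedy quantifier: the match runs on until the last boundary in the text.
var greedy = new Regex(boundary + @"\n(?<content>(?:.|\n)+)(?=" + boundary + ")");

int lazyCount = lazy.Matches(text).Count;
int greedyCount = greedy.Matches(text).Count;
string firstLazy = lazy.Match(text).Groups["content"].Value;
string firstGreedy = greedy.Match(text).Groups["content"].Value;

Console.WriteLine($"lazy matches:   {lazyCount}");
Console.WriteLine($"greedy matches: {greedyCount}");
Console.WriteLine($"first lazy content length:   {firstLazy.Length}");
Console.WriteLine($"first greedy content length: {firstGreedy.Length}");
```

With the lazy pattern each part comes back as its own match, while the greedy pattern returns a single match whose `content` group swallows the inner boundary together with the second part.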