I am using ActionMailer in a Ruby on Rails app to read emails (ruby 1.9.3, rails 3.2.13). I have an email that has a winmail.dat file attached to it (ms-tnef) and I am using the tnef gem to extract its contents.
The problem is that when I read the attachment from the mail, it gets corrupted and tnef can not extract files from it.
$ tnef winmail.dat
ERROR: invalid checksum, input file may be corrupted
Extracting the winmail.dat attachment using any mail app, the extracted winmail.dat works fine with tnef and I got it's content.
Comparing the two files I noticed that: - original file is bigger (76k against 72k) - they differ on line breaks: Orginal file has the windows format (0D 0A) and the file saved by rails has the linux format (0A)
I wrote this test:
it 'should extract winmail.dat from email and extract its contents' do
file_path = "#{::Rails.root}/spec/files/winmail-dat-001.eml"
message = Mail::Message.new(File.read(file_path))
anexo = message.attachments[0]
files = []
Tnef.unpack(anexo) do |file|
files << File.basename(file)
end
puts files.inspect
files.size.should == 2
end
That fails with these messages:
WARNING: invalid checksum, input file may be corrupted
Invalid RTF CRC, input file may be corrupted
WARNING: invalid checksum, input file may be corrupted
Assertion failed: ((attr->lvl_type == LVL_MESSAGE) || (attr->lvl_type == LVL_ATTACHMENT)), function attr_read, file attr.c, line 240.
Errno::EPIPE: Broken pipe
anexo = message.attachments[0]
=> #<Mail::Part:2159872060, Multipart: false, Headers: <Content-Type: application/ms-tnef; name="winmail.dat">, <Content-Transfer-Encoding: quoted-printable>, <Content-Disposition: attachment; filename="winmail.dat">>
I tried to save it to disk as bynary, and read it again, but I got the same result
it 'should extract winmail.dat from email and extract its contents' do
file_path = "#{::Rails.root}/spec/files/winmail-dat-001.eml"
message = Mail::Message.new(File.read(file_path))
anexo = message.attachments[0]
tmpfile_name = "#{::Rails.root}/tmp/#{anexo.filename}"
File.open(tmpfile_name, 'w+b', 0644) { |f| f.write anexo.body.decoded }
anexo = File.open(tmpfile_name)
files = []
Tnef.unpack(anexo) do |file|
files << File.basename(file)
end
puts files.inspect
files.size.should == 2
end
How should I read the attachment?
The method anexo.body.decoded calls the decode method of the best suited encoding (
Mail::Encodings
) for the attachment, in your case quoted_printable.Some of these encodings (7bit, 8bit and quoted_printable), perform a conversion, changing different types of line breaks to the platform specific line break. the *quoted_printable" call .to_lf that corrupt the winmail.dat file
mail/core_extensions/string.rb:
To solve it you have perform the same encoding without the last .to_lf. To do that you can create a new encoding that does not corrupt your file and use it to encode you attachment.
create the file: lib/encodings/tnef_encoding.rb
To use your custom encoding you must, first, register it:
And then, set it as your preferred encoding for the attachment:
Your test would, then, become:
Hope it helps!