Unable to recombine chunked download of MP3 data


I am using Perl with WWW::Mechanize to download an MP3 file which is served in chunks of 400KB (around 20 seconds).

When I save the data with binmode on the file handle, appending each chunk as it arrives, only the first chunk is played correctly; the rest is not.

When I don't use binmode, none of the file plays correctly: it plays, but sounds interesting!

This is my program:

use WWW::Mechanize;

$agent = WWW::Mechanize->new( cookie_jar => {} );

@links = ("http://thehost.com/chunk1","http://thehost.com/chunk2","http://thehost.com/chunk3");

foreach (@links){
    $agent->get($_);

    my $filename = 'test.mp3';
    open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
    binmode $fh;
    print $fh $agent->content;
    close $fh;
}

What am I doing wrong?

Update

These are the HTTP headers that are being returned.

Cache-Control: public
Connection: close
Date: Tue, 28 Oct 2014 18:38:37 GMT
Pragma:
Server: Apache
Content-Length: 409600
Content-Type: application/octet-stream
Expires: Sat, 24 Oct 2015 12:08:00 GMT
Access-Control-Allow-Origin: *
Client-Date: Tue, 28 Oct 2014 18:38:28 GMT
Client-Peer: **.**.***.***:80
Client-Response-Num: 1

There are 3 best solutions below

3
On

I can't explain the behaviour that you're getting, but WWW::Mechanize is intended for working with HTML text pages, and isn't that good with binary data. Using the LWP::UserAgent module directly isn't at all hard.

I suggest you use something like this instead.

use strict;
use warnings;
use 5.010;
use autodie;

use LWP;

my @links = qw(
  http://thehost.com/chunk1
  http://thehost.com/chunk2
  http://thehost.com/chunk3
);

my $agent = LWP::UserAgent->new;

my $filename = 'test.mp3';
open my $fh, '>:raw', $filename;

for my $link (@links) {
    my $resp = $agent->get($link);
    die $resp->status_line unless $resp->is_success;
    print $fh $resp->decoded_content;
}

close $fh;

If you still have problems then please add a line like this

print $resp->headers_as_string, "\n\n";

right after the get call, and report back with the results you get.

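You may also get different results by using the content method instead of decoded_content: content returns the bytes exactly as received, while decoded_content first undoes any Content-Encoding the server declared.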
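To see the difference between the two methods without touching the network, here is a minimal sketch that builds a response locally. The payload and the gzip Content-Encoding are assumptions made up for the illustration; with the headers shown in the question (no Content-Encoding), both methods would be expected to return the same bytes.

```perl
use strict;
use warnings;

use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical payload standing in for one chunk of MP3 data.
my $original = join '', map { chr( $_ % 256 ) } 0 .. 1023;

# Compress it the way a server sending "Content-Encoding: gzip" would.
gzip( \$original => \my $compressed ) or die "gzip failed: $GzipError";

# Build the response by hand so that no network access is needed.
my $resp = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'application/octet-stream',
        'Content-Encoding' => 'gzip',
    ],
    $compressed,
);

my $raw     = $resp->content;            # bytes exactly as received
my $decoded = $resp->decoded_content;    # Content-Encoding undone

printf "raw: %d bytes, decoded: %d bytes\n", length $raw, length $decoded;
```

If the two lengths differ for your real responses, the server is applying an encoding, and which method you write to the file matters.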

Of course it may help us a lot if you could give out the real URLs, but I realise that you may not be able to do that.

7
On

I suspect the content is served with incorrect headers, and as you are using the API that automatically decodes, this corrupts the octet stream.

Use the mirror method instead and concatenate the files after downloading.
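A rough sketch of that approach, assuming LWP::UserAgent is used as the agent. The part file naming scheme and the two helper names (mirror_parts, concat_files) are invented for this example and are not part of any library.

```perl
use strict;
use warnings;

use LWP::UserAgent;

# Download each URL to its own file (part0.bin, part1.bin, ...) using
# mirror(), which writes the response body straight to disk.
# The "part%d.bin" naming scheme is an assumption for this sketch.
sub mirror_parts {
    my ( $agent, @links ) = @_;
    my @parts;
    for my $i ( 0 .. $#links ) {
        my $part = sprintf 'part%d.bin', $i;
        my $resp = $agent->mirror( $links[$i], $part );

        # mirror() answers 304 when the local copy is already up to date.
        die "$links[$i]: " . $resp->status_line
            unless $resp->is_success || $resp->code == 304;
        push @parts, $part;
    }
    return @parts;
}

# Append the raw bytes of every part file to $out, in order.
sub concat_files {
    my ( $out, @parts ) = @_;
    open my $out_fh, '>:raw', $out or die "Cannot write '$out': $!";
    for my $part (@parts) {
        open my $in_fh, '<:raw', $part or die "Cannot read '$part': $!";
        local $/ = \65536;    # read fixed 64 KB blocks, not lines
        while ( my $block = <$in_fh> ) {
            print {$out_fh} $block;
        }
        close $in_fh;
    }
    close $out_fh or die "Cannot close '$out': $!";
    return;
}

# Usage (needs network access, so it is left commented out):
# my $agent = LWP::UserAgent->new;
# my @parts = mirror_parts( $agent, 'http://thehost.com/chunk1',
#                                   'http://thehost.com/chunk2' );
# concat_files( 'test.mp3', @parts );
```

Keeping the parts on disk also makes it easy to inspect each one separately before joining them.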

6
On

I doubt that a single MP3 file has simply been split after some number of bytes, with the pieces then offered as separate downloads. Instead, I assume that each URL serves a complete, valid MP3 file containing 20 seconds of the original. Because an MP3 file is not just data but header plus data, you cannot simply merge these files by concatenating them. Instead you must use a program like ffmpeg to create a single MP3 file from multiple MP3 files; see https://superuser.com/questions/314239/how-to-join-merge-many-mp3-files
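As an illustration, ffmpeg could be driven from Perl along these lines. This is only a sketch: it assumes an ffmpeg binary is on the PATH, uses ffmpeg's "concat:" protocol with stream copy, and the helper names and file names are made up for the example.

```perl
use strict;
use warnings;

# Build the ffmpeg argument list for joining MP3 files with the
# "concat:" protocol, copying the audio stream without re-encoding.
sub ffmpeg_concat_args {
    my ( $out, @parts ) = @_;
    return ( '-i', 'concat:' . join( '|', @parts ), '-acodec', 'copy', $out );
}

# Run ffmpeg on the given parts; assumes ffmpeg is installed and on the PATH.
# The list form of system() avoids shell quoting problems with the "|".
sub join_mp3 {
    my ( $out, @parts ) = @_;
    system( 'ffmpeg', ffmpeg_concat_args( $out, @parts ) ) == 0
        or die "ffmpeg failed with status $?";
    return;
}

# Usage (hypothetical file names, left commented out):
# join_mp3( 'test.mp3', 'chunk1.mp3', 'chunk2.mp3', 'chunk3.mp3' );
```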