How to seek in an Ogg Vorbis file without loading the whole file into memory?


I'm trying to find a way to move to a required position in the track without loading the whole file into memory, and without using vorbisfile, because the file is stored on a remote server. I read the paragraph about seeking in the documentation but couldn't understand it.


There are 2 best solutions below

0
On

Seeking in Ogg files is hard.

List of things to understand

  1. ogg_page_packets(&og) returns the number of packets that end on the page. When it is > 0, at least one packet is completed on this page
  2. When ogg_page_granulepos(&og) >= 0, the page carries the timestamp (granule position) of the last packet that ends on it. It is -1 when no packet ends on the page
  3. When ogg_sync_pageout(&oy,&og) == 1, a complete page was synced and returned
  4. When ogg_sync_pageout(&oy,&og) == 0, the page is incomplete and more data is needed
  5. When ogg_sync_pageout(&oy,&og) == -1, libogg has not found the beginning of a page yet and skipped some bytes
  6. vorbis_granule_time(&vd, ogg_page_granulepos(&og)) returns the time in seconds
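To make item 6 concrete: for Vorbis the granule position counts PCM samples, so the conversion is a division by the sample rate. The sketch below is not part of libvorbis; the helper names `granule_to_seconds` and `seconds_to_granule` are made up for illustration, and the sample rate is assumed to come from the decoded Vorbis headers.

```cpp
#include <cstdint>

// Hypothetical helper: convert a Vorbis granule position (PCM sample count)
// to seconds. Returns -1.0 for a granule position of -1, which marks a page
// on which no packet ends.
double granule_to_seconds(std::int64_t granulepos, long sample_rate) {
    if (granulepos < 0 || sample_rate <= 0) return -1.0;
    return static_cast<double>(granulepos) / static_cast<double>(sample_rate);
}

// Hypothetical inverse: the granule position to search for when the caller
// wants to seek to `seconds`. Returns -1 on invalid input.
std::int64_t seconds_to_granule(double seconds, long sample_rate) {
    if (seconds < 0.0 || sample_rate <= 0) return -1;
    return static_cast<std::int64_t>(seconds * static_cast<double>(sample_rate));
}
```

Working in granule positions rather than seconds keeps the binary search in integers and avoids floating-point comparisons until the very end.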

You will need this function

int buffer_data(){
//oy is an ogg_sync_state https://xiph.org/ogg/doc/libogg/ogg_sync_state.html
//in is a FILE* opened for binary reading
  char *buffer=ogg_sync_buffer(&oy,4096);
  int bytes=(int)fread(buffer,1,4096,in);
  ogg_sync_wrote(&oy,bytes);
  return(bytes);
}

In my code below, I add another layer on top of the ogg page and ogg packet: the file buffer (a 4k block of the file). My code bisects over these blocks and only looks at the first synced page in each block on which a packet ends.

When ogg_sync_pageout cannot find a page in the current block, my code advances a second block cursor and loads the next 4k file buffer until it either finds a page or runs past the end of the data.

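#include <cfloat>
#include <cmath>
#include <unordered_map>
#include <ogg/ogg.h>
#include <vorbis/codec.h>

//left, right, oy, vo, vd, file, buffer_size and p_time (the target time) are declared elsewhere
struct _page_info {
    size_t block_number;
    double time;
    ogg_int64_t granulepos;
};

//designated initializers must follow the declaration order of the members
struct _page_info left_page = { .block_number = 0, .time = 0, .granulepos = 0 };
struct _page_info mid_page = { .block_number = 0, .time = 0, .granulepos = 0 };
struct _page_info right_page = { .block_number = 0x7FFFFFFFFFFFFFFF, .time = DBL_MAX, .granulepos = 0x7FFFFFFFFFFFFFFF };
std::unordered_map<int, double> block_time;
std::unordered_map<ogg_int64_t, _page_info> page_info_table;
ogg_page og;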

while (left <= right) {
    //Seek to block
    size_t mid_block = left + (right - left) / 2;
    int block = mid_block;

    if (block_time.count(block)) {
        //This block has already been visited, so the search has converged
        break;
    }

    //clear the sync state
    ogg_sync_reset(&oy);
    file.seek(block * buffer_size);
    buffer_data();

    bool next_midpoint = true;
    while (true) {
        //keep syncing until a page is found. Buffer is only 4k while ogg pages can be up to 65k in size
        int ogg_page_sync_state = ogg_sync_pageout(&oy, &og);
        if (ogg_page_sync_state == -1) {
            //Give up when the file advances past the right boundary
            if (buffer_data() == 0) {
                right = mid_block;
                break;
            } else {
                //advance the block cursor: we buffered the next block
                block++;
            }
        } else {
            if (ogg_page_sync_state == 0) {
                //Check if I reached the end of the file
                if (buffer_data() == 0) {
                    right = mid_block;
                    break;
                } else {
                    block++;
                }
            } else {
                //Only pages on which a packet ends carry a granulepos. Also check that the page belongs to our stream
                if (ogg_page_packets(&og) > 0 && ogg_page_serialno(&og) == vo.serialno) {
                    next_midpoint = false;
                    break;
                }
            }
        }
    }
    if (next_midpoint)
        continue;

    ogg_int64_t granulepos = ogg_page_granulepos(&og);
    ogg_int64_t page_number = ogg_page_pageno(&og);
    struct _page_info pg_info = { .block_number = mid_block, .time = vorbis_granule_time(&vd, granulepos), .granulepos = granulepos };
    page_info_table[page_number] = pg_info;
    block_time[mid_block] = pg_info.time;
    mid_page = pg_info;

    //Now the binary search comparisons can be made
    if (std::fabs(p_time - pg_info.time) < .001) {
        //The page time matches the target time
        right_page = pg_info;
        break;
    }
    if (pg_info.time > p_time) {
        if (pg_info.granulepos < right_page.granulepos)
            right_page = pg_info;
        right = mid_block;
    } else {
        if (pg_info.granulepos > left_page.granulepos)
            left_page = pg_info;
        left = mid_block;
    }
}

When you are done, you walk back through the ogg pages until you find the desired ogg_packet.

Here is a trick to calculate the timestamp of each packet when the granule position increases by exactly one per packet

while (ogg_sync_pageout(&oy, &og) > 0) {
    ogg_stream_pagein(&vo, &og);
    //granulepos of the last packet that ends on this page
    ogg_int64_t last_granule = ogg_page_granulepos(&og);
    //number of packets that end on this page
    ogg_int64_t total_granule = ogg_page_packets(&og);
    while (ogg_stream_packetout(&vo, &op) > 0) {
        double time = vorbis_granule_time(&vd, last_granule - total_granule--);
    }
}

https://xiph.org/ogg/doc/libogg/reference.html

https://github.com/xiph/theora/blob/master/examples/player_example.c

https://xiph.org/vorbis/doc/libvorbis/reference.html

https://xkcd.com/979/

https://xiph.org/oggz/doc/group__basics.html

1
On

If the remote server supports HTTP GET requests with Range headers, you can "fake" the file access by requesting the different parts of the file, just like you would seek and read in a local file...
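As a rough sketch of what "substituting a ranged GET for fseek / fread" looks like, the helpers below only build the Range header text; sending the request is left to whatever HTTP client you use. The function names `range_header` and `tail_range_header` are hypothetical, and `file_size` is assumed to be the total length reported by the server.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: Range header for `length` bytes starting at `offset`.
// HTTP byte ranges are inclusive on both ends, so the last byte requested is
// offset + length - 1, clamped to the end of the file.
// Returns an empty string when the request would be empty or out of bounds.
std::string range_header(std::uint64_t offset, std::uint64_t length,
                         std::uint64_t file_size) {
    if (file_size == 0 || length == 0 || offset >= file_size) return "";
    std::uint64_t last = offset + length - 1;
    if (last >= file_size) last = file_size - 1;
    return "Range: bytes=" + std::to_string(offset) + "-" + std::to_string(last);
}

// Hypothetical helper: Range header for the final `length` bytes of the file.
std::string tail_range_header(std::uint64_t length, std::uint64_t file_size) {
    std::uint64_t offset = file_size > length ? file_size - length : 0;
    return range_header(offset, length, file_size);
}
```

For a 100,000-byte file, `range_header(0, 4096, 100000)` covers the first 4KB and `tail_range_header(4096, 100000)` covers the last 4KB.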

ASSUMING: The file is Ogg-encapsulated and ONLY has the Vorbis stream in it...

  1. Do an HTTP HEAD request to get the total length of the file
  2. GET the first 4KB of the file and "sync" the Vorbis headers. You might need to get more data to complete this.
  3. GET the last 4KB of the file and "sync" the last Ogg page header to get the total sample count
  4. Do the spec-described bi-section search, substituting HTTP GET w/ Range in place of fseek / fread

If you do it right, seeking should transfer less than 100KB in most cases.
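Step 3 can be done without libogg if you only need the total sample count. The sketch below scans a downloaded tail buffer backwards for the "OggS" capture pattern and reads the granule position from the fixed 27-byte page header (granule position at byte 6, serial number at byte 14, both little-endian). The function name `last_page_granule` is made up for illustration, and the sketch does not verify the page CRC, so a false match on stray "OggS" bytes is possible; ogg_sync_pageout in libogg does that check for you.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read little-endian integers from raw bytes.
static std::uint32_t read_le32(const unsigned char* p) {
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}
static std::int64_t read_le64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return static_cast<std::int64_t>(v);
}

// Hypothetical helper: find the last page header in `buf` that belongs to the
// logical stream `serialno` and carries a valid granule position.
// Returns true and stores the granule position on success.
// Assumption: no CRC check is done here, unlike libogg.
bool last_page_granule(const std::vector<unsigned char>& buf,
                       std::uint32_t serialno, std::int64_t* granule_out) {
    const std::size_t kHeaderSize = 27;  // fixed part of an Ogg page header
    if (buf.size() < kHeaderSize) return false;
    for (std::size_t i = buf.size() - kHeaderSize + 1; i-- > 0;) {
        const unsigned char* p = &buf[i];
        if (p[0] != 'O' || p[1] != 'g' || p[2] != 'g' || p[3] != 'S') continue;
        if (p[4] != 0) continue;                      // stream structure version
        if (read_le32(p + 14) != serialno) continue;  // other logical stream
        std::int64_t granule = read_le64(p + 6);
        if (granule < 0) continue;                    // no packet ends on this page
        *granule_out = granule;
        return true;
    }
    return false;
}
```

If the last 4KB contains no page header at all (pages can be up to about 65KB), request a larger tail and try again.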

UPDATE:

The bi-section search is a bit non-intuitive... The idea is to jump around in the file looking for the correct page, but every jump is "informed" by the previous jumps and the current page... An example is probably best:

To seek to sample 300,000 in a file having 1,000,000 samples (I'm assuming we're on step 4 above):

  1. Seek physical file to {fileStream.Length * .3}
  2. Read forward until you find an Ogg Page
  3. Check that the page is part of the Vorbis stream in question
  4. If not, go to next Ogg Page & go to step 3
  5. Check the granule position
  6. If not the right page, seek physical file to {current position + ((300000 - granule position) / 1000000) * fileStream.Length} & go to step 2
  7. You've found the right page, but may need to move back a page to get the "pre-roll"... Vorbis requires 1 packet to be decoded before the desired packet.

There are probably better algorithms for this, but that's the basic idea.

Remember, granule position is the sample count at the END of the page, so when you find the correct page its granule position will be slightly larger than your target.
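The arithmetic in steps 1 and 6 can be sketched as below. The function names are hypothetical, and the check in `page_contains_target` assumes you also know the granule position at the end of the previous page. This is only the offset calculation, not a complete seek: a practical implementation would probably also keep lower and upper byte bounds so the jumps cannot oscillate.

```cpp
#include <cmath>
#include <cstdint>

// Round a byte position and clamp it into [0, file_length - 1].
static std::uint64_t clamp_offset(double pos, std::uint64_t file_length) {
    if (file_length == 0) return 0;
    double r = std::round(pos);
    if (r < 0.0) return 0;
    if (r >= static_cast<double>(file_length)) return file_length - 1;
    return static_cast<std::uint64_t>(r);
}

// Step 1: first guess, proportional to target / total_samples.
std::uint64_t initial_seek_offset(std::int64_t target, std::int64_t total_samples,
                                  std::uint64_t file_length) {
    if (total_samples <= 0) return 0;
    double guess = static_cast<double>(target) * static_cast<double>(file_length) /
                   static_cast<double>(total_samples);
    return clamp_offset(guess, file_length);
}

// Step 6: move from the current position by the remaining sample distance,
// scaled to bytes. The jump is negative when the page is past the target.
std::uint64_t next_seek_offset(std::uint64_t current_pos, std::int64_t page_granule,
                               std::int64_t target, std::int64_t total_samples,
                               std::uint64_t file_length) {
    if (total_samples <= 0) return current_pos;
    double delta = static_cast<double>(target - page_granule) *
                   static_cast<double>(file_length) /
                   static_cast<double>(total_samples);
    return clamp_offset(static_cast<double>(current_pos) + delta, file_length);
}

// A page covers the samples from the previous page's end granule (inclusive)
// up to its own granule position (exclusive).
bool page_contains_target(std::int64_t prev_page_granule, std::int64_t page_granule,
                          std::int64_t target) {
    return prev_page_granule <= target && target < page_granule;
}
```

With the numbers from the example, a 5,000,000-byte file gives a first guess of byte 1,500,000; if the page found near byte 1,600,000 ends at sample 350,000, the next jump lands at byte 1,350,000.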