Word timestamps seem to allow a duration of 0 seconds for some words in the results. Is this a bug?


When using the Google Cloud Speech API, the new word-level timestamp/timecode feature seems to allow a duration of 0 seconds for some words in the results. Here is an example:

...
{ startTime: '48.800s', endTime: '48.800s', word: 'a' },
{ startTime: '48.800s', endTime: '49.200s', word: 'kindly' },
...

Is this a bug?

To test, I used a clip from the audio archive "Arthur the Rat", "USA - General mid-western speaker (Michigan)".
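To check how often this happens in a response, here is a minimal sketch. It assumes the words arrive as dictionaries with startTime/endTime strings in the form shown above; the helper names parse_offset and zero_duration_words are made up for illustration and are not part of the API.

```python
from decimal import Decimal


def parse_offset(value):
    """Convert an offset string such as '48.800s' to a Decimal number of seconds."""
    # Decimal keeps the value exact, so equality checks are safe.
    return Decimal(value.rstrip("s"))


def zero_duration_words(words):
    """Return the words whose start and end offsets are identical."""
    return [
        w["word"]
        for w in words
        if parse_offset(w["endTime"]) == parse_offset(w["startTime"])
    ]


# Sample data copied from the excerpt in the question.
words = [
    {"startTime": "48.800s", "endTime": "48.800s", "word": "a"},
    {"startTime": "48.800s", "endTime": "49.200s", "word": "kindly"},
]

print(zero_duration_words(words))  # ['a']
```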


There are 2 best solutions below


You can get better than one-second precision from the returned timestamp.

Take the start time from the structure containing the word and convert it to seconds like this:

start_time.seconds + start_time.nanos * 1e-9


David Anderson's answer is correct. I just thought I'd elaborate on it, as I initially thought the response only had one-second precision rather than the 100 ms precision the docs describe.

As of July 2018, sending a request to the Google Cloud Speech API with word time offsets enabled returns a response object in which each word result in response.results has the following structure:

start_time {
  seconds: 24
  nanos: 100000000
}
end_time {
  seconds: 24
  nanos: 700000000
}
word: "of"

The nanos field gives you the start and end times to 100 ms precision. You can obtain them like so:

print(start_time.seconds + start_time.nanos * 1e-9)
print(end_time.seconds + end_time.nanos * 1e-9)

==== Output ====

24.1
24.7
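If you also want each word's duration (for example, to spot the zero-length words from the question), a rough sketch might look like the following. The Offset namedtuple is only a stand-in I'm assuming for the start_time/end_time objects in the response, and to_seconds and duration are hypothetical helper names. The duration is computed in whole nanoseconds first, to avoid floating-point noise when subtracting two nearly equal values.

```python
from collections import namedtuple

# Stand-in for the start_time / end_time objects (assumption for illustration);
# anything exposing .seconds and .nanos attributes works the same way.
Offset = namedtuple("Offset", ["seconds", "nanos"])


def to_seconds(offset):
    """Convert a seconds/nanos pair to fractional seconds."""
    return offset.seconds + offset.nanos / 1e9


def duration(start, end):
    """Return end - start in seconds, subtracting in integer nanoseconds."""
    total_nanos = (end.seconds - start.seconds) * 1_000_000_000 + (end.nanos - start.nanos)
    return total_nanos / 1e9


# Values taken from the example structure above.
start_time = Offset(seconds=24, nanos=100000000)
end_time = Offset(seconds=24, nanos=700000000)

print(duration(start_time, end_time))  # 0.6
```

A word whose start and end offsets are identical gives a duration of exactly 0.0 with this approach.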