What about thinking outside the box of I-/B-/P-Frames?
Date: 2014-12-25 07:10 pm (UTC)

I enjoy reading the updates on Daala, not so much because I'm urgently waiting for the next video codec, but because the fresh technical ideas are interesting to read about.
While preceding updates contained plenty of out-of-the-box thinking, I was surprised at how conventional your description of the challenge of encoding I-frames came across.
Did you ever consider ditching the whole I-/B-/P-frame methodology? If not, please do: after all, every frame is displayed for the same period of time, so there is no obvious sense in spending a lot of bits on a few frames and only a few bits on most of the others. As for the claim that there have to be reference frames for seeking and such: that isn't strictly true. It would be just as possible to encode a Group of Pictures (GOP) as a whole, where decompression yields all frames of the GOP at once. If experience with Vector Quantization has shown us one thing, it is the efficiency of jointly encoding as many correlated pieces of information as possible, and the frames inside a GOP are certainly highly correlated. So why not treat time (within a GOP) as just one more dimension of vectors whose other dimensions carry spatial and color information, and encode them all together, spreading the available bits over all frames of the GOP equally instead of handling some "I-frame" specially?
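To make the idea concrete, here is a toy sketch of what I mean — this is my own illustration, not anything from Daala; the block sizes, the plain k-means codebook training, and all names are assumptions for the example. Time is simply treated as a third axis when cutting the GOP into blocks, so every block of every frame competes for the same codebook:

```python
import numpy as np

def gop_to_vectors(gop, bt=2, bs=4):
    """Split a GOP of shape (T, H, W) into bt x bs x bs space-time blocks,
    each flattened into one vector (time is just another dimension)."""
    T, H, W = gop.shape
    blocks = []
    for t in range(0, T - bt + 1, bt):
        for y in range(0, H - bs + 1, bs):
            for x in range(0, W - bs + 1, bs):
                blocks.append(gop[t:t + bt, y:y + bs, x:x + bs].ravel())
    return np.stack(blocks)

def train_codebook(vectors, k=16, iters=10, seed=0):
    """Plain k-means VQ: blocks from every frame of the GOP share one
    codebook -- no frame is treated specially."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # nearest codeword for each vector (squared Euclidean distance)
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        for c in range(k):
            members = vectors[idx == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, idx

# Toy 8-frame, 16x16 "GOP": the decoder would receive the codebook plus
# one index per space-time block and reconstruct all frames at once.
rng = np.random.default_rng(1)
gop = rng.random((8, 16, 16))
vectors = gop_to_vectors(gop)            # (64, 32): 4*4*4 blocks of 2*4*4 samples
codebook, idx = train_codebook(vectors)  # each block costs log2(16) = 4 index bits
```

Of course a real codec would do far more (prediction, transforms, entropy coding), but the point stands: nothing in this scheme singles out one frame as an "I-frame".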
I understand that encoding/decoding whole GOPs requires some memory and imposes a lower bound on stream latency proportional to the number of frames in a GOP. But that is not so different from H.264 and friends, where multiple frames already depend on each other, so that you have to decode "future" frames before you can display the "current" one...
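The latency floor is easy to quantify: the encoder must buffer an entire GOP before it can emit anything, so the added delay is at least one GOP duration. The numbers below are purely illustrative, not from the post:

```python
# Back-of-the-envelope lower bound on added latency when encoding whole GOPs:
# the encoder buffers all frames of a GOP before emitting any output.
def gop_latency_ms(gop_frames: int, fps: float) -> float:
    return gop_frames / fps * 1000.0

print(round(gop_latency_ms(16, 30)))  # a 16-frame GOP at 30 fps adds >= 533 ms
```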
I hope I didn't just miss a preceding discussion of the ideas above; if so, I apologize and would be grateful for a link to it.
Thanks for listening and keep up the good work!