A Fabulous Daala Holiday Update
Dec. 23rd, 2014 04:09 pm
Before we get into the update itself, yes, the level of magenta in that banner image got away from me just a bit. Then it was just begging for inappropriate abuse of a font...
Ahem.
Hey everyone! I just posted a Daala update that mostly has to do with still image performance improvements (yes, still image in a video codec. Go read it to find out why!). The update includes metric plots showing our improvement on objective metrics over the past year and relative to other codecs. Since objective metrics are only of limited use, there are also side-by-side interactive image comparisons against jpeg, vp8, vp9, x264 and x265.
The update text (and demo code) was originally written for a July update, as the still-image work happened mostly in the beginning of the year. That update got held up and was never officially released, though it had been discovered and discussed on forums like Doom9. I regenerated the metrics and image runs using the latest versions of all the codecs involved (only Daala and x265 improved) for this official, better-late-than-never progress report!
TBA?
Date: 2014-12-23 11:04 pm (UTC)
Re: TBA?
Date: 2014-12-24 01:15 am (UTC)
Java applet?
Date: 2014-12-24 08:08 am (UTC)
I do not think that the video tag will soon be able to play Daala.
Re: Java applet?
Date: 2014-12-25 08:54 pm (UTC)
JavaScript (via Emscripten or other asm.js systems) is the current plan, and we have early playback running in the browser via JavaScript.
Re: Java applet?
Date: 2015-01-03 12:37 pm (UTC)
On the other hand, I saw in the roadmap (or to-do?) that you plan to get the video tag handling Daala.
That would be the best thing, and eventually it will happen, I believe.
Every time I see the image comparisons I'm impressed by what Daala achieves, and even my personal compressions (of a 10 s video) show improvements release after release.
What about thinking outside the box of I-/B-/P-Frames?
Date: 2014-12-25 07:10 pm (UTC)
I enjoy reading the updates on Daala, not so much because I'm urgently waiting for the next video codec, but because of the fresh ideas on the technology, which are interesting to read about.
While preceding updates contained a lot of "out of the box" thinking, I was surprised at how "conventional" your description of the challenge of encoding I-frames was.
Did you ever consider ditching the whole "I-/B-/P-frame" methodology? If not, please do: after all, all frames are displayed for the same period of time. There's no obvious sense in spending a lot of bits on a few of them and only a few bits on most of the others. When you say that there have to be reference frames for seeking and such: that is not true. It would be just as possible to encode a Group Of Pictures as a whole, where decompression would yield all frames of the GOP at once. If experience with Vector Quantization has shown us one thing, it is the efficiency of encoding "as many correlated pieces of information as possible", and the frames inside a GOP certainly are highly correlated. So why not consider the time (inside a GOP) just one dimension of vectors that also contain spatial and color information in other dimensions, and encode them all together, spending the available bits on all frames of the GOP equally, not handling some "I-frame" specially?
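To make the "time is just another dimension" idea concrete, here is a toy sketch of a separable 3-D DCT over an 8x8x8 block of a GOP (my own illustration; the block size and transform choice are arbitrary, and a real codec would still need quantization and entropy coding on top):

```c
#include <math.h>
#include <stdio.h>

#define B  8                        /* block size along x, y, and time */
#define PI 3.14159265358979323846

/* in-place 1-D DCT-II of a length-B line with the given stride */
static void dct1d(double *s, int stride) {
    double out[B];
    for (int k = 0; k < B; k++) {
        double acc = 0.0;
        for (int n = 0; n < B; n++)
            acc += s[n * stride] * cos(PI * (n + 0.5) * k / B);
        out[k] = acc * (k == 0 ? sqrt(1.0 / B) : sqrt(2.0 / B));
    }
    for (int k = 0; k < B; k++)
        s[k * stride] = out[k];
}

int main(void) {
    static double gop[B][B][B];     /* [frame][row][col] */

    /* fill with a toy moving gradient so the block is highly correlated */
    for (int t = 0; t < B; t++)
        for (int y = 0; y < B; y++)
            for (int x = 0; x < B; x++)
                gop[t][y][x] = (x + y + t) % B;

    /* separable 3-D DCT: transform along x, then y, then time */
    for (int t = 0; t < B; t++)
        for (int y = 0; y < B; y++)
            dct1d(&gop[t][y][0], 1);
    for (int t = 0; t < B; t++)
        for (int x = 0; x < B; x++)
            dct1d(&gop[t][0][x], B);
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++)
            dct1d(&gop[0][y][x], B * B);

    /* if the frames really are correlated, most of the energy
     * concentrates in a few low-frequency coefficients */
    printf("DC coefficient: %.2f\n", gop[0][0][0]);
    return 0;
}
```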
I understand that encoding/decoding whole GOPs would require some memory, and would impose a lower limit on streaming latency that depends on the number of frames inside a GOP, but hey, that's not so different from h.264 etc., where multiple frames already depend on each other such that you have to decode "future" frames before you can display the "current" one...
I hope I didn't just miss a preceding discussion of the ideas above; if so, I apologize and would be grateful for a link to it.
Thanks for listening and keep up the good work!
Re: What about thinking outside the box of I-/B-/P-Frames?
Date: 2014-12-25 09:01 pm (UTC)
Motion compensation via 3D transform is likely kind of doomed. Frames are, relatively speaking, 'far apart' temporally, and the motion changes between them aren't very smooth. Not much useful redundancy just falls out as a result of handling the block of frames all at once, and then you have to buffer multiple hundreds of megabytes of frame data, gigabytes for HD. Someday that much memory will be free, but by then we'll likely be up to super-mega-128k+-UHD video and we'll need orders of magnitude more.
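To put rough numbers on the buffering cost (a back-of-the-envelope sketch; the 250-frame GOP length and 32-bit intermediate precision are assumptions for illustration, not Daala's actual figures):

```c
#include <stdio.h>

int main(void) {
    /* 1080p in 4:2:0: 1.5 samples per pixel (luma + subsampled chroma) */
    const double samples_per_frame = 1920.0 * 1080.0 * 1.5;
    const int bytes_per_sample = 4;  /* assumed 32-bit transform precision */
    const int gop_length = 250;      /* assumed GOP length (10 s at 25 fps) */

    double frame_bytes = samples_per_frame * bytes_per_sample;
    printf("per frame: %.1f MB\n", frame_bytes / 1e6);              /* ~12.4 MB */
    printf("per GOP:   %.2f GB\n", frame_bytes * gop_length / 1e9); /* ~3.1 GB */
    return 0;
}
```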
Re: What about thinking outside the box of I-/B-/P-Frames?
Date: 2014-12-26 12:01 am (UTC)
But even if it's not possible to avoid keeping frames somewhat separate with regard to their temporal placement, wouldn't it still be possible, and even beneficial, to overcome the paradigm of "one (I-)frame is the golden reference, taking far more bits to encode than every other frame in a GOP"?
I would assume that even if you choose which frame to encode as an I-frame cleverly, chances are that this I-frame will contain parts (e.g. out-of-focus or motion-blurred areas) that could have been derived better from another frame in the GOP (where the same objects are more in focus or less motion-blurred), spending bits more efficiently there.
I could envision all frames of a GOP first being scanned for regions that (a) are rich in detail and (b) have less detail-rich counterparts in other frames of the GOP; then any frame of the GOP could be declared "the reference frame for a certain region", to which the other frames only encode differences.
BTW: Has "blurring" of a region, in general, ever been considered a useful transformation for predicting part of a frame from another frame that holds a "sharper" version of the same region? I would expect that one could often find both "blurred" and "sharp" versions of the same objects within a sequence of frames, due to motion of an object starting or stopping.
While I am in brainstorming mode, one more completely different, wild idea: you've certainly heard of seam carving and the C/C++ library "Liquid Rescale" that implements it. I wonder whether anybody has ever considered using seam carving for compression purposes. I am not quite sure this would work, but theoretically one could re-target an image to a smaller size during compression (finally compressing the resulting smaller image) and do the reverse during decompression. That, of course, would lose information, and maybe it's of no practical use. But unless falsified, one could speculate that an image retargeted to really small dimensions might be usable as an interesting "prediction" starting point for reconstructing the full-size image, because seam carving tends to get rid of image areas that aren't so important to human viewers anyway.
Hope you don't get bored reading my ideas, but I had to spill them somewhere :-)
Re: What about thinking outside the box of I-/B-/P-Frames?
Date: 2015-01-04 12:12 am (UTC)
Yes, I know, there would be signalling costs, but maybe there could be some way to do this efficiently...
rather vague idea towards removing the need for I-frames
Date: 2014-12-26 08:20 am (UTC)
I'm a different Anonymous, but I share the same feeling as him/her about P/I-frames. When I observe video artifacts, they sometimes seem to vanish abruptly. I guess an I-frame resets the error accumulated so far. I wonder if it's worth exploring the following idea to remove the need for I-frames:
If I-frames are analogous to single samples in one-dimensional signal processing, then P-frames are analogous to backward differences. To reconstruct the original samples from backward differences, an integrator is needed. If there is error in the differences, the error accumulates in the integrator without bound. Sometimes a "leaky integrator" is used, which decays to zero when the input is zero. If there is noise in the differences, the error accumulation in the leaky integrator stays bounded. An exact inverse of the leaky integrator exists; I don't know if it has a name too, so let's call it a "leaky difference".
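In code, the pair might look like this (a toy 1-D sketch; the decay constant 0.9 is an arbitrary choice of mine):

```c
#include <stdio.h>

#define N    8
#define LEAK 0.9 /* decay constant, must be < 1; 0.9 chosen arbitrarily */

int main(void) {
    double x[N] = {4, 5, 7, 6, 3, 2, 2, 1}; /* original samples */
    double d[N];

    /* "leaky difference": d[n] = x[n] - LEAK * x[n-1], with x[-1] = 0 */
    double prev = 0.0;
    for (int n = 0; n < N; n++) {
        d[n] = x[n] - LEAK * prev;
        prev = x[n];
    }

    /* leaky integrator: y[n] = LEAK * y[n-1] + d[n] reconstructs x exactly;
     * an error e injected into some d[n] decays as e * LEAK^k afterwards,
     * instead of persisting forever as it would with a plain integrator */
    double y = 0.0;
    for (int n = 0; n < N; n++) {
        y = LEAK * y + d[n];
        printf("x[%d] = %g, reconstructed = %g\n", n, x[n], y);
    }
    return 0;
}
```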
If I understand correctly, P-frames store block motion and residuals, and B/I-frames are there to reset a "build-up" of artifacts. Could B/I-frames be dropped if the block residuals in P-frames were made more "leaky"?
Thank you for writing the Demo Pages; I enjoy their audiovisual/interactive presentation.
Will Daala support obsolete nonsense?
Date: 2015-01-08 06:11 pm (UTC)
I was wondering whether Daala will still support such things as interlacing, chroma subsampling, and limited range?
All of these only add complexity while decreasing encoding efficiency and quality.
I see no reason why they should be kept around. On top of this, the format could decode to RGB, avoiding crappy conversions by poor software, right?
Prefilter ringing effects
Date: 2015-01-14 02:21 am (UTC)
To avoid blocking artifacts, Daala uses a lapped transform (with its pre-filter). As I understand it, ringing artifacts are a result of quantization.
If the quantization is known before the image transformation, isn't it possible to create a 'pre-filter' that adds the inverse of the artifacts generated by quantization? Then, after quantization, the artifacts would be cancelled.
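To make the question concrete, here is a toy scalar-quantizer experiment (my own sketch, nothing from Daala) that tries exactly that kind of pre-compensation; note how shifting the input by the negated quantization error can push it into a different quantization bin:

```c
#include <math.h>
#include <stdio.h>

/* uniform scalar quantizer with step size 1.0 */
static double quantize(double x) { return round(x); }

int main(void) {
    double inputs[] = {0.3, 0.6, 1.4, 2.7};
    int count = (int)(sizeof inputs / sizeof inputs[0]);

    for (int i = 0; i < count; i++) {
        double x   = inputs[i];
        double err = quantize(x) - x;  /* the artifact we want to cancel */
        double pre = x - err;          /* pre-compensated input */
        /* the pre-compensated value may land in a different bin,
         * so the decoded result can end up *farther* from x */
        printf("x=%.1f  plain=%.1f (err %+.1f)  precomp=%.1f (err %+.1f)\n",
               x, quantize(x), quantize(x) - x,
               quantize(pre), quantize(pre) - x);
    }
    return 0;
}
```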
Thanks.
Maurício Kanada
New way to represent images
Date: 2015-02-05 03:17 am (UTC)
largely to the understanding of how humans understand the sound / music.
I think new image/video compression formats should also consider how our brain perceives images.
In particular, I believe the new formats should represent the image as vectors + texture. Different transformations could be applied to each part.
What do you think about it?
Re: New way to represent images
Date: 2015-02-05 11:31 am (UTC)
The hard part is making it work better than preexisting techniques.
Re: New way to represent images
Date: 2015-02-05 07:40 pm (UTC)
Motivation: Ghost (an audio codec) splits audio into tone + noise, applying different techniques to each part.
Motivation 2: DCT-related transformations do not deal with hard edges very well (especially after quantization).
My idea came after reading this research:
http://www.cse.cuhk.edu.hk/~leojia/projects/L0smoothing/index.html
The idea was (toy sketch below):
1 - Vectorize the 'L0 smoothed' version of the image.
2 - Use a DCT-related (or any other frequency-based) transformation on the 'difference' (texture?).
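A minimal illustration of the split, using a simple box blur as a stand-in for the (much more sophisticated, edge-preserving) L0 smoother; note how the blur smears the hard edge into the residual, which is exactly what a real L0 smoother would avoid:

```c
#include <stdio.h>

#define W 8

int main(void) {
    /* a 1-D "scanline" with a hard edge plus mild texture */
    double img[W] = {10, 11, 9, 10, 50, 51, 49, 50};
    double base[W], resid[W];

    /* stand-in smoother: 3-tap box blur; a real system would use
     * L0 gradient minimization and vectorize the resulting base layer */
    for (int i = 0; i < W; i++) {
        double l = img[i > 0 ? i - 1 : i];
        double r = img[i < W - 1 ? i + 1 : i];
        base[i]  = (l + img[i] + r) / 3.0;
        resid[i] = img[i] - base[i]; /* "texture" layer for the DCT */
    }

    for (int i = 0; i < W; i++)
        printf("img=%5.1f  base=%5.1f  resid=%+6.2f\n",
               img[i], base[i], resid[i]);
    return 0;
}
```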
Maybe this idea can be used in a future codec, not in the (nearly finished) Daala.
Maurício Kanada
Re: New way to represent images
Date: 2015-02-11 02:06 am (UTC)
Recent deep learning approaches look promising for replacing DCT-based models.
It's possible to apply an optimized set of (pre-trained) filters to a specific type of image/block: noisy, pattern, gradient, landscape, face, ...
Reference
http://the-locster.livejournal.com/110724.html
http://www.cs.nyu.edu/~ranzato/research/projects.html#sparse_coding