Before we get into the update itself, yes, the level of magenta in that banner image got away from me just a bit. Then it was just begging for inappropriate abuse of a font...

Ahem.

Hey everyone! I just posted a Daala update that mostly has to do with still image performance improvements (yes, still images in a video codec. Go read it to find out why!). The update includes metric plots showing our improvement on objective metrics over the past year and relative to other codecs. Since objective metrics are only of limited use, there are also side-by-side interactive image comparisons against JPEG, VP8, VP9, x264 and x265.

The update text (and demo code) was originally written for a July update, as the still image work mostly happened in the beginning of the year. That update got held up and was never released officially, though it had been discovered and discussed at forums like Doom9. I regenerated the metrics and image runs to use the latest versions of all the codecs involved (only Daala and x265 had improved) for this official, better-late-than-never progress report!

TBA?

Date: 2014-12-23 11:04 pm (UTC)
From: (Anonymous)
What's the ETA on Daala? I've been hearing a lot of good things, but it's hard to get excited over experimental software.

Re: TBA?

Date: 2014-12-24 01:15 am (UTC)
From: (Anonymous)
According to http://people.xiph.org/~tterribe/daala/videocodec-201407.pdf , which is from July 2014, Daala's bitstream is expected to be frozen by the end of 2015.

Java applet?

Date: 2014-12-24 08:08 am (UTC)
From: [identity profile] zik zak (from livejournal.com)
Can we expect a new version of Cortado to handle Daala (and Opus)?
I do not think that the video tag will be able to play Daala any time soon.

Re: Java applet?

Date: 2014-12-25 08:54 pm (UTC)
From: [identity profile] xiphmont.livejournal.com
Java applets are hard going right now-- essentially anyone who doesn't want to get compromised instantly has all browser Java support disabled these days.

JavaScript (via Emscripten or other asm.js systems) is the current plan, and we have early playback running in the browser via JavaScript.

Re: Java applet?

Date: 2015-01-03 12:37 pm (UTC)
From: [identity profile] zik zak (from livejournal.com)
I just tried the JS demo and got 2fps ^_^
On the other hand, I saw in the roadmap (or to-do?) that you plan to get the video tag handling Daala.
That would be the best thing, and eventually it will happen, I believe.

Every time I see the image comparisons I'm impressed by what Daala achieves, and even my personal compressions (of a 10s video) show improvements release after release.
From: (Anonymous)
Hi Monty,

I enjoy reading the updates on Daala, not so much because I would be urgently waiting for the next video codec, but because of the fresh ideas on technology that are interesting to read.

While preceding updates contained a lot of "out of the box" thinking, I was surprised at how "conventional" your description of the challenge of encoding I-frames came across.

Did you ever consider ditching the whole "I-/B-/P-frame" methodology? If not, please do: after all, all frames are displayed for the same period of time. There's no obvious sense in spending a lot of bits on a few of them and only a few bits on most of the others. When you say that there have to be reference frames for seeking and such: that is not true. It would be just as possible to encode a Group Of Pictures as a whole, where decompression would yield all frames of the GOP at once. If experience with Vector Quantization has shown us one thing, it is the efficiency of encoding "as many correlated pieces of information as possible" - and the frames inside a GOP are certainly highly correlated. So why not treat time (inside a GOP) as just one dimension of vectors that also carry spatial and color information in their other dimensions, and encode them all together, spending the available bits on all frames of the GOP equally, without handling some "I-frame" specially?

I understand that encoding/decoding whole GOPs requires some memory and would impose a lower bound on latency for streams, depending on the number of frames in a GOP - but hey - that's not so different from h.264 etc., where multiple frames already depend on each other such that you have to decode "future" frames before you can display the "current" one...

I hope I didn't just miss a preceding discussion of the ideas above; if so, I apologize and would be grateful for a link to it.

Thanks for listening and keep up the good work!
From: [identity profile] xiphmont.livejournal.com
Our early Tarkin codec tried this strategy-- it encoded entire blocks of frames, the equivalent of a GOP, en masse. The problem is that one must quantize to get effective compression in video, and transforms with decent compaction are acausal. As a result, motion artifacts show up before the motion begins, and the slightest hint of pre-motion artifacts stands out like a sore thumb. When we ratcheted the precision high enough to avoid the problem, there was no longer any benefit to encoding the entire block at once (but a number of remaining disadvantages).
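
Here's a toy one-dimensional illustration of that pre-echo effect (made-up numpy, not Tarkin's actual transform; the block length and quantizer step are arbitrary). Track a single pixel across a block of frames, transform along the time axis, quantize, and invert-- the ringing lands on frames before the motion ever starts:

```python
# Toy demo: quantizing an acausal transform taken across time smears a
# sudden change backwards, producing ringing *before* the motion begins.
import numpy as np
from scipy.fft import dct, idct

frames = np.zeros(32)       # one pixel tracked across a 32-frame block
frames[16:] = 1.0           # "motion": the pixel changes at frame 16

coeffs = dct(frames, norm='ortho')        # transform across the time axis
step = 0.25                               # coarse (arbitrary) quantizer
recon = idct(np.round(coeffs / step) * step, norm='ortho')

# Frames 0..15 should still be exactly zero, but quantization error from
# the time-axis transform leaks into them: artifacts precede the event.
print(np.abs(recon[:16]).max())           # nonzero pre-motion ringing
```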

Motion compensation via 3D transform is likely kind of doomed. Frames are, relatively speaking, 'far apart' temporally, and the motion changes between them aren't very smooth. Not much useful redundancy just falls out of handling the block of frames all at once, and then you have to buffer multiple hundreds of megabytes of frame data, gigabytes for HD. Someday that much memory will be free, but by then we'll likely be up to super-mega-128k+-UHD video and we'll need orders of magnitude more.
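
For scale, a back-of-the-envelope sketch (the 250-frame GOP length is hypothetical):

```python
# Rough frame-buffer cost of holding a whole GOP in memory, assuming
# 4:2:0 8-bit frames (1.5 bytes/pixel); decoders often buffer at higher
# internal precision, which roughly doubles these numbers.
GOP = 250  # hypothetical GOP length
for name, w, h in [("480p", 854, 480), ("1080p HD", 1920, 1080), ("4K", 3840, 2160)]:
    mib = w * h * 1.5 * GOP / 2**20
    print(f"{name}: {mib:,.0f} MiB")
# 480p ~147 MiB, 1080p ~742 MiB, 4K ~2966 MiB
```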
Edited Date: 2014-12-25 09:13 pm (UTC)
From: (Anonymous)
Ok, understood, thanks for the explanation.

But even if it's not possible to avoid keeping frames somewhat separate with regard to their temporal placement - wouldn't it still be possible, and even beneficial, to overcome the "one (I-)frame is the golden reference, taking many more bits to encode than every other frame in the GOP" paradigm?

I would assume that even if you choose cleverly which frame to encode as an I-frame, chances are that this I-frame will contain parts (e.g. out-of-focus or motion-blurred areas) that could have been derived better, and with bits spent more efficiently, from another frame in the GOP where the same objects are more in focus or less motion-blurred.

I could envision all frames of a GOP first being scanned for regions that are (a) rich in detail and (b) have less detail-rich counterparts in other frames of the GOP; then any frame of the GOP could be declared "the reference frame for a certain region", to which the other frames only encode differences.

BTW: Has "blurring" of a region ever been considered a useful transformation in general, to help predict part of a frame from another frame that holds a "sharper" version of the same region? I would expect that one could often find both "blurred" and "sharp" versions of the same objects within a sequence of frames, due to motion of that object starting or stopping.

While I am in brainstorming mode, one more completely different, wild idea: you've certainly heard of seam carving and the C/C++ library "Liquid Rescale" that implements it. I wonder whether anybody has ever considered using seam carving for compression purposes. I am not quite sure this would work, but theoretically one could re-target an image to a smaller size during compression (finally compressing the residual smaller image) and do the reverse during decompression. That, of course, would lose information, and maybe it's of no practical use. But unless falsified, one could speculate that an image retargeted to really small dimensions might be usable as an interesting "prediction" starting point for reconstructing the full-size image, because seam carving tends to remove image areas that aren't so important to human viewers anyway.

Hope you don't get bored reading my ideas, but I had to spill them somewhere :-)
From: (Anonymous)
Another question: have you considered the possibility of applying key-frames to only part of the image (or even to a couple of blocks)? Consider, e.g., a moving scene where a new object appears in one part of the image...

Yes, I know, there would be signalling costs, but maybe there is some way to make this efficient...
From: (Anonymous)
Hi Monty,
I'm a different Anonymous, but I share the same feeling as him/her about P/I-frames. When I observe video artifacts, they sometimes seem to vanish abruptly. I guess an I-frame resets the error accumulated so far. I wonder if it's worth exploring the following idea to remove the need for I-frames:
If I-frames are analogous to single samples in one-dimensional signal processing, then P-frames are analogous to back-differences. To reconstruct the original samples from back-differences, an integrator is needed. If there is error in the differences, the error accumulates in the integrator without bound. Sometimes a "leaky integrator" is used, which decays to zero when the input is zero. If there is noise in the differences, the error accumulation in the leaky integrator is bounded. An exact inverse of the leaky integrator exists; I don't know if it has a name, so let's call it a "leaky difference".
If I understand correctly, P-frames store block motion and residuals, and B/I-frames are there to reset a "build-up" of artifacts. Could B/I-frames be dropped if the block residuals in P-frames were more "leaky"?
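
To make the idea concrete, here is a one-dimensional numpy sketch (the leak factor and noise level are invented for illustration; none of this is Daala code):

```python
# With a plain integrator (leak = 0), error in the differences accumulates
# as a random walk; with a leak, old error decays geometrically and the
# reconstruction error stays bounded (roughly noise/leak).
import numpy as np

def leaky_diff(samples, leak):
    """Exact inverse of the leaky integrator: d[n] = s[n] - (1-leak)*s[n-1]."""
    d = samples.copy()
    d[1:] -= (1.0 - leak) * samples[:-1]
    return d

def leaky_integrate(diffs, leak):
    """y[n] = (1-leak)*y[n-1] + d[n]; decays to zero on zero input."""
    out = np.empty_like(diffs)
    acc = 0.0
    for n, d in enumerate(diffs):
        acc = (1.0 - leak) * acc + d
        out[n] = acc
    return out

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=2000))   # stand-in "true" sample sequence

for leak in (0.0, 0.05):                    # plain vs. leaky
    noisy = leaky_diff(signal, leak) + rng.normal(scale=0.01, size=signal.size)
    err = np.abs(leaky_integrate(noisy, leak) - signal).max()
    print(f"leak={leak}: max reconstruction error {err:.3f}")
```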

Thank you for writing the demo pages; I enjoy their audiovisual/interactive presentation.

Will Daala support obsolete nonsense?

Date: 2015-01-08 06:11 pm (UTC)
From: [identity profile] zsolt [ʒolt] (from livejournal.com)
Hello!

I was wondering whether Daala will still support such things as interlacing, chroma subsampling and limited range?
All of these only add complexity while decreasing encoding efficiency and quality.

I see no reason why they should be kept around. On top of this, the format could decode to RGB, avoiding crappy conversions by poor software, right?

Prefilter ringing effects

Date: 2015-01-14 02:21 am (UTC)
From: (Anonymous)
(sorry if this is a stupid idea)

To avoid blocking artifacts, Daala uses lapped transforms (with their pre-filter). As far as I know, ringing artifacts are the result of quantization.
If the quantization is known before the image transformation, isn't it possible to create a 'pre-filter' that adds the inverse of the artifacts generated by quantization, so that after quantization the artifacts are cancelled?

Thanks.

Maurício Kanada

New way to represent images

Date: 2015-02-05 03:17 am (UTC)
From: (Anonymous)
The great success of current lossy audio formats (Opus, Vorbis, MP3) is owed largely to the understanding of how humans perceive sound/music.

I think new image/video compression formats should also consider how our brain perceives images.

In particular, I believe that new formats should represent the image as vectors + texture. Different transformations could be applied to each part.

What do you think?

Re: New way to represent images

Date: 2015-02-05 11:31 am (UTC)
From: [identity profile] xiphmont.livejournal.com
This is not a crazy idea, and it's not even that hard to make it work. Quite a bit of research either does just this or is at least inspired by the idea.
The hard part is making it work better than preexisting techniques.

Re: New way to represent images

Date: 2015-02-05 07:40 pm (UTC)
From: (Anonymous)
Thanks for the response, Monty.

Motivation: Ghost (an audio codec) splits audio into tone + noise, applying different techniques to each part.
Motivation 2: DCT-related transforms do not handle hard edges very well (especially after quantization).

My idea came after reading this research:

http://www.cse.cuhk.edu.hk/~leojia/projects/L0smoothing/index.html

The idea was:

1 - Vectorize the "L0-smoothed" version of the image.
2 - Apply a DCT-related (or any other frequency-based) transform to the "difference" (texture?).
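
To sketch the split concretely (hypothetical numpy, not Daala code; a Gaussian blur stands in here for the real L0 smoothing, which is a different, edge-preserving method):

```python
# Split an image into a smooth "structure" layer and a "texture" residual;
# each layer could then be coded with whatever technique suits it, and the
# decoder just sums the two reconstructions.
import numpy as np
from scipy.ndimage import gaussian_filter

def split_layers(image, sigma=2.0):
    """Return (structure, texture) such that image == structure + texture."""
    structure = gaussian_filter(image.astype(np.float64), sigma)
    texture = image - structure
    return structure, texture

img = np.random.default_rng(1).random((64, 64))   # stand-in image
structure, texture = split_layers(img)
assert np.allclose(structure + texture, img)
# structure -> vector/edge coder, texture -> DCT-style coder
```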

Maybe this idea can be used in a future codec, rather than in the (nearly finished) Daala.

Maurício Kanada

Re: New way to represent images

Date: 2015-02-11 02:06 am (UTC)
From: (Anonymous)
(another Anonymous)

Recent deep learning approaches look promising as replacements for DCT-based models.
It's possible to apply an optimized set of (pre-trained) filters to each specific type of image/block - noisy, pattern, gradient, landscape, face, ...

References
http://the-locster.livejournal.com/110724.html
http://www.cs.nyu.edu/~ranzato/research/projects.html#sparse_coding

Date: 2015-02-13 03:46 pm (UTC)
From: (Anonymous)
I hope you publish the exact versions, encoding parameters and file sizes next time.

Date: 2015-02-13 04:41 pm (UTC)
From: [identity profile] xiphmont.livejournal.com
We did. All encoders were pulled from their repository masters on Dec 16, 2014 and built from source. The scripts used to generate all images are in the Daala repository under the tools/ directory, specifically ab_compare_xxxx. You can see the complete invocations of each encoder there, and replicate everything we did for each file.
Edited Date: 2015-02-13 04:42 pm (UTC)
