xiphmont

I've just posted part 3 in my demo series introducing the Daala video codec. This one is kind of a long one, mainly because I think it's one of the only really detailed presentations of a technique Jean-Marc Valin of Xiph invented and first introduced in the Opus audio codec: 'TF' aka Time/Frequency resolution switching.

Even better... while I was documenting TF for posterity, I spotted a possible improvement. So, I've tossed in documentation of a brand new technique as well!


From: some41.livejournal.com

How many operations does 2-stage TF need? Can the second stage be merged with the first as an optimization?

From: xiphmont.livejournal.com

The first stage is seven adds and half a shift per four pixels; the second is six adds and four shifts per four pixels total. I haven't yet looked at all the possible implementation optimizations, but the second stage would certainly be rolled into the first.
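For readers curious what "adds and shifts" look like in practice, here is a minimal sketch (Python, for readability) of an integer-reversible Haar butterfly built by lifting, the kind of add-and-shift step a TF stage can be assembled from. It is an illustration under stated assumptions, not Daala's actual first or second stage: each butterfly here costs two adds/subtracts and one shift per coefficient pair, which does not match the counts quoted above, and the helper names `tf_merge`/`tf_split` and their interleaved output order are hypothetical.

```python
def haar_lift_fwd(x, y):
    """Integer Haar butterfly via lifting: 2 adds/subtracts, 1 shift."""
    d = x - y          # difference ("high" output)
    s = y + (d >> 1)   # equals floor((x + y) / 2) ("low" output)
    return s, d


def haar_lift_inv(s, d):
    """Exact integer inverse of haar_lift_fwd: also 2 adds/subtracts, 1 shift."""
    y = s - (d >> 1)
    x = d + y
    return x, y


def tf_merge(a, b):
    """Hypothetical merge: butterfly coefficient k of block a with coefficient k
    of block b, interleaving the outputs as low, high (an assumed ordering)."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same size")
    out = []
    for ak, bk in zip(a, b):
        s, d = haar_lift_fwd(ak, bk)
        out.extend((s, d))
    return out


def tf_split(c):
    """Hypothetical split: undo tf_merge, returning the two original blocks."""
    a, b = [], []
    for k in range(0, len(c), 2):
        x, y = haar_lift_inv(c[k], c[k + 1])
        a.append(x)
        b.append(y)
    return a, b
```

Because every step has an exact integer inverse, `tf_split(tf_merge(a, b))` returns the original blocks bit-for-bit, which is the property you want if blocks are to be merged and split repeatedly without drift. The low output is a floored average rather than an orthonormal sum, so a real codec would also have to account for scaling.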

I literally discovered the second stage while writing the demo page about the 'regular' TF we used in Opus. Working two-stage TF is about six days old.

Edited Date: 2013-08-13 01:30 am (UTC)

From: some41.livejournal.com

So a naive implementation of 2-stage TF is about twice as expensive as regular TF, but there's room for improvement? Assuming you come up with a fast implementation and 2-stage TF is great all around, can it be integrated back into Opus, or is it too late?

From: xiphmont.livejournal.com

A bit under twice as expensive; possible room for improvement, yes. For the moment, though, I'm going to be looking at lapping's effects and larger point sizes.

> can it be integrated back into Opus, or is it too late

I don't know yet that it's generally applicable to larger point sizes, but indications are it is. That will be the first thing to check (starting today).

It's too late to get it into original RFC6716 Opus. That doesn't mean we can't extend Opus though.

From: some41.livejournal.com

great stuff, thanks for answering :)

From: xiphmont.livejournal.com

oops, since I can't edit the above anymore...

Thinking about it a bit more, I first talked to RH about the TF stuff one week ago, and that was a few days after trying to pin down my manager about it, so... I think I actually lost a week in there. Call two-stage TF 13 days old as of yesterday, in case that should ever be relevant :-)

And considering it may be able to replace the DCT in some audio and image processing applications, that might actually become relevant.

From: (Anonymous)

I expected the TF idea from Opus could be applied to video, and this article is a very nice surprise.
In HEVC, one big bottleneck is the 4x4 transform, compared with the 32x32 for the same amount of data. Could TF be used to split the faster 32x32 transform and gain some performance?

From: xiphmont.livejournal.com

It can be used in either direction to gain speed over the DCT; however, there's a coding gain penalty.

As we're finding out, though, coding gain is not the end of the story (more about that later ;-)

From: (Anonymous)

Thanks for the great post. Interesting technique.
If I understand correctly, the goal of TF is to compute a cheap DCT-like transform of a neighboring block of the wrong size.
How does it compare, quality- and speed-wise, with computing a Walsh-Hadamard transform of the neighboring block (adjusted to the appropriate size)? Is it any faster?
It would also be great to see a side-by-side visual comparison of round-trip TF, DCT, and WHT.

From: xiphmont.livejournal.com

The original need was to weld/split blocks entirely from frequency-domain data, without having to go back to time/space. Flexibility was the point, not speed.
Being faster than a DCT was a secondary (though intentional) design criterion.
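A minimal sketch of welding "entirely from frequency domain data": two adjacent 4-sample blocks are transformed with an orthonormal DCT-II, and their coefficients are then merged with 1/sqrt(2) Haar butterflies (the kind of step Opus's CELT layer uses for its TF adjustment) without ever reconstructing the samples. The interleaved sum/difference ordering and the helper name `tf_weld` are assumptions, and real TF may reorder or sign-flip outputs; only the DC term is guaranteed to match an 8-point DCT exactly, while the other welded coefficients are DCT-like approximations.

```python
import math


def dct2(x):
    """Orthonormal DCT-II, computed directly (O(N^2), fine for a demo)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def tf_weld(a, b):
    """Hypothetical weld of two equal-size coefficient blocks using orthonormal
    Haar butterflies; outputs interleaved as sum, difference (assumed order)."""
    r = 1.0 / math.sqrt(2.0)
    out = []
    for ak, bk in zip(a, b):
        out.extend((r * (ak + bk), r * (ak - bk)))
    return out


samples = [3.0, 5.0, 4.0, 1.0, -2.0, 0.0, 6.0, 2.0]
# Weld the two 4-point DCTs directly in the coefficient domain.
welded = tf_weld(dct2(samples[:4]), dct2(samples[4:]))
# Reference: a true 8-point DCT of the same samples.
direct = dct2(samples)
```

Both the DCT and the Haar step are orthonormal, so the welded coefficients keep the total energy of the input samples, and `welded[0]` equals `direct[0]` up to floating-point rounding.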

It would not be as fast as a straight WHT since TF is still operating on DCT output (so there's a DCT in there at some point). The coding gain penalty for using the WHT as a primary transform would be substantial, so I don't think anyone considered it.
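For the WHT side of the comparison, here is a minimal sketch of the standard unnormalized fast Walsh-Hadamard transform. It needs only N*log2(N) additions/subtractions and no multiplies, which is why it is so cheap; its basis functions are square waves rather than cosines, which is consistent with the coding gain penalty mentioned above. The output is in natural (Hadamard) order, not sequency order.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.
    The length must be a power of two; only additions/subtractions are used."""
    y = list(x)
    n = len(y)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly every element with its partner h positions away.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = y[i], y[i + h]
                y[i], y[i + h] = u + v, u - v
        h *= 2
    return y
```

Applying `fwht` twice returns the input scaled by N, so the inverse is the same routine followed by a divide by N.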

From: (Anonymous)

Is demo 4 coming... or are you working on other stuff?

From: xiphmont.livejournal.com

yes to both.

Introducing Daala, part 3: Time/Frequency Resolution Switching
