Introduction

The algorithm that the JPEGLib employs for downscaling to a fourth during decoding uses the following matrix-vector product, which is derived from the full IDCT of Loeffler, Ligtenberg and Moschytz:

In this matrix-vector product, (n) is the DCT coefficient at index n and f(n) is the pixel value in the spatial domain at position n. The constants Cn that are used are defined as follows:

The structure of the algorithm is probably easier to understand as a flowgraph, as in figure 1. What is surprising about this structure is that all multiplicative constants scale the DCT coefficients, much like the scaling coefficients of the Arai-Agui-Nakajima DCT. The JPEGLib accounts for the Arai-Agui-Nakajima DCT's coefficients by absorbing them into the dequantization coefficients. This is what makes the Arai-Agui-Nakajima DCT so fast in comparison to other schemes. Unfortunately, the JPEGLib does not use this approach for the algorithm that scales to a fourth. If it did, scaling to a fourth would in theory consist only of additions. The author made such an implementation inside the JPEGLib, and decoding indeed turned out to be much faster this way, since only additions and shift operations remain. However, it also turned out that the implementation depends on the bitness of the platform being used. The scaling constants over both dimensions, represented as real numbers, are as follows:
4.000000, 3.624510, 2.000000, 1.272759, 2.000000, 0.850430, 2.000000, 0.720960
3.624510, 3.284268, 1.812255, 1.153281, 1.812255, 0.770598, 1.812255, 0.653281
2.000000, 1.812255, 1.000000, 0.636379, 1.000000, 0.425215, 1.000000, 0.360480
1.272759, 1.153281, 0.636379, 0.404979, 0.636379, 0.270598, 0.636379, 0.229402
2.000000, 1.812255, 1.000000, 0.636379, 1.000000, 0.425215, 1.000000, 0.360480
0.850430, 0.770598, 0.425215, 0.270598, 0.425215, 0.180808, 0.425215, 0.153281
2.000000, 1.812255, 1.000000, 0.636379, 1.000000, 0.425215, 1.000000, 0.360480
0.720960, 0.653281, 0.360480, 0.229402, 0.360480, 0.153281, 0.360480, 0.129946

With this table, all dequantization table values must first be scaled. Afterwards, two passes over the dequantized DCT coefficients are made, which in theory consist only of additions. In the first pass, the algorithm is performed over columns 0, 1, 3, 5 and 7, each time doing the additions as in the flowgraph. This yields two values for each of these five columns, and thus two rows of five columns of interest. The second pass performs only the additions from the flowgraph, but this time over the two rows that hold the result of the first pass.

Implementing this table with fixed-point arithmetic requires careful attention to potential overflow, both when scaling the quantization table and when adding and subtracting. This leads to six different tables in the new implementation, which arise from the dependence on the platform's bitness and on the sample size (8 bits or 12 bits), and which also allow a speed versus accuracy tradeoff. To enable the functionality, the new macro USE_FASTER_2x2_IDCT was introduced, which simply needs to be defined in jconfig.h or jmorecfg.h like this:
#define USE_FASTER_2x2_IDCT

If this macro is not defined, the standard functionality of JPEGLib version 6b is used. If it is defined, additional code in jddctmgr.c multiplies the dequantization constants by a scaled variant of the table above, and an alternative implementation of jpeg_idct_2x2 is compiled in jidctred.c. The dependence on the platform's bitness is resolved automatically by examining the constant INT_MAX from limits.h: if INT_MAX evaluates to 32767, the platform is treated as 16-bit; otherwise, a 32-bit or wider platform is assumed. To select the fastest possible mode (with the least accuracy), the macro USE_INACCURATE_IDCT can additionally be defined in jconfig.h or jmorecfg.h like this:
#define USE_INACCURATE_IDCT

The effect of this macro depends on the platform's bitness.
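The two passes over the pre-scaled coefficients can be sketched in floating point. The sketch rests on two observations that can be checked against the table of scaling constants: its first row holds the magnitudes of the 1-D weights, which coincide with the constants used by jpeg_idct_2x2 in JPEGLib 6b's jidctred.c (FIX_3_624509785, FIX_1_272758580, FIX_0_850430095, FIX_0_720959822), and every entry equals the product of the corresponding first-row and first-column entries divided by 4. The function names are hypothetical and not part of JPEGLib. The block is assumed to have been multiplied by the table already (in the real scheme this is folded into dequantization), and the signs of the odd terms are applied as subtractions, so the per-block work consists of additions and subtractions only.

```c
#include <assert.h>

/* First row of the table of scaling constants: magnitudes of the 1-D
 * weights. S[1], S[3], S[5], S[7] match the constants of jpeg_idct_2x2 in
 * JPEGLib 6b (jidctred.c); assuming Cn = cos(n*pi/16) they equal
 *   S[1] = sqrt(2)*( C1+C3+C5+C7)   S[3] = sqrt(2)*( C1-C3+C5+C7)
 *   S[5] = sqrt(2)*(-C1+C3+C5+C7)   S[7] = sqrt(2)*( C1-C3+C5-C7)
 * S[2], S[4], S[6] belong to coefficients the 2x2 IDCT never reads. */
static const double S[8] = {
    4.0, 3.624509785, 2.0, 1.272758580, 2.0, 0.850430095, 2.0, 0.720959822
};

/* Hypothetical helper: T[u][v] = S[u] * S[v] / 4 reproduces the 8x8 table. */
void build_scale_table(double T[8][8])
{
    assert(T);
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++)
            T[u][v] = S[u] * S[v] / 4.0;
}

/* Additions-only butterfly over the pre-scaled values at positions
 * 0, 1, 3, 5 and 7 of one column or row (the flowgraph's additions). */
static void butterfly(double g0, double g1, double g3, double g5, double g7,
                      double *out0, double *out1)
{
    double odd = g1 - g3 + g5 - g7;
    *out0 = g0 + odd;
    *out1 = g0 - odd;
}

/* Hypothetical two-pass 2x2 IDCT on a block G (not modified) whose entries
 * were already multiplied by T[u][v]; u = row (vertical frequency),
 * v = column (horizontal frequency). */
void idct_2x2_prescaled(double G[8][8], double out[2][2])
{
    static const int cols[5] = { 0, 1, 3, 5, 7 };
    double ws[2][8] = { { 0.0 } };  /* two rows; only columns 0,1,3,5,7 used */

    /* Pass 1: down columns 0, 1, 3, 5, 7 -> two rows of five values. */
    for (int k = 0; k < 5; k++) {
        int c = cols[k];
        butterfly(G[0][c], G[1][c], G[3][c], G[5][c], G[7][c],
                  &ws[0][c], &ws[1][c]);
    }
    /* Pass 2: the same additions along each of the two rows. */
    for (int r = 0; r < 2; r++)
        butterfly(ws[r][0], ws[r][1], ws[r][3], ws[r][5], ws[r][7],
                  &out[r][0], &out[r][1]);
}
```

Because the table carries a factor of 1/4, the result of this sketch equals the plain separable weighted sum of the coefficients divided by 4; only the overall output scale changes.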
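The INT_MAX-based platform detection and the overflow concern when scaling the quantization table can also be illustrated. This is not the code from the modified jddctmgr.c: the macro SKETCH_SCALE_BITS, its two values and the function sketch_prescale_qtable are hypothetical placeholders, floating point is used for clarity, and the quantization table is assumed to be in natural (row-major) order, as JPEGLib's JQUANT_TBL stores it in quantval.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical fraction-bit counts, chosen only to illustrate the INT_MAX
 * test; the values used by the modified jddctmgr.c are not given here. */
#if INT_MAX == 32767
#define SKETCH_SCALE_BITS 4     /* 16-bit int: very little headroom */
#else
#define SKETCH_SCALE_BITS 12    /* 32-bit int or wider */
#endif

/* First row of the table of scaling constants (1-D weight magnitudes). */
static const double S[8] = {
    4.0, 3.624509785, 2.0, 1.272758580, 2.0, 0.850430095, 2.0, 0.720959822
};

/* Hypothetical helper: multiply a quantization table (natural order) by
 * T[u][v] = S[u]*S[v]/4 and round to fixed point with SKETCH_SCALE_BITS
 * fraction bits. Returns 0 if an entry would not fit in an int, 1 otherwise.
 * Only the table entries are checked here; the additions of the two passes
 * need their own headroom on top of this. */
int sketch_prescale_qtable(const unsigned short q[64], int out[64])
{
    assert(q && out);
    for (int i = 0; i < 64; i++) {
        double t = S[i / 8] * S[i % 8] / 4.0;
        double v = q[i] * t * (double)(1L << SKETCH_SCALE_BITS) + 0.5;
        if (v >= (double)INT_MAX + 1.0)
            return 0;           /* would overflow: fewer fraction bits needed */
        out[i] = (int)v;
    }
    return 1;
}
```

With only 4 fraction bits on a 16-bit platform, a quantizer of 65535 multiplied by the largest table entry already fails this check, while a 32-bit platform has room to spare, which illustrates why the tables have to depend on the bitness.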