Performance

Performance was first tested on a 16-bit platform, namely MS-DOS version 6.22. This operating system was chosen to ensure that neither concurrently running processes nor scheduler strategies could influence the measured performance. The smartdrive hard disk cache was intentionally turned on in order to delay the write operations during decoding with djpeg until the end of the process. This gave very constant conditions for measuring performance. The memory manager used was jmemnobs.c. The compiler was Watcom version 10.6, and the same compiler settings were used for both the current implementation of the JPEGLib and the new variants. The computer used for the following measurements was a rather old 486 computer with a Cyrix CPU running at a clock speed of 8 MHz. In order to measure the theoretically possible performance improvement, the adaptive IDCT, which checks for rows and columns of zeros, was turned off in both the old and the new implementations. The file that was downscale-decoded was a mid-quality JPEG image of 88007 bytes with a resolution of 1024×768 pixels and 4:4:4 chroma subsampling.
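To illustrate the kind of check that was disabled for these measurements, the following is a minimal sketch of a zero-column test on an 8×8 coefficient block. It is not taken from the JPEGLib sources: the names DCTSIZE_SKETCH, column_has_only_dc and count_shortcut_columns are hypothetical, and the block is assumed to be stored row by row. When all AC coefficients of a column are zero, the one-dimensional IDCT of that column is a constant derived from the DC term alone, so the full computation can be skipped for it.

```c
/* Hypothetical constant: edge length of a JPEG coefficient block. */
#define DCTSIZE_SKETCH 8

/* Returns 1 if every AC coefficient (rows 1..7) in column `col` of a
 * row-major 8x8 block is zero, so only the DC term in row 0 contributes
 * to that column. Returns 0 otherwise. */
int column_has_only_dc(const short block[DCTSIZE_SKETCH * DCTSIZE_SKETCH], int col)
{
    int row;
    for (row = 1; row < DCTSIZE_SKETCH; row++) {
        if (block[row * DCTSIZE_SKETCH + col] != 0)
            return 0;
    }
    return 1;
}

/* Counts how many of the 8 columns could take the shortcut path. */
int count_shortcut_columns(const short block[DCTSIZE_SKETCH * DCTSIZE_SKETCH])
{
    int col;
    int n = 0;
    for (col = 0; col < DCTSIZE_SKETCH; col++)
        n += column_has_only_dc(block, col);
    return n;
}
```

How often such a shortcut applies depends on the image content, which is presumably why it was switched off when measuring the improvement of the IDCT arithmetic itself.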

Without USE_INACCURATE_IDCT defined, decoding this file took 88 seconds; with USE_INACCURATE_IDCT defined, it took 72 seconds. The JPEGLib's current implementation needed 96 seconds. In order to measure the improvement of the IDCT step alone, a variant of the JPEGLib was built that simply omits the IDCT step. This variant took 48 seconds to execute. This means that the downscale-decoding step in the current implementation of the JPEGLib took 96 - 48 = 48 seconds. The IDCT without USE_INACCURATE_IDCT defined took 88 - 48 = 40 seconds, a reduction in IDCT time of roughly 17 %. With USE_INACCURATE_IDCT defined, the IDCT took 72 - 48 = 24 seconds, a reduction in IDCT time of roughly 50 %. Note, however, that if USE_INACCURATE_IDCT is not defined, the new implementation's adaptive IDCT is slightly slower in the first pass than that of the current implementation, since it must use a temporary INT32 variable to store the result of the dequantization, which is then scaled by a right shift of 6 bits. Note also that this is not an issue on a 32-bit platform, which uses 32-bit integers throughout the whole algorithm.
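The cost of the temporary 32-bit variable can be pictured with a small sketch. It only assumes what the text states: the dequantized product is held in a 32-bit temporary and then shifted right by 6. The function name dequantize_scaled and the macro DEQUANT_SHIFT are hypothetical, and int32_t from stdint.h stands in for the library's INT32 type.

```c
#include <stdint.h>

/* Shift count taken from the text; the name is hypothetical. */
#define DEQUANT_SHIFT 6

/* Multiplies a coefficient by its quantization value in 32-bit arithmetic
 * and scales the product down by a right shift.
 * The product of two 16-bit values may not fit into 16 bits, so on a
 * platform whose native int is 16 bits wide the multiplication has to be
 * widened explicitly, which costs extra instructions.
 * Note: right-shifting a negative signed value is implementation-defined
 * in C; this sketch assumes an arithmetic shift. It also assumes the
 * shifted result fits into 16 bits again. */
int16_t dequantize_scaled(int16_t coef, int16_t quant)
{
    int32_t wide = (int32_t)coef * (int32_t)quant;
    return (int16_t)(wide >> DEQUANT_SHIFT);
}
```

For example, a coefficient of 1000 and a quantization value of 100 give a product of 100000, which exceeds the 16-bit range before the shift reduces it to 1562. On a 32-bit platform the widening is free, because the native integer type already holds the product.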

The second test platform was the same hardware at the same clock speed running Windows NT 4.0 Workstation, with only those processes running that are essential for the operating system to work. The compiler was Microsoft Visual C/C++ 6.0. The makefile and jconfig.h were the ones that ship with the JPEGLib package for use with Visual C/C++. For this 32-bit binary, the same tests as above took 53.7 seconds with the implementation from the current release of the JPEGLib. Both new variants of the algorithm took 50.6 seconds, and the variant that does not perform the IDCT step took 33.0 seconds. The IDCT step therefore takes roughly 53.7 - 33.0 = 20.7 seconds in the JPEGLib's current release, whereas the new variants take 50.6 - 33.0 = 17.6 seconds. The IDCT step is thus around 15 % faster with the new implementations on this platform.
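The way the IDCT share is isolated on both platforms can be written down as a short helper. This is only a sketch of the arithmetic used in the text; the function names idct_seconds and idct_improvement_percent are hypothetical.

```c
/* Time attributable to the IDCT: total run time minus the run time of the
 * build that omits the IDCT step. */
double idct_seconds(double total, double without_idct)
{
    return total - without_idct;
}

/* Relative reduction in IDCT time, in percent, of a new build compared
 * with the old one. All three arguments are total run times in seconds. */
double idct_improvement_percent(double old_total, double new_total,
                                double without_idct)
{
    double old_idct = idct_seconds(old_total, without_idct);
    double new_idct = idct_seconds(new_total, without_idct);
    return 100.0 * (old_idct - new_idct) / old_idct;
}
```

With the figures reported above, idct_improvement_percent(96, 88, 48) gives about 16.7, idct_improvement_percent(96, 72, 48) gives 50, and idct_improvement_percent(53.7, 50.6, 33.0) gives about 15.0. Subtracting the run time of the build without the IDCT matters: measured against total decoding time, the gain on the 32-bit platform would only be about 6 %.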

Last Updated on Sunday, 28 April 2002 16:19