path: root/src/lib/openjp2/dwt.c
Age | Commit message | Author
2024-02-18 | opj_dwt_decode_tile(): avoid potential UndefinedBehaviorSanitizer 'applying zero offset to null pointer' (fixes #1505) | Even Rouault
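As a rough illustration of the class of problem named in that commit title: in C, pointer arithmetic on a null pointer is undefined even when the offset is zero, which is what UndefinedBehaviorSanitizer reports. The helper below is hypothetical (its name and signature are not taken from dwt.c) and only sketches one way to guard such an offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not OpenJPEG code: return base + offset, but never
 * perform pointer arithmetic on a null base (undefined in C, even for a
 * zero offset). `len` is the number of elements behind `base`. */
static int32_t *offset_or_null(int32_t *base, size_t len, size_t offset)
{
    assert(offset <= len);      /* stay within, or one past, the buffer */
    if (base == NULL) {
        return NULL;            /* empty buffer: skip the arithmetic */
    }
    return base + offset;
}
```

A caller that may hold an empty (null) tile buffer can then compute row pointers through the helper instead of writing `base + offset` directly.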
2022-02-10 | Avoid integer overflows in DWT. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=44544 | Even Rouault
2021-12-05 | Fix some typos (found by codespell) | Stefan Weil
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-03 | Avoid integer overflows in DWT. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11700 and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30646 | Even Rouault
2020-11-30 | Encoder: avoid global buffer overflow on irreversible conversion when too many decomposition levels are specified (fixes #1286) | Even Rouault
2020-05-23 | Forward DWT 9-7: major speed up by vectorizing vertical pass | Even Rouault
`bench_dwt -I -encode` time goes from 8.6s to 2.1s
2020-05-23 | Forward DWT 5-3: major speed up by vectorizing vertical pass | Even Rouault
`bench_dwt -encode` time goes from 7.9s to 1.7s
2020-05-22 | Forward DWT: small code refactoring to allow future improvements for the vertical pass | Even Rouault
2020-05-22 | dwt.c: remove unused typedef | Even Rouault
2020-05-22 | Forward DWT 5x3: performance improvements in horizontal pass, and modest in vertical pass | Even Rouault
2020-05-22 | Forward DWT: small code refactoring to allow future improvements for the horizontal pass | Even Rouault
2020-05-21 | Speed-up 9x7 IDWT by ~30% with OPJ_NUM_THREADS=2 | Even Rouault
"bench_dwt -I" time goes from 2.2s to 1.5s
2020-05-21 | Remove useless + 5U margin in opj_dwt_decode_tile_97() | Even Rouault
Nothing in code analysis or in the test suite shows that this margin is needed. It dates back to commit dbeebe72b9d35f6ff807c21c7f217b569fa894f6, where vector 9x7 decoding was introduced.
2020-05-21 | Speed-up 9x7 IDWT by ~20% | Even Rouault
"bench_dwt -I" time goes from 2.8s to 2.2s
2020-05-20 | Irreversible decoding: partially revert previous commit, to fix failures in test suite | Even Rouault
2020-05-20 | Irreversible compression/decompression DWT: use 1/K constant as per standard | Even Rouault
The previous constant opj_c13318 was mysteriously equal to 2/K, and in the DWT we had to divide K and opj_c13318 by 2. The issue was that the band->stepsize computation in tcd.c didn't take into account the log2gain of the band. The effect of this change is expected to be mostly equivalent to the previous situation, except for some differences in rounding. But it leads to a dramatic reduction of the mean square error and peak error in the irreversible encoding of issue141.tif!
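To make the role of K and 1/K concrete, here is a minimal sketch of the final scaling step of the irreversible 9-7 lifting scheme on one interleaved row. It assumes the usual presentation in which the forward transform scales low-pass (even) samples by 1/K and high-pass (odd) samples by K, with K = 1.230174105; sign conventions are left aside. The function and constant names are illustrative and are not the ones used in dwt.c.

```c
#include <assert.h>
#include <stddef.h>

/* Scaling constant of the 9-7 irreversible filter (illustrative names). */
static const float K_97     = 1.230174105f;
static const float INV_K_97 = (float)(1.0 / 1.230174105);

/* Forward scaling on an interleaved row: even positions hold low-pass
 * samples (scaled by 1/K), odd positions hold high-pass samples (scaled
 * by K). Assumes the row starts on an even sample. */
static void scale_forward_97(float *row, size_t n)
{
    size_t i;
    assert(row != NULL || n == 0);
    for (i = 0; i < n; i++) {
        row[i] *= (i % 2 == 0) ? INV_K_97 : K_97;
    }
}

/* Inverse scaling: undo the forward step with the reciprocal factors. */
static void scale_inverse_97(float *row, size_t n)
{
    size_t i;
    assert(row != NULL || n == 0);
    for (i = 0; i < n; i++) {
        row[i] *= (i % 2 == 0) ? K_97 : INV_K_97;
    }
}
```

Applying `scale_forward_97` and then `scale_inverse_97` to a row returns the original values up to floating-point rounding, which is why using 2/K in one place has to be compensated by a matching factor of 2 elsewhere.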
2020-05-20 | opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side | Even Rouault
2020-05-20 | Encoder: use floating-point operations for irreversible transformation | Even Rouault
2020-05-20 | dwt.c: change sign of constants to match standard and compensate (no functional change) | Even Rouault
2020-05-20 | Add multithreaded support in the DWT encoder. | Even Rouault
Update the bench_dwt utility to have a -decode/-encode switch.
Measured performance gains for the DWT encoder on an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper-threaded).
Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s
$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s
$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s
$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s
Scaling is probably limited by memory access patterns that make memory access the bottleneck. The slightly worse results with threads==8 than with threads==4 are due to hyperthreading not being appropriate here.
2018-10-31 | Fix several memory and resource leaks | Nikola Forró
Signed-off-by: Nikola Forró <nforro@redhat.com>
2018-09-05 | Fix some typos in code comments and documentation | Stefan Weil
All typos were found by Codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-20 | Avoid index-out-of-bounds access when invoking opj_compress with -n 11 or higher. But not a proper fix itself (refs #493) | Even Rouault
2017-09-06 | Fix null pointer dereference on partial tile decoding when they are empty. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3297 (master only) | Even Rouault
2017-09-04 | Replace uses of size_t by OPJ_SIZE_T | Even Rouault
2017-09-01 | opj_v4dwt_decode_step1_sse(): rework a bit to improve code generation | Even Rouault
2017-09-01 | opj_v4dwt_decode_step2_sse(): loop unroll | Even Rouault
2017-09-01 | opj_dwt_decode_partial_97(): simplify/more efficient use of sparse arrays in vertical pass | Even Rouault
2017-09-01 | opj_dwt_decode_partial_1_parallel(): add SSE2 optimization | Even Rouault
2017-09-01 | Sub-tile decoding: speed up vertical pass in IDWT5x3 by processing 4 cols at a time | Even Rouault
2017-09-01 | Optimize opj_dwt_decode_partial_1() when cas == 0 | Even Rouault
2017-09-01 | Various changes to allow tile buffers of more than 4 gigapixels | Even Rouault
Untested though, since that means a tile buffer of at least 16 GB. So there might be places where uint32 overflow on multiplication still occurs...
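The overflow risk mentioned above can be sketched as follows: multiplying two 32-bit dimensions in 32-bit arithmetic wraps around once the product reaches 2^32, so the size has to be computed in a wider type with explicit checks. The helper below is a hypothetical illustration (name and signature are assumptions, not dwt.c code) that assumes 4-byte samples, consistent with the "4 gigapixels means at least 16 GB" figure in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not OpenJPEG code: compute the byte size of a
 * width x height buffer of int32_t samples in size_t arithmetic.
 * Returns 0 on success, -1 if the result does not fit in size_t. */
static int tile_buffer_bytes(uint32_t width, uint32_t height,
                             size_t *out_bytes)
{
    size_t w = (size_t)width;
    size_t h = (size_t)height;
    size_t count;

    assert(out_bytes != NULL);
    if (h != 0 && w > SIZE_MAX / h) {
        return -1;                      /* w * h would overflow */
    }
    count = w * h;
    if (count > SIZE_MAX / sizeof(int32_t)) {
        return -1;                      /* count * 4 would overflow */
    }
    *out_bytes = count * sizeof(int32_t);
    return 0;
}
```

On a platform with a 64-bit `size_t`, a 65536 x 65536 tile yields 2^34 bytes; on a 32-bit platform the same call reports failure instead of silently wrapping.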
2017-09-01 | Fix compiler warning in release mode | Even Rouault
2017-09-01 | opj_dwt_decode_partial_tile(): avoid undefined behaviour in lifting operation by properly initializing working buffer | Even Rouault
2017-09-01 | Sub-tile decoding: only allocate tile component buffer of the needed dimension | Even Rouault
Instead of being the full tile size.
* Use a sparse array mechanism to store code-blocks and intermediate stages of IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to the reduced size, instead of the full-resolution size.
2017-09-01 | Fix undefined shift behaviour in opj_dwt_is_whole_tile_decoding(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3255. Credit to OSS Fuzz | Even Rouault
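For context on what "undefined shift behaviour" means here: in C, shifting a 32-bit value by 32 or more bits is undefined, so a shift count derived from untrusted resolution counts needs a guard. The helpers below are a hypothetical sketch of such guards, not the code of the actual fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value >> shift, defined for any shift count. */
static uint32_t shift_right_or_zero(uint32_t value, uint32_t shift)
{
    return (shift >= 32u) ? 0u : (value >> shift);
}

/* Hypothetical helper: ceil(value / 2^shift), defined for any shift count.
 * The sum is formed in 64 bits so that it cannot wrap. */
static uint32_t ceil_div_pow2(uint32_t value, uint32_t shift)
{
    if (shift >= 32u) {
        return (value != 0u) ? 1u : 0u;
    }
    return (uint32_t)(((uint64_t)value + ((uint64_t)1 << shift) - 1u)
                      >> shift);
}

/* Small self-check of the helpers above. */
static void check_shift_helpers(void)
{
    assert(shift_right_or_zero(0x80000000u, 31u) == 1u);
    assert(shift_right_or_zero(0x80000000u, 32u) == 0u);
    assert(ceil_div_pow2(5u, 1u) == 3u);
    assert(ceil_div_pow2(0xFFFFFFFFu, 31u) == 2u);
    assert(ceil_div_pow2(1u, 40u) == 1u);
}
```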
2017-08-29 | Use IDWT whole tile decoding if the area of interest equals to the image bounds, taking into account the reduced resolution factor | Even Rouault
2017-08-28 | Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115. Credit to OSS Fuzz. master only | Even Rouault
2017-08-23 | opj_dwt_decode_partial_97(): perf improvement: limit copy of coefficients at end of horizontal pass to actual range of interest | Even Rouault
2017-08-21 | Add comments for filter_width values | Even Rouault
2017-08-20 | Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer. | Even Rouault
2017-08-18 | Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer. This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256' down to 0.860s | Even Rouault
2017-07-06 | Comment fix | Even Rouault
2017-06-30 | IDWT 5x3: fix bug in AVX2 implementation (#953, #957) | Even Rouault
2017-06-21 | IDWT 5x3: generalize SSE2 version for AVX2 | Even Rouault
Thanks to our macros that abstract SSE use, the functions can use AVX2 when available (at compile time). This brings an extra 23% speed improvement on bench_dwt in 64bit builds with AVX2 compared to SSE2.
2017-06-21 | dwt.c: small cleanup | Even Rouault
2017-06-20 | Improve performance of inverse DWT 5x3 (#953) | Even Rouault
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns in parallel. This is the most beneficial improvement, since the vertical pass involves a lot of cache thrashing.
With the bench_dwt utility with default arguments (16383x16383 image), time goes from 4.064 s to 1.212 s.
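As background for the 5x3 lifting transform this commit optimizes, here is a minimal 1D sketch of the reversible 5-3 lifting steps on an interleaved row, with symmetric extension at the borders. It is the plain two-pass formulation, written for clarity; it is not the single-pass or SSE2 variant described above, and the function names are illustrative rather than taken from dwt.c. It assumes the row starts on an even (low-pass) sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floor division for a positive divisor (portable for negative a). */
static int32_t floor_div(int32_t a, int32_t b)
{
    int32_t q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

/* Read x[i] with whole-sample symmetric extension: x[-1] = x[1] and
 * x[n] = x[n-2]. Requires n >= 2. */
static int32_t at(const int32_t *x, ptrdiff_t i, ptrdiff_t n)
{
    if (i < 0) {
        i = -i;
    }
    if (i >= n) {
        i = 2 * (n - 1) - i;
    }
    return x[i];
}

/* Forward 5-3: predict odd samples (high-pass), then update even samples
 * (low-pass), in place. */
static void dwt53_forward(int32_t *x, size_t len)
{
    ptrdiff_t n = (ptrdiff_t)len, i;
    assert(x != NULL || len == 0);
    if (n < 2) {
        return;
    }
    for (i = 1; i < n; i += 2) {
        x[i] -= floor_div(at(x, i - 1, n) + at(x, i + 1, n), 2);
    }
    for (i = 0; i < n; i += 2) {
        x[i] += floor_div(at(x, i - 1, n) + at(x, i + 1, n) + 2, 4);
    }
}

/* Inverse 5-3: undo the update step, then undo the predict step. */
static void dwt53_inverse(int32_t *x, size_t len)
{
    ptrdiff_t n = (ptrdiff_t)len, i;
    assert(x != NULL || len == 0);
    if (n < 2) {
        return;
    }
    for (i = 0; i < n; i += 2) {
        x[i] -= floor_div(at(x, i - 1, n) + at(x, i + 1, n) + 2, 4);
    }
    for (i = 1; i < n; i += 2) {
        x[i] += floor_div(at(x, i - 1, n) + at(x, i + 1, n), 2);
    }
}
```

Because each lifting step only adds or subtracts a value computed from samples the step does not modify, running `dwt53_inverse` after `dwt53_forward` restores the input exactly. A single-pass variant interleaves the two loops so that each sample is touched once, which reduces memory traffic.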
2017-06-17 | Fix astyle issue | Even Rouault
2017-06-17 | Fix warnings with recent GCC versions | Even Rouault
2017-05-09 | Reformat whole codebase with astyle.options (#128) | Even Rouault