| Age | Commit message (Collapse) | Author |
|
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
|
|
|
Update the bench_dwt utility to have a -decode/-encode switch
Measured performance gains for DWT encoder on a
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)
Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s
$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s
$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s
$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s
Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with thread==4
is due to hyperthreading being not appropriate here.
|
|
|
|
|
|
Instead of being the full tile size.
* Use a sparse array mechanism to store code-blocks and intermediate stages of
IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
the reduced size, instead of the full-resolution size.
|
|
tile-component buffer.
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
|
|
|
|
|