|
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
|
opj_tcd_dc_level_shift_decode(): avoid incrementing a nullptr (fixes #1480)
|
|
(likely a harmless issue, as we don't dereference it)
|
|
|
|
|
|
- Avoid doing 128 iterations all the time, and stop when the threshold
doesn't vary much
- Avoid calling the costly opj_t2_encode_packets() repeatedly when bisecting the
layer ratio if the truncation points haven't changed since the last
iteration.
When used with the GDAL gdal_translate application to convert an 11977 x
8745 raster with data type UInt16 and 8 channels, the conversion time
to JPEG2000 with 20 quality layers using disto/rate allocation (
-co "IC=C8" -co "JPEG2000_DRIVER=JP2OPENJPEG" -co "PROFILE=NPJE_NUMERICALLY_LOSSLESS"
creation options of the GDAL NITF driver) goes from 5m56s wall clock
(8m20s total, 12 vCPUs) down to 1m16s wall clock (3m45s total).
|
|
(fixes #1283)
|
|
|
|
(fixes #1283)
|
|
(fixes #1283)
|
|
write heap buffer overflow in opj_mqc_flush (fixes #1283)
|
|
test suite
|
|
The previous constant opj_c13318 was mysteriously equal to 2/K, and in
the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
band->stepsize computation in tcd.c didn't take into account the log2gain of
the band.
The effect of this change is expected to be mostly equivalent to the previous
situation, except some difference in rounding. But it leads to a dramatic
reduction of the mean square error and peak error in the irreversible encoding
of issue141.tif!
|
|
messing up with stepsize (no functional change)
|
|
|
|
|
|
Update the bench_dwt utility to have a -decode/-encode switch
Measured performance gains for DWT encoder on an
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)
Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s
$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s
$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s
$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s
Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with threads==4
are due to hyperthreading not being beneficial here.
|
|
- API wise, opj_codec_set_threads() can be used on the encoding side
- opj_compress has a -threads switch similar to opj_uncompress
|
|
* -PLT switch added to opj_compress
* Add an opj_encoder_set_extra_options() function that
accepts a PLT=YES option, and could be expanded later
for other uses.
-------
Testing with a Sentinel2 10m band, T36JTT_20160914T074612_B02.jp2,
coming from S2A_MSIL1C_20160914T074612_N0204_R135_T36JTT_20160914T081456.SAFE
Decompress it to TIFF:
```
opj_uncompress -i T36JTT_20160914T074612_B02.jp2 -o T36JTT_20160914T074612_B02.tif
```
Recompress it with similar parameters as original:
```
opj_compress -n 5 -c [256,256],[256,256],[256,256],[256,256],[256,256] -t 1024,1024 -PLT -i T36JTT_20160914T074612_B02.tif -o T36JTT_20160914T074612_B02_PLT.jp2
```
Dump codestream detail with the GDAL dump_jp2.py utility (https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/dump_jp2.py)
```
python dump_jp2.py T36JTT_20160914T074612_B02.jp2 > /tmp/dump_sentinel2_ori.txt
python dump_jp2.py T36JTT_20160914T074612_B02_PLT.jp2 > /tmp/dump_sentinel2_openjpeg_plt.txt
```
The diff between both shows a very similar structure, and an identical number of packets in PLT markers.
Now testing with Kakadu (KDU803_Demo_Apps_for_Linux-x86-64_200210)
Full file decompression:
```
kdu_expand -i T36JTT_20160914T074612_B02_PLT.jp2 -o tmp.tif
Consumed 121 tile-part(s) from a total of 121 tile(s).
Consumed 80,318,806 codestream bytes (excluding any file format) = 5.329697
bits/pel.
Processed using the multi-threaded environment, with
8 parallel threads of execution
```
Partial decompression (presumably using PLT markers):
```
kdu_expand -i T36JTT_20160914T074612_B02.jp2 -o tmp.pgm -region "{0.5,0.5},{0.01,0.01}"
kdu_expand -i T36JTT_20160914T074612_B02_PLT.jp2 -o tmp2.pgm -region "{0.5,0.5},{0.01,0.01}"
diff tmp.pgm tmp2.pgm && echo "same !"
```
-------
Funded by ESA for S2-MPC project
|
|
opj_tcd_get_encoder_input_buffer_size()
|
|
Add -IMF switch to opj_compress as well
|
|
That could lead to later assertion failures.
Fixes #1231 / CVE-2020-8112
|
|
eal(): properly deal with a number of samples larger than 4 billion (refs #1151)
|
|
images with huge dimensions. Credit to the Google Autofuzz project for providing the test case
|
|
This adds an opj_set_decoded_components(opj_codec_t *p_codec,
OPJ_UINT32 numcomps, const OPJ_UINT32* comps_indices) function,
and equivalent "opj_decompress -c compno[,compno]*" option.
When specified, neither the MCT transform nor JP2 channel transformations
will be applied.
Tests added for various combinations of whole image vs tiled-based decoding,
full or reduced resolution, use of decode area or not.
|
|
|
|
the same number of resolutions. Also fixes an issue with subtile decoding. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3331. Credit to OSS Fuzz
|
|
heap-buffer-overflow when numcomps < 3
|
|
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3305 (master only)
|
|
Subtile decoding: memory use reduction and perf improvements
|
|
|
|
reuse the tile buffer one
|
|
data exceeds system limits' (refs https://github.com/uclouvain/openjpeg/pull/730#issuecomment-326654188)
|
|
And saves a useless loop, which should be a tiny bit faster.
|
|
|
|
|
|
Untested though, since that means a tile buffer of at least 16 GB. So
there might be places where uint32 overflow on multiplication still occurs...
|
|
pixels must remain under 4 billion)
|
|
However the intermediate buffer for decoding must still be smaller than 4
billion pixels, so this is useful for decoding at a lower resolution level,
or subtile decoding.
|
|
previous commit)
|
|
Instead of being the full tile size.
* Use a sparse array mechanism to store code-blocks and intermediate stages of
IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
the reduced size, instead of the full-resolution size.
|
|
later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115. Credit to OSS Fuzz. master only
|
|
|
|
tile-component buffer.
|
|
tile-component buffer.
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
|
|
window specified in opj_set_decode_area()
|
|
write heap buffer overflow in opj_mqc_flush (#982)
|
|
error message when the output buffer is too small
|
|
Instead of having the chunk array at the segment level, we can move it down to
the codeblock itself since segments are filled in sequential order.
This limits the number of memory allocations, and slightly decreases memory usage.
On MAPA_005.jp2
n4: 1871312549 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
n1: 1610689344 0x4E781E7: opj_aligned_malloc (opj_malloc.c:61)
n1: 1610689344 0x4E71D1B: opj_alloc_tile_component_data (tcd.c:676)
n1: 1610689344 0x4E726CF: opj_tcd_init_decode_tile (tcd.c:816)
n1: 1610689344 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
n1: 1610689344 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
n1: 1610689344 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
n1: 1610689344 0x4E53002: opj_jp2_decode (jp2.c:1564)
n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
n1: 219232541 0x4E4BC50: opj_j2k_read_tile_header (j2k.c:4683)
n1: 219232541 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
n1: 219232541 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
n1: 219232541 0x4E53002: opj_jp2_decode (jp2.c:1564)
n0: 219232541 0x40374E: main (opj_decompress.c:1459)
n1: 23893200 0x4E72735: opj_tcd_init_decode_tile (tcd.c:1225)
n1: 23893200 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
n1: 23893200 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
n1: 23893200 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
n1: 23893200 0x4E53002: opj_jp2_decode (jp2.c:1564)
n0: 23893200 0x40374E: main (opj_decompress.c:1459)
n0: 17497464 in 52 places, all below massif's threshold (1.00%)
|
|
Currently we allocate at least 8192 bytes for each codeblock, and copy
the relevant parts of the codestream in that per-codeblock buffer as we
decode packets.
As the whole codestream for the tile is ingested in memory and alive
during the decoding, we can directly point to it instead of copying. But
to do that, we need an intermediate concept, a 'chunk' of code-stream segment,
given that segments may be made of data at different places in the code-stream
when quality layers are used.
With that change, the decoding of MAPA_005.jp2 goes from the 2.7 GB
of the previous improvement down to 1.9 GB.
New profile:
n4: 1885648469 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
n1: 1610689344 0x4E78287: opj_aligned_malloc (opj_malloc.c:61)
n1: 1610689344 0x4E71D7B: opj_alloc_tile_component_data (tcd.c:676)
n1: 1610689344 0x4E7272C: opj_tcd_init_decode_tile (tcd.c:816)
n1: 1610689344 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
n1: 1610689344 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
n1: 1610689344 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
n1: 1610689344 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
n1: 219232541 0x4E4BBF0: opj_j2k_read_tile_header (j2k.c:4685)
n1: 219232541 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
n1: 219232541 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
n1: 219232541 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
n0: 219232541 0x40374E: main (opj_decompress.c:1459)
n1: 39822000 0x4E727A9: opj_tcd_init_decode_tile (tcd.c:1219)
n1: 39822000 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
n1: 39822000 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
n1: 39822000 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
n1: 39822000 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
n0: 39822000 0x40374E: main (opj_decompress.c:1459)
n0: 15904584 in 52 places, all below massif's threshold (1.00%)
|