Commit Graph

19 Commits

Author SHA1 Message Date
Martin Pulec
92565ca2ae vcomp/cmpto_j2k: run GPU conv as preprocess
Set the pixfmt conversion CUDA kernel as the cmpto_j2k_enc preprocessor
instead of running it directly.

This also eliminates the need to have a conversion kernel whenever
conversion is needed - CPU conversion will be sufficient. Currently this
has no effect, since only R12L is converted, for which a kernel exists.

refers to GH-406
2024-09-12 16:35:41 +02:00
Martin Pulec
6a5758ef48 kernels.cu: compat 2024-09-10 16:10:04 +02:00
Martin Pulec
faa6a7bd72 rt48_to_r12l_compute_blk: fixed odd widths
see the previous commit as well
2024-09-10 15:24:36 +02:00
Martin Pulec
4c12bc85da r12l_to_rg48_compute_blk: fixed odd widths
Fixes unaligned access introduced in HEAD~2 (optimizing the r12l_to_rg48
kernel).
2024-09-10 15:24:36 +02:00
Martin Pulec
dfba579437 rt48_to_r12l_compute_blk: optimize as well
see the previous commit

The duration for 4096x2160 drops from about 18.5 ms to 0.5 ms.

refers to GH-406
2024-09-10 15:24:36 +02:00
Martin Pulec
f17b0d8487 r12l_to_rg48_compute_blk: optimize
load/store uint32_t values to improve performance

This reduces the duration from about 16.6 ms to 0.6 ms for 4096x2160 on
a 1080 Ti.

refers to GH-406
2024-09-10 15:24:16 +02:00
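The word-wide load idea from the commit above can be illustrated on the CPU. This is only a sketch under stated assumptions: the function names are hypothetical, the bit layout is a plain LSB-first stream of 12-bit samples (not the actual R12L layout), and a little-endian host is assumed. It is not the kernel from the commit. It shows three 32-bit loads replacing twelve single-byte loads while producing the same 16-bit output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reference version: reads the input one or two bytes at a time.
// Layout assumption (illustrative only, NOT R12L): sample i occupies
// bits [12*i, 12*i + 12) of a little-endian bitstream.
void unpack8x12_bytes(const uint8_t *in, uint16_t out[8])
{
    assert(in != nullptr);
    for (int i = 0; i < 8; ++i) {
        const int bit = 12 * i;
        const int byte = bit / 8;
        uint32_t v = static_cast<uint32_t>(in[byte]) |
                     (static_cast<uint32_t>(in[byte + 1]) << 8);
        v = (v >> (bit % 8)) & 0xFFFu;
        out[i] = static_cast<uint16_t>(v << 4); // MSB-align 12 bits in 16
    }
}

// Word version: three 32-bit loads cover all eight samples.
// Assumes a little-endian host so that w[0] holds bytes 0..3 LSB-first.
void unpack8x12_words(const uint8_t *in, uint16_t out[8])
{
    assert(in != nullptr);
    uint32_t w[3];
    std::memcpy(w, in, sizeof w); // alignment-safe way to load the words
    out[0] = static_cast<uint16_t>((w[0] & 0xFFFu) << 4);
    out[1] = static_cast<uint16_t>(((w[0] >> 12) & 0xFFFu) << 4);
    out[2] = static_cast<uint16_t>(((w[0] >> 24) | ((w[1] & 0xFu) << 8)) << 4);
    out[3] = static_cast<uint16_t>(((w[1] >> 4) & 0xFFFu) << 4);
    out[4] = static_cast<uint16_t>(((w[1] >> 16) & 0xFFFu) << 4);
    out[5] = static_cast<uint16_t>(((w[1] >> 28) | ((w[2] & 0xFFu) << 4)) << 4);
    out[6] = static_cast<uint16_t>(((w[2] >> 8) & 0xFFFu) << 4);
    out[7] = static_cast<uint16_t>(((w[2] >> 20) & 0xFFFu) << 4);
}
```

On a GPU, word-sized accesses like these require the address to be suitably aligned, which is presumably why the later commits in this log deal with odd widths and unaligned access.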
Martin Pulec
4d5f7a76ac kernels: report elapsed time in debug mode
Instead of the compile-time DEBUG macro, prefer the log_level specified
at run time.
2024-09-10 12:07:05 +02:00
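A minimal sketch of the switch described above, from a compile-time DEBUG macro to a run-time check. Only the name log_level comes from the commit message; the enum names, their numeric values and the report_elapsed helper are hypothetical.

```cpp
#include <cstdio>

// Hypothetical numeric levels; the real project may use different names
// and values.
enum { LOG_LEVEL_INFO = 4, LOG_LEVEL_DEBUG = 7 };

// Run-time setting, e.g. filled in from a command-line option.
static int log_level = LOG_LEVEL_INFO;

// The compile-time variant would wrap the print in #ifdef DEBUG ... #endif,
// which needs a rebuild to toggle. Here the decision is made per call.
// Returns true if the elapsed time was reported.
static bool report_elapsed(const char *kernel_name, double elapsed_ms)
{
    if (log_level < LOG_LEVEL_DEBUG) {
        return false; // nothing printed below debug verbosity
    }
    std::fprintf(stderr, "%s: %.2f ms\n", kernel_name, elapsed_ms);
    return true;
}
```

With this shape the same binary can be used for normal runs and for timing investigations. The cost is one integer comparison per call.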
Martin Pulec
c40430e07a rt48_to_r12l_compute_blk: compute |last_bl|%8 != 0
Compute the last incomplete block in the CUDA kernel (as already done
for the cmpto_j2k enc).
2024-09-03 16:50:56 +02:00
Martin Pulec
f860792e42 kernel_rg48_to_r12l: don't process incomplete blks
see also the commit 1a543b2c
2024-09-03 16:50:56 +02:00
Martin Pulec
c08bce79b8 kernel_r12l_to_rg48: compute last incomplete blk 2024-09-03 16:50:56 +02:00
Martin Pulec
2c6519223a kernel_r12l_to_rg48: off-by-one error
There must be an equal comparison, because position_x indicates the
beginning of the block, so in the first case it means at most the last
unfinished block and the 2nd comparison means the last unaligned block
(if any).
2024-09-03 16:50:56 +02:00
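One possible reading of the boundary handling discussed in these commits, sketched on the CPU. Everything here is an assumption for illustration: the 8-pixel block size is inferred from the "%8" in this log, and classify_block and BlockKind are hypothetical names, not code from the repository. The point is that position_x marks the start of a block, so the test for a full block must accept equality with the line width.

```cpp
#include <cassert>

// Assumed block size in pixels (inferred from "%8" in the log).
constexpr int BLOCK_PIXELS = 8;

enum class BlockKind { Full, Partial, Outside };

// position_x is the x coordinate of the FIRST pixel of the block.
BlockKind classify_block(int position_x, int width)
{
    assert(position_x >= 0 && width >= 0);
    if (position_x + BLOCK_PIXELS <= width) {
        return BlockKind::Full;    // note <=: a block ending exactly at width fits
    }
    if (position_x < width) {
        return BlockKind::Partial; // trailing block with fewer than 8 pixels
    }
    return BlockKind::Outside;     // starts past the line end: must not read or write
}
```

For a width of 16, the block starting at 8 ends exactly at the line end and is still a full block, while for a width of 20 the block starting at 16 is partial. Writing a full block in the partial case would spill into the beginning of the following line.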
Martin Pulec
7869b2d0b3 kernel_r12l_to_rg48: rework a bit
- separate the block computation

Drop the non-aligned end of every line, not just of the last line. The
point is that, whereas an out-of-bounds read is no longer a problem, we
might also perform an out-of-bounds write, writing garbage at the
beginning of the following line.
2024-09-03 16:50:56 +02:00
Martin Pulec
d82ae79c00 kernels.cu: move commons ahead; mark parts 2024-09-03 16:50:56 +02:00
Martin Pulec
1cb99cba98 conversion kernels: watch duration
print a warning if a kernel takes more than 10 ms

+ print the function name with the elapsed time if DEBUG
2024-08-30 16:21:36 +02:00
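The duration watch can be sketched as follows. The 10 ms threshold is taken from the commit message; the helper names are hypothetical, and host-side std::chrono timing is used here as a stand-in. Because CUDA kernel launches are asynchronous, real kernel timing would need CUDA events or an explicit synchronization before the clock is read.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Threshold from the commit message.
constexpr double WARN_THRESHOLD_MS = 10.0;

bool exceeds_threshold(double elapsed_ms)
{
    return elapsed_ms > WARN_THRESHOLD_MS;
}

// Hypothetical wrapper: runs fn, measures wall-clock time on the host and
// warns on stderr if the threshold is exceeded. Returns the elapsed
// milliseconds.
template <typename F>
double run_and_watch(const char *name, F &&fn)
{
    assert(name != nullptr);
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(end - start).count();
    if (exceeds_threshold(ms)) {
        std::fprintf(stderr, "[warning] %s took %.2f ms\n", name, ms);
    }
    return ms;
}
```

A call such as `run_and_watch("r12l_to_rg48", [] { /* launch and wait */ });` would then warn at the pre-optimization durations quoted in this log (16.6 ms, 18.5 ms) and stay silent at the optimized ones.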
Martin Pulec
930abe5325 vcomp/cmpto_j2k: kernel for R12L->RG48 conversion
see also the commit 4f3add780
2024-08-30 16:21:36 +02:00
Martin Pulec
904c2bb4f2 added debug CUDA kernel duration measurement
to measure the duration of the newly created ->R12L kernel

It is actually 1.6 ms for a 1920x1080 picture on a GeForce GTX TITAN,
which seems to be more or less OK for now (it could perhaps be optimized,
but it doesn't seem to be a blocker for now).
2024-08-28 13:29:49 +02:00
Martin Pulec
4f3add780d vdec/cmpto_j2k: use kernel for ->R12L conversion
refers to GH-406
2024-08-28 13:29:36 +02:00
Martin Pulec
757b03fd67 cuda_wrapper/cuda_runtime.h wrapper: added documentation 2023-03-30 14:01:10 +02:00
Martin Pulec
7849d4fcbe LDGM: remove UG and Linux dependency
TODO: replace gettimeofday() calls (now commented out) with
std::chrono::steady_clock (or similar).
2014-10-16 17:33:05 +02:00