UltraGrid

mirror of https://github.com/outbackdingo/UltraGrid.git synced 2026-03-22 05:40:27 +00:00

Author	SHA1	Message	Date
Martin Pulec	92565ca2ae	vcomp/cmpto_j2k: run GPU conv as prepreprocess Set the pixfmt conversion CUDA kernel as cmpto_j2k_enc preprocessor, not run directly. This also eliminates the need to have the conversion kernel if conversion is needed - CPU conversion will be sufficient. Currently not effective, only R12L is converted for which there is the kernel. refers to GH-406	2024-09-12 16:35:41 +02:00
Martin Pulec	6a5758ef48	kernels.cu: compat	2024-09-10 16:10:04 +02:00
Martin Pulec	faa6a7bd72	rt48_to_r12l_compute_blk: fixed odd widths see the previous commit as well	2024-09-10 15:24:36 +02:00
Martin Pulec	4c12bc85da	r12l_to_rg48_compute_blk: fixed odd widths Fixes unaligned access introduced in HEAD~2 (optimizing the r12l_to_rg48 kernel).	2024-09-10 15:24:36 +02:00
Martin Pulec	dfba579437	rt48_to_r12l_compute_blk: optimize as well see the previous commit This reduces the duration for 4096x2160 from some 18.5 ms to 0.5 ms. refers to GH-406	2024-09-10 15:24:36 +02:00
Martin Pulec	f17b0d8487	r12l_to_rg48_compute_blk: optimize load/store uint32_t to optimize the performance This reduces the duration from some 16.6 ms to 0.6 ms for 4096x2160 on 1080 Ti. refers to GH-406	2024-09-10 15:24:16 +02:00
Martin Pulec	4d5f7a76ac	kernels: report elapsed mode in debug Instead of using compile-time DEBUG macro, prefer run-time specified log_level.	2024-09-10 12:07:05 +02:00
Martin Pulec	c40430e07a	rt48_to_r12l_compute_blk: compute |last_bl|%8 != 0 Compute last incomplete block (as already done for the cmpto_j2k enc) in CUDA kernel.	2024-09-03 16:50:56 +02:00
Martin Pulec	f860792e42	kernel_rg48_to_r12l: don't process incomplete blks see also the commit 1a543b2c	2024-09-03 16:50:56 +02:00
Martin Pulec	c08bce79b8	kernel_r12l_to_rg48: compute last incomplete blk	2024-09-03 16:50:56 +02:00
Martin Pulec	2c6519223a	kernel_r12l_to_rg48: off-by-one error There must be an equal comparison, because the position_x indicates the beginning of the block, so in the first case it means at most last unfinished block and the 2nd comparison means the last unaligned block (if any).	2024-09-03 16:50:56 +02:00
Martin Pulec	7869b2d0b3	kernel_r12l_to_rg48: rework a bit - separate the block computation Drop every non-aligned end of the line, not just on the last line. The point is that, whereas an out-of-bound read is no longer a problem, we may also do an out-of-bound write - write the trash at the beginning of the following line.	2024-09-03 16:50:56 +02:00
Martin Pulec	d82ae79c00	kernels.cu: move commons ahead; mark parts	2024-09-03 16:50:56 +02:00
Martin Pulec	1cb99cba98	conversion kernels: watch duration print a warning if consumes more than 10 ms + print the function name if DEBUG for elapsed time	2024-08-30 16:21:36 +02:00
Martin Pulec	930abe5325	vcomp/cmpto_j2k: kernel for R12L->RG48 conversion see also the commit `4f3add780`	2024-08-30 16:21:36 +02:00
Martin Pulec	904c2bb4f2	added debug CUDA kernel duration measurement to measure the duration of the newly created kernel for ->R12L kernel It is actually 1.6 ms for 1920x1080 picture on GeForce GTX TITAN, which seems to be more or less OK for now (it could perhaps be optimized but doesn't seem to be a blocker for now).	2024-08-28 13:29:49 +02:00
Martin Pulec	4f3add780d	vdec/cmpto_j2k: use kernel for ->R12L conversion refers to GH-406	2024-08-28 13:29:36 +02:00
Martin Pulec	757b03fd67	cuda_wrapper/cuda_runtime.h wrapper: added documentation	2023-03-30 14:01:10 +02:00
Martin Pulec	7849d4fcbe	LDGM: remove UG and Linux dependency TODO: replace gettimeofday() calls (now commented out) with std::chrono::steady_clock (or similar).	2014-10-16 17:33:05 +02:00
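
Several of the commits above deal with measuring elapsed time: 1cb99cba98 warns when a conversion takes more than 10 ms, and 7849d4fcbe leaves a TODO to replace gettimeofday() with std::chrono::steady_clock. The following is a minimal host-side sketch of that idea, not the code from the repository. The names measure_ms, exceeds_threshold, run_and_report and WARN_THRESHOLD_MS are hypothetical and chosen only for illustration. Note that timing an asynchronous CUDA kernel this way would also require synchronizing with the device before reading the end time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>

// Hypothetical threshold mirroring the 10 ms warning mentioned in 1cb99cba98.
constexpr double WARN_THRESHOLD_MS = 10.0;

// Hypothetical helper: runs `fn` and returns the elapsed wall time in ms.
// steady_clock is monotonic, so it is not affected by system clock changes
// (unlike gettimeofday()).
double measure_ms(const std::function<void()> &fn) {
    assert(fn != nullptr);
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// True if the measured duration should trigger a warning.
bool exceeds_threshold(double elapsed_ms) {
    return elapsed_ms > WARN_THRESHOLD_MS;
}

// Runs `fn`, prints a warning with the given label if it was too slow,
// and returns the measured duration in ms.
double run_and_report(const char *label, const std::function<void()> &fn) {
    const double elapsed_ms = measure_ms(fn);
    if (exceeds_threshold(elapsed_ms)) {
        std::fprintf(stderr, "[%s] took %.2f ms (more than %.0f ms)\n",
                     label, elapsed_ms, WARN_THRESHOLD_MS);
    }
    return elapsed_ms;
}
```

Passing the label as an argument keeps the helper independent of any compile-time DEBUG macro, in the spirit of 4d5f7a76ac, which prefers a run-time log level.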

19 Commits
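
The commits from 2024-09-03 (7869b2d0b3, 2c6519223a, c08bce79b8, c40430e07a) revolve around line widths that are not a multiple of the block size. The sketch below illustrates only the index arithmetic, assuming that R12L packs 8 pixels into one 36-byte block (8 pixels x 3 components x 12 bits). The bit layout inside a block is not shown, and the names split_line, blocks_per_line and block_is_complete are hypothetical, not taken from kernels.cu.

```cpp
#include <cassert>

// Assumption: an R12L block holds 8 pixels in 36 bytes (8 * 3 * 12 bits).
constexpr int R12L_BLOCK_PIXELS = 8;
constexpr int R12L_BLOCK_BYTES = 36;

struct line_layout {
    int full_blocks; // blocks that are completely filled with pixels
    int tail_pixels; // pixels left over in the last, incomplete block
};

// Splits a line of `width` pixels into complete blocks and a remainder.
line_layout split_line(int width) {
    assert(width >= 0);
    return {width / R12L_BLOCK_PIXELS, width % R12L_BLOCK_PIXELS};
}

// Number of blocks per line, counting an incomplete last block if present.
int blocks_per_line(int width) {
    const line_layout l = split_line(width);
    return l.full_blocks + (l.tail_pixels != 0 ? 1 : 0);
}

// `position_x` is the first pixel of a block. The comparison includes
// equality: a block starting at width - 8 still fits entirely in the line.
bool block_is_complete(int position_x, int width) {
    return position_x + R12L_BLOCK_PIXELS <= width;
}
```

For a 1921-pixel line this gives 240 complete blocks plus one block holding a single pixel. A block that is not complete has to be handled separately so that nothing is written past the end of the line into the following one.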