It doesn't seem that libsvt_vp9 is better even on AVX2-capable processors.
Testing with an i9-9820X and a 9950X with `testcard2:noise:fps=200` gave
60 FPS for SVT vs. 72 for VPX on the Intel, and 153 (SVT) vs. 128 (VPX)
on the AMD. In the SVT case, the frames were significantly delayed
(by 250 frames) with the default settings.
Also, while libsvt_vp9 supports only yuv420p, libvpx-vp9 offers up to
12-bit 4:4:4 YUV and RGB pixel formats.
For the record, there has been a [patch](https://github.com/OpenVisualCloud/SVT-VP9/pull/67)
adding pre-AVX2 support to libsvt_vp9, but using it doesn't seem to make
much sense.
The default, libsvt_vp9, requires AVX2 instructions, so if they are not
present, use the fallback.
Broken by some weird mistake since commit ff3e34ef (24 Oct 2025).
On Arch with wlroots-git, sway-git and NVIDIA 595.45.04, data buffers are for some reason not marked SPA_DATA_FLAG_READABLE, so PipeWire mmaps them with no read access, leading to segfaults.
- check the return value of send_picture for a poison pill or reconfigure message
- if configure fails, continue, do not break - not sure if useful, but it
may be able to continue further (the user can interrupt anyway). At least
it can now be terminated gracefully; when the thread is terminated, it
won't be (maybe then either abort or exit_uv would be more appropriate)
- warn if the bitrate is lower than 7.5 Mbps - when getting a packet we then
receive SvtJxsErrorEncodeFrameError
Avoid counting frames for the consumer; instead, pass a reconfigure message
and then wait for configured_consumer=false (with the corresponding _cv).
Also removed the `configured` variable and made cleanup() idempotent.
- re-enable bpp with a decimal point now that a validity check has been added
- change the default bpp to 3/2 - it might be better suited; e.g. with
testcard:m=VGA there are observable artifacts with 0.7 that disappear with 1.5
The error reported in <https://github.com/CESNET/UltraGrid/issues/492>
causes a freeze in svt_jpeg_xs_frame_pool_get().
It doesn't work anyway - perhaps due to incorrect handling in SVT-JPEG-XS.
At least when `enc_input.image.ready_to_release` is not set to 1, the
current implementation of svt_jpeg_xs_frame_pool_release() doesn't release
anything. But even when it is set to 1, it still doesn't help.
So at least fix our use, and hopefully it will also get fixed
upstream. This shouldn't happen anyway, unless wrong parameters are passed
as in #492.
If no data is coming, the messages are not dispatched.
Maybe not a big problem - they would be dispatched when some data arrives.
The potential problem, however, is that if a huge number of participants is
added at that time, the queue may overflow and further messages get
dropped. (The current max queue length is 10 - maybe it can be extended later.)
The conversions from/to R12L may take 10-15 ms for a 4K frame, so run
them in parallel. This slightly increases the latency but also allows
4K@60 R12L video encoding on slower machines (it shouldn't be much of an
issue with current CPUs).
A value of 2 should be sufficient - one buffer holds the output of the
video conversion while the other is being encoded. The original value of 5
just slowed down the propagation of insufficient performance while
increasing the latency by 3 more frames.
The value 3 seems much too large as a default.
If the user omits the setting, it easily exceeds 1 Gbps for
2160p60. The equivalent of JPEG Q=75 might be something like 0.5-0.7, so
use the upper bound.
The blocking svt_jpeg_xs_frame_pool_get() is called with s->mtx held. If
no buffer is currently available, this causes a deadlock, because
the consumer blocks waiting on the lock.
Possible to reproduce (perhaps only sometimes) with e.g.
`-t testcard:patt=noise:size=3840x2160:fps=100:c=R12L` with the bitrate
exceeding the egress interface speed when exiting.
Also removed a lock in _done that guarded nothing - maybe it was kept there
to ensure the lock is lockable? If so, it is incorrect anyway.
The change to unsigned is rather new, and it may actually be safer to
use int (to_planar already does) - e.g. in constructs that iterate over
multiple lines at a time and then handle the remainder, so that we do not
wrap around to `UINT_MAX - something` when fewer lines are left.
- uyvy_to_nv12: move width before the loop (clang is complaining)
- r12l_to_gbrpXXle: drop ALWAYS_INLINE+OPTIMIZED_FOR - for GCC/Intel we
get the same performance, but for Clang/M1 it is some 15 % faster
(on the other hand, keeping OPTIMIZED_FOR without ALWAYS_INLINE worsens
the performance)
Except for r12l_to_gbrpXXle and derived, we currently don't have a benchmark for the other conversions.
The initial use was incorrect - it assumed that on big endian the bytes
b0,b1,b2,b3 need to be swapped so that b[0] would hold what was b[3]
on little endian, which is not true (we are writing individual bytes).
This mistake has since spread across UltraGrid.