On Windows, uv.exe is located directly in the top-level directory, not
in a "bin" subdirectory. Thus, cut the "bin" suffix only if it is
present. This fixes Vulkan shaders not being found, e.g.
"C:\UltraGrid\uv.exe" resulted in the shaders being searched for in
"C:\shaders" (UltraGrid/../shaders).
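A minimal sketch of the "cut only if present" idea, not the actual
UltraGrid code. The helper name strip_bin_suffix is hypothetical, and it
assumes the caller passes the directory of the executable without a
trailing separator:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove a trailing "bin" path component in place,
 * but only if the directory really ends with one. Both '/' and '\\' are
 * accepted as separators. */
static void strip_bin_suffix(char *dir)
{
    assert(dir != NULL);
    size_t len = strlen(dir);
    if (len <= 4) {
        return; /* too short to hold "<something><sep>bin" */
    }
    if (strcmp(dir + len - 3, "bin") != 0) {
        return; /* no "bin" suffix, e.g. the Windows layout */
    }
    char sep = dir[len - 4];
    if (sep != '/' && sep != '\\') {
        return; /* ends with "bin" but as part of a longer name */
    }
    dir[len - 4] = '\0'; /* cut the separator and "bin" */
}
```

With this, "/usr/local/bin" becomes "/usr/local", while "C:\UltraGrid"
is left untouched, so a relative shader directory would be resolved
against the right base in both layouts.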
Currently, the discrete GPU is always chosen if available, but when the
display is connected through a different GPU and the discrete card is an
NVIDIA one, colors are swapped as in #274. Detecting the GPU with the
connected display is not trivial and requires VK_KHR_display, which is
not widely supported by graphics drivers.
This is a small usability change that helps to work around the problem
by allowing `:gpu=integrated` instead of passing the numeric index of
the card.
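A rough sketch of how such an option value could be parsed. The enum and
the function parse_gpu_option are hypothetical names used only for
illustration; only the `integrated` keyword and the numeric index come
from the description above:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing the value after "gpu=" */
enum gpu_pref {
    GPU_PREF_INDEX,      /* explicit numeric index stored in *index */
    GPU_PREF_INTEGRATED, /* keyword "integrated" */
    GPU_PREF_INVALID     /* neither a keyword nor a valid index */
};

static enum gpu_pref parse_gpu_option(const char *val, int *index)
{
    assert(val != NULL && index != NULL);
    if (strcmp(val, "integrated") == 0) {
        return GPU_PREF_INTEGRATED;
    }
    char *end = NULL;
    errno = 0;
    long n = strtol(val, &end, 10);
    /* reject empty input, trailing garbage, negatives and overflow */
    if (errno != 0 || end == val || *end != '\0' || n < 0 || n > INT_MAX) {
        return GPU_PREF_INVALID;
    }
    *index = (int) n;
    return GPU_PREF_INDEX;
}
```

The caller would then pick the first device of the requested type, or
the device at the given index, when enumerating the Vulkan devices.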
It no longer seems to increase frame variance as it used to, given the
current x264 and parameter set. On the contrary, it improves visual
quality when there is motion and reduces the size of compressed pictures
on average.
Deinterlacing will now be HW accelerated, which means lower latency and
better performance for codecs for which the computation is complex,
such as v210 and R10k.
+ use the logger where it has not been used so far
The messages are not particularly useful and produce a large number of
lines on init, so it is better to use the debug level.
The printout by the video decoder might have been incorrect if there was
a postprocessor that changes the video properties, e.g.:
uv -t testcard:fps=50i -d gl -p double_framerate
printed 50i but the display was actually set to 50p.
At least a small optimization for slow codecs: these have a fixed number
of iterations per pixel block, so we can give the compiler an
opportunity to unroll and optimize.
Speedup on i9-9820X: v210, R10k 8%; R12L 12%
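The idea can be sketched as follows. This is an illustration only: the
function avg_blocks and the constant BLOCK_BYTES are hypothetical, and
samples are treated as plain bytes instead of the real packed layouts of
v210, R10k or R12L:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size, known at compile time */
enum { BLOCK_BYTES = 16 };

/* Average two lines block by block. The inner loop has a constant trip
 * count, so the compiler is free to unroll or vectorize it. */
static void avg_blocks(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t nblocks)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < nblocks; ++i) {
        for (int j = 0; j < BLOCK_BYTES; ++j) {
            dst[j] = (uint8_t) ((a[j] + b[j] + 1) / 2); /* rounded mean */
        }
        dst += BLOCK_BYTES;
        a += BLOCK_BYTES;
        b += BLOCK_BYTES;
    }
}
```

Compared to a loop whose bound is only known at run time, the constant
bound removes the need for a remainder loop inside each block.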
Instead of just interpolating between 2 lines and writing the result to
both, always average line N with line N+1 and write the result to N:
1 1 1 1      1A1A1A1A
A A A A      A2A2A2A2
2 2 2 2  ->  2B2B2B2B
B B B B      B3B3B3B3
3 3 3 3      3C3C3C3C
C C C C      3C3C3C3C (last 2 lines are the same)
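A plain C sketch of the scheme above for 8-bit samples, without any SSE
optimization. The function blend_lines and its parameters are
hypothetical, and src and dst are assumed to be separate buffers with
dst_pitch >= linesize:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write avg(line N, line N+1) to output line N. The last output line
 * has no successor, so it repeats the previous output line. */
static void blend_lines(uint8_t *dst, size_t dst_pitch,
                        const uint8_t *src, size_t linesize, size_t lines)
{
    assert(dst != NULL && src != NULL && dst_pitch >= linesize);
    if (lines == 0) {
        return;
    }
    for (size_t y = 0; y + 1 < lines; ++y) {
        const uint8_t *cur = src + y * linesize;
        const uint8_t *next = cur + linesize; /* re-read in next pass */
        uint8_t *out = dst + y * dst_pitch;   /* step by dst_pitch */
        for (size_t x = 0; x < linesize; ++x) {
            out[x] = (uint8_t) ((cur[x] + next[x] + 1) / 2);
        }
    }
    if (lines > 1) {
        memcpy(dst + (lines - 1) * dst_pitch,
               dst + (lines - 2) * dst_pitch, linesize);
    } else {
        memcpy(dst, src, linesize); /* single line: nothing to blend */
    }
}
```

Each source line except the first and the last is read twice, once as
"next" and once as "cur", which is the re-read the cache is expected to
absorb.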
Performance assessment: for SSE-optimized pixel formats (the 8-bit ones)
the impact is small (5% on i9-9820X); it is perhaps memory-bound and
adjacent lines stay in cache (each iteration re-reads the line used in
the previous iteration). For v210, R10k and R12L the situation is worse
and the slow-down is around 90%.
Jack does not wait for the server to exit completely on client_close(),
which causes a race condition when running uv with `--capabilities`:
when the jack playback device is probed right after probing the jack
capture device, jack still sees the terminating server and tries to
connect to it unsuccessfully 5 times in a row.
This change reduces the time it takes for --capabilities to run by ~7
seconds, greatly reducing the GUI startup time.
now faster than vc_deinterlace
+ fixed a possible error that has perhaps always been there when
dst_pitch > src_linesize: after the inner loop, dst was incremented by
(dst_pitch+src_linesize), not 2*dst_pitch
The memcpy was left there after replacing vc_deinterlace with
vc_deinterlace_ex. Not only is it not needed, since the conversion
writes directly to the output buffer, but the output would also get
overwritten by the input.