Running only the unit-tests of select crates on some platforms is
problematic. We are unlikely to update this list of crates as we
introduce new ones. It is a better default to run the tests of all
crates on all platforms and selectively exclude the ones that can't run
because they are unsupported.
We use several buffer pools across `connlib` that are all backed by the
same buffer-pool library. Within that library, we currently use another
object-pool library to provide the actual pooling functionality.
Benchmarking has shown that spend quite a bit of time (a few % of total
CPU time), fighting for the lock to either add or remote a buffer from
the pool. This is unnecessary. By using a queue, we can remove buffers
from the front and add buffers at the back, both of which can be
implemented in a lock-free way such that they don't contend.
Using the well-known `crossbeam-queue` library, we have such a queue
directly available.
I wasn't able to directly measure a performance gain in terms of
throughput. What we can measure though, is how much time we spend
dealing with our buffer pool vs everything else. If we compare the
`perf` outputs that were recorded during an `iperf` run each, we can see
that we spend about 60% less time dealing with the buffer pool than we
did before.
|Before|After|
|---|---|
|<img width="1982" height="553" alt="Screenshot From 2025-07-24
20-27-50"
src="https://github.com/user-attachments/assets/1698f28b-5821-456f-95fa-d6f85d901920"
/>|<img width="1982" height="553" alt="Screenshot From 2025-07-24
20-27-53"
src="https://github.com/user-attachments/assets/4f26a2d1-03e3-4c0d-84da-82c53b9761dd"
/>|
The number in the thousands on the left is how often the respective
function was the currently executing function during the profiling run.
Resolves: #9972
Profiling has shown that using a spinlock-based buffer pool is
marginally (~1%) faster than the mutex-based one because it resolves
contention quicker.
The current `rust/` directory is a bit of a wild-west in terms of how
the crates are organised. Most of them are simply at the top-level when
in reality, they are all `connlib`-related. The Apple and Android FFI
crates - which are entrypoints in the Rust code are defined several
layers deep.
To improve the situation, we move around and rename several crates. The
end result is that all top-level crates / directories are:
- Either entrypoints into the Rust code, i.e. applications such as
Gateway, Relay or a Client
- Or crates shared across all those entrypoints, such as `telemetry` or
`logging`