firezone/rust/gateway
Gabi bc8f438a56 feat(connlib): directly send wireguard traffic instead of tunneling it through WebRTC datachannels (#2643)
This PR started as an investigation into a performance degradation in the
gateways.

The way to test performance in a realistic environment is to use a GCP VM
as the client and an AWS VM as the gateway, with a single iperf server
behind the gateway.

These are the `iperf` results with current main:

```
Connecting to host 172.31.92.238, port 5201
Reverse mode, remote host 172.31.92.238 is sending
[  5] local 100.83.194.77 port 58426 connected to 172.31.92.238 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.01 MBytes  8.50 Mbits/sec                  
[  5]   1.00-2.00   sec  1.14 MBytes  9.59 Mbits/sec                  
[  5]   2.00-3.00   sec   699 KBytes  5.73 Mbits/sec                  
[  5]   3.00-4.00   sec  1.11 MBytes  9.31 Mbits/sec                  
[  5]   4.00-5.00   sec   664 KBytes  5.44 Mbits/sec                  
[  5]   5.00-6.00   sec   591 KBytes  4.84 Mbits/sec                  
[  5]   6.00-7.00   sec   722 KBytes  5.91 Mbits/sec                  
[  5]   7.00-8.00   sec   833 KBytes  6.83 Mbits/sec                  
[  5]   8.00-9.00   sec   738 KBytes  6.04 Mbits/sec                  
[  5]   9.00-10.00  sec   836 KBytes  6.85 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.06  sec  8.78 MBytes  7.32 Mbits/sec    3             sender
[  5]   0.00-10.00  sec  8.23 MBytes  6.90 Mbits/sec                  receiver

iperf Done.
```

Most of the performance problems were due to using SCTP and DTLS.

So I created a
[fork](https://github.com/firezone/webrtc/tree/expose-new-endpoint) of
webrtc that lets us bypass both; we don't need them because we already
depend on wireguard for encryption.
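To sketch the idea (illustrative only: `send_packet` is a hypothetical name, the connected `UdpSocket` stands in for the ICE-selected path, and boringtun's `Tunn` is assumed for the wireguard state):

```
use std::net::UdpSocket;

use boringtun::noise::{Tunn, TunnResult};

/// Encrypt one IP packet with wireguard and write it directly to the
/// ICE-selected UDP socket, skipping the data channel's SCTP/DTLS layers.
fn send_packet(tunn: &mut Tunn, socket: &UdpSocket, packet: &[u8]) -> std::io::Result<()> {
    let mut buf = [0u8; 2048]; // plaintext MTU plus wireguard overhead
    match tunn.encapsulate(packet, &mut buf) {
        TunnResult::WriteToNetwork(ciphertext) => {
            socket.send(ciphertext)?; // assumes a connected socket
        }
        // `Done` means boringtun queued the packet, e.g. mid-handshake.
        TunnResult::Done => {}
        _ => {} // handshake/timer results are out of scope for this sketch
    }
    Ok(())
}
```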

With those changes we achieve much better throughput:

```
gabriel@cloudshell:~ (firezone-personal-instances)$ iperf3 -R -c 172.31.92.238
Connecting to host 172.31.92.238, port 5201
Reverse mode, remote host 172.31.92.238 is sending
[  5] local 100.83.194.77 port 51206 connected to 172.31.92.238 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  5.60 MBytes  47.0 Mbits/sec                  
[  5]   1.00-2.00   sec  17.2 MBytes   144 Mbits/sec                  
[  5]   2.00-3.00   sec  15.8 MBytes   132 Mbits/sec                  
[  5]   3.00-4.00   sec  14.8 MBytes   125 Mbits/sec                  
[  5]   4.00-5.00   sec  15.9 MBytes   133 Mbits/sec                  
[  5]   5.00-6.00   sec  15.8 MBytes   133 Mbits/sec                  
[  5]   6.00-7.00   sec  15.3 MBytes   128 Mbits/sec                  
[  5]   7.00-8.00   sec  15.6 MBytes   131 Mbits/sec                  
[  5]   8.00-9.00   sec  15.6 MBytes   131 Mbits/sec                  
[  5]   9.00-10.00  sec  16.0 MBytes   134 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.05  sec   151 MBytes   126 Mbits/sec   74             sender
[  5]   0.00-10.00  sec   148 MBytes   124 Mbits/sec                  receiver

iperf Done
```

However, this is still worse than what a previous commit
(`21afdf0a9a113c996d60a63b2e8c8f32d3aeb87`) achieved:
```
gabriel@cloudshell:~ (firezone-personal-instances)$ iperf3 -R -c 172.31.92.238
Connecting to host 172.31.92.238, port 5201
Reverse mode, remote host 172.31.92.238 is sending
[  5] local 100.100.68.41 port 49762 connected to 172.31.92.238 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  6.14 MBytes  51.5 Mbits/sec                  
[  5]   1.00-2.00   sec  17.1 MBytes   144 Mbits/sec                  
[  5]   2.00-3.00   sec  22.8 MBytes   191 Mbits/sec                  
[  5]   3.00-4.00   sec  23.5 MBytes   197 Mbits/sec                  
[  5]   4.00-5.00   sec  23.0 MBytes   193 Mbits/sec                  
[  5]   5.00-6.00   sec  22.1 MBytes   185 Mbits/sec                  
[  5]   6.00-7.00   sec  23.0 MBytes   193 Mbits/sec                  
[  5]   7.00-8.00   sec  22.7 MBytes   190 Mbits/sec                  
[  5]   8.00-9.00   sec  21.0 MBytes   176 Mbits/sec                  
[  5]   9.00-10.00  sec  19.9 MBytes   167 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.05  sec   204 MBytes   170 Mbits/sec  127             sender
[  5]   0.00-10.00  sec   201 MBytes   169 Mbits/sec                  receiver
```

My profiling suggested that this is due to packet reading/writing
happening in its own dedicated tasks. So much so that in the future we
should even consider spawning a dedicated runtime for them, so that those
loops get their own OS thread.
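A minimal sketch of that idea with tokio (the function name and wiring are hypothetical):

```
use std::thread;

/// Run a packet loop on its own current-thread runtime, pinned to a freshly
/// spawned OS thread so the main runtime's scheduler never preempts it.
fn spawn_packet_thread<F>(packet_loop: F) -> thread::JoinHandle<()>
where
    F: std::future::Future<Output = ()> + Send + 'static,
{
    thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build packet runtime");
        rt.block_on(packet_loop);
    })
}
```

The device-read loop and the device-write loop would each get one such thread.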

Also, using a multi-queue interface will probably give us huge gains for
handling multiple concurrent clients if we have a dedicated task for each
queue (currently the interface is created as multi-queue, but only a
single file descriptor is used).
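A per-queue read task could look roughly like this (assumptions: each queue fd was opened from `/dev/net/tun` with `IFF_TUN | IFF_MULTI_QUEUE` and set non-blocking; the hand-off into the tunnel is elided):

```
use std::fs::File;
use std::io::Read;

use tokio::io::unix::AsyncFd;

/// One read loop per queue fd; the kernel load-balances flows across queues,
/// so concurrent clients no longer contend on a single descriptor.
async fn queue_loop(queue: AsyncFd<File>) {
    let mut buf = [0u8; 1504]; // MTU plus the 4-byte packet-info header
    loop {
        let mut guard = queue.readable().await.expect("tun fd poll failed");
        let res = guard.try_io(|q| q.get_ref().read(&mut buf));
        match res {
            Ok(Ok(len)) => {
                let _packet = &buf[..len]; // hand off to the tunnel here
            }
            Ok(Err(e)) => {
                eprintln!("tun read error: {e}");
                return;
            }
            Err(_would_block) => continue, // spurious readiness; poll again
        }
    }
}
```

One `tokio::spawn(queue_loop(...))` per queue fd would then replace today's single-descriptor loop.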

However, the changes proposed in this PR are good enough for now, as long
as performance doesn't degrade.

Along those lines, I will create a CI job that reports throughput using
the local `docker-compose.yml` file, and we should always check it before
merging. That is not the be-all and end-all of the performance story, but
for smaller PRs the correlation with real-world throughput should be
enough.

For bigger PRs we should test manually before merging for now, until we
have a way to spin up realistic tests in CI (note that the VMs should be
in separate cloud environments; same-cloud links are so reliable that we
would miss actual performance degradation caused by dropped packets). On
that note, I'll write a small manual on how to conduct those tests, with
full current results, that we should always follow before merging new PRs
that affect the hot path. cc @thomaseizinger

Finally, when testing these changes I found some flakiness in the
reconnection path, so I changed things so that we clean up connections
only on wireguard's connection-expiration error. This is quite slow for
now (~120 seconds), but in the future we can issue an ICE restart each
time the wireguard keepalive expires (rekey timeout), letting us restart
connections every ~30 seconds, and we can reduce the keepalive timeout
from the portal to accelerate it even more. Beyond that, we can get
smarter about it over time.
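For reference, this is roughly what the expiration-driven cleanup looks like, assuming boringtun's `Tunn` drives the wireguard timers (names here are illustrative, not the exact connlib code):

```
use boringtun::noise::{errors::WireGuardError, Tunn, TunnResult};

/// Tick wireguard's timer state machine for one peer. Returns `true` once
/// the connection has expired and the peer can be cleaned up.
fn tick_peer(tunn: &mut Tunn, send_udp: &mut dyn FnMut(&[u8])) -> bool {
    let mut buf = [0u8; 148]; // big enough for a handshake initiation
    match tunn.update_timers(&mut buf) {
        // The (slow, ~120s) signal we now rely on for cleanup.
        TunnResult::Err(WireGuardError::ConnectionExpired) => true,
        // Timer ticks also emit keepalives and handshake initiations.
        TunnResult::WriteToNetwork(packet) => {
            send_udp(packet);
            false
        }
        _ => false,
    }
}
```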

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2023-11-16 02:59:48 +00:00

gateway

This crate houses the Firezone gateway.

Building

You can build the gateway using: cargo build --release --bin firezone-gateway

You should then find a binary in target/release/firezone-gateway.

Running

The Firezone Gateway supports Linux only. To run the Gateway binary on your Linux host:

  1. Generate a new Gateway token from the "Gateways" section of the admin portal and save it in your secrets manager.
  2. Ensure the FIREZONE_TOKEN=<gateway_token> environment variable is set securely in your Gateway's shell environment. The Gateway requires this variable at startup.
  3. Set FIREZONE_ID to a unique string to identify this gateway in the portal, e.g. export FIREZONE_ID=$(uuidgen). The Gateway requires this variable at startup.
  4. Now, you can start the Gateway with:
firezone-gateway

If you're running as a non-root user, you'll need the CAP_NET_ADMIN capability to open /dev/net/tun. You can add this to the gateway binary with:

sudo setcap 'cap_net_admin+eip' /path/to/firezone-gateway

Ports

The gateway requires no open ports. Connections automatically traverse NAT with STUN/TURN via the relay.