I don't believe we use/need TCP for the Relays. Better to keep the ports
closed if so.
Also, the docker-compose.yml is updated to allow the `relay-1` service
to respond to all its ports, since we don't need those mapped typically.
Currently, `phoenix-channel` calls `flush` manually to ensure we don't
have messages sitting in a buffer somewhere. This is somewhat wasteful
if we haven't actually written any message. We can move the flushing to
directly after sending the message.
To avoid further buffering on the TCP level, we also disable Nagle's
algorithm.
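
A minimal sketch of the two ideas above, using only the standard library. The helper names `configure_low_latency` and `send_message` are hypothetical and are not taken from `phoenix-channel`, which may well use a different (async) socket type:

```rust
use std::io::{self, Write};
use std::net::TcpStream;

/// Hypothetical helper: disable Nagle's algorithm (TCP_NODELAY) so small
/// writes are not held back by the kernel waiting for more data.
fn configure_low_latency(stream: &TcpStream) -> io::Result<()> {
    stream.set_nodelay(true)
}

/// Hypothetical helper: write one message and flush immediately afterwards,
/// so we only flush when something was actually written.
fn send_message<W: Write>(sink: &mut W, msg: &[u8]) -> io::Result<()> {
    sink.write_all(msg)?;
    sink.flush()
}
```

Flushing inside `send_message` means an idle connection never pays for a flush, which is the waste described above.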
Closes #4907
They're still accepted, but the binary entirely determines the behavior.
This makes the code for CLI parsing and token handling simpler with
fewer branches, so it's easier to be sure it's correct.
Replaces #4942 which isn't doing what I intended anymore.
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
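
The bookkeeping behind this can be sketched as a set difference. This is only an illustration: `RelayChanges` and `diff_relays` are hypothetical names, relays are identified by plain strings here, and the actual shape of the `relays_presence` message is not shown in this description:

```rust
use std::collections::BTreeSet;

/// Hypothetical result type: which relays went away and which are new.
#[derive(Debug, PartialEq)]
struct RelayChanges {
    /// Relays whose candidates should be invalidated.
    disconnected: Vec<String>,
    /// Relays on which new allocations should be made.
    added: Vec<String>,
}

/// Compare the relays we currently know with the set the portal reports.
fn diff_relays(known: &BTreeSet<String>, present: &BTreeSet<String>) -> RelayChanges {
    RelayChanges {
        disconnected: known.difference(present).cloned().collect(),
        added: present.difference(known).cloned().collect(),
    }
}
```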
This still relies on #4613 until we have #4634.
Resolves: #4548.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Unfortunately I had to keep `linux-client` to get the compatibility
tests to pass. #4578 aims to remove that package.
Please add to this list if you think of anything:
```[tasklist]
# Things that may break that CI/CD won't catch
- [ ] GitHub release artifacts
- [ ] Knowledge base
- [ ] Docker images
- [ ] Docker containers
- [ ] Existing `linux-client` users
- [ ] Anything that downloads ghcr artifacts
- [ ] Nix (Not sure if it's built in CI. It had a merge conflict)
```
Refs #4515, and #3712, #3782
I think this is what Thomas and I agreed on in Slack / GitHub
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
This is part of a yak shave towards CI testing of #3812
Moving the DNS control method out of `docker-compose.yml` and up to the
integration tests themselves allows us to test these scenarios:
- `systemd-resolved`
- `etc-resolv-conf`
- `systemd-resolved` but we're in a container where that won't work, so
we should gracefully degrade to just allowing IP/CIDR resources
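
A rough sketch of how the three scenarios could be modelled. The enum, the function names, and the `resolved_available` flag are assumptions made for illustration; only the strings `systemd-resolved` and `etc-resolv-conf` come from the list above:

```rust
/// Hypothetical representation of the DNS control method.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DnsControl {
    SystemdResolved,
    EtcResolvConf,
    /// No DNS control: only IP/CIDR resources will work.
    Disabled,
}

/// Parse the method from an optional setting (for example an env var value).
fn parse_dns_control(value: Option<&str>) -> Result<DnsControl, String> {
    match value {
        None | Some("") => Ok(DnsControl::Disabled),
        Some("systemd-resolved") => Ok(DnsControl::SystemdResolved),
        Some("etc-resolv-conf") => Ok(DnsControl::EtcResolvConf),
        Some(other) => Err(format!("unknown DNS control method: {other}")),
    }
}

/// Degrade gracefully when `systemd-resolved` was requested but cannot work,
/// e.g. inside a container.
fn effective_dns_control(requested: DnsControl, resolved_available: bool) -> DnsControl {
    match requested {
        DnsControl::SystemdResolved if !resolved_available => DnsControl::Disabled,
        other => other,
    }
}
```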
This adds an integration test that downloads a 10MB file from a server
and simulates the client roaming to another network while the download
is active.
We use a DNS resource for this to ensure it also doesn't take too long
in that case. DNS resources are what most users will be using and we
clear some internal DNS caches on connection failures. Hence, using a
DNS resource here is a somewhat roundabout way to test that we aren't
failing and re-establishing the connection but instead migrating it to a
new network path.
Followup from #4100:
- Add `perf/relay` and `debug/relay` etc data plane images in
`firezone-staging`.
- The `perf` images are `debug` stage images and have tooling installed,
but use release binaries.
- The `debug` images are `debug` binaries inside `debug` images
- `firezone-prod` contains only release binaries -- these image names
haven't changed
- Runs release asset builds simultaneously with `deploy-staging`. Those
don't depend on each other.
- Prevents running some build workflows in CD because they're run
already in the PR and in the merge group, and the risk of semantic
conflict is negligible
- Run `release` assets in staging
- Adds `compatibility_tests`: **To successfully introduce a breaking
change in the control / data plane APIs, you must now "Merge as
Administrator"**
- Since `CI` is no longer run on `main`, caching needed to be refactored
to make sense again
- Since `CI` is no longer run on `main`, the Elixir
`migrations_and_seeds_test` had to be rewritten. This now tests
migrations using `git checkout` instead of importing `main`'s DB dump.
- Move Tauri builds to their own workflow so we can trigger Linux and
Windows builds manually on an adhoc basis like we do for the Swift and
Kotlin builds
- Add a new `hotfix` workflow that will run `compatibility_tests` with
the latest published images
- Add `workflow_dispatch` to trigger `CD` manually for testing purposes
(cc @ReactorScram)
Refs #3995
Closes #3815
Changes that are breaking (but these aren't in production so it should
be okay)
- Windows, renaming `device_id.json` to `firezone-id.json` to match the
rest of the code
- Linux GUI, storing the firezone-id under `/var/lib` instead of under
`$HOME`
- Linux GUI, bails out if not run with `sudo --preserve-env` by
detecting `$HOME == root` or `$USER != root`
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The iperf3 server sometimes hangs, or takes a while to start up.
Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so they each run in their own matrix job. This
ensures each performance test will run on a separate VM, unaffected by
previous test runs to eliminate the effect any residual network buffer
state can have on a particular test.
It also makes sure the server is listening with a `healthcheck`.
This will prevent services from restarting out from under us during
tests.
Service restarts should be explicitly tested as integration tests.
Should fix #3666
Whilst debugging the performance tests in #3391, I found that we are
using a 4-year-old version of `iperf` for the server. This, plus
restarting the server in between the performance runs, resulted in flaky
tests. I am not sure how we arrived at #3303 but
[this](https://github.com/firezone/firezone/actions/runs/7926579022?pr=3391)
CI run succeeded with a big matrix using the newer iperf server and
without the restarts.
Attempt at cleaning a couple things I missed in code review.
The old httpbin resource wasn't being used anyhow, so I just deduped
them and updated things in a couple other places that had drifted.
Hopefully this fixes the [flaky
CI](https://github.com/firezone/firezone/actions/runs/7918422653/job/21616835910)
Only user-facing if users are using the Docker image for the Linux
client.
I split off a module for `/etc/resolv.conf` since the code and unit
tests are about 300 lines and aren't related to the rest of the
`tun_linux.rs` code.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The Docker image for the client is opted in to this new feature. The
bare `linux-client-x64` exe is not. I don't know if users are using the
Docker images?
I wanted to use CLI args, but the DNS control code ("config" or
"control"? Or "SplitDNS"?) has to run at the end of `set_iface_config`,
which on Linux runs in a worker, so I couldn't figure out how to move it
into `on_set_interface_config` in the callbacks. Maybe there is a way,
but the env var results in a small diff.
Why:
* To allow syncing of users/groups/memberships from an IDP to Firezone,
a custom identity provider adapter needs to be created in the portal
codebase at this time. The custom IDP adapter created in this commit is
for Okta.
* This commit also includes some additional tests for the Microsoft
Entra IDP adapter. These tests were mistakenly overlooked when finishing
the Entra adapter.
This splits off the easy parts from #3605.
- Add quotes around `PHOENIX_SECURE_COOKIES` because my local
`docker-compose` considers unquoted 'false' to be a schema error - Env
vars are strings or numbers, not bools, it says
- Create `test.httpbin.docker.local` container in a new subnet so it can
be used as a DNS resource without the existing CIDR resource picking it
up
- Add resources and policies to `seeds.exs` per #3342
- Fix warning about `CONNLIB_LOG_UPLOAD_INTERVAL_SECS` not being set
- Add `resolv-conf` dep and unit tests to `firezone-tunnel` and
`firezone-linux-client`
- Impl `on_disconnect` in the Linux client with `tracing::error!`
- Add comments
```[tasklist]
- [x] (failed) Confirm that the client container actually does stop faster this way
- [x] Wait for tests to pass
- [x] Mark as ready for review
```
In the `snownet` integration branch, we ran into some problems because
we actually tried to use the IPv6 relay. This doesn't work though
because the docker-compose doesn't provide an IPv6 socket to the
container and thus the relay falsely registers with the portal as having
an IPv6 address.
Internally, we only bind to the wildcard addresses (`0.0.0.0` and `::`),
which unfortunately doesn't seem to fail, even if we don't have an IPv6
interface.
Test basic connectivity with the headless client after the portal API
restarts.
Based on top of #3364 to test that portal restarts don't cause a
cascading failure.
Currently, only the gateway has reconnect logic for (transient) errors
when connecting to the portal. Instead of duplicating this for the
relay, I moved the reconnect state machine to `phoenix-channel`. This
means the relay now automatically gets it too and in the future, the
clients will also benefit from it.
As a nice benefit, this also greatly simplifies the gateway's
`Eventloop` and removes a bunch of cruft with channels.
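
To illustrate the idea of a reconnect state machine, here is a small sketch with exponential backoff. It is not the code in `phoenix-channel`: the `State` and `Reconnect` types, the backoff policy, and the attempt limit are all assumptions chosen for the example:

```rust
use std::time::Duration;

/// Hypothetical connection states.
#[derive(Debug, PartialEq)]
enum State {
    Connected,
    Reconnecting { attempt: u32 },
    Failed,
}

/// Hypothetical reconnect state machine.
struct Reconnect {
    state: State,
    base: Duration,
    max_attempts: u32,
}

impl Reconnect {
    fn new(base: Duration, max_attempts: u32) -> Self {
        Self { state: State::Connected, base, max_attempts }
    }

    /// Call on a transient error. Returns how long to wait before the next
    /// attempt, or `None` once we have given up.
    fn on_transient_error(&mut self) -> Option<Duration> {
        let attempt = match self.state {
            State::Connected => 1,
            State::Reconnecting { attempt } => attempt + 1,
            State::Failed => return None,
        };
        if attempt > self.max_attempts {
            self.state = State::Failed;
            return None;
        }
        self.state = State::Reconnecting { attempt };
        // Delay doubles with each attempt: base, 2 * base, 4 * base, ...
        Some(self.base.saturating_mul(2u32.saturating_pow(attempt - 1)))
    }

    /// Call once the connection is established again; resets the backoff.
    fn on_connected(&mut self) {
        self.state = State::Connected;
    }
}
```

Keeping this state in one place means every user of the channel (gateway, relay, and later the clients) gets the same retry behaviour without its own channel plumbing.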
Resolves: #2915.
Getting IPv6-related timeouts and flakiness. IPv6 is already disabled for
the testbed and the connection tests, so this follows suit since we don't
have tests that use IPv6.