Commit Graph

116 Commits

Author SHA1 Message Date
Reactor Scram
493716ab6b refactor(headless-client): change CLI args for the IPC daemon (#4604)
Closes #4515
2024-04-15 18:33:30 +00:00
Reactor Scram
3a67eacfbe refactor(linux-client): replace client-tunnel with headless-client which is the same thing (#4516)
Unfortunately I had to keep `linux-client` to get the compatibility
tests to pass. #4578 aims to remove that package.

Please add to this list if you think of anything:

```[tasklist]
# Things that may break that CI/CD won't catch
- [ ] Github release artifacts
- [ ] Knowledge base 
- [ ] Docker images
- [ ] Docker containers
- [ ] Existing `linux-client` users
- [ ] Anything that downloads ghcr artifacts
- [ ] Nix (Not sure if it's built in CI. It had a merge conflict)
```

Refs #4515, and #3712, #3782

I think this is what Thomas and I agreed on in Slack / Github

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-10 22:01:55 +00:00
Jamil
7d88e28872 chore(ci): Configure relay with new IP on restart tests (#4571)
See https://firezonehq.slack.com/archives/C0575SD66E5/p1712726575563089
2024-04-10 08:45:38 +00:00
Jamil
09532ea845 chore(ci): Add portal and relay downtime DNS resource tests (#4517)
Tests that DNS still works in the client with established connections
after the portal and/or relay go down.
2024-04-08 09:43:59 +00:00
Reactor Scram
1e4ed7bad6 refactor(ci): move DNS control method up to docker-compose.yml (#4341)
This is part of a yak shave towards CI testing of #3812 

Moving the DNS control method out of `docker-compose.yml` and up to the
integration tests themselves allows us to test these scenarios:

- `systemd-resolved`
- `etc-resolv-conf`
- `systemd-resolved` but we're in a container where that won't work, so
we should gracefully degrade to just allowing IP/CIDR resources
2024-04-02 17:11:29 +00:00
Andrew Dryga
2cf63cb33a fix(portal): Serve static files with digests at root (#4386)
Closes #4384
2024-03-28 16:13:13 -06:00
Thomas Eizinger
18033eafec ci: ensure roaming between networks doesn't abort file download (#4213)
This adds an integration test that downloads a 10MB file from a server
and simulates the client roaming to another network while the download
is active.

We use a DNS resource for this to ensure it also doesn't take too long
in that case. DNS resources are what most users will be using and we
clear some internal DNS caches on connection failures. Hence, using a
DNS resource here is a somewhat roundabout way to test that we aren't
failing and re-establishing the connection but migrate it to a new
network path.
2024-03-26 05:44:59 +00:00
Andrew Dryga
114696c0ba chore(infra): Split terraform files into folders and add domain to production app (#4172)
2024-03-16 11:54:06 -06:00
Andrew Dryga
a85b9ab185 chore(infra): Deploy domain app on a separate instance and enable background jobs on it (#4160)
Closes #3801
2024-03-16 08:58:20 -06:00
Jamil
ffc034d5c4 chore(docker): Add missing okta provider (#4131)
2024-03-14 16:18:26 +00:00
Jamil
63c546eb45 chore(docker): Fix docker image local builds (#4127)
Fixes an artifact leftover from the refactor.

Fixes #4122
2024-03-14 00:06:10 +00:00
Jamil
574585d146 chore(ci): Add debug/ and perf/ prefix to some images (#4104)
Followup from #4100:


- Add `perf/relay` and `debug/relay` etc data plane images in
`firezone-staging`.
- The `perf` images are `debug` stage images and have tooling installed,
but use release binaries.
- The `debug` images are `debug` binaries inside `debug` images
- `firezone-prod` contains only release binaries -- these image names
haven't changed
2024-03-12 20:27:32 +00:00
Jamil
6575e0ca26 chore(ci): Refactor CI to use prod images in staging and prevent accidental hotfix breakages (#4049)
- Runs release asset builds simultaneously with `deploy-staging`. Those
don't depend on each other.
- Prevents running some build workflows in CD because they're run
already in the PR and in the merge group, and the risk of semantic
conflict is negligible
- Run `release` assets in staging
- Adds `compatibility_tests`: **To successfully introduce a breaking
change in the control / data plane APIs, you must now "Merge as
Administrator"**
- Since `CI` is no longer run on `main`, caching needed to be refactored
to make sense again
- Since `CI` is no longer run on `main`, the Elixir
`migrations_and_seeds_test` had to be rewritten. This now tests
migrations using `git checkout` instead of importing `main`'s DB dump.
- Move tauri builds to its own workflow so we can trigger Linux and
Windows builds manually on an adhoc basis like we do for the Swift and
Kotlin builds
- Add a new `hotfix` workflow that will run `compatibility_tests` with
the latest published images
- Add `workflow_dispatch` to trigger `CD` manually for testing purposes
(cc @ReactorScram)


Refs #3995
2024-03-11 20:01:34 +00:00
Reactor Scram
7211e88338 feat(linux-client): generate firezone-id (device ID) automatically if it's not provided at launch (#3920)
Closes #3815 

Changes that are breaking (but these aren't in production so it should
be okay)

- Windows, renaming `device_id.json` to `firezone-id.json` to match the
rest of the code
- Linux GUI, storing the firezone-id under `/var/lib` instead of under
`$HOME`
- Linux GUI, bails out if not run with `sudo --preserve-env` by
detecting `$HOME == root` or `$USER != root`

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-08 16:13:59 +00:00
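A minimal sketch of the "generate the ID if it's not provided" behaviour from 7211e88338, in Python for illustration only. The helper name `load_or_create_device_id` and the `{"id": ...}` JSON layout are assumptions; the commit only names the file `firezone-id.json`.

```python
import json
import uuid
from pathlib import Path


def load_or_create_device_id(path: Path) -> str:
    """Return the stored device ID, generating and persisting one if missing."""
    if path.exists():
        # Assumed layout: {"id": "<uuid>"}; the real file format is not shown in the log.
        return json.loads(path.read_text())["id"]
    device_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"id": device_id}))
    return device_id
```

Calling it twice with the same path returns the same ID, which is the property the automatic generation has to preserve across launches.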
Jamil
2ed6b3d07f chore(connlib): Tune log filters to enable debug in dev and info for gateway deployments (#3788)
Refs #3618

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-27 23:35:08 +00:00
Jamil
5bd717b877 fix(ci): Use workflow id to fetch perf results (#3710)
2024-02-20 19:40:16 -08:00
Jamil
7ff40b82ed fix(ci): Run each perf test in its own matrix job (#3695)
The iperf3 server sometimes hangs, or takes a while to startup.

Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so they each run in their own matrix job. This
ensures each performance test will run on a separate VM, unaffected by
previous test runs to eliminate the effect any residual network buffer
state can have on a particular test.

It also makes sure the server is listening with a `healthcheck`.
2024-02-20 22:44:20 +00:00
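The `healthcheck` idea from 7ff40b82ed (don't start a perf run until the server is actually listening) can be sketched as a TCP readiness probe. This is a Python illustration under assumptions: `wait_for_port` is a hypothetical helper, not the check used in the repository's compose files.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0,
                  interval: float = 0.2) -> bool:
    """Poll until a TCP connect to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test runner would call this against the server's address before launching the client side of the measurement.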
Gabi
3d3e737ba3 refactor(connlib): replace webrtc-rs with snownet (#3391)
Resolves: #3377.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-20 06:56:31 +00:00
Andrew Dryga
4dc8cdf908 Revert "fix(gateway): Remove /dev/net/tun requirement and clean up upgrade script" (#3691)
This reverts PR #3392.
This reverts commit 16f5401a73.
2024-02-19 20:03:14 +00:00
Jamil
120b3474ee chore(portal): Add okta as IdP in dev (#3675)
2024-02-17 19:09:05 +00:00
Reactor Scram
87f843dcfb ci: document and fix a couple things for local Docker testing (#3672)
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-02-17 16:16:39 +00:00
Jamil
073b324d02 fix(ci): Be explicit about service start order (#3673)
This will prevent services from restarting out from under us during
tests.

Service restarts should be explicitly tested as integration tests.

Should fix #3666
2024-02-16 23:19:13 +00:00
Thomas Eizinger
3bc466db9a ci: upgrade iperf (#3662)
Whilst debugging the performance tests in #3391, I found that we are
using a 4-year-old version of `iperf` for the server. This, plus
restarting the server in between the performance runs resulted in flaky
tests. I am not sure how we arrived at #3303 but
[this](https://github.com/firezone/firezone/actions/runs/7926579022?pr=3391)
CI run succeeded with a big matrix using the newer iperf server and
without the restarts.
2024-02-16 15:08:45 +00:00
Jamil
9054f70995 refactor(ci): simplify dns resources in ci (#3653)
Attempt at cleaning a couple things I missed in code review.

The old httpbin resource wasn't being used anyhow, so I just deduped
them and updated things in a couple other places that had drifted.

Hopefully this fixes the [flaky
CI](https://github.com/firezone/firezone/actions/runs/7918422653/job/21616835910)
2024-02-15 23:50:12 +00:00
Reactor Scram
00f6fcdd09 feat(linux): If FIREZONE_DNS_CONTROL is etc-resolv-conf, modify '/etc/resolv.conf' (#3639)
Only user-facing if users are using the Docker image for the Linux
client.

I split off a module for `/etc/resolv.conf` since the code and unit
tests are about 300 lines and aren't related to the rest of the
`tun_linux.rs` code.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-02-14 23:50:01 +00:00
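As a rough illustration of what an `etc-resolv-conf` control method has to do (00f6fcdd09), the sketch below swaps the `nameserver` entries of a resolv.conf body while keeping every other line. It is Python rather than the Rust module the commit describes, and `rewrite_resolv_conf` plus the `192.0.2.53` address are made up for the example.

```python
from typing import Iterable


def rewrite_resolv_conf(original: str, nameservers: Iterable[str]) -> str:
    """Replace all nameserver lines with the given servers, keeping other lines."""
    kept = [
        line for line in original.splitlines()
        # Drop only real `nameserver <ip>` directives, not comments or options.
        if line.split()[:1] != ["nameserver"]
    ]
    added = [f"nameserver {ip}" for ip in nameservers]
    return "\n".join(added + kept) + "\n"
```

Keeping this as a pure string-to-string function makes it easy to unit-test without touching the real `/etc/resolv.conf`.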
Reactor Scram
1056af4020 feat(linux): Add FIREZONE_DNS_CONTROL env var to choose which DNS control method to use (#3629)
The Docker image for the client is opted in to this new feature. The
bare `linux-client-x64` exe is not. I don't know if users are using the
Docker images?

I wanted to use CLI args, but the DNS control code ("config" or
"control"? Or "SplitDNS"?) has to run at the end of `set_iface_config`,
which on Linux runs in a worker, so I couldn't figure out how to move it
into `on_set_interface_config` in the callbacks. Maybe there is a way,
but the env var results in a small diff.
2024-02-14 02:54:16 +00:00
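A small sketch of selecting a DNS control method from `FIREZONE_DNS_CONTROL` (1056af4020). The two values are the ones named elsewhere in this log; treating an unset variable as "not opted in" and rejecting unknown values are assumptions, and the names `DnsControl` and `dns_control_from_env` are hypothetical.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class DnsControl(Enum):
    SYSTEMD_RESOLVED = "systemd-resolved"
    ETC_RESOLV_CONF = "etc-resolv-conf"


def dns_control_from_env(env: Mapping[str, str] = os.environ) -> Optional[DnsControl]:
    """Map FIREZONE_DNS_CONTROL to a control method; None means not opted in."""
    raw = env.get("FIREZONE_DNS_CONTROL")
    if not raw:
        return None  # assumed default: leave system DNS untouched
    try:
        return DnsControl(raw)
    except ValueError:
        raise ValueError(f"unsupported FIREZONE_DNS_CONTROL value: {raw!r}") from None
```

Passing the environment as a parameter keeps the parsing testable and mirrors the "small diff" appeal of an env var over threading a CLI flag through the worker.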
Brian Manifold
f18ec6e4d5 Add Okta directory sync (#3614)
Why:

* To allow syncing of users/groups/memberships from an IDP to Firezone,
a custom identity provider adapter needs to be created in the portal
codebase at this time. The custom IDP adapter created in this commit is
for Okta.

* This commit also includes some additional tests for the Microsoft
Entra IDP adapter. These tests were mistakenly overlooked when finishing
the Entra adapter.
2024-02-13 02:12:54 +00:00
Reactor Scram
830302af43 test(linux): Low-risk changes to prepare for Linux DNS support (#3625)
This splits off the easy parts from #3605.

- Add quotes around `PHOENIX_SECURE_COOKIES` because my local
`docker-compose` considers unquoted 'false' to be a schema error (env
vars are strings or numbers, not bools, it says)
- Create `test.httpbin.docker.local` container in a new subnet so it can
be used as a DNS resource without the existing CIDR resource picking it
up
- Add resources and policies to `seeds.exs` per #3342
- Fix warning about `CONNLIB_LOG_UPLOAD_INTERVAL_SECS` not being set
- Add `resolv-conf` dep and unit tests to `firezone-tunnel` and
`firezone-linux-client`
- Impl `on_disconnect` in the Linux client with `tracing::error!`
- Add comments

```[tasklist]
- [x] (failed) Confirm that the client container actually does stop faster this way
- [x] Wait for tests to pass
- [x] Mark as ready for review
```
2024-02-12 19:04:51 +00:00
Thomas Eizinger
5889037c91 fix: don't initialize relay with non-existent interface (#3582)
In the `snownet` integration branch, we ran into some problems because
we actually tried to use the IPv6 relay. This doesn't work though
because the docker-compose doesn't provide an IPv6 socket to the
container and thus the relay falsely registers with the portal as having
an IPv6 address.

Internally, we only bind to a wildcard address (`0.0.0.0` and `::`)
which, unfortunately, doesn't seem to fail even if we don't have an IPv6
interface.
2024-02-06 10:17:32 +00:00
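Since binding to `::` succeeds even without a usable IPv6 interface (5889037c91), a different probe is needed to learn whether IPv6 is actually routable. One common trick, sketched here in Python, is to `connect()` a UDP socket, which consults the routing table without sending a packet. This is not how the relay decides; `has_ipv6_route` and the documentation-range probe address are assumptions for the example.

```python
import socket


def has_ipv6_route(probe_addr: str = "2001:db8::1", port: int = 9) -> bool:
    """Return True if the host has a route for a global IPv6 destination."""
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            # UDP connect() only selects a route and source address; nothing is sent.
            sock.connect((probe_addr, port))
            return True
    except OSError:
        # Typically "Network is unreachable" when no IPv6 interface is configured.
        return False
```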
Jamil
6fcfc5497d chore(portal): Enable Microsoft Entra by default in all envs (#3576)
🚀
2024-02-06 00:39:28 +00:00
Jamil
16f5401a73 fix(gateway): Remove /dev/net/tun requirement and clean up upgrade script (#3392)
* Clean up gateway upgrade script
* Fixes #3226 to remove another place where things can go wrong when
upgrading gateways
2024-01-29 04:19:59 +00:00
Jamil
d469f6ad42 feat(ci): Test client gracefully handles portal and relay disconnects (#3376)
Test basic connectivity with the headless client after the portal API
restarts.

Based on top of #3364 to test that portal restarts don't cause a
cascading failure.
2024-01-24 21:04:02 +00:00
Gabi
acb7e17462 refactor(gateway): Update gateway logs level (#3387)
This is to see when connection/reconnections happen
2024-01-24 19:56:26 +00:00
Thomas Eizinger
6b789d6932 feat(phoenix-channel): automatically reconnect based on provided ExponentialBackoff (#3364)
Currently, only the gateway has reconnect logic for (transient) errors
when connecting to the portal. Instead of duplicating this for the
relay, I moved the reconnect state machine to `phoenix-channel`. This
means the relay now automatically gets it too and in the future, the
clients will also benefit from it.

As a nice benefit, this also greatly simplifies the gateway's
`Eventloop` and removes a bunch of cruft with channels.

Resolves: #2915.
2024-01-24 16:39:53 +00:00
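The reconnect-with-backoff pattern that 6b789d6932 moves into `phoenix-channel` can be sketched as follows. This is a Python illustration of the general technique, not the Rust state machine or the `ExponentialBackoff` type from the commit; `backoff_delays`, `connect_with_backoff`, and the default parameters are hypothetical.

```python
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   max_delay: float = 60.0) -> Iterator[float]:
    """Yield delays that grow geometrically and are capped at max_delay."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay *= factor


def connect_with_backoff(connect: Callable[[], T], max_attempts: int,
                         sleep: Callable[[float], None],
                         delays: Optional[Iterator[float]] = None) -> T:
    """Retry connect() on ConnectionError, sleeping between attempts."""
    delays = delays if delays is not None else backoff_delays()
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:  # treated as transient in this sketch
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(next(delays))
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the retry loop testable; a caller would pass `time.sleep` in production code.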
Jamil
bc5582cd2d fix(ci): Disable IPv6 in Docker-based integration tests due to flakiness (#3277)
Getting IPv6-related timeouts and flakiness. It's disabled for the
testbed and the connection tests so following suit here since we don't
have tests that use IPv6.
2024-01-17 22:15:53 +00:00
Jamil Bou Kheir
09526f497a depend on httpbin
2024-01-17 03:48:11 -08:00
Jamil
3c2b32c215 revert(devops): Revert healthcommands (#3280)
2024-01-17 03:35:45 -08:00
Andrew Dryga
832fc3f2e3 Implement rest of TODOs after token refactoring (#3160)
- [x] Introduce api_client actor type and code to create and
authenticate using its token
- [x] Unify Tokens usage for Relays and Gateways
- [x] Unify Tokens usage for magic links


Closes #2367
Ref #2696
2024-01-16 21:39:00 +00:00
Jamil
36209c7d2d fix(rust): Check /proc for health checks (#3250)
Debian slim is slimmer than we could ever have imagined.
2024-01-16 16:46:44 +00:00
Jamil
b1738bdd46 feat(ci): Add e2e test bed (#3135)
- [x] Launch control plane via docker compose
- [x] Ensure all clients build
2024-01-16 01:57:41 +00:00
Jamil
b8e2a59570 fix(connlib): Use debian:12-slim for Rust base image (#3243)
Fixes #3215
2024-01-16 01:53:32 +00:00
Andrew Dryga
ed5437c881 security(portal): Rework auth tokens (#2696)
- [x] make sure that session cookie for client is stored separately from
session cookie for the portal (will close #2647 and #2032)
- [x] #2622
- [ ] #2501
- [ ] show identity tokens and allow rotating/deleting them (#2138)
- [ ] #2042
- [ ] use Tokens context for Relays and Gateways to remove duplication
- [x] #2823
- [ ] Expire LiveView sockets when subject is expired
- [ ] Service Accounts UI is ambiguous now because of token identity and
actual token shown
- [ ] Limit subject permissions based on token type

Closes #2924. Now we extend the lifetime for client tokens, but not for
browsers.
2024-01-09 13:36:21 -06:00
Gabi
5edfe80eb0 connlib: tune disconnect parameters (#2977)
Should fix #2946 (still testing, trying to reproduce the error reported
in the issue)
2023-12-21 19:37:07 +00:00
Gabi
8e34457340 Add support for DNS subdomains (#2735)
This PR changes the protocol and adds support for DNS subdomains: when a
DNS resource is added, all its subdomains are now automatically tunneled
too. Later we will add support for `*.domain` or `?.domain` but
currently there is an Apple split tunnel implementation limitation which
is too labor-intensive to fix right away.

Fixes #2661 

Co-authored-by: Andrew Dryga <andrew@dryga.com>
2023-12-08 00:16:42 -05:00
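The matching rule described in 8e34457340 (a DNS resource covers itself and all of its subdomains) can be sketched like this. It is a Python illustration only; `matches_resource` is a hypothetical name and the case and trailing-dot normalisation is an assumption, not taken from connlib.

```python
def matches_resource(query: str, resource: str) -> bool:
    """True if query is the resource domain itself or any subdomain of it."""
    q = query.rstrip(".").lower()
    r = resource.rstrip(".").lower()
    # Require a label boundary so "badexample.com" does not match "example.com".
    return q == r or q.endswith("." + r)
```

For example, with a resource of `example.com`, both `example.com` and `app.example.com` match, while `badexample.com` does not.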
bmanifold
ef480e1acd Add routing option for sites (#2610)
Why:

* As sites are created, the default behavior right now is to route
traffic through whichever path is easiest/fastest. This commit adds the
ability to allow the admin to choose a routing policy for a given site.
2023-11-22 19:59:54 +00:00
Gabi
aec5b97012 Add performance tests for client-gateway communication (#2655)
2023-11-17 00:32:34 -06:00
Jamil
2bca378f17 Allow data plane configuration at runtime (#2477)
## Changelog

- Updates connlib parameter API_URL (formerly known under different
names as `CONTROL_PLANE_URL`, `PORTAL_URL`, `PORTAL_WS_URL`, and
friends) to be configured as an "advanced" or "hidden" feature at
runtime so that we can test production builds on both staging and
production.
- Makes `AUTH_BASE_URL` configurable at runtime too
- Moves `CONNLIB_LOG_FILTER_STRING` to be configured like this as well
and simplifies its naming
- Fixes a timing attack bug on Android when comparing the `csrf` token
- Adds proper account ID validation to Android to prevent invalid URL
parameter strings from being saved and used
- Cleans up a number of UI / view issues on Android regarding typos,
consistency, etc
- Hides vars from the `relay` CLI we may not want to expose just
yet
- `get_device_id()` is flawed for connlib components -- SMBios is rarely
available. Data plane components now require a `FIREZONE_ID` instead
to use for upserting.


Fixes #2482 
Fixes #2471

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
2023-10-30 23:46:53 -07:00
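The `csrf` timing-attack fix listed in 2bca378f17 comes down to comparing secrets in constant time instead of with `==`. The sketch below shows the idea in Python with the standard library's `hmac.compare_digest`; the Android change itself is not reproduced here, and `tokens_match` is a hypothetical helper.

```python
import hmac


def tokens_match(expected: str, provided: str) -> bool:
    """Compare two tokens without leaking where the first mismatch occurs."""
    # compare_digest takes time independent of the position of the first
    # differing byte, unlike a plain == on strings.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```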
Andrew Dryga
98383e8622 Introduce Sites (#2516)
Closes #2513
2023-10-27 13:10:36 -06:00
Andrew Dryga
34cb88f5af Fix cache registry references
2023-10-25 13:52:50 -06:00
Jamil
fa57d66965 Publish Releases (#2344)
- rebuild and publish gateway and relay binaries to currently drafted
release
- re-tag current relay/gateway images and push to ghcr.io

Stacked on #2341 to prevent conflicts

Fixes #2223 
Fixes #2205 
Fixes #2202
Fixes #2239 

~~Still TODO: `arm64` images and binaries...~~ Edit: added via
`cross-rs`
2023-10-20 14:20:43 -07:00