Commit Graph

102 Commits

Author SHA1 Message Date
Jamil
2ed6b3d07f chore(connlib): Tune log filters to enable debug in dev and info for gateway deployments (#3788)
Refs #3618

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-27 23:35:08 +00:00
Jamil
5bd717b877 fix(ci): Use workflow id to fetch perf results (#3710) 2024-02-20 19:40:16 -08:00
Jamil
7ff40b82ed fix(ci): Run each perf test in its own matrix job (#3695)
The iperf3 server sometimes hangs, or takes a while to startup.

Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so they each run in their own matrix job. This
ensures each performance test will run on a separate VM, unaffected by
previous test runs to eliminate the effect any residual network buffer
state can have on a particular test.

It also makes sure the server is listening with a `healthcheck`.
2024-02-20 22:44:20 +00:00
Gabi
3d3e737ba3 refactor(connlib): replace webrtc-rs with snownet (#3391)
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

Resolves: #3377.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-20 06:56:31 +00:00
Andrew Dryga
4dc8cdf908 Revert "fix(gateway): Remove /dev/net/tun requirement and clean up upgrade script" (#3691)
This reverts PR #3392.
This reverts commit 16f5401a73.
2024-02-19 20:03:14 +00:00
Jamil
120b3474ee chore(portal): Add okta as IdP in dev (#3675) 2024-02-17 19:09:05 +00:00
Reactor Scram
87f843dcfb ci: document and fix a couple things for local Docker testing (#3672)
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-02-17 16:16:39 +00:00
Jamil
073b324d02 fix(ci): Be explicit about service start order (#3673)
This will prevent services from restarting out from under us during
tests.

Service restarts should be explicitly tested as integration tests.

Should fix #3666
2024-02-16 23:19:13 +00:00
Thomas Eizinger
3bc466db9a ci: upgrade iperf (#3662)
Whilst debugging the performance tests in #3391, I found that we are
using a 4-year-old version of `iperf` for the server. This, plus
restarting the server in between the performance runs, resulted in flaky
tests. I am not sure how we arrived at #3303 but
[this](https://github.com/firezone/firezone/actions/runs/7926579022?pr=3391)
CI run succeeded with a big matrix using the newer iperf server and
without the restarts.
2024-02-16 15:08:45 +00:00
Jamil
9054f70995 refactor(ci): simplify dns resources in ci (#3653)
Attempt at cleaning a couple things I missed in code review.

The old httpbin resource wasn't being used anyhow, so I just deduped
them and updated things in a couple other places that had drifted.

Hopefully this fixes the [flaky
CI](https://github.com/firezone/firezone/actions/runs/7918422653/job/21616835910)
2024-02-15 23:50:12 +00:00
Reactor Scram
00f6fcdd09 feat(linux): If FIREZONE_DNS_CONTROL is etc-resolv-conf, modify '/etc/resolv.conf' (#3639)
Only user-facing if users are using the Docker image for the Linux
client.

I split off a module for `/etc/resolv.conf` since the code and unit
tests are about 300 lines and aren't related to the rest of the
`tun_linux.rs` code.
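
The env-var switch described here can be pictured with a small Python sketch. It is illustrative only: the real client is Rust, and `dns_control_method` is a hypothetical helper. Only the variable name `FIREZONE_DNS_CONTROL` and the value `etc-resolv-conf` come from the commit title; treating an unset variable as "no DNS control" and rejecting unknown values are assumptions.

```python
import os
from typing import Mapping, Optional

# Value taken from the commit title; other accepted values are not listed here.
ETC_RESOLV_CONF = "etc-resolv-conf"


def dns_control_method(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the selected DNS control method, or None when DNS control is off."""
    value = env.get("FIREZONE_DNS_CONTROL")
    if value is None or value == "":
        return None  # assumption: unset or empty means "leave DNS alone"
    if value == ETC_RESOLV_CONF:
        return ETC_RESOLV_CONF
    # Assumption: unknown values are rejected rather than silently ignored.
    raise ValueError(f"unsupported FIREZONE_DNS_CONTROL value: {value!r}")
```

A caller would only touch `/etc/resolv.conf` when this returns `"etc-resolv-conf"`.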

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-02-14 23:50:01 +00:00
Reactor Scram
1056af4020 feat(linux): Add FIREZONE_DNS_CONTROL env var to choose which DNS control method to use (#3629)
The Docker image for the client is opted in to this new feature. The
bare `linux-client-x64` exe is not. I don't know if users are using the
Docker images?

I wanted to use CLI args, but the DNS control code ("config" or
"control"? Or "SplitDNS"?) has to run at the end of `set_iface_config`,
which on Linux runs in a worker, so I couldn't figure out how to move it
into `on_set_interface_config` in the callbacks. Maybe there is a way,
but the env var results in a small diff.
2024-02-14 02:54:16 +00:00
Brian Manifold
f18ec6e4d5 Add Okta directory sync (#3614)
Why:

* To allow syncing of users/groups/memberships from an IDP to Firezone,
a custom identity provider adapter needs to be created in the portal
codebase at this time. The custom IDP adapter created in this commit is
for Okta.

* This commit also includes some additional tests for the Microsoft
Entra IDP adapter. These tests were mistakenly overlooked when finishing
the Entra adapter.
2024-02-13 02:12:54 +00:00
Reactor Scram
830302af43 test(linux): Low-risk changes to prepare for Linux DNS support (#3625)
This splits off the easy parts from #3605.

- Add quotes around `PHOENIX_SECURE_COOKIES` because my local
`docker-compose` considers unquoted 'false' to be a schema error (env
vars are strings or numbers, not bools, it says)
- Create `test.httpbin.docker.local` container in a new subnet so it can
be used as a DNS resource without the existing CIDR resource picking it
up
- Add resources and policies to `seeds.exs` per #3342
- Fix warning about `CONNLIB_LOG_UPLOAD_INTERVAL_SECS` not being set
- Add `resolv-conf` dep and unit tests to `firezone-tunnel` and
`firezone-linux-client`
- Impl `on_disconnect` in the Linux client with `tracing::error!`
- Add comments

```[tasklist]
- [x] (failed) Confirm that the client container actually does stop faster this way
- [x] Wait for tests to pass
- [x] Mark as ready for review
```
2024-02-12 19:04:51 +00:00
Thomas Eizinger
5889037c91 fix: don't initialize relay with non-existent interface (#3582)
In the `snownet` integration branch, we ran into some problems because
we actually tried to use the IPv6 relay. This doesn't work though
because the docker-compose doesn't provide an IPv6 socket to the
container and thus the relay falsely registers with the portal as having
an IPv6 address.

Internally, we only bind to a wildcard address (`0.0.0.0` and `::`)
which unfortunately, doesn't seem to fail, even if we don't have an IPv6
interface.
2024-02-06 10:17:32 +00:00
Jamil
6fcfc5497d chore(portal): Enable Microsoft Entra by default in all envs (#3576)
🚀
2024-02-06 00:39:28 +00:00
Jamil
16f5401a73 fix(gateway): Remove /dev/net/tun requirement and clean up upgrade script (#3392)
* Clean up gateway upgrade script
* Fixes #3226 to remove another place where things can go wrong when
upgrading gateways
2024-01-29 04:19:59 +00:00
Jamil
d469f6ad42 feat(ci): Test client gracefully handles portal and relay disconnects (#3376)
Test basic connectivity with the headless client after the portal API
restarts.

Based on top of #3364 to test that portal restarts don't cause a
cascading failure.
2024-01-24 21:04:02 +00:00
Gabi
acb7e17462 refactor(gateway): Update gateway logs level (#3387)
This is to see when connection/reconnections happen
2024-01-24 19:56:26 +00:00
Thomas Eizinger
6b789d6932 feat(phoenix-channel): automatically reconnect based on provided ExponentialBackoff (#3364)
Currently, only the gateway has reconnect logic for (transient) errors
when connecting to the portal. Instead of duplicating this for the
relay, I moved the reconnect state machine to `phoenix-channel`. This
means the relay now automatically gets it too and in the future, the
clients will also benefit from it.

As a nice benefit, this also greatly simplifies the gateway's
`Eventloop` and removes a bunch of cruft with channels.
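
As a rough illustration of reconnecting on a backoff schedule, the following Python sketch retries a connection attempt with exponentially growing delays. It is not the `phoenix-channel` code (which is Rust); `backoff_delays`, `connect_with_backoff`, and their parameters are hypothetical names chosen for this example, and treating `ConnectionError` as the transient error class is an assumption.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(initial: float, factor: float, max_elapsed: float) -> Iterator[float]:
    """Yield growing delays until their sum would exceed max_elapsed."""
    elapsed, delay = 0.0, initial
    while elapsed + delay <= max_elapsed:
        yield delay
        elapsed += delay
        delay *= factor


def connect_with_backoff(
    connect: Callable[[], object],
    delays: Iterator[float],
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call connect() until it succeeds or the backoff schedule is exhausted."""
    last_error: Optional[Exception] = None
    while True:
        try:
            return connect()
        except ConnectionError as err:  # treated as transient in this sketch
            last_error = err
        delay = next(delays, None)
        if delay is None:
            raise RuntimeError("backoff exhausted") from last_error
        sleep(delay)
```

Passing the schedule in as an argument mirrors the idea of a caller-provided backoff: each component can pick its own limits while sharing the retry loop.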

Resolves: #2915.
2024-01-24 16:39:53 +00:00
Jamil
bc5582cd2d fix(ci): Disable IPv6 in Docker-based integration tests due to flakiness (#3277)
Getting IPv6-related timeouts and flakiness. It's disabled for the
testbed and the connection tests so following suit here since we don't
have tests that use IPv6.
2024-01-17 22:15:53 +00:00
Jamil Bou Kheir
09526f497a depend on httpbin 2024-01-17 03:48:11 -08:00
Jamil
3c2b32c215 revert(devops): Revert healthcommands (#3280) 2024-01-17 03:35:45 -08:00
Andrew Dryga
832fc3f2e3 Implement rest of TODOs after token refactoring (#3160)
- [x] Introduce api_client actor type and code to create and
authenticate using its token
- [x] Unify Tokens usage for Relays and Gateways
- [x] Unify Tokens usage for magic links


Closes #2367
Ref #2696
2024-01-16 21:39:00 +00:00
Jamil
36209c7d2d fix(rust): Check /proc for health checks (#3250)
Debian slim is slimmer than we could ever have imagined.
2024-01-16 16:46:44 +00:00
Jamil
b1738bdd46 feat(ci): Add e2e test bed (#3135)
- [x] Launch control plane via docker compose
- [x] Ensure all clients build
2024-01-16 01:57:41 +00:00
Jamil
b8e2a59570 fix(connlib): Use debian:12-slim for Rust base image (#3243)
Fixes #3215
2024-01-16 01:53:32 +00:00
Andrew Dryga
ed5437c881 security(portal): Rework auth tokens (#2696)
- [x] make sure that session cookie for client is stored separately from
session cookie for the portal (will close #2647 and #2032)
- [x] #2622
- [ ] #2501
- [ ] show identity tokens and allow rotating/deleting them (#2138)
- [ ] #2042
- [ ] use Tokens context for Relays and Gateways to remove duplication
- [x] #2823
- [ ] Expire LiveView sockets when subject is expired
- [ ] Service Accounts UI is ambiguous now because of token identity and
actual token shown
- [ ] Limit subject permissions based on token type

Closes #2924. Now we extend the lifetime for client tokens, but not for
browsers.
2024-01-09 13:36:21 -06:00
Gabi
5edfe80eb0 connlib: tune disconnect parameters (#2977)
Should fix #2946 (still testing, trying to reproduce the error reported
in the issue)
2023-12-21 19:37:07 +00:00
Gabi
8e34457340 Add support for DNS subdomains (#2735)
This PR changes the protocol and adds support for DNS subdomains: when
a DNS resource is added, all its subdomains are automatically
tunneled too. Later we will add support for `*.domain` or `?.domain` but
currently there is an Apple split tunnel implementation limitation which
is too labor-intensive to fix right away.
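
The matching rule implied above (a DNS resource also covers every subdomain) can be sketched in Python as a suffix check on label boundaries. This is a minimal sketch, not connlib's implementation; `matches_dns_resource` is a hypothetical name, and the case-insensitive comparison and trailing-dot handling are assumptions.

```python
def matches_dns_resource(resource: str, name: str) -> bool:
    """Return True if name is the resource domain itself or any subdomain of it."""
    # Normalize: DNS names are case-insensitive and may carry a trailing dot.
    resource = resource.rstrip(".").lower()
    name = name.rstrip(".").lower()
    # Require a "." before the suffix so "notexample.com" does not match "example.com".
    return name == resource or name.endswith("." + resource)
```

For example, a resource of `example.com` would cover `api.example.com` and `a.b.example.com` but not `notexample.com`.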

Fixes #2661 

Co-authored-by: Andrew Dryga <andrew@dryga.com>
2023-12-08 00:16:42 -05:00
bmanifold
ef480e1acd Add routing option for sites (#2610)
Why:

* As sites are created, the default behavior right now is to route
traffic through whichever path is easiest/fastest. This commit adds the
ability to allow the admin to choose a routing policy for a given site.
2023-11-22 19:59:54 +00:00
Gabi
aec5b97012 Add performance tests for client-gateway communication (#2655) 2023-11-17 00:32:34 -06:00
Jamil
2bca378f17 Allow data plane configuration at runtime (#2477)
## Changelog

- Updates connlib parameter API_URL (formerly known under different
names as `CONTROL_PLANE_URL`, `PORTAL_URL`, `PORTAL_WS_URL`, and
friends) to be configured as an "advanced" or "hidden" feature at
runtime so that we can test production builds on both staging and
production.
- Makes `AUTH_BASE_URL` configurable at runtime too
- Moves `CONNLIB_LOG_FILTER_STRING` to be configured like this as well
and simplifies its naming
- Fixes a timing attack bug on Android when comparing the `csrf` token
- Adds proper account ID validation to Android to prevent invalid URL
parameter strings from being saved and used
- Cleans up a number of UI / view issues on Android regarding typos,
consistency, etc
- Hides vars from the `relay` CLI we may not want to expose just
yet
- `get_device_id()` is flawed for connlib components -- SMBios is rarely
available. Data plane components now require a `FIREZONE_ID` instead
to use for upserting.
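
For context on the `csrf` timing-attack item in the list above: comparing secrets with an ordinary equality check can return early at the first differing character, which leaks information through response time. The actual Android fix is not shown in this log; the Python sketch below only illustrates the general technique using the standard library's `hmac.compare_digest`. The function name `csrf_tokens_match` is hypothetical.

```python
import hmac


def csrf_tokens_match(expected: str, received: str) -> bool:
    """Compare two tokens without short-circuiting on the first mismatch."""
    # compare_digest is designed so timing does not reveal where the inputs differ.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```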


Fixes #2482 
Fixes #2471

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
2023-10-30 23:46:53 -07:00
Andrew Dryga
98383e8622 Introduce Sites (#2516)
Closes #2513
2023-10-27 13:10:36 -06:00
Andrew Dryga
34cb88f5af Fix cache registry references 2023-10-25 13:52:50 -06:00
Jamil
fa57d66965 Publish Releases (#2344)
- rebuild and publish gateway and relay binaries to currently drafted
release
- re-tag current relay/gateway images and push to ghcr.io

Stacked on #2341 to prevent conflicts

Fixes #2223 
Fixes #2205 
Fixes #2202
Fixes #2239 

~~Still TODO: `arm64` images and binaries...~~ Edit: added via
`cross-rs`
2023-10-20 14:20:43 -07:00
Jamil
573124bd2f Document relay gateway client CLIs (#2424)
Fixes #2363 

* Rename `relay` package to `firezone-relay` so that binaries outputted
match the `firezone-*` cli naming scheme
* Rename `firezone-headless-client` package to `firezone-linux-client`
for consistency
* Add READMEs for user-facing CLI components (there will also be docs
later)
2023-10-19 00:59:17 +00:00
Jamil
6ec10b2669 Revert "Fix/website mdx" (#2434)
Reverts firezone/firezone#2433
2023-10-18 11:42:54 -07:00
Jamil
caef531b17 Fix/website mdx (#2433) 2023-10-18 11:42:18 -07:00
Andrew Dryga
0aab4077f8 Fix auth flow state, bump COS to 109, enable fluentbit logging, auto-remove docker registry artifacts (#2315) 2023-10-11 16:19:47 -06:00
Andrew Dryga
0eeefa03c7 Use postgres 15.2 in docker-compose (same as production) 2023-10-06 15:47:56 -06:00
Andrew Dryga
42bbafc04d Merge firezone/containers into elixir/Dockerfile for better reuse and maintainability (#2267)
Upsides:
1. We don't need to maintain a separate repo and Dockerfile just for
Elixir image (permissions, runner labels, etc)
2. No need to push intermediate images to the container registry
3. No need to copy-paste alpine/erlang/elixir version and hashes from
`firezone/containers` to `elixir/dockerfile` every time they change
4. No need to cross-compile for local dev environments, better
experience building with slow internet connection
5. One command to test if our code works on our containers but a
different alpine/erlang/elixir version

Downsides:
1. Locally devs will need to compile Erlang at least once per version,
but the whole build takes ~6 minutes on my M1 Max. It also takes only 8
minutes on the free GitHub Actions runner without any cache.
2. Worse experience on slow machines

FYI: there is no performance penalty once we have cache layers, still
takes 30 seconds on CI.
2023-10-06 15:34:47 -06:00
Andrew Dryga
a75e71ef7e Rename caches (#2255) 2023-10-05 10:01:15 -06:00
Thomas Eizinger
9a41983447 ci: optimize caching further (#2246)
This patch-set aims to make several improvements to our CI caching:

1. Use of registry as build cache: Pushes a separate image to our docker
registry at GCP that contains the cache layers. This happens for every
PR & main. As a result, we can restore from **both** which should make
repeated runs of CI on an individual PR faster and give us a good
baseline cache for new PRs from `main`. See
https://docs.docker.com/build/ci/github-actions/cache/#registry-cache
for details. As a nice side-effect, this allows us to use the 10 GB we
have on GitHub actions for other jobs.
2. We make better use of `restore-keys` by also attempting to restore
the cache if the fingerprint of our lockfiles doesn't match. This is
useful for CI runs that upgrade dependencies. Those will restore a cache
that is still useful although doesn't quite match. That is better[^1]
than not hitting the cache at all.
3. There were two tiny bugs in our Swift and Android builds:
a. We used `rustup show` in the wrong directory and thus did not
actually install the toolchain properly.
b. We used `shared-key` instead of `key` for the
https://github.com/Swatinem/rust-cache action and thus did not
differentiate between jobs properly.
4. Our Dockerfile for Rust had a bug where it did not copy in the
`rust-toolchain.toml` file in the `chef` layer and thus also did not use
the correct toolchain.
5. We remove the dedicated gradle cache because the build action already
comes with a cache configuration:
https://github.com/firezone/firezone/actions/runs/6416847209/job/17421412150#step:10:25
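
The `restore-keys` fallback mentioned in the list above can be modeled with a short Python sketch: try the exact key first, then fall back to the newest cache whose key starts with one of the given prefixes. This is a simplified model of the lookup rather than the cache action's code; `pick_cache` and the example key names are hypothetical.

```python
from typing import Optional, Sequence


def pick_cache(existing: Sequence[str], key: str, restore_keys: Sequence[str]) -> Optional[str]:
    """Pick a cache to restore; `existing` is ordered oldest to newest."""
    if key in existing:
        return key  # exact hit: lockfile fingerprint unchanged
    for prefix in restore_keys:
        candidates = [k for k in existing if k.startswith(prefix)]
        if candidates:
            return candidates[-1]  # newest partial match, still mostly useful
    return None  # cold start
```

With existing caches `rust-linux-aaa` and `rust-linux-bbb`, a dependency upgrade that produces the key `rust-linux-zzz` would fall back to `rust-linux-bbb` through the prefix `rust-linux-` instead of starting cold.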

[^1]: Over time, this may mean that our caches grow a bit. In an ideal
world, we automatically remove files from the caches that haven't been
used in a while. The cache action we use for Rust does that
automatically:
https://github.com/Swatinem/rust-cache?tab=readme-ov-file#cache-details.
As a workaround, we can just purge all caches every now and then.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-10-05 06:26:56 -07:00
Jamil
80234f9c71 Github Actions cache on main and scope caches for all languages/runtimes (#2233) 2023-10-04 17:29:04 -07:00
Thomas Eizinger
464efbad56 refactor(connlib): restructure directory for consistency (#2236) 2023-10-05 09:52:35 +11:00
Jamil
cd5a57f413 Update tokio-tungstenite to fix webpki vuln (#2181)
Fixes https://github.com/firezone/firezone/security/dependabot/75
Fixes https://github.com/firezone/firezone/security/dependabot/72
2023-10-02 19:35:42 +00:00
Jamil
c4c6f3e4ca refactor(portal): Don't pin session token to user_agent or remote_ip (#2195)
Removing the check to get Rust PRs to pass.

**Note**: #2182 was dependent on this one, and has since merged into
this one.
2023-09-30 07:40:57 -07:00
Jamil
72044cc065 refactor(android): Make app links more robust in the emulator (#2188)
Getting some weird behavior with AppLinks. They don't seem to work upon
first use and require a few tries to function correctly.

Edit: Found the issue: Android Studio doesn't like when the Manifest
contains variables for AppLinks. I added a note in the Manifest.

@conectado To test Applinks are working correctly, you can use the App
Link Assistant:

<img width="930" alt="Screenshot 2023-09-28 at 11 15 11 PM"
src="https://github.com/firezone/firezone/assets/167144/e4bd4674-d562-44ec-bdb8-3a5f97250b84">

Then from there you can click "Test App Links":

<img width="683" alt="Screenshot 2023-09-28 at 11 15 30 PM"
src="https://github.com/firezone/firezone/assets/167144/f3dc8e0d-f58a-4a4b-9855-62472096dc9e">
2023-09-29 18:09:04 +00:00
Jamil
a98f30a8dd fix(ci): Fix flaky integration tests (#2190) 2023-09-29 01:12:29 -07:00