1001 Commits

Author SHA1 Message Date
Jamil
2b030d801d feat(android): Bundle GITHUB_SHA into Android client (#6405)
Closes #6400 


<img width="659" alt="Screenshot 2024-08-21 at 11 24 16 PM"
src="https://github.com/user-attachments/assets/c1240406-4dda-41df-a36e-1ed9e9b0895a">
2024-08-27 05:17:22 +00:00
Jamil
84a981f668 refactor(ci): Remove browser-based integration tests (#6435)
Fixes a new issue with puppeteer, chromium 128, and Alpine 3.20 that's
causing failing browser tests.

See more: https://github.com/puppeteer/puppeteer/issues/12189

Failure:

https://github.com/firezone/firezone/actions/runs/10549430305/job/29224528663?pr=6391

Unfortunately, puppeteer's embedded browser doesn't seem to want to run
in Alpine:


https://github.com/firezone/firezone/actions/runs/10563167497/job/29265175731?pr=6435#step:6:56


Fixing this is proving very difficult since we can't seem to use
puppeteer with the latest Alpine images, so I questioned the need to
have these in at all. These tests were added at a time when the DNS
mappings were brittle, so we wanted to verify that relayed and direct
connections held up as we deployed.

This is no longer the case, and we also now have much more unit test
coverage around these things, so given the pain of maintaining these
(and the lack of a current solution to the above), they are removed.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-08-26 20:01:00 +00:00
Thomas Eizinger
095358dd4a ci: set GITHUB_TOKEN For cargo-binstall (#6420)
`install-action` uses `cargo-binstall` as a fallback. That binary
contacts GitHub, which may hit rate limits when unauthenticated.
In that case, we will install manually, which takes a very long
time.

Resolves: #6374.
2024-08-23 04:01:40 +00:00
Jamil
0994bd145a feat(apple): Build GITHUB_SHA into Apple clients (#6406)
Closes #6401 

<img width="1012" alt="Screenshot 2024-08-21 at 11 52 31 PM"
src="https://github.com/user-attachments/assets/3012d088-97cb-4a82-8a8f-b2a398865755">

![Screenshot 2024-08-22 at 12 05 44 AM](https://github.com/user-attachments/assets/5e1209f9-e8fa-4453-9bdd-9f40339649b4)
2024-08-22 20:49:57 +00:00
Jamil
c8eed59387 ci: Release 1.2.0 (#6395)
Releasing 1.2.0 to unblock portal deploy! Some of these have already
been published.
2024-08-22 00:18:27 +00:00
Thomas Eizinger
b2e8ccbb49 chore: delete snownet-tests (#6359)
When `snownet` was first being developed, these tests ensured that
hole-punching as well as connectivity via a relay works correctly. We
have since added extensive tests that ensure connectivity works in many
scenarios via `tunnel_test`. `tunnel_test` does not (yet) have a
simulated NAT so hole-punching itself is not covered by that.

UDP hole-punching is shockingly trivial though because all you need to
do is send UDP packets to the same socket that the other party is
sending from. This isn't done by our own code but rather by str0m's
implementation of ICE (as long as we add the correct candidates).

The `snownet-tests` themselves are quite fragile because they need to
set up their own event loop and manually construct an IP packet. They
haven't caught a single bug to my knowledge so I am proposing to delete
them for ease of maintenance.

For example, in
https://github.com/firezone/firezone/actions/runs/10449965474/job/28948590058?pr=6335
the tests fail because we no longer directly force a handshake when the
connection is established. This is unnecessary now because the buffered
intent packet will directly force a handshake from the client to the
gateway. Yet, the `snownet-tests` event loop would need adjusting to also do
that.
2024-08-20 03:40:54 +00:00
Thomas Eizinger
504e823a02 ci: assert that we sample certain transitions (#6339)
It happened in the past that we screwed up the `preconditions` of the
state machine test such that no more transitions were sampled that
actually send packets. To protect against this, we use the newly
introduced logs and grep for certain transitions.

In the future, we can consider emitting a more structured output, like
writing all testcases to a DB and running more complex queries against it to
ensure that certain cases are covered.
2024-08-19 22:40:43 +00:00
Jamil
01552cfaec ci: Skip publish images workflow for gui client (#6240)
It doesn't make sense to run this workflow when publishing GUI clients.

Fixes https://github.com/firezone/firezone/actions/runs/10315532876
2024-08-19 04:24:55 +00:00
Thomas Eizinger
3b56664e02 test(rust): ensure deterministic proptests (#6319)
For quite a while now, we have been making extensive use of
property-based testing to ensure `connlib` works as intended. The idea
of proptests is that - given a certain seed - we deterministically
sample test inputs and assert properties on a given function.

If the test fails, `proptest` prints the seed which can then be added to
a regressions file to iterate on the test case and fix it. It is quite
obvious that non-determinism in how the test input gets generated is no
bueno and reduces the value we get out of these tests a fair bit.

The `HashMap` and `HashSet` data structures are known to be
non-deterministic in their iteration order. This causes non-determinism
during the input generation because we make use of a lot of maps and
sets to gradually build up the test input. We fix all uses of `HashMap`
and `HashSet` by replacing them with `BTreeMap` and `BTreeSet`.
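
As a minimal illustration of why the swap helps (a sketch, not code from this PR): a `BTreeMap` always iterates in sorted key order, so the result no longer depends on insertion order or on a per-process hash seed. The helper name `iteration_order` and the sample keys are made up for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds a map from the given entries and returns
/// its keys in iteration order. For a `BTreeMap`, that order is always
/// sorted by key, independent of the order of insertion.
fn iteration_order(entries: &[(&str, u32)]) -> Vec<String> {
    let map: BTreeMap<&str, u32> = entries.iter().copied().collect();
    map.keys().map(|k| k.to_string()).collect()
}

fn main() {
    // Same entries, inserted in different orders.
    let a = iteration_order(&[("relay", 3), ("client", 1), ("gateway", 2)]);
    let b = iteration_order(&[("gateway", 2), ("relay", 3), ("client", 1)]);

    // Both runs observe the same order, so anything sampled while
    // walking the map is reproducible for a given seed.
    assert_eq!(a, b);
    println!("{a:?}");
}
```

With a `HashMap`, the two vectors would hold the same keys, but their order would not be guaranteed to match across runs.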

To ensure this doesn't regress, we refactor `tunnel_test` to not make
use of proptest's macros and instead, we initialise and run the test
ourselves. This allows us to dump the sampled state and transitions into
a file per test run. In CI, we then run a 2nd iteration of all
regression tests and compare the sampled state and transitions with the
previous run. They must match byte-for-byte.

Finally, to discourage use of non-deterministic iteration, we ban the
use of the iteration functions on `HashMap` and `HashSet` across the
codebase. This doesn't catch iteration in a `for`-loop but it is better
than not linting against it at all.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-16 23:15:58 +00:00
Thomas Eizinger
7c70850217 feat(connlib): allow glob patterns for matching domain names (#5901)
Currently, `connlib` can only handle "simple" DNS wildcards where `*`
matches any number of subdomains, including zero, and `?` matches a
single subdomain.

With this PR, we expand `connlib`'s capabilities to allow for a much
more complex matching of domains that more closely resembles glob
patterns:

- `**` matches any number of subdomains. This supersedes the previous
`*` operator.
- `*` matches a single subdomain. This supersedes the previous `?`
operator.
- `?` matches a single character. This wasn't possible before.
- Additionally, any of these can be combined. Previously, only `*` or
`?` was allowed and they were only accepted at the front of the domain
name pattern.
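
The rules above can be sketched roughly as follows. This is an illustrative matcher, not `connlib`'s actual implementation: the function names are hypothetical, and it assumes that `**` also matches zero subdomains, that `*` only appears as a whole label, and that matching is ASCII case-insensitive.

```rust
/// Matches a single label. A pattern label of `*` accepts any non-empty
/// label; otherwise `?` stands for exactly one character.
fn label_matches(pattern: &str, label: &str) -> bool {
    if pattern == "*" {
        return !label.is_empty();
    }
    let mut p = pattern.chars();
    let mut l = label.chars();
    loop {
        match (p.next(), l.next()) {
            (None, None) => return true,
            (Some('?'), Some(_)) => {}
            (Some(a), Some(b)) if a.eq_ignore_ascii_case(&b) => {}
            _ => return false,
        }
    }
}

/// Matches a list of pattern labels against a list of domain labels.
/// A `**` label may consume zero or more domain labels (assumption).
fn labels_match(pattern: &[&str], domain: &[&str]) -> bool {
    match pattern.split_first() {
        None => domain.is_empty(),
        Some((first, rest)) if *first == "**" => {
            (0..=domain.len()).any(|n| labels_match(rest, &domain[n..]))
        }
        Some((first, rest)) => match domain.split_first() {
            Some((label, tail)) => label_matches(first, label) && labels_match(rest, tail),
            None => false,
        },
    }
}

/// Hypothetical entry point: splits both inputs on `.` and compares labels.
fn domain_matches(pattern: &str, domain: &str) -> bool {
    let p: Vec<&str> = pattern.split('.').collect();
    let d: Vec<&str> = domain.split('.').collect();
    labels_match(&p, &d)
}

fn main() {
    assert!(domain_matches("**.example.com", "a.b.example.com"));
    assert!(domain_matches("*.example.com", "a.example.com"));
    assert!(!domain_matches("*.example.com", "a.b.example.com"));
    assert!(domain_matches("us-east-?.example.com", "us-east-1.example.com"));
    println!("all patterns behaved as expected");
}
```

Matching label by label keeps `*` and `?` from ever crossing a `.` boundary, which is what distinguishes them from `**` in this sketch.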

Resolves: #5056.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-08-15 01:30:53 +00:00
Jamil
296ca4ad4d ci: Bump Clients and Gateways to fix NAT / allocation issues (#6287)
Bump all Clients and Gateways due to #6265 being fixed.

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-08-13 21:58:12 +00:00
Thomas Eizinger
eb91a052c3 chore(rust): group testing crates into a tests/ directory (#6257)
Resolves: #5695.

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-12 17:17:01 +00:00
Thomas Eizinger
47a447c65a chore: prepare hotfix release for Tauri & headless clients (#6235)
2024-08-09 08:28:25 +00:00
Jamil
67ae8ff380 ci: publish Gateway 1.1.4 (#6228)
Publishes the `ENABLE_MASQUERADE` removal.
2024-08-09 03:45:26 +00:00
Jamil
a6ba9868dd ci: Revert bumps to 1.2 (#6227)
We need these at 1.1 until ready to release.
2024-08-08 18:34:39 -07:00
Jamil
096ddfe7c5 ci: bump gui/headless to 1.1.10 (#6221)
To publish the mpsc channel fix.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-08 16:20:20 +00:00
Jamil
406426c59f fix(ci): Fix underscores / dashes typo from #6208 (#6212)
Fix underscores / dashes typo from #6208
2024-08-07 12:58:15 -07:00
Jamil
0c6cd4a804 fix(ci): Add http test server image specifiers to CI (#6208)
- Adds `http_test_server_image` to inputs so that it gets set properly
for CI (`debug`) and CD (`perf`)
- Updates `dev` -> `debug` in docker-compose.yml to fix pulls
- Fixes issue with seeds and relevant docs from #6205
2024-08-07 12:15:00 -07:00
Thomas Eizinger
662a73115a ci: use Google's DockerHub mirror (#6195)
DockerHub has pretty low rate limits [0] for pulling images: Only 100
pulls / 6h. This can stall our CI which pulls several (base) images.

To not hurt our velocity, use Google's public mirror [1].

[0]: https://www.docker.com/increase-rate-limits/
[1]: https://cloud.google.com/artifact-registry/docs/pull-cached-dockerhub-images
2024-08-07 05:20:47 +00:00
Jamil
51e0b61c9c chore: Bump all clients and gateway versions (#6149)
Includes major fixes https://github.com/firezone/firezone/pull/6143 and
https://github.com/firezone/firezone/pull/6117
2024-08-02 01:12:49 -07:00
Reactor Scram
23161ec840 chore(gui-client): release 1.1.8 (#6136)
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-01 21:58:18 +00:00
Reactor Scram
78cca053a6 ci(client/tauri): upgrade pnpm from 8.x to 9.3 (#6114)
Closes #5859

The Git version was always showing `-modified` because the lockfile was
made by pnpm 9, and pnpm would modify it to work with pnpm 8.
2024-07-31 21:54:38 +00:00
Jamil
9d8a15ebee ci: Use the same version of buildx for building, tagging, and merging images (#6066)
In debugging https://firezone.statuspage.io/incidents/3vjmjmbh92mw, we
realized that we use potentially different versions of buildx. This PR
fixes that.
2024-07-30 15:05:09 +00:00
Thomas Eizinger
5687befc9d ci: use correct service name in docker-compose.yml (#6055)
The compose service I defined is called `otel` not `otlp`. With this fix
in place, the relay successfully connects to the OTLP exporter.

It is worth noting that the connection to the OTLP exporter itself
is not critical for relay operation. Even if it fails, it won't affect
the actual data plane. I do think it makes sense to still have a working
OTLP exporter in the compose definition, as it makes it easier to test
whether the ingestion of metrics and traces works as expected.
2024-07-27 02:48:08 +00:00
Reactor Scram
6862213cc2 fix(headless-client/linux): only notify systemd that we're up after Resources are available (#6026)
Closes #5912

Before this, I had the `--exit` CLI flag and the `sd_notify` call
hanging off the wrong callback.
2024-07-26 18:53:08 +00:00
Thomas Eizinger
b2298392e6 ci: only post benchmark results on alerts (#6030)
It appears that the configuration via env variables doesn't work as
expected. This PR changes bencher's config to use commandline arguments.
With that, the `--branch-start-point` actually takes effect and copies
over the thresholds configured on bencher for the `main` branch.

With the thresholds in place, we can configure bencher to only alert us
if a threshold is exceeded and otherwise be quiet and not post a
comment.
2024-07-25 04:28:34 +00:00
Jamil
7034344334 ci: Sleep 3 seconds after upping services for integration tests (#6019)
Fixes #4921
2024-07-24 15:49:39 +00:00
Thomas Eizinger
7c8bbd550b test(connlib): introduce network latency to tunnel_test (#5948)
Currently, `tunnel_test` executes all actions within the same `Instant`,
i.e. time is never advanced by itself. The difficulty with advancing
time compared to other actions like sending packets is that all
time-related actions "overlap". In other words, all timers within
connlib advance at the same time. This makes it difficult to model the
expected behaviour after a certain amount of time has passed as we'd
effectively need to model all timers and their relation to particular
actions (like resending of connection intents or STUN requests).

Instead of only advancing time by itself, we can model some aspect of it
by introducing latency on network messages. This allows us to define a
range of "acceptable" network latency within which everything is expected
to work.

Whilst this doesn't cover all failure cases, it gives us a solid
foundation of parameters within which we should not expect any
operational problems.
2024-07-24 04:01:50 +00:00
Thomas Eizinger
a8aafc9e14 ci: use bencher.dev for continuous benchmarking (#5915)
Currently, we have a homegrown benchmark suite that reports results of
the iperf runs within CI by comparing a run on `main` with the current
branch.

These comments are noisy because they happen on every PR, regardless of
the performance results. As a result, they tend to be skimmed over by
devs and not actually considered. To properly track performance, we need
to record benchmark results over time and use statistics to detect
regressions.

https://bencher.dev does exactly that. It supports various benchmark
harnesses to automatically collect benchmarks. For our case, we simply
use the generic JSON adapter to extract the relevant metrics from the
iperf results and report them to the bencher backend.

With these metrics in place, bencher can plot the results over time, and
alert us in the case of regressions using thresholds based on
statistical tests.

Resolves: #5818.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-07-24 01:22:17 +00:00
Thomas Eizinger
50d6b865a1 refactor(connlib): move Tun implementations out of firezone-tunnel (#5903)
The different implementations of `Tun` are the last platform-specific
code within `firezone-tunnel`. By introducing a dedicated crate and a
`Tun` trait, we can move this code into (platform-specific) leaf crates:

- `connlib-client-android`
- `connlib-client-apple`
- `firezone-bin-shared`

Related: #4473.

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-07-24 01:10:50 +00:00
Thomas Eizinger
05e1f1e3d9 ci: add opentelemetry_sdk to dependabot group (#5982)
These need to be bumped in a group.
2024-07-23 23:53:58 +00:00
Andrew Dryga
50318ae1d2 chore(ci): Do not run terraform plan in PRs when there are no changes (#5964)
2024-07-23 09:18:49 -06:00
Reactor Scram
75529ea799 chore(rust): bump nightly version used for checking unused deps (#5918)
This version was a few months old and started throwing errors about
features that stabilized since then.

e.g.
https://github.com/firezone/firezone/actions/runs/10011089436/job/27673759249

```
error[E0658]: use of unstable library feature 'proc_macro_byte_character'
   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proc-macro2-1.0.86/src/wrapper.rs:871:21
    |
871 |                     proc_macro::Literal::byte_character(byte)
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #115268 <https://github.com/rust-lang/rust/issues/115268> for more information
    = help: add `#![feature(proc_macro_byte_character)]` to the crate attributes to enable
    = note: this compiler was built on 2024-03-25; consider upgrading it if it is out of date

error[E0658]: use of unstable library feature 'proc_macro_c_str_literals'
   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proc-macro2-1.0.86/src/wrapper.rs:898:21
    |
898 |                     proc_macro::Literal::c_string(string)
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #119750 <https://github.com/rust-lang/rust/issues/119750> for more information
    = help: add `#![feature(proc_macro_c_str_literals)]` to the crate attributes to enable
    = note: this compiler was built on 2024-03-25; consider upgrading it if it is out of date

For more information about this error, try `rustc --explain E0658`.
error: could not compile `proc-macro2` (lib) due to 2 previous errors
```
2024-07-19 22:32:51 +00:00
Reactor Scram
6b1b14dc2c chore(gui-client): release GUI Client 1.1.7 (#5897)
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-07-17 22:23:43 +00:00
Thomas Eizinger
aa279d7731 ci: never tolerate warnings in Rust code (#5893)
Our Rust CI runs various jobs in different configurations of packages
and / or features. Currently, only the clippy job denies warnings which
makes it possible that some code still generates warnings under
particular configurations.

To ensure we always fail on warnings, we set a global env var to deny
warnings for all Rust CI jobs.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-07-17 22:22:12 +00:00
Reactor Scram
a8ece49d9e chore: bump GUI to 1.1.6 (#5862)
I started a playbook for publishing GUI releases; I didn't see any other
one around.

I think there's a middle step I'm not clear on:

1. Open this PR and get it approved
2. Do something? Publish the draft release maybe? Run a special CI
workflow?
3. Merge this PR to update the changelog and bump the versions in Git

2024-07-12 18:45:56 +00:00
Thomas Eizinger
c92dd559f7 chore(rust): format Cargo.toml using cargo-sort (#5851)
2024-07-12 04:57:22 +00:00
Reactor Scram
64e0b71b77 feat(gui-client): set a different tray icon when signed out (#5817)
Closes #5810 

```[tasklist]
### Tasks
- [x] Try not to set the icon every time we change Resources
- [x] Get production icons
- [x] Add changelog comment
- [x] Add CI stress test that sets the icon 10,000 times
- [x] Open for review
- [x] Repair changelog
- [ ] Merge
```

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-07-11 20:50:44 +00:00
Reactor Scram
78f1c7c519 test(firezone-tunnel/windows): Test Windows upload speed in CI (#5607)
Closes #5601
It looks like we can hit 100+ Mbps in theory. This covers Wintun, Tokio,
and Windows OS overhead. It doesn't cover the cryptography or anything
in connlib itself.

The code is kinda messy but I'm not sure how to clean it up so I'll just
leave it for review.

This test should fail if there are any regressions in #5598.

It fails if any packet is dropped or if the speed is under 100 Mbps.

```[tasklist]
### Tasks
- [x] Use `ip_packet::make`
- [x] Switch to `cargo bench`
- [x] Extract windows ARM PR
- [x] Clean up wintun.dll install code
- [x] Re-request review
```
2024-07-10 19:09:45 +00:00
Jamil
446d24a761 ci: Fix scoping dialyzer cache to elixir version (#5825)
This fixes a CI bug where the dialyzer cache was not being scoped to the
elixir version, causing cache issues that fail CI jobs.

This also performs some tidying up of the cache key to scope it by
runner arch too for elixir deps, and make clear what the cache key
references.

https://github.com/firezone/firezone/actions/runs/9877195625
2024-07-10 18:01:32 +00:00
Thomas Eizinger
79b14d4399 ci: don't build optimised Rust tests (#5805)
In #5786, we massively increased the performance of `tunnel_test` and
thus, it is no longer necessary to build all tests using optimisation
level 1. Windows is very slow in compiling Rust and forcing it to
compile with optimisations doesn't help that.

On `main`, the compile phase takes ~ **8min**:
https://github.com/firezone/firezone/actions/runs/9847792756/job/27188488313#step:5:968

With this patch, the compile phase takes ~**6min**:
https://github.com/firezone/firezone/actions/runs/9849448280/job/27193128597?pr=5805#step:5:967
2024-07-09 13:17:07 +00:00
Jamil
ef3b4e5dfe feat(linux-gui): Bump GUI to 1.1.5 for arm64 support (#5800)
2024-07-08 21:58:10 -07:00
Jamil
cd1b46c8f5 fix(ci): Install GH CLI on arm runners (#5802)
`main` failure:

https://github.com/firezone/firezone/actions/runs/9847918080/job/27190842443

Opened an issue:
https://github.com/actions/runner-images/issues/10192

gh cli instructions:

https://github.com/cli/cli/blob/trunk/docs/install_linux.md#debian-ubuntu-linux-raspberry-pi-os-apt
2024-07-09 02:56:24 +00:00
Thomas Eizinger
9caca475dc test(connlib): introduce routing table to tunnel_test (#5786)
Currently, `tunnel_test` uses a rather naive approach when dispatching
`Transmit`s. In particular, it checks client, gateway and relay
separately whether they "want" a certain packet. In a real network,
these packets are routed based on their IP.

To mimic something similar, we introduce a `Host` abstraction that wraps
each component: client, gateway and relay. Additionally, we introduce a
`RoutingTable` where we can add and remove hosts. With these things in
place, routing a `Transmit` is as easy as looking up the destination IP
in the routing table and dispatching to the corresponding host.

Our hosts are type-safe: client, gateway and relay have different types.
Thus, we abstract over them using a `HostId` in order to know which
host a certain message is for. Following these patches, we can easily
introduce multiple gateways and relays to this test by simply making
more entries in this routing table. This will increase the test coverage
of connlib.
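
The lookup described above could look roughly like this. It is a simplified sketch under assumptions, not the code from this PR: only the names `RoutingTable` and `HostId` come from the description, while the enum variants, the `u32` identifiers, and the methods `add_host`, `remove_host`, and `host_by_ip` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

/// Identifies which kind of host a packet is destined for.
/// The variants and their `u32` payloads are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HostId {
    Client(u32),
    Gateway(u32),
    Relay(u32),
}

/// Maps destination IPs to hosts. A `BTreeMap` keeps iteration deterministic.
#[derive(Default)]
struct RoutingTable {
    routes: BTreeMap<IpAddr, HostId>,
}

impl RoutingTable {
    /// Registers a host; returns `false` if the IP is already taken.
    fn add_host(&mut self, ip: IpAddr, id: HostId) -> bool {
        if self.routes.contains_key(&ip) {
            return false;
        }
        self.routes.insert(ip, id);
        true
    }

    /// Removes the host registered under `ip`, if any.
    fn remove_host(&mut self, ip: &IpAddr) -> Option<HostId> {
        self.routes.remove(ip)
    }

    /// Routing a packet is a plain lookup of its destination IP.
    fn host_by_ip(&self, dst: IpAddr) -> Option<HostId> {
        self.routes.get(&dst).copied()
    }
}

fn main() {
    let mut table = RoutingTable::default();
    table.add_host(IpAddr::from([10, 0, 0, 1]), HostId::Client(1));
    table.add_host(IpAddr::from([10, 0, 0, 2]), HostId::Gateway(1));
    table.add_host(IpAddr::from([10, 0, 0, 3]), HostId::Relay(1));

    // Dispatch based on the destination IP of a transmit.
    match table.host_by_ip(IpAddr::from([10, 0, 0, 2])) {
        Some(host) => println!("dispatching to {host:?}"),
        None => println!("no route, dropping packet"),
    }
}
```

Adding a second gateway or relay to the test then amounts to one more `add_host` call with a fresh IP.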

Lastly, this patch massively increases the performance of `tunnel_test`.
It turns out that previously, we spent a lot of CPU cycles accessing
"random" IPs from very large iterators. With this patch, we take a
limited range of 100 IPs that we sample from, thus drastically
increasing performance of this test. The configured 1000 testcases
execute in 3s on my machine now (with opt-level 1 which is what we use
in CI).

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-07-09 01:48:54 +00:00
Reactor Scram
e0326be807 ci(gui-client/linux): see if we can build the GUI Client for ARM (#5793)
This would make it a little easier to replicate prod issues on old
releases

```[tasklist]
### Tasks
- [x] Add comment to changelog
- [x] Check Vercel preview
- [x] Request review
- [x] Update arches link
- [x] `apt-get update`
- [x] Re-request review
```
2024-07-08 21:30:48 +00:00
Jamil
1b7338e5c3 fix(website): fix sha of deployed portal (#5782)
Needs a storage key, not an env var to read.
2024-07-06 17:25:00 -07:00
Jamil
e39ce22b36 chore: Publish new linux/windows clients (#5767)
Adds the DNS fix.
2024-07-05 13:19:30 -07:00
Reactor Scram
d0f68fc133 test(gui-client): multi-process smoke test for GUI + IPC service (#5672)
```[tasklist]
### Tasks
- [x] Check the GUI saves its settings file
- [x] Check the IPC service writes the device ID to disk
- [x] Check the GUI writes a log file (skipped - we already check if the exported zip has any files in it)
- [x] Run the crash file through `minidump-stackwalk`
- [x] Reach feature parity with the original smoke tests
- [x] Ready for review
- [x] Finish #5452
- [ ] Start on #5453 
```
2024-07-04 21:10:31 +00:00
Jamil
086c730aaf chore: Bump clients to 1.1.2 for DNS record type forward (#5703)
Apps are already in review with App Stores
2024-07-04 01:31:26 +00:00
Jamil
3b0f54ec3c ci: Push infra images to ghcr.io (#5669)
Fixes #5447

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-07-03 19:36:06 +00:00