In the `snownet` integration branch, we ran into some problems because
we actually tried to use the IPv6 relay. This doesn't work though
because the docker-compose doesn't provide an IPv6 socket to the
container and thus the relay falsely registers with the portal as having
an IPv6 address.
Internally, we only bind to a wildcard address (`0.0.0.0` and `::`)
which unfortunately, doesn't seem to fail, even if we don't have an IPv6
interface.
Test basic connectivity with the headless client after the portal API
restarts.
Based on top of #3364 to test that portal restarts don't cause a
cascading failure.
Currently, only the gateway has a reconnect logic for (transient) errors
when connecting to the portal. Instead of duplicating this for the
relay, I moved the reconnect state machine to `phoenix-channel`. This
means the relay now automatically gets it too and in the future, the
clients will also benefit from it.
As a nice benefit, this also greatly simplifies the gateway's
`Eventloop` and removes a bunch of cruft with channels.
Resolves: #2915.
Getting IPv6-related timeouts and flakiness. It's disabled for the
testbed and the connection tests so following suit here since we don't
have tests that use IPv6.
- [x] Introduce api_client actor type and code to create and
authenticate using it's token
- [x] Unify Tokens usage for Relays and Gateways
- [x] Unify Tokens usage for magic links
Closes#2367
Ref #2696
- [x] make sure that session cookie for client is stored separately from
session cookie for the portal (will close#2647 and #2032)
- [x] #2622
- [ ] #2501
- [ ] show identity tokens and allow rotating/deleting them (#2138)
- [ ] #2042
- [ ] use Tokens context for Relays and Gateways to remove duplication
- [x] #2823
- [ ] Expire LiveView sockets when subject is expired
- [ ] Service Accounts UI is ambiguous now because of token identity and
actual token shown
- [ ] Limit subject permissions based on token type
Closes#2924. Now we extend the lifetime for client tokens, but not for
browsers.
This PR changes the protocol and adds support for DNS subdomains, now
when a DNS resource is added all its subdomains are automatically
tunneled too. Later we will add support for `*.domain` or `?.domain` but
currently there is an Apple split tunnel implementation limitation which
is too labor-intensive to fix right away.
Fixes#2661
Co-authored-by: Andrew Dryga <andrew@dryga.com>
Why:
* As sites are created, the default behavior right now is to route
traffic through whichever path is easiest/fastest. This commit adds the
ability to allow the admin to choose a routing policy for a given site.
## Changelog
- Updates connlib parameter API_URL (formerly known under different
names as `CONTROL_PLANE_URL`, `PORTAL_URL`, `PORTAL_WS_URL`, and
friends) to be configured as an "advanced" or "hidden" feature at
runtime so that we can test production builds on both staging and
production.
- Makes `AUTH_BASE_URL` configurable at runtime too
- Moves `CONNLIB_LOG_FILTER_STRING` to be configured like this as well
and simplifies its naming
- Fixes a timing attack bug on Android when comparing the `csrf` token
- Adds proper account ID validation to Android to prevent invalid URL
parameter strings from being saved and used
- Cleans up a number of UI / view issues on Android regarding typos,
consistency, etc
- Hides vars from from the `relay` CLI we may not want to expose just
yet
- `get_device_id()` is flawed for connlib components -- SMBios is rarely
available. Data plane components now require a `FIREZONE_ID` now instead
to use for upserting.
Fixes#2482Fixes#2471
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
- rebuild and publish gateway and relay binaries to currently drafted
release
- re-tag current relay/gateway images and push to ghcr.io
Stacked on #2341 to prevent conflicts
Fixes#2223Fixes#2205Fixes#2202Fixes#2239
~~Still TODO: `arm64` images and binaries...~~ Edit: added via
`cross-rs`
Fixes#2363
* Rename `relay` package to `firezone-relay` so that binaries outputted
match the `firezone-*` cli naming scheme
* Rename `firezone-headless-client` package to `firezone-linux-client`
for consistency
* Add READMEs for user-facing CLI components (there will also be docs
later)
Upsides:
1. We don't need to maintain a separate repo and Dockerfile just for
Elixir image (permissions, runner labels, etc)
2. No need to push intermediate images to the container registry
3. No need to copy-paste alpine/erlang/elixir version and hashes from
`firezone/containers` to `elixir/dockerfile` every time they change
4. No need to cross-compile for local dev environments, better
experience building with slow internet connection
5. One command to test if our code works on our containers but a
different alpine/erlang/elixir version
Downsides:
1. Locally devs will need to compile Erlang at least once per version,
but the whole build takes ~6 minutes on my M1 Max. It also takes only 8
minutes on the free GitHub Actions runner without any cache.
2. Worse experience on slow machines
FYI: there is no performance penalty once we have cache layers, still
takes 30 seconds on CI.
This patch-set aims to make several improvements to our CI caching:
1. Use of registry as build cache: Pushes a separate image to our docker
registry at GCP that contains the cache layers. This happens for every
PR & main. As a result, we can restore from **both** which should make
repeated runs of CI on an individual PR faster and give us a good
baseline cache for new PRs from `main`. See
https://docs.docker.com/build/ci/github-actions/cache/#registry-cache
for details. As a nice side-effect, this allows us to use the 10 GB we
have on GitHub actions for other jobs.
2. We make better use of `restore-keys` by also attempting to restore
the cache if the fingerprint of our lockfiles doesn't match. This is
useful for CI runs that upgrade dependencies. Those will restore a cache
that is still useful although doesn't quite match. That is better[^1]
than not hitting the cache at all.
3. There were two tiny bugs in our Swift and Android builds:
a. We used `rustup show` in the wrong directory and thus did not
actually install the toolchain properly.
b. We used `shared-key` instead of `key` for the
https://github.com/Swatinem/rust-cache action and thus did not
differentiate between jobs properly.
5. Our Dockerfile for Rust had a bug where it did not copy in the
`rust-toolchain.toml` file in the `chef` layer and thus also did not use
the correctly toolchain.
6. We remove the dedicated gradle cache because the build action already
comes with a cache configuration:
https://github.com/firezone/firezone/actions/runs/6416847209/job/17421412150#step:10:25
[^1]: Over time, this may mean that our caches grow a bit. In an ideal
world, we automatically remove files from the caches that haven't been
used in a while. The cache action we use for Rust does that
automatically:
https://github.com/Swatinem/rust-cache?tab=readme-ov-file#cache-details.
As a workaround, we can just purge all caches every now and then.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Getting some weird behavior with AppLinks. They don't seem to work upon
first use and require a few tries to function correctly.
Edit: Found the issue: Android Studio doesn't like when the Manifest
contains variables for AppLinks. I added a note in the Manifest.
@conectado To test Applinks are working correctly, you can use the App
Link Assistant:
<img width="930" alt="Screenshot 2023-09-28 at 11 15 11 PM"
src="https://github.com/firezone/firezone/assets/167144/e4bd4674-d562-44ec-bdb8-3a5f97250b84">
Then from there you can click "Test App Links":
<img width="683" alt="Screenshot 2023-09-28 at 11 15 30 PM"
src="https://github.com/firezone/firezone/assets/167144/f3dc8e0d-f58a-4a4b-9855-62472096dc9e">
Basically we were having a panic inside a panic before, when I tried to
drop the runtime in `on_disconnect` since you can't drop a runtime
within a runtime. This PR spawns a new thread that listen for
disconnection and stops the runtime right there.
This also fixes the timer for reconnections.
Note: That I first stop it and the drop it which is redundant but I
rather be safe :)
This PR allows the TURN allocation binding to be optionally configured
by `TURN_LOWEST_PORT` and `TURN_HIGHEST_PORT` environment variables.
This will allow client app developers to test their apps against a
fully-working local development cluster in Docker Desktop for
Linux/macOS/Windows, allowing us to remove the PortalMock, Connlib Mock,
and SwiftMock codepaths entirely.
cc @roop @pratikvelani
Previously, we required the user to specify a `LISTEN_IP4_ADDR` and/or a
`LISTEN_IP6_ADDR` parameter. This is cumbersome because dynamically
fetching the address of the local interface is not trivial in all
environments.
We remove this parameter in exchange for listening on all interfaces.
This is a trade-off. The relay will now listen on all interfaces, even
the ones not exposed to the public internet. This is true for the main
socket on port 3478 and for all created allocations. Actually relaying
data relies on the 4-tuple of a "connection", i.e. the source and
destination address and port. Technically, I think it is possible with
this change to send traffic to a relay via an interface that was not
intended to be used for that. I think this will still require spoofing
the source address which is a known and accepted problem.
It is still recommended that operators put appropriate firewall rules in
place to not allow ingress traffic on any interface other than the one
intended for relaying.
I've tested locally that we are correctly using the `IPV6_ONLY` flag. In
other words, a relay listening on the `0.0.0.0` wildcard interface will
not accept IPv6 traffic and vice versa.
Resolves#1886.
This PR should fix the way we handle the `length` field in the
`DataChannel` messages, previous to this fix relaying data (using the
`webrtc-rs` crate) was impossible)
The new way to handle this is if the actual message is bigger than what
this data field says we ignore the extra bytes (which I think is the
correct way to do it according to spec)
Also, I added an integration test to verify relay messages using
`iptables`, not the cleanest way to do it but the easiest, in this vein
I tried to fix the caching for rust containers since 2 integration test
in our current state would take ~20 minutes each.
Previously, I thought it might be helpful to refuse a insecure
connections to the portal unless the user explicitly opts-in to this. In
our CI and testing environment, this however proved to cause more
headaches than it helps.
This PR removes this flag and assumes that users are smart enough that
they should protect self-hosted portals with transport-level encryption.