Commit Graph

75 Commits

Author SHA1 Message Date
Andrew Dryga
ed5437c881 security(portal): Rework auth tokens (#2696)
- [x] make sure that session cookie for client is stored separately from
session cookie for the portal (will close #2647 and #2032)
- [x] #2622
- [ ] #2501
- [ ] show identity tokens and allow rotating/deleting them (#2138)
- [ ] #2042
- [ ] use Tokens context for Relays and Gateways to remove duplication
- [x] #2823
- [ ] Expire LiveView sockets when subject is expired
- [ ] Service Accounts UI is ambiguous now because of token identity and
actual token shown
- [ ] Limit subject permissions based on token type

Closes #2924. Now we extend the lifetime for client tokens, but not for
browsers.
2024-01-09 13:36:21 -06:00
Gabi
5edfe80eb0 connlib: tune disconnect parameters (#2977)
Should fix #2946 (still testing, trying to reproduce the error reported
in the issue)
2023-12-21 19:37:07 +00:00
Gabi
8e34457340 Add support for DNS sudomains (#2735)
This PR changes the protocol and adds support for DNS subdomains, now
when a DNS resource is added all its subdomains are automatically
tunneled too. Later we will add support for `*.domain` or `?.domain` but
currently there is an Apple split tunnel implementation limitation which
is too labor-intensive to fix right away.

Fixes #2661 

Co-authored-by: Andrew Dryga <andrew@dryga.com>
2023-12-08 00:16:42 -05:00
bmanifold
ef480e1acd Add routing option for sites (#2610)
Why:

* As sites are created, the default behavior right now is to route
traffic through whichever path is easiest/fastest. This commit adds the
ability to allow the admin to choose a routing policy for a given site.
2023-11-22 19:59:54 +00:00
Gabi
aec5b97012 Add performance tests for client-gateway communication (#2655) 2023-11-17 00:32:34 -06:00
Jamil
2bca378f17 Allow data plane configuration at runtime (#2477)
## Changelog

- Updates connlib parameter API_URL (formerly known under different
names as `CONTROL_PLANE_URL`, `PORTAL_URL`, `PORTAL_WS_URL`, and
friends) to be configured as an "advanced" or "hidden" feature at
runtime so that we can test production builds on both staging and
production.
- Makes `AUTH_BASE_URL` configurable at runtime too
- Moves `CONNLIB_LOG_FILTER_STRING` to be configured like this as well
and simplifies its naming
- Fixes a timing attack bug on Android when comparing the `csrf` token
- Adds proper account ID validation to Android to prevent invalid URL
parameter strings from being saved and used
- Cleans up a number of UI / view issues on Android regarding typos,
consistency, etc
- Hides vars from from the `relay` CLI we may not want to expose just
yet
- `get_device_id()` is flawed for connlib components -- SMBios is rarely
available. Data plane components now require a `FIREZONE_ID` now instead
to use for upserting.


Fixes #2482 
Fixes #2471

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
2023-10-30 23:46:53 -07:00
Andrew Dryga
98383e8622 Introduce Sites (#2516)
Closes #2513
2023-10-27 13:10:36 -06:00
Andrew Dryga
34cb88f5af Fix cache registry references 2023-10-25 13:52:50 -06:00
Jamil
fa57d66965 Publish Releases (#2344)
- rebuild and publish gateway and relay binaries to currently drafted
release
- re-tag current relay/gateway images and push to ghcr.io

Stacked on #2341 to prevent conflicts

Fixes #2223 
Fixes #2205 
Fixes #2202
Fixes #2239 

~~Still TODO: `arm64` images and binaries...~~ Edit: added via
`cross-rs`
2023-10-20 14:20:43 -07:00
Jamil
573124bd2f Document relay gateway client CLIs (#2424)
Fixes #2363 

* Rename `relay` package to `firezone-relay` so that binaries outputted
match the `firezone-*` cli naming scheme
* Rename `firezone-headless-client` package to `firezone-linux-client`
for consistency
* Add READMEs for user-facing CLI components (there will also be docs
later)
2023-10-19 00:59:17 +00:00
Jamil
6ec10b2669 Revert "Fix/website mdx" (#2434)
Reverts firezone/firezone#2433
2023-10-18 11:42:54 -07:00
Jamil
caef531b17 Fix/website mdx (#2433) 2023-10-18 11:42:18 -07:00
Andrew Dryga
0aab4077f8 Fix auth flow state, bump COS to 109, enable fluentbit logging, auto-remove docker registry artifacts (#2315) 2023-10-11 16:19:47 -06:00
Andrew Dryga
0eeefa03c7 Use postgres 15.2 in docker-compose (same as production) 2023-10-06 15:47:56 -06:00
Andrew Dryga
42bbafc04d Merge firezone/containers into elixir/Dockerfile for better reuse and maintainability (#2267)
Upsides:
1. We don't need to maintain a separate repo and Dockerfile just for
Elixir image (permissions, runner labels, etc)
2. No need to push intermediate images to the container registry
3. No need to copy-paste alpine/erlang/elixir version and hashes from
`firezone/containers` to `elixir/dockerfile` every time they change
4. No need to cross-compile for local dev environments, better
experience building with slow internet connection
5. One command to test if our code works on our containers but a
different alpine/erlang/elixir version

Downsides:
1. Locally devs will need to compile Erlang at least once per version,
but the whole build takes ~6 minutes on my M1 Max. It also takes only 8
minutes on the free GitHub Actions runner without any cache.
2. Worse experience on slow machines

FYI: there is no performance penalty once we have cache layers, still
takes 30 seconds on CI.
2023-10-06 15:34:47 -06:00
Andrew Dryga
a75e71ef7e Rename caches (#2255) 2023-10-05 10:01:15 -06:00
Thomas Eizinger
9a41983447 ci: optimize caching further (#2246)
This patch-set aims to make several improvements to our CI caching:

1. Use of registry as build cache: Pushes a separate image to our docker
registry at GCP that contains the cache layers. This happens for every
PR & main. As a result, we can restore from **both** which should make
repeated runs of CI on an individual PR faster and give us a good
baseline cache for new PRs from `main`. See
https://docs.docker.com/build/ci/github-actions/cache/#registry-cache
for details. As a nice side-effect, this allows us to use the 10 GB we
have on GitHub actions for other jobs.
2. We make better use of `restore-keys` by also attempting to restore
the cache if the fingerprint of our lockfiles doesn't match. This is
useful for CI runs that upgrade dependencies. Those will restore a cache
that is still useful although doesn't quite match. That is better[^1]
than not hitting the cache at all.
3. There were two tiny bugs in our Swift and Android builds:
a. We used `rustup show` in the wrong directory and thus did not
actually install the toolchain properly.
b. We used `shared-key` instead of `key` for the
https://github.com/Swatinem/rust-cache action and thus did not
differentiate between jobs properly.
5. Our Dockerfile for Rust had a bug where it did not copy in the
`rust-toolchain.toml` file in the `chef` layer and thus also did not use
the correctly toolchain.
6. We remove the dedicated gradle cache because the build action already
comes with a cache configuration:
https://github.com/firezone/firezone/actions/runs/6416847209/job/17421412150#step:10:25

[^1]: Over time, this may mean that our caches grow a bit. In an ideal
world, we automatically remove files from the caches that haven't been
used in a while. The cache action we use for Rust does that
automatically:
https://github.com/Swatinem/rust-cache?tab=readme-ov-file#cache-details.
As a workaround, we can just purge all caches every now and then.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-10-05 06:26:56 -07:00
Jamil
80234f9c71 Github Actions cache on main and scope caches for all languages/runtimes (#2233) 2023-10-04 17:29:04 -07:00
Thomas Eizinger
464efbad56 refactor(connlib): restructure directory for consistency (#2236) 2023-10-05 09:52:35 +11:00
Jamil
cd5a57f413 Update tokio-tungstenite to fix webpki vuln (#2181)
Fixes https://github.com/firezone/firezone/security/dependabot/75
Fixes https://github.com/firezone/firezone/security/dependabot/72
2023-10-02 19:35:42 +00:00
Jamil
c4c6f3e4ca refactor(portal): Don't pin session token to user_agent or remote_ip (#2195)
Removing the check to get Rust PRs to pass.

**Note**: #2182 was dependent on this one, and has since merged into
this one.
2023-09-30 07:40:57 -07:00
Jamil
72044cc065 refactor(android): Make app links more robust in the emulator (#2188)
Getting some weird behavior with AppLinks. They don't seem to work upon
first use and require a few tries to function correctly.

Edit: Found the issue: Android Studio doesn't like when the Manifest
contains variables for AppLinks. I added a note in the Manifest.

@conectado To test Applinks are working correctly, you can use the App
Link Assistant:

<img width="930" alt="Screenshot 2023-09-28 at 11 15 11 PM"
src="https://github.com/firezone/firezone/assets/167144/e4bd4674-d562-44ec-bdb8-3a5f97250b84">

Then from there you can click "Test App Links":

<img width="683" alt="Screenshot 2023-09-28 at 11 15 30 PM"
src="https://github.com/firezone/firezone/assets/167144/f3dc8e0d-f58a-4a4b-9855-62472096dc9e">
2023-09-29 18:09:04 +00:00
Jamil
a98f30a8dd fix(ci): Fix flaky integration tests (#2190) 2023-09-29 01:12:29 -07:00
Andrew Dryga
9281b7fede Allow client logs and messages instrumentation (#2086)
Closes #2019
2023-09-18 15:03:51 -06:00
Gabi
7d0e0acfe9 fix(connlib): assorted fixes (#1953)
* single stack ipv6/ipv4
* set mtu for linux connlib
* add iperf3 resource on dev docker-compose

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-08-28 23:47:00 +00:00
Gabi
8621953fe6 connlib: fix how we handle disconnect (#1923)
Basically we were having a panic inside a panic before, when I tried to
drop the runtime in `on_disconnect` since you can't drop a runtime
within a runtime. This PR spawns a new thread that listen for
disconnection and stops the runtime right there.

This also fixes the timer for reconnections.

Note: That I first stop it and the drop it which is redundant but I
rather be safe :)
2023-08-23 00:35:11 +00:00
Jamil
bf2d794064 feat(relay): allow configuration for lowest and highest allocation port (#1921)
This PR allows the TURN allocation binding to be optionally configured
by `TURN_LOWEST_PORT` and `TURN_HIGHEST_PORT` environment variables.

This will allow client app developers to test their apps against a
fully-working local development cluster in Docker Desktop for
Linux/macOS/Windows, allowing us to remove the PortalMock, Connlib Mock,
and SwiftMock codepaths entirely.

cc @roop @pratikvelani
2023-08-18 13:04:26 -07:00
Thomas Eizinger
79a24ca9cf feat(relay): remove LISTEN_IPX_ADDR parameters (#1922)
Previously, we required the user to specify a `LISTEN_IP4_ADDR` and/or a
`LISTEN_IP6_ADDR` parameter. This is cumbersome because dynamically
fetching the address of the local interface is not trivial in all
environments.

We remove this parameter in exchange for listening on all interfaces.
This is a trade-off. The relay will now listen on all interfaces, even
the ones not exposed to the public internet. This is true for the main
socket on port 3478 and for all created allocations. Actually relaying
data relies on the 4-tuple of a "connection", i.e. the source and
destination address and port. Technically, I think it is possible with
this change to send traffic to a relay via an interface that was not
intended to be used for that. I think this will still require spoofing
the source address which is a known and accepted problem.

It is still recommended that operators put appropriate firewall rules in
place to not allow ingress traffic on any interface other than the one
intended for relaying.

I've tested locally that we are correctly using the `IPV6_ONLY` flag. In
other words, a relay listening on the `0.0.0.0` wildcard interface will
not accept IPv6 traffic and vice versa.

Resolves #1886.
2023-08-18 09:44:41 +00:00
Jamil
82e70411ae Fix incorrect IPv6 subnets used (#1917)
`fc00::/7` is the local prefix space.

Fixes #1819
2023-08-17 08:40:56 +00:00
Gabi
577ce43942 Gabi/fix relay expected message size (#1911)
This PR should fix the way we handle the `length` field in the
`DataChannel` messages, previous to this fix relaying data (using the
`webrtc-rs` crate) was impossible)

The new way to handle this is if the actual message is bigger than what
this data field says we ignore the extra bytes (which I think is the
correct way to do it according to spec)

Also, I added an integration test to verify relay messages using
`iptables`, not the cleanest way to do it but the easiest, in this vein
I tried to fix the caching for rust containers since 2 integration test
in our current state would take ~20 minutes each.
2023-08-16 20:29:51 +00:00
Andrew Dryga
65bd044ab0 Enable google_workspace auth on staging and locally 2023-08-11 17:56:56 -05:00
Andrew Dryga
eb7f856a0c Remove unused env variable 2023-08-09 01:21:43 -05:00
Andrew Dryga
9e17352fd6 Deploy relays (#1706)
Will finish once #1705 is merged and stable.

cc @thomaseizinger
2023-08-08 17:15:33 -05:00
Thomas Eizinger
b1c324a01e relay: remove --allow-insecure-ws flag (#1871)
Previously, I thought it might be helpful to refuse a insecure
connections to the portal unless the user explicitly opts-in to this. In
our CI and testing environment, this however proved to cause more
headaches than it helps.

This PR removes this flag and assumes that users are smart enough that
they should protect self-hosted portals with transport-level encryption.
2023-08-08 16:45:02 +00:00
Gabi
b563c7ad5a connlib: fix ipv6 (#1855)
Fixes some of the ipv6 handling.

Making this PR I also realized we need to update checksums on UDP and
TCP too, since we're mangling packets.
2023-08-04 03:17:35 +00:00
Thomas Eizinger
4a72e33378 feat: allocate IPv6 address for relay in docker-compose.yml (#1852) 2023-08-04 03:17:08 +00:00
Gabi
7ad2fb623a connlib: add client dns interception support (#1807) 2023-07-24 21:41:42 +00:00
Gabi
c817473aef Feat/connlib handle error messages (#1735)
With this PR we handle in the client an error message due to
gateway/relay although rate limiting is needed.

Waiting for #1729 to be merged.
2023-07-06 18:47:01 +00:00
Gabi
eb5fc34f35 CI: add a flow that test client to resource ping (#1729)
This PR fixes a bunch of small things to allow a new flow to test
clients pinging a resource within docker compose.

Masquerade/Forwarding is enabled directly in the container for now, this
might change in the future.

Also added a README to be able to run this locally.

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-07-05 03:17:26 +00:00
Gabi
8967b53170 Feat/connlib full flow (#1722)
With this PR the full control-plane message flow is working.

Meaning that if you do:

```
docker compose up -d
docker compose exec -it client "ping 172.20.0.2" # will fix this IP later
```

Messages start flowing to gateway. The gateway still not correctly
forwards the messages to the resource since masquerading is still not
working, although I suspect there might be an additional problem. Will
fix this in my next PR along with a README on how to test this whole
flow.

This PR also fixes how we sent the stamp secret to the gateway from the
relay, but I still see some warnings in the webrtc that I'm sure that
are due to a mismatch between how webrtc-rs and the relay handle
messages (The most important being `bind() failed: unexpected response
type`), I will take a look at that and a way to test that the flow works
when:
1. hole-punching is available
2. through relay when it's not
Since the flow right now works without hole-punching or relay since the
gateway is in the same network in the docker compose.
2023-07-03 19:25:37 +00:00
Andrew Dryga
710aa71778 Wait for client and gateway containers for api to become ready 2023-06-29 14:09:12 -06:00
Gabi
720b2f8cd9 Fix/docker compose up (#1705)
This PR fixes `docker compose up` but it doesn't have the test client ->
resource flow working but it prevent anything from erroring at startup.

This fixes:
* tokens (use the correct token for the client user agent we are using)
* randomize `name_suffix` at start up for connlib (we will eventually
allow options to set it manually)
* remove port ranges for relay (see firezone/product#613)
2023-06-28 18:48:33 +00:00
bmanifold
d5d39b9c35 CONTRIBUTING.md updates (#1704)
**Update CONTRIBUTING.md**

Why:

* The CONTRIBUTING.md doc seems to have fallen slightly out of date with
      how Firezone now works.  This commit updates the doc to provide a
quick start guide for getting all of the various Firezone components
up and running as quick as possible. The doc then links to the more
      specific `Elixir` and `Rust` README.md files in the respective
      directories to help developers who would like to contribute.
      
**Update docker-compose vault health check**

 Why:

* The current Vault health check listed in the docker-compose file does
not seem to be working when using `localhost` in the `wget` command.
      Updating the URL to use `127.0.0.1` seems to have fixed it.

---------

Signed-off-by: bmanifold <bmanifold@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-06-27 19:10:12 -07:00
Andrew Dryga
e7d5d0579b Authentication for the live app (#1674)
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-06-27 13:11:36 -06:00
Gabi
b9bd34b5f1 docker: fix building for macos (#1700)
There are problems building the docker images in macos using musl due to
ring's problems therefore we started using slim-debian with glibc for
development.
2023-06-26 22:50:14 +00:00
Gabi
e9be4b9ef5 connlib: moves it to the main firezone library
This brindgs connlib from its own separated repo to firezone's monorepo.
    
 On top of bringing connlib we also add and unify the Dockerfile for all
 rust binaries and add a docker-compose that can run a headless client, a
 relay and a gateway which eventually will test the whole flow between a
 client and a resource. For this to work we also incorporated some elixir
 scripts to generate portal tokens for those components.
2023-06-23 16:39:58 -06:00
Andrew Dryga
e039f1919d Hotifx seeds and references (#1689) 2023-06-23 15:09:52 -06:00
Andrew Dryga
9083ab79aa Set correct outbound email in local env 2023-06-06 17:13:54 -06:00
Andrew Dryga
d9eb2d18df Deployment for the cloud version (#1638)
TODO:
- [x] Cluster formation for all API and web nodes
- [x] Injest Docker logs to Stackdriver
- [x] Fix assets building for prod

To finish later:
- [ ] Structured logging:
https://issuetracker.google.com/issues/285950891
- [ ] Better networking policy (eg. use public postmark ranges and deny
all unwanted egress)
- [ ] OpenTelemetry collector for Google Stackdriver
- [ ] LoggerJSON.Plug integration

---------

Signed-off-by: Andrew Dryga <andrew@dryga.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2023-06-06 15:03:26 -06:00
Andrew Dryga
37a2d7b7f5 Move elixir code to a subfolder (#1631) 2023-05-24 15:46:51 -06:00