In the relay's `cloud-init.yaml`, we've overridden the `telemetry`
service log filter to be `debug`.
This results in the following log line being printed to Cloud Logging every 1s, for
_every_ relay:
```
2025-01-26T23:00:35.066Z debug memorylimiter/memorylimiter.go:200 Currently used memory. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 31}
```
These logs make up over half of our total log count, which in turn accounts
for over half of our Cloud Monitoring cost, the second-highest cost in our
GCP account.
This PR removes the override so that the relay app uses the same
`otel-collector` log level as the Elixir app: the default (presumably
`info`).
This does another (hopefully final) reversion of staging from the prod setup
to what we're after with respect to the relay infra.
Reverts firezone/firezone#7872
Google still had lingering Relay instance groups and subnets around from
a previous deployment; they had been deleted in the UI and were gone, but then
popped back up.
Theoretically, the instance groups should be deleted because there is no
current Terraform config matching them. This change will ensure that
instance groups also get rolled over based on the naming suffix
introduced in #7870.
Related: #7870
Turns out subnets need to have globally unique names as well. This PR
updates the instance-template, VPC, and subnet names to append an
8-character random string.
This random string "depends on" the subnet IP range configuration
specified above, so that if we change that in the future, causing a
network change, the naming will change as well.
Lastly, this `random_string` is also passed to the `relays` module to be
used in the instance template name prefix. While that name does _not_
need to be globally unique, the `instance_template` **needs** to be
rolled over when the subnets change, because otherwise it would contain a
network interface linked to both the old and new subnets, and GCP
will complain about that.
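A rough sketch of the wiring described above (resource, variable, and module input names here are illustrative, not the exact ones in this repo):
```hcl
# Assumed inputs: the VPC id and a map of region => subnet CIDR.
variable "network_id" {
  type = string
}

variable "relay_subnet_cidrs" {
  type = map(string)
}

# 8-character suffix that is re-generated whenever the subnet ranges change,
# which in turn forces new (globally unique) names for everything below.
resource "random_string" "naming_suffix" {
  length  = 8
  special = false
  upper   = false

  # "keepers": changing the subnet range configuration replaces the
  # random_string, so a network change also rolls the names.
  keepers = {
    ip_cidr_ranges = join(",", values(var.relay_subnet_cidrs))
  }
}

resource "google_compute_subnetwork" "relays" {
  for_each      = var.relay_subnet_cidrs
  name          = "relays-${each.key}-${random_string.naming_suffix.result}"
  region        = each.key
  network       = var.network_id
  ip_cidr_range = each.value
}

module "relays" {
  source = "./modules/relays" # illustrative path

  # The same suffix feeds the instance template name prefix so the template
  # is rolled over together with the subnets.
  naming_suffix = random_string.naming_suffix.result
}
```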
Reverts: firezone/firezone#7869
Since we now have the Relay configuration we want (and know that it works),
this PR rolls staging back to how it was before the Relay region changes, so we
can test that a single `terraform apply` on prod will deploy without any
errors.
This is causing issues applying because our CI terraform IAM user
doesn't have the `Billing Account Administrator` role.
Rather than granting such a sensitive role to our CI pipeline, I'm
suggesting we create the billing budget outside the scope of the
terraform config tracked in this repo.
If we want it tracked as code, I'd propose a separate (private) repository
with its own token / IAM permissions that we can monitor separately.
For the time being, I'll plan to manually create this budget in the UI.
Reverts: #7836
To help prevent surprises with unexpected cloud bills, we add a billing
budget that triggers an alert when the 50% spend threshold is hit.
The exact amount is considered secret and is set via variables that are
already defined in the HCP staging and prod envs.
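For reference, a GCP billing budget with a 50% alert threshold looks roughly like the sketch below (variable names are assumptions, not the exact ones used here):
```hcl
# Sketch: billing budget that notifies once 50% of a secret amount is spent.
resource "google_billing_budget" "monthly" {
  billing_account = var.billing_account_id # assumed variable
  display_name    = "Monthly budget"

  amount {
    specified_amount {
      currency_code = "USD"
      units         = var.billing_budget_amount # secret, set in the HCP envs
    }
  }

  # Trigger a notification when 50% of the budgeted amount is hit.
  threshold_rules {
    threshold_percent = 0.5
  }
}
```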
Even after all of the changes made to make the subnets update properly
in the Relays module, it will always fail because of these two facts
combined:
- the lifecycle is `create_before_destroy`
- the GCP instance template binds a network interface on a per-subnet
basis, and it cannot be bound to both the old and new subnets. The fix for
this would be to create a new instance group manager on each deploy
Rather than needlessly rolling over the relay networks on each deploy,
since they're not changing, it makes more sense to define them
outside of the Relays module so that they aren't tainted by code
changes. This prevents needless resource replacement and allows
the Relays module to use them as-is.
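In Terraform terms, that looks roughly like the following sketch (module path and variable names are placeholders, not the repo's exact ones):
```hcl
# Assumed input: a map of region => subnet CIDR.
variable "relay_regions" {
  type = map(string)
}

# The VPC and subnets live at the top level, outside the Relays module, so
# code changes inside the module never force them to be replaced.
resource "google_compute_network" "relays" {
  name                    = "relays"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "relays" {
  for_each      = var.relay_regions
  name          = "relays-${each.key}"
  region        = each.key
  network       = google_compute_network.relays.id
  ip_cidr_range = each.value
}

module "relays" {
  source = "./modules/relays" # illustrative path

  # The module consumes the existing network and subnets as-is.
  network = google_compute_network.relays.id
  subnets = { for region, subnet in google_compute_subnetwork.relays : region => subnet.id }
}
```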
#7733 fixed the randomness generation, but didn't fix the numbering.
According to [GCP docs](https://cloud.google.com/vpc/docs/subnets), we
can use virtually any RFC 1918 space for this.
This PR updates our numbering scheme to use the `10.128.0.0/9` space for
Relay subnets and changes the Elixir app to use `10.2.2.0/20` to prevent
collisions.
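As a sketch of what that numbering can look like in Terraform (the region list and the /24 slice size are assumptions, not the repo's exact values):
```hcl
# Assumed list of relay regions; each region's index picks its CIDR slice.
variable "relay_regions" {
  type    = list(string)
  default = ["us-east1", "europe-west1", "asia-southeast1"]
}

locals {
  # cidrsubnet("10.128.0.0/9", 15, n) yields the n-th /24 inside 10.128.0.0/9,
  # keeping Relay subnets well clear of the Elixir app's range.
  relay_subnet_cidrs = {
    for idx, region in var.relay_regions :
    region => cidrsubnet("10.128.0.0/9", 15, idx)
  }
}
```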
When a Relay's instances are updated / changed, the contained
subnetwork's `name` and `ip_cidr_range` need to be updated to something
else because we are using the `create_before_destroy` lifecycle
configuration for the Relays module.
To fix this, we need to make sure that when recreating Relays, we use a
unique `name` and `ip_cidr_range` for the new instances so as not to
conflict with existing ones.
To handle this, we use a computed, state-tracked value for
`ip_cidr_range` that automatically adjusts to the number of Relay
regions we have and is incremented each time the Relays are
recreated. We then update the `name` to include this range to ensure we
never have a subnet name that conflicts with an existing one.
This ensures that we run Prettier across all supported file types to check
for any formatting / style inconsistencies. Previously, it was only run
for files in the `website/` directory using a deprecated pre-commit
plugin.
The benefit to keeping this in our pre-commit config is that devs can
optionally run these checks locally with `pre-commit run --config
.github/pre-commit-config.yaml`.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
The expression for one of the rules could not be applied due to
invalid characters (`\n`), and even once the invalid characters were
removed, there is a limit of 5 subexpressions per expression; the previous
expression contained 10.
Along with the expression change, `deny(451)` is not allowed. The
only `deny` codes allowed are `403`, `404`, and `502`.
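For context, these Cloud Armor rules are defined roughly like the sketch below (the expression is a placeholder, not our actual rule): the expression must be a single line with at most 5 subexpressions, and only the `deny` codes listed above are accepted.
```hcl
# Sketch of a Cloud Armor rule with a single-line expression and an allowed
# deny code. The expression shown here is a placeholder.
resource "google_compute_security_policy" "default" {
  name = "default-policy"

  rule {
    action   = "deny(403)" # deny(451) is rejected; only 403, 404, 502 work
    priority = 1000

    match {
      expr {
        # Single line (no `\n`), at most 5 subexpressions.
        expression = "request.path.matches('/admin') && origin.region_code == 'CN'"
      }
    }
  }

  # Default catch-all rule.
  rule {
    action   = "allow"
    priority = 2147483647

    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
  }
}
```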
This PR reverts the commit that moves our IPv6 address out to a separate
subdomain (deploying that would cause prod downtime) and simply removes
the check that causes redirect loops.
Based on testing and research, it does not appear that Chrome will
reliably choose a consistent protocol stack for loading the initial web
page the way it does for connecting the WebSocket when connecting over VPN
tunnels. If one or the other stack experiences a slight delay or packet
loss causing retransmission, or QUIC simply doesn't play nicely with the
MTU (in our case 1280), it may fall back to IPv4 (which has less
per-packet overhead) or even a TCP connection.
Unfortunately this violates an assumption we have in the token validation
logic: namely, that the `remote_ip` used to create the token (via sign-in)
is the same one used to connect the WebSocket. I can see where this
logic comes from in a security context, but thinking through the attack
vector(s) that could leverage this violation has left me
wondering whether this check is worth the breakage we currently face in
#6511.
- Scenario 1: MITM - the attacker steals the token somehow via MITM (which
would require somehow breaking TLS). Such an attacker is already in our network
path and can already rewrite the `remote_ip` with their own.
- Scenario 2: Malicious browser plugin stealing session token. It will
be harder to spoof the remote IP in this case, but if this is a
possibility, the plugin could presumably directly control the tab where
the user is logged in.
- Scenario 3: IdP is compromised, leading to a malicious redirect before
arriving at Firezone - if this is the case, the user could likely log in
directly and create their own valid session token anyhow.
Perhaps I'm missing other scenarios, open to feedback. If we want to
ensure the token used by the websocket originated from the same browser
as it was minted from, perhaps we could generate a small random key,
save it in local storage, and send that in a header when connecting the
WebSocket. I think cookies handle that for us already though.
Fixes #6511
I recently discovered that the metrics reporting to Google Cloud Metrics
for the relays is actually working. Unfortunately, they are all bucketed
together because we don't set the metadata correctly.
This PR aims to fix that by setting some useful default metadata for
traces and metrics and, additionally, discovers the instance ID and name
from GCE metadata.
Related: #2033.
Without masquerading, packets sent by the gateway through the TUN
interface use the wrong source address (the TUN device's address)
instead of the gateway's actual network interface.
We set this env variable in all our uses of the gateway, so we might
as well remove it and always perform masquerading unconditionally.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes #5063, supersedes #5850
Other refactors and changes made as part of this:
- Adds the ability to disable DNS control on Windows
- Removes the spooky-action-at-a-distance `from_env` functions that used
to be buried in `tunnel`
- `FIREZONE_DNS_CONTROL` is now a regular `clap` argument again
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
I've finally managed to reserve enough e2 instances for our needs and
also used e2 for the gateways to work around the quota issues. The `web` app
still uses n2 because the quota doesn't allow additional n4s. Rollouts are also
fixed to not exceed the reservations/quotas.
These are now published at
https://www.github.com/firezone/terraform-aws-gateway and
https://www.github.com/firezone/terraform-azurerm-gateway to match the
unclear docs for registry module naming...
I don't believe we use or need TCP for the Relays; better to keep the ports
closed if so.
Also, the docker-compose.yml is updated to allow the `relay-1` service
to respond on all of its ports, since we don't typically need those mapped.
- Adds the AWS equivalent of our GCP scalable NAT Gateway (a rough sketch
of the core AWS resources is included at the end of this description).
- Adds a new kb section `/kb/automate` that will contain various
automation / IaC recipes going forward. It's better to have these
guides in the main docs with all the other info.
~~Will update the GCP example in another PR.~~
Portal helper docs in the gateway deploy page will come in another PR
after this is merged.
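As referenced in the first bullet, here is a rough sketch of the core AWS NAT Gateway resources (variable names are placeholders; the kb guide is the source of truth):
```hcl
# Placeholder inputs; in practice these come from the existing VPC config.
variable "vpc_id" {
  type = string
}

variable "public_subnet_id" {
  type = string
}

variable "private_subnet_id" {
  type = string
}

# An Elastic IP gives the NAT Gateway a stable egress address that can be
# allow-listed upstream.
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "this" {
  allocation_id = aws_eip.nat.id
  subnet_id     = var.public_subnet_id # NAT Gateway lives in a public subnet
}

# Route all outbound traffic from the private subnet (where the Firezone
# Gateway runs) through the NAT Gateway.
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.this.id
  }
}

resource "aws_route_table_association" "private" {
  subnet_id      = var.private_subnet_id
  route_table_id = aws_route_table.private.id
}
```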