# Welcome to Elixir-land!
This README provides an overview for running and managing Firezone's
Elixir-based control plane.
## Running Control Plane for local development
You can use the [Top-Level Docker Compose](../docker-compose.yml) to start any
services locally. The `web` and `api` compose services are built application
releases that are pretty much the same as the ones we run in production, while
the `elixir` compose service runs raw Elixir code, without a built release.
This means you'll want to use the `elixir` compose service to run Mix tasks and
any Elixir code on-the-fly, but you can't do that in `web`/`api` so easily
because Elixir strips out Mix and other tooling
[when building an application release](https://hexdocs.pm/mix/Mix.Tasks.Release.html).
`elixir` additionally caches `_build` and `node_modules` to speed up compilation
time and syncs `/apps`, `/config` and other folders with the host machine.
```bash
# Make sure to run this every time code in elixir/ changes,
# docker doesn't do that for you!
❯ docker-compose build
# Create the database
#
# Hint: you can run any mix commands like this,
# eg. mix ecto.reset will reset your database
#
# Also to drop the database you need to stop all active connections,
# so if you get an error stop all services first:
#
# docker-compose down
#
# Or you can just run both reset and seed in one-liner:
#
# docker-compose run elixir /bin/sh -c "cd apps/domain && mix do ecto.reset, ecto.seed"
#
❯ docker-compose run elixir /bin/sh -c "cd apps/domain && mix ecto.create"
# Ensure database is migrated before running seeds
❯ docker-compose run api bin/migrate
# or
❯ docker-compose run elixir /bin/sh -c "cd apps/domain && mix ecto.migrate"
# Seed the database
# Hint: some access tokens will be generated and written to stdout,
# don't forget to save them for later
❯ docker-compose run api bin/seed
# or
❯ docker-compose run elixir /bin/sh -c "cd apps/domain && mix ecto.seed"
# Start the API service for control plane sockets while listening to STDIN
# (where you will see all the logs)
❯ docker-compose up api --build
```
Now you can verify that it's working by connecting to a websocket:
Gateway
```bash
# Note: The token value below is an example. The token value you will need is generated and printed out when seeding the database, as described earlier in the document.
❯ export GATEWAY_TOKEN_FROM_SEEDS=".SFMyNTY.g2gDaANtAAAAJGM4OWJjYzhjLTkzOTItNGRhZS1hNDBkLTg4OGFlZjZkMjhlMG0AAAAkMjI3NDU2MGItZTk3Yi00NWU0LThiMzQtNjc5Yzc2MTdlOThkbQAAADhPMDJMN1VTMkozVklOT01QUjlKNklMODhRSVFQNlVPOEFRVk82VTVJUEwwVkpDMjJKR0gwPT09PW4GAF3gLBONAWIAAVGA.DCT0Qv80qzF5OQ6CccLKXPLgzC3Rzx5DqzDAh9mWAww"
❯ websocat --header="User-Agent: iOS/12.7 (iPhone) connlib/0.7.412" "ws://127.0.0.1:13000/gateway/websocket?token=${GATEWAY_TOKEN_FROM_SEEDS}&external_id=thisisrandomandpersistent&name=kkX1&public_key=kceI60D6PrwOIiGoVz6hD7VYCgD1H57IVQlPJTTieUE="
# After this you need to join the `gateway` topic.
# For details on this structure see https://hexdocs.pm/phoenix/Phoenix.Socket.Message.html
❯ {"event":"phx_join","topic":"gateway","payload":{},"ref":"unique_string_ref","join_ref":"unique_join_ref"}
{"ref":"unique_string_ref","payload":{"status":"ok","response":{}},"topic":"gateway","event":"phx_reply"}
{"ref":null,"payload":{"interface":{"ipv6":"fd00:2021:1111::35:f630","ipv4":"100.77.125.87"},"ipv4_masquerade_enabled":true,"ipv6_masquerade_enabled":true},"topic":"gateway","event":"init"}
```
Relay
```bash
# Note: The token value below is an example. The token value you will need is generated and printed out when seeding the database, as described earlier in the document.
❯ export RELAY_TOKEN_FROM_SEEDS=".SFMyNTY.g2gDaANtAAAAJGM4OWJjYzhjLTkzOTItNGRhZS1hNDBkLTg4OGFlZjZkMjhlMG0AAAAkNTQ5YzQxMDctMTQ5Mi00ZjhmLWE0ZWMtYTlkMmE2NmQ4YWE5bQAAADhQVTVBSVRFMU84VkRWTk1ITU9BQzc3RElLTU9HVERJQTY3MlM2RzFBQjAyT1MzNEg1TUUwPT09PW4GAJeo1TONAWIAAVGA.Vi3gCkFKoWH03uSUshAYYzRhw7eKQxYw1piFnkFPGtA"
❯ websocat --header="User-Agent: Linux/5.2.6 (Debian; x86_64) relay/0.7.412" "ws://127.0.0.1:8081/relay/websocket?token=${RELAY_TOKEN_FROM_SEEDS}&ipv4=24.12.79.100&ipv6=4d36:aa7f:473c:4c61:6b9e:2416:9917:55cc"
# Here is what you will see in docker logs firezone-api-1
# {"time":"2023-06-05T23:16:01.537Z","severity":"info","message":"CONNECTED TO API.Relay.Socket in 251ms\n Transport: :websocket\n Serializer: Phoenix.Socket.V1.JSONSerializer\n Parameters: %{\"ipv4\" => \"24.12.79.100\", \"ipv6\" => \"4d36:aa7f:473c:4c61:6b9e:2416:9917:55cc\", \"stamp_secret\" => \"[FILTERED]\", \"token\" => \"[FILTERED]\"}","metadata":{"domain":["elixir"],"erl_level":"info"}}
# After this you need to join the `relay` topic and pass a `stamp_secret` in the payload.
# For details on this structure see https://hexdocs.pm/phoenix/Phoenix.Socket.Message.html
❯ {"event":"phx_join","topic":"relay","payload":{"stamp_secret":"makemerandomplz"},"ref":"unique_string_ref","join_ref":"unique_join_ref"}
{"event":"phx_reply","payload":{"response":{},"status":"ok"},"ref":"unique_string_ref","topic":"relay"}
{"event":"init","payload":{},"ref":null,"topic":"relay"}
```
Client
```bash
# Note: The token value below is an example. The token value you will need is generated and printed out when seeding the database, as described earlier in the document.
❯ export CLIENT_TOKEN_FROM_SEEDS="n.SFMyNTY.g2gDaANtAAAAJGM4OWJjYzhjLTkzOTItNGRhZS1hNDBkLTg4OGFlZjZkMjhlMG0AAAAkN2RhN2QxY2QtMTExYy00NGE3LWI1YWMtNDAyN2I5ZDIzMGU1bQAAACtBaUl5XzZwQmstV0xlUkFQenprQ0ZYTnFJWktXQnMyRGR3XzJ2Z0lRdkZnbgYAGUmu74wBYgABUYA.UN3vSLLcAMkHeEh5VHumPOutkuue8JA6wlxM9JxJEPE"
# Panel will only accept token if it's coming with this User-Agent header and from IP 172.28.0.1
❯ export CLIENT_USER_AGENT="iOS/12.5 (iPhone) connlib/0.7.412"
❯ websocat --header="User-Agent: ${CLIENT_USER_AGENT}" "ws://127.0.0.1:8081/client/websocket?token=${CLIENT_TOKEN_FROM_SEEDS}&external_id=thisisrandomandpersistent&name=kkX1&public_key=kceI60D6PrwOIiGoVz6hD7VYCgD1H57IVQlPJTTieUE="
# Here is what you will see in docker logs firezone-api-1
# firezone-api-1 | {"domain":["elixir"],"erl_level":"info","logging.googleapis.com/sourceLocation":{"file":"lib/phoenix/logger.ex","line":306,"function":"Elixir.Phoenix.Logger.phoenix_socket_connected/4"},"message":"CONNECTED TO API.Client.Socket in 83ms\n Transport: :websocket\n Serializer: Phoenix.Socket.V1.JSONSerializer\n Parameters: %{\"external_id\" => \"thisisrandomandpersistent\", \"name\" => \"kkX1\", \"public_key\" => \"[FILTERED]\", \"token\" => \"[FILTERED]\"}","severity":"INFO","time":"2023-06-23T21:01:49.566Z"}
# After this you need to join the `client` topic and pass a `stamp_secret` in the payload.
# For details on this structure see https://hexdocs.pm/phoenix/Phoenix.Socket.Message.html
❯ {"event":"phx_join","topic":"client","payload":{},"ref":"unique_string_ref","join_ref":"unique_join_ref"}
{"ref":"unique_string_ref","topic":"client","event":"phx_reply","payload":{"status":"ok","response":{}}}
{"ref":null,"topic":"client","event":"init","payload":{"interface":{"ipv6":"fd00:2021:1111::11:f4bd","upstream_dns":[],"ipv4":"100.71.71.245"},"resources":[{"id":"4429d3aa-53ea-4c03-9435-4dee2899672b","name":"172.20.0.1/16","type":"cidr","address":"172.20.0.0/16"},{"id":"85a1cffc-70d3-46dd-aa6b-776192af7b06","name":"gitlab.mycorp.com","type":"dns","address":"gitlab.mycorp.com","ipv6":"fd00:2021:1111::5:b370","ipv4":"100.85.109.146"}]}}
# List online relays for a Resource
❯ {"event":"prepare_connection","topic":"client","payload":{"resource_id":"1f27735f-651d-49e8-840c-8f1ba581d88e"},"ref":"unique_prepare_connection_ref"}
{"ref":"unique_prepare_connection_ref","payload":{"status":"ok","response":{"relays":[{"type":"turn","uri":"turn:189.172.72.111:3478","username":"1738022400:4ZxvSNDzU98MJiEjsR8DOA","password":"TVZvSgIGFK0TtNDXFVU9gv9a1WDz2Ou7RTEUis4E6To","expires_at":1738022400},{"type":"turn","uri":"turn:[::1]:3478","username":"1738022400:KCYrRTRmfGNAEEe7KyjHkA","password":"8KYplQOKBf5smJRZDhC54kiKKNVmUxsVxH1V8xfY/do","expires_at":1738022400}],"resource_id":"1f27735f-651d-49e8-840c-8f1ba581d88e","gateway_remote_ip":"127.0.0.1","gateway_id":"6e52c0ce-ccd9-46d9-8715-796ec9812719"}},"topic":"client","event":"phx_reply"}
{"event":"request_connection","topic":"client","payload":{"resource_id":"1f27735f-651d-49e8-840c-8f1ba581d88e","client_payload":"RTC_SD","client_preshared_key":"+HapiGI5UdeRjKuKTwk4ZPPYpCnlXHvvqebcIevL+2A="},"ref":"unique_request_connection_ref"}
# Initiate connection to a resource
❯ {"event":"request_connection","topic":"client","payload":{"gateway_id":"6e52c0ce-ccd9-46d9-8715-796ec9812719","resource_id":"1f27735f-651d-49e8-840c-8f1ba581d88e","client_payload":"RTC_SD","client_preshared_key":"+HapiGI5UdeRjKuKTwk4ZPPYpCnlXHvvqebcIevL+2A="},"ref":"unique_request_connection_ref"}
```
Note: when you run multiple commands it can hang because Phoenix expects a
heartbeat packet every 5 seconds, so it can kill your websocket if you send
commands slower than that.
You can reset the database (eg. when there is a migration that breaks data model
for unreleased versions) using following command:
```bash
❯ docker-compose run elixir /bin/sh -c "cd apps/domain && mix ecto.reset"
```
Stopping everything is easy too:
```bash
docker-compose down
```
## Useful commands for local testing and debugging
Connecting to an IEx interactive console:
```bash
❯ docker-compose run elixir /bin/sh -c "cd apps/domain && iex -S mix"
```
Connecting to a running api/web instance shell:
```bash
❯ docker exec -it firezone-api-1 sh
/app
```
Connecting to a running api/web instance to run Elixir code from them:
```bash
# Start all services in daemon mode (in background)
❯ docker-compose up -d --build
# Connect to a running API node
❯ docker exec -it firezone-api-1 bin/api remote
Erlang/OTP 25 [erts-13.1.4] [source] [64-bit] [smp:5:5] [ds:5:5:10] [async-threads:1]
Interactive Elixir (1.14.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(api@127.0.0.1)1>
# Connect to a running Web node
❯ docker exec -it firezone-web-1 bin/web remote
Erlang/OTP 25 [erts-13.1.4] [source] [64-bit] [smp:5:5] [ds:5:5:10] [async-threads:1]
Interactive Elixir (1.14.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(web@127.0.0.1)1>
```
From `iex` shell you can run any Elixir code, for example you can emulate a full
flow using process messages, just keep in mind that you need to run seeds before
executing this example:
```elixir
[gateway | _rest_gateways] = Domain.Repo.all(Domain.Gateways.Gateway)
:ok = Domain.Gateways.connect_gateway(gateway)
[relay | _rest_relays] = Domain.Repo.all(Domain.Relays.Relay)
relay_secret = Domain.Crypto.random_token()
:ok = Domain.Relays.connect_relay(relay, relay_secret)
```
Now if you connect and list resources there will be one online because there is
a relay and gateway online.
Some of the functions require authorization, here is how you can obtain a
subject:
```elixir
user_agent = "User-Agent: iOS/12.7 (iPhone) connlib/0.7.412"
remote_ip = {127, 0, 0, 1}
# For a client
context = %Domain.Auth.Context{type: :client, user_agent: user_agent, remote_ip: remote_ip}
{:ok, subject} = Domain.Auth.authenticate(client_token, context)
# For an admin user, imitating the browser session
context = %Domain.Auth.Context{type: :browser, user_agent: user_agent, remote_ip: remote_ip}
provider = Domain.Repo.get_by(Domain.Auth.Provider, adapter: :userpass)
identity = Domain.Repo.get_by(Domain.Auth.Identity, provider_id: provider.id, provider_identifier: "firezone@localhost.local")
token = Domain.Auth.create_token(identity, context, "", nil)
browser_token = Domain.Tokens.encode_fragment!(token)
{:ok, subject} = Domain.Auth.authenticate(browser_token, context)
```
Listing connected gateways, relays, clients for an account:
```elixir
account_id = "c89bcc8c-9392-4dae-a40d-888aef6d28e0"
%{
gateways: Domain.Gateways.Presence.list("gateways:#{account_id}"),
relays: Domain.Relays.Presence.list("relays:#{account_id}"),
clients: Domain.Clients.Presence.list("clients:#{account_id}"),
}
```
### Connecting billing in dev mode for manual testing
Prerequisites:
- A Stripe account (Note: for the Firezone team, you will need to be invited to
the Firezone Stripe account)
- [Stripe CLI](https://github.com/stripe/stripe-cli)
Steps:
1. Use static seeds to provision account ID that corresponds to staging setup on
Stripe:
```bash
STATIC_SEEDS=true mix do ecto.reset, ecto.seed
```
1. Start Stripe CLI webhook proxy:
```bash
stripe listen --forward-to localhost:13001/integrations/stripe/webhooks
```
1. Start the Phoenix server with enabled billing from the [`elixir/`](./) folder
using a [test mode token](https://dashboard.stripe.com/test/apikeys):
```bash
cd elixir/
BILLING_ENABLED=true STRIPE_SECRET_KEY="...copy from stripe dashboard..." STRIPE_WEBHOOK_SIGNING_SECRET="...copy from stripe cli tool.." mix phx.server
```
When updating the billing plan in stripe, use the
[Stripe Testing Docs](https://docs.stripe.com/testing#testing-interactively) for
how to add test payment info
### WorkOS integration
WorkOS is currently being used for JumpCloud directory sync integration. This
allows JumpCloud users to use SCIM on the JumpCloud side, rather than having to
give Firezone an admin JumpCloud API token.
#### Connecting WorkOS in dev mode for manual testing
If you are not planning to use the JumpCloud provider in your local development
setup, then no additional setup is needed. However, if you do need to use the
JumpCloud provider locally, you will need to obtain an API Key and Client ID
from the [WorkOS Dashboard](https://dashboard.workos.com/api-keys).
To obtain a WorkOS dashboard login, contact one of the following Firezone team
members:
- @jamilbk
- @bmanifold
- @AndrewDryga
Once you are able to login to the WorkOS Dashboard, make sure that you have
selected the 'Staging' environment within WorkOS. Navigate to the API Keys page
and use the `Create Key` button to obtain credentials.
After obtaining WorkOS API credentials, you will need to make sure they are set
in the environment ENVs when starting your local dev instance of Firezone. As an
example:
```bash
cd elixir/
WORKOS_API_KEY="..." WORKOS_CLIENT_ID="..." mix phx.server
```
### Acceptance tests
You can disable headless mode for the browser by adding
```elixir
@tag debug: true
feature ....
```
to the acceptance test that you are running.
## Connecting to a staging or production instance
We use Google Cloud Platform for all our staging and production infrastructure.
You'll need access to this env to perform the commands below; to request access
you need to complete the following process:
- Open a PR adding yourself to `project_owners` in `main.tf` for each of the
[environments](../terraform/environments) you need access.
- Request a review from an existing project owner.
- Once approved, merge the PR and verify access by continuing with one of the
steps below.
This is a danger zone so first of all, ALWAYS make sure on which environment
your code is running:
```bash
❯ gcloud config get project
firezone-staging
```
Then you want to figure out which specific instance you want to connect to:
```bash
❯ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
api-b02t us-east1-d n1-standard-1 10.128.0.22 RUNNING
api-srkp us-east1-d n1-standard-1 10.128.0.23 RUNNING
web-51wd us-east1-d n1-standard-1 10.128.0.21 RUNNING
web-6k3n us-east1-d n1-standard-1 10.128.0.20 RUNNING
```
SSH into the host VM:
```bash
❯ gcloud compute ssh api-b02t --tunnel-through-iap
No zone specified. Using zone [us-east1-d] for instance: [api-b02t].
...
########################[ Welcome ]########################
# You have logged in to the guest OS. #
# To access your containers use 'docker attach' command #
###########################################################
andrew@api-b02t ~ $ $(docker ps | grep klt- | head -n 1 | awk '{split($NF, arr, "-"); print "docker exec -it "$NF" bin/"arr[2]" remote";}')
Erlang/OTP 25 [erts-13.1.4] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1] [jit]
Interactive Elixir (1.14.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(api@api-b02t.us-east1-d.c.firezone-staging.internal)1>
```
One-liner to connect to a running application container:
```bash
❯ gcloud compute ssh $(gcloud compute instances list | grep "web-" | tail -n 1 | awk '{ print $1 }') --tunnel-through-iap -- '$(docker ps | grep klt- | head -n 1 | awk '\''{split($NF, arr, "-"); print "docker exec -it " $NF " bin/" arr[2] " remote";}'\'')'
Interactive Elixir (1.15.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(web@web-w2f6.us-east1-d.c.firezone-staging.internal)1>
```
### Quickly provisioning an account
Useful for onboarding beta customers. See the `Domain.Ops.provision_account/1`
function:
```elixir
iex> Domain.Ops.create_and_provision_account(%{
name: "Customer Account",
slug: "customer_account",
admin_name: "Test User",
admin_email: "test@firezone.localhost"
})
```
### Creating an account on staging instance using CLI
```elixir
❯ gcloud compute ssh web-3vmw --tunnel-through-iap
andrew@web-3vmw ~ $ docker ps --format json | jq '"\(.ID) \(.Image)"'
"09eff3c0ebe8 us-east1-docker.pkg.dev/firezone-staging/firezone/web:b9c11007a4e230ab28f0138afc98188b1956dfd3"
andrew@web-3vmw ~ $ docker exec -it 09eff3c0ebe8 bin/web remote
Erlang/OTP 26 [erts-14.0.2] [source] [64-bit] [smp:1:1] [ds:1:1:20] [async-threads:1] [jit]
Interactive Elixir (1.15.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)1> {:ok, account} = Domain.Accounts.create_account(%{name: "Firezone", slug: "firezone"})
{:ok, ...}
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)2> {:ok, email_provider} = Domain.Auth.create_provider(account, %{name: "Email (OTP)", adapter: :email, adapter_config: %{}})
{:ok, ...}
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)3> {:ok, actor} = Domain.Actors.create_actor(account, %{type: :account_admin_user, name: "Andrii Dryga"})
{:ok, ...}
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)4> {:ok, identity} = Domain.Auth.upsert_identity(actor, email_provider, %{provider_identifier: "a@firezone.dev", provider_identifier_confirmation: "a@firezone.dev"})
...
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)5> context = %Domain.Auth.Context{type: :browser, user_agent: "User-Agent: iOS/12.7 (iPhone) connlib/0.7.412", remote_ip: {127, 0, 0, 1}}
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)6> {:ok, identity} = Domain.Auth.Adapters.Email.request_sign_in_token(identity, context)
{:ok, ...}
iex(web@web-3vmw.us-east1-d.c.firezone-staging.internal)7> Domain.Mailer.AuthEmail.sign_in_link_email(identity) |> Domain.Mailer.deliver()
{:ok, %{id: "d24dbe9a-d0f5-4049-ac0d-0df793725a80"}}
```
### Obtaining admin subject on staging
```elixir
❯ gcloud compute ssh web-2f4j --tunnel-through-iap -- '$(docker ps | grep klt- | head -n 1 | awk '\''{split($NF, arr, "-"); print "docker exec -it " $NF " bin/" arr[2] " remote";}'\'')'
Erlang/OTP 26 [erts-14.0.2] [source] [64-bit] [smp:1:1] [ds:1:1:20] [async-threads:1] [jit]
Interactive Elixir (1.15.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)1> account_id = "REPLACE_ME"
...
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)2> context = %Domain.Auth.Context{type: :browser, user_agent: "User-Agent: iOS/12.7 (iPhone) connlib/0.7.412", remote_ip: {127, 0, 0, 1}}
...
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)3> [actor | _] = Domain.Actors.Actor.Query.by_type(:account_admin_user) |> Domain.Actors.Actor.Query.by_account_id(account_id) |> Domain.Repo.all()
...
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)4> [identity | _] = Domain.Auth.Identity.Query.by_actor_id(actor.id) |> Domain.Repo.all()
...
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)5> token = Domain.Auth.create_token(identity, context, "", nil)
...
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)6> browser_token = Domain.Tokens.encode_fragment!(token)
...
iex(web@web-2f4j.us-east1-d.c.firezone-staging.internal)7> {:ok, subject} = Domain.Auth.authenticate(browser_token, context)
```
### Rotate relay token
```elixir
iex(web@web-xxxx.us-east1-d.c.firezone-staging.internal)1> group = Domain.Repo.one!(Domain.Relays.Group.Query.global())
...
iex(web@web-xxxx.us-east1-d.c.firezone-staging.internal)2> {:ok, token} = Domain.Relays.create_token(group, %{})
...
```
## Connection to production Cloud SQL instance
Install
[`cloud-sql-proxy`](https://cloud.google.com/sql/docs/postgres/connect-instance-auth-proxy)
(eg. `brew install cloud-sql-proxy`) and run:
First, obtain a fresh token:
```bash
gcloud auth application-default login
```
```bash
cloud-sql-proxy --auto-iam-authn "firezone-prod:us-east1:firezone-prod?address=0.0.0.0&port=9000"
```
Then you can connect to the PostgreSQL using `psql`:
```bash
# Use your work email as username to connect
PG_USER=$(gcloud auth list --filter=status:ACTIVE --format="value(account)" | head -n 1)
psql "host=localhost port=9000 sslmode=disable dbname=firezone user=${PG_USER}"
```
### Connecting to Cloud SQL instance as the `firezone` user
Some operations like DROP'ing indexes to recreate them require you to connect as the table owner, which in our case is the `firezone` user.
The password for this user is randomly generated by Terraform, so to connect as this user you need to obtain the password
from the Application configuration inside a running elixir container.
First, [obtain an iex shell](#connecting-to-a-staging-or-production-instances), then view the password with:
```elixir
Application.get_env(:domain, Domain.Repo)
```
Now, you can connect to the Cloud SQL instance as the `firezone` user:
```bash
psql "host=localhost port=9000 sslmode=disable dbname=firezone user=firezone"
```
## Deploying
### Apply Terraform changes without deploying new containers
This can be helpful when you want to quickly iterate over Terraform configuration in staging environment, without
having to merge for every single apply attempt.
Switch to the staging environment:
```bash
cd terraform/environments/staging
```
and apply changes reusing previous container versions:
```bash
terraform apply -var image_tag=$(terraform output -raw image_tag)
```
### Deploying production
Before deploying, check if the `main` branch has any breaking changes since the last deployment. You can do this by comparing the `main` branch with the last deployed commit, which you can find [here](https://github.com/firezone/firezone/deployments/gcp_production).
Here is a one-liner to open the comparison in your browser:
```bash
open "https://github.com/firezone/firezone/compare/$(curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" "https://api.github.com/repos/firezone/firezone/actions/workflows/deploy.yml/runs?status=completed&per_page=1" | jq -r '.workflow_runs[0].head_commit.id')...main"
```
If there are any breaking changes, make sure to confirm with the rest of the team on a rollout strategy before proceeding with any of the steps listed below.
Then, go to ["Deploy Production"](https://github.com/firezone/firezone/actions/workflows/deploy.yml) CI workflow and click "Run Workflow".
1. In the form that appears, read the warning and check the checkbox next to it.
2. The main branch is selected by default for deployment. To deploy a previous version, enter the commit SHA in the "Image tag to deploy" field.
The commit MUST be from the `main` branch.
3. Click "Run Workflow" to start the process.
The workflow will run all the way till the `deploy-production` step (which runs `terraform apply`) and wait for an approval from one of the project owners,
message one of your colleagues to approve it.
#### Deployment Takes Too Long to Complete
Typically, `terraform apply` takes around 15 minutes in production. If it's taking longer (or you want to monitor the status), here are a few things you can check:
1. **Monitor the run status in [Terraform Cloud](https://app.terraform.io/app/firezone/workspaces/production/runs).**
2. **Check the status of Instance Groups in [Google Cloud Console](https://console.cloud.google.com/compute/instanceGroups/list?project=firezone-prod).**
3. [Check the logs](#viewing-logs) for the deployed instances.
For instance groups stuck in the `UPDATING` state:
- Open the group and look for any errors. Typically, if deployment is stuck, you'll find one instance in the group with an error (and a recent creation time), while the others are pending updates.
- To quickly view logs for that instance, click the instance name and then click the `Logging` link.
_Do not panic—our production environment should remain stable. GCP and Terraform are designed to keep old instances running until the new ones are healthy._
#### Common Reasons for Deployment Issues
**1. A Bug in the Code**
- This can either crash the instance or make it unresponsive (you’ll notice failing health checks and error logs).
- If this happens, ensure there were no database migrations as part of the changes (check `priv/repo/migrations`).
- If no migrations are involved, rollback the deployment. To do this, cancel the currently running deployment,
find the last successful deployment in Terraform Cloud, copy the `image_tag` from its output, and run:
```bash
cd terraform/environments/production
terraform apply -var image_tag=
```
- You can also rollback a specific component by overriding its image tag in the `terraform apply` command:
```bash
terraform apply -var image_tag= -var _image_tag=
```
_If there were migrations and they’ve already been applied, proceed to the next option._
**2. An Issue with the Migration**
- You’ll notice failing health checks and error logs related to the migration.
- You can either:
- Fix the data causing the migration to fail (refer to [Connection to Production Cloud SQL Instance](#connection-to-production-cloud-sql-instance)).
- Fix the migration code and redeploy.
**3. Insufficient Resources to Deploy New Instances**
- If there are no errors but updates are pending, there might not be enough resources to deploy new instances.
- This can be found in the Errors tab of the instance group.
Typically, this issue resolves itself as old reservations are freed up.
## Monitoring and Troubleshooting
### Viewing logs
Logs can be viewed via th [Logs Explorer](https://console.cloud.google.com/logs)
in GCP, or via the `gcloud` CLI:
```bash
# First, login
> gcloud auth login
# Always make sure you're in the correct environment
> gcloud config get project
firezone-staging
# Now you can stream logs directly to your terminal.
############
# Examples #
############
# Stream all Elixir error logs:
> gcloud logging read "jsonPayload.message.severity=ERROR"
# Stream Web app logs (portal UI):
> gcloud logging read 'jsonPayload."cos.googleapis.com/container_name":web'
# Stream API app logs (connlib control plane):
> gcloud logging read 'jsonPayload."cos.googleapis.com/container_name":api'
# For more info on the filter expression syntax, see:
# https://cloud.google.com/logging/docs/view/logging-query-language
```
Here is a helpful filter to show all errors and crashes:
```
resource.type="gce_instance"
(severity>=ERROR OR "Kernel pid terminated" OR "Crash dump is being written")
-protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
-logName:"/logs/GCEGuestAgent"
-logName:"/logs/OSConfigAgent"
-logName:"/logs/ops-agent-fluent-bit"
```
An alert will be sent to the `#feed-proudction` Slack channel when a new error is logged that matches this filter.
You can also see all errors in [Google Cloud Error Reporting](https://console.cloud.google.com/errors?project=firezone-prod).
Sometimes logs will not provide enough context to understand the issue. In those cases you can
try to filter by the `trace` field to get more information. Copy the `trace` value from a log entry
and use it in the filter:
```
resource.type="gce_instance"
jsonPayload.trace:""
```
Note: If you simply click "Show entries for this trace" in the log entry, it will
automatically **append** the filter for you. You might want to remove rest of filters
so you can see all logs for that trace.
## Viewing metrics
Metrics can be viewed via the [Metrics Explorer](https://console.cloud.google.com/monitoring/metrics-explorer) in GCP.
## Viewing traces
Traces can be viewed via the [Trace Explorer](https://console.cloud.google.com/traces/list) in GCP.
They are mostly helpful for debugging Clients, Relays and Gateways.
For example, if you want to find all traces for client management processes, you can use the following filter:
```
RootSpan: client.connect
```
Then you can drill down either by using a `client_id: ` or an `account_id: `.
Note: For WS API processes, the total trace duration might not be helpful since a single trace is defined for
the entire connection lifespan.