The lint workflow was slow, and we don't often change buf or angular
these days, so those checks aren't necessary.
The remaining valuable task is cspell, which we can speed up with a
dedicated step.
mpvl suggests @embed is a better solution than our implementation of
core.Component.Instances for the use case of unifying YAML data updated
by Kargo Stage resources.
See the issue for a link to the discussion.
I'd like to add the kargo-demo repository to Unity to test evalv3, but
can't get a handle on the main function to wire up to testscript.
This patch fixes the problem by moving the MakeMain function to a public
package so the kargo-demo go module can import and call it using the go
mod tools technique.
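A minimal sketch of the wiring this enables, assuming the public package
exposes a MakeMain function returning a func() int suitable for testscript;
the import path and signature are illustrative, not the actual holos API:
```go
package main_test

import (
	"os"
	"testing"

	"github.com/rogpeppe/go-internal/testscript"

	// Hypothetical public package exposing MakeMain, for illustration only.
	holos "github.com/holos-run/holos/pkg/cli"
)

// TestMain registers a "holos" command backed by MakeMain so txtar test
// scripts can exec it in-process without building a separate binary.
func TestMain(m *testing.M) {
	os.Exit(testscript.RunMain(m, map[string]func() int{
		"holos": holos.MakeMain(),
	}))
}

// TestKargoDemo runs every txtar script in testdata/.
func TestKargoDemo(t *testing.T) {
	testscript.Run(t, testscript.Params{Dir: "testdata"})
}
```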
Previously holos render platform was not setting the --extract-yaml flag
when calling holos render component, causing data file instances defined
in the Platform spec to be discarded.
This patch passes the value along using the flag.
Extract YAML is clearer and aligns with the schema docs for the
Component Instance field, which has an extractYAML kind. This also
leaves the door open for additional kinds of data extractors, which are
almost certainly going to be needed.
Previously there wasn't a good way to unify JSON and YAML files with the
CUE configuration. This is a problem for use cases where data can be
generated idempotently prior to rendering the platform configuration.
The first use case is to explore unifying configuration with decrypted
sops values, which isn't typical since Holos is designed to handle
secrets with ExternalSecret resources, but does fit into the use case of
executing a command to produce data idempotently, then making the data
available to the platform configuration.
Other use cases this feature is intended to support are the prior
experiment where we fetch top-level platform configuration from an RPC
service, and the future goal of integrating with data provided by
Terraform.
PROBLEM:
We've noticed that Holos almost immediately gets compared to Timoni, and
we frequently get asked for specifics on how they're similar and different.
SOLUTION:
* Add a `Comparison` page.
* Include a section that compares Holos to Timoni
OUTCOME:
Fewer questions about how Holos compares to Timoni because people are
able to find that answer themselves on our docs page.
It didn't work; it failed with:
```
❯ holos show buildplans --selector app.holos.run/city=ams
could not run: Component.Name: 2 errors in empty disjunction: (and 2 more errors) at internal/builder/instance.go:66
Component.Name: 2 errors in empty disjunction:
Component.Name: conflicting values "no-name" and "podinfo-ams":
    /Users/jeff/Holos/foo/holos-environments-tutorial/components/podinfo/podinfo.cue:6:12
    /Users/jeff/Holos/foo/holos-environments-tutorial/schema.cue:6:13
    /Users/jeff/Holos/foo/holos-environments-tutorial/schema.cue:35:2
    /Users/jeff/Holos/foo/holos-environments-tutorial/tags.cue:13:19
Component.Name: conflicting values "podinfo" and "podinfo-ams":
    /Users/jeff/Holos/foo/holos-environments-tutorial/components/podinfo/podinfo.cue:6:12
    /Users/jeff/Holos/foo/holos-environments-tutorial/components/podinfo/podinfo.cue:7:8
    /Users/jeff/Holos/foo/holos-environments-tutorial/schema.cue:6:13
    /Users/jeff/Holos/foo/holos-environments-tutorial/schema.cue:35:2
```
This was likely because the podinfo component was used in different ways
in different topics. Fix the problem by not using the shared component.
Previously holos unconditionally executed helm repo add which failed for
private repositories requiring basic authentication.
This patch addresses the problem by using the Helm SDK to pull and cache
charts without adding them as repositories. New fields for the
core.Helm type allow basic auth credentials to be read from environment
variables.
Multiple repositories are supported by using different env vars for
different repositories.
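A minimal sketch of how per-repository credentials can be resolved, assuming
the new core.Helm fields name the environment variables to read; the type and
field names here are illustrative, not the actual core schema:
```go
package helm

import (
	"fmt"
	"os"
)

// Auth names the environment variables holding basic auth credentials for
// one chart repository, so different repositories can use different vars.
type Auth struct {
	UsernameEnv string // e.g. "MYREPO_USERNAME" (illustrative)
	PasswordEnv string // e.g. "MYREPO_PASSWORD" (illustrative)
}

// Credentials reads the configured environment variables. An empty Auth
// means anonymous access.
func (a Auth) Credentials() (username, password string, err error) {
	if a.UsernameEnv == "" && a.PasswordEnv == "" {
		return "", "", nil
	}
	username, ok := os.LookupEnv(a.UsernameEnv)
	if !ok {
		return "", "", fmt.Errorf("environment variable %s is not set", a.UsernameEnv)
	}
	password, ok = os.LookupEnv(a.PasswordEnv)
	if !ok {
		return "", "", fmt.Errorf("environment variable %s is not set", a.PasswordEnv)
	}
	return username, password, nil
}
```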
PROBLEM:
We've created a YouTube video walking people through Holos and the Helm
Values tutorial, but now we need to embed it on the site for visitors to
watch.
SOLUTION:
* Create a `YouTube` MDX plugin
* Use that plugin on Overview and Helm Values
* Tune the video size/attributes using CSS
OUTCOME:
The Helm Values YouTube video is embedded on our site for visitors to
watch.
Without this patch we do not support installing Kargo from an OCI helm
chart. We want to support:
```
Component: #Helm & {
	Name:      "kargo"
	Namespace: Kargo.Namespace
	Chart: {
		name:    "oci://ghcr.io/akuity/kargo-charts/kargo"
		version: "1.0.3"
		release: Name
	}
	EnableHooks: true
	Values:      Kargo.Values
}
```
This patch fixes the problem by using the base name for filesystem cache
operations.
This shows up in the Unity tests I'm working on with mvdan and goes to a
blank page without the redirect in place.
--- FAIL: TestGuides_v1alpha5 (0.00s)
    --- FAIL: TestGuides_v1alpha5/helm (0.60s)
        testscript.go:584: # Helm Guide https://holos.run/docs/guides/helm/
Previously, build tags were not propagated from `holos render platform
-t validate` through to the underlying `holos render component` command.
This is a problem because validators need to be selectively enabled as a
workaround until we have an audit mode field.
This patch fixes the problem by propagating command line tags from the
render platform command to the underlying commands. This patch also
propagates tags for the show command.
Previously Holos only supported tags in the form of key=value. CUE
supports boolean-style tags in the form of `key [ "=" value ]`, which we
want to use to conditionally register components with the platform.
This patch modifies the flag parsing to support -t foo like cue does,
for use with the @if(foo) build tag.
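A minimal sketch of the relaxed parsing, assuming the collected tags are
forwarded to the CUE loader as-is; the package and function names are
illustrative:
```go
package tags

import "strings"

// Parse splits a -t flag value into a key and an optional value. A bare
// "foo" behaves like CUE's boolean tags: the key is "foo" with no value,
// which satisfies an @if(foo) build attribute.
func Parse(tag string) (key, value string, hasValue bool) {
	if k, v, found := strings.Cut(tag, "="); found {
		return k, v, true
	}
	return tag, "", false
}
```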
Previously the BuildPlan pipeline didn't execute generators and
transformers concurrently. All steps were sequentially executed. Holos
was primarily concurrent by executing multiple BuildPlans at once.
This patch changes the Build implementation for each BuildPlan to
execute a goroutine pipeline. One producer fans out to a group of
routines each executing the pipeline for one artifact in the build plan.
The pipeline has 3 stages:
1: Fan-out to build each Generator concurrently.
2: Fan-in to build each Transformer sequentially.
3: Fan-out again to run each validator concurrently.
When the artifact pipelines return, the producer closes the tasks
channel causing the worker tasks to return.
Note the overall runtime for 8 BuildPlans is roughly equivalent to
before, about 160ms with --concurrency=8 on my M3 Max. I expect this to
perform better than before when multiple artifacts are rendered for
each BuildPlan.
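A minimal sketch of the per-artifact pipeline stages described above,
assuming generators, transformers, and validators reduce to funcs over a
shared in-memory artifact store; the names are illustrative, not the actual
holos types:
```go
package pipeline

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Task is one generator, transformer, or validator step.
type Task func(ctx context.Context) error

// BuildArtifact runs the three stages for a single artifact.
func BuildArtifact(ctx context.Context, generators, transformers, validators []Task) error {
	// Stage 1: fan out, build each generator concurrently.
	g, gctx := errgroup.WithContext(ctx)
	for _, gen := range generators {
		gen := gen // capture for Go versions before 1.22
		g.Go(func() error { return gen(gctx) })
	}
	if err := g.Wait(); err != nil {
		return err
	}
	// Stage 2: fan in, run each transformer sequentially.
	for _, transform := range transformers {
		if err := transform(ctx); err != nil {
			return err
		}
	}
	// Stage 3: fan out again, run each validator concurrently.
	v, vctx := errgroup.WithContext(ctx)
	for _, validate := range validators {
		validate := validate
		v.Go(func() error { return validate(vctx) })
	}
	return v.Wait()
}
```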
Writes files based on parent pid and process pid to avoid collisions.
Analyze with:
export HOLOS_TRACE=trace.%d.%d.out
go tool trace trace.999.1000.out
export HOLOS_CPU_PROFILE=cpu.%d.%d.prof
go tool pprof cpu.999.1000.prof
export HOLOS_MEM_PROFILE=mem.%d.%d.prof
go tool pprof mem.999.1000.prof
Without this patch the validator fails if a component manages two of the
same kind of resource, which is common.
This patch updates the example to use the metadata namespace and name as
lookup keys. This works for most components, but may not for
ClusterResources. Use the kind top level field in that case and pass
the field name of the validator as a tag value to vary by component.
Without this patch `holos cue vet` always returns exit code 0, even when
there are errors.
This patch fixes the problem by catching the error and returning it to
our own top level error handler. Note the final error, "could not run:
terminating because of errors" which wraps the generic error reported by
cue in the presence of multiple errors.
Result:
```
❯ holos cue vet ./policy --path 'strings.ToLower(kind)' /tmp/podinfo.gen.yaml
deployment.kind: conflicting values "Forbidden" and "Deployment":
    ./policy/validations.cue:18:8
    ../../../../../tmp/podinfo.gen.yaml:25:7
deployment.spec.template.spec.containers.0.resources.limits: conflicting values null and {[string]:"k8s.io/apimachinery/pkg/api/resource".#Quantity} (mismatched types null and struct):
    ./cue.mod/gen/k8s.io/api/apps/v1/types_go_gen.cue:355:9
    ./cue.mod/gen/k8s.io/api/apps/v1/types_go_gen.cue:376:12
    ./cue.mod/gen/k8s.io/api/core/v1/types_go_gen.cue:2840:11
    ./cue.mod/gen/k8s.io/api/core/v1/types_go_gen.cue:2968:14
    ./cue.mod/gen/k8s.io/api/core/v1/types_go_gen.cue:3882:15
    ./cue.mod/gen/k8s.io/api/core/v1/types_go_gen.cue:3882:18
    ./cue.mod/gen/k8s.io/api/core/v1/types_go_gen.cue:5027:9
    ./cue.mod/gen/k8s.io/api/core/v1/types_go_gen.cue:6407:16
    ./policy/validations.cue:17:13
    ../../../../../tmp/podinfo.gen.yaml:104:19
could not run: terminating because of errors
```
Similar to the Clusters topic, add a topic about configuring multiple
environments. This likely needs some work; the example is a bit
contrived but at least shows how we can look up attributes, then use
those attributes to look up additional configuration from platform-wide
configuration data.
This commit removes the extra `./platform` argument from the current
tutorial/topic docs to reflect the change that was made to
`holos render platform` in version `0.100.0`.
If someone accidentally provides the same index multiple times, or an
index less than the next expected one, the program would silently discard
the data. This would be difficult to troubleshoot since an
OrderedEncoder is usually used with concurrent goroutines, which would
likely mislead the investigator.
Better to just fail hard with an error indicating the caller in these
situations.
Assert against the complete build plan so we know if we change the
output format in the future.
If so, it's easy to update:
HOLOS_UPDATE_SCRIPTS=1 go test github.com/holos-run/holos/cmd/holos
Without this patch trying to use a Kustomize patch with the optional
name field omitted results in the following error:
could not run: holos.spec.artifacts.0.transformers.0.kustomize.kustomization.patches.0.target.name: cannot convert non-concrete value string at builder/v1alpha5/builder.go:218
holos.spec.artifacts.0.transformers.0.kustomize.kustomization.patches.0.target.name: cannot convert non-concrete value string:
$WORK/cue.mod/gen/sigs.k8s.io/kustomize/api/types/var_go_gen.cue:33:2
This patch fixes the problem by providing a default value for the name
field matching the Go zero value for a string.
Without this patch the BuildPlan resulting from a Platform that has
components with labels and annotations does not have the labels or
annotations of the source component.
Holos should copy the labels and annotations defined on each of the
Platform.spec.components to the resulting BuildPlan so end users can
clearly see where a BuildPlan originated from, and filter the
intermediate BuildPlan output with selectors the same way we filter the
original Platform spec components list.
Result:
```
holos init platform v1alpha5 --force
holos show buildplans | head
```
```yaml
kind: BuildPlan
apiVersion: v1alpha5
metadata:
  name: podinfo
  labels:
    app.holos.run/cluster: local
    app.holos.run/name: podinfo
  annotations:
    app.holos.run/description: podinfo for cluster local
```
Without this patch the holos show buildplans command returns results in
an inconsistent order. This is a problem because the output should be
idempotent.
This patch fixes the problem by adding an EncodeSeq(idx int, v any) method to
the encoder interface. The idx argument represents the index position in the
Platform.spec.components list after selector filtering has been applied.
This patch modifies the json and yaml encoders to buffer out-of-order
results from the concurrent goroutines.
Result:
Concurrent execution is preserved. The buffer is kept to a reasonable
size; entries are deleted once they're encoded in the correct order.
Most importantly the output is consistent and idempotent so we can write
effective integration tests.
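A minimal sketch of the buffering, assuming a mutex guards the underlying
encoder; the type and field names are illustrative. Entries arriving out of
order are held until the next expected index shows up, and duplicate or
already-passed indexes fail hard as the OrderedEncoder change above describes:
```go
package encoder

import (
	"fmt"
	"sync"
)

type OrderedEncoder struct {
	mu     sync.Mutex
	next   int             // next index to write
	buffer map[int]any     // out-of-order entries awaiting their turn
	encode func(any) error // underlying json or yaml encoder
}

func NewOrderedEncoder(encode func(any) error) *OrderedEncoder {
	return &OrderedEncoder{buffer: make(map[int]any), encode: encode}
}

// EncodeSeq buffers v at index idx and writes every entry that is now in
// order, deleting entries from the buffer as they are encoded.
func (o *OrderedEncoder) EncodeSeq(idx int, v any) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	if idx < o.next {
		return fmt.Errorf("index %d already encoded, next is %d", idx, o.next)
	}
	if _, ok := o.buffer[idx]; ok {
		return fmt.Errorf("duplicate index %d", idx)
	}
	o.buffer[idx] = v
	for {
		next, ok := o.buffer[o.next]
		if !ok {
			return nil
		}
		if err := o.encode(next); err != nil {
			return err
		}
		delete(o.buffer, o.next)
		o.next++
	}
}
```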
Sometimes, but not always, the holos show buildplans command produces no
output.
```
❯ holos show buildplans --selector app.holos.run/cluster==w3 --log-level=debug
finalized config from flags
rendered platform in 13.458µs
```
It only happens when there's a selector. It doesn't happen without the
selector flag. It only happens with ==, not with =.
This test fails quickly.
```
while [[ $(holos show buildplans --selector app.holos.run/cluster==w3 --log-level=debug | wc -l) -eq 39 ]]; do true; done
```
This test runs until killed.
```
while [[ $(holos show buildplans --log-level=debug | wc -l) -eq 279 ]]; do true; done
```
Solution:
The problem is the use of the map. Iterating over the keys happens in a
random order. With the fix we check in an explicit order.
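A minimal sketch of the difference, assuming the selector parser was trying
each operator held in a map; the parsing details are illustrative. Trying the
operators in an explicit slice order removes the dependence on Go's
randomized map iteration, so `==` is always checked before `=`:
```go
package selector

import "strings"

// splitExpr splits a selector expression such as "app.holos.run/cluster==w3"
// into key, operator, and value. The operators are tried in a fixed order so
// the result no longer depends on map iteration order.
func splitExpr(expr string) (key, op, value string, ok bool) {
	for _, candidate := range []string{"==", "!=", "="} {
		if k, v, found := strings.Cut(expr, candidate); found {
			return k, candidate, v, true
		}
	}
	return "", "", "", false
}
```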
Without this patch the `holos show buildplans` BuildPlan output has
incorrect yaml with the v3 encoder. For example apiversion: v1alpha5
instead of apiVersion.
Show subcommand:
This is a large change that accomplishes a number of goals. First, there
was no convenient way to show a build plan without using the debug logs
to identify the tags to inject, then calling the cue command with the
right incantation to inspect the BuildPlan.
This patch addresses the problem by adding a `holos show buildplans`
command. The command loads the Platform spec from the platform
directory, then iterates over all Components to produce the BuildPlan.
This patch adds labels and annotations to the platform Components
collection in order to select and filter the output.
Result:
```
❯ holos show components --selector app.holos.run/cluster=local --format=yaml | head
kind: BuildPlan
apiversion: v1alpha5
metadata:
  name: podinfo
spec:
  artifacts:
  - artifact: clusters/local/components/podinfo/podinfo.gen.yaml
    generators:
    - kind: Helm
      output: helm.gen.yaml
```
---
Interface refactor:
This refactors the interface between the `holos` Go CLI layer and the
various core schema data structures. We now use a proper Go interface.
Concurrent execution over platform components has been improved to
accept a closure function so we can use the same interface method to
process the components. We use this to show each component and render
each component from different subcommands using the same interface
embedded in the builder.Platform struct.
The embedded interface allows us to easily swap in different versions,
e.g. v1beta1 and eventually v1. The number of interface methods is
quite small: 14 methods across 4 interfaces in holos/interface.go.
---
Remove old versions:
This patch removes support for versions prior to v1alpha5 in an effort
to clean up cruft.
Previously the holos render platform and component subcommands had flags
for oidc authentication and client access to the gRPC service. These
flags aren't currently used; they're remnants of the JSON-powered form
prototype.
This patch gates the flags behind a feature flag which is disabled by
default.
Result:
```
holos render platform --help
render an entire platform

Usage:
  holos render platform DIRECTORY [flags]

Examples:
  holos render platform ./platform

Flags:
      --concurrency int   number of components to render concurrently (default 8)
  -v, --version           version for platform

Global Flags:
      --log-drop strings    log attributes to drop (example "user-agent,version")
      --log-format string   log format (text|json|console) (default "console")
      --log-level string    log level (debug|info|warn|error) (default "info")
```
---
```
HOLOS_FEATURE_CLIENT=1 holos render platform --help
render an entire platform

Usage:
  holos render platform DIRECTORY [flags]

Examples:
  holos render platform ./platform

Flags:
      --concurrency int             number of components to render concurrently (default 8)
      --oidc-client-id string       oidc client id. (default "270319630705329162@holos_platform")
      --oidc-extra-scopes strings   optional oidc scopes
      --oidc-force-refresh          force refresh
      --oidc-issuer string          oidc token issuer url. (default "https://login.holos.run")
      --oidc-scopes strings         required oidc scopes (default openid,email,profile,groups,offline_access)
      --server string               server to connect to (default "https://app.holos.run:443")
  -v, --version                     version for platform

Global Flags:
      --log-drop strings    log attributes to drop (example "user-agent,version")
      --log-format string   log format (text|json|console) (default "console")
      --log-level string    log level (debug|info|warn|error) (default "info")
```
Previously we didn't have good documentation on how to manage multiple
sets of clusters.
This patch adds a clusters topic in the structures category. Each of
the environments, projects, owners, etc. structures follows the same
pattern as #Clusters and #ClusterSets, so it makes sense to put them
into a dedicated sidebar category for specific CUE structures.
When we moved from v1alpha4 to v1alpha5 we removed ArgoConfig from the
author schema. There was no longer a clear example of how to configure
an ArgoCD Application for every component in v1alpha5.
This patch adds a topic document with an example of how to add an
Application alongside the resources by mixing an additional Artifact
into the BuildPlan.
Previously the Helm generator had no support for the --kube-version
flag. This is a problem for helm charts that conditionally render
resources based on this capability.
This patch plumbs support through the author and core schemas with a new
field similar to how the enable hooks field is handled.
Previously the Helm generator had no support for the --api-versions
flag. This is a problem for helm charts that conditionally render
resources based on this capability.
This patch plumbs support through the author and core schemas with a new
field similar to how the enable hooks field is handled.
Version doesn't make as much sense since we're doing a hello world
tutorial.
Also consolidate the values information into one step, and consolidate
the breaking it down section to make it shorter and clearer.
npm i @docusaurus/core@latest @docusaurus/plugin-client-redirects@latest \
@docusaurus/preset-classic@latest @docusaurus/theme-mermaid@latest \
@docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest \
@docusaurus/types@latest
This time in the correct directory.
A tree view of the `holos-tutorial/` directory should give readers a
quick, high-level understanding of the folder structure of a typical
Holos platform project.
Previously the current version would always be unreleased at /docs/next
and we'd have to copy the doc/md/ folder into the
doc/website/versioned_docs/version-v1alpha5/ every time we made a
change.
We're going to be working on v1alpha5 for a while, so it makes sense to
just have the current version published at /docs/v1alpha5/ and we can
start /docs/v1alpha6/ whenever we're ready.
This also has the nice effect of giving us permalinks if we change the
structure again. /docs/v1alpha5/ will remain over time.
Add a helm values tutorial which is a cut-down version of the v1alpha4
helm guide. The httpbin kustomize tutorial will immediately follow,
building on the prometheus and blackbox charts.
Previously the holos command line expected a Platform and BuildPlan
resource at the top level of the exported data from CUE. This forced us
to use hidden fields for everything else.
This patch modifies the BuildData struct to first look for a holos top
level field and use it if present. This opens up other top level fields
for use by end users.
Our intent is to reserve any top level field prefixed with holos.
Note this follows how Timoni works as well.
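A minimal sketch of the lookup, assuming v is the cue.Value exported from the
instance; only the holos path is reserved, everything else stays available to
end users. The helper name is illustrative:
```go
package builder

import "cuelang.org/go/cue"

// holosValue returns the holos top level field when present, otherwise it
// falls back to the root value so existing configurations keep working.
func holosValue(v cue.Value) cue.Value {
	if h := v.LookupPath(cue.ParsePath("holos")); h.Exists() {
		return h
	}
	return v
}
```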
This patch strips down the v1alpha4 core and author schemas to only what
is absolutely necessary for all holos users. Aspects of platform
configuration applicable to some, even most, but not all users will be
moved into documentation topics organized as a recipe book.
The functionality removed from the v1alpha4 author schemas in v1alpha5
will move into self contained examples documented as topics on the docs
site.
The overall purpose is to have a focused, composable, maintainable
author schema to help people get started, one we can ideally support for
years without making breaking changes.
With this patch the v1alpha5 helm guide test passes. We're not going to
have this guide anymore but it demonstrates we're back to where we were
with v1alpha4.
Without this patch each version of the core and author schemas is
duplicated into each docs version. This is unnecessary and difficult to
maintain now that we have docusaurus versioned docs enabled.
This patch updates the schema generation script to check if the docs
version has been released, and if so write into a markdown file in the
versioned docs folder. If not, the version is written into the next
version folder.
This patch also updates some, but not all, document links to the md or
mdx relative file paths. This is necessary to generate the correct
versioned links.
A nice outcome of this change is that technical docs no longer need to
link to version specific pages. For example, `[Core Schema]:
./api/core.md` will always refer to the correct auto generated docs
associated with the docs version.
The docs for v1alpha4 have the right information, but in the wrong
places. The most important bits are tucked away in the Core API docs.
One of our first users entirely missed the `holos generate platform`
command mentioned in the Helm guide.
We'll fix this by organizing the docs into two distinct categories.
First, a tutorial written as a series progressively building up the
minimum knowledge to use holos effectively and gain the benefits. Think
of it as a tour of the essential bits.
The second category is focused topics which stand alone. They're the
things most people using holos will need to know eventually, but aren't
essential for everyone to know. For example, Clusters and Fleets will
move from the Author API to stand alone examples of how to implement
these features if necessary.
Then there's a Glossary which serves as the place to describe our
concepts and domain specific language.
Finally there's the API documentation which should be cut down to the
specific version. The next release version will be v1alpha5.
Attribution: We're copying the Tokio docs structure; it's concise and of
a similar size and complexity to our own project.
The Go docs are also an inspiration, but the project is much larger so
not directly comparable. The organization of https://go.dev/doc/ feels
complete at first glance, despite the size and age of the project. The
site also makes clear who each section is for without needing to come
right out and say it. Getting started, Using and understanding Go,
Writing modules, using databases, etc...
We switched from using a kustomize remote base to a local file so the
tests don't need to make a network round trip to github. It's also
better practice to use local files for this sort of thing.
In doing so I botched the location of the file, putting it in the
platform registration section. This patch clarifies how `resources.yaml`
is linked to `httpbin.cue` through the `KustomizeConfig: Files:
"resources.yaml": _` field.
Previously there was no test coverage of the
https://holos.run/docs/guides/helm/ guide. This patch uses Roger's
testscript package, which the CUE folks also use, to add comprehensive
test coverage of each step in the guide. Ideally we would execute these
commands directly from the guide itself, but for now we'll duplicate the
commands into the test script. This could be enhanced by generating the
test script from the document itself in some way.
When updating the script, use the `holos txtar` command to embed entire
helm charts into the test script. It's not super fast, but it's better
than network access and it's not terribly slow either. A few seconds to
unpack.
---
txtar: quote files for testscript unquote
For the helm guide test script we want to include the entire helm chart
which may have files that need to be quoted. This patch changes the
default behavior of the holos txtar command to quote files if necessary
and list them in an unquote script command in the comment of the
archive.
The purpose is for testscript authors to copy and paste the entire thing
into a test script and include the unquote command at the top.
---
This change also updates CUE to v0.10.1
No longer necessary now that we're on v1alpha4. Test coverage for
v1alpha4 and the user facing guides will be added back soon for use both
in the holos repo and in Unity.
Previously it wasn't clear to users whether platform-wide structs should be
definitions or hidden fields in CUE. They should be hidden fields when
they contain data and definitions when they define a schema.
This patch updates the generate platform v1alpha4 subcommand to use the
correct field names consistently for clarity.
Generate the social card manually from https://www.opengraph.xyz/
Override the page title tag; otherwise it shows up as "Announcing Holos
| Holos" in social links, which is weird.
The api references are in reverse order and don't have good descriptions
in the index listings. This patch adds front matter to each generated
document to order them correctly and add a nice description.
Without this patch it's difficult to mix in a plain file as a config
map. This is necessary for the use case of using a Job to generate a
secret in-cluster. We want a plain shell script to be carried through
and transformed into the job.
We already have the KustomizeConfig fields to support this, they just
weren't wired up to the #Kustomization component kind.
For expedience I didn't check whether this is wired up to Helm and
Kustomize. The fields may be missing there as well.
PROBLEM:
Version v1alpha4 of the Author API has been updated with backwards
incompatible changes, and the deploy-a-service guide uses code from
version v1alpha3.
SOLUTION:
Update any code, links, and data that are out of date, and then run
through the guide to make sure it works locally.
OUTCOME:
The instructions in the deploy-a-service guide will work successfully
with version v1alpha4 of the Author API.
CUE uses --inject, -t as the flags to set variables for fields tagged
using @tag(var,type=string).
We used --tag, which is different and requires a mental mapping. Let's
use the same flag, and allow passing it multiple times as cue requires, so
we can copy and paste the command line output from the debug logs into a
cue export command to see what's going on.
This patch deprecates the --cluster-name flag, use --inject
holos_cluster=mycluster instead.
This patch also removes the environment field from the Component core
API, leaving this to the user namespace to define via tags. We don't
want to be too opinionated on how users manage their platform, baking
environment into the schema is a slippery slope toward those kinds of
opinions.
Closes: #276
Now that we have CommonLabels as part of the ComponentConfig for all
components, it makes sense to also mix in CommonLabels for a Project.
Common labels are a key aspect of the Technical Overview document.
For the Author API, it would be nice to define a schema for the fields
common to all component kinds. Users could then configure all kinds by
unifying the schema into their own platform tree.
This makes a clear use case to extract the common fields back into an
embedded struct like we did in v1alpha3. I removed the embedded struct
in v1alpha4 because it wasn't clear why it should be separate, but now
the use case is clear, to configure all component kinds.
Without this patch holos render platform may hang until the overall
context timeout is reached. This is a problem because the user has no
idea why it's hung.
This patch adds a warning at the 5 second and another at the 10 second
mark indicating the lock may be deadlocked. The user can then remove
the directory.
The Kustomize build plan kind needs to support both copying files from
the component directory and pulling resources from https URL's. Without
this patch this support is missing from the Author API.
With this patch the Kustomize build plan kind has a KustomizeConfig
field with two structs, Files and Resources. The kustomization
resources list is built up from both of these.
Two transformers are used so we don't affect the GitOps transformer which
really only needs CommonLabels.
I decided to keep this field exclusive to the Kustomize kind, but it
could replace the Kustomization field of the other kinds as well.
Without this patch the user facing API doesn't have a way to kustomize
the output of all the build plan kinds. This patch ensures the
Kustomization field is present on all of Helm, Kustomize, and
Kubernetes.
This field is intended for patches and transforms. The second
kustomization in the transformer sequence is intended for common labels
and annotations, managed by a corresponding field instead of a full on
Kustomization resource.
Fix:
could not run: could not marshal json projects/platform/components/cert-manager: cue: marshal error: spec.artifacts.0.generators.0.helm.enableHooks: cannot convert incomplete value "bool" to JSON at internal/builder/builder.go:63
spec.artifacts.0.generators.0.helm.enableHooks: cannot convert incomplete value "bool" to JSON:
/Users/jeff/Holos/bank-of-holos/cue.mod/gen/github.com/holos-run/holos/api/core/v1alpha4/types_go_gen.cue:235:16
could not run: could not render component: exit status 1 at builder/v1alpha4/builder.go:94
Without this patch kustomize errors aren't surfaced when executing holos
render platform.
This patch gives a fighting chance to the user to figure out what's
going on. The stderr is copied, logged, and surfaced up to the parent
holos render platform command.
Previously the #Helm and #Kustomize build plan helpers were not defined
in the v1alpha4 Author API. We need this definition to update the
Quickstart guide for v1alpha4 from v1alpha3.
This patch defines the #Helm and #Kustomize helpers in the Author API
similar to how #Kubernetes is defined.
Previously #Kubernetes was defined in the platform code. This is a
problem because every platform engineer would need to copy and paste
this code.
This patch moves the #Kubernetes helper into the cue.mod directory so it
can be imported and used ergonomically.
This patch gets the Author API rendering the namespaces component in the
Bank of Holos guide. It's not the final form of the API yet; we still
need to decide how best to expose the Kubernetes, Helm, and Kustomize
definitions.
I'm thinking we abstract away the transformers and generators within the
Author API Kubernetes definition.
Without this patch the --write-to flag can't be controlled from the
PlatformSpec in the CoreAPI. We need to surface this for the ArgoConfig
struct in the AuthorAPI.
That is to say, in v1alpha3 the --write-to flag was previously assumed
to be deploy/ in ArgoConfig using the DeployFiles functionality. We no
longer have DeployFiles in Core API v1alpha4, all artifacts are instead
written relative to the --write-to flag. Still, we need to expose this
flag in the PlatformSpec so users can use something other than the
deploy directory.
Previously the file generator was unimplemented. This patch implements
it as a simple file read into the ArtifactMap for use by the Kustomize
or Join transformers.
With this patch all v1alpha4 Core API features are implemented.
Resources, Helm, and File generators. Kustomize and Join transformers.
Blank lines show up in the output which is confusing. This patch fixes
the only source location identified with the following command.
export HOLOS_LOG_LEVEL=debug
export HOLOS_LOG_FORMAT=json
holos render platform ./platform 2>&1 | jq -r 'select (.msg == "")'
Previously helm charts were cached only by name, which is a problem
because the wrong version would be used when previously cached.
This patch caches charts by name and version to ensure changes in the
version results in pulling the new cached version. It is the user's
responsibility to remove old versions.
This patch also ensures only a single go routine can run cacheChart() at
a time across processes. This is necessary when rendering a platform
because multiple processes will run the Helm generator concurrently, for
example when the same chart is used for multiple environments or
customers.
The mkdir system call serves as the locking mechanism, which is robust
and atomic on all commonly used file systems.
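A minimal sketch of the lock, assuming the cache directory path already
includes the chart name and version; the helper name, timeout handling, and
polling interval are illustrative. os.Mkdir either succeeds atomically or
fails with ErrExist, so exactly one process populates the cache while the
others wait:
```go
package helm

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// withLock acquires a directory-based lock, runs fn, then releases the lock.
func withLock(cacheDir string, timeout time.Duration, fn func() error) error {
	lock := cacheDir + ".lock"
	deadline := time.Now().Add(timeout)
	for {
		switch err := os.Mkdir(lock, 0o755); {
		case err == nil:
			defer os.Remove(lock) // release the lock on return
			return fn()
		case !errors.Is(err, os.ErrExist):
			return err
		case time.Now().After(deadline):
			return fmt.Errorf("timed out waiting for lock %s", lock)
		}
		time.Sleep(100 * time.Millisecond) // another process holds the lock
	}
}
```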
Previously the helm generator was not implemented and returned an error.
This patch is a first pass copying the helm method from
internal/render/helm.go
Basic testing was performed with a podinfo chart. It works the same as
the previous versions (v1alpha3 and earlier). This patch does not address
the cached version issue in #273
holos.FilePath is intended for paths relative to the platform root
directory. We use the Artifact to store lots of stuff not related to
the platform root directory, for example kustomization.yaml in a temp
dir. Most entries are not relative to the platform root directory given
the implicit cfg.WriteTo prefix.
Previously:
could not run: could not build dev-join: could not get foo.yaml: not set at builder/v1alpha4/builder.go:180
This is confusing because "not set" has nothing to do with the missing
input from the cue code the user writes.
Result:
could not run: could not build test-join: missing foo.yaml at builder/v1alpha4/builder.go:180
This is better because it at least doesn't distract the user from the
fact that they're missing a foo.yaml generator output to align with the
transformer input.
The code was inlined in a number of places, so it makes sense to move it
to the interface. It'll also make it easier to test; we can provide a
null writer concrete value.
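A minimal sketch of the idea, assuming writes go through a small interface;
the names are illustrative, not the actual holos types. Tests can swap in the
discard implementation as the null writer concrete value:
```go
package render

import (
	"os"
	"path/filepath"
)

// FileWriter abstracts writing rendered artifacts so tests can avoid the
// real filesystem.
type FileWriter interface {
	WriteFile(path string, data []byte) error
}

// OSWriter writes artifacts beneath a root directory on the real filesystem.
type OSWriter struct{ Root string }

func (w OSWriter) WriteFile(path string, data []byte) error {
	full := filepath.Join(w.Root, path)
	if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
		return err
	}
	return os.WriteFile(full, data, 0o644)
}

// DiscardWriter is the null writer concrete value used in tests.
type DiscardWriter struct{}

func (DiscardWriter) WriteFile(string, []byte) error { return nil }
```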
Previously the Artifact collection was processed sequentially. This
patch provides a modest performance improvement, about 16% faster for
our simple 2 artifact use case, by processing each artifact
concurrently.
Platform rendering provides poor user feedback:
```
❯ holos render platform ./platforms/minimal
rendered namespaces for cluster local in 143.068583ms
rendered namespaces for cluster local in 143.861834ms
rendered namespaces for cluster local in 144.072666ms
rendered namespaces for cluster local in 144.219417ms
rendered platform in 144.326625ms
```
We want to see the metadata.name field of each BuildPlan. This patch
injects the build plan name from the platform spec to make the name
available through the end to end platform rendering process.
Result:
```
❯ holos render platform ./platforms/minimal
rendered stage-namespaces for cluster local in 146.078375ms
rendered prod-namespaces for cluster local in 146.544583ms
rendered test-namespaces for cluster local in 147.0535ms
rendered dev-namespaces for cluster local in 147.499166ms
rendered platform in 147.553875ms
```
With this patch the first use case of CUE Resources + Kustomize is fully
working, artifacts are written into the deploy directory.
❯ holos render platform ./platforms/minimal
rendered namespaces for cluster local in 143.068583ms
rendered namespaces for cluster local in 143.861834ms
rendered namespaces for cluster local in 144.072666ms
rendered namespaces for cluster local in 144.219417ms
rendered platform in 144.326625ms
The output indicates we need to plumb the BuildPlan metadata.name from
the PlatformSpec through to the render component command. This is
necessary so we can report the correct name instead of just the base
path.
Without this patch holos writes a single yaml document that is a list.
It needs to write a file that contains multiple documents, each document
a map[string]any representing the kubernetes resource.
This patch fixes the problem. With this patch kustomize fully executes.
The manifest field isn't clear.
Much more clear to have generators produce one Output. Transformers
take multiple Inputs and produce one Output.
The final Transformer, or a single Generator, must produce the final
Artifact.
The Inputs and Output naming to produce an Artifact makes clear the
rendering pipeline we're implementing.
This also makes clear that multiple generators must have at least one
transformer to produce the final output artifact. We model a simple
Join transformer for this case, which is what `holos` was implicitly
doing previously.
Component makes much more sense; that's the domain terminology we use.
BuildContext was meant to be re-used elsewhere, but we never did, so the
name serves no purpose.
The repeated enabled booleans and file fields are awkward. It's clear
it's three separate things smashed into one.
kustomize isn't really a generator. It's useless because there is no
way to reference a plain file in a component directory.
This patch replaces the kustomize generator with a file generator which
simply reads one single file. Multiple of these generators may be used
to read one or more files.
Then, kustomize may transform these generated files, which are generated
by simply reading from the filesystem.
This API is much improved over the previous.
```
kind: BuildPlan
apiVersion: v1alpha4
metadata:
  name: prod-namespaces
spec:
  component: projects/platform/components/namespaces
  steps:
    - artifact: clusters/no-cluster/components/prod-namespaces/prod-namespaces.gen.yaml
      generators:
        - kind: Resources
          manifest: resources.gen.yaml
          resources:
            Namespace:
              prod-jeff:
                metadata:
                  name: prod-jeff
                  labels:
                    kubernetes.io/metadata.name: prod-jeff
                kind: Namespace
                apiVersion: v1
              prod-gary:
                metadata:
                  name: prod-gary
                  labels:
                    kubernetes.io/metadata.name: prod-gary
                kind: Namespace
                apiVersion: v1
              prod-nate:
                metadata:
                  name: prod-nate
                  labels:
                    kubernetes.io/metadata.name: prod-nate
                kind: Namespace
                apiVersion: v1
      transformers:
        - kind: Kustomize
          kustomize:
            kustomization:
              commonLabels:
                holos.run/component.name: prod-namespaces
              resources:
                - resources.gen.yaml
                - application.gen.yaml
    - artifact: clusters/no-cluster/gitops/prod-namespaces.gen.yaml
      generators:
        - kind: Resources
          manifest: application.gen.yaml
          resources:
            Application:
              argocd:
                apiVersion: argoproj.io/v1alpha1
                kind: Application
                metadata:
                  name: prod-namespaces
                  namespace: argocd
                spec:
                  destination:
                    server: https://kubernetes.default.svc
                  project: default
                  source:
                    path: examples/v1alpha4/deploy/clusters/no-cluster/components/prod-namespaces
                    repoURL: https://github.com/holos-run/bank-of-holos
                    targetRevision: main
      transformers:
        - kind: Kustomize
          kustomize:
            kustomization:
              commonLabels:
                holos.run/component.name: prod-namespaces
              resources:
                - resources.gen.yaml
                - application.gen.yaml
```
A build step either produces kubernetes objects or a gitops manifest.
Both are effectively the same, they're just kubernetes resources.
For the use case of applying common labels to both, we'll have the
Author API pass the same Kustomization to two separate build steps. One
step to produce the resources, a second to produce the argocd
application or flux kustomization.
Each step produces a manifest and a gitops file, so we need a unique
name for each step. The most common case will be a single build step
matching the name of the build plan itself.
The kustomize transformer needs a filename to store the output from
generators so it has an input for the transformer. This patch adds
fields for each kind of generator so the kustomize.#Kustomization can be
configured with the files `holos` will write generated output to.
This patch implements the v1alpha4 component rendering builder for a
component BuildPlan. We don't yet have the CUE definitions, so this
hasn't been end to end tested yet, the next step is defining the
generators and transforms in the core API BuildPlan.
This patch plumbs the switch statement to branch on a v1alpha4
BuildPlan. Tags need to be passed from the render platform subcommand
to the render component subcommand via the --tags argument.
This patch implements minimal rendering of a v1alpha4 platform using the
new render.Builder interface.
Tags aren't wired up yet, but this patch does cleanly separate Builder
interface from the Artifacts. Platform rendering doesn't have an
artifact itself, all artifacts are produced by rendering each component,
so we'll see how that works when we make the same changes to component
rendering, breaking it down to a render.Builder interface that sets
values in an Artifact.
The holos cli does not use an interface to handle different Platform api
versions. This makes it difficult to evolve the API in a backwards
compatible way.
This patch adds a top level switch statement to the `holos render
platform` command. The switch discriminates on the Platform API
version. v1alpha3 and earlier are classified as legacy versions and
will use the existing strict types. v1alpha4 and later versions will
use an interface to render the platform, allowing for multiple types to
implement the platform rendering interface.
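A minimal sketch of the dispatch, assuming the apiVersion has already been
read from the platform instance; the Builder interface and type names are
illustrative, not the actual holos types:
```go
package render

import "fmt"

// Builder renders a platform for one API version.
type Builder interface {
	Build() error
}

type legacyBuilder struct{}   // v1alpha3 and earlier: existing strict types
type v1alpha4Builder struct{} // v1alpha4 and later: interface-based rendering

func (legacyBuilder) Build() error   { return nil }
func (v1alpha4Builder) Build() error { return nil }

// builderFor discriminates on the Platform API version.
func builderFor(apiVersion string) (Builder, error) {
	switch apiVersion {
	case "v1alpha1", "v1alpha2", "v1alpha3":
		return legacyBuilder{}, nil
	case "v1alpha4":
		return v1alpha4Builder{}, nil
	default:
		return nil, fmt.Errorf("unsupported platform apiVersion: %s", apiVersion)
	}
}
```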
PROBLEM:
The landing page contains a lot of text, and much of that text was
written before we refined our messaging within the guides and technical
overview pages.
SOLUTION:
* Whittle down landing page text to only the key messages we want to convey.
* Provide messaging bullets for the features.
* Steer folks (via links) to the quickstart guide or technical overview document.
OUTCOME:
Visitors don't need to wade through a lot of text to receive key
messaging talking points or links to the pages they should read.
PROBLEM:
There's a lot of text to grok on the landing page. A diagram would help
to visually convey what Holos does.
SOLUTION:
* Create a diagram
* Add to landing page
OUTCOME:
A visual aid is present on the landing page that helps explain where
Holos sits.
PROBLEM:
The Quickstart is lacking narrative tying the changes we're asking
people to make to the underlying organizational problems.
SOLUTION:
Improve the narrative to surface the problems we are solving and how
this affects the different teams at the Bank of Holos
OUTCOME:
Clarity on the problems the quickstart is solving.
Closes: #259
This gets us through to the end with podinfo deployed. Need to tell the
story of the migration team a bit better though, working with the
platform team to expose the service.
It's too long for documentation. Shorten it for clarity.
Result:
```
❯ holos render platform ./platform
rendered bank-accounts-db for cluster workload in 160.7245ms
rendered bank-ledger-db for cluster workload in 162.465625ms
rendered bank-userservice for cluster workload in 166.150417ms
rendered bank-ledger-writer for cluster workload in 168.075459ms
rendered bank-balance-reader for cluster workload in 172.492292ms
rendered bank-backend-config for cluster workload in 198.117916ms
rendered bank-secrets for cluster workload in 223.200042ms
rendered gateway for cluster workload in 124.841917ms
rendered httproutes for cluster workload in 131.86625ms
rendered bank-contacts for cluster workload in 154.463792ms
rendered bank-transaction-history for cluster workload in 159.968208ms
rendered bank-frontend for cluster workload in 325.24425ms
rendered app-projects for cluster workload in 110.577916ms
rendered ztunnel for cluster workload in 137.502792ms
rendered cni for cluster workload in 209.993375ms
rendered cert-manager for cluster workload in 172.933834ms
rendered external-secrets for cluster workload in 135.759792ms
rendered local-ca for cluster workload in 98.026708ms
rendered istiod for cluster workload in 403.050833ms
rendered argocd for cluster workload in 294.663167ms
rendered gateway-api for cluster workload in 228.47875ms
rendered namespaces for cluster workload in 113.586916ms
rendered base for cluster workload in 533.76675ms
rendered external-secrets-crds for cluster workload in 529.053375ms
rendered crds for cluster workload in 931.180458ms
rendered platform in 1.248310167s
```
Previously:
```
❯ holos render platform ./platform
rendered projects/bank-of-holos/backend/components/bank-ledger-db for cluster workload in 158.534875ms
rendered projects/bank-of-holos/backend/components/bank-accounts-db for cluster workload in 159.836166ms
rendered projects/bank-of-holos/backend/components/bank-userservice for cluster workload in 160.360667ms
rendered projects/bank-of-holos/backend/components/bank-balance-reader for cluster workload in 169.478584ms
rendered projects/bank-of-holos/backend/components/bank-ledger-writer for cluster workload in 169.437833ms
rendered projects/bank-of-holos/backend/components/bank-backend-config for cluster workload in 182.089333ms
rendered projects/bank-of-holos/security/components/bank-secrets for cluster workload in 196.502792ms
rendered projects/platform/components/istio/gateway for cluster workload in 122.273083ms
rendered projects/bank-of-holos/frontend/components/bank-frontend for cluster workload in 307.573584ms
rendered projects/platform/components/httproutes for cluster workload in 149.631583ms
rendered projects/bank-of-holos/backend/components/bank-contacts for cluster workload in 153.529708ms
rendered projects/bank-of-holos/backend/components/bank-transaction-history for cluster workload in 165.375667ms
rendered projects/platform/components/app-projects for cluster workload in 107.253958ms
rendered projects/platform/components/istio/ztunnel for cluster workload in 137.22275ms
rendered projects/platform/components/istio/cni for cluster workload in 233.980958ms
rendered projects/platform/components/cert-manager for cluster workload in 171.966958ms
rendered projects/platform/components/external-secrets for cluster workload in 134.207792ms
rendered projects/platform/components/istio/istiod for cluster workload in 403.19ms
rendered projects/platform/components/local-ca for cluster workload in 97.544708ms
rendered projects/platform/components/argocd/argocd for cluster workload in 289.577208ms
rendered projects/platform/components/gateway-api for cluster workload in 218.290458ms
rendered projects/platform/components/namespaces for cluster workload in 109.534125ms
rendered projects/platform/components/istio/base for cluster workload in 526.32525ms
rendered projects/platform/components/external-secrets-crds for cluster workload in 523.7495ms
rendered projects/platform/components/argocd/crds for cluster workload in 1.002546375s
rendered platform in 1.312824333s
```
Without this patch ArgoCD treats the Application as constantly out of
sync. This is also a good example of how to patch an arbitrary
component, though it patches the core BuildPlan itself now. If this is
widely used, it would be nice to add this behavior to the schema api
(aka author api).
Without this patch ArgoCD treats the Application as constantly out of
sync. This is also a good example of how to patch an arbitrary
component, though it patches the core BuildPlan itself now. If this is
widely used, it would be nice to add this behavior to the schema api
(aka author api).
Without this patch ArgoCD treats the Application as constantly out of
sync. This is also a good example of how to patch an arbitrary
component, though it patches the core BuildPlan itself now. If this is
widely used, it would be nice to add this behavior to the schema api
(aka author api).
Without this patch browsing https://bank.holos.localhost frequently gets
connection reset errors. These errors are caused by the frontend
deployment redirecting the browser to http, which is not enabled on the
Gateway we use in the guides.
This patch sets the scheme to https which corrects the problems.
See https://github.com/GoogleCloudPlatform/bank-of-anthos/issues/478
With this patch the frontend, accounts-db, and userservice all start and
become ready.
The user can log in, but on redirecting to home the site can't be
reached.
Rather than commit the jwt private key to version control like upstream
does, we use a SecretStore and ExternalSecret to sync the secret
generated by the security team in the bank-security namespace.
With this patch the SecretStore validates and the ExternalSecret
automatically syncs the secret from the bank-security namespace to the
bank-frontend namespace.
```
❯ k get ss
NAME AGE STATUS CAPABILITIES READY
bank-security 1s Valid ReadWrite True
❯ k get es
NAME STORE REFRESH INTERVAL STATUS READY
jwt-key bank-security 5s SecretSynced True
```
The pod starts successfully.
```
❯ k get pods
NAME READY STATUS RESTARTS AGE
frontend-646d797d6b-7jhrx 1/1 Running 0 2m39s
❯ k logs frontend-646d797d6b-7jhrx
{"timestamp": "2024-09-16 21:44:47", "message": "info | Starting gunicorn 22.0.0", "severity": "INFO"}
{"timestamp": "2024-09-16 21:44:47", "message": "info | Listening at: http://0.0.0.0:8080 (7)", "severity": "INFO"}
{"timestamp": "2024-09-16 21:44:47", "message": "info | Using worker: gthread", "severity": "INFO"}
{"timestamp": "2024-09-16 21:44:47", "message": "info | Booting worker with pid: 8", "severity": "INFO"}
{"timestamp": "2024-09-16 21:44:57", "message": "create_app | Unable to retrieve cluster name from metadata server metadata.google.internal.", "severity": "WARNING"}
{"timestamp": "2024-09-16 21:44:57", "message": "create_app | Unable to retrieve zone from metadata server metadata.google.internal.", "severity": "WARNING"}
{"timestamp": "2024-09-16 21:44:57", "message": "create_app | Starting frontend service.", "severity": "INFO"}
{"timestamp": "2024-09-16 21:44:57", "message": "create_app | 🚫 Tracing disabled.", "severity": "INFO"}
{"timestamp": "2024-09-16 21:44:57", "message": "create_app | Platform is set to 'local'", "severity": "INFO"}
```
Expose Service frontend in the bank-frontend namespace via httproute
https://bank.holos.localhost
Organize into frontend, backend, security projects to align with three
teams who would each own this work.
remove secret from version control
Google added the secret to version control but we can generate the
secret in-cluster. Holos makes it easier to manage the ExternalSecret
or RoleBinding necessary to get it in the right place.
We need a way to demonstrate the value Holos offers in a platform team
managing projects for other teams. This patch addresses the need by
establishing the bank-of-holos schematic, which is a port of the Bank of
Anthos project to Holos.
This patch adds only the frontend to get the process started. As of
this patch the frontend pod starts and becomes ready but is not exposed
via HTTPRoute.
Refer to https://github.com/GoogleCloudPlatform/bank-of-anthos/
Previously all generated ArgoCD Application resources go into the
default project following the Quickstart guide. The configuration code
is being organized into the concept of projects in the filesystem, so we
want the GitOps configuration to also reflect this concept of
projects.
This patch extends the ArgoConfig user facing schema to accept a project
string. The app-projects component automatically manages AppProject
resources in the argocd namespace for each of the defined projects.
This allows CUE configuration in a project directory to specify the
project name so that all Applications are automatically assigned to the
correct project.
Without this patch, providing ArgoConfig only works with the Helm kind.
This is a problem because we want to produce an Application for every
supported component kind when rendering the platform.
This patch threads the ArgoConfig struct described in the Quickstart
guide through every supported component kind.
Define a place for components to register HTTPRoute resources the
platform team needs to manage in the Gateway namespace.
The files are organized to delegate to the platform team.
This patch also fixes the naming of the argocd component so that the
Service is argocd-server instead of argo-cd-argocd-server.
Previously, the #Resources struct listing valid resources to use with
APIObjects in each of the components types was closed. This made it
very difficult for users to mix in new resources and use the Kubernetes
component kind.
This patch moves the definition of the valid resources to package holos
from the schema API. The schema still enforces some light constraints,
but doesn't keep the struct closed.
A new convention is introduced in the form of configuring all components
using _ComponentConfig defined at the root, then unifying this struct
with all of the component kinds. See schema.gen.cue for how this works.
This approach enables mixing in ArgoCD applications to all component
kinds, not just Helm as was done previously. Similarly, the
user-constrained #Resources definition unifies with all component kinds.
It's OK to leave the yaml.Marshal in the schema API. The user
shouldn't ever have to deal with #APIObjects, instead they should pass
Resources through the schema API which will use APIObjects to create
apiObjectMap for each component type and the BuildPlan.
This is still more awkward than I want, but it's a good step in the
right direction.
Without this patch the istio-gateway component isn't functional, the
HTTPRoute created for httpbin isn't programmed correctly. There is no
Gateway resource, just a deployment created by the istio helm chart.
This patch replaces the helm chart with a Gateway resource as was done
previously in the k3d platform schematic.
This patch also simplifies the certificate management to issue a single
cert valid for the platform domain and a wildcard. We intentionally
avoid building a dynamic Gateway.spec.listeners structure to keep the
expose a service guide relatively simple and focused on getting started
with Holos.
This patch adds the httpbin routes component. It's missing the
Certificate component, the next step is to wire up automatic certificate
management in the gateway configuration, which is a prime use case for
holos. Similar to how we register components and namespaces, we'll
register certificates.
This patch also adds the #Platform.Domain field to the user facing
schema API. We previously stored the domain in the Model but it makes
sense to lift it up to the Platform and have a sensible default value
for it.
Another example of #237 needing to be addressed soon.
This patch manages the httpbin Deployment, Service, and ReferenceGrant.
The remaining final step is to expose the service with an HTTPRoute and
Certificate.
We again needed to add a field to the schema APIObjects to get this to
work. We need to fix #237 soon. We'll need to do it again for the
HTTPRoute and Certificate resources.
The progression of namespaces, cert-manager, then gateway api and istio
makes much more sense than the previous progression of gateway api,
namespaces, istio.
cert-manager builds nicely on top of namespaces. Gateway API is only
the CRDs necessary for Istio.
This patch also adds the local-ca component which surfaces issue #237
The Kubernetes APIObjects are unnecessarily constrained to resources we
define in the schema. We need to move the marshal code into package
holos so the user can add their own resource kinds.
This patch adds Istio to the Expose a Service documentation and
introduces new concepts. The Kubernetes build plan schema, the
namespaces component, and an example of how to safely re-use Helm values
from the root to multiple leaf components.
fix: istio cni not ready on k3d
---
The istio-k3d component embedded into holos fixes the cni pod not
becoming ready with our k3d local cluster guide. The pod log error this
fixes is:
configuration requires updates, (re)writing CNI config file at "": no networks found in /host/etc/cni/net.d
Istio CNI is configured as chained plugin, but cannot find existing CNI network config: no networks found in /host/etc/cni/net.d
Waiting for CNI network config file to be written in /host/etc/cni/net.d...
[Platform k3d]: https://istio.io/latest/docs/ambient/install/platform-prerequisites/#k3d
docs: clarify how to reset the local cluster
---
This is something we do all the time while developing and documenting,
so make it easy and fast to reset the cluster to a known good state.
This patch adds the schema api for the Kubernetes build plan, which
produces plain API resources directly from CUE. It's needed for the
namespaces component which is foundational to many of our guides.
The first guide that needs this is the expose a service guide, we need
to register the namespaces from the istio component.
The Expose a Service doc is meant to be the second step after the
Quickstart doc. This commit adds the section describing how to install
the Gateway API.
The Kustomize build plan is introduced at this point in a similar way
the Helm build plan was introduced in the quickstart.
We need an easy way to help people add a workload cluster to their
workload fleet when working through the guides. Generated platforms
should not define any clusters so they can be reused with multiple
guides.
This patch adds a simple component schematic that drops a root cue file
to define a workload cluster named workload.
The result is the following sequence renders the Gateway API when run
from an empty directory.
holos generate platform guide
holos generate component workload-cluster
holos generate component gateway-api
holos render platform ./platform
Without this patch nothing is rendered because there are no workload
clusters in the base guide platform.
Having the management cluster hard coded into the definition of the
standard fleets is problematic for guides that don't need a management
cluster.
Define the fleets, but leave the set of clusters empty until they're
needed.
Previously helm and cue components were split into two different
subcommands off the holos generate component command. This is
unnecessary; I'm not sure why it was there in the first place. The code
seemed perfectly duplicated.
This patch combines them to focus on the concept of a Component. It
doesn't matter what kind it is now that it's expected to be run from the
root of the platform repository and drop configuration at the root and
the leaf of the tree.
Previously, each document needed to be manually included in the sidebars
to show up. In addition, index paths like /docs/ and /docs/guides/ were
not found.
This patch addresses both problems by switching sidebars to
automatically generate from filesystem directories. Important documents
like the getting started guide and introduction are expected to add a
`slug: /foo` front matter item to create a permalink.
The result is the sidebar reflects the filesystem while the URL bar is
more of a permalink. Files should be able to be moved around the file
system and the sidebar tree without affecting their URL.
This patch also consolidates the API and Docs sidebars into one.
Our guides should be useful when read from only a mobile device. For
those readers who also want to apply the manifests to a real cluster we
need a companion guide that describes how to get one.
This patch adds that guide, adapted from the old try holos locally page.
This patch incorporates the main feedback from Gary and Nate from this
morning. The note tab in argocd.cue was awkward to Gary and me. The use
of _ in CUE needs an explicit comment, which this patch adds.
This patch focuses on the Day 2 benefits holos offers, specifically
making it easier to visualize exactly what will change when upgrading
components.
In addition, it's easier to apply changes slowly and deliberately since
they're all just flat files in the local filesystem and Git repository.
Previously the quickstart didn't cover adding workload clusters and
rendering a platform with multiple clusters. This patch demonstrates
how it's effectively a one line change to clone the configuration of a
workload cluster to another geographic region.
Make sure go install works from the quickstart documentation by doing a
release. Otherwise, v0.93.1 is installed which doesn't include the
platform schema.
Previously, the quickstart step of generating the pod info component and
generating the platform as a whole left the task of integrating the
Component into the Platform as an exercise for the reader. This is a
problem because it creates unnecessary friction.
This patch addresses the problem by lifting up the Platform concept
into the user-facing Schema API. The generated platform includes a top
level #Platform definition which exposes the core Platform specification
on the Output field.
The Platform CUE instance then reduces to a simple `#Platform.Output`
which provides the Platform spec to holos for rendering each component
for each cluster.
The CUE code for the schema.#Platform iterates over each
Component to derive the list of components to manage for the Platform.
The CUE code for the generated quickstart platform links the definition
of StandardFleets, a Workload fleet and a Management cluster fleet, to
the Platform convenience wrapper.
Finally, the generated podinfo component drops a CUE file at the
repository root to automatically add the component to every workload
cluster.
The result is the only task left for the end user is to define at least
one workload cluster. Once defined, the component is automatically
managed because it is managed on all workload clusters.
This approach further opens the door to allow generated components to
define their namespaces and generated secrets on the management cluster
separate from their workloads on the workload clusters.
This patch includes a behavior change, from now on all generated
components should assume they are writing to the root of the user's Git
repository so that they can generate files through the whole tree.
In the future, we should template output paths for generated components.
A simple approach might be to embed a file with a .target suffix, with
the contents being a simple Go template of the file path to write to.
The holos generate subcommand can then check if any given embedded file
foo has a foo.target companion, then write the file to the rendered
target path.
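As a rough sketch of that idea (the renderTarget helper and the template
text are hypothetical, not code that exists in holos today), the generate
command could render the companion file with text/template:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderTarget renders the contents of a hypothetical foo.target companion
// file into the output path the embedded file foo should be written to.
func renderTarget(targetTemplate string, data any) (string, error) {
	tmpl, err := template.New("target").Parse(targetTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// e.g. a component.cue.target file containing a path template.
	path, err := renderTarget("components/{{ .Name }}/component.cue",
		map[string]string{"Name": "podinfo"})
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // components/podinfo/component.cue
}
```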
Users need to customize the default behavior of the core components,
like the Helm schema wrapper to mix-in an ArgoCD Application resource to
each component. This patch wires up #Helm in the holos package to
schema.#Helm from the v1alpha3 api.
The result is illustrated in the Quickstart documentation; it is now
simple for users to modify the definition of a Helm component such that
Application resources are mixed in to every component in the platform.
Previously the end user needed to write, or at least copy and paste, a
large amount of boilerplate code to achieve the goal of declaring a
helm chart component. There is a gap between the cue code:
(#Helm & Chart).Output
And the full BuildPlan produced for the Holos cli to execute the
rendering process. The boilerplate code in schema.cue at the root of
the platform infrastructure repository was largely responsible for
defining how a BuildPlan with one HelmChart component is derived from
this #Helm definition.
This patch moves the definitions into a new, documented API named
`schema`. End users are expected to define their own #Helm definition
using the schema.#Helm, like so in the root level schema.cue:
#Helm: schema.#Helm
Without this patch deployments to the dev environment are failing with
the following error when commits are pushed to the main branch.
GIT_DETAIL=v0.93.0-3-g4db3fb4 GIT_SUFFIX= bash ./hack/deploy-dev
Cloning into 'holos-infra'...
could not validate
could not run: could not validate invalid BuildPlan: apiVersion invalid: want: v1alpha3 have: v1alpha2 at internal/builder/builder.go:308
could not run: could not render component: exit status 1 at internal/render/platform.go:48
make: *** [Makefile:147: dev-deploy] Error 1
This patch removes the api version check in the build plan validation
function. In the future, we should pass an interface internally in the
holos executable.
The result is holos render platform ./platform succeeds with this patch
applied.
Previously the CUE code needed to specify the Platform.spec.model field,
which created friction. This patch adds a cue struct tag to unify the
field with an open struct.
❯ holos render platform ./platform --log-level=debug
could not run: could not marshal cue instance platform: cue: marshal error: spec.model: cannot convert incomplete value "_" to JSON at internal/builder/platform.go:45
spec.model: cannot convert incomplete value "_" to JSON
The render command completes successfully with this patch without the
user having to provide a value for the spec.model field.
This patch adds the minimal amount of CUE code necessary to successfully
run the following two commands from the quickstart.
holos generate platform quickstart
holos render platform ./platform
The result is no components are rendered, so nothing is done, but it
does succeed.
This patch surfaces some friction and inconsistency with how the Model
is passed in and the initial structure of the _PlatformConfig. The tags
are required, otherwise holos errors out.
Without this patch the `holos generate platform` command automatically
makes an rpc call to holos server. This creates friction for the
quickstart guide because we don't need to require users to register and
have an organization and platform already created in the server just to
generate a simple platform to exercise a simple helm chart component.
A future patch should implement the behavior of linking a server side
platform to a local git repository by making the API call to get the
platform ID then updating the platform.metadata.json file.
Previously the landing page focused on Holos as a reference platform.
We're refocusing the release on the holos package management tool. This
patch updates the landing page and adds placeholders for a new quick
start guide which will focus on wrapping a helm chart and a concepts
page which will provide a high level overview of how holos is unique
from other tools.
This is an effort to increase reliability when trying holos locally.
The idea is that generating and then rendering the platform should work
without a network connection, provided the executable has already been
downloaded.
For example, to give a quick demo without a network connection.
Without this patch the argo install manifest may fail because the
resources are fetched from github.
This patch embeds the same resources to increase speed and reliability.
Without this patch the argo crds component takes a few seconds to render
and may fail because the resources are fetched from github.
This patch embeds the same resources to increase speed and reliability.
Without this patch the gateway api component takes a few seconds to
render and may fail because the resources are fetched from github.
This patch embeds the same resources to increase speed and reliability.
Result:
rendered components/gateway-api for cluster workload in 257.206208ms
Building the cluster today I got hung up on a `ERR_CONNECTION_CLOSED`
error from Chrome when trying to access httpbin.
The cause was forgetting to run the local-ca script, thinking I already
had a local CA. The script also copies the private key to the cluster,
so it must be run every time the cluster is created.
This patch clarifies the sequence. When resetting, everything following
the Create the Cluster step needs to be executed.
Previously the image is built on merge to main, but not deployed
anywhere. This patch adds steps to the publish workflow to deploy the
image that was published using gitops and argocd.
On a release, make tools is run which pulls in the latest connect tools
for angular. This is a problem because it makes the git tree dirty.
The packages should be in the package.json file and the lock file so
these additional steps should not be necessary.
Remove them.
Desired result is make tools is idempotent and installs the correct
pinned versions necessary to build and release the container image.
This makes the following changes to the getting started guide after
running through both the signed-in and signed-out paths:
* Added helm and git as requirements
* Made it easier to modify the requirements by using all "1." list items
* Wait for the httpbin pod to be ready before continuing
* Make all the signed-out steps work
* Fixed sub-section header values so they show up in the TOC
* Fix minor typos and grammar issues
* Fix minor spacing and formatting inconsistencies
* Mark the ArgoCD guide as "coming soon"
Also fixed the docs for running the website locally to be able to
preview all these changes while working on them.
Noticed a few remaining rough edges when I read through it on my phone
last night. This patch hopefully gets the try holos doc into a place
we're happy with.
Instead of tutorials, the goal is to refine Try Holos Locally down to a
minimal number of steps and then branch out to deeper use cases like
ArgoCD, Backstage, etc...
This patch moves the ArgoCD related sections to a separate "dive deeper"
guide to trim down the length of the try holos guide.
When someone is trying holos locally but has not signed up, ArgoCD needs
to be configured to allow anonymous access. This patch enables
anonymous access and gives the admin role.
With this patch the Try Holos Locally guide can be completed without
signing up or signing in.
Nate gave feedback that the Try Holos Locally guide doesn't work with Orb.
This patch makes the input form accept *.local domains so we can use the
default Orb managed domain of *.k8s.orb.local
I haven't tested this, but we at least need to allow the domain to
test it.
[1]: https://docs.orbstack.dev/kubernetes/#loadbalancer-ingress
Previously the top level logger used a json handler while the rest of
the code used the default console handler. This patch unifies them to
be consistent.
Remove side comments about the reference platform. Move the in-line
exploration of ArgoCD and CUE to the end once the reader has completed
their goal. Other minor edits.
Previously CUE panicked when holos tried to unify values originating from
two different cue runtimes. This patch fixes the problem by
initializing cue.Value structs from the same cue context.
Log messages are also improved after making one complete pass through
the Try Holos Locally guide.
Now that we have multi-platform images, we need a way to easily deploy
them. This involves changing the image tag. kustomize edit is often
used to bump image tags, but we can do better by providing it directly in
the unified CUE configuration.
This patch modifies the builder to unify user data *.json files
recursively under userdata/ into the #UserData definition of the holos
entrypoint.
This is to support automation that writes simple json files to version
control, executes holos render platform, then commits and pushes the
results for git ops to take over deployment.
The make deploy target is the reason this change exists, to demonstrate
how to automatically deploy a new container image.
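A rough sketch of the unification using the cuelang.org/go API; the
userdata/ directory and #UserData path follow the description above, while
the walking and error handling are illustrative only:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
	cuejson "cuelang.org/go/encoding/json"
)

// unifyUserData walks dir recursively and unifies every *.json file it
// finds into the #UserData path of the given value.
func unifyUserData(ctx *cue.Context, v cue.Value, dir string) (cue.Value, error) {
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || filepath.Ext(path) != ".json" {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		expr, err := cuejson.Extract(path, data)
		if err != nil {
			return err
		}
		v = v.FillPath(cue.ParsePath("#UserData"), ctx.BuildExpr(expr))
		return nil
	})
	return v, err
}

func main() {
	ctx := cuecontext.New()
	root := ctx.CompileString(`#UserData: {...}`)
	v, err := unifyUserData(ctx, root, "userdata")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", v)
}
```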
This patch addresses Nate's feedback that it's difficult to know what
platform is being operated on.
Previously it wasn't clear where the platform id used for push and pull
comes from. The source of truth is the platform.metadata.json file
created when the platform is first generated using `holos generate
platform k3d`.
This patch removes the platformId field from the platform.config.json
file, renames the platform.config.json file to platform.model.json and
renames the internal symbols to match the domain language of "Platform
Model" instead of the less clear "config"
This patch also changes the API between holos and CUE to use the proto
json imported from the proto file instead of generated from the go code
generated from the proto file. The purpose is to ensure protojson
encoding is used end to end.
Default log handler:
The patch also changes the default log output to print only the message
to stderr. This addresses similar feedback from both Gary and Nate that
the output is skipped over because it feels like internal debug logs.
We still want 100% of output to go through the logger so we can ensure
each line can be made into valid json. Info messages however are meant
for the user and all other attributes can be stripped off by default.
If additional source location is necessary, enable the text or json
output format.
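One hedged sketch of what "message only" output can look like with
log/slog (not necessarily how holos implements it): a handler that
satisfies slog.Handler but writes just the message to stderr.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// messageHandler prints only the log message, so Info output reads like
// plain CLI output while still flowing through the structured logger.
type messageHandler struct {
	w     io.Writer
	level slog.Level
}

func (h *messageHandler) Enabled(_ context.Context, l slog.Level) bool { return l >= h.level }

func (h *messageHandler) Handle(_ context.Context, r slog.Record) error {
	_, err := fmt.Fprintln(h.w, r.Message)
	return err
}

func (h *messageHandler) WithAttrs(_ []slog.Attr) slog.Handler { return h }
func (h *messageHandler) WithGroup(_ string) slog.Handler      { return h }

func main() {
	logger := slog.New(&messageHandler{w: os.Stderr, level: slog.LevelInfo})
	logger.Info("rendered component", "cluster", "workload") // prints: rendered component
}
```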
Protobuf JSON:
This patch modifies the API contract between holos and CUE to ensure
data is exchanged exclusively using protojson. This is necessary
because protobuf has a canonical json format which is not compatible
with the go json package struct tags. When Holos handles a protobuf
message, it must marshal and unmarshal it using the protojson package.
Similarly, when importing protobuf messages into CUE, we must use `cue
import` instead of `cue get go` so that the canonical format is used
instead of the invalid go json struct tags.
Finally, when a Go struct like v1alpha1.Form is used to represent data
defined in cue which contains a nested protobuf message, Holos should
use a cue.Value to lookup the nested path, marshal it into json bytes,
then unmarshal it again using protojson.
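A small sketch of that lookup, marshal, and unmarshal round trip;
structpb.Struct stands in for the holos protobuf message and the CUE path
is illustrative:

```go
package main

import (
	"fmt"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// A CUE value with a nested message; structpb.Struct is a stand-in for
	// the holos protobuf message to keep the sketch self-contained.
	ctx := cuecontext.New()
	v := ctx.CompileString(`form: fieldConfigs: displayName: "GitHub Org"`)

	// Look up the nested path and marshal it to JSON bytes.
	nested := v.LookupPath(cue.ParsePath("form.fieldConfigs"))
	data, err := nested.MarshalJSON()
	if err != nil {
		panic(err)
	}

	// Unmarshal with protojson so the canonical protobuf JSON mapping is
	// honored instead of Go json struct tags.
	var msg structpb.Struct
	if err := protojson.Unmarshal(data, &msg); err != nil {
		panic(err)
	}
	fmt.Println(msg.Fields["displayName"].GetStringValue())
}
```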
Previously there was no way to delete a platform. This patch adds a
basic delete subcommand which deletes platforms by their id using the
rpc api.
❯ holos get platform
NAME DESCRIPTION AGE ID
k3d Holos Local k3d 20h 0190c78a-4027-7a7e-82d0-0b9f400f4bc9
k3d2 Holos Local k3d 20h 0190c7b3-382b-7212-81d6-ffcfc4a3fe7e
k3dasdf Holos Local k3d 20h 0190c7b3-728a-7212-b56d-2d2edf389003
k3d9 Holos Local k3d 20h 0190c7b8-4c4e-7cea-9d3d-a6b9434ae438
k3d-8581 Holos Local k3d 20h 0190c7ba-1de9-7cea-bff8-f15b51a56bdd
k3d-13974 Holos Local k3d 20h 0190c7ba-5833-7cea-b863-8e5ffb926810
k3d-20760 Holos Local k3d 19h 0190c7ba-7a12-7cea-a350-d55b4817d8bc
❯ holos delete platform 0190c7ba-1de9-7cea-bff8-f15b51a56bdd 0190c7ba-5833-7cea-b863-8e5ffb926810 0190c7ba-7a12-7cea-a350-d55b4817d8bc
deleted platform k3d-8581
deleted platform k3d-13974
deleted platform k3d-20760
Previously there was no way to get/list platforms. This patch adds a
basic get subcommand with list as an alias to get the platforms
currently defined in the organization.
❯ holos get platform
NAME DESCRIPTION AGE ID
k3d Holos Local k3d 18h 0190c78a-4027-7a7e-82d0-0b9f400f4bc9
k3d2 Holos Local k3d 17h 0190c7b3-382b-7212-81d6-ffcfc4a3fe7e
k3dasdf Holos Local k3d 17h 0190c7b3-728a-7212-b56d-2d2edf389003
k3d9 Holos Local k3d 17h 0190c7b8-4c4e-7cea-9d3d-a6b9434ae438
k3d-8581 Holos Local k3d 17h 0190c7ba-1de9-7cea-bff8-f15b51a56bdd
k3d-13974 Holos Local k3d 17h 0190c7ba-5833-7cea-b863-8e5ffb926810
k3d-20760 Holos Local k3d 17h 0190c7ba-7a12-7cea-a350-d55b4817d8bc
k3d-13916 Holos Local k3d 17h 0190c7ba-8313-7cea-be37-41491c95ae79
k3d-26154 Holos Local k3d 17h 0190c7ba-a117-7cea-8229-ce27da84135e
❯ holos get platform foo
7:16AM ERR could not execute version=0.89.1 code=unknown err="not found"
❯ holos get platform foo k3d
NAME DESCRIPTION AGE ID
k3d Holos Local k3d 18h 0190c78a-4027-7a7e-82d0-0b9f400f4bc9
Previously the CreatePlatform rpc wrote over all fields when the
platform already exists. This is surprising and effectively makes it the
UpdatePlatform rpc.
This patch changes the behavior to do nothing except set the
already_exists flag in the response message.
Users who have the use case of needing to know if the creation actually
created a new resource should use the API to check the already_exists
flag. The CLI has no affordance for this other than parsing the log
messages.
Previously holos.platform.v1alpha1.PlatformService.CreatePlatform
returns an error for a request to create a platform of the same name as
an existing platform.
holos create platform --name k3d --display-name "Try Holos Locally"
8:00AM ERR could not execute version=0.87.2 code=failed_precondition
err="failed_precondition: platform.go:55: ent: constraint failed:
ERROR: duplicate key value violates unique constraint
\"platform_org_id_name\" (SQLSTATE 23505)" loc=client.go:138
This patch makes the CreatePlatform rpc idempotent using the upsert API.
The already_exists bool field is added to CreatePlatformResponse
response to indicate to the client if the platform already exists or
not.
Result:
holos create platform --display-name "Holos Local" --name k3d10
11:53AM INF create.go:56 created platform k3d10 version=0.87.2
name=k3d10 id=0190c731-1808-7e7d-9ccb-3d17434d0055
org=0190c6d6-4974-7733-9f7b-5d759a3e60e7 exists=false
holos create platform --display-name "Holos Local" --name k3d10
11:53AM INF create.go:56 updated platform k3d10 version=0.87.2
name=k3d10 id=0190c731-1808-7e7d-9ccb-3d17434d0055
org=0190c6d6-4974-7733-9f7b-5d759a3e60e7 exists=true
Previously I developed holos server in the dev-holos namespace of a
remote cluster. This patch updates the Tilt configs to develop locally
against k3d quickly and easily.
The database is a CNPG database which replaces PGO. This is simpler and
lighter weight: one container in one pod. CNPG has no repo host like PGO
has.
When starting holos server from the production Deployment, pgbouncer
blocks the automatic migration on startup.
```json
{
  "time": "2024-07-16T16:35:52.54507682-07:00",
  "level": "ERROR",
  "msg": "could not execute",
  "version": "0.87.2",
  "code": "unknown",
  "err": "sql/schema: create \"users\" table: ERROR: permission denied for schema public (SQLSTATE 42501)",
  "loc": "cli.go:82"
}
```
This patch separates automatic migration into a `holos server init`
subcommand intended for use in a Job.
Closes: #204
Previously, the Tiltfile was hard-wired to Jeff's development
environment on the k2 cluster on-prem. This doesn't work for other
contributors.
This patch fixes the problem by re-using the [Try Holos Locally][1]
documentation to create a local development environment. This has a
number of benefits. The evaluation documentation will be kept up to
date because it doubles as our development environment. Developing
locally is preferable to developing in a remote cluster. Hostnames and
URLs can be constant, e.g. https://app.holos.localhost/ for local dev
and https://app.holos.run/ for production. We don't need to push to a
remote container registry, k3d has a local registry built in that works
with Tilt.
The only difference presently between evaluation and development when
following the local/k3d doc is the addition of a local registry.
With this patch holos starts up and is accessible at
https://app.holos.localhost/
[1]: https://holos.run/docs/tutorial/local/k3d/
This applies various grammar, formatting, and flow improvements to the
local k3d tutorial steps based on running through it from start to
finish.
This also removes the Go code responsible for embedding the website into
`holos`, which isn't needed since the site is hosted on Cloudflare
Pages.
Made it in Preview using a background png from https://social.cards/ and
converting our logo:
mogrify -background none -resize 1200x -format png logo.svg
This patch fixes up the link colors and mermaid diagrams to look better
in both light and dark mode. This may not be the final result but it
moves in the right direction.
Links are now blue with a visible line on hover.
Previously the guide did not cover reconciling holos platform components
with GitOps. This patch adds instructions on how to apply the
application resources, review the diff, sync manually, and finally
enable automatic sync using CUE's struct merge feature.
Previously there is no web app except httpbin in the k3d platform. This
commit adds ArgoCD with an httproute and authorization policy at the
mesh layer. The application layer authenticates against a separate
oidc client id in the same issuer the mesh uses to demonstrate zero
trust and compatibility between the application and platform layers.
With this patch the user can authenticate and log in, but applications
are not configured. The user has no roles in ArgoCD either, rbac needs
to be configured properly for the getting started guide.
This patch adds the authproxy and authpolicy holos components to the k3d
platform for local evaluation. This combination implements a basic Zero
Trust security model. The httpbin backend service is protected with
authentication and authorization at the platform level without any
changes to the backend service.
The client id and project are static because they're defined centrally
in https://login.holos.run to avoid needing to setup a full identity
provider locally in k3d.
With this patch authentication and authorization work from both the web
browser and from the command line with curl using the token provided by
the holos cli.
Previously the local k3d tutorial doesn't expose any services to verify
the local certificate and the local dns changes work as expected.
This patch adds instructions and modifies the k3d platform to work with
a local mkcert certificate. A ClusterIssuer is configured to issue
Certificate resources using the ca private key created by mkcert.
With this patch, following the instructions results in a working and
trusted httpbin resource at https://httpbin.holos.localhost. This works
both in Chrome and curl on the command line.
This patch adds a script to install a local CA and configure cert
manager to issue certs similar to how it issues certs using LetsEncrypt
in a real cluster.
Previously there is no way to evaluate Holos on the local host. This is a
problem because it's a high barrier to entry to set up a full-blown GKE
and EKS cluster to evaluate the reference platform.
This patch adds a minimal, but useful, k3d platform which deploys to a
single local k3d cluster. The purpose is to provide a shorter on ramp
to see the value of ArgoCD integrated with Istio to provide a zero trust
auth proxy.
The intentional trade off is to provide a less-holistic k3d platform
with a faster on-ramp to learn about the value the more-holistic holos
platform.
With this patch the documentation is correct and the platform renders
fully. The user doesn't need to provide any Platform Model values, the
defaults suffice.
For the ArgoCD client ID, we'll use https://login.holos.run as the
issuer instead of building a new OIDC issuer inside of k3d, which would
create significant friction.
This patch adds a diagram that gives an overview of the holos rendering
pipeline. This is an important concept to understand when working with
holos components.
Note this probably should not go in the Overview, which is intended only
to give a sense of what getting started looks like. Move it to the
render page when we add it.
Previously there are no diagrams in the documentation. This patch wires
up mermaid for use in code blocks in the markdown files. A minimal
diagram is added to verify mermaid works but it's not the final diagram.
Previously the Docusaurus features examples were still in place on the
home page. This patch replaces the homepage features with Holos
specific features and illustrations from undraw.
Refer to https://undraw.co/search
Generating the docusaurus site is not idempotent like generating the
Angular web app. This is a problem for building and releasing the
executable because it creates a dirty git state.
Embedding the doc website into the executable is no longer necessary
since we're deploying the site with Cloudflare pages. Remove it from
the compiled executable as a result.
Cloudflare fails to build the website with:
```
07:44:47.179 sh: 1: docusaurus: not found
07:44:47.192 Failed: Error while executing user command. Exited with error code: 127
```
Resolve it by executing npm install from the build-website script and
note the script is intended for use in a cloudflare context.
The API docs are not published yet because the module is private. Our
own docs site does not have any API reference docs.
This patch adds auto-generated markdown docs for the core v1alpha2 types
by generating them directly from the go source code.
Some light editing of the output of `gomarkdoc` is necessary to get the
heading anchor tags to align correctly for Docusaurus.
The github workflows fail because yarn is not available. The Angular
frontend app uses npm so we should also use npm for the website to
minimize dependencies.
Previously `go install` fails to install holos.
```
❯ go install github.com/holos-run/holos/cmd/holos@latest
../../go/pkg/mod/github.com/holos-run/holos@v0.86.0/internal/frontend/frontend.go:25:12: pattern holos/dist/holos/ui/index.html: no matching files found
../../go/pkg/mod/github.com/holos-run/holos@v0.86.0/doc/website/website.go:14:12: pattern all:build: no matching files found
```
This is because we do not commit required files. This patch fixes the
problem by following Rob Pike's guidance to commit generated files.
This patch also replaces the previous use of Makefile tasks to generate
code with //go:generate directives.
This means the process of keeping the source code clean is
straightforward:
```
git clone
make tools
make generate
make build
```
Refer to https://go.dev/blog/generate
> Also, if the containing package is intended for import by go get, once
> the file is generated (and tested!) it must be checked into the source
> code repository to be available to clients. - Rob Pike
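For context, the shape of the replacement looks roughly like this; the
package name and npm commands are illustrative, not the exact directives
in this repository:

```go
// Package frontend embeds the compiled web UI. Running `go generate ./...`
// (or `make generate`) rebuilds the embedded assets before `make build`.
//
//go:generate npm install
//go:generate npm run build
package frontend
```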
Previously docs are not published. This patch adds Docusaurus into the
doc/website directory which is also a Go package to embed the static
site into the executable.
Serve the site using http.Server with a h2c handler with the command:
holos website --log-format=json --log-drop=source
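A minimal sketch of that serving setup, assuming golang.org/x/net for the
h2c wrapper; the listen address and the directory holding the static site
are placeholders:

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	// In holos the handler serves the embedded static site; a stub here.
	mux.Handle("/", http.FileServer(http.Dir("build")))

	// Wrap the handler so cleartext HTTP/2 (h2c) is accepted, which is
	// useful behind a mesh or proxy that terminates TLS.
	srv := &http.Server{
		Addr:    ":8080",
		Handler: h2c.NewHandler(mux, &http2.Server{}),
	}
	log.Fatal(srv.ListenAndServe())
}
```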
The website subcommand is intended to be run from a container as a
Deployment. For expedience, the website subcommand doesn't use the
signals package like the server subcommand does. Consider using it for
graceful Deployment restarts.
Refer to https://github.com/ent/ent/tree/master/doc/website
Previously a couple of methods were defined on the Result struct.
This patch moves the methods to an internal wrapper struct to remove
them from the API documentation.
With this patch the API between holos and CUE is entirely a data API.
Previously, the holos component Results for each ArgoCD Application
resource managed as part of each BuildPlan results in an empty file
being written for the empty list of k8s api objects.
This patch fixes the problem by skipping the write of accumulated API
objects when the Result metadata.name starts with `gitops/`.
This is kind of a hack, but it works well enough for now.
Previously components appeared to be duplicated; it was not clear to the
user that one build plan results in two components: one for the k8s yaml and
one for the gitops argocd Application resource.
```
❯ holos render component --cluster-name aws1 components/login/zitadel-server
9:27AM INF result.go:195 wrote deploy file version=0.84.1 path=deploy/clusters/aws1/gitops/zitadel-server.application.gen.yaml bytes=338
9:27AM INF render.go:92 rendered zitadel-server version=0.84.1 cluster=aws1 name=zitadel-server status=ok action=rendered
9:27AM INF render.go:92 rendered zitadel-server version=0.84.1 cluster=aws1 name=zitadel-server status=ok action=rendered
```
This patch prefixes the ArgoCD Application resource, which is
implemented as a separate HolosComponent in the same BuildPlan. The
result is more clear about what is going on:
```
❯ holos render component --cluster-name aws1 components/login/zitadel-server
9:39AM INF result.go:195 wrote deploy file version=0.84.1 path=deploy/clusters/aws1/gitops/zitadel-server.application.gen.yaml bytes=338
9:39AM INF render.go:92 rendered gitops/zitadel-server version=0.84.1 cluster=aws1 name=gitops/zitadel-server status=ok action=rendered
9:39AM INF render.go:92 rendered zitadel-server version=0.84.1 cluster=aws1 name=zitadel-server status=ok action=rendered
```
The pod identity webhook component fails to render with v1alpha2. This
patch fixes the problem by providing concrete values for enableHooks and
the namespace of the helm chart holos component.
The namespace is mainly necessary to render the ArgoCD Application
resource alongside the helm chart output.
With this patch the eso-creds-manager component renders correctly. This
is a `#Kubernetes` type build plan which uses the
spec.components.resources map to manage resources.
The only issue was needing to provide the namespace to the nested holos
component inside the BuildPlan.
The ArgoCD Application resource moves to the DeployFiles field of a
separate holos component in the same build plan at
spec.components.resources.argocd. For this reason a separate Result
object is no longer necessary inside of the Holos cli for the purpose of
managing Flux or ArgoCD gitops. The CUE code can simply inline whatever
gitops resources it wants and the holos cli will write the files
relative to the cluster specific deploy directory.
Result:
```
❯ holos render component --cluster-name management components/eso-creds-manager
2:55PM INF result.go:195 wrote deploy file version=0.84.1 path=deploy/clusters/management/gitops/eso-creds-manager.application.gen.yaml bytes=350
2:55PM INF render.go:92 rendered eso-creds-manager version=0.84.1 cluster=management name=eso-creds-manager status=ok action=rendered
```
Previously holos render platform failed for the holos platform. The issue was
caused by the deployFiles field moving from the BuildPlan down to
HolosComponent.
This patch fixes the problem by placing the ArgoCD Application resource into a
separate Resources entry of the BuildPlan. The sole purpose of this additional
entry in the Resources map is to produce the Application resource
alongside any other components which are part of the build plan.
Previously methods were defined on the API objects in the v1alpha1 API.
The API should be data structures only. This patch refactors the
methods responsible for orchestrating the build plan to pull them into
the internal render package.
The result is the API is cleaner and has no methods. The render package
has corresponding data structures which simply wrap around the API
structure and implement the methods to render and return the result to
the CLI.
This commit compiles, but it has not been tested at all. It's almost
surely broken completely.
Previously in v1alpha1, all Holos structs are located in the same
package. This makes it difficult to focus on only the structs necessary
to transfer configuration data from CUE to the `holos` cli.
This patch splits the structs into `meta` and `core` where the core
package holds the structs end users should refer to and focus on. Only
the Platform resource is in core now, but other BuildPlan types will be
added shortly.
Previously Backstage was not configured to integrate with GitHub. The
integration is necessary for Backstage to automatically discover
resources in a GitHub organization and import them into the Catalog.
This patch adds a new platform model form field and section for the
primary GitHub organization name of the platform. Additional GitHub
organizations can be added in the future, Backstage supports them.
The result is Backstage automatically scans public and private
repositories and adds the information in `catalog-info.yaml` to the UI.
Previously the gateway ArgoCD Application resource is out of sync because
the `default-istio` `ServiceAccount` is not in the git repository
source. Argo would prune the service account on sync, which is a problem.
This patch manages the service account so the Application can be synced
properly.
Previously the holos render platform command fails with the following
error when giving a demo after the generate platform step.
This patch updates the internal generated holos platform to the latest
version.
Running through the demo is successful now.
```
holos logout
holos login
holos register user
holos generate platform holos
holos pull platform config .
holos render platform ./platform
```
I'm not sure if we should check in the loop, in the go routine, or in
both places. Double check in both cases just to be sure we're not doing
extra unnecessary work.
Previously a channel was used to limit concurrency. This is more
difficult to read and comprehend than errgroup's built-in Group.SetLimit
functionality.
This patch uses `errgroup.`[Group.SetLimit()][1] to limit concurrency,
avoid leaking go routines, and avoid unnecessary work.
[1]: https://pkg.go.dev/golang.org/x/sync/errgroup#Group.SetLimit
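A compact sketch of the pattern; the render function and component list
are stand-ins for the real per-component rendering:

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"golang.org/x/sync/errgroup"
)

func render(_ context.Context, component string) error {
	// Stand-in for rendering one platform component.
	fmt.Println("rendered", component)
	return nil
}

func main() {
	components := []string{"namespaces", "gateway-api", "podinfo"}

	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(min(runtime.NumCPU(), 8)) // cap concurrent renders

	for _, c := range components {
		c := c // capture loop variable (pre Go 1.22)
		g.Go(func() error {
			return render(ctx, c)
		})
	}
	if err := g.Wait(); err != nil {
		fmt.Println("could not render:", err)
	}
}
```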
This adds concurrency to the 'holos render platform' command so platform
components are rendered in less time than before.
Default concurrency is set to `min(runtime.NumCPU(), 8)`, which is the
lesser of 8 or the number of CPU cores. In testing, I found that past 8,
there are diminishing or negative returns due to the memory usage of
rendering each component.
In practice, this reduced rendering of the saas platform components from
~90s to ~28s on my 12-core macbook pro.
This also changes the key name of the Helm Chart's version in log lines
from `version` to `chart_version` since `version` already exists and
shows the Holos CLI version.
Previously, when a user registered and logged into the holos app server,
they were able to reach admin interfaces like
https://argocd.admin.example.com
This patch adds AuthorizationPolicy resources governing the whole
cluster. Users with the prod-cluster-{admin,edit,view} roles may access
admin services like argocd.
Users without these roles are blocked with RBAC: access denied.
In ZITADEL, the Holos Platform project is granted to the CIAM
organization without granting the prod-cluster-* roles, so there's no
possible way a CIAM user account can have these roles.
Previously there wasn't a good way to populate the platform model in the
database after building a new instance of holos server.
With this patch, the process to reset clean is:
```
export HOLOS_SERVER=https://dev.app.holos.run:443
grpcurl -H "x-oidc-id-token: $(holos token)" ${HOLOS_SERVER##*/} holos.user.v1alpha1.SystemService.DropTables
grpcurl -H "x-oidc-id-token: $(holos token)" ${HOLOS_SERVER##*/} holos.system.v1alpha1.SystemService.SeedDatabase
```
Then populate the form and model:
```
holos push platform form .
holos push platform model .
```
The `platform.config.json` file stored in version control is pushed to
the holos server and stored in the database. This makes it nice and
easy to reset entirely, or move to another service url.
Previously the default oidc issuer was set to one of the kubernetes clusters
running in my basement. This patch changes the issuer to the production
ready issuer running in EKS.
Previously the holos server Service was not exposed.
This patch exposes the holos service with an HTTPRoute behind the auth
proxy. Holos successfully authenticates the user with the
x-oidc-id-token header set by the default Gateway.
---
Add dev-holos-infra and dev-holos-app
Previously the PostgresCluster and the holos server Deployment are not
managed on the aws2 cluster.
This patch is a start, but the Deployment does not yet start. We need
to pass an option for the oidc issuer.
---
Add namespaces and cert for prod-holos, dev-holos, jeff-holos
Previously we didn't have a place to deploy holos server. This patch
adds a namespace, creates a Gateway listener, and binds the tls certs
for app.example.com and *.app.example.com to the listeners.
In addition, cluster specific endpoints of *.app.aws2.example.com,
*.app.aws1.example.com, etc. are created to provide dev environment
urls. For example jeff.app.aws2.example.com is my personal dev hostname.
Previously holos render platform ./platform did not render any GitOps
resources for Flux or ArgoCD.
This patch uses the new DeployFiles field in holos v0.83.0 to write an
Application resource for every component BuildPlan listed in the
platform.
Previously, each BuildPlan has no clear way to produce an ArgoCD
Application resource. This patch provides a general solution where each
BuildPlan can provide arbitrary files as a map[string]string where the
key is the file path relative to the gitops repository `deploy/` folder.
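A simplified sketch of how the CLI can consume such a map; the directory
layout mirrors the deploy/ convention above and the manifest content is a
placeholder:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// writeDeployFiles writes each entry to disk: keys are file paths relative
// to the deploy/ directory, values are the file contents (an ArgoCD
// Application, for example) provided by the BuildPlan.
func writeDeployFiles(deployDir string, files map[string]string) error {
	for rel, content := range files {
		path := filepath.Join(deployDir, rel)
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	files := map[string]string{
		"clusters/aws1/gitops/podinfo.application.gen.yaml": "# Application manifest here\n",
	}
	if err := writeDeployFiles("deploy", files); err != nil {
		log.Fatal(err)
	}
}
```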
Previously ArgoCD has no ssh credentials to connect to GitHub. This
patch adds an ssh ed25519 key as a secret in the management cluster.
The secret is synced to the workload clusters using an ExternalSecret
with the proper label for ArgoCD to find and load it for use with any
application that references the Git URL.
Previously a logged in user could not modify anything in ArgoCD. With
this patch users who have been granted the prod-cluster-admin role in
ZITADEL are granted the admin role in ArgoCD.
Previously ArgoCD was present in the platform configuration, but not
functional. This patch brings ArgoCD fully up, integrated with the
service mesh, auth proxy, and SSO at
https://argocd.admin.clustername.example.com/
The upstream [helm chart][1] is used instead of the kustomize install
method. We had existing prior art integrating the v6 helm chart with
the holos platform identity provider, so we continue with the helm
chart.
CRDs are still managed with the kustomize version. The CRDs need to be
kept in sync. It's possible to generate the kustomization.yaml file
from the same version value as is used by the helm chart, but we don't
for the time being.
[1]: https://github.com/argoproj/argo-helm/tree/argo-cd-7.1.1/charts/argo-cd
Previously, no RequestAuthentication or AuthorizationPolicy resources
govern the default Gateway. This patch adds the resources and
configures the service mesh with the authproxy as an ExtAuthZ provider
for CUSTOM AuthorizationPolicy rules.
This patch also fixes a bug in the zitadel-server component where
resources from the upstream helm chart did not specify a namespace.
Kustomize is used as a post processor to force all resources into the
zitadel namespace.
Add multiple HTTPRoutes to validate http2 connection reuse
This patch adds multiple HTTPRoute resources which match
*.admin.example.com. The purpose is to validate http2 connections are
reused properly with Chrome.
With this patch no 404 (no route) errors are encountered when navigating
between the various httpbin{1,2,3,4} urls.
Add note backupRestore will trigger a restore
The process of configuring ZITADEL to provision from a datasource will
cause an in-place restore from S3. This isn't a major issue, but users
should be aware data added since the most recent backup will be lost.
Previously, HTTPRoute resources were in the same namespace as the
backend service, httpbin in this case. This doesn't follow the default
behavior of a Gateway listener only allowing attachment from HTTPRoute
resources in the same namespace as the Gateway.
This also complicates intercepting the authproxy path prefix and sending
it to the authproxy. We'd need to add a ReferenceGrant in the authproxy
namespace, which seems backwards and dangerous because it would grant
the application developer the ability to route requests to all Services
in the istio-gateways namespace.
This patch enables Cluster Operators to manage the HTTPRoute resources
and direct the auth proxy path prefix of `/holos/authproxy` to the auth
proxy Service in the same namespace.
ReferenceGrant resources are used to enable the HTTPRoute backend
references.
When an application developer needs to manage their own HTTPRoute, as is
the case for ZITADEL, a label selector may be used and will override
less specific HTTPRoute hostnames in the istio-gateways namespace.
With redis, the auth proxy authenticates correctly against zitadel
running in the same cluster. Validated by visiting
https://httpbin.admin.clustername.example.com/holos/authproxy
Visiting
https://httpbin.admin.clustername.example.com/holos/authproxy/auth
returns the id token in the response header, visible in the Chrome
network inspector. The ID token works as expected from multiple orgs
with project grants in ZITADEL from the Holos org to the OIS org.
This patch doesn't fully implement the auth proxy feature.
AuthorizationPolicy and RequestAuthentication resources need to be
added.
Before we do so, we need to move the HTTPRoute resources into the
gateway namespace so all of the security policies are in one place and
to simplify the process of routing requests to two backends, the
authproxy and the backend server.
This patch adds multiple HTTPRoute resources which match
*.admin.example.com. The purpose is to validate http2 connections are
reused properly with Chrome.
With this patch no 404 (no route) errors are encountered when navigating
between the various httpbin{1,2,3,4} urls.
Problem:
Istio 1.22 with Gateway API and HTTPRoute is mis-routing HTTP2 requests
when the tls certificate has two dns names, for example
login.example.com and *.login.example.com.
When the user visits login.example.com and then tries to visit
other.login.example.com with Chrome, the connection is re-used and istio
returns a 404 route not found error even though there is a valid and
accepted HTTPRoute for *.login.example.com
This patch attempts to fix the problem by ensuring certificate dns names
map exactly to Gateway listeners. When a wildcard cert is used, the
corresponding Gateway listener host field exactly matches the wildcard
cert dns name so Istio and envoy should not get confused.
This patch adds the ZITADEL server component, which deploys zitadel from
a helm chart. Kustomize is used heavily to patch the output of helm to
make the configuration fit nicely with the holos platform.
With this patch the two Jobs that initialize the database and setup
ZITADEL run successfully. The ZITADEL deployment starts successfully.
ZITADEL is accessible at https://login.example.com/ with the default
admin username of `zitadel-admin@zitadel.login.example.com` and password
`Password1!`.
Use grant.holos.run/subdomain.admin: "true" for HTTPRoute
This patch clarifies the label that grants httproute attachment for a
subdomain Gateway listener to a namespace.
Fix istio-base holos component name
It was named `base`, which is the chart name, not the holos component name.
This patch adds the postgres clusters and a few console form controls to
configure how backups are taken and if the postgres cluster is
initialized from an existing backup or not.
The pgo-s3-creds file is manually created at this time. It looks like:
❯ holos get secret -n zitadel pgo-s3-creds --print-key s3.conf
[global]
repo2-cipher-pass=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
repo2-s3-key=KKKKKKKKKKKKKKKKKKKK
repo2-s3-key-secret=/SSSSSSS/SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
repo3-cipher-pass=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
repo3-s3-key=KKKKKKKKKKKKKKKKKKKK
repo3-s3-key-secret=/SSSSSSS/SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
The s3 key and secret are credentials to read / write to the bucket.
The cipher pass is a random string for client side encryption. Generate
it with `tr -dc A-Za-z0-9 </dev/urandom | head -c 64`
This patch is foundational work for the ZITADEL login service.
This patch adds a tls certificate with names *.login.example.com and
login.example.com, a pair of listeners attached to the certificate in
the `default` Gateway, and the ExternalSecret to sync the secret from
the management cluster.
The zitadel namespace is managed and has the label
holos.run/login.grant: "true" to grant HTTPRoute attachment from the
zitadel namespace to the default Gateway in the istio-gateways
namespace.
With this change, https://httpbin.admin.aws1.example.com works as
expected.
PROXY protocol is configured on the AWS load balancer and the istio
gateway. The istio gateway logs have the correct client source ip
address and x-forwarded-for headers.
Namespaces must have the holos.run/admin.grant: "true" label in order to
attach an HTTPRoute to the admin section of the default Gateway.
The TLS certificate is working as expected and hopefully does not suffer
from the NR route not found issue encountered with the Istio Gateway
API.
This patch gets the istio-ingressgateway up and running in AWS with
minimal configuration. No authentication or authorization policies have
been migrated from previous iterations of the platform. These will be
handled in subsequent iterations.
Connectivity to a backend service like httpbin has not yet been tested.
This will happen in a follow up as well using /httpbin path prefixes on
existing services like argocd to conserve certificate resources.
This is the standard way to issue public facing certificates. Be aware
of the 50 cert limit per week from LetsEncrypt. We map names to certs
1:1 to avoid http2 connection reuse issues with istio.
Manage certificates on a project basis similar to how namespaces
associated with each project are managed.
Manage the Certificate resources on the management cluster in the
istio-ingress namespace so the tls certs can be synced to the workload
clusters.
The secretstores component is critical and provides the mechanism to
securely fetch Secret resources from the Management Cluster.
The holos server and configuration code stored in version control
contains only ExternalSecret references, no actual secrets.
This component adds a `default` `SecretStore` to each management
namespace which uses the `eso-reader` service account token to
authenticate to the management cluster. This service account is limited
to reading secrets within the namespace it resides in.
For example:
```yaml
---
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: default
  namespace: external-secrets
spec:
  provider:
    kubernetes:
      auth:
        token:
          bearerToken:
            key: token
            name: eso-reader
      remoteNamespace: external-secrets
      server:
        caBundle: Long Base64 encoded string
        url: https://34.121.54.174
```
This patch adds the `eso-creds-manager` component which needs to be
applied to the management cluster prior to the `eso-creds-refresher`
component being applied to workload clusters.
The manager component configures rbac to allow the creds-refresher job
to complete.
This patch also adjusts the behavior to only create secrets for the
eso-reader account by default.
Namespaces with the label `holos.run/eso.writer=true` will also have an
eso-writer secret provisioned in their namespace, allowing secrets to be
written back to the management cluster. This is intended for the
PushSecret resource.
Use v0.81.2 to build out the holos platform. Once we have the
components structured fairly well we can circle back around and copy the
components to schematics. There's a bit of friction regenerating the
platform from schematic each time.
Using CUE definitions like #Platform to hold data is confusing. Clarify
the use of fields, definitions like #Platform define the shape (schema)
of the data while private fields like _Platform represent and hold the
data.
The first thing most platforms need to do is come up with a strategy for
managing namespaces across multiple clusters.
This patch defines #Namespaces in the holos platform and adds a
namespaces component which loops over all values in the #Namespaces
struct and manages a kubernetes Namespace object.
The platform resource itself loops over all clusters in the platform to
manage all namespaces across all clusters.
From a blank slate:
```
❯ holos generate platform holos
4:26PM INF platform.go:79 wrote platform.metadata.json version=0.82.0 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc path=/home/jeff/workspace/holos-run/holos-infra/saas/platform.metadata.json
4:26PM INF platform.go:91 generated platform holos version=0.82.0 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc path=/home/jeff/workspace/holos-run/holos-infra/saas
❯ holos pull platform config .
4:26PM INF pull.go:64 pulled platform model version=0.82.0 server=https://jeff.app.dev.k2.holos.run:443 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc
4:26PM INF pull.go:75 saved platform config version=0.82.0 server=https://jeff.app.dev.k2.holos.run:443 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc path=platform.config.json
❯ (cd components && holos generate component cue namespaces)
4:26PM INF component.go:147 generated component version=0.82.0 name=namespaces path=/home/jeff/workspace/holos-run/holos-infra/saas/components/namespaces
❯ holos render platform ./platform/
4:26PM INF platform.go:29 ok render component version=0.82.0 path=components/namespaces cluster=management num=1 total=2 duration=464.055541ms
4:26PM INF platform.go:29 ok render component version=0.82.0 path=components/namespaces cluster=aws1 num=2 total=2 duration=467.978499ms
```
The result:
```sh
cat deploy/clusters/management/components/namespaces/namespaces.gen.yaml
```
```yaml
---
metadata:
  name: holos
  labels:
    kubernetes.io/metadata.name: holos
kind: Namespace
apiVersion: v1
```
Without this patch the
holos.platform.v1alpha1.PlatformService.CreatePlatform doesn't work as
expected. The Platform message is used which incorrectly requires a
client supplied id which is ignored by the server.
This patch allows the creation of a new platform by reusing the update
operation as a mutation that applies to both create and update. Only
modifiable fields are part of the PlatformMutation message.
This patch adds two more example helm chart based components. podinfo
installs as a normal https repository based helm chart. podinfo-oci
uses an oci image to manage the helm chart.
The way holos handles OCI images is subtle, so it's good to include an
example right out of the chute. GitHub Actions uses OCI images, for
example.
This patch adds a command to generate CUE based holos components from
examples embedded in the executable. The examples are passed through
the go template rendering engine with values pulled from flags.
Each directory in the embedded filesystem becomes a unique command for
nice tab completion. The `--name` flag defaults to "example" and is the
resulting component name.
A follow up patch with more flags will set the stage for a Helm
component schematic.
```
holos generate component cue minimal
```
```txt
3:07PM INF component.go:91 generated component version=0.80.2 name=example path=/home/jeff/holos/dev/bare/components/example
```
Split holos render into component and platform.
This patch splits the previous `holos render` command into subcommands.
`holos render component ./path/to/component/` behaves as the previous
`holos render` command and renders an individual component.
The new `holos render platform ./path/to/platform/` subcommand makes
space to render the entire platform using the platform model pulled from
the PlatformService.
Starting with an empty directory:
```sh
holos register user
holos generate platform bare
holos pull platform config .
holos render platform ./platform/
```
```txt
10:01AM INF platform.go:29 ok render component version=0.80.2 path=components/configmap cluster=k1 num=1 total=1 duration=448.133038ms
```
The bare platform has a single component which refers to the platform
model pulled from the PlatformService:
```sh
cat deploy/clusters/mycluster/components/platform-configmap/platform-configmap.gen.yaml
```
```yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: platform
  namespace: default
data:
  platform: |
    spec:
      model:
        cloud:
          providers:
            - cloudflare
          cloudflare:
            email: platform@openinfrastructure.co
        org:
          displayName: Open Infrastructure Services
          name: ois
```
This patch adds a subcommand to pull the data necessary to construct a
PlatformConfig DTO. The PlatformConfig message contains all of the
fields and values necessary to build a platform and the platform
components. This is an alternative to holos passing multiple tags to
CUE. The PlatformConfig is marshalled and passed once.
The platform config is also stored in the local filesystem in the root
directory of the platform. This enables repeated local building and
rendering without making an rpc call.
The build / render pipeline is expected to cache the PlatformConfig once
at the start of the pipeline using the pull subcommand.
The `holos render platform` command is unimplemented. This patch
partially implements platform rendering by fetching the platform model
from the PlatformService and providing it to CUE using a tag.
CUE returns a `kind: Platform` resource to `holos` which will eventually
process a BuildPlan for each platform component listed in the Platform
spec.
For now, however, it's sufficient to have the current platform model
available to CUE.
Problem:
Rendering the whole platform doesn't need a cluster name.
Solution:
Make the flag optional, do not set the cue tag if it's empty.
Result:
Holos renders the platform resource and proceeds to the point where we
need to implement the iteration over platform components, passing the
platform model to each one and rendering the component.
We need to output a kind: Platform resource from cue so holos can
iterate over each build plan. The platform resource itself should also
contain a copy of the platform model obtained from the PlatformService
so holos can easily pass the model to each BuildPlan it needs to execute
to render the full platform.
This patch lays the groundwork for the Platform resource. A future
patch will have the holos cli obtain the platform model and inject it as
a JSON encoded string to CUE. CUE will return the Platform resource
which is a list of references to build plans. Holos will then iterate
over each build plan, pass the model back in, and execute the build
plan.
To illustrate where we're headed, the `cue export` step will move into
`holos` with a future patch.
```
❯ holos register user
3:34PM INF register.go:77 user version=0.80.0 email=jeff@ois.run server=https://app.dev.k2.holos.run:443 user_id=018f8839-3d74-7e39-afe9-181ad2fc8abe org_id=018f8839-3d74-7e3a-918c-b36494da0115
❯ holos generate platform bare
3:34PM INF generate.go:79 wrote platform.metadata.json version=0.80.0 platform_id=018f8839-3d74-7e3b-8cb8-77a2c124d173 path=/home/jeff/holos/dev/bare/platform.metadata.json
3:34PM INF generate.go:91 generated platform bare version=0.80.0 platform_id=018f8839-3d74-7e3b-8cb8-77a2c124d173 path=/home/jeff/holos/dev/bare
❯ holos push platform form .
3:34PM INF push.go:70 pushed: https://app.dev.k2.holos.run:443/ui/platform/018f8839-3d74-7e3b-8cb8-77a2c124d173 version=0.80.0
❯ cue export ./platform/
{
  "metadata": {
    "name": "bare",
    "labels": {},
    "annotations": {}
  },
  "spec": {
    "model": {}
  },
  "kind": "Platform",
  "apiVersion": "holos.run/v1alpha1"
}
```
When the holos server URL switches, we also need to update the client
context to get the correct org id.
Also improve quality of life by printing the url to the form when the
platform form is pushed to the server.
❯ holos push platform form .
11:41AM INF push.go:71 updated platform form version=0.79.0 server=https://app.dev.k2.holos.run:443 platform_id=018f87d1-7ca2-7e37-97ed-a06bcee9b442
11:41AM INF push.go:72 https://app.dev.k2.holos.run:443/ui/platform/018f87d1-7ca2-7e37-97ed-a06bcee9b442 version=0.79.0
This sub-command renders the web app form from CUE code and updates the
form using the `holos.platform.v1alpha1.PlatformService/UpdatePlatform`
rpc method.
Example use case, starting fresh:
```
rm -rf ~/holos
mkdir ~/holos
cd ~/holos
```
Step 1: Login
```sh
holos login
```
```txt
9:53AM INF login.go:40 logged in as jeff@ois.run version=0.79.0 name="Jeff McCune" exp="2024-05-17 21:16:07 -0700 PDT" email=jeff@ois.run
```
Step 2: Register to create server side resources.
```sh
holos register user
```
```
9:52AM INF register.go:68 user version=0.79.0 email=jeff@ois.run user_id=018f826d-85a8-751d-81ee-64d0f2775b3f org_id=018f826d-85a8-751e-98dd-a6cddd9dd8f0
```
Step 3: Generate the bare platform in the local filesystem.
```sh
holos generate platform bare
```
```txt
9:52AM INF generate.go:79 wrote platform.metadata.json version=0.79.0 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos/platform.metadata.json
9:52AM INF generate.go:91 generated platform bare version=0.79.0 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos
```
Step 4: Push the platform form to the `holos server` web app.
```sh
holos push platform form .
```
```txt
9:52AM INF client.go:67 updated platform version=0.79.0 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 duration=73.62995ms
```
At this point the platform form is published and functions as expected
when visiting the platform web interface.
Makes it easier to work with grpcurl:
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"org_id":"'$(holos orgid)'"}' ${HOLOS_SERVER##*/} holos.platform.v1alpha1.PlatformService.ListPlatforms
When the user generates a platform, we need to know the platform ID it's
linked to in the holos server. If there is no platform with the same
name, the `holos generate platform` command should error out.
This is necessary because the first thing we want to show is pushing an
updated form to `holos server`. To update the web ui the CLI needs to
know the platform ID to update.
This patch modifies the generate command to obtain a list of platforms
for the org and verify the generated name matches one of the platforms
that already exists.
A future patch could have the `generate platform` command call the
`holos.platform.v1alpha1.PlatformService.CreatePlatform` method if the
platform isn't found.
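A rough sketch of that verification step, assuming a generated connect
client for PlatformService. The generated import paths, message names,
and getters below are assumptions, not the actual holos layout:
```go
// Sketch: verify the generated platform name already exists in the org.
package generate

import (
	"context"
	"fmt"

	"connectrpc.com/connect"

	// Assumed generated package paths; the real layout may differ.
	platformpb "github.com/holos-run/holos/service/gen/holos/platform/v1alpha1"
	"github.com/holos-run/holos/service/gen/holos/platform/v1alpha1/platformconnect"
)

func verifyPlatformExists(ctx context.Context, client platformconnect.PlatformServiceClient, orgID, name string) (string, error) {
	req := connect.NewRequest(&platformpb.ListPlatformsRequest{OrgId: orgID})
	resp, err := client.ListPlatforms(ctx, req)
	if err != nil {
		return "", err
	}
	for _, platform := range resp.Msg.GetPlatforms() {
		if platform.GetName() == name {
			// The id is what gets written to platform.metadata.json.
			return platform.GetId(), nil
		}
	}
	return "", fmt.Errorf("platform %q not found for org %s", name, orgID)
}
```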
Results:
```sh
holos generate platform bare
```
```txt
4:15PM INF generate.go:77 wrote platform.metadata.json version=0.77.1 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos/platform.metadata.json
4:15PM INF generate.go:89 generated platform bare version=0.77.1 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos
```
```sh
cat platform.metadata.json
```
```json
{
"id": "018f826d-85a8-751f-96d0-0d2bf70df909",
"name": "bare",
"display_name": "Bare Platform"
}
```
This patch logs the service and rpc method of every request at Info
level. The error code and message are also logged. This gives a good
indication of what rpc methods are being called and by whom.
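A minimal sketch of such an interceptor with connect-go and slog; the
exact wiring in holos may differ:
```go
// Sketch: log the rpc procedure and connect error code of every request.
package server

import (
	"context"
	"log/slog"

	"connectrpc.com/connect"
)

func NewLoggingInterceptor(logger *slog.Logger) connect.UnaryInterceptorFunc {
	return func(next connect.UnaryFunc) connect.UnaryFunc {
		return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
			resp, err := next(ctx, req)
			if err != nil {
				logger.InfoContext(ctx, "rpc", "procedure", req.Spec().Procedure, "code", connect.CodeOf(err), "err", err)
				return nil, err
			}
			logger.InfoContext(ctx, "rpc", "procedure", req.Spec().Procedure, "code", "ok")
			return resp, nil
		}
	}
}
```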
This patch adds a `holos register user` command. Given an authenticated
id token and no other record of the user in the database, the CLI tool
uses the API to ensure:
1. The user is registered in `holos server`.
2. The user is linked to one Holos Organization.
3. The Holos Organization has the `bare` platform.
4. The Holos Organization has the `reference` platform.
5. `~/.holos/client-context.json` contains the user id and an
   org id.
The `holos.ClientContext` struct is intended as a lightweight way to
save and load the current organization id to the file system for further
API calls.
The assumption is most users will have only a single org. We can add
a more complicated config context system like kubectl uses if and when
we need it.
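A minimal sketch of that save/load round trip, assuming the file lives
at `~/.holos/client-context.json`; field names are illustrative:
```go
// Sketch: persist the current user and org id for subsequent API calls.
package holos

import (
	"encoding/json"
	"os"
	"path/filepath"
)

type ClientContext struct {
	UserID string `json:"user_id"`
	OrgID  string `json:"org_id"`
}

func clientContextPath() (string, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".holos", "client-context.json"), nil
}

func SaveClientContext(cc ClientContext) error {
	path, err := clientContextPath()
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o700); err != nil {
		return err
	}
	data, err := json.MarshalIndent(cc, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

func LoadClientContext() (ClientContext, error) {
	var cc ClientContext
	path, err := clientContextPath()
	if err != nil {
		return cc, err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return cc, err
	}
	err = json.Unmarshal(data, &cc)
	return cc, err
}
```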
This patch adds a generate subcommand that copies a platform embedded
into the executable to the local filesystem. The purpose is to
accelerate initial setup with canned example platforms.
Two platforms are intended to start, one bare and one reference
platform. The number of platforms embedded into holos should be kept
small (2-3) to limit our support burden.
This patch adds the GetVersion rpc method to
holos.system.v1alpha1.SystemService and wires the version information up
to the Web UI.
This is a good example to crib from later regarding fetching and
refreshing data from the web ui using grpc and field masks.
This patch refactors the API following the [API Best Practices][api]
documentation. The UpdatePlatform method is modeled after a mutating
operation described [by Netflix][nflx] instead of using a REST resource
representation. This makes it much easier to iterate over the fields
that need to be updated as the PlatformUpdateOperation is a flat data
structure while a Platform resource may have nested fields. Nested
fields are more complicated and less clear to handle with a FieldMask.
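To illustrate the point, a sketch (not the holos implementation) of how
a flat operation message makes applying a FieldMask a simple switch over
the mask paths; the operation struct and column names are illustrative:
```go
// Sketch: apply a FieldMask against a flat update operation.
package server

import (
	"fmt"

	"google.golang.org/protobuf/types/known/fieldmaskpb"
)

// PlatformUpdateOperation stands in for the flat protobuf message.
type PlatformUpdateOperation struct {
	DisplayName string
	Model       map[string]any
}

func updateColumns(op PlatformUpdateOperation, mask *fieldmaskpb.FieldMask) (map[string]any, error) {
	columns := map[string]any{}
	for _, path := range mask.GetPaths() {
		switch path {
		case "display_name":
			columns["display_name"] = op.DisplayName
		case "model":
			columns["model"] = op.Model
		default:
			return nil, fmt.Errorf("unsupported field mask path: %q", path)
		}
	}
	return columns, nil
}
```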
This patch also adds a snackbar message on save. Previously, the save
button didn't give any indication of success or failure. This patch
fixes the problem by adding a snackbar message that pops up at the
bottom of the screen.
When the snackbar message is dismissed or times out the save button is
re-enabled.
[api]: https://protobuf.dev/programming-guides/api/
[nflx]: https://netflixtechblog.com/practical-api-design-at-netflix-part-2-protobuf-fieldmask-for-mutation-operations-2e75e1d230e4
Examples:
FieldMask for ListPlatforms
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d @ ${HOLOS_SERVER##*/} holos.platform.v1alpha1.PlatformService.ListPlatforms <<EOF
{
"org_id": "018f36fb-e3f7-7f7f-a1c5-c85fb735d215",
"field_mask": { "paths": ["id","name"] }
}
EOF
```
```json
{
  "platforms": [
    {
      "id": "018f36fb-e3ff-7f7f-a5d1-7ca2bf499e94",
      "name": "bare"
    },
    {
      "id": "018f6b06-9e57-7223-91a9-784e145d998c",
      "name": "gary"
    },
    {
      "id": "018f6b06-9e53-7223-8ae1-1ad53d46b158",
      "name": "jeff"
    },
    {
      "id": "018f6b06-9e5b-7223-8b8b-ea62618e8200",
      "name": "nate"
    }
  ]
}
```
Closes: #171
This patch refactors the API to be resource-oriented around one service
per resource type. PlatformService, OrganizationService, UserService,
etc...
Validation is improved to use CEL rules provided by [protovalidate][1].
Placeholders for FieldMask and other best practices are added, but are
unimplemented as per [API Best Practices][2].
The intent is to set us up well for copying and pasting solid existing
examples as we add features.
With this patch the server and web app client are both updated to use
the refactored API, however the following are not working:
1. Update the model.
2. Field Masks.
[1]: https://buf.build/bufbuild/protovalidate
[2]: https://protobuf.dev/programming-guides/api/
This command is just a prototype of how to fetch the platform model so
we can make it available to CUE.
The idea is we take the data from the holos server and write it into a
CUE `_Platform` struct. This will probably involve converting the data
to CUE format and nesting it under the platform struct spec field.
This patch restructures the bare platform in preparation for a
`Platform` kind of output from CUE in addition to the existing
`BuildPlan` kind.
This patch establishes a pattern where our own CUE defined code goes
into three CUE module paths:
1. `internal/platforms/cue.mod/gen/github.com/holos-run/holos/api/v1alpha1`
2. `internal/platforms/cue.mod/pkg/github.com/holos-run/holos/api/v1alpha1`
3. `internal/platforms/cue.mod/usr/github.com/holos-run/holos/api/v1alpha1`
The first path is automatically generated from Go structs. The second
path is where we override and provide additional cue level integration.
The third path is reserved for the end user to further refine and
constrain our definitions.
This form goes a good way toward capturing what we need to configure the
entire reference platform. Elements and sections are responsive to
which cloud providers are selected, which achieves my goal of modeling a
reasonably advanced form using only JSON data produced by CUE.
To write the form via the API:
cue export ./forms/platform/ --out json \
| jq '{platform_id: "'${platformId}'", fields: .spec.fields}' \
| grpcurl -H "x-oidc-id-token: $(holos token)" -d @ ${host}:443 \
holos.platform.v1alpha1.PlatformService.PutForm
The way we were organizing fields into sections broke Formly validation.
This patch fixes the problem by using the recommended approach of
[Nested Forms][1].
This patch also refactors the PlatformService API to clean it up.
GetForm / PutForm are separated from the Platform methods. Similarly
GetModel / PutModel are separated out and are specific to get and put
the model data.
NOTE: I'm not sure we should have separated out the platform service
into its own protobuf package. It seems unnecessary.
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"018f36fb-e3ff-7f7f-a5d1-7ca2bf499e94"}' jeff.app.dev.k2.holos.run:443 holos.platform.v1alpha1.PlatformService.GetModel
{
  "model": {
    "org": {
      "contactEmail": "platform@openinfrastructure.co",
      "displayName": "Open Infrastructure Services LLC",
      "domain": "ois.run",
      "name": "ois"
    },
    "privacy": {
      "country": "earth",
      "regions": [
        "us-east-2",
        "us-west-2"
      ]
    },
    "terms": {
      "didAgree": true
    }
  }
}
[1]: https://formly.dev/docs/examples/other/nested-formly-forms
This patch wires up a Select and a Multi Select box. This patch also
establishes a decision as it relates to Formly TypeScript / gRPC Proto3
/ CUE definitions of the form data structure. The decision is to use
gRPC as a transport for any JSON to avoid friction trying to fit Formly
types into Proto3 messages.
Note when using google.protobuf.Value messages with bufbuild/connect-es,
we need to round trip them one last time through JSON to get the
original JSON on the other side. This is because connect-es preserves
the type discriminators in the case and value fields of the message.
Refer to: [Accessing oneof
groups](https://github.com/bufbuild/protobuf-es/blob/main/docs/runtime_api.md#accessing-oneof-groups)
NOTE: On the wire, carry any JSON as field configs for expedience. I
attempted to reflect FormlyFieldConfig in protobuf, but it was too time
consuming. The loosely defined Formly json data API creates significant
friction when joined with a well defined protobuf API. Therefore, we do
not specify anything about the Forms API; we convey any valid JSON and
leave it up to CUE and Formly on the sending and receiving sides of the
API.
We use CUE to define our own holos form elements as a subset of the loose
Formly definitions. We further hope Formly will move toward a better JSON
data API, but it's unlikely. Consider replacing Formly entirely and
building on top of the strongly typed Angular Dynamic Forms API.
Refer to: https://github.com/ngx-formly/ngx-formly/blob/v6.3.0/src/core/src/lib/models/fieldconfig.ts#L15
Consider: https://angular.io/guide/dynamic-form
Usage:
Generate the form from CUE
cue export ./forms/platform/ --out json | jq -cM | pbcopy
Store the form JSON in the config_values column of the platforms table.
View the form, and submit some data. Then get the data back out for use rendering the platform:
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"'${platformId}'"}' $holos holos.v1alpha1.PlatformService.GetConfig
```json
{
  "platform": {
    "spec": {
      "config": {
        "user": {
          "sections": {
            "org": {
              "fields": {
                "contactEmail": "jeff@openinfrastructure.co",
                "displayName": "Open Infrastructure Services LLC",
                "domain": "ois.run",
                "name": "ois"
              }
            },
            "privacy": {
              "fields": {
                "country": "earth",
                "regions": [
                  "us-east-2",
                  "us-west-2"
                ]
              }
            },
            "terms": {
              "fields": {
                "didAgree": true
              }
            }
          }
        }
      }
    }
  }
}
```
Problem:
The GetConfig response value isn't directly usable with CUE without some
gymnastics.
Solution:
Refactor the protobuf definition and response output to make the user
defined and supplied config values provided by the API directly usable
in the CUE code that defines the platform.
Result:
The top level platform config is directly usable in the
`internal/platforms/bare` directory:
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"'${platformID}'"}' $host \
holos.v1alpha1.PlatformService.GetConfig \
> platform.holos.json
Vet the user supplied data:
cue vet ./ -d '#PlatformConfig' platform.holos.json
Build the holos component. The ConfigMap consumes the user supplied
data:
cue export --out yaml -t cluster=k2 ./components/configmap platform.holos.json \
| yq .spec.components
Note the data provided by the input form is embedded into the
ConfigMap managed by Holos:
```yaml
KubernetesObjectsList:
  - metadata:
      name: platform-configmap
    apiObjectMap:
      ConfigMap:
        platform: |
          metadata:
            name: platform
            namespace: default
            labels:
              app.holos.run/managed: "true"
          data:
            platform: |
              kind: Platform
              spec:
                config:
                  user:
                    sections:
                      org:
                        fields:
                          contactEmail: jeff@openinfrastructure.co
                          displayName: Open Infrastructure Services LLC
                          domain: ois.run
                          name: ois
              apiVersion: app.holos.run/v1alpha1
              metadata:
                name: bare
                labels: {}
                annotations: {}
              holos:
                flags:
                  cluster: k2
          kind: ConfigMap
          apiVersion: v1
    Skip: false
```
Problem:
The use of google.protobuf.Any was making it awkward to work with the
data provided by the user. The structure of the form data is defined by
the platform engineer, so the intent of Any was to wrap the data in a
way we can pass over the network and persist in the database.
The escaped JSON encoding was problematic and error prone to decode on
the other end.
Solution:
Define the Platform values as a two level map with string keys, but with
protobuf message fields "sections" and "fields" respectively. Use
google.protobuf.Value from the struct package to encode the actual
value.
Result:
In TypeScript, google.protobuf.Value encodes and decodes easily to a
JSON value. On the go side, connect correctly handles the value as
well.
No more ugly error prone escaping:
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"'${platformId}'"}' $host holos.v1alpha1.PlatformService.GetConfig
{
  "sections": {
    "org": {
      "fields": {
        "contactEmail": "jeff@openinfrastructure.co",
        "displayName": "Open Infrastructure Services LLC",
        "domain": "ois.run",
        "name": "ois"
      }
    }
  }
}
```
This return value is intended to be directly usable in the CUE code, so
we may further nest the values into a platform.spec key.
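A small sketch of the Go side, assuming the GetConfig response is a
protobuf message carrying google.protobuf.Value fields; protojson
renders them as plain JSON that CUE can load directly as a data file:
```go
// Sketch: write a protobuf response as plain JSON for CUE to consume.
package render

import (
	"os"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
)

func writeConfigJSON(resp proto.Message, path string) error {
	data, err := protojson.Marshal(resp)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}
```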
This patch changes the backend to store the platform config form
definition and the config values supplied by the form as JSON in the
database.
The gRPC API does not change with this patch, but may need to depending
on how this works and how easy it is to evolve the data model and add
features.
This patch is a work in progress wiring up the form to put the values to
the holos server using grpc.
In an effort to simplify the platform configuration, the structure is a
two level map with the top level being configuration sections and the
second level being the fields associated with the config section.
To support multiple kinds of values and field controls, the values are
serialized to JSON for rpc over the network and for storage in the
database. When the values are used, either by the UI or by the `holos
render` command, they're to be unmarshalled and in-lined into the
Platform Config data structure.
Pick back up ensuring the Platform rpc handler correctly encodes and
decodes the structure to the database.
Consider changing the config_form and config_values fields to JSON field
types in the database. It will likely make working with this a lot
easier.
With this patch we're ready to wire up the holos render command to fetch
the platform configuration and create the end to end demo.
Here's essentially what the render command will fetch and lay down as a
json file for CUE:
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"018f2c4e-ecde-7bcb-8b89-27a99e6cc7a1"}' jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.GetPlatform | jq .platform.config.values
{
  "sections": {
    "org": {
      "values": {
        "contactEmail": "\"platform@openinfrastructure.co\"",
        "displayName": "\"Open Infrastructure Services LLC\"",
        "domain": "\"ois.run\"",
        "name": "\"ois\""
      }
    }
  }
}
```
This patch adds a /platform/:id route path to a PlatformDetail
component. The platform detail component calls the GetPlatform method
given the platform ID and renders the platform config form on the detail
tab.
The submit button is not yet wired up.
The API for adding platforms changes, allowing raw json bytes using the
RawConfig. The raw bytes are not presented on the read path though;
calling GetPlatforms provides the platform and the config form inline in
the response.
Use the `raw_config` field instead of `config` when creating the form
data.
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d @ jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.AddPlatform <<EOF
{
  "platform": {
    "org_id": "018f27cd-e5ac-7f98-bfe1-2dbab208a48c",
    "name": "bare2",
    "raw_config": {
      "form": "$(cue export ./forms/platform/ --out json | jq -cM | base64 -w0)"
    }
  }
}
EOF
```
This patch adds 4 fields to the Platform table:
1. Config Form represents the JSON FormlyFieldConfig for the UI.
2. Config CUE represents the CUE file containing a definition the
Config Values must unify with.
3. Config Definition is the CUE definition variable name used to unify
the values with the cue code. Should be #PlatformSpec in most
cases.
4. Config Values represents the JSON values provided by the UI.
The use case is the platform engineer defines the #PlatformSpec in cue,
and provides the form field config. The platform engineer then provides
1-3 above when adding or updating a Platform.
The UI then presents the form to the end user and provides values for 4
when the user submits the form.
This patch also refactors the AddPlatform method to accept a Platform
message. To do so we make the id field optional since it is server
assigned.
The patch also adds a database constraint to ensure platform names are
unique within the scope of an organization.
Results:
Note how the CUE representation of the Platform Form is exported to JSON
then converted to a base64 encoded string, which is the protobuf JSON
representation of a `bytes` value.
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d @ jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.AddPlatform <<EOF
{
  "platform": {
    "id": "0d3dc0c0-bbc8-41f8-8c6e-75f0476509d6",
    "org_id": "018f27cd-e5ac-7f98-bfe1-2dbab208a48c",
    "name": "bare",
    "config": {
      "form": "$(cd internal/platforms/bare && cue export ./forms/platform/ --out json | jq -cM | base64 -w0)"
    }
  }
}
EOF
```
Note the requested platform ID is ignored.
```
{
"platforms": [
{
"id": "018f2af9-f7ba-772a-9db6-f985ece8fed1",
"timestamps": {
"createdAt": "2024-04-29T17:49:36.058379Z",
"updatedAt": "2024-04-29T17:49:36.058379Z"
},
"name": "bare",
"creator": {
"id": "018f27cd-e591-7f98-a9d2-416167282d37"
},
"config": {
"form": "eyJhcGlWZXJzaW9uIjoiZm9ybXMuaG9sb3MucnVuL3YxYWxwaGExIiwia2luZCI6IlBsYXRmb3JtRm9ybSIsIm1ldGFkYXRhIjp7Im5hbWUiOiJiYXJlIn0sInNwZWMiOnsic2VjdGlvbnMiOlt7Im5hbWUiOiJvcmciLCJkaXNwbGF5TmFtZSI6Ik9yZ2FuaXphdGlvbiIsImRlc2NyaXB0aW9uIjoiT3JnYW5pemF0aW9uIGNvbmZpZyB2YWx1ZXMgYXJlIHVzZWQgdG8gZGVyaXZlIG1vcmUgc3BlY2lmaWMgY29uZmlndXJhdGlvbiB2YWx1ZXMgdGhyb3VnaG91dCB0aGUgcGxhdGZvcm0uIiwiZmllbGRDb25maWdzIjpbeyJrZXkiOiJuYW1lIiwidHlwZSI6ImlucHV0IiwicHJvcHMiOnsibGFiZWwiOiJOYW1lIiwicGxhY2Vob2xkZXIiOiJleGFtcGxlIiwiZGVzY3JpcHRpb24iOiJETlMgbGFiZWwsIGUuZy4gJ2V4YW1wbGUnIiwicmVxdWlyZWQiOnRydWV9fSx7ImtleSI6ImRvbWFpbiIsInR5cGUiOiJpbnB1dCIsInByb3BzIjp7ImxhYmVsIjoiRG9tYWluIiwicGxhY2Vob2xkZXIiOiJleGFtcGxlLmNvbSIsImRlc2NyaXB0aW9uIjoiRE5TIGRvbWFpbiwgZS5nLiAnZXhhbXBsZS5jb20nIiwicmVxdWlyZWQiOnRydWV9fSx7ImtleSI6ImRpc3BsYXlOYW1lIiwidHlwZSI6ImlucHV0IiwicHJvcHMiOnsibGFiZWwiOiJEaXNwbGF5IE5hbWUiLCJwbGFjZWhvbGRlciI6IkV4YW1wbGUgT3JnYW5pemF0aW9uIiwiZGVzY3JpcHRpb24iOiJEaXNwbGF5IG5hbWUsIGUuZy4gJ0V4YW1wbGUgT3JnYW5pemF0aW9uJyIsInJlcXVpcmVkIjp0cnVlfX0seyJrZXkiOiJjb250YWN0RW1haWwiLCJ0eXBlIjoiaW5wdXQiLCJwcm9wcyI6eyJsYWJlbCI6IkNvbnRhY3QgRW1haWwiLCJwbGFjZWhvbGRlciI6InBsYXRmb3JtLXRlYW1AZXhhbXBsZS5jb20iLCJkZXNjcmlwdGlvbiI6IlRlY2huaWNhbCBjb250YWN0IGVtYWlsIGFkZHJlc3MiLCJyZXF1aXJlZCI6dHJ1ZX19XX1dfX0K"
}
}
]
}
```
This patch adds a basic AddPlatform method that adds a platform with a
name and a display name.
Next steps are to add fields for the Platform Config Form definition and
the Platform Config values submitted from the form.
Next step: AddPlatform
Also consider extracting the queries to get the requested org_id to a
helper function. This will likely eventually move to an interceptor
because every request is org scoped and needs authorization checks
against the org.
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"org_id":"018f27cd-e5ac-7f98-bfe1-2dbab208a48c"}' jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.GetPlatforms
```
Problem:
Platform engineers need the ability to define custom input fields for
their own platform level configuration values. The holos web UI needs
to present the platform config values in a clean way. The values
entered on the form need to make their way into the top level
Platform.spec field for use across all components and clusters in the
platform.
Solution:
Define a Platform Form in a forms cue package. The output of this
definition is intended to be sent to the holos server to provide to the
web UI.
Result:
Platform engineers can define their platform config input values in
their infrastructure repository. For example, the bare platform form
inputs are defined at `platforms/bare/forms/platform/platform-form.cue`.
This cue file produces [FormlyFieldConfig][1] output.
```console
cue export ./forms/platform/ --out yaml
```
```yaml
apiVersion: forms.holos.run/v1alpha1
kind: PlatformForm
metadata:
  name: bare
spec:
  sections:
    - name: org
      displayName: Organization
      description: Organization config values are used to derive more specific configuration values throughout the platform.
      fieldConfigs:
        - key: name
          type: input
          props:
            label: Name
            placeholder: example
            description: DNS label, e.g. 'example'
            required: true
        - key: domain
          type: input
          props:
            label: Domain
            placeholder: example.com
            description: DNS domain, e.g. 'example.com'
            required: true
        - key: displayName
          type: input
          props:
            label: Display Name
            placeholder: Example Organization
            description: Display name, e.g. 'Example Organization'
            required: true
        - key: contactEmail
          type: input
          props:
            label: Contact Email
            placeholder: platform-team@example.com
            description: Technical contact email address
            required: true
```
Next Steps:
Add a holos subcommand to produce the output and store it in the
backend. Wire the front end to fetch the form config from the backend.
[1]: https://formly.dev/docs/api/core#formlyfieldconfig
This patch adds a bare platform that does nothing but render a configmap
containing the platform config structure itself.
The definition of the platform structure is firming up. The platform
designer, which may be a holos customer, is responsible for defining the
structure of the `platform.spec` output field.
We holos developers have a reserved namespace to add configuration
fields and data in the `platform.holos` output field.
Beyond these two fields, the platform config structure has TypeMeta and
ObjectMeta fields similar to a kubernetes api object to support
versioning the platform config data, naming the platform, annotating the
platform, and labeling the platform.
The path forward from here is to:
1. Eventually move the stable definitions into a CUE module that gets
imported into the user's package.
2. As a platform designer, add the organization field to the
#PlatformSpec definition as a CUE definition.
3. As a platform designer, add the organization field Form data
structure as a JSON file.
4. Add an API to upload the #PlatformSpec cue file and the
#PlatformSpec form json file to the saas backend.
5. Wire up Angular to pull the form json from the API and render the
form.
6. Wire up Angular to write the form data to a gRPC service method.
7. Wire up the `holos cli` to read the form data from a gRPC service
method.
8. Tie it all together where the holos cli renders the configmap.
This patch adds an organization "selector" that's really just a
placeholder. The active organization is the last element in the list
returned by the GetCallerOrganizations method for now.
The purpose is to make sure we have the structure in place for more than
one organization without needing to implement full support for the
feature at this early stage.
The Angular frontend is expected to call the activeOrg() method of the
OrganizationService. In the future this could store the state of which
organization the user has selected. The purpose is to return an org id
to send as a request parameter for other requests.
Note this patch also implements refresh behavior. The list of orgs is
fetched once on application load. If there is no user, or the user has
zero orgs, the user is created and an organization is added with them as
an owner. This is accomplished using observable pipes.
The pipe is tied to a refresh behavior. Clicking the org button
triggers the refresh behavior, which executes the pipe again and
notifies all subscribers.
This works quite well and should be idiomatic angular / rxjs. Clicking
the button automatically updates the UI after making the necessary API
calls.
This patch adds the OrganizationService to the Angular front end and
displays a simple list of the organizations the user is a member of in
the profile card.
There isn't a service yet to return the currently selected
organization, but that could be a simple method to return the most
recent entry in the list until we put something more complicated in
place like local storage of what the user has selected.
It may make sense to put a database constraint on the number of
organizations until we implement the feature later. It's too early to
do so now; I just want to make sure it's possible to add later.
Problem:
When loading the page the GetCallerClaims rpc method is called multiple
times unnecessarily.
Solution:
Use [shareReplay][1] to replay the last observable event for all
subscribers, including subscribers coming late to the party.
Result:
Network inspector in chrome indicates GetCallerClaims is called once and
only once.
[1]: https://rxjs.dev/api/operators/shareReplay
This patch adds a ProfileButton component which makes a ConnectRPC gRPC
call to the `holos.v1alpha1.UserService.GetCallerClaims` method and
renders the profile button based on the claims.
Note, in the network inspector there are two API calls to
`holos.v1alpha1.UserService.GetCallerClaims` which is unfortunate. A
follow up patch might be good to fix this.
Problem:
It's slow to build the angular app, compile it into the go executable,
copy it to the pod, then restart the server.
Solution:
Configure the mesh to route /ui to `ng serve` running on my local
host.
Result:
Navigating to https://jeff.app.dev.k2.holos.run/ui gets responses from
the ng development server.
Use:
ng serve --host 0.0.0.0
This patch simplifies the user and organization registration and query
for the UI. The pattern clients are expected to follow is to create if
the get fails. For example, the following pseudo-go-code is the
expected calling convention:
    var entity *ent.User
    entity, err := Get()
    if err != nil {
        if ent.MaskNotFound(err) == nil {
            entity = Create()
        } else {
            return err
        }
    }
    return entity
This patch adds the following service methods. For initial
registration, all input data comes from the id token claims of the
authenticated user.
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" jeff.app.dev.k2.holos.run:443 list | xargs -n1 grpcurl -H "x-oidc-id-token: $(holos token)" jeff.app.dev.k2.holos.run:443 list
holos.v1alpha1.OrganizationService.CreateCallerOrganization
holos.v1alpha1.OrganizationService.GetCallerOrganizations
holos.v1alpha1.UserService.CreateCallerUser
holos.v1alpha1.UserService.GetCallerClaims
holos.v1alpha1.UserService.GetCallerUser
```
The server will frequently look up the user record given the iss and sub
claims from the id token, so index them and make sure the combination
of the two is unique.
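A minimal ent schema sketch of that lookup path, assuming the User
entity stores the iss and sub claims as string fields; field names are
illustrative:
```go
// Sketch: unique composite index on the (iss, sub) claims for fast lookup.
package schema

import (
	"entgo.io/ent"
	"entgo.io/ent/schema/field"
	"entgo.io/ent/schema/index"
)

type User struct {
	ent.Schema
}

func (User) Fields() []ent.Field {
	return []ent.Field{
		field.String("iss"), // id token issuer claim
		field.String("sub"), // id token subject claim
		field.String("email"),
	}
}

func (User) Indexes() []ent.Index {
	return []ent.Index{
		index.Fields("iss", "sub").Unique(),
	}
}
```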
The `make-provisioner-jwt` incorrectly used the choria broker password
as the provisioning token. In the reference [setup.sh][1] both the
token and the `broker_provisioning_password` are set to `s3cret` so I
confused the two, but they are actually different values.
This patch ensures the provisioning token configured in
`provisioner.yaml` matches the token embedded into the provisioning.jwt
file using `choria jwt provisioning` via the `make-provisioner-jwt`
script.
[1]: 6dbc8fd105/example/setup/templates/provisioner/provisioner.yaml (L6)
Problem:
When the ingress default Gateway AuthorizationPolicy/authpolicy-custom
rule is in place the choria machine room holos controller fails to
connect to the provisioner broker with the following error:
```
❯ holos controller run --config=agent.cfg
WARN[0000] Starting controller version 0.68.1 with config file /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg leader=false
WARN[0000] Switching to provisioning configuration due to build defaults and missing /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg
WARN[0000] Setting anonymous TLS mode during provisioning component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 1: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 2: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0002] Initial connection to the Broker failed on try 3: invalid websocket connection component=server connection=coffee.home identity=coffee.home
```
This problem is caused because the provisioning token url is set to
`wss://jeff.provision.dev.k2.holos.run:443` which has the port number
specified.
Solution:
Follow the upstream istio guidance of [Writing Host Match Policies][1]
to match host headers with or without the port specified.
Result:
The controller is able to connect to the provisioner broker:
[1]: https://istio.io/latest/docs/ops/best-practices/security/#writing-host-match-policies
This patch fixes an error where the istio ingress gateway proxy failed
to verify the TLS certificate presented by the choria broker upstream
server.
kubectl logs choria-broker-0
level=error msg="websocket: TLS handshake error from 10.244.1.190:36142: remote error: tls: unknown certificate\n"
Istio ingress logs:
kubectl -n istio-ingress logs -l app=istio-ingressgateway -f | grep --line-buffered '^{' | jq .
"upstream_transport_failure_reason": "TLS_error:|268435581:SSL_routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end"
Client curl output:
curl https://jeff.provision.dev.k2.holos.run
upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end
Explanation of error:
Istio defaults to expecting a tls certificate matching the downstream
host/authority which isn't how we've configured Choria.
Refer to [ClientTLSSettings][1]
> A list of alternate names to verify the subject identity in the
> certificate. If specified, the proxy will verify that the server
> certificate’s subject alt name matches one of the specified values. If
> specified, this list overrides the value of subject_alt_names from the
> ServiceEntry. If unspecified, automatic validation of upstream presented
> certificate for new upstream connections will be done based on the
> downstream HTTP host/authority header, provided
> VERIFY_CERTIFICATE_AT_CLIENT and ENABLE_AUTO_SNI environmental variables
> are set to true.
[1]: https://istio.io/latest/docs/reference/config/networking/destination-rule/#ClientTLSSettings
This patch configures ArgoCD to log in via PKCE.
Note the changes are primarily in platform.site.cue and ensuring the
emailDomain is set properly. Note too the redirect URL needs to be
`/pkce/verify` when PKCE is enabled. Finally, if the setting is
reconfigured make sure to clear cookies otherwise the incorrect
`/auth/callback` path may be used.
Problem:
Port names in the default Gateway.spec.servers.port field must be unique
across all servers associated with the workload.
Solution:
Append the fully qualified domain name with dots replaced with hyphens.
Result:
Port name is unique.
Problem:
The default gateway in one cluster gets server entries for all hosts in
the platform. This makes the list unnecessarily large with entries for
clusters that should not be handled on the current cluster.
For example, the k2 cluster has gateway entries to route hosts for k1,
k3, k4, k5, etc...
Solution:
Add a field to the CertInfo definition representing which clusters the
host is valid on.
Result:
Hosts which are valid on all clusters, e.g. login.ois.run, have all
project clusters added to the clusters field of the CertInfo. Hosts
which are valid on a single cluster have the corresponding single entry
added.
When building resources, holos components should check if `#ClusterName`
is a valid field of the CertInfo.clusters field. If so, the host is
valid for the current cluster. If not, the host should be omitted from
the current cluster.
Doing so forces unnecessary hosts for some projects. For example,
iam.ois.run is useless for the iam project; the primary project host is
login, which builds login.ois.run.
Some projects may not need any hosts at all.
Better to let the user specify `project: foo: hosts: foo: _` if they
want it.
This patch loops over every Gateway.spec.servers entry in the default
gateway and manages an ExternalSecret to sync the credential from the
provisioner cluster.
Problem:
A Holos Component is created for each project stage, but all hosts for
all stages in the project are added. This creates duplicates.
Solution:
Sort project hosts by their stage and map the holos component for a
stage to the hosts for that stage.
Result:
Duplicates are eliminated, the prod certs are not in the dev holos
component and vice-versa.
This patch provisions wildcard certs in the provisioning cluster. The
CN matches the project stage host global hostname without any cluster
qualifiers.
The use of a wildcard in place of the environment name dns segment at
the leftmost position of the fully qualified dns name enables additional
environments to be configured without reissuing certificates.
This is to avoid the 100 name per cert limit in LetsEncrypt.
Mapping each project host fqdn to the stage is unnecessary. The list of
gateway servers is constructed from each FQDN in the project.
This patch removes the unnecessary struct mappings.
Problem:
It's difficult to map and reduce the collection of project hosts when
configuring related Certificate, Gateway.spec.servers, VirtualService,
and auth proxy cookie domain settings.
Solution:
Define #ProjectHosts which takes a project and provides Hosts which is a
struct with a fqdn key and a #CertInfo value. The #CertInfo definition
is intended to provide everything needed to reduce the Hosts property to
structs useful for the problematic resources mentioned previously.
Result:
Gateway.spec.servers are mapped using #ProjectHosts
Next step is to map the Certificate resources on the provisioner
cluster.
Problem:
Adding environments to a project causes certs to be re-issued.
Solution:
Enable wildcard certs for per-environment namespaces like jeff, gary,
nate, etc...
Result:
Environments can be added to a project stage without needing the cert to
be re-issued.
This patch avoids LetsEncrypt rate limits by consolidating multiple dns
names into one certificate.
For each project host, create a certificate for each stage in the
project. The certificate contains the dns names for all clusters and
environments associated with that stage and host.
This can become quite a list; the limit is 100 dnsNames.
For the Holos project which has 7 clusters and 4 dev environments, the
number of dns names is 32 (4 envs + 4 envs * 7 clusters = 32 dns names).
Still, a much needed improvement because we're limited to 50 certs per
week.
It may be worth considering wildcards for the per-developer
environments, which are the ones we'll likely spin up the most
frequently.
This patch is a partial step toward getting the choria broker up
and running in my own namespace. The choria broker is necessary for
provisioning machine room agents such as the holos controller.
This patch adds an initial holos controller subcommand. The machine
room agent starts, but doesn't yet provision because we haven't deployed
the provisioning infrastructure yet.
Configure NATS in a 3 Node deployment with resolver authentication using
an Operator JWT.
The operator secret nkeys are stored in the provisioner cluster. Get
them with:
holos get secret -n jeff-holos nats-nsc --print-key nsc.tgz | tar -tvzf-
This patch sets up basic routing and a 404 not found page. The Home and
Clusters page are generated from the [dashboard schematic][1]
ng generate @angular/material:dashboard home
ng generate @angular/material:dashboard cluster-list
ng g c error-not-found
[1]: https://material.angular.io/guide/schematics#dashboard-schematic
Instead of trying to hand-craft a navigation sidebar and toolbar from
Youtube videos, use the [navigation schematic][1] to quickly get a "good
enough" UI.
ng generate @angular/material:navigation nav
[1]: https://material.angular.io/guide/schematics#navigation-schematic
Angular must build output into a path compatible with the Go
http.FileServer. We cannot easily graft an fs.FS onto a sub-path, so we
need the `./ui/` path in the output. This requires special
configuration from the Angular default application builder behavior.
ng add @angular/material
```
❯ ng add @angular/material
Skipping installation: Package already installed
? Choose a prebuilt theme name, or "custom" for a custom theme: Indigo/Pink [ Preview: https://material.angular.io?theme=indigo-pink ]
? Set up global Angular Material typography styles? Yes
? Include the Angular animations module? Include and enable animations Yes
```
And add a logout command that deletes the token cache.
The token package is intended for subcommands that need to make API
calls to the holos api server; getting a token should be a simple matter
of calling the token.Get() method, which takes minimal dependencies.
This copies the login command from the previous holos cli. Wire
dependency injection and all the rest of the unnecessary stuff from
kubelogin are removed and streamlined down into a single function that
takes a few oidc related parameters.
This will need to be extracted out into an infrastructure service so
multiple other command line tools can easily re-use it and get the ID
token into the x-oidc-id-token header.
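A minimal sketch of how a connect-go client interceptor could place the
ID token in the `x-oidc-id-token` header; the token accessor signature
here is an assumption:
```go
// Sketch: add the cached ID token to every outgoing request header.
package client

import (
	"context"

	"connectrpc.com/connect"
)

func NewAuthInterceptor(getToken func(ctx context.Context) (string, error)) connect.UnaryInterceptorFunc {
	return func(next connect.UnaryFunc) connect.UnaryFunc {
		return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
			token, err := getToken(ctx)
			if err != nil {
				return nil, err
			}
			req.Header().Set("x-oidc-id-token", token)
			return next(ctx, req)
		}
	}
}
```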
The upstream nats charts don't specify namespaces on each resource.
This works with helm upgrade, but not helm template, which holos uses to
render the yaml.
The missing namespace causes flux to fail.
This patch uses the flux kustomization to add the target namespace to
all resources.
When rendering a holos component which contains more than one helm chart, rendering fails. It should succeed.
```
holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/... --log-level debug
```
```
9:03PM ERR could not execute version=0.64.2 err="could not rename: rename /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor553679311 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor: file exists" loc=helm.go:145
```
This patch fixes the problem by moving each child item of the temporary
directory charts are installed into. This avoids the problem of moving
the parent when the parent target already exists.
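A rough sketch of the fix, with illustrative names: rename each child
entry of the temporary directory into place rather than renaming the
directory itself:
```go
// Sketch: move each child of the temporary directory into the vendor dir,
// avoiding a rename of the parent onto a target that already exists.
package helm

import (
	"os"
	"path/filepath"
)

func moveChildren(tempDir, vendorDir string) error {
	if err := os.MkdirAll(vendorDir, 0o755); err != nil {
		return err
	}
	entries, err := os.ReadDir(tempDir)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		src := filepath.Join(tempDir, entry.Name())
		dst := filepath.Join(vendorDir, entry.Name())
		if err := os.Rename(src, dst); err != nil {
			return err
		}
	}
	return nil
}
```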
Add Tilt back from holos server
Note with this patch the ec-creds.yaml file needs to be applied to the
provisioner and an external secret used to sync the image pull creds.
With this patch the dev instance is accessible behind the auth proxy.
pgAdmin also works from the Tilt UI.
https://jeff.holos.dev.k2.ois.run/app/start
goreleaser fails with Failure: plugin connect-query: could not find protoc plugin for name connect-query - please make sure protoc-gen-connect-query is installed and present on your $PATH
Remove the server.Config struct, not needed. Remove the app struct and
move the configuration to the main holos.Config.ServerConfig.
Add flags specific to server configuration.
With this patch logging is simplified. Subcommands have a handle on the
top level holos.Config and can get a fully configured logger from
cfg.Logger() after flag parsing happens.
Disambiguate the term `core` which should mean the core domain. The app
is a supporting domain concerned with logging and configuration
initialization early in the life cycle.
This runbook documents how to write a full database backup to a blank S3
bucket given an existing postgrescluster resource with a live, running
database.
The pgo controller needs to remove and re-create the repo for the backup
to succeed, otherwise it complains about a missing file expected from a
previous backup.
Without this patch users encounter an error from istio because it does
not have a valid Jwks from ZITADEL to verify the request when processing
a `RequestAuthentication` policy.
Fixes error `AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt`.
Occurs when accessing a protected URL for the first time after tokens have expired.
Grafana does not yet have the istio sidecar. Prometheus is accessible
through the auth proxy. Cert manager is added to the workload clusters
so tls certs can be issued for webhooks; the kube-prom-stack helm chart
uses cert manager for this purpose.
With this patch Grafana is integrated with OIDC and I'm able to log in
as an Administrator.
Problem:
The VirtualService that catches auth routes for paths, e.g.
`/holos/authproxy/istio-ingress` is bound to the default gateway which
no longer exists because it has no hosts.
Solution:
It's unnecessary and complicated to create a Gateway for every project.
Instead, put all server entries into one `default` gateway and
consolidate the list using CUE.
Result:
It's easier to reason about this system. There is only one ingress
gateway, `default` and everything gets added to it. VirtualServices
need only bind to this gateway, which has a hosts entry appropriately
namespaced for the project.
Problem:
The ZITADEL database isn't restoring into the prod-iam namespace after
moving from prod-iam-zitadel because no backup exists at the bucket
path.
Solution:
Hard-code the path to the old namespace to restore the database. We'll
figure out how to move the backups to the new location in a follow up
change.
The `prod-platform-gateway` kustomization is reconciling early:
ExternalSecret/istio-ingress/argocd.ois.run dry-run failed: failed to
get API group resources: unable to retrieve the complete list of server
APIs: external-secrets.io/v1beta1: the server could not find the
requested resource
This patch moves ZITADEL from the prod-iam-zitadel namespace to the
projects managed prod-iam namespace, which is the prod environment of
the prod stage of the iam project.
Using the Helm chart so we can inject the istio sidecar with a kustomize
patch and tweak the configs for OIDC integration.
Login works and the istio sidecar is injected. Unfortunately, ArgoCD can
only be configured with one domain; it's not accessible at argocd.ois.run,
only argocd.k2.ois.run (or whatever cluster it's installed into).
Ideally it would use the Host header but it does not.
RBAC is not implemented but the User Info endpoint does have group
membership so this shouldn't be a problem to implement.
This patch defines a #AuthPolicyRules struct which excludes hosts from
the blanket auth policy and includes them in specialized auth policies.
The purpose is to handle special cases like vault requests which have an
`X-Vault-Token` and `X-Vault-Request` header.
Vault does not use jwts so we cannot verify them in the mesh, have to
pass them along to the backend.
Closes: #93
The ingress gateway auth proxy callback conflicts with the project stage
auth proxy callback for the same backend Host: header value.
This patch disambiguates them by the namespace the auth proxy resides
in.
This patch adds a `RequestAuthentication` and `AuthorizationPolicy` rule
to protect all requests flowing through the default ingress gateway.
Consider a browser request for httpbin.k2.example.com representing any
arbitrary host with a valid destination inside the service mesh. The
default ingress gateway will check if there is already an
x-oidc-id-token header, and if so validate the token is issued by
ZITADEL and the aud value contains the ZITADEL project number.
If the header is not present, the request is forwarded to oauth2-proxy
in the istio-ingress namespace. This auth proxy is configured to start
the oidc auth flow with a redirect back to /holos/oidc/callback of the
Host: value originally provided in the browser request.
Closes: #82
This patch adds an ingress gateway extauthz provider. Because ZITADEL
returns all applications associated with a ZITADEL project in the aud
claim, it makes sense to have one ingress auth proxy at the initial
ingress gateway so we can get the ID token in the request header for
backend namespaces to match using `RequestAuthentication` and
`AuthorizationPolicy`.
This change likely makes the additional per-stage auth proxies
unnecessary and over-engineered. Backend namespaces will have access to
the ID token.
It doesn't make sense to link the stage ext authz provider to the
ingress gateway because there can be only one provider per workload.
Link it instead to the backend environment and use the
`security.holos.run/authproxy` label to match the workload.
Problem:
Backend services and web apps expect to place their own credentials into
the Authorization header. oauth2-proxy writes over the authorization
header creating a conflict.
Solution:
Use the alpha configuration to place the id token into the
x-oidc-id-token header and configure the service mesh to authenticate
requests that have this header in place.
Note: ZITADEL does not use a JWT for an access token, unlike Keycloak
and Dex. The access token is not compatible with a
RequestAuthentication jwt rule so we must use the id token.
Without this patch the istio RequestAuthentication resources fail to
match because the access token from ZITADEL returned by oauth2-proxy in
the x-auth-request-access-token header is not a proper jwt.
The error is:
```
Jwt is not in the form of Header.Payload.Signature with two dots and 3 sections
```
This patch works around the problem by configuring oauth2-proxy to set
the ID token, which is guaranteed to be a proper JWT in the
authorization response headers.
Unfortunately, oauth2-proxy will only place the ID token in the
Authorization header response, which will write over any header set by a
client application. This is likely to cause problems with single page
apps.
We'll probably need to work around this issue by using the alpha
configuration to set the id token in some out-of-the-way header. We've
done this before, it'll just take some work to setup the ConfigMap and
translate the config again.
This patch configures an istio envoyExtAuthzHttp provider for each stage
in each project. An example provider for the dev stage of the holos
project is `authproxy-dev-holos`.
This patch configures the service mesh to route all requests with a uri
path prefix of `/holos/oidc` to the auth proxy associated with the
project stage.
Consider a request to https://jeff.holos.dev.k2.ois.run/holos/oidc/sign_in
This request is usually routed to the backend app, but
VirtualService/authproxy in the dev-holos-system namespace matches the
request and routes it to the auth proxy instead.
The auth proxy matches the request Host: header against the whitelist
and cookiedomain setting, which matches the suffix
`.holos.dev.k2.ois.run`. The auth proxy redirects to the oidc issuer
with a callback url of the request Host for a url of
`https://jeff.holos.dev.k2.ois.run/holos/oidc/callback`.
ZITADEL matches the callback against those registered with the app and
the app client id. A code is then sent back to the auth proxy.
The auth proxy sets a cookie named `__Secure-authproxy-dev-holos` with a
domain of `.holos.dev.k2.ois.run` from the suffix match of the
`--cookiedomain` flag.
Because this all works using paths, the `auth` prefix domains have been
removed. They're unnecessary, oauth2-proxy is available for any host
routed to the project stage at path prefix `/holos/oidc`.
Refer to https://oauth2-proxy.github.io/oauth2-proxy/features/endpoints/
for good endpoints for debugging, replacing `/oauth2` with `/holos/oidc`.
This patch deploys oauth2-proxy and redis to the system namespace of
each stage in each project. The plan is to redirect unauthenticated
requests to the request host at the /holos/oidc/callback endpoint.
This patch removes the --redirect-uri flag, which makes the auth domain
prefix moot, so a future patch should remove those if they really are
unnecessary.
The reason to remove the --redirect-uri flag is to make sure we set the
cookie to a domain suffix of the request Host: header.
This patch adds entries to the project stage Gateway for oauth2-proxy.
Three entries for each stage are added, one for the global endpoint plus
one for each cluster.
Without this patch the auth proxy cookie domain is difficult to manage.
This patch refactors the hosts managed for each environment in a project
to better align with security domains and auth proxy session cookies.
The convention is: `<env?>.<host>.<stage?>.<cluster?>.<domain>` where
`host` can be 0..N entries with a default value of `[projectName]`.
env may be omitted for prod or the dev env of the dev stage. stage may
be omitted for prod. cluster may be omitted for the global endpoint.
For a project named `holos`:
| Project | Stage | Env | Cluster | Host |
| ------- | ----- | --- | ------- | ------ |
| holos | dev | jeff | k2 | jeff.holos.dev.k2.ois.run |
| holos | dev | jeff | global | jeff.holos.dev.ois.run |
| holos | dev | - | k2 | holos.dev.k2.ois.run |
| holos | dev | - | global | holos.dev.ois.run |
| holos | prod | - | k2 | holos.k2.ois.run |
| holos | prod | - | global | holos.ois.run |
Auth proxy:
| Project | Stage | Auth Proxy Host | Auth Cookie Domain |
| ------- | ----- | ------ | ------------------ |
| holos | dev | auth.holos.dev.ois.run | holos.dev.ois.run |
| holos | dev | auth.holos.dev.k1.ois.run | holos.dev.k1.ois.run |
| holos | dev | auth.holos.dev.k2.ois.run | holos.dev.k2.ois.run |
| holos | prod | auth.holos.ois.run | holos.ois.run |
| holos | prod | auth.holos.k1.ois.run | holos.k1.ois.run |
| holos | prod | auth.holos.k2.ois.run | holos.k2.ois.run |
Prior to this, when running the 'install' or 'build' Makefile target,
the version of holos being built was not shown even though the 'build'
target attempted to show the version.
```
.PHONY: build
build: generate ## Build holos executable.
	@echo "building ${BIN_NAME} ${VERSION}"
```
For example:
```
> make install
go generate ./...
building holos
...
```
The holos version is stored in pkg/version/embedded/{major,minor,patch},
not the `Version` const. So the fix is to change the value of `VERSION`
so that it comes from those embedded files.
Now the version of holos is shown:
```
> make install
go generate ./...
building holos 0.61.1
...
```
This also adds a new Makefile target called `show-version` which shows
the full version string (i.e. the value of `$VERSION`).
The goal of this patch is to verify each project environment is wired up
to the ingress Gateway for the project stage.
This is a necessary step to eventually configure the VirtualService and
AuthorizationPolicy to only match on the `/dump/request` path of each
endpoint for troubleshooting.
This patch uses the existing #ManagedNamespaces definition to create and
manage namespaces on the provisioner and workload clusters so that
SecretStore and eso-creds-refresher resources are managed in the project
environment namespaces and the project stage system namespace.
Provisioner cluster:
This patch creates a Certificate resource in the provisioner for each
host associated with the project. By default, one host is created for
each stage with the short hostname set to the project name.
A namespace is also created for each project for eso creds refresher to
manage service accounts for SecretStore resources in the workload
clusters.
Workload cluster:
For each env, plus one system namespace per stage:
- Namespace per env
- SecretStore per env
- ExternalSecret per host in the env
Common names for the holos project, prod stage:
- holos.k1.ois.run
- holos.k2.ois.run
- holos.ois.run
Common names for the holos project, dev stage:
- holos.dev.k1.ois.run
- holos.dev.k2.ois.run
- holos.dev.ois.run
- holos.gary.k1.ois.run
- holos.gary.k2.ois.run
- holos.gary.ois.run
- holos.jeff.k1.ois.run
- holos.jeff.k2.ois.run
- holos.jeff.ois.run
- holos.nate.k1.ois.run
- holos.nate.k2.ois.run
- holos.nate.ois.run
Usage:
holos render --cluster-name=provisioner \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/provisioner/projects/...
holos render --cluster-name=k1 \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
holos render --cluster-name=k2 \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
This patch introduces a new BuildPlan spec.components.resources
collection, which is a map version of
spec.components.kubernetesObjectsList. The map version is much easier
to work with and produce in CUE than the list version.
The list version should be deprecated and removed prior to public
release.
The projects holos instance renders multiple holos components, each
containing kubernetes api objects defined directly in CUE.
<project>-system is intended for the ext auth proxy providers for all
stages.
<project>-namespaces is intended to create a namespace for each
environment in the project.
The intent is to expand the platform level definition of a project to
include the per-stage auth proxy and per-env role bindings. Secret
Store and ESO creds refresher resources will also be defined by the
platform level definition of a project.
This patch disallows unknown fields from CUE. The purpose is to fail
early if there is a typo in a nested field name and to speed up
refactoring the reference platform.
With this patch, refactoring the type definition of the Holos/CUE API is
a faster process:
1. Change api/vX/*.go
2. make gencue
3. Render the reference platform
4. Fix error with unknown fields
5. Verify rendered output is the same as before
Closes: #72
This patch establishes the BuildPlan struct as the single API contract
between CUE and Holos. A BuildPlan spec contains a list of each of the
support holos component types.
The purpose of this data structure is to support the use case of one CUE
instance generating 1 build plan that contains 0..N of each type of
holos component.
The need for multiple components per one CUE instance is to support the
generation of a collection of N~4 flux kustomization resources per
project and P~6 projects built from one CUE instance.
Tested with:
holos render --cluster-name=k2 ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/init/namespaces/...
Common labels are removed because they're too tightly coupled to the
model of one component per one cue instance.
This patch refactors the go structs used to decode cue output for
processing by the holos cli. For context, the purpose of the structs
are to inform holos how the data from cue should be modeled and
processed as a rendering pipeline that provides rendered yaml to
configure kubernetes api objects.
The structs share common fields in the form of the HolosComponent
embedded struct. The three main holos component kinds today are:
1. KubernetesObjects - CUE outputs a nested map where each value is a
single rendered api object (resource).
2. HelmChart - CUE outputs the chart name and values. Holos calls helm
template to render the chart. Additional api objects may be
overlaid into the rendered output. Kustomize may also optionally be
called at the end of the render pipeline.
3. KustomizeBuild - CUE outputs data to construct a kustomize
kustomization build. The holos component contains raw yaml files to
use as kustomization resources. CUE optionally defines additional
patches, common labels, etc.
With the Go structs, cue may directly import the definitions to more
easily keep the CUE definitions in sync with what the holos cli expects
to receive.
The holos component types may be imported into cue using:
cue get go github.com/holos-run/holos/api/v1alpha1/...
Without this patch ks/prod-iam-zitadel often gets blocked waiting for
jobs that will never complete. In addition, flux should not manage the
zitadel-test-connection Pod, which is an unnecessary artifact of the
upstream helm chart.
We'd disable helm hooks, but they're necessary to create the init and
setup jobs.
This patch also changes the default behavior of Kustomizations from
wait: true to wait: false. Waiting is expensive for the api server and
slows down the reconciliation process considerably.
Component authors should use ks.spec.healthChecks to target specific
important resources to watch and wait for.
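Expressed as CUE data (resource names illustrative), the new default looks roughly like this, with waiting scoped to the resources that actually matter:

```cue
ks: {
	apiVersion: "kustomize.toolkit.fluxcd.io/v1"
	kind:       "Kustomization"
	metadata: name: "prod-iam-zitadel"
	spec: {
		// Do not wait on every object; it is expensive for the api server.
		wait: false
		// Wait only on the resources the component author calls out.
		healthChecks: [{
			apiVersion: "apps/v1"
			kind:       "Deployment"
			name:       "zitadel"
			namespace:  "prod-iam-zitadel"
		}]
	}
}
```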
This patch fixes the problem of the actions runner scale set listener
pod failing every 3 seconds. See
https://github.com/actions/actions-runner-controller/issues/3351
The solution is not ideal: if the primary cluster is down, workflows will
not execute. The primary cluster shouldn't go down, though, so this is
the trade-off. We accept lower availability when the primary cluster is
unavailable in exchange for less log spam and resource usage from the
failing pods on the other clusters.
We could let the pods loop and if the primary is unavailable another
would quickly pick up the role, but it doesn't seem worth it.
The effect of this patch is limited to refreshing credentials only for
namespaces that exist in the local cluster. There is structure in place
in the CUE code to allow for namespaces bound to specific clusters, but
this is used only by the optional Vault component.
This patch was an attempt to work around
https://github.com/actions/actions-runner-controller/issues/3351 by
deploying the runner scale sets into unique namespaces.
This effort was a waste of time; only one listener pod successfully
registered for a given scale set name and group combination.
Because we have only one group, named Default, we can have only one
listener pod globally for a given scale set name.
Because we want our workflows to execute regardless of the availability
of a single cluster, we're going to let this fail for now. The pod
retries every 3 seconds. When a cluster is destroyed, another cluster
will quickly register.
A follow up patch will look to expand this retry behavior.
This patch migrates the vault component from [holos-infra][1] to a
CUE-based component. Vault is optional in the reference platform, so this
patch also defines an `#OptionalServices` struct to conditionally manage
a service across multiple clusters in the platform.
The primary use case for optional services is managing a namespace to
provision and provide secrets across clusters.
[1]: https://github.com/holos-run/holos-infra/tree/v0.5.0/components/core/core/vault
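A minimal sketch of the shape such a struct can take; the field and cluster names here are illustrative, not the definition added by the patch:

```cue
#OptionalServices: [Name=string]: {
	name:    Name
	enabled: bool | *false
	// Clusters that should manage this optional service.
	clusters: [...string]
}

OptionalServices: #OptionalServices & {
	vault: {
		enabled: true
		clusters: ["core1", "core2"]
	}
}
```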
Pods are unnecessarily created when deploying helm based holos
components and often fail. Prevent these test pods by disabling helm
hooks with the `--no-hooks` flag.
Closes: #54
Problem:
The standby cluster on k2 fails to start. A pgbackrest pod first
restores the database from S3, then the pgha nodes try to replay the WAL
as part of the standby initialization process. This fails because the
PGDATA directory is not empty.
Solution:
Specify the spec.dataSource field only when the cluster is configured as
a primary cluster.
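A minimal sketch of the conditional, assuming a `primary` flag of our own; the PostgresCluster field names follow the Crunchy PGO API, and the repo name is illustrative:

```cue
primary: bool | *false

postgres: {
	apiVersion: "postgres-operator.crunchydata.com/v1beta1"
	kind:       "PostgresCluster"
	metadata: name: "zitadel"
	spec: {
		if primary {
			// Only the primary bootstraps by restoring from the S3 repo.
			dataSource: pgbackrest: {
				stanza: "db"
				repo: name: "repo2"
			}
		}
		if !primary {
			// Standby clusters replay the WAL via patroni instead.
			standby: enabled: true
		}
	}
}
```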
Result:
Non-primary clusters are standbys; they skip the pgbackrest job that
restores from S3 and move straight to patroni replaying the WAL from S3
in the pgha pods.
One of the two pgha pods becomes the "standby leader" and restores the
WAL from S3. The other is a cascading standby and then restores the
same WAL from the standby leader.
After 8 minutes both pods are ready.
```
❯ k get pods
NAME                               READY   STATUS    RESTARTS   AGE
zitadel-pgbouncer-d9f8cffc-j469g   2/2     Running   0          11m
zitadel-pgbouncer-d9f8cffc-xq29g   2/2     Running   0          11m
zitadel-pgha1-27w7-0               4/4     Running   0          11m
zitadel-pgha1-c5qj-0               4/4     Running   0          11m
zitadel-repo-host-0                2/2     Running   0          11m
```
Problem:
The k3 and k4 clusters are getting the Zitadel components which are
really only intended for the core cluster pair.
Solution:
Split the workload subtree into two, named foundation and accounts. The
core cluster pair gets foundation+accounts while the kX clusters get
just the foundation subtree.
Result:
prod-zitadel-iam is no longer managed on k3 and k4
Set the restore point to the timestamp of the `time="2024-03-11T17:08:58Z"
level=info msg="crunchy-pgbackrest ends"` log entry, which is just after
Gary and Nate registered and were granted the cluster-admin role.
The [Streaming Standby][standby] architecture requires custom tls certs
for two clusters in two regions to connect to each other.
This patch manages the custom certs following the configuration
described in the article [Using Cert Manager to Deploy TLS for Postgres
on Kubernetes][article].
NOTE: One thing not mentioned anywhere in the crunchy documentation is
how custom tls certs work with pgbouncer. The pgbouncer service uses a
tls certificate issued by the pgo root cert, not by the custom
certificate authority.
For this reason, we use kustomize to patch the zitadel Deployment and
the zitadel-init and zitadel-setup Jobs. The patch projects the ca
bundle from the `zitadel-pgbouncer` secret into the zitadel pods at
/pgbouncer/ca.crt
[standby]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[article]: https://www.crunchydata.com/blog/using-cert-manager-to-deploy-tls-for-postgres-on-kubernetes
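A sketch of what such a patch can look like as CUE data for the kustomization; only the `zitadel-pgbouncer` secret name and the `/pgbouncer/ca.crt` destination come from this note, while the secret key, volume name, and container index are assumptions:

```cue
patches: [{
	target: {kind: "Deployment", name: "zitadel"}
	// JSON6902 patch projecting the pgbouncer CA bundle into the pod.
	patch: """
		- op: add
		  path: /spec/template/spec/volumes/-
		  value:
		    name: pgbouncer-ca
		    secret:
		      secretName: zitadel-pgbouncer
		      items:
		        - key: ca.crt
		          path: ca.crt
		- op: add
		  path: /spec/template/spec/containers/0/volumeMounts/-
		  value:
		    name: pgbouncer-ca
		    mountPath: /pgbouncer
		"""
}]
```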
A full backup was taken using:
```
kubectl annotate postgrescluster zitadel postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
```
And completed with:
```
❯ k logs -f zitadel-backup-5r6v-v5jnm
time="2024-03-10T21:52:15Z" level=info msg="crunchy-pgbackrest starts"
time="2024-03-10T21:52:15Z" level=info msg="debug flag set to false"
time="2024-03-10T21:52:15Z" level=info msg="backrest backup command requested"
time="2024-03-10T21:52:15Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --type=full]"
time="2024-03-10T21:55:18Z" level=info msg="crunchy-pgbackrest ends"
```
This patch verifies the point in time backup is robust in the face of
the following operations:
1. pg cluster zitadel was deleted (whole namespace emptied)
2. pg cluster zitadel was re-created _without_ a `dataSource`
3. pgo initialized a new database and backed up the blank database to
S3.
4. pg cluster zitadel was deleted again.
5. pg cluster zitadel was re-created with `dataSource` `options: ["--type=time", "--target=\"2024-03-10 21:56:00+00\""]` (Just after the full backup completed)
6. Restore completed successfully.
7. Applied the holos zitadel component.
8. Zitadel came up successfully and user login worked as expected.
- [x] Perform an in place [restore][restore] from [s3][bucket].
- [x] Set repo1-retention-full to clear warning
[restore]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#restore-properties
[bucket]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#cloud-based-data-source
could not run: holos.spec.artifacts.0.transformers.0.kustomize.kustomization.patches.0.target.name: cannot convert non-concrete value string at builder/v1alpha5/builder.go:218
holos.spec.artifacts.0.transformers.0.kustomize.kustomization.patches.0.target.name: cannot convert non-concrete value string:
// Label is an arbitrary unique identifier internal to holos itself. The holos
// cli is expected to never write a Label value to rendered output files,
// therefore when using a [Label] the identifier must be unique and internal.
// Defined as a type for clarity and type checking.
//
// A Label is useful to convert a CUE struct to a list, for example producing
// a list of [APIObject] resources from an [APIObjectMap]. A CUE struct using
// Label keys is guaranteed to not lose data when rendering output because a
// Label is expected to never be written to the final output.
type Label string
// Kind is a kubernetes api object kind. Defined as a type for clarity and type checking.
type Kind string
// APIObject represents the most basic generic form of a single kubernetes api
// object. Represented as a JSON object internally for compatibility between
// tools, for example loading from CUE.
type APIObject structpb.Struct
// APIObjectMap represents the marshalled yaml representation of kubernetes api
// objects. Do not produce an APIObjectMap directly, instead use [APIObjects]
// to produce the marshalled yaml representation from CUE data, then provide the
// result to [HolosComponent].
type APIObjectMap map[Kind]map[Label]string
// APIObjects represents Kubernetes API objects defined directly from CUE code.
// Useful to mix in resources to any kind of [HolosComponent], for example
// adding an ExternalSecret resource to a [HelmChart].
//
// [Kind] must be the resource kind, e.g. Deployment or Service.
//
// [Label] is an arbitrary internal identifier to uniquely identify the resource
// within the context of a `holos` command. Holos will never write the
// intermediate label to rendered output.
//
// Refer to [HolosComponent] which accepts an [APIObjectMap] field provided by
stderr 'Error: execution error at \(zitadel/templates/secret_zitadel-masterkey.yaml:2:4\): Either set .Values.zitadel.masterkey xor .Values.zitadel.masterkeySecretName'
  # Reference the name of a secret that contains ZITADEL configuration.
  configSecretName:
  # The key under which the ZITADEL configuration is located in the secret.
  configSecretKey: config-yaml
  # ZITADEL uses the masterkey for symmetric encryption.
  # You can generate it for example with tr -dc A-Za-z0-9 </dev/urandom | head -c 32
  masterkey: ""
  # Reference the name of the secret that contains the masterkey. The key should be named "masterkey".
  # Note: Either zitadel.masterkey or zitadel.masterkeySecretName must be set
  masterkeySecretName: ""
  # Annotations set on masterkey secret
  masterkeyAnnotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
    helm.sh/hook-weight: "0"
  # The CA Certificate needed for establishing secure database connections
  dbSslCaCrt: ""
  # The Secret containing the CA certificate at key ca.crt needed for establishing secure database connections
  dbSslCaCrtSecret: ""
  # The db admins secret containing the client certificate and key at tls.crt and tls.key needed for establishing secure database connections
  dbSslAdminCrtSecret: ""
  # The db users secret containing the client certificate and key at tls.crt and tls.key needed for establishing secure database connections
  dbSslUserCrtSecret: ""
  # Generate a self-signed certificate using an init container
  # This will also mount the generated files to /etc/tls/ so that you can reference them in the pod.
  # E.G. KeyPath: /etc/tls/tls.key CertPath: /etc/tls/tls.crt
  # By default, the SAN DNS names include, localhost, the POD IP address and the POD name. You may include one more by using additionalDnsName like "my.zitadel.fqdn".
  selfSignedCert:
    enabled: false
    additionalDnsName:
replicaCount: 3
image:
  repository: ghcr.io/zitadel/zitadel
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""
chownImage:
  repository: alpine
  pullPolicy: IfNotPresent
  tag: "3.19"
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
# Annotations to add to the deployment
annotations: {}
# Annotations to add to the configMap
configMap:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
    helm.sh/hook-weight: "0"
serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
    helm.sh/hook-weight: "0"
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""
podAnnotations: {}
podAdditionalLabels: {}
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
securityContext: {}
# Additional environment variables
env:
  []
  # - name: ZITADEL_DATABASE_POSTGRES_HOST
  #   valueFrom:
  #     secretKeyRef:
  #       name: postgres-pguser-postgres
  #       key: host
service:
  type: ClusterIP
  # If service type is "ClusterIP", this can optionally be set to a fixed IP address.
  clusterIP: ""
  port: 8080
  protocol: http2
  annotations: {}
  scheme: HTTP
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: localhost
      paths:
        - path: /
          pathType: Prefix
  tls: []
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []
initJob:
  # Once ZITADEL is installed, the initJob can be disabled.
Holos is a tool intended to lighten the burden of managing Kubernetes resources. In 2020 we set out to develop a holistic platform composed from open source cloud native components. We quickly became frustrated with how each of the major components packaged and distributed their software in a different way. Many projects choose to distribute their software with Helm charts, while others provide plain yaml files and Kustomize bases. The popular Kube Prometheus Stack project provides Jsonnet to render and update Kubernetes yaml manifests.
In this guide, we'll explore how Holos supports the frontend development team at [Bank of Holos] in reconfiguring an already deployed service. Along the way, we'll demonstrate how simple configuration changes are made safer with type checking, and how rendering the complete platform provides clear visibility into those changes.
This guide builds on the concepts covered in the [Quickstart] and [Deploy a Service] guides.
## What you'll need {#requirements}
Like our other guides, this guide is intended to be useful without needing to
run each command. If you'd like to apply the manifests to a real Cluster,
complete the [Local Cluster Guide](/docs/guides/local-cluster) before this
guide.
You'll need the following tools installed to run the commands in this guide.
1. [holos](/docs/install) - to build the Platform.
2. [helm](https://helm.sh/docs/intro/install/) - to render Holos Components that
wrap Helm charts.
3. [kubectl](https://kubernetes.io/docs/tasks/tools/) - to render Holos
Components that render with Kustomize.
## Fork the Git Repository
If you haven't already done so, [fork the Bank of
Holos](https://github.com/holos-run/bank-of-holos/fork) then clone the
could not run: could not marshal json projects/platform/components/istio/cni: cue: marshal error: _Organization.DisplayName: 2 errors in empty disjunction: (and 2 more errors) at internal/builder/builder.go:63
_Organization.DisplayName: _Organization.DisplayName: 2 errors in empty disjunction: (and 2 more errors)
could not run: could not marshal json projects/platform/components/argocd/crds: cue: marshal error: _Organization.DisplayName: 2 errors in empty disjunction: (and 2 more errors) at internal/builder/builder.go:63
_Organization.DisplayName: _Organization.DisplayName: 2 errors in empty disjunction: (and 2 more errors)
could not run: could not render component: exit status 1 at builder/v1alpha4/builder.go:95
```
</TabItem>
</Tabs>
:::warning Whoops
The development team defined a value that isn't allowed by the
configuration.
:::
Someone else in the organization placed a [constraint] on the
configuration to ensure the display name contains only letters, numbers, and
spaces. This constraint is expressed as a [regular expression].
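In CUE, a constraint like this is a one-liner; the exact pattern below is illustrative, not necessarily the one used in the tutorial repository:

```cue
#Organization: {
	// The display name may contain only letters, numbers, and spaces.
	DisplayName: =~"^[A-Za-z0-9 ]+$"
}
```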
:::tip
CUE provides clear visibility into where to start looking to resolve
conflicts. Each file and line number listed is a place where the
`#Organization.DisplayName` field is defined.
:::
Let's try again, this time replacing the hyphen with a space.
description: Self service platform resource management for project teams.
slug: /archive/guides/2024-09-17-manage-a-project
sidebar_position: 250
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
# Manage a Project
In this guide we'll explore how Holos easily, safely, and consistently manages
platform resources for teams to develop the projects they're working on.
Intended Audience: Platform Engineers and Software Engineers.
Goal is to demonstrate how the platform team can consistently, easily, and
safely provide platform resources to software engineers.
Assumption is software engineers have a container they want to deploy onto the
platform and make accessible. We'll use httpbin as a stand-in for the dev
team's container.
Project is roughly equivalent to Dev Team for the purpose of this guide, but in
practice multiple teams work on a given project over the lifetime of the
project, so we structure the files into projects instead of teams.
## What you'll need {#requirements}
You'll need the following tools installed to complete this guide.
1. [holos](/docs/install) - to build the Platform.
2. [helm](https://helm.sh/docs/intro/install/) - to render Helm Components.
3. [kubectl](https://kubernetes.io/docs/tasks/tools/) - to render Kustomize Components.
If you'd like to apply the manifests we render in this guide, complete the
following optional but recommended steps.
a. Complete the [Local Cluster] guide to set up a local cluster to work with.
b. You'll need a GitHub account to fork the repository associated with this
guide.
## Fork the Guide Repository
<Tabs groupId="fork">
<TabItem value="command" label="Command">
```bash
```
</TabItem>
<TabItem value="output" label="Output">
```txt showLineNumbers
```
</TabItem>
</Tabs>
This guide assumes you will run commands from the root directory of this
repository unless stated otherwise.
[Quickstart]: /docs/quickstart
[Local Cluster]: /docs/guides/local-cluster
## Render the Platform
So we can build the basic platform. Don't dwell on the platform bits.
## Apply the Manifests
Deploy ArgoCD, but not any of the Application resources.
## Browse to ArgoCD
Note there is nothing here yet.
## Switch to your Fork
Note all of the Applications change consistently.
## Apply the Applications
Note how ArgoCD takes over management; we no longer need to `k apply`.
## Create a Project
Project is a conceptual, not technical, thing in Holos. Mainly about how components are laid out in the filesystem tree.
We use a schematic built into holos as an example; the platform team could use the same schematic or provide a similar template and instructions for development teams to self-serve.
## Render the Platform
Notice:
1. Project is registered with the platform at the root.
2. HTTPRoute and Namespace resources are added close to the root in `projects`
3. Deployment and Service resources are added at the leaf in `projects/httpbin/backend`
## Update the image tag
Add a basic schematic to demonstrate this. May need to add two new flags for image url and image tag to the generate subcommand, but should just be two new fields on the struct.
## Dive Deeper
Set the stage for constraints. Ideas: Limit what resources can be added,
namespaces can be operated in, enforce labels, etc...