## What this PR does
When an NFS-backed RWX volume is published to multiple VMs, the
`CiliumNetworkPolicy` `endpointSelector.matchLabels` only included the
first VM. Subsequent `ControllerPublishVolume` calls added
`ownerReferences` but never broadened the selector, causing Cilium to
block NFS egress — mounts hang on all nodes except the first.
This PR switches from `matchLabels` to `matchExpressions` (`operator:
In`) so the selector can list multiple VM names, and rebuilds it
whenever ownerReferences are added or removed.
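A minimal sketch of the selector change (the VM-name label key shown here is illustrative, not necessarily the exact key the driver uses):
```yaml
# before: only the first VM is selected
endpointSelector:
  matchLabels:
    vm.kubevirt.io/name: vm-a
---
# after: every VM the volume is published to is selected
endpointSelector:
  matchExpressions:
    - key: vm.kubevirt.io/name
      operator: In
      values: ["vm-a", "vm-b"]
```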
### Release note
```release-note
[kubernetes] Fixed CiliumNetworkPolicy endpointSelector not being updated when NFS-backed RWX volumes are published to multiple VMs, which caused NFS mounts to hang on all nodes except the first.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* KubeVirt CSI driver now supports selecting and targeting multiple
virtual machines for volume publishing.
* **Improvements**
* Network policy targets are rebuilt automatically when VM ownership
references change, improving correctness and lifecycle handling in
multi-VM scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Adds `inlineScrapeConfig` support to the tenant `VMAgent` resource in
`packages/system/monitoring/templates/vm/vmagent.yaml`
- Adds commented usage example in both
`packages/system/monitoring/values.yaml` and
`packages/extra/monitoring/values.yaml`
## Problem
The tenant VMAgent resource does not support custom scrape
configurations. Using `additionalScrapeConfigs` (which references a
Kubernetes Secret) is not viable because tenant users have no access to
create or read Secrets — they can only manage resources under
`apps.cozystack.io`. Instead, `inlineScrapeConfig` accepts inline YAML
directly through Helm values, which is consistent with tenant
permissions.
Relates to #2194
## Usage
```yaml
vmagent:
inlineScrapeConfig: |
- job_name: "custom"
static_configs:
- targets: ["my-service:9090"]
```
## Test plan
- [ ] `helm template` monitoring with `inlineScrapeConfig` set — block
rendered
- [ ] `helm template` monitoring without it — no `inlineScrapeConfig` in
output
- [ ] Deploy and verify custom scrape targets are picked up by vmagent
```release-note
Add inlineScrapeConfig support to tenant vmagent
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Support for including an inline scrape configuration into VMAgent
monitoring when provided.
* **Documentation**
* Added a commented example showing how to supply an inline scrape job
and targets.
* Reordered remote-write URL entries in monitoring configuration for
clearer ordering.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds `authentication.oidc.keycloakInternalUrl` platform value that
allows oauth2-proxy
in the dashboard to route backend calls (token exchange, JWKS, userinfo,
logout) through
an internal cluster URL while keeping browser redirects on the external
Keycloak URL.
When set, oauth2-proxy uses `--skip-oidc-discovery` and explicit
endpoint URLs pointing
to the internal Keycloak service. This avoids external DNS lookups and
TLS overhead for
pod-to-pod communication.
Fully backward-compatible: when the value is empty (default), behavior
is unchanged.
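A hypothetical values snippet showing the new setting (the internal service hostname is an example, not the actual default):
```yaml
authentication:
  oidc:
    # in-cluster Keycloak URL used by oauth2-proxy for backend calls;
    # browser redirects keep using the external Keycloak URL
    keycloakInternalUrl: "http://keycloak-http.cozy-keycloak.svc:8080"
```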
### Release note
```release-note
[dashboard] Added `authentication.oidc.keycloakInternalUrl` platform value to route oauth2-proxy backend requests through internal Keycloak service URL, bypassing external DNS and TLS.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added configuration support for an internal Keycloak URL, enabling
backend authentication requests to be routed through an alternative
endpoint while maintaining existing external URLs for browser
interactions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
When an NFS-backed RWX volume is published to multiple VMs, the
CiliumNetworkPolicy egress rule only allowed traffic from the first VM.
The endpointSelector.matchLabels was set once on creation and never
broadened, causing NFS mounts to hang on all nodes except the first.
Switch from matchLabels to matchExpressions (operator: In) so the
selector can list multiple VM names. Rebuild the selector whenever
ownerReferences are added or removed.
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
## What this PR does
### Release note
```release-note
[backups] Fixed backupstrategy-controller RBAC roles (leader election and event recording) and updated the controller's namespace configuration
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated backup controller namespace configuration.
* Enhanced backup controller permissions for leader election and event
recording capabilities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Remove merged RWX validation patch (`001-rwx-validation.diff`)
- Add new patch (`001-relocate-after-clone-restore.diff`) that includes:
- Randomized node selection for snapshot restore
- `linstor.csi.linbit.com/relocateAfterClone` StorageClass parameter to
relocate replicas to optimal nodes after clone
- `snap.linstor.csi.linbit.com/relocate-after-restore`
VolumeSnapshotClass parameter to relocate replicas to optimal nodes
after snapshot restore
- Both parameters are **disabled by default**
- Placing the snapshot restore parameter in VolumeSnapshotClass prevents
unwanted relocation when Velero creates temporary PVCs during data mover
backup
Upstream PRs:
- https://github.com/piraeusdatastore/linstor-csi/pull/418
- https://github.com/piraeusdatastore/linstor-csi/pull/419
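Roughly how the opt-in parameters described above could be set once enabled (resource names and other settings are placeholders; only the parameter keys come from this PR):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-example                 # placeholder name
provisioner: linstor.csi.linbit.com
parameters:
  # relocate replicas to optimal nodes after clone (disabled by default)
  linstor.csi.linbit.com/relocateAfterClone: "true"
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: linstor-snapshots-example       # placeholder name
driver: linstor.csi.linbit.com
deletionPolicy: Delete
parameters:
  # relocate replicas to optimal nodes after snapshot restore (disabled by default)
  snap.linstor.csi.linbit.com/relocate-after-restore: "true"
```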
## Test plan
- [x] Clone a PVC and verify relocation logic executes
- [x] Restore a PVC from snapshot and verify replicas get migrated to
optimal nodes
- [x] Verified on dev5 cluster (3-node) — snapshot restore triggered
actual migration (node0 → node2)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Automatic resource relocation after clone and restore operations
optimizes storage placement and load balancing across nodes.
* RWX block attachment validation with optional disable flag ensures
proper multi-pod access control.
* **Chores**
* Updated CDI clone strategy to use CSI-clone for improved efficiency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR adds the changelog for release `v1.1.2`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.1.2.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed S3 Manager endpoint alignment with BucketInfo secrets
* Resolved spurious OpenAPI post-processing errors on startup
* **Documentation**
* Added troubleshooting guidance for DependenciesNotReady
* Enhanced installation documentation with Ansible guide
* Added CA rotation operations documentation
* Improved backup and recovery guidance
* Expanded metrics and architecture references
* Reorganized operator-first installation guidance
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
When set, oauth2-proxy skips OIDC discovery and routes all backend calls
(token exchange, JWKS, userinfo, logout) through the internal cluster URL
while keeping browser redirects on the external URL. This avoids external
DNS lookups and TLS overhead for pod-to-pod communication with Keycloak.
Assisted-By: Claude AI
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
This PR adds the changelog for release `v1.0.5`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.5.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Resolved spurious error messages in OpenAPI post-processing for
certain configurations.
* **Documentation**
* Enhanced troubleshooting guides and installation instructions.
* Expanded operational procedures for backups and CA rotation.
* Added custom metrics collection guidance and architecture
documentation.
* Completed comprehensive v1 documentation refresh.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
The OpenAPI `PostProcessSpec` callback is invoked for every registered
group-version (apps, core, version, etc.), but the Application schema
cloning logic only applies to `apps.cozystack.io`. When called for other
GVs the base Application schemas are absent, producing a spurious error
on every API server start:
```
ERROR klog Failed to build OpenAPI v3 for group version, "base Application* schemas not found"
```
This PR changes the post-processor (both v2 and v3) to return early
when the base schemas are not found, instead of returning an error.
### Release note
```release-note
[platform] Fix spurious "base Application* schemas not found" error logged on cozystack-api startup
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Improved error handling for missing OpenAPI schema components. The
system now gracefully continues processing instead of halting when
certain base schemas are unavailable.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Fixes s3manager UI deployment to use the actual S3 endpoint from
BucketInfo (COSI) instead of constructing it from the tenant namespace
host.
The deployment was using `s3.<tenant>.<cluster-domain>` while
credentials issued by COSI point to the root-level S3 endpoint. This
mismatch caused "invalid credentials" errors on login even with correct
credentials from the bucket secret.
Falls back to the constructed namespace host on first deploy before
BucketAccess secrets exist.
### Release note
```release-note
[bucket] Fix s3manager endpoint mismatch causing "invalid credentials" errors in login mode
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Deployment configuration now supports per-user endpoint customization.
Endpoints are dynamically retrieved from account-specific settings,
enabling flexible configurations while maintaining backward
compatibility for standard deployments without custom settings.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This PR adds version-specific Ubuntu base images to fix errors that occurred when the base image shipped newer kubeadm and kubelet deb packages, which were then downgraded at runtime by replacing only the binaries. Updating by replacing binaries now works as intended: the latest patch release of the selected minor version is used.
The core issue was that kubeadm < 1.32 expects the conntrack binary in its preflight checks, but it was not found. This happened because the kubelet deb package dropped its conntrack dependency in 1.32 (it is also absent in 1.31.14).
The status of supported tenant Kubernetes versions is now:
- 1.30 - works because the kubelet package provides conntrack; the conntrack preflight check is also ignored (see 1.31).
- 1.31 - works because the conntrack preflight check is ignored (for some reason kubelet 1.31.14 does not declare the conntrack dependency, unlike 1.31.13).
- \>=1.32 - works because the conntrack preflight check was removed from `kubeadm init` entirely.
Ignoring the conntrack preflight check is legitimate for tenant Kubernetes clusters: before 1.32 conntrack was only needed by kube-proxy, and the Cozystack approach does not use kube-proxy (it is replaced with Cilium).
The conntrack issue could be mitigated with `ignorePreflightErrors` alone, but building proper base images should help avoid similar bugs in the future.
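For reference, the `ignorePreflightErrors`-only mitigation would look roughly like this in a kubeadm configuration (a sketch; the check name is the standard kubeadm identifier for the conntrack check):
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  ignorePreflightErrors:
    - FileExisting-conntrack   # skip the conntrack binary preflight check
```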
### Release note
```release-note
[kubernetes] Fixed creation of tenant Kubernetes clusters older than 1.32 by adding version-specific Ubuntu base images
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added multi-version Kubernetes support with version-specific container
images.
* Enhanced compatibility with newer Kubernetes releases, including
version 1.31.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
When dependencies are not ready the reconciler returned without
requeueing, relying solely on watch events to re-trigger. If a watch
event was missed (controller restart, race condition, dependency already
ready before watch setup), the package would stay stuck in
DependenciesNotReady forever.
Add RequeueAfter: 30s so dependencies are periodically rechecked.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The OpenAPI PostProcessSpec callback is invoked for every group-version
(apps, core, version, etc.), but the Application schema cloning logic
only applies to apps.cozystack.io. When called for other GVs the base
Application schemas are absent, causing a spurious error log on every
API server start.
Return early instead of erroring when the base schemas are not found.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The deployment template was constructing the S3 endpoint from the tenant's
namespace host (e.g. s3.freedom.infra.example.com), while COSI credentials
are issued for the actual SeaweedFS endpoint (e.g. s3.infra.example.com).
This mismatch caused 'invalid credentials' errors when users tried to log
in with valid credentials from the bucket secret.
Now the endpoint is resolved from BucketInfo (same source as credentials),
with a fallback to the constructed namespace host for first-time deploys
before BucketAccess secrets are created.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Hardcode relocateAfterClone=true and relocateAfterRestore=false as
defaults in the CSI driver patch instead of exposing them via
StorageClass/VolumeSnapshotClass parameters.
Remove the extra linstor-snapshots-ephemeral VolumeSnapshotClass (Velero)
and the relocateAfterRestore parameter from the default VolumeSnapshotClass.
This is a temporary measure while upstream linstor-server is deciding on
the interface for the rebalance feature (see LINBIT/linstor-server#487).
Once upstream provides native rebalance support, these hardcoded defaults
will be replaced by the proper upstream mechanism.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Adds the cozystack-scheduler as an optional system package, vendored
from https://github.com/cozystack/cozystack-scheduler. The scheduler
extends the default kube-scheduler with SchedulingClass-aware affinity
plugins, allowing platform operators to define cluster-wide scheduling
constraints via a SchedulingClass CRD. Pods opt in via the
`scheduler.cozystack.io/scheduling-class` annotation.
The package includes:
- Helm chart with RBAC, ConfigMap, Deployment, and CRD
- PackageSource definition for the cozystack package system
- Optional inclusion in the platform system bundle
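A minimal sketch of how a workload opts in (the SchedulingClass name is hypothetical; see the vendored chart for the CRD itself):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
  annotations:
    # references a SchedulingClass defined by the platform operator
    scheduler.cozystack.io/scheduling-class: default-affinity   # hypothetical class name
spec:
  containers:
    - name: app
      image: nginx
```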
### Release note
```release-note
[cozystack-scheduler] Add cozystack-scheduler as an optional system
package. The custom scheduler supports SchedulingClass CRDs for
cluster-wide node affinity, pod affinity, and topology spread constraints.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added cozystack-scheduler component as an optional system package.
* Introduced SchedulingClass custom resource for advanced scheduling
configurations.
* Scheduler supports node affinity, pod affinity, pod anti-affinity, and
topology spread constraints.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Updates the Keycloak Operator Helm chart to v1.32.0 and builds a custom
operator image from upstream master
(https://github.com/epam/edp-keycloak-operator/commit/facbc36)
with the group rename detection patch from PR
https://github.com/epam/edp-keycloak-operator/pull/309 applied on top.
Bumps Keycloak server from 26.0.4 to 26.5.2, which is required by the
new operator client (sends `description` field rejected by older
versions).
Adds SSO session settings (idleTimeout: 86400, maxLifespan: 604800) to
the ClusterKeycloakRealm to match the dashboard client's session
attributes,
as Keycloak 26 enforces realm-level session limits strictly.
Removes `authorizationServicesEnabled` from the dashboard
KeycloakClient,
which is incompatible with Keycloak 26's stricter validation.
### Release note
```release-note
[keycloak-operator] Update the operator to v1.32.0 with group rename fix
(https://github.com/epam/edp-keycloak-operator/pull/309). Bump Keycloak to 26.5.2.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Optional webhook support with cert-manager integration and webhook
service; feature toggles for webhooks and owner refs
* New realm login and session settings, organizations resource/CRD, and
expanded client/user schemas (including ClientRolesV2)
* **Documentation**
* Chart bumped to 1.32.0; README documents new values (clusterDomain,
podLabels, image.registry, securityContext, containerSecurityContext)
* **Security / RBAC**
* RBAC updated to cover organization resources and webhook bindings;
consolidated operator permissions
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Add external-dns as a standalone self-managed application in
`packages/extra/external-dns/`, allowing tenants to deploy and configure
their own DNS management directly from the dashboard.
## Motivation
Tenants need the ability to manage their own DNS domains with their own
provider. Following the [developers
guide](https://github.com/cozystack/website/pull/413), this is
implemented as an extra package (like `ingress` and `seaweedfs`) using
the HelmRelease-based pattern, rather than embedding it in the tenant
chart.
This enables multi-tenant scenarios where:
- Tenant A uses Cloudflare for `domain-a.com`
- Tenant B uses AWS Route53 for `domain-b.com`
- Each tenant deploys and manages external-dns independently from the
dashboard
## Changes
- **New package**: `packages/extra/external-dns/` — standalone
HelmRelease-based application
- **New PackageSource**:
`packages/core/platform/sources/external-dns-application.yaml` —
references `system/external-dns` and `extra/external-dns` components
- **Cleaned tenant chart**: removed the previously embedded
`externalDns` block from `packages/apps/tenant/`
## Features
- Support for 9 DNS providers: cloudflare, aws, azure, google,
digitalocean, linode, ovh, exoscale, godaddy
- Per-provider credential configuration with full JSON schema validation
- Domain filtering via `domainFilters`
- Configurable sync policy (`sync` or `upsert-only`)
- Namespaced operation (`namespaced: true`) for tenant isolation
- Unique `txtOwnerId` per namespace to prevent DNS record conflicts
- Resource sizing via presets or explicit CPU/memory
## Usage Example
Deploy from the dashboard, or via values:
```yaml
# Cloudflare
provider: cloudflare
domainFilters:
- example.com
cloudflare:
apiToken: "your-cloudflare-api-token"
```
```yaml
# AWS Route53
provider: aws
domainFilters:
- example.org
aws:
accessKeyId: "AKIAXXXXXXXX"
secretAccessKey: "your-secret-key"
region: "us-east-1"
```
## Test plan
- [ ] `helm template external-dns packages/extra/external-dns/ --set
provider=cloudflare --set cloudflare.apiToken=test` renders correctly
- [ ] `helm template external-dns packages/extra/external-dns/` fails
(provider required)
- [ ] `helm template wrong-name packages/extra/external-dns/ --set
provider=cloudflare` fails (release name check)
- [ ] Deploy external-dns from tenant dashboard
- [ ] Verify HelmRelease is created in tenant namespace with namespaced
RBAC
- [ ] Create an Ingress and verify DNS record is created
- [ ] Verify no conflict with global external-dns instance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added an External DNS package for automatic DNS record management.
* Support for 9 DNS providers: Cloudflare, AWS, Azure, Google,
DigitalOcean, Linode, OVH, Exoscale, GoDaddy.
* Helm-based deployment with namespaced/system variants and release
configuration options.
* Configurable synchronization policies, domain filtering, provider
credentials, extra args, and resource presets.
* **Documentation**
* New README and schema-driven values documentation for installation and
configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Updates the Keycloak Operator Helm chart to v1.32.0 and builds a custom
operator image from upstream master (epam/edp-keycloak-operator@facbc36)
with the group rename detection patch from PR
epam/edp-keycloak-operator#309 applied on top.
Bumps Keycloak server from 26.0.4 to 26.5.2, which is required by the
new operator client (sends `description` field rejected by older versions).
Adds SSO session settings (idleTimeout: 86400, maxLifespan: 604800) to
the ClusterKeycloakRealm to match the dashboard client's session attributes,
as Keycloak 26 enforces realm-level session limits strictly.
Removes `authorizationServicesEnabled` from the dashboard KeycloakClient,
which is incompatible with Keycloak 26's stricter validation.
### Release note
```release-note
[keycloak-operator] Update the operator to v1.32.0 with group rename fix
(epam/edp-keycloak-operator#309). Bump Keycloak to 26.5.2.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
## What this PR does
Migrates VictoriaLogs from the deprecated single-node `VLogs` CR to
`VLCluster` (cluster mode) with vlinsert/vlselect/vlstorage components
for reliability and horizontal scalability.
**Operator upgrade:**
- Upgrades victoria-metrics-operator from v0.55.0 to v0.68.1 to add
VLCluster CRD support
**VLCluster deployment:**
- Replaces `VLogs` (v1beta1) with `VLCluster` (v1) — 2 replicas per
component, consistent with VMCluster
- Adds VPA for all VLCluster components (vlinsert, vlselect, vlstorage)
- Updates WorkloadMonitors for the three-component architecture
**Endpoint updates:**
- Fluent-bit outputs: `vlogs-generic:9428` → `vlinsert-generic:9481`
- Grafana datasource: `vlogs-{name}:9428` → `vlselect-{name}:9471`
- ExternalName service: `vlogs-generic` → `vlinsert-generic`
**Migration (35 → 36):**
- Adds `helm.sh/resource-policy: keep` annotation to existing VLogs
resources so they are preserved during upgrade
- Users need to verify the new VLCluster is working, then optionally
migrate historical data and manually delete old VLogs resources
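For illustration, a VLCluster with two replicas per component might look roughly like this (field names are assumed to follow the VMCluster conventions and may differ from the actual chart):
```yaml
apiVersion: operator.victoriametrics.com/v1
kind: VLCluster
metadata:
  name: generic
spec:
  vlinsert:
    replicaCount: 2
  vlselect:
    replicaCount: 2
  vlstorage:
    replicaCount: 2
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi   # placeholder size
```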
### Release note
```release-note
[monitoring] Migrate VictoriaLogs from single-node VLogs to VLCluster (cluster mode). Old VLogs resources are preserved with `helm.sh/resource-policy: keep` annotation. After upgrade, verify the new cluster is working, then optionally migrate historical data and delete old VLogs resources manually.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* CRD upgrade workflow with configurable hooks, Job, and ServiceAccount
support
* Monitoring storage split into vlinsert / vlselect / vlstorage with
corresponding VPAs
* Service traffic distribution option and optional shareProcessNamespace
toggle
* **Updates**
* VictoriaMetrics Operator bumped to v0.68.1
* Fluent Bit and Grafana endpoints/ports updated to new monitoring
targets
* Global extra-labels support for resources
* Migration target advanced to version 36
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
The victoria-metrics-operator v0.68.1 renamed VMCluster status field
from .status.clusterStatus to .status.updateStatus.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
This PR adds the changelog for release `v1.1.1`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.1.1.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed MarketplacePanel visibility and disabled state management in
dashboard sidebar
* Fixed External IP address display in dashboard resource tables
* Fixed MAC address preservation during virtual machine migrations
* Resolved deprecated component image issue
* Improved migration handling for missing component dependencies
* Fixed Keycloak health monitoring and application stability
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR adds the changelog for release `v1.0.4`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.4.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Updated v1.0.4 changelog with system, platform, and dashboard
improvements
* Added OIDC self-signed certificates configuration guide
* Documented fixes for health probes, virtual machine migration, and
dashboard features
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds the cozystack-scheduler as an optional system package, vendored from
https://github.com/cozystack/cozystack-scheduler. The scheduler extends
the default kube-scheduler with SchedulingClass-aware affinity plugins,
allowing platform operators to define cluster-wide scheduling constraints
via a SchedulingClass CRD. Pods opt in via the
`scheduler.cozystack.io/scheduling-class` annotation.
The package includes:
- Helm chart with RBAC, ConfigMap, Deployment, and CRD
- PackageSource definition for the cozystack package system
- Optional inclusion in the platform system bundle
### Release note
```release-note
[cozystack-scheduler] Add cozystack-scheduler as an optional system
package. The custom scheduler supports SchedulingClass CRDs for
cluster-wide node affinity, pod affinity, and topology spread constraints.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
## Summary
- Splits `dashboards.list` into tenant-facing dashboards and
infrastructure dashboards (`dashboards-infra.list`)
- Infrastructure dashboards (VictoriaMetrics, Flux, Hubble, LINSTOR,
control-plane, etc.) are only rendered for `tenant-root`
- Tenant-facing dashboards (ingress, db, kafka, nats, clickhouse, vm)
remain available to all tenants
## Problem
All tenants currently receive infrastructure dashboards
(VictoriaMetrics, Hubble, LINSTOR, Flux, control-plane, etc.) that are
only relevant to platform operators.
Relates to #2194
## Test plan
- [ ] `helm template` monitoring in `tenant-root` namespace — both lists
rendered
- [ ] `helm template` monitoring in a child tenant namespace — only
`dashboards.list` rendered
- [ ] Verify no dashboard names collide between the two lists
```release-note
Scope infrastructure dashboards to tenant-root only
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Infrastructure dashboards exposed via the monitoring system for
tenant-root deployments.
* Added new ingress dashboards including vhosts and vhost-detail views.
* **Chores**
* Removed a large set of legacy dashboards to streamline the monitoring
surface.
* Reorganized dashboard generation to separate infra-specific dashboards
from standard sets.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Adds a `CiliumClusterwideNetworkPolicy` allowing egress from tenant
pods to `virt-handler` in `cozy-kubevirt` namespace on port 8443/TCP
- Conditional on `.Values.monitoring` being enabled
## Problem
Tenant vmagent cannot scrape KubeVirt VM metrics from `virt-handler`
because no network policy allows the traffic.
Relates to #2194
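A rough sketch of the kind of policy this adds (only the `cozy-kubevirt` namespace, port 8443/TCP, and the vmagent label restriction come from this change; the rest is illustrative):
```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: tenant-vmagent-to-virt-handler   # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: vmagent    # restrict to vmagent pods only
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: cozy-kubevirt
      toPorts:
        - ports:
            - port: "8443"
              protocol: TCP
```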
## Test plan
- [ ] `helm template` tenant with `monitoring: true` — virt-handler
policy present
- [ ] `helm template` tenant with `monitoring: false` — virt-handler
policy absent
- [ ] Deploy and verify vmagent can scrape kubevirt_vmi_* metrics
```release-note
Allow tenant egress to virt-handler for VM metrics scraping
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Added a network egress policy that, when monitoring is enabled, allows
tenant namespaces to reach the virt-handler service on TCP port 8443,
improving connectivity for monitoring-related traffic.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
After migrating VictoriaLogs from VLogs to VLCluster, the e2e test
still waited for the old vlogs/generic resource which no longer exists.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## Summary
- Sidebar menu was showing all resources regardless of their
MarketplacePanel `hidden` state
- Fetch MarketplacePanels during sidebar reconciliation and skip
resources where `hidden=true`
- Hiding a resource from the marketplace now also removes it from the
sidebar navigation
## Test plan
- [ ] Set `hidden: true` on a MarketplacePanel (e.g. qdrant)
- [ ] Trigger controller reconciliation
- [ ] Verify the resource is removed from the sidebar menu
- [ ] Set `hidden: false` and verify the resource reappears in the
sidebar
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Sidebar can now hide resources based on MarketplacePanel configuration
parsed from panel definitions.
* Hidden resources are filtered early when assembling sidebar
categories, preventing them from contributing to menu items.
* Listing failures are non-fatal: if configuration fetch fails, no
hiding is applied and the dashboard remains functional.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
The `pull-requests.yaml` workflow used `paths-ignore` at the trigger
level to skip runs for docs-only changes. This prevented the entire
workflow from triggering, so the required "E2E Tests" check was never
created — blocking merge for non-admin users.
This PR replaces trigger-level `paths-ignore` with a `detect-changes`
job using `dorny/paths-filter@v3`. The workflow now always triggers (so
all checks are reported to GitHub), but `build` and downstream jobs are
skipped when only `docs/` files change.
| PR type | build | resolve_assets | e2e |
| --- | --- | --- | --- |
| Code PR | runs | skipped | runs |
| Release PR | skipped (label) | runs | runs |
| Docs-only PR | skipped | skipped | skipped |
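A condensed sketch of the new gating (job names, the filter expression, and the build step are illustrative; the real workflow also factors in the release label):
```yaml
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      code: ${{ steps.filter.outputs.code }}
    steps:
      - uses: actions/checkout@v4
      - id: filter
        uses: dorny/paths-filter@v3
        with:
          filters: |
            code:
              - '!docs/**'   # anything outside docs/ counts as a code change (pattern form is an assumption)

  build:
    needs: detect-changes
    if: needs.detect-changes.outputs.code == 'true'   # skipped on docs-only PRs
    runs-on: ubuntu-latest
    steps:
      - run: make build   # placeholder
```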
### Release note
```release-note
[ci] Fix required E2E check blocking merge of docs-only pull requests
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* CI pipeline optimized to skip builds when only documentation changes
occur.
* Added a checks step that detects whether code changed and gates the
build accordingly.
* Build now runs only if code changes are present and the PR is not
marked as a release, reducing unnecessary build runs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This PR updates the Cilium system package to version 1.19.1.
### Release note
```release-note
[cilium] Update cilium system package to version 1.19.1
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Upgraded to version 1.19.1 with enhanced security and observability
capabilities
* Added standalone DNS proxy support for improved DNS handling
* Enhanced multi-cluster service mesh support with automatic CoreDNS
configuration
* Expanded cloud provider integrations with improved node resource
management
* Added ztunnel encryption support
* **Improvements**
* Enhanced TLS certificate management and auto-generation
* Extended observability and profiling options
* Improved endpoint and service handling with updated resource
management
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Fix "Disabling features from menu and marketplace is not working" by
preserving user-set `disabled` and `hidden` values during controller
reconciliation
- The controller was hardcoding `disabled=false` and `hidden=false` on
every reconcile loop, overwriting any changes made through the dashboard
UI
## Test plan
- [ ] Disable a service from the dashboard marketplace panel
- [ ] Verify the service stays disabled after controller reconciliation
- [ ] Hide a service from the dashboard menu
- [ ] Verify the service stays hidden after controller reconciliation
- [ ] Create a new ApplicationDefinition and verify its MarketplacePanel
defaults to disabled=false, hidden=false
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed an issue where user-configured "disabled" and "hidden" settings
in the marketplace panel could be reset during updates. These
preferences are now preserved when the panel is created or updated, and
the system avoids applying unnecessary configuration changes when values
haven't actually changed.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Add CiliumClusterwideNetworkPolicy for vmagent egress to virt-handler
- Restrict endpointSelector to vmagent pods only via app.kubernetes.io/name label
Signed-off-by: Mattia Eleuteri <mattia.eleuteri@hidora.io>
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
## Summary
- Fix External IPs page showing empty rows in the dashboard by
correcting EnrichedTable properties in the `external-ips` factory
- Replace `clusterNamePartOfUrl` with `cluster` and change `pathToItems`
from array format to dot-path string to match convention used by all
other EnrichedTable instances
## Test plan
- [ ] Open Administration → External IPs in dashboard for a tenant with
LoadBalancer services
- [ ] Verify table columns (Name, ClusterIP, LoadbalancerIP, Created)
are rendered
- [ ] Verify service data is displayed correctly in the rows
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed configuration handling for the external-ips dashboard tab to
ensure cluster names display correctly and service items are
consistently listed, improving stability and data presentation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
During the virtual-machine → vm-instance migration (script 29), VM MAC
addresses
are not preserved. Kube-OVN reads MAC exclusively from the pod
annotation
`ovn.kubernetes.io/mac_address`, not from the IP resource
`spec.macAddress`.
Without the annotation, migrated VMs get a new random MAC, breaking
OS-level
network config that matches by MAC (e.g. netplan).
This adds a Helm `lookup` for the Kube-OVN IP resource in the
vm-instance chart
template. When the resource exists, its `macAddress` and `ipAddress` are
automatically injected as pod annotations. This approach is reliable
across
HelmRelease reconciliations — unlike postRenderers, the annotations
cannot be
accidentally lost.
Fixes#2166
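A simplified sketch of the lookup logic (the IP resource naming convention here is a guess; the annotation keys are the standard Kube-OVN ones and the `lookup` approach is what this PR describes):
```yaml
{{- /* hypothetical IP resource name: <vm-name>.<namespace> */}}
{{- $ip := lookup "kubeovn.io/v1" "IP" "" (printf "%s.%s" .Release.Name .Release.Namespace) }}
{{- if $ip }}
annotations:
  ovn.kubernetes.io/mac_address: {{ $ip.spec.macAddress }}
  ovn.kubernetes.io/ip_address: {{ $ip.spec.ipAddress }}
{{- end }}
```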
### Release note
```release-note
[platform] Fix VM MAC address not preserved during virtual-machine → vm-instance migration, causing network loss on VMs with MAC-based netplan config
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* VM templates now automatically populate network annotations (MAC and
IP) from kubeovn IP records when available. This streamlines VM network
setup on deployment, reduces manual annotation steps, and lowers risk of
misconfiguration by ensuring VMs receive the correct address and MAC
information from associated network records.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Enable relocate-after-restore on the default linstor-snapshots class
so that PVCs restored from snapshots get replicas relocated to optimal
nodes.
Add a separate linstor-snapshots-velero class with the Velero
annotation and without relocation, so Velero's temporary data mover
PVCs are not relocated unnecessarily.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Use CSI clone instead of host-assisted copy for CDI DataVolume
cloning. This leverages LINSTOR's native rd-clone mechanism which
is significantly faster than pod-based data copying, and works
together with the new relocateAfterClone parameter.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Move snapshot restore relocation parameter from StorageClass to
VolumeSnapshotClass to avoid unwanted relocation during Velero
data mover backup. Change relocateAfterClone default to false.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Remove merged RWX validation patch and add new patch that includes:
- Randomized node selection for snapshot restore
- Relocate replicas to optimal nodes after clone and snapshot restore
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Kube-OVN reads MAC address exclusively from the pod annotation
ovn.kubernetes.io/mac_address, not from the IP resource spec.macAddress.
Without pod-level annotations, migrated VMs receive a new random MAC,
breaking OS-level network config that matches by MAC (e.g. netplan).
Add a Helm lookup for the Kube-OVN IP resource in the vm-instance chart
template. When the IP resource exists, its macAddress and ipAddress are
automatically injected as pod annotations. This removes the need for
fragile Flux postRenderers on the HelmRelease — the chart itself handles
MAC/IP preservation based on actual cluster state.
Remove the postRenderers approach from migration 29 since the chart now
handles this natively.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
The pull-requests workflow used paths-ignore at the trigger level, which
prevented the entire workflow from running on docs-only PRs. This meant
the required "E2E Tests" check was never created, blocking merge for
non-admin users.
Replace trigger-level paths-ignore with a detect-changes job using
dorny/paths-filter. The workflow now always triggers (so checks are
always reported), but build and downstream jobs are skipped when only
docs files change.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
The external-ips factory used incorrect EnrichedTable properties causing
empty rows in the dashboard. Replace `clusterNamePartOfUrl` with
`cluster` and change `pathToItems` from array to dot-path string format
to match the convention used by all other working EnrichedTable instances.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
The controller was hardcoding disabled=false and hidden=false on every
reconciliation, overwriting any user changes made through the dashboard
UI. Move spec building inside the CreateOrUpdate mutate function to read
and preserve current disabled/hidden values from the existing resource.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
The sidebar was generated independently from MarketplacePanels, always
showing all resources regardless of their hidden state. Fetch
MarketplacePanels during sidebar reconciliation and skip resources
where hidden=true, so hiding a resource from the marketplace also
removes it from the sidebar navigation.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## Summary
- Replace deprecated `gcr.io/kubebuilder/kube-rbac-proxy:v0.16.0` with
`quay.io/brancz/kube-rbac-proxy:v0.18.1` in the vendored etcd-operator
chart
- The GCR-hosted image became unavailable after March 18, 2025
Fixes #2172 #488
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated proxy component to v0.18.1 with configuration changes for
improved stability and compatibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
The gcr.io/kubebuilder/kube-rbac-proxy image is no longer available
since GCR was deprecated. Replace it with quay.io/brancz/kube-rbac-proxy
from the original upstream author.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## Summary
- Migration 34 fails with `error: the server doesn't have a resource
type "rabbitmqs"` when `rabbitmqs.apps.cozystack.io` CRD does not exist
on the cluster
- This happens when RabbitMQ was never installed — the CRD is not
present, `kubectl get` fails, and `set -euo pipefail` terminates the
migration job
- Added a CRD existence check before listing resources; if the CRD is
absent, the migration stamps the version and exits cleanly
## Test plan
- [ ] Deploy a cluster without RabbitMQ installed and run migration 34 —
should skip gracefully
- [ ] Deploy a cluster with RabbitMQ instances without `spec.version`
set — should patch them to `v3.13`
- [ ] Deploy a cluster with RabbitMQ instances already having
`spec.version` — should skip patching
## Summary
- Fix Keycloak crashloop caused by misconfigured liveness/readiness
probes
- Add `KC_HEALTH_ENABLED=true` to activate health endpoints on
management port
- Switch probes from application port 8080 (`/`, `/realms/master`) to
management port 9000 (`/health/live`, `/health/ready`)
## Problem
Keycloak 26.x redirects all HTTP requests on port 8080 to the configured
`KC_HOSTNAME` (HTTPS). Since kubelet does not follow redirects, probes
fail with:
```
Probe terminated redirects, Response body:
```
After consecutive failures, kubelet kills the container → restart →
crashloop.
Additionally, `KC_HEALTH_ENABLED` was not set, so the dedicated health
endpoints on the management port (9000) returned 404 even though the
management interface was active (via `KC_METRICS_ENABLED=true`).
## Changes
- `packages/system/keycloak/templates/sts.yaml`:
- Add `KC_HEALTH_ENABLED=true` env var to activate `/health/live` and
`/health/ready`
- Expose management port 9000 in container ports
- Liveness probe: `GET /health/live` on port 9000 (was `GET /` on 8080)
- Readiness probe: `GET /health/ready` on port 9000 (was `GET
/realms/master` on 8080)
- Increase failure thresholds for better startup tolerance
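The resulting probe configuration looks roughly like this (threshold values are illustrative; see the template for the exact numbers):
```yaml
env:
  - name: KC_HEALTH_ENABLED
    value: "true"
ports:
  - name: management
    containerPort: 9000
livenessProbe:
  httpGet:
    path: /health/live
    port: 9000
  failureThreshold: 6      # illustrative
readinessProbe:
  httpGet:
    path: /health/ready
    port: 9000
  failureThreshold: 6      # illustrative
```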
## Test plan
- [x] Verified `/health/live` returns `{"status":"UP"}` (HTTP 200) on
port 9000
- [x] Verified `/health/ready` returns
`{"status":"UP","checks":[{"name":"Keycloak database connections async
health check","status":"UP"}]}` (HTTP 200)
- [x] Confirmed 0 restarts after 10+ minutes
- [x] Confirmed no more `ProbeWarning` or `Killing` events
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## What this PR does
### Release note
```release-note
[docs] Fixed docs for managed apps
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Documentation
* Updated FoundationDB README title to "Managed FoundationDB Service"
* Improved Harbor README text formatting for consistency
* Corrected spelling and terminology errors in MariaDB README
* Enhanced MariaDB recovery documentation with additional command
example
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Replace deprecated single-node VLogs CR with VLCluster (cluster mode)
for reliability and horizontal scalability.
Changes:
- Replace VLogs (v1beta1) with VLCluster (v1) using vlinsert/vlselect/vlstorage
- Update fluent-bit outputs to vlinsert-generic:9481
- Update Grafana datasource to vlselect:9471
- Update ExternalName service from vlogs-generic to vlinsert-generic
- Add VPA for all VLCluster components
- Update WorkloadMonitors for three-component architecture
- Add migration 35 to preserve old VLogs resources with keep annotation
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Upgrade from v0.55.0 to v0.68.1 to add VLCluster CRD support,
which is required for migrating VictoriaLogs from single-node
to cluster mode.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Migration 34 fails when rabbitmqs.apps.cozystack.io CRD does not exist,
which happens when RabbitMQ was never installed on the cluster. Add a
check for CRD presence before attempting to list resources.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
This PR adds the changelog for release `v1.1.0`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.1.0.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Published v1.1.0 changelog documenting major features: managed secrets
service, tiered storage pools, per-user bucket credentials with S3 UI
updates, RabbitMQ version selection, and monitoring dashboards
* Included breaking changes and upgrade notes for v1.1.0
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR adds the changelog for release `v1.0.3`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.3.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed migration script to correctly apply configuration prefixes
during v0.41 to v1.0 upgrade.
* **Documentation**
* Added white labeling guide covering branding customization and SVG
handling.
* Updated backup and recovery documentation with improved operator and
tenant workflow guidance and administration resources.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Use a startupProbe to defer liveness/readiness checks until Keycloak
has fully started, instead of relying on initialDelaySeconds. This is
more robust for applications with variable startup times.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
Keycloak 26.x exposes dedicated health endpoints on the management
port (9000) via /health/live and /health/ready. The previous probes
used GET / on port 8080 which redirects to the configured KC_HOSTNAME
(HTTPS), causing kubelet to fail the probe with "Probe terminated
redirects" and eventually kill the pod in a crashloop.
Changes:
- Add KC_HEALTH_ENABLED=true to activate health endpoints
- Expose management port 9000 in container ports
- Switch liveness probe to /health/live on port 9000
- Switch readiness probe to /health/ready on port 9000
- Increase failure thresholds for more tolerance during startup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
## What this PR does
Combines and unifies COSI enhancements across seaweedfs and bucket
charts:
**SeaweedFS (extra + system charts):**
- Rename storage pool BucketClass suffix from `-worm` to `-lock`
- Rename parameter `disk` to `diskType` for consistency with COSI driver
- Reduce default object lock retention from 36500 to 365 days
- Add `-lock` BucketClass (COMPLIANCE mode, 365 days) for client and
system topologies
- Add `-readonly` BucketAccessClass with explicit `accessPolicy` for all
topologies
- Add explicit `accessPolicy: readwrite` on default BucketAccessClass
- Update pool name validation to reject `-lock` suffix (was `-worm`)
**Bucket app:**
- Add `locking` parameter: provisions from `-lock` BucketClass
- Add `storagePool` parameter: selects pool-specific BucketClass
- Replace hardcoded BucketAccess with `users` map — each entry creates a
BucketAccess with optional `readonly` flag
- Update dashboard RBAC to dynamically list user credential secrets
- Update ApplicationDefinition schema with new properties
**Breaking change:** empty `users: {}` (default) produces zero
BucketAccess resources. Existing buckets that relied on the implicit
default BucketAccess will need to define users explicitly.
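A hypothetical values snippet for the new bucket app options (names are placeholders):
```yaml
locking: true          # provision from the -lock BucketClass
storagePool: ssd       # placeholder pool; selects the pool-specific BucketClass
users:
  admin: {}            # readwrite BucketAccess + credential secret
  viewer:
    readonly: true     # readonly BucketAccess
```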
### Release note
```release-note
[apps] Add locking, storagePool, and users configuration to bucket app; rename COSI BucketClass suffix from -worm to -lock
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Bucket locking with a shorter retention option, storage-pool
selectable bucket classes, and per-user access (per-user BucketAccess
and readonly controls)
* S3 Manager login mode: user login/logout, per-session credentials, and
new login UI
* **Behavior Changes**
* Credential handling changed to per-user secrets/label selection;
previously generated secrets removed; Ingress basic auth annotations
removed
* **Documentation**
* Added parameters: locking, storagePool, users (including per-user
readonly)
* **Updates**
* Updated COSI driver and S3 manager images
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Set explicit MTU 1350 for Cilium in KubeVirt-based tenant Kubernetes
clusters to prevent packet drops caused by VXLAN encapsulation overhead
## Problem
Cilium's MTU auto-detection does not account for VXLAN overhead when
running inside KubeVirt VMs. The VM network interface inherits MTU 1400
from the parent cluster's OVN/Geneve overlay (1500 - 100 Geneve
overhead). Cilium detects this MTU and applies it to all tunnel
interfaces without subtracting the 50-byte VXLAN encapsulation overhead.
This results in:
- Large packets (> 1350 bytes) being silently dropped when crossing
VXLAN tunnels between nodes
- Intermittent connectivity issues for services in tenant clusters (TLS
handshakes, HTTP responses with data)
- HTTP 499 errors and timeouts observed under load
## Fix
Explicitly set `MTU: 1350` (1400 - 50 VXLAN overhead) in the default
Cilium values for tenant clusters. This value can still be overridden
via `addons.cilium.valuesOverride` if needed.
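For example, setting a different value could be done via the override (exact nesting under `valuesOverride` is an assumption):
```yaml
addons:
  cilium:
    valuesOverride:
      MTU: 1400   # overrides the new 1350 default
```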
## Test plan
- [ ] Deploy a tenant Kubernetes cluster and verify Cilium interfaces
use MTU 1350
- [ ] Verify large packet connectivity from pods inside the tenant
cluster
Add mongodb/mongodb-overview and mongodb/mongodb-inmemory entries
to the monitoring dashboards list so GrafanaDashboard CRDs are
generated and dashboards are served by the grafana-dashboards
HTTP service.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
cozytest.sh executes .bats files as plain shell functions where bats
builtins like `run` are not available. Use `!` negation to assert that
readonly user upload fails.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
The mc client requires --insecure on each command when connecting to
SeaweedFS S3 with self-signed certificates via port-forward.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Update bucket E2E test to match the new per-user access model:
- Create bucket with admin (readwrite) and viewer (readonly) users
- Test that readwrite user can upload, list, and download objects
- Test that readonly user can list and download but cannot upload
- Use per-user BucketAccess and credential secret names
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
v0.1.2 ignores accessPolicy parameter from BucketAccessClass,
granting readwrite access to all users regardless of the
readonly flag. v0.3.0 includes support for readonly bucket
access, Object Lock, and improved error handling.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Rebuild s3manager with auth.go login page support and push
to 999669/s3manager registry for testing.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Remove nginx basic auth and credential secret injection from the
bucket Helm chart. s3manager now always starts in login mode and
handles authentication via its own login page with encrypted
session cookies. This eliminates the dependency on the -credentials
and -ui-auth secrets for the UI layer.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
deployment.yaml: use s3._namespace.host for ENDPOINT instead of
secret ref, inject ACCESS_KEY_ID/SECRET_ACCESS_KEY only when users
exist. Without users, s3manager starts in login mode.
ingress.yaml: nginx basic auth annotations only when users exist.
Without users, s3manager handles authentication via its login form.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
When a bucket has no users configured, s3manager previously crashed
due to missing ACCESS_KEY_ID/SECRET_ACCESS_KEY env vars. This adds
a login mode where users enter their S3 credentials via a web form.
New Go code (via cozystack.patch):
- auth.go: session-based auth middleware, login/logout handlers,
per-request S3 client from encrypted cookie session
- login.html.tmpl: Materialize CSS login form
- main.go: LoginMode toggle, conditional route setup
- Dependency: gorilla/sessions for AES-256 encrypted cookies
Dockerfile: add go mod tidy step for new dependency resolution.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Show only per-user credential secrets in the dashboard instead of
both the internal UI secret and per-user secrets.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## What this PR does
### Release note
```release-note
Fixed migrate-to-version-1.0.sh script to properly convert package names.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated migration tooling to improve package configuration handling
during version upgrades.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
The default BucketAccess was removed in favor of per-user access.
Update secret.yaml to look up the first user's COSI secret instead
of the non-existent default one, with nil-check for race conditions.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Cilium's MTU auto-detection does not account for VXLAN overhead when
running inside KubeVirt VMs. The VM interface inherits MTU 1400 from
the parent OVN/Geneve overlay, and Cilium sets all interfaces
(cilium_vxlan, lxc*, cilium_host/net) to 1400 without subtracting
the 50-byte VXLAN encapsulation overhead.
This causes intermittent packet drops for large packets (TLS
handshakes, HTTP responses with data), resulting in timeouts and
499 errors for services running in tenant clusters.
Set MTU to 1350 (1400 - 50 VXLAN overhead) explicitly in the default
Cilium values for tenant Kubernetes clusters.
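A hedged sketch of the resulting default value (nesting under a `cilium` key is
illustrative; the `MTU` key name follows the upstream Cilium Helm chart):
```yaml
cilium:
  # 1400 (inherited from the parent OVN/Geneve overlay)
  # minus 50 bytes of VXLAN encapsulation overhead
  MTU: 1350
```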
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Remove the yq step in the Makefile that stripped properties and was clearing
the schema, and run make generate to sync all generated files.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Create labeled secrets in the -system chart using lookup to copy
credentials from COSI-created secrets. The ApplicationDefinition
matchLabels selector exposes them in the dashboard.
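A minimal sketch of the lookup-based copy, assuming hypothetical secret and
label names (the real template iterates over the configured users):
```yaml
{{- /* Source secret created by the COSI sidecar; name is illustrative */ -}}
{{- $src := lookup "v1" "Secret" .Release.Namespace (printf "%s-cosi-admin" .Release.Name) }}
{{- if $src }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-admin-credentials
  labels:
    # Hypothetical label matched by the ApplicationDefinition matchLabels selector
    cozystack.io/secret-type: credentials
data:
  {{- range $key, $value := $src.data }}
  {{ $key }}: {{ $value }}
  {{- end }}
{{- end }}
```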
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Add a catch-all include selector so that COSI-created user credential
secrets (dynamically named per user) are visible in the dashboard.
The lineage webhook already verifies ownership via the graph walk
(Secret -> BucketAccess -> HelmRelease -> Bucket), so an empty selector
safely matches only secrets belonging to this application.
This is needed because COSI sidecar creates secrets without custom
labels, making the matchLabels pattern (used by rabbitmq) inapplicable.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
The seaweedfs-cosi-driver v0.3.0 expects the parameter key 'disk',
not 'diskType'. Restore the correct key to match the driver's
paramDisk constant.
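An illustrative BucketClass with the corrected key (the resource and driver
names here are examples, not taken from the chart):
```yaml
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: tenant-example-ssd
driverName: seaweedfs.objectstorage.k8s.io   # assumed driver name
deletionPolicy: Delete
parameters:
  disk: ssd   # v0.3.0 reads `disk`; `diskType` is ignored
```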
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Add COSI resources for object locking and read-only access to both
client topology and system chart:
- BucketClass with -lock suffix (COMPLIANCE mode, 365 days retention)
- BucketAccessClass with -readonly suffix
- Explicit accessPolicy: readwrite on default BucketAccessClass
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## What this PR does
### Release note
```release-note
- Fixed RBAC for backup controllers
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated backup controller permissions to focus on core backup
operations.
* Expanded backup strategy controller permissions to support enhanced
backup and restore capabilities, including Velero integration and status
management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR adds the changelog for release `v1.0.2`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.2.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Published v1.0.2 release notes.
* **Bug Fixes**
* Fixed migration script to ensure all upgrade steps execute.
* Improved dashboard functionality for field clearing and secret
copying.
* Restored sidebar navigation on namespace-level pages.
* Updated proxy configurations for enhanced TLS handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Replaces the plain text input for `storageClass` fields with an
API-backed dropdown listing available StorageClasses from the cluster.
Follows the same pattern as the `instanceType` dropdown for VMInstance.
Affected applications:
- **Top-level `spec.storageClass`**: ClickHouse, Harbor, HTTPCache,
Kubernetes, MariaDB, MongoDB, NATS, OpenBAO, Postgres, Qdrant,
RabbitMQ, Redis, VMDisk
- **Nested `spec.storage.storageClass`**: FoundationDB
- **Nested `spec.kafka.storageClass` / `spec.zookeeper.storageClass`**:
Kafka
### Release note
```release-note
[dashboard] storageClass fields in stateful app forms now render as a
dropdown populated with available StorageClasses from the cluster,
instead of a free-text input.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Storage class selection dropdowns now available in configuration forms
for multiple database, messaging, and storage services.
* **Tests**
* Added comprehensive test coverage for storage class configuration
handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Restores the `stock-instance-api-form`, `stock-instance-api-table`,
`stock-instance-builtin-form`, and `stock-instance-builtin-table` sidebar
resources that were removed in #2106, and adds them back to the orphan
cleanup allowlist.
PR #2106 removed these sidebars to fix broken URLs on the main page before
namespace selection (`default//api-table/...`). However, `stock-instance-*`
sidebars are required by the frontend for namespace-level api-table/api-form
pages. Without them, and with `CUSTOMIZATION_SIDEBAR_FALLBACK_ID=""`, the
frontend cannot find a sidebar for pages like Backup Plans and renders an
empty page where no interaction is possible.
The broken-URL bug is already fully fixed by
`CUSTOMIZATION_SIDEBAR_FALLBACK_ID=""` in `web.yaml`. Re-adding
`stock-instance-*` does not reintroduce it, since these sidebars are only
shown when the user is on a namespace-level page where the `{namespace}`
placeholder is filled.
### Release note
```release-note
[dashboard] fix empty page on Backup Plans and other namespace-level api-table pages
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added four new dashboard sidebar resources for stock instances: API
form, API table, built-in form, and built-in table views. These enable
expanded dashboard customization options for managing stock instance
configurations and data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Updates the openapi-k8s-toolkit integration in the dashboard to fix two
UX issues:
**1. Allow clearing the instanceType field in VMInstance form**
When `instanceType` has a default value, clearing the field in the form UI
would silently revert to the default, making it impossible to explicitly
send an empty value. This blocked use of custom KubeVirt resources without
a named instance type.
Adds `allowEmpty: true` to the instanceType listInput field so the BFF
preserves an explicit empty value. Also introduces a generic `persistType`
prop (`'str' | 'number' | 'arr' | 'obj'`) to the listInput component, so
the allow-empty behaviour works correctly for any field type, not just
strings.
Updates openapi-k8s-toolkit to release/1.4.0 (d6b9e4ad), which already
includes the FormListInput layout refactor, so the previous
formlistinput-value-binding.diff patch is no longer needed.
Upstream PR:
https://github.com/PRO-Robotech/openapi-k8s-toolkit/pull/340
**2. Preserve newlines when copying secrets with CMD+C**
Native `input[type=text]` strips newlines on copy. Adds an `onCopy` handler
to the SecretBase64Plain component that intercepts the copy event and
writes the full decoded value (including newlines) to the clipboard.
Upstream PR:
https://github.com/PRO-Robotech/openapi-k8s-toolkit/pull/339
### Release note
```release-note
[dashboard] Fix clearing instanceType in VMInstance form: explicit empty value
is now correctly sent to the API instead of falling back to the schema default.
Fix CMD+C copying of secrets stripping newlines.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Dropdown fields now support configuration to allow empty selections
* Enhanced empty value handling for form fields across multiple data
types (string, number, array, object)
* **Bug Fixes**
* Fixed secret field copy functionality to preserve plain-text format
when visible
* **Chores**
* Updated base image dependencies for dashboard build
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
PR #2106 removed stock-instance-* sidebar resources to fix broken URLs
on the main page before namespace selection. However, these sidebars are
required for rendering namespace-level pages (api-table, api-form, etc.)
such as the Backup Plans page.
Without stock-instance-api-table, the frontend cannot find the sidebar
for namespace-scoped api-table pages and renders an empty page instead.
The original bug (broken URLs with empty namespace placeholder) is already
fixed by CUSTOMIZATION_SIDEBAR_FALLBACK_ID="" in web.yaml, so re-adding
stock-instance-* sidebars does not reintroduce it.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Replace deprecated `KC_PROXY=edge` with `KC_PROXY_HEADERS=xforwarded`
and `KC_HTTP_ENABLED=true` in the Keycloak StatefulSet template.
`KC_PROXY` was removed in Keycloak 26.x, causing "Non-secure context
detected" warnings and broken cookie handling when running behind a
reverse proxy with TLS termination.
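The relevant container environment entries after the change, as a sketch:
```yaml
env:
  - name: KC_PROXY_HEADERS
    value: xforwarded   # trust X-Forwarded-* headers from the TLS-terminating proxy
  - name: KC_HTTP_ENABLED
    value: "true"       # Keycloak serves plain HTTP behind the proxy
```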
### Release note
```release-note
[system] Fix Keycloak proxy headers configuration for compatibility with Keycloak 26.x
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Updated system configuration to improve proxy header handling and
enable direct HTTP support for enhanced compatibility with reverse proxy
environments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Update openapi-k8s-toolkit commit to d6b9e4ad (release/1.4.0) which
includes the FormListInput layout refactor, making formlistinput-value-binding.diff
obsolete.
Set allowEmpty: true on the VMInstance instanceType field so users can
explicitly clear the selection and override the default instance type.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Update openapi-k8s-toolkit to release/1.4.0 (d6b9e4ad). The previous
value-binding layout refactor is already included upstream, so drop the
formlistinput-value-binding.diff patch.
Add formlistinput-allow-empty.diff patch which introduces two props to
the listInput component:
- allowEmpty: when set, auto-persists the field so BFF sends an empty
value instead of falling back to the schema default
- persistType: controls the type of empty value ('str' | 'number' | 'arr'
| 'obj'), allowing the feature to work correctly for any field type
Set allowEmpty: true on the VMInstance instanceType field so users can
explicitly clear the selection and override the default instance type.
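A hedged sketch of the instanceType field definition; only `allowEmpty` and
`persistType` come from this change, the surrounding customization structure
is illustrative:
```yaml
- type: listInput
  path: ["spec", "instanceType"]
  allowEmpty: true   # BFF sends an explicit empty value instead of the schema default
  persistType: str   # type of the persisted empty value: 'str' | 'number' | 'arr' | 'obj'
```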
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
### Release note
```release-note
[platform] Fixed off-by-one error where the first required migration was always skipped.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Corrected migration range handling so upgrade steps run for the
intended version window, preventing skipped or duplicated migrations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
Disabled private key rotation for every CA certificate in Cozystack system packages to prevent trust-chain problems when a CA certificate is reissued
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Disabled private-key rotation (set rotationPolicy: Never) for CA/root
certificates used by multiple system components (ingress-nginx, linstor,
linstor-scheduler, seaweedfs, victoria-metrics-operator,
kubeovn-webhook, lineage-controller-webhook, cozystack-api, etcd,
linstor API/internal, seaweedfs).
* Added patch application steps to relevant update workflows to ensure
the certificate template changes are applied during chart/update
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
[rabbitmq] Added version selection to newly created RabbitMQ instances.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Configurable RabbitMQ major.minor version selector (v4.2, v4.1, v4.0,
v3.13), default v4.2; chart validates selection and uses it to pick the
runtime image.
* **Chores**
* Default RabbitMQ image updated to 4.2.4.
* Added an automated version-update helper and a Makefile target to
refresh available versions and regenerate manifests.
* **Migration**
* Migration added to backfill the version field on existing RabbitMQ
resources.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Replace plain text input with an API-backed listInput dropdown for
storageClass fields across all applications that expose them.
The dropdown fetches available StorageClasses from the cluster via
/api/clusters/{cluster}/k8s/apis/storage.k8s.io/v1/storageclasses,
following the same pattern as the instanceType dropdown for VMInstance.
Top-level spec.storageClass: ClickHouse, Harbor, HTTPCache, Kubernetes,
MariaDB, MongoDB, NATS, OpenBAO, Postgres, Qdrant, RabbitMQ, Redis, VMDisk.
Nested paths: FoundationDB (spec.storage.storageClass),
Kafka (spec.kafka.storageClass and spec.zookeeper.storageClass).
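A hedged sketch of one such field definition; only the API path is taken from
this change, the surrounding customization structure is illustrative:
```yaml
- type: listInput
  path: ["spec", "storageClass"]
  source:
    # Dropdown options are StorageClass names fetched from the cluster
    uri: /api/clusters/{cluster}/k8s/apis/storage.k8s.io/v1/storageclasses
    itemPath: metadata.name
```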
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Adds a check in the migration script to detect and suspend the `cozy-proxy`
HelmRelease if it has `releaseName: cozystack`, which conflicts with the
installer release and causes cozystack-operator deletion during upgrade
from v0.41 to v1.0.
### Release note
```release-note
[platform] Fix migration script to handle cozy-proxy releaseName conflict during v0.41→v1.0 upgrade.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced the version 1.0 migration process with automatic conflict
detection and interactive guidance, prompting users to resolve issues
during the upgrade for a smoother migration experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
In v0.41.x, cozy-proxy HelmRelease was configured with
releaseName: cozystack, which collides with the installer helm release.
If not suspended before upgrade, the cozy-proxy HR reconciles and
overwrites the installer release, deleting cozystack-operator.
Add a check in the migration script that detects this conflict and
suspends the cozy-proxy HelmRelease before proceeding.
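Expressed as the resulting resource state (the script itself applies this via
kubectl; the namespace shown is an assumption):
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cozy-proxy
  namespace: cozy-system   # assumed namespace
spec:
  releaseName: cozystack   # collides with the installer helm release
  suspend: true            # set by the migration script before the upgrade proceeds
```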
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
KC_PROXY=edge was deprecated and removed in Keycloak 26.x, causing
"Non-secure context detected" warnings and broken cookie handling
behind reverse proxy. Replace with KC_PROXY_HEADERS=xforwarded and
KC_HTTP_ENABLED=true.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
This PR adds the changelog for release `v1.0.1`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.1.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Published v1.0.1 release notes with platform, installer, and dashboard
bug fixes
* Updated website documentation: renamed "Bundles" to "Variants," added
new variant options, and updated cross-references
* Added upgrade protection instructions for system components prior to
upgrade
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add FlowSchema `cozy-dashboard-exempt` to exempt the dashboard BFF
service account (`incloud-web-web`) from API Priority and Fairness
throttling
- BFF falls under the default `service-accounts` FlowSchema →
`workload-low` priority level, which causes 429 responses under load
## Test plan
- [ ] Deploy to a cluster with dashboard enabled
- [ ] Verify FlowSchema is created: `kubectl get flowschema
cozy-dashboard-exempt`
- [ ] Verify BFF no longer receives 429 errors under load
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Added a new Kubernetes FlowSchema configuration for system resource
access management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds `helm.sh/resource-policy: keep` annotation to the `cozy-system`
Namespace resource in the installer helm chart. This prevents Helm from
deleting the namespace (and all HelmReleases within it) when the installer
release is removed.
Also updates the v1.0 migration script to annotate the `cozy-system`
namespace and `cozystack-version` ConfigMap with the same policy before
generating the Package resource.
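A sketch of the annotated Namespace as rendered by the installer chart:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cozy-system
  annotations:
    # Helm keeps this resource (and everything in it) when the release is removed
    helm.sh/resource-policy: keep
```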
### Release note
```release-note
[platform] Add helm.sh/resource-policy=keep annotation to cozy-system Namespace in installer chart to prevent namespace deletion on HelmRelease removal. Update migration script to protect namespace and cozystack-version ConfigMap before migration.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced migration process with an interactive step to safeguard
critical resources during system upgrades.
* Added resource protection mechanisms to prevent unintended removal
during Helm operations.
* Improved control flow in the upgrade script with explicit user
confirmation prompts.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add helm.sh/resource-policy=keep annotation to the cozy-system Namespace
in the installer helm chart. This prevents Helm from deleting the
namespace when the HelmRelease is removed, which would otherwise destroy
all other HelmReleases within it.
Update the migration script to annotate the cozy-system namespace and
cozystack-version ConfigMap with helm.sh/resource-policy=keep before
generating the Package resource.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The dashboard BFF service account (incloud-web-web) falls under the
default "service-accounts" FlowSchema which maps to the "workload-low"
priority level. Under load, this causes API Priority and Fairness to
return 429 (Too Many Requests) responses to the BFF, resulting in 500
errors for dashboard users.
Add a FlowSchema that maps the BFF service account to the "exempt"
priority level to prevent APF throttling of dashboard API requests.
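A minimal sketch of such a FlowSchema; the service account namespace is an
assumption and may differ in the actual chart:
```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: cozy-dashboard-exempt
spec:
  priorityLevelConfiguration:
    name: exempt          # bypass APF queuing and throttling
  matchingPrecedence: 1000
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: incloud-web-web
            namespace: cozy-dashboard   # assumed namespace
      resourceRules:
        - verbs: ["*"]
          apiGroups: ["*"]
          resources: ["*"]
          clusterScope: true
          namespaces: ["*"]
```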
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
### Release note
```release-note
[ci] Added more debug information to CI tests
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced error handling and diagnostic output in development testing
infrastructure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
- Add `volume.pools` (Simple topology) and `volume.zones[name].pools`
(MultiZone topology) for creating separate Volume StatefulSets per disk
type (SSD/HDD/NVMe)
- Add `nodeSelector`, `storageClass`, and `dataCenter` overrides for
zones in MultiZone topology
- Create per-pool `BucketClass` and `BucketAccessClass` COSI resources
(including WORM and readonly variants)
- Bump seaweedfs-cosi-driver to v0.3.0 (adds `disk` parameter support in
BucketClass)
- Add `volume.diskType` field to tag default volume servers with a disk
type
### How It Works
#### Simple Topology
Each storage pool in `volume.pools` creates an additional Volume
StatefulSet alongside the default one. All pods (default + pool) may run
on the same nodes. SeaweedFS distinguishes storage via the
`-disk=<type>` flag on volume servers.
```yaml
volume:
  replicas: 2
  size: 10Gi
  diskType: ""
  pools:
    ssd:
      diskType: ssd
      size: 50Gi
      storageClass: local-nvme
```
#### MultiZone Topology
Pools are defined per-zone in `volume.zones[name].pools`. A StatefulSet
is created for each **zone × pool** combination (e.g., `us-east-ssd`,
`us-west-ssd`), inheriting nodeSelector and dataCenter from its parent
zone.
```yaml
volume:
  replicas: 2
  size: 10Gi
  zones:
    us-east:
      replicas: 2
      size: 100Gi
      # nodeSelector defaults to: topology.kubernetes.io/zone: us-east
      pools:
        ssd:
          diskType: ssd
          size: 50Gi
    us-west:
      replicas: 3
```
In Simple topology, `volume.pools` is used. In MultiZone,
`volume.zones[name].pools` is used — `volume.pools` is explicitly
blocked to prevent BucketClasses without backing StatefulSets.
### Zone Overrides (MultiZone)
Zones now support:
- `nodeSelector` — YAML string, defaults to
`topology.kubernetes.io/zone: <zoneName>`
- `storageClass` — defaults to `volume.storageClass`
- `dataCenter` — SeaweedFS data center name, defaults to zone name
### COSI Resources
Each unique pool name generates 4 cluster-scoped COSI resources:
- `<namespace>-<pool>` BucketClass (Delete policy, `disk: <type>`)
- `<namespace>-<pool>-worm` BucketClass (Retain policy, object lock)
- `<namespace>-<pool>` BucketAccessClass (readwrite)
- `<namespace>-<pool>-readonly` BucketAccessClass (readonly)
### Validation
- Pool names must be valid DNS labels (no dots)
- Pool names must not end with `-worm` or `-readonly` (reserved COSI
suffixes)
- `diskType` is required and must be lowercase alphanumeric
- Pool `diskType` must differ from `volume.diskType`
- Pool name + zone name composed names must not collide with existing
zone names
- `volume.pools` is blocked in Client and MultiZone topologies
- All replicas have `minimum: 1` in JSON schema
### Inheritance Chain
| Field          | Pool fallback (Simple) | Pool fallback (MultiZone)                |
| -------------- | ---------------------- | ---------------------------------------- |
| `replicas`     | pool → volume          | pool → zone → volume                     |
| `size`         | pool → volume          | pool → zone → volume                     |
| `storageClass` | pool → volume          | pool → zone → volume                     |
| `resources`    | pool → volume          | pool → volume (zone resources inherited) |
### Backward Compatibility
- Default `volume.pools: {}` produces identical output to current chart
- Default `volume.diskType: ""` adds no extra flags
- Existing default BucketClass remains unchanged
- No migration needed — pools create new StatefulSets
### Test plan
- [x] `helm template` with empty pools — output identical to current
- [x] `helm template` with Simple + volume.pools — additional volume
StatefulSets, BucketClasses, WorkloadMonitors
- [x] `helm template` with MultiZone + zone.pools — zone × pool
cross-product StatefulSets
- [x] `helm template` with `volume.diskType: hdd` — extraArgs includes
`-disk=hdd`
- [x] `helm template` with Client + volume.pools — fails with validation
error
- [x] `helm template` with MultiZone + volume.pools — fails with
validation error
- [x] `helm template` with reserved pool name suffix — fails with
validation
- [x] Deploy to test cluster and verify volume servers register with
correct disk types
### Release note
```release-note
[seaweedfs] add storage pools support for tiered storage with per-pool COSI resources
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added support for multiple storage pools with configurable disk types
and resource allocation.
* Introduced per-pool bucket and access classes for storage management.
* Added zone-aware pool configurations for multi-zone deployments.
* Enhanced topology-driven resource monitoring and allocation.
* **Documentation**
* Updated service documentation with expanded configuration parameters
and improved formatting.
* **Chores**
* Updated container image to latest version.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add a readonly `BucketAccessClass` to the seaweedfs COSI chart with
`accessPolicy: "readonly"` parameter
- Each bucket now automatically creates two sets of S3 credentials:
readWrite (existing, for UI) and readonly
- Update dashboard RBAC and ApplicationDefinition to expose the readonly
credentials secret
## Test plan
- [ ] Verify seaweedfs chart templates render both `BucketAccessClass`
resources (readWrite and readonly)
- [ ] Verify bucket app templates render `BucketClaim` + 2
`BucketAccess` (readWrite + readonly)
- [ ] Deploy a bucket and confirm both credential secrets are created by
COSI driver
- [ ] Confirm readonly credentials can only read/list objects, not
write/delete
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Introduced read-only bucket access capabilities. Users can now
configure read-only permissions for bucket storage resources,
complementing existing access control options. New read-only access
classes and configurations provide enhanced security controls and
finer-grained permission management. This enables improved data
protection while maintaining flexibility for various access requirements
across applications and storage infrastructure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
[cert-manager] Updated cert-manager to v1.19.3
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Global nodeSelector and hostUsers (pod user-namespace isolation) added
* New/updated CRDs for cert-manager resources (Certificate,
CertificateRequest, Order, etc.)
* **Documentation**
* Revised chart docs and installation guidance; added deprecation/notice
about private-key rotation
* Removed legacy CRD README and schema files from the CRD package
(documentation consolidated)
* **Chores**
* Upgraded cert-manager to v1.19.3
* Moved CRDs into a dedicated CRD package; ServiceMonitor targetPort
default renamed to "http-metrics"
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
[platform] Prevent the cozystack-version ConfigMap from being deleted
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated deployment resource configuration to improve system
reliability by ensuring critical components are properly retained and
protected during system operations and maintenance activities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds OpenBAO (open-source Vault fork) as a new managed PaaS application
in Cozystack.
**Structure follows existing app patterns (qdrant, nats):**
- System chart with vendored upstream `openbao/openbao` (chart v0.25.3,
appVersion v2.5.0)
- App chart with standalone/HA mode switching based on replicas count
- TLS via cert-manager self-signed certificates per instance
- ApplicationDefinition, PackageSource, PaaS bundle entry
- E2E test with init/unseal workflow
**Key design decisions:**
- `replicas: 1` → standalone mode with file storage; `replicas > 1` → HA
with Raft integrated storage and retry_join with TLS peer verification
(see the sketch after this list)
- TLS enabled by default — each instance gets a self-signed Certificate
with DNS SANs covering services and pod addresses
- `disable_mlock = true` in HCL config since default security context
drops IPC_LOCK capability
- Injector and CSI provider disabled (cluster-scoped components, not
safe per-tenant)
- No auto-init/unseal — OpenBAO requires manual initialization by design
- E2E test performs full lifecycle: deploy, wait for certificate + API,
init, unseal, verify readiness, cleanup
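A hedged sketch of the mode switch in the app values (only `replicas` is taken
from the description above):
```yaml
# replicas: 1 -> standalone mode with file storage
# replicas: 3 -> HA mode with Raft integrated storage and retry_join over TLS
replicas: 3
```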
### Release note
```release-note
[apps] Add OpenBAO as a managed secrets management service with standalone and HA Raft modes, TLS enabled by default
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added OpenBAO managed secrets management service with
high-availability and standalone deployment options
* Integrated monitoring and dashboards for operational visibility
* Enabled configurable external access and web UI
* Added automated snapshot backup capability
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
- Added a dropdown for selecting BackupClasses in the Plan/BackupJob creation form
```
Redesign storage pools architecture:
- Move storagePools map from top-level into volume.pools (Simple topology)
and volume.zones[name].pools (MultiZone topology)
- Add nodeSelector, storageClass, dataCenter overrides for zones
- Add reserved suffix validation (-worm, -readonly) for pool names
- Block volume.pools usage in MultiZone (must use zone.pools instead)
- Use ternary/hasKey pattern for all optional replicas to handle 0 correctly
- Fix nodeSelector rendering for multiline values using indent
- Use disk: parameter (not diskType:) for COSI driver v0.3.0 BucketClass
- Bump seaweedfs-cosi-driver tag to v0.3.0
- Add minimum: 1 constraint for volume/zone/pool replicas in schema
- Regenerate README, CRD, and openAPISchema
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
- Remove dots from pool name regex (K8s resources don't allow dots)
- Add zone×pool name collision validation for MultiZone topology
- Use conditional storageClass rendering to omit empty values
- Fix README resourcesPreset default value
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
- Document MultiZone fallback chain for pool replicas and size
- Move `-volume` WorkloadMonitor reference inside Simple topology block in dashboard-resourcemap.yaml (it is only created for Simple)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Make storageClass optional in storagePools — pools inherit from
volume.storageClass when not explicitly set. Add full COSI resource set
per storage pool: BucketClass, BucketClass-worm (Retain + object lock),
BucketAccessClass readwrite, and BucketAccessClass readonly.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Add optional storagePools configuration that creates separate Volume
StatefulSets per disk type (SSD/HDD/NVMe), enabling tiered storage
within a single SeaweedFS tenant. Each pool gets its own BucketClass
and BucketAccessClass to prepare infrastructure for COSI driver
integration.
Supported in both Simple and MultiZone topologies:
- Simple: one StatefulSet per pool
- MultiZone: one StatefulSet per zone×pool combination
Also adds volume.diskType field for tagging default volume servers.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
This PR adds the changelog for release `v1.0.0`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added comprehensive v1.0.0 release notes documenting feature
highlights, improvements, and fixes across all platform components
* Included breaking changes and step-by-step upgrade guide for v0.x to
v1.0.0 migration
* Listed 33 incremental migrations and contributor credits
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add a readonly BucketAccessClass to the seaweedfs COSI chart and a
second fixed BucketAccess per bucket so each bucket automatically
gets both readWrite and readonly S3 credentials.
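A hedged sketch of the added class (driver and resource names are illustrative):
```yaml
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccessClass
metadata:
  name: tenant-example-readonly
driverName: seaweedfs.objectstorage.k8s.io   # assumed driver name
authenticationType: Key
parameters:
  accessPolicy: readonly   # grants list/get only; writes and deletes are denied
```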
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
We are thrilled to announce **Cozystack v1.0.0**, the first stable major release of the Cozystack platform. This milestone represents a fundamental architectural evolution from the v0.x series, introducing a fully operator-driven package management system, a comprehensive backup and restore framework, a redesigned virtual machine architecture, and a rich set of new managed applications — all hardened through an extensive alpha, beta, and release-candidate cycle.
## Feature Highlights
### Package-Based Architecture with Cozystack Operator
The most significant architectural change in v1.0.0 is the replacement of HelmRelease bundle deployments with a declarative **Package** and **PackageSource** model managed by the new `cozystack-operator`. Operators now define their platform configuration in a structured `values.yaml` and the operator reconciles the desired state by managing Package and PackageSource resources across the cluster.
The operator also takes ownership of CRD lifecycle — installing and updating CRDs from embedded manifests at every startup — eliminating the stale-CRD problem that affected Helm-only installations. Flux sharding has been added to distribute tenant HelmRelease reconciliation across multiple Flux controllers, providing horizontal scalability in large multi-tenant environments.
A migration script (`hack/migrate-to-version-1.0.sh`) is provided for upgrading existing v0.x clusters, along with 33 incremental migration steps that automate resource renaming, secret cleanup, and configuration conversion.
### Comprehensive Backup and Restore System
v1.0.0 ships a fully featured, production-ready backup and restore framework built on Velero integration. Users can define **BackupClass** resources to describe backup storage targets, create **BackupPlan** schedules, and trigger **RestoreJob** resources for end-to-end application recovery.
Virtual machine backups are supported natively via the Velero KubeVirt plugin, which captures consistent VM disk snapshots alongside metadata. The backup controller and the backup strategy sub-controllers (including the VM-specific strategy) are installed by default, and a full dashboard UI allows users to monitor backup status, view backup job history, and initiate restore workflows.
### Redesigned Virtual Machine Architecture
The legacy `virtual-machine` application has been replaced with a two-resource architecture: **`vm-disk`** for managing persistent disks and **`vm-instance`** for managing VM lifecycle. This separation provides cleaner disk/instance management, allows disks to be reused across VM instances, and aligns with modern KubeVirt patterns.
New capabilities include: a `cpuModel` field for direct CPU model specification without using an instanceType; the ability to switch between `instanceType`-based and custom resource-based configurations; migration from the deprecated `running` field to `runStrategy`; and native **RWX (NFS) filesystem support** in the KubeVirt CSI driver, enabling multiple pods to mount the same persistent volume simultaneously.
### New Managed Applications
v1.0.0 expands the application catalog significantly:
- **MongoDB**: A fully managed MongoDB replica set with persistent storage, monitoring integration, and unified user/database configuration API.
- **Qdrant**: A high-performance vector database for AI and machine learning workloads, supporting single-replica and clustered modes with API key authentication and optional external LoadBalancer access.
- **Harbor**: A fully managed OCI container registry backed by CloudNativePG, Redis operator, and COSI BucketClaim (SeaweedFS). Includes Trivy vulnerability scanner, auto-generated admin credentials, and TLS via cert-manager.
- **NATS**: Enhanced with full Grafana monitoring dashboards for JetStream and server metrics, Prometheus support with TLS-aware configuration, and updated image customization options.
- **MariaDB**: The `mysql` application is renamed to `mariadb`, accurately reflecting the underlying engine. An automatic migration (migration 27) converts all existing MySQL resources to use the `mariadb` naming.
FerretDB has been removed from the catalog as it is superseded by native MongoDB support.
### Multi-Location Networking with Kilo and cilium-kilo
Cozystack v1.0.0 introduces first-class support for multi-location clusters via the **Kilo** WireGuard mesh networking package. Kilo automatically establishes encrypted WireGuard tunnels between nodes in different network segments, enabling seamless cross-region communication.
A new integrated **`cilium-kilo`** networking variant combines Cilium eBPF CNI with Kilo's WireGuard overlay in a single platform configuration selection. This variant enables `enable-ipip-termination` in Cilium and deploys Kilo with `--compatibility=cilium`, allowing Cilium network policies to function correctly over the WireGuard mesh — without any manual configuration of the two components.
### Flux Sharding for Scalable Multi-Tenancy
Tenant HelmRelease reconciliation is now distributed across multiple Flux controllers via sharding labels. Each tenant workload is assigned to a shard based on a deterministic hash, preventing a single Flux controller from becoming a bottleneck in large multi-tenant environments. The platform operator manages the shard assignment automatically, and new shards can be added by scaling the Flux deployment.
## Major Features and Improvements
### Cozystack Operator
* **[cozystack-operator] Introduce Package and PackageSource APIs**: Added new CRDs for declarative package management, defining the full API for Package and PackageSource resources ([**@kvaps**](https://github.com/kvaps) in #1740, #1741, #1755, #1756, #1760, #1761).
* **[platform] Migrate from HelmRelease bundles to Package-based deployment**: Replaced HelmRelease bundle system with Package resources managed by cozystack-operator, including restructured values.yaml with full configuration support for networking, publishing, authentication, scheduling, branding, and resources ([**@kvaps**](https://github.com/kvaps) in #1816).
* **[cozystack-operator] Add automatic CRD installation at startup**: Added `--install-crds` flag to install embedded CRD manifests on every startup via server-side apply, ensuring CRDs and the PackageSource are always up to date ([**@lexfrei**](https://github.com/lexfrei) in #2060).
* **[installer] Remove CRDs from Helm chart, delegate lifecycle to operator**: The `cozy-installer` Helm chart no longer ships CRDs; CRD lifecycle is fully managed by the Cozystack operator ([**@lexfrei**](https://github.com/lexfrei) in #2074).
* **[cozystack-operator] Preserve existing suspend field in package reconciler**: Fixed package reconciler to properly preserve the suspend field state during reconciliation ([**@sircthulhu**](https://github.com/sircthulhu) in #2043).
* **[cozystack-operator] Fix namespace privileged flag resolution and field ownership**: Fixed operator to correctly check all Packages in a namespace when determining privileged status, and resolved SSA field ownership conflicts ([**@kvaps**](https://github.com/kvaps) in #2046).
* **[platform] Add flux-plunger controller**: Added flux-plunger controller to automatically fix stuck HelmRelease errors by cleaning up failed resources and retrying reconciliation ([**@kvaps**](https://github.com/kvaps) in #1843).
* **[installer] Add variant-aware templates for generic Kubernetes support**: Extended the installer to support generic and hosted Kubernetes deployments via the `cozystackOperator.variant=generic` parameter ([**@lexfrei**](https://github.com/lexfrei) in #2010).
* **[installer] Unify operator templates**: Merged separate operator templates into a single variant-based template supporting Talos and non-Talos deployments ([**@kvaps**](https://github.com/kvaps) in #2034).
### API and Platform
* **[api] Rename CozystackResourceDefinition to ApplicationDefinition**: Renamed CRD and all related types for clarity and consistency, with migration 24 handling the transition automatically ([**@kvaps**](https://github.com/kvaps) in #1864).
* **[platform] Add DNS-1035 validation for Application names**: Added dynamic DNS-1035 label validation for Application names at creation time, preventing resources with invalid names that would fail downstream ([**@lexfrei**](https://github.com/lexfrei) in #1771).
* **[platform] Make cluster issuer name and ACME solver configurable**: Added `publishing.certificates.solver` and `publishing.certificates.issuerName` parameters to allow pointing all ingress TLS annotations at any ClusterIssuer ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2077).
* **[platform] Add cilium-kilo networking variant**: Added integrated `cilium-kilo` networking variant combining Cilium CNI with Kilo WireGuard mesh overlay ([**@kvaps**](https://github.com/kvaps) in #2064).
* **[cozystack-api] Switch from DaemonSet to Deployment**: Migrated cozystack-api to a Deployment with PreferClose topology spread constraints, reducing resource consumption while maintaining high availability ([**@kvaps**](https://github.com/kvaps) in #2041, #2048).
### Virtual Machines
* **[vm-instance] Complete migration from virtual-machine to vm-disk and vm-instance**: Fully migrated from `virtual-machine` to the new `vm-disk` and `vm-instance` architecture, with automatic migration script (migration 28) for existing VMs ([**@kvaps**](https://github.com/kvaps) in #2040).
* **[kubevirt-csi-driver] Add RWX Filesystem (NFS) support**: Added Read-Write-Many filesystem support to kubevirt-csi-driver via automatic NFS server deployment per PVC ([**@kvaps**](https://github.com/kvaps) in #2042).
* **[vm] Add cpuModel field to specify CPU model without instanceType**: Added cpuModel field to VirtualMachine API for granular CPU control ([**@sircthulhu**](https://github.com/sircthulhu) in #2007).
* **[vm] Allow switching between instancetype and custom resources**: Implemented atomic upgrade hook for switching between instanceType-based and custom resource VM configurations ([**@sircthulhu**](https://github.com/sircthulhu) in #2008).
* **[vm] Migrate to runStrategy instead of running**: Migrated VirtualMachine API from deprecated `running` field to `runStrategy` ([**@sircthulhu**](https://github.com/sircthulhu) in #2004).
* **[vm] Always expose VMs with a service**: Virtual machines are now always exposed with at least a ClusterIP service, ensuring in-cluster DNS names ([**@lllamnyp**](https://github.com/lllamnyp) in #1738, #1751).
* **[dashboard] VMInstance dropdowns for disks and instanceType**: VM instance creation form now renders API-backed dropdowns for `instanceType` and disk `name` fields ([**@sircthulhu**](https://github.com/sircthulhu) in #2071).
### Backup System
* **[backups] Implement comprehensive backup and restore functionality**: Core backup Plan controller, Velero strategy controller, RestoreJob resource with end-to-end restore workflows, and enhanced backup plans UI ([**@lllamnyp**](https://github.com/lllamnyp) in #1640, #1685, #1687, #1719, #1720, #1737, #1967; [**@androndo**](https://github.com/androndo) in #1762, #1967, #1968, #1811).
* **[backups] Add kubevirt plugin to velero**: Added KubeVirt plugin to Velero for consistent VM state and data snapshots ([**@lllamnyp**](https://github.com/lllamnyp) in #2017).
* **[backups] Install backupstrategy controller by default**: Enabled backupstrategy controller by default for automatic backup scheduling ([**@lllamnyp**](https://github.com/lllamnyp) in #2020).
* **[backups] Better selectors for VM strategy**: Improved VM backup strategy selectors for accurate and reliable backup targeting ([**@lllamnyp**](https://github.com/lllamnyp) in #2023).
* **[backups] Create RBAC for backup resources**: Added comprehensive RBAC configuration for backup operations and restore jobs ([**@lllamnyp**](https://github.com/lllamnyp) in #2018).
### Networking
* **[kilo] Introduce Kilo WireGuard mesh networking**: Added Kilo as a system package providing secure WireGuard-based VPN mesh for connecting Kubernetes nodes across different networks and regions ([**@kvaps**](https://github.com/kvaps) in #1691).
* **[kilo] Add Cilium compatibility variant**: Added `cilium` variant enabling Cilium-aware IPIP encapsulation for full network policy enforcement with Kilo mesh ([**@kvaps**](https://github.com/kvaps) in #2055).
* **[kilo] Update to v0.8.0 with configurable MTU**: Updated Kilo to v0.8.0 with configurable MTU parameter and performance improvements ([**@kvaps**](https://github.com/kvaps) in #2003, #2049, #2053).
* **[local-ccm] Add local-ccm package**: Added local cloud controller manager for managing load balancer services in bare-metal environments ([**@kvaps**](https://github.com/kvaps) in #1831).
* **[local-ccm] Add node-lifecycle-controller component**: Added optional node-lifecycle-controller that automatically deletes unreachable NotReady nodes, solving the "zombie" node problem in autoscaled clusters ([**@IvanHunters**](https://github.com/IvanHunters) in #1992).
* **[tenant] Allow egress to parent ingress pods**: Updated tenant network policies to allow egress traffic to parent cluster ingress pods ([**@lexfrei**](https://github.com/lexfrei) in #1765, #1776).
### New Applications
* **[mongodb] Add MongoDB managed application**: Added MongoDB as a fully managed database with replica sets, persistent storage, and unified user/database configuration ([**@lexfrei**](https://github.com/lexfrei) in #1822; [**@kvaps**](https://github.com/kvaps) in #1923).
* **[qdrant] Add Qdrant vector database**: Added Qdrant as a high-performance vector database for AI/ML workloads with API key authentication and optional LoadBalancer access ([**@lexfrei**](https://github.com/lexfrei) in #1987).
* **[harbor] Add managed Harbor container registry**: Added Harbor v2.14.2 as a managed tenant-level container registry with CloudNativePG, Redis operator, COSI BucketClaim storage, and Trivy scanner ([**@lexfrei**](https://github.com/lexfrei) in #2058).
* **[nats] Add monitoring**: Added Grafana dashboards for NATS JetStream and server metrics, Prometheus monitoring with TLS support ([**@klinch0**](https://github.com/klinch0) in #1381).
* **[mariadb] Rename mysql application to mariadb**: Renamed MySQL application to MariaDB with automatic migration (migration 27) for all existing resources ([**@kvaps**](https://github.com/kvaps) in #2026).
* **[ferretdb] Remove FerretDB application**: Removed FerretDB, superseded by native MongoDB support ([**@kvaps**](https://github.com/kvaps) in #2028).
### Kubernetes and System Components
* **[kubernetes] Update supported Kubernetes versions to v1.30–v1.35**: Updated the tenant Kubernetes version matrix, with v1.35 as the new default. Kamaji updated to edge-26.2.4 and CAPI Kamaji provider to v0.16.0 ([**@lexfrei**](https://github.com/lexfrei) in #2073).
* **[kubernetes] Auto-enable Gateway API support in cert-manager**: Added automatic Gateway API support in cert-manager for tenant clusters ([**@kvaps**](https://github.com/kvaps) in #1997).
* **[kubernetes] Use ingress-nginx nodeport service**: Changed tenant Kubernetes clusters to use ingress-nginx NodePort service for improved compatibility ([**@sircthulhu**](https://github.com/sircthulhu) in #1948).
* **[system] Add cluster-autoscaler for Hetzner and Azure**: Added cluster-autoscaler system package for automatically scaling management cluster nodes on Hetzner and Azure ([**@kvaps**](https://github.com/kvaps) in #1964).
* **[cluster-autoscaler] Enable enforce-node-group-min-size by default**: Ensures node groups are always scaled up to their configured minimum size ([**@kvaps**](https://github.com/kvaps) in #2050).
* **[system] Add clustersecret-operator package**: Added clustersecret-operator for managing secrets across multiple namespaces ([**@sircthulhu**](https://github.com/sircthulhu) in #2025).
### Monitoring
* **[monitoring] Enable monitoring for core components**: Enhanced monitoring capabilities with dashboards and metrics for core Cozystack components ([**@IvanHunters**](https://github.com/IvanHunters) in #1937).
* **[monitoring] Add SLACK_SEVERITY_FILTER and VMAgent for tenant monitoring**: Added SLACK_SEVERITY_FILTER for Slack alert filtering and VMAgent for tenant namespace metrics scraping ([**@IvanHunters**](https://github.com/IvanHunters) in #1712).
* **[monitoring-agents] Fix FQDN resolution for tenant workload clusters**: Fixed monitoring agents in tenant clusters to use full DNS names with cluster domain suffix ([**@IvanHunters**](https://github.com/IvanHunters) in #2075; [**@kvaps**](https://github.com/kvaps) in #2086).
### Storage
* **[linstor] Move CRDs to dedicated piraeus-operator-crds chart**: Moved LINSTOR CRDs to a dedicated chart, ensuring reliable installation of all CRDs including `linstorsatellites.io` ([**@kvaps**](https://github.com/kvaps) in #2036; [**@IvanHunters**](https://github.com/IvanHunters) in #1991).
* **[seaweedfs] Increase certificate duration to 10 years**: Increased SeaweedFS certificate validity to 10 years to reduce rotation overhead ([**@IvanHunters**](https://github.com/IvanHunters) in #1986).
## Improvements
* **[dashboard] Upgrade dashboard to version 1.4.0**: Updated Cozystack dashboard to v1.4.0 with new features and improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #2051).
* **[dashboard] Hide Ingresses/Services/Secrets tabs when no selectors defined**: Tabs are now conditionally shown based on whether the ApplicationDefinition has resource selectors configured, reducing UI clutter ([**@kvaps**](https://github.com/kvaps) in #2087).
* **[dashboard] Add startupProbe to prevent container restarts on slow hardware**: Added startup probe to dashboard pods to prevent unnecessary restarts ([**@kvaps**](https://github.com/kvaps) in #1996).
* **[keycloak] Allow custom Ingress hostname via values**: Added `ingress.host` field to cozy-keycloak chart values for overriding the default `keycloak.<root-host>` hostname ([**@sircthulhu**](https://github.com/sircthulhu) in #2101).
* **[branding] Separate values for Keycloak**: Separated Keycloak branding values for better customization capabilities ([**@nbykov0**](https://github.com/nbykov0) in #1947).
* **[rbac] Use hierarchical naming scheme**: Refactored RBAC to use hierarchical naming for cluster roles and role bindings ([**@lllamnyp**](https://github.com/lllamnyp) in #2019).
* **[tenant,rbac] Use shared clusterroles**: Refactored tenant RBAC to use shared ClusterRoles for improved consistency ([**@lllamnyp**](https://github.com/lllamnyp) in #1999).
* **[kubernetes] Increase default apiServer resourcesPreset to large**: Increased kube-apiserver resource preset to `large` for more reliable operation under higher workloads ([**@kvaps**](https://github.com/kvaps) in #1875).
* **[kubernetes] Increase kube-apiserver startup probe threshold**: Increased startup probe threshold to allow more time for API server readiness ([**@kvaps**](https://github.com/kvaps) in #1876).
* **[etcd] Increase probe thresholds for better recovery**: Increased etcd probe thresholds to improve cluster resilience during temporary slowdowns ([**@kvaps**](https://github.com/kvaps) in #1874).
* **[etcd-operator] Add vertical-pod-autoscaler dependency**: Added VPA as a dependency to etcd-operator for proper resource scaling ([**@sircthulhu**](https://github.com/sircthulhu) in #2047).
* **[cilium] Change cilium-operator replicas to 1**: Reduced Cilium operator replicas to decrease resource consumption in smaller deployments ([**@IvanHunters**](https://github.com/IvanHunters) in #1784).
* **[keycloak-configure,dashboard] Enable insecure TLS verification by default**: Made SSL certificate verification configurable with insecure mode enabled by default for local development ([**@IvanHunters**](https://github.com/IvanHunters) in #2005).
* **[platform] Split telemetry between operator and controller**: Separated telemetry collection for better metrics isolation ([**@kvaps**](https://github.com/kvaps) in #1869).
* **[system] Add resource requests and limits to etcd-defrag**: Added resource requests and limits to etcd-defrag job to prevent resource contention ([**@matthieu-robin**](https://github.com/matthieu-robin) in #1785, #1786).
## Fixes
* **[dashboard] Fix sidebar visibility on cluster-level pages**: Fixed broken URLs with double `//` on cluster-level pages by hiding namespace-scoped sidebar items when no tenant is selected ([**@sircthulhu**](https://github.com/sircthulhu) in #2106).
* **[platform] Fix upgrade issues in migrations, etcd timeout, and migration script**: Fixed multiple upgrade failures discovered during v0.41.1 → v1.0 upgrade testing, including migration 26-29 fixes, RFC3339 format for annotations, and extended etcd HelmRelease timeout to 30m ([**@kvaps**](https://github.com/kvaps) in #2096).
* **[platform] Fix orphaned -rd HelmReleases after application renames**: Migrations 28-29 updated to remove orphaned `-rd` HelmReleases in `cozy-system` after `ferretdb→mongodb`, `mysql→mariadb`, and `virtual-machine→vm-disk+vm-instance` renames, with migration 33 as a safety net ([**@kvaps**](https://github.com/kvaps) in #2102).
* **[platform] Adopt tenant-root into cozystack-basics during migration**: Added migration 31 to adopt existing `tenant-root` Namespace and HelmRelease into `cozystack-basics` for a safe v0.41.x → v1.0 upgrade path ([**@kvaps**](https://github.com/kvaps) in #2065).
* **[platform] Preserve tenant-root HelmRelease during migration**: Fixed data-loss risk during migration where `tenant-root` HelmRelease could be deleted ([**@sircthulhu**](https://github.com/sircthulhu) in #2063).
* **[platform] Fix cozystack-values secret race condition**: Fixed race condition in cozystack-values secret creation that could cause initialization failures ([**@lllamnyp**](https://github.com/lllamnyp) in #2024).
* **[cozystack-basics] Preserve existing HelmRelease values during reconciliations**: Fixed data-loss bug where changes to `tenant-root` HelmRelease were dropped on the next reconciliation ([**@sircthulhu**](https://github.com/sircthulhu) in #2068).
* **[cozystack-basics] Deny resourcequotas deletion for tenant admin**: Fixed `cozy:tenant:admin:base` ClusterRole to explicitly deny deletion of ResourceQuota objects ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2076).
* **[dashboard] Fix legacy templating and cluster identifier in sidebar links**: Standardized the cluster identifier across dashboard menu links, resolving broken link targets for Backups and External IPs ([**@androndo**](https://github.com/androndo) in #2093).
* **[dashboard] Fix backupjobs creation form and sidebar backup category identifier**: Fixed backup job creation form fields and the sidebar backup category identifier ([**@androndo**](https://github.com/androndo) in #2103).
* **[kubevirt] Update KubeVirt to v1.6.4 and CDI to v1.64.0, fix VM pod initialization**: Updated KubeVirt and CDI and disabled serial console logging globally to fix the `guest-console-log` init container blocking virt-launcher pods ([**@nbykov0**](https://github.com/nbykov0) in #1833; [**@kvaps**](https://github.com/kvaps)).
* **[linstor] Fix DRBD+LUKS+STORAGE resource creation failure**: Applied upstream fix for all newly created encrypted volumes failing due to missing `setExists(true)` call in `LuksLayer` ([**@kvaps**](https://github.com/kvaps) in #2072).
* **[platform] Clean up Helm secrets for removed releases**: Added cleanup logic to migration 23 to remove orphaned Helm secrets from removed `-rd` releases ([**@kvaps**](https://github.com/kvaps) in #2035).
* **[monitoring] Fix YAML parse error in vmagent template**: Fixed YAML parsing error in monitoring-agents vmagent template ([**@kvaps**](https://github.com/kvaps) in #2037).
* **[monitoring] Remove cozystack-controller dependency**: Fixed monitoring package to remove unnecessary cozystack-controller dependency ([**@IvanHunters**](https://github.com/IvanHunters) in #1990).
* **[monitoring] Remove duplicate dashboards.list**: Fixed duplicate dashboards.list configuration in extra/monitoring package ([**@IvanHunters**](https://github.com/IvanHunters) in #2016).
* **[linstor] Update piraeus-server patches with critical fixes**: Backported critical patches fixing edge cases in device management and DRBD resource handling ([**@kvaps**](https://github.com/kvaps) in #1850).
* **[apiserver] Fix Watch resourceVersion and bookmark handling**: Fixed Watch API handling of resourceVersion and bookmarks for proper event streaming ([**@kvaps**](https://github.com/kvaps) in #1860).
* **[bootbox] Auto-create bootbox-application as dependency**: Fixed bootbox package to automatically create required bootbox-application dependency ([**@kvaps**](https://github.com/kvaps) in #1974).
* **[postgres-operator] Correct PromQL syntax in CNPGClusterOffline alert**: Fixed incorrect PromQL syntax in the CNPGClusterOffline Prometheus alert ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1981).
* **[coredns] Fix serviceaccount to match kubernetes bootstrap RBAC**: Fixed CoreDNS service account to correctly match Kubernetes bootstrap RBAC requirements ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1958).
* **[dashboard] Verify JWT token**: Added JWT token verification to dashboard for improved security ([**@lllamnyp**](https://github.com/lllamnyp) in #1980).
* **[codegen] Fix missing gen_client in update-codegen.sh**: Fixed build error in `pkg/generated/applyconfiguration/utils.go` by including `gen_client` in the codegen script ([**@lexfrei**](https://github.com/lexfrei) in #2061).
* **[kubevirt-operator] Fix typo in VMNotRunningFor10Minutes alert**: Fixed typo in VM alert name ensuring proper alert triggering ([**@lexfrei**](https://github.com/lexfrei) in #1770, #1775).
## Security
* **[dashboard] Verify JWT token**: Added JWT token verification to the dashboard for improved authentication security ([**@lllamnyp**](https://github.com/lllamnyp) in #1980).
## Dependencies
* **[cilium] Update to v1.18.6**: Updated Cilium CNI to v1.18.6 with security fixes and performance improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #1868).
* **[kube-ovn] Update to v1.15.3**: Updated Kube-OVN CNI to v1.15.3 with performance improvements and bug fixes ([**@kvaps**](https://github.com/kvaps) in #2022).
* **[kilo] Update to v0.8.0**: Updated Kilo WireGuard mesh to v0.8.0 with performance improvements and new compatibility features ([**@kvaps**](https://github.com/kvaps) in #2053).
* **Update Talos Linux to v1.12.1**: Updated Talos Linux to v1.12.1 with latest features and security patches ([**@kvaps**](https://github.com/kvaps) in #1877).
## System Configuration
* **[vpc] Migrate subnets definition from map to array format**: Migrated VPC subnets from `map[string]Subnet` to `[]Subnet` with an explicit `name` field; migration 30 converts existing configurations automatically ([**@kvaps**](https://github.com/kvaps) in #2052). A before/after sketch follows this list.
* **[migrations] Add migrations 23-33 for v1.0 upgrade path**: Added 11 incremental migrations handling CRD ownership, resource renaming, secret cleanup, Helm adoption, and configuration conversion for the v0.41.x → v1.0.0 upgrade path ([**@kvaps**](https://github.com/kvaps) in #1975, #2035, #2036, #2040, #2026, #2065, #2052, #2102).
* **[tenant] Run cleanup job from system namespace**: Moved tenant cleanup job to system namespace for improved security and resource isolation ([**@lllamnyp**](https://github.com/lllamnyp) in #1774, #1777).
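As a rough illustration of the subnet format change above, only the `name` field is taken from the entry; the `cidr` key is a hypothetical placeholder for whatever fields a subnet definition actually carries:

```yaml
# Old format: map keyed by subnet name
subnets:
  default:
    cidr: 10.100.0.0/24   # hypothetical subnet field

# New format: array with an explicit name field (migration 30 converts this automatically)
subnets:
  - name: default
    cidr: 10.100.0.0/24   # hypothetical subnet field
```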
## Development, Testing, and CI/CD
* **[ci] Use GitHub Copilot CLI for changelog generation**: Automated changelog generation using GitHub Copilot CLI ([**@androndo**](https://github.com/androndo) in #1753).
* **[ci] Choose runner conditional on label**: Added conditional runner selection in CI based on PR labels ([**@lllamnyp**](https://github.com/lllamnyp) in #1998).
* **[e2e] Use helm install instead of kubectl apply for cozystack installation**: Replaced static YAML apply flow with direct `helm upgrade --install` of the installer chart in E2E tests ([**@lexfrei**](https://github.com/lexfrei) in #2060).
* **[e2e] Make kubernetes test retries effective by cleaning up stale resources**: Fixed E2E test retries by adding pre-creation cleanup and increasing deployment wait timeout to 300s ([**@lexfrei**](https://github.com/lexfrei) in #2062).
* **[e2e] Increase HelmRelease readiness timeout for kubernetes test**: Increased HelmRelease readiness timeout to prevent false failures on slower hardware ([**@lexfrei**](https://github.com/lexfrei) in #2033).
* **[ci] Improve cozyreport functionality**: Enhanced cozyreport tool with improved reporting for CI/CD pipelines ([**@lllamnyp**](https://github.com/lllamnyp) in #2032).
* **[cozypkg] Add cross-platform build targets with version injection**: Added cross-platform build targets for the cozypkg/cozyhr tool covering linux/amd64, linux/arm64, darwin/amd64, and darwin/arm64 ([**@kvaps**](https://github.com/kvaps) in #1862).
* **Move scripts to hack directory**: Reorganized scripts to the standard `hack/` location ([**@kvaps**](https://github.com/kvaps) in #1863).
* **Update CODEOWNERS**: Updated CODEOWNERS to include new maintainers ([**@lllamnyp**](https://github.com/lllamnyp) in #1972; [**@IvanHunters**](https://github.com/IvanHunters) in #2015).
* **[talm] Skip config loading for completion subcommands**: Fixed talm CLI to skip config loading for shell completion commands ([**@kitsunoff**](https://github.com/kitsunoff) in cozystack/talm#109).
* **[talm] Fix metadata.id type casting in physical_links_info**: Fixed Prometheus query to properly cast metadata.id to string for regexMatch operations ([**@kvaps**](https://github.com/kvaps) in cozystack/talm#110).
## Documentation
* **[website] Add documentation versioning**: Implemented comprehensive documentation versioning with separate v0 and v1 documentation trees and a version selector in the UI ([**@IvanStukov**](https://github.com/IvanStukov) in cozystack/website#415).
* **[website] Describe upgrade to v1.0**: Added detailed upgrade instructions for migrating from v0.x to v1.0 ([**@nbykov0**](https://github.com/nbykov0) in cozystack/website@21bbe84).
* **[website] Migrate ConfigMap references to Platform Package in v1 docs**: Updated entire v1 documentation to replace legacy ConfigMap-based configuration with the new Platform Package API ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#426).
* **[website] Add generic Kubernetes deployment guide for v1**: Added installation guide for deploying Cozystack on any generic Kubernetes cluster ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#408).
* **[website] Describe operator-based and HelmRelease-based package patterns**: Added development documentation explaining operator-based and HelmRelease-based package patterns ([**@kvaps**](https://github.com/kvaps) in cozystack/website#413).
* **[website] Add Helm chart development principles guide**: Added developer guide documenting Cozystack's four core Helm chart principles ([**@kvaps**](https://github.com/kvaps) in cozystack/website#418).
* **[website] Add network architecture overview**: Added comprehensive network architecture documentation covering the multi-layered networking stack with Mermaid diagrams ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#422).
* **[website] Add LINSTOR disk preparation guide**: Added comprehensive documentation for preparing disks for LINSTOR storage ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#411).
* **[website] Add Proxmox VM migration guide**: Added detailed guide for migrating virtual machines from Proxmox to Cozystack ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#410).
* **[website] Add cluster autoscaler documentation**: Added documentation for Hetzner setup with Talos, vSwitch, and Kilo mesh integration ([**@kvaps**](https://github.com/kvaps) in #1964).
* **[website] Improve Azure autoscaling troubleshooting guide**: Enhanced Azure autoscaling documentation with serial console instructions and `az vmss update --custom-data` guidance ([**@kvaps**](https://github.com/kvaps) in cozystack/website#424).
* **[website] Update multi-location documentation for cilium-kilo variant**: Updated multi-location networking docs to reflect the integrated `cilium-kilo` variant selection ([**@kvaps**](https://github.com/kvaps) in cozystack/website@02d63f0).
* **[website] Update documentation to use jsonpatch for service exposure**: Improved `kubectl patch` commands to use JSON Patch `add` operations ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#427).
* **[website] Update certificates section in Platform Package documentation**: Updated certificate configuration docs to reflect new `solver` and `issuerName` fields ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in cozystack/website#429).
* **[website] Add tenant Kubernetes cluster log querying guide**: Added documentation for querying logs from tenant clusters in Grafana using VictoriaLogs labels ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#430).
* **[website] Replace non-idempotent commands with idempotent alternatives**: Updated `helm install` to `helm upgrade --install` and `kubectl create` to `kubectl apply` across all installation guides ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#431).
* **[website] Fix broken documentation links with .md suffix**: Fixed incorrect internal links across virtualization guides for v0 and v1 documentation ([**@cheese**](https://github.com/cheese) in cozystack/website#432).
* **[website] Refactor resource planning documentation**: Improved resource planning guide with clearer structure and more comprehensive coverage ([**@IvanStukov**](https://github.com/IvanStukov) in cozystack/website#423).
* **[website] Add ServiceAccount API access documentation and update FAQ**: Added documentation for ServiceAccount API access token configuration and updated FAQ ([**@IvanStukov**](https://github.com/IvanStukov) in cozystack/website#421).
* **[website] Update networking-mesh allowed-location-ips example**: Replaced provider-specific CLI with standard `kubectl` commands in multi-location networking guide ([**@kvaps**](https://github.com/kvaps) in cozystack/website#425).
* **[website] Add Hetzner RobotLB documentation**: Added documentation for configuring public IP with Hetzner RobotLB ([**@kvaps**](https://github.com/kvaps) in cozystack/website#394).
* **[website] Add documentation for creating and managing cloned VMs**: Added comprehensive guide for VM cloning operations ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#401).
* **[website] Update Talos installation docs for Hetzner and Servers.com**: Updated installation documentation for Hetzner and Servers.com environments ([**@kvaps**](https://github.com/kvaps) in cozystack/website#395).
* **[website] Add Hidora organization support details**: Added Hidora to the support page ([**@matthieu-robin**](https://github.com/matthieu-robin) in cozystack/website#397, cozystack/website#398).
* **[website] Check quotas before an upgrade**: Added troubleshooting documentation for checking resource quotas before upgrades ([**@nbykov0**](https://github.com/nbykov0) in cozystack/website#405).
* **[website] Update support documentation**: Updated support documentation with current contact information ([**@xrmtech-isk**](https://github.com/xrmtech-isk) in cozystack/website#420).
* **[website] Correct typo in kubeconfig reference in Kubernetes installation guide**: Fixed documentation typo in kubeconfig reference ([**@shkarface**](https://github.com/shkarface) in cozystack/website#414).
## Breaking Changes & Upgrade Notes
* **[api] CozystackResourceDefinition renamed to ApplicationDefinition**: The `CozystackResourceDefinition` CRD has been renamed to `ApplicationDefinition`. Migration 24 handles the transition automatically during upgrade ([**@kvaps**](https://github.com/kvaps) in #1864).
* **[platform] Certificate issuer configuration parameters renamed**: The `publishing.certificates.issuerType` field is renamed to `publishing.certificates.solver`, and the value `cloudflare` is renamed to `dns01`. A new `publishing.certificates.issuerName` field (default: `letsencrypt-prod`) is added. Migration 32 automatically converts existing configurations — no manual action required ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2077). A before/after values sketch follows this list.
* **[vpc] VPC subnets definition migrated from map to array format**: VPC subnets are now defined as `[]Subnet` with an explicit `name` field instead of `map[string]Subnet`. Migration 30 handles the conversion automatically ([**@kvaps**](https://github.com/kvaps) in #2052).
* **[vm] virtual-machine application replaced by vm-disk and vm-instance**: The legacy `virtual-machine` application has been fully replaced. Migration 28 automatically converts existing VMs to the new architecture ([**@kvaps**](https://github.com/kvaps) in #2040).
* **[mysql] mysql application renamed to mariadb**: Existing MySQL deployments are automatically renamed to MariaDB via migration 27 ([**@kvaps**](https://github.com/kvaps) in #2026).
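A before/after sketch of the certificate issuer rename described above, using only the field names and values given in that entry:

```yaml
# v0.41.x
publishing:
  certificates:
    issuerType: cloudflare

# v1.0.0 (migration 32 performs this conversion automatically)
publishing:
  certificates:
    solver: dns01
    issuerName: letsencrypt-prod
```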
### Upgrade Guide
To upgrade from v0.41.x to v1.0.0:
1. **Back up your cluster** before upgrading.
2. Run the provided migration script: `hack/migrate-to-version-1.0.sh`.
3. The 33 incremental migration steps will automatically handle all resource renaming, configuration conversion, CRD adoption, and secret cleanup.
4. Refer to the [upgrade documentation](https://cozystack.io/docs/v1/upgrade) for detailed instructions and troubleshooting.
## Contributors
We'd like to thank all contributors who made this release possible:
* **[platform] Prevent cozystack-version ConfigMap from deletion**: Added resource protection to prevent the `cozystack-version` ConfigMap from being accidentally deleted, improving platform stability and reliability ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2112, #2114).
* **[installer] Add keep annotation to Namespace and update migration script**: Added `helm.sh/resource-policy: keep` annotation to the `cozy-system` Namespace in the installer Helm chart to prevent Helm from deleting the namespace (and all HelmReleases within it) when the installer release is removed. The v1.0 migration script is also updated to annotate the `cozy-system` namespace and `cozystack-version` ConfigMap with this policy before migration ([**@kvaps**](https://github.com/kvaps) in #2122, #2123).
* **[dashboard] Add FlowSchema to exempt BFF from API throttling**: Added a `cozy-dashboard-exempt` FlowSchema to exempt the dashboard Back-End-for-Frontend (BFF) service account from Kubernetes API Priority and Fairness throttling. Previously, the BFF fell under the `workload-low` priority level, causing 429 (Too Many Requests) errors under load, resulting in dashboard unresponsiveness ([**@kvaps**](https://github.com/kvaps) in #2121, #2124).
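A minimal sketch of such a FlowSchema, assuming the BFF runs under a `cozy-dashboard` ServiceAccount in the `cozy-dashboard` namespace; the actual subject and matching rules in #2121 may differ:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: cozy-dashboard-exempt
spec:
  # Route BFF requests to the exempt priority level instead of workload-low
  priorityLevelConfiguration:
    name: exempt
  matchingPrecedence: 1000
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: cozy-dashboard        # assumed ServiceAccount name
            namespace: cozy-dashboard   # assumed namespace
      resourceRules:
        - verbs: ["*"]
          apiGroups: ["*"]
          resources: ["*"]
          namespaces: ["*"]
          clusterScope: true
```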
## Documentation
* **[website] Replace bundles documentation with variants**: Renamed the "Bundles" documentation section to "Variants" to match current Cozystack terminology. Removed deprecated variants (`iaas-full`, `distro-full`, `distro-hosted`) and added new variants: `default` (PackageSources only, for manual package management via cozypkg) and `isp-full-generic` (full PaaS/IaaS on k3s, kubeadm, or RKE2). Updated all cross-references throughout the documentation ([**@kvaps**](https://github.com/kvaps) in cozystack/website#433).
* **[website] Add step to protect namespace before upgrading**: Updated the cluster upgrade guide and v0.41→v1.0 migration guide with a required step to annotate the `cozy-system` namespace and `cozystack-version` ConfigMap with `helm.sh/resource-policy=keep` before running `helm upgrade`, preventing accidental namespace deletion ([**@kvaps**](https://github.com/kvaps) in cozystack/website#435).
* **[platform] Suspend cozy-proxy if it conflicts with installer release during migration**: Added a check in the v0.41→v1.0 migration script to detect and automatically suspend the `cozy-proxy` HelmRelease when its `releaseName` is set to `cozystack`, which conflicts with the installer release and would cause `cozystack-operator` deletion during the upgrade ([**@kvaps**](https://github.com/kvaps) in #2128, #2130).
* **[platform] Fix off-by-one error in run-migrations script**: Fixed a bug in the migration runner where the first required migration was always skipped due to an off-by-one error in the migration range calculation, ensuring all upgrade steps execute correctly ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2126, #2132).
* **[system] Fix Keycloak proxy configuration for v26.x**: Replaced the deprecated `KC_PROXY=edge` environment variable with `KC_PROXY_HEADERS=xforwarded` and `KC_HTTP_ENABLED=true` in the Keycloak StatefulSet template. `KC_PROXY` was removed in Keycloak 26.x, previously causing "Non-secure context detected" warnings and broken cookie handling when running behind a reverse proxy with TLS termination ([**@sircthulhu**](https://github.com/sircthulhu) in #2125, #2134). A minimal env sketch follows this list.
* **[dashboard] Allow clearing instanceType field and preserve newlines in secret copy**: Added `allowEmpty: true` to the `instanceType` field in the VMInstance form so users can explicitly clear it to use custom KubeVirt resources without a named instance type. Also fixed newline preservation when copying secrets with CMD+C ([**@sircthulhu**](https://github.com/sircthulhu) in #2135, #2137).
* **[dashboard] Restore stock-instance sidebars for namespace-level pages**: Restored `stock-instance-api-form`, `stock-instance-api-table`, `stock-instance-builtin-form`, and `stock-instance-builtin-table` sidebar resources that were inadvertently removed in #2106. Without these sidebars, namespace-level pages such as Backup Plans rendered as empty pages with no interactive content ([**@sircthulhu**](https://github.com/sircthulhu) in #2136, #2138).
* **[platform] Fix package name conversion in migration script**: Fixed the `migrate-to-version-1.0.sh` script to correctly prepend the `cozystack.` prefix when converting `BUNDLE_DISABLE` and `BUNDLE_ENABLE` package name lists, ensuring packages are properly identified during the v0.41→v1.0 upgrade ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2144, #2148).
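The Keycloak proxy fix above boils down to a small env change on the StatefulSet container; a sketch using only the variables named in that entry:

```yaml
env:
  # - name: KC_PROXY          # removed in Keycloak 26.x
  #   value: edge
  - name: KC_PROXY_HEADERS
    value: xforwarded
  - name: KC_HTTP_ENABLED
    value: "true"
```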
## Documentation
* **[website] Add white labeling guide**: Added a comprehensive guide for configuring white labeling (branding) in Cozystack v1, covering Dashboard fields (`titleText`, `footerText`, `tenantText`, `logoText`, `logoSvg`, `iconSvg`) and Keycloak fields (`brandName`, `brandHtmlName`). Includes SVG preparation workflow with theme-aware template variables, portable base64 encoding, and migration notes from the v0 ConfigMap approach ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#441).
* **[website] Actualize backup and recovery documentation**: Reworked the backup and recovery docs to be user-focused, separating operator and tenant workflows. Added tenant-facing documentation for `BackupJob` and `Plan` resources and status inspection commands, and added a new Velero administration guide for operators covering storage credentials and backup storage configuration ([**@androndo**](https://github.com/androndo) in cozystack/website#434).
* **[system] Fix Keycloak probe crashloop with management port health endpoints**: Fixed a crashloop where Keycloak 26.x restarted endlessly because liveness and readiness probes sent HTTP requests to port 8080. Keycloak 26.x redirects all requests on port 8080 to `KC_HOSTNAME` (HTTPS), and since kubelet does not follow redirects, the probes failed and eventually triggered container restarts. The fix enables the health endpoints via `KC_HEALTH_ENABLED=true`, exposes the dedicated management port 9000, switches the probes to `/health/live` and `/health/ready` on that port, and adds a `startupProbe` with appropriate failure thresholds for better startup tolerance ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #2162, #2178). A probe configuration sketch follows this list.
* **[system] Fix etcd-operator deprecated kube-rbac-proxy image**: Replaced the deprecated `gcr.io/kubebuilder/kube-rbac-proxy:v0.16.0` image with `quay.io/brancz/kube-rbac-proxy:v0.18.1` in the vendored etcd-operator chart. The GCR-hosted image became unavailable after March 18, 2025, causing etcd-operator pods to fail on image pull ([**@kvaps**](https://github.com/kvaps) in #2181, #2183).
* **[platform] Fix VM MAC address not preserved during virtual-machine to vm-instance migration**: During the `virtual-machine` → `vm-instance` migration (script 29), VM MAC addresses were not preserved. Kube-OVN reads MAC addresses exclusively from the pod annotation `ovn.kubernetes.io/mac_address`, not from `spec.macAddress` of the IP resource. Without this annotation, migrated VMs received a new random MAC address, breaking OS-level network configuration that matches by MAC (e.g., netplan). The fix adds a Helm `lookup` in the vm-instance chart template to read the Kube-OVN IP resource and automatically inject the MAC and IP addresses as pod annotations ([**@sircthulhu**](https://github.com/sircthulhu) in #2169, #2191).
* **[dashboard] Fix External IPs page showing empty rows**: Fixed the External IPs administration page displaying empty rows instead of service data. The `EnrichedTable` configuration in the `external-ips` factory was using incorrect property names — replaced `clusterNamePartOfUrl` with `cluster` and changed `pathToItems` from array format to dot-path string format, matching the convention used by all other `EnrichedTable` instances ([**@IvanHunters**](https://github.com/IvanHunters) in #2175, #2192).
* **[dashboard] Fix disabled/hidden state reset on MarketplacePanel reconciliation**: Fixed a bug where the dashboard controller was hardcoding `disabled=false` and `hidden=false` on every reconcile loop, overwriting changes made through the dashboard UI. Services disabled or hidden via the marketplace panel now correctly retain their state after controller reconciliation ([**@IvanHunters**](https://github.com/IvanHunters) in #2176, #2202).
* **[dashboard] Fix hidden MarketplacePanel resources appearing in sidebar menu**: Fixed the sidebar navigation showing all resources regardless of their MarketplacePanel `hidden` state. The controller now fetches MarketplacePanels during sidebar reconciliation and filters out resources where `hidden=true`, ensuring that hiding a resource from the marketplace also removes it from the sidebar navigation. Listing failures are non-fatal — if the configuration fetch fails, no hiding is applied and the dashboard remains functional ([**@IvanHunters**](https://github.com/IvanHunters) in #2177, #2204).
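A sketch of the probe configuration described in the Keycloak crashloop fix above; the endpoints, port, and `KC_HEALTH_ENABLED` variable come from that entry, while the startup probe path and threshold values are assumptions:

```yaml
env:
  - name: KC_HEALTH_ENABLED
    value: "true"
ports:
  - name: management
    containerPort: 9000
livenessProbe:
  httpGet:
    path: /health/live
    port: 9000
readinessProbe:
  httpGet:
    path: /health/ready
    port: 9000
startupProbe:
  httpGet:
    path: /health/ready    # assumed path
    port: 9000
  failureThreshold: 60      # assumed threshold
  periodSeconds: 5          # assumed period
```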
## Documentation
* **[website] Add OIDC self-signed certificates configuration guide**: Added a comprehensive guide for configuring OIDC authentication with Keycloak when using self-signed certificates (the default in Cozystack). Covers Talos machine configuration with certificate mounting and host entries, kubelogin setup instructions, and a troubleshooting section. The guide is available for both v0 and v1 versioned documentation paths ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#443).
* **[api] Fix spurious OpenAPI post-processing errors for non-apps group versions**: The API server no longer logs false errors while generating OpenAPI specs for core and other non-`apps.cozystack.io` group versions. The post-processor now exits early when the base `Application` schemas are absent, reducing noisy startup logs without affecting application schema generation ([**@kvaps**](https://github.com/kvaps) in #2212, #2216).
## Documentation
* **[website] Add `DependenciesNotReady` troubleshooting and correct packages management build target**: Added a troubleshooting guide for packages stuck in `DependenciesNotReady`, including how to inspect operator logs and identify missing dependencies, and fixed the outdated `make image-cozystack` command to `make image-packages` in the packages management guide ([**@kvaps**](https://github.com/kvaps) in cozystack/website#450).
* **[website] Clarify operator-first installation order**: Reordered the platform installation guide and tutorial so users install the Cozystack operator before preparing and applying the Platform Package, matching the rest of the installation docs and reducing setup confusion during fresh installs ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#449).
* **[website] Add automated installation guide for Ansible**: Added end-to-end documentation for deploying Cozystack with the `cozystack.installer` Ansible collection, including inventory examples, distro-specific playbooks, configuration reference, verification steps, and explicit version pinning guidance to help operators automate installs safely ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#442).
* **[website] Expand CA rotation operations guide**: Completed the CA rotation documentation with separate Talos and Kubernetes certificate rotation procedures, dry-run preview steps, and post-rotation guidance for fetching updated `talosconfig` and `kubeconfig` files after certificate changes ([**@kvaps**](https://github.com/kvaps) in cozystack/website#406).
* **[website] Improve backup operations documentation**: Enhanced the operator backup and recovery guide with clearer Velero enablement steps, concrete provider and bucket examples, and more useful commands for inspecting backups, schedules, restores, CRD status, and logs ([**@androndo**](https://github.com/androndo) in cozystack/website#440).
* **[website] Add custom metrics collection guide**: Added a monitoring guide showing how tenants can expose their own Prometheus exporters through `VMServiceScrape` and `VMPodScrape`, including namespace labeling requirements, example manifests, verification steps, and troubleshooting advice ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#444).
* **[website] Document PackageSource and Package architecture**: Added a Key Concepts reference covering `PackageSource` and `Package` reconciliation flow, dependency handling, update propagation, rollback behavior, FluxPlunger recovery, and the `cozypkg` CLI for package management ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#445).
* **[website] Refresh v1 application and platform documentation**: Fixed the documentation auto-update flow and published a broad v1 documentation refresh covering newly documented applications, updated naming and navigation, virtualization and platform content updates, and reorganized versioned docs pages ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in cozystack/website#439).
Cozystack v1.1.0 delivers a major expansion of the managed application catalog with **OpenBAO** (open-source HashiCorp Vault fork) for secrets management, comprehensive **tiered object storage** with SeaweedFS storage pools, a new bucket **user model** with per-user credentials and S3 login support, **RabbitMQ version selection**, and **MongoDB Grafana dashboards**. The dashboard gains storageClass dropdowns for all stateful apps. This release also incorporates all fixes from the v1.0.x patch series.
## Feature Highlights
### OpenBAO: Managed Secrets Management Service
Cozystack now ships **OpenBAO** as a fully managed PaaS application — an open-source fork of HashiCorp Vault providing enterprise-grade secrets management. Users can deploy OpenBAO instances in standalone mode (single replica with file storage) or in high-availability Raft mode (multiple replicas with integrated Raft consensus), with the mode switching automatically based on the `replicas` field.
Each OpenBAO instance gets TLS enabled by default via cert-manager self-signed certificates, with DNS SANs covering all service endpoints and pod addresses. The Vault injector and CSI provider are intentionally disabled (they are cluster-scoped components not safe for per-tenant use). OpenBAO requires manual initialization and unsealing by design — no auto-unseal is configured.
A full end-to-end E2E test covers the complete lifecycle: deploy, wait for certificate and API readiness, init, unseal, verify, and cleanup. OpenBAO is available in the application catalog for tenant namespaces.
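A minimal values sketch for the two deployment modes; `replicas`, `size`, `storageClass`, and `resourcesPreset` follow the application's documented parameters, and anything beyond that should be treated as an assumption rather than the chart's authoritative defaults:

```yaml
# Standalone mode: single replica with file storage
replicas: 1
size: 10Gi
storageClass: ""
resourcesPreset: small

# HA mode: Raft consensus is enabled automatically when replicas > 1
# replicas: 3
```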
### SeaweedFS Tiered Storage Pools
SeaweedFS now supports **tiered storage pools** — operators can define separate storage pools per disk type (SSD, HDD, NVMe) in the `volume.pools` field (Simple topology) or `volume.zones[name].pools` (MultiZone topology). Each pool creates an additional Volume StatefulSet alongside the default one, with SeaweedFS distinguishing storage via the `-disk=<type>` flag on volume servers.
Each pool automatically generates its own set of COSI resources: a standard `BucketClass`, a `-lock` BucketClass (COMPLIANCE mode, 365-day retention), a read-write `BucketAccessClass`, and a `-readonly` BucketAccessClass. This allows applications to place data on specific storage tiers and request appropriate access policies per pool.
In MultiZone topology, pools are defined per zone and each zone × pool combination creates a dedicated StatefulSet (e.g., `us-east-ssd`, `us-west-hdd`), with nodes selected via `topology.kubernetes.io/zone` labels. Existing deployments with no pools defined produce output identical to previous versions — no migration is required.
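A hypothetical values sketch of the pool layout described above; `volume.pools` and `volume.zones.<name>.pools` are the documented entry points, while the fields inside each pool entry are illustrative assumptions:

```yaml
# Simple topology: each pool adds a Volume StatefulSet for its disk type
volume:
  pools:
    - name: ssd          # assumed pool-entry fields
      diskType: ssd
    - name: hdd
      diskType: hdd

# MultiZone topology: pools are defined per zone (e.g. producing "us-east-ssd")
# volume:
#   zones:
#     us-east:
#       pools:
#         - name: ssd
#           diskType: ssd
```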
### Bucket User Model with S3 Login
The bucket application introduces a new **user model** for access management. Instead of a single implicit BucketAccess resource, operators now define a `users` map where each entry creates a dedicated `BucketAccess` with its own credentials secret and an optional `readonly` flag. The S3 Manager UI has been updated with a login screen that uses per-session credentials from the user's own secret, replacing the previous basic-auth approach.
Two new bucket parameters are available: `locking` provisions from the `-lock` BucketClass (COMPLIANCE mode, 365-day object lock retention) for write-once-read-many use cases, and `storagePool` selects a specific pool's BucketClass for tiered storage placement. The COSI driver has been updated to v0.3.0 to support the new `diskType` parameter.
**⚠️ Breaking change**: The implicit default BucketAccess resource is no longer created. Existing buckets that relied on the single auto-generated BucketAccess will need to explicitly define users in the `users` map after upgrading.
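An illustrative sketch of the new bucket parameters; `users`, `readonly`, `locking`, and `storagePool` are the names given above, while the exact shape of each user entry is an assumption:

```yaml
users:
  app: {}              # read-write BucketAccess and credentials secret
  auditor:
    readonly: true     # credentials bound to the readonly BucketAccessClass
locking: true          # provision from the -lock BucketClass (COMPLIANCE, 365-day retention)
storagePool: ssd       # place the bucket on a specific storage tier
```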
### RabbitMQ Version Selection
RabbitMQ instances now support a configurable **version selector** (`version` field with values: `v4.2`, `v4.1`, `v4.0`, `v3.13`; default `v4.2`). The chart validates the selection at deploy time and uses it to pin the runtime image, giving operators control over the RabbitMQ release channel per instance. An automatic migration backfills the `version` field on all existing RabbitMQ resources to `v4.2`.
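In values terms this is a single field; a minimal sketch:

```yaml
# Pin the RabbitMQ release channel for this instance (allowed: v4.2, v4.1, v4.0, v3.13)
version: v4.2
```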
## Major Features and Improvements
* **[apps] Add OpenBAO as a managed secrets management service**: Deployed as a PaaS application with standalone (file storage) and HA Raft modes, TLS enabled by default via cert-manager, injector and CSI provider disabled for tenant safety, and a full E2E lifecycle test ([**@lexfrei**](https://github.com/lexfrei) in #2059).
* **[seaweedfs] Add storage pools support for tiered storage**: Added `volume.pools` (Simple) and `volume.zones[name].pools` (MultiZone) for per-disk-type StatefulSets, zone overrides (`nodeSelector`, `storageClass`, `dataCenter`), per-pool COSI BucketClass and BucketAccessClass resources, and bumped seaweedfs-cosi-driver to v0.3.0 ([**@sircthulhu**](https://github.com/sircthulhu) in #2097).
* **[apps][system] Add bucket user model with locking and storage pool selection**: Replaced implicit BucketAccess with per-user `users` map, added `locking` and `storagePool` parameters, renamed COSI BucketClass suffix from `-worm` to `-lock`, added `-readonly` BucketAccessClass for all topologies, and updated S3 Manager with login screen using per-user credentials ([**@IvanHunters**](https://github.com/IvanHunters) in #2119).
* **[rabbitmq] Add version selection for RabbitMQ instances**: Added `version` field (`v4.2`, `v4.1`, `v4.0`, `v3.13`) with chart-level validation, default `v4.2`, and an automatic migration to backfill the field on existing instances ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2092).
* **[system] Add MongoDB Overview and InMemory Details Grafana dashboards**: Added two comprehensive Grafana dashboards for MongoDB monitoring — Overview (command operations, connections, cursors, query efficiency, write time) and InMemory Details (WiredTiger cache, transactions, concurrency, eviction). Dashboards are registered in `dashboards.list` for automatic GrafanaDashboard CRD generation ([**@IvanHunters**](https://github.com/IvanHunters) in #2158).
* **[dashboard] Add storageClass dropdown for all stateful apps**: Replaced the free-text `storageClass` input with an API-backed dropdown listing available StorageClasses from the cluster. Affects ClickHouse, Harbor, HTTPCache, Kubernetes, MariaDB, MongoDB, NATS, OpenBAO, Postgres, Qdrant, RabbitMQ, Redis, VMDisk (top-level `storageClass`), FoundationDB (`storage.storageClass`), and Kafka (`kafka.storageClass`, `zookeeper.storageClass`) ([**@sircthulhu**](https://github.com/sircthulhu) in #2131).
* **[bucket] Add readonly S3 access credentials**: Added a readonly `BucketAccessClass` to the SeaweedFS COSI chart and updated the bucket application to automatically provision two sets of S3 credentials per bucket: read-write (for UI) and readonly ([**@IvanHunters**](https://github.com/IvanHunters) in #2105).
* **[dashboard] Hide sidebar on cluster-level pages when no tenant selected**: Fixed broken URLs with double `//` on the main cluster page (before tenant selection) by clearing `CUSTOMIZATION_SIDEBAR_FALLBACK_ID` so no sidebar renders when no namespace is selected ([**@sircthulhu**](https://github.com/sircthulhu) in #2106).
* **[cert-manager] Update cert-manager to v1.19.3**: Upgraded cert-manager with new CRDs moved into a dedicated CRD package, added global `nodeSelector` and `hostUsers` (pod user-namespace isolation), and renamed `ServiceMonitor` targetPort default to `http-metrics` ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2070).
* **[dashboard] Add backupClasses dropdown to Plan/BackupJob forms**: Replaced free-text input for `backupClass` field with an API-backed dropdown populated with available BackupClass resources, making it easier to select the correct backup target ([**@androndo**](https://github.com/androndo) in #2104).
## Fixes
* **[platform] Fix package name conversion in migration script**: Fixed the `migrate-to-version-1.0.sh` script to correctly prepend the `cozystack.` prefix when converting `BUNDLE_DISABLE` and `BUNDLE_ENABLE` package name lists, ensuring packages are properly identified during the v0.41→v1.0 upgrade ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2144, #2148).
* **[backups] Fix RBAC for backup controllers**: Updated RBAC permissions for the backup strategy controller to support enhanced backup and restore capabilities, including Velero integration and status management ([**@androndo**](https://github.com/androndo) in #2145).
* **[kubernetes] Set explicit MTU for Cilium in tenant clusters**: Set explicit MTU 1350 for Cilium in KubeVirt-based tenant Kubernetes clusters to prevent packet drops caused by VXLAN encapsulation overhead. Cilium's auto-detection does not account for VXLAN overhead (50 bytes) when the VM interface inherits MTU 1400 from the parent OVN/Geneve overlay, causing intermittent connectivity issues and HTTP 499 errors under load ([**@IvanHunters**](https://github.com/IvanHunters) in #2147). A values sketch follows this list.
* **[platform] Prevent cozystack-version ConfigMap from deletion**: Added resource protection annotations to prevent the `cozystack-version` ConfigMap from being accidentally deleted, improving platform stability ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2112, #2114).
* **[installer] Add keep annotation to Namespace and update migration script**: Added `helm.sh/resource-policy: keep` annotation to the `cozy-system` Namespace in the installer Helm chart to prevent Helm from deleting the namespace and all HelmReleases within it when the installer release is removed. The v1.0 migration script is also updated to annotate the namespace and `cozystack-version` ConfigMap before migration ([**@kvaps**](https://github.com/kvaps) in #2122, #2123).
* **[dashboard] Add FlowSchema to exempt BFF from API throttling**: Added a `cozy-dashboard-exempt` FlowSchema to exempt the dashboard Back-End-for-Frontend service account from Kubernetes API Priority and Fairness throttling, preventing 429 errors under load ([**@kvaps**](https://github.com/kvaps) in #2121, #2124).
* **[platform] Suspend cozy-proxy if it conflicts with installer release during migration**: Added a check in the v0.41→v1.0 migration script to detect and suspend the `cozy-proxy` HelmRelease when its `releaseName` is set to `cozystack`, which conflicts with the installer release and would cause `cozystack-operator` deletion during the upgrade ([**@kvaps**](https://github.com/kvaps) in #2128, #2130).
* **[platform] Fix off-by-one error in run-migrations script**: Fixed a bug in the migration runner where the first required migration was always skipped due to an off-by-one error in the migration range calculation ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2126, #2132).
* **[system] Fix Keycloak proxy configuration for v26.x**: Replaced the deprecated `KC_PROXY=edge` environment variable with `KC_PROXY_HEADERS=xforwarded` and `KC_HTTP_ENABLED=true` in the Keycloak StatefulSet. `KC_PROXY` was removed in Keycloak 26.x, previously causing "Non-secure context detected" warnings and broken cookie handling behind a reverse proxy with TLS termination ([**@sircthulhu**](https://github.com/sircthulhu) in #2125, #2134).
* **[dashboard] Allow clearing instanceType field and preserve newlines in secret copy**: Added `allowEmpty: true` to the `instanceType` field in the VMInstance form so users can explicitly clear it to use custom KubeVirt resources without a named instance type. Also fixed newline preservation when copying secrets with CMD+C ([**@sircthulhu**](https://github.com/sircthulhu) in #2135, #2137).
* **[dashboard] Restore stock-instance sidebars for namespace-level pages**: Restored `stock-instance-api-form`, `stock-instance-api-table`, `stock-instance-builtin-form`, and `stock-instance-builtin-table` sidebar resources that were inadvertently removed in #2106. Without these sidebars, namespace-level pages such as Backup Plans rendered as empty pages ([**@sircthulhu**](https://github.com/sircthulhu) in #2136, #2138).
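For the Cilium MTU fix noted earlier in this list, a sketch of the override, assuming the value is passed straight through to the upstream Cilium chart's `MTU` setting:

```yaml
# 1400 (inherited from the OVN/Geneve overlay) minus ~50 bytes of VXLAN overhead
MTU: 1350
```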
## System Configuration
* **[platform] Disable private key rotation in CA certs**: Set `rotationPolicy: Never` for all CA/root certificates used by system components (ingress-nginx, linstor, linstor-scheduler, seaweedfs, victoria-metrics-operator, kubeovn-webhook, lineage-controller-webhook, cozystack-api, etcd, linstor API/internal) to prevent trust chain problems when CA certificates are reissued ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2113).
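On a cert-manager `Certificate`, the setting above corresponds to the `spec.privateKey.rotationPolicy` field; a minimal sketch with hypothetical names:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-root-ca           # hypothetical name
spec:
  isCA: true
  commonName: example-root-ca
  secretName: example-root-ca
  privateKey:
    rotationPolicy: Never         # keep the CA key stable so issued certs keep a valid trust chain
  issuerRef:
    name: example-selfsigned-issuer   # hypothetical issuer
    kind: Issuer
```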
## Development, Testing, and CI/CD
* **[ci] Add debug improvements for CI tests**: Added extra debug commands for Kubernetes startup diagnostics and improved error output in CI test runs ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2111).
## Documentation
* **[website] Add object storage guide (pools, buckets, users)**: Added a comprehensive guide covering SeaweedFS object storage configuration including storage pools for tiered storage, bucket creation with access classes, per-user credential management, and credential rotation procedures ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#438).
* **[website] Add Build Your Own Platform (BYOP) guide**: Added a new "Build Your Own Platform" guide and split the installation documentation into platform installation and BYOP sub-pages, with cross-references throughout the documentation ([**@kvaps**](https://github.com/kvaps) in cozystack/website#437).
* **[website] Add white labeling guide**: Added a comprehensive guide for configuring white labeling (branding) in Cozystack v1, covering Dashboard fields (`titleText`, `footerText`, `tenantText`, `logoText`, `logoSvg`, `iconSvg`) and Keycloak fields (`brandName`, `brandHtmlName`). Includes SVG preparation workflow with theme-aware template variables and portable base64 encoding ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#441).
* **[website] Actualize backup and recovery documentation**: Reworked the backup and recovery docs to be user-focused, separating operator and tenant workflows. Added tenant-facing documentation for `BackupJob` and `Plan` resources and a new Velero administration guide for operators ([**@androndo**](https://github.com/androndo) in cozystack/website#434).
* **[website] Add step to protect namespace before upgrading**: Updated the cluster upgrade guide and v0.41→v1.0 migration guide with a required step to annotate the `cozy-system` namespace and `cozystack-version` ConfigMap with `helm.sh/resource-policy=keep` before running `helm upgrade` ([**@kvaps**](https://github.com/kvaps) in cozystack/website#435).
* **[website] Replace bundles documentation with variants**: Renamed the "Bundles" documentation section to "Variants" to match current Cozystack terminology. Removed deprecated variants and added new ones: `default` and `isp-full-generic` ([**@kvaps**](https://github.com/kvaps) in cozystack/website#433).
* **[website] Fix component values override instructions**: Corrected the component values override documentation to reflect current configuration patterns ([**@kvaps**](https://github.com/kvaps) in cozystack/website#436).
## Breaking Changes & Upgrade Notes
* **[bucket] Bucket user model now requires explicit user definitions**: The implicit default `BucketAccess` resource is no longer created automatically. Existing buckets that relied on a single auto-generated credential secret will need to define users explicitly in the `users` map after upgrading. Each user entry creates its own `BucketAccess` resource and credential secret (optionally with `readonly: true`). The COSI BucketClass suffix has also been renamed from `-worm` to `-lock` ([**@IvanHunters**](https://github.com/IvanHunters) in #2119).
## Contributors
We'd like to thank all contributors who made this release possible:
* **[dashboard] Fix hidden MarketplacePanel resources appearing in sidebar menu**: The sidebar was generated independently from MarketplacePanels, always showing all resources regardless of their `hidden` state. Fixed by fetching MarketplacePanels during sidebar reconciliation and skipping resources where `hidden=true`, so hiding a resource from the marketplace also removes it from the sidebar navigation ([**@IvanHunters**](https://github.com/IvanHunters) in #2177, #2203).
* **[dashboard] Fix disabled/hidden state overwritten on every MarketplacePanel reconciliation**: The controller was hardcoding `disabled=false` and `hidden=false` on every reconciliation, silently overwriting any user changes made through the dashboard UI. Fixed by reading and preserving the current `disabled`/`hidden` values from the existing resource before updating ([**@IvanHunters**](https://github.com/IvanHunters) in #2176, #2201).
* **[dashboard] Fix External IPs factory EnrichedTable rendering**: The external-IPs table displayed empty rows because the factory used incorrect `EnrichedTable` properties. Replaced `clusterNamePartOfUrl` with `cluster` and changed `pathToItems` from array to dot-path string format, consistent with all other working `EnrichedTable` instances ([**@IvanHunters**](https://github.com/IvanHunters) in #2175, #2193).
* **[platform] Fix VM MAC address not preserved during virtual-machine to vm-instance migration**: Kube-OVN reads MAC address exclusively from the pod annotation `ovn.kubernetes.io/mac_address`, not from the IP resource `spec.macAddress`. Without the annotation, migrated VMs received a new random MAC, breaking OS-level network configurations that match by MAC (e.g. netplan). Added a Helm `lookup` for the Kube-OVN IP resource in the vm-instance chart so that MAC and IP addresses are automatically injected as pod annotations when the resource exists ([**@sircthulhu**](https://github.com/sircthulhu) in #2169, #2190).
* **[etcd-operator] Replace deprecated kube-rbac-proxy image**: The `gcr.io/kubebuilder/kube-rbac-proxy` image became unavailable after Google Container Registry was deprecated. Replaced it with `quay.io/brancz/kube-rbac-proxy` from the original upstream author, restoring etcd-operator functionality ([**@kvaps**](https://github.com/kvaps) in #2181, #2182).
* **[migrations] Handle missing RabbitMQ CRD in migration 34**: Migration 34 failed with an error when the `rabbitmqs.apps.cozystack.io` CRD did not exist — which occurs on clusters where RabbitMQ was never installed. Added a CRD presence check before attempting to list resources so that migration 34 completes cleanly on such clusters ([**@IvanHunters**](https://github.com/IvanHunters) in #2168, #2180).
* **[keycloak] Fix Keycloak crashloop due to misconfigured health probes**: Keycloak 26.x redirects all HTTP requests on port 8080 to the configured HTTPS hostname; since kubelet does not follow redirects, liveness and readiness probes failed causing a crashloop. Fixed by enabling `KC_HEALTH_ENABLED=true`, exposing management port 9000, and switching all probes to `/health/live` and `/health/ready` on port 9000. Also added a `startupProbe` for improved startup tolerance ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #2162, #2179).
* **[bucket] Fix S3 Manager endpoint mismatch with COSI credentials**: The S3 Manager UI previously constructed an `s3.<tenant>.<cluster-domain>` endpoint even though COSI-issued bucket credentials point to the root-level S3 endpoint. This caused login failures with "invalid credentials" despite valid secrets. The deployment now uses the actual endpoint from `BucketInfo`, with the old namespace-based endpoint kept only as a fallback before `BucketAccess` secrets exist ([**@IvanHunters**](https://github.com/IvanHunters) in #2211, #2215).
* **[platform] Fix spurious OpenAPI post-processing errors on cozystack-api startup**: The OpenAPI post-processor was being invoked for non-`apps.cozystack.io` group versions where the base `Application*` schemas do not exist, producing noisy startup errors on every API server launch. It now skips those non-apps group versions gracefully instead of returning an error ([**@kvaps**](https://github.com/kvaps) in #2212, #2217).
## Documentation
* **[website] Add troubleshooting for packages stuck in `DependenciesNotReady`**: Added an operations guide that explains how to diagnose missing package dependencies in operator logs and corrected the packages management development docs to use the current `make image-packages` target ([**@kvaps**](https://github.com/kvaps) in cozystack/website#450).
* **[website] Reorder installation docs to install the operator before the platform package**: Updated the platform installation guide and tutorial so the setup sequence consistently installs the Cozystack operator first, then prepares and applies the Platform Package, matching the rest of the documentation set ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#449).
* **[website] Add automated installation guide for the Ansible collection**: Added a full guide for deploying Cozystack with the `cozystack.installer` collection, including inventory examples, distro-specific playbooks, configuration reference, and explicit version pinning guidance ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#442).
* **[website] Expand monitoring and platform architecture reference docs**: Added a tenant custom metrics collection guide for `VMServiceScrape` and `VMPodScrape`, and documented `PackageSource`/`Package` architecture, reconciliation flow, rollback behavior, and the `cozypkg` workflow in Key Concepts ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#444, cozystack/website#445).
* **[website] Improve operations guides for CA rotation and Velero backups**: Completed the CA rotation documentation with dry-run and post-rotation credential retrieval steps, and expanded the backup configuration guide with concrete examples, verification commands, and clearer operator procedures ([**@kvaps**](https://github.com/kvaps) in cozystack/website#406; [**@androndo**](https://github.com/androndo) in cozystack/website#440).