<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
- Fixed RBAC for backup controllers
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated backup controller permissions to focus on core backup
operations.
* Expanded backup strategy controller permissions to support enhanced
backup and restore capabilities, including Velero integration and status
management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR adds the changelog for release `v1.0.2`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.2.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Published v1.0.2 release notes.
* **Bug Fixes**
* Fixed migration script to ensure all upgrade steps execute.
* Improved dashboard functionality for field clearing and secret
copying.
* Restored sidebar navigation on namespace-level pages.
* Updated proxy configurations for enhanced TLS handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Replaces the plain text input for `storageClass` fields with an
API-backed dropdown listing available StorageClasses from the cluster.
Follows the same pattern as the `instanceType` dropdown for VMInstance.
Affected applications:
- **Top-level `spec.storageClass`**: ClickHouse, Harbor, HTTPCache,
Kubernetes, MariaDB, MongoDB, NATS, OpenBAO, Postgres, Qdrant,
RabbitMQ, Redis, VMDisk
- **Nested `spec.storage.storageClass`**: FoundationDB
- **Nested `spec.kafka.storageClass` / `spec.zookeeper.storageClass`**:
Kafka
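For context, the dropdown is populated from the names in the cluster's StorageClass list; a minimal sketch of the response shape it reads (names and provisioners here are illustrative):

```yaml
# storage.k8s.io/v1 StorageClassList as returned by the Kubernetes API;
# the dropdown options come from items[*].metadata.name
apiVersion: storage.k8s.io/v1
kind: StorageClassList
items:
  - metadata:
      name: replicated            # illustrative name
    provisioner: linstor.csi.linbit.com
  - metadata:
      name: local
    provisioner: linstor.csi.linbit.com
```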
### Release note
```release-note
[dashboard] storageClass fields in stateful app forms now render as a
dropdown populated with available StorageClasses from the cluster,
instead of a free-text input.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Storage class selection dropdowns now available in configuration forms
for multiple database, messaging, and storage services.
* **Tests**
* Added comprehensive test coverage for storage class configuration
handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Restores `stock-instance-api-form`, `stock-instance-api-table`,
`stock-instance-builtin-form`, and `stock-instance-builtin-table`
sidebar resources that were removed in #2106, and adds them back to the
orphan cleanup allowlist.
PR #2106 removed these sidebars to fix broken URLs on the main page
before namespace selection (`default//api-table/...`). However,
`stock-instance-*` sidebars are required by the frontend for
namespace-level api-table/api-form pages. Without them and with
`CUSTOMIZATION_SIDEBAR_FALLBACK_ID=""`, the frontend cannot find a
sidebar for pages like Backup Plans and renders an empty page where no
interaction is possible.
The broken-URL bug is already fully fixed by
`CUSTOMIZATION_SIDEBAR_FALLBACK_ID=""` in `web.yaml`. Re-adding
`stock-instance-*` does not reintroduce it, since these sidebars are
only shown when the user is on a namespace-level page where the
`{namespace}` placeholder is filled.
### Release note
```release-note
[dashboard] fix empty page on Backup Plans and other namespace-level api-table pages
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added four new dashboard sidebar resources for stock instances: API
form, API table, built-in form, and built-in table views. These enable
expanded dashboard customization options for managing stock instance
configurations and data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Updates the openapi-k8s-toolkit integration in the dashboard to fix two
UX issues:
**1. Allow clearing the instanceType field in VMInstance form**
When `instanceType` has a default value, clearing the field in the form
UI would silently revert to the default, making it impossible to
explicitly send an empty value. This blocked use of custom KubeVirt
resources without a named instance type.
Adds `allowEmpty: true` to the instanceType listInput field so the BFF
preserves an explicit empty value. Also introduces a generic
`persistType` prop (`'str' | 'number' | 'arr' | 'obj'`) to the listInput
component, so the allow-empty behaviour works correctly for any field
type, not just strings.
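A hypothetical sketch of what the field entry could look like; the key names follow this description, not a verified openapi-k8s-toolkit schema:

```yaml
# hypothetical listInput field customization (schema assumed, may differ)
instanceType:
  input: listInput
  allowEmpty: true    # BFF persists an explicit empty value
  persistType: str    # type of the persisted empty value: str | number | arr | obj
```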
Updates openapi-k8s-toolkit to release/1.4.0 (d6b9e4ad), which already
includes the FormListInput layout refactor — the previous
formlistinput-value-binding.diff patch is no longer needed.
Upstream PR:
https://github.com/PRO-Robotech/openapi-k8s-toolkit/pull/340
**2. Preserve newlines when copying secrets with CMD+C**
Native `input[type=text]` strips newlines on copy. Adds an `onCopy`
handler to
the SecretBase64Plain component that intercepts the copy event and
writes the full
decoded value (including newlines) to the clipboard.
Upstream PR:
https://github.com/PRO-Robotech/openapi-k8s-toolkit/pull/339
### Release note
```release-note
[dashboard] Fix clearing instanceType in VMInstance form: explicit empty value
is now correctly sent to the API instead of falling back to the schema default.
Fix CMD+C copying of secrets stripping newlines.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Dropdown fields now support configuration to allow empty selections
* Enhanced empty value handling for form fields across multiple data
types (string, number, array, object)
* **Bug Fixes**
* Fixed secret field copy functionality to preserve plain-text format
when visible
* **Chores**
* Updated base image dependencies for dashboard build
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
PR #2106 removed stock-instance-* sidebar resources to fix broken URLs
on the main page before namespace selection. However, these sidebars are
required for rendering namespace-level pages (api-table, api-form, etc.)
such as the Backup Plans page.
Without stock-instance-api-table, the frontend cannot find the sidebar
for namespace-scoped api-table pages and renders an empty page instead.
The original bug (broken URLs with empty namespace placeholder) is already
fixed by CUSTOMIZATION_SIDEBAR_FALLBACK_ID="" in web.yaml, so re-adding
stock-instance-* sidebars does not reintroduce it.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Replace deprecated `KC_PROXY=edge` with `KC_PROXY_HEADERS=xforwarded`
and `KC_HTTP_ENABLED=true` in the Keycloak StatefulSet template.
`KC_PROXY` was removed in Keycloak 26.x, causing "Non-secure context
detected" warnings and broken cookie handling when running behind a
reverse proxy with TLS termination.
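In StatefulSet terms the change amounts to swapping the env entries, roughly (a sketch, assuming the usual container `env` layout):

```yaml
env:
  # replaces KC_PROXY=edge, which was removed in Keycloak 26.x
  - name: KC_PROXY_HEADERS
    value: "xforwarded"   # trust X-Forwarded-* headers from the reverse proxy
  - name: KC_HTTP_ENABLED
    value: "true"         # allow plain HTTP behind a TLS-terminating proxy
```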
### Release note
```release-note
[system] Fix Keycloak proxy headers configuration for compatibility with Keycloak 26.x
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Updated system configuration to improve proxy header handling and
enable direct HTTP support for enhanced compatibility with reverse proxy
environments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Update openapi-k8s-toolkit commit to d6b9e4ad (release/1.4.0) which
includes the FormListInput layout refactor, making formlistinput-value-binding.diff
obsolete.
Set allowEmpty: true on the VMInstance instanceType field so users can
explicitly clear the selection and override the default instance type.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Update openapi-k8s-toolkit to release/1.4.0 (d6b9e4ad). The previous
value-binding layout refactor is already included upstream, so drop the
formlistinput-value-binding.diff patch.
Add formlistinput-allow-empty.diff patch which introduces two props to
the listInput component:
- allowEmpty: when set, auto-persists the field so BFF sends an empty
value instead of falling back to the schema default
- persistType: controls the type of empty value ('str' | 'number' | 'arr'
| 'obj'), allowing the feature to work correctly for any field type
Set allowEmpty: true on the VMInstance instanceType field so users can
explicitly clear the selection and override the default instance type.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
[platform] Fixed off-by-one error where the first required migration was always skipped.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Corrected migration range handling so upgrade steps run for the
intended version window, preventing skipped or duplicated migrations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
Disabled private key rotation for every CA certificate in Cozystack system packages to prevent trust-chain problems when a CA certificate is reissued
```
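In cert-manager terms, this corresponds to setting the private-key rotation policy on each CA Certificate; a minimal sketch:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-ca         # illustrative name
spec:
  isCA: true
  privateKey:
    rotationPolicy: Never  # keep the key pair when the CA cert is reissued
  # issuerRef, secretName, etc. as defined in each system package
```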
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Disabled private-key rotation (set rotationPolicy: Never) for CA/root
certificates used by multiple system components (ingress-nginx, linstor,
linstor-scheduler, seaweedfs, victoria-metrics-operator,
kubeovn-webhook, lineage-controller-webhook, cozystack-api, etcd,
linstor API/internal, seaweedfs).
* Added patch application steps to relevant update workflows to ensure
the certificate template changes are applied during chart/update
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
[rabbitmq] Added version selection to newly created RabbitMQ instances.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Configurable RabbitMQ major.minor version selector (v4.2, v4.1, v4.0,
v3.13), default v4.2; chart validates selection and uses it to pick the
runtime image.
* **Chores**
* Default RabbitMQ image updated to 4.2.4.
* Added an automated version-update helper and a Makefile target to
refresh available versions and regenerate manifests.
* **Migration**
* Migration added to backfill the version field on existing RabbitMQ
resources.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Replace plain text input with an API-backed listInput dropdown for
storageClass fields across all applications that expose them.
The dropdown fetches available StorageClasses from the cluster via
/api/clusters/{cluster}/k8s/apis/storage.k8s.io/v1/storageclasses,
following the same pattern as the instanceType dropdown for VMInstance.
Top-level spec.storageClass: ClickHouse, Harbor, HTTPCache, Kubernetes,
MariaDB, MongoDB, NATS, OpenBAO, Postgres, Qdrant, RabbitMQ, Redis, VMDisk.
Nested paths: FoundationDB (spec.storage.storageClass),
Kafka (spec.kafka.storageClass and spec.zookeeper.storageClass).
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Adds a check in the migration script to detect and suspend the
`cozy-proxy` HelmRelease if it has `releaseName: cozystack`, which
conflicts with the installer release and causes cozystack-operator
deletion during upgrade from v0.41 to v1.0.
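The suspension effectively applies this patch to the conflicting HelmRelease (a sketch; the `cozy-system` namespace is an assumption):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cozy-proxy
  namespace: cozy-system   # assumed location of the HR
spec:
  suspend: true            # stop reconciliation so it cannot overwrite the installer release
```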
### Release note
```release-note
[platform] Fix migration script to handle cozy-proxy releaseName conflict during v0.41→v1.0 upgrade.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced the version 1.0 migration process with automatic conflict
detection and interactive guidance, prompting users to resolve issues
during the upgrade for a smoother migration experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
In v0.41.x, cozy-proxy HelmRelease was configured with
releaseName: cozystack, which collides with the installer helm release.
If not suspended before upgrade, the cozy-proxy HR reconciles and
overwrites the installer release, deleting cozystack-operator.
Add a check in the migration script that detects this conflict and
suspends the cozy-proxy HelmRelease before proceeding.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
KC_PROXY=edge was deprecated and removed in Keycloak 26.x, causing
"Non-secure context detected" warnings and broken cookie handling
behind reverse proxy. Replace with KC_PROXY_HEADERS=xforwarded and
KC_HTTP_ENABLED=true.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
This PR adds the changelog for release `v1.0.1`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.1.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Published v1.0.1 release notes with platform, installer, and dashboard
bug fixes
* Updated website documentation: renamed "Bundles" to "Variants," added
new variant options, and updated cross-references
* Added upgrade protection instructions for system components prior to
upgrade
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add FlowSchema `cozy-dashboard-exempt` to exempt the dashboard BFF
service account (`incloud-web-web`) from API Priority and Fairness
throttling
- BFF falls under the default `service-accounts` FlowSchema →
`workload-low` priority level, which causes 429 responses under load
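A minimal sketch of such a FlowSchema; the service-account namespace and rule scope are assumptions, not taken from the PR:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: cozy-dashboard-exempt
spec:
  priorityLevelConfiguration:
    name: exempt                     # bypass APF queuing and limits
  matchingPrecedence: 100
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: incloud-web-web
            namespace: cozy-dashboard   # assumed namespace
      resourceRules:
        - verbs: ["*"]
          apiGroups: ["*"]
          resources: ["*"]
          clusterScope: true
          namespaces: ["*"]
```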
## Test plan
- [ ] Deploy to a cluster with dashboard enabled
- [ ] Verify FlowSchema is created: `kubectl get flowschema
cozy-dashboard-exempt`
- [ ] Verify BFF no longer receives 429 errors under load
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Added a new Kubernetes FlowSchema configuration for system resource
access management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds the `helm.sh/resource-policy: keep` annotation to the `cozy-system`
Namespace resource in the installer helm chart. This prevents Helm from
deleting the namespace (and all HelmReleases within it) when the
installer release is removed.
Also updates the v1.0 migration script to annotate the `cozy-system`
namespace and `cozystack-version` ConfigMap with the same policy before
generating the Package resource.
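The resulting Namespace manifest carries the standard Helm keep policy:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cozy-system
  annotations:
    helm.sh/resource-policy: keep   # Helm skips deleting this resource on uninstall
```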
### Release note
```release-note
[platform] Add helm.sh/resource-policy=keep annotation to cozy-system Namespace in installer chart to prevent namespace deletion on HelmRelease removal. Update migration script to protect namespace and cozystack-version ConfigMap before migration.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced migration process with an interactive step to safeguard
critical resources during system upgrades.
* Added resource protection mechanisms to prevent unintended removal
during Helm operations.
* Improved control flow in the upgrade script with explicit user
confirmation prompts.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add helm.sh/resource-policy=keep annotation to the cozy-system Namespace
in the installer helm chart. This prevents Helm from deleting the
namespace when the HelmRelease is removed, which would otherwise destroy
all other HelmReleases within it.
Update the migration script to annotate the cozy-system namespace and
cozystack-version ConfigMap with helm.sh/resource-policy=keep before
generating the Package resource.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The dashboard BFF service account (incloud-web-web) falls under the
default "service-accounts" FlowSchema which maps to the "workload-low"
priority level. Under load, this causes API Priority and Fairness to
return 429 (Too Many Requests) responses to the BFF, resulting in 500
errors for dashboard users.
Add a FlowSchema that maps the BFF service account to the "exempt"
priority level to prevent APF throttling of dashboard API requests.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
[ci] Added more debug information to CI tests
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced error handling and diagnostic output in development testing
infrastructure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
- Add `volume.pools` (Simple topology) and `volume.zones[name].pools`
(MultiZone topology) for creating separate Volume StatefulSets per disk
type (SSD/HDD/NVMe)
- Add `nodeSelector`, `storageClass`, and `dataCenter` overrides for
zones in MultiZone topology
- Create per-pool `BucketClass` and `BucketAccessClass` COSI resources
(including WORM and readonly variants)
- Bump seaweedfs-cosi-driver to v0.3.0 (adds `disk` parameter support in
BucketClass)
- Add `volume.diskType` field to tag default volume servers with a disk
type
### How It Works
#### Simple Topology
Each storage pool in `volume.pools` creates an additional Volume
StatefulSet alongside the default one. All pods (default + pool) may run
on the same nodes. SeaweedFS distinguishes storage via the
`-disk=<type>` flag on volume servers.
```yaml
volume:
  replicas: 2
  size: 10Gi
  diskType: ""
  pools:
    ssd:
      diskType: ssd
      size: 50Gi
      storageClass: local-nvme
```
#### MultiZone Topology
Pools are defined per-zone in `volume.zones[name].pools`. A StatefulSet
is created for each **zone × pool** combination (e.g., `us-east-ssd`,
`us-west-ssd`), inheriting nodeSelector and dataCenter from its parent
zone.
```yaml
volume:
  replicas: 2
  size: 10Gi
  zones:
    us-east:
      replicas: 2
      size: 100Gi
      # nodeSelector defaults to: topology.kubernetes.io/zone: us-east
      pools:
        ssd:
          diskType: ssd
          size: 50Gi
    us-west:
      replicas: 3
```
In Simple topology, `volume.pools` is used. In MultiZone,
`volume.zones[name].pools` is used — `volume.pools` is explicitly
blocked to prevent BucketClasses without backing StatefulSets.
### Zone Overrides (MultiZone)
Zones now support:
- `nodeSelector` — YAML string, defaults to
`topology.kubernetes.io/zone: <zoneName>`
- `storageClass` — defaults to `volume.storageClass`
- `dataCenter` — SeaweedFS data center name, defaults to zone name
### COSI Resources
Each unique pool name generates 4 cluster-scoped COSI resources:
- `<namespace>-<pool>` BucketClass (Delete policy, `disk: <type>`)
- `<namespace>-<pool>-worm` BucketClass (Retain policy, object lock)
- `<namespace>-<pool>` BucketAccessClass (readwrite)
- `<namespace>-<pool>-readonly` BucketAccessClass (readonly)
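For illustration, the per-pool BucketClass would look roughly like this under the COSI v1alpha1 API (resource name and driver name are assumptions):

```yaml
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: tenant-root-ssd                        # <namespace>-<pool>, illustrative
driverName: seaweedfs.objectstorage.k8s.io     # assumed driver name
deletionPolicy: Delete
parameters:
  disk: ssd        # passed through to SeaweedFS volume placement
```

The `-worm` variant differs by `deletionPolicy: Retain` plus object-lock parameters.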
### Validation
- Pool names must be valid DNS labels (no dots)
- Pool names must not end with `-worm` or `-readonly` (reserved COSI
suffixes)
- `diskType` is required and must be lowercase alphanumeric
- Pool `diskType` must differ from `volume.diskType`
- Pool name + zone name composed names must not collide with existing
zone names
- `volume.pools` is blocked in Client and MultiZone topologies
- All replicas have `minimum: 1` in JSON schema
### Inheritance Chain
| Field          | Pool fallback (Simple) | Pool fallback (MultiZone)                |
| -------------- | ---------------------- | ---------------------------------------- |
| `replicas`     | pool → volume          | pool → zone → volume                     |
| `size`         | pool → volume          | pool → zone → volume                     |
| `storageClass` | pool → volume          | pool → zone → volume                     |
| `resources`    | pool → volume          | pool → volume (zone resources inherited) |
### Backward Compatibility
- Default `volume.pools: {}` produces identical output to current chart
- Default `volume.diskType: ""` adds no extra flags
- Existing default BucketClass remains unchanged
- No migration needed — pools create new StatefulSets
### Test plan
- [x] `helm template` with empty pools — output identical to current
- [x] `helm template` with Simple + volume.pools — additional volume
StatefulSets, BucketClasses, WorkloadMonitors
- [x] `helm template` with MultiZone + zone.pools — zone × pool
cross-product StatefulSets
- [x] `helm template` with `volume.diskType: hdd` — extraArgs includes
`-disk=hdd`
- [x] `helm template` with Client + volume.pools — fails with validation
error
- [x] `helm template` with MultiZone + volume.pools — fails with
validation error
- [x] `helm template` with reserved pool name suffix — fails with
validation
- [x] Deploy to test cluster and verify volume servers register with
correct disk types
### Release note
```release-note
[seaweedfs] add storage pools support for tiered storage with per-pool COSI resources
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added support for multiple storage pools with configurable disk types
and resource allocation.
* Introduced per-pool bucket and access classes for storage management.
* Added zone-aware pool configurations for multi-zone deployments.
* Enhanced topology-driven resource monitoring and allocation.
* **Documentation**
* Updated service documentation with expanded configuration parameters
and improved formatting.
* **Chores**
* Updated container image to latest version.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add a readonly `BucketAccessClass` to the seaweedfs COSI chart with
`accessPolicy: "readonly"` parameter
- Each bucket now automatically creates two sets of S3 credentials:
readWrite (existing, for UI) and readonly
- Update dashboard RBAC and ApplicationDefinition to expose the readonly
credentials secret
## Test plan
- [ ] Verify seaweedfs chart templates render both `BucketAccessClass`
resources (readWrite and readonly)
- [ ] Verify bucket app templates render `BucketClaim` + 2
`BucketAccess` (readWrite + readonly)
- [ ] Deploy a bucket and confirm both credential secrets are created by
COSI driver
- [ ] Confirm readonly credentials can only read/list objects, not
write/delete
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Introduced read-only bucket access capabilities. Users can now
configure read-only permissions for bucket storage resources,
complementing existing access control options. New read-only access
classes and configurations provide enhanced security controls and
finer-grained permission management. This enables improved data
protection while maintaining flexibility for various access requirements
across applications and storage infrastructure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
[cert-manager] Updated cert-manager to v1.19.3
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Global nodeSelector and hostUsers (pod user-namespace isolation) added
* New/updated CRDs for cert-manager resources (Certificate,
CertificateRequest, Order, etc.)
* **Documentation**
* Revised chart docs and installation guidance; added deprecation/notice
about private-key rotation
* Removed legacy CRD README and schema files from the CRD package
(documentation consolidated)
* **Chores**
* Upgraded cert-manager to v1.19.3
* Moved CRDs into a dedicated CRD package; ServiceMonitor targetPort
default renamed to "http-metrics"
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesistate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
[platform] Prevent the cozystack-version ConfigMap from being deleted
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated deployment resource configuration to improve system
reliability by ensuring critical components are properly retained and
protected during system operations and maintenance activities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds OpenBAO (open-source Vault fork) as a new managed PaaS application
in Cozystack.
**Structure follows existing app patterns (qdrant, nats):**
- System chart with vendored upstream `openbao/openbao` (chart v0.25.3,
appVersion v2.5.0)
- App chart with standalone/HA mode switching based on replicas count
- TLS via cert-manager self-signed certificates per instance
- ApplicationDefinition, PackageSource, PaaS bundle entry
- E2E test with init/unseal workflow
**Key design decisions:**
- `replicas: 1` → standalone mode with file storage; `replicas > 1` → HA
with Raft integrated storage and retry_join with TLS peer verification
- TLS enabled by default — each instance gets a self-signed Certificate
with DNS SANs covering services and pod addresses
- `disable_mlock = true` in HCL config since default security context
drops IPC_LOCK capability
- Injector and CSI provider disabled (cluster-scoped components, not
safe per-tenant)
- No auto-init/unseal — OpenBAO requires manual initialization by design
- E2E test performs full lifecycle: deploy, wait for certificate + API,
init, unseal, verify readiness, cleanup
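A minimal values sketch of the mode switch described above (`replicas` is taken from the PR; treating it as the only switch is an assumption):

```yaml
# replicas drives the deployment mode of the OpenBAO app chart
replicas: 3     # >1: HA with Raft integrated storage and TLS-verified retry_join
# replicas: 1   # standalone mode with file storage
```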
### Release note
```release-note
[apps] Add OpenBAO as a managed secrets management service with standalone and HA Raft modes, TLS enabled by default
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added OpenBAO managed secrets management service with
high-availability and standalone deployment options
* Integrated monitoring and dashboards for operational visibility
* Enabled configurable external access and web UI
* Added automated snapshot backup capability
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
- added a dropdown for selecting backupClasses in the Plan/BackupJob creation form
```
Redesign storage pools architecture:
- Move storagePools map from top-level into volume.pools (Simple topology)
and volume.zones[name].pools (MultiZone topology)
- Add nodeSelector, storageClass, dataCenter overrides for zones
- Add reserved suffix validation (-worm, -readonly) for pool names
- Block volume.pools usage in MultiZone (must use zone.pools instead)
- Use ternary/hasKey pattern for all optional replicas to handle 0 correctly
- Fix nodeSelector rendering for multiline values using indent
- Use disk: parameter (not diskType:) for COSI driver v0.3.0 BucketClass
- Bump seaweedfs-cosi-driver tag to v0.3.0
- Add minimum: 1 constraint for volume/zone/pool replicas in schema
- Regenerate README, CRD, and openAPISchema
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
- Remove dots from pool name regex (K8s resources don't allow dots)
- Add zone×pool name collision validation for MultiZone topology
- Use conditional storageClass rendering to omit empty values
- Fix README resourcesPreset default value
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
- Document MultiZone fallback chain for pool replicas and size
- Move `-volume` WorkloadMonitor reference inside Simple topology block in dashboard-resourcemap.yaml (it is only created for Simple)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Make storageClass optional in storagePools — pools inherit from
volume.storageClass when not explicitly set. Add full COSI resource set
per storage pool: BucketClass, BucketClass-worm (Retain + object lock),
BucketAccessClass readwrite, and BucketAccessClass readonly.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Add optional storagePools configuration that creates separate Volume
StatefulSets per disk type (SSD/HDD/NVMe), enabling tiered storage
within a single SeaweedFS tenant. Each pool gets its own BucketClass
and BucketAccessClass to prepare infrastructure for COSI driver
integration.
Supported in both Simple and MultiZone topologies:
- Simple: one StatefulSet per pool
- MultiZone: one StatefulSet per zone×pool combination
Also adds volume.diskType field for tagging default volume servers.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
This PR adds the changelog for release `v1.0.0`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added comprehensive v1.0.0 release notes documenting feature
highlights, improvements, and fixes across all platform components
* Included breaking changes and step-by-step upgrade guide for v0.x to
v1.0.0 migration
* Listed 33 incremental migrations and contributor credits
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR prepares the release `v1.0.0`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated container image versions from pre-release candidate tags to
stable v1.0.0 releases across core, system, and extra packages.
* Updated all associated container image digests to reflect the stable
release builds.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
In the current version, the sidebar incorrectly shows namespace-scoped
menu items on cluster-level pages (before a tenant is selected).
Clicking these items produces broken URLs with double `//` (e.g.
`default//api-table/backups.cozystack.io/...`) because the `{namespace}`
placeholder resolves to an empty string.
This PR fixes the issue by:
- Removing stock-instance-* sidebar resources that were populated with
the same namespace-scoped menu as stock-project-* sidebars
- Clearing the `CUSTOMIZATION_SIDEBAR_FALLBACK_ID` env var so the
frontend renders no sidebar when no matching sidebar resource exists
- Removing stock-instance-* from the expected resource set so orphan
cleanup removes stale instances on upgrade
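The fallback change boils down to one env entry on the dashboard frontend (a sketch of the relevant setting):

```yaml
env:
  - name: CUSTOMIZATION_SIDEBAR_FALLBACK_ID
    value: ""   # empty: render no sidebar when no matching sidebar resource exists
```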
Screenshot after changes
<img width="2560" height="1327" alt="dashboard screenshot with no tenant
selected"
src="https://github.com/user-attachments/assets/e0d795f7-55e9-471b-99b8-593b6fc145d8"
/>
### Test plan
- [x] On cluster list page (no tenant selected), sidebar is empty
- [x] After selecting a tenant, sidebar shows full menu
- [x] No double `//` in sidebar URLs
- [x] Existing tests pass: `go test ./internal/controller/dashboard/...`
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
[dashboard] fix: hide sidebar on cluster-level pages when no tenant selected
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Streamlined sidebar resource management by reducing the number of
static sidebar configurations generated by the system.
* Removed sidebar fallback behavior, resulting in simplified sidebar
customization defaults.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Remove stock-instance-* sidebars that were populated with namespace-scoped
menu items, causing the sidebar to incorrectly appear on cluster-level pages.
Clear the sidebar fallback ID so the frontend gracefully renders no sidebar
when no matching sidebar resource exists.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Hide Ingresses, Services, and Secrets tabs in the dashboard when the
ApplicationDefinition has no resource selectors (Include/Exclude) for
the corresponding type. Previously all tabs were always hardcoded as
visible.
### Release note
```release-note
[dashboard] Hide Ingresses/Services/Secrets tabs when ApplicationDefinition has no resource selectors defined
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Dashboard tab visibility refined: Workloads tab remains always
visible; Ingresses, Services, and Secrets tabs now appear only when
corresponding resource selectors are configured, reducing clutter and
showing relevant tabs based on configured resources.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add a readonly BucketAccessClass to the seaweedfs COSI chart and a
second fixed BucketAccess per bucket so each bucket automatically
gets both readWrite and readonly S3 credentials.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
This PR adds the changelog for release `v1.0.0-rc.2`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0-rc.2.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added release notes for v1.0.0-rc.2, documenting features,
improvements, bug fixes, and contributor acknowledgments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR prepares the release `v1.0.0-rc.2`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated numerous container images to v1.0.0-rc.2 across platform
services, controllers, dashboard components, migration/e2e tooling,
storage and networking components.
* Refreshed several image digests (including kubevirt CSI, s3manager and
Linstor components) and other image references.
* Updated default tenant branding text used by the dashboard.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
… in sidebar
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
- fixed dashboard BackupJob creation form
- fixed dashboard sidebar backup category ID
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Updates**
* Updated dashboard backups menu structure for improved organization
* Enhanced backup job management interface with new form configuration
including Name, Namespace, Plan Name, Application details, and Backup
Class selection
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add `ingress.host` field to cozy-keycloak values, allowing users to
override the default `keycloak.<root-host>` Ingress hostname. The custom
hostname is applied to both the Ingress resource and the `KC_HOSTNAME`
environment variable in the StatefulSet. When left empty, behavior is
unchanged (backward compatible).
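A minimal values sketch:

```yaml
ingress:
  host: ""   # empty keeps the default keycloak.<root-host>;
             # set e.g. "auth.example.com" to override the Ingress host and KC_HOSTNAME
```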
### Release note
```release-note
[system] Add `ingress.host` option to cozy-keycloak for custom Ingress hostname override
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Keycloak ingress hostname is now configurable and automatically
defaults to "keycloak.<root-host>" when not explicitly specified.
* **Chores**
* Refactored hostname configuration across Keycloak templates to use
dynamic variable resolution instead of hard-coded values for improved
consistency and flexibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
After application renames (ferretdb→mongodb, mysql→mariadb,
virtual-machine→vm-disk+vm-instance), the system-level `-rd`
HelmReleases in `cozy-system` were left orphaned. They reference
ExternalArtifacts that no longer exist, causing persistent
reconciliation failures:
- `ferretdb-rd` → no longer exists (replaced by `mongodb-rd`)
- `mysql-rd` → no longer exists (replaced by `mariadb-rd`)
- `virtual-machine-rd` → no longer exists (replaced by `vm-disk-rd` + `vm-instance-rd`)
Migrations 28 and 29 handled user-facing HelmReleases but missed the
system-level `-rd` ones.
**Changes:**
- Add cleanup of `mysql-rd` to migration 28
- Add cleanup of `virtual-machine-rd` to migration 29
- Add migration 33 as a safety net for users who already passed
migrations 28/29
- Bump `targetVersion` from 33 to 34
### Release note
```release-note
[platform] Fix orphaned ferretdb-rd, mysql-rd, and virtual-machine-rd HelmReleases
that persist after upgrading, referencing non-existent ExternalArtifacts.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Improved cleanup procedures in platform migrations to properly remove
orphaned system resources. This enhancement helps maintain system
stability and prevents potential resource conflicts during platform
updates.
* **Chores**
* Updated migration version target to latest.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add a safety-net migration for users who already passed migrations 28/29
and still have orphaned -rd HelmReleases in cozy-system:
- ferretdb-rd (replaced by mongodb-rd, never had a dedicated migration)
- mysql-rd (migration 28 only handled user HRs)
- virtual-machine-rd (migration 29 only handled user HRs)
These HRs reference ExternalArtifacts that no longer exist after the
application renames (ferretdb→mongodb, mysql→mariadb,
virtual-machine→vm-disk+vm-instance), causing persistent errors.
Bump targetVersion from 33 to 34.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Migrations 28 (mysql→mariadb) and 29 (virtual-machine→vm-disk+vm-instance)
only handled user-facing HelmReleases but left the system-level -rd
HelmReleases (mysql-rd, virtual-machine-rd) orphaned in cozy-system.
These HRs reference ExternalArtifacts that no longer exist, causing
persistent reconciliation failures.
Add cleanup steps to delete the orphaned -rd HRs and their Helm secrets.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add ingress.host field to values.yaml for cozy-keycloak. When set,
it overrides the default "keycloak.<root-host>" hostname in both the
Ingress resource and the KC_HOSTNAME environment variable. When left
empty, behavior is unchanged for backward compatibility.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesitate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->
## What this PR does
### Release note
<!-- Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->
```release-note
fixed dashboard sidebar links to Backups and External IPs
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Standardized cluster identifier usage across dashboard menu links,
administration links, and API request paths for consistent link
generation.
* **Bug Fixes**
* Resolved issues causing incorrect or inconsistent link targets and
ensured backup-class options load correctly in the UI.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Fixes multiple upgrade issues discovered during v0.41.1 → v1.0 upgrade
testing.
**Migration 26 (monitoring → monitoring-system):**
- Use `cozystack.io/ui=true` label with
`--field-selector=metadata.name=monitoring` instead of
`apps.cozystack.io/application.kind=Monitoring` to find monitoring
HelmReleases — the old label is guaranteed to exist on v0.41.1 clusters,
while the new one depends on migration 22 having run
- Add `delete_helm_secrets` function with fallback deletion by secret
name pattern and post-deletion verification
**Migrations 28 and 29 (mysql→mariadb, virtual-machine split):**
- Wrap `grep` in pipes with `{ ... || true; }` to prevent `pipefail`
exit when grep filters out all lines
- Fix reconcile annotation in migration 29 to use RFC3339 timestamp
format instead of Unix epoch
- Remove protection-webhook handling from migration 29 — it is an
external component and should not be managed by cozystack migrations
**Migration 27 (piraeus CRD ownership):**
- Skip CRDs that don't exist instead of failing the entire migration
- Add name-pattern fallback for helm secret deletion
**etcd HelmRelease:**
- Increase timeout from 10m to 30m to accommodate TLS cert rotation hook
**migrate-to-version-1.0.sh:**
- Add missing ConfigMap → Package field mappings: `bundle-disable`,
`bundle-enable`, `expose-ingress`, `expose-services`
- Remove redundant bundle enabled flags — the variant already determines
them via its values file
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Package generation now supports disabled/enabled package lists,
ingress name, and exposed services for customized publishing.
* **Bug Fixes**
* More robust secret cleanup with fallback deletions and post-deletion
verification.
* Guarded pipelines to avoid failures when no resources match.
* Reconciliation timestamps now use RFC3339 UTC.
* Suspension failures are no longer silently suppressed.
* **Chores**
* Increased etcd upgrade timeout; improved namespace discovery,
relabeling behavior, and user-facing messaging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add @sircthulhu to the global CODEOWNERS list
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated repository maintenance configuration.
---
**Note:** This release contains only internal repository updates with no
user-facing changes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
The protection-webhook is not part of the cozystack platform and should
not be managed by the migration script. Old services are now deleted
directly instead of being batched through the webhook disable/enable cycle.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Migration 26 was using apps.cozystack.io/application.kind=Monitoring label
which is added by migration 22 and may not be present on v0.41.1 clusters.
Switch to cozystack.io/ui=true (guaranteed on old HRs) with field-selector
for exact name match.
Also remove redundant bundle enabled flags from migrate-to-version-1.0.sh
since the variant already determines them via its values file.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add ConfigMap fields that were not converted to Package values:
- bundle-disable → bundles.disabledPackages
- bundle-enable → bundles.enabledPackages
- expose-ingress → publishing.ingressName
- expose-services → publishing.exposedServices
Remove incorrect bundles.system.type field that is not part of the
Package values schema.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The post-upgrade hook deletes TLS certificates and etcd pods to trigger
cert-manager regeneration. With 3 replicas and startup probes allowing
up to 25 minutes per pod, the previous 10m timeout was insufficient.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Migration 27 failed with set -e when Piraeus CRDs did not exist on
clusters without linstor. Add existence check before annotating CRDs.
Also add name-pattern fallback for helm secret deletion, consistent
with migration 26.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
grep returns exit code 1 when no lines match. With set -euo pipefail,
this kills the script when all secrets are helm-release secrets or when
no matching resources exist. Wrap grep calls with { ... || true; }.
Also fix reconcile annotation in migration 29 to use RFC3339 timestamp
format instead of Unix epoch, which Flux v2 expects.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Migration 26 silently skipped namespace processing when kubectl
queries failed, leaving helm release secrets intact. This caused
helm to diff old vs new chart manifests during upgrade, deleting
VLogs/CNPG resources and their PVCs.
- Remove silent error suppression (2>/dev/null || true) from
namespace discovery and HR suspend commands
- Add fallback secret deletion by name pattern when label selector
does not match
- Add verification that all helm release secrets are deleted
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Only show Ingresses/Services/Secrets tabs when Include selectors are
defined. Exclude selectors alone don't make resources visible as tenant
resources, so tabs would be empty.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
PR #2075 introduced `_cluster.cluster-domain` references in
monitoring-agents `values.yaml` for FQDN resolution in tenant clusters.
This broke the fluent-bit subchart because `_cluster` values are not
accessible from the Helm subchart context — only `global` values are
shared with subcharts.
This PR replaces `_cluster` references with a new `global.clusterDomain`
variable:
- Empty by default (management cluster uses short DNS names like
`service.namespace.svc`)
- Set to the management cluster domain (e.g. `cozy.local`) for tenant
clusters, enabling FQDN resolution for cross-cluster service discovery
Fixes #2084
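A minimal values sketch of the new variable:

```yaml
global:
  clusterDomain: ""   # empty on the management cluster (short names like service.namespace.svc);
                      # set to the management cluster domain, e.g. cozy.local, for tenant clusters
```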
### Release note
```release-note
[system] Fix monitoring-agents installation failure caused by inaccessible _cluster values in fluent-bit subchart context. Introduce global.clusterDomain for proper FQDN resolution in tenant workload clusters.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Monitoring agent configuration updated to support configurable cluster
domain names for greater flexibility.
* Remote write and log-forwarding endpoints adjusted to align with
cluster domain handling, improving compatibility when deploying across
different cluster DNS setups.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Show these tabs only when the ApplicationDefinition has non-empty
Include or Exclude resource selectors for the corresponding type.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
PR #2075 added _cluster.cluster-domain references to monitoring-agents
values.yaml for FQDN resolution. This broke fluent-bit because it is
a subchart where _cluster values are not accessible (only global
values are shared with subcharts), causing "index of untyped nil".
Revert values.yaml to short DNS names (.svc) for the management
cluster where they resolve locally, and add FQDN with cluster-domain
suffix in the tenant kubernetes HelmRelease where _cluster values are
available and cross-cluster DNS resolution is needed.
Closes #2084
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
This PR adds the changelog for release `v1.0.0-rc.1`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0-rc.1.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added comprehensive release notes for v1.0.0-rc.1, documenting new
features, improvements, bug fixes, breaking changes, and upgrade
guidance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add 9 missing patch release changelogs:
- **v0.40.x**: v0.40.5, v0.40.6, v0.40.7
- **v0.41.x**: v0.41.4, v0.41.5, v0.41.6, v0.41.7, v0.41.8, v0.41.9
All PR authors were verified via `gh pr view` (not from commit authors).
Format matches existing changelog files in the repository.
### Release note
```release-note
[docs] Add missing changelogs for v0.40.5-v0.40.7 and v0.41.4-v0.41.9
```
This PR prepares the release `v1.0.0-rc.1`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Upgraded container image versions from v1.0.0-beta.6 to v1.0.0-rc.1
across the entire platform. Updates span multiple critical components
including the core operator, platform migrations, dashboard and API
services, backup and storage controllers, networking components, and
various additional system infrastructure services. All updated images
include new SHA256 digests for ensuring integrity and verification.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add 9 missing patch release changelogs covering changes from
January to February 2026. All PR authors verified via GitHub CLI.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Updates kubevirt to v1.6.3 and CDI to v1.64.0.
Please note that VMs will be live-migrated as part of the update.
### Release note
```release-note
Updated kubevirt to v1.6.3 and CDI to v1.64.0
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* VM synchronization support with synchronization ports and address
reporting
* Cluster profiler and synchronization controller developer options
* **Updates**
* CDI operator bumped to v1.64.0; filesystem overhead default increased
* KubeVirt operator bumped to v1.6.3
* Added liveness/readiness probes and health/metrics ports
* Expanded operator tolerations for control-plane/master nodes
* Expanded operator permissions for synchronization and webhook-related
resources
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
KubeVirt v1.6.x has a known issue (#15989) where the guest-console-log
init container blocks virt-launcher pods from starting. The container
runs virt-tail as a long-running sidecar but fails to properly function
as a Kubernetes native sidecar, causing all VM pods to get stuck in
PodInitializing state indefinitely.
Disable serial console logging globally via the KubeVirt CR to prevent
the problematic init container from being created.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
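A hedged sketch of the workaround: recent KubeVirt releases expose a `virtualMachineOptions.disableSerialConsoleLog` knob in the KubeVirt CR, but the CR name and namespace below are assumptions — verify against the installed CRD before applying:

```bash
# Disable serial console logging cluster-wide so virt-launcher pods no
# longer get the problematic guest-console-log init container.
# (CR name and namespace are illustrative.)
kubectl patch kubevirt kubevirt -n cozy-kubevirt --type=merge -p '
spec:
  configuration:
    virtualMachineOptions:
      disableSerialConsoleLog: {}
'
```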
## What this PR does
Reverts #2079 — the Kamaji update was merged prematurely while marked
with `do-not-merge` label. The upstream commit (`309d9889`) has not been
released as an edge tag yet.
### Release note
```release-note
NONE
```
Upstream PR clastix/kamaji#1087 (refactor!: datastore conditions) removed
the startup datastore existence check that our disable-datastore-check.diff
patch was working around. Update to the merge commit and drop the now
redundant patch.
Remaining patches:
- fix-kubelet-config-compat.diff (pending upstream PR #1084)
- increase-startup-probe-threshold.diff (no upstream fix)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
Previously `_cluster.clusterissuer` controlled the ACME solver type
using
values `http01` / `cloudflare`, and every ingress template hardcoded
`cert-manager.io/cluster-issuer: letsencrypt-prod` with no way to
override it.
This PR replaces the previous single parameter
`publishing.certificates.issuerType` with two new parameters in the
platform chart:
- `publishing.certificates.solver` (default: `http01`) — the renamed
`issuerType`; its possible value `cloudflare` was also renamed to
`dns01` to use standard ACME terminology
- `publishing.certificates.issuerName` (default: `letsencrypt-prod`) —
propagated as `_cluster.issuer-name` to all packages via
`cozystack-values`, so its value appears in the
`cert-manager.io/cluster-issuer` annotation across the 8 ingress
templates in system applications
`publishing.certificates.solver` accepts any value, including an empty
string to support `selfsigned-cluster-issuer`.
Operators can now point ingresses at any ClusterIssuer (custom ACME,
self-signed, internal CA) by setting `certificates.issuerName` without
touching individual package templates.
## Breaking changes
| What changed | Before | After |
|---|---|---|
| Solver key | `certificates.issuerType` | `certificates.solver` |
| Cloudflare solver value | `issuerType: cloudflare` | `solver: dns01` |
These changes are handled by the upgrade hook (migration 32) when
upgrading Cozystack, or by the `migration-to-v1.0.sh` script (and
re-checked by a later migration). No user action is needed.
### Release note
```release-note
[platform] Added publishing.certificates.solver (http01/dns01) and
publishing.certificates.issuerName fields to allow configuring ACME challenge
type and ClusterIssuer per installation, replacing the old implicit issuerType field
[platform] Migration script and upgrade hook (migration 32) convert old
clusterissuer/issuerType fields to the new solver/issuerName fields
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Migrated certificate issuer configuration from legacy `issuerType`
field to new `solver` and `issuerName` fields system-wide.
* Automated migration script converts existing configurations, mapping
legacy values (cloudflare, http01) to new format.
* Updated all certificate-related templates to use new configurable
solver and issuer settings with sensible defaults.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Monitoring agents (vmagent, fluent-bit) in tenant workload clusters
failed to deliver metrics and logs because service addresses used short
DNS names (e.g. `vminsert-longterm.tenant-root.svc`) without the cluster
domain suffix. Tenant CoreDNS could not resolve these names across
cluster boundaries.
This PR appends the configured cluster domain from
`_cluster.cluster-domain` to all vmagent remoteWrite URLs and fluent-bit
output hosts, with a fallback to `cluster.local` when not set.
### Release note
```release-note
[system] Fix monitoring-agents endpoints to use FQDN with configured cluster domain, resolving metrics and logs delivery failures in tenant workload clusters.
```
## What this PR does
Updates Kubernetes version support to match current release landscape
and Talos 1.12 compatibility:
- Update Kamaji from `edge-25.4.1` to `edge-26.2.4` (adds K8s 1.35
support)
- Update Kubernetes version matrix: v1.30, v1.31, v1.32, v1.33, v1.34,
v1.35
- Drop EOL versions v1.28 and v1.29
- Remove merged-upstream patch (992.diff — label preservation fix)
- Regenerate disable-datastore-check.diff for new Kamaji version
Changes:
- Default Kubernetes version is now v1.35
- E2E tests will validate v1.35 (latest) and v1.34 (previous)
- Patch versions updated to latest available (v1.35.0, v1.34.4, v1.33.8,
v1.32.12, v1.31.14, v1.30.14)
### Release note
```release-note
[kubernetes] Update supported Kubernetes versions to v1.30-v1.35
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a Kamaji CRDs Helm chart with DataStore and KubeconfigGenerator
resources, plus deployment templates and configurable
kubeconfigGenerator settings
* DataStore now supports multiple backends (etcd, MySQL, PostgreSQL,
NATS) with TLS/auth validations and status tracking (observedGeneration)
* **Chores**
* Bumped default Kubernetes version from v1.33 to v1.35 (added v1.34;
removed v1.28–v1.29)
* Updated charts, packaging metadata, README/docs and helm
ignore/Makefile entries; updated builder base image and chart
dependencies
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Kamaji edge-26.2.4 is compiled against Kubernetes 1.35 libraries.
SetDefaults_KubeletConfiguration() from 1.35 injects two fields
gated by feature gates that are not enabled in earlier versions:
- crashLoopBackOff.maxContainerRestartPeriod (KubeletCrashLoopBackOffMax)
- imagePullCredentialsVerificationPolicy (KubeletEnsureSecretPulledImages)
Kubelets < 1.35 reject these fields during configuration validation,
causing worker nodes to fail to join the tenant cluster.
Add a Go patch that clears these fields from the kubelet-config
ConfigMap when the target Kubernetes version is below 1.35.
See: https://github.com/clastix/kamaji/issues/1062
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
### Release note
```release-note
Fixed cozy:tenant:admin:base ClusterRole to deny deletion of tenant ResourceQuotas for the tenant admin and superadmin
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Removed resource quota management permissions from tenant admin role
to reduce unnecessary administrative access.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Kill stale port-forward processes before starting a new one;
on retries, the previous attempt's port-forward still holds the
port, causing all kubectl commands to get "connection refused"
- Use -ge 2 instead of -eq 2 for node count check; MachineHealthCheck
may create a 3rd VM, leading to 3 nodes joining the tenant cluster
which would never satisfy the exact equality check
- Increase node join timeout from 5m to 8m; QEMU VMs with v1.34 need
more time to boot and join when running after kubernetes-latest
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
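A condensed sketch of the retry-hardening (service name, port, and kubeconfig path are illustrative):

```bash
# A previous attempt's port-forward may still hold the local port,
# making every kubectl call fail with "connection refused". Kill it first.
pkill -f "kubectl port-forward.*:6443" || true
kubectl port-forward -n "$ns" "svc/$svc" 6443:6443 >/dev/null 2>&1 &

# MachineHealthCheck may add a replacement VM, so a 3rd node can join;
# require at least 2 nodes rather than exactly 2.
nodes=$(kubectl --kubeconfig "$tenant_kubeconfig" get nodes --no-headers | wc -l)
[ "$nodes" -ge 2 ] || { echo "expected >= 2 nodes, got $nodes" >&2; exit 1; }
```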
When running kubernetes-latest and kubernetes-previous E2E tests
simultaneously, worker VMs compete for resources in the sandbox
environment. 3 minutes was insufficient for nodes to boot and
join the tenant cluster under load. Increase to 5 minutes.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Kamaji auto-derives the konnectivity proxy image tag as v0.{minor}.0
from the Kubernetes version. For K8s v1.35, this produces v0.35.0,
but the kas-network-proxy/proxy-server:v0.35.0 image does not exist
in registry.k8s.io yet, causing ImagePullBackOff on new TCP pods.
Add konnectivity-versions.yaml mapping to explicitly override the
konnectivity version when the auto-derived tag is unavailable.
For v1.35, pin to v0.34.0 (latest available).
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Go 1.26 was released on 2026-02-10 and is fully compatible with
Kamaji edge-26.2.4 (which requires go 1.25.0 in go.mod).
Verified by local build.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
- Add API-backed dropdown selects for VMInstance form: `instanceType`
fetches from `VirtualMachineClusterInstancetype` resources,
`disks[].name` fetches from `VMDisk`
resources in the same namespace
- Default value for `instanceType` is read dynamically from the
ApplicationDefinition's OpenAPI schema
- Fix a bug in the upstream `FormListInput` component where Ant Design's
`Form.Item` couldn't pass `value`/`onChange` to `Select` because of an
intermediate `Flex` wrapper
### Details
The dashboard renders forms from OpenAPI schemas using the
openapi-k8s-toolkit library. To turn a plain text field into an
API-backed dropdown, the CustomFormsOverride resource's schema field is
used with type: "listInput" and customProps containing the API endpoint
URL.
Controller changes (customformsoverride.go):
- applyListInputOverrides() — injects listInput schema overrides for
VMInstance kind
- parseOpenAPIProperties() — extracts top-level properties from OpenAPI
schema to read defaults
- ensureSchemaPath() / ensureArrayItemProps() — helpers to build nested
schema structures safely
Frontend patch (formlistinput-value-binding.diff):
- Moves `<Flex>` outside `<ResetedFormItem>` so `<Select>` becomes the
direct child — required for Ant Design's `Form.Item` to inject
`value`/`onChange` via `React.cloneElement`
Instance type example:
<img width="1143" height="1091" alt="instance type example"
src="https://github.com/user-attachments/assets/6c401916-b531-4da6-ae27-ca54e6b0bd04"
/>
VMDisks example:
<img width="875" height="323" alt="vmdisks example"
src="https://github.com/user-attachments/assets/18918115-c08a-40bb-b932-536419d6f2c1"
/>
### Release note
```release-note
[vminstance] Add dropdowns for `instanceType` and `disk[].name` using `VirtualMachineClusterInstancetype` cluster resources and `VMDisk` from current namespace
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Custom form overrides now support auto-population of dropdown field
defaults from API schemas for enhanced user workflows.
* Improved layout and visual alignment of form list input controls for
better usability and responsiveness.
* **Tests**
* Added comprehensive test coverage for custom form override
functionality and API schema integration, including edge cases and
default value handling scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
CAPI Kamaji provider v0.15.0 is incompatible with Kamaji edge-26.2.4
due to the new dataStoreUsername field with XValidation rule. The
provider's CreateOrUpdate drops the field (not in its Go types),
triggering "unsetting the dataStoreUsername is not supported" error.
This results in KamajiControlPlane staying INITIALIZED=false even
though the underlying TenantControlPlane reaches Ready.
v0.16.0 includes support for DataStoreUsername (PR #243 in v0.15.4)
and updated Kamaji types compatible with edge-26.2.4.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
Adds the `skip-adjust-when-device-inaccessible.diff` patch (upstream
[LINBIT/linstor-server#477](https://github.com/LINBIT/linstor-server/pull/477))
which:
- Skips DRBD adjust and .res file regeneration when child layer devices
are inaccessible (fixes encrypted resource deletion)
- Skips lsblk when device path doesn't physically exist yet (fixes race
condition after drbdadm adjust)
- Only checks child devices when disk access is actually needed (allows
network reconnect from StandAlone)
- Fixes missing `setExists(true)` in `LuksLayer` — the root cause of all
new DRBD+LUKS+STORAGE resources failing with "not defined in your
config"
### Release note
```release-note
[linstor] Fix DRBD+LUKS+STORAGE resource creation: all new encrypted volumes were failing because the DRBD .res file was never written due to a missing exists flag in the LUKS layer
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Improved handling of inaccessible storage devices by adding pre-checks
before performing operations
* Operations are now skipped when underlying storage devices are
unavailable, preventing unnecessary failures
* Enhanced error recovery during storage adjustments when devices are
temporarily inaccessible
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Removes CRDs from the cozy-installer Helm chart `crds/` directory and
delegates
CRD lifecycle management entirely to the operator via `--install-crds`
flag.
The operator already applies embedded CRDs via server-side apply on
every startup,
making the Helm `crds/` directory redundant. Helm only installs CRDs on
initial
`helm install` and never updates or deletes them on upgrade/uninstall,
which causes
CRDs to become stale over time.
Changes:
- Remove `packages/core/installer/crds/` (Packages and PackageSources
CRDs)
- Remove `templates/packagesource.yaml` Helm template — PackageSource is
now
created by the operator at startup using server-side apply
- Add `installPlatformPackageSource()` function to operator with SSA
- Move variant validation from deleted template to
`cozystack-operator.yaml`
- Simplify `update-codegen.sh` to use single CRD destination
- Update Makefile to source CRDs from `internal/crdinstall/manifests/`
- Update E2E tests to wait for operator-managed CRDs and PackageSource
- Add unit tests for PackageSource creation
### Release note
```release-note
[installer] CRDs are no longer shipped in the Helm chart crds/ directory. The operator now manages CRD lifecycle via --install-crds flag, ensuring CRDs stay up to date on every startup.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Operator now installs CRDs on startup and auto-creates the platform
PackageSource.
* **Improvements**
* Installer waits for CRDs to be established and for the platform
PackageSource to be present before proceeding.
* Deployment variant selection now has stricter validation to prevent
invalid choices.
* **Chores**
* Cleaned up legacy CRD templates and updated build/install scripting
paths.
* **Tests**
* Added comprehensive tests covering platform PackageSource installation
and URL/ref parsing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kubectl wait fails immediately with NotFound if the CRD does not exist
yet. The operator creates CRDs asynchronously on startup, so wrap the
wait in a retry loop that tolerates the initial absence.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
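A sketch of the tolerant wait loop (CRD name and retry budget are illustrative):

```bash
# kubectl wait exits with NotFound if the CRD is absent, so retry until
# the operator has created it, then wait for the Established condition.
for attempt in $(seq 1 30); do
  if kubectl wait crd/packages.cozystack.io \
       --for=condition=Established --timeout=10s 2>/dev/null; then
    break
  fi
  if [ "$attempt" -eq 30 ]; then
    echo "CRD never became Established" >&2
    exit 1
  fi
  sleep 5
done
```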
Monitoring agents in tenant workload clusters failed to deliver metrics
and logs because service addresses used short DNS names without the
cluster domain suffix. Tenant CoreDNS could not resolve these names
across cluster boundaries.
Append the configured cluster domain from _cluster.cluster-domain to
all vmagent remoteWrite URLs and fluent-bit output hosts, falling back
to cluster.local when not set.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
When platformSourceURL is empty no Flux source resource is created, so
creating a PackageSource that references it would leave a dangling
SourceRef in a permanent error state. Guard the creation block on
platformSourceURL != "".
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Replace repetitive Variant struct literals with a loop over variant
data, making it easier to add new variants and reducing copy-paste
errors.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Kamaji edge-26.2.4 requires Go >= 1.25.0, update base image accordingly.
Also remove unused "context" import from disable-datastore-check patch,
since removing the CheckExists call was the only usage of that package.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Change default --platform-source-name from "cozystack-packages" to
"cozystack-platform" to match the value passed by the Helm template.
Gate PackageSource creation on --install-crds flag: when the operator
does not manage CRDs, the PackageSource CRD may not exist yet, so
skip creation and let an external process handle it.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Explicitly check error from parsePlatformSourceURL instead of relying on
the implicit guarantee that installPlatformSourceResource already checked
it. This prevents latent bugs if startup order is ever restructured.
Add wait for platform PackageSource existence in E2E test before creating
Package resource, preventing flaky failures when operator startup is slow.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Switch installPlatformPackageSource to server-side apply (SSA) with
field manager, matching the pattern used in crdinstall. SSA is idempotent
and preserves metadata fields managed by other controllers.
Create PackageSource unconditionally (not only when platformSourceURL is
set), matching the previous Helm template behavior where PackageSource
was always created regardless of source URL configuration.
Use a dedicated context with its own 2-minute timeout for PackageSource
creation, separate from the platform source resource installation.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add explicit CRD wait (kubectl wait crd --for=condition=Established) in
E2E test before creating Package resources, preventing race condition
between operator CRD installation and resource creation.
Add unit tests for installPlatformPackageSource covering create, update,
and GitRepository sourceRef kind scenarios.
Document that hardcoded variant list is an intentional design choice
matching the previous Helm template behavior.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Check error from parsePlatformSourceURL instead of discarding it, preventing
silent fallback to OCIRepository on invalid URLs.
Move variant validation from deleted packagesource.yaml to
cozystack-operator.yaml template so invalid cozystackOperator.variant
values still fail at helm template/install time.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Replace the Helm hook approach with programmatic PackageSource creation
in the operator startup sequence. Helm hooks are unsuitable for persistent
resources like PackageSource because before-hook-creation policy causes
cascade deletion of owned ArtifactGenerators during upgrades.
The operator now creates the platform PackageSource after installing CRDs
and the Flux source resource, using the same create-or-update pattern as
installPlatformSourceResource(). The sourceRef.kind is derived from the
platform source URL (OCIRepository for oci://, GitRepository for git).
Also fix stale comment in e2e test referencing deleted crds/ directory.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Remove the crds/ directory from the cozy-installer Helm chart. The operator
already installs embedded CRDs via server-side apply on every startup with
the --install-crds=true flag, making the Helm crds/ directory redundant.
Convert templates/packagesource.yaml to a Helm post-install/post-upgrade
hook so it is applied after the operator has started and installed CRDs.
Update codegen to write CRDs only to internal/crdinstall/manifests/ (single
source of truth) and update the Makefile to source build assets from there.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Update Kubernetes version matrix to match Talos 1.12 support range:
- Add v1.35.0 (latest) and v1.34.4
- Update existing patch versions (v1.33.8, v1.32.12)
- Drop EOL versions v1.28 and v1.29
- Set default version to v1.35
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Update Kamaji from edge-25.4.1 to edge-26.2.4 which adds support for
Kubernetes 1.35 (KubeadmVersion bumped from v1.33.0 to v1.35.0).
- Update Dockerfile VERSION to edge-26.2.4
- Update vendored Helm charts from upstream
- Remove 992.diff patch (label preservation fix merged upstream)
- Regenerate disable-datastore-check.diff for new version
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add skip-adjust-when-device-inaccessible.diff patch (upstream PR #477)
which prevents DRBD adjust and res file regeneration when child layer
devices are inaccessible (e.g. during encrypted resource deletion).
This patch also includes a fix for a missing setExists(true) call in
LuksLayer, which caused all new DRBD+LUKS+STORAGE resources to fail
with "not defined in your config" errors because the DRBD .res file
was never written.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add API-backed listInput dropdown for disks[].name in VMInstance form,
listing available VMDisk resources from the same namespace.
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
The Flex wrapper between ResetedFormItem and Select prevented Ant
Design's Form.Item from injecting value/onChange into the Select,
causing the dropdown to appear empty even when the form store had a
value. Move Flex outside ResetedFormItem so Select is its direct child.
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Override spec.instanceType field with listInput type in schema so the
dashboard renders it as a select dropdown populated from
VirtualMachineClusterInstancetype resources. Default value is read
dynamically from the ApplicationDefinition's OpenAPI schema.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Bug: changes to the Tenant `tenant-root` are dropped on the next forced
(or upgrade-triggered) reconciliation of the `cozystack-basics`
HelmRelease. This can lead to data loss and a service outage.
This PR fixes that behavior by preserving the values applied by the user.
### Release note
```release-note
[cozystack-basics] Preserve existing `HelmRelease` values of `tenant-root` during reconciliations
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed cluster configuration to preserve existing settings during
updates instead of overwriting them. The system now properly merges new
configuration with prior values, ensuring no settings are unexpectedly
lost.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds Harbor v2.14.2 as a managed tenant-level container registry service
in the PaaS bundle.
**Architecture:**
- Wrapper chart (`apps/harbor`) — HelmRelease, Ingress,
WorkloadMonitors, BucketClaim, dashboard RBAC
- Vendored upstream chart (`system/harbor`) from helm.goharbor.io
v1.18.2
- System chart (`system/harbor`) provisions PostgreSQL via CloudNativePG
and Redis via redis-operator
- ApplicationDefinition (`system/harbor-rd`) for dynamic `Harbor` CRD
registration
- PackageSource and paas.yaml bundle entry for platform integration
**Key design decisions:**
- Database and Redis provisioned via CNPG and redis-operator (not
internal Helm-based instances) for reliable day-2 operations
- Registry image storage uses S3 via COSI BucketClaim/BucketAccess from
namespace SeaweedFS
- Trivy vulnerability scanner cache uses PVC (S3 not supported by
vendored chart)
- Token CA key/cert persisted across upgrades via Secret lookup
- Per-component resource configuration (core, registry, jobservice,
trivy)
- Ingress with TLS via cert-manager, cloudflare issuer type handling,
proxy timeouts for large image pushes
- Auto-generated admin credentials persisted across upgrades
**E2E test:** Creates Harbor instance, verifies HelmRelease readiness,
deployment availability, credentials secret, service port, then cleans
up.
### Release note
```release-note
[harbor] Add managed Harbor container registry as a tenant-level service
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added Harbor container registry deployment with integrated Kubernetes
support, including database and cache layers.
* Enabled metrics monitoring via Prometheus integration.
* Configured dashboard management interface for Harbor administration.
* **Tests**
* Added end-to-end testing for Harbor deployment and verification.
* **Chores**
* Integrated Harbor into the platform's application package bundle.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add encoding, localeCollate, and localeCType to initdb bootstrap
configuration to ensure fulltext search (ilike) works correctly.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
This PR adds the changelog for release `v1.0.0-beta.6`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0-beta.6.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Cilium-Kilo networking variant for enhanced multi-location setup
* NATS monitoring dashboards with Prometheus integration
* DNS validation for Application names
* Operator auto-installation of CRDs at startup
* **Bug Fixes**
* Fixed HelmRelease adoption during platform migration
* Improved test reliability and timeout handling
* **Documentation**
* Enhanced Azure autoscaling troubleshooting guidance
* Updated multi-location configuration docs
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR prepares the release `v1.0.0-beta.6`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated container image references from v1.0.0-beta.5 to v1.0.0-beta.6
across the platform, including updates to operators, controllers,
dashboards, storage components, and other services with corresponding
digest updates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Specify initdb bootstrap with database and owner names explicitly
instead of relying on CNPG defaults which may change between versions.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Follow the same hostname pattern as kubernetes and vpn apps: use
Release.Name directly (which already includes the harbor- prefix)
instead of adding an extra harbor. subdomain.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Change sslmode from disable to require for CNPG PostgreSQL connection,
as CNPG supports TLS out of the box. Fix token key/cert preservation to
verify both values are present before passing to Harbor core.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Use tenant base domain in default hostname construction (harbor.RELEASE.DOMAIN)
to match the pattern used by other apps (kubernetes, vpn). Remove unused $ingress
variable from harbor.yaml. Add cleanup of stale resources from previous failed
E2E runs.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## Summary
- Replace pre-rendered static YAML application (`kubectl apply`) with
direct `helm upgrade --install` of the `packages/core/installer` chart
in E2E tests
- Remove CRD/operator artifact upload/download from CI workflow — the
chart with correct values is already present in the sandbox via
workspace copy and `pr.patch`
- Remove `copy-installer-manifest` Makefile target and its dependencies
## Test plan
- [ ] CI build job completes without uploading CRD/operator artifacts
- [ ] E2E `install-cozystack` step succeeds with `helm upgrade
--install`
- [ ] All existing E2E app tests pass
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* PR workflows now only keep the primary disk asset; publishing/fetching
of auxiliary operator and CRD artifacts removed.
* CRD manifests are produced by concatenation and a verify-crds check
was added to unit tests; file-write permissions for embedded manifests
tightened.
* **New Features**
* Operator can install CRDs at startup to ensure resources exist before
reconcile.
* E2E install now uses the chart-based installer flow.
* **Tests**
* Added comprehensive tests for CRD-install handling and manifest
writing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Replace PVC-based registry storage with S3 via COSI BucketClaim/BucketAccess.
The system chart parses BucketInfo secret and creates a registry-s3 Secret
with REGISTRY_STORAGE_S3_* env vars that override Harbor's ConfigMap values.
- Add bucket-secret.yaml to system chart (BucketInfo parser)
- Remove storageType/size from registry config (S3 is now the only option)
- Use Harbor's existingSecret support for S3 credentials injection
- Add objectstorage-controller to PackageSource dependencies
- Update E2E test with COSI bucket provisioning waits and diagnostics
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
### Release note
```release-note
[nats] add monitoring
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added Grafana dashboards for NATS JetStream and Server monitoring
* Added support for specifying container image digests and full image
names
* **Documentation**
* Enhanced NATS Helm chart documentation with container resource
configuration guidance
* **Chores**
* Updated NATS application version and component image versions
* Improved Kubernetes graceful shutdown and Prometheus exporter
configuration
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds migration 31 to adopt existing `tenant-root` Namespace and
HelmRelease
into the `cozystack-basics` Helm release during upgrade from v0.41.x to
v1.0.
In v0.41.x these resources were applied via `kubectl apply` (from the
platform
chart `apps.yaml`) with no Helm release tracking. In v1.0 they are
created by
the `cozystack-basics` chart. Without Helm ownership annotations, `helm
install`
of `cozystack-basics` fails because the resources already exist.
The migration adds:
- `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace`
annotations
for Helm adoption
- `app.kubernetes.io/managed-by: Helm` label
- `helm.sh/resource-policy: keep` annotation on the HelmRelease to
prevent
accidental deletion
- `sharding.fluxcd.io/key: tenants` label required by flux sharding in
v1.0
This follows the same pattern as migrations 22 and 27 (CRD adoption).
Related: #2063
### Release note
```release-note
[platform] adopt tenant-root resources into cozystack-basics Helm release during migration
```
In v0.41.x the tenant-root Namespace and HelmRelease were applied via
kubectl apply with no Helm release tracking. In v1.0 these resources
are managed by the cozystack-basics Helm release. Without proper Helm
ownership annotations the install of cozystack-basics fails because
the resources already exist.
Add migration 31 that annotates and labels both the Namespace and
HelmRelease so Helm can adopt them, matching the pattern established
in migrations 22 and 27.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
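The adoption boils down to kubectl metadata edits like these (the release namespace shown is an assumption; the real migration derives it):

```bash
# Helm ownership metadata so `helm install cozystack-basics` adopts the
# existing objects instead of failing on "already exists".
kubectl annotate --overwrite namespace tenant-root \
  meta.helm.sh/release-name=cozystack-basics \
  meta.helm.sh/release-namespace=cozy-system
kubectl label --overwrite namespace tenant-root \
  app.kubernetes.io/managed-by=Helm

# Also keep the HelmRelease on release deletion and opt it into the
# flux sharding required in v1.0.
kubectl annotate --overwrite helmrelease tenant-root -n tenant-root \
  meta.helm.sh/release-name=cozystack-basics \
  meta.helm.sh/release-namespace=cozy-system \
  helm.sh/resource-policy=keep
kubectl label --overwrite helmrelease tenant-root -n tenant-root \
  app.kubernetes.io/managed-by=Helm \
  sharding.fluxcd.io/key=tenants
```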
Multiple Harbor instances in the same namespace would get the same
default hostname when derived from namespace host. Use Release.Name
instead for unique hostnames per instance.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The vendored Harbor chart does an unsafe `lookup` of the Redis auth
Secret at template rendering time to extract the password. On first
install, the Secret doesn't exist yet (created by the same chart),
causing a nil pointer error. Failed installs are rolled back, deleting
the Secret, so retries also fail — creating an infinite failure loop.
Fix by generating the Redis password in the wrapper chart (same
pattern as admin password), storing it in the credentials Secret,
and injecting it via HelmRelease valuesFrom with targetPath. This
bypasses the vendored chart's lookup entirely — it uses the password
value directly instead of looking up the Secret.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Dump HelmRelease status, pods, events, and ExternalArtifact info
when harbor-test-system fails to become ready, to diagnose the
root cause of the persistent timeout.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add postgres-operator and redis-operator to PackageSource dependsOn
to ensure CRDs are available before Harbor system chart deploys.
Make persistentVolumeClaim conditional to avoid empty YAML mapping
when using S3 storage without Trivy.
Increase E2E system HelmRelease timeout from 300s to 600s to account
for CNPG + Redis + Harbor bootstrap time on QEMU.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add registry.storageType parameter (pvc/s3) to let users choose
between PVC storage and S3 via COSI BucketClaim. Default is pvc,
which works without SeaweedFS in the tenant namespace.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
ApplicationDefinition has prefix "harbor-", so CR name "harbor" produces
HelmRelease "harbor-harbor". Use name="test" and release="harbor-test"
to correctly reference all resources.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add | quote to host values in ingress for proper YAML escaping.
Add cloudflare issuer type handling following bucket/dashboard pattern.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Move Harbor from packages/extra/ to packages/apps/ as it is a
self-sufficient end-user application, not a singleton tenant module.
Update bundle entry from system to paas accordingly.
Replace registry PVC storage with S3 via COSI BucketClaim/BucketAccess,
provisioned from the namespace's SeaweedFS instance. S3 credentials are
injected into the HelmRelease via valuesFrom with targetPath.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Replace Harbor's internal PostgreSQL with CloudNativePG operator and
internal Redis with redis-operator (RedisFailover), following established
Cozystack patterns from seaweedfs and redis apps.
Additional fixes from code review:
- Fix registry resources nesting level (registry.registry/controller)
- Persist token CA across upgrades to prevent JWT invalidation
- Update values schema and ApplicationDefinition
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Quote resourceNames values starting with {{ to prevent YAML parser
from interpreting them as flow mappings.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add Harbor v2.14.2 as a tenant-level managed service with per-component
resource configuration, ingress with TLS termination, and internal
PostgreSQL/Redis.
Includes:
- extra/harbor wrapper chart with HelmRelease, WorkloadMonitors, Ingress
- system/harbor with vendored upstream chart (helm.goharbor.io v1.18.2)
- harbor-rd ApplicationDefinition for dynamic CRD registration
- PackageSource and system.yaml bundle entry
- E2E test with Secret and Service verification
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
CollectCRDNames now requires both apiVersion "apiextensions.k8s.io/v1"
and kind "CustomResourceDefinition", consistent with the validation in
crdinstall.Install.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Remove duplicate "Starting controller manager" log before install
phases, keep only the one before mgr.Start()
- Rename misleading test "document without kind returns error" to
"decoder rejects document without kind" to match actual behavior
- Document Helm uninstall CRD behavior in deployment template comment
- Use --health-probe-bind-address=0 consistently with metrics-bind
- Exclude all dotfiles in verify-crds diff, not just .gitattributes
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The workload reconciler was refactored to use the label
workloads.cozystack.io/monitor but the test still used the old
workloadmonitor.cozystack.io/name label, causing the reconciler to
delete the workload instead of keeping it.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Check both apiVersion and kind when validating embedded CRD manifests
to prevent applying objects with wrong API group
- Move ctrl.SetupSignalHandler() before install phases so CRD and Flux
installs respect SIGTERM instead of blocking for up to 2 minutes
- Replace custom contains/searchString helpers with strings.Contains
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Validate that all parsed objects are CustomResourceDefinition before
applying with force server-side apply. This prevents accidental
application of arbitrary resources if a non-CRD file is placed in
the manifests directory.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Move duplicated YAML parsing (ReadYAMLObjects, ParseManifestFile) and
CRD readiness check (WaitForCRDsEstablished, CollectCRDNames) into a
shared internal/manifestutil package. Both crdinstall and fluxinstall
now import from manifestutil instead of maintaining identical copies.
Replace fluxinstall's time.Sleep(2s) after CRD apply with proper
WaitForCRDsEstablished polling, matching the crdinstall behavior.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Use explicit apiextensions.k8s.io/v1 CRD GVK in waitForCRDsEstablished
instead of fragile objects[0].GroupVersionKind()
- Add TestInstall_crdNotEstablished for context timeout path
- Add --recursive to diff in verify-crds Makefile target
- Document why both crds/ and --install-crds exist in deployment template
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add verify-crds target that diffs packages/core/installer/crds/ and
internal/crdinstall/manifests/ to catch accidental divergence. Include
it in unit-tests target so it runs in CI.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Wait for CRDs to have Established condition after server-side apply,
instead of returning immediately
- Add TestInstall with fake client and interceptor to simulate CRD
establishment
- Add TestInstall_noManifests and TestInstall_writeManifestsFails for
error paths
- Fix fluxinstall/manifests.embed.go: use filepath.Join for OS paths
and restrict permissions from 0666 to 0600 (same fix as crdinstall)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Use filepath.Join instead of path.Join for OS file paths
- Restrict extracted manifest permissions from 0666 to 0600
- Add unit tests for readYAMLObjects, parseManifests, and
WriteEmbeddedManifests including permission verification
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
After generating CRDs to packages/core/installer/crds/, copy them to
internal/crdinstall/manifests/ so the operator binary embeds the latest
CRD definitions.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add --install-crds=true to cozystack-operator container args so the
operator applies embedded CRD manifests on startup via server-side apply.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Embed Package and PackageSource CRDs in the operator binary using Go
embed, following the same pattern as --install-flux. The operator applies
CRDs at startup using server-side apply, ensuring they are updated on
every operator restart/upgrade.
This addresses the CRD lifecycle concern: Helm crds/ directory handles
initial install, while the operator manages updates on subsequent
deployments.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The test-cluster target references non-existent hack/e2e-cluster.bats
file. Remove it and its dependency from the test target.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Helm installs crds/ contents before processing templates, resolving the
chicken-and-egg problem where PackageSource CR validation fails because
its CRD hasn't been registered yet.
- Move definitions/ to crds/ in the installer chart
- Remove templates/crds.yaml (Helm auto-installs from crds/)
- Update codegen script to write CRDs to crds/
- Replace helm template with cat for static CRD manifest generation
- Remove pre-apply CRD workaround from e2e test
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Helm cannot validate PackageSource CR during install because the CRD
is part of the same chart. Pre-apply CRDs via helm template + kubectl
apply --server-side before running helm upgrade --install.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
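A sketch of that pre-apply step (chart path and release name are illustrative; the CRD filter assumes yq v4):

```bash
# Render the chart including CRDs, keep only the CRD documents, and
# register them server-side before the chart itself is installed.
helm template cozystack packages/core/installer --include-crds \
  | yq 'select(.kind == "CustomResourceDefinition")' \
  | kubectl apply --server-side -f -

# Now the PackageSource CR in the chart validates against a known CRD.
helm upgrade --install cozystack packages/core/installer -n cozy-system
```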
Replace pre-rendered static YAML application with direct helm chart
installation in e2e tests. The chart directory with correct values is
already present in the sandbox after pr.patch application.
- Remove CRD/operator artifact upload/download from CI workflow
- Remove copy-installer-manifest target from testing Makefile
- Use helm upgrade --install from local chart in e2e-install-cozystack.bats
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## Summary
Add a new `cilium-kilo` networking variant that combines Cilium as the
CNI with Kilo as the WireGuard mesh overlay. This replaces the
standalone kilo PackageSource with a unified variant under the
networking source.
## Changes
- Add `cilium-kilo` variant to `networking.yaml` PackageSource with
proper component ordering and dependencies
- Add `values-kilo.yaml` for Cilium to disable host firewall when used
with Kilo
- Remove standalone `kilo.yaml` PackageSource (now integrated into
networking source)
- Switch Kilo image to official
`ghcr.io/cozystack/cozystack/kilo:v0.8.2`
- Remove unused `podCIDR`/`serviceCIDR` options and `--service-cidr`
flag from Kilo chart
Switch kilo image to official ghcr.io/cozystack/cozystack/kilo:v0.8.2,
remove unnecessary enable-ipip-termination from cilium-kilo values,
and update platform source digest.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The podCIDR was incorrectly passed as --service-cidr to prevent
masquerading on pod traffic. This is unnecessary for multi-location
mesh and was a leftover from single-cluster assumptions.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The --service-cidr flag prevents masquerading for service IPs, but
service CIDRs are cluster-local and not useful for multi-location
mesh routing. Remove the serviceCIDR value and its template usage.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add a new networking variant that integrates Kilo with Cilium
pre-configured. Cilium is deployed with host firewall disabled and
enable-ipip-termination enabled, which are required for correct IPIP
encapsulation through Cilium's overlay.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Kilo is now integrated into the cilium-kilo networking variant instead
of being a separate package that users install manually.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Fix build error in `pkg/generated/applyconfiguration/utils.go` caused by
a reference to `testing.TypeConverter` which was removed in client-go
v0.34.1.
The root cause was that `hack/update-codegen.sh` called `gen_helpers`
and
`gen_openapi` but never called `gen_client`, so the applyconfiguration
code
was never regenerated after the client-go upgrade.
Changes:
- Fix `THIS_PKG` from `k8s.io/sample-apiserver` template leftover to
correct module path
- Add `kube::codegen::gen_client` call with `--with-applyconfig` flag
- Regenerate applyconfiguration (now uses `managedfields.TypeConverter`)
- Add tests for `ForKind` and `NewTypeConverter` functions
### Release note
```release-note
[maintenance] Regenerate applyconfiguration code for client-go v0.34.1 compatibility
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Updated backup class definitions example to reference MariaDB instead
of MySQL.
* **Chores**
* Updated code generation tooling and module dependencies to support
enhanced functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add pre-creation cleanup of backend deployment/service and NFS pod/PVC
in `run-kubernetes.sh`, so E2E test retries start fresh instead of
reusing stuck resources from a failed attempt
- Increase the tenant deployment wait timeout from 90s to 300s to handle
CI resource pressure, aligning with other timeouts in the same function
(control plane 4m, TCP 5m, NFS pod 5m)
## Context
When `kubernetes-previous` (or `kubernetes-latest`) E2E test fails at
the `kubectl wait deployment --timeout=90s` step, `set -eu` causes
immediate exit before cleanup. On retry, `kubectl apply` sees the stuck
deployment as "unchanged", making retries 2 and 3 guaranteed to fail
against the same stuck pod.
This was observed in [run
22076291555](https://github.com/cozystack/cozystack/actions/runs/22076291555)
where `kubernetes-previous` failed 3/3 attempts while passing in 5/6
other recent runs.
## Test plan
- [ ] E2E tests pass (kubernetes-latest and kubernetes-previous)
- [ ] First clean run: `--ignore-not-found` cleanup commands are no-ops,
no side effects
- [ ] Retry after failure: stale deployment is deleted before
re-creation, fresh pod is scheduled
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Improved test deployment reliability with more robust pre/post-test
resource cleanup (including storage-related resources) to prevent
leftovers from prior runs.
* Made environment configuration handling more consistent to avoid
misreads of cluster credentials.
* Increased initialization and load balancer wait timeouts to reduce
flaky failures during provisioning and validation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add DNS-1035 validation for Application names to prevent creation of
resources with invalid names that would fail Kubernetes resource
creation.
### Changes
- Add `ValidateApplicationName()` function using standard
`IsDNS1035Label` from `k8s.io/apimachinery`
- Call validation in REST API `Create()` method (skipped on `Update` —
names are immutable)
- Add name length validation: `prefix + name` must fit within Helm
release name limit (53 chars)
- Remove rootHost-based length validation — the underlying label limit
issue is tracked in #2002
- Remove `parseRootHostFromSecret` and `cozystack-values` Secret reading
at API server startup
- Add unit tests for both DNS-1035 format and length validation
### DNS-1035 Requirements
- Start with a lowercase letter `[a-z]`
- Contain only lowercase alphanumeric or hyphens `[-a-z0-9]`
- End with an alphanumeric character `[a-z0-9]`
Fixes #1538, closes #2001
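The real check lives in Go (`ValidateApplicationName` using `IsDNS1035Label`); this shell sketch only mirrors the regex and the 53-character release-name budget, with hypothetical example inputs:

```bash
# DNS-1035 label: starts with [a-z], middle [-a-z0-9], ends [a-z0-9].
is_valid_app_name() {
  local prefix="$1" name="$2" full="$1$2"
  [[ "$name" =~ ^[a-z]([-a-z0-9]*[a-z0-9])?$ ]] || return 1
  # prefix + name must fit within the Helm release name limit.
  [ "${#full}" -le 53 ]
}

is_valid_app_name "postgres-" "mydb" && echo ok        # passes
is_valid_app_name "postgres-" "1bad" || echo rejected  # starts with a digit
```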
### Release note
```release-note
[platform] Add DNS-1035 validation for Application names to prevent invalid tenant names
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Enforced DNS-1035 name format and Helm-compatible length limits for
applications during creation; name constraints surfaced early and
reflected in API validation.
* **Documentation**
* Updated tenant naming rules to DNS-1035, clarified hyphen guidance,
and noted maximum length considerations.
* **Tests**
* Added comprehensive tests covering format and length validation,
including many invalid and boundary cases.
* **API**
* OpenAPI/Swagger schemas updated to include name pattern and max-length
validation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Quote all tenantkubeconfig-${test_name} references in run-kubernetes.sh
for consistent shell scripting style. The only exception is line 195
inside a sh -ec "..." double-quoted string where inner quotes would
break the outer quoting.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
When the kubernetes E2E test fails at the deployment wait step, set -eu
causes immediate exit before cleanup. On retry, kubectl apply outputs
"unchanged" for the stuck deployment, making retries 2 and 3 guaranteed
to fail against the same stuck pod.
Add pre-creation cleanup of backend deployment/service and NFS test
resources using --ignore-not-found, so retries start fresh. Also
increase the deployment wait timeout from 90s to 300s to handle CI
resource pressure, aligning with other timeouts in the same function.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The applyconfiguration code referenced testing.TypeConverter from
k8s.io/client-go/testing, which was removed in client-go v0.34.1.
Root cause: hack/update-codegen.sh called gen_helpers and gen_openapi
but not gen_client, so applyconfiguration was never regenerated after
the client-go upgrade.
Changes:
- Fix THIS_PKG from sample-apiserver template leftover to correct
module path
- Add kube::codegen::gen_client call with --with-applyconfig flag
- Regenerate applyconfiguration (now uses managedfields.TypeConverter)
- Add tests for ForKind and NewTypeConverter functions
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Run make generate to bring generated files up to date with current
API types. This was pre-existing staleness unrelated to any code change.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
This PR prepares the release `v1.0.0-beta.5`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated all platform and system components to v1.0.0-beta.5, including
core services, controllers, dashboard, and networking utilities. Changes
include refreshed container image references across installer, platform
migrations, API services, backup systems, and object storage components
for improved stability and consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR adds the changelog for release `v1.0.0-beta.5`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0-beta.5.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added generic Kubernetes deployment support
* Added Cilium-compatible kilo variant
* **Improvements**
* Enhanced cluster autoscaler with enforced node group minimum sizes
* Upgraded dashboard to version 1.4.0
* **Breaking Changes**
* Updated VPC subnet configuration structure; automated migration
provided
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Extend the installer chart to support generic and hosted Kubernetes
deployments via the existing `cozystackOperator.variant` parameter
(introduced in #2034)
- For `variant=generic`: render ConfigMaps (`cozystack`,
`cozystack-operator-config`) and an optional Platform Package CR —
resources previously required to be created manually before deploying on
non-Talos clusters
- Add variant validation in `packagesource.yaml` to fail fast on typos
- Publish the installer chart as an OCI Helm artifact
## Motivation
Deploying Cozystack on generic Kubernetes (k3s, kubeadm, RKE2) currently
requires manually creating ConfigMaps before applying the rendered
operator manifest. This change makes the installer chart variant-aware
so that:
1. `helm template -s` workflow continues to produce correct rendered
manifests
2. `helm install --set cozystackOperator.variant=generic` becomes a
viable single-command deployment path for generic clusters
3. Required ConfigMaps and optional Platform Package CR are generated
from values, eliminating manual steps
## OCI Helm chart publishing
The installer chart is now packaged and pushed to the OCI registry as
part of the `image` build target via `make chart`. A `.helmignore` file
ensures only chart-relevant files are included in the published
artifact.
## Test plan
- [ ] `helm template` with `variant=talos` (default) renders: operator +
PackageSource
- [ ] `helm template` with `variant=generic` renders: operator + 2
ConfigMaps + PackageSource
- [ ] `helm template` with `variant=generic` + `platform.enabled=true`
renders: + Package CR
- [ ] `helm template` with `variant=hosted` renders: operator +
PackageSource
- [ ] Invalid variant value produces a clear error message
- [ ] `make manifests` generates all asset files
- [ ] `helm package` produces a clean chart without build artifacts
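A hedged sketch of the corresponding `helm template` invocations (chart path assumed):
```bash
# Default Talos variant: operator + PackageSource
helm template cozystack packages/core/installer

# Generic variant: additionally renders the two ConfigMaps
helm template cozystack packages/core/installer \
  --set cozystackOperator.variant=generic

# Generic variant with the optional Platform Package CR
helm template cozystack packages/core/installer \
  --set cozystackOperator.variant=generic \
  --set platform.enabled=true
```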
## Summary
- Add a new `cilium` variant to the kilo PackageSource
- When selected, kilo is deployed with `--compatibility=cilium` flag
- This enables Cilium-aware IPIP encapsulation where outer packets are
routed through Cilium's VxLAN overlay instead of the host network
## Changes
- `packages/core/platform/sources/kilo.yaml` — new `cilium` variant with
`values-cilium.yaml`
- `packages/system/kilo/values-cilium.yaml` — sets `kilo.compatibility:
cilium`
- `packages/system/kilo/templates/kilo.yaml` — conditional
`--compatibility` flag
## Test plan
- [x] Tested on a cluster with Cilium networking and kilo
`--compatibility=cilium`
- [x] Verified IPIP tunneling works through Cilium's VxLAN overlay
- [x] Confirmed `kubectl exec` connectivity to worker nodes across
WireGuard mesh
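One way to check the rendered flag locally (a sketch; file paths taken from the Changes list above):
```bash
helm template kilo packages/system/kilo \
  -f packages/system/kilo/values-cilium.yaml \
  | grep -- '--compatibility=cilium'
```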
Add a new "cilium" variant to the kilo PackageSource that deploys kilo
with --compatibility=cilium flag. This enables Cilium-aware IPIP
encapsulation, routing outer packets through Cilium's VxLAN overlay
instead of the host network.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Upgrade dashboard to version 1.4.0.
- Upgrade CRDs in CozyStack dashboard controller
- Add new CRD for CFOMapping
- Increase Ingress proxy timeouts so that WebSocket connections are not
terminated after 10 seconds of idle time
- Add a patch in the Dockerfile to fix a transient 404 "Factory not found"
error while loading factories on page open
<img width="2560" height="1333" alt="dashboard 1 4 0"
src="https://github.com/user-attachments/assets/4bc19d0e-58a6-4506-b571-07cc6bf4880a"
/>
Notable changes:
- cluster selector is now in the sidebar
- namespace selector moved from navigation to table block
### Release note
```release-note
[dashboard] Upgrade dashboard to version 1.4.0
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a new customization-mapping resource and integrated it into
dashboard navigation and static bootstrapping.
* **Improvements**
* Better routing between factory and detail pages; navigation sync
added.
* Unified dashboard link/URL placeholders for consistency.
* YAML editor now includes explicit API group/version context.
* Increased ingress/proxy timeouts for stability.
* Expanded frontend configuration for navigation, factories, and
customization behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Migrates VPC subnets definition from map format (`map[string]Subnet`) to
array format (`[]Subnet`) with an explicit `name` field. This aligns VPC
subnet definitions with the vm-instance format.
Before:
```yaml
subnets:
mysubnet0:
cidr: "172.16.0.0/24"
```
After:
```yaml
subnets:
- name: mysubnet0
cidr: "172.16.0.0/24"
```
Subnet ID generation remains unchanged — the sha256 input
(`namespace/vpcId/subnetName`) is the same, so existing resources are
not affected.
Includes a migration script (migration 30) that automatically converts
existing VPC HelmRelease values Secrets from map to array format. The
migration is idempotent and skips subnets that are already arrays or
null.
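A minimal sketch of the map-to-array conversion (yq v4 syntax assumed; the real migration additionally skips subnets that are already arrays or null):
```bash
# {mysubnet0: {cidr: ...}}  ->  [{name: mysubnet0, cidr: ...}]
yq eval '.subnets |= (to_entries | map({"name": .key} + .value))' values.yaml
```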
### Release note
```release-note
[vpc] Migrate VPC subnets from map to array format with automatic data migration
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Breaking Changes**
* VPC subnet configuration format changed from object-based to
array-based structure
* **Migration**
* Automatic migration included to convert existing VPC configurations to
new format
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
### Release note
```release-note
[]
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated Kilo container version from v0.7.1 to v0.8.0 with
corresponding digest update.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add migration script that converts VPC HelmRelease values from map
format to array format. The script discovers all VirtualPrivateCloud
HelmReleases, reads their values Secrets, and converts subnets using
yq. Idempotent: skips if subnets are already an array or null.
Bumps migration targetVersion from 30 to 31.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Change VPC subnets from map[string]Subnet to []Subnet with explicit
name field, aligning with the vm-instance subnet format.
Map format: subnets: {mysubnet: {cidr: "x"}}
Array format: subnets: [{name: mysubnet, cidr: "x"}]
Subnet ID generation (sha256 of namespace/vpcId/subnetName) remains
unchanged — subnetName now comes from .name field instead of map key.
ConfigMap output format stays the same.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
- Upgrade CRDs in CozyStack dashboard controller
- Add Ingress proxy timeouts for WebSocket to work without terminations
- Add CFOMapping custom resource
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## Summary
- Enable `enforce-node-group-min-size` option for the system
cluster-autoscaler chart
- This ensures node groups are always scaled up to their configured
minimum size, even when current workload doesn't require it
## Test plan
- [ ] Deploy cluster-autoscaler with updated values and verify the flag
is passed to the container
- [ ] Verify node groups scale up to min-size when below minimum
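A hedged way to verify the first test-plan item (deployment name and namespace assumed):
```bash
kubectl -n cozy-cluster-autoscaler get deployment cluster-autoscaler -o yaml \
  | grep -- --enforce-node-group-min-size
# expected: --enforce-node-group-min-size=true among the container args
```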
This PR adds the changelog for release `v1.0.0-beta.4`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0-beta.4.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added v1.0.0-beta.4 release changelog documenting major features,
improvements in VM architecture, storage/CSI, platform infrastructure,
RBAC, and backups, along with fixes, dependency updates, and development
enhancements.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR prepares the release `v1.0.0-beta.4`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated container image references across system components from
v1.0.0-beta.3 to v1.0.0-beta.4, including backup controllers, API
services, dashboard components, and networking modules.
* Updated image digests to reflect latest builds.
* Fixed KubeVirt CSI driver image reference to a specific version.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Add migration script (#28) that converts virtual-machine HelmReleases
into separate vm-disk + vm-instance HelmReleases, preserving disk data,
kube-ovn IPs, and LoadBalancer IPs.
## Changes
- Add migration 28 that discovers all VirtualMachine HelmReleases and
for each:
- Extracts values, saves kube-ovn IP and LoadBalancer IP
- Switches PV reclaim to Retain, stops and deletes the VM
- Creates new PVC bound to existing PV (with volumeMode detection)
- Creates vm-disk and vm-instance HelmReleases with proper values
- Pre-creates LoadBalancer service with metallb annotation if
external=true
- Cleans up old resources including protected services
- Temporarily disables CDI operator and apiserver webhooks during
DataVolume creation to bypass PVC existence validation
- Script is fully idempotent (safe to re-run if interrupted)
Replace `|| echo ""` with `|| true` to avoid newline bugs in variable
assignments. Switch `for x in $(cmd)` loops to `while read` for safer
iteration over kubectl output.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
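For illustration, the shell patterns referenced in the previous note (simplified):
```bash
# Before (fragile): word-splitting over command substitution, and
# `|| echo ""` can inject an empty item when the command fails:
#   for hr in $(kubectl get hr -o name || echo ""); do ... done

# After (safer): line-wise iteration; `|| true` tolerates failure
# under `set -e` without producing stray output
kubectl get helmreleases -A -o name 2>/dev/null | while read -r hr; do
  echo "processing ${hr}"
done || true
```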
The virtual-machine application is replaced by separate vm-disk and
vm-instance applications. Migration 28 handles the conversion of
existing VirtualMachine HelmReleases.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Disable both cdi-operator and cdi-apiserver during DataVolume creation
to prevent CDI webhook from rejecting PVC adoption. Also delete mutating
webhook configurations alongside validating ones. Fix cloud-init value
serialization to avoid spurious newline. Add volumeMode detection from PV.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Splits virtual-machine HelmReleases into separate vm-disk and vm-instance
components. Handles PV data preservation, kube-ovn IP migration, and
LoadBalancer IP pinning via metallb annotation for external VMs.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Port three features from virtual-machine to vm-instance:
- Add cpuModel field to specify CPU model without instanceType
- Allow switching between instancetype and custom resources via update hook
- Migrate from running boolean to runStrategy enum
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
The etcd operator depends on the vertical pod autoscaler in its Helm
chart. This PR adds the vertical pod autoscaler to the operator's
dependencies.
### Release note
```release-note
[etcd-operator] fix dependencies: add vertical-pod-autoscaler
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated etcd-operator configuration with an additional dependency
entry to enhance operational capabilities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Fixes a race condition where multiple Packages sharing the same
namespace could overwrite each other's PodSecurity Admission labels via
Server-Side Apply.
When two PackageSources deploy to the same namespace (e.g.
`cozystack.linstor` and `cozystack.linstor-scheduler` both use
`cozy-linstor`), each Package only knew about its own components. If a
Package without `privileged: true` reconciled the namespace, SSA would
remove the `pod-security.kubernetes.io/enforce=privileged` label set by
the other Package.
On Talos clusters (which default to `baseline` PodSecurity enforcement),
this caused privileged pods like LINSTOR satellites to be rejected.
The fix makes each Package check ALL PackageSources and their active
Packages when reconciling a namespace. A namespace is set to
`privileged` if ANY Package has a component with `privileged: true` in
it. This ensures a consistent, holistic decision regardless of
reconciliation order.
### Release note
```release-note
[platform] Fixed namespace PodSecurity label race condition when multiple Packages share a namespace
```
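A quick way to observe the resolved label on a shared namespace (sketch):
```bash
kubectl get namespace cozy-linstor \
  -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
# expected: privileged, as long as ANY Package in the namespace needs it
```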
## What this PR does
This PR removes the `cozystackAPIKind` toggle and makes the controller
always reconcile `cozystack-api` as a `Deployment`.
Why:
- `cozystack-api` was switched to `Deployment` in #2041.
- Keeping a DaemonSet/Deployment switch in `cozystack-controller` made
reconciliation mode configuration-dependent.
- In CI this could leave `cozystack-api` without the expected restart on
`ApplicationDefinition` changes, causing dynamic app resources (for
example `tenants`) to be missing temporarily.
Changes:
- Remove `cozystackAPIKind` from `cozystack-controller` chart values.
- Remove conditional `--reconcile-deployment` arg from controller
Deployment template.
- Remove `--reconcile-deployment` flag handling from
`cmd/cozystack-controller`.
- Simplify `ApplicationDefinitionReconciler` to always patch
`cozystack-api` `Deployment` pod template.
### Release note
```release-note
[platform] Remove cozystackAPIKind switch and always reconcile cozystack-api as Deployment
```
Instead of using per-Package SSA field owners (which is a workaround relying
on SSA mechanics), properly resolve whether a namespace should be privileged
by iterating all PackageSources and their active Packages. A namespace gets
the privileged PodSecurity label if ANY Package has a component with
privileged: true installed in it.
This fixes the race condition where Packages sharing a namespace (e.g.
linstor and linstor-scheduler in cozy-linstor) would overwrite each other's
labels depending on reconciliation order.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
When multiple Packages share a namespace (e.g. cozystack.linstor and
cozystack.linstor-scheduler both use cozy-linstor), the shared SSA field
owner "cozystack-package-controller" caused a race condition: the last
Package to reconcile would overwrite the namespace labels set by others.
This meant that if cozystack.linstor set pod-security.kubernetes.io/enforce=
privileged and then cozystack.linstor-scheduler reconciled without the
privileged flag, SSA would remove the label. On Talos clusters (which
default to "baseline" PodSecurity enforcement), this caused privileged
satellite pods to be rejected with PodSecurity violations.
Fix by using per-Package field owners (cozystack-package-{name}), so each
Package independently manages its own namespace labels without interfering
with other Packages sharing the same namespace.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Replace DaemonSet-based deployment of cozystack-api with a regular
Deployment
using Service `trafficDistribution: PreferClose`. Previously,
cozystack-api ran
as a DaemonSet on control-plane nodes with `internalTrafficPolicy:
Local` and
direct host kube-apiserver access, providing strict node-local routing
with no
fallback. The new approach prefers routing to the nearest pod but
gracefully
falls back to remote pods when the local one is unavailable.
Key changes:
- Always deploy as Deployment instead of conditional
DaemonSet/Deployment
- Use `trafficDistribution: PreferClose` on the Service (matching
seaweedfs pattern)
- Remove `KUBERNETES_SERVICE_HOST`/`KUBERNETES_SERVICE_PORT` override
- Replace hard `nodeSelector` with soft node affinity for control-plane
nodes
- Simplify the migration hook to always clean up leftover DaemonSet
### Release note
```release-note
[platform] Switch cozystack-api from DaemonSet to Deployment with trafficDistribution: PreferClose for improved resilience
```
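A minimal Service sketch illustrating the new routing mode (selector, namespace, and ports are assumptions, not the actual chart values):
```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: cozystack-api
  namespace: cozy-system            # assumed namespace
spec:
  selector:
    app: cozystack-api              # assumed selector
  trafficDistribution: PreferClose  # prefer nearby endpoints, fall back to remote ones
  ports:
    - port: 443
      targetPort: 9443              # assumed port
EOF
```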
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Breaking Changes**
* Removed localK8sAPIEndpoint configuration option; existing deployments
relying on this setting require reconfiguration.
* **Infrastructure Updates**
* Simplified deployment resource strategy and updated service traffic
distribution behavior to fixed configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add a wrapper CSI driver that imports upstream kubevirt-csi-driver as
a Go module, delegates RWO volumes to upstream, and handles RWX
Filesystem volumes via LINSTOR NFS-Ganesha
- Add managed NFS application (`apps/nfs`) for provisioning per-tenant
NFS volumes with 1:1 PVC mapping (enabling snapshots, expansion, and
cloning per-volume)
- Add NFS driver system package with extra StorageClass support for
tenant clusters
- Adapt e2e tests to use native RWX support through the standard
`kubevirt` StorageClass
## How it works
RWX Filesystem PVCs in tenant clusters are handled by the wrapper CSI
driver which:
1. Creates a DataVolume with RWX accessMode in the infra cluster
2. Extracts the NFS endpoint from the LINSTOR PV
(`linstor.csi.linbit.com/nfs-export`)
3. Creates CiliumNetworkPolicy for NFS egress with VMI ownerReferences
4. Mounts NFS (`nfsvers=4.2`) in tenant VMs at the node level
RWO volumes are fully delegated to the upstream kubevirt-csi-driver
(hotplug SCSI).
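For illustration, roughly what such a generated CiliumNetworkPolicy could look like (name, label, and IP are placeholders; real policies also carry VMI ownerReferences):
```bash
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: nfs-egress-example
spec:
  endpointSelector:
    matchLabels:
      vm.kubevirt.io/name: tenant-node-0   # assumed VMI label
  egress:
    - toCIDR:
        - 10.100.0.5/32                    # NFS endpoint taken from the LINSTOR PV
      toPorts:
        - ports:
            - port: "2049"                 # NFSv4.2
              protocol: TCP
EOF
```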
## Test plan
- [x] `go build` compiles wrapper CSI driver without errors
- [x] RWO volume provisioning still works (regression)
- [x] RWX Filesystem PVC creates DataVolume with RWX in infra cluster
- [x] NFS endpoint correctly extracted from LINSTOR PV
- [x] CiliumNetworkPolicy created with correct VMI ownerReferences
- [x] NFS mount works in tenant VMs (write + read data)
- [ ] e2e test passes (`run-kubernetes.sh`)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* ReadWriteMany (RWX) NFS storage support via KubeVirt CSI driver.
* Dynamic CiliumNetworkPolicy management to allow NFS server egress.
* NFS volume mounting at the node level with automatic mount detection.
* **Tests**
* End-to-end test sequence validating RWX NFS PVC binding, I/O, and
cleanup.
* **Chores**
* Updated CSI driver image reference and runtime image deps (nfs-utils,
fs tools).
* Added RBAC for network policy management.
* New runtime flags to toggle controller/node services; updated Helm
ignore patterns.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Currently the CozyStack operator removes the `suspend` field on every
change, making it impossible to suspend a resource. This breaks local
development for packages using `make apply`.
This PR preserves the `suspend` field of the existing HelmRelease
manifest.
### Release note
```release-note
[cozystack-operator] fix: preserve `suspend` field for `HelmRelease` in package reconciler
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Bug Fixes**
* Fixed issue where the suspend state of an existing HelmRelease was not
preserved during updates, ensuring manual suspension settings remain
intact across reconciliation operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Move nil check before req dereference in CreateVolume
- Scope CiliumNetworkPolicy endpointSelector to specific VMI
- Use vmNamespace from NodeId for VMI lookup instead of infraNamespace
- Log PVC lookup errors in ControllerExpandVolume
- Wrap CNP ownerReference updates in retry.RetryOnConflict
- Fix infraClusterLabels validation to check runControllerService flag
- Dereference nodeName pointer in error message
- Replace panic with klog.Fatal for consistent error handling
- Honor CSI readonly flag in NFS NodePublishVolume
- Log mount list errors in isNFSMount
- Reorder Dockerfile ENTRYPOINT after COPY for better layer caching
- Add cleanup on e2e test failure and --wait on pod deletion
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Remove separate NFS Application dependency from e2e test. The kubevirt
CSI driver wrapper now handles RWX Filesystem volumes natively - PVCs
with ReadWriteMany accessMode use the standard kubevirt StorageClass.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Implement a wrapper around upstream kubevirt-csi-driver that adds RWX
Filesystem volume support via LINSTOR NFS exports:
- CreateVolume: intercepts RWX+Filesystem requests, creates DataVolume
with explicit AccessMode=RWX and VolumeMode=Filesystem
- ControllerPublishVolume: waits for PVC bound, extracts NFS export URL
from infra PV, creates CiliumNetworkPolicy with VMI ownerReferences
- ControllerUnpublishVolume: manages CNP ownerReferences per-VMI
- ControllerExpandVolume: delegates to upstream, disables node expansion
for NFS volumes
- NodeStageVolume/NodePublishVolume: mounts NFS at target path
- NodeExpandVolume: no-op for NFS (LINSTOR handles resize)
Also includes:
- RBAC for CiliumNetworkPolicy management in infra cluster
- Explicit --run-node-service=false for controller deployment
- Explicit --run-controller-service=false for node DaemonSet
- nfs-utils in container image for NFS mount support
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Fixes Helm upgrade failure for `monitoring-agents` caused by unrendered
`{{ .Release.Namespace }}` template expressions in `values.yaml`.
Introduces `global.target` parameter to control the target namespace for
monitoring services (vminsert, vlogs). Default is `cozy-system`,
platform bundle passes `tenant-root`. Uses `tpl` in vmagent template to
render URLs containing template expressions.
### Release note
```release-note
[monitoring] Fix YAML parse error in monitoring-agents vmagent template
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced monitoring agents configuration to use centralized
target-based values, replacing namespace-based identifiers for improved
consistency across multiple environments.
* Updated monitoring component labeling and URL handling to leverage
parametrized configuration approach for greater flexibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Renames the `mysql` application to `mariadb` across the codebase. The
chart has always deployed MariaDB (via mariadb-operator), but was
incorrectly named "mysql", causing confusion.
- Rename `packages/apps/mysql` → `packages/apps/mariadb` with all
internal references updated
- Rename `packages/system/mysql-rd` → `packages/system/mariadb-rd` with
CRD kind `MySQL` → `MariaDB`
- Update platform source, bundle references, RBAC, e2e tests, and backup
controller tests
- Add platform migration 27 to auto-discover and rename all deployed
MySQL resources to mariadb
- Preserves PVC data via PV claimRef rebinding
- Handles protection-webhook and mariadb-operator webhook during
migration
- Idempotent: safe to re-run
Real MySQL CLI/config tool names (`mysqldump`, `[mysqld]`, `MYSQL_*` env
vars) are intentionally left unchanged.
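A simplified sketch of the PVC-data preservation step (all names hypothetical):
```bash
# Keep the volume when its old PVC is deleted
kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl delete pvc "mysql-data-$app" -n "$ns"

# Drop the stale binding so the PV can be claimed under the new name
kubectl patch pv "$pv" --type=json -p '[
  {"op":"remove","path":"/spec/claimRef/uid"},
  {"op":"remove","path":"/spec/claimRef/resourceVersion"}]'

# The new mariadb PVC then binds to the retained PV
kubectl apply -f mariadb-data-pvc.yaml -n "$ns"
```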
### Release note
```release-note
[apps] Rename mysql application to mariadb. A platform migration automatically renames all deployed MySQL resources. No user action required.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added automated migration capability from MySQL to MariaDB with data
persistence.
* **Tests**
* Added end-to-end tests for MariaDB deployment and validation.
* **Chores**
* Updated platform components and configuration to use MariaDB as the
default database solution.
* Migrated all internal references, documentation, and build
configurations from MySQL to MariaDB.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Replace the DaemonSet with direct host API access with a regular
Deployment using Service trafficDistribution: PreferClose. This provides
prefer-local routing to the nearest cozystack-api pod, with fallback
to remote pods when the local one is unavailable.
- Replace DaemonSet/Deployment toggle with always-Deployment
- Replace internalTrafficPolicy: Local with trafficDistribution: PreferClose
- Remove KUBERNETES_SERVICE_HOST/PORT override (use default kubernetes service)
- Replace hard nodeSelector with soft nodeAffinity for control-plane nodes
- Simplify migration hook to always clean up DaemonSet if present
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Migration 23 already removes the `cozystack-resource-definition-crd`
HelmRelease, but existing Helm release secrets
for `*-rd` applications still reference the deleted
`CozystackResourceDefinition` CRD, causing upgrade failures.
This adds a step to delete those stale Helm secrets so that Flux can
recreate the releases cleanly.
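Roughly, the added cleanup step (a sketch relying on Helm v3's convention of labelling release Secrets with `owner=helm` and `name=<release>`):
```bash
kubectl get secret -A -l owner=helm -o json \
  | jq -r '.items[]
      | select((.metadata.labels.name // "") | endswith("-rd"))
      | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read -r ns name; do
      kubectl delete secret -n "$ns" "$name"
    done
```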
### Release note
```release-note
[platform] Clean up stale Helm release secrets for *-rd applications during migration
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced system cleanup routine to remove deprecated Kubernetes
secrets during version updates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add platform migration that auto-discovers all deployed MySQL
HelmReleases and renames their resources to use the mariadb prefix.
The migration handles PVC data preservation via PV claimRef rebinding,
temporarily disables protection-webhook for protected resource deletion,
and scales mariadb-operator once for all instances.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The mysql chart actually deploys MariaDB via mariadb-operator, but was
incorrectly named "mysql". Rename all references to use the correct
"mariadb" name across the codebase.
Changes:
- Rename packages/apps/mysql -> packages/apps/mariadb
- Rename packages/system/mysql-rd -> packages/system/mariadb-rd
- Rename platform source and bundle references
- Update CRD kind from MySQL to MariaDB
- Update RBAC, e2e tests, backup controller tests
- Keep real MySQL CLI/config tool names unchanged (mysqldump, [mysqld], etc.)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Root-host validation for Tenant names is no longer needed here.
The underlying issue (namespace.cozystack.io/host label exceeding
63-char limit) will be addressed in #2002 by moving the label
to an annotation.
Name length validation now only checks the Helm release name
limit (53 - prefix length), which applies uniformly to all
application types.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
Separates Piraeus CRD management from the `piraeus-operator` chart into
the dedicated `piraeus-operator-crds` chart. This prevents
`piraeus-operator` from deleting CRDs during upgrades.
Changes:
- Add `installCRDs` toggle to `piraeus-operator-crds` chart
- Update `piraeus-operator` Makefile to move CRDs template on chart
update
- Add migration 27 to reassign Helm ownership on existing Piraeus CRDs
and clean up stale Helm secrets
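The ownership reassignment boils down to rewriting Helm's adoption metadata on each Piraeus CRD, roughly as follows (CRD name illustrative, namespace assumed):
```bash
crd=linstorclusters.piraeus.io
kubectl annotate crd "$crd" --overwrite \
  meta.helm.sh/release-name=piraeus-operator-crds \
  meta.helm.sh/release-namespace=cozy-linstor
kubectl label crd "$crd" --overwrite app.kubernetes.io/managed-by=Helm
```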
### Release note
```release-note
[linstor] Move CRDs installation logic to piraeus-operator-crds chart
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* CRD installation can now be controlled via configuration settings
(enabled by default).
* System version upgraded to 28.
* **Chores**
* Added migration script to handle system upgrade from version 27 to 28,
including metadata updates and configuration management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Remove the FerretDB managed application from Cozystack. This includes
the application Helm chart, resource definition, platform source, PaaS
bundle entry, RBAC clusterrole entry, and e2e test. Historical migration
scripts are left intact for upgrade compatibility.
### Release note
```release-note
[ferretdb] Removed FerretDB managed application
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Removed FerretDB managed database service and associated Helm chart,
documentation, and test components from the platform.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Merges three separate operator deployment templates
(`cozystack-operator.yaml`,
`cozystack-operator-generic.yaml`, `cozystack-operator-hosted.yaml`)
into a single
template with variant selection via `cozystackOperator.variant`
parameter in `values.yaml`.
This resolves conflicts when running `make apply` from
`packages/core/installer/`, which
previously rendered all three Deployment manifests simultaneously. Now
only the selected
variant is rendered (default: `talos`).
The root `Makefile` manifests target passes `--set
cozystackOperator.variant=<variant>`
to generate the separate asset files as before.
### Release note
```release-note
[installer] Merge operator deployment templates into a single variant-based template with `cozystackOperator.variant` parameter (talos/generic/hosted)
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Consolidated operator deployment templates into a unified
configuration system
* Implemented variant-specific handling for Talos, Generic, and Hosted
deployment environments
* Updated operator asset references and naming conventions across build,
release, and testing workflows
* Improved deployment configuration consistency and maintainability
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Replace unrendered Helm template expressions in values.yaml with
global.target parameter. Pass tenant-root from platform bundle,
default to cozy-monitoring. Use tpl in vmagent template to render URLs.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Migrate Helm ownership labels and annotations on Piraeus CRDs from
piraeus-operator to piraeus-operator-crds, and delete stale Helm
secrets to prevent piraeus-operator from removing CRDs on upgrade.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Update piraeus-operator Makefile to move CRDs template into
piraeus-operator-crds chart on update, and add installCRDs toggle.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add -talos suffix to the default variant output file for consistency
with -generic and -hosted variants. Update all references in CI
workflows, e2e tests, upload scripts, and testing Makefile.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Consolidate three separate operator deployment templates (talos, generic,
hosted) into a single template with variant selection via values.yaml
parameter. This resolves conflicts when running `make apply` from the
installer package, which previously rendered all three deployments.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Previously, the debug log collection script that fired when CI failed
treated Packages and PackageSources as namespaced resources and, due to
incorrect parsing, failed to `kubectl describe` and `kubectl get -oyaml`
them correctly. Additionally, the script did not read the logs of init
containers. This patch fixes both issues.
### Release note
```release-note
[ci] Improvements to cozyreport.sh (ci log collection script): fix
retrieval of Package and PackageSource details, consider initContainers
as well as containers, when fetching logs of errored pods.
```
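A sketch of both fixes (simplified; API group and jsonpath are assumptions):
```bash
# Packages and PackageSources are cluster-scoped, so no -n flag
kubectl describe packages.cozystack.io "$pkg"
kubectl get packagesources.cozystack.io "$src" -oyaml

# Fetch logs from init containers as well as regular containers
kubectl get pod -n "$ns" "$pod" -o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{end}{range .spec.containers[*]}{.name}{"\n"}{end}' \
  | while read -r c; do
      kubectl logs -n "$ns" "$pod" -c "$c" || true
    done
```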
## What this PR does
This patch improves the naming conventions used in Cozystack's RBAC
resources. It follows the k8s system convention of using colons as
separators in RBAC resource names (e.g. system:nodes:<nodename>) and
renames some default tenant clusterroles to a scheme like
cozy:tenant:admin.
### Release note
```release-note
[rbac] Use a more structured naming convention for Cozystack's RBAC
resources.
```
## What this PR does
This patch adds cluster roles that get deployed with the backup
controller and which follow aggregation rules to automatically let users
work with resources from the backups.cozystack.io API group as soon as
the CRDs and controller are installed.
### Release note
```release-note
[backups] Add RBAC resources to let users work with backups.
```
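A hedged sketch of such an aggregated ClusterRole (name, resources, and the target role are assumptions):
```bash
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cozy:backups:edit-example
  labels:
    # picked up automatically by the built-in "edit" aggregation
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
rules:
  - apiGroups: ["backups.cozystack.io"]
    resources: ["backups", "backupstrategies"]   # assumed resource names
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
EOF
```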
Keep the 1-minute timeout for other components (cilium, coredns, csi,
vsnap-crd) to preserve fast failure detection, and apply the 5-minute
timeout specifically to ingress-nginx which needs it after the
hostNetwork to NodePort migration.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The 1-minute timeout for waiting on HelmRelease readiness is too short
for ingress-nginx after its migration from hostNetwork to NodePort
Service, causing consistent E2E failures on kubernetes-latest.
Increase the timeout to 5 minutes to allow sufficient time for all
components to become ready.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
This PR adds the changelog for release `v1.0.0-beta.3`.
✅ Changelog has been automatically generated in
`docs/changelogs/v1.0.0-beta.3.md`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added comprehensive changelog for v1.0.0-beta.3 release documenting
new applications, system components, networking features, virtual
machine functionality, backup capabilities, platform improvements, bug
fixes, updated dependencies, development enhancements, CI/CD changes,
and documentation updates with full contributor credits and detailed
comparison reference.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR prepares the release `v1.0.0-beta.3`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Updated container image versions from v1.0.0-beta.2 to v1.0.0-beta.3
across multiple system and application components.
* Refreshed image digests for cluster-autoscaler, kubevirt-csi-driver,
dashboard services, and backup controllers.
* Updated supporting infrastructure component references including
SeaweedFS, KubeOVN, Linstor, and Grafana components.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Fix goimports order: duration before validation/field
- Show rootHost in error messages only for Tenant kind where it
actually affects the length calculation
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Return field.ErrorList from validateNameLength for consistent
apierrors.NewInvalid error shape (was NewBadRequest)
- Add klog warning when YAML parsing fails in parseRootHostFromSecret
- Fix maxHelmReleaseName comment to accurately describe Helm convention
- Add note that root-host changes require API server restart
- Replace interface{} with any throughout openapi.go and rest.go
- Remove trailing blank line in const block
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Remove patchObjectMetaNameValidation and patchObjectMetaNameValidationV2
functions that were modifying the global ObjectMeta schema. This patching
affected ALL resources served by the API server, not just Application
resources. Backend validation in Create() is sufficient for enforcing
name constraints.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Replace custom DNS-1035 regex with k8s.io/apimachinery IsDNS1035Label.
Remove hardcoded maxApplicationNameLength=40 from both validation and
OpenAPI — length validation is now handled entirely by validateNameLength
which computes dynamic limits based on Helm release prefix and root-host.
Fix README to reflect that max length depends on cluster configuration.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Skip DNS-1035 and length validation on Update since Kubernetes names
are immutable — validating would block updates to pre-existing resources
with non-conforming names. Replace fmt.Printf with klog for structured
logging consistency.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Add DNS-1035 format validation to Update path (was only in Create)
- Simplify Secret reading by reusing existing scheme instead of
creating a separate client
- Add nil secret test case for parseRootHostFromSecret
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add protection against negative or zero maxLen when release prefix or
root host are too long, returning a clear configuration error instead of
a confusing "name too long" message. Add corresponding test cases.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Read root-host from cozystack-values secret at API server startup
and use it to compute maximum allowed name length for applications.
For all apps: validates prefix + name fits within the Helm release
name limit (53 chars). For Tenants: additionally checks that the
host label (name + "." + rootHost) fits within the Kubernetes label
value limit (63 chars).
This replaces the static 40-char limit with a dynamic calculation
that accounts for the actual cluster root host length.
Ref: https://github.com/cozystack/cozystack/issues/2001
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
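As a quick illustration of the length rule above (prefix hypothetical):
```bash
prefix="kubernetes-"              # hypothetical Helm release prefix
max_name=$(( 53 - ${#prefix} ))   # 53-char release limit minus prefix length
echo "$max_name"                  # -> 42 characters available for the name
```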
Add pattern and maxLength constraints to ObjectMeta.name in OpenAPI schema.
This enables UI form validation when openapi-k8s-toolkit supports it.
- Pattern: ^[a-z]([-a-z0-9]*[a-z0-9])?$ (DNS-1035)
- MaxLength: 40
Depends on: cozystack/openapi-k8s-toolkit#1
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Clarify DNS-1035 naming rules:
- Must start with lowercase letter
- Allowed characters: a-z, 0-9, hyphen
- Must end with letter or number
- Maximum 40 characters
Change wording from "not allowed" to "discouraged" for dashes
since the validation technically permits them.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add validation to ensure Application names (including Tenants) conform
to DNS-1035 format. This prevents creation of resources with names
starting with digits, which would cause Kubernetes resource creation
failures (e.g., Services, Namespaces).
DNS-1035 requires names to:
- Start with a lowercase letter [a-z]
- Contain only lowercase alphanumeric or hyphens [-a-z0-9]
- End with an alphanumeric character [a-z0-9]
Also fixes broken validation.go that referenced non-existent internal
types (apps.Application, apps.ApplicationSpec).
Fixes: https://github.com/cozystack/cozystack/issues/1538
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
### Release note
```release-note
Update Kube-OVN to v1.15.3
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* DNS name resolver support for improved service discovery.
* Non-primary CNI mode configuration.
* **Improvements**
* Enhanced RBAC and tighter secret-access scoping.
* Added ephemeral-storage resource limits across components.
* Better dual-stack and single-stack networking handling.
* Expanded CRD schemas with richer descriptions and status fields.
* **Updates**
* KubeOVN chart and app versions updated to v1.15.3.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Update the MongoDB application logo to the official MongoDB icon with
the leaf symbol on a green radial gradient background, replacing the
previous simplified version.
### Release note
```release-note
[mongodb] Updated MongoDB logo to the official icon
```
Replace the old MongoDB logo with the official one featuring
the MongoDB leaf icon on a green radial gradient background.
Co-authored-by: Viktoriia Kvapil <159528100+kvapsova@users.noreply.github.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Remove the FerretDB managed application, its resource definition,
platform source, RBAC entry, and e2e test. Historical migration
scripts are left intact for upgrade compatibility.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Add a new system package `cluster-autoscaler` that supports multiple
cloud providers (Hetzner and Azure) for automatic node scaling of
the Cozystack management cluster.
Key features:
- Single base package with provider-specific variants via PackageSource
- Auto-update capability from upstream Helm chart
- RBAC rules for leader election (coordination.k8s.io/leases)
- Comprehensive documentation for Hetzner setup with Talos Linux
The Hetzner documentation covers:
- Talos image creation via rescue mode snapshot
- vSwitch (private network) configuration
- Correct Talos machine config structure (nodeLabels, nodeIP,
cloud-provider)
- Testing with pod anti-affinity
- Troubleshooting common issues
- Kilo mesh networking integration
### Release note
```release-note
[system] Add cluster-autoscaler package with support for Hetzner Cloud and Azure providers
```
Documentation moved to the website repository as a separate PR.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Cluster-autoscaler does not require privileged installation.
Remove the unnecessary privileged: true setting from both
Hetzner and Azure PackageSource definitions.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
This patch installs the kubevirt-velero-plugin, so that backing up a
kubevirt resource such as a VMI will automatically include related
items, such as datavolumes, and, eventually, all the way down to pods
and volumes.
### Release note
```release-note
[backups] Include the kubevirt-velero-plugin with the Velero package.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added KubeVirt plugin integration: Velero now provides comprehensive
backup and restore support for KubeVirt-managed virtual machines and
associated resources. The plugin is automatically installed and
configured during deployment, enabling unified disaster recovery for
both containerized applications and virtualized workloads.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Enables the installation of the backupstrategy controller by default.
### Release note
```release-note
[backups] Install the backupstrategy controller by default.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Backup strategy component is now included by default in all system
installations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This patch adds another label selector for resources being backed up by
the default Velero strategy for virtual machines, ensuring that the
actual VM and VMI are captured by the selector.
### Release note
```release-note
[backups] Refined the label selector in the Velero VM backup strategy to
capture resources previously missed.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Expanded backup scope to include additional virtual machine and
storage resources.
* Added configurable timeout for snapshot operations (10 minutes).
* Enhanced resource selection logic for more flexible backup filtering.
* **Configuration Changes**
* Updated backup strategy configuration for improved resource coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Remove duplicate `dashboards.list` from `extra/monitoring`
- Dashboard files should only exist in `system/monitoring`
- Dashboards are created by `system/monitoring` when deployed via
HelmRelease
## Test plan
- [ ] Deploy extra/monitoring and verify GrafanaDashboard resources are
created by system/monitoring
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Removed monitoring dashboard configuration entries from the
deployment.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds optional node-lifecycle-controller component to local-ccm package
that automatically deletes unreachable NotReady nodes from the cluster.
This solves the problem of "zombie" node objects left behind when
cluster autoscaler deletes cloud instances but the node object remains
in Kubernetes.
**Features:**
- Monitors nodes matching a configurable label selector
- Deletes nodes that are NotReady for configurable duration (default:
5m)
- Verifies unreachability via ICMP ping before deletion
- Supports protected labels to prevent deletion of specific nodes
- Leader election for HA deployment
**Usage:**
```yaml
nodeLifecycleController:
enabled: true
nodeSelector: "node.kubernetes.io/instance-type" # optional
protectedLabels: "kilo.squat.ai/leader" # optional
```
Disabled by default.
### Release note
```release-note
[local-ccm] Add optional node-lifecycle-controller to automatically cleanup unreachable NotReady nodes
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Introduced optional Node Lifecycle Controller component for managing
node readiness states with configurable timeouts, health monitoring, and
reconciliation intervals. Supports leader election, dry-run modes, and
customizable node selectors and protected labels for operational
flexibility. Disabled by default.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Updates kilo to v0.7.0 from the cozystack fork, replacing the local
image build (with patches applied via Dockerfile) with a pre-built image
from `ghcr.io/cozystack/cozystack/kilo`. This simplifies the build
process and removes the need for maintaining local patches.
Additionally, exposes the `--mtu` flag in kilo's `values.yaml` to allow
overriding the WireGuard interface MTU. The default value is set to
`auto`, letting kilo determine the optimal MTU automatically.
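For reference, a minimal values override might look like this (a sketch
showing only the new knob):
```yaml
# kilo values.yaml
mtu: auto   # default: kilo picks the MTU; set an explicit value (e.g. 1400)
            # for networks with a lower underlying MTU
```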
### Release note
```release-note
[kilo] Update kilo to v0.7.0 from cozystack fork; add configurable `mtu` parameter (default: auto) for WireGuard interface
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added configurable MTU support with automatic detection
* Added internal CIDR filtering capability for network mesh
configuration
* **Updates**
* Updated kilo image version to v0.7.0
* Transitioned to pre-built remote images
* **Improvements**
* Refined IP overlap and containment warning logic for better
diagnostics
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
[cilium] change cilium-operator replicas to 1
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated operator configuration settings. Added support for configuring
the number of operator replicas, enabling adjustment of deployment
specifications based on infrastructure requirements and resource
availability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds system package for
[clustersecret-operator](https://github.com/sap/clustersecret-operator)
using Helm Chart v0.3.68
### Release note
```release-note
[clustersecret-operator] add option to deploy clustersecret-operator v0.3.68
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added ClusterSecret operator for managing cluster-scoped secrets
across multiple namespaces
* Secrets can be distributed to selected namespaces using namespace
selectors
* Supports template-based secret creation with customizable data
* Includes validation webhooks and admission control for ClusterSecret
resources
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
The cozystack-values secret is created as part of the cozystack-basics
chart and is required for cozystack to install properly. However, there
is a race condition between it and the lineage-controller-webhook: the
secret may be blocked from being created if the mutating webhook
configuration is already set up. Adding a system label to this secret
drops it from consideration by the webhook.
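A minimal sketch of how such an exclusion typically looks on the webhook
configuration (the webhook and label names here are hypothetical, not
taken from the actual manifests):
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: lineage-controller-webhook
webhooks:
  - name: lineage.cozystack.io        # hypothetical webhook name
    objectSelector:
      matchExpressions:
        # Objects carrying the system label are never sent to the webhook,
        # so the secret can be created even before the webhook is reachable.
        - key: cozystack.io/system    # hypothetical label key
          operator: DoesNotExist
    # ... clientConfig, rules, etc.
```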
### Release note
```release-note
[platform] Resolve race condition between the system cozystack-values
secret and the lineage webhook by adding a label that causes the webhook
to ignore this secret.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated system manifest metadata configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Enable insecure TLS verification for Keycloak communication to support
environments with self-signed certificates:
- **keycloak-configure**: Switch ClusterKeycloak to use internal service
URL (`http://keycloak-http.cozy-keycloak.svc:8080`) and enable
`insecureSkipVerify: true`
- **dashboard**: Add `--ssl-insecure-skip-verify=true` flag to
oauth2-proxy
### Release note
```release-note
[keycloak-configure,dashboard] Enable insecure TLS verification for Keycloak by default to support self-signed certificates
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Added an option to skip OIDC TLS certificate verification (disabled by
default).
* Authentication proxy can now be configured to omit SSL verification
when enabled.
* Keycloak connection switched to a non-TLS endpoint and TLS
verification handling updated.
* Removed the Keycloak authorization-services toggle from configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This patch installs the kubevirt-velero-plugin, so that backing up a
kubevirt resource such as a VMI will automatically include related
items, such as datavolumes, and, eventually, all the way down to pods
and volumes.
### Release note
```release-note
[backups] Include the kubevirt-velero-plugin with the Velero package.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
## What this PR does
Add field `cpuModel` to specify the VirtualMachine CPU model without
using `InstanceType`.
### Release note
```release-note
[vm] add field `cpuModel` to specify CPU model without usage of KubeVirt InstanceType
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added CPU model configuration parameter for virtual machines, allowing
users to specify the desired CPU model for their VM instances.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Dashboard files should only be in system/monitoring.
The dashboards are created by system/monitoring when deployed
via HelmRelease from extra/monitoring.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## What this PR does
### Release note
```release-note
[]
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated code ownership governance (no user-facing impact).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add authentication.oidc.insecureSkipVerify option to platform chart
with default=false. The flag is now conditionally included in
oauth2-proxy args only when explicitly enabled.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
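A sketch of the conditional templating (values path and flag as named
above; the surrounding args structure is an assumption):
```yaml
# oauth2-proxy container args
args:
  - --provider=oidc
  {{- if .Values.authentication.oidc.insecureSkipVerify }}
  - --ssl-insecure-skip-verify=true
  {{- end }}
```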
## Summary
- Create new `piraeus-operator-crds` package containing all piraeus CRDs
- Add `piraeus-operator-crds` as dependency for `piraeus-operator` in
linstor PackageSource
- Set `privileged: true` for CRDs package to ensure namespace gets
correct PodSecurity labels
- Disable `installCRDs` in piraeus-operator values
## Motivation
Helm does not reliably install all CRDs from the `templates/` directory
when the crds.yaml file is large. This causes
`linstorsatellites.piraeus.io` CRD to be missing, which breaks satellite
pod creation.
Separating CRDs into a dedicated package (similar to cert-manager-crds,
prometheus-operator-crds) ensures reliable CRD installation.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Introduced a separate component for managing Piraeus operator CRDs
with explicit dependency ordering.
* Updated operator configuration to disable built-in CRD installation
and depend on the new dedicated CRD component.
* Established installation dependency chain to ensure components
initialize in the correct sequence.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Increase TLS certificate duration from 90 days to 10 years
- Adjust renewBefore from 15 to 30 days
- Prevents certificate expiration issues in SeaweedFS
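Expressed as cert-manager Certificate fields, the change corresponds
roughly to the following (a sketch, assuming the SeaweedFS certificates
are cert-manager resources; the resource name is illustrative):
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: seaweedfs-tls   # illustrative
spec:
  duration: 87600h    # 10 years (was 2160h / 90 days)
  renewBefore: 720h   # 30 days (was 360h / 15 days)
```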
## Test plan
- [ ] Deploy SeaweedFS with `global.enableSecurity=true`
- [ ] Verify certificates are issued with 10 year duration
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added RestoreJob resource and end-to-end restore workflows with Velero
integration, including polling and status tracking.
* Introduced Velero restoreSpec to strategy templates and a
virtual-machine backup strategy template.
* Enhanced backup plans UI with backup class and schedule fields.
* **Bug Fixes**
* Enforced non-empty backupClassName and immutability for BackupJob.
* **Chores**
* Removed BackupJob webhook implementation and its tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds `startupProbe` to both `bff` and `web` containers in the dashboard
deployment. On slow hardware, kubelet kills containers because the
`livenessProbe` only allows ~33 seconds for startup. The `startupProbe`
gives containers up to 60 seconds to start before `livenessProbe` kicks
in.
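A sketch of the resulting probe configuration (the health endpoint and
port name are assumptions; failureThreshold × periodSeconds yields the
60-second startup budget):
```yaml
containers:
  - name: bff
    startupProbe:
      httpGet:
        path: /healthz   # assumed endpoint
        port: http       # assumed port name
      failureThreshold: 30
      periodSeconds: 2   # 30 × 2s = up to 60s before livenessProbe takes over
```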
### Release note
```release-note
[dashboard] Add startupProbe to bff and web containers to prevent restarts on slow hardware
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Added health monitoring during service startup for core application
services. Services now perform health checks to verify proper
initialization and readiness before handling requests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Implemented by upgrade hook atomically patching VM resource
## What this PR does
Allow switching between `instancetype` and custom resources for VM
### Release note
```release-note
[vm] allow switching between instancetype and custom resources
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Improved virtual machine resource configuration with more flexible CPU
and memory management
* Added support for transitioning between instancetype and custom
resource configurations
* Enhanced VM update process to automatically detect and handle
resource-related changes, including removal of outdated configurations
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Automatically enable `enableGatewayAPI` in cert-manager when the
Gateway API addon is enabled
- Uses the same `define` + `mergeOverwrite` pattern as Cilium for
consistency
- User can still override via `valuesOverride`
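A sketch of the pattern (template and key names are illustrative;
`valuesOverride` is the user-facing knob named above):
```yaml
{{- define "certManager.defaultValues" }}
config:
  enableGatewayAPI: true
{{- end }}

# In the HelmRelease: start from the addon defaults and let user overrides win.
values:
  {{- mergeOverwrite (include "certManager.defaultValues" . | fromYaml) (.Values.valuesOverride | default dict) | toYaml | nindent 2 }}
```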
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Improved certificate manager configuration handling by enhancing the
integration of default settings with custom user overrides in Kubernetes
deployments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Remove authorizationServicesEnabled as it's incompatible with public
clients (it requires a service account, which public clients don't have).
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Switch ClusterKeycloak to use internal service URL instead of external
ingress, and enable insecureSkipVerify for environments with self-signed
certificates.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## What this PR does
Migrates usage of deprecated field `running` to `runStrategy`
### Release note
```release-note
[vm] migrate usage of deprecated field `running` to `runStrategy`
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* VM running states now support Always, Halted, Manual, RerunOnFailure,
and Once strategies
* Configure CPU, memory, and socket allocation for virtual machines
* Add GPU support to virtual machines
* Define network subnets for virtual machines
* Configure storage classes and options for system disks
* Set SSH keys and cloud-init parameters for initialization
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Remove `cozystack.cozystack-controller` from monitoring package
dependencies
- Allows monitoring to work without cozystack-engine being enabled
## Motivation
The monitoring package depends on `cozystack.cozystack-controller` which
is only installed as part of `cozystack-engine`. This prevents
monitoring from becoming ready when cozystack-engine is not enabled.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Simplified monitoring default: reduced installed sub-components and
top-level dependencies for a leaner deployment.
* Removed generation of several automated workload-monitoring manifests
to streamline resource set.
* **New Features**
* Added a curated list of monitoring dashboards for cluster, networking,
storage, databases, control-plane, ingress, observability tooling, and
key services.
* Added three ExternalName services to expose tenant monitoring
endpoints.
* **Improvements**
* Made monitoring agents namespace-aware by templating
tenant/multi-namespace references for metrics and logging endpoints.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Expose the --mtu flag in values.yaml to allow overriding the default
WireGuard interface MTU (1420). This is needed for environments where
the underlying network has a lower MTU, such as Azure VMs (eth0 MTU
1400), to avoid packet fragmentation in the WireGuard tunnel.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Add Qdrant vector database as a new managed application, following the
HelmRelease-based vendoring pattern (same as nats, ingress, seaweedfs).
Changes:
- `packages/system/qdrant/` — vendored upstream Qdrant Helm chart
- `packages/apps/qdrant/` — wrapper chart creating Flux HelmRelease CR
- `packages/system/qdrant-rd/` — ApplicationDefinition for
`apps.cozystack.io/v1alpha1/Qdrant` CRD
- `hack/e2e-apps/qdrant.bats` — E2E test
Features:
- Single replica or clustered mode (automatic based on replica count)
- Persistent storage with configurable size and storage class
- Resource presets (nano to 2xlarge)
- API key authentication (auto-generated)
- Optional external LoadBalancer access
- Dashboard integration with WorkloadMonitor and ServiceMonitor
### Release note
```release-note
[qdrant] Add Qdrant vector database as a managed application
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added Qdrant vector database as a managed service, enabling users to
deploy and manage Qdrant instances with configurable replicas, storage,
and resource presets.
* Introduced support for external access, persistent storage, API key
management, and metrics monitoring integration.
* Added end-to-end testing for Qdrant deployments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add PackageSource and PaaS bundle entry so the platform deploys
qdrant-rd, which provides the ApplicationDefinition that registers
the Qdrant Kind in the API server.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
Previously a namespaced role was created per tenant and access level.
Since these roles are all identical, it's sufficient to create a cluster
role per access level and just create namespaced rolebindings to these
cluster roles per tenant. This will enable aggregation rules. E.g. if a
new API group is installed, such as backups.cozystack.io, a new
clusterrole can be created for managing this API group with a label like
rbac.cozystack.io/aggregate-to-admin. Smart use of aggregation rules
will enable automatic granting of access rights not just to admin, but
to super-admin too, and there will be no need to update every single
tenant.
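A sketch of how such an aggregated grant could look (resource names are
illustrative; the label key is the one mentioned above):
```yaml
# ClusterRole for a newly installed API group, picked up by aggregation:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backups-admin   # illustrative
  labels:
    rbac.cozystack.io/aggregate-to-admin: "true"
rules:
  - apiGroups: ["backups.cozystack.io"]
    resources: ["*"]
    verbs: ["*"]
---
# Per-access-level ClusterRole; its rules are filled in automatically
# by the controller manager from all matching labeled roles.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tenant-admin    # illustrative
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.cozystack.io/aggregate-to-admin: "true"
rules: []
```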
### Release note
```release-note
[tenant,rbac] Use ClusterRoles with aggregationRules instead of roles
per every tenant.
```
Add topology.kubernetes.io/zone node label alongside kilo.squat.ai/location
for standard Kubernetes topology awareness across Hetzner and Azure zones.
Update Azure docs with correct Package resource format (spec.components),
replace hardcoded values with placeholders, and remove environment-specific
infrastructure references.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
This patch adds a conditional that runs the build on a statically
defined VM, to which the maintainers have SSH access, when the pull
request has a `debug` label. This is useful for debugging failing
workflows when the diagnostic info from the pipeline is insufficient.
### Release note
```release-note
[ci] Run builds on a static VM with SSH access if the PR has a debug
label.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
This release contains no user-facing changes. Internal workflow
adjustments have been made to optimize development infrastructure.
* **Chores**
* Updated CI/CD pipeline configuration for improved test execution
efficiency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
WorkloadMonitor resources are only needed when extra/monitoring is
deployed. Move them from system/monitoring to extra/monitoring package.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
- Replace hardcoded tenant-root with {{ .Release.Namespace }} in vmagent
and fluent-bit configs
- Add ExternalName services in cozystack-basics to redirect monitoring
traffic from cozy-monitoring to tenant-root when engine is deployed
- Add missing components to monitoring PackageSource
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
When the Gateway API addon is enabled, automatically configure
cert-manager with enableGatewayAPI: true. Uses the same default
values + mergeOverwrite pattern as Cilium for consistency.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Kubelet kills bff and web containers on slow hardware because the
livenessProbe only allows 33 seconds for startup. Add startupProbe
with failureThreshold=30 and periodSeconds=2, giving containers up
to 60 seconds to start before livenessProbe kicks in.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add dashboards.list that defines which Grafana dashboards to create.
Without this file, no GrafanaDashboard resources were being generated.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Add optional node-lifecycle-controller that automatically deletes
unreachable NotReady nodes from the cluster. This solves the problem
of "zombie" node objects left behind when cluster autoscaler deletes
cloud instances.
Features:
- Monitors nodes matching a label selector
- Deletes nodes that are NotReady for configurable duration
- Verifies unreachability via ICMP ping before deletion
- Supports protected labels to prevent deletion of specific nodes
- Leader election for HA deployment
Disabled by default. Enable with:
nodeLifecycleController:
  enabled: true
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Remove components that are already defined in separate PackageSources
to avoid race conditions and HelmRelease conflicts:
- vertical-pod-autoscaler
- postgres-operator
- grafana-operator
- victoria-metrics-operator
- prometheus-operator-crds
These components should be installed via their dedicated PackageSources.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Helm does not reliably install all CRDs from templates/ directory,
particularly when the crds.yaml file is large. This causes
linstorsatellites.piraeus.io CRD to be missing, breaking satellite pod
creation.
Changes:
- Create new piraeus-operator-crds package with all piraeus CRDs
- Add piraeus-operator-crds as dependency for piraeus-operator
- Set privileged: true for CRDs package to ensure namespace has correct
PodSecurity labels from the start
- Disable installCRDs in piraeus-operator since CRDs come from separate
package
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
The monitoring package depends on cozystack.cozystack-controller which
is only installed as part of cozystack-engine. This prevents monitoring
from becoming ready when cozystack-engine is not enabled.
Remove this dependency to allow monitoring to work independently.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## What this PR does
Use GitHub Copilot CLI for automatic changelog generation on tagged
releases.
### How it works
When a new version tag is pushed, the `generate-changelog` job in
`tags.yaml`:
1. Checks out the `main` branch with full history and tags
2. Verifies that a changelog file doesn't already exist in
`docs/changelogs/`
3. Installs GitHub Copilot CLI (`@github/copilot` npm package)
4. Runs Copilot CLI in non-interactive mode (`-p` flag) with
`--allow-all-tools --allow-all-paths` to generate the changelog
following the instructions in `docs/agents/changelog.md`
5. Commits the generated file to a `changelog-vX.Y.Z` branch and opens a
PR to `main`
### Authentication
- `COPILOT_GITHUB_TOKEN` secret — fine-grained PAT with **"Copilot
Requests: Read"** account permission, used to authenticate Copilot CLI
- `GH_PAT` secret — used by `gh` CLI inside the Copilot session to query
PR authors via GitHub API
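Condensed, the job might look something like this (a sketch: step
details and the prompt wording are assumptions, while the package,
flags, and secrets are the ones named above):
```yaml
generate-changelog:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        ref: main
        fetch-depth: 0   # full history and tags
    - run: npm install -g @github/copilot
    - name: Generate changelog
      env:
        COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
        GH_TOKEN: ${{ secrets.GH_PAT }}
      run: |
        copilot -p "Write docs/changelogs/${GITHUB_REF_NAME}.md following docs/agents/changelog.md" \
          --allow-all-tools --allow-all-paths
```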
### Release note
```release-note
[ci] Replaced Gemini with GitHub Copilot CLI for automatic changelog generation on release tags
```
## Summary
- Fix invalid PromQL expression in `CNPGClusterOffline` alert rule that
causes vmalert pods to crash
- Fix PromQL logic so the alert fires correctly when cluster is offline
## Problem
The `CNPGClusterOffline` alert in
`packages/system/postgres-operator/alerts/cnpg-default-alerts.yaml` had
two issues:
1. **Syntax error** - Extra closing parenthesis causing vmalert pods to
crash:
```promql
sum by (namespace, pod) (cnpg_collector_up)) OR on() vector(0) == 0
^^-- extra )
```
2. **Logic error** - After removing the extra parenthesis, the
expression still had incorrect logic:
```promql
sum by (namespace, pod) (cnpg_collector_up) OR on() vector(0) == 0
```
This evaluates as `cnpg_collector_up OR (vector(0) == 0)` which fires
when the cluster is **online** (wrong behavior).
## Solution
Fix both issues by properly wrapping the OR expression in parentheses:
```promql
(sum by (namespace, pod) (cnpg_collector_up) OR on() vector(0)) == 0
```
This correctly fires when `cnpg_collector_up` is 0 or absent (cluster
actually offline).
## Test plan
- [ ] Verify vmalert pods start successfully after applying the fix
- [ ] Verify the alert rule is properly loaded
- [ ] Verify alert only fires when CNPG cluster is actually offline
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Change TLS certificate duration from 90 days to 10 years to prevent
certificate expiration issues. Also adjust renewBefore from 15 to 30 days.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
## What this PR does
When OIDC is disabled, the dashboard's token-proxy now properly
validates bearer tokens against the k8s API's JWKS url.
### Release note
```release-note
[dashboard] Verify bearer tokens against the issuer's JWKS url.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Implemented JWKS-based JWT token verification for enhanced security.
* Added conditional role-based access control for non-OIDC deployments.
* **Chores**
* Updated authentication dependencies and modernized token validation.
* Removed legacy external token validation mechanism.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Remove extra closing parenthesis in the CNPGClusterOffline alert expression
that causes vmalert pods to crash with "bad prometheus expr" error.
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
## What this PR does
### Release note
```release-note
- Add monitoring as a system chart
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Introduced comprehensive monitoring stack with Grafana, Victoria
Metrics operators, Prometheus operator, and vertical pod autoscaler for
enhanced system observability and performance metrics.
* **Chores**
* System upgraded to version 27 with automatic monitoring resource
reorganization for improved infrastructure management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add migration 26 that re-labels monitoring resources to be owned by
monitoring-system HelmRelease instead of monitoring. This allows
seamless transition from direct resource management in extra/monitoring
to delegated management via system/monitoring chart.
Migration steps:
- Find all monitoring HelmReleases in tenant namespaces
- Suspend HelmRelease to prevent reconciliation
- Delete helm secrets to orphan resources
- Relabel all resources to monitoring-system
- Delete suspended HelmRelease
Also updates:
- monitoring-application.yaml: add monitoring-system component
- helmrelease.yaml: reference monitoring-system ExternalArtifact
- values.yaml: bump targetVersion to 25
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## Summary
- Configure CoreDNS chart to create `kube-dns` ServiceAccount matching
Kubernetes bootstrap ClusterRoleBinding
- Fixes RBAC errors (`Failed to watch`) when CoreDNS pods restart
## Problem
Kubernetes bootstrap creates a `ClusterRoleBinding: system:kube-dns`
that references `ServiceAccount: kube-dns` in `kube-system`. However,
the coredns chart was using the `default` ServiceAccount because
`serviceAccount.create` was not enabled.
This caused CoreDNS pods to fail with `[ERROR] plugin/kubernetes: Failed
to watch` errors after restarts, as they lacked RBAC permissions to
watch the Kubernetes API. Old pods worked due to cached data, but new
pods failed after rollout.
## Solution
Add ServiceAccount configuration to
`packages/system/coredns/values.yaml`:
```yaml
serviceAccount:
  create: true
  name: kube-dns
```
## Test plan
- [x] Verify ServiceAccount `kube-dns` is created: `kubectl get sa -n
kube-system kube-dns`
- [x] Verify deployment uses correct ServiceAccount: `kubectl get
deployment -n kube-system coredns -o
jsonpath='{.spec.template.spec.serviceAccountName}'` → `kube-dns`
- [x] Restart CoreDNS pods and verify all pods are Ready
- [x] Check logs show no RBAC errors
- [x] Test DNS resolution works
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated DNS service configuration with service account settings.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add migration 25 for v1.0 upgrade cleanup:
- Remove legacy `cozystack` installer deployment
- Remove legacy `cozystack-assets` statefulset
- Remove `bootbox` and `bootbox-rd` HelmReleases (will be recreated by
new platform chart if enabled)
- Revert migration 23 to original state (only removes
cozystack-resource-definition-crd)
- Update targetVersion from 24 to 26
- Fix jq syntax error in `hack/migrate-to-version-1.0.sh`
Migration 23 was already executed on existing clusters at version 23+,
so adding cleanup there would never run. The new migration 25 ensures
proper cleanup during v1.0 upgrade.
## Test plan
- [ ] Upgrade from v0.4x to v1.0
- [ ] Verify old `cozystack` deployment is deleted
- [ ] Verify old `cozystack-assets` statefulset is deleted
- [ ] Verify old `bootbox` and `bootbox-rd` HelmReleases are deleted
- [ ] Verify platform installs successfully after migration
- [ ] Run `./hack/migrate-to-version-1.0.sh` without jq errors
Migration 23 was already executed on existing clusters, so the legacy
installer removal never ran. This moves the cleanup to a new migration 25
which will execute during upgrade.
Changes:
- Revert migration 23 to original state (only removes cozystack-resource-definition-crd)
- Add migration 25 for v1.0 upgrade cleanup:
- Remove legacy cozystack installer deployment
- Remove legacy cozystack-assets statefulset
- Remove bootbox and bootbox-rd HelmReleases
- Update targetVersion from 24 to 26
- Fix jq syntax error in migrate-to-version-1.0.sh
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## Summary
- Remove duplicate `bootbox-system` component from `bootbox-application`
PackageSource (was duplicating system/bootbox installation)
- Auto-create `cozystack.bootbox-application` Package when bootbox is
enabled in `bundles.enabledPackages`
- This ensures `bootbox-rd` (ApplicationDefinition) is properly
installed as a dependency before `cozystack.bootbox`
## Test plan
- [ ] Enable bootbox in `bundles.enabledPackages`
- [ ] Verify both `cozystack.bootbox-application` and
`cozystack.bootbox` Packages are created
- [ ] Verify `bootbox-rd` (ApplicationDefinition) is installed
- Fix docker build context for migrations image
- Fix yq flags for compatibility with Mike Farah's yq v4
- Add platform image to main build target
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add cleanup of old installer components during migration:
- Delete cozystack deployment
- Delete cozystack-assets statefulset
This ensures no conflict between old installer and new operator
when upgrading to v1.0 architecture.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
- Remove duplicate bootbox-system component from bootbox-application
PackageSource (was duplicating system/bootbox installation)
- Auto-create cozystack.bootbox-application Package when bootbox is
enabled in bundles.enabledPackages
- This ensures bootbox-rd (ApplicationDefinition) is properly installed
as a dependency before cozystack.bootbox
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
### Release note
```release-note
- add BackupClass Velero and strategy for VM
- dashboard fixes for BackupClass and simplified Plan/BackupJob API
```
## What this PR does
### Release note
```release-note
Implement restoreJob controller and velero strategy.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a RestoreJob controller with manager registration to handle
restore operations.
* Added Velero-based and Job-based restore workflows with lifecycle
handling, requeueing, and polling.
* Velero templates now include a RestoreSpec field to configure
restores.
* **Bug Fixes / Reliability**
* New helpers to mark BackupJob and RestoreJob failures and reliably
update status.
* **Documentation / CRD**
* RestoreJob CRD gains a status subresource and a Phase print column.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
- Removes `hostNetwork` for the Ingress NGINX Controller in managed
Kubernetes clusters
- Adds a Service of type NodePort for load balancing
### Release note
```release-note
[kubernetes] use NodePort service instead of hostNetwork for ingress-nginx
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Configuration**
* Updated ingress-nginx controller to use NodePort exposure method
(ports 30000 for HTTP, 30001 for HTTPS) when configured as "Proxied."
* Removed unnecessary hostNetwork configuration for the controller.
* Service port mappings updated to align with new NodePort settings.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds separate values to keycloak branding.
### Release note
```release-note
Added separate values to keycloak branding
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Enhanced branding configuration application with improved conditional
logic to ensure branding is only applied when appropriate configuration
values are present.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Aligns MongoDB configuration with postgres and mysql patterns by
introducing a separate `databases` section.
**Before:**
```yaml
users:
  myuser:
    db: mydb
    roles:
      - name: readWrite
        db: mydb
      - name: dbAdmin
        db: mydb
```
**After:**
```yaml
users:
  myuser: {}
databases:
  mydb:
    roles:
      admin:
        - myuser
      readonly:
        - reader
```
Changes:
- Add separate `databases` section with `roles.admin` and
`roles.readonly`
- Simplify `users` to only contain optional password field
- All users authenticate via `admin` database (MongoDB production best
practice)
- `admin` role maps to `readWrite` + `dbAdmin` permissions
- `readonly` role maps to `read` permission
## Migration
Includes migration script (`migrations/24`) that automatically converts
existing MongoDB HelmReleases:
- Detects old format users (with `db` and `roles` fields)
- Converts `readWrite`/`dbAdmin` roles → `databases.{db}.roles.admin`
- Converts `read` role → `databases.{db}.roles.readonly`
- Preserves user passwords in new `users` format
- Skips HelmReleases that don't have old format
### Release note
```release-note
[mongodb] Unified users and databases configuration to match postgres and mysql patterns. Users now defined separately from databases, with role assignments in the databases section. Includes automatic migration for existing deployments.
```
This PR prepares the release `v1.0.0-beta.2`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated multiple system component images from alpha to beta releases
(v1.0.0-beta.2)
* Updated container image digests across various services for
consistency and stability
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Service will be recreated
## What this PR does
Allow changing the field `external` after creation.
The Service will be deleted and recreated with the correct type.
### Release note
```release-note
[vm] recreate service on `external` change to allow changing this field after creation
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Detects Service type changes during virtual machine updates and
automatically recreates the Service as needed; update workflow now runs
a job to remove and recreate Services before reconciliation.
* **Chores**
* Updated RBAC to grant the update workflow permissions to manage
(delete) the targeted Service.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add detailed documentation for setting up cluster-autoscaler with Hetzner
Cloud and Talos Linux, including:
- Talos image creation via rescue mode
- vSwitch (private network) configuration
- Correct Talos machine config structure for nodeLabels and nodeIP
- Package deployment with RBAC rules for leader election
- Testing with pod anti-affinity
- Configuration reference tables (env vars, server types, regions)
- Troubleshooting section for common issues
- Kilo mesh networking integration
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add coordination.k8s.io/leases permissions to Role via additionalRules.
This fixes leader election failures in the cluster-autoscaler.
Also add documentation for Hetzner Cloud setup with Talos Linux.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
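A sketch of the `additionalRules` entry described above (the exact verb
list is an assumption; leader election needs at least get/create/update
on leases):
```yaml
additionalRules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update"]
```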
Add cluster-autoscaler system package with support for multiple cloud
providers. Each provider has its own PackageSource and values file,
allowing simultaneous deployment in multi-cloud setups.
Supported providers:
- Hetzner Cloud
- Azure
Each instance uses a unique leader-elect-resource-name to prevent
conflicts when running multiple autoscalers in the same cluster.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The Kubernetes bootstrap creates a ClusterRoleBinding 'system:kube-dns'
that references ServiceAccount 'kube-dns' in 'kube-system'. However,
the coredns chart was using the 'default' ServiceAccount because
serviceAccount.create was not enabled.
This caused CoreDNS pods to fail with 'Failed to watch' errors after
restarts, as they lacked RBAC permissions to watch the Kubernetes API.
Configure the chart to create the 'kube-dns' ServiceAccount, which
matches the expected binding from Kubernetes bootstrap.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
Add instance profile metadata to workload monitor
Currently getWorkloadMetadata only extracts the
kubevirt.io/cluster-instancetype-name annotation from VMI objects. This
misses the instance profile information
(kubevirt.io/cluster-preference-name), which is needed to track
workload sizing/configuration alongside the instance type.
This change adds extraction of the cluster instance profile annotation
and exposes it as the
workloads.cozystack.io/kubevirt-vmi-instance-profile label on collected
metrics.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Workload monitoring now detects kubevirt instance profile annotations
and adds a matching workload label
(workloads.cozystack.io/kubevirt-vmi-instance-profile) when present.
* Enriched workload metadata for virtualized workloads.
* No other behavior or control-flow changes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
[]
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Configured cluster for high availability with multiple replicas
* Implemented rolling update strategy to ensure safer deployments
* Enhanced health constraints to maintain cluster stability during
updates
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Enable Cozystack deployment on generic Kubernetes clusters (kubeadm,
k3s, RKE2, etc.) by making Talos-specific configurations conditional and
adding a new `isp-full-generic` bundle variant.
## Problem 1: Cilium uses hardcoded Talos API endpoint
**Issue**: Cilium is configured to connect to Kubernetes API at
`localhost:7445` (Talos KubePrism). On non-Talos clusters, the API
server runs on standard `<node-ip>:6443`.
**Solution**: Add `networking.apiServer.host` and
`networking.apiServer.port` to platform values, passed to Cilium via
bundle template.
**Why this approach**: Allows per-deployment configuration while keeping
Talos defaults for backwards compatibility.
## Problem 2: Cilium cgroup automount disabled
**Issue**: `values-talos.yaml` sets `cgroup.autoMount.enabled: false`
because Talos mounts cgroups. On Ubuntu/Debian, Cilium needs to mount
cgroups itself.
**Solution**: Add `networking.cilium.cgroup.autoMount` configuration and
create `cilium-generic` / `kubeovn-cilium-generic` networking variants
without `values-talos.yaml`.
**Why this approach**: Separate variants avoid complex conditional logic
in templates and make the difference explicit.
## Problem 3: KubeOVN helm lookup fails on fresh clusters
**Issue**: KubeOVN chart uses `lookup` to find control-plane nodes.
During initial deployment, lookup returns empty results causing
installation failure.
**Solution**: Add `networking.kubeovn.MASTER_NODES` parameter to
explicitly pass node IPs when needed.
**Why this approach**: Helm lookup is unreliable during initial
deployment; explicit configuration is more predictable.
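Taken together, Problems 1–3 come down to new platform values along
these lines (a sketch; the addresses are examples and the nesting
follows the key paths named above):
```yaml
networking:
  apiServer:
    host: "10.0.0.10"   # example: non-Talos API server address (instead of localhost:7445)
    port: 6443
  cilium:
    cgroup:
      autoMount: true   # let Cilium mount cgroups itself on Ubuntu/Debian
  kubeovn:
    MASTER_NODES: "10.0.0.10,10.0.0.11,10.0.0.12"   # example: explicit control-plane IPs
```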
## Problem 4: LinstorSatelliteConfiguration breaks non-Talos nodes
**Issue**: `cozystack-talos` LinstorSatelliteConfiguration removes DRBD
init containers (`drbd-module-loader`, `drbd-shutdown-guard`). These are
required on Ubuntu/Debian where DRBD modules aren't pre-loaded.
**Solution**: Add `talos.enabled` value (default: true) and wrap the
configuration in `{{- if .Values.talos.enabled }}`.
**Why this approach**: Conditional rendering is cleaner than node
selectors (Talos has no unique default label) and maintains backwards
compatibility.
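The conditional wrap itself is minimal (a sketch; only the structure is
shown):
```yaml
{{- if .Values.talos.enabled }}
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: cozystack-talos
spec:
  # Talos-specific tweaks, e.g. dropping the drbd-module-loader
  # and drbd-shutdown-guard init containers
  # ...
{{- end }}
```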
## Problem 5: linstor-scheduler image tag invalid for k3s
**Issue**: k3s reports version as `v1.35.0+k3s1`. The `+` character is
invalid in Docker image tags, causing `InvalidImageName` error for
kube-scheduler.
**Solution**: Strip distribution suffix using `regexReplaceAll "\\+.*$"`
in the helper template.
**Why this approach**: k3s version `v1.35.0+k3s1` is based on upstream
`v1.35.0`, so stripping the suffix produces the correct upstream image
tag. This is automatic and doesn't require manual version specification.
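A sketch of the helper (the template name is illustrative; the regex is
the one described above):
```yaml
{{- define "linstor-scheduler.imageTag" -}}
{{- /* v1.35.0+k3s1 -> v1.35.0: "+" is not a valid image-tag character */ -}}
{{- regexReplaceAll "\\+.*$" .Capabilities.KubeVersion.Version "" -}}
{{- end -}}
```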
## Problem 6: No bundle for non-Talos deployments
**Issue**: `isp-full` bundle always uses Talos-specific networking and
doesn't pass `talos.enabled=false` to linstor.
**Solution**: Add `isp-full-generic` bundle variant that uses
`kubeovn-cilium-generic` networking and passes `talos.enabled=false` to
linstor.
**Why this approach**: Dedicated bundle variant makes deployment
straightforward — users just set `bundle-name: isp-full-generic` without
needing post-install patches.
## Files Changed
| Path | Change |
|------|--------|
| `packages/core/platform/values.yaml` | Add apiServer, cilium.cgroup, kubeovn.MASTER_NODES config |
| `packages/core/platform/sources/networking.yaml` | Add generic variants |
| `packages/core/platform/templates/bundles/system.yaml` | Add isp-full-generic bundle |
| `packages/system/linstor/values.yaml` | Add talos.enabled |
| `packages/system/linstor/templates/satellites-talos.yaml` | Conditional rendering |
| `packages/system/linstor-scheduler/.../templates/_helpers.tpl` | Strip version suffix |
| `packages/system/linstor-scheduler/.../templates/deployment.yaml` | Use helper for image tag |
## Testing
Tested on k3s v1.35.0+k3s1 with Ubuntu 24.04 LTS. All HelmReleases
deploy successfully without post-install patches.
## Breaking Changes
None. Existing Talos deployments continue to work unchanged with default
values.
---
Closes #1933
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added support for generic (non-Talos) Kubernetes clusters with new
isp-full-generic platform bundle
* Introduced generic and hosted deployment variants for
cozystack-operator
* Added generic networking variants with Cilium and KubeOVN support
* Expanded IAAS, PAAS, and NAAS bundle support for generic deployments
* Improved backup retention configuration
* **Bug Fixes**
* Fixed Kubernetes version handling to work with distribution-specific
suffixes
* **Chores**
* Made LINSTOR Talos satellite support optional
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add an external IPs count (LoadBalancer services with an assigned IP
address) to the Tenant details page.
<img width="2560" height="1330" alt="external ip count on tenant details
page"
src="https://github.com/user-attachments/assets/f2256629-3815-4892-b80d-e150d0fd3908"
/>
### Release note
```release-note
[dashboard] add external IPs count (count of Services of type LoadBalancer with an assigned IP) to the Tenant details page
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* External IPs count now shown in the Tenant application Details tab — a
left-column item displays the number of external IPs for LoadBalancer
services.
* Application status now exposes an External IPs Count metric for Tenant
applications, so UI and APIs can surface the number of LoadBalancer
external IPs without changing other behaviors.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
- Fix `Used` column to show correct data
- Remove table from `Info` page
- Add patch to let flatMap work with jsonpath
<img width="1800" height="940" alt="resource quota table example"
src="https://github.com/user-attachments/assets/421d1d32-fdc8-4269-838f-b02db0ed6712"
/>
### Release note
```release-note
[dashboard] fix resource quota table on tenant details page
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed resource quota data fetching for tenant resources to use correct
namespace references from status fields.
* **Refactor**
* Refined dashboard layout for resource quota displays and improved
custom field resolution with enhanced dynamic key mapping for complex
data structures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Removed the reference to .Values.networking.apiServer, which was removed
from values.yaml. Use empty defaults and let the ConfigMap or
apiServerEndpoint parsing override them.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Extract common system packages shared between isp-full and isp-full-generic
into a helper template `cozystack.platform.system.common-packages`.
Packages moved to helper:
- kubeovn-webhook, kubeovn-plunger, cozy-proxy
- multus, metallb, reloader
- linstor-scheduler, snapshot-controller
Packages NOT in helper (differ between variants):
- networking (variant differs: kubeovn-cilium vs kubeovn-cilium-generic)
- linstor (talos.enabled differs)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Per review feedback, create separate manifest files for different
deployment targets instead of using templated values:
- cozystack-operator.yaml (Talos): hardcoded localhost:7445 (KubePrism)
- cozystack-operator-generic.yaml: reads from cozystack-operator-config
ConfigMap (user must create before applying)
- cozystack-operator-hosted.yaml: no env override (uses in-cluster SA)
This keeps the installation flow clean: users apply the manifest matching
their deployment target without needing to modify the original files.
Build system updated to generate all three variants.
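A sketch of how the generic manifest can read the endpoint from that
ConfigMap; `KUBERNETES_SERVICE_HOST`/`KUBERNETES_SERVICE_PORT` are the
standard client-go in-cluster overrides, while the key names are assumptions:

```yaml
env:
  - name: KUBERNETES_SERVICE_HOST
    valueFrom:
      configMapKeyRef:
        name: cozystack-operator-config
        key: kubernetesServiceHost    # key name assumed
  - name: KUBERNETES_SERVICE_PORT
    valueFrom:
      configMapKeyRef:
        name: cozystack-operator-config
        key: kubernetesServicePort    # key name assumed
```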
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Remove apiServer and cilium.cgroup from user-facing values.yaml
- Hardcode Talos-specific values (localhost:7445, cgroup autoMount: false)
directly in isp-full bundle template
- isp-full-generic already hardcodes its own values
- Improve MASTER_NODES comment explaining helm lookup behavior
This hides implementation details from users as requested in review.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The CRD upgrade pre-install job has image compatibility issues.
CRDs are installed as part of the Helm chart install anyway.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
rancher/kubectl is a minimal image without shell.
alpine/k8s includes kubectl and shell for running upgrade jobs.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Bitnami images are forbidden and bitnamilegacy/kubectl:1.35 doesn't exist.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The enabledPackages list must use fully qualified package names
(e.g., cozystack.velero) to match the helper template check.
Also add backupstrategy-controller which is required by backup-controller.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add velero to enabledPackages list for isp-full-generic to enable
backup functionality by default.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Allow iaas, paas, naas bundles when using isp-full-generic variant
- Enable all bundles by default in values-isp-full-generic.yaml
- Update bundle templates to accept isp-full-generic alongside isp-full
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The cozystack-values secret was using the default publishing.host value
from values.yaml (example.org) instead of reading from the cozystack
ConfigMap where the actual root-host is configured.
Add ConfigMap lookup to apps.yaml to read root-host from cozystack
ConfigMap, falling back to chart values if not present.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Same issue as lineage-controller-webhook: DaemonSet uses hardcoded
nodeSelector with empty value, but k3s/kubeadm use value "true".
Changes:
- Add nodeSelector to cozystack-api values (default: empty for Talos)
- Update deployment template to use configurable nodeSelector
- Pass nodeSelector for both cozystack-api and lineage-controller-webhook
in isp-full-generic bundle variant
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
On Talos, control-plane nodes have label node-role.kubernetes.io/control-plane
with empty value. On generic k8s (k3s, kubeadm), the same label has value "true".
The lineage-controller-webhook DaemonSet was hardcoded to use empty value,
causing 0 pods scheduled on k3s clusters.
Changes:
- Add nodeSelector to lineage-controller-webhook values (default: empty for Talos)
- Update DaemonSet template to use configurable nodeSelector
- Pass nodeSelector with value "true" for isp-full-generic bundle variant
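For illustration, the two selector shapes differ only in the label value:

```yaml
# Talos (default): label present with an empty value
nodeSelector:
  node-role.kubernetes.io/control-plane: ""
---
# k3s / kubeadm (isp-full-generic): same label, value "true"
nodeSelector:
  node-role.kubernetes.io/control-plane: "true"
```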
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
k3s and kubeadm set node-role.kubernetes.io/control-plane=true,
while Talos uses an empty value. KubeOVN node selector needs
the exact label value to match.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
For isp-full-generic bundle variant, the template now looks up the
cozystack ConfigMap to read cluster-specific settings instead of only
relying on chart default values.
This fixes the issue where Cilium was trying to connect to localhost:7445
(Talos KubePrism default) instead of the actual API server endpoint.
Values read from ConfigMap:
- api-server-endpoint: parsed to extract host/port for Cilium
- ipv4-pod-cidr, ipv4-pod-gateway, ipv4-svc-cidr, ipv4-join-cidr: for KubeOVN
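A hedged sketch of the lookup-and-parse step, rendered into Cilium's
k8sServiceHost/k8sServicePort values (variable handling illustrative;
assumes a well-formed `host:port` endpoint):

```yaml
{{- $cm := lookup "v1" "ConfigMap" "cozy-system" "cozystack" }}
{{- $ep := dig "data" "api-server-endpoint" "" $cm }}
{{- if $ep }}
{{- /* e.g. "https://10.0.0.1:6443" -> host "10.0.0.1", port "6443" */}}
{{- $hostPort := (urlParse $ep).host }}
{{- $host := index (splitList ":" $hostPort) 0 }}
{{- $port := index (splitList ":" $hostPort) 1 }}
k8sServiceHost: {{ $host }}
k8sServicePort: {{ $port }}
{{- end }}
```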
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Remove trailing dash from {{- end -}} constructs in _helpers.tpl to
preserve trailing newlines after each rendered package YAML document.
Without this fix, consecutive packages are rendered without proper
separators:

    variant: default---
    apiVersion: cozystack.io/v1alpha1

This causes YAML parse errors: "mapping key 'apiVersion' already
defined", because the YAML parser sees a single document with duplicate
keys instead of two separate documents.
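A minimal illustration of the trimming difference (template name and
content illustrative):

```yaml
{{- define "cozystack.platform.package" }}
apiVersion: cozystack.io/v1alpha1
kind: Package
variant: default
{{- end }}
```

Writing `{{- end -}}` here would also trim the newline after
`variant: default`, so the next document's `---` separator fuses onto it
as `variant: default---`.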
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The isp-full-generic variant should have iaas, paas, and naas bundles
disabled by default since these bundles require 'isp-full' or 'isp-hosted'
variants as per validation rules in their respective templates.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Create dedicated values file for isp-full-generic bundle variant with
bundles.system.variant set to "isp-full-generic" instead of "isp-full".
This fixes the ArtifactGenerator generating identical digests for both
isp-full and isp-full-generic variants, which caused the wrong bundle
template to be selected.
Also adds isp-full-generic variant to installer PackageSource definition.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The isp-full-generic bundle now automatically extracts host and port
from publishing.apiServerEndpoint URL for Cilium and KubeOVN
configuration. This eliminates the need for manual networking.apiServer
values when api-server-endpoint is set in the ConfigMap.
Changes:
- Parse apiServerEndpoint URL to extract host and port for Cilium
- Auto-detect MASTER_NODES for KubeOVN from apiServerEndpoint if not
explicitly set
- Hardcode cgroup.autoMount=true for generic k8s (non-Talos)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add isp-full-generic bundle variant for deploying Cozystack on non-Talos
Kubernetes clusters (kubeadm, k3s, RKE2, etc.).
This variant:
- Uses kubeovn-cilium-generic networking (no values-talos.yaml)
- Passes talos.enabled=false to linstor (keeps DRBD init containers)
- Supports all standard networking configuration via platform values
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add cilium-generic and kubeovn-cilium-generic variants that exclude
values-talos.yaml. These variants are suitable for kubeadm, k3s, RKE2
and other non-Talos Kubernetes distributions.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
k3s and RKE2 include distribution suffixes in version string
(e.g., v1.35.0+k3s1) which are not valid container image tags.
Strip everything after '+' using regexReplaceAll to produce clean
version tags like v1.35.0.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add talos.enabled value (default: true) to control whether
LinstorSatelliteConfiguration cozystack-talos is created.
This allows deploying linstor on non-Talos clusters (Ubuntu, Debian, etc.)
where DRBD init containers are required.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Add networking.kubeovn.MASTER_NODES to platform values.yaml
- Conditionally include MASTER_NODES in kubeovn values when set
- Enables KubeOVN deployment on k3s/kubeadm without control-plane node labels
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add networking.apiServer.host and networking.apiServer.port to platform values
to allow overriding the default Talos KubePrism settings (localhost:7445).
Also add networking.cilium.cgroup.autoMount for non-Talos clusters that need
Cilium to mount cgroups automatically.
Changes:
- packages/core/platform/values.yaml: Add apiServer and cilium.cgroup settings
- packages/core/platform/templates/bundles/system.yaml: Pass cilium values
- packages/core/installer/values.yaml: Add kubernetesServiceHost/Port
- packages/core/installer/templates/cozystack-operator.yaml: Template env vars
Closes: #1933
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
## What this PR does
Fixes the v1.0 migration script
### Release note
```release-note
Fixed the v1.0 migration script for clusters where an upgrade was already performed.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated bundle naming convention (paas → isp).
* Restructured configuration schema: flattened branding, renamed
scheduling field, and moved authentication redirect setting.
* Minor output text and prompt wording adjustments.
* **New Features**
* Added support for external IPs in generated configuration with safe
empty fallback.
* Consolidated branding and scheduling entries for simpler output.
* **Bug Fixes**
* Migration script now validates prerequisites and produces improved,
YAML-friendly output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add migration script to convert existing MongoDB HelmReleases
from old users format to new users + databases format.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
- Add Tenant details page to dashboard
- Add Resource Quota usage to Tenant page in dashboard
<img width="1800" height="939" alt="resource quota in tenant"
src="https://github.com/user-attachments/assets/7f4f6ed6-e9b8-4258-a60b-15f203443a1f"
/>
### Release note
```release-note
[dashboard] Add Tenant details page and resource quota usage to tenant page
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a Resource Quotas section to the Details tab to show allocation
limits and usage.
* Info and Tenant types now display quota data in an interactive table
with columns for resource, limits, and current usage.
* Enhanced quota table presentation with additional columns for
flattened resource keys and values for clearer reporting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add "Edit" button to all resource pages in dashboard
fixes https://github.com/cozystack/cozystack/issues/1911
<img width="1974" height="1327" alt="edit button"
src="https://github.com/user-attachments/assets/8e28dd2d-b474-488b-a7ae-a6c60d23619f"
/>
### Release note
```release-note
[dashboard] Add "Edit" button to all resources
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added Edit action links to dashboard headers for quick access to edit
functionality.
* **UI/Layout**
* Improved dashboard header layout with enhanced spacing and center
alignment for better visual consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Fixes v1.0 PackageSources.
### Release note
```release-note
Fixed v1.0 PackageSources for clusters where an upgrade was already performed.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated platform component configurations for cozy-proxy and
metrics-server deployments.
* Reorganized monitoring component namespace allocation.
* Added metrics server as a dependency in the monitoring pipeline.
* Integrated new system components (linstor-scheduler and
metrics-server) into the platform templates.
* Refined HelmRelease chart references for vertical pod autoscaler.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Align MongoDB configuration with postgres and mysql patterns:
- Add separate `databases` section with `roles.admin` and `roles.readonly`
- Simplify `users` to only contain optional password field
- All users now authenticate via `admin` database (MongoDB best practice)
- Admin role maps to readWrite + dbAdmin permissions
- Readonly role maps to read permission
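A sketch of the resulting values layout (user, database, and password
values are placeholders):

```yaml
users:
  alice:
    password: s3cret   # optional, per the description above
  bob: {}
databases:
  mydb:
    roles:
      admin:           # readWrite + dbAdmin on mydb
        - alice
      readonly:        # read on mydb
        - bob
```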
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Add missing changelog documentation for recent patch releases that were
released without changelog entries.
### Release note
```release-note
[docs] Add changelog documentation for v0.40.3, v0.41.1, v0.41.2, v0.41.3
```
Add missing changelog documentation for recent patch releases.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Fixes telemetry collection in cozystack-operator which was failing
because the manager's cache is filtered and doesn't include resources
needed for telemetry (kube-system namespace, nodes, services, PVs).
Switches to using `mgr.GetAPIReader()` which bypasses the cache.
### Release note
```release-note
[system] Fix telemetry collection in cozystack-operator
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Updated Docker image versions from 1.5.0 to 1.6.1 for platform
migrations and testing environments.
* Enhanced telemetry collection system with improved data access
patterns.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This patch enables the `offline_access` scope for the dashboard Keycloak
client, so that users get a refresh token which gatekeeper can use to
automatically refresh an expiring access token. Also session timeouts
were increased.
### Release note
```release-note
[dashboard] Increase session timeouts, add the offline_access scope,
enable refresh tokens to improve the overall user experience when
working with the dashboard.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated authentication configuration with additional OAuth2 scope
parameters to enhance security credentials handling.
* Enhanced Keycloak client configuration with optional client scopes and
session management attributes, including logout redirect settings and
session timeout controls for improved session lifecycle management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
The cozystack-operator's manager cache is filtered to only include
specific secrets and namespaces with certain labels. This caused
telemetry collection to fail because resources like kube-system
namespace, nodes, services, and PVs were not in the cache.
Switch to using mgr.GetAPIReader() which bypasses the cache and
queries the API server directly. This is appropriate for telemetry
since it only runs every 15 minutes.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Introduce a new "External IPs" sidebar and associated dashboard factory
to display services of type LoadBalancer.
## What this PR does
Add an External IPs tab to the sidebar listing all `Service` resources of
type `LoadBalancer` in the tenant
### Release note
```release-note
[dashboard] add External IPs tab with `Service` resources of type `LoadBalancer`
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added External IPs menu item in the Administration section for quick
access to load balancer services
* Introduced a new dashboard view displaying external IPs with tabbed
interface for enhanced visibility
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add labels and selectors for Service and Ingress resources so they show
up correctly in the corresponding tabs for the Kubernetes app in the dashboard
### Release note
```release-note
[kubernetes] show Ingress and Service resources in dashboard
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Introduced standardized metadata labels across Kubernetes application
resources including cloud configuration, ingress, and service components
to improve resource organization, tracking, identification, and
multi-tenant environment management
* Extended the system's resource management framework to include and
manage additional Kubernetes infrastructure components, such as
ingress-nginx, enhancing overall cluster orchestration and service
networking capabilities
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Fixes pod filtering on the Pods tab of the Service details page
### Release note
```release-note
[dashboard] Fix filtering pods on Pods tab for Service
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Corrected label selector field name in pod details table configuration
to ensure proper label filtering functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This patch enables the `offline_access` scope for the dashboard Keycloak
client, so that users get a refresh token which gatekeeper can use to
automatically refresh an expiring access token. Also session timeouts
were increased.
### Release note
```release-note
[dashboard] Increase session timeouts, add the offline_access scope,
enable refresh tokens to improve the overall user experience when
working with the dashboard.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
Introduce a new "External IPs" sidebar and associated dashboard factory to display services of type LoadBalancer.
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
## What this PR does
Set minReplicas in the VPA for VMAgent so it works correctly in the
default case of replicas = 1
Fixes https://github.com/cozystack/cozystack/issues/1893
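A sketch of the resulting VPA object (target and object names are
illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vmagent
spec:
  targetRef:
    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMAgent
    name: cluster
  updatePolicy:
    updateMode: Auto
    minReplicas: 1   # the updater default is 2, which never evicts a single-replica VMAgent
```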
### Release note
```release-note
[monitoring-agents] set minReplicas=1 in VPA for cluster-wide VMAgent to allow recreating a single replica
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Changes
* **Chores**
* Updated system monitoring autoscaler configuration to enforce minimum
replica constraints, enhancing service resilience and availability
during scaling operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This reverts commit 73d6e3013e.
## What this PR does
Revert wrong type change according to the review
[request](https://github.com/cozystack/cozystack/pull/1873#discussion_r2717088708)
### Release note
```release-note
[backup] Fix strategyRef type in backups
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Breaking Changes**
* Backup strategy references have been updated from cluster-scoped to
namespace-scoped references. This change enables better namespace
isolation and control. Update your backup configurations to reference
strategies within the appropriate namespace.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Removes the ability for users to specify custom container images
(`images.pmm` and `images.backup`) in the MongoDB application values.
This is a security hardening measure - allowing users to specify
arbitrary container images could lead to:
- Running malicious or compromised images
- Supply chain attacks through untrusted image registries
- Privilege escalation if malicious images are designed to exploit the
cluster
The images are now hardcoded in the template:
- `percona/pmm-client:2.44.1`
- `percona/percona-backup-mongodb:2.11.0`
### Release note
```release-note
[apps] Remove user-configurable images from MongoDB chart for security hardening
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Removed container image customization options for MongoDB's monitoring
and backup components. The system now uses fixed, pre-determined image
versions (PMM Client 2.44.1 and Percona Backup MongoDB 2.11.0) to ensure
deployment consistency and reduce configuration complexity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add documentation for the resourceQuotas parameter explaining
supported resource types and their usage.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
### Release note
Closes #1880
```release-note
Added BackupClass to BackupJob and BackupPlan to simplify the UX.
```
The following functions related to StorageRef were removed, as they are
no longer used after migrating to the BackupClass API:
- resolveBucketStorageRef: previously resolved S3 credentials from a Bucket storageRef
- createS3CredsForVelero: previously created Velero S3 credentials secrets
- createBackupStorageLocation: previously created Velero BackupStorageLocation resources
- createVolumeSnapshotLocation: previously created Velero VolumeSnapshotLocation resources
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added BackupClass CRD and backupClassName field; webhook enforces and
defaults/validates backupClassName.
* **Refactor**
* Removed inline storageRef/strategyRef from specs; controllers resolve
strategy and parameters from BackupClass. Application apiGroup defaults
to apps.cozystack.io. Velero flow now uses resolved parameters.
* **Tests**
* Added unit tests for BackupClass resolution, BackupJob/Plan factory,
webhook validation, and Velero template rendering.
* **Chores**
* Manager cache/indexing updated to track BackupClass and index
BackupJobs by backupClassName.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Removes Tenant `isolated` flag and enforces isolation using network
policies
### Release note
```release-note
[tenant] Removed Tenant field `isolated` and enforced network isolation
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Network policies for tenant namespace isolation are now always
enforced and cannot be disabled.
* Removed the `isolated` configuration parameter from all tenant
settings, schema definitions, and documentation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Remove the ability for users to specify custom container images
(images.pmm and images.backup) in the MongoDB application values.
This is a security hardening measure - allowing users to specify
arbitrary container images could lead to running malicious or
compromised images, supply chain attacks, or privilege escalation.
The images are now hardcoded in the template:
- percona/pmm-client:2.44.1
- percona/percona-backup-mongodb:2.11.0
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Add enum validation for addon `ingressNginx.exposeMethod` field
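Roughly, the schema constraint looks like this (a values.schema.json
fragment shown as YAML for brevity; the exact nesting is illustrative):

```yaml
properties:
  ingressNginx:
    properties:
      exposeMethod:
        type: string
        enum:
          - Proxied
          - LoadBalancer
```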
### Release note
```release-note
[kubernetes] add enum validation for ingressNginx `exposeMethod` field
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Enhanced validation for Kubernetes ingress configuration to enforce
allowed exposure methods ("Proxied" or "LoadBalancer"). This ensures
configurations are properly restricted to supported options and prevents
invalid settings from being accepted.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
[ci] Run e2e tests on shared runners
```
This PR prepares the release `v1.0.0-alpha.2`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated container images and system components across all major
platform services. Includes infrastructure upgrades for networking,
storage systems, application APIs, data backup services, system
monitoring, and supporting components. All updates incorporate the
latest security patches, critical stability improvements, and
performance optimizations to enhance overall system reliability,
operational efficiency, and platform stability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Add Feature Highlights section with MongoDB as the main feature of
v0.41.0
- Add Documentation section with website repository changes to both
v0.41.0 and v1.0.0-alpha.2
- Add @matthieu-robin to v1.0.0-alpha.2 contributors
Fixes #749
## Changes
- Add documentation explaining how to enable Hubble for network
observability in Grafana
- Add four Grafana dashboards for Hubble metrics visualization
(Overview, DNS, L7 HTTP, Network)
- Register Hubble dashboards in the monitoring package
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Introduced Hubble DNS Overview dashboard for namespace-level DNS
metrics visibility.
* Introduced Hubble L7 HTTP Metrics dashboard for HTTP request and
performance monitoring.
* Introduced Hubble Network Overview dashboard for network flows and
policy insights.
* **Documentation**
* Added comprehensive Hubble observability guide covering setup,
configuration, and troubleshooting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds MongoDB as a managed application to Cozystack using Percona
Operator for MongoDB (Apache 2.0).
**New packages:**
- `packages/system/mongodb-operator/` - Percona psmdb-operator
deployment
- `packages/apps/mongodb/` - MongoDB application chart
- `packages/system/mongodb-rd/` - CozystackResourceDefinition for
dashboard
**Features:**
- Replica set mode with configurable replica count
- Sharded cluster mode (config servers, mongos routers, multiple shards)
- S3 backup integration with cron scheduling
- Point-in-time recovery from backups
- External access via LoadBalancer
- Multiple MongoDB versions (v6, v7, v8)
- Resource presets (nano to 2xlarge)
- User management with role-based access
### Release note
```release-note
[apps] Add MongoDB as a managed application with replica set, sharding, and S3 backup support
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Managed MongoDB service and operator packaging; replica sets with
optional sharding, external access, per-user credentials, workload
monitoring, and version mappings for provider images
* Backup & restore with S3 support and point-in-time recovery
* **Documentation**
* Comprehensive README plus Helm values and schema for configuration
* **Tests**
* Extensive unit/template tests and end-to-end tests covering
credentials, services, backup, restore, CR rendering, and operator
install
* **Packaging**
* Charts and bundle entries added for operator and managed service
installation
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
### Release note
```release-note
[kilo] Introduce kilo
```
## What this PR does
This PR introduces a new flux-plunger controller that automatically
fixes HelmRelease resources experiencing "has no deployed releases"
errors by cleaning up corrupted Helm release secrets.
### Problem
When Helm releases fail with "has no deployed releases" error, FluxCD
cannot recover automatically. This typically happens when:
- Helm release secrets are corrupted
- Failed upgrades leave the release in an inconsistent state
- Manual intervention is required to delete the problematic secret
### Solution
The flux-plunger controller:
1. Watches HelmRelease resources for "has no deployed releases" errors
2. Suspends the HelmRelease (with `flux-client-side-apply` field
manager)
3. Identifies and deletes the latest (problematic) Helm release secret
4. Tracks processed versions via annotation to prevent recursive
deletion
5. Unsuspends the HelmRelease to allow FluxCD to retry
**Key features:**
- Only deletes one secret at a time (prevents cascading deletions)
- Uses version tracking: if `latest+1 == processed`, skips (already
handled)
- Handles suspended HelmReleases: only unsuspends if it was suspended by
flux-plunger itself
- Gracefully handles invalid annotations by treating them as unprocessed
### Components
- **Controller:** `internal/controller/fluxplunger/flux_plunger.go`
- **Main:** `cmd/flux-plunger/main.go`
- **Helm Chart:** `packages/system/flux-plunger/`
- **Platform Integration:** Added to `paas-full.yaml` bundle
### Release note
```release-note
[platform] Add flux-plunger controller to automatically fix HelmRelease resources with "has no deployed releases" error by cleaning up corrupted Helm secrets
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Introduced a new controller that automatically resolves HelmRelease
resource failures with built-in health probes and graceful recovery
procedures.
* Enabled by default in system installations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This PR adds a new package `local-ccm` to the Cozystack platform.
The local-ccm package provides a Cloud Controller Manager for local
development and testing environments. It integrates the upstream Helm
chart from https://github.com/cozystack/local-ccm (version 0.2.0).
**Changes:**
- Created package structure in `packages/system/local-ccm/`
- Added Makefile with automated upstream chart synchronization from
GitHub
- Added PackageSource definition in
`packages/core/platform/sources/local-ccm.yaml`
- Configured dependency on networking components
- Set privileged mode as required for cloud controller operations
**Package details:**
- Namespace: `cozy-local-ccm`
- Release name: `local-ccm`
- Upstream: https://github.com/cozystack/local-ccm
### Release note
```release-note
[local-ccm] Add local-ccm package for cloud controller manager in local environments
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added Local Cloud Controller Manager package providing automatic node
IP detection, internal and external IP support, initialization taint
removal, and continuous node reconciliation with configurable intervals.
* **Chores**
* Updated system component tolerations and scheduling policies to
improve compatibility with cloud infrastructure, including better
handling of node readiness, network initialization, and cloud provider
states.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Removes the redundant `cozystack.io/ui=true` label from HelmReleases.
The existing `apps.cozystack.io/application.kind`,
`apps.cozystack.io/application.group`, and
`apps.cozystack.io/application.name` labels already provide more
specific identification and are sufficient for filtering.
Updates all label selectors to use `apps.cozystack.io/application.kind`
instead.
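For illustration, filtering now keys off labels of this shape (the values
shown are placeholders):

```yaml
metadata:
  labels:
    apps.cozystack.io/application.group: apps.cozystack.io
    apps.cozystack.io/application.kind: Kubernetes
    apps.cozystack.io/application.name: my-cluster
```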
### Release note
```release-note
[platform] Remove cozystack.io/ui label - use apps.cozystack.io/application.kind for filtering instead
```
## What this PR does
Splits telemetry collection between cozystack-operator and
cozystack-controller for better separation of concerns.
### Release note
```release-note
[platform] Split telemetry collection between cozystack-operator and cozystack-controller
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added telemetry support to the operator with configurable options
(disable, endpoint, interval).
* Enhanced telemetry to track application kinds and UI-managed releases.
* **Bug Fixes**
* Removed the runtime cozystack-version flag; version is now managed at
build time.
* **Chores**
* Removed unused YAML dependency.
* Simplified build process with improved Makefile commands.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
### Release note
```release-note
- Fix view of LoadBalancer IP in the services window
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed LoadBalancer IP column display to correctly retrieve IP
addresses from the updated data source in the dashboard details view.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Increases the `startupProbe.failureThreshold` for the kube-apiserver
container from 3 to 30.
The default 30 seconds (3 failures × 10s period) is often insufficient
for kube-apiserver to complete RBAC bootstrap, especially:
- Under high load conditions
- With slower etcd connections
- During initial cluster provisioning
This change gives kube-apiserver up to 300 seconds to initialize,
preventing unnecessary CrashLoopBackOff cycles.
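A sketch of the probe after the change (the endpoint is shown as the
conventional `/livez` check and is an assumption):

```yaml
startupProbe:
  httpGet:
    path: /livez
    port: 6443
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 30   # 30 failures x 10s period = up to 300s to start
```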
### Release note
```release-note
[kubernetes] Increase kube-apiserver startup probe threshold to prevent CrashLoopBackOff during slow RBAC initialization
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated startup probe threshold configuration to enhance system
initialization stability across core components.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Changes the default `resourcesPreset` for kube-apiserver from "medium"
to "large" to prevent OOM kills during normal operation. The "medium"
preset provides insufficient memory for clusters with moderate
workloads, leading to repeated crashes and service disruption.
### Release note
```release-note
[kubernetes] Change default apiServer resourcesPreset from "medium" to "large" to prevent OOM kills
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Configuration Changes**
* Updated default API Server resource preset from medium to large for
enhanced resource allocation
* Removed version property from Kubernetes chart configuration schema
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add new package local-ccm from https://github.com/cozystack/local-ccm
- Create package structure in packages/system/local-ccm
- Add Makefile with upstream chart sync from GitHub
- Add PackageSource definition in packages/core/platform/sources/
- Package provides cloud controller manager for local environments
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The apps.cozystack.io/application.kind, apps.cozystack.io/application.group,
and apps.cozystack.io/application.name labels already exist on all HelmReleases
and provide more specific identification. Remove the redundant cozystack.io/ui
label and update all selectors to use apps.cozystack.io/application.kind instead.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add MongoDB managed service based on Percona Operator for MongoDB with:
- Replica set mode (default) and sharded cluster mode
- Configurable replicas, storage, and resource presets
- Custom users with role-based access control
- S3-compatible backup with PITR support
- Bootstrap/restore from backup
- External access support
- WorkloadMonitor integration for dashboard
- Comprehensive helm-unittest test coverage (91 tests)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Increase startupProbe.failureThreshold from 3 to 30 for kube-apiserver
container. The default 30 seconds (3 failures * 10s period) is often
insufficient for apiserver to complete RBAC bootstrap, especially under
load or with slower etcd connections.
This change gives kube-apiserver up to 300 seconds to initialize,
preventing unnecessary CrashLoopBackOff cycles.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add flux-plunger controller to automatically fix HelmRelease resources
with "has no deployed releases" error.
The controller watches HelmRelease resources and performs the following:
- Detects HelmRelease with "has no deployed releases" error
- Suspends the HelmRelease with flux-client-side-apply field manager
- Deletes the latest Helm release secret
- Updates annotation with processed version to prevent recursive deletion
- Unsuspends the HelmRelease to allow Flux to retry
Special handling for suspended HelmRelease:
- If suspend=true and latest+1==processed: removes suspend
- Otherwise: skips processing as suspended by external process
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
This PR updates Talos Linux, which includes DRBD fixes:
- https://github.com/LINBIT/drbd/releases/tag/drbd-9.2.15
- https://github.com/LINBIT/drbd/releases/tag/drbd-9.2.16
### Release note
```release-note
Update Talos Linux to v1.12.1, including DRBD fixes.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Bumped Talos profile version from v1.11.3 to v1.12.1 across all
configuration profiles
* Updated baseInstaller and system extensions with latest firmware and
driver images
* Added output format settings to profile configurations
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Increases etcd probe thresholds to allow more time for cluster
synchronization after pod restarts. This addresses issues where etcd
members were being killed by startup probes before they could fully sync
with the cluster, especially when VPA assigns minimal resources.
Changes:
- `startupProbe.failureThreshold`: 3 → 300 (allows 25 minutes for
initial sync)
- `livenessProbe.failureThreshold`: default → 10 (reduces unnecessary
restarts)
- `readinessProbe.failureThreshold`: default → 3
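A sketch of the adjusted values in the Kubernetes Go API, assuming a 5-second startup probe period, which matches the stated 25-minute budget (300 × 5s = 1500s):
```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// 300 failures x 5s period = 1500s (25 minutes) for initial sync.
	startup := corev1.Probe{FailureThreshold: 300, PeriodSeconds: 5}
	liveness := corev1.Probe{FailureThreshold: 10} // tolerate brief network issues
	readiness := corev1.Probe{FailureThreshold: 3}

	fmt.Printf("startup budget: %ds\n", startup.FailureThreshold*startup.PeriodSeconds)
	fmt.Printf("liveness/readiness thresholds: %d/%d\n",
		liveness.FailureThreshold, readiness.FailureThreshold)
}
```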
### Release note
```release-note
[etcd] Increase probe thresholds to prevent premature pod termination during cluster synchronization
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Improved etcd service stability with automatic health monitoring and
failover detection capabilities in Kubernetes environments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Change the default resourcesPreset for kube-apiserver from "medium" to
"large" to prevent OOM kills during normal operation. The "medium"
preset provides insufficient memory for clusters with moderate workloads.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Increase startup probe failureThreshold to 300 (25 minutes) to allow
etcd members more time to sync with the cluster after restart or
recovery. This prevents pods from being killed during initial
synchronization when VPA assigns minimal resources.
Also increase liveness probe failureThreshold to 10 to reduce
unnecessary restarts during temporary network issues.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Updates cilium component to version 1.18.6
### Release note
```release-note
[cilium] update to v1.18.6
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Upgraded Cilium to v1.18.6 and updated container images and checksums
across components (Hubble Relay, Operator, Preflight, Cluster Mesh API
Server, SPIRE, etc.).
* Updated Envoy to v1.35.9 with new image digest.
* **Bug Fixes / Behavior**
* Adjusted preflight/Envoy gating: preflight-enabled now suppresses
certain Envoy bootstrap config and removed the preflight Envoy bootstrap
config mount.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Fixes aggregated API server Watch implementation to properly work with
controller-runtime informers. External controllers watching Tenant
resources via the aggregated API were experiencing issues with cache
sync timeouts and missing reconciliations on startup.
### Changes
**ResourceVersion handling in List:**
- Compute ResourceVersion from items when the cached client doesn't set
it on the list itself
**Bookmark handling:**
- Pass through bookmark events with converted types for proper informer
sync
**ADDED event filtering (main fix):**
- Simplified the filtering logic that was incorrectly skipping initial
events
- Only skip ADDED events when `startingRV > 0` AND `objRV <= startingRV`
(client already has from List)
- When `startingRV == 0`, always send ADDED events (client wants full
state)
- Removed the complex `initialSyncComplete` tracking that had inverted
logic
### Problem
When a controller starts watching resources, controller-runtime may call
Watch with `resourceVersion=""`. The server should send all existing
objects as ADDED events. The previous `initialSyncComplete` logic was
inverted and could skip these events, causing objects (like Tenants with
lock annotations) to not be reconciled on controller startup.
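The resulting filter reduces to a small predicate; a sketch with resource versions treated as integers (helper name illustrative):
```go
package main

import "fmt"

// shouldSendAdded reports whether an ADDED event for an object at objRV
// should be forwarded to a watcher that started at startingRV.
// startingRV == 0 means the client asked for full state.
func shouldSendAdded(startingRV, objRV uint64) bool {
	if startingRV == 0 {
		return true // send all existing objects as ADDED
	}
	// Skip only what the client already has from its initial List.
	return objRV > startingRV
}

func main() {
	fmt.Println(shouldSendAdded(0, 17))  // true: full state requested
	fmt.Println(shouldSendAdded(20, 17)) // false: already seen via List
	fmt.Println(shouldSendAdded(20, 21)) // true: newer than the List snapshot
}
```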
### Release note
```release-note
[apiserver] Fix Watch resourceVersion and bookmark handling for controller-runtime compatibility
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Improvements**
* Enhanced resource watching and synchronization with Kubernetes 1.27+
compatibility, including proper handling of initial events and
bookmarks.
* Optimized event filtering and resource version tracking for
applications, tenant modules, tenant namespaces, and tenant secrets to
reduce unnecessary event noise.
* Improved list metadata consistency by deriving accurate resource
versions when unavailable in responses.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Add resourceVersion handling for Watch requests by filtering ADDED events
based on the resourceVersion provided by the client
- Forward bookmark events from underlying HelmRelease watchers to clients
for proper resourceVersion synchronization
- Extract MaxResourceVersion helper using meta.EachListItem for cleaner code
- This ensures clients don't receive duplicate objects they already have
from List+Watch patterns
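A sketch of such a helper built on meta.EachListItem; the actual implementation may differ in details:
```go
package main

import (
	"fmt"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// maxResourceVersion returns the highest numeric resourceVersion among
// the items of any Kubernetes list object.
func maxResourceVersion(list runtime.Object) (uint64, error) {
	var max uint64
	err := meta.EachListItem(list, func(item runtime.Object) error {
		acc, err := meta.Accessor(item)
		if err != nil {
			return err
		}
		if rv, err := strconv.ParseUint(acc.GetResourceVersion(), 10, 64); err == nil && rv > max {
			max = rv
		}
		return nil
	})
	return max, err
}

func main() {
	list := &corev1.PodList{Items: []corev1.Pod{
		{ObjectMeta: metav1.ObjectMeta{ResourceVersion: "41"}},
		{ObjectMeta: metav1.ObjectMeta{ResourceVersion: "57"}},
	}}
	fmt.Println(maxResourceVersion(list)) // 57 <nil>
}
```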
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Update changelog for v1.0.0-alpha.1 to include missing features:
- **Cozystack Operator**: New operator for Package and PackageSource
management (#1740, #1741, #1755, #1756, #1760, #1761)
- **Backup System**: Comprehensive backup functionality with Velero
integration (#1640, #1685, #1687, #1708, #1719, #1720, #1737, #1762)
- Add @androndo to contributors
- Update Full Changelog link to v0.38.0...v1.0.0-alpha.1
### Release note
```release-note
[docs] Update changelog for v1.0.0-alpha.1: add cozystack-operator and backup system
```
This PR prepares the release `v1.0.0-alpha.1`.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated container images for core system components to v1.0.0-alpha.1
* Upgraded Cilium networking from 1.17.8 to 1.18.5
* Updated infrastructure component images including Metallb, Linstor,
and storage services with latest digests for enhanced security and
stability
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add file existence checks to talos asset targets to prevent
rebuilding kernel, initramfs, installer, iso, nocloud, and metal
assets when they already exist from a previous build step.
This fixes duplicate talos builds in the release pipeline where
`make build` and `make assets` both triggered talos image generation.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Renames `CozystackResourceDefinition` CRD to `ApplicationDefinition` for
better clarity and consistency.
Changes include:
- Renamed all Go types (ApplicationDefinition,
ApplicationDefinitionList, ApplicationDefinitionSpec, etc.)
- Renamed controller files and reconciler structs
- Updated all packages that use these types (dashboard,
lineagecontrollerwebhook, crdmem, etc.)
- Renamed CRD Helm chart from `cozystack-resource-definition-crd` to
`application-definition-crd`
- Updated all 25 cozyrds YAML manifests to use `kind:
ApplicationDefinition`
- Added migration 23 to remove old `cozystack-resource-definition-crd`
HelmRelease
### Release note
```release-note
[api] Rename CozystackResourceDefinition CRD to ApplicationDefinition
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Renamed the primary custom resource from CozystackResourceDefinition
to ApplicationDefinition across the platform, dashboards, controllers,
and manifests.
* Updated packaging and Helm charts to use ApplicationDefinition
consistently.
* **Chores**
* Bumped platform migration target to v24 and added a migration to apply
the new resource CRD.
* **Bug Fix**
* Restored deep-copy support for Component-related types.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Rename the CRD and all related types for better clarity:
- CozystackResourceDefinition -> ApplicationDefinition
- CozystackResourceDefinitionList -> ApplicationDefinitionList
- CozystackResourceDefinitionSpec -> ApplicationDefinitionSpec
- All related nested types updated accordingly
Updated components:
- API types and generated deepcopy code
- Controllers and reconcilers
- Dashboard, lineagecontrollerwebhook, crdmem packages
- CRD YAML definition and Helm chart
- All 25 cozyrds YAML manifests
- Migration scripts and documentation
Added migration 23 to remove old cozystack-resource-definition-crd HelmRelease.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
- Add pattern rule for building cozypkg binaries per platform
(`assets-cozypkg-<os>-<arch>`) with checksums generation
- Add Version variable to cozypkg CLI injected via ldflags
- Split manifests into separate `cozystack-crds.yaml` and
`cozystack-operator.yaml` files
- Update CI workflow to handle CRDs and operator as separate artifacts
- Update e2e tests and upload script for new manifest structure
### Release note
```release-note
[ci] Add cross-platform cozypkg build targets with version injection and split installer manifests into CRDs and operator
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Multi-platform binaries (Linux, macOS, Windows; amd64 & arm64)
* Checksum file added for release artifact verification
* **Refactor**
* Installation flow reworked to validate and apply CRDs and Operator
manifests separately
* CLI exposes a configurable build-time version for reporting
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Add pattern rule for building cozypkg binaries per platform
(assets-cozypkg-<os>-<arch>) with checksums generation
- Add Version variable to cozypkg CLI injected via ldflags
- Split manifests into separate cozystack-crds.yaml and
cozystack-operator.yaml files
- Update CI workflow to handle CRDs and operator as separate artifacts
- Update e2e tests and upload script for new manifest structure
- Suppress git describe stderr when no tags exist
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
- Move `common-envs.mk` and `package.mk` from `scripts/` to `hack/`
directory
- Update all Makefile includes to use new paths
- Remove unused `issue-flux-certificates.sh` script
### Release note
```release-note
[refactor] Move build scripts from scripts/ to hack/ directory
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Reorganized build infrastructure by consolidating helper scripts and
configuration paths.
* Removed deprecated script from the build tooling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Move common-envs.mk and package.mk from scripts/ to hack/ directory.
Update all Makefile includes to use new paths. Remove unused
issue-flux-certificates.sh script.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Replace HelmRelease-based bundle system with Package resources managed
by cozystack-operator. Restructure values.yaml with full configuration
support.
## What this PR does
- Restructure values.yaml with full configuration (networking,
publishing, authentication, scheduling, branding, resources)
- Add values-isp-full.yaml and values-isp-hosted.yaml for bundle
variants
- Create templates/packages/isp-full.yaml with Package resources
- Move PackageSources from sources/ to templates/sources/
- Remove old bundle files and HelmRelease templates
- Add hack/migrate-to-version-1.0.sh migration script for converting
ConfigMaps to Package resources
### Release note
```release-note
[platform] Migrate from HelmRelease bundles to Package-based deployment
```
Add dedicated flux-tenants controller with label selector
--watch-label-selector=sharding.fluxcd.io/key=tenants to handle
tenant workloads separately from platform components.
Update all kubernetes app HelmReleases to:
- Use chartRef with ExternalArtifact instead of OCIRepository sourceRef
- Add sharding.fluxcd.io/key=tenants label
- Add cozystack.io/target-cluster-name label
Update fluxinstall to parse multiple YAML manifest files and
use flux service for storage-adv-addr.
Add cozystack-basics package for core tenant/namespace setup.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Restructure platform bundles from monolithic files to modular
directory structure with separate applicationdefinitions.
Add PackageSources for better dependency management and
migrate from legacy HelmRepositories to new repository format.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Update all CozystackResourceDefinition files to use chartRef
with ExternalArtifact instead of OCIRepository sourceRef.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Remove legacy installer components (cozystack-assets-server,
installer.sh script, cozystack container image) in favor of
cozystack-operator based deployment.
Move migration scripts from scripts/migrations/ to
packages/core/platform/images/migrations/ for containerized execution.
Add grafana-dashboards image for centralized dashboard management.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Replace the chart field with chartRef for referencing Helm charts via
ExternalArtifact resources. This enables the Package controller to
manage chart sources centrally.
Changes:
- Add chartRef field to CozystackResourceDefinition spec
- Remove chart field (deprecated)
- Remove validation (moved to controller)
- Update lineage mapper for new field structure
- Regenerate openapi specs
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Fix getVersion to parse "0.1.4+abcdef" format (with "+" separator)
instead of incorrectly looking for "sha256:" prefix.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Upstream PR: https://github.com/piraeusdatastore/linstor-csi/pull/403
### Release note
```release-note
[linstor] Refactor node-level RWX validation
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a command-line option to disable strict RWX block volume
validation for specialized deployments.
* Centralized RWX validation into a shared utility used by the driver.
* **Improvements**
* Improved VM identification and pod/PV handling for RWX block
attachments, including hotplug scenarios.
* **Tests**
* Added comprehensive unit tests covering RWX validation and related
pod/PV scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Removes node-level RWX block validation from linstor-csi as
controller-level check is sufficient. The controller already validates
that all pods attached to RWX block volume belong to the same VM by
extracting vmName from pod owner references (VirtualMachineInstance).
This simplifies the validation logic and fixes VM live migration issues.
### Release note
```release-note
[linstor] Remove node-level RWX block validation to fix VM live migration
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Improvements**
* Enhanced RWX (read-write-many) block validation with VM-aware checks
across node and controller flows, including support for hotplug-disk
pods and stricter prevention of cross-VM block sharing.
* Improved propagation and resolution of VM identity for attachments to
ensure consistent validation.
* **Tests**
* Added comprehensive unit tests covering single/multiple pod scenarios,
VM ownership, hotplug disks, upgrade paths, and legacy volumes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This PR updates piraeus-server patches to address several critical
production issues with DRBD resources and LUKS encryption:
1. **Add fix-duplicate-tcp-ports.diff** - Prevents duplicate TCP ports
after toggle-disk operations (upstream PR #476)
2. **Update skip-adjust-when-device-inaccessible.diff** - Comprehensive
fix for multiple issues:
- Resources stuck in StandAlone state after node reboot
- Unknown state race condition during satellite restart
- Encrypted LUKS resource deletion failures
- Network reconnect blocked by unavailable child device checks
These patches resolve scenarios where DRBD resources fail to
automatically reconnect after node reboots and improve LUKS resource
lifecycle management.
Upstream PRs:
- https://github.com/LINBIT/linstor-server/pull/476
- https://github.com/LINBIT/linstor-server/pull/477
### Release note
```release-note
[linstor] Fix DRBD resources stuck in StandAlone state after reboot and encrypted resource deletion issues
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Prevents duplicate TCP port conflicts after disk toggle operations
* Fixes resources stuck in StandAlone or Unknown state after reboot
* Resolves issues with encrypted resource deletion
* Improves handling of temporarily inaccessible storage devices
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Remove node-level RWX block validation from linstor-csi as
controller-level check is sufficient. The controller already validates
that all pods attached to RWX block volume belong to the same VM by
extracting vmName from pod owner references (VirtualMachineInstance).
This simplifies the validation logic and fixes VM live migration issues.
Update linstor-csi image tag with rebuilt image containing the fix.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Update piraeus-server patches to address critical production issues:
- Add fix-duplicate-tcp-ports.diff to prevent duplicate TCP ports
after toggle-disk operations (upstream PR #476)
- Update skip-adjust-when-device-inaccessible.diff with comprehensive
fixes for resources stuck in StandAlone after reboot, Unknown state
race condition, and encrypted LUKS resource deletion (upstream PR #477)
```release-note
[linstor] Fix DRBD resources stuck in StandAlone state after reboot and encrypted resource deletion issues
```
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
This PR generates missing changelogs for releases v0.37.10,
v0.38.5-v0.38.8, v0.39.2-v0.39.4, and v0.40.0 following the instructions
from `docs/agents/changelog.md`.
### Release note
```release-note
[docs] Add missing changelogs for releases v0.37.10, v0.38.5-v0.38.8, v0.39.2-v0.39.4, and v0.40.0
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added release notes for versions v0.37.10 through v0.40.0, documenting
improvements to VM storage resize, dashboard form schema, Windows VM
scheduling, SeaweedFS updates, and various platform enhancements and bug
fixes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Generated changelogs for missing releases following changelog.md instructions:
- v0.37.10: Backports for dashboard schema fix and VM resize improvements
- v0.38.5-v0.38.8: Backports for SeaweedFS, Cilium, VM scheduling, and Multus fixes
- v0.39.2-v0.39.4: Backports for VM services, tenant policies, LINSTOR fixes, and Multus dependencies
- v0.40.0: Minor release with LINSTOR scheduler, SeaweedFS traffic locality, valuesFrom mechanism, auto-diskful, and version management systems
All changelogs include proper PR author attribution using GitHub CLI and user impact descriptions.
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Fixes field selector filtering for registry resources (Applications,
TenantModules, TenantSecrets) when using kubectl with field selectors
like `--field-selector=metadata.namespace=tenant-kvaps` or
`metadata.name=test`.
Controller-runtime cache doesn't support field selectors natively, which
caused incorrect filtering behavior. This PR implements manual filtering
for `metadata.name` and `metadata.namespace` field selectors in List()
and Watch() methods.
Changes:
- Created `pkg/registry/fields` package with `ParseFieldSelector`
utility for common field selector parsing
- Refactored field selector logic in application, tenantmodule, and
tenantsecret registries to use the common implementation
- Implemented manual post-processing filtering after label-based queries
- Removed `Raw` field usage and field selectors from
`client.ListOptions`
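A minimal sketch of the manual post-filtering, assuming the selector has already been parsed into a map (the parsing itself lives in the new `pkg/registry/fields` package):
```go
package main

import "fmt"

// matchesFieldSelector applies the two supported field selectors,
// metadata.name and metadata.namespace, to an object's metadata.
func matchesFieldSelector(name, namespace string, sel map[string]string) bool {
	if v, ok := sel["metadata.name"]; ok && v != name {
		return false
	}
	if v, ok := sel["metadata.namespace"]; ok && v != namespace {
		return false
	}
	return true
}

func main() {
	sel := map[string]string{"metadata.namespace": "tenant-kvaps"}
	fmt.Println(matchesFieldSelector("test", "tenant-kvaps", sel)) // true
	fmt.Println(matchesFieldSelector("test", "tenant-other", sel)) // false
}
```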
### Release note
```release-note
[registry] Fix field selector filtering for kubectl queries with metadata.name and metadata.namespace selectors
```
Controller-runtime cache doesn't support field selectors, causing
incorrect filtering when using kubectl with field selectors like
--field-selector=metadata.namespace=tenant-kvaps or metadata.name=test.
Changes:
- Created pkg/registry/fields package with ParseFieldSelector utility
- Refactored field selector parsing logic in application, tenantmodule,
and tenantsecret registries to use common implementation
- Implemented manual filtering for metadata.name and metadata.namespace
in List() and Watch() methods
- Removed Raw field usage and field selectors from client.ListOptions
- Label selectors passed directly via LabelSelector field
Field selectors now properly filter resources by name and namespace
through manual post-processing after label-based filtering.
See: https://github.com/kubernetes-sigs/controller-runtime/issues/612
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Adds Multus as a dependency, consistent with other CNIs.
### Release note
```release-note
Adds Multus as a dependency, consistent with other CNIs.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated component initialization ordering to ensure more consistent
and reliable platform startup sequencing.
* Streamlined dependency requirements for several core system components
to improve deployment reliability and reduce startup complexity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
This PR adds custom linstor-csi image build to the linstor package with
an important upstream patch for RWX block volume validation.
**Changes:**
- Add linstor-csi Dockerfile based on upstream linstor-csi master
- Import patch from upstream PR piraeusdatastore/linstor-csi#403 for RWX
block volume validation
- Update Makefile to build both piraeus-server and linstor-csi images
- Configure LinstorCluster CR to use custom linstor-csi image in CSI
controller and node pods
**RWX Validation Patch:**
The imported patch prevents misuse of DRBD allow-two-primaries by
ensuring RWX block volumes are only used by pods belonging to the same
KubeVirt VM during live migration. This prevents data corruption when
multiple non-related pods attempt to use the same RWX block volume.
**Upstream PR:**
https://github.com/piraeusdatastore/linstor-csi/pull/403
### Release note
```release-note
[linstor] Add custom linstor-csi image build with RWX block volume validation patch that prevents data corruption during KubeVirt live migration
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Integrated LINSTOR CSI with deployable CSI components and node init
helpers; packaged as a production-ready image.
* **Updates**
* Added build/publish flow for Piraeus server and LINSTOR CSI images;
chart values updated with new image references and tags.
* **Behavior changes**
* Node-side RWX block-volume pre-validation to prevent conflicting
multi-node/multi-writer access.
* **Tests**
* Comprehensive RWX validation tests covering multi-pod and VM
scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add custom linstor-csi image build to packages/system/linstor:
- Add Dockerfile based on upstream linstor-csi
- Import patch from upstream PR #403 for RWX block volume validation
(prevents misuse of allow-two-primaries in KubeVirt live migration)
- Update Makefile to build both piraeus-server and linstor-csi images
- Configure LinstorCluster CR to use custom linstor-csi image in
CSI controller and node pods
The RWX validation patch ensures that RWX block volumes with
allow-two-primaries are only used by pods belonging to the same
KubeVirt VM during live migration.
Upstream PR: https://github.com/piraeusdatastore/linstor-csi/pull/403
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
Removes multus memory limit due to short but unpredictable and large
memory consumption in some cases, such as starting up after a node
reboot (reported up to 3Gi).
The root cause will be fixed in future releases.
### Release note
```release-note
Multus memory limit removed.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Updated Multus system namespace configuration and DaemonSet naming for
improved environment organization.
* Adjusted container resource allocation: increased memory requests and
refined memory limits for optimized performance.
* Updated deployment template specifications to reflect infrastructure
changes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Update kube-ovn from v1.14.11 to v1.14.25.
Changes synced from upstream include:
- Updated chart templates
- New configuration options in values
### Release note
```release-note
[kube-ovn] Update to v1.14.25
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added tolerations support for VpcNatGateway resources
* Added automatic VLAN subinterface creation capability for provider
networks
* Enhanced installation documentation with OCI Registry and Talos Linux
deployment examples
* **Security Improvements**
* Applied runtime-default seccomp profiles across deployments
* Configured service account token mounting behavior for improved pod
security
* **Chores**
* Updated Kube-OVN to v1.14.25
* Updated Helm chart metadata and dependencies
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Problem
After satellite restart with patched linstor-server, some DRBD resources
ended up in **Unknown** state. Investigation revealed a race condition
between `drbdadm adjust` completion and `updateDiscGran()` lsblk check:
1. `drbdadm adjust` completes successfully (21:53:49.791) - brings
devices up
2. `updateDiscGran()` immediately tries to check discard granularity
(21:53:49.799)
3. `/dev/drbd*` device node doesn't exist yet - kernel hasn't created it
4. `lsblk` fails with exit code 32: "not a block device"
5. `StorageException` interrupts DeviceManager cycle (21:53:50.811)
6. DRBD device remains in incomplete state → **Unknown**
### Timeline from logs
```
21:53:49.791 - [DeviceManager] Resource 'pvc-aafbd92a' [DRBD] adjusted ✓
21:53:49.799 - [lsblk_parser] ERROR: /dev/drbd1169 not a block device
21:53:49.804-50.807 - [lsblk_parser] 10+ retry errors (every ~100ms)
21:53:50.811 - [DeviceManager] ERROR: Error executing lsblk
21:53:50.878 - [DeviceManager] End cycle 9 (WITH ERROR!)
```
## Solution
Update `skip-adjust-when-device-inaccessible.diff` patch to add physical
device path existence check before calling `lsblk`:
- Check `Files.exists(devicePath)` in `updateDiscGran()` before lsblk
- If device doesn't exist yet → skip check, set `discGran =
UNINITIALIZED_SIZE`
- Next DeviceManager cycle (few seconds later) → device node available →
lsblk succeeds
This complements the existing patch which checks **child layer** devices
(LUKS/Storage) for deletion scenarios, while this fix addresses the
**DRBD device itself** during adjust operations.
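The guard, transliterated to Go for illustration (the actual patch is Java inside linstor-server; the names here are illustrative):
```go
package main

import (
	"fmt"
	"os"
)

const uninitializedSize = int64(-1)

// discGran mirrors the patched updateDiscGran() flow: if the DRBD device
// node has not been created by the kernel yet, skip the lsblk-based check
// and return UNINITIALIZED_SIZE so the next DeviceManager cycle retries,
// instead of raising a StorageException that aborts the whole cycle.
func discGran(devicePath string, lsblk func(string) int64) int64 {
	if _, err := os.Stat(devicePath); err != nil {
		return uninitializedSize // device node not there yet; retry next cycle
	}
	return lsblk(devicePath)
}

func main() {
	fake := func(string) int64 { return 512 }
	fmt.Println(discGran("/dev/drbd1169", fake)) // -1 while the node is missing
	fmt.Println(discGran("/dev/null", fake))     // 512 once stat succeeds
}
```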
## Testing
Manual `drbdadm up` on affected devices confirmed they were in down
state and brought them back to UpToDate, proving the issue was
incomplete device initialization.
## Related
- Upstream linstor-server PR:
https://github.com/LINBIT/linstor-server/pull/471#issuecomment-3723392917
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Enhanced robustness for storage operations by adding device
accessibility validation. Operations now gracefully skip when device
paths are unavailable or invalid, preventing unnecessary failures and
improving system resilience during device access issues.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
By its nature, the kubernetes client provisioned in keycloak is public:
anyone can read the client secret. Therefore the secret is not necessary
and the client should be explicitly provisioned as a public client, not
as a confidential client.
### Release note
```release-note
[keycloak] Change the kubernetes client to be public, without a client
secret.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Add OIDC enablement flag and optional host derivation from tenant
release; allow explicit management kubeconfig endpoint override.
* **Refactor**
* Convert Keycloak client to public mode and remove reliance on a
Kubernetes client secret, simplifying authentication configuration.
* Remove secret-based credential plumbing from deployment templates,
streamlining kubeconfig generation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Enable DRBD auto-diskful options to automatically convert diskless nodes
to diskful when they remain in Primary state for an extended period.
### How it works
When LINSTOR is integrated with Kubernetes, the platform may schedule
workloads on nodes that don't have local storage replicas. In such
cases, DRBD operates in "diskless" mode, accessing data over the network
from nodes that have actual disk replicas.
The `auto-diskful` feature addresses this by:
1. **Monitoring Primary state duration**: When a diskless node holds a
DRBD resource in Primary state (actively using the volume) for more than
the configured time (30 minutes), LINSTOR automatically creates a local
disk replica on that node.
2. **Automatic cleanup**: With `auto-diskful-allow-cleanup` enabled,
when the resource is no longer in Primary state on that node, LINSTOR
can automatically remove the disk replica that was created, freeing up
storage space.
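A Go sketch of this decision, assuming the 30-minute threshold configured below; names are illustrative, not LINSTOR internals:
```go
package main

import (
	"fmt"
	"time"
)

// autoDiskfulAction sketches the DrbdOptions/auto-diskful behaviour:
// convert a diskless node to diskful after it has held the resource
// Primary for longer than the threshold; with cleanup allowed, remove
// the auto-created replica once the node is no longer Primary.
func autoDiskfulAction(primaryFor time.Duration, isPrimary, autoCreated, allowCleanup bool) string {
	const threshold = 30 * time.Minute
	switch {
	case isPrimary && primaryFor > threshold:
		return "create local replica (diskless -> diskful)"
	case !isPrimary && autoCreated && allowCleanup:
		return "remove auto-created replica (cleanup)"
	default:
		return "no action"
	}
}

func main() {
	fmt.Println(autoDiskfulAction(45*time.Minute, true, false, true))
	fmt.Println(autoDiskfulAction(0, false, true, true))
}
```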
### Configuration
- `DrbdOptions/auto-diskful: 30` — Convert diskless to diskful after 30
minutes in Primary state
- `DrbdOptions/auto-diskful-allow-cleanup: true` — Allow automatic
removal of auto-created replicas when no longer needed
### Benefits
- Improves I/O performance for long-running workloads by creating local
replicas
- Reduces network traffic for frequently accessed data
- Automatic cleanup prevents storage waste from temporary replicas
### Release note
```release-note
[linstor] Enable auto-diskful to automatically create local replicas on diskless nodes that hold volumes in Primary state for more than 30 minutes, improving I/O performance for long-running workloads.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Introduced an "auto diskful" option to automatically manage diskful
resources with configurable interval and optional cleanup.
* **Chores**
* Added default configuration values to enable and time the auto-diskful
behavior and to control automatic cleanup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Add linstor-scheduler-extender for optimal pod placement on nodes with
LINSTOR storage.
### How it works
The linstor-scheduler consists of two components:
#### 1. Scheduler Extender
A custom Kubernetes scheduler that works alongside the default
kube-scheduler. When a pod requests LINSTOR-backed storage, the
scheduler extender communicates with the LINSTOR controller to find
nodes that have local replicas of the requested volumes, prioritizing
placement on nodes with existing data to minimize network traffic.
#### 2. Admission Webhook
A MutatingWebhookConfiguration that automatically sets `schedulerName:
linstor-scheduler` on pods that use LINSTOR CSI driver volumes. This
ensures that only pods with LINSTOR PVCs are routed through the custom
scheduler, while all other pods continue using the default scheduler.
The webhook inspects pod creation requests and checks if any of the
pod's PVCs use a StorageClass with the LINSTOR CSI provisioner
(`linstor.csi.linbit.com`). If so, it mutates the pod spec to use the
linstor-scheduler.
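A sketch of that check, assuming each PVC has already been resolved to its StorageClass provisioner (the real webhook performs this resolution against the API server):
```go
package main

import "fmt"

const linstorProvisioner = "linstor.csi.linbit.com"

// needsLinstorScheduler reports whether a pod should be mutated to use
// schedulerName: linstor-scheduler, based on the provisioner of each
// StorageClass backing the pod's PVCs.
func needsLinstorScheduler(provisioners []string) bool {
	for _, p := range provisioners {
		if p == linstorProvisioner {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsLinstorScheduler([]string{"rancher.io/local-path"}))           // false
	fmt.Println(needsLinstorScheduler([]string{linstorProvisioner, "other/csi"})) // true
}
```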
### Components
- `linstor-scheduler` package in `packages/system/linstor-scheduler/`
- PackageSource definition in
`packages/core/platform/sources/linstor-scheduler.yaml`
- Integration into `paas-full` and `distro-full` bundles
### Upstream PRs
- https://github.com/piraeusdatastore/helm-charts/pull/67 — fix:
KubeSchedulerConfiguration v1 support for Kubernetes 1.25+
- https://github.com/piraeusdatastore/helm-charts/pull/68 — feat:
admission webhook support with cert-manager
- https://github.com/piraeusdatastore/helm-charts/pull/69 — fix chart
template issues
### Release note
```release-note
[linstor] Add linstor-scheduler for optimal pod placement on nodes with local LINSTOR replicas. Includes admission webhook that automatically routes pods using LINSTOR CSI volumes to the custom scheduler.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added LINSTOR scheduler component to the platform, packaged as a Helm
chart and installable package.
* Includes admission webhook, scheduler extender, autoscaling support,
pod disruption budget, RBAC and service account integration, and Helm
test hooks.
* **Documentation**
* Added chart README and default configuration values for easy
deployment and customization.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Adds traffic locality capabilities to SeaweedFS.
### Release note
```release-note
Added traffic locality capabilities to SeaweedFS
```
Add physical device path existence check to prevent race condition where
lsblk is called before kernel creates DRBD device node after drbdadm adjust.
This fixes issue where resources ended up in Unknown state after satellite
restart because:
- drbdadm adjust completes successfully (brings devices up)
- updateDiscGran() immediately tries lsblk
- /dev/drbd* doesn't exist yet (kernel hasn't created node)
- lsblk fails with "not a block device" exit code 32
- StorageException interrupts DeviceManager cycle
- Device remains in incomplete state
The patch now checks Files.exists(devicePath) before lsblk, allowing
the check to be retried in next cycle when device node is available.
Upstream: https://github.com/LINBIT/linstor-server/pull/471
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## What this PR does
By its nature, the kubernetes client provisioned in keycloak is public:
anyone can read the client secret. Therefore the secret is not
necessary and the client should be explicitly provisioned as a public
client, not as a confidential client.
### Release note
```release-note
[keycloak] Change the kubernetes client to be public, without a client
secret.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
## What this PR does
Add representation for backup Plans, BackupJobs and Backups in the dashboard and a readme
describing how to add further resources to the dashboard.
### Release note
```release-note
[backups,dashboard] Add backup Plans, BackupJobs, and Backups to the dashboard.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a Backups section in the dashboard navigation with Plans,
BackupJobs, and Backups.
* Added detailed views for Plan, BackupJob, and Backup showing metadata,
references, status, timings, and artifacts.
* Enhanced list views with custom columns (application refs,
phase/status, created/taken timestamps).
* **Documentation**
* Added a Dashboard integration guide documenting how to add resources,
list/detail wiring, and verification steps.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does
Cozystack-api and lineage-webhook tolerate all taints, so they can always run on control-plane (master) nodes.
This is needed for unschedulable control-plane setups, quorum nodes, etc.
### Release note
```release-note
Cozystack-api and lineage-webhook tolerate all taints.
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated pod scheduling tolerations to improve deployment flexibility
and compatibility across diverse cluster configurations.
* Enhanced workload distribution by making pods more resilient to
cluster taints, enabling better resource utilization in various
environments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Add DRBD options to automatically convert diskless nodes to diskful
when they remain in Primary state for more than 30 minutes.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add linstor-scheduler-extender for optimal pod placement on nodes
with LINSTOR storage.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
## Summary
Fix bug in toggle-disk retry logic that left orphaned DRBD devices in
kernel.
## Problem
When toggle-disk retry was triggered (e.g., user retries after a failed
operation), the code called `removeLayerData()` to clean up and recreate
the layer stack. However, `removeLayerData()` only removes data from the
controller's database — it does NOT call `drbdadm down` on the
satellite.
This caused DRBD devices to remain in the kernel (visible in `drbdsetup`
but not managed by LINSTOR), occupying ports and blocking subsequent
operations.
## Solution
Changed retry logic to simply repeat the operation with existing layer
data intact. The satellite handles this idempotently without creating
orphaned resources.
## Upstream
- https://github.com/LINBIT/linstor-server/pull/475 (updated)
```release-note
[linstor] Fix orphaned DRBD devices during toggle-disk retry
```
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added cancel and retry capabilities for disk addition operations
* Added cancel and retry capabilities for disk removal operations
* Improved cleanup handling for diskless resources with orphaned storage
layers
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
The previous retry logic in toggle-disk removed layer data from controller DB
and recreated it. However, removeLayerData() only deletes from the database
without calling drbdadm down on the satellite, leaving orphaned DRBD devices
in the kernel that occupy ports and block new operations.
This fix changes retry to simply repeat the operation with existing layer data,
allowing the satellite to handle it idempotently.
Upstream: https://github.com/LINBIT/linstor-server/pull/475
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Moved the README describing how to add new resources to the Cozystack
dashboard into a separate commit, so that the document can directly
reference the relevant code changes.
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
## What this PR does
Add representation for backup Plans in the dashboard and a readme
describing how to add further resources to the dashboard.
### Release note
```release-note
[backups,dashboard] Add backup Plans to the dashboard.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
Update kube-ovn from v1.14.11 to v1.14.25.
Sync chart templates with upstream changes.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add documentation explaining how to enable Hubble for network
observability in Grafana. Include four pre-built Hubble dashboards
(overview, dns-namespace, l7-http-metrics, network-overview) and
register them in the monitoring hub's dashboard list.
Closes #749
Signed-off-by: majiayu000 <1835304752@qq.com>
```bash
copilot --prompt "prepare changelog file for tagged release v${{ steps.tag.outputs.version }}, use @docs/agents/changelog.md for it. Create the changelog file at docs/changelogs/v${{ steps.tag.outputs.version }}.md" \
```
```js
const version = '${{ steps.tag.outputs.version }}';
const changelogBranch = `changelog-v${version}`;
const baseBranch = 'main';

// Check if PR already exists
const prs = await github.rest.pulls.list({
  owner: context.repo.owner,
  repo: context.repo.repo,
  head: `${context.repo.owner}:${changelogBranch}`,
  base: baseBranch,
  state: 'open'
});

if (prs.data.length > 0) {
  const pr = prs.data[0];
  console.log(`PR #${pr.number} already exists for changelog branch ${changelogBranch}`);
  // Update PR body with latest info
  const body = `This PR adds the changelog for release \`v${version}\`.\n\n✅ Changelog has been automatically generated in \`docs/changelogs/v${version}.md\`.`;
  await github.rest.pulls.update({
    owner: context.repo.owner,
    repo: context.repo.repo,
    pull_number: pr.number,
    body: body
  });
  console.log(`Updated existing PR #${pr.number}`);
} else {
  // Create new PR
  const pr = await github.rest.pulls.create({
    owner: context.repo.owner,
    repo: context.repo.repo,
    head: changelogBranch,
    base: baseBranch,
    title: `docs: add changelog for v${version}`,
    body: `This PR adds the changelog for release \`v${version}\`.\n\n✅ Changelog has been automatically generated in \`docs/changelogs/v${version}.md\`.`,
    draft: false
  });
  // Add label if needed
  await github.rest.issues.addLabels({
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: pr.data.number,
    labels: ['documentation', 'automated']
  });
  console.log(`Created PR #${pr.data.number} for changelog`);
}
```
* Set `ownerReferences` so the `BackupJob` is owned by the `Plan`.
**Note:** The `BackupJob` controller resolves the `BackupClass` to determine the appropriate strategy and parameters, based on the `ApplicationRef`. The strategy template is processed with a context containing the `Application` object and `Parameters` from the `BackupClass`.
The Plan controller does **not**:
* Execute backups itself.
---
### 4.2 BackupClass
**Group/Kind**
`backups.cozystack.io/v1alpha1, Kind=BackupClass`
**Purpose**
Define a class of backup configurations that encapsulate strategy and parameters per application type. `BackupClass` is a cluster-scoped resource that allows admins to configure backup strategies and parameters in a reusable way.
**Key fields (spec)**
```go
type BackupClassSpec struct {
	// Strategies is a list of backup strategies, each matching a specific application type.
```
**Note:** Parameters are not stored directly in `Backup`. Instead, they are resolved from `BackupClass` parameters when the backup was created. The storage location is managed by the driver (e.g., Velero's `BackupStorageLocation`) and referenced via parameters in the `BackupClass`.
**Key fields (status)**
* Creates a `Backup` in the same namespace (typically owned by the `BackupJob`).
* Populates `spec` fields with:
  * The application reference.
  * The strategy reference (resolved from `BackupClass` during `BackupJob` execution).
  * `takenAt`.
  * Optional `driverMetadata`.
* Sets `status` with:
* Anchor `RestoreJob` operations.
* Implement higher-level policies (retention) if needed.
**Note:** Parameters are resolved from `BackupClass` when the `BackupJob` is created. The driver uses these parameters to determine where to store backups. The storage location itself is managed by the driver (e.g., Velero's `BackupStorageLocation` CRD) and is not directly referenced in the `Backup` resource. When restoring, the driver resolves the storage location from the original `BackupClass` parameters or from the driver's own metadata.
---
### 4.5 RestoreJob
* Determines effective:
  * **Strategy**: `backup.spec.strategyRef`.
  * **Storage**: resolved from driver metadata or `BackupClass` parameters (e.g., `backupStorageLocationName` stored in `driverMetadata` or resolved from the original `BackupClass`).
  * **Target application**: `spec.targetApplicationRef` or `backup.spec.applicationRef`.
* If the effective strategy’s GVK is one of its supported strategy types, the driver is responsible.
3. Behaviour:
  * On first reconcile, set `status.startedAt` and `phase = Running`.
The Cozystack backups core API:
* Uses a single group, `backups.cozystack.io`, for all core CRDs.
* Cleanly separates:
  * **When** (Plan schedule) – core-owned.
  * **How & where** (BackupClass) – central configuration unit that encapsulates strategy and parameters (e.g., storage reference) per application type, resolved per BackupJob/Plan.
  * **Execution** (BackupJob) – created by Plan when schedule fires, resolves BackupClass to get strategy and parameters, then delegates to driver.
  * **What backup artifacts exist** (Backup) – driver-created but cluster-visible.
"Interval between telemetry data collection (e.g. 15m, 1h)")
flag.StringVar(&platformSourceURL,"platform-source-url","","Platform source URL (oci:// or https://). If specified, generates OCIRepository or GitRepository resource.")
flag.StringVar(&platformSourceName,"platform-source-name","cozystack-packages","Name for the generated platform source resource (default: cozystack-packages)")
flag.StringVar(&platformSourceName,"platform-source-name","cozystack-platform","Name for the generated platform source resource and PackageSource")
flag.StringVar(&platformSourceRef,"platform-source-ref","","Reference specification as key=value pairs (e.g., 'branch=main' or 'digest=sha256:...,tag=v1.0'). For OCI: digest, semver, semverFilter, tag. For Git: branch, tag, semver, name, commit.")
flag.StringVar(&cozyValuesSecretName,"cozy-values-secret-name","cozystack-values","The name of the secret containing cluster-wide configuration values.")
flag.StringVar(&cozyValuesSecretNamespace,"cozy-values-secret-namespace","cozy-system","The namespace of the secret containing cluster-wide configuration values.")
@@ -127,8 +137,7 @@ func main() {
os.Exit(1)
}
// Start the controller manager
setupLog.Info("Starting controller manager")
// Initialize the controller manager
mgr,err:=ctrl.NewManager(config,ctrl.Options{
Scheme:scheme,
Cache:cache.Options{
@@ -170,10 +179,26 @@ func main() {
os.Exit(1)
}
// Set up signal handler early so install phases respect SIGTERM
mgrCtx:=ctrl.SetupSignalHandler()
// Install Cozystack CRDs before starting reconcile loop
ifinstallCRDs{
setupLog.Info("Installing Cozystack CRDs before starting reconcile loop")
* **[virtual-machine] Improve check for resizing job**: Improved storage resize logic to only expand persistent volume claims when storage is being increased, preventing unintended storage reduction operations. Added validation to accurately compare current and desired storage sizes before triggering resize operations ([**@kvaps**](https://github.com/kvaps) in #1688, #1702).
## Fixes
* **[dashboard] Fix CustomFormsOverride schema to nest properties under spec.properties**: Fixed the logic for generating CustomFormsOverride schema to properly nest properties under `spec.properties` instead of directly under `properties`, ensuring correct form schema generation in the dashboard ([**@kvaps**](https://github.com/kvaps) in #1692, #1699).
* **[virtual-machine,vm-instance] Add nodeAffinity for Windows VMs based on scheduling config**: Added nodeAffinity configuration to virtual-machine and vm-instance charts to support dedicated nodes for Windows VMs. When `dedicatedNodesForWindowsVMs` is enabled in the `cozystack-scheduling` ConfigMap, Windows VMs are scheduled on nodes with label `scheduling.cozystack.io/vm-windows=true`, while non-Windows VMs prefer nodes without this label ([**@kvaps**](https://github.com/kvaps) in #1693, #1744).
* **[cilium] Enable automatic pod rollout on configmap updates**: Cilium and Cilium operator pods now automatically restart when the cilium-config ConfigMap is updated, ensuring configuration changes are applied immediately without manual intervention ([**@kvaps**](https://github.com/kvaps) in #1728, #1745).
* **Update SeaweedFS v4.02**: Updated SeaweedFS to version 4.02 with improved S3 daemon performance and fixes. This update includes better S3 compatibility and performance improvements ([**@kvaps**](https://github.com/kvaps) in #1725, #1732).
## Fixes
* **[apps] Refactor apiserver to use typed objects and fix UnstructuredList GVK**: Refactored the apiserver REST handlers to use typed objects (`appsv1alpha1.Application`) instead of `unstructured.Unstructured`, eliminating the need for runtime conversions and simplifying the codebase. Additionally, fixed an issue where `UnstructuredList` objects were using the first registered kind from `typeToGVK` instead of the kind from the object's field when multiple kinds are registered with the same Go type. This fix includes the upstream fix from kubernetes/kubernetes#135537 ([**@kvaps**](https://github.com/kvaps) in #1679, #1709).
* **[kubevirt-operator] Fix typo in VMNotRunningFor10Minutes alert**: Fixed typo in VM alert name, ensuring proper alert triggering and monitoring for virtual machines that are not running for extended periods ([**@lexfrei**](https://github.com/lexfrei) in #1770).
* **[kubevirt-operator] Revert incorrect case change in VM alerts**: Reverted incorrect case change in VM alert names to maintain consistency with alert naming conventions ([**@lexfrei**](https://github.com/lexfrei) in #1804, #1805).
* **[multus] Remove memory limit**: Removed memory limit for Multus daemonset due to unpredictable memory consumption spikes during startup after node reboots (reported up to 3Gi). This temporary change prevents out-of-memory issues while the root cause is addressed in future releases ([**@nbykov0**](https://github.com/nbykov0) in #1834).
* **[vm] Always expose VMs with a service**: Virtual machines are now always exposed with at least a ClusterIP service, ensuring they have in-cluster DNS names and can be accessed from other pods even without public IP addresses ([**@lllamnyp**](https://github.com/lllamnyp) in #1738, #1751).
* **[tenant] Allow egress to parent ingress pods**: Updated tenant network policies to allow egress traffic to parent cluster ingress pods, enabling proper communication patterns between tenant namespaces and parent cluster ingress controllers ([**@lexfrei**](https://github.com/lexfrei) in #1765, #1776).
* **[system] Add resource requests and limits to etcd-defrag**: Added resource requests and limits to etcd-defrag job to ensure proper resource allocation and prevent resource contention during etcd maintenance operations ([**@matthieu-robin**](https://github.com/matthieu-robin) in #1785, #1786).
* **[tenant] Run cleanup job from system namespace**: Moved tenant cleanup job to run from system namespace, improving security and resource isolation for tenant cleanup operations ([**@lllamnyp**](https://github.com/lllamnyp) in #1774, #1777).
## Fixes
* **[kubevirt-operator] Fix typo in VMNotRunningFor10Minutes alert**: Fixed typo in VM alert name, ensuring proper alert triggering and monitoring for virtual machines that are not running for extended periods ([**@lexfrei**](https://github.com/lexfrei) in #1770, #1775).
## Improvements
* **[seaweedfs] Traffic locality**: Upgraded SeaweedFS to v4.05 with traffic locality capabilities, new admin component with web-based UI, worker component for distributed operations, and enhanced S3 monitoring with Grafana dashboards. Improves S3 service performance by routing requests to nearest available volume servers ([**@nbykov0**](https://github.com/nbykov0) in #1748, #1830).
* **[kube-ovn] Update to v1.14.25**: Updated Kube-OVN to version 1.14.25 with improved stability and new features ([**@kvaps**](https://github.com/kvaps) in #1819, #1837).
* **[linstor] Build linstor-server with custom patches**: Added custom patches to linstor-server build process, enabling platform-specific optimizations and fixes ([**@kvaps**](https://github.com/kvaps) in #1726, #1818).
* **[api, lineage] Tolerate all taints**: Updated API and lineage webhook to tolerate all taints, ensuring controllers can run on any node regardless of taint configuration ([**@nbykov0**](https://github.com/nbykov0) in #1781, #1827).
* **[ingress] Add topology anti-affinities**: Added topology anti-affinity rules to ingress controller deployment for better pod distribution across nodes ([**@kvaps**](https://github.com/kvaps) in commit 25f31022).
## Fixes
* **[linstor] fix: prevent DRBD device race condition in updateDiscGran**: Fixed race condition in DRBD device management during granularity updates, preventing potential data corruption or device conflicts ([**@kvaps**](https://github.com/kvaps) in #1829, #1836).
* **fix(linstor): prevent orphaned DRBD devices during toggle-disk retry**: Fixed an issue where retry logic during disk toggle operations could leave orphaned DRBD devices; the retry path now cleans up devices properly ([**@kvaps**](https://github.com/kvaps) in #1823, #1825).
* **[kubernetes] Fix endpoints for cilium-gateway**: Fixed endpoint configuration for cilium-gateway, ensuring proper service discovery and connectivity ([**@kvaps**](https://github.com/kvaps) in #1729, #1808).
* **[kubevirt-operator] Revert incorrect case change in VM alerts**: Reverted incorrect case change in VM alert names to maintain consistency with alert naming conventions ([**@lexfrei**](https://github.com/lexfrei) in #1804, #1806).
## System Configuration
* **[kubeovn] Package from external repo**: Extracted Kube-OVN packaging from main repository to external repository, improving modularity ([**@lllamnyp**](https://github.com/lllamnyp) in #1535).
## Development, Testing, and CI/CD
* **[testing] Add aliases and autocomplete**: Added shell aliases and autocomplete support for testing commands, improving developer experience ([**@lllamnyp**](https://github.com/lllamnyp) in #1803, #1809).
## Dependencies
* **[seaweedfs] Traffic locality**: Upgraded SeaweedFS to v4.05 with traffic locality capabilities ([**@nbykov0**](https://github.com/nbykov0) in #1748, #1830).
* **[kube-ovn] Update to v1.14.25**: Updated Kube-OVN to version 1.14.25 ([**@kvaps**](https://github.com/kvaps) in #1819, #1837).
* **[paas-full] Add multus dependencies similar to other CNIs**: Added Multus as a dependency in the paas-full package, consistent with how other CNIs are included. This ensures proper dependency management and simplifies the installation process for environments using Multus networking ([**@nbykov0**](https://github.com/nbykov0) in #1835).
* **[linstor] Update piraeus-server patches with critical fixes**: Backported critical patches to piraeus-server that address storage stability issues and improve DRBD resource handling. These patches fix edge cases in device management and ensure more reliable storage operations ([**@kvaps**](https://github.com/kvaps) in #1850, #1853).
This release introduces a LINSTOR scheduler for optimal pod placement, SeaweedFS traffic locality, a new valuesFrom-based configuration mechanism, auto-diskful for LINSTOR, automated version management systems, and numerous improvements across the platform.
## Feature Highlights
### LINSTOR Scheduler for Optimal Pod Placement
Cozystack now includes a custom Kubernetes scheduler extender that works alongside the default kube-scheduler to optimize pod placement on nodes with LINSTOR storage. When a pod requests LINSTOR-backed storage, the scheduler communicates with the LINSTOR controller to find nodes that have local replicas of the requested volumes, prioritizing placement on nodes with existing data to minimize network traffic and improve I/O performance.
The scheduler includes an admission webhook that automatically routes pods using LINSTOR CSI volumes to the custom scheduler, ensuring seamless integration without manual configuration. This feature significantly improves performance for workloads using LINSTOR storage by reducing network latency and improving data locality.
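As a rough illustration of how a workload ends up on the custom scheduler, here is a minimal sketch. The scheduler name `linstor-scheduler` and the PVC name are assumptions; in practice the admission webhook injects `schedulerName` automatically, so no manual edit is required.
```yaml
# Hedged sketch: a pod using a LINSTOR-backed PVC after the admission
# webhook has routed it to the custom scheduler. The scheduler name is
# an assumption; the webhook sets it for you.
apiVersion: v1
kind: Pod
metadata:
  name: data-consumer
spec:
  schedulerName: linstor-scheduler   # injected by the admission webhook
  containers:
    - name: app
      image: registry.example.com/app:latest
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data   # PVC provisioned by a LINSTOR StorageClass
```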
Learn more about LINSTOR in the [documentation](https://cozystack.io/docs/operations/storage/linstor/).
### SeaweedFS Traffic Locality
SeaweedFS has been upgraded to version 4.05 with new traffic locality capabilities that optimize S3 service traffic distribution. The update includes a new admin component with a web-based UI and authentication support, as well as a worker component for distributed operations. These enhancements improve S3 service performance and provide better visibility through enhanced Grafana dashboard panels for buckets, API calls, costs, and performance metrics.
The traffic locality feature ensures that S3 requests are routed to the nearest available volume servers, reducing latency and improving overall performance for distributed storage operations. TLS certificate support for admin and worker components adds an extra layer of security for management operations.
### ValuesFrom Configuration Mechanism
Cozystack now uses FluxCD's valuesFrom mechanism to replace Helm lookup functions for configuration propagation. This architectural improvement provides cleaner config propagation and eliminates the need for force reconcile controllers. Configuration from ConfigMaps (cozystack, cozystack-branding, cozystack-scheduling) and namespace service references (etcd, host, ingress, monitoring, seaweedfs) is now centrally managed through a `cozystack-values` Secret in each namespace.
This change simplifies Helm chart templates by replacing complex lookup functions with direct value references, improves configuration consistency, and reduces the reconciliation overhead. All HelmReleases now automatically receive cluster and namespace configuration through the valuesFrom mechanism, making configuration management more transparent and maintainable.
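For reference, this is roughly what the pattern looks like on a HelmRelease. The chart, repository, and namespace names are illustrative; `valuesFrom` is standard FluxCD API, and `cozystack-values` is the Secret named above.
```yaml
# Minimal sketch of configuration propagation via FluxCD valuesFrom.
# Chart, repository, and namespace names are illustrative.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app
  namespace: tenant-example
spec:
  chart:
    spec:
      chart: example-app
      sourceRef:
        kind: HelmRepository
        name: cozystack-apps
  valuesFrom:
    - kind: Secret
      name: cozystack-values   # replaces per-template `lookup` calls
```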
### Auto-diskful for LINSTOR
The LINSTOR integration now includes automatic diskful functionality that converts diskless nodes to diskful when they hold DRBD resources in Primary state for an extended period (30 minutes). This feature addresses scenarios where workloads are scheduled on nodes without local storage replicas by automatically creating local disk replicas when needed, improving I/O performance for long-running workloads.
When enabled with cleanup options, the system can automatically remove disk replicas that are no longer needed, preventing storage waste from temporary replicas. This intelligent storage management reduces network traffic for frequently accessed data while maintaining efficient storage utilization.
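Under the hood this maps to DRBD options that LINSTOR exposes. Below is a hedged sketch of the knobs involved, expressed as linstor-csi StorageClass parameters; Cozystack enables the feature at the platform level, so this StorageClass is only illustrative, and the exact property names should be checked against the LINSTOR documentation.
```yaml
# Hedged sketch: the DRBD auto-diskful knobs as linstor-csi StorageClass
# parameters. Property names follow upstream LINSTOR conventions and may
# differ from what Cozystack sets internally.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-auto-diskful-example
provisioner: linstor.csi.linbit.com
parameters:
  # minutes a diskless node may stay Primary before a local replica is created
  property.linstor.csi.linbit.com/DrbdOptions/auto-diskful: "30"
  # allow removing surplus replicas once they are no longer needed
  property.linstor.csi.linbit.com/DrbdOptions/auto-diskful-allow-cleanup: "true"
```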
### Automated Version Management Systems
Cozystack now includes automated version management systems for PostgreSQL, Kubernetes, MariaDB, and Redis applications. These systems automatically track upstream versions and provide mechanisms for automated version updates, ensuring that platform users always have access to the latest stable versions while maintaining compatibility with existing deployments.
The version management systems integrate with the Cozystack API and dashboard, providing administrators with visibility into available versions and update paths. This infrastructure sets the foundation for future automated upgrade workflows and version compatibility management.
---
## Major Features and Improvements
### Storage
* **[linstor] Add linstor-scheduler package**: Added LINSTOR scheduler extender for optimal pod placement on nodes with LINSTOR storage. Includes admission webhook that automatically routes pods using LINSTOR CSI volumes to the custom scheduler, ensuring pods are placed on nodes with local replicas to minimize network traffic and improve I/O performance ([**@kvaps**](https://github.com/kvaps) in #1824).
* **[linstor] Enable auto-diskful for diskless nodes**: Enabled DRBD auto-diskful functionality to automatically convert diskless nodes to diskful when they hold volumes in Primary state for more than 30 minutes. Improves I/O performance for long-running workloads by creating local replicas and includes automatic cleanup options to prevent storage waste ([**@kvaps**](https://github.com/kvaps) in #1826).
* **[linstor] Build linstor-server with custom patches**: Added custom patches to linstor-server build process, enabling platform-specific optimizations and fixes ([**@kvaps**](https://github.com/kvaps) in #1726).
* **[seaweedfs] Traffic locality**: Upgraded SeaweedFS to v4.05 with traffic locality capabilities, new admin component with web-based UI, worker component for distributed operations, and enhanced S3 monitoring with Grafana dashboards. Improves S3 service performance by routing requests to nearest available volume servers ([**@nbykov0**](https://github.com/nbykov0) in #1748).
* **[linstor] fix: prevent DRBD device race condition in updateDiscGran**: Fixed race condition in DRBD device management during granularity updates, preventing potential data corruption or device conflicts ([**@kvaps**](https://github.com/kvaps) in #1829).
* **fix(linstor): prevent orphaned DRBD devices during toggle-disk retry**: Fixed an issue where retry logic during disk toggle operations could leave orphaned DRBD devices; the retry path now cleans up devices properly ([**@kvaps**](https://github.com/kvaps) in #1823).
### Platform Architecture
* **[platform] Replace Helm lookup with valuesFrom mechanism**: Replaced Helm lookup functions with FluxCD valuesFrom mechanism for configuration propagation. Configuration from ConfigMaps and namespace references is now managed through `cozystack-values` Secret, simplifying templates and eliminating force reconcile controllers ([**@kvaps**](https://github.com/kvaps) in #1787).
* **[platform] refactor: split cozystack-resource-definitions into separate packages**: Refactored cozystack-resource-definitions into separate packages for better organization and maintainability, improving code structure and reducing coupling between components ([**@kvaps**](https://github.com/kvaps) in #1778).
* **[platform] Separate assets server into dedicated deployment**: Separated assets server from main platform deployment, improving scalability and allowing independent scaling of asset delivery infrastructure ([**@kvaps**](https://github.com/kvaps) in #1705).
* **[core] Extract Talos package from installer**: Extracted Talos package configuration from installer into a separate package, improving modularity and enabling independent updates ([**@kvaps**](https://github.com/kvaps) in #1724).
* **[registry] Add application labels and update filtering mechanism**: Added application labels to registry resources and improved filtering mechanism for better resource discovery and organization ([**@kvaps**](https://github.com/kvaps) in #1707).
* **fix(registry): implement field selector filtering for label-based resources**: Implemented field selector filtering for label-based resources in the registry, improving query performance and resource lookup efficiency ([**@kvaps**](https://github.com/kvaps) in #1845).
* **[platform] Add alphabetical sorting to registry resource lists**: Added alphabetical sorting to registry resource lists in the API and dashboard, improving user experience when browsing available applications ([**@lexfrei**](https://github.com/lexfrei) in #1764).
### Version Management
* **[postgres] Add version management system with automated version updates**: Introduced version management system for PostgreSQL with automated version tracking and update mechanisms ([**@kvaps**](https://github.com/kvaps) in #1671).
* **[kubernetes] Add version management system with automated version updates**: Added version management system for Kubernetes tenant clusters with automated version tracking and update capabilities ([**@kvaps**](https://github.com/kvaps) in #1672).
* **[mariadb] Add version management system with automated version updates**: Implemented version management system for MariaDB with automated version tracking and update mechanisms ([**@kvaps**](https://github.com/kvaps) in #1680).
* **[redis] Add version management system with automated version updates**: Added version management system for Redis with automated version tracking and update capabilities ([**@kvaps**](https://github.com/kvaps) in #1681).
### Networking
* **[kube-ovn] Update to v1.14.25**: Updated Kube-OVN to version 1.14.25 with improved stability and new features ([**@kvaps**](https://github.com/kvaps) in #1819).
* **[kubeovn] Package from external repo**: Extracted Kube-OVN packaging from main repository to external repository, improving modularity ([**@lllamnyp**](https://github.com/lllamnyp) in #1535).
* **[cilium] Update Cilium to v1.18.5**: Updated Cilium to version 1.18.5 with latest features and bug fixes ([**@lexfrei**](https://github.com/lexfrei) in #1769).
* **[system/cilium] Enable topology-aware routing for services**: Enabled topology-aware routing for Cilium services, improving traffic distribution and reducing latency by routing traffic to endpoints in the same zone when possible ([**@nbykov0**](https://github.com/nbykov0) in #1734).
* **[cilium] Enable automatic pod rollout on configmap updates**: Cilium and Cilium operator pods now automatically restart when the cilium-config ConfigMap is updated, ensuring configuration changes are applied immediately ([**@kvaps**](https://github.com/kvaps) in #1728).
* **[kubernetes] Fix endpoints for cilium-gateway**: Fixed endpoint configuration for cilium-gateway, ensuring proper service discovery and connectivity ([**@kvaps**](https://github.com/kvaps) in #1729).
* **[multus] Increase memory limit**: Increased memory limits for Multus components to handle larger network configurations and reduce out-of-memory issues ([**@nbykov0**](https://github.com/nbykov0) in #1773).
* **[paas-full] Add multus dependencies similar to other CNIs**: Added Multus as a dependency in the paas-full package, consistent with how other CNIs are included ([**@nbykov0**](https://github.com/nbykov0) in #1842).
### Virtual Machines
* **[vm] Always expose VMs with a service**: Virtual machines are now always exposed with at least a ClusterIP service, ensuring they have in-cluster DNS names and can be accessed from other pods even without public IP addresses ([**@lllamnyp**](https://github.com/lllamnyp) in #1738).
* **[virtual-machine] Improve check for resizing job**: Improved storage resize logic to only expand persistent volume claims when storage is being increased, preventing unintended storage reduction operations ([**@kvaps**](https://github.com/kvaps) in #1688).
* **[virtual-machine,vm-instance] Add nodeAffinity for Windows VMs based on scheduling config**: Added nodeAffinity configuration to virtual-machine and vm-instance charts to support dedicated nodes for Windows VMs ([**@kvaps**](https://github.com/kvaps) in #1693).
### Monitoring
* **[monitoring] Add SLACK_SEVERITY_FILTER field and VMAgent for tenant monitoring**: Introduced SLACK_SEVERITY_FILTER environment variable in Alerta deployment to enable filtering of alert severities for Slack notifications. Added VMAgent resource template for scraping metrics within tenant namespaces, improving monitoring granularity ([**@IvanHunters**](https://github.com/IvanHunters) in #1712).
* **[monitoring] Improve tenant metrics collection**: Improved tenant metrics collection mechanisms for better observability and monitoring coverage ([**@IvanHunters**](https://github.com/IvanHunters) in #1684).
### System Configuration
* **[api, lineage] Tolerate all taints**: Updated API and lineage webhook to tolerate all taints, ensuring controllers can run on any node regardless of taint configuration ([**@nbykov0**](https://github.com/nbykov0) in #1781).
* **[system] Add resource requests and limits to etcd-defrag**: Added resource requests and limits to etcd-defrag job to ensure proper resource allocation and prevent resource contention ([**@matthieu-robin**](https://github.com/matthieu-robin) in #1785).
* **[system:coredns] Update coredns app labels to match Talos coredns labels**: Updated coredns app labels to match Talos coredns labels, ensuring consistency across the platform ([**@nbykov0**](https://github.com/nbykov0) in #1675).
* **[system:monitoring-agents] Rename coredns metrics service**: Renamed coredns metrics service to avoid interference with the coredns service used for name resolution in tenant k8s clusters ([**@nbykov0**](https://github.com/nbykov0) in #1676).
### Tenants and Namespaces
* **[tenant] Allow egress to parent ingress pods**: Updated tenant network policies to allow egress traffic to parent cluster ingress pods, enabling proper communication patterns ([**@lexfrei**](https://github.com/lexfrei) in #1765).
* **[tenant] Run cleanup job from system namespace**: Moved tenant cleanup job to run from system namespace, improving security and resource isolation ([**@lllamnyp**](https://github.com/lllamnyp) in #1774).
### FluxCD
* **[fluxcd] Add flux-aio module and migration**: Added FluxCD all-in-one module with migration support, simplifying FluxCD installation and management ([**@kvaps**](https://github.com/kvaps) in #1698).
* **[fluxcd] Enable source-watcher**: Enabled source-watcher in FluxCD configuration for improved GitOps synchronization and faster update detection ([**@kvaps**](https://github.com/kvaps) in #1706).
### Applications
* **[dashboard] Fix CustomFormsOverride schema to nest properties under spec.properties**: Fixed CustomFormsOverride schema generation to properly nest properties under `spec.properties` instead of directly under `properties`, ensuring correct form schema generation ([**@kvaps**](https://github.com/kvaps) in #1692).
* **[apps] Refactor apiserver to use typed objects and fix UnstructuredList GVK**: Refactored apiserver REST handlers to use typed objects instead of unstructured.Unstructured, eliminating runtime conversions. Fixed UnstructuredList GVK issue where objects were using the first registered kind instead of the correct kind ([**@kvaps**](https://github.com/kvaps) in #1679).
* **[keycloak] Make kubernetes client public**: Made Kubernetes client public in Keycloak configuration, enabling broader access patterns for Kubernetes integrations ([**@lllamnyp**](https://github.com/lllamnyp) in #1802).
## Improvements
* **[kubernetes] Granular application extensions dependencies**: Improved dependency management for Kubernetes application extensions, with more granular control over extension dependencies ([**@nbykov0**](https://github.com/nbykov0) in #1683).
* **[core:installer] Address buildx warnings**: Fixed Dockerfile syntax warnings from buildx, ensuring clean builds without warnings ([**@nbykov0**](https://github.com/nbykov0) in #1682).
* **[linstor] Update piraeus-operator v2.10.2**: Updated piraeus-operator to version 2.10.2 with improved stability and bug fixes ([**@kvaps**](https://github.com/kvaps) in #1689).
* **Update SeaweedFS v4.02**: Updated SeaweedFS to version 4.02 with improved S3 daemon performance and fixes ([**@kvaps**](https://github.com/kvaps) in #1725).
* **[installer,dx] Rename cozypkg to cozyhr**: Renamed cozypkg tool to cozyhr for better branding and consistency ([**@kvaps**](https://github.com/kvaps) in #1763).
## Fixes
* **fix(platform): fix migrations for v0.40 release**: Fixed platform migrations for v0.40 release, ensuring smooth upgrades from previous versions ([**@kvaps**](https://github.com/kvaps) in #1846).
* **[platform] fix migration for removing fluxcd-operator**: Fixed migration logic for removing fluxcd-operator, ensuring clean removal without leaving orphaned resources ([**@kvaps**](https://github.com/kvaps) in commit 4a83d2c7).
* **[kubevirt-operator] Fix typo in VMNotRunningFor10Minutes alert**: Fixed typo in VM alert name, ensuring proper alert triggering and monitoring ([**@lexfrei**](https://github.com/lexfrei) in #1770).
* **[kubevirt-operator] Revert incorrect case change in VM alerts**: Reverted incorrect case change in VM alert names to maintain consistency ([**@lexfrei**](https://github.com/lexfrei) in #1804).
* **[cozystack-controller] Fix: move crds to definitions**: Fixed CRD placement by moving them to definitions directory, ensuring proper resource organization ([**@kvaps**](https://github.com/kvaps) in #1759).
## Dependencies
* **Update SeaweedFS v4.02**: Updated SeaweedFS to version 4.02 ([**@kvaps**](https://github.com/kvaps) in #1725).
* **[seaweedfs] Traffic locality**: Upgraded SeaweedFS to v4.05 with traffic locality capabilities ([**@nbykov0**](https://github.com/nbykov0) in #1748).
* **[linstor] Update piraeus-operator v2.10.2**: Updated piraeus-operator to version 2.10.2 ([**@kvaps**](https://github.com/kvaps) in #1689).
* **[kube-ovn] Update to v1.14.25**: Updated Kube-OVN to version 1.14.25 ([**@kvaps**](https://github.com/kvaps) in #1819).
* **[cilium] Update Cilium to v1.18.5**: Updated Cilium to version 1.18.5 ([**@lexfrei**](https://github.com/lexfrei) in #1769).
* **Update go modules**: Updated Go modules to latest versions ([**@kvaps**](https://github.com/kvaps) in #1736).
## Development, Testing, and CI/CD
* **[ci] Fix auto-release workflow**: Fixed auto-release workflow to ensure correct release publishing and tagging ([**@kvaps**](https://github.com/kvaps) in commit 526af294).
* **fix(ci): ensure correct latest release after backport publishing**: Fixed CI workflow to correctly identify and tag the latest release after backport publishing ([**@kvaps**](https://github.com/kvaps) in #1800).
* **[workflows] Add auto patch release workflow**: Added automated patch release workflow for streamlined release management ([**@kvaps**](https://github.com/kvaps) in #1754).
* **[workflow] Add GitHub Action to update release notes from changelogs**: Added GitHub Action to automatically update release notes from changelog files ([**@kvaps**](https://github.com/kvaps) in #1752).
* **[ci] Improve backport workflow with merge_commits skip and conflict resolution**: Improved backport workflow with better merge commit handling and conflict resolution ([**@kvaps**](https://github.com/kvaps) in #1694).
* **[testing] Add aliases and autocomplete**: Added shell aliases and autocomplete support for testing commands, improving developer experience ([**@lllamnyp**](https://github.com/lllamnyp) in #1803).
* **[kubernetes] Add lb tests for tenant k8s**: Added load balancer tests for tenant Kubernetes clusters, improving test coverage ([**@IvanHunters**](https://github.com/IvanHunters) in #1783).
* **[agents] Add instructions for working with unresolved code review comments**: Added documentation and instructions for working with unresolved code review comments in agent workflows ([**@kvaps**](https://github.com/kvaps) in #1710).
* **feat(ci): add /retest command to rerun tests from Prepare environment**: Added `/retest` command to rerun tests from Prepare environment workflow ([**@kvaps**](https://github.com/kvaps) in commit 30c1041e).
* **fix(ci): remove GITHUB_TOKEN extraheader to trigger workflows**: Removed GITHUB_TOKEN extraheader to properly trigger workflows ([**@kvaps**](https://github.com/kvaps) in commit 68a639b3).
* **Fix: Add missing components to `distro-full` bundle**: Fixed missing components in distro-full bundle, ensuring all required components are included ([**@LoneExile**](https://github.com/LoneExile) in #1620).
* **Update Flux Operator (v0.33.0)**: Updated Flux Operator to version 0.33.0 ([**@kingdonb**](https://github.com/kingdonb) in #1649).
* **Add changelogs for v0.38.3 and v0.38.4**: Added missing changelogs for v0.38.3 and v0.38.4 releases ([**@androndo**](https://github.com/androndo) in #1743).
* **Add changelog for v0.39.1**: Added changelog for v0.39.1 release ([**@androndo**](https://github.com/androndo) in #1750).
* **Add Cloupard to ADOPTERS.md**: Added Cloupard to the adopters list ([**@SerjioTT**](https://github.com/SerjioTT) in #1733).
## Documentation
* **[website] docs: expand monitoring and alerting documentation**: Expanded monitoring and alerting documentation with comprehensive guides, examples, and troubleshooting information ([**@IvanHunters**](https://github.com/IvanHunters) in [cozystack/website#388](https://github.com/cozystack/website/pull/388)).
* **[website] fix auto-generation of documentation**: Fixed automatic documentation generation process, ensuring all documentation is properly generated and formatted ([**@IvanHunters**](https://github.com/IvanHunters) in [cozystack/website#391](https://github.com/cozystack/website/pull/391)).
* **[website] secure boot**: Added documentation for Secure Boot support in Talos Linux ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#387](https://github.com/cozystack/website/pull/387)).
## Tools
* **[talm] feat(helpers): add bond interface discovery helpers**: Added bond interface discovery helpers to talm for easier network configuration ([**@kvaps**](https://github.com/kvaps) in [cozystack/talm#94](https://github.com/cozystack/talm/pull/94)).
* **[talm] feat(talosconfig): add certificate regeneration from secrets.yaml**: Added certificate regeneration functionality to talm talosconfig command, allowing certificates to be regenerated from secrets.yaml ([**@kvaps**](https://github.com/kvaps) in cozystack/talm@1319dde).
* **[talm] fix(init): make name optional for -u flag**: Made name parameter optional for init command with -u flag, improving flexibility ([**@kvaps**](https://github.com/kvaps) in cozystack/talm@da29320).
* **[talm] fix(wrapper): copy NoOptDefVal when remapping -f to -F flag**: Fixed wrapper to properly copy NoOptDefVal when remapping flags, ensuring correct default value handling ([**@kvaps**](https://github.com/kvaps) in cozystack/talm@f6a6f1d).
* **[talm] fix(root): detect project root with secrets.encrypted.yaml**: Fixed root detection to properly identify project root when secrets.encrypted.yaml is present ([**@kvaps**](https://github.com/kvaps) in cozystack/talm@cf56780).
* **[talm] Fix interfaces helper for Talos v1.12**: Fixed interfaces helper to work correctly with Talos v1.12 ([**@kvaps**](https://github.com/kvaps) in cozystack/talm@34984ae).
* **[talm] Fix typo on README.md**: Fixed typo in README documentation ([**@diegolakatos**](https://github.com/diegolakatos) in [cozystack/talm#92](https://github.com/cozystack/talm/pull/92)).
* **[talm] fix(template): return error for invalid YAML in template output**: Fixed template command to return proper error for invalid YAML output ([**@kvaps**](https://github.com/kvaps) in [cozystack/talm#93](https://github.com/cozystack/talm/pull/93)).
* **[talm] feat(cozystack): enable allocateNodeCIDRs by default**: Enabled allocateNodeCIDRs by default in talm cozystack preset ([**@lexfrei**](https://github.com/lexfrei) in [cozystack/talm#91](https://github.com/cozystack/talm/pull/91)).
* **[boot-to-talos] feat(network): add VLAN interface support via netlink**: Added VLAN interface support via netlink in boot-to-talos for advanced network configuration ([**@kvaps**](https://github.com/kvaps) in cozystack/boot-to-talos@02874d7).
* **[boot-to-talos] feat(network): add bond interface support via netlink**: Added bond interface support via netlink in boot-to-talos for network bonding configurations ([**@kvaps**](https://github.com/kvaps) in cozystack/boot-to-talos@067822d).
* **[boot-to-talos] Draft EFI Support**: Added draft EFI support in boot-to-talos for UEFI boot scenarios ([**@kvaps**](https://github.com/kvaps) in cozystack/boot-to-talos@e194bc8).
* **[boot-to-talos] Change default install image size from 2GB to 3GB**: Changed default install image size from 2GB to 3GB to accommodate larger installations ([**@kvaps**](https://github.com/kvaps) in cozystack/boot-to-talos@3bfb035).
* **[cozyhr] feat(values): add valuesFrom support for HelmRelease**: Added valuesFrom support for HelmRelease in cozyhr tool, enabling better configuration management ([**@kvaps**](https://github.com/kvaps) in cozystack/cozyhr@7dff0c8).
* **[cozyhr] Rename cozypkg to cozyhr**: Renamed cozypkg tool to cozyhr for better branding ([**@kvaps**](https://github.com/kvaps) in cozystack/cozyhr@1029461).
---
## Contributors
We'd like to thank all contributors who made this release possible:
## Improvements
* **[linstor] Update piraeus-server patches with critical fixes**: Backported critical patches to piraeus-server that address storage stability issues and improve DRBD resource handling. These patches fix edge cases in device management and ensure more reliable storage operations ([**@kvaps**](https://github.com/kvaps) in #1850, #1852).
* **[linstor] Refactor node-level RWX validation**: Refactored the node-level ReadWriteMany (RWX) validation logic in LINSTOR CSI. The validation has been moved to the CSI driver level with a custom linstor-csi image build, providing more reliable RWX volume handling and clearer error messages when RWX requirements cannot be satisfied ([**@kvaps**](https://github.com/kvaps) in #1856, #1857).
## Fixes
* **[linstor] Remove node-level RWX validation**: Removed the problematic node-level RWX validation that was causing issues with volume provisioning. The validation logic has been refactored and moved to a more appropriate location in the LINSTOR CSI driver ([**@kvaps**](https://github.com/kvaps) in #1851).
* **[apiserver] Fix Watch resourceVersion and bookmark handling**: Fixed issues with Watch API handling of resourceVersion and bookmarks, ensuring proper event streaming and state synchronization for API clients ([**@kvaps**](https://github.com/kvaps) in #1860).
## Dependencies
* **[cilium] Update Cilium to v1.18.6**: Updated Cilium CNI to v1.18.6 with security fixes and performance improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #1868, #1870).
## Improvements
* **[kubernetes] Increase default apiServer resourcesPreset to large**: Increased the default resource preset for kube-apiserver to `large` to ensure more reliable operation under higher workloads and prevent resource constraints ([**@kvaps**](https://github.com/kvaps) in #1875, #1882).
* **[kubernetes] Increase kube-apiserver startup probe threshold**: Increased the startup probe threshold for kube-apiserver to allow more time for the API server to become ready, especially in scenarios with slow storage or high load ([**@kvaps**](https://github.com/kvaps) in #1876, #1883).
* **[etcd] Increase probe thresholds for better recovery**: Increased etcd probe thresholds to provide more time for recovery operations, improving cluster resilience during network issues or temporary slowdowns ([**@kvaps**](https://github.com/kvaps) in #1874, #1878).
## Fixes
* **[dashboard] Fix view of loadbalancer IP in services window**: Fixed an issue where load balancer IP addresses were not displayed correctly in the services window of the dashboard ([**@IvanHunters**](https://github.com/IvanHunters) in #1884, #1887).
## Dependencies
* **Update Talos Linux v1.11.6**: Updated Talos Linux to v1.11.6 with latest security patches and improvements ([**@kvaps**](https://github.com/kvaps) in #1879).
## Improvements
* **[dashboard] Improve dashboard session params**: Improved session parameter handling in the dashboard for better user experience and more reliable session management ([**@lllamnyp**](https://github.com/lllamnyp) in #1913, #1919).
## Dependencies
* **Update cozyhr to v1.6.1**: Updated cozyhr to v1.6.1, which fixes a critical bug causing helm-controller v0.37.0+ to unexpectedly uninstall HelmReleases after cozyhr apply by correcting history snapshot fields for helm-controller compatibility ([**@kvaps**](https://github.com/kvaps) in cozystack/cozyhr#10).
## Fixes
* **[dashboard] Verify JWT token**: Added JWT token verification to the dashboard, ensuring that authentication tokens are properly validated before granting access. This prevents unauthorized access through forged or expired tokens ([**@lllamnyp**](https://github.com/lllamnyp) in #1980, #1984).
This release introduces MongoDB as a new managed application, expanding Cozystack's database offerings alongside existing PostgreSQL, MySQL, and Redis services. The release also includes storage improvements, Kubernetes stability enhancements, and updated documentation.
## Feature Highlights
### MongoDB Managed Application
Cozystack now includes MongoDB as a fully managed database service. Users can deploy production-ready MongoDB instances directly from the application catalog with minimal configuration.
Key capabilities:
- **Replica Set deployment**: Automatic configuration of MongoDB replica sets for high availability
- **Persistent storage**: Integration with Cozystack storage backends for reliable data persistence
- **Resource management**: Configurable CPU, memory, and storage resources
- **Monitoring integration**: Built-in metrics export for platform monitoring
Deploy MongoDB through the Cozystack dashboard or using the standard application deployment workflow ([**@lexfrei**](https://github.com/lexfrei) in #1822, #1881).
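A hedged sketch of what a MongoDB claim could look like through the Cozystack apps API follows. The `apps.cozystack.io/v1alpha1` group matches other Cozystack applications, but the spec fields shown (`replicas`, `size`, `storageClass`, `resourcesPreset`) are assumptions borrowed from sibling apps; the chart's values schema is authoritative.
```yaml
# Hedged sketch of a managed MongoDB instance. Spec fields are
# assumptions modeled on other Cozystack apps, not the verified schema.
apiVersion: apps.cozystack.io/v1alpha1
kind: MongoDB
metadata:
  name: mongodb-example
  namespace: tenant-example
spec:
  replicas: 3               # replica set members for high availability
  size: 10Gi                # persistent volume size
  storageClass: replicated  # any StorageClass available in the cluster
  resourcesPreset: small    # CPU/memory preset
```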
## Improvements
* **[linstor] Update piraeus-server patches with critical fixes**: Backported critical patches to piraeus-server that address storage stability issues and improve DRBD resource handling. These patches fix edge cases in device management and ensure more reliable storage operations ([**@kvaps**](https://github.com/kvaps) in #1850, #1852).
* **[linstor] Refactor node-level RWX validation**: Refactored the node-level ReadWriteMany (RWX) validation logic in LINSTOR CSI. The validation has been moved to the CSI driver level with a custom linstor-csi image build, providing more reliable RWX volume handling and clearer error messages when RWX requirements cannot be satisfied ([**@kvaps**](https://github.com/kvaps) in #1856, #1857).
* **[kubernetes] Increase default apiServer resourcesPreset to large**: Increased the default resource preset for kube-apiserver to `large` to ensure more reliable operation under higher workloads and prevent resource constraints ([**@kvaps**](https://github.com/kvaps) in #1875, #1882).
* **[kubernetes] Increase kube-apiserver startup probe threshold**: Increased the startup probe threshold for kube-apiserver to allow more time for the API server to become ready, especially in scenarios with slow storage or high load ([**@kvaps**](https://github.com/kvaps) in #1876, #1883).
* **[etcd] Increase probe thresholds for better recovery**: Increased etcd probe thresholds to provide more time for recovery operations, improving cluster resilience during network issues or temporary slowdowns ([**@kvaps**](https://github.com/kvaps) in #1874, #1878).
## Fixes
* **[linstor] Remove node-level RWX validation**: Removed the problematic node-level RWX validation that was causing issues with volume provisioning. The validation logic has been refactored and moved to a more appropriate location in the LINSTOR CSI driver ([**@kvaps**](https://github.com/kvaps) in #1851).
* **[apiserver] Fix Watch resourceVersion and bookmark handling**: Fixed issues with Watch API handling of resourceVersion and bookmarks, ensuring proper event streaming and state synchronization for API clients ([**@kvaps**](https://github.com/kvaps) in #1860).
* **[dashboard] Fix view of loadbalancer IP in services window**: Fixed an issue where load balancer IP addresses were not displayed correctly in the services window of the dashboard ([**@IvanHunters**](https://github.com/IvanHunters) in #1884, #1887).
## Dependencies
* **[cilium] Update cilium to v1.18.6**: Updated Cilium CNI to v1.18.6 with security fixes and performance improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #1868, #1870).
* **Update Talos Linux v1.11.6**: Updated Talos Linux to v1.11.6 with latest security patches and improvements ([**@kvaps**](https://github.com/kvaps) in #1879).
## Documentation
* **[website] Add documentation for creating and managing cloned virtual machines**: Added comprehensive guide for VM cloning operations ([**@sircthulhu**](https://github.com/sircthulhu) in [cozystack/website#401](https://github.com/cozystack/website/pull/401)).
* **[website] Simplify NFS driver setup instructions**: Improved NFS driver setup documentation with clearer instructions ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#399](https://github.com/cozystack/website/pull/399)).
* **[website] Update Talos installation docs for Hetzner and Servers.com**: Updated installation documentation with improved instructions for Hetzner and Servers.com environments ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#395](https://github.com/cozystack/website/pull/395)).
* **[website] Add Hetzner RobotLB documentation**: Added documentation for configuring public IP with Hetzner RobotLB ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#394](https://github.com/cozystack/website/pull/394)).
* **[website] Add Hidora organization support details**: Added Hidora to the support page with organization details ([**@matthieu-robin**](https://github.com/matthieu-robin) in [cozystack/website#397](https://github.com/cozystack/website/pull/397), [cozystack/website#398](https://github.com/cozystack/website/pull/398)).
## Improvements
* **[kubernetes] Add enum validation for IngressNginx exposeMethod**: Added enum validation for the `exposeMethod` field in IngressNginx configuration, preventing invalid values and improving user experience with clear valid options ([**@sircthulhu**](https://github.com/sircthulhu) in #1895, #1897).
* **[monitoring-agents] Set minReplicas to 1 for VPA for VMAgent**: Configured VPA (Vertical Pod Autoscaler) to maintain at least 1 replica for VMAgent, ensuring monitoring availability during scaling operations ([**@sircthulhu**](https://github.com/sircthulhu) in #1894, #1905).
* **[mongodb] Remove user-configurable images from MongoDB chart**: Removed user-configurable image options from the MongoDB chart to simplify configuration and ensure consistency with tested image versions ([**@kvaps**](https://github.com/kvaps) in #1901, #1904).
* **[kubernetes] Show Service and Ingress resources for Kubernetes app in dashboard**: Added visibility of Service and Ingress resources for Kubernetes applications in the dashboard, improving resource management and monitoring capabilities ([**@sircthulhu**](https://github.com/sircthulhu) in #1912, #1915).
## Fixes
* **[dashboard] Fix filtering on Pods tab for Service**: Fixed an issue where pod filtering was not working correctly on the Pods tab when viewing Services in the dashboard ([**@sircthulhu**](https://github.com/sircthulhu) in #1909, #1914).
* **Update cozyhr to v1.6.1**: Updated cozyhr to v1.6.1, which fixes a critical bug causing helm-controller v0.37.0+ to unexpectedly uninstall HelmReleases after cozyhr apply by correcting history snapshot fields for helm-controller compatibility ([**@kvaps**](https://github.com/kvaps) in cozystack/cozyhr#10).
* **[dashboard] Add "Edit" button to all resources**: Added an "Edit" button across all resource views in the dashboard, allowing users to modify resource configurations directly from the UI ([**@sircthulhu**](https://github.com/sircthulhu) in #1928, #1931).
* **[dashboard] Add resource quota usage to tenant details page**: Added resource quota usage display to the tenant details page, giving administrators visibility into how much of allocated resources each tenant is consuming ([**@sircthulhu**](https://github.com/sircthulhu) in #1929, #1932).
* **[branding] Separate values for keycloak**: Separated Keycloak branding values into dedicated configuration, allowing more granular customization of Keycloak appearance without affecting other branding settings ([**@nbykov0**](https://github.com/nbykov0) in #1946).
* **Add instance profile label to workload monitor**: Added instance profile metadata labels to the workload monitor, enabling better resource tracking and monitoring by instance profile type ([**@matthieu-robin**](https://github.com/matthieu-robin) in #1954, #1957).
## Fixes
* **[kubernetes] Fix manifests for kubernetes deployment**: Fixed incorrect manifests that prevented proper Kubernetes deployment, restoring correct application behavior ([**@IvanHunters**](https://github.com/IvanHunters) in #1943, #1945).
* **[vm] Allow changing field external after creation**: Users can now modify the external network field on virtual machines after initial creation, providing more flexibility in VM networking configuration without requiring recreation ([**@sircthulhu**](https://github.com/sircthulhu) in #1956, #1962).
* **[branding] Separate values for keycloak**: Separated Keycloak branding values into dedicated configuration for more granular customization of Keycloak appearance ([**@nbykov0**](https://github.com/nbykov0) in #1947, #1963).
## Fixes
* **[kubernetes] Fix coredns serviceaccount to match kubernetes bootstrap RBAC**: Configured the CoreDNS chart to create a `kube-dns` ServiceAccount matching the Kubernetes bootstrap ClusterRoleBinding, fixing RBAC errors (`Failed to watch`) when CoreDNS pods restart ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1958, #1978).
* **[dashboard] Verify JWT token**: Added JWT token verification to the dashboard, ensuring that authentication tokens are properly validated before granting access. This prevents unauthorized access through forged or expired tokens ([**@lllamnyp**](https://github.com/lllamnyp) in #1980, #1983).
## Fixes
* **[postgres-operator] Correct PromQL syntax in CNPGClusterOffline alert**: Fixed incorrect PromQL syntax in the `CNPGClusterOffline` alert rule for CloudNativePG, ensuring the alert fires correctly when all instances of a PostgreSQL cluster are offline ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1981, #1989).
* **[kubernetes] Auto-enable Gateway API support in cert-manager**: cert-manager now automatically enables `enableGatewayAPI` when the Gateway API addon is active in the Kubernetes application. Users no longer need to manually configure this setting, and the option can still be overridden via `valuesOverride` ([**@kvaps**](https://github.com/kvaps) in #1997, #2012).
* **[vm] Allow switching between instancetype and custom resources**: Users can now switch virtual machines between instancetype-based and custom resource configurations after creation. The upgrade hook atomically patches VM resources, providing more flexibility in adjusting VM sizing without recreation ([**@sircthulhu**](https://github.com/sircthulhu) in #2008, #2013).
## Fixes
* **[dashboard] Add startupProbe to prevent container restarts on slow hardware**: Added `startupProbe` to both `bff` and `web` containers in the dashboard deployment. On slow hardware, kubelet was killing containers because the `livenessProbe` only allowed ~33 seconds for startup. The `startupProbe` gives containers up to 60 seconds to start before `livenessProbe` kicks in (a probe sketch follows this list) ([**@kvaps**](https://github.com/kvaps) in #1996, #2014).
* **[cozystack-basics] Deny resourcequotas deletion for tenant admin**: Prevented tenant administrators from deleting resource quotas, ensuring that resource limits set by platform administrators cannot be bypassed by tenant-level users ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2076).
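For the dashboard probe entry above, the mechanics look roughly like this. The endpoint, port, and exact thresholds are assumptions; only the ~60-second startup budget is taken from the entry.
```yaml
# Hedged sketch: a startupProbe that holds off the livenessProbe for up
# to 60 seconds (12 attempts x 5s). Path, port, and thresholds are
# assumptions, not the dashboard chart's actual values.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 12   # 12 * 5s = 60s startup budget
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # takes over only after startup succeeds
```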
## Dependencies
* **Update Kube-OVN to v1.15.3**: Updated Kube-OVN CNI to v1.15.3 with latest bug fixes and improvements ([**@kvaps**](https://github.com/kvaps)).
This alpha release introduces a fundamental architectural shift from HelmRelease bundles to Package-based deployment managed by the new cozystack-operator. It includes a comprehensive backup system with Velero integration, significant API changes that rename the core CRD, Flux sharding for improved tenant workload distribution, enhanced monitoring capabilities, and various improvements to virtual machines, tenants, and the build workflow.
> **⚠️ Alpha Release Warning**: This is a pre-release version intended for testing and early adoption. Breaking changes may occur before the stable v1.0.0 release.
## Breaking Changes
### API Rename: CozystackResourceDefinition → ApplicationDefinition
The `CozystackResourceDefinition` CRD has been renamed to `ApplicationDefinition` for better clarity and consistency. This change affects:
- All Go types and controller files
- CRD Helm chart renamed from `cozystack-resource-definition-crd` to `application-definition-crd`
- All cozyrds YAML manifests updated to use `kind: ApplicationDefinition`
A migration (v24) is included to handle the transition automatically.
### Package-Based Deployment
The platform now uses Package resources managed by cozystack-operator instead of HelmRelease bundles. Key changes:
- Restructured values.yaml with full configuration support (networking, publishing, authentication, scheduling, branding, resources)
- Added values-isp-full.yaml and values-isp-hosted.yaml for bundle variants
- Package resources replace old HelmRelease templates (see the sketch after this list)
- PackageSources moved from sources/ to templates/sources/
- Migration script `hack/migrate-to-version-1.0.sh` provided for converting ConfigMaps to Package resources
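A heavily hedged sketch of the new resource pair follows. Beyond the `Package` and `PackageSource` kinds named in this changelog, every field here (API group, `url`, `sourceRef`) is an assumption for illustration, not the operator's actual schema.
```yaml
# Hedged sketch only: field names are assumptions; check the
# cozystack-operator CRDs for the real schema.
apiVersion: cozystack.io/v1alpha1
kind: PackageSource
metadata:
  name: cozystack-system
spec:
  url: oci://ghcr.io/cozystack/cozystack   # illustrative source location
---
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: cilium
  namespace: cozy-cilium
spec:
  sourceRef:
    kind: PackageSource
    name: cozystack-system   # which source provides the chart
```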
---
## Major Features and Improvements
### Cozystack Operator
A new operator has been introduced to manage Package and PackageSource resources, providing declarative package management for the platform:
* **[cozystack-operator] Introduce API objects: packages and packagesources**: Added new CRDs for declarative package management, defining the API for Package and PackageSource resources ([**@kvaps**](https://github.com/kvaps) in #1740).
* **[cozystack-operator] Introduce Cozystack-operator core logic**: Implemented core reconciliation logic for the operator, handling Package and PackageSource lifecycle management ([**@kvaps**](https://github.com/kvaps) in #1741).
* **[cozystack-operator] Add Package and PackageSource reconcilers**: Added controllers for Package and PackageSource resources with full reconciliation support ([**@kvaps**](https://github.com/kvaps) in #1755).
* **[cozystack-operator] Add deployment files**: Added Kubernetes deployment manifests for running cozystack-operator in the cluster ([**@kvaps**](https://github.com/kvaps) in #1761).
* **[platform] Add PackageSources for cozystack-operator**: Added PackageSource definitions for cozystack-operator integration ([**@kvaps**](https://github.com/kvaps) in #1760).
* **[cozypkg] Add tool for managing Package and PackageSources**: Added CLI tool for managing Package and PackageSource resources ([**@kvaps**](https://github.com/kvaps) in #1756).
### Backup System
Comprehensive backup functionality has been added with Velero integration for managing application backups:
* **[backups] Implement core backup Plan controller**: Core controller for managing backup schedules and plans, providing the foundation for backup orchestration ([**@lllamnyp**](https://github.com/lllamnyp) in #1640).
* **[backups] Build and deploy backup controller**: Deployment infrastructure for the backup controller, including container image builds and Kubernetes manifests ([**@lllamnyp**](https://github.com/lllamnyp) in #1685).
* **[backups] Scaffold a backup strategy API group**: Added API group for backup strategies, enabling pluggable backup implementations ([**@lllamnyp**](https://github.com/lllamnyp) in #1687).
* **[backups] Add indices to core backup resources**: Added indices to backup resources for improved query performance ([**@lllamnyp**](https://github.com/lllamnyp) in #1719).
* **[backups] Stub the Job backup strategy controller**: Added stub implementation for Job-based backup strategy ([**@lllamnyp**](https://github.com/lllamnyp) in #1720).
* **[backups] Implement Velero strategy controller**: Integration with Velero for backup operations, enabling enterprise-grade backup capabilities ([**@androndo**](https://github.com/androndo) in #1762).
* **[backups,dashboard] User-facing UI**: Dashboard interface for managing backups and backup jobs, providing visibility into backup status and history ([**@lllamnyp**](https://github.com/lllamnyp) in #1737).
### Platform Architecture
* **[platform] Migrate from HelmRelease bundles to Package-based deployment**: Replaced HelmRelease bundle system with Package resources managed by cozystack-operator. Includes restructured values.yaml with full configuration support and migration tooling ([**@kvaps**](https://github.com/kvaps) in #1816).
* **refactor(api): rename CozystackResourceDefinition to ApplicationDefinition**: Renamed CRD and all related types for better clarity and consistency. Updated all Go types, controllers, and 25+ YAML manifests ([**@kvaps**](https://github.com/kvaps) in #1864).
* **feat(flux): implement flux sharding for tenant HelmReleases**: Added Flux sharding support to distribute tenant HelmRelease reconciliation across multiple controllers, improving scalability in multi-tenant environments (see the sketch after this list) ([**@kvaps**](https://github.com/kvaps) in #1816).
* **refactor(installer): migrate installer to cozystack-operator**: Moved installer functionality to cozystack-operator for unified management ([**@kvaps**](https://github.com/kvaps) in #1816).
* **feat(api): add chartRef to ApplicationDefinition**: Added chartRef field to support ExternalArtifact references for flexible chart sourcing ([**@kvaps**](https://github.com/kvaps) in #1816).
* **feat(api): show only hash in version column for applications and modules**: Simplified version display in API responses for cleaner output ([**@kvaps**](https://github.com/kvaps) in #1816).
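For the Flux sharding entry above, the upstream convention looks roughly like this. The shard key value and chart names are illustrative, and how Cozystack assigns tenants to shards is not shown here.
```yaml
# Minimal sketch of Flux sharding: a HelmRelease pinned to one shard via
# the sharding.fluxcd.io/key label; a matching helm-controller replica
# runs with --watch-label-selector=sharding.fluxcd.io/key=shard-1.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tenant-app
  namespace: tenant-example
  labels:
    sharding.fluxcd.io/key: shard-1   # shard assignment (illustrative)
spec:
  chart:
    spec:
      chart: tenant-app
      sourceRef:
        kind: HelmRepository
        name: cozystack-apps
```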
### Virtual Machines
* **[vm] Always expose VMs with a service**: Virtual machines are now always exposed with at least a ClusterIP service, ensuring they have in-cluster DNS names and can be accessed from other pods even without public IP addresses ([**@lllamnyp**](https://github.com/lllamnyp) in #1738, #1751).
### Monitoring
* **[monitoring] Add SLACK_SEVERITY_FILTER field and VMAgent for tenant monitoring**: Introduced the SLACK_SEVERITY_FILTER environment variable in the Alerta deployment to enable filtering of alert severities for Slack notifications based on the disabledSeverity configuration. Additionally, added a VMAgent resource template for scraping metrics within tenant namespaces, improving monitoring granularity and control ([**@IvanHunters**](https://github.com/IvanHunters) in #1712).
### Tenants
* **[tenant] Allow egress to parent ingress pods**: Updated tenant network policies to allow egress traffic to parent cluster ingress pods, enabling proper communication patterns between tenant namespaces and parent cluster ingress controllers ([**@lexfrei**](https://github.com/lexfrei) in #1765, #1776).
* **[tenant] Run cleanup job from system namespace**: Moved tenant cleanup job to run from system namespace, improving security and resource isolation for tenant cleanup operations ([**@lllamnyp**](https://github.com/lllamnyp) in #1774, #1777).
### System
* **[system] Add resource requests and limits to etcd-defrag**: Added resource requests and limits to etcd-defrag job to ensure proper resource allocation and prevent resource contention during etcd maintenance operations ([**@matthieu-robin**](https://github.com/matthieu-robin) in #1785, #1786).
### Development and Build
* **feat(cozypkg): add cross-platform build targets with version injection**: Added cross-platform build targets (linux/amd64, linux/arm64, darwin/amd64, darwin/arm64) for cozypkg/cozyhr tool with automatic version injection from git tags ([**@kvaps**](https://github.com/kvaps) in #1862).
* **refactor: move scripts to hack directory**: Reorganized scripts to standard hack/ location following Kubernetes project conventions ([**@kvaps**](https://github.com/kvaps) in #1863).
## Fixes
* **fix(talos): skip rebuilding assets if files already exist**: Improved Talos package build process to avoid redundant asset rebuilds when files are already present, reducing build time ([**@kvaps**](https://github.com/kvaps)).
* **[kubevirt-operator] Fix typo in VMNotRunningFor10Minutes alert**: Fixed typo in VM alert name, ensuring proper alert triggering and monitoring for virtual machines that are not running for extended periods ([**@lexfrei**](https://github.com/lexfrei) in #1770, #1775).
* **[backups] Fix malformed glob and split in template**: Fixed malformed glob pattern and split operation in backup template processing ([**@lllamnyp**](https://github.com/lllamnyp) in #1708).
## Documentation
* **[website] docs(storage): simplify NFS driver setup instructions**: Simplified NFS driver setup documentation with clearer instructions ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#399](https://github.com/cozystack/website/pull/399)).
* **[website] Add Hidora organization support details**: Added Hidora to the support page with organization details ([**@matthieu-robin**](https://github.com/matthieu-robin) in [cozystack/website#397](https://github.com/cozystack/website/pull/397)).
* **[website] Update LinkedIn link for Hidora organization**: Updated LinkedIn link for Hidora organization on the support page ([**@matthieu-robin**](https://github.com/matthieu-robin) in [cozystack/website#398](https://github.com/cozystack/website/pull/398)).
---
## Migration Guide
### From v0.38.x / v0.39.x to v1.0.0-alpha.1
1. **Backup your cluster** before upgrading
2. Run the migration script: `hack/migrate-to-version-1.0.sh`
3. The migration will:
- Convert ConfigMaps to Package resources
- Rename CozystackResourceDefinition to ApplicationDefinition
- Update HelmRelease references to use Package-based deployment
### Known Issues
- This is an alpha release; some features may be incomplete or change before stable release
- Migration script should be tested in a non-production environment first
---
## Contributors
We'd like to thank all contributors who made this release possible:
> **⚠️ Alpha Release Warning**: This is a pre-release version intended for testing and early adoption. Breaking changes may occur before the stable v1.0.0 release.
## Major Features and Improvements
### New Applications
* **[apps] Add MongoDB managed application**: Added MongoDB as a new managed application, providing a fully managed MongoDB database with automatic scaling, backups, and high availability support ([**@lexfrei**](https://github.com/lexfrei) in #1822).
### Networking
* **[kilo] Introduce kilo**: Added Kilo WireGuard mesh networking support. Kilo provides a secure WireGuard-based VPN mesh for connecting Kubernetes nodes across different networks and regions ([**@kvaps**](https://github.com/kvaps) in #1691).
* **[local-ccm] Add local-ccm package**: Added local cloud controller manager package for managing load balancer services in local/bare-metal environments without a cloud provider ([**@kvaps**](https://github.com/kvaps) in #1831).
### Platform
* **[platform] Add flux-plunger controller**: Added flux-plunger controller to automatically fix stuck HelmRelease errors by cleaning up failed resources and retrying reconciliation ([**@kvaps**](https://github.com/kvaps) in #1843).
* **[platform] Split telemetry between operator and controller**: Separated telemetry collection between cozystack-operator and cozystack-controller for better metrics isolation and monitoring capabilities ([**@kvaps**](https://github.com/kvaps) in #1869).
* **[platform] Remove cozystack.io/ui label**: Cleaned up deprecated `cozystack.io/ui` labels from platform components ([**@kvaps**](https://github.com/kvaps) in #1872).
## Improvements
* **[kubernetes] Increase default apiServer resourcesPreset to large**: Increased the default resource preset for kube-apiserver to `large` to ensure more reliable operation under higher workloads ([**@kvaps**](https://github.com/kvaps) in #1875).
* **[kubernetes] Increase kube-apiserver startup probe threshold**: Increased the startup probe threshold for kube-apiserver to allow more time for the API server to become ready ([**@kvaps**](https://github.com/kvaps) in #1876).
* **[etcd] Increase probe thresholds for better recovery**: Increased etcd probe thresholds to provide more time for recovery operations, improving cluster resilience ([**@kvaps**](https://github.com/kvaps) in #1874).
## Fixes
* **[apiserver] Fix Watch resourceVersion and bookmark handling**: Fixed issues with Watch API handling of resourceVersion and bookmarks, ensuring proper event streaming and state synchronization ([**@kvaps**](https://github.com/kvaps) in #1860).
* **[dashboard] Fix view of loadbalancer IP in services window**: Fixed an issue where load balancer IP addresses were not displayed correctly in the services window of the dashboard ([**@IvanHunters**](https://github.com/IvanHunters) in #1884).
## Dependencies
* **[cilium] Update cilium to v1.18.6**: Updated Cilium CNI to v1.18.6 with security fixes and performance improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #1868).
* **Update Talos Linux to v1.12.1**: Updated Talos Linux to v1.12.1 with the latest features, security patches, and improvements ([**@kvaps**](https://github.com/kvaps) in #1877).
## Documentation
* **[website] Add documentation for creating and managing cloned virtual machines**: Added comprehensive guide for VM cloning operations ([**@sircthulhu**](https://github.com/sircthulhu) in [cozystack/website#401](https://github.com/cozystack/website/pull/401)).
* **[website] Simplify NFS driver setup instructions**: Improved NFS driver setup documentation with clearer instructions ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#399](https://github.com/cozystack/website/pull/399)).
* **[website] Update Talos installation docs for Hetzner and Servers.com**: Updated installation documentation with improved instructions for Hetzner and Servers.com environments ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#395](https://github.com/cozystack/website/pull/395)).
* **[website] Add Hetzner RobotLB documentation**: Added documentation for configuring public IP with Hetzner RobotLB ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#394](https://github.com/cozystack/website/pull/394)).
* **[website] Add Hidora organization support details**: Added Hidora to the support page with organization details ([**@matthieu-robin**](https://github.com/matthieu-robin) in [cozystack/website#397](https://github.com/cozystack/website/pull/397), [cozystack/website#398](https://github.com/cozystack/website/pull/398)).
> **⚠️ Beta Release Warning**: This is a pre-release version intended for testing and early adoption. Breaking changes may occur before the stable v1.0.0 release.
## Major Features and Improvements
### New Applications
* **[qdrant] Add Qdrant vector database application**: Added Qdrant as a new managed application, providing a high-performance vector database for AI and machine learning workloads. Supports single replica or clustered mode, persistent storage, resource presets, API key authentication, and optional external LoadBalancer access ([**@lexfrei**](https://github.com/lexfrei) in #1987).
### System Components
* **[system] Add cluster-autoscaler package for Hetzner and Azure**: Added cluster-autoscaler system package with support for multiple cloud providers (Hetzner and Azure) to automatically scale management cluster nodes. Includes comprehensive documentation for Hetzner setup with Talos Linux, vSwitch configuration, and Kilo mesh networking integration ([**@kvaps**](https://github.com/kvaps) in #1964).
* **[system] Add clustersecret-operator package**: Added clustersecret-operator system package for managing secrets across multiple namespaces in Kubernetes clusters ([**@sircthulhu**](https://github.com/sircthulhu) in #2025).
### Networking
* **[kilo] Update to v0.7.0 and add configurable MTU**: Updated Kilo WireGuard mesh networking to v0.7.0 from cozystack fork with pre-built images. Added configurable MTU parameter (default: auto) for WireGuard interface, allowing automatic MTU detection or manual override ([**@kvaps**](https://github.com/kvaps) in #2003).
* **[local-ccm] Add node-lifecycle-controller component**: Added optional node-lifecycle-controller to local-ccm package that automatically deletes unreachable NotReady nodes from the cluster. Solves the "zombie" node problem when cluster autoscaler deletes cloud instances but node objects remain in Kubernetes. Supports configurable node selectors, protected labels, and HA deployment with leader election ([**@IvanHunters**](https://github.com/IvanHunters) in #1992).
### Virtual Machines
* **[vm] Add cpuModel field to specify CPU model without instanceType**: Added cpuModel field to VirtualMachine API, allowing users to specify CPU model directly without using instanceType, providing more granular control over VM CPU configuration ([**@sircthulhu**](https://github.com/sircthulhu) in #2007).
* **[vm] Allow switching between instancetype and custom resources**: Implemented atomic upgrade hook that allows switching between instanceType-based and custom resource-based VM configuration, providing more flexibility in VM resource management ([**@sircthulhu**](https://github.com/sircthulhu) in #2008).
* **[vm] Migrate to runStrategy instead of running**: Migrated VirtualMachine API from deprecated `running` field to `runStrategy` field, following KubeVirt upstream best practices ([**@sircthulhu**](https://github.com/sircthulhu) in #2004).
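For illustration, a minimal before/after sketch of this change on a KubeVirt `VirtualMachine` manifest; the resource name is illustrative, and `Always` is one of several strategies KubeVirt supports:

```yaml
# Before: the deprecated boolean field
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm
spec:
  running: true
---
# After: runStrategy expresses the desired lifecycle policy explicitly
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm
spec:
  runStrategy: Always   # other KubeVirt values include Manual, Halted, RerunOnFailure
```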
### Backups
* **[backups] Add comprehensive backup and restore functionality**: Major update to backup system including BackupClass for Velero, virtual machine backup strategies, RestoreJob resource with end-to-end restore workflows, Velero integration with polling and status tracking, and enhanced backup plans UI with simplified Plan/BackupJob API ([**@androndo**](https://github.com/androndo) in #1967, [**@lllamnyp**](https://github.com/lllamnyp) in #1968).
* **[backups] Add kubevirt plugin to velero**: Added KubeVirt plugin to Velero for proper virtual machine backup support, enabling consistent snapshots of VM state and data ([**@lllamnyp**](https://github.com/lllamnyp) in #2017).
* **[backups] Install backupstrategy controller by default**: Enabled backupstrategy controller by default to provide automatic backup scheduling and management for managed applications ([**@lllamnyp**](https://github.com/lllamnyp) in #2020).
* **[backups] Better selectors for VM strategy**: Improved VM backup strategy selectors for more accurate and reliable backup targeting ([**@lllamnyp**](https://github.com/lllamnyp) in #2023).
### Platform
* **[kubernetes] Auto-enable Gateway API support in cert-manager**: Added automatic Gateway API support in cert-manager for tenant Kubernetes clusters, enabling automatic certificate management for Gateway API resources ([**@kvaps**](https://github.com/kvaps) in #1997).
* **[tenant,rbac] Use shared clusterroles**: Refactored tenant RBAC to use shared ClusterRoles, improving maintainability and consistency across tenant namespaces ([**@lllamnyp**](https://github.com/lllamnyp) in #1999).
* **[mongodb] Unify users and databases configuration**: Simplified MongoDB user and database configuration with a more unified API structure ([**@kvaps**](https://github.com/kvaps) in #1923).
## Improvements
* **[keycloak-configure,dashboard] Enable insecure TLS verification by default**: Made TLS certificate verification configurable, with verification skipped (insecure mode) by default for easier local development and testing ([**@IvanHunters**](https://github.com/IvanHunters) in #2005).
* **[dashboard] Add startupProbe to prevent container restarts on slow hardware**: Added startup probe to dashboard pods to prevent unnecessary container restarts on slow hardware or during high load ([**@kvaps**](https://github.com/kvaps) in #1996).
* **[cilium] Change cilium-operator replicas to 1**: Reduced Cilium operator replicas from 2 to 1 to decrease resource consumption in smaller deployments ([**@IvanHunters**](https://github.com/IvanHunters) in #1784).
* **[monitoring] Enable monitoring for core components**: Enhanced monitoring capabilities with better dashboards and metrics collection for core Cozystack components ([**@IvanHunters**](https://github.com/IvanHunters) in #1937).
* **[branding] Separate values for Keycloak**: Separated Keycloak branding values for better customization capabilities ([**@nbykov0**](https://github.com/nbykov0) in #1947).
* **[kubernetes] Use ingress-nginx nodeport service**: Changed Kubernetes managed clusters to use ingress-nginx NodePort service for improved compatibility and flexibility ([**@sircthulhu**](https://github.com/sircthulhu) in #1948).
## Fixes
* **[linstor] Extract piraeus-operator CRDs into separate package**: Fixed issue where Helm did not reliably install all CRDs from large crds.yaml files by creating dedicated piraeus-operator-crds package. This ensures the linstorsatellites.io CRD is properly installed, preventing satellite pod creation failures ([**@IvanHunters**](https://github.com/IvanHunters) in #1991).
* **[platform] Fix cozystack-values secret race condition**: Fixed race condition in cozystack-values secret creation that could cause platform initialization failures ([**@lllamnyp**](https://github.com/lllamnyp) in #2024).
* **[seaweedfs] Increase certificate duration to 10 years**: Increased SeaweedFS certificate validity from 1 year to 10 years to reduce certificate rotation overhead and prevent unexpected certificate expiration issues ([**@IvanHunters**](https://github.com/IvanHunters) in #1986).
* **[monitoring] Remove cozystack-controller dependency**: Fixed monitoring package to remove unnecessary dependency on cozystack-controller, allowing monitoring to be installed independently ([**@IvanHunters**](https://github.com/IvanHunters) in #1990).
* **[monitoring] Remove duplicate dashboards.list from extra/monitoring**: Fixed duplicate dashboards.list configuration in extra/monitoring package ([**@IvanHunters**](https://github.com/IvanHunters) in #2016).
* **[mongodb] Fix pre-commit check**: Fixed pre-commit linting issues in MongoDB package ([**@kvaps**](https://github.com/kvaps) in #1753).
* **[mongodb] Update MongoDB logo**: Updated MongoDB application logo in the dashboard to use the correct branding ([**@kvaps**](https://github.com/kvaps) in #2027).
* **[bootbox] Auto-create bootbox-application as dependency**: Fixed bootbox package to automatically create required bootbox-application dependency ([**@kvaps**](https://github.com/kvaps) in #1974).
* **[migrations] Add migration 25 for v1.0 upgrade cleanup**: Added migration script to handle cleanup during v1.0 upgrade path ([**@kvaps**](https://github.com/kvaps) in #1975).
* **[build] Fix platform migrations image build**: Fixed Docker image build process for platform migrations ([**@kvaps**](https://github.com/kvaps) in #1976).
* **[postgres-operator] Correct PromQL syntax in CNPGClusterOffline alert**: Fixed incorrect PromQL syntax in CNPGClusterOffline Prometheus alert for PostgreSQL clusters ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1981).
* **[coredns] Fix serviceaccount to match kubernetes bootstrap RBAC**: Fixed CoreDNS service account configuration to correctly match Kubernetes bootstrap RBAC requirements ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1958).
* **[dashboard] Verify JWT token**: Added JWT token verification to dashboard for improved security ([**@lllamnyp**](https://github.com/lllamnyp) in #1980).
* **[talm] Skip config loading for completion subcommands**: Fixed talm CLI to skip unnecessary config loading for shell completion commands ([**@kitsunoff**](https://github.com/kitsunoff) in [cozystack/talm#109](https://github.com/cozystack/talm/pull/109)).
## Dependencies
* **[kube-ovn] Update Kube-OVN to v1.15.3**: Updated Kube-OVN CNI to v1.15.3 with performance improvements and bug fixes ([**@kvaps**](https://github.com/kvaps) in #2022).
* **[local-ccm] Update to v0.3.0**: Updated local cloud controller manager to v0.3.0 with node-lifecycle-controller support ([**@kvaps**](https://github.com/kvaps) in #1992).
* **[kilo] Update to v0.7.0**: Updated Kilo to v0.7.0 from cozystack fork with improved MTU handling ([**@kvaps**](https://github.com/kvaps) in #2003).
## Development, Testing, and CI/CD
* **[ci] Use GitHub Copilot CLI for changelog generation**: Automated changelog generation using GitHub Copilot CLI to improve release process efficiency ([**@androndo**](https://github.com/androndo) in #1753).
* **[ci] Choose runner conditional on label**: Added conditional runner selection in CI based on PR labels for more flexible CI/CD workflows ([**@lllamnyp**](https://github.com/lllamnyp) in #1998).
* **[backups] Add restore jobs controller**: Added controller for managing backup restore jobs ([**@androndo**](https://github.com/androndo) in #1811).
* **Update CODEOWNERS**: Updated CODEOWNERS file to include new maintainers ([**@lllamnyp**](https://github.com/lllamnyp) in #1972, [**@IvanHunters**](https://github.com/IvanHunters) in #2015).
## Documentation
* **[website] Add LINSTOR disk preparation guide**: Added comprehensive documentation for preparing disks for LINSTOR storage system ([**@IvanHunters**](https://github.com/IvanHunters) in [cozystack/website#411](https://github.com/cozystack/website/pull/411)).
* **[website] Add Proxmox VM migration guide**: Added detailed guide for migrating virtual machines from Proxmox to Cozystack ([**@IvanHunters**](https://github.com/IvanHunters) in [cozystack/website#410](https://github.com/cozystack/website/pull/410)).
* **[website] Describe operator-based and HelmRelease-based package patterns**: Added development documentation explaining operator-based and HelmRelease-based package patterns for Cozystack ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#413](https://github.com/cozystack/website/pull/413)).
* **[website] Correct typo in kubeconfig reference in Kubernetes installation guide**: Fixed documentation typo in kubeconfig reference ([**@shkarface**](https://github.com/shkarface) in [cozystack/website#414](https://github.com/cozystack/website/pull/414)).
* **[website] Check quotas before an upgrade**: Added troubleshooting documentation for checking resource quotas before performing upgrades ([**@nbykov0**](https://github.com/nbykov0) in [cozystack/website#405](https://github.com/cozystack/website/pull/405)).
---
## Contributors
We'd like to thank all contributors who made this release possible:
> **⚠️ Beta Release Warning**: This is a pre-release version intended for testing and early adoption. Breaking changes may occur before the stable v1.0.0 release.
## Major Features and Improvements
### Virtual Machines
* **[vm-instance] Complete migration from virtual-machine to vm-disk and vm-instance**: Completed the architectural redesign of virtual machine management by fully migrating from the legacy `virtual-machine` application to the new `vm-disk` and `vm-instance` applications. This includes an automatic migration script (migration 28) that converts existing virtual machines, handles CDI webhook configurations, and updates cloud-init references. The new architecture provides better separation of concerns between disk management and VM lifecycle, enabling more flexible VM configuration and improved resource management ([**@kvaps**](https://github.com/kvaps) in #2040).
* **[vm-instance] Port advanced VM features**: Ported critical VM features from the legacy virtual-machine application including cpuModel field for direct CPU model specification, support for switching between instanceType and custom resource configurations, and migration from deprecated `running` field to `runStrategy` field following KubeVirt best practices ([**@kvaps**](https://github.com/kvaps) in #2040).
### Storage and CSI
* **[kubevirt-csi-driver] Add RWX Filesystem (NFS) support**: Added Read-Write-Many (RWX) filesystem support to kubevirt-csi-driver, enabling multiple pods to mount the same persistent volume simultaneously via NFS. This provides native NFS support for shared storage use cases without requiring external NFS provisioners, with automatic NFS server deployment per PVC and seamless integration with KubeVirt's storage layer ([**@kvaps**](https://github.com/kvaps) in #2042).
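As a sketch of the user-facing side, a PVC requesting shared storage might look like the following; the storage class name here is a placeholder assumption, not the driver's actual class name:

```yaml
# Hypothetical PVC using the new RWX support; `storageClassName` is a
# placeholder -- check `kubectl get storageclass` for the RWX-capable
# class in your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany          # served by an NFS server deployed per PVC
  storageClassName: example-rwx
  resources:
    requests:
      storage: 10Gi
```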
### Platform and Infrastructure
* **[cozystack-api] Switch from DaemonSet to Deployment**: Migrated cozystack-api from DaemonSet to Deployment with PreferClose topology spread constraints, improving resource efficiency while maintaining high availability. The Deployment approach reduces resource consumption compared to running API pods on every node, while topology spreading ensures resilient pod placement across the cluster ([**@kvaps**](https://github.com/kvaps) in #2041, #2048).
* **[linstor] Move CRDs installation to dedicated chart**: Refactored LINSTOR CRDs installation by moving them to a dedicated `piraeus-operator-crds` chart, solving Helm's limitation with large CRD files that could cause unreliable installations. This ensures all LINSTOR CRDs (including linstorsatellites.io) are properly installed before the operator starts, preventing satellite pod creation failures. Includes automatic migration script to reassign existing CRDs to the new chart ([**@kvaps**](https://github.com/kvaps) in #2036).
* **[installer] Unify operator templates**: Merged separate operator templates into a single variant-based template, simplifying the installation process and reducing configuration duplication. The new template supports different deployment variants (Talos, non-Talos) through a unified configuration approach ([**@kvaps**](https://github.com/kvaps) in #2034).
### Applications
* **[mariadb] Rename mysql application to mariadb**: Renamed the MySQL application to MariaDB to accurately reflect the underlying database engine being used. Includes automatic migration script (migration 27) that handles resource renaming and ensures seamless upgrade path for existing MySQL deployments. All resources, including databases, users, backups, and configurations, are automatically migrated to use the mariadb naming ([**@kvaps**](https://github.com/kvaps) in #2026).
* **[ferretdb] Remove FerretDB application**: Removed the FerretDB application from the catalog as it has been superseded by native MongoDB support with improved performance and features ([**@kvaps**](https://github.com/kvaps) in #2028).
## Improvements
* **[rbac] Use hierarchical naming scheme**: Refactored RBAC configuration to use hierarchical naming scheme for cluster roles and role bindings, improving organization and maintainability of permission structures across the platform ([**@lllamnyp**](https://github.com/lllamnyp) in #2019).
* **[backups] Create RBAC for backup resources**: Added comprehensive RBAC configuration for backup resources, enabling proper permission management for backup operations and restore jobs across different user roles ([**@lllamnyp**](https://github.com/lllamnyp) in #2018).
* **[etcd-operator] Add vertical-pod-autoscaler dependency**: Added vertical-pod-autoscaler as a dependency to etcd-operator package, ensuring proper resource scaling and optimization for etcd clusters ([**@sircthulhu**](https://github.com/sircthulhu) in #2047).
## Fixes
* **[cozystack-operator] Preserve existing suspend field in package reconciler**: Fixed package reconciler to properly preserve the existing suspend field state during reconciliation, preventing unintended resumption of suspended packages ([**@sircthulhu**](https://github.com/sircthulhu) in #2043).
* **[cozystack-operator] Fix namespace privileged flag resolution**: Fixed operator to correctly resolve namespace privileged flag by checking all Packages in the namespace, not just the first one. This ensures namespaces are properly marked as privileged when any package requires elevated permissions ([**@kvaps**](https://github.com/kvaps) in #2046).
* **[cozystack-operator] Fix namespace reconciliation field ownership**: Fixed Server-Side Apply (SSA) field ownership conflicts by using per-Package field owner for namespace reconciliation, preventing conflicts when multiple packages reconcile the same namespace ([**@kvaps**](https://github.com/kvaps) in #2046).
* **[platform] Clean up Helm secrets for removed releases**: Added cleanup logic to migration 23 to remove orphaned Helm secrets from removed `-rd` releases, preventing secret accumulation and reducing cluster resource usage ([**@kvaps**](https://github.com/kvaps) in #2035).
* **[monitoring] Fix YAML parse error in vmagent template**: Fixed YAML parsing error in monitoring-agents vmagent template that could cause monitoring stack deployment failures ([**@kvaps**](https://github.com/kvaps) in #2037).
* **[talm] Fix metadata.id type casting in physical_links_info**: Fixed Prometheus query in physical_links_info chart to properly cast metadata.id to string for regexMatch operations, preventing query failures with numeric interface IDs ([**@kvaps**](https://github.com/kvaps) in [cozystack/talm#110](https://github.com/cozystack/talm/pull/110)).
## Dependencies
* **[kilo] Update to v0.7.1**: Updated Kilo WireGuard mesh networking to v0.7.1 with bug fixes and improvements ([**@kvaps**](https://github.com/kvaps) in #2049).
## Development, Testing, and CI/CD
* **[ci] Improve cozyreport functionality**: Enhanced cozyreport tool with improved reporting capabilities for CI/CD pipelines, providing better visibility into test results and build status ([**@lllamnyp**](https://github.com/lllamnyp) in #2032).
* **[e2e] Increase HelmRelease readiness timeout for kubernetes test**: Increased HelmRelease readiness timeout in Kubernetes end-to-end tests to prevent false failures on slower hardware or during high load conditions, specifically targeting ingress-nginx component which may take longer to become ready ([**@lexfrei**](https://github.com/lexfrei) in #2033).
## Documentation
* **[website] Add documentation versioning**: Implemented a comprehensive documentation versioning system with separate v0 and v1 documentation trees, a version selector in the UI, proper URL redirects for unversioned docs, and improved navigation for users working with different Cozystack versions ([**@IvanStukov**](https://github.com/IvanStukov) in [cozystack/website#415](https://github.com/cozystack/website/pull/415)).
* **[website] Describe upgrade to v1.0**: Added detailed upgrade instructions for migrating from v0.x to v1.0, including prerequisites, upgrade steps, and troubleshooting guidance ([**@nbykov0**](https://github.com/nbykov0) in [cozystack/website@21bbe84](https://github.com/cozystack/website/commit/21bbe84)).
* **[website] Update support documentation**: Updated support documentation with current contact information and support channels ([**@xrmtech-isk**](https://github.com/xrmtech-isk) in [cozystack/website#420](https://github.com/cozystack/website/pull/420)).
---
## Contributors
We'd like to thank all contributors who made this release possible:
> **⚠️ Beta Release Warning**: This is a pre-release version intended for testing and early adoption. Breaking changes may occur before the stable v1.0.0 release.
## Features and Improvements
* **[installer] Add variant-aware templates for generic Kubernetes support**: Extended the installer chart to support generic and hosted Kubernetes deployments via the existing `cozystackOperator.variant` parameter. When using `variant=generic`, the installer now renders separate templates for the Cozystack operator, skipping Talos-specific components. This enables users to deploy Cozystack on standard Kubernetes distributions and hosted Kubernetes services, expanding platform compatibility beyond Talos Linux (see the values sketch after this list) ([**@lexfrei**](https://github.com/lexfrei) in #2010).
* **[kilo] Add Cilium compatibility variant**: Added a new `cilium` variant to the kilo PackageSource that deploys kilo with the `--compatibility=cilium` flag. This enables Cilium-aware IPIP encapsulation where the outer packet IP matches the inner packet source, allowing Cilium's network policies to function correctly with kilo's WireGuard mesh networking. Users can now run kilo alongside Cilium CNI while maintaining full network policy enforcement capabilities ([**@kvaps**](https://github.com/kvaps) in #2055).
* **[cluster-autoscaler] Enable enforce-node-group-min-size by default**: Enabled the `enforce-node-group-min-size` option for the system cluster-autoscaler chart. This ensures node groups are always scaled up to their configured minimum size, even when current workload demands are lower, preventing unexpected scale-down below minimum thresholds and improving cluster stability for production workloads ([**@kvaps**](https://github.com/kvaps) in #2050).
* **[dashboard] Upgrade dashboard to version 1.4.0**: Updated the Cozystack dashboard to version 1.4.0 with new features and improvements for better user experience and cluster management capabilities ([**@sircthulhu**](https://github.com/sircthulhu) in #2051).
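A minimal values sketch for the generic-variant installer entry above; only the `cozystackOperator.variant` key comes from the release note, and anything you combine it with depends on the target environment:

```yaml
# Installer chart values selecting the generic variant, which skips
# Talos-specific components for standard or hosted Kubernetes.
cozystackOperator:
  variant: generic
```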
## Breaking Changes & Upgrade Notes
* **[vpc] Migrate subnets definition from map to array format**: Migrated VPC subnets definition from map format (`map[string]Subnet`) to array format (`[]Subnet`) with an explicit `name` field. This aligns VPC subnet definitions with the vm-instance `networks` field pattern and provides more intuitive configuration. Existing VPC deployments are automatically migrated via migration 30, which converts the subnet map to an array while preserving all existing subnet configurations and network connectivity ([**@kvaps**](https://github.com/kvaps) in #2052).
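A before/after sketch of the format change; subnet attribute names other than `name` (such as `cidr`) are illustrative assumptions:

```yaml
# Before: subnets as a map keyed by subnet name
subnets:
  internal:
    cidr: 10.100.0.0/24   # field name is illustrative

# After: subnets as an array with an explicit `name` field,
# the form that migration 30 produces automatically
subnets:
  - name: internal
    cidr: 10.100.0.0/24
```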
## Dependencies
* **[kilo] Update to v0.8.0**: Updated Kilo WireGuard mesh networking to v0.8.0 with performance improvements, bug fixes, and new compatibility features ([**@kvaps**](https://github.com/kvaps) in #2053).
* **[talm] Skip config loading for the `__complete` command**: Fixed CLI completion behavior by skipping config loading for the `__complete` command, preventing errors during shell completion when configuration files are not available or misconfigured ([**@kitsunoff**](https://github.com/kitsunoff) in [cozystack/talm#109](https://github.com/cozystack/talm/pull/109)).
## Contributors
We'd like to thank all contributors who made this release possible:
> **⚠️ Beta Release Warning**: This is a pre-release version intended for testing and early adoption. Breaking changes may occur before the stable v1.0.0 release.
## Features and Improvements
* **[platform] Add cilium-kilo networking variant**: Added a new `cilium-kilo` networking variant that combines Cilium CNI with Kilo WireGuard mesh overlay. This variant enables `enable-ipip-termination` in Cilium for proper IPIP packet handling and deploys Kilo with `--compatibility=cilium` flag. Users can now select `cilium-kilo` as their networking variant during platform setup, simplifying the multi-location WireGuard setup compared to manually combining Cilium and standalone Kilo ([**@kvaps**](https://github.com/kvaps) in #2064).
* **[nats] Add monitoring**: Added Grafana dashboards for NATS JetStream and server metrics monitoring, along with Prometheus monitoring support with TLS-aware endpoint configuration. Includes updated image customization options (digest and full image name) and component version upgrades for the NATS exporter and utilities. Users now have full observability into NATS message broker performance and health ([**@klinch0**](https://github.com/klinch0) in #1381).
* **[platform] Add DNS-1035 validation for Application names**: Added dynamic DNS-1035 label validation for Application names in the Cozystack API, using `IsDNS1035Label` from `k8s.io/apimachinery`. Validation is performed at creation time and accounts for the root host length to prevent names that would exceed Kubernetes resource naming limits. This prevents creation of resources with invalid names that would fail downstream Kubernetes resource creation ([**@lexfrei**](https://github.com/lexfrei) in #1771).
* **[operator] Add automatic CRD installation at startup**: Added `--install-crds` flag to the Cozystack operator that installs embedded CRD manifests at startup, ensuring CRDs exist before the operator begins reconciliation. CRD manifests are now embedded in the operator binary and verified for consistency with the Helm `crds/` directory via a new CI Makefile check. This eliminates ordering issues during initial cluster setup where CRDs might not yet be present ([**@lexfrei**](https://github.com/lexfrei) in #2060).
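As a sketch, the new flag might surface in the operator's pod spec roughly as follows; the container name and image are placeholders, not the chart's actual values:

```yaml
# Hypothetical excerpt of the operator Deployment pod spec; only the
# --install-crds flag itself comes from the release note above.
containers:
  - name: cozystack-operator                 # placeholder name
    image: example/cozystack-operator:v1.0.0 # placeholder image
    args:
      - --install-crds                       # apply embedded CRD manifests at startup
```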
## Fixes
* **[platform] Adopt tenant-root into cozystack-basics during migration**: Added migration 31 to adopt existing `tenant-root` Namespace and HelmRelease into the `cozystack-basics` Helm release when upgrading from v0.41.x to v1.0. Previously these resources were applied via `kubectl apply` with no Helm release tracking, causing Helm to treat them as foreign resources and potentially delete them during reconciliation. This migration ensures a safe upgrade path by annotating and labeling these resources for Helm adoption ([**@kvaps**](https://github.com/kvaps) in #2065).
* **[platform] Preserve tenant-root HelmRelease during migration**: Fixed a data-loss risk during migration from v0.41.x to v1.0.0-beta where the `tenant-root` HelmRelease (and the namespace it manages) could be deleted, causing tenant service outages. Added safety annotation to the HelmRelease and lookup logic to preserve current parameters during migration, preventing unwanted deletion of tenant-root resources ([**@sircthulhu**](https://github.com/sircthulhu) in #2063).
* **[codegen] Add gen_client to update-codegen.sh and regenerate applyconfiguration**: Fixed a build error in `pkg/generated/applyconfiguration/utils.go` caused by a reference to `testing.TypeConverter` which was removed in client-go v0.34.1. The root cause was that `hack/update-codegen.sh` never called `gen_client`, leaving the generated applyconfiguration code stale. Running the full code generation now produces a consistent and compilable codebase ([**@lexfrei**](https://github.com/lexfrei) in #2061).
* **[e2e] Make kubernetes test retries effective by cleaning up stale resources**: Fixed E2E test retries for the Kubernetes tenant test by adding pre-creation cleanup of backend deployment/service and NFS pod/PVC in `run-kubernetes.sh`. Previously, retries would fail immediately because stale resources from a failed attempt blocked re-creation. Also increased the tenant deployment wait timeout from 90s to 300s to handle CI resource pressure ([**@lexfrei**](https://github.com/lexfrei) in #2062).
## Development, Testing, and CI/CD
* **[e2e] Use helm install instead of kubectl apply for cozystack installation**: Replaced the pre-rendered static YAML application flow (`kubectl apply`) with direct `helm upgrade --install` of the `packages/core/installer` chart in E2E tests. Removed the CRD/operator artifact upload/download steps from the CI workflow, simplifying the pipeline. The chart with correct values is already present in the sandbox via workspace copy and `pr.patch` ([**@lexfrei**](https://github.com/lexfrei) in #2060).
## Documentation
* **[website] Improve Azure autoscaling troubleshooting guide**: Enhanced the Azure autoscaling troubleshooting documentation with serial console instructions for debugging VMSS worker nodes, a troubleshooting section for nodes stuck in maintenance mode due to invalid or missing machine config, `az vmss update --custom-data` instructions for updating machine config, and a warning that Azure does not support reading back `customData` ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#424](https://github.com/cozystack/website/pull/424)).
* **[website] Update multi-location documentation for cilium-kilo variant**: Updated multi-location networking documentation to reflect the new integrated `cilium-kilo` variant selection during platform setup, replacing the previous manual Kilo installation and Cilium configuration steps. Added explanation of `enable-ipip-termination` and updated the troubleshooting section ([**@kvaps**](https://github.com/kvaps) in [cozystack/website@02d63f0](https://github.com/cozystack/website/commit/02d63f0)).
## Contributors
We'd like to thank all contributors who made this release possible:
> **⚠️ Release Candidate Warning**: This is a release candidate intended for final validation before the stable v1.0.0 release. Breaking changes are not expected at this stage, but please test thoroughly before deploying to production.
## Features and Improvements
* **[harbor] Add managed Harbor container registry**: Added Harbor v2.14.2 as a managed tenant-level container registry service. The application uses CloudNativePG for PostgreSQL, the Redis operator for caching, and S3 via COSI BucketClaim (from SeaweedFS) for registry image storage. Auto-generated admin credentials are persisted across upgrades, TLS is handled by cert-manager, and Trivy vulnerability scanner is included. Users can now deploy a fully managed, production-ready OCI container registry within their tenant ([**@lexfrei**](https://github.com/lexfrei) in #2058).
* **[kubernetes] Update supported Kubernetes versions to v1.30–v1.35**: Updated the tenant Kubernetes version matrix to v1.30, v1.31, v1.32, v1.33, v1.34, and v1.35 (now the default). EOL versions v1.28 and v1.29 are removed. Kamaji is updated to edge-26.2.4 with full Kubernetes 1.35 support, and the CAPI Kamaji provider is updated to v0.16.0. A compatibility patch ensures kubelets older than v1.35 are not broken by Kamaji injecting 1.35-specific kubelet fields ([**@lexfrei**](https://github.com/lexfrei) in #2073).
* **[platform] Make cluster issuer name and ACME solver configurable**: Added `publishing.certificates.solver` (`http01` or `dns01`) and `publishing.certificates.issuerName` (default: `letsencrypt-prod`) parameters to the platform chart. This allows operators to point all ingress TLS annotations at any ClusterIssuer — custom ACME, self-signed, or internal CA — without modifying individual package templates. See the Breaking Changes section for the rename from the previous `issuerType` field ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2077).
* **[dashboard] VMInstance dropdowns for disks and instanceType**: The VM instance creation form now renders API-backed dropdowns for the `instanceType` field (populated from `VirtualMachineClusterInstancetype` cluster resources) and for disk `name` fields (populated from `VMDisk` resources in the same namespace). Default values are read from the ApplicationDefinition's OpenAPI schema. This eliminates manual lookups and reduces misconfiguration when attaching disks or selecting VM instance types ([**@sircthulhu**](https://github.com/sircthulhu) in #2071).
* **[installer] Remove CRDs from Helm chart, delegate lifecycle to operator**: The `cozy-installer` Helm chart no longer ships CRDs in its `crds/` directory. CRD lifecycle is now fully managed by the Cozystack operator via the `--install-crds` flag, which applies embedded CRD manifests on every startup using server-side apply. The platform PackageSource is also created by the operator instead of a Helm template. This ensures CRDs and the PackageSource are always up to date after each operator restart, eliminating stale CRDs from Helm's install-only behavior ([**@lexfrei**](https://github.com/lexfrei) in #2074).
## Fixes
* **[kubevirt] Update KubeVirt to v1.6.4 and CDI to v1.64.0, fix VM pod initialization**: Updated KubeVirt operator to v1.6.4 and CDI operator to v1.64.0, including live migration of existing VMs during the upgrade. Additionally, disabled serial console logging globally via the KubeVirt CR to prevent a known v1.6.x issue ([upstream #15989](https://github.com/kubevirt/kubevirt/issues/15989)) where the `guest-console-log` init container blocked virt-launcher pods from starting, causing all VMs to get stuck in `PodInitializing` state ([**@nbykov0**](https://github.com/nbykov0) in #1833; [**@kvaps**](https://github.com/kvaps) in 7dfb819).
* **[linstor] Fix DRBD+LUKS+STORAGE resource creation failure**: All newly created encrypted volumes were failing because the DRBD `.res` file was never written due to a missing `setExists(true)` call in the `LuksLayer`. Applied the upstream `skip-adjust-when-device-inaccessible` patch ([LINBIT/linstor-server#477](https://github.com/LINBIT/linstor-server/pull/477)) which fixes the root cause and also prevents unnecessary lsblk calls when devices are not yet physically present ([**@kvaps**](https://github.com/kvaps) in #2072).
* **[system] Fix monitoring-agents FQDN resolution for tenant workload clusters**: Monitoring agents (`vmagent`, `fluent-bit`) in tenant workload clusters were failing to deliver metrics and logs because service addresses used short DNS names without the cluster domain suffix. Fixed by appending the configured cluster domain from `_cluster.cluster-domain` (with fallback to `cluster.local`) to all vmagent remoteWrite URLs and fluent-bit output hosts ([**@IvanHunters**](https://github.com/IvanHunters) in #2075).
* **[cozystack-basics] Preserve existing HelmRelease values during reconciliations**: Fixed a data-loss bug where changes made to the `tenant-root` HelmRelease were silently dropped on the next forced or upgrade reconciliation of the `cozystack-basics` HelmRelease. The reconciler now merges new configuration with existing values instead of overwriting them ([**@sircthulhu**](https://github.com/sircthulhu) in #2068).
* **[cozystack-basics] Deny resourcequotas deletion for tenant admin**: Fixed the `cozy:tenant:admin:base` ClusterRole to explicitly deny deletion of `ResourceQuota` objects for tenant admins and superadmins, preventing accidental removal of tenant resource limits ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2076).
## Breaking Changes & Upgrade Notes
* **[platform] Certificate issuer configuration parameters renamed**: The `publishing.certificates.issuerType` field is renamed to `publishing.certificates.solver`, and the value `cloudflare` is renamed to `dns01` to align with standard ACME terminology. A new `publishing.certificates.issuerName` field (default: `letsencrypt-prod`) is introduced to allow pointing all ingresses at a custom ClusterIssuer. Migration 32 is included and automatically converts existing configurations during upgrade — no manual action is required ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2077).
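Both keys are taken from the entry above; a values sketch of the post-migration configuration, where the issuer name shown is the documented default:

```yaml
# Platform chart values after migration 32; `solver` replaces the old
# `issuerType` field, and `cloudflare` is now expressed as `dns01`.
publishing:
  certificates:
    solver: dns01
    issuerName: letsencrypt-prod   # any ClusterIssuer name may be used
```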
## Documentation
* **[website] Migrate ConfigMap references to Platform Package in v1 documentation**: Updated the entire v1 documentation tree to replace legacy ConfigMap-based configuration references with the new Platform Package API, ensuring guides are consistent with the v1 configuration model ([**@sircthulhu**](https://github.com/sircthulhu) in [cozystack/website#426](https://github.com/cozystack/website/pull/426)).
* **[website] Add generic Kubernetes deployment guide for v1**: Added a new installation guide covering Cozystack deployment on any generic Kubernetes cluster, expanding the set of supported deployment targets beyond provider-specific guides ([**@lexfrei**](https://github.com/lexfrei) in [cozystack/website#408](https://github.com/cozystack/website/pull/408)).
* **[website] Refactor resource planning documentation**: Improved the resource planning guide with a clearer structure and more comprehensive coverage of planning considerations for Cozystack deployments ([**@IvanStukov**](https://github.com/IvanStukov) in [cozystack/website#423](https://github.com/cozystack/website/pull/423)).
* **[website] Add ServiceAccount API access documentation and update FAQ**: Added a new article documenting ServiceAccount API access token configuration and updated the FAQ to include related troubleshooting guidance ([**@IvanStukov**](https://github.com/IvanStukov) in [cozystack/website#421](https://github.com/cozystack/website/pull/421)).
* **[website] Update networking-mesh allowed-location-ips example**: Replaced provider-specific CLI usage with standard `kubectl` commands in the multi-location networking guide's `allowed-location-ips` example, making the documentation more universally applicable ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#425](https://github.com/cozystack/website/pull/425)).
## Contributors
We'd like to thank all contributors who made this release possible:
> **⚠️ Release Candidate Warning**: This is a release candidate intended for final validation before the stable v1.0.0 release. Breaking changes are not expected at this stage, but please test thoroughly before deploying to production.
## Features and Improvements
* **[keycloak] Allow custom Ingress hostname via values**: Added an `ingress.host` field to the cozy-keycloak chart values, allowing operators to override the default `keycloak.<root-host>` Ingress hostname. The custom hostname is applied to both the Ingress resource and the `KC_HOSTNAME` environment variable in the StatefulSet. When left empty, the original behavior is preserved (fully backward compatible) ([**@sircthulhu**](https://github.com/sircthulhu) in #2101).
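A values sketch for the entry above; the hostname is an example, and leaving `host` empty keeps the default `keycloak.<root-host>` behavior:

```yaml
# cozy-keycloak values overriding the Ingress hostname; the value is applied
# to both the Ingress resource and the KC_HOSTNAME environment variable.
ingress:
  host: sso.example.org   # example hostname
```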
## Fixes
* **[platform] Fix upgrade issues in migrations, etcd timeout, and migration script**: Fixed multiple upgrade failures discovered during v0.41.1 → v1.0 upgrade testing. Migration 26 now uses the `cozystack.io/ui=true` label (always present on v0.41.1) instead of the new label that depends on migration 22 having run, and adds robust Helm secret deletion with fallback and verification. Migrations 28 and 29 wrap `grep` calls to prevent `pipefail` exits and fix the reconcile annotation to use RFC3339 format. Migration 27 now skips missing CRDs and adds a name-pattern fallback for Helm secret deletion. The etcd HelmRelease timeout is increased from 10m to 30m to accommodate TLS cert rotation hooks. The `migrate-to-version-1.0.sh` script gains the missing `bundle-disable`, `bundle-enable`, `expose-ingress`, and `expose-services` field mappings ([**@kvaps**](https://github.com/kvaps) in #2096).
* **[platform] Fix orphaned -rd HelmReleases after application renames**: After the `ferretdb→mongodb`, `mysql→mariadb`, and `virtual-machine→vm-disk+vm-instance` renames, the system-level `-rd` HelmReleases in `cozy-system` (`ferretdb-rd`, `mysql-rd`, `virtual-machine-rd`) were left orphaned, referencing ExternalArtifacts that no longer exist and causing persistent reconciliation failures. Migrations 28 and 29 are updated to remove these resources, and migration 33 is added as a safety net for clusters that already passed those migrations ([**@kvaps**](https://github.com/kvaps) in #2102).
* **[monitoring-agents] Fix FQDN resolution regression in tenant workload clusters**: The fix introduced in #2075 used `_cluster.cluster-domain` references in `values.yaml`, but `_cluster` values are not accessible from Helm subchart contexts — meaning fluent-bit received empty hostnames and failed to forward logs. This PR replaces the `_cluster` references with a new `global.clusterDomain` variable (empty by default for management clusters, set to the cluster domain for tenant clusters), which is correctly shared with all subcharts (see the values sketch after this list) ([**@kvaps**](https://github.com/kvaps) in #2086).
* **[dashboard] Fix legacy templating and cluster identifier in sidebar links**: Standardized the cluster identifier used across dashboard menu links, administration links, and API request paths, resolving incorrect or broken link targets for the Backups and External IPs sidebar sections ([**@androndo**](https://github.com/androndo) in #2093).
* **[dashboard] Fix backupjobs creation form and sidebar backup category identifier**: Fixed the backup job creation form configuration, adding the required Name, Namespace, Plan Name, Application, and Backup Class fields. Fixed the sidebar backup category identifier that was causing incorrect navigation ([**@androndo**](https://github.com/androndo) in #2103).
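Relating to the monitoring-agents fix above: chart-global values are visible from subchart contexts, unlike `_cluster.*` references, so the fix introduces roughly the following shape (the domain value is illustrative):

```yaml
# Chart-global value shared with all subcharts; empty by default on
# management clusters, set to the tenant cluster's domain otherwise.
global:
  clusterDomain: cluster.local
```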
## Documentation
* **[website] Add Helm chart development principles guide**: Added a new developer guide section documenting Cozystack's four core Helm chart principles: easy upstream updates, local-first artifacts, local dev/test workflow, and no external dependencies ([**@kvaps**](https://github.com/kvaps) in [cozystack/website#418](https://github.com/cozystack/website/pull/418)).
* **[website] Add network architecture overview**: Added comprehensive network architecture documentation covering the multi-layered networking stack — MetalLB (L2/BGP), Cilium eBPF (kube-proxy replacement), Kube-OVN (centralized IPAM), and tenant isolation with identity-based eBPF policies — with Mermaid diagrams for all major traffic flows ([**@IvanHunters**](https://github.com/IvanHunters) in [cozystack/website#422](https://github.com/cozystack/website/pull/422)).
* **[website] Update documentation to use jsonpatch for service exposure**: Improved `kubectl patch` commands throughout installation and configuration guides to use JSON Patch `add` operations for extending arrays instead of replacing them wholesale, making the documented commands safer and more precise (see the patch sketch after this list) ([**@sircthulhu**](https://github.com/sircthulhu) in [cozystack/website#427](https://github.com/cozystack/website/pull/427)).
* **[website] Update certificates section in Platform Package documentation**: Updated the certificate configuration documentation to reflect the new `solver` and `issuerName` fields introduced in v1.0.0-rc.1, replacing the legacy `issuerType` references ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in [cozystack/website#429](https://github.com/cozystack/website/pull/429)).
* **[website] Add tenant Kubernetes cluster log querying guide**: Added documentation for querying logs from tenant Kubernetes clusters in Grafana using VictoriaLogs labels (`tenant`, `kubernetes_namespace_name`, `kubernetes_pod_name`), including the `monitoringAgents` addon prerequisite and step-by-step filtering examples ([**@IvanHunters**](https://github.com/IvanHunters) in [cozystack/website#430](https://github.com/cozystack/website/pull/430)).
* **[website] Replace non-idempotent commands with idempotent alternatives**: Updated `helm install` to `helm upgrade --install`, `kubectl create -f` to `kubectl apply -f`, and `kubectl create ns` to the dry-run+apply pattern across all installation and deployment guides so commands can be safely re-run ([**@lexfrei**](https://github.com/lexfrei) in [cozystack/website#431](https://github.com/cozystack/website/pull/431)).
* **[website] Fix broken documentation links with `.md` suffix**: Fixed incorrect internal links with `.md` suffix across virtualization guides for both v0 and v1 documentation, standardizing link text to "Developer Guide" ([**@cheese**](https://github.com/cheese) in [cozystack/website#432](https://github.com/cozystack/website/pull/432)).
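To illustrate the JSON Patch entry above: an `add` operation targeting the `-` array index appends an element instead of replacing the whole list. The patch is shown in YAML form for readability, and the path and value are illustrative rather than taken from the actual guides:

```yaml
# JSON Patch (YAML form) appending one element to an array; per RFC 6902,
# a trailing `-` in the path means "append to the end of the array".
- op: add
  path: /spec/ports/-        # illustrative path
  value:
    name: example
    port: 8080
```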
## Contributors
We'd like to thank all contributors who made this release possible:
We are thrilled to announce **Cozystack v1.0.0**, the first stable major release of the Cozystack platform. This milestone represents a fundamental architectural evolution from the v0.x series, introducing a fully operator-driven package management system, a comprehensive backup and restore framework, a redesigned virtual machine architecture, and a rich set of new managed applications — all hardened through an extensive alpha, beta, and release-candidate cycle.
## Feature Highlights
### Package-Based Architecture with Cozystack Operator
The most significant architectural change in v1.0.0 is the replacement of HelmRelease bundle deployments with a declarative **Package** and **PackageSource** model managed by the new `cozystack-operator`. Platform administrators now define their configuration in a structured `values.yaml`, and the operator reconciles the desired state by managing Package and PackageSource resources across the cluster.
The operator also takes ownership of CRD lifecycle — installing and updating CRDs from embedded manifests at every startup — eliminating the stale-CRD problem that affected Helm-only installations. Flux sharding has been added to distribute tenant HelmRelease reconciliation across multiple Flux controllers, providing horizontal scalability in large multi-tenant environments.
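For a rough feel of the model, a hypothetical Package object is sketched below; the API group/version and every field shown are assumptions, since the actual schema is defined by the operator's CRDs:

```yaml
# Hypothetical Package resource; group/version and fields are assumptions,
# not the operator's actual schema.
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: monitoring
  namespace: cozy-monitoring
spec: {}   # real packages declare their source, variant, and values here
```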
A migration script (`hack/migrate-to-version-1.0.sh`) is provided for upgrading existing v0.x clusters, along with 33 incremental migration steps that automate resource renaming, secret cleanup, and configuration conversion.
### Comprehensive Backup and Restore System
v1.0.0 ships a fully featured, production-ready backup and restore framework built on Velero integration. Users can define **BackupClass** resources to describe backup storage targets, create **BackupPlan** schedules, and trigger **RestoreJob** resources for end-to-end application recovery.
Virtual machine backups are supported natively via the Velero KubeVirt plugin, which captures consistent VM disk snapshots alongside metadata. The backup controller and the backup strategy sub-controllers (including the VM-specific strategy) are installed by default, and a full dashboard UI allows users to monitor backup status, view backup job history, and initiate restore workflows.
### Redesigned Virtual Machine Architecture
The legacy `virtual-machine` application has been replaced with a two-resource architecture: **`vm-disk`** for managing persistent disks and **`vm-instance`** for managing VM lifecycle. This separation provides cleaner disk/instance management, allows disks to be reused across VM instances, and aligns with modern KubeVirt patterns.
New capabilities include: a `cpuModel` field for direct CPU model specification without using an instanceType; the ability to switch between `instanceType`-based and custom resource-based configurations; migration from the deprecated `running` field to `runStrategy`; and native **RWX (NFS) filesystem support** in the KubeVirt CSI driver, enabling multiple pods to mount the same persistent volume simultaneously.
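A configuration fragment combining two of the fields above; only the field names come from the release notes, while the structure and values are illustrative:

```yaml
# Hypothetical vm-instance configuration fragment; field names are from the
# release notes, values are illustrative.
runStrategy: Always          # replaces the deprecated `running` boolean
cpuModel: host-passthrough   # direct CPU model selection without an instanceType
```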
### New Managed Applications
v1.0.0 expands the application catalog significantly:
- **MongoDB**: A fully managed MongoDB replica set with persistent storage, monitoring integration, and unified user/database configuration API.
- **Qdrant**: A high-performance vector database for AI and machine learning workloads, supporting single-replica and clustered modes with API key authentication and optional external LoadBalancer access.
- **Harbor**: A fully managed OCI container registry backed by CloudNativePG, Redis operator, and COSI BucketClaim (SeaweedFS). Includes Trivy vulnerability scanner, auto-generated admin credentials, and TLS via cert-manager.
- **NATS**: Enhanced with full Grafana monitoring dashboards for JetStream and server metrics, Prometheus support with TLS-aware configuration, and updated image customization options.
- **MariaDB**: The `mysql` application is renamed to `mariadb`, accurately reflecting the underlying engine. An automatic migration (migration 27) converts all existing MySQL resources to use the `mariadb` naming.
FerretDB has been removed from the catalog as it is superseded by native MongoDB support.
### Multi-Location Networking with Kilo and cilium-kilo
Cozystack v1.0.0 introduces first-class support for multi-location clusters via the **Kilo** WireGuard mesh networking package. Kilo automatically establishes encrypted WireGuard tunnels between nodes in different network segments, enabling seamless cross-region communication.
A new integrated **`cilium-kilo`** networking variant combines Cilium eBPF CNI with Kilo's WireGuard overlay in a single platform configuration selection. This variant enables `enable-ipip-termination` in Cilium and deploys Kilo with `--compatibility=cilium`, allowing Cilium network policies to function correctly over the WireGuard mesh — without any manual configuration of the two components.
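The exact values key for selecting the variant is not shown in these notes; assuming it lives under the platform chart's `networking` section, the selection might look roughly like this:

```yaml
# Hypothetical platform values; the key path is an assumption based on the
# restructured values.yaml -- only the variant name itself is documented.
networking:
  variant: cilium-kilo   # enables enable-ipip-termination in Cilium and
                         # deploys Kilo with --compatibility=cilium
```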
### Flux Sharding for Scalable Multi-Tenancy
Tenant HelmRelease reconciliation is now distributed across multiple Flux controllers via sharding labels. Each tenant workload is assigned to a shard based on a deterministic hash, preventing a single Flux controller from becoming a bottleneck in large multi-tenant environments. The platform operator manages the shard assignment automatically, and new shards can be added by scaling the Flux deployment.
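Assuming Flux's documented sharding label, a sharded tenant HelmRelease would carry a label along these lines; the shard key is assigned by the platform operator, so this sketch is descriptive rather than something to set by hand:

```yaml
# Sketch of a sharded tenant HelmRelease; the label key assumes Flux's
# standard sharding convention and the value is illustrative.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tenant-example
  labels:
    sharding.fluxcd.io/key: shard1   # deterministic hash of the tenant workload
```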
## Major Features and Improvements
### Cozystack Operator
* **[cozystack-operator] Introduce Package and PackageSource APIs**: Added new CRDs for declarative package management, defining the full API for Package and PackageSource resources ([**@kvaps**](https://github.com/kvaps) in #1740, #1741, #1755, #1756, #1760, #1761).
* **[platform] Migrate from HelmRelease bundles to Package-based deployment**: Replaced HelmRelease bundle system with Package resources managed by cozystack-operator, including restructured values.yaml with full configuration support for networking, publishing, authentication, scheduling, branding, and resources ([**@kvaps**](https://github.com/kvaps) in #1816).
* **[cozystack-operator] Add automatic CRD installation at startup**: Added `--install-crds` flag to install embedded CRD manifests on every startup via server-side apply, ensuring CRDs and the PackageSource are always up to date ([**@lexfrei**](https://github.com/lexfrei) in #2060).
* **[installer] Remove CRDs from Helm chart, delegate lifecycle to operator**: The `cozy-installer` Helm chart no longer ships CRDs; CRD lifecycle is fully managed by the Cozystack operator ([**@lexfrei**](https://github.com/lexfrei) in #2074).
* **[cozystack-operator] Preserve existing suspend field in package reconciler**: Fixed package reconciler to properly preserve the suspend field state during reconciliation ([**@sircthulhu**](https://github.com/sircthulhu) in #2043).
* **[cozystack-operator] Fix namespace privileged flag resolution and field ownership**: Fixed operator to correctly check all Packages in a namespace when determining privileged status, and resolved SSA field ownership conflicts ([**@kvaps**](https://github.com/kvaps) in #2046).
* **[platform] Add flux-plunger controller**: Added flux-plunger controller to automatically fix stuck HelmRelease errors by cleaning up failed resources and retrying reconciliation ([**@kvaps**](https://github.com/kvaps) in #1843).
* **[installer] Add variant-aware templates for generic Kubernetes support**: Extended the installer to support generic and hosted Kubernetes deployments via the `cozystackOperator.variant=generic` parameter ([**@lexfrei**](https://github.com/lexfrei) in #2010).
* **[installer] Unify operator templates**: Merged separate operator templates into a single variant-based template supporting Talos and non-Talos deployments ([**@kvaps**](https://github.com/kvaps) in #2034).
### API and Platform
* **[api] Rename CozystackResourceDefinition to ApplicationDefinition**: Renamed CRD and all related types for clarity and consistency, with migration 24 handling the transition automatically ([**@kvaps**](https://github.com/kvaps) in #1864).
* **[platform] Add DNS-1035 validation for Application names**: Added dynamic DNS-1035 label validation for Application names at creation time, preventing resources with invalid names that would fail downstream ([**@lexfrei**](https://github.com/lexfrei) in #1771).
* **[platform] Make cluster issuer name and ACME solver configurable**: Added `publishing.certificates.solver` and `publishing.certificates.issuerName` parameters to allow pointing all ingress TLS annotations at any ClusterIssuer ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2077).
* **[platform] Add cilium-kilo networking variant**: Added integrated `cilium-kilo` networking variant combining Cilium CNI with Kilo WireGuard mesh overlay ([**@kvaps**](https://github.com/kvaps) in #2064).
* **[cozystack-api] Switch from DaemonSet to Deployment**: Migrated cozystack-api to a Deployment with PreferClose topology spread constraints, reducing resource consumption while maintaining high availability ([**@kvaps**](https://github.com/kvaps) in #2041, #2048).
### Virtual Machines
* **[vm-instance] Complete migration from virtual-machine to vm-disk and vm-instance**: Fully migrated from `virtual-machine` to the new `vm-disk` and `vm-instance` architecture, with automatic migration script (migration 28) for existing VMs ([**@kvaps**](https://github.com/kvaps) in #2040).
* **[kubevirt-csi-driver] Add RWX Filesystem (NFS) support**: Added Read-Write-Many filesystem support to kubevirt-csi-driver via automatic NFS server deployment per PVC ([**@kvaps**](https://github.com/kvaps) in #2042).
* **[vm] Add cpuModel field to specify CPU model without instanceType**: Added cpuModel field to VirtualMachine API for granular CPU control ([**@sircthulhu**](https://github.com/sircthulhu) in #2007).
* **[vm] Allow switching between instancetype and custom resources**: Implemented atomic upgrade hook for switching between instanceType-based and custom resource VM configurations ([**@sircthulhu**](https://github.com/sircthulhu) in #2008).
* **[vm] Migrate to runStrategy instead of running**: Migrated VirtualMachine API from deprecated `running` field to `runStrategy` ([**@sircthulhu**](https://github.com/sircthulhu) in #2004).
* **[vm] Always expose VMs with a service**: Virtual machines are now always exposed with at least a ClusterIP service, ensuring each VM has a stable in-cluster DNS name ([**@lllamnyp**](https://github.com/lllamnyp) in #1738, #1751).
* **[dashboard] VMInstance dropdowns for disks and instanceType**: VM instance creation form now renders API-backed dropdowns for `instanceType` and disk `name` fields ([**@sircthulhu**](https://github.com/sircthulhu) in #2071).
### Backup System
* **[backups] Implement comprehensive backup and restore functionality**: Core backup Plan controller, Velero strategy controller, RestoreJob resource with end-to-end restore workflows, and enhanced backup plans UI ([**@lllamnyp**](https://github.com/lllamnyp) in #1640, #1685, #1687, #1719, #1720, #1737, #1967; [**@androndo**](https://github.com/androndo) in #1762, #1967, #1968, #1811).
* **[backups] Add kubevirt plugin to velero**: Added KubeVirt plugin to Velero for consistent VM state and data snapshots ([**@lllamnyp**](https://github.com/lllamnyp) in #2017).
* **[backups] Install backupstrategy controller by default**: Enabled backupstrategy controller by default for automatic backup scheduling ([**@lllamnyp**](https://github.com/lllamnyp) in #2020).
* **[backups] Better selectors for VM strategy**: Improved VM backup strategy selectors for accurate and reliable backup targeting ([**@lllamnyp**](https://github.com/lllamnyp) in #2023).
* **[backups] Create RBAC for backup resources**: Added comprehensive RBAC configuration for backup operations and restore jobs ([**@lllamnyp**](https://github.com/lllamnyp) in #2018).
### Networking
* **[kilo] Introduce Kilo WireGuard mesh networking**: Added Kilo as a system package providing secure WireGuard-based VPN mesh for connecting Kubernetes nodes across different networks and regions ([**@kvaps**](https://github.com/kvaps) in #1691).
* **[kilo] Add Cilium compatibility variant**: Added `cilium` variant enabling Cilium-aware IPIP encapsulation for full network policy enforcement with Kilo mesh ([**@kvaps**](https://github.com/kvaps) in #2055).
* **[kilo] Update to v0.8.0 with configurable MTU**: Updated Kilo to v0.8.0 with configurable MTU parameter and performance improvements ([**@kvaps**](https://github.com/kvaps) in #2003, #2049, #2053).
* **[local-ccm] Add local-ccm package**: Added local cloud controller manager for managing load balancer services in bare-metal environments ([**@kvaps**](https://github.com/kvaps) in #1831).
* **[local-ccm] Add node-lifecycle-controller component**: Added optional node-lifecycle-controller that automatically deletes unreachable NotReady nodes, solving the "zombie" node problem in autoscaled clusters ([**@IvanHunters**](https://github.com/IvanHunters) in #1992).
* **[tenant] Allow egress to parent ingress pods**: Updated tenant network policies to allow egress traffic to parent cluster ingress pods ([**@lexfrei**](https://github.com/lexfrei) in #1765, #1776).
### New Applications
* **[mongodb] Add MongoDB managed application**: Added MongoDB as a fully managed database with replica sets, persistent storage, and unified user/database configuration ([**@lexfrei**](https://github.com/lexfrei) in #1822; [**@kvaps**](https://github.com/kvaps) in #1923).
* **[qdrant] Add Qdrant vector database**: Added Qdrant as a high-performance vector database for AI/ML workloads with API key authentication and optional LoadBalancer access ([**@lexfrei**](https://github.com/lexfrei) in #1987).
* **[harbor] Add managed Harbor container registry**: Added Harbor v2.14.2 as a managed tenant-level container registry with CloudNativePG, Redis operator, COSI BucketClaim storage, and Trivy scanner ([**@lexfrei**](https://github.com/lexfrei) in #2058).
* **[nats] Add monitoring**: Added Grafana dashboards for NATS JetStream and server metrics, Prometheus monitoring with TLS support ([**@klinch0**](https://github.com/klinch0) in #1381).
* **[mariadb] Rename mysql application to mariadb**: Renamed MySQL application to MariaDB with automatic migration (migration 27) for all existing resources ([**@kvaps**](https://github.com/kvaps) in #2026).
* **[ferretdb] Remove FerretDB application**: Removed FerretDB, superseded by native MongoDB support ([**@kvaps**](https://github.com/kvaps) in #2028).
### Kubernetes and System Components
* **[kubernetes] Update supported Kubernetes versions to v1.30–v1.35**: Updated the tenant Kubernetes version matrix, with v1.35 as the new default. Kamaji updated to edge-26.2.4 and CAPI Kamaji provider to v0.16.0 ([**@lexfrei**](https://github.com/lexfrei) in #2073).
* **[kubernetes] Auto-enable Gateway API support in cert-manager**: Added automatic Gateway API support in cert-manager for tenant clusters ([**@kvaps**](https://github.com/kvaps) in #1997).
* **[kubernetes] Use ingress-nginx nodeport service**: Changed tenant Kubernetes clusters to use ingress-nginx NodePort service for improved compatibility ([**@sircthulhu**](https://github.com/sircthulhu) in #1948).
* **[system] Add cluster-autoscaler for Hetzner and Azure**: Added cluster-autoscaler system package for automatically scaling management cluster nodes on Hetzner and Azure ([**@kvaps**](https://github.com/kvaps) in #1964).
* **[cluster-autoscaler] Enable enforce-node-group-min-size by default**: Ensures node groups are always scaled up to their configured minimum size ([**@kvaps**](https://github.com/kvaps) in #2050).
* **[system] Add clustersecret-operator package**: Added clustersecret-operator for managing secrets across multiple namespaces ([**@sircthulhu**](https://github.com/sircthulhu) in #2025).
### Monitoring
* **[monitoring] Enable monitoring for core components**: Enhanced monitoring capabilities with dashboards and metrics for core Cozystack components ([**@IvanHunters**](https://github.com/IvanHunters) in #1937).
* **[monitoring] Add SLACK_SEVERITY_FILTER and VMAgent for tenant monitoring**: Added SLACK_SEVERITY_FILTER for Slack alert filtering and VMAgent for tenant namespace metrics scraping ([**@IvanHunters**](https://github.com/IvanHunters) in #1712).
* **[monitoring-agents] Fix FQDN resolution for tenant workload clusters**: Fixed monitoring agents in tenant clusters to use full DNS names with cluster domain suffix ([**@IvanHunters**](https://github.com/IvanHunters) in #2075; [**@kvaps**](https://github.com/kvaps) in #2086).
### Storage
* **[linstor] Move CRDs to dedicated piraeus-operator-crds chart**: Moved LINSTOR CRDs to a dedicated chart, ensuring reliable installation of all CRDs including `linstorsatellites.io` ([**@kvaps**](https://github.com/kvaps) in #2036; [**@IvanHunters**](https://github.com/IvanHunters) in #1991).
* **[seaweedfs] Increase certificate duration to 10 years**: Increased SeaweedFS certificate validity to 10 years to reduce rotation overhead ([**@IvanHunters**](https://github.com/IvanHunters) in #1986).
## Improvements
* **[dashboard] Upgrade dashboard to version 1.4.0**: Updated Cozystack dashboard to v1.4.0 with new features and improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #2051).
* **[dashboard] Hide Ingresses/Services/Secrets tabs when no selectors defined**: Tabs are now conditionally shown based on whether the ApplicationDefinition has resource selectors configured, reducing UI clutter ([**@kvaps**](https://github.com/kvaps) in #2087).
* **[dashboard] Add startupProbe to prevent container restarts on slow hardware**: Added startup probe to dashboard pods to prevent unnecessary restarts ([**@kvaps**](https://github.com/kvaps) in #1996).
* **[keycloak] Allow custom Ingress hostname via values**: Added `ingress.host` field to cozy-keycloak chart values for overriding the default `keycloak.<root-host>` hostname ([**@sircthulhu**](https://github.com/sircthulhu) in #2101).
* **[branding] Separate values for Keycloak**: Separated Keycloak branding values for better customization capabilities ([**@nbykov0**](https://github.com/nbykov0) in #1947).
* **[rbac] Use hierarchical naming scheme**: Refactored RBAC to use hierarchical naming for cluster roles and role bindings ([**@lllamnyp**](https://github.com/lllamnyp) in #2019).
* **[tenant,rbac] Use shared clusterroles**: Refactored tenant RBAC to use shared ClusterRoles for improved consistency ([**@lllamnyp**](https://github.com/lllamnyp) in #1999).
* **[kubernetes] Increase default apiServer resourcesPreset to large**: Increased kube-apiserver resource preset to `large` for more reliable operation under higher workloads ([**@kvaps**](https://github.com/kvaps) in #1875).
* **[kubernetes] Increase kube-apiserver startup probe threshold**: Increased startup probe threshold to allow more time for API server readiness ([**@kvaps**](https://github.com/kvaps) in #1876).
* **[etcd] Increase probe thresholds for better recovery**: Increased etcd probe thresholds to improve cluster resilience during temporary slowdowns ([**@kvaps**](https://github.com/kvaps) in #1874).
* **[etcd-operator] Add vertical-pod-autoscaler dependency**: Added VPA as a dependency to etcd-operator for proper resource scaling ([**@sircthulhu**](https://github.com/sircthulhu) in #2047).
* **[cilium] Change cilium-operator replicas to 1**: Reduced Cilium operator replicas to decrease resource consumption in smaller deployments ([**@IvanHunters**](https://github.com/IvanHunters) in #1784).
* **[keycloak-configure,dashboard] Enable insecure TLS verification by default**: Made SSL certificate verification configurable with insecure mode enabled by default for local development ([**@IvanHunters**](https://github.com/IvanHunters) in #2005).
* **[platform] Split telemetry between operator and controller**: Separated telemetry collection for better metrics isolation ([**@kvaps**](https://github.com/kvaps) in #1869).
* **[system] Add resource requests and limits to etcd-defrag**: Added resource requests and limits to etcd-defrag job to prevent resource contention ([**@matthieu-robin**](https://github.com/matthieu-robin) in #1785, #1786).
## Fixes
* **[dashboard] Fix sidebar visibility on cluster-level pages**: Fixed broken URLs with double `//` on cluster-level pages by hiding namespace-scoped sidebar items when no tenant is selected ([**@sircthulhu**](https://github.com/sircthulhu) in #2106).
* **[platform] Fix upgrade issues in migrations, etcd timeout, and migration script**: Fixed multiple upgrade failures discovered during v0.41.1 → v1.0 upgrade testing, including fixes to migrations 26-29, RFC3339 formatting for annotations, and an extended etcd HelmRelease timeout of 30m ([**@kvaps**](https://github.com/kvaps) in #2096).
* **[platform] Fix orphaned -rd HelmReleases after application renames**: Migrations 28-29 updated to remove orphaned `-rd` HelmReleases in `cozy-system` after `ferretdb→mongodb`, `mysql→mariadb`, and `virtual-machine→vm-disk+vm-instance` renames, with migration 33 as a safety net ([**@kvaps**](https://github.com/kvaps) in #2102).
* **[platform] Adopt tenant-root into cozystack-basics during migration**: Added migration 31 to adopt existing `tenant-root` Namespace and HelmRelease into `cozystack-basics` for a safe v0.41.x → v1.0 upgrade path ([**@kvaps**](https://github.com/kvaps) in #2065).
* **[platform] Preserve tenant-root HelmRelease during migration**: Fixed data-loss risk during migration where `tenant-root` HelmRelease could be deleted ([**@sircthulhu**](https://github.com/sircthulhu) in #2063).
* **[platform] Fix cozystack-values secret race condition**: Fixed race condition in cozystack-values secret creation that could cause initialization failures ([**@lllamnyp**](https://github.com/lllamnyp) in #2024).
* **[cozystack-basics] Preserve existing HelmRelease values during reconciliations**: Fixed data-loss bug where changes to `tenant-root` HelmRelease were dropped on the next reconciliation ([**@sircthulhu**](https://github.com/sircthulhu) in #2068).
* **[cozystack-basics] Deny resourcequotas deletion for tenant admin**: Fixed `cozy:tenant:admin:base` ClusterRole to explicitly deny deletion of ResourceQuota objects ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2076).
* **[dashboard] Fix legacy templating and cluster identifier in sidebar links**: Standardized the cluster identifier across dashboard menu links, resolving broken link targets for Backups and External IPs ([**@androndo**](https://github.com/androndo) in #2093).
* **[dashboard] Fix backupjobs creation form and sidebar backup category identifier**: Fixed the fields in the backup job creation form and corrected the sidebar backup category identifier ([**@androndo**](https://github.com/androndo) in #2103).
* **[kubevirt] Update KubeVirt to v1.6.4 and CDI to v1.64.0, fix VM pod initialization**: Updated KubeVirt and CDI and disabled serial console logging globally to fix the `guest-console-log` init container blocking virt-launcher pods ([**@nbykov0**](https://github.com/nbykov0) in #1833; [**@kvaps**](https://github.com/kvaps)).
* **[linstor] Fix DRBD+LUKS+STORAGE resource creation failure**: Applied upstream fix for all newly created encrypted volumes failing due to missing `setExists(true)` call in `LuksLayer` ([**@kvaps**](https://github.com/kvaps) in #2072).
* **[platform] Clean up Helm secrets for removed releases**: Added cleanup logic to migration 23 to remove orphaned Helm secrets from removed `-rd` releases ([**@kvaps**](https://github.com/kvaps) in #2035).
* **[monitoring] Fix YAML parse error in vmagent template**: Fixed YAML parsing error in monitoring-agents vmagent template ([**@kvaps**](https://github.com/kvaps) in #2037).
* **[monitoring] Remove cozystack-controller dependency**: Fixed monitoring package to remove unnecessary cozystack-controller dependency ([**@IvanHunters**](https://github.com/IvanHunters) in #1990).
* **[monitoring] Remove duplicate dashboards.list**: Fixed duplicate dashboards.list configuration in extra/monitoring package ([**@IvanHunters**](https://github.com/IvanHunters) in #2016).
* **[linstor] Update piraeus-server patches with critical fixes**: Backported critical patches fixing edge cases in device management and DRBD resource handling ([**@kvaps**](https://github.com/kvaps) in #1850).
* **[apiserver] Fix Watch resourceVersion and bookmark handling**: Fixed Watch API handling of resourceVersion and bookmarks for proper event streaming ([**@kvaps**](https://github.com/kvaps) in #1860).
* **[bootbox] Auto-create bootbox-application as dependency**: Fixed bootbox package to automatically create required bootbox-application dependency ([**@kvaps**](https://github.com/kvaps) in #1974).
* **[postgres-operator] Correct PromQL syntax in CNPGClusterOffline alert**: Fixed incorrect PromQL syntax in the CNPGClusterOffline Prometheus alert ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1981).
* **[coredns] Fix serviceaccount to match kubernetes bootstrap RBAC**: Fixed CoreDNS service account to correctly match Kubernetes bootstrap RBAC requirements ([**@mattia-eleuteri**](https://github.com/mattia-eleuteri) in #1958).
* **[codegen] Fix missing gen_client in update-codegen.sh**: Fixed build error in `pkg/generated/applyconfiguration/utils.go` by including `gen_client` in the codegen script ([**@lexfrei**](https://github.com/lexfrei) in #2061).
* **[kubevirt-operator] Fix typo in VMNotRunningFor10Minutes alert**: Fixed typo in VM alert name ensuring proper alert triggering ([**@lexfrei**](https://github.com/lexfrei) in #1770, #1775).
## Security
* **[dashboard] Verify JWT token**: Added JWT token verification to the dashboard for improved authentication security ([**@lllamnyp**](https://github.com/lllamnyp) in #1980).
## Dependencies
* **[cilium] Update to v1.18.6**: Updated Cilium CNI to v1.18.6 with security fixes and performance improvements ([**@sircthulhu**](https://github.com/sircthulhu) in #1868).
* **[kube-ovn] Update to v1.15.3**: Updated Kube-OVN CNI to v1.15.3 with performance improvements and bug fixes ([**@kvaps**](https://github.com/kvaps) in #2022).
* **[kilo] Update to v0.8.0**: Updated Kilo WireGuard mesh to v0.8.0 with performance improvements and new compatibility features ([**@kvaps**](https://github.com/kvaps) in #2053).
* **Update Talos Linux to v1.12.1**: Updated Talos Linux to v1.12.1 with latest features and security patches ([**@kvaps**](https://github.com/kvaps) in #1877).
## System Configuration
* **[vpc] Migrate subnets definition from map to array format**: Migrated VPC subnets from `map[string]Subnet` to `[]Subnet` with an explicit `name` field, converted automatically by migration 30; see the example after this list ([**@kvaps**](https://github.com/kvaps) in #2052).
* **[migrations] Add migrations 23-33 for v1.0 upgrade path**: Added 11 incremental migrations handling CRD ownership, resource renaming, secret cleanup, Helm adoption, and configuration conversion for the v0.41.x → v1.0.0 upgrade path ([**@kvaps**](https://github.com/kvaps) in #1975, #2035, #2036, #2040, #2026, #2065, #2052, #2102).
* **[tenant] Run cleanup job from system namespace**: Moved tenant cleanup job to system namespace for improved security and resource isolation ([**@lllamnyp**](https://github.com/lllamnyp) in #1774, #1777).
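To illustrate the VPC subnets entry above, here is a sketch of the values before and after the conversion; only the map-to-array change and the `name` field come from the release note, while the subnet name and `cidr` are illustrative placeholders:
```yaml
# Before (v0.x): subnets defined as a map keyed by subnet name
subnets:
  private:
    cidr: 10.100.0.0/24   # illustrative field, not from the release note
---
# After (v1.0): subnets defined as an array with an explicit name field
subnets:
  - name: private
    cidr: 10.100.0.0/24
```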
## Development, Testing, and CI/CD
* **[ci] Use GitHub Copilot CLI for changelog generation**: Automated changelog generation using GitHub Copilot CLI ([**@androndo**](https://github.com/androndo) in #1753).
* **[ci] Choose runner conditional on label**: Added conditional runner selection in CI based on PR labels ([**@lllamnyp**](https://github.com/lllamnyp) in #1998).
* **[e2e] Use helm install instead of kubectl apply for cozystack installation**: Replaced static YAML apply flow with direct `helm upgrade --install` of the installer chart in E2E tests ([**@lexfrei**](https://github.com/lexfrei) in #2060).
* **[e2e] Make kubernetes test retries effective by cleaning up stale resources**: Fixed E2E test retries by adding pre-creation cleanup and increasing deployment wait timeout to 300s ([**@lexfrei**](https://github.com/lexfrei) in #2062).
* **[e2e] Increase HelmRelease readiness timeout for kubernetes test**: Increased HelmRelease readiness timeout to prevent false failures on slower hardware ([**@lexfrei**](https://github.com/lexfrei) in #2033).
* **[ci] Improve cozyreport functionality**: Enhanced cozyreport tool with improved reporting for CI/CD pipelines ([**@lllamnyp**](https://github.com/lllamnyp) in #2032).
* **[cozypkg] Add cross-platform build targets with version injection**: Added cross-platform build targets for the cozypkg/cozyhr tool (linux/amd64, linux/arm64, darwin/amd64, darwin/arm64) ([**@kvaps**](https://github.com/kvaps) in #1862).
* **[maintenance] Move scripts to hack directory**: Reorganized scripts to the standard `hack/` location ([**@kvaps**](https://github.com/kvaps) in #1863).
* **Update CODEOWNERS**: Updated CODEOWNERS to include new maintainers ([**@lllamnyp**](https://github.com/lllamnyp) in #1972; [**@IvanHunters**](https://github.com/IvanHunters) in #2015).
* **[talm] Skip config loading for completion subcommands**: Fixed talm CLI to skip config loading for shell completion commands ([**@kitsunoff**](https://github.com/kitsunoff) in cozystack/talm#109).
* **[talm] Fix metadata.id type casting in physical_links_info**: Fixed Prometheus query to properly cast metadata.id to string for regexMatch operations ([**@kvaps**](https://github.com/kvaps) in cozystack/talm#110).
## Documentation
* **[website] Add documentation versioning**: Implemented comprehensive documentation versioning with separate v0 and v1 documentation trees and a version selector in the UI ([**@IvanStukov**](https://github.com/IvanStukov) in cozystack/website#415).
* **[website] Describe upgrade to v1.0**: Added detailed upgrade instructions for migrating from v0.x to v1.0 ([**@nbykov0**](https://github.com/nbykov0) in cozystack/website@21bbe84).
* **[website] Migrate ConfigMap references to Platform Package in v1 docs**: Updated entire v1 documentation to replace legacy ConfigMap-based configuration with the new Platform Package API ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#426).
* **[website] Add generic Kubernetes deployment guide for v1**: Added installation guide for deploying Cozystack on any generic Kubernetes cluster ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#408).
* **[website] Describe operator-based and HelmRelease-based package patterns**: Added development documentation explaining operator-based and HelmRelease-based package patterns ([**@kvaps**](https://github.com/kvaps) in cozystack/website#413).
* **[website] Add Helm chart development principles guide**: Added developer guide documenting Cozystack's four core Helm chart principles ([**@kvaps**](https://github.com/kvaps) in cozystack/website#418).
* **[website] Add network architecture overview**: Added comprehensive network architecture documentation covering the multi-layered networking stack with Mermaid diagrams ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#422).
* **[website] Add LINSTOR disk preparation guide**: Added comprehensive documentation for preparing disks for LINSTOR storage ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#411).
* **[website] Add Proxmox VM migration guide**: Added detailed guide for migrating virtual machines from Proxmox to Cozystack ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#410).
* **[website] Add cluster autoscaler documentation**: Added documentation for Hetzner setup with Talos, vSwitch, and Kilo mesh integration ([**@kvaps**](https://github.com/kvaps) in #1964).
* **[website] Improve Azure autoscaling troubleshooting guide**: Enhanced Azure autoscaling documentation with serial console instructions and `az vmss update --custom-data` guidance ([**@kvaps**](https://github.com/kvaps) in cozystack/website#424).
* **[website] Update multi-location documentation for cilium-kilo variant**: Updated multi-location networking docs to reflect the integrated `cilium-kilo` variant selection ([**@kvaps**](https://github.com/kvaps) in cozystack/website@02d63f0).
* **[website] Update documentation to use jsonpatch for service exposure**: Improved `kubectl patch` commands to use JSON Patch `add` operations ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#427).
* **[website] Update certificates section in Platform Package documentation**: Updated certificate configuration docs to reflect new `solver` and `issuerName` fields ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in cozystack/website#429).
* **[website] Add tenant Kubernetes cluster log querying guide**: Added documentation for querying logs from tenant clusters in Grafana using VictoriaLogs labels ([**@IvanHunters**](https://github.com/IvanHunters) in cozystack/website#430).
* **[website] Replace non-idempotent commands with idempotent alternatives**: Updated `helm install` to `helm upgrade --install` and `kubectl create` to `kubectl apply` across all installation guides ([**@lexfrei**](https://github.com/lexfrei) in cozystack/website#431).
* **[website] Fix broken documentation links with .md suffix**: Fixed incorrect internal links across virtualization guides for v0 and v1 documentation ([**@cheese**](https://github.com/cheese) in cozystack/website#432).
* **[website] Refactor resource planning documentation**: Improved resource planning guide with clearer structure and more comprehensive coverage ([**@IvanStukov**](https://github.com/IvanStukov) in cozystack/website#423).
* **[website] Add ServiceAccount API access documentation and update FAQ**: Added documentation for ServiceAccount API access token configuration and updated FAQ ([**@IvanStukov**](https://github.com/IvanStukov) in cozystack/website#421).
* **[website] Update networking-mesh allowed-location-ips example**: Replaced provider-specific CLI with standard `kubectl` commands in multi-location networking guide ([**@kvaps**](https://github.com/kvaps) in cozystack/website#425).
* **[website] Add Hetzner RobotLB documentation**: Added documentation for configuring public IP with Hetzner RobotLB ([**@kvaps**](https://github.com/kvaps) in cozystack/website#394).
* **[website] Add documentation for creating and managing cloned VMs**: Added comprehensive guide for VM cloning operations ([**@sircthulhu**](https://github.com/sircthulhu) in cozystack/website#401).
* **[website] Update Talos installation docs for Hetzner and Servers.com**: Updated installation documentation for Hetzner and Servers.com environments ([**@kvaps**](https://github.com/kvaps) in cozystack/website#395).
* **[website] Add Hidora organization support details**: Added Hidora to the support page ([**@matthieu-robin**](https://github.com/matthieu-robin) in cozystack/website#397, cozystack/website#398).
* **[website] Check quotas before an upgrade**: Added troubleshooting documentation for checking resource quotas before upgrades ([**@nbykov0**](https://github.com/nbykov0) in cozystack/website#405).
* **[website] Update support documentation**: Updated support documentation with current contact information ([**@xrmtech-isk**](https://github.com/xrmtech-isk) in cozystack/website#420).
* **[website] Correct typo in kubeconfig reference in Kubernetes installation guide**: Fixed documentation typo in kubeconfig reference ([**@shkarface**](https://github.com/shkarface) in cozystack/website#414).
## Breaking Changes & Upgrade Notes
* **[api] CozystackResourceDefinition renamed to ApplicationDefinition**: The `CozystackResourceDefinition` CRD has been renamed to `ApplicationDefinition`. Migration 24 handles the transition automatically during upgrade ([**@kvaps**](https://github.com/kvaps) in #1864).
* **[platform] Certificate issuer configuration parameters renamed**: The `publishing.certificates.issuerType` field is renamed to `publishing.certificates.solver`, and the value `cloudflare` is renamed to `dns01`. A new `publishing.certificates.issuerName` field (default: `letsencrypt-prod`) is added. Migration 32 automatically converts existing configurations, so no manual action is required; see the example after this list ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2077).
* **[vpc] VPC subnets definition migrated from map to array format**: VPC subnets are now defined as `[]Subnet` with an explicit `name` field instead of `map[string]Subnet`. Migration 30 handles the conversion automatically ([**@kvaps**](https://github.com/kvaps) in #2052).
* **[vm] virtual-machine application replaced by vm-disk and vm-instance**: The legacy `virtual-machine` application has been fully replaced. Migration 28 automatically converts existing VMs to the new architecture ([**@kvaps**](https://github.com/kvaps) in #2040).
* **[mysql] mysql application renamed to mariadb**: Existing MySQL deployments are automatically renamed to MariaDB via migration 27 ([**@kvaps**](https://github.com/kvaps) in #2026).
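A sketch of the certificate parameter rename described above; the field paths and values come from the release note, while the surrounding values-file layout is assumed:
```yaml
# Before (v0.41.x)
publishing:
  certificates:
    issuerType: cloudflare
---
# After (v1.0): produced automatically by migration 32
publishing:
  certificates:
    solver: dns01
    issuerName: letsencrypt-prod
```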
### Upgrade Guide
To upgrade from v0.41.x to v1.0.0:
1. **Back up your cluster** before upgrading.
2. Run the provided migration script: `hack/migrate-to-version-1.0.sh`.
3. The incremental migrations (23-33) automatically handle all resource renaming, configuration conversion, CRD adoption, and secret cleanup.
4. Refer to the [upgrade documentation](https://cozystack.io/docs/v1/upgrade) for detailed instructions and troubleshooting.
## Contributors
We'd like to thank all contributors who made this release possible.
## Fixes
* **[platform] Prevent cozystack-version ConfigMap from deletion**: Added resource protection to prevent the `cozystack-version` ConfigMap from being accidentally deleted, improving platform stability and reliability ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2112, #2114).
* **[installer] Add keep annotation to Namespace and update migration script**: Added `helm.sh/resource-policy: keep` annotation to the `cozy-system` Namespace in the installer Helm chart to prevent Helm from deleting the namespace (and all HelmReleases within it) when the installer release is removed. The v1.0 migration script is also updated to annotate the `cozy-system` namespace and `cozystack-version` ConfigMap with this policy before migration; see the sketch after this list ([**@kvaps**](https://github.com/kvaps) in #2122, #2123).
* **[dashboard] Add FlowSchema to exempt BFF from API throttling**: Added a `cozy-dashboard-exempt` FlowSchema to exempt the dashboard Backend-for-Frontend (BFF) service account from Kubernetes API Priority and Fairness throttling. Previously, the BFF fell under the `workload-low` priority level, causing 429 (Too Many Requests) errors under load and leaving the dashboard unresponsive ([**@kvaps**](https://github.com/kvaps) in #2121, #2124).
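For reference, the keep policy described in the installer entry above is the standard Helm resource-policy annotation; a minimal sketch of the annotated Namespace:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cozy-system
  annotations:
    # Tells Helm to leave this resource in place when the owning release is removed
    helm.sh/resource-policy: keep
```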
## Documentation
* **[website] Replace bundles documentation with variants**: Renamed the "Bundles" documentation section to "Variants" to match current Cozystack terminology. Removed deprecated variants (`iaas-full`, `distro-full`, `distro-hosted`) and added new variants: `default` (PackageSources only, for manual package management via cozypkg) and `isp-full-generic` (full PaaS/IaaS on k3s, kubeadm, or RKE2). Updated all cross-references throughout the documentation ([**@kvaps**](https://github.com/kvaps) in cozystack/website#433).
* **[website] Add step to protect namespace before upgrading**: Updated the cluster upgrade guide and v0.41→v1.0 migration guide with a required step to annotate the `cozy-system` namespace and `cozystack-version` ConfigMap with `helm.sh/resource-policy=keep` before running `helm upgrade`, preventing accidental namespace deletion ([**@kvaps**](https://github.com/kvaps) in cozystack/website#435).
## Fixes
* **[platform] Suspend cozy-proxy if it conflicts with installer release during migration**: Added a check in the v0.41→v1.0 migration script to detect and automatically suspend the `cozy-proxy` HelmRelease when its `releaseName` is set to `cozystack`, which conflicts with the installer release and would cause `cozystack-operator` deletion during the upgrade ([**@kvaps**](https://github.com/kvaps) in #2128, #2130).
* **[platform] Fix off-by-one error in run-migrations script**: Fixed a bug in the migration runner where the first required migration was always skipped due to an off-by-one error in the migration range calculation, ensuring all upgrade steps execute correctly ([**@myasnikovdaniil**](https://github.com/myasnikovdaniil) in #2126, #2132).
* **[system] Fix Keycloak proxy configuration for v26.x**: Replaced the deprecated `KC_PROXY=edge` environment variable with `KC_PROXY_HEADERS=xforwarded` and `KC_HTTP_ENABLED=true` in the Keycloak StatefulSet template. `KC_PROXY` was removed in Keycloak 26.x, so the stale setting caused "Non-secure context detected" warnings and broken cookie handling when running behind a reverse proxy with TLS termination; see the snippet after this list ([**@sircthulhu**](https://github.com/sircthulhu) in #2125, #2134).
* **[dashboard] Allow clearing instanceType field and preserve newlines in secret copy**: Added `allowEmpty: true` to the `instanceType` field in the VMInstance form so users can explicitly clear it to use custom KubeVirt resources without a named instance type. Also fixed newline preservation when copying secrets with CMD+C ([**@sircthulhu**](https://github.com/sircthulhu) in #2135, #2137).
* **[dashboard] Restore stock-instance sidebars for namespace-level pages**: Restored `stock-instance-api-form`, `stock-instance-api-table`, `stock-instance-builtin-form`, and `stock-instance-builtin-table` sidebar resources that were inadvertently removed in #2106. Without these sidebars, namespace-level pages such as Backup Plans rendered as empty pages with no interactive content ([**@sircthulhu**](https://github.com/sircthulhu) in #2136, #2138).
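The Keycloak proxy fix above amounts to swapping environment variables in the StatefulSet template; a minimal sketch using standard Kubernetes container `env` syntax, with variable names and values taken from the release note:
```yaml
env:
  - name: KC_PROXY_HEADERS   # replaces the KC_PROXY=edge setting removed in 26.x
    value: xforwarded
  - name: KC_HTTP_ENABLED    # allow plain HTTP behind the TLS-terminating proxy
    value: "true"
```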
# Hubble
Hubble is a network and security observability platform built on top of Cilium. It provides deep visibility into the communication and behavior of services in your Kubernetes cluster.
## Prerequisites
- Cozystack platform running with Cilium as the CNI
- Monitoring hub enabled for Grafana access
## Configuration
Hubble is disabled by default in Cozystack. To enable it, update the Cilium configuration.
### Enable Hubble
Edit the Cilium values in your platform configuration to enable Hubble:
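A minimal sketch using the upstream Cilium Helm chart's Hubble keys; how these values are passed to the Cozystack cilium package depends on your platform configuration, so treat the layout as illustrative:
```yaml
hubble:
  enabled: true        # turn on Hubble observability in the Cilium agents
  relay:
    enabled: true      # Hubble Relay aggregates flows for cluster-wide queries
  ui:
    enabled: true      # optional: enables the Hubble web UI
```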