251 Commits

Author SHA1 Message Date
Andrei Kvapil
14a9017932 fix(migration): suspend cozy-proxy if it conflicts with installer release
In v0.41.x, cozy-proxy HelmRelease was configured with
releaseName: cozystack, which collides with the installer helm release.
If not suspended before upgrade, the cozy-proxy HR reconciles and
overwrites the installer release, deleting cozystack-operator.

Add a check in the migration script that detects this conflict and
suspends the cozy-proxy HelmRelease before proceeding.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-03-02 12:59:36 +01:00
Andrei Kvapil
c83e41ea14 fix(installer): add keep annotation to Namespace and update migration script
Add helm.sh/resource-policy=keep annotation to the cozy-system Namespace
in the installer helm chart. This prevents Helm from deleting the
namespace when the HelmRelease is removed, which would otherwise destroy
all other HelmReleases within it.

Update the migration script to annotate the cozy-system namespace and
cozystack-version ConfigMap with helm.sh/resource-policy=keep before
generating the Package resource.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-28 11:46:09 +01:00
Andrei Kvapil
daa3905b67 [ci] Debug improvements (#2111)
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesistate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->

## What this PR does


### Release note

<!--  Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->

```release-note
[ci] Added more debug information to ci tests
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Enhanced error handling and diagnostic output in development testing
infrastructure.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-27 18:24:07 +01:00
Andrei Kvapil
022ddf73a8 [apps] Add OpenBAO as a managed secrets management service (#2059)
## What this PR does

Adds OpenBAO (open-source Vault fork) as a new managed PaaS application
in Cozystack.

**Structure follows existing app patterns (qdrant, nats):**
- System chart with vendored upstream `openbao/openbao` (chart v0.25.3,
appVersion v2.5.0)
- App chart with standalone/HA mode switching based on replicas count
- TLS via cert-manager self-signed certificates per instance
- ApplicationDefinition, PackageSource, PaaS bundle entry
- E2E test with init/unseal workflow

**Key design decisions:**
- `replicas: 1` → standalone mode with file storage; `replicas > 1` → HA
with Raft integrated storage and retry_join with TLS peer verification
- TLS enabled by default — each instance gets a self-signed Certificate
with DNS SANs covering services and pod addresses
- `disable_mlock = true` in HCL config since default security context
drops IPC_LOCK capability
- Injector and CSI provider disabled (cluster-scoped components, not
safe per-tenant)
- No auto-init/unseal — OpenBAO requires manual initialization by design
- E2E test performs full lifecycle: deploy, wait for certificate + API,
init, unseal, verify readiness, cleanup

### Release note

```release-note
[apps] Add OpenBAO as a managed secrets management service with standalone and HA Raft modes, TLS enabled by default
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added OpenBAO managed secrets management service with
high-availability and standalone deployment options
  * Integrated monitoring and dashboards for operational visibility
  * Enabled configurable external access and web UI
  * Added automated snapshot backup capability

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-27 11:11:59 +01:00
Myasnikov Daniil
3bf43312aa (ci) Removed cozytest output trimming in non-tty run
Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
2026-02-27 12:43:00 +05:00
Myasnikov Daniil
fd6d0c3603 (ci) Added extra debug commands for k8s startup
Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
2026-02-27 12:41:40 +05:00
Andrei Kvapil
948346ef6d fix(platform): use original cozystack.io/ui label in migration 26 and simplify migration script
Migration 26 was using apps.cozystack.io/application.kind=Monitoring label
which is added by migration 22 and may not be present on v0.41.1 clusters.
Switch to cozystack.io/ui=true (guaranteed on old HRs) with field-selector
for exact name match.

Also remove redundant bundle enabled flags from migrate-to-version-1.0.sh
since the variant already determines them via its values file.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-24 23:49:43 +01:00
Andrei Kvapil
da597225d1 fix(platform): add missing field mappings in migrate-to-version-1.0.sh
Add ConfigMap fields that were not converted to Package values:
- bundle-disable → bundles.disabledPackages
- bundle-enable → bundles.enabledPackages
- expose-ingress → publishing.ingressName
- expose-services → publishing.exposedServices

Remove incorrect bundles.system.type field that is not part of the
Package values schema.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-24 23:34:53 +01:00
Andrei Kvapil
02064888a4 feat(platform): make cluster issuer name and ACME solver configurable (#2077)
<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesistate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->

## What this PR does

Previously `_cluster.clusterissuer` controlled the ACME solver type
using
  values `http01` / `cloudflare`, and every ingress template hardcoded 
`cert-manager.io/cluster-issuer: letsencrypt-prod` with no way to
override it.

  This PR adds new parameters in platform chart:
  - `publishing.certificates.solver` (default `http01`)
  - `publishing.certificates.issuerName` (default: `letsencrypt-prod`)
  instead of single parameter before
  - `publishing.certificates.issuerType`
  
Previous `certificates.issuerType` was renamed to `certificates.solver`;
Also its possible value
`cloudflare` was renamed to `dns01` to use standard ACME terminology.

New `certificates.issuerName` (default: `letsencrypt-prod`) — propagated
as
`_cluster.issuer-name` to all packages via `cozystack-values` then its
value appears in
`cert-manager.io/cluster-issuer` annotation across 8 templates of
ingresses in system applications.

`publishing.certificates.solver` can be set empty to clearly support
`selfsigned-cluster-issuer`,
   or have any value, but it can be a bit confusing.

  Operators can now point ingresses at any ClusterIssuer (custom ACME,
  self-signed, internal CA) by setting `certificates.issuerName` without
  touching individual package templates.

  ## Breaking changes

  | What changed | Before | After |
  |---|---|---|
  | Solver key | `certificates.issuerType` | `certificates.solver` |
| Cloudflare solver value | `issuerType: cloudflare` | `solver: dns01` |

  This changes handled by migration when upgrading cozystack from v1 
  or by `migration-to-v1.0.sh` script (also checked by migration later)
  No actions from user needed.

### Release note

<!--  Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->

```release-note
[platform] Added publishing.certificates.solver (http01/dns01) and 
  publishing.certificates.issuerName fields to allow configuring ACME challenge 
  type and ClusterIssuer per installation, replacing the old implicit issuerType field
[platform] Migration script and upgrade hook (migration 32) convert old
  clusterissuer/issuerType fields to the new solver/issuerName fields
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Migrated certificate issuer configuration from legacy `issuerType`
field to new `solver` and `issuerName` fields system-wide.
* Automated migration script converts existing configurations, mapping
legacy values (cloudflare, http01) to new format.
* Updated all certificate-related templates to use new configurable
solver and issuer settings with sensible defaults.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-20 23:09:12 +01:00
Andrei Kvapil
d856775961 feat(kubernetes): update supported versions to v1.30-v1.35 (#2073)
## What this PR does

Updates Kubernetes version support to match current release landscape
and Talos 1.12 compatibility:

- Update Kamaji from `edge-25.4.1` to `edge-26.2.4` (adds K8s 1.35
support)
- Update Kubernetes version matrix: v1.30, v1.31, v1.32, v1.33, v1.34,
v1.35
- Drop EOL versions v1.28 and v1.29
- Remove merged-upstream patch (992.diff — label preservation fix)
- Regenerate disable-datastore-check.diff for new Kamaji version

Changes:

- Default Kubernetes version is now v1.35
- E2E tests will validate v1.35 (latest) and v1.34 (previous)
- Patch versions updated to latest available (v1.35.0, v1.34.4, v1.33.8,
v1.32.12, v1.31.14, v1.30.14)

### Release note

```release-note
[kubernetes] Update supported Kubernetes versions to v1.30-v1.35
```


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a Kamaji CRDs Helm chart with DataStore and KubeconfigGenerator
resources, plus deployment templates and configurable
kubeconfigGenerator settings
* DataStore now supports multiple backends (etcd, MySQL, PostgreSQL,
NATS) with TLS/auth validations and status tracking (observedGeneration)

* **Chores**
* Bumped default Kubernetes version from v1.33 to v1.35 (added v1.34;
removed v1.28–v1.29)
* Updated charts, packaging metadata, README/docs and helm
ignore/Makefile entries; updated builder base image and chart
dependencies
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-20 20:12:56 +01:00
Myasnikov Daniil
c98b6203a7 fix(platform): fix migrate script to account clusterissuer parameter
Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
2026-02-20 16:41:41 +05:00
Aleksei Sviridkin
8f1e52690d test(e2e): fix kubernetes-previous retry failures
- Kill stale port-forward processes before starting a new one;
  on retries, the previous attempt's port-forward still holds the
  port, causing all kubectl commands to get "connection refused"
- Use -ge 2 instead of -eq 2 for node count check; MachineHealthCheck
  may create a 3rd VM, leading to 3 nodes joining the tenant cluster
  which would never satisfy the exact equality check
- Increase node join timeout from 5m to 8m; QEMU VMs with v1.34 need
  more time to boot and join when running after kubernetes-latest

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-20 03:23:50 +03:00
Aleksei Sviridkin
00ab6e792c test(e2e): increase worker node join timeout to 5 minutes
When running kubernetes-latest and kubernetes-previous E2E tests
simultaneously, worker VMs compete for resources in the sandbox
environment. 3 minutes was insufficient for nodes to boot and
join the tenant cluster under load. Increase to 5 minutes.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-20 01:30:10 +03:00
Aleksei Sviridkin
4e5455c72c fix(e2e): poll for CRD existence before waiting for Established condition
kubectl wait fails immediately with NotFound if the CRD does not exist
yet. The operator creates CRDs asynchronously on startup, so wrap the
wait in a retry loop that tolerates the initial absence.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-19 19:24:19 +03:00
Aleksei Sviridkin
dbfdbc8298 fix(installer): check parsePlatformSourceURL error, wait for PackageSource in E2E
Explicitly check error from parsePlatformSourceURL instead of relying on
the implicit guarantee that installPlatformSourceResource already checked
it. This prevents latent bugs if startup order is ever restructured.

Add wait for platform PackageSource existence in E2E test before creating
Package resource, preventing flaky failures when operator startup is slow.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-19 17:57:45 +03:00
Aleksei Sviridkin
8450830f06 fix(installer): add CRD wait in E2E, unit tests for PackageSource creation
Add explicit CRD wait (kubectl wait crd --for=condition=Established) in
E2E test before creating Package resources, preventing race condition
between operator CRD installation and resource creation.

Add unit tests for installPlatformPackageSource covering create, update,
and GitRepository sourceRef kind scenarios.

Document that hardcoded variant list is an intentional design choice
matching the previous Helm template behavior.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-19 17:50:09 +03:00
Aleksei Sviridkin
655133b81c fix(installer): move PackageSource creation from Helm template to operator
Replace the Helm hook approach with programmatic PackageSource creation
in the operator startup sequence. Helm hooks are unsuitable for persistent
resources like PackageSource because before-hook-creation policy causes
cascade deletion of owned ArtifactGenerators during upgrades.

The operator now creates the platform PackageSource after installing CRDs
and the Flux source resource, using the same create-or-update pattern as
installPlatformSourceResource(). The sourceRef.kind is derived from the
platform source URL (OCIRepository for oci://, GitRepository for git).

Also fix stale comment in e2e test referencing deleted crds/ directory.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-19 17:38:23 +03:00
Aleksei Sviridkin
668ddc552e refactor(installer): remove CRDs from Helm chart, rely on operator --install-crds
Remove the crds/ directory from the cozy-installer Helm chart. The operator
already installs embedded CRDs via server-side apply on every startup with
the --install-crds=true flag, making the Helm crds/ directory redundant.

Convert templates/packagesource.yaml to a Helm post-install/post-upgrade
hook so it is applied after the operator has started and installed CRDs.

Update codegen to write CRDs only to internal/crdinstall/manifests/ (single
source of truth) and update the Makefile to source build assets from there.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-19 17:23:37 +03:00
Andrei Kvapil
7ff5b2ba23 [harbor] Add managed Harbor container registry (#2058)
## What this PR does

Adds Harbor v2.14.2 as a managed tenant-level container registry service
in the PaaS bundle.

**Architecture:**

- Wrapper chart (`apps/harbor`) — HelmRelease, Ingress,
WorkloadMonitors, BucketClaim, dashboard RBAC
- Vendored upstream chart (`system/harbor`) from helm.goharbor.io
v1.18.2
- System chart (`system/harbor`) provisions PostgreSQL via CloudNativePG
and Redis via redis-operator
- ApplicationDefinition (`system/harbor-rd`) for dynamic `Harbor` CRD
registration
- PackageSource and paas.yaml bundle entry for platform integration

**Key design decisions:**

- Database and Redis provisioned via CPNG and redis-operator (not
internal Helm-based instances) for reliable day-2 operations
- Registry image storage uses S3 via COSI BucketClaim/BucketAccess from
namespace SeaweedFS
- Trivy vulnerability scanner cache uses PVC (S3 not supported by
vendored chart)
- Token CA key/cert persisted across upgrades via Secret lookup
- Per-component resource configuration (core, registry, jobservice,
trivy)
- Ingress with TLS via cert-manager, cloudflare issuer type handling,
proxy timeouts for large image pushes
- Auto-generated admin credentials persisted across upgrades

**E2E test:** Creates Harbor instance, verifies HelmRelease readiness,
deployment availability, credentials secret, service port, then cleans
up.

### Release note

```release-note
[harbor] Add managed Harbor container registry as a tenant-level service
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added Harbor container registry deployment with integrated Kubernetes
support, including database and cache layers.
  * Enabled metrics monitoring via Prometheus integration.
  * Configured dashboard management interface for Harbor administration.

* **Tests**
  * Added end-to-end testing for Harbor deployment and verification.

* **Chores**
  * Integrated Harbor into the platform's application package bundle.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-18 13:54:26 +01:00
Aleksei Sviridkin
87d0390256 fix(harbor): include tenant domain in default hostname and add E2E cleanup
Use tenant base domain in default hostname construction (harbor.RELEASE.DOMAIN)
to match the pattern used by other apps (kubernetes, vpn). Remove unused $ingress
variable from harbor.yaml. Add cleanup of stale resources from previous failed
E2E runs.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 02:07:38 +03:00
Andrei Kvapil
3b267d6882 refactor(e2e): use helm install instead of kubectl apply for cozystack installation (#2060)
## Summary

- Replace pre-rendered static YAML application (`kubectl apply`) with
direct `helm upgrade --install` of the `packages/core/installer` chart
in E2E tests
- Remove CRD/operator artifact upload/download from CI workflow — the
chart with correct values is already present in the sandbox via
workspace copy and `pr.patch`
- Remove `copy-installer-manifest` Makefile target and its dependencies

## Test plan

- [ ] CI build job completes without uploading CRD/operator artifacts
- [ ] E2E `install-cozystack` step succeeds with `helm upgrade
--install`
- [ ] All existing E2E app tests pass

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* PR workflows now only keep the primary disk asset; publishing/fetching
of auxiliary operator and CRD artifacts removed.
* CRD manifests are produced by concatenation and a verify-crds check
was added to unit tests; file-write permissions for embedded manifests
tightened.

* **New Features**
* Operator can install CRDs at startup to ensure resources exist before
reconcile.
  * E2E install now uses the chart-based installer flow.

* **Tests**
* Added comprehensive tests for CRD-install handling and manifest
writing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-17 23:31:08 +01:00
Aleksei Sviridkin
e7ffc21743 feat(harbor): switch registry storage to S3 via COSI BucketClaim
Replace PVC-based registry storage with S3 via COSI BucketClaim/BucketAccess.
The system chart parses BucketInfo secret and creates a registry-s3 Secret
with REGISTRY_STORAGE_S3_* env vars that override Harbor's ConfigMap values.

- Add bucket-secret.yaml to system chart (BucketInfo parser)
- Remove storageType/size from registry config (S3 is now the only option)
- Use Harbor's existingSecret support for S3 credentials injection
- Add objectstorage-controller to PackageSource dependencies
- Update E2E test with COSI bucket provisioning waits and diagnostics

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 01:23:02 +03:00
kklinch0
7ac989923d Add monitoring for NATs
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: kklinch0 <kklinch0@gmail.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-17 22:54:12 +01:00
Aleksei Sviridkin
0f2ba5aba2 fix(harbor): add diagnostic output on E2E system HelmRelease timeout
Dump HelmRelease status, pods, events, and ExternalArtifact info
when harbor-test-system fails to become ready, to diagnose the
root cause of the persistent timeout.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:12 +03:00
Aleksei Sviridkin
490faaf292 fix(harbor): add operator dependencies, fix persistence rendering, increase E2E timeout
Add postgres-operator and redis-operator to PackageSource dependsOn
to ensure CRDs are available before Harbor system chart deploys.

Make persistentVolumeClaim conditional to avoid empty YAML mapping
when using S3 storage without Trivy.

Increase E2E system HelmRelease timeout from 300s to 600s to account
for CPNG + Redis + Harbor bootstrap time on QEMU.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:12 +03:00
Aleksei Sviridkin
cea57f62c8 [harbor] Make registry storage configurable: S3 or PVC
Add registry.storageType parameter (pvc/s3) to let users choose
between PVC storage and S3 via COSI BucketClaim. Default is pvc,
which works without SeaweedFS in the tenant namespace.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:12 +03:00
Aleksei Sviridkin
c815725bcf [harbor] Fix E2E test: use correct HelmRelease name with prefix
ApplicationDefinition has prefix "harbor-", so CR name "harbor" produces
HelmRelease "harbor-harbor". Use name="test" and release="harbor-test"
to correctly reference all resources.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:12 +03:00
Aleksei Sviridkin
0c85639fed [harbor] Move to apps/, use S3 via BucketClaim for registry storage
Move Harbor from packages/extra/ to packages/apps/ as it is a
self-sufficient end-user application, not a singleton tenant module.
Update bundle entry from system to paas accordingly.

Replace registry PVC storage with S3 via COSI BucketClaim/BucketAccess,
provisioned from the namespace's SeaweedFS instance. S3 credentials are
injected into the HelmRelease via valuesFrom with targetPath.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:12 +03:00
Aleksei Sviridkin
2dd3c03279 [harbor] Use CPNG and redis-operator instead of internal databases
Replace Harbor's internal PostgreSQL with CloudNativePG operator and
internal Redis with redis-operator (RedisFailover), following established
Cozystack patterns from seaweedfs and redis apps.

Additional fixes from code review:
- Fix registry resources nesting level (registry.registry/controller)
- Persist token CA across upgrades to prevent JWT invalidation
- Update values schema and ApplicationDefinition

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:11 +03:00
Aleksei Sviridkin
543ce6e5fd [harbor] Add managed Harbor container registry application
Add Harbor v2.14.2 as a tenant-level managed service with per-component
resource configuration, ingress with TLS termination, and internal
PostgreSQL/Redis.

Includes:
- extra/harbor wrapper chart with HelmRelease, WorkloadMonitors, Ingress
- system/harbor with vendored upstream chart (helm.goharbor.io v1.18.2)
- harbor-rd ApplicationDefinition for dynamic CRD registration
- PackageSource and system.yaml bundle entry
- E2E test with Secret and Service verification

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:50:11 +03:00
Aleksei Sviridkin
1558fb428a build(codegen): sync CRDs to operator embed directory
After generating CRDs to packages/core/installer/crds/, copy them to
internal/crdinstall/manifests/ so the operator binary embeds the latest
CRD definitions.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:49:56 +03:00
Aleksei Sviridkin
55cd8fc0e1 refactor(installer): move CRDs to crds/ directory for proper Helm install ordering
Helm installs crds/ contents before processing templates, resolving the
chicken-and-egg problem where PackageSource CR validation fails because
its CRD hasn't been registered yet.

- Move definitions/ to crds/ in the installer chart
- Remove templates/crds.yaml (Helm auto-installs from crds/)
- Update codegen script to write CRDs to crds/
- Replace helm template with cat for static CRD manifest generation
- Remove pre-apply CRD workaround from e2e test

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:49:55 +03:00
Aleksei Sviridkin
58dfc97201 fix(e2e): apply CRDs before helm install to resolve dependency ordering
Helm cannot validate PackageSource CR during install because the CRD
is part of the same chart. Pre-apply CRDs via helm template + kubectl
apply --server-side before running helm upgrade --install.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:49:55 +03:00
Aleksei Sviridkin
153d2c48ae refactor(e2e): use helm install instead of kubectl apply for cozystack installation
Replace pre-rendered static YAML application with direct helm chart
installation in e2e tests. The chart directory with correct values is
already present in the sandbox after pr.patch application.

- Remove CRD/operator artifact upload/download from CI workflow
- Remove copy-installer-manifest target from testing Makefile
- Use helm upgrade --install from local chart in e2e-install-cozystack.bats

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-18 00:49:55 +03:00
Aleksei Sviridkin
dd4723386f test(openbao): add E2E test for standalone mode
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-17 21:23:27 +03:00
Andrei Kvapil
6c431d0857 fix(codegen): add gen_client to update-codegen.sh and regenerate applyconfiguration (#2061)
## What this PR does

Fix build error in `pkg/generated/applyconfiguration/utils.go` caused by
a reference to `testing.TypeConverter` which was removed in client-go
v0.34.1.

The root cause was that `hack/update-codegen.sh` called `gen_helpers`
and
`gen_openapi` but never called `gen_client`, so the applyconfiguration
code
was never regenerated after the client-go upgrade.

Changes:
- Fix `THIS_PKG` from `k8s.io/sample-apiserver` template leftover to
correct module path
- Add `kube::codegen::gen_client` call with `--with-applyconfig` flag
- Regenerate applyconfiguration (now uses `managedfields.TypeConverter`)
- Add tests for `ForKind` and `NewTypeConverter` functions

### Release note

```release-note
[maintenance] Regenerate applyconfiguration code for client-go v0.34.1 compatibility
```


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Updated backup class definitions example to reference MariaDB instead
of MySQL.

* **Chores**
* Updated code generation tooling and module dependencies to support
enhanced functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-17 18:21:39 +01:00
Aleksei Sviridkin
a52da8dd8d style(e2e): consistently quote kubeconfig variable references
Quote all tenantkubeconfig-${test_name} references in run-kubernetes.sh
for consistent shell scripting style. The only exception is line 195
inside a sh -ec "..." double-quoted string where inner quotes would
break the outer quoting.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-17 02:00:44 +03:00
Aleksei Sviridkin
315e5dc0bd fix(e2e): make kubernetes test retries effective by cleaning up stale resources
When the kubernetes E2E test fails at the deployment wait step, set -eu
causes immediate exit before cleanup. On retry, kubectl apply outputs
"unchanged" for the stuck deployment, making retries 2 and 3 guaranteed
to fail against the same stuck pod.

Add pre-creation cleanup of backend deployment/service and NFS test
resources using --ignore-not-found, so retries start fresh. Also
increase the deployment wait timeout from 90s to 300s to handle CI
resource pressure, aligning with other timeouts in the same function.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-17 01:58:13 +03:00
Aleksei Sviridkin
75e25fa977 fix(codegen): add gen_client to update-codegen.sh and regenerate applyconfiguration
The applyconfiguration code referenced testing.TypeConverter from
k8s.io/client-go/testing, which was removed in client-go v0.34.1.

Root cause: hack/update-codegen.sh called gen_helpers and gen_openapi
but not gen_client, so applyconfiguration was never regenerated after
the client-go upgrade.

Changes:
- Fix THIS_PKG from sample-apiserver template leftover to correct
  module path
- Add kube::codegen::gen_client call with --with-applyconfig flag
- Regenerate applyconfiguration (now uses managedfields.TypeConverter)
- Add tests for ForKind and NewTypeConverter functions

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-16 23:01:38 +03:00
Andrei Kvapil
2bc5e01fda fix kubernetes e2e test for rwx volume
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-14 11:02:12 +01:00
Andrei Kvapil
dbba5c325b fix kubernetes e2e test
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-14 10:13:09 +01:00
Andrei Kvapil
13aa341a28 fix(platform): address review comments in vm migration script
Replace `|| echo ""` with `|| true` to avoid newline bugs in variable
assignments. Switch `for x in $(cmd)` loops to `while read` for safer
iteration over kubectl output.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-14 03:02:54 +01:00
Andrei Kvapil
168f6f2445 fix(csi): address review feedback for kubevirt-csi-driver RWX support
- Move nil check before req dereference in CreateVolume
- Scope CiliumNetworkPolicy endpointSelector to specific VMI
- Use vmNamespace from NodeId for VMI lookup instead of infraNamespace
- Log PVC lookup errors in ControllerExpandVolume
- Wrap CNP ownerReference updates in retry.RetryOnConflict
- Fix infraClusterLabels validation to check runControllerService flag
- Dereference nodeName pointer in error message
- Replace panic with klog.Fatal for consistent error handling
- Honor CSI readonly flag in NFS NodePublishVolume
- Log mount list errors in isNFSMount
- Reorder Dockerfile ENTRYPOINT after COPY for better layer caching
- Add cleanup on e2e test failure and --wait on pod deletion

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-13 09:04:33 +01:00
Andrei Kvapil
46103400f2 test(e2e): adapt kubernetes NFS test for native RWX CSI support
Remove separate NFS Application dependency from e2e test. The kubevirt
CSI driver wrapper now handles RWX Filesystem volumes natively - PVCs
with ReadWriteMany accessMode use the standard kubevirt StorageClass.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-13 08:57:14 +01:00
Andrei Kvapil
9a86551e40 fix(e2e): correct s3Bucket reference in mariadb test
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-12 15:18:06 +01:00
Andrei Kvapil
bce5300116 refactor: rename mysql application to mariadb
The mysql chart actually deploys MariaDB via mariadb-operator, but was
incorrectly named "mysql". Rename all references to use the correct
"mariadb" name across the codebase.

Changes:
- Rename packages/apps/mysql -> packages/apps/mariadb
- Rename packages/system/mysql-rd -> packages/system/mariadb-rd
- Rename platform source and bundle references
- Update CRD kind from MySQL to MariaDB
- Update RBAC, e2e tests, backup controller tests
- Keep real MySQL CLI/config tool names unchanged (mysqldump, [mysqld], etc.)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-12 15:15:20 +01:00
Andrei Kvapil
0260b15aaf refactor(apps): remove FerretDB application (#2028)
## What this PR does

Remove the FerretDB managed application from Cozystack. This includes
the application Helm chart, resource definition, platform source, PaaS
bundle entry, RBAC clusterrole entry, and e2e test. Historical migration
scripts are left intact for upgrade compatibility.

### Release note

```release-note
[ferretdb] Removed FerretDB managed application
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Removed FerretDB managed database service and associated Helm chart,
documentation, and test components from the platform.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-12 09:30:01 +01:00
Andrei Kvapil
3971e9cb39 [installer] Rename talos asset to cozystack-operator-talos.yaml
Add -talos suffix to the default variant output file for consistency
with -generic and -hosted variants. Update all references in CI
workflows, e2e tests, upload scripts, and testing Makefile.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2026-02-11 22:05:37 +01:00
Timofei Larkin
5f27152d18 [ci] Cozyreport improvements (#2032)
## What this PR does

Previously the debug log collection script that fired when CI failed
treated Packages and PackageSources as namespaced resources and as a
result of incorrect parsing failed to correctly kubectl describe and
kubectl get -oyaml them. Additionally, the script did not read the logs
of init containers. These issues are fixed with this patch.

### Release note

```release-note
[ci] Improvements to cozyreport.sh (ci log collection script): fix
retrieval of Package and PackageSource details, consider initContainers
as well as containers, when fetching logs of errored pods.
```
2026-02-11 20:40:07 +04:00
Aleksei Sviridkin
2673624261 fix(e2e): apply increased timeout only to ingress-nginx
Keep the 1-minute timeout for other components (cilium, coredns, csi,
vsnap-crd) to preserve fast failure detection, and apply the 5-minute
timeout specifically to ingress-nginx which needs it after the
hostNetwork to NodePort migration.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
2026-02-11 17:39:02 +03:00