mvp1-caching-research/caching.md

2025-02-26 16:58:09 +00:00
## Caching in MVP1
### Requirements
- We want to cache artifacts from upstream repositories in order to
  - Avoid rate limiting (Docker Hub)
  - Improve download speed
  - Improve availability
- We want to cache container images
  - Docker Hub
  - GCR
  - Quay
- We want to cache common software dependency artifacts of various programming languages
  - Maven/Ivy (Java)
  - Go
  - NPM
  - Rust
  - PyPI
- Must be easily configurable / manageable
  - Static config
  - API config (REST)
- Must store artifacts permanently
  - Resetting the cache (deleting everything) should be easy, though
- Currently out of scope
  - Auth: Cache provides data to anyone who can reach it
- Nice to have
  - Repo Cache: Can store uploaded artifacts
### Architectural Solutions
#### File System-Based Caching
- Re-using artifacts stored on the local file system
  - e.g. backup and restore the `node_modules` directory
- Setup within pipelines
  - Important: proper cache key selection
- Performance depends on the cache's storage location
  - On node: fast but localized to the node
  - Network storage: still has to download the cache archive
- Pro: Artifacts are downloaded directly from upstream, no further config needed
- Con: Does not address rate limiting concerns for the initial cache warm-up
- Pro: No extra config needed in tooling apart from the pipeline cache config
  - Has to be stored somewhere?
    - GitHub Actions / GitLab typically manage this
  - Similar to a local dev env
- Con: State management
  - Update the cache if new dependencies are used/requested
  - Dirty state (looking at you, Maven)
  - Impure behaviour possible, creating side effects
  - Integrity checks of package managers might be bypassed at this point
- Con: Duplicate content
- Con: Invalidation needed at some point
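
To illustrate what "proper cache key selection" can mean, here is a minimal sketch that derives a cache key from the contents of all dependency definition files, so the key changes whenever a dependency file changes. The function name `cache_key`, the prefix, and the file patterns are assumptions made for this example; real pipeline caches usually provide their own hashing helper.

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, root: Path, patterns: list[str]) -> str:
    """Derive a cache key from all files under `root` matching `patterns`.

    Hypothetical helper: any change to a matching file (or to its path)
    yields a different key, which forces a fresh cache entry.
    """
    digest = hashlib.sha256()
    # Collect matches recursively and sort them for a deterministic key.
    files = sorted(
        {p for pattern in patterns for p in root.rglob(pattern) if p.is_file()}
    )
    for path in files:
        # Include the relative path so that moving a file also changes the key.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

For a Maven project this could be called as `cache_key("maven", Path("."), ["pom.xml"])`. Keying on every build file trades some cache hits for correctness: a stale cache is never restored after a dependency change.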
#### Pull-Through Cache
- Mirror/proxy repo for an upstream repo
- Downloads artifacts transparently from upstream when requested
- Downloaded artifacts are stored locally in the mirror for faster access
- Pro: Can be re-used in pipelines, on dev machines, and in cloud/prod environments
- Pro: Little state management necessary, if any
- Con: Requires extra config in tooling, build tools, `containerd`, etc.
- Using only the pull-through cache should be fast enough for builds in CI
  - Reproducible builds ftw
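
The pull-through behaviour described above can be sketched in a few lines. This is only an illustration of the concept, not how any of the candidates below implement it; the class `PullThroughCache` and the `fetch_upstream` callable are hypothetical names introduced for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Hypothetical minimal pull-through cache backed by the file system."""

    def __init__(self, storage_dir: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.storage_dir = storage_dir
        # Assumption: a callable that downloads an artifact from upstream.
        self.fetch_upstream = fetch_upstream

    def _local_path(self, artifact: str) -> Path:
        # Hash the artifact name so arbitrary names map to safe file names.
        name = hashlib.sha256(artifact.encode()).hexdigest()
        return self.storage_dir / name

    def get(self, artifact: str) -> bytes:
        path = self._local_path(artifact)
        if path.exists():
            # Cache hit: serve from local storage, upstream is not contacted.
            return path.read_bytes()
        # Cache miss: fetch transparently from upstream and store permanently.
        data = self.fetch_upstream(artifact)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data
```

Clients only ever talk to the cache, so upstream is contacted once per artifact. That is what addresses rate limiting even for the initial warm-up across many pipelines.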
### Solution Candidates
#### Forgejo Runner Cache
- Common actions like `setup-java` do a good job, as they make the cache key depend on all build config files (e.g. all `pom.xml` files)
  - The cache is invalidated on any change to dependencies etc.
#### Nexus
[Nexus OSS GH](https://github.com/sonatype/nexus-public)
- Open source / free version
  - EPL license allows commercial distribution
  - OSS version only supports an extremely limited set of repository types
    - Basically only Maven support
    - Does not suffice for our use case
- Community Edition has more features but is limited in sizing. An upgrade to the Pro edition is necessary if those limits are exceeded.
#### Artifactory
- Open source / free version
  - Limited feature set
  - Separate distributions per repo type (Java / container / etc.)
  - Inconvenient and insufficient for our use case
- Full feature set requires a paid license

License evaluation needed
[EULA](https://jfrog.com/artifactory/eula/)
#### Artipie
[GH](https://github.com/artipie/artipie)
[Wiki](https://github.com/artipie/artipie/wiki)
- Self-hosted and upstream artifact caching
- MIT License
- Might be abandoned / low dev activity / needs a new maintainer
  - However, technically it looks extremely promising
- Initial setup does not run correctly out of the box, needs some love
- Mostly headless
  - Brings a limited web interface
    - Repo creation, artifact viewing
- Buggy default config
  - Config changes require a restart, which seems to be a bug?
- Easy to set up, once bugs and buggy config are mitigated/worked around
- File system and object storage supported
- No databases required
- Pro: Config in YAML file
- Due to its simplicity it might be a good candidate for a first upstream caching solution
#### Pulp
[Website](https://pulpproject.org/)
[GH](https://github.com/pulp/pulpcore)
- Self-hosted and upstream artifact caching
- GPL 2.0 License
- Pull-through caches are only technical previews and might not work correctly
  - Pull-through caching does not fit into the concept of how artifacts are stored and tracked
  - Intended workflow is to sync dedicated artifacts with some upstream repo, not the entire repo
- Setup and config are quite complex
- Built for high availability
- File system and object storage supported
- Requires an SQL DB (Postgres) and possibly Redis
#### kube-image-keeper
[GH](https://github.com/enix/kube-image-keeper)
- Creates a DaemonSet, installing a service on each worker node
- Works within the cluster and rewrites image coordinates on the fly
- Pro: Fine-grained caching control
  - Select/exempt images / namespaces
  - Cache invalidation
- Pro: Config within k8s or as k8s objects
- Con: Invasive
- Con: Rewrites image coordinates using a mutating webhook
- Con: Must be hosted within each (workload) cluster
- Con (BLOCKER): Cannot handle image digests due to manifest rewrites
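
To make "rewriting image coordinates" more concrete, the following sketch shows what such a rewrite could look like in principle. It is not kube-image-keeper's actual scheme: the function names and the cache host `cache.example.internal:5000` are assumptions for illustration. Only the expansion of Docker Hub shorthand follows the usual Docker reference conventions.

```python
def normalize_image(image: str) -> str:
    """Expand Docker-style shorthand to a fully qualified image reference."""
    first, sep, _rest = image.partition("/")
    # The first path component is a registry host if it looks like a host name.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return image
    if not sep:
        # Single-name images such as "nginx" live under "library/" on Docker Hub.
        return f"docker.io/library/{image}"
    return f"docker.io/{image}"


def rewrite_image(image: str, cache_host: str) -> str:
    """Hypothetical rewrite: prefix the qualified reference with the cache host."""
    return f"{cache_host}/{normalize_image(image)}"


if __name__ == "__main__":
    for ref in ("nginx:1.27", "bitnami/redis:7", "quay.io/prometheus/node-exporter:v1.8.0"):
        print(ref, "->", rewrite_image(ref, "cache.example.internal:5000"))
```

A mutating webhook would apply a rewrite like this to every container spec before the pod is scheduled. This is why the approach is listed as invasive above.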
#### 'Simple' Squid proxy (or similar)
- Caching of arbitrary resources via HTTP
- "Stupid" caching
- Invalidation becomes a problem rather quickly
#### Harbor
[Website](https://goharbor.io/)
[GH](https://github.com/goharbor/harbor)
- Apache 2.0 License
- The go-to container registry
- Allows self-hosting artifacts and caching upstream ones
- Pro: Image Signing
- Pro: Multi Tenant
- Pro: Quotas
- Pro: Vulnerability Scans
- Pro: SBOM creation
- Pro: P2P distribution of artifacts
- Pro: fully fledged web interface
- Con: Only Container / OCI related artifacts
### Recommendation
- File system cache
  - Easy solution as it is offered within most pipelines
  - Reduces build times significantly if dependencies have to be downloaded from outside networks
  - Avoid using the fs cache, i.e. the Forgejo Runner cache, long term or at all
    - Unless you can handle proper cache invalidation
    - Avoiding it promotes immutable infra and reproducible builds without side effects
  - Use as an additional layer if there is no local cache repo
- Repo caches
  - Can replace the file system cache if network and repo are fast enough
  - Optimal solution would be a Nexus/Artifactory-like unified solution
  - FOSS solutions like Artipie and Pulp have severe problems
    - Requires us to add features/fixes/maintenance
  - Due to the scarce landscape of proper FOSS solutions we might have to opt for multiple dedicated solutions
    - If we opt for a dedicated container cache, we should re-evaluate Harbor or Quay
  - Try to use Artipie as a first, simple solution and use Forgejo Runner caches in conjunction for even better performance
    - If Artipie does not work correctly or does not fit for some reason, we will not have wasted too much time on it
    - If Artipie is abandoned but the concept works for us, we should consider maintaining it and continuing its development