Caching in MVP1
Requirements
-
We want to cache artifacts from upstream repositories in order to
- Avoid rate limiting (Docker Hub)
- Improve on download speed
- Improve on availability
-
We want to cache container images
- Docker Hub
- GCR
- Quay
-
We want to cache common software dependency artifacts of various programming languages
- Maven/Ivy Java
- Go
- NPM
- Rust
- PyPI
-
Must be easily configurable / manageable
- Static config
- API config (REST)
-
Must store artifacts permanently
- Resetting the cache (delete everything) should be easy, though
-
Currently out of scope
- Auth: Cache provides data to anyone who can reach it
Architectural Solutions
File System-Based Caching
-
Re-using artifacts stored on the local file system
- e.g. backup and restore the node_modules directory
- Setup within pipelines
-
Important: proper cache key selection
-
Performance depends on the cache's storage location
- on node: fast but localized to node
- network storage: still has to download cache archive
-
Pro: Artifacts are downloaded directly from upstream, no further config needed
- Con: Does not address rate limiting concerns for initial cache warm up
-
Pro: No extra config needed in tooling apart from the pipeline cache config
-
Has to be stored somewhere?
- GitHub Actions / GitLab typically manage this
- similar to local dev env
-
Con: State management
- Update the cache if new dependencies are used/requested
- Dirty state (looking at you, maven)
- Impure behaviour possible, creating side effects
- Integrity checks of package managers might be bypassed at this point
-
Con: Duplicate content
-
Con: Invalidation needed at some point
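A minimal sketch of what "proper cache key selection" could look like, assuming the key is derived from the content of every dependency manifest in the repository. The function name cache_key, the pom.xml default pattern and the deps prefix are illustrative assumptions, not part of any pipeline's API.

```python
import hashlib
from pathlib import Path


def cache_key(root: Path, pattern: str = "pom.xml", prefix: str = "deps") -> str:
    """Derive a cache key from every file under root whose name matches pattern."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on file system traversal order.
    for path in sorted(root.rglob(pattern)):
        # Include the relative path so moving a file also changes the key.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    # A shortened hex digest keeps the key readable in pipeline logs.
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

Any change to a matching file yields a new key, so the stale cache archive is no longer restored. Dirty state that accumulates under an unchanged key is not detected by this approach.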
Pull-Through Cache
-
Mirror/Proxy repo for upstream repo
- Downloads artifacts transparently from upstream when requested
- Downloaded artifacts are stored locally in the mirror for faster access
-
Pro: Can be re-used in pipelines, dev machines, cloud/prod environments
-
Pro: Little state management necessary if any
-
Con: Requires extra config in tooling, build tools, containerd, etc.
-
Using only the pull-through cache should be fast enough for builds in CI
- Reproducible builds ftw
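The pull-through behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how any of the candidate products is implemented: the class PullThroughCache, its storage layout and the injected fetch_upstream callable are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Serve artifacts from local storage, fetching from upstream on a miss."""

    def __init__(self, storage_dir: Path, fetch_upstream: Callable[[str], bytes]):
        self.storage_dir = storage_dir
        self.fetch_upstream = fetch_upstream
        self.storage_dir.mkdir(parents=True, exist_ok=True)

    def _local_path(self, artifact: str) -> Path:
        # Hash the artifact name so any upstream path maps to a safe file name.
        return self.storage_dir / hashlib.sha256(artifact.encode("utf-8")).hexdigest()

    def get(self, artifact: str) -> bytes:
        path = self._local_path(artifact)
        if path.exists():
            return path.read_bytes()  # cache hit: upstream is not contacted
        data = self.fetch_upstream(artifact)  # cache miss: download once
        # Write to a temp file first so a crash never leaves a partial artifact.
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)
        return data
```

Because the only state is the set of stored artifacts, resetting the cache amounts to deleting the storage directory. Upstream is contacted once per artifact, which is what addresses the rate limiting concern.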
Solution Candidates
Forgejo Runner Cache
- Common actions like setup-java do a good job, as they make the cache depend on all build config files (e.g. all pom.xml files)
- Invalidation if there is any change to dependencies etc.
Nexus
-
Open source / free version
- EPL License allows commercial distribution
-
OSS version only has an extremely limited feature set of supported repository types.
- basically only maven support
- does not suffice for our use case
-
Community Edition has more features but is limited in sizing. An upgrade to the Pro edition is necessary if those limits are exceeded.
Artifactory
-
Open source / free version
- Limited feature set
- Separate distributions per repo type java / container / etc
-
Inconvenient and insufficient for our use case
License evaluation needed (EULA)
Artipie
- might be abandoned / low dev activity / needs new maintainer
- However, technically it looks extremely promising
Pulp
- Pull-Through Caches are only technical previews and might not work correctly
kube-image-keeper
-
Creates a DaemonSet, installing a service on each worker node
-
Works within the cluster and rewrites image coordinates on the fly
-
Pro: fine grained caching control
- select/exempt images / namespaces
- cache invalidation
-
Pro: config within k8s or as k8s objects
-
Con: Invasive
-
Con: Rewrites image coordinates using a mutating webhook
-
Con: Must be hosted within each (workload) cluster
-
Con (BLOCKER): Cannot handle image digests due to manifest rewrites
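A short illustration of why digests and manifest rewrites do not mix, assuming the blocker refers to images pinned by a sha256 digest of their manifest. The manifest below is a hypothetical, heavily simplified stand-in, and the helper manifest_digest is illustrative only.

```python
import hashlib
import json


def manifest_digest(manifest_bytes: bytes) -> str:
    """Content digest in the sha256:<hex> form used for image references."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Hypothetical, heavily simplified manifest; real manifests carry more fields.
original = json.dumps(
    {"schemaVersion": 2, "layers": [{"digest": "sha256:aaaa"}]},
    sort_keys=True,
).encode("utf-8")

# A client pins the image by the digest of the original manifest bytes.
pinned = manifest_digest(original)

# Any rewrite of the manifest, however small, produces different bytes.
rewritten = json.dumps(
    {"schemaVersion": 2, "layers": [{"digest": "sha256:aaaa"}], "rewritten": True},
    sort_keys=True,
).encode("utf-8")

# The rewritten manifest can no longer satisfy the pinned reference.
digest_still_matches = manifest_digest(rewritten) == pinned
print(digest_still_matches)  # prints False
```

Since the digest is computed over the exact manifest bytes, a cache that alters manifests cannot serve digest-pinned images without failing client-side verification.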
'Simple' Squid proxy (or similar)
- Caching of arbitrary resources via HTTP
Harbor
Recommendation
- File system cache
- Easy solution as it is offered within most pipelines
- Reduces build times significantly if dependencies have to be downloaded from outside networks
- Avoid using the file system cache (e.g. the Forgejo runner cache) long term, or at all
- Unless you can handle proper cache invalidation
- Promote immutable infra and reproducible builds without side effects
- Use as additional layer if there is no local cache repo