Caching in MVP1

Requirements

  • We want to cache artifacts from upstream repositories in order to

    • Avoid rate limiting (e.g. Docker Hub)
    • Improve download speed
    • Improve availability
  • We want to cache container images

    • Docker Hub
    • GCR
    • Quay
  • We want to cache common software dependency artifacts of various programming languages (a client-side config sketch follows this list)

    • Maven/Ivy (Java)
    • Go
    • NPM
    • Rust
    • PyPI
  • Must be easily configurable / manageable

    • Static config
    • API config (REST)
  • Must store artifacts permanently

    • Resetting the cache (deleting everything) should be easy, though
  • Currently out of scope

    • Auth: Cache provides data to anyone who can reach it
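
To make the configurability requirement concrete, here is a minimal sketch of how common package managers could be pointed at such a cache. The hostname https://cache.example.internal and the URL paths are placeholders, not decisions:

```sh
# Placeholder endpoints; the actual cache URL layout is an open design decision.

# npm: set the default registry (written to ~/.npmrc)
npm config set registry https://cache.example.internal/npm/

# Go: route module downloads through the cache, falling back to upstream
export GOPROXY=https://cache.example.internal/go,direct

# pip: use the cache as the package index
pip install --index-url https://cache.example.internal/pypi/simple/ requests
```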

Architectural Solutions

File System-Based Caching

  • Re-using artifacts stored on the local file system

    • e.g. backing up and restoring the node_modules directory
    • Set up within pipelines (see the pipeline sketch after this list)
  • Pro: Artifacts are downloaded directly from upstream, no further config needed

    • Con: Does not address rate-limiting concerns for the initial cache warm-up
  • Pro: No extra config needed in tooling apart from the pipeline cache config

  • The cache itself has to be stored somewhere

    • GitHub Actions / GitLab typically manage this
    • Similar to a local dev environment
  • Con: State management

    • Update the cache if new dependencies are used/requested
    • Dirty state (looking at you, Maven)
    • Impure behaviour possible, creating side effects
    • Integrity checks of package managers might be bypassed at this point
  • Con: Duplicate content

  • Con: Invalidation needed at some point
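
A minimal sketch of the pipeline setup described above, in GitHub-Actions-style syntax (which Forgejo runners can also execute, assuming a compatible cache backend is available); the paths and key derivation are illustrative:

```yaml
# Restore/save node_modules keyed on the lockfile; a changed lockfile
# misses the cache and triggers a fresh install from upstream.
- name: Cache node_modules
  uses: actions/cache@v4
  with:
    path: node_modules
    key: node-modules-${{ hashFiles('package-lock.json') }}

# `npm ci` would delete node_modules first; `npm install` reuses it,
# which is exactly the dirty-state trade-off listed above.
- name: Install dependencies
  run: npm install
```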

Pull-Through Cache

  • Mirror/Proxy repo for upstream repo

    • Downloads artifacts transparently from upstream when requested
    • Downloaded artifacts are stored locally in the mirror for faster access
  • Pro: Can be re-used in pipelines, dev machines, cloud/prod environments

  • Pro: Little state management necessary if any

  • Con: Requires extra config in tooling, build tools, containerd, etc. (see the config sketches after this list)

  • Using only the pull-through cache should be fast enough for builds in CI

    • Reproducible builds ftw
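
As a concrete sketch of the mirror/proxy idea: the open-source Docker registry (registry:2) supports a pull-through mode via proxy.remoteurl. The storage path and port below are placeholders:

```yaml
# config.yml for a registry:2 instance acting as a Docker Hub pull-through cache
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry  # cached blobs/manifests persist here
proxy:
  remoteurl: https://registry-1.docker.io
http:
  addr: :5000
```

And the "extra config in tooling" con, illustrated for containerd (assuming containerd is set up to read registry host config from certs.d; the cache hostname is a placeholder):

```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
# Pulls for docker.io try the cache first, then fall back to upstream.
server = "https://registry-1.docker.io"

[host."https://cache.example.internal:5000"]
  capabilities = ["pull", "resolve"]
```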

Solution Candidates

Forgejo Runner Cache

Nexus

Open source / free version

License evaluation needed

Artifactory

Open source / free version

License evaluation needed

Artipie

  • Might be abandoned / low dev activity / needs a new maintainer
    • However, technically it looks extremely promising

Pulp

  • Pull-through caching is only a technical preview and might not work correctly

kube-image-keeper

GitHub: https://github.com/enix/kube-image-keeper

  • Deploys a DaemonSet that runs a caching service on each worker node

  • Works within the cluster and rewrites image coordinates on the fly (illustrated after this list)

  • Pro: fine grained caching control

    • select/exempt images / namespaces
    • cache invalidation
  • Pro: config within k8s or as k8s objects

  • Con: Invasive

  • Con: Rewrites image coordinates using a mutating webhook

  • Con: Must be hosted within each (workload) cluster

  • Con (BLOCKER): Cannot handle image digests due to manifest rewrites
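
To illustrate the webhook rewrite (and why digest pinning breaks), a before/after of a pod spec; the localhost:7439 proxy prefix follows the project's README at the time of writing and should be treated as illustrative:

```yaml
# Before: the pod pulls straight from Docker Hub
spec:
  containers:
    - name: web
      image: nginx:1.25
---
# After mutation: the pull goes through the node-local proxy (DaemonSet).
# A digest-pinned reference (nginx@sha256:...) breaks here because the
# cached/rewritten manifest no longer carries the original digest
# (the blocker noted above).
spec:
  containers:
    - name: web
      image: localhost:7439/docker.io/library/nginx:1.25
```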

'Simple' Squid proxy (or similar)

Harbor

Recommendation

  • Avoid using the fs cache (i.e. the Forgejo runner cache) long term, or at all
    • Promote immutable infrastructure and reproducible builds without side effects