## Caching in MVP1

### Requirements

- We want to cache artifacts from upstream repositories in order to
  - Avoid rate limiting (Docker Hub)
  - Improve download speed
  - Improve availability
- We want to cache container images
  - Docker Hub
  - GCR
  - Quay
- We want to cache common software dependency artifacts of various programming languages
  - Maven/Ivy (Java)
  - Go
  - NPM
  - Rust
  - PyPI
- Must be easily configurable / manageable
  - Static config
  - API config (REST)
- Must store artifacts permanently
  - Resetting the cache (delete everything) should be easy, though
- Currently out of scope
  - Auth: Cache provides data to anyone who can reach it

### Architectural Solutions

#### File System-Based Caching

- Re-using artifacts stored on the local file system
  - e.g. backup and restore the `node_modules` directory
- Setup within pipelines
- Pro: Artifacts are downloaded directly from upstream, no further config needed
- Con: Does not address rate limiting concerns for the initial cache warm-up
- Pro: No extra config needed in tooling apart from the pipeline cache config
  - Has to be stored somewhere?
    - GitHub Actions / GitLab typically manage this
    - Similar to a local dev env
- Con: State management
  - The cache must be updated whenever new dependencies are used/requested
  - Dirty state (looking at you, Maven)
  - Impure behaviour possible, creating side effects
  - Integrity checks of package managers might be bypassed at this point
- Con: Duplicate content
- Con: Invalidation needed at some point

#### Pull-Through Cache

- Mirror/Proxy repo for upstream repo
  - Downloads artifacts transparently from upstream when requested
  - Downloaded artifacts are stored locally in the mirror for faster access
- Pro: Can be re-used in pipelines, dev machines, cloud/prod environments
- Pro: Little state management necessary, if any
- Con: Requires extra config in tooling, build tools, containerd, etc.
- Using only the pull-through cache should be fast enough for builds in CI
  - Reproducible builds ftw

### Solution Candidates

#### Forgejo Runner Cache

#### Nexus

- Open source / free version
- License evaluation needed

#### Artifactory

- Open source / free version
- License evaluation needed

#### Artipie

- Might be abandoned / low dev activity / needs new maintainer
- However, technically it looks extremely promising

#### Pulp

- Pull-through caches are only technical previews and might not work correctly

#### kube-image-keeper

[GH](https://github.com/enix/kube-image-keeper)

- Creates a DaemonSet, installing a service on each worker node
- Works within the cluster and rewrites image coordinates on the fly
- Pro: Fine-grained caching control
  - Select/exempt images / namespaces
  - Cache invalidation
- Pro: Config within k8s or as k8s objects
- Con: Invasive
- Con: Rewrites image coordinates using a mutating webhook
- Con: Must be hosted within each (workload) cluster
- Con (BLOCKER): Cannot handle image digests due to manifest rewrites

#### 'Simple' Squid proxy (or similar)

#### Harbor

### Recommendation

- Avoid using an fs cache, i.e. the Forgejo runner cache, long term or at all
- Promote immutable infra and reproducible builds without side effects
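
To make the pull-through behaviour described under Architectural Solutions more concrete, here is a minimal sketch in Python. It is not how any of the candidates above implement it. The class `PullThroughCache`, the `fetch_upstream` callback and the artifact key format are all hypothetical names chosen for illustration. The sketch only shows the core idea: serve from local storage if present, otherwise download from upstream once and keep the artifact permanently, with an easy full reset.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Illustrative pull-through cache; all names here are assumptions."""

    def __init__(self, store_dir: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.store_dir = store_dir
        # fetch_upstream stands in for the real upstream download (HTTP, registry API, ...).
        self.fetch_upstream = fetch_upstream
        self.store_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, key: str) -> Path:
        # Hash the key so arbitrary artifact coordinates map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.store_dir / digest

    def get(self, key: str) -> bytes:
        path = self._path_for(key)
        if path.exists():
            # Cache hit: no upstream request, so no rate limiting and no upstream dependency.
            return path.read_bytes()
        # Cache miss: download transparently from upstream, then store locally.
        data = self.fetch_upstream(key)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # move into place so readers never see a partial file
        return data

    def reset(self) -> None:
        # "Delete everything": remove every stored artifact.
        for entry in self.store_dir.iterdir():
            entry.unlink()


if __name__ == "__main__":
    upstream_calls = []

    def fake_upstream(key: str) -> bytes:
        # Fake upstream that records how often it is contacted.
        upstream_calls.append(key)
        return f"artifact for {key}".encode("utf-8")

    with tempfile.TemporaryDirectory() as tmp_dir:
        cache = PullThroughCache(Path(tmp_dir), fake_upstream)
        cache.get("npm/left-pad/1.3.0")
        cache.get("npm/left-pad/1.3.0")
        print(len(upstream_calls))  # upstream is contacted only for the first request
```

Unlike the file system-based approach, the client never manages this state itself; it only needs to be pointed at the cache, which is the extra tooling config listed as a con above.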