commit 4bc6c696aa7479c175b164914dfa2ff04d53974c
Author: Patrick Sy
Date:   Wed Feb 26 17:58:09 2025 +0100

    init

diff --git a/caching.md b/caching.md
new file mode 100644
index 0000000..d39dd06
--- /dev/null
+++ b/caching.md
@@ -0,0 +1,104 @@
+## Caching in MVP1
+
+### Requirements
+
+- We want to cache artifacts from upstream repositories in order to
+
+  - Avoid rate limiting (Docker Hub)
+  - Improve download speed
+  - Improve availability
+
+- We want to cache container images
+
+  - Docker Hub
+  - GCR
+  - Quay
+
+- We want to cache common software dependency artifacts of various programming languages
+
+  - Maven/Ivy (Java)
+  - Go
+  - NPM
+  - Rust
+  - PyPI
+
+- Must be easily configurable / manageable
+
+  - Static config
+  - API config (REST)
+
+- Must store artifacts permanently
+
+  - Resetting the cache (deleting everything) should still be easy
+
+- Currently out of scope
+  - Auth: Cache provides data to anyone who can reach it
+
+### Architectural Solutions
+
+#### File System-Based Caching
+
+- Re-using artifacts stored on the local file system
+
+  - e.g. back up and restore the `node_modules` directory
+  - Set up within pipelines
+
+- Pro: Artifacts are downloaded directly from upstream, no further config needed
+  - Con: Does not address rate limiting concerns for the initial cache warm-up
+- Pro: No extra config needed in tooling apart from the pipeline cache config
+- Has to be stored somewhere?
+  - GitHub Actions / GitLab typically manage this
+  - Similar to a local dev environment
+- Con: State management
+  - Update the cache if new dependencies are used/requested
+  - Dirty state (looking at you, Maven)
+  - Impure behaviour possible, creating side effects
+  - Integrity checks of package managers might be bypassed at this point
+- Con: Duplicate content
+- Con: Invalidation needed at some point
+
+#### Pull-Through Cache
+
+- Mirror/Proxy repo for upstream repo
+
+  - Downloads artifacts transparently from upstream when requested
+  - Downloaded artifacts are stored locally in the mirror for faster access
+
+- Pro: Can be re-used in pipelines, dev machines, cloud/prod environments
+- Pro: Little state management necessary, if any
+- Con: Requires extra config in tooling, build tools, containerd, etc.
+
+- Using only the pull-through cache should be fast enough for builds in CI
+  - Reproducible builds ftw
+
+### Solution Candidates
+
+#### Forgejo Runner Cache
+
+#### Nexus
+
+Open source / free version
+
+#### Artifactory
+
+Open source / free version
+
+#### Artipie
+
+- Might be abandoned / low dev activity / needs new maintainer
+  - However, technically it looks extremely promising
+
+#### Pulp
+
+- Pull-through caches are only a technical preview and might not work correctly
+
+#### kube-image-keeper
+
+#### 'Simple' Squid proxy (or similar)
+
+#### Harbor
+
+### Recommendation
+
+- Avoid using a file system cache, i.e. the Forgejo Runner cache, long term or at all
+  - Promote immutable infra and reproducible builds without side effects
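
As a reading aid for the pull-through cache described in the notes above, here is a minimal Python sketch of the behaviour: serve an artifact from local storage if it is present, otherwise fetch it from upstream once and keep it. It is not taken from any of the listed candidates. The class name `PullThroughCache`, the `fetch_upstream` callable, and the flat on-disk layout are all assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Toy pull-through cache: serve from local storage, else fetch upstream once."""

    def __init__(self, root: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.root = root
        # Hypothetical upstream client; a real mirror would speak HTTP here.
        self.fetch_upstream = fetch_upstream
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, key: str) -> Path:
        # Hash the artifact key so arbitrary names map to safe file names.
        return self.root / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key: str) -> bytes:
        path = self._path_for(key)
        if path.exists():
            # Cache hit: upstream is not contacted at all.
            return path.read_bytes()
        # Cache miss: download transparently, then store for later requests.
        data = self.fetch_upstream(key)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # rename so readers never see a partial file
        return data

    def reset(self) -> None:
        # The "delete everything" reset from the requirements.
        for entry in self.root.iterdir():
            entry.unlink()
```

In this sketch the cache holds almost no state beyond the stored files, which matches the "little state management" argument: there is no invalidation logic, and a reset simply empties the directory so the next request goes upstream again.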