Patrick Sy 2025-02-26 17:58:09 +01:00
commit 4bc6c696aa
Signed by: Patrick.Sy
GPG key ID: DDDC8EC51823195E

caching.md Normal file

@ -0,0 +1,104 @@
## Caching in MVP1
### Requirements
- We want to cache artifacts from upstream repositories in order to
  - Avoid rate limiting (Docker Hub)
  - Improve download speed
  - Improve availability
- We want to cache container images
  - Docker Hub
  - GCR
  - Quay
- We want to cache common software dependency artifacts of various programming languages
  - Maven/Ivy (Java)
  - Go
  - NPM
  - Rust
  - PyPI
- Must be easily configurable / manageable
  - Static config
  - API config (REST)
- Must store artifacts permanently
  - Resetting the cache (delete everything) should be easy, though
- Currently out of scope
  - Auth: Cache provides data to anyone who can reach it
### Architectural Solutions
#### File System-Based Caching
- Re-using artifacts stored on the local file system
  - e.g. backup and restore `node_modules` directory
- Setup within pipelines
- Pro: Artifacts are downloaded directly from upstream, no further config needed
  - Con: Does not address rate limiting concerns for initial cache warm-up
- Pro: No extra config needed in tooling apart from pipeline cache config
  - Has to be stored somewhere?
    - GitHub Actions / GitLab typically manage this
  - Similar to local dev env
- Con: State management
  - Update the cache if new dependencies are used/requested
  - Dirty state (looking at you, Maven)
  - Impure behaviour possible, creating side effects
  - Integrity checks of package managers might be bypassed at this point
- Con: Duplicate content
- Con: Invalidation needed at some point
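
The state management and invalidation points above are usually handled by deriving the cache key from the dependency lockfile, so that a changed lockfile yields a new cache entry instead of a stale one. A minimal sketch of that idea, assuming a hypothetical `cache_key` helper and a made-up `deps-<hash>` naming scheme (neither is tied to a specific CI system):

```python
import hashlib


def cache_key(lock_content: bytes, prefix: str = "deps") -> str:
    """Derive a cache key from lockfile content.

    Hypothetical helper: the key changes whenever the lockfile changes,
    which forces a fresh cache entry rather than reusing a dirty one.
    """
    digest = hashlib.sha256(lock_content).hexdigest()
    # Shorten the digest to keep the key readable; 16 hex chars is an
    # arbitrary choice for this sketch.
    return f"{prefix}-{digest[:16]}"


if __name__ == "__main__":
    old = cache_key(b'{"lodash": "4.17.20"}')
    new = cache_key(b'{"lodash": "4.17.21"}')
    print(old != new)  # a changed lockfile produces a different key
```

This only covers invalidation. It does not prevent side effects or bypassed integrity checks once the directory has been restored.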
#### Pull-Through Cache
- Mirror/Proxy repo for upstream repo
- Downloads artifacts transparently from upstream when requested
- Downloaded artifacts are stored locally in the mirror for faster access
- Pro: Can be re-used in pipelines, dev machines, cloud/prod environments
- Pro: Little state management necessary, if any
- Con: Requires extra config in tooling, build tools, containerd, etc.
- Using only the pull-through cache should be fast enough for builds in CI
- Reproducible builds ftw
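
The pull-through behaviour described above can be sketched as a toy implementation. This is only an illustration under stated assumptions: `PullThroughCache`, `fetch_upstream` and the on-disk layout are hypothetical, and none of the solution candidates below necessarily works this way internally.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Toy pull-through cache: serve locally, fetch upstream on a miss."""

    def __init__(self, root: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.root = root
        self.fetch_upstream = fetch_upstream  # hypothetical upstream client
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, artifact: str) -> Path:
        # Hash the artifact name so arbitrary names map to safe file names.
        return self.root / hashlib.sha256(artifact.encode()).hexdigest()

    def get(self, artifact: str) -> bytes:
        path = self._path_for(artifact)
        if path.exists():
            # Cache hit: no upstream request, so no rate limit is consumed.
            return path.read_bytes()
        # Cache miss: fetch once from upstream and keep the result.
        data = self.fetch_upstream(artifact)
        path.write_bytes(data)
        return data

    def reset(self) -> None:
        # "Delete everything": remove all stored artifacts.
        for entry in self.root.iterdir():
            entry.unlink()
```

Clients only talk to `get`, which is why the same cache can back pipelines, dev machines and cloud environments. A real implementation would also need atomic writes and handling of concurrent requests, both of which this sketch omits.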
### Solution Candidates
#### Forgejo Runner Cache
#### Nexus
Open source / free version
#### Artifactory
Open source / free version
#### Artipie
- might be abandoned / low dev activity / needs new maintainer
- However, technically it looks extremely promising
#### Pulp
- Pull-through caching is only a technical preview and might not work correctly
#### kube-image-keeper
#### 'Simple' Squid proxy (or similar)
#### Harbor
### Recommendation
- Avoid using the file system cache, i.e. the Forgejo runner cache, long term or at all
- Promote immutable infra and reproducible builds without side effects