mvp1-caching-research/caching.md
2025-02-26 17:58:09 +01:00

Caching in MVP1

Requirements

  • We want to cache artifacts from upstream repositories in order to

    • Avoid rate limiting (Docker Hub)
    • Improve download speed
    • Improve availability
  • We want to cache container images

    • Docker Hub
    • GCR
    • Quay
  • We want to cache common dependency artifacts for various programming languages

    • Maven/Ivy (Java)
    • Go
    • NPM
    • Rust
    • PyPI
  • Must be easily configurable / manageable

    • Static config
    • API config (REST)
  • Must store artifacts permanently

    • Resetting the cache (deleting everything) should be easy, though
  • Currently out of scope

    • Auth: Cache provides data to anyone who can reach it

Architectural Solutions

File System-Based Caching

  • Re-using artifacts stored on the local file system

    • e.g. backup and restore node_modules directory
    • Setup within pipelines
  • Pro: Artifacts are downloaded directly from upstream, no further config needed

    • Con: Does not address rate limiting concerns for initial cache warm up
  • Pro: No extra config needed in tooling apart from the pipeline cache config

  • The cache itself has to be stored somewhere

    • GitHub Actions / GitLab typically manage this
    • similar to local dev env
  • Con: State management

    • Update the cache if new dependencies are used/requested
    • Dirty state (looking at you, maven)
    • Impure behaviour possible, creating side effects
    • Integrity checks of package managers might be bypassed at this point
  • Con: Duplicate content

  • Con: Invalidation needed at some point
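
The state management and invalidation concerns above are often handled by deriving the cache key from the lockfile contents, so that a changed dependency set never restores a stale cache. A minimal sketch in Python, assuming a hypothetical `cache_key` helper (not part of any CI system's API) and a lockfile such as `package-lock.json`:

```python
import hashlib
from pathlib import Path


def cache_key(lockfile: Path, prefix: str = "deps") -> str:
    """Derive a cache key from the lockfile contents (hypothetical helper).

    Any change to the lockfile yields a different key, so an outdated
    cache entry is simply not found instead of being restored.
    """
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    # A shortened digest keeps the key readable; this is an illustrative choice.
    return f"{prefix}-{lockfile.name}-{digest[:16]}"
```

This only covers invalidation. It does not address dirty state written into the cached directory during a build, nor duplicate content across keys.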

Pull-Through Cache

  • Mirror/Proxy repo for upstream repo

    • Downloads artifacts transparently from upstream when requested
    • Downloaded artifacts are stored locally in the mirror for faster access
  • Pro: Can be re-used in pipelines, dev machines, cloud/prod environments

  • Pro: Little, if any, state management necessary

  • Con: Requires extra config in tooling (build tools, containerd, etc.)

  • Using only the pull-through cache should be fast enough for builds in CI

    • Reproducible builds ftw
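
The pull-through behaviour described above (serve from the local store, otherwise download from upstream once and keep the result) can be sketched roughly as follows. This is an illustration only: the `PullThroughCache` class, its `fetch_upstream` callback and the on-disk layout are assumptions, not the API of any candidate listed below, and the sketch ignores concurrency, partial writes and cache reset.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Minimal pull-through cache sketch (hypothetical, not a real tool's API)."""

    def __init__(self, store: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.store = store
        # Callback that downloads an artifact, e.g. an HTTP GET against upstream.
        self.fetch_upstream = fetch_upstream
        self.store.mkdir(parents=True, exist_ok=True)

    def _local_path(self, name: str) -> Path:
        # Hash the artifact name so arbitrary names map to safe file names.
        return self.store / hashlib.sha256(name.encode()).hexdigest()

    def get(self, name: str) -> bytes:
        path = self._local_path(name)
        if path.exists():
            # Cache hit: upstream is not contacted at all.
            return path.read_bytes()
        # Cache miss: download once, store permanently, then serve.
        data = self.fetch_upstream(name)
        path.write_bytes(data)
        return data
```

Because clients only ever talk to the cache, upstream sees at most one request per artifact, which is what addresses the rate limiting and availability requirements.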

Solution Candidates

Forgejo Runner Cache

Nexus

Open source / free version

Artifactory

Open source / free version

Artipie

  • might be abandoned / low dev activity / needs new maintainer
    • However, technically it looks extremely promising

Pulp

  • Pull-through caches are only available as a technical preview and might not work correctly

kube-image-keeper

'Simple' Squid proxy (or similar)

Harbor

Recommendation

  • Avoid using a file system cache, i.e. the Forgejo runner cache, long term or at all
    • Promote immutable infra and reproducible builds without side effects