Caching in MVP1

Requirements

  • We want to cache artifacts from upstream repositories in order to

    • Avoid rate limiting (Docker Hub)
    • Improve download speed
    • Improve availability
  • We want to cache container images

    • Docker Hub
    • GCR
    • Quay
  • We want to cache common software dependency artifacts of various programming languages

    • Maven/Ivy Java
    • Go
    • NPM
    • Rust
    • PyPI
  • Must be easily configurable / manageable

    • Static config
    • API config (REST)
  • Must store artifacts permanently

    • Resetting the cache (deleting everything) should still be easy, though
  • Currently out of scope

    • Auth: Cache provides data to anyone who can reach it
  • Nice to have

    • Repo Cache: Can store uploaded artifacts

Architectural Solutions

File System-Based Caching

  • Re-using artifacts stored on the local file system

    • e.g. backup and restore node_modules directory
    • Setup within pipelines
  • Important: proper cache key selection

  • Performance depends on the cache's storage location

    • on node: fast but localized to node
    • network storage: still has to download cache archive
  • Pro: Artifacts are downloaded directly from upstream, no further config needed

    • Con: Does not address rate limiting concerns for initial cache warm up
  • Pro: No extra config needed in tooling apart from the pipeline cache config

  • The cache archive has to be stored somewhere

    • GitHub Actions / GitLab typically manage this
    • similar to local dev env
  • Con: State management

    • Update the cache if new dependencies are used/requested
    • Dirty state (looking at you, maven)
    • Impure behaviour possible, creating side effects
    • Integrity checks of package managers might be bypassed at this point
  • Con: Duplicate content

  • Con: Invalidation needed at some point
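
To illustrate the cache key selection noted above, here is a minimal Python sketch that derives a key from the content of all build config files, so that any change to a dependency declaration yields a new key and the old cache entry is simply no longer matched. The function name `cache_key`, the `deps` prefix, and the 16-character truncation are assumptions made for illustration; real pipeline cache actions implement their own hashing.

```python
import hashlib
from pathlib import Path


def cache_key(project_dir: str, patterns: tuple[str, ...], prefix: str = "deps") -> str:
    """Derive a cache key from the content of all matching build config files."""
    root = Path(project_dir)
    digest = hashlib.sha256()
    # Collect every matching file recursively and sort the paths, so the key
    # does not depend on file system iteration order.
    files = sorted(
        {p for pattern in patterns for p in root.rglob(pattern) if p.is_file()}
    )
    for path in files:
        # Hash the relative path too, so moving a file also changes the key.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    # Truncated for readability; an assumption, not a requirement.
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

For a Maven project this might be called as `cache_key(".", ("pom.xml",))`. A key that covers too few files leads to the dirty-state problems listed above, while a key that covers too many files causes needless cache misses.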

Pull-Through Cache

  • Mirror/Proxy repo for upstream repo

    • Downloads artifacts transparently from upstream when requested
    • Downloaded artifacts are stored locally in the mirror for faster access
  • Pro: Can be re-used in pipelines, dev machines, cloud/prod environments

  • Pro: Little state management necessary, if any

  • Con: Requires extra config in tooling, build tools, containerd, etc.

  • Using only the pull-through cache should be fast enough for builds in CI

    • Reproducible builds ftw
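
The pull-through behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any of the candidates below is implemented: the class name `PullThroughCache` and the injected `fetch_upstream` callable are hypothetical, and concerns such as concurrent requests, metadata freshness, and storage limits are left out.

```python
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Serve artifacts from local storage, fetching from upstream on a miss."""

    def __init__(self, store_dir: str, fetch_upstream: Callable[[str], bytes]):
        self.store = Path(store_dir).resolve()
        # Hypothetical upstream client: takes an artifact path, returns its bytes.
        self.fetch_upstream = fetch_upstream

    def get(self, artifact_path: str) -> bytes:
        local = self._local_path(artifact_path)
        if local.is_file():
            # Cache hit: upstream is not contacted at all.
            return local.read_bytes()
        # Cache miss: download once from upstream, then keep a local copy.
        data = self.fetch_upstream(artifact_path)
        local.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file and rename it, so that readers never
        # observe a partially written artifact.
        tmp = local.with_name(local.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(local)
        return data

    def _local_path(self, artifact_path: str) -> Path:
        # Reject paths that would escape the storage directory.
        local = (self.store / artifact_path.lstrip("/")).resolve()
        if not local.is_relative_to(self.store):
            raise ValueError(f"invalid artifact path: {artifact_path!r}")
        return local
```

Because the stored copy is only ever added, never modified by clients, there is little state to manage; resetting the cache amounts to deleting the storage directory.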

Solution Candidates

Forgejo Runner Cache

  • Common actions like setup-java do a good job, as they key the cache on all build config files (e.g. all pom.xml)
    • The cache is invalidated if there is any change to dependencies etc.

Nexus

Nexus OSS GH

  • Open source / free version

    • EPL License allows commercial distribution
  • OSS version only has an extremely limited feature set of supported repository types.

    • basically only maven support
    • does not suffice for our use case
  • Community Edition has more features but is limited in sizing. Upgrade to Pro edition necessary if those limits are exceeded.

Artifactory

  • Open source / free version

    • Limited feature set
    • Separate distributions per repo type java / container / etc
  • Inconvenient and insufficient for our use case

  • Full feature set requires paid license

License evaluation needed (EULA)

Artipie

GH Wiki

  • Self-hosted and upstream artifact caching

  • MIT License

  • might be abandoned / low dev activity / needs new maintainer

    • However, technically it looks extremely promising
    • Initial setup does not run out of the box correctly, needs some love
  • Mostly headless

    • Brings a limited web interface
      • Repo creation, artifact viewing
  • Buggy default config

    • Config changes require a restart, which seems to be a bug
  • Easy to set up once bugs and buggy config are mitigated/worked around

  • File system and object storage supported

  • No databases required

  • Pro: Config in yaml file

  • Due to its simplicity it might be a good candidate for a first upstream caching solution

Pulp

Website GH

  • Self-hosted and upstream artifact caching

  • GPL 2.0 License

  • Pull-Through Caches are only technical previews and might not work correctly

    • Pull-through cache does not fit into the concept of how artifacts are stored and tracked
    • Intended workflow is to sync dedicated artifacts with some upstream repo, not the entire repo
  • Setup and config are quite complex

    • Built for high availability
  • File system and object storage supported

  • Requires SQL Db (Postgres) and possibly Redis

kube-image-keeper

GH

  • Creates a DaemonSet, installing a service on each worker node

  • Works within the cluster and rewrites image coordinates on the fly

  • Pro: fine-grained caching control

    • select/exempt images / namespaces
    • cache invalidation
  • Pro: config within k8s or as k8s objects

  • Con: Invasive

  • Con: Rewrites image coordinates using a mutating webhook

  • Con: Must be hosted within each (workload) cluster

  • Con (BLOCKER): Cannot handle image digests due to manifest rewrites
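
To make "rewriting image coordinates" more concrete, the following Python sketch prefixes an image reference with the host of a cache registry. It is illustrative only and does not reproduce the tool's actual rewrite format: the function names, the example host `cache.example.internal:5000`, and the resulting path layout are all assumptions. The normalization follows the usual Docker convention that a reference without a registry host refers to Docker Hub.

```python
def rewrite_image(ref: str, cache_host: str) -> str:
    """Prefix an image reference with a cache registry host (illustrative only)."""
    first, sep, rest = ref.partition("/")
    # Docker convention: the first path component is a registry host only if
    # it contains "." or ":" or equals "localhost"; otherwise Docker Hub is
    # implied.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repo = first, rest
    else:
        registry, repo = "docker.io", ref
        if "/" not in repo:
            # Official Docker Hub images live under the "library/" namespace.
            repo = "library/" + repo
    return f"{cache_host}/{registry}/{repo}"


def is_digest_pinned(ref: str) -> bool:
    """Return True if the reference pins a content digest instead of a tag."""
    return "@sha256:" in ref
```

With these assumptions, `rewrite_image("nginx:1.27", "cache.example.internal:5000")` yields `cache.example.internal:5000/docker.io/library/nginx:1.27`. A check like `is_digest_pinned` is one way a caller might detect the references affected by the blocker above; whether exempting them is a viable workaround has not been evaluated here.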

'Simple' Squid proxy (or similar)

  • Caching of arbitrary resources via HTTP
  • "Stupid" caching
    • Invalidation becomes a problem rather quickly
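
Why invalidation becomes a problem can be shown with a small Python model of a URL-keyed cache with a fixed time-to-live. This is not Squid configuration, only a sketch of the general behaviour; the class name `NaiveHttpCache`, the injected `fetch` callable, and the injectable clock are assumptions for illustration. The cache cannot tell an immutable release artifact from mutable metadata, so any TTL is a guess.

```python
import time
from typing import Callable, Dict, Tuple


class NaiveHttpCache:
    """Cache response bodies by URL and reuse them until a fixed TTL expires."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.fetch = fetch  # hypothetical HTTP client: URL in, body out
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            # Served from cache, even if upstream has changed in the meantime.
            return entry[1]
        body = self.fetch(url)
        self.entries[url] = (now, body)
        return body
```

A long TTL serves stale metadata (for example a moved image tag or an updated package index), while a short TTL re-downloads immutable artifacts and undermines the goals of avoiding rate limits and improving download speed.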

Harbor

Website GH

  • Apache 2.0 License

  • The go-to container registry

  • Allows self-hosting artifacts and caching upstream ones

  • Pro: Image Signing

  • Pro: Multi Tenant

  • Pro: Quotas

  • Pro: Vulnerability Scans

  • Pro: SBOM creation

  • Pro: P2P distribution of artifacts

  • Pro: fully fledged web interface

  • Con: Only Container / OCI related artifacts

Recommendation

  • File system cache

    • Easy solution as it is offered within most pipelines
    • Reduces build times significantly if dependencies have to be downloaded from outside networks
    • Avoid relying on the fs cache, i.e. the Forgejo Runner cache, long term or at all
      • Unless you can handle proper cache invalidation
      • Avoiding it promotes immutable infra and reproducible builds without side effects
    • Use as additional layer if there is no local cache repo
  • Repo caches

    • Can replace file system cache if network and repo are fast enough
    • Optimal solution would be a Nexus/Artifactory-like unified solution
      • FOSS solutions like Artipie and Pulp have severe problems
        • Would require us to contribute features, fixes, and maintenance
    • Due to the scarce landscape of proper FOSS solutions, we might have to opt for multiple dedicated solutions
      • If we opt for a dedicated container cache, we should re-evaluate Harbor or Quay
  • Try to use Artipie as a first, simple solution and use Forgejo Runner caches in conjunction for even better performance

    • If Artipie does not work correctly or does not fit for some reason, we will not have wasted too much time on it
    • If Artipie is abandoned but the concept works for us, we should consider maintaining it and continuing its development