ingress-nginx-helm/internal/ingress/metric/main.go

/*
Copyright 2017 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package metric
import (
"os"
"sync/atomic"
"time"
"github.com/prometheus/client_golang/prometheus"
"k8s.io/klog/v2"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/ingress-nginx/internal/ingress/metric/collectors"
"k8s.io/ingress-nginx/pkg/apis/ingress"
)
// Collector defines the interface for a metric collector
type Collector interface {
ConfigSuccess(uint64, bool)
IncReloadCount()
IncReloadErrorCount()
SetAdmissionMetrics(float64, float64, float64, float64, float64, float64)
OnStartedLeading(string)
OnStoppedLeading(string)
IncCheckCount(string, string)
IncCheckErrorCount(string, string)
RemoveMetrics(ingresses, endpoints, certificates []string)
SetSSLExpireTime([]*ingress.Server)
SetSSLInfo(servers []*ingress.Server)
// SetHosts sets the hostnames that are being served by the ingress controller
SetHosts(sets.String)
Start(string)
Stop(string)
}
type collector struct {
nginxStatus collectors.NGINXStatusCollector
nginxProcess collectors.NGINXProcessCollector
ingressController *collectors.Controller
admissionController *collectors.AdmissionCollector
socket *collectors.SocketCollector
registry *prometheus.Registry
}
// NewCollector creates a new metric collector for the ingress controller
func NewCollector(metricsPerHost, reportStatusClasses bool, registry *prometheus.Registry, ingressclass string, buckets collectors.HistogramBuckets) (Collector, error) {
podNamespace := os.Getenv("POD_NAMESPACE")
if podNamespace == "" {
podNamespace = "default"
}
podName := os.Getenv("POD_NAME")
nc, err := collectors.NewNGINXStatus(podName, podNamespace, ingressclass)
if err != nil {
return nil, err
}
pc, err := collectors.NewNGINXProcess(podName, podNamespace, ingressclass)
if err != nil {
return nil, err
}
s, err := collectors.NewSocketCollector(podName, podNamespace, ingressclass, metricsPerHost, reportStatusClasses, buckets)
if err != nil {
return nil, err
}
ic := collectors.NewController(podName, podNamespace, ingressclass)
am := collectors.NewAdmissionCollector(podName, podNamespace, ingressclass)
return Collector(&collector{
nginxStatus: nc,
nginxProcess: pc,
admissionController: am,
ingressController: ic,
socket: s,
registry: registry,
}), nil
}
func (c *collector) ConfigSuccess(hash uint64, success bool) {
c.ingressController.ConfigSuccess(hash, success)
}
func (c *collector) IncCheckCount(namespace string, name string) {
c.ingressController.IncCheckCount(namespace, name)
}
func (c *collector) IncCheckErrorCount(namespace string, name string) {
c.ingressController.IncCheckErrorCount(namespace, name)
}
func (c *collector) IncReloadCount() {
c.ingressController.IncReloadCount()
}
func (c *collector) IncReloadErrorCount() {
c.ingressController.IncReloadErrorCount()
}
func (c *collector) RemoveMetrics(ingresses, hosts, certificates []string) {
c.socket.RemoveMetrics(ingresses, c.registry)
c.ingressController.RemoveMetrics(hosts, certificates, c.registry)
}
func (c *collector) Start(admissionStatus string) {
c.registry.MustRegister(c.nginxStatus)
c.registry.MustRegister(c.nginxProcess)
if admissionStatus != "" {
c.registry.MustRegister(c.admissionController)
}
c.registry.MustRegister(c.ingressController)
c.registry.MustRegister(c.socket)
// The default nginx.conf does not contain a server section
// with the status port, so delay starting the status collector.
go func() {
time.Sleep(5 * time.Second)
c.nginxStatus.Start()
}()
go c.nginxProcess.Start()
go c.socket.Start()
}
func (c *collector) Stop(admissionStatus string) {
c.registry.Unregister(c.nginxStatus)
c.registry.Unregister(c.nginxProcess)
if admissionStatus != "" {
c.registry.Unregister(c.admissionController)
}
c.registry.Unregister(c.ingressController)
c.registry.Unregister(c.socket)
c.nginxStatus.Stop()
c.nginxProcess.Stop()
c.socket.Stop()
}
func (c *collector) SetSSLExpireTime(servers []*ingress.Server) {
if !isLeader() {
return
}
klog.V(2).InfoS("Updating ssl expiration metrics")
c.ingressController.SetSSLExpireTime(servers)
}
func (c *collector) SetSSLInfo(servers []*ingress.Server) {
klog.V(2).InfoS("Updating ssl certificate info metrics")
c.ingressController.SetSSLInfo(servers)
}
func (c *collector) SetHosts(hosts sets.String) {
c.socket.SetHosts(hosts)
}
func (c *collector) SetAdmissionMetrics(testedIngressLength float64, testedIngressTime float64, renderingIngressLength float64, renderingIngressTime float64, testedConfigurationSize float64, admissionTime float64) {
c.admissionController.SetAdmissionMetrics(
testedIngressLength,
testedIngressTime,
renderingIngressLength,
renderingIngressTime,
testedConfigurationSize,
admissionTime,
)
}
// OnStartedLeading indicates the pod was elected as the leader
func (c *collector) OnStartedLeading(electionID string) {
setLeader(true)
c.ingressController.OnStartedLeading(electionID)
}
// OnStoppedLeading indicates the pod stopped being the leader
func (c *collector) OnStoppedLeading(electionID string) {
setLeader(false)
c.ingressController.OnStoppedLeading(electionID)
c.ingressController.RemoveAllSSLMetrics(c.registry)
}
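var (
// currentLeader is 1 while this pod holds the leader lock and 0 otherwise.
// It is read and written atomically via isLeader and setLeader.
currentLeader uint32
)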
func setLeader(leader bool) {
var i uint32
if leader {
i = 1
}
atomic.StoreUint32(&currentLeader, i)
}
func isLeader() bool {
return atomic.LoadUint32(&currentLeader) != 0
}