This commit introduces a backwards-compatible command-line option,
--report-status-classes, which enables reporting response status classes
(2xx, 3xx, ...) instead of individual status codes in exported metrics.
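As a rough illustration of what reporting by status class means, the sketch below maps a status code to its class label. The function name `statusClass` and the "other" bucket for out-of-range codes are assumptions made for this example, not taken from the controller's code.

```go
package main

import "fmt"

// statusClass maps an HTTP status code to its class label, e.g. 404 -> "4xx".
// Codes outside 100-599 fall into "other"; that bucket is an assumption of
// this sketch, not something the commit describes.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "other"
	}
	return fmt.Sprintf("%dxx", code/100)
}

func main() {
	for _, code := range []int{200, 301, 404, 503} {
		fmt.Printf("%d -> %s\n", code, statusClass(code))
	}
}
```

Grouping by class keeps the number of distinct label values small, which is the usual motivation for such an option.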
* Initial work on chrooting nginx process
* More improvements in chroot
* Fix charts and some file locations
* Fix symlink on non chrooted container
* fix psp test
* Add e2e tests to chroot image
* Fix logger
* Add internal logger in controller
* Fix overlay for chrooted tests
* Fix tests
* fix boilerplates
* Fix unittest to point to the right pid
* Fix PR review
When the ingress controller loads certificates (new ones or following a
secret update), it performs a series of checks to ensure their validity.
In our systems, we detected a case where, when the secret object is
compromised, for example when the certificate does not match the secret
key, different pods of the ingress controller serve different
versions of the certificate.
This behaviour is due to the cache mechanism of the ingress controller,
which keeps the last known certificate in case of corruption. When this
happens, old ingress-controller pods keep serving the old certificate,
while new pods, failing to load the corrupted certificate, fall back
to the default certificate, presenting an invalid certificate to their
clients.
This produces seemingly random errors on the client side, depending on
the pod instance the client reaches.
To allow detecting occurrences of these situations, add a metric that
exposes, for all ingress controller pods, detailed information about
the currently loaded certificate.
This will, for example, allow setting an alert when there is a
certificate discrepancy across ingress controller pods, using a query
similar to `sum(nginx_ingress_controller_ssl_certificate_info{host="name.tld"})by(serial_number)`.
This also allows catching other errors when loading certificates (failing
to load the certificate from the k8s API, ...).
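The check that such an alert performs can be sketched in Go: group the samples for one host by serial number and flag the host when more than one serial is being served. The `certInfo` type and both helper functions are hypothetical stand-ins for the metric's samples; they are not part of the controller.

```go
package main

import (
	"fmt"
	"sort"
)

// certInfo is a hypothetical stand-in for one sample of the certificate
// info metric, as reported by a single ingress controller pod.
type certInfo struct {
	Pod          string
	Host         string
	SerialNumber string
}

// podsBySerial groups the pods serving the given host by certificate serial.
func podsBySerial(samples []certInfo, host string) map[string][]string {
	groups := map[string][]string{}
	for _, s := range samples {
		if s.Host != host {
			continue
		}
		groups[s.SerialNumber] = append(groups[s.SerialNumber], s.Pod)
	}
	return groups
}

// hasDiscrepancy reports whether pods serve more than one serial for host.
func hasDiscrepancy(samples []certInfo, host string) bool {
	return len(podsBySerial(samples, host)) > 1
}

func main() {
	samples := []certInfo{
		{Pod: "controller-old", Host: "name.tld", SerialNumber: "01"},
		{Pod: "controller-new", Host: "name.tld", SerialNumber: "02"},
	}
	groups := podsBySerial(samples, "name.tld")
	serials := make([]string, 0, len(groups))
	for serial := range groups {
		serials = append(serials, serial)
	}
	sort.Strings(serials) // map iteration order is random; sort for stable output
	for _, serial := range serials {
		fmt.Printf("serial %s: %v\n", serial, groups[serial])
	}
	fmt.Println("discrepancy:", hasDiscrepancy(samples, "name.tld"))
}
```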
Co-authored-by: Daniel Ricart <danielricart@users.noreply.github.com>
<!--- Provide a general summary of your changes in the Title above --->
<!--- Why is this change required? What problem does it solve? -->
Introduces the CLI flag `--disable-full-test`.
By default, it does not alter the current behavior of the tests performed by the admission controller.
With or without the flag, a full checkOverlap is performed, without any alteration,
and the object `pcfg` is created with the whole set of ingresses.
If the flag is set to true, `pcfg` is reduced to the content of the single ingress being validated.
This is achieved by overriding the content of `pcfg` with just the last element that was appended to the slice `ings`:
```
if n.cfg.DisableFullValidationTest {
	_, _, pcfg = n.getConfiguration(ings[len(ings)-1:])
}
```
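The slicing expression `ings[len(ings)-1:]` can be tried in isolation. The sketch below uses plain strings in place of ingress objects, and the helper `lastOnly` is a hypothetical name; the empty-slice guard is an addition of this example, since the expression would panic on an empty slice.

```go
package main

import "fmt"

// lastOnly returns a one-element slice holding the most recently appended
// item. The guard for an empty input is an assumption of this sketch:
// ings[len(ings)-1:] on an empty slice panics at run time.
func lastOnly(ings []string) []string {
	if len(ings) == 0 {
		return nil
	}
	return ings[len(ings)-1:]
}

func main() {
	ings := []string{"default/a", "default/b"}
	ings = append(ings, "default/new") // the ingress under validation
	fmt.Println(lastOnly(ings))
}
```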
The subsequent generateTemplate and testTemplate steps are significantly reduced, to a single-ingress scenario:
```
content, err := n.generateTemplate(cfg, *pcfg)
...
err = n.testTemplate(content)
```
This flag does not skip the proper testing of collisions, nor of bad syntax within the rendered
configuration of the ingress.
But it does eliminate coverage of a scenario, which I wasn't able to reproduce, whereby for some reason even proper rendering
and valid values, without host/path collisions, may end up in an invalid nginx.conf.
The reasoning for this feature is:
- Test duration increases with the number of ingresses in the cluster.
- The file size grows very large: 150-200 MB on clusters with just ~2000 ingresses.
- Tests in that scenario take approximately 20s, even with the latest 0.48.1 improvements.
- It causes considerable memory and CPU consumption, which directly affects the containers
that serve traffic.
Since the flag is truly optional and disabled by default, I feel it is a good thing to have, as it can definitely
help large-scale setups that still want a reasonable set of tests in place at a lower cost.
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
Tested the following scenarios with the build kit on a cluster with ~1000 ingresses:
- With the flag disabled or not present (current behavior as of 0.48.1):
collision scenario (wrong snippet content):
`kubectl apply -f ../collision-syntax.yaml 0.18s user 0.05s system 3% cpu 6.639 total`
collision scenario (duplicated host):
`kubectl apply -f ../collision-host.yaml 0.17s user 0.05s system 3% cpu 6.245 total`
create/update:
`kubectl apply -f ing-215.yaml 0.16s user 0.05s system 3% cpu 5.845 total`
- With the flag enabled (true):
collision scenario (wrong snippet content):
`kubectl apply -f ../collision.yaml 0.18s user 0.02s system 57% cpu 0.347 total`
collision scenario (duplicated host):
`kubectl apply -f ../collision.yaml 0.21s user 0.06s system 85% cpu 0.318 total`
create/update:
`kubectl apply -f ing-973.yaml 0.17s user 0.03s system 72% cpu 0.271 total`
As part of the test, I verified that the nginx configuration created for the test was of a smaller size, and that it didn't negatively affect the final nginx.conf (of a much larger size) into which it was merged by the steps that follow the validation. I couldn't observe any other change in behaviour, and so far the routine looks simple and harmless.
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [x] My change requires a change to the documentation.
- [x] I have updated the documentation accordingly.
- [x] I've read the [CONTRIBUTION](https://github.com/kubernetes/ingress-nginx/blob/main/CONTRIBUTING.md) guide
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
For the test part, I would need to understand the placement and the test case that this would require; I wasn't able to find an existing scenario for this.
* Add a flag to specify address to bind the healthz server
Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
* Add healthz host to the helm chart
Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
* Apply suggestions from code review
Co-authored-by: Ricardo Katz <rikatz@users.noreply.github.com>
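A minimal sketch of how a bind host and a port might be combined into a listen address for a health server. The flag names `healthz-host` and `healthz-port`, the default port, and the helper `healthzAddr` are assumptions for illustration; the commit messages above do not spell them out.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"strconv"
)

// healthzAddr builds the listen address for the health server. An empty
// host yields ":port", which binds to all interfaces.
func healthzAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// Flag names and defaults below are hypothetical, chosen for the example.
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	host := fs.String("healthz-host", "", "address to bind the healthz server")
	port := fs.Int("healthz-port", 10254, "port of the healthz server")
	if err := fs.Parse([]string{"--healthz-host=127.0.0.1"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(healthzAddr(*host, *port))
}
```

Using `net.JoinHostPort` rather than string concatenation keeps IPv6 literals correct, since it adds the required brackets.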
* fix ingress-nginx panic when the certificate format is wrong.
Signed-off-by: wang_wenhu <976400757@qq.com>
* Add unit test.
Signed-off-by: wang_wenhu <976400757@qq.com>
* Update controller_test.go
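One common way a wrongly formatted certificate can cause a panic in Go is dereferencing the nil block that `pem.Decode` returns for non-PEM input. Whether that was the exact cause here is an assumption; the sketch below only shows the defensive pattern, with `parseCertificate` as a hypothetical helper name.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseCertificate decodes the first PEM block and parses it as an X.509
// certificate, returning an error instead of panicking on malformed input.
func parseCertificate(data []byte) (*x509.Certificate, error) {
	block, _ := pem.Decode(data)
	if block == nil {
		// pem.Decode returns a nil block when no PEM data is found;
		// reading block.Bytes without this check would panic.
		return nil, errors.New("no PEM block found in certificate data")
	}
	if block.Type != "CERTIFICATE" {
		return nil, fmt.Errorf("unexpected PEM block type %q", block.Type)
	}
	return x509.ParseCertificate(block.Bytes)
}

func main() {
	_, err := parseCertificate([]byte("not a certificate"))
	fmt.Println("error:", err)
}
```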