The goal of this Ingress controller is the assembly of a configuration file (nginx.conf). The main implication of this requirement is the need to reload NGINX after any change in the configuration file. It is important to note, though, that we don't reload NGINX on changes that impact only an upstream configuration (e.g., an Endpoints change when you deploy your app). We use lua-nginx-module to achieve this. Check below to learn more about how it's done.
Usually, a Kubernetes Controller utilizes the synchronization loop pattern to check if the desired state in the controller is updated or a change is required. For this purpose, we need to build a model using different objects from the cluster, in particular (in no special order) Ingresses, Services, Endpoints, Secrets, and ConfigMaps, to generate a point-in-time configuration file that reflects the state of the cluster.
To get these objects from the cluster, we use Kubernetes Informers, in particular, FilteredSharedInformer. These informers allow reacting to changes using callbacks invoked when an object is added, modified, or removed. Unfortunately, there is no way to know whether a particular change is going to affect the final configuration file. Therefore, on every change, we have to rebuild a new model from scratch based on the state of the cluster and compare it to the current model. If the new model equals the current one, then we avoid generating a new NGINX configuration and triggering a reload. Otherwise, we check if the difference is only about Endpoints. If so, we send the new list of Endpoints to a Lua handler running inside NGINX using an HTTP POST request and, again, avoid generating a new NGINX configuration and triggering a reload. If the difference between the running and new models is about more than just Endpoints, we create a new NGINX configuration based on the new model, replace the current model, and trigger a reload.
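In pseudocode-like Go, the decision described above might look like this (the type and function names are illustrative, not the controller's actual API):

```go
package main

import (
	"fmt"
	"reflect"
)

// Model is a simplified point-in-time view of the cluster state.
type Model struct {
	Servers   []string // servers, locations, TLS, etc.
	Endpoints []string // upstream endpoints only
}

// sync compares the running model with a freshly built one and decides
// between doing nothing, a Lua-only endpoints update, or a full reload.
func sync(current, fresh *Model) {
	switch {
	case reflect.DeepEqual(current, fresh):
		fmt.Println("models equal: skip reconfiguration and reload")
	case reflect.DeepEqual(current.Servers, fresh.Servers):
		fmt.Println("endpoints-only change: POST endpoints to Lua handler, skip reload")
	default:
		fmt.Println("structural change: write new nginx.conf and reload NGINX")
	}
}

func main() {
	running := &Model{Servers: []string{"example.com"}, Endpoints: []string{"10.0.0.1:8080"}}
	fresh := &Model{Servers: []string{"example.com"}, Endpoints: []string{"10.0.0.2:8080"}}
	sync(running, fresh) // endpoints-only change: no reload needed
}
```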
One of the uses of the model is to avoid unnecessary reloads when there's no change in the state and to detect conflicts in definitions.
The final representation of the NGINX configuration is generated from a Go template using the new model as input for the variables required by the template.
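For illustration, here is a toy Go program rendering a server block with text/template; the controller's real template (nginx.tmpl) and model are far richer than this sketch:

```go
package main

import (
	"os"
	"text/template"
)

// A toy server-block template; invented for this example.
const tmpl = `server {
    server_name {{ .Hostname }};
{{- range .Paths }}
    location {{ . }} {
        proxy_pass http://upstream_balancer;
    }
{{- end }}
}
`

// Server is a minimal stand-in for the model fed to the template.
type Server struct {
	Hostname string
	Paths    []string
}

func main() {
	t := template.Must(template.New("nginx").Parse(tmpl))
	// Render one server block from the model to stdout.
	if err := t.Execute(os.Stdout, Server{Hostname: "example.com", Paths: []string{"/", "/api"}}); err != nil {
		panic(err)
	}
}
```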
Building a model is an expensive operation; for this reason, the use of the synchronization loop is a must. By using a work queue it is possible to not lose changes and to remove the use of sync.Mutex to force a single execution of the sync loop. Additionally, it is possible to create a time window between the start and end of the sync loop that allows us to discard unnecessary updates. It is important to understand that any change in the cluster could generate events that the informer will send to the controller, which is one of the reasons for the work queue.
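A minimal, self-contained Go sketch of this coalescing behavior, using a plain channel in place of the real work queue (the actual controller relies on client-go's workqueue package):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	events := make(chan string, 100) // events pushed by informer callbacks

	// Simulate a burst of informer events.
	go func() {
		for _, e := range []string{"endpoints/a", "ingress/b", "secret/c"} {
			events <- e
		}
		close(events)
	}()

	for first := range events {
		batch := []string{first}
		// Time window: keep draining events briefly so a burst of
		// changes triggers a single model rebuild instead of many.
	drain:
		for {
			select {
			case e, ok := <-events:
				if !ok {
					break drain
				}
				batch = append(batch, e)
			case <-time.After(300 * time.Millisecond):
				break drain
			}
		}
		fmt.Printf("rebuilding model once for %d events: %v\n", len(batch), batch)
	}
}
```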
Operations to build the model (a sketch of the ordering and merging logic follows this list):
Order Ingress rules by CreationTimestamp field, i.e., old rules first.
If the same path for the same host is defined in more than one Ingress, the oldest rule wins.
If more than one Ingress contains a TLS section for the same host, the oldest rule wins.
If multiple Ingresses define an annotation that affects the configuration of the Server block, the oldest rule wins.
Create a list of NGINX Servers (per hostname).
Create a list of NGINX Upstreams.
If multiple Ingresses define different paths for the same host, the ingress controller will merge the definitions.
Annotations are applied to all the paths in the Ingress.
Multiple Ingresses can define different annotations. These definitions are not shared between Ingresses.
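As a rough illustration of the oldest-wins and merge rules above, here is a minimal Go sketch; the types are invented for the example, while the real controller operates on networking.k8s.io/v1 objects:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Rule is a flattened host+path rule with its Ingress creation time.
type Rule struct {
	Host, Path string
	Created    time.Time
}

func main() {
	rules := []Rule{
		{"example.com", "/", time.Date(2023, 5, 1, 0, 0, 0, 0, time.UTC)},
		{"example.com", "/", time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC)},
		{"example.com", "/api", time.Date(2023, 3, 1, 0, 0, 0, 0, time.UTC)},
	}

	// Old rules first, mirroring the CreationTimestamp ordering.
	sort.Slice(rules, func(i, j int) bool { return rules[i].Created.Before(rules[j].Created) })

	// The first (oldest) definition of a host+path wins; newer duplicates
	// are dropped. Different paths for the same host merge into one server.
	servers := map[string]map[string]Rule{}
	for _, r := range rules {
		if servers[r.Host] == nil {
			servers[r.Host] = map[string]Rule{}
		}
		if _, taken := servers[r.Host][r.Path]; !taken {
			servers[r.Host][r.Path] = r
		}
	}
	fmt.Println(servers) // one server for example.com with "/" and "/api"
}
```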
In some cases, it is possible to avoid reloads, in particular when there is a change in the endpoints, i.e., a pod is started or replaced. It is out of the scope of this Ingress controller to remove reloads completely. This would require an incredible amount of work and at some point would make no sense. This can change only if NGINX changes the way new configurations are read; basically, new changes would not replace worker processes.
On every endpoint change the controller fetches endpoints from all the services it sees and generates corresponding Backend objects. It then sends these objects to a Lua handler running inside NGINX. The Lua code in turn stores those backends in a shared memory zone. Then, for every request, Lua code running in the balancer_by_lua context detects which endpoints it should choose the upstream peer from and applies the configured load-balancing algorithm to choose the peer. NGINX then takes care of the rest. This way we avoid reloading NGINX on endpoint changes. Note that this also covers annotation changes that affect only the upstream configuration in NGINX.
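As a sketch, the controller-side push could look like the following Go program; the local URL and payload shape here are assumptions for illustration, not the controller's exact wire format:

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

// Backend is a simplified stand-in for the controller's Backend object.
type Backend struct {
	Name      string   `json:"name"`
	Endpoints []string `json:"endpoints"`
}

func main() {
	backends := []Backend{
		{Name: "default-svc-80", Endpoints: []string{"10.0.0.1:8080", "10.0.0.2:8080"}},
	}
	buf, err := json.Marshal(backends)
	if err != nil {
		log.Fatal(err)
	}
	// POST the new backend list to the Lua handler running inside NGINX
	// (hypothetical local URL for this example).
	resp, err := http.Post("http://127.0.0.1:10246/configuration/backends", "application/json", bytes.NewReader(buf))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```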
In a relatively big cluster with frequently deployed apps, this feature saves a significant number of NGINX reloads, which can otherwise affect response latency, load-balancing quality (after every reload NGINX resets the state of load balancing), and so on.
Because the ingress controller works using the synchronization loop pattern, it applies the configuration for all matching objects. In case some Ingress objects have a broken configuration, for example a syntax error in the nginx.ingress.kubernetes.io/configuration-snippet annotation, the generated configuration becomes invalid, NGINX does not reload, and hence no further Ingresses will be taken into account.
To prevent this situation from happening, the nginx ingress controller optionally exposes a validating admission webhook server to ensure the validity of incoming Ingress objects. This webhook appends the incoming Ingress objects to the list of Ingresses, generates the configuration, and calls nginx to ensure the configuration has no syntax errors.
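For reference, a trimmed sketch of such a webhook registration; the names, namespace, and the (omitted) caBundle are placeholders that the official manifests or Helm chart fill in for you:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: ingress-nginx-admission   # placeholder name
webhooks:
  - name: validate.nginx.ingress.kubernetes.io
    rules:
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
    clientConfig:
      service:
        name: ingress-nginx-controller-admission   # placeholder service
        namespace: ingress-nginx
        path: /networking/v1/ingresses
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
```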
FAQ - Migration to Kubernetes 1.22 and apiVersion networking.k8s.io/v1
If you are using Ingress objects in your cluster (running Kubernetes older than v1.22), and you plan to upgrade to Kubernetes v1.22, this page is relevant to you.
What is an IngressClass and why is it important for users of ingress-nginx controller now?
IngressClass is a Kubernetes resource. See the description below. It's important because until now, a default install of the ingress-nginx controller did not require an IngressClass object. From version 1.0.0 of the ingress-nginx controller, an IngressClass object is required.
On clusters with more than one instance of the ingress-nginx controller, all instances of the controllers must be aware of which Ingress objects they serve. The ingressClassName field of an Ingress is the way to let the controller know about that.
kubectl explain ingressclass
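For example, a minimal IngressClass for this controller can look like this (k8s.io/ingress-nginx is the controller value used by default installs):

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  controller: k8s.io/ingress-nginx
```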
Then add the value spec.ingressClassName=nginx in your Ingress objects.
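A minimal Ingress using the new field might look like this (the names are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example   # placeholder
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-svc   # placeholder
                port:
                  number: 80
```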
I have many ingress objects in my cluster. What should I do?
If you have a lot of ingress objects without ingressClass configuration, you can run the ingress controller with the flag --watch-ingress-without-class=true.
It's a flag that is passed, as an argument, to the nginx-ingress-controller executable. In the configuration, it looks like this:
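For example, a sketch of the relevant fragment of the controller Deployment (the container name and surrounding arguments are illustrative):

```yaml
spec:
  template:
    spec:
      containers:
        - name: controller
          args:
            - /nginx-ingress-controller
            # ... other controller arguments ...
            - --watch-ingress-without-class=true
```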
I have more than one controller in my cluster, and I'm already using the annotation
No problem. This should still keep working, but we highly recommend you test! Even though kubernetes.io/ingress.class is deprecated, the ingress-nginx controller still understands that annotation. If you want to follow good practice, you should consider migrating to use IngressClass and .spec.ingressClassName.
I have more than one controller running in my cluster, and I want to use the new API
In this scenario, you need to create multiple IngressClasses (see the example below).
Be aware that IngressClass works in a very specific way: you will need to change the .spec.controller value in your IngressClass and configure the controller to expect the exact same value.
Let's see an example, supposing that you have three IngressClasses:
IngressClass ingress-nginx-one, with .spec.controller equal to example.com/ingress-nginx1
IngressClass ingress-nginx-two, with .spec.controller equal to example.com/ingress-nginx2
IngressClass ingress-nginx-three, with .spec.controller equal to example.com/ingress-nginx1
For private use, you can also use a controller name that doesn't contain a /, e.g. ingress-nginx1.
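For instance, ingress-nginx-two from the list above could be defined like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: ingress-nginx-two
spec:
  controller: example.com/ingress-nginx2
```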
When deploying your ingress controllers, you will have to change the --controller-class field as follows:
Ingress-Nginx A, configured to use controller class name example.com/ingress-nginx1
Ingress-Nginx B, configured to use controller class name example.com/ingress-nginx2
When you create an Ingress object with its ingressClassName set to ingress-nginx-two, only controllers looking for the example.com/ingress-nginx2 controller class pay attention to the new object.
Given that Ingress-Nginx B is set up that way, it will serve that object, whereas Ingress-Nginx A ignores the new Ingress.
Bear in mind that if you start Ingress-Nginx B with the command line argument --watch-ingress-without-class=true, it will serve:
Ingresses without any ingressClassName set
Ingresses where the deprecated annotation (kubernetes.io/ingress.class) matches the value set in the command line argument --ingress-class
Ingresses that refer to any IngressClass that has the same spec.controller as configured in --controller-class
If you start Ingress-Nginx B with the command line argument --watch-ingress-without-class=true and you run Ingress-Nginx A with the command line argument --watch-ingress-without-class=false then this is a supported configuration. If you have two ingress-nginx controllers for the same cluster, both running with --watch-ingress-without-class=true then there is likely to be a conflict.
Why am I seeing "ingress class annotation is not equal to the expected by Ingress Controller" in my controller logs?
It is highly likely that you will also see the name of the ingress resource in the same error message. This error message has been observed when using the deprecated annotation (kubernetes.io/ingress.class) in an Ingress resource manifest. It is recommended to use the .spec.ingressClassName field of the Ingress resource to specify the name of the IngressClass of the Ingress you are defining.
How can I easily install multiple instances of the ingress-nginx controller in the same cluster?
You can install them in different namespaces.
Create a new namespace
kubectl create namespace ingress-nginx-2
Use Helm to install the additional instance of the ingress controller
We assume that you already have the Helm repo for the ingress-nginx controller added to your Helm config. If you have not added it yet, you can add the repo and then install the additional instance as shown below.
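A sketch of the commands involved; the chart value names may differ between chart versions, so check the chart's documentation for the exact options:

```bash
# Add the ingress-nginx Helm repo if it is not configured yet.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Install a second instance into its own namespace, with its own
# IngressClass, controller class, and election ID (illustrative values).
helm install ingress-nginx-2 ingress-nginx/ingress-nginx \
  --namespace ingress-nginx-2 \
  --set controller.ingressClassResource.name=nginx-two \
  --set controller.ingressClassResource.controllerValue="example.com/ingress-nginx-2" \
  --set controller.electionID=ingress-controller-2-leader
```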
By default NGINX uses the content of the header X-Forwarded-For as the source of truth to get information about the client IP address. This works without issues in L7 if we configure the setting proxy-real-ip-cidr with the correct IP/network address of the trusted external load balancer.
If the ingress controller is running in AWS we need to use the VPC IPv4 CIDR.
Another option is to enable proxy protocol using use-proxy-protocol: "true".
In this mode NGINX does not use the content of the header to get the source IP address of the connection.
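For example, both settings live in the controller's ConfigMap; a minimal sketch (the resource name, namespace, and CIDR are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # use your controller's ConfigMap
  namespace: ingress-nginx
data:
  proxy-real-ip-cidr: "10.0.0.0/16"   # CIDR of the trusted load balancer / VPC
  use-proxy-protocol: "true"          # alternative: enable proxy protocol instead
```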
Each path in an Ingress is required to have a corresponding path type. Paths that do not include an explicit pathType will fail validation. By default, the NGINX path type is Prefix, so as not to break existing definitions.
If you are using an L4 proxy to forward the traffic to the NGINX pods and terminate HTTP/HTTPS there, you will lose the remote endpoint's IP address. To prevent this you could use the Proxy Protocol for forwarding traffic; this will send the connection details before forwarding the actual TCP connection itself.
Since 1.9.13 NGINX will not retry non-idempotent requests (POST, LOCK, PATCH) in case of an error. The previous behavior can be restored using retry-non-idempotent=true in the configuration ConfigMap.
The NGINX ingress controller does not use Services to route traffic to the pods. Instead it uses the Endpoints API in order to bypass kube-proxy to allow NGINX features like session affinity and custom load balancing algorithms. It also removes some overhead, such as conntrack entries for iptables DNAT.
By default, deploying multiple Ingress controllers (e.g., ingress-nginx & gce) will result in all controllers simultaneously racing to update Ingress status fields in confusing ways.
To fix this problem, use IngressClasses. The kubernetes.io/ingress.class annotation is not preferred or suggested, as it can be deprecated in the future. It is better to use the field ingress.spec.ingressClassName. But when a user has deployed with scope.enabled, then the ingress class resource field is not used.
If all ingress controllers respect IngressClasses (e.g. multiple instances of ingress-nginx v1.0), you can deploy two Ingress controllers by granting them control over two different IngressClasses, then selecting one of the two IngressClasses with ingressClassName.
First, ensure the --controller-class= and --ingress-class are set to something different on each ingress controller. If your additional ingress controller is to be installed in a namespace where there are already one or more ingress-nginx controllers installed, then you need to specify a different, unique --election-id for the new instance of the controller.
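A sketch of the relevant arguments on the second controller's Deployment (all values are illustrative):

```yaml
args:
  - /nginx-ingress-controller
  - --ingress-class=nginx-two
  - --controller-class=example.com/ingress-nginx-2
  - --election-id=ingress-controller-2-leader
```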