* _Service Account:_ This is recommended, because nothing has to be configured. The Ingress controller will use information provided by the system to communicate with the API server. See 'Service Account' section for details.
* _Kubeconfig file:_ In some Kubernetes environments service accounts are not available. In this case a manual configuration is required. The Ingress controller binary can be started with the `--kubeconfig` flag. The value of the flag is a path to a file specifying how to connect to the API server. Using the `--kubeconfig` does not requires the flag `--apiserver-host`.
* _Using the flag `--apiserver-host`:_ Using this flag `--apiserver-host=http://localhost:8080` it is possible to specify an unsecured API server or reach a remote kubernetes cluster using [kubectl proxy](https://kubernetes.io/docs/user-guide/kubectl/kubectl_proxy/).
add the flag `--kubeconfig=/etc/kubernetes/kubeconfig.yaml` to the args section of the deployment.
## Using GDB with Nginx
[Gdb](https://www.gnu.org/software/gdb/) can be used to with nginx to perform a configuration
dump. This allows us to see which configuration is being used, as well as older configurations.
Note: The below is based on the nginx [documentation](https://docs.nginx.com/nginx/admin-guide/monitoring/debugging/#dumping-nginx-configuration-from-a-running-process).
## Image related issues faced on Nginx 4.2.5 or other versions (Helm chart versions)
1. Incase you face below error while installing Nginx using helm chart (either by helm commands or helm_release terraform provider )
```
Warning Failed 5m5s (x4 over 6m34s) kubelet Failed to pull image "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": rpc error: code = Unknown desc = failed to pull and unpack image "registry.k8s.io/ingress-nginx/kube-webhook-certgen@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": failed to resolve reference "registry.k8s.io/ingress-nginx/kube-webhook-certgen@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": failed to do request: Head "https://eu.gcr.io/v2/k8s-artifacts-prod/ingress-nginx/kube-webhook-certgen/manifests/sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": EOF
```
Then please follow the below steps.
2. During troubleshooting you can also execute the below commands to test the connectivities from you local machines and repositories details
a. curl registry.k8s.io/ingress-nginx/kube-webhook-certgen@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47 > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
(⎈ |myprompt)➜ ~
```
b. curl -I https://eu.gcr.io/v2/k8s-artifacts-prod/ingress-nginx/kube-webhook-certgen/manifests/sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47
Redirection in the proxy is implemented to ensure the pulling of the images.
3. This is the solution recommended to whitelist the below image repositories :
```
*.appspot.com
*.k8s.io
*.pkg.dev
*.gcr.io
```
More details about the above repos :
a. *.k8s.io -> To ensure you can pull any images from registry.k8s.io
b. *.gcr.io -> GCP services are used for image hosting. This is part of the domains suggested by GCP to allow and ensure users can pull images from their container registry services.
c. *.appspot.com -> This a Google domain. part of the domain used for GCR.
One possible reason for this error is lack of permission to bind to the port. Ports 80, 443, and any other port <1024areLinuxprivilegedportswhichhistoricallycouldonlybeboundbyroot.Theingress-nginx-controllerusestheCAP_NET_BIND_SERVICE [linux capability](https://man7.org/linux/man-pages/man7/capabilities.7.html) toallowbindingtheseportsasanormaluser(www-data/101).Thisinvolvestwocomponents:
1. In the image, the /nginx-ingress-controller file has the cap_net_bind_service capability added (e.g. via [setcap](https://man7.org/linux/man-pages/man8/setcap.8.html))
2. The NET_BIND_SERVICE capability is added to the container in the containerSecurityContext of the deployment.
If encountering this on one/some node(s) and not on others, try to purge and pull a fresh copy of the image to the affected node(s), in case there has been corruption of the underlying layers to lose the capability on the executable.
### Create a test pod
The /nginx-ingress-controller process exits/crashes when encountering this error, making it difficult to troubleshoot what is happening inside the container. To get around this, start an equivalent container running "sleep 3600", and exec into it for further troubleshooting. For example:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: ingress-nginx-sleep
namespace: default
labels:
app: nginx
spec:
containers:
- name: nginx
image: ##_CONTROLLER_IMAGE_##
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1"
command: ["sleep"]
args: ["3600"]
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 101
restartPolicy: Never
nodeSelector:
kubernetes.io/hostname: ##_NODE_NAME_##
tolerations:
- key: "node.kubernetes.io/unschedulable"
operator: "Exists"
effect: NoSchedule
```
* update the namespace if applicable/desired
* replace `##_NODE_NAME_##` with the problematic node (or remove nodeSelector section if problem is not confined to one node)
* replace `##_CONTROLLER_IMAGE_##` with the same image as in use by your ingress-nginx deployment
* confirm the securityContext section matches what is in place for ingress-nginx-controller pods in your cluster
Apply the YAML and open a shell into the pod.
Try to manually run the controller process:
```console
$ /nginx-ingress-controller
```
You should get the same error as from the ingress controller pod logs.
Confirm the capabilities are properly surfacing into the pod:
```console
$ grep CapBnd /proc/1/status
CapBnd: 0000000000000400
```
The above value has only net_bind_service enabled (per security context in YAML which adds that and drops all). If you get a different value, then you can decode it on another linux box (capsh not available in this container) like below, and then figure out why specified capabilities are not propagating into the pod/container.
```console
$ capsh --decode=0000000000000400
0x0000000000000400=cap_net_bind_service
```
## Create a test pod as root
(Note, this may be restricted by PodSecurityPolicy, PodSecurityAdmission/Standards, OPA Gatekeeper, etc. in which case you will need to do the appropriate workaround for testing, e.g. deploy in a new namespace without the restrictions.)
To test further you may want to install additional utilities, etc. Modify the pod yaml by:
* changing runAsUser from 101 to 0
* removing the "drop..ALL" section from the capabilities.
Some things to try after shelling into this container:
Try running the controller as the www-data (101) user:
```console
$ chmod 4755 /nginx-ingress-controller
$ /nginx-ingress-controller
```
Examine the errors to see if there is still an issue listening on the port or if it passed that and moved on to other expected errors due to running out of context.
Install the libcap package and check capabilities on the file:
```console
$ apk add libcap
(1/1) Installing libcap (2.50-r0)
Executing busybox-1.33.1-r7.trigger
OK: 26 MiB in 41 packages
$ getcap /nginx-ingress-controller
/nginx-ingress-controller cap_net_bind_service=ep
```
(if missing, see above about purging image on the server and re-pulling)
Strace the executable to see what system calls are being executed when it fails:
```console
$ apk add strace
(1/1) Installing strace (5.12-r0)
Executing busybox-1.33.1-r7.trigger
OK: 28 MiB in 42 packages
$ strace /nginx-ingress-controller
execve("/nginx-ingress-controller", ["/nginx-ingress-controller"], 0x7ffeb9eb3240 /* 131 vars */) = 0