Troubleshooting Traefik routing and Jaeger for observability
As a continuation of the last article on configuring Traefik: there have been quite a few moments when routing did not work as expected, so it is worth sharing how I went about troubleshooting.
Items I tried
Traefik access log
To me, the access log is important for knowing which requests actually go through Traefik.
By default, the Helm chart sets the log level to ERROR and leaves the access log disabled.
I overrode the Helm chart values YAML to set the log level to DEBUG and to enable the access log.
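A minimal sketch of such a values override, assuming the official traefik/traefik Helm chart, where these settings sit under the logs key. Key names can differ between chart versions, so check the values.yaml of the chart you actually use:

```yaml
# values override for the Traefik Helm chart (sketch; verify keys against your chart version)
logs:
  general:
    # chart default is ERROR; DEBUG is very verbose, use it only while troubleshooting
    level: DEBUG
  access:
    # the access log is disabled by default
    enabled: true
```

The file is then passed to helm install or helm upgrade with the -f switch.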
According to the documentation for the log and the access log, both can be set through file-based configuration (a YAML file defined in a ConfigMap and mounted as a volume). However, I did not find the switch for enabling or disabling the access log in that configuration.
Inspecting the Helm chart's pod template further shows that both the log level and the access log switch are passed as command line interface (CLI) arguments.
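For illustration, the rendered container spec would then carry arguments along these lines. This is a sketch rather than the chart's exact output: the container name is hypothetical, while --log.level and --accesslog are Traefik's static-configuration CLI flags:

```yaml
# fragment of the Traefik pod spec (sketch; other args omitted)
containers:
  - name: traefik            # hypothetical container name
    args:
      - "--log.level=DEBUG"  # general log level
      - "--accesslog=true"   # turns the access log on
```

The arguments actually in effect can be confirmed by dumping the deployment with kubectl get deployment <traefik deployment name> -o yaml.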
We can then monitor the log by running the kubectl command:
# the -f switch means "follow": the screen updates as new log lines come in
kubectl logs svc/<traefik service name> -f
Then we hit our site and read the log. The number of log lines is overwhelming, so we look for the entries that show our request being forwarded.
Several pieces of information can be noticed there. First, the request path is /api/device/systemTime and the X-Replaced-Path is “/device/systemTime”, which aligns with the middleware that rewrites the path from /device/* to /api/device/*.
Also, the forward URL goes to 10.109.0.85:8031, which is one of the pods of the target service (one can check this with kubectl get pod -o=wide).
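For reference, one way to get such a rewrite is a replacePathRegex middleware, which stores the original path in the X-Replaced-Path header. The sketch below is an assumption and not necessarily the middleware from the previous article; the name is hypothetical, and the apiVersion is the one used by Traefik v2 CRDs (newer releases use traefik.io/v1alpha1):

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: device-to-api        # hypothetical name
spec:
  replacePathRegex:
    # /device/systemTime -> /api/device/systemTime
    regex: ^/device/(.*)
    replacement: /api/device/$1
```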
Traefik dashboard
Traefik comes with a dashboard. One can follow the Traefik documentation to set it up, either in insecure mode or in secure mode with authentication backed by a Kubernetes secret.
By default, an IngressRoute is added with the rule:
(PathPrefix(`/api`) || PathPrefix(`/dashboard`))
This is not enough: it does not specify a hostname, so other services matching the same pattern (most likely the /api prefix) may conflict with it. The documentation recommends adding a Host matcher to the rule.
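A sketch of a dashboard IngressRoute with a Host matcher, modeled on the pattern in the Traefik documentation. The hostname and resource name are placeholders, and the apiVersion assumes Traefik v2 CRDs (newer releases use traefik.io/v1alpha1):

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-dashboard    # hypothetical name
spec:
  routes:
    # the Host matcher keeps /api from colliding with other services
    - match: Host(`traefik.example.com`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))
      kind: Rule
      services:
        # api@internal is Traefik's built-in API and dashboard service
        - name: api@internal
          kind: TraefikService
```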
And of course, if we do not want to configure DNS, we can update the hosts file to point the domain name to the Traefik load balancer IP address.
To access the dashboard, use a URL of the form:
http://<traefik dashboard hostname>/dashboard/
In my scenario, the URL did not work without the trailing “/” after dashboard (it somehow got matched to another route).
The dashboard helped me see the list of entrypoints and confirm that the configuration took effect as expected.
The routers view also showed some errors, and drilling down shows the list of rules as well as the details (the middleware applied and the cause of the error).
Jaeger
I learned about Jaeger through the Traefik documentation (Observability > Tracing > Jaeger). So far I have only used it to inspect the details of a particular request and the middleware applied to it.
I installed Jaeger with the Jaeger Operator Helm chart (its GitHub repository has installation instructions).
A minimal deployment needs only a short Kubernetes YAML manifest.
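With the Jaeger Operator installed, a minimal manifest could look like the sketch below. The resource name is an assumption; it is chosen so that the services the operator creates get the jaeger- prefix (for example jaeger-query for the UI):

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger               # assumed name; services are created as <name>-query, <name>-agent, ...
```

With no further settings the operator falls back to its defaults, which is enough for a quick look at traces but not meant for production.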
Then, on the Traefik side, enable Jaeger tracing through CLI arguments, for example with “additionalArguments” in the Helm values or any other way of passing CLI flags.
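A sketch of the corresponding Helm values, assuming Traefik v2 (where the Jaeger tracing backend and these flags exist) and a Jaeger agent reachable as jaeger-agent in the default namespace. The service name and namespace are assumptions, so adjust them to your deployment:

```yaml
additionalArguments:
  # enable the Jaeger tracing backend (Traefik v2)
  - "--tracing.jaeger=true"
  # where Traefik fetches the sampling strategy from
  - "--tracing.jaeger.samplingServerURL=http://jaeger-agent.default.svc:5778/sampling"
  # host:port of the Jaeger agent that receives spans over UDP
  - "--tracing.jaeger.localAgentHostPort=jaeger-agent.default.svc:6831"
```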
Finally, add an IngressRoute rule so that the Jaeger UI is reachable through Traefik.
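A sketch of such an IngressRoute, assuming the query service is named jaeger-query and serves the UI on Jaeger's default port 16686, and that Traefik has an entrypoint called web. The resource name is hypothetical, and the apiVersion again assumes Traefik v2 CRDs:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: jaeger-query         # hypothetical name
spec:
  entryPoints:
    - web                    # assumed entrypoint name
  routes:
    - match: Host(`jaeger-query`)
      kind: Rule
      services:
        - name: jaeger-query # assumed service name
          port: 16686        # Jaeger UI default port
```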
Add a record to the hosts file pointing “jaeger-query” to the Traefik load balancer IP address, then open a browser at http://jaeger-query.
The Jaeger UI then lists the recorded traces, and clicking one of them shows more detail.
Jaeger is far more powerful for traceability; I just haven’t learned to use more of it yet.
In this article, I shared the tools I used to troubleshoot and trace Traefik routes. I hope it helps someone.