Monitoring Deployment Guide#
This guide shows how to connect AUP Learning Cloud to a Prometheus and Grafana monitoring stack. It uses kube-prometheus-stack as the recommended example, then shows how to reuse an existing Prometheus Operator and Grafana deployment.
The AUP Learning Cloud Helm chart can create the monitoring resources needed for Hub metrics: a ServiceMonitor, optional Grafana dashboard ConfigMaps, optional Prometheus alert rules, a metrics NetworkPolicy, and a token secret when authenticated scraping is enabled.
Prerequisites#
A Kubernetes cluster with AUP Learning Cloud installed or ready to install.
kubectl access with permission to create resources in the monitoring and jupyterhub namespaces.
Helm 3 installed locally.
Access to the AUP Learning Cloud deployment repository that contains runtime/values.yaml and runtime/chart.
Install kube-prometheus-stack#
kube-prometheus-stack is the recommended reference deployment for Prometheus Operator, Prometheus, Alertmanager, and Grafana.
Artifact Hub page: https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack
1. Create the monitoring namespace#
kubectl create namespace monitoring
If the namespace already exists, this command can return an AlreadyExists error. That is safe to ignore.
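If this step runs from a script that has to survive reruns, one common idiom is to render the Namespace manifest locally and apply it. The following is a minimal sketch; the helper name ensure_namespace is illustrative and not part of AUP Learning Cloud.

```shell
# Create a namespace if it is missing; rerunning does not raise AlreadyExists.
# ensure_namespace is an illustrative helper name.
ensure_namespace() {
  ns="$1"
  # --dry-run=client only renders the Namespace manifest;
  # kubectl apply then creates it or leaves an existing namespace unchanged.
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
}
```

For example, ensure_namespace monitoring can replace the plain create command above.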
2. Add the Helm repository#
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
3. Install the stack#
Use the Helm release name monitoring in the monitoring namespace. This matches the default AUP Learning Cloud monitoring.releaseLabel: monitoring value.
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring
The Prometheus Operator installed by this stack usually selects ServiceMonitor and PrometheusRule objects with the label release: monitoring. If you use a different Helm release name or custom selector, update monitoring.releaseLabel in AUP Learning Cloud to match that selector.
4. Check the monitoring pods#
kubectl -n monitoring get pods
kubectl -n monitoring get svc
Wait until the Prometheus Operator, Prometheus, and Grafana pods are running.
A working kube-prometheus-stack deployment should include pods similar to these:
alertmanager-monitoring-kube-prometheus-alertmanager-0 2/2 Running
monitoring-grafana-... 3/3 Running
monitoring-kube-prometheus-operator-... 1/1 Running
monitoring-kube-state-metrics-... 1/1 Running
prometheus-monitoring-kube-prometheus-prometheus-0 2/2 Running
The exact pod names and replica counts depend on the chart version and your cluster configuration.
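Instead of polling by hand, the wait can be scripted with kubectl wait. This is a sketch under the assumption that every pod in the namespace is expected to become Ready; the helper name wait_for_pods and the 300s default are illustrative.

```shell
# Block until every pod in the namespace reports Ready, or fail after the timeout.
# wait_for_pods is an illustrative helper name.
wait_for_pods() {
  ns="${1:-monitoring}"
  timeout="${2:-300s}"
  kubectl -n "$ns" wait --for=condition=Ready pods --all --timeout="$timeout"
}
```

Call it as wait_for_pods monitoring 300s. Pods that belong to completed Jobs never become Ready, so the command can time out if such pods remain in the namespace.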
Reuse an Existing Prometheus and Grafana Stack#
If your cluster already has Prometheus Operator and Grafana, you don’t need to install kube-prometheus-stack again. Instead, confirm these points with the monitoring owner:
The Prometheus Operator watches ServiceMonitor resources in the monitoring namespace.
Prometheus can scrape services in the jupyterhub namespace.
The operator selector matches the label used by AUP Learning Cloud. The chart creates ServiceMonitor and PrometheusRule resources with release: <monitoring.releaseLabel>.
Grafana sidecar dashboard discovery reads ConfigMaps from the monitoring namespace with grafana_dashboard: "1", if you want the AUP Learning Cloud dashboards to appear automatically.
For example, if the existing Prometheus stack selects release: platform-monitoring, set:
monitoring:
  releaseLabel: platform-monitoring
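To find out which label an existing stack selects, you can read the selectors from the Prometheus custom resource. The sketch below assumes the Prometheus object lives in the monitoring namespace; the helper name show_prometheus_selectors is illustrative.

```shell
# Print each Prometheus object with its ServiceMonitor and PrometheusRule selectors.
# show_prometheus_selectors is an illustrative helper name.
show_prometheus_selectors() {
  ns="${1:-monitoring}"
  kubectl -n "$ns" get prometheus \
    -o custom-columns='NAME:.metadata.name,SERVICEMONITOR_SELECTOR:.spec.serviceMonitorSelector,RULE_SELECTOR:.spec.ruleSelector'
}
```

The release value shown under matchLabels in the selector columns is the value to use for monitoring.releaseLabel.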
Configure AUP Learning Cloud Monitoring Values#
Edit runtime/values.yaml and enable the monitoring options you need.
Recommended production configuration:
monitoring:
  enabled: true
  namespace: monitoring
  releaseLabel: monitoring
  hubMetrics:
    enabled: true
    allowUnauthenticatedScrape: false
    serviceAnnotations:
      enabled: false
  serviceMonitor:
    enabled: true
    interval: 15s
    authorization:
      enabled: true
      type: Bearer
      hubServiceName: prometheus-metrics
      secret:
        create: true
        name: ""
        key: token
  grafana:
    dashboard:
      enabled: true
  prometheusRule:
    enabled: true
Value Reference#
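| Value | Description |
|---|---|
| monitoring.enabled | Master switch for AUP Learning Cloud monitoring resources. |
| monitoring.namespace | Namespace where monitoring objects are created. |
| monitoring.releaseLabel | Value used for the release label on the ServiceMonitor and PrometheusRule resources. |
| monitoring.hubMetrics.enabled | Enables Hub metrics integration. The chart also creates a metrics NetworkPolicy. |
| monitoring.hubMetrics.allowUnauthenticatedScrape | Allows unauthenticated scraping of the Hub metrics endpoint. |
| monitoring.hubMetrics.serviceAnnotations.enabled | Adds scrape annotations to the Hub service. |
| monitoring.serviceMonitor.enabled | Creates a ServiceMonitor for Hub metrics. |
| monitoring.serviceMonitor.interval | Scrape interval for the Hub metrics endpoint, such as 15s. |
| monitoring.serviceMonitor.authorization.enabled | Adds ServiceMonitor authorization settings. |
| monitoring.serviceMonitor.authorization.type | Authorization type passed to the ServiceMonitor. The default is Bearer. |
| monitoring.serviceMonitor.authorization.hubServiceName | JupyterHub service account used for the metrics token. The default is prometheus-metrics. |
| monitoring.serviceMonitor.authorization.secret.create | Creates a token secret in the monitoring namespace when set to true. |
| monitoring.serviceMonitor.authorization.secret.name | Optional existing or custom secret name. Leave empty to use the chart-generated name. |
| monitoring.serviceMonitor.authorization.secret.key | Secret key that stores the token. The default is token. |
| monitoring.grafana.dashboard.enabled | Creates Grafana dashboard ConfigMaps in the monitoring namespace with label grafana_dashboard: "1". |
| monitoring.prometheusRule.enabled | Creates Prometheus alert rules for the Hub. |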
Apply the AUP Learning Cloud Configuration#
Run the upgrade from the deploy directory, starting at the deployment repository root.
cd deploy
helm upgrade jupyterhub ../runtime/chart --namespace jupyterhub \
-f ../runtime/values.yaml
If your deployment uses an additional local or environment-specific values file, include it in the same command. For example:
helm upgrade jupyterhub ../runtime/chart --namespace jupyterhub \
-f ../runtime/values.yaml -f ../runtime/values.local.yaml
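Before upgrading, it can help to preview which resource kinds the chart would render with your values. The sketch below counts the kind: lines in a rendered manifest read from standard input; the helper name count_kinds is illustrative.

```shell
# Count the resource kinds in a rendered manifest read from stdin.
# count_kinds is an illustrative helper name.
# Usage, from the deploy directory:
#   helm template jupyterhub ../runtime/chart --namespace jupyterhub \
#     -f ../runtime/values.yaml | count_kinds
count_kinds() {
  # Top-level "kind:" lines identify each rendered object.
  grep -E '^kind:' | sort | uniq -c
}
```

With monitoring enabled, kinds such as ServiceMonitor and PrometheusRule should appear in the output. helm template renders locally without contacting the cluster, so the result can differ from what helm upgrade applies if the chart depends on cluster state.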
Verify the Setup#
Check that the AUP Learning Cloud monitoring resources exist:
kubectl -n monitoring get servicemonitor hub-metrics
kubectl -n monitoring get secret | grep metrics-token
kubectl -n monitoring get configmap grafana-dashboard-aup-hub
kubectl -n jupyterhub get networkpolicy hub-metrics
If monitoring.prometheusRule.enabled: true, also check the Hub alert rule:
kubectl -n monitoring get prometheusrule hub-alerts
A working cluster with ServiceMonitor, authenticated scraping, Grafana dashboards, and metrics NetworkPolicy enabled should show objects like this:
servicemonitor.monitoring.coreos.com/hub-metrics
secret/hub-metrics-token
configmap/grafana-dashboard-aup-hub
networkpolicy.networking.k8s.io/hub-metrics
Check that Prometheus sees the Hub target:
kubectl -n monitoring port-forward svc/monitoring-kube-prometheus-prometheus 9090:9090
Open http://127.0.0.1:9090/targets and look for the hub-metrics target. It should be UP.
You can also verify from the Prometheus API. With the port-forward still running, query the Hub scrape target:
curl -fsSL 'http://127.0.0.1:9090/api/v1/query?query=up%7Bjob%3D%22hub%22%7D'
A healthy result contains "job":"hub", "namespace":"jupyterhub", and a final value of "1":
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "job": "hub",
          "namespace": "jupyterhub",
          "service": "hub"
        },
        "value": ["<timestamp>", "1"]
      }
    ]
  }
}
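For a scripted health check, the response can be tested for a sample whose value is "1". The sketch below is a rough grep-based match rather than a JSON parser, and the helper name hub_target_is_up is illustrative.

```shell
# Succeed only if a Prometheus instant-query response on stdin
# contains a sample whose value is "1".
# hub_target_is_up is an illustrative helper name.
hub_target_is_up() {
  # Strip whitespace so compact and pretty-printed responses match the same pattern.
  tr -d '[:space:]' | grep -q '"value":\[[^]]*,"1"]'
}
```

With the port-forward running, pipe the curl command above into hub_target_is_up and check the exit status. An empty result list or a value of "0" makes the check fail.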
Check that Grafana can discover the AUP Learning Cloud dashboards through the dashboard ConfigMap:
kubectl -n monitoring describe configmap grafana-dashboard-aup-hub
The ConfigMap should contain these dashboard files:
aup-hub-operations.json
aup-hub-notebook-resources.json
If your Grafana deployment uses the standard sidecar dashboard loader, these ConfigMaps are enough. You do not need to expose Grafana publicly just to validate this step.
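To list only the file names stored in the ConfigMap, without the full describe output, a go-template can iterate over the data keys. This is a sketch; the helper name list_dashboard_files is illustrative, and the defaults are the namespace and ConfigMap name used in this guide.

```shell
# Print the keys (dashboard file names) stored in the dashboard ConfigMap.
# list_dashboard_files is an illustrative helper name.
list_dashboard_files() {
  ns="${1:-monitoring}"
  name="${2:-grafana-dashboard-aup-hub}"
  kubectl -n "$ns" get configmap "$name" \
    -o go-template='{{range $key, $value := .data}}{{$key}}{{"\n"}}{{end}}'
}
```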
Useful AUP Learning Cloud Hub metrics include:
hub_spawn_gpu_total
hub_spawn_failed_total
hub_active_sessions
hub_session_runtime_minutes
hub_spawn_duration_seconds
hub_quota_denied_total
hub_quota_deducted_total
hub_pod_failure_total
hub_repo_clone_failed_total
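Any of these metrics can be queried through the same port-forwarded Prometheus API. The sketch below lets curl URL-encode the PromQL expression; the helper name prom_query and the PROM_URL variable are illustrative assumptions.

```shell
# Run an instant query against the port-forwarded Prometheus API.
# prom_query and PROM_URL are illustrative names.
prom_query() {
  # -G sends the --data-urlencode payload as a URL query string,
  # so the PromQL expression needs no manual escaping.
  curl -fsS -G "${PROM_URL:-http://127.0.0.1:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, prom_query 'hub_active_sessions' returns the current samples for that metric, and prom_query 'up{job="hub"}' repeats the target check shown earlier.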
Troubleshooting#
ServiceMonitor Exists but Prometheus Does Not Scrape It#
Check the release label:
kubectl -n monitoring get servicemonitor hub-metrics --show-labels
If Prometheus expects a different label, update monitoring.releaseLabel and run the Helm upgrade again.
Token Secret Is Missing#
Confirm these values are enabled:
monitoring:
  enabled: true
  hubMetrics:
    enabled: true
  serviceMonitor:
    enabled: true
    authorization:
      enabled: true
      secret:
        create: true
The chart also validates that monitoring.serviceMonitor.authorization.hubServiceName exists under hub.services and has a matching hub.loadRoles entry with the read:metrics scope.
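Once the secret exists, you can confirm that it holds a non-empty token without printing the token itself. The sketch below assumes the chart-generated secret name hub-metrics-token and the token key shown earlier in this guide; the helper name token_length is illustrative.

```shell
# Print the byte length of the decoded metrics token, not the token itself.
# token_length is an illustrative helper name.
token_length() {
  ns="${1:-monitoring}"
  name="${2:-hub-metrics-token}"
  # Secret data is base64-encoded; decode it and count the bytes.
  kubectl -n "$ns" get secret "$name" -o jsonpath='{.data.token}' \
    | base64 -d | wc -c | tr -d ' '
}
```

A result of 0 means the key is missing or empty. If you set a custom secret name or key, adjust the arguments and the jsonpath expression to match.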
Grafana Dashboards Do Not Appear#
Check that the dashboard ConfigMap was created:
kubectl -n monitoring get configmap grafana-dashboard-aup-hub --show-labels
The ConfigMap uses grafana_dashboard: "1". Your Grafana sidecar or dashboard loader must watch the monitoring namespace and this label.
Prometheus Alerts Do Not Appear#
Check the rule label and namespace:
kubectl -n monitoring get prometheusrule hub-alerts --show-labels
The rule must be in a namespace watched by the Prometheus Operator, and its release label must match the operator’s rule selector.