Grafana Dashboards
Dashboards allow cluster telemetry data to be visualized interactively in near real-time. Omnistat provides several sample Grafana dashboards for cluster-wide deployments, with variants that depend on whether resource manager integration is desired (screenshots of the variants with resource manager integration enabled are shown in Example Screenshots). JSON sources for the example dashboards, which can be used in local deployments, are listed below. In addition to querying GPU data gathered with the Omnistat data collector, these example dashboards assume that node-exporter data is also being collected.
RMS dashboards provide integration with resource managers such as SLURM.
Global Dashboard: provides an overview of the system, cluster-level telemetry for allocated and unallocated nodes, and job indices.
Node Dashboard: provides a job allocation timeline and detailed metrics for a single node in the cluster.
Job Dashboard: provides detailed time-series data, load distribution, and other metrics for a single job.
Standalone dashboards are designed to work without a resource manager.
Global Dashboard: provides an overview of the system and cluster-level telemetry.
Node Dashboard: provides detailed metrics for a single node in the cluster.
Grafana server
To visualize Omnistat’s monitoring data, Grafana needs to be installed and configured to use the Prometheus server described in the system-wide installation.
Installation
The official Grafana documentation describes several ways to install Grafana, including using packages for major operating systems.
The Grafana server must be able to reach the Prometheus server to display Omnistat data, so Grafana is typically installed and run on an administrative host. If the host running the Prometheus server has outbound access to external networks, you can instead use the hosted Grafana Cloud service and forward system telemetry data to an external Grafana instance.
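Because Grafana can only display Omnistat data if it can reach Prometheus, a quick check from the intended Grafana host can rule out network problems before any configuration. The following is a minimal sketch that assumes Prometheus listens on its default port 9090 and queries Prometheus's built-in `/-/ready` endpoint; `prometheus-host` and the `check_prometheus` helper are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal reachability check, run from the host that will run Grafana.
# Assumption: "prometheus-host:9090" is a placeholder for the address of
# the Prometheus server set up during the system-wide installation.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://prometheus-host:9090}"

# check_prometheus [URL] (hypothetical helper): succeeds only if Prometheus
# answers its readiness endpoint with an HTTP 2xx status within 5 seconds.
check_prometheus() {
  local url="${1:-$PROMETHEUS_URL}"
  curl --silent --fail --max-time 5 --output /dev/null "${url}/-/ready"
}
```

For example, `check_prometheus http://prometheus-host:9090 && echo reachable` prints `reachable` only if Prometheus responded.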
Note
For production systems, we recommend following the official documentation. However, if you are only interested in testing the dashboards, you can use the Grafana Docker image. For example, run a temporary Grafana container with the following command, open http://localhost:3000 in a browser, and then follow the steps below to configure the Grafana server and import dashboards.
docker run -e GF_AUTH_ANONYMOUS_ENABLED=true -e GF_AUTH_ANONYMOUS_ORG_ROLE=Admin -e GF_USERS_DEFAULT_THEME=light -it --rm -p 3000:3000 grafana/grafana
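If the remaining steps are scripted against this temporary container, it can help to wait until Grafana is actually serving requests before continuing. This sketch polls Grafana's `/api/health` endpoint once per second; the `wait_for_grafana` name and the 60-second default timeout are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Poll Grafana's /api/health endpoint until it responds, so that later
# scripted steps do not race the container start-up.
# Assumption: Grafana is published on localhost:3000 as in the docker
# command above; wait_for_grafana is a hypothetical helper name.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

wait_for_grafana() {
  local url="${1:-$GRAFANA_URL}"
  local timeout="${2:-60}"   # seconds to wait before giving up
  local waited=0
  until curl --silent --fail --max-time 2 --output /dev/null "${url}/api/health"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Grafana at ${url} not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `wait_for_grafana && echo "Grafana is up"` blocks until the container answers or the timeout expires.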
Data source
The Prometheus server configured as part of the Omnistat installation needs to be added to Grafana as a new data source.
To add a data source to Grafana:
Click Connections in the left-side menu.
Enter “Prometheus” in the search dialog, and click the Prometheus button under the search box.
Configure the new Prometheus data source by following the on-screen instructions, providing the hostname and port where Omnistat's Prometheus server is running.
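As an alternative to the UI steps, the data source can be registered through Grafana's HTTP API (`POST /api/datasources`). This is a sketch assuming Grafana at `http://localhost:3000` with the default `admin`/`admin` credentials and a placeholder Prometheus address; naming the data source `prometheus` matches the default value of the dashboards' `source` variable. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Register Omnistat's Prometheus server as a Grafana data source through
# Grafana's HTTP API (POST /api/datasources).
# Assumptions: Grafana at localhost:3000 with admin/admin credentials.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# datasource_payload NAME PROMETHEUS_URL: print the JSON request body.
# "access":"proxy" makes the Grafana server (not the browser) query
# Prometheus. Values are not JSON-escaped, so keep them free of quotes.
datasource_payload() {
  local name="$1" prom_url="$2"
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}\n' \
    "$name" "$prom_url"
}

# add_datasource NAME PROMETHEUS_URL (hypothetical helper): send the body.
add_datasource() {
  datasource_payload "$1" "$2" |
    curl --silent --show-error --fail \
      --user "$GRAFANA_AUTH" \
      --header 'Content-Type: application/json' \
      --data-binary @- \
      "${GRAFANA_URL}/api/datasources"
}
```

For example, `add_datasource prometheus http://prometheus-host:9090`, where `prometheus-host:9090` stands for your Prometheus address.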
Import dashboards
To import a dashboard to an existing Grafana server:
Click Dashboards in the left-side menu.
Click New and select Import from the drop-down menu.
Click Upload dashboard JSON file and select the dashboard JSON file.
Click Import.
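Dashboards can also be imported without the UI through Grafana's `POST /api/dashboards/db` endpoint, which expects the dashboard JSON wrapped in an envelope object. A minimal sketch, assuming Grafana at `http://localhost:3000` with `admin`/`admin` credentials, `python3` for JSON handling, and a dashboard file that does not declare data source inputs (`__inputs`); the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Import a dashboard JSON file through Grafana's HTTP API
# (POST /api/dashboards/db) instead of the web UI.
# Assumptions: Grafana at localhost:3000 with admin/admin credentials.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# dashboard_payload FILE: wrap the dashboard in {"dashboard": ..., "overwrite": true}.
# Clearing "id" lets Grafana create the dashboard, or overwrite an existing
# one with the same uid.
dashboard_payload() {
  python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    dashboard = json.load(f)
dashboard["id"] = None
print(json.dumps({"dashboard": dashboard, "overwrite": True}))
' "$1"
}

# import_dashboard FILE (hypothetical helper): build the payload, then POST it.
import_dashboard() {
  local payload
  payload="$(dashboard_payload "$1")" || return 1
  printf '%s' "$payload" |
    curl --silent --show-error --fail \
      --user "$GRAFANA_AUTH" \
      --header 'Content-Type: application/json' \
      --data-binary @- \
      "${GRAFANA_URL}/api/dashboards/db"
}
```

For example, `import_dashboard dashboard.json`, where `dashboard.json` stands for one of the sample dashboard files.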
Sample dashboards are configured using standard default values for settings such as network ports, but may require changes depending on the environment. The following variables represent the most relevant dashboard settings:
source: Name of the Prometheus data source where the data is stored. Defaults to prometheus, and may require filtering if the Grafana instance has several Prometheus data sources.
node_exporter_port: Port of the Prometheus Node Exporter. Defaults to 9100.
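To see which values a sample dashboard will use for variables such as `source` and `node_exporter_port` before importing it, the variable definitions can be read from the dashboard JSON, where Grafana stores them under `templating.list`. This sketch prints each variable's name, type, and saved value; it assumes `python3` is available, the helper name is hypothetical, and the types reported depend on how each dashboard defines its variables.

```shell
#!/usr/bin/env bash
# list_dashboard_variables FILE (hypothetical helper): print one
# tab-separated line per template variable: name, type, saved value.
# Falls back to the variable's "query" field when no current value is saved.
list_dashboard_variables() {
  python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    dashboard = json.load(f)
for var in dashboard.get("templating", {}).get("list", []):
    current = var.get("current") or {}
    value = current.get("value", var.get("query", ""))
    print("\t".join([str(var.get("name")), str(var.get("type")), str(value)]))
' "$1"
}
```

For example, `list_dashboard_variables dashboard.json` lists the defaults that may need adjusting with the steps below.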
To configure a dashboard:
Open a dashboard in edit mode.
Click Dashboard settings located at the top of the page.
Click Variables.
Click the desired variable, update its value, and save the dashboard.