At work we use Google Cloud Platform to run our machine learning jobs on
multiple machines. GCP has a monitoring platform called Stackdriver, which can
be used to view all kinds of metrics about your VMs. Unfortunately, it
doesn't collect any metrics about GPUs,