Tensorboard

Introduction

In this tutorial, you will learn:

  • How to run Tensorboard on a compute node using Slurm
  • How to access Tensorboard from a browser on your local machine

If you are not familiar with Slurm yet, please take a look at this tutorial first.

Run Tensorboard

The following command runs Tensorboard on <current dir>/logdir using port 6006. If port 6006 is already taken by another process, choose a different port.

$ srun --job-name tensorboard --mem 50G --container-image=registry.apex.cmkl.ac.th#nvidia/tensorflow:21.05-tf2-py3 \
--no-container-mount-home --container-mounts="$(pwd)":/work --pty \
tensorboard --logdir=/work/logdir --port=6006
TensorBoard 2.4.1 at http://prism-1.apex.cmkl.ac.th:6006/ (Press CTRL+C to quit)
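If several people share the same compute node, port 6006 is likely to collide. One possible way to pick a port that is stable for you but different from other users is to derive it from your numeric user ID. This is only a sketch: the range 6006 to 7005 is an assumption, not a documented CMKL policy, and the variable names are illustrative.

```shell
# Assumption: ports 6006-7005 are usable on the compute nodes.
base_port=6006

# Offset the base port by the numeric user ID so each user gets a stable port.
port=$((base_port + $(id -u) % 1000))
echo "$port"

# The value can then replace the fixed port in the command above, e.g.:
#   tensorboard --logdir=/work/logdir --port="$port"
```

Two users can still end up on the same port if their user IDs differ by a multiple of 1000, so treat the result as a starting point and change it if Tensorboard reports that the port is in use.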

From the log above, Tensorboard is running at http://prism-1.apex.cmkl.ac.th:6006/. Note that Tensorboard will not always run on prism-1: Slurm chooses the node unless you explicitly request prism-1.
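Because the node can change from run to run, it can help to read the host and port from the startup line rather than assume prism-1. The snippet below is a minimal sketch: tb_endpoint is a hypothetical helper, not part of Tensorboard or Slurm, and it assumes the startup line has the same shape as the one shown above.

```shell
# Hypothetical helper: read log text on stdin and print "host:port"
# taken from the first http:// URL it finds.
tb_endpoint() {
    sed -n 's|.*http://\([^:/]*\):\([0-9]*\)/.*|\1:\2|p' | head -n 1
}

line='TensorBoard 2.4.1 at http://prism-1.apex.cmkl.ac.th:6006/ (Press CTRL+C to quit)'
endpoint=$(printf '%s\n' "$line" | tb_endpoint)

node=${endpoint%%.*}     # short node name, e.g. prism-1
port=${endpoint##*:}     # port number, e.g. 6006
echo "$node $port"
```

If you would rather pin the job to a known node, srun also accepts the standard Slurm option --nodelist=<node name>.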

Important! Do not request GPUs for Tensorboard.

Accessing Tensorboard

You have to use SSH tunneling to access Tensorboard running on the CMKL server from your local machine. The idea is to reach the Tensorboard website behind CMKL's firewall by sending the traffic through the SSH connection instead of connecting to the Tensorboard port directly.

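The command below forwards connections made to localhost:6006 on your machine to the CMKL login node, which then relays them to prism-1:6006, the address where Tensorboard is running in the previous section. The -L option sets up the local port forward, and -N tells ssh not to run a remote command. If Tensorboard started on a different node or port, substitute those values.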

ssh -NL localhost:6006:prism-1:6006 <your_cmkl_user>@apex-login.cmkl.ac.th

In summary: local browser (localhost:6006) -> CMKL login node -> prism-1:6006 -> Tensorboard.
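Since the node and port may differ between runs, you may find it convenient to build the tunnel command from variables. The following is a sketch under assumptions: tunnel_cmd is a hypothetical helper, and prism-2, 6007, and alice are placeholder values used only for illustration. It prints the command instead of running it, so you can check it first.

```shell
# Hypothetical helper: print the ssh tunnel command for a node, port, and user.
# Arguments: $1 = compute node, $2 = Tensorboard port, $3 = CMKL user name.
# The same port number is used on the local side and the remote side.
tunnel_cmd() {
    printf 'ssh -NL localhost:%s:%s:%s %s@apex-login.cmkl.ac.th\n' "$2" "$1" "$2" "$3"
}

# Placeholder values; replace them with your own node, port, and user.
cmd=$(tunnel_cmd prism-2 6007 alice)
echo "$cmd"
# prints: ssh -NL localhost:6007:prism-2:6007 alice@apex-login.cmkl.ac.th
```

After running the printed command, open localhost:<port> in your local browser, using the same port you passed to the helper.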

More details can be found at https://www.ssh.com/academy/ssh/tunneling/example#what-is-ssh-port-forwarding,-aka-ssh-tunneling?