Cluster Administration

Identity Management

The Apex identity management system is provided by FreeIPA on archon-3, where it runs under LXD in isolation from the rest of the cluster. IPA bundles identity, authentication, and policy management, with LDAP and Kerberos support for Linux-based services and applications. It also provides a web portal for identity provisioning and management.

Dex is also deployed in the Kubernetes auth namespace and provides an OAuth/OpenID Connect mechanism for web-based and other modern applications.

We are working on integrating IPA with Goliath to streamline account generation. For now, users must complete the Apex Trial Request Form to have an account created.

User Home Directory

User home directories are provisioned under /lustre/ai/home (on the /lustre/ai filesystem). This directory is also automatically mounted at /home as a shared directory on all machines in the cluster. Occasionally, after a server crash or network failure, a machine may need to re-sync its cache with Lustre, which makes /home temporarily unavailable. Users should refrain from using the machine until /home has been properly re-mounted, because any data written to the local /home before the sync will be shadowed by the Lustre mount.

Administrators should ensure that all Lustre-backed directories (/lustre/ai, /lustre/scratch, /lustre/testfs, /home) are properly mounted whenever a system restarts.
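
The mount check described above could be scripted along the following lines. This is only a sketch: it assumes the nodes run Linux, that each directory appears in /proc/mounts with the filesystem type "lustre", and the function names are hypothetical rather than part of any existing tooling.

```python
# Mount points named in the text above. The expected filesystem type
# "lustre" is an assumption about how they appear in /proc/mounts.
REQUIRED_MOUNTS = ("/lustre/ai", "/lustre/scratch", "/lustre/testfs", "/home")


def parse_mounts(mounts_text):
    """Map each mount point to its filesystem type from /proc/mounts-style text."""
    mounted = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, filesystem type, options, ...
            # A later entry for the same mount point overrides an earlier one.
            mounted[fields[1]] = fields[2]
    return mounted


def missing_mounts(mounts_text, required=REQUIRED_MOUNTS, fstype="lustre"):
    """Return required mount points that are absent or not of the expected type."""
    mounted = parse_mounts(mounts_text)
    return [path for path in required if mounted.get(path) != fstype]
```

On a node, the function could be fed the contents of /proc/mounts, for example `missing_mounts(open("/proc/mounts").read())`. An empty list would suggest that every directory is mounted from Lustre, while a non-empty list names the directories to investigate before users return to the machine.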

Ansible and DeepOps

Ansible can be used to automate many of the repetitive tasks required to manage the host cluster. Ad hoc Ansible runs should reuse DeepOps/config/inventory as the inventory file to keep node configuration consistent. DeepOps uses this inventory file as the configuration for provisioning Kubernetes with Kubespray.
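
As an illustration of reusing the shared inventory for ad hoc runs, a small wrapper might build the `ansible` command line so that the inventory is never forgotten. This is a sketch under assumptions: the helper name is hypothetical, and the inventory path is taken relative to wherever the DeepOps checkout lives.

```python
import shlex

# Inventory path taken from the text above; its location relative to the
# current working directory is an assumption.
INVENTORY = "DeepOps/config/inventory"


def adhoc_command(pattern, module, args=None, inventory=INVENTORY):
    """Build an ad hoc `ansible` command line that always uses the shared inventory."""
    command = ["ansible", pattern, "-i", inventory, "-m", module]
    if args:
        # Module arguments are passed through Ansible's -a option.
        command += ["-a", args]
    return command


if __name__ == "__main__":
    # Print the command for a connectivity check against every host.
    print(shlex.join(adhoc_command("all", "ping")))
```

The returned list can be passed to `subprocess.run` on a host where Ansible is installed. Building the command in one place keeps every ad hoc run pointed at the same inventory that DeepOps uses.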

Ideally, we should be able to provision and configure bare-metal systems declaratively using Ansible or Crossplane. The admin team is working toward this goal, which will help the long-term operation and maintainability of the cluster. For reference, see metal-stack, Gardener, and KubeKey.