High Powered Computing Cluster
About
The Mathematics Department has a high-powered computing cluster available to all Mathematics Graduate Students, Faculty, Staff, and Post-Doc researchers. The cluster currently totals 23 nodes and is powered by the Slurm batch engine running on Ubuntu 16.04. Cluster memberships are updated automatically each day to reflect student/staff standing. Nodes 1-12 are powered by two 8-core Intel(R) Xeon(R) E5-2690 CPUs @ 2.90GHz with 128GB of DDR3 RAM. Nodes 12-23 are powered by two 14-core Intel(R) Xeon(R) E5-2690 CPUs @ 2.60GHz with 128GB of DDR3 RAM.
Restrictions
Individual jobs on the Mathematics HPC cluster are limited to 1, 4, 8 or 12 cores and a maximum runtime of 10 days. These restrictions exist to keep the HPC running efficiently and to ensure that rogue jobs do not hog CPU and memory resources. There is no limit on the number of jobs that you can submit, but they will run in the order in which they were submitted to the HPC. If you have a use case outside of these restrictions that you feel is valid, please let us know at support@math.ncsu.edu.
Connecting
Connecting to the HPC and using its processing power is handled over SSH. Connections are only accepted from on-campus networks. If you need to use the HPC cluster while off-campus, you must first connect to NCSU’s secure VPN.
You can establish an ssh connection using the following command:
ssh yourUnityID@hpc.math.ncsu.edu
After entering your NC State Unity account password, you will be given a shell session on the head node of the cluster. Do not execute code on this node directly. Doing so will have adverse effects on the cluster and will not be effective for your work due to the limited resources of the head node.
Data Storage
The cluster’s data partition is backed by a resilient 20TB storage array. There are no quotas on user directories, but if you intend to store more than ~500GB on the cluster for long periods of time, you should contact support@math.ncsu.edu to ensure that enough storage is available for your extended jobs. All data should be kept in your home directory (~ or /mnt/HpcStor/home/unityId) so that it is shared properly with all HPC nodes.
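To see how close you are to the ~500GB guideline, you can measure your home directory with du. The following is a minimal sketch, not an official tool: the function names usage_gb and check_home_usage are hypothetical, and the 500 threshold simply mirrors the figure quoted above.

```shell
# Print the disk usage of a directory in whole gigabytes (rounded down).
# `du -sk` reports the total in kilobytes and is available on any POSIX system.
usage_gb() {
    kb=$(du -sk "$1" | awk '{print $1}')
    echo $(( kb / 1024 / 1024 ))
}

# Compare the usage of a directory (default: your home directory)
# against the ~500GB guideline.
check_home_usage() {
    used=$(usage_gb "${1:-$HOME}")
    if [ "$used" -gt 500 ]; then
        echo "Using ${used}GB: consider contacting support about long-term storage."
    else
        echo "Using ${used}GB: within the ~500GB guideline."
    fi
}
```

Running check_home_usage with no arguments inspects your home directory. On a large directory du can take a while, since it walks every file.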
You can use a variety of SFTP clients to send and receive files using the hostname hpc.math.ncsu.edu and the login credentials of your Unity ID and password. FileZilla is an example of a solid S/FTP application. You can also use macOS' Finder app.
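If you prefer the command line, scp (part of OpenSSH) can perform the same transfers over the same SSH connection. The sketch below is illustrative only: UNITY_ID, hpc_upload, and hpc_download are hypothetical names, and the file names in the usage comments are placeholders.

```shell
# Assumption: UNITY_ID is a hypothetical variable holding your Unity ID.
UNITY_ID="${UNITY_ID:-yourUnityID}"
HPC_HOST="hpc.math.ncsu.edu"

# Upload a file or directory into your home directory on the cluster.
# An empty path after the colon refers to the remote home directory.
hpc_upload() {
    scp -r "$1" "${UNITY_ID}@${HPC_HOST}:"
}

# Download a path (relative to your cluster home directory)
# into the current local directory.
hpc_download() {
    scp -r "${UNITY_ID}@${HPC_HOST}:$1" .
}

# Usage (from a campus or VPN connection):
#   hpc_upload input_data/
#   hpc_download results/output.csv
```

As with an interactive login, these transfers only work from an on-campus network or over the VPN.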
Submitting Jobs
Once you have established an SSH connection, you can submit jobs using the submit and submit_multi commands. If you enter either command without any parameters, it prints the applications available to submit and the correct syntax. Only submit files that are within your home directory; no other storage locations are shared with the HPC nodes. submit is used for single-core jobs, and each job is assigned 1 CPU core and 8GB of memory. submit_multi is used for multi-core jobs, and each job is assigned 6GB of memory per CPU core. You may submit as many jobs as you like, as frequently as you like; however, they will only run once adequate resources are available on the cluster.
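To estimate how much memory a job will receive, you can work through the figures above. The sketch below assumes that multi-core jobs use the 4, 8, or 12 core sizes listed under Restrictions; job_memory_gb is a hypothetical helper and not part of the cluster's tooling.

```shell
# Print the memory (in GB) a job would receive for a given core count:
# 8GB for a single-core `submit` job, 6GB per core for a `submit_multi` job.
job_memory_gb() {
    case "$1" in
        1)      echo 8 ;;
        4|8|12) echo $(( $1 * 6 )) ;;
        *)      echo "unsupported core count: $1 (allowed: 1, 4, 8, 12)" >&2
                return 1 ;;
    esac
}

# Print the allocation for every allowed job size.
for cores in 1 4 8 12; do
    echo "${cores} core(s): $(job_memory_gb "$cores")GB"
done
```

By this arithmetic a 12-core job is assigned 72GB, which is more than half of a node's 128GB of RAM.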
Adding Packages and Apps
Your user directory is shared between all worker nodes, so any executables or packages installed there are available to every node. You can install Python packages to your user directory yourself with pip. Here's an example with the Python package plist:
pip install --user plist
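On Linux, pip install --user places command-line scripts in ~/.local/bin, which is not always on your PATH. The following sketch adds it for the current session and prints where user-installed packages live. It assumes python3 is the interpreter you use on the cluster, and the USER_BIN variable name is illustrative.

```shell
# Directory where `pip install --user` places command-line scripts on Linux.
USER_BIN="$HOME/.local/bin"

# Add it to PATH for this session if it is not already there.
case ":$PATH:" in
    *":$USER_BIN:"*) ;;
    *) PATH="$USER_BIN:$PATH" ;;
esac
export PATH

# Show where user-installed packages live (assumes python3 is on the PATH).
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import site; print(site.getusersitepackages())'
fi
```

Both locations sit under your home directory, which is why packages installed this way are visible to the worker nodes.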
Troubleshooting and Assistance
For any questions or troubleshooting requests, contact support@math.ncsu.edu to get in touch with a specialist. Include as much detail as you can, along with any logs and screenshots, so we can work to identify the problem and come up with a solution as soon as we reach out to you.
Mounting Google Drive
You can also mount your individual NC State Google Drive or an NC State Google Team Drive on the HPC. This is useful if you are archiving large amounts of data for later use. Executing code directly from your mounted Google Drive filesystem is not advised because of the network speed limitation.
Here is an example of how to make a remote called drive. First, run:
rclone config
This will guide you through an interactive setup process:
No remotes found - make a new one
n) New remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
n/r/c/s/q> n
name> drive
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
10 / Google Drive
\ "drive"
[snip]
Storage> 10
Google Application Client Id - leave blank normally.
client_id>
Google Application Client Secret - leave blank normally.
client_secret>
Scope that rclone should use when requesting access from drive.
Choose a number from below, or type in your own value
1 / Full access all files, excluding Application Data Folder.
\ "drive"
2 / Read-only access to file metadata and file contents.
\ "drive.readonly"
/ Access to files created by rclone only.
3 | These are visible in the drive website.
| File authorization is revoked when the user deauthorizes the app.
\ "drive.file"
/ Allows read and write access to the Application Data folder.
4 | This is not visible in the drive website.
\ "drive.appfolder"
/ Allows read-only access to file metadata but
5 | does not allow any access to read or download file content.
\ "drive.metadata.readonly"
scope> 1
ID of the root folder - leave blank normally. Fill in to access "Computers" folders. (see docs).
root_folder_id>
Service Account Credentials JSON file path - needed only if you want use SA instead of interactive login.
service_account_file>
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n
#>>>> At this point rclone will provide you with a URL. Open that URL in your local browser, log in, copy the provided security key and enter it here.
Configure this as a team drive?
y) Yes
n) No
y/n> n
--------------------
[drive]
client_id =
client_secret =
scope = drive
root_folder_id =
service_account_file =
token = {"access_token":"XXX","token_type":"Bearer","refresh_token":"XXX","expiry":"2014-03-16T13:57:58.955387075Z"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
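Once the remote is configured, mount it in the background (the ~/Drive directory must already exist):
rclone mount drive: ~/Drive &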
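Since running code from the mounted drive is discouraged, a common pattern is to copy data between your home directory and the remote, then unmount when finished. The sketch below assumes the remote is named drive and is mounted at ~/Drive as in the example above. archive_to_drive and unmount_drive are hypothetical helper names, and the folder names in the usage comments are placeholders.

```shell
# Copy a local directory to a folder on the `drive` remote.
# `rclone copy` skips files that are already identical on the destination.
archive_to_drive() {
    rclone copy "$1" "drive:$2"
}

# Unmount the FUSE mount created by `rclone mount drive: ~/Drive &`.
unmount_drive() {
    fusermount -u "$HOME/Drive"
}

# Usage:
#   archive_to_drive ~/results hpc-archive/results
#   unmount_drive
```

Copying with rclone copy talks to the remote directly, so it works whether or not the drive is currently mounted.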