Xeon Phi Setup/Build Notes

These notes cover the creation of a Debian 10.9 (buster) server with ZFS root which serves as host to Knights Corner Xeon Phi coprocessor cards.

Each of these coprocessor cards features a P54C-derived core extended to support the x86-64 instruction set, 4-way SMT, and a beefy 512-bit vector processor bolted alongside. Sixty of these cores are connected on a roughly 1 terabit/s bi-directional ring bus. In addition to the card's 8GB of GDDR5 RAM, each core has 512kB of local cache and, via the ring bus and a distributed tag store, all caches are coherent and quickly accessible from remote cores. This hardware is packaged up on a PCIe card which presents a virtual network interface to the host. The coprocessor card runs Linux+BusyBox, allowing SSH access to a traditional Linux environment on a familiar 60-core x86-64 architecture.

The hostname frostburg.subgeniuskitty.com stems from the original FROSTBURG, a CM-5 designed by Thinking Machines. Although the fundamental connection topology of a fat tree was different from the ring used in this Xeon Phi, the systems are somewhat similar. Both feature a NUMA cluster of repackaged and extended commercial processor cores operating on independent instruction streams in a MIMD fashion focused on small local data stores. By coincidence, both also feature similar core counts and total memory size.


These notes are a high-level checklist for my reference rather than a step-by-step installation guide for the public. That means they make no attempt to explain all options at each step; rather, they mention only the options I use on my servers. It also means they use my domains, my file system paths, etc., in the examples. Don’t blindly copy and paste.


Hardware

The host system was kept low power both figuratively and literally. It will primarily serve as a host for the Phi coprocessors and bridge to the network.

To enter the BIOS, use the DEL key. Similarly, a boot device selection menu is obtained by pressing F11. The system displays two-character status codes in the bottom right corner of the display.

Support files are stored under hw_support/Intel Xeon Phi/supermicro/.

Memory

Using eight identical sticks of MT36JSZF51272PZ-1G4 RAM. These are ECC DDR3 2Rx4 PC3-10600 RDIMMs operating at 1.5V. Per page 2-12 of the manual (MNL_1502.pdf), DIMMs are installed in all blue memory slots.

Processors & Heatsinks

Xeon E5-2637 CPUs selected for lower power, high frequency, cheap price, and ‘full’ PCIe lane count. They only need to be a host for the real show. Per page 5-7 of the chassis manual (MNL-1564.pdf), CPU1 requires heatsink SNK-P0048PS and CPU2 requires heatsink SNK-P0047PS.

SAS Backplane & Motherboard SATA

The SAS backplane is a little odd. The first eight drive bays connect via a pair of SFF-8087 connectors and the last two drive bays connect via standard 7-pin SATA connectors.

Since the motherboard provides ten 7-pin SATA connectors, two cables breaking out SFF-8087 to quad SATA will be required. I tried using just such a cable, but had no luck. There doesn’t appear to be anything configurable on the backplane itself. The backplane manual is stored at BPN-SAS-218A.pdf. My cable was of unknown origin. Per photos on some eBay auctions, the proper Supermicro cable appears to be part number 672042095704. In addition to the four SATA connectors, this cable also bundles some sort of 4-pin header, presumably the SGPIO connection.

In the meantime, since I only intend to use two small drives in a ZFS mirror for the OS and home directories, with all other storage on network shares, simply use the last two slots and connect with normal 30"+ SATA cables.

These last two drive bay slots are connected to the two white SATA ports on the motherboard, with the lowest numbered drive slot connected to the rear-most white SATA port. When SFF-8087 connectors are eventually used to increase local storage, relocate the boot drives to drive slots 0 and 1, and connect these slots to the white SATA ports.

On the motherboard, the white ports are SATA3 and the black ports are SATA2. The line of 2x white and 4x black SATA ports are part of the primary SATA controller or I_SATA. The other line of 4x black SATA ports is part of the secondary or S_SATA controller. Put any boot drives on the I_SATA ports.

Xeon Phi

Section 5.1 of the Intel Xeon Phi Coprocessor Datasheet (DocID 328209-004EN) mentions that connecting the card via both 2x4 and 2x3 power connectors enables a higher sustained power draw of up to 245 watts, versus 225 watts for other power cable configurations. This chassis will easily support the higher power draw and heat dissipation.

The Xeon Phi coprocessor cards reserve PCIe MMIO address space sufficient to map the entire coprocessor card’s RAM. Since this is more than 4GB, 64-bit PCIe Base Address Registers (BARs) are required. On this motherboard, enable them in the BIOS under PCIe/PCI/PnP Configuration -> Above 4G Decoding.
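The arithmetic behind the 4GB limit is simple, but spelling it out makes the requirement obvious. This sketch only compares numbers: the 8 GiB aperture is the Region 0 size reported by lspci in the kernel module section of these notes, and nothing here touches the hardware.

```shell
#!/bin/sh
# Why the Phi needs a 64-bit BAR: compare its aperture with the entire
# 32-bit address space. Pure arithmetic; nothing here touches the hardware.
card_bytes=$(( 8 * 1024 * 1024 * 1024 ))   # 8 GiB aperture (Region 0 per lspci)
limit_32bit=$(( 1 << 32 ))                 # 4 GiB: all a 32-bit BAR can address

echo "card aperture: $card_bytes bytes"
echo "32-bit space:  $limit_32bit bytes"
if [ "$card_bytes" -ge "$limit_32bit" ]; then
    echo "aperture cannot fit below 4G: 64-bit BAR and Above 4G Decoding required"
else
    echo "aperture fits in 32-bit space"
fi
```

The aperture alone is twice the whole 32-bit space, before counting anything else the chipset must map there.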

In general, motherboards with chipsets equal to or newer than the C602 should work. This includes most Supermicro motherboards from the X9xxx generation or later. None of the Supermicro X8xxx generation motherboards appear to be compatible.

The Xeon Phi 5110P, per the suffix, is passively cooled. Section 3 of the Intel Xeon Phi Coprocessor Datasheet (DocID 328209-004EN) details the cooling and mounting requirements.

Optional Fans

There are a number of optional fans for this chassis, all detailed in the chassis manual (MNL-1564.pdf). My machine includes the optional fan for another double-height, full-length PCIe card with backpanel IO slots, intended to support something like a GPU to drive monitors. Since this fan is installed and the power budget easily supports it, a fifth Xeon Phi card could be installed, albeit with a slower PCIe connection.

Regardless, since this fan is installed, whenever fewer than four Xeon Phi cards are installed, preferentially locate them on the left-hand side of the chassis, near the lower numbered drive bays.

Power Supply

The system contains dual redundant power supplies. Each is capable of supplying 1600 watts, but only when connected to a 240 volt source. When connected to a 120 volt source, maximum power output is 1000 watts.
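A rough power budget connects the 245 watt per-card figure to these supply ratings. Because the supplies are redundant, the whole load should fit within a single supply. HOST_W, the draw of everything other than the Phi cards, is an assumed input that defaults to zero in this sketch; set it to a real estimate, since it comes on top of the card totals.

```shell
#!/bin/sh
# Rough power budget: Phi cards plus host against one PSU's rating.
# CARD_W is the 245 W sustained figure from section 5.1 of the datasheet.
# HOST_W is an ASSUMED estimate for CPUs, RAM, drives, and fans (default 0).
CARD_W=245
HOST_W=${HOST_W:-0}

budget() {
    # $1 = number of Phi cards, $2 = capacity of a single PSU in watts
    total=$(( $1 * CARD_W + HOST_W ))
    if [ "$total" -le "$2" ]; then verdict=ok; else verdict=OVER; fi
    echo "$1 cards: $total W of $2 W -> $verdict"
}

for cards in 4 5; do
    budget "$cards" 1000   # supply rating on 120 V
    budget "$cards" 1600   # supply rating on 240 V
done
```

Counting the cards alone, four cards use 980 of the 1000 watts available on 120 volts and five cards (1225 watts) exceed it, which suggests a fully populated chassis wants the 240 volt rating.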

Rackmount

The chassis is over 30" long and protrudes from the rear of the rack by approximately ½". To keep the rear cables from snagging passing carts and elbows, the chassis was mounted at the top of the rack (after an empty 1U). The Supermicro rails required cutting four notches in the vertical posts, so this is a semi-permanent home.

Inserting or extracting the server from the rack at that height requires an extraordinary amount of free space in front of the rack and some advance planning. Where possible, try to do hardware modifications in-rack. The rails are extremely solid even when the server is fully extended. The grey OS-114/WQM-4 sonar test set chassis makes a solid step stool at the ideal height for working on the server while installed in the rack.

USB Ports

There are only two USB ports, both located on the rear of the chassis. During OS installation, if a mouse is required in addition to the keyboard and USB install drive, then a USB hub is required.


Debian Buster Installation

These installation instructions use the following XFCE Debian live image.

debian-live-10.9.0-amd64-xfce.iso

Both the Gnome and XFCE live images were unusably slow in GUI mode. The text installer was fast and responsive, as were VTYs (Ctrl+Alt+F2) from within the live environment. Only the GUIs were slow, but they were slow to the point of being unusable, with single keypresses registering over a dozen times. Once Debian was installed on the SSD and booting normally, the GUI was perfectly usable. Since the local terminal is only used to install and start an OpenSSH daemon, and since this can be done from a VTY, the issue was not investigated further.

The root on ZFS portion of this installation process is derived from the guide located here:

https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/Debian%20Buster%20Root%20on%20ZFS.html

Remote Access

From the F11 BIOS boot menu, select the UEFI entry for the USB live image. Lacking a mouse, press CTRL+ALT+F2 after X is running in order to access a text-only VTY, already logged in as the user user. Install an SSH server so the remaining install can be done over the network.

sudo -i
apt-get update
apt-get install openssh-server
systemctl enable ssh

From wherever you intend to complete the install, SSH into the live Debian environment as user user with password live, then become root with sudo -i.

ZFS Configuration

Edit /etc/apt/sources.list to include the following entries.

deb http://deb.debian.org/debian/ buster main contrib
deb http://deb.debian.org/debian/ buster-backports main contrib
deb-src http://deb.debian.org/debian/ buster main contrib

Install the ZFS kernel module. Specify --no-install-recommends to avoid picking up zfsutils-linux since it will fail at this point. See https://github.com/openzfs/zfs/issues/9599 for more details.

apt-get install -t buster-backports --no-install-recommends zfs-dkms
modprobe zfs

With the kernel module successfully loaded, proceed to install ZFS.

apt-get install -t buster-backports zfsutils-linux

After using dd to eliminate any existing partition tables, partition the disks for use with UEFI and ZFS.

First, create a UEFI partition on each disk.

sgdisk -n2:1M:+512M -t2:EF00 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN

Next, create a partition for the boot pool.

sgdisk -n3:0:+1G -t3:BF01 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN

Finally, create a partition for the encrypted pool.

sgdisk -n4:0:0 -t4:BF00 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN
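The three partitioning commands above are written for one disk, but both members of the mirror need the same layout. A minimal dry-run sketch, assuming the two SSD IDs that appear in the zpool create commands: it only prints the commands, so the output can be reviewed before anything is written to disk.

```shell
#!/bin/sh
# Dry run: print the sgdisk commands for every disk in the mirror.
# Review the output, then pipe it to "sh" to apply it.
DISKS="
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN
"

print_partition_cmds() {
    for disk in $DISKS; do
        echo "sgdisk -n2:1M:+512M -t2:EF00 $disk"   # EFI system partition
        echo "sgdisk -n3:0:+1G -t3:BF01 $disk"      # boot pool (bpool)
        echo "sgdisk -n4:0:0 -t4:BF00 $disk"        # encrypted root pool (rpool)
    done
}

print_partition_cmds
```

Once the six printed commands look right, pipe the script's output to sh.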

Now that partitioning is complete, create the boot and root pools.

The boot pool uses only ZFS options supported by GRUB.

zpool create \
    -o cachefile=/etc/zfs/zpool.cache \
    -o ashift=12 -d \
    -o feature@async_destroy=enabled \
    -o feature@bookmarks=enabled \
    -o feature@embedded_data=enabled \
    -o feature@empty_bpobj=enabled \
    -o feature@enabled_txg=enabled \
    -o feature@extensible_dataset=enabled \
    -o feature@filesystem_limits=enabled \
    -o feature@hole_birth=enabled \
    -o feature@large_blocks=enabled \
    -o feature@lz4_compress=enabled \
    -o feature@spacemap_histogram=enabled \
    -o feature@zpool_checkpoint=enabled \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
    -O mountpoint=/boot -R /mnt \
    bpool mirror \
    /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part3 \
    /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN-part3

Now create the root pool with ZFS encryption.

zpool create \
    -o ashift=12 \
    -O encryption=aes-256-gcm \
    -O keylocation=prompt -O keyformat=passphrase \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O dnodesize=auto -O normalization=formD -O relatime=on \
    -O xattr=sa -O mountpoint=/ -R /mnt \
    rpool mirror \
    /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part4 \
    /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN-part4

All the pools are created, so now it’s time to set up filesystems. Start with some containers.

zfs create -o canmount=off -o mountpoint=none rpool/ROOT
zfs create -o canmount=off -o mountpoint=none bpool/BOOT

Now add filesystems for boot and root.

zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian
zfs mount rpool/ROOT/debian
zfs create -o mountpoint=/boot bpool/BOOT/debian

Create a filesystem to contain home directories and mount root’s homedir in the correct location.

zfs create rpool/home
zfs create -o mountpoint=/root rpool/home/root
chmod 700 /mnt/root

Create filesystems under /var and exclude temporary files from snapshots.

zfs create -o canmount=off rpool/var
zfs create -o canmount=off rpool/var/lib
zfs create rpool/var/log
zfs create rpool/var/spool
zfs create -o com.sun:auto-snapshot=false rpool/var/cache
zfs create -o com.sun:auto-snapshot=false rpool/var/tmp
chmod 1777 /mnt/var/tmp
zfs create rpool/var/mail

Create a few other misc filesystems.

zfs create rpool/srv
zfs create -o canmount=off rpool/usr
zfs create rpool/usr/local

Temporarily mount a tmpfs at /run.

mkdir /mnt/run
mount -t tmpfs tmpfs /mnt/run
mkdir /mnt/run/lock

Debian Configuration

Install a minimal Debian system.

apt-get install debootstrap
debootstrap buster /mnt

Copy the zpool cache into the new system.

mkdir /mnt/etc/zfs
cp /etc/zfs/zpool.cache /mnt/etc/zfs

Set the hostname.

echo frostburg > /mnt/etc/hostname
echo "127.0.1.1 frostburg.subgeniuskitty.com frostburg" >> /mnt/etc/hosts

Configure networking.

vi /mnt/etc/network/interfaces.d/enp129s0f0

auto enp129s0f0
iface enp129s0f0 inet static
    address 192.168.1.7/24
    gateway 192.168.1.1

vi /mnt/etc/resolv.conf

search subgeniuskitty.com
nameserver 192.168.1.1

Configure package sources.

vi /mnt/etc/apt/sources.list

deb http://deb.debian.org/debian buster main contrib
deb-src http://deb.debian.org/debian buster main contrib

deb http://security.debian.org/debian-security buster/updates main contrib
deb-src http://security.debian.org/debian-security buster/updates main contrib

deb http://deb.debian.org/debian buster-updates main contrib
deb-src http://deb.debian.org/debian buster-updates main contrib

vi /mnt/etc/apt/sources.list.d/buster-backports.list

deb http://deb.debian.org/debian buster-backports main contrib
deb-src http://deb.debian.org/debian buster-backports main contrib

vi /mnt/etc/apt/preferences.d/90_zfs

Package: libnvpair1linux libuutil1linux libzfs2linux libzfslinux-dev libzpool2linux python3-pyzfs pyzfs-doc spl spl-dkms zfs-dkms zfs-dracut zfs-initramfs zfs-test zfsutils-linux zfsutils-linux-dev zfs-zed
Pin: release n=buster-backports
Pin-Priority: 990

apt-get update

Chroot into the new environment.

mount --rbind /dev  /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys  /mnt/sys
chroot /mnt

Configure the new environment as a basic system.

ln -s /proc/self/mounts /etc/mtab
apt-get update
export TERM=vt100
apt-get install console-setup locales
dpkg-reconfigure locales tzdata keyboard-configuration console-setup

Install ZFS on the new system.

apt-get install dpkg-dev linux-headers-amd64 linux-image-amd64
apt-get install zfs-initramfs
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf

Install GRUB and configure UEFI boot partition.

apt-get install dosfstools
mkdosfs -F 32 -s 1 -n EFI /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part2
mkdir /boot/efi
echo "/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part2 /boot/efi vfat defaults 0 0" >> /etc/fstab
mount /boot/efi
apt-get install grub-efi-amd64 shim-signed
apt-get remove --purge os-prober

Ensure the bpool is always imported, even if /etc/zfs/zpool.cache doesn’t exist or doesn’t include a relevant entry.

vi /etc/systemd/system/zfs-import-bpool.service

[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache

[Install]
WantedBy=zfs-import.target

systemctl enable zfs-import-bpool.service

Create a tmpfs mounted at /tmp.

cp /usr/share/systemd/tmp.mount /etc/systemd/system/
systemctl enable tmp.mount

Bootloader Configuration

Verify ZFS boot filesystem is recognized.

grub-probe /boot

Refresh initrd.

update-initramfs -c -k all

Configure GRUB by editing /etc/default/grub. Remove the quiet option from GRUB_CMDLINE_LINUX_DEFAULT and set the following two variables.

GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian"
GRUB_TERMINAL=console

Regenerate the GRUB configuration, then install GRUB to the UEFI boot partition.

update-grub
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian-1 --recheck --no-floppy

Install GRUB on the other hard drives, incrementing -2 to -N as necessary.

umount /boot/efi
dd if=/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part2 \
   of=/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN-part2
efibootmgr -c -g -d /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN \
    -p 2 -L "debian-2" -l '\EFI\debian\grubx64.efi'
mount /boot/efi
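With more than two disks, the ESP copy and boot entry repeat once per extra disk with an incrementing label. A dry-run sketch that prints those commands; OTHERS is an assumed list of the additional disks (here only the second SSD from these notes), and the loader path is copied unchanged from the commands above.

```shell
#!/bin/sh
# Dry run: clone the ESP from the first disk to each extra disk and register
# a "debian-N" boot entry for it. Extend OTHERS as disks are added.
FIRST=/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN
OTHERS="/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN"

print_esp_cmds() {
    n=2
    echo "umount /boot/efi"
    for disk in $OTHERS; do
        echo "dd if=${FIRST}-part2 of=${disk}-part2"
        # printf %s prints backslashes literally (echo may not).
        printf '%s\n' "efibootmgr -c -g -d $disk -p 2 -L \"debian-$n\" -l '\\EFI\\debian\\grubx64.efi'"
        n=$((n + 1))
    done
    echo "mount /boot/efi"
}

print_esp_cmds
```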

Fix filesystem mount ordering. Quoting from the install reference, “We need to activate zfs-mount-generator. This makes systemd aware of the separate mountpoints, which is important for things like /var/log and /var/tmp. In turn, rsyslog.service depends on var-log.mount by way of local-fs.target and services using the PrivateTmp feature of systemd automatically use After=var-tmp.mount.”

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F

From another SSH session, verify that zed updated the cache by checking that the files created above are no longer empty.

cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool
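Instead of reading the files by eye, test -s (true only for a file that exists and is non-empty) can do the check. A small sketch with a hypothetical function name: it prints one line per file and returns non-zero if any file is still empty.

```shell
#!/bin/sh
# Report whether each zfs-list.cache file has been populated by zed.
# "test -s" is true only for a file that exists and is non-empty.
check_zfs_list_cache() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "populated: $f"
        else
            echo "EMPTY:     $f"
            status=1
        fi
    done
    return "$status"
}

# Inside the chroot, from the second SSH session:
# check_zfs_list_cache /etc/zfs/zfs-list.cache/bpool /etc/zfs/zfs-list.cache/rpool
```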

If all is well, return to the previous SSH session and terminate zed with Ctrl+C.

Fix the paths to eliminate /mnt.

sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*

Reboot

The Debian install is almost ready for use without the live Debian host environment. Only a few steps remain.

Do a final system update.

apt-get dist-upgrade

Disable log compression since ZFS is already compressing at the block level.

for file in /etc/logrotate.d/* ; do
    if grep -Eq "(^|[^#y])compress" "$file" ; then
        sed -i -r "s/(^|[^#y])(compress)/\1#\2/" "$file"
    fi
done
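Because that loop edits files in place, it may be worth rehearsing it on a scratch copy first. The sketch below wraps the same grep/sed pair in a function (a hypothetical name) that takes the directory as an argument; it assumes GNU grep and sed, as shipped with Debian.

```shell
#!/bin/sh
# Same edit as the loop above, parameterized by directory.
# The [^#y] guard skips "delaycompress" and already-commented "#compress".
disable_logrotate_compress() {
    for file in "$1"/* ; do
        if grep -Eq "(^|[^#y])compress" "$file" ; then
            sed -i -r "s/(^|[^#y])(compress)/\1#\2/" "$file"
        fi
    done
}

# Rehearse on a copy, then compare:
# cp -a /etc/logrotate.d /tmp/logrotate.d.test
# disable_logrotate_compress /tmp/logrotate.d.test
# diff -r /etc/logrotate.d /tmp/logrotate.d.test
```

The diff is worth reading: a nocompress directive, for example, would be rewritten as no#compress.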

Install an SSH server so we can log in again after rebooting.

apt-get install openssh-server

Set a root password.

passwd

Create a user account.

zfs create rpool/home/ataylor
adduser ataylor
mkdir /etc/skel/.ssh && chmod 700 /etc/skel/.ssh
cp -a /etc/skel/. /home/ataylor/
scp ataylor@lagavulin:/usr/home/ataylor/.ssh/id_rsa.pub /home/ataylor/.ssh/authorized_keys
chown -R ataylor:ataylor /home/ataylor
usermod -a -G audio,cdrom,dip,floppy,netdev,plugdev,sudo,video ataylor

Snapshot the install.

zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install

Exit the chroot and unmount all filesystems.

exit
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a

Reboot the computer and remove the USB stick. Installation is complete.

UNIX Userland

Install various no-config-required userland packages before continuing.

apt-get install net-tools bzip2 zip ntp htop xterm screen git \
        build-essential pciutils smartmontools gdb valgrind wget \
        texlive texlive-latex-extra graphviz firefox-esr sysfsutils

X Window Manager

Install X and dwm to ensure all dependencies are met for running my dwm-derived window manager.

apt-get install xorg dwm numlockx

Install dependencies for building my window manager.

apt-get install libx11-dev libxft-dev libxinerama-dev

Copy the Hophib Modern Desktop git repo to the new server. Make the following changes:

Execute make clean install. Verify that dwm, dwm-status and dwm-watchdog.sh all ended up in /home/ataylor/bin with appropriate permissions. Delete the man pages that were installed in ataylor’s homedir.

Create ~/.xinitrc with following contents.

/usr/bin/numlockx &
/home/ataylor/bin/dwm-status &
/home/ataylor/bin/dwm-watchdog.sh

Verify X and my window manager start successfully and that dwm-watchdog.sh keeps X and X applications alive during a window manager live restart.

VIM

Install gvim, which Debian packages as vim-gtk3.

apt-get install vim-gtk3

Create ~/.vimrc with the following contents.

set nocompatible
filetype off
set mouse=r
set number
syntax on
set tabstop=4
set expandtab

"Folding
"http://vim.wikia.com/wiki/Folding_for_plain_text_files_based_on_indentation
"set foldmethod=expr
"set foldexpr=(getline(v:lnum)=~'^$')?-1:((indent(v:lnum)<indent(v:lnum+1))?('>'.indent(v:lnum+1)):indent(v:lnum))
"set foldtext=getline(v:foldstart)
"set fillchars=fold:\ "(there's a space after that \)
"highlight Folded ctermfg=DarkGreen ctermbg=Black
"set foldcolumn=6

" Color the 100th column.
set colorcolumn=100
highlight ColorColumn ctermbg = darkgray

TCSH

Install tcsh.

apt-get install tcsh

Change the default shell for new users by editing /etc/adduser.conf, setting the DSHELL variable to /bin/tcsh. Then use the chsh command to change the shell for root and ataylor. Create ~/.cshrc in ataylor’s and root’s homedir with the following contents. Remember to also copy it to /etc/skel and set permissions so it’s used for any future users on the system.

# .cshrc - csh resource script, read at beginning of execution by each shell

alias h       history 25
alias j       jobs -l
alias la      ls -aF
alias lf      ls -FA
alias ll      ls -lF --color
alias ls      ls --color

# These are normally set through /etc/login.conf.  You may override them here
# if wanted.
set path = (/sbin /bin /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin $HOME/bin)

setenv EDITOR vim
setenv PAGER  more

if ($?prompt) then
    # An interactive shell -- set some stuff up
    set prompt = "%N@%m:%~ %# "
    set promptchars = "%#"

    set filec
    set history = 1000
    set savehist = (1000 merge)
    set autolist = ambiguous
    # Use history to aid expansion
    set autoexpand
    set autorehash
    set mail = (/var/mail/$USER)
    if ( $?tcsh ) then
        bindkey "^W" backward-delete-word
        bindkey -k up history-search-backward
        bindkey -k down history-search-forward
    endif

endif
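The adduser.conf and chsh steps described before the .cshrc listing can be sketched as follows. The function name is hypothetical; it assumes GNU sed and rewrites the DSHELL line whether or not it is commented out. The commented lines show how it might be applied on the real system, along with copying the finished .cshrc into /etc/skel.

```shell
#!/bin/sh
# Point DSHELL at a new default shell in an adduser.conf-style file.
# Matches both "DSHELL=..." and "#DSHELL=..." lines (GNU sed assumed).
set_default_shell() {
    conf=$1
    new_shell=$2
    sed -i "s|^#\{0,1\}DSHELL=.*|DSHELL=$new_shell|" "$conf"
}

# On the real system (as root):
# set_default_shell /etc/adduser.conf /bin/tcsh
# chsh -s /bin/tcsh root
# chsh -s /bin/tcsh ataylor
# install -m 644 /home/ataylor/.cshrc /etc/skel/.cshrc
```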

XScreensaver

Install Xscreensaver and configure screen locking.

apt-get install xscreensaver xscreensaver-data

Run xscreensaver-demo and select some screensavers. If inspiration doesn’t strike, do single screensaver mode with the abstractile hack; it looks good on pretty much any hardware. Remember to enable screen locking.

Add the following line to ~/.xinitrc.

/bin/xscreensaver -nosplash &

Go Toolchain

The version of Go provided via apt-get is always out of date, so all Go installs on this server are done via tarball from the https://golang.org website. Go 1.16.3 is used for this example, but the newest version of Go may be found at https://golang.org/dl/.

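Any previous version of Go is installed entirely under /usr/local/go. Delete the entire /usr/local/go directory before proceeding, since extracting a new tarball over an existing tree is known to produce a broken Go installation.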

wget https://golang.org/dl/go1.16.3.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.16.3.linux-amd64.tar.gz

If this is the first time installing Go on the system, update everyone’s $PATH to include /usr/local/go/bin. Remember to update files under /etc/skel at the same time.
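Since the login shells here are tcsh, updating $PATH means editing .cshrc files. A sketch, assuming the .cshrc layout from the TCSH section: it appends a set path line that extends the existing path and skips any file that already mentions /usr/local/go/bin, so it is safe to run again after a Go upgrade. The function name is hypothetical.

```shell
#!/bin/sh
# Append /usr/local/go/bin to the tcsh path in each given .cshrc.
# Files that already mention it are skipped, so repeating this is harmless.
add_go_to_cshrc() {
    for rc in "$@"; do
        if ! grep -qs '/usr/local/go/bin' "$rc"; then
            echo 'set path = ($path /usr/local/go/bin)' >> "$rc"
        fi
    done
}

# As root, for the existing users and for future ones:
# add_go_to_cshrc /root/.cshrc /home/ataylor/.cshrc /etc/skel/.cshrc
```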

ZFS Snapshots

In order to configure automatic ZFS snapshots, use the zfs-auto-snapshot package.

apt-get install zfs-auto-snapshot

In addition to the snapshot script itself, this package includes automatically enabled cron entries, but it will only snapshot filesystems with the com.sun:auto-snapshot property set to true. Since we already manually set that property to false for /var/cache and /var/tmp, simply set it to true for the two parent pools and allow filesystems to inherit wherever possible.

zfs set com.sun:auto-snapshot=true rpool
zfs set com.sun:auto-snapshot=true bpool

Verify that relevant filesystems inherited the property.

zfs get com.sun:auto-snapshot

After waiting 15+ minutes, verify that snapshots begin to appear.

zfs list -t snapshot

ZFS Scrubs

Automate ZFS scrubs by creating /etc/cron.d/zfs-scrubs with the following contents.

PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
0 0 * * 0 root /sbin/zpool scrub rpool
0 0 * * 0 root /sbin/zpool scrub bpool
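Cron's day-of-month and month fields start at 1 while minute, hour, and day-of-week start at 0, which makes it easy to write an out-of-range value. A minimal checker for the five time fields, as a sketch: it validates only plain numbers and *, and reports anything fancier (ranges, lists, steps, names) as unchecked rather than guessing. Quote * when passing it so the shell does not glob-expand it.

```shell
#!/bin/sh
# Sanity-check the five time fields of a crontab entry.
# Handles plain numbers and "*" only; ranges, lists, steps, and names
# (1-5, 1,15, */15, sun) are reported as unchecked, not validated.
check_cron_fields() {
    # $1..$5: minute hour day-of-month month day-of-week
    status=0
    i=0
    for value in "$1" "$2" "$3" "$4" "$5"; do
        i=$((i + 1))
        case $i in
            1) name=minute;       lo=0; hi=59 ;;
            2) name=hour;         lo=0; hi=23 ;;
            3) name=day-of-month; lo=1; hi=31 ;;
            4) name=month;        lo=1; hi=12 ;;
            5) name=day-of-week;  lo=0; hi=7  ;;   # 0 and 7 both mean Sunday
        esac
        case $value in
            '*') ;;                                  # wildcard is always valid
            ''|*[!0-9]*) echo "$name: '$value' unchecked" ;;
            *)
                if [ "$value" -lt "$lo" ] || [ "$value" -gt "$hi" ]; then
                    echo "$name: $value out of range $lo-$hi"
                    status=1
                fi
                ;;
        esac
    done
    return "$status"
}
```

For example, check_cron_fields 0 0 '*' '*' 0 prints nothing and succeeds, while check_cron_fields 0 0 0 '*' '*' prints "day-of-month: 0 out of range 1-31" and fails.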

Status Updates

In order to receive status updates like failed drive notifications, we must first configure the system to send email through the SGK mail server. Rather than use exim4 as provided by the base system, instead use msmtp.

apt-get install msmtp-mta

Create the file /etc/msmtprc with the following contents.

# Set default values for all following accounts.
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
tls_starttls   off

# Account: subgeniuskitty
account        default
host           mail.subgeniuskitty.com
port           465
from           ataylor@subgeniuskitty.com
user           ataylor@subgeniuskitty.com
password       <plaintext-password>

Create the file /etc/cron.d/status-emails with the following contents.

PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
SHELL=/bin/bash
0 0 * * 0 root /sbin/zpool status | echo -e "Subject:FROSTBURG: zpool status\n\n $(cat -)" | msmtp ataylor@subgeniuskitty.com
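The one-liner above relies on bash's echo -e (hence SHELL=/bin/bash). A POSIX-shell alternative, as a sketch, prints the header with printf and lets the command's output follow as the body; mail_body is a hypothetical helper name, and the commented crontab line shows the same idea inlined as a brace group.

```shell
#!/bin/sh
# Print a Subject header and a blank line, then copy stdin through as the body.
# POSIX sh is enough; no echo -e needed.
mail_body() {
    printf 'Subject: %s\n\n' "$1"
    cat
}

# Example: preview the message locally instead of sending it.
# /sbin/zpool status | mail_body "FROSTBURG: zpool status"
#
# The same idea inlined into /etc/cron.d/status-emails, without SHELL=/bin/bash:
# 0 0 * * 0 root { printf 'Subject: FROSTBURG: zpool status\n\n'; /sbin/zpool status; } | msmtp ataylor@subgeniuskitty.com
```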

IRC Environment

IRC is used for collaboration on the server. First install daemon and client.

apt-get install ngircd irssi

Configure the server by editing /etc/ngircd/ngircd.conf. The defaults are mostly acceptable but the server must be given a name and restricted to only listen for local connections. While we’re at it, the max nick length is only 9 by default and should be increased. Note that these values need to be inserted under the appropriate category, as shown below, but the categories already exist in the config file.

[Global]
    Name = frostburg.subgeniuskitty.com
    Info = Frostburg - Private IRC Server
    Listen = 127.0.0.1
[Limits]
    MaxNickLength = 32

Restart the server and verify it listens on the correct addresses.

# systemctl restart ngircd
# netstat -an | grep LISTEN
tcp        0      0 127.0.0.1:6667          0.0.0.0:*               LISTEN

Start up a client in screen for each user.

screen -dR irc
irssi
/connect localhost
/join #channel

Public SSH Access

Although frostburg is on a private subnet, I want public SSH access. The easiest way to set this up is via a reverse SSH tunnel to one of the public subgeniuskitty.com servers.

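This section refers to three machines: the server (frostburg, on the private subnet), the endpoint (a public subgeniuskitty.com server that hosts the far end of the reverse tunnel), and the client (the machine a user logs in from).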

First, set up appropriate login credentials on the server, which in this case is frostburg.subgeniuskitty.com. Ignore any warnings about /home/username already existing or not being owned by the correct user. These are simply a side effect of using ZFS, since we must create the homedir before adding the user but can’t change its ownership until after the new user exists.

server:~ # zfs create rpool/home/username
server:~ # adduser username
server:~ # cp -a /etc/skel/. /home/username
server:~ # chown -R username:username /home/username
server:~ # zfs snapshot rpool/home/username@account_creation

If necessary for the intended tasks, add the user to any relevant groups with something like the following command.

server:~ # usermod -a -G netdev,plugdev,sudo,video username

The user will also need login credentials on the endpoint. These credentials don’t need to allow anything other than simply SSHing through to the server.

endpoint:~ # adduser username

With appropriate credentials successfully created, move on to setting up a reverse SSH tunnel from server to endpoint.

First, create an SSH key on the server with no passphrase and authorize it for logins on the endpoint. This will be used to bring the tunnel up when the machine boots. If a non-empty passphrase is specified, you will need to type it during the boot process.

server:~ # ssh-keygen
server:~ # scp /root/.ssh/id_rsa.pub username@endpoint:/home/username/temp_key_file
server:~ # ssh username@endpoint
    (login requires password)
endpoint:~ % mkdir -p /home/username/.ssh
endpoint:~ % mv /home/username/temp_key_file /home/username/.ssh/authorized_keys
endpoint:~ % logout
server:~ # ssh username@endpoint
    (login does not require password)
endpoint:~ % logout
server:~ # mv /root/.ssh/id_rsa /root/.ssh/rtunnel_nopwd
server:~ # mv /root/.ssh/id_rsa.pub /root/.ssh/rtunnel_nopwd.pub

Next, create the tunnel using AutoSSH to maintain a long-term connection.

server:~ # apt-get install autossh
server:~ # vi /etc/systemd/system/autossh-tunnel.service
    [Unit]
    Description=AutoSSH tunnel between frostburg.SGK and www.SGK
    Wants=network-online.target
    After=network-online.target

    [Service]
    Environment="AUTOSSH_GATETIME=0"
    ExecStart=/usr/bin/autossh -N -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -i /root/.ssh/rtunnel_nopwd -R 4242:localhost:22 username@endpoint

    [Install]
    WantedBy=multi-user.target
server:~ # systemctl daemon-reload
server:~ # systemctl start autossh-tunnel.service
server:~ # systemctl enable autossh-tunnel.service

At this point the SSH tunnel is operational. Let’s make things a little easier for the user by storing most of the config options in an SSH config file.

endpoint:~ # su - username
endpoint:~ % vi /home/username/.ssh/config
    Host server
        Hostname localhost
        User     username
        Port     4242

Now executing ssh server is equivalent to the command ssh -p 4242 username@localhost, but much easier to remember.

It’s time to test everything out. Starting from the client, you should now be able to log in to the server via the endpoint.

client:~ % ssh username@endpoint
endpoint:~ % ssh server
server:~ %

Xeon Phi Kernel Module

It appears that Linux kernel version 4.19.181, included with Debian 10.9, already has some sort of in-tree kernel support for these Xeon Phi coprocessor cards, as seen in the final lines of the following diagnostic output. Also note that the card was allocated an 8GB PCIe MMIO region, indicating that the 64-bit BAR (Above 4G Decoding) setting in the BIOS is working as intended.

root@frostburg:~ # lspci | grep -i Co-processor
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
root@frostburg:~ # lspci -s 02:00.0 -vv
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
        <snip>
        Region 0: Memory at 21c00000000 (64-bit, prefetchable) [size=8G]
        <snip>
        Kernel driver in use: mic
        Kernel modules: mic_host
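The 64-bit placement can also be read mechanically from that Region line. A sketch that parses the sample line copied from the output above (on a live system, substitute the output of lspci -s 02:00.0 -vv) and reports the BAR width, size, and whether it sits above the 4 GiB boundary; it relies on 64-bit shell arithmetic, which dash and bash provide on amd64.

```shell
#!/bin/sh
# Parse an lspci "Region" line: BAR width, size, and whether it is above 4 GiB.
# The sample line is copied from the lspci -vv output above; on a live system,
# take it from: lspci -s 02:00.0 -vv | grep 'Region 0'
line='        Region 0: Memory at 21c00000000 (64-bit, prefetchable) [size=8G]'

bar_addr=$(printf '%s\n' "$line" | sed -n 's/.*Memory at \([0-9a-f]*\).*/\1/p')
bar_size=$(printf '%s\n' "$line" | sed -n 's/.*\[size=\([^]]*\)].*/\1/p')

case $line in
    *64-bit*) width=64 ;;
    *)        width=32 ;;
esac

# Shell arithmetic is 64-bit on amd64, so the 0x prefix handles large addresses.
if [ $(( 0x$bar_addr )) -gt $(( 0xFFFFFFFF )) ]; then
    placement="above 4G"
else
    placement="below 4G"
fi

echo "BAR at 0x$bar_addr, size $bar_size, $width-bit, mapped $placement"
```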

However, since the Intel manuals are plastered with warnings about using exact, sanctioned combinations of kernel module, MPSS software, and Phi firmware, I decided to avoid the kernel module included with the system and instead attempt porting the kernel module source code included with MPSS onto a newer Linux kernel. Once I have everything operational and understand how it should work, then I can try the open-source driver.

I have updated the Intel kernel driver to work with newer Linux kernels. My work is based upon the kernel source included with MPSS 3.8.6, the latest/last release from Intel. Since the Xeon Phi x100 series is EOL, I don’t think Intel intends to release any more versions of MPSS. Check README.md in my xeon-phi-kernel-module git repo for up-to-date information regarding kernel version compatibility.

Before compiling the kernel module, verify that relevant kernel headers are installed.

% uname -a
Linux frostburg 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
% dpkg -l | grep linux-header
ii  linux-headers-4.19.0-16-amd64    4.19.181-1                        amd64        Header files for Linux 4.19.0-16-amd64
ii  linux-headers-4.19.0-16-common   4.19.181-1                        all          Common header files for Linux 4.19.0-16
ii  linux-headers-amd64              4.19+105+deb10u11                 amd64        Header files for Linux amd64 configuration (meta-package)
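The same check can be scripted. Out-of-tree modules build against /lib/modules/<release>/build, as the make -C line in the compilation output shows, and on Debian that path is supplied by the matching linux-headers package. A sketch with a hypothetical helper name; the optional arguments exist only so it can be pointed at a scratch directory.

```shell
#!/bin/sh
# Check for a kernel build tree matching a kernel release (default: running kernel).
# Debian's linux-headers-<release> package supplies /lib/modules/<release>/build.
# The second argument only exists so the check can be tried on a scratch tree.
headers_present() {
    release=${1:-$(uname -r)}
    modroot=${2:-/lib/modules}
    [ -e "$modroot/$release/build/Makefile" ]
}

if headers_present; then
    echo "kernel headers found for $(uname -r)"
else
    echo "no headers for $(uname -r); try: apt-get install linux-headers-$(uname -r)"
fi
```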

Download and compile my updated version of the Intel kernel driver. Sample compilation output is included below.

% git clone git://git.subgeniuskitty.com/xeon-phi-kernel-module/
% cd xeon-phi-kernel-module/
% make clean all
make -C /lib/modules/4.19.0-16-amd64/build M=xeon-phi-kernel-module modules \
        INSTALL_MOD_PATH=
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  xeon-phi-kernel-module/dma/mic_dma_lib.o
  CC [M]  xeon-phi-kernel-module/dma/mic_dma_md.o
  CC [M]  xeon-phi-kernel-module/host/acptboot.o
  CC [M]  xeon-phi-kernel-module/host/ioctl.o
  CC [M]  xeon-phi-kernel-module/host/linpm.o
  CC [M]  xeon-phi-kernel-module/host/linpsmi.o
  CC [M]  xeon-phi-kernel-module/host/linscif_host.o
  CC [M]  xeon-phi-kernel-module/host/linsysfs.o
  CC [M]  xeon-phi-kernel-module/host/linux.o
  CC [M]  xeon-phi-kernel-module/host/linvcons.o
  CC [M]  xeon-phi-kernel-module/host/linvnet.o
  CC [M]  xeon-phi-kernel-module/host/micpsmi.o
  CC [M]  xeon-phi-kernel-module/host/micscif_pm.o
  CC [M]  xeon-phi-kernel-module/host/pm_ioctl.o
  CC [M]  xeon-phi-kernel-module/host/pm_pcstate.o
  CC [M]  xeon-phi-kernel-module/host/tools_support.o
  CC [M]  xeon-phi-kernel-module/host/uos_download.o
  CC [M]  xeon-phi-kernel-module/host/vhost/mic_vhost.o
  CC [M]  xeon-phi-kernel-module/host/vhost/mic_blk.o
  CC [M]  xeon-phi-kernel-module/host/vmcore.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_api.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_debug.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_fd.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_intr.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_nm.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_nodeqp.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_ports.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_rb.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_rma_dma.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_rma_list.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_rma.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_select.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_smpt.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_sysfs.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_va_gen.o
  CC [M]  xeon-phi-kernel-module/micscif/micscif_va_node.o
  CC [M]  xeon-phi-kernel-module/vnet/micveth_dma.o
  CC [M]  xeon-phi-kernel-module/vnet/micveth_param.o
  LD [M]  xeon-phi-kernel-module/mic.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      xeon-phi-kernel-module/mic.mod.o
  LD [M]  xeon-phi-kernel-module/mic.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'

At this point you can manually load or install the new kernel module (mic.ko), which is in the current directory, or run make install as root. The latter also installs the SCIF header file and places some config files under /usr/local/etc/. The system won’t read those config files (we will install configs in the correct locations in a moment), but they are useful as a reference. Sample make install output is shown below.

# make install
make -C /lib/modules/4.19.0-16-amd64/build M=/home/ataylor/xeon-phi-kernel-module modules_install \
        INSTALL_MOD_PATH=
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  INSTALL /home/ataylor/xeon-phi-kernel-module/mic.ko
  DEPMOD  4.19.0-16-amd64
Warning: modules_install: missing 'System.map' file. Skipping depmod.
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
install -d /usr/local/etc/sysconfig/modules
install mic.modules /usr/local/etc/sysconfig/modules
install -d /usr/local/etc/modprobe.d
install -m644 mic.conf /usr/local/etc/modprobe.d
install -d /usr/local/etc/udev/rules.d
install -m644 udev-mic.rules /usr/local/etc/udev/rules.d/50-udev-mic.rules
install -d /lib/modules/4.19.0-16-amd64
install -m644 Module.symvers /lib/modules/4.19.0-16-amd64/scif.symvers
install -d /usr/src/linux-headers-4.19.0-16-amd64/include/modules
install -m644 include/scif.h /usr/src/linux-headers-4.19.0-16-amd64/include/modules
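If you prefer to skip make install, a manual install could look roughly like the sketch below. It assumes kbuild's usual destination for out-of-tree modules, /lib/modules/<release>/extra/, and then runs depmod against the real module tree, which also covers the depmod step that make install skipped above. The function install_mic_manually and the DESTDIR variable are hypothetical; DESTDIR exists only so the sketch can be pointed at a scratch directory. Unlike make install, it copies only mic.ko, not scif.h, scif.symvers, or the reference configs.

```shell
#!/bin/sh
# Sketch of a manual alternative to "make install" for mic.ko.
# Assumptions: run as root from the build directory containing mic.ko;
# DESTDIR is a hypothetical prefix used only for dry runs into a scratch tree.
install_mic_manually() {
    release="${1:-$(uname -r)}"
    destdir="${DESTDIR:-}"
    moddir="$destdir/lib/modules/$release/extra"
    install -d "$moddir"
    install -m644 mic.ko "$moddir/mic.ko"
    # Only rebuild module dependencies when installing into the real tree.
    if [ -z "$destdir" ]; then
        depmod -a "$release"
    fi
}
```

To load the module for the current session only, without installing it, insmod ./mic.ko is the usual alternative.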

Create the file /etc/modprobe.d/mic.conf with the following contents. It accomplishes two things: first, it blacklists the in-tree MIC kernel modules that shipped with our kernel, and second, it configures the Intel MIC kernel module we just built and installed. The options shown are taken from the defaults in /usr/local/etc/modprobe.d/mic.conf.

# Blacklist the in-tree kernel modules associated with the Knights Corner Xeon
# Phi so that we can load the Intel kernel module.

# These two modules depend on the various bus modules that follow.
blacklist mic_host
blacklist mic_x100_dma

blacklist cosm_bus
blacklist vop_bus
blacklist scif_bus
blacklist mic_bus

# ^^^------ Blacklisting the in-tree MIC kernel modules.
# ==============================================================================
# vvv------ Configuring the Intel MIC kernel module.

# The following options apply to the Intel Many Integrated Core (MIC) driver.
# Unless otherwise noted, the value "1" enables the feature and "0" disables
# it.
#
# Option:      p2p
# Description: Enables use of SCIF interface peer to peer communication.
#
# Option:      p2p_proxy
# Description: Enables use of SCIF P2P Proxy DMA which converts DMA
#              reads into DMA writes for performance on certain Intel
#              platforms.
#
# Option:      reg_cache
# Description: Enables SCIF Registration Caching.
#
# Option:      huge_page
# Description: Enables SCIF Huge Page Support.
#
# Option:      watchdog
# Description: Enables SCIF watchdog for Lost Node detection.
#
# Option:      watchdog_auto_reboot
# Description: Configures behavior of MIC host driver upon detection of a lost
#              node. This option is a nop if watchdog=0. Setting value "1"
#              allows host driver to reboot node back to "online" state,
#              whereas value "0" only allows the host driver to reset the node
#              back to "ready" state, leaving the user responsible for rebooting
#              the node (or not).
#
# Option:      crash_dump
# Description: Enables uOS Kernel Crash Dump Captures.
#
# Option:      ulimit
# Description: Enables ulimit checks on max locked memory for scif_register.
#
options mic reg_cache=1 huge_page=1 watchdog=1 watchdog_auto_reboot=1 crash_dump=1 p2p=1 p2p_proxy=1 ulimit=0
options mic_host reg_cache=1 huge_page=1 watchdog=1 watchdog_auto_reboot=1 crash_dump=1 p2p=1 p2p_proxy=1 ulimit=0
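Because a typo in a blacklist line fails silently, it can be worth checking the file. The following is a minimal sketch (the function missing_blacklist is hypothetical) that prints each in-tree module from the list above that the file does not blacklist, and returns non-zero if any are missing.

```shell
#!/bin/sh
# Sketch: report in-tree MIC modules that a modprobe config does not blacklist.
# The module list is copied from the blacklist section of mic.conf.
missing_blacklist() {
    conf="${1:-/etc/modprobe.d/mic.conf}"
    status=0
    for m in mic_host mic_x100_dma cosm_bus vop_bus scif_bus mic_bus; do
        # Match a whole "blacklist <module>" line, tolerating extra whitespace.
        if ! grep -Eq "^[[:space:]]*blacklist[[:space:]]+$m[[:space:]]*$" "$conf"; then
            echo "$m"
            status=1
        fi
    done
    return $status
}
```

As a cross-check, modprobe --showconfig | grep -E '^(blacklist|options) (mic|cosm|vop|scif)' shows the merged configuration modprobe actually sees, which also catches a file saved under the wrong name.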

Finally, add the line mic to /etc/modules-load.d/modules.conf so the system loads this kernel module at boot. Then run depmod so the system is aware of the new kernel module (make install skipped this step, as its warning about the missing System.map shows), and reboot to verify that everything works.
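The edit to modules.conf can be made idempotent, so repeating this step does not add a duplicate entry. A minimal sketch, with the hypothetical function name enable_mic_on_boot:

```shell
#!/bin/sh
# Sketch: list "mic" in a modules-load.d file exactly once.
# Assumption: one module name per line, as systemd-modules-load expects.
enable_mic_on_boot() {
    conf="${1:-/etc/modules-load.d/modules.conf}"
    # -x matches whole lines only; a missing file simply counts as "not listed".
    if ! grep -qx 'mic' "$conf" 2>/dev/null; then
        echo 'mic' >> "$conf"
    fi
}
```

As root, the whole step then becomes enable_mic_on_boot && depmod -a && reboot.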

After the system comes back up, use the systool command (from the sysfsutils package) to verify that the module loaded with the desired options. Sample output is shown below.

# systool -v -m mic
Module = "mic"

  Attributes:
    coresize            = "741376"
    initsize            = "0"
    initstate           = "live"
    refcnt              = "0"
    taint               = "OE"
    uevent              = <store method only>

  Parameters:
    crash_dump          = "Y"
    huge_page           = "Y"
    msi                 = "Y"
    p2p_proxy           = "Y"
    p2p                 = "Y"
    pm_qos_cpu_dma_lat  = "-1"
    psmi                = "N"
    ramoops_count       = "4"
    reg_cache           = "Y"
    ulimit              = "N"
    vnet                = "dma"
    vnet_addr           = "0"
    vnet_num_buffers    = "62"
    watchdog_auto_reboot= "Y"
    watchdog            = "Y"

  Sections:
    <snip>
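If systool is not installed, the same parameter values can be read directly from sysfs, where each file under /sys/module/<name>/parameters holds one parameter's current value. A minimal sketch follows (show_module_params is a hypothetical name). Its output order follows the shell's glob sort, so it may not match systool line for line.

```shell
#!/bin/sh
# Sketch: print a loaded module's parameters straight from sysfs.
# Arguments: module name (default mic), sysfs module root (default /sys/module).
show_module_params() {
    dir="${2:-/sys/module}/${1:-mic}/parameters"
    [ -d "$dir" ] || { echo "module ${1:-mic} not loaded" >&2; return 1; }
    for p in "$dir"/*; do
        [ -r "$p" ] || continue   # skip write-only parameters
        printf '%-20s= "%s"\n' "$(basename "$p")" "$(cat "$p")"
    done
}
```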

Intel MPSS