HPC Engineer

Job Details
We are seeking a skilled and motivated HPC Engineer to support our High-Performance Computing (HPC) and Digital Platform operations. This role plays a critical part in maintaining and optimizing Linux-based systems in a complex, data-driven environment. You’ll work alongside a global IT team to ensure seamless operation of our HPC and cloud infrastructure, supporting scientific discovery and data-intensive workloads.

This position requires flexibility to work variable shifts, including occasional evenings, weekends, or holiday coverage based on business and system support needs.

Key Responsibilities:

  • Install, configure, and maintain Linux systems and related applications across HPC and cloud environments
  • Monitor system performance, analyze logs, and proactively identify and address potential issues
  • Provide technical support to end users, resolving system, hardware, and software issues
  • Manage system backups, software upgrades, and security patches
  • Support in-house software, troubleshoot performance issues, and ensure adherence to IT policies
  • Utilize ticketing systems to manage and resolve support requests efficiently
  • Collaborate with cross-functional teams to support evolving business and research needs
  • Contribute to automation efforts using tools like Ansible, GitLab, Puppet, or equivalent
  • Support job scheduling and workload management using SLURM (preferred)
  • Stay current with evolving technologies and best practices in HPC and cloud computing

Required Skills & Qualifications:

  • Bachelor’s degree in Computer Science, IT, Engineering, or a related field – or equivalent work experience
  • 1–5 years of hands-on Linux system administration experience, preferably in an HPC environment
  • Proficient in scripting (Bash, Python, or Perl)
  • Experience with Docker and container orchestration
  • Familiarity with configuration management tools (Ansible, Chef, Puppet, Salt, etc.)
  • Exposure to SQL-based databases such as MySQL or MariaDB
  • Strong troubleshooting, problem-solving, and communication skills
  • Ability to work variable shifts in a 24/7 environment as needed

Preferred Qualifications:

  • Experience with SLURM workload manager
  • Exposure to DevOps practices and tools (CI/CD, Kubernetes, OpenStack)
  • Understanding of hardware infrastructure, including CPU, GPU, and storage systems
  • Cloud administration experience (AWS, Azure, GCP, etc.)
  • Certifications such as CompTIA Network+, CCNA, or ITIL Foundation

Location: Houston, Texas

Schedule: Hybrid – 4 days onsite, 1 day work-from-home
Employment Type: Full-Time | Variable Shifts Required