# OpenOnDemand Job Templates User Guide

This guide explains how to use job templates in OpenOnDemand to submit SLURM jobs efficiently.

## What are Job Templates?

Job templates are pre-configured SLURM job scripts that help you quickly submit common types of jobs without writing scripts from scratch. Each template includes:

- Pre-configured SLURM directives (cores, memory, time limits)
- Example code or workflows
- Documentation and best practices
- Ready-to-run scripts

## Accessing the Job Composer

1. **Log in to OpenOnDemand** at your cluster's URL
2. Click on **"Jobs"** in the top navigation menu
3. Select **"Job Composer"** from the dropdown

You'll see the Job Composer interface with:

- List of your existing jobs (left sidebar)
- Job details and files (main panel)
- Action buttons (Submit, Edit, Delete, etc.)

## Creating Jobs from Templates

### Step 1: Create New Job from Template

1. In the Job Composer, click the **"New Job"** button
2. Select **"From Template"**
3. Choose a template from the list (see [Available Templates](#available-templates))
4. Click **"Create New Job"**

The template will be copied to your jobs directory with all necessary files.

### Step 2: Review Job Location

Your new job is created in:

```text
~/ondemand/data/sys/myjobs/projects/default/<job-id>/
```

Here `<job-id>` is a sequence number assigned by the Job Composer, not the SLURM job ID. Each job directory contains:

- `script.sh` - The SLURM job script
- Template-specific files (e.g., example R scripts, definition files)
- `README.md` - Documentation (if included)

### Step 3: Understand the Job Structure

Every job template includes a `script.sh` file with SLURM directives at the top:

```bash
#!/bin/bash
#SBATCH --job-name=my_job        # Job name
#SBATCH --time=01:00:00          # Time limit (HH:MM:SS)
#SBATCH --partition=normal       # Queue/partition
#SBATCH -n 1                     # Number of tasks
#SBATCH -c 4                     # CPU cores per task
#SBATCH --mem=8G                 # Memory
#SBATCH --output=%x-%j.out       # Output file
#SBATCH --error=%x-%j.err        # Error file
```

## Editing Job Scripts

### Using the Built-in Editor

1. In Job Composer, select your job from the left sidebar
2. Click on `script.sh` in the file list
3. Click the **"Edit"** button
4. Make your changes in the editor
5. Click **"Save"** when done

### Common Edits

#### Change Resource Requirements

Edit the SLURM directives to match your needs:

```bash
#SBATCH --time=04:00:00          # Increase time limit
#SBATCH -c 8                     # Use more CPU cores
#SBATCH --mem=32G                # Request more memory
#SBATCH --partition=gpu          # Use GPU partition (also request GPUs with --gres)
```

#### Add Your Data Files

Edit the script to point to your own R script and data:

```bash
# Change this:
Rscript hello.R

# To this:
Rscript /path/to/your/analysis.R
```
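
If your script reads data files, it can help to fail early with a clear message when a path is wrong, instead of letting R stop partway through. A minimal sketch, assuming hypothetical paths `$HOME/analysis/analysis.R` and `$HOME/analysis/data.csv` and a hypothetical helper named `check_inputs`:

```bash
# Hypothetical locations: replace with your own script and data paths.
R_SCRIPT="${R_SCRIPT:-$HOME/analysis/analysis.R}"
DATA_FILE="${DATA_FILE:-$HOME/analysis/data.csv}"

# Return non-zero, with a message on stderr, if any argument is not a readable file.
check_inputs() {
  local f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "ERROR: cannot read '$f'" >&2
      return 1
    fi
  done
}
```

In `script.sh`, call it just before the R command, for example `check_inputs "$R_SCRIPT" "$DATA_FILE" || exit 1`. A missing file then shows up in the `.err` file, and the job is marked as failed.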

#### Modify Job Name and Output

```bash
#SBATCH --job-name=my_analysis   # Descriptive name
#SBATCH --output=results_%j.out  # Custom output filename
```

### Uploading Additional Files

1. In Job Composer, select your job
2. Click **"Open Dir"** to open the job directory in the file browser
3. Use the **"Upload"** button to add your data files
4. Update the script to reference your uploaded files

## Submitting Jobs

### Submit Your Job

1. Select your job in the Job Composer
2. Review the script and ensure all settings are correct
3. Click the **"Submit"** button

You'll see a confirmation message with the job ID (e.g., "Job submitted successfully with ID: 12345").

### What Happens Next?

1. **Queued**: Job enters the SLURM queue
2. **Running**: Job starts when resources are available
3. **Completed**: Job finishes (check output files for results)
4. **Failed**: Job encountered an error (check error file)
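
SLURM itself reports more detailed state names than these four stages, such as `TIMEOUT` or `OUT_OF_MEMORY`. The sketch below maps them onto the stages above. It assumes job accounting (`sacct`) is enabled on your cluster. `describe_state` and `job_state` are hypothetical helper names that you could paste into a terminal on the cluster:

```bash
# Map a SLURM job state onto the stages above.
# sacct reports states such as PENDING, RUNNING, COMPLETED, FAILED,
# TIMEOUT, OUT_OF_MEMORY and "CANCELLED by <uid>".
describe_state() {
  case "$1" in
    PENDING)                                echo "Queued" ;;
    RUNNING|COMPLETING)                     echo "Running" ;;
    COMPLETED)                              echo "Completed" ;;
    FAILED|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL) echo "Failed" ;;
    CANCELLED*)                             echo "Cancelled" ;;
    *)                                      echo "Other ($1)" ;;
  esac
}

# Look up the state of one job; unlike squeue, sacct also knows finished jobs.
job_state() {
  sacct -j "$1" -X -n -P -o State | head -n 1
}
```

For example, `describe_state "$(job_state 12345)"` prints the stage that job 12345 is currently in.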

## Monitoring Jobs

### View Active Jobs

1. Click **"Jobs"** → **"Active Jobs"** in the top menu
2. You'll see all your running and pending jobs
3. Information displayed:
   - Job ID
   - Job Name
   - Status (Running, Pending, Completed, Failed)
   - Time elapsed
   - Nodes/cores used

### Check Job Output

While the job is running or after completion:

1. In Job Composer, select your job
2. Click on the output file (e.g., `my_job-12345.out`)
3. Click **"View"** to see the contents
4. Click **"Refresh"** to update (for running jobs)

### View Job Details

For detailed job information:

1. Go to **"Jobs"** → **"Active Jobs"**
2. Click on your job ID
3. View comprehensive details:
   - Start time
   - Resource usage
   - Node assignment
   - Full job parameters
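
The same details are available from a terminal on the cluster. This sketch assumes `sacct` accounting is enabled. `job_summary` is a hypothetical helper name, and the column list is only a suggestion:

```bash
# Summarize a job (one line per job step) from SLURM's accounting records.
job_summary() {
  local job_id="$1"
  sacct -j "$job_id" \
    --format=JobID,JobName%20,State,ExitCode,Start,Elapsed,NodeList,AllocCPUS,MaxRSS
}
```

For a job that is still pending or running, `scontrol show job 12345` prints the full set of job parameters. `sacct` also covers jobs that have already finished.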

## Available Templates

### Basic R Serial Job

**Template**: `rscript`

**Purpose**: Run R scripts on a single core

**Includes**:

- `script.sh` - SLURM job script
- `hello.R` - Example R script with system information

**Use cases**:

- Data analysis
- Statistical computing
- Report generation

**How to customize**:

1. Upload your own R script (or edit `hello.R` directly)
2. Edit `script.sh` to reference your script:

   ```bash
   srun /usr/bin/apptainer exec /data/apps/rstudio.sif Rscript your_script.R
   ```

3. Adjust resources (memory, cores, time) as needed

### Build Custom Apptainer Image

**Template**: `apptainer_builder`

**Purpose**: Build custom container images based on RStudio

**Includes**:

- `script.sh` - Build automation script
- `rstudio_custom.def` - Apptainer definition file
- `README.md` - Detailed instructions

**Use cases**:

- Installing additional R packages
- Adding system dependencies (GDAL, PROJ, etc.)
- Creating reproducible environments
- Custom software stacks

**How to customize**:

1. Edit `rstudio_custom.def` to add your packages:

   ```singularity
   %post
       apt-get update
       apt-get install -y your-system-packages

       R --slave -e 'install.packages(c("your", "packages"))'
   ```

2. Submit the job (the build takes 1-4 hours)
3. The image is saved to `$HOME/apps/rstudio_custom.sif`
4. Use the image in future jobs or RStudio sessions

## Advanced Usage

### Creating Job Arrays

Run the same script multiple times with different parameters:

1. Edit your `script.sh` and add:

   ```bash
   #SBATCH --array=1-10           # Run 10 instances
   ```

2. Use `$SLURM_ARRAY_TASK_ID` in your script:

   ```bash
   Rscript analysis.R "$SLURM_ARRAY_TASK_ID"
   ```
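
With many inputs, a common pattern is to list one input per line in a text file and let each array task pick its own line. A minimal sketch, assuming a hypothetical `inputs.txt` in the job directory and a hypothetical helper named `input_for_task`:

```bash
# Print line N of a list file (N = the array task ID, starting at 1).
input_for_task() {
  local list_file="$1" task_id="$2"
  sed -n "${task_id}p" "$list_file"
}

# Fall back to task 1 so the script can also be tried outside SLURM.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
```

Inside the job, `INPUT=$(input_for_task inputs.txt "$TASK_ID")` followed by `Rscript analysis.R "$INPUT"` processes one file per task. Keep the `--array` range in step with the number of lines in the list. You might also use `#SBATCH --output=%x-%A_%a.out`, which names each output file by the array's job ID (`%A`) and the task index (`%a`).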

### Job Dependencies

Run jobs in sequence:

1. Submit first job and note the job ID (e.g., 12345)
2. Create second job with dependency:

   ```bash
   #SBATCH --dependency=afterok:12345
   ```
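
Instead of copying the job ID by hand, you can chain the two submissions from a terminal on the cluster. `sbatch --parsable` prints just the job ID (followed by `;clustername` on multi-cluster setups). Options given on the command line take precedence over `#SBATCH` lines in the script. A sketch with a hypothetical helper `submit_after` and hypothetical scripts `first.sh` and `second.sh`:

```bash
# Submit $1, then submit $2 so that it starts only if $1 finishes successfully.
submit_after() {
  local first second
  first=$(sbatch --parsable "$1") || return 1
  first="${first%%;*}"          # strip an optional ";cluster" suffix
  second=$(sbatch --parsable --dependency=afterok:"$first" "$2") || return 1
  echo "${first} -> ${second%%;*}"
}
```

Running `submit_after first.sh second.sh` prints both job IDs. The second job stays pending, with reason `Dependency`, until the first one completes successfully.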

### Using Custom Container Images

After building a custom image:

1. Edit your job script
2. Change the container path:

   ```bash
   # Instead of:
   apptainer exec /data/apps/rstudio.sif Rscript script.R

   # Use:
   apptainer exec $HOME/apps/rstudio_custom.sif Rscript script.R
   ```
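
If you want a script that keeps working before the custom build has finished, you could select the image at run time. A sketch of this approach follows. The fallback logic is a suggestion, not part of the templates, and `pick_image` is a hypothetical helper:

```bash
# Echo the custom image if it exists, otherwise the shared one.
# Both paths can be overridden as arguments; the defaults match this guide.
pick_image() {
  local custom="${1:-$HOME/apps/rstudio_custom.sif}"
  local shared="${2:-/data/apps/rstudio.sif}"
  if [ -f "$custom" ]; then
    echo "$custom"
  else
    echo "$shared"
  fi
}

IMAGE="$(pick_image)"
echo "Using container image: $IMAGE"
```

Then run `apptainer exec "$IMAGE" Rscript script.R`. The `echo` line records in the `.out` file which image each job actually used.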

### Email Notifications

Get notified when jobs complete:

```bash
#SBATCH --mail-type=END,FAIL     # Email on end or failure
#SBATCH --mail-user=your.email@example.com
```

### Using GPU Resources

For GPU-accelerated jobs:

```bash
#SBATCH --partition=gpu          # GPU partition
#SBATCH --gres=gpu:1             # Request 1 GPU (e.g. gpu:2 for 2 GPUs)
```
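
When the job runs inside a container, Apptainer also needs its `--nv` flag to make NVIDIA GPUs and drivers available inside the container. The sketch below adds the flag only when GPUs are assigned. It assumes your cluster sets `CUDA_VISIBLE_DEVICES` for GPU jobs, which is common with GPU-enabled SLURM but worth confirming with your admin. `gpu_flag` and `gpu_script.R` are hypothetical names:

```bash
# Print "--nv" when GPUs are visible to this job, nothing otherwise.
gpu_flag() {
  if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "--nv"
  fi
}
```

Use it as `apptainer exec $(gpu_flag) /data/apps/rstudio.sif Rscript gpu_script.R`. The substitution is left unquoted on purpose, so it disappears entirely on CPU-only runs.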

### Parallel Processing in R

For multi-core R jobs:

```bash
#SBATCH -c 8                     # Request 8 cores
```

In your R script:

```r
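library(doParallel)  # also attaches foreach and parallel

# Use the cores SLURM allocated with -c (falls back to 1 outside a SLURM job)
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
registerDoParallel(cores = n_cores)

# Your parallel code here: iterations are spread across n_cores workers
results <- foreach(i = 1:1000) %dopar% {
    sqrt(i)  # replace with your computation
}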
```

## Troubleshooting

### Job Stays in Pending State

**Possible causes**:

- Requested resources not available
- Partition full or not accessible
- Time limit too high for requested partition
- Account/QOS limits reached

**Solutions**:

1. Check active jobs: **"Jobs"** → **"Active Jobs"**
2. Reduce resource requests (cores, memory, time)
3. Try a different partition
4. Contact admin if issue persists
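
To see why SLURM is holding a job, you can list the pending reason for each of your jobs from a terminal. The sketch below translates common SLURM reason codes into the causes listed above. `explain_reason` and `show_pending` are hypothetical helpers, and the list of codes is not exhaustive:

```bash
# Translate a SLURM pending-reason code into a short explanation.
# Order matters: the more specific patterns come first.
explain_reason() {
  case "$1" in
    *DependencyNeverSatisfied*) echo "Dependency can never be satisfied; cancel and resubmit" ;;
    *Dependency*)               echo "Waiting for another job to finish (dependency)" ;;
    *PartitionTimeLimit*)       echo "Time limit exceeds the partition maximum" ;;
    *QOS*|*Assoc*)              echo "Account/QOS limit reached" ;;
    *ReqNodeNotAvail*)          echo "Requested nodes are unavailable" ;;
    *Resources*)                echo "Waiting for requested resources to free up" ;;
    *Priority*)                 echo "Other jobs have higher priority" ;;
    *)                          echo "Unrecognized reason: $1" ;;
  esac
}

# List your pending jobs with an explanation for each (%i = job ID, %r = reason).
show_pending() {
  squeue -u "$USER" -t PENDING -h -o "%i %r" |
    while read -r id reason; do
      echo "$id: $(explain_reason "$reason")"
    done
}
```

`squeue --start` additionally shows SLURM's estimated start time, when the scheduler can compute one.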

### Job Fails Immediately

**Check the error file** (`*.err`):

1. In Job Composer, select your job
2. Open the `.err` file
3. Look for error messages

**Common issues**:

- File not found: Check that paths are absolute or relative to the job directory
- Permission denied: Ensure files are readable
- Module not loaded: Container may be missing dependencies
- Syntax errors: Review script for typos
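
Some of these problems can be caught before submitting. Below is a sketch of a pre-submission check, using a hypothetical helper `preflight`. `bash -n` parses the script without running it, and `sbatch` rejects scripts whose first line is not an interpreter line such as `#!/bin/bash`. The check does not look at your R code or data paths:

```bash
# Check a job script for shell syntax errors and a missing #! line.
preflight() {
  local script="$1"
  # bash -n reads and parses the script without executing any commands.
  bash -n "$script" || return 1
  # sbatch requires the first line to start with #! followed by an interpreter.
  if [ "$(head -c 2 "$script")" != "#!" ]; then
    echo "ERROR: $script does not start with #!" >&2
    return 1
  fi
}
```

From a terminal in the job directory, `preflight script.sh && echo "looks OK"` runs the check before you click **"Submit"**.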

### Out of Memory Errors

**Symptoms**:

- Job fails with "out of memory" or "killed" message
- Exit code 137

**Solutions**:

1. Increase memory request:

   ```bash
   #SBATCH --mem=32G              # Instead of 8G
   ```

2. Use memory-efficient approaches in your code
3. Split job into smaller chunks
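
Exit codes above 128 mean the process was terminated by a signal, numbered as the code minus 128. Code 137 is 128 + 9, i.e. `SIGKILL`, which is what the kernel's out-of-memory killer sends. A sketch of a hypothetical helper `explain_exit` for decoding such codes:

```bash
# Explain a process exit code; codes above 128 mean "killed by signal (code - 128)".
explain_exit() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    echo "killed by signal $sig ($(kill -l "$sig"))"
  elif [ "$code" -eq 0 ]; then
    echo "success"
  else
    echo "exited with error code $code"
  fi
}
```

To compare peak memory with the request, `sacct -j 12345 --format=JobID,State,ExitCode,ReqMem,MaxRSS` is a good starting point. SLURM may also record the state as `OUT_OF_MEMORY`. `sacct` shows `ExitCode` as `exit:signal`, so a kill may appear there as `0:9` rather than 137.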

### Container Image Not Found

**Error**: `FATAL: container not found`

**Solutions**:

1. Check the container path in your script
2. Verify the container exists:

   ```bash
   ls -l /data/apps/rstudio.sif
   ```

3. If using custom image, ensure build completed:

   ```bash
   ls -l $HOME/apps/rstudio_custom.sif
   ```

### Job Takes Too Long

**Options**:

1. Request more time:

   ```bash
   #SBATCH --time=12:00:00
   ```

2. Optimize your code (vectorization, parallel processing)
3. Use more CPU cores if parallelizable
4. Check that you are using a partition that allows long jobs
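
To tell whether a job ran out of time, compare its elapsed time with its limit. `sacct -j 12345 -X -n -P -o Elapsed,Timelimit` prints both, separated by `|`, and a job stopped by the limit is reported as `TIMEOUT`. A sketch of a hypothetical `to_seconds` helper for comparing such values in a script:

```bash
# Convert a SLURM time string (D-HH:MM:SS, HH:MM:SS or MM:SS) to seconds,
# e.g. to compare sacct's Elapsed column with its Timelimit column.
to_seconds() {
  local t="$1" days=0 h=0 m=0 s=0 a b c
  if [[ "$t" == *-* ]]; then
    days="${t%%-*}"   # part before the dash
    t="${t#*-}"       # part after the dash
  fi
  IFS=: read -r a b c <<< "$t"
  if [ -n "$c" ]; then
    h="$a"; m="$b"; s="$c"
  else
    m="$a"; s="$b"
  fi
  # 10# forces base 10, so fields such as "08" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

For example, if `to_seconds` of the elapsed time is at or above `to_seconds` of the limit, the job most likely needs a longer `--time`.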

### Can't Edit Files

**If the editor doesn't work**:

1. Click **"Open Dir"** in Job Composer
2. Use the file browser's built-in editor
3. Or download the file, edit it locally, and re-upload it

### Need to Cancel a Job

1. Go to **"Jobs"** → **"Active Jobs"**
2. Find your job in the list
3. Click **"Delete"** or **"Cancel"** button
4. Confirm the cancellation

## Best Practices

### Resource Estimation

- **Start small**: Begin with minimal resources and increase if needed
- **Monitor usage**: Check actual resource usage after jobs complete
- **Be realistic**: Don't over-request resources; larger requests usually wait longer in the queue and tie up resources others could use
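
One way to check actual usage is to compare a finished job's peak memory (`MaxRSS`) with what it requested (`ReqMem`). For example, `sacct -j 12345 -n -P -o JobID,ReqMem,MaxRSS` shows both, with the peak appearing on step lines such as `12345.batch`. Some sites also install the `seff` utility, which reports this directly. A sketch of hypothetical helpers `to_mib` and `mem_efficiency`, assuming integer values with a K/M/G/T suffix as `sacct` usually prints them:

```bash
# Convert a memory value as printed by sacct (e.g. "4194304K", "8G", "16000M")
# to MiB. Older SLURM versions append "n" (per node) or "c" (per CPU) to ReqMem;
# that suffix is dropped here.
to_mib() {
  local v="${1%[nc]}" num unit
  num="${v%[KMGT]}"
  unit="${v#"$num"}"
  case "$unit" in
    K) echo $(( num / 1024 )) ;;
    M) echo "$num" ;;
    G) echo $(( num * 1024 )) ;;
    T) echo $(( num * 1024 * 1024 )) ;;
    *) echo "to_mib: unsupported value '$1'" >&2; return 1 ;;
  esac
}

# Peak memory used as a whole-number percentage of the memory requested.
mem_efficiency() {
  local used req
  used=$(to_mib "$1") || return 1
  req=$(to_mib "$2") || return 1
  [ "$req" -gt 0 ] || return 1
  echo $(( used * 100 / req ))
}
```

For example, a job that peaked at `4194304K` (4 GiB) with `--mem=8G` used about 50% of its request, so a smaller request may be enough for similar runs.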

### File Organization

- Keep related jobs in the same project directory
- Use descriptive job names
- Document your workflow in README files
- Clean up old jobs periodically

### Testing

- Test with small datasets first
- Use short time limits for testing
- Verify output before running large batches
- Use interactive sessions for debugging

### Reproducibility

- Document software versions
- Use container images for consistent environments
- Save SLURM scripts with your results
- Note date and job ID in your analysis notes
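
A sketch of recording this automatically from inside a job script. `write_provenance` is a hypothetical helper, and hashing a multi-gigabyte image can take a little while:

```bash
# Append a short provenance record for this run to a log file.
# $1 = log file, $2 = container image used by the job.
write_provenance() {
  local log="$1" image="$2"
  {
    echo "date:   $(date '+%Y-%m-%dT%H:%M:%S%z')"
    echo "job_id: ${SLURM_JOB_ID:-not-in-slurm}"
    echo "node:   $(uname -n)"
    if [ -f "$image" ]; then
      echo "image:  $image sha256=$(sha256sum "$image" | cut -d' ' -f1)"
    else
      echo "image:  $image (not found)"
    fi
  } >> "$log"
}
```

Calling `write_provenance provenance.log /data/apps/rstudio.sif` at the start of `script.sh` keeps a running log next to your results.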

## Getting Help

### Resources

- **Documentation**: Check README files in templates
- **Active Jobs**: Monitor job status and resource usage
- **Error Logs**: Always check `.err` files for failures

### Contact Support

If you encounter persistent issues:

1. Note the job ID
2. Save error messages
3. Document what you've tried
4. Contact your HPC admin

## Quick Reference

### Common SLURM Directives

```bash
#SBATCH --job-name=name          # Job name
#SBATCH --partition=normal       # Queue/partition
#SBATCH --time=HH:MM:SS          # Time limit
#SBATCH -n 1                     # Number of tasks
#SBATCH -c 4                     # CPUs per task
#SBATCH --mem=8G                 # Memory per node
#SBATCH --mem-per-cpu=2G         # Memory per CPU (use instead of --mem)
#SBATCH --output=file.out        # Output file
#SBATCH --error=file.err         # Error file
#SBATCH --mail-type=ALL          # Email notifications
#SBATCH --mail-user=email        # Email address
#SBATCH --gres=gpu:1             # GPU request
#SBATCH --array=1-10             # Job array
#SBATCH --dependency=afterok:123 # Job dependency
```

### Environment Variables in Jobs

```bash
$SLURM_JOB_ID                    # Job ID
$SLURM_JOB_NAME                  # Job name
$SLURM_SUBMIT_DIR                # Submission directory
$SLURM_JOB_NODELIST              # Assigned nodes
$SLURM_NTASKS                    # Number of tasks
$SLURM_CPUS_PER_TASK             # CPUs per task
$SLURM_ARRAY_TASK_ID             # Array index
```

### Useful Commands (in scripts)

```bash
cd $SLURM_SUBMIT_DIR             # Go to submission directory
echo "Job ID: $SLURM_JOB_ID"    # Print job info
date                              # Timestamp
hostname                          # Node name
```

## Example Workflow

### Complete Example: Running an R Analysis

1. **Create job from template**:
   - Jobs → Job Composer → New Job → From Template
   - Select "Basic R Serial Job"

2. **Upload your R script**:
   - Click "Open Dir"
   - Upload your `analysis.R` file

3. **Edit the job script**:

   ```bash
   #!/bin/bash
   #SBATCH --job-name=my_analysis
   #SBATCH --time=02:00:00
   #SBATCH -c 4
   #SBATCH --mem=16G

   cd $SLURM_SUBMIT_DIR
   srun apptainer exec /data/apps/rstudio.sif Rscript analysis.R
   ```

4. **Submit the job**:
   - Click "Submit"
   - Note the job ID

5. **Monitor progress**:
   - Jobs → Active Jobs
   - Check output file periodically

6. **Review results**:
   - Open `.out` file when job completes
   - Download output files if needed

---

**Last Updated**: 2025-12-03

**Questions?** Contact your SciIT team.