Setup run hpc by ghar1821 · Pull Request #119 · openproblems-bio/task_cyto_batch_integration

ghar1821 · 2026-01-16T04:25:27Z

Describe your changes

This PR introduces a new configuration to support running benchmarks on the WEHI HPC environment, along with some implementation changes.

Config to run on WEHI hpc

The HPC's scratch file system is prone to task collisions when multiple jobs access the same shared cache/temp folders. It does not keep each job isolated (or maybe it is a Nextflow problem; I am not too sure). To resolve this, I introduced env variables (on top of a few that are already mandatory) such as NUMBA_CACHE_DIR and APPTAINER_TMPDIR. These point to "parent" directories: each task creates a subdirectory within them, named after its task ID, and uses it. This prevents methods that write out temp files (e.g., CytoNorm, BatchAdjust) from overwriting each other's temp files.
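The per-task folder scheme described above can be sketched in Python. This is a minimal illustration, not the PR's actual code: the helper name `resolve_task_dir`, its `fallback` parameter, and how the task ID reaches the script are all assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_task_dir(env_var: str, task_id: str, fallback: Optional[str] = None) -> Path:
    """Hypothetical helper: return <parent>/<task_id>, creating it if needed.

    The parent directory is read from env_var (e.g. NUMBA_CACHE_DIR). If the
    variable is unset or empty, `fallback` is used, and failing that the
    system temp dir.
    """
    parent = os.environ.get(env_var) or fallback or tempfile.gettempdir()
    task_dir = Path(parent) / str(task_id)
    # exist_ok=True: calling this twice for the same task is harmless.
    task_dir.mkdir(parents=True, exist_ok=True)
    return task_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_parent:
        os.environ["NUMBA_CACHE_DIR"] = shared_parent
        a = resolve_task_dir("NUMBA_CACHE_DIR", "task_17")
        b = resolve_task_dir("NUMBA_CACHE_DIR", "task_18")
        # Same parent, different leaf: two tasks never share a cache folder.
        print(a.parent == b.parent, a != b)  # True True
```

Numba reads NUMBA_CACHE_DIR from the environment, so it is safest to point the variable at the per-task folder before numba is imported.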

This config is only used to run jobs on the WEHI HPC and has no impact on the config used to run jobs on the Open Problems AWS setup.

Apptainer config: apart from adding the mandatory cache, I have to use envWhitelist to ensure these env variables are properly passed into the container. Otherwise, the caches inside the containers default to my home directory or the node's /tmp folder, neither of which I am allowed to use.
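A quick way to confirm that the whitelisted variables actually arrived is a preflight check, for example run inside a container before a method starts. The sketch below is hypothetical: `check_cache_env` is not part of the repository, and `REQUIRED_VARS` only lists the two variables named in this PR, not the full whitelist.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Variables named in this PR. The real envWhitelist may forward more; this
# tuple is an illustrative assumption, not the full list.
REQUIRED_VARS = ("NUMBA_CACHE_DIR", "APPTAINER_TMPDIR")


def _is_under(path: Path, root: Path) -> bool:
    """True if path is root itself or lies somewhere beneath it."""
    return path == root or root in path.parents


def check_cache_env(
    env: Mapping[str, str],
    required: Iterable[str] = REQUIRED_VARS,
    home: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the variables look usable.

    Flags variables that are missing, or that point into the home directory
    or /tmp, which are the fallbacks this setup needs to avoid.
    """
    home_path = Path(home if home is not None else os.path.expanduser("~"))
    forbidden = (Path("/tmp"), home_path)
    problems = []
    for name in required:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif any(_is_under(Path(value), root) for root in forbidden):
            problems.append(f"{name}={value} points into home or /tmp")
    return problems


if __name__ == "__main__":
    # Run inside the container to confirm the whitelisted variables arrived.
    issues = check_cache_env(os.environ)
    print("OK" if not issues else "\n".join(issues))
```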

Before running the jobs, one needs to pull and build all the Apptainer images first. There is a separate script for this, with a README file, in the scripts/run_benchmark/wehi_hpc folder.
It is not possible to let the child jobs pull and build the images by setting ociAutoPull=True, because concurrent jobs running the same methods/metrics will overwrite the same image files simultaneously, leading to cache deadlocks. Alternatively, with ociAutoPull=False, the head job is overwhelmed pulling and building the images and simply dies, even with a 2-day pullTimeout.
Hence the script to pull and build the images is necessary.
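The core idea of the pre-pull step (each unique image pulled exactly once, sequentially, before any job starts) can be sketched as follows. This is not the script in scripts/run_benchmark/wehi_hpc; the function names, the image URIs, and the `.sif` naming scheme are assumptions, and the file names would need to match wherever the Nextflow config expects cached images.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def plan_pulls(image_uris: Iterable[str], cache_dir: str) -> List[List[str]]:
    """Build one `apptainer pull` command per unique image URI.

    Duplicate URIs (several components sharing one image) collapse to a
    single pull, so no two processes ever write the same .sif file.
    """
    unique = list(dict.fromkeys(image_uris))  # de-duplicate, keep first-seen order
    commands = []
    for uri in unique:
        # Assumed naming scheme: docker://ghcr.io/org/img:tag -> ghcr.io-org-img-tag.sif
        name = uri.split("://", 1)[-1].replace("/", "-").replace(":", "-") + ".sif"
        commands.append(["apptainer", "pull", str(Path(cache_dir) / name), uri])
    return commands


def run_pulls(commands: List[List[str]], dry_run: bool = True) -> None:
    """Execute the pulls one after another (never concurrently)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    images = [
        "docker://ghcr.io/example/method_a:1.0",  # hypothetical URIs
        "docker://ghcr.io/example/method_a:1.0",
        "docker://ghcr.io/example/metric_b:1.0",
    ]
    run_pulls(plan_pulls(images, "/path/to/apptainer_cache"))
```

Running the pulls serially trades speed for safety: it is slower than letting jobs pull on demand, but it removes both the concurrent-writer problem and the single overloaded head job.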

I also updated the default number of retry attempts to 3. This mitigates random "Bus Errors" that somehow resolve themselves on a second or third attempt.
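In Nextflow, retries are configured through `errorStrategy 'retry'` together with `maxRetries`. The Python sketch below only illustrates the retry semantics for transient failures; `run_with_retries` is hypothetical, and treating "3" as the total number of attempts (rather than 3 retries after the first) is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call task() until it succeeds or max_attempts calls have failed.

    The last exception is re-raised, so a persistent failure still surfaces.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay_s)  # optional back-off before the next attempt
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Stand-in for a transient failure such as a Bus Error on the first try.
        calls["n"] += 1
        if calls["n"] < 2:
            raise RuntimeError("transient failure")
        return "ok"

    print(run_with_retries(flaky), calls["n"])  # ok 2
```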

Implementation changes

Changes were made to the methods/metrics:

- One control and no control methods: these now only receive, respectively, the samples from one control plus the non-control samples, or only the non-control samples. They are no longer given access to other samples for correction or model training.
- CytoVI: Now uses a MinMax scaler fitted on Batch 1 post-correction for normalization. Excluded from the current run; more work is needed to work out the optimal way to run this method.
- Ratio Inconsistent Peaks: Added handling for the edge case where a method returns only zeros for a given marker/donor/cell type, preventing a division by zero when calculating the sd.
- HarmonyPy: Removed a redundant transpose operation; the latest harmonypy updates no longer require it.
- CytoNorm: Fixed a bug in the to-mid methods where recompute was incorrectly set to FALSE (now TRUE).
- Perfect Integration: Fixed a bug where string-based batch columns (vs. integers) resulted in only control samples being returned. Note: most datasets currently use int for batches, which violates our schema. See issue #121 ("Batch obs is not always str in datasets") for the long-term fix.
- BatchAdjust: Fixed a (dumb) requirement that non-control samples contain "Batch_" somewhere in the sample name.
- Updated the get_obs_var_for_integrated helper to handle type mismatches when overriding string-based batch columns with integer maps for perfect integration.
- Resource Tuning: adjusted time, mem, and cpu requirements:
  - Low: Control methods.
  - Mid: Most methods/metrics.
  - High/Very High: rPCA.
- Updated batchadjust and cytonorm to use the HPC temp dir if the environment variable is set, or else default to what is set by viash. See the previous section for why this is needed.
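The type mismatch behind the Perfect Integration bug and the get_obs_var_for_integrated change can be illustrated with a small pandas sketch. The helper `subset_batches` and the `batch`/`is_control` columns are hypothetical, not the repository's code; the idea is simply to compare batch labels as strings so that `1` and `"1"` select the same samples.

```python
import pandas as pd


def subset_batches(obs: pd.DataFrame, keep_batches, batch_col: str = "batch") -> pd.DataFrame:
    """Hypothetical helper: keep rows whose batch label is in keep_batches.

    Labels are compared as strings, so int-encoded (1) and str-encoded ("1")
    batch columns select the same rows.
    """
    wanted = {str(b) for b in keep_batches}
    mask = obs[batch_col].astype(str).isin(wanted)
    # Return a copy so later edits do not write back into the original frame.
    return obs.loc[mask].copy()


# Toy obs table with batch stored as int, and the same table with batch as str.
obs_int = pd.DataFrame({"batch": [1, 1, 2, 2], "is_control": [True, False, True, False]})
obs_str = obs_int.assign(batch=pd.Series(["1", "1", "2", "2"], dtype=object))

# A naive integer comparison silently matches nothing on the str column.
print((obs_str["batch"] == 1).sum())      # 0
print(len(subset_batches(obs_int, [1])))  # 2
print(len(subset_batches(obs_str, [1])))  # 2
```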

Checklist before requesting a review

I have performed a self-review of my code
Check the correct box. Does this PR contain:
- Breaking changes
- New functionality
- Major changes
- Minor changes
- Bug fixes
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!

ghar1821 · 2026-02-04T04:05:18Z

Update batchadjust, cytonorm to use HPC temp dir if the environment variable is set or else default to what is set by viash. See previous section why this is needed.

Is there a reason why only these two methods were adapted?

No, I have updated all methods that used temp folders (i.e., those that need FCS files). The HPC temp dir is no longer needed. See the changes I described above.

ghar1821 added 28 commits January 15, 2026 19:32

- add scripts to process raw dataset (42effdb)
- editing config to set apptainer cache dir (83ceda2)
- editing pre-run scripts and trying to fix R methods not running. (f3019a9)
- add h5py to setup (f664c48)
- reverting changes to setup (338aa23)
- separate submit scripts (f41f6f4)
- finally the first setting that works!!!! (66c348e)
- update config and settings for control methods (fa1ade7)
- adjusted resources for metrics and methods (5902468)
- update cytovi to use A30 gpu (7497ea4)
- add numba cache dir export to allow jit caching (cf3d35b)
- update cytovi implementation (35e3fdb)
- force recompute for all cytonorm (bad078d)
- add temp dir resolution for hpc (bdcbf46)
- remove transpose from harmonypy (7bada43)
- adding support for hpc (6793f3d)
- update temp dir again (aa4b07a)
- latest config file that works reasonably well with hpc (44def10)
- add some job submit scripts for SLURM (fc4df26)
- update tmp_path for cytonorm (bcc7ddb)
- redirect numba cache dir away from /tmp and to its own folder. (9dae0b6)
- update batch adjust non control samples naming (379b3dd)
- fix bug in perfect integration subsetting (d062157)
- fix bug where we can't replace the batch column if it is not integer (7627554)
- fix bug where the donor loc are somewhat mismatched.. (5ff088b)
- update ratio inconsistent peak where corrected data return only zero (2864b9c)
- Update script.py (ddc57cf)
- update scripts (2312cb2)

ghar1821 requested review from LuLeom and rcannood January 17, 2026 23:11

ghar1821 and others added 22 commits February 2, 2026 20:29

- remove average batch r2 global (4e807f1)
- add seed setting for cytovi (03cc959)
- remove env for viash temp files (ca9ac0a)
- update lisi to allow anndata write (fa20205)
- update cycombine (04a4280)
- more updates to cycombine (f337c99)
- minor change of script type (236fec8)
- update cytonorm (0706945)
- fixed gaussnorm (f75d01c)
- fixed limma (11cab3e)
- Fixed harmonypy and combat (b1592c4)
- Fixed rPCA (f4bff8d)
- update batchadjust and add copy to subset (facc520)
- remove cytovi and some obsolete metrics (f27dad7)
- renamed shuffle control methods (4eaeaac)
- missed label change (0e11692)
- reorganising scripts for hpc (1e2d508)
- update changelog (06a9069)
- update changelog again (f444132)
- update changelog (ebb18a0)
- update changelog (e3d8951)
- update description. (f2d073e)

ghar1821 added 7 commits February 4, 2026 15:27

- manually adding some dependencies for flowCore and flowStats (3e3af60)
- update ratio inconsistent peaks (10c2c60)
- update inconsistent peaks (999ce87)
- add print statements to subset functions (558cd5c)
- add print statements when writing files out (de15e28)
- add utility scripts for pulling intermediate files (6210259)
- update methods and metrics labels (9680aa3)

ghar1821 wants to merge 58 commits into main from setup_run_hpc