No, I have updated all the methods that used temp folders (the ones that need FCS files). The HPC temp dir is no longer needed. See the changes I described above.
Describe your changes
This PR introduces a new configuration setup to support running benchmarks on the WEHI HPC environment, along with some implementation changes.
Config to run on WEHI HPC
The HPC's scratch system is prone to task collisions when multiple jobs access the same shared cache/temp folders. It does not keep each job isolated (or this may be a Nextflow problem; I am not sure). To resolve this, I introduced env variables (on top of the few that are already mandatory) like NUMBA_CACHE_DIR and APPTAINER_TMPDIR. These are the "parent" directories. Each task creates a subdirectory inside the parent, named after its Task ID, and uses that. This prevents methods that write temp files (e.g., CytoNorm, BatchAdjust) from overwriting each other's temp files.
This config is only used to run the job on the WEHI HPC and has no impact on the config used to run the job on Open Problems AWS.
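A minimal Python sketch of the per-task isolation idea described above. The helper name `isolate_task_dirs` and the way the task ID is passed in are assumptions made for illustration; the actual config in this PR may derive the subdirectory differently.

```python
from pathlib import Path

# Variables named in the PR description; the helper itself is hypothetical.
PARENT_VARS = ("NUMBA_CACHE_DIR", "APPTAINER_TMPDIR")


def isolate_task_dirs(task_id: str, env: dict) -> dict:
    """Return a copy of env where each parent dir becomes <parent>/<task_id>."""
    isolated = dict(env)
    for name in PARENT_VARS:
        parent = env.get(name)
        if not parent:
            continue  # variable not set, leave it untouched
        task_dir = Path(parent) / task_id
        # Create the per-task directory so the task never writes into the shared parent.
        task_dir.mkdir(parents=True, exist_ok=True)
        isolated[name] = str(task_dir)
    return isolated
```

Two tasks given the same parent directories end up with different paths, so their temp and cache files cannot overwrite each other.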
Apptainer Config - apart from adding the mandatory cache, I have to use envWhitelist to ensure these env variables are properly passed into the container. Otherwise, the caches from the containers default to my home directory or the /tmp folder on the node, neither of which I am allowed to use.
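One way to confirm that the variables actually reach the container is a small check run inside it. The sketch below is an assumption for illustration, not part of this PR: `check_cache_env` and its `forbidden` parameter are hypothetical names.

```python
from pathlib import Path


def check_cache_env(names, env, forbidden=("/tmp",)):
    """Return a list of problems: unset variables or paths under a forbidden root."""
    problems = []
    for name in names:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        path = Path(value)
        for root in forbidden:
            root_path = Path(root)
            # Flag the root itself and anything nested below it.
            if path == root_path or root_path in path.parents:
                problems.append(f"{name} points into {root}: {value}")
    return problems
```

Inside a task, this could be called with `os.environ` and `forbidden=("/tmp", str(Path.home()))`; an empty list means every variable was passed through and points somewhere permitted.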
Before running the jobs, one needs to pull and build all the Apptainer images first. There is a separate script for this, with a README file, in the scripts/run_benchmark/wehi_hpc folder.
It is not possible to let the child jobs pull and build the images by setting ociAutoPull=True, because concurrent jobs running the same methods/metrics will write the same image files simultaneously, leading to cache deadlocks. Alternatively, with ociAutoPull=False, the head job is overwhelmed by pulling and building the images and dies, even with a 2-day pullTimeout.
Hence the script to pull and build the images is necessary.
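The idea of pulling images one at a time, before any job starts, can be sketched in Python as below. This is not the script shipped in this PR: the function names, the `.sif` file-naming rule, and the `runner` parameter are assumptions for illustration. Only the `apptainer pull <output file> <URI>` command form is taken from the Apptainer CLI.

```python
import subprocess
from pathlib import Path


def pull_commands(image_uris, cache_dir):
    """Build one `apptainer pull` command per image that is not cached yet."""
    commands = []
    for uri in image_uris:
        # Hypothetical naming rule: drop the scheme, flatten '/' and ':' to '-'.
        name = uri.split("://", 1)[-1].replace("/", "-").replace(":", "-") + ".sif"
        target = Path(cache_dir) / name
        if target.exists():
            continue  # already built, nothing to do
        commands.append(["apptainer", "pull", str(target), uri])
    return commands


def pull_all(image_uris, cache_dir, runner=subprocess.run):
    """Run the pulls strictly one after another so no two writers share a file."""
    for command in pull_commands(image_uris, cache_dir):
        runner(command, check=True)
```

Because the pulls run sequentially in a single process, each image file has exactly one writer, which is the property the concurrent child jobs could not guarantee.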
I also updated the default number of retry attempts to 3. This mitigates random "Bus Errors" that somehow resolve themselves after a second or third attempt.
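As a rough illustration of why a small retry budget is enough for transient failures like these, here is a generic retry loop in Python. It is a sketch only: `run_with_retries` is a hypothetical helper, and the actual retries in this PR are handled by the workflow configuration, not by code like this.

```python
def run_with_retries(task, max_retries=3):
    """Call task(); on failure, retry up to max_retries more times, then re-raise."""
    retries_used = 0
    while True:
        try:
            return task()
        except Exception:
            if retries_used >= max_retries:
                raise  # budget exhausted, surface the last error
            retries_used += 1
```

A task that fails on its first two attempts and succeeds on the third completes normally under this budget, while a task that always fails still surfaces its error after the final retry.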
Implementation changes
Changes were made to methods/metrics:
They will no longer be given access to other samples to correct or to train the model.
get_obs_var_for_integrated helper to handle type mismatches when overriding string-based batch columns with integer maps for perfect integration.
default to what is set by viash. See the previous section for why this is needed.
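The kind of type mismatch mentioned for get_obs_var_for_integrated can be illustrated with a short pandas sketch. This is not the helper's actual code: `override_batch`, its parameters, and the example data are hypothetical, and the approach shown (normalising both the column and the map to strings before mapping) is only one plausible way to handle it.

```python
import pandas as pd


def override_batch(obs: pd.DataFrame, batch_map: dict, column: str = "batch") -> pd.DataFrame:
    """Apply batch_map to a string batch column, even if the map uses integers."""
    # Normalise map keys and values to strings so they match the column's labels.
    str_map = {str(key): str(value) for key, value in batch_map.items()}
    out = obs.copy()
    as_str = out[column].astype(str)
    # Labels without an entry in the map keep their original value.
    out[column] = as_str.map(str_map).fillna(as_str)
    return out


obs = pd.DataFrame({"batch": ["1", "2", "2", "3"]})
fixed = override_batch(obs, {2: 1})
```

Without the string normalisation, mapping `{2: 1}` over the labels `"1"`, `"2"`, `"3"` would match nothing, because the integer key `2` is not equal to the string `"2"`.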
Checklist before requesting a review
I have performed a self-review of my code
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!