Update parse_cohort_files.py by D-Pankey · Pull Request #76 · mskcc/beagle_cli

D-Pankey · 2026-02-07T04:56:35Z

parse_cohort_file
Parses one or more cohort files and writes a combined list of sample IDs to an output file. It no longer converts the sample IDs to PrimaryIds.

Single file
python3 parse_cohort_files.py parse cohort1.txt parsed_samples.txt

Multiple files
python3 parse_cohort_files.py parse cohort1.txt cohort2.txt parsed_samples.txt

Directory wildcard
python3 parse_cohort_files.py parse /home/cohortfiles/*.txt parsed_samples.txt
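For reviewers who want a mental model of the parse step, here is a minimal sketch; it is not the code in this PR. The cohort file layout is an assumption: lines starting with "#" are treated as metadata, and the sample ID is taken to be the first tab-separated field of every other line. The function name and signature are hypothetical.

```python
def parse_cohort_files(cohort_paths, output_file):
    """Collect sample IDs from several cohort files into one output file."""
    all_sample_ids = []
    for path in cohort_paths:
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                # Assumption: blank lines and '#' lines carry no sample ID.
                if not line or line.startswith("#"):
                    continue
                # Assumption: the sample ID is the first tab-separated field.
                all_sample_ids.append(line.split("\t")[0])

    # Write one sample ID per line, in the order the files were given.
    with open(output_file, "w") as out:
        for sample_id in all_sample_ids:
            out.write(sample_id + "\n")
    return all_sample_ids
```

With an unquoted wildcard such as /home/cohortfiles/*.txt, the shell expands the pattern before the script starts, so the script receives an ordinary list of paths followed by the output file. The sketch keeps duplicates and input order; whether the real command de-duplicates is not stated here.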

list_directories  
Lists all subdirectories within a given parent directory and writes them to an output file.

python3 parse_cohort_files.py list_dir /home/BAM/ bam_directories.txt
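A rough sketch of what the directory listing could look like, assuming only the immediate children of the parent directory are wanted (the description does not say whether the listing is recursive). The function name and return value are hypothetical, not taken from the PR.

```python
import os


def list_directories(parent_dir, output_file):
    """Write the names of the immediate subdirectories of parent_dir."""
    # os.scandir yields DirEntry objects; is_dir() filters out plain files.
    with os.scandir(parent_dir) as entries:
        subdirs = sorted(entry.name for entry in entries if entry.is_dir())

    with open(output_file, "w") as out:
        for name in subdirs:
            out.write(name + "\n")
    return subdirs
```

Sorting is a choice made here for reproducible output; os.scandir itself returns entries in arbitrary order.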

compare_files
Compares two files and writes the differences between them to an output file.

python3 parse_cohort_files.py compare <file1.txt> <file2.txt> <report_file>
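One way to picture the comparison is as a set difference taken in both directions, which matches the set arithmetic visible in the reviewed diff. The sketch below assumes one entry per line in each input; the function name and the report layout are hypothetical.

```python
def compare_files(file1, file2, report_file):
    """Report entries that appear in only one of the two files."""

    def read_entries(path):
        # One entry per line; blank lines are ignored.
        with open(path) as handle:
            return {line.strip() for line in handle if line.strip()}

    first, second = read_entries(file1), read_entries(file2)
    only_in_first = sorted(first - second)
    only_in_second = sorted(second - first)

    with open(report_file, "w") as out:
        out.write(f"Only in {file1} ({len(only_in_first)}):\n")
        out.writelines(entry + "\n" for entry in only_in_first)
        out.write(f"Only in {file2} ({len(only_in_second)}):\n")
        out.writelines(entry + "\n" for entry in only_in_second)
    return only_in_first, only_in_second
```

Because both differences are reported, the order in which the two files are passed only changes the order of the report sections, not its content.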


scripts/parse_cohort_files.py

sivkovic · 2026-02-12T14:59:01Z

scripts/parse_cohort_files.py

+        all_sample_ids.extend(samples)
+
+    # Write all parsed sample names to output file
    with open(output_file, "w") as f:


We do not need to write the parsed sample names to the output file, only the primaryIds.

sivkovic · 2026-02-12T15:01:05Z

scripts/parse_cohort_files.py

+    directories_set = set(all_directories)
+
+    unique_to_samples = samples_set - directories_set
+    #print(f"Unique to samples: {unique_to_samples}")


Unnecessary comment

D-Pankey added 9 commits February 3, 2026 17:49

updated parse command to take multiple files

a474c07

modify function name

19c96f3

accept multiple files

a48c6b5

Change primary IDs to sample IDs in cohort parsing

e2d4b0e

Add list_directories function to generate directory list

f267b76

Added compare function to compare outputs

96b9c54

Removed restriction of which file to compare first

5c5ab24

update compare command

507c81f

updated HELP

0ae0f98

D-Pankey assigned sivkovic Feb 7, 2026

sivkovic requested changes Feb 9, 2026


D-Pankey added 5 commits February 9, 2026 16:30

merged functions

647ba3e

make subdirectory list optional

7b02362

reverting changes

f4b0eb5

cmoSampleName conversion

157486d

remove printing unique_to_samples in console

6818c8a

sivkovic requested changes Feb 12, 2026


D-Pankey wants to merge 14 commits into develop from update_parse


2 participants
