Update parse_cohort_files.py #76

Open
D-Pankey wants to merge 14 commits into develop from update_parse

D-Pankey commented Feb 7, 2026

parse_cohort_file
Parses one or more cohort files and writes a combined list of sample IDs to an output file.
No longer converts the sample IDs to PrimaryIds.

Single file
python3 parse_cohort_files.py parse cohort1.txt parsed_samples.txt

Multiple files
python3 parse_cohort_files.py parse cohort1.txt cohort2.txt parsed_samples.txt

Directory wildcard
python3 parse_cohort_files.py parse /home/cohortfiles/*.txt parsed_samples.txt
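
For reviewers, a minimal sketch of the combining step described above. It is not the code in this PR. It assumes each cohort file holds one sample ID per line and that lines starting with `#` are comments; neither is stated in this description. The helper names `read_sample_ids` and `parse_cohorts` are hypothetical. With the wildcard form, the shell expands `*.txt` before Python starts, so the script only sees a list of paths in which the last one is the output file.

```python
from pathlib import Path


def read_sample_ids(cohort_path):
    """Return the sample IDs in one cohort file (assumed: one ID per line)."""
    samples = []
    for line in Path(cohort_path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines (an assumed file convention).
        if line and not line.startswith("#"):
            samples.append(line)
    return samples


def parse_cohorts(paths):
    """Treat the last path as the output file and the rest as cohort files."""
    *cohort_files, output_file = paths
    all_sample_ids = []
    for cohort_file in cohort_files:
        all_sample_ids.extend(read_sample_ids(cohort_file))
    # Write the combined sample IDs, one per line.
    with open(output_file, "w") as f:
        f.write("\n".join(all_sample_ids) + "\n")
    return all_sample_ids
```

Calling `parse_cohorts(["cohort1.txt", "cohort2.txt", "parsed_samples.txt"])` mirrors the "Multiple files" command above.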

list_directories


Lists all subdirectories within a given parent directory and writes them to an output file.

python3 parse_cohort_files.py list_dir /home/BAM/ bam_directories.txt
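
A minimal sketch of what `list_dir` could look like, under two assumptions not stated above: only the first level below the parent directory is listed, and only directory names (not full paths) are written. The function name `list_subdirectories` is hypothetical.

```python
from pathlib import Path


def list_subdirectories(parent_dir, output_file):
    """Write the names of parent_dir's immediate subdirectories, one per line."""
    # Assumption: regular files are ignored and nested levels are not walked.
    names = sorted(p.name for p in Path(parent_dir).iterdir() if p.is_dir())
    with open(output_file, "w") as f:
        for name in names:
            f.write(name + "\n")
    return names
```

Sorting the names keeps the output stable between runs, which makes the file easier to diff against a sample list.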

compare_files
Compares two files and writes the differences to a report file.

python3 parse_cohort_files.py compare <file1.txt> <file2.txt> <report_file>
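
A sketch of the comparison using set difference. It assumes both inputs hold one entry per line and that order and duplicates do not matter. The report layout and the names `read_entries` and `compare_files` are hypothetical, not taken from the PR.

```python
def read_entries(path):
    """Return the set of non-blank, stripped lines in a file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def compare_files(file1, file2, report_file):
    """Write entries unique to each file to report_file and return both lists."""
    first, second = read_entries(file1), read_entries(file2)
    only_first = sorted(first - second)    # in file1 but not in file2
    only_second = sorted(second - first)   # in file2 but not in file1
    with open(report_file, "w") as f:
        f.write(f"Only in {file1} ({len(only_first)}):\n")
        f.writelines(entry + "\n" for entry in only_first)
        f.write(f"Only in {file2} ({len(only_second)}):\n")
        f.writelines(entry + "\n" for entry in only_second)
    return only_first, only_second
```

Reporting both directions shows which samples lack a directory and which directories have no matching sample.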

all_sample_ids.extend(samples)

# Write all parsed sample names to output file
with open(output_file, "w") as f:
Collaborator

We do not need to write the parsed sample names to the output file, only the primaryIds.

directories_set = set(all_directories)

unique_to_samples = samples_set - directories_set
#print(f"Unique to samples: {unique_to_samples}")
Collaborator

Unnecessary comment
