Information content values not being calculated correctly for conflated cliques

Looking at the source code for normalization, it looks to me like the info_content value is only being loaded for the conflation clique leader, not for the individual identifiers within a clique:

https://github.com/NCATSTranslator/NodeNormalization/blob/b4f558ecdd7e07b5754ab397349a4b938bdb3742/node_normalizer/normalizer.py#L562

We currently set things up so that conflated IDs with smaller IC values are sorted to the front. The only way you could (currently) get a lower IC value later in the sequence is if there's a CURIE that has an IC and whose prefix comes later in the prefix order than the other values. At the moment, though, CHEBI identifiers for chemicals and NCBIGene identifiers for genes are the only conflated prefixes we have with ICs, so I will need to come up with a test case to test this. I started investigating this in PR #366.
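To make the suspected behavior concrete, here is a minimal sketch of what a test case might check. This is not the NodeNormalization code: the lookup table, the `EX:` CURIEs, the IC values, and all three function names are hypothetical and only illustrate the difference between loading an IC for the leader alone and loading one per identifier.

```python
from typing import Dict, List, Optional

# Hypothetical IC lookup; these CURIEs and values are invented for illustration.
IC_BY_CURIE: Dict[str, float] = {
    "EX:leader": 90.0,
    "EX:member": 60.0,
}


def ic_leader_only(conflated: List[str]) -> List[Optional[float]]:
    """Model the suspected behavior: only the first (leader) ID gets an IC."""
    return [
        IC_BY_CURIE.get(curie) if index == 0 else None
        for index, curie in enumerate(conflated)
    ]


def ic_per_member(conflated: List[str]) -> List[Optional[float]]:
    """Model the alternative: look up an IC for every ID in the clique."""
    return [IC_BY_CURIE.get(curie) for curie in conflated]


def first_out_of_order(ics: List[Optional[float]]) -> Optional[int]:
    """Return the index of the first IC lower than an earlier one, else None.

    IDs without an IC (None) are skipped, since they cannot be compared.
    """
    highest_seen: Optional[float] = None
    for index, ic in enumerate(ics):
        if ic is None:
            continue
        if highest_seen is not None and ic < highest_seen:
            return index
        highest_seen = ic
    return None


conflated_ids = ["EX:leader", "EX:member", "EX:unknown"]
print(ic_leader_only(conflated_ids))
print(ic_per_member(conflated_ids))
print(first_out_of_order(ic_per_member(conflated_ids)))
```

With leader-only loading, a lower IC later in the sequence can never be detected, because the later IDs carry no IC at all. A real test would need an actual conflated clique in which a later-prefix identifier has an IC.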

Information content values not being calculated correctly for conflated cliques #368
