Usage example

Initial setup

Import the library:

import EdgePrediction

Create an instance:

ep = EdgePrediction.EdgePrediction()

Input data format

The input data is a directed graph and should be provided in CSV format. The files must contain the following columns, in any order, with one row per edge:

  • Source node type
  • Source node name
  • Edge type
  • Target node name
  • Target node type

Any additional columns are ignored. Multiple files can be loaded sequentially into a single instance of the EdgePrediction library, in which case all data is combined. Example input data is available from an application of EdgePrediction to Adverse Drug Reactions (https://github.com/KHP-Informatics/ADR-graph).
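As a rough illustration of the layout described above, the following sketch writes a small edge list with the five required columns. The header names, node type labels, and node names are all made up for this example and are not taken from the ADR-graph data; only the edge type names appear elsewhere in this document.

```python
import csv
import io

# Hypothetical header names. The loading call shown in this document
# identifies columns by index, so the names themselves are only labels.
FIELDS = ["source_type", "source_name", "edge_type", "target_name", "target_type"]

# Made-up rows for illustration only: one row per directed edge.
ROWS = [
    ("Drug", "drug_a", "HAS_SIDE_EFFECT", "adr_x", "ADR"),
    ("Drug", "drug_a", "DRUG_TARGETS", "protein_p", "Protein"),
    ("Drug", "drug_b", "INDICATED_FOR", "disease_d", "Disease"),
]


def write_edges(handle, rows):
    """Write a header row followed by one CSV row per edge."""
    writer = csv.writer(handle)
    writer.writerow(FIELDS)
    writer.writerows(rows)


# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_edges(buffer, ROWS)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file on disk, the same function can be passed a handle opened with `open(path, "w", newline="")`.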

Currently the algorithm assumes that edges of a given type will be made for one type of node, and that the same type of node is the source node type of all edge types in the graph. All edge types in the input data will be used as features in the predictive model.
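The source node type assumption above can be checked before loading. The sketch below is not part of the EdgePrediction library; the function names are hypothetical, and it assumes each edge is available as a five-item tuple in the order source type, source name, edge type, target name, target type.

```python
from collections import defaultdict


def source_types_by_edge_type(edges):
    """Map each edge type to the set of source node types seen with it."""
    seen = defaultdict(set)
    for source_type, _source_name, edge_type, _target_name, _target_type in edges:
        seen[edge_type].add(source_type)
    return dict(seen)


def shares_single_source_type(edges):
    """Return True if every edge, of every type, starts at the same node type."""
    all_source_types = set()
    for source_types in source_types_by_edge_type(edges).values():
        all_source_types |= source_types
    return len(all_source_types) <= 1
```

If `shares_single_source_type` returns False, `source_types_by_edge_type` shows which edge types start from an unexpected node type.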

Load data

Load data from Bean et al. (https://github.com/KHP-Informatics/ADR-graph):

ep.CSV_to_graph(fname = 'data/data.csv', header = True, srcNameCol = 0, srcTypeCol = 1, tgtNameCol = 4, tgtTypeCol = 3, edgeTypeCol = 2)
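Because the columns may appear in any order, the index arguments can be derived from the header row instead of being counted by hand. This is a minimal sketch under the assumption that the file has a header with the hypothetical column names used below; only the keyword names (`srcNameCol` and so on) come from the call above.

```python
# Hypothetical header names mapped to the keyword arguments used in the
# CSV_to_graph call shown in this document.
HEADER_TO_KEYWORD = {
    "source_name": "srcNameCol",
    "source_type": "srcTypeCol",
    "edge_type": "edgeTypeCol",
    "target_name": "tgtNameCol",
    "target_type": "tgtTypeCol",
}


def column_arguments(header_row):
    """Return a dict of keyword -> column index for the required columns.

    Raises KeyError if a required column name is missing from the header.
    Extra columns in the header are ignored.
    """
    positions = {name: index for index, name in enumerate(header_row)}
    return {
        keyword: positions[name]
        for name, keyword in HEADER_TO_KEYWORD.items()
    }
```

The resulting dictionary could then be unpacked into the loading call, for example `ep.CSV_to_graph(fname='data/data.csv', header=True, **column_arguments(header_row))`, where `header_row` is the first row of the file read with the standard `csv` module.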

Prepare to run prediction algorithm

Once all input files are loaded, pre-process the data ready for prediction:

ep.preprocess()

Specify the type of edge in the graph that should be predicted. In this case, using the data from Bean et al., we want to predict edges from drugs to ADRs which have type HAS_SIDE_EFFECT:

ep.to_predict = 'HAS_SIDE_EFFECT'

Finally, specify the order in which the features (edge types) are iterated over. This final preparation step is optional but encouraged, as in some cases the order is used to break ties. For our example data we may specify:

ep.network_order = ['HAS_SIDE_EFFECT', 'DRUG_TARGETS', 'INDICATED_FOR']

Run prediction algorithm

The algorithm predicts edges of type ep.to_predict from all source nodes to a specified target node. For example, we will predict HAS_SIDE_EFFECT edges from all drug nodes to neuroleptic malignant syndrome (C0027849):

result = ep.predict(target="C0027849", calculate_auc=True)

"result" is a Python dictionary containing the parameters of the trained model, its predictions, and various standard metrics such as precision and recall.
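The exact keys of this dictionary are not listed here, so a generic way to get an overview is to walk over whatever keys are present. The helper below is a hypothetical sketch, not part of the library: it prints scalar entries in full and only the size of container entries such as prediction lists.

```python
def summarize(result):
    """Return one descriptive line per key of a result dictionary."""
    lines = []
    # Sort by the string form of the key so the output order is stable.
    for key in sorted(result, key=str):
        value = result[key]
        if isinstance(value, (list, tuple, set, dict)):
            # Containers can be large, so report only their type and size.
            lines.append(f"{key}: {type(value).__name__} of length {len(value)}")
        else:
            lines.append(f"{key}: {value!r}")
    return lines
```

Calling `print("\n".join(summarize(result)))` after a prediction gives a compact listing of what the dictionary contains.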

Alternatively, the EdgePrediction library can run the algorithm for all target nodes in the graph that have edges of type ep.to_predict (“HAS_SIDE_EFFECT” in our example):

results = ep.predictAll(calculate_auc=True)