ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.
Variants are classified (usually manually) by clinical laboratories on a categorical spectrum ranging from benign through likely benign, uncertain significance and likely pathogenic to pathogenic. Variants that have conflicting classifications (from laboratory to laboratory) can cause confusion when clinicians or researchers try to interpret whether the variant has an impact on the disease of a given patient.
The objective is to predict whether a ClinVar variant will have conflicting classifications. This is presented here as a binary classification problem, where each record in the dataset is a genetic variant.
A variant has conflicting classifications when at least two of the following three categories are present for it:
Likely Benign or Benign
Variant of uncertain significance
Likely Pathogenic or Pathogenic
I have recorded the conflict status in the CLASS column. It is a binary representation of whether or not a variant has conflicting classifications, where 0 represents consistent classifications and 1 represents conflicting classifications.
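As a rough illustration of this labelling rule, the sketch below maps each submitted classification to one of the three categories and returns 1 when at least two categories appear for a variant. The function name `conflict_class`, the `CATEGORY` mapping and the label spellings are assumptions made for this example; they are not taken from the notebook.

```python
# Hypothetical mapping from a submitted classification to one of the
# three categories listed above (the label spellings are an assumption).
CATEGORY = {
    "benign": "benign",
    "likely benign": "benign",
    "uncertain significance": "uncertain",
    "likely pathogenic": "pathogenic",
    "pathogenic": "pathogenic",
}


def conflict_class(classifications):
    """Return 1 if the submissions span two or more categories, else 0."""
    categories = {CATEGORY[c.strip().lower()] for c in classifications}
    return 1 if len(categories) >= 2 else 0


# Both submissions fall in the "benign" category: consistent.
print(conflict_class(["Benign", "Likely benign"]))  # prints 0
# Submissions fall in two different categories: conflicting.
print(conflict_class(["Likely benign", "Uncertain significance"]))  # prints 1
```

Note that under this rule "Benign" and "Likely benign" together do not count as a conflict, because both belong to the same category.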
I have removed from the original ClinVar data all the variants that have only a single classification, since this problem only relates to variants with multiple classifications.
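A filtering step of this kind might look like the following sketch. It assumes each variant is held as a dictionary with a hypothetical `classifications` list, one entry per submitting laboratory; the notebook may well store the data differently, and the records below are made up for illustration.

```python
def keep_multi_classified(variants):
    """Keep only variants that have more than one submitted classification."""
    return [v for v in variants if len(v["classifications"]) > 1]


# Made-up example records, for illustration only.
variants = [
    {"id": "var_a", "classifications": ["Benign"]},
    {"id": "var_b", "classifications": ["Benign", "Likely benign"]},
    {"id": "var_c", "classifications": ["Pathogenic", "Uncertain significance"]},
]

kept = keep_multi_classified(variants)
print([v["id"] for v in kept])  # prints ['var_b', 'var_c']
```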
I have added the code on GitHub: https://github.com/Mchockalingam/OpenDL/blob/master/Classification.ipynb
Result: the Random Classifier produced the best result in the analysis.
In my next post I will analyse the same data using Keras and publish the results.