SPOKE-related paper summaries

BioMedical Evidence Graph (BMEG) for Cancer


Exploring Integrative Analysis Using the BioMedical Evidence Graph. Struck A, Walsh B, Buchanan A, Lee JA, Spangler R, Stuart JM, Ellrott K. JCO Clin Cancer Inform. 2020 Feb;4:147-159. PMID: 32097025

  • cancer biology graph database and query engine
  • “unique from other biologic data graphs in that sample-level molecular and clinical information is connected to reference knowledge bases”
  • data on 15,000 patients, 52,000 samples, 6.8 million alleles, 640,000 drug-response experiments, and 50,000 literature-derived genotype-to-phenotype associations (> 41 million vertices and 57 million edges)
  • query-based API client code available for Python, JavaScript, and R
  • online server at bmeg.io; source code available on GitHub

BMEG Data Sources: see the full list at the BMEG website.

Figure: How example queries traverse the graph
(Project TCGA-BRCA, specific queries in paper)

  1. Count Mutations per Gene in Breast Cancer
  2. Identify Pathways Containing Mutated Genes
  3. Determine the Number of Mutations in Each Pathway
  4. Find Publications Relevant to Phenotypic Consequences of Mutations
  5. Find Drugs Described in the Literature to Treat Phenotypes Linked to Mutations
  6. Find Drugs Tested in Breast Cancer Cell Lines
  7. Find the Sensitivity of Breast Cancer Cell Lines to a Drug
  8. Find Gene Expression Data Linked to Cell Lines
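
To make the first three queries concrete, here is a minimal sketch in plain Python. It does not use BMEG data or the BMEG client; the sample IDs, the mutation calls, the pathway names, and the function names are all made up for illustration. It only shows the shape of the computation: count mutations per gene, collect the pathways that contain a mutated gene, then roll the counts up per pathway.

```python
from collections import Counter

# Hypothetical toy data (not from BMEG): each tuple is one somatic
# mutation call, given as (sample_id, gene_symbol).
mutations = [
    ("S1", "TP53"),
    ("S1", "PIK3CA"),
    ("S2", "TP53"),
    ("S3", "PIK3CA"),
    ("S3", "GATA3"),
    ("S4", "TP53"),
]

# Hypothetical gene-to-pathway membership, also made up for illustration.
pathways_of = {
    "TP53": ["apoptosis", "cell_cycle"],
    "PIK3CA": ["pi3k_signaling"],
    "GATA3": [],
}


def mutations_per_gene(calls):
    """Query 1: count mutation calls per gene."""
    return Counter(gene for _sample, gene in calls)


def pathways_with_mutated_genes(gene_counts, membership):
    """Query 2: pathways that contain at least one mutated gene."""
    return {p for gene in gene_counts for p in membership.get(gene, [])}


def mutations_per_pathway(gene_counts, membership):
    """Query 3: sum the per-gene counts over each pathway's member genes."""
    totals = Counter()
    for gene, n in gene_counts.items():
        for pathway in membership.get(gene, []):
            totals[pathway] += n
    return totals


gene_counts = mutations_per_gene(mutations)
mutated_pathways = pathways_with_mutated_genes(gene_counts, pathways_of)
pathway_counts = mutations_per_pathway(gene_counts, pathways_of)
```

In the real graph each of these steps is an edge traversal (project to samples to alleles to genes to pathways) rather than a dictionary lookup, which is why the paper presents them as a chain of queries.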

“To enable various analytical queries... we developed the Graph Integration Platform (GRIP)... [it] stores multiple forms of data and has the ability to hold thousands of data elements per vertex and per edge of the graph. This allows it to store sparse relationship data... as well as dense matrix-formatted data... The GRIP Query Language is a traversal-based graph-selection language inspired by Gremlin.”
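
To get a feel for what a "traversal-based" query looks like, here is a toy in-memory sketch in Python. It is not GRIP and not the GRIP client API: the classes `Graph` and `Traversal`, the method names, the vertex labels, the `tested_with` edge label, and all data are assumptions invented for this note. It shows the two ideas from the quote: vertices can carry dense data (an expression vector stored on the cell line), edges carry sparse relationships, and a query is a chain of steps that starts at all vertices and narrows down.

```python
class Graph:
    """Toy property graph: vertices hold arbitrary data, edges are labeled."""

    def __init__(self):
        self.vertices = {}  # vertex id -> {"label": str, "data": dict}
        self.edges = []     # list of (source id, edge label, destination id)

    def add_vertex(self, vid, label, **data):
        self.vertices[vid] = {"label": label, "data": data}

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def V(self):
        """Start a traversal at every vertex in the graph."""
        return Traversal(self, list(self.vertices))


class Traversal:
    """Each step returns a new Traversal, so steps can be chained."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = current  # vertex ids the traversal currently sits on

    def has_label(self, label):
        keep = [v for v in self.current
                if self.graph.vertices[v]["label"] == label]
        return Traversal(self.graph, keep)

    def has(self, key, value):
        keep = [v for v in self.current
                if self.graph.vertices[v]["data"].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, edge_label):
        """Follow outgoing edges with the given label."""
        nxt = [dst for v in self.current
               for (src, lbl, dst) in self.graph.edges
               if src == v and lbl == edge_label]
        return Traversal(self.graph, nxt)

    def ids(self):
        return list(self.current)


g = Graph()
# Dense data: a (made-up) expression vector stored directly on the vertex.
g.add_vertex("CL1", "CellLine", disease="breast",
             expression={"TP53": 5.1, "ESR1": 9.3})
g.add_vertex("CL2", "CellLine", disease="breast", expression={})
g.add_vertex("CL3", "CellLine", disease="lung", expression={})
for drug_id in ("D1", "D2", "D3"):
    g.add_vertex(drug_id, "Drug")
# Sparse data: which cell line was tested with which drug.
g.add_edge("CL1", "tested_with", "D1")
g.add_edge("CL1", "tested_with", "D2")
g.add_edge("CL2", "tested_with", "D1")
g.add_edge("CL3", "tested_with", "D3")

# Roughly query 6: drugs tested in breast cancer cell lines.
hits = g.V().has_label("CellLine").has("disease", "breast").out("tested_with").ids()
drugs_tested_in_breast = sorted(set(hits))
```

As in Gremlin-style traversals, a vertex reached along two paths shows up twice in `hits`, so the result is de-duplicated at the end.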

Thoughts

  • impressive amount of data in the graph
  • significant learning curve for query language; not accessible to most researchers
  • is it still being updated, or updatable at all? unclear; possibly not
  • are any of the data sources or ontologies of use to us?