WikiPathways SPARQL queries
WikiPathways content is replicated in a SPARQL endpoint. Queries can be performed in three ways:
- Go to the endpoint directly and write your own SPARQL query.
- Copy an example query listed below and paste it into the endpoint.
- Adapt a code example to make a SPARQL query programmatically.
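For the programmatic route, a minimal Python sketch using only the standard library could look like the following. The endpoint URL and the function names build_request and run_query are assumptions made for illustration; they are not given in the text above, so substitute the address of the endpoint you actually use.

```python
import json
import urllib.parse
import urllib.request

# Assumption: this endpoint URL is not stated in the text above; replace it
# with the address of the SPARQL endpoint you actually use.
ENDPOINT = "https://sparql.wikipathways.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build an HTTP GET request that follows the SPARQL protocol."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_query with any of the example queries below should return a dictionary in the SPARQL JSON results format, provided the endpoint adds the prefixes itself.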
This project is written up in the “Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources” paper.
Due to an Apache update, we are now creating RDF data according to SPARQL 1.1.
However, our SPARQL endpoint running on Virtuoso is still using SPARQL 1.0.
This affects how string literals are queried and might affect federated queries.
Please remove the ^^xsd:string suffix from string literals in your queries, as shown in the example below.
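If you have many existing queries, the suffix can also be removed mechanically. The sketch below is one possible way to do that in Python; the helper name strip_xsd_string is hypothetical, and the pattern only covers the two common spellings of the datatype (the xsd: prefix and the full XML Schema IRI).

```python
import re

# Matches a string datatype suffix on a literal, written either with the
# xsd: prefix or with the full XML Schema IRI.
_XSD_STRING = re.compile(
    r"\^\^(?:xsd:string\b|<http://www\.w3\.org/2001/XMLSchema#string>)"
)


def strip_xsd_string(query):
    """Return the query with every ^^xsd:string datatype suffix removed."""
    return _XSD_STRING.sub("", query)


old = '?pathway dcterms:identifier "WP1560"^^xsd:string .'
print(strip_xsd_string(old))  # prints: ?pathway dcterms:identifier "WP1560" .
```

Other datatypes, such as ^^xsd:integer, are left untouched.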
Example queries
In the example queries we have omitted the prefixes, because the SPARQL endpoint applies them automatically. The following prefixes are used in the WikiPathways RDF:
PREFIX gpml: <>
PREFIX wp: <>
PREFIX cur: <>
PREFIX wprdf: <>
PREFIX biopax: <>
PREFIX cas: <>
PREFIX dc: <>
PREFIX dcterms: <>
PREFIX foaf: <>
PREFIX ncbigene: <>
PREFIX pubmed: <>
PREFIX rdf: <>
PREFIX rdfs: <>
PREFIX skos: <>
PREFIX xsd: <>
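When a query is sent from a client that does not add prefixes automatically, the declarations have to be prepended by hand. The sketch below shows one way to do this in Python. It assumes that the generic prefixes (rdf, rdfs, xsd, skos, foaf, dc, dcterms) map to their conventional public namespaces; the WikiPathways-specific namespaces are not spelled out in the list above, so the caller has to supply them. The helper name with_prefixes is hypothetical.

```python
import re

# Conventional public namespaces (assumed to be the ones WikiPathways uses for
# these prefixes). WikiPathways-specific prefixes such as wp:, gpml: and cur:
# are deliberately left out; pass them in yourself.
STANDARD_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def with_prefixes(query, prefixes=STANDARD_PREFIXES):
    """Prepend a PREFIX line for each known prefix the query uses but does not declare."""
    lines = []
    for name, iri in prefixes.items():
        used = re.search(r"(?<![\w:])" + re.escape(name) + r":", query)
        declared = re.search(r"(?i)\bPREFIX\s+" + re.escape(name) + r":", query)
        if used and not declared:
            lines.append("PREFIX %s: <%s>" % (name, iri))
    return "\n".join(lines + [query])


print(with_prefixes("SELECT ?t WHERE { ?p dc:title ?t }"))
```

Only prefixes that actually occur in the query are added, which keeps the generated query short.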
Metadata queries
List the information about the data sets in the SPARQL endpoint:
select distinct ?dataset (str(?titleLit) as ?title) ?date ?license where {
  ?dataset a void:Dataset ;
    dcterms:title ?titleLit ;
    dcterms:license ?license ;
    pav:createdOn ?date .
}
Pathway-oriented queries
Get the species currently in WikiPathways with their respective URIs
SELECT DISTINCT ?organism (str(?label) as ?name)
WHERE {
  ?concept wp:organism ?organism ;
    wp:organismName ?label .
}
List pathways and their species
SELECT DISTINCT (str(?title) as ?pathway) (str(?label) as ?organism)
WHERE {
  ?pw dc:title ?title ;
    wp:organism ?organism ;
    wp:organismName ?label .
}
List the species captured in WikiPathways and the number of pathways per species
SELECT DISTINCT ?organism (str(?label) as ?name) (count(?pw) as ?pathwayCount)
WHERE {
  ?pw dc:title ?title ;
    wp:organism ?organism ;
    wp:organismName ?label .
}
GROUP BY ?organism ?label
ORDER BY DESC(?pathwayCount)
List all pathways for species "Mus musculus"
The following query lists all mouse pathways. ?wpIdentifier is the identifier link for the pathway, ?pathway points to the RDF version of WikiPathways, and ?page is the revision that is loaded in the SPARQL endpoint.
SELECT DISTINCT ?wpIdentifier ?pathway ?page
WHERE {
  ?pathway dc:title ?title .
  ?pathway foaf:page ?page .
  ?pathway dc:identifier ?wpIdentifier .
  ?pathway wp:organismName "Mus musculus" .
}
ORDER BY ?wpIdentifier
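The same query can be generated for any species by substituting the organism name. The Python sketch below illustrates this; the function names are hypothetical, and the escaping only handles backslashes and double quotes, which is enough for ordinary species names.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def pathways_for_species(species):
    """Return the species query with the given organism name filled in."""
    return (
        "SELECT DISTINCT ?wpIdentifier ?pathway ?page\n"
        "WHERE {\n"
        "  ?pathway dc:title ?title .\n"
        "  ?pathway foaf:page ?page .\n"
        "  ?pathway dc:identifier ?wpIdentifier .\n"
        '  ?pathway wp:organismName "%s" .\n'
        "}\n"
        "ORDER BY ?wpIdentifier"
    ) % escape_literal(species)


print(pathways_for_species("Homo sapiens"))
```

Escaping the name before inserting it keeps a stray quote in the input from breaking the query.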
Get all pathways with a particular gene
List all pathways per instance of a particular gene or protein (wp:GeneProduct). The example matches gene products whose label contains "CYP".
SELECT DISTINCT ?pathway (str(?label) as ?geneProduct)
WHERE {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  FILTER regex(str(?label), "CYP") .
}
Get all groups and complexes containing a particular gene
List all groups and complexes per instance of a particular gene or protein (wp:GeneProduct). The example matches gene products whose label contains "CYP".
SELECT DISTINCT ?pathway (str(?label) as ?geneProduct)
WHERE {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  FILTER NOT EXISTS { ?pathway a wp:Interaction } .
  FILTER NOT EXISTS { ?pathway a wp:Pathway } .
  FILTER regex(str(?label), "CYP") .
}
Get all the genes on a particular pathway
List all the genes and proteins (wp:GeneProduct) associated with a particular pathway WPID.
select distinct ?pathway (str(?label) as ?geneProduct) where {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  ?pathway dcterms:identifier "WP1560" .
}
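When a query like this is run programmatically and the results are requested as JSON, they arrive in the W3C SPARQL JSON results format, with a "head" listing the variables and one "binding" per row. The Python sketch below flattens such a response into plain dictionaries. The function name rows is hypothetical, and the sample response is made up for illustration; it is not real WikiPathways output.

```python
def rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, hence .get().
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in result["results"]["bindings"]
    ]


# Made-up response in the W3C result format, for illustration only.
sample = {
    "head": {"vars": ["pathway", "geneProduct"]},
    "results": {"bindings": [
        {"pathway": {"type": "uri", "value": "http://example.org/pathway/1"},
         "geneProduct": {"type": "literal", "value": "GENE1"}},
        {"pathway": {"type": "uri", "value": "http://example.org/pathway/1"}},
    ]},
}

for row in rows(sample):
    print(row["pathway"], row["geneProduct"])
```

Variables without a value in a given row come back as None, so downstream code does not need to check for missing keys.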
Count the number of pathways per ontology term
In WikiPathways, pathways can be tagged with ontology terms from the Pathway, Cell Line and Disease ontologies. The following query returns a pathway count for each term from any of the available ontologies. These terms are collectively modeled as wp:pathwayOntology, but this includes all ontologies, not just the “Pathway” ontology.
SELECT DISTINCT ?pwOntologyTerm (count(?pwOntologyTerm) as ?pathwayCount)
WHERE {
  ?pathwayRDF wp:ontologyTag ?pwOntologyTerm .
}
GROUP BY ?pwOntologyTerm
ORDER BY DESC(?pathwayCount)
Count the number of pathways per curation tag
We can also count the number of pathways by curation and community tag:
SELECT ?curationTag (count(DISTINCT ?pathway) as ?pathwayCount)
WHERE {
  ?pathway wp:ontologyTag ?curationTag .
  FILTER contains(STR(?curationTag), "Curation:")
}
GROUP BY ?curationTag
ORDER BY DESC(?pathwayCount)
Get all pathways with a particular ontology term
In WikiPathways, pathways can be tagged with ontology terms from Pathway, Cell Line and Disease ontology. The following query returns a list of pathways tagged with PW_0000296.
PREFIX obo: <>
SELECT ?pathway (str(?titleLit) AS ?title)
WHERE {
  ?pathwayRDF wp:ontologyTag obo:PW_0000296 ;
    foaf:page ?pathway ;
    dc:title ?titleLit .
}
Get all ontology terms for a particular pathway
List all the ontology terms tagged on a particular pathway.
SELECT (?o as ?pwOntologyTerm) (str(?titleLit) as ?title) ?pathway
WHERE {
  ?pathwayRDF wp:ontologyTag ?o ;
    foaf:page ?pathway ;
    dc:title ?titleLit ;
    dcterms:identifier "WP1560" .
  FILTER (! regex(str(?pathway), "group"))
}
Get all Reactome pathways
List all pathways tagged with the cur:Reactome_Approved curation tag.
SELECT DISTINCT ?pathway (str(?titleLit) as ?title)
WHERE {
  ?pathway wp:ontologyTag cur:Reactome_Approved ;
    dc:title ?titleLit .
}
LIPID MAPS-related queries
Count the number of lipids per pathway in WikiPathways with LIPID MAPS identifier
Converts all Metabolite identifiers to LIPID MAPS identifiers (provided by BridgeDb) and creates an ordered list of pathways that include lipid compounds.
prefix lipidmaps: <>
select distinct ?pathwayRes (str(?wpid) as ?pathway)
  (str(?title) as ?pathwayTitle)
  (count(distinct ?lipidID) AS ?LipidsInPWs)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
}
GROUP BY ?pathwayRes ?wpid ?title
ORDER BY DESC(?LipidsInPWs)
Count the number of lipids per LIPID MAPS ontology class
Counts the unique LIPID MAPS identifiers (provided by BridgeDb) for the fatty acid (FA) class. Other classes are listed in the comment.
select (count(distinct ?lipidID) as ?IndividualLipidsPerClass_FA)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
  FILTER regex(str(?lipidID), "FA") . # Other classes: GL, GP, SP, ST, PR, SL, PK
}
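To run the count for each class without editing the query by hand, the class code can be filled in programmatically. The Python sketch below generates the query for any of the class codes named in the comment above; the function name lipid_count_query is hypothetical, and the query text assumes the endpoint supplies the prefixes.

```python
# Class codes taken from the comment in the query above.
LIPID_CLASSES = ("FA", "GL", "GP", "SP", "ST", "PR", "SL", "PK")

_TEMPLATE = """\
select (count(distinct ?lipidID) as ?IndividualLipidsPerClass_%(cls)s)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
  FILTER regex(str(?lipidID), "%(cls)s") .
}"""


def lipid_count_query(lipid_class):
    """Return the lipid count query for one LIPID MAPS class code."""
    if lipid_class not in LIPID_CLASSES:
        raise ValueError("unknown LIPID MAPS class: %r" % (lipid_class,))
    return _TEMPLATE % {"cls": lipid_class}


print(lipid_count_query("GP"))
```

Restricting the input to the known class codes avoids sending a query with a mistyped filter that would silently return zero.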
Find pathways per LIPID MAPS ontology class, sorted by the number of unique lipids
Filters all unique LIPID MAPS identifiers (provided by BridgeDb) for the fatty acid (FA) class and finds all pathways that contain individual lipids from that class.
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (count(distinct ?lipidID) AS ?FA_LipidsInPWs)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
  FILTER regex(str(?lipidID), "FA") . # Fatty acids. Other classes: GL, GP, SP, ST, PR, SL, PK
}
GROUP BY ?pathwayRes ?wpid ?title
ORDER BY DESC(?FA_LipidsInPWs)
Data statistics-oriented queries
Count the number of metabolites per species
Strictly speaking, this is an estimate, because the query counts the number of unique metabolite identifiers. Normalization in the RDF generation code ensures we do not double count metabolites with identifiers from different databases, but metabolites with different charge states are still counted separately.
select (count(distinct ?metabolite) as ?count) (str(?label) as ?species) where {
?metabolite a wp:Metabolite ;
dcterms:isPartOf ?pw .
?pw dc:title ?title ;
wp:organism ?organism ;
wp:organismName ?label .
} GROUP BY ?label ORDER BY DESC(?count)
Interaction-oriented queries
Get all interactions for a particular datanode
Find all interactions (wp:Interaction) that are connected to a particular datanode.
#Find all interactions that are connected to a particular datanode.
SELECT DISTINCT ?interaction ?pathway WHERE {
  ?pathway a wp:Pathway .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
  ?interaction wp:participants <> .
}
Datasource-oriented queries
Get all datasources currently captured in WikiPathways
SELECT DISTINCT (str(?datasourceLit) as ?datasource)
WHERE {
  ?concept dc:source ?datasourceLit
}
Get the number of entries per datasource in WikiPathways
SELECT (str(?datasourceLit) as ?datasource)
  (count(DISTINCT ?dataNode) as ?numberEntries)
WHERE {
  ?concept dc:source ?datasourceLit ;
    wp:isAbout ?dataNode .
}
GROUP BY ?datasourceLit
ORDER BY DESC(?numberEntries)
Return all compounds annotated with the "ChEMBL compound" as data source and the pathways they are in
SELECT DISTINCT ?identifier ?pathway
WHERE {
  ?concept dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  ?concept dc:source "ChEMBL compound" .
  ?concept dc:identifier ?identifier .
}
Literature queries
Articles cited by Reactome but not by WikiPathways
SELECT (COUNT(DISTINCT ?pubmed) AS ?count)
WHERE {
  ?pubmed a wp:PublicationReference .
  MINUS { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:AnalysisCollection }
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:Reactome_Approved }
}
Articles cited by WikiPathways but not by Reactome
SELECT (COUNT(DISTINCT ?pubmed) AS ?count)
WHERE {
  ?pubmed a wp:PublicationReference .
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:AnalysisCollection }
  MINUS { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:Reactome_Approved }
}
Articles cited by both Reactome and WikiPathways
SELECT (COUNT(DISTINCT ?pubmed) AS ?count)
WHERE {
  ?pubmed a wp:PublicationReference .
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:AnalysisCollection }
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:Reactome_Approved }
}
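The three literature queries differ only in how the two graph patterns are combined: MINUS acts as a set difference, and two patterns placed side by side act as an intersection. The Python sketch below mirrors that logic with ordinary sets. The identifiers are made-up placeholders, not real PubMed IDs, and the sets stand in for the articles matched by each pattern.

```python
# Made-up placeholder identifiers, not real PubMed IDs.
reactome = {"A", "B", "C"}       # cited on pathways tagged cur:Reactome_Approved
wikipathways = {"B", "C", "D"}   # cited on pathways tagged cur:AnalysisCollection

only_reactome = reactome - wikipathways      # first query: Reactome MINUS WikiPathways
only_wikipathways = wikipathways - reactome  # second query: WikiPathways MINUS Reactome
both = reactome & wikipathways               # third query: both patterns must match

print(len(only_reactome), len(only_wikipathways), len(both))  # prints: 1 1 2
```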