Query using SPARQL
Returns the results of running a graph query on the Data Commons knowledge graph using SPARQL. Note that Data Commons currently supports only a limited subset of SPARQL functionality: specifically, only the keywords `ORDER BY`, `DISTINCT`, and `LIMIT`.
Note: The Python SPARQL library currently only supports the V1 version of the API.
Signature:
datacommons.query(query_string, select=None)
Required arguments:
- `query_string`: A SPARQL query string.
This method makes it possible to query the Data Commons knowledge graph using SPARQL. SPARQL is a query language for retrieving data stored as RDF graphs. It uses the graph structure of the data it queries to return specific information to an end user. For more information on assembling SPARQL queries, see the Wikipedia page about SPARQL and the W3C specification.
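As a rough illustration of how a query string for this method can be assembled, the sketch below joins triple patterns into one string and appends only the supported keywords. The helper `build_query` and its parameters are hypothetical and are not part of the datacommons library.

```python
def build_query(variables, patterns, distinct=False, order_by=None, limit=None):
    """Assemble a SPARQL query string (hypothetical helper, not a datacommons API)."""
    select = "SELECT DISTINCT" if distinct else "SELECT"
    # Triple patterns are joined with " . ", as in the examples on this page.
    query = "{} {} WHERE {{ {} }}".format(
        select, " ".join(variables), " . ".join(patterns))
    if order_by:
        query += " ORDER BY {}".format(order_by)
    if limit is not None:
        query += " LIMIT {}".format(limit)
    return query


bio_query = build_query(
    ["?name"],
    ["?biologicalSpecimen typeOf BiologicalSpecimen",
     "?biologicalSpecimen name ?name"],
    order_by="DESC(?name)",
    limit=10,
)
print(bio_query)
```

The resulting string can then be passed to `datacommons.query` as `query_string`.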
This method accepts one additional optional argument, `select`, a function that chooses which rows `query` returns. Under the hood, `select` is called on each row in the results of executing `query_string` and returns `True` if and only if that row should be returned by `query`. The row passed in as an argument is represented as a `dict` that maps each query variable in `query_string` to its value in the given row.
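To make the role of `select` concrete, here is a minimal sketch that applies a selector to rows shaped like those returned by `query`. It does not call the API: the sample rows are copied from example output on this page, and `apply_select` is a hypothetical stand-in for the filtering that `query` performs.

```python
def apply_select(rows, select=None):
    """Keep the rows for which select returns True (stand-in for query's filtering)."""
    return [row for row in rows if select is None or select(row)]


sample_rows = [
    {'?name': 'Kentucky', '?dcid': 'geoId/21'},
    {'?name': 'California', '?dcid': 'geoId/06'},
    {'?name': 'Maryland', '?dcid': 'geoId/24'},
]


# A selector receives one row (a dict keyed by query variable) and returns a bool.
def is_california(row):
    return row['?dcid'] == 'geoId/06'


california_rows = apply_select(sample_rows, select=is_california)
print(california_rows)
```

With no selector, every row is kept, which mirrors the default `select=None` in the signature.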
NOTE:
- In the query, each variable should have a `typeOf` condition, e.g. `?var typeOf City .`
A correct response will always look like this:
[
{
"<field name>": "<field value>",
...
},
...
]
The response contains an array of dictionaries, each corresponding to one node matching the conditions of the query. Each dictionary's keys match the variables in the query's SELECT clause, and its values are the values of the query-specified properties for that node.
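Because the response is a plain list of dictionaries, it can be reshaped with ordinary Python. The sketch below removes the leading `?` that the example outputs on this page show on each key. The helper name `strip_variable_prefix` is hypothetical, and the sample row is taken from example output on this page.

```python
def strip_variable_prefix(rows):
    """Return copies of the rows with the leading '?' removed from each key."""
    return [{key.lstrip('?'): value for key, value in row.items()} for row in rows]


rows = [{'?name': 'Maryland', '?dcid': 'geoId/24'}]
clean_rows = strip_variable_prefix(rows)
print(clean_rows)
```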
The following examples and error responses, along with explanations and fixes for the errors, are available in this Python notebook.
Example 1: Retrieve the names of the places with the DCIDs geoId/06, geoId/21, and geoId/24.
>>> geoId06_name_query = 'SELECT ?name ?dcid WHERE { ?a typeOf Place . ?a name ?name . ?a dcid ("geoId/06" "geoId/21" "geoId/24") . ?a dcid ?dcid }'
>>> datacommons.query(geoId06_name_query)
[{'?name': 'Kentucky', '?dcid': 'geoId/21'}, {'?name': 'California', '?dcid': 'geoId/06'}, {'?name': 'Maryland', '?dcid': 'geoId/24'}]
Example 2: Retrieve the names of ten biological specimens in reverse alphabetical order.
>>> bio_specimens_reverse_alphabetical_order_query = 'SELECT ?name WHERE { ?biologicalSpecimen typeOf BiologicalSpecimen . ?biologicalSpecimen name ?name } ORDER BY DESC(?name) LIMIT 10'
>>> datacommons.query(bio_specimens_reverse_alphabetical_order_query)
[{'?name': 'x Triticosecale'}, {'?name': 'x Silene'}, {'?name': 'x Silene'}, {'?name': 'x Silene'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}]
Example 3: Retrieve ten observations of per-capita gross national income (purchasing power parity), ordered by country.
>>> gni_by_country_query = 'SELECT ?observation ?place WHERE { ?observation typeOf StatVarObservation . ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
>>> datacommons.query(gni_by_country_query)
[{'?observation': 'dc/o/syrpc3m8q34z7', '?place': 'country/ABW'}, {'?observation': 'dc/o/bqtfmc351v0f2', '?place': 'country/ABW'}, {'?observation': 'dc/o/md36fx6ty4d64', '?place': 'country/ABW'}, {'?observation': 'dc/o/bm28zvchsyf4b', '?place': 'country/ABW'}, {'?observation': 'dc/o/3nleez1feevw6', '?place': 'country/ABW'}, {'?observation': 'dc/o/x2yg38d0xecnf', '?place': 'country/ABW'}, {'?observation': 'dc/o/7swdqf6yjdyw8', '?place': 'country/ABW'}, {'?observation': 'dc/o/yqmsmbx1qskfg', '?place': 'country/ABW'}, {'?observation': 'dc/o/6hlhrz3k8p5wf', '?place': 'country/ABW'}, {'?observation': 'dc/o/txfw505ydg629', '?place': 'country/ABW'}]
Example 4: Retrieve a sample of ten observations with the unit InternationalDollar.
>>> internationalDollar_obs_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . ?observation unit InternationalDollar } LIMIT 10'
>>> datacommons.query(internationalDollar_obs_query)
[{'?observation': 'dc/o/s3gzszzvj34f1'}, {'?observation': 'dc/o/gd41m7qym86d4'}, {'?observation': 'dc/o/wq62twxx902p4'}, {'?observation': 'dc/o/d93kzvns8sq4c'}, {'?observation': 'dc/o/6s741lstdqrg4'}, {'?observation': 'dc/o/2kcq1xjkmrzmd'}, {'?observation': 'dc/o/ced6jejwv224f'}, {'?observation': 'dc/o/q31my0dmcryzd'}, {'?observation': 'dc/o/96frt9w0yjwxf'}, {'?observation': 'dc/o/rvjz5xn9mlg73'}]
Example 5: Retrieve a list of ten distinct annual estimates of life expectancy, along with the year of estimation, for forty-seven-year-old Hungarians.
>>> life_expectancy_query = 'SELECT DISTINCT ?LifeExpectancy ?year WHERE { ?o typeOf StatVarObservation . ?o variableMeasured LifeExpectancy_Person_47Years . ?o observationAbout country/HUN . ?o value ?LifeExpectancy . ?o observationDate ?year } ORDER BY ASC(?LifeExpectancy) LIMIT 10'
>>> datacommons.query(life_expectancy_query)
[{'?LifeExpectancy': '26.4', '?year': '1993'}, {'?LifeExpectancy': '26.5', '?year': '1992'}, {'?LifeExpectancy': '26.7', '?year': '1990'}, {'?LifeExpectancy': '26.7', '?year': '1994'}, {'?LifeExpectancy': '26.8', '?year': '1991'}, {'?LifeExpectancy': '26.9', '?year': '1995'}, {'?LifeExpectancy': '27.2', '?year': '1996'}, {'?LifeExpectancy': '27.4', '?year': '1999'}, {'?LifeExpectancy': '27.5', '?year': '1997'}, {'?LifeExpectancy': '27.5', '?year': '1998'}]
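The values in the output above are strings, so numeric work needs an explicit conversion. A minimal sketch using three of the rows shown above; the variable names are illustrative only.

```python
rows = [
    {'?LifeExpectancy': '26.4', '?year': '1993'},
    {'?LifeExpectancy': '26.5', '?year': '1992'},
    {'?LifeExpectancy': '26.7', '?year': '1990'},
]

# Map each year (int) to its life expectancy estimate (float).
estimate_by_year = {int(row['?year']): float(row['?LifeExpectancy']) for row in rows}
earliest_year = min(estimate_by_year)
print(earliest_year, estimate_by_year[earliest_year])
```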
Example 6: Use the select function to filter the returned rows.
>>> names_for_places_query = 'SELECT ?name ?dcid WHERE { ?a typeOf Place . ?a name ?name . ?a dcid ("geoId/06" "geoId/21" "geoId/24") . ?a dcid ?dcid }'
>>> maryland_selector = lambda row: row['?name'] == 'Maryland'
>>> result = datacommons.query(names_for_places_query, select=maryland_selector)
>>> for r in result:
...     print(r)
...
{'?name': 'Maryland', '?dcid': 'geoId/24'}
Error response 1: Malformed SPARQL query. The query below orders by ?place without listing ?place in the SELECT clause, and the server rejects it.
>>> gni_by_country_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
>>> datacommons.query(gni_by_country_query)
Traceback (most recent call last):
  File "/home/porpentina/miniconda3/lib/python3.7/site-packages/datacommons/query.py", line 102, in query
    res = six.moves.urllib.request.urlopen(req)
  File "/home/porpentina/miniconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/porpentina/miniconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/porpentina/miniconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/porpentina/miniconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/porpentina/miniconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/porpentina/miniconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/porpentina/miniconda3/lib/python3.7/site-packages/datacommons/query.py", line 104, in query
    raise ValueError('Response error {}:\n{}'.format(e.code, e.read()))
ValueError: Response error 500:
b'{\n "code": 2,\n "message": "googleapi: Error 400: Unrecognized name: place; Did you mean name? at [1:802], invalidQuery",\n "details": [\n {\n "@type": "type.googleapis.com/google.rpc.DebugInfo",\n "stackEntries": [],\n "detail": "internal"\n }\n ]\n}\n'
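The traceback above shows that `query` raises a `ValueError` when the server returns an error response. One way to handle that is sketched below. To keep the sketch runnable without network access, the query function is passed in as a parameter. `run_query` and `failing_query` are hypothetical names, and in real use the first argument would presumably be `datacommons.query`.

```python
def run_query(query_fn, query_string):
    """Call query_fn and return (rows, error_message) instead of raising."""
    try:
        return query_fn(query_string), None
    except ValueError as err:
        # Per the traceback, the library raises
        # ValueError('Response error <code>:\n<body>') on HTTP errors.
        return [], str(err)


# Stand-in that mimics the failure shown above (assumption: not the real API).
def failing_query(query_string):
    raise ValueError('Response error 500:\nUnrecognized name: place')


rows, error = run_query(failing_query, 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation }')
print(rows, error.splitlines()[0])
```

Returning the message rather than re-raising lets the caller log the server's explanation and decide whether to fix the query or retry.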
Error response 2: Malformed SPARQL query string. The query string below contains stray backslash characters.
>>> gni_by_country_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . \\\\\ ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
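Error response 3: Bad selector. The selector below looks up ?earthquake, a variable that does not appear in the query.
>>> names_for_places_query = 'SELECT ?name ?dcid WHERE { ?a typeOf Place . ?a name ?name . ?a dcid ("geoId/06" "geoId/21" "geoId/24") . ?a dcid ?dcid }'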
>>> bad_selector = lambda row: row['?earthquake'] == 'Nonexistent'
>>> result = datacommons.query(names_for_places_query, select=bad_selector)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/porpentina/miniconda3/lib/python3.7/site-packages/datacommons/query.py", line 127, in query
    if select is None or select(row_map):
  File "<stdin>", line 1, in <lambda>
KeyError: '?earthquake'
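The `KeyError` above is raised inside the selector itself, not by the API. A selector that might be given rows without the expected variable can use `dict.get`, so that a missing variable simply fails the test. A minimal sketch on sample rows, with no API call; `safe_selector` is a hypothetical name.

```python
sample_rows = [
    {'?name': 'Kentucky', '?dcid': 'geoId/21'},
    {'?name': 'Maryland', '?dcid': 'geoId/24'},
]

# dict.get returns None for a missing key instead of raising KeyError.
safe_selector = lambda row: row.get('?earthquake') == 'Nonexistent'
maryland_selector = lambda row: row.get('?name') == 'Maryland'

no_matches = [row for row in sample_rows if safe_selector(row)]
maryland_rows = [row for row in sample_rows if maryland_selector(row)]
print(no_matches)
print(maryland_rows)
```

Note that this trades a loud failure for an empty result, so a misspelled variable name will go unnoticed unless the result is checked.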