Commit

Add brief description of RSI model to quickstart (#492)
* integrate custom docs with new UI

* more edits

* use website wording for intro

* fix numbering in table

* rename and some edits

* rename manage_repo file, per Bo

* Merge.

* formatting edits

* updates per Keyur's feedback

* Fix typos

* fix nav order

* fix link to API key request form

* update form link

* update key request form and output dir env var

* Revert to gerund

Though the style guide says to just use imperatives, "get started" just sounds weird. Also this is more consistent with "troubleshooting"

* new troubleshooting entry

* fix typo

* new data container procedures

* more work

* more work

* complete data draft

* more changes

* more changes

* more revisions

* update troubleshooting doc etc.

* new version of diagrams

* remove data loading problems troubleshooting entry; can't reproduce

* revert title change

* add example for not mixing entity types

* changes from Keyur

* add screenshots for GCP, and related changes

* fixed one image

* added screenshots for Cloud Run service

* resize images

* more changes from Keyur

* fix a tiny error

* delete unused images

* fix missing dash

* update build file

* adjust build command

* Revert "adjust build command"

This reverts commit 4ce0fb9.

* update docker file

* more fixes

* one last fix

* make links to Cloud Console open in a new page

* fixes to quickstart suggested by Prem

* one more change

* change from Keyur

* revise procedure

* add brief explanation of data model to quickstart

* slight wording tweak

* incorporate feedback from Keyur

* remove erroneous edit

* correct missing text
kmoscoe authored Aug 29, 2024
1 parent 5d317e3 commit cafd60b
Showing 4 changed files with 42 additions and 18 deletions.
4 changes: 2 additions & 2 deletions custom_dc/custom_data.md
@@ -1,12 +1,12 @@
---
layout: default
title: Work with custom data
title: Prepare and load your own data
nav_order: 3
parent: Build your own Data Commons
---

{:.no_toc}
# Work with custom data
# Prepare and load your own data

This page shows you how to format and load your own custom data into your local instance. This is step 2 of the [recommended workflow](/custom_dc/index.html#workflow).

5 changes: 3 additions & 2 deletions custom_dc/index.md
@@ -83,8 +83,9 @@ The cost of running a site on Google Cloud Platform depends on the size of your
{: #workflow}
## Recommended workflow

1. Work through the [Getting started](/custom_dc/quickstart.html) page to learn how to run a local Data Commons instance and load some sample custom data.
1. Prepare your real-world custom data and load it in the local custom instance. Data Commons requires your data to be in a specific format. See [Work with custom data](/custom_dc/custom_data.html).
1. Work through the [Getting started](/custom_dc/quickstart.html) page to learn how to run a local Data Commons instance and load some sample data.
1. Prepare your real-world data and load it in the local custom instance. Data Commons requires your data to be in a specific format. See [Prepare and load your own data](/custom_dc/custom_data.html) for details.
> Note: This section is very important! If your data is not in the format Data Commons expects, it won't load.
1. If you want to customize the look and feel of the site, see [Customize the site](/custom_dc/custom_ui.html).
1. When you have finished testing locally, host your data and code in Google Cloud Platform: first, upload your data to Google Cloud Storage and create a Cloud Run job to load the data into Google Cloud SQL. See [Load data in Google Cloud](/custom_dc/data_cloud.html).
1. Build a custom image, upload it to the Google Cloud Artifact Registry and create a Cloud Run service to run the site. See [Deploy services to Google Cloud](deploy_cloud.md).
30 changes: 27 additions & 3 deletions custom_dc/quickstart.md
@@ -94,7 +94,7 @@ cd website
<tbody>
<tr>
<td width="300"><a href="https://github.com/datacommonsorg/website/tree/master/custom_dc/sample" target="_blank"><code>custom_dc/sample/</code></a></td>
<td>Sample supplemental data that is added to the base data in Data Commons. This page shows you how to easily load and view this data. The data is in CSV format and mapped to Data Commons entity definitions using the `config.json` file. </td>
<td>Sample supplemental data that is added to the base data in Data Commons. This page describes the model and format of this data and how you can load and view it. </td>
</tr>
<tr>
<td><a href="https://github.com/datacommonsorg/website/tree/master/custom_dc/examples" target="_blank"><code>custom_dc/examples/</code></a></td>
@@ -115,9 +115,33 @@ cd website
</tbody>
</table>

## Load data
## Look at the sample data

In this step, we will add sample data that we have included as part of the download for you to load it into your custom instance. This data is from the Organisation for Economic Co-operation and Development (OECD): "per country data for annual average wages" and "gender wage gaps".
Before you start up a Data Commons site, it's important to understand the basics of the data model that is expected in a custom Data Commons instance. Let's look at the sample data in the CSV files in the `custom_dc/sample/` folder. This data is from the Organisation for Economic Co-operation and Development (OECD): "per country data for annual average wages" and "gender wage gaps":

countryAlpha3Code | date | average_annual_wage |
------------------|-------|---------------------|
BEL | 2000 | 54577.62735 |
BEL | 2001 | 54743.96009 |
BEL | 2002 | 56157.24355 |
BEL | 2003 | 56491.99591 |
... | ... | ... |

countryAlpha3Code | date | gender_wage_gap |
------------------|-------|-----------------|
DNK | 2005 | 10.16733044 |
DNK | 2006 | 10.17206126 |
DNK | 2007 | 9.850297951 |
DNK | 2008 | 10.18354903 |
... | ... | ... |

There are a few important things to note:
- Each file has only 3 columns: one representing a place (`countryAlpha3Code`, a [special Data Commons place type](/custom_dc/custom_data.html#special-names)); one representing a date (`date`); and one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric (`average_annual_wage` in the first file and `gender_wage_gap` in the second). (In fact, a file can have any number of statistical variable columns, though no other kinds of additional columns, so these two CSV files could be combined into one.)
- Every row is a separate [_observation_](/glossary.html#observation), or a value of the variable for a given place and time. When a file has multiple statistical variable columns, each row contains one observation per variable column.
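To make the row-to-observation mapping concrete, here is a minimal Python sketch. It assumes a CSV laid out like the sample files: the first column holds the place (`countryAlpha3Code`), the second holds `date`, and every remaining column is a statistical variable. The names `Observation` and `to_observations` are hypothetical and are not part of the Data Commons import tooling; the sketch only shows how one row fans out into one observation per variable column.

```python
import csv
import io
from typing import NamedTuple


class Observation(NamedTuple):
    """One value of one statistical variable for one place and date."""
    place: str
    date: str
    variable: str
    value: float


def to_observations(csv_text: str) -> list[Observation]:
    """Expand each CSV row into one Observation per statistical variable column.

    Assumes (as in the sample files) that column 0 holds the place, column 1
    holds the date, and every remaining column is a statistical variable.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    variables = header[2:]  # everything after the place and date columns
    observations = []
    for row in reader:
        place, date, values = row[0], row[1], row[2:]
        for variable, raw in zip(variables, values):
            observations.append(Observation(place, date, variable, float(raw)))
    return observations


# First two rows of the average-wage sample shown above.
sample_csv = """countryAlpha3Code,date,average_annual_wage
BEL,2000,54577.62735
BEL,2001,54743.96009
"""

for obs in to_observations(sample_csv):
    print(obs)
```

With a single variable column each row yields exactly one observation; adding more variable columns to the header would make each row yield one observation per column.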

This is the format to which your data must conform if you want to take advantage of Data Commons' simple import facility. If your data doesn't follow this model, you'll need to do some more work to prepare or configure it for correct loading. (That topic is discussed in detail in [Prepare and load your own data](custom_data.md).)
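As a rough pre-flight check, the sketch below tests a CSV against the model described above: one place column, one `date` column, and only numeric statistical variable columns besides those. It is a hypothetical helper written for illustration, not the validation the Data Commons loader actually performs. In particular, `KNOWN_PLACE_COLUMNS` is an assumption that lists only the `countryAlpha3Code` column from the sample; the real set of special names is documented in Prepare and load your own data.

```python
import csv
import io

# Assumption: only the place column used in the sample data is recognized here.
# The real list of special place column names is documented elsewhere.
KNOWN_PLACE_COLUMNS = {"countryAlpha3Code"}


def check_csv_layout(csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks OK.

    A sketch of the model described above, not the official loader rules:
      * exactly one recognized place column,
      * exactly one `date` column,
      * every other column holds numeric values (statistical variables).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []

    place_cols = [c for c in header if c in KNOWN_PLACE_COLUMNS]
    if len(place_cols) != 1:
        problems.append(f"expected 1 place column, found {len(place_cols)}")
    date_count = header.count("date")
    if date_count != 1:
        problems.append(f"expected 1 'date' column, found {date_count}")

    # Every column that is neither the place nor the date must be a variable.
    variable_idx = [i for i, c in enumerate(header)
                    if c not in KNOWN_PLACE_COLUMNS and c != "date"]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for i in variable_idx:
            try:
                float(row[i])
            except (ValueError, IndexError):
                problems.append(f"line {line_no}: column '{header[i]}' is not numeric")
    return problems
```

For example, a file with an extra, hypothetical `source` column holding text such as `OECD` would be flagged, because the model allows no columns other than the place, the date, and statistical variables.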

## Load sample data

To load the sample data:

21 changes: 10 additions & 11 deletions glossary.md
@@ -87,7 +87,16 @@ Property of [variables](#variable) that measure proportions, used in conjunction

As an example, in 1999, [approximately 36% of Canadians were Internet users](https://datacommons.org/browser/dc/o/0d9e3dd3y6yt3){: target="_blank"}. Here the measured value of `Count_Person_IsInternetUser_PerCapita` is 36, and the scaling factor or denominator for this per capita measurement is 100. Without the scaling factor, we would interpret the value to be 36/1, or 3600%.

A complete list of properties can be found in the [Knowledge Graph](https://datacommons.org/browser/scalingFactor){: target="_blank"}.
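The scaling-factor arithmetic in the Internet-users example can be written out as a tiny Python sketch. The helper name `to_fraction` is hypothetical; it simply divides a measured value by its scaling factor and treats a missing factor as 1.

```python
def to_fraction(measured_value: float, scaling_factor: float = 1) -> float:
    """Interpret a per-capita value as a plain fraction by dividing by its scaling factor."""
    return measured_value / scaling_factor


# Count_Person_IsInternetUser_PerCapita for Canada in 1999, from the example above.
with_factor = to_fraction(36, scaling_factor=100)  # 0.36
without_factor = to_fraction(36)                   # 36.0

print(f"{with_factor:.0%}")     # 36%
print(f"{without_factor:.0%}")  # 3600%
```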
### [Statistical Variable](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}
{: #variable}

Any type of metric, statistic, or measure that can be measured for a specific entity (most typically a place, but possibly any other entity in the graph, such as a school or power plant) and time. Examples include [median income of persons older than 16](https://datacommons.org/browser/Median_Income_Person_16OrMoreYears){: target="_blank"}, [number of female high school graduates aged 18 to 24](https://datacommons.org/browser/Count_Person_18To24Years_EducationalAttainmentHighSchoolGraduateIncludesEquivalency_Female){: target="_blank"}, [unemployment rate](https://browser.datacommons.org/browser/UnemploymentRate_Person){: target="_blank"}, or [percentage of persons with diabetes](https://browser.datacommons.org/browser/Percent_Person_WithDiabetes){: target="_blank"}. A complete list of variables can be found in the [Knowledge Graph](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}.

### [Statistical Variable Group](https://datacommons.org/browser/StatVarGroup){: target="_blank"}
{: #variable-group}

Represents a grouping of variables that are conceptually related. For example, variable group [Person With Gender = Female](https://datacommons.org/browser/dc/g/Person_Gender-Female){: target="_blank"} consists of variables like [Female Median Age](https://datacommons.org/browser/Median_Age_Person_Female){: target="_blank"}, [Female Median Income](https://datacommons.org/browser/Median_Income_Person_15OrMoreYears_Female_WithIncome){: target="_blank"}, and so on. A variable group can also have child variable groups, which describe a subset of the parent variable group. For example, variable group [Person With Age, Gender = Female](https://datacommons.org/browser/dc/g/Person_Age_Gender-Female){: target="_blank"} is a child of [Person With Gender = Female](https://datacommons.org/browser/dc/g/Person_Gender-Female){: target="_blank"}. It contains variables that have age constraints.
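A variable group and its child groups form a tree. The Python sketch below models that with a hypothetical `VariableGroup` dataclass and collects every variable reachable from a group. The group and variable IDs come from the links in the definition above; the child group's variable is a placeholder (`<age-constrained variable>`) because no specific one is named here. The recursive collection only illustrates the subset relationship and is not a Data Commons API.

```python
from dataclasses import dataclass, field


@dataclass
class VariableGroup:
    """A named group of statistical variables, possibly with child groups."""
    dcid: str
    variables: list[str] = field(default_factory=list)
    children: list["VariableGroup"] = field(default_factory=list)

    def all_variables(self) -> list[str]:
        """Return this group's variables followed by those of all descendant groups."""
        collected = list(self.variables)
        for child in self.children:
            collected.extend(child.all_variables())
        return collected


# Child group: variables with an age constraint (placeholder name, hypothetical).
age_gender_female = VariableGroup(
    dcid="dc/g/Person_Age_Gender-Female",
    variables=["<age-constrained variable>"],
)

# Parent group, using the variable IDs linked in the definition above.
gender_female = VariableGroup(
    dcid="dc/g/Person_Gender-Female",
    variables=[
        "Median_Age_Person_Female",
        "Median_Income_Person_15OrMoreYears_Female_WithIncome",
    ],
    children=[age_gender_female],
)

print(gender_female.all_variables())
```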


### Triple
{: #triple}
@@ -113,13 +122,3 @@ USA -- containedInPlace --> northamerica
{: #unit}

The unit of measurement. Examples include [kilowatt hours](https://datacommons.org/browser/KilowattHour){: target="_blank"}, [inches](https://datacommons.org/browser/Inch){: target="_blank"}, and [Indian Rupees](https://datacommons.org/browser/IndianRupee){: target="_blank"}. A complete list of units can be found in the [Knowledge Graph](https://datacommons.org/browser/unit){: target="_blank"}.

### [Statistical Variable](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}
{: #variable}

Any type of metric, statistic, or measure that can be measured at a place and time. Examples include [median income of persons older than 16](https://datacommons.org/browser/Median_Income_Person_16OrMoreYears){: target="_blank"}, [number of female high school graduates aged 18 to 24](https://datacommons.org/browser/Count_Person_18To24Years_EducationalAttainmentHighSchoolGraduateIncludesEquivalency_Female){: target="_blank"}, [unemployment rate](https://browser.datacommons.org/browser/UnemploymentRate_Person){: target="_blank"}, or [percentage of persons with diabetes](https://browser.datacommons.org/browser/Percent_Person_WithDiabetes){: target="_blank"}. A complete list of variables can be found in the [Knowledge Graph](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}.

### [Statistical Variable Group](https://datacommons.org/browser/StatVarGroup){: target="_blank"}
{: #variable-group}

Represents a grouping of variables that are conceptually related. For example, variable group [Person With Gender = Female](https://datacommons.org/browser/dc/g/Person_Gender-Female){: target="_blank"} consists of variables like [Female Median Age](https://datacommons.org/browser/Median_Age_Person_Female){: target="_blank"}, [Female Median Income](https://datacommons.org/browser/Median_Income_Person_15OrMoreYears_Female_WithIncome){: target="_blank"} and etc. A variable group could also have child variable groups, which describe a subset of the parent variable group. For example, variable group [Person With Age, Gender = Female](https://datacommons.org/browser/dc/g/Person_Age_Gender-Female){: target="_blank"} is a child of [Person With Gender = Female](https://datacommons.org/browser/dc/g/Person_Gender-Female){: target="_blank"}. It contains variables that have age constraints.
