Fixing various broken internal links (#397)
* first round of custom DC docs

* first complete version of custom DC docs

* added troubleshooting item

* fixed CSV format

* fixed HTML formatting

* small fixes

* remove toc attempt

* edits from Keyur

* more edits

* removed curl procedure

* add NL note

* more edits from Keyur

* fix link

* remove curl procedure and other tiny fixes

* more tiny fixes

* fixes for curl removal

* fix various broken links

* resolving merge conflicts

* Update custom_dc/custom_data.md

Co-authored-by: Julia Wu <[email protected]>

* Update custom_dc/custom_data.md

Co-authored-by: Julia Wu <[email protected]>

---------

Co-authored-by: Julia Wu <[email protected]>
kmoscoe and juliawu authored May 17, 2024
1 parent c8f9667 commit ec505ae
Showing 7 changed files with 31 additions and 27 deletions.
5 changes: 3 additions & 2 deletions custom_dc/build_repo.md
@@ -47,9 +47,10 @@ It will take several minutes to build.

To run the container with the local SQLite database, start the Docker container as described below.

-To run the container with a remote Cloud SQL database, see [Start the Docker container with Cloud data](build_repo.md#docker-data) for procedures.
+To run the container with a remote Cloud SQL database, see [Start the Docker container with Cloud data](/custom_dc/build_repo.html#docker-data) for procedures.

-To upload and deploy the container to the Cloud, see [Deploy a custom instance to Google Cloud](deploy_cloud.md) for procedures.
+To upload and deploy the container to the Cloud, see [Deploy a custom instance to Google Cloud](/custom_dc/deploy_cloud.html) for procedures.

## Run the container with the local SQLite database

22 changes: 11 additions & 11 deletions custom_dc/custom_data.md
@@ -17,13 +17,13 @@ Examples are provided in [`custom_dc/sample`](https://github.com/datacommonsorg/

## Prepare the CSV files {#prepare-csv}

-Custom Data Commons provides a simplified data model, which allows your data to be mapped to the Data Commons knowledge graph schema. Data in the CSV files should conform to a _variable per column_ scheme. This requires minimal manual configuration; the Data Commons importer can create observations and statistical variables if they don't already exist, and it resolves all columns to [DCID](../glossary.md#dcid)s.
+Custom Data Commons provides a simplified data model, which allows your data to be mapped to the Data Commons knowledge graph schema. Data in the CSV files should conform to a _variable per column_ scheme. This requires minimal manual configuration; the Data Commons importer can create observations and statistical variables if they don't already exist, and it resolves all columns to [DCID](../glossary.html#dcid)s.

With the variable-per-column scheme, data is provided in this format, in this exact sequence:

_ENTITY, OBSERVATION_DATE, STATISTICAL_VARIABLE1, STATISTICAL_VARIABLE2, …_

-There are two properties, the _ENTITY_ and the _OBSERVATION\_DATE_, that specify the place and time of the observation; all other properties must be expressed as [statistical variables](../glossary.md#variable). To illustrate what this means, consider this example: let's say you have a dataset that provides the number of public schools in U.S. cities, broken down by elementary, middle, secondary and postsecondary. Your data might have the following structure, which we identify as _variable per row_ (numbers are not real, but are just made up for the sake of example):
+There are two properties, the _ENTITY_ and the _OBSERVATION\_DATE_, that specify the place and time of the observation; all other properties must be expressed as [statistical variables](../glossary.html#variable). To illustrate what this means, consider this example: let's say you have a dataset that provides the number of public schools in U.S. cities, broken down by elementary, middle, secondary and postsecondary. Your data might have the following structure, which we identify as _variable per row_ (numbers are not real, but are just made up for the sake of example):

```csv
city,year,typeOfSchool,count
@@ -45,21 +45,21 @@ San Francisco,2023,300,300,200,50
San Jose,2023,400,400,300,0
```
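
In variable-per-column form, each school type becomes its own column, named for a statistical variable. The full example is collapsed in this diff, but a minimal sketch consistent with the visible rows (the column headings here are illustrative assumptions, not taken from the original file) would be:

```csv
city,year,countElementarySchool,countMiddleSchool,countSecondarySchool,countPostsecondarySchool
San Francisco,2023,300,300,200,50
San Jose,2023,400,400,300,0
```

Each column after `city` and `year` is resolved by the importer to an existing statistical variable, or created as a new one.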

-The _ENTITY_ is an existing property in the Data Commons knowledge graph that is used to describe an entity, most commonly a place. The best way to think of the entity type is as a key that could be used to join to other data sets. The column heading can be expressed as any existing place-related property; see [Place types](../place_types.md) for a full list. It may also be any of the special DCID prefixes listed in (Special place names)[#special-names].
-The _ENTITY_ is an existing property in the Data Commons knowledge graph that is used to describe an entity, most commonly a place. The best way to think of the entity type is as a key that could be used to join to other data sets. The column heading can be expressed as any existing place-related property; see [Place types](../place_types.html) for a full list. It may also be any of the special DCID prefixes listed in (Special place names)[#special-names].
+The _ENTITY_ is an existing property in the Data Commons knowledge graph that is used to describe an entity, most commonly a place. The best way to think of the entity type is as a key that could be used to join to other data sets. The column heading can be expressed as any existing place-related property; see [Place types](../place_types.html) for a full list. It may also be any of the special DCID prefixes listed in [Special place names](#special-names).
The _DATE_ is the date of the observation and should be in the format _YYYY_, _YYYY_-_MM_, or _YYYY_-_MM_-_DD_. The heading can be anything, although as a best practice, we recommend using a corresponding identifier, such as `year`, `month` or `date`.

-The _VARIABLE_ should contain a metric [observation](../glossary.md#observation) at a particular time. We recommend that you try to reuse existing statistical variables where feasible; use the main Data Commons [Statistical Variable Explorer](https://datacommons.org/tools/statvar) to find them. If there is no existing statistical variable you can use, name the heading with an illustrative name and the importer will create a new variable for you.
+The _VARIABLE_ should contain a metric [observation](../glossary.html#observation) at a particular time. We recommend that you try to reuse existing statistical variables where feasible; use the main Data Commons [Statistical Variable Explorer](https://datacommons.org/tools/statvar) to find them. If there is no existing statistical variable you can use, name the heading with an illustrative name and the importer will create a new variable for you.

The variable values must be numeric. Zeros and null values are accepted: zeros will be recorded and null values ignored.

All headers must be in camelCase.

### Special place names {#special-names}

-In addition to the place names listed in [Place types](../place_types.md), you can also use the following special names:
+In addition to the place names listed in [Place types](../place_types.html), you can also use the following special names:

-* [`dcid`](../glossary.md#dcid) --- An already resolved DC ID. Examples:`country/USA`, `geoId/06`
+* [`dcid`](../glossary.html#dcid) --- An already resolved DC ID. Examples:`country/USA`, `geoId/06`
* `country3AlphaCode` --- Three-character country codes. Examples: `USA`, `CHN`
* `geoId` --- Place geo IDs. Examples: `06`, `023`
* `lat#lng` --- Latitude and longitude of the place using the format _lat_#_long_. Example: `38.7#-119.4`
@@ -154,7 +154,7 @@ The first set of parameters only applies to `foo.csv`. The second set of paramet

`entityType`

-: Required: All entities in a given file must be of a specific type. This type should be specified as the value of the <code>entityType</code> field. The importer tries to resolve entities to DCIDs of that type. In most cases, the <code>entityType</code> will be a supported place type; see [Place types](../place_types.md) for a list.
+: Required: All entities in a given file must be of a specific type. This type should be specified as the value of the <code>entityType</code> field. The importer tries to resolve entities to DCIDs of that type. In most cases, the <code>entityType</code> will be a supported place type; see [Place types](../place_types.html) for a list.

`ignoreColumns`

@@ -184,7 +184,7 @@ The name should be concise and precise; that is, the shortest possible name that

`properties`

-: Additional Data Commons properties associated with this variable. These are Data Commons property entities. See [Representing statistics in Data Commons](../data/blob/master/docs/representing_statistics.md#statisticalvariable) for more details.
+: Additional Data Commons properties associated with this variable. These are Data Commons property entities. See [Representing statistics in Data Commons](https://github.com/datacommonsorg/data/blob/master/docs/representing_statistics.md) for more details.

Each property is specified as a key:value pair. Here are some examples:
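
The examples themselves are collapsed in this diff. As a hedged sketch (the property names below come from the statistics guide linked above; the values are hypothetical), the key:value pairs might look like:

```json
"properties": {
  "populationType": "Person",
  "measuredProperty": "count",
  "statType": "measuredValue"
}
```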

@@ -230,8 +230,8 @@ The `sources` section is optional. It encodes the sources and provenances associ

## Load local custom data

-To load custom data uploaded to Google Cloud, see instead [Pointing the local Data Commons site to the Cloud data](testing_cloud.md) for procedures.
-To load custom data uploaded to Google Cloud, see instead [Pointing the local Data Commons site to the Cloud data](/custom_dc/testing_cloud.html) for procedures.
+To load custom data uploaded to Google Cloud, see instead [Pointing the local Data Commons site to the Cloud data](/custom_dc/data_cloud.html) for procedures.

### Start the Docker container with local custom data {#docker-data}

Once you have your CSV files and config.json set up, use the following command to restart the Docker container, mapping your custom data directory to the Docker userdata directory.
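
The command itself is collapsed in this diff; a representative invocation (the host data path and port are assumptions, while the image name appears elsewhere in these docs) follows this pattern:

```shell
# Map the host directory holding your CSVs and config.json to the
# container's /userdata directory
docker run -it \
  -p 8080:8080 \
  -e MAPS_API_KEY=$MAPS_API_KEY \
  -v /path/to/custom/data:/userdata \
  gcr.io/datcom-ci/datacommons-website-compose:stable
```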
2 changes: 1 addition & 1 deletion custom_dc/custom_ui.md
@@ -76,4 +76,4 @@ Alternatively, if you have existing existing CSS and Javascript files, put them

See [`server/templates/custom_dc/custom/new.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/new.html) as an example.

-Note: Currently, making changes to any of the files in the `static/` directory requires that you rebuild a local version of the repo to pick up the changes, as described in [Build and run a local repo](build_repo.md). We plan to fix this in the near future.
+Note: Currently, making changes to any of the files in the `static/` directory requires that you rebuild a local version of the repo to pick up the changes, as described in [Build and run a local repo](/custom_dc/build_repo.html). We plan to fix this in the near future.
3 changes: 2 additions & 1 deletion custom_dc/data_cloud.md
@@ -24,6 +24,7 @@ You will upload your CSV and JSON files to [Google Cloud Storage](https://cloud.

While you are testing, you can start with a single Google Cloud region; to be close to the main Data Commons data, you can use `us-central1`. However, once you launch, you may want to host your data and application closer to where your users will be. In any case, you should use the _same region_ for your Google Cloud SQL instance, the Google Cloud Storage buckets, and the [Google Cloud Run service](deploy_cloud.md) where you will host the site. For a list of supported regions, see Cloud SQL [Manage instance locations](https://cloud.google.com/sql/docs/mysql/locations).


### Create a Google Cloud SQL instance

1. Go to [https://console.cloud.google.com/sql/instances](https://console.cloud.google.com/sql/instances) for your project.
@@ -120,7 +121,7 @@ gcr.io/datcom-ci/datacommons-website-compose:stable

#### Run with a locally built repo

-If you have made local changes and have a [locally built repo](build_repo.md), from the `website` root of the repository, run the following:
+If you have made local changes and have a [locally built repo](custom_dc/build_repo.html), from the `website` root of the repository, run the following:

<pre>
docker run -it \
4 changes: 2 additions & 2 deletions custom_dc/deploy_cloud.md
@@ -27,7 +27,7 @@ You push a locally built Docker image to the [Google Cloud Artifact Registry](ht

This procedure creates a "dev" Docker package that you upload to the Google Cloud Artifact Registry, and then deploy to Google Cloud Run.

-1. Build a local version of the Docker image, following the procedure in [Build the local repo](build_repo.md).
+1. Build a local version of the Docker image, following the procedure in [Build the local repo](/custom_dc/build_repo.html).
1. Authenticate to gcloud:

```shell
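# This block's contents are collapsed in the diff; the authentication step
# is typically the following command (an assumption, not taken from this commit):
gcloud auth login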
@@ -106,5 +106,5 @@ See also [Deploying to Cloud Run](https://cloud.google.com/run/docs/deploying) f
Once you have deployed a custom instance to Google Cloud, you can continue to update your custom data in two ways:
-- Load the data from a local running instance, as described in [Load custom data in Cloud SQL](data_cloud.md#load-data-cloudsql)
+- Load the data from a local running instance, as described in [Load custom data in Cloud SQL](/custom_dc/data_cloud.html#load-data-cloudsql)
- Use the `/admin` page from the running Cloud app.
14 changes: 7 additions & 7 deletions custom_dc/index.md
@@ -41,8 +41,8 @@ Also, if you want to add all of your data to the main Data Commons and test how

For the following use cases, a custom Data Commons instance is not necessary:

-- You only want to make your own data available to the main public Data Commons site and don't need to test it. In this case, see the procedures in [Data imports](../import_data/index.md).
-- You want to make the base public data or visualizations available in your own site. For this purpose, you can call the Data Commons APIs from your site; see [Data Commons web components](../api/web_components.md) for more details.
+- You only want to make your own data available to the main public Data Commons site and don't need to test it. In this case, see the procedures in [Data imports](/import_dataset/index.html).
+- You want to make the base public data or visualizations available in your own site. For this purpose, you can call the Data Commons APIs from your site; see [Data Commons web components](/api/web_components/index.html) for more details.

## Supported features

@@ -89,10 +89,10 @@ The cost of running a site on Google Cloud Platform depends on the size of your

## Recommended workflow

-1. Work through the [Quickstart](quickstart.md) page to learn how to run a local Data Commons instance and load some sample custom data.
-1. Prepare your real-world custom data and load it in the local custom instance. Data Commons requires your data to be in a specific format. See [Work with custom data](custom_data.md). If you are just testing custom data to add to the main Data Commons site, this is all you need to do.
-1. If you are launching your own Data Commons site, and want to customize the look of the feel of the site, see [Customize the site](custom_ui.md).
-1. If you are launching your own Data Commons site, upload your data to Google Cloud Platform and continue to use the local instance to test and validate the site. We recommend using Google Cloud Storage to store your data, and Google Cloud SQL to receive SQL queries from the local servers. See [Test data in Google Cloud](data_cloud.md).
-1. When you are satisfied that everything is working correctly, and are getting closer to launch, upload your custom site to Google Cloud Run and continue to test in the Cloud. See [Deploy the custom instance to Google Cloud](deploy_cloud.md)
+1. Work through the [Quickstart](/custom_dc/quickstart.html) page to learn how to run a local Data Commons instance and load some sample custom data.
+1. Prepare your real-world custom data and load it in the local custom instance. Data Commons requires your data to be in a specific format. See [Work with custom data](/custom_dc/custom_data.html). If you are just testing custom data to add to the main Data Commons site, this is all you need to do.
+1. If you are launching your own Data Commons site, and want to customize the look of the feel of the site, see [Customize the site](/custom_dc/custom_ui.html).
+1. If you are launching your own Data Commons site, upload your data to Google Cloud Platform and continue to use the local instance to test and validate the site. We recommend using Google Cloud Storage to store your data, and Google Cloud SQL to receive SQL queries from the local servers. See [Test data in Google Cloud](/custom_dc/data_cloud.html).
+1. When you are satisfied that everything is working correctly, and are getting closer to launch, upload your custom site to Google Cloud Run and continue to test in the Cloud. See [Deploy the custom instance to Google Cloud](/custom_dc/deploy_cloud.html)
1. To launch your site to real traffic, configure your Cloud service to serve external traffic. Consult [Mapping custom domains](https://cloud.google.com/run/docs/mapping-custom-domains) and related Google Cloud Run documentation for complete details on configuring domains and traffic.
1. For future updates and launches, continue to make UI and data changes locally and upload the data to Cloud Storage, before deploying the changes to Cloud Run.
8 changes: 5 additions & 3 deletions custom_dc/troubleshooting.md
@@ -38,15 +38,16 @@ Step 7/62 : COPY mixer/go.mod mixer/go.sum ./
COPY failed: file not found in build context or excluded by .dockerignore: stat mixer/go.mod: file does not exist
```

-You need to download additional git modules. See [One-time setup: download build dependencies](build_repo.md#download_deps).
+You need to download additional git modules. See [One-time setup: download build dependencies](/custom_dc/build_repo.html#download_deps).
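
If the linked section isn't at hand: the missing files live in git submodules, so a typical fix (a general git sketch, not a command taken from this commit) is:

```shell
# Fetch the mixer submodule (and any others) that the Docker build expects
git submodule update --init --recursive
```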

### Data loading problems

If you try to load data using the `/admin` page and see the following errors:

`Error running import` or `invalid input`

-There is a problem with how you have set up your CSV files and/or config.json file. Check that your CSV files conform to the structure described in [Prepare the CSV files](custom_data.md#prepare-csv).
+There is a problem with how you have set up your CSV files and/or config.json file. Check that your CSV files conform to the structure described in [Prepare the CSV files](/custom_dc/custom_data.html#prepare-csv).


If the load page does not show any errors but data still does not load, try checking the following:

@@ -63,7 +64,8 @@ If you try to enter input into any of the explorer tools fields, and you get thi

![screenshot_troubleshoot](/assets/images/custom_dc/customdc_screenshot7.png){: width="800"}

-This is because you are missing a valid API key or the necessary APIs are not enabled. Follow procedures in [Enable Google Cloud APIs and get a Maps API key](quickstart.md#maps-key), and be sure to obtain a permanent Maps/Places API key.
+This is because you are missing a valid API key or the necessary APIs are not enabled. Follow procedures in [Enable Google Cloud APIs and get a Maps API key](/custom_dc/quickstart.html#maps-key), and be sure to obtain a permanent Maps/Places API key.


### Cloud Run Service problems

Expand Down

