From 685344378f59e136b746c771c0ebd0e48a1a9697 Mon Sep 17 00:00:00 2001
From: Bo Xu
Date: Tue, 21 Nov 2023 17:58:50 +0000
Subject: [PATCH] Update documentation site to refer to the RSI approach for
 custom DC (#378)

---
 custom_dc/customize_ui.md | 104 ------------------
 custom_dc/index.md        |  13 ++-
 custom_dc/prepare_data.md | 122 ---------------------
 custom_dc/setup_gcp.md    |  94 ----------------
 custom_dc/upload_data.md  | 218 --------------------------------------
 5 files changed, 10 insertions(+), 541 deletions(-)
 delete mode 100644 custom_dc/customize_ui.md
 delete mode 100644 custom_dc/prepare_data.md
 delete mode 100644 custom_dc/setup_gcp.md
 delete mode 100644 custom_dc/upload_data.md

diff --git a/custom_dc/customize_ui.md b/custom_dc/customize_ui.md
deleted file mode 100644
index ef3226a13..000000000
--- a/custom_dc/customize_ui.md
+++ /dev/null
@@ -1,104 +0,0 @@
----
-layout: default
-title: Customize UI
-nav_order: 4
-parent: Custom Data Commons
-published: true
----
-
-## Overview
-
-Custom Data Commons allows customization of the web pages on top of
-[datacommons.org](https://datacommons.org). The customization includes the
-overall color scheme, the home page content, and the landing pages of the
-timeline/scatter/map tools.
-
-## Environment Setup
-
-Fork the [datacommonsorg/website](https://github.com/datacommonsorg/website)
-GitHub repo following [these
-instructions](https://github.com/datacommonsorg/website#github-workflow) into a
-new repo, which will be used as the custom Data Commons codebase. Custom Data
-Commons development and deployment will be based on this forked repo.
-
-To run the website in a local environment (Mac, Linux), follow this
-[guide](https://github.com/datacommonsorg/website/blob/master/docs/developer_guide.md#local-development-with-flask).
-Use the `-e custom` flag when starting the local Flask server:
-
-```bash
-./run_server.sh -e custom
-```
-
-## Update UI Code
-
-### Update Header, Footer and Page Content
-
-The page header and footer can be customized in
-[base.html](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/base.html)
-by updating the corresponding HTML elements in that template.
-
-The homepage can be customized in
-[homepage.html](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/homepage.html).
-
-### Update CSS and Javascript
-
-Custom Data Commons provides an
-[overrides.css](https://github.com/datacommonsorg/website/tree/master/static/custom_dc/custom/overrides.css)
-to override CSS styles. It contains a default color override; more style
-changes can be added in that file.
-
-If there are already existing CSS and Javascript files, put them under the
-[/static/custom_dc/custom](https://github.com/datacommonsorg/website/tree/master/static/custom_dc/custom)
-folder. Then include these files in the `<head>` section of the corresponding
-html files as
-
-```html
-<link href="/custom_dc/custom/<file_name>.css" rel="stylesheet" />
-```
-
-or
-
-```html
-<script src="/custom_dc/custom/<file_name>.js"></script>
-```
-
-## Deploy to GCP
-
-### One Time Setup
-
-- Install the following tools:
-
-  - [`gcloud`](https://cloud.google.com/sdk/docs/install)
-  - [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
-  - [`kustomize`](https://kustomize.io/)
-  - [`yq` 4.x](https://github.com/mikefarah/yq#install)
-
-- Install gke-gcloud-auth-plugin:
-
-  - `gcloud components install gke-gcloud-auth-plugin`
-
-### Deploy Local Change
-
-After testing locally, follow the instructions below to deploy to GCP.
-`project_id` refers to the GCP project where custom Data Commons is installed.
-
-- Git commit all local changes (no need to push to the GitHub repo). Later
-  steps will build a docker image based on the hash of this commit.
-
-- Run the following command to build and push docker images to the Container
-  Registry:
-
-  ```bash
-  ./scripts/push_image.sh
-  ```
-
-  Follow the link from the log to check the status until the push is complete.
-
-- Deploy the website to GKE:
-
-  ```bash
-  ./scripts/deploy_gke.sh -p <project_id>
-  ```
-
-  Check the deployment from the GKE console. Once it is done, check the UI
-  changes on the website.
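[Editorial note on the deleted customize_ui.md content above] Taken together, the local-change deploy steps reduce to a short command sequence. A hedged sketch, using only the `push_image.sh` and `deploy_gke.sh` scripts named in the doc, with `<project_id>` as a placeholder for your GCP project:

```shell
# Sketch of the deploy flow described above; requires an authenticated
# gcloud session and a checkout of the forked website repo.
git commit -am "Customize UI"              # the docker image tag derives from this commit hash
./scripts/push_image.sh                    # build and push docker images to the Container Registry
./scripts/deploy_gke.sh -p <project_id>   # roll the new images out to GKE
```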
diff --git a/custom_dc/index.md b/custom_dc/index.md
index ad9b8bca5..f1fc7282c 100644
--- a/custom_dc/index.md
+++ b/custom_dc/index.md
@@ -2,7 +2,7 @@
 layout: default
 title: Custom Data Commons
 nav_order: 90
-has_children: true
+has_children: false
 ---
 
 ## Overview
@@ -21,9 +21,16 @@ full control over data, computing resources and access control. It can be
 accessible by the general public or can be access controlled to limited
 principals.
 
+## System Setup and Custom Data Import
+
+See this
+[documentation](https://github.com/datacommonsorg/website/blob/master/custom_dc/README.md),
+which covers the system diagram, data storage options, deployment instructions,
+private data preparation and UI customization.
+
 ## Case Study
 
-#### Feeding America Data Commons
+### Feeding America Data Commons
 [Feeding America Data Commons](https://datacommons.feedingamerica.org/) provides
 access to data from [Map the Meal Gap](https://map.feedingamerica.org/),
 overlaid with data from a wide range of additional sources into a single
@@ -32,7 +39,7 @@ heart health and food insecurity can be retrieved with a few clicks.
 
 ![fa](/assets/images/custom_dc/home-heart-food.png){: height="450" }
 
-#### India Data Commons
+### India Data Commons
 [India Data Commons](https://datacommons.iitm.ac.in/) is an effort by Robert
 Bosch Center for Data Science and Artificial Intelligence, IIT Madras to
 highlight India-specific data in Data
diff --git a/custom_dc/prepare_data.md b/custom_dc/prepare_data.md
deleted file mode 100644
index 1a051554c..000000000
--- a/custom_dc/prepare_data.md
+++ /dev/null
@@ -1,122 +0,0 @@
----
-layout: default
-title: Prepare Data
-nav_order: 2
-parent: Custom Data Commons
-published: true
----
-
-## Overview
-
-Preparing data involves cleaning and formatting the raw data into compatible
-CSV files. Each CSV file is expected to have columns corresponding to the
-Values (numeric) about a Variable, Place and Date.
The format of a CSV file is
-specified by a [Template
-MCF](https://github.com/datacommonsorg/data/blob/master/docs/mcf_format.md#template-mcf).
-The ready-to-use artifacts consist of one TMCF file (.tmcf) and a few
-compatible CSV files (.csv).
-
-## File Format
-
-### General Format
-
-In the table shown below, there are separate columns for Variable (Variable),
-Place (Country), Date (Year) and Value (Value), and each row of the CSV
-corresponds to one observation of the Variable about a Place at the specified
-Date.
-
-| Year | Country | Variable        | Value       | Extra Column [Optional] |
-| ---- | ------- | --------------- | ----------- | ----------------------- |
-| 2017 | UK      | Life_Expectancy | 81.25609756 | 1                       |
-| 2017 | UK      | Population      | 65844142    | 2                       |
-
-The TMCF for this CSV looks like:
-
-```txt
-Node: E:data->E0
-typeOf: dcs:StatVarObservation
-observationAbout: C:data->Country
-observationDate: C:data->Year
-variableMeasured: C:data->Variable
-value: C:data->Value
-```
-
-Note: If all observations in the CSV are about the same Date, the Date does not
-need to be specified as a column; it can be given as a constant. The same
-applies to Variable and Place. For the example above, if the CSV has data only
-for 2017, then the CSV and TMCF look like:
-
-| Country | Variable        | Value    | Extra Column [Optional] |
-| ------- | --------------- | -------- | ----------------------- |
-| UK      | Life_Expectancy | 81.2     | 1                       |
-| UK      | Population      | 65844142 | 2                       |
-
-```txt
-Node: E:data->E0
-typeOf: dcs:StatVarObservation
-observationAbout: C:data->Country
-observationDate: 2017
-variableMeasured: C:data->Variable
-value: C:data->Value
-```
-
-### Date as Column Header
-
-It is possible to specify Dates as column headers.
-
-| Country | Variable        | 2017     | 2018     |
-| ------- | --------------- | -------- | -------- |
-| UK      | Life_Expectancy | 81.2     | 81.3     |
-| KR      | Population      | 51361911 | 51606633 |
-
-```txt
-Node: E:data->E0
-typeOf: dcs:StatVarObservation
-observationAbout: C:data->Country
-observationDate: 2017
-variableMeasured: C:data->Variable
-value: C:data->2017
-
-Node: E:data->E1
-typeOf: dcs:StatVarObservation
-observationAbout: C:data->Country
-observationDate: 2018
-variableMeasured: C:data->Variable
-value: C:data->2018
-```
-
-### Variable as Column Header
-
-It is possible to specify Variables as column headers.
-
-| Year | Country | Life_Expectancy | Population |
-| ---- | ------- | --------------- | ---------- |
-| 2017 | UK      | 81.2            | 65844142   |
-| 2018 | KR      | 82              | 51361911   |
-
-```txt
-Node: E:data->E0
-typeOf: dcs:StatVarObservation
-observationAbout: C:data->Country
-observationDate: C:data->Year
-variableMeasured: Life_Expectancy
-value: C:data->Life_Expectancy
-
-Node: E:data->E1
-typeOf: dcs:StatVarObservation
-observationAbout: C:data->Country
-observationDate: C:data->Year
-variableMeasured: Population
-value: C:data->Population
-```
-
-### Date and Place Formats
-
-Please check the [Supported Date and Place
-Formats](https://datacommons.org/import/#supported-formats).
-
-## Testing Data
-
-Before uploading the data to the custom instance, run the [Import
-Checker](https://github.com/datacommonsorg/import#using-import-tool) to make
-sure there are no formatting or other issues.
diff --git a/custom_dc/setup_gcp.md b/custom_dc/setup_gcp.md
deleted file mode 100644
index 842c34b81..000000000
--- a/custom_dc/setup_gcp.md
+++ /dev/null
@@ -1,94 +0,0 @@
----
-layout: default
-title: System Setup
-nav_order: 1
-parent: Custom Data Commons
-published: true
----
-
-## Overview
-
-Custom Data Commons is deployed on Google Cloud Platform (GCP). This manual
-describes how to install a custom Data Commons instance in an existing GCP
-project (with id `PROJECT_ID`).
-
-### Steps
-
-1. From [Google Cloud Console](https://console.cloud.google.com/), open Cloud
-   Shell by clicking the icon shown below:
-
-   ![fa](/assets/images/custom_dc/install_step_1.png){: width="600" }
-
-1. Set the environment variables `PROJECT_ID` and `CONTACT_EMAIL` in the
-   terminal:
-
-   ```bash
-   export PROJECT_ID=<project_id>
-   export CONTACT_EMAIL=<contact_email>
-   ```
-
-   ![fa](/assets/images/custom_dc/install_step_2.png){: width="600" }
-
-   Note: If this step fails, please [contact us via this form](https://docs.google.com/forms/d/e/1FAIpQLSeVCR95YOZ56ABsPwdH1tPAjjIeVDtisLF-8oDYlOxYmNZ7LQ/viewform) with the errors.
-
-1. [Optional] The default domain of the instance is
-   `<project_id>-datacommons.com`. If you want to use an existing custom
-   domain, set the environment variable:
-
-   ```bash
-   export CUSTOM_DC_DOMAIN=<custom_domain>
-   ```
-
-   Later, you will need to create a DNS record with your domain provider that
-   links the domain to the IP address allocated in the GCP project.
-
-1. Run the following installation command in the terminal. This may take up to
-   20 minutes to complete.
-
-   ```bash
-   curl -fsSL https://raw.githubusercontent.com/datacommonsorg/website/custom-dc-v0.3.2/scripts/install_custom_dc.sh -o install_custom_dc.sh && \
-   chmod u+x install_custom_dc.sh && \
-   ./install_custom_dc.sh
-   ```
-
-1. Please [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSeVCR95YOZ56ABsPwdH1tPAjjIeVDtisLF-8oDYlOxYmNZ7LQ/viewform) to get an API key
-   for data access. Store the API key in [Cloud Secret Manager](https://console.cloud.google.com/security/secret-manager) with the name `mixer-api-key`.
-
-1. [Optional] Get a Google Maps API key
-   ([instructions](https://developers.google.com/maps/documentation/javascript/get-api-key)).
-   Store the API key in [Cloud Secret
-   Manager](https://console.cloud.google.com/security/secret-manager) with the
-   name `maps-api-key`. This is used for place search in the visualization
-   tools.
-
-1.
You should receive an email from Google Domains containing the section
-   pictured below. Please click "Verify email now".
-
-   ![fa](/assets/images/custom_dc/install_step_3.png){: width="400" }
-
-   Note: You may not get the verification email if you have verified Cloud
-   Domains or Google Domains in the past. If you do not get the verification
-   email within 10 minutes, check the GCP UI to see if the Cloud Domain is
-   active. If it is active, then please skip this step. Below is what an
-   active Cloud Domain looks like.
-
-   ![fa](/assets/images/custom_dc/install_step_4.png){: width="400" }
-
-1. Deploy a default Data Commons instance:
-
-   First, clone the GitHub repo:
-
-   ```bash
-   git clone https://github.com/datacommonsorg/website.git
-   ```
-
-   Then update the `project` field in
-   [custom_dc_template.yaml](https://github.com/datacommonsorg/website/blob/master/deploy/helm_charts/envs/custom_dc_template.yaml)
-   with the actual GCP project ID.
-
-   Deploy to GKE by running:
-
-   ```bash
-   ./scripts/deploy_gke_helm.sh -e custom_dc_template -l us-central1-a
-   ```
-
-   Go to the [GCP console](https://console.cloud.google.com/kubernetes/workload/overview) to make sure the pods are running successfully.
diff --git a/custom_dc/upload_data.md b/custom_dc/upload_data.md
deleted file mode 100644
index 755747d96..000000000
--- a/custom_dc/upload_data.md
+++ /dev/null
@@ -1,218 +0,0 @@
----
-layout: default
-title: Upload Data
-nav_order: 3
-parent: Custom Data Commons
-published: true
----
-
-## Overview
-
-Schema files (MCF), data files (CSV) and data specification files (TMCF) are
-stored in Google Cloud Storage (GCS) in the custom Data Commons GCP project.
-These files must follow the expected layout so the data can be processed and
-displayed correctly. A few terms are worth understanding before looking at
-the layout.
-
-### Data Source
-
-Data source refers to a data agency such as "Census" or "World Bank".
-
-### Dataset
-
-Dataset does not have a standard definition.
The granularity of a
-dataset varies depending on the source. For example, one dataset can contain
-public parks information for all the states in the USA if they are published
-together. Or, if each state publishes this information individually, then
-there are multiple datasets for this topic.
-
-### Import
-
-Import is the smallest unit of data upload in Data Commons. It usually (but
-not necessarily) corresponds to a dataset.
-
-### Import Group
-
-A group of related imports that have similar topics. This is also the unit of
-raw data processing.
-
-### Table
-
-Table corresponds to one TMCF file and a set of CSV files that have the same
-shape. One import can have one or multiple tables.
-
-## Example Layout
-
-Consider the following two datasets:
-
-1. State-level public park general information in 50 CSV files (collected by
-   each state in a different format, with a total size of 5 GB).
-2. State-level public park expenditure with one CSV file per year (collected
-   by an agency, with a total size of 5 MB).
-
-They can be arranged in multiple ways.
-
-### Single Import Group
-
-Since these data are all about public parks, they can be put under one import
-group, with two imports:
-
-- general info import
-
-  - With one schema file describing public park properties
-  - With 50 tables, one for each state
-  - Each table has one TMCF and one CSV file
-
-- expenditure import
-
-  - With one schema file describing expenditure
-  - With one table containing one TMCF and multiple CSV files
-
-### Multiple Import Groups
-
-If the two datasets are managed by different departments and are updated at
-different frequencies, they can each be an import group. This way, when the
-expenditure data is updated, only its data is processed and the larger general
-information import is untouched.
-
-## Storage Layout
-
-All custom Data Commons data are stored under one GCS folder. A typical layout
-is shown below.
-
-Note: create a root folder under the desired GCS bucket; it will be used to
-hold all the data.
-
-```txt
-<root_folder>
-├── import_group1/
-│   ├── data/
-│   │   ├── import1/
-│   │   │   ├── table1/
-│   │   │   │   ├── bar.tmcf
-│   │   │   │   ├── bar1.csv
-│   │   │   │   └── bar2.csv
-│   │   │   ├── table2/
-│   │   │   │   ├── foo.tmcf
-│   │   │   │   └── foo.csv
-│   │   │   ├── schema.mcf
-│   │   │   └── provenance.json
-│   │   └── import2/
-│   │       ├── table1/
-│   │       │   ├── baz.tmcf
-│   │       │   └── baz.csv
-│   │       ├── schema.mcf
-│   │       └── provenance.json
-│   ├── internal/
-│   └── provenance.json
-└── import_group2/
-    ├── data/
-    └── internal/
-```
-
-Raw data should be uploaded under
-`<root_folder>/<import_group>/data/<import>/`. Each `table` folder can contain
-only one TMCF file, and all the CSV files in it must have the same format.
-
-Note: the `internal/` folder holds computed data and config files and should
-not be touched.
-
-The data source and other meta info can be specified in a `provenance.json`
-file with the following fields:
-
-```json
-{
-  "name": "Name of the source (dataset)",
-  "url": "Url of the source (dataset)"
-}
-```
-
-`provenance.json` can be at the import group level or the import level,
-usually indicating data source and dataset provenance respectively.
-
-## Add a Custom Variable Hierarchy
-
-When using a custom DC instance with new statistical variables, it can be
-useful to define a custom hierarchy for the variables. The hierarchy is used
-in the Explorer tools to navigate the variables in a structured manner. For
-example, a sample custom hierarchy with two layers of groups and three
-variables is provided below.
-
-```txt
-.
-└── Example Root Node/
-    ├── Group 1A/
-    │   ├── Variable X
-    │   └── Group 2A/
-    │       └── Variable Y
-    └── Group 1B/
-        └── Variable Z
-```
-
-To auto-generate the nodes in a custom hierarchy from a spec like the above,
-use [this
-notebook](https://colab.sandbox.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Custom_Hierarchy_Generator.ipynb).
-Alternatively, to hand-write the MCF nodes involved, please read on.
-
-To define the hierarchy, each group needs a `StatVarGroup` definition.
The
-`StatVarGroup` nodes are linked to each other and to a custom root node via
-`specializationOf` properties. The example below can be used as a template;
-please replace all {} with custom identifiers:
-
-    Node: dcid:dc/g/Custom_Root
-    typeOf: dcs:StatVarGroup
-    specializationOf: dcid:dc/g/Root
-    name: "{Example Root Node}"
-
-    Node: dcid:dc/g/Custom_{1A}
-    typeOf: dcs:StatVarGroup
-    name: "{Group 1A}"
-    specializationOf: dcid:dc/g/Custom_Root
-    displayRank: {1}
-
-    Node: dcid:dc/g/Custom_{1B}
-    typeOf: dcs:StatVarGroup
-    name: "{Group 1B}"
-    specializationOf: dcid:dc/g/Custom_Root
-    displayRank: {2}
-
-    Node: dcid:dc/g/Custom_{2A}
-    typeOf: dcs:StatVarGroup
-    name: "{Group 2A}"
-    specializationOf: dcid:dc/g/Custom_{1A}
-    displayRank: {1}
-
-Next, each new variable needs a `StatisticalVariable` node definition, which
-specifies which group in the hierarchy it belongs to. The example below can be
-used as a template; please replace all {} with custom identifiers:
-
-    Node: dcid:{Variable_X}
-    name: "{Variable X}"
-    typeOf: dcs:StatisticalVariable
-    populationType: dcs:Thing
-    measuredProperty: dcs:{Variable_X}
-    statType: dcs:measuredValue
-    memberOf: dcid:dc/g/Custom_{1A}
-
-    Node: dcid:{Variable_Y}
-    name: "{Variable Y}"
-    typeOf: dcs:StatisticalVariable
-    populationType: dcs:Thing
-    measuredProperty: dcs:{Variable_Y}
-    statType: dcs:measuredValue
-    memberOf: dcid:dc/g/Custom_{2A}
-
-    Node: dcid:{Variable_Z}
-    name: "{Variable Z}"
-    typeOf: dcs:StatisticalVariable
-    populationType: dcs:Thing
-    measuredProperty: dcs:{Variable_Z}
-    statType: dcs:measuredValue
-    memberOf: dcid:dc/g/Custom_{1B}
-
-The `StatVarGroup` and `StatisticalVariable` nodes that make up the hierarchy
-can be included in a `.mcf` file and added to the GCS bucket associated with
-the custom DC instance.
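[Editorial note on the deleted upload_data.md content above] For illustration, template nodes like the ones above can also be generated with a small shell function rather than written by hand. A minimal sketch (a hypothetical helper, not part of the official Data Commons tooling) that renders `StatVarGroup` nodes in MCF:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit a StatVarGroup MCF node from arguments.
# Not part of the official Data Commons tooling.
stat_var_group() {  # args: dcid_suffix name parent_dcid display_rank
  printf 'Node: dcid:dc/g/Custom_%s\n' "$1"
  printf 'typeOf: dcs:StatVarGroup\n'
  printf 'name: "%s"\n' "$2"
  printf 'specializationOf: dcid:%s\n' "$3"
  printf 'displayRank: %s\n\n' "$4"
}

# Write two of the example groups to an .mcf file.
stat_var_group "1A" "Group 1A" "dc/g/Custom_Root" 1 >  custom_hierarchy.mcf
stat_var_group "2A" "Group 2A" "dc/g/Custom_1A"   1 >> custom_hierarchy.mcf
```

The resulting `custom_hierarchy.mcf` can then be uploaded to the GCS bucket as described above; a `statistical_variable` helper could be written the same way.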