Built site for gh-pages

Coleridge-Initiative · Oct 25, 2024 · ec7597e
1 parent 427d87d

Showing 4 changed files with 5 additions and 5 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-3bfe7e5c
+fc187fba
diff --git a/appendix.html b/appendix.html
@@ -425,8 +425,8 @@ <h3 class="anchored" data-anchor-id="r-connection">R Connection</h3>
 <h3 class="anchored" data-anchor-id="python-connection">Python Connection</h3>
 <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pyodbc</span>
 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
-<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>cnxn <span class="op">=</span> pyodbc.<span class="ex">connect</span>(<span class="st">'DSN=Redshift01_projects_DSN; UID = adrf</span><span class="ch">\\</span><span class="st">user.name.project; PWD = password'</span>)</span>
-<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>df <span class="op">=</span> pd.read_sql(“SELECT <span class="op">*</span> FROM projects.schema_name.table_name”, cnxn)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>cnxn <span class="op">=</span> pyodbc.<span class="ex">connect</span>(<span class="st">'DSN=Redshift01_projects_DSN; UID=adrf\user.name.project; PWD=password'</span>)</span>
+<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>df <span class="op">=</span> pd.read_sql(<span class="st">"SELECT * FROM projects.schema_name.table_name"</span>, cnxn)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </section>
 <section id="stata-connection" class="level3">
 <h3 class="anchored" data-anchor-id="stata-connection">Stata Connection</h3>
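
Note on the Python Connection hunk above: as displayed, the updated snippet writes the username as `adrf\user.name.project` inside an ordinary string literal. In Python 3, `\u` starts a Unicode escape, so that literal does not compile when copied as shown. The sketch below is one hedged way to write the same connection, not the handbook's own code. The helper names `build_connection_string` and `read_table` are hypothetical, and reading the password from a `DBPASSWD` environment variable is an assumption borrowed from the R section's `.Renviron` convention.

```python
import os


def build_connection_string(dsn: str, user: str, password: str) -> str:
    """Assemble an ODBC connection string from its parts (hypothetical helper)."""
    return f"DSN={dsn};UID={user};PWD={password}"


def read_table(query: str, conn_str: str):
    """Run a query and return a DataFrame; requires pyodbc and pandas."""
    # Imported lazily so the string-building code above runs without the drivers.
    import pyodbc
    import pandas as pd

    cnxn = pyodbc.connect(conn_str)
    try:
        return pd.read_sql(query, cnxn)
    finally:
        cnxn.close()


# Raw string keeps the backslash literal; a plain 'adrf\user...' literal
# fails in Python 3 because "\u" begins a Unicode escape sequence.
USER = r"adrf\user.name.project"

conn_str = build_connection_string(
    "Redshift01_projects_DSN",
    USER,
    os.environ.get("DBPASSWD", "password"),  # assumed variable name
)

# Example call (needs a configured DSN, so it is left commented out):
# df = read_table("SELECT * FROM projects.schema_name.table_name", conn_str)
```

Writing the username as `'adrf\\user.name.project'`, as the removed line did, is equivalent to the raw string used here.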

diff --git a/index.html b/index.html
@@ -237,7 +237,7 @@ <h1 class="title">ADRF Onboarding Handbook</h1>
     <div>
     <div class="quarto-title-meta-heading">Published</div>
     <div class="quarto-title-meta-contents">
-      <p class="date">Last Updated on 22 October, 2024</p>
+      <p class="date">Last Updated on 25 October, 2024</p>
     </div>
   </div>
 

diff --git a/search.json b/search.json
@@ -404,7 +404,7 @@
     "href": "appendix.html#data-access",
     "title": "12  Redshift querying guide",
     "section": "Data Access",
-    "text": "Data Access\nThe data is housed in Redshift. You need to replace the “user.name.project” with your project based username. The project based username is your user folder name in the U:/ drive:\n\n\n\n\n\n\nNote: Your username will be different than in these examples.\n\nThe password needed to access Redshift is the second password entered when logging into the ADRF as shown in the screen below:\n\n\n\n\n\nAll data is stored under schemas in the projects database and are accessible by the following programs:\n\nDBeaver\nTo establish a connection to Redshift in DBeaver, first double click on the server you wish to connect to. In the example below I’m connecting to Redshift11_projects. Then a window will appear asking for your Username and Password. This will be your user folder name and include adrf\\ before the username. Then click OK. You will now have access to your data stored on the Redshift11_projects server.\n\n\n\n\n\nCreating Tables in PR/TR Schema\nWhen users create tables in their PR (Research Project) or TR (Training Project) schema, the table is initially permissioned to the user only. This is analogous to creating a document or file in your U drive: Only you have access to the newly created table.\nIf you want to allow all individuals in your project workspace to access the table in the PR/TR schema, you will need to grant permission to the table to the rest of the users who have access to the PR or TR schema.\nYou can do this by running the following code:\nGRANT SELECT, UPDATE, DELETE, INSERT ON TABLE schema_name.table_name TO group db_xxxxxx_rw;\n\nNote: In the above code example replace schma_name with the pr_ or tr_ schema assigned to your workspace and replace table_name with the name of the table on which you want to grant access. Also, in the group name db_xxxxxx_rw, replace xxxxxx with your project code. This is the last 6 characters in your project based user name. 
This will start with either a T or a P.\n\nIf you want to allow only a single user on your project to access the table, you will need to grant permission to that user. You can do this by running the following code:\nGRANT SELECT, UPDATE, DELETE, INSERT ON TABLE schema_name.table_name to \"IAM:first_name.last_name.project_code\";\n\nNote: In the above code example replace schma_name with the pr_ or tr_ schema assigned to your workspace and replace table_name with the name of the table on which you want to grant access. Also, in \"IAM:first_name.last_name.project_code\" update first_name.last_name.project_code with the user name to whom you want to grant access to.\n\nIf you have any questions, please reach out to us at [email protected]\nWhen connecting to the database using an ODBC connection, you need to use one of the following DSNs:\n\nRedshift01_projects_DSN\nRedshift11_projects_DSN\n\nIn the code examples below, the default DSN is Redshift01_projects_DSN.\n\n\nSAS Connection\nproc sql;\nconnect to odbc as my con\n(datasrc=Redshift01_projects_DSN user=adrf\\user.name.project password=password);\nselect * from connection to mycon\n(select * form projects.schema.table);\ndisconnect from mycon;\nquit;\n\n\nR Connection\nBest practices for loading large amounts of data in R\nTo ensure R can efficiently manage large amounts of data, please add the following lines of code to your R script before any packages are loaded:\noptions(java.parameters = c(\"-XX:+UseConcMarkSweepGC\", \"-Xmx8192m\"))\ngc()\nBest practices for writing tables to Redshift\nWhen writing an R data frame to Redshift use the following code as an example:\n# Note: replace the table_name with the name of the data frame you wish to write to Redshift\n\nDBI::dbWriteTable(conn = conn, #name of the connection \nname = \"schema_name.table_name\", #name of table to save df to \nvalue = df_name, #name of df to write to Redshift \noverwrite = TRUE) #if you want to overwrite a current table, otherwise 
FALSE\n\nqry &lt;- \"GRANT SELECT ON TABLE schema.table_name TO group &lt;group_name&gt;;\"\ndbSendUpdate(conn,qry)\nThe below table is for connecting to RedShift11 Database\nlibrary(RJDBC)\ndbusr=Sys.getenv(\"DBUSER\")                                                                                  \ndbpswd=Sys.getenv(\"DBPASSWD\")\n\n# Database URL\nurl &lt;- paste0(\"jdbc:redshift:iam://adrf-redshift11.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;\",\n\"loginToRp=urn:amazon:webservices:govcloud;\",\n\"ssl=true;\",\n\"AutoCreate=true;\",\n\"idp_host=adfs.adrf.net;\",\n\"idp_port=443;\",\n\"ssl_insecure=true;\",\n\"plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider\")\n\n# Redshift JDBC Driver Setting\ndriver &lt;- JDBC(\"com.amazon.redshift.jdbc42.Driver\",\nclassPath = \"C:\\\\drivers\\\\redshift_withsdk\\\\redshift-jdbc42-2.1.0.12\\\\redshift-jdbc42-2.1.0.12.jar\",\nidentifier.quote=\"`\")\nconn &lt;- dbConnect(driver, url, dbusr, dbpswd)\nFor the above code to work, please create a file name .Renviron in your user folder (user folder is something like i.e. 
u:\\John.doe.p00002) And .Renviron file should contain the following:\nDBUSER='adrf\\John.doe.p00002'\nDBPASSWD='xxxxxxxxxxxx'\nPLEASE replace user id and password with your project workspace specific user is and password.\nThis will ensure you don’t have your id and password in R code and then you can easily share your R code with others without sharing your ID and password.\nThe below table is for connecting to RedShift01 Database\nlibrary(RJDBC)\ndbusr=Sys.getenv(\"DBUSER\")                                                                                  \ndbpswd=Sys.getenv(\"DBPASSWD\")\n\n# Database URL\nurl &lt;- paste0(\"jdbc:redshift:iam://adrf-redshift01.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;\",\n\"loginToRp=urn:amazon:webservices:govcloud;\",\n\"ssl=true;\",\n\"AutoCreate=true;\",\n\"idp_host=adfs.adrf.net;\",\n\"idp_port=443;\",\n\"ssl_insecure=true;\",\n\"plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider\")\n\n# Redshift JDBC Driver Setting\ndriver &lt;- JDBC(\"com.amazon.redshift.jdbc42.Driver\",\nclassPath = \"C:\\\\drivers\\\\redshift_withsdk\\\\redshift-jdbc42-2.1.0.12\\\\redshift-jdbc42-2.1.0.12.jar\",\nidentifier.quote=\"`\")\nconn &lt;- dbConnect(driver, url, dbusr, dbpswd)\nFor the above code to work, please create a file name .Renviron in your user folder (user folder is something like i.e. 
u:\\John.doe.p00002) And .Renviron file should contain the following:\nDBUSER='adrf\\John.doe.p00002'\nDBPASSWD='xxxxxxxxxxxx'\nPLEASE replace user id and password with your project workspace specific user is and password.\nThis will ensure you don’t have your id and password in R code and then you can easily share your R code with others without sharing your ID and password.\n\n\nPython Connection\nimport pyodbc\nimport pandas as pd\ncnxn = pyodbc.connect('DSN=Redshift01_projects_DSN; UID = adrf\\\\user.name.project; PWD = password')\ndf = pd.read_sql(“SELECT * FROM projects.schema_name.table_name”, cnxn)\n\n\nStata Connection\nodbc load, exec(\"select * from PATH_TO_TABLE\") clear dsn(\"Redshift11_projects_DSN\") user(\"adrf\\user.name.project\") password(\"password\")",
+    "text": "Data Access\nThe data is housed in Redshift. You need to replace the “user.name.project” with your project based username. The project based username is your user folder name in the U:/ drive:\n\n\n\n\n\n\nNote: Your username will be different than in these examples.\n\nThe password needed to access Redshift is the second password entered when logging into the ADRF as shown in the screen below:\n\n\n\n\n\nAll data is stored under schemas in the projects database and are accessible by the following programs:\n\nDBeaver\nTo establish a connection to Redshift in DBeaver, first double click on the server you wish to connect to. In the example below I’m connecting to Redshift11_projects. Then a window will appear asking for your Username and Password. This will be your user folder name and include adrf\\ before the username. Then click OK. You will now have access to your data stored on the Redshift11_projects server.\n\n\n\n\n\nCreating Tables in PR/TR Schema\nWhen users create tables in their PR (Research Project) or TR (Training Project) schema, the table is initially permissioned to the user only. This is analogous to creating a document or file in your U drive: Only you have access to the newly created table.\nIf you want to allow all individuals in your project workspace to access the table in the PR/TR schema, you will need to grant permission to the table to the rest of the users who have access to the PR or TR schema.\nYou can do this by running the following code:\nGRANT SELECT, UPDATE, DELETE, INSERT ON TABLE schema_name.table_name TO group db_xxxxxx_rw;\n\nNote: In the above code example replace schma_name with the pr_ or tr_ schema assigned to your workspace and replace table_name with the name of the table on which you want to grant access. Also, in the group name db_xxxxxx_rw, replace xxxxxx with your project code. This is the last 6 characters in your project based user name. 
This will start with either a T or a P.\n\nIf you want to allow only a single user on your project to access the table, you will need to grant permission to that user. You can do this by running the following code:\nGRANT SELECT, UPDATE, DELETE, INSERT ON TABLE schema_name.table_name to \"IAM:first_name.last_name.project_code\";\n\nNote: In the above code example replace schma_name with the pr_ or tr_ schema assigned to your workspace and replace table_name with the name of the table on which you want to grant access. Also, in \"IAM:first_name.last_name.project_code\" update first_name.last_name.project_code with the user name to whom you want to grant access to.\n\nIf you have any questions, please reach out to us at [email protected]\nWhen connecting to the database using an ODBC connection, you need to use one of the following DSNs:\n\nRedshift01_projects_DSN\nRedshift11_projects_DSN\n\nIn the code examples below, the default DSN is Redshift01_projects_DSN.\n\n\nSAS Connection\nproc sql;\nconnect to odbc as my con\n(datasrc=Redshift01_projects_DSN user=adrf\\user.name.project password=password);\nselect * from connection to mycon\n(select * form projects.schema.table);\ndisconnect from mycon;\nquit;\n\n\nR Connection\nBest practices for loading large amounts of data in R\nTo ensure R can efficiently manage large amounts of data, please add the following lines of code to your R script before any packages are loaded:\noptions(java.parameters = c(\"-XX:+UseConcMarkSweepGC\", \"-Xmx8192m\"))\ngc()\nBest practices for writing tables to Redshift\nWhen writing an R data frame to Redshift use the following code as an example:\n# Note: replace the table_name with the name of the data frame you wish to write to Redshift\n\nDBI::dbWriteTable(conn = conn, #name of the connection \nname = \"schema_name.table_name\", #name of table to save df to \nvalue = df_name, #name of df to write to Redshift \noverwrite = TRUE) #if you want to overwrite a current table, otherwise 
FALSE\n\nqry &lt;- \"GRANT SELECT ON TABLE schema.table_name TO group &lt;group_name&gt;;\"\ndbSendUpdate(conn,qry)\nThe below table is for connecting to RedShift11 Database\nlibrary(RJDBC)\ndbusr=Sys.getenv(\"DBUSER\")                                                                                  \ndbpswd=Sys.getenv(\"DBPASSWD\")\n\n# Database URL\nurl &lt;- paste0(\"jdbc:redshift:iam://adrf-redshift11.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;\",\n\"loginToRp=urn:amazon:webservices:govcloud;\",\n\"ssl=true;\",\n\"AutoCreate=true;\",\n\"idp_host=adfs.adrf.net;\",\n\"idp_port=443;\",\n\"ssl_insecure=true;\",\n\"plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider\")\n\n# Redshift JDBC Driver Setting\ndriver &lt;- JDBC(\"com.amazon.redshift.jdbc42.Driver\",\nclassPath = \"C:\\\\drivers\\\\redshift_withsdk\\\\redshift-jdbc42-2.1.0.12\\\\redshift-jdbc42-2.1.0.12.jar\",\nidentifier.quote=\"`\")\nconn &lt;- dbConnect(driver, url, dbusr, dbpswd)\nFor the above code to work, please create a file name .Renviron in your user folder (user folder is something like i.e. 
u:\\John.doe.p00002) And .Renviron file should contain the following:\nDBUSER='adrf\\John.doe.p00002'\nDBPASSWD='xxxxxxxxxxxx'\nPLEASE replace user id and password with your project workspace specific user is and password.\nThis will ensure you don’t have your id and password in R code and then you can easily share your R code with others without sharing your ID and password.\nThe below table is for connecting to RedShift01 Database\nlibrary(RJDBC)\ndbusr=Sys.getenv(\"DBUSER\")                                                                                  \ndbpswd=Sys.getenv(\"DBPASSWD\")\n\n# Database URL\nurl &lt;- paste0(\"jdbc:redshift:iam://adrf-redshift01.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;\",\n\"loginToRp=urn:amazon:webservices:govcloud;\",\n\"ssl=true;\",\n\"AutoCreate=true;\",\n\"idp_host=adfs.adrf.net;\",\n\"idp_port=443;\",\n\"ssl_insecure=true;\",\n\"plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider\")\n\n# Redshift JDBC Driver Setting\ndriver &lt;- JDBC(\"com.amazon.redshift.jdbc42.Driver\",\nclassPath = \"C:\\\\drivers\\\\redshift_withsdk\\\\redshift-jdbc42-2.1.0.12\\\\redshift-jdbc42-2.1.0.12.jar\",\nidentifier.quote=\"`\")\nconn &lt;- dbConnect(driver, url, dbusr, dbpswd)\nFor the above code to work, please create a file name .Renviron in your user folder (user folder is something like i.e. 
u:\\John.doe.p00002) And .Renviron file should contain the following:\nDBUSER='adrf\\John.doe.p00002'\nDBPASSWD='xxxxxxxxxxxx'\nPLEASE replace user id and password with your project workspace specific user is and password.\nThis will ensure you don’t have your id and password in R code and then you can easily share your R code with others without sharing your ID and password.\n\n\nPython Connection\nimport pyodbc\nimport pandas as pd\ncnxn = pyodbc.connect('DSN=Redshift01_projects_DSN; UID=adrf\\user.name.project; PWD=password')\ndf = pd.read_sql(\"SELECT * FROM projects.schema_name.table_name\", cnxn)\n\n\nStata Connection\nodbc load, exec(\"select * from PATH_TO_TABLE\") clear dsn(\"Redshift11_projects_DSN\") user(\"adrf\\user.name.project\") password(\"password\")",
     "crumbs": [
       "<span class='chapter-number'>12</span>  <span class='chapter-title'>Redshift querying guide</span>"
     ]