Refactor file paths in notes/README.md
dell-datascience committed Jun 28, 2024
1 parent eb9b4b4 commit e5393de
Showing 5 changed files with 758 additions and 4 deletions.
8 changes: 4 additions & 4 deletions notes/README.md
@@ -11,7 +11,7 @@ Below you will find links to the notes for each lesson.

Additionally, the following gists with cheatsheets are available.

-* [Virtualization and containerization](https://gist.github.com/ziritrion/1842c8a4c4851602a8733bba19ab6050)
-* [Python environment management](https://gist.github.com/ziritrion/8024025672ea92b8bdeb320d6015aa0d)
-* [Git cheatsheet](https://gist.github.com/ziritrion/d73ca65bf4d19c79ca842a55853cb962)
-* [Create a VM instance for the DE zoomcamp](https://gist.github.com/ziritrion/3214aa570e15ae09bf72c4587cb9d686)
+* [Virtualization and containerization](virtualization_containerization.md)
+* [Python environment management](python_environment.md)
+* [Git cheatsheet](git_cheatsheet.md)
+* [Create a VM instance for the DE zoomcamp](create_vm_instance.md)
121 changes: 121 additions & 0 deletions notes/creating_VM_instance_for_data_engineering.md
@@ -0,0 +1,121 @@
## Install and setup Gcloud SDK

1. Download Gcloud SDK [from this link](https://cloud.google.com/sdk/docs/install) and install it according to the instructions for your OS.
1. Initialize the SDK [following these instructions](https://cloud.google.com/sdk/docs/quickstart).
1. Run `gcloud init` from a terminal and follow the instructions.
1. Make sure that your project is selected with the command `gcloud config list`
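
As a quick sanity check, here is a minimal sketch of verifying and switching the active project (`my-project` is a placeholder for your own project ID):

```sh
# Show the active account and project
gcloud config list
# If a different project should be active, switch to it
gcloud config set project my-project
```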

## Creating a VM instance

1. From your project's dashboard, go to _Compute Engine_ > _VM instances_
1. Create a new instance:
* Manual setup:
* Any name of your choosing
* Pick your favourite region. You can check out the regions [in this link](https://cloud.google.com/about/locations).
        * Pick an _E2 series_ instance. An _e2-standard-4_ instance is recommended (4 vCPUs, 16GB RAM)
* Change the boot disk to _Ubuntu_. The _Ubuntu 20.04 LTS_ version is recommended. Also pick at least 30GB of storage.
* Leave all other settings on their default value and click on _Create_.
* Gcloud SDK setup:
```sh
gcloud compute instances create dezoomcamp --zone=europe-west1-b --image-family=ubuntu-2004-lts --image-project=ubuntu-os-cloud --machine-type=e2-standard-4 --boot-disk-size=30GB
```
1. When you create an instance, it will be started automatically. You can skip to step 3 of the next section.
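
To confirm that the new instance exists and to stop it when you are not using it, something like the following should work (the instance name and zone are the ones from the sample command above; adjust them to your own values):

```sh
# List instances and check that the new VM shows up as RUNNING
gcloud compute instances list
# Stop it when you're done working to avoid unnecessary charges
gcloud compute instances stop dezoomcamp --zone=europe-west1-b
```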

## Set up SSH access

1. Start your instance from the _VM instances_ dashboard.
1. In your local terminal, make sure that gcloud SDK is configured for your project. Use `gcloud config list` to list your current config's details.
1. If you have multiple google accounts but the current config does not match the account you want:
1. Use `gcloud config configurations list` to see all of the available configs and their associated accounts.
1. Change to the config you want with `gcloud config configurations activate my-project`
1. If the config matches your account but points to a different project:
1. Use `gcloud projects list` to list the projects available to your account (it can take a while to load).
1. use `gcloud config set project my-project` to change your current config to your project.
3. Set up the SSH connection to your VM instances with `gcloud compute config-ssh`
    * Inside `~/.ssh/` a new `config` file should appear with the necessary info to connect.
    * If you did not have an SSH key, a pair of public and private SSH keys will be generated for you.
* The output of this command will give you the _host name_ of your instance in this format: `instance.zone.project` ; write it down.
4. You should now be able to open a terminal and SSH to your VM instance like this:
* `ssh instance.zone.project`
5. In VSCode, with the Remote SSH extension, if you run the [command palette](https://code.visualstudio.com/docs/getstarted/userinterface#_command-palette) and look for _Remote-SSH: Connect to Host_ (or alternatively you click on the Remote SSH icon on the bottom left corner and click on _Connect to Host_), your instance should now be listed. Select it to connect to it and work remotely.
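
Putting the `config-ssh` and `ssh` steps together, a minimal connection sketch looks like this (the host name is a placeholder in the `instance.zone.project` format printed by `config-ssh`):

```sh
# Regenerate the ~/.ssh/config entries for your instances
gcloud compute config-ssh
# Connect using the host name printed by the command above
ssh dezoomcamp.europe-west1-b.my-project
```
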
### (Optional) Starting your instance with the gcloud SDK after you shut it down
1. List your available instances.
```sh
gcloud compute instances list
```
2. Start your instance.
```sh
gcloud compute instances start <instance_name>
```
3. Set up ssh so that you don't have to manually change the IP in your config files.
```sh
gcloud compute config-ssh
```

## Install stuff

1. Run this first in your SSH session: `sudo apt update && sudo apt -y upgrade`
* It's a good idea to run this command often, once per day or every few days, to keep your VM up to date.
### Anaconda:
1. In your local browser, go to the [Anaconda download page](https://www.anaconda.com/products/individual), scroll to the bottom, right click on the _64 bit x86 installer_ link under Linux and copy the URL.
* At the time of writing this gist, the URL is https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
1. In your SSH session, type `wget <anaconda_url>` to download the installer.
1. Find the filename of the installer with `ls`
1. Run the installer with `bash <filename>` (you can start typing the name and then press the Tab key to autocomplete)
1. Follow the on-screen instructions. Answer `yes` to all _yes/no_ questions and leave all other values at their defaults.
1. Log out of your current SSH session with `exit` and log back in. You should now see a `(base)` at the beginning of your command prompt.
1. You may now remove the Anaconda installer with `rm <filename>`
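
The Anaconda steps above, condensed into a single copy-pasteable sketch (the installer URL is the one quoted above; check the download page for the current version):

```sh
# Download and run the installer, then clean up
wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
bash Anaconda3-2021.11-Linux-x86_64.sh
# Log out and back in so that (base) shows up in your prompt, then:
rm Anaconda3-2021.11-Linux-x86_64.sh
```
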
### Docker:
1. Run `sudo apt install docker.io` to install it.
1. Change your settings so that you can run Docker without `sudo`:
1. Run `sudo groupadd docker`
1. Run `sudo gpasswd -a $USER docker`
1. Log out of your SSH session and log back in.
1. Run `sudo service docker restart`
1. Test that Docker can run successfully with `docker run hello-world`
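
The Docker steps above collected into one sketch; remember to log out of the SSH session and back in after the `gpasswd` step so that the group change takes effect:

```sh
sudo apt install docker.io
# Allow running Docker without sudo
sudo groupadd docker
sudo gpasswd -a $USER docker
# Log out and back in, then restart the Docker service and test it
sudo service docker restart
docker run hello-world
```
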
### Docker compose:
1. Go to https://github.com/docker/compose/releases and copy the URL for the `docker-compose-linux-x86_64` binary for its latest version.
* At the time of writing, the last available version is `v2.2.3` and the URL for it is https://github.com/docker/compose/releases/download/v2.2.3/docker-compose-linux-x86_64
1. Create a folder for binary files for your Linux user:
1. Create a subfolder `bin` in your home account with `mkdir ~/bin`
1. Go to the folder with `cd ~/bin`
1. Download the binary file with `wget <compose_url> -O docker-compose`
* If you forget to add the `-O` option, you can rename the file with `mv <long_filename> docker-compose`
* Make sure that the `docker-compose` file is in the folder with `ls`
1. Make the binary executable with `chmod +x docker-compose`
* Check the file with `ls` again; it should now be colored green. You should now be able to run it with `./docker-compose version`
1. Go back to the home folder with `cd ~`
1. Run `nano .bashrc` to modify your path environment variable:
1. Scroll to the end of the file
1. Add this line at the end:
```bash
export PATH="${HOME}/bin:${PATH}"
```
    1. Press `CTRL` + `o` on your keyboard and press Enter afterwards to save the file.
    1. Press `CTRL` + `x` on your keyboard to exit the Nano editor.
1. Reload the path environment variable with `source .bashrc`
1. You should now be able to run Docker compose from anywhere; test it with `docker-compose version`
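
A condensed sketch of the Docker Compose steps above. It appends the `PATH` line to `.bashrc` with `echo` instead of editing the file in Nano, which is just a non-interactive shortcut for the same change:

```sh
# Download the binary into ~/bin and make it executable
# (URL for v2.2.3, the latest release at the time of writing)
mkdir -p ~/bin && cd ~/bin
wget https://github.com/docker/compose/releases/download/v2.2.3/docker-compose-linux-x86_64 -O docker-compose
chmod +x docker-compose
# Add ~/bin to PATH and verify that docker-compose runs from anywhere
echo 'export PATH="${HOME}/bin:${PATH}"' >> ~/.bashrc
source ~/.bashrc
docker-compose version
```
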
### Terraform:
1. Run `curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -`
1. Run `sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"`
1. Run `sudo apt-get update && sudo apt-get install terraform`
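
The same three commands as a single block, plus a final check that the binary is available (`terraform -version` just prints the installed version):

```sh
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform
terraform -version
```
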
## Upload/download files to/from your instance
1. Download a file.
```sh
# From your local machine
scp <instance_name>:path/to/remote/file path/to/local/file
```
1. Upload a file.
```sh
# From your local machine
scp path/to/local/file <instance_name>:path/to/remote/file
```
1. You can also drag & drop stuff in VSCode with the remote extension.
1. If you use a client like Cyberduck, you can connect to your instance over SFTP, using the `instance.zone.project` name as the server and the generated private SSH key for authentication.
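
For example, assuming `dezoomcamp.europe-west1-b.my-project` is the host name that `gcloud compute config-ssh` generated for you (a placeholder here), transfers from the local machine look like this:

```sh
# Download a file from the instance
scp dezoomcamp.europe-west1-b.my-project:~/project/results.csv ./results.csv
# Upload a file to the instance
scp ./notebook.ipynb dezoomcamp.europe-west1-b.my-project:~/project/
```
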
135 changes: 135 additions & 0 deletions notes/git_cheat_sheet
@@ -0,0 +1,135 @@
# Basic git

1. Make sure your local copy of the selected branch is updated.
1. Without overwriting anything
- `git fetch`
1. If you already fetched or you are ready to overwrite your local copy, then pull
- `git pull`
1. Check your repo branches
1. Local branches
- `git branch`
1. All branches on remote repo
- `git branch -r`
1. Both local and remote branches
- `git branch -a`
1. You can also add `-v` to make the commands explicitly verbose
1. Create a branch and access it
1. Normal way
1. `git branch new_branch`
        2. Switch to it (2 ways)
- `git checkout new_branch`
- `git switch new_branch` > Recommended option (avoid `checkout` unless necessary)
2. Shortcut (2 ways)
- `git checkout -b new_branch`
- `git switch -c new_branch` > Recommended option (avoid `checkout` unless necessary)
1. Get some work done lol
1. Check the status of your work
- `git status`
1. Did you mess up editing a file and want to restore it to how it was beforehand?
- `git restore changed_file.txt`
1. Add changes to staging in order to prepare your commit
1. Add a single file
- `git add new_file.txt`
    2. Add all changed files
        - `git add .`
1. Did you screw up? Reset the staging
- `git reset`
1. Commit
- `git commit -m "This is a commit message"`
1. Check the commit history of the branch you're in
- `git log`
- If you wanna see some cool things with log, you can use something like this:
- `git log --graph --oneline --all`
1. Make sure you upload your commits to the remote repo! If your local branch is brand new, you must add it to the remote repo.
1. New branch
- `git push -u origin new_branch`
2. Previously existing branch
- `git push`
1. Move to another branch
- `git checkout another_branch`
1. Merge some branch into your current branch (assuming the default behavior of pull is merge)
    - `git pull origin branch_that_will_be_merged_into_current_branch`
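
Put together, a typical pass through the commands above might look like this (file and branch names are placeholders):

```bash
git fetch
git switch -c new_branch
# ...edit some files...
git status
git add new_file.txt
git commit -m "This is a commit message"
git push -u origin new_branch
```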

For more info check the [GitHub Git Cheat Sheet](https://education.github.com/git-cheat-sheet-education.pdf)

## Checkout vs Switch

`checkout` can be used to switch branches and/or restore working tree files, which means that you can do things like undo/restore commits and overwrite local changes, or detach the HEAD (navigate to a commit which is not the latest on its branch).

`switch` is only used for switching and creating branches. It cannot discard changes to tracked files: if you've changed a tracked file and want to switch branches, you'll need to stash or commit the changes.

# Advanced git

The following are some best practices that may be useful, taken from [this blog post](https://mislav.net/2013/02/merge-vs-rebase/), as well as [this tip](https://stackoverflow.com/questions/501407/is-there-a-git-merge-dry-run-option).

1. While working on a branch, if you need to pull commits from the remote repo into your local repo, use rebase instead of merge to avoid cluttering the history with unnecessary merge commits
- `git pull --rebase`
- If you want to make rebasing the default behavior when doing `git pull`, do so with `git config --global --bool pull.rebase true`
1. Before pushing your changes to the remote repo, perform basic housekeeping (squash related commits together, reword messages, etc.)
- `git rebase -i @{u}`
1. Make sure that you've fetched all changes from the remote repo
- `git fetch`
1. Simulate a merge to see any possible conflicts:
1. Do a merge with the `--no-commit` flag from the work branch.
        - `git merge --no-commit --no-ff $GOOD_BRANCH`
3. Examine the staged changes
- `git diff --cached`
4. Undo the merge
- `git merge --abort`
3. Merge (do not rebase) changes from master/main into your branch, in order to update the branch with the latest features and solve any compatibility issues and/or conflicts
1. `git merge main`
    2. `git pull --no-rebase origin main` (if you made rebase the default behavior of `git pull`)
4. Enforce merge commit when merging feature branch into main, even if a merge commit isn't necessary (check next point for exception), in order to make it easier to see the where and when of changes. Assuming you're in main:
- `git merge --no-ff branch_that_will_be_merged_into_main`
5. Exception to point 4: if you only need to merge a single commit (typical for stuff such as bugfixes). Assuming you're in main:
- `git cherry-pick branch_that_only_has_a_single_commit`
6. Delete merged branch:
1. Delete locally
- `git branch -d branch_that_has_been_merged`
1. Delete on remote repo
- `git push origin :branch_that_has_been_merged`
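
The "simulate a merge" sequence above as a single block (here `main` stands in for `$GOOD_BRANCH`):

```bash
git fetch
git merge --no-commit --no-ff main   # attempt the merge but don't commit it
git diff --cached                    # inspect what the merge would stage
git merge --abort                    # undo the simulated merge
```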

# Create a remote repo (local folder as remote repo)

## Official method

_[Source](https://git-scm.com/book/en/v2/Git-on-the-Server-Getting-Git-on-a-Server)_

1. Make sure you've got a local commit. You may initialize a local repo with `git init` in any project folder and make sure that it has at least one commit, or you may use an already existing local repo.
2. On a separate folder, run:
```bash
git clone --bare path/to/local/project project.git
```
    * This will create a folder named `project.git` in the folder where you run the command.
* Remote repo folders use the `.git` extension as a standard.
* This folder is a ***bare*** repository. It does not contain a working folder, only the git files.
3. Move the `project.git` folder to the final destination. Ideally, a shared folder such as a networked drive that everyone has access to "locally".
* You may combine steps 2 and 3 by creating the bare repo directly on the final folder.
4. You should now be able to clone the repo:
```bash
git clone path/to/remote/repo/project.git
```
5. The original repo that we bare-cloned does not have an origin repo to push to. If you want to keep using it, set up a remote like this:
```bash
git remote add origin path/to/remote/repo/project.git
```

## Alternative method

_[Source](https://stackoverflow.com/questions/14087667/create-a-remote-git-repo-from-local-folder)_

1. On remote folder:
```bash
mkdir my_repo
cd my_repo
git init --bare
```
2. On local folder:
```bash
cd my_repo
git init
git remote add origin ssh://myserver/my_repo
git add .
git commit -m "Initial commit"
git push -u origin master
```
91 changes: 91 additions & 0 deletions notes/python_environment_management.md
@@ -0,0 +1,91 @@
# Conda

1. Create a virtual environment
* `conda create --name my_env_name python=3.8` or whatever Python version you may need.
2. List available envs (2 different ways)
* `conda env list`
* `conda info --envs`
3. Activate virtual env
* `conda activate my_env_name`
4. Deactivate current environment
* `conda deactivate`
5. If pip doesn't work with a fresh conda install:
* `conda install pip`
6. Install project dependencies (listed in requirements.txt file)
* `conda install --file requirements.txt`
* `pip install -r requirements.txt`
7. Delete an old environment
* `conda remove --name my_env_name --all`
* `conda env remove -n my_env_name`
8. Update conda
* `conda update conda`
9. Update all packages in the current environment
* `conda update --all`
10. Update all packages in another env
* `conda update -n my_env_name --all`
11. List installed packages in current environment
* `conda list`
12. Add conda-forge channel
* `conda config --add channels conda-forge`
13. Check conda channels
* `conda config --show channels`
14. Remove conda-forge channel
* `conda config --remove channels conda-forge`
15. Create an environment file from your current environment.
* `conda env export --from-history > environment.yml`
16. Create a new environment and install dependencies listed in YML file.
* `conda env create -f environment.yml`
17. If you don't want the base environment to load automatically whenever you open a new shell, change the configs:
* `conda config --set auto_activate_base false`
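
A typical environment lifecycle chaining the commands above (the environment name and Python version are just examples):

```bash
conda create --name my_env_name python=3.8
conda activate my_env_name
pip install -r requirements.txt
# Save the environment definition for others to reproduce
conda env export --from-history > environment.yml
conda deactivate
conda remove --name my_env_name --all
```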

# Pipenv + Pyenv

`pyenv` is a Python version manager. It allows you to install and manage multiple Python versions.

`pipenv` is a Python virtualenv management tool. `pipenv` does not have built-in package search; make sure you search for packages at [PyPI](https://pypi.org/).

### Installation

1. `pyenv`
* https://github.com/pyenv/pyenv
* `pyenv` can be installed with either Brew or with the automatic installer script.
* For Windows, there is `pyenv-win` but I have not tested it.
* For Ubuntu 22.04 LTS, make sure you run the following before installing:
* `sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm xz-utils tk-dev libffi-dev liblzma-dev python3-openssl`
1. `pipenv`
* https://pipenv.pypa.io/en/latest/index.html
* Install locally for your user with `pip install pipenv --user`

### Usage

The environments are based on the folder you're on. There is no need to manually name them, and there is no environment activation to take care of per se.

1. Install the Python version you need
* `pyenv install 3.11`
1. Create a new virtual environment with pipenv and choose the Python version you want.
* `pipenv install --python 3.11`
3. Install a package (this will modify `Pipfile` and `Pipfile.lock`)
* `pipenv install some_package`
* `pipenv install some_package=1.0`
4. If a `Pipfile.lock` file already exists, you can install the packages from it.
* `pipenv sync`
5. Update packages
* `pipenv update` > updates all packages
* `pipenv update <package>` updates a single package and its sub-dependencies
6. Access the pipenv shell (necessary for enabling the virtualenv and for your script to find the installed packages).
* `pipenv shell`
7. Exit a pipenv shell.
* `exit`
8. Install all dependencies to the system, avoiding virtualenvs entirely (useful for deployment in containers)
* `pipenv install --system --deploy` > Use this in your Dockerfile.
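
A minimal end-to-end sketch of the workflow above (`pandas` is only an example package):

```bash
pyenv install 3.11
pipenv install --python 3.11
pipenv install pandas
pipenv shell
# ...run your scripts inside the virtualenv...
exit
```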

# Installing pipenv on Docker Dev Environment

* Update apt
    * `sudo apt update && sudo apt upgrade`
* Install pip
    * `sudo apt install python3-pip`
* Install pyenv. Follow the instructions here: https://github.com/pyenv/pyenv-installer
* Install pipenv. Follow the instructions here: https://pipenv.pypa.io/en/latest/