diff --git a/modules/hands-on/ _posts/2024-09-05-hands-on.md b/modules/hands-on/ _posts/2024-09-05-hands-on.md
index 8b13789..f22fefd 100644
--- a/modules/hands-on/ _posts/2024-09-05-hands-on.md
+++ b/modules/hands-on/ _posts/2024-09-05-hands-on.md
@@ -1 +1,69 @@
+---
+Title: Command Line Essentials for Bioinformatics
+---
+
+# Command Line Essentials for Bioinformatics
+
+## Learning Objectives
+#### Understand Basic Concepts of Unix
+* Explain the history and significance of the command line interface (CLI) in bioinformatics.
+* Identify the CLIs available on different operating systems (OS) and the advantages of each.
+* Describe applications of the Unix CLI in bioinformatics and supercomputing.
+* Demonstrate proficiency in basic commands for navigating directories (`cd`, `ls`, `pwd`).
+* Understand the structure of file paths and the concept of relative vs. absolute paths.
+* Use commands to create, delete, copy and move files and directories (`mkdir`, `rmdir`, `cp`, `mv`, `rm`).
+* Learn to manage file permissions (`chmod`) and understand roles in Unix.
+* Utilise Unix tools for text processing (`grep`, `awk`, `sed`, `cut`, `sort`, `cat`, `echo`).
+* Perform data extraction and transformation tasks on bioinformatics data files (e.g., FASTA, FASTQ).
+* Understand basic networking concepts and commands (`ping`, `curl`, `wget`, `scp`, `ssh`).
+* Use networking tools to download data, transfer files securely and check connectivity.
+* Construct and execute shell scripts to automate bioinformatics analyses.
+* Chain commands using pipes and redirection to process and analyse data efficiently.
+
+
+## Background
+[CLI Presentation](https://docs.google.com/presentation/d/1A_ecGBZuysro9qNSrmJhjiZUnj2NCc6t/edit#slide=id.p1)
+#### Understand Basic Concepts of Unix
+The Unix CLI plays a pivotal role in bioinformatics, offering a powerful and flexible tool for data analysis and management. Combining Bash with command line utilities enables efficient handling of large datasets, automation of repetitive tasks through scripting and access to a wide array of specialised tools and software. The CLI also facilitates the integration of various applications and allows precise control over data processing workflows, making it essential for tasks such as sequence alignment, genome assembly, proteomics, statistical analyses and data visualisation. Moreover, its ability to manipulate text and files directly enhances productivity and reproducibility.
+
+
+#### Advantages of using Linux:
+* Software security and stability.
+* Extensive networking capabilities.
+* Software updates are in the hands of the user.
+* Linux is open source and has attracted many different contributors.
+* Linux runs on over 90% of supercomputers, which reflects its scalability.
+
+#### Navigate the Unix File System
+Navigating the Unix file system is essential for efficiently managing files and directories within the operating system. The Unix file system is structured as a hierarchical tree, starting from the root directory (`/`) and branching out into subdirectories. Basic commands like `cd` (change directory), `ls` (list files) and `pwd` (print working directory) allow users to move through this structure, view contents and identify their current location. Understanding file paths is crucial for accessing files accurately: an absolute path starts at the root (`/`), whereas a relative path is interpreted from the current working directory. Mastering these navigation skills streamlines data analyses and workflow development by laying the groundwork for utilising powerful bioinformatics tools.
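The navigation commands described above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a POSIX-style shell with `mktemp` available; the directory names (`project`, `raw`, `results`) are hypothetical and chosen only for illustration.

```shell
# Work inside a throwaway directory so nothing important is touched
workdir=$(mktemp -d)
cd "$workdir"

# Build a small, hypothetical project tree to navigate
mkdir -p project/raw project/results

# Relative path: interpreted from the current working directory
cd project/raw
pwd    # prints the absolute path of the current directory

# ".." refers to the parent directory, so this moves to project/results
cd ../results
relative_location=$(pwd)

# Absolute path: starts at the root (/), so it works from any location
cd "$workdir/project/results"
absolute_location=$(pwd)

# Both routes should end in the same directory
echo "$relative_location"
echo "$absolute_location"

# List the contents of the parent directory
listing=$(ls ..)
echo "$listing"
```

Both `cd` commands end in the same place; the relative form is shorter to type, while the absolute form does not depend on where the shell currently is.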
+
+#### File Management
+File management in Unix is a critical skill for effectively organising and manipulating data within the operating system. Unix provides a variety of commands that facilitate the creation, deletion and modification of files and directories. Commands such as `mkdir` (make directory), `rm` (remove files), `cp` (copy files) and `mv` (move or rename files) empower users to maintain an orderly file structure, essential for handling the large datasets commonly encountered in fields like bioinformatics.
+
+Additionally, understanding file permissions with commands like `chmod` (change mode) ensures that users can control access to sensitive data. Proficient file management not only enhances productivity but also ensures data integrity and accessibility, making it an indispensable aspect of working in a Unix environment.
+
+#### Text Processing Tools
+Text processing tools in Unix are invaluable for analysing and manipulating data, particularly in fields like bioinformatics where large datasets are common. Commands such as `grep`, `awk` and `sed` allow users to search for specific patterns, format text and transform data efficiently. For instance, `grep` can be used to find sequences within files, while `awk` excels at processing structured data by manipulating columns and rows. `sed` is a powerful stream editor that transforms text line by line and, with the `-i` option of GNU and BSD `sed`, can edit files in place.
+The `cut` command is useful for extracting specific fields from data files, `sort` organises data in a specified order and `uniq` filters out adjacent duplicate lines, so it is usually run on sorted input. Additionally, `cat` allows users to concatenate and display file contents and `echo` is used to print text or variables to the terminal, which can aid in scripting.
+Mastering these tools enhances a bioinformatician's ability to extract meaningful information from complex datasets, automate repetitive text manipulation tasks and prepare data for further analysis.
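To make the text processing tools concrete, the sketch below builds a tiny FASTA file and queries it. The file name `example.fasta` and the sequences are invented for illustration, and the commands assume a POSIX-style shell with the standard `grep`, `cut`, `sort`, `uniq`, `sed` and `awk` utilities.

```shell
# Create a small, made-up FASTA file in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
cat > example.fasta <<'EOF'
>seq1 Homo_sapiens
ATGCGTAC
>seq2 Mus_musculus
ATGC
>seq3 Homo_sapiens
GGCCTTAA
EOF

# grep -c counts the lines matching a pattern; header lines start with ">"
n_sequences=$(grep -c '^>' example.fasta)
echo "Sequences: $n_sequences"

# Extract the species (second space-separated field of each header),
# then sort so that uniq -c can count the adjacent duplicates
species_counts=$(grep '^>' example.fasta | cut -d ' ' -f 2 | sort | uniq -c)
echo "$species_counts"

# sed strips the leading ">" and cut keeps the first field: the plain identifiers
ids=$(grep '^>' example.fasta | sed 's/^>//' | cut -d ' ' -f 1)
echo "$ids"

# awk sums the lengths of all sequence (non-header) lines
total_bases=$(awk '!/^>/ { total += length($0) } END { print total }' example.fasta)
echo "Total bases: $total_bases"
```

Each query is a short pipeline in which one tool's output feeds the next, which is the usual way these utilities are combined.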
+
+#### Networking Tools
+Networking tools in Unix, such as `ping`, `curl`, `wget`, `scp` and `ssh`, are essential for managing data transfer and connectivity, particularly where accessing remote servers and datasets is commonplace. The `ping` command is used to test network connectivity and diagnose potential issues, while `curl` and `wget` facilitate downloading files from the web or retrieving data from APIs. The `scp` (secure copy) command transfers files between local and remote systems over an encrypted SSH connection, protecting the data in transit. Additionally, `ssh` provides a secure method for accessing and managing remote servers, enabling users to execute commands and run analyses remotely.
+
+Mastering these networking tools is crucial for bioinformaticians, as they make it possible to access remote resources, share data and collaborate with others in a secure and efficient manner.
+
+#### Data Analysis & Shell Scripting
+Shell scripting allows users to automate complex workflows by combining multiple commands into a single executable script, thereby saving time and reducing the potential for human error. The ability to read and write scripts enables users to standardise their analyses, making it easier to reproduce results and share methodologies with collaborators. Additionally, shell scripting can be integrated with bioinformatics tools to streamline tasks such as sequence alignment, variant calling and data transformation.
+
+Mastering these skills not only enhances productivity but also empowers researchers to tackle intricate data challenges with confidence and efficiency, ultimately advancing scientific discovery.
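As a minimal sketch of the scripting ideas in this section, the example below writes a small script, makes it executable with `chmod`, and runs it with redirection and a pipe. The script name `count_reads.sh` and the FASTQ records are hypothetical, and the read count assumes the simple four-lines-per-record FASTQ layout.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A made-up FASTQ file with two records (four lines per record)
cat > reads.fastq <<'EOF'
@read1
ACGT
+
IIII
@read2
GGCA
+
IIII
EOF

# Write a hypothetical script that reports the number of reads in each file given
cat > count_reads.sh <<'EOF'
#!/bin/sh
# Usage: ./count_reads.sh FILE.fastq [FILE.fastq ...]
for fastq in "$@"; do
    # Feed the file to wc via "<" so only the line count is printed, then divide by 4
    echo "$fastq $(( $(wc -l < "$fastq") / 4 ))"
done
EOF

# chmod u+x grants the file's owner permission to execute the script
chmod u+x count_reads.sh

# Run the script and redirect its output to a file with ">"
./count_reads.sh reads.fastq > read_counts.txt
cat read_counts.txt

# Pipe: keep the sequence line of each record (line 2 of 4), then count distinct sequences
distinct_sequences=$(awk 'NR % 4 == 2' reads.fastq | sort | uniq | wc -l)
echo "Distinct sequences: $distinct_sequences"
```

Keeping the logic in a script file means the same count can be repeated on new files by passing them as arguments, for example `./count_reads.sh sample1.fastq sample2.fastq` (hypothetical file names).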
+
+## Hands on Exercises (work in progress)
+[CLI Terminal Online](https://sandbox.bio/tutorials/terminal-basics)
+
+[CLI Basics](https://docs.google.com/presentation/d/1lqLPnbV2v1Nc73YNGLC8g-8qovjIoJro/edit#slide=id.p10)
+
+[BASH Basics](https://www.linode.com/docs/guides/intro-bash-shell-scripting/)
+
+[LINUX cheat sheet](https://www.geeksforgeeks.org/linux-commands-cheat-sheet/)