diff --git a/level102/containerization_and_orchestration/orchestration_with_kubernetes/index.html b/level102/containerization_and_orchestration/orchestration_with_kubernetes/index.html index 6bce456c..deff95ea 100644 --- a/level102/containerization_and_orchestration/orchestration_with_kubernetes/index.html +++ b/level102/containerization_and_orchestration/orchestration_with_kubernetes/index.html @@ -2406,12 +2406,12 @@

Lab 2:

Curl the Service IP, 10.96.114.184. Each curl request reaches one of the 10 pods in the deployment “nginx-deployment”, with requests spread across the pods (roughly round robin; the exact selection depends on the kube-proxy mode in use). When we execute the expose command, a Kubernetes Service of type ClusterIP is created, so that all the pods behind this Service are accessible through a single cluster-internal IP (10.96.114.184, here).

It is possible to have a public IP instead (i.e., an actual external load balancer) by creating a Service of type LoadBalancer. Do feel free to play around with it!
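As a rough sketch of the LoadBalancer variant mentioned above, a Service manifest could look like the following. The Service name, the `app: nginx` selector label and port 80 are assumptions for illustration; they need to match the labels and container port actually used by the “nginx-deployment” pods in your lab.

```yaml
# Minimal sketch, assuming the pods carry the label app=nginx and listen on port 80.
apiVersion: v1
kind: Service
metadata:
  name: nginx-deployment-lb   # hypothetical name
spec:
  type: LoadBalancer          # use ClusterIP here for a cluster-internal IP only
  selector:
    app: nginx                # assumed pod label; must match the deployment's pod template
  ports:
    - protocol: TCP
      port: 80                # port exposed by the Service
      targetPort: 80          # assumed container port
```

On a local cluster without a cloud load balancer integration, the external IP of such a Service may stay pending, so an external address usually only appears on a managed or suitably configured cluster.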

-

The above exercises a pretty good exposure to using Kubernetes to manage large scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kuberenetes provides other resources like Job, Daemonset, Cronjob, Statefulset etc. to manage special use cases.

+

The above exercises give a pretty good exposure to using Kubernetes to manage large-scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kubernetes provides other resources like Job, DaemonSet, CronJob, StatefulSet etc. to manage special use cases.

Additional labs: https://kubernetes.courselabs.co/ (Huge number of free follow-along exercises to play with Kubernetes)

Advanced topics

More often than not, microservices orchestrated with Kubernetes contain dozens of instances of resources like deployments, services and configs. The manifests for these applications can be auto-generated with Helm templates and passed on as Helm charts. Similar to how we have PyPI for Python packages, there are remote repositories like Bitnami where Helm charts (e.g., for setting up a production-ready Prometheus or Kafka with a single click) can be downloaded and used. This is a good place to begin.

-

Kuberenetes provides the flexibility to create our custom resources (similar to Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource with kind as SchoolOfSre you can! The only thing is that you have to write your custom resource for it. You can also build a custom operator for your custom resource to take certain actions on the resource instance. You can check here for more information.

+

Kubernetes provides the flexibility to create our own custom resources (similar to the Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource of kind SchoolOfSre, you can! The only thing is that you have to write a custom resource definition for it. You can also build a custom operator for your custom resource to take certain actions on the resource instance. You can check here for more information.
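To make the SchoolOfSre example a little more concrete, here is a minimal sketch of a custom resource definition followed by one instance of it. The API group `example.com`, the `topic` field and the instance name are hypothetical and chosen only for illustration; only the kind SchoolOfSre comes from the text above.

```yaml
# Sketch of a CustomResourceDefinition; the group and schema fields are assumptions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: schoolofsres.example.com   # must be <plural>.<group>
spec:
  group: example.com               # hypothetical API group
  scope: Namespaced
  names:
    plural: schoolofsres
    singular: schoolofsre
    kind: SchoolOfSre
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                topic:             # hypothetical field
                  type: string
---
# One instance of the custom resource defined above.
apiVersion: example.com/v1
kind: SchoolOfSre
metadata:
  name: sre-instance-1             # hypothetical name
spec:
  topic: kubernetes
```

It is usually safer to apply the definition first and the instance afterwards, since the API server needs a moment to start serving the new kind. Without an operator watching these objects, the instances are simply stored and nothing acts on them.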

diff --git a/search/search_index.json b/search/search_index.json index 16a04ac8..c25f3758 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"School of SRE Site Reliability Engineers (SREs) sits at the intersection of software engineering and systems engineering. While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these systems are proprietary, 3rd party, open systems, run on cloud/on-prem infrastructure, etc. Particularly important is to gain a deep understanding of how these areas of systems and infrastructure relate to each other and interact with each other. The combination of software and systems engineering skills is rare and is generally built over time with exposure to a wide variety of infrastructure, systems, and software. SREs bring in engineering practices to keep the site up. Each distributed system is an agglomeration of many components. SREs validate business requirements, convert them to SLAs for each of the components that constitute the distributed system, monitor and measure adherence to SLAs, re-architect or scale out to mitigate or avoid SLA breaches, add these learnings as feedback to new systems or projects and thereby reduce operational toil. Hence SREs play a vital role right from the day 0 design of the system. In early 2019, we started visiting campuses across India to recruit the best and brightest minds to make sure LinkedIn, and all the services that make up its complex technology stack are always available for everyone. 
This critical function at LinkedIn falls under the purview of the Site Engineering team and Site Reliability Engineers (SREs) who are Software Engineers, specialized in reliability. As we continued on this journey we started getting a lot of questions from these campuses on what exactly the site reliability engineering role entails? And, how could someone learn the skills and the disciplines involved to become a successful site reliability engineer? Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team; we also had a few lateral hires who joined our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the Site Engineering team. There are very few resources out there guiding someone on the basic skill sets one has to acquire as a beginner SRE. Because of the lack of these resources, we felt that individuals have a tough time getting into open positions in the industry. We created the School Of SRE as a starting point for anyone wanting to build their career as an SRE. In this course, we are focusing on building strong foundational skills. The course is structured in a way to provide more real life examples and how learning each of these topics can play an important role in day to day job responsibilities of an SRE. 
Currently we are covering the following topics under the School Of SRE: Level 101 Fundamentals Series Linux Basics Git Linux Networking Python and Web Data Relational databases(MySQL) NoSQL concepts Big Data Systems Design Metrics and Monitoring Security Level 102 Linux Intermediate Linux Advanced Containers and orchestration System Calls and Signals Networking System Design System troubleshooting and performance improvements Continuous Integration and Continuous Delivery We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references that could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer. At LinkedIn, we are using this curriculum for onboarding our non-traditional hires and new college grads into the SRE role. We had multiple rounds of successful onboarding experiences with new employees and the course helped them be productive in a very short period of time. This motivated us to open source the content for helping other organizations in onboarding new engineers into the role and provide guidance for aspiring individuals to get into the role. We realize that the initial content we created is just a starting point and we hope that the community can help in the journey of refining and expanding the content. Check out the contributing guide to get started.","title":"Home"},{"location":"#school-of-sre","text":"Site Reliability Engineers (SREs) sits at the intersection of software engineering and systems engineering. 
While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these systems are proprietary, 3rd party, open systems, run on cloud/on-prem infrastructure, etc. Particularly important is to gain a deep understanding of how these areas of systems and infrastructure relate to each other and interact with each other. The combination of software and systems engineering skills is rare and is generally built over time with exposure to a wide variety of infrastructure, systems, and software. SREs bring in engineering practices to keep the site up. Each distributed system is an agglomeration of many components. SREs validate business requirements, convert them to SLAs for each of the components that constitute the distributed system, monitor and measure adherence to SLAs, re-architect or scale out to mitigate or avoid SLA breaches, add these learnings as feedback to new systems or projects and thereby reduce operational toil. Hence SREs play a vital role right from the day 0 design of the system. In early 2019, we started visiting campuses across India to recruit the best and brightest minds to make sure LinkedIn, and all the services that make up its complex technology stack are always available for everyone. This critical function at LinkedIn falls under the purview of the Site Engineering team and Site Reliability Engineers (SREs) who are Software Engineers, specialized in reliability. As we continued on this journey we started getting a lot of questions from these campuses on what exactly the site reliability engineering role entails? And, how could someone learn the skills and the disciplines involved to become a successful site reliability engineer? 
Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team; we also had a few lateral hires who joined our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the Site Engineering team. There are very few resources out there guiding someone on the basic skill sets one has to acquire as a beginner SRE. Because of the lack of these resources, we felt that individuals have a tough time getting into open positions in the industry. We created the School Of SRE as a starting point for anyone wanting to build their career as an SRE. In this course, we are focusing on building strong foundational skills. The course is structured in a way to provide more real life examples and how learning each of these topics can play an important role in day to day job responsibilities of an SRE. Currently we are covering the following topics under the School Of SRE: Level 101 Fundamentals Series Linux Basics Git Linux Networking Python and Web Data Relational databases(MySQL) NoSQL concepts Big Data Systems Design Metrics and Monitoring Security Level 102 Linux Intermediate Linux Advanced Containers and orchestration System Calls and Signals Networking System Design System troubleshooting and performance improvements Continuous Integration and Continuous Delivery We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references that could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer. At LinkedIn, we are using this curriculum for onboarding our non-traditional hires and new college grads into the SRE role. 
We had multiple rounds of successful onboarding experiences with new employees and the course helped them be productive in a very short period of time. This motivated us to open source the content for helping other organizations in onboarding new engineers into the role and provide guidance for aspiring individuals to get into the role. We realize that the initial content we created is just a starting point and we hope that the community can help in the journey of refining and expanding the content. Check out the contributing guide to get started.","title":"School of SRE"},{"location":"CODE_OF_CONDUCT/","text":"This code of conduct outlines expectations for participation in LinkedIn-managed open source communities, as well as steps for reporting unacceptable behavior. We are committed to providing a welcoming and inspiring community for all. People violating this code of conduct may be banned from the community. Our open source communities strive to: Be friendly and patient: Remember you might not be communicating in someone else's primary spoken or programming language, and others may not have your level of understanding. Be welcoming: Our communities welcome and support people of all backgrounds and identities. This includes, but is not limited to members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability. Be respectful: We are a world-wide community of professionals, and we conduct ourselves professionally. Disagreement is no excuse for poor behavior and poor manners. Disrespectful and unacceptable behavior includes, but is not limited to: Violent threats or language. Discriminatory or derogatory jokes and language. Posting sexually explicit or violent material. Posting, or threatening to post, people's personally identifying information (\"doxing\"). 
Insults, especially those using discriminatory terms or slurs. Behavior that could be perceived as sexual attention. Advocating for or encouraging any of the above behaviors. Understand disagreements: Disagreements, both social and technical, are useful learning opportunities. Seek to understand the other viewpoints and resolve differences constructively. This code is not exhaustive or complete. It serves to capture our common understanding of a productive, collaborative environment. We expect the code to be followed in spirit as much as in the letter. Scope This code of conduct applies to all repos and communities for LinkedIn-managed open source projects regardless of whether or not the repo explicitly calls out its use of this code. The code also applies in public spaces when an individual is representing a project or its community. Examples include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. Note: Some LinkedIn-managed communities have codes of conduct that pre-date this document and issue resolution process. While communities are not required to change their code, they are expected to use the resolution process outlined here. The review team will coordinate with the communities involved to address your concerns. Reporting Code of Conduct Issues We encourage all communities to resolve issues on their own whenever possible. This builds a broader and deeper understanding and ultimately a healthier interaction. In the event that an issue cannot be resolved locally, please feel free to report your concerns by contacting oss@linkedin.com . In your report please include: Your contact information. Names (real, usernames or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well. 
Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public chat log), please include a link or attachment. Any additional information that may be helpful. All reports will be reviewed by a multi-person team and will result in a response that is deemed necessary and appropriate to the circumstances. Where additional perspectives are needed, the team may seek insight from others with relevant expertise or experience. The confidentiality of the person reporting the incident will be kept at all times. Involved parties are never part of the review team. Anyone asked to stop unacceptable behavior is expected to comply immediately. If an individual engages in unacceptable behavior, the review team may take any action they deem appropriate, including a permanent ban from the community. This code of conduct is based on the Microsoft Open Source Code of Conduct which was based on the template established by the TODO Group and used by numerous other large communities (e.g., Facebook , Yahoo , Twitter , GitHub ) and the Scope section from the Contributor Covenant version 1.4 .","title":"Code of Conduct"},{"location":"CODE_OF_CONDUCT/#scope","text":"This code of conduct applies to all repos and communities for LinkedIn-managed open source projects regardless of whether or not the repo explicitly calls out its use of this code. The code also applies in public spaces when an individual is representing a project or its community. Examples include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. Note: Some LinkedIn-managed communities have codes of conduct that pre-date this document and issue resolution process. 
While communities are not required to change their code, they are expected to use the resolution process outlined here. The review team will coordinate with the communities involved to address your concerns.","title":"Scope"},{"location":"CODE_OF_CONDUCT/#reporting-code-of-conduct-issues","text":"We encourage all communities to resolve issues on their own whenever possible. This builds a broader and deeper understanding and ultimately a healthier interaction. In the event that an issue cannot be resolved locally, please feel free to report your concerns by contacting oss@linkedin.com . In your report please include: Your contact information. Names (real, usernames or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well. Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public chat log), please include a link or attachment. Any additional information that may be helpful. All reports will be reviewed by a multi-person team and will result in a response that is deemed necessary and appropriate to the circumstances. Where additional perspectives are needed, the team may seek insight from others with relevant expertise or experience. The confidentiality of the person reporting the incident will be kept at all times. Involved parties are never part of the review team. Anyone asked to stop unacceptable behavior is expected to comply immediately. If an individual engages in unacceptable behavior, the review team may take any action they deem appropriate, including a permanent ban from the community. 
This code of conduct is based on the Microsoft Open Source Code of Conduct which was based on the template established by the TODO Group and used by numerous other large communities (e.g., Facebook , Yahoo , Twitter , GitHub ) and the Scope section from the Contributor Covenant version 1.4 .","title":"Reporting Code of Conduct Issues"},{"location":"CONTRIBUTING/","text":"We realise that the initial content we created is just a starting point and our hope is that the community can help in the journey refining and extending the contents. As a contributor, you represent that the content you submit is not plagiarised. By submitting the content, you (and, if applicable, your employer) are licensing the submitted content to LinkedIn and the open source community subject to the Creative Commons Attribution 4.0 International Public License. Repository URL : https://github.com/linkedin/school-of-sre Contributing Guidelines Ensure that you adhere to the following guidelines: Should be about principles and concepts that can be applied in any company or individual project. Do not focus on particular tools or tech stack(which usually change over time). Adhere to the Code of Conduct . Should be relevant to the roles and responsibilities of an SRE. Should be locally tested (see steps for testing) and well formatted. It is good practice to open an issue first and discuss your changes before submitting a pull request. This way, you can incorporate ideas from others before you even start. Building and testing locally Run the following commands to build and view the site locally before opening a PR. python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt mkdocs build mkdocs serve Opening a PR Follow the GitHub PR workflow for your contributions. 
Fork this repo, create a feature branch, commit your changes and open a PR to this repo.","title":"Contribute"},{"location":"CONTRIBUTING/#contributing-guidelines","text":"Ensure that you adhere to the following guidelines: Should be about principles and concepts that can be applied in any company or individual project. Do not focus on particular tools or tech stack(which usually change over time). Adhere to the Code of Conduct . Should be relevant to the roles and responsibilities of an SRE. Should be locally tested (see steps for testing) and well formatted. It is good practice to open an issue first and discuss your changes before submitting a pull request. This way, you can incorporate ideas from others before you even start.","title":"Contributing Guidelines"},{"location":"CONTRIBUTING/#building-and-testing-locally","text":"Run the following commands to build and view the site locally before opening a PR. python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt mkdocs build mkdocs serve","title":"Building and testing locally"},{"location":"CONTRIBUTING/#opening-a-pr","text":"Follow the GitHub PR workflow for your contributions. Fork this repo, create a feature branch, commit your changes and open a PR to this repo.","title":"Opening a PR"},{"location":"sre_community/","text":"We are having an active LinkedIn community for School of SRE. Please join the group via : https://www.linkedin.com/groups/12493545/ The group has members with different levels of experience in site reliability engineering. There are active conversation on different technical topics centered around site reliability engineering. 
We encourage everyone to join the conversation and learn from each other and build a successful career in the SRE space.","title":"SRE Community"},{"location":"level101/big_data/evolution/","text":"Evolution of Hadoop Architecture of Hadoop HDFS The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS is part of the Apache Hadoop Core project . The main components of HDFS include: 1. NameNode: is the arbitrator and central repository of file namespace in the cluster. The NameNode executes the operations such as opening, closing, and renaming files and directories. 2. DataNode: manages the storage attached to the node on which it runs. It is responsible for serving all the read and writes requests. It performs operations on instructions on NameNode such as creation, deletion, and replications of blocks. 3. Client: Responsible for getting the required metadata from the namenode and then communicating with the datanodes for reads and writes. YARN YARN stands for \u201cYet Another Resource Negotiator\u201c. It was introduced in Hadoop 2.0 to remove the bottleneck on Job Tracker which was present in Hadoop 1.0. YARN was described as a \u201cRedesigned Resource Manager\u201d at the time of its launching, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing. The main components of YARN architecture include: 1. Client: It submits map-reduce(MR) jobs to the resource manager. 2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. 
Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components: 1. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources. 2. Application manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Manager container if a task fails. 3. Node Manager: It takes care of individual nodes on the Hadoop cluster and manages application and workflow and that particular node. Its primary job is to keep up with the Node Manager. It monitors resource usage, performs log management, and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it at the request of the Application master. 4. Application Master: An application is a single job submitted to a framework. The application manager is responsible for negotiating resources with the resource manager, tracking the status, and monitoring the progress of a single application. The application master requests the container from the node manager by sending a Container Launch Context(CLC) which includes everything an application needs to run. Once the application is started, it sends the health report to the resource manager from time-to-time. 5. Container: It is a collection of physical resources such as RAM, CPU cores, and disk on a single node. The containers are invoked by Container Launch Context(CLC) which is a record that contains information such as environment variables, security tokens, dependencies, etc. 
MapReduce framework The term MapReduce represents two separate and distinct tasks Hadoop programs perform-Map Job and Reduce Job. Map jobs take data sets as input and process them to produce key-value pairs. Reduce job takes the output of the Map job i.e. the key-value pairs and aggregates them to produce desired results. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. Mapreduce helps to split the input data set into a number of parts and run a program on all data parts parallel at once. Please find the below Word count example demonstrating the usage of the MapReduce framework: Other tooling around Hadoop Hive Uses a language called HQL which is very SQL like. Gives non-programmers the ability to query and analyze data in Hadoop. Is basically an abstraction layer on top of map-reduce. Ex. HQL query: SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name); In mysql: SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name; Pig Uses a scripting language called Pig Latin, which is more workflow driven. Don't need to be an expert Java programmer but need a few coding skills. Is also an abstraction layer on top of map-reduce. Here is a quick question for you: What is the output of running the pig queries in the right column against the data present in the left column in the below image? Output: 7,Komal,Nayak,24,9848022334,trivendram 8,Bharathi,Nambiayar,24,9848022333,Chennai 5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 6,Archana,Mishra,23,9848022335,Chennai Spark Spark provides primitives for in-memory cluster computing that allows user programs to load data into a cluster\u2019s memory and query it repeatedly, making it well suited to machine learning algorithms. Presto Presto is a high performance, distributed SQL query engine for Big Data. 
Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. Example presto query: use studentDB; show tables; SELECT roll_no, name FROM studentDB.studentDetails where section=\u2019A\u2019 limit 5; Data Serialisation and storage In order to transport the data over the network or to store on some persistent storage, we use the process of translating data structures or objects state into binary or textual form. We call this process serialization.. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file. Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.","title":"Evolution and Architecture of Hadoop"},{"location":"level101/big_data/evolution/#evolution-of-hadoop","text":"","title":"Evolution of Hadoop"},{"location":"level101/big_data/evolution/#architecture-of-hadoop","text":"HDFS The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS is part of the Apache Hadoop Core project . The main components of HDFS include: 1. NameNode: is the arbitrator and central repository of file namespace in the cluster. The NameNode executes the operations such as opening, closing, and renaming files and directories. 2. DataNode: manages the storage attached to the node on which it runs. It is responsible for serving all the read and writes requests. It performs operations on instructions on NameNode such as creation, deletion, and replications of blocks. 3. 
Client: Responsible for getting the required metadata from the namenode and then communicating with the datanodes for reads and writes. YARN YARN stands for \u201cYet Another Resource Negotiator\u201c. It was introduced in Hadoop 2.0 to remove the bottleneck on Job Tracker which was present in Hadoop 1.0. YARN was described as a \u201cRedesigned Resource Manager\u201d at the time of its launching, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing. The main components of YARN architecture include: 1. Client: It submits map-reduce(MR) jobs to the resource manager. 2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components: 1. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources. 2. Application manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Manager container if a task fails. 3. Node Manager: It takes care of individual nodes on the Hadoop cluster and manages application and workflow and that particular node. Its primary job is to keep up with the Node Manager. It monitors resource usage, performs log management, and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it at the request of the Application master. 4. 
Application Master: An application is a single job submitted to a framework. The application master is responsible for negotiating resources with the resource manager, tracking the status, and monitoring the progress of a single application. The application master requests the container from the node manager by sending a Container Launch Context (CLC) which includes everything an application needs to run. Once the application is started, it sends the health report to the resource manager from time to time. 5. Container: It is a collection of physical resources such as RAM, CPU cores, and disk on a single node. The containers are invoked by a Container Launch Context (CLC), which is a record that contains information such as environment variables, security tokens, dependencies, etc.","title":"Architecture of Hadoop"},{"location":"level101/big_data/evolution/#mapreduce-framework","text":"The term MapReduce represents two separate and distinct tasks that Hadoop programs perform: the Map job and the Reduce job. Map jobs take data sets as input and process them to produce key-value pairs. The Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired results. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce helps to split the input data set into a number of parts and run a program on all the parts in parallel. The word count example below demonstrates the usage of the MapReduce framework:","title":"MapReduce framework"},{"location":"level101/big_data/evolution/#other-tooling-around-hadoop","text":"Hive Uses a language called HQL which is very SQL like. Gives non-programmers the ability to query and analyze data in Hadoop. Is basically an abstraction layer on top of map-reduce. Ex. 
HQL query: SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name); In mysql: SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name; Pig Uses a scripting language called Pig Latin, which is more workflow driven. Don't need to be an expert Java programmer but need a few coding skills. Is also an abstraction layer on top of map-reduce. Here is a quick question for you: What is the output of running the pig queries in the right column against the data present in the left column in the below image? Output: 7,Komal,Nayak,24,9848022334,trivendram 8,Bharathi,Nambiayar,24,9848022333,Chennai 5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 6,Archana,Mishra,23,9848022335,Chennai Spark Spark provides primitives for in-memory cluster computing that allows user programs to load data into a cluster\u2019s memory and query it repeatedly, making it well suited to machine learning algorithms. Presto Presto is a high performance, distributed SQL query engine for Big Data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. Example presto query: use studentDB; show tables; SELECT roll_no, name FROM studentDB.studentDetails where section=\u2019A\u2019 limit 5;","title":"Other tooling around Hadoop"},{"location":"level101/big_data/evolution/#data-serialisation-and-storage","text":"In order to transport the data over the network or to store on some persistent storage, we use the process of translating data structures or objects state into binary or textual form. We call this process serialization.. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file. Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.","title":"Data Serialisation and storage"},{"location":"level101/big_data/intro/","text":"Big Data Prerequisites Basics of Linux File systems. 
Basic understanding of System Design. What to expect from this course This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a perfect fit. An interesting assignment on designing a Big Data system is followed by understanding the architecture of Hadoop and the tooling around it. What is not covered under this course Writing programs to draw analytics from data. Course Contents Overview of Big Data Usage of Big Data techniques Evolution of Hadoop Architecture of hadoop HDFS Yarn MapReduce framework Other tooling around hadoop Hive Pig Spark Presto Data Serialisation and storage Overview of Big Data Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques, and frameworks. Big Data could consist of Structured data Unstructured data Semi-structured data Characteristics of Big Data: Volume Variety Velocity Variability Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc. Usage of Big Data Techniques Take the example of the traffic lights problem. There are more than 300,000 traffic lights in the US as of 2018. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system. If each of the IoT devices sends 10 events per minute, we have 300000x10x60x24 = 432x10^7 events per day. How would you go about processing that and telling me how many of the signals were \u201cgreen\u201d at 10:45 am on a particular day? Consider the next example on Unified Payments Interface (UPI) transactions: We had about 1.15 billion UPI transactions in the month of October 2019 in India. 
If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?","title":"Introduction"},{"location":"level101/big_data/intro/#big-data","text":"","title":"Big Data"},{"location":"level101/big_data/intro/#prerequisites","text":"Basics of Linux File systems. Basic understanding of System Design.","title":"Prerequisites"},{"location":"level101/big_data/intro/#what-to-expect-from-this-course","text":"This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a perfect fit. An interesting assignment on designing a Big Data system is followed by understanding the architecture of Hadoop and the tooling around it.","title":"What to expect from this course"},{"location":"level101/big_data/intro/#what-is-not-covered-under-this-course","text":"Writing programs to draw analytics from data.","title":"What is not covered under this course"},{"location":"level101/big_data/intro/#course-contents","text":"Overview of Big Data Usage of Big Data techniques Evolution of Hadoop Architecture of hadoop HDFS Yarn MapReduce framework Other tooling around hadoop Hive Pig Spark Presto Data Serialisation and storage","title":"Course Contents"},{"location":"level101/big_data/intro/#overview-of-big-data","text":"Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques, and frameworks. 
Big Data could consist of Structured data Unstructured data Semi-structured data Characteristics of Big Data: Volume Variety Velocity Variability Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.","title":"Overview of Big Data"},{"location":"level101/big_data/intro/#usage-of-big-data-techniques","text":"Take the example of the traffic lights problem. There are more than 300,000 traffic lights in the US as of 2018. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system. If each of the IoT devices sends 10 events per minute, we have 300000x10x60x24 = 432x10^7 events per day. How would you go about processing that and telling me how many of the signals were \u201cgreen\u201d at 10:45 am on a particular day? Consider the next example on Unified Payments Interface (UPI) transactions: We had about 1.15 billion UPI transactions in the month of October 2019 in India. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?","title":"Usage of Big Data Techniques"},{"location":"level101/big_data/tasks/","text":"Tasks and conclusion Post-training tasks: Try setting up your own 3 node Hadoop cluster. A VM based solution can be found here Write a simple spark/MR job of your choice and understand how to generate analytics from data. Sample dataset can be found here References: Hadoop documentation HDFS Architecture YARN Architecture Google GFS paper","title":"Conclusion"},{"location":"level101/big_data/tasks/#tasks-and-conclusion","text":"","title":"Tasks and conclusion"},{"location":"level101/big_data/tasks/#post-training-tasks","text":"Try setting up your own 3 node Hadoop cluster. A VM based solution can be found here Write a simple spark/MR job of your choice and understand how to generate analytics from data. 
Sample dataset can be found here","title":"Post-training tasks:"},{"location":"level101/big_data/tasks/#references","text":"Hadoop documentation HDFS Architecture YARN Architecture Google GFS paper","title":"References:"},{"location":"level101/databases_nosql/further_reading/","text":"Conclusion We have covered basic concepts of NoSQL databases. There is much more to learn and do. We hope this course gives you a good start and inspires you to explore further. Further reading NoSQL: https://hostingdata.co.uk/nosql-database/ https://www.mongodb.com/nosql-explained https://www.mongodb.com/nosql-explained/nosql-vs-sql Cap Theorem http://www.julianbrowne.com/article/brewers-cap-theorem Scalability http://www.slideshare.net/jboner/scalability-availability-stability-patterns Eventual Consistency https://www.allthingsdistributed.com/2008/12/eventually_consistent.html https://www.toptal.com/big-data/consistent-hashing https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf","title":"Conclusion"},{"location":"level101/databases_nosql/further_reading/#conclusion","text":"We have covered basic concepts of NoSQL databases. There is much more to learn and do. 
We hope this course gives you a good start and inspires you to explore further.","title":"Conclusion"},{"location":"level101/databases_nosql/further_reading/#further-reading","text":"NoSQL: https://hostingdata.co.uk/nosql-database/ https://www.mongodb.com/nosql-explained https://www.mongodb.com/nosql-explained/nosql-vs-sql Cap Theorem http://www.julianbrowne.com/article/brewers-cap-theorem Scalability http://www.slideshare.net/jboner/scalability-availability-stability-patterns Eventual Consistency https://www.allthingsdistributed.com/2008/12/eventually_consistent.html https://www.toptal.com/big-data/consistent-hashing https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf","title":"Further reading"},{"location":"level101/databases_nosql/intro/","text":"NoSQL Concepts Prerequisites Relational Databases What to expect from this course At the end of training, you will have an understanding of what a NoSQL database is, what kind of advantages or disadvantages it has over traditional RDBMS, learn about different types of NoSQL databases and understand some of the underlying concepts & trade offs w.r.t to NoSQL. What is not covered under this course We will not be deep diving into any specific NoSQL Database. Course Contents Introduction to NoSQL CAP Theorem Data versioning Partitioning Hashing Quorum Introduction When people use the term \u201cNoSQL database\u201d, they typically use it to refer to any non-relational database. Some say the term \u201cNoSQL\u201d stands for \u201cnon SQL\u201d while others say it stands for \u201cnot only SQL.\u201d Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables. A common misconception is that NoSQL databases or non-relational databases don\u2019t store relationship data well. NoSQL databases can store relationship data\u2014they just store it differently than relational databases do. 
In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier , because related data doesn\u2019t have to be split between tables. Such databases have existed since the late 1960s, but the name \"NoSQL\" was only coined in the early 21st century. NASA used a NoSQL database to track inventory for the Apollo mission. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity. With the rise of Agile development methodology, NoSQL databases were developed with a focus on scaling, fast performance and at the same time allowed for frequent application changes and made programming easier. Types of NoSQL databases: Over time due to the way these NoSQL databases were developed to suit requirements at different companies, we ended up with quite a few types of them. However, they can be broadly classified into 4 types. Some of the databases can overlap between different types. They are Document databases: They store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with objects developers are working with in code. The advantages include intuitive data model & flexible schemas. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can horizontally scale-out to accomodate large data volumes. 
Ex: MongoDB, Couchbase Key-Value databases: These are a simpler type of databases where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don\u2019t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: Redis , DynamoDB , Voldemort / Venice (Linkedin), Wide-Column stores: They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores. Graph Databases: These databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and \u201cstore\u201d the graph data in a table (although a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. 
Ex: Neo4j Comparison Performance Scalability Flexibility Complexity Functionality Key Value high high high none Variable Document stores high Variable (high) high low Variable (low) Column DB high high moderate low minimal Graph Variable Variable high high Graph theory Differences between SQL and NoSQL The table below summarizes the main differences between SQL and NoSQL databases. SQL Databases NoSQL Databases Data Storage Model Tables with fixed rows and columns Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges Primary Purpose General purpose Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data Schemas Rigid Flexible Scaling Vertical (scale-up with a larger server) Horizontal (scale-out across commodity servers) Multi-Record ACID Transactions Supported Most do not support multi-record ACID transactions. However, some\u2014like MongoDB\u2014do. Joins Typically required Typically not required Data to Object Mapping Requires ORM (object-relational mapping) Many do not require ORMs. Document DB documents map directly to data structures in most popular programming languages. Advantages Flexible Data Models Most NoSQL systems feature flexible schemas. A flexible schema means you can easily modify your database schema to add or remove fields to support for evolving application requirements. This facilitates with continuous application development of new features without database operation overhead. Horizontal Scaling Most NoSQL systems allow you to scale horizontally, which means you can add in cheaper & commodity hardware, whenever you want to scale a system. On the other hand SQL systems generally scale Vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems. 
Fast Queries NoSQL can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together facilitating faster query responses. Developer productivity NoSQL systems tend to map data based on the programming data structures. As a result developers need to perform fewer data transformations leading to increased productivity & fewer bugs.","title":"Introduction"},{"location":"level101/databases_nosql/intro/#nosql-concepts","text":"","title":"NoSQL Concepts"},{"location":"level101/databases_nosql/intro/#prerequisites","text":"Relational Databases","title":"Prerequisites"},{"location":"level101/databases_nosql/intro/#what-to-expect-from-this-course","text":"At the end of training, you will have an understanding of what a NoSQL database is, what kind of advantages or disadvantages it has over traditional RDBMS, learn about different types of NoSQL databases and understand some of the underlying concepts & trade offs w.r.t to NoSQL.","title":"What to expect from this course"},{"location":"level101/databases_nosql/intro/#what-is-not-covered-under-this-course","text":"We will not be deep diving into any specific NoSQL Database.","title":"What is not covered under this course"},{"location":"level101/databases_nosql/intro/#course-contents","text":"Introduction to NoSQL CAP Theorem Data versioning Partitioning Hashing Quorum","title":"Course Contents"},{"location":"level101/databases_nosql/intro/#introduction","text":"When people use the term \u201cNoSQL database\u201d, they typically use it to refer to any non-relational database. Some say the term \u201cNoSQL\u201d stands for \u201cnon SQL\u201d while others say it stands for \u201cnot only SQL.\u201d Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables. 
A common misconception is that NoSQL databases or non-relational databases don\u2019t store relationship data well. NoSQL databases can store relationship data\u2014they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier , because related data doesn\u2019t have to be split between tables. Such databases have existed since the late 1960s, but the name \"NoSQL\" was only coined in the early 21st century. NASA used a NoSQL database to track inventory for the Apollo mission. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity. With the rise of Agile development methodology, NoSQL databases were developed with a focus on scaling, fast performance and at the same time allowed for frequent application changes and made programming easier.","title":"Introduction"},{"location":"level101/databases_nosql/intro/#types-of-nosql-databases","text":"Over time due to the way these NoSQL databases were developed to suit requirements at different companies, we ended up with quite a few types of them. However, they can be broadly classified into 4 types. Some of the databases can overlap between different types. They are Document databases: They store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with objects developers are working with in code. The advantages include intuitive data model & flexible schemas. 
Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can horizontally scale-out to accomodate large data volumes. Ex: MongoDB, Couchbase Key-Value databases: These are a simpler type of databases where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don\u2019t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: Redis , DynamoDB , Voldemort / Venice (Linkedin), Wide-Column stores: They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores. Graph Databases: These databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and \u201cstore\u201d the graph data in a table (although a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). 
Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Ex: Neo4j","title":"Types of NoSQL databases:"},{"location":"level101/databases_nosql/intro/#comparison","text":"Performance Scalability Flexibility Complexity Functionality Key Value high high high none Variable Document stores high Variable (high) high low Variable (low) Column DB high high moderate low minimal Graph Variable Variable high high Graph theory","title":"Comparison"},{"location":"level101/databases_nosql/intro/#differences-between-sql-and-nosql","text":"The table below summarizes the main differences between SQL and NoSQL databases. SQL Databases NoSQL Databases Data Storage Model Tables with fixed rows and columns Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges Primary Purpose General purpose Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data Schemas Rigid Flexible Scaling Vertical (scale-up with a larger server) Horizontal (scale-out across commodity servers) Multi-Record ACID Transactions Supported Most do not support multi-record ACID transactions. However, some\u2014like MongoDB\u2014do. Joins Typically required Typically not required Data to Object Mapping Requires ORM (object-relational mapping) Many do not require ORMs. Document DB documents map directly to data structures in most popular programming languages.","title":"Differences between SQL and NoSQL"},{"location":"level101/databases_nosql/intro/#advantages","text":"Flexible Data Models Most NoSQL systems feature flexible schemas. 
A flexible schema means you can easily modify your database schema to add or remove fields to support evolving application requirements. This facilitates continuous development of new features without database operation overhead. Horizontal Scaling Most NoSQL systems allow you to scale horizontally, which means you can add cheaper commodity hardware whenever you want to scale a system. On the other hand, SQL systems generally scale vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems. Fast Queries NoSQL can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together, facilitating faster query responses. Developer productivity NoSQL systems tend to map data based on the programming data structures. As a result developers need to perform fewer data transformations, leading to increased productivity & fewer bugs.","title":"Advantages"},{"location":"level101/databases_nosql/key_concepts/","text":"Key Concepts Let\u2019s look at some of the key concepts when we talk about NoSQL or distributed systems. CAP Theorem In a keynote titled \u201cTowards Robust Distributed Systems\u201d at ACM\u2019s PODC symposium in 2000, Eric Brewer came up with the so-called CAP-theorem, which is widely adopted today by large web companies as well as in the NoSQL community. The CAP acronym stands for Consistency, Availability & Partition Tolerance. Consistency It refers to how consistent a system is after an execution. A distributed system is called consistent when a write made by a source is available for all readers of that shared data. Different NoSQL systems support different levels of consistency. Availability It refers to how a system responds to loss of functionality of different systems due to hardware and software failures. 
A high availability implies that a system is still available to handle operations (reads and writes) when a certain part of the system is down due to a failure or upgrade. Partition Tolerance It is the ability of the system to continue operations in the event of a network partition. A network partition occurs when a failure causes two or more islands of networks where the systems can\u2019t talk to each other across the islands temporarily or permanently. Brewer alleges that one can at most choose two of these three characteristics in a shared-data system. The CAP-theorem states that a choice can only be made for two options out of consistency, availability and partition tolerance. A growing number of use cases in large scale applications tend to value reliability implying that availability & redundancy are more valuable than consistency. As a result these systems struggle to meet ACID properties. They attain this by loosening on the consistency requirement i.e Eventual Consistency. Eventual Consistency means that all readers will see writes, as time goes on: \u201cIn a steady state, the system will eventually return the last written value\u201d. Clients therefore may face an inconsistent state of data as updates are in progress. For instance, in a replicated database updates may go to one node which replicates the latest version to all other nodes that contain a replica of the modified dataset so that the replica nodes eventually will have the latest version. NoSQL systems support different levels of eventual consistency models. For example: Read Your Own Writes Consistency Clients will see their updates immediately after they are written. The reads can hit nodes other than the one where it was written. However they might not see updates by other clients immediately. Session Consistency Clients will see the updates to their data within a session scope. This generally indicates that reads & writes occur on the same server. 
Other clients using the same nodes will receive the same updates. Causal Consistency A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders. Eventual consistency is useful if concurrent updates of the same partitions of data are unlikely and if clients do not immediately depend on reading updates issued by themselves or by other clients. The consistency model chosen for the system (or parts of it) determines where requests are routed, e.g. to which replicas. CAP alternatives illustration Choice Traits Examples Consistency + Availability (Forfeit Partitions) 2-phase commits Cache invalidation protocols Single-site databases Cluster databases LDAP xFS file system Consistency + Partition tolerance (Forfeit Availability) Pessimistic locking Make minority partitions unavailable Distributed databases Distributed locking Majority protocols Availability + Partition tolerance (Forfeit Consistency) expirations/leases conflict resolution optimistic DNS Web caching Versioning of Data in distributed systems When data is distributed across nodes, it can be modified on different nodes at the same time (assuming strict consistency is not enforced). Questions arise on conflict resolution for concurrent updates. Some of the popular conflict resolution mechanisms are: Timestamps This is the most obvious solution. You sort updates based on chronological order and choose the latest update. However this relies on clock synchronization across different parts of the infrastructure. This gets even more complicated when parts of systems are spread across different geographic locations. Optimistic Locking You associate a unique value like a clock or counter with every data update. When a client wants to update data, it has to specify which version of data needs to be updated. 
This would mean you need to keep track of the history of the data versions. Vector Clocks A vector clock is defined as a tuple of clock values from each node. In a distributed environment, each node maintains a tuple of such clock values which represent the state of the node itself and its peers/replicas. A clock value may be a real timestamp derived from the local clock or a version number. Vector clocks illustration Vector clocks have the following advantages over other conflict resolution mechanisms: No dependency on synchronized clocks No total ordering of revision numbers required for causal reasoning No need to store and maintain multiple versions of the data on different nodes. Partitioning When the amount of data crosses the capacity of a single node, we need to think of splitting data, creating replicas for load balancing & disaster recovery. Depending on how dynamic the infrastructure is, we have a few approaches that we can take. Memory cached These are partitioned in-memory databases that are primarily used for transient data. These databases are generally used as a front for traditional RDBMS. Most frequently used data is replicated from an RDBMS into a memory database to facilitate fast queries and to take the load off the backend DBs. A very common example is memcached or couchbase. Clustering Traditional cluster mechanisms abstract away the cluster topology from clients. A client need not know where the actual data is residing and which node it is talking to. Clustering is very commonly used in traditional RDBMS where it can help scale the persistent layer to a certain extent. Separating reads from writes In this method, you will have multiple replicas hosting the same data. The incoming writes are typically sent to a single node (Leader) or multiple nodes (multi-Leader), while the rest of the replicas (Follower) handle read requests. The leader replicates writes asynchronously to all followers. However the write lag can\u2019t be completely avoided. 
Sometimes a leader can crash before it replicates all the data to a follower. When this happens, a follower with the most consistent data can be turned into a leader. As you can realize now, it is hard to enforce full consistency in this model. You also need to consider the ratio of read vs write traffic. This model won\u2019t make sense when writes are higher than reads. The replication methods can also vary widely. Some systems do a complete transfer of state periodically, while others use a delta state transfer approach. You could also transfer the state by transferring the operations in order. The followers can then apply the same operations as the leader to catch up. Sharding Sharing refers to dividing data in such a way that data is distributed evenly (both in terms of storage & processing power) across a cluster of nodes. It can also imply data locality, which means similar & related data is stored together to facilitate faster access. A shard in turn can be further replicated to meet load balancing or disaster recovery requirements. A single shard replica might take in all writes (single leader) or multiple replicas can take writes (multi-leader). Reads can be distributed across multiple replicas. Since data is now distributed across multiple nodes, clients should be able to consistently figure out where data is hosted. We will look at some of the common techniques below. The downside of sharding is that joins between shards is not possible. So an upstream/downstream application has to aggregate the results from multiple shards. Sharding example Hashing A hash function is a function that maps one piece of data\u2014typically describing some kind of object, often of arbitrary size\u2014to another piece of data, typically an integer, known as hash code , or simply hash . In a partitioned database, it is important to consistently map a key to a server/replica. For ex: you can use a very simple hash as a modulo function. 
_p = k mod n_ Where p -> partition, k -> primary key n -> no of nodes The downside of this simple hash is that, whenever the cluster topology changes, the data distribution also changes. When you are dealing with memory caches, it will be easy to distribute partitions around. Whenever a node joins/leaves a topology, partitions can reorder themselves, a cache miss can be re-populated from backend DB. However when you look at persistent data, it is not possible as the new node doesn\u2019t have the data needed to serve it. This brings us to consistent hashing. Consistent Hashing Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring . This allows servers and objects to scale without affecting the overall system. Say that our hash function h() generates a 32-bit integer. Then, to determine to which server we will send a key k, we find the server s whose hash h(s) is the smallest integer that is larger than h(k). To make the process simpler, we assume the table is circular, which means that if we cannot find a server with a hash larger than h(k), we wrap around and start looking from the beginning of the array. Consistent hashing illustration In consistent hashing when a server is removed or added then only the keys from that server are relocated. For example, if server S3 is removed then, all keys from server S3 will be moved to server S4 but keys stored on server S4 and S2 are not relocated. But there is one problem, when server S3 is removed then keys from S3 are not equally distributed among remaining servers S4 and S2. They are only assigned to server S4 which increases the load on server S4. To evenly distribute the load among servers when a server is added or removed, it creates a fixed number of replicas ( known as virtual nodes) of each server and distributes it along the circle. 
So instead of server labels S1, S2 and S3, we will have S10 S11\u2026S19, S20 S21\u2026S29 and S30 S31\u2026S39. The factor for a number of replicas is also known as weight , depending on the situation. All keys which are mapped to replicas Sij are stored on server Si. To find a key we do the same thing, find the position of the key on the circle and then move forward until you find a server replica. If the server replica is Sij then the key is stored in server Si. Suppose server S3 is removed, then all S3 replicas with labels S30 S31 \u2026 S39 must be removed. Now the objects keys adjacent to S3X labels will be automatically re-assigned to S1X, S2X and S4X. All keys originally assigned to S1, S2 & S4 will not be moved. Similar things happen if we add a server. Suppose we want to add a server S5 as a replacement of S3 then we need to add labels S50 S51 \u2026 S59. In the ideal case, one-fourth of keys from S1, S2 and S4 will be reassigned to S5. When applied to persistent storages, further issues arise: if a node has left the scene, data stored on this node becomes unavailable, unless it has been replicated to other nodes before; in the opposite case of a new node joining the others, adjacent nodes are no longer responsible for some pieces of data which they still store but not get asked for anymore as the corresponding objects are no longer hashed to them by requesting clients. In order to address this issue, a replication factor (r) can be introduced. Introducing replicas in a partitioning scheme\u2014besides reliability benefits\u2014also makes it possible to spread workload for read requests that can go to any physical node responsible for a requested piece of data. Scalability doesn\u2019t work if the clients have to decide between multiple versions of the dataset, because they need to read from a quorum of servers which in turn reduces the efficiency of load balancing. 
Quorum Quorum is the minimum number of nodes in a cluster that must be online and be able to communicate with each other. If any additional node failure occurs beyond this threshold, the cluster will stop running. To attain a quorum, you need a majority of the nodes. Commonly it is (N/2 + 1), where N is the total no of nodes in the system. For ex, In a 3 node cluster, you need 2 nodes for a majority, In a 5 node cluster, you need 3 nodes for a majority, In a 6 node cluster, you need 4 nodes for a majority. Quorum example Network problems can cause communication failures among cluster nodes. One set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This is known as split brain in cluster or cluster partitioning. Now the partition which has quorum is allowed to continue running the application. The other partitions are removed from the cluster. Eg: In a 5 node cluster, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 loses communication with other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run. Below diagram demonstrates Quorum selection on a cluster partitioned into two sets. 
Cluster Quorum example","title":"Key Concepts"},{"location":"level101/databases_nosql/key_concepts/#key-concepts","text":"Lets looks at some of the key concepts when we talk about NoSQL or distributed systems","title":"Key Concepts"},{"location":"level101/databases_nosql/key_concepts/#cap-theorem","text":"In a keynote titled \u201c Towards Robust Distributed Systems \u201d at ACM\u2019s PODC symposium in 2000 Eric Brewer came up with the so-called CAP-theorem which is widely adopted today by large web companies as well as in the NoSQL community. The CAP acronym stands for C onsistency, A vailability & P artition Tolerance. Consistency It refers to how consistent a system is after an execution. A distributed system is called consistent when a write made by a source is available for all readers of that shared data. Different NoSQL systems support different levels of consistency. Availability It refers to how a system responds to loss of functionality of different systems due to hardware and software failures. A high availability implies that a system is still available to handle operations (reads and writes) when a certain part of the system is down due to a failure or upgrade. Partition Tolerance It is the ability of the system to continue operations in the event of a network partition. A network partition occurs when a failure causes two or more islands of networks where the systems can\u2019t talk to each other across the islands temporarily or permanently. Brewer alleges that one can at most choose two of these three characteristics in a shared-data system. The CAP-theorem states that a choice can only be made for two options out of consistency, availability and partition tolerance. A growing number of use cases in large scale applications tend to value reliability implying that availability & redundancy are more valuable than consistency. As a result these systems struggle to meet ACID properties. 
They attain this by loosening the consistency requirement, i.e. Eventual Consistency. Eventual Consistency means that all readers will see writes, as time goes on: \u201cIn a steady state, the system will eventually return the last written value\u201d. Clients therefore may face an inconsistent state of data as updates are in progress. For instance, in a replicated database updates may go to one node which replicates the latest version to all other nodes that contain a replica of the modified dataset so that the replica nodes eventually will have the latest version. NoSQL systems support different levels of eventual consistency models. For example: Read Your Own Writes Consistency Clients will see their updates immediately after they are written. The reads can hit nodes other than the one where it was written. However they might not see updates by other clients immediately. Session Consistency Clients will see the updates to their data within a session scope. This generally indicates that reads & writes occur on the same server. Other clients using the same nodes will receive the same updates. Causal Consistency A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders. Eventual consistency is useful if concurrent updates of the same partitions of data are unlikely and if clients do not immediately depend on reading updates issued by themselves or by other clients. The consistency model chosen for the system (or parts of it) determines where the requests are routed, e.g. to which replicas. 
CAP alternatives illustration Choice Traits Examples Consistency + Availability (Forfeit Partitions) 2-phase commits Cache invalidation protocols Single-site databases Cluster databases LDAP xFS file system Consistency + Partition tolerance (Forfeit Availability) Pessimistic locking Make minority partitions unavailable Distributed databases Distributed locking Majority protocols Availability + Partition tolerance (Forfeit Consistency) expirations/leases conflict resolution optimistic DNS Web caching","title":"CAP Theorem"},{"location":"level101/databases_nosql/key_concepts/#versioning-of-data-in-distributed-systems","text":"When data is distributed across nodes, it can be modified on different nodes at the same time (assuming strict consistency is not enforced). Questions arise on conflict resolution for concurrent updates. Some of the popular conflict resolution mechanisms are Timestamps This is the most obvious solution. You sort updates based on chronological order and choose the latest update. However this relies on clock synchronization across different parts of the infrastructure. This gets even more complicated when parts of systems are spread across different geographic locations. Optimistic Locking You associate a unique value like a clock or counter with every data update. When a client wants to update data, it has to specify which version of data needs to be updated. This would mean you need to keep track of history of the data versions. Vector Clocks A vector clock is defined as a tuple of clock values from each node. In a distributed environment, each node maintains a tuple of such clock values which represent the state of the node itself and its peers/replicas. A clock value may be a real timestamp derived from the local clock or a version number. 
Vector clocks illustration Vector clocks have the following advantages over other conflict resolution mechanisms No dependency on synchronized clocks No total ordering of revision numbers required for causal reasoning No need to store and maintain multiple versions of the data on different nodes.","title":"Versioning of Data in distributed systems"},{"location":"level101/databases_nosql/key_concepts/#partitioning","text":"When the amount of data crosses the capacity of a single node, we need to think of splitting data, creating replicas for load balancing & disaster recovery. Depending on how dynamic the infrastructure is, we have a few approaches that we can take. Memory cached These are partitioned in-memory databases that are primarily used for transient data. These databases are generally used as a front for traditional RDBMS. Most frequently used data is replicated from an RDBMS into a memory database to facilitate fast queries and to take the load off from backend DBs. A very common example is Memcached or Couchbase. Clustering Traditional cluster mechanisms abstract away the cluster topology from clients. A client need not know where the actual data is residing and which node it is talking to. Clustering is very commonly used in traditional RDBMS where it can help scaling the persistent layer to a certain extent. Separating reads from writes In this method, you will have multiple replicas hosting the same data. The incoming writes are typically sent to a single node (Leader) or multiple nodes (multi-Leader), while the rest of the replicas (Follower) handle read requests. The leader replicates writes asynchronously to all followers. However, the replication lag can\u2019t be completely avoided. Sometimes a leader can crash before it replicates all the data to a follower. When this happens, a follower with the most up-to-date data can be turned into a leader. As you can see, it is hard to enforce full consistency in this model. 
You also need to consider the ratio of read vs write traffic. This model won\u2019t make sense when writes are higher than reads. The replication methods can also vary widely. Some systems do a complete transfer of state periodically, while others use a delta state transfer approach. You could also transfer the state by transferring the operations in order. The followers can then apply the same operations as the leader to catch up. Sharding Sharding refers to dividing data in such a way that data is distributed evenly (both in terms of storage & processing power) across a cluster of nodes. It can also imply data locality, which means similar & related data is stored together to facilitate faster access. A shard in turn can be further replicated to meet load balancing or disaster recovery requirements. A single shard replica might take in all writes (single leader) or multiple replicas can take writes (multi-leader). Reads can be distributed across multiple replicas. Since data is now distributed across multiple nodes, clients should be able to consistently figure out where data is hosted. We will look at some of the common techniques below. The downside of sharding is that joins across shards are not possible. So an upstream/downstream application has to aggregate the results from multiple shards. Sharding example","title":"Partitioning"},{"location":"level101/databases_nosql/key_concepts/#hashing","text":"A hash function is a function that maps one piece of data\u2014typically describing some kind of object, often of arbitrary size\u2014to another piece of data, typically an integer, known as hash code , or simply hash . In a partitioned database, it is important to consistently map a key to a server/replica. For example, you can use a very simple hash as a modulo function. _p = k mod n_ Where p -> partition, k -> primary key n -> no of nodes The downside of this simple hash is that, whenever the cluster topology changes, the data distribution also changes. 
When you are dealing with memory caches, it will be easy to distribute partitions around. Whenever a node joins/leaves a topology, partitions can reorder themselves, a cache miss can be re-populated from backend DB. However when you look at persistent data, it is not possible as the new node doesn\u2019t have the data needed to serve it. This brings us to consistent hashing.","title":"Hashing"},{"location":"level101/databases_nosql/key_concepts/#consistent-hashing","text":"Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring . This allows servers and objects to scale without affecting the overall system. Say that our hash function h() generates a 32-bit integer. Then, to determine to which server we will send a key k, we find the server s whose hash h(s) is the smallest integer that is larger than h(k). To make the process simpler, we assume the table is circular, which means that if we cannot find a server with a hash larger than h(k), we wrap around and start looking from the beginning of the array. Consistent hashing illustration In consistent hashing when a server is removed or added then only the keys from that server are relocated. For example, if server S3 is removed then, all keys from server S3 will be moved to server S4 but keys stored on server S4 and S2 are not relocated. But there is one problem, when server S3 is removed then keys from S3 are not equally distributed among remaining servers S4 and S2. They are only assigned to server S4 which increases the load on server S4. To evenly distribute the load among servers when a server is added or removed, it creates a fixed number of replicas ( known as virtual nodes) of each server and distributes it along the circle. So instead of server labels S1, S2 and S3, we will have S10 S11\u2026S19, S20 S21\u2026S29 and S30 S31\u2026S39. 
The factor for a number of replicas is also known as weight , depending on the situation. All keys which are mapped to replicas Sij are stored on server Si. To find a key we do the same thing, find the position of the key on the circle and then move forward until you find a server replica. If the server replica is Sij then the key is stored in server Si. Suppose server S3 is removed, then all S3 replicas with labels S30 S31 \u2026 S39 must be removed. Now the objects keys adjacent to S3X labels will be automatically re-assigned to S1X, S2X and S4X. All keys originally assigned to S1, S2 & S4 will not be moved. Similar things happen if we add a server. Suppose we want to add a server S5 as a replacement of S3 then we need to add labels S50 S51 \u2026 S59. In the ideal case, one-fourth of keys from S1, S2 and S4 will be reassigned to S5. When applied to persistent storages, further issues arise: if a node has left the scene, data stored on this node becomes unavailable, unless it has been replicated to other nodes before; in the opposite case of a new node joining the others, adjacent nodes are no longer responsible for some pieces of data which they still store but not get asked for anymore as the corresponding objects are no longer hashed to them by requesting clients. In order to address this issue, a replication factor (r) can be introduced. Introducing replicas in a partitioning scheme\u2014besides reliability benefits\u2014also makes it possible to spread workload for read requests that can go to any physical node responsible for a requested piece of data. 
Scalability doesn\u2019t work if the clients have to decide between multiple versions of the dataset, because they need to read from a quorum of servers which in turn reduces the efficiency of load balancing.","title":"Consistent Hashing"},{"location":"level101/databases_nosql/key_concepts/#quorum","text":"Quorum is the minimum number of nodes in a cluster that must be online and be able to communicate with each other. If any additional node failure occurs beyond this threshold, the cluster will stop running. To attain a quorum, you need a majority of the nodes. Commonly it is (N/2 + 1), where N is the total no of nodes in the system. For ex, In a 3 node cluster, you need 2 nodes for a majority, In a 5 node cluster, you need 3 nodes for a majority, In a 6 node cluster, you need 4 nodes for a majority. Quorum example Network problems can cause communication failures among cluster nodes. One set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This is known as split brain in cluster or cluster partitioning. Now the partition which has quorum is allowed to continue running the application. The other partitions are removed from the cluster. Eg: In a 5 node cluster, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 loses communication with other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run. Below diagram demonstrates Quorum selection on a cluster partitioned into two sets. 
Cluster Quorum example","title":"Quorum"},{"location":"level101/databases_sql/backup_recovery/","text":"Backup and Recovery Backups are a very crucial part of any database setup. They are generally a copy of the data that can be used to reconstruct the data in case of any major or minor crisis with the database. In general terms backups can be of two types:- Physical Backup - the data directory as it is on the disk Logical Backup - the table structure and records in it Both the above kinds of backups are supported by MySQL with different tools. It is the job of the SRE to identify which should be used when. Mysqldump This utility is available with MySQL installation. It helps in getting the logical backup of the database. It outputs a set of SQL statements to reconstruct the data. It is not recommended to use mysqldump for large tables as it might take a lot of time and the file size will be huge. However, for small tables it is the best and the quickest option. mysqldump [options] > dump_output.sql There are certain options that can be used with mysqldump to get an appropriate dump of the database. To dump all the databases mysqldump -u -p --all-databases > all_dbs.sql To dump specific databases mysqldump -u -p --databases db1 db2 db3 > dbs.sql To dump a single database mysqldump -u -p --databases db1 > db1.sql OR mysqldump -u -p db1 > db1.sql The difference between the above two commands is that the latter one does not contain the CREATE DATABASE command in the backup output. 
To dump specific tables in a database mysqldump -u -p db1 table1 table2 > db1_tables.sql To dump only table structures and no data mysqldump -u -p --no-data db1 > db1_structure.sql To dump only table data and no CREATE statements mysqldump -u -p --no-create-info db1 > db1_data.sql To dump only specific records from a table mysqldump -u -p --no-create-info db1 table1 --where='salary>80000' > db1_table1_80000.sql Mysqldump can also provide output in CSV, other delimited text or XML format to support use-cases if any. The backup from mysqldump utility is offline i.e. when the backup finishes it will not have the changes to the database which were made when the backup was going on. For example, if the backup started at 3 PM and finished at 4 PM, it will not have the changes made to the database between 3 and 4 PM. Restoring from mysqldump can be done in the following two ways:- From shell mysql -u -p < all_dbs.sql OR From shell if the database is already created mysql -u -p db1 < db1.sql From within MySQL shell mysql> source all_dbs.sql Percona Xtrabackup This utility is installed separately from the MySQL server and is open source, provided by Percona. It helps in getting the full or partial physical backup of the database. It provides online backup of the database i.e. it will have the changes made to the database when the backup was going on as explained at the end of the previous section. Full Backup - the complete backup of the database. Partial Backup - Incremental Cumulative - After one full backup, the next backups will have changes post the full backup. For example, we took a full backup on Sunday, from Monday onwards every backup will have changes after Sunday; so, Tuesday\u2019s backup will have Monday\u2019s changes as well, Wednesday\u2019s backup will have changes of Monday and Tuesday as well and so on. Differential - After one full backup, the next backups will have changes post the previous incremental backup. 
For example, we took a full backup on Sunday, Monday will have changes done after Sunday, Tuesday will have changes done after Monday, and so on. Percona xtrabackup allows us to get both full and incremental backups as we desire. However, incremental backups take less space than a full backup (if taken per day) but the restore time of incremental backups is more than that of full backups. Creating a full backup xtrabackup --defaults-file= --user= --password= --backup --target-dir= Example xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/ Some other options --stream - can be used to stream the backup files to standard output in a specified format. xbstream is the only option for now. --tmp-dir - set this to a tmp directory to be used for temporary files while taking backups. --parallel - set this to the number of threads that can be used to parallely copy data files to target directory. --compress - by default - quicklz is used. Set this to have the backup in compressed format. Each file is a .qp compressed file and can be extracted by qpress file archiver. --decompress - decompresses all the files which were compressed with the .qp extension. It will not delete the .qp files after decompression. To do that, use --remove-original along with this. Please note that the decompress option should be run separately from the xtrabackup command that used the compress option. Preparing a backup Once the backup is done with the --backup option, we need to prepare it in order to restore it. This is done to make the datafiles consistent with point-in-time. There might have been some transactions going on while the backup was being executed and those have changed the data files. When we prepare a backup, all those transactions are applied to the data files. 
xtrabackup --prepare --target-dir= Example xtrabackup --prepare --target-dir=/mnt/data/backup/ It is not recommended to halt a process which is preparing the backup, as that might corrupt the data files and make the backup unusable. The backup will have to be taken again. Restoring a Full Backup To restore the backup which is created and prepared from above commands, just copy everything from the backup target-dir to the data-dir of MySQL server, change the ownership of all files to mysql user (the linux user used by MySQL server) and start mysql. Or the below command can be used as well, xtrabackup --defaults-file=/etc/my.cnf --copy-back --target-dir=/mnt/data/backup/ Note - the backup has to be prepared in order to restore it. Creating Incremental backups Percona Xtrabackup helps create incremental backups, i.e. only the changes since the last backup are backed up. Every InnoDB page contains a log sequence number or LSN that is also mentioned as one of the last lines of backup and prepare commands. xtrabackup: Transaction log of lsn to was copied. OR InnoDB: Shutdown completed; log sequence number completed OK! This indicates that the backup has been taken till the log sequence number mentioned. This is key information in understanding incremental backups and working towards automating one. Incremental backups do not compare data files for changes, instead, they go through the InnoDB pages and compare their LSN to the last backup\u2019s LSN. So, without one full backup, the incremental backups are useless. The xtrabackup command creates a xtrabackup_checkpoints file which has the information about the LSN of the backup. Below are the key contents of the file:- backup_type = full-backuped | incremental from_lsn = 0 (full backup) | to_lsn of last backup to_lsn = last_lsn = There is a difference between to_lsn and last_lsn . 
When the last_lsn is more than to_lsn that means there are transactions that ran while we took the backup and are yet to be applied. That is what --prepare is used for. To take incremental backups, first, we require one full backup. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/full/ Let\u2019s assume the contents of the xtrabackup_checkpoints file to be as follows. backup_type = full-backuped from_lsn = 0 to_lsn = 1000 last_lsn = 1000 Now that we have one full backup, we can have an incremental backup that takes the changes. We will go with differential incremental backups. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr1/ --incremental-basedir=/mnt/data/backup/full/ There are delta files created in the incr1 directory like, ibdata1.delta , db1/tbl1.ibd.delta with the changes from the full directory. The xtrabackup_checkpoints file will thus have the following contents. backup_type = incremental from_lsn = 1000 to_lsn = 1500 last_lsn = 1500 Hence, the from_lsn here is equal to the to_lsn of the last backup or the basedir provided for the incremental backups. For the next incremental backup we can use this incremental backup as the basedir. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr2/ --incremental-basedir=/mnt/data/backup/incr1/ The xtrabackup_checkpoints file will thus have the following contents. backup_type = incremental from_lsn = 1500 to_lsn = 2000 last_lsn = 2200 Preparing Incremental backups Preparing incremental backups is not the same as preparing a full backup. When prepare runs, two operations are performed - committed transactions are applied on the data files and uncommitted transactions are rolled back . 
While preparing incremental backups, we have to skip rollback of uncommitted transactions as it is likely that they might get committed in the next incremental backup. If we roll back uncommitted transactions, the further incremental backups cannot be applied. We use --apply-log-only option along with --prepare to avoid the rollback phase. From the last section, we had the following directories with complete backup /mnt/data/backup/full /mnt/data/backup/incr1 /mnt/data/backup/incr2 First, we prepare the full backup, but only with the --apply-log-only option. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full The output of the command will contain the following at the end. InnoDB: Shutdown complete; log sequence number 1000 Completed OK! Note the LSN mentioned at the end is the same as the to_lsn from the xtrabackup_checkpoints created for full backup. Next, we apply the changes from the first incremental backup to the full backup. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr1 This applies the delta files in the incremental directory to the full backup directory. It rolls the data files in the full backup directory forward to the time of incremental backup and applies the redo logs as usual. Lastly, we apply the last incremental backup same as the previous one with just a small change. xtrabackup --prepare --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr2 We do not have to use the --apply-log-only option with it. It applies the incr2 delta files to the full backup data files taking them forward, applies redo logs on them and finally rolls back the uncommitted transactions to produce the final result. The data now present in the full backup directory can now be used to restore. Note - To create cumulative incremental backups, the incremental-basedir should always be the full backup directory for every incremental backup. 
While preparing, we can start with the full backup with the --apply-log-only option and use just the last incremental backup for the final --prepare as that has all the changes since the full backup. Restoring Incremental backups Once all the above steps are completed, restoring is the same as done for a full backup. Further Reading MySQL Point-In-Time-Recovery Another MySQL backup tool - mysqlpump Another MySQL backup tool - mydumper A comparison between mysqldump, mysqlpump and mydumper Backup Best Practices","title":"Backup and Recovery"},{"location":"level101/databases_sql/backup_recovery/#backup-and-recovery","text":"Backups are a very crucial part of any database setup. They are generally a copy of the data that can be used to reconstruct the data in case of any major or minor crisis with the database. In general terms backups can be of two types:- Physical Backup - the data directory as it is on the disk Logical Backup - the table structure and records in it Both the above kinds of backups are supported by MySQL with different tools. It is the job of the SRE to identify which should be used when.","title":"Backup and Recovery"},{"location":"level101/databases_sql/backup_recovery/#mysqldump","text":"This utility is available with MySQL installation. It helps in getting the logical backup of the database. It outputs a set of SQL statements to reconstruct the data. It is not recommended to use mysqldump for large tables as it might take a lot of time and the file size will be huge. However, for small tables it is the best and the quickest option. mysqldump [options] > dump_output.sql There are certain options that can be used with mysqldump to get an appropriate dump of the database. 
To dump all the databases mysqldump -u -p --all-databases > all_dbs.sql To dump specific databases mysqldump -u -p --databases db1 db2 db3 > dbs.sql To dump a single database mysqldump -u -p --databases db1 > db1.sql OR mysqldump -u -p db1 > db1.sql The difference between the above two commands is that the latter one does not contain the CREATE DATABASE command in the backup output. To dump specific tables in a database mysqldump -u -p db1 table1 table2 > db1_tables.sql To dump only table structures and no data mysqldump -u -p --no-data db1 > db1_structure.sql To dump only table data and no CREATE statements mysqldump -u -p --no-create-info db1 > db1_data.sql To dump only specific records from a table mysqldump -u -p --no-create-info db1 table1 --where=\"salary>80000\" > db1_table1_80000.sql mysqldump can also produce output in CSV, other delimited text, or XML format if a use case needs it. The backup from mysqldump utility is offline i.e. when the backup finishes it will not have the changes to the database which were made when the backup was going on. For example, if the backup started at 3 PM and finished at 4 PM, it will not have the changes made to the database between 3 and 4 PM. Restoring from mysqldump can be done in the following two ways:- From shell mysql -u -p < all_dbs.sql OR From shell if the database is already created mysql -u -p db1 < db1.sql From within MySQL shell mysql> source all_dbs.sql","title":"Mysqldump"},{"location":"level101/databases_sql/backup_recovery/#percona-xtrabackup","text":"This utility is installed separately from the MySQL server and is open source, provided by Percona. It helps in getting the full or partial physical backup of the database. It provides online backup of the database i.e. it will have the changes made to the database when the backup was going on as explained at the end of the previous section. Full Backup - the complete backup of the database. 
Partial Backup - Incremental Cumulative - After one full backup, the next backups will have changes post the full backup. For example, we took a full backup on Sunday, from Monday onwards every backup will have changes after Sunday; so, Tuesday\u2019s backup will have Monday\u2019s changes as well, Wednesday\u2019s backup will have changes of Monday and Tuesday as well and so on. Differential - After one full backup, the next backups will have changes post the previous incremental backup. For example, we took a full backup on Sunday, Monday will have changes done after Sunday, Tuesday will have changes done after Monday, and so on. Percona xtrabackup allows us to get both full and incremental backups as we desire. Incremental backups take less space than a full backup (if taken per day), but restoring from incremental backups takes longer than restoring from a full backup. Creating a full backup xtrabackup --defaults-file= --user= --password= --backup --target-dir= Example xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/ Some other options --stream - can be used to stream the backup files to standard output in a specified format. xbstream is the only option for now. --tmp-dir - set this to a tmp directory to be used for temporary files while taking backups. --parallel - set this to the number of threads that can be used to copy data files to the target directory in parallel. --compress - by default - quicklz is used. Set this to have the backup in compressed format. Each file is a .qp compressed file and can be extracted by qpress file archiver. --decompress - decompresses all the files which were compressed with the .qp extension. It will not delete the .qp files after decompression. To do that, use --remove-original along with this. Please note that the decompress option should be run separately from the xtrabackup command that used the compress option. 
Preparing a backup Once the backup is done with the --backup option, we need to prepare it in order to restore it. This is done to make the datafiles consistent with point-in-time. There might have been some transactions going on while the backup was being executed and those have changed the data files. When we prepare a backup, all those transactions are applied to the data files. xtrabackup --prepare --target-dir= Example xtrabackup --prepare --target-dir=/mnt/data/backup/ It is not recommended to halt a process which is preparing the backup, as that might corrupt the data files and make the backup unusable. The backup will have to be taken again. Restoring a Full Backup To restore the backup which is created and prepared from above commands, just copy everything from the backup target-dir to the data-dir of MySQL server, change the ownership of all files to mysql user (the linux user used by MySQL server) and start mysql. Or the below command can be used as well, xtrabackup --defaults-file=/etc/my.cnf --copy-back --target-dir=/mnt/data/backup/ Note - the backup has to be prepared in order to restore it. Creating Incremental backups Percona Xtrabackup helps create incremental backups, i.e. only the changes since the last backup are backed up. Every InnoDB page contains a log sequence number or LSN that is also mentioned as one of the last lines of backup and prepare commands. xtrabackup: Transaction log of lsn to was copied. OR InnoDB: Shutdown completed; log sequence number completed OK! This indicates that the backup has been taken till the log sequence number mentioned. This is key information for understanding incremental backups and working towards automating them. Incremental backups do not compare data files for changes, instead, they go through the InnoDB pages and compare their LSN to the last backup\u2019s LSN. So, without one full backup, the incremental backups are useless. 
The xtrabackup command creates a xtrabackup_checkpoint file which has the information about the LSN of the backup. Below are the key contents of the file:- backup_type = full-backuped | incremental from_lsn = 0 (full backup) | to_lsn of last backup to_lsn = last_lsn = There is a difference between to_lsn and last_lsn . When the last_lsn is more than to_lsn that means there are transactions that ran while we took the backup and are yet to be applied. That is what --prepare is used for. To take incremental backups, first, we require one full backup. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/full/ Let\u2019s assume the contents of the xtrabackup_checkpoint file to be as follows. backup_type = full-backuped from_lsn = 0 to_lsn = 1000 last_lsn = 1000 Now that we have one full backup, we can have an incremental backup that takes the changes. We will go with differential incremental backups. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr1/ --incremental-basedir=/mnt/data/backup/full/ There are delta files created in the incr1 directory like, ibdata1.delta , db1/tbl1.ibd.delta with the changes from the full directory. The xtrabackup_checkpoint file will thus have the following contents. backup_type = incremental from_lsn = 1000 to_lsn = 1500 last_lsn = 1500 Hence, the from_lsn here is equal to the to_lsn of the last backup or the basedir provided for the incremental backups. For the next incremental backup we can use this incremental backup as the basedir. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr2/ --incremental-basedir=/mnt/data/backup/incr1/ The xtrabackup_checkpoint file will thus have the following contents. 
backup_type = incremental from_lsn = 1500 to_lsn = 2000 last_lsn = 2200 Preparing Incremental backups Preparing incremental backups is not the same as preparing a full backup. When prepare runs, two operations are performed - committed transactions are applied on the data files and uncommitted transactions are rolled back . While preparing incremental backups, we have to skip the rollback of uncommitted transactions, as they might get committed in the next incremental backup. If we roll back uncommitted transactions, further incremental backups cannot be applied. We use the --apply-log-only option along with --prepare to avoid the rollback phase. From the last section, we had the following directories with complete backup /mnt/data/backup/full /mnt/data/backup/incr1 /mnt/data/backup/incr2 First, we prepare the full backup, but only with the --apply-log-only option. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full The output of the command will contain the following at the end. InnoDB: Shutdown complete; log sequence number 1000 Completed OK! Note the LSN mentioned at the end is the same as the to_lsn from the xtrabackup_checkpoint created for full backup. Next, we apply the changes from the first incremental backup to the full backup. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr1 This applies the delta files in the incremental directory to the full backup directory. It rolls the data files in the full backup directory forward to the time of incremental backup and applies the redo logs as usual. Lastly, we apply the last incremental backup same as the previous one with just a small change. xtrabackup --prepare --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr2 We do not have to use the --apply-log-only option with it. 
It applies the incr2 delta files to the full backup data files taking them forward, applies redo logs on them and finally rolls back the uncommitted transactions to produce the final result. The data now present in the full backup directory can now be used to restore. Note - To create cumulative incremental backups, the incremental-basedir should always be the full backup directory for every incremental backup. While preparing, we can start with the full backup with the --apply-log-only option and use just the last incremental backup for the final --prepare as that has all the changes since the full backup. Restoring Incremental backups Once all the above steps are completed, restoring is the same as done for a full backup.","title":"Percona Xtrabackup"},{"location":"level101/databases_sql/backup_recovery/#further-reading","text":"MySQL Point-In-Time-Recovery Another MySQL backup tool - mysqlpump Another MySQL backup tool - mydumper A comparison between mysqldump, mysqlpump and mydumper Backup Best Practices","title":"Further Reading"},{"location":"level101/databases_sql/concepts/","text":"Relational DBs are used for data storage. Even a file can be used to store data, but relational DBs are designed with specific goals: Efficiency Ease of access and management Organized Handle relations between data (represented as tables) Transaction: a unit of work that can comprise multiple statements, executed together ACID properties Set of properties that guarantee data integrity of DB transactions Atomicity: Each transaction is atomic (succeeds or fails completely) Consistency: Transactions only result in valid state (which includes rules, constraints, triggers etc.) Isolation: Each transaction is executed independently of others safely within a concurrent system Durability: Completed transactions will not be lost due to any later failures Let\u2019s take some examples to illustrate the above properties. Account A has a balance of \u20b9200 & B has \u20b9400. 
Account A is transferring \u20b9100 to Account B. This transaction has a deduction from sender and an addition into the recipient\u2019s balance. If the first operation passes successfully while the second fails, A\u2019s balance would be \u20b9100 while B would be having \u20b9400 instead of \u20b9500. Atomicity in a DB ensures this partially failed transaction is rolled back. If the second operation above fails, it leaves the DB inconsistent (sum of balance of accounts before and after the operation is not the same). Consistency ensures that this does not happen. There are three operations, one to calculate interest for A\u2019s account, another to add that to A\u2019s account, then transfer \u20b9100 from B to A. Without isolation guarantees, concurrent execution of these 3 operations may lead to a different outcome every time. What happens if the system crashes before the transactions are written to disk? Durability ensures that the changes are applied correctly during recovery. Relational data Tables represent relations Columns (fields) represent attributes Rows are individual records Schema describes the structure of DB SQL A query language to interact with and manage data. CRUD operations - create, read, update, delete queries Management operations - create DBs/tables/indexes etc, backup, import/export, users, access controls Exercise: Classify the below queries into the four types - DDL (definition), DML(manipulation), DCL(control) and TCL(transactions) and explain in detail. insert, create, drop, delete, update, commit, rollback, truncate, alter, grant, revoke You can practise these in the lab section . Constraints Rules for data that can be stored. Query fails if you violate any of these defined on a table. Primary key: one or more columns that contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key. An index on it is created by default. Foreign key: links two tables together. 
Its value(s) match a primary key in a different table Not null: Does not allow null values Unique: Value of column must be unique across all rows Default: Provides a default value for a column if none is specified during insert Check: Allows only particular values (like Balance >= 0) Indexes Most indexes use B+ tree structure. Why use them: Speeds up queries (queries on large tables that fetch only a few rows, min/max queries, by eliminating rows from consideration etc) Types of indexes: unique, primary key, fulltext, secondary Write-heavy loads, mostly full table scans or accessing large number of rows etc. do not benefit from indexes Joins Allows you to fetch related data from multiple tables, linking them together with some common field. Powerful but also resource-intensive and makes scaling databases difficult. This is the cause of many slow performing queries when run at scale, and the solution is almost always to find ways to reduce the joins. Access control DBs have privileged accounts for admin tasks, and regular accounts for clients. There are fine-grained controls on what actions (DDL, DML etc. discussed earlier) are allowed for these accounts. DB first verifies the user credentials (authentication), and then examines whether this user is permitted to perform the request (authorization) by looking up this information in some internal tables. Other controls include activity auditing that allows examining the history of actions done by a user, and resource limits which define the number of queries, connections etc. allowed. Popular databases Commercial, closed source - Oracle, Microsoft SQL Server, IBM DB2 Open source with optional paid support - MySQL, MariaDB, PostgreSQL Individuals and small companies have always preferred open source DBs because of the huge cost associated with commercial software. 
In recent times, even large organizations have moved away from commercial software to open source alternatives because of the flexibility and cost savings associated with it. Lack of support is no longer a concern because of the paid support available from the developer and third parties. MySQL is the most widely used open source DB, and it is widely supported by hosting providers, making it easy for anyone to use. It is part of the popular Linux-Apache-MySQL-PHP ( LAMP ) stack that became popular in the 2000s. We have many more choices for a programming language, but the rest of that stack is still widely used.","title":"Key Concepts"},{"location":"level101/databases_sql/concepts/#popular-databases","text":"Commercial, closed source - Oracle, Microsoft SQL Server, IBM DB2 Open source with optional paid support - MySQL, MariaDB, PostgreSQL Individuals and small companies have always preferred open source DBs because of the huge cost associated with commercial software. In recent times, even large organizations have moved away from commercial software to open source alternatives because of the flexibility and cost savings associated with it. Lack of support is no longer a concern because of the paid support available from the developer and third parties. MySQL is the most widely used open source DB, and it is widely supported by hosting providers, making it easy for anyone to use. It is part of the popular Linux-Apache-MySQL-PHP ( LAMP ) stack that became popular in the 2000s. We have many more choices for a programming language, but the rest of that stack is still widely used.","title":"Popular databases"},{"location":"level101/databases_sql/conclusion/","text":"Conclusion We have covered basic concepts of SQL databases. We have also covered some of the tasks that an SRE may be responsible for - there is so much more to learn and do. We hope this course gives you a good start and inspires you to explore further. 
Further reading More practice with online resources like this one Normalization Routines , triggers Views Transaction isolation levels Sharding Setting up HA , monitoring , backups","title":"Conclusion"},{"location":"level101/databases_sql/conclusion/#conclusion","text":"We have covered basic concepts of SQL databases. We have also covered some of the tasks that an SRE may be responsible for - there is so much more to learn and do. We hope this course gives you a good start and inspires you to explore further.","title":"Conclusion"},{"location":"level101/databases_sql/conclusion/#further-reading","text":"More practice with online resources like this one Normalization Routines , triggers Views Transaction isolation levels Sharding Setting up HA , monitoring , backups","title":"Further reading"},{"location":"level101/databases_sql/innodb/","text":"Why should you use this? General purpose, row level locking, ACID support, transactions, crash recovery and multi-version concurrency control etc. Architecture Key components: Memory: Buffer pool: LRU cache of frequently used data(table and index) to be processed directly from memory, which speeds up processing. Important for tuning performance. Change buffer: Caches changes to secondary index pages when those pages are not in the buffer pool and merges it when they are fetched. Merging may take a long time and impact live queries. It also takes up part of the buffer pool. Avoids the extra I/O to read secondary indexes in. Adaptive hash index: Supplements InnoDB\u2019s B-Tree indexes with fast hash lookup tables like a cache. Slight performance penalty for misses, also adds maintenance overhead of updating it. Hash collisions cause AHI rebuilding for large DBs. Log buffer: Holds log data before flush to disk. Size of each above memory is configurable, and impacts performance a lot. Requires careful analysis of workload, available resources, benchmarking and tuning for optimal performance. 
Disk: Tables: Stores data within rows and columns. Indexes: Helps find rows with specific column values quickly, avoids full table scans. Redo Logs: all transactions are written to them, and after a crash, the recovery process corrects data written by incomplete transactions and replays any pending ones. Undo Logs: Records associated with a single transaction that contains information about how to undo the latest change by a transaction.","title":"InnoDB"},{"location":"level101/databases_sql/innodb/#why-should-you-use-this","text":"General purpose, row level locking, ACID support, transactions, crash recovery and multi-version concurrency control etc.","title":"Why should you use this?"},{"location":"level101/databases_sql/innodb/#architecture","text":"","title":"Architecture"},{"location":"level101/databases_sql/innodb/#key-components","text":"Memory: Buffer pool: LRU cache of frequently used data(table and index) to be processed directly from memory, which speeds up processing. Important for tuning performance. Change buffer: Caches changes to secondary index pages when those pages are not in the buffer pool and merges it when they are fetched. Merging may take a long time and impact live queries. It also takes up part of the buffer pool. Avoids the extra I/O to read secondary indexes in. Adaptive hash index: Supplements InnoDB\u2019s B-Tree indexes with fast hash lookup tables like a cache. Slight performance penalty for misses, also adds maintenance overhead of updating it. Hash collisions cause AHI rebuilding for large DBs. Log buffer: Holds log data before flush to disk. Size of each above memory is configurable, and impacts performance a lot. Requires careful analysis of workload, available resources, benchmarking and tuning for optimal performance. Disk: Tables: Stores data within rows and columns. Indexes: Helps find rows with specific column values quickly, avoids full table scans. 
Redo Logs: all transactions are written to them, and after a crash, the recovery process corrects data written by incomplete transactions and replays any pending ones. Undo Logs: Records associated with a single transaction that contains information about how to undo the latest change by a transaction.","title":"Key components:"},{"location":"level101/databases_sql/intro/","text":"Relational Databases Prerequisites Complete Linux course Install Docker (for lab section) What to expect from this course You will have an understanding of what relational databases are, their advantages, and some MySQL specific concepts. What is not covered under this course In depth implementation details Advanced topics like normalization, sharding Specific tools for administration Introduction The main purpose of database systems is to manage data. This includes storage, adding new data, deleting unused data, updating existing data, retrieving data within a reasonable response time, other maintenance tasks to keep the system running etc. 
Pre-reads RDBMS Concepts Course Contents Key Concepts MySQL Architecture InnoDB Backup and Recovery MySQL Replication Operational Concepts SELECT Query Query Performance Lab Further Reading","title":"Introduction"},{"location":"level101/databases_sql/intro/#relational-databases","text":"","title":"Relational Databases"},{"location":"level101/databases_sql/intro/#prerequisites","text":"Complete Linux course Install Docker (for lab section)","title":"Prerequisites"},{"location":"level101/databases_sql/intro/#what-to-expect-from-this-course","text":"You will have an understanding of what relational databases are, their advantages, and some MySQL specific concepts.","title":"What to expect from this course"},{"location":"level101/databases_sql/intro/#what-is-not-covered-under-this-course","text":"In depth implementation details Advanced topics like normalization, sharding Specific tools for administration","title":"What is not covered under this course"},{"location":"level101/databases_sql/intro/#introduction","text":"The main purpose of database systems is to manage data. This includes storage, adding new data, deleting unused data, updating existing data, retrieving data within a reasonable response time, other maintenance tasks to keep the system running etc.","title":"Introduction"},{"location":"level101/databases_sql/intro/#pre-reads","text":"RDBMS Concepts","title":"Pre-reads"},{"location":"level101/databases_sql/intro/#course-contents","text":"Key Concepts MySQL Architecture InnoDB Backup and Recovery MySQL Replication Operational Concepts SELECT Query Query Performance Lab Further Reading","title":"Course Contents"},{"location":"level101/databases_sql/lab/","text":"Prerequisites Install Docker Setup Create a working directory named sos or something similar, and cd into it. Enter the following into a file named my.cnf under a directory named custom. 
sos $ cat custom/my.cnf [mysqld] # These settings apply to MySQL server # You can set port, socket path, buffer size etc. # Below, we are configuring slow query settings slow_query_log=1 slow_query_log_file=/var/log/mysqlslow.log long_query_time=1 Start a container and enable slow query log with the following: sos $ docker run --name db -v custom:/etc/mysql/conf.d -e MYSQL_ROOT_PASSWORD=realsecret -d mysql:8 sos $ docker cp custom/my.cnf $(docker ps -qf \"name=db\"):/etc/mysql/conf.d/custom.cnf sos $ docker restart $(docker ps -qf \"name=db\") Import a sample database sos $ git clone git@github.com:datacharmer/test_db.git sos $ docker cp test_db $(docker ps -qf \"name=db\"):/home/test_db/ sos $ docker exec -it $(docker ps -qf \"name=db\") bash root@3ab5b18b0c7d:/# cd /home/test_db/ root@3ab5b18b0c7d:/# mysql -uroot -prealsecret mysql < employees.sql root@3ab5b18b0c7d:/etc# touch /var/log/mysqlslow.log root@3ab5b18b0c7d:/etc# chown mysql:mysql /var/log/mysqlslow.log Workshop 1: Run some sample queries Run the following $ mysql -uroot -prealsecret mysql mysql> # inspect DBs and tables # the last 4 are MySQL internal DBs mysql> show databases; +--------------------+ | Database | +--------------------+ | employees | | information_schema | | mysql | | performance_schema | | sys | +--------------------+ > use employees; mysql> show tables; +----------------------+ | Tables_in_employees | +----------------------+ | current_dept_emp | | departments | | dept_emp | | dept_emp_latest_date | | dept_manager | | employees | | salaries | | titles | +----------------------+ # read a few rows mysql> select * from employees limit 5; # filter data by conditions mysql> select count(*) from employees where gender = 'M' limit 5; # find count of particular data mysql> select count(*) from employees where first_name = 'Sachin'; Workshop 2: Use explain and explain analyze to profile a query, identify and add indexes required for improving performance # View all indexes on table #(\\G is to 
output vertically, replace it with a ; to get table output) mysql> show index from employees from employees\\G *************************** 1. row *************************** Table: employees Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: emp_no Collation: A Cardinality: 299113 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: Visible: YES Expression: NULL # This query uses an index, identified by 'key' field # By prefixing explain keyword to the command, # we get query plan (including key used) mysql> explain select * from employees where emp_no < 10005\\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 4 filtered: 100.00 Extra: Using where # Compare that to the next query which does not utilize any index mysql> explain select first_name, last_name from employees where first_name = 'Sachin'\\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299113 filtered: 10.00 Extra: Using where # Let's see how much time this query takes mysql> explain analyze select first_name, last_name from employees where first_name = 'Sachin'\\G *************************** 1. 
row *************************** EXPLAIN: -> Filter: (employees.first_name = 'Sachin') (cost=30143.55 rows=29911) (actual time=28.284..3952.428 rows=232 loops=1) -> Table scan on employees (cost=30143.55 rows=299113) (actual time=0.095..1996.092 rows=300024 loops=1) # Cost(estimated by query planner) is 30143.55 # actual time=28.284ms for first row, 3952.428 for all rows # Now lets try adding an index and running the query again mysql> create index idx_firstname on employees(first_name); Query OK, 0 rows affected (1.25 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain analyze select first_name, last_name from employees where first_name = 'Sachin'; +--------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------+ | -> Index lookup on employees using idx_firstname (first_name='Sachin') (cost=81.20 rows=232) (actual time=0.551..2.934 rows=232 loops=1) | +--------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) # Actual time=0.551ms for first row # 2.934ms for all rows. A huge improvement! # Also notice that the query involves only an index lookup, # and no table scan (reading all rows of table) # ..which vastly reduces load on the DB. Workshop 3: Identify slow queries on a MySQL server # Run the command below in two terminal tabs to open two shells into the container. 
docker exec -it $(docker ps -qf \"name=db\") bash # Open a mysql prompt in one of them and execute this command # We have configured to log queries that take longer than 1s, # so this sleep(3) will be logged mysql -uroot -prealsecret mysql mysql> select sleep(3); # Now, in the other terminal, tail the slow log to find details about the query root@62c92c89234d:/etc# tail -f /var/log/mysqlslow.log /usr/sbin/mysqld, Version: 8.0.21 (MySQL Community Server - GPL). started with: Tcp port: 3306 Unix socket: /var/run/mysqld/mysqld.sock Time Id Command Argument # Time: 2020-11-26T14:53:44.822348Z # User@Host: root[root] @ localhost [] Id: 9 # Query_time: 5.404938 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 1 use employees; # Time: 2020-11-26T14:53:58.015736Z # User@Host: root[root] @ localhost [] Id: 9 # Query_time: 10.000225 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 1 SET timestamp=1606402428; select sleep(3); These were simulated examples with minimal complexity. In real life, the queries would be much more complex and the explain/analyze and slow query logs would have more details.","title":"Lab"},{"location":"level101/databases_sql/mysql/","text":"MySQL architecture MySQL architecture enables you to select the right storage engine for your needs, and abstracts away all implementation details from the end users (application engineers and DBA ) who only need to know a consistent stable API. 
Application layer: Connection handling - each client gets its own connection, which is cached for the duration of access Authentication - server checks (username,password,host) info of client and allows/rejects connection Security - server determines whether the client has privileges to execute each query (check with show privileges command) Server layer: Services and utilities - backup/restore, replication, cluster etc. SQL interface - clients run queries for data access and manipulation SQL parser - creates a parse tree from the query (lexical/syntactic/semantic analysis and code generation) Optimizer - optimizes queries using various algorithms and data available to it (table-level stats), modifies queries, order of scanning, indexes to use etc. (check with explain command) Caches and buffers - cache stores query results, buffer pool (InnoDB) stores table and index data in LRU fashion Storage engine options: InnoDB: most widely used, transaction support, ACID compliant, supports row-level locking, crash recovery and multi-version concurrency control. Default since MySQL 5.5. MyISAM: fast, does not support transactions, provides table-level locking, great for read-heavy workloads, mostly in web and data warehousing. Default up to MySQL 5.1. Archive: optimised for high-speed inserts, compresses data as it is inserted, does not support transactions, ideal for storing and retrieving large amounts of seldom-referenced historical, archived data Memory: tables in memory. Fastest engine, supports table-level locking, does not support transactions, ideal for creating temporary tables or quick lookups, data is lost after a shutdown CSV: stores data in CSV files, great for integrating into other applications that use this format \u2026 etc. It is possible to migrate from one storage engine to another. However, this migration locks tables for all operations and is not online, as it changes the physical layout of the data. It takes a long time and is generally not recommended. 
Hence, choosing the right storage engine at the beginning is important. The general guideline is to use InnoDB unless you have a specific need for one of the other storage engines. Running mysql> SHOW ENGINES; shows you the supported engines on your MySQL server.","title":"MySQL"},{"location":"level101/databases_sql/mysql/#mysql-architecture","text":"MySQL architecture enables you to select the right storage engine for your needs, and abstracts away all implementation details from the end users (application engineers and DBAs) who only need to know a consistent stable API. Application layer: Connection handling - each client gets its own connection, which is cached for the duration of access Authentication - server checks (username,password,host) info of client and allows/rejects connection Security - server determines whether the client has privileges to execute each query (check with show privileges command) Server layer: Services and utilities - backup/restore, replication, cluster etc. SQL interface - clients run queries for data access and manipulation SQL parser - creates a parse tree from the query (lexical/syntactic/semantic analysis and code generation) Optimizer - optimizes queries using various algorithms and data available to it (table-level stats), modifies queries, order of scanning, indexes to use etc. (check with explain command) Caches and buffers - cache stores query results, buffer pool (InnoDB) stores table and index data in LRU fashion Storage engine options: InnoDB: most widely used, transaction support, ACID compliant, supports row-level locking, crash recovery and multi-version concurrency control. Default since MySQL 5.5. MyISAM: fast, does not support transactions, provides table-level locking, great for read-heavy workloads, mostly in web and data warehousing. Default up to MySQL 5.1. 
Archive: optimised for high speed inserts, compresses data as it is inserted, does not support transactions, ideal for storing and retrieving large amounts of seldom referenced historical, archived data Memory: tables in memory. Fastest engine, supports table-level locking, does not support transactions, ideal for creating temporary tables or quick lookups, data is lost after a shutdown CSV: stores data in CSV files, great for integrating into other applications that use this format \u2026 etc. It is possible to migrate from one storage engine to another. But this migration locks tables for all operations and is not online, as it changes the physical layout of the data. It takes a long time and is generally not recommended. Hence, choosing the right storage engine at the beginning is important. General guideline is to use InnoDB unless you have a specific need for one of the other storage engines. Running mysql> SHOW ENGINES; shows you the supported engines on your MySQL server.","title":"MySQL architecture"},{"location":"level101/databases_sql/operations/","text":"Explain and explain+analyze EXPLAIN analyzes query plans from the optimizer, including how tables are joined, which tables/rows are scanned etc. Explain analyze shows the above and additional info like execution cost, number of rows returned, time taken etc. This knowledge is useful to tweak queries and add indexes. Watch this performance tuning tutorial video . Checkout the lab section for a hands-on about indexes. Slow query logs Used to identify slow queries (configurable threshold), enabled in config or dynamically with a query Checkout the lab section about identifying slow queries. User management This includes creation and changes to users, like managing privileges, changing password etc. Backup and restore strategies, pros and cons Logical backup using mysqldump - slower but can be done online Physical backup (copy data directory or use xtrabackup) - quick backup/recovery. 
Copying the data directory requires locking or a shutdown. xtrabackup is an improvement because it supports backups without shutting down (hot backup). Others - PITR, snapshots etc. Crash recovery process using redo logs After a crash, when you restart the server, it reads the redo logs and replays modifications to recover Monitoring MySQL Key MySQL metrics: reads, writes, query runtime, errors, slow queries, connections, running threads, InnoDB metrics Key OS metrics: CPU, load, memory, disk I/O, network Replication Copies data from one instance to one or more instances. Helps in horizontal scaling, data protection, analytics and performance. Binlog dump thread on primary, replication I/O and SQL threads on secondary. Strategies include the standard asynchronous, semi-synchronous or group replication. High Availability Ability to cope with failure at software, hardware and network level. Essential for anyone who needs 99.9%+ uptime. Can be implemented with replication or clustering solutions from MySQL, Percona, Oracle etc. Requires expertise to set up and maintain. Failover can be manual, scripted or using tools like Orchestrator. Data directory Data is stored in a particular directory, with nested directories for the data contained in each database. There are also MySQL log files, InnoDB log files, server process ID file and some other configs. The data directory is configurable. MySQL configuration This can be done by passing parameters during startup, or in a file. There are a few standard paths where MySQL looks for config files; /etc/my.cnf is one of the commonly used paths. These options are organized under headers (mysqld for server and mysql for client); you can explore them more in the lab that follows. Logs MySQL has logs for various purposes - general query log, errors, binary logs (for replication), slow query log. 
Only the error log is enabled by default (to reduce I/O and storage requirements); the others can be enabled when required - by specifying config parameters at startup or running commands at runtime. Log destination can also be tweaked with config parameters.","title":"Operational Concepts"},{"location":"level101/databases_sql/query_performance/","text":"Query Performance Improvement Query performance is a crucial aspect of relational databases. If not tuned correctly, the select queries can become slow and painful for the application, and for the MySQL server as well. The important task is to identify the slow queries and try to improve their performance by either rewriting them or creating proper indexes on the tables involved. The Slow Query Log The slow query log contains SQL statements that take longer to execute than the threshold set in the config parameter long_query_time. These queries are the candidates for optimization. There are some good utilities to summarize the slow query logs, such as mysqldumpslow (provided by MySQL itself), pt-query-digest (provided by Percona), etc. Following are the config parameters that are used to enable and effectively catch slow queries Variable Explanation Example value slow_query_log Enables or disables slow query logs ON slow_query_log_file The location of the slow query log /var/lib/mysql/mysql-slow.log long_query_time Threshold time. The query that takes longer than this time is logged in slow query log 5 log_queries_not_using_indexes When enabled with the slow query log, the queries which do not make use of any index are also logged in the slow query log even though they take less time than long_query_time. ON So, for this section, we will be enabling slow_query_log, long_query_time will be set to 0.3 (300 ms), and log_queries_not_using_indexes will be enabled as well. Below are the queries that we will execute on the employees database. 
select * from employees where last_name = 'Koblick'; select * from salaries where salary >= 100000; select * from titles where title = 'Manager'; select * from employees where year(hire_date) = 1995; select year(e.hire_date), max(s.salary) from employees e join salaries s on e.emp_no=s.emp_no group by year(e.hire_date); Now, queries 1, 3 and 4 executed in under 300 ms, but if we check the slow query log, we will find them logged because they do not use any index. Queries 2 and 5 take longer than 300 ms and also do not use any index. Use the following command to get a summary of the slow query log mysqldumpslow /var/lib/mysql/mysql-slow.log The snapshot also contains a few queries that were run alongside the ones mentioned above. mysqldumpslow replaces the actual values with N (for numbers) and S (for strings). This can be overridden with the -a option; however, that increases the number of output lines if different values are used in similar queries. The EXPLAIN Plan The EXPLAIN command is used with any query that we want to analyze. It describes the query execution plan, i.e. how MySQL sees and executes the query. EXPLAIN works with Select, Insert, Update and Delete statements. It reports different aspects of the query, such as how tables are joined and whether indexes are used. The important thing here is to understand the basic Explain plan output of a query to determine its performance. 
Let's take the following query as an example, mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | salaries | NULL | ALL | NULL | NULL | NULL | NULL | 2838426 | 10.00 | Using where | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) The key aspects to understand in the above output are:- Partitions - the partitions considered while executing the query. It is only valid if the table is partitioned. Possible_keys - the list of indexes that were considered during creation of the execution plan. Key - the index that will be used while executing the query. Rows - the estimated number of rows to be examined during the execution. Filtered - the estimated percentage of examined rows that remain after the table condition is applied. The maximum and most optimized value for this field is 100. Extra - extra information on how MySQL evaluates the query, e.g. whether it uses only a where clause to match target rows, an index, a temporary table, etc. So, for the above query, we can determine that there are no partitions, there are no candidate indexes to be used and so no index is used at all, over 2M rows are expected to be examined and only about 10% of them are estimated to match, and lastly, only a where clause is used to match the target rows. Creating an Index Indexes are used to speed up selecting relevant rows for a given column value. Without an index, MySQL starts with the first row and goes through the entire table to find matching rows. If the table has too many rows, the operation becomes costly. 
With indexes, MySQL determines the position to start looking for the data without reading the full table. A primary key is also an index which is also the fastest and is stored along with the table data. Secondary indexes are stored outside of the table data and are used to further enhance the performance of SQL statements. Indexes are mostly stored as B-Trees, with some exceptions like spatial indexes use R-Trees and memory tables use hash indexes. There are 2 ways to create indexes:- While creating a table - if we know beforehand the columns that will drive the most number of where clauses in select queries, then we can put an index over them while creating a table. Altering a Table - To improve the performance of a troubling query, we create an index on a table which already has data in it using ALTER or CREATE INDEX command. This operation does not block the table but might take some time to complete depending on the size of the table. Let\u2019s look at the query that we discussed in the previous section. It\u2019s clear that scanning over 2M records is not a good idea when only 10% of those records are actually in the resultset. Hence, we create an index on the salary column of the salaries table. 
create index idx_salary on salaries(salary) OR alter table salaries add index idx_salary(salary) And the same explain plan now looks like this mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | salaries | NULL | ref | idx_salary | idx_salary | 4 | const | 13 | 100.00 | NULL | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) Now the index used is idx_salary, the one we recently created. The index actually helped examine only 13 records and all of them are in the resultset. Also, the query execution time is also reduced from over 700ms to almost negligible. Let\u2019s look at another example. Here we are searching for a specific combination of first_name and last_name. But, we might also search based on last_name only. 
mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 1.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) Now only about 1% of the almost 300K records is expected in the resultset. Although the query is quick here because we have only 300K records, it will become painful if the table grows to millions of records. In this case, we create an index on last_name and first_name, not separately, but as a composite index that includes both columns. 
create index idx_last_first on employees(last_name, first_name) mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 124 | const,const | 1 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) We chose to put last_name before first_name while creating the index as the optimizer starts from the leftmost prefix of the index while evaluating the query. For example, if we have a 3-column index like idx(c1, c2, c3), then the search capability of the index follows - (c1), (c1, c2) or (c1, c2, c3) i.e. if your where clause has only first_name this index won\u2019t work. mysql> explain select * from employees where first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 10.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) But, if you have only the last_name in the where clause, it will work as expected. 
mysql> explain select * from employees where last_name = 'Dredge'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) For another example, use the following queries:- create table employees_2 like employees; create table salaries_2 like salaries; alter table salaries_2 drop primary key; We made copies of employees and salaries tables without the Primary Key of salaries table to understand an example of Select with Join. When you have queries like the below, it becomes tricky to identify the pain point of the query. mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; 1860 rows in set (4.44 sec) This query is taking about 4.5 seconds to complete with 1860 rows in the resultset. Let\u2019s look at the Explain plan. There will be 2 records in the Explain plan as 2 tables are used in the query. 
mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | 1 | SIMPLE | s | NULL | ALL | NULL | NULL | NULL | NULL | 2837194 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY,idx_last_first | PRIMARY | 4 | employees.s.emp_no | 1 | 5.00 | Using where | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ 2 rows in set, 1 warning (0.00 sec) The rows are listed in order of evaluation, i.e. salaries_2 is evaluated first and then employees_2 is joined to it. As the plan shows, MySQL scans almost all the rows of the salaries_2 table and tries to match the employees_2 rows as per the join condition. Although the where clause is used in fetching the final resultset, the index corresponding to the where clause is not used for the employees_2 table. A join is generally faster when the columns on both sides of the join condition are indexed and have the same data type. So, let\u2019s create an index on the emp_no column of salaries_2 table and analyze the query again. 
create index idx_empno on salaries_2(emp_no); mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | 1 | SIMPLE | e | NULL | ref | PRIMARY,idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | | 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ 2 rows in set, 1 warning (0.00 sec) Now, not only did the index help the optimizer to examine only a few rows in both tables, it reversed the order of the tables in evaluation. The employees_2 table is evaluated first and rows are selected as per the index respective to the where clause. Then the records are joined to salaries_2 table as per the index used due to the join condition. The execution time of the query came down from 4.5s to 0.02s . mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'\\G 1860 rows in set (0.02 sec)","title":"Query Performance"},{"location":"level101/databases_sql/query_performance/#query-performance-improvement","text":"Query Performance is a very crucial aspect of relational databases. If not tuned correctly, the select queries can become slow and painful for the application, and for the MySQL server as well. 
The important task is to identify the slow queries and try to improve their performance by either rewriting them or creating proper indexes on the tables involved.","title":"Query Performance Improvement"},{"location":"level101/databases_sql/query_performance/#the-slow-query-log","text":"The slow query log contains SQL statements that take longer to execute than the threshold set in the config parameter long_query_time. These queries are the candidates for optimization. There are some good utilities to summarize the slow query logs, such as mysqldumpslow (provided by MySQL itself), pt-query-digest (provided by Percona), etc. Following are the config parameters that are used to enable and effectively catch slow queries Variable Explanation Example value slow_query_log Enables or disables slow query logs ON slow_query_log_file The location of the slow query log /var/lib/mysql/mysql-slow.log long_query_time Threshold time. The query that takes longer than this time is logged in slow query log 5 log_queries_not_using_indexes When enabled with the slow query log, the queries which do not make use of any index are also logged in the slow query log even though they take less time than long_query_time. ON So, for this section, we will be enabling slow_query_log, long_query_time will be set to 0.3 (300 ms), and log_queries_not_using_indexes will be enabled as well. Below are the queries that we will execute on the employees database. select * from employees where last_name = 'Koblick'; select * from salaries where salary >= 100000; select * from titles where title = 'Manager'; select * from employees where year(hire_date) = 1995; select year(e.hire_date), max(s.salary) from employees e join salaries s on e.emp_no=s.emp_no group by year(e.hire_date); Now, queries 1, 3 and 4 executed in under 300 ms, but if we check the slow query log, we will find them logged because they do not use any index. 
Queries 2 and 5 take longer than 300 ms and also do not use any index. Use the following command to get a summary of the slow query log mysqldumpslow /var/lib/mysql/mysql-slow.log The snapshot also contains a few queries that were run alongside the ones mentioned above. mysqldumpslow replaces the actual values with N (for numbers) and S (for strings). This can be overridden with the -a option; however, that increases the number of output lines if different values are used in similar queries.","title":"The Slow Query Log"},{"location":"level101/databases_sql/query_performance/#the-explain-plan","text":"The EXPLAIN command is used with any query that we want to analyze. It describes the query execution plan, i.e. how MySQL sees and executes the query. EXPLAIN works with Select, Insert, Update and Delete statements. It reports different aspects of the query, such as how tables are joined and whether indexes are used. The important thing here is to understand the basic Explain plan output of a query to determine its performance. Let's take the following query as an example, mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | salaries | NULL | ALL | NULL | NULL | NULL | NULL | 2838426 | 10.00 | Using where | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) The key aspects to understand in the above output are:- Partitions - the partitions considered while executing the query. It is only valid if the table is partitioned. 
Possible_keys - the list of indexes that were considered during creation of the execution plan. Key - the index that will be used while executing the query. Rows - the estimated number of rows to be examined during the execution. Filtered - the estimated percentage of examined rows that remain after the table condition is applied. The maximum and most optimized value for this field is 100. Extra - extra information on how MySQL evaluates the query, e.g. whether it uses only a where clause to match target rows, an index, a temporary table, etc. So, for the above query, we can determine that there are no partitions, there are no candidate indexes to be used and so no index is used at all, over 2M rows are expected to be examined and only about 10% of them are estimated to match, and lastly, only a where clause is used to match the target rows.","title":"The EXPLAIN Plan"},{"location":"level101/databases_sql/query_performance/#creating-an-index","text":"Indexes are used to speed up selecting relevant rows for a given column value. Without an index, MySQL starts with the first row and goes through the entire table to find matching rows. If the table has too many rows, the operation becomes costly. With indexes, MySQL determines the position to start looking for the data without reading the full table. A primary key is also an index; it is the fastest one and is stored along with the table data. Secondary indexes are stored outside of the table data and are used to further enhance the performance of SQL statements. Indexes are mostly stored as B-Trees, with some exceptions: spatial indexes use R-Trees and memory tables use hash indexes. There are 2 ways to create indexes:- While creating a table - if we know beforehand the columns that will appear in most where clauses of select queries, then we can put an index over them while creating a table. 
Altering a Table - To improve the performance of a troubling query, we create an index on a table which already has data in it using ALTER or CREATE INDEX command. This operation does not block the table but might take some time to complete depending on the size of the table. Let\u2019s look at the query that we discussed in the previous section. It\u2019s clear that scanning over 2M records is not a good idea when only 10% of those records are actually in the resultset. Hence, we create an index on the salary column of the salaries table. create index idx_salary on salaries(salary) OR alter table salaries add index idx_salary(salary) And the same explain plan now looks like this mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | salaries | NULL | ref | idx_salary | idx_salary | 4 | const | 13 | 100.00 | NULL | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) Now the index used is idx_salary, the one we recently created. The index actually helped examine only 13 records and all of them are in the resultset. Also, the query execution time is also reduced from over 700ms to almost negligible. Let\u2019s look at another example. Here we are searching for a specific combination of first_name and last_name. But, we might also search based on last_name only. 
mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 1.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) Now only about 1% of the almost 300K records is expected in the resultset. Although the query is quick here because we have only 300K records, it will become painful if the table grows to millions of records. In this case, we create an index on last_name and first_name, not separately, but as a composite index that includes both columns. 
create index idx_last_first on employees(last_name, first_name) mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 124 | const,const | 1 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) We chose to put last_name before first_name while creating the index as the optimizer starts from the leftmost prefix of the index while evaluating the query. For example, if we have a 3-column index like idx(c1, c2, c3), then the search capability of the index follows - (c1), (c1, c2) or (c1, c2, c3) i.e. if your where clause has only first_name this index won\u2019t work. mysql> explain select * from employees where first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 10.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) But, if you have only the last_name in the where clause, it will work as expected. 
mysql> explain select * from employees where last_name = 'Dredge'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) For another example, use the following queries:- create table employees_2 like employees; create table salaries_2 like salaries; alter table salaries_2 drop primary key; We made copies of employees and salaries tables without the Primary Key of salaries table to understand an example of Select with Join. When you have queries like the below, it becomes tricky to identify the pain point of the query. mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; 1860 rows in set (4.44 sec) This query is taking about 4.5 seconds to complete with 1860 rows in the resultset. Let\u2019s look at the Explain plan. There will be 2 records in the Explain plan as 2 tables are used in the query. 
mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | 1 | SIMPLE | s | NULL | ALL | NULL | NULL | NULL | NULL | 2837194 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY,idx_last_first | PRIMARY | 4 | employees.s.emp_no | 1 | 5.00 | Using where | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ 2 rows in set, 1 warning (0.00 sec) The rows are listed in order of evaluation, i.e. salaries_2 is evaluated first and then employees_2 is joined to it. As the plan shows, it scans almost all the rows of the salaries_2 table and tries to match the employees_2 rows as per the join condition. Although the where clause is applied when fetching the final resultset, the index corresponding to the where clause is not used for the employees_2 table. A join is generally faster when the join columns on both sides are indexed and have the same data type. So, let\u2019s create an index on the emp_no column of the salaries_2 table and analyze the query again. 
create index idx_empno on salaries_2(emp_no); mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | 1 | SIMPLE | e | NULL | ref | PRIMARY,idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | | 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ 2 rows in set, 1 warning (0.00 sec) Now, not only did the index help the optimizer to examine only a few rows in both tables, it reversed the order of the tables in evaluation. The employees_2 table is evaluated first and rows are selected as per the index respective to the where clause. Then the records are joined to salaries_2 table as per the index used due to the join condition. The execution time of the query came down from 4.5s to 0.02s . mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'\\G 1860 rows in set (0.02 sec)","title":"Creating an Index"},{"location":"level101/databases_sql/replication/","text":"MySQL Replication Replication enables data from one MySQL host (termed as Primary) to be copied to another MySQL host (termed as Replica). MySQL Replication is asynchronous in nature by default, but it can be changed to semi-synchronous with some configurations. 
Some common applications of MySQL replication are:- Read-scaling - as multiple hosts can replicate the data from a single primary host, we can set up as many replicas as we need and scale reads through them, i.e. application writes will go to a single primary host and the reads can balance between all the replicas that are there. Such a setup can improve the write performance as well, as the primary is dedicated to only updates and not reads. Backups using replicas - the backup process can sometimes be a little heavy. But if we have replicas configured, then we can use one of them to get the backup without affecting the primary data at all. Disaster Recovery - a replica in some other geographical region paves a proper path to configure disaster recovery. MySQL supports different types of synchronizations as well:- Asynchronous - this is the default synchronization method. It is one-way, i.e. one host serves as primary and one or more hosts as replica. We will discuss this method throughout the replication topic. Semi-Synchronous - in this type of synchronization, a commit performed on the primary host is blocked until at least one replica acknowledges it. Post the acknowledgement from any one replica, the control is returned to the session that performed the transaction. This ensures strong consistency but the replication is slower than asynchronous. Delayed - we can deliberately lag the replica in a typical MySQL replication by the number of seconds desired by the use case. This type of replication safeguards from severe human errors of dropping or corrupting the data on the primary, for example, in the above diagram for Delayed Replication, if a DROP DATABASE is executed by mistake on the primary, we still have 30 minutes to recover the data from R2 as that command has not been replicated on R2 yet. Pre-Requisites Before we dive into setting up replication, we should know about the binary logs. Binary logs play a very important role in MySQL replication. 
Binary logs, commonly known as binlogs, contain events about the changes done to the database, like table structure changes, data changes via DML operations, etc. They are not used to log SELECT statements. For replication, the primary uses its binlogs to send information about the changes done to the database to the replicas, and the replicas make the same data changes. With respect to MySQL replication, the binary log format can be of two types, which decide the main type of replication:- - Statement-Based Replication or SBR - Row-Based Replication or RBR Statement Based Binlog Format Originally, the replication in MySQL was based on SQL statements getting replicated and executed on the replica from the primary. This is called statement based logging. The binlog contains the exact SQL statement run by the session. So if we run the above statements to insert 3 records and then update all 3 in a single update statement, they will be logged exactly as we executed them. Row Based Binlog Format The row based format is the default in the latest MySQL releases. It differs considerably from the statement format because row events are logged instead of statements. By that we mean, in the above example one update statement affected 3 records, but the binlog had only one update statement; with the row based format, the binlog will have an event for each record updated. Statement Based v/s Row Based binlogs Let\u2019s have a look at the operational differences between statement-based and row-based binlogs. Statement Based Row Based Logs SQL statements as executed Logs row events based on SQL statements executed Takes less disk space Takes more disk space Restoring using binlogs is faster Restoring using binlogs is slower When used for replication, if any statement has a predefined function that has its own value, like sysdate(), uuid() etc, the output could be different on the replica which makes it inconsistent. 
Whatever is executed becomes a row event with values, so there will be no problem if such functions are used in SQL statements. Only statements are logged so no other row events are generated. A lot of events are generated when a table is copied into another using INSERT INTO SELECT. Note - There is another type of binlog format called Mixed . With mixed logging, statement based is used by default but it switches to row based in certain cases. If MySQL cannot guarantee that statement based logging is safe for the statements executed, it issues a warning and switches to row based for those statements. We will be using binary log format as Row for the entire replication topic. Replication in Motion The above figure indicates how a typical MySQL replication works. Replica_IO_Thread is responsible to fetch the binlog events from the primary binary logs to the replica On the Replica host, relay logs are created which are exact copies of the binary logs. If the binary logs on primary are in row format, the relay logs will be the same. Replica_SQL_Thread applies the relay logs on the replica MySQL server. If log-bin is enabled on the replica, then the replica will have its own binary logs as well. If log-slave-updates is enabled, then it will have the updates from the primary logged in the binlogs as well. Setting up Replication In this section, we will set up a simple asynchronous replication. The binlogs will be in row based format. The replication will be set up on two fresh hosts with no prior data present. There are two different ways in which we can set up replication. Binlog based - Each replica keeps a record of the binlog coordinates on the primary - current binlog and position in the binlog till where it has read and processed. So, at a time different replicas might be reading different parts of the same binlog. GTID based - Every transaction gets an identifier called global transaction identifier or GTID. 
There is no need to keep a record of binlog coordinates; as long as the replica has all the GTIDs executed on the primary, it is consistent with the primary. A typical GTID has the form server_uuid:transaction_id, where transaction_id is a positive integer. We will set up GTID based replication in the following section but will also discuss the binlog based replication setup. Primary Host Configurations The following config parameters should be present in the primary my.cnf file for setting up GTID based replication. server-id - a unique ID for the mysql server log-bin - the binlog location binlog-format - ROW | STATEMENT (we will use ROW) gtid-mode - ON enforce-gtid-consistency - ON (allows execution of only those statements which can be logged using GTIDs) Replica Host Configurations The following config parameters should be present in the replica my.cnf file for setting up replication. server-id - different from the primary host log-bin - (optional, if you want replica to log its own changes as well) binlog-format - depends on the above gtid-mode - ON enforce-gtid-consistency - ON log-slave-updates - ON (if binlog is enabled, then we can enable this. This enables the replica to log the changes coming from the primary along with its own changes. Helps in setting up chain replication) Replication User Every replica connects to the primary using a mysql user for replicating. So there must be a mysql user account for the same on the primary host. Any user can be used for this purpose provided it has the REPLICATION SLAVE privilege. If the sole purpose is replication then we can have a user with only the required privilege. On the primary host mysql> create user repl_user@ identified by 'xxxxx'; mysql> grant replication slave on *.* to repl_user@''; Obtaining Starting position from Primary Run the following command on the primary host mysql> show master status\\G *************************** 1. 
row *************************** File: mysql-bin.000001 Position: 73 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 1 row in set (0.00 sec) If we are working with binary log based replication, the top two output lines are the most important ones. That tells the current binlog on the primary host and till what position it has written. For fresh hosts we know that no data is written so we can directly set up replication using the very first binlog file and position 4. If we are setting up a replication from a backup, then that changes the way we obtain the starting position. For GTIDs, the executed_gtid_set is the value where primary is right now. Again, for a fresh setup, we don\u2019t have to specify anything about the starting point and it will start from the transaction id 1, but when we set up from a backup, the backup will contain the GTID positions till where backup has been taken. Setting up Replica The replication setup must know about the primary host, the user and password to connect, the binlog coordinates (for binlog based replication) or the GTID auto-position parameter. The following command is used for setting up change master to master_host = '', master_port = , master_user = 'repl_user', master_password = 'xxxxx', master_auto_position = 1; Note - the Change Master To command has been replaced with Change Replication Source To from Mysql 8.0.23 onwards, also all the master and slave keywords are replaced with source and replica . If it is binlog based replication, then instead of master_auto_position, we need to specify the binlog coordinates. 
master_log_file = 'mysql-bin.000001', master_log_pos = 4 Starting Replication and Check Status Now that everything is configured, we just need to start the replication on the replica via the following command start slave; OR from MySQL 8.0.23 onwards, start replica; Whether or not the replication is running successfully, we can determine by running the following command show slave status\\G OR from MySQL 8.0.23 onwards, show replica status\\G mysql> show replica status\\G *************************** 1. row *************************** Replica_IO_State: Waiting for master to send event Source_Host: Source_User: repl_user Source_Port: Connect_Retry: 60 Source_Log_File: mysql-bin.000001 Read_Source_Log_Pos: 852 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 1067 Relay_Source_Log_File: mysql-bin.000001 Replica_IO_Running: Yes Replica_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Source_Log_Pos: 852 Relay_Log_Space: 1283 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Source_SSL_Allowed: No Source_SSL_CA_File: Source_SSL_CA_Path: Source_SSL_Cert: Source_SSL_Cipher: Source_SSL_Key: Seconds_Behind_Source: 0 Source_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Source_Server_Id: 1 Source_UUID: e17d0920-d00e-11eb-a3e6-000d3aa00f87 Source_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates Source_Retry_Count: 86400 Source_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Source_SSL_Crl: Source_SSL_Crlpath: Retrieved_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Source_TLS_Version: Source_public_key_path: 
Get_Source_public_key: 0 Network_Namespace: 1 row in set (0.00 sec) Some of the parameters are explained below:- Relay_Source_Log_File - the primary\u2019s binlog file that the replica is currently reading from Exec_Source_Log_Pos - the position in the above file that the replica is currently reading from. These two parameters are of utmost importance when binlog based replication is used. Replica_IO_Running - whether the IO thread of the replica is running Replica_SQL_Running - whether the SQL thread of the replica is running Seconds_Behind_Source - the difference in seconds between when a statement was executed on the Primary and when it was executed on the Replica. This indicates how much replication lag there is. Source_UUID - the uuid of the primary host Retrieved_Gtid_Set - the GTIDs fetched from the primary host by the replica to be executed. Executed_Gtid_Set - the GTIDs executed on the replica. This set remains the same for the entire cluster if the replicas are in sync. Auto_Position - it directs the replica to fetch the next GTID automatically Create a Replica for the already setup cluster The steps discussed in the previous section describe setting up replication on two fresh hosts. When we have to set up a replica for a host which is already serving applications, a backup of the primary is used: either a fresh backup taken for the replica (this should only be done if the primary is serving little traffic) or a recently taken backup. If the size of the databases on the MySQL primary server is small (less than 100G is recommended), then mysqldump can be used to take the backup along with the following options. mysqldump -uroot -p -hhost_ip -P3306 --all-databases --single-transaction --master-data=1 > primary_host.bkp --single-transaction - this option starts a transaction before taking the backup which ensures it is consistent. As transactions are isolated from each other, no other writes affect the backup. --master-data - this option is required if binlog based replication is desired to be set up. 
It includes the binary log file and log file position in the backup file. When GTID mode is enabled and mysqldump is executed, it includes the executed GTID set, which is used to start the replica after the backup position. The contents of the mysqldump output file will have the following It is recommended to comment these before restoring otherwise they could throw errors. Also, using master-data=2 will automatically comment the master_log_file line. Similarly, when taking a backup of the host using xtrabackup , the xtrabackup_info file contains the information about the binlog file and file position, as well as the executed GTID set. server_version = 8.0.25 start_time = 2021-06-22 03:45:17 end_time = 2021-06-22 03:45:20 lock_time = 0 binlog_pos = filename 'mysql-bin.000007', position '196', GTID of the last change 'e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5' innodb_from_lsn = 0 innodb_to_lsn = 18153149 partial = N incremental = N format = file compressed = N encrypted = N Now, after setting up the MySQL server on the desired host, restore the backup taken using either of the above methods. If the intended way is binlog based replication, then use the binlog file and position info in the following command change replication source to source_host = 'primary_ip', source_port = 3306, source_user = 'repl_user', source_password = 'xxxxx', source_log_file = 'mysql-bin.000007', source_log_pos = 196; If the replication needs to be set up via GTIDs, then run the below commands to tell the replica about the GTIDs already executed. On the Replica host, run the following commands reset master; set global gtid_purged = 'e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5'; change replication source to source_host = 'primary_ip', source_port = 3306, source_user = 'repl_user', source_password = 'xxxxx', source_auto_position = 1; The reset master command resets the position of the binary log to initial. 
It can be skipped if the host is a freshly installed MySQL, but since we restored a backup, it is necessary. The gtid_purged global variable lets the replica know the GTIDs that have already been executed, so that the replication can start after that. Then in the change replication source command, we set the auto-position to 1 which automatically gets the next GTID to proceed. Further Reading More applications of Replication Automated Failovers using MySQL Orchestrator","title":"MySQL Replication"},{"location":"level101/databases_sql/replication/#mysql-replication","text":"Replication enables data from one MySQL host (termed as Primary) to be copied to another MySQL host (termed as Replica). MySQL Replication is asynchronous in nature by default, but it can be changed to semi-synchronous with some configurations. Some common applications of MySQL replication are:- Read-scaling - as multiple hosts can replicate the data from a single primary host, we can set up as many replicas as we need and scale reads through them, i.e. application writes will go to a single primary host and the reads can balance between all the replicas that are there. Such a setup can improve the write performance as well, as the primary is dedicated to only updates and not reads. Backups using replicas - the backup process can sometimes be a little heavy. But if we have replicas configured, then we can use one of them to get the backup without affecting the primary data at all. Disaster Recovery - a replica in some other geographical region paves a proper path to configure disaster recovery. MySQL supports different types of synchronizations as well:- Asynchronous - this is the default synchronization method. It is one-way, i.e. one host serves as primary and one or more hosts as replica. We will discuss this method throughout the replication topic. Semi-Synchronous - in this type of synchronization, a commit performed on the primary host is blocked until at least one replica acknowledges it. 
Post the acknowledgement from any one replica, the control is returned to the session that performed the transaction. This ensures strong consistency but the replication is slower than asynchronous. Delayed - we can deliberately lag the replica in a typical MySQL replication by the number of seconds desired by the use case. This type of replication safeguards from severe human errors of dropping or corrupting the data on the primary, for example, in the above diagram for Delayed Replication, if a DROP DATABASE is executed by mistake on the primary, we still have 30 minutes to recover the data from R2 as that command has not been replicated on R2 yet. Pre-Requisites Before we dive into setting up replication, we should know about the binary logs. Binary logs play a very important role in MySQL replication. Binary logs, or commonly known as binlogs contain events about the changes done to the database, like table structure changes, data changes via DML operations, etc. They are not used to log SELECT statements. For replication, the primary sends the information to the replicas using its binlogs about the changes done to the database, and the replicas make the same data changes. With respect to MySQL replication, the binary log format can be of two types that decides the main type of replication:- - Statement-Based Replication or SBR - Row-Based Replication or RBR Statement Based Binlog Format Originally, the replication in MySQL was based on SQL statements getting replicated and executed on the replica from the primary. This is called statement based logging. The binlog contains the exact SQL statement run by the session. So If we run the above statements to insert 3 records and the update 3 in a single update statement, they will be logged exactly the same as when we executed them. Row Based Binlog Format The Row based is the default one in the latest MySQL releases. 
This is a lot different from the Statement format as here, row events are logged instead of statements. By that we mean, in the above example one update statement affected 3 records, but binlog had only one update statement; if it is a row based format, binlog will have an event for each record updated. Statement Based v/s Row Based binlogs Let\u2019s have a look at the operational differences between statement-based and row-based binlogs. Statement Based Row Based Logs SQL statements as executed Logs row events based on SQL statements executed Takes lesser disk space Takes more disk space Restoring using binlogs is faster Restoring using binlogs is slower When used for replication, if any statement has a predefined function that has its own value, like sysdate(), uuid() etc, the output could be different on the replica which makes it inconsistent. Whatever is executed becomes a row event with values, so there will be no problem if such functions are used in SQL statements. Only statements are logged so no other row events are generated. A lot of events are generated when a table is copied into another using INSERT INTO SELECT. Note - There is another type of binlog format called Mixed . With mixed logging, statement based is used by default but it switches to row based in certain cases. If MySQL cannot guarantee that statement based logging is safe for the statements executed, it issues a warning and switches to row based for those statements. We will be using binary log format as Row for the entire replication topic. Replication in Motion The above figure indicates how a typical MySQL replication works. Replica_IO_Thread is responsible to fetch the binlog events from the primary binary logs to the replica On the Replica host, relay logs are created which are exact copies of the binary logs. If the binary logs on primary are in row format, the relay logs will be the same. Replica_SQL_Thread applies the relay logs on the replica MySQL server. 
If log-bin is enabled on the replica, then the replica will have its own binary logs as well. If log-slave-updates is enabled, then it will have the updates from the primary logged in the binlogs as well.","title":"MySQL Replication"},{"location":"level101/databases_sql/replication/#setting-up-replication","text":"In this section, we will set up a simple asynchronous replication. The binlogs will be in row based format. The replication will be set up on two fresh hosts with no prior data present. There are two different ways in which we can set up replication. Binlog based - Each replica keeps a record of the binlog coordinates on the primary - current binlog and position in the binlog till where it has read and processed. So, at a time different replicas might be reading different parts of the same binlog. GTID based - Every transaction gets an identifier called global transaction identifier or GTID. There is no need to keep the record of binlog coordinates, as long as the replica has all the GTIDs executed on the primary, it is consistent with the primary. A typical GTID is the server_uuid:# positive integer. We will set up a GTID based replication in the following section but will also discuss binlog based replication setup as well. Primary Host Configurations The following config parameters should be present in the primary my.cnf file for setting up GTID based replication. server-id - a unique ID for the mysql server log-bin - the binlog location binlog-format - ROW | STATEMENT (we will use ROW) gtid-mode - ON enforce-gtid-consistency - ON (allows execution of only those statements which can be logged using GTIDs) Replica Host Configurations The following config parameters should be present in the replica my.cnf file for setting up replication. 
server-id - different than the primary host log-bin - (optional, if you want replica to log its own changes as well) binlog-format - depends on the above gtid-mode - ON enforce-gtid-consistency - ON log-slave-updates - ON (if binlog is enabled, then we can enable this. This enables the replica to log the changes coming from the primary along with its own changes. Helps in setting up chain replication) Replication User Every replica connects to the primary using a mysql user for replicating. So there must be a mysql user account for the same on the primary host. Any user can be used for this purpose provided it has REPLICATION SLAVE privilege. If the sole purpose is replication then we can have a user with only the required privilege. On the primary host mysql> create user repl_user@ identified by 'xxxxx'; mysql> grant replication slave on *.* to repl_user@''; Obtaining Starting position from Primary Run the following command on the primary host mysql> show master status\\G *************************** 1. row *************************** File: mysql-bin.000001 Position: 73 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 1 row in set (0.00 sec) If we are working with binary log based replication, the top two output lines are the most important ones. That tells the current binlog on the primary host and till what position it has written. For fresh hosts we know that no data is written so we can directly set up replication using the very first binlog file and position 4. If we are setting up a replication from a backup, then that changes the way we obtain the starting position. For GTIDs, the executed_gtid_set is the value where primary is right now. Again, for a fresh setup, we don\u2019t have to specify anything about the starting point and it will start from the transaction id 1, but when we set up from a backup, the backup will contain the GTID positions till where backup has been taken. 
Setting up Replica The replication setup must know about the primary host, the user and password to connect, the binlog coordinates (for binlog based replication) or the GTID auto-position parameter. The following command is used for setting up change master to master_host = '', master_port = , master_user = 'repl_user', master_password = 'xxxxx', master_auto_position = 1; Note - the Change Master To command has been replaced with Change Replication Source To from Mysql 8.0.23 onwards, also all the master and slave keywords are replaced with source and replica . If it is binlog based replication, then instead of master_auto_position, we need to specify the binlog coordinates. master_log_file = 'mysql-bin.000001', master_log_pos = 4 Starting Replication and Check Status Now that everything is configured, we just need to start the replication on the replica via the following command start slave; OR from MySQL 8.0.23 onwards, start replica; Whether or not the replication is running successfully, we can determine by running the following command show slave status\\G OR from MySQL 8.0.23 onwards, show replica status\\G mysql> show replica status\\G *************************** 1. 
row *************************** Replica_IO_State: Waiting for master to send event Source_Host: Source_User: repl_user Source_Port: Connect_Retry: 60 Source_Log_File: mysql-bin.000001 Read_Source_Log_Pos: 852 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 1067 Relay_Source_Log_File: mysql-bin.000001 Replica_IO_Running: Yes Replica_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Source_Log_Pos: 852 Relay_Log_Space: 1283 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Source_SSL_Allowed: No Source_SSL_CA_File: Source_SSL_CA_Path: Source_SSL_Cert: Source_SSL_Cipher: Source_SSL_Key: Seconds_Behind_Source: 0 Source_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Source_Server_Id: 1 Source_UUID: e17d0920-d00e-11eb-a3e6-000d3aa00f87 Source_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates Source_Retry_Count: 86400 Source_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Source_SSL_Crl: Source_SSL_Crlpath: Retrieved_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Source_TLS_Version: Source_public_key_path: Get_Source_public_key: 0 Network_Namespace: 1 row in set (0.00 sec) Some of the parameters are explained below:- Relay_Source_Log_File - the primary\u2019s binlog file containing the event most recently executed by the replica\u2019s SQL thread. Exec_Source_Log_Pos - the position in the above file up to which the replica has executed. These two parameters are of utmost importance when binlog based replication is used. 
Replica_IO_Running - whether the IO thread of the replica is running or not Replica_SQL_Running - whether the SQL thread of the replica is running or not Seconds_Behind_Source - the difference in seconds between when a statement was executed on the primary and when it was executed on the replica. This indicates how much replication lag there is. Source_UUID - the uuid of the primary host Retrieved_Gtid_Set - the GTIDs fetched from the primary host by the replica to be executed. Executed_Gtid_Set - the GTIDs executed on the replica. This set remains the same for the entire cluster if the replicas are in sync. Auto_Position - it directs the replica to fetch the next GTID automatically Create a Replica for the already setup cluster The steps discussed in the previous section cover setting up replication on two fresh hosts. When we have to set up a replica for a host which is already serving applications, a backup of the primary is used: either a fresh backup taken for the replica (this should only be done if the primary is serving little traffic) or a recently taken backup. If the size of the databases on the MySQL primary server is small, less than 100G recommended, then mysqldump can be used to take the backup along with the following options. mysqldump -uroot -p -hhost_ip -P3306 --all-databases --single-transaction --master-data=1 > primary_host.bkp --single-transaction - this option starts a transaction before taking the backup, which ensures it is consistent. As transactions are isolated from each other, no other writes affect the backup. --master-data - this option is required if binlog based replication is desired to be set up. It includes the binary log file and log file position in the backup file. When GTID mode is enabled and mysqldump is executed, it includes the GTID executed set, to be used to start the replica after the backup position. The mysqldump output file will therefore contain a CHANGE MASTER TO statement with the binlog coordinates and, in GTID mode, a SET @@GLOBAL.GTID_PURGED statement. It is recommended to comment these out before restoring, otherwise they could throw errors. 
Also, using master-data=2 will automatically comment the master_log_file line. Similarly, when taking a backup of the host using xtrabackup , the xtrabackup_info file contains the information about the binlog file and file position, as well as the GTID executed set. server_version = 8.0.25 start_time = 2021-06-22 03:45:17 end_time = 2021-06-22 03:45:20 lock_time = 0 binlog_pos = filename 'mysql-bin.000007', position '196', GTID of the last change 'e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5' innodb_from_lsn = 0 innodb_to_lsn = 18153149 partial = N incremental = N format = file compressed = N encrypted = N Now, after setting up the MySQL server on the desired host, restore the backup taken from any one of the above methods. If the intended way is binlog based replication, then use the binlog file and position info in the following command change replication source to source_host = 'primary_ip', source_port = 3306, source_user = 'repl_user', source_password = 'xxxxx', source_log_file = 'mysql-bin.000007', source_log_pos = 196; If the replication needs to be set up via GTIDs, then run the below commands to tell the replica about the GTIDs already executed. On the Replica host, run the following commands reset master; set global gtid_purged = 'e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5'; change replication source to source_host = 'primary_ip', source_port = 3306, source_user = 'repl_user', source_password = 'xxxxx', source_auto_position = 1; The reset master command resets the position of the binary log to initial. It can be skipped if the host is a freshly installed MySQL, but we restored a backup so it is necessary. The gtid_purged global variable lets the replica know the GTIDs that have already been executed, so that the replication can start after that. 
Then in the change source command, we set the auto-position to 1 which automatically gets the next GTID to proceed.","title":"Setting up Replication"},{"location":"level101/databases_sql/replication/#further-reading","text":"More applications of Replication Automated Failovers using MySQL Orchestrator","title":"Further Reading"},{"location":"level101/databases_sql/select_query/","text":"SELECT Query The most commonly used command while working with MySQL is SELECT. It is used to fetch the result set from one or more tables. The general form of a typical select query looks like:- SELECT expr FROM table1 [WHERE condition] [GROUP BY column_list HAVING condition] [ORDER BY column_list ASC|DESC] [LIMIT #] The above general form contains some commonly used clauses of a SELECT query:- expr - comma-separated column list or * (for all columns) WHERE - only the records for which the given condition is true are selected. GROUP BY - groups the entire result set based on the column list provided. An aggregate function is recommended to be present in the select expression of the query. HAVING - filters the groups by putting a condition on the selected or any other aggregate function. ORDER BY - sorts the result set based on the column list in ascending or descending order. LIMIT - commonly used to limit the number of records. Let\u2019s have a look at some examples for a better understanding of the above. The dataset used for the examples below is available here and is free to use. 
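To try the clauses above without a MySQL server at hand, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in. The table is a hypothetical five-row cut of the employees table, and SQLite is not MySQL, so treat it only as an illustration of how WHERE, GROUP BY, HAVING, ORDER BY and LIMIT combine.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_no INTEGER, birth_date TEXT, first_name TEXT,"
    " last_name TEXT, gender TEXT, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10001, "1953-09-02", "Georgi", "Facello", "M", "1986-06-26"),
        (10002, "1964-06-02", "Bezalel", "Simmel", "F", "1985-11-21"),
        (10003, "1959-12-03", "Parto", "Bamford", "M", "1986-08-28"),
        (10004, "1954-05-01", "Chirstian", "Koblick", "M", "1986-12-01"),
        (10005, "1955-01-21", "Kyoichi", "Maliniak", "M", "1989-09-12"),
    ],
)

# WHERE filters rows, ORDER BY sorts them, LIMIT caps how many are returned.
# Dates are stored as ISO-8601 text here, so string comparison orders them correctly.
latest_hires = conn.execute(
    "SELECT first_name, hire_date FROM employees"
    " WHERE hire_date >= '1986-08-01'"
    " ORDER BY hire_date DESC LIMIT 2"
).fetchall()

# GROUP BY builds one group per gender; HAVING keeps groups with more than one row.
big_groups = conn.execute(
    "SELECT gender, COUNT(*) FROM employees GROUP BY gender HAVING COUNT(*) > 1"
).fetchall()

print(latest_hires)
print(big_groups)
```

WHERE is applied to individual rows before grouping, while HAVING is applied to the groups afterwards, which is why the aggregate condition goes in HAVING.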
Select all records mysql> select * from employees limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 | | 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 | | 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 | | 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 | | 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.00 sec) Select specific fields for all records mysql> select first_name, last_name, gender from employees limit 5; +------------+-----------+--------+ | first_name | last_name | gender | +------------+-----------+--------+ | Georgi | Facello | M | | Bezalel | Simmel | F | | Parto | Bamford | M | | Chirstian | Koblick | M | | Kyoichi | Maliniak | M | +------------+-----------+--------+ 5 rows in set (0.00 sec) Select all records Where hire_date >= January 1, 1990 mysql> select * from employees where hire_date >= '1990-01-01' limit 5; +--------+------------+------------+-------------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-------------+--------+------------+ | 10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15 | | 10011 | 1953-11-07 | Mary | Sluis | F | 1990-01-22 | | 10012 | 1960-10-04 | Patricio | Bridgland | M | 1992-12-18 | | 10016 | 1961-05-02 | Kazuhito | Cappelletti | M | 1995-01-27 | | 10017 | 1958-07-06 | Cristinel | Bouloucos | F | 1993-08-03 | +--------+------------+------------+-------------+--------+------------+ 5 rows in set (0.01 sec) Select first_name and last_name from all records Where birth_date >= 1960 AND gender = \u2018F\u2019 mysql> select first_name, last_name from employees where year(birth_date) >= 
1960 and gender='F' limit 5; +------------+-----------+ | first_name | last_name | +------------+-----------+ | Bezalel | Simmel | | Duangkaew | Piveteau | | Divier | Reistad | | Jeong | Reistad | | Mingsen | Casley | +------------+-----------+ 5 rows in set (0.00 sec) Display the total number of records mysql> select count(*) from employees; +----------+ | count(*) | +----------+ | 300024 | +----------+ 1 row in set (0.05 sec) Display gender-wise count of all records mysql> select gender, count(*) from employees group by gender; +--------+----------+ | gender | count(*) | +--------+----------+ | M | 179973 | | F | 120051 | +--------+----------+ 2 rows in set (0.14 sec) Display the year of hire_date and number of employees hired that year, also only those years where more than 20k employees were hired mysql> select year(hire_date), count(*) from employees group by year(hire_date) having count(*) > 20000; +-----------------+----------+ | year(hire_date) | count(*) | +-----------------+----------+ | 1985 | 35316 | | 1986 | 36150 | | 1987 | 33501 | | 1988 | 31436 | | 1989 | 28394 | | 1990 | 25610 | | 1991 | 22568 | | 1992 | 20402 | +-----------------+----------+ 8 rows in set (0.14 sec) Display all records ordered by their hire_date in descending order. 
If hire_date is the same, then order by birth_date ascending mysql> select * from employees order by hire_date desc, birth_date asc limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 463807 | 1964-06-12 | Bikash | Covnot | M | 2000-01-28 | | 428377 | 1957-05-09 | Yucai | Gerlach | M | 2000-01-23 | | 499553 | 1954-05-06 | Hideyuki | Delgrande | F | 2000-01-22 | | 222965 | 1959-08-07 | Volkmar | Perko | F | 2000-01-13 | | 47291 | 1960-09-09 | Ulf | Flexer | M | 2000-01-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.12 sec) SELECT - JOINS The JOIN statement is used to produce a combined result set from two or more tables based on certain conditions. It can also be used with Update and Delete statements, but we will be focussing on the select query. Following is a basic general form for joins SELECT table1.col1, table2.col1, ... (any combination) FROM table1 [join_type] JOIN table2 ON (or USING, depending on join_type) table1.column_for_joining = table2.column_for_joining WHERE \u2026 Any number of columns can be selected, but it is recommended to select only those which are relevant, to increase the readability of the resultset. All other clauses like where and group by are not mandatory. Let\u2019s discuss the types of JOINs supported by MySQL syntax. Inner Join This joins table A with table B on a condition. Only the records where the condition is True are selected in the resultset. 
Display some details of employees along with their salary mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s on e.emp_no=s.emp_no limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) A similar result can be achieved by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s using (emp_no) limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) And also by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Outer Join Mainly of two types:- - LEFT - joining complete table A with table B on a condition. All the records from table A are selected, but from table B, only those records are selected where the condition is True. - RIGHT - the exact opposite of the left join: all the records from table B are selected, but from table A, only those records are selected where the condition is True. Let us assume the below tables for understanding left join better. 
mysql> select * from dummy1; +----------+------------+ | same_col | diff_col_1 | +----------+------------+ | 1 | A | | 2 | B | | 3 | C | +----------+------------+ mysql> select * from dummy2; +----------+------------+ | same_col | diff_col_2 | +----------+------------+ | 1 | X | | 3 | Y | +----------+------------+ A simple select join will look like the one below. mysql> select * from dummy1 d1 left join dummy2 d2 on d1.same_col=d2.same_col; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 1 | X | | 3 | C | 3 | Y | | 2 | B | NULL | NULL | +----------+------------+----------+------------+ 3 rows in set (0.00 sec) Which can also be written as mysql> select * from dummy1 d1 left join dummy2 d2 using(same_col); +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) And also as mysql> select * from dummy1 d1 natural left join dummy2 d2; +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) Cross Join This does a cross product of table A and table B without any condition. It doesn\u2019t have a lot of applications in the real world. A Simple Cross Join looks like this mysql> select * from dummy1 cross join dummy2; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 3 | Y | | 1 | A | 1 | X | | 2 | B | 3 | Y | | 2 | B | 1 | X | | 3 | C | 3 | Y | | 3 | C | 1 | X | +----------+------------+----------+------------+ 6 rows in set (0.01 sec) One use case that can come in handy is when you have to fill in some missing entries. 
For example, all the entries from dummy1 must be inserted into a similar table dummy3, and each record must have 3 entries with statuses 1, 5 and 7. mysql> desc dummy3; +----------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+----------+------+-----+---------+-------+ | same_col | int | YES | | NULL | | | value | char(15) | YES | | NULL | | | status | smallint | YES | | NULL | | +----------+----------+------+-----+---------+-------+ 3 rows in set (0.02 sec) Either you create an insert query script with as many entries as in dummy1 or use cross join to produce the required resultset. mysql> select * from dummy1 cross join (select 1 union select 5 union select 7) T2 order by same_col; +----------+------------+---+ | same_col | diff_col_1 | 1 | +----------+------------+---+ | 1 | A | 1 | | 1 | A | 5 | | 1 | A | 7 | | 2 | B | 1 | | 2 | B | 5 | | 2 | B | 7 | | 3 | C | 1 | | 3 | C | 5 | | 3 | C | 7 | +----------+------------+---+ 9 rows in set (0.00 sec) The T2 section in the above query is called a sub-query . We will discuss the same in the next section. Natural Join This implicitly selects the common column from table A and table B and performs an inner join. mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Notice how natural join and using take care that the common column is displayed only once if you are not explicitly selecting columns for the query. 
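The join types discussed above can be compared side by side with a small Python sketch. It recreates the dummy1 and dummy2 tables in an in-memory SQLite database (an assumption of this sketch, used as a stand-in for MySQL) and adds an explicit ORDER BY so the row order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummy1 (same_col INTEGER, diff_col_1 TEXT)")
conn.execute("CREATE TABLE dummy2 (same_col INTEGER, diff_col_2 TEXT)")
conn.executemany("INSERT INTO dummy1 VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO dummy2 VALUES (?, ?)", [(1, "X"), (3, "Y")])

# Inner join: only the rows whose same_col exists in both tables.
inner_rows = conn.execute(
    "SELECT * FROM dummy1 d1 JOIN dummy2 d2 USING (same_col) ORDER BY d1.same_col"
).fetchall()

# Left join: every row of dummy1, with NULL (None in Python) where dummy2 has no match.
left_rows = conn.execute(
    "SELECT * FROM dummy1 d1 LEFT JOIN dummy2 d2 USING (same_col) ORDER BY d1.same_col"
).fetchall()

# Natural join: the common column is picked implicitly, then an inner join is done.
natural_rows = conn.execute(
    "SELECT * FROM dummy1 d1 NATURAL JOIN dummy2 d2 ORDER BY d1.same_col"
).fetchall()

# Cross join: every row of dummy1 paired with every row of dummy2 (3 x 2 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM dummy1 CROSS JOIN dummy2"
).fetchone()[0]

print(inner_rows)
print(left_rows)
print(natural_rows)
print(cross_count)
```

With USING and NATURAL the shared same_col column appears only once in each row, matching the behaviour noted above.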
Some More Examples Display emp_no, salary, title and dept of the employees where salary > 80000 mysql> select e.emp_no, s.salary, t.title, d.dept_no from employees e join salaries s using (emp_no) join titles t using (emp_no) join dept_emp d using (emp_no) where s.salary > 80000 limit 5; +--------+--------+--------------+---------+ | emp_no | salary | title | dept_no | +--------+--------+--------------+---------+ | 10017 | 82163 | Senior Staff | d001 | | 10017 | 86157 | Senior Staff | d001 | | 10017 | 89619 | Senior Staff | d001 | | 10017 | 91985 | Senior Staff | d001 | | 10017 | 96122 | Senior Staff | d001 | +--------+--------+--------------+---------+ 5 rows in set (0.00 sec) Display title-wise count of employees in each department order by dept_no mysql> select d.dept_no, t.title, count(*) from titles t left join dept_emp d using (emp_no) group by d.dept_no, t.title order by d.dept_no limit 10; +---------+--------------------+----------+ | dept_no | title | count(*) | +---------+--------------------+----------+ | d001 | Manager | 2 | | d001 | Senior Staff | 13940 | | d001 | Staff | 16196 | | d002 | Manager | 2 | | d002 | Senior Staff | 12139 | | d002 | Staff | 13929 | | d003 | Manager | 2 | | d003 | Senior Staff | 12274 | | d003 | Staff | 14342 | | d004 | Assistant Engineer | 6445 | +---------+--------------------+----------+ 10 rows in set (1.32 sec) SELECT - Subquery A subquery is generally a smaller resultset that can be used to power a select query in many ways. It can be used in a \u2018where\u2019 condition, or in place of a join where a join would be overkill. A subquery used in the FROM clause is also termed a derived table and must have a table alias in the select query. Let\u2019s look at some examples of subqueries. In the query below, we get the department name from the departments table by a subquery which uses dept_no from the dept_emp table. 
mysql> select e.emp_no, (select dept_name from departments where dept_no=d.dept_no) dept_name from employees e join dept_emp d using (emp_no) limit 5; +--------+-----------------+ | emp_no | dept_name | +--------+-----------------+ | 10001 | Development | | 10002 | Sales | | 10003 | Production | | 10004 | Production | | 10005 | Human Resources | +--------+-----------------+ 5 rows in set (0.01 sec) Here, we used the \u2018avg\u2019 query above (which got the avg salary) as a subquery to list the employees whose latest salary is more than the average. mysql> select avg(salary) from salaries; +-------------+ | avg(salary) | +-------------+ | 63810.7448 | +-------------+ 1 row in set (0.80 sec) mysql> select e.emp_no, max(s.salary) from employees e natural join salaries s group by e.emp_no having max(s.salary) > (select avg(salary) from salaries) limit 10; +--------+---------------+ | emp_no | max(s.salary) | +--------+---------------+ | 10001 | 88958 | | 10002 | 72527 | | 10004 | 74057 | | 10005 | 94692 | | 10007 | 88070 | | 10009 | 94443 | | 10010 | 80324 | | 10013 | 68901 | | 10016 | 77935 | | 10017 | 99651 | +--------+---------------+ 10 rows in set (0.56 sec)","title":"Select Query"},{"location":"level101/databases_sql/select_query/#select-query","text":"The most commonly used command while working with MySQL is SELECT. It is used to fetch the result set from one or more tables. The general form of a typical select query looks like:- SELECT expr FROM table1 [WHERE condition] [GROUP BY column_list HAVING condition] [ORDER BY column_list ASC|DESC] [LIMIT #] The above general form contains some commonly used clauses of a SELECT query:- expr - comma-separated column list or * (for all columns) WHERE - a condition is provided, if true, directs the query to select only those records. GROUP BY - groups the entire result set based on the column list provided. An aggregate function is recommended to be present in the select expression of the query. 
HAVING supports grouping by putting a condition on the selected or any other aggregate function. ORDER BY - sorts the result set based on the column list in ascending or descending order. LIMIT - commonly used to limit the number of records. Let\u2019s have a look at some examples for a better understanding of the above. The dataset used for the examples below is available here and is free to use. Select all records mysql> select * from employees limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 | | 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 | | 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 | | 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 | | 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.00 sec) Select specific fields for all records mysql> select first_name, last_name, gender from employees limit 5; +------------+-----------+--------+ | first_name | last_name | gender | +------------+-----------+--------+ | Georgi | Facello | M | | Bezalel | Simmel | F | | Parto | Bamford | M | | Chirstian | Koblick | M | | Kyoichi | Maliniak | M | +------------+-----------+--------+ 5 rows in set (0.00 sec) Select all records Where hire_date >= January 1, 1990 mysql> select * from employees where hire_date >= '1990-01-01' limit 5; +--------+------------+------------+-------------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-------------+--------+------------+ | 10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15 | | 10011 | 1953-11-07 | Mary | Sluis | F | 1990-01-22 | | 10012 | 1960-10-04 | Patricio | Bridgland | M | 1992-12-18 | | 10016 
| 1961-05-02 | Kazuhito | Cappelletti | M | 1995-01-27 | | 10017 | 1958-07-06 | Cristinel | Bouloucos | F | 1993-08-03 | +--------+------------+------------+-------------+--------+------------+ 5 rows in set (0.01 sec) Select first_name and last_name from all records Where birth_date >= 1960 AND gender = \u2018F\u2019 mysql> select first_name, last_name from employees where year(birth_date) >= 1960 and gender='F' limit 5; +------------+-----------+ | first_name | last_name | +------------+-----------+ | Bezalel | Simmel | | Duangkaew | Piveteau | | Divier | Reistad | | Jeong | Reistad | | Mingsen | Casley | +------------+-----------+ 5 rows in set (0.00 sec) Display the total number of records mysql> select count(*) from employees; +----------+ | count(*) | +----------+ | 300024 | +----------+ 1 row in set (0.05 sec) Display gender-wise count of all records mysql> select gender, count(*) from employees group by gender; +--------+----------+ | gender | count(*) | +--------+----------+ | M | 179973 | | F | 120051 | +--------+----------+ 2 rows in set (0.14 sec) Display the year of hire_date and number of employees hired that year, also only those years where more than 20k employees were hired mysql> select year(hire_date), count(*) from employees group by year(hire_date) having count(*) > 20000; +-----------------+----------+ | year(hire_date) | count(*) | +-----------------+----------+ | 1985 | 35316 | | 1986 | 36150 | | 1987 | 33501 | | 1988 | 31436 | | 1989 | 28394 | | 1990 | 25610 | | 1991 | 22568 | | 1992 | 20402 | +-----------------+----------+ 8 rows in set (0.14 sec) Display all records ordered by their hire_date in descending order. 
If hire_date is the same then in order of their birth_date ascending order mysql> select * from employees order by hire_date desc, birth_date asc limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 463807 | 1964-06-12 | Bikash | Covnot | M | 2000-01-28 | | 428377 | 1957-05-09 | Yucai | Gerlach | M | 2000-01-23 | | 499553 | 1954-05-06 | Hideyuki | Delgrande | F | 2000-01-22 | | 222965 | 1959-08-07 | Volkmar | Perko | F | 2000-01-13 | | 47291 | 1960-09-09 | Ulf | Flexer | M | 2000-01-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.12 sec)","title":"SELECT Query"},{"location":"level101/databases_sql/select_query/#select-joins","text":"JOIN statement is used to produce a combined result set from two or more tables based on certain conditions. It can be also used with Update and Delete statements but we will be focussing on the select query. Following is a basic general form for joins SELECT table1.col1, table2.col1, ... (any combination) FROM table1 table2 ON (or USING depends on join_type) table1.column_for_joining = table2.column_for_joining WHERE \u2026 Any number of columns can be selected, but it is recommended to select only those which are relevant to increase the readability of the resultset. All other clauses like where, group by are not mandatory. Let\u2019s discuss the types of JOINs supported by MySQL Syntax. Inner Join This joins table A with table B on a condition. Only the records where the condition is True are selected in the resultset. 
Display some details of employees along with their salary mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s on e.emp_no=s.emp_no limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Similar result can be achieved by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s using (emp_no) limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) And also by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Outer Join Majorly of two types:- - LEFT - joining complete table A with table B on a condition. All the records from table A are selected, but from table B, only those records are selected where the condition is True. - RIGHT - Exact opposite of the left join. Let us assume the below tables for understanding left join better. 
mysql> select * from dummy1; +----------+------------+ | same_col | diff_col_1 | +----------+------------+ | 1 | A | | 2 | B | | 3 | C | +----------+------------+ mysql> select * from dummy2; +----------+------------+ | same_col | diff_col_2 | +----------+------------+ | 1 | X | | 3 | Y | +----------+------------+ A simple select join will look like the one below. mysql> select * from dummy1 d1 left join dummy2 d2 on d1.same_col=d2.same_col; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 1 | X | | 3 | C | 3 | Y | | 2 | B | NULL | NULL | +----------+------------+----------+------------+ 3 rows in set (0.00 sec) Which can also be written as mysql> select * from dummy1 d1 left join dummy2 d2 using(same_col); +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) And also as mysql> select * from dummy1 d1 natural left join dummy2 d2; +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) Cross Join This does a cross product of table A and table B without any condition. It doesn\u2019t have a lot of applications in the real world. A Simple Cross Join looks like this mysql> select * from dummy1 cross join dummy2; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 3 | Y | | 1 | A | 1 | X | | 2 | B | 3 | Y | | 2 | B | 1 | X | | 3 | C | 3 | Y | | 3 | C | 1 | X | +----------+------------+----------+------------+ 6 rows in set (0.01 sec) One use case that can come in handy is when you have to fill in some missing entries. 
For example, all the entries from dummy1 must be inserted into a similar table dummy3, with each record must have 3 entries with statuses 1, 5 and 7. mysql> desc dummy3; +----------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+----------+------+-----+---------+-------+ | same_col | int | YES | | NULL | | | value | char(15) | YES | | NULL | | | status | smallint | YES | | NULL | | +----------+----------+------+-----+---------+-------+ 3 rows in set (0.02 sec) Either you create an insert query script with as many entries as in dummy1 or use cross join to produce the required resultset. mysql> select * from dummy1 cross join (select 1 union select 5 union select 7) T2 order by same_col; +----------+------------+---+ | same_col | diff_col_1 | 1 | +----------+------------+---+ | 1 | A | 1 | | 1 | A | 5 | | 1 | A | 7 | | 2 | B | 1 | | 2 | B | 5 | | 2 | B | 7 | | 3 | C | 1 | | 3 | C | 5 | | 3 | C | 7 | +----------+------------+---+ 9 rows in set (0.00 sec) The T2 section in the above query is called a sub-query . We will discuss the same in the next section. Natural Join This implicitly selects the common column from table A and table B and performs an inner join. mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Notice how natural join and using takes care that the common column is displayed only once if you are not explicitly selecting columns for the query. 
Some More Examples Display emp_no, salary, title and dept of the employees where salary > 80000 mysql> select e.emp_no, s.salary, t.title, d.dept_no from employees e join salaries s using (emp_no) join titles t using (emp_no) join dept_emp d using (emp_no) where s.salary > 80000 limit 5; +--------+--------+--------------+---------+ | emp_no | salary | title | dept_no | +--------+--------+--------------+---------+ | 10017 | 82163 | Senior Staff | d001 | | 10017 | 86157 | Senior Staff | d001 | | 10017 | 89619 | Senior Staff | d001 | | 10017 | 91985 | Senior Staff | d001 | | 10017 | 96122 | Senior Staff | d001 | +--------+--------+--------------+---------+ 5 rows in set (0.00 sec) Display title-wise count of employees in each department, ordered by dept_no mysql> select d.dept_no, t.title, count(*) from titles t left join dept_emp d using (emp_no) group by d.dept_no, t.title order by d.dept_no limit 10; +---------+--------------------+----------+ | dept_no | title | count(*) | +---------+--------------------+----------+ | d001 | Manager | 2 | | d001 | Senior Staff | 13940 | | d001 | Staff | 16196 | | d002 | Manager | 2 | | d002 | Senior Staff | 12139 | | d002 | Staff | 13929 | | d003 | Manager | 2 | | d003 | Senior Staff | 12274 | | d003 | Staff | 14342 | | d004 | Assistant Engineer | 6445 | +---------+--------------------+----------+ 10 rows in set (1.32 sec)","title":"SELECT - JOINS"},{"location":"level101/databases_sql/select_query/#select-subquery","text":"A subquery is a query nested inside another query; its (generally smaller) resultset can be used to power the outer select query in many ways. It can be used in a \u2018where\u2019 condition, or in place of a join where a join would be overkill. Subqueries used in the \u2018from\u2019 clause are also termed derived tables, and they must have a table alias in the select query. Let\u2019s look at some examples of subqueries. Here we got the department name from the departments table by a subquery which used dept_no from dept_emp table. 
mysql> select e.emp_no, (select dept_name from departments where dept_no=d.dept_no) dept_name from employees e join dept_emp d using (emp_no) limit 5; +--------+-----------------+ | emp_no | dept_name | +--------+-----------------+ | 10001 | Development | | 10002 | Sales | | 10003 | Production | | 10004 | Production | | 10005 | Human Resources | +--------+-----------------+ 5 rows in set (0.01 sec) Here, we used the \u2018avg\u2019 query above (which got the avg salary) as a subquery to list the employees whose latest salary is more than the average. mysql> select avg(salary) from salaries; +-------------+ | avg(salary) | +-------------+ | 63810.7448 | +-------------+ 1 row in set (0.80 sec) mysql> select e.emp_no, max(s.salary) from employees e natural join salaries s group by e.emp_no having max(s.salary) > (select avg(salary) from salaries) limit 10; +--------+---------------+ | emp_no | max(s.salary) | +--------+---------------+ | 10001 | 88958 | | 10002 | 72527 | | 10004 | 74057 | | 10005 | 94692 | | 10007 | 88070 | | 10009 | 94443 | | 10010 | 80324 | | 10013 | 68901 | | 10016 | 77935 | | 10017 | 99651 | +--------+---------------+ 10 rows in set (0.56 sec)","title":"SELECT - Subquery"},{"location":"level101/git/branches/","text":"Working With Branches Coming back to our local repo which has two commits. So far, what we have is a single line of history. Commits are chained in a single line. But sometimes you may have a need to work on two different features in parallel in the same repo. Now one option here could be making a new folder/repo with the same code and use that for another feature development. But there's a better way. Use branches. Since git follows tree like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created and branches can also be merged. Using branches, there can exist multiple lines of histories and we can checkout to any of them and work on it. 
Checking out, as we discussed earlier, would simply mean replacing contents of the directory (repo) with the snapshot at the checked out version. Let's create a branch and see what it looks like: $ git branch b1 $ git log --oneline --graph * 7f3b00e (HEAD -> master, b1) adding file 2 * df2fb7a adding file 1 We created a branch called b1 . Git log tells us that b1 also points to the last commit (7f3b00e) but the HEAD is still pointing to master. If you remember, HEAD points to whichever commit/reference you have checked out. So if we checkout to b1 , HEAD should point to that. Let's confirm: $ git checkout b1 Switched to branch 'b1' $ git log --oneline --graph * 7f3b00e (HEAD -> b1, master) adding file 2 * df2fb7a adding file 1 b1 still points to the same commit but HEAD now points to b1 . Since we created a branch at commit 7f3b00e , there will be two lines of history starting from this commit. Which line of history progresses depends on the branch you are checked out on. At this moment, we are checked out on branch b1 , so making a new commit will advance branch reference b1 to that commit and current b1 commit will become its parent. Let's do that. # Creating a file and making a commit $ echo \"I am a file in b1 branch\" > b1.txt $ git add b1.txt $ git commit -m \"adding b1 file\" [b1 872a38f] adding b1 file 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The new line of history $ git log --oneline --graph * 872a38f (HEAD -> b1) adding b1 file * 7f3b00e (master) adding file 2 * df2fb7a adding file 1 $ Do note that master is still pointing to the old commit it was pointing to. We can now checkout to master branch and make commits there. This will result in another line of history starting from commit 7f3b00e. 
# checkout to master branch $ git checkout master Switched to branch 'master' # Creating a new commit on master branch $ echo \"new file in master branch\" > master.txt $ git add master.txt $ git commit -m \"adding master.txt file\" [master 60dc441] adding master.txt file 1 file changed, 1 insertion(+) create mode 100644 master.txt # The history line $ git log --oneline --graph * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Notice how branch b1 is not visible here since we are on the master. Let's try to visualize both to get the whole picture: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Above tree structure should make things clear. Notice a clear branch/fork on commit 7f3b00e. This is how we create branches. Now they both are two separate lines of history on which feature development can be done independently. To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references. Git accordingly modifies contents of our repo. Merges Now say the feature you were working on branch b1 is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from b1 into master. There could be two ways this can be done. Here is the current history: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 1: Directly merge the branch. Merging the branch b1 into master will result in a new merge commit. 
This will merge changes from two different lines of history and create a new commit of the result. $ git merge b1 Merge made by the 'recursive' strategy. b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt $ git log --oneline --graph --all * 8fc28f9 (HEAD -> master) Merge branch 'b1' |\\ | * 872a38f (b1) adding b1 file * | 60dc441 adding master.txt file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see a new merge commit created (8fc28f9). You will be prompted for the commit message. If there are a lot of branches in the repo, the history will end up with a lot of merge commits, which looks ugly compared to a single line of development history. So let's look at an alternative approach. First let's reset our last merge and go to the previous state. $ git reset --hard 60dc441 HEAD is now at 60dc441 adding master.txt file $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 2: Rebase. Now, instead of merging two branches which have a common base (commit: 7f3b00e), let us rebase branch b1 onto the current master. What this means is: take the commits of branch b1 (those after commit 7f3b00e, up to commit 872a38f) and replay them on top of master (60dc441). # Switch to b1 $ git checkout b1 Switched to branch 'b1' # Rebase (b1 which is current branch) on master $ git rebase master First, rewinding head to replay your work on top of it... Applying: adding b1 file # The result $ git log --oneline --graph --all * 5372c8f (HEAD -> b1) adding b1 file * 60dc441 (master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see that b1 had 1 commit, whose parent was 7f3b00e . But since we rebased it onto master ( 60dc441 ), that becomes the parent now. Because its parent changed, the rebased commit also gets a new ID ( 5372c8f instead of 872a38f ). As a side effect, you also see it has become a single line of history. 
Now if we were to merge b1 into master , it would simply mean change master to point to 5372c8f which is b1 . Let's try it: # checkout to master since we want to merge code into master $ git checkout master Switched to branch 'master' # the current history, where b1 is based on master $ git log --oneline --graph --all * 5372c8f (b1) adding b1 file * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 # Performing the merge, notice the \"fast-forward\" message $ git merge b1 Updating 60dc441..5372c8f Fast-forward b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The Result $ git log --oneline --graph --all * 5372c8f (HEAD -> master, b1) adding b1 file * 60dc441 adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Now you see both b1 and master are pointing to the same commit. Your code has been merged to the master branch and it can be pushed. Also we have clean line of history! :D","title":"Working With Branches"},{"location":"level101/git/branches/#working-with-branches","text":"Coming back to our local repo which has two commits. So far, what we have is a single line of history. Commits are chained in a single line. But sometimes you may have a need to work on two different features in parallel in the same repo. Now one option here could be making a new folder/repo with the same code and use that for another feature development. But there's a better way. Use branches. Since git follows tree like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created and branches can also be merged. Using branches, there can exist multiple lines of histories and we can checkout to any of them and work on it. Checking out, as we discussed earlier, would simply mean replacing contents of the directory (repo) with the snapshot at the checked out version. 
Let's create a branch and see what it looks like: $ git branch b1 $ git log --oneline --graph * 7f3b00e (HEAD -> master, b1) adding file 2 * df2fb7a adding file 1 We created a branch called b1 . Git log tells us that b1 also points to the last commit (7f3b00e) but the HEAD is still pointing to master. If you remember, HEAD points to whichever commit/reference you have checked out. So if we checkout to b1 , HEAD should point to that. Let's confirm: $ git checkout b1 Switched to branch 'b1' $ git log --oneline --graph * 7f3b00e (HEAD -> b1, master) adding file 2 * df2fb7a adding file 1 b1 still points to the same commit but HEAD now points to b1 . Since we created a branch at commit 7f3b00e , there will be two lines of history starting from this commit. Which line of history progresses depends on the branch you are checked out on. At this moment, we are checked out on branch b1 , so making a new commit will advance branch reference b1 to that commit and current b1 commit will become its parent. Let's do that. # Creating a file and making a commit $ echo \"I am a file in b1 branch\" > b1.txt $ git add b1.txt $ git commit -m \"adding b1 file\" [b1 872a38f] adding b1 file 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The new line of history $ git log --oneline --graph * 872a38f (HEAD -> b1) adding b1 file * 7f3b00e (master) adding file 2 * df2fb7a adding file 1 $ Do note that master is still pointing to the old commit it was pointing to. We can now checkout to master branch and make commits there. This will result in another line of history starting from commit 7f3b00e. 
# checkout to master branch $ git checkout master Switched to branch 'master' # Creating a new commit on master branch $ echo \"new file in master branch\" > master.txt $ git add master.txt $ git commit -m \"adding master.txt file\" [master 60dc441] adding master.txt file 1 file changed, 1 insertion(+) create mode 100644 master.txt # The history line $ git log --oneline --graph * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Notice how branch b1 is not visible here since we are on the master. Let's try to visualize both to get the whole picture: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Above tree structure should make things clear. Notice a clear branch/fork on commit 7f3b00e. This is how we create branches. Now they both are two separate lines of history on which feature development can be done independently. To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references. Git accordingly modifies contents of our repo.","title":"Working With Branches"},{"location":"level101/git/branches/#merges","text":"Now say the feature you were working on branch b1 is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from b1 into master. There could be two ways this can be done. Here is the current history: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 1: Directly merge the branch. Merging the branch b1 into master will result in a new merge commit. 
This will merge changes from two different lines of history and create a new commit of the result. $ git merge b1 Merge made by the 'recursive' strategy. b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt $ git log --oneline --graph --all * 8fc28f9 (HEAD -> master) Merge branch 'b1' |\\ | * 872a38f (b1) adding b1 file * | 60dc441 adding master.txt file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see a new merge commit created (8fc28f9). You will be prompted for the commit message. If there are a lot of branches in the repo, the history will end up with a lot of merge commits, which looks ugly compared to a single line of development history. So let's look at an alternative approach. First let's reset our last merge and go to the previous state. $ git reset --hard 60dc441 HEAD is now at 60dc441 adding master.txt file $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 2: Rebase. Now, instead of merging two branches which have a common base (commit: 7f3b00e), let us rebase branch b1 onto the current master. What this means is: take the commits of branch b1 (those after commit 7f3b00e, up to commit 872a38f) and replay them on top of master (60dc441). # Switch to b1 $ git checkout b1 Switched to branch 'b1' # Rebase (b1 which is current branch) on master $ git rebase master First, rewinding head to replay your work on top of it... Applying: adding b1 file # The result $ git log --oneline --graph --all * 5372c8f (HEAD -> b1) adding b1 file * 60dc441 (master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see that b1 had 1 commit, whose parent was 7f3b00e . But since we rebased it onto master ( 60dc441 ), that becomes the parent now. Because its parent changed, the rebased commit also gets a new ID ( 5372c8f instead of 872a38f ). As a side effect, you also see it has become a single line of history. 
Now if we were to merge b1 into master , it would simply mean change master to point to 5372c8f which is b1 . Let's try it: # checkout to master since we want to merge code into master $ git checkout master Switched to branch 'master' # the current history, where b1 is based on master $ git log --oneline --graph --all * 5372c8f (b1) adding b1 file * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 # Performing the merge, notice the \"fast-forward\" message $ git merge b1 Updating 60dc441..5372c8f Fast-forward b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The Result $ git log --oneline --graph --all * 5372c8f (HEAD -> master, b1) adding b1 file * 60dc441 adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Now you see both b1 and master are pointing to the same commit. Your code has been merged to the master branch and it can be pushed. Also we have clean line of history! :D","title":"Merges"},{"location":"level101/git/conclusion/","text":"What next from here? There are a lot of git commands and features which we have not explored here. But with the base built-up, be sure to explore concepts like Cherrypick Squash Amend Stash Reset","title":"Conclusion"},{"location":"level101/git/conclusion/#what-next-from-here","text":"There are a lot of git commands and features which we have not explored here. 
But with the base built-up, be sure to explore concepts like Cherrypick Squash Amend Stash Reset","title":"What next from here?"},{"location":"level101/git/git-basics/","text":"Git Prerequisites Have Git installed https://git-scm.com/downloads Have taken any git high level tutorial or following LinkedIn learning courses https://www.linkedin.com/learning/git-essential-training-the-basics/ https://www.linkedin.com/learning/git-branches-merges-and-remotes/ The Official Git Docs What to expect from this course As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today like SVN, Mercurial, etc, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently! What is not covered under this course Advanced usage and specifics of internal implementation details of Git. Course Contents Git Basics Working with Branches Git with Github Hooks Git Basics Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase. Creating a Git Repo Any folder can be converted into a git repository. After executing the following command, we will see a .git folder within the folder, which makes our folder a git repository. All the magic that git does, .git folder is the enabler for the same. 
# creating an empty folder and changing current dir to it $ cd /tmp $ mkdir school-of-sre $ cd school-of-sre/ # initialize a git repo $ git init Initialized empty Git repository in /private/tmp/school-of-sre/.git/ As the output says, an empty git repo has been initialized in our folder. Let's take a look at what is there. $ ls .git/ HEAD config description hooks info objects refs There are a bunch of folders and files in the .git folder. As I said, all these enable git to do its magic. We will look into some of these folders and files. But for now, what we have is an empty git repository. Tracking a File Now, as you might already know how, let us create a new file in our repo (we will refer to the folder as the repo from now on) and check git status $ echo \"I am file 1\" > file1.txt $ git status On branch master No commits yet Untracked files: (use \"git add ...\" to include in what will be committed) file1.txt nothing added to commit but untracked files present (use \"git add\" to track) The current git status says No commits yet and there is one untracked file. Since we just created the file, git is not tracking that file. We explicitly need to ask git to track files and folders. (Also check out gitignore .) We do that via the git add command, as suggested in the above output. Then we go ahead and create a commit. $ git add file1.txt $ git status On branch master No commits yet Changes to be committed: (use \"git rm --cached ...\" to unstage) new file: file1.txt $ git commit -m \"adding file 1\" [master (root-commit) df2fb7a] adding file 1 1 file changed, 1 insertion(+) create mode 100644 file1.txt Notice how after adding the file, git status says Changes to be committed: . What it means is that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with a message attached via -m . More About a Commit A commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of the repo (the folder) is taken and saved. 
Each commit has a unique ID ( df2fb7a for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the .git folder. This is where all these snapshots or versions are stored in an efficient manner. Adding More Changes Let us create one more file and commit the change. It would look the same as the previous commit we made. $ echo \"I am file 2\" > file2.txt $ git add file2.txt $ git commit -m \"adding file 2\" [master 7f3b00e] adding file 2 1 file changed, 1 insertion(+) create mode 100644 file2.txt A new commit with ID 7f3b00e has been created. You can issue git status at any time to see the state of the repository. **IMPORTANT: Note that commit IDs are long strings (SHA-1 hashes) but we can refer to a commit by its first few characters too, as long as the prefix is unambiguous (this course uses 7-character prefixes such as 7f3b00e ). We will be using shorter and longer commit IDs interchangeably.** Now that we have two commits, let's visualize them: $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 git log , as the name suggests, prints the log of all the git commits. Here you see two additional arguments: --oneline prints the shorter version of the log, i.e. the abbreviated commit ID and the commit message only, and not the person who made the commit and when. --graph prints it in graph format. Now at this moment the commits might look like just one in each line but all commits are stored as a tree like data structure internally by git. That means there can be two or more children commits of a given commit, and not just a single line of commits. We will look more into this part when we get to the Branches section. For now this is our commit history: df2fb7a ===> 7f3b00e Are commits really linked? As I just said, the two commits we just made are linked via a tree like data structure and we saw how they are linked. But let's actually verify it. Everything in git is an object. 
Newly created files are stored as objects. Changes to files are stored as objects, and even commits are objects. To view contents of an object we can use the following command with the object's ID. We will take a look at the contents of the second commit $ git cat-file -p 7f3b00e tree ebf3af44d253e5328340026e45a9fa9ae3ea1982 parent df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a author Sanket Patel 1603273316 -0700 committer Sanket Patel 1603273316 -0700 adding file 2 Take note of the parent attribute in the above output. It points to the commit ID of the first commit we made. So this proves that they are linked! Additionally you can see the second commit's message in this object. As I said, all this magic is enabled by the .git folder, and the object we are looking at is also in that folder. $ ls .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 It is stored in the .git/objects/ folder. All the files and changes to them are stored in this folder as well. The Version Control part of Git We already can see two commits (versions) in our git log. One thing a version control tool gives you is the ability to browse back and forth in history. For example: some of your users are running an old version of code and they are reporting an issue. In order to debug the issue, you need access to the old code. The one in your current repo is the latest code. In this example, you are working on the second commit (7f3b00e) and someone reported an issue with the code snapshot at commit (df2fb7a). This is how you would get access to the code at any older commit # Current contents, two files present $ ls file1.txt file2.txt # checking out to (an older) commit $ git checkout df2fb7a Note: checking out 'df2fb7a'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout. 
If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example: git checkout -b HEAD is now at df2fb7a adding file 1 # checking contents, can verify it has old contents $ ls file1.txt So this is how we would get access to old versions/snapshots. All we need is a reference to that snapshot. Upon executing git checkout ... , what git does for you is use the .git folder, see what the state of things (files and folders) was at that version/reference and replace the contents of the current directory with those contents. The then-existing content will no longer be present in the local dir (repo) but we can and will still get access to it because it is tracked via git commit and the .git folder has it stored/tracked. Reference I mentioned in the previous section that we need a reference to the version. By default, a git repo is made of a tree of commits, and each commit has a unique ID. But the unique ID is not the only thing we can reference commits via. There are multiple ways to reference commits. For example: HEAD is a reference to the current commit. Whatever commit your repo is checked out at, HEAD will point to that. HEAD~1 is a reference to the previous commit. So while checking out the previous version in the section above, we could have done git checkout HEAD~1 . Similarly, master is also a reference (to a branch). Since git uses a tree like structure to store commits, there of course will be branches. And the default branch is called master . Master (or any branch reference) will point to the latest commit in the branch. Even though we have checked out to the previous commit in our repo, master still points to the latest commit. 
And we can get back to the latest version by checking out the master reference $ git checkout master Previous HEAD position was df2fb7a adding file 1 Switched to branch 'master' # now we will see latest code, with two files $ ls file1.txt file2.txt Note, instead of master in the above command, we could have used the commit's ID as well (although that would leave us in a detached HEAD state rather than on the master branch). References and The Magic Let's look at the state of things. Two commits, master and HEAD references are pointing to the latest commit $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 The magic? Let's examine these files: $ cat .git/refs/heads/master 7f3b00eaa957815884198e2fdfec29361108d6a9 Voila! Where master is pointing to is stored in a file. Whenever git needs to know where the master reference is pointing to, or if git needs to update where master points, it just needs to read or update the file above. So when you create a new commit, a new commit is created on top of the current commit and the master file is updated with the new commit's ID. Similarly, for the HEAD reference: $ cat .git/HEAD ref: refs/heads/master We can see HEAD is pointing to a reference called refs/heads/master . So HEAD will point wherever master points. Little Adventure We discussed how git will update the files as we execute commands. But let's try to do it ourselves, by hand, and see what happens. $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 Now let's change master to point to the previous/first commit. $ echo df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a > .git/refs/heads/master $ git log --oneline --graph * df2fb7a (HEAD -> master) adding file 1 # RESETTING TO ORIGINAL $ echo 7f3b00eaa957815884198e2fdfec29361108d6a9 > .git/refs/heads/master $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 We edited the master reference file by hand and git log then showed only the first commit. Undoing the change to the file brought the state back to the original. 
Not so much of magic, is it?","title":"Git Basics"},{"location":"level101/git/git-basics/#git","text":"","title":"Git"},{"location":"level101/git/git-basics/#prerequisites","text":"Have Git installed https://git-scm.com/downloads Have taken any git high level tutorial or following LinkedIn learning courses https://www.linkedin.com/learning/git-essential-training-the-basics/ https://www.linkedin.com/learning/git-branches-merges-and-remotes/ The Official Git Docs","title":"Prerequisites"},{"location":"level101/git/git-basics/#what-to-expect-from-this-course","text":"As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today like SVN, Mercurial, etc, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!","title":"What to expect from this course"},{"location":"level101/git/git-basics/#what-is-not-covered-under-this-course","text":"Advanced usage and specifics of internal implementation details of Git.","title":"What is not covered under this course"},{"location":"level101/git/git-basics/#course-contents","text":"Git Basics Working with Branches Git with Github Hooks","title":"Course Contents"},{"location":"level101/git/git-basics/#git-basics","text":"Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. 
Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase.","title":"Git Basics"},{"location":"level101/git/git-basics/#creating-a-git-repo","text":"Any folder can be converted into a git repository. After executing the following command, we will see a .git folder within the folder, which makes our folder a git repository. The .git folder is what enables all the magic that git does. # creating an empty folder and changing current dir to it $ cd /tmp $ mkdir school-of-sre $ cd school-of-sre/ # initialize a git repo $ git init Initialized empty Git repository in /private/tmp/school-of-sre/.git/ As the output says, an empty git repo has been initialized in our folder. Let's take a look at what is there. $ ls .git/ HEAD config description hooks info objects refs There are a bunch of folders and files in the .git folder. As I said, all these enable git to do its magic. We will look into some of these folders and files. But for now, what we have is an empty git repository.","title":"Creating a Git Repo"},{"location":"level101/git/git-basics/#tracking-a-file","text":"Now, as you might already know how, let us create a new file in our repo (we will refer to the folder as the repo from now on) and check git status $ echo \"I am file 1\" > file1.txt $ git status On branch master No commits yet Untracked files: (use \"git add ...\" to include in what will be committed) file1.txt nothing added to commit but untracked files present (use \"git add\" to track) The current git status says No commits yet and there is one untracked file. Since we just created the file, git is not tracking that file. We explicitly need to ask git to track files and folders. (Also check out gitignore .) We do that via the git add command, as suggested in the above output. Then we go ahead and create a commit. 
$ git add file1.txt $ git status On branch master No commits yet Changes to be committed: (use \"git rm --cached ...\" to unstage) new file: file1.txt $ git commit -m \"adding file 1\" [master (root-commit) df2fb7a] adding file 1 1 file changed, 1 insertion(+) create mode 100644 file1.txt Notice how after adding the file, git status says Changes to be committed: . This means that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with a message attached via -m .","title":"Tracking a File"},{"location":"level101/git/git-basics/#more-about-a-commit","text":"A commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of the repo (the folder) is taken and saved. Each commit has a unique ID ( df2fb7a for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the .git folder. This is where all these snapshots, or versions, are stored in an efficient manner.","title":"More About a Commit"},{"location":"level101/git/git-basics/#adding-more-changes","text":"Let us create one more file and commit the change. It would look the same as the previous commit we made. $ echo \"I am file 2\" > file2.txt $ git add file2.txt $ git commit -m \"adding file 2\" [master 7f3b00e] adding file 2 1 file changed, 1 insertion(+) create mode 100644 file2.txt A new commit with ID 7f3b00e has been created. You can issue git status at any time to see the state of the repository. **IMPORTANT: Note that commit IDs are long strings (SHA) but we can also refer to a commit by its first few characters (7 in the examples here), as long as they are unambiguous. We will be using shorter and longer commit IDs interchangeably.** Now that we have two commits, let's visualize them: $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 git log , as the name suggests, prints the log of all the git commits. 
Here you see two additional arguments. --oneline prints the shorter version of the log, i.e. only the abbreviated commit ID and the commit message, and not the person who made the commit and when. --graph prints it in graph format. At this moment the commits might look like just one per line, but all commits are stored internally by git as a tree-like data structure. That means a given commit can have two or more child commits, rather than there being just a single line of commits. We will look more into this part when we get to the Branches section. For now this is our commit history: df2fb7a ===> 7f3b00e","title":"Adding More Changes"},{"location":"level101/git/git-basics/#are-commits-really-linked","text":"As I just said, the two commits we just made are linked via a tree-like data structure and we saw how they are linked. But let's actually verify it. Everything in git is an object. Newly created files are stored as objects. Each new version of a changed file is stored as an object, and even commits are objects. To view the contents of an object we can use the following command with the object's ID. We will take a look at the contents of the second commit $ git cat-file -p 7f3b00e tree ebf3af44d253e5328340026e45a9fa9ae3ea1982 parent df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a author Sanket Patel 1603273316 -0700 committer Sanket Patel 1603273316 -0700 adding file 2 Take note of the parent attribute in the above output. It points to the commit ID of the first commit we made. So this proves that they are linked! Additionally you can see the second commit's message in this object. As I said, all this magic is enabled by the .git folder, and the object we are looking at is also in that folder. $ ls .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 It is stored in the .git/objects/ folder. 
All the files and changes to them as well are stored in this folder.","title":"Are commits really linked?"},{"location":"level101/git/git-basics/#the-version-control-part-of-git","text":"We already can see two commits (versions) in our git log. One thing a version control tool gives you is ability to browse back and forth in history. For example: some of your users are running an old version of code and they are reporting an issue. In order to debug the issue, you need access to the old code. The one in your current repo is the latest code. In this example, you are working on the second commit (7f3b00e) and someone reported an issue with the code snapshot at commit (df2fb7a). This is how you would get access to the code at any older commit # Current contents, two files present $ ls file1.txt file2.txt # checking out to (an older) commit $ git checkout df2fb7a Note: checking out 'df2fb7a'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example: git checkout -b HEAD is now at df2fb7a adding file 1 # checking contents, can verify it has old contents $ ls file1.txt So this is how we would get access to old versions/snapshots. All we need is a reference to that snapshot. Upon executing git checkout ... , what git does for you is use the .git folder, see what was the state of things (files and folders) at that version/reference and replace the contents of current directory with those contents. 
The then-existing content will no longer be present in the local dir (repo) but we can and will still get access to it because it is tracked via git commits and the .git folder has it stored/tracked.","title":"The Version Control part of Git"},{"location":"level101/git/git-basics/#reference","text":"I mentioned in the previous section that we need a reference to the version. By default, a git repo is made of a tree of commits, and each commit has a unique ID. But the unique ID is not the only way to reference a commit. There are multiple ways to reference commits. For example: HEAD is a reference to the current commit. Whatever commit your repo is checked out at, HEAD will point to that. HEAD~1 is a reference to the previous commit. So while checking out the previous version in the section above, we could have done git checkout HEAD~1 . Similarly, master is also a reference (to a branch). Since git uses a tree-like structure to store commits, there of course will be branches. And the default branch is called master . Master (or any branch reference) will point to the latest commit in the branch. Even though we have checked out the previous commit in our repo, master still points to the latest commit. And we can get back to the latest version by checking out the master reference $ git checkout master Previous HEAD position was df2fb7a adding file 1 Switched to branch 'master' # now we will see latest code, with two files $ ls file1.txt file2.txt Note: instead of master in the above command, we could have used the commit's ID as well, though that would leave us in the 'detached HEAD' state again.","title":"Reference"},{"location":"level101/git/git-basics/#references-and-the-magic","text":"Let's look at the state of things. Two commits, master and HEAD references are pointing to the latest commit $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 The magic? Let's examine these files: $ cat .git/refs/heads/master 7f3b00eaa957815884198e2fdfec29361108d6a9 Voilà! Where master is pointing to is stored in a file. 
Whenever git needs to know where the master reference is pointing, it reads the file above; when git needs to update where master points, it just needs to update that file. So when you create a new commit, a new commit is created on top of the current commit and the master file is updated with the new commit's ID. Similarly, for the HEAD reference: $ cat .git/HEAD ref: refs/heads/master We can see HEAD is pointing to a reference called refs/heads/master . So HEAD will point wherever master points.","title":"References and The Magic"},{"location":"level101/git/git-basics/#little-adventure","text":"We discussed how git will update the files as we execute commands. But let's try to do it ourselves, by hand, and see what happens. $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 Now let's change master to point to the previous/first commit. $ echo df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a > .git/refs/heads/master $ git log --oneline --graph * df2fb7a (HEAD -> master) adding file 1 # RESETTING TO ORIGINAL $ echo 7f3b00eaa957815884198e2fdfec29361108d6a9 > .git/refs/heads/master $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 We just edited the master reference file and now we can see only the first commit in git log. Undoing the change to the file brings the state back to original. Not so much of magic, is it?","title":"Little Adventure"},{"location":"level101/git/github-hooks/","text":"Git with GitHub Till now all the operations we did were in our local repo while git also helps us in a collaborative environment. GitHub is one place on the internet where you can centrally host your git repos and collaborate with other developers. 
Most of the workflow will remain the same as we discussed, with addition of couple of things: Pull: to pull latest changes from github (the central) repo Push: to push your changes to github repo so that it's available to all people GitHub has written nice guides and tutorials about this and you can refer them here: GitHub Hello World Git Handbook Hooks Git has another nice feature called hooks. Hooks are basically scripts which will be called when a certain event happens. Here is where hooks are located: $ ls .git/hooks/ applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample Names are self explanatory. These hooks are useful when you want to do certain things when a certain event happens. If you want to run tests before pushing code, you would want to setup pre-push hooks. Let's try to create a pre commit hook. $ echo \"echo this is from pre commit hook\" > .git/hooks/pre-commit $ chmod +x .git/hooks/pre-commit We basically create a file called pre-commit in hooks folder and make it executable. Now if we make a commit, we should see the message getting printed. $ echo \"sample file\" > sample.txt $ git add sample.txt $ git commit -m \"adding sample file\" this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION [master 9894e05] adding sample file 1 file changed, 1 insertion(+) create mode 100644 sample.txt","title":"Github and Hooks"},{"location":"level101/git/github-hooks/#git-with-github","text":"Till now all the operations we did were in our local repo while git also helps us in a collaborative environment. GitHub is one place on the internet where you can centrally host your git repos and collaborate with other developers. 
Most of the workflow will remain the same as we discussed, with addition of couple of things: Pull: to pull latest changes from github (the central) repo Push: to push your changes to github repo so that it's available to all people GitHub has written nice guides and tutorials about this and you can refer them here: GitHub Hello World Git Handbook","title":"Git with GitHub"},{"location":"level101/git/github-hooks/#hooks","text":"Git has another nice feature called hooks. Hooks are basically scripts which will be called when a certain event happens. Here is where hooks are located: $ ls .git/hooks/ applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample Names are self explanatory. These hooks are useful when you want to do certain things when a certain event happens. If you want to run tests before pushing code, you would want to setup pre-push hooks. Let's try to create a pre commit hook. $ echo \"echo this is from pre commit hook\" > .git/hooks/pre-commit $ chmod +x .git/hooks/pre-commit We basically create a file called pre-commit in hooks folder and make it executable. Now if we make a commit, we should see the message getting printed. $ echo \"sample file\" > sample.txt $ git add sample.txt $ git commit -m \"adding sample file\" this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION [master 9894e05] adding sample file 1 file changed, 1 insertion(+) create mode 100644 sample.txt","title":"Hooks"},{"location":"level101/linux_basics/command_line_basics/","text":"Command Line Basics Lab Environment Setup One can use an online bash interpreter to run all the commands that are provided as examples in this course. This will also help you in getting a hands-on experience of various linux commands. REPL is one of the popular online bash interpreters for running linux commands. 
We will be using it for running all the commands mentioned in this course. What is a Command A command is a program that tells the operating system to perform specific work. Programs are stored as files in linux. Therefore, a command is also a file which is stored somewhere on the disk. Commands may also take additional arguments as input from the user. These arguments are called command line arguments. Knowing how to use the commands is important and there are many ways to get help in Linux, especially for commands. Almost every command will have some form of documentation, most commands will have a command-line argument -h or --help that will display a reasonable amount of documentation. But the most popular documentation system in Linux is called man pages - short for manual pages. Using --help to show the documentation for ls command. File System Organization The linux file system has a hierarchical (or tree-like) structure with its highest level directory called root ( denoted by / ). Directories present inside the root directory stores file related to the system. These directories in turn can either store system files or application files or user related files. bin | The executable program of most commonly used commands reside in bin directory dev | This directory contains files related to devices on the system etc | This directory contains all the system configuration files home | This directory contains user related files and directories. lib | This directory contains all the library files mnt | This directory contains files related to mounted devices on the system proc | This directory contains files related to the running processes on the system root | This directory contains root user related files and directories. sbin | This directory contains programs used for system administration. 
tmp | This directory is used to store temporary files on the system usr | This directory is used to store application programs on the system Commands for Navigating the File System There are three basic commands which are used frequently to navigate the file system: ls pwd cd We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. pwd (print working directory) At any given moment of time, we will be standing in a certain directory. To get the name of the directory in which we are standing, we can use the pwd command in linux. We will now use the cd command to move to a different directory and then print the working directory. cd (change directory) The cd command can be used to change the working directory. Using the command, you can move from one directory to another. In the below example, we are initially in the root directory. we have then used the cd command to change the directory. ls (list files and directories)** The ls command is used to list the contents of a directory. It will list down all the files and folders present in the given directory. If we just type ls in the shell, it will list all the files and directories present in the current directory. We can also provide the directory name as argument to ls command. It will then list all the files and directories inside the given directory. Commands for Manipulating Files There are five basic commands which are used frequently to manipulate files: touch mkdir cp mv rm We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. touch (create new file) The touch command can be used to create an empty new file. This command is very useful for many other purposes but we will discuss the simplest use case of creating a new file. 
General syntax of using touch command touch mkdir (create new directories) The mkdir command is used to create directories.You can use ls command to verify that the new directory is created. General syntax of using mkdir command mkdir rm (delete files and directories) The rm command can be used to delete files and directories. It is very important to note that this command permanently deletes the files and directories. It's almost impossible to recover these files and directories once you have executed rm command on them successfully. Do run this command with care. General syntax of using rm command: rm Let's try to understand the rm command with an example. We will try to delete the file and directory we created using touch and mkdir command respectively. cp (copy files and directories) The cp command is used to copy files and directories from one location to another. Do note that the cp command doesn't do any change to the original files or directories. The original files or directories and their copy both co-exist after running cp command successfully. General syntax of using cp command: cp We are currently in the '/home/runner' directory. We will use the mkdir command to create a new directory named \"test_directory\". We will now try to copy the \"_test_runner.py\" file to the directory we created just now. Do note that nothing happened to the original \"_test_runner.py\" file. It's still there in the current directory. A new copy of it got created inside the \"test_directory\". We can also use the cp command to copy the whole directory from one location to another. Let's try to understand this with an example. We again used the mkdir command to create a new directory called \"another_directory\". We then used the cp command along with an additional argument '-r' to copy the \"test_directory\". 
mv (move files and directories) The mv command can either be used to move files or directories from one location to another or it can be used to rename files or directories. Do note that moving files and copying them are very different. When you move the files or directories, the original copy is lost. General syntax of using mv command: mv In this example, we will use the mv command to move the \"_test_runner.py\" file to \"test_directory\". In this case, this file already exists in \"test_directory\". The mv command will just replace it. Do note that the original file doesn't exist in the current directory after mv command ran successfully. We can also use the mv command to move a directory from one location to another. In this case, we do not need to use the '-r' flag that we did while using the cp command. Do note that the original directory will not exist if we use mv command. One of the important uses of the mv command is to rename files and directories. Let's see how we can use this command for renaming. We have first changed our location to \"test_directory\". We then use the mv command to rename the \"_test_runner.py\" file to \"test.py\". Commands for Viewing Files There are five basic commands which are used frequently to view the files: cat head tail more less We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 100 in this file. Each number will be in a separate line. Do not worry about the above command now. It's an advanced command which is used to generate numbers. We have then used a redirection operator to push these numbers to the file. We will be discussing I/O redirection in the later sections. cat The simplest use of the cat command is to print the contents of the file on your output screen. This command is very useful and can be used for many other purposes. 
We will study about other use cases later. You can try to run the above command and you will see numbers being printed from 1 to 100 on your screen. You will need to scroll up to view all the numbers. head The head command displays the first 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the top. In this example, we are only able to see the first 10 lines from the file when we use the head command. By default, head command will only display the first 10 lines. If we want to specify the number of lines we want to see from start, use the '-n' argument to provide the input. tail The tail command displays the last 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the end of the file. By default, the tail command will only display the last 10 lines. If we want to specify the number of lines we want to see from the end, use '-n' argument to provide the input. In this example, we are only able to see the last 5 lines from the file when we use the tail command with explicit -n option. more More command displays the contents of a file or a command output, displaying one screen at a time in case the file is large (Eg: log files). It also allows forward navigation and limited backward navigation in the file. More command displays as much as can fit on the current screen and waits for user input to advance. Forward navigation can be done by pressing Enter, which advances the output by one line and Space, which advances the output by one screen. less Less command is an improved version of more. It displays the contents of a file or a command output, one page at a time. It allows backward navigation as well as forward navigation in the file and also has search options. We can use arrow keys for advancing backward or forward by one line. For moving forward by one page, press Space and for moving backward by one page, press b on your keyboard. 
You can go to the beginning and the end of a file instantly. Echo Command in Linux The echo command is one of the simplest commands that is used in the shell. This command is equivalent to what we have in other programming languages. The echo command prints the given input string on the screen. Text Processing Commands In the previous section, we learned how to view the content of a file. In many cases, we will be interested in performing the below operations: Print only the lines which contain a particular word(s) Replace a particular word with another word in a file Sort the lines in a particular order There are three basic commands which are used frequently to process texts: grep sed sort We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 10 in this file. Each number will be in a separate line. grep The grep command in its simplest form can be used to search particular words in a text file. It will display all the lines in a file that contains a particular input. The word we want to search is provided as an input to the grep command. General syntax of using grep command: grep In this example, we are trying to search for a string \"1\" in this file. The grep command outputs the lines where it found this string. sed The sed command in its simplest form can be used to replace a text in a file. General syntax of using the sed command for replacement: sed 's///' Let's try to replace each occurrence of \"1\" in the file with \"3\" using sed command. The content of the file will not change in the above example. To do so, we have to use an extra argument '-i' so that the changes are reflected back in the file. sort The sort command can be used to sort the input provided to it as an argument. By default, it will sort in increasing order. 
Let's first see the content of the file before trying to sort it. Now, we will try to sort the file using the sort command. The sort command sorts the content in lexicographical order. The content of the file will not change in the above example. I/O Redirection Each open file gets assigned a file descriptor. A file descriptor is a unique identifier for open files in the system. There are always three default files open, stdin (the keyboard), stdout (the screen), and stderr (error messages output to the screen). These files can be redirected. Everything is a file in linux - https://unix.stackexchange.com/questions/225537/everything-is-a-file Till now, we have displayed all the output on the screen which is the standard output. We can use some special operators to redirect the output of the command to files or even to the input of other commands. I/O redirection is a very powerful feature. In the below example, we have used the '>' operator to redirect the output of ls command to output.txt file. In the below example, we have redirected the output from echo command to a file. We can also redirect the output of a command as an input to another command. This is possible with the help of pipes. In the below example, we have passed the output of cat command as an input to grep command using pipe(|) operator. In the below example, we have passed the output of sort command as an input to uniq command using pipe(|) operator. The uniq command only prints the unique numbers from the input. I/O redirection - https://tldp.org/LDP/abs/html/io-redirection.html","title":"Command Line Basics"},{"location":"level101/linux_basics/command_line_basics/#command-line-basics","text":"","title":"Command Line Basics"},{"location":"level101/linux_basics/command_line_basics/#lab-environment-setup","text":"One can use an online bash interpreter to run all the commands that are provided as examples in this course. 
This will also help you in getting a hands-on experience of various linux commands. REPL is one of the popular online bash interpreters for running linux commands. We will be using it for running all the commands mentioned in this course.","title":"Lab Environment Setup"},{"location":"level101/linux_basics/command_line_basics/#what-is-a-command","text":"A command is a program that tells the operating system to perform specific work. Programs are stored as files in linux. Therefore, a command is also a file which is stored somewhere on the disk. Commands may also take additional arguments as input from the user. These arguments are called command line arguments. Knowing how to use the commands is important and there are many ways to get help in Linux, especially for commands. Almost every command will have some form of documentation, most commands will have a command-line argument -h or --help that will display a reasonable amount of documentation. But the most popular documentation system in Linux is called man pages - short for manual pages. Using --help to show the documentation for ls command.","title":"What is a Command"},{"location":"level101/linux_basics/command_line_basics/#file-system-organization","text":"The linux file system has a hierarchical (or tree-like) structure with its highest level directory called root ( denoted by / ). Directories present inside the root directory stores file related to the system. These directories in turn can either store system files or application files or user related files. bin | The executable program of most commonly used commands reside in bin directory dev | This directory contains files related to devices on the system etc | This directory contains all the system configuration files home | This directory contains user related files and directories. 
lib | This directory contains all the library files mnt | This directory contains files related to mounted devices on the system proc | This directory contains files related to the running processes on the system root | This directory contains root user related files and directories. sbin | This directory contains programs used for system administration. tmp | This directory is used to store temporary files on the system usr | This directory is used to store application programs on the system","title":"File System Organization"},{"location":"level101/linux_basics/command_line_basics/#commands-for-navigating-the-file-system","text":"There are three basic commands which are used frequently to navigate the file system: ls pwd cd We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell.","title":"Commands for Navigating the File System"},{"location":"level101/linux_basics/command_line_basics/#pwd-print-working-directory","text":"At any given moment of time, we will be standing in a certain directory. To get the name of the directory in which we are standing, we can use the pwd command in linux. We will now use the cd command to move to a different directory and then print the working directory.","title":"pwd (print working directory)"},{"location":"level101/linux_basics/command_line_basics/#cd-change-directory","text":"The cd command can be used to change the working directory. Using the command, you can move from one directory to another. In the below example, we are initially in the root directory. we have then used the cd command to change the directory.","title":"cd (change directory)"},{"location":"level101/linux_basics/command_line_basics/#ls-list-files-and-directories","text":"The ls command is used to list the contents of a directory. It will list down all the files and folders present in the given directory. 
If we just type ls in the shell, it will list all the files and directories present in the current directory. We can also provide the directory name as argument to ls command. It will then list all the files and directories inside the given directory.","title":"ls (list files and directories)**"},{"location":"level101/linux_basics/command_line_basics/#commands-for-manipulating-files","text":"There are five basic commands which are used frequently to manipulate files: touch mkdir cp mv rm We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell.","title":"Commands for Manipulating Files"},{"location":"level101/linux_basics/command_line_basics/#touch-create-new-file","text":"The touch command can be used to create an empty new file. This command is very useful for many other purposes but we will discuss the simplest use case of creating a new file. General syntax of using touch command touch ","title":"touch (create new file)"},{"location":"level101/linux_basics/command_line_basics/#mkdir-create-new-directories","text":"The mkdir command is used to create directories.You can use ls command to verify that the new directory is created. General syntax of using mkdir command mkdir ","title":"mkdir (create new directories)"},{"location":"level101/linux_basics/command_line_basics/#rm-delete-files-and-directories","text":"The rm command can be used to delete files and directories. It is very important to note that this command permanently deletes the files and directories. It's almost impossible to recover these files and directories once you have executed rm command on them successfully. Do run this command with care. General syntax of using rm command: rm Let's try to understand the rm command with an example. 
We will try to delete the file and directory we created using touch and mkdir command respectively.","title":"rm (delete files and directories)"},{"location":"level101/linux_basics/command_line_basics/#cp-copy-files-and-directories","text":"The cp command is used to copy files and directories from one location to another. Do note that the cp command doesn't do any change to the original files or directories. The original files or directories and their copy both co-exist after running cp command successfully. General syntax of using cp command: cp We are currently in the '/home/runner' directory. We will use the mkdir command to create a new directory named \"test_directory\". We will now try to copy the \"_test_runner.py\" file to the directory we created just now. Do note that nothing happened to the original \"_test_runner.py\" file. It's still there in the current directory. A new copy of it got created inside the \"test_directory\". We can also use the cp command to copy the whole directory from one location to another. Let's try to understand this with an example. We again used the mkdir command to create a new directory called \"another_directory\". We then used the cp command along with an additional argument '-r' to copy the \"test_directory\". mv (move files and directories) The mv command can either be used to move files or directories from one location to another or it can be used to rename files or directories. Do note that moving files and copying them are very different. When you move the files or directories, the original copy is lost. General syntax of using mv command: mv In this example, we will use the mv command to move the \"_test_runner.py\" file to \"test_directory\". In this case, this file already exists in \"test_directory\". The mv command will just replace it. Do note that the original file doesn't exist in the current directory after mv command ran successfully. 
We can also use the mv command to move a directory from one location to another. In this case, we do not need to use the '-r' flag that we did while using the cp command. Do note that the original directory will not exist if we use mv command. One of the important uses of the mv command is to rename files and directories. Let's see how we can use this command for renaming. We have first changed our location to \"test_directory\". We then use the mv command to rename the \"_test_runner.py\" file to \"test.py\".","title":"cp (copy files and directories)"},{"location":"level101/linux_basics/command_line_basics/#commands-for-viewing-files","text":"There are five basic commands which are used frequently to view the files: cat head tail more less We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 100 in this file. Each number will be in a separate line. Do not worry about the above command now. It's an advanced command which is used to generate numbers. We have then used a redirection operator to push these numbers to the file. We will be discussing I/O redirection in the later sections.","title":"Commands for Viewing Files"},{"location":"level101/linux_basics/command_line_basics/#cat","text":"The simplest use of the cat command is to print the contents of the file on your output screen. This command is very useful and can be used for many other purposes. We will study other use cases later. You can try to run the above command and you will see numbers being printed from 1 to 100 on your screen. You will need to scroll up to view all the numbers.","title":"cat"},{"location":"level101/linux_basics/command_line_basics/#head","text":"The head command displays the first 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the top. 
In this example, we are only able to see the first 10 lines from the file when we use the head command. By default, head command will only display the first 10 lines. If we want to specify the number of lines we want to see from start, use the '-n' argument to provide the input.","title":"head"},{"location":"level101/linux_basics/command_line_basics/#tail","text":"The tail command displays the last 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the end of the file. By default, the tail command will only display the last 10 lines. If we want to specify the number of lines we want to see from the end, use '-n' argument to provide the input. In this example, we are only able to see the last 5 lines from the file when we use the tail command with explicit -n option.","title":"tail"},{"location":"level101/linux_basics/command_line_basics/#more","text":"More command displays the contents of a file or a command output, displaying one screen at a time in case the file is large (Eg: log files). It also allows forward navigation and limited backward navigation in the file. More command displays as much as can fit on the current screen and waits for user input to advance. Forward navigation can be done by pressing Enter, which advances the output by one line and Space, which advances the output by one screen.","title":"more"},{"location":"level101/linux_basics/command_line_basics/#less","text":"Less command is an improved version of more. It displays the contents of a file or a command output, one page at a time. It allows backward navigation as well as forward navigation in the file and also has search options. We can use arrow keys for advancing backward or forward by one line. For moving forward by one page, press Space and for moving backward by one page, press b on your keyboard. 
You can go to the beginning and the end of a file instantly.","title":"less"},{"location":"level101/linux_basics/command_line_basics/#echo-command-in-linux","text":"The echo command is one of the simplest commands that is used in the shell. This command is equivalent to what we have in other programming languages. The echo command prints the given input string on the screen.","title":"Echo Command in Linux"},{"location":"level101/linux_basics/command_line_basics/#text-processing-commands","text":"In the previous section, we learned how to view the content of a file. In many cases, we will be interested in performing the below operations: Print only the lines which contain a particular word(s) Replace a particular word with another word in a file Sort the lines in a particular order There are three basic commands which are used frequently to process texts: grep sed sort We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 10 in this file. Each number will be in a separate line.","title":"Text Processing Commands"},{"location":"level101/linux_basics/command_line_basics/#grep","text":"The grep command in its simplest form can be used to search particular words in a text file. It will display all the lines in a file that contains a particular input. The word we want to search is provided as an input to the grep command. General syntax of using grep command: grep In this example, we are trying to search for a string \"1\" in this file. The grep command outputs the lines where it found this string.","title":"grep"},{"location":"level101/linux_basics/command_line_basics/#sed","text":"The sed command in its simplest form can be used to replace a text in a file. 
General syntax of using the sed command for replacement: sed 's///' Let's try to replace each occurrence of \"1\" in the file with \"3\" using sed command. The content of the file will not change in the above example. To do so, we have to use an extra argument '-i' so that the changes are reflected back in the file.","title":"sed"},{"location":"level101/linux_basics/command_line_basics/#sort","text":"The sort command can be used to sort the input provided to it as an argument. By default, it will sort in increasing order. Let's first see the content of the file before trying to sort it. Now, we will try to sort the file using the sort command. The sort command sorts the content in lexicographical order. The content of the file will not change in the above example.","title":"sort"},{"location":"level101/linux_basics/command_line_basics/#io-redirection","text":"Each open file gets assigned a file descriptor. A file descriptor is an unique identifier for open files in the system. There are always three default files open, stdin (the keyboard), stdout (the screen), and stderr (error messages output to the screen). These files can be redirected. Everything is a file in linux - https://unix.stackexchange.com/questions/225537/everything-is-a-file Till now, we have displayed all the output on the screen which is the standard output. We can use some special operators to redirect the output of the command to files or even to the input of other commands. I/O redirection is a very powerful feature. In the below example, we have used the '>' operator to redirect the output of ls command to output.txt file. In the below example, we have redirected the output from echo command to a file. We can also redirect the output of a command as an input to another command. This is possible with the help of pipes. In the below example, we have passed the output of cat command as an input to grep command using pipe(|) operator. 
In the below example, we have passed the output of sort command as an input to uniq command using pipe(|) operator. The uniq command only prints the unique numbers from the input. I/O redirection - https://tldp.org/LDP/abs/html/io-redirection.html","title":"I/O Redirection"},{"location":"level101/linux_basics/conclusion/","text":"Conclusion We have covered the basics of Linux operating systems and basic commands used in linux. We have also covered the Linux server administration commands. We hope that this course will make it easier for you to operate on the command line. Applications in SRE Role As a SRE, you will be required to perform some general tasks on these Linux servers. You will also be using the command line when you are troubleshooting issues. Moving from one location to another in the filesystem will require the help of ls , pwd and cd commands. You may need to search some specific information in the log files. grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command. tail command is very useful to view the latest data in the log file. Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers for security reasons. Users permissions can be restricted with chown , chmod and chgrp commands. ssh is one of the most frequently used commands for a SRE. Logging into servers and troubleshooting along with performing basic administration tasks will only be possible if we are able to login into the server. What if we want to run an apache server or nginx on a server? We will first install it using the package manager. Package management commands become important here. Managing services on servers is another critical responsibility of a SRE. Systemd related commands can help in troubleshooting issues. If a service goes down, we can start it using systemctl start command. 
We can also stop a service in case it is not needed. Monitoring is another core responsibility of a SRE. Memory and CPU are two important system level metrics which should be monitored. Commands like top and free are quite helpful here. If a service is throwing an error, how do we find out the root cause of the error ? We will certainly need to check logs to find out the whole stack trace of the error. The log file will also tell us the number of times the error has occurred along with time when it started. Useful Courses and tutorials Edx basic linux commands course Edx Red Hat Enterprise Linux Course https://linuxcommand.org/lc3_learning_the_shell.php","title":"Conclusion"},{"location":"level101/linux_basics/conclusion/#conclusion","text":"We have covered the basics of Linux operating systems and basic commands used in linux. We have also covered the Linux server administration commands. We hope that this course will make it easier for you to operate on the command line.","title":"Conclusion"},{"location":"level101/linux_basics/conclusion/#applications-in-sre-role","text":"As a SRE, you will be required to perform some general tasks on these Linux servers. You will also be using the command line when you are troubleshooting issues. Moving from one location to another in the filesystem will require the help of ls , pwd and cd commands. You may need to search some specific information in the log files. grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command. tail command is very useful to view the latest data in the log file. Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers for security reasons. Users permissions can be restricted with chown , chmod and chgrp commands. ssh is one of the most frequently used commands for a SRE. 
Logging into servers and troubleshooting along with performing basic administration tasks will only be possible if we are able to login into the server. What if we want to run an apache server or nginx on a server? We will first install it using the package manager. Package management commands become important here. Managing services on servers is another critical responsibility of a SRE. Systemd related commands can help in troubleshooting issues. If a service goes down, we can start it using systemctl start command. We can also stop a service in case it is not needed. Monitoring is another core responsibility of a SRE. Memory and CPU are two important system level metrics which should be monitored. Commands like top and free are quite helpful here. If a service is throwing an error, how do we find out the root cause of the error ? We will certainly need to check logs to find out the whole stack trace of the error. The log file will also tell us the number of times the error has occurred along with time when it started.","title":"Applications in SRE Role"},{"location":"level101/linux_basics/conclusion/#useful-courses-and-tutorials","text":"Edx basic linux commands course Edx Red Hat Enterprise Linux Course https://linuxcommand.org/lc3_learning_the_shell.php","title":"Useful Courses and tutorials"},{"location":"level101/linux_basics/intro/","text":"Linux Basics Introduction Prerequisites Should be comfortable in using any operating systems like Windows, Linux or Mac Expected to have fundamental knowledge of operating systems What to expect from this course This course is divided into three parts. In the first part, we cover the fundamentals of Linux operating systems. We will talk about Linux architecture, Linux distributions and uses of Linux operating systems. We will also talk about the difference between GUI and CLI. In the second part, we cover some basic commands used in Linux. 
We will focus on commands used for navigating the file system, viewing and manipulating files, I/O redirection etc. In the third part, we cover Linux system administration. This includes day to day tasks performed by Linux admins, like managing users/groups, managing file permissions, monitoring system performance, log files etc. In the second and third parts, we will be taking examples to understand the concepts. What is not covered under this course We are not covering advanced Linux commands and bash scripting in this course. We will also not be covering Linux internals. Course Contents The following topics have been covered in this course: Introduction to Linux What are Linux Operating Systems What are popular Linux distributions Uses of Linux Operating Systems Linux Architecture Graphical user interface (GUI) vs Command line interface (CLI) Command Line Basics Lab Environment Setup What is a Command File System Organization Navigating File System Manipulating Files Viewing Files Echo Command Text Processing Commands I/O Redirection Linux system administration Lab Environment Setup User/Groups management Becoming a Superuser File Permissions SSH Command Package Management Process Management Memory Management Daemons and Systemd Logs Conclusion Applications in SRE Role Useful Courses and tutorials What are Linux operating systems Most of us are familiar with the Windows operating system used in more than 75% of the personal computers. The Windows operating systems are based on the Windows NT kernel. A kernel is the most important part of an operating system - it performs important functions like process management, memory management, filesystem management etc. Linux operating systems are based on the Linux kernel. A Linux based operating system will consist of Linux kernel, GUI/CLI, system libraries and system utilities. The Linux kernel was independently developed and released by Linus Torvalds. 
The Linux kernel is free and open-source - https://github.com/torvalds/linux Linux is a kernel and not a complete operating system. The Linux kernel is combined with the GNU system to make a complete operating system. Therefore, Linux-based operating systems are also called GNU/Linux systems. GNU is an extensive collection of free software such as a compiler, debugger, C library etc. Linux and the GNU System History of Linux - https://en.wikipedia.org/wiki/History_of_Linux What are popular Linux distributions A Linux distribution (distro) is an operating system based on the Linux kernel and a package management system. A package management system consists of tools that help in installing, upgrading, configuring and removing software on the operating system. Software is usually adapted to a distribution and packaged in a distro-specific format. These packages are available through a distro-specific repository. Packages are installed and managed in the operating system by a package manager. List of popular Linux distributions: Fedora Ubuntu Debian Centos Red Hat Enterprise Linux Suse Arch Linux Packaging systems Distributions Package manager Debian style (.deb) Debian, Ubuntu APT Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux YUM Linux Architecture The Linux kernel is monolithic in nature. System calls are used to interact with the Linux kernel space. Kernel code can only be executed in the kernel mode. Non-kernel code is executed in the user mode. Device drivers are used to communicate with the hardware devices. Uses of Linux Operating Systems Operating systems based on the Linux kernel are widely used in: Personal computers Servers Mobile phones - Android is based on the Linux kernel Embedded devices - watches, televisions, traffic lights etc Satellites Network devices - routers, switches etc. Graphical user interface (GUI) vs Command line interface (CLI) A user interacts with a computer with the help of user interfaces. 
The user interface can be either GUI or CLI. Graphical user interface allows a user to interact with the computer using graphics such as icons and images. When a user clicks on an icon to open an application on a computer, he or she is actually using the GUI. It's easy to perform tasks using GUI. Command line interface allows a user to interact with the computer using commands. A user types the command in a terminal and the system helps in executing these commands. A new user with experience on GUI may find it difficult to interact with CLI as he/she needs to be aware of the commands to perform a particular operation. Shell vs Terminal Shell is a program that takes commands from the users and gives them to the operating system for processing. Shell is an example of a CLI (command line interface). Bash is one of the most popular shell programs available on Linux servers. Other popular shell programs are zsh, ksh and tcsh. Terminal is a program that opens a window and lets you interact with the shell. Some popular examples of terminals are gnome-terminal, xterm, konsole etc. Linux users do use the terms shell, terminal, prompt, console etc. interchangeably. In simple terms, these all refer to a way of taking commands from the user.","title":"Introduction"},{"location":"level101/linux_basics/intro/#linux-basics","text":"","title":"Linux Basics"},{"location":"level101/linux_basics/intro/#introduction","text":"","title":"Introduction"},{"location":"level101/linux_basics/intro/#prerequisites","text":"Should be comfortable in using any operating systems like Windows, Linux or Mac Expected to have fundamental knowledge of operating systems","title":"Prerequisites"},{"location":"level101/linux_basics/intro/#what-to-expect-from-this-course","text":"This course is divided into three parts. In the first part, we cover the fundamentals of Linux operating systems. We will talk about Linux architecture, Linux distributions and uses of Linux operating systems. 
We will also talk about the difference between GUI and CLI. In the second part, we cover some basic commands used in Linux. We will focus on commands used for navigating the file system, viewing and manipulating files, I/O redirection etc. In the third part, we cover Linux system administration. This includes day to day tasks performed by Linux admins, like managing users/groups, managing file permissions, monitoring system performance, log files etc. In the second and third parts, we will be taking examples to understand the concepts.","title":"What to expect from this course"},{"location":"level101/linux_basics/intro/#what-is-not-covered-under-this-course","text":"We are not covering advanced Linux commands and bash scripting in this course. We will also not be covering Linux internals.","title":"What is not covered under this course"},{"location":"level101/linux_basics/intro/#course-contents","text":"The following topics have been covered in this course: Introduction to Linux What are Linux Operating Systems What are popular Linux distributions Uses of Linux Operating Systems Linux Architecture Graphical user interface (GUI) vs Command line interface (CLI) Command Line Basics Lab Environment Setup What is a Command File System Organization Navigating File System Manipulating Files Viewing Files Echo Command Text Processing Commands I/O Redirection Linux system administration Lab Environment Setup User/Groups management Becoming a Superuser File Permissions SSH Command Package Management Process Management Memory Management Daemons and Systemd Logs Conclusion Applications in SRE Role Useful Courses and tutorials","title":"Course Contents"},{"location":"level101/linux_basics/intro/#what-are-linux-operating-systems","text":"Most of us are familiar with the Windows operating system used in more than 75% of the personal computers. The Windows operating systems are based on the Windows NT kernel. 
A kernel is the most important part of an operating system - it performs important functions like process management, memory management, filesystem management etc. Linux operating systems are based on the Linux kernel. A Linux based operating system will consist of Linux kernel, GUI/CLI, system libraries and system utilities. The Linux kernel was independently developed and released by Linus Torvalds. The Linux kernel is free and open-source - https://github.com/torvalds/linux Linux is a kernel and not a complete operating system. The Linux kernel is combined with the GNU system to make a complete operating system. Therefore, Linux-based operating systems are also called GNU/Linux systems. GNU is an extensive collection of free software such as a compiler, debugger, C library etc. Linux and the GNU System History of Linux - https://en.wikipedia.org/wiki/History_of_Linux","title":"What are Linux operating systems"},{"location":"level101/linux_basics/intro/#what-are-popular-linux-distributions","text":"A Linux distribution (distro) is an operating system based on the Linux kernel and a package management system. A package management system consists of tools that help in installing, upgrading, configuring and removing software on the operating system. Software is usually adapted to a distribution and packaged in a distro-specific format. These packages are available through a distro-specific repository. Packages are installed and managed in the operating system by a package manager. List of popular Linux distributions: Fedora Ubuntu Debian Centos Red Hat Enterprise Linux Suse Arch Linux Packaging systems Distributions Package manager Debian style (.deb) Debian, Ubuntu APT Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux YUM","title":"What are popular Linux distributions"},{"location":"level101/linux_basics/intro/#linux-architecture","text":"The Linux kernel is monolithic in nature. System calls are used to interact with the Linux kernel space. 
Kernel code can only be executed in the kernel mode. Non-kernel code is executed in the user mode. Device drivers are used to communicate with the hardware devices.","title":"Linux Architecture"},{"location":"level101/linux_basics/intro/#uses-of-linux-operating-systems","text":"Operating system based on Linux kernel are widely used in: Personal computers Servers Mobile phones - Android is based on Linux operating system Embedded devices - watches, televisions, traffic lights etc Satellites Network devices - routers, switches etc.","title":"Uses of Linux Operating Systems"},{"location":"level101/linux_basics/intro/#graphical-user-interface-gui-vs-command-line-interface-cli","text":"A user interacts with a computer with the help of user interfaces. The user interface can be either GUI or CLI. Graphical user interface allows a user to interact with the computer using graphics such as icons and images. When a user clicks on an icon to open an application on a computer, he or she is actually using the GUI. It's easy to perform tasks using GUI. Command line interface allows a user to interact with the computer using commands. A user types the command in a terminal and the system helps in executing these commands. A new user with experience on GUI may find it difficult to interact with CLI as he/she needs to be aware of the commands to perform a particular operation.","title":"Graphical user interface (GUI) vs Command line interface (CLI)"},{"location":"level101/linux_basics/intro/#shell-vs-terminal","text":"Shell is a program that takes commands from the users and gives them to the operating system for processing. Shell is an example of a CLI (command line interface). Bash is one of the most popular shell programs available on Linux servers. Other popular shell programs are zsh, ksh and tcsh. Terminal is a program that opens a window and lets you interact with the shell. Some popular examples of terminals are gnome-terminal, xterm, konsole etc. 
Linux users do use the terms shell, terminal, prompt, console etc. interchangeably. In simple terms, these all refer to a way of taking commands from the user.","title":"Shell vs Terminal"},{"location":"level101/linux_basics/linux_server_administration/","text":"Linux Server Administration In this course, we will try to cover some of the common tasks that a Linux server administrator performs. We will first try to understand what a particular command does and then try to understand the commands using examples. Do keep in mind that it's very important to practice the Linux commands on your own. Lab Environment Setup Install docker on your system - https://docs.docker.com/engine/install/ We will be running all the commands on a Red Hat Enterprise Linux (RHEL) 8 system. We will run most of the commands used in this module in the above Docker container. Multi-User Operating Systems An operating system is considered multi-user if it allows multiple people/users to use a computer and not affect each other's files and preferences. Linux based operating systems are multi-user in nature as they allow multiple users to access the system at the same time. A typical computer will only have one keyboard and monitor but multiple users can log in via SSH if the computer is connected to the network. We will cover more about SSH later. As server administrators, we are mostly concerned with Linux servers which are physically located far away from us. We can connect to these servers with the help of remote login methods like SSH. Since Linux supports multiple users, we need to have a method which can protect the users from each other. One user should not be able to access and modify files of other users. User/Group Management Each user in Linux has an associated user ID called the UID. Users also have a home directory and a login shell associated with them. A group is a collection of one or more users. 
A group makes it easier to share permissions among a group of users. Each group has a group ID called GID associated with it. id command id command can be used to find the uid and gid associated with a user. It also lists the groups to which the user belongs. The uid and gid associated with the root user are 0. A good way to find out the current user in Linux is to use the whoami command. \"root\" user or superuser is the most privileged user with unrestricted access to all the resources on the system. It has UID 0. Important files associated with users/groups /etc/passwd Stores the user name, the uid, the gid, the home directory, the login shell etc /etc/shadow Stores the password associated with the users /etc/group Stores information about different groups on the system If you want to understand each field discussed in the above outputs, you can go through below links: https://tldp.org/LDP/lame/LAME/linux-admin-made-easy/shadow-file-formats.html https://tldp.org/HOWTO/User-Authentication-HOWTO/x71.html Important commands for managing users Some of the commands which are used frequently to manage users/groups on Linux are following: useradd - Creates a new user passwd - Adds or modifies passwords for a user usermod - Modifies attributes of a user userdel - Deletes a user useradd The useradd command adds a new user in Linux. We will create a new user 'shivam'. We will also verify that the user has been created by tailing the /etc/passwd file. The uid and gid are 1000 for the newly created user. The home directory assigned to the user is /home/shivam and the login shell assigned is /bin/bash. Do note that the user home directory and login shell can be modified later on. If we do not specify any value for attributes like home directory or login shell, default values will be assigned to the user. We can also override these default values when creating a new user. passwd The passwd command is used to create or modify passwords for a user. 
In the above examples, we have not assigned any password for users 'shivam' or 'amit' while creating them. \"!!\" in an account entry in shadow means the account of a user has been created, but not yet given a password. Let's now try to create a password for user \"shivam\". Do remember the password, as later examples will need it. Also, let's change the password for the root user now. When we switch from a normal user to the root user, we will be asked for a password. The password will also be asked for when we log in as the root user. usermod The usermod command is used to modify the attributes of a user like the home directory or the shell. Let's try to modify the login shell of user \"amit\" to \"/bin/bash\". In a similar way, you can also modify many other attributes for a user. Try 'usermod -h' for a list of attributes you can modify. userdel The userdel command is used to remove a user on Linux. Once we remove a user, all the information related to that user will be removed. Let's try to delete the user \"amit\". After deleting the user, you will not find the entry for that user in \"/etc/passwd\" or \"/etc/shadow\" file. Important commands for managing groups Commands for managing groups are quite similar to the commands used for managing users. Each command is not explained in detail here as they are quite similar. You can try running these commands on your system. groupadd \\ Creates a new group groupmod \\ Modifies attributes of a group groupdel \\ Deletes a group gpasswd \\ Modifies password for group We will now try to add user \"shivam\" to the group we have created above. Becoming a Superuser Before running the below commands, do make sure that you have set up a password for user \"shivam\" and user \"root\" using the passwd command described in the above section. The su command can be used to switch users in Linux. Let's now try to switch to user \"shivam\". Let's now try to open the \"/etc/shadow\" file. 
The operating system didn't allow the user \"shivam\" to read the content of the \"/etc/shadow\" file. This is an important file in Linux which stores the passwords of users. This file can only be accessed by root or users who have the superuser privileges. The sudo command allows a user to run commands with the security privileges of the root user. Do remember that the root user has all the privileges on a system. We can also use su command to switch to the root user and open the above file but doing that will require the password of the root user. An alternative way which is preferred on most modern operating systems is to use sudo command for becoming a superuser. Using this way, a user has to enter his/her password and they need to be a part of the sudo group. How to provide superpriveleges to other users ? Let's first switch to the root user using su command. Do note that using the below command will need you to enter the password for the root user. In case, you forgot to set a password for the root user, type \"exit\" and you will be back as the root user. Now, set up a password using the passwd command. The file /etc/sudoers holds the names of users permitted to invoke sudo . In redhat operating systems, this file is not present by default. We will need to install sudo. We will discuss the yum command in detail in later sections. Try to open the \"/etc/sudoers\" file on the system. The file has a lot of information. This file stores the rules that users must follow when running the sudo command. For example, root is allowed to run any commands from anywhere. One easy way of providing root access to users is to add them to a group which has permissions to run all the commands. \"wheel\" is a group in redhat Linux with such privileges. Let's add the user \"shivam\" to this group so that it also has sudo privileges. Let's now switch back to user \"shivam\" and try to access the \"/etc/shadow\" file. 
We need to use sudo before running the command since it can only be accessed with the sudo privileges. We have already given sudo privileges to user \u201cshivam\u201d by adding him to the group \u201cwheel\u201d. File Permissions On a Linux operating system, each file and directory is assigned access permissions for the owner of the file, the members of a group of related users and everybody else. This is to make sure that one user is not allowed to access the files and resources of another user. To see the permissions of a file, we can use the ls command. Let's look at the permissions of /etc/passwd file. Let's go over some of the important fields in the output that are related to file permissions. Chmod command The chmod command is used to modify files and directories permissions in Linux. The chmod command accepts permissions in as a numerical argument. We can think of permission as a series of bits with 1 representing True or allowed and 0 representing False or not allowed. Permission rwx Binary Decimal Read, write and execute rwx 111 7 Read and write rw- 110 6 Read and execute r-x 101 5 Read only r-- 100 4 Write and execute -wx 011 3 Write only -w- 010 2 Execute only --x 001 1 None --- 000 0 We will now create a new file and check the permission of the file. The group owner doesn't have the permission to write to this file. Let's give the group owner or root the permission to write to it using chmod command. Chmod command can be also used to change the permissions of a directory in the similar way. Chown command The chown command is used to change the owner of files or directories in Linux. Command syntax: chown \\ \\ In case, we do not have sudo privileges, we need to use sudo command . Let's switch to user 'shivam' and try changing the owner. We have also changed the owner of the file to root before running the below command. Chown command can also be used to change the owner of a directory in the similar way. 
Chgrp command The chgrp command can be used to change the group ownership of files or directories in Linux. The syntax is very similar to that of chown command. Chgrp command can also be used to change the owner of a directory in the similar way. SSH Command The ssh command is used for logging into the remote systems, transfer files between systems and for executing commands on a remote machine. SSH stands for secure shell and is used to provide an encrypted secured connection between two hosts over an insecure network like the internet. Reference: https://www.ssh.com/ssh/command/ We will now discuss passwordless authentication which is secure and most commonly used for ssh authentication. Passwordless Authentication Using SSH Using this method, we can ssh into hosts without entering the password. This method is also useful when we want some scripts to perform ssh-related tasks. Passwordless authentication requires the use of a public and private key pair. As the name implies, the public key can be shared with anyone but the private key should be kept private. Lets not get into the details of how this authentication works. You can read more about it here Steps for setting up a passwordless authentication with a remote host: Generating public-private key pair If we already have a key pair stored in \\~/.ssh directory, we will not need to generate keys again. Install openssh package which contains all the commands related to ssh. Generate a key pair using the ssh-keygen command. One can choose the default values for all prompts. After running the ssh-keygen command successfully, we should see two keys present in the \\~/.ssh directory. Id_rsa is the private key and id_rsa.pub is the public key. Do note that the private key can only be read and modified by you. Transferring the public key to the remote host There are multiple ways to transfer the public key to the remote server. We will look at one of the most common ways of doing it using the ssh-copy-id command. 
Install the openssh-clients package to use the ssh-copy-id command. Use the ssh-copy-id command to copy your public key to the remote host. Now, ssh into the remote host using password authentication. Our public key should be there in \\~/.ssh/authorized_keys now. \\~/.ssh/authorized_keys contains a list of public keys. The users associated with these public keys have SSH access to the remote host. How to run commands on a remote host ? General syntax: ssh \\@\\ \\ How to transfer files from one host to another host ? General syntax: scp \\ \\ Package Management Package management is the process of installing and managing software on the system. We can install the packages which we require from the Linux package distributor. Different distributors use different packaging systems. Packaging systems Distributions Debian style (.deb) Debian, Ubuntu Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux Popular Packaging Systems in Linux Command Description yum install \\ Installs a package on your system yum update \\ Updates a package to its latest available version yum remove \\ Removes a package from your system yum search \\ Searches for a particular keyword DNF is the successor to YUM which is now used in Fedora for installing and managing packages. DNF may replace YUM in the future on all RPM based Linux distributions. We did find an exact match for the keyword httpd when we searched using the yum search command. Let's now install the httpd package. After httpd is installed, we will use the yum remove command to remove the httpd package. Process Management In this section, we will study some useful commands that can be used to monitor the processes on Linux systems. ps (process status) The ps command is used to display information about a process or a list of processes. If you get an error \"ps command not found\" while running the ps command, install the procps package. ps without any arguments is not very useful.
Let's try to list all the processes on the system by using the below command. Reference: https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux We can use an additional argument with ps command to list the information about the process with a specific process ID. We can use grep in combination with ps command to list only specific processes. top The top command is used to show information about Linux processes running on the system in real time. It also shows a summary of the system information. For each process, top lists down the process ID, owner, priority, state, cpu utilization, memory utilization and much more information. It also lists down the memory utilization and cpu utilization of the system as a whole along with system uptime and cpu load average. Memory Management In this section, we will study about some useful commands that can be used to view information about the system memory. free The free command is used to display the memory usage of the system. The command displays the total free and used space available in the RAM along with space occupied by the caches/buffers. free command by default shows the memory usage in kilobytes. We can use an additional argument to get the data in human-readable format. vmstat The vmstat command can be used to display the memory usage along with additional information about io and cpu usage. Checking Disk Space In this section, we will study about some useful commands that can be used to view disk space on Linux. df (disk free) The df command is used to display the free and available space for each mounted file system. du (disk usage) The du command is used to display disk usage of files and directories on the system. The below command can be used to display the top 5 largest directories in the root directory. Daemons A computer program that runs as a background process is called a daemon. Traditionally, the name of daemon processes ended with d - sshd, httpd etc. 
Users typically do not interact with a daemon directly, as it runs in the background. The terms service and daemon are used interchangeably most of the time. Systemd Systemd is a system and service manager for Linux operating systems. Systemd units are the building blocks of systemd. These units are represented by unit configuration files. The example below shows the unit configuration files available at /usr/lib/systemd/system which are distributed by installed RPM packages. We are more interested in the configuration files that end with .service as these are service units. Managing System Services Service units end with the .service file extension. The systemctl command can be used to start/stop/restart the services managed by systemd. Command Description systemctl start name.service Starts a service systemctl stop name.service Stops a service systemctl restart name.service Restarts a service systemctl status name.service Checks the status of a service systemctl reload name.service Reloads the configuration of a service Logs In this section, we will talk about some important files and directories which can be very useful for viewing system logs and application logs in Linux. These logs can be very useful when you are troubleshooting on the system.","title":"Server Administration"},{"location":"level101/linux_basics/linux_server_administration/#linux-server-administration","text":"In this course, we will try to cover some of the common tasks that a Linux server administrator performs. We will first try to understand what a particular command does and then try to understand the commands using examples. Do keep in mind that it's very important to practice the Linux commands on your own.","title":"Linux Server Administration"},{"location":"level101/linux_basics/linux_server_administration/#lab-environment-setup","text":"Install docker on your system - https://docs.docker.com/engine/install/ We will be running all the commands on a Red Hat Enterprise Linux (RHEL) 8 system.
We will run most of the commands used in this module in the above Docker container.","title":"Lab Environment Setup"},{"location":"level101/linux_basics/linux_server_administration/#multi-user-operating-systems","text":"An operating system is considered multi-user if it allows multiple people/users to use a computer without affecting each other's files and preferences. Linux based operating systems are multi-user in nature as they allow multiple users to access the system at the same time. A typical computer will only have one keyboard and monitor but multiple users can log in via SSH if the computer is connected to the network. We will cover more about SSH later. As a server administrator, we are mostly concerned with the Linux servers which are physically present at a very large distance from us. We can connect to these servers with the help of remote login methods like SSH. Since Linux supports multiple users, we need to have a method which can protect the users from each other. One user should not be able to access and modify files of other users.","title":"Multi-User Operating Systems"},{"location":"level101/linux_basics/linux_server_administration/#usergroup-management","text":"Each user in Linux has an associated user ID called UID. Each user also has a home directory and a login shell associated with them. A group is a collection of one or more users. A group makes it easier to share permissions among a group of users. Each group has a group ID called GID associated with it.","title":"User/Group Management"},{"location":"level101/linux_basics/linux_server_administration/#id-command","text":"The id command can be used to find the uid and gid associated with a user. It also lists the groups to which the user belongs. The uid and gid associated with the root user are 0. A good way to find out the current user in Linux is to use the whoami command.
\"root\" user or superuser is the most privileged user with unrestricted access to all the resources on the system. It has UID 0","title":"id command"},{"location":"level101/linux_basics/linux_server_administration/#important-files-associated-with-usersgroups","text":"/etc/passwd Stores the user name, the uid, the gid, the home directory, the login shell etc /etc/shadow Stores the password associated with the users /etc/group Stores information about different groups on the system If you want to understand each filed discussed in the above outputs, you can go through below links: https://tldp.org/LDP/lame/LAME/linux-admin-made-easy/shadow-file-formats.html https://tldp.org/HOWTO/User-Authentication-HOWTO/x71.html","title":"Important files associated with users/groups"},{"location":"level101/linux_basics/linux_server_administration/#important-commands-for-managing-users","text":"Some of the commands which are used frequently to manage users/groups on Linux are following: useradd - Creates a new user passwd - Adds or modifies passwords for a user usermod - Modifies attributes of an user userdel - Deletes an user","title":"Important commands for managing users"},{"location":"level101/linux_basics/linux_server_administration/#useradd","text":"The useradd command adds a new user in Linux. We will create a new user 'shivam'. We will also verify that the user has been created by tailing the /etc/passwd file. The uid and gid are 1000 for the newly created user. The home directory assigned to the user is /home/shivam and the login shell assigned is /bin/bash. Do note that the user home directory and login shell can be modified later on. If we do not specify any value for attributes like home directory or login shell, default values will be assigned to the user. 
We can also override these default values when creating a new user.","title":"useradd"},{"location":"level101/linux_basics/linux_server_administration/#passwd","text":"The passwd command is used to create or modify passwords for a user. In the above examples, we have not assigned any password for users 'shivam' or 'amit' while creating them. \"!!\" in an account entry in shadow means the account of a user has been created, but not yet given a password. Let's now try to create a password for user \"shivam\". Do remember the password as we will later be using examples where it will be useful. Also, let's change the password for the root user now. When we switch from a normal user to the root user, we will be asked for a password. Also, when you log in as the root user, the password will be asked for.","title":"passwd"},{"location":"level101/linux_basics/linux_server_administration/#usermod","text":"The usermod command is used to modify the attributes of a user like the home directory or the shell. Let's try to modify the login shell of user \"amit\" to \"/bin/bash\". In a similar way, you can also modify many other attributes for a user. Try 'usermod -h' for a list of attributes you can modify.","title":"usermod"},{"location":"level101/linux_basics/linux_server_administration/#userdel","text":"The userdel command is used to remove a user on Linux. Once we remove a user, all the information related to that user will be removed. Let's try to delete the user \"amit\". After deleting the user, you will not find the entry for that user in the \"/etc/passwd\" or \"/etc/shadow\" file.","title":"userdel"},{"location":"level101/linux_basics/linux_server_administration/#important-commands-for-managing-groups","text":"Commands for managing groups are quite similar to the commands used for managing users. Each command is not explained in detail here as they are quite similar. You can try running these commands on your system.
groupadd \\ Creates a new group groupmod \\ Modifies attributes of a group groupdel \\ Deletes a group gpasswd \\ Modifies password for group We will now try to add user \"shivam\" to the group we have created above.","title":"Important commands for managing groups"},{"location":"level101/linux_basics/linux_server_administration/#becoming-a-superuser","text":"Before running the below commands, do make sure that you have set up a password for user \"shivam\" and user \"root\" using the passwd command described in the above section. The su command can be used to switch users in Linux. Let's now try to switch to user \"shivam\". Let's now try to open the \"/etc/shadow\" file. The operating system didn't allow the user \"shivam\" to read the content of the \"/etc/shadow\" file. This is an important file in Linux which stores the passwords of users. This file can only be accessed by root or users who have the superuser privileges. The sudo command allows a user to run commands with the security privileges of the root user. Do remember that the root user has all the privileges on a system. We can also use su command to switch to the root user and open the above file but doing that will require the password of the root user. An alternative way which is preferred on most modern operating systems is to use sudo command for becoming a superuser. Using this way, a user has to enter his/her password and they need to be a part of the sudo group. How to provide superpriveleges to other users ? Let's first switch to the root user using su command. Do note that using the below command will need you to enter the password for the root user. In case, you forgot to set a password for the root user, type \"exit\" and you will be back as the root user. Now, set up a password using the passwd command. The file /etc/sudoers holds the names of users permitted to invoke sudo . In redhat operating systems, this file is not present by default. We will need to install sudo. 
We will discuss the yum command in detail in later sections. Try to open the \"/etc/sudoers\" file on the system. The file has a lot of information. This file stores the rules that users must follow when running the sudo command. For example, root is allowed to run any commands from anywhere. One easy way of providing root access to users is to add them to a group which has permissions to run all the commands. \"wheel\" is a group in redhat Linux with such privileges. Let's add the user \"shivam\" to this group so that it also has sudo privileges. Let's now switch back to user \"shivam\" and try to access the \"/etc/shadow\" file. We need to use sudo before running the command since it can only be accessed with the sudo privileges. We have already given sudo privileges to user \u201cshivam\u201d by adding him to the group \u201cwheel\u201d.","title":"Becoming a Superuser"},{"location":"level101/linux_basics/linux_server_administration/#file-permissions","text":"On a Linux operating system, each file and directory is assigned access permissions for the owner of the file, the members of a group of related users and everybody else. This is to make sure that one user is not allowed to access the files and resources of another user. To see the permissions of a file, we can use the ls command. Let's look at the permissions of /etc/passwd file. Let's go over some of the important fields in the output that are related to file permissions.","title":"File Permissions"},{"location":"level101/linux_basics/linux_server_administration/#chmod-command","text":"The chmod command is used to modify files and directories permissions in Linux. The chmod command accepts permissions in as a numerical argument. We can think of permission as a series of bits with 1 representing True or allowed and 0 representing False or not allowed. 
Permission rwx Binary Decimal Read, write and execute rwx 111 7 Read and write rw- 110 6 Read and execute r-x 101 5 Read only r-- 100 4 Write and execute -wx 011 3 Write only -w- 010 2 Execute only --x 001 1 None --- 000 0 We will now create a new file and check the permission of the file. The group owner doesn't have the permission to write to this file. Let's give the group owner or root the permission to write to it using the chmod command. The chmod command can also be used to change the permissions of a directory in a similar way.","title":"Chmod command"},{"location":"level101/linux_basics/linux_server_administration/#chown-command","text":"The chown command is used to change the owner of files or directories in Linux. Command syntax: chown \\ \\ If we are not logged in as the root user, we need to run the command with sudo. Let's switch to user 'shivam' and try changing the owner. We have also changed the owner of the file to root before running the below command. The chown command can also be used to change the owner of a directory in a similar way.","title":"Chown command"},{"location":"level101/linux_basics/linux_server_administration/#chgrp-command","text":"The chgrp command can be used to change the group ownership of files or directories in Linux. The syntax is very similar to that of the chown command. The chgrp command can also be used to change the group ownership of a directory in a similar way.","title":"Chgrp command"},{"location":"level101/linux_basics/linux_server_administration/#ssh-command","text":"The ssh command is used for logging into remote systems, transferring files between systems and executing commands on a remote machine. SSH stands for secure shell and is used to provide an encrypted, secure connection between two hosts over an insecure network like the internet.
Reference: https://www.ssh.com/ssh/command/ We will now discuss passwordless authentication which is secure and most commonly used for ssh authentication.","title":"SSH Command"},{"location":"level101/linux_basics/linux_server_administration/#passwordless-authentication-using-ssh","text":"Using this method, we can ssh into hosts without entering the password. This method is also useful when we want some scripts to perform ssh-related tasks. Passwordless authentication requires the use of a public and private key pair. As the name implies, the public key can be shared with anyone but the private key should be kept private. Let's not get into the details of how this authentication works. You can read more about it here Steps for setting up a passwordless authentication with a remote host: Generating public-private key pair If we already have a key pair stored in the \\~/.ssh directory, we will not need to generate keys again. Install the openssh package which contains all the commands related to ssh. Generate a key pair using the ssh-keygen command. One can choose the default values for all prompts. After running the ssh-keygen command successfully, we should see two keys present in the \\~/.ssh directory. id_rsa is the private key and id_rsa.pub is the public key. Do note that the private key can only be read and modified by you. Transferring the public key to the remote host There are multiple ways to transfer the public key to the remote server. We will look at one of the most common ways of doing it using the ssh-copy-id command. Install the openssh-clients package to use the ssh-copy-id command. Use the ssh-copy-id command to copy your public key to the remote host. Now, ssh into the remote host using password authentication. Our public key should be there in \\~/.ssh/authorized_keys now. \\~/.ssh/authorized_keys contains a list of public keys.
The users associated with these public keys have SSH access to the remote host.","title":"Passwordless Authentication Using SSH"},{"location":"level101/linux_basics/linux_server_administration/#how-to-run-commands-on-a-remote-host","text":"General syntax: ssh \\@\\ \\","title":"How to run commands on a remote host ?"},{"location":"level101/linux_basics/linux_server_administration/#how-to-transfer-files-from-one-host-to-another-host","text":"General syntax: scp \\ \\","title":"How to transfer files from one host to another host ?"},{"location":"level101/linux_basics/linux_server_administration/#package-management","text":"Package management is the process of installing and managing software on the system. We can install the packages which we require from the Linux package distributor. Different distributors use different packaging systems. Packaging systems Distributions Debian style (.deb) Debian, Ubuntu Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux Popular Packaging Systems in Linux Command Description yum install \\ Installs a package on your system yum update \\ Updates a package to its latest available version yum remove \\ Removes a package from your system yum search \\ Searches for a particular keyword DNF is the successor to YUM which is now used in Fedora for installing and managing packages. DNF may replace YUM in the future on all RPM based Linux distributions. We did find an exact match for the keyword httpd when we searched using the yum search command. Let's now install the httpd package.
After httpd is installed, we will use the yum remove command to remove httpd package.","title":"Package Management"},{"location":"level101/linux_basics/linux_server_administration/#process-management","text":"In this section, we will study about some useful commands that can be used to monitor the processes on Linux systems.","title":"Process Management"},{"location":"level101/linux_basics/linux_server_administration/#ps-process-status","text":"The ps command is used to know the information of a process or list of processes. If you get an error \"ps command not found\" while running ps command, do install procps package. ps without any arguments is not very useful. Let's try to list all the processes on the system by using the below command. Reference: https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux We can use an additional argument with ps command to list the information about the process with a specific process ID. We can use grep in combination with ps command to list only specific processes.","title":"ps (process status)"},{"location":"level101/linux_basics/linux_server_administration/#top","text":"The top command is used to show information about Linux processes running on the system in real time. It also shows a summary of the system information. For each process, top lists down the process ID, owner, priority, state, cpu utilization, memory utilization and much more information. It also lists down the memory utilization and cpu utilization of the system as a whole along with system uptime and cpu load average.","title":"top"},{"location":"level101/linux_basics/linux_server_administration/#memory-management","text":"In this section, we will study about some useful commands that can be used to view information about the system memory.","title":"Memory Management"},{"location":"level101/linux_basics/linux_server_administration/#free","text":"The free command is used to display the memory usage of the system. 
The command displays the total free and used space available in the RAM along with space occupied by the caches/buffers. free command by default shows the memory usage in kilobytes. We can use an additional argument to get the data in human-readable format.","title":"free"},{"location":"level101/linux_basics/linux_server_administration/#vmstat","text":"The vmstat command can be used to display the memory usage along with additional information about io and cpu usage.","title":"vmstat"},{"location":"level101/linux_basics/linux_server_administration/#checking-disk-space","text":"In this section, we will study about some useful commands that can be used to view disk space on Linux.","title":"Checking Disk Space"},{"location":"level101/linux_basics/linux_server_administration/#df-disk-free","text":"The df command is used to display the free and available space for each mounted file system.","title":"df (disk free)"},{"location":"level101/linux_basics/linux_server_administration/#du-disk-usage","text":"The du command is used to display disk usage of files and directories on the system. The below command can be used to display the top 5 largest directories in the root directory.","title":"du (disk usage)"},{"location":"level101/linux_basics/linux_server_administration/#daemons","text":"A computer program that runs as a background process is called a daemon. Traditionally, the name of daemon processes ended with d - sshd, httpd etc. We cannot interact with a daemon process as they run in the background. Services and daemons are used interchangeably most of the time.","title":"Daemons"},{"location":"level101/linux_basics/linux_server_administration/#systemd","text":"Systemd is a system and service manager for Linux operating systems. Systemd units are the building blocks of systemd. These units are represented by unit configuration files. 
The example below shows the unit configuration files available at /usr/lib/systemd/system which are distributed by installed RPM packages. We are more interested in the configuration files that end with .service as these are service units.","title":"Systemd"},{"location":"level101/linux_basics/linux_server_administration/#managing-system-services","text":"Service units end with the .service file extension. The systemctl command can be used to start/stop/restart the services managed by systemd. Command Description systemctl start name.service Starts a service systemctl stop name.service Stops a service systemctl restart name.service Restarts a service systemctl status name.service Checks the status of a service systemctl reload name.service Reloads the configuration of a service","title":"Managing System Services"},{"location":"level101/linux_basics/linux_server_administration/#logs","text":"In this section, we will talk about some important files and directories which can be very useful for viewing system logs and application logs in Linux. These logs can be very useful when you are troubleshooting on the system.","title":"Logs"},{"location":"level101/linux_networking/conclusion/","text":"Conclusion With this we have traversed through the TCP/IP stack completely. We hope there will be a different perspective when one opens any website in the browser after the course. During the course we have also dissected the common tasks in this pipeline which fall under the ambit of SRE. Post Training Exercises Set up your own DNS resolver in the dev environment which acts as an authoritative DNS server for example.com and a forwarder for other domains. Update resolv.conf to use the new DNS resolver running in localhost Set up a site dummy.example.com in localhost and run a webserver with a self signed certificate.
Update the trusted CAs or pass the self signed CA\u2019s public key as a parameter so that curl https://dummy.example.com -v works properly without a self signed cert warning. Update the routing table to use another host (container/VM) in the same network as a gateway for 8.8.8.8/32 and run ping 8.8.8.8. Do a packet capture on the new gateway to see that the L3 hop is working as expected (might need to disable icmp_redirect).","title":"Conclusion"},{"location":"level101/linux_networking/conclusion/#conclusion","text":"With this we have traversed through the TCP/IP stack completely. We hope there will be a different perspective when one opens any website in the browser post the course. During the course we have also dissected the common tasks in this pipeline which fall under the ambit of SRE.","title":"Conclusion"},{"location":"level101/linux_networking/conclusion/#post-training-exercises","text":"Set up your own DNS resolver in the dev environment which acts as an authoritative DNS server for example.com and a forwarder for other domains. Update resolv.conf to use the new DNS resolver running in localhost. Set up a site dummy.example.com in localhost and run a webserver with a self signed certificate. Update the trusted CAs or pass the self signed CA\u2019s public key as a parameter so that curl https://dummy.example.com -v works properly without a self signed cert warning. Update the routing table to use another host (container/VM) in the same network as a gateway for 8.8.8.8/32 and run ping 8.8.8.8. Do a packet capture on the new gateway to see that the L3 hop is working as expected (might need to disable icmp_redirect).","title":"Post Training Exercises"},{"location":"level101/linux_networking/dns/","text":"DNS Domain Names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. 
When somebody tries to open www.linkedin.com in the browser, the browser tries to convert www.linkedin.com to an IP Address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this ip, err = getIPAddress(domainName) if err: print(\u201cunknown Host Exception while trying to resolve:{}\u201d.format(domainName)) Now let\u2019s try to understand what happens inside the getIPAddress function. The browser would have a DNS cache of its own where it checks if there is a mapping for the domainName to an IP Address already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the gethostbyname library function to ask the operating system to find the IP address for the given domainName def getIPAddress(domainName): resp, fail = lookupCache(domainName) if not fail: return resp, null else: resp, err = gethostbyname(domainName) if err: return null, err else: return resp, null Now let\u2019s understand what the operating system does when the gethostbyname function is called. The Linux operating system looks at the file /etc/nsswitch.conf, which usually has a line hosts: files dns This line means the OS has to look up first in files (/etc/hosts) and then use the DNS protocol to do the resolution if there is no match in /etc/hosts. The file /etc/hosts is of format IPAddress FQDN [FQDN].* 127.0.0.1 localhost.localdomain localhost ::1 localhost.localdomain localhost If a match exists for a domain in this file then that IP address is returned by the OS. Let\u2019s add a line to this file 127.0.0.1 test.linkedin.com And then do ping test.linkedin.com ping test.linkedin.com -n PING test.linkedin.com (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.047 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.037 ms As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. 
The linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The DNS resolvers are populated by DHCP or statically configured by an administrator. Dig is a userspace DNS system which creates and sends request to DNS resolvers and prints the response it receives to the console. #run this command in one shell to capture all DNS requests sudo tcpdump -s 0 -A -i any port 53 #make a dig request from another shell dig linkedin.com 13:19:54.432507 IP 172.19.209.122.56497 > 172.23.195.101.53: 527+ [1au] A? linkedin.com. (41) ....E..E....@.n....z...e...5.1.:... .........linkedin.com.......)........ 13:19:54.485131 IP 172.23.195.101.53 > 172.19.209.122.56497: 527 1/0/1 A 108.174.10.10 (57) ....E..U..@.|. ....e...z.5...A...............linkedin.com..............3..l. ..)........ The packet capture shows a request is made to 172.23.195.101:53 (this is the resolver in /etc/resolv.conf) for linkedin.com and a response is received from 172.23.195.101 with the IP address of linkedin.com 108.174.10.10 Now let's try to understand how DNS resolver tries to find the IP address of linkedin.com. DNS resolver first looks at its cache. Since many devices in the network can query for the domain name linkedin.com, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS server breaks \u201clinkedin.com\u201d to \u201c.\u201d, \u201ccom.\u201d and \u201clinkedin.com.\u201d and starts DNS resolution from \u201c.\u201d. The \u201c.\u201d is called root domain and those IPs are known to the DNS resolver software. DNS resolver queries the root domain nameservers to find the right top-level domain (TLD) nameservers which could respond regarding details for \"com.\". The address of the TLD nameserver of \u201ccom.\u201d is returned. 
Now the DNS resolution service contacts the TLD nameserver for \u201ccom.\u201d to fetch the authoritative nameserver for \u201clinkedin.com\u201d. Once an authoritative nameserver of \u201clinkedin.com\u201d is known, the resolver contacts Linkedin\u2019s nameserver to provide the IP address of \u201clinkedin.com\u201d. This whole process can be visualized by running the following - dig +trace linkedin.com linkedin.com. 3600 IN A 108.174.10.10 This DNS response has 5 fields where the first field is the request and the last field is the response. The second field is the Time to Live which says how long the DNS response is valid in seconds. In this case this mapping of linkedin.com is valid for 1 hour. This is how the resolvers and application(browser) maintain their cache. Any request for linkedin.com beyond 1 hour will be treated as a cache miss as the mapping has expired its TTL and the whole process has to be redone. The 4th field says the type of DNS response/request. Some of the various DNS query types are A, AAAA, NS, TXT, PTR, MX and CNAME. A record returns IPV4 address of the domain name AAAA record returns the IPV6 address of the domain Name NS record returns the authoritative nameserver for the domain name CNAME records are aliases to the domain names. Some domains point to other domain names and resolving the latter domain name gives an IP which is used as an IP for the former domain name as well. Example www.linkedin.com\u2019s IP address is the same as 2-01-2c3e-005a.cdx.cedexis.net. For the brevity we are not discussing other DNS record types, the RFC of each of these records are available here . dig A linkedin.com +short 108.174.10.10 dig AAAA linkedin.com +short 2620:109:c002::6cae:a0a dig NS linkedin.com +short dns3.p09.nsone.net. dns4.p09.nsone.net. dns2.p09.nsone.net. ns4.p43.dynect.net. ns1.p43.dynect.net. ns2.p43.dynect.net. ns3.p43.dynect.net. dns1.p09.nsone.net. dig www.linkedin.com CNAME +short 2-01-2c3e-005a.cdx.cedexis.net. 
Armed with these fundamentals of DNS, let\u2019s see use cases where DNS is used by SREs. Applications in SRE role This section covers some of the common solutions SREs can derive from DNS. Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like wiki. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn\u2019t become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and other cascading effects. DNS can also be used for discovering services. For example the hostname serviceb.internal.example.com could list instances which run service b internally in example.com company. Cloud providers provide options to enable DNS discovery ( example ). DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, Load Balancers are given a CNAME instead of an IP address. They update the IP addresses of the load balancers as they scale by changing the IP addresses of the alias domain names. This is one of the reasons why A records of such alias domains are short lived, like 1 minute. DNS can also be used to make clients get IP addresses closer to their location so that their HTTP calls can be answered faster if the company has a geographically distributed presence. SREs also have to understand that, since there is no verification in plain DNS, responses can be spoofed. This is safeguarded by other protocols like HTTPS (dealt with later). DNSSEC protects from forged or manipulated DNS responses. Stale DNS cache can be a problem. Some apps might still be using expired DNS records for their API calls. This is something SREs have to be wary of when doing maintenance. 
DNS load balancing and service discovery also have to take TTL into account: servers can be removed from the pool only after waiting for the TTL to elapse once the changes are made to DNS records. If this is not done, a certain portion of the traffic will fail, as the server is removed before the TTL expires.","title":"DNS"},{"location":"level101/linux_networking/dns/#dns","text":"Domain Names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. When somebody tries to open www.linkedin.com in the browser, the browser tries to convert www.linkedin.com to an IP Address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this ip, err = getIPAddress(domainName) if err: print(\u201cunknown Host Exception while trying to resolve:{}\u201d.format(domainName)) Now let\u2019s try to understand what happens inside the getIPAddress function. The browser would have a DNS cache of its own where it checks if there is a mapping for the domainName to an IP Address already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the gethostbyname library function to ask the operating system to find the IP address for the given domainName def getIPAddress(domainName): resp, fail = lookupCache(domainName) if not fail: return resp, null else: resp, err = gethostbyname(domainName) if err: return null, err else: return resp, null Now let\u2019s understand what the operating system does when the gethostbyname function is called. The Linux operating system looks at the file /etc/nsswitch.conf, which usually has a line hosts: files dns This line means the OS has to look up first in files (/etc/hosts) and then use the DNS protocol to do the resolution if there is no match in /etc/hosts. 
The file /etc/hosts is of format IPAddress FQDN [FQDN].* 127.0.0.1 localhost.localdomain localhost ::1 localhost.localdomain localhost If a match exists for a domain in this file then that IP address is returned by the OS. Lets add a line to this file 127.0.0.1 test.linkedin.com And then do ping test.linkedin.com ping test.linkedin.com -n PING test.linkedin.com (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.047 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.037 ms As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. The linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The DNS resolvers are populated by DHCP or statically configured by an administrator. Dig is a userspace DNS system which creates and sends request to DNS resolvers and prints the response it receives to the console. #run this command in one shell to capture all DNS requests sudo tcpdump -s 0 -A -i any port 53 #make a dig request from another shell dig linkedin.com 13:19:54.432507 IP 172.19.209.122.56497 > 172.23.195.101.53: 527+ [1au] A? linkedin.com. (41) ....E..E....@.n....z...e...5.1.:... .........linkedin.com.......)........ 13:19:54.485131 IP 172.23.195.101.53 > 172.19.209.122.56497: 527 1/0/1 A 108.174.10.10 (57) ....E..U..@.|. ....e...z.5...A...............linkedin.com..............3..l. ..)........ The packet capture shows a request is made to 172.23.195.101:53 (this is the resolver in /etc/resolv.conf) for linkedin.com and a response is received from 172.23.195.101 with the IP address of linkedin.com 108.174.10.10 Now let's try to understand how DNS resolver tries to find the IP address of linkedin.com. DNS resolver first looks at its cache. 
Since many devices in the network can query for the domain name linkedin.com, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS server breaks \u201clinkedin.com\u201d to \u201c.\u201d, \u201ccom.\u201d and \u201clinkedin.com.\u201d and starts DNS resolution from \u201c.\u201d. The \u201c.\u201d is called root domain and those IPs are known to the DNS resolver software. DNS resolver queries the root domain nameservers to find the right top-level domain (TLD) nameservers which could respond regarding details for \"com.\". The address of the TLD nameserver of \u201ccom.\u201d is returned. Now the DNS resolution service contacts the TLD nameserver for \u201ccom.\u201d to fetch the authoritative nameserver for \u201clinkedin.com\u201d. Once an authoritative nameserver of \u201clinkedin.com\u201d is known, the resolver contacts Linkedin\u2019s nameserver to provide the IP address of \u201clinkedin.com\u201d. This whole process can be visualized by running the following - dig +trace linkedin.com linkedin.com. 3600 IN A 108.174.10.10 This DNS response has 5 fields where the first field is the request and the last field is the response. The second field is the Time to Live which says how long the DNS response is valid in seconds. In this case this mapping of linkedin.com is valid for 1 hour. This is how the resolvers and application(browser) maintain their cache. Any request for linkedin.com beyond 1 hour will be treated as a cache miss as the mapping has expired its TTL and the whole process has to be redone. The 4th field says the type of DNS response/request. Some of the various DNS query types are A, AAAA, NS, TXT, PTR, MX and CNAME. A record returns IPV4 address of the domain name AAAA record returns the IPV6 address of the domain Name NS record returns the authoritative nameserver for the domain name CNAME records are aliases to the domain names. 
Some domains point to other domain names and resolving the latter domain name gives an IP which is used as an IP for the former domain name as well. Example www.linkedin.com\u2019s IP address is the same as 2-01-2c3e-005a.cdx.cedexis.net. For the brevity we are not discussing other DNS record types, the RFC of each of these records are available here . dig A linkedin.com +short 108.174.10.10 dig AAAA linkedin.com +short 2620:109:c002::6cae:a0a dig NS linkedin.com +short dns3.p09.nsone.net. dns4.p09.nsone.net. dns2.p09.nsone.net. ns4.p43.dynect.net. ns1.p43.dynect.net. ns2.p43.dynect.net. ns3.p43.dynect.net. dns1.p09.nsone.net. dig www.linkedin.com CNAME +short 2-01-2c3e-005a.cdx.cedexis.net. Armed with these fundamentals of DNS lets see usecases where DNS is used by SREs.","title":"DNS"},{"location":"level101/linux_networking/dns/#applications-in-sre-role","text":"This section covers some of the common solutions SRE can derive from DNS Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like wiki. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn\u2019t become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and other cascading effects. DNS can also be used for discovering services. For example the hostname serviceb.internal.example.com could list instances which run service b internally in example.com company. Cloud providers provide options to enable DNS discovery( example ) DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, Load Balancers are given a CNAME instead of IPAddress. They update the IPAddress of the Loadbalancers as they scale by changing the IP Address of alias domain names. 
This is one of the reasons why A records of such alias domains are short lived, like 1 minute. DNS can also be used to make clients get IP addresses closer to their location so that their HTTP calls can be answered faster if the company has a geographically distributed presence. SREs also have to understand that, since there is no verification in plain DNS, responses can be spoofed. This is safeguarded by other protocols like HTTPS (dealt with later). DNSSEC protects from forged or manipulated DNS responses. Stale DNS cache can be a problem. Some apps might still be using expired DNS records for their API calls. This is something SREs have to be wary of when doing maintenance. DNS load balancing and service discovery also have to take TTL into account: servers can be removed from the pool only after waiting for the TTL to elapse once the changes are made to DNS records. If this is not done, a certain portion of the traffic will fail, as the server is removed before the TTL expires.","title":"Applications in SRE role"},{"location":"level101/linux_networking/http/","text":"HTTP Till this point we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served over the HTTP protocol and rendered by the browser. The browser sends an HTTP request to the IP of the server determined above. 
Request has a verb GET, PUT, POST followed by a path and query parameters and lines of key value pair which gives information about the client and capabilities of the client like contents it can accept and a body (usually in POST or PUT) # Eg run the following in your container and have a look at the headers curl linkedin.com -v * Connected to linkedin.com (108.174.10.10) port 80 (#0) > GET / HTTP/1.1 > Host: linkedin.com > User-Agent: curl/7.64.1 > Accept: */* > < HTTP/1.1 301 Moved Permanently < Date: Mon, 09 Nov 2020 10:39:43 GMT < X-Li-Pop: prod-esv5 < X-LI-Proto: http/1.1 < Location: https://www.linkedin.com/ < Content-Length: 0 < * Connection #0 to host linkedin.com left intact * Closing connection 0 Here, in the first line GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key value pairs which give client capabilities and some details to the server. The server responds back with HTTP version, Status Code and Status message . Status codes 2xx means success, 3xx denotes redirection, 4xx denotes client side errors and 5xx server side errors. We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1. #On the terminal type telnet www.linkedin.com 80 #Copy and paste the following with an empty new line at last in the telnet STDIN GET / HTTP/1.1 HOST:linkedin.com USER-AGENT: curl This would get server response and waits for next input as the underlying connection to www.linkedin.com can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0 this connection will be immediately closed after the response meaning new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection but connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is we can have multiple inflight requests on the same connection. 
We are restricting our scope to generic HTTP and not jumping into the intricacies of each protocol version, but they should be straightforward to understand after the course. HTTP is called a stateless protocol . In this section we will try to understand what stateless means. Say we logged in to linkedin.com; each request to linkedin.com from the client would carry no context of the user, and it makes no sense to prompt the user to log in for each page/resource. This problem of HTTP is solved by COOKIE . A session is created for the user when the user logs in. This session identifier is sent to the browser via the SET-COOKIE header. The browser stores the COOKIE till the expiry set by the server and sends the cookie with each request from here on for linkedin.com. More details on cookies are available here . Cookies are a critical piece of information, like passwords, and since HTTP is a plain text protocol, any man in the middle can capture either passwords or cookies and breach the privacy of the user. Similarly, as discussed during DNS, a spoofed IP of linkedin.com can cause a phishing attack where a user gives LinkedIn\u2019s password to log in on the malicious site. To solve both problems HTTPS came into place, and HTTPS has to be mandated. HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private/public key pair and a certificate request. This certificate request has to be signed by a certificate authority, which converts the certificate request into a certificate. The server administrator has to install the certificate and private key on the webserver. The certificate has details about the server (like the domain name it serves and the expiry date) and the public key of the server. The private key is a secret of the server, and losing the private key loses the trust the server provides. When a client connects, it sends a HELLO. The server sends its certificate to the client. 
The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetrical key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody else other than the server who has the private key can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption which can be decrypted only by client and server from thereon as they only know the symmetric key and algorithm. The switch to symmetric algorithm from asymmetric encryption algorithm is to not strain the resources of client devices as symmetric encryption is generally less resource intensive than asymmetric. #Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date curl https://www.linkedin.com -v * Connected to www.linkedin.com (13.107.42.14) port 443 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/ssl/cert.pem CApath: none * TLSv1.2 (OUT), TLS handshake, Client hello (1): } [230 bytes data] * TLSv1.2 (IN), TLS handshake, Server hello (2): { [90 bytes data] * TLSv1.2 (IN), TLS handshake, Certificate (11): { [3171 bytes data] * TLSv1.2 (IN), TLS handshake, Server key exchange (12): { [365 bytes data] * TLSv1.2 (IN), TLS handshake, Server finished (14): { [4 bytes data] * TLSv1.2 (OUT), TLS handshake, Client key exchange (16): } [102 bytes data] * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.2 (OUT), TLS handshake, Finished (20): } [16 bytes data] * TLSv1.2 (IN), TLS change cipher, Change cipher spec (1): { [1 bytes data] * TLSv1.2 (IN), TLS handshake, Finished (20): { [16 bytes data] * SSL connection using 
TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 * ALPN, server accepted to use h2 * Server certificate: * subject: C=US; ST=California; L=Sunnyvale; O=LinkedIn Corporation; CN=www.linkedin.com * start date: Oct 2 00:00:00 2020 GMT * expire date: Apr 2 12:00:00 2021 GMT * subjectAltName: host \"www.linkedin.com\" matched cert's \"www.linkedin.com\" * issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA * SSL certificate verify ok. * Using HTTP2, server supports multi-use * Connection state changed (HTTP/2 confirmed) * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 * Using Stream ID: 1 (easy handle 0x7fb055808200) * Connection state changed (MAX_CONCURRENT_STREAMS == 100)! 0 82117 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 * Connection #0 to host www.linkedin.com left intact HTTP/2 200 cache-control: no-cache, no-store pragma: no-cache content-length: 82117 content-type: text/html; charset=utf-8 expires: Thu, 01 Jan 1970 00:00:00 GMT set-cookie: JSESSIONID=ajax:2747059799136291014; SameSite=None; Path=/; Domain=.www.linkedin.com; Secure set-cookie: lang=v=2&lang=en-us; SameSite=None; Path=/; Domain=linkedin.com; Secure set-cookie: bcookie=\"v=2&70bd59e3-5a51-406c-8e0d-dd70befa8890\"; domain=.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; SameSite=None set-cookie: bscookie=\"v=1&202011091050107ae9b7ac-fe97-40fc-830d-d7a9ccf80659AQGib5iXwarbY8CCBP94Q39THkgUlx6J\"; domain=.www.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; HttpOnly; SameSite=None set-cookie: lissc=1; domain=.linkedin.com; Path=/; Secure; Expires=Tue, 09-Nov-2021 10:50:10 GMT; SameSite=None set-cookie: lidc=\"b=VGST04:s=V:r=V:g=2201:u=1:i=1604919010:t=1605005410:v=1:sig=AQHe-KzU8i_5Iy6MwnFEsgRct3c9Lh5R\"; Expires=Tue, 10 Nov 2020 10:50:10 GMT; domain=.linkedin.com; Path=/; SameSite=None; Secure x-fs-txn-id: 2b8d5409ba70 x-fs-uuid: 61bbf94956d14516302567fc882b0000 expect-ct: max-age=86400, 
report-uri=\"https://www.linkedin.com/platform-telemetry/ct\" x-xss-protection: 1; mode=block content-security-policy-report-only: default-src 'none'; connect-src 'self' www.linkedin.com www.google-analytics.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://linkedin.sc.omtrdc.net/b/ss/ static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; script-src 'sha256-THuVhwbXPeTR0HszASqMOnIyxqEgvGyBwSPBKBF/iMc=' 'sha256-PyCXNcEkzRWqbiNr087fizmiBBrq9O6GGD8eV3P09Ik=' 'sha256-2SQ55Erm3CPCb+k03EpNxU9bdV3XL9TnVTriDs7INZ4=' 'sha256-S/KSPe186K/1B0JEjbIXcCdpB97krdzX05S+dHnQjUs=' platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'self' 'unsafe-inline' static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; media-src dms.licdn.com; child-src blob: *; frame-src 'self' lnkd.demdex.net linkedin.cdn.qualaroo.com; manifest-src 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=g content-security-policy: default-src *; connect-src 'self' https://media-src.linkedin.com/media/ www.linkedin.com s.c.lnkd.licdn.com m.c.lnkd.licdn.com s.c.exp1.licdn.com s.c.exp2.licdn.com m.c.exp1.licdn.com m.c.exp2.licdn.com wss://*.linkedin.com dms.licdn.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://accounts.google.com/gsi/status https://linkedin.sc.omtrdc.net/b/ss/ www.google-analytics.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com media.licdn.com media-exp1.licdn.com media-exp2.licdn.com media-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'unsafe-inline' 'self' static-src.linkedin.com *.licdn.com; script-src 'report-sample' 'unsafe-inline' 'unsafe-eval' 'self' spdy.linkedin.com static-src.linkedin.com *.ads.linkedin.com *.licdn.com static.chartbeat.com 
www.google-analytics.com ssl.google-analytics.com bcvipva02.rightnowtech.com www.bizographics.com sjs.bizographics.com js.bizographics.com d.la4-c1-was.salesforceliveagent.com slideshare.www.linkedin.com https://snap.licdn.com/li.lms-analytics/ platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com; object-src 'none'; media-src blob: *; child-src blob: lnkd-communities: voyager: *; frame-ancestors 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=l x-frame-options: sameorigin x-content-type-options: nosniff strict-transport-security: max-age=2592000 x-li-fabric: prod-lva1 x-li-pop: afd-prod-lva1 x-li-proto: http/2 x-li-uuid: Ybv5SVbRRRYwJWf8iCsAAA== x-msedge-ref: Ref A: CFB9AC1D2B0645DDB161CEE4A4909AEF Ref B: BOM02EDGE0712 Ref C: 2020-11-09T10:50:10Z date: Mon, 09 Nov 2020 10:50:10 GMT * Closing connection 0 Here my system has a list of certificate authorities it trusts in this file /etc/ssl/cert.pem. Curl validates the certificate is for www.linkedin.com by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by seeing the expire date. It also validates the signature on the certificate by using the public key of issuer Digicert in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com it negotiates cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer including first HTTP request uses the same cipher and symmetric key.","title":"HTTP"},{"location":"level101/linux_networking/http/#http","text":"Till this point we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served by HTTP protocol which the browser renders. Browser sends a HTTP request to the IP of the server determined above. 
Request has a verb GET, PUT, POST followed by a path and query parameters and lines of key value pair which gives information about the client and capabilities of the client like contents it can accept and a body (usually in POST or PUT) # Eg run the following in your container and have a look at the headers curl linkedin.com -v * Connected to linkedin.com (108.174.10.10) port 80 (#0) > GET / HTTP/1.1 > Host: linkedin.com > User-Agent: curl/7.64.1 > Accept: */* > < HTTP/1.1 301 Moved Permanently < Date: Mon, 09 Nov 2020 10:39:43 GMT < X-Li-Pop: prod-esv5 < X-LI-Proto: http/1.1 < Location: https://www.linkedin.com/ < Content-Length: 0 < * Connection #0 to host linkedin.com left intact * Closing connection 0 Here, in the first line GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key value pairs which give client capabilities and some details to the server. The server responds back with HTTP version, Status Code and Status message . Status codes 2xx means success, 3xx denotes redirection, 4xx denotes client side errors and 5xx server side errors. We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1. #On the terminal type telnet www.linkedin.com 80 #Copy and paste the following with an empty new line at last in the telnet STDIN GET / HTTP/1.1 HOST:linkedin.com USER-AGENT: curl This would get server response and waits for next input as the underlying connection to www.linkedin.com can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0 this connection will be immediately closed after the response meaning new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection but connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is we can have multiple inflight requests on the same connection. 
We are restricting our scope to generic HTTP and not jumping to the intricacies of each protocol version but they should be straight forward to understand post the course. HTTP is called stateless protocol . This section we will try to understand what stateless means. Say we logged in to linkedin.com, each request to linkedin.com from the client will have no context of the user and it makes no sense to prompt user to login for each page/resource. This problem of HTTP is solved by COOKIE . A user is created a session when a user logs in. This session identifier is sent to the browser via SET-COOKIE header. The browser stores the COOKIE till the expiry set by the server and sends the cookie for each request from hereon for linkedin.com. More details on cookies are available here . Cookies are a critical piece of information like password and since HTTP is a plain text protocol, any man in the middle can capture either password or cookies and can breach the privacy of the user. Similarly as discussed during DNS a spoofed IP of linkedin.com can cause a phishing attack on users where an user can give linkedin\u2019s password to login on the malicious site. To solve both problems HTTPs came in place and HTTPs has to be mandated. HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private public key pair and certificate request. This certificate request has to be signed by a certificate authority which converts the certificate request to a certificate. The server administrator has to update the certificate and private key to the webserver. The certificate has details about the server (like domain name for which it serves, expiry date), public key of the server. The private key is a secret to the server and losing the private key loses the trust the server provides. When clients connect, the client sends a HELLO. The server sends its certificate to the client. 
The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetrical key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody else other than the server who has the private key can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption which can be decrypted only by client and server from thereon as they only know the symmetric key and algorithm. The switch to symmetric algorithm from asymmetric encryption algorithm is to not strain the resources of client devices as symmetric encryption is generally less resource intensive than asymmetric. #Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date curl https://www.linkedin.com -v * Connected to www.linkedin.com (13.107.42.14) port 443 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/ssl/cert.pem CApath: none * TLSv1.2 (OUT), TLS handshake, Client hello (1): } [230 bytes data] * TLSv1.2 (IN), TLS handshake, Server hello (2): { [90 bytes data] * TLSv1.2 (IN), TLS handshake, Certificate (11): { [3171 bytes data] * TLSv1.2 (IN), TLS handshake, Server key exchange (12): { [365 bytes data] * TLSv1.2 (IN), TLS handshake, Server finished (14): { [4 bytes data] * TLSv1.2 (OUT), TLS handshake, Client key exchange (16): } [102 bytes data] * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.2 (OUT), TLS handshake, Finished (20): } [16 bytes data] * TLSv1.2 (IN), TLS change cipher, Change cipher spec (1): { [1 bytes data] * TLSv1.2 (IN), TLS handshake, Finished (20): { [16 bytes data] * SSL connection using 
TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 * ALPN, server accepted to use h2 * Server certificate: * subject: C=US; ST=California; L=Sunnyvale; O=LinkedIn Corporation; CN=www.linkedin.com * start date: Oct 2 00:00:00 2020 GMT * expire date: Apr 2 12:00:00 2021 GMT * subjectAltName: host \"www.linkedin.com\" matched cert's \"www.linkedin.com\" * issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA * SSL certificate verify ok. * Using HTTP2, server supports multi-use * Connection state changed (HTTP/2 confirmed) * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 * Using Stream ID: 1 (easy handle 0x7fb055808200) * Connection state changed (MAX_CONCURRENT_STREAMS == 100)! 0 82117 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 * Connection #0 to host www.linkedin.com left intact HTTP/2 200 cache-control: no-cache, no-store pragma: no-cache content-length: 82117 content-type: text/html; charset=utf-8 expires: Thu, 01 Jan 1970 00:00:00 GMT set-cookie: JSESSIONID=ajax:2747059799136291014; SameSite=None; Path=/; Domain=.www.linkedin.com; Secure set-cookie: lang=v=2&lang=en-us; SameSite=None; Path=/; Domain=linkedin.com; Secure set-cookie: bcookie=\"v=2&70bd59e3-5a51-406c-8e0d-dd70befa8890\"; domain=.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; SameSite=None set-cookie: bscookie=\"v=1&202011091050107ae9b7ac-fe97-40fc-830d-d7a9ccf80659AQGib5iXwarbY8CCBP94Q39THkgUlx6J\"; domain=.www.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; HttpOnly; SameSite=None set-cookie: lissc=1; domain=.linkedin.com; Path=/; Secure; Expires=Tue, 09-Nov-2021 10:50:10 GMT; SameSite=None set-cookie: lidc=\"b=VGST04:s=V:r=V:g=2201:u=1:i=1604919010:t=1605005410:v=1:sig=AQHe-KzU8i_5Iy6MwnFEsgRct3c9Lh5R\"; Expires=Tue, 10 Nov 2020 10:50:10 GMT; domain=.linkedin.com; Path=/; SameSite=None; Secure x-fs-txn-id: 2b8d5409ba70 x-fs-uuid: 61bbf94956d14516302567fc882b0000 expect-ct: max-age=86400, 
report-uri=\"https://www.linkedin.com/platform-telemetry/ct\" x-xss-protection: 1; mode=block content-security-policy-report-only: default-src 'none'; connect-src 'self' www.linkedin.com www.google-analytics.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://linkedin.sc.omtrdc.net/b/ss/ static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; script-src 'sha256-THuVhwbXPeTR0HszASqMOnIyxqEgvGyBwSPBKBF/iMc=' 'sha256-PyCXNcEkzRWqbiNr087fizmiBBrq9O6GGD8eV3P09Ik=' 'sha256-2SQ55Erm3CPCb+k03EpNxU9bdV3XL9TnVTriDs7INZ4=' 'sha256-S/KSPe186K/1B0JEjbIXcCdpB97krdzX05S+dHnQjUs=' platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'self' 'unsafe-inline' static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; media-src dms.licdn.com; child-src blob: *; frame-src 'self' lnkd.demdex.net linkedin.cdn.qualaroo.com; manifest-src 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=g content-security-policy: default-src *; connect-src 'self' https://media-src.linkedin.com/media/ www.linkedin.com s.c.lnkd.licdn.com m.c.lnkd.licdn.com s.c.exp1.licdn.com s.c.exp2.licdn.com m.c.exp1.licdn.com m.c.exp2.licdn.com wss://*.linkedin.com dms.licdn.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://accounts.google.com/gsi/status https://linkedin.sc.omtrdc.net/b/ss/ www.google-analytics.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com media.licdn.com media-exp1.licdn.com media-exp2.licdn.com media-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'unsafe-inline' 'self' static-src.linkedin.com *.licdn.com; script-src 'report-sample' 'unsafe-inline' 'unsafe-eval' 'self' spdy.linkedin.com static-src.linkedin.com *.ads.linkedin.com *.licdn.com static.chartbeat.com 
www.google-analytics.com ssl.google-analytics.com bcvipva02.rightnowtech.com www.bizographics.com sjs.bizographics.com js.bizographics.com d.la4-c1-was.salesforceliveagent.com slideshare.www.linkedin.com https://snap.licdn.com/li.lms-analytics/ platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com; object-src 'none'; media-src blob: *; child-src blob: lnkd-communities: voyager: *; frame-ancestors 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=l x-frame-options: sameorigin x-content-type-options: nosniff strict-transport-security: max-age=2592000 x-li-fabric: prod-lva1 x-li-pop: afd-prod-lva1 x-li-proto: http/2 x-li-uuid: Ybv5SVbRRRYwJWf8iCsAAA== x-msedge-ref: Ref A: CFB9AC1D2B0645DDB161CEE4A4909AEF Ref B: BOM02EDGE0712 Ref C: 2020-11-09T10:50:10Z date: Mon, 09 Nov 2020 10:50:10 GMT * Closing connection 0 Here my system has a list of certificate authorities it trusts in this file /etc/ssl/cert.pem. Curl validates the certificate is for www.linkedin.com by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by seeing the expire date. It also validates the signature on the certificate by using the public key of issuer Digicert in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com it negotiates cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer including first HTTP request uses the same cipher and symmetric key.","title":"HTTP"},{"location":"level101/linux_networking/intro/","text":"Linux Networking Fundamentals Prerequisites High-level knowledge of commonly used jargon in TCP/IP stack like DNS, TCP, UDP and HTTP Linux Commandline Basics What to expect from this course Throughout the course, we cover how an SRE can optimize the system to improve their web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. 
This course tries to dig through each layer of traditional TCP/IP stack and expects an SRE to have a picture beyond the bird\u2019s eye view of the functioning of the Internet. What is not covered under this course This course spends time on the fundamentals. We are not covering concepts like HTTP/2.0 , QUIC , TCP congestion control protocols , Anycast , BGP , CDN , Tunnels and Multicast . We expect that this course will provide the relevant basics to understand such concepts Birds eye view of the course The course covers the question \u201cWhat happens when you open linkedin.com in your browser?\u201d The course follows the flow of TCP/IP stack.More specifically, the course covers topics of Application layer protocols DNS and HTTP, transport layer protocols UDP and TCP, networking layer protocol IP and Data Link Layer protocol Course Contents DNS UDP HTTP TCP IP Routing","title":"Introduction"},{"location":"level101/linux_networking/intro/#linux-networking-fundamentals","text":"","title":"Linux Networking Fundamentals"},{"location":"level101/linux_networking/intro/#prerequisites","text":"High-level knowledge of commonly used jargon in TCP/IP stack like DNS, TCP, UDP and HTTP Linux Commandline Basics","title":"Prerequisites"},{"location":"level101/linux_networking/intro/#what-to-expect-from-this-course","text":"Throughout the course, we cover how an SRE can optimize the system to improve their web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. This course tries to dig through each layer of traditional TCP/IP stack and expects an SRE to have a picture beyond the bird\u2019s eye view of the functioning of the Internet.","title":"What to expect from this course"},{"location":"level101/linux_networking/intro/#what-is-not-covered-under-this-course","text":"This course spends time on the fundamentals. 
We are not covering concepts like HTTP/2.0 , QUIC , TCP congestion control protocols , Anycast , BGP , CDN , Tunnels and Multicast . We expect that this course will provide the relevant basics to understand such concepts","title":"What is not covered under this course"},{"location":"level101/linux_networking/intro/#birds-eye-view-of-the-course","text":"The course covers the question \u201cWhat happens when you open linkedin.com in your browser?\u201d The course follows the flow of TCP/IP stack.More specifically, the course covers topics of Application layer protocols DNS and HTTP, transport layer protocols UDP and TCP, networking layer protocol IP and Data Link Layer protocol","title":"Birds eye view of the course"},{"location":"level101/linux_networking/intro/#course-contents","text":"DNS UDP HTTP TCP IP Routing","title":"Course Contents"},{"location":"level101/linux_networking/ipr/","text":"IP Routing and Data Link Layer We will dig into how packets that leave the client reach the server and vice versa. When the packet reaches the IP layer, the transport layer populates source port, destination port. IP/Network layer populates destination IP(discovered from DNS) and then looks up the route to the destination IP on the routing table. #Linux route -n command gives the default routing table route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.17.0.1 0.0.0.0 UG 0 0 0 eth0 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 Here the destination IP is bitwise AND\u2019d with the Genmask and if the answer is the destination part of the table then that gateway and interface is picked for routing. Here linkedin.com\u2019s IP 108.174.10.10 is AND\u2019d with 255.255.0.0 and the answer we get is 108.174.0.0, which doesn\u2019t match the destination 172.17.0.0 in the routing table. Then Linux does an AND of destination IP with 0.0.0.0 and we get 0.0.0.0. 
This answer matches the default row Routing table is processed in the order of more octets of 1 set in genmask and genmask 0.0.0.0 is the default route if nothing matches. At the end of this operation Linux figured out that the packet has to be sent to next hop 172.17.0.1 via eth0. The source IP of the packet will be set as the IP of interface eth0. Now to send the packet to 172.17.0.1 linux has to figure out the MAC address of 172.17.0.1. MAC address is figured by looking at the internal arp cache which stores translation between IP address and MAC address. If there is a cache miss, Linux broadcasts ARP request within the internal network asking who has 172.17.0.1. The owner of the IP sends an ARP response which is cached by the kernel and the kernel sends the packet to the gateway by setting Source mac address as mac address of eth0 and destination mac address of 172.17.0.1 which we got just now. Similar routing lookup process is followed in each hop till the packet reaches the actual server. Transport layer and layers above it come to play only at end servers. During intermediate hops only till the IP/Network layer is involved. One weird gateway we saw in the routing table is 0.0.0.0. This gateway means no Layer3(Network layer) hop is needed to send the packet. Both source and destination are in the same network. Kernel has to figure out the mac of the destination and populate source and destination mac appropriately and send the packet out so that it reaches the destination without any Layer3 hop in the middle As we followed in other modules, lets complete this session with SRE usecases Applications in SRE role Generally the routing table is populated by DHCP and playing around is not a good practice. 
There can be reasons where one has to play around the routing table but take that path only when it's absolutely necessary Understanding error messages better like, \u201cNo route to host\u201d error can mean mac address of the destination host is not found and it can mean the destination host is down On rare cases looking at the ARP table can help us understand if there is a IP conflict where same IP is assigned to two hosts by mistake and this is causing unexpected behavior","title":"Routing"},{"location":"level101/linux_networking/ipr/#ip-routing-and-data-link-layer","text":"We will dig into how packets that leave the client reach the server and vice versa. When the packet reaches the IP layer, the transport layer populates source port, destination port. IP/Network layer populates destination IP(discovered from DNS) and then looks up the route to the destination IP on the routing table. #Linux route -n command gives the default routing table route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.17.0.1 0.0.0.0 UG 0 0 0 eth0 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 Here the destination IP is bitwise AND\u2019d with the Genmask and if the answer is the destination part of the table then that gateway and interface is picked for routing. Here linkedin.com\u2019s IP 108.174.10.10 is AND\u2019d with 255.255.0.0 and the answer we get is 108.174.0.0, which doesn\u2019t match the destination 172.17.0.0 in the routing table. Then Linux does an AND of destination IP with 0.0.0.0 and we get 0.0.0.0. This answer matches the default row Routing table is processed in the order of more octets of 1 set in genmask and genmask 0.0.0.0 is the default route if nothing matches. At the end of this operation Linux figured out that the packet has to be sent to next hop 172.17.0.1 via eth0. The source IP of the packet will be set as the IP of interface eth0. Now to send the packet to 172.17.0.1 linux has to figure out the MAC address of 172.17.0.1. 
MAC address is figured by looking at the internal arp cache which stores translation between IP address and MAC address. If there is a cache miss, Linux broadcasts ARP request within the internal network asking who has 172.17.0.1. The owner of the IP sends an ARP response which is cached by the kernel and the kernel sends the packet to the gateway by setting Source mac address as mac address of eth0 and destination mac address of 172.17.0.1 which we got just now. Similar routing lookup process is followed in each hop till the packet reaches the actual server. Transport layer and layers above it come to play only at end servers. During intermediate hops only till the IP/Network layer is involved. One weird gateway we saw in the routing table is 0.0.0.0. This gateway means no Layer3(Network layer) hop is needed to send the packet. Both source and destination are in the same network. Kernel has to figure out the mac of the destination and populate source and destination mac appropriately and send the packet out so that it reaches the destination without any Layer3 hop in the middle As we followed in other modules, lets complete this session with SRE usecases","title":"IP Routing and Data Link Layer"},{"location":"level101/linux_networking/ipr/#applications-in-sre-role","text":"Generally the routing table is populated by DHCP and playing around is not a good practice. 
There can be reasons where one has to play around the routing table but take that path only when it's absolutely necessary Understanding error messages better like, \u201cNo route to host\u201d error can mean mac address of the destination host is not found and it can mean the destination host is down On rare cases looking at the ARP table can help us understand if there is a IP conflict where same IP is assigned to two hosts by mistake and this is causing unexpected behavior","title":"Applications in SRE role"},{"location":"level101/linux_networking/tcp/","text":"TCP TCP is a transport layer protocol like UDP but it guarantees reliability, flow control and congestion control. TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three way handshake. In our case, the client sends a SYN packet along with the starting sequence number it plans to use, the server acknowledges the SYN packet and sends a SYN with its sequence number. Once the client acknowledges the syn packet, the connection is established. Each data transferred from here on is considered delivered reliably once acknowledgement for that sequence is received by the concerned party #To understand handshake run packet capture on one bash session tcpdump -S -i any port 80 #Run curl on one bash session curl www.linkedin.com Here client sends a syn flag shown by [S] flag with a sequence number 1522264672. The server acknowledges receipt of SYN with an ack [.] flag and a Syn flag for its sequence number[S]. The server uses the sequence number 1063230400 and acknowledges the client it\u2019s expecting sequence number 1522264673 (client sequence+1). Client sends a zero length acknowledgement packet to the server(server sequence+1) and connection stands established. This is called three way handshake. The client sends a 76 bytes length packet after this and increments its sequence number by 76. Server sends a 170 byte response and closes the connection. 
This was the difference we were talking about between HTTP/1.1 and HTTP/1.0. In HTTP/1.1 this same connection can be reused which reduces overhead of 3 way handshake for each HTTP request. If a packet is missed between client and server, server won\u2019t send an ack to the client and client would retry sending the packet till the ACK is received. This guarantees reliability. The flow control is established by the win size field in each segment. The win size says available TCP buffer length in the kernel which can be used to buffer received segments. A size 0 means the receiver has a lot of lag to catch from its socket buffer and the sender has to pause sending packets so that receiver can cope up. This flow control protects from slow receiver and fast sender problem TCP also does congestion control which determines how many segments can be in transit without an ack. Linux provides us the ability to configure algorithms for congestion control which we are not covering here. While closing a connection, client/server calls a close syscall. Let's assume the client does that. Client\u2019s kernel will send a FIN packet to the server. Server\u2019s kernel can\u2019t close the connection till the close syscall is called by the server application. Once server app calls close, server also sends a FIN packet and client enters into time wait state for 2*MSL (Maximum Segment Lifetime; 60s on Linux) so that this socket can\u2019t be reused for that time period to prevent any TCP state corruptions due to stray stale packets. Armed with our TCP and HTTP knowledge lets see how this is used by SREs in their role Applications in SRE role Scaling HTTP performance using load balancers need consistent knowledge about both TCP and HTTP. There are different kinds of load balancing like L4, L7 load balancing, Direct Server Return etc. HTTPs offloading can be done on Load balancer or directly on servers based on the performance and compliance needs. 
Tweaking sysctl variables for rmem and wmem like we did for UDP can improve throughput of sender and receiver. Sysctl variable tcp_max_syn_backlog and socket variable somaxconn determine how many connections for which the kernel can complete 3 way handshake before app calling accept syscall. This is especially useful in single threaded applications. Once the backlog is full, new connections stay in SYN_RCVD state (when you run netstat) till the application calls accept syscall Apps can run out of file descriptors if there are too many short lived connections. Digging through tcp_tw_reuse and tcp_tw_recycle (the latter was removed in Linux 4.12) can help reduce time spent in the time wait state(it has its own risk). Making apps reuse a pool of connections instead of creating ad hoc connection can also help Understanding performance bottlenecks by seeing metrics and classifying whether it's a problem in App or network side. For example, too many sockets in CLOSE_WAIT state is a problem on application whereas retransmissions can be a problem more on network or on OS stack than the application itself. Understanding the fundamentals can help us narrow down where the bottleneck is","title":"TCP"},{"location":"level101/linux_networking/tcp/#tcp","text":"TCP is a transport layer protocol like UDP but it guarantees reliability, flow control and congestion control. TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three way handshake. In our case, the client sends a SYN packet along with the starting sequence number it plans to use, the server acknowledges the SYN packet and sends a SYN with its sequence number. Once the client acknowledges the syn packet, the connection is established. 
Each data transferred from here on is considered delivered reliably once acknowledgement for that sequence is received by the concerned party #To understand handshake run packet capture on one bash session tcpdump -S -i any port 80 #Run curl on one bash session curl www.linkedin.com Here client sends a syn flag shown by [S] flag with a sequence number 1522264672. The server acknowledges receipt of SYN with an ack [.] flag and a Syn flag for its sequence number[S]. The server uses the sequence number 1063230400 and acknowledges the client it\u2019s expecting sequence number 1522264673 (client sequence+1). Client sends a zero length acknowledgement packet to the server(server sequence+1) and connection stands established. This is called three way handshake. The client sends a 76 bytes length packet after this and increments its sequence number by 76. Server sends a 170 byte response and closes the connection. This was the difference we were talking about between HTTP/1.1 and HTTP/1.0. In HTTP/1.1 this same connection can be reused which reduces overhead of 3 way handshake for each HTTP request. If a packet is missed between client and server, server won\u2019t send an ack to the client and client would retry sending the packet till the ACK is received. This guarantees reliability. The flow control is established by the win size field in each segment. The win size says available TCP buffer length in the kernel which can be used to buffer received segments. A size 0 means the receiver has a lot of lag to catch from its socket buffer and the sender has to pause sending packets so that receiver can cope up. This flow control protects from slow receiver and fast sender problem TCP also does congestion control which determines how many segments can be in transit without an ack. Linux provides us the ability to configure algorithms for congestion control which we are not covering here. While closing a connection, client/server calls a close syscall. 
Let's assume the client does that. Client\u2019s kernel will send a FIN packet to the server. Server\u2019s kernel can\u2019t close the connection till the close syscall is called by the server application. Once server app calls close, server also sends a FIN packet and client enters into time wait state for 2*MSL (Maximum Segment Lifetime; 60s on Linux) so that this socket can\u2019t be reused for that time period to prevent any TCP state corruptions due to stray stale packets. Armed with our TCP and HTTP knowledge lets see how this is used by SREs in their role","title":"TCP"},{"location":"level101/linux_networking/tcp/#applications-in-sre-role","text":"Scaling HTTP performance using load balancers need consistent knowledge about both TCP and HTTP. There are different kinds of load balancing like L4, L7 load balancing, Direct Server Return etc. HTTPs offloading can be done on Load balancer or directly on servers based on the performance and compliance needs. Tweaking sysctl variables for rmem and wmem like we did for UDP can improve throughput of sender and receiver. Sysctl variable tcp_max_syn_backlog and socket variable somaxconn determine how many connections for which the kernel can complete 3 way handshake before app calling accept syscall. This is especially useful in single threaded applications. Once the backlog is full, new connections stay in SYN_RCVD state (when you run netstat) till the application calls accept syscall Apps can run out of file descriptors if there are too many short lived connections. Digging through tcp_tw_reuse and tcp_tw_recycle (the latter was removed in Linux 4.12) can help reduce time spent in the time wait state(it has its own risk). Making apps reuse a pool of connections instead of creating ad hoc connection can also help Understanding performance bottlenecks by seeing metrics and classifying whether it's a problem in App or network side. For example, too many sockets in CLOSE_WAIT state is a problem on application whereas retransmissions can be a problem more on network or on OS stack than the application itself. 
Understanding the fundamentals can help us narrow down where the bottleneck is","title":"Applications in SRE role"},{"location":"level101/linux_networking/udp/","text":"UDP UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP(most of the times). Before jumping into UDP, let's try to understand what an application and transport layer is. DNS protocol is used by a DNS client(eg dig) and DNS server(eg named). The transport layer makes sure the DNS request reaches the DNS server process and similarly the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any ports . DNS servers usually listen on port number 53. When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via sendto system call. The kernel picks a random port number( >1024 ) as source port number and puts 53 as destination port number and sends the packet to lower layers. When the kernel on server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process which makes a recvfrom system call and reads the packet. This process by the kernel is called multiplexing(combining packets from multiple applications to same lower layers) and demultiplexing(segregating packets from single lower layer to multiple applications). Multiplexing and Demultiplexing is done by the Transport layer. UDP is one of the simplest transport layer protocol and it does only multiplexing and demultiplexing. Another common transport layer protocol TCP does a bunch of other things like reliable communication, flow control and congestion control. UDP is designed to be lightweight and handle communications with little overhead. So it doesn\u2019t do anything beyond multiplexing and demultiplexing. 
If applications running on top of UDP need any of the features of TCP, they have to implement that in their application This example from python wiki covers a sample UDP client and server where \u201cHello World\u201d is an application payload sent to server listening on port number 5005. The server receives the packet and prints the \u201cHello World\u201d string from the client Applications in SRE role If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, sendto syscall from the application will hang till the kernel finds some of its buffer is freed. This can affect the throughput of the system. Increasing write memory buffer values using sysctl variables net.core.wmem_max and net.core.wmem_default provides some cushion to the application from the slow network Similarly if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can\u2019t queue due to the buffer being full. Since UDP doesn\u2019t guarantee reliability these dropped packets can cause data loss unless tracked by the application layer. Increasing sysctl variables rmem_default and rmem_max can provide some cushion to slow applications from fast senders.","title":"UDP"},{"location":"level101/linux_networking/udp/#udp","text":"UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP(most of the times). Before jumping into UDP, let's try to understand what an application and transport layer is. DNS protocol is used by a DNS client(eg dig) and DNS server(eg named). The transport layer makes sure the DNS request reaches the DNS server process and similarly the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any ports . DNS servers usually listen on port number 53. When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via sendto system call. 
The kernel picks a random port number (>1024) as the source port number, puts 53 as the destination port number, and sends the packet to the lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process, which makes a recvfrom system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications onto the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer. UDP is one of the simplest transport layer protocols, and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control, and congestion control. UDP is designed to be lightweight and handle communications with little overhead. So it doesn\u2019t do anything beyond multiplexing and demultiplexing. If applications running on top of UDP need any of the features of TCP, they have to implement them in the application itself. This example from the Python wiki covers a sample UDP client and server where \u201cHello World\u201d is an application payload sent to a server listening on port number 5005. The server receives the packet and prints the \u201cHello World\u201d string from the client.","title":"UDP"},{"location":"level101/linux_networking/udp/#applications-in-sre-role","text":"If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, the sendto syscall from the application will hang until some of the kernel's buffer is freed. This can affect the throughput of the system. 
Increasing write memory buffer values using the sysctl variables net.core.wmem_max and net.core.wmem_default provides some cushion to the application from the slow network. Similarly, if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can\u2019t queue due to the buffer being full. Since UDP doesn\u2019t guarantee reliability, these dropped packets can cause data loss unless tracked by the application layer. Increasing the sysctl variables net.core.rmem_default and net.core.rmem_max can provide some cushion to slow applications from fast senders.","title":"Applications in SRE role"},{"location":"level101/metrics_and_monitoring/alerts/","text":"Proactive monitoring using alerts Earlier we discussed different ways to collect key metric data points from a service and its underlying infrastructure. This data gives us a better understanding of how the service is performing. One of the main objectives of monitoring is to detect any service degradations early (reduce Mean Time To Detect) and notify stakeholders so that the issues are either avoided or can be fixed early, thus reducing Mean Time To Recover (MTTR). For example, if you are notified when resource usage by a service exceeds 90 percent, you can take preventive measures to avoid any service breakdown due to a shortage of resources. On the other hand, when a service goes down due to an issue, early detection and notification of such incidents can help you quickly fix the issue. Figure 8: An alert notification received on Slack Today most of the monitoring services available provide a mechanism to set up alerts on one or a combination of metrics to actively monitor the service health. These alerts have a set of defined rules or conditions, and when the rule is broken, you are notified. These rules can be as simple as notifying when the metric value exceeds n to as complex as a week over week (WoW) comparison of standard deviation over a period of time. 
Monitoring tools notify you about an active alert, and most of these tools support instant messaging (IM) platforms, SMS, email, or phone calls. Figure 8 shows a sample alert notification received on Slack for memory usage exceeding 90 percent of total RAM space on the host.","title":"Proactive Monitoring with Alerts"},{"location":"level101/metrics_and_monitoring/alerts/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/alerts/#proactive-monitoring-using-alerts","text":"Earlier we discussed different ways to collect key metric data points from a service and its underlying infrastructure. This data gives us a better understanding of how the service is performing. One of the main objectives of monitoring is to detect any service degradations early (reduce Mean Time To Detect) and notify stakeholders so that the issues are either avoided or can be fixed early, thus reducing Mean Time To Recover (MTTR). For example, if you are notified when resource usage by a service exceeds 90 percent, you can take preventive measures to avoid any service breakdown due to a shortage of resources. On the other hand, when a service goes down due to an issue, early detection and notification of such incidents can help you quickly fix the issue. Figure 8: An alert notification received on Slack Today most of the monitoring services available provide a mechanism to set up alerts on one or a combination of metrics to actively monitor the service health. These alerts have a set of defined rules or conditions, and when the rule is broken, you are notified. These rules can be as simple as notifying when the metric value exceeds n to as complex as a week over week (WoW) comparison of standard deviation over a period of time. Monitoring tools notify you about an active alert, and most of these tools support instant messaging (IM) platforms, SMS, email, or phone calls. 
Figure 8 shows a sample alert notification received on Slack for memory usage exceeding 90 percent of total RAM space on the host.","title":"Proactive monitoring using alerts"},{"location":"level101/metrics_and_monitoring/best_practices/","text":"Best practices for monitoring When setting up monitoring for a service, keep the following best practices in mind. Use the right metric type -- Most of the libraries available today offer various metric types. Choose the appropriate metric type for monitoring your system. Following are the types of metrics and their purposes. Gauge -- Gauge is a constant type of metric. After the metric is initialized, the metric value does not change unless you intentionally update it. Timer -- Timer measures the time taken to complete a task. Counter -- Counter counts the number of occurrences of a particular event. For more information about these metric types, see Data Types . Avoid over-monitoring -- Monitoring can be a significant engineering endeavor . Therefore, be sure not to spend too much time and resources on monitoring services, yet make sure all important metrics are captured. Prevent alert fatigue -- Set alerts for metrics that are important and actionable. If you receive too many non-critical alerts, you might start ignoring alert notifications over time. As a result, critical alerts might get overlooked. Have a runbook for alerts -- For every alert, make sure you have a document explaining what actions and checks need to be performed when the alert fires. This enables any engineer on the team to handle the alert and take necessary actions, without any help from others.","title":"Best Practices for Monitoring"},{"location":"level101/metrics_and_monitoring/best_practices/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/best_practices/#best-practices-for-monitoring","text":"When setting up monitoring for a service, keep the following best practices in mind. 
Use the right metric type -- Most of the libraries available today offer various metric types. Choose the appropriate metric type for monitoring your system. Following are the types of metrics and their purposes. Gauge -- Gauge is a constant type of metric. After the metric is initialized, the metric value does not change unless you intentionally update it. Timer -- Timer measures the time taken to complete a task. Counter -- Counter counts the number of occurrences of a particular event. For more information about these metric types, see Data Types . Avoid over-monitoring -- Monitoring can be a significant engineering endeavor . Therefore, be sure not to spend too much time and resources on monitoring services, yet make sure all important metrics are captured. Prevent alert fatigue -- Set alerts for metrics that are important and actionable. If you receive too many non-critical alerts, you might start ignoring alert notifications over time. As a result, critical alerts might get overlooked. Have a runbook for alerts -- For every alert, make sure you have a document explaining what actions and checks need to be performed when the alert fires. This enables any engineer on the team to handle the alert and take necessary actions, without any help from others.","title":"Best practices for monitoring"},{"location":"level101/metrics_and_monitoring/command-line_tools/","text":"Command-line tools Most of the Linux distributions today come with a set of tools that monitor the system's performance. These tools help you measure and understand various subsystem statistics (CPU, memory, network, and so on). Let's look at some of the tools that are predominantly used. ps/top -- The process status command (ps) displays information about all the currently running processes in a Linux system. The top command is similar to the ps command, but it periodically updates the information displayed until the program is terminated. 
An advanced version of top, called htop, has a more user-friendly interface and some additional features. These command-line utilities come with options to modify the operation and output of the command. Following are some important options supported by the ps command. -p -- Displays information about processes that match the specified process IDs. Similarly, you can use -u and -g to display information about processes belonging to a specific user or group. -a -- Displays information about other users' processes, as well as one's own. -x -- When displaying processes matched by other options, includes processes that do not have a controlling terminal. Figure 2: Results of top command ss -- The socket statistics command (ss) displays information about network sockets on the system. This tool is the successor of netstat , which is deprecated. Following are some command-line options supported by the ss command: -t -- Displays the TCP socket. Similarly, -u displays UDP sockets, -x is for UNIX domain sockets, and so on. -l -- Displays only listening sockets. -n -- Instructs the command to not resolve service names. Instead displays the port numbers. Figure 3: List of listening sockets on a system free -- The free command displays memory usage statistics on the host like available memory, used memory, and free memory. Most often, this command is used with the -h command-line option, which displays the statistics in a human-readable format. Figure 4: Memory statistics on a host in human-readable form df -- The df command displays disk space usage statistics. The -i command-line option is also often used to display inode usage statistics. The -h command-line option is used for displaying statistics in a human-readable format. Figure 5: Disk usage statistics on a system in human-readable form sar -- The sar utility monitors various subsystems, such as CPU and memory, in real time. This data can be stored in a file specified with the -o option. 
This tool helps to identify anomalies. iftop -- The interface top command ( iftop ) displays bandwidth utilization by a host on an interface. This command is often used to identify bandwidth usage by active connections. The -i option specifies which network interface to watch. Figure 6: Network bandwidth usage by active connection on the host tcpdump -- The tcpdump command is a network monitoring tool that captures network packets flowing over the network and displays a description of the captured packets. The following options are available: -i -- Interface to listen on host -- Filters traffic going to or from the specified host src/dst -- Displays one-way traffic from the source (src) or to the destination (dst) port -- Filters traffic to or from a particular port Figure 7: *tcpdump* of packets on *docker0* interface on a host","title":"Command-line Tools"},{"location":"level101/metrics_and_monitoring/command-line_tools/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/command-line_tools/#command-line-tools","text":"Most of the Linux distributions today come with a set of tools that monitor the system's performance. These tools help you measure and understand various subsystem statistics (CPU, memory, network, and so on). Let's look at some of the tools that are predominantly used. ps/top -- The process status command (ps) displays information about all the currently running processes in a Linux system. The top command is similar to the ps command, but it periodically updates the information displayed until the program is terminated. An advanced version of top, called htop, has a more user-friendly interface and some additional features. These command-line utilities come with options to modify the operation and output of the command. Following are some important options supported by the ps command. -p -- Displays information about processes that match the specified process IDs. 
Similarly, you can use -u and -g to display information about processes belonging to a specific user or group. -a -- Displays information about other users' processes, as well as one's own. -x -- When displaying processes matched by other options, includes processes that do not have a controlling terminal. Figure 2: Results of top command ss -- The socket statistics command (ss) displays information about network sockets on the system. This tool is the successor of netstat , which is deprecated. Following are some command-line options supported by the ss command: -t -- Displays the TCP socket. Similarly, -u displays UDP sockets, -x is for UNIX domain sockets, and so on. -l -- Displays only listening sockets. -n -- Instructs the command to not resolve service names. Instead displays the port numbers. Figure 3: List of listening sockets on a system free -- The free command displays memory usage statistics on the host like available memory, used memory, and free memory. Most often, this command is used with the -h command-line option, which displays the statistics in a human-readable format. Figure 4: Memory statistics on a host in human-readable form df -- The df command displays disk space usage statistics. The -i command-line option is also often used to display inode usage statistics. The -h command-line option is used for displaying statistics in a human-readable format. Figure 5: Disk usage statistics on a system in human-readable form sar -- The sar utility monitors various subsystems, such as CPU and memory, in real time. This data can be stored in a file specified with the -o option. This tool helps to identify anomalies. iftop -- The interface top command ( iftop ) displays bandwidth utilization by a host on an interface. This command is often used to identify bandwidth usage by active connections. The -i option specifies which network interface to watch. 
Figure 6: Network bandwidth usage by active connection on the host tcpdump -- The tcpdump command is a network monitoring tool that captures network packets flowing over the network and displays a description of the captured packets. The following options are available: -i -- Interface to listen on host -- Filters traffic going to or from the specified host src/dst -- Displays one-way traffic from the source (src) or to the destination (dst) port -- Filters traffic to or from a particular port Figure 7: *tcpdump* of packets on *docker0* interface on a host","title":"Command-line tools"},{"location":"level101/metrics_and_monitoring/conclusion/","text":"Conclusion A robust monitoring and alerting system is necessary for maintaining and troubleshooting a system. A dashboard with key metrics can give you an overview of service performance, all in one place. Well-defined alerts (with realistic thresholds and notifications) further enable you to quickly identify any anomalies in the service infrastructure and in resource saturation. By taking necessary actions, you can avoid any service degradations and decrease MTTD for service breakdowns. In addition to in-house monitoring, monitoring real user experience can help you to understand service performance as perceived by the users. Many modules are involved in serving the user, and most of them are out of your control. Therefore, you need to have real-user monitoring in place. Metrics give very abstract details on service performance. To get a better understanding of the system and for faster recovery during incidents, you might want to implement the other two pillars of observability: logs and tracing. Logs and trace data can help you understand what led to service failure or degradation. 
Following are some resources to learn more about monitoring and observability: Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing by Yuri Shkuro References Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing, by Yuri Shkuro Monitoring and Observability Three Pillars with Zero Answers Engineering blogs on LinkedIn, Grafana, Elastic.co, OpenTelemetry","title":"Conclusion"},{"location":"level101/metrics_and_monitoring/conclusion/#conclusion","text":"A robust monitoring and alerting system is necessary for maintaining and troubleshooting a system. A dashboard with key metrics can give you an overview of service performance, all in one place. Well-defined alerts (with realistic thresholds and notifications) further enable you to quickly identify any anomalies in the service infrastructure and in resource saturation. By taking necessary actions, you can avoid any service degradations and decrease MTTD for service breakdowns. In addition to in-house monitoring, monitoring real user experience can help you to understand service performance as perceived by the users. Many modules are involved in serving the user, and most of them are out of your control. Therefore, you need to have real-user monitoring in place. Metrics give very abstract details on service performance. To get a better understanding of the system and for faster recovery during incidents, you might want to implement the other two pillars of observability: logs and tracing. Logs and trace data can help you understand what led to service failure or degradation. 
Following are some resources to learn more about monitoring and observability: Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing by Yuri Shkuro","title":"Conclusion"},{"location":"level101/metrics_and_monitoring/conclusion/#references","text":"Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing, by Yuri Shkuro Monitoring and Observability Three Pillars with Zero Answers Engineering blogs on LinkedIn, Grafana, Elastic.co, OpenTelemetry","title":"References"},{"location":"level101/metrics_and_monitoring/introduction/","text":"Prerequisites Linux Basics Python and the Web Systems Design Linux Networking Fundamentals What to expect from this course Monitoring is an integral part of any system. As an SRE, you need to have a basic understanding of monitoring a service infrastructure. By the end of this course, you will gain a better understanding of the following topics: What is monitoring? What needs to be measured How the metrics gathered can be used to improve business decisions and overall reliability Proactive monitoring with alerts Log processing and its importance What is observability? Distributed tracing Logs Metrics What is not covered in this course Guide to setting up a monitoring infrastructure Deep dive into different monitoring technologies and benchmarking or comparison of any tools Course content Introduction Four golden signals of monitoring Why is monitoring important? Command-line tools Third-party monitoring Proactive monitoring using alerts Best practices for monitoring Observability Logs Tracing Conclusion Introduction Monitoring is a process of collecting real-time performance metrics from a system, analyzing the data to derive meaningful information, and displaying the data to the users. In simple terms, you measure various metrics regularly to understand the state of the system, including but not limited to, user requests, latency, and error rate. 
What gets measured, gets fixed ---if you can measure something, you can reason about it, understand it, discuss it, and act upon it with confidence. Four golden signals of monitoring When setting up monitoring for a system, you need to decide what to measure. The four golden signals of monitoring provide a good understanding of service performance and lay a foundation for monitoring a system. These four golden signals are Traffic Latency Error Saturation These metrics help you to understand the system performance and bottlenecks, and to create a better end-user experience. As discussed in the Google SRE book , if you can measure only four metrics of your service, focus on these four. Let's look at each of the four golden signals. Traffic -- Traffic gives a better understanding of the service demand. Often referred to as service QPS (queries per second), traffic is a measure of requests served by the service. This signal helps you to decide when a service needs to be scaled up to handle increasing customer demand and scaled down to be cost-effective. Latency -- Latency is the measure of time taken by the service to process the incoming request and send the response. Measuring service latency helps in the early detection of slow degradation of the service. Distinguishing between the latency of successful requests and the latency of failed requests is important. For example, an HTTP 5XX error triggered due to loss of connection to a database or other critical backend might be served very quickly. However, because an HTTP 500 error indicates a failed request, factoring 500s into overall latency might result in misleading calculations. Error (rate) -- Error is the measure of failed client requests. These failures can be easily identified based on the response codes ( HTTP 5XX error ). There might be cases where the response is considered erroneous due to wrong result data or due to policy violations. 
For example, you might get an HTTP 200 response, but the body has incomplete data, or response time is breaching the agreed-upon SLA s. Therefore, you need to have other mechanisms (code logic or instrumentation ) in place to capture errors in addition to the response codes. Saturation -- Saturation is a measure of the resource utilization by a service. This signal tells you the state of service resources and how full they are. These resources include memory, compute, network I/O, and so on. Service performance slowly degrades even before resource utilization is at 100 percent. Therefore, having a utilization target is important. An increase in latency is a good indicator of saturation; measuring the 99th percentile of latency can help in the early detection of saturation. Depending on the type of service, you can measure these signals in different ways. For example, you might measure queries per second served for a web server. In contrast, for a database server, transactions performed and database sessions created give you an idea about the traffic handled by the database server. With the help of additional code logic (monitoring libraries and instrumentation), you can measure these signals periodically and store them for future analysis. Although these metrics give you an idea about the performance at the service end, you need to also ensure that the same user experience is delivered at the client end. Therefore, you might need to monitor the service from outside the service infrastructure, which is discussed under third-party monitoring. Why is monitoring important? Monitoring plays a key role in the success of a service. As discussed earlier, monitoring provides performance insights for understanding service health. With access to historical data collected over time, you can build intelligent applications to address specific needs. 
Some of the key use cases follow: Reduction in time to resolve issues -- With a good monitoring infrastructure in place, you can identify issues quickly and resolve them, which reduces the impact caused by the issues. Business decisions -- Data collected over a period of time can help you make business decisions such as determining the product release cycle, which features to invest in, and geographical areas to focus on. Decisions based on long-term data can improve the overall product experience. Resource planning -- By analyzing historical data, you can forecast service compute-resource demands, and you can properly allocate resources. This allows financially effective decisions, with no compromise in end-user experience. Before we dive deeper into monitoring, let's understand some basic terminologies. Metric -- A metric is a quantitative measure of a particular system attribute---for example, memory or CPU Node or host -- A physical server, virtual machine, or container where an application is running QPS -- Queries Per Second , a measure of traffic served by the service per second Latency -- The time interval between user action and the response from the server---for example, time spent after sending a query to a database before the first response bit is received Error rate -- Number of errors observed over a particular time period (usually a second) Graph -- In monitoring, a graph is a representation of one or more values of metrics collected over time Dashboard -- A dashboard is a collection of graphs that provide an overview of system health Incident -- An incident is an event that disrupts the normal operations of a system MTTD -- Mean Time To Detect is the time interval between the beginning of a service failure and the detection of such failure MTTR -- Mean Time To Resolve is the time spent to fix a service failure and bring the service back to its normal state Before we discuss monitoring an application, let us look at the monitoring infrastructure. 
Following is an illustration of a basic monitoring system. Figure 1: Illustration of a monitoring infrastructure Figure 1 shows a monitoring infrastructure mechanism for aggregating metrics on the system, and collecting and storing the data for display. In addition, a monitoring infrastructure includes alert subsystems for notifying concerned parties during any abnormal behavior. Let's look at each of these infrastructure components: Host metrics agent -- A host metrics agent is a process running on the host that collects performance statistics for host subsystems such as memory, CPU, and network. These metrics are regularly relayed to a metrics collector for storage and visualization. Some examples are collectd , telegraf , and metricbeat . Metric aggregator -- A metric aggregator is a process running on the host. Applications running on the host collect service metrics using instrumentation . Collected metrics are sent either to the aggregator process or directly to the metrics collector over API, if available. Received metrics are aggregated periodically and relayed to the metrics collector in batches. An example is StatsD . Metrics collector -- A metrics collector process collects all the metrics from the metric aggregators running on multiple hosts. The collector takes care of decoding and stores this data on the database. Metric collection and storage might be taken care of by one single service such as InfluxDB , which we discuss next. An example is carbon daemons . Storage -- A time-series database stores all of these metrics. Examples are OpenTSDB , Whisper , and InfluxDB . Metrics server -- A metrics server can be as basic as a web server that graphically renders metric data. In addition, the metrics server provides aggregation functionalities and APIs for fetching metric data programmatically. Some examples are Grafana and Graphite-Web . 
Alert manager -- The alert manager regularly polls metric data available and, if there are any anomalies detected, notifies you. Each alert has a set of rules for identifying such anomalies. Today many metrics servers such as Grafana support alert management. We discuss alerting in detail later . Examples are Grafana and Icinga .","title":"Introduction"},{"location":"level101/metrics_and_monitoring/introduction/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/introduction/#prerequisites","text":"Linux Basics Python and the Web Systems Design Linux Networking Fundamentals","title":"Prerequisites"},{"location":"level101/metrics_and_monitoring/introduction/#what-to-expect-from-this-course","text":"Monitoring is an integral part of any system. As an SRE, you need to have a basic understanding of monitoring a service infrastructure. By the end of this course, you will gain a better understanding of the following topics: What is monitoring? What needs to be measured How the metrics gathered can be used to improve business decisions and overall reliability Proactive monitoring with alerts Log processing and its importance What is observability? Distributed tracing Logs Metrics","title":"What to expect from this course"},{"location":"level101/metrics_and_monitoring/introduction/#what-is-not-covered-in-this-course","text":"Guide to setting up a monitoring infrastructure Deep dive into different monitoring technologies and benchmarking or comparison of any tools","title":"What is not covered in this course"},{"location":"level101/metrics_and_monitoring/introduction/#course-content","text":"Introduction Four golden signals of monitoring Why is monitoring important? 
Command-line tools Third-party monitoring Proactive monitoring using alerts Best practices for monitoring Observability Logs Tracing Conclusion","title":"Course content"},{"location":"level101/metrics_and_monitoring/introduction/#_2","text":"","title":""},{"location":"level101/metrics_and_monitoring/introduction/#introduction","text":"Monitoring is a process of collecting real-time performance metrics from a system, analyzing the data to derive meaningful information, and displaying the data to the users. In simple terms, you measure various metrics regularly to understand the state of the system, including but not limited to, user requests, latency, and error rate. What gets measured, gets fixed ---if you can measure something, you can reason about it, understand it, discuss it, and act upon it with confidence.","title":"Introduction"},{"location":"level101/metrics_and_monitoring/introduction/#four-golden-signals-of-monitoring","text":"When setting up monitoring for a system, you need to decide what to measure. The four golden signals of monitoring provide a good understanding of service performance and lay a foundation for monitoring a system. These four golden signals are Traffic Latency Error Saturation These metrics help you to understand the system performance and bottlenecks, and to create a better end-user experience. As discussed in the Google SRE book , if you can measure only four metrics of your service, focus on these four. Let's look at each of the four golden signals. Traffic -- Traffic gives a better understanding of the service demand. Often referred to as service QPS (queries per second), traffic is a measure of requests served by the service. This signal helps you to decide when a service needs to be scaled up to handle increasing customer demand and scaled down to be cost-effective. Latency -- Latency is the measure of time taken by the service to process the incoming request and send the response. 
Measuring service latency helps in the early detection of slow degradation of the service. Distinguishing between the latency of successful requests and the latency of failed requests is important. For example, an HTTP 5XX error triggered due to loss of connection to a database or other critical backend might be served very quickly. However, because an HTTP 500 error indicates a failed request, factoring 500s into overall latency might result in misleading calculations. Error (rate) -- Error is the measure of failed client requests. These failures can be easily identified based on the response codes ( HTTP 5XX error ). There might be cases where the response is considered erroneous due to wrong result data or due to policy violations. For example, you might get an HTTP 200 response, but the body has incomplete data, or response time is breaching the agreed-upon SLA s. Therefore, you need to have other mechanisms (code logic or instrumentation ) in place to capture errors in addition to the response codes. Saturation -- Saturation is a measure of the resource utilization by a service. This signal tells you the state of service resources and how full they are. These resources include memory, compute, network I/O, and so on. Service performance slowly degrades even before resource utilization is at 100 percent. Therefore, having a utilization target is important. An increase in latency is a good indicator of saturation; measuring the 99th percentile of latency can help in the early detection of saturation. Depending on the type of service, you can measure these signals in different ways. For example, you might measure queries per second served for a web server. In contrast, for a database server, transactions performed and database sessions created give you an idea about the traffic handled by the database server. 
With the help of additional code logic (monitoring libraries and instrumentation), you can measure these signals periodically and store them for future analysis. Although these metrics give you an idea about the performance at the service end, you need to also ensure that the same user experience is delivered at the client end. Therefore, you might need to monitor the service from outside the service infrastructure, which is discussed under third-party monitoring.","title":"Four golden signals of monitoring"},{"location":"level101/metrics_and_monitoring/introduction/#why-is-monitoring-important","text":"Monitoring plays a key role in the success of a service. As discussed earlier, monitoring provides performance insights for understanding service health. With access to historical data collected over time, you can build intelligent applications to address specific needs. Some of the key use cases follow: Reduction in time to resolve issues -- With a good monitoring infrastructure in place, you can identify issues quickly and resolve them, which reduces the impact caused by the issues. Business decisions -- Data collected over a period of time can help you make business decisions such as determining the product release cycle, which features to invest in, and geographical areas to focus on. Decisions based on long-term data can improve the overall product experience. Resource planning -- By analyzing historical data, you can forecast service compute-resource demands, and you can properly allocate resources. This allows financially effective decisions, with no compromise in end-user experience. Before we dive deeper into monitoring, let's understand some basic terminologies. 
Metric -- A metric is a quantitative measure of a particular system attribute---for example, memory or CPU Node or host -- A physical server, virtual machine, or container where an application is running QPS -- Queries Per Second , a measure of traffic served by the service per second Latency -- The time interval between user action and the response from the server---for example, time spent after sending a query to a database before the first response bit is received Error rate -- Number of errors observed over a particular time period (usually a second) Graph -- In monitoring, a graph is a representation of one or more values of metrics collected over time Dashboard -- A dashboard is a collection of graphs that provide an overview of system health Incident -- An incident is an event that disrupts the normal operations of a system MTTD -- Mean Time To Detect is the time interval between the beginning of a service failure and the detection of such failure MTTR -- Mean Time To Resolve is the time spent to fix a service failure and bring the service back to its normal state Before we discuss monitoring an application, let us look at the monitoring infrastructure. Following is an illustration of a basic monitoring system. Figure 1: Illustration of a monitoring infrastructure Figure 1 shows a monitoring infrastructure mechanism for aggregating metrics on the system, and collecting and storing the data for display. In addition, a monitoring infrastructure includes alert subsystems for notifying concerned parties during any abnormal behavior. Let's look at each of these infrastructure components: Host metrics agent -- A host metrics agent is a process running on the host that collects performance statistics for host subsystems such as memory, CPU, and network. These metrics are regularly relayed to a metrics collector for storage and visualization. Some examples are collectd , telegraf , and metricbeat . 
Metric aggregator -- A metric aggregator is a process running on the host. Applications running on the host collect service metrics using instrumentation . Collected metrics are sent either to the aggregator process or directly to the metrics collector over API, if available. Received metrics are aggregated periodically and relayed to the metrics collector in batches. An example is StatsD . Metrics collector -- A metrics collector process collects all the metrics from the metric aggregators running on multiple hosts. The collector takes care of decoding and stores this data on the database. Metric collection and storage might be taken care of by one single service such as InfluxDB , which we discuss next. An example is carbon daemons . Storage -- A time-series database stores all of these metrics. Examples are OpenTSDB , Whisper , and InfluxDB . Metrics server -- A metrics server can be as basic as a web server that graphically renders metric data. In addition, the metrics server provides aggregation functionalities and APIs for fetching metric data programmatically. Some examples are Grafana and Graphite-Web . Alert manager -- The alert manager regularly polls metric data available and, if there are any anomalies detected, notifies you. Each alert has a set of rules for identifying such anomalies. Today many metrics servers such as Grafana support alert management. We discuss alerting in detail later . Examples are Grafana and Icinga .","title":"Why is monitoring important?"},{"location":"level101/metrics_and_monitoring/observability/","text":"Observability Engineers often use observability when referring to building reliable systems. Observability is a term derived from control theory, It is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. 
Service infrastructures used on a daily basis are becoming more and more complex; proactive monitoring alone is not sufficient to quickly resolve issues causing application failures. With monitoring, you can keep known past failures from recurring, but with a complex service architecture, many unknown factors can cause potential problems. To address such cases, you can make the service observable. An observable system provides highly granular insights into the implicit failure modes. In addition, an observable system furnishes ample context about its inner workings, which unlocks the ability to uncover deeper systemic issues. Monitoring enables failure detection; observability helps in gaining a better understanding of the system. Among engineers, there is a common misconception that monitoring and observability are two different things. Actually, observability is the superset to monitoring; that is, monitoring improves service observability. The goal of observability is not only to detect problems, but also to understand where the issue is and what is causing it. In addition to metrics, observability has two more pillars: logs and traces, as shown in Figure 9. Although these three components do not make a system 100 percent observable, these are the most important and powerful components that give a better understanding of the system. Each of these pillars has its flaws, which are described in Three Pillars with Zero Answers . Figure 9: Three pillars of observability Because we have covered metrics already, let's look at the other two pillars (logs and traces). Logs Logs (often referred to as events ) are a record of activities performed by a service during its run time, with a corresponding timestamp. Metrics give abstract information about degradations in a system, and logs give a detailed view of what is causing these degradations. 
Logs created by the applications and infrastructure components help in effectively understanding the behavior of the system by providing details on application errors, exceptions, and event timelines. Logs help you to go back in time to understand the events that led to a failure. Therefore, examining logs is essential to troubleshooting system failures. Log processing involves the aggregation of different logs from individual applications and their subsequent shipment to central storage. Moving logs to central storage helps to preserve the logs, in case the application instances are inaccessible, or the application crashes due to a failure. After the logs are available in a central place, you can analyze the logs to derive sensible information from them. For audit and compliance purposes, you archive these logs on the central storage for a certain period of time. Log analyzers fetch useful information from log lines, such as request user information, request URL (feature), and response headers (such as content length) and response time. This information is grouped based on these attributes and made available to you through a visualization tool for quick understanding. You might be wondering how this log information helps. This information gives a holistic view of activities performed on all the involved entities. For example, let's say someone is performing a DoS (denial of service) attack on a web application. With the help of log processing, you can quickly look at top client IPs derived from access logs and identify where the attack is coming from. Similarly, if a feature in an application is causing a high error rate when accessed with a particular request parameter value, the results of log analysis can help you to quickly identify the misbehaving parameter value and take further action. 
Figure 10: Log processing and analysis using ELK stack Figure 10 shows a log processing platform using ELK (Elasticsearch, Logstash, Kibana), which provides centralized log processing. Beats is a collection of lightweight data shippers that can ship logs, audit data, network data, and so on over the network. In this use case specifically, we are using filebeat as a log shipper. Filebeat watches service log files and ships the log data to Logstash. Logstash parses these logs and transforms the data, preparing it to store on Elasticsearch. Transformed log data is stored on Elasticsearch and indexed for fast retrieval. Kibana searches and displays log data stored on Elasticsearch. Kibana also provides a set of visualizations for graphically displaying summaries derived from log data. Storing logs is expensive. And extensive logging of every event on the server is costly and takes up more storage space. With an increasing number of services, this cost can increase proportionally to the number of services. Tracing So far, we covered the importance of metrics and logging. Metrics give an abstract overview of the system, and logging gives a record of events that occurred. Imagine a complex distributed system with multiple microservices, where a user request is processed by multiple microservices in the system. Metrics and logging give you some information about how these requests are being handled by the system, but they fail to provide detailed information across all the microservices and how they affect a particular client request. If a slow downstream microservice is leading to increased response times, you need to have detailed visibility across all involved microservices to identify such microservice. The answer to this need is a request tracing mechanism. A trace is a series of spans, where each span is a record of events performed by different microservices to serve the client's request. 
In simple terms, a trace is a log of how a client request was served, assembled from various microservices across different physical machines. Each span includes span metadata such as trace ID and span ID, and context, which includes information about transactions performed. Figure 11: Trace and spans for a URL shortener request Figure 11 is a graphical representation of a trace captured on the URL shortener example we covered earlier while learning Python. Similar to monitoring, the tracing infrastructure comprises a few modules for collecting traces, storing them, and accessing them. Each microservice runs a tracing library that collects traces in the background, creates in-memory batches, and submits them to the tracing backend. The tracing backend normalizes received trace data and stores it on persistent storage. Tracing data comes from multiple different microservices; therefore, trace storage is often organized to store data incrementally and is indexed by trace identifier. This organization helps in the reconstruction of trace data and in visualization. Figure 12 illustrates the anatomy of distributed tracing. Figure 12: Anatomy of distributed tracing Today a set of tools and frameworks is available for building distributed tracing solutions. Following are some of the popular tools: OpenTelemetry : Observability framework for cloud-native software Jaeger : Open-source distributed tracing solution Zipkin : Open-source distributed tracing solution","title":"Observability"},{"location":"level101/metrics_and_monitoring/observability/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/observability/#observability","text":"Engineers often use observability when referring to building reliable systems. Observability is a term derived from control theory; it is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. 
Service infrastructures used on a daily basis are becoming more and more complex; proactive monitoring alone is not sufficient to quickly resolve issues causing application failures. With monitoring, you can keep known past failures from recurring, but with a complex service architecture, many unknown factors can cause potential problems. To address such cases, you can make the service observable. An observable system provides highly granular insights into the implicit failure modes. In addition, an observable system furnishes ample context about its inner workings, which unlocks the ability to uncover deeper systemic issues. Monitoring enables failure detection; observability helps in gaining a better understanding of the system. Among engineers, there is a common misconception that monitoring and observability are two different things. Actually, observability is the superset to monitoring; that is, monitoring improves service observability. The goal of observability is not only to detect problems, but also to understand where the issue is and what is causing it. In addition to metrics, observability has two more pillars: logs and traces, as shown in Figure 9. Although these three components do not make a system 100 percent observable, these are the most important and powerful components that give a better understanding of the system. Each of these pillars has its flaws, which are described in Three Pillars with Zero Answers . Figure 9: Three pillars of observability Because we have covered metrics already, let's look at the other two pillars (logs and traces).","title":"Observability"},{"location":"level101/metrics_and_monitoring/observability/#logs","text":"Logs (often referred to as events ) are a record of activities performed by a service during its run time, with a corresponding timestamp. Metrics give abstract information about degradations in a system, and logs give a detailed view of what is causing these degradations. 
Logs created by the applications and infrastructure components help in effectively understanding the behavior of the system by providing details on application errors, exceptions, and event timelines. Logs help you to go back in time to understand the events that led to a failure. Therefore, examining logs is essential to troubleshooting system failures. Log processing involves the aggregation of different logs from individual applications and their subsequent shipment to central storage. Moving logs to central storage helps to preserve the logs, in case the application instances are inaccessible, or the application crashes due to a failure. After the logs are available in a central place, you can analyze the logs to derive sensible information from them. For audit and compliance purposes, you archive these logs on the central storage for a certain period of time. Log analyzers fetch useful information from log lines, such as request user information, request URL (feature), and response headers (such as content length) and response time. This information is grouped based on these attributes and made available to you through a visualization tool for quick understanding. You might be wondering how this log information helps. This information gives a holistic view of activities performed on all the involved entities. For example, let's say someone is performing a DoS (denial of service) attack on a web application. With the help of log processing, you can quickly look at top client IPs derived from access logs and identify where the attack is coming from. Similarly, if a feature in an application is causing a high error rate when accessed with a particular request parameter value, the results of log analysis can help you to quickly identify the misbehaving parameter value and take further action. 
Figure 10: Log processing and analysis using ELK stack Figure 10 shows a log processing platform using ELK (Elasticsearch, Logstash, Kibana), which provides centralized log processing. Beats is a collection of lightweight data shippers that can ship logs, audit data, network data, and so on over the network. In this use case specifically, we are using filebeat as a log shipper. Filebeat watches service log files and ships the log data to Logstash. Logstash parses these logs and transforms the data, preparing it to store on Elasticsearch. Transformed log data is stored on Elasticsearch and indexed for fast retrieval. Kibana searches and displays log data stored on Elasticsearch. Kibana also provides a set of visualizations for graphically displaying summaries derived from log data. Storing logs is expensive. And extensive logging of every event on the server is costly and takes up more storage space. With an increasing number of services, this cost can increase proportionally to the number of services.","title":"Logs"},{"location":"level101/metrics_and_monitoring/observability/#tracing","text":"So far, we covered the importance of metrics and logging. Metrics give an abstract overview of the system, and logging gives a record of events that occurred. Imagine a complex distributed system with multiple microservices, where a user request is processed by multiple microservices in the system. Metrics and logging give you some information about how these requests are being handled by the system, but they fail to provide detailed information across all the microservices and how they affect a particular client request. If a slow downstream microservice is leading to increased response times, you need to have detailed visibility across all involved microservices to identify such microservice. The answer to this need is a request tracing mechanism. 
A trace is a series of spans, where each span is a record of events performed by different microservices to serve the client's request. In simple terms, a trace is a log of how a client request was served, assembled from various microservices across different physical machines. Each span includes span metadata such as trace ID and span ID, and context, which includes information about transactions performed. Figure 11: Trace and spans for a URL shortener request Figure 11 is a graphical representation of a trace captured on the URL shortener example we covered earlier while learning Python. Similar to monitoring, the tracing infrastructure comprises a few modules for collecting traces, storing them, and accessing them. Each microservice runs a tracing library that collects traces in the background, creates in-memory batches, and submits them to the tracing backend. The tracing backend normalizes received trace data and stores it on persistent storage. Tracing data comes from multiple different microservices; therefore, trace storage is often organized to store data incrementally and is indexed by trace identifier. This organization helps in the reconstruction of trace data and in visualization. Figure 12 illustrates the anatomy of distributed tracing. Figure 12: Anatomy of distributed tracing Today a set of tools and frameworks is available for building distributed tracing solutions. Following are some of the popular tools: OpenTelemetry : Observability framework for cloud-native software Jaeger : Open-source distributed tracing solution Zipkin : Open-source distributed tracing solution","title":"Tracing"},{"location":"level101/metrics_and_monitoring/third-party_monitoring/","text":"Third-party monitoring Today most cloud providers offer a variety of monitoring solutions. In addition, a number of companies such as Datadog offer monitoring-as-a-service. In this section, we are not covering monitoring-as-a-service in depth. 
In recent years, more and more people have access to the internet. Many services are offered online to cater to the increasing user base. As a result, web pages are becoming larger, with increased client-side scripts. Users want these services to be fast and error-free. From the service point of view, when the response body is composed, an HTTP 200 OK response is sent, and everything looks okay. But there might be errors during transmission or on the client side. As previously mentioned, monitoring services from within the service infrastructure give good visibility into service health, but this is not enough. You need to monitor user experience, specifically the availability of services for clients. A number of third-party services such as Catchpoint , Pingdom , and so on are available for achieving this goal. Third-party monitoring services can generate synthetic traffic simulating user requests from various parts of the world, to ensure the service is globally accessible. Other third-party monitoring solutions for real user monitoring (RUM) provide performance statistics such as service uptime and response time, from different geographical locations. This allows you to monitor the user experience from these locations, which might have different internet backbones, different operating systems, and different browsers and browser versions. Catchpoint Global Monitoring Network is a comprehensive 3-minute video that explains the importance of monitoring the client experience.","title":"Third-party Monitoring"},{"location":"level101/metrics_and_monitoring/third-party_monitoring/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/third-party_monitoring/#third-party-monitoring","text":"Today most cloud providers offer a variety of monitoring solutions. In addition, a number of companies such as Datadog offer monitoring-as-a-service. In this section, we are not covering monitoring-as-a-service in depth. 
In recent years, more and more people have access to the internet. Many services are offered online to cater to the increasing user base. As a result, web pages are becoming larger, with increased client-side scripts. Users want these services to be fast and error-free. From the service point of view, when the response body is composed, an HTTP 200 OK response is sent, and everything looks okay. But there might be errors during transmission or on the client side. As previously mentioned, monitoring services from within the service infrastructure give good visibility into service health, but this is not enough. You need to monitor user experience, specifically the availability of services for clients. A number of third-party services such as Catchpoint , Pingdom , and so on are available for achieving this goal. Third-party monitoring services can generate synthetic traffic simulating user requests from various parts of the world, to ensure the service is globally accessible. Other third-party monitoring solutions for real user monitoring (RUM) provide performance statistics such as service uptime and response time, from different geographical locations. This allows you to monitor the user experience from these locations, which might have different internet backbones, different operating systems, and different browsers and browser versions. Catchpoint Global Monitoring Network is a comprehensive 3-minute video that explains the importance of monitoring the client experience.","title":"Third-party monitoring"},{"location":"level101/python_web/intro/","text":"Python and The Web Prerequisites Basic understanding of python language. Basic familiarity with flask framework. What to expect from this course This course is divided into two high level parts. In the first part, assuming familiarity with python language\u2019s basic operations and syntax usage, we will dive a little deeper into understanding python as a language. 
We will compare python with other programming languages that you might already know like Java and C. We will also explore concepts of Python objects and with help of that, explore python features like decorators. In the second part which will revolve around the web, and also assume familiarity with the Flask framework, we will start from the socket module and work with HTTP requests. This will demystify how frameworks like flask work internally. And to introduce SRE flavour to the course, we will design, develop and deploy (in theory) a URL shortening application. We will emphasize parts of the whole process that are more important as an SRE of the said app/service. What is not covered under this course Extensive knowledge of python internals and advanced python. Lab Environment Setup Have latest version of python installed Course Contents The Python Language Some Python Concepts Python Gotchas Python and Web Sockets Flask The URL Shortening App Design Scaling The App Monitoring The App The Python Language Assuming you know a little bit of C/C++ and Java, let's try to discuss the following questions in context of those two languages and python. You might have heard that C/C++ is a compiled language while python is an interpreted language. Generally, with compiled language we first compile the program and then run the executable while in case of python we run the source code directly like python hello_world.py . While Java, being an interpreted language, still has a separate compilation step and then its run. So what's really the difference? Compiled vs. Interpreted This might sound a little weird to you: python, in a way is a compiled language! Python has a compiler built-in! It is obvious in the case of java since we compile it using a separate command ie: javac helloWorld.java and it will produce a .class file which we know as a bytecode . Well, python is very similar to that. 
One difference here is that there is no separate compile command/binary needed to run a python program. What is the difference then, between java and python? Well, Java's compiler is more strict and sophisticated. As you might know Java is a statically typed language. So the compiler is written in a way that it can verify types related errors during compile time. While python being a dynamic language, types are not known until a program is run. So in a way, python compiler is dumb (or, less strict). But there indeed is a compile step involved when a python program is run. You might have seen python bytecode files with .pyc extension. Here is how you can see bytecode for a given python program. # Create a Hello World $ echo \"print('hello world')\" > hello_world.py # Making sure it runs $ python3 hello_world.py hello world # The bytecode of the given program $ python -m dis hello_world.py 1 0 LOAD_NAME 0 (print) 2 LOAD_CONST 0 ('hello world') 4 CALL_FUNCTION 1 6 POP_TOP 8 LOAD_CONST 1 (None) 10 RETURN_VALUE Read more about dis module here Now coming to C/C++, there of course is a compiler. But the output is different than what java/python compiler would produce. Compiling a C program would produce what we also know as machine code . As opposed to bytecode. Running The Programs We know compilation is involved in all 3 languages we are discussing. Just that the compilers are different in nature and they output different types of content. In case of C/C++, the output is machine code which can be directly read by your operating system. When you execute that program, your OS will know how exactly to run it. But this is not the case with bytecode. Those bytecodes are language specific. Python has its own set of bytecode defined (more in dis module) and so does java. So naturally, your operating system will not know how to run it. To run this bytecode, we have something called Virtual Machines. Ie: The JVM or the Python VM (CPython, Jython). 
These so called Virtual Machines are the programs which can read the bytecode and run it on a given operating system. Python has multiple VMs available. Cpython is a python VM implemented in C language, similarly Jython is a Java implementation of python VM. At the end of the day, what they should be capable of is to understand python language syntax, be able to compile it to bytecode and be able to run that bytecode. You can implement a python VM in any language! (And people do so, just because it can be done) The Operating System +------------------------------------+ | | | | | | hello_world.py Python bytecode | Python VM Process | | | +----------------+ +----------------+ | +----------------+ | |print(... | COMPILE |LOAD_CONST... | | |Reads bytecode | | | +--------------->+ +------------------->+line by line | | | | | | | |and executes. | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | | | | | | hello_world.c OS Specific machinecode | A New Process | | | +----------------+ +----------------+ | +----------------+ | |void main() { | COMPILE | binary contents| | | binary contents| | | +--------------->+ +------------------->+ | | | | | | | | | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | (binary contents | | runs as is) | | | | | +------------------------------------+ Two things to note for above diagram: Generally, when we run a python program, a python VM process is started which reads the python source code, compiles it to byte code and run it in a single step. Compiling is not a separate step. Shown only for illustration purpose. Binaries generated for C like languages are not exactly run as is. 
Since there are multiple types of binaries (e.g., ELF), more complicated steps are involved in running a binary, but we will not go into them since all of that is done at the OS level.","title":"Introduction"},{"location":"level101/python_web/intro/#python-and-the-web","text":"","title":"Python and The Web"},{"location":"level101/python_web/intro/#prerequisites","text":"Basic understanding of the Python language. Basic familiarity with the Flask framework.","title":"Prerequisites"},{"location":"level101/python_web/intro/#what-to-expect-from-this-course","text":"This course is divided into two high-level parts. In the first part, assuming familiarity with the Python language\u2019s basic operations and syntax, we will dive a little deeper into understanding Python as a language. We will compare Python with other programming languages that you might already know, like Java and C. We will also explore the concept of Python objects and, with the help of that, explore Python features like decorators. In the second part, which revolves around the web and also assumes familiarity with the Flask framework, we will start from the socket module and work with HTTP requests. This will demystify how frameworks like Flask work internally. And to introduce an SRE flavour to the course, we will design, develop and deploy (in theory) a URL shortening application. 
We will emphasize the parts of the whole process that matter most to an SRE of the said app/service.","title":"What to expect from this course"},{"location":"level101/python_web/intro/#what-is-not-covered-under-this-course","text":"Extensive knowledge of Python internals and advanced Python.","title":"What is not covered under this course"},{"location":"level101/python_web/intro/#lab-environment-setup","text":"Have the latest version of Python installed","title":"Lab Environment Setup"},{"location":"level101/python_web/intro/#course-contents","text":"The Python Language Some Python Concepts Python Gotchas Python and Web Sockets Flask The URL Shortening App Design Scaling The App Monitoring The App","title":"Course Contents"},{"location":"level101/python_web/intro/#the-python-language","text":"Assuming you know a little bit of C/C++ and Java, let's try to discuss the following questions in the context of those languages and Python. You might have heard that C/C++ is a compiled language while Python is an interpreted language. Generally, with a compiled language we first compile the program and then run the executable, while in the case of Python we run the source code directly, like python hello_world.py . Java, while also considered an interpreted language, still has a separate compilation step before it is run. So what's really the difference?","title":"The Python Language"},{"location":"level101/python_web/intro/#compiled-vs-interpreted","text":"This might sound a little weird to you: Python, in a way, is a compiled language! Python has a compiler built in! It is obvious in the case of Java, since we compile it using a separate command, i.e., javac helloWorld.java , and it will produce a .class file containing what we know as bytecode . Well, Python is very similar to that. One difference here is that there is no separate compile command/binary needed to run a Python program. What is the difference then, between Java and Python? Well, Java's compiler is more strict and sophisticated. 
As you might know Java is a statically typed language. So the compiler is written in a way that it can verify types related errors during compile time. While python being a dynamic language, types are not known until a program is run. So in a way, python compiler is dumb (or, less strict). But there indeed is a compile step involved when a python program is run. You might have seen python bytecode files with .pyc extension. Here is how you can see bytecode for a given python program. # Create a Hello World $ echo \"print('hello world')\" > hello_world.py # Making sure it runs $ python3 hello_world.py hello world # The bytecode of the given program $ python -m dis hello_world.py 1 0 LOAD_NAME 0 (print) 2 LOAD_CONST 0 ('hello world') 4 CALL_FUNCTION 1 6 POP_TOP 8 LOAD_CONST 1 (None) 10 RETURN_VALUE Read more about dis module here Now coming to C/C++, there of course is a compiler. But the output is different than what java/python compiler would produce. Compiling a C program would produce what we also know as machine code . As opposed to bytecode.","title":"Compiled vs. Interpreted"},{"location":"level101/python_web/intro/#running-the-programs","text":"We know compilation is involved in all 3 languages we are discussing. Just that the compilers are different in nature and they output different types of content. In case of C/C++, the output is machine code which can be directly read by your operating system. When you execute that program, your OS will know how exactly to run it. But this is not the case with bytecode. Those bytecodes are language specific. Python has its own set of bytecode defined (more in dis module) and so does java. So naturally, your operating system will not know how to run it. To run this bytecode, we have something called Virtual Machines. Ie: The JVM or the Python VM (CPython, Jython). These so called Virtual Machines are the programs which can read the bytecode and run it on a given operating system. Python has multiple VMs available. 
CPython is a python VM implemented in the C language; similarly, Jython is a Java implementation of the python VM. At the end of the day, what they should be capable of is to understand python language syntax, be able to compile it to bytecode and be able to run that bytecode. You can implement a python VM in any language! (And people do so, just because it can be done) The Operating System +------------------------------------+ | | | | | | hello_world.py Python bytecode | Python VM Process | | | +----------------+ +----------------+ | +----------------+ | |print(... | COMPILE |LOAD_CONST... | | |Reads bytecode | | | +--------------->+ +------------------->+line by line | | | | | | | |and executes. | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | | | | | | hello_world.c OS Specific machinecode | A New Process | | | +----------------+ +----------------+ | +----------------+ | |void main() { | COMPILE | binary contents| | | binary contents| | | +--------------->+ +------------------->+ | | | | | | | | | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | (binary contents | | runs as is) | | | | | +------------------------------------+ Two things to note for the above diagram: Generally, when we run a python program, a python VM process is started which reads the python source code, compiles it to bytecode and runs it in a single step. Compiling is not a separate step; it is shown only for illustration purposes. Binaries generated for C-like languages are not exactly run as is. 
Since there are multiple types of binaries (eg: ELF), there are more complicated steps involved in order to run a binary but we will not go into that since all that is done at OS level.","title":"Running The Programs"},{"location":"level101/python_web/python-concepts/","text":"Some Python Concepts Though you are expected to know python and its syntax at basic level, let us discuss some fundamental concepts that will help you understand the python language better. Everything in Python is an object. That includes the functions, lists, dicts, classes, modules, a running function (instance of function definition), everything. In the CPython, it would mean there is an underlying struct variable for each object. In python's current execution context, all the variables are stored in a dict. It'd be a string to object mapping. If you have a function and a float variable defined in the current context, here is how it is handled internally. >>> float_number=42.0 >>> def foo_func(): ... pass ... # NOTICE HOW VARIABLE NAMES ARE STRINGS, stored in a dict >>> locals() {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'float_number': 42.0, 'foo_func': } Python Functions Since functions too are objects, we can see what all attributes a function contains as following >>> def hello(name): ... print(f\"Hello, {name}!\") ... 
>>> dir(hello) ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] While there are a lot of them, let's look at some interesting ones globals This attribute, as the name suggests, has references of global variables. If you ever need to know what all global variables are in the scope of this function, this will tell you. See how the function start seeing the new variable in globals >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': } # adding new global variable >>> GLOBAL=\"g_val\" >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': , 'GLOBAL': 'g_val'} code This is an interesting one! As everything in python is an object, this includes the bytecode too. The compiled python bytecode is a python code object. Which is accessible via __code__ attribute here. A function has an associated code object which carries some interesting information. 
# the file in which function is defined # stdin here since this is run in an interpreter >>> hello.__code__.co_filename '' # number of arguments the function takes >>> hello.__code__.co_argcount 1 # local variable names >>> hello.__code__.co_varnames ('name',) # the function code's compiled bytecode >>> hello.__code__.co_code b't\\x00d\\x01|\\x00\\x9b\\x00d\\x02\\x9d\\x03\\x83\\x01\\x01\\x00d\\x00S\\x00' There are more code attributes which you can enlist by >>> dir(hello.__code__) Decorators Related to functions, python has another feature called decorators. Let's see how that works, keeping everything is an object in mind. Here is a sample decorator: >>> def deco(func): ... def inner(): ... print(\"before\") ... func() ... print(\"after\") ... return inner ... >>> @deco ... def hello_world(): ... print(\"hello world\") ... >>> >>> hello_world() before hello world after Here @deco syntax is used to decorate the hello_world function. It is essentially same as doing >>> def hello_world(): ... print(\"hello world\") ... >>> hello_world = deco(hello_world) What goes inside the deco function might seem complex. Let's try to uncover it. 
Function hello_world is created It is passed to deco function deco creates a new function This new function calls hello_world function And does a couple of other things deco returns the newly created function hello_world is replaced with above function Let's visualize it for better understanding BEFORE function_object (ID: 100) \"hello_world\" +--------------------+ + |print(\"hello_world\")| | | | +--------------> | | | | +--------------------+ WHAT DECORATOR DOES creates a new function (ID: 101) +---------------------------------+ |input arg: function with id: 100 | | | |print(\"before\") | |call function object with id 100 | |print(\"after\") | | | +---------------------------------+ ^ | AFTER | | | \"hello_world\" +-------------+ Note how the hello_world name points to a new function object but that new function object knows the reference (ID) of the original function. Some Gotchas While it is very quick to build prototypes in python and there are tons of libraries available, as the codebase complexity increases, type errors become more common and get harder to deal with. (There are solutions to that problem like type annotations in python. Check out mypy .) Because python is a dynamically typed language, all types are determined at runtime. And that makes python run slower than statically typed languages. Python has something called GIL (global interpreter lock) which is a limiting factor for utilizing multiple CPU cores for parallel computation. Some weird things that python does: https://github.com/satwikkansal/wtfpython","title":"Some Python Concepts"},{"location":"level101/python_web/python-concepts/#some-python-concepts","text":"Though you are expected to know python and its syntax at a basic level, let us discuss some fundamental concepts that will help you understand the python language better. Everything in Python is an object. 
That includes the functions, lists, dicts, classes, modules, a running function (instance of function definition), everything. In the CPython, it would mean there is an underlying struct variable for each object. In python's current execution context, all the variables are stored in a dict. It'd be a string to object mapping. If you have a function and a float variable defined in the current context, here is how it is handled internally. >>> float_number=42.0 >>> def foo_func(): ... pass ... # NOTICE HOW VARIABLE NAMES ARE STRINGS, stored in a dict >>> locals() {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'float_number': 42.0, 'foo_func': }","title":"Some Python Concepts"},{"location":"level101/python_web/python-concepts/#python-functions","text":"Since functions too are objects, we can see what all attributes a function contains as following >>> def hello(name): ... print(f\"Hello, {name}!\") ... >>> dir(hello) ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] While there are a lot of them, let's look at some interesting ones","title":"Python Functions"},{"location":"level101/python_web/python-concepts/#globals","text":"This attribute, as the name suggests, has references of global variables. If you ever need to know what all global variables are in the scope of this function, this will tell you. 
See how the function start seeing the new variable in globals >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': } # adding new global variable >>> GLOBAL=\"g_val\" >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': , 'GLOBAL': 'g_val'}","title":"globals"},{"location":"level101/python_web/python-concepts/#code","text":"This is an interesting one! As everything in python is an object, this includes the bytecode too. The compiled python bytecode is a python code object. Which is accessible via __code__ attribute here. A function has an associated code object which carries some interesting information. # the file in which function is defined # stdin here since this is run in an interpreter >>> hello.__code__.co_filename '' # number of arguments the function takes >>> hello.__code__.co_argcount 1 # local variable names >>> hello.__code__.co_varnames ('name',) # the function code's compiled bytecode >>> hello.__code__.co_code b't\\x00d\\x01|\\x00\\x9b\\x00d\\x02\\x9d\\x03\\x83\\x01\\x01\\x00d\\x00S\\x00' There are more code attributes which you can enlist by >>> dir(hello.__code__)","title":"code"},{"location":"level101/python_web/python-concepts/#decorators","text":"Related to functions, python has another feature called decorators. Let's see how that works, keeping everything is an object in mind. Here is a sample decorator: >>> def deco(func): ... def inner(): ... print(\"before\") ... func() ... print(\"after\") ... return inner ... >>> @deco ... def hello_world(): ... print(\"hello world\") ... >>> >>> hello_world() before hello world after Here @deco syntax is used to decorate the hello_world function. It is essentially same as doing >>> def hello_world(): ... print(\"hello world\") ... 
>>> hello_world = deco(hello_world) What goes inside the deco function might seem complex. Let's try to uncover it. Function hello_world is created It is passed to deco function deco create a new function This new function is calls hello_world function And does a couple other things deco returns the newly created function hello_world is replaced with above function Let's visualize it for better understanding BEFORE function_object (ID: 100) \"hello_world\" +--------------------+ + |print(\"hello_world\")| | | | +--------------> | | | | +--------------------+ WHAT DECORATOR DOES creates a new function (ID: 101) +---------------------------------+ |input arg: function with id: 100 | | | |print(\"before\") | |call function object with id 100 | |print(\"after\") | | | +---------------------------------+ ^ | AFTER | | | \"hello_world\" +-------------+ Note how the hello_world name points to a new function object but that new function object knows the reference (ID) of the original function.","title":"Decorators"},{"location":"level101/python_web/python-concepts/#some-gotchas","text":"While it is very quick to build prototypes in python and there are tons of libraries available, as the codebase complexity increases, type errors become more common and will get hard to deal with. (There are solutions to that problem like type annotations in python. Checkout mypy .) Because python is dynamically typed language, that means all types are determined at runtime. And that makes python run very slow compared to other statically typed languages. Python has something called GIL (global interpreter lock) which is a limiting factor for utilizing multiple CPU cores for parallel computation. Some weird things that python does: https://github.com/satwikkansal/wtfpython","title":"Some Gotchas"},{"location":"level101/python_web/python-web-flask/","text":"Python, Web and Flask Back in the old days, websites were simple. They were simple static html contents. 
A webserver would be listening on a defined port and according to the HTTP request received, it would read files from disk and return them in response. But since then, complexity has evolved and websites are now dynamic. Depending on the request, multiple operations need to be performed like reading from database or calling other API and finally returning some response (HTML data, JSON content etc.) Since serving web requests is no longer a simple task like reading files from disk and return contents, we need to process each http request, perform some operations programmatically and construct a response. Sockets Though we have frameworks like flask, HTTP is still a protocol that works over TCP protocol. So let us setup a TCP server and send an HTTP request and inspect the request's payload. Note that this is not a tutorial on socket programming but what we are doing here is inspecting HTTP protocol at its ground level and look at what its contents look like. (Ref: Socket Programming in Python (Guide) on RealPython ) import socket HOST = '127.0.0.1' # Standard loopback interface address (localhost) PORT = 65432 # Port to listen on (non-privileged ports are > 1023) with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: s.bind((HOST, PORT)) s.listen() conn, addr = s.accept() with conn: print('Connected by', addr) while True: data = conn.recv(1024) if not data: break print(data) Then we open localhost:65432 in our web browser and following would be the output: Connected by ('127.0.0.1', 54719) b'GET / HTTP/1.1\\r\\nHost: localhost:65432\\r\\nConnection: keep-alive\\r\\nDNT: 1\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\\r\\nSec-Fetch-Site: none\\r\\nSec-Fetch-Mode: 
navigate\\r\\nSec-Fetch-User: ?1\\r\\nSec-Fetch-Dest: document\\r\\nAccept-Encoding: gzip, deflate, br\\r\\nAccept-Language: en-US,en;q=0.9\\r\\n\\r\\n' Examine closely and the content will look like the HTTP protocol's format. ie: HTTP_METHOD URI_PATH HTTP_VERSION HEADERS_SEPARATED_BY_SEPARATOR So though it's a blob of bytes, knowing the http protocol specification , you can parse that string (ie: split by \\r\\n ) and get meaningful information out of it. Flask Flask, and other such frameworks do pretty much what we just discussed in the last section (with more added sophistication). They listen on a port on a TCP socket, receive an HTTP request, parse the data according to protocol format and make it available to you in a convenient manner. ie: you can access headers in flask by request.headers which is made available to you by splitting above payload by \\r\\n , as defined in http protocol. Another example: we register routes in flask by @app.route(\"/hello\") . What flask will do is maintain a registry internally which will map /hello to the function you decorated. Now whenever a request comes with the /hello route (second component in the first line, split by space), flask calls the registered function and returns whatever the function returned. Same with all other web frameworks in other languages too. They all work on similar principles. What they basically do is understand the HTTP protocol, parse the HTTP request data and give us programmers a nice interface to work with HTTP requests. Not so much of magic, innit?","title":"Python, Web and Flask"},{"location":"level101/python_web/python-web-flask/#python-web-and-flask","text":"Back in the old days, websites were simple. They were simple static html contents. A webserver would be listening on a defined port and according to the HTTP request received, it would read files from disk and return them in response. But since then, complexity has evolved and websites are now dynamic. 
Depending on the request, multiple operations need to be performed like reading from database or calling other API and finally returning some response (HTML data, JSON content etc.) Since serving web requests is no longer a simple task like reading files from disk and return contents, we need to process each http request, perform some operations programmatically and construct a response.","title":"Python, Web and Flask"},{"location":"level101/python_web/python-web-flask/#sockets","text":"Though we have frameworks like flask, HTTP is still a protocol that works over TCP protocol. So let us setup a TCP server and send an HTTP request and inspect the request's payload. Note that this is not a tutorial on socket programming but what we are doing here is inspecting HTTP protocol at its ground level and look at what its contents look like. (Ref: Socket Programming in Python (Guide) on RealPython ) import socket HOST = '127.0.0.1' # Standard loopback interface address (localhost) PORT = 65432 # Port to listen on (non-privileged ports are > 1023) with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: s.bind((HOST, PORT)) s.listen() conn, addr = s.accept() with conn: print('Connected by', addr) while True: data = conn.recv(1024) if not data: break print(data) Then we open localhost:65432 in our web browser and following would be the output: Connected by ('127.0.0.1', 54719) b'GET / HTTP/1.1\\r\\nHost: localhost:65432\\r\\nConnection: keep-alive\\r\\nDNT: 1\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\\r\\nSec-Fetch-Site: none\\r\\nSec-Fetch-Mode: navigate\\r\\nSec-Fetch-User: ?1\\r\\nSec-Fetch-Dest: document\\r\\nAccept-Encoding: gzip, deflate, br\\r\\nAccept-Language: 
en-US,en;q=0.9\\r\\n\\r\\n' Examine closely and the content will look like the HTTP protocol's format. ie: HTTP_METHOD URI_PATH HTTP_VERSION HEADERS_SEPARATED_BY_SEPARATOR So though it's a blob of bytes, knowing the http protocol specification , you can parse that string (ie: split by \\r\\n ) and get meaningful information out of it.","title":"Sockets"},{"location":"level101/python_web/python-web-flask/#flask","text":"Flask, and other such frameworks do pretty much what we just discussed in the last section (with more added sophistication). They listen on a port on a TCP socket, receive an HTTP request, parse the data according to protocol format and make it available to you in a convenient manner. ie: you can access headers in flask by request.headers which is made available to you by splitting above payload by \\r\\n , as defined in http protocol. Another example: we register routes in flask by @app.route(\"/hello\") . What flask will do is maintain a registry internally which will map /hello to the function you decorated. Now whenever a request comes with the /hello route (second component in the first line, split by space), flask calls the registered function and returns whatever the function returned. Same with all other web frameworks in other languages too. They all work on similar principles. What they basically do is understand the HTTP protocol, parse the HTTP request data and give us programmers a nice interface to work with HTTP requests. Not so much of magic, innit?","title":"Flask"},{"location":"level101/python_web/sre-conclusion/","text":"Conclusion Scaling The App The design and development is just a part of the journey. We will need to set up continuous integration and continuous delivery pipelines sooner or later. And we have to deploy this app somewhere. Initially we can start with deploying this app on one virtual machine on any cloud provider. 
But this is a single point of failure which is something we never allow as an SRE (or even as an engineer). So an improvement here can be having multiple instances of applications deployed behind a load balancer. This certainly guards against the problem of one machine going down. Scaling here would mean adding more instances behind the load balancer. But this is scalable only up to a certain point. After that, other bottlenecks in the system will start appearing. ie: DB will become the bottleneck, or perhaps the load balancer itself. How do you know what is the bottleneck? You need to have observability into each aspect of the application architecture. Only after you have metrics, you will be able to know what is going wrong where. What gets measured, gets fixed! Get deeper insights into scaling from School Of SRE's Scalability module and after going through it, apply your learnings and takeaways to this app. Think about how we will make this app geographically distributed and highly available and scalable. Monitoring Strategy Once we have our application deployed, it will be working ok. But not forever. Reliability is in the title of our job and we make systems reliable by making the design in a certain way. But things still will go down. Machines will fail. Disks will behave weirdly. Buggy code will get pushed to production. And all these possible scenarios will make the system less reliable. So what do we do? We monitor! We keep an eye on the system's health and if anything is not going as expected, we want to get alerted. Now let's think in terms of the given url shortening app. We need to monitor it. And we would want to get notified in case something goes wrong. But we first need to decide what that something is that we want to keep an eye on. Since it's a web app serving HTTP requests, we want to keep an eye on HTTP status codes and latencies Request volume again is a good candidate: if the app is receiving an unusual amount of traffic, something might be off. 
We also want to keep an eye on the database: depending on the database solution chosen, query times, volumes, disk usage etc. Finally, there also needs to be some external monitoring which runs periodic tests from devices outside of your data centers. This emulates customers and ensures that from the customer's point of view, the system is working as expected. Applications in SRE role In the world of SRE, python is a widely used language for small scripts and tooling developed for various purposes. Since tooling developed by SRE works with critical pieces of infrastructure and has great power (to bring things down), it is important to know what you are doing while using a programming language and its features. Also it is equally important to know the language and its characteristics while debugging the issues. As an SRE, having a deeper understanding of the python language has helped me a lot to debug very sneaky bugs and be generally more aware and informed while making certain design decisions. While developing tools may or may not be part of the SRE job, supporting tools or services is more likely to be a daily duty. Building an application or tool is just a small part of productionization. While there is certainly a lot that goes into the design of the application itself to make it more robust, as an SRE you are responsible for its reliability and stability once it is deployed and running. And to ensure that, you\u2019d need to understand the application first and then come up with a strategy to monitor it properly and be prepared for various failure scenarios. Optional Exercises Make a decorator that will cache function return values depending on input parameters. Host the URL shortening app on any cloud provider. Set up monitoring using any of the many tools available like catchpoint, datadog etc. Create a minimal flask-like framework on top of TCP sockets. 
Conclusion This module, in the first part, aims to make you more aware of the things that will happen when you choose python as your programming language and what happens when you run a python program. With the knowledge of how python handles things internally as objects, a lot of seemingly magic things in python will start to make more sense. The second part will first explain how a framework like flask works using the existing knowledge of protocols like TCP and HTTP. It then touches on the whole application development lifecycle, including the SRE parts of it. While the design and areas in architecture considered will not be exhaustive, it will give a good overview of things that are also important as an SRE and why they are important.","title":"Conclusion"},{"location":"level101/python_web/sre-conclusion/#conclusion","text":"","title":"Conclusion"},{"location":"level101/python_web/sre-conclusion/#scaling-the-app","text":"The design and development is just a part of the journey. We will need to set up continuous integration and continuous delivery pipelines sooner or later. And we have to deploy this app somewhere. Initially we can start with deploying this app on one virtual machine on any cloud provider. But this is a single point of failure which is something we never allow as an SRE (or even as an engineer). So an improvement here can be having multiple instances of applications deployed behind a load balancer. This certainly guards against the problem of one machine going down. Scaling here would mean adding more instances behind the load balancer. But this is scalable only up to a certain point. After that, other bottlenecks in the system will start appearing. ie: DB will become the bottleneck, or perhaps the load balancer itself. How do you know what is the bottleneck? You need to have observability into each aspect of the application architecture. Only after you have metrics, you will be able to know what is going wrong where. 
What gets measured, gets fixed! Get deeper insights into scaling from School Of SRE's Scalability module and after going through it, apply your learnings and takeaways to this app. Think about how we will make this app geographically distributed and highly available and scalable.","title":"Scaling The App"},{"location":"level101/python_web/sre-conclusion/#monitoring-strategy","text":"Once we have our application deployed, it will be working ok. But not forever. Reliability is in the title of our job and we make systems reliable by making the design in a certain way. But things still will go down. Machines will fail. Disks will behave weirdly. Buggy code will get pushed to production. And all these possible scenarios will make the system less reliable. So what do we do? We monitor! We keep an eye on the system's health and if anything is not going as expected, we want to get alerted. Now let's think in terms of the given url shortening app. We need to monitor it. And we would want to get notified in case something goes wrong. But we first need to decide what that something is that we want to keep an eye on. Since it's a web app serving HTTP requests, we want to keep an eye on HTTP status codes and latencies Request volume again is a good candidate: if the app is receiving an unusual amount of traffic, something might be off. We also want to keep an eye on the database: depending on the database solution chosen, query times, volumes, disk usage etc. Finally, there also needs to be some external monitoring which runs periodic tests from devices outside of your data centers. This emulates customers and ensures that from the customer's point of view, the system is working as expected.","title":"Monitoring Strategy"},{"location":"level101/python_web/sre-conclusion/#applications-in-sre-role","text":"In the world of SRE, python is a widely used language for small scripts and tooling developed for various purposes. 
Since tooling developed by SRE works with critical pieces of infrastructure and has great power (to bring things down), it is important to know what you are doing while using a programming language and its features. Also it is equally important to know the language and its characteristics while debugging the issues. As an SRE, having a deeper understanding of the python language has helped me a lot to debug very sneaky bugs and be generally more aware and informed while making certain design decisions. While developing tools may or may not be part of the SRE job, supporting tools or services is more likely to be a daily duty. Building an application or tool is just a small part of productionization. While there is certainly a lot that goes into the design of the application itself to make it more robust, as an SRE you are responsible for its reliability and stability once it is deployed and running. And to ensure that, you\u2019d need to understand the application first and then come up with a strategy to monitor it properly and be prepared for various failure scenarios.","title":"Applications in SRE role"},{"location":"level101/python_web/sre-conclusion/#optional-exercises","text":"Make a decorator that will cache function return values depending on input parameters. Host the URL shortening app on any cloud provider. Set up monitoring using any of the many tools available like catchpoint, datadog etc. Create a minimal flask-like framework on top of TCP sockets.","title":"Optional Exercises"},{"location":"level101/python_web/sre-conclusion/#conclusion_1","text":"This module, in the first part, aims to make you more aware of the things that will happen when you choose python as your programming language and what happens when you run a python program. With the knowledge of how python handles things internally as objects, a lot of seemingly magic things in python will start to make more sense. 
The second part will first explain how a framework like flask works using the existing knowledge of protocols like TCP and HTTP. It then touches the whole application development lifecycle, including the SRE parts of it. While the design and areas in architecture considered will not be exhaustive, it will give a good overview of things that are also important to an SRE and why they are important.","title":"Conclusion"},{"location":"level101/python_web/url-shorten-app/","text":"The URL Shortening App Let's build a very simple URL shortening app using flask and try to incorporate all aspects of the development process including the reliability aspects. We will not be building the UI and we will come up with a minimal set of APIs that will be enough for the app to function well. Design We don't jump directly to coding. First thing we do is gather requirements. Come up with an approach. Have the approach/design reviewed by peers. Evolve, iterate, document the decisions and tradeoffs. And then finally implement. While we will not do the full blown design document here, we will raise certain questions here that are important to the design. 1. High Level Operations and API Endpoints Since it's a URL shortening app, we will need an API for generating the shortened link given an original link. And an API/Endpoint which will accept the shortened link and redirect to the original URL. We are not including the user aspect of the app to keep things minimal. These two APIs should make the app functional and usable by anyone. 2. How to shorten? Given a url, we will need to generate a shortened version of it. One approach could be using random characters for each link. Another thing that can be done is to use some sort of hashing algorithm. The benefit here is we will reuse the same hash for the same link. ie: if a lot of people are shortening https://www.linkedin.com they all will have the same value, compared to multiple entries in the DB if random characters were chosen. 
What about hash collisions? Even in the random characters approach, though the probability is lower, collisions can happen. And we need to be mindful of them. In that case we might want to prepend/append the string with some random value to avoid conflict. Also, choice of hash algorithm matters. We will need to analyze algorithms. Their CPU requirements and their characteristics. Choose one that suits the most. 3. Is URL Valid? Given a URL to shorten, how do we verify if the URL is valid? Do we even verify or validate? One basic check that can be done is see if the URL matches a regex of a URL. To go even further we can try opening/visiting the URL. But there are certain gotchas here. We need to define success criteria. ie: HTTP 200 means it is valid. What if the URL is in a private network? What if the URL is temporarily down? 4. Storage Finally, storage. Where will we store the data that we will generate over time? There are multiple database solutions available and we will need to choose the one that suits this app the most. A relational database like MySQL would be a fair choice but be sure to check out School of SRE's SQL database section and NoSQL databases section for deeper insights into making a more informed decision. 5. Other We are not accounting for users in our app or other possible features like rate limiting, customized links etc., but these will eventually come up with time. Depending on the requirements, they too might need to get incorporated. The minimal working code is given below for reference but I'd encourage you to come up with your own. from flask import Flask, redirect, request from hashlib import md5 app = Flask(\"url_shortener\") mapping = {} @app.route(\"/shorten\", methods=[\"POST\"]) def shorten(): global mapping payload = request.json if \"url\" not in payload: return \"Missing URL Parameter\", 400 # TODO: check if URL is valid hash_ = md5() hash_.update(payload[\"url\"].encode()) digest = hash_.hexdigest()[:5] # limiting to 5 chars. 
The lower the limit, the higher the chance of collision if digest not in mapping: mapping[digest] = payload[\"url\"] return f\"Shortened: r/{digest}\\n\" else: # TODO: check for hash collision return f\"Already exists: r/{digest}\\n\" @app.route(\"/r/<hash_>\") def redirect_(hash_): if hash_ not in mapping: return \"URL Not Found\", 404 return redirect(mapping[hash_]) if __name__ == \"__main__\": app.run(debug=True) \"\"\" OUTPUT: ===> SHORTENING $ curl localhost:5000/shorten -H \"content-type: application/json\" --data '{\"url\":\"https://linkedin.com\"}' Shortened: r/a62a4 ===> REDIRECTING, notice the response code 302 and the location header $ curl localhost:5000/r/a62a4 -v * Uses proxy env variable NO_PROXY == '127.0.0.1' * Trying ::1... * TCP_NODELAY set * Connection failed * connect to ::1 port 5000 failed: Connection refused * Trying 127.0.0.1... * TCP_NODELAY set * Connected to localhost (127.0.0.1) port 5000 (#0) > GET /r/a62a4 HTTP/1.1 > Host: localhost:5000 > User-Agent: curl/7.64.1 > Accept: */* > * HTTP 1.0, assume close after body < HTTP/1.0 302 FOUND < Content-Type: text/html; charset=utf-8 < Content-Length: 247 < Location: https://linkedin.com < Server: Werkzeug/0.15.4 Python/3.7.7 < Date: Tue, 27 Oct 2020 09:37:12 GMT < Redirecting...

Redirecting...

* Closing connection 0

You should be redirected automatically to target URL: https://linkedin.com. If not click the link. \"\"\"","title":"The URL Shortening App"},{"location":"level101/python_web/url-shorten-app/#the-url-shortening-app","text":"Let's build a very simple URL shortening app using flask and try to incorporate all aspects of the development process including the reliability aspects. We will not be building the UI and we will come up with a minimal set of APIs that will be enough for the app to function well.","title":"The URL Shortening App"},{"location":"level101/python_web/url-shorten-app/#design","text":"We don't jump directly to coding. First thing we do is gather requirements. Come up with an approach. Have the approach/design reviewed by peers. Evolve, iterate, document the decisions and tradeoffs. And then finally implement. While we will not do the full blown design document here, we will raise certain questions here that are important to the design.","title":"Design"},{"location":"level101/python_web/url-shorten-app/#1-high-level-operations-and-api-endpoints","text":"Since it's a URL shortening app, we will need an API for generating the shortened link given an original link. And an API/Endpoint which will accept the shortened link and redirect to the original URL. We are not including the user aspect of the app to keep things minimal. These two APIs should make the app functional and usable by anyone.","title":"1. High Level Operations and API Endpoints"},{"location":"level101/python_web/url-shorten-app/#2-how-to-shorten","text":"Given a url, we will need to generate a shortened version of it. One approach could be using random characters for each link. Another thing that can be done is to use some sort of hashing algorithm. The benefit here is we will reuse the same hash for the same link. ie: if a lot of people are shortening https://www.linkedin.com they all will have the same value, compared to multiple entries in the DB if random characters were chosen. What about hash collisions? 
Even in the random characters approach, though the probability is lower, collisions can happen. And we need to be mindful of them. In that case we might want to prepend/append the string with some random value to avoid conflict. Also, choice of hash algorithm matters. We will need to analyze algorithms. Their CPU requirements and their characteristics. Choose one that suits the most.","title":"2. How to shorten?"},{"location":"level101/python_web/url-shorten-app/#3-is-url-valid","text":"Given a URL to shorten, how do we verify if the URL is valid? Do we even verify or validate? One basic check that can be done is see if the URL matches a regex of a URL. To go even further we can try opening/visiting the URL. But there are certain gotchas here. We need to define success criteria. ie: HTTP 200 means it is valid. What if the URL is in a private network? What if the URL is temporarily down?","title":"3. Is URL Valid?"},{"location":"level101/python_web/url-shorten-app/#4-storage","text":"Finally, storage. Where will we store the data that we will generate over time? There are multiple database solutions available and we will need to choose the one that suits this app the most. A relational database like MySQL would be a fair choice but be sure to check out School of SRE's SQL database section and NoSQL databases section for deeper insights into making a more informed decision.","title":"4. Storage"},{"location":"level101/python_web/url-shorten-app/#5-other","text":"We are not accounting for users in our app or other possible features like rate limiting, customized links etc., but these will eventually come up with time. Depending on the requirements, they too might need to get incorporated. The minimal working code is given below for reference but I'd encourage you to come up with your own. 
from flask import Flask, redirect, request from hashlib import md5 app = Flask(\"url_shortener\") mapping = {} @app.route(\"/shorten\", methods=[\"POST\"]) def shorten(): global mapping payload = request.json if \"url\" not in payload: return \"Missing URL Parameter\", 400 # TODO: check if URL is valid hash_ = md5() hash_.update(payload[\"url\"].encode()) digest = hash_.hexdigest()[:5] # limiting to 5 chars. The lower the limit, the higher the chance of collision if digest not in mapping: mapping[digest] = payload[\"url\"] return f\"Shortened: r/{digest}\\n\" else: # TODO: check for hash collision return f\"Already exists: r/{digest}\\n\" @app.route(\"/r/<hash_>\") def redirect_(hash_): if hash_ not in mapping: return \"URL Not Found\", 404 return redirect(mapping[hash_]) if __name__ == \"__main__\": app.run(debug=True) \"\"\" OUTPUT: ===> SHORTENING $ curl localhost:5000/shorten -H \"content-type: application/json\" --data '{\"url\":\"https://linkedin.com\"}' Shortened: r/a62a4 ===> REDIRECTING, notice the response code 302 and the location header $ curl localhost:5000/r/a62a4 -v * Uses proxy env variable NO_PROXY == '127.0.0.1' * Trying ::1... * TCP_NODELAY set * Connection failed * connect to ::1 port 5000 failed: Connection refused * Trying 127.0.0.1... * TCP_NODELAY set * Connected to localhost (127.0.0.1) port 5000 (#0) > GET /r/a62a4 HTTP/1.1 > Host: localhost:5000 > User-Agent: curl/7.64.1 > Accept: */* > * HTTP 1.0, assume close after body < HTTP/1.0 302 FOUND < Content-Type: text/html; charset=utf-8 < Content-Length: 247 < Location: https://linkedin.com < Server: Werkzeug/0.15.4 Python/3.7.7 < Date: Tue, 27 Oct 2020 09:37:12 GMT < Redirecting...

Redirecting...

* Closing connection 0

You should be redirected automatically to target URL: https://linkedin.com. If not click the link. \"\"\"","title":"5. Other"},{"location":"level101/security/conclusion/","text":"Conclusion Now that you have completed this course on Security you are now aware of the possible security threats to computer systems & networks. Not only that, but you are now better able to protect your systems as well as recommend security measures to others. This course provides fundamental everyday knowledge on security domain which will also help you keep security at the top of your priority. Other Resources Some books that would be a great resource Holistic Info-Sec for Web Developers https://holisticinfosecforwebdevelopers.com/ - Free and downloadable book series with very broad and deep coverage of what Web Developers and DevOps Engineers need to know in order to create robust, reliable, maintainable and secure software, networks and other, that are delivered continuously, on time, with no nasty surprises Docker Security - Quick Reference: For DevOps Engineers https://leanpub.com/dockersecurity-quickreference - A book on understanding the Docker security defaults, how to improve them (theory and practical), along with many tools and techniques. How to Hack Like a Legend https://amzn.to/2uWh1Up - A hacker\u2019s tale breaking into a secretive offshore company, Sparc Flow, 2018 How to Investigate Like a Rockstar https://books2read.com/u/4jDWoZ - Live a real crisis to master the secrets of forensic analysis, Sparc Flow, 2017 Real World Cryptography https://www.manning.com/books/real-world-cryptography - This early-access book teaches you applied cryptographic techniques to understand and apply security at every level of your systems and applications. 
AWS Security https://www.manning.com/books/aws-security?utm_source=github&utm_medium=organic&utm_campaign=book_shields_aws_1_31_20 - This early-access book covers common AWS security issues and best practices for access policies, data protection, auditing, continuous monitoring, and incident response. Post Training asks/ Further Reading CTF Events like : https://github.com/apsdehal/awesome-ctf Penetration Testing : https://github.com/enaqx/awesome-pentest Threat Intelligence : https://github.com/hslatman/awesome-threat-intelligence Threat Detection & Hunting : https://github.com/0x4D31/awesome-threat-detection Web Security: https://github.com/qazbnm456/awesome-web-security Building Secure and Reliable Systems : https://landing.google.com/sre/resources/foundationsandprinciples/srs-book/","title":"Conclusion"},{"location":"level101/security/conclusion/#conclusion","text":"Now that you have completed this course on Security you are now aware of the possible security threats to computer systems & networks. Not only that, but you are now better able to protect your systems as well as recommend security measures to others. 
This course provides fundamental everyday knowledge of the security domain, which will also help you keep security at the top of your priorities.","title":"Conclusion"},{"location":"level101/security/conclusion/#other-resources","text":"Some books that would be a great resource Holistic Info-Sec for Web Developers https://holisticinfosecforwebdevelopers.com/ - Free and downloadable book series with very broad and deep coverage of what Web Developers and DevOps Engineers need to know in order to create robust, reliable, maintainable and secure software, networks and other, that are delivered continuously, on time, with no nasty surprises Docker Security - Quick Reference: For DevOps Engineers https://leanpub.com/dockersecurity-quickreference - A book on understanding the Docker security defaults, how to improve them (theory and practical), along with many tools and techniques. How to Hack Like a Legend https://amzn.to/2uWh1Up - A hacker\u2019s tale breaking into a secretive offshore company, Sparc Flow, 2018 How to Investigate Like a Rockstar https://books2read.com/u/4jDWoZ - Live a real crisis to master the secrets of forensic analysis, Sparc Flow, 2017 Real World Cryptography https://www.manning.com/books/real-world-cryptography - This early-access book teaches you applied cryptographic techniques to understand and apply security at every level of your systems and applications. 
AWS Security https://www.manning.com/books/aws-security?utm_source=github&utm_medium=organic&utm_campaign=book_shields_aws_1_31_20 - This early-access book covers common AWS security issues and best practices for access policies, data protection, auditing, continuous monitoring, and incident response.","title":"Other Resources"},{"location":"level101/security/conclusion/#post-training-asks-further-reading","text":"CTF Events like : https://github.com/apsdehal/awesome-ctf Penetration Testing : https://github.com/enaqx/awesome-pentest Threat Intelligence : https://github.com/hslatman/awesome-threat-intelligence Threat Detection & Hunting : https://github.com/0x4D31/awesome-threat-detection Web Security: https://github.com/qazbnm456/awesome-web-security Building Secure and Reliable Systems : https://landing.google.com/sre/resources/foundationsandprinciples/srs-book/","title":"Post Training asks/ Further Reading"},{"location":"level101/security/fundamentals/","text":"Part I: Fundamentals Introduction to Security Overview for SRE If you look closely, both Site Reliability Engineering and Security Engineering are concerned with keeping a system usable. Issues like broken releases, capacity shortages, and misconfigurations can make a system unusable (at least temporarily). Security or privacy incidents that break the trust of users also undermine the usefulness of a system. Consequently, system security should be top of mind for SREs. SREs should be involved in both significant design discussions and actual system changes. They have quite a big role in system design and hence are quite often the first line of defence. SREs help in preventing bad design & implementations which can affect the overall security of the infrastructure. Successfully designing, implementing, and maintaining systems requires a commitment to the full system lifecycle . This commitment is possible only when security and reliability are central elements in the architecture of systems. 
Core Pillars of Information Security : Confidentiality \u2013 only allow access to data for which the user is permitted Integrity \u2013 ensure data is not tampered or altered by unauthorized users Availability \u2013 ensure systems and data are available to authorized users when they need it Thinking like a Security Engineer When starting a new application or re-factoring an existing application, you should consider each functional feature, and consider: Is the process surrounding this feature as safe as possible? In other words, is this a flawed process? If I were evil, how would I abuse this feature? Or more specifically failing to address how a feature can be abused can cause design flaws. Is the feature required to be on by default? If so, are there limits or options that could help reduce the risk from this feature? Security Principles By OWASP (Open Web Application Security Project) Minimize attack surface area : Every feature that is added to an application adds a certain amount of risk to the overall application. The aim of secure development is to reduce the overall risk by reducing the attack surface area. For example, a web application implements online help with a search function. The search function may be vulnerable to SQL injection attacks. If the help feature was limited to authorized users, the attack likelihood is reduced. If the help feature\u2019s search function was gated through centralized data validation routines, the ability to perform SQL injection is dramatically reduced. However, if the help feature was re-written to eliminate the search function (through a better user interface, for example), this almost eliminates the attack surface area, even if the help feature was available to the Internet at large. Establish secure defaults: There are many ways to deliver an \u201cout of the box\u201d experience for users. 
However, by default, the experience should be secure, and it should be up to the user to reduce their security \u2013 if they are allowed. For example, by default, password ageing and complexity should be enabled. Users might be allowed to turn these two features off to simplify their use of the application and increase their risk. Default Passwords of routers, IoT devices should be changed Principle of Least privilege The principle of least privilege recommends that accounts have the least amount of privilege required to perform their business processes. This encompasses user rights, resource permissions such as CPU limits, memory, network, and file system permissions. For example, if a middleware server only requires access to the network, read access to a database table, and the ability to write to a log, this describes all the permissions that should be granted. Under no circumstances should the middleware be granted administrative privileges. Principle of Defense in depth The principle of defence in depth suggests that where one control would be reasonable, more controls that approach risks in different fashions are better. Controls, when used in depth, can make severe vulnerabilities extraordinarily difficult to exploit and thus unlikely to occur. With secure coding, this may take the form of tier-based validation, centralized auditing controls, and requiring users to be logged on all pages. For example, a flawed administrative interface is unlikely to be vulnerable to an anonymous attack if it correctly gates access to production management networks, checks for administrative user authorization, and logs all access. Fail securely Applications regularly fail to process transactions for many reasons. How they fail can determine if an application is secure or not. 
``` is_admin = true; try { code_which_may_fail(); is_admin = is_user_assigned_role(\"Administrator\"); } catch (Exception err) { log.error(err.toString()); } ``` - If either code_which_may_fail() or is_user_assigned_role() fails or throws an exception, the user is an admin by default. This is obviously a security risk. A fail-secure version would initialize is_admin to false and grant the role only after the check succeeds. Don\u2019t trust services Many organizations utilize the processing capabilities of third-party partners, who more than likely have different security policies and posture than you. It is unlikely that you can influence or control any external third party, whether they are home users or major suppliers or partners. Therefore, the implicit trust of externally run systems is not warranted. All external systems should be treated similarly. For example, a loyalty program provider provides data that is used by Internet Banking, providing the number of reward points and a small list of potential redemption items. However, the data should be checked to ensure that it is safe to display to end-users and that the reward points are a positive number, and not improbably large. Separation of duties The key to fraud control is the separation of duties. For example, someone who requests a computer cannot also sign for it, nor should they directly receive the computer. This prevents the user from requesting many computers and claiming they never arrived. Certain roles have different levels of trust than normal users. In particular, administrators are different from normal users. In general, administrators should not be users of the application. For example, an administrator should be able to turn the system on or off, set password policy but shouldn\u2019t be able to log on to the storefront as a super privileged user, such as being able to \u201cbuy\u201d goods on behalf of other users. Avoid security by obscurity Security through obscurity is a weak security control, and nearly always fails when it is the only control. 
This is not to say that keeping secrets is a bad idea, it simply means that the security of systems should not be reliant upon keeping details hidden. For example, the security of an application should not rely upon knowledge of the source code being kept secret. The security should rely upon many other factors, including reasonable password policies, defence in depth, business transaction limits, solid network architecture, and fraud, and audit controls. A practical example is Linux. Linux\u2019s source code is widely available, and yet when properly secured, Linux is a secure and robust operating system. Keep security simple Attack surface area and simplicity go hand in hand. Certain software engineering practices prefer overly complex approaches to what would otherwise be a relatively straightforward and simple design. Developers should avoid the use of double negatives and complex architectures when a simpler approach would be faster and simpler. For example, although it might be fashionable to have a slew of singleton entity beans running on a separate middleware server, it is more secure and faster to simply use global variables with an appropriate mutex mechanism to protect against race conditions. Fix security issues correctly Once a security issue has been identified, it is important to develop a test for it and to understand the root cause of the issue. When design patterns are used, the security issue is likely widespread amongst all codebases, so developing the right fix without introducing regressions is essential. For example, a user has found that they can see another user\u2019s balance by adjusting their cookie. The fix seems to be relatively straightforward, but as the cookie handling code is shared among all applications, a change to just one application will trickle through to all other applications. The fix must, therefore, be tested on all affected applications. 
Reliability & Security Reliability and security are both crucial components of a truly trustworthy system, but building systems that are both reliable and secure is difficult. While the requirements for reliability and security share many common properties, they also require different design considerations. It is easy to miss the subtle interplay between reliability and security that can cause unexpected outcomes. Example: a password management application failure was triggered by a reliability problem (poor load-balancing and load-shedding strategies), and its recovery was later complicated by multiple measures designed to increase the security of the system, such as an HSM that had to be plugged into the server racks for authentication, with the HSM token itself reportedly locked inside a case, which prolonged the outage further. Authentication vs Authorization Authentication is the act of validating that users are who they claim to be. Passwords are the most common authentication factor: if a user enters the correct password, the system assumes the identity is valid and grants access. Other technologies such as One-Time Pins, authentication apps, and even biometrics can also be used to authenticate identity. In some instances, systems require the successful verification of more than one factor before granting access. This multi-factor authentication (MFA) requirement is often deployed to increase security beyond what passwords alone can provide. Authorization in system security is the process of giving the user permission to access a specific resource or function. This term is often used interchangeably with access control or client privilege. Giving someone permission to download a particular file on a server or providing individual users with administrative access to an application are good examples. 
In secure environments, authorization must always follow authentication: users should first prove that their identities are genuine before an organization\u2019s administrators grant them access to the requested resources. Common authentication flow (local authentication) The user registers using an identifier like username/email/mobile The application stores user credentials in the database The application sends a verification email/message to validate the registration Post successful registration, the user enters credentials for logging in On successful authentication, the user is allowed access to specific resources OpenID/OAuth OpenID is an authentication protocol that allows us to authenticate users without using a local auth system. In such a scenario, a user has to be registered with an OpenID Provider and the same provider should be integrated with the authentication flow of your application. To verify the details, we have to forward the authentication requests to the provider. On successful authentication, we receive a success message and/or profile details with which we can execute the necessary flow. OAuth is an authorization mechanism that allows your application to access a provider (Gmail/Facebook/Instagram/etc) on behalf of a user. On successful response, we (your application) receive a token with which the application can access certain APIs on behalf of a user. OAuth is convenient in case your business use case requires certain user-facing APIs like access to Google Drive or sending tweets on the user's behalf. Most OAuth 2.0 providers can be used for pseudo authentication. Having said that, it can get pretty complicated if you are using multiple OAuth providers to authenticate users on top of the local authentication system. 
Cryptography Cryptography is the science and study of hiding text in such a way that only the intended recipients or authorized persons can read it. Historically this has included techniques such as invisible ink and the mechanical cryptography machines of the past. Cryptography is necessary for securing critical or proprietary information and is used to encode private data messages by converting some plain text into ciphertext. At its core, there are two ways of doing this, and more advanced methods are all built upon them. Ciphers Ciphers are the cornerstone of cryptography. A cipher is a set of algorithms that performs encryption or decryption on a message. An encryption algorithm (E) takes a secret key (k) and a message (m) and produces a ciphertext (c). Similarly, a decryption algorithm (D) takes a secret key (k) and the resulting ciphertext (c) and recovers the message (m). They are represented as follows: E(k,m) = c D(k,c) = m This also means that for it to be a cipher, it must satisfy the consistency equation as follows, making it possible to decrypt. D(k,E(k,m)) = m Stream Ciphers: The message is broken into characters or bits and enciphered with a key or keystream (which should be random and generated independently of the message stream) that is as long as the plaintext bitstream. If the keystream is random, this scheme would be unbreakable unless the keystream was acquired, making it unconditionally secure. The keystream must be provided to both parties in a secure way to prevent its release. Block Ciphers: Block ciphers process messages in blocks, each of which is then encrypted or decrypted. A block cipher is a symmetric cipher in which blocks of plaintext are treated as a whole and used to produce ciphertext blocks. The block cipher takes blocks that are b bits long and encrypts them to blocks that are also b bits long. Block sizes are typically 64 or 128 bits long. 
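As a rough illustration of the consistency equation D(k,E(k,m)) = m and of the stream cipher idea above, here is a minimal Python sketch of a one-time-pad style XOR cipher. The function names keystream, E and D are chosen here for illustration only and do not come from any library. This is a teaching toy under those assumptions, not something to use in place of a vetted cryptography library.

```python
import secrets


def keystream(length):
    # Random keystream as long as the message, generated independently of it.
    return secrets.token_bytes(length)


def E(k, m):
    # Encrypt: XOR each message byte with the matching keystream byte.
    return bytes(kb ^ mb for kb, mb in zip(k, m))


def D(k, c):
    # Decrypt: XOR is its own inverse, so decryption repeats the same operation.
    return bytes(kb ^ cb for kb, cb in zip(k, c))


m = b'attack at dawn'
k = keystream(len(m))
c = E(k, m)

# Consistency equation: decrypting an encryption returns the original message.
assert D(k, E(k, m)) == m
```

Both parties need the same keystream, and it must never be reused for a second message, which is why the text stresses delivering it securely.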
Encryption Secret Key (Symmetric Key) : the same key is used for encryption and decryption Public Key (Asymmetric Key) : in an asymmetric system, the encryption and decryption keys are different but related. The encryption key is known as the public key and the decryption key is known as the private key. The public and private keys are known as a key pair. Symmetric Key Encryption DES The Data Encryption Standard (DES) has been the worldwide encryption standard for a long time. IBM developed DES in 1975, and it has held up remarkably well against years of cryptanalysis. DES is a symmetric encryption algorithm with a fixed key length of 56 bits. The algorithm is still good, but because of the short key length, it is susceptible to brute-force attacks by adversaries with sufficient resources. DES usually operates in block mode, whereby it encrypts data in 64-bit blocks. The same algorithm and key are used for both encryption and decryption. Because DES is based on simple mathematical functions, it can be easily implemented and accelerated in hardware. Triple DES With advances in computer processing power, the original 56-bit DES key became too short to withstand an attacker with even a limited budget. One way of increasing the effective key length of DES without changing the well-analyzed algorithm itself is to use the same algorithm with different keys several times in a row. The technique of applying DES three times in a row to a plain text block is called Triple DES (3DES). The 3DES technique is shown in Figure. Brute-force attacks on 3DES are considered unfeasible today. Because the basic algorithm has been tested in the field for more than 25 years, it is considered to be more trustworthy than its predecessor. AES On October 2, 2000, the U.S. National Institute of Standards and Technology (NIST) announced the selection of the Rijndael cipher as the AES algorithm. This cipher, developed by Joan Daemen and Vincent Rijmen, has a variable block length and key length. 
The algorithm currently specifies how to use keys with a length of 128, 192, or 256 bits to encrypt blocks with a length of 128, 192, or 256 bits (all nine combinations of key length and block length are possible). Both block and key lengths can be extended easily to multiples of 32 bits. AES was chosen to replace DES and 3DES because they are either too weak (DES, in terms of key length) or too slow (3DES) to run on modern, efficient hardware. AES is more efficient and much faster, usually by a factor of 5 compared to DES on the same hardware. AES is also more suitable for high throughput, especially if pure software encryption is used. However, AES is a relatively young algorithm, and as the golden rule of cryptography states, \u201cA more mature algorithm is always more trusted.\u201d Asymmetric Key Algorithm In a symmetric key system, Alice first puts the secret message in a box and then padlocks the box using a lock to which she has a key. She then sends the box to Bob through regular mail. When Bob receives the box, he uses an identical copy of Alice's key (which he has obtained previously) to open the box and read the message. In an asymmetric key system, instead of opening the box when he receives it, Bob simply adds his own personal lock to the box and returns the box through public mail to Alice. Alice uses her key to remove her lock and returns the box to Bob, with Bob's lock still in place. Finally, Bob uses his key to remove his lock and reads the message from Alice. The critical advantage in an asymmetric system is that Alice never needs to send a copy of her key to Bob. This reduces the possibility that a third party (for example, an unscrupulous postmaster) can copy the key while it is in transit to Bob, allowing that third party to spy on all future messages sent by Alice. 
In addition, if Bob is careless and allows someone else to copy his key, Alice's messages to Bob are compromised, but Alice's messages to other people remain secret. NOTE : In terms of TLS key exchange, this is the common approach. Diffie-Hellman The protocol has two system parameters, p and g. They are both public and may be used by everybody. Parameter p is a prime number, and parameter g (usually called a generator) is an integer that is smaller than p, but with the following property: For every number n between 1 and p - 1 inclusive, there is a power k of g such that n = g^k mod p. The Diffie-Hellman algorithm is an asymmetric algorithm used to establish a shared secret for a symmetric key algorithm. Nowadays, most systems use a hybrid cryptosystem, i.e., a combination of symmetric and asymmetric encryption. Asymmetric encryption is used in the key exchange mechanism to share a secret key, and after the key is shared between sender and receiver, the communication takes place using symmetric encryption. The shared secret key is used to encrypt the communication. Refer: https://medium.com/@akhigbemmanuel/what-is-the-diffie-hellman-key-exchange-algorithm-84d60025a30d RSA The RSA algorithm is very flexible and has a variable key length where, if necessary, speed can be traded for the level of security of the algorithm. RSA keys have typically been 512 to 2048 bits long, although keys shorter than 2048 bits are no longer considered secure. RSA has withstood years of extensive cryptanalysis. Although those years neither proved nor disproved RSA's security, they attest to a confidence level in the algorithm. RSA security is based on the difficulty of factoring very large numbers. If an easy method of factoring these large numbers were discovered, the effectiveness of RSA would be destroyed. 
Refer: https://medium.com/curiositypapers/a-complete-explanation-of-rsa-asymmetric-encryption-742c5971e0f NOTE : RSA keys can be used for key exchange just like Diffie-Hellman. Hashing Algorithms Hashing is one of the mechanisms used for data integrity assurance. Hashing is based on a one-way mathematical function, which is relatively easy to compute but significantly harder to reverse. A hash function is a one-way function that takes input data and produces a fixed-length digest (fingerprint) as output. The digest is cryptographically strong; that is, it is computationally infeasible to recover input data from its digest. If the input data changes just a little, the digest (fingerprint) changes substantially in what is called an avalanche effect. More: https://medium.com/@rauljordan/the-state-of-hashing-algorithms-the-why-the-how-and-the-future-b21d5c0440de https://medium.com/@StevieCEllis/the-beautiful-hash-algorithm-f18d9d2b84fb MD5 MD5 is a one-way function with which it is easy to compute the hash from the given input data, but it is infeasible to compute input data given only a hash. SHA-1 MD5 is considered less secure than SHA-1 because MD5 has some weaknesses. SHA-1 also uses a stronger, 160-bit digest, which makes MD5 the second choice as far as hash methods are concerned. The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. This algorithm is slightly slower than MD5. NOTE : SHA-1 has also been demonstrated to be broken; the minimum current recommendation is SHA-256. Digital Certificates Digital signatures provide a means to digitally authenticate devices and individual users. In public-key cryptography, such as the RSA encryption system, each user has a key pair containing both a public key and a private key. The keys act as complements, and anything encrypted with one of the keys can be decrypted with the other. In simple terms, a signature is formed when data is encrypted with a user's private key. 
The receiver verifies the signature by decrypting the message with the sender's public key. Key management is often considered the most difficult task in designing and implementing cryptographic systems. Businesses can simplify some of the deployment and management issues that are encountered with secured data communications by employing a Public Key Infrastructure (PKI). Because corporations often move security-sensitive communications across the Internet, an effective mechanism must be implemented to protect sensitive information from the threats presented on the Internet. PKI provides a hierarchical framework for managing digital security attributes. Each PKI participant holds a digital certificate that has been issued by a CA (either public or private). The certificate contains several attributes that are used when parties negotiate a secure connection. These attributes must include the certificate validity period, end-host identity information, encryption keys that will be used for secure communications, and the signature of the issuing CA. Optional attributes may be included, depending on the requirements and capability of the PKI. A CA can be a trusted third party, such as VeriSign or Entrust, or a private (in-house) CA that you establish within your organization. The fact that the message could be decrypted using the sender's public key means that the holder of the private key created the message. This process relies on the receiver having a copy of the sender's public key and knowing with a high degree of certainty that it really does belong to the sender and not to someone pretending to be the sender. To validate the CA's signature, the receiver must know the CA's public key. Normally, this is handled out-of-band or through an operation performed during the installation of the certificate. For instance, most web browsers are configured with the root certificates of several CAs by default. 
CA Enrollment process The end host generates a private-public key pair. The end host generates a certificate request, which it forwards to the CA. Manual human intervention is required to approve the enrollment request, which is received by the CA. After the CA operator approves the request, the CA signs the certificate request with its private key and returns the completed certificate to the end host. The end host writes the certificate into a nonvolatile storage area (PC hard disk or NVRAM on Cisco routers). Refer : https://www.ssh.com/manuals/server-zos-product/55/ch06s03s01.html Login Security SSH SSH, the Secure Shell, is a popular, powerful, software-based approach to network security. Whenever data is sent by a computer to the network, SSH automatically encrypts (scrambles) it. Then, when the data reaches its intended recipient, SSH automatically decrypts (unscrambles) it. The result is transparent encryption: users can work normally, unaware that their communications are safely encrypted on the network. In addition, SSH can use modern, secure encryption algorithms based on how it's being configured and is effective enough to be found within mission-critical applications at major corporations. SSH has a client/server architecture An SSH server program, typically installed and run by a system administrator, accepts or rejects incoming connections to its host computer. Users then run SSH client programs, typically on other computers, to make requests of the SSH server, such as \u201cPlease log me in,\u201d \u201cPlease send me a file,\u201d or \u201cPlease execute this command.\u201d All communications between clients and servers are securely encrypted and protected from modification. What SSH is not: Although SSH stands for Secure Shell, it is not a true shell in the sense of the Unix Bourne shell and C shell. It is not a command interpreter, nor does it provide wildcard expansion, command history, and so forth. 
Rather, SSH creates a channel for running a shell on a remote computer, with end-to-end encryption between the two systems. The major features and guarantees of the SSH protocol are: Privacy of your data, via strong encryption; Integrity of communications, guaranteeing they haven\u2019t been altered; Authentication, i.e., proof of identity of senders and receivers; Authorization, i.e., access control to accounts; Forwarding or tunnelling to encrypt other TCP/IP-based sessions. Kerberos According to Greek mythology, Kerberos (Cerberus) was the gigantic, three-headed dog that guards the gates of the underworld to prevent the dead from leaving. In computer science, Kerberos is a network authentication protocol and is currently the default authentication technology used by Microsoft Active Directory to authenticate users to services within a local area network. Kerberos uses symmetric-key cryptography and requires a trusted third-party authentication service to verify user identities. The name was chosen for the protocol because the three heads of Kerberos represent: a client (a user or a service); a server (the Kerberos-protected host on which the service resides); and a Key Distribution Center (KDC), which acts as the trusted third-party authentication service. The KDC includes the following two servers: the Authentication Server (AS), which performs the initial authentication and issues ticket-granting tickets (TGT) for users, and the Ticket-Granting Server (TGS), which issues service tickets that are based on the initial ticket-granting tickets (TGT). Certificate Chain The first part of the output of the OpenSSL command shows the certificates in the chain, numbered 0 and 1 (older output showed a third certificate, numbered 2). Each certificate has a subject, s, and an issuer, i. The first certificate, number 0, is called the end-entity certificate. The subject line tells us which hostname it is valid for: here the subject CN is set to www.google.com. 
$ openssl s_client -connect www.google.com:443 -CApath /etc/ssl/certs CONNECTED(00000005) depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign verify return:1 depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1 verify return:1 depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = www.google.com verify return:1 --- Certificate chain 0 s:/C=US/ST=California/L=Mountain View/O=Google LLC/CN=www.google.com i:/C=US/O=Google Trust Services/CN=GTS CA 1O1 1 s:/C=US/O=Google Trust Services/CN=GTS CA 1O1 i:/OU=GlobalSign Root CA - R2/O=GlobalSign/CN=GlobalSign --- Server certificate The issuer line indicates it\u2019s issued by GTS CA 1O1, which also happens to be the subject of the second certificate, number 1. What the OpenSSL command line doesn\u2019t show here is the trust store that contains the list of CA certificates trusted by the system OpenSSL runs on. The public certificate of the GlobalSign Root CA must be present in the system\u2019s trust store to close the verification chain. This is called a chain of trust, and the figure below summarizes its behaviour at a high level. High-level view of the concept of chain of trust applied to verifying the authenticity of a website. The Root CA in the Firefox trust store provides the initial trust to verify the entire chain and trust the end-entity certificate. TLS Handshake The client sends a HELLO message to the server with a list of protocols and algorithms it supports. The server says HELLO back and sends its chain of certificates. Based on the capabilities of the client, the server picks a cipher suite. If the cipher suite supports ephemeral key exchange, like ECDHE does (ECDHE stands for Elliptic Curve Diffie-Hellman Ephemeral), the server and the client negotiate a pre-master key with the Diffie-Hellman algorithm. The pre-master key is never sent over the wire. 
The client and server create a session key that will be used to encrypt the data transiting through the connection. At the end of the handshake, both parties possess a secret session key used to encrypt data for the rest of the connection. This is what OpenSSL refers to as Master-Key. NOTE : This section covers three versions of TLS: 1.0, 1.1, and 1.2 (TLS 1.3 has since been released). TLS 1.0 was released in 1999, making it a protocol more than two decades old. It has been known to be vulnerable to attacks, such as BEAST and POODLE, for years, in addition to supporting weak cryptography, which doesn\u2019t keep modern-day connections sufficiently secure. TLS 1.1 is the forgotten \u201cmiddle child.\u201d It also has bad cryptography like its older sibling. In most software, it was leapfrogged by TLS 1.2 and it\u2019s rare to see TLS 1.1 used. \u201cPerfect\u201d Forward Secrecy An \u201cephemeral\u201d key exchange provides an important security feature, misnamed perfect forward secrecy (PFS), or just \u201cForward Secrecy\u201d. In a non-ephemeral key exchange, the client sends the pre-master key to the server by encrypting it with the server\u2019s public key. The server then decrypts the pre-master key with its private key. If at a later point in time, the private key of the server is compromised, an attacker can go back to this handshake, decrypt the pre-master key, obtain the session key, and decrypt the entire traffic. Non-ephemeral key exchanges are vulnerable to attacks that may happen in the future on recorded traffic. And because people seldom change their password, decrypting data from the past may still be valuable for an attacker. An ephemeral key exchange like DHE, or its variant on elliptic curve, ECDHE, solves this problem by not transmitting the pre-master key over the wire. Instead, the pre-master key is computed by both the client and the server in isolation, using nonsensitive information exchanged publicly. 
Because the pre-master key can\u2019t be decrypted later by an attacker, the session key is safe from future attacks: hence, the term perfect forward secrecy. Keys are changed every X blocks along the stream. That prevents an attacker from simply sniffing the stream and applying brute force to crack the whole thing. \"Forward secrecy\" means that just because I can decrypt block M, does not mean that I can decrypt block Q Downside: The downside to PFS is that all those extra computational steps induce latency on the handshake and slow the user down. To avoid repeating this expensive work at every connection, both sides cache the session key for future use via a technique called session resumption. This is what the session-ID and TLS ticket are for: they allow a client and server that share a session ID to skip over the negotiation of a session key, because they already agreed on one previously, and go directly to exchanging data securely.","title":"Fundamentals of Security"},{"location":"level101/security/fundamentals/#part-i-fundamentals","text":"","title":"Part I: Fundamentals"},{"location":"level101/security/fundamentals/#introduction-to-security-overview-for-sre","text":"If you look closely, both Site Reliability Engineering and Security Engineering are concerned with keeping a system usable. Issues like broken releases, capacity shortages, and misconfigurations can make a system unusable (at least temporarily). Security or privacy incidents that break the trust of users also undermine the usefulness of a system. Consequently, system security should be top of mind for SREs. SREs should be involved in both significant design discussions and actual system changes. They have quite a big role in System design & hence are quite sometimes the first line of defence. SRE\u2019s help in preventing bad design & implementations which can affect the overall security of the infrastructure. 
Successfully designing, implementing, and maintaining systems requires a commitment to the full system lifecycle . This commitment is possible only when security and reliability are central elements in the architecture of systems. Core Pillars of Information Security : Confidentiality \u2013 only allow access to data for which the user is permitted Integrity \u2013 ensure data is not tampered or altered by unauthorized users Availability \u2013 ensure systems and data are available to authorized users when they need it Thinking like a Security Engineer When starting a new application or re-factoring an existing application, you should consider each functional feature, and consider: Is the process surrounding this feature as safe as possible? In other words, is this a flawed process? If I were evil, how would I abuse this feature? Or more specifically failing to address how a feature can be abused can cause design flaws. Is the feature required to be on by default? If so, are there limits or options that could help reduce the risk from this feature? Security Principles By OWASP (Open Web Application Security Project) Minimize attack surface area : Every feature that is added to an application adds a certain amount of risk to the overall application. The aim of secure development is to reduce the overall risk by reducing the attack surface area. For example, a web application implements online help with a search function. The search function may be vulnerable to SQL injection attacks. If the help feature was limited to authorized users, the attack likelihood is reduced. If the help feature\u2019s search function was gated through centralized data validation routines, the ability to perform SQL injection is dramatically reduced. However, if the help feature was re-written to eliminate the search function (through a better user interface, for example), this almost eliminates the attack surface area, even if the help feature was available to the Internet at large. 
Establish secure defaults: There are many ways to deliver an \u201cout of the box\u201d experience for users. However, by default, the experience should be secure, and it should be up to the user to reduce their security \u2013 if they are allowed. For example, by default, password ageing and complexity should be enabled. Users might be allowed to turn these two features off to simplify their use of the application and increase their risk. Default Passwords of routers, IoT devices should be changed Principle of Least privilege The principle of least privilege recommends that accounts have the least amount of privilege required to perform their business processes. This encompasses user rights, resource permissions such as CPU limits, memory, network, and file system permissions. For example, if a middleware server only requires access to the network, read access to a database table, and the ability to write to a log, this describes all the permissions that should be granted. Under no circumstances should the middleware be granted administrative privileges. Principle of Defense in depth The principle of defence in depth suggests that where one control would be reasonable, more controls that approach risks in different fashions are better. Controls, when used in depth, can make severe vulnerabilities extraordinarily difficult to exploit and thus unlikely to occur. With secure coding, this may take the form of tier-based validation, centralized auditing controls, and requiring users to be logged on all pages. For example, a flawed administrative interface is unlikely to be vulnerable to an anonymous attack if it correctly gates access to production management networks, checks for administrative user authorization, and logs all access. Fail securely Applications regularly fail to process transactions for many reasons. How they fail can determine if an application is secure or not. 
``` is_admin = true; try { code_which_may_fail(); is_admin = is_user_assigned_role(\"Administrator\"); } catch (Exception err) { log.error(err.toString()); } ``` - If either code_which_may_fail() or is_user_assigned_role() fails or throws an exception, the user is an admin by default, because is_admin starts out as true and is never reset. This is obviously a security risk. Don\u2019t trust services Many organizations utilize the processing capabilities of third-party partners, who more than likely have different security policies and posture from yours. It is unlikely that you can influence or control any external third party, whether they are home users or major suppliers or partners. Therefore, the implicit trust of externally run systems is not warranted. All external systems should be treated similarly. For example, a loyalty program provider provides data that is used by Internet Banking, providing the number of reward points and a small list of potential redemption items. However, the data should be checked to ensure that it is safe to display to end-users and that the reward points are a positive number, and not improbably large. Separation of duties The key to fraud control is the separation of duties. For example, someone who requests a computer cannot also sign for it, nor should they directly receive the computer. This prevents the user from requesting many computers and claiming they never arrived. Certain roles have different levels of trust than normal users. In particular, administrators are different from normal users. In general, administrators should not be users of the application. For example, an administrator should be able to turn the system on or off and set password policy, but shouldn\u2019t be able to log on to the storefront as a super-privileged user, such as being able to \u201cbuy\u201d goods on behalf of other users. Avoid security by obscurity Security through obscurity is a weak security control, and nearly always fails when it is the only control. 
This is not to say that keeping secrets is a bad idea, it simply means that the security of systems should not be reliant upon keeping details hidden. For example, the security of an application should not rely upon knowledge of the source code being kept secret. The security should rely upon many other factors, including reasonable password policies, defence in depth, business transaction limits, solid network architecture, and fraud, and audit controls. A practical example is Linux. Linux\u2019s source code is widely available, and yet when properly secured, Linux is a secure and robust operating system. Keep security simple Attack surface area and simplicity go hand in hand. Certain software engineering practices prefer overly complex approaches to what would otherwise be a relatively straightforward and simple design. Developers should avoid the use of double negatives and complex architectures when a simpler approach would be faster and simpler. For example, although it might be fashionable to have a slew of singleton entity beans running on a separate middleware server, it is more secure and faster to simply use global variables with an appropriate mutex mechanism to protect against race conditions. Fix security issues correctly Once a security issue has been identified, it is important to develop a test for it and to understand the root cause of the issue. When design patterns are used, the security issue is likely widespread amongst all codebases, so developing the right fix without introducing regressions is essential. For example, a user has found that they can see another user\u2019s balance by adjusting their cookie. The fix seems to be relatively straightforward, but as the cookie handling code is shared among all applications, a change to just one application will trickle through to all other applications. The fix must, therefore, be tested on all affected applications. 
Reliability & Security Reliability and security are both crucial components of a truly trustworthy system, but building systems that are both reliable and secure is difficult. While the requirements for reliability and security share many common properties, they also require different design considerations. It is easy to miss the subtle interplay between reliability and security that can cause unexpected outcomes Ex: A password management application failure was triggered by a reliability problem i.e poor load-balancing and load-shedding strategies and its recovery were later complicated by multiple measures (HSM mechanism which needs to be plugged into server racks, which works as an authentication & the HSM token supposedly locked inside a case.. & the problem can be further elongated ) designed to increase the security of the system.","title":"Introduction to Security Overview for SRE"},{"location":"level101/security/fundamentals/#authentication-vs-authorization","text":"Authentication is the act of validating that users are who they claim to be. Passwords are the most common authentication factor\u2014if a user enters the correct password, the system assumes the identity is valid and grants access. Other technologies such as One-Time Pins, authentication apps, and even biometrics can also be used to authenticate identity. In some instances, systems require the successful verification of more than one factor before granting access. This multi-factor authentication (MFA) requirement is often deployed to increase security beyond what passwords alone can provide. Authorization in system security is the process of giving the user permission to access a specific resource or function. This term is often used interchangeably with access control or client privilege. Giving someone permission to download a particular file on a server or providing individual users with administrative access to an application are good examples. 
In secure environments, authorization must always follow authentication, users should first prove that their identities are genuine before an organization\u2019s administrators grant them access to the requested resources.","title":"Authentication vs Authorization"},{"location":"level101/security/fundamentals/#common-authentication-flow-local-authentication","text":"The user registers using an identifier like username/email/mobile The application stores user credentials in the database The application sends a verification email/message to validate the registration Post successful registration, the user enters credentials for logging in On successful authentication, the user is allowed access to specific resources","title":"Common authentication flow (local authentication)"},{"location":"level101/security/fundamentals/#openidoauth","text":"OpenID is an authentication protocol that allows us to authenticate users without using a local auth system. In such a scenario, a user has to be registered with an OpenID Provider and the same provider should be integrated with the authentication flow of your application. To verify the details, we have to forward the authentication requests to the provider. On successful authentication, we receive a success message and/or profile details with which we can execute the necessary flow. OAuth is an authorization mechanism that allows your application user access to a provider(Gmail/Facebook/Instagram/etc). On successful response, we (your application) receive a token with which the application can access certain APIs on behalf of a user. OAuth is convenient in case your business use case requires some certain user-facing APIs like access to Google Drive or sending tweets on your behalf. Most OAuth 2.0 providers can be used for pseudo authentication. 
Having said that, it can get pretty complicated if you are using multiple OAuth providers to authenticate users on top of the local authentication system.","title":"OpenID/OAuth"},{"location":"level101/security/fundamentals/#cryptography","text":"It is the science and study of hiding text in such a way that only the intended recipients or authorized persons can read it; historically, this has included techniques such as invisible ink and mechanical cryptography machines. Cryptography is necessary for securing critical or proprietary information and is used to encode private data messages by converting plain text into ciphertext. At its core, there are two ways of doing this, upon which all more advanced methods are built.","title":"Cryptography"},{"location":"level101/security/fundamentals/#ciphers","text":"Ciphers are the cornerstone of cryptography. A cipher is a set of algorithms that performs encryption or decryption on a message. An encryption algorithm (E) takes a secret key (k) and a message (m) and produces a ciphertext (c). Similarly, a decryption algorithm (D) takes the secret key (k) and the resulting ciphertext (c) and recovers the message (m). They are represented as follows: E(k,m) = c D(k,c) = m This also means that for it to be a cipher, it must satisfy the consistency equation as follows, making it possible to decrypt. D(k,E(k,m)) = m Stream Ciphers: The message is broken into characters or bits and enciphered with a key or keystream (which should be random and generated independently of the message stream) that is as long as the plaintext bitstream. If the keystream is truly random and never reused, this scheme is unbreakable unless the keystream is acquired, making it unconditionally secure. The keystream must be provided to both parties in a secure way to prevent its release. Block Ciphers: Block ciphers process messages in blocks, each of which is then encrypted or decrypted. 
A block cipher is a symmetric cipher in which blocks of plaintext are treated as a whole and used to produce ciphertext blocks. The block cipher takes blocks that are b bits long and encrypts them to blocks that are also b bits long. Block sizes are typically 64 or 128 bits long. Encryption Secret Key (Symmetric Key): the same key is used for encryption and decryption. Public Key (Asymmetric Key): in an asymmetric system, the encryption and decryption keys are different but related. The encryption key is known as the public key and the decryption key is known as the private key. The public and private keys are known as a key pair. Symmetric Key Encryption DES The Data Encryption Standard (DES) has been the worldwide encryption standard for a long time. IBM developed DES in 1975, and it has held up remarkably well against years of cryptanalysis. DES is a symmetric encryption algorithm with a fixed key length of 56 bits. The design of the algorithm is sound, but because of the short key length, it is susceptible to brute-force attacks by anyone with sufficient resources. DES usually operates in block mode, whereby it encrypts data in 64-bit blocks. The same algorithm and key are used for both encryption and decryption. Because DES is based on simple mathematical functions, it can be easily implemented and accelerated in hardware. Triple DES With advances in computer processing power, the original 56-bit DES key became too short to withstand an attacker with even a limited budget. One way of increasing the effective key length of DES without changing the well-analyzed algorithm itself is to use the same algorithm with different keys several times in a row. The technique of applying DES three times in a row to a plain text block is called Triple DES (3DES). The 3DES technique is shown in Figure. Brute-force attacks on 3DES are considered infeasible today. 
Because the basic algorithm has been tested in the field for more than 25 years, it is considered to be more trustworthy than its predecessor. AES On October 2, 2000, The U.S. National Institute of Standards and Technology (NIST) announced the selection of the Rijndael cipher as the AES algorithm. This cipher, developed by Joan Daemen and Vincent Rijmen, has a variable block length and key length. The algorithm currently specifies how to use keys with a length of 128, 192, or 256 bits to encrypt blocks with a length of 128, 192, or 256 bits (all nine combinations of key length and block length are possible). Both block and key lengths can be extended easily to multiples of 32 bits. AES was chosen to replace DES and 3DES because they are either too weak (DES, in terms of key length) or too slow (3DES) to run on modern, efficient hardware. AES is more efficient and much faster, usually by a factor of 5 compared to DES on the same hardware. AES is also more suitable for high throughput, especially if pure software encryption is used. However, AES is a relatively young algorithm, and as the golden rule of cryptography states, \u201cA more mature algorithm is always more trusted.\u201d Asymmetric Key Algorithm In a symmetric key system, Alice first puts the secret message in a box and then padlocks the box using a lock to which she has a key. She then sends the box to Bob through regular mail. When Bob receives the box, he uses an identical copy of Alice's key (which he has obtained previously) to open the box and read the message. In an asymmetric key system, instead of opening the box when he receives it, Bob simply adds his own personal lock to the box and returns the box through public mail to Alice. Alice uses her key to remove her lock and returns the box to Bob, with Bob's lock still in place. Finally, Bob uses his key to remove his lock and reads the message from Alice. 
The critical advantage in an asymmetric system is that Alice never needs to send a copy of her key to Bob. This reduces the possibility that a third party (for example, an unscrupulous postmaster) can copy the key while it is in transit to Bob, allowing that third party to spy on all future messages sent by Alice. In addition, if Bob is careless and allows someone else to copy his key, Alice's messages to Bob are compromised, but Alice's messages to other people remain secret. NOTE: In terms of TLS key exchange, this is the common approach. Diffie-Hellman The protocol has two system parameters, p and g. They are both public and may be used by everybody. Parameter p is a prime number, and parameter g (usually called a generator) is an integer that is smaller than p, but with the following property: For every number n between 1 and p \u2013 1 inclusive, there is a power k of g such that n = g^k mod p. The Diffie-Hellman algorithm is an asymmetric algorithm used to establish a shared secret for a symmetric key algorithm. Nowadays most systems use a hybrid cryptosystem, i.e., a combination of symmetric and asymmetric encryption. Asymmetric encryption is used in the key exchange mechanism to share a secret key; once the key is shared between sender and receiver, the communication takes place using symmetric encryption. The shared secret key is used to encrypt the communication. Refer: https://medium.com/@akhigbemmanuel/what-is-the-diffie-hellman-key-exchange-algorithm-84d60025a30d RSA The RSA algorithm is very flexible and has a variable key length where, if necessary, speed can be traded for the level of security of the algorithm. RSA keys have historically been 512 to 2048 bits long; keys shorter than 2048 bits are no longer considered secure. RSA has withstood years of extensive cryptanalysis. Although those years neither proved nor disproved RSA's security, they attest to a confidence level in the algorithm. RSA security is based on the difficulty of factoring very large numbers. 
If an easy method of factoring these large numbers were discovered, the effectiveness of RSA would be destroyed. Refer: https://medium.com/curiositypapers/a-complete-explanation-of-rsa-asymmetric-encryption-742c5971e0f NOTE: RSA keys can be used for key exchange just like Diffie-Hellman. Hashing Algorithms Hashing is one of the mechanisms used for data integrity assurance. Hashing is based on a one-way mathematical function, which is relatively easy to compute but significantly harder to reverse. A hash function is a one-way function that takes input data and produces a fixed-length digest (fingerprint) as output. The digest is cryptographically strong; that is, it is computationally infeasible to recover input data from its digest. If the input data changes just a little, the digest (fingerprint) changes substantially in what is called an avalanche effect. More: https://medium.com/@rauljordan/the-state-of-hashing-algorithms-the-why-the-how-and-the-future-b21d5c0440de https://medium.com/@StevieCEllis/the-beautiful-hash-algorithm-f18d9d2b84fb MD5 MD5 is a one-way function with which it is easy to compute the hash from the given input data, but it is unfeasible to compute input data given only a hash. SHA-1 MD5 is considered less secure than SHA-1 because MD5 has some weaknesses. SHA-1 also uses a stronger, 160-bit digest, which makes MD5 the second choice as far as hash methods are concerned. The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. This algorithm is slightly slower than MD5. NOTE: SHA-1 has also been demonstrated to be broken; the minimum current recommendation is SHA-256. Digital Certificates Digital signatures provide a means to digitally authenticate devices and individual users. In public-key cryptography, such as the RSA encryption system, each user has a key-pair containing both a public key and a private key. The keys act as complements, and anything encrypted with one of the keys can be decrypted with the other. 
In simple terms, a signature is formed when data is encrypted with a user's private key. The receiver verifies the signature by decrypting the message with the sender's public key. Key management is often considered the most difficult task in designing and implementing cryptographic systems. Businesses can simplify some of the deployment and management issues that are encountered with secured data communications by employing a Public Key Infrastructure (PKI). Because corporations often move security-sensitive communications across the Internet, an effective mechanism must be implemented to protect sensitive information from the threats presented on the Internet. PKI provides a hierarchical framework for managing digital security attributes. Each PKI participant holds a digital certificate that has been issued by a CA (either public or private). The certificate contains several attributes that are used when parties negotiate a secure connection. These attributes must include the certificate validity period, end-host identity information, encryption keys that will be used for secure communications, and the signature of the issuing CA. Optional attributes may be included, depending on the requirements and capability of the PKI. A CA can be a trusted third party, such as VeriSign or Entrust, or a private (in-house) CA that you establish within your organization. The fact that the message could be decrypted using the sender's public key means that the holder of the private key created the message. This process relies on the receiver having a copy of the sender's public key and knowing with a high degree of certainty that it really does belong to the sender and not to someone pretending to be the sender. To validate the CA's signature, the receiver must know the CA's public key. Normally, this is handled out-of-band or through an operation performed during the installation of the certificate. 
For instance, most web browsers are configured with the root certificates of several CAs by default. CA Enrollment process The end host generates a private-public key pair. The end host generates a certificate request, which it forwards to the CA. Manual human intervention is required to approve the enrollment request, which is received by the CA. After the CA operator approves the request, the CA signs the certificate request with its private key and returns the completed certificate to the end host. The end host writes the certificate into a nonvolatile storage area (PC hard disk or NVRAM on Cisco routers). Refer : https://www.ssh.com/manuals/server-zos-product/55/ch06s03s01.html","title":"Ciphers"},{"location":"level101/security/fundamentals/#login-security","text":"","title":"Login Security"},{"location":"level101/security/fundamentals/#ssh","text":"SSH, the Secure Shell, is a popular, powerful, software-based approach to network security. Whenever data is sent by a computer to the network, SSH automatically encrypts (scrambles) it. Then, when the data reaches its intended recipient, SSH automatically decrypts (unscrambles) it. The result is transparent encryption: users can work normally, unaware that their communications are safely encrypted on the network. In addition, SSH can use modern, secure encryption algorithms based on how it's being configured and is effective enough to be found within mission-critical applications at major corporations. SSH has a client/server architecture An SSH server program, typically installed and run by a system administrator, accepts or rejects incoming connections to its host computer. Users then run SSH client programs, typically on other computers, to make requests of the SSH server, such as \u201cPlease log me in,\u201d \u201cPlease send me a file,\u201d or \u201cPlease execute this command.\u201d All communications between clients and servers are securely encrypted and protected from modification. 
What SSH is not: Although SSH stands for Secure Shell, it is not a true shell in the sense of the Unix Bourne shell and C shell. It is not a command interpreter, nor does it provide wildcard expansion, command history, and so forth. Rather, SSH creates a channel for running a shell on a remote computer, with end-to-end encryption between the two systems. The major features and guarantees of the SSH protocol are: Privacy of your data, via strong encryption; Integrity of communications, guaranteeing they haven\u2019t been altered; Authentication, i.e., proof of identity of senders and receivers; Authorization, i.e., access control to accounts; Forwarding or tunnelling to encrypt other TCP/IP-based sessions.","title":"SSH"},{"location":"level101/security/fundamentals/#kerberos","text":"According to Greek mythology, Kerberos (Cerberus) was the gigantic, three-headed dog that guarded the gates of the underworld to prevent the dead from leaving. In computer science, Kerberos is a network authentication protocol and is currently the default authentication technology used by Microsoft Active Directory to authenticate users to services within a local area network. Kerberos uses symmetric-key cryptography and requires a trusted third-party authentication service to verify user identities. The name Kerberos was chosen for the protocol because the three heads represent: a client (a user or a service); a server (where the Kerberos-protected hosts reside); and a Key Distribution Center (KDC), which acts as the trusted third-party authentication service. The KDC includes the following two servers: Authentication Server (AS) that performs the initial authentication and issues ticket-granting tickets (TGT) for users. 
Ticket-Granting Server (TGS) that issues service tickets that are based on the initial ticket-granting tickets (TGT).","title":"Kerberos"},{"location":"level101/security/fundamentals/#certificate-chain","text":"The first part of the output of the OpenSSL command shows the certificates in the chain, numbered from 0 (the output below lists two, numbered 0 and 1). Each certificate has a subject, s, and an issuer, i. The first certificate, number 0, is called the end-entity certificate. In the output below, the subject line tells us it\u2019s valid for www.google.com because its CN is set to www.google.com. $ openssl s_client -connect www.google.com:443 -CApath /etc/ssl/certs CONNECTED(00000005) depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign verify return:1 depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1 verify return:1 depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = www.google.com verify return:1 --- Certificate chain 0 s:/C=US/ST=California/L=Mountain View/O=Google LLC/CN=www.google.com i:/C=US/O=Google Trust Services/CN=GTS CA 1O1 1 s:/C=US/O=Google Trust Services/CN=GTS CA 1O1 i:/OU=GlobalSign Root CA - R2/O=GlobalSign/CN=GlobalSign --- Server certificate The issuer line indicates it\u2019s issued by GTS CA 1O1 (Google Trust Services), which also happens to be the subject of the second certificate, number 1. What the OpenSSL command line doesn\u2019t show here is the trust store that contains the list of CA certificates trusted by the system OpenSSL runs on. The public certificate of the GlobalSign root CA must be present in the system\u2019s trust store to close the verification chain. This is called a chain of trust, and the figure below summarizes its behaviour at a high level. High-level view of the concept of chain of trust applied to verifying the authenticity of a website. 
The Root CA in the Firefox trust store provides the initial trust to verify the entire chain and trust the end-entity certificate.","title":"Certificate Chain"},{"location":"level101/security/fundamentals/#tls-handshake","text":"The client sends a HELLO message to the server with a list of protocols and algorithms it supports. The server says HELLO back and sends its chain of certificates. Based on the capabilities of the client, the server picks a cipher suite. If the cipher suite supports ephemeral key exchange, like ECDHE does (ECDHE stands for Elliptic Curve Diffie-Hellman Ephemeral), the server and the client negotiate a pre-master key with the Diffie-Hellman algorithm. The pre-master key is never sent over the wire. The client and server create a session key that will be used to encrypt the data transiting through the connection. At the end of the handshake, both parties possess a secret session key used to encrypt data for the rest of the connection. This is what OpenSSL refers to as Master-Key. NOTE: Three versions of TLS are discussed here: TLS 1.0, 1.1 and 1.2 (TLS 1.3 has since been published as well). TLS 1.0 was released in 1999, making it a nearly two-decade-old protocol. It has been known to be vulnerable to attacks, such as BEAST and POODLE, for years, in addition to supporting weak cryptography, which doesn\u2019t keep modern-day connections sufficiently secure. TLS 1.1 is the forgotten \u201cmiddle child.\u201d It also has bad cryptography like its older sibling. In most software, it was leapfrogged by TLS 1.2 and it\u2019s rare to see TLS 1.1 used.","title":"TLS Handshake"},{"location":"level101/security/fundamentals/#perfect-forward-secrecy","text":"The term \u201cephemeral\u201d in the key exchange provides an important security feature mis-named perfect forward secrecy (PFS) or just \u201cForward Secrecy\u201d. In a non-ephemeral key exchange, the client sends the pre-master key to the server by encrypting it with the server\u2019s public key. 
The server then decrypts the pre-master key with its private key. If at a later point in time, the private key of the server is compromised, an attacker can go back to this handshake, decrypt the pre-master key, obtain the session key, and decrypt the entire traffic. Non-ephemeral key exchanges are vulnerable to attacks that may happen in the future on recorded traffic. And because people seldom change their password, decrypting data from the past may still be valuable for an attacker. An ephemeral key exchange like DHE, or its variant on elliptic curve, ECDHE, solves this problem by not transmitting the pre-master key over the wire. Instead, the pre-master key is computed by both the client and the server in isolation, using nonsensitive information exchanged publicly. Because the pre-master key can\u2019t be decrypted later by an attacker, the session key is safe from future attacks: hence, the term perfect forward secrecy. Keys are changed every X blocks along the stream. That prevents an attacker from simply sniffing the stream and applying brute force to crack the whole thing. \"Forward secrecy\" means that just because I can decrypt block M, does not mean that I can decrypt block Q Downside: The downside to PFS is that all those extra computational steps induce latency on the handshake and slow the user down. To avoid repeating this expensive work at every connection, both sides cache the session key for future use via a technique called session resumption. 
This is what the session-ID and TLS ticket are for: they allow a client and server that share a session ID to skip over the negotiation of a session key, because they already agreed on one previously, and go directly to exchanging data securely.","title":"\u201cPerfect\u201d Forward Secrecy"},{"location":"level101/security/intro/","text":"Security Prerequisites Linux Basics Linux Networking What to expect from this course The course covers fundamentals of information security along with touching on subjects of system security, network & web security. This course aims to get you familiar with the basics of information security in day to day operations & then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured. What is not covered under this course The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems. The course does not deal with hacking or breaking into systems but rather an approach on how to ensure you don\u2019t get into those situations and also to make you aware of different ways a system can be compromised. Course Contents Fundamentals Network Security Threats, Attacks & Defence Writing Secure Code & More Conclusion","title":"Introduction"},{"location":"level101/security/intro/#security","text":"","title":"Security"},{"location":"level101/security/intro/#prerequisites","text":"Linux Basics Linux Networking","title":"Prerequisites"},{"location":"level101/security/intro/#what-to-expect-from-this-course","text":"The course covers fundamentals of information security along with touching on subjects of system security, network & web security. 
This course aims to get you familiar with the basics of information security in day to day operations & then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured.","title":"What to expect from this course"},{"location":"level101/security/intro/#what-is-not-covered-under-this-course","text":"The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems. The course does not deal with hacking or breaking into systems but rather an approach on how to ensure you don\u2019t get into those situations and also to make you aware of different ways a system can be compromised.","title":"What is not covered under this course"},{"location":"level101/security/intro/#course-contents","text":"Fundamentals Network Security Threats, Attacks & Defence Writing Secure Code & More Conclusion","title":"Course Contents"},{"location":"level101/security/network_security/","text":"Part II: Network Security Introduction TCP/IP is the dominant networking technology today. It is a five-layer architecture. These layers are, from top to bottom, the application layer, the transport layer (TCP), the network layer (IP), the data-link layer, and the physical layer. In addition to TCP/IP, there also are other networking technologies. For convenience, we use the OSI network model to represent non-TCP/IP network technologies. Different networks are interconnected using gateways. A gateway can be placed at any layer. The OSI model is a seven-layer architecture. The OSI architecture is similar to the TCP/IP architecture, except that the OSI model specifies two additional layers between the application layer and the transport layer in the TCP/IP architecture. 
These two layers are the presentation layer and the session layer. Figure 5.1 shows the relationship between the TCP/IP layers and the OSI layers. The application layer in TCP/IP corresponds to the application layer and the presentation layer in OSI. The transport layer in TCP/IP corresponds to the session layer and the transport layer in OSI. The remaining three layers in the TCP/IP architecture are one-to-one correspondent to the remaining three layers in the OSI model. Correspondence between layers of the TCP/IP architecture and the OSI model. Also shown are placements of cryptographic algorithms in network layers, where the dotted arrows indicate actual communications of cryptographic algorithms The functionalities of OSI layers are briefly described as follows: The application layer serves as an interface between applications and network programs. It supports application programs and end-user processing. Common application-layer programs include remote logins, file transfer, email, and Web browsing. The presentation layer is responsible for dealing with data that is formed differently. This protocol layer allows application-layer programs residing on different sides of a communication channel with different platforms to understand each other's data formats regardless of how they are presented. The session layer is responsible for creating, managing, and closing a communication connection. The transport layer is responsible for providing reliable connections, such as packet sequencing, traffic control, and congestion control. The network layer is responsible for routing device-independent data packets from the current hop to the next hop. The data-link layer is responsible for encapsulating device-independent data packets into device-dependent data frames. It has two sublayers: logical link control and media access control. The physical layer is responsible for transmitting device-dependent frames through some physical media. 
Starting from the application layer, data generated from an application program is passed down layer-by-layer to the physical layer. Data from the previous layer is enclosed in a new envelope at the current layer, where the data from the previous layer is also just an envelope containing the data from the layer before it. This is similar to enclosing a smaller envelope in a larger one. The envelope added at each layer contains sufficient information for handling the packet. Application-layer data are divided into blocks small enough to be encapsulated in an envelope at the next layer. Application data blocks are \u201cdressed up\u201d in the TCP/IP architecture according to the following basic steps. At the sending side, an application data block is encapsulated in a TCP packet when it is passed down to the TCP layer. In other words, a TCP packet consists of a header and a payload, where the header corresponds to the TCP envelope and the payload is the application data block. Likewise, the TCP packet will be encapsulated in an IP packet when it is passed down to the IP layer. An IP packet consists of a header and a payload, which is the TCP packet passed down from the TCP layer. The IP packet will be encapsulated in a device-dependent frame (e.g., an Ethernet frame) when it is passed down to the data-link layer. A frame has a header, and it may also have a trailer. For example, in addition to having a header, an Ethernet frame also has a 32-bit cyclic redundancy check (CRC) trailer. When it is passed down to the physical layer, a frame will be transformed into a sequence of media signals for transmission Flow Diagram of a Packet Generation At the destination side, the medium signals are converted by the physical layer into a frame, which is passed up to the data-link layer. The data-link layer passes the frame payload (i.e., the IP packet encapsulated in the frame) up to the IP layer. 
The IP layer passes the IP payload, namely, the TCP packet encapsulated in the IP packet, up to the TCP layer. The TCP layer passes the TCP payload, namely, the application data block, up to the application layer. When a packet arrives at a router, it only goes up to the IP layer, where certain fields in the IP header are modified (e.g., the value of TTL is decreased by 1). This modified packet is then passed back down layer-by-layer to the physical layer for further transmission. Public Key Infrastructure To deploy cryptographic algorithms in network applications, we need a way to distribute secret keys using open networks. Public-key cryptography is the best way to distribute these secret keys. To use public-key cryptography, we need to build a public-key infrastructure (PKI) to support and manage public-key certificates and certificate authority (CA) networks. In particular, PKIs are set up to perform the following functions: Determine the legitimacy of users before issuing public-key certificates to them. Issue public-key certificates upon user requests. Extend public-key certificates valid time upon user requests. Revoke public-key certificates upon users' requests or when the corresponding private keys are compromised. Store and manage public-key certificates. Prevent digital signature signers from denying their signatures. Support CA networks to allow different CAs to authenticate public-key certificates issued by other CAs. X.509: https://certificatedecoder.dev/?gclid=EAIaIQobChMI0M731O6G6gIVVSQrCh04bQaAEAAYASAAEgKRkPD_BwE IPsec: A Security Protocol at the Network Layer IPsec is a major security protocol at the network layer IPsec provides a potent platform for constructing virtual private networks (VPN). VPNs are private networks overlayed on public networks. The purpose of deploying cryptographic algorithms at the network layer is to encrypt or authenticate IP packets (either just the payloads or the whole packets). 
IPsec also specifies how to exchange keys. Thus, IPsec consists of authentication protocols, encryption protocols, and key exchange protocols. They are referred to, respectively, as authentication header (AH), encapsulating security payload (ESP), and Internet key exchange (IKE). PGP & S/MIME : Email Security There are several security protocols at the application layer. The most used of these protocols are email security protocols namely PGP and S/MIME. SMTP (\u201cSimple Mail Transfer Protocol\u201d) is used for sending and delivering from a client to a server via port 25: it\u2019s the outgoing server. On the contrary, POP (\u201cPost Office Protocol\u201d) allows the users to pick up the message and download it into their inbox: it\u2019s the incoming server. The latest version of the Post Office Protocol is named POP3, and it\u2019s been used since 1996; it uses port 110 PGP PGP implements all major cryptographic algorithms, the ZIP compression algorithm, and the Base64 encoding algorithm. It can be used to authenticate a message, encrypt a message, or both. PGP follows the following general process: authentication, ZIP compression, encryption, and Base64 encoding. The Base64 encoding procedure makes the message ready for SMTP transmission GPG (GnuPG) GnuPG is another free encryption standard that companies may use that is based on OpenPGP. GnuPG serves as a replacement for Symantec\u2019s PGP. The main difference is the supported algorithms. However, GnuPG plays nice with PGP by design. Because GnuPG is open, some businesses would prefer the technical support and the user interface that comes with Symantec\u2019s PGP. It is important to note that there are some nuances between the compatibility of GnuPG and PGP, such as the compatibility between certain algorithms, but in most applications such as email, there are workarounds. One such algorithm is the IDEA Module which isn\u2019t included in GnuPG out of the box due to patent issues. 
S/MIME SMTP can only handle 7-bit ASCII text messages (UTF-8 extensions can be used to alleviate this limitation). While POP can handle other content types besides 7-bit ASCII, POP may, under a common default setting, download all the messages stored in the mail server to the user's local computer. After that, POP removes these messages from the mail server, which makes it difficult for users to read their messages from multiple computers. The Multipurpose Internet Mail Extension protocol (MIME) was designed to support sending and receiving email messages in various formats, including nontext files generated by word processors, graphics files, sound files, and video clips. Moreover, MIME allows a single message to include mixed types of data in any combination of these formats. The Internet Mail Access Protocol (IMAP), which operates on TCP port 143 (for unencrypted connections only), stores incoming email messages in the mail server until the user deletes them deliberately (this is configurable on both server and client, just as with POP). This allows users to access their mailbox from multiple machines and download messages to a local machine without deleting them from the mailbox in the mail server. SSL/TLS SSL uses a PKI to decide if a server\u2019s public key is trustworthy by requiring servers to use a security certificate signed by a trusted CA. When Netscape Navigator 1.0 was released, it trusted a single CA operated by the RSA Data Security corporation. The server\u2019s public RSA key was stored in the security certificate, which the browser could then use to establish a secure communication channel. The security certificates we use today still rely on the same standard (named X.509) that Netscape Navigator 1.0 used back then. Netscape intended to train users (though this didn\u2019t work out later) to differentiate secure communications from insecure ones, so they put a lock icon next to the address bar. 
When the lock is open, the communication is insecure. A closed lock means communication has been secured with SSL, which required the server to provide a signed certificate. You\u2019re obviously familiar with this icon as it\u2019s been in every browser ever since. The engineers at Netscape truly created a standard for secure internet communications. A year after releasing SSL 2.0, Netscape fixed several security issues and released SSL 3.0, a protocol that, albeit being officially deprecated since June 2015, remains in use in certain parts of the world more than 20 years after its introduction. To standardize SSL, the Internet Engineering Task Force (IETF) created a slightly modified SSL 3.0 and, in 1999, unveiled it as Transport Layer Security (TLS) 1.0. The name change between SSL and TLS continues to confuse people today. Officially, TLS is the new SSL, but in practice, people use SSL and TLS interchangeably to talk about any version of the protocol. Must See: https://tls.ulfheim.net/ https://davidwong.fr/tls13/ Network Perimeter Security Let us see how we keep a check on the perimeter i.e the edges, the first layer of protection General Firewall Framework Firewalls are needed because encryption algorithms cannot effectively stop malicious packets from getting into an edge network. This is because IP packets, regardless of whether they are encrypted, can always be forwarded into an edge network. Firewalls that were developed in the 1990s are important instruments to help restrict network access. A firewall may be a hardware device, a software package, or a combination of both. Packets flowing into the internal network from the outside should be evaluated before they are allowed to enter. One of the critical elements of a firewall is its ability to examine packets without imposing a negative impact on communication speed while providing security protections for the internal network. 
The packet inspection that is carried out by firewalls can be done using several different methods. Based on the particular method used by the firewall, it can be characterized as either a packet filter, circuit gateway, application gateway, or dynamic packet filter. Packet Filters A packet filter inspects ingress packets coming into an internal network from outside and egress packets leaving the internal network. Packet filtering only inspects IP headers and TCP headers, not the payloads generated at the application layer. A packet-filtering firewall uses a set of rules to determine whether a packet should be allowed or denied to pass through. There are two types: Stateless Stateless filtering treats each packet as an independent object and does not keep track of any previously processed packets. In other words, stateless filtering inspects a packet when it arrives and makes a decision without leaving any record of the packet being inspected. Stateful Stateful filtering, also referred to as connection-state filtering, keeps track of connections between an internal host and an external host. A connection state (or state, for short) indicates whether it is a TCP connection or a UDP connection and whether the connection is established. Circuit Gateways Circuit gateways, also referred to as circuit-level gateways, typically operate at the transport layer. They evaluate the IP addresses and the port numbers contained in the IP and TCP (or UDP) headers and use them to determine whether to allow or disallow an internal host and an external host to establish a connection. It is common practice to combine packet filters and circuit gateways to form a dynamic packet filter (DPF). Application Gateways (ALG), aka Proxy Servers An Application Level Gateway (ALG) acts as a proxy for internal hosts, processing service requests from external clients. An ALG performs deep inspections on each IP packet (ingress or egress). 
In particular, an ALG inspects application program formats contained in the packet (e.g., MIME format or SQL format) and examines whether its payload is permitted. Thus, an ALG may be able to detect a computer virus contained in the payload. Because an ALG inspects packet payloads, it may be able to detect malicious code and quarantine suspicious packets, in addition to blocking packets with suspicious IP addresses and TCP ports. On the other hand, an ALG also incurs substantial computation and space overheads. Trusted Systems & Bastion Hosts A Trusted Operating System (TOS) is an operating system that meets a particular set of security requirements. Whether an operating system can be trusted or not depends on several elements. For example, for an operating system on a particular computer to be certified trusted, one needs to validate that, among other things, the following four requirements are satisfied: Its system design contains no defects; Its system software contains no loopholes; Its system is configured properly; and Its system management is appropriate. Bastion Hosts Bastion hosts are computers with strong defence mechanisms. They often serve as host computers for implementing application gateways, circuit gateways, and other types of firewalls. A bastion host is operated on a trusted operating system that must not contain unnecessary functionalities or programs. This measure helps to reduce error probabilities and makes it easier to conduct security checks. Only those network application programs that are necessary, for example, SSH, DNS, SMTP, and authentication programs, are installed on a bastion host. Bastion hosts are also primarily used as controlled ingress points so that the security monitoring can focus more narrowly on actions happening at a single point closely. 
Common Techniques & Scannings, Packet Capturing Scanning Ports with Nmap Nmap (\"Network Mapper\") is a free and open-source utility for network discovery and security auditing. Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime. Besides being free and open-source, Nmap is very flexible and versatile. Nmap is often used to determine alive hosts in a network, open ports on those hosts, services running on those open ports, and the version of the service on each port. More at http://scanme.nmap.org/ nmap [scan type] [options] [target specification] Nmap uses 6 different port states: Open: An open port is one that is actively accepting TCP, UDP or SCTP connections. Open ports interest us the most because they are the ones that are vulnerable to attacks. Open ports also show the available services on a network. Closed: A port that receives and responds to Nmap probe packets but has no application listening on it. Useful for identifying that the host exists and for OS detection. Filtered: Nmap can\u2019t determine whether the port is open because packet filtering prevents its probes from reaching the port. Filtering could come from firewalls or router rules. Filtered ports often yield little information during scans, as the filters can drop the probes without responding or respond with uninformative error messages, e.g. destination unreachable. Unfiltered: The port is accessible but Nmap doesn\u2019t know if it is open or closed. Only used in the ACK scan, which is used to map firewall rulesets. Other scan types can be used to identify whether the port is open. Open/filtered: Nmap is unable to determine whether the port is open or filtered. This happens when an open port gives no response. 
No response could mean that the probe was dropped by a packet filter or that any response is blocked. Closed/filtered: Nmap is unable to determine whether a port is closed or filtered. Only used in the IP ID idle scan. Types of Nmap Scan: TCP Connect TCP Connect scan completes the 3-way handshake. If a port is open, the operating system completes the TCP three-way handshake and the port scanner immediately closes the connection to avoid causing a denial of service (DoS). This is \u201cnoisy\u201d because the services can log the sender IP address and might trigger Intrusion Detection Systems. UDP Scan This scan checks to see if any UDP ports are listening. UDP does not respond with a positive acknowledgement like TCP and only responds to an incoming UDP packet when the port is closed. SYN Scan SYN scan is another form of TCP scanning. This scan type is also known as \u201chalf-open scanning\u201d because it never actually opens a full TCP connection. The port scanner generates a SYN packet. If the target port is open, it will respond with a SYN-ACK packet. The scanner host responds with an RST packet, closing the connection before the handshake is completed. If the port is closed but unfiltered, the target will instantly respond with an RST packet. SYN scan has the advantage that the individual services never actually receive a connection. FIN Scan This is a stealthy scan, like the SYN scan, but sends a TCP FIN packet instead. ACK Scan ACK scanning determines whether the port is filtered or not. Null Scan Another very stealthy scan that sets all the TCP header flags to off or null. This is not normally a valid packet and some hosts will not know what to do with this. XMAS Scan Similar to the NULL scan except that the FIN, PSH, and URG flags in the TCP header are set to on. RPC Scan This special type of scan looks for machines answering to RPC (Remote Procedure Call) services. IDLE Scan It is a super stealthy method whereby the scan packets are bounced off an external host. 
You don\u2019t need to have control over the other host, but it does have to be up and meet certain requirements. You must supply the IP address of the \u201czombie\u201d host and the port number to use. It is one of the more controversial options in Nmap since it only has a use for malicious attacks. Scan Techniques Several scan techniques can be used to gain more information about a system and its ports. You can read more at https://medium.com/infosec-adventures/nmap-cheatsheet-a423fcdda0ca OpenVAS OpenVAS is a full-featured vulnerability scanner. OpenVAS is a framework of services and tools that provides a comprehensive and powerful vulnerability scanning and management package. OpenVAS, which is an open-source program, began as a fork of the once-more-popular scanning program, Nessus. OpenVAS is made up of three main parts. These are: a regularly updated feed of Network Vulnerability Tests (NVTs); a scanner, which runs the NVTs; and an SQLite 3 database for storing both your test configurations and the NVTs\u2019 results and configurations. https://www.greenbone.net/en/install_use_gce/ WireShark Wireshark is a protocol analyzer. This means Wireshark is designed to decode not only packet bits and bytes but also the relations between packets and protocols. Wireshark understands protocol sequences. 
A simple demo of Wireshark Capture only UDP packets: Capture filter = \u201cudp\u201d Capture only TCP packets: Capture filter = \u201ctcp\u201d TCP/IP 3 way Handshake Filter by IP address: displays all traffic from IP, be it source or destination ip.addr == 192.168.1.1 Filter by source address: display traffic only from IP source ip.src == 192.168.0.1 Filter by destination: display traffic only to IP destination ip.dst == 192.168.0.1 Filter by IP subnet: display traffic from subnet, be it source or destination ip.addr == 192.168.0.1/24 Filter by protocol: filter traffic by protocol name dns http ftp arp ssh telnet icmp Exclude IP address: remove traffic from and to IP address !(ip.addr == 192.168.0.1) Display traffic between two specific subnets ip.addr == 192.168.0.1/24 and ip.addr == 192.168.1.1/24 Display traffic between two specific workstations ip.addr == 192.168.0.1 and ip.addr == 192.168.0.2 Filter by MAC eth.addr == 00:50:7f:c5:b6:78 Filter TCP port tcp.port == 80 Filter TCP port source tcp.srcport == 80 Filter TCP port destination tcp.dstport == 80 Find user agents http.user_agent contains Firefox !http.user_agent contains || !http.user_agent contains Chrome Filter out background traffic (ARP, ICMP, DNS) !(arp or icmp or dns) Filter IP address and port tcp.port == 80 && ip.addr == 192.168.0.1 Filter all HTTP requests http.request Filter all HTTP requests and responses http.request or http.response Filter three way handshake tcp.flags.syn==1 or (tcp.seq==1 and tcp.ack==1 and tcp.len==0 and tcp.analysis.initial_rtt) Find files by type frame matches \u201c(attachment|tar|exe|zip|pdf)\u201d Find traffic based on keyword tcp contains facebook frame contains facebook Detecting SYN Floods tcp.flags.syn == 1 and tcp.flags.ack == 0 Wireshark Promiscuous Mode - By default, Wireshark only captures packets going to and from the computer where it runs. By checking the box to run Wireshark in Promiscuous Mode in the Capture Settings, you can capture most of the traffic on the LAN. 
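As a rough illustration of the SYN-flood display filter in the list above (tcp.flags.syn == 1 and tcp.flags.ack == 0), the following Python sketch applies the same test to raw TCP flag bytes and counts bare SYNs per source address. It is not how Wireshark works internally; the input format (pairs of source IP and flag byte), the function names, and the threshold are assumptions made only for this example.

```python
from collections import Counter

SYN = 0x02  # SYN bit in the TCP flags byte
ACK = 0x10  # ACK bit in the TCP flags byte

def is_bare_syn(flags):
    # Same test as the display filter: SYN set and ACK clear.
    return bool(flags & SYN) and not (flags & ACK)

def suspected_flooders(packets, threshold):
    # packets: iterable of (source_ip, tcp_flags_byte) pairs (hypothetical format).
    counts = Counter(src for src, flags in packets if is_bare_syn(flags))
    # Report only the sources that sent at least `threshold` bare SYNs.
    return {src: n for src, n in counts.items() if n >= threshold}

sample = [('10.0.0.5', 0x02)] * 4 + [
    ('10.0.0.9', 0x12),  # SYN+ACK, ignored
    ('10.0.0.7', 0x02),  # a single SYN, below the threshold
    ('10.0.0.7', 0x10),  # plain ACK, ignored
]
print(suspected_flooders(sample, threshold=3))
```

A sensible threshold depends on normal traffic for the network in question; the value 3 here is only for the demonstration.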
DumpCap Dumpcap is a network traffic dump tool. It captures packet data from a live network and writes the packets to a file. Dumpcap\u2019s native capture file format is pcapng, which is also the format used by Wireshark. By default, Dumpcap uses the pcap library to capture traffic from the first available network interface and writes the received raw packet data, along with the packets\u2019 time stamps, into a pcapng file. The capture filter syntax follows the rules of the pcap library. The Wireshark command-line utility called 'dumpcap.exe' can be used to capture LAN traffic over an extended period of time. Wireshark itself can also be used, but dumpcap does not significantly utilize the computer's memory while capturing for long periods. DaemonLogger Daemonlogger is a packet logging application designed specifically for use in Network Security Monitoring (NSM) environments. The biggest benefit Daemonlogger provides is that, like Dumpcap, it is simple to use for capturing packets. In order to begin capturing, you need only to invoke the command and specify an interface. daemonlogger -i eth1 By default, this will begin capturing packets and logging them to the current working directory. Packets will be collected until the capture file size reaches 2 GB, and then a new file will be created. This will continue indefinitely until the process is halted. NetSniff-NG Netsniff-NG is a high-performance packet capture utility. While the utilities we\u2019ve discussed to this point rely on Libpcap for capture, Netsniff-NG utilizes zero-copy mechanisms to capture packets. This is done with the intent to support full packet capture over high throughput links. To begin capturing packets with Netsniff-NG, we have to specify an input and output. In most cases, the input will be a network interface, and the output will be a file or folder on disk. 
netsniff-ng -i eth1 -o data.pcap Netflow NetFlow is a feature that was introduced on Cisco routers around 1996 that provides the ability to collect IP network traffic as it enters or exits an interface. By analyzing the data provided by NetFlow, a network administrator can determine things such as the source and destination of traffic, class of service, and the causes of congestion. A typical flow monitoring setup (using NetFlow) consists of three main components: Flow exporter: aggregates packets into flows and exports flow records towards one or more flow collectors. Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter. Analysis application: analyzes received flow data in the context of intrusion detection or traffic profiling, for example. Routers and switches that support NetFlow can collect IP traffic statistics on all interfaces where NetFlow is enabled, and later export those statistics as NetFlow records toward at least one NetFlow collector, typically a server that does the actual traffic analysis. IDS An IDS is a security solution that detects security-related events in your environment but does not block them. IDS sensors can be software-based or hardware-based and are used to collect and analyze the network traffic. These sensors are available in two varieties, network IDS and host IDS. A host IDS is a server-specific agent running on a server with a minimum of overhead to monitor the operating system. A network IDS can be embedded in a networking device, a standalone appliance, or a module monitoring the network traffic. Signature Based IDS The signature-based IDS monitors the network traffic or observes the system and sends an alarm if a known malicious event is happening. It does so by comparing the data flow against a database of known attack patterns. These signatures explicitly define what traffic or activity should be considered as malicious. 
Signature-based detection has been the bread and butter of network-based defensive security for over a decade, partially because it is very similar to how malicious activity is detected at the host level with antivirus utilities. The formula is fairly simple: an analyst observes a malicious activity, derives indicators from the activity and develops them into signatures, and then those signatures will alert whenever the activity occurs again. Examples: Snort and Suricata. Policy-Based IDS The policy-based IDSs (mainly host IDSs) trigger an alarm whenever a violation occurs against the configured policy. This configured policy is or should be a representation of the security policies. This type of IDS is flexible and can be customized to a company's network requirements because it knows exactly what is permitted and what is not. On the other hand, the signature-based systems rely on vendor specifics and default settings. Anomaly Based IDS The anomaly-based IDS looks for traffic that deviates from the normal, but the definition of what is a normal network traffic pattern is the tricky part. Two types of anomaly-based IDS exist: statistical and nonstatistical anomaly detection. Statistical anomaly detection learns the traffic patterns interactively over a period of time. In the nonstatistical approach, the IDS has a predefined configuration of the supposedly acceptable and valid traffic patterns. Host-Based IDS & Network-Based IDS A host IDS can be described as a distributed agent residing on each server of the network that needs protection. These distributed agents are tied very closely to the underlying operating system. Network IDSs, on the other hand, can be described as intelligent sniffing devices. Data (raw packets) is captured from the network by a network IDS, whereas host IDSs capture the data from the host on which they are installed. 
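To make the signature-based approach described above a little more concrete, here is a minimal Python sketch that compares a payload against a small table of known-bad byte patterns and reports which ones match. The signature names and patterns are invented for illustration and are not real Snort or Suricata rules; real engines use far richer rule languages and protocol decoding.

```python
# Hypothetical signature table: name -> byte pattern to look for.
SIGNATURES = {
    'sql-injection-probe': b'union select',
    'path-traversal': b'../..',
}

def match_signatures(payload, signatures=SIGNATURES):
    # Case-insensitive substring match of every known pattern against the payload.
    lowered = payload.lower()
    return sorted(name for name, pattern in signatures.items() if pattern in lowered)

print(match_signatures(b'GET /item?id=1 UNION SELECT password FROM users'))
print(match_signatures(b'GET /index.html'))
```

The first call reports the SQL injection signature and the second reports nothing, which also shows the main limitation of this approach: activity with no matching signature passes unnoticed.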
Honeypots The use of decoy machines to direct intruders' attention away from the machines under protection is a major technique to preclude intrusion attacks. Any device, system, directory, or file used as a decoy to lure attackers away from important assets and to collect intrusion or abusive behaviours is referred to as a honeypot. A honeypot may be implemented as a physical device or as an emulation system. The idea is to set up decoy machines in a LAN, or decoy directories/files in a file system, and make them appear important but with several exploitable loopholes, to lure attackers into attacking these machines or directories/files so that other machines, directories, and files escape the intruders' attention. A decoy machine may be a host computer or a server computer. Likewise, we may also set up decoy routers or even decoy LANs. Chinks In The Armour (TCP/IP Security Issues) IP Spoofing In this type of attack, the attacker replaces the IP address of the sender, or in some rare cases the destination, with a different address. IP spoofing is normally used to exploit a target host. In other cases, it is used to start a denial-of-service (DoS) attack. In a DoS attack, an attacker modifies the IP packet to mislead the target host into accepting the original packet as a packet sourced at a trusted host. The attacker must know the IP address of the trusted host to modify the packet headers (source IP address) so that it appears that the packets are coming from that host. IP Spoofing Detection Techniques Direct TTL Probes In this technique, we send a packet that triggers a reply to the host whose IP address is suspected of being spoofed, and compare the TTL of the reply with that of the suspect packet; if the two TTLs differ, the suspect packet is spoofed. This technique is successful when the attacker is in a different subnet from the victim. IP Identification Number We send a probe that triggers a reply to the host of the suspect spoofed traffic and compare the IP ID of the reply with that of the suspect traffic. 
If the IP IDs are not close to that of the packet being checked, the suspect traffic is spoofed. TCP Flow Control Method Attackers sending spoofed TCP packets will not receive the target\u2019s SYN-ACK packets. Attackers cannot, therefore, respond to changes in the congestion window size. When the receiver still receives traffic even after the window size is exhausted, the packets are most probably spoofed. Covert Channel A covert or clandestine channel can be best described as a pipe or communication channel between two entities that can be exploited by a process or application transferring information in a manner that violates the system's security specifications. More specifically for TCP/IP, in some instances, covert channels are established, and data can be secretly passed between two end systems. Ex: ICMP resides at the Internet layer of the TCP/IP protocol suite and is implemented in all TCP/IP hosts. Based on the specifications of the ICMP Protocol, an ICMP Echo Request message should have an 8-byte header and a 56-byte payload. The ICMP Echo Request packet should not carry any meaningful data in the payload. However, these packets are often used to carry secret information. The ICMP packets are altered slightly to carry secret data in the payload. This makes the size of the packet larger, but no control exists in the protocol stack to defeat this behaviour. The alteration of ICMP packets allows intruders to program specialized client-server pairs. These small pieces of code export confidential information without alerting the network administrator. ICMP can be leveraged for more than data exfiltration. For example, some C&C tools such as Loki used an ICMP channel to establish encrypted interactive sessions back in 1996. Deep packet inspection has since come a long way. A lot of IDS/IPS detect ICMP tunnelling. 
Check for echo responses that do not contain the same payload as the request. Check the volume of ICMP traffic, especially volumes beyond an acceptable threshold. IP Fragmentation Attack The TCP/IP protocol suite, or more specifically IP, allows the fragmentation of packets (this is a feature, not a bug). The IP fragmentation offset is used to keep track of the different parts of a datagram. The information in this field is used at the destination to reassemble the datagrams. All such fragments have the same Identification field value, and the fragmentation offset indicates the position of the current fragment in the context of the original packet. Many access routers and firewalls do not perform packet reassembly. In normal operation, IP fragments do not overlap, but attackers can create artificially fragmented packets to mislead the routers or firewalls. Usually, these packets are small and almost impractical for end systems because of data and computational overhead. A good example of an IP fragmentation attack is the Ping of Death attack. The Ping of Death attack sends fragments that, when reassembled at the end station, create a larger packet than the maximum permissible length. TCP Flags Data exchange using TCP does not happen until a three-way handshake has been completed. This handshake uses different flags to influence the way TCP segments are processed. There are 6 bits in the TCP header that are often called flags: Urgent pointer field (URG), Acknowledgment field (ACK), Push function (PSH), Reset the connection (RST), Synchronize sequence numbers (SYN), and the sender is finished with this connection (FIN). Abuse of the normal operation or settings of these flags can be used by attackers to launch DoS attacks. This causes network servers or web servers to crash or hang. 
| SYN | FIN | PSH | RST | Validity | |-----|-----|-----|-----|----------| | 1 | 1 | 0 | 0 | Illegal Combination | | 1 | 1 | 1 | 0 | Illegal Combination | | 1 | 1 | 0 | 1 | Illegal Combination | | 1 | 1 | 1 | 1 | Illegal Combination | The attacker's ultimate goal is to write special programs or pieces of code that can construct these illegal combinations, resulting in an efficient DoS attack. SYN FLOOD The timers (or lack of certain timers) in the 3 way handshake are often used and exploited by attackers to disable services or even to enter systems. After step 2 of the three-way handshake, no limit is set on the time to wait after receiving a SYN. The attacker initiates many connection requests to the webserver of Company XYZ (almost certainly with a spoofed IP address). The SYN+ACK packets (Step 2) sent by the web server back to the originating source IP address are not replied to. This leaves a TCP session half-open on the webserver. Multiple packets cause multiple TCP sessions to stay open. Based on the hardware limitations of the server, a limited number of TCP sessions can stay open, and as a result, the webserver refuses further connection establishment attempts from any host as soon as a certain limit is reached. These half-open connections need to be completed or timed out before new connections can be established. FIN Attack In normal operation, the sender sets the TCP FIN flag indicating that no more data will be transmitted and the connection can be closed down. This is a four-way handshake mechanism, with both sender and receiver expected to send an acknowledgement on a received FIN packet. During an attack that is trying to kill connections, a spoofed FIN packet is constructed. This packet also has the correct sequence number, so the packet is seen as valid by the targeted host. These sequence numbers are easy to predict. 
This process is referred to as TCP sequence number prediction, whereby the attacker either sniffs the current Sequence and Acknowledgment (SEQ/ACK) numbers of the connection or can algorithmically predict these numbers. Connection Hijacking An authorized user (Employee X) sends HTTP requests over a TCP session with the webserver. The web server accepts the packets from Employee X only when the packet has the correct SEQ/ACK numbers. As seen previously, these numbers are important for the webserver to distinguish between different sessions and to make sure it is still talking to Employee X. Imagine that the cracker starts sending packets to the web server spoofing the IP address of Employee X, using the correct SEQ/ACK combination. The web server accepts the packet and increments the ACK number. In the meantime, Employee X continues to send packets but with incorrect SEQ/ACK numbers. As a result of sending unsynchronized packets, all data from Employee X is discarded when received by the webserver. The attacker pretends to be Employee X using the correct numbers. This finally results in the cracker hijacking the connection, whereby Employee X is completely confused and the webserver replies assuming the cracker is sending correct synchronized data. STEPS: The attacker examines the traffic flows with a network monitor and notices traffic from Employee X to a web server. The web server returns or echoes data back to the origination station (Employee X). Employee X acknowledges the packet. The cracker launches a spoofed packet to the server. The web server responds to the cracker. The cracker starts verifying SEQ/ACK numbers to double-check success. At this time, the cracker takes over the session from Employee X, which results in a session hanging for Employee X. The cracker can start sending traffic to the webserver. The web server returns the requested data to confirm delivery with the correct ACK number. 
The cracker can continue to send data (keeping track of the correct SEQ/ACK numbers) until eventually setting the FIN flag to terminate the session. Buffer Overflow A buffer is a temporary data storage area used to store program code and data. When a program or process tries to store more data in a buffer than it was originally anticipated to hold, a buffer overflow occurs. Buffers are temporary storage locations in memory (memory or buffer sizes are often measured in bytes) that can store a fixed amount of data in bytes. When more data is retrieved than can be stored in a buffer location, the additional information must go into an adjacent buffer, resulting in overwriting the valid data held in them. Mechanism: Buffer overflow vulnerabilities exist in different types. But the overall goal for all buffer overflow attacks is to take over the control of a privileged program and, if possible, the host. The attacker has two tasks to achieve this goal. First, the dirty code needs to be available in the program's code address space. Second, the privileged program should jump to that particular part of the code, which ensures that the proper parameters are loaded into memory. The first task can be achieved in two ways: by injecting the code in the right address space or by using the existing code and modifying certain parameters slightly. The second task is a little more complex because the program's control flow needs to be modified to make the program jump to the dirty code. CounterMeasure: The most important approach is to have a concerted focus on writing correct code. A second method is to make the data buffers (memory locations) address space of the program code non-executable. This type of address space makes it impossible to execute code, which might be infiltrated in the program's buffers during an attack. 
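The overflow mechanism described above can be imitated in a few lines of Python. This is only a simulation under stated assumptions: Python itself bounds-checks memory, so a flat bytearray stands in for process memory, with a hypothetical 8-byte input buffer placed directly before an adjacent 8-byte field. The function names are made up for the example. The unchecked copy behaves like a C-style copy with no length check, and the checked copy shows the corresponding correct-code countermeasure.

```python
BUFFER_SIZE = 8

def unchecked_copy(memory, start, data):
    # Writes every byte of data, ignoring how large the target buffer is.
    memory[start:start + len(data)] = data

def checked_copy(memory, start, size, data):
    # Countermeasure: refuse input that does not fit in the buffer.
    if len(data) > size:
        raise ValueError('input larger than buffer')
    memory[start:start + len(data)] = data

# Simulated memory: 8 zero bytes (the buffer) followed by an adjacent field.
memory = bytearray(BUFFER_SIZE) + bytearray(b'SAFEDATA')

unchecked_copy(memory, 0, b'A' * 12)   # 12 bytes into an 8-byte buffer
print(bytes(memory[BUFFER_SIZE:]))     # the adjacent field has been partly overwritten

try:
    checked_copy(memory, 0, BUFFER_SIZE, b'A' * 12)
except ValueError as err:
    print('rejected:', err)
```

In a real attack the overwritten neighbour would be something like a return address rather than a data field, which is how control flow gets redirected to the injected code.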
More Spoofing Address Resolution Protocol Spoofing The Address Resolution Protocol (ARP) provides a mechanism to resolve, or map, a known IP address to a MAC sublayer address. Using ARP spoofing, the cracker can exploit this hardware address authentication mechanism by spoofing the hardware address of Host B. Basically, the attacker can convince any host or network device on the local network that the cracker's workstation is the host to be trusted. This is a common method used in a switched environment. ARP spoofing can be prevented with the implementation of static ARP tables in all the hosts and routers of your network. Alternatively, you can implement an ARP server that responds to ARP requests on behalf of the target host. DNS Spoofing DNS spoofing is the method whereby the hacker convinces the target machine that the system it wants to connect to is the machine of the cracker. The cracker modifies some records so that name entries of hosts correspond to the attacker's IP address. There have been instances in which the complete DNS server was compromised by an attack. To counter DNS spoofing, the reverse lookup detects these attacks. The reverse lookup is a mechanism to verify the IP address against a name. The IP address and name files are usually kept on different servers to make compromise much more difficult","title":"Network Security"},{"location":"level101/security/network_security/#part-ii-network-security","text":"","title":"Part II: Network Security"},{"location":"level101/security/network_security/#introduction","text":"TCP/IP is the dominant networking technology today. It is a five-layer architecture. These layers are, from top to bottom, the application layer, the transport layer (TCP), the network layer (IP), the data-link layer, and the physical layer. In addition to TCP/IP, there also are other networking technologies. For convenience, we use the OSI network model to represent non-TCP/IP network technologies. 
Different networks are interconnected using gateways. A gateway can be placed at any layer. The OSI model is a seven-layer architecture. The OSI architecture is similar to the TCP/IP architecture, except that the OSI model specifies two additional layers between the application layer and the transport layer in the TCP/IP architecture. These two layers are the presentation layer and the session layer. Figure 5.1 shows the relationship between the TCP/IP layers and the OSI layers. The application layer in TCP/IP corresponds to the application layer and the presentation layer in OSI. The transport layer in TCP/IP corresponds to the session layer and the transport layer in OSI. The remaining three layers in the TCP/IP architecture are one-to-one correspondent to the remaining three layers in the OSI model. Correspondence between layers of the TCP/IP architecture and the OSI model. Also shown are placements of cryptographic algorithms in network layers, where the dotted arrows indicate actual communications of cryptographic algorithms The functionalities of OSI layers are briefly described as follows: The application layer serves as an interface between applications and network programs. It supports application programs and end-user processing. Common application-layer programs include remote logins, file transfer, email, and Web browsing. The presentation layer is responsible for dealing with data that is formed differently. This protocol layer allows application-layer programs residing on different sides of a communication channel with different platforms to understand each other's data formats regardless of how they are presented. The session layer is responsible for creating, managing, and closing a communication connection. The transport layer is responsible for providing reliable connections, such as packet sequencing, traffic control, and congestion control. 
The network layer is responsible for routing device-independent data packets from the current hop to the next hop. The data-link layer is responsible for encapsulating device-independent data packets into device-dependent data frames. It has two sublayers: logical link control and media access control. The physical layer is responsible for transmitting device-dependent frames through some physical media. Starting from the application layer, data generated from an application program is passed down layer-by-layer to the physical layer. Data from the previous layer is enclosed in a new envelope at the current layer, where the data from the previous layer is also just an envelope containing the data from the layer before it. This is similar to enclosing a smaller envelope in a larger one. The envelope added at each layer contains sufficient information for handling the packet. Application-layer data are divided into blocks small enough to be encapsulated in an envelope at the next layer. Application data blocks are \u201cdressed up\u201d in the TCP/IP architecture according to the following basic steps. At the sending side, an application data block is encapsulated in a TCP packet when it is passed down to the TCP layer. In other words, a TCP packet consists of a header and a payload, where the header corresponds to the TCP envelope and the payload is the application data block. Likewise, the TCP packet will be encapsulated in an IP packet when it is passed down to the IP layer. An IP packet consists of a header and a payload, which is the TCP packet passed down from the TCP layer. The IP packet will be encapsulated in a device-dependent frame (e.g., an Ethernet frame) when it is passed down to the data-link layer. A frame has a header, and it may also have a trailer. For example, in addition to having a header, an Ethernet frame also has a 32-bit cyclic redundancy check (CRC) trailer. 
When it is passed down to the physical layer, a frame will be transformed into a sequence of media signals for transmission Flow Diagram of a Packet Generation At the destination side, the medium signals are converted by the physical layer into a frame, which is passed up to the data-link layer. The data-link layer passes the frame payload (i.e., the IP packet encapsulated in the frame) up to the IP layer. The IP layer passes the IP payload, namely, the TCP packet encapsulated in the IP packet, up to the TCP layer. The TCP layer passes the TCP payload, namely, the application data block, up to the application layer. When a packet arrives at a router, it only goes up to the IP layer, where certain fields in the IP header are modified (e.g., the value of TTL is decreased by 1). This modified packet is then passed back down layer-by-layer to the physical layer for further transmission.","title":"Introduction"},{"location":"level101/security/network_security/#public-key-infrastructure","text":"To deploy cryptographic algorithms in network applications, we need a way to distribute secret keys using open networks. Public-key cryptography is the best way to distribute these secret keys. To use public-key cryptography, we need to build a public-key infrastructure (PKI) to support and manage public-key certificates and certificate authority (CA) networks. In particular, PKIs are set up to perform the following functions: Determine the legitimacy of users before issuing public-key certificates to them. Issue public-key certificates upon user requests. Extend public-key certificates valid time upon user requests. Revoke public-key certificates upon users' requests or when the corresponding private keys are compromised. Store and manage public-key certificates. Prevent digital signature signers from denying their signatures. Support CA networks to allow different CAs to authenticate public-key certificates issued by other CAs. 
X.509: https://certificatedecoder.dev/?gclid=EAIaIQobChMI0M731O6G6gIVVSQrCh04bQaAEAAYASAAEgKRkPD_BwE","title":"Public Key Infrastructure"},{"location":"level101/security/network_security/#ipsec-a-security-protocol-at-the-network-layer","text":"IPsec is a major security protocol at the network layer. IPsec provides a potent platform for constructing virtual private networks (VPNs). VPNs are private networks overlaid on public networks. The purpose of deploying cryptographic algorithms at the network layer is to encrypt or authenticate IP packets (either just the payloads or the whole packets). IPsec also specifies how to exchange keys. Thus, IPsec consists of authentication protocols, encryption protocols, and key exchange protocols. They are referred to, respectively, as authentication header (AH), encapsulating security payload (ESP), and Internet key exchange (IKE).","title":"IPsec: A Security Protocol at the Network Layer"},{"location":"level101/security/network_security/#pgp-smime-email-security","text":"There are several security protocols at the application layer. The most used of these protocols are the email security protocols, namely PGP and S/MIME. SMTP (\u201cSimple Mail Transfer Protocol\u201d) is used for sending and delivering mail from a client to a server via port 25: it\u2019s the outgoing server. By contrast, POP (\u201cPost Office Protocol\u201d) allows the users to pick up the message and download it into their inbox: it\u2019s the incoming server. The latest version of the Post Office Protocol is named POP3, and it\u2019s been used since 1996; it uses port 110. PGP PGP implements all major cryptographic algorithms, the ZIP compression algorithm, and the Base64 encoding algorithm. It can be used to authenticate a message, encrypt a message, or both. PGP follows this general process: authentication, ZIP compression, encryption, and Base64 encoding. 
The Base64 encoding procedure makes the message ready for SMTP transmission. GPG (GnuPG) GnuPG is another free encryption standard that companies may use that is based on OpenPGP. GnuPG serves as a replacement for Symantec\u2019s PGP. The main difference is the supported algorithms. However, GnuPG plays nice with PGP by design. Because GnuPG is open, some businesses would prefer the technical support and the user interface that come with Symantec\u2019s PGP. It is important to note that there are some nuances in the compatibility between GnuPG and PGP, such as the support for certain algorithms, but in most applications such as email, there are workarounds. One such algorithm is the IDEA module, which isn\u2019t included in GnuPG out of the box due to patent issues. S/MIME SMTP can only handle 7-bit ASCII text messages (UTF-8 extensions can be used to alleviate this limitation). While POP can handle other content types besides 7-bit ASCII, POP may, under a common default setting, download all the messages stored in the mail server to the user's local computer. After that, POP removes these messages from the mail server. This makes it difficult for the users to read their messages from multiple computers. The Multipurpose Internet Mail Extension protocol (MIME) was designed to support sending and receiving email messages in various formats, including nontext files generated by word processors, graphics files, sound files, and video clips. Moreover, MIME allows a single message to include mixed types of data in any combination of these formats. The Internet Mail Access Protocol (IMAP), operated on TCP port 143 (for unencrypted connections only), stores incoming email messages in the mail server until the user deletes them deliberately (this is configurable on both server and client, just as with POP). This allows the users to access their mailbox from multiple machines and download messages to a local machine without deleting them from the mailbox in the mail server. 
SSL/TLS SSL uses a PKI to decide if a server\u2019s public key is trustworthy by requiring servers to use a security certificate signed by a trusted CA. When Netscape Navigator 1.0 was released, it trusted a single CA operated by the RSA Data Security corporation. The server\u2019s public RSA keys were stored in the security certificate, which could then be used by the browser to establish a secure communication channel. The security certificates we use today still rely on the same standard (named X.509) that Netscape Navigator 1.0 used back then. Netscape intended to train users (though this didn\u2019t work out later) to differentiate secure communications from insecure ones, so they put a lock icon next to the address bar. When the lock is open, the communication is insecure. A closed lock means communication has been secured with SSL, which required the server to provide a signed certificate. You\u2019re obviously familiar with this icon as it\u2019s been in every browser ever since. The engineers at Netscape truly created a standard for secure internet communications. A year after releasing SSL 2.0, Netscape fixed several security issues and released SSL 3.0, a protocol that, albeit being officially deprecated since June 2015, remains in use in certain parts of the world more than 20 years after its introduction. To standardize SSL, the Internet Engineering Task Force (IETF) created a slightly modified SSL 3.0 and, in 1999, unveiled it as Transport Layer Security (TLS) 1.0. The name change between SSL and TLS continues to confuse people today. Officially, TLS is the new SSL, but in practice, people use SSL and TLS interchangeably to talk about any version of the protocol. 
Must See: https://tls.ulfheim.net/ https://davidwong.fr/tls13/","title":"PGP & S/MIME : Email Security"},{"location":"level101/security/network_security/#network-perimeter-security","text":"Let us see how we keep a check on the perimeter, i.e., the edges, the first layer of protection.","title":"Network Perimeter Security"},{"location":"level101/security/network_security/#general-firewall-framework","text":"Firewalls are needed because encryption algorithms cannot effectively stop malicious packets from getting into an edge network. This is because IP packets, regardless of whether they are encrypted, can always be forwarded into an edge network. Firewalls that were developed in the 1990s are important instruments to help restrict network access. A firewall may be a hardware device, a software package, or a combination of both. Packets flowing into the internal network from the outside should be evaluated before they are allowed to enter. One of the critical elements of a firewall is its ability to examine packets without imposing a negative impact on communication speed while providing security protections for the internal network. The packet inspection that is carried out by firewalls can be done using several different methods. Based on the particular method used by the firewall, it can be characterized as either a packet filter, circuit gateway, application gateway, or dynamic packet filter.","title":"General Firewall Framework"},{"location":"level101/security/network_security/#packet-filters","text":"A packet filter inspects ingress packets coming to an internal network from outside and inspects egress packets going outside from an internal network. Packet filtering only inspects IP headers and TCP headers, not the payloads generated at the application layer. A packet-filtering firewall uses a set of rules to determine whether a packet should be allowed or denied to pass through. 
2 types: Stateless It treats each packet as an independent object, and it does not keep track of any previously processed packets. In other words, stateless filtering inspects a packet when it arrives and makes a decision without leaving any record of the packet being inspected. Stateful Stateful filtering, also referred to as connection-state filtering, keeps track of connections between an internal host and an external host. A connection state (or state, for short) indicates whether it is a TCP connection or a UDP connection and whether the connection is established.","title":"Packet Filters"},{"location":"level101/security/network_security/#circuit-gateways","text":"Circuit gateways, also referred to as circuit-level gateways, are typically operated at the transport layer. They evaluate the information of the IP addresses and the port numbers contained in TCP (or UDP) headers and use it to determine whether to allow or to disallow an internal host and an external host to establish a connection. It is common practice to combine packet filters and circuit gateways to form a dynamic packet filter (DPF).","title":"Circuit Gateways"},{"location":"level101/security/network_security/#application-gatewaysalg","text":"Aka PROXY Servers An Application Level Gateway (ALG) acts as a proxy for internal hosts, processing service requests from external clients. An ALG performs deep inspections on each IP packet (ingress or egress). In particular, an ALG inspects application program formats contained in the packet (e.g., MIME format or SQL format) and examines whether its payload is permitted. Thus, an ALG may be able to detect a computer virus contained in the payload. Because an ALG inspects packet payloads, it may be able to detect malicious code and quarantine suspicious packets, in addition to blocking packets with suspicious IP addresses and TCP ports. 
On the other hand, an ALG also incurs substantial computation and space overheads.","title":"Application Gateways(ALG)"},{"location":"level101/security/network_security/#trusted-systems-bastion-hosts","text":"A Trusted Operating System (TOS) is an operating system that meets a particular set of security requirements. Whether an operating system can be trusted or not depends on several elements. For example, for an operating system on a particular computer to be certified trusted, one needs to validate that, among other things, the following four requirements are satisfied: Its system design contains no defects; Its system software contains no loopholes; Its system is configured properly; and Its system management is appropriate. Bastion Hosts Bastion hosts are computers with strong defence mechanisms. They often serve as host computers for implementing application gateways, circuit gateways, and other types of firewalls. A bastion host is operated on a trusted operating system that must not contain unnecessary functionalities or programs. This measure helps to reduce error probabilities and makes it easier to conduct security checks. Only those network application programs that are necessary, for example, SSH, DNS, SMTP, and authentication programs, are installed on a bastion host. Bastion hosts are also primarily used as controlled ingress points so that the security monitoring can focus more narrowly on actions happening at a single point closely.","title":"Trusted Systems & Bastion Hosts"},{"location":"level101/security/network_security/#common-techniques-scannings-packet-capturing","text":"","title":"Common Techniques & Scannings, Packet Capturing"},{"location":"level101/security/network_security/#scanning-ports-with-nmap","text":"Nmap (\"Network Mapper\") is a free and open-source (license) utility for network discovery and security auditing. 
Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime. The best thing about Nmap is it\u2019s free and open-source and is very flexible and versatile Nmap is often used to determine alive hosts in a network, open ports on those hosts, services running on those open ports, and version identification of that service on that port. More at http://scanme.nmap.org/ nmap [scan type] [options] [target specification] Nmap uses 6 different port states: Open \u2014 An open port is one that is actively accepting TCP, UDP or SCTP connections. Open ports are what interests us the most because they are the ones that are vulnerable to attacks. Open ports also show the available services on a network. Closed \u2014 A port that receives and responds to Nmap probe packets but there is no application listening on that port. Useful for identifying that the host exists and for OS detection. Filtered \u2014 Nmap can\u2019t determine whether the port is open because packet filtering prevents its probes from reaching the port. Filtering could come from firewalls or router rules. Often little information is given from filtered ports during scans as the filters can drop the probes without responding or respond with useless error messages e.g. destination unreachable. Unfiltered \u2014 Port is accessible but Nmap doesn\u2019t know if it is open or closed. Only used in ACK scan which is used to map firewall rulesets. Other scan types can be used to identify whether the port is open. Open/filtered \u2014 Nmap is unable to determine between open and filtered. This happens when an open port gives no response. No response could mean that the probe was dropped by a packet filter or any response is blocked. Closed/filtered \u2014 Nmap is unable to determine whether a port is closed or filtered. 
Only used in the IP ID idle scan.","title":"Scanning Ports with Nmap"},{"location":"level101/security/network_security/#types-of-nmap-scan","text":"TCP Connect TCP Connect scan completes the 3-way handshake. If a port is open, the operating system completes the TCP three-way handshake and the port scanner immediately closes the connection to avoid DoS. This is \u201cnoisy\u201d because the services can log the sender IP address and might trigger Intrusion Detection Systems. UDP Scan This scan checks to see if any UDP ports are listening. Since UDP does not respond with a positive acknowledgement like TCP and only responds to an incoming UDP packet when the port is closed, SYN Scan SYN scan is another form of TCP scanning. This scan type is also known as \u201chalf-open scanning\u201d because it never actually opens a full TCP connection. The port scanner generates a SYN packet. If the target port is open, it will respond with a SYN-ACK packet. The scanner host responds with an RST packet, closing the connection before the handshake is completed. If the port is closed but unfiltered, the target will instantly respond with an RST packet. SYN scan has the advantage that the individual services never actually receive a connection. FIN Scan This is a stealthy scan, like the SYN scan, but sends a TCP FIN packet instead. ACK Scan ACK scanning determines whether the port is filtered or not. Null Scan Another very stealthy scan that sets all the TCP header flags to off or null. This is not normally a valid packet and some hosts will not know what to do with this. XMAS Scan Similar to the NULL scan except that all the flags in the TCP header are set to on. RPC Scan This special type of scan looks for machines answering to RPC (Remote Procedure Call) services. IDLE Scan It is a super stealthy method whereby the scan packets are bounced off an external host. You don\u2019t need to have control over the other host but it does have to set up and meet certain requirements. 
You must input the IP address of our \u201czombie\u201d host and what port number to use. It is one of the more controversial options in Nmap since it only has a use for malicious attacks. Scan Techniques A couple of scan techniques which can be used to gain more information about a system and its ports. You can read more at https://medium.com/infosec-adventures/nmap-cheatsheet-a423fcdda0ca","title":"Types of Nmap Scan:"},{"location":"level101/security/network_security/#openvas","text":"OpenVAS is a full-featured vulnerability scanner. OpenVAS is a framework of services and tools that provides a comprehensive and powerful vulnerability scanning and management package OpenVAS, which is an open-source program, began as a fork of the once-more-popular scanning program, Nessus. OpenVAS is made up of three main parts. These are: a regularly updated feed of Network Vulnerability Tests (NVTs); a scanner, which runs the NVTs; and an SQLite 3 database for storing both your test configurations and the NVTs\u2019 results and configurations. https://www.greenbone.net/en/install_use_gce/","title":"OpenVAS"},{"location":"level101/security/network_security/#wireshark","text":"Wireshark is a protocol analyzer. This means Wireshark is designed to decode not only packet bits and bytes but also the relations between packets and protocols. Wireshark understands protocol sequences. 
A simple demo of Wireshark Capture only udp packets: Capture filter = \u201cudp\u201d Capture only tcp packets: Capture filter = \u201ctcp\u201d TCP/IP 3 way Handshake Filter by IP address: displays all traffic from IP, be it source or destination ip.addr == 192.168.1.1 Filter by source address: display traffic only from IP source ip.src == 192.168.0.1 Filter by destination: display traffic only to IP destination ip.dst == 192.168.0.1 Filter by IP subnet: display traffic from subnet, be it source or destination ip.addr == 192.168.0.1/24 Filter by protocol: filter traffic by protocol name dns http ftp arp ssh telnet icmp Exclude IP address: remove traffic from and to IP address !(ip.addr == 192.168.0.1) Display traffic between two specific subnets ip.addr == 192.168.0.1/24 and ip.addr == 192.168.1.1/24 Display traffic between two specific workstations ip.addr == 192.168.0.1 and ip.addr == 192.168.0.2 Filter by MAC eth.addr == 00:50:7f:c5:b6:78 Filter TCP port tcp.port == 80 Filter TCP port source tcp.srcport == 80 Filter TCP port destination tcp.dstport == 80 Find user agents http.user_agent contains Firefox !http.user_agent contains || !http.user_agent contains Chrome Filter out ARP, ICMP, and DNS traffic !(arp or icmp or dns) Filter IP address and port tcp.port == 80 && ip.addr == 192.168.0.1 Filter all http requests http.request Filter all http requests and responses http.request or http.response Filter three way handshake tcp.flags.syn==1 or (tcp.seq==1 and tcp.ack==1 and tcp.len==0 and tcp.analysis.initial_rtt) Find files by type frame matches \u201c(attachment|tar|exe|zip|pdf)\u201d Find traffic based on keyword tcp contains facebook frame contains facebook Detecting SYN Floods tcp.flags.syn == 1 and tcp.flags.ack == 0 Wireshark Promiscuous Mode - By default, Wireshark only captures packets going to and from the computer where it runs. 
By checking the box to run Wireshark in Promiscuous Mode in the Capture Settings, you can capture most of the traffic on the LAN.","title":"WireShark"},{"location":"level101/security/network_security/#dumpcap","text":"Dumpcap is a network traffic dump tool. It captures packet data from a live network and writes the packets to a file. Dumpcap\u2019s native capture file format is pcapng, which is also the format used by Wireshark. By default, Dumpcap uses the pcap library to capture traffic from the first available network interface and writes the received raw packet data, along with the packets\u2019 time stamps, into a pcapng file. The capture filter syntax follows the rules of the pcap library. The Wireshark command-line utility called 'dumpcap.exe' can be used to capture LAN traffic over an extended period of time. Wireshark itself can also be used, but dumpcap does not significantly utilize the computer's memory while capturing for long periods.","title":"DumpCap"},{"location":"level101/security/network_security/#daemonlogger","text":"Daemonlogger is a packet logging application designed specifically for use in Network Security Monitoring (NSM) environments. The biggest benefit Daemonlogger provides is that, like Dumpcap, it is simple to use for capturing packets. In order to begin capturing, you need only to invoke the command and specify an interface. daemonlogger -i eth1 This option, by default, will begin capturing packets and logging them to the current working directory. Packets will be collected until the capture file size reaches 2 GB, and then a new file will be created. This will continue indefinitely until the process is halted.","title":"DaemonLogger"},{"location":"level101/security/network_security/#netsniff-ng","text":"Netsniff-NG is a high-performance packet capture utility. While the utilities we\u2019ve discussed to this point rely on Libpcap for capture, Netsniff-NG utilizes zero-copy mechanisms to capture packets. 
This is done with the intent to support full packet capture over high throughput links. To begin capturing packets with Netsniff-NG, we have to specify an input and output. In most cases, the input will be a network interface, and the output will be a file or folder on disk. netsniff-ng -i eth1 -o data.pcap","title":"NetSniff-NG"},{"location":"level101/security/network_security/#netflow","text":"NetFlow is a feature that was introduced on Cisco routers around 1996 that provides the ability to collect IP network traffic as it enters or exits an interface. By analyzing the data provided by NetFlow, a network administrator can determine things such as the source and destination of traffic, class of service, and the causes of congestion. A typical flow monitoring setup (using NetFlow) consists of three main components: Flow exporter: aggregates packets into flows and exports flow records towards one or more flow collectors. Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter. Analysis application: analyzes received flow data in the context of intrusion detection or traffic profiling, for example. Routers and switches that support NetFlow can collect IP traffic statistics on all interfaces where NetFlow is enabled, and later export those statistics as NetFlow records toward at least one NetFlow collector, typically a server that does the actual traffic analysis.","title":"Netflow"},{"location":"level101/security/network_security/#ids","text":"An IDS is a security solution that detects security-related events in your environment but does not block them. IDS sensors, which can be software-based or hardware-based, are used to collect and analyze the network traffic. These sensors are available in two varieties, network IDS and host IDS. A host IDS is a server-specific agent running on a server with a minimum of overhead to monitor the operating system. 
A network IDS can be embedded in a networking device, a standalone appliance, or a module monitoring the network traffic. Signature Based IDS The signature-based IDS monitors the network traffic or observes the system and sends an alarm if a known malicious event is happening. It does so by comparing the data flow against a database of known attack patterns. These signatures explicitly define what traffic or activity should be considered as malicious. Signature-based detection has been the bread and butter of network-based defensive security for over a decade, partially because it is very similar to how malicious activity is detected at the host level with antivirus utilities. The formula is fairly simple: an analyst observes a malicious activity, derives indicators from the activity and develops them into signatures, and then those signatures will alert whenever the activity occurs again. Examples: Snort and Suricata. Policy-Based IDS The policy-based IDSs (mainly host IDSs) trigger an alarm whenever a violation occurs against the configured policy. This configured policy is or should be a representation of the security policies. This type of IDS is flexible and can be customized to a company's network requirements because it knows exactly what is permitted and what is not. On the other hand, the signature-based systems rely on vendor specifics and default settings. Anomaly Based IDS The anomaly-based IDS looks for traffic that deviates from the normal, but the definition of what is a normal network traffic pattern is the tricky part. Two types of anomaly-based IDS exist: statistical and nonstatistical anomaly detection. Statistical anomaly detection learns the traffic patterns interactively over a period of time. In the nonstatistical approach, the IDS has a predefined configuration of the supposedly acceptable and valid traffic patterns. 
Host-Based IDS & Network-Based IDS A host IDS can be described as a distributed agent residing on each server of the network that needs protection. These distributed agents are tied very closely to the underlying operating system. Network IDSs, on the other hand, can be described as intelligent sniffing devices. Data (raw packets) is captured from the network by a network IDS, whereas host IDSs capture the data from the host on which they are installed. Honeypots The use of decoy machines to direct intruders' attention away from the machines under protection is a major technique to preclude intrusion attacks. Any device, system, directory, or file used as a decoy to lure attackers away from important assets and to collect intrusion or abusive behaviours is referred to as a honeypot. A honeypot may be implemented as a physical device or as an emulation system. The idea is to set up decoy machines in a LAN, or decoy directories/files in a file system and make them appear important, but with several exploitable loopholes, to lure attackers to attack these machines or directories/files, so that other machines, directories, and files can evade intruders' attentions. A decoy machine may be a host computer or a server computer. Likewise, we may also set up decoy routers or even decoy LANs.","title":"IDS"},{"location":"level101/security/network_security/#chinks-in-the-armour-tcpip-security-issues","text":"","title":"Chinks In The Armour (TCP/IP Security Issues)"},{"location":"level101/security/network_security/#ip-spoofing","text":"In this type of attack, the attacker replaces the IP address of the sender, or in some rare cases the destination, with a different address. IP spoofing is normally used to exploit a target host. In other cases, it is used to start a denial-of-service (DoS) attack. In a DoS attack, an attacker modifies the IP packet to mislead the target host into accepting the original packet as a packet sourced at a trusted host. 
The attacker must know the IP address of the trusted host to modify the packet headers (source IP address) so that it appears that the packets are coming from that host. IP Spoofing Detection Techniques Direct TTL Probes In this technique, we send a packet that triggers a reply to the host of the suspected spoofed IP and compare the TTL of the reply with that of the suspect packet; if the TTL in the reply is not the same as in the packet being checked, it is a spoofed packet. This technique is successful when the attacker is in a different subnet from the victim. IP Identification Number Send a probe that triggers a reply to the host of the suspected spoofed traffic and compare the IP ID of the reply with that of the suspect traffic. If the IP IDs are not close to the value in the packet being checked, the suspect traffic is spoofed. TCP Flow Control Method Attackers sending spoofed TCP packets will not receive the target\u2019s SYN-ACK packets. Attackers cannot, therefore, be responsive to changes in the congestion window size. When the receiver still receives traffic even after a window size is exhausted, most probably the packets are spoofed.","title":"IP Spoofing"},{"location":"level101/security/network_security/#covert-channel","text":"A covert or clandestine channel can be best described as a pipe or communication channel between two entities that can be exploited by a process or application transferring information in a manner that violates the system's security specifications. More specifically for TCP/IP, in some instances, covert channels are established, and data can be secretly passed between two end systems. Ex: ICMP resides at the Internet layer of the TCP/IP protocol suite and is implemented in all TCP/IP hosts. Based on the specifications of the ICMP Protocol, an ICMP Echo Request message should have an 8-byte header and a 56-byte payload. The ICMP Echo Request packet should not carry any data in the payload. However, these packets are often used to carry secret information. 
The ICMP packets are altered slightly to carry secret data in the payload. This makes the size of the packet larger, but no control exists in the protocol stack to defeat this behaviour. The alteration of ICMP packets allows intruders to program specialized client-server pairs. These small pieces of code export confidential information without alerting the network administrator. ICMP can be leveraged for more than data exfiltration. For eg. some C&C tools such as Loki used ICMP channel to establish encrypted interactive session back in 1996. Deep packet inspection has since come a long way. A lot of IDS/IPS detect ICMP tunnelling. Check for echo responses that do not contain the same payload as request Check for the volume of ICMP traffic especially for volumes beyond an acceptable threshold","title":"Covert Channel"},{"location":"level101/security/network_security/#ip-fragmentation-attack","text":"The TCP/IP protocol suite, or more specifically IP, allows the fragmentation of packets.(this is a feature & not a bug) IP fragmentation offset is used to keep track of the different parts of a datagram. The information or content in this field is used at the destination to reassemble the datagrams All such fragments have the same Identification field value, and the fragmentation offset indicates the position of the current fragment in the context of the original packet. Many access routers and firewalls do not perform packet reassembly. In normal operation, IP fragments do not overlap, but attackers can create artificially fragmented packets to mislead the routers or firewalls. Usually, these packets are small and almost impractical for end systems because of data and computational overhead. A good example of an IP fragmentation attack is the Ping of Death attack. The Ping of Death attack sends fragments that, when reassembled at the end station, create a larger packet than the maximum permissible length. 
TCP Flags Data exchange using TCP does not happen until a three-way handshake has been completed. This handshake uses different flags to influence the way TCP segments are processed. There are 6 bits in the TCP header that are often called flags: Urgent pointer field (URG), Acknowledgment field (ACK), Push function (PSH), Reset the connection (RST), Synchronize sequence numbers (SYN), and the sender is finished with this connection (FIN). Abuse of the normal operation or settings of these flags can be used by attackers to launch DoS attacks. This causes network servers or web servers to crash or hang. | SYN | FIN | PSH | RST | Validity| |------|------|-------|------|---------| | 1 |1 |0 |0 |Illegal Combination | 1 |1 |1 |0 |Illegal Combination | 1 |1 |0 |1 |Illegal Combination | 1 |1 |1 |1 |Illegal Combination The attacker's ultimate goal is to write special programs or pieces of code that can construct these illegal combinations resulting in an efficient DoS attack. SYN FLOOD The timers (or lack of certain timers) in the 3 way handshake are often used and exploited by attackers to disable services or even to enter systems. After step 2 of the three-way handshake, no limit is set on the time to wait after receiving a SYN. The attacker initiates many connection requests to the webserver of Company XYZ (almost certainly with a spoofed IP address). The SYN+ACK packets (Step 2) sent by the web server back to the originating source IP address are not replied to. This leaves a TCP session half-open on the webserver. Multiple packets cause multiple TCP sessions to stay open. Based on the hardware limitations of the server, a limited number of TCP sessions can stay open, and as a result, the webserver refuses further connection establishment attempts from any host as soon as a certain limit is reached. These half-open connections need to be completed or timed out before new connections can be established. 
FIN Attack In normal operation, the sender sets the TCP FIN flag indicating that no more data will be transmitted and the connection can be closed down. This is a four-way handshake mechanism, with both sender and receiver expected to send an acknowledgement on a received FIN packet. During an attack that is trying to kill connections, a spoofed FIN packet is constructed. This packet also has the correct sequence number, so the packets are seen as valid by the targeted host. These sequence numbers are easy to predict. This process is referred to as TCP sequence number prediction, whereby the attacker either sniffs the current Sequence and Acknowledgment (SEQ/ACK) numbers of the connection or can algorithmically predict these numbers.","title":"IP Fragmentation Attack"},{"location":"level101/security/network_security/#connection-hijacking","text":"An authorized user (Employee X) sends HTTP requests over a TCP session with the webserver. The web server accepts the packets from Employee X only when the packet has the correct SEQ/ACK numbers. As seen previously, these numbers are important for the webserver to distinguish between different sessions and to make sure it is still talking to Employee X. Imagine that the cracker starts sending packets to the web server spoofing the IP address of Employee X, using the correct SEQ/ACK combination. The web server accepts the packet and increments the ACK number. In the meantime, Employee X continues to send packets but with incorrect SEQ/ACK numbers. As a result of sending unsynchronized packets, all data from Employee X is discarded when received by the webserver. The attacker pretends to be Employee X using the correct numbers. This finally results in the cracker hijacking the connection, whereby Employee X is completely confused and the webserver replies assuming the cracker is sending correct synchronized data. 
STEPS: The attacker examines the traffic flows with a network monitor and notices traffic from Employee X to a web server. The web server returns or echoes data back to the origination station (Employee X). Employee X acknowledges the packet. The cracker launches a spoofed packet to the server. The web server responds to the cracker. The cracker starts verifying SEQ/ACK numbers to double-check success. At this time, the cracker takes over the session from Employee X, which results in a session hanging for Employee X. The cracker can start sending traffic to the webserver. The web server returns the requested data to confirm delivery with the correct ACK number. The cracker can continue to send data (keeping track of the correct SEQ/ACK numbers) until eventually setting the FIN flag to terminate the session.","title":"Connection Hijacking"},{"location":"level101/security/network_security/#buffer-overflow","text":"A buffer is a temporary data storage area used to store program code and data. When a program or process tries to store more data in a buffer than it was originally anticipated to hold, a buffer overflow occurs. Buffers are temporary storage locations in memory (memory or buffer sizes are often measured in bytes) that can store a fixed amount of data in bytes. When more data is retrieved than can be stored in a buffer location, the additional information must go into an adjacent buffer, resulting in overwriting the valid data held in them. Mechanism: Buffer overflow vulnerabilities exist in different types. But the overall goal for all buffer overflow attacks is to take over the control of a privileged program and, if possible, the host. The attacker has two tasks to achieve this goal. First, the dirty code needs to be available in the program's code address space. Second, the privileged program should jump to that particular part of the code, which ensures that the proper parameters are loaded into memory. 
The first task can be achieved in two ways: by injecting the code in the right address space or by using the existing code and modifying certain parameters slightly. The second task is a little more complex because the program's control flow needs to be modified to make the program jump to the dirty code. CounterMeasure: The most important approach is to have a concerted focus on writing correct code. A second method is to make the data buffers (memory locations) address space of the program code non-executable. This type of address space makes it impossible to execute code, which might be infiltrated in the program's buffers during an attack.","title":"Buffer Overflow"},{"location":"level101/security/network_security/#more-spoofing","text":"Address Resolution Protocol Spoofing The Address Resolution Protocol (ARP) provides a mechanism to resolve, or map, a known IP address to a MAC sublayer address. Using ARP spoofing, the cracker can exploit this hardware address authentication mechanism by spoofing the hardware address of Host B. Basically, the attacker can convince any host or network device on the local network that the cracker's workstation is the host to be trusted. This is a common method used in a switched environment. ARP spoofing can be prevented with the implementation of static ARP tables in all the hosts and routers of your network. Alternatively, you can implement an ARP server that responds to ARP requests on behalf of the target host. DNS Spoofing DNS spoofing is the method whereby the hacker convinces the target machine that the system it wants to connect to is the machine of the cracker. The cracker modifies some records so that name entries of hosts correspond to the attacker's IP address. There have been instances in which the complete DNS server was compromised by an attack. To counter DNS spoofing, the reverse lookup detects these attacks. The reverse lookup is a mechanism to verify the IP address against a name. 
The IP address and name files are usually kept on different servers to make compromise much more difficult","title":"More Spoofing"},{"location":"level101/security/threats_attacks_defences/","text":"Part III: Threats, Attacks & Defense DNS Protection Cache Poisoning Attack Since DNS responses are cached, a quick response can be provided for repeated translations. DNS negative queries are also cached, e.g., misspelt words, and all cached data periodically times out. Cache poisoning is an issue in what is known as pharming. This term is used to describe a hacker\u2019s attack in which a website\u2019s traffic is redirected to a bogus website by forging the DNS mapping. In this case, an attacker attempts to insert a fake address record for an Internet domain into the DNS. If the server accepts the fake record, the cache is poisoned and subsequent requests for the address of the domain are answered with the address of a server controlled by the attacker. As long as the fake entry is cached by the server, browsers or e-mail servers will automatically go to the address provided by the compromised DNS server. the typical time to live (TTL) for cached entries is a couple of hours, thereby permitting ample time for numerous users to be affected by the attack. DNSSEC (Security Extension) The long-term solution to these DNS problems is authentication. If a resolver cannot distinguish between valid and invalid data in a response, then add source authentication to verify that the data received in response is equal to the data entered by the zone administrator DNS Security Extensions (DNSSEC) protects against data spoofing and corruption and provides mechanisms to authenticate servers and requests, as well as mechanisms to establish authenticity and integrity. When authenticating DNS responses, each DNS zone signs its data using a private key. It is recommended that this signing be done offline and in advance. 
The query for a particular record returns the requested resource record set (RRset) and signature (RRSIG) of the requested resource record set. The resolver then authenticates the response using a public key, which is pre-configured or learned via a sequence of key records in the DNS hierarchy. The goals of DNSSEC are to provide authentication and integrity for DNS responses without confidentiality or DDoS protection. BGP BGP stands for Border Gateway Protocol. It is a routing protocol that exchanges routing information among multiple Autonomous Systems (AS). An Autonomous System is a collection of routers or networks with the same network policy usually under single administrative control. BGP tells routers which hop to use in order to reach the destination network. BGP is used for both communicating information among routers in an AS (interior) and between multiple ASes (exterior). How BGP Works BGP is responsible for finding a path to a destination router & the path it chooses should be the shortest and most reliable one. Strictly speaking, BGP itself is a path-vector protocol: routers exchange the AS paths they know and choose among them using policy and path attributes. The link-state approach described next is the one used by interior routing protocols such as OSPF and IS-IS. With a link-state protocol, each router broadcasts to all other routers in the network the state of its links and IP subnets. Each router then receives information from the other routers and constructs a complete topology view of the entire network. The next-hop routing table is based on this topology view. The link-state protocol uses a famous algorithm in the field of computer science, Dijkstra\u2019s shortest path algorithm: We start from our router considering the path cost to all our direct neighbours. The shortest path is then taken. We then re-look at all our neighbours that we can reach and update our link state table with the cost information. We then continue taking the shortest path until every router has been visited. BGP Vulnerabilities By corrupting the BGP routing table we are able to influence the direction traffic flows on the internet! 
This action is known as BGP hijacking. Bogus route advertisements injected into the BGP-distributed routing database, whether by malicious sources or accidentally by routers, can disrupt Internet backbone operations. Blackholing traffic: A blackhole route is a network route, i.e., routing table entry, that goes nowhere and packets matching the route prefix are dropped or ignored. Blackhole routes can only be detected by monitoring the lost traffic. Blackhole routes are the best defence against many common viral attacks where the traffic is dropped from infected machines to/from command & control hosts. Infamous BGP injection attack on YouTube: In 2008, Pakistan decided to block YouTube by creating a BGP route that led into a black hole. Instead, this routing information got transmitted to a Hong Kong ISP and from there accidentally got propagated to the rest of the world, meaning millions were routed through to this black hole and were therefore unable to access YouTube. Potentially, the greatest risk to BGP occurs in a denial of service attack in which a router is flooded with more packets than it can handle. Network overload and router resource exhaustion happen when the network begins carrying an excessive number of BGP messages, overloading the router control processors, memory, routing table and reducing the bandwidth available for data traffic. Refer: https://medium.com/bugbountywriteup/bgp-the-weak-link-in-the-internet-what-is-bgp-and-how-do-hackers-exploit-it-d899a68ba5bb Route flapping is another type of attack. Route flapping refers to repetitive changes to the BGP routing table, often several times a minute. Withdrawing and re-advertising at a high rate can cause a serious problem for routers since they propagate the announcements of routes. If these route flaps happen fast enough, e.g., 30 to 50 times per second, the router becomes overloaded, which eventually prevents convergence on valid routes. 
The potential impact for Internet users is a slowdown in message delivery, and in some cases, packets may not be delivered at all. BGP Security Border Gateway Protocol Security recommends the use of BGP peer authentication since it is one of the strongest mechanisms for preventing malicious activity. The authentication mechanisms are Internet Protocol Security (IPsec) or BGP MD5. Another method, known as prefix limits, can be used to avoid filling router tables. In this approach, routers should be configured to disable or terminate a BGP peering session, and issue warning messages to administrators when a neighbour sends in excess of a preset number of prefixes. IETF is currently working on improving this space Web-Based Attacks HTTP Response Splitting Attacks HTTP response splitting attack may happen where the server script embeds user data in HTTP response headers without appropriate sanitation. This typically happens when the script embeds user data in the redirection URL of a redirection response (HTTP status code 3xx), or when the script embeds user data in a cookie value or name when the response sets a cookie. HTTP response splitting attacks can be used to perform web cache poisoning and cross-site scripting attacks. HTTP response splitting is the attacker\u2019s ability to send a single HTTP request that forces the webserver to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. Cross-Site Request Forgery (CSRF or XSRF) A Cross-Site Request Forgery attack tricks the victim\u2019s browser into issuing a command to a vulnerable web application. Vulnerability is caused by browsers automatically including user authentication data, session ID, IP address, Windows domain credentials, etc. with each request. Attackers typically use CSRF to initiate transactions such as transfer funds, login/logout user, close account, access sensitive data, and change account details. 
The vulnerability is caused by web browsers that automatically include credentials with each request, even for requests caused by a form, script, or image on another site. CSRF can also be dynamically constructed as part of a payload for a cross-site scripting attack All sites relying on automatic credentials are vulnerable. Popular browsers cannot prevent cross-site request forgery. Logging out of high-value sites as soon as possible can mitigate CSRF risk. It is recommended that a high-value website must require a client to manually provide authentication data in the same HTTP request used to perform any operation with security implications. Limiting the lifetime of session cookies can also reduce the chance of being used by other malicious sites. OWASP recommends website developers include a required security token in HTTP requests associated with sensitive business functions in order to mitigate CSRF attacks Cross-Site Scripting (XSS) Attacks Cross-Site Scripting occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page and then execute the script on the machine of any user that views the site. If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests that can be mistaken for those of a valid user, compromise confidential information, or execute malicious code on end-user systems. Cross-Site Scripting (XSS or CSS) attacks involve the execution of malicious scripts on the victim\u2019s browser. The victim is simply a user\u2019s host and not the server. XSS results from a failure to validate user input by a web-based application. Document Object Model (DOM) XSS Attacks The Document Object Model (DOM) based XSS does not require the webserver to receive the XSS payload for a successful attack. The attacker abuses the runtime by embedding their data on the client-side. 
An attacker can force the client (browser) to render the page with parts of the DOM controlled by the attacker. When the page is rendered and the data is processed by the page, typically by a client-side HTML-embedded script such as JavaScript, the page\u2019s code may insecurely embed the data in the page itself, thus delivering the cross-site scripting payload. There are several DOM objects which can serve as an attack vehicle for delivering malicious script to the victim\u2019s browser. Clickjacking The technique works by hiding malicious links/scripts under the cover of the content of a legitimate site. Buttons on a website actually contain invisible links, placed there by the attacker. So, an individual who clicks on an object they can visually see is actually being duped into visiting a malicious page or executing a malicious script. When mouseover is used together with clickjacking, the outcome is devastating. Facebook users have been hit by a clickjacking attack, which tricks people into \u201cliking\u201d a particular Facebook page, thus enabling the attack to spread since Memorial Day 2010. On the user side, disabling JavaScript was long considered the only viable defence against clickjacking; websites can additionally forbid being framed by other sites with the X-Frame-Options response header or the Content-Security-Policy frame-ancestors directive. DataBase Attacks & Defenses SQL injection Attacks SQL injection exploits improper input validation in database queries. A successful exploit will allow attackers to access, modify, or delete information in the database. It permits attackers to steal sensitive information stored within the backend databases of affected websites, which may include such things as user credentials, email addresses, personal information, and credit card numbers. Consider the query: SELECT USERNAME,PASSWORD from USERS where USERNAME='' AND PASSWORD=''; Here the username & password are the inputs provided by the user. Suppose an attacker gives the input as \"' OR '1'='1\" in both fields. 
Therefore the SQL query will look like: SELECT USERNAME,PASSWORD from USERS where USERNAME='' OR '1'='1' AND PASSWORD='' OR '1'='1'; This query results in a true statement & the user gets logged in. This example depicts the most basic type of SQL injection. SQL Injection Attack Defenses SQL injection can be prevented by filtering the query to eliminate malicious syntax, which involves employing tools that scan the source code. In addition, the input fields should be restricted to the absolute minimum, typically anywhere from 7-12 characters, and any data should be validated, e.g., if a user inputs an age make sure the input is an integer with a maximum of 3 digits. VPN A virtual private network (VPN) is a service that offers a secure, reliable connection over a shared public infrastructure such as the Internet. Cisco defines a VPN as an encrypted connection between private networks over a public network. To date, there are three types of VPNs: Remote access, Site-to-site, and Firewall-based. Security Breach In spite of the most aggressive steps to protect computers from attacks, attackers sometimes get through. Any event that results in a violation of any of the confidentiality, integrity, or availability (CIA) security tenets is a security breach. Denial of Service Attacks Denial of service (DoS) attacks result in downtime or inability of a user to access a system. DoS attacks impact the availability tenet of information systems security. A DoS attack is a coordinated attempt to deny service by occupying a computer to perform large amounts of unnecessary tasks. This excessive activity makes the system unavailable to perform legitimate operations. Two common types of DoS attacks are as follows: Logic attacks\u2014Logic attacks use software flaws to crash or seriously hinder the performance of remote servers. You can prevent many of these attacks by installing the latest patches to keep your software up to date. 
Flooding attacks\u2014Flooding attacks overwhelm the victim computer\u2019s CPU, memory, or network resources by sending large numbers of useless requests to the machine. Most DoS attacks target weaknesses in the overall system architecture rather than a software bug or security flaw One popular technique for launching a packet flood is a SYN flood. One of the best defences against DoS attacks is to use intrusion prevention system (IPS) software or devices to detect and stop the attack. Distributed Denial of Service Attacks DDoS attacks differ from regular DoS attacks in their scope. In a DDoS attack, attackers hijack hundreds or even thousands of Internet computers, planting automated attack agents on those systems. The attacker then instructs the agents to bombard the target site with forged messages. This overloads the site and blocks legitimate traffic. The key here is strength in numbers. The attacker does more damage by distributing the attack across multiple computers. Wiretapping Although the term wiretapping is generally associated with voice telephone communications, attackers can also use wiretapping to intercept data communications. Attackers can tap telephone lines and data communication lines. Wiretapping can be active, where the attacker makes modifications to the line. It can also be passive, where an unauthorized user simply listens to the transmission without changing the contents. Passive intrusion can include the copying of data for a subsequent active attack. Two methods of active wiretapping are as follows: Between-the-lines wiretapping\u2014This type of wiretapping does not alter the messages sent by the legitimate user but inserts additional messages into the communication line when the legitimate user pauses. Piggyback-entry wiretapping\u2014This type of wiretapping intercepts and modifies the original message by breaking the communications line and routing the message to another computer that acts as a host. 
Backdoors Software developers sometimes include hidden access methods, called backdoors, in their programs. Backdoors give developers or support personnel easy access to a system without having to struggle with security controls. The problem is that backdoors don\u2019t always stay hidden. When an attacker discovers a backdoor, he or she can use it to bypass existing security controls such as passwords, encryption, and so on. Where legitimate users log on through front doors using a user ID and password, attackers use backdoors to bypass these normal access controls. Malicious Attacks Birthday Attack Once an attacker compromises a hashed password file, a birthday attack is performed. A birthday attack is a type of cryptographic attack that is used to make a brute-force attack of one-way hashes easier. It is a mathematical exploit that is based on the birthday problem in probability theory. Further Reading: https://www.sciencedirect.com/topics/computer-science/birthday-attack https://www.internetsecurity.tips/birthday-attack/ Brute-Force Password Attacks In a brute-force password attack, the attacker tries different passwords on a system until one of them is successful. Usually, the attacker employs a software program to try all possible combinations of a likely password, user ID, or security code until it locates a match. This occurs rapidly and in sequence. This type of attack is called a brute-force password attack because the attacker simply hammers away at the code. There is no skill or stealth involved\u2014just brute force that eventually breaks the code. Further Reading: https://owasp.org/www-community/attacks/Brute_force_attack https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks Dictionary Password Attacks A dictionary password attack is a simple attack that relies on users making poor password choices. 
In a dictionary password attack, a simple password-cracker program takes all the words from a dictionary file and attempts to log on by entering each dictionary entry as a password. Further Reading: https://capec.mitre.org/data/definitions/16.html Replay Attacks Replay attacks involve capturing data packets from a network and retransmitting them to produce an unauthorized effect. The receipt of duplicate, authenticated IP packets may disrupt service or have some other undesired consequence. Systems can be broken through replay attacks when attackers reuse old messages or parts of old messages to deceive system users. This helps intruders to gain information that allows unauthorized access into a system. Further reading: https://study.com/academy/lesson/replay-attack-definition-examples-prevention.html Man-in-the-Middle Attacks A man-in-the-middle attack takes advantage of the multihop process used by many types of networks. In this type of attack, an attacker intercepts messages between two parties before transferring them on to their intended destination. Web spoofing is a type of man-in-the-middle attack in which the user believes a secure session exists with a particular web server. In reality, the secure connection exists only with the attacker, not the webserver. The attacker then establishes a secure connection with the webserver, acting as an invisible go-between. The attacker passes traffic between the user and the webserver. In this way, the attacker can trick the user into supplying passwords, credit card information, and other private data. Further Reading: https://owasp.org/www-community/attacks/Man-in-the-middle_attack Masquerading In a masquerade attack, one user or computer pretends to be another user or computer. Masquerade attacks usually include one of the other forms of active attacks, such as IP address spoofing or replaying. 
Attackers can capture authentication sequences and then replay them later to log on again to an application or operating system. For example, an attacker might monitor usernames and passwords sent to a weak web application. The attacker could then use the intercepted credentials to log on to the web application and impersonate the user. Further Reading: https://dl.acm.org/doi/book/10.5555/2521792 https://ieeexplore.ieee.org/document/1653228 Eavesdropping Eavesdropping, or sniffing, occurs when a host sets its network interface to promiscuous mode and copies packets that pass by for later analysis. Promiscuous mode enables a network device to intercept and read each network packet (given some conditions), even if the packet\u2019s address doesn\u2019t match the network device. It is possible to attach hardware and software to monitor and analyze all packets on that segment of the transmission media without alerting any other users. Candidates for eavesdropping include satellite, wireless, mobile, and other transmission methods. Social Engineering Attackers often use a deception technique called social engineering to gain access to resources in an IT infrastructure. In nearly all cases, social engineering involves tricking authorized users into carrying out actions for unauthorized users. The success of social engineering attacks depends on the basic tendency of people to want to be helpful. Phreaking Phone phreaking, or simply phreaking, is a slang term that describes the activity of a subculture of people who study, experiment with, or explore telephone systems, telephone company equipment, and systems connected to public telephone networks. Phreaking is the art of exploiting bugs and glitches that exist in the telephone system. 
Phishing Phishing is a type of fraud in which an attacker attempts to trick the victim into providing private information such as credit card numbers, passwords, dates of birth, bank account numbers, automated teller machine (ATM) PINs, and Social Security numbers. Pharming Pharming is another type of attack that seeks to obtain personal or private financial information through domain spoofing. A pharming attack doesn\u2019t use messages to trick victims into visiting spoofed websites that appear legitimate, however. Instead, pharming \u201cpoisons\u201d a domain name on the domain name server (DNS), a process known as DNS poisoning. The result is that when a user enters the poisoned server\u2019s web address into his or her address bar, that user navigates to the attacker\u2019s site. The user\u2019s browser still shows the correct website, which makes pharming difficult to detect\u2014and therefore more serious. Where phishing attempts to scam people one at a time with an email or instant message, pharming enables scammers to target large groups of people at one time through domain spoofing.","title":"Threat, Attacks & Defences"},{"location":"level101/security/threats_attacks_defences/#part-iii-threats-attacks-defense","text":"","title":"Part III: Threats, Attacks & Defense"},{"location":"level101/security/threats_attacks_defences/#dns-protection","text":"","title":"DNS Protection"},{"location":"level101/security/threats_attacks_defences/#cache-poisoning-attack","text":"Since DNS responses are cached, a quick response can be provided for repeated translations. DNS negative queries are also cached, e.g., misspelt words, and all cached data periodically times out. Cache poisoning is an issue in what is known as pharming. This term is used to describe a hacker\u2019s attack in which a website\u2019s traffic is redirected to a bogus website by forging the DNS mapping. 
In this case, an attacker attempts to insert a fake address record for an Internet domain into the DNS. If the server accepts the fake record, the cache is poisoned and subsequent requests for the address of the domain are answered with the address of a server controlled by the attacker. As long as the fake entry is cached by the server, browsers or e-mail servers will automatically go to the address provided by the compromised DNS server. the typical time to live (TTL) for cached entries is a couple of hours, thereby permitting ample time for numerous users to be affected by the attack.","title":"Cache Poisoning Attack"},{"location":"level101/security/threats_attacks_defences/#dnssec-security-extension","text":"The long-term solution to these DNS problems is authentication. If a resolver cannot distinguish between valid and invalid data in a response, then add source authentication to verify that the data received in response is equal to the data entered by the zone administrator DNS Security Extensions (DNSSEC) protects against data spoofing and corruption and provides mechanisms to authenticate servers and requests, as well as mechanisms to establish authenticity and integrity. When authenticating DNS responses, each DNS zone signs its data using a private key. It is recommended that this signing be done offline and in advance. The query for a particular record returns the requested resource record set (RRset) and signature (RRSIG) of the requested resource record set. The resolver then authenticates the response using a public key, which is pre-configured or learned via a sequence of key records in the DNS hierarchy. The goals of DNSSEC are to provide authentication and integrity for DNS responses without confidentiality or DDoS protection.","title":"DNSSEC (Security Extension)"},{"location":"level101/security/threats_attacks_defences/#bgp","text":"BGP stands for border gateway protocol. 
It is a routing protocol that exchanges routing information among multiple Autonomous Systems (AS). An Autonomous System is a collection of routers or networks with the same network policy usually under single administrative control. BGP tells routers which hop to use in order to reach the destination network. BGP is used for both communicating information among routers in an AS (interior) and between multiple ASes (exterior).","title":"BGP"},{"location":"level101/security/threats_attacks_defences/#how-bgp-works","text":"BGP is responsible for finding a path to a destination router & the path it chooses should be the shortest and most reliable one. Strictly speaking, BGP itself is a path-vector protocol: routers exchange the AS paths they know and choose among them using policy and path attributes. The link-state approach described next is the one used by interior routing protocols such as OSPF and IS-IS. With a link-state protocol, each router broadcasts to all other routers in the network the state of its links and IP subnets. Each router then receives information from the other routers and constructs a complete topology view of the entire network. The next-hop routing table is based on this topology view. The link-state protocol uses a famous algorithm in the field of computer science, Dijkstra\u2019s shortest path algorithm: We start from our router considering the path cost to all our direct neighbours. The shortest path is then taken. We then re-look at all our neighbours that we can reach and update our link state table with the cost information. We then continue taking the shortest path until every router has been visited.","title":"How BGP Works"},{"location":"level101/security/threats_attacks_defences/#bgp-vulnerabilities","text":"By corrupting the BGP routing table we are able to influence the direction traffic flows on the internet! This action is known as BGP hijacking. Bogus route advertisements injected into the BGP-distributed routing database, whether by malicious sources or accidentally by routers, can disrupt Internet backbone operations. 
Blackholing traffic: A blackhole route is a network route, i.e., a routing table entry, that goes nowhere; packets matching the route prefix are dropped or ignored. Blackhole routes can only be detected by monitoring the lost traffic. Blackhole routes are the best defence against many common viral attacks where the traffic is dropped from infected machines to/from command & control hosts. An infamous BGP injection attack on YouTube: In 2008, Pakistan decided to block YouTube by creating a BGP route that led into a black hole. Instead, this routing information got transmitted to a Hong Kong ISP and from there accidentally got propagated to the rest of the world, meaning millions were routed through to this black hole and were therefore unable to access YouTube. Potentially, the greatest risk to BGP occurs in a denial of service attack in which a router is flooded with more packets than it can handle. Network overload and router resource exhaustion happen when the network begins carrying an excessive number of BGP messages, overloading the router control processors, memory, and routing table, and reducing the bandwidth available for data traffic. Refer: https://medium.com/bugbountywriteup/bgp-the-weak-link-in-the-internet-what-is-bgp-and-how-do-hackers-exploit-it-d899a68ba5bb Route flapping is another type of attack. Route flapping refers to repetitive changes to the BGP routing table, often several times a minute. Withdrawing and re-advertising at a high rate can cause a serious problem for routers since they propagate the announcements of routes. If these route flaps happen fast enough, e.g., 30 to 50 times per second, the router becomes overloaded, which eventually prevents convergence on valid routes. The potential impact for Internet users is a slowdown in message delivery, and in some cases, packets may not be delivered at all. 
BGP Security: Border Gateway Protocol Security recommends the use of BGP peer authentication since it is one of the strongest mechanisms for preventing malicious activity. The authentication mechanisms are Internet Protocol Security (IPsec) or BGP MD5. Another method, known as prefix limits, can be used to avoid filling router tables. In this approach, routers should be configured to disable or terminate a BGP peering session, and issue warning messages to administrators when a neighbour sends in excess of a preset number of prefixes. IETF is currently working on improving this space.","title":"BGP Vulnerabilities"},{"location":"level101/security/threats_attacks_defences/#web-based-attacks","text":"","title":"Web-Based Attacks"},{"location":"level101/security/threats_attacks_defences/#http-response-splitting-attacks","text":"An HTTP response splitting attack may happen where the server script embeds user data in HTTP response headers without appropriate sanitization. This typically happens when the script embeds user data in the redirection URL of a redirection response (HTTP status code 3xx), or when the script embeds user data in a cookie value or name when the response sets a cookie. HTTP response splitting attacks can be used to perform web cache poisoning and cross-site scripting attacks. HTTP response splitting is the attacker\u2019s ability to send a single HTTP request that forces the webserver to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response.","title":"HTTP Response Splitting Attacks"},{"location":"level101/security/threats_attacks_defences/#cross-site-request-forgery-csrf-or-xsrf","text":"A Cross-Site Request Forgery attack tricks the victim\u2019s browser into issuing a command to a vulnerable web application. The vulnerability is caused by browsers automatically including user authentication data, session ID, IP address, Windows domain credentials, etc. with each request. 
Attackers typically use CSRF to initiate transactions such as transferring funds, logging a user in or out, closing an account, accessing sensitive data, and changing account details. The vulnerability is caused by web browsers that automatically include credentials with each request, even for requests caused by a form, script, or image on another site. CSRF can also be dynamically constructed as part of a payload for a cross-site scripting attack. All sites relying on automatic credentials are vulnerable. Popular browsers cannot prevent cross-site request forgery. Logging out of high-value sites as soon as possible can mitigate CSRF risk. It is recommended that a high-value website require a client to manually provide authentication data in the same HTTP request used to perform any operation with security implications. Limiting the lifetime of session cookies can also reduce the chance of their being used by other malicious sites. OWASP recommends that website developers include a required security token in HTTP requests associated with sensitive business functions in order to mitigate CSRF attacks.","title":"Cross-Site Request Forgery (CSRF or XSRF)"},{"location":"level101/security/threats_attacks_defences/#cross-site-scripting-xss-attacks","text":"Cross-Site Scripting occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page and then execute the script on the machine of any user that views the site. If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests that can be mistaken for those of a valid user, compromise confidential information, or execute malicious code on end-user systems. Cross-Site Scripting (XSS or CSS) attacks involve the execution of malicious scripts on the victim\u2019s browser. The victim is simply a user\u2019s host and not the server. 
XSS results from a failure to validate user input by a web-based application.","title":"Cross-Site Scripting (XSS) Attacks"},{"location":"level101/security/threats_attacks_defences/#document-object-model-dom-xss-attacks","text":"The Document Object Model (DOM) based XSS does not require the webserver to receive the XSS payload for a successful attack. The attacker abuses the runtime by embedding their data on the client-side. An attacker can force the client (browser) to render the page with parts of the DOM controlled by the attacker. When the page is rendered and the data is processed by the page, typically by a client-side HTML-embedded script such as JavaScript, the page\u2019s code may insecurely embed the data in the page itself, thus delivering the cross-site scripting payload. There are several DOM objects which can serve as an attack vehicle for delivering a malicious script to the victim\u2019s browser.","title":"Document Object Model (DOM) XSS Attacks"},{"location":"level101/security/threats_attacks_defences/#clickjacking","text":"The technique works by hiding malicious links or scripts under the cover of the content of a legitimate site. Buttons on a website actually contain invisible links, placed there by the attacker. So, an individual who clicks on an object they can visually see is actually being duped into visiting a malicious page or executing a malicious script. When mouseover is used together with clickjacking, the outcome is devastating. Facebook users have been hit by a clickjacking attack, which tricks people into \u201cliking\u201d a particular Facebook page, thus enabling the attack to spread since Memorial Day 2010. 
There is not yet an effective defence against clickjacking, and disabling JavaScript is the only viable method.","title":"Clickjacking"},{"location":"level101/security/threats_attacks_defences/#database-attacks-defenses","text":"","title":"DataBase Attacks & Defenses"},{"location":"level101/security/threats_attacks_defences/#sql-injection-attacks","text":"It exploits improper input validation in database queries. A successful exploit will allow attackers to access, modify, or delete information in the database. It permits attackers to steal sensitive information stored within the backend databases of affected websites, which may include such things as user credentials, email addresses, personal information, and credit card numbers. SELECT USERNAME,PASSWORD from USERS where USERNAME='' AND PASSWORD=''; Here the username & password are the inputs provided by the user. Suppose an attacker gives the input as \"' OR '1'='1\" in both fields. The SQL query will then look like: SELECT USERNAME,PASSWORD from USERS where USERNAME='' OR '1'='1' AND PASSWORD='' OR '1'='1'; This query results in a true statement & the user gets logged in. This example depicts the most basic type of SQL injection.","title":"SQL injection Attacks"},{"location":"level101/security/threats_attacks_defences/#sql-injection-attack-defenses","text":"SQL injection can be guarded against by filtering the query to eliminate malicious syntax, which involves employing tools to scan the source code. In addition, the input fields should be restricted to the absolute minimum, typically anywhere from 7-12 characters, and any data should be validated, e.g., if a user inputs an age, make sure the input is an integer with a maximum of 3 digits.","title":"SQL Injection Attack Defenses"},{"location":"level101/security/threats_attacks_defences/#vpn","text":"A virtual private network (VPN) is a service that offers a secure, reliable connection over a shared public infrastructure such as the Internet. 
Cisco defines a VPN as an encrypted connection between private networks over a public network. To date, there are three types of VPNs: Remote access, Site-to-site, and Firewall-based.","title":"VPN"},{"location":"level101/security/threats_attacks_defences/#security-breach","text":"In spite of the most aggressive steps to protect computers from attacks, attackers sometimes get through. Any event that results in a violation of any of the confidentiality, integrity, or availability (CIA) security tenets is a security breach.","title":"Security Breach"},{"location":"level101/security/threats_attacks_defences/#denial-of-service-attacks","text":"Denial of service (DoS) attacks result in downtime or inability of a user to access a system. DoS attacks impact the availability tenet of information systems security. A DoS attack is a coordinated attempt to deny service by occupying a computer to perform large amounts of unnecessary tasks. This excessive activity makes the system unavailable to perform legitimate operations. Two common types of DoS attacks are as follows: Logic attacks\u2014Logic attacks use software flaws to crash or seriously hinder the performance of remote servers. You can prevent many of these attacks by installing the latest patches to keep your software up to date. Flooding attacks\u2014Flooding attacks overwhelm the victim computer\u2019s CPU, memory, or network resources by sending large numbers of useless requests to the machine. Most DoS attacks target weaknesses in the overall system architecture rather than a software bug or security flaw. One popular technique for launching a packet flood is a SYN flood. One of the best defences against DoS attacks is to use intrusion prevention system (IPS) software or devices to detect and stop the attack.","title":"Denial of Service Attacks"},{"location":"level101/security/threats_attacks_defences/#distributed-denial-of-service-attacks","text":"DDoS attacks differ from regular DoS attacks in their scope. 
In a DDoS attack, attackers hijack hundreds or even thousands of Internet computers, planting automated attack agents on those systems. The attacker then instructs the agents to bombard the target site with forged messages. This overloads the site and blocks legitimate traffic. The key here is strength in numbers. The attacker does more damage by distributing the attack across multiple computers.","title":"Distributed Denial of Service Attacks"},{"location":"level101/security/threats_attacks_defences/#wiretapping","text":"Although the term wiretapping is generally associated with voice telephone communications, attackers can also use wiretapping to intercept data communications. Attackers can tap telephone lines and data communication lines. Wiretapping can be active, where the attacker makes modifications to the line. It can also be passive, where an unauthorized user simply listens to the transmission without changing the contents. Passive intrusion can include the copying of data for a subsequent active attack. Two methods of active wiretapping are as follows: Between-the-lines wiretapping\u2014This type of wiretapping does not alter the messages sent by the legitimate user but inserts additional messages into the communication line when the legitimate user pauses. Piggyback-entry wiretapping\u2014This type of wiretapping intercepts and modifies the original message by breaking the communications line and routing the message to another computer that acts as a host.","title":"Wiretapping"},{"location":"level101/security/threats_attacks_defences/#backdoors","text":"Software developers sometimes include hidden access methods, called backdoors, in their programs. Backdoors give developers or support personnel easy access to a system without having to struggle with security controls. The problem is that backdoors don\u2019t always stay hidden. 
When an attacker discovers a backdoor, he or she can use it to bypass existing security controls such as passwords, encryption, and so on. Where legitimate users log on through front doors using a user ID and password, attackers use backdoors to bypass these normal access controls.","title":"Backdoors"},{"location":"level101/security/threats_attacks_defences/#malicious-attacks","text":"","title":"Malicious Attacks"},{"location":"level101/security/threats_attacks_defences/#birthday-attack","text":"Once an attacker compromises a hashed password file, a birthday attack is performed. A birthday attack is a type of cryptographic attack that is used to make a brute-force attack of one-way hashes easier. It is a mathematical exploit that is based on the birthday problem in probability theory. Further Reading: https://www.sciencedirect.com/topics/computer-science/birthday-attack https://www.internetsecurity.tips/birthday-attack/","title":"Birthday Attack"},{"location":"level101/security/threats_attacks_defences/#brute-force-password-attacks","text":"In a brute-force password attack, the attacker tries different passwords on a system until one of them is successful. Usually, the attacker employs a software program to try all possible combinations of a likely password, user ID, or security code until it locates a match. This occurs rapidly and in sequence. This type of attack is called a brute-force password attack because the attacker simply hammers away at the code. There is no skill or stealth involved\u2014just brute force that eventually breaks the code. Further Reading: https://owasp.org/www-community/attacks/Brute_force_attack https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks","title":"Brute-Force Password Attacks"},{"location":"level101/security/threats_attacks_defences/#dictionary-password-attacks","text":"A dictionary password attack is a simple attack that relies on users making poor password choices. 
In a dictionary password attack, a simple password-cracker program takes all the words from a dictionary file and attempts to log on by entering each dictionary entry as a password. Further Reading: https://capec.mitre.org/data/definitions/16.html","title":"Dictionary Password Attacks"},{"location":"level101/security/threats_attacks_defences/#replay-attacks","text":"Replay attacks involve capturing data packets from a network and retransmitting them to produce an unauthorized effect. The receipt of duplicate, authenticated IP packets may disrupt service or have some other undesired consequence. Systems can be broken through replay attacks when attackers reuse old messages or parts of old messages to deceive system users. This helps intruders to gain information that allows unauthorized access into a system. Further reading: https://study.com/academy/lesson/replay-attack-definition-examples-prevention.html","title":"Replay Attacks"},{"location":"level101/security/threats_attacks_defences/#man-in-the-middle-attacks","text":"A man-in-the-middle attack takes advantage of the multihop process used by many types of networks. In this type of attack, an attacker intercepts messages between two parties before transferring them on to their intended destination. Web spoofing is a type of man-in-the-middle attack in which the user believes a secure session exists with a particular web server. In reality, the secure connection exists only with the attacker, not the webserver. The attacker then establishes a secure connection with the webserver, acting as an invisible go-between. The attacker passes traffic between the user and the webserver. In this way, the attacker can trick the user into supplying passwords, credit card information, and other private data. 
Further Reading: https://owasp.org/www-community/attacks/Man-in-the-middle_attack","title":"Man-in-the-Middle Attacks"},{"location":"level101/security/threats_attacks_defences/#masquerading","text":"In a masquerade attack, one user or computer pretends to be another user or computer. Masquerade attacks usually include one of the other forms of active attacks, such as IP address spoofing or replaying. Attackers can capture authentication sequences and then replay them later to log on again to an application or operating system. For example, an attacker might monitor usernames and passwords sent to a weak web application. The attacker could then use the intercepted credentials to log on to the web application and impersonate the user. Further Reading: https://dl.acm.org/doi/book/10.5555/2521792 https://ieeexplore.ieee.org/document/1653228","title":"Masquerading"},{"location":"level101/security/threats_attacks_defences/#eavesdropping","text":"Eavesdropping, or sniffing, occurs when a host sets its network interface to promiscuous mode and copies packets that pass by for later analysis. Promiscuous mode enables a network device to intercept and read each network packet (given certain conditions), even if the packet\u2019s address doesn\u2019t match the network device. It is possible to attach hardware and software to monitor and analyze all packets on that segment of the transmission media without alerting any other users. Candidates for eavesdropping include satellite, wireless, mobile, and other transmission methods.","title":"Eavesdropping"},{"location":"level101/security/threats_attacks_defences/#social-engineering","text":"Attackers often use a deception technique called social engineering to gain access to resources in an IT infrastructure. In nearly all cases, social engineering involves tricking authorized users into carrying out actions for unauthorized users. 
The success of social engineering attacks depends on the basic tendency of people to want to be helpful.","title":"Social Engineering"},{"location":"level101/security/threats_attacks_defences/#phreaking","text":"Phone phreaking, or simply phreaking, is a slang term that describes the activity of a subculture of people who study, experiment with, or explore telephone systems, telephone company equipment, and systems connected to public telephone networks. Phreaking is the art of exploiting bugs and glitches that exist in the telephone system.","title":"Phreaking"},{"location":"level101/security/threats_attacks_defences/#phishing","text":"Phishing is a type of fraud in which an attacker attempts to trick the victim into providing private information such as credit card numbers, passwords, dates of birth, bank account numbers, automated teller machine (ATM) PINs, and Social Security numbers.","title":"Phishing"},{"location":"level101/security/threats_attacks_defences/#pharming","text":"Pharming is another type of attack that seeks to obtain personal or private financial information through domain spoofing. A pharming attack doesn\u2019t use messages to trick victims into visiting spoofed websites that appear legitimate, however. Instead, pharming \u201cpoisons\u201d a domain name on the domain name server (DNS), a process known as DNS poisoning. The result is that when a user enters the poisoned server\u2019s web address into his or her address bar, that user navigates to the attacker\u2019s site. The user\u2019s browser still shows the correct website, which makes pharming difficult to detect\u2014and therefore more serious. 
Where phishing attempts to scam people one at a time with an email or instant message, pharming enables scammers to target large groups of people at one time through domain spoofing.","title":"Pharming"},{"location":"level101/security/writing_secure_code/","text":"PART IV: Writing Secure Code & More The first and most important step in reducing security and reliability issues is to educate developers. However, even the best-trained engineers make mistakes, security experts can write insecure code and SREs can miss reliability issues. It\u2019s difficult to keep the many considerations and tradeoffs involved in building secure and reliable systems in mind simultaneously, especially if you\u2019re also responsible for producing software. Use frameworks to enforce security and reliability while writing code A better approach is to handle security and reliability in common frameworks, languages, and libraries. Ideally, libraries only expose an interface that makes writing code with common classes of security vulnerabilities impossible. Multiple applications can use each library or framework. When domain experts fix an issue, they remove it from all the applications the framework supports, allowing this engineering approach to scale better. Common Security Vulnerabilities In large codebases, a handful of classes account for the majority of security vulnerabilities, despite ongoing efforts to educate developers and introduce code review. OWASP and SANS publish lists of common vulnerability classes Write Simple Code Try to keep your code clean and simple. Avoid Multi-Level Nesting Multilevel nesting is a common anti-pattern that can lead to simple mistakes. If the error is in the most common code path, it will likely be captured by the unit tests. However, unit tests don\u2019t always check error handling paths in multilevel nested code. 
The error might result in decreased reliability (for example, if the service crashes when it mishandles an error) or a security vulnerability (like a mishandled authorization check error). Eliminate YAGNI Smells Sometimes developers overengineer solutions by adding functionality that may be useful in the future, \u201cjust in case.\u201d This goes against the YAGNI (You Aren\u2019t Gonna Need It) principle, which recommends implementing only the code that you need. YAGNI code adds unnecessary complexity because it needs to be documented, tested, and maintained. To summarize, avoiding YAGNI code leads to improved reliability, and simpler code leads to fewer security bugs, fewer opportunities to make mistakes, and less developer time spent maintaining unused code. Repay Technical Debt It is a common practice for developers to mark places that require further attention with TODO or FIXME annotations. In the short term, this habit can accelerate the delivery velocity for the most critical functionality, and allow a team to meet early deadlines\u2014but it also incurs technical debt. Still, it\u2019s not necessarily a bad practice, as long as you have a clear process (and allocate time) for repaying such debt. Refactoring Refactoring is the most effective way to keep a codebase clean and simple. Even a healthy codebase occasionally needs to be refactored. Regardless of the reasons behind refactoring, you should always follow one golden rule: never mix refactoring and functional changes in a single commit to the code repository. Refactoring changes are typically significant and can be difficult to understand. If a commit also includes functional changes, there\u2019s a higher risk that an author or reviewer might overlook bugs. Unit Testing Unit testing can increase system security and reliability by pinpointing a wide range of bugs in individual software components before a release. 
This technique involves breaking software components into smaller, self-contained \u201cunits\u201d that have no external dependencies, and then testing each unit. Fuzz Testing Fuzz testing is a technique that complements the previously mentioned testing techniques. Fuzzing involves using a fuzzing engine to generate a large number of candidate inputs that are then passed through a fuzz driver to the fuzz target. The fuzzer then analyzes how the system handles the input. Complex inputs handled by all kinds of software are popular targets for fuzzing - for example, file parsers, compression algorithms, network protocol implementation and audio codec. Integration Testing Integration testing moves beyond individual units and abstractions, replacing fake or stubbed-out implementations of abstractions like databases or network services with real implementations. As a result, integration tests exercise more complete code paths. Because you must initialize and configure these other dependencies, integration testing may be slower and flakier than unit testing\u2014to execute the test, this approach incorporates real-world variables like network latency as services communicate end-to-end. As you move from testing individual low-level units of code to testing how they interact when composed together, the net result is a higher degree of confidence that the system is behaving as expected. Last But not the least Code Reviews Rely on Automation Don\u2019t check in Secrets Verifiable Builds","title":"Writing Secure code"},{"location":"level101/security/writing_secure_code/#part-iv-writing-secure-code-more","text":"The first and most important step in reducing security and reliability issues is to educate developers. However, even the best-trained engineers make mistakes, security experts can write insecure code and SREs can miss reliability issues. 
It\u2019s difficult to keep the many considerations and tradeoffs involved in building secure and reliable systems in mind simultaneously, especially if you\u2019re also responsible for producing software.","title":"PART IV: Writing Secure Code & More"},{"location":"level101/security/writing_secure_code/#use-frameworks-to-enforce-security-and-reliability-while-writing-code","text":"A better approach is to handle security and reliability in common frameworks, languages, and libraries. Ideally, libraries only expose an interface that makes writing code with common classes of security vulnerabilities impossible. Multiple applications can use each library or framework. When domain experts fix an issue, they remove it from all the applications the framework supports, allowing this engineering approach to scale better.","title":"Use frameworks to enforce security and reliability while writing code"},{"location":"level101/security/writing_secure_code/#common-security-vulnerabilities","text":"In large codebases, a handful of classes account for the majority of security vulnerabilities, despite ongoing efforts to educate developers and introduce code review. OWASP and SANS publish lists of common vulnerability classes","title":"Common Security Vulnerabilities"},{"location":"level101/security/writing_secure_code/#write-simple-code","text":"Try to keep your code clean and simple.","title":"Write Simple Code"},{"location":"level101/security/writing_secure_code/#avoid-multi-level-nesting","text":"Multilevel nesting is a common anti-pattern that can lead to simple mistakes. If the error is in the most common code path, it will likely be captured by the unit tests. However, unit tests don\u2019t always check error handling paths in multilevel nested code. 
The error might result in decreased reliability (for example, if the service crashes when it mishandles an error) or a security vulnerability (like a mishandled authorization check error).","title":"Avoid Multi-Level Nesting"},{"location":"level101/security/writing_secure_code/#eliminate-yagni-smells","text":"Sometimes developers overengineer solutions by adding functionality that may be useful in the future, \u201cjust in case.\u201d This goes against the YAGNI (You Aren\u2019t Gonna Need It) principle, which recommends implementing only the code that you need. YAGNI code adds unnecessary complexity because it needs to be documented, tested, and maintained. To summarize, avoiding YAGNI code leads to improved reliability, and simpler code leads to fewer security bugs, fewer opportunities to make mistakes, and less developer time spent maintaining unused code.","title":"Eliminate YAGNI Smells"},{"location":"level101/security/writing_secure_code/#repay-technical-debt","text":"It is a common practice for developers to mark places that require further attention with TODO or FIXME annotations. In the short term, this habit can accelerate the delivery velocity for the most critical functionality, and allow a team to meet early deadlines\u2014but it also incurs technical debt. Still, it\u2019s not necessarily a bad practice, as long as you have a clear process (and allocate time) for repaying such debt.","title":"Repay Technical Debt"},{"location":"level101/security/writing_secure_code/#refactoring","text":"Refactoring is the most effective way to keep a codebase clean and simple. Even a healthy codebase occasionally needs to be refactored. Regardless of the reasons behind refactoring, you should always follow one golden rule: never mix refactoring and functional changes in a single commit to the code repository. Refactoring changes are typically significant and can be difficult to understand. 
If a commit also includes functional changes, there\u2019s a higher risk that an author or reviewer might overlook bugs.","title":"Refactoring"},{"location":"level101/security/writing_secure_code/#unit-testing","text":"Unit testing can increase system security and reliability by pinpointing a wide range of bugs in individual software components before a release. This technique involves breaking software components into smaller, self-contained \u201cunits\u201d that have no external dependencies, and then testing each unit.","title":"Unit Testing"},{"location":"level101/security/writing_secure_code/#fuzz-testing","text":"Fuzz testing is a technique that complements the previously mentioned testing techniques. Fuzzing involves using a fuzzing engine to generate a large number of candidate inputs that are then passed through a fuzz driver to the fuzz target. The fuzzer then analyzes how the system handles the input. Complex inputs handled by all kinds of software are popular targets for fuzzing - for example, file parsers, compression algorithms, network protocol implementation and audio codec.","title":"Fuzz Testing"},{"location":"level101/security/writing_secure_code/#integration-testing","text":"Integration testing moves beyond individual units and abstractions, replacing fake or stubbed-out implementations of abstractions like databases or network services with real implementations. As a result, integration tests exercise more complete code paths. Because you must initialize and configure these other dependencies, integration testing may be slower and flakier than unit testing\u2014to execute the test, this approach incorporates real-world variables like network latency as services communicate end-to-end. 
As you move from testing individual low-level units of code to testing how they interact when composed together, the net result is a higher degree of confidence that the system is behaving as expected.","title":"Integration Testing"},{"location":"level101/security/writing_secure_code/#last-but-not-the-least","text":"Code Reviews Rely on Automation Don\u2019t check in Secrets Verifiable Builds","title":"Last But not the least"},{"location":"level101/systems_design/availability/","text":"HA - Availability - Common \u201cNines\u201d Availability is generally expressed as \u201cNines\u201d; common \u2018Nines\u2019 are listed below. Availability % Downtime per year Downtime per month Downtime per week Downtime per day 99% (Two Nines) 3.65 days 7.31 hours 1.68 hours 14.40 minutes 99.5% (Two and a half Nines) 1.83 days 3.65 hours 50.40 minutes 7.20 minutes 99.9% (Three Nines) 8.77 hours 43.83 minutes 10.08 minutes 1.44 minutes 99.95% (Three and a half Nines) 4.38 hours 21.92 minutes 5.04 minutes 43.20 seconds 99.99% (Four Nines) 52.60 minutes 4.38 minutes 1.01 minutes 8.64 seconds 99.995% (Four and a half Nines) 26.30 minutes 2.19 minutes 30.24 seconds 4.32 seconds 99.999% (Five Nines) 5.26 minutes 26.30 seconds 6.05 seconds 864.0 ms Refer https://en.wikipedia.org/wiki/High_availability#Percentage_calculation HA - Availability Serial Components A system with components is operating in series if the failure of a part leads to the combination becoming inoperable. For example, if LB in our architecture fails, all access to app tiers will fail. LB and app tiers are connected serially. The combined availability of the system is the product of the individual components\u2019 availability: A = Ax x Ay x \u2026.. Refer http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm HA - Availability Parallel Components A system with components is operating in parallel if the failure of a part leads to the other part taking over the operations of the failed part. 
If we have more than one LB and the remaining LBs can take over the traffic when one LB fails, then the LBs are operating in parallel. The combined availability of the system is A = 1 - ( (1-Ax) x (1-Ay) x \u2026 ) Refer http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm HA - Core Principles Elimination of single points of failure (SPOF) This means adding redundancy to the system so that the failure of a component does not mean failure of the entire system. Reliable crossover In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover. Detection of failures as they occur If the two principles above are observed, then a user may never see a failure Refer https://en.wikipedia.org/wiki/High_availability#Principles HA - SPOF WHAT: Never implement and always eliminate single points of failure. WHEN TO USE: During architecture reviews and new designs. HOW TO USE: Identify single instances on architectural diagrams. Strive for active/active configurations. At the very least we should have a standby to take control when active instances fail. WHY: Maximize availability through multiple instances. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions. Use load balancers to balance traffic across instances of a service. Use control services with active/passive instances for patterns that require singletons. HA - Reliable Crossover WHAT: Ensure that when system components fail over, they do so reliably. WHEN TO USE: During architecture reviews, failure modeling, and designs. HOW TO USE: Identify how available a system is during the crossover and ensure it is within acceptable limits. WHY: Maximize availability and ensure data handling semantics are preserved. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions; they carry a lower risk of an unreliable crossover.
Use LBs and the right load balancing methods to ensure reliable failover. Model and build your data systems to ensure data is correctly handled when crossover happens. Generally, DB systems follow active/passive semantics for writes. The master accepts writes, and when it goes down, a follower is promoted to master (from passive to active) to accept writes. We have to be careful here that the cutover never introduces more than one master. This problem is called a split brain. Applications in SRE role SREs work on deciding an acceptable SLA and make sure the system is available enough to meet it. SREs are involved in architecture design, right from building the data center, to make sure the site is not affected by network switch, hardware, power, or software failures. SREs also run mock failure drills to see how the system behaves in uncharted territory and come up with a plan to improve availability if there are misses. https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear With our understanding of HA applied, our architecture diagram looks something like the one below","title":"Availability"},{"location":"level101/systems_design/availability/#ha-availability-common-nines","text":"Availability is generally expressed as \u201cNines\u201d; common \u2018Nines\u2019 are listed below.
Availability % Downtime per year Downtime per month Downtime per week Downtime per day 99%(Two Nines) 3.65 days 7.31 hours 1.68 hours 14.40 minutes 99.5%(Two and a half Nines) 1.83 days 3.65 hours 50.40 minutes 7.20 minutes 99.9%(Three Nines) 8.77 hours 43.83 minutes 10.08 minutes 1.44 minutes 99.95%(Three and a half Nines) 4.38 hours 21.92 minutes 5.04 minutes 43.20 seconds 99.99%(Four Nines) 52.60 minutes 4.38 minutes 1.01 minutes 8.64 seconds 99.995%(Four and a half Nines) 26.30 minutes 2.19 minutes 30.24 seconds 4.32 seconds 99.999%(Five Nines) 5.26 minutes 26.30 seconds 6.05 seconds 864.0 ms","title":"HA - Availability - Common \u201cNines\u201d"},{"location":"level101/systems_design/availability/#refer","text":"https://en.wikipedia.org/wiki/High_availability#Percentage_calculation","title":"Refer"},{"location":"level101/systems_design/availability/#ha-availability-serial-components","text":"Components of a system operate in series if the failure of any one part makes the combination inoperable. For example, if the LB in our architecture fails, all access to the app tiers will fail. The LB and app tiers are connected serially. The combined availability of the system is the product of the availabilities of the individual components: A = Ax x Ay x \u2026","title":"HA - Availability Serial Components"},{"location":"level101/systems_design/availability/#refer_1","text":"http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm","title":"Refer"},{"location":"level101/systems_design/availability/#ha-availability-parallel-components","text":"Components of a system operate in parallel if, when one part fails, another part takes over its operations. If we have more than one LB and the remaining LBs can take over the traffic when one LB fails, then the LBs are operating in parallel. The combined availability of the system is A = 1 - ( (1-Ax) x (1-Ay) x \u2026
)","title":"HA - Availability Parallel Components"},{"location":"level101/systems_design/availability/#refer_2","text":"http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm","title":"Refer"},{"location":"level101/systems_design/availability/#ha-core-principles","text":"Elimination of single points of failure (SPOF) This means adding redundancy to the system so that the failure of a component does not mean failure of the entire system. Reliable crossover In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover. Detection of failures as they occur If the two principles above are observed, then a user may never see a failure","title":"HA - Core Principles"},{"location":"level101/systems_design/availability/#refer_3","text":"https://en.wikipedia.org/wiki/High_availability#Principles","title":"Refer"},{"location":"level101/systems_design/availability/#ha-spof","text":"WHAT: Never implement and always eliminate single points of failure. WHEN TO USE: During architecture reviews and new designs. HOW TO USE: Identify single instances on architectural diagrams. Strive for active/active configurations. At the very least we should have a standby to take control when active instances fail. WHY: Maximize availability through multiple instances. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions. Use load balancers to balance traffic across instances of a service. Use control services with active/passive instances for patterns that require singletons.","title":"HA - SPOF"},{"location":"level101/systems_design/availability/#ha-reliable-crossover","text":"WHAT: Ensure that when system components fail over, they do so reliably. WHEN TO USE: During architecture reviews, failure modeling, and designs. HOW TO USE: Identify how available a system is during the crossover and ensure it is within acceptable limits.
WHY: Maximize availability and ensure data handling semantics are preserved. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions; they carry a lower risk of an unreliable crossover. Use LBs and the right load balancing methods to ensure reliable failover. Model and build your data systems to ensure data is correctly handled when crossover happens. Generally, DB systems follow active/passive semantics for writes. The master accepts writes, and when it goes down, a follower is promoted to master (from passive to active) to accept writes. We have to be careful here that the cutover never introduces more than one master. This problem is called a split brain.","title":"HA - Reliable Crossover"},{"location":"level101/systems_design/availability/#applications-in-sre-role","text":"SREs work on deciding an acceptable SLA and make sure the system is available enough to meet it. SREs are involved in architecture design, right from building the data center, to make sure the site is not affected by network switch, hardware, power, or software failures. SREs also run mock failure drills to see how the system behaves in uncharted territory and come up with a plan to improve availability if there are misses. https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear With our understanding of HA applied, our architecture diagram looks something like the one below","title":"Applications in SRE role"},{"location":"level101/systems_design/conclusion/","text":"Conclusion Armed with these principles, we hope the course will give a fresh perspective to design software systems. It might be over-engineering to get all this on day zero. But some are really important from day 0 like eliminating single points of failure, making scalable services by just increasing replicas. As a bottleneck is reached, we can split code by services, shard data to scale.
As the organization matures, bringing in chaos engineering to measure how systems react to failure will help in designing robust software systems.","title":"Conclusion"},{"location":"level101/systems_design/conclusion/#conclusion","text":"Armed with these principles, we hope the course will give a fresh perspective to design software systems. It might be over-engineering to get all this on day zero. But some are really important from day 0 like eliminating single points of failure, making scalable services by just increasing replicas. As a bottleneck is reached, we can split code by services, shard data to scale. As the organization matures, bringing in chaos engineering to measure how systems react to failure will help in designing robust software systems.","title":"Conclusion"},{"location":"level101/systems_design/fault-tolerance/","text":"Fault Tolerance Failures are not avoidable in any system and will happen all the time, hence we need to build systems that can tolerate failures or recover from them. In systems, failure is the norm rather than the exception. \"Anything that can go wrong will go wrong\u201d -- Murphy\u2019s Law \u201cComplex systems contain changing mixtures of failures latent within them\u201d -- How Complex Systems Fail. Fault Tolerance - Failure Metrics Common failure metrics that get measured and tracked for any system. Mean time to repair (MTTR): The average time to repair and restore a failed system. Mean time between failures (MTBF): The average operational time between one device failure or system breakdown and the next. Mean time to failure (MTTF): The average time a device or system is expected to function before it fails. Mean time to detect (MTTD): The average time between the onset of a problem and when the organization detects it. Mean time to investigate (MTTI): The average time between the detection of an incident and when the organization begins to investigate its cause and solution. 
Mean time to restore service (MTRS): The average elapsed time from the detection of an incident until the affected system or component is again available to users. Mean time between system incidents (MTBSI): The average elapsed time between the detection of two consecutive incidents. MTBSI can be calculated by adding MTBF and MTRS (MTBSI = MTBF + MTRS). Failure rate: Another reliability metric, which measures the frequency with which a component or system fails. It is expressed as a number of failures over a unit of time. Refer https://www.splunk.com/en_us/data-insider/what-is-mean-time-to-repair.html Fault Tolerance - Fault Isolation Terms Systems should have a short circuit. Say in our content sharing system, if \u201cNotifications\u201d is not working, the site should gracefully handle that failure by removing the functionality instead of taking the whole site down. Swimlane is one of the commonly used fault isolation methodologies. Swimlane adds a barrier to the service from other services so that failure on either of them won\u2019t affect the other. Say we roll out a new feature \u2018Advertisement\u2019 in our content sharing app. We can have two architectures If Ads are generated on the fly synchronously during each Newsfeed request, the faults in the Ads feature get propagated to the Newsfeed feature. Instead if we swimlane the \u201cGeneration of Ads\u201d service and use a shared storage to populate Newsfeed App, Ads failures won\u2019t cascade to Newsfeed, and worst case if Ads don\u2019t meet SLA , we can have Newsfeed without Ads. Let's take another example, we have come up with a new model for our Content sharing App. Here we roll out an enterprise content sharing App where enterprises pay for the service and the content should never be shared outside the enterprise. Swimlane Principles Principle 1: Nothing is shared (also known as \u201cshare as little as possible\u201d). 
The less that is shared within a swim lane, the more fault isolative the swim lane becomes. (as shown in Enterprise use-case) Principle 2: Nothing crosses a swim lane boundary. Synchronous (defined by expecting a request\u2014not the transfer protocol) communication never crosses a swim lane boundary; if it does, the boundary is drawn incorrectly. (as shown in Ads feature) Swimlane Approaches Approach 1: Swim lane the money-maker. Never allow your cash register to be compromised by other systems. (Tier 1 vs Tier 2 in enterprise use case) Approach 2: Swim lane the biggest sources of incidents. Identify the recurring causes of pain and isolate them. (if Ads feature is in code yellow, swim laning it is the best option) Approach 3: Swim lane natural barriers. Customer boundaries make good swim lanes. (Public vs Enterprise customers) Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch21.html#ch21 Applications in SRE role Work with the DC tech or cloud team to distribute infrastructure so that it is immune to switch or power failures, by creating fault zones within a data center https://docs.microsoft.com/en-us/azure/virtual-machines/manage-availability#use-availability-zones-to-protect-from-datacenter-level-failures Work with partners to design the interaction between services so that the breakdown of one service is not amplified in a cascading fashion to all upstreams","title":"Fault Tolerance"},{"location":"level101/systems_design/fault-tolerance/#fault-tolerance","text":"Failures are not avoidable in any system and will happen all the time, hence we need to build systems that can tolerate failures or recover from them. In systems, failure is the norm rather than the exception.
\"Anything that can go wrong will go wrong\u201d -- Murphy\u2019s Law \u201cComplex systems contain changing mixtures of failures latent within them\u201d -- How Complex Systems Fail.","title":"Fault Tolerance"},{"location":"level101/systems_design/fault-tolerance/#fault-tolerance-failure-metrics","text":"Common failure metrics that get measured and tracked for any system. Mean time to repair (MTTR): The average time to repair and restore a failed system. Mean time between failures (MTBF): The average operational time between one device failure or system breakdown and the next. Mean time to failure (MTTF): The average time a device or system is expected to function before it fails. Mean time to detect (MTTD): The average time between the onset of a problem and when the organization detects it. Mean time to investigate (MTTI): The average time between the detection of an incident and when the organization begins to investigate its cause and solution. Mean time to restore service (MTRS): The average elapsed time from the detection of an incident until the affected system or component is again available to users. Mean time between system incidents (MTBSI): The average elapsed time between the detection of two consecutive incidents. MTBSI can be calculated by adding MTBF and MTRS (MTBSI = MTBF + MTRS). Failure rate: Another reliability metric, which measures the frequency with which a component or system fails. It is expressed as a number of failures over a unit of time.","title":"Fault Tolerance - Failure Metrics"},{"location":"level101/systems_design/fault-tolerance/#refer","text":"https://www.splunk.com/en_us/data-insider/what-is-mean-time-to-repair.html","title":"Refer"},{"location":"level101/systems_design/fault-tolerance/#fault-tolerance-fault-isolation-terms","text":"Systems should have a short circuit. 
Say in our content sharing system, if \u201cNotifications\u201d is not working, the site should gracefully handle that failure by removing the functionality instead of taking the whole site down. Swimlane is one of the commonly used fault isolation methodologies. Swimlane adds a barrier to the service from other services so that failure on either of them won\u2019t affect the other. Say we roll out a new feature \u2018Advertisement\u2019 in our content sharing app. We can have two architectures If Ads are generated on the fly synchronously during each Newsfeed request, the faults in the Ads feature get propagated to the Newsfeed feature. Instead if we swimlane the \u201cGeneration of Ads\u201d service and use a shared storage to populate Newsfeed App, Ads failures won\u2019t cascade to Newsfeed, and worst case if Ads don\u2019t meet SLA , we can have Newsfeed without Ads. Let's take another example, we have come up with a new model for our Content sharing App. Here we roll out an enterprise content sharing App where enterprises pay for the service and the content should never be shared outside the enterprise.","title":"Fault Tolerance - Fault Isolation Terms"},{"location":"level101/systems_design/fault-tolerance/#swimlane-principles","text":"Principle 1: Nothing is shared (also known as \u201cshare as little as possible\u201d). The less that is shared within a swim lane, the more fault isolative the swim lane becomes. (as shown in Enterprise use-case) Principle 2: Nothing crosses a swim lane boundary. Synchronous (defined by expecting a request\u2014not the transfer protocol) communication never crosses a swim lane boundary; if it does, the boundary is drawn incorrectly. (as shown in Ads feature)","title":"Swimlane Principles"},{"location":"level101/systems_design/fault-tolerance/#swimlane-approaches","text":"Approach 1: Swim lane the money-maker. Never allow your cash register to be compromised by other systems. 
(Tier 1 vs Tier 2 in enterprise use case) Approach 2: Swim lane the biggest sources of incidents. Identify the recurring causes of pain and isolate them. (if Ads feature is in code yellow, swim laning it is the best option) Approach 3: Swim lane natural barriers. Customer boundaries make good swim lanes. (Public vs Enterprise customers)","title":"Swimlane Approaches"},{"location":"level101/systems_design/fault-tolerance/#refer_1","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch21.html#ch21","title":"Refer"},{"location":"level101/systems_design/fault-tolerance/#applications-in-sre-role","text":"Work with the DC tech or cloud team to distribute infrastructure so that it is immune to switch or power failures, by creating fault zones within a data center https://docs.microsoft.com/en-us/azure/virtual-machines/manage-availability#use-availability-zones-to-protect-from-datacenter-level-failures Work with partners to design the interaction between services so that the breakdown of one service is not amplified in a cascading fashion to all upstreams","title":"Applications in SRE role"},{"location":"level101/systems_design/intro/","text":"Systems Design Prerequisites Fundamentals of common software system components: Linux Basics Linux Networking Databases RDBMS NoSQL Concepts What to expect from this course Thinking about and designing for scalability, availability, and reliability of large scale software systems. What is not covered under this course Scalability and reliability concerns of individual software components, e.g. databases. While the same scalability principles and thinking can be applied, these individual components have their own specific nuances when it comes to scaling them and thinking about their reliability.
More light will be shed on concepts rather than on setting up and configuring components like load balancers to achieve scalability, availability, and reliability of systems Course Contents Introduction Scalability High Availability Fault Tolerance Introduction So, how do you go about learning to design a system? \u201cLike most great questions, it showed a level of naivety that was breathtaking. The only short answer I could give was, essentially, that you learned how to design a system by designing systems and finding out what works and what doesn\u2019t work.\u201d Jim Waldo, Sun Microsystems, On System Design As software and hardware systems have multiple moving parts, we need to think about how those parts will grow, their failure modes, their inter-dependencies, and how they will impact the users and the business. There is no one-shot method or way to learn or do system design; we only learn to design systems by designing and iterating on them. This course is a starting point for thinking about scalability, availability, and fault tolerance during systems design. Backstory Let\u2019s design a simple content sharing application where users can share photos and media that their friends can like.
Let\u2019s start with a simple design of the application and evolve it as we learn system design concepts","title":"Introduction"},{"location":"level101/systems_design/intro/#systems-design","text":"","title":"Systems Design"},{"location":"level101/systems_design/intro/#prerequisites","text":"Fundamentals of common software system components: Linux Basics Linux Networking Databases RDBMS NoSQL Concepts","title":"Prerequisites"},{"location":"level101/systems_design/intro/#what-to-expect-from-this-course","text":"Thinking about and designing for scalability, availability, and reliability of large scale software systems.","title":"What to expect from this course"},{"location":"level101/systems_design/intro/#what-is-not-covered-under-this-course","text":"Scalability and reliability concerns of individual software components, e.g. databases. While the same scalability principles and thinking can be applied, these individual components have their own specific nuances when it comes to scaling them and thinking about their reliability. More light will be shed on concepts rather than on setting up and configuring components like load balancers to achieve scalability, availability, and reliability of systems","title":"What is not covered under this course"},{"location":"level101/systems_design/intro/#course-contents","text":"Introduction Scalability High Availability Fault Tolerance","title":"Course Contents"},{"location":"level101/systems_design/intro/#introduction","text":"So, how do you go about learning to design a system? \u201cLike most great questions, it showed a level of naivety that was breathtaking.
The only short answer I could give was, essentially, that you learned how to design a system by designing systems and finding out what works and what doesn\u2019t work.\u201d Jim Waldo, Sun Microsystems, On System Design As software and hardware systems have multiple moving parts, we need to think about how those parts will grow, their failure modes, their inter-dependencies, and how they will impact the users and the business. There is no one-shot method or way to learn or do system design; we only learn to design systems by designing and iterating on them. This course is a starting point for thinking about scalability, availability, and fault tolerance during systems design.","title":"Introduction"},{"location":"level101/systems_design/intro/#backstory","text":"Let\u2019s design a simple content sharing application where users can share photos and media that their friends can like. Let\u2019s start with a simple design of the application and evolve it as we learn system design concepts","title":"Backstory"},{"location":"level101/systems_design/scalability/","text":"Scalability What does scalability mean for a system/service? A system is composed of services/components; the scalability of each service/component needs to be tackled separately, along with the scalability of the system as a whole. A service is said to be scalable if, as resources are added to the system, it results in increased performance in a manner proportional to resources added An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance Refer https://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html Scalability - AKF Scale Cube The Scale Cube is a model for segmenting services, defining microservices, and scaling products. It also creates a common language for teams to discuss scale related options in designing solutions.
The following section talks about certain scaling patterns based on our inferences from the AKF cube Scalability - Horizontal scaling Horizontal scaling stands for cloning of an application or service such that work can easily be distributed across instances with absolutely no bias. Let's see how our monolithic application improves with this principle Here DB is scaled separately from the application. This is to let you know each component\u2019s scaling capabilities can be different. Usually, web applications can be scaled by adding resources unless there is state stored inside the application. But DBs can be scaled only for Reads by adding more followers but Writes have to go to only one leader to make sure data is consistent. There are some DBs that support multi-leader writes but we are keeping them out of scope at this point. Apps should be able to differentiate between Reads and Writes to choose appropriate DB servers. Load balancers can split traffic between identical servers transparently. WHAT: Duplication of services or databases to spread transaction load. WHEN TO USE: Databases with a very high read-to-write ratio (5:1 or greater\u2014the higher the better). Because only read replicas of DBs can be scaled, not the Leader. HOW TO USE: Simply clone services and implement a load balancer. For databases, ensure that the accessing code understands the difference between a read and a write. WHY: Allows for the fast scale of transactions at the cost of duplicated data and functionality. KEY TAKEAWAYS: This is fast to implement, is a low cost from a developer effort perspective, and can scale transaction volumes nicely. However, they tend to be high cost from the perspective of the operational cost of data. The cost here means if we have 3 followers and 1 Leader DB, the same database will be stored as 4 copies in the 4 servers. 
Hence added storage cost Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html Scalability Pattern - Load Balancing Improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. A commonly used technique is load balancing traffic across identical server clusters. A similar philosophy is used to load balance traffic across network links by ECMP , disk drives by RAID ,etc Aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. In our updated architecture diagram we have 4 servers to handle app traffic instead of a single server The device or system that performs load balancing is called a load balancer, abbreviated as LB. Refer https://en.wikipedia.org/wiki/Load_balancing_(computing) https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236 https://learning.oreilly.com/library/view/load-balancing-in/9781492038009/ https://learning.oreilly.com/library/view/practical-load-balancing/9781430236801/ http://shop.oreilly.com/product/9780596000509.do Scalability Pattern - LB Tasks What does an LB do? Service discovery: What backends are available in the system? In our architecture, 4 servers are available to serve App traffic. LB acts as a single endpoint that clients can use transparently to reach one of the 4 servers. Health checking: What backends are currently healthy and available to accept requests? If one out of the 4 App servers turns bad, LB should automatically short circuit the path so that clients don\u2019t sense any application downtime Load balancing: What algorithm should be used to balance individual requests across the healthy backends? 
There are many algorithms to distribute traffic across one of the four servers. Based on observations/experience, SRE can pick the algorithm that suits their pattern Scalability Pattern - LB Methods Common Load Balancing Methods Least Connection Method directs traffic to the server with the fewest active connections. Most useful when there are a large number of persistent connections in the traffic unevenly distributed between the servers. Works if clients maintain long-lived connections Least Response Time Method directs traffic to the server with the fewest active connections and the lowest average response time. Here response time is used to provide feedback of the server\u2019s health Round Robin Method rotates servers by directing traffic to the first available server and then moves that server to the bottom of the queue. Most useful when servers are of equal specification and there are not many persistent connections. IP Hash the IP address of the client determines which server receives the request. This can sometimes cause skewness in distribution but is useful if apps store some state locally and need some stickiness More advanced client/server-side example techniques - https://docs.nginx.com/nginx/admin-guide/load-balancer/ - http://cbonte.github.io/haproxy-dconv/2.2/intro.html#3.3.5 - https://twitter.github.io/finagle/guide/Clients.html#load-balancing Scalability Pattern - Caching - Content Delivery Networks (CDN) CDNs are added closer to the client\u2019s location. If the app has static data like images, Javascript, CSS which don\u2019t change very often, they can be cached. Since our example is a content sharing site, static content can be cached in CDNs with a suitable expiry. WHAT: Use CDNs (content delivery networks) to offload traffic from your site. WHEN TO USE: When speed improvements and scale warrant the additional cost. HOW TO USE: Most CDNs leverage DNS to serve content on your site\u2019s behalf. 
Thus you may need to make minor DNS changes or additions and move content to be served from new subdomains. Eg media-exp1.licdn.com is a domain used by Linkedin to serve static content Here a CNAME points the domain to the DNS of the CDN provider dig media-exp1.licdn.com +short 2-01-2c3e-005c.cdx.cedexis.net. WHY: CDNs help offload traffic spikes and are often economical ways to scale parts of a site\u2019s traffic. They also often substantially improve page download times. KEY TAKEAWAYS: CDNs are a fast and simple way to offset the spikiness of traffic as well as traffic growth in general. Make sure you perform a cost-benefit analysis and monitor the CDN usage. If CDNs have a lot of cache misses, then we don\u2019t gain much from CDN and are still serving requests using our compute resources. Scalability - Microservices This pattern represents the separation of work by service or function within the application. Microservices are meant to address the issues associated with growth and complexity in the code base and data sets. The intent is to create fault isolation as well as to reduce response times. Microservices can scale transactions, data sizes, and codebase sizes. They are most effective in scaling the size and complexity of your codebase. They tend to cost a bit more than horizontal scaling because the engineering team needs to rewrite services or, at the very least, disaggregate them from the original monolithic application. WHAT: Sometimes referred to as scale through services or resources, this rule focuses on scaling by splitting data sets, transactions, and engineering teams along verb (services) or noun (resources) boundaries. WHEN TO USE: Very large data sets where relations between data are not necessary. Large, complex systems where scaling engineering resources requires specialization. HOW TO USE: Split up actions by using verbs, or resources by using nouns, or use a mix. 
Split both the services and the data along the lines defined by the verb/noun approach. WHY: Allows for efficient scaling of not only transactions but also very large data sets associated with those transactions. It also allows for the efficient scaling of teams. KEY TAKEAWAYS: Microservices allow for efficient scaling of transactions, large data sets, and can help with fault isolation. It helps reduce the communication overhead of teams. The codebase becomes less complex as disjoint features are decoupled and spun as new services thereby letting each service scale independently specific to its requirement. Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html Scalability - Sharding This pattern represents the separation of work based on attributes that are looked up to or determined at the time of the transaction. Most often, these are implemented as splits by requestor, customer, or client. Very often, a lookup service or deterministic algorithm will need to be written for these types of splits. Sharding aids in scaling transaction growth, scaling instruction sets, and decreasing processing time (the last by limiting the data necessary to perform any transaction). This is more effective at scaling growth in customers or clients. It can aid with disaster recovery efforts, and limit the impact of incidents to only a specific segment of customers. Here the auth data is sharded based on user names so that DBs can respond faster as the amount of data DBs have to work on has drastically reduced during queries. There can be other ways to split Here the whole data center is split and replicated and clients are directed to a data center based on their geography. This helps in improving performance as clients are directed to the closest data center and performance increases as we add more data centers. There are some replication and consistency overhead with this approach one needs to be aware of. 
This also gives fault tolerance by rolling out test features to one site and rollback if there is an impact to that geography WHAT: This is very often a split by some unique aspect of the customer such as customer ID, name, geography, and so on. WHEN TO USE: Very large, similar data sets such as large and rapidly growing customer bases or when the response time for a geographically distributed customer base is important. HOW TO USE: Identify something you know about the customer, such as customer ID, last name, geography, or device, and split or partition both data and services based on that attribute. WHY: Rapid customer growth exceeds other forms of data growth, or you have the need to perform fault isolation between certain customer groups as you scale. KEY TAKEAWAYS: Shards are effective at helping you to scale customer bases but can also be applied to other very large data sets that can\u2019t be pulled apart using the microservices methodology. Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html Applications in SRE role SREs in coordination with the network team work on how to map users' traffic to a particular site. https://engineering.linkedin.com/blog/2017/05/trafficshift--load-testing-at-scale SREs work closely with the Dev team to split monoliths to multiple microservices that are easy to run and manage SREs work on improving Load Balancers' reliability, service discovery, and performance SREs work closely to split Data into shards and manage data integrity and consistency. https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store SREs work to set up, configure, and improve the CDN cache hit rate.","title":"Scalability"},{"location":"level101/systems_design/scalability/#scalability","text":"What does scalability mean for a system/service? 
A system is composed of services/components; the scalability of each service/component needs to be tackled separately, as does the scalability of the system as a whole. A service is said to be scalable if, as resources are added to the system, it results in increased performance in a manner proportional to resources added. An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.","title":"Scalability"},{"location":"level101/systems_design/scalability/#refer","text":"https://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-akf-scale-cube","text":"The Scale Cube is a model for segmenting services, defining microservices, and scaling products. It also creates a common language for teams to discuss scale-related options in designing solutions. The following section talks about certain scaling patterns based on our inferences from the AKF cube.","title":"Scalability - AKF Scale Cube"},{"location":"level101/systems_design/scalability/#scalability-horizontal-scaling","text":"Horizontal scaling stands for cloning of an application or service such that work can easily be distributed across instances with absolutely no bias. Let's see how our monolithic application improves with this principle. Here the DB is scaled separately from the application. This is to let you know each component\u2019s scaling capabilities can be different. Usually, web applications can be scaled by adding resources unless there is state stored inside the application. DBs, however, can be scaled only for Reads by adding more followers; Writes have to go to a single leader to make sure data is consistent. There are some DBs that support multi-leader writes but we are keeping them out of scope at this point. Apps should be able to differentiate between Reads and Writes to choose appropriate DB servers. 
Load balancers can split traffic between identical servers transparently. WHAT: Duplication of services or databases to spread transaction load. WHEN TO USE: Databases with a very high read-to-write ratio (5:1 or greater\u2014the higher the better). Because only read replicas of DBs can be scaled, not the Leader. HOW TO USE: Simply clone services and implement a load balancer. For databases, ensure that the accessing code understands the difference between a read and a write. WHY: Allows for the fast scale of transactions at the cost of duplicated data and functionality. KEY TAKEAWAYS: This is fast to implement, is a low cost from a developer effort perspective, and can scale transaction volumes nicely. However, they tend to be high cost from the perspective of the operational cost of data. The cost here means if we have 3 followers and 1 Leader DB, the same database will be stored as 4 copies in the 4 servers. Hence added storage cost","title":"Scalability - Horizontal scaling"},{"location":"level101/systems_design/scalability/#refer_1","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-pattern-load-balancing","text":"Improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. A commonly used technique is load balancing traffic across identical server clusters. A similar philosophy is used to load balance traffic across network links by ECMP , disk drives by RAID ,etc Aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. 
In our updated architecture diagram we have 4 servers to handle app traffic instead of a single server The device or system that performs load balancing is called a load balancer, abbreviated as LB.","title":"Scalability Pattern - Load Balancing"},{"location":"level101/systems_design/scalability/#refer_2","text":"https://en.wikipedia.org/wiki/Load_balancing_(computing) https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236 https://learning.oreilly.com/library/view/load-balancing-in/9781492038009/ https://learning.oreilly.com/library/view/practical-load-balancing/9781430236801/ http://shop.oreilly.com/product/9780596000509.do","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-pattern-lb-tasks","text":"What does an LB do?","title":"Scalability Pattern - LB Tasks"},{"location":"level101/systems_design/scalability/#service-discovery","text":"What backends are available in the system? In our architecture, 4 servers are available to serve App traffic. LB acts as a single endpoint that clients can use transparently to reach one of the 4 servers.","title":"Service discovery:"},{"location":"level101/systems_design/scalability/#health-checking","text":"What backends are currently healthy and available to accept requests? If one out of the 4 App servers turns bad, LB should automatically short circuit the path so that clients don\u2019t sense any application downtime","title":"Health checking:"},{"location":"level101/systems_design/scalability/#load-balancing","text":"What algorithm should be used to balance individual requests across the healthy backends? There are many algorithms to distribute traffic across one of the four servers. 
Based on observations/experience, SREs can pick the algorithm that suits their traffic pattern.","title":"Load balancing:"},{"location":"level101/systems_design/scalability/#scalability-pattern-lb-methods","text":"Common Load Balancing Methods","title":"Scalability Pattern - LB Methods"},{"location":"level101/systems_design/scalability/#least-connection-method","text":"directs traffic to the server with the fewest active connections. Most useful when there are a large number of persistent connections in the traffic unevenly distributed between the servers. Works if clients maintain long-lived connections.","title":"Least Connection Method"},{"location":"level101/systems_design/scalability/#least-response-time-method","text":"directs traffic to the server with the fewest active connections and the lowest average response time. Here response time is used as feedback on the server\u2019s health.","title":"Least Response Time Method"},{"location":"level101/systems_design/scalability/#round-robin-method","text":"rotates servers by directing traffic to the first available server and then moves that server to the bottom of the queue. Most useful when servers are of equal specification and there are not many persistent connections.","title":"Round Robin Method"},{"location":"level101/systems_design/scalability/#ip-hash","text":"the IP address of the client determines which server receives the request. This can sometimes skew the distribution but is useful if apps store some state locally and need some stickiness. More advanced client/server-side example techniques: - https://docs.nginx.com/nginx/admin-guide/load-balancer/ - http://cbonte.github.io/haproxy-dconv/2.2/intro.html#3.3.5 - https://twitter.github.io/finagle/guide/Clients.html#load-balancing","title":"IP Hash"},{"location":"level101/systems_design/scalability/#scalability-pattern-caching-content-delivery-networks-cdn","text":"CDNs are added closer to the client\u2019s location. 
If the app has static data like images, Javascript, CSS which don\u2019t change very often, they can be cached. Since our example is a content sharing site, static content can be cached in CDNs with a suitable expiry. WHAT: Use CDNs (content delivery networks) to offload traffic from your site. WHEN TO USE: When speed improvements and scale warrant the additional cost. HOW TO USE: Most CDNs leverage DNS to serve content on your site\u2019s behalf. Thus you may need to make minor DNS changes or additions and move content to be served from new subdomains. Eg media-exp1.licdn.com is a domain used by Linkedin to serve static content Here a CNAME points the domain to the DNS of the CDN provider dig media-exp1.licdn.com +short 2-01-2c3e-005c.cdx.cedexis.net. WHY: CDNs help offload traffic spikes and are often economical ways to scale parts of a site\u2019s traffic. They also often substantially improve page download times. KEY TAKEAWAYS: CDNs are a fast and simple way to offset the spikiness of traffic as well as traffic growth in general. Make sure you perform a cost-benefit analysis and monitor the CDN usage. If CDNs have a lot of cache misses, then we don\u2019t gain much from CDN and are still serving requests using our compute resources.","title":"Scalability Pattern - Caching - Content Delivery Networks (CDN)"},{"location":"level101/systems_design/scalability/#scalability-microservices","text":"This pattern represents the separation of work by service or function within the application. Microservices are meant to address the issues associated with growth and complexity in the code base and data sets. The intent is to create fault isolation as well as to reduce response times. Microservices can scale transactions, data sizes, and codebase sizes. They are most effective in scaling the size and complexity of your codebase. 
They tend to cost a bit more than horizontal scaling because the engineering team needs to rewrite services or, at the very least, disaggregate them from the original monolithic application. WHAT: Sometimes referred to as scale through services or resources, this rule focuses on scaling by splitting data sets, transactions, and engineering teams along verb (services) or noun (resources) boundaries. WHEN TO USE: Very large data sets where relations between data are not necessary. Large, complex systems where scaling engineering resources requires specialization. HOW TO USE: Split up actions by using verbs, or resources by using nouns, or use a mix. Split both the services and the data along the lines defined by the verb/noun approach. WHY: Allows for efficient scaling of not only transactions but also very large data sets associated with those transactions. It also allows for the efficient scaling of teams. KEY TAKEAWAYS: Microservices allow for efficient scaling of transactions, large data sets, and can help with fault isolation. It helps reduce the communication overhead of teams. The codebase becomes less complex as disjoint features are decoupled and spun as new services thereby letting each service scale independently specific to its requirement.","title":"Scalability - Microservices"},{"location":"level101/systems_design/scalability/#refer_3","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-sharding","text":"This pattern represents the separation of work based on attributes that are looked up to or determined at the time of the transaction. Most often, these are implemented as splits by requestor, customer, or client. Very often, a lookup service or deterministic algorithm will need to be written for these types of splits. 
Sharding aids in scaling transaction growth, scaling instruction sets, and decreasing processing time (the last by limiting the data necessary to perform any transaction). This is more effective at scaling growth in customers or clients. It can aid with disaster recovery efforts, and limit the impact of incidents to only a specific segment of customers. Here the auth data is sharded based on user names so that DBs can respond faster as the amount of data DBs have to work on has drastically reduced during queries. There can be other ways to split Here the whole data center is split and replicated and clients are directed to a data center based on their geography. This helps in improving performance as clients are directed to the closest data center and performance increases as we add more data centers. There are some replication and consistency overhead with this approach one needs to be aware of. This also gives fault tolerance by rolling out test features to one site and rollback if there is an impact to that geography WHAT: This is very often a split by some unique aspect of the customer such as customer ID, name, geography, and so on. WHEN TO USE: Very large, similar data sets such as large and rapidly growing customer bases or when the response time for a geographically distributed customer base is important. HOW TO USE: Identify something you know about the customer, such as customer ID, last name, geography, or device, and split or partition both data and services based on that attribute. WHY: Rapid customer growth exceeds other forms of data growth, or you have the need to perform fault isolation between certain customer groups as you scale. 
KEY TAKEAWAYS: Shards are effective at helping you to scale customer bases but can also be applied to other very large data sets that can\u2019t be pulled apart using the microservices methodology.","title":"Scalability - Sharding"},{"location":"level101/systems_design/scalability/#refer_4","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html","title":"Refer"},{"location":"level101/systems_design/scalability/#applications-in-sre-role","text":"SREs in coordination with the network team work on how to map users' traffic to a particular site. https://engineering.linkedin.com/blog/2017/05/trafficshift--load-testing-at-scale SREs work closely with the Dev team to split monoliths to multiple microservices that are easy to run and manage SREs work on improving Load Balancers' reliability, service discovery, and performance SREs work closely to split Data into shards and manage data integrity and consistency. https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store SREs work to set up, configure, and improve the CDN cache hit rate.","title":"Applications in SRE role"},{"location":"level102/containerization_and_orchestration/conclusion/","text":"Conclusion In this sub-module we have toured the world of containers starting from why we use containers, how containers evolved from the virtual machine past (though they are, in no means, obsolete) and how they are different from virtual machines. We then saw how containers are implemented with emphasis on cgroups and namespaces along with some hands-on exercises. Finally we concluded our journey with container orchestration where we learnt a bit of Kubernetes with some practical examples. 
We hope this module gives you enough knowledge and interest to continue learning and applying these technologies in greater depth!","title":"Conclusion"},{"location":"level102/containerization_and_orchestration/conclusion/#conclusion","text":"In this sub-module we have toured the world of containers starting from why we use containers, how containers evolved from the virtual machine past (though virtual machines are by no means obsolete) and how they are different from virtual machines. We then saw how containers are implemented with emphasis on cgroups and namespaces along with some hands-on exercises. Finally we concluded our journey with container orchestration where we learnt a bit of Kubernetes with some practical examples. We hope this module gives you enough knowledge and interest to continue learning and applying these technologies in greater depth!","title":"Conclusion"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/","text":"Introduction Docker has gained huge popularity among other container engines since it was released to the public in 2013. Here are some of the reasons why Docker is so popular: Improved portability Docker containers can be shipped and run across environments, be it a local machine, on-prem or cloud instances, in the form of Docker images. Compared to Docker containers, LXC containers carry more machine-specific configuration. Lighter weight Docker images are lightweight compared to VM images. For example, an Ubuntu 18.04 VM size is about 3GB whereas the docker image is 45MB! Versioning of container images Docker supports maintaining multiple versions of images which makes it easier to look up the history of an image and even roll back. Reuse of images Since Docker images are in the form of layers, one image can be used as a base on top of which new images are built. For example, Alpine is a lightweight image (5MB) which is commonly used as a base image. Docker layers are managed using storage drivers. 
Community support Docker hub is a container registry where anyone logged in can upload or download a container image. Docker images of popular OS distros are regularly updated in docker hub and receive large community support. Let\u2019s look at some terms which come up during our discussion of Docker. Docker terminology Docker images Docker image contains the executable version of the application along with the dependencies (config files, libraries, binaries) required for the application to run as a standalone container. It can be understood as a snapshot of a container. Docker images are present as layers on top of the base layer. These layers are the ones that are versioned. The most recent version of layer is the one that is used on top of the base image. docker image ls lists the images present in the host machine. Docker containers Docker container is the running instance of the docker image. While images are static, containers created from the images can be executed into and interacted with. This is actually the \u201ccontainer\u201d from the previous sections of the module. docker run is the command used to instantiate containers from images. docker ps lists docker containers currently running in the host machine. Docker file It is a plain text file of instructions based on which an image is assembled by docker engine (daemon, to be precise). It contains information on base image, ENV variables to be injected. docker build is used to build images from dockerfile. Docker hub It is Docker\u2019s official container registry of images. Any user with a docker login can upload custom images to Docker hub using docker push and fetch images using docker pull . Having known the basic terminologies let\u2019s look at how docker engine works; how CLI commands are interpreted and container life-cycle is managed. Components of Docker engine Let\u2019s start with the diagram of Docker Engine to understand better: The docker engine follows a client-server architecture. 
It consists of 3 components: Docker client This is the component the user directly interacts with. When you execute docker commands which we saw earlier (push, pull, container ls, image ls) , we are actually using the docker client. A single docker client can communicate with multiple docker daemons. REST API Provides an interface for the docker client and daemon to communicate. Docker Daemon (server) This is the main component of the docker engine. It builds images from dockerfile, fetches images from docker registry, pushes images to the registry, stops, starts containers etc. It also manages networking between containers. LAB The official docker github provides labs at several levels for learning Docker. We're linking one of the labs which we found great for people beginning from scratch. Please follow the labs in this order: Setting up local environment for the labs Basics for using docker CLI Creating and containerizing a basic Flask app Here is another beginner level lab for dockerizing a MERN (Mongo + React + Express) application and it\u2019s easy to follow along. Advanced features of Docker While we have covered the basics of containerization and how a standalone application can be dockerized, processes in the real world need to communicate with each other. This need is particularly prevalent in applications which follow a microservice architecture. Docker networks Docker networks facilitate the interaction between containers running on the same hosts or even different hosts. There are several options provided through docker network command which specifies how the container interacts with the host and with other containers. 
The host option allows sharing of network stack with the host, bridge allows communication between containers running on the same host but not external to the host, overlay facilitates interaction between containers across hosts attached to the same network and macvlan which assigns a separate MAC address to a container for legacy containers are some important types of networks supported by Docker. This however is outside the scope of this module. The official documentation on docker networks itself is a good place to start. Volumes Apart from images, containers and networks, Docker also provides the option to create and mount volumes within containers. Generally, data within docker containers is non-persistent i.e once you kill the container the data is lost. Volumes are used for storing persistent data in containers. This Docker lab is a great place to start playing with volumes. In the next section we see how container deployments are orchestrated with Kubernetes.","title":"Containerization With Docker"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#introduction","text":"Docker has gained huge popularity among other container engines since it was released to the public in 2013. Here are some of the reasons why Docker so popular: Improved portability Docker containers can be shipped and run across environments be it local machine, on-prem or cloud instances in the form of Docker images. Compared to docker containers, LXC containers have more machine specifications. - Lighter weight Docker images are light weight compared to VM images. For example, an Ubuntu 18.04 VM size is about 3GB whereas the docker image is 45MB! Versioning of container images Docker supports maintaining multiple versions of images which makes it easier to look up the history of an image and even rollback. Reuse of images Since Docker images are in the form of layers, one image can be used as base on top of which new images are built. 
For example, Alpine is a lightweight image (5MB) which is commonly used as a base image. Docker layers are managed using storage drivers. Community support Docker hub is a container registry where anyone logged in can upload or download a container image. Docker images of popular OS distros are regularly updated in docker hub and receive large community support. Let\u2019s look at some terms which come up during our discussion of Docker.","title":"Introduction"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#docker-terminology","text":"Docker images A Docker image contains the executable version of the application along with the dependencies (config files, libraries, binaries) required for the application to run as a standalone container. It can be understood as a snapshot of a container. Docker images are present as layers on top of the base layer. These layers are the ones that are versioned. The most recent version of a layer is the one that is used on top of the base image. docker image ls lists the images present on the host machine. Docker containers A Docker container is a running instance of a Docker image. While images are static, containers created from the images can be run and interacted with. This is actually the \u201ccontainer\u201d from the previous sections of the module. docker run is the command used to instantiate containers from images. docker ps lists docker containers currently running on the host machine. Docker file It is a plain text file of instructions based on which an image is assembled by docker engine (daemon, to be precise). It contains information such as the base image and the ENV variables to be injected. docker build is used to build images from a dockerfile. Docker hub It is Docker\u2019s official container registry of images. Any user with a docker login can upload custom images to Docker hub using docker push and fetch images using docker pull. 
Now that we know the basic terminology, let\u2019s look at how the docker engine works: how CLI commands are interpreted and how the container life-cycle is managed.","title":"Docker terminology"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#components-of-docker-engine","text":"Let\u2019s start with the diagram of Docker Engine to understand better: The docker engine follows a client-server architecture. It consists of 3 components: Docker client This is the component the user directly interacts with. When you execute the docker commands we saw earlier (push, pull, container ls, image ls), you are actually using the docker client. A single docker client can communicate with multiple docker daemons. REST API Provides an interface for the docker client and daemon to communicate. Docker Daemon (server) This is the main component of the docker engine. It builds images from a dockerfile, fetches images from the docker registry, pushes images to the registry, stops and starts containers, etc. It also manages networking between containers.","title":"Components of Docker engine"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#lab","text":"The official docker github provides labs at several levels for learning Docker. We're linking one of the labs which we found great for people beginning from scratch. Please follow the labs in this order: Setting up local environment for the labs Basics for using docker CLI Creating and containerizing a basic Flask app Here is another beginner level lab for dockerizing a MERN (MongoDB + Express + React + Node.js) application and it\u2019s easy to follow along.","title":"LAB"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#advanced-features-of-docker","text":"While we have covered the basics of containerization and how a standalone application can be dockerized, processes in the real world need to communicate with each other. 
This need is particularly prevalent in applications which follow a microservice architecture. Docker networks Docker networks facilitate the interaction between containers running on the same hosts or even different hosts. There are several options provided through docker network command which specifies how the container interacts with the host and with other containers. The host option allows sharing of network stack with the host, bridge allows communication between containers running on the same host but not external to the host, overlay facilitates interaction between containers across hosts attached to the same network and macvlan which assigns a separate MAC address to a container for legacy containers are some important types of networks supported by Docker. This however is outside the scope of this module. The official documentation on docker networks itself is a good place to start. Volumes Apart from images, containers and networks, Docker also provides the option to create and mount volumes within containers. Generally, data within docker containers is non-persistent i.e once you kill the container the data is lost. Volumes are used for storing persistent data in containers. This Docker lab is a great place to start playing with volumes. In the next section we see how container deployments are orchestrated with Kubernetes.","title":"Advanced features of Docker"},{"location":"level102/containerization_and_orchestration/intro/","text":"Containers and orchestration Introduction Containers, Docker and Kubernetes are \"cool\" terms that are being spoken of by everyone involved with software in some way. Let's dive into each of these pieces of technology at enough depth to understand what the whole deal is about! 
In this module we talk about the ins and outs of containers: the internals and usage of containers; how they are implemented, how to containerize your application and finally, how to deploy containerized applications on a large scale without losing your sleep. We'll also get our hands dirty by trying out a few lab exercises. Prerequisites Basic knowledge of Linux will be helpful for understanding the internals of containers. Basic knowledge of shell commands (will come in handy when we're containerizing applications). Knowledge of running a basic web application. You can go through our Python And Web module to gain familiarity with this. What to expect from this course This module is divided into 3 sub-modules. In the first sub-module, we will cover the internals of containers and what they\u2019re used for. The second sub-module introduces Docker, a popular container engine, and contains lab exercises on dockerizing a basic webapp. The last sub-module talks about container orchestration with Kubernetes and includes some lab exercises to show how it makes the lives of SREs easy. What is not covered under this course We will not cover advanced Docker and Kubernetes concepts. However, we will point you to links and references where you can pick them up as per your interest. 
Course Contents The following topics have been covered in this course: Introduction to containers What are containers Why containers Difference between virtual machines and containers How are containers implemented Namespaces Cgroups Container engines Containerization with Docker Introduction Basic docker terminology Components of Docker engine Hands-on Introduction to Advanced Docker Container orchestration with Kubernetes Introduction Motivation to use Kubernetes Kubernetes Architecture Hands-on Introduction to Advanced Kubernetes concepts Conclusion","title":"Introduction"},{"location":"level102/containerization_and_orchestration/intro/#containers-and-orchestration","text":"","title":"Containers and orchestration"},{"location":"level102/containerization_and_orchestration/intro/#introduction","text":"Containers, Docker and Kubernetes are \"cool\" terms that are being spoken of by everyone involved with software in some way. Let's dive into each of these pieces of technology at enough depth to understand what the whole deal is about! In this module we talk about the ins and outs of containers: the internals and usage of containers; how they are implemented, how to containerize your application and finally, how to deploy containerized applications on a large scale without losing your sleep. We'll also get our hands dirty by trying out a few lab exercises.","title":"Introduction"},{"location":"level102/containerization_and_orchestration/intro/#prerequisites","text":"Basic knowledge of Linux will be helpful for understanding the internals of containers Basic knowledge of shell commands (will come in handy when we're containerizing applications) Knowledge of running a basic web application. You can go through our Python And Web module to gain familiarity with this.","title":"Prerequisites"},{"location":"level102/containerization_and_orchestration/intro/#what-to-expect-from-this-course","text":"This module is divided into 3 sub-modules. 
In the first sub-module, we will cover the internals of containers and what they\u2019re used for. The second sub-module introduces Docker, a popular container engine, and contains lab exercises on dockerizing a basic webapp. The last sub-module talks about container orchestration with Kubernetes and contains some lab exercises to show how it makes the lives of SREs easy.","title":"What to expect from this course"},{"location":"level102/containerization_and_orchestration/intro/#what-is-not-covered-under-this-course","text":"We will not cover advanced Docker and Kubernetes concepts. However, we will be leading you to links and references from where you can pick them up as per your interest.","title":"What is not covered under this course"},{"location":"level102/containerization_and_orchestration/intro/#course-contents","text":"The following topics have been covered in this course: Introduction to containers What are containers Why containers Difference between virtual machines and containers How are containers implemented Namespaces Cgroups Container engines Containerization with Docker Introduction Basic docker terminology Components of Docker engine Hands-on Introduction to Advanced Docker Container orchestration with Kubernetes Introduction Motivation to use Kubernetes Kubernetes Architecture Hands-on Introduction to Advanced Kubernetes concepts Conclusion","title":"Course Contents"},{"location":"level102/containerization_and_orchestration/intro_to_containers/","text":"What are containers Here's a popular definition of containers according to Docker , a popular containerization engine : A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another Let's break this down. A container is your code bundled along with its entire runtime environment. That includes your system libraries, binaries and config files needed for your application to run. 
Why containers You might wonder why we need to pack your application along with its dependencies. This is where the second part of the definition comes, ...so the application runs quickly and reliably from one computing environment to another. Developers usually write code in their dev environment (or local machine), test it in one or two staging/test environments before pushing their code into production. Ideally, for reliably testing applications before pushing to production, we need all these environments to be uniform to a tee (underlying OS, system libraries etc). Of course, the ideal is hard to achieve especially when we're using a mix of on-prem (complete control) and cloud infrastructure providers (more restrictive in terms of control of hardware and security options), a scenario which is more common today. This is exactly why we need to package not only the code but also the dependencies; so that your application runs reliably irrespective of which infrastructure or environment it runs on. We can run several containers on a single host. Due to how containers are implemented, each container has its own isolated environment within the same host. This means that a monolithic application can be broken down into micro-services and packaged into containers. Each microservice runs in the host machine in isolated environments. This is another reason why containers are used: separation of concerns . Providing isolated environments does not let the failure of one application in one container affect the other. This is called fault isolation . Isolation also gives the added benefit of increased security due to restricted visibility of processes in a container. Due to how most of the containerization solutions are implemented, we also have the option to cap the amount of resources consumed by applications running within a container. This is called resource limiting . We will discuss this feature in more detail in the section on cgroups. 
Difference between virtual machines and containers Let's digress a little and go into some history. In the previous section we talked about how containers help us in achieving separation of concerns. Before the wide-spread usage of containers, virtualization was used for running applications in isolated environments in the same host (it\u2019s still being used today in some cases). In plain terms, virtualization is where we package software along with a copy of the OS on which it runs. This package is called a virtual machine (VM). The image of the OS bundled in the VM is called Guest OS. A component called Hypervisor sits between the Guest and the Host OS and is responsible for facilitating the access of the underlying OS\u2019s hardware to the Guest OS. You can learn more about hypervisors here . Similar to how multiple containers can be run in a single host machine, multiple VMs can be run on a single host and in this way, it\u2019s possible to run applications (or each microservice) in a separate VM and achieve separation of concerns. The main focus here is on the size of the VMs and containers. VMs come along with a copy of the guest operating system and therefore are heavy-weight compared to containers. If you\u2019re more interested in comparison of VMs and containers, you can check these articles from Backblaze and NetApp . While it is possible to run an operating system on a host with an incompatible kernel using hypervisors (e.g Windows 10 VM on CentOS 7), in cases where kernels can be shared (e.g Ubuntu on CentOS 7) containers are preferred over VMs due to the size factor. Sharing kernels, as you will see later, also gives containers many performance benefits over VMs like quicker boot-ups. Let\u2019s look at the diagram of how containers work. Comparing the two diagrams, we notice two things: Containers do not have a separate (guest) OS Container engine is the intermediary between containers and Host OS. 
It is used to facilitate the life-cycle of a container on the Host OS (it is not a necessity, however). The next section explains in detail how containers share the same operating system (kernel, to be precise) as the host machine and yet provide isolated environments for applications to run. How are containers implemented We\u2019ve talked about how containers, unlike virtual machines, share the same kernel as the host operating system and provide isolated environments for applications to run. This is achieved without the overhead of running a guest operating system on the host OS, thanks to two features of linux kernel called cgroups and kernel namespaces. Now that we are touching upon the internals of containers, it would be appropriate to give a more technically accurate representation of what they are. A container is a linux process or a group of linux processes which is restricted in - visibility into processes outside the container (implemented using namespace) - quantity of resources it can use (implemented using cgroups) and - system calls that can be made from the container. Refer to seccomp , if interested in knowing more. These restrictions are what make a containerized application remain isolated from other processes running in the same host. Now let\u2019s talk about namespaces and cgroups in a little more detail. Namespaces Visibility of processes inside a container should be restricted within itself. This is what linux namespaces do. The idea is that processes within a namespace can\u2019t affect those which they can\u2019t \u201csee\u201d. Processes sharing a single namespace have identities, service and/or interfaces unique to the namespace they exist in. Here\u2019s a list of namespaces in linux: Mount Process groups sharing a mount namespace share a separate, private set of mount points and file system view. Any modifications made to these namespaced mount points are not visible outside the namespace. 
For example it is possible to have a /var within a mount namespace which is different from /var in the host. PID Processes in a pid namespace have process ids which are unique only within the namespace. A process can be a root process (pid 1) in its own pid namespace and have an entire tree of processes under it. Network Each network namespace will have its own network device instances that can be configured with individual network addresses. Processes in the same network namespace can have their own ports and route tables. User User namespaces can have their own users and group ids. It\u2019s possible for a process using a non-privileged user in the host machine to have a root user identity within a user namespace. Cgroup Allows creation of cgroups which can be used only within the cgroup namespace. Cgroups will be covered in more detail in the following section. UTS This namespace has its own hostname and domain name. IPC Each IPC namespace has its own System V and POSIX message queues. As complex as it seems, creating namespaces in linux is quite simple. Let\u2019s see a quick demo to create a PID namespace. You\u2019ll need a linux based OS with sudoers permission to follow along. DEMO: namespaces First we check which processes are running in the host system (output varies from system to system). Note the process with pid 1. Let\u2019s create a PID namespace with the unshare command and create a bash process in the namespace. You can see that ps aux (which itself is a process launched in the PID namespace so created) can only see processes within its own namespace. Hence, the output shows only 2 processes running within the namespace. Also note, the root process (pid 1) in the namespace is not init but it is the bash shell which we specified while creating the namespace. Let\u2019s create another process in the same namespace which sleeps for 1000 seconds in the background. In my case the pid of the sleep process is 44 within the PID namespace . 
On a separate terminal, check for the process id of the sleep process as seen from the host. Note the difference in pid (23844 in the host and 44 within the namespace) though both refer to the same process (start time and all other attributes are the same). It\u2019s also possible to nest namespaces i.e create a pid namespace from another pid namespace. Try out sudo nsenter -t 23844 --pid -r bash to reenter the namespace and create another pid namespace within it. It should be fun to do! Cgroups A cgroup can be defined as a set of processes whose usage of resources is metered and monitored. The resources can be memory pages, disk i/o, CPU etc. In fact, cgroups are classified based on which resource the limit is imposed on and the nature of the action taken when a limit is violated. The component in the cgroup which tracks resource utilization and controls the behaviour of processes in a cgroup is called resource-subsystem or resource controller. Following is the set of resource controllers and their function according to RHEL\u2019s introduction to cgroups : blkio \u2014 this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, or USB). cpu \u2014 this subsystem uses the scheduler to provide cgroup processes access to the CPU. cpuacct \u2014 this subsystem generates automatic reports on CPU resources used by processes in a cgroup. cpuset \u2014 this subsystem assigns individual CPUs (on a multicore system) and memory nodes to processes in a cgroup. devices \u2014 this subsystem allows or denies access to devices by processes in a cgroup. freezer \u2014 this subsystem suspends or resumes processes in a cgroup. memory \u2014 this subsystem sets limits on memory use by processes in a cgroup and generates automatic reports on memory resources used by those processes. Cgroups follow a hierarchical, tree-like structure for each resource controller i.e one hierarchy of cgroups exists for each controller. 
Each cgroup in a hierarchy inherits certain attributes (e.g limits) from its parent cgroup. Let\u2019s try out a quick demo with memory cgroups to wrap our heads around the above ideas. You\u2019ll need a linux based OS (here, RedHat) with sudo permission to follow along. DEMO: cgroups Let\u2019s start by checking if cgroup tools are installed in your machine. Execute mount | grep \"^cgroup\" . If you have the tools installed you\u2019ll see an output like this: If not, install the tools with sudo yum install libcgroup-tools -y . Now, we create a memory cgroup called mem_group with \u201croot\u201d as the owner of the cgroup. Command executed sudo cgcreate -a root -g memory:mem_group . Verify that cgroup is created. /sys/fs/cgroup/ is the pseudo filesystem where a newly created cgroup is added as a sub-group. Memory cgroup puts a limit on the memory usage of processes in the cgroup. Let\u2019s see what the limits are for mem_group. The file for checking the memory limit is memory.limit_in_bytes (more information here , if you\u2019re interested). Note that mem_group has inherited the limit from its parent cgroup. Now, let\u2019s reduce the memory usage limit to 20KB for the purpose of our demo (the actual limit is rounded off to the nearest power of 2). This limit is too low and hence most of the processes attached to mem_group should be OOM killed. Create a new shell and attach it to the cgroup. We need sudo permissions for this. The process is OOM killed as expected. You can confirm the same with dmesg logs (mm_fault_error). If you want to try out a more in-depth exercise on cgroups, check out this tutorial from Geeks for Geeks . Let\u2019s come back to containers again. Containers share the same kernel as the underlying host operating system and provide an isolated environment of the application within. 
Cgroups help in managing resources used by processes within a container and namespaces help isolate network stack, pids, users, group ids and mount points in a container from another container running on the same host. Of course, there are more components to containers which truly make it fully functional but that discussion is out of scope of this module. Container engine Container engines ease the process of creating and managing containers in a host machine. How? The container creation workflow typically begins with a container image. A container image is a packaged, portable version of the target application bundled with all dependencies for it to run. These container images are either available on the host machine (container host) from previous builds or need to be pulled from a remote repository of images. Sometimes the container engine might need to build the container image from a set of instructions. Finally once the container image is fetched/built, the container engine unpacks the image and creates an isolated environment for the application as per the image specifications. The files in the container image are then mounted to the isolated environment to get the application up and running within the container. There are several container engines available like Docker, RKT, LXC (one of the first container engines) which require different image formats (Docker, LXD). OCI (Open Container Initiative) is a collaborative project started by Docker that aims to standardize container runtime specifications and image formats across vendors. OCI FAQ section is a good place to start if you\u2019re curious about this project. 
We will focus on Docker in the next section .","title":"Introduction To Containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#what-are-containers","text":"Here's a popular definition of containers according to Docker , a popular containerization engine : A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another Let's break this down. A container is your code bundled along with its entire runtime environment. That includes your system libraries, binaries and config files needed for your application to run.","title":"What are containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#why-containers","text":"You might wonder why we need to pack your application along with its dependencies. This is where the second part of the definition comes, ...so the application runs quickly and reliably from one computing environment to another. Developers usually write code in their dev environment (or local machine), test it in one or two staging/test environments before pushing their code into production. Ideally, for reliably testing applications before pushing to production, we need all these environments to be uniform to a tee (underlying OS, system libraries etc). Of course, the ideal is hard to achieve especially when we're using a mix of on-prem (complete control) and cloud infrastructure providers (more restrictive in terms of control of hardware and security options), a scenario which is more common today. This is exactly why we need to package not only the code but also the dependencies; so that your application runs reliably irrespective of which infrastructure or environment it runs on. We can run several containers on a single host. Due to how containers are implemented, each container has its own isolated environment within the same host. 
This means that a monolithic application can be broken down into micro-services and packaged into containers. Each microservice runs in the host machine in isolated environments. This is another reason why containers are used: separation of concerns . Providing isolated environments does not let the failure of one application in one container affect the other. This is called fault isolation . Isolation also gives the added benefit of increased security due to restricted visibility of processes in a container. Due to how most of the containerization solutions are implemented, we also have the option to cap the amount of resources consumed by applications running within a container. This is called resource limiting . We will discuss this feature in more detail in the section on cgroups.","title":"Why containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#difference-between-virtual-machines-and-containers","text":"Let's digress a little and go into some history. In the previous section we talked about how containers help us in achieving separation of concerns. Before the wide-spread usage of containers, virtualization was used for running applications in isolated environments in the same host (it\u2019s still being used today in some cases). In plain terms, virtualization is where we package software along with a copy of the OS on which it runs. This package is called a virtual machine (VM). The image of the OS bundled in the VM is called Guest OS. A component called Hypervisor sits between the Guest and the Host OS and is responsible for facilitating the access of the underlying OS\u2019s hardware to the Guest OS. You can learn more about hypervisors here . Similar to how multiple containers can be run in a single host machine, multiple VMs can be run on a single host and in this way, it\u2019s possible to run applications (or each microservice) in a separate VM and achieve separation of concerns. 
The main focus here is on the size of the VMs and containers. VMs come along with a copy of the guest operating system and therefore are heavy-weight compared to containers. If you\u2019re more interested in comparison of VMs and containers, you can check these articles from Backblaze and NetApp . While it is possible to run an operating system on a host with an incompatible kernel using hypervisors (e.g Windows 10 VM on CentOS 7), in cases where kernels can be shared (e.g Ubuntu on CentOS 7) containers are preferred over VMs due to the size factor. Sharing kernels, as you will see later, also gives containers many performance benefits over VMs like quicker boot-ups. Let\u2019s look at the diagram of how containers work. Comparing the two diagrams, we notice two things: Containers do not have a separate (guest) OS Container engine is the intermediary between containers and Host OS. It is used to facilitate the life-cycle of a container on the Host OS (it is not a necessity, however). The next section explains in detail how containers share the same operating system (kernel, to be precise) as the host machine and yet provide isolated environments for applications to run.","title":"Difference between virtual machines and containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#how-are-containers-implemented","text":"We\u2019ve talked about how containers, unlike virtual machines, share the same kernel as the host operating system and provide isolated environments for applications to run. This is achieved without the overhead of running a guest operating system on the host OS, thanks to two features of linux kernel called cgroups and kernel namespaces. Now that we are touching upon the internals of containers, it would be appropriate to give a more technically accurate representation of what they are. 
A container is a linux process or a group of linux processes which is restricted in - visibility into processes outside the container (implemented using namespace) - quantity of resources it can use (implemented using cgroups) and - system calls that can be made from the container. Refer to seccomp , if interested in knowing more. These restrictions are what make a containerized application remain isolated from other processes running in the same host. Now let\u2019s talk about namespaces and cgroups in a little more detail.","title":"How are containers implemented"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#namespaces","text":"Visibility of processes inside a container should be restricted within itself. This is what linux namespaces do. The idea is that processes within a namespace can\u2019t affect those which they can\u2019t \u201csee\u201d. Processes sharing a single namespace have identities, service and/or interfaces unique to the namespace they exist in. Here\u2019s a list of namespaces in linux: Mount Process groups sharing a mount namespace share a separate, private set of mount points and file system view. Any modifications made to these namespaced mount points are not visible outside the namespace. For example it is possible to have a /var within a mount namespace which is different from /var in the host. PID Processes in a pid namespace have process ids which are unique only within the namespace. A process can be a root process (pid 1) in its own pid namespace and have an entire tree of processes under it. Network Each network namespace will have its own network device instances that can be configured with individual network addresses. Processes in the same network namespace can have their own ports and route tables. User User namespaces can have their own users and group ids. It\u2019s possible for a process using a non-privileged user in the host machine to have a root user identity within a user namespace. 
Cgroup Allows creation of cgroups which can be used only within the cgroup namespace. Cgroups will be covered in more detail in the following section. UTS This namespace has its own hostname and domain name. IPC Each IPC namespace has its own System V and POSIX message queues. As complex as it seems, creating namespaces in linux is quite simple. Let\u2019s see a quick demo to create a PID namespace. You\u2019ll need a linux based OS with sudoers permission to follow along.","title":"Namespaces"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#demo-namespaces","text":"First we check which processes are running in the host system (output varies from system to system). Note the process with pid 1. Let\u2019s create a PID namespace with the unshare command and create a bash process in the namespace. You can see that ps aux (which itself is a process launched in the PID namespace so created) can only see processes within its own namespace. Hence, the output shows only 2 processes running within the namespace. Also note, the root process (pid 1) in the namespace is not init but it is the bash shell which we specified while creating the namespace. Let\u2019s create another process in the same namespace which sleeps for 1000 seconds in the background. In my case the pid of the sleep process is 44 within the PID namespace . On a separate terminal, check for the process id of the sleep process as seen from the host. Note the difference in pid (23844 in the host and 44 within the namespace) though both refer to the same process (start time and all other attributes are the same). It\u2019s also possible to nest namespaces i.e create a pid namespace from another pid namespace. Try out sudo nsenter -t 23844 --pid -r bash to reenter the namespace and create another pid namespace within it. 
It should be fun to do!","title":"DEMO: namespaces"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#cgroups","text":"A cgroup can be defined as a set of processes whose usage of resources is metered and monitored. The resources can be memory pages, disk i/o, CPU etc. In fact, cgroups are classified based on which resource the limit is imposed on and the nature of the action taken when a limit is violated. The component in the cgroup which tracks resource utilization and controls the behaviour of processes in a cgroup is called resource-subsystem or resource controller. Following is the set of resource controllers and their function according to RHEL\u2019s introduction to cgroups : blkio \u2014 this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, or USB). cpu \u2014 this subsystem uses the scheduler to provide cgroup processes access to the CPU. cpuacct \u2014 this subsystem generates automatic reports on CPU resources used by processes in a cgroup. cpuset \u2014 this subsystem assigns individual CPUs (on a multicore system) and memory nodes to processes in a cgroup. devices \u2014 this subsystem allows or denies access to devices by processes in a cgroup. freezer \u2014 this subsystem suspends or resumes processes in a cgroup. memory \u2014 this subsystem sets limits on memory use by processes in a cgroup and generates automatic reports on memory resources used by those processes. Cgroups follow a hierarchical, tree-like structure for each resource controller i.e one hierarchy of cgroups exists for each controller. Each cgroup in a hierarchy inherits certain attributes (e.g limits) from its parent cgroup. Let\u2019s try out a quick demo with memory cgroups to wrap our heads around the above ideas. 
You\u2019ll need a linux based OS (here, RedHat) with sudo permission to follow along.","title":"Cgroups"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#demo-cgroups","text":"Let\u2019s start by checking if cgroup tools are installed in your machine. Execute mount | grep \"^cgroup\" . If you have the tools installed you\u2019ll see an output like this: If not, install the tools with sudo yum install libcgroup-tools -y . Now, we create a memory cgroup called mem_group with \u201croot\u201d as the owner of the cgroup. Command executed sudo cgcreate -a root -g memory:mem_group . Verify that cgroup is created. /sys/fs/cgroup/ is the pseudo filesystem where a newly created cgroup is added as a sub-group. Memory cgroup puts a limit on the memory usage of processes in the cgroup. Let\u2019s see what the limits are for mem_group. The file for checking the memory limit is memory.limit_in_bytes (more information here , if you\u2019re interested). Note that mem_group has inherited the limit from its parent cgroup. Now, let\u2019s reduce the memory usage limit to 20KB for the purpose of our demo (the actual limit is rounded off to the nearest power of 2). This limit is too low and hence most of the processes attached to mem_group should be OOM killed. Create a new shell and attach it to the cgroup. We need sudo permissions for this. The process is OOM killed as expected. You can confirm the same with dmesg logs (mm_fault_error). If you want to try out a more in-depth exercise on cgroups, check out this tutorial from Geeks for Geeks . Let\u2019s come back to containers again. Containers share the same kernel as the underlying host operating system and provide an isolated environment of the application within. Cgroups help in managing resources used by processes within a container and namespaces help isolate network stack, pids, users, group ids and mount points in a container from another container running on the same host. 
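As a rough illustration of the point above, the following Python sketch lists the namespaces and cgroup memberships of the current process by reading /proc. It assumes a Linux host with procfs mounted; the helper names list_namespaces and list_cgroups are made up for this example and do not come from any library.

```python
import os


def list_namespaces(pid='self'):
    # Each entry in /proc/<pid>/ns is a symlink such as 'pid:[4026531836]'.
    # Two processes in the same namespace see the same identifier.
    ns_dir = os.path.join('/proc', str(pid), 'ns')
    namespaces = {}
    try:
        for name in sorted(os.listdir(ns_dir)):
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
    except OSError:
        # Not on Linux, or not allowed to inspect this process.
        pass
    return namespaces


def list_cgroups(pid='self'):
    # /proc/<pid>/cgroup has one line per hierarchy:
    # hierarchy-id:controller-list:cgroup-path
    path = os.path.join('/proc', str(pid), 'cgroup')
    try:
        with open(path) as handle:
            return [line.strip() for line in handle if line.strip()]
    except OSError:
        return []


if __name__ == '__main__':
    for name, target in list_namespaces().items():
        print(name, target)
    for entry in list_cgroups():
        print(entry)
```

Running it once on the host and once inside a container should show different identifiers for every namespace the container does not share with the host, and a different cgroup path.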
Of course, there are more components to containers which truly make it fully functional but that discussion is out of scope of this module.","title":"DEMO: cgroups"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#container-engine","text":"Container engines ease the process of creating and managing containers in a host machine. How? The container creation workflow typically begins with a container image. A container image is a packaged, portable version of the target application bundled with all dependencies for it to run. These container images are either available on the host machine (container host) from previous builds or need to be pulled from a remote repository of images. Sometimes the container engine might need to build the container image from a set of instructions. Finally once the container image is fetched/built, the container engine unpacks the image and creates an isolated environment for the application as per the image specifications. The files in the container image are then mounted to the isolated environment to get the application up and running within the container. There are several container engines available like Docker, RKT, LXC (one of the first container engines) which require different image formats (Docker, LXD). OCI (Open Container Initiative) is a collaborative project started by Docker that aims to standardize container runtime specifications and image formats across vendors. OCI FAQ section is a good place to start if you\u2019re curious about this project. We will focus on Docker in the next section .","title":"Container engine"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/","text":"Introduction Now we finally arrive at the most awaited part: running and managing containers at scale. So far, we have seen how Docker facilitates managing the life-cycle of containers and provides improved portability of applications. 
Docker does provide a solution for easing the deployment of containers on a large scale ( you can check out Docker Swarm, if interested) which integrates well with Docker containers. However, Kubernetes has become the de-facto tool for orchestrating the management of microservices (as containers) in large distributed environments. Let\u2019s see the points of interest for us, SREs, to use container orchestration tools and Kubernetes in particular. Motivation to use Kubernetes Ease of usage Though there is a steep learning curve associated with Kubernetes, once learnt , can be used as a one stop tool to manage your microservices. With a single command it is possible to deploy full fledged production ready environments. The desired state of an application needs to be recorded as a YAML manifest and Kubernetes manages the application for you. Ensure optimum usage of resources We can specify limits on resources used by each container in a deployment. We can also specify our choice of nodes where Kubernetes can schedule nodes to be deployed (e.g microservices with high CPU consumption can be instructed to be deployed in high compute nodes). Fault tolerance Self-healing is built into basic resource types of Kubernetes. This removes the headache of designing a fault tolerant application system from scratch. This applies especially to stateless applications. Infrastructure agnostic Kubernetes does not have vendor lock-in. It can be set up in multiple cloud environments or in on-prem data centers. Strong community support and documentation Kubernetes is open-source and many technologies like operators, service mesh etc. have been built by the community to manage and monitor Kubernetes-orchestrated applications better. Extensible and customisable We can build our custom resource definitions which fit our use case for managing applications and use Kubernetes to manage them (with custom controllers). You can check out this article if you are more interested in this topic. 
Architecture of Kubernetes Here\u2019s a diagram (from the official Kubernetes documentation ) containing different components which make Kubernetes work: Kubernetes components can be divided into two parts: control plane components and data plane components . A Kubernetes cluster consists of 1 or more host machines (called nodes) where the containers managed by Kubernetes are run. This constitutes the data plane (or node plane). The brain of Kubernetes which responds to events from the node plane (e.g create a pod, replicas mismatch) and does the main orchestration is called the control plane. All control plane components are typically installed in a master node. This master node does not run any user containers. The Kubernetes components themselves are run as containers wrapped in Pods (which is the most basic kubernetes resource object). Control plane components: kube-apiserver etcd kube-scheduler kube-controller-manager Node plane components kubelet kube-proxy This workflow might help you understand the working on components better: An SRE installs kubectl in their local machine. This is the client which interacts with the Kubernetes control plane (and hence the cluster). They create a YAML file, called manifest which specifies the desired state of the resource (e.g a deployment names \u201cfrontend\u201d needs 3 pods to always be running) When they issue a command to create objects based in the YAML file, the kubectl CLI tool sends a rest API request to the kube-apiserver . If the manifest is valid, it is stored as key value pairs in the etcd server on the control plane. kube-scheduler chooses which nodes to put the containers on (basically schedules them) There are controller processes (managed by kube-controller manager) which makes sure the current state of the cluster is equivalent to the desired state (here, 3 pods are indeed running in the cluster -> all is fine). On the node plane side, kubelet makes sure that pods are locally kept in running state. 
LAB Prerequisites The best way to start this exercise is to use a Play with kubernetes lab. The environment gets torn down after 4 hours, so make sure that you save your files if you want to resume them. For persistent kubernetes clusters, you can set one up either on your local machine (using minikube) or you can create a kubernetes cluster in Azure, GCP or any other cloud provider. Knowledge of YAML is nice to have for understanding the manifest files. Hands-on Lab 1: We are going to create an object called Pod which is the most basic unit for running a container in Kubernetes. Here, we will create a pod called \u201cnginx-pod\u201d which contains an nginx container called \u201cweb\u201d. We will also expose port 80 in the container so that we can interact with the nginx container. Save the below manifest in a file called nginx-pod.yaml apiVersion: v1 #[1] kind: Pod #[2] metadata: #[3] name: nginx-pod #[4] labels: #[5] app: nginx spec: #[6] containers: #[7] - name: web #[8] image: nginx #[9] ports: #[10] - name: web #[11] containerPort: 80 #[12] protocol: TCP #[13] Let\u2019s very briefly understand what\u2019s here: #[2] - kind: The \u201ckind\u201d of object that\u2019s being created. Here it is a Pod. #[1] - apiVersion: The apiVersion of the \u201cPod\u201d resource. There could be minor changes in the values or keys in the yaml file if the version varies. #[3] - metadata: The metadata section of the file where pod labels and name are given. #[6] - spec: This is the main part where the things inside the pod are defined. These are not random key value pairs! They have to be interpretable by the kube-apiserver. You can check which key value pairs are optional/mandatory using the kubectl explain pod command. Do try it out! Apply the manifest using the command kubectl apply -f nginx-pod.yaml . This creates the \u201cnginx-pod\u201d pod in the kubernetes cluster. Verify that the pod is in running state using kubectl get pod . It shows that nginx-pod is in Running state. 
1/1 indicates that 1 out of 1 container(s) inside the pod is healthy. To check if the container running in \u201cnginx-pod\u201d is indeed \u201cweb\u201d we run the kubectl describe pod/nginx-pod command. This gives a lengthy output with a detailed description of the pod and the events that happened since the pod was created. This command is very useful for debugging. The part we are concerned with here is this: You can see \u201cweb\u201d under the Containers section with Image as nginx. This is what we are looking for. How do we access the welcome page of the nginx \u201cweb\u201d container? In the output of the describe command you can see the IP address of the pod. Each pod is assigned an IP address on creation. Here, this is 10.244.1.3. Issue a curl request from the host: curl 10.244.1.3:80 . You will get the welcome page! Let\u2019s say we want to use a specific tag of nginx (say 1.20.1) in the same pod, i.e. we want to modify some property of the pod. You can try editing nginx-pod.yaml (image: nginx:1.20.1 in #[9]) and reapplying (step 2). It will create a new container in the same pod with the new image. A container is created within the pod but the pod is the same. You can verify this by checking the pod start time in the describe command. It would show a much older time. What if we want to change the image to 1.20.1 for 1000 nginx pods? Stepping a little back, what if we want to create 1000 nginx pods? Of course, we can write a script but Kubernetes already offers a resource type called \u201cdeployment\u201d to manage large scale deployments better. Lab 2: We\u2019ll go a step further to see how we can create more than a single instance of the nginx pod at the same time. 
We will first create a Deployment. Save the below manifest in a file called nginx-deploy.yaml apiVersion: apps/v1 kind: Deployment #[1] metadata: name: nginx-deployment labels: app: nginx spec: replicas: 3 #[2] selector: matchLabels: app: nginx #[3] template: #[4] metadata: labels: app: nginx #[5] spec: containers: - name: web image: nginx ports: - name: web containerPort: 80 protocol: \"TCP\" You can see that it is similar to a pod definition till spec ( #[1] has Deployment as kind, and the API version is also different). Another interesting observation is that the metadata and spec parts under #[4] are almost the same as the metadata and spec sections under the Pod definition in Lab 1 (do go up and cross check this). What this implies is that we are deploying 3 nginx pods similar to the one in Lab 1. Also, the labels in matchLabels should be the same as the labels under #[4] . Now apply the manifest using kubectl apply -f nginx-deploy.yaml . Verify that 3 pods are indeed created. If you\u2019re curious, check the output of kubectl get deploy and kubectl describe deploy nginx-deployment . Delete one of the 3 pods using kubectl delete pod . After a few seconds, run kubectl get pod again. You can see that a new pod is spawned to keep the total number of pods at 3 (see AGE 15s compared to others created 27 minutes ago)! This is a demonstration of how Kubernetes provides fault tolerance. This is a property of the Kubernetes deployment object (kill the pod from Lab 1, it won\u2019t be respawned :) ). Let\u2019s say we want to increase the number of pods to 10. Try out kubectl scale deploy --replicas=10 nginx-deployment . You can see that 3 of the 10 pods are older than the rest. This means Kubernetes has added 7 extra pods to scale the deployment to 10. This shows how simple it is to scale up and scale down containers using Kubernetes. Let\u2019s put all these pods behind a ClusterIP service. Execute kubectl expose deployment nginx-deployment --name=nginx-service . Curl the service IP (here, 10.96.114.184). 
This curl request reaches one of the 10 pods in the deployment \u201cnginx-deployment\u201d in a round robin fashion. What happens when we execute the expose command is that a kubernetes Service of type ClusterIP is created so that all the pods behind this service are accessible through a single local IP (10.96.114.184, here). It is possible to have a public IP instead (i.e. an actual external load balancer) by creating a Service of type LoadBalancer. Do feel free to play around with it! The above exercises give a pretty good exposure to using Kubernetes to manage large scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kubernetes provides other resources like Job, DaemonSet, CronJob, StatefulSet etc. to manage special use cases. Additional labs: https://kubernetes.courselabs.co/ (Huge number of free follow-along exercises to play with Kubernetes) Advanced topics More often than not, microservices orchestrated with Kubernetes contain dozens of instances of resources like deployments, services and configs. The manifests for these applications can be auto-generated with Helm templates and passed on as Helm charts. Similar to how we have PyPI for Python packages, there are remote repositories like Bitnami where Helm charts (e.g. for setting up a production-ready Prometheus or Kafka with a single click) can be downloaded and used. This is a good place to begin. Kubernetes provides the flexibility to create our custom resources (similar to Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource with kind as SchoolOfSre you can! The only requirement is that you have to write a custom resource definition for it. You can also build a custom operator for your custom resource to take certain actions on the resource instance. 
You can check here for more information.","title":"Orchestration With Kubernetes"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#introduction","text":"Now we finally arrive at the most awaited part: running and managing containers at scale. So far, we have seen how Docker facilitates managing the life-cycle of containers and provides improved portability of applications. Docker does provide a solution for easing the deployment of containers on a large scale ( you can check out Docker Swarm, if interested) which integrates well with Docker containers. However, Kubernetes has become the de-facto tool for orchestrating the management of microservices (as containers) in large distributed environments. Let\u2019s see the points of interest for us, SREs, to use container orchestration tools and Kubernetes in particular.","title":"Introduction"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#motivation-to-use-kubernetes","text":"Ease of usage Though there is a steep learning curve associated with Kubernetes, once learnt , can be used as a one stop tool to manage your microservices. With a single command it is possible to deploy full fledged production ready environments. The desired state of an application needs to be recorded as a YAML manifest and Kubernetes manages the application for you. Ensure optimum usage of resources We can specify limits on resources used by each container in a deployment. We can also specify our choice of nodes where Kubernetes can schedule nodes to be deployed (e.g microservices with high CPU consumption can be instructed to be deployed in high compute nodes). Fault tolerance Self-healing is built into basic resource types of Kubernetes. This removes the headache of designing a fault tolerant application system from scratch. This applies especially to stateless applications. Infrastructure agnostic Kubernetes does not have vendor lock-in. 
It can be set up in multiple cloud environments or in on-prem data centers. Strong community support and documentation Kubernetes is open-source and many technologies like operators, service mesh etc. have been built by the community to manage and monitor Kubernetes-orchestrated applications better. Extensible and customisable We can build our custom resource definitions which fit our use case for managing applications and use Kubernetes to manage them (with custom controllers). You can check out this article if you are more interested in this topic.","title":"Motivation to use Kubernetes"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#architecture-of-kubernetes","text":"Here\u2019s a diagram (from the official Kubernetes documentation ) containing different components which make Kubernetes work: Kubernetes components can be divided into two parts: control plane components and data plane components . A Kubernetes cluster consists of 1 or more host machines (called nodes) where the containers managed by Kubernetes are run. This constitutes the data plane (or node plane). The brain of Kubernetes which responds to events from the node plane (e.g create a pod, replicas mismatch) and does the main orchestration is called the control plane. All control plane components are typically installed in a master node. This master node does not run any user containers. The Kubernetes components themselves are run as containers wrapped in Pods (which is the most basic kubernetes resource object). Control plane components: kube-apiserver etcd kube-scheduler kube-controller-manager Node plane components kubelet kube-proxy This workflow might help you understand the working on components better: An SRE installs kubectl in their local machine. This is the client which interacts with the Kubernetes control plane (and hence the cluster). 
They create a YAML file, called manifest which specifies the desired state of the resource (e.g a deployment names \u201cfrontend\u201d needs 3 pods to always be running) When they issue a command to create objects based in the YAML file, the kubectl CLI tool sends a rest API request to the kube-apiserver . If the manifest is valid, it is stored as key value pairs in the etcd server on the control plane. kube-scheduler chooses which nodes to put the containers on (basically schedules them) There are controller processes (managed by kube-controller manager) which makes sure the current state of the cluster is equivalent to the desired state (here, 3 pods are indeed running in the cluster -> all is fine). On the node plane side, kubelet makes sure that pods are locally kept in running state.","title":"Architecture of Kubernetes"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab","text":"","title":"LAB"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#prerequisites","text":"The best way to start this exercise is to use a Play with kubernetes lab . The environment gets torn down after 4 hours. So make sure that you save your files if you want to resume them. For persistent kubernetes clusters, you can set it up either in your local (using minikube ) or you can create a kubernetes cluster in Azure , GCP or any other cloud provider. Knowledge of YAML is nice to have for understanding the manifest files.","title":"Prerequisites"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#hands-on","text":"","title":"Hands-on"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab-1","text":"We are going to create an object called Pod which is the most basic unit for running a container in Kubernetes. Here, we will create a pod called \u2018nginx-pod\u201d which contains an nginx container called \u201cweb\u201d. 
We will also expose port 80 in the container so that we can interact with the nginx container. Save the below manifest in a file called nginx-pod.yaml apiVersion: v1 #[1] kind: Pod #[2] metadata: #[3] name: nginx-pod #[4] labels: #[5] app: nginx spec: #[6] containers: #[7] - name: web #[8] image: nginx #[9] ports: #[10] - name: web #[11] containerPort: 80 #[12] protocol: TCP #[13] Let\u2019s very briefly understand what\u2019s here: #[2] - kind: The \u201ckind\u201d of object that\u2019s being created. Here it is a Pod. #[1] - apiVersion: The apiVersion of the \u201cPod\u201d resource. There could be minor changes in the values or keys in the yaml file if the version varies. #[3] - metadata: The metadata section of the file where pod labels and name are given. #[6] - spec: This is the main part where the things inside the pod are defined. These are not random key value pairs! They have to be interpretable by the kube-apiserver. You can check which key value pairs are optional/mandatory using the kubectl explain pod command. Do try it out! Apply the manifest using the command kubectl apply -f nginx-pod.yaml . This creates the \u201cnginx-pod\u201d pod in the kubernetes cluster. Verify that the pod is in running state using kubectl get pod . It shows that nginx-pod is in Running state. 1/1 indicates that 1 out of 1 container(s) inside the pod is healthy. To check if the container running in \u201cnginx-pod\u201d is indeed \u201cweb\u201d we run the kubectl describe pod/nginx-pod command. This gives a lengthy output with a detailed description of the pod and the events that happened since the pod was created. This command is very useful for debugging. The part we are concerned with here is this: You can see \u201cweb\u201d under the Containers section with Image as nginx. This is what we are looking for. How do we access the welcome page of the nginx \u201cweb\u201d container? In the output of the describe command you can see the IP address of the pod. 
Each pod is assigned an IP address on creation. Here, this is 10.244.1.3. Issue a curl request from the host: curl 10.244.1.3:80 . You will get the welcome page! Let\u2019s say we want to use a specific tag of nginx (say 1.20.1) in the same pod, i.e. we want to modify some property of the pod. You can try editing nginx-pod.yaml (image: nginx:1.20.1 in #[9]) and reapplying (step 2). It will create a new container in the same pod with the new image. A container is created within the pod but the pod is the same. You can verify this by checking the pod start time in the describe command. It would show a much older time. What if we want to change the image to 1.20.1 for 1000 nginx pods? Stepping a little back, what if we want to create 1000 nginx pods? Of course, we can write a script but Kubernetes already offers a resource type called \u201cdeployment\u201d to manage large scale deployments better.","title":"Lab 1:"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab-2","text":"We\u2019ll go a step further to see how we can create more than a single instance of the nginx pod at the same time. We will first create a Deployment. Save the below manifest in a file called nginx-deploy.yaml apiVersion: apps/v1 kind: Deployment #[1] metadata: name: nginx-deployment labels: app: nginx spec: replicas: 3 #[2] selector: matchLabels: app: nginx #[3] template: #[4] metadata: labels: app: nginx #[5] spec: containers: - name: web image: nginx ports: - name: web containerPort: 80 protocol: \"TCP\" You can see that it is similar to a pod definition till spec ( #[1] has Deployment as kind, and the API version is also different). Another interesting observation is that the metadata and spec parts under #[4] are almost the same as the metadata and spec sections under the Pod definition in Lab 1 (do go up and cross check this). What this implies is that we are deploying 3 nginx pods similar to the one in Lab 1. Also, the labels in matchLabels should be the same as the labels under #[4] . 
Now apply the manifest using kubectl apply -f nginx-deploy.yaml . Verify that 3 pods are indeed created. If you\u2019re curious, check the output of kubectl get deploy and kubectl describe deploy nginx-deployment . Delete one of the 3 pods using kubectl delete pod . After a few seconds, run kubectl get pod again. You can see that a new pod is spawned to keep the total number of pods at 3 (see AGE 15s compared to others created 27 minutes ago)! This is a demonstration of how Kubernetes provides fault tolerance. This is a property of the Kubernetes deployment object (kill the pod from Lab 1, it won\u2019t be respawned :) ). Let\u2019s say we want to increase the number of pods to 10. Try out kubectl scale deploy --replicas=10 nginx-deployment . You can see that 3 of the 10 pods are older than the rest. This means Kubernetes has added 7 extra pods to scale the deployment to 10. This shows how simple it is to scale up and scale down containers using Kubernetes. Let\u2019s put all these pods behind a ClusterIP service. Execute kubectl expose deployment nginx-deployment --name=nginx-service . Curl the service IP (here, 10.96.114.184). This curl request reaches one of the 10 pods in the deployment \u201cnginx-deployment\u201d in a round robin fashion. What happens when we execute the expose command is that a kubernetes Service of type ClusterIP is created so that all the pods behind this service are accessible through a single local IP (10.96.114.184, here). It is possible to have a public IP instead (i.e. an actual external load balancer) by creating a Service of type LoadBalancer. Do feel free to play around with it! The above exercises give a pretty good exposure to using Kubernetes to manage large scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kubernetes provides other resources like Job, DaemonSet, CronJob, StatefulSet etc. 
to manage special use cases. Additional labs: https://kubernetes.courselabs.co/ (Huge number of free follow-along exercises to play with Kubernetes)","title":"Lab 2:"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#advanced-topics","text":"More often than not, microservices orchestrated with Kubernetes contain dozens of instances of resources like deployments, services and configs. The manifests for these applications can be auto-generated with Helm templates and passed on as Helm charts. Similar to how we have PyPI for Python packages, there are remote repositories like Bitnami where Helm charts (e.g. for setting up a production-ready Prometheus or Kafka with a single click) can be downloaded and used. This is a good place to begin. Kubernetes provides the flexibility to create our custom resources (similar to Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource with kind as SchoolOfSre you can! The only requirement is that you have to write a custom resource definition for it. You can also build a custom operator for your custom resource to take certain actions on the resource instance. You can check here for more information.","title":"Advanced topics"},{"location":"level102/continuous_integration_and_continuous_delivery/cicd_brief_history/","text":"The Evolution of the CI/CD Traditional development approaches have been around for a very long time. The waterfall model has been widely used in both large and small projects and has been successful. Despite the success, it has a lot of drawbacks like longer cycle times or delivery. While multiple team members are working on the project, the code changes get accumulated and never integrated until the planned build date. The build usually happens on agreed cycles that range from a month to a quarter. This results in several integration issues and build failures as the developers were working on their features in silos. 
It was a nightmare situation for the operations teams/for anyone to deploy the new builds/releases to the production environment because of lack of proper documentation on every change and the configuration requirements. So, to deploy successfully, often it required hot fixes and immediate patches. Another big challenge was collaboration. It is rare that the developer meets the operation engineers and does not have a full understanding of the production environment. All these challenges have given rise to longer cycle times for the delivery of the code changes. Agile methodology prescribes the delivery of incremental delivery of features in multiple iterations. So, the developers commit their code changes in smaller increments and roll out more frequently. Every code commit triggers a new build, and the integration issues are identified much early. This has improved the build process and thereby reduced the cycle time. This process is known as continuous integration or CI . The big barrier between the developers and the operation teams has been shrunken with the emergence of the trend where organizations are adapting to the DevOps and SRE disciplines. The collaboration between the developers and the operation teams is improved. Moreover, the use of the same tools and processes by both the teams has improved coordination and avoided conflicting understanding of the process. One of the main drivers in this regard is the continuous delivery (CD) process that ensures the incremental deployment of smaller changes. There are multiple pre-production environments also called the staging environments before deploying to production environments. CI/CD and DevOps The term DevOps represents the combination of Development (Dev) and Operations (Ops) teams. That is bringing developers and operations teams together for more collaboration. 
The development team often wants to introduce more features and more changes while the operation teams are more focused on the stability of the application in production. A change is always taken as a threat by the operations team as it can shake the stability of the environment. DevOps is termed as a culture that introduces the processes to reduce the barriers between developers and operations. The collaboration between Dev and Ops allows better follow-up of end-to-end production deployments and more frequent deployments. So, thus CI/CD is a key element in the DevOps processes.","title":"Brief History"},{"location":"level102/continuous_integration_and_continuous_delivery/cicd_brief_history/#the-evolution-of-the-cicd","text":"Traditional development approaches have been around for a very long time. The waterfall model has been widely used in both large and small projects and has been successful. Despite the success, it has a lot of drawbacks like longer cycle times or delivery. While multiple team members are working on the project, the code changes get accumulated and never integrated until the planned build date. The build usually happens on agreed cycles that range from a month to a quarter. This results in several integration issues and build failures as the developers were working on their features in silos. It was a nightmare situation for the operations teams/for anyone to deploy the new builds/releases to the production environment because of lack of proper documentation on every change and the configuration requirements. So, to deploy successfully, often it required hot fixes and immediate patches. Another big challenge was collaboration. It is rare that the developer meets the operation engineers and does not have a full understanding of the production environment. All these challenges have given rise to longer cycle times for the delivery of the code changes. 
Agile methodology prescribes the delivery of incremental delivery of features in multiple iterations. So, the developers commit their code changes in smaller increments and roll out more frequently. Every code commit triggers a new build, and the integration issues are identified much early. This has improved the build process and thereby reduced the cycle time. This process is known as continuous integration or CI . The big barrier between the developers and the operation teams has been shrunken with the emergence of the trend where organizations are adapting to the DevOps and SRE disciplines. The collaboration between the developers and the operation teams is improved. Moreover, the use of the same tools and processes by both the teams has improved coordination and avoided conflicting understanding of the process. One of the main drivers in this regard is the continuous delivery (CD) process that ensures the incremental deployment of smaller changes. There are multiple pre-production environments also called the staging environments before deploying to production environments.","title":"The Evolution of the CI/CD"},{"location":"level102/continuous_integration_and_continuous_delivery/cicd_brief_history/#cicd-and-devops","text":"The term DevOps represents the combination of Development (Dev) and Operations (Ops) teams. That is bringing developers and operations teams together for more collaboration. The development team often wants to introduce more features and more changes while the operation teams are more focused on the stability of the application in production. A change is always taken as a threat by the operations team as it can shake the stability of the environment. DevOps is termed as a culture that introduces the processes to reduce the barriers between developers and operations. The collaboration between Dev and Ops allows better follow-up of end-to-end production deployments and more frequent deployments. 
So, thus CI/CD is a key element in the DevOps processes.","title":"CI/CD and DevOps"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/","text":"Applications in SRE Role The Monitoring, Automation and Eliminating the toil are some of the core pillars of the SRE discipline. As an SRE, you may require spending about 50% of time on automating the repetitive tasks and to eliminate the toil. CI/CD pipelines are one of the crucial tools for the SRE. They help in delivering the quality application with the smaller and regular and more frequent builds. Additionally, the CI/CD metrics such as Deployment time, Success rate, Cycle time and Automated test success rate etc. are the key things to watch to improve the quality of the product thus improving the reliability of the applications. Infrastructure-as-code is one of the standard practices followed in SRE for automating the repetitive configuration tasks. Every configuration is maintained as code, so it can be deployed using CI/CD pipelines. It is important to deliver the configuration changes to the production environments through CI/CD pipelines to maintain the versioning, consistency of the changes across environments and to avoid manual errors. Often, as an SRE, you are required to review the application CI/CD pipelines and recommend additional stages such as static code analysis and the security and privacy checks in the code to improve the security and reliability of the product. Conclusion In this chapter, we have studied the CI/CD pipelines with brief history on the challenges with the traditional build practices. We have also looked at how the CI/CD pipelines augments the SRE discipline. Use of CI/CD pipelines in software development life cycle is a modern approach in the SRE realm that helps achieve greater efficiency. We have also performed a hands-on lab activity on creating the CI/CD pipeline using Jenkins. 
References Continuous Integration (martinfowler.com) CI/CD for microservices - Azure Architecture Center | Microsoft Docs SREFoundationBlueprint_2 (devopsinstitute.com) Jenkins User Documentation","title":"Conclusion"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/#applications-in-sre-role","text":"Monitoring, automation and eliminating toil are some of the core pillars of the SRE discipline. As an SRE, you may need to spend about 50% of your time automating repetitive tasks to eliminate toil. CI/CD pipelines are one of the crucial tools for the SRE. They help deliver a quality application through smaller, regular and more frequent builds. Additionally, CI/CD metrics such as deployment time, success rate, cycle time and automated test success rate are the key things to watch to improve the quality of the product, and thereby the reliability of the applications. Infrastructure-as-code is one of the standard practices followed in SRE for automating repetitive configuration tasks. Every configuration is maintained as code, so it can be deployed using CI/CD pipelines. It is important to deliver configuration changes to the production environments through CI/CD pipelines to maintain versioning, keep changes consistent across environments and avoid manual errors. Often, as an SRE, you are required to review the application CI/CD pipelines and recommend additional stages such as static code analysis and security and privacy checks to improve the security and reliability of the product.","title":"Applications in SRE Role"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/#conclusion","text":"In this chapter, we have studied CI/CD pipelines along with a brief history of the challenges with traditional build practices. We have also looked at how CI/CD pipelines augment the SRE discipline. 
Use of CI/CD pipelines in software development life cycle is a modern approach in the SRE realm that helps achieve greater efficiency. We have also performed a hands-on lab activity on creating the CI/CD pipeline using Jenkins.","title":"Conclusion"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/#references","text":"Continuous Integration(martinfowler.com) CI/CD for microservices - Azure Architecture Center | Microsoft Docs SREFoundationBlueprint_2 (devopsinstitute.com) Jenkins User Documentation","title":"References"},{"location":"level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline/","text":"Continuous Delivery means deploying the application builds more frequently in the non-production environments such as SIT, UAT, INT and performing the integration tests and the acceptance tests automatically. In the CD, the tests are performed on the integrated application instead of the single microservice in the cases of microservice based application. The tests must include all the functional tests and the acceptance tests that may contain the UI tests. The build must be immutable in nature, that is the same package must be deployed across all the environments including the Production. The deployment to the Production is often manual after performing additional acceptance tests such as performance tests etc. So, the fully automated deployment to the Production environments is called the Continuous Deployment (whereas CD \u2013 Continuous delivery doesn\u2019t automatically deploy to Production). The continuous deployment must have a feature toggle so that a feature can be toggled off without the need for redeploying the code. Often, the deployment involves more than one production environment, for example in blue-green environments the application is first deployed to the blue environment and then to the green environment so that the downtime is not required. 
Fig 3: Continuous Delivery Pipeline","title":"Continuous Delivery and Deployment"},{"location":"level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline/","text":"CI is a software development practice where members of a team integrate their work frequently. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Continuous integration requires that all the code changes be maintained in a single code repository where all the members can push the changes to their feature branches regularly. The code changes must be quickly integrated with the rest of the code and automated builds should happen and feedback to the member to resolve them early. There should be a CI server where it can trigger a build as soon as the code is pushed by a member. The build typically involves compiling the code and transforming it to an executable file such as JARs or DLLs etc. called packaging. It must also perform unit tests with code coverage. Optionally, the build process can have additional stages such as static code analysis and vulnerability checks etc. Jenkins , Bamboo , Travis CI , GitLab , Azure DevOps etc. are the few popular CI tools. These tools provide various plugins and integration such as ant , maven etc. for building and packaging, and Junit, selenium etc. are for performing the unit tests. SonarQube can be used for static code analysis and code security. Fig 1: Continuous Integration Pipeline Fig 2: Continuous Integration Process","title":"Continuous Integration"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/","text":"Prerequisites Software Development and Maintenance Git Docker What to expect from this course? In this course, you will learn the basics of CI/CD and how it helps drive the SRE discipline in an organization. It also discusses the various DevOps tools in CI/CD practice and a hands-on lab session on Jenkins based pipeline. 
Finally, it will conclude by explaining the role in the growing SRE philosophy. What is not covered under this course? The course does not cover DevOps elements such as Infrastructure as a code, continuous monitoring applications and infrastructure comprehensively. Table of Contents What is CI/CD? Brief History to CI/CD and DevOps Continuous Integration Continuous Delivery and Deployment Jenkins based CI/CD pipeline - Hands-on Conclusion","title":"Introduction"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#prerequisites","text":"Software Development and Maintenance Git Docker","title":"Prerequisites"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#what-to-expect-from-this-course","text":"In this course, you will learn the basics of CI/CD and how it helps drive the SRE discipline in an organization. It also discusses the various DevOps tools in CI/CD practice and a hands-on lab session on Jenkins based pipeline. Finally, it will conclude by explaining the role in the growing SRE philosophy.","title":"What to expect from this course?"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#what-is-not-covered-under-this-course","text":"The course does not cover DevOps elements such as Infrastructure as a code, continuous monitoring applications and infrastructure comprehensively.","title":"What is not covered under this course?"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#table-of-contents","text":"What is CI/CD? 
Brief History to CI/CD and DevOps Continuous Integration Continuous Delivery and Deployment Jenkins based CI/CD pipeline - Hands-on Conclusion","title":"Table of Contents"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction_to_cicd/","text":"Continuous Integration and Continuous Delivery, also known as CI/CD, is a set of processes that helps in faster integration of software code changes and deployment to the end user in a reliable manner. The more frequent integrations and deployments helps reduce the software development lifecycle. There are three practices in CI/CD: Continuous Integration Continuous Delivery Continuous Deployment Let\u2019s look in detail at each of these in the coming sections. The Benefits of CI/CD Significant reduction in integration problems. Teams can develop cohesive software more rapidly. Improved Collaboration between developers and operation teams can reduce the production integration issues. Faster delivery of new features with less friction Better debugging the production issues and fixing them in the next release/patch.","title":"What is CI/CD?"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction_to_cicd/#the-benefits-of-cicd","text":"Significant reduction in integration problems. Teams can develop cohesive software more rapidly. Improved Collaboration between developers and operation teams can reduce the production integration issues. Faster delivery of new features with less friction Better debugging the production issues and fixing them in the next release/patch.","title":"The Benefits of CI/CD"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/","text":"Jenkins based CI/CD Pipeline Jenkins is an open-source continuous integration server for orchestrating the CI/CD pipelines. It supports integration with several components, infrastructure such as git, cloud etc. 
that helps in complete software development life cycle. In this hands-on lab, let us: * Create a build pipeline (CI) for a simple java application. * Adding Test stage to build pipeline This hands-on is based on the Jenkins running on docker on your local workstation, designed for Windows OS. For Linux OS, please follow the demo Note: The hands-on lab is designed with Jenkins on the docker. However, the steps are applicable for the direct docker installation on your windows workstation as well. Installing Git, Docker and Jenkins: Install git command line tool on your workstation. (Follow this to install Git Locally\u00b7) Docker Desktop for windows is installed on the workstation. Follow the instructions to install docker. Ensure that your Docker for Windows installation is configured to run Linux Containers rather than Windows Containers. See the Docker documentation for instructions to switch to Linux containers. Refer this to run and setup the Jenkins on docker. Configure Jenkins with initial steps such as create an admin user etc. Follow Setup wizard. If you have installed the Jenkins on your local workstation, make sure the maven tool is installed. Follow this to installl maven. Forking Sample java application: For this hands-on, let us fork a simple java application from the GitHub simple-java-maven-app . 1. Sign up for the GitHub account Join GitHub \u00b7 GitHub . Once signed up, proceed to login . 2. Open the simple-java-maven-app by clicking on this link 3. On the top right corner, click on the \u2018Fork\u2019 to create a copy of the project to your GitHub account. (Refer Fork A Repo ) 4. Once forked, clone this repository to your local workstation. Create Jenkins Project: Login to the Jenkins portal at http://localhost:8080 using the admin account created earlier during Jenkins\u2019s setup. On your first login, the following screen will appear. Click on \u201c Create a Job \u201d. 
Fig 4: Jenkins - Create a Job On the next screen, type simple-java-pipeline in the Enter an Item Name field. Select Pipeline from the list of items and click OK . Fig 5: Jenkins - Create Pipeline Click the Pipeline tab at the top of the page to scroll down to the Pipeline section. From the Definition field, choose the Pipeline script from SCM option. This option instructs Jenkins to obtain your Pipeline from Source Control Management (SCM), which will be your locally cloned Git repository. From the SCM field, choose Git . In the Repository URL field, specify the directory path of your locally cloned repository from the Forking Sample Java application section above. Screen looks like below after entering the details. Fig 6: Jenkins - Pipeline Configuration Create Build pipeline using the Jenkinsfile: Jenkinsfile is a script file containing the pipeline configuration and the stages and other instructions to Jenkins to create a pipeline from the file. This file will be saved at the root of the code repository. 1. Using your favorite text editor or IDE, create and save a new text file with the name Jenkinsfile at the root of your local simple-java-maven-app Git repository. 2. Copy the following declarative pipeline code and paste it into the empty Jenkinsfile . pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any as shown below so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } In the above Jenkinsfile: * We specified an agent where the pipeline should run. 'docker\u2019 in the agent section indicates to run a new docker container with the specified image. 
* In the stages section, we can define multiple steps as different stages. Here, we have a stage called \u2018Build\u2019, with the maven command for building the java application. Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Add initial Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now . Fig 7: Jenkins - Building the Pipeline Notice the build running under the Build History menu. Click on the build number and it shows the stages. Fig 8: Jenkins - View Running Builds We have successfully created a build pipeline with single stage and ran it. We can check the logs by clicking on the Console Output menu. Additional stages in the build pipeline: In the previous section, we have created the pipeline with a single stage. Typically, your CI pipeline contains multiple stages such as Build, Test and other optional stages such Code scanning etc. In this section, let us add a Test stage to the build pipeline and run. Go back to your text editor/IDE and open Jenkinsfile and the Test stage shown below. stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } The Jenkinsfile looks like below after adding the Test stage. pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } } } Here the stage \u2018Test\u2019 is added which runs the maven command test. The post -> always section ensures that this step is executed always after the steps are completed. The test report is available through Jenkins\u2019s interface. 
Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages {\u2026 } } Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Test stage is added to Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now. Notice the Build and Test stages are showing in the Build screen. Fig 9: Jenkins - Viewing the Running Builds with Test stage Included We have now successfully created CI pipeline with two stages: Build and Test stages.","title":"CI/CD Pipeline - Hands-on"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#jenkins-based-cicd-pipeline","text":"Jenkins is an open-source continuous integration server for orchestrating the CI/CD pipelines. It supports integration with several components, infrastructure such as git, cloud etc. that helps in complete software development life cycle. In this hands-on lab, let us: * Create a build pipeline (CI) for a simple java application. * Adding Test stage to build pipeline This hands-on is based on the Jenkins running on docker on your local workstation, designed for Windows OS. For Linux OS, please follow the demo Note: The hands-on lab is designed with Jenkins on the docker. However, the steps are applicable for the direct docker installation on your windows workstation as well.","title":"Jenkins based CI/CD Pipeline"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#installing-git-docker-and-jenkins","text":"Install git command line tool on your workstation. 
(Follow this to install Git Locally\u00b7) Docker Desktop for windows is installed on the workstation. Follow the instructions to install docker. Ensure that your Docker for Windows installation is configured to run Linux Containers rather than Windows Containers. See the Docker documentation for instructions to switch to Linux containers. Refer this to run and setup the Jenkins on docker. Configure Jenkins with initial steps such as create an admin user etc. Follow Setup wizard. If you have installed the Jenkins on your local workstation, make sure the maven tool is installed. Follow this to installl maven.","title":"Installing Git, Docker and Jenkins:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#forking-sample-java-application","text":"For this hands-on, let us fork a simple java application from the GitHub simple-java-maven-app . 1. Sign up for the GitHub account Join GitHub \u00b7 GitHub . Once signed up, proceed to login . 2. Open the simple-java-maven-app by clicking on this link 3. On the top right corner, click on the \u2018Fork\u2019 to create a copy of the project to your GitHub account. (Refer Fork A Repo ) 4. Once forked, clone this repository to your local workstation.","title":"Forking Sample java application:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#create-jenkins-project","text":"Login to the Jenkins portal at http://localhost:8080 using the admin account created earlier during Jenkins\u2019s setup. On your first login, the following screen will appear. Click on \u201c Create a Job \u201d. Fig 4: Jenkins - Create a Job On the next screen, type simple-java-pipeline in the Enter an Item Name field. Select Pipeline from the list of items and click OK . Fig 5: Jenkins - Create Pipeline Click the Pipeline tab at the top of the page to scroll down to the Pipeline section. 
From the Definition field, choose the Pipeline script from SCM option. This option instructs Jenkins to obtain your Pipeline from Source Control Management (SCM), which will be your locally cloned Git repository. From the SCM field, choose Git . In the Repository URL field, specify the directory path of your locally cloned repository from the Forking Sample Java application section above. Screen looks like below after entering the details. Fig 6: Jenkins - Pipeline Configuration","title":"Create Jenkins Project:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#create-build-pipeline-using-the-jenkinsfile","text":"Jenkinsfile is a script file containing the pipeline configuration and the stages and other instructions to Jenkins to create a pipeline from the file. This file will be saved at the root of the code repository. 1. Using your favorite text editor or IDE, create and save a new text file with the name Jenkinsfile at the root of your local simple-java-maven-app Git repository. 2. Copy the following declarative pipeline code and paste it into the empty Jenkinsfile . pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any as shown below so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } In the above Jenkinsfile: * We specified an agent where the pipeline should run. 'docker\u2019 in the agent section indicates to run a new docker container with the specified image. * In the stages section, we can define multiple steps as different stages. 
Here, we have a stage called \u2018Build\u2019, with the maven command for building the java application. Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Add initial Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now . Fig 7: Jenkins - Building the Pipeline Notice the build running under the Build History menu. Click on the build number and it shows the stages. Fig 8: Jenkins - View Running Builds We have successfully created a build pipeline with single stage and ran it. We can check the logs by clicking on the Console Output menu.","title":"Create Build pipeline using the Jenkinsfile:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#additional-stages-in-the-build-pipeline","text":"In the previous section, we have created the pipeline with a single stage. Typically, your CI pipeline contains multiple stages such as Build, Test and other optional stages such Code scanning etc. In this section, let us add a Test stage to the build pipeline and run. Go back to your text editor/IDE and open Jenkinsfile and the Test stage shown below. stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } The Jenkinsfile looks like below after adding the Test stage. pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } } } Here the stage \u2018Test\u2019 is added which runs the maven command test. The post -> always section ensures that this step is executed always after the steps are completed. 
The test report is available through Jenkins\u2019s interface. Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages {\u2026 } } Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Test stage is added to Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now. Notice the Build and Test stages are showing in the Build screen. Fig 9: Jenkins - Viewing the Running Builds with Test stage Included We have now successfully created CI pipeline with two stages: Build and Test stages.","title":"Additional stages in the build pipeline:"},{"location":"level102/linux_intermediate/archiving_backup/","text":"Archiving and Backup Introduction One of the things SREs make sure of is the services are up all the time (at least 99.99% of the time), but the amount of data generated at each server running those services are immense. This data could be logs, user data in the database, or any other kind of metadata. Hence we need to compress, archive, rotate, and Backup the data in a timely manner for data safety and to make sure we don\u2019t run out of space. Archiving We usually archive the data that are no longer needed but are kept mostly for compliance purposes. This helps in storing the data into compressed format saving a lot of space. Below section is to familiarize with the archiving tools and commands. gzip gzip is a program used to compress one or more files, it replaces the original file with a compressed version of the original file. Here we can see that the messages log file is compressed to almost one-fifth of the original size and replaced with messages.gz. 
We can uncompress this file using the gunzip command. tar The tar program is a tool for archiving files and directories into a single file (often called a tarball). This tool is usually used to prepare archives of files before they are transferred to a long-term backup server. tar doesn\u2019t replace the existing files and folders but creates a new file, conventionally with the extension .tar. It provides a lot of flags to choose from for archiving Flags Description -c Creates an archive -x Extracts the archive -f Uses the given archive filename -t Lists the files in an archive -u Appends files that are newer than their copy in the archive -v Displays verbose information -A Appends other archives to an archive -z Compresses the archive using gzip -j Compresses the archive using bzip2 -W Verifies an archive after writing it -r Appends files or directories to an existing .tar file Create an archive with files and folder Flag c is used for creating the archive, and f specifies the archive filename. Listing files in the archive We can use flag t to list what an archive contains. Extract files from the archive We can use flag x to extract the archive. Backup Backup is the process of copying/duplicating existing data. This backup can be used to restore the dataset in case of data loss. Data backup also becomes critical when the data is not needed in the day-to-day job but may be referred to as a source of truth or for compliance reasons in the future. Different types of backup are: Incremental backup Incremental backup is the backup of data changed since the last backup of any kind; this reduces data redundancy and improves storage efficiency. Differential backup Sometimes our data keeps changing. In that case we back up the changes that occurred since the last full backup; this is called differential backup. Network backup Network backup refers to sending data over the network from the source to a backup destination in a client-server model. This backup destination can be centralized or decentralized. 
Decentralized backups are useful for disaster recovery scenarios. rsync is one of the linux command which sync up file from one server to the destination server over the network. The syntax for rsync goes like rsync \\[options\\] . We can locate the file on the path specified after : (colon) in the \u201c destination\u201d . If nothing is specified the default path is the home directory of the user used for backup. /home/azureuser in this case. You can always look for different options for rsync using the man rsync command. Cloud Backup There are various third parties which provide the backup of data to the cloud. These cloud backups are much more reliable than stored backups on local machines or any server without RAID configuration as these providers manage redundancy of data, data recovery along with the data security. Two most widely used cloud backup options are Azure backup (from Microsoft) and Amazon Glacier backup (from AWS).","title":"Archiving and Backup"},{"location":"level102/linux_intermediate/archiving_backup/#archiving-and-backup","text":"","title":"Archiving and Backup"},{"location":"level102/linux_intermediate/archiving_backup/#introduction","text":"One of the things SREs make sure of is the services are up all the time (at least 99.99% of the time), but the amount of data generated at each server running those services are immense. This data could be logs, user data in the database, or any other kind of metadata. Hence we need to compress, archive, rotate, and Backup the data in a timely manner for data safety and to make sure we don\u2019t run out of space.","title":"Introduction"},{"location":"level102/linux_intermediate/archiving_backup/#archiving","text":"We usually archive the data that are no longer needed but are kept mostly for compliance purposes. This helps in storing the data into compressed format saving a lot of space. 
The section below introduces the archiving tools and commands.","title":"Archiving"},{"location":"level102/linux_intermediate/archiving_backup/#gzip","text":"gzip is a program used to compress one or more files; it replaces the original file with a compressed version of it. Here we can see that the messages log file is compressed to almost one-fifth of the original size and replaced with messages.gz. We can uncompress this file using the gunzip command.","title":"gzip"},{"location":"level102/linux_intermediate/archiving_backup/#tar","text":"The tar program is a tool for archiving files and directories into a single file (often called a tarball). This tool is usually used to prepare archives of files before they are transferred to a long-term backup server. tar doesn\u2019t replace the existing files and folders but creates a new file, conventionally with the extension .tar. It provides a lot of flags to choose from for archiving Flags Description -c Creates an archive -x Extracts the archive -f Uses the given archive filename -t Lists the files in an archive -u Appends files that are newer than their copy in the archive -v Displays verbose information -A Appends other archives to an archive -z Compresses the archive using gzip -j Compresses the archive using bzip2 -W Verifies an archive after writing it -r Appends files or directories to an existing .tar file","title":"tar"},{"location":"level102/linux_intermediate/archiving_backup/#create-an-archive-with-files-and-folder","text":"Flag c is used for creating the archive, and f specifies the archive filename.","title":"Create an archive with files and folder"},{"location":"level102/linux_intermediate/archiving_backup/#listing-files-in-the-archive","text":"We can use flag t to list what an archive contains.","title":"Listing files in the archive"},{"location":"level102/linux_intermediate/archiving_backup/#extract-files-from-the-archive","text":"We can use flag x to extract the archive.","title":"Extract files from the 
archive"},{"location":"level102/linux_intermediate/archiving_backup/#backup","text":"Backup is the process of copying/duplicating existing data. This backup can be used to restore the dataset in case of data loss. Data backup also becomes critical when the data is not needed in the day-to-day job but may be referred to as a source of truth or for compliance reasons in the future. Different types of backup are:","title":"Backup"},{"location":"level102/linux_intermediate/archiving_backup/#incremental-backup","text":"Incremental backup is the backup of data changed since the last backup of any kind; this reduces data redundancy and improves storage efficiency.","title":"Incremental backup"},{"location":"level102/linux_intermediate/archiving_backup/#differential-backup","text":"Sometimes our data keeps changing. In that case we back up the changes that occurred since the last full backup; this is called differential backup.","title":"Differential backup"},{"location":"level102/linux_intermediate/archiving_backup/#network-backup","text":"Network backup refers to sending data over the network from the source to a backup destination in a client-server model. This backup destination can be centralized or decentralized. Decentralized backups are useful for disaster recovery scenarios. rsync is a Linux command that syncs files from one server to a destination server over the network. The syntax for rsync goes like rsync \\[options\\] . We can locate the file on the path specified after : (colon) in the \u201c destination\u201d . If nothing is specified, the default path is the home directory of the user used for backup, /home/azureuser in this case. You can always look up the different options for rsync using the man rsync command.","title":"Network backup"},{"location":"level102/linux_intermediate/archiving_backup/#cloud-backup","text":"There are various third parties which provide the backup of data to the cloud. 
These cloud backups are much more reliable than backups stored on local machines or on any server without a RAID configuration, as these providers manage data redundancy and data recovery along with data security. Two of the most widely used cloud backup options are Azure backup (from Microsoft) and Amazon Glacier backup (from AWS).","title":"Cloud Backup"},{"location":"level102/linux_intermediate/bashscripting/","text":"Bash Scripting Introduction As an SRE, the Linux system sits at the core of our day to day work, and so does bash scripting. Bash is a scripting language that is run by the Linux Bash interpreter. Until now we have covered a lot of features mostly on the command line; now we will use this command line as an interpreter to write programs that will ease our day to day job as an SRE. Writing the first bash script: We will start with a simple program; we will use Vim as the editor during the whole journey. #!/bin/bash # This is my first bash script # Line starting with # is commented echo \"Hello world!\" The first line of the script, starting with \u201c#!\u201d, is called the shebang. It tells the system which interpreter to use while executing the script. Any line starting with \u201c#\u201d (other than #!) is a comment and is ignored by the interpreter while executing the script. Line 6 shows the \u201cecho\u201d command that we would be running. We will save this script as \u201cfirstscript.sh\u201d and make the script executable using chmod . Next thing is to run the script with the explicit path. We can see the desired \u201cHello world!\u201d as output. Taking user input and working with variables: Taking standard input using the read command and working with variables in bash.
#!/bin/bash #We will take standard input #Will list all files at the path #We will concatenate variable and string echo \"Enter the path\" read path echo \"How deep in directory you want to go:\" read depth echo \"All files at path \" $path du -d $depth -all -h $path We read the path into the variable \u201cpath\u201d and the depth into the variable \u201cdepth\u201d to list files and directories up to that depth. We concatenated strings with variables. We always use $ (dollar-sign) to reference the value a variable contains. We pass these variables to the du command to list out all the files and directories in that path up to the desired depth. Exit status: Every command and script, when it finishes executing, returns an integer in the range from 0 to 255 to the system; this is called the exit status. \u201c0\u201d denotes success of the command, while a non-zero return code usually indicates some kind of error. We use the special shell variable $? to get the exit status of the last executed script or command. Command line arguments and understanding If \u2026 else branching: Another way to pass some values to the script is using command line arguments. Command line arguments in bash are accessed by $ followed by the index. The 0th index refers to the file itself, $1 to the first argument and so on. We use $# to check the count of arguments passed to the script. Decision making is an integral part of any programming language, and to tackle different conditions we use if \u2026 else statements or some nested variant of them. The below script uses multiple concepts in one script. The aim of the script is to get some properties of the file. Line 4 to 7 is the standard example of an \"if statement\" in bash. Syntax is as explained below: if [ condition ]; then if_block_to_execute else else_block_to_execute fi fi is to close the if \u2026 else block. We are checking whether the count of arguments ($#) is equal to 1 or not.
If not we prompt for only one argument and exit the script with status code 1(not a success). One or more if statements can exist without else statement but vice versa doesn\u2019t make any sense. Operator -ne is used to compare two integers, read as \u201cinteger1 not equal to integer 2\u201d. Other comparison operators are: Operations Description num1 -eq num2 check if 1st number is equal to 2nd number num1 -ge num2 checks if 1st number is greater than or equal to 2nd number num1 -gt num2 checks if 1st number is greater than 2nd number num1 -le num2 checks if 1st number is less than or equal to 2nd number num1 -lt num2 checks if 1st number is less than 2nd number #!/bin/bash # This script evaluate the status of a file if [ $# -ne 1 ]; then echo \"Please pass one file name as argument\" exit 1 fi FILE=$1 if [ -e \"$FILE\" ]; then if [ -f \"$FILE\" ]; then echo \"$FILE is a regular file.\" fi if [ -d \"$FILE\" ]; then echo \"$FILE is a directory.\" fi if [ -r \"$FILE\" ]; then echo \"$FILE is readable.\" fi if [ -w \"$FILE\" ]; then echo \"$FILE is writable.\" fi if [ -x \"$FILE\" ]; then echo \"$FILE is executable/searchable.\" fi else echo \"$FILE does not exist\" exit 2 fi exit 0 There are lots of file expressions to evaluate file,like in bash script \u201c-e\u201d in line 10 returns true if the file passed as argument exist, false otherwise. Below are the some widely used file expressions: File Operations Description -e file File exists -d file File exists and is directory -f file File exists and is regular file -L file File exists and is symbolic link -r file File exists and has readable permission -w file File exists and has writable permission -x file File exists and has executable permission -s file File exists and size is greater than zero -S file File exists and is a network socket. Exit status is 2 when the file is not found. And if the file is found it prints out the properties it holds with exit status 0(success). Looping over to do a repeated task. 
We often come across tasks that are mostly repetitive; looping helps us code those repetitive tasks in a more formal manner. There are different types of loop statements we can use in bash: Loop Syntax while while [ expression ] do [ while_block_to_execute ] done for for variable in 1 2 3 .. n do [ for_block_to_execute ] done until until [ expression ] do [ until_block_to_execute ] done #!/bin/bash #Script to monitor the server hosts=`cat host_list` while true do for i in $hosts do h=\"$i\" ping -c 1 -q \"$h\" &>/dev/null if [ $? -eq 0 ] then echo `date` \"server $h alive\" else echo `date` \"server $h is dead\" fi done sleep 60 done Monitoring a server is an important part of being an SRE. The file \u201chost_list\u201d contains the list of hosts which we want to monitor. We used an infinite \u201cwhile\u201d loop that sleeps for 60 seconds after each pass. For each host in host_list we ping that host and check its exit status to see whether the ping was successful; if it was, we report the server as alive, otherwise as dead. The output of the script shows it is running every minute with the timestamp. Function Developers always try to build their applications/programs in a modular fashion so that they don\u2019t have to write the same code every time and everywhere to carry out similar tasks. Functions help us achieve this. We usually call functions with some arguments and expect a result based on those arguments. We will try to automate the backup process we discussed in the earlier section using the below script, and also get familiar with some more concepts like string comparison, functions and logical AND and OR operations. In the below code \u201clog_backup\u201d is a function which won\u2019t be executed until it is called. Line 37 will be executed first, where we check the number of arguments passed to the script. There are many logical operators like AND, OR, XOR etc. Logical Operator Symbol AND && OR || NOT !
Passing the wrong arguments to the script \u201cbackup.sh\u201d will prompt for correct usage. We have to pass whether we want an incremental backup or a full backup of the directory, along with the path of the directory we want to back up. If we want an incremental backup, we pass an additional argument: a meta file which is used to store information about the previously backed up files (a metafile usually has the .snar extension). #!/bin/bash #Scripts to take incremental and full backup backup_dir=\"/mnt/backup/\" time_stamp=\"`date +%d-%m-%Y-%Hh-%Mm-%Ss`\" log_backup(){ if [ $# -lt 2 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path]\" exit 1; fi if [ $1 == \"incremental\" ]; then if [ $# -ne 3 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path] [meta_file]\" exit 3; fi tar --create --listed-incremental=$3 --verbose --verbose --file=\"${backup_dir}incremental-${time_stamp}.tar\" $2 if [ $? -eq 0 ]; then echo \"Incremental backup successful at '${backup_dir}incremental-${time_stamp}.tar'\" else echo \"Incremental Backup Failure\" fi elif [ $1 == \"full\" ];then tar cf \"${backup_dir}fullbackup-${time_stamp}.tar\" $2 if [ $? -eq 0 ];then echo \"Full backup successful at '${backup_dir}fullbackup-${time_stamp}.tar'\" else echo \"Full Backup Failure\" fi else echo \"Unknown parameter passed\" echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 2; fi } if [ $# -lt 2 ] || [ $# -gt 3 ];then echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 1 elif [ $# -eq 2 ];then log_backup $1 $2 elif [ $# -eq 3 ];then log_backup $1 $2 $3 fi exit 0 Passing all 3 arguments for an incremental backup will take an incremental backup at \u201c/mnt/backup/\u201d, with the timestamp appended to the name of each archive. The arguments passed to the function can be accessed inside it via $ followed by the index: $1 refers to the first argument, $2 to the second and so on, while $0 still refers to the script itself, not the function. We use $# to check the count of arguments passed to the function.
Once we pass the string \u201cincremental\u201d or \u201cfull\u201d it gets compared inside the function and the specific block is executed. Below are some more operations that can be performed over strings. String Operations Description string1 == string2 Returns true if string1 equals string2 otherwise false. string1 != string2 Returns true if string1 is NOT equal to string2 otherwise false. string1 =~ regex Returns true if string1 matches the extended regular expression (supported only inside [[ ]]). -z string Returns true if string length is zero otherwise false. -n string Returns true if string length is non-zero otherwise false.","title":"Bash Scripting"},{"location":"level102/linux_intermediate/bashscripting/#bash-scripting","text":"","title":"Bash Scripting"},{"location":"level102/linux_intermediate/bashscripting/#introduction","text":"As an SRE, the Linux system sits at the core of our day to day work, and so does bash scripting. Bash is a scripting language that is run by the Linux Bash interpreter. Until now we have covered a lot of features mostly on the command line; now we will use this command line as an interpreter to write programs that will ease our day to day job as an SRE.","title":"Introduction"},{"location":"level102/linux_intermediate/bashscripting/#writing-the-first-bash-script","text":"We will start with a simple program; we will use Vim as the editor during the whole journey. #!/bin/bash # This is my first bash script # Line starting with # is commented echo \"Hello world!\" The first line of the script, starting with \u201c#!\u201d, is called the shebang. It tells the system which interpreter to use while executing the script. Any line starting with \u201c#\u201d (other than #!) is a comment and is ignored by the interpreter while executing the script. Line 6 shows the \u201cecho\u201d command that we would be running. We will save this script as \u201cfirstscript.sh\u201d and make the script executable using chmod .
Next thing is to run the script with the explicit path. We can see the desired \u201cHello World!\u201d as output.","title":"Writing the first bash script:"},{"location":"level102/linux_intermediate/bashscripting/#taking-user-input-and-working-with-variables","text":"Taking standard input using the read command and working with variables in bash. #!/bin/bash #We will take standard input #Will list all files at the path #We will concate variable and string echo \"Enter the path\" read path echo \"How deep in directory you want to go:\" read depth echo \"All files at path \" $path du -d $depth -all -h $path We are reading path in variable \u201c path \u201d and variable \u201c depth \u201d to list files and directories up to that depth. We concatenated strings with variables. We always use $ (dollar-sign) to reference the value it contains. We pass these variables to the du command to list out all the files and directories in that path upto the desired depth.","title":"Taking user input and working with variables:"},{"location":"level102/linux_intermediate/bashscripting/#exit-status","text":"Every command and script when it completes executing, returns an integer in the range from 0 to 255 to the system, this is called exit status. \u201c0\u201d denotes success of the command while non-zero return code usually indicates various kinds of errors. We use $? special shell variable to get exit status of the last executed script or command.","title":"Exit status:"},{"location":"level102/linux_intermediate/bashscripting/#command-line-arguments-and-understanding-if-else-branching","text":"Another way to pass some values to the script is using command line arguments. Usually command line arguments in bash are accessed by $ followed by the index. The 0th index refers to the file itself, $1 to the first argument and so on. We use $# to check the count of arguments passed to the script. 
Decision making is an integral part of any programming language, and to tackle different conditions we use if \u2026 else statements or some nested variant of them. The below script uses multiple concepts in one script. The aim of the script is to get some properties of the file. Line 4 to 7 is the standard example of an \"if statement\" in bash. Syntax is as explained below: if [ condition ]; then if_block_to_execute else else_block_to_execute fi fi is to close the if \u2026 else block. We are checking whether the count of arguments ($#) is equal to 1 or not. If not, we prompt for exactly one argument and exit the script with status code 1 (not a success). One or more if statements can exist without an else statement, but an else without an if doesn\u2019t make any sense. Operator -ne is used to compare two integers, read as \u201cinteger1 not equal to integer2\u201d. Other comparison operators are: Operations Description num1 -eq num2 checks if 1st number is equal to 2nd number num1 -ge num2 checks if 1st number is greater than or equal to 2nd number num1 -gt num2 checks if 1st number is greater than 2nd number num1 -le num2 checks if 1st number is less than or equal to 2nd number num1 -lt num2 checks if 1st number is less than 2nd number #!/bin/bash # This script evaluates the status of a file if [ $# -ne 1 ]; then echo \"Please pass one file name as argument\" exit 1 fi FILE=$1 if [ -e \"$FILE\" ]; then if [ -f \"$FILE\" ]; then echo \"$FILE is a regular file.\" fi if [ -d \"$FILE\" ]; then echo \"$FILE is a directory.\" fi if [ -r \"$FILE\" ]; then echo \"$FILE is readable.\" fi if [ -w \"$FILE\" ]; then echo \"$FILE is writable.\" fi if [ -x \"$FILE\" ]; then echo \"$FILE is executable/searchable.\" fi else echo \"$FILE does not exist\" exit 2 fi exit 0 There are lots of file expressions to evaluate a file; for example, \u201c-e\u201d in line 10 of the script returns true if the file passed as argument exists, false otherwise.
Below are the some widely used file expressions: File Operations Description -e file File exists -d file File exists and is directory -f file File exists and is regular file -L file File exists and is symbolic link -r file File exists and has readable permission -w file File exists and has writable permission -x file File exists and has executable permission -s file File exists and size is greater than zero -S file File exists and is a network socket. Exit status is 2 when the file is not found. And if the file is found it prints out the properties it holds with exit status 0(success).","title":"Command line arguments and understanding If \u2026 else branching:"},{"location":"level102/linux_intermediate/bashscripting/#looping-over-to-do-a-repeated-task","text":"We usually come up with tasks that are mostly repetitive, looping helps us to code those repetitive tasks in a more formal manner. There are different types of loop statement we can use in bash: Loop Syntax while while [ expression ] do [ while_block_to_execute ] done for for variable in 1,2,3 .. n do [ for_block_to_execute ] done until until [ expression ] do [ until_block_to_execute ] done #!/bin/bash #Script to monitor the server hosts=`cat host_list` while true do for i in $hosts do h=\"$i\" ping -c 1 -q \"$h\" &>/dev/null if [ $? -eq 0 ] then echo `date` \"server $h alive\" else echo `date` \"server $h is dead\" fi done sleep 60 done Monitoring a server is an important part of being an SRE. The file \u201chost_list\u201d contains the list of host which we want to monitor. We used an infinite \u201cwhile\u201d loop that will sleep every 60seconds. And for each host in the host_list we want to ping that host and check if that ping was successful with its exit status, if it\u2019s successful we say server is live or it\u2019s dead. 
The output of the script shows it is running every minute with the timestamp.","title":"Looping over to do a repeated task."},{"location":"level102/linux_intermediate/bashscripting/#function","text":"Developers always try to build their applications/programs in a modular fashion so that they don\u2019t have to write the same code every time and everywhere to carry out similar tasks. Functions help us achieve this. We usually call functions with some arguments and expect a result based on those arguments. We will try to automate the backup process we discussed in the earlier section using the below script, and also get familiar with some more concepts like string comparison, functions and logical AND and OR operations. In the below code \u201clog_backup\u201d is a function which won\u2019t be executed until it is called. Line 37 will be executed first, where we check the number of arguments passed to the script. There are many logical operators like AND, OR, XOR etc. Logical Operator Symbol AND && OR || NOT ! Passing the wrong arguments to the script \u201cbackup.sh\u201d will prompt for correct usage. We have to pass whether we want an incremental backup or a full backup of the directory, along with the path of the directory we want to back up. If we want an incremental backup, we pass an additional argument: a meta file which is used to store information about the previously backed up files (a metafile usually has the .snar extension). #!/bin/bash #Scripts to take incremental and full backup backup_dir=\"/mnt/backup/\" time_stamp=\"`date +%d-%m-%Y-%Hh-%Mm-%Ss`\" log_backup(){ if [ $# -lt 2 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path]\" exit 1; fi if [ $1 == \"incremental\" ]; then if [ $# -ne 3 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path] [meta_file]\" exit 3; fi tar --create --listed-incremental=$3 --verbose --verbose --file=\"${backup_dir}incremental-${time_stamp}.tar\" $2 if [ $?
-eq 0 ]; then echo \"Incremental backup successful at '${backup_dir}incremental-${time_stamp}.tar'\" else echo \"Incremental Backup Failure\" fi elif [ $1 == \"full\" ];then tar cf \"${backup_dir}fullbackup-${time_stamp}.tar\" $2 if [ $? -eq 0 ];then echo \"Full backup successful at '${backup_dir}fullbackup-${time_stamp}.tar'\" else echo \"Full Backup Failure\" fi else echo \"Unknown parameter passed\" echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 2; fi } if [ $# -lt 2 ] || [ $# -gt 3 ];then echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 1 elif [ $# -eq 2 ];then log_backup $1 $2 elif [ $# -eq 3 ];then log_backup $1 $2 $3 fi exit 0 Passing all 3 arguments for an incremental backup will take an incremental backup at \u201c/mnt/backup/\u201d, with the timestamp appended to the name of each archive. The arguments passed to the function can be accessed inside it via $ followed by the index: $1 refers to the first argument, $2 to the second and so on, while $0 still refers to the script itself, not the function. We use $# to check the count of arguments passed to the function. Once we pass the string \u201cincremental\u201d or \u201cfull\u201d it gets compared inside the function and the specific block is executed. Below are some more operations that can be performed over strings. String Operations Description string1 == string2 Returns true if string1 equals string2 otherwise false. string1 != string2 Returns true if string1 is NOT equal to string2 otherwise false. string1 =~ regex Returns true if string1 matches the extended regular expression (supported only inside [[ ]]). -z string Returns true if string length is zero otherwise false. -n string Returns true if string length is non-zero otherwise false.","title":"Function"},{"location":"level102/linux_intermediate/conclusion/","text":"Conclusion Understanding package management is crucial for an SRE: we always want the right set of software, in compatible versions, working in harmony to drive the big infrastructure and organization.
We also saw how we can configure and use storage drives and how we can have redundancy of data using RAID to avoid the data loss, how data is placed over disk and use of file systems. Archiving and Backup is also a crucial part of being an SRE, It\u2019s our responsibility to keep the data safe and in a more efficient manner. Bash is very useful to automate the day to day toil that an SRE stumbles into. The above walkthrough of bash gives us an idea to get started, but mere reading through it won\u2019t take you much further. I believe \u201ctaking action and practicing the topic\u201d would give you confidence and will help you become a better SRE.","title":"Conclusion"},{"location":"level102/linux_intermediate/conclusion/#conclusion","text":"Understanding package management is very crucial as an SRE, we always want the right set of software with their compatible versions to work in harmony to drive the big infrastructure and organization. We also saw how we can configure and use storage drives and how we can have redundancy of data using RAID to avoid the data loss, how data is placed over disk and use of file systems. Archiving and Backup is also a crucial part of being an SRE, It\u2019s our responsibility to keep the data safe and in a more efficient manner. Bash is very useful to automate the day to day toil that an SRE stumbles into. The above walkthrough of bash gives us an idea to get started, but mere reading through it won\u2019t take you much further. I believe \u201ctaking action and practicing the topic\u201d would give you confidence and will help you become a better SRE.","title":"Conclusion"},{"location":"level102/linux_intermediate/introduction/","text":"Linux-Intermediate Prerequisites Expect to have gone through the School Of SRE Linux Basics . What to expect from this course This course is divided into two sections. 
In the first section we will cover where we left off the Linux Basics, earlier in the School of SRE curriculum, we will deep dive into some of the more advanced linux commands and concepts. In this second section we will discuss how we use Bash scripting in day to day work, automation and toil reduction as an SRE with the help of real life examples of any SRE. What is not covered under this course This course aims to make you familiar with the intersection of Linux commands, shell scripting and how SRE uses it. We would not be covering Linux internals. Lab Environment Setup Install docker on your system. https://docs.docker.com/engine/install/ We would be using RedHat Enterprise Linux (RHEL) 8. We would be running most of the commands in the above docker container. __________________________________________________________________________ Course Content Package Management Package: Dependencies Repository High Level and Low-Level Package management tools Storage Media Listing the mounted storage devices Creating a FileSystem Mounting the device Unmounting the device Making it easier with /etc/fstab file? 
Checking and Repairing FS RAID RAID levels RAID 0 (Striping) RAID 1(Mirroring) RAID 5(Striping with distributed parity) RAID 6(Striping with double parity) RAID 10(RAID 1+0 : Mirroring and Striping) Commands to monitor RAID LVM Archiving and Backup Archiving gzip tar Create an archive with files and folder Listing files in the archive Extract files from the archive Backup Incremental backup Differential backup Network backup Cloud Backup Introduction to Vim Opening a file and using insert mode Saving a file Exiting the VIM editor Bash Scripting Writing the first bash script Taking user input and working with variables Exit status Command line arguments and understanding If \u2026 else branching Looping over to do a repeated task Function Conclusion","title":"Introduction"},{"location":"level102/linux_intermediate/introduction/#linux-intermediate","text":"","title":"Linux-Intermediate"},{"location":"level102/linux_intermediate/introduction/#prerequisites","text":"Expect to have gone through the School Of SRE Linux Basics .","title":"Prerequisites"},{"location":"level102/linux_intermediate/introduction/#what-to-expect-from-this-course","text":"This course is divided into two sections. In the first section we will cover where we left off the Linux Basics, earlier in the School of SRE curriculum, we will deep dive into some of the more advanced linux commands and concepts. In this second section we will discuss how we use Bash scripting in day to day work, automation and toil reduction as an SRE with the help of real life examples of any SRE.","title":"What to expect from this course"},{"location":"level102/linux_intermediate/introduction/#what-is-not-covered-under-this-course","text":"This course aims to make you familiar with the intersection of Linux commands, shell scripting and how SRE uses it. 
We would not be covering Linux internals.","title":"What is not covered under this course"},{"location":"level102/linux_intermediate/introduction/#lab-environment-setup","text":"Install docker on your system. https://docs.docker.com/engine/install/ We would be using RedHat Enterprise Linux (RHEL) 8. We would be running most of the commands in the above docker container. __________________________________________________________________________","title":"Lab Environment Setup"},{"location":"level102/linux_intermediate/introduction/#course-content","text":"Package Management Package: Dependencies Repository High Level and Low-Level Package management tools Storage Media Listing the mounted storage devices Creating a FileSystem Mounting the device Unmounting the device Making it easier with /etc/fstab file? Checking and Repairing FS RAID RAID levels RAID 0 (Striping) RAID 1(Mirroring) RAID 5(Striping with distributed parity) RAID 6(Striping with double parity) RAID 10(RAID 1+0 : Mirroring and Striping) Commands to monitor RAID LVM Archiving and Backup Archiving gzip tar Create an archive with files and folder Listing files in the archive Extract files from the archive Backup Incremental backup Differential backup Network backup Cloud Backup Introduction to Vim Opening a file and using insert mode Saving a file Exiting the VIM editor Bash Scripting Writing the first bash script Taking user input and working with variables Exit status Command line arguments and understanding If \u2026 else branching Looping over to do a repeated task Function Conclusion","title":"Course Content"},{"location":"level102/linux_intermediate/introvim/","text":"Introduction to Vim Introduction As an SRE we several times log into into the servers and make changes to the config file, edit and modify scripts and the editor which comes handy and available in almost all linux distribution is Vim. Vim is an open-source and free command line editor, widely accepted and used. 
We will see some basics of how to use vim for creating and editing files. This knowledge will help us in understanding the next section, Scripting. Opening a file and using insert mode We use the command vim filename to open a file filename . The terminal will open an editor, but if you start typing right away, nothing is written. That is because we are not in \"INSERT\" mode in vim. Press i to get into insert mode and start writing. You will see \u201cINSERT\u201d on the bottom left after pressing \u201ci\u201d. You can use the \u201cESC\u201d key to get back to normal mode. Saving a file After you insert your text in INSERT mode, press the ESC (escape) key on your keyboard to get out of it. Press : (colon, i.e. Shift + ;), then press w and hit Enter; the text you entered will get written to the file. Exiting the VIM editor Exiting vim can be really challenging for beginners. There are various ways to exit Vim, such as exiting without saving the work or exiting after saving the work. Try the below commands after exiting insert mode and pressing : (colon). Vim Commands Description :q Exit the file but won\u2019t exit if file has unsaved changes :wq Write(save) and exit the file. :q! Exit without saving the changes. These are the basics we will need for bash scripting in the next section. You can always visit the tutorial to learn more. For quick practice of vim commands visit: https://www.openvim.com/","title":"Introduction to Vim"},{"location":"level102/linux_intermediate/introvim/#introduction-to-vim","text":"","title":"Introduction to Vim"},{"location":"level102/linux_intermediate/introvim/#introduction","text":"As SREs we frequently log into servers to change config files and to edit and modify scripts, and the editor that comes in handy and is available in almost all Linux distributions is Vim. Vim is a free and open-source command line editor, widely accepted and used. We will see some basics of how to use vim for creating and editing files.
This knowledge will help us in understanding the next section, Scripting.","title":"Introduction"},{"location":"level102/linux_intermediate/introvim/#opening-a-file-and-using-insert-mode","text":"We use the command vim filename to open a file filename . The terminal will open an editor, but if you start typing right away, nothing is written. That is because we are not in \"INSERT\" mode in vim. Press i to get into insert mode and start writing. You will see \u201cINSERT\u201d on the bottom left after pressing \u201ci\u201d. You can use the \u201cESC\u201d key to get back to normal mode.","title":"Opening a file and using insert mode"},{"location":"level102/linux_intermediate/introvim/#saving-a-file","text":"After you insert your text in INSERT mode, press the ESC (escape) key on your keyboard to get out of it. Press : (colon, i.e. Shift + ;), then press w and hit Enter; the text you entered will get written to the file.","title":"Saving a file"},{"location":"level102/linux_intermediate/introvim/#exiting-the-vim-editor","text":"Exiting vim can be really challenging for beginners. There are various ways to exit Vim, such as exiting without saving the work or exiting after saving the work. Try the below commands after exiting insert mode and pressing : (colon). Vim Commands Description :q Exit the file but won\u2019t exit if file has unsaved changes :wq Write(save) and exit the file. :q! Exit without saving the changes. These are the basics we will need for bash scripting in the next section. You can always visit the tutorial to learn more. For quick practice of vim commands visit: https://www.openvim.com/","title":"Exiting the VIM editor"},{"location":"level102/linux_intermediate/package_management/","text":"Package Management Introduction One of the main features of any operating system is the ability to run other programs and software, and this is where package management comes into the picture. Package management is a method of installing and maintaining software programs on any operating system.
Package In the early days of Linux, one had to download source code of any software and compile it to install and run the software. As the Linux space became more mature, it is understood the software landscape is very dynamic and started distributing software in the form of packages. Package file is a compressed collection of files that contains software, its dependencies, installation instructions and metadata about the package. Dependencies It is rare that a software package is stand-alone, it depends on the different software, libraries and modules. These subroutines are stored and made available in the form of shared libraries which may serve more than one program. These shared resources are called dependencies. Package management does this hard job of resolving dependencies and installing them for the user along with the software. Repository Repository is a storage location where all the packages, updates, dependencies are stored. Each repository can contain thousands of software packages hosted on a remote server intended to be installed and updated on linux systems. We usually update the package information ( often referred to as metadata ) by running \u201c sudo dnf update\u201d. Try out sudo dnf repolist all to list all the repositories. We usually add repositories for installing packages from third party vendors. dnf config-manager --add-repo http://www.example.com/example.repo High Level and Low-Level Package management tools There are mainly two types of packages management tools: 1. Low-level tools : This is mostly used for installing, removing and upgrading package files. 2. High-Level tools : In addition to Low-level tools, High-level tools do metadata searching and dependency resolution as well. 
Linux Distribution Low-Level Tools High-Level tools Debian dpkg apt-get Fedora, RedHat rpm dnf","title":"Package Management"},{"location":"level102/linux_intermediate/package_management/#package-management","text":"","title":"Package Management"},{"location":"level102/linux_intermediate/package_management/#introduction","text":"One of the main features of any operating system is the ability to run other programs and software, which is where package management comes into the picture. Package management is a method of installing and maintaining software programs on any operating system.","title":"Introduction"},{"location":"level102/linux_intermediate/package_management/#package","text":"In the early days of Linux, one had to download the source code of any software and compile it to install and run the software. As the Linux space matured, it became clear that the software landscape is very dynamic, and distributions started shipping software in the form of packages. A package file is a compressed collection of files that contains the software, information about its dependencies, installation instructions and metadata about the package.","title":"Package"},{"location":"level102/linux_intermediate/package_management/#dependencies","text":"It is rare that a software package is stand-alone; it usually depends on other software, libraries and modules. These subroutines are stored and made available in the form of shared libraries which may serve more than one program. These shared resources are called dependencies. Package management does the hard job of resolving dependencies and installing them for the user along with the software.","title":"Dependencies"},{"location":"level102/linux_intermediate/package_management/#repository","text":"A repository is a storage location where all the packages, updates and dependencies are stored. Each repository can contain thousands of software packages hosted on a remote server intended to be installed and updated on Linux systems. 
We usually update the package information ( often referred to as metadata ) by running \u201c sudo dnf update\u201d. Try out sudo dnf repolist all to list all the repositories. We usually add repositories for installing packages from third party vendors. dnf config-manager --add-repo http://www.example.com/example.repo","title":"Repository"},{"location":"level102/linux_intermediate/package_management/#high-level-and-low-level-package-management-tools","text":"There are mainly two types of package management tools: 1. Low-level tools : These are mostly used for installing, removing and upgrading package files. 2. High-Level tools : In addition to what low-level tools do, high-level tools handle metadata searching and dependency resolution as well. Linux Distribution Low-Level Tools High-Level tools Debian dpkg apt-get Fedora, RedHat rpm dnf","title":"High Level and Low-Level Package management tools"},{"location":"level102/linux_intermediate/storage_media/","text":"Storage Media Introduction Storage media are devices which are used to store data and information. Linux has amazing capabilities when it comes to handling external devices, including storage devices. There are many kinds of storage devices: physical storage devices like hard drives, virtual storage devices like RAID or LVM, network storage and so on. In this section we will learn to work with any storage device and configure it to our needs. Listing the mounted storage devices: We can use the mount command to list all the storage devices mounted to your computer. The output above has the format: device on mount_point type file\\_system\\_type (options) For example, in the first line the virtual device sysfs is mounted at the /sys path and has a sysfs file system. Now let\u2019s see what a filesystem is and how it is created. 
Creating a FileSystem Imagine a disk where all the data stored in the disk is in the form of one large chunk, there is nothing to figure out where one piece of data starts and ends, which piece of data is located at which place of the whole chunk of data and hence the File System comes into picture. File System(fs) is responsible for data storage, indexing and retrieval on any storage device. Below are the most popularly used file systems: FS Type Description FAT File Allocation Table, initially used on DOS and Microsoft Windows and now widely used for portable USB storage NTFS (New Technology File System) Used on Microsoft\u2019s Windows based operating systems ext Extended file system, designed for Linux systems. ext4 Fourth extended filesystem, is a journaled file system that is commonly used by the Linux kernel. HFS Hierarchical File System, in use until HFS+ was introduced on Mac OS 8.1. HFS+ Supports file system journaling, enabling recovery of data after a system crash. NFS Network File System originally from Sun Microsystems is the standard in UNIX-based networks. We will try to create an ext4 file system which is linux native fs using mkfs . Discalimer: Run this command on empty disk as this will wipe out the existing data. Here the device /dev/sdb1 is formatted and it\u2019s filesystem is changed to ext4 . Mounting the device: In Linux systems all files are arranged in a tree structure with (/) as root. Mounting a fs simply means making that fs accessible to a certain point in the Linux directory tree. We need a mount point(location) where we want to mount the above formatted device. We created a mount point /mount and used the mount command to attach the filesystem. Here -t flag specifies what is the fs type and after that the /dev/sdb1 (device name) and /mount (mount point we created earlier). 
Unmounting the device: Now let\u2019s see how we can unmount the device, which is equally important if we have removable storage media and want to mount it on another host. We use umount for unmounting the device. Our first attempt did not unmount /dev/sdb1 because we were inside the storage device and it was being used. Once we jumped back to the home directory, we were able to unmount the device successfully. Making it easier with /etc/fstab file? In our production environment, we can have servers with many storage devices that need to be mounted, and it is not feasible to mount each device using the command every time we reboot the system. To ease this burden, we can make use of a configuration table called \u201cfstab\u201d, usually found in /etc/fstab on Linux systems. Here on the first line we have /dev/mapper/rootvg-rootlv (storage device ) mounted on / (root mount point) which has the xfs filesystem type followed by options. We can run mount -a to mount all the filesystems listed in this file after making changes. Checking and Repairing FS Filesystems encounter issues in case of any hardware failure, power failure and sometimes due to improper shutdown. Linux usually checks and repairs the corrupted disk, if any, during startup. We can also manually check for filesystem corruption using the command fsck . We can repair the same filesystem using fsck -y /dev/sdb1 . Each kind of filesystem error has an exit code attached to it, and fsck returns the sum of the codes for all the errors it encountered. Error Codes Description 0 No errors 1 Filesystem errors corrected 2 System should be rebooted 4 Filesystem errors left uncorrected 8 Operational error 16 Usage or syntax error 32 Checking canceled by user request 128 Shared-library error In the above fs check we got a return code of 12, which is the sum of error code 8 (operational error) and 4 (uncorrected FS error). 
RAID RAID or \u201cRedundant Arrays of Independent Disks\u201d is a technique that distributes I/O across multiple disks to achieve increased performance and data redundancy. RAID has the ability to increase overall disk performance and survive disk failures. Software RAID uses the computer\u2019s CPU to carry out RAID operations whereas hardware RAID uses specialized processors, on disk controllers, to manage the disks. Three essential features of RAID are mirroring, striping and parity. RAID levels The below section discusses the RAID levels that are commonly used. For information on all RAID levels, please refer to here . RAID 0 (Striping) Striping is the method by which data is split up into \u201cblocks\u201d and written across all the disks present in the array. By spreading data across multiple drives, it means multiple disks can access the file, resulting in faster read/write speeds. The first disk in the array is not reused until an equal amount of data is written to each of the other disks in the array. Advantages It can be easily implemented. Bottlenecks caused due to I/O operations from the same disk are avoided, increasing the performance of such operations. Disadvantages It does not offer any kind of redundancy. If any one of the disks fails, then the data of the entire disk is lost and cannot be recovered. Use cases RAID 0 can be used for systems with non-critical data that has to be read at high speed, such as a video/audio editing station or gaming environments. RAID 1(Mirroring) Mirroring writes a copy of data to each disk which is part of the array. This means that the data is written as many times as disks in the array . It stores an exact replica of all data on a separate disk or disks. As expected, this would result in a slow write performance compared to that of a single disk. On the other hand, read operations can be done parallelly improving read performance. Advantages RAID 1 offers a better read performance than RAID 0 or single disk. 
It can survive disk failures as long as at least one disk in the mirror remains healthy, without the need for special data recovery algorithms. Disadvantages It is costly since the effective storage capacity is only half of the total disk capacity (in a two-disk mirror) due to replication of data. Use cases Applications that require low downtime but can have a slight hit on write performance. RAID 4(Striping with dedicated parity) RAID 4 uses block-level striping (data can be striped in blocks of a variety of sizes depending on the applications and data to be stored) and a dedicated drive used to store parity information. The parity information is generated by an algorithm every time data is written to an array disk. The use of a parity bit is a way of adding checksums into data that can enable the target device to determine whether the data has been received correctly. In the event of a drive failure, the algorithm can be reversed and missing data can be generated based on the remaining data and parity information. Advantages Each drive in a RAID 4 array operates independently so I/O requests take place in parallel, speeding up performance over previous RAID levels. It can survive a single disk failure, since the missing data can be regenerated from the remaining data and parity. Disadvantages A minimum of 3 disks is required for setup. Parity calculation adds overhead and is often offloaded to a hardware RAID controller. Write speeds are slow since all parity is stored on a single disk drive, and the parity blocks must be modified for every write. Use cases Operations dealing with really large files, where sequential reads and writes are used RAID 5(Striping with distributed parity) RAID 5 is similar to RAID 4, except that the parity information is spread across all drives in the array. This helps reduce the bottleneck inherent in writing parity information to a single drive during each write operation. RAID 5 is the most common secure RAID level. 
Advantages Read data transactions are fast as compared to write data transactions that are somewhat slow due to the calculation of parity. Data remains accessible even after a drive failure and during replacement of a failed hard drive because the storage controller rebuilds the data on the new drive. Disadvantages RAID 5 requires a minimum of 3 drives and can work up to a maximum of 16 drives. Parity calculation adds overhead and is often offloaded to a hardware RAID controller. More than one drive failure causes data loss. Use cases File storage and application servers, such as email, general storage servers, etc. RAID 6(Striping with double parity) RAID 6 is similar to RAID 5 with an added advantage of double distributed parity, which provides fault tolerance up to two failed drives. Advantages Read data transactions are fast. This provides a fault tolerance up to 2 failed drives. RAID 6 is more resilient than RAID 5. Disadvantages Write data transactions are slow due to double parity. Rebuilding the RAID array takes a longer time because of its complex structure. Use cases Office automation, online customer service, and applications that require very high availability. RAID 10(RAID 1+0 : Mirroring and Striping) RAID 10 is a combination of RAID 0 and RAID 1. It means that both mirroring and striping are used in one single RAID array. Advantages Rebuilding the RAID array is fast. Read and write performance is good. Disadvantages Just like RAID 1, only half the drive capacity is available. It can be expensive to implement RAID 10. Use cases Transactional databases with sensitive information that require high performance and high data security. Commands to monitor RAID The command cat /proc/mdstat will give the status of a software RAID. 
Let us examine the output of the command: Personalities : [raid1] md0 : active raid1 sdb1[2] sda1[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md1 : active raid1 sdb2[2] sda2[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md2 : active raid1 sdb3[2] 41909248 blocks super 1.1 [2/1] [_U] bitmap: 1/1 pages [4KB], 65536KB chunk The \u201cpersonalities\u201d gives us the raid level that the raid is configured. In the above example, the raid is configured with RAID 1. md0 : active raid1 sdb1[2] sda1[0] tells us that there is an active raid of RAID 1 between sdb1(which is device 2) and sda1(which is device 0).An inactive array generally means that one of the disks are faulty. Md2 in the above example shows that we have 41909248 blocks super 1.1 [2/1] [_U] , this means that one disk is down in this particular raid. The command mdadm --detail /dev/ gives detailed information about that particular array. sudo mdadm --detail /dev/md0 /dev/md0: Version : 1.1 Creation Time : Fri Nov 17 11:49:20 2019 Raid Level : raid1 Array Size : 10476544 (9.99 GiB 10.32 GB) Used Dev Size : 10476544 (9.99 GiB 10.32 GB) Raid Devices : 2 Total Devices : 2 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Sun Dec 2 01:00:53 2019 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : xxxxxxx:yyyyyy:zzzzzz:ffffff Events : 987 Number Major Minor RaidDevice State 0 8 1 0 active sync /dev/sda1 1 8 49 1 active sync /dev/sdb1 Incase of a missing disk in the above example, the State of the raid would be \u2018dirty\u2019 and Active Devices and Working Devices would be reduced to one. One of the entries(either /dev/sda1 or /dev/sdb1 depending on the missing disk) would have their RaidDevice changed to faulty. LVM LVM stands for Logical Volume Management. 
In the above section we saw how we can create FS and use individual disks according to our need the traditional way but using LVM we can achieve more flexibility in storage allocation like we can stitch three 2TB disks to make one single partition of 6TB, or we can attach another physical disk of 4TB to the server and add that disk to the logical volume group to make it 10TB in total. Refer to know more about LVM: https://www.redhat.com/sysadmin/lvm-vs-partitioning","title":"Storage Media"},{"location":"level102/linux_intermediate/storage_media/#storage-media","text":"","title":"Storage Media"},{"location":"level102/linux_intermediate/storage_media/#introduction","text":"Storage media are devices which are used to store data and information. Linux has amazing capabilities when it comes to handling external devices including storage devices. There are many kinds of storage devices physical storage devices like hard drives, virtual storage devices like RAID or LVM, network storage and so on. In this section we will learn to work with any storage device and configure it to our needs.","title":"Introduction"},{"location":"level102/linux_intermediate/storage_media/#listing-the-mounted-storage-devices","text":"We can use command mount to list all the storage devices mounted to your computer. The format in which we see above output is: device on mount_point type file\\_system\\_type (options) For example in the first line the device virtual sysfs is mounted at /sys path and has a sysfs file system. Now let\u2019s see what and how a filesystem is created.","title":"Listing the mounted storage devices:"},{"location":"level102/linux_intermediate/storage_media/#creating-a-filesystem","text":"Imagine a disk where all the data stored in the disk is in the form of one large chunk, there is nothing to figure out where one piece of data starts and ends, which piece of data is located at which place of the whole chunk of data and hence the File System comes into picture. 
File System(fs) is responsible for data storage, indexing and retrieval on any storage device. Below are the most popularly used file systems: FS Type Description FAT File Allocation Table, initially used on DOS and Microsoft Windows and now widely used for portable USB storage NTFS (New Technology File System) Used on Microsoft\u2019s Windows based operating systems ext Extended file system, designed for Linux systems. ext4 Fourth extended filesystem, is a journaled file system that is commonly used by the Linux kernel. HFS Hierarchical File System, in use until HFS+ was introduced on Mac OS 8.1. HFS+ Supports file system journaling, enabling recovery of data after a system crash. NFS Network File System originally from Sun Microsystems is the standard in UNIX-based networks. We will try to create an ext4 file system which is linux native fs using mkfs . Discalimer: Run this command on empty disk as this will wipe out the existing data. Here the device /dev/sdb1 is formatted and it\u2019s filesystem is changed to ext4 .","title":"Creating a FileSystem"},{"location":"level102/linux_intermediate/storage_media/#mounting-the-device","text":"In Linux systems all files are arranged in a tree structure with (/) as root. Mounting a fs simply means making that fs accessible to a certain point in the Linux directory tree. We need a mount point(location) where we want to mount the above formatted device. We created a mount point /mount and used the mount command to attach the filesystem. Here -t flag specifies what is the fs type and after that the /dev/sdb1 (device name) and /mount (mount point we created earlier).","title":"Mounting the device:"},{"location":"level102/linux_intermediate/storage_media/#unmounting-the-device","text":"Now let\u2019s see how we can unmount the device, which is equally important if we have removable storage media and want to mount on another host. We use umount for unmounting the device. 
Our first attempt did not unmount /dev/sdb1 because we were inside the storage device and it was being used. Once we jumped back to the home directory, we were able to unmount the device successfully.","title":"Unmounting the device:"},{"location":"level102/linux_intermediate/storage_media/#making-it-easier-with-etcfstab-file","text":"In our production environment, we can have servers with many storage devices that need to be mounted, and it is not feasible to mount each device using the command every time we reboot the system. To ease this burden, we can make use of a configuration table called \u201cfstab\u201d, usually found in /etc/fstab on Linux systems. Here on the first line we have /dev/mapper/rootvg-rootlv (storage device ) mounted on / (root mount point) which has the xfs filesystem type followed by options. We can run mount -a to mount all the filesystems listed in this file after making changes.","title":"Making it easier with /etc/fstab file?"},{"location":"level102/linux_intermediate/storage_media/#checking-and-repairing-fs","text":"Filesystems encounter issues in case of any hardware failure, power failure and sometimes due to improper shutdown. Linux usually checks and repairs the corrupted disk, if any, during startup. We can also manually check for filesystem corruption using the command fsck . We can repair the same filesystem using fsck -y /dev/sdb1 . Each kind of filesystem error has an exit code attached to it, and fsck returns the sum of the codes for all the errors it encountered. 
Error Codes Description 0 No errors 1 Filesystem errors corrected 2 System should be rebooted 4 Filesystem errors left uncorrected 8 Operational error 16 Usage or syntax error 32 Checking canceled by user request 128 Shared-library error In the above fs check we got return code as 12 which is the sum of error code 8(operational error) and 4(uncorrected FS error).","title":"Checking and Repairing FS"},{"location":"level102/linux_intermediate/storage_media/#raid","text":"RAID or \u201cRedundant Arrays of Independent Disks\u201d is a technique that distributes I/O across multiple disks to achieve increased performance and data redundancy. RAID has the ability to increase overall disk performance and survive disk failures. Software RAID uses the computer\u2019s CPU to carry out RAID operations whereas hardware RAID uses specialized processors, on disk controllers, to manage the disks. Three essential features of RAID are mirroring, striping and parity.","title":"RAID"},{"location":"level102/linux_intermediate/storage_media/#raid-levels","text":"The below section discusses the RAID levels that are commonly used. For information on all RAID levels, please refer to here .","title":"RAID levels"},{"location":"level102/linux_intermediate/storage_media/#raid-0-striping","text":"Striping is the method by which data is split up into \u201cblocks\u201d and written across all the disks present in the array. By spreading data across multiple drives, it means multiple disks can access the file, resulting in faster read/write speeds. The first disk in the array is not reused until an equal amount of data is written to each of the other disks in the array. Advantages It can be easily implemented. Bottlenecks caused due to I/O operations from the same disk are avoided, increasing the performance of such operations. Disadvantages It does not offer any kind of redundancy. If any one of the disks fails, then the data of the entire disk is lost and cannot be recovered. 
Use cases RAID 0 can be used for systems with non-critical data that has to be read at high speed, such as a video/audio editing station or gaming environments.","title":"RAID 0 (Striping)"},{"location":"level102/linux_intermediate/storage_media/#raid-1mirroring","text":"Mirroring writes a copy of data to each disk which is part of the array. This means that the data is written as many times as disks in the array . It stores an exact replica of all data on a separate disk or disks. As expected, this would result in a slow write performance compared to that of a single disk. On the other hand, read operations can be done parallelly improving read performance. Advantages RAID 1 offers a better read performance than RAID 0 or single disk. It can survive multiple disk failures without the need for special data recovery algorithms Disadvantages It is costly since the effective storage capacity is only half of the number of disks due to replication of data. Use cases Applications that require low downtime but can have a slight hit on write performance.","title":"RAID 1(Mirroring)"},{"location":"level102/linux_intermediate/storage_media/#raid-4striping-with-dedicated-parity","text":"RAID 4 works uses block-level striping (data can be striped in blocks of a variety of sizes depending on the applications and data to be stored) and a dedicated drive used to store parity information.The parity information is generated by an algorithm every time data is written to an array disk. The use of a parity bit is a way of adding checksums into data that can enable the target device to determine whether the data has been received correctly. In the event of a drive failure , the algorithm can be reversed and missing data can be generated based on the remaining data and parity information. Advantages Each drive in a RAID 4 array operates independently so I/O requests take place in parallel, speeding up performance over previous RAID levels. 
It can survive a single disk failure, since the missing data can be regenerated from the remaining data and parity. Disadvantages A minimum of 3 disks is required for setup. Parity calculation adds overhead and is often offloaded to a hardware RAID controller. Write speeds are slow since all parity is stored on a single disk drive, and the parity blocks must be modified for every write. Use cases Operations dealing with really large files, where sequential reads and writes are used","title":"RAID 4(Striping with dedicated parity)"},{"location":"level102/linux_intermediate/storage_media/#raid-5striping-with-distributed-parity","text":"RAID 5 is similar to RAID 4, except that the parity information is spread across all drives in the array. This helps reduce the bottleneck inherent in writing parity information to a single drive during each write operation. RAID 5 is the most common secure RAID level. Advantages Read data transactions are fast as compared to write data transactions that are somewhat slow due to the calculation of parity. Data remains accessible even after a drive failure and during replacement of a failed hard drive because the storage controller rebuilds the data on the new drive. Disadvantages RAID 5 requires a minimum of 3 drives and can work up to a maximum of 16 drives. Parity calculation adds overhead and is often offloaded to a hardware RAID controller. More than one drive failure causes data loss. Use cases File storage and application servers, such as email, general storage servers, etc.","title":"RAID 5(Striping with distributed parity)"},{"location":"level102/linux_intermediate/storage_media/#raid-6striping-with-double-parity","text":"RAID 6 is similar to RAID 5 with an added advantage of double distributed parity, which provides fault tolerance up to two failed drives. Advantages Read data transactions are fast. This provides a fault tolerance up to 2 failed drives. RAID 6 is more resilient than RAID 5. Disadvantages Write data transactions are slow due to double parity. 
Rebuilding the RAID array takes a longer time because of complex structure. Use cases Office automation, online customer service, and applications that require very high availability.","title":"RAID 6(Striping with double parity)"},{"location":"level102/linux_intermediate/storage_media/#raid-10raid-10-mirroring-and-striping","text":"RAID 10 is a combination of RAID 0 and RAID 1. It means that both mirroring and striping in one single RAID array. Advantages Rebuilding the RAID array is fast. Read and write operations performance are good. Disadvantages Just like RAID 1, only half the drive capacity is available. It can be expensive to implement RAID 10. Use cases Transactional databases with sensitive information that require high performance and high data security.","title":"RAID 10(RAID 1+0 : Mirroring and Striping)"},{"location":"level102/linux_intermediate/storage_media/#commands-to-monitor-raid","text":"The command cat /proc/mdstat will give the status of a software RAID. Let us examine the output of the command: Personalities : [raid1] md0 : active raid1 sdb1[2] sda1[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md1 : active raid1 sdb2[2] sda2[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md2 : active raid1 sdb3[2] 41909248 blocks super 1.1 [2/1] [_U] bitmap: 1/1 pages [4KB], 65536KB chunk The \u201cpersonalities\u201d gives us the raid level that the raid is configured. In the above example, the raid is configured with RAID 1. md0 : active raid1 sdb1[2] sda1[0] tells us that there is an active raid of RAID 1 between sdb1(which is device 2) and sda1(which is device 0).An inactive array generally means that one of the disks are faulty. Md2 in the above example shows that we have 41909248 blocks super 1.1 [2/1] [_U] , this means that one disk is down in this particular raid. The command mdadm --detail /dev/ gives detailed information about that particular array. 
sudo mdadm --detail /dev/md0 /dev/md0: Version : 1.1 Creation Time : Fri Nov 17 11:49:20 2019 Raid Level : raid1 Array Size : 10476544 (9.99 GiB 10.32 GB) Used Dev Size : 10476544 (9.99 GiB 10.32 GB) Raid Devices : 2 Total Devices : 2 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Sun Dec 2 01:00:53 2019 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : xxxxxxx:yyyyyy:zzzzzz:ffffff Events : 987 Number Major Minor RaidDevice State 0 8 1 0 active sync /dev/sda1 1 8 49 1 active sync /dev/sdb1 Incase of a missing disk in the above example, the State of the raid would be \u2018dirty\u2019 and Active Devices and Working Devices would be reduced to one. One of the entries(either /dev/sda1 or /dev/sdb1 depending on the missing disk) would have their RaidDevice changed to faulty.","title":"Commands to monitor RAID"},{"location":"level102/linux_intermediate/storage_media/#lvm","text":"LVM stands for Logical Volume Management. In the above section we saw how we can create FS and use individual disks according to our need the traditional way but using LVM we can achieve more flexibility in storage allocation like we can stitch three 2TB disks to make one single partition of 6TB, or we can attach another physical disk of 4TB to the server and add that disk to the logical volume group to make it 10TB in total. Refer to know more about LVM: https://www.redhat.com/sysadmin/lvm-vs-partitioning","title":"LVM"},{"location":"level102/networking/conclusion/","text":"This course would have given some background on deploying services in datacentre and various parameters to consider and available solutions. It has to be noted that, each of the solution discussed here have various pros and cons, so specific to the scenario/requirement, the right fit among these are to be identified and used. 
As we didn't go into the depth of the various technologies/solutions in this course, it might have made the reader curious to know more about some of the topics. Here are some references and online training content for further learning. LinkedIn engineering blog : has information about how LinkedIn datacentres are set up and how some of the key problems are solved. IPSpace blog : Has a lot of articles about datacentre networking. Networking Basics course in edx. Happy learning !!","title":"Conclusion"},{"location":"level102/networking/infrastructure-features/","text":"Some of the aspects to consider are whether the underlying data centre infrastructure supports ToR resiliency, i.e. features like link bundling (bonds), BGP, support for anycast service, load balancer, firewall, Quality of Service. As seen in previous sections, deploying applications at scale requires certain capabilities to be supported by the infrastructure. This section will cover the different options available, and their suitability. ToR connectivity This being one of the most frequent points of failure (considering the scale of deployment), there are different options available to connect the servers to the ToR. We are going to see them in detail below, Single ToR This is the simplest of all the options, where a NIC of the server is connected to one ToR. The advantage of this approach is that a minimal number of switch ports is used, allowing the DC fabric to support the rapid growth of server infrastructure (Note: Not only are the ToR ports used efficiently, but the port usage in the upper switching layers of the DC fabric will be efficient as well). On the downside, the servers can be unreachable if there is an issue with the ToR, link or NIC. This will impact stateful apps more, as the existing connections get abruptly disconnected. Fig 4: Single ToR design Dual ToR In this option, each server is connected to two ToRs of the same cabinet. 
This can be set up in active/passive mode, thereby providing resiliency during ToR/link/NIC failures. The resiliency can be achieved either in layer 2 or in layer 3. Layer 2 In this case, both the links are bundled together as a bond on the server side (with one NIC taking the active role and the other being passive). On the switch side, these two links are made part of multi-chassis lag (similar to bonding, but spread across switches). The prerequisite here is, both the ToR should be part of the same layer 2 domain. The IP addresses are configured on the bond interface on the server and SVI on the switch side. Note: In this, the ToR 2 role is only to provide resiliency. Fig 5: Dual ToR layer 2 setup Layer 3 In this case, both the links are configured as separate layer 3 interfaces. The resiliency is achieved by setting up a routing protocol (like BGP). Wherein one link is given higher preference over the other. In this case, the two ToR's can be set up independently, in layer 3 mode. The servers would need a virtual address, to which the services have to be bound. Note: In this, the ToR 2 role is only to provide resiliency. Fig 6: Dual ToR layer 3 setup Though the resiliency is better with dual ToR, the drawback is, the number of ports being used. As the access port in the ToR doubles up, the number of ports required in the Spine layer also doubles up, and this keeps cascading to higher layers. Type Single ToR Dual ToR (layer 2) Dual ToR (layer 3) Resiliency 1 No 2 Yes Yes Port usage 1:1 1:2 1:2 Cabling Less More More Cost of DC fabric Low High High ToR features required Low High Medium 1 Resiliency in terms of ToR/Link/NIC 2 As an alternative, resiliency can be addressed at the application layer. Along with the above-mentioned ones, an application might need more capabilities out of the infrastructure to deploy at scale. 
Some of them are, Anycast As seen in the previous section, of deploying at scale, anycast is one of the means to have services distributed across cabinets and still have traffic flowing to each one of the servers. To achieve this, two things are required Routing protocol between ToR and server (to announce the anycast address) Support for ECMP (Equal Cost Multi-Path) load balancing in the infrastructure, to distribute the flows across the cabinets. Load balancing Similar to Anycast, another means to achieve load balancing across servers (host a particular app), is using load balancers. These could be implemented in different ways Hardware load balancers: A LB device is placed inline of the traffic flow, and looks at the layer 3 and layer 4 information in an incoming packet. Then determine the set of real hosts, to which the connections are to be redirected. As covered in the Scale topic, these load balancers can be set up in two ways, Single-arm mode: In this mode, the load balancer handles only the incoming requests to the VIP. The response from the server goes directly to the clients. There are two ways to implement this, L2 DSR: Where the load balancer and the real servers remain in the same VLAN. Upon getting an incoming request, the load balancer identifies the real server to redirect the request and then modifies the destination mac address of that Ethernet frame. Upon processing this packet, the real server responds directly to the client. L3 DSR : In this case, the load balancer and real servers need not be in the same VLAN (does away with layer 2 complexities like running STP, managing wider broadcast domain, etc). Upon incoming request, the load balancer redirects to the real server, by modifying the destination IP address of the packet. Along with this, the DSCP value of the packet is set to a predefined value (mapped for that VIP). Upon receipt of this packet, the real server uses the DSCP value to determine the loopback address (VIP address). 
The response again goes directly to the client. Two arm mode: In this case, the load balancer is in line for incoming and outgoing traffic. DNS based load balancer: Here the DNS servers keep a check of the health of the real servers and resolve the domain in such a way that the client can connect to different servers in that cluster. This part was explained in detail in the deployment at scale section. IPVS based load balancing: This is another means, where an IPVS server presents itself as the service endpoint to the clients. Upon incoming request, the IPVS directs the request to the real servers. The IPVS can be set up to do health checks for the real servers. NAT Network Address Translation (NAT) will be required for hosts that need to connect to destinations on the Internet, but don't want to expose their configured NIC address. In this case, the address (of the internal server) is translated to a public address by a firewall. A few examples of this are proxy servers, mail servers, etc. QoS Quality of Service is a means to provide differentiated treatment to some packets over others. These could provide priority in forwarding queues, or bandwidth reservations. In the data centre scenario, depending upon the bandwidth subscription ratio, the need for QoS varies, 1:1 bandwidth subscription ratio: In this case, the server to ToR connectivity (all servers in that cabinet) bandwidth should be equivalent to the ToR to Spine switch connectivity. Similarly for the upper layers as well. In this design, congestion on a link is not going to happen, as enough bandwidth will always be available. In this case, the only difference QoS can bring, it provides priority treatment for certain packets in the forwarding queue. Note: Packet buffering happens, when the packet moves between ports of different speeds, like 100Gbps, 10Gbps. 
Oversubscribed network: In this case, not all layers maintain a bandwidth subscription ratio, for example, the ToR uplink may be of lower bandwidth, compared to ToR to Server bandwidth (This is sometimes referred to as oversubscription ratio). In this case, there is a possibility of congestion. Here QoS might be required, to give priority as well as bandwidth reservation, for certain types of traffic flows.","title":"Infrastructure Services"},{"location":"level102/networking/infrastructure-features/#tor-connectivity","text":"This being one of the most frequent points of failure (considering the scale of deployment), there are different options available to connect the servers to the ToR. We are going to see them in detail below,","title":"ToR connectivity"},{"location":"level102/networking/infrastructure-features/#single-tor","text":"This is the simplest of all the options. Where a NIC of the server is connected to one ToR. The advantage of this approach is, there is a minimal number of switch ports used, allowing the DC fabric to support the rapid growth of server infrastructure (Note: Not only the ToR ports are used efficiently, but the upper switching layer in DC fabric as well, the port usage will be efficient). On the downside, the servers can be unreachable if there is an issue with the ToR, link or NIC. This will impact the stateful apps more, as the existing connections get abruptly disconnected. Fig 4: Single ToR design","title":"Single ToR"},{"location":"level102/networking/infrastructure-features/#dual-tor","text":"In this option, each server is connected to two ToR, of the same cabinet. This can be set up in active/passive mode, thereby providing resiliency during ToR/link/NIC failures. 
The resiliency can be achieved either in layer 2 or in layer 3.","title":"Dual ToR"},{"location":"level102/networking/infrastructure-features/#layer-2","text":"In this case, both the links are bundled together as a bond on the server side (with one NIC taking the active role and the other being passive). On the switch side, these two links are made part of multi-chassis lag (similar to bonding, but spread across switches). The prerequisite here is, both the ToR should be part of the same layer 2 domain. The IP addresses are configured on the bond interface on the server and SVI on the switch side. Note: In this, the ToR 2 role is only to provide resiliency. Fig 5: Dual ToR layer 2 setup","title":"Layer 2"},{"location":"level102/networking/infrastructure-features/#layer-3","text":"In this case, both the links are configured as separate layer 3 interfaces. The resiliency is achieved by setting up a routing protocol (like BGP). Wherein one link is given higher preference over the other. In this case, the two ToR's can be set up independently, in layer 3 mode. The servers would need a virtual address, to which the services have to be bound. Note: In this, the ToR 2 role is only to provide resiliency. Fig 6: Dual ToR layer 3 setup Though the resiliency is better with dual ToR, the drawback is, the number of ports being used. As the access port in the ToR doubles up, the number of ports required in the Spine layer also doubles up, and this keeps cascading to higher layers. Type Single ToR Dual ToR (layer 2) Dual ToR (layer 3) Resiliency 1 No 2 Yes Yes Port usage 1:1 1:2 1:2 Cabling Less More More Cost of DC fabric Low High High ToR features required Low High Medium 1 Resiliency in terms of ToR/Link/NIC 2 As an alternative, resiliency can be addressed at the application layer. Along with the above-mentioned ones, an application might need more capabilities out of the infrastructure to deploy at scale. 
Some of them are,","title":"Layer 3"},{"location":"level102/networking/infrastructure-features/#anycast","text":"As seen in the previous section, of deploying at scale, anycast is one of the means to have services distributed across cabinets and still have traffic flowing to each one of the servers. To achieve this, two things are required Routing protocol between ToR and server (to announce the anycast address) Support for ECMP (Equal Cost Multi-Path) load balancing in the infrastructure, to distribute the flows across the cabinets.","title":"Anycast"},{"location":"level102/networking/infrastructure-features/#load-balancing","text":"Similar to Anycast, another means to achieve load balancing across servers (host a particular app), is using load balancers. These could be implemented in different ways Hardware load balancers: A LB device is placed inline of the traffic flow, and looks at the layer 3 and layer 4 information in an incoming packet. Then determine the set of real hosts, to which the connections are to be redirected. As covered in the Scale topic, these load balancers can be set up in two ways, Single-arm mode: In this mode, the load balancer handles only the incoming requests to the VIP. The response from the server goes directly to the clients. There are two ways to implement this, L2 DSR: Where the load balancer and the real servers remain in the same VLAN. Upon getting an incoming request, the load balancer identifies the real server to redirect the request and then modifies the destination mac address of that Ethernet frame. Upon processing this packet, the real server responds directly to the client. L3 DSR : In this case, the load balancer and real servers need not be in the same VLAN (does away with layer 2 complexities like running STP, managing wider broadcast domain, etc). Upon incoming request, the load balancer redirects to the real server, by modifying the destination IP address of the packet. 
Along with this, the DSCP value of the packet is set to a predefined value (mapped for that VIP). Upon receipt of this packet, the real server uses the DSCP value to determine the loopback address (VIP address). The response again goes directly to the client. Two arm mode: In this case, the load balancer is in line for incoming and outgoing traffic. DNS based load balancer: Here the DNS servers keep a check of the health of the real servers and resolve the domain in such a way that the client can connect to different servers in that cluster. This part was explained in detail in the deployment at scale section. IPVS based load balancing: This is another means, where an IPVS server presents itself as the service endpoint to the clients. Upon incoming request, the IPVS directs the request to the real servers. The IPVS can be set up to do health checks for the real servers.","title":"Load balancing"},{"location":"level102/networking/infrastructure-features/#nat","text":"Network Address Translation (NAT) will be required for hosts that need to connect to destinations on the Internet, but don't want to expose their configured NIC address. In this case, the address (of the internal server) is translated to a public address by a firewall. A few examples of this are proxy servers, mail servers, etc.","title":"NAT"},{"location":"level102/networking/infrastructure-features/#qos","text":"Quality of Service is a means to provide differentiated treatment to some packets over others. These could provide priority in forwarding queues, or bandwidth reservations. In the data centre scenario, depending upon the bandwidth subscription ratio, the need for QoS varies, 1:1 bandwidth subscription ratio: In this case, the server to ToR connectivity (all servers in that cabinet) bandwidth should be equivalent to the ToR to Spine switch connectivity. Similarly for the upper layers as well. In this design, congestion on a link is not going to happen, as enough bandwidth will always be available. 
In this case, the only difference QoS can bring, it provides priority treatment for certain packets in the forwarding queue. Note: Packet buffering happens, when the packet moves between ports of different speeds, like 100Gbps, 10Gbps. Oversubscribed network: In this case, not all layers maintain a bandwidth subscription ratio, for example, the ToR uplink may be of lower bandwidth, compared to ToR to Server bandwidth (This is sometimes referred to as oversubscription ratio). In this case, there is a possibility of congestion. Here QoS might be required, to give priority as well as bandwidth reservation, for certain types of traffic flows.","title":"QoS"},{"location":"level102/networking/introduction/","text":"Prerequisites It is recommended to have basic knowledge of network security, TCP and datacenter setup and the common terminologies used in them. Also, the readers are expected to go through the School of Sre contents - Linux Networking system design security What to expect from this course This part will cover how a datacenter infrastructure is segregated for different application needs as well as the consideration of deciding where to place an application. These will be broadly based on, Security, Scale, RTT (latency), Infrastructure features. Each of these topics will be covered in detail, Security - Will cover threat vectors faced by services facing external/internal clients. Potential mitigation options to consider while deploying them. This will touch upon perimeter security, DDoS protection, Network demarcation and ring-fencing the server clusters. Scale - Deploying large scale applications, require a better understanding of infrastructure capabilities, in terms of resource availability, failure domains, scaling options like using anycast, layer 4/7 load balancer, DNS based load balancing. 
RTT (latency) - Latency plays a key role in determining the overall performance of the distributed service/application, where calls are made between hosts to serve the users. Infrastructure features - Some of the aspects to consider are, whether the underlying data centre infrastructure supports ToR resiliency, i.e., features like link bundling (bonds), BGP (Border Gateway Protocol), support for anycast service, load balancer, firewall, Quality of Service. What is not covered under this course Though these parameters play a role in designing an application, we will not go into the details of the design. Each of these topics are vast, hence the objective is to introduce the terms and relevance of the parameters in them, and not to provide extensive details about each one of them. Course Contents Security Scale RTT Infrastructure features Conclusion Terminology Before discussing each of the topics, it is important to get familiar with few commonly used terms Cloud This refers to hosted solutions from different providers like Azure, AWS, GCP. Wherein enterprises can host their applications for either public or private usage. On-prem This term refers to physical Data Center(DC) infrastructure, built and managed by enterprises themselves. This can be used for private access as well as public (like users connecting over the Internet). Leaf switch (ToR) This refers to the switch, where the servers connect to, in a DC. They are called by many names, like access switch, Top of the Rack switch, Leaf switch. The term leaf switch comes from the Spine-leaf architecture , where the access switches are called leaf switches. Spine-leaf architecture is commonly used in large/hyper-scale data centres, which brings very high scalability options for the DC switching layer and is also more efficient in building and implementing these switches. Sometimes these are referred to as Clos architecture. 
Spine switch Spine switches are the aggregation point of several leaf switches, they provide the inter-leaf communication and also connect to the upper layer of DC infrastructure. DC fabric As the data centre grows, multiple Clos networks need to be interconnected, to support the scale, and fabric switches help to interconnect them. Cabinet This refers to the rack, where the servers and ToR are installed. One cabinet refers to the entire rack. BGP It is the Border Gateway Protocol, used to exchange routing information between routers and switches. This is one of the common protocols used on the Internet as well as in Data Centers. Other protocols are also used in place of BGP, like OSPF. VPN A Virtual Private Network is a tunnel solution, where two private networks (like offices, datacentres, etc) can be interconnected over a public network (internet). These VPN tunnels encrypt the traffic before sending over the Internet, as a security measure. NIC Network Interface Card refers to the module in Servers, which consists of the Ethernet port and the interconnection to the system bus. It is used to connect to the switches (commonly ToR switches). Flow Flows refer to a traffic exchange between two nodes (could be servers, switches, routers, etc), which has common parameters like source/destination IP address, source/destination port number, IP Protocol number. This helps in tracking a particular traffic exchange session, between two nodes (like a file copy session, or an HTTP connection, etc). ECMP Equal Cost Multi-Path means, a switch/router can distribute the traffic to a destination, among multiple exit interfaces. The flow information is used to build a hash value and based on that, exit interfaces are selected. Once a flow is mapped to a particular exit interface, all the packets of that flow exit via the same interface only. This helps in preventing out of order delivery of packets. 
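The flow-to-interface mapping described for ECMP can be illustrated with a small sketch. This is only an illustration under stated assumptions: real switches compute the hash in hardware with vendor-specific functions, and the function name pick_next_hop, the choice of SHA-256 and the sample addresses below are all hypothetical and not taken from this course.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    # flow is a hypothetical 5-tuple:
    # (src_ip, dst_ip, src_port, dst_port, ip_protocol)
    key = '|'.join(str(field) for field in flow).encode()
    # SHA-256 is an assumption made for readability; hardware
    # typically uses much cheaper hash functions.
    digest = hashlib.sha256(key).digest()
    # Reduce the hash to an index into the list of exit interfaces.
    index = int.from_bytes(digest[:4], 'big') % len(next_hops)
    return next_hops[index]


# Hypothetical exit interfaces and one sample flow (TCP, protocol 6).
exit_interfaces = ['eth1', 'eth2', 'eth3', 'eth4']
sample_flow = ('10.0.0.5', '10.1.0.9', 40312, 443, 6)
chosen = pick_next_hop(sample_flow, exit_interfaces)
```

Because the hash depends only on the flow fields, every packet of the same flow maps to the same exit interface, which is what keeps the packets of a flow in order.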
RTT This is a measure of the time it takes for a packet from the source to reach the destination and return to the source. This is most commonly used in measuring network performance and also troubleshooting. TCP throughput This is the measure of the data transfer rate achieved between two nodes. This is impacted by many parameters like RTT, packet size, window size, etc. Unicast This refers to the traffic flow between a single source to a single destination (i.e.) like ssh sessions, where there is one to one communication. Anycast This refers to one-to-one traffic flow as above, but endpoints could be multiple (i.e.) a single source can send traffic to any one of the destination hosts in that group. This is achieved by having the same IP address configured in multiple servers and every new traffic flow is mapped to one of the servers. Multicast This refers to one-to-many traffic flow (i.e.) a single source can send traffic to multiple destinations. To make it feasible, the network routers replicate the traffic to different hosts (which register as members of that particular multicast group).","title":"Introduction"},{"location":"level102/networking/introduction/#prerequisites","text":"It is recommended to have basic knowledge of network security, TCP and datacenter setup and the common terminologies used in them. Also, the readers are expected to go through the School of Sre contents - Linux Networking system design security","title":"Prerequisites"},{"location":"level102/networking/introduction/#what-to-expect-from-this-course","text":"This part will cover how a datacenter infrastructure is segregated for different application needs as well as the consideration of deciding where to place an application. These will be broadly based on, Security, Scale, RTT (latency), Infrastructure features. Each of these topics will be covered in detail, Security - Will cover threat vectors faced by services facing external/internal clients. 
Potential mitigation options to consider while deploying them. This will touch upon perimeter security, DDoS protection, Network demarcation and ring-fencing the server clusters. Scale - Deploying large scale applications, require a better understanding of infrastructure capabilities, in terms of resource availability, failure domains, scaling options like using anycast, layer 4/7 load balancer, DNS based load balancing. RTT (latency) - Latency plays a key role in determining the overall performance of the distributed service/application, where calls are made between hosts to serve the users. Infrastructure features - Some of the aspects to consider are, whether the underlying data centre infrastructure supports ToR resiliency, i.e., features like link bundling (bonds), BGP (Border Gateway Protocol), support for anycast service, load balancer, firewall, Quality of Service.","title":"What to expect from this course"},{"location":"level102/networking/introduction/#what-is-not-covered-under-this-course","text":"Though these parameters play a role in designing an application, we will not go into the details of the design. Each of these topics are vast, hence the objective is to introduce the terms and relevance of the parameters in them, and not to provide extensive details about each one of them.","title":"What is not covered under this course"},{"location":"level102/networking/introduction/#course-contents","text":"Security Scale RTT Infrastructure features Conclusion","title":"Course Contents"},{"location":"level102/networking/introduction/#terminology","text":"Before discussing each of the topics, it is important to get familiar with few commonly used terms Cloud This refers to hosted solutions from different providers like Azure, AWS, GCP. Wherein enterprises can host their applications for either public or private usage. On-prem This term refers to physical Data Center(DC) infrastructure, built and managed by enterprises themselves. 
This can be used for private access as well as public (like users connecting over the Internet). Leaf switch (ToR) This refers to the switch, where the servers connect to, in a DC. They are called by many names, like access switch, Top of the Rack switch, Leaf switch. The term leaf switch comes from the Spine-leaf architecture , where the access switches are called leaf switches. Spine-leaf architecture is commonly used in large/hyper-scale data centres, which brings very high scalability options for the DC switching layer and is also more efficient in building and implementing these switches. Sometimes these are referred to as Clos architecture. Spine switch Spine switches are the aggregation point of several leaf switches, they provide the inter-leaf communication and also connect to the upper layer of DC infrastructure. DC fabric As the data centre grows, multiple Clos networks need to be interconnected, to support the scale, and fabric switches help to interconnect them. Cabinet This refers to the rack, where the servers and ToR are installed. One cabinet refers to the entire rack. BGP It is the Border Gateway Protocol, used to exchange routing information between routers and switches. This is one of the common protocols used on the Internet as well as in Data Centers. Other protocols are also used in place of BGP, like OSPF. VPN A Virtual Private Network is a tunnel solution, where two private networks (like offices, datacentres, etc) can be interconnected over a public network (internet). These VPN tunnels encrypt the traffic before sending over the Internet, as a security measure. NIC Network Interface Card refers to the module in Servers, which consists of the Ethernet port and the interconnection to the system bus. It is used to connect to the switches (commonly ToR switches). 
Flow Flows refer to a traffic exchange between two nodes (could be servers, switches, routers, etc), which has common parameters like source/destination IP address, source/destination port number, IP Protocol number. This helps in tracking a particular traffic exchange session, between two nodes (like a file copy session, or an HTTP connection, etc). ECMP Equal Cost Multi-Path means, a switch/router can distribute the traffic to a destination, among multiple exit interfaces. The flow information is used to build a hash value and based on that, exit interfaces are selected. Once a flow is mapped to a particular exit interface, all the packets of that flow exit via the same interface only. This helps in preventing out of order delivery of packets. RTT This is a measure of the time it takes for a packet from the source to reach the destination and return to the source. This is most commonly used in measuring network performance and also troubleshooting. TCP throughput This is the measure of the data transfer rate achieved between two nodes. This is impacted by many parameters like RTT, packet size, window size, etc. Unicast This refers to the traffic flow between a single source to a single destination (i.e.) like ssh sessions, where there is one to one communication. Anycast This refers to one-to-one traffic flow as above, but endpoints could be multiple (i.e.) a single source can send traffic to any one of the destination hosts in that group. This is achieved by having the same IP address configured in multiple servers and every new traffic flow is mapped to one of the servers. Multicast This refers to one-to-many traffic flow (i.e.) a single source can send traffic to multiple destinations. 
To make it feasible, the network routers replicate the traffic to different hosts (which register as members of that particular multicast group).","title":"Terminology"},{"location":"level102/networking/rtt/","text":"Latency plays a key role in determining the overall performance of the distributed service/application, where calls are made between hosts to serve the users. RTT is a measure of the time it takes for a packet to reach B from A and return to A. It is measured in milliseconds. This measure plays a role in determining the performance of the services. Its impact is seen in calls made between different servers/services, to serve the user, as well as the TCP throughput that can be achieved. It is fairly common that a service makes multiple calls to servers within its cluster or to different services like authentication, logging, database, etc, to respond to each user/client request. These servers can be spread across different cabinets, at times even between different data centres in the same region. Such cases are quite possible in cloud solutions, where the deployment spreads across different sites within a region. As the RTT increases, the response time for each of the calls gets longer and thereby has a cascading effect on the end response being sent to the user. Relation of RTT and throughput RTT is inversely proportional to the TCP throughput. As RTT increases, it reduces the TCP throughput, just like packet loss. Below is a formula to estimate the TCP throughput, based on TCP MSS, RTT and packet loss. As within a data centre, these calculations are also important for communication over the Internet, where a client can connect to the DC hosted services over different telco networks and the RTT is not very stable, due to the unpredictability of the Internet routing policies.","title":"RTT"},{"location":"level102/networking/rtt/#relation-of-rtt-and-throughput","text":"RTT is inversely proportional to the TCP throughput. 
As RTT increases, it reduces the TCP throughput, just like packet loss. Below is a formula to estimate the TCP throughput, based on TCP MSS, RTT and packet loss. As within a data centre, these calculations are also important for communication over the Internet, where a client can connect to the DC hosted services over different telco networks and the RTT is not very stable, due to the unpredictability of the Internet routing policies.","title":"Relation of RTT and throughput"},{"location":"level102/networking/scale/","text":"Deploying large scale applications requires a better understanding of infrastructure capabilities, in terms of resource availability, failure domains, scaling options like using anycast, layer 4/7 load balancer, DNS based load balancing. Building large scale applications is a complex activity, which should cover many aspects in design, development as well as operationalisation. This section will talk about the considerations to look for while deploying them. Failure domains In any infrastructure, failures due to hardware or software issues are common. Though these may be a pain from a service availability perspective, these failures do happen and a pragmatic goal would be to try to keep these failures to the minimum. Hence, while deploying any service, failures/non-availability of some of the nodes have to be factored in. Server failures A server could fail, due to power or NIC or software bug. And at times it may not be a complete failure but could be an error in the NIC, which causes some packet loss. This is a very common scenario and will impact the stateful services more. While designing such services, it is important to accommodate some level of tolerance to such failures. ToR failures This is one of the common scenarios, where the leaf switch connecting the servers goes down, along with it taking down the entire cabinet. There could be more than one server of the same service that can go down in this case. 
It requires planning to decide how much server loss can be handled without overloading other servers. Based on this, the service can be distributed across many cabinets. These calculations may vary, depending upon the resiliency in the ToR design, which will be covered in ToR connectivity section. Site failures Here site failure is a generic term, which could mean, a particular service is down in a site, maybe due to new version rollout, or failures of devices like firewall, load balancer, if the service depends on them, or loss of connectivity to remote sites (which might have limited options for resiliency) or issues with critical services like DNS, etc. Though these events may not be common, they can have a significant impact. In summary, handling these failure scenarios has to be thought about while designing the application itself. That will provide the tolerance required within the application to recover from unexpected failures. This will help not only for failures, even for planned maintenance work, as it will be easier to take part of the infrastructure, out of service. Resource availability The other aspect to consider while deploying applications at scale is the availability of the required infrastructure and the features the service is dependent upon. For example, for the resiliency of a cabinet, if one decides to distribute the service to 5 cabinets, but the service needs a load balancer (to distribute incoming connections to different servers), it may become challenging if load balancers are not supported in all cabinets. Or there could be a case that there are not enough cabinets available (that meet the minimum required specification for service to be set up). The best approach in these cases is to identify the requirements and gaps and then work with the Infrastructure team to best solve them. Scaling options While distributing the application to different cabinets, the incoming traffic to these services has to be distributed across these servers. 
To achieve this, the following may be considered Anycast This is one of the quickest ways to roll out traffic distribution across multiple cabinets. In this, each server, part of the cluster (where the service is set up), advertises a loopback address (/32 IPv4 or /128 IPv6 address), to the DC switch fabric (most commonly BGP is used for this purpose). The service has to be set up to be listening to this loopback address. When the clients try to connect to the service, they resolve it to this virtual address and forward their queries to it. The DC switch fabric distributes each flow into different available next hops (eventually to all the servers in that service cluster). Note: The DC switch computes a hash, based on the IP packet header, this could include any combination of source and destination addresses, source and destination port, mac address and IP protocol number. Based on this hash value, a particular next-hop is picked. Since all the packets in a traffic flow, carry the same values for these headers, all the packets in that flow will be mapped to the same path. Fig 1: Anycast setup To achieve a proportionate distribution of flows across these servers, it is important to maintain uniformity in each of the cabinets and pods. But remember, the distribution happens only based on flows, and if there are any elephant (large) flows, some servers might receive a higher volume of traffic. If there are any server or ToR failures, the advertisement of loopback address to the switches will stop, and thereby the new packets will be forwarded to the remaining available servers. Load balancer Another common approach is to use a load balancer. A Virtual IP is set up in the load balancers, to which the client connects while trying to access the service. The load balancer, in turn, redirects these connections to one of the actual servers, where the service is running. 
To verify that a server is in a serviceable state, the load balancer does periodic health checks, and if a check fails, the LB stops redirecting connections to that server. The load balancer can be deployed in single-arm mode, where the traffic to the VIP is redirected by the LB, and the return traffic from the server to the client is sent directly. The other option is the two-arm mode, where the return traffic is also passed through the LB. Fig 2: Single-arm mode Fig 3: Two-arm mode One of the cons of this approach is that, at a higher scale, the load balancer can become the bottleneck when supporting higher traffic volumes or more connections per second. DNS based load balancing This is similar to the above approach, the only difference being that the load balancing is done at the DNS instead of an appliance. The clients get different IPs to connect to when they query the DNS records of the service. The DNS server has to do health checks to know which servers are in a good state. This approach alleviates the bottleneck of the load balancer solution, but it requires a shorter TTL for the DNS records so that problematic servers can be taken out of rotation quickly, which means there will be far more DNS queries.","title":"Scale"},{"location":"level102/networking/scale/#failure-domains","text":"In any infrastructure, failures due to hardware or software issues are common. Though these may be a pain from a service availability perspective, these failures do happen, and a pragmatic goal is to keep them to a minimum. Hence, while deploying any service, failure or non-availability of some of the nodes has to be factored in.","title":"Failure domains"},{"location":"level102/networking/scale/#server-failures","text":"A server could fail due to a power issue, a NIC fault, or a software bug. At times it may not be a complete failure but an error in the NIC that causes some packet loss. 
This is a very common scenario and will impact the stateful services more. While designing such services, it is important to accommodate some level of tolerance to such failures.","title":"Server failures"},{"location":"level102/networking/scale/#tor-failures","text":"This is one of the common scenarios, where the leaf switch connecting the servers goes down, along with it taking down the entire cabinet. There could be more than one server of the same service that can go down in this case. It requires planning to decide how much server loss can be handled without overloading other servers. Based on this, the service can be distributed across many cabinets. These calculations may vary, depending upon the resiliency in the ToR design, which will be covered in ToR connectivity section.","title":"ToR failures"},{"location":"level102/networking/scale/#site-failures","text":"Here site failure is a generic term, which could mean, a particular service is down in a site, maybe due to new version rollout, or failures of devices like firewall, load balancer, if the service depends on them, or loss of connectivity to remote sites (which might have limited options for resiliency) or issues with critical services like DNS, etc. Though these events may not be common, they can have a significant impact. In summary, handling these failure scenarios has to be thought about while designing the application itself. That will provide the tolerance required within the application to recover from unexpected failures. This will help not only for failures, even for planned maintenance work, as it will be easier to take part of the infrastructure, out of service.","title":"Site failures"},{"location":"level102/networking/scale/#resource-availability","text":"The other aspect to consider while deploying applications at scale is the availability of the required infrastructure and the features the service is dependent upon. 
For example, for the resiliency of a cabinet, if one decides to distribute the service to 5 cabinets, but the service needs a load balancer (to distribute incoming connections to different servers), it may become challenging if load balancers are not supported in all cabinets. Or there could be a case that there are not enough cabinets available (that meet the minimum required specification for service to be set up). The best approach in these cases is to identify the requirements and gaps and then work with the Infrastructure team to best solve them.","title":"Resource availability"},{"location":"level102/networking/scale/#scaling-options","text":"While distributing the application to different cabinets, the incoming traffic to these services has to be distributed across these servers. To achieve this, the following may be considered","title":"Scaling options"},{"location":"level102/networking/scale/#anycast","text":"This is one of the quickest ways to roll out traffic distribution across multiple cabinets. In this, each server that is part of the cluster (where the service is set up) advertises a loopback address (/32 IPv4 or /128 IPv6 address) to the DC switch fabric (most commonly BGP is used for this purpose). The service has to be set up to listen on this loopback address. When clients try to connect to the service, the service name resolves to this virtual address and they forward their queries to it. The DC switch fabric distributes each flow across the different available next hops (eventually to all the servers in that service cluster). Note: The DC switch computes a hash based on the IP packet header; this could include any combination of source and destination addresses, source and destination ports, MAC address and IP protocol number. Based on this hash value, a particular next hop is picked. Since all the packets in a traffic flow carry the same values for these headers, all the packets in that flow will be mapped to the same path. 
Fig 1: Anycast setup To achieve a proportionate distribution of flows across these servers, it is important to maintain uniformity in each of the cabinets and pods. But remember, the distribution happens only based on flows, and if there are any elephant (large) flows, some servers might receive a higher volume of traffic. If there are any server or ToR failures, the advertisement of the loopback address to the switches will stop, and new packets will then be forwarded to the remaining available servers.","title":"Anycast"},{"location":"level102/networking/scale/#load-balancer","text":"Another common approach is to use a load balancer. A Virtual IP is set up in the load balancers, to which the client connects while trying to access the service. The load balancer, in turn, redirects these connections to one of the actual servers where the service is running. To verify that a server is in a serviceable state, the load balancer does periodic health checks, and if a check fails, the LB stops redirecting connections to that server. The load balancer can be deployed in single-arm mode, where the traffic to the VIP is redirected by the LB, and the return traffic from the server to the client is sent directly. The other option is the two-arm mode, where the return traffic is also passed through the LB. Fig 2: Single-arm mode Fig 3: Two-arm mode One of the cons of this approach is that, at a higher scale, the load balancer can become the bottleneck when supporting higher traffic volumes or more connections per second.","title":"Load balancer"},{"location":"level102/networking/scale/#dns-based-load-balancing","text":"This is similar to the above approach, the only difference being that the load balancing is done at the DNS instead of an appliance. The clients get different IPs to connect to when they query the DNS records of the service. The DNS server has to do health checks to know which servers are in a good state. 
This approach alleviates the bottleneck of the load balancer solution, but it requires a shorter TTL for the DNS records so that problematic servers can be taken out of rotation quickly, which means there will be far more DNS queries.","title":"DNS based load balancing"},{"location":"level102/networking/security/","text":"This section will cover threat vectors faced by services facing external/internal clients, and potential mitigation options to consider while deploying them. This will touch upon perimeter security, DDoS protection, Network demarcation and operational practices. Security Threat Security is one of the major considerations in any infrastructure. There are various security threats, which could amount to data theft, loss of service, fraudulent activity, etc. An attacker can use techniques like phishing, spamming, malware, DoS/DDoS, exploiting vulnerabilities, man-in-the-middle attack, and many more. In this section, we will cover some of these threats and possible mitigation. As there are numerous means to attack and secure the infrastructure, we will only focus on some of the most common ones. Phishing is mostly done via email (and other mass communication methods), where an attacker provides links to fake websites/URLs. Upon accessing these, the victim's sensitive information like login credentials or personal data is collected and can be misused. Spamming is similar to phishing, except that the attacker doesn't collect data from users but instead tries to spam a particular website, possibly overwhelming it (to cause slowness), and uses that opportunity to compromise the security of the attacked website. Malware is like a trojan horse, where an attacker manages to install a piece of code on the secured systems in the infrastructure. Using this, the attacker can collect sensitive data as well as infect the critical services of the target company. Exploiting vulnerabilities is another method by which an attacker can gain access to the systems. 
These could be bugs or misconfiguration in web servers, internet-facing routers/switches/firewalls, etc. DoS/DDoS is one of the common attacks seen on internet-based services/solutions, especially those businesses based on eyeball traffic. Here the attacker tries to overwhelm the resources of the victim by generating spurious traffic to the external-facing services. By this, primarily the services turn slow or non-responsive, during this time, the attacker could try to hack into the network, if some of the security mechanism fails to filter through the attack traffic due to overload. Securing the infrastructure The first and foremost aspect for any infrastructure administration is to identify the various security threats that could affect the business running over this infrastructure. Once different threats are known, the security defence mechanism has to be designed and implemented. Some of the common means to securing the infrastructure are Perimeter security This is the first line of defence in any infrastructure, where unwanted/unexpected traffic flows into the infrastructure are filtered/blocked. These could be filters in the edge routers, that allow expected services (like port 443 traffic for web service running on HTTPS), or this filter can be set up to block unwanted traffic, like blocking UDP ports, if the services are not dependent on UDP. Similar to the application traffic entering the network, there could be other traffic like BGP messages for Internet peers, VPN tunnels traffic, as well other services like email/DNS, etc. There are means to protect every one of these, like using authentication mechanisms (password or key-based) for peers of BGP, VPN, and whitelisting these specific peers to make inbound connections (in perimeter filters). Along with these, the amount of messages/traffic can be rate-limited to known scale or expected load, so the resources are not overwhelmed. 
DDoS mitigation Protecting against a DDoS attack is another important aspect. The attack traffic will look similar to the genuine users/client request, but with the intention to flood the externally exposed app, which could be a web server, DNS, etc. Therefore it is essential to differentiate between the attack traffic and genuine traffic, for this, there are different methods to do at the application level, one such example using Captcha on a web service, to catch traffic originating from bots. For these methods to be useful, the nodes should be capable of handling both the attack traffic and genuine traffic. It may be possible in cloud-based infrastructure to dynamically add more virtual machines/resources, to handle the sudden spike in volume of traffic, but on-prem, the option to add additional resources might be challenging. To handle a large volume of attack traffic, there are solutions available, which can inspect the packets/traffic flows and identify anomalies (i.e.) traffic patterns that don't resemble a genuine connection, like client initiating TCP connection, but fail to complete the handshake, or set of sources, which have abnormally huge traffic flow. Once this unwanted traffic is identified, these are dropped at the edge of the network itself, thereby protecting the resources of app nodes. This topic alone can be discussed more in detail, but that will be beyond the scope of this section. Network Demarcation Network demarcation is another common strategy deployed in different networks when applications are grouped based on their security needs and vulnerability to an attack. Some common demarcations are, the external/internet facing nodes are grouped into a separate zone, whereas those nodes having sensitive data are segregated into a separate zone. And any communication between these zones is restricted with the help of security tools to limit exposure to unwanted hosts/ports. 
These inter-zone communication filters are sometimes called ring-fencing. The number of zones to be created, varies for different deployments, for example, there could be a host which should be able to communicate to the external world as well as internal servers, like proxy, email, in this case, these can be grouped under one zone, say De-Militarized Zones (DMZ). The main advantage of creating zones is that, even if there is a compromised host, that doesn't act as a back door entry for the rest of the infrastructure. Node protection Be it server, router, switches, load balancers, firewall, etc, each of these devices come with certain capabilities to secure themselves, like support for filters (e.g. Access-list, Iptables) to control what traffic to process and what to drop, anti-virus software can be used in servers to check on the software installed in them. Operational practices There are numerous security threats for infrastructure, and there are different solutions to defend them. The key part to the defence, is not only identifying the right solution and the tools for it but also making sure there are robust operational procedures in place, to respond promptly, decisively and with clarity, for any security incident. Standard Operating Procedures (SOP) SOP need to be well defined and act as a reference for on-call to follow during a security incident. This SoP should cover things like, When a security incident happens, how it will be alerted, to whom it will be alerted. Identify the scale and severity of the security incident. Who are the points of escalation and the threshold/time to intimate them, there could be other concerned teams or to the management or even to the security operations in-charge. Which solutions to use (and the procedure to follow in them) to mitigate the security incident. Also the data about the security incident has to be collated for further analysis. 
Many organisations have a dedicated team focused on security, and they drive most of the activities, during an attack and even before, to come up with best practices, guidelines and compliance audits. It is the responsibility of respective technical teams, to ensure the infrastructure meets these recommendations and gaps are fixed. Periodic review Along with defining SoP's, the entire security of the infrastructure has to be reviewed periodically. This review should include, Identifying any new/improved security threat that could potentially target the infrastructure. The SoP's have to be reviewed periodically, depending upon new security threats or changes in the procedure (to implement the solutions) Ensuring software upgrades/patches are done in a timely manner. Audit the infrastructure for any non-compliance of the security standards. Review of recent security incidents and find means to improvise the defence mechanisms.","title":"Security"},{"location":"level102/networking/security/#security-threat","text":"Security is one of the major considerations in any infrastructure. There are various security threats, which could amount to data theft, loss of service, fraudulent activity, etc. An attacker can use techniques like phishing, spamming, malware, Dos/DDoS, exploiting vulnerabilities, man-in-the-middle attack, and many more. In this section, we will cover some of these threats and possible mitigation. As there are numerous means to attack and secure the infrastructure, we will only focus on some of the most common ones. Phishing is mostly done via email (and other mass communication methods), where an attacker provides links to fake websites/URLs. Upon accessing that, victim's sensitive information like login credentials or personal data is collected and can be misused. 
Spamming is similar to phishing, except that the attacker doesn't collect data from users but instead tries to spam a particular website, possibly overwhelming it (to cause slowness), and uses that opportunity to compromise the security of the attacked website. Malware is like a trojan horse, where an attacker manages to install a piece of code on the secured systems in the infrastructure. Using this, the attacker can collect sensitive data as well as infect the critical services of the target company. Exploiting vulnerabilities is another method by which an attacker can gain access to the systems. These could be bugs or misconfiguration in web servers, internet-facing routers/switches/firewalls, etc. DoS/DDoS is one of the common attacks seen on internet-based services/solutions, especially those businesses based on eyeball traffic. Here the attacker tries to overwhelm the resources of the victim by generating spurious traffic to the external-facing services. As a result, the services turn slow or non-responsive. During this time, the attacker could try to hack into the network if some of the security mechanisms fail to filter the attack traffic due to overload.","title":"Security Threat"},{"location":"level102/networking/security/#securing-the-infrastructure","text":"The first and foremost aspect of any infrastructure administration is to identify the various security threats that could affect the business running over this infrastructure. Once the different threats are known, the security defence mechanism has to be designed and implemented. Some of the common means of securing the infrastructure are","title":"Securing the infrastructure"},{"location":"level102/networking/security/#perimeter-security","text":"This is the first line of defence in any infrastructure, where unwanted/unexpected traffic flows into the infrastructure are filtered/blocked. 
These could be filters in the edge routers, that allow expected services (like port 443 traffic for web service running on HTTPS), or this filter can be set up to block unwanted traffic, like blocking UDP ports, if the services are not dependent on UDP. Similar to the application traffic entering the network, there could be other traffic like BGP messages for Internet peers, VPN tunnels traffic, as well other services like email/DNS, etc. There are means to protect every one of these, like using authentication mechanisms (password or key-based) for peers of BGP, VPN, and whitelisting these specific peers to make inbound connections (in perimeter filters). Along with these, the amount of messages/traffic can be rate-limited to known scale or expected load, so the resources are not overwhelmed.","title":"Perimeter security"},{"location":"level102/networking/security/#ddos-mitigation","text":"Protecting against a DDoS attack is another important aspect. The attack traffic will look similar to the genuine users/client request, but with the intention to flood the externally exposed app, which could be a web server, DNS, etc. Therefore it is essential to differentiate between the attack traffic and genuine traffic, for this, there are different methods to do at the application level, one such example using Captcha on a web service, to catch traffic originating from bots. For these methods to be useful, the nodes should be capable of handling both the attack traffic and genuine traffic. It may be possible in cloud-based infrastructure to dynamically add more virtual machines/resources, to handle the sudden spike in volume of traffic, but on-prem, the option to add additional resources might be challenging. To handle a large volume of attack traffic, there are solutions available, which can inspect the packets/traffic flows and identify anomalies (i.e.) 
traffic patterns that don't resemble a genuine connection, like client initiating TCP connection, but fail to complete the handshake, or set of sources, which have abnormally huge traffic flow. Once this unwanted traffic is identified, these are dropped at the edge of the network itself, thereby protecting the resources of app nodes. This topic alone can be discussed more in detail, but that will be beyond the scope of this section.","title":"DDoS mitigation"},{"location":"level102/networking/security/#network-demarcation","text":"Network demarcation is another common strategy deployed in different networks when applications are grouped based on their security needs and vulnerability to an attack. Some common demarcations are, the external/internet facing nodes are grouped into a separate zone, whereas those nodes having sensitive data are segregated into a separate zone. And any communication between these zones is restricted with the help of security tools to limit exposure to unwanted hosts/ports. These inter-zone communication filters are sometimes called ring-fencing. The number of zones to be created, varies for different deployments, for example, there could be a host which should be able to communicate to the external world as well as internal servers, like proxy, email, in this case, these can be grouped under one zone, say De-Militarized Zones (DMZ). The main advantage of creating zones is that, even if there is a compromised host, that doesn't act as a back door entry for the rest of the infrastructure.","title":"Network Demarcation"},{"location":"level102/networking/security/#node-protection","text":"Be it server, router, switches, load balancers, firewall, etc, each of these devices come with certain capabilities to secure themselves, like support for filters (e.g. 
Access-list, Iptables) to control what traffic to process and what to drop, anti-virus software can be used in servers to check on the software installed in them.","title":"Node protection"},{"location":"level102/networking/security/#operational-practices","text":"There are numerous security threats for infrastructure, and there are different solutions to defend them. The key part to the defence, is not only identifying the right solution and the tools for it but also making sure there are robust operational procedures in place, to respond promptly, decisively and with clarity, for any security incident.","title":"Operational practices"},{"location":"level102/networking/security/#standard-operating-procedures-sop","text":"SOP need to be well defined and act as a reference for on-call to follow during a security incident. This SoP should cover things like, When a security incident happens, how it will be alerted, to whom it will be alerted. Identify the scale and severity of the security incident. Who are the points of escalation and the threshold/time to intimate them, there could be other concerned teams or to the management or even to the security operations in-charge. Which solutions to use (and the procedure to follow in them) to mitigate the security incident. Also the data about the security incident has to be collated for further analysis. Many organisations have a dedicated team focused on security, and they drive most of the activities, during an attack and even before, to come up with best practices, guidelines and compliance audits. It is the responsibility of respective technical teams, to ensure the infrastructure meets these recommendations and gaps are fixed.","title":"Standard Operating Procedures (SOP)"},{"location":"level102/networking/security/#periodic-review","text":"Along with defining SoP's, the entire security of the infrastructure has to be reviewed periodically. 
This review should include: Identifying any new/improved security threat that could potentially target the infrastructure. Reviewing the SOPs in light of new security threats or changes in the procedure (to implement the solutions). Ensuring software upgrades/patches are done in a timely manner. Auditing the infrastructure for any non-compliance with the security standards. Reviewing recent security incidents and finding ways to improve the defence mechanisms.","title":"Periodic review"},{"location":"level102/system_calls_and_signals/conclusion/","text":"Conclusion One of the main goals of an SRE is to improve the reliability of high scale systems. In order to achieve this, a basic understanding of the internal workings of a system is necessary. Getting to know how signals work is important since they play a big role in the lifecycle of processes. We see the use of signals in a range of operations on processes: from creating a process to killing a process. Knowledge of signals is especially important when handling them in programs. If you anticipate an event that causes signals, you can define a handler function and tell the operating system to run it when that particular type of signal arrives. Understanding system calls is especially useful to SREs while debugging any Linux process. System calls provide precise knowledge of the internal functionalities of an operating system. They also give programmers an in-depth understanding of how C library functions invoke system calls at a lower level. With the use of the strace command, one may easily debug slow or hung processes. 
Further Reading https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch01s06.html https://jvns.ca/blog/2021/04/03/what-problems-do-people-solve-with-strace/ https://medium.com/@akhandmishra/important-system-calls-every-programmer-should-know-8884381ceadb https://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html","title":"Conclusion"},{"location":"level102/system_calls_and_signals/conclusion/#conclusion","text":"One of the main goals of an SRE is to improve the reliability of high scale systems. In order to achieve this, a basic understanding of the internal workings of a system is necessary. Getting to know how signals work is important since they play a big role in the lifecycle of processes. We see the use of signals in a range of operations on processes: from creating a process to killing a process. Knowledge of signals is especially important when handling them in programs. If you anticipate an event that causes signals, you can define a handler function and tell the operating system to run it when that particular type of signal arrives. Understanding system calls is especially useful to SREs while debugging any Linux process. System calls provide precise knowledge of the internal functionalities of an operating system. They also give programmers an in-depth understanding of how C library functions invoke system calls at a lower level. 
With the use of strace command, one may easily debug slow or hung processes.","title":"Conclusion"},{"location":"level102/system_calls_and_signals/conclusion/#further-reading","text":"https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch01s06.html https://jvns.ca/blog/2021/04/03/what-problems-do-people-solve-with-strace/ https://medium.com/@akhandmishra/important-system-calls-every-programmer-should-know-8884381ceadb https://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html","title":"Further Reading"},{"location":"level102/system_calls_and_signals/intro/","text":"System Calls and Signals Prerequisites Linux Basics Python Basics What to expect from this course The course covers a fundamental understanding of signals and system calls. It sheds light on how the knowledge of signals and system calls can be helpful for an SRE. What is not covered under this course The course does not discuss any other interrupts or interrupt handling apart from signals. The course will not deep dive into signal handler and GNU C library. Course Contents Signals Introduction to interrupts and signals Types of signals Sending signals to process Handling signals Role of signals in system calls with the example of wait() System calls Introduction Types of system calls User mode,kernel mode and their transitions Working of write() system call Debugging in Linux with strace","title":"Introduction"},{"location":"level102/system_calls_and_signals/intro/#system-calls-and-signals","text":"","title":"System Calls and Signals"},{"location":"level102/system_calls_and_signals/intro/#prerequisites","text":"Linux Basics Python Basics","title":"Prerequisites"},{"location":"level102/system_calls_and_signals/intro/#what-to-expect-from-this-course","text":"The course covers a fundamental understanding of signals and system calls. 
It sheds light on how the knowledge of signals and system calls can be helpful for an SRE.","title":"What to expect from this course"},{"location":"level102/system_calls_and_signals/intro/#what-is-not-covered-under-this-course","text":"The course does not discuss any other interrupts or interrupt handling apart from signals. The course will not deep dive into signal handler and GNU C library.","title":"What is not covered under this course"},{"location":"level102/system_calls_and_signals/intro/#course-contents","text":"Signals Introduction to interrupts and signals Types of signals Sending signals to process Handling signals Role of signals in system calls with the example of wait() System calls Introduction Types of system calls User mode,kernel mode and their transitions Working of write() system call Debugging in Linux with strace","title":"Course Contents"},{"location":"level102/system_calls_and_signals/signals/","text":"Introduction to interrupts and signals An interrupt is an event that alters the normal execution flow of a program and can be generated by hardware devices or even by the CPU itself. When an interrupt occurs, the current flow of execution is suspended and the interrupt handler runs. After the interrupt handler runs, the previous execution flow is resumed. There are three types of events that can cause the CPU to interrupt: hardware interrupts, software interrupts, and exceptions. Signals are software interrupts that notify a process that an event has occurred. These events might be requests from users or indications that a system problem (such as a memory access error) has occurred. Every signal has a signal number and a default action defined. A process can react to a signal in any of the following ways: take the default (OS-provided) action; catch the signal and handle it in a program-defined way; or ignore the signal entirely. Signal Groups Signals fall into two broad categories. 
The first set constitutes the traditional or standard signals, which are used by the kernel to notify processes of events. On Linux, the standard signals are numbered from 1 to 31. The other set of signals consists of the realtime signals. Linux supports both POSIX reliable signals (hereinafter \"standard signals\") and POSIX real-time signals. Realtime Signals Realtime signals were defined in POSIX.1b to remedy a number of limitations of standard signals. They have the following advantages over standard signals: Realtime signals provide an increased range of signals that can be used for application-defined purposes. Only two standard signals are freely available for application-defined purposes: SIGUSR1 and SIGUSR2. Realtime signals are queued. If multiple instances of a realtime signal are sent to a process, then the signal is delivered multiple times. By contrast, if we send further instances of a standard signal that is already pending for a process, that signal is delivered only once. When sending a realtime signal, it is possible to specify data (an integer or pointer value) that accompanies the signal. The signal handler in the receiving process can retrieve this data. The order of delivery of different realtime signals is guaranteed. If multiple different realtime signals are pending, then the lowest-numbered signal is delivered first. In other words, signals are prioritized, with lower-numbered signals having higher priority. When multiple signals of the same type are queued, they are delivered\u2014along with their accompanying data\u2014in the order in which they were sent. Standard Signals The standard signals are the classical signals that have been there since the early days of Unix. From here on, we will discuss standard signals. Signal Overview A signal is said to be generated by some event. Once generated, a signal is later delivered to a process, which then takes some action in response to the signal. 
Between the time it is generated and the time it is delivered, a signal is said to be pending. Normally, a pending signal is delivered to a process as soon as it is next scheduled to run, or immediately if the process is already running (e.g., if the process sent a signal to itself). Sometimes, however, we need to ensure that a segment of code is not interrupted by the delivery of a signal. To do this, we can add a signal to the process\u2019s signal mask - a set of signals whose delivery is currently blocked. If a signal is generated while it is blocked, it remains pending until it is later unblocked (removed from the signal mask). Various system calls allow a process to add and remove signals from its signal mask. Upon delivery of a signal, a process carries out one of the following default actions, depending on the signal: The signal is ignored; that is, it is discarded by the kernel and has no effect on the process. (The process never even knows that it occurred.) The process is terminated (killed). This is sometimes referred to as abnormal process termination, as opposed to the normal process termination that occurs when a process terminates using exit(). A core dump file is generated, and the process is terminated. A core dump file contains an image of the virtual memory of the process, which can be loaded into a debugger in order to inspect the state of the process at the time that it terminated. The process is stopped\u2014execution of the process is suspended. Execution of the process is resumed after previously being stopped. Instead of accepting the default for a particular signal, a program can change the action that occurs when the signal is delivered. This is known as setting the disposition of the signal. To read more about disposition, refer here . A program can set one of the following dispositions for a signal: The default action should occur. 
This is useful to undo an earlier change of the disposition of the signal to something other than its default. The signal is ignored. This is useful for a signal whose default action would be to terminate the process. A signal handler is executed. A signal handler is a function, written by the programmer, that performs appropriate tasks in response to the delivery of a signal. For example, the shell has a handler for the SIGINT signal (generated by the interrupt character, Control-C) that causes it to stop what it is currently doing and return control to the main input loop, so that the user is once more presented with the shell prompt. Notifying the kernel that a handler function should be invoked is usually referred to as installing or establishing a signal handler. When a signal handler is invoked in response to the delivery of a signal, we say that the signal has been handled or, synonymously, caught. Note that it isn\u2019t possible to set the disposition of a signal to terminate or dump core (unless one of these is the default disposition of the signal). The nearest we can get to this is to install a handler for the signal that then calls either exit() or abort(). The abort() function generates a SIGABRT signal for the process, which causes it to dump core and terminate. Types of signals To list available signals in a Linux system, you can use the command kill -l . The table below lists the signals 1 to 20. To get a full list of signals, you can refer here . 
Signal name Signal number Default Action Meaning SIGHUP 1 Terminate Hangup detected on controlling terminal or death of controlling process SIGINT 2 Terminate Interrupt from keyboard SIGQUIT 3 Core dump Quit from keyboard SIGILL 4 Core dump Illegal instruction SIGTRAP 5 Core dump Trace/breakpoint trap for debugging SIGABRT , SIGIOT 6 Core dump Abnormal termination SIGBUS 7 Core dump Bus error SIGFPE 8 Core dump Floating point exception SIGKILL 9 Terminate Kill signal (cannot be caught or ignored) SIGUSR1 10 Terminate User-defined signal 1 SIGSEGV 11 Core dump Invalid memory reference SIGUSR2 12 Terminate User-defined signal 2 SIGPIPE 13 Terminate Broken pipe; write to pipe with no readers SIGALRM 14 Terminate Timer signal from alarm SIGTERM 15 Terminate Process termination SIGSTKFLT 16 Terminate Stack fault on math co-processor SIGCHLD 17 Ignore Child stopped or terminated SIGCONT 18 Continue Continue if stopped SIGSTOP 19 Stop Stop process (cannot be caught or ignored) SIGTSTP 20 Stop Stop typed at terminal Sending signals to process There are three different ways to send signals to processes: Sending signal to process using kill The kill command can be used to send signals to a process. By default a SIGTERM signal is sent, but a different signal can be sent to the process by specifying the signal number (or signal name). For example, the command kill -9 367 sends SIGKILL to the process with PID 367. Sending signal to process via keyboard Signals can be sent to a running process by pressing some specific keys. For example, pressing Ctrl+C sends SIGINT to the foreground process, which by default terminates it. Sending signal to process via another process A process can send a signal to another process via the kill() system call. In this use, signals can be employed as a synchronization technique, or even as a primitive form of interprocess communication (IPC). It is also possible for a process to send a signal to itself. 
The int kill(pid_t pid, int sig) system call takes two arguments: the PID of the process you wish to send the signal to and the number of the signal to send. Handling signals Referring to the table of signals in the previous section, you can see that there are default actions attached to all signals when the program is started. When we invoke signal to attach our own handler, we are overriding the default behaviour of the program in response to that signal. Specifically, if we attach a handler to SIGINT, the program will no longer terminate when you press Ctrl+C (or send the program a SIGINT by any other means); rather, the function specified as the handler will be invoked instead, and it defines the behaviour of the program in response to that signal. Let\u2019s take the example of how a program handles the SIGINT signal. We will use Python\u2019s signal library to achieve this. When we press Ctrl+C, a SIGINT signal is sent. From the signals table, we see that the default action for SIGINT is to terminate the process. To illustrate how the process reacts to the default action and a signal handler, let us consider the below example. Default Action of SIGINT: Let us first run the below lines in a python environment: while 1: continue Now let us press \"Ctrl+C\". On pressing \"Ctrl+C\" , a SIGINT signal is sent to the process and the default action for SIGINT as per the table we saw in the previous section is to terminate the process. We see that the while loop is terminated and we get the below on our console: ^CTraceback (most recent call last): File \"\", line 2, in KeyboardInterrupt The loop was interrupted because the process received a SIGINT when we pressed Ctrl+C; Python surfaces the signal as a KeyboardInterrupt exception. Signal Handler for SIGINT: Let us run the below lines of code in the Python environment. 
import signal import sys #Start of signal_handler function def signal_handler(signal, frame): print ('You pressed Ctrl+C!') # End of signal_handler function signal.signal(signal.SIGINT, signal_handler) This is an example of a program that defines its own signal handler for SIGINT , overriding the default action. Now let us run the while and continue statement as we did previously. while 1: continue Do we see any changes when Ctrl+C is pressed? Does the program terminate? We see the below output: ^CYou pressed Ctrl+C! Every time we press Ctrl+C, we just see the above message and the program does not terminate. To get out of the program, you can press Ctrl+Z, which sends the SIGTSTP signal, whose default action is to stop (suspend) the process. In the case of the signal handler, we define a function signal_handler() which prints \u201cYou pressed Ctrl+C!\u201d and does not terminate the program. The handler is called with two arguments, the signal number and the current stack frame (None or a frame object ). signal.signal() allows defining custom handlers to be executed when a signal is received. Its two arguments are the signal number (name) you want to trap and the handler function. Role of signals in system calls with the example of wait() The wait() system call waits for one of the children of the calling process to terminate and returns the termination status of that child in the buffer pointed to by statusPtr . If the parent process calls the wait() system call, then the execution of the parent is suspended until the child is terminated. At the termination of the child, a SIGCHLD signal is generated which is delivered to the parent by the kernel. The SIGCHLD signal indicates to the parent that there is some information on the child that needs to be collected. The parent, on receipt of SIGCHLD , reaps the status of the child from the process table. 
Even though the child is terminated, there is an entry in the process table corresponding to the child where the process entry and PID are stored. When the parent collects the status, this entry is deleted. Thus, all the traces of the child process are removed from the system. Zombie and Orphane States If the parent decides not to wait for the child\u2019s termination and it executes its subsequent task, or fails to read the exit status of the child, there remains an entry in the process table even after the termination of the child. This state of the child process is known as the Zombie state. In order to avoid long-lasting zombies, we need to have code that calls wait() after the child process is created. It is generally good to create a signal handler for the SIGCHLD signal, calling one of the wait-family functions in a loop, until no uncollected child data remains. A child process becomes orphaned if its parent process terminates before the child. The orphaned child is adopted by init/systemd, the ancestor of all processes, whose process ID is 1. Further calls to fetch the parent PID of this process return 1.","title":"Signals"},{"location":"level102/system_calls_and_signals/signals/#introduction-to-interrupts-and-signals","text":"An interrupt is an event that alters the normal execution flow of a program and can be generated by hardware devices or even by the CPU itself. When an interrupt occurs, the current flow of execution is suspended and the interrupt handler runs. After the interrupt handler runs, the previous execution flow is resumed. There are three types of events that can cause the CPU to interrupt: hardware interrupts, software interrupts, and exceptions. Signals are software interrupts that notify a process that an event has occurred. These events might be requests from users or indications that a system problem (such as a memory access error) has occurred. Every signal has a signal number and a default action defined. 
A process can react to them in any of the following ways: a default (OS-provided) way catch the signal and handle them in a program-defined way ignore the signal entirely","title":"Introduction to interrupts and signals"},{"location":"level102/system_calls_and_signals/signals/#signal-groups","text":"Signals fall into two broad categories. The first set constitutes the traditional or standard signals, which are used by the kernel to notify processes of events. On Linux, the standard signals are numbered from 1 to 31. The other set of signals consists of the realtime signals. Linux supports both POSIX reliable signals (hereinafter \"standard signals\") and POSIX real-time signals.","title":"Signal Groups"},{"location":"level102/system_calls_and_signals/signals/#realtime-signals","text":"Realtime signals were defined in POSIX.1b to remedy a number of limitations of standard signals. They have the following advantages over standard signals: Realtime signals provide an increased range of signals that can be used for application-defined purposes. Only two standard signals are freely available for application-defined purposes: SIGUSR1 and SIGUSR2. Realtime signals are queued. If multiple instances of a realtime signal are sent to a process, then the signal is delivered multiple times. By contrast, if we send further instances of a standard signal that is already pending for a process, that signal is delivered only once. When sending a realtime signal, it is possible to specify data (an integer or pointer value) that accompanies the signal. The signal handler in the receiving process can retrieve this data. The order of delivery of different realtime signals is guaranteed. If multiple different realtime signals are pending, then the lowest-numbered signal is delivered first. In other words, signals are prioritized, with lower-numbered signals having higher priority. 
When multiple signals of the same type are queued, they are delivered\u2014along with their accompanying data\u2014in the order in which they were sent.","title":"Realtime Signals"},{"location":"level102/system_calls_and_signals/signals/#standard-signals","text":"The standard signals are the classical signals that have been there since the early days of Unix. From here on, we discuss only standard signals.","title":"Standard Signals"},{"location":"level102/system_calls_and_signals/signals/#signal-overview","text":"A signal is said to be generated by some event. Once generated, a signal is later delivered to a process, which then takes some action in response to the signal. Between the time it is generated and the time it is delivered, a signal is said to be pending. Normally, a pending signal is delivered to a process as soon as it is next scheduled to run, or immediately if the process is already running (e.g., if the process sent a signal to itself). Sometimes, however, we need to ensure that a segment of code is not interrupted by the delivery of a signal. To do this, we can add a signal to the process\u2019s signal mask - a set of signals whose delivery is currently blocked. If a signal is generated while it is blocked, it remains pending until it is later unblocked (removed from the signal mask). Various system calls allow a process to add and remove signals from its signal mask. Upon delivery of a signal, a process carries out one of the following default actions, depending on the signal: The signal is ignored; that is, it is discarded by the kernel and has no effect on the process. (The process never even knows that it occurred.) The process is terminated (killed). This is sometimes referred to as abnormal process termination, as opposed to the normal process termination that occurs when a process terminates using exit(). A core dump file is generated, and the process is terminated. 
A core dump file contains an image of the virtual memory of the process, which can be loaded into a debugger in order to inspect the state of the process at the time that it terminated. The process is stopped\u2014execution of the process is suspended. Execution of the process is resumed after previously being stopped. Instead of accepting the default for a particular signal, a program can change the action that occurs when the signal is delivered. This is known as setting the disposition of the signal. To read more about disposition, refer here . A program can set one of the following dispositions for a signal: The default action should occur. This is useful to undo an earlier change of the disposition of the signal to something other than its default. The signal is ignored. This is useful for a signal whose default action would be to terminate the process. A signal handler is executed. A signal handler is a function, written by the programmer, that performs appropriate tasks in response to the delivery of a signal. For example, the shell has a handler for the SIGINT signal (generated by the interrupt character, Control-C) that causes it to stop what it is currently doing and return control to the main input loop, so that the user is once more presented with the shell prompt. Notifying the kernel that a handler function should be invoked is usually referred to as installing or establishing a signal handler. When a signal handler is invoked in response to the delivery of a signal, we say that the signal has been handled or, synonymously, caught. Note that it isn\u2019t possible to set the disposition of a signal to terminate or dump core (unless one of these is the default disposition of the signal). The nearest we can get to this is to install a handler for the signal that then calls either exit() or abort(). 
The abort() function generates a SIGABRT signal for the process, which causes it to dump core and terminate.","title":"Signal Overview"},{"location":"level102/system_calls_and_signals/signals/#types-of-signals","text":"To list available signals in a Linux system, you can use the command kill -l . The table below lists the signals 1 to 20. To get a full list of signals, you can refer here . Signal name Signal number Default Action Meaning SIGHUP 1 Terminate Hangup detected on controlling terminal or death of controlling process SIGINT 2 Terminate Interrupt from keyboard SIGQUIT 3 Core dump Quit from keyboard SIGILL 4 Core dump Illegal instruction SIGTRAP 5 Core dump Trace/breakpoint trap for debugging SIGABRT , SIGIOT 6 Core dump Abnormal termination SIGBUS 7 Core dump Bus error SIGFPE 8 Core dump Floating point exception SIGKILL 9 Terminate Kill signal (cannot be caught or ignored) SIGUSR1 10 Terminate User-defined signal 1 SIGSEGV 11 Core dump Invalid memory reference SIGUSR2 12 Terminate User-defined signal 2 SIGPIPE 13 Terminate Broken pipe; write to pipe with no readers SIGALRM 14 Terminate Timer signal from alarm SIGTERM 15 Terminate Process termination SIGSTKFLT 16 Terminate Stack fault on math co-processor SIGCHLD 17 Ignore Child stopped or terminated SIGCONT 18 Continue Continue if stopped SIGSTOP 19 Stop Stop process (cannot be caught or ignored) SIGTSTP 20 Stop Stop typed at terminal","title":"Types of signals"},{"location":"level102/system_calls_and_signals/signals/#sending-signals-to-process","text":"There are three different ways to send signals to processes: Sending signal to process using kill The kill command can be used to send signals to a process. By default a SIGTERM signal is sent, but a different signal can be sent to the process by specifying the signal number (or signal name). 
For example, the command kill -9 367 sends SIGKILL to the process with PID 367. Sending signal to process via keyboard Signals can be sent to a running process by pressing some specific keys. For example, pressing Ctrl+C sends SIGINT to the foreground process, which by default terminates it. Sending signal to process via another process A process can send a signal to another process via the kill() system call. In this use, signals can be employed as a synchronization technique, or even as a primitive form of interprocess communication (IPC). It is also possible for a process to send a signal to itself. The int kill(pid_t pid, int sig) system call takes two arguments: the PID of the process you wish to send the signal to and the number of the signal to send.","title":"Sending signals to process"},{"location":"level102/system_calls_and_signals/signals/#handling-signals","text":"Referring to the table of signals in the previous section, you can see that there are default actions attached to all signals when the program is started. When we invoke signal to attach our own handler, we are overriding the default behaviour of the program in response to that signal. Specifically, if we attach a handler to SIGINT, the program will no longer terminate when you press Ctrl+C (or send the program a SIGINT by any other means); rather, the function specified as the handler will be invoked instead, and it defines the behaviour of the program in response to that signal. Let\u2019s take the example of how a program handles the SIGINT signal. We will use Python\u2019s signal library to achieve this. When we press Ctrl+C, a SIGINT signal is sent. From the signals table, we see that the default action for SIGINT is to terminate the process. To illustrate how the process reacts to the default action and a signal handler, let us consider the below example. Default Action of SIGINT: Let us first run the below lines in a python environment: while 1: continue Now let us press \"Ctrl+C\". 
On pressing \"Ctrl+C\" , a SIGINT signal is sent to the process and the default action for SIGINT as per the table we saw in the previous section is to terminate the process. We see that the while loop is terminated and we get the below on our console: ^CTraceback (most recent call last): File \"\", line 2, in KeyboardInterrupt The loop was interrupted because the process received a SIGINT when we pressed Ctrl+C; Python surfaces the signal as a KeyboardInterrupt exception. Signal Handler for SIGINT: Let us run the below lines of code in the Python environment. import signal import sys #Start of signal_handler function def signal_handler(signal, frame): print ('You pressed Ctrl+C!') # End of signal_handler function signal.signal(signal.SIGINT, signal_handler) This is an example of a program that defines its own signal handler for SIGINT , overriding the default action. Now let us run the while and continue statement as we did previously. while 1: continue Do we see any changes when Ctrl+C is pressed? Does the program terminate? We see the below output: ^CYou pressed Ctrl+C! Every time we press Ctrl+C, we just see the above message and the program does not terminate. To get out of the program, you can press Ctrl+Z, which sends the SIGTSTP signal, whose default action is to stop (suspend) the process. In the case of the signal handler, we define a function signal_handler() which prints \u201cYou pressed Ctrl+C!\u201d and does not terminate the program. The handler is called with two arguments, the signal number and the current stack frame (None or a frame object ). signal.signal() allows defining custom handlers to be executed when a signal is received. 
Its two arguments are the signal number (name) you want to trap and the handler function.","title":"Handling signals"},{"location":"level102/system_calls_and_signals/signals/#role-of-signals-in-system-calls-with-the-example-of-wait","text":"The wait() system call waits for one of the children of the calling process to terminate and returns the termination status of that child in the buffer pointed to by statusPtr . If the parent process calls the wait() system call, then the execution of the parent is suspended until the child is terminated. At the termination of the child, a SIGCHLD signal is generated which is delivered to the parent by the kernel. The SIGCHLD signal indicates to the parent that there is some information on the child that needs to be collected. The parent, on receipt of SIGCHLD , reaps the status of the child from the process table. Even though the child is terminated, there is an entry in the process table corresponding to the child where the process entry and PID are stored. When the parent collects the status, this entry is deleted. Thus, all the traces of the child process are removed from the system.","title":"Role of signals in system calls with the example of wait()"},{"location":"level102/system_calls_and_signals/signals/#zombie-and-orphane-states","text":"If the parent decides not to wait for the child\u2019s termination and it executes its subsequent task, or fails to read the exit status of the child, there remains an entry in the process table even after the termination of the child. This state of the child process is known as the Zombie state. In order to avoid long-lasting zombies, we need to have code that calls wait() after the child process is created. It is generally good to create a signal handler for the SIGCHLD signal, calling one of the wait-family functions in a loop, until no uncollected child data remains. 
A child process becomes orphaned if its parent process terminates before the child. The orphaned child is adopted by init/systemd, the ancestor of all processes, whose process ID is 1. Further calls to fetch the parent PID of this process return 1.","title":"Zombie and Orphane States"},{"location":"level102/system_calls_and_signals/system_calls/","text":"Introduction A system call is a controlled entry point into the kernel, allowing a process to request the kernel to perform some action on the process\u2019s behalf. The kernel makes a range of services accessible to programs via the system call application programming interface (API). Application developers often do not have direct access to the system calls, but can access them through this API. These services include, for example, creating a new process, performing I/O, and creating a pipe for interprocess communication. The set of system calls is fixed. Each system call is identified by a unique number. The list of different system calls can be found here . A system call changes the processor state from user mode to kernel mode, so that the CPU can access protected kernel memory. Each system call may have a set of arguments that specify information to be transferred from user space (i.e., the process\u2019s virtual address space) to kernel space and vice versa. From a programming point of view, invoking a system call looks much like calling a C function. Types of system calls System calls fall into five main types: Process Control: These system calls are used to handle tasks related to a process such as process creation, termination, etc. File Management: These system calls are used for operations on files such as reading/writing a file. Device Management: These system calls are used to deal with devices such as reading/writing into device buffers. Information Maintenance: These system calls handle information and its transfer between the operating system and the user program. 
Communication: These system calls are useful for inter-process communication. They are also used for creating and deleting a communication connection. Types Of System Calls Examples in Linux Process Control fork(),exit(),wait() File Management open(), read(),write() Device Management ioctl(),read(),write() Information Maintenance getpid(),alarm(),sleep() Communication pipe(),shmget(),mmap() User mode, kernel mode and their transitions Modern processor architectures typically allow the CPU to operate in at least two different modes: user mode and kernel mode . Correspondingly, areas of virtual memory can be marked as being part of user space or kernel space. When running in user mode, the CPU can access only memory that is marked as being in user space; attempts to access memory in kernel space result in a hardware exception. At any given time, a process may be executing in either user mode or kernel mode. The type of instructions that can be executed depends on the mode and this is enforced at the hardware level. CPU modes (also called processor modes, CPU states, CPU privilege levels) are operating modes for the central processing unit of some computer architectures that place restrictions on the type and scope of operations that can be performed by certain processes being run by the CPU. The kernel itself is not a process but a process manager. The kernel model assumes that processes that require a kernel service use specific programming constructs called system calls. When a program is executed in user mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in kernel mode, however, these restrictions no longer apply. A program usually executes in user mode and switches to kernel mode only when requesting a service provided by the kernel. 
If an application needs access to hardware resources on the system (like peripherals, memory, or disks), it must issue a system call, which causes a switch from user mode to kernel mode. This procedure is followed when reading/writing from/to files, etc. It is only the system call itself which runs in kernel mode, not the application code. When the system call is complete, the process returns to user mode with the return value through the inverse switch. Apart from system calls, kernel routines can be activated in the below ways as well: The CPU executing the process signals an exception , which is an unusual condition such as an invalid instruction. The kernel handles the exception on behalf of the process that caused it. A peripheral device issues an interrupt signal to the CPU to notify it of an event such as a request for attention, a status change, or the completion of an I/O operation. Each interrupt signal is dealt with by a kernel program called an interrupt handler . Since peripheral devices operate asynchronously with respect to the CPU, interrupts occur at unpredictable times. A kernel thread is executed. Since it runs in Kernel Mode, the corresponding program must be considered part of the kernel. In the above diagram, Process 1 in User Mode issues a system call, after which the process switches to Kernel Mode and the system call is serviced. Process 1 then resumes execution in User Mode until a timer interrupt occurs and the scheduler is activated in Kernel Mode. A process switch takes place and Process 2 starts its execution in User Mode until a hardware device raises an interrupt. As a consequence of the interrupt, Process 2 switches to Kernel Mode and services the interrupt. Working of write() system call The write() system call writes data to an open file. 
#include <unistd.h> ssize_t write(int fd, const void *buffer, size_t count); buffer is the address of the data to be written; count is the number of bytes to write from buffer; and fd is a file descriptor referring to the file to which data is to be written. The write() call writes up to count bytes from buffer to the open file referred to by fd . On success, write() returns the number of bytes actually written, which may be less than count; on error, it returns -1. When performing I/O on a disk file, a successful return from write() doesn\u2019t guarantee that the data has been transferred to disk, because the kernel performs buffering of disk I/O in order to reduce disk activity and expedite write() calls. It simply copies data between a user-space buffer and a buffer in the kernel buffer cache. At some later point, the kernel writes (flushes) its buffer to the disk. If, in the interim, another process attempts to read these bytes of the file, then the kernel automatically supplies the data from the buffer cache, rather than from (the outdated contents of) the file. The aim of this design is to allow write() calls to be fast, since they don\u2019t need to wait on a (slow) disk operation. This design is also efficient, since it reduces the number of disk transfers that the kernel must perform. Debugging in Linux with strace strace is a tool used to trace the transition between user processes and the Linux kernel. In order to use the tool, we need to ensure that it is installed in the system by running the command: $ rpm -qa | grep -i strace strace-4.12-9.el7.x86_64 If the above command does not give any output, you can install the tool via: $ yum install strace The functions which are a part of the standard C library are known as library functions. The purposes of these functions are very diverse, including such tasks as opening a file, converting a time to a human-readable format, and comparing two character strings. Some library functions are layered on top of system calls. 
Often, library functions are designed to provide a more caller-friendly interface than the underlying system call. For example, the printf() function provides output formatting and data buffering, whereas the write() system call just outputs a block of bytes.The most commonly used implementation of the standard C library on Linux is the GNU C library glibc . The C programming language gives printf() that lets the user write data in many different formats. So printf() as a function converts your data into a formatted sequence of bytes and that calls write() to write those bytes onto the output. Let us examine what happens when a printf() statement is executed with the use of strace command : strace printf %s \u201cHello world\u201d ~]$ strace printf %s \"Hello world\" execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 brk(NULL) = 0x90d000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672f000 access(\"/etc/ld.so.preload\", R_OK) = -1 ENOENT (No such file or directory) open(\"/etc/ld.so.cache\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=98854, ...}) = 0 mmap(NULL, 98854, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fc6716000 close(3) = 0 open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2156160, ...}) = 0 mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8fc6141000 mprotect(0x7f8fc6304000, 2097152, PROT_NONE) = 0 mmap(0x7f8fc6504000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f8fc6504000 mmap(0x7f8fc650a000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8fc650a000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6715000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6713000 arch_prctl(ARCH_SET_FS, 0x7f8fc6713740) = 0 mprotect(0x7f8fc6504000, 16384, PROT_READ) = 0 mprotect(0x60a000, 4096, PROT_READ) = 0 mprotect(0x7f8fc6730000, 4096, PROT_READ) = 0 munmap(0x7f8fc6716000, 98854) = 0 brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 open(\"/usr/lib/locale/locale-archive\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106075056, ...}) = 0 mmap(NULL, 106075056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fbfc17000 close(3) = 0 open(\"/usr/share/locale/locale.alias\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 read(3, \"# Locale name alias data base.\\n#\"..., 4096) = 2502 read(3, \"\", 4096) = 0 close(3) = 0 munmap(0x7f8fc672e000, 4096) = 0 open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 write(1, \"Hello world\", 11Hello world) = 11 close(1) = 0 munmap(0x7f8fc672e000, 4096) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 The first system call made is execve() , which does three things: The operating system (OS) stops the duplicated process (of the parent). The OS loads the new program (in this case, the printf executable) and starts it. execve() replaces the defining parts of the current process's memory (such as its stack, heap and data segments) with the contents loaded from the printf executable file. The first word of the line, execve, is the name of the system call being executed. The first parameter must be the path of a binary executable or a script. The second is an array of argument strings passed to the new program. 
By convention, the first of these strings should contain the filename associated with the file being executed. The third parameter is an array of environment strings passed to the new program (strace abbreviates it here as 47 vars). The number after the = sign (which is 0 in this case) is the value returned by the execve system call, which indicates that the call was successful. open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) In this line, the program tried to open() the file /usr/lib/locale/UTF-8/LC_CTYPE . However, the system call failed (with -1 status) with the descriptive error message No such file or directory , as the file wasn\u2019t found (ENOENT). brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 The system call brk() is used to increase or decrease the process\u2019s data segment. It returns the new address where the data segment of the process is to end. open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 In the above lines of the console output, we see that a successful open() call is made, followed by a read() system call. In open() , the first parameter is the path to the file which you want to use and the second parameter specifies the flags, including the access mode. In this example, the flags are O_RDONLY , which opens the file read-only, and O_CLOEXEC , which enables the close-on-exec flag for the opened file. This is useful to avoid race conditions in multithreaded programs where one thread opens a file descriptor at the same time as another thread calls fork() and execve() . 3 is the file descriptor returned for the opened file. Since fds 0, 1 and 2 are already taken by stdin, stdout and stderr, the first unused file descriptor in the file descriptor table is 3. If open() In read() , the first parameter is the file descriptor, which is 3 (the file was opened using this file descriptor by open() ). The second parameter is the buffer into which the data is read, and the third parameter is the length of the buffer. 
The return value is 832, which is the number of bytes read. close(3) = 0 The close() system call asks the kernel to close a file descriptor. For most file systems, a program terminates access to a file in a filesystem using the close system call. 0 after the = sign indicates that the system call was successful. write(1, \"Hello world\", 11Hello world) = 11 In the previous section, we described the write() system call and the arguments that it takes. The Hello world that appears inside the traced call is the output of the program itself, interleaved with the strace output on the terminal. Any output we see on the screen is written through fd 1 (stdout), which here refers to the terminal device (accessible as /dev/tty ). The first parameter is the file descriptor , the second parameter is the buffer containing the information to be written and the last parameter contains the count of bytes to write. On success, the number of bytes written is returned (zero indicates nothing was written), which is 11 in this example. +++ exited with 0 +++ This indicates that the program exited successfully with exit code 0. An exit code of 0 generally indicates successful execution and termination in Linux programs. You don't need to memorize all the system calls or what they do, because you can refer to documentation when you need to. Ensure the following package is installed before running the man command: $ rpm -qa | grep -i man-pages man-pages-3.53-5.el7.noarch Run the following man command with the system call name to see the documentation for that system call (for example, execve): man 2 execve Apart from system calls, strace can be used to detect the files that are being accessed by the program. In the above trace, we have a system call open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 which opens the libc shared object /lib64/libc.so.6 , the C implementation of various standard functions. It is the file where we find the printf() definition needed for printing Hello World . Strace can also be used to check if a program is hanging or stuck. 
When we have the trace, we can observe at which operation the program is stuck as well. Furthermore, as we go through the trace, we can also find errors (if there are any) that point out why the program is hung/stuck. Strace can be very helpful in finding the reason behind slow performance of a program. Although strace has the aforementioned uses, it is not a good choice for tracing in a production environment, because it introduces a substantial amount of overhead. According to a performance test conducted by Arnaldo Carvalho de Melo, a senior software engineer at Red Hat, the process traced using strace ran 173 times slower, which can be disastrous for a production environment.","title":"System Calls"},{"location":"level102/system_calls_and_signals/system_calls/#introduction","text":"A system call is a controlled entry point into the kernel, allowing a process to request the kernel to perform some action on the process\u2019s behalf. The kernel makes a range of services accessible to programs via the system call application programming interface (API). Application developers often do not have direct access to the system calls, but can access them through this API. These services include, for example, creating a new process, performing I/O, and creating a pipe for interprocess communication. The set of system calls is fixed. Each system call is identified by a unique number. The list of different system calls can be found here . A system call changes the processor state from user mode to kernel mode, so that the CPU can access protected kernel memory. Each system call may have a set of arguments that specify information to be transferred from user space (i.e., the process\u2019s virtual address space) to kernel space and vice versa. 
From a programming point of view, invoking a system call looks much like calling a C function.","title":"Introduction"},{"location":"level102/system_calls_and_signals/system_calls/#types-of-system-calls","text":"There are mainly 5 types of different system calls. They are : Process Control: These system calls are used to handle tasks related to a process such as process creation, termination,etc. File Management: These system calls are used for operations on files such as reading/writing a file. Device Management: These system calls are used to deal with devices such as reading/writing into device buffers. Information Maintenance: These system calls handle information and its transfer between the operating system and the user program. Communication: These system calls are useful for inter-process communication. They are also used for creating and deleting a communication connection. Types Of System Calls Examples in Linux Process Control fork(),exit(),wait() File Management open(), read(),write() Device Management ioctl(),read(),write() Information Maintenance getpid(),alarm(),sleep() Communication pipe(),shmget(),mmap()","title":"Types of system calls"},{"location":"level102/system_calls_and_signals/system_calls/#user-mode-kernel-mode-and-their-transitions","text":"Modern processor architectures typically allow the CPU to operate in at least two different modes: user mode and kernel mode . Correspondingly, areas of virtual memory can be marked as being part of user space or kernel space. When running in user mode, the CPU can access only memory that is marked as being in user space; attempts to access memory in kernel space result in a hardware exception. At any given time, a process may be executing in either user mode or kernel mode. The type of instructions that can be executed depends on the mode and this is enforced at the hardware level. 
CPU modes (also called processor modes, CPU states, CPU privilege levels) are operating modes for the central processing unit of some computer architectures that place restrictions on the type and scope of operations that can be performed by certain processes being run by the CPU. The kernel itself is not a process but a process manager. The kernel model assumes that processes that require a kernel service use specific programming constructs called system calls. When a program is executed in user mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in kernel mode, however, these restrictions no longer apply. A program usually executes in user mode and switches to kernel mode only when requesting a service provided by the kernel. If an application needs access to hardware resources on the system(like peripherals,memory,disks), it must issue a system call, which causes a context switch from user mode to kernel mode. This procedure is followed when reading/writing from/to files, etc. It is only the system call itself which runs in kernel mode, not the application code. When the system call is complete, the process returns to the user mode with the return value using an inverse context switch. Apart from system calls, kernel routines can be activated in the below ways as well: The CPU executing the process signals an exception , which is an unusual condition such as an invalid instruction. The kernel handles the exception on behalf of the process that caused it. A peripheral device issues an interrupt signal to the CPU to notify it of an event such as a request for attention, a status change, or the completion of an I/O operation. Each interrupt signal is dealt by a kernel program called an interrupt handler . Since peripheral devices operate asynchronously with respect to the CPU, interrupts occur at unpredictable times. A kernel thread is executed. 
Since it runs in Kernel Mode, the corresponding program must be considered part of the kernel. In the above diagram, Process 1 in User Mode issues a system call, after which the process switches to Kernel Mode and the system call is serviced. Process 1 then resumes execution in User Mode until a timer interrupt occurs and the scheduler is activated in Kernel Mode. A process switch takes place and Process 2 starts its execution in User Mode until a hardware device raises an interrupt. As a consequence of the interrupt, Process 2 switches to Kernel Mode and services the interrupt.","title":"User mode, kernel mode and their transitions"},{"location":"level102/system_calls_and_signals/system_calls/#working-of-write-system-call","text":"The write() system call writes data to an open file. #include <unistd.h> ssize_t write(int fd, const void *buffer, size_t count); buffer is the address of the data to be written; count is the number of bytes to write from buffer; and fd is a file descriptor referring to the file to which data is to be written. The write() call writes up to count bytes from buffer to the open file referred to by fd . On success, write() returns the number of bytes actually written, which may be less than count; on error, it returns -1. When performing I/O on a disk file, a successful return from write() doesn\u2019t guarantee that the data has been transferred to disk, because the kernel performs buffering of disk I/O in order to reduce disk activity and expedite write() calls. It simply copies data from the user-space buffer to a buffer in the kernel buffer cache. At some later point, the kernel writes (flushes) its buffer to the disk. If, in the interim, another process attempts to read these bytes of the file, then the kernel automatically supplies the data from the buffer cache, rather than from (the outdated contents of) the file. The aim of this design is to allow write() to be fast, since it doesn\u2019t need to wait on a (slow) disk operation. 
This design is also efficient, since it reduces the number of disk transfers that the kernel must perform.","title":"Working of write() system call"},{"location":"level102/system_calls_and_signals/system_calls/#debugging-in-linux-with-strace","text":"strace is a tool used to trace the transition between user processes and the Linux kernel. In order to use the tool, we need to ensure that it is installed on the system by running the command: $ rpm -qa | grep -i strace strace-4.12-9.el7.x86_64 If the above command does not give any output, you can install the tool via: $ yum install strace The functions which are a part of the standard C library are known as library functions. The purposes of these functions are very diverse, including such tasks as opening a file, converting a time to a human-readable format, and comparing two character strings. Some library functions are layered on top of system calls. Often, library functions are designed to provide a more caller-friendly interface than the underlying system call. For example, the printf() function provides output formatting and data buffering, whereas the write() system call just outputs a block of bytes. The most commonly used implementation of the standard C library on Linux is the GNU C library, glibc . The C standard library provides printf() , which lets the user write data in many different formats. So printf() converts your data into a formatted sequence of bytes and then calls write() to write those bytes to the output. 
Let us examine what happens when a printf() statement is executed with the use of strace command : strace printf %s \u201cHello world\u201d ~]$ strace printf %s \"Hello world\" execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 brk(NULL) = 0x90d000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672f000 access(\"/etc/ld.so.preload\", R_OK) = -1 ENOENT (No such file or directory) open(\"/etc/ld.so.cache\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=98854, ...}) = 0 mmap(NULL, 98854, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fc6716000 close(3) = 0 open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2156160, ...}) = 0 mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8fc6141000 mprotect(0x7f8fc6304000, 2097152, PROT_NONE) = 0 mmap(0x7f8fc6504000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f8fc6504000 mmap(0x7f8fc650a000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8fc650a000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6715000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6713000 arch_prctl(ARCH_SET_FS, 0x7f8fc6713740) = 0 mprotect(0x7f8fc6504000, 16384, PROT_READ) = 0 mprotect(0x60a000, 4096, PROT_READ) = 0 mprotect(0x7f8fc6730000, 4096, PROT_READ) = 0 munmap(0x7f8fc6716000, 98854) = 0 brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 open(\"/usr/lib/locale/locale-archive\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106075056, ...}) = 0 mmap(NULL, 106075056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fbfc17000 close(3) = 0 open(\"/usr/share/locale/locale.alias\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, 
{st_mode=S_IFREG|0644, st_size=2502, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 read(3, \"# Locale name alias data base.\\n#\"..., 4096) = 2502 read(3, \"\", 4096) = 0 close(3) = 0 munmap(0x7f8fc672e000, 4096) = 0 open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 write(1, \"Hello world\", 11Hello world) = 11 close(1) = 0 munmap(0x7f8fc672e000, 4096) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 The first system call made is execve() , which does three things: The operating system (OS) stops the duplicated process (of the parent). The OS loads the new program (in this case, the printf executable) and starts it. execve() replaces the defining parts of the current process's memory (such as its stack, heap and data segments) with the contents loaded from the printf executable file. The first word of the line, execve, is the name of the system call being executed. The first parameter must be the path of a binary executable or a script. The second is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed. The third parameter is an array of environment strings passed to the new program (strace abbreviates it here as 47 vars). The number after the = sign (which is 0 in this case) is the value returned by the execve system call, which indicates that the call was successful. open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) In this line, the program tried to open() the file /usr/lib/locale/UTF-8/LC_CTYPE . However, the system call failed (with -1 status) with the descriptive error message No such file or directory , as the file wasn\u2019t found (ENOENT). 
brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 The system call brk() is used to increase or decrease the process\u2019s data segment. It returns the new address where the data segment of the process is to end. open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 In the above lines of the console output, we see that a successful open() call is made, followed by a read() system call. In open() , the first parameter is the path to the file which you want to use and the second parameter specifies the flags, including the access mode. In this example, the flags are O_RDONLY , which opens the file read-only, and O_CLOEXEC , which enables the close-on-exec flag for the opened file. This is useful to avoid race conditions in multithreaded programs where one thread opens a file descriptor at the same time as another thread calls fork() and execve() . 3 is the file descriptor returned for the opened file. Since fds 0, 1 and 2 are already taken by stdin, stdout and stderr, the first unused file descriptor in the file descriptor table is 3. If open() In read() , the first parameter is the file descriptor, which is 3 (the file was opened using this file descriptor by open() ). The second parameter is the buffer into which the data is read, and the third parameter is the length of the buffer. The return value is 832, which is the number of bytes read. close(3) = 0 The close() system call asks the kernel to close a file descriptor. For most file systems, a program terminates access to a file in a filesystem using the close system call. 0 after the = sign indicates that the system call was successful. write(1, \"Hello world\", 11Hello world) = 11 In the previous section, we described the write() system call and the arguments that it takes. The Hello world that appears inside the traced call is the output of the program itself, interleaved with the strace output on the terminal. Any output we see on the screen is written through fd 1 (stdout), which here refers to the terminal device (accessible as /dev/tty ). 
The first parameter is the file descriptor , the second parameter is the buffer containing the information to be written and the last parameter contains the count of bytes to write. On success, the number of bytes written is returned (zero indicates nothing was written), which is 11 in this example. +++ exited with 0 +++ This indicates that the program exited successfully with exit code 0. An exit code of 0 generally indicates successful execution and termination in Linux programs. You don't need to memorize all the system calls or what they do, because you can refer to documentation when you need to. Ensure the following package is installed before running the man command: $ rpm -qa | grep -i man-pages man-pages-3.53-5.el7.noarch Run the following man command with the system call name to see the documentation for that system call (for example, execve): man 2 execve Apart from system calls, strace can be used to detect the files that are being accessed by the program. In the above trace, we have a system call open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 which opens the libc shared object /lib64/libc.so.6 , the C implementation of various standard functions. It is the file where we find the printf() definition needed for printing Hello World . Strace can also be used to check if a program is hanging or stuck. When we have the trace, we can observe at which operation the program is stuck as well. Furthermore, as we go through the trace, we can also find errors (if there are any) that point out why the program is hung/stuck. Strace can be very helpful in finding the reason behind slow performance of a program. Although strace has the aforementioned uses, it is not a good choice for tracing in a production environment, because it introduces a substantial amount of overhead. 
According to a performance test conducted by Arnaldo Carvalho de Melo, a senior software engineer at Red Hat, the process traced using strace ran 173 times slower, which can be disastrous for a production environment.","title":"Debugging in Linux with strace"},{"location":"level102/system_design/conclusion/","text":"We have looked at designing a system from scratch, scaling it up from a single server to multiple datacenters and hundreds of thousands of users. However, you might have (rightly!) guessed that there is a lot more to system design than what we have covered so far. This course should give you a sweeping glance at the things that are fundamental to any system design process. Specific solutions implemented, frameworks and orchestration systems used evolve rapidly. However, the guiding principles remain the same. We hope this course helped you get started in the right direction and that you have fun designing systems and solving interesting problems.","title":"Conclusion"},{"location":"level102/system_design/intro/","text":"System Design Prerequisites School of SRE - System Design - Phase I What to expect from this course The aim is to empower the reader to understand the building blocks of a well-designed system, evaluate existing systems, understand the trade-offs, come up with their own design, and to explore the various tools available to implement such a system. In phase one of this module, we talked about the fundamentals of system design including concepts like scalability, availability and reliability. We continue to build on those fundamentals in this phase. Throughout the course, there are callout sections that appear like this, and talk about things that are closely related to the system design process, but don\u2019t form a part of the system itself. They also have information about some common issues that crop up in system design. Watch out for them. 
What is not covered under this course While this course covers many aspects of system design, it does not cover the most fundamental concepts. For such topics, it is advised to go through the prerequisites. In general, this module will not go into actually implementing the architecture - we will not talk about choosing a hosting/cloud provider or an orchestration setup or a CI/CD system. Instead, we try to focus on the fundamental considerations that need to go into system design. Course Contents Introduction Large system Design Scaling Scaling beyond the datacentre Design patterns for resiliency Conclusion Introduction We talked about building a basic photo sharing application in the previous phase of this course. Our basic requirements for the application were that It should work for a reasonably large number of users Avoid service failures/cluster crash in case of any issues In other words, we wanted to build a system that was available, scalable and fault tolerant. We will continue designing that application, and cover additional concepts in the course of doing so. The photo sharing application is a web application that will handle everything from user sign up, log in, uploads, feed generation, user interaction and interaction with uploaded content. Also a database to store this information. In the simplest design, both the web app and the database can run on the same server. Recall this initial design from Phase 1. Building on that, we will talk about performance elements in system design - setting the right performance measurement metrics and using them to drive our design decisions, improving performance using caching, Content Delivery Networks (CDNs), etc. We will also talk about how to design for resilience by looking at some system design patterns - graceful degradation, time-outs and circuit breakers. Cost System design considerations like availability, scalability cannot exist in isolation. 
When operating outside the lab, we have other considerations / the existing considerations take on a different hue. One such consideration is cost. Real world systems almost always have budget constraints. System design, implementation and continued operation need to have predictable costs per unit output. The output is usually the business problem you are trying to solve. Striking a balance between the two is very important. Understanding the capabilities of your system A well designed system requires understanding the building blocks intimately in terms of their capabilities. Not all components are created equal, and understanding what a single component can do is very important - for example, in the photo upload application it is important to know what a single database instance is capable of, in terms of read or write transactions per second, and what a reasonable expectation would be. This helps in building systems that are appropriately weighted - and will eliminate obvious sources of bottlenecks. On a lower level, even understanding the capabilities of the underlying hardware (or a VM instance if you are on cloud) is important. For example, all disks don\u2019t perform the same, and all disks don\u2019t perform the same per dollar. If we are planning to have an API that is expected to return a response in 100ms under normal circumstances, then it is important to know how much of it will be spent in which parts of the system. The following link will help in getting a sense of each component\u2019s performance, all the way from the CPU cache to the network link to our end user. 
Numbers every programmer should know","title":"Introduction"},{"location":"level102/system_design/intro/#system-design","text":"","title":"System Design"},{"location":"level102/system_design/intro/#prerequisites","text":"School of SRE - System Design - Phase I","title":"Prerequisites"},{"location":"level102/system_design/intro/#what-to-expect-from-this-course","text":"The aim is to empower the reader to understand the building blocks of a well-designed system, evaluate existing systems, understand the trade-offs, come up with their own design, and to explore the various tools available to implement such a system. In phase one of this module, we talked about the fundamentals of system design including concepts like scalability, availability and reliability. We continue to build on those fundamentals in this phase. Throughout the course, there are callout sections that appear like this, and talk about things that are closely related to the system design process, but don\u2019t form a part of the system itself. They also have information about some common issues that crop up in system design. Watch out for them.","title":"What to expect from this course"},{"location":"level102/system_design/intro/#what-is-not-covered-under-this-course","text":"While this course covers many aspects of system design, it does not cover the most fundamental concepts. For such topics, it is advised to go through the prerequisites. In general, this module will not go into actually implementing the architecture - we will not talk about choosing a hosting/cloud provider or an orchestration setup or a CI/CD system. 
Instead, we try to focus on the fundamental considerations that need to go into system design.","title":"What is not covered under this course"},{"location":"level102/system_design/intro/#course-contents","text":"Introduction Large system Design Scaling Scaling beyond the datacentre Design patterns for resiliency Conclusion","title":"Course Contents"},{"location":"level102/system_design/intro/#introduction","text":"We talked about building a basic photo sharing application in the previous phase of this course. Our basic requirements for the application were that It should work for a reasonably large number of users Avoid service failures/cluster crash in case of any issues In other words, we wanted to build a system that was available, scalable and fault tolerant. We will continue designing that application, and cover additional concepts in the course of doing so. The photo sharing application is a web application that will handle everything from user sign up, log in, uploads, feed generation, user interaction and interaction with uploaded content. Also a database to store this information. In the simplest design, both the web app and the database can run on the same server. Recall this initial design from Phase 1. Building on that, we will talk about performance elements in system design - setting the right performance measurement metrics and using them to drive our design decisions, improving performance using caching, Content Delivery Networks (CDNs), etc. We will also talk about how to design for resilience by looking at some system design patterns - graceful degradation, time-outs and circuit breakers.","title":"Introduction"},{"location":"level102/system_design/large-system-design/","text":"Designing a system usually starts out to be abstract - we have large functional blocks that need to work together and are abstracted away into frontend, backend and database layers. 
However, when it is time to implement the system, especially as an SRE we have no other choice but to think in specific terms. Servers have a fixed amount of memory, storage capacity and processing power. So we need to think about the realistic expectations from our system, assess the requirements, translate them into specific requirements from each component of the system like network, storage and compute. This is typically how almost all large scale systems are built. The folks over at Google have formalized this approach to designing systems as \u2018Non abstract large system design\u2019 (NALSD). According to the Google site reliability workbook, \u201cPractically, NALSD combines elements of capacity planning, component isolation, and graceful system degradation that are crucial to highly available production systems.\u201d We will be using an approach similar to this to build our system. Application requirements Let us define our application requirements in more concrete terms i.e., specific functions: Our photo sharing application must let the user Sign up to become a member, and login to the application Upload photographs, and optionally add a description and tag location and/or people Follow other users on the platform See a feed comprising of photos from other users that they follow View their own profile page, and manage who they follow Let us define expectations for the application\u2019s performance for a better user experience. We also need to define the health of the system. SLIs and SLOs help us do just that. SLIs and SLOs The Google SRE book defines service level indicator(SLI) as \u201ca carefully defined quantitative measure of some aspect of the level of service that is provided.\u201d For our application, we can define multiple SLIs. One indicator can be the response time for loading the feed for our photo sharing application. 
Picking the right set of SLIs is very important since they essentially help us define the health of the system as a whole using concrete data. SLIs for an application are defined by the owners of the service, in consultation with the SREs. Service level objective (SLO) is defined as \u201ca target value or range of values for a service level that is measured by an SLI\u201d. SLO is a way for us to anchor ourselves to an optimal user experience by defining SLI boundaries. If our application takes a long time to load the feed, users might not open it very often. As a result, an example of SLO can be that at least 99% of the users should see their feed loaded within 1 second. Now that we have defined SLIs and SLOs, let us define the application\u2019s scalability, reliability and performance characteristics in terms of specific SLI and SLO levels. Defining application requirements in terms of SLIs and SLOs The following can be some of the expectations for our application: Once the user successfully uploads the image, it should be accessible to the user and their followers 100% of the time, barring user elected deletion. At least 50000 unique visitors should be able to visit the site at any given time and view their feed. 99% of the users should be able to view their feeds in less than 1 second. Upon uploading a new image, it should show up in the feed of the user\u2019s followers within 15 minutes. Users should be able to upload potentially thousands of images. (as long as they are not abusing the service) Since our ultimate aim is to learn system design, we will arbitrarily limit the functionalities of the system. This will help us keep sight of our aim, and keep us focussed. Having defined the functionalities and expectations for our system, let us quickly sketch an initial design. As of now, all the functionalities are residing on a single server, which has endpoints for all of these functions. 
We will attempt to build a system that satisfies our SLOs, is able to serve 50k concurrent users, and about a million total users. In the course of this attempt, we will discuss a string of concepts, some of which we have already seen in Phase 1 of this course. Caution Note that the numbers we have picked in the following sections are completely arbitrary. They have been chosen to demonstrate thinking about system design in a non-abstract manner. They have not been benchmarked, and bear no real world resemblance. Do not use them in any real world systems that you may be designing. You should come up with your own numbers, using the guiding principles we have relied upon here. Estimating resource requirements Single Computer If we wished to run the application on a single server, we would need to perform all the above functionalities from the diagram on this server itself. Let us perform some calculations to figure out what kind of resources we will need. Before anything else, we need to store the data about users, their uploads, follower information and any other metadata. We will choose a relational DB to store this information, like MySQL. Do note that we can also choose to use a NOSQL solution here. That would require a similar approach to calculate the requirements. Let us represent the users like so: UserID(int) UserName(varchar) DisplayName(varchar) YearOfBirth(year) Email(varchar) Photos can be represented like this: PhotoID(int) PhotoHash(varchar) Uploadtime(datetime) Location(varchar) OptionalFlag(varchar) Followers can be represented like this: Follower(int) Followee(int) Let us quickly estimate the storage needed for a hundred million total users. A single user would need 4B + 32B + 32B + 4B + 32B = 104 bytes. A hundred million users would need 10.4 GB storage. A single photo would need about 4B + 20B + 4B + 32B + 4B = 64 bytes of storage to store the metadata related to the photo. 
Assuming a million photos uploaded in one day, we would need about 64 MB of storage per day, just for the metadata. For the photo storage itself, we will need about 300GB per day, assuming 300KB average photo size. A single visitor opening our application simply hits our /get_feed endpoint upon logging in to the application. Let us quickly calculate the resources needed to serve this request. Assuming the initial feed loads 5 images (of 300 KB size on an average) and then does lazy loading to infinitely scroll, we will need to send about 1.5 megabytes of images to the user for their initial call. With a 1000 Mbps* network link to the server, we can serve only about (1000/8)/1.5, or about 83, users all loading the feed at the same time, before we saturate the network link. If we needed to serve 50k concurrent users every second, we would need 1.5*50000*8 = 600000 Mbps of network throughput for every 5 images sent, assuming we send out all 5 images in a single second. If we are reading all of it from disk, we would likely hit disk throughput limits far before approaching anywhere near this amount of traffic. So in order to meet our application requirements, we would need a server that has ~310GB storage for the database and the images of one day, and about a 600 Gbps link to serve 50k users concurrently, along with the CPU required to perform all this. Clearly not the task for a single server. And do note that we have severely limited the information we are storing in the database. We would likely need an order of magnitude more information to be stored. While we clearly do not have any real world server that has the resources we calculated above, this exercise provides us some valuable data points about what the resource cost is. Armed with this information, let us work on scaling our system through system design to get us as close as possible to our goals for the application. 
* Modern servers even have multi-gigabit links, but it is highly unlikely that such a huge server will be serving our application alone. Modern cloud providers have VMs that also boast several gigabits of bandwidth, but they usually end up being throttled after certain limits. References: SQL vs NoSQL databases Introducing Non-Abstract Large System Design","title":"Large System Design"},{"location":"level102/system_design/large-system-design/#application-requirements","text":"Let us define our application requirements in more concrete terms i.e., specific functions: Our photo sharing application must let the user Sign up to become a member, and login to the application Upload photographs, and optionally add a description and tag location and/or people Follow other users on the platform See a feed comprising of photos from other users that they follow View their own profile page, and manage who they follow Let us define expectations for the application\u2019s performance for a better user experience. We also need to define the health of the system. SLIs and SLOs help us do just that.","title":"Application requirements"},{"location":"level102/system_design/large-system-design/#slis-and-slos","text":"The Google SRE book defines service level indicator(SLI) as \u201ca carefully defined quantitative measure of some aspect of the level of service that is provided.\u201d For our application, we can define multiple SLIs. One indicator can be the response time for loading the feed for our photo sharing application. Picking the right set of SLIs is very important since they essentially help us define the health of the system as a whole using concrete data. SLIs for an application are defined by the owners of the service, in consultation with the SREs. Service level objective (SLO) is defined as \u201ca target value or range of values for a service level that is measured by an SLI\u201d. 
SLO is a way for us to anchor ourselves to an optimal user experience by defining SLI boundaries. If our application takes a long time to load the feed, users might not open it very often. As a result, an example of SLO can be that at least 99% of the users should see their feed loaded within 1 second. Now that we have defined SLIs and SLOs, let us define the application\u2019s scalability, reliability and performance characteristics in terms of specific SLI and SLO levels.","title":"SLIs and SLOs"},{"location":"level102/system_design/large-system-design/#defining-application-requirements-in-terms-of-slis-and-slos","text":"The following can be some of the expectations for our application: Once the user successfully uploads the image, it should be accessible to the user and their followers 100% of the time, barring user elected deletion. At least 50000 unique visitors should be able to visit the site at any given time and view their feed. 99% of the users should be able to view their feeds in less than 1 second. Upon uploading a new image, it should show up in the feed of the user\u2019s followers within 15 minutes. Users should be able to upload potentially thousands of images. (as long as they are not abusing the service) Since our ultimate aim is to learn system design, we will arbitrarily limit the functionalities of the system. This will help us keep sight of our aim, and keep us focussed. Having defined the functionalities and expectations for our system, let us quickly sketch an initial design. As of now, all the functionalities are residing on a single server, which has endpoints for all of these functions. We will attempt to build a system that satisfies our SLOs, is able to serve 50k concurrent users, and about a million total users. 
In the course of this attempt, we will discuss a string of concepts, some of which we have already seen in Phase 1 of this course.","title":"Defining application requirements in terms of SLIs and SLOs"},{"location":"level102/system_design/large-system-design/#estimating-resource-requirements","text":"Single Computer If we wished to run the application on a single server, we would need to perform all the above functionalities from the diagram on this server itself. Let us perform some calculations to figure out what kind of resources we will need. Before anything else, we need to store the data about users, their uploads, follower information and any other metadata. We will choose a relational DB to store this information, like MySQL. Do note that we can also choose to use a NOSQL solution here. That would require a similar approach to calculate the requirements. Let us represent the users like so: UserID(int) UserName(varchar) DisplayName(varchar) YearOfBirth(year) Email(varchar) Photos can be represented like this: PhotoID(int) PhotoHash(varchar) Uploadtime(datetime) Location(varchar) OptionalFlag(varchar) Followers can be represented like this: Follower(int) Followee(int) Let us quickly estimate the storage needed for a hundred million total users. A single user would need 4B + 32B + 32B + 4B + 32B = 104 bytes. A hundred million users would need 10.4 GB storage. A single photo would need about 4B + 20B + 4B + 32B + 4B = 64 bytes of storage to store the metadata related to the photo. Assuming a million photos uploaded in one day, we would need about 64 MB of storage per day, just for the metadata. For the photo storage itself, we will need about 300GB per day, assuming 300KB average photo size. A single visitor opening our application simply hits our /get_feed endpoint upon logging in to the application. Let us quickly calculate the resources needed to serve this request. 
Assuming the initial feed loads 5 images (of 300 KB size on an average) and then does lazy loading to infinitely scroll, we will need to send about 1.5 megabytes of images to the user for their initial call. With a 1000 Mbps* network link to the server, we can serve only about (1000/8)/1.5, or about 83, users all loading the feed at the same time, before we saturate the network link. If we needed to serve 50k concurrent users every second, we would need 1.5*50000*8 = 600000 Mbps of network throughput for every 5 images sent, assuming we send out all 5 images in a single second. If we are reading all of it from disk, we would likely hit disk throughput limits far before approaching anywhere near this amount of traffic. So in order to meet our application requirements, we would need a server that has ~310GB storage for the database and the images of one day, and about a 600 Gbps link to serve 50k users concurrently, along with the CPU required to perform all this. Clearly not the task for a single server. And do note that we have severely limited the information we are storing in the database. We would likely need an order of magnitude more information to be stored. While we clearly do not have any real world server that has the resources we calculated above, this exercise provides us some valuable data points about what the resource cost is. Armed with this information, let us work on scaling our system through system design to get us as close as possible to our goals for the application. * Modern servers even have multi-gigabit links, but it is highly unlikely that such a huge server will be serving our application alone. 
Modern cloud providers have VMs that also boast several gigabits of bandwidth, but they usually end up being throttled after certain limits.","title":"Estimating resource requirements"},{"location":"level102/system_design/large-system-design/#references","text":"SQL vs NoSQL databases Introducing Non-Abstract Large System Design","title":"References:"},{"location":"level102/system_design/resiliency/","text":"A resilient system is one that can keep functioning in the face of adversity. With our application, there can be numerous failures that act as adversities. There can be network level failures that take out entire data centres, there might be issues at the rack level or at the server level, or there might be something wrong with the cloud provider. We may also run out of capacity, or there might be a wrong code push that breaks the system. We will talk about a couple of such issues, and understand how we might design a system to work around such things. In some cases, a workaround might not be possible. However it is still valuable to know potential vulnerabilities to the system stability. Resilient architectures leverage system design patterns such as graceful degradation, quotas, timeouts and circuit breakers. Let us look at some of them in this section. Quotas A system may have a component or an endpoint that is consumed by multiple components and endpoints. It is important to have something in place that will prevent one consumer or client from overwhelming such a system. Quotas are one way to do this - we simply assign a specific quota for each component - by way of specifying requests per unit time. Anyone who breaches the quota is either warned or dropped, depending on the implementation. This way, one of our own systems misbehaving cannot result in denial of service to others. Quotas also help us prevent cascading failures. 
Graceful Degradation When a system with multiple dependencies encounters failure in one of the dependencies, gracefully degrading to minimum viable functionality would be a lot better than grinding the entire system to a halt. For example, let us assume there is an endpoint (a URL for a service or a specific function) in our application whose responsibility is to parse the location information in a user-uploaded image from the image's metadata and provide suggestions for location tagging to the user. If this endpoint fails, rather than failing the entire upload, it is much better to skip over this functionality and still give the user an option to manually tag a location. Degrading gracefully is always better than failing totally. Timeouts We sometimes call other services or resources like databases or API endpoints in our application. When calling such a resource from our application, it is important to always have a reasonable timeout. It doesn\u2019t necessarily even have to be that the resource will fail for all requests. It just might be that a specific request falls in the high tail latency category. A reasonable timeout helps keep the user experience consistent - in some cases, it is better to fail than to have frustratingly long delays. Exponential back-offs When a service endpoint fails, retries are one way to see if it was a momentary failure. However, if the retry is also going to fail, there is no point in endlessly retrying. At large enough scale, the retries can compete with the new requests (which might very well be served as expected) and saturate the system. To avoid this, we can look at exponential back-off for retries. This essentially decreases the rate at which the clients retry, upon encountering consecutive failures on retries. Circuit breakers While exponential back-off is one way to deal with retry storms, circuit breakers can be another. Circuit breakers can help prevent failures from percolating through the entire system. 
Otherwise, an unmitigated failure that flows through the system may result in false alerts, worsening the mean time to detection (MTTD) and mean time to resolution (MTTR). For example, if one of the in-memory cache nodes fails, requests reach the database after the initial cache timeouts, and this might end up overloading the database. If the link between the cache node failure and the DB node failure is not identified, the MTTD for the actual cause might increase, and consequently so might the MTTR. Self healing systems A traditionally load-balanced application with multiple instances might fail when more than a threshold of instances stop responding to requests - either because they are down, or because a sudden, huge influx of requests has degraded performance. A self-healing system adds more instances in this scenario to replace the failed instances. Auto-scaling like this can also help when there is a sudden spike in queries. If our application runs on a public cloud, it might simply be a matter of spinning up more virtual machines. If we are running on-premises out of our data center, then we will want to think about capacity planning much more carefully. Regardless of how we handle adding additional capacity, simply adding capacity may not be enough. We should also think about additional potential failure modes that might be encountered. For example, the load balancing layer itself might need scaling up, to handle the influx of new backends. Continuous Deployment and Integration A well designed system also needs to take into account the need for a proper staging setup that can mimic the production environment as closely as possible. 
There should also be a way for us to replay production traffic in the staging environment to test changes to production thoroughly.","title":"Resiliency"},{"location":"level102/system_design/resiliency/#quotas","text":"A system may have a component or an endpoint that is consumed by multiple components and endpoints. It is important to have something in place that will prevent one consumer or client from overwhelming such a system. Quotas are one way to do this - we simply assign a specific quota for each component - by way of specifying requests per unit time. Anyone who breaches the quota is either warned or dropped, depending on the implementation. This way, one of our own systems misbehaving cannot result in denial of service to others. Quotas also help us prevent cascading failures.","title":"Quotas"},{"location":"level102/system_design/resiliency/#graceful-degradation","text":"When a system with multiple dependencies encounters failure in one of the dependencies, gracefully degrading to minimum viable functionality would be a lot better than grinding the entire system to a halt. For example, let us assume there is an endpoint (a URL for a service or a specific function) in our application whose responsibility is to parse the location information in a user-uploaded image from the image's metadata and provide suggestions for location tagging to the user. If this endpoint fails, rather than failing the entire upload, it is much better to skip over this functionality and still give the user an option to manually tag a location. Degrading gracefully is always better than failing totally.","title":"Graceful Degradation"},{"location":"level102/system_design/resiliency/#timeouts","text":"We sometimes call other services or resources like databases or API endpoints in our application. When calling such a resource from our application, it is important to always have a reasonable timeout. It doesn\u2019t necessarily even have to be that the resource will fail for all requests. 
It just might be that a specific request falls in the high tail latency category. A reasonable timeout helps keep the user experience consistent - in some cases, it is better to fail than to have frustratingly long delays.","title":"Timeouts"},{"location":"level102/system_design/resiliency/#exponential-back-offs","text":"When a service endpoint fails, retries are one way to see if it was a momentary failure. However, if the retry is also going to fail, there is no point in endlessly retrying. At large enough scale, the retries can compete with the new requests (which might very well be served as expected) and saturate the system. To avoid this, we can look at exponential back-off for retries. This essentially decreases the rate at which the clients retry, upon encountering consecutive failures on retries.","title":"Exponential back-offs"},{"location":"level102/system_design/resiliency/#circuit-breakers","text":"While exponential back-off is one way to deal with retry storms, circuit breakers can be another. Circuit breakers can help prevent failures from percolating through the entire system. Otherwise, an unmitigated failure that flows through the system may result in false alerts, worsening the mean time to detection (MTTD) and mean time to resolution (MTTR). For example, if one of the in-memory cache nodes fails, requests reach the database after the initial cache timeouts, and this might end up overloading the database. If the link between the cache node failure and the DB node failure is not identified, the MTTD for the actual cause might increase, and consequently so might the MTTR.","title":"Circuit breakers"},{"location":"level102/system_design/resiliency/#self-healing-systems","text":"A traditionally load-balanced application with multiple instances might fail when more than a threshold of instances stop responding to requests - either because they are down, or because a sudden, huge influx of requests has degraded performance. 
A self-healing system adds more instances in this scenario to replace the failed instances. Auto-scaling like this can also help when there is a sudden spike in queries. If our application runs on a public cloud, it might simply be a matter of spinning up more virtual machines. If we are running on-premises out of our data center, then we will want to think about capacity planning much more carefully. Regardless of how we handle adding additional capacity, simply adding capacity may not be enough. We should also think about additional potential failure modes that might be encountered. For example, the load balancing layer itself might need scaling up, to handle the influx of new backends.","title":"Self healing systems"},{"location":"level102/system_design/resiliency/#continuous-deployment-and-integration","text":"A well designed system also needs to take into account the need for a proper staging setup that can mimic the production environment as closely as possible. There should also be a way for us to replay production traffic in the staging environment to test changes to production thoroughly.","title":"Continuous Deployment and Integration"},{"location":"level102/system_design/scaling-beyond-the-datacenter/","text":"Caching static assets Extending the existing caching solution a bit, we arrive at Content Delivery Networks (CDNs). CDNs are the caching layer that is closest to the user. A significant chunk of resources served in a webpage may not be changing on an hourly or even a daily basis. In those cases, we would want to cache these at the CDN level, reducing our load. CDNs not only help reduce the load on our servers by removing the burden of serving static / bandwidth intensive resources, they also let us be present closer to our users, by way of points of presence (POPs). CDNs also let us do geo-load balancing, in case we have multiple data centres around the world, and would want to serve from the closest data center (DC) possible. 
Taking it a step further With the addition of caching and distributing our application into simpler services, we have solved the problem of scaling to 50000 users. However, our users may be in geographically distributed locations and may not be at the same distance from our data centre or our cloud region. Consistency in user experience is important, else we are excluding users who are far away from our location, potentially eliminating a significant chunk of potential users. However, it is impractical to have data centers all over the world, or even in more than a couple of locations in the world. This is where CDNs and POPs come into picture. Points of Presence CDN POPs are geographically distributed data centers aimed at being close to users. POPs reduce the round trip time by delivering content from a location that is nearest to the user. POPs typically may not have all the content, but have caching servers that cache the static assets, and fetch the rest of the content from the origin server where the application actually resides. Their main function is to reduce round trip time by bringing the content closer to the website\u2019s visitor. POPs can also route traffic to one of the multiple origin DCs possible. This way, POPs can be leveraged to add resiliency as well as load-balancing. Now, with our image sharing application becoming more popular by the day, let us assume that we have hit 100,000 concurrent users. And we have built another data center, predicting this increase in traffic. Now we need to be able to route traffic to both of these data centers in a reliable manner, while also retaining the ability to fall back to a single data center in case there is an issue with one of the two DCs. This is where sticky routing comes into play. Sticky Routing When a user sends a request, there are cases in which we might want to serve a specific user\u2019s requests from a specific DC if we have multiple DCs, or from a specific server inside a DC. 
We may also wish to serve all requests from a specific POP from a single data center. Sticky routing helps us do exactly that. It might be simply pinning all users to a specific DC or pinning specific users to specific servers. This is typically done from the POP, so that as soon as the user reaches our servers, we can route them to the nearest DC possible. Geo DNS When a user opens the application, the user can be directed to one of the multiple globally distributed POPs. This can be done using GeoDNS, which, simply put, gives out a different IP address (from a geographically distributed set), depending on the location of the user making the DNS request. GeoDNS is the first step in distributing users to different locations - it is not 100% accurate, and typically makes use of IP address allotment information for guessing the location of the user. However, it works well enough for >90% of the users. After this, we can have a sticky routing service that assigns each user to a specific DC and sets a cookie. When the user next visits, the cookie can be read at the POP to decide which data center the user\u2019s traffic must be directed to. Having multiple DCs and leveraging sticky routing has not only scaling benefits, but also adds to the resiliency of the service, albeit at the cost of additional complexity. Let us consider another use case in which a user uploads a new profile picture for themselves. If we have multiple data centres or POPs which are not synced in real time - not all of them might have the newer picture. In such a case, it would make sense to tie that user to a specific DC/region until the update has propagated to all regions. Sticky routing would enable us to do this. 
References CDNs LinkedIn's TrafficShift blog talks about sticky routing","title":"Scaling Beyond the Data Center"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#caching-static-assets","text":"Extending the existing caching solution a bit, we arrive at Content Delivery Networks (CDNs). CDNs are the caching layer that is closest to the user. A significant chunk of resources served in a webpage may not be changing on an hourly or even a daily basis. In those cases, we would want to cache these at the CDN level, reducing our load. CDNs not only help reduce the load on our servers by removing the burden of serving static / bandwidth intensive resources, they also let us be present closer to our users, by way of points of presence (POPs). CDNs also let us do geo-load balancing, in case we have multiple data centres around the world, and would want to serve from the closest data center (DC) possible. Taking it a step further With the addition of caching and distributing our application into simpler services, we have solved the problem of scaling to 50000 users. However, our users may be in geographically distributed locations and may not be at the same distance from our data centre or our cloud region. Consistency in user experience is important, else we are excluding users who are far away from our location, potentially eliminating a significant chunk of potential users. However, it is impractical to have data centers all over the world, or even in more than a couple of locations in the world. This is where CDNs and POPs come into picture.","title":"Caching static assets"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#points-of-presence","text":"CDN POPs are geographically distributed data centers aimed at being close to users. POPs reduce the round trip time by delivering content from a location that is nearest to the user. 
POPs typically may not have all the content, but have caching servers that cache the static assets, and fetch the rest of the content from the origin server where the application actually resides. Their main function is to reduce round trip time by bringing the content closer to the website\u2019s visitor. POPs can also route traffic to one of the multiple origin DCs possible. This way, POPs can be leveraged to add resiliency as well as load-balancing. Now, with our image sharing application becoming more popular by the day, let us assume that we have hit 100,000 concurrent users. And we have built another data center, predicting this increase in traffic. Now we need to be able to route traffic to both of these data centers in a reliable manner, while also retaining the ability to fall back to a single data center in case there is an issue with one of the two DCs. This is where sticky routing comes into play.","title":"Points of Presence"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#sticky-routing","text":"When a user sends a request, there are cases in which we might want to serve a specific user\u2019s requests from a specific DC if we have multiple DCs, or from a specific server inside a DC. We may also wish to serve all requests from a specific POP from a single data center. Sticky routing helps us do exactly that. It might be simply pinning all users to a specific DC or pinning specific users to specific servers. This is typically done from the POP, so that as soon as the user reaches our servers, we can route them to the nearest DC possible.","title":"Sticky Routing"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#geo-dns","text":"When a user opens the application, the user can be directed to one of the multiple globally distributed POPs. This can be done using GeoDNS, which, simply put, gives out a different IP address (from a geographically distributed set), depending on the location of the user making the DNS request. 
GeoDNS is the first step in distributing users to different locations - it is not 100% accurate, and typically makes use of IP address allotment information for guessing the location of the user. However, it works well enough for >90% of the users. After this, a sticky routing service can assign each user to a specific DC and set a cookie recording that assignment. When the user next visits, the cookie can be read at the POP to decide which data center the user\u2019s traffic must be directed to. Having multiple DCs and leveraging sticky routing has not only scaling benefits, but also adds to the resiliency of the service, albeit at the cost of additional complexity. Let us consider another use case in which a user uploads a new profile picture for themselves. If we have multiple data centers or POPs which are not synced in real time, not all of them might have the newer picture. In such a case, it would make sense to tie that user to a specific DC/region until the update has propagated to all regions. Sticky routing would enable us to do this.","title":"Geo DNS"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#references","text":"CDNs LinkedIn's TrafficShift blog talks about sticky routing","title":"References"},{"location":"level102/system_design/scaling/","text":"In Phase 1 of this course, we saw the AKF scale cube and how it can help in segmenting services, defining microservices and scaling the overall application. We will use a similar strategy to scale our application - while using the estimates from the previous section, so that we can have a data driven design rather than arbitrarily choosing scaling patterns. Splitting the application Considering the huge volume of traffic that might be generated by our application, and the related resource requirements in terms of memory and CPU, let us split the application into smaller chunks. 
One of the simplest ways to do this would be to simply divide the application along the endpoints, and spin them up as separate instances. In reality, this decision would probably be a little more complicated, and you might end up having multiple endpoints running from the same instance. The images can be stored in an object store that can be scaled independently, rather than locating it on the servers where the application or the database resides. This would reduce the resource requirements for the servers. Stateful vs Stateless services A stateless process or service doesn\u2019t rely on stored data of it\u2019s past invocations. A stateful service on the other hand stores its state in a datastore, and typically uses the state on every call or transaction. In some cases, there are options for us to design services in such a way that certain components can be made stateless and this helps in multiple ways. Applications can be containerized easily if they are stateless. Containerized applications are also easier to scale. Stateful services require you to scale the datastore with the state as well. However, containerizing databases or scaling databases is out of the scope of this module. The resulting design after such distribution of workloads might look something like this. You might notice that the diagram also has multiple databases. We will see more about this in the following sharding section. Now that we have split the application into smaller services, we need to look at scaling up the capacity of each of these endpoints. The popular Pareto principle states that \u201c80% of consequences come from 20% of the causes\u201d. Modifying it slightly, we can say that 80% of the traffic will be for 20% of images. The no. of images uploaded vs the no. of images seen by the user is going to be similarly skewed. An user is much more likely to view images on a daily basis than they are to upload new ones. 
In our simple design, generating the feed page with initial 5 images will be a matter of choosing 5 recently uploaded images from fellow users whom this user follows. While we can dynamically fetch the images from the database and generate the page on the fly once the user logs on, we might soon overwhelm the database in case a large number of users choose to login at the same time and load their feeds. There are two things we can do here, one is caching, and the other one is ahead of time generation of user feeds. An user with a million followers can potentially lead to hundreds of thousands of calls to the DB, simply to fetch the latest photoID that the user has uploaded. This can quickly overwhelm any DB, and can potentially bring down the DB itself. Sharding One way to solve the problem of DB limitation is scaling up the DB write and reads. Sharding is one way to scale the DB writes, where the DB would be split into parts that reside in different instances of the DB running on separate machines. DB reads can be scaled up similarly by using read replicas as we had seen in Phase 1 of this module. Compared to the number of images the popular user uploads, the number of views generated would be massive. In that case, we should cache the photoIDs of the user\u2019s uploads, to be returned without having to perform a potentially expensive call to the DB. Let us consider another endpoint in our application named /get_user_details . It simply returns the page an user would see upon clicking another user\u2019s name. This endpoint will return a list of posts that the user has created. Normally, a call to that endpoint will involve the application talking to the DB, fetching a list of all the posts by the user and returning the result. If someone\u2019s profile is viewed thousands of times that means there are thousands of calls to the DB - which may result in issues like hot keys and hot partitions. 
As with all other systems, an increase in load may result in worsening response times, resulting in inconsistent and potentially bad user experience. A simple solution here would be a cache layer - one that would return the user\u2019s profile with posts without having to call the DB every time. Caching A cache is used for the temporary storage of data that is likely to be accessed again, often repeatedly. When the data requested is found in the cache, it is termed a \u2018cache hit\u2019. A \u2018cache miss\u2019 is the obvious complement. A well positioned cache can greatly reduce the query response time as well as improve the scalability of a system. Caches can be placed at multiple levels between the user and the application. In Phase 1, we saw how we could use caches / CDNs to service static resources of the application, resulting in quicker response times as well as a lighter burden on the application servers. Let us look at more situations where caching can play a role. In-memory caching: In-memory caching is when the information to be cached is kept in the main memory of the server, allowing it to be retrieved much faster than from a DB residing on a disk. We cache arbitrary text (which can be HTML fragments or JSON objects) and fetch it back really fast. An in-memory cache is the quickest way to add a layer of fast cache that can optionally be persisted to disk as well. While caching can aid significantly in scaling up and improving performance, there are situations where the cache is suddenly unavailable. It might be that the cache was accidentally wiped, leading to all the queries falling through to the DB layer, often with multiple calls for the same piece of information. It is important to be aware of this potential \u2018thundering herd\u2019 problem and design your system accordingly. Caching proxies: There are cases where you may want to cache entire webpages / responses of other upstream resources that you need to respond to requests. 
There are also cases where you want to let your upstream tell you what to cache and how long to cache it for. In such cases, it might be a good idea to have a caching solution that understands cache-related HTTP headers. One example for our use case can be when users search for a specific term in our application - if there is a frequent enough search for a user or a term, it might be more efficient to cache the responses for some duration rather than performing the search anew every time. Let\u2019s recap one of the goals - at least 50000 unique visitors should be able to visit the site at any given time and view their feed. With the implementation of caching, we have removed one potential bottleneck - the DB. We also decomposed the monolith into smaller chunks that provide individual services. Another step closer to our goal is to simply scale horizontally the services needed for feed viewing and put them behind a load balancer. Please recall the scaling concepts discussed in Phase 1 of this module. Cache management While caching sounds like a simple, easy solution for a hard problem, an even harder problem is to manage the cache efficiently. Like most things in your system, the cache layer is not infinite. Effective cache management means removing things from the cache at the right time, to ensure the cache hit rate remains high. There are many strategies to invalidate cache entries after a certain time period or below certain usage thresholds. It is important to keep an eye on the cache hit rate and fine tune your caching strategy accordingly. References There are many object storage solutions available. Minio is one self hosted solution. There are also vendor-specific solutions for the cloud like Azure Blob storage and Amazon S3. Microservices architecture style - Azure architecture guide There are many in-memory caching solutions. Some of the most popular ones include redis and memcached. Cloud vendors also have their managed cache solutions. 
Some of the most popular proxies include squid and Apache Traffic Server Thundering herd problem - how instagram tackled it .","title":"Scaling"},{"location":"level102/system_design/scaling/#splitting-the-application","text":"Considering the huge volume of traffic that might be generated by our application, and the related resource requirements in terms of memory and CPU, let us split the application into smaller chunks. One of the simplest ways to do this would be to simply divide the application along the endpoints, and spin them up as separate instances. In reality, this decision would probably be a little more complicated, and you might end up having multiple endpoints running from the same instance. The images can be stored in an object store that can be scaled independently, rather than locating it on the servers where the application or the database resides. This would reduce the resource requirements for the servers.","title":"Splitting the application"},{"location":"level102/system_design/scaling/#stateful-vs-stateless-services","text":"A stateless process or service doesn\u2019t rely on stored data of it\u2019s past invocations. A stateful service on the other hand stores its state in a datastore, and typically uses the state on every call or transaction. In some cases, there are options for us to design services in such a way that certain components can be made stateless and this helps in multiple ways. Applications can be containerized easily if they are stateless. Containerized applications are also easier to scale. Stateful services require you to scale the datastore with the state as well. However, containerizing databases or scaling databases is out of the scope of this module. The resulting design after such distribution of workloads might look something like this. You might notice that the diagram also has multiple databases. We will see more about this in the following sharding section. 
Now that we have split the application into smaller services, we need to look at scaling up the capacity of each of these endpoints. The popular Pareto principle states that \u201c80% of consequences come from 20% of the causes\u201d. Modifying it slightly, we can say that 80% of the traffic will be for 20% of images. The no. of images uploaded vs the no. of images seen by the user is going to be similarly skewed. An user is much more likely to view images on a daily basis than they are to upload new ones. In our simple design, generating the feed page with initial 5 images will be a matter of choosing 5 recently uploaded images from fellow users whom this user follows. While we can dynamically fetch the images from the database and generate the page on the fly once the user logs on, we might soon overwhelm the database in case a large number of users choose to login at the same time and load their feeds. There are two things we can do here, one is caching, and the other one is ahead of time generation of user feeds. An user with a million followers can potentially lead to hundreds of thousands of calls to the DB, simply to fetch the latest photoID that the user has uploaded. This can quickly overwhelm any DB, and can potentially bring down the DB itself.","title":"Stateful vs Stateless services"},{"location":"level102/system_design/scaling/#sharding","text":"One way to solve the problem of DB limitation is scaling up the DB write and reads. Sharding is one way to scale the DB writes, where the DB would be split into parts that reside in different instances of the DB running on separate machines. DB reads can be scaled up similarly by using read replicas as we had seen in Phase 1 of this module. Compared to the number of images the popular user uploads, the number of views generated would be massive. In that case, we should cache the photoIDs of the user\u2019s uploads, to be returned without having to perform a potentially expensive call to the DB. 
Let us consider another endpoint in our application named /get_user_details . It simply returns the page a user would see upon clicking another user\u2019s name. This endpoint will return a list of posts that the user has created. Normally, a call to that endpoint will involve the application talking to the DB, fetching a list of all the posts by the user and returning the result. If someone\u2019s profile is viewed thousands of times, that means there are thousands of calls to the DB - which may result in issues like hot keys and hot partitions. As with all other systems, an increase in load may result in worsening response times, resulting in inconsistent and potentially bad user experience. A simple solution here would be a cache layer - one that would return the user\u2019s profile with posts without having to call the DB every time.","title":"Sharding"},{"location":"level102/system_design/scaling/#caching","text":"A cache is used for the temporary storage of data that is likely to be accessed again, often repeatedly. When the data requested is found in the cache, it is termed a \u2018cache hit\u2019. A \u2018cache miss\u2019 is the obvious complement. A well positioned cache can greatly reduce the query response time as well as improve the scalability of a system. Caches can be placed at multiple levels between the user and the application. In Phase 1, we saw how we could use caches / CDNs to service static resources of the application, resulting in quicker response times as well as a lighter burden on the application servers. Let us look at more situations where caching can play a role.","title":"Caching"},{"location":"level102/system_design/scaling/#in-memory-caching","text":"In-memory caching is when the information to be cached is kept in the main memory of the server, allowing it to be retrieved much faster than from a DB residing on a disk. We cache arbitrary text (which can be HTML fragments or JSON objects) and fetch it back really fast. 
An in-memory cache is the quickest way to add a layer of fast cache that can optionally be persisted to disk as well. While caching can aid significantly in scaling up and improving performance, there are situations where the cache is suddenly unavailable. It might be that the cache was accidentally wiped, leading to all the queries falling through to the DB layer, often with multiple calls for the same piece of information. It is important to be aware of this potential \u2018thundering herd\u2019 problem and design your system accordingly. Caching proxies: There are cases where you may want to cache entire webpages / responses of other upstream resources that you need to respond to requests. There are also cases where you want to let your upstream tell you what to cache and how long to cache it for. In such cases, it might be a good idea to have a caching solution that understands cache-related HTTP headers. One example for our use case can be when users search for a specific term in our application - if there is a frequent enough search for a user or a term, it might be more efficient to cache the responses for some duration rather than performing the search anew every time. Let\u2019s recap one of the goals - at least 50000 unique visitors should be able to visit the site at any given time and view their feed. With the implementation of caching, we have removed one potential bottleneck - the DB. We also decomposed the monolith into smaller chunks that provide individual services. Another step closer to our goal is to simply scale horizontally the services needed for feed viewing and put them behind a load balancer. Please recall the scaling concepts discussed in Phase 1 of this module.","title":"In-memory caching:"},{"location":"level102/system_design/scaling/#cache-managment","text":"While caching sounds like a simple, easy solution for a hard problem, an even harder problem is to manage the cache efficiently. 
Like most things in your system, the cache layer is not infinite. Effective cache management means removing things from the cache at the right time, to ensure the cache hit rate remains high. There are many strategies to invalidate cache entries after a certain time period or below certain usage thresholds. It is important to keep an eye on the cache hit rate and fine tune your caching strategy accordingly.","title":"Cache management"},{"location":"level102/system_design/scaling/#references","text":"There are many object storage solutions available. Minio is one self hosted solution. There are also vendor-specific solutions for the cloud like Azure Blob storage and Amazon S3. Microservices architecture style - Azure architecture guide There are many in-memory caching solutions. Some of the most popular ones include redis and memcached. Cloud vendors also have their managed cache solutions. Some of the most popular proxies include squid and Apache Traffic Server. Thundering herd problem - how Instagram tackled it.","title":"References"},{"location":"level102/system_troubleshooting_and_performance/conclusion/","text":"Complex systems have many factors which can go wrong. It can be bad design & architecture, poorly managed code, poor policies around different caches, bad DB queries or architecture, improper use of resources, a bad OS version, a poorly monitored system, datacenter issues, network faults, and many more. Any of these can go wrong. As an SRE, knowing important tools/commands, best practices, profiling, benchmarking and scaling can help you with faster troubleshooting and performance improvement of the overall system. Further readings Here are some links from the LinkedIn Engineering Blog, as written by LinkedIn engineers, about firefighting they did to keep the site up 24x7x365. 
Taming memory fragmentation in Venice with Jemalloc Intro: Every Day Is Monday in Operations Fixing Linux filesystem performance regressions The impact of slow NFS on data systems","title":"Conclusion"},{"location":"level102/system_troubleshooting_and_performance/conclusion/#further-readings","text":"Here are some links from the LinkedIn Engineering Blog, as written by LinkedIn engineers, about firefighting they did, ensuring site up 24x7x365. Taming memory fragmentation in Venice with Jemalloc Intro: Every Day Is Monday in Operations Fixing Linux filesystem performance regressions The impact of slow NFS on data systems","title":"Further readings"},{"location":"level102/system_troubleshooting_and_performance/important-tools/","text":"Important linux commands Having knowledge of following commands will help find issues faster. Elaborating each command in detail is out of scope, please look for man pages or online for more information and examples around the same. For logs parsing -: grep, sed, awk, cut, tail, head For network checks -: nc, netstat, traceroute/6, mtr, ping/6, route, tcpdump, ss, ip For DNS -: dig, host, nslookup For tracing system call -: strace For parallel executions over ssh -: gnu parallel, xargs + ssh. For http/s checks -: curl, wget For list of open files -: lsof For modifying attributes of the system kernel -: sysctl In case of distributed systems, some good third party tools can help to execute commands/instructions on many hosts at once, like: SSH based tools ClusterSSH : Cluster ssh can help you run a command in parallel on many hosts at once. Ansible : It allows you to write ansible playbooks which you can run on hundreds/thousands of hosts at the same time. Agent Based tools Saltstack : Is a configuration, state and remote execution framework, provides a wide variety of flexibility to users to execute modules on large numbers of hosts at once. 
Puppet : Is an automated administrative engine for your Linux, Unix, and Windows systems, performs administrative tasks. Log analysis tools These can help in writing SQL type queries for parsing, analysing logs and provide an easy UI interface to create dashboards which can render various types of charts based on defined queries. ELK : Elasticsearch, Logstash and Kibana, provide package of tools and services to allow, parse logs, index logs and analyse logs easily and quickly. Once logs/data is parsed/filtered through logstash and indexed in elasticsearch, one can create dynamic dashboards in Kibana in a matter of minutes. Such provides easy analysis and correlation on application errors/exceptions/warnings. Azure kusto : Azure kusto is a cloud based service similar to Elasticsearch and Kibana, it allows easy indexing of heavy logs, provides SQL type interface for writing queries, and an interface to create dynamic dashboards.","title":"Important Tools"},{"location":"level102/system_troubleshooting_and_performance/important-tools/#important-linux-commands","text":"Having knowledge of following commands will help find issues faster. Elaborating each command in detail is out of scope, please look for man pages or online for more information and examples around the same. For logs parsing -: grep, sed, awk, cut, tail, head For network checks -: nc, netstat, traceroute/6, mtr, ping/6, route, tcpdump, ss, ip For DNS -: dig, host, nslookup For tracing system call -: strace For parallel executions over ssh -: gnu parallel, xargs + ssh. For http/s checks -: curl, wget For list of open files -: lsof For modifying attributes of the system kernel -: sysctl In case of distributed systems, some good third party tools can help to execute commands/instructions on many hosts at once, like: SSH based tools ClusterSSH : Cluster ssh can help you run a command in parallel on many hosts at once. 
Ansible : It allows you to write ansible playbooks which you can run on hundreds/thousands of hosts at the same time. Agent Based tools Saltstack : Is a configuration, state and remote execution framework, provides a wide variety of flexibility to users to execute modules on large numbers of hosts at once. Puppet : Is an automated administrative engine for your Linux, Unix, and Windows systems, performs administrative tasks.","title":"Important linux commands"},{"location":"level102/system_troubleshooting_and_performance/important-tools/#log-analysis-tools","text":"These can help in writing SQL type queries for parsing, analysing logs and provide an easy UI interface to create dashboards which can render various types of charts based on defined queries. ELK : Elasticsearch, Logstash and Kibana, provide package of tools and services to allow, parse logs, index logs and analyse logs easily and quickly. Once logs/data is parsed/filtered through logstash and indexed in elasticsearch, one can create dynamic dashboards in Kibana in a matter of minutes. Such provides easy analysis and correlation on application errors/exceptions/warnings. Azure kusto : Azure kusto is a cloud based service similar to Elasticsearch and Kibana, it allows easy indexing of heavy logs, provides SQL type interface for writing queries, and an interface to create dynamic dashboards.","title":"Log analysis tools"},{"location":"level102/system_troubleshooting_and_performance/introduction/","text":"System troubleshooting and performance improvements Prerequisites Linux Basics System design Basic Networking Metrics and Monitoring What to expect from this course This brief course tries to provide a general introduction on how to troubleshoot system issues, like analysing api failures, resource utilization, network issues, hardware and OS issues. Course also briefs on profiling and benchmarking to measure overall system performance. 
What is not covered under this course This course does not cover the following -: System Design and Architecture. Programming practices. Metrics and Monitoring. OS basics. Course Contents Introduction Troubleshooting Troubleshooting Flowchart General Practices General Host issues Important tools to know Important linux commands Log analysis tools Performance improvements Performance analysis commands Profiling tools Benchmarking Scaling Troubleshooting Example Conclusion Further readings Introduction Troubleshooting is an important part of operations & development. It can\u2019t be learned by just reading one article or completing a course online. It is a continuous learning process, one learns it during :- Daily operations and development. Finding & Fixing application bugs. Finding & Fixing system & network issues. Performance analysis and improvements. And more. From an SRE\u2019s perspective, it is expected that they are aware of certain topics upfront to be able to troubleshoot problems around single or distributed systems. Know your resources well, understand host specifications, like CPU, Memory, Network, Disk etc. Understand system design and architecture. Ensure important metrics are being collected/rendered properly. There was a famous quote by HP founders - \u201cWhat gets measured gets fixed\u201d If system components and performance metrics are captured thoroughly then there is a high chance of successfully troubleshooting an issue at its earliest. Scope There is no common approach to troubleshoot different types of applications or services, as the failure can occur at any layer. We will keep the scope of this work to a web api service type only. Note -: The Linux ecosystem is wide, there are hundreds of tools and utilities which can help with system troubleshooting, and each comes with its own set of benefits and functionalities. We will cover some of the known tools, either already available with Linux or available in the open source world. 
Detailed explanation of mentioned tools in this doc is out of scope, please explore the internet or man pages for more examples and documentation around the same.","title":"Introduction"},{"location":"level102/system_troubleshooting_and_performance/introduction/#system-troubleshooting-and-performance-improvements","text":"","title":"System troubleshooting and performance improvements"},{"location":"level102/system_troubleshooting_and_performance/introduction/#prerequisites","text":"Linux Basics System design Basic Networking Metrics and Monitoring","title":"Prerequisites"},{"location":"level102/system_troubleshooting_and_performance/introduction/#what-to-expect-from-this-course","text":"This brief course tries to provide a general introduction on how to troubleshoot system issues, like analysing api failures, resource utilization, network issues, hardware and OS issues. Course also briefs on profiling and benchmarking to measure overall system performance.","title":"What to expect from this course"},{"location":"level102/system_troubleshooting_and_performance/introduction/#what-is-not-covered-under-this-course","text":"This course does not cover following -: System Design and Architecture. Programming practices. Metrics and Monitoring. OS basics.","title":"What is not covered under this course"},{"location":"level102/system_troubleshooting_and_performance/introduction/#course-contents","text":"Introduction Troubleshooting Troubleshooting Flowchart General Practices General Host issues Important tools to know Important linux commands Log analysis tools Performance improvements Performance analysis commands Profiling tools Benchmarking Scaling Troubleshooting Example Conclusion Further readings","title":"Course Contents"},{"location":"level102/system_troubleshooting_and_performance/introduction/#introduction","text":"Troubleshooting is an important part of operations & development. 
It can\u2019t be learned by just reading one article or completing a course online. It is a continuous learning process, one learns it during :- Daily operations and development. Finding & Fixing application bugs. Finding & Fixing system & network issues. Performance analysis and improvements. And more. From an SRE\u2019s perspective, it is expected that they are aware of certain topics upfront to be able to troubleshoot problems around single or distributed systems. Know your resources well, understand host specifications, like CPU, Memory, Network, Disk etc. Understand system design and architecture. Ensure important metrics are being collected/rendered properly. There was a famous quote by HP founders - \u201cWhat gets measured gets fixed\u201d If system components and performance metrics are captured thoroughly then there is a high chance of successfully troubleshooting an issue at its earliest.","title":"Introduction"},{"location":"level102/system_troubleshooting_and_performance/introduction/#scope","text":"There is no common approach to troubleshoot different types of applications or services, as the failure can occur at any layer. We will keep the scope of this work to a web api service type only. Note -: The Linux ecosystem is wide, there are hundreds of tools and utilities which can help with system troubleshooting, and each comes with its own set of benefits and functionalities. We will cover some of the known tools, either already available with Linux or available in the open source world. Detailed explanation of mentioned tools in this doc is out of scope, please explore the internet or man pages for more examples and documentation around the same.","title":"Scope"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/","text":"Performance tools are an important part of the development/operations lifecycle, and they are highly important for understanding application behavior. 
SREs generally use such tools to evaluate how well a service will perform and make/suggest improvements accordingly. Performance analysis commands Most of these commands are a must to know for doing performance analysis of a system or service. top -: Shows a real-time view of the running system, processes, threads etc. htop -: Similar to the top command, but a bit more interactive than it. iotop -: An interactive disk I/O monitoring tool. vmstat -: Virtual memory statistics explorer. iostat -: Monitoring tool for input/output statistics for devices and partitions. free -: Tells info about physical memory and swap memory. sar -: System activity report, reports different metrics such as cpu, disk, mem, network, etc. mpstat -: Displays info about CPU utilization and performance. lsof -: Provides info about the list of open files and which processes opened them. perf -: Performance analysing tool. Profiling tools Profiling is an important part of performance analysis of the service. There are various profiler tools available, which can help figure out the most frequent code-paths, and help with debugging, memory profiling, etc. These can generate the heatmap to understand the code performance when under load. FlameGraph : Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately. Valgrind : It is a programming tool for memory debugging, memory leak detection, and profiling. Gprof : GNU profiler tool uses a hybrid of instrumentation and sampling. Instrumentation is used to collect function call information, and sampling is used to gather runtime profiling information. To know how LinkedIn performs On-Demand Profiling on its services, read the LinkedIn blog ODP: An Infrastructure for On-Demand Service Profiling Benchmarking It is a process of measuring the best performance of the service, such as how much QPS the service can handle, its latency when load is increasing, host resource utilization, loadavg etc. 
Regression testing (i.e., load testing) is a must before deploying the service to production. Some well-known tools -: Apache Benchmark Tool, ab : It simulates a high load on a webapp and gathers data for analysis. Httperf : It sends requests to the web server at a specified rate and gathers stats. Increase the rate until you find the saturation point. Apache JMeter : It is a popular open-source tool to measure web application performance. JMeter is a Java-based application, and it can be used not only against web servers but also against PHP, Java, REST, etc. Wrk : It is another modern performance measurement tool that puts load on your web server and reports latency, requests per second, transfer per second, etc. Locust : Easy to use, scriptable and scalable performance testing tool. Limitation -: The above tools help with synthetic load or stress testing, but they do not measure actual end-user experience. They can\u2019t see how end-user resources, such as a lack of memory or CPU or poor connectivity to the internet, affect application performance. To learn how LinkedIn performs load testing across its fleet, read: Eliminating toil with fully automated load testing To learn how LinkedIn makes use of Real User Monitoring (RUM) data to overcome the limitations of load testing and help improve the overall experience for end users, read: Monitor and Improve Web Performance Using RUM Data Visualization Scaling An optimally designed system can perform only up to a certain limit, based on the availability of resources. Continuous optimization is always needed to ensure optimum use of resources at peak load. With increasing QPS, systems need to scale. We can scale either vertically or horizontally. Vertical scalability has its limits, as one can increase CPU, memory, disk, GPU and other specifications only up to a certain limit, whereas horizontal scalability can grow much further, subject to limitations imposed by application design and environment attributes. 
Scaling a web application will require some or all of the following -: Ease the server load by adding more hosts. Distribute the traffic across servers by using load balancers. Scale up the DB by sharding the data and increasing read replicas. Here\u2019s a good read on how LinkedIn scaled its application stack: A Brief History of Scaling LinkedIn","title":"Performance Improvements"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#performance-analysis-commands","text":"Most of these commands are must-know for doing performance analysis of a system or service. top -: Shows a real-time view of the running system, processes, threads, etc. htop -: Similar to the top command, but a bit more interactive. iotop -: An interactive disk I/O monitoring tool. vmstat -: Virtual memory statistics explorer. iostat -: Monitoring tool for input/output statistics for devices and partitions. free -: Shows info about physical memory and swap memory. sar -: System activity report; reports different metrics such as CPU, disk, memory, network, etc. mpstat -: Displays info about CPU utilization and performance. lsof -: Lists open files and the processes that opened them. perf -: Performance analysis tool.","title":"Performance analysis commands"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#profiling-tools","text":"Profiling is an important part of performance analysis of a service. There are various profiling tools available, which can help with finding the most frequent code paths, debugging, memory profiling, etc. These can generate visualizations such as heatmaps to understand code performance under load. FlameGraph : Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately. Valgrind : It is a programming tool for memory debugging, memory leak detection, and profiling. Gprof : The GNU profiler uses a hybrid of instrumentation and sampling. 
Instrumentation is used to collect function call information, and sampling is used to gather runtime profiling information. To learn how LinkedIn performs on-demand profiling of its services, read the LinkedIn blog post ODP: An Infrastructure for On-Demand Service Profiling","title":"Profiling tools"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#benchmarking","text":"It is the process of measuring the best performance of a service, such as how much QPS the service can handle, its latency as load increases, host resource utilization, load average, etc. Regression testing (i.e., load testing) is a must before deploying the service to production. Some well-known tools -: Apache Benchmark Tool, ab : It simulates a high load on a webapp and gathers data for analysis. Httperf : It sends requests to the web server at a specified rate and gathers stats. Increase the rate until you find the saturation point. Apache JMeter : It is a popular open-source tool to measure web application performance. JMeter is a Java-based application, and it can be used not only against web servers but also against PHP, Java, REST, etc. Wrk : It is another modern performance measurement tool that puts load on your web server and reports latency, requests per second, transfer per second, etc. Locust : Easy to use, scriptable and scalable performance testing tool. Limitation -: The above tools help with synthetic load or stress testing, but they do not measure actual end-user experience. They can\u2019t see how end-user resources, such as a lack of memory or CPU or poor connectivity to the internet, affect application performance. To learn how LinkedIn performs load testing across its fleet, read: Eliminating toil with fully automated load testing LinkedIn also makes use of Real User Monitoring (RUM) data to overcome the limitations of load testing and help improve the overall experience for end users. 
Read : Monitor and Improve Web Performance Using RUM Data Visualization","title":"Benchmarking"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#scaling","text":"An optimally designed system can perform only up to a certain limit, based on the availability of resources. Continuous optimization is always needed to ensure optimum use of resources at peak load. With increasing QPS, systems need to scale. We can scale either vertically or horizontally. Vertical scalability has its limits, as one can increase CPU, memory, disk, GPU and other specifications only up to a certain limit, whereas horizontal scalability can grow much further, subject to limitations imposed by application design and environment attributes. Scaling a web application will require some or all of the following -: Ease the server load by adding more hosts. Distribute the traffic across servers by using load balancers. Scale up the DB by sharding the data and increasing read replicas. Here\u2019s a good read on how LinkedIn scaled its application stack: A Brief History of Scaling LinkedIn","title":"Scaling"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting-example/","text":"In this section we will look at an example issue and troubleshoot it. At the end, a few famous troubleshooting stories previously shared by LinkedIn engineers are listed. Example - Memory leak : Memory leak issues often go unnoticed until the service becomes unresponsive after running for some time (days, weeks or even months), and they persist until the service is restarted or the bug is fixed. In such cases, the service\u2019s memory usage shows up as a steadily increasing line in the metric graph, something like this graph. A memory leak is mismanagement of memory allocations by an application, where unneeded memory is not released. Over time, objects continue to pile up in memory, resulting in a service crash. 
Generally such unreleased objects are cleaned up by the garbage collector automatically, but sometimes, due to a bug, that fails. Debugging helps in figuring out where most of the application\u2019s memory is being used. Then you start tracking and filtering everything based on usage. If you find objects that aren\u2019t in use but are still referenced, you can get rid of them by deleting those references to avoid memory leaks. Python comes with built-in modules like tracemalloc . This module can help pinpoint where an object was first allocated. Almost every language comes with a set of tools/libraries (built-in or external) that help find memory issues. Similarly, for Java there is a well-known memory leak detection tool called Java VisualVM . Let\u2019s look at a dummy Flask-based web app with a memory leak bug, where memory usage keeps increasing with every request, and see how we can use tracemalloc to capture the leak. Assumption -: A Python virtual environment is created, and Flask is installed in it. A bare-minimum Flask app with a bug; read the comments for more info Starting the Flask app On start, its memory usage is around 26576 KB, i.e. approx. 26 MB Now with every subsequent GET request, we can see that the process memory usage continues to increase slowly. Now let\u2019s try 10000 requests to see if memory usage increases heavily. To send a high number of requests, we use the Apache benchmarking tool called \u201cab\u201d . After 10000 hits, we can see that the memory usage of the Flask app has jumped more than 15 times, from the initial 26576 KB to 419316 KB, i.e. from roughly 26 MB to 419 MB . That\u2019s a huge jump for such a small webapp. Let\u2019s use the Python tracemalloc module to understand the application\u2019s memory allocations. Tracemalloc takes memory snapshots at particular points and computes various statistics on them. 
We add a bare minimum of code to our app.py file (no change to the fetchuserdata.py file) that allows us to capture tracemalloc snapshots whenever we hit the /capture URI. After restarting app.py (flask run) , we will - First hit http://127.0.0.1:5000/capture - Then hit http://127.0.0.1:5000/ 10000 times, for the memory leak to take place. - Finally hit http://127.0.0.1:5000/capture again to take a snapshot and find out which line has the most allocations. In the final snapshot, we can see the exact module and line number where most of the allocation happened, i.e. fetchuserdata.py, line 6, which after 10000 hits is holding 419 MB of memory. Summary The above example shows how a bug can lead to a memory leak, and how we can use tracemalloc to find where it is. Real-world applications are far more complex than the above dummy example, and you must understand that using tracemalloc might degrade application performance somewhat, due to tracemalloc\u2019s own overhead. Be mindful about its use in production environments. If you are interested in digging deeper into Python object memory allocation internals and debugging memory leaks, have a look at an interesting talk by Sanket Patel at PyCon India 2019, Debug Memory Leak In Python Flask | Python Object Memory Allocation Internals","title":"Troubleshooting Example"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting-example/#example-memory-leak","text":"Memory leak issues often go unnoticed until the service becomes unresponsive after running for some time (days, weeks or even months), and they persist until the service is restarted or the bug is fixed. In such cases, the service\u2019s memory usage shows up as a steadily increasing line in the metric graph, something like this graph. A memory leak is mismanagement of memory allocations by an application, where unneeded memory is not released. Over time, objects continue to pile up in memory, resulting in a service crash. 
Generally such unreleased objects are cleaned up by the garbage collector automatically, but sometimes, due to a bug, that fails. Debugging helps in figuring out where most of the application\u2019s memory is being used. Then you start tracking and filtering everything based on usage. If you find objects that aren\u2019t in use but are still referenced, you can get rid of them by deleting those references to avoid memory leaks. Python comes with built-in modules like tracemalloc . This module can help pinpoint where an object was first allocated. Almost every language comes with a set of tools/libraries (built-in or external) that help find memory issues. Similarly, for Java there is a well-known memory leak detection tool called Java VisualVM . Let\u2019s look at a dummy Flask-based web app with a memory leak bug, where memory usage keeps increasing with every request, and see how we can use tracemalloc to capture the leak. Assumption -: A Python virtual environment is created, and Flask is installed in it. A bare-minimum Flask app with a bug; read the comments for more info Starting the Flask app On start, its memory usage is around 26576 KB, i.e. approx. 26 MB Now with every subsequent GET request, we can see that the process memory usage continues to increase slowly. Now let\u2019s try 10000 requests to see if memory usage increases heavily. To send a high number of requests, we use the Apache benchmarking tool called \u201cab\u201d . After 10000 hits, we can see that the memory usage of the Flask app has jumped more than 15 times, from the initial 26576 KB to 419316 KB, i.e. from roughly 26 MB to 419 MB . That\u2019s a huge jump for such a small webapp. Let\u2019s use the Python tracemalloc module to understand the application\u2019s memory allocations. Tracemalloc takes memory snapshots at particular points and computes various statistics on them. 
We add a bare minimum of code to our app.py file (no change to the fetchuserdata.py file) that allows us to capture tracemalloc snapshots whenever we hit the /capture URI. After restarting app.py (flask run) , we will - First hit http://127.0.0.1:5000/capture - Then hit http://127.0.0.1:5000/ 10000 times, for the memory leak to take place. - Finally hit http://127.0.0.1:5000/capture again to take a snapshot and find out which line has the most allocations. In the final snapshot, we can see the exact module and line number where most of the allocation happened, i.e. fetchuserdata.py, line 6, which after 10000 hits is holding 419 MB of memory. Summary The above example shows how a bug can lead to a memory leak, and how we can use tracemalloc to find where it is. Real-world applications are far more complex than the above dummy example, and you must understand that using tracemalloc might degrade application performance somewhat, due to tracemalloc\u2019s own overhead. Be mindful about its use in production environments. If you are interested in digging deeper into Python object memory allocation internals and debugging memory leaks, have a look at an interesting talk by Sanket Patel at PyCon India 2019, Debug Memory Leak In Python Flask | Python Object Memory Allocation Internals","title":"Example - Memory leak :"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/","text":"Troubleshooting system failures can be tricky or tedious at times. In practice we need to examine the end-to-end flow of a service and all its downstreams, analysing logs, memory leaks, CPU usage, disk I/O, network failures, host issues, etc. Knowing certain practices and tools can help identify and mitigate failures faster. Here\u2019s the high-level troubleshooting flowchart -: Troubleshooting Flowchart General Practices Different systems require different approaches for finding issues. The scope of this section is limited; for a given problem, there can be many more points worth looking into. 
The following points cover some high-level practices for finding webapp failures and their fixes. Reproduce problem Try the broken request to reproduce the issue, e.g. send the HTTP/S request that fails. Check the end-to-end flow of the request and look for return codes, mostly 3xx, 4xx or 5xx . 3xx codes are mostly about redirections, 4xx codes are about client-side errors such as unauthorized, bad request, forbidden, etc., and 5xx codes are mostly about server-side issues. Based on the return code you can decide on the next step. Client-side issues are mainly about missing or buggy static content, like JavaScript issues, bad images, broken JSON from an async call, etc. These can result in incorrect page rendering in browsers. Gather Information Look for errors/exceptions in application logs, like \"Can\u2019t Allocate Memory\" or OutOfMemoryError, or something like \"disk I/O error\", or a DNS resolution error. Check application and host metrics, and look for anomalies in service and host graphs: since when has CPU usage increased, since when has memory usage increased, since when has disk space reduced or disk I/O increased, when did the load average start shooting up, etc. Please read the School of SRE link for more detail around metrics and monitoring . Look for recent code or config changes which are possibly breaking the system. Understand the problem Try correlating the gathered data with recent actions, like an exception showing up in logs after a config/code deployment. Is it due to a QPS increase? Is it bad SQL queries? Do recent code changes demand better or more hardware? Find a solution and apply a fix Based on the above findings, look for a quick fix if possible, for example rolling back changes if the errors/exceptions correlate with them. Try patching or hotfixing the code, preferably in a staging setup first, if you want to fix forward. Try to scale up the system: if high QPS is the reason for the system failure, add resources (compute, storage, memory, etc.) as necessary. Optimize SQL queries if needed. 
Verify complete request flow Send the requests again and ensure the responses are successful (return code 2xx). Check the logs to ensure the exceptions/errors found earlier no longer occur. Ensure metrics are back to normal. General Host issues To know whether host health is fine, look for any hardware failures or performance issues. One can try the following -: dmesg -: Shows recent errors/failures thrown by the kernel. This helps in identifying hardware failures, if any. ls commands -: lspci, lsblk, lscpu, lsscsi. These commands list PCI, block device, CPU and SCSI information. /var/log/messages -: Shows system app/service related errors/warnings, and also shows kernel issues. smartd -: Checks disk health.","title":"Troubleshooting"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/#troubleshooting-flowchart","text":"","title":"Troubleshooting Flowchart"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/#general-practices","text":"Different systems require different approaches for finding issues. The scope of this section is limited; for a given problem, there can be many more points worth looking into. The following points cover some high-level practices for finding webapp failures and their fixes. Reproduce problem Try the broken request to reproduce the issue, e.g. send the HTTP/S request that fails. Check the end-to-end flow of the request and look for return codes, mostly 3xx, 4xx or 5xx . 3xx codes are mostly about redirections, 4xx codes are about client-side errors such as unauthorized, bad request, forbidden, etc., and 5xx codes are mostly about server-side issues. Based on the return code you can decide on the next step. Client-side issues are mainly about missing or buggy static content, like JavaScript issues, bad images, broken JSON from an async call, etc. These can result in incorrect page rendering in browsers. 
Gather Information Look for errors/exceptions in application logs, like \"Can\u2019t Allocate Memory\" or OutOfMemoryError, or something like \"disk I/O error\", or a DNS resolution error. Check application and host metrics, and look for anomalies in service and host graphs: since when has CPU usage increased, since when has memory usage increased, since when has disk space reduced or disk I/O increased, when did the load average start shooting up, etc. Please read the School of SRE link for more detail around metrics and monitoring . Look for recent code or config changes which are possibly breaking the system. Understand the problem Try correlating the gathered data with recent actions, like an exception showing up in logs after a config/code deployment. Is it due to a QPS increase? Is it bad SQL queries? Do recent code changes demand better or more hardware? Find a solution and apply a fix Based on the above findings, look for a quick fix if possible, for example rolling back changes if the errors/exceptions correlate with them. Try patching or hotfixing the code, preferably in a staging setup first, if you want to fix forward. Try to scale up the system: if high QPS is the reason for the system failure, add resources (compute, storage, memory, etc.) as necessary. Optimize SQL queries if needed. Verify complete request flow Send the requests again and ensure the responses are successful (return code 2xx). Check the logs to ensure the exceptions/errors found earlier no longer occur. Ensure metrics are back to normal.","title":"General Practices"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/#general-host-issues","text":"To know whether host health is fine, look for any hardware failures or performance issues. One can try the following -: dmesg -: Shows recent errors/failures thrown by the kernel. This helps in identifying hardware failures, if any. ls commands -: lspci, lsblk, lscpu, lsscsi. These commands list PCI, block device, CPU and SCSI information. 
/var/log/messages -: Shows system app/service related errors/warnings, and also shows kernel issues. smartd -: Checks disk health.","title":"General Host issues"}]} \ No newline at end of file +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"School of SRE Site Reliability Engineers (SREs) sit at the intersection of software engineering and systems engineering. While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these systems are proprietary, 3rd party, open systems, run on cloud/on-prem infrastructure, etc. Particularly important is to gain a deep understanding of how these areas of systems and infrastructure relate to each other and interact with each other. The combination of software and systems engineering skills is rare and is generally built over time with exposure to a wide variety of infrastructure, systems, and software. SREs bring in engineering practices to keep the site up. Each distributed system is an agglomeration of many components. SREs validate business requirements, convert them to SLAs for each of the components that constitute the distributed system, monitor and measure adherence to SLAs, re-architect or scale out to mitigate or avoid SLA breaches, add these learnings as feedback to new systems or projects and thereby reduce operational toil. Hence SREs play a vital role right from the day 0 design of the system. In early 2019, we started visiting campuses across India to recruit the best and brightest minds to make sure LinkedIn, and all the services that make up its complex technology stack are always available for everyone. 
This critical function at LinkedIn falls under the purview of the Site Engineering team and Site Reliability Engineers (SREs) who are Software Engineers, specialized in reliability. As we continued on this journey we started getting a lot of questions from these campuses on what exactly the site reliability engineering role entails? And, how could someone learn the skills and the disciplines involved to become a successful site reliability engineer? Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team; we also had a few lateral hires who joined our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the Site Engineering team. There are very few resources out there guiding someone on the basic skill sets one has to acquire as a beginner SRE. Because of the lack of these resources, we felt that individuals have a tough time getting into open positions in the industry. We created the School Of SRE as a starting point for anyone wanting to build their career as an SRE. In this course, we are focusing on building strong foundational skills. The course is structured in a way to provide more real life examples and how learning each of these topics can play an important role in day to day job responsibilities of an SRE. 
Currently we are covering the following topics under the School Of SRE: Level 101 Fundamentals Series Linux Basics Git Linux Networking Python and Web Data Relational databases(MySQL) NoSQL concepts Big Data Systems Design Metrics and Monitoring Security Level 102 Linux Intermediate Linux Advanced Containers and orchestration System Calls and Signals Networking System Design System troubleshooting and performance improvements Continuous Integration and Continuous Delivery We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references that could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer. At LinkedIn, we are using this curriculum for onboarding our non-traditional hires and new college grads into the SRE role. We had multiple rounds of successful onboarding experiences with new employees and the course helped them be productive in a very short period of time. This motivated us to open source the content for helping other organizations in onboarding new engineers into the role and provide guidance for aspiring individuals to get into the role. We realize that the initial content we created is just a starting point and we hope that the community can help in the journey of refining and expanding the content. Check out the contributing guide to get started.","title":"Home"},{"location":"#school-of-sre","text":"Site Reliability Engineers (SREs) sit at the intersection of software engineering and systems engineering. 
While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these systems are proprietary, 3rd party, open systems, run on cloud/on-prem infrastructure, etc. Particularly important is to gain a deep understanding of how these areas of systems and infrastructure relate to each other and interact with each other. The combination of software and systems engineering skills is rare and is generally built over time with exposure to a wide variety of infrastructure, systems, and software. SREs bring in engineering practices to keep the site up. Each distributed system is an agglomeration of many components. SREs validate business requirements, convert them to SLAs for each of the components that constitute the distributed system, monitor and measure adherence to SLAs, re-architect or scale out to mitigate or avoid SLA breaches, add these learnings as feedback to new systems or projects and thereby reduce operational toil. Hence SREs play a vital role right from the day 0 design of the system. In early 2019, we started visiting campuses across India to recruit the best and brightest minds to make sure LinkedIn, and all the services that make up its complex technology stack are always available for everyone. This critical function at LinkedIn falls under the purview of the Site Engineering team and Site Reliability Engineers (SREs) who are Software Engineers, specialized in reliability. As we continued on this journey we started getting a lot of questions from these campuses on what exactly the site reliability engineering role entails? And, how could someone learn the skills and the disciplines involved to become a successful site reliability engineer? 
Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team; we also had a few lateral hires who joined our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the Site Engineering team. There are very few resources out there guiding someone on the basic skill sets one has to acquire as a beginner SRE. Because of the lack of these resources, we felt that individuals have a tough time getting into open positions in the industry. We created the School Of SRE as a starting point for anyone wanting to build their career as an SRE. In this course, we are focusing on building strong foundational skills. The course is structured in a way to provide more real life examples and how learning each of these topics can play an important role in day to day job responsibilities of an SRE. Currently we are covering the following topics under the School Of SRE: Level 101 Fundamentals Series Linux Basics Git Linux Networking Python and Web Data Relational databases(MySQL) NoSQL concepts Big Data Systems Design Metrics and Monitoring Security Level 102 Linux Intermediate Linux Advanced Containers and orchestration System Calls and Signals Networking System Design System troubleshooting and performance improvements Continuous Integration and Continuous Delivery We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references that could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer. At LinkedIn, we are using this curriculum for onboarding our non-traditional hires and new college grads into the SRE role. 
We had multiple rounds of successful onboarding experiences with new employees and the course helped them be productive in a very short period of time. This motivated us to open source the content for helping other organizations in onboarding new engineers into the role and provide guidance for aspiring individuals to get into the role. We realize that the initial content we created is just a starting point and we hope that the community can help in the journey of refining and expanding the content. Check out the contributing guide to get started.","title":"School of SRE"},{"location":"CODE_OF_CONDUCT/","text":"This code of conduct outlines expectations for participation in LinkedIn-managed open source communities, as well as steps for reporting unacceptable behavior. We are committed to providing a welcoming and inspiring community for all. People violating this code of conduct may be banned from the community. Our open source communities strive to: Be friendly and patient: Remember you might not be communicating in someone else's primary spoken or programming language, and others may not have your level of understanding. Be welcoming: Our communities welcome and support people of all backgrounds and identities. This includes, but is not limited to members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability. Be respectful: We are a world-wide community of professionals, and we conduct ourselves professionally. Disagreement is no excuse for poor behavior and poor manners. Disrespectful and unacceptable behavior includes, but is not limited to: Violent threats or language. Discriminatory or derogatory jokes and language. Posting sexually explicit or violent material. Posting, or threatening to post, people's personally identifying information (\"doxing\"). 
Insults, especially those using discriminatory terms or slurs. Behavior that could be perceived as sexual attention. Advocating for or encouraging any of the above behaviors. Understand disagreements: Disagreements, both social and technical, are useful learning opportunities. Seek to understand the other viewpoints and resolve differences constructively. This code is not exhaustive or complete. It serves to capture our common understanding of a productive, collaborative environment. We expect the code to be followed in spirit as much as in the letter. Scope This code of conduct applies to all repos and communities for LinkedIn-managed open source projects regardless of whether or not the repo explicitly calls out its use of this code. The code also applies in public spaces when an individual is representing a project or its community. Examples include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. Note: Some LinkedIn-managed communities have codes of conduct that pre-date this document and issue resolution process. While communities are not required to change their code, they are expected to use the resolution process outlined here. The review team will coordinate with the communities involved to address your concerns. Reporting Code of Conduct Issues We encourage all communities to resolve issues on their own whenever possible. This builds a broader and deeper understanding and ultimately a healthier interaction. In the event that an issue cannot be resolved locally, please feel free to report your concerns by contacting oss@linkedin.com . In your report please include: Your contact information. Names (real, usernames or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well. 
Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public chat log), please include a link or attachment. Any additional information that may be helpful. All reports will be reviewed by a multi-person team and will result in a response that is deemed necessary and appropriate to the circumstances. Where additional perspectives are needed, the team may seek insight from others with relevant expertise or experience. The confidentiality of the person reporting the incident will be kept at all times. Involved parties are never part of the review team. Anyone asked to stop unacceptable behavior is expected to comply immediately. If an individual engages in unacceptable behavior, the review team may take any action they deem appropriate, including a permanent ban from the community. This code of conduct is based on the Microsoft Open Source Code of Conduct which was based on the template established by the TODO Group and used by numerous other large communities (e.g., Facebook , Yahoo , Twitter , GitHub ) and the Scope section from the Contributor Covenant version 1.4 .","title":"Code of Conduct"},{"location":"CODE_OF_CONDUCT/#scope","text":"This code of conduct applies to all repos and communities for LinkedIn-managed open source projects regardless of whether or not the repo explicitly calls out its use of this code. The code also applies in public spaces when an individual is representing a project or its community. Examples include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. Note: Some LinkedIn-managed communities have codes of conduct that pre-date this document and issue resolution process. 
While communities are not required to change their code, they are expected to use the resolution process outlined here. The review team will coordinate with the communities involved to address your concerns.","title":"Scope"},{"location":"CODE_OF_CONDUCT/#reporting-code-of-conduct-issues","text":"We encourage all communities to resolve issues on their own whenever possible. This builds a broader and deeper understanding and ultimately a healthier interaction. In the event that an issue cannot be resolved locally, please feel free to report your concerns by contacting oss@linkedin.com . In your report please include: Your contact information. Names (real, usernames or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well. Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public chat log), please include a link or attachment. Any additional information that may be helpful. All reports will be reviewed by a multi-person team and will result in a response that is deemed necessary and appropriate to the circumstances. Where additional perspectives are needed, the team may seek insight from others with relevant expertise or experience. The confidentiality of the person reporting the incident will be kept at all times. Involved parties are never part of the review team. Anyone asked to stop unacceptable behavior is expected to comply immediately. If an individual engages in unacceptable behavior, the review team may take any action they deem appropriate, including a permanent ban from the community. 
This code of conduct is based on the Microsoft Open Source Code of Conduct which was based on the template established by the TODO Group and used by numerous other large communities (e.g., Facebook , Yahoo , Twitter , GitHub ) and the Scope section from the Contributor Covenant version 1.4 .","title":"Reporting Code of Conduct Issues"},{"location":"CONTRIBUTING/","text":"We realise that the initial content we created is just a starting point and our hope is that the community can help in the journey of refining and extending the content. As a contributor, you represent that the content you submit is not plagiarised. By submitting the content, you (and, if applicable, your employer) are licensing the submitted content to LinkedIn and the open source community subject to the Creative Commons Attribution 4.0 International Public License. Repository URL : https://github.com/linkedin/school-of-sre Contributing Guidelines Ensure that you adhere to the following guidelines: Should be about principles and concepts that can be applied in any company or individual project. Do not focus on particular tools or tech stacks (which usually change over time). Adhere to the Code of Conduct . Should be relevant to the roles and responsibilities of an SRE. Should be locally tested (see steps for testing) and well formatted. It is good practice to open an issue first and discuss your changes before submitting a pull request. This way, you can incorporate ideas from others before you even start. Building and testing locally Run the following commands to build and view the site locally before opening a PR. python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt mkdocs build mkdocs serve Opening a PR Follow the GitHub PR workflow for your contributions. 
Fork this repo, create a feature branch, commit your changes and open a PR to this repo.","title":"Contribute"},{"location":"CONTRIBUTING/#contributing-guidelines","text":"Ensure that you adhere to the following guidelines: Should be about principles and concepts that can be applied in any company or individual project. Do not focus on particular tools or tech stacks (which usually change over time). Adhere to the Code of Conduct . Should be relevant to the roles and responsibilities of an SRE. Should be locally tested (see steps for testing) and well formatted. It is good practice to open an issue first and discuss your changes before submitting a pull request. This way, you can incorporate ideas from others before you even start.","title":"Contributing Guidelines"},{"location":"CONTRIBUTING/#building-and-testing-locally","text":"Run the following commands to build and view the site locally before opening a PR. python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt mkdocs build mkdocs serve","title":"Building and testing locally"},{"location":"CONTRIBUTING/#opening-a-pr","text":"Follow the GitHub PR workflow for your contributions. Fork this repo, create a feature branch, commit your changes and open a PR to this repo.","title":"Opening a PR"},{"location":"sre_community/","text":"We have an active LinkedIn community for School of SRE. Please join the group via: https://www.linkedin.com/groups/12493545/ The group has members with different levels of experience in site reliability engineering. There are active conversations on different technical topics centered around site reliability engineering. 
We encourage everyone to join the conversation and learn from each other and build a successful career in the SRE space.","title":"SRE Community"},{"location":"level101/big_data/evolution/","text":"Evolution of Hadoop Architecture of Hadoop HDFS The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS is part of the Apache Hadoop Core project . The main components of HDFS include: 1. NameNode: is the arbitrator and central repository of file namespace in the cluster. The NameNode executes operations such as opening, closing, and renaming files and directories. 2. DataNode: manages the storage attached to the node on which it runs. It is responsible for serving all the read and write requests. It performs operations such as creation, deletion, and replication of blocks on instructions from the NameNode. 3. Client: Responsible for getting the required metadata from the NameNode and then communicating with the DataNodes for reads and writes. YARN YARN stands for \u201cYet Another Resource Negotiator\u201d. It was introduced in Hadoop 2.0 to remove the bottleneck on Job Tracker which was present in Hadoop 1.0. YARN was described as a \u201cRedesigned Resource Manager\u201d at the time of its launch, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing. The main components of YARN architecture include: 1. Client: It submits map-reduce (MR) jobs to the resource manager. 2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. 
Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components: 1. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources. 2. Application manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Master container on failure. 3. Node Manager: It takes care of individual nodes on the Hadoop cluster and manages the applications and workflow on that particular node. Its primary job is to keep up with the Resource Manager. It monitors resource usage, performs log management, and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it at the request of the Application Master. 4. Application Master: An application is a single job submitted to a framework. The application master is responsible for negotiating resources with the resource manager, tracking the status, and monitoring the progress of a single application. The application master requests the container from the node manager by sending a Container Launch Context (CLC) which includes everything an application needs to run. Once the application is started, it sends the health report to the resource manager from time to time. 5. Container: It is a collection of physical resources such as RAM, CPU cores, and disk on a single node. The containers are invoked by Container Launch Context (CLC) which is a record that contains information such as environment variables, security tokens, dependencies, etc. 
MapReduce framework The term MapReduce represents two separate and distinct tasks Hadoop programs perform: the Map job and the Reduce job. Map jobs take data sets as input and process them to produce key-value pairs. The Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired results. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce splits the input data set into a number of parts and runs a program on all the parts in parallel. The word count example below demonstrates the usage of the MapReduce framework: Other tooling around Hadoop Hive Uses a language called HQL which is very SQL-like. Gives non-programmers the ability to query and analyze data in Hadoop. Is basically an abstraction layer on top of map-reduce. Ex. HQL query: SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name); In MySQL: SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name; Pig Uses a scripting language called Pig Latin, which is more workflow driven. You don't need to be an expert Java programmer, but some coding skills are required. Is also an abstraction layer on top of map-reduce. Here is a quick question for you: What is the output of running the Pig queries in the right column against the data present in the left column in the below image? Output: 7,Komal,Nayak,24,9848022334,trivendram 8,Bharathi,Nambiayar,24,9848022333,Chennai 5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 6,Archana,Mishra,23,9848022335,Chennai Spark Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster\u2019s memory and query it repeatedly, making it well suited to machine learning algorithms. Presto Presto is a high performance, distributed SQL query engine for Big Data. 
Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. Example Presto query: use studentDB; show tables; SELECT roll_no, name FROM studentDB.studentDetails where section='A' limit 5; Data Serialisation and storage In order to transport data over the network or to store it on persistent storage, we translate data structures or object state into binary or textual form. We call this process serialization. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file. Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.","title":"Evolution and Architecture of Hadoop"},{"location":"level101/big_data/evolution/#evolution-of-hadoop","text":"","title":"Evolution of Hadoop"},{"location":"level101/big_data/evolution/#architecture-of-hadoop","text":"HDFS The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS is part of the Apache Hadoop Core project . The main components of HDFS include: 1. NameNode: is the arbitrator and central repository of file namespace in the cluster. The NameNode executes operations such as opening, closing, and renaming files and directories. 2. DataNode: manages the storage attached to the node on which it runs. It is responsible for serving all the read and write requests. It performs operations such as creation, deletion, and replication of blocks on instructions from the NameNode. 3. 
Client: Responsible for getting the required metadata from the NameNode and then communicating with the DataNodes for reads and writes. YARN YARN stands for \u201cYet Another Resource Negotiator\u201d. It was introduced in Hadoop 2.0 to remove the bottleneck on Job Tracker which was present in Hadoop 1.0. YARN was described as a \u201cRedesigned Resource Manager\u201d at the time of its launch, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing. The main components of YARN architecture include: 1. Client: It submits map-reduce (MR) jobs to the resource manager. 2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components: 1. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources. 2. Application manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Master container on failure. 3. Node Manager: It takes care of individual nodes on the Hadoop cluster and manages the applications and workflow on that particular node. Its primary job is to keep up with the Resource Manager. It monitors resource usage, performs log management, and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it at the request of the Application Master. 4. 
Application Master: An application is a single job submitted to a framework. The application master is responsible for negotiating resources with the resource manager, tracking the status, and monitoring the progress of a single application. The application master requests the container from the node manager by sending a Container Launch Context (CLC) which includes everything an application needs to run. Once the application is started, it sends the health report to the resource manager from time to time. 5. Container: It is a collection of physical resources such as RAM, CPU cores, and disk on a single node. The containers are invoked by Container Launch Context (CLC) which is a record that contains information such as environment variables, security tokens, dependencies, etc.","title":"Architecture of Hadoop"},{"location":"level101/big_data/evolution/#mapreduce-framework","text":"The term MapReduce represents two separate and distinct tasks Hadoop programs perform: the Map job and the Reduce job. Map jobs take data sets as input and process them to produce key-value pairs. The Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired results. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce splits the input data set into a number of parts and runs a program on all the parts in parallel. The word count example below demonstrates the usage of the MapReduce framework:","title":"MapReduce framework"},{"location":"level101/big_data/evolution/#other-tooling-around-hadoop","text":"Hive Uses a language called HQL which is very SQL-like. Gives non-programmers the ability to query and analyze data in Hadoop. Is basically an abstraction layer on top of map-reduce. Ex. 
HQL query: SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name); In MySQL: SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name; Pig Uses a scripting language called Pig Latin, which is more workflow driven. You don't need to be an expert Java programmer, but some coding skills are required. Is also an abstraction layer on top of map-reduce. Here is a quick question for you: What is the output of running the Pig queries in the right column against the data present in the left column in the below image? Output: 7,Komal,Nayak,24,9848022334,trivendram 8,Bharathi,Nambiayar,24,9848022333,Chennai 5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 6,Archana,Mishra,23,9848022335,Chennai Spark Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster\u2019s memory and query it repeatedly, making it well suited to machine learning algorithms. Presto Presto is a high performance, distributed SQL query engine for Big Data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. Example Presto query: use studentDB; show tables; SELECT roll_no, name FROM studentDB.studentDetails where section='A' limit 5;","title":"Other tooling around Hadoop"},{"location":"level101/big_data/evolution/#data-serialisation-and-storage","text":"In order to transport data over the network or to store it on persistent storage, we translate data structures or object state into binary or textual form. We call this process serialization. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file. Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.","title":"Data Serialisation and storage"},{"location":"level101/big_data/intro/","text":"Big Data Prerequisites Basics of Linux File systems. 
Basic understanding of System Design. What to expect from this course This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a perfect fit. An interesting assignment on designing a Big Data system is followed by understanding the architecture of Hadoop and the tooling around it. What is not covered under this course Writing programs to draw analytics from data. Course Contents Overview of Big Data Usage of Big Data techniques Evolution of Hadoop Architecture of hadoop HDFS Yarn MapReduce framework Other tooling around hadoop Hive Pig Spark Presto Data Serialisation and storage Overview of Big Data Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques, and frameworks. Big Data could consist of Structured data Unstructured data Semi-structured data Characteristics of Big Data: Volume Variety Velocity Variability Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc. Usage of Big Data Techniques Take the example of the traffic lights problem. There are more than 300,000 traffic lights in the US as of 2018. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system. If each of the IoT devices sends 10 events per minute, we have 300000x10x60x24 = 432x10^7 events per day. How would you go about processing that and telling me how many of the signals were \u201cgreen\u201d at 10:45 am on a particular day? Consider the next example on Unified Payments Interface (UPI) transactions: We had about 1.15 billion UPI transactions in the month of October 2019 in India. 
If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?","title":"Introduction"},{"location":"level101/big_data/intro/#big-data","text":"","title":"Big Data"},{"location":"level101/big_data/intro/#prerequisites","text":"Basics of Linux File systems. Basic understanding of System Design.","title":"Prerequisites"},{"location":"level101/big_data/intro/#what-to-expect-from-this-course","text":"This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a perfect fit. An interesting assignment on designing a Big Data system is followed by understanding the architecture of Hadoop and the tooling around it.","title":"What to expect from this course"},{"location":"level101/big_data/intro/#what-is-not-covered-under-this-course","text":"Writing programs to draw analytics from data.","title":"What is not covered under this course"},{"location":"level101/big_data/intro/#course-contents","text":"Overview of Big Data Usage of Big Data techniques Evolution of Hadoop Architecture of hadoop HDFS Yarn MapReduce framework Other tooling around hadoop Hive Pig Spark Presto Data Serialisation and storage","title":"Course Contents"},{"location":"level101/big_data/intro/#overview-of-big-data","text":"Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques, and frameworks. 
Big Data could consist of Structured data Unstructured data Semi-structured data Characteristics of Big Data: Volume Variety Velocity Variability Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.","title":"Overview of Big Data"},{"location":"level101/big_data/intro/#usage-of-big-data-techniques","text":"Take the example of the traffic lights problem. There are more than 300,000 traffic lights in the US as of 2018. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system. If each of the IoT devices sends 10 events per minute, we have 300000x10x60x24 = 432x10^7 events per day. How would you go about processing that and telling me how many of the signals were \u201cgreen\u201d at 10:45 am on a particular day? Consider the next example on Unified Payments Interface (UPI) transactions: We had about 1.15 billion UPI transactions in the month of October 2019 in India. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?","title":"Usage of Big Data Techniques"},{"location":"level101/big_data/tasks/","text":"Tasks and conclusion Post-training tasks: Try setting up your own 3 node Hadoop cluster. A VM based solution can be found here Write a simple spark/MR job of your choice and understand how to generate analytics from data. Sample dataset can be found here References: Hadoop documentation HDFS Architecture YARN Architecture Google GFS paper","title":"Conclusion"},{"location":"level101/big_data/tasks/#tasks-and-conclusion","text":"","title":"Tasks and conclusion"},{"location":"level101/big_data/tasks/#post-training-tasks","text":"Try setting up your own 3 node Hadoop cluster. A VM based solution can be found here Write a simple spark/MR job of your choice and understand how to generate analytics from data. 
Sample dataset can be found here","title":"Post-training tasks:"},{"location":"level101/big_data/tasks/#references","text":"Hadoop documentation HDFS Architecture YARN Architecture Google GFS paper","title":"References:"},{"location":"level101/databases_nosql/further_reading/","text":"Conclusion We have covered basic concepts of NoSQL databases. There is much more to learn and do. We hope this course gives you a good start and inspires you to explore further. Further reading NoSQL: https://hostingdata.co.uk/nosql-database/ https://www.mongodb.com/nosql-explained https://www.mongodb.com/nosql-explained/nosql-vs-sql Cap Theorem http://www.julianbrowne.com/article/brewers-cap-theorem Scalability http://www.slideshare.net/jboner/scalability-availability-stability-patterns Eventual Consistency https://www.allthingsdistributed.com/2008/12/eventually_consistent.html https://www.toptal.com/big-data/consistent-hashing https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf","title":"Conclusion"},{"location":"level101/databases_nosql/further_reading/#conclusion","text":"We have covered basic concepts of NoSQL databases. There is much more to learn and do. 
We hope this course gives you a good start and inspires you to explore further.","title":"Conclusion"},{"location":"level101/databases_nosql/further_reading/#further-reading","text":"NoSQL: https://hostingdata.co.uk/nosql-database/ https://www.mongodb.com/nosql-explained https://www.mongodb.com/nosql-explained/nosql-vs-sql Cap Theorem http://www.julianbrowne.com/article/brewers-cap-theorem Scalability http://www.slideshare.net/jboner/scalability-availability-stability-patterns Eventual Consistency https://www.allthingsdistributed.com/2008/12/eventually_consistent.html https://www.toptal.com/big-data/consistent-hashing https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf","title":"Further reading"},{"location":"level101/databases_nosql/intro/","text":"NoSQL Concepts Prerequisites Relational Databases What to expect from this course At the end of training, you will have an understanding of what a NoSQL database is, what kind of advantages or disadvantages it has over traditional RDBMS, learn about different types of NoSQL databases and understand some of the underlying concepts & trade-offs w.r.t. NoSQL. What is not covered under this course We will not be deep diving into any specific NoSQL Database. Course Contents Introduction to NoSQL CAP Theorem Data versioning Partitioning Hashing Quorum Introduction When people use the term \u201cNoSQL database\u201d, they typically use it to refer to any non-relational database. Some say the term \u201cNoSQL\u201d stands for \u201cnon SQL\u201d while others say it stands for \u201cnot only SQL.\u201d Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables. A common misconception is that NoSQL databases or non-relational databases don\u2019t store relationship data well. NoSQL databases can store relationship data\u2014they just store it differently than relational databases do. 
In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier, because related data doesn\u2019t have to be split between tables. Such databases have existed since the late 1960s, but the name \"NoSQL\" was only coined in the early 21st century. NASA used a NoSQL database to track inventory for the Apollo mission. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity. With the rise of Agile development methodology, NoSQL databases were developed with a focus on scaling and fast performance, while also allowing frequent application changes and making programming easier. Types of NoSQL databases: Over time, because these NoSQL databases were developed to suit requirements at different companies, we ended up with quite a few types of them. However, they can be broadly classified into 4 types. Some of the databases can overlap between different types. They are Document databases: They store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with objects developers are working with in code. The advantages include intuitive data model & flexible schemas. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can horizontally scale-out to accommodate large data volumes. 
Ex: MongoDB, Couchbase Key-Value databases: These are a simpler type of databases where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don\u2019t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: Redis , DynamoDB , Voldemort / Venice (Linkedin), Wide-Column stores: They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores. Graph Databases: These databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and \u201cstore\u201d the graph data in a table (although a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. 
Ex: Neo4j Comparison Performance Scalability Flexibility Complexity Functionality Key Value high high high none Variable Document stores high Variable (high) high low Variable (low) Column DB high high moderate low minimal Graph Variable Variable high high Graph theory Differences between SQL and NoSQL The table below summarizes the main differences between SQL and NoSQL databases. SQL Databases NoSQL Databases Data Storage Model Tables with fixed rows and columns Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges Primary Purpose General purpose Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data Schemas Rigid Flexible Scaling Vertical (scale-up with a larger server) Horizontal (scale-out across commodity servers) Multi-Record ACID Transactions Supported Most do not support multi-record ACID transactions. However, some\u2014like MongoDB\u2014do. Joins Typically required Typically not required Data to Object Mapping Requires ORM (object-relational mapping) Many do not require ORMs. Document DB documents map directly to data structures in most popular programming languages. Advantages Flexible Data Models Most NoSQL systems feature flexible schemas. A flexible schema means you can easily modify your database schema to add or remove fields to support for evolving application requirements. This facilitates with continuous application development of new features without database operation overhead. Horizontal Scaling Most NoSQL systems allow you to scale horizontally, which means you can add in cheaper & commodity hardware, whenever you want to scale a system. On the other hand SQL systems generally scale Vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems. 
Fast Queries NoSQL can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together facilitating faster query responses. Developer productivity NoSQL systems tend to map data based on the programming data structures. As a result developers need to perform fewer data transformations leading to increased productivity & fewer bugs.","title":"Introduction"},{"location":"level101/databases_nosql/intro/#nosql-concepts","text":"","title":"NoSQL Concepts"},{"location":"level101/databases_nosql/intro/#prerequisites","text":"Relational Databases","title":"Prerequisites"},{"location":"level101/databases_nosql/intro/#what-to-expect-from-this-course","text":"At the end of training, you will have an understanding of what a NoSQL database is, what kind of advantages or disadvantages it has over traditional RDBMS, learn about different types of NoSQL databases and understand some of the underlying concepts & trade offs w.r.t to NoSQL.","title":"What to expect from this course"},{"location":"level101/databases_nosql/intro/#what-is-not-covered-under-this-course","text":"We will not be deep diving into any specific NoSQL Database.","title":"What is not covered under this course"},{"location":"level101/databases_nosql/intro/#course-contents","text":"Introduction to NoSQL CAP Theorem Data versioning Partitioning Hashing Quorum","title":"Course Contents"},{"location":"level101/databases_nosql/intro/#introduction","text":"When people use the term \u201cNoSQL database\u201d, they typically use it to refer to any non-relational database. Some say the term \u201cNoSQL\u201d stands for \u201cnon SQL\u201d while others say it stands for \u201cnot only SQL.\u201d Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables. 
A common misconception is that NoSQL databases or non-relational databases don\u2019t store relationship data well. NoSQL databases can store relationship data\u2014they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier , because related data doesn\u2019t have to be split between tables. Such databases have existed since the late 1960s, but the name \"NoSQL\" was only coined in the early 21st century. NASA used a NoSQL database to track inventory for the Apollo mission. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity. With the rise of Agile development methodology, NoSQL databases were developed with a focus on scaling, fast performance and at the same time allowed for frequent application changes and made programming easier.","title":"Introduction"},{"location":"level101/databases_nosql/intro/#types-of-nosql-databases","text":"Over time due to the way these NoSQL databases were developed to suit requirements at different companies, we ended up with quite a few types of them. However, they can be broadly classified into 4 types. Some of the databases can overlap between different types. They are Document databases: They store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with objects developers are working with in code. The advantages include intuitive data model & flexible schemas. 
Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can scale out horizontally to accommodate large data volumes. Ex: MongoDB, Couchbase Key-Value databases: These are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don\u2019t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: Redis, DynamoDB, Voldemort / Venice (LinkedIn). Wide-Column stores: They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores. Graph Databases: These databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and \u201cstore\u201d the graph data in a table (although a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). 
Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Ex: Neo4j","title":"Types of NoSQL databases:"},{"location":"level101/databases_nosql/intro/#comparison","text":"Performance Scalability Flexibility Complexity Functionality Key Value high high high none Variable Document stores high Variable (high) high low Variable (low) Column DB high high moderate low minimal Graph Variable Variable high high Graph theory","title":"Comparison"},{"location":"level101/databases_nosql/intro/#differences-between-sql-and-nosql","text":"The table below summarizes the main differences between SQL and NoSQL databases. SQL Databases NoSQL Databases Data Storage Model Tables with fixed rows and columns Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges Primary Purpose General purpose Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data Schemas Rigid Flexible Scaling Vertical (scale-up with a larger server) Horizontal (scale-out across commodity servers) Multi-Record ACID Transactions Supported Most do not support multi-record ACID transactions. However, some\u2014like MongoDB\u2014do. Joins Typically required Typically not required Data to Object Mapping Requires ORM (object-relational mapping) Many do not require ORMs. Document DB documents map directly to data structures in most popular programming languages.","title":"Differences between SQL and NoSQL"},{"location":"level101/databases_nosql/intro/#advantages","text":"Flexible Data Models Most NoSQL systems feature flexible schemas. 
A flexible schema means you can easily modify your database schema to add or remove fields to support evolving application requirements. This facilitates continuous development of new application features without database operation overhead. Horizontal Scaling Most NoSQL systems allow you to scale horizontally, which means you can add cheaper commodity hardware whenever you want to scale a system. On the other hand, SQL systems generally scale vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems. Fast Queries NoSQL can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together facilitating faster query responses. Developer productivity NoSQL systems tend to map data based on the programming data structures. As a result developers need to perform fewer data transformations leading to increased productivity & fewer bugs.","title":"Advantages"},{"location":"level101/databases_nosql/key_concepts/","text":"Key Concepts Let\u2019s look at some of the key concepts when we talk about NoSQL or distributed systems. CAP Theorem In a keynote titled \u201cTowards Robust Distributed Systems\u201d at ACM\u2019s PODC symposium in 2000, Eric Brewer introduced the so-called CAP-theorem, which is widely adopted today by large web companies as well as in the NoSQL community. The CAP acronym stands for Consistency, Availability & Partition Tolerance. Consistency It refers to how consistent a system is after an execution. A distributed system is called consistent when a write made by a source is available for all readers of that shared data. Different NoSQL systems support different levels of consistency. Availability It refers to how a system responds to loss of functionality of different systems due to hardware and software failures. 
A high availability implies that a system is still available to handle operations (reads and writes) when a certain part of the system is down due to a failure or upgrade. Partition Tolerance It is the ability of the system to continue operations in the event of a network partition. A network partition occurs when a failure causes two or more islands of networks where the systems can\u2019t talk to each other across the islands temporarily or permanently. Brewer alleges that one can at most choose two of these three characteristics in a shared-data system. The CAP-theorem states that a choice can only be made for two options out of consistency, availability and partition tolerance. A growing number of use cases in large scale applications tend to value reliability implying that availability & redundancy are more valuable than consistency. As a result these systems struggle to meet ACID properties. They attain this by loosening on the consistency requirement i.e Eventual Consistency. Eventual Consistency means that all readers will see writes, as time goes on: \u201cIn a steady state, the system will eventually return the last written value\u201d. Clients therefore may face an inconsistent state of data as updates are in progress. For instance, in a replicated database updates may go to one node which replicates the latest version to all other nodes that contain a replica of the modified dataset so that the replica nodes eventually will have the latest version. NoSQL systems support different levels of eventual consistency models. For example: Read Your Own Writes Consistency Clients will see their updates immediately after they are written. The reads can hit nodes other than the one where it was written. However they might not see updates by other clients immediately. Session Consistency Clients will see the updates to their data within a session scope. This generally indicates that reads & writes occur on the same server. 
Other clients using the same nodes will receive the same updates. Causal Consistency A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders. Eventual consistency is useful if concurrent updates of the same partitions of data are unlikely and if clients do not immediately depend on reading updates issued by themselves or by other clients. The consistency model chosen for the system (or parts of it) determines where requests are routed, e.g. to which replicas. CAP alternatives illustration Choice Traits Examples Consistency + Availability (Forfeit Partitions) 2-phase commits Cache invalidation protocols Single-site databases Cluster databases LDAP xFS file system Consistency + Partition tolerance (Forfeit Availability) Pessimistic locking Make minority partitions unavailable Distributed databases Distributed locking Majority protocols Availability + Partition tolerance (Forfeit Consistency) expirations/leases conflict resolution optimistic DNS Web caching Versioning of Data in distributed systems When data is distributed across nodes, it can be modified on different nodes at the same time (assuming strict consistency is not enforced). Questions arise on conflict resolution for concurrent updates. Some of the popular conflict resolution mechanisms are Timestamps This is the most obvious solution. You sort updates based on chronological order and choose the latest update. However this relies on clock synchronization across different parts of the infrastructure. This gets even more complicated when parts of systems are spread across different geographic locations. Optimistic Locking You associate a unique value like a clock or counter with every data update. When a client wants to update data, it has to specify which version of data needs to be updated. 
This would mean you need to keep track of history of the data versions. Vector Clocks A vector clock is defined as a tuple of clock values from each node. In a distributed environment, each node maintains a tuple of such clock values which represent the state of the node itself and its peers/replicas. A clock value may be a real timestamp derived from the local clock or a version number. Vector clocks illustration Vector clocks have the following advantages over other conflict resolution mechanisms: No dependency on synchronized clocks No total ordering of revision numbers required for causal reasoning No need to store and maintain multiple versions of the data on different nodes. Partitioning When the amount of data crosses the capacity of a single node, we need to think of splitting data, creating replicas for load balancing & disaster recovery. Depending on how dynamic the infrastructure is, we have a few approaches that we can take. Memory cached These are partitioned in-memory databases that are primarily used for transient data. These databases are generally used as a front for traditional RDBMS. Most frequently used data is replicated from an RDBMS into a memory database to facilitate fast queries and to take the load off backend DBs. A very common example is memcached or couchbase. Clustering Traditional cluster mechanisms abstract away the cluster topology from clients. A client need not know where the actual data is residing and which node it is talking to. Clustering is very commonly used in traditional RDBMS where it can help scaling the persistent layer to a certain extent. Separating reads from writes In this method, you will have multiple replicas hosting the same data. The incoming writes are typically sent to a single node (Leader) or multiple nodes (multi-Leader), while the rest of the replicas (Follower) handle read requests. The leader replicates writes asynchronously to all followers. However the write lag can\u2019t be completely avoided. 
Sometimes a leader can crash before it replicates all the data to a follower. When this happens, a follower with the most consistent data can be turned into a leader. As you can realize now, it is hard to enforce full consistency in this model. You also need to consider the ratio of read vs write traffic. This model won\u2019t make sense when writes are higher than reads. The replication methods can also vary widely. Some systems do a complete transfer of state periodically, while others use a delta state transfer approach. You could also transfer the state by transferring the operations in order. The followers can then apply the same operations as the leader to catch up. Sharding Sharding refers to dividing data in such a way that data is distributed evenly (both in terms of storage & processing power) across a cluster of nodes. It can also imply data locality, which means similar & related data is stored together to facilitate faster access. A shard in turn can be further replicated to meet load balancing or disaster recovery requirements. A single shard replica might take in all writes (single leader) or multiple replicas can take writes (multi-leader). Reads can be distributed across multiple replicas. Since data is now distributed across multiple nodes, clients should be able to consistently figure out where data is hosted. We will look at some of the common techniques below. The downside of sharding is that joins between shards are not possible. So an upstream/downstream application has to aggregate the results from multiple shards. Sharding example Hashing A hash function is a function that maps one piece of data\u2014typically describing some kind of object, often of arbitrary size\u2014to another piece of data, typically an integer, known as hash code , or simply hash . In a partitioned database, it is important to consistently map a key to a server/replica. For ex: you can use a very simple hash as a modulo function. 
_p = k mod n_ Where p -> partition, k -> primary key n -> no of nodes The downside of this simple hash is that, whenever the cluster topology changes, the data distribution also changes. When you are dealing with memory caches, it will be easy to distribute partitions around. Whenever a node joins/leaves a topology, partitions can reorder themselves, a cache miss can be re-populated from backend DB. However when you look at persistent data, it is not possible as the new node doesn\u2019t have the data needed to serve it. This brings us to consistent hashing. Consistent Hashing Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring . This allows servers and objects to scale without affecting the overall system. Say that our hash function h() generates a 32-bit integer. Then, to determine to which server we will send a key k, we find the server s whose hash h(s) is the smallest integer that is larger than h(k). To make the process simpler, we assume the table is circular, which means that if we cannot find a server with a hash larger than h(k), we wrap around and start looking from the beginning of the array. Consistent hashing illustration In consistent hashing when a server is removed or added then only the keys from that server are relocated. For example, if server S3 is removed then, all keys from server S3 will be moved to server S4 but keys stored on server S4 and S2 are not relocated. But there is one problem, when server S3 is removed then keys from S3 are not equally distributed among remaining servers S4 and S2. They are only assigned to server S4 which increases the load on server S4. To evenly distribute the load among servers when a server is added or removed, it creates a fixed number of replicas ( known as virtual nodes) of each server and distributes it along the circle. 
So instead of server labels S1, S2 and S3, we will have S10 S11\u2026S19, S20 S21\u2026S29 and S30 S31\u2026S39. The factor for a number of replicas is also known as weight , depending on the situation. All keys which are mapped to replicas Sij are stored on server Si. To find a key we do the same thing, find the position of the key on the circle and then move forward until you find a server replica. If the server replica is Sij then the key is stored in server Si. Suppose server S3 is removed, then all S3 replicas with labels S30 S31 \u2026 S39 must be removed. Now the objects keys adjacent to S3X labels will be automatically re-assigned to S1X, S2X and S4X. All keys originally assigned to S1, S2 & S4 will not be moved. Similar things happen if we add a server. Suppose we want to add a server S5 as a replacement of S3 then we need to add labels S50 S51 \u2026 S59. In the ideal case, one-fourth of keys from S1, S2 and S4 will be reassigned to S5. When applied to persistent storages, further issues arise: if a node has left the scene, data stored on this node becomes unavailable, unless it has been replicated to other nodes before; in the opposite case of a new node joining the others, adjacent nodes are no longer responsible for some pieces of data which they still store but not get asked for anymore as the corresponding objects are no longer hashed to them by requesting clients. In order to address this issue, a replication factor (r) can be introduced. Introducing replicas in a partitioning scheme\u2014besides reliability benefits\u2014also makes it possible to spread workload for read requests that can go to any physical node responsible for a requested piece of data. Scalability doesn\u2019t work if the clients have to decide between multiple versions of the dataset, because they need to read from a quorum of servers which in turn reduces the efficiency of load balancing. 
Quorum Quorum is the minimum number of nodes in a cluster that must be online and be able to communicate with each other. If any additional node failure occurs beyond this threshold, the cluster will stop running. To attain a quorum, you need a majority of the nodes. Commonly it is (N/2 + 1), where N is the total no of nodes in the system. For ex, In a 3 node cluster, you need 2 nodes for a majority, In a 5 node cluster, you need 3 nodes for a majority, In a 6 node cluster, you need 4 nodes for a majority. Quorum example Network problems can cause communication failures among cluster nodes. One set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This is known as split brain in cluster or cluster partitioning. Now the partition which has quorum is allowed to continue running the application. The other partitions are removed from the cluster. Eg: In a 5 node cluster, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 loses communication with other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run. Below diagram demonstrates Quorum selection on a cluster partitioned into two sets. 
Cluster Quorum example","title":"Key Concepts"},{"location":"level101/databases_nosql/key_concepts/#key-concepts","text":"Let\u2019s look at some of the key concepts when we talk about NoSQL or distributed systems.","title":"Key Concepts"},{"location":"level101/databases_nosql/key_concepts/#cap-theorem","text":"In a keynote titled \u201cTowards Robust Distributed Systems\u201d at ACM\u2019s PODC symposium in 2000, Eric Brewer introduced the so-called CAP-theorem, which is widely adopted today by large web companies as well as in the NoSQL community. The CAP acronym stands for Consistency, Availability & Partition Tolerance. Consistency It refers to how consistent a system is after an execution. A distributed system is called consistent when a write made by a source is available for all readers of that shared data. Different NoSQL systems support different levels of consistency. Availability It refers to how a system responds to loss of functionality of different systems due to hardware and software failures. A high availability implies that a system is still available to handle operations (reads and writes) when a certain part of the system is down due to a failure or upgrade. Partition Tolerance It is the ability of the system to continue operations in the event of a network partition. A network partition occurs when a failure causes two or more islands of networks where the systems can\u2019t talk to each other across the islands temporarily or permanently. Brewer alleges that one can at most choose two of these three characteristics in a shared-data system. The CAP-theorem states that a choice can only be made for two options out of consistency, availability and partition tolerance. A growing number of use cases in large scale applications tend to value reliability implying that availability & redundancy are more valuable than consistency. As a result these systems struggle to meet ACID properties. 
They attain this by loosening the consistency requirement, i.e. Eventual Consistency. Eventual Consistency means that all readers will see writes, as time goes on: \u201cIn a steady state, the system will eventually return the last written value\u201d. Clients therefore may face an inconsistent state of data as updates are in progress. For instance, in a replicated database updates may go to one node which replicates the latest version to all other nodes that contain a replica of the modified dataset so that the replica nodes eventually will have the latest version. NoSQL systems support different levels of eventual consistency models. For example: Read Your Own Writes Consistency Clients will see their updates immediately after they are written. The reads can hit nodes other than the one where it was written. However they might not see updates by other clients immediately. Session Consistency Clients will see the updates to their data within a session scope. This generally indicates that reads & writes occur on the same server. Other clients using the same nodes will receive the same updates. Causal Consistency A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders. Eventual consistency is useful if concurrent updates of the same partitions of data are unlikely and if clients do not immediately depend on reading updates issued by themselves or by other clients. The consistency model chosen for the system (or parts of it) determines where requests are routed, e.g. to which replicas. 
CAP alternatives illustration Choice Traits Examples Consistency + Availability (Forfeit Partitions) 2-phase commits Cache invalidation protocols Single-site databases Cluster databases LDAP xFS file system Consistency + Partition tolerance (Forfeit Availability) Pessimistic locking Make minority partitions unavailable Distributed databases Distributed locking Majority protocols Availability + Partition tolerance (Forfeit Consistency) expirations/leases conflict resolution optimistic DNS Web caching","title":"CAP Theorem"},{"location":"level101/databases_nosql/key_concepts/#versioning-of-data-in-distributed-systems","text":"When data is distributed across nodes, it can be modified on different nodes at the same time (assuming strict consistency is not enforced). Questions arise on conflict resolution for concurrent updates. Some of the popular conflict resolution mechanisms are Timestamps This is the most obvious solution. You sort updates based on chronological order and choose the latest update. However this relies on clock synchronization across different parts of the infrastructure. This gets even more complicated when parts of systems are spread across different geographic locations. Optimistic Locking You associate a unique value like a clock or counter with every data update. When a client wants to update data, it has to specify which version of data needs to be updated. This would mean you need to keep track of history of the data versions. Vector Clocks A vector clock is defined as a tuple of clock values from each node. In a distributed environment, each node maintains a tuple of such clock values which represent the state of the node itself and its peers/replicas. A clock value may be a real timestamp derived from the local clock or a version number. 
Vector clocks illustration Vector clocks have the following advantages over other conflict resolution mechanisms No dependency on synchronized clocks No total ordering of revision numbers required for causal reasoning No need to store and maintain multiple versions of the data on different nodes.","title":"Versioning of Data in distributed systems"},{"location":"level101/databases_nosql/key_concepts/#partitioning","text":"When the amount of data crosses the capacity of a single node, we need to think of splitting data, creating replicas for load balancing & disaster recovery. Depending on how dynamic the infrastructure is, we have a few approaches that we can take. Memory cached These are partitioned in-memory databases that are primarily used for transient data. These databases are generally used as a front for traditional RDBMS. Most frequently used data is replicated from an RDBMS into a memory database to facilitate fast queries and to take the load off the backend DBs. A very common example is memcached or couchbase. Clustering Traditional cluster mechanisms abstract away the cluster topology from clients. A client need not know where the actual data is residing and which node it is talking to. Clustering is very commonly used in traditional RDBMS where it can help scaling the persistent layer to a certain extent. Separating reads from writes In this method, you will have multiple replicas hosting the same data. The incoming writes are typically sent to a single node (Leader) or multiple nodes (multi-Leader), while the rest of the replicas (Follower) handle read requests. The leader replicates writes asynchronously to all followers. However the write lag can\u2019t be completely avoided. Sometimes a leader can crash before it replicates all the data to a follower. When this happens, a follower with the most consistent data can be turned into a leader. As you can realize now, it is hard to enforce full consistency in this model. 
You also need to consider the ratio of read vs write traffic. This model won\u2019t make sense when writes are higher than reads. The replication methods can also vary widely. Some systems do a complete transfer of state periodically, while others use a delta state transfer approach. You could also transfer the state by transferring the operations in order. The followers can then apply the same operations as the leader to catch up. Sharding Sharding refers to dividing data in such a way that data is distributed evenly (both in terms of storage & processing power) across a cluster of nodes. It can also imply data locality, which means similar & related data is stored together to facilitate faster access. A shard in turn can be further replicated to meet load balancing or disaster recovery requirements. A single shard replica might take in all writes (single leader) or multiple replicas can take writes (multi-leader). Reads can be distributed across multiple replicas. Since data is now distributed across multiple nodes, clients should be able to consistently figure out where data is hosted. We will look at some of the common techniques below. The downside of sharding is that joins between shards are not possible. So an upstream/downstream application has to aggregate the results from multiple shards. Sharding example","title":"Partitioning"},{"location":"level101/databases_nosql/key_concepts/#hashing","text":"A hash function is a function that maps one piece of data\u2014typically describing some kind of object, often of arbitrary size\u2014to another piece of data, typically an integer, known as hash code , or simply hash . In a partitioned database, it is important to consistently map a key to a server/replica. For example, you can use a very simple hash such as a modulo function: p = k mod n, where p is the partition, k is the primary key and n is the number of nodes. The downside of this simple hash is that, whenever the cluster topology changes, the data distribution also changes. 
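To make that downside concrete, here is a small sketch assuming integer primary keys 0 to 999 and a cluster that grows from 4 to 5 nodes. These numbers are illustrative only. It counts how many keys land on a different partition after the change.

```python
def partition(key, n):
    # p = k mod n
    return key % n


keys = range(1000)

# Keys whose partition changes when a fifth node joins a 4-node cluster.
moved = sum(1 for k in keys if partition(k, 4) != partition(k, 5))

print(moved)               # 800
print(moved / len(keys))   # 0.8
```

Only keys with the same remainder modulo 4 and modulo 5 stay put, so in this setup four out of five keys have to be relocated for a single added node.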
When you are dealing with memory caches, it will be easy to distribute partitions around. Whenever a node joins/leaves a topology, partitions can reorder themselves, a cache miss can be re-populated from backend DB. However when you look at persistent data, it is not possible as the new node doesn\u2019t have the data needed to serve it. This brings us to consistent hashing.","title":"Hashing"},{"location":"level101/databases_nosql/key_concepts/#consistent-hashing","text":"Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring . This allows servers and objects to scale without affecting the overall system. Say that our hash function h() generates a 32-bit integer. Then, to determine to which server we will send a key k, we find the server s whose hash h(s) is the smallest integer that is larger than h(k). To make the process simpler, we assume the table is circular, which means that if we cannot find a server with a hash larger than h(k), we wrap around and start looking from the beginning of the array. Consistent hashing illustration In consistent hashing when a server is removed or added then only the keys from that server are relocated. For example, if server S3 is removed then, all keys from server S3 will be moved to server S4 but keys stored on server S4 and S2 are not relocated. But there is one problem, when server S3 is removed then keys from S3 are not equally distributed among remaining servers S4 and S2. They are only assigned to server S4 which increases the load on server S4. To evenly distribute the load among servers when a server is added or removed, it creates a fixed number of replicas ( known as virtual nodes) of each server and distributes it along the circle. So instead of server labels S1, S2 and S3, we will have S10 S11\u2026S19, S20 S21\u2026S29 and S30 S31\u2026S39. 
The factor for a number of replicas is also known as weight , depending on the situation. All keys which are mapped to replicas Sij are stored on server Si. To find a key we do the same thing, find the position of the key on the circle and then move forward until you find a server replica. If the server replica is Sij then the key is stored in server Si. Suppose server S3 is removed, then all S3 replicas with labels S30 S31 \u2026 S39 must be removed. Now the objects keys adjacent to S3X labels will be automatically re-assigned to S1X, S2X and S4X. All keys originally assigned to S1, S2 & S4 will not be moved. Similar things happen if we add a server. Suppose we want to add a server S5 as a replacement of S3 then we need to add labels S50 S51 \u2026 S59. In the ideal case, one-fourth of keys from S1, S2 and S4 will be reassigned to S5. When applied to persistent storages, further issues arise: if a node has left the scene, data stored on this node becomes unavailable, unless it has been replicated to other nodes before; in the opposite case of a new node joining the others, adjacent nodes are no longer responsible for some pieces of data which they still store but not get asked for anymore as the corresponding objects are no longer hashed to them by requesting clients. In order to address this issue, a replication factor (r) can be introduced. Introducing replicas in a partitioning scheme\u2014besides reliability benefits\u2014also makes it possible to spread workload for read requests that can go to any physical node responsible for a requested piece of data. 
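The hash ring with virtual nodes described above can be sketched as follows. This is a simplified illustration, not a production implementation: the class name HashRing, the use of the first 4 bytes of MD5 as a 32-bit hash, and the default of 10 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def h(value):
    # 32-bit position on the ring (first 4 bytes of MD5, an arbitrary choice).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], 'big')


class HashRing:
    def __init__(self, servers, vnodes=10):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        # Server Si gets virtual nodes labelled Si0, Si1, ...
        for j in range(self.vnodes):
            bisect.insort(self.ring, (h(f'{server}{j}'), server))

    def remove(self, server):
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def lookup(self, key):
        # First virtual node whose position is larger than h(key);
        # wrap around to the start of the ring if there is none.
        positions = [p for p, _ in self.ring]
        i = bisect.bisect_right(positions, h(key))
        return self.ring[i % len(self.ring)][1]


ring = HashRing(['S1', 'S2', 'S3', 'S4'])
keys = [f'key{i}' for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove('S3')
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on S3 are relocated.
print(all(before[k] == 'S3' for k in moved))  # True
print(sorted(set(after[k] for k in moved)))
```

The second print shows which of the remaining servers picked up the keys from S3. With virtual nodes they are usually spread over several servers rather than a single neighbour.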
Scalability doesn\u2019t work if the clients have to decide between multiple versions of the dataset, because they need to read from a quorum of servers which in turn reduces the efficiency of load balancing.","title":"Consistent Hashing"},{"location":"level101/databases_nosql/key_concepts/#quorum","text":"Quorum is the minimum number of nodes in a cluster that must be online and be able to communicate with each other. If any additional node failure occurs beyond this threshold, the cluster will stop running. To attain a quorum, you need a majority of the nodes. Commonly it is (N/2 + 1), where N is the total no of nodes in the system. For ex, In a 3 node cluster, you need 2 nodes for a majority, In a 5 node cluster, you need 3 nodes for a majority, In a 6 node cluster, you need 4 nodes for a majority. Quorum example Network problems can cause communication failures among cluster nodes. One set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This is known as split brain in cluster or cluster partitioning. Now the partition which has quorum is allowed to continue running the application. The other partitions are removed from the cluster. Eg: In a 5 node cluster, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 loses communication with other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run. Below diagram demonstrates Quorum selection on a cluster partitioned into two sets. 
Cluster Quorum example","title":"Quorum"},{"location":"level101/databases_sql/backup_recovery/","text":"Backup and Recovery Backups are a very crucial part of any database setup. They are generally a copy of the data that can be used to reconstruct the data in case of any major or minor crisis with the database. In general terms backups can be of two types:- Physical Backup - the data directory as it is on the disk Logical Backup - the table structure and records in it Both the above kinds of backups are supported by MySQL with different tools. It is the job of the SRE to identify which should be used when. Mysqldump This utility is available with MySQL installation. It helps in getting the logical backup of the database. It outputs a set of SQL statements to reconstruct the data. It is not recommended to use mysqldump for large tables as it might take a lot of time and the file size will be huge. However, for small tables it is the best and the quickest option. mysqldump [options] > dump_output.sql There are certain options that can be used with mysqldump to get an appropriate dump of the database. To dump all the databases mysqldump -u -p --all-databases > all_dbs.sql To dump specific databases mysqldump -u -p --databases db1 db2 db3 > dbs.sql To dump a single database mysqldump -u -p --databases db1 > db1.sql OR mysqldump -u -p db1 > db1.sql The difference between the above two commands is that the latter one does not contain the CREATE DATABASE command in the backup output. 
To dump specific tables in a database mysqldump -u -p db1 table1 table2 > db1_tables.sql To dump only table structures and no data mysqldump -u -p --no-data db1 > db1_structure.sql To dump only table data and no CREATE statements mysqldump -u -p --no-create-info db1 > db1_data.sql To dump only specific records from a table mysqldump -u -p --no-create-info db1 table1 --where='salary>80000' > db1_table1_80000.sql Mysqldump can also provide output in CSV, other delimited text or XML format to support use-cases if any. The backup from mysqldump utility is offline i.e. when the backup finishes it will not have the changes to the database which were made when the backup was going on. For example, if the backup started at 3 PM and finished at 4 PM, it will not have the changes made to the database between 3 and 4 PM. Restoring from mysqldump can be done in the following two ways:- From shell mysql -u -p < all_dbs.sql OR From shell if the database is already created mysql -u -p db1 < db1.sql From within MySQL shell mysql> source all_dbs.sql Percona Xtrabackup This utility is installed separately from the MySQL server and is open source, provided by Percona. It helps in getting the full or partial physical backup of the database. It provides online backup of the database i.e. it will have the changes made to the database when the backup was going on as explained at the end of the previous section. Full Backup - the complete backup of the database. Partial Backup - Incremental Cumulative - After one full backup, the next backups will have changes post the full backup. For example, we took a full backup on Sunday, from Monday onwards every backup will have changes after Sunday; so, Tuesday\u2019s backup will have Monday\u2019s changes as well, Wednesday\u2019s backup will have changes of Monday and Tuesday as well and so on. Differential - After one full backup, the next backups will have changes post the previous incremental backup. 
For example, we took a full backup on Sunday, Monday will have changes done after Sunday, Tuesday will have changes done after Monday, and so on. Percona xtrabackup allows us to get both full and incremental backups as we desire. However, incremental backups take less space than a full backup (if taken per day) but the restore time of incremental backups is more than that of full backups. Creating a full backup xtrabackup --defaults-file= --user= --password= --backup --target-dir= Example xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/ Some other options --stream - can be used to stream the backup files to standard output in a specified format. xbstream is the only option for now. --tmp-dir - set this to a tmp directory to be used for temporary files while taking backups. --parallel - set this to the number of threads that can be used to parallely copy data files to target directory. --compress - by default - quicklz is used. Set this to have the backup in compressed format. Each file is a .qp compressed file and can be extracted by qpress file archiver. --decompress - decompresses all the files which were compressed with the .qp extension. It will not delete the .qp files after decompression. To do that, use --remove-original along with this. Please note that the decompress option should be run separately from the xtrabackup command that used the compress option. Preparing a backup Once the backup is done with the --backup option, we need to prepare it in order to restore it. This is done to make the datafiles consistent with point-in-time. There might have been some transactions going on while the backup was being executed and those have changed the data files. When we prepare a backup, all those transactions are applied to the data files. 
xtrabackup --prepare --target-dir= Example xtrabackup --prepare --target-dir=/mnt/data/backup/ It is not recommended to halt a process which is preparing the backup as that might cause data file corruption and backup cannot be used further. The backup will have to be taken again. Restoring a Full Backup To restore the backup which is created and prepared from above commands, just copy everything from the backup target-dir to the data-dir of MySQL server, change the ownership of all files to mysql user (the linux user used by MySQL server) and start mysql. Or the below command can be used as well, xtrabackup --defaults-file=/etc/my.cnf --copy-back --target-dir=/mnt/data/backups/ Note - the backup has to be prepared in order to restore it. Creating Incremental backups Percona Xtrabackup helps create incremental backups, i.e only the changes can be backed up since the last backup. Every InnoDB page contains a log sequence number or LSN that is also mentioned as one of the last lines of backup and prepare commands. xtrabackup: Transaction log of lsn to was copied. OR InnoDB: Shutdown completed; log sequence number completed OK! This indicates that the backup has been taken till the log sequence number mentioned. This is a key information in understanding incremental backups and working towards automating one. Incremental backups do not compare data files for changes, instead, they go through the InnoDB pages and compare their LSN to the last backup\u2019s LSN. So, without one full backup, the incremental backups are useless. The xtrabackup command creates a xtrabackup_checkpoint file which has the information about the LSN of the backup. Below are the key contents of the file:- backup_type = full-backuped | incremental from_lsn = 0 (full backup) | to_lsn of last backup to_lsn = last_lsn = There is a difference between to_lsn and last_lsn . 
When the last_lsn is more than to_lsn that means there are transactions that ran while we took the backup and are yet to be applied. That is what --prepare is used for. To take incremental backups, first, we require one full backup. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/full/ Let\u2019s assume the contents of the xtrabackup_checkpoint file to be as follows. backup_type = full-backuped from_lsn = 0 to_lsn = 1000 last_lsn = 1000 Now that we have one full backup, we can have an incremental backup that takes the changes. We will go with differential incremental backups. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr1/ --incremental-basedir=/mnt/data/backup/full/ There are delta files created in the incr1 directory like, ibdata1.delta , db1/tbl1.ibd.delta with the changes from the full directory. The xtrabackup_checkpoint file will thus have the following contents. backup_type = incremental from_lsn = 1000 to_lsn = 1500 last_lsn = 1500 Hence, the from_lsn here is equal to the to_lsn of the last backup or the basedir provided for the incremental backups. For the next incremental backup we can use this incremental backup as the basedir. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr2/ --incremental-basedir=/mnt/data/backup/incr1/ The xtrabackup_checkpoint file will thus have the following contents. backup_type = incremental from_lsn = 1500 to_lsn = 2000 last_lsn = 2200 Preparing Incremental backups Preparing incremental backups is not the same as preparing a full backup. When prepare runs, two operations are performed - committed transactions are applied on the data files and uncommitted transactions are rolled back . 
While preparing incremental backups, we have to skip rollback of uncommitted transactions as it is likely that they might get committed in the next incremental backup. If we roll back uncommitted transactions, the further incremental backups cannot be applied. We use --apply-log-only option along with --prepare to avoid the rollback phase. From the last section, we had the following directories with complete backup /mnt/data/backup/full /mnt/data/backup/incr1 /mnt/data/backup/incr2 First, we prepare the full backup, but only with the --apply-log-only option. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full The output of the command will contain the following at the end. InnoDB: Shutdown complete; log sequence number 1000 Completed OK! Note the LSN mentioned at the end is the same as the to_lsn from the xtrabackup_checkpoint created for full backup. Next, we apply the changes from the first incremental backup to the full backup. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr1 This applies the delta files in the incremental directory to the full backup directory. It rolls the data files in the full backup directory forward to the time of incremental backup and applies the redo logs as usual. Lastly, we apply the last incremental backup same as the previous one with just a small change. xtrabackup --prepare --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr2 We do not have to use the --apply-log-only option with it. It applies the incr2 delta files to the full backup data files taking them forward, applies redo logs on them and finally rolls back the uncommitted transactions to produce the final result. The data now present in the full backup directory can now be used to restore. Note - To create cumulative incremental backups, the incremental-basedir should always be the full backup directory for every incremental backup. 
While preparing, we can start with the full backup with the --apply-log-only option and use just the last incremental backup for the final --prepare as that has all the changes since the full backup. Restoring Incremental backups Once all the above steps are completed, restoring is the same as done for a full backup. Further Reading MySQL Point-In-Time-Recovery Another MySQL backup tool - mysqlpump Another MySQL backup tool - mydumper A comparison between mysqldump, mysqlpump and mydumper Backup Best Practices","title":"Backup and Recovery"},{"location":"level101/databases_sql/backup_recovery/#backup-and-recovery","text":"Backups are a very crucial part of any database setup. They are generally a copy of the data that can be used to reconstruct the data in case of any major or minor crisis with the database. In general terms backups can be of two types:- Physical Backup - the data directory as it is on the disk Logical Backup - the table structure and records in it Both the above kinds of backups are supported by MySQL with different tools. It is the job of the SRE to identify which should be used when.","title":"Backup and Recovery"},{"location":"level101/databases_sql/backup_recovery/#mysqldump","text":"This utility is available with MySQL installation. It helps in getting the logical backup of the database. It outputs a set of SQL statements to reconstruct the data. It is not recommended to use mysqldump for large tables as it might take a lot of time and the file size will be huge. However, for small tables it is the best and the quickest option. mysqldump [options] > dump_output.sql There are certain options that can be used with mysqldump to get an appropriate dump of the database. 
To dump all the databases mysqldump -u -p --all-databases > all_dbs.sql To dump specific databases mysqldump -u -p --databases db1 db2 db3 > dbs.sql To dump a single database mysqldump -u -p --databases db1 > db1.sql OR mysqldump -u -p db1 > db1.sql The difference between the above two commands is that the latter one does not contain the CREATE DATABASE command in the backup output. To dump specific tables in a database mysqldump -u -p db1 table1 table2 > db1_tables.sql To dump only table structures and no data mysqldump -u -p --no-data db1 > db1_structure.sql To dump only table data and no CREATE statements mysqldump -u -p --no-create-info db1 > db1_data.sql To dump only specific records from a table mysqldump -u -p --no-create-info db1 table1 --where='salary>80000' > db1_table1_80000.sql Mysqldump can also provide output in CSV, other delimited text or XML format to support use-cases if any. The backup from mysqldump utility is offline i.e. when the backup finishes it will not have the changes to the database which were made when the backup was going on. For example, if the backup started at 3 PM and finished at 4 PM, it will not have the changes made to the database between 3 and 4 PM. Restoring from mysqldump can be done in the following two ways:- From shell mysql -u -p < all_dbs.sql OR From shell if the database is already created mysql -u -p db1 < db1.sql From within MySQL shell mysql> source all_dbs.sql","title":"Mysqldump"},{"location":"level101/databases_sql/backup_recovery/#percona-xtrabackup","text":"This utility is installed separately from the MySQL server and is open source, provided by Percona. It helps in getting the full or partial physical backup of the database. It provides online backup of the database i.e. it will have the changes made to the database when the backup was going on as explained at the end of the previous section. Full Backup - the complete backup of the database. 
Partial Backup - Incremental Cumulative - After one full backup, the next backups will have changes post the full backup. For example, we took a full backup on Sunday, from Monday onwards every backup will have changes after Sunday; so, Tuesday\u2019s backup will have Monday\u2019s changes as well, Wednesday\u2019s backup will have changes of Monday and Tuesday as well and so on. Differential - After one full backup, the next backups will have changes post the previous incremental backup. For example, we took a full backup on Sunday, Monday will have changes done after Sunday, Tuesday will have changes done after Monday, and so on. Percona xtrabackup allows us to get both full and incremental backups as we desire. However, incremental backups take less space than a full backup (if taken per day) but the restore time of incremental backups is more than that of full backups. Creating a full backup xtrabackup --defaults-file= --user= --password= --backup --target-dir= Example xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/ Some other options --stream - can be used to stream the backup files to standard output in a specified format. xbstream is the only option for now. --tmp-dir - set this to a tmp directory to be used for temporary files while taking backups. --parallel - set this to the number of threads that can be used to parallely copy data files to target directory. --compress - by default - quicklz is used. Set this to have the backup in compressed format. Each file is a .qp compressed file and can be extracted by qpress file archiver. --decompress - decompresses all the files which were compressed with the .qp extension. It will not delete the .qp files after decompression. To do that, use --remove-original along with this. Please note that the decompress option should be run separately from the xtrabackup command that used the compress option. 
Preparing a backup Once the backup is done with the --backup option, we need to prepare it in order to restore it. This is done to make the datafiles consistent with point-in-time. There might have been some transactions going on while the backup was being executed and those have changed the data files. When we prepare a backup, all those transactions are applied to the data files. xtrabackup --prepare --target-dir= Example xtrabackup --prepare --target-dir=/mnt/data/backup/ It is not recommended to halt a process which is preparing the backup as that might cause data file corruption and backup cannot be used further. The backup will have to be taken again. Restoring a Full Backup To restore the backup which is created and prepared from above commands, just copy everything from the backup target-dir to the data-dir of MySQL server, change the ownership of all files to mysql user (the linux user used by MySQL server) and start mysql. Or the below command can be used as well, xtrabackup --defaults-file=/etc/my.cnf --copy-back --target-dir=/mnt/data/backups/ Note - the backup has to be prepared in order to restore it. Creating Incremental backups Percona Xtrabackup helps create incremental backups, i.e only the changes can be backed up since the last backup. Every InnoDB page contains a log sequence number or LSN that is also mentioned as one of the last lines of backup and prepare commands. xtrabackup: Transaction log of lsn to was copied. OR InnoDB: Shutdown completed; log sequence number completed OK! This indicates that the backup has been taken till the log sequence number mentioned. This is a key information in understanding incremental backups and working towards automating one. Incremental backups do not compare data files for changes, instead, they go through the InnoDB pages and compare their LSN to the last backup\u2019s LSN. So, without one full backup, the incremental backups are useless. 
The xtrabackup command creates a xtrabackup_checkpoint file which has the information about the LSN of the backup. Below are the key contents of the file:- backup_type = full-backuped | incremental from_lsn = 0 (full backup) | to_lsn of last backup to_lsn = last_lsn = There is a difference between to_lsn and last_lsn . When the last_lsn is more than to_lsn that means there are transactions that ran while we took the backup and are yet to be applied. That is what --prepare is used for. To take incremental backups, first, we require one full backup. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/full/ Let\u2019s assume the contents of the xtrabackup_checkpoint file to be as follows. backup_type = full-backuped from_lsn = 0 to_lsn = 1000 last_lsn = 1000 Now that we have one full backup, we can have an incremental backup that takes the changes. We will go with differential incremental backups. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr1/ --incremental-basedir=/mnt/data/backup/full/ There are delta files created in the incr1 directory like, ibdata1.delta , db1/tbl1.ibd.delta with the changes from the full directory. The xtrabackup_checkpoint file will thus have the following contents. backup_type = incremental from_lsn = 1000 to_lsn = 1500 last_lsn = 1500 Hence, the from_lsn here is equal to the to_lsn of the last backup or the basedir provided for the incremental backups. For the next incremental backup we can use this incremental backup as the basedir. xtrabackup --defaults-file=/etc/my.cnf --user=some_user --password=XXXX --backup --target-dir=/mnt/data/backup/incr2/ --incremental-basedir=/mnt/data/backup/incr1/ The xtrabackup_checkpoint file will thus have the following contents. 
backup_type = incremental from_lsn = 1500 to_lsn = 2000 last_lsn = 2200 Preparing Incremental backups Preparing incremental backups is not the same as preparing a full backup. When prepare runs, two operations are performed - committed transactions are applied on the data files and uncommitted transactions are rolled back . While preparing incremental backups, we have to skip rollback of uncommitted transactions as it is likely that they might get committed in the next incremental backup. If we roll back uncommitted transactions, the further incremental backups cannot be applied. We use --apply-log-only option along with --prepare to avoid the rollback phase. From the last section, we had the following directories with complete backup /mnt/data/backup/full /mnt/data/backup/incr1 /mnt/data/backup/incr2 First, we prepare the full backup, but only with the --apply-log-only option. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full The output of the command will contain the following at the end. InnoDB: Shutdown complete; log sequence number 1000 Completed OK! Note the LSN mentioned at the end is the same as the to_lsn from the xtrabackup_checkpoint created for full backup. Next, we apply the changes from the first incremental backup to the full backup. xtrabackup --prepare --apply-log-only --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr1 This applies the delta files in the incremental directory to the full backup directory. It rolls the data files in the full backup directory forward to the time of incremental backup and applies the redo logs as usual. Lastly, we apply the last incremental backup same as the previous one with just a small change. xtrabackup --prepare --target-dir=/mnt/data/backup/full --incremental-dir=/mnt/data/backup/incr2 We do not have to use the --apply-log-only option with it. 
It applies the incr2 delta files to the full backup data files taking them forward, applies redo logs on them and finally rollbacks the uncommitted transactions to produce the final result. The data now present in the full backup directory can now be used to restore. Note - To create cumulative incremental backups, the incremental-basedir should always be the full backup directory for every incremental backup. While preparing, we can start with the full backup with the --apply-log-only option and use just the last incremental backup for the final --prepare as that has all the changes since the full backup. Restoring Incremental backups Once all the above steps are completed, restoring is the same as done for a full backup.","title":"Percona Xtrabackup"},{"location":"level101/databases_sql/backup_recovery/#further-reading","text":"MySQL Point-In-Time-Recovery Another MySQL backup tool - mysqlpump Another MySQL backup tool - mydumper A comparison between mysqldump, mysqlpump and mydumper Backup Best Practices","title":"Further Reading"},{"location":"level101/databases_sql/concepts/","text":"Relational DBs are used for data storage. Even a file can be used to store data, but relational DBs are designed with specific goals: Efficiency Ease of access and management Organized Handle relations between data (represented as tables) Transaction: a unit of work that can comprise multiple statements, executed together ACID properties Set of properties that guarantee data integrity of DB transactions Atomicity: Each transaction is atomic (succeeds or fails completely) Consistency: Transactions only result in valid state (which includes rules, constraints, triggers etc.) Isolation: Each transaction is executed independently of others safely within a concurrent system Durability: Completed transactions will not be lost due to any later failures Let\u2019s take some examples to illustrate the above properties. Account A has a balance of \u20b9200 & B has \u20b9400. 
Account A is transferring \u20b9100 to Account B. This transaction has a deduction from sender and an addition into the recipient\u2019s balance. If the first operation passes successfully while the second fails, A\u2019s balance would be \u20b9100 while B would be having \u20b9400 instead of \u20b9500. Atomicity in a DB ensures this partially failed transaction is rolled back. If the second operation above fails, it leaves the DB inconsistent (sum of balance of accounts before and after the operation is not the same). Consistency ensures that this does not happen. There are three operations, one to calculate interest for A\u2019s account, another to add that to A\u2019s account, then transfer \u20b9100 from B to A. Without isolation guarantees, concurrent execution of these 3 operations may lead to a different outcome every time. What happens if the system crashes before the transactions are written to disk? Durability ensures that the changes are applied correctly during recovery. Relational data Tables represent relations Columns (fields) represent attributes Rows are individual records Schema describes the structure of DB SQL A query language to interact with and manage data. CRUD operations - create, read, update, delete queries Management operations - create DBs/tables/indexes etc, backup, import/export, users, access controls Exercise: Classify the below queries into the four types - DDL (definition), DML(manipulation), DCL(control) and TCL(transactions) and explain in detail. insert, create, drop, delete, update, commit, rollback, truncate, alter, grant, revoke You can practise these in the lab section . Constraints Rules for data that can be stored. Query fails if you violate any of these defined on a table. Primary key: one or more columns that contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key. An index on it is created by default. Foreign key: links two tables together. 
Its value(s) match a primary key in a different table Not null: Does not allow null values Unique: Value of column must be unique across all rows Default: Provides a default value for a column if none is specified during insert Check: Allows only particular values (like Balance >= 0) Indexes Most indexes use B+ tree structure. Why use them: Speeds up queries (in large tables that fetch only a few rows, min/max queries, by eliminating rows from consideration etc) Types of indexes: unique, primary key, fulltext, secondary Write-heavy loads, mostly full table scans or accessing large number of rows etc. do not benefit from indexes Joins Allows you to fetch related data from multiple tables, linking them together with some common field. Powerful but also resource-intensive and makes scaling databases difficult. This is the cause of many slow-performing queries when run at scale, and the solution is almost always to find ways to reduce the joins. Access control DBs have privileged accounts for admin tasks, and regular accounts for clients. There are fine-grained controls on what actions (DDL, DML etc. discussed earlier) are allowed for these accounts. DB first verifies the user credentials (authentication), and then examines whether this user is permitted to perform the request (authorization) by looking up this information in some internal tables. Other controls include activity auditing that allows examining the history of actions done by a user, and resource limits which define the number of queries, connections etc. allowed. Popular databases Commercial, closed source - Oracle, Microsoft SQL Server, IBM DB2 Open source with optional paid support - MySQL, MariaDB, PostgreSQL Individuals and small companies have always preferred open source DBs because of the huge cost associated with commercial software. 
In recent times, even large organizations have moved away from commercial software to open source alternatives because of the flexibility and cost savings associated with it. Lack of support is no longer a concern because of the paid support available from the developer and third parties. MySQL is the most widely used open source DB, and it is widely supported by hosting providers, making it easy for anyone to use. It is part of the popular Linux-Apache-MySQL-PHP ( LAMP ) stack that became popular in the 2000s. We have many more choices for a programming language, but the rest of that stack is still widely used.","title":"Key Concepts"},{"location":"level101/databases_sql/concepts/#popular-databases","text":"Commercial, closed source - Oracle, Microsoft SQL Server, IBM DB2 Open source with optional paid support - MySQL, MariaDB, PostgreSQL Individuals and small companies have always preferred open source DBs because of the huge cost associated with commercial software. In recent times, even large organizations have moved away from commercial software to open source alternatives because of the flexibility and cost savings associated with it. Lack of support is no longer a concern because of the paid support available from the developer and third parties. MySQL is the most widely used open source DB, and it is widely supported by hosting providers, making it easy for anyone to use. It is part of the popular Linux-Apache-MySQL-PHP ( LAMP ) stack that became popular in the 2000s. We have many more choices for a programming language, but the rest of that stack is still widely used.","title":"Popular databases"},{"location":"level101/databases_sql/conclusion/","text":"Conclusion We have covered basic concepts of SQL databases. We have also covered some of the tasks that an SRE may be responsible for - there is so much more to learn and do. We hope this course gives you a good start and inspires you to explore further. 
Further reading More practice with online resources like this one Normalization Routines , triggers Views Transaction isolation levels Sharding Setting up HA , monitoring , backups","title":"Conclusion"},{"location":"level101/databases_sql/conclusion/#conclusion","text":"We have covered basic concepts of SQL databases. We have also covered some of the tasks that an SRE may be responsible for - there is so much more to learn and do. We hope this course gives you a good start and inspires you to explore further.","title":"Conclusion"},{"location":"level101/databases_sql/conclusion/#further-reading","text":"More practice with online resources like this one Normalization Routines , triggers Views Transaction isolation levels Sharding Setting up HA , monitoring , backups","title":"Further reading"},{"location":"level101/databases_sql/innodb/","text":"Why should you use this? General purpose, row level locking, ACID support, transactions, crash recovery and multi-version concurrency control etc. Architecture Key components: Memory: Buffer pool: LRU cache of frequently used data(table and index) to be processed directly from memory, which speeds up processing. Important for tuning performance. Change buffer: Caches changes to secondary index pages when those pages are not in the buffer pool and merges it when they are fetched. Merging may take a long time and impact live queries. It also takes up part of the buffer pool. Avoids the extra I/O to read secondary indexes in. Adaptive hash index: Supplements InnoDB\u2019s B-Tree indexes with fast hash lookup tables like a cache. Slight performance penalty for misses, also adds maintenance overhead of updating it. Hash collisions cause AHI rebuilding for large DBs. Log buffer: Holds log data before flush to disk. Size of each above memory is configurable, and impacts performance a lot. Requires careful analysis of workload, available resources, benchmarking and tuning for optimal performance. 
Disk: Tables: Stores data within rows and columns. Indexes: Helps find rows with specific column values quickly, avoids full table scans. Redo Logs: all transactions are written to them, and after a crash, the recovery process corrects data written by incomplete transactions and replays any pending ones. Undo Logs: Records associated with a single transaction that contains information about how to undo the latest change by a transaction.","title":"InnoDB"},{"location":"level101/databases_sql/innodb/#why-should-you-use-this","text":"General purpose, row level locking, ACID support, transactions, crash recovery and multi-version concurrency control etc.","title":"Why should you use this?"},{"location":"level101/databases_sql/innodb/#architecture","text":"","title":"Architecture"},{"location":"level101/databases_sql/innodb/#key-components","text":"Memory: Buffer pool: LRU cache of frequently used data(table and index) to be processed directly from memory, which speeds up processing. Important for tuning performance. Change buffer: Caches changes to secondary index pages when those pages are not in the buffer pool and merges it when they are fetched. Merging may take a long time and impact live queries. It also takes up part of the buffer pool. Avoids the extra I/O to read secondary indexes in. Adaptive hash index: Supplements InnoDB\u2019s B-Tree indexes with fast hash lookup tables like a cache. Slight performance penalty for misses, also adds maintenance overhead of updating it. Hash collisions cause AHI rebuilding for large DBs. Log buffer: Holds log data before flush to disk. Size of each above memory is configurable, and impacts performance a lot. Requires careful analysis of workload, available resources, benchmarking and tuning for optimal performance. Disk: Tables: Stores data within rows and columns. Indexes: Helps find rows with specific column values quickly, avoids full table scans. 
Redo Logs: all transactions are written to them, and after a crash, the recovery process corrects data written by incomplete transactions and replays any pending ones. Undo Logs: Records associated with a single transaction that contains information about how to undo the latest change by a transaction.","title":"Key components:"},{"location":"level101/databases_sql/intro/","text":"Relational Databases Prerequisites Complete Linux course Install Docker (for lab section) What to expect from this course You will have an understanding of what relational databases are, their advantages, and some MySQL specific concepts. What is not covered under this course In depth implementation details Advanced topics like normalization, sharding Specific tools for administration Introduction The main purpose of database systems is to manage data. This includes storage, adding new data, deleting unused data, updating existing data, retrieving data within a reasonable response time, other maintenance tasks to keep the system running etc. 
Pre-reads RDBMS Concepts Course Contents Key Concepts MySQL Architecture InnoDB Backup and Recovery MySQL Replication Operational Concepts SELECT Query Query Performance Lab Further Reading","title":"Introduction"},{"location":"level101/databases_sql/intro/#relational-databases","text":"","title":"Relational Databases"},{"location":"level101/databases_sql/intro/#prerequisites","text":"Complete Linux course Install Docker (for lab section)","title":"Prerequisites"},{"location":"level101/databases_sql/intro/#what-to-expect-from-this-course","text":"You will have an understanding of what relational databases are, their advantages, and some MySQL specific concepts.","title":"What to expect from this course"},{"location":"level101/databases_sql/intro/#what-is-not-covered-under-this-course","text":"In depth implementation details Advanced topics like normalization, sharding Specific tools for administration","title":"What is not covered under this course"},{"location":"level101/databases_sql/intro/#introduction","text":"The main purpose of database systems is to manage data. This includes storage, adding new data, deleting unused data, updating existing data, retrieving data within a reasonable response time, other maintenance tasks to keep the system running etc.","title":"Introduction"},{"location":"level101/databases_sql/intro/#pre-reads","text":"RDBMS Concepts","title":"Pre-reads"},{"location":"level101/databases_sql/intro/#course-contents","text":"Key Concepts MySQL Architecture InnoDB Backup and Recovery MySQL Replication Operational Concepts SELECT Query Query Performance Lab Further Reading","title":"Course Contents"},{"location":"level101/databases_sql/lab/","text":"Prerequisites Install Docker Setup Create a working directory named sos or something similar, and cd into it. Enter the following into a file named my.cnf under a directory named custom. 
sos $ cat custom/my.cnf [mysqld] # These settings apply to MySQL server # You can set port, socket path, buffer size etc. # Below, we are configuring slow query settings slow_query_log=1 slow_query_log_file=/var/log/mysqlslow.log long_query_time=1 Start a container and enable slow query log with the following: sos $ docker run --name db -v custom:/etc/mysql/conf.d -e MYSQL_ROOT_PASSWORD=realsecret -d mysql:8 sos $ docker cp custom/my.cnf $(docker ps -qf \"name=db\"):/etc/mysql/conf.d/custom.cnf sos $ docker restart $(docker ps -qf \"name=db\") Import a sample database sos $ git clone git@github.com:datacharmer/test_db.git sos $ docker cp test_db $(docker ps -qf \"name=db\"):/home/test_db/ sos $ docker exec -it $(docker ps -qf \"name=db\") bash root@3ab5b18b0c7d:/# cd /home/test_db/ root@3ab5b18b0c7d:/# mysql -uroot -prealsecret mysql < employees.sql root@3ab5b18b0c7d:/etc# touch /var/log/mysqlslow.log root@3ab5b18b0c7d:/etc# chown mysql:mysql /var/log/mysqlslow.log Workshop 1: Run some sample queries Run the following $ mysql -uroot -prealsecret mysql mysql> # inspect DBs and tables # the last 4 are MySQL internal DBs mysql> show databases; +--------------------+ | Database | +--------------------+ | employees | | information_schema | | mysql | | performance_schema | | sys | +--------------------+ > use employees; mysql> show tables; +----------------------+ | Tables_in_employees | +----------------------+ | current_dept_emp | | departments | | dept_emp | | dept_emp_latest_date | | dept_manager | | employees | | salaries | | titles | +----------------------+ # read a few rows mysql> select * from employees limit 5; # filter data by conditions mysql> select count(*) from employees where gender = 'M' limit 5; # find count of particular data mysql> select count(*) from employees where first_name = 'Sachin'; Workshop 2: Use explain and explain analyze to profile a query, identify and add indexes required for improving performance # View all indexes on table #(\\G is to 
output vertically, replace it with a ; to get table output) mysql> show index from employees from employees\\G *************************** 1. row *************************** Table: employees Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: emp_no Collation: A Cardinality: 299113 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: Visible: YES Expression: NULL # This query uses an index, identified by 'key' field # By prefixing explain keyword to the command, # we get query plan (including key used) mysql> explain select * from employees where emp_no < 10005\\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 4 filtered: 100.00 Extra: Using where # Compare that to the next query which does not utilize any index mysql> explain select first_name, last_name from employees where first_name = 'Sachin'\\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299113 filtered: 10.00 Extra: Using where # Let's see how much time this query takes mysql> explain analyze select first_name, last_name from employees where first_name = 'Sachin'\\G *************************** 1. 
row *************************** EXPLAIN: -> Filter: (employees.first_name = 'Sachin') (cost=30143.55 rows=29911) (actual time=28.284..3952.428 rows=232 loops=1) -> Table scan on employees (cost=30143.55 rows=299113) (actual time=0.095..1996.092 rows=300024 loops=1) # Cost(estimated by query planner) is 30143.55 # actual time=28.284ms for first row, 3952.428 for all rows # Now lets try adding an index and running the query again mysql> create index idx_firstname on employees(first_name); Query OK, 0 rows affected (1.25 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain analyze select first_name, last_name from employees where first_name = 'Sachin'; +--------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------+ | -> Index lookup on employees using idx_firstname (first_name='Sachin') (cost=81.20 rows=232) (actual time=0.551..2.934 rows=232 loops=1) | +--------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) # Actual time=0.551ms for first row # 2.934ms for all rows. A huge improvement! # Also notice that the query involves only an index lookup, # and no table scan (reading all rows of table) # ..which vastly reduces load on the DB. Workshop 3: Identify slow queries on a MySQL server # Run the command below in two terminal tabs to open two shells into the container. 
docker exec -it $(docker ps -qf \"name=db\") bash # Open a mysql prompt in one of them and execute this command # We have configured to log queries that take longer than 1s, # so this sleep(3) will be logged mysql -uroot -prealsecret mysql mysql> select sleep(3); # Now, in the other terminal, tail the slow log to find details about the query root@62c92c89234d:/etc# tail -f /var/log/mysqlslow.log /usr/sbin/mysqld, Version: 8.0.21 (MySQL Community Server - GPL). started with: Tcp port: 3306 Unix socket: /var/run/mysqld/mysqld.sock Time Id Command Argument # Time: 2020-11-26T14:53:44.822348Z # User@Host: root[root] @ localhost [] Id: 9 # Query_time: 5.404938 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 1 use employees; # Time: 2020-11-26T14:53:58.015736Z # User@Host: root[root] @ localhost [] Id: 9 # Query_time: 10.000225 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 1 SET timestamp=1606402428; select sleep(3); These were simulated examples with minimal complexity. In real life, the queries would be much more complex and the explain/analyze and slow query logs would have more details.","title":"Lab"},{"location":"level101/databases_sql/mysql/","text":"MySQL architecture MySQL architecture enables you to select the right storage engine for your needs, and abstracts away all implementation details from the end users (application engineers and DBA ) who only need to know a consistent stable API. 
Application layer: Connection handling - each client gets its own connection which is cached for the duration of access Authentication - server checks (username,password,host) info of client and allows/rejects connection Security: server determines whether the client has privileges to execute each query (check with show privileges command) Server layer: Services and utilities - backup/restore, replication, cluster etc SQL interface - clients run queries for data access and manipulation SQL parser - creates a parse tree from the query (lexical/syntactic/semantic analysis and code generation) Optimizer - optimizes queries using various algorithms and data available to it (table level stats), modifies queries, order of scanning, indexes to use etc. (check with explain command) Caches and buffers - cache stores query results, buffer pool (InnoDB) stores table and index data in LRU fashion Storage engine options: InnoDB: most widely used, transaction support, ACID compliant, supports row-level locking, crash recovery and multi-version concurrency control. Default since MySQL 5.5. MyISAM: fast, does not support transactions, provides table-level locking, great for read-heavy workloads, mostly in web and data warehousing. Default up to MySQL 5.1. Archive: optimised for high speed inserts, compresses data as it is inserted, does not support transactions, ideal for storing and retrieving large amounts of seldom referenced historical, archived data Memory: tables in memory. Fastest engine, supports table-level locking, does not support transactions, ideal for creating temporary tables or quick lookups, data is lost after a shutdown CSV: stores data in CSV files, great for integrating into other applications that use this format \u2026 etc. It is possible to migrate from one storage engine to another. But this migration locks tables for all operations and is not online, as it changes the physical layout of the data. It takes a long time and is generally not recommended. 
Hence, choosing the right storage engine at the beginning is important. General guideline is to use InnoDB unless you have a specific need for one of the other storage engines. Running mysql> SHOW ENGINES; shows you the supported engines on your MySQL server.","title":"MySQL"},{"location":"level101/databases_sql/mysql/#mysql-architecture","text":"MySQL architecture enables you to select the right storage engine for your needs, and abstracts away all implementation details from the end users (application engineers and DBA ) who only need to know a consistent stable API. Application layer: Connection handling - each client gets its own connection which is cached for the duration of access Authentication - server checks (username,password,host) info of client and allows/rejects connection Security: server determines whether the client has privileges to execute each query (check with show privileges command) Server layer: Services and utilities - backup/restore, replication, cluster etc SQL interface - clients run queries for data access and manipulation SQL parser - creates a parse tree from the query (lexical/syntactic/semantic analysis and code generation) Optimizer - optimizes queries using various algorithms and data available to it (table level stats), modifies queries, order of scanning, indexes to use etc. (check with explain command) Caches and buffers - cache stores query results, buffer pool (InnoDB) stores table and index data in LRU fashion Storage engine options: InnoDB: most widely used, transaction support, ACID compliant, supports row-level locking, crash recovery and multi-version concurrency control. Default since MySQL 5.5. MyISAM: fast, does not support transactions, provides table-level locking, great for read-heavy workloads, mostly in web and data warehousing. Default up to MySQL 5.1. 
Archive: optimised for high speed inserts, compresses data as it is inserted, does not support transactions, ideal for storing and retrieving large amounts of seldom referenced historical, archived data Memory: tables in memory. Fastest engine, supports table-level locking, does not support transactions, ideal for creating temporary tables or quick lookups, data is lost after a shutdown CSV: stores data in CSV files, great for integrating into other applications that use this format \u2026 etc. It is possible to migrate from one storage engine to another. But this migration locks tables for all operations and is not online, as it changes the physical layout of the data. It takes a long time and is generally not recommended. Hence, choosing the right storage engine at the beginning is important. General guideline is to use InnoDB unless you have a specific need for one of the other storage engines. Running mysql> SHOW ENGINES; shows you the supported engines on your MySQL server.","title":"MySQL architecture"},{"location":"level101/databases_sql/operations/","text":"Explain and explain+analyze EXPLAIN analyzes query plans from the optimizer, including how tables are joined, which tables/rows are scanned etc. Explain analyze shows the above and additional info like execution cost, number of rows returned, time taken etc. This knowledge is useful to tweak queries and add indexes. Watch this performance tuning tutorial video . Checkout the lab section for a hands-on about indexes. Slow query logs Used to identify slow queries (configurable threshold), enabled in config or dynamically with a query Checkout the lab section about identifying slow queries. User management This includes creation and changes to users, like managing privileges, changing password etc. Backup and restore strategies, pros and cons Logical backup using mysqldump - slower but can be done online Physical backup (copy data directory or use xtrabackup) - quick backup/recovery. 
Copying data directory requires locking or a shutdown. xtrabackup is an improvement because it supports backups without shutting down (hot backup). Others - PITR, snapshots etc. Crash recovery process using redo logs After a crash, when you restart the server, it reads redo logs and replays modifications to recover Monitoring MySQL Key MySQL metrics: reads, writes, query runtime, errors, slow queries, connections, running threads, InnoDB metrics Key OS metrics: CPU, load, memory, disk I/O, network Replication Copies data from one instance to one or more instances. Helps in horizontal scaling, data protection, analytics and performance. Binlog dump thread on primary, replication I/O and SQL threads on secondary. Strategies include the standard asynchronous, semi-synchronous or group replication. High Availability Ability to cope with failure at software, hardware and network level. Essential for anyone who needs 99.9%+ uptime. Can be implemented with replication or clustering solutions from MySQL, Percona, Oracle etc. Requires expertise to set up and maintain. Failover can be manual, scripted or using tools like Orchestrator. Data directory Data is stored in a particular directory, with nested directories for the data contained in each database. There are also MySQL log files, InnoDB log files, server process ID file and some other configs. The data directory is configurable. MySQL configuration This can be done by passing parameters during startup, or in a file. There are a few standard paths where MySQL looks for config files; /etc/my.cnf is one of the commonly used paths. These options are organized under headers (mysqld for server and mysql for client); you can explore them more in the lab that follows. Logs MySQL has logs for various purposes - general query log, errors, binary logs (for replication), slow query log. 
Only the error log is enabled by default (to reduce I/O and storage requirement); the others can be enabled when required - by specifying config parameters at startup or running commands at runtime. Log destination can also be tweaked with config parameters.","title":"Operational Concepts"},{"location":"level101/databases_sql/query_performance/","text":"Query Performance Improvement Query performance is a crucial aspect of relational databases. If not tuned correctly, the select queries can become slow and painful for the application, and for the MySQL server as well. The important task is to identify the slow queries and try to improve their performance by either rewriting them or creating proper indexes on the tables involved in them. The Slow Query Log The slow query log contains SQL statements that take longer to execute than the threshold set in the config parameter long_query_time. These queries are the candidates for optimization. There are some good utilities to summarize the slow query logs, like mysqldumpslow (provided by MySQL itself), pt-query-digest (provided by Percona), etc. Following are the config parameters that are used to enable and effectively catch slow queries Variable Explanation Example value slow_query_log Enables or disables slow query logs ON slow_query_log_file The location of the slow query log /var/lib/mysql/mysql-slow.log long_query_time Threshold time. The query that takes longer than this time is logged in slow query log 5 log_queries_not_using_indexes When enabled with the slow query log, the queries which do not make use of any index are also logged in the slow query log even though they take less time than long_query_time. ON So, for this section, we will be enabling slow_query_log, long_query_time will be kept to 0.3 (300 ms), and log_queries_not_using_indexes will be enabled as well. Below are the queries that we will execute on the employees database. 
select * from employees where last_name = 'Koblick'; select * from salaries where salary >= 100000; select * from titles where title = 'Manager'; select * from employees where year(hire_date) = 1995; select year(e.hire_date), max(s.salary) from employees e join salaries s on e.emp_no=s.emp_no group by year(e.hire_date); Now, queries 1, 3 and 4 executed in under 300 ms, but if we check the slow query logs, we will find these queries logged as they are not using any index. Queries 2 and 5 are taking longer than 300 ms and also not using any index. Use the following command to get the summary of the slow query log mysqldumpslow /var/lib/mysql/mysql-slow.log The snapshot also contains some other queries that were logged along with the queries mentioned. mysqldumpslow replaces actual values that were used by N (in case of numbers) and S (in case of strings). That can be overridden by the -a option; however, that will increase the output lines if different values are used in similar queries. The EXPLAIN Plan The EXPLAIN command is used with any query that we want to analyze. It describes the query execution plan, how MySQL sees and executes the query. EXPLAIN works with Select, Insert, Update and Delete statements. It tells about different aspects of the query, like how tables are joined, whether indexes are used or not, etc. The important thing here is to understand the basic Explain plan output of a query to determine its performance. 
Let's take the following query as an example, mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | salaries | NULL | ALL | NULL | NULL | NULL | NULL | 2838426 | 10.00 | Using where | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) The key aspects to understand in the above output are:- Partitions - the number of partitions considered while executing the query. It is only valid if the table is partitioned. Possible_keys - the list of indexes that were considered during creation of the execution plan. Key - the index that will be used while executing the query. Rows - the number of rows examined during the execution. Filtered - the percentage of rows that were filtered out of the rows examined. The maximum and most optimized result will have 100 in this field. Extra - this tells some extra information on how MySQL evaluates, whether the query is using only where clause to match target rows, any index or temporary table, etc. So, for the above query, we can determine that there are no partitions, there are no candidate indexes to be used and so no index is used at all, over 2M rows are examined and only 10% of them are included in the result, and lastly, only a where clause is used to match the target rows. Creating an Index Indexes are used to speed up selecting relevant rows for a given column value. Without an index, MySQL starts with the first row and goes through the entire table to find matching rows. If the table has too many rows, the operation becomes costly. 
With indexes, MySQL determines the position to start looking for the data without reading the full table. A primary key is also an index; in InnoDB it is the clustered index, stored together with the table data, which makes primary key lookups the fastest. Secondary indexes are stored separately from the table data and are used to further enhance the performance of SQL statements. Indexes are mostly stored as B-Trees, with some exceptions: spatial indexes use R-Trees and memory tables use hash indexes. There are 2 ways to create indexes:- While creating a table - if we know beforehand which columns will appear most often in the where clauses of select queries, we can put an index on them while creating the table. Altering a table - to improve the performance of a troublesome query, we create an index on a table that already has data in it using the ALTER TABLE or CREATE INDEX command. With InnoDB this operation generally does not block reads and writes on the table, but it might take some time to complete depending on the size of the table. Let\u2019s look at the query that we discussed in the previous section. It\u2019s clear that scanning over 2M records is not a good idea when only 10% of those records are actually in the resultset. Hence, we create an index on the salary column of the salaries table. 
create index idx_salary on salaries(salary); OR alter table salaries add index idx_salary(salary); And the same explain plan now looks like this: mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | salaries | NULL | ref | idx_salary | idx_salary | 4 | const | 13 | 100.00 | NULL | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) Now the index used is idx_salary, the one we just created. With the index, only 13 records are examined and all of them are in the resultset. The query execution time is also reduced from over 700 ms to almost negligible. Let\u2019s look at another example. Here we are searching for a specific combination of first_name and last_name. But, we might also search based on last_name only. 
mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 1.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) Here only about 1% of the almost 300K examined records end up in the resultset. The query is still quick because the table has only 300K records, but it would become painful with millions of records. In this case, we create an index on last_name and first_name, not separately, but as a composite index that includes both columns. 
create index idx_last_first on employees(last_name, first_name) mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 124 | const,const | 1 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) We chose to put last_name before first_name while creating the index as the optimizer starts from the leftmost prefix of the index while evaluating the query. For example, if we have a 3-column index like idx(c1, c2, c3), then the search capability of the index follows - (c1), (c1, c2) or (c1, c2, c3) i.e. if your where clause has only first_name this index won\u2019t work. mysql> explain select * from employees where first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 10.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) But, if you have only the last_name in the where clause, it will work as expected. 
mysql> explain select * from employees where last_name = 'Dredge'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) For another example, use the following queries:- create table employees_2 like employees; create table salaries_2 like salaries; alter table salaries_2 drop primary key; We made copies of employees and salaries tables without the Primary Key of salaries table to understand an example of Select with Join. When you have queries like the below, it becomes tricky to identify the pain point of the query. mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; 1860 rows in set (4.44 sec) This query is taking about 4.5 seconds to complete with 1860 rows in the resultset. Let\u2019s look at the Explain plan. There will be 2 records in the Explain plan as 2 tables are used in the query. 
mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | 1 | SIMPLE | s | NULL | ALL | NULL | NULL | NULL | NULL | 2837194 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY,idx_last_first | PRIMARY | 4 | employees.s.emp_no | 1 | 5.00 | Using where | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ 2 rows in set, 1 warning (0.00 sec) These are in order of evaluation, i.e. salaries_2 is evaluated first and then employees_2 is joined to it. The plan scans almost all the rows of the salaries_2 table and tries to match the employees_2 rows as per the join condition. Although the where clause is used to produce the final resultset, the index corresponding to the where clause is not used for the employees_2 table. A join is usually faster when the columns in the join condition are indexed on both tables and have the same data type. So, let\u2019s create an index on the emp_no column of salaries_2 table and analyze the query again. 
create index idx_empno on salaries_2(emp_no); mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | 1 | SIMPLE | e | NULL | ref | PRIMARY,idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | | 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ 2 rows in set, 1 warning (0.00 sec) Now, not only did the index help the optimizer to examine only a few rows in both tables, it reversed the order of the tables in evaluation. The employees_2 table is evaluated first and rows are selected as per the index respective to the where clause. Then the records are joined to salaries_2 table as per the index used due to the join condition. The execution time of the query came down from 4.5s to 0.02s . mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'\\G 1860 rows in set (0.02 sec)","title":"Query Performance"},{"location":"level101/databases_sql/query_performance/#query-performance-improvement","text":"Query Performance is a very crucial aspect of relational databases. If not tuned correctly, the select queries can become slow and painful for the application, and for the MySQL server as well. 
The important task is to identify the slow queries and try to improve their performance by either rewriting them or creating proper indexes on the tables involved.","title":"Query Performance Improvement"},{"location":"level101/databases_sql/query_performance/#the-slow-query-log","text":"The slow query log contains SQL statements that take longer to execute than the threshold set in the config parameter long_query_time. These queries are the candidates for optimization. There are some good utilities to summarize the slow query logs, like mysqldumpslow (provided by MySQL itself), pt-query-digest (provided by Percona), etc. The following config parameters are used to enable the slow query log and catch slow queries effectively: Variable Explanation Example value slow_query_log Enables or disables slow query logs ON slow_query_log_file The location of the slow query log /var/lib/mysql/mysql-slow.log long_query_time Threshold time. The query that takes longer than this time is logged in slow query log 5 log_queries_not_using_indexes When enabled with the slow query log, the queries which do not make use of any index are also logged in the slow query log even though they take less time than long_query_time. ON So, for this section, we will be enabling slow_query_log , long_query_time will be kept to 0.3 (300 ms) , and log_queries_not_using_indexes will be enabled as well. Below are the queries that we will execute on the employees database. select * from employees where last_name = 'Koblick'; select * from salaries where salary >= 100000; select * from titles where title = 'Manager'; select * from employees where year(hire_date) = 1995; select year(e.hire_date), max(s.salary) from employees e join salaries s on e.emp_no=s.emp_no group by year(e.hire_date); Queries 1, 3 and 4 execute in under 300 ms, but the slow query log still records them because they do not use any index. 
Queries 2 and 5 take longer than 300 ms and also do not use any index. Use the following command to get a summary of the slow query log: mysqldumpslow /var/lib/mysql/mysql-slow.log The snapshot also shows some other queries that were logged along with the ones mentioned above. mysqldumpslow replaces the actual values with N (for numbers) and S (for strings). The -a option disables this abstraction, but the output then grows because similar queries with different values are no longer grouped together.","title":"The Slow Query Log"},{"location":"level101/databases_sql/query_performance/#the-explain-plan","text":"The EXPLAIN command can be prefixed to any query that we want to analyze. It describes the query execution plan, i.e. how MySQL sees and executes the query. EXPLAIN works with SELECT, INSERT, UPDATE and DELETE statements. It reports different aspects of the query, such as how tables are joined and whether indexes are used. The important thing here is to understand the basic EXPLAIN output of a query to assess its performance. Let's take the following query as an example, mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | salaries | NULL | ALL | NULL | NULL | NULL | NULL | 2838426 | 10.00 | Using where | +----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) The key aspects to understand in the above output are:- Partitions - the partitions from which rows would be matched by the query. It is NULL for non-partitioned tables. 
Possible_keys - the list of indexes that the optimizer could choose from for this table. Key - the index that the optimizer actually decided to use. Rows - the estimated number of rows MySQL expects to examine to execute the query. Filtered - the estimated percentage of the examined rows that remain after the table condition is applied. The maximum value is 100, which means none of the examined rows are discarded. Extra - additional information on how MySQL resolves the query, e.g. whether a where clause is used to filter rows, or whether an index or temporary table is used. So, for the above query, we can determine that there are no partitions, there are no candidate indexes and so no index is used at all, over 2M rows are examined and only an estimated 10% of them are included in the result, and lastly, a where clause is used to filter the target rows.","title":"The EXPLAIN Plan"},{"location":"level101/databases_sql/query_performance/#creating-an-index","text":"Indexes are used to speed up selecting relevant rows for a given column value. Without an index, MySQL starts with the first row and goes through the entire table to find matching rows. If the table has too many rows, the operation becomes costly. With indexes, MySQL determines the position to start looking for the data without reading the full table. A primary key is also an index; in InnoDB it is the clustered index, stored together with the table data, which makes primary key lookups the fastest. Secondary indexes are stored separately from the table data and are used to further enhance the performance of SQL statements. Indexes are mostly stored as B-Trees, with some exceptions: spatial indexes use R-Trees and memory tables use hash indexes. There are 2 ways to create indexes:- While creating a table - if we know beforehand which columns will appear most often in the where clauses of select queries, we can put an index on them while creating the table. 
Altering a table - to improve the performance of a troublesome query, we create an index on a table that already has data in it using the ALTER TABLE or CREATE INDEX command. With InnoDB this operation generally does not block reads and writes on the table, but it might take some time to complete depending on the size of the table. Let\u2019s look at the query that we discussed in the previous section. It\u2019s clear that scanning over 2M records is not a good idea when only 10% of those records are actually in the resultset. Hence, we create an index on the salary column of the salaries table. create index idx_salary on salaries(salary); OR alter table salaries add index idx_salary(salary); And the same explain plan now looks like this: mysql> explain select * from salaries where salary = 100000; +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | salaries | NULL | ref | idx_salary | idx_salary | 4 | const | 13 | 100.00 | NULL | +----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) Now the index used is idx_salary, the one we just created. With the index, only 13 records are examined and all of them are in the resultset. The query execution time is also reduced from over 700 ms to almost negligible. Let\u2019s look at another example. Here we are searching for a specific combination of first_name and last_name. But, we might also search based on last_name only. 
mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 1.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) Here only about 1% of the almost 300K examined records end up in the resultset. The query is still quick because the table has only 300K records, but it would become painful with millions of records. In this case, we create an index on last_name and first_name, not separately, but as a composite index that includes both columns. 
create index idx_last_first on employees(last_name, first_name) mysql> explain select * from employees where last_name = 'Dredge' and first_name = 'Yinghua'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 124 | const,const | 1 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) We chose to put last_name before first_name while creating the index as the optimizer starts from the leftmost prefix of the index while evaluating the query. For example, if we have a 3-column index like idx(c1, c2, c3), then the search capability of the index follows - (c1), (c1, c2) or (c1, c2, c3) i.e. if your where clause has only first_name this index won\u2019t work. mysql> explain select * from employees where first_name = 'Yinghua'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 10.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) But, if you have only the last_name in the where clause, it will work as expected. 
mysql> explain select * from employees where last_name = 'Dredge'; +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | employees | NULL | ref | idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | +----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec) For another example, use the following queries:- create table employees_2 like employees; create table salaries_2 like salaries; alter table salaries_2 drop primary key; We made copies of employees and salaries tables without the Primary Key of salaries table to understand an example of Select with Join. When you have queries like the below, it becomes tricky to identify the pain point of the query. mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; 1860 rows in set (4.44 sec) This query is taking about 4.5 seconds to complete with 1860 rows in the resultset. Let\u2019s look at the Explain plan. There will be 2 records in the Explain plan as 2 tables are used in the query. 
mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ | 1 | SIMPLE | s | NULL | ALL | NULL | NULL | NULL | NULL | 2837194 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY,idx_last_first | PRIMARY | 4 | employees.s.emp_no | 1 | 5.00 | Using where | +----+-------------+-------+------------+--------+------------------------+---------+---------+--------------------+---------+----------+-------------+ 2 rows in set, 1 warning (0.00 sec) These are in order of evaluation, i.e. salaries_2 is evaluated first and then employees_2 is joined to it. The plan scans almost all the rows of the salaries_2 table and tries to match the employees_2 rows as per the join condition. Although the where clause is used to produce the final resultset, the index corresponding to the where clause is not used for the employees_2 table. A join is usually faster when the columns in the join condition are indexed on both tables and have the same data type. So, let\u2019s create an index on the emp_no column of salaries_2 table and analyze the query again. 
create index idx_empno on salaries_2(emp_no); mysql> explain select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'; +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ | 1 | SIMPLE | e | NULL | ref | PRIMARY,idx_last_first | idx_last_first | 66 | const | 200 | 100.00 | NULL | | 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL | +----+-------------+-------+------------+------+------------------------+----------------+---------+--------------------+------+----------+-------+ 2 rows in set, 1 warning (0.00 sec) Now, not only did the index help the optimizer to examine only a few rows in both tables, it reversed the order of the tables in evaluation. The employees_2 table is evaluated first and rows are selected as per the index respective to the where clause. Then the records are joined to salaries_2 table as per the index used due to the join condition. The execution time of the query came down from 4.5s to 0.02s . mysql> select e.first_name, e.last_name, s.salary, e.hire_date from employees_2 e join salaries_2 s on e.emp_no=s.emp_no where e.last_name='Dredge'\\G 1860 rows in set (0.02 sec)","title":"Creating an Index"},{"location":"level101/databases_sql/replication/","text":"MySQL Replication Replication enables data from one MySQL host (termed as Primary) to be copied to another MySQL host (termed as Replica). MySQL Replication is asynchronous in nature by default, but it can be changed to semi-synchronous with some configurations. 
Some common applications of MySQL replication are:- Read-scaling - as multiple hosts can replicate the data from a single primary host, we can set up as many replicas as we need and scale reads through them, i.e. application writes go to a single primary host and the reads are balanced across all the replicas. Such a setup can improve the write performance as well, as the primary is dedicated to updates only and does not serve reads. Backups using replicas - the backup process can sometimes be a little heavy. But if we have replicas configured, we can take the backup from one of them without affecting the primary at all. Disaster Recovery - a replica in some other geographical region provides a basis for disaster recovery. MySQL supports different types of synchronization as well:- Asynchronous - this is the default synchronization method. It is one-way, i.e. one host serves as primary and one or more hosts as replicas. We will discuss this method throughout the replication topic. Semi-Synchronous - in this type of synchronization, a commit performed on the primary host is blocked until at least one replica acknowledges that it has received the transaction. After the acknowledgement from any one replica, control is returned to the session that performed the transaction. This gives a stronger durability guarantee than asynchronous replication, but commits are slower. Delayed - we can deliberately lag the replica in a typical MySQL replication by the number of seconds desired by the use case. This type of replication safeguards against severe human errors such as dropping or corrupting data on the primary. For example, in the above diagram for Delayed Replication, if a DROP DATABASE is executed by mistake on the primary, we still have 30 minutes to recover the data from R2 as that command has not been replicated on R2 yet. Pre-Requisites Before we dive into setting up replication, we should know about the binary logs. Binary logs play a very important role in MySQL replication. 
Binary logs, commonly known as binlogs, contain events describing the changes made to the database, like table structure changes, data changes via DML operations, etc. They are not used to log SELECT statements. For replication, the primary uses its binlogs to send information about the changes made to the database to the replicas, and the replicas make the same data changes. With respect to MySQL replication, the binary log format can be of two types, which decide the main type of replication:- - Statement-Based Replication or SBR - Row-Based Replication or RBR Statement Based Binlog Format Originally, replication in MySQL was based on SQL statements being replicated from the primary and executed on the replica. This is called statement based logging. The binlog contains the exact SQL statement run by the session. So if we run the above statements to insert 3 records and then update all 3 in a single update statement, they will be logged exactly as we executed them. Row Based Binlog Format The row based format is the default in the latest MySQL releases. It is quite different from the statement format, as row events are logged instead of statements. That is, in the above example one update statement affected 3 records and the statement based binlog had only one update statement; with the row based format, the binlog will have an event for each record updated. Statement Based v/s Row Based binlogs Let\u2019s have a look at the operational differences between statement-based and row-based binlogs. Statement Based Row Based Logs SQL statements as executed Logs row events based on SQL statements executed Takes lesser disk space Takes more disk space Restoring using binlogs is faster Restoring using binlogs is slower When used for replication, if any statement has a predefined function that has its own value, like sysdate(), uuid() etc, the output could be different on the replica which makes it inconsistent. 
Whatever is executed becomes a row event with values, so there will be no problem if such functions are used in SQL statements. Only statements are logged so no other row events are generated. A lot of events are generated when a table is copied into another using INSERT INTO SELECT. Note - There is another type of binlog format called Mixed . With mixed logging, statement based is used by default but it switches to row based in certain cases. If MySQL cannot guarantee that statement based logging is safe for the statements executed, it issues a warning and switches to row based for those statements. We will be using binary log format as Row for the entire replication topic. Replication in Motion The above figure indicates how a typical MySQL replication works. Replica_IO_Thread is responsible for fetching the binlog events from the primary binary logs to the replica. On the replica host, relay logs are created which are exact copies of the binary logs. If the binary logs on the primary are in row format, the relay logs will be the same. Replica_SQL_Thread applies the relay logs on the replica MySQL server. If log-bin is enabled on the replica, then the replica will have its own binary logs as well. If log-slave-updates is enabled, then it will have the updates from the primary logged in the binlogs as well. Setting up Replication In this section, we will set up a simple asynchronous replication. The binlogs will be in row based format. The replication will be set up on two fresh hosts with no prior data present. There are two different ways in which we can set up replication. Binlog based - Each replica keeps a record of the binlog coordinates on the primary, i.e. the current binlog and the position in that binlog up to which it has read and processed. So, at any given time different replicas might be reading different parts of the same binlog. GTID based - Every transaction gets an identifier called global transaction identifier or GTID. 
There is no need to keep the record of binlog coordinates, as long as the replica has all the GTIDs executed on the primary, it is consistent with the primary. A typical GTID is the server_uuid:# positive integer. We will set up a GTID based replication in the following section but will also discuss binlog based replication setup as well. Primary Host Configurations The following config parameters should be present in the primary my.cnf file for setting up GTID based replication. server-id - a unique ID for the mysql server log-bin - the binlog location binlog-format - ROW | STATEMENT (we will use ROW) gtid-mode - ON enforce-gtid-consistency - ON (allows execution of only those statements which can be logged using GTIDs) Replica Host Configurations The following config parameters should be present in the replica my.cnf file for setting up replication. server-id - different than the primary host log-bin - (optional, if you want replica to log its own changes as well) binlog-format - depends on the above gtid-mode - ON enforce-gtid-consistency - ON log-slave-updates - ON (if binlog is enabled, then we can enable this. This enables the replica to log the changes coming from the primary along with its own changes. Helps in setting up chain replication) Replication User Every replica connects to the primary using a mysql user for replicating. So there must be a mysql user account for the same on the primary host. Any user can be used for this purpose provided it has REPLICATION SLAVE privilege. If the sole purpose is replication then we can have a user with only the required privilege. On the primary host mysql> create user repl_user@ identified by 'xxxxx'; mysql> grant replication slave on *.* to repl_user@''; Obtaining Starting position from Primary Run the following command on the primary host mysql> show master status\\G *************************** 1. 
row *************************** File: mysql-bin.000001 Position: 73 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 1 row in set (0.00 sec) If we are working with binary log based replication, the top two output lines are the most important ones. That tells the current binlog on the primary host and till what position it has written. For fresh hosts we know that no data is written so we can directly set up replication using the very first binlog file and position 4. If we are setting up a replication from a backup, then that changes the way we obtain the starting position. For GTIDs, the executed_gtid_set is the value where primary is right now. Again, for a fresh setup, we don\u2019t have to specify anything about the starting point and it will start from the transaction id 1, but when we set up from a backup, the backup will contain the GTID positions till where backup has been taken. Setting up Replica The replication setup must know about the primary host, the user and password to connect, the binlog coordinates (for binlog based replication) or the GTID auto-position parameter. The following command is used for setting up change master to master_host = '', master_port = , master_user = 'repl_user', master_password = 'xxxxx', master_auto_position = 1; Note - the Change Master To command has been replaced with Change Replication Source To from Mysql 8.0.23 onwards, also all the master and slave keywords are replaced with source and replica . If it is binlog based replication, then instead of master_auto_position, we need to specify the binlog coordinates. 
master_log_file = 'mysql-bin.000001', master_log_pos = 4 Starting Replication and Check Status Now that everything is configured, we just need to start the replication on the replica via the following command start slave; OR from MySQL 8.0.23 onwards, start replica; Whether or not the replication is running successfully, we can determine by running the following command show slave status\\G OR from MySQL 8.0.23 onwards, show replica status\\G mysql> show replica status\\G *************************** 1. row *************************** Replica_IO_State: Waiting for master to send event Source_Host: Source_User: repl_user Source_Port: Connect_Retry: 60 Source_Log_File: mysql-bin.000001 Read_Source_Log_Pos: 852 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 1067 Relay_Source_Log_File: mysql-bin.000001 Replica_IO_Running: Yes Replica_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Source_Log_Pos: 852 Relay_Log_Space: 1283 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Source_SSL_Allowed: No Source_SSL_CA_File: Source_SSL_CA_Path: Source_SSL_Cert: Source_SSL_Cipher: Source_SSL_Key: Seconds_Behind_Source: 0 Source_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Source_Server_Id: 1 Source_UUID: e17d0920-d00e-11eb-a3e6-000d3aa00f87 Source_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates Source_Retry_Count: 86400 Source_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Source_SSL_Crl: Source_SSL_Crlpath: Retrieved_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Source_TLS_Version: Source_public_key_path: 
Get_Source_public_key: 0 Network_Namespace: 1 row in set (0.00 sec) Some of the parameters are explained below:- Relay_Source_Log_File - the primary\u2019s binlog file containing the event the replica most recently executed Exec_Source_Log_Pos - the position in the above file up to which the replica has executed events. These two parameters are of utmost importance when binlog based replication is used. Replica_IO_Running - IO thread of replica is running or not Replica_SQL_Running - SQL thread of replica is running or not Seconds_Behind_Source - the difference of seconds when a statement was executed on Primary and then on Replica. This indicates how much replication lag is there. Source_UUID - the uuid of the primary host Retrieved_Gtid_Set - the GTIDs fetched from the primary host by the replica to be executed. Executed_Gtid_Set - the GTIDs executed on the replica. This set remains the same for the entire cluster if the replicas are in sync. Auto_Position - it directs the replica to fetch the next GTID automatically Create a Replica for the already setup cluster The steps discussed in the previous section talk about setting up replication on two fresh hosts. When we have to set up a replica for a host which is already serving applications, then a backup of the primary is used, either a fresh backup taken for the replica (which should only be done when the primary is serving little traffic) or a recently taken backup. If the size of the databases on the MySQL primary server is small, less than 100G recommended, then mysqldump can be used to take backup along with the following options. mysqldump -uroot -p -hhost_ip -P3306 --all-databases --single-transaction --master-data=1 > primary_host.bkp --single-transaction - this option starts a transaction before taking the backup which ensures it is consistent. As transactions are isolated from each other, no other writes affect the backup. --master-data - this option is required if binlog based replication is desired to be set up.
It includes the binary log file and log file position in the backup file. When GTID mode is enabled and mysqldump is executed, it includes the GTID executed to be used to start the replica after the backup position. The contents of the mysqldump output file will have the following It is recommended to comment these before restoring otherwise they could throw errors. Also, using master-data=2 will automatically comment the master_log_file line. Similarly, when taking backup of the host using xtrabackup , the xtrabackup_info file contains the information about binlog file and file position, as well as the GTID executed set. server_version = 8.0.25 start_time = 2021-06-22 03:45:17 end_time = 2021-06-22 03:45:20 lock_time = 0 binlog_pos = filename 'mysql-bin.000007', position '196', GTID of the last change 'e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5' innodb_from_lsn = 0 innodb_to_lsn = 18153149 partial = N incremental = N format = file compressed = N encrypted = N Now, after setting MySQL server on the desired host, restore the backup taken from any one of the above methods. If the intended way is binlog based replication, then use the binlog file and position info in the following command change Replication Source to source_host = \u2018primary_ip\u2019, source_port = 3306, source_user = \u2018repl_user\u2019, source_password = \u2018xxxxx\u2019, source_log_file = \u2018mysql-bin.000007\u2019, source_log_pos = \u2018196\u2019; If the replication needs to be set via GTIDs, then run the below command to tell the replica about the GTIDs already executed. On the Replica host, run the following commands reset master; set global gtid_purged = \u2018e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5\u2019 change replication source to source_host = \u2018primary_ip\u2019, source_port = 3306, source_user = \u2018repl_user\u2019, source_password = \u2018xxxxx\u2019, source_auto_position = 1 The reset master command resets the position of the binary log to initial.
It can be skipped if the host is a freshly installed MySQL, but we restored a backup so it is necessary. The gtid_purged global variable lets the replica know the GTIDs that have already been executed, so that the replication can start after that. Then in the change source command, we set the auto-position to 1 which automatically gets the next GTID to proceed. Further Reading More applications of Replication Automated Failovers using MySQL Orchestrator","title":"MySQL Replication"},{"location":"level101/databases_sql/replication/#mysql-replication","text":"Replication enables data from one MySQL host (termed as Primary) to be copied to another MySQL host (termed as Replica). MySQL Replication is asynchronous in nature by default, but it can be changed to semi-synchronous with some configurations. Some common applications of MySQL replication are:- Read-scaling - as multiple hosts can replicate the data from a single primary host, we can set up as many replicas as we need and scale reads through them, i.e. application writes will go to a single primary host and the reads can balance between all the replicas that are there. Such a setup can improve the write performance as well, as the primary is dedicated to only updates and not reads. Backups using replicas - the backup process can sometimes be a little heavy. But if we have replicas configured, then we can use one of them to get the backup without affecting the primary data at all. Disaster Recovery - a replica in some other geographical region paves a proper path to configure disaster recovery. MySQL supports different types of synchronizations as well:- Asynchronous - this is the default synchronization method. It is one-way, i.e. one host serves as primary and one or more hosts as replica. We will discuss this method throughout the replication topic. Semi-Synchronous - in this type of synchronization, a commit performed on the primary host is blocked until at least one replica acknowledges it.
Post the acknowledgement from any one replica, the control is returned to the session that performed the transaction. This ensures strong consistency but the replication is slower than asynchronous. Delayed - we can deliberately lag the replica in a typical MySQL replication by the number of seconds desired by the use case. This type of replication safeguards from severe human errors of dropping or corrupting the data on the primary, for example, in the above diagram for Delayed Replication, if a DROP DATABASE is executed by mistake on the primary, we still have 30 minutes to recover the data from R2 as that command has not been replicated on R2 yet. Pre-Requisites Before we dive into setting up replication, we should know about the binary logs. Binary logs play a very important role in MySQL replication. Binary logs, or commonly known as binlogs contain events about the changes done to the database, like table structure changes, data changes via DML operations, etc. They are not used to log SELECT statements. For replication, the primary sends the information to the replicas using its binlogs about the changes done to the database, and the replicas make the same data changes. With respect to MySQL replication, the binary log format can be of two types that decides the main type of replication:- - Statement-Based Replication or SBR - Row-Based Replication or RBR Statement Based Binlog Format Originally, the replication in MySQL was based on SQL statements getting replicated and executed on the replica from the primary. This is called statement based logging. The binlog contains the exact SQL statement run by the session. So If we run the above statements to insert 3 records and the update 3 in a single update statement, they will be logged exactly the same as when we executed them. Row Based Binlog Format The Row based is the default one in the latest MySQL releases. 
This is a lot different from the Statement format as here, row events are logged instead of statements. By that we mean, in the above example one update statement affected 3 records, but binlog had only one update statement; if it is a row based format, binlog will have an event for each record updated. Statement Based v/s Row Based binlogs Let\u2019s have a look at the operational differences between statement-based and row-based binlogs. Statement Based Row Based Logs SQL statements as executed Logs row events based on SQL statements executed Takes lesser disk space Takes more disk space Restoring using binlogs is faster Restoring using binlogs is slower When used for replication, if any statement has a predefined function that has its own value, like sysdate(), uuid() etc, the output could be different on the replica which makes it inconsistent. Whatever is executed becomes a row event with values, so there will be no problem if such functions are used in SQL statements. Only statements are logged so no other row events are generated. A lot of events are generated when a table is copied into another using INSERT INTO SELECT. Note - There is another type of binlog format called Mixed . With mixed logging, statement based is used by default but it switches to row based in certain cases. If MySQL cannot guarantee that statement based logging is safe for the statements executed, it issues a warning and switches to row based for those statements. We will be using binary log format as Row for the entire replication topic. Replication in Motion The above figure indicates how a typical MySQL replication works. Replica_IO_Thread is responsible to fetch the binlog events from the primary binary logs to the replica On the Replica host, relay logs are created which are exact copies of the binary logs. If the binary logs on primary are in row format, the relay logs will be the same. Replica_SQL_Thread applies the relay logs on the replica MySQL server. 
If log-bin is enabled on the replica, then the replica will have its own binary logs as well. If log-slave-updates is enabled, then it will have the updates from the primary logged in the binlogs as well.","title":"MySQL Replication"},{"location":"level101/databases_sql/replication/#setting-up-replication","text":"In this section, we will set up a simple asynchronous replication. The binlogs will be in row based format. The replication will be set up on two fresh hosts with no prior data present. There are two different ways in which we can set up replication. Binlog based - Each replica keeps a record of the binlog coordinates on the primary - current binlog and position in the binlog till where it has read and processed. So, at a time different replicas might be reading different parts of the same binlog. GTID based - Every transaction gets an identifier called global transaction identifier or GTID. There is no need to keep the record of binlog coordinates, as long as the replica has all the GTIDs executed on the primary, it is consistent with the primary. A typical GTID is the server_uuid:# positive integer. We will set up a GTID based replication in the following section but will also discuss binlog based replication setup as well. Primary Host Configurations The following config parameters should be present in the primary my.cnf file for setting up GTID based replication. server-id - a unique ID for the mysql server log-bin - the binlog location binlog-format - ROW | STATEMENT (we will use ROW) gtid-mode - ON enforce-gtid-consistency - ON (allows execution of only those statements which can be logged using GTIDs) Replica Host Configurations The following config parameters should be present in the replica my.cnf file for setting up replication. 
server-id - different than the primary host log-bin - (optional, if you want replica to log its own changes as well) binlog-format - depends on the above gtid-mode - ON enforce-gtid-consistency - ON log-slave-updates - ON (if binlog is enabled, then we can enable this. This enables the replica to log the changes coming from the primary along with its own changes. Helps in setting up chain replication) Replication User Every replica connects to the primary using a mysql user for replicating. So there must be a mysql user account for the same on the primary host. Any user can be used for this purpose provided it has REPLICATION SLAVE privilege. If the sole purpose is replication then we can have a user with only the required privilege. On the primary host mysql> create user repl_user@ identified by 'xxxxx'; mysql> grant replication slave on *.* to repl_user@''; Obtaining Starting position from Primary Run the following command on the primary host mysql> show master status\\G *************************** 1. row *************************** File: mysql-bin.000001 Position: 73 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 1 row in set (0.00 sec) If we are working with binary log based replication, the top two output lines are the most important ones. That tells the current binlog on the primary host and till what position it has written. For fresh hosts we know that no data is written so we can directly set up replication using the very first binlog file and position 4. If we are setting up a replication from a backup, then that changes the way we obtain the starting position. For GTIDs, the executed_gtid_set is the value where primary is right now. Again, for a fresh setup, we don\u2019t have to specify anything about the starting point and it will start from the transaction id 1, but when we set up from a backup, the backup will contain the GTID positions till where backup has been taken. 
Setting up Replica The replication setup must know about the primary host, the user and password to connect, the binlog coordinates (for binlog based replication) or the GTID auto-position parameter. The following command is used for setting up change master to master_host = '', master_port = , master_user = 'repl_user', master_password = 'xxxxx', master_auto_position = 1; Note - the Change Master To command has been replaced with Change Replication Source To from Mysql 8.0.23 onwards, also all the master and slave keywords are replaced with source and replica . If it is binlog based replication, then instead of master_auto_position, we need to specify the binlog coordinates. master_log_file = 'mysql-bin.000001', master_log_pos = 4 Starting Replication and Check Status Now that everything is configured, we just need to start the replication on the replica via the following command start slave; OR from MySQL 8.0.23 onwards, start replica; Whether or not the replication is running successfully, we can determine by running the following command show slave status\\G OR from MySQL 8.0.23 onwards, show replica status\\G mysql> show replica status\\G *************************** 1. 
row *************************** Replica_IO_State: Waiting for master to send event Source_Host: Source_User: repl_user Source_Port: Connect_Retry: 60 Source_Log_File: mysql-bin.000001 Read_Source_Log_Pos: 852 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 1067 Relay_Source_Log_File: mysql-bin.000001 Replica_IO_Running: Yes Replica_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Source_Log_Pos: 852 Relay_Log_Space: 1283 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Source_SSL_Allowed: No Source_SSL_CA_File: Source_SSL_CA_Path: Source_SSL_Cert: Source_SSL_Cipher: Source_SSL_Key: Seconds_Behind_Source: 0 Source_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Source_Server_Id: 1 Source_UUID: e17d0920-d00e-11eb-a3e6-000d3aa00f87 Source_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates Source_Retry_Count: 86400 Source_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Source_SSL_Crl: Source_SSL_Crlpath: Retrieved_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Source_TLS_Version: Source_public_key_path: Get_Source_public_key: 0 Network_Namespace: 1 row in set (0.00 sec) Some of the parameters are explained below:- Relay_Source_Log_File - the primary\u2019s binlog file containing the event the replica most recently executed Exec_Source_Log_Pos - the position in the above file up to which the replica has executed events. These two parameters are of utmost importance when binlog based replication is used.
Replica_IO_Running - IO thread of replica is running or not Replica_SQL_Running - SQL thread of replica is running or not Seconds_Behind_Source - the difference of seconds when a statement was executed on Primary and then on Replica. This indicates how much replication lag is there. Source_UUID - the uuid of the primary host Retrieved_Gtid_Set - the GTIDs fetched from the primary host by the replica to be executed. Executed_Gtid_Set - the GTIDs executed on the replica. This set remains the same for the entire cluster if the replicas are in sync. Auto_Position - it directs the replica to fetch the next GTID automatically Create a Replica for the already setup cluster The steps discussed in the previous section talk about setting up replication on two fresh hosts. When we have to set up a replica for a host which is already serving applications, then a backup of the primary is used, either a fresh backup taken for the replica (which should only be done when the primary is serving little traffic) or a recently taken backup. If the size of the databases on the MySQL primary server is small, less than 100G recommended, then mysqldump can be used to take backup along with the following options. mysqldump -uroot -p -hhost_ip -P3306 --all-databases --single-transaction --master-data=1 > primary_host.bkp --single-transaction - this option starts a transaction before taking the backup which ensures it is consistent. As transactions are isolated from each other, no other writes affect the backup. --master-data - this option is required if binlog based replication is desired to be set up. It includes the binary log file and log file position in the backup file. When GTID mode is enabled and mysqldump is executed, it includes the GTID executed to be used to start the replica after the backup position. The contents of the mysqldump output file will have the following It is recommended to comment these before restoring otherwise they could throw errors.
Also, using master-data=2 will automatically comment the master_log_file line. Similarly, when taking backup of the host using xtrabackup , the xtrabackup_info file contains the information about binlog file and file position, as well as the GTID executed set. server_version = 8.0.25 start_time = 2021-06-22 03:45:17 end_time = 2021-06-22 03:45:20 lock_time = 0 binlog_pos = filename 'mysql-bin.000007', position '196', GTID of the last change 'e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5' innodb_from_lsn = 0 innodb_to_lsn = 18153149 partial = N incremental = N format = file compressed = N encrypted = N Now, after setting MySQL server on the desired host, restore the backup taken from any one of the above methods. If the intended way is binlog based replication, then use the binlog file and position info in the following command change Replication Source to source_host = \u2018primary_ip\u2019, source_port = 3306, source_user = \u2018repl_user\u2019, source_password = \u2018xxxxx\u2019, source_log_file = \u2018mysql-bin.000007\u2019, source_log_pos = \u2018196\u2019; If the replication needs to be set via GTIDs, then run the below command to tell the replica about the GTIDs already executed. On the Replica host, run the following commands reset master; set global gtid_purged = \u2018e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-5\u2019 change replication source to source_host = \u2018primary_ip\u2019, source_port = 3306, source_user = \u2018repl_user\u2019, source_password = \u2018xxxxx\u2019, source_auto_position = 1 The reset master command resets the position of the binary log to initial. It can be skipped if the host is a freshly installed MySQL, but we restored a backup so it is necessary. The gtid_purged global variable lets the replica know the GTIDs that have already been executed, so that the replication can start after that.
Then in the change source command, we set the auto-position to 1 which automatically gets the next GTID to proceed.","title":"Setting up Replication"},{"location":"level101/databases_sql/replication/#further-reading","text":"More applications of Replication Automated Failovers using MySQL Orchestrator","title":"Further Reading"},{"location":"level101/databases_sql/select_query/","text":"SELECT Query The most commonly used command while working with MySQL is SELECT. It is used to fetch the result set from one or more tables. The general form of a typical select query looks like:- SELECT expr FROM table1 [WHERE condition] [GROUP BY column_list HAVING condition] [ORDER BY column_list ASC|DESC] [LIMIT #] The above general form contains some commonly used clauses of a SELECT query:- expr - comma-separated column list or * (for all columns) WHERE - a condition; only the records for which it is true are selected. GROUP BY - groups the entire result set based on the column list provided. An aggregate function is recommended to be present in the select expression of the query. HAVING supports grouping by putting a condition on the selected or any other aggregate function. ORDER BY - sorts the result set based on the column list in ascending or descending order. LIMIT - commonly used to limit the number of records. Let\u2019s have a look at some examples for a better understanding of the above. The dataset used for the examples below is available here and is free to use.
Select all records mysql> select * from employees limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 | | 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 | | 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 | | 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 | | 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.00 sec) Select specific fields for all records mysql> select first_name, last_name, gender from employees limit 5; +------------+-----------+--------+ | first_name | last_name | gender | +------------+-----------+--------+ | Georgi | Facello | M | | Bezalel | Simmel | F | | Parto | Bamford | M | | Chirstian | Koblick | M | | Kyoichi | Maliniak | M | +------------+-----------+--------+ 5 rows in set (0.00 sec) Select all records Where hire_date >= January 1, 1990 mysql> select * from employees where hire_date >= '1990-01-01' limit 5; +--------+------------+------------+-------------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-------------+--------+------------+ | 10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15 | | 10011 | 1953-11-07 | Mary | Sluis | F | 1990-01-22 | | 10012 | 1960-10-04 | Patricio | Bridgland | M | 1992-12-18 | | 10016 | 1961-05-02 | Kazuhito | Cappelletti | M | 1995-01-27 | | 10017 | 1958-07-06 | Cristinel | Bouloucos | F | 1993-08-03 | +--------+------------+------------+-------------+--------+------------+ 5 rows in set (0.01 sec) Select first_name and last_name from all records Where birth_date >= 1960 AND gender = \u2018F\u2019 mysql> select first_name, last_name from employees where year(birth_date) >= 
1960 and gender='F' limit 5; +------------+-----------+ | first_name | last_name | +------------+-----------+ | Bezalel | Simmel | | Duangkaew | Piveteau | | Divier | Reistad | | Jeong | Reistad | | Mingsen | Casley | +------------+-----------+ 5 rows in set (0.00 sec) Display the total number of records mysql> select count(*) from employees; +----------+ | count(*) | +----------+ | 300024 | +----------+ 1 row in set (0.05 sec) Display gender-wise count of all records mysql> select gender, count(*) from employees group by gender; +--------+----------+ | gender | count(*) | +--------+----------+ | M | 179973 | | F | 120051 | +--------+----------+ 2 rows in set (0.14 sec) Display the year of hire_date and number of employees hired that year, also only those years where more than 20k employees were hired mysql> select year(hire_date), count(*) from employees group by year(hire_date) having count(*) > 20000; +-----------------+----------+ | year(hire_date) | count(*) | +-----------------+----------+ | 1985 | 35316 | | 1986 | 36150 | | 1987 | 33501 | | 1988 | 31436 | | 1989 | 28394 | | 1990 | 25610 | | 1991 | 22568 | | 1992 | 20402 | +-----------------+----------+ 8 rows in set (0.14 sec) Display all records ordered by their hire_date in descending order. 
If hire_date is the same then in order of their birth_date ascending order mysql> select * from employees order by hire_date desc, birth_date asc limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 463807 | 1964-06-12 | Bikash | Covnot | M | 2000-01-28 | | 428377 | 1957-05-09 | Yucai | Gerlach | M | 2000-01-23 | | 499553 | 1954-05-06 | Hideyuki | Delgrande | F | 2000-01-22 | | 222965 | 1959-08-07 | Volkmar | Perko | F | 2000-01-13 | | 47291 | 1960-09-09 | Ulf | Flexer | M | 2000-01-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.12 sec) SELECT - JOINS JOIN statement is used to produce a combined result set from two or more tables based on certain conditions. It can be also used with Update and Delete statements but we will be focussing on the select query. Following is a basic general form for joins SELECT table1.col1, table2.col1, ... (any combination) FROM table1 table2 ON (or USING depends on join_type) table1.column_for_joining = table2.column_for_joining WHERE \u2026 Any number of columns can be selected, but it is recommended to select only those which are relevant to increase the readability of the resultset. All other clauses like where, group by are not mandatory. Let\u2019s discuss the types of JOINs supported by MySQL Syntax. Inner Join This joins table A with table B on a condition. Only the records where the condition is True are selected in the resultset. 
Display some details of employees along with their salary mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s on e.emp_no=s.emp_no limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Similar result can be achieved by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s using (emp_no) limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) And also by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Outer Join Majorly of two types:- - LEFT - joining complete table A with table B on a condition. All the records from table A are selected, but from table B, only those records are selected where the condition is True. - RIGHT - Exact opposite of the left join. Let us assume the below tables for understanding left join better. 
mysql> select * from dummy1; +----------+------------+ | same_col | diff_col_1 | +----------+------------+ | 1 | A | | 2 | B | | 3 | C | +----------+------------+ mysql> select * from dummy2; +----------+------------+ | same_col | diff_col_2 | +----------+------------+ | 1 | X | | 3 | Y | +----------+------------+ A simple select join will look like the one below. mysql> select * from dummy1 d1 left join dummy2 d2 on d1.same_col=d2.same_col; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 1 | X | | 3 | C | 3 | Y | | 2 | B | NULL | NULL | +----------+------------+----------+------------+ 3 rows in set (0.00 sec) Which can also be written as mysql> select * from dummy1 d1 left join dummy2 d2 using(same_col); +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) And also as mysql> select * from dummy1 d1 natural left join dummy2 d2; +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) Cross Join This does a cross product of table A and table B without any condition. It doesn\u2019t have a lot of applications in the real world. A Simple Cross Join looks like this mysql> select * from dummy1 cross join dummy2; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 3 | Y | | 1 | A | 1 | X | | 2 | B | 3 | Y | | 2 | B | 1 | X | | 3 | C | 3 | Y | | 3 | C | 1 | X | +----------+------------+----------+------------+ 6 rows in set (0.01 sec) One use case that can come in handy is when you have to fill in some missing entries. 
For example, all the entries from dummy1 must be inserted into a similar table dummy3, and each record must have 3 entries with statuses 1, 5 and 7. mysql> desc dummy3; +----------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+----------+------+-----+---------+-------+ | same_col | int | YES | | NULL | | | value | char(15) | YES | | NULL | | | status | smallint | YES | | NULL | | +----------+----------+------+-----+---------+-------+ 3 rows in set (0.02 sec) Either you create an insert query script with as many entries as in dummy1 or use cross join to produce the required resultset. mysql> select * from dummy1 cross join (select 1 union select 5 union select 7) T2 order by same_col; +----------+------------+---+ | same_col | diff_col_1 | 1 | +----------+------------+---+ | 1 | A | 1 | | 1 | A | 5 | | 1 | A | 7 | | 2 | B | 1 | | 2 | B | 5 | | 2 | B | 7 | | 3 | C | 1 | | 3 | C | 5 | | 3 | C | 7 | +----------+------------+---+ 9 rows in set (0.00 sec) The T2 section in the above query is called a sub-query . We will discuss the same in the next section. Natural Join This implicitly selects the common column(s) from table A and table B and performs an inner join. mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Notice how natural join and using take care that the common column is displayed only once if you are not explicitly selecting columns for the query. 
Some More Examples Display emp_no, salary, title and dept of the employees where salary > 80000 mysql> select e.emp_no, s.salary, t.title, d.dept_no from employees e join salaries s using (emp_no) join titles t using (emp_no) join dept_emp d using (emp_no) where s.salary > 80000 limit 5; +--------+--------+--------------+---------+ | emp_no | salary | title | dept_no | +--------+--------+--------------+---------+ | 10017 | 82163 | Senior Staff | d001 | | 10017 | 86157 | Senior Staff | d001 | | 10017 | 89619 | Senior Staff | d001 | | 10017 | 91985 | Senior Staff | d001 | | 10017 | 96122 | Senior Staff | d001 | +--------+--------+--------------+---------+ 5 rows in set (0.00 sec) Display title-wise count of employees in each department order by dept_no mysql> select d.dept_no, t.title, count(*) from titles t left join dept_emp d using (emp_no) group by d.dept_no, t.title order by d.dept_no limit 10; +---------+--------------------+----------+ | dept_no | title | count(*) | +---------+--------------------+----------+ | d001 | Manager | 2 | | d001 | Senior Staff | 13940 | | d001 | Staff | 16196 | | d002 | Manager | 2 | | d002 | Senior Staff | 12139 | | d002 | Staff | 13929 | | d003 | Manager | 2 | | d003 | Senior Staff | 12274 | | d003 | Staff | 14342 | | d004 | Assistant Engineer | 6445 | +---------+--------------------+----------+ 10 rows in set (1.32 sec) SELECT - Subquery A subquery is generally a smaller resultset that can be used to power a select query in many ways. It can be used in a \u2018where\u2019 condition, can be used in place of join mostly where a join could be an overkill. These subqueries are also termed as derived tables. They must have a table alias in the select query. Let\u2019s look at some examples of subqueries. Here we got the department name from the departments table by a subquery which used dept_no from dept_emp table. 
mysql> select e.emp_no, (select dept_name from departments where dept_no=d.dept_no) dept_name from employees e join dept_emp d using (emp_no) limit 5; +--------+-----------------+ | emp_no | dept_name | +--------+-----------------+ | 10001 | Development | | 10002 | Sales | | 10003 | Production | | 10004 | Production | | 10005 | Human Resources | +--------+-----------------+ 5 rows in set (0.01 sec) Here, we use the \u2018avg\u2019 query (which gets the average salary) as a subquery to list the employees whose highest salary is more than the average. mysql> select avg(salary) from salaries; +-------------+ | avg(salary) | +-------------+ | 63810.7448 | +-------------+ 1 row in set (0.80 sec) mysql> select e.emp_no, max(s.salary) from employees e natural join salaries s group by e.emp_no having max(s.salary) > (select avg(salary) from salaries) limit 10; +--------+---------------+ | emp_no | max(s.salary) | +--------+---------------+ | 10001 | 88958 | | 10002 | 72527 | | 10004 | 74057 | | 10005 | 94692 | | 10007 | 88070 | | 10009 | 94443 | | 10010 | 80324 | | 10013 | 68901 | | 10016 | 77935 | | 10017 | 99651 | +--------+---------------+ 10 rows in set (0.56 sec)","title":"Select Query"},{"location":"level101/databases_sql/select_query/#select-query","text":"The most commonly used command while working with MySQL is SELECT. It is used to fetch the result set from one or more tables. The general form of a typical select query looks like:- SELECT expr FROM table1 [WHERE condition] [GROUP BY column_list HAVING condition] [ORDER BY column_list ASC|DESC] [LIMIT #] The above general form contains some commonly used clauses of a SELECT query:- expr - comma-separated column list or * (for all columns) WHERE - a condition is provided, if true, directs the query to select only those records. GROUP BY - groups the entire result set based on the column list provided. An aggregate function is recommended to be present in the select expression of the query. 
HAVING supports grouping by putting a condition on the selected or any other aggregate function. ORDER BY - sorts the result set based on the column list in ascending or descending order. LIMIT - commonly used to limit the number of records. Let\u2019s have a look at some examples for a better understanding of the above. The dataset used for the examples below is available here and is free to use. Select all records mysql> select * from employees limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 | | 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 | | 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 | | 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 | | 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.00 sec) Select specific fields for all records mysql> select first_name, last_name, gender from employees limit 5; +------------+-----------+--------+ | first_name | last_name | gender | +------------+-----------+--------+ | Georgi | Facello | M | | Bezalel | Simmel | F | | Parto | Bamford | M | | Chirstian | Koblick | M | | Kyoichi | Maliniak | M | +------------+-----------+--------+ 5 rows in set (0.00 sec) Select all records Where hire_date >= January 1, 1990 mysql> select * from employees where hire_date >= '1990-01-01' limit 5; +--------+------------+------------+-------------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-------------+--------+------------+ | 10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15 | | 10011 | 1953-11-07 | Mary | Sluis | F | 1990-01-22 | | 10012 | 1960-10-04 | Patricio | Bridgland | M | 1992-12-18 | | 10016 
| 1961-05-02 | Kazuhito | Cappelletti | M | 1995-01-27 | | 10017 | 1958-07-06 | Cristinel | Bouloucos | F | 1993-08-03 | +--------+------------+------------+-------------+--------+------------+ 5 rows in set (0.01 sec) Select first_name and last_name from all records Where birth_date >= 1960 AND gender = \u2018F\u2019 mysql> select first_name, last_name from employees where year(birth_date) >= 1960 and gender='F' limit 5; +------------+-----------+ | first_name | last_name | +------------+-----------+ | Bezalel | Simmel | | Duangkaew | Piveteau | | Divier | Reistad | | Jeong | Reistad | | Mingsen | Casley | +------------+-----------+ 5 rows in set (0.00 sec) Display the total number of records mysql> select count(*) from employees; +----------+ | count(*) | +----------+ | 300024 | +----------+ 1 row in set (0.05 sec) Display gender-wise count of all records mysql> select gender, count(*) from employees group by gender; +--------+----------+ | gender | count(*) | +--------+----------+ | M | 179973 | | F | 120051 | +--------+----------+ 2 rows in set (0.14 sec) Display the year of hire_date and number of employees hired that year, also only those years where more than 20k employees were hired mysql> select year(hire_date), count(*) from employees group by year(hire_date) having count(*) > 20000; +-----------------+----------+ | year(hire_date) | count(*) | +-----------------+----------+ | 1985 | 35316 | | 1986 | 36150 | | 1987 | 33501 | | 1988 | 31436 | | 1989 | 28394 | | 1990 | 25610 | | 1991 | 22568 | | 1992 | 20402 | +-----------------+----------+ 8 rows in set (0.14 sec) Display all records ordered by their hire_date in descending order. 
If hire_date is the same, then order by birth_date in ascending order mysql> select * from employees order by hire_date desc, birth_date asc limit 5; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 463807 | 1964-06-12 | Bikash | Covnot | M | 2000-01-28 | | 428377 | 1957-05-09 | Yucai | Gerlach | M | 2000-01-23 | | 499553 | 1954-05-06 | Hideyuki | Delgrande | F | 2000-01-22 | | 222965 | 1959-08-07 | Volkmar | Perko | F | 2000-01-13 | | 47291 | 1960-09-09 | Ulf | Flexer | M | 2000-01-12 | +--------+------------+------------+-----------+--------+------------+ 5 rows in set (0.12 sec)","title":"SELECT Query"},{"location":"level101/databases_sql/select_query/#select-joins","text":"The JOIN statement is used to produce a combined result set from two or more tables based on certain conditions. It can also be used with UPDATE and DELETE statements, but we will focus on the select query. The following is a basic general form for joins SELECT table1.col1, table2.col1, ... (any combination) FROM table1 join_type table2 ON (or USING, depending on join_type) table1.column_for_joining = table2.column_for_joining WHERE \u2026 Any number of columns can be selected, but it is recommended to select only those which are relevant to improve the readability of the resultset. All other clauses like where, group by are not mandatory. Let\u2019s discuss the types of JOINs supported by MySQL Syntax. Inner Join This joins table A with table B on a condition. Only the records where the condition is True are selected in the resultset. 
Display some details of employees along with their salary mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s on e.emp_no=s.emp_no limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Similar result can be achieved by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e join salaries s using (emp_no) limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) And also by mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Outer Join Majorly of two types:- - LEFT - joining complete table A with table B on a condition. All the records from table A are selected, but from table B, only those records are selected where the condition is True. - RIGHT - Exact opposite of the left join. Let us assume the below tables for understanding left join better. 
mysql> select * from dummy1; +----------+------------+ | same_col | diff_col_1 | +----------+------------+ | 1 | A | | 2 | B | | 3 | C | +----------+------------+ mysql> select * from dummy2; +----------+------------+ | same_col | diff_col_2 | +----------+------------+ | 1 | X | | 3 | Y | +----------+------------+ A simple select join will look like the one below. mysql> select * from dummy1 d1 left join dummy2 d2 on d1.same_col=d2.same_col; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 1 | X | | 3 | C | 3 | Y | | 2 | B | NULL | NULL | +----------+------------+----------+------------+ 3 rows in set (0.00 sec) Which can also be written as mysql> select * from dummy1 d1 left join dummy2 d2 using(same_col); +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) And also as mysql> select * from dummy1 d1 natural left join dummy2 d2; +----------+------------+------------+ | same_col | diff_col_1 | diff_col_2 | +----------+------------+------------+ | 1 | A | X | | 3 | C | Y | | 2 | B | NULL | +----------+------------+------------+ 3 rows in set (0.00 sec) Cross Join This does a cross product of table A and table B without any condition. It doesn\u2019t have a lot of applications in the real world. A Simple Cross Join looks like this mysql> select * from dummy1 cross join dummy2; +----------+------------+----------+------------+ | same_col | diff_col_1 | same_col | diff_col_2 | +----------+------------+----------+------------+ | 1 | A | 3 | Y | | 1 | A | 1 | X | | 2 | B | 3 | Y | | 2 | B | 1 | X | | 3 | C | 3 | Y | | 3 | C | 1 | X | +----------+------------+----------+------------+ 6 rows in set (0.01 sec) One use case that can come in handy is when you have to fill in some missing entries. 
For example, all the entries from dummy1 must be inserted into a similar table dummy3, and each record must have 3 entries with statuses 1, 5 and 7. mysql> desc dummy3; +----------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+----------+------+-----+---------+-------+ | same_col | int | YES | | NULL | | | value | char(15) | YES | | NULL | | | status | smallint | YES | | NULL | | +----------+----------+------+-----+---------+-------+ 3 rows in set (0.02 sec) Either you create an insert query script with as many entries as in dummy1 or use cross join to produce the required resultset. mysql> select * from dummy1 cross join (select 1 union select 5 union select 7) T2 order by same_col; +----------+------------+---+ | same_col | diff_col_1 | 1 | +----------+------------+---+ | 1 | A | 1 | | 1 | A | 5 | | 1 | A | 7 | | 2 | B | 1 | | 2 | B | 5 | | 2 | B | 7 | | 3 | C | 1 | | 3 | C | 5 | | 3 | C | 7 | +----------+------------+---+ 9 rows in set (0.00 sec) The T2 section in the above query is called a sub-query . We will discuss the same in the next section. Natural Join This implicitly selects the common column(s) from table A and table B and performs an inner join. mysql> select e.emp_no,e.first_name,e.last_name,s.salary from employees e natural join salaries s limit 5; +--------+------------+-----------+--------+ | emp_no | first_name | last_name | salary | +--------+------------+-----------+--------+ | 10001 | Georgi | Facello | 60117 | | 10001 | Georgi | Facello | 62102 | | 10001 | Georgi | Facello | 66074 | | 10001 | Georgi | Facello | 66596 | | 10001 | Georgi | Facello | 66961 | +--------+------------+-----------+--------+ 5 rows in set (0.00 sec) Notice how natural join and using take care that the common column is displayed only once if you are not explicitly selecting columns for the query. 
Some More Examples Display emp_no, salary, title and dept of the employees where salary > 80000 mysql> select e.emp_no, s.salary, t.title, d.dept_no from employees e join salaries s using (emp_no) join titles t using (emp_no) join dept_emp d using (emp_no) where s.salary > 80000 limit 5; +--------+--------+--------------+---------+ | emp_no | salary | title | dept_no | +--------+--------+--------------+---------+ | 10017 | 82163 | Senior Staff | d001 | | 10017 | 86157 | Senior Staff | d001 | | 10017 | 89619 | Senior Staff | d001 | | 10017 | 91985 | Senior Staff | d001 | | 10017 | 96122 | Senior Staff | d001 | +--------+--------+--------------+---------+ 5 rows in set (0.00 sec) Display title-wise count of employees in each department order by dept_no mysql> select d.dept_no, t.title, count(*) from titles t left join dept_emp d using (emp_no) group by d.dept_no, t.title order by d.dept_no limit 10; +---------+--------------------+----------+ | dept_no | title | count(*) | +---------+--------------------+----------+ | d001 | Manager | 2 | | d001 | Senior Staff | 13940 | | d001 | Staff | 16196 | | d002 | Manager | 2 | | d002 | Senior Staff | 12139 | | d002 | Staff | 13929 | | d003 | Manager | 2 | | d003 | Senior Staff | 12274 | | d003 | Staff | 14342 | | d004 | Assistant Engineer | 6445 | +---------+--------------------+----------+ 10 rows in set (1.32 sec)","title":"SELECT - JOINS"},{"location":"level101/databases_sql/select_query/#select-subquery","text":"A subquery is generally a smaller resultset that can be used to power a select query in many ways. It can be used in a \u2018where\u2019 condition, can be used in place of join mostly where a join could be an overkill. These subqueries are also termed as derived tables. They must have a table alias in the select query. Let\u2019s look at some examples of subqueries. Here we got the department name from the departments table by a subquery which used dept_no from dept_emp table. 
mysql> select e.emp_no, (select dept_name from departments where dept_no=d.dept_no) dept_name from employees e join dept_emp d using (emp_no) limit 5; +--------+-----------------+ | emp_no | dept_name | +--------+-----------------+ | 10001 | Development | | 10002 | Sales | | 10003 | Production | | 10004 | Production | | 10005 | Human Resources | +--------+-----------------+ 5 rows in set (0.01 sec) Here, we use the \u2018avg\u2019 query (which gets the average salary) as a subquery to list the employees whose highest salary is more than the average. mysql> select avg(salary) from salaries; +-------------+ | avg(salary) | +-------------+ | 63810.7448 | +-------------+ 1 row in set (0.80 sec) mysql> select e.emp_no, max(s.salary) from employees e natural join salaries s group by e.emp_no having max(s.salary) > (select avg(salary) from salaries) limit 10; +--------+---------------+ | emp_no | max(s.salary) | +--------+---------------+ | 10001 | 88958 | | 10002 | 72527 | | 10004 | 74057 | | 10005 | 94692 | | 10007 | 88070 | | 10009 | 94443 | | 10010 | 80324 | | 10013 | 68901 | | 10016 | 77935 | | 10017 | 99651 | +--------+---------------+ 10 rows in set (0.56 sec)","title":"SELECT - Subquery"},{"location":"level101/git/branches/","text":"Working With Branches Coming back to our local repo which has two commits. So far, what we have is a single line of history. Commits are chained in a single line. But sometimes you may have a need to work on two different features in parallel in the same repo. Now one option here could be making a new folder/repo with the same code and use that for another feature development. But there's a better way. Use branches. Since git follows tree like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created and branches can also be merged. Using branches, there can exist multiple lines of histories and we can checkout to any of them and work on it. 
Checking out, as we discussed earlier, would simply mean replacing the contents of the directory (repo) with the snapshot at the checked out version. Let's create a branch and see what it looks like: $ git branch b1 $ git log --oneline --graph * 7f3b00e (HEAD -> master, b1) adding file 2 * df2fb7a adding file 1 We created a branch called b1 . Git log tells us that b1 also points to the last commit (7f3b00e) but the HEAD is still pointing to master. If you remember, HEAD points to the commit/reference that you have checked out. So if we check out b1 , HEAD should point to that. Let's confirm: $ git checkout b1 Switched to branch 'b1' $ git log --oneline --graph * 7f3b00e (HEAD -> b1, master) adding file 2 * df2fb7a adding file 1 b1 still points to the same commit but HEAD now points to b1 . Since we created a branch at commit 7f3b00e , there will be two lines of history starting from this commit. Depending on which branch you are checked out on, the corresponding line of history will progress. At this moment, we are checked out on branch b1 , so making a new commit will advance branch reference b1 to that commit and the current b1 commit will become its parent. Let's do that. # Creating a file and making a commit $ echo \"I am a file in b1 branch\" > b1.txt $ git add b1.txt $ git commit -m \"adding b1 file\" [b1 872a38f] adding b1 file 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The new line of history $ git log --oneline --graph * 872a38f (HEAD -> b1) adding b1 file * 7f3b00e (master) adding file 2 * df2fb7a adding file 1 $ Do note that master is still pointing to the old commit it was pointing to. We can now check out the master branch and make commits there. This will result in another line of history starting from commit 7f3b00e. 
# checkout to master branch $ git checkout master Switched to branch 'master' # Creating a new commit on master branch $ echo \"new file in master branch\" > master.txt $ git add master.txt $ git commit -m \"adding master.txt file\" [master 60dc441] adding master.txt file 1 file changed, 1 insertion(+) create mode 100644 master.txt # The history line $ git log --oneline --graph * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Notice how branch b1 is not visible here since we are on the master. Let's try to visualize both to get the whole picture: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Above tree structure should make things clear. Notice a clear branch/fork on commit 7f3b00e. This is how we create branches. Now they both are two separate lines of history on which feature development can be done independently. To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references. Git accordingly modifies contents of our repo. Merges Now say the feature you were working on branch b1 is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from b1 into master. There could be two ways this can be done. Here is the current history: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 1: Directly merge the branch. Merging the branch b1 into master will result in a new merge commit. 
This will merge changes from two different lines of history and create a new commit of the result. $ git merge b1 Merge made by the 'recursive' strategy. b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt $ git log --oneline --graph --all * 8fc28f9 (HEAD -> master) Merge branch 'b1' |\\ | * 872a38f (b1) adding b1 file * | 60dc441 adding master.txt file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see a new merge commit created (8fc28f9). You will be prompted for the commit message. If there are a lot of branches in the repo, the history will end up with a lot of merge commits, which looks cluttered compared to a single line of development history. So let's look at an alternative approach. First, let's reset our last merge and go to the previous state. $ git reset --hard 60dc441 HEAD is now at 60dc441 adding master.txt file $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 2: Rebase. Now, instead of merging two branches which have a common base (commit: 7f3b00e), let us rebase branch b1 on to current master. What this means is take the commits on branch b1 (from commit 7f3b00e to commit 872a38f) and rebase them (put them on top of) master (60dc441). # Switch to b1 $ git checkout b1 Switched to branch 'b1' # Rebase (b1 which is current branch) on master $ git rebase master First, rewinding head to replay your work on top of it... Applying: adding b1 file # The result $ git log --oneline --graph --all * 5372c8f (HEAD -> b1) adding b1 file * 60dc441 (master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see that b1 had 1 commit. That commit's parent was 7f3b00e . But since we rebased it on master ( 60dc441 ), that becomes the parent now. As a side effect, you also see it has become a single line of history. 
Now if we were to merge b1 into master , it would simply mean change master to point to 5372c8f which is b1 . Let's try it: # checkout to master since we want to merge code into master $ git checkout master Switched to branch 'master' # the current history, where b1 is based on master $ git log --oneline --graph --all * 5372c8f (b1) adding b1 file * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 # Performing the merge, notice the \"fast-forward\" message $ git merge b1 Updating 60dc441..5372c8f Fast-forward b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The Result $ git log --oneline --graph --all * 5372c8f (HEAD -> master, b1) adding b1 file * 60dc441 adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Now you see both b1 and master are pointing to the same commit. Your code has been merged to the master branch and it can be pushed. Also we have clean line of history! :D","title":"Working With Branches"},{"location":"level101/git/branches/#working-with-branches","text":"Coming back to our local repo which has two commits. So far, what we have is a single line of history. Commits are chained in a single line. But sometimes you may have a need to work on two different features in parallel in the same repo. Now one option here could be making a new folder/repo with the same code and use that for another feature development. But there's a better way. Use branches. Since git follows tree like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created and branches can also be merged. Using branches, there can exist multiple lines of histories and we can checkout to any of them and work on it. Checking out, as we discussed earlier, would simply mean replacing contents of the directory (repo) with the snapshot at the checked out version. 
Let's create a branch and see what it looks like: $ git branch b1 $ git log --oneline --graph * 7f3b00e (HEAD -> master, b1) adding file 2 * df2fb7a adding file 1 We created a branch called b1 . Git log tells us that b1 also points to the last commit (7f3b00e) but the HEAD is still pointing to master. If you remember, HEAD points to the commit/reference that you have checked out. So if we check out b1 , HEAD should point to that. Let's confirm: $ git checkout b1 Switched to branch 'b1' $ git log --oneline --graph * 7f3b00e (HEAD -> b1, master) adding file 2 * df2fb7a adding file 1 b1 still points to the same commit but HEAD now points to b1 . Since we created a branch at commit 7f3b00e , there will be two lines of history starting from this commit. Depending on which branch you are checked out on, the corresponding line of history will progress. At this moment, we are checked out on branch b1 , so making a new commit will advance branch reference b1 to that commit and the current b1 commit will become its parent. Let's do that. # Creating a file and making a commit $ echo \"I am a file in b1 branch\" > b1.txt $ git add b1.txt $ git commit -m \"adding b1 file\" [b1 872a38f] adding b1 file 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The new line of history $ git log --oneline --graph * 872a38f (HEAD -> b1) adding b1 file * 7f3b00e (master) adding file 2 * df2fb7a adding file 1 $ Do note that master is still pointing to the old commit it was pointing to. We can now check out the master branch and make commits there. This will result in another line of history starting from commit 7f3b00e. 
# checkout to master branch $ git checkout master Switched to branch 'master' # Creating a new commit on master branch $ echo \"new file in master branch\" > master.txt $ git add master.txt $ git commit -m \"adding master.txt file\" [master 60dc441] adding master.txt file 1 file changed, 1 insertion(+) create mode 100644 master.txt # The history line $ git log --oneline --graph * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Notice how branch b1 is not visible here since we are on the master. Let's try to visualize both to get the whole picture: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Above tree structure should make things clear. Notice a clear branch/fork on commit 7f3b00e. This is how we create branches. Now they both are two separate lines of history on which feature development can be done independently. To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references. Git accordingly modifies contents of our repo.","title":"Working With Branches"},{"location":"level101/git/branches/#merges","text":"Now say the feature you were working on branch b1 is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from b1 into master. There could be two ways this can be done. Here is the current history: $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 1: Directly merge the branch. Merging the branch b1 into master will result in a new merge commit. 
This will merge changes from two different lines of history and create a new commit of the result. $ git merge b1 Merge made by the 'recursive' strategy. b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt $ git log --oneline --graph --all * 8fc28f9 (HEAD -> master) Merge branch 'b1' |\\ | * 872a38f (b1) adding b1 file * | 60dc441 adding master.txt file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see a new merge commit created (8fc28f9). You will be prompted for the commit message. If there are a lot of branches in the repo, this approach will end up producing a lot of merge commits, which looks ugly compared to a single line of development history. So let's look at an alternative approach. First let's reset our last merge and go to the previous state. $ git reset --hard 60dc441 HEAD is now at 60dc441 adding master.txt file $ git log --oneline --graph --all * 60dc441 (HEAD -> master) adding master.txt file | * 872a38f (b1) adding b1 file |/ * 7f3b00e adding file 2 * df2fb7a adding file 1 Option 2: Rebase. Now, instead of merging two branches which have a common base (commit: 7f3b00e), let us rebase branch b1 onto the current master. What this means is take branch b1 (from commit 7f3b00e to commit 872a38f) and rebase (put them on top of) master (60dc441). # Switch to b1 $ git checkout b1 Switched to branch 'b1' # Rebase (b1 which is current branch) on master $ git rebase master First, rewinding head to replay your work on top of it... Applying: adding b1 file # The result $ git log --oneline --graph --all * 5372c8f (HEAD -> b1) adding b1 file * 60dc441 (master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 You can see b1 , which had 1 commit. That commit's parent was 7f3b00e . But since we rebased it on master ( 60dc441 ), that becomes the parent now. Note that the rebased commit also gets a new ID (5372c8f instead of 872a38f) because its parent changed. As a side effect, you also see it has become a single line of history. 
Now if we were to merge b1 into master , it would simply mean changing master to point to 5372c8f , which is where b1 points. Let's try it: # checkout to master since we want to merge code into master $ git checkout master Switched to branch 'master' # the current history, where b1 is based on master $ git log --oneline --graph --all * 5372c8f (b1) adding b1 file * 60dc441 (HEAD -> master) adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 # Performing the merge, notice the \"fast-forward\" message $ git merge b1 Updating 60dc441..5372c8f Fast-forward b1.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 b1.txt # The Result $ git log --oneline --graph --all * 5372c8f (HEAD -> master, b1) adding b1 file * 60dc441 adding master.txt file * 7f3b00e adding file 2 * df2fb7a adding file 1 Now you see both b1 and master are pointing to the same commit. Your code has been merged to the master branch and it can be pushed. Also we have a clean line of history! :D","title":"Merges"},{"location":"level101/git/conclusion/","text":"What next from here? There are a lot of git commands and features which we have not explored here. But with the base built up, be sure to explore concepts like Cherrypick Squash Amend Stash Reset","title":"Conclusion"},{"location":"level101/git/conclusion/#what-next-from-here","text":"There are a lot of git commands and features which we have not explored here. 
But with the base built up, be sure to explore concepts like Cherrypick Squash Amend Stash Reset","title":"What next from here?"},{"location":"level101/git/git-basics/","text":"Git Prerequisites Have Git installed https://git-scm.com/downloads Have taken any high-level git tutorial or the following LinkedIn Learning courses https://www.linkedin.com/learning/git-essential-training-the-basics/ https://www.linkedin.com/learning/git-branches-merges-and-remotes/ The Official Git Docs What to expect from this course As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today like SVN, Mercurial, etc, Git perhaps is the most used one and in this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently! What is not covered under this course Advanced usage and specifics of internal implementation details of Git. Course Contents Git Basics Working with Branches Git with Github Hooks Git Basics Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase. Creating a Git Repo Any folder can be converted into a git repository. After executing the following command, we will see a .git folder within the folder, which makes our folder a git repository. All the magic that git does is enabled by the .git folder. 
# creating an empty folder and changing current dir to it $ cd /tmp $ mkdir school-of-sre $ cd school-of-sre/ # initialize a git repo $ git init Initialized empty Git repository in /private/tmp/school-of-sre/.git/ As the output says, an empty git repo has been initialized in our folder. Let's take a look at what is there. $ ls .git/ HEAD config description hooks info objects refs There are a bunch of folders and files in the .git folder. As I said, all of these enable git to do its magic. We will look into some of these folders and files. But for now, what we have is an empty git repository. Tracking a File Now, as you might already know how to do, let us create a new file in our repo (we will refer to the folder as the repo from now on) and check git status $ echo \"I am file 1\" > file1.txt $ git status On branch master No commits yet Untracked files: (use \"git add ...\" to include in what will be committed) file1.txt nothing added to commit but untracked files present (use \"git add\" to track) The current git status says No commits yet and there is one untracked file. Since we just created the file, git is not tracking that file. We explicitly need to ask git to track files and folders. (also check out gitignore ) We do that via the git add command, as suggested in the above output. Then we go ahead and create a commit. $ git add file1.txt $ git status On branch master No commits yet Changes to be committed: (use \"git rm --cached ...\" to unstage) new file: file1.txt $ git commit -m \"adding file 1\" [master (root-commit) df2fb7a] adding file 1 1 file changed, 1 insertion(+) create mode 100644 file1.txt Notice how after adding the file, git status says Changes to be committed: . What it means is that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with a message attached via -m . More About a Commit A commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of the repo (the folder) is taken and saved. 
Each commit has a unique ID. ( df2fb7a for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the .git folder. This is where all these snapshots or versions are stored in an efficient manner. Adding More Changes Let us create one more file and commit the change. It would look the same as the previous commit we made. $ echo \"I am file 2\" > file2.txt $ git add file2.txt $ git commit -m \"adding file 2\" [master 7f3b00e] adding file 2 1 file changed, 1 insertion(+) create mode 100644 file2.txt A new commit with ID 7f3b00e has been created. You can issue git status at any time to see the state of the repository. **IMPORTANT: Note that commit IDs are long strings (SHA hashes) but we can also refer to a commit by a unique prefix of its ID (such as the 7-character IDs shown here). We will be using shorter and longer commit IDs interchangeably.** Now that we have two commits, let's visualize them: $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 git log , as the name suggests, prints the log of all the git commits. Here you see two additional arguments, --oneline prints the shorter version of the log, ie: the commit message only and not the person who made the commit and when. --graph prints it in graph format. Now at this moment the commits might look like just one in each line but all commits are stored as a tree-like data structure internally by git. That means there can be two or more children commits of a given commit, and not just a single line of commits. We will look more into this part when we get to the Branches section. For now this is our commit history: df2fb7a ===> 7f3b00e Are commits really linked? As I just said, the two commits we just made are linked via a tree-like data structure and we saw how they are linked. But let's actually verify it. Everything in git is an object. 
Newly created files are stored as objects. Changes to files are stored as objects and even commits are objects. To view the contents of an object we can use the following command with the object's ID. We will take a look at the contents of the second commit $ git cat-file -p 7f3b00e tree ebf3af44d253e5328340026e45a9fa9ae3ea1982 parent df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a author Sanket Patel 1603273316 -0700 committer Sanket Patel 1603273316 -0700 adding file 2 Take note of the parent attribute in the above output. It points to the commit ID of the first commit we made. So this proves that they are linked! Additionally you can see the second commit's message in this object. As I said, all this magic is enabled by the .git folder and the object we are looking at is also in that folder. $ ls .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 It is stored in .git/objects/ folder. All the files and changes to them as well are stored in this folder. The Version Control part of Git We already can see two commits (versions) in our git log. One thing a version control tool gives you is the ability to browse back and forth in history. For example: some of your users are running an old version of code and they are reporting an issue. In order to debug the issue, you need access to the old code. The one in your current repo is the latest code. In this example, you are working on the second commit (7f3b00e) and someone reported an issue with the code snapshot at commit (df2fb7a). This is how you would get access to the code at any older commit: # Current contents, two files present $ ls file1.txt file2.txt # checking out to (an older) commit $ git checkout df2fb7a Note: checking out 'df2fb7a'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout. 
If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example: git checkout -b HEAD is now at df2fb7a adding file 1 # checking contents, can verify it has old contents $ ls file1.txt So this is how we would get access to old versions/snapshots. All we need is a reference to that snapshot. Upon executing git checkout ... , what git does for you is use the .git folder, see what was the state of things (files and folders) at that version/reference and replace the contents of current directory with those contents. The then-existing content will no longer be present in the local dir (repo) but we can and will still get access to them because they are tracked via git commit and .git folder has them stored/tracked. Reference I mentioned in the previous section that we need a reference to the version. By default, a git repo is made of a tree of commits, and each commit has a unique ID. But the unique ID is not the only thing we can reference commits via. There are multiple ways to reference commits. For example: HEAD is a reference to the current commit. Whatever commit your repo is checked out at, HEAD will point to that. HEAD~1 is a reference to the previous commit. So while checking out the previous version in the section above, we could have done git checkout HEAD~1 . Similarly, master is also a reference (to a branch). Since git uses a tree-like structure to store commits, there of course will be branches. And the default branch is called master . Master (or any branch reference) will point to the latest commit in the branch. Even though we have checked out the previous commit in our repo, master still points to the latest commit. 
And we can get back to the latest version by checking out the master reference $ git checkout master Previous HEAD position was df2fb7a adding file 1 Switched to branch 'master' # now we will see latest code, with two files $ ls file1.txt file2.txt Note, instead of master in the above command, we could have used the commit's ID as well. References and The Magic Let's look at the state of things. Two commits, master and HEAD references are pointing to the latest commit $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 The magic? Let's examine these files: $ cat .git/refs/heads/master 7f3b00eaa957815884198e2fdfec29361108d6a9 Voilà! Where master is pointing to is stored in a file. Whenever git needs to know where the master reference is pointing to, or if git needs to update where master points, it just needs to read or update the file above. So when you create a new commit, a new commit is created on top of the current commit and the master file is updated with the new commit's ID. Similarly, for the HEAD reference: $ cat .git/HEAD ref: refs/heads/master We can see HEAD is pointing to a reference called refs/heads/master . So HEAD will point wherever master points. Little Adventure We discussed how git will update the files as we execute commands. But let's try to do it ourselves, by hand, and see what happens. $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 Now let's change master to point to the previous/first commit. $ echo df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a > .git/refs/heads/master $ git log --oneline --graph * df2fb7a (HEAD -> master) adding file 1 # RESETTING TO ORIGINAL $ echo 7f3b00eaa957815884198e2fdfec29361108d6a9 > .git/refs/heads/master $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 We just edited the master reference file and now we can see only the first commit in git log. Undoing the change to the file brings the state back to original. 
Not so much magic, is it?","title":"Git Basics"},{"location":"level101/git/git-basics/#git","text":"","title":"Git"},{"location":"level101/git/git-basics/#prerequisites","text":"Have Git installed https://git-scm.com/downloads Have taken any high-level git tutorial or the following LinkedIn Learning courses https://www.linkedin.com/learning/git-essential-training-the-basics/ https://www.linkedin.com/learning/git-branches-merges-and-remotes/ The Official Git Docs","title":"Prerequisites"},{"location":"level101/git/git-basics/#what-to-expect-from-this-course","text":"As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today like SVN, Mercurial, etc, Git perhaps is the most used one and in this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!","title":"What to expect from this course"},{"location":"level101/git/git-basics/#what-is-not-covered-under-this-course","text":"Advanced usage and specifics of internal implementation details of Git.","title":"What is not covered under this course"},{"location":"level101/git/git-basics/#course-contents","text":"Git Basics Working with Branches Git with Github Hooks","title":"Course Contents"},{"location":"level101/git/git-basics/#git-basics","text":"Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. 
Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase.","title":"Git Basics"},{"location":"level101/git/git-basics/#creating-a-git-repo","text":"Any folder can be converted into a git repository. After executing the following command, we will see a .git folder within the folder, which makes our folder a git repository. All the magic that git does is enabled by the .git folder. # creating an empty folder and changing current dir to it $ cd /tmp $ mkdir school-of-sre $ cd school-of-sre/ # initialize a git repo $ git init Initialized empty Git repository in /private/tmp/school-of-sre/.git/ As the output says, an empty git repo has been initialized in our folder. Let's take a look at what is there. $ ls .git/ HEAD config description hooks info objects refs There are a bunch of folders and files in the .git folder. As I said, all of these enable git to do its magic. We will look into some of these folders and files. But for now, what we have is an empty git repository.","title":"Creating a Git Repo"},{"location":"level101/git/git-basics/#tracking-a-file","text":"Now, as you might already know how to do, let us create a new file in our repo (we will refer to the folder as the repo from now on) and check git status $ echo \"I am file 1\" > file1.txt $ git status On branch master No commits yet Untracked files: (use \"git add ...\" to include in what will be committed) file1.txt nothing added to commit but untracked files present (use \"git add\" to track) The current git status says No commits yet and there is one untracked file. Since we just created the file, git is not tracking that file. We explicitly need to ask git to track files and folders. (also check out gitignore ) We do that via the git add command, as suggested in the above output. Then we go ahead and create a commit. 
$ git add file1.txt $ git status On branch master No commits yet Changes to be committed: (use \"git rm --cached ...\" to unstage) new file: file1.txt $ git commit -m \"adding file 1\" [master (root-commit) df2fb7a] adding file 1 1 file changed, 1 insertion(+) create mode 100644 file1.txt Notice how after adding the file, git status says Changes to be committed: . What it means is that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with a message attached via -m .","title":"Tracking a File"},{"location":"level101/git/git-basics/#more-about-a-commit","text":"A commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of the repo (the folder) is taken and saved. Each commit has a unique ID. ( df2fb7a for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the .git folder. This is where all these snapshots or versions are stored in an efficient manner.","title":"More About a Commit"},{"location":"level101/git/git-basics/#adding-more-changes","text":"Let us create one more file and commit the change. It would look the same as the previous commit we made. $ echo \"I am file 2\" > file2.txt $ git add file2.txt $ git commit -m \"adding file 2\" [master 7f3b00e] adding file 2 1 file changed, 1 insertion(+) create mode 100644 file2.txt A new commit with ID 7f3b00e has been created. You can issue git status at any time to see the state of the repository. **IMPORTANT: Note that commit IDs are long strings (SHA hashes) but we can also refer to a commit by a unique prefix of its ID (such as the 7-character IDs shown here). We will be using shorter and longer commit IDs interchangeably.** Now that we have two commits, let's visualize them: $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 git log , as the name suggests, prints the log of all the git commits. 
Here you see two additional arguments, --oneline prints the shorter version of the log, ie: the commit message only and not the person who made the commit and when. --graph prints it in graph format. Now at this moment the commits might look like just one in each line but all commits are stored as a tree-like data structure internally by git. That means there can be two or more children commits of a given commit, and not just a single line of commits. We will look more into this part when we get to the Branches section. For now this is our commit history: df2fb7a ===> 7f3b00e","title":"Adding More Changes"},{"location":"level101/git/git-basics/#are-commits-really-linked","text":"As I just said, the two commits we just made are linked via a tree-like data structure and we saw how they are linked. But let's actually verify it. Everything in git is an object. Newly created files are stored as objects. Changes to files are stored as objects and even commits are objects. To view the contents of an object we can use the following command with the object's ID. We will take a look at the contents of the second commit $ git cat-file -p 7f3b00e tree ebf3af44d253e5328340026e45a9fa9ae3ea1982 parent df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a author Sanket Patel 1603273316 -0700 committer Sanket Patel 1603273316 -0700 adding file 2 Take note of the parent attribute in the above output. It points to the commit ID of the first commit we made. So this proves that they are linked! Additionally you can see the second commit's message in this object. As I said, all this magic is enabled by the .git folder and the object we are looking at is also in that folder. $ ls .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9 It is stored in .git/objects/ folder. 
All the files and changes to them as well are stored in this folder.","title":"Are commits really linked?"},{"location":"level101/git/git-basics/#the-version-control-part-of-git","text":"We already can see two commits (versions) in our git log. One thing a version control tool gives you is the ability to browse back and forth in history. For example: some of your users are running an old version of code and they are reporting an issue. In order to debug the issue, you need access to the old code. The one in your current repo is the latest code. In this example, you are working on the second commit (7f3b00e) and someone reported an issue with the code snapshot at commit (df2fb7a). This is how you would get access to the code at any older commit: # Current contents, two files present $ ls file1.txt file2.txt # checking out to (an older) commit $ git checkout df2fb7a Note: checking out 'df2fb7a'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example: git checkout -b HEAD is now at df2fb7a adding file 1 # checking contents, can verify it has old contents $ ls file1.txt So this is how we would get access to old versions/snapshots. All we need is a reference to that snapshot. Upon executing git checkout ... , what git does for you is use the .git folder, see what was the state of things (files and folders) at that version/reference and replace the contents of current directory with those contents. 
The then-existing content will no longer be present in the local dir (repo) but we can and will still get access to them because they are tracked via git commit and .git folder has them stored/tracked.","title":"The Version Control part of Git"},{"location":"level101/git/git-basics/#reference","text":"I mentioned in the previous section that we need a reference to the version. By default, a git repo is made of a tree of commits, and each commit has a unique ID. But the unique ID is not the only thing we can reference commits via. There are multiple ways to reference commits. For example: HEAD is a reference to the current commit. Whatever commit your repo is checked out at, HEAD will point to that. HEAD~1 is a reference to the previous commit. So while checking out the previous version in the section above, we could have done git checkout HEAD~1 . Similarly, master is also a reference (to a branch). Since git uses a tree-like structure to store commits, there of course will be branches. And the default branch is called master . Master (or any branch reference) will point to the latest commit in the branch. Even though we have checked out the previous commit in our repo, master still points to the latest commit. And we can get back to the latest version by checking out the master reference $ git checkout master Previous HEAD position was df2fb7a adding file 1 Switched to branch 'master' # now we will see latest code, with two files $ ls file1.txt file2.txt Note, instead of master in the above command, we could have used the commit's ID as well.","title":"Reference"},{"location":"level101/git/git-basics/#references-and-the-magic","text":"Let's look at the state of things. Two commits, master and HEAD references are pointing to the latest commit $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 The magic? Let's examine these files: $ cat .git/refs/heads/master 7f3b00eaa957815884198e2fdfec29361108d6a9 Voilà! Where master is pointing to is stored in a file. 
Whenever git needs to know where the master reference is pointing to, or if git needs to update where master points, it just needs to read or update the file above. So when you create a new commit, a new commit is created on top of the current commit and the master file is updated with the new commit's ID. Similarly, for the HEAD reference: $ cat .git/HEAD ref: refs/heads/master We can see HEAD is pointing to a reference called refs/heads/master . So HEAD will point wherever master points.","title":"References and The Magic"},{"location":"level101/git/git-basics/#little-adventure","text":"We discussed how git will update the files as we execute commands. But let's try to do it ourselves, by hand, and see what happens. $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 Now let's change master to point to the previous/first commit. $ echo df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a > .git/refs/heads/master $ git log --oneline --graph * df2fb7a (HEAD -> master) adding file 1 # RESETTING TO ORIGINAL $ echo 7f3b00eaa957815884198e2fdfec29361108d6a9 > .git/refs/heads/master $ git log --oneline --graph * 7f3b00e (HEAD -> master) adding file 2 * df2fb7a adding file 1 We just edited the master reference file and now we can see only the first commit in git log. Undoing the change to the file brings the state back to original. Not so much magic, is it?","title":"Little Adventure"},{"location":"level101/git/github-hooks/","text":"Git with GitHub Till now all the operations we did were in our local repo while git also helps us in a collaborative environment. GitHub is one place on the internet where you can centrally host your git repos and collaborate with other developers. 
Most of the workflow will remain the same as we discussed, with the addition of a couple of things: Pull: to pull latest changes from github (the central) repo Push: to push your changes to github repo so that it's available to all people GitHub has written nice guides and tutorials about this and you can refer to them here: GitHub Hello World Git Handbook Hooks Git has another nice feature called hooks. Hooks are basically scripts which will be called when a certain event happens. Here is where hooks are located: $ ls .git/hooks/ applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample Names are self-explanatory. These hooks are useful when you want to do certain things when a certain event happens. If you want to run tests before pushing code, you would want to set up a pre-push hook. Let's try to create a pre-commit hook. $ echo \"echo this is from pre commit hook\" > .git/hooks/pre-commit $ chmod +x .git/hooks/pre-commit We basically create a file called pre-commit in hooks folder and make it executable. Now if we make a commit, we should see the message getting printed. $ echo \"sample file\" > sample.txt $ git add sample.txt $ git commit -m \"adding sample file\" this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION [master 9894e05] adding sample file 1 file changed, 1 insertion(+) create mode 100644 sample.txt","title":"Github and Hooks"},{"location":"level101/git/github-hooks/#git-with-github","text":"Till now all the operations we did were in our local repo while git also helps us in a collaborative environment. GitHub is one place on the internet where you can centrally host your git repos and collaborate with other developers. 
Most of the workflow will remain the same as we discussed, with the addition of a couple of things: Pull: to pull latest changes from github (the central) repo Push: to push your changes to github repo so that it's available to all people GitHub has written nice guides and tutorials about this and you can refer to them here: GitHub Hello World Git Handbook","title":"Git with GitHub"},{"location":"level101/git/github-hooks/#hooks","text":"Git has another nice feature called hooks. Hooks are basically scripts which will be called when a certain event happens. Here is where hooks are located: $ ls .git/hooks/ applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample Names are self-explanatory. These hooks are useful when you want to do certain things when a certain event happens. If you want to run tests before pushing code, you would want to set up a pre-push hook. Let's try to create a pre-commit hook. $ echo \"echo this is from pre commit hook\" > .git/hooks/pre-commit $ chmod +x .git/hooks/pre-commit We basically create a file called pre-commit in hooks folder and make it executable. Now if we make a commit, we should see the message getting printed. $ echo \"sample file\" > sample.txt $ git add sample.txt $ git commit -m \"adding sample file\" this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION [master 9894e05] adding sample file 1 file changed, 1 insertion(+) create mode 100644 sample.txt","title":"Hooks"},{"location":"level101/linux_basics/command_line_basics/","text":"Command Line Basics Lab Environment Setup One can use an online bash interpreter to run all the commands that are provided as examples in this course. This will also help you in getting a hands-on experience of various linux commands. REPL is one of the popular online bash interpreters for running linux commands. 
We will be using it for running all the commands mentioned in this course. What is a Command A command is a program that tells the operating system to perform specific work. Programs are stored as files in linux. Therefore, a command is also a file which is stored somewhere on the disk. Commands may also take additional arguments as input from the user. These arguments are called command line arguments. Knowing how to use the commands is important and there are many ways to get help in Linux, especially for commands. Almost every command will have some form of documentation; most commands will have a command-line argument -h or --help that will display a reasonable amount of documentation. But the most popular documentation system in Linux is called man pages - short for manual pages. Using --help to show the documentation for ls command. File System Organization The linux file system has a hierarchical (or tree-like) structure with its highest level directory called root ( denoted by / ). Directories present inside the root directory store files related to the system. These directories in turn can either store system files or application files or user related files. bin | The executable programs of the most commonly used commands reside in the bin directory dev | This directory contains files related to devices on the system etc | This directory contains all the system configuration files home | This directory contains user related files and directories. lib | This directory contains all the library files mnt | This directory contains files related to mounted devices on the system proc | This directory contains files related to the running processes on the system root | This directory contains root user related files and directories. sbin | This directory contains programs used for system administration. 
tmp | This directory is used to store temporary files on the system usr | This directory is used to store application programs on the system Commands for Navigating the File System There are three basic commands which are used frequently to navigate the file system: ls pwd cd We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. pwd (print working directory) At any given moment, we are working in a certain directory. To get the name of the directory we are currently in, we can use the pwd command in Linux. We will now use the cd command to move to a different directory and then print the working directory. cd (change directory) The cd command can be used to change the working directory. Using the command, you can move from one directory to another. In the example below, we are initially in the root directory. We then use the cd command to change the directory. ls (list files and directories) The ls command is used to list the contents of a directory. It lists all the files and folders present in the given directory. If we just type ls in the shell, it will list all the files and directories present in the current directory. We can also provide a directory name as an argument to the ls command. It will then list all the files and directories inside the given directory. Commands for Manipulating Files There are five basic commands which are used frequently to manipulate files: touch mkdir cp mv rm We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. touch (create new file) The touch command can be used to create an empty new file. This command is very useful for many other purposes but we will discuss the simplest use case of creating a new file. 
General syntax of using touch command touch mkdir (create new directories) The mkdir command is used to create directories.You can use ls command to verify that the new directory is created. General syntax of using mkdir command mkdir rm (delete files and directories) The rm command can be used to delete files and directories. It is very important to note that this command permanently deletes the files and directories. It's almost impossible to recover these files and directories once you have executed rm command on them successfully. Do run this command with care. General syntax of using rm command: rm Let's try to understand the rm command with an example. We will try to delete the file and directory we created using touch and mkdir command respectively. cp (copy files and directories) The cp command is used to copy files and directories from one location to another. Do note that the cp command doesn't do any change to the original files or directories. The original files or directories and their copy both co-exist after running cp command successfully. General syntax of using cp command: cp We are currently in the '/home/runner' directory. We will use the mkdir command to create a new directory named \"test_directory\". We will now try to copy the \"_test_runner.py\" file to the directory we created just now. Do note that nothing happened to the original \"_test_runner.py\" file. It's still there in the current directory. A new copy of it got created inside the \"test_directory\". We can also use the cp command to copy the whole directory from one location to another. Let's try to understand this with an example. We again used the mkdir command to create a new directory called \"another_directory\". We then used the cp command along with an additional argument '-r' to copy the \"test_directory\". 
mv (move files and directories) The mv command can either be used to move files or directories from one location to another or it can be used to rename files or directories. Do note that moving files and copying them are very different. When you move files or directories, they no longer exist at the original location. General syntax of using mv command: mv In this example, we will use the mv command to move the \"_test_runner.py\" file to \"test_directory\". In this case, this file already exists in \"test_directory\". The mv command will just replace it. Do note that the original file doesn't exist in the current directory after the mv command has run successfully. We can also use the mv command to move a directory from one location to another. In this case, we do not need to use the '-r' flag that we did while using the cp command. Do note that the original directory will not exist if we use the mv command. One of the important uses of the mv command is to rename files and directories. Let's see how we can use this command for renaming. We first change our location to \"test_directory\". We then use the mv command to rename the \"_test_runner.py\" file to \"test.py\". Commands for Viewing Files There are five basic commands which are used frequently to view the files: cat head tail more less We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 100 in this file. Each number will be on a separate line. Do not worry about the above command now. It's an advanced command which is used to generate numbers. We have then used a redirection operator to push these numbers to the file. We will be discussing I/O redirection in the later sections. cat The simplest use of the cat command is to print the contents of a file on your screen. This command is very useful and can be used for many other purposes. 
We will study about other use cases later. You can try to run the above command and you will see numbers being printed from 1 to 100 on your screen. You will need to scroll up to view all the numbers. head The head command displays the first 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the top. In this example, we are only able to see the first 10 lines from the file when we use the head command. By default, head command will only display the first 10 lines. If we want to specify the number of lines we want to see from start, use the '-n' argument to provide the input. tail The tail command displays the last 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the end of the file. By default, the tail command will only display the last 10 lines. If we want to specify the number of lines we want to see from the end, use '-n' argument to provide the input. In this example, we are only able to see the last 5 lines from the file when we use the tail command with explicit -n option. more More command displays the contents of a file or a command output, displaying one screen at a time in case the file is large (Eg: log files). It also allows forward navigation and limited backward navigation in the file. More command displays as much as can fit on the current screen and waits for user input to advance. Forward navigation can be done by pressing Enter, which advances the output by one line and Space, which advances the output by one screen. less Less command is an improved version of more. It displays the contents of a file or a command output, one page at a time. It allows backward navigation as well as forward navigation in the file and also has search options. We can use arrow keys for advancing backward or forward by one line. For moving forward by one page, press Space and for moving backward by one page, press b on your keyboard. 
You can go to the beginning and the end of a file instantly. Echo Command in Linux The echo command is one of the simplest commands that is used in the shell. This command is equivalent to what we have in other programming languages. The echo command prints the given input string on the screen. Text Processing Commands In the previous section, we learned how to view the content of a file. In many cases, we will be interested in performing the below operations: Print only the lines which contain a particular word(s) Replace a particular word with another word in a file Sort the lines in a particular order There are three basic commands which are used frequently to process texts: grep sed sort We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 10 in this file. Each number will be in a separate line. grep The grep command in its simplest form can be used to search particular words in a text file. It will display all the lines in a file that contains a particular input. The word we want to search is provided as an input to the grep command. General syntax of using grep command: grep In this example, we are trying to search for a string \"1\" in this file. The grep command outputs the lines where it found this string. sed The sed command in its simplest form can be used to replace a text in a file. General syntax of using the sed command for replacement: sed 's///' Let's try to replace each occurrence of \"1\" in the file with \"3\" using sed command. The content of the file will not change in the above example. To do so, we have to use an extra argument '-i' so that the changes are reflected back in the file. sort The sort command can be used to sort the input provided to it as an argument. By default, it will sort in increasing order. 
Let's first see the content of the file before trying to sort it. Now, we will try to sort the file using the sort command. The sort command sorts the content in lexicographical order. The content of the file will not change in the above example. I/O Redirection Each open file gets assigned a file descriptor. A file descriptor is a unique identifier for open files in the system. There are always three default files open, stdin (the keyboard), stdout (the screen), and stderr (error messages output to the screen). These files can be redirected. Everything is a file in Linux - https://unix.stackexchange.com/questions/225537/everything-is-a-file Until now, we have displayed all the output on the screen, which is the standard output. We can use some special operators to redirect the output of a command to files or even to the input of other commands. I/O redirection is a very powerful feature. In the below example, we have used the '>' operator to redirect the output of ls command to output.txt file. In the below example, we have redirected the output from echo command to a file. We can also redirect the output of a command as an input to another command. This is possible with the help of pipes. In the below example, we have passed the output of cat command as an input to grep command using pipe(|) operator. In the below example, we have passed the output of sort command as an input to uniq command using pipe(|) operator. The uniq command only prints the unique numbers from the input. I/O redirection - https://tldp.org/LDP/abs/html/io-redirection.html","title":"Command Line Basics"},{"location":"level101/linux_basics/command_line_basics/#command-line-basics","text":"","title":"Command Line Basics"},{"location":"level101/linux_basics/command_line_basics/#lab-environment-setup","text":"One can use an online bash interpreter to run all the commands that are provided as examples in this course. 
This will also help you in getting a hands-on experience of various linux commands. REPL is one of the popular online bash interpreters for running linux commands. We will be using it for running all the commands mentioned in this course.","title":"Lab Environment Setup"},{"location":"level101/linux_basics/command_line_basics/#what-is-a-command","text":"A command is a program that tells the operating system to perform specific work. Programs are stored as files in linux. Therefore, a command is also a file which is stored somewhere on the disk. Commands may also take additional arguments as input from the user. These arguments are called command line arguments. Knowing how to use the commands is important and there are many ways to get help in Linux, especially for commands. Almost every command will have some form of documentation, most commands will have a command-line argument -h or --help that will display a reasonable amount of documentation. But the most popular documentation system in Linux is called man pages - short for manual pages. Using --help to show the documentation for ls command.","title":"What is a Command"},{"location":"level101/linux_basics/command_line_basics/#file-system-organization","text":"The linux file system has a hierarchical (or tree-like) structure with its highest level directory called root ( denoted by / ). Directories present inside the root directory stores file related to the system. These directories in turn can either store system files or application files or user related files. bin | The executable program of most commonly used commands reside in bin directory dev | This directory contains files related to devices on the system etc | This directory contains all the system configuration files home | This directory contains user related files and directories. 
lib | This directory contains all the library files mnt | This directory contains files related to mounted devices on the system proc | This directory contains files related to the running processes on the system root | This directory contains root user related files and directories. sbin | This directory contains programs used for system administration. tmp | This directory is used to store temporary files on the system usr | This directory is used to store application programs on the system","title":"File System Organization"},{"location":"level101/linux_basics/command_line_basics/#commands-for-navigating-the-file-system","text":"There are three basic commands which are used frequently to navigate the file system: ls pwd cd We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell.","title":"Commands for Navigating the File System"},{"location":"level101/linux_basics/command_line_basics/#pwd-print-working-directory","text":"At any given moment of time, we will be standing in a certain directory. To get the name of the directory in which we are standing, we can use the pwd command in linux. We will now use the cd command to move to a different directory and then print the working directory.","title":"pwd (print working directory)"},{"location":"level101/linux_basics/command_line_basics/#cd-change-directory","text":"The cd command can be used to change the working directory. Using the command, you can move from one directory to another. In the below example, we are initially in the root directory. we have then used the cd command to change the directory.","title":"cd (change directory)"},{"location":"level101/linux_basics/command_line_basics/#ls-list-files-and-directories","text":"The ls command is used to list the contents of a directory. It will list down all the files and folders present in the given directory. 
If we just type ls in the shell, it will list all the files and directories present in the current directory. We can also provide the directory name as argument to ls command. It will then list all the files and directories inside the given directory.","title":"ls (list files and directories)"},{"location":"level101/linux_basics/command_line_basics/#commands-for-manipulating-files","text":"There are five basic commands which are used frequently to manipulate files: touch mkdir cp mv rm We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell.","title":"Commands for Manipulating Files"},{"location":"level101/linux_basics/command_line_basics/#touch-create-new-file","text":"The touch command can be used to create an empty new file. This command is very useful for many other purposes but we will discuss the simplest use case of creating a new file. General syntax of using touch command touch ","title":"touch (create new file)"},{"location":"level101/linux_basics/command_line_basics/#mkdir-create-new-directories","text":"The mkdir command is used to create directories.You can use ls command to verify that the new directory is created. General syntax of using mkdir command mkdir ","title":"mkdir (create new directories)"},{"location":"level101/linux_basics/command_line_basics/#rm-delete-files-and-directories","text":"The rm command can be used to delete files and directories. It is very important to note that this command permanently deletes the files and directories. It's almost impossible to recover these files and directories once you have executed rm command on them successfully. Do run this command with care. General syntax of using rm command: rm Let's try to understand the rm command with an example. 
We will try to delete the file and directory we created using touch and mkdir command respectively.","title":"rm (delete files and directories)"},{"location":"level101/linux_basics/command_line_basics/#cp-copy-files-and-directories","text":"The cp command is used to copy files and directories from one location to another. Do note that the cp command doesn't do any change to the original files or directories. The original files or directories and their copy both co-exist after running cp command successfully. General syntax of using cp command: cp We are currently in the '/home/runner' directory. We will use the mkdir command to create a new directory named \"test_directory\". We will now try to copy the \"_test_runner.py\" file to the directory we created just now. Do note that nothing happened to the original \"_test_runner.py\" file. It's still there in the current directory. A new copy of it got created inside the \"test_directory\". We can also use the cp command to copy the whole directory from one location to another. Let's try to understand this with an example. We again used the mkdir command to create a new directory called \"another_directory\". We then used the cp command along with an additional argument '-r' to copy the \"test_directory\". mv (move files and directories) The mv command can either be used to move files or directories from one location to another or it can be used to rename files or directories. Do note that moving files and copying them are very different. When you move the files or directories, the original copy is lost. General syntax of using mv command: mv In this example, we will use the mv command to move the \"_test_runner.py\" file to \"test_directory\". In this case, this file already exists in \"test_directory\". The mv command will just replace it. Do note that the original file doesn't exist in the current directory after mv command ran successfully. 
We can also use the mv command to move a directory from one location to another. In this case, we do not need to use the '-r' flag that we did while using the cp command. Do note that the original directory will not exist if we use the mv command. One of the important uses of the mv command is to rename files and directories. Let's see how we can use this command for renaming. We first change our location to \"test_directory\". We then use the mv command to rename the \"_test_runner.py\" file to \"test.py\".","title":"cp (copy files and directories)"},{"location":"level101/linux_basics/command_line_basics/#commands-for-viewing-files","text":"There are five basic commands which are used frequently to view the files: cat head tail more less We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 100 in this file. Each number will be on a separate line. Do not worry about the above command now. It's an advanced command which is used to generate numbers. We have then used a redirection operator to push these numbers to the file. We will be discussing I/O redirection in the later sections.","title":"Commands for Viewing Files"},{"location":"level101/linux_basics/command_line_basics/#cat","text":"The simplest use of the cat command is to print the contents of a file on your screen. This command is very useful and can be used for many other purposes. We will study about other use cases later. You can try to run the above command and you will see numbers being printed from 1 to 100 on your screen. You will need to scroll up to view all the numbers.","title":"cat"},{"location":"level101/linux_basics/command_line_basics/#head","text":"The head command displays the first 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the top. 
In this example, we are only able to see the first 10 lines from the file when we use the head command. By default, head command will only display the first 10 lines. If we want to specify the number of lines we want to see from start, use the '-n' argument to provide the input.","title":"head"},{"location":"level101/linux_basics/command_line_basics/#tail","text":"The tail command displays the last 10 lines of the file by default. We can include additional arguments to display as many lines as we want from the end of the file. By default, the tail command will only display the last 10 lines. If we want to specify the number of lines we want to see from the end, use '-n' argument to provide the input. In this example, we are only able to see the last 5 lines from the file when we use the tail command with explicit -n option.","title":"tail"},{"location":"level101/linux_basics/command_line_basics/#more","text":"More command displays the contents of a file or a command output, displaying one screen at a time in case the file is large (Eg: log files). It also allows forward navigation and limited backward navigation in the file. More command displays as much as can fit on the current screen and waits for user input to advance. Forward navigation can be done by pressing Enter, which advances the output by one line and Space, which advances the output by one screen.","title":"more"},{"location":"level101/linux_basics/command_line_basics/#less","text":"Less command is an improved version of more. It displays the contents of a file or a command output, one page at a time. It allows backward navigation as well as forward navigation in the file and also has search options. We can use arrow keys for advancing backward or forward by one line. For moving forward by one page, press Space and for moving backward by one page, press b on your keyboard. 
You can go to the beginning and the end of a file instantly.","title":"less"},{"location":"level101/linux_basics/command_line_basics/#echo-command-in-linux","text":"The echo command is one of the simplest commands that is used in the shell. This command is equivalent to what we have in other programming languages. The echo command prints the given input string on the screen.","title":"Echo Command in Linux"},{"location":"level101/linux_basics/command_line_basics/#text-processing-commands","text":"In the previous section, we learned how to view the content of a file. In many cases, we will be interested in performing the below operations: Print only the lines which contain a particular word(s) Replace a particular word with another word in a file Sort the lines in a particular order There are three basic commands which are used frequently to process texts: grep sed sort We will now try to understand what each command does and how to use these commands. You should also practice the given examples on the online bash shell. We will create a new file called \"numbers.txt\" and insert numbers from 1 to 10 in this file. Each number will be in a separate line.","title":"Text Processing Commands"},{"location":"level101/linux_basics/command_line_basics/#grep","text":"The grep command in its simplest form can be used to search particular words in a text file. It will display all the lines in a file that contains a particular input. The word we want to search is provided as an input to the grep command. General syntax of using grep command: grep In this example, we are trying to search for a string \"1\" in this file. The grep command outputs the lines where it found this string.","title":"grep"},{"location":"level101/linux_basics/command_line_basics/#sed","text":"The sed command in its simplest form can be used to replace a text in a file. 
General syntax of using the sed command for replacement: sed 's///' Let's try to replace each occurrence of \"1\" in the file with \"3\" using sed command. The content of the file will not change in the above example. To do so, we have to use an extra argument '-i' so that the changes are reflected back in the file.","title":"sed"},{"location":"level101/linux_basics/command_line_basics/#sort","text":"The sort command can be used to sort the input provided to it as an argument. By default, it will sort in increasing order. Let's first see the content of the file before trying to sort it. Now, we will try to sort the file using the sort command. The sort command sorts the content in lexicographical order. The content of the file will not change in the above example.","title":"sort"},{"location":"level101/linux_basics/command_line_basics/#io-redirection","text":"Each open file gets assigned a file descriptor. A file descriptor is an unique identifier for open files in the system. There are always three default files open, stdin (the keyboard), stdout (the screen), and stderr (error messages output to the screen). These files can be redirected. Everything is a file in linux - https://unix.stackexchange.com/questions/225537/everything-is-a-file Till now, we have displayed all the output on the screen which is the standard output. We can use some special operators to redirect the output of the command to files or even to the input of other commands. I/O redirection is a very powerful feature. In the below example, we have used the '>' operator to redirect the output of ls command to output.txt file. In the below example, we have redirected the output from echo command to a file. We can also redirect the output of a command as an input to another command. This is possible with the help of pipes. In the below example, we have passed the output of cat command as an input to grep command using pipe(|) operator. 
In the below example, we have passed the output of sort command as an input to uniq command using pipe(|) operator. The uniq command only prints the unique numbers from the input. I/O redirection - https://tldp.org/LDP/abs/html/io-redirection.html","title":"I/O Redirection"},{"location":"level101/linux_basics/conclusion/","text":"Conclusion We have covered the basics of Linux operating systems and basic commands used in linux. We have also covered the Linux server administration commands. We hope that this course will make it easier for you to operate on the command line. Applications in SRE Role As a SRE, you will be required to perform some general tasks on these Linux servers. You will also be using the command line when you are troubleshooting issues. Moving from one location to another in the filesystem will require the help of ls , pwd and cd commands. You may need to search some specific information in the log files. grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command. tail command is very useful to view the latest data in the log file. Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers for security reasons. Users permissions can be restricted with chown , chmod and chgrp commands. ssh is one of the most frequently used commands for a SRE. Logging into servers and troubleshooting along with performing basic administration tasks will only be possible if we are able to login into the server. What if we want to run an apache server or nginx on a server? We will first install it using the package manager. Package management commands become important here. Managing services on servers is another critical responsibility of a SRE. Systemd related commands can help in troubleshooting issues. If a service goes down, we can start it using systemctl start command. 
We can also stop a service in case it is not needed. Monitoring is another core responsibility of a SRE. Memory and CPU are two important system level metrics which should be monitored. Commands like top and free are quite helpful here. If a service is throwing an error, how do we find out the root cause of the error ? We will certainly need to check logs to find out the whole stack trace of the error. The log file will also tell us the number of times the error has occurred along with time when it started. Useful Courses and tutorials Edx basic linux commands course Edx Red Hat Enterprise Linux Course https://linuxcommand.org/lc3_learning_the_shell.php","title":"Conclusion"},{"location":"level101/linux_basics/conclusion/#conclusion","text":"We have covered the basics of Linux operating systems and basic commands used in linux. We have also covered the Linux server administration commands. We hope that this course will make it easier for you to operate on the command line.","title":"Conclusion"},{"location":"level101/linux_basics/conclusion/#applications-in-sre-role","text":"As a SRE, you will be required to perform some general tasks on these Linux servers. You will also be using the command line when you are troubleshooting issues. Moving from one location to another in the filesystem will require the help of ls , pwd and cd commands. You may need to search some specific information in the log files. grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command. tail command is very useful to view the latest data in the log file. Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers for security reasons. Users permissions can be restricted with chown , chmod and chgrp commands. ssh is one of the most frequently used commands for a SRE. 
Logging into servers and troubleshooting along with performing basic administration tasks will only be possible if we are able to login into the server. What if we want to run an apache server or nginx on a server? We will first install it using the package manager. Package management commands become important here. Managing services on servers is another critical responsibility of a SRE. Systemd related commands can help in troubleshooting issues. If a service goes down, we can start it using systemctl start command. We can also stop a service in case it is not needed. Monitoring is another core responsibility of a SRE. Memory and CPU are two important system level metrics which should be monitored. Commands like top and free are quite helpful here. If a service is throwing an error, how do we find out the root cause of the error ? We will certainly need to check logs to find out the whole stack trace of the error. The log file will also tell us the number of times the error has occurred along with time when it started.","title":"Applications in SRE Role"},{"location":"level101/linux_basics/conclusion/#useful-courses-and-tutorials","text":"Edx basic linux commands course Edx Red Hat Enterprise Linux Course https://linuxcommand.org/lc3_learning_the_shell.php","title":"Useful Courses and tutorials"},{"location":"level101/linux_basics/intro/","text":"Linux Basics Introduction Prerequisites Should be comfortable in using any operating systems like Windows, Linux or Mac Expected to have fundamental knowledge of operating systems What to expect from this course This course is divided into three parts. In the first part, we cover the fundamentals of Linux operating systems. We will talk about Linux architecture, Linux distributions and uses of Linux operating systems. We will also talk about the difference between GUI and CLI. In the second part, we cover some basic commands used in Linux. 
We will focus on commands used for navigating the file system, viewing and manipulating files, I/O redirection etc. In the third part, we cover Linux system administration. This includes day to day tasks performed by Linux admins, like managing users/groups, managing file permissions, monitoring system performance, log files etc. In the second and third part, we will be taking examples to understand the concepts. What is not covered under this course We are not covering advanced Linux commands and bash scripting in this course. We will also not be covering Linux internals. Course Contents The following topics has been covered in this course: Introduction to Linux What are Linux Operating Systems What are popular Linux distributions Uses of Linux Operating Systems Linux Architecture Graphical user interface (GUI) vs Command line interface (CLI) Command Line Basics Lab Environment Setup What is a Command File System Organization Navigating File System Manipulating Files Viewing Files Echo Command Text Processing Commands I/O Redirection Linux system administration Lab Environment Setup User/Groups management Becoming a Superuser File Permissions SSH Command Package Management Process Management Memory Management Daemons and Systemd Logs Conclusion Applications in SRE Role Useful Courses and tutorials What are Linux operating systems Most of us are familiar with the Windows operating system used in more than 75% of the personal computers. The Windows operating systems are based on Windows NT kernel. A kernel is the most important part of an operating system - it performs important functions like process management, memory management, filesystem management etc. Linux operating systems are based on the Linux kernel. A Linux based operating system will consist of Linux kernel, GUI/CLI, system libraries and system utilities. The Linux kernel was independently developed and released by Linus Torvalds. 
The Linux kernel is free and open-source - https://github.com/torvalds/linux Linux is a kernel and not a complete operating system. The Linux kernel is combined with the GNU system to make a complete operating system. Therefore, Linux-based operating systems are also called GNU/Linux systems. GNU is an extensive collection of free software such as a compiler, debugger, C library etc. Linux and the GNU System History of Linux - https://en.wikipedia.org/wiki/History_of_Linux What are popular Linux distributions A Linux distribution (distro) is an operating system based on the Linux kernel and a package management system. A package management system consists of tools that help in installing, upgrading, configuring and removing software on the operating system. Software is usually adapted to a distribution and packaged in a distro-specific format. These packages are available through a distro-specific repository. Packages are installed and managed in the operating system by a package manager. List of popular Linux distributions: Fedora Ubuntu Debian CentOS Red Hat Enterprise Linux SUSE Arch Linux Packaging systems Distributions Package manager Debian style (.deb) Debian, Ubuntu APT Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux YUM Linux Architecture The Linux kernel is monolithic in nature. System calls are used to interact with the Linux kernel space. Kernel code can only be executed in the kernel mode. Non-kernel code is executed in the user mode. Device drivers are used to communicate with the hardware devices. Uses of Linux Operating Systems Operating systems based on the Linux kernel are widely used in: Personal computers Servers Mobile phones - Android is based on the Linux kernel Embedded devices - watches, televisions, traffic lights etc Satellites Network devices - routers, switches etc. Graphical user interface (GUI) vs Command line interface (CLI) A user interacts with a computer with the help of user interfaces.
The user interface can be either GUI or CLI. Graphical user interface allows a user to interact with the computer using graphics such as icons and images. When a user clicks on an icon to open an application on a computer, he or she is actually using the GUI. It's easy to perform tasks using GUI. Command line interface allows a user to interact with the computer using commands. A user types the command in a terminal and the system helps in executing these commands. A new user with experience on GUI may find it difficult to interact with CLI as he/she needs to be aware of the commands to perform a particular operation. Shell vs Terminal Shell is a program that takes commands from the users and gives them to the operating system for processing. Shell is an example of a CLI (command line interface). Bash is one of the most popular shell programs available on Linux servers. Other popular shell programs are zsh, ksh and tcsh. Terminal is a program that opens a window and lets you interact with the shell. Some popular examples of terminals are gnome-terminal, xterm, konsole etc. Linux users do use the terms shell, terminal, prompt, console etc. interchangeably. In simple terms, these all refer to a way of taking commands from the user.","title":"Introduction"},{"location":"level101/linux_basics/intro/#linux-basics","text":"","title":"Linux Basics"},{"location":"level101/linux_basics/intro/#introduction","text":"","title":"Introduction"},{"location":"level101/linux_basics/intro/#prerequisites","text":"Should be comfortable in using any operating systems like Windows, Linux or Mac Expected to have fundamental knowledge of operating systems","title":"Prerequisites"},{"location":"level101/linux_basics/intro/#what-to-expect-from-this-course","text":"This course is divided into three parts. In the first part, we cover the fundamentals of Linux operating systems. We will talk about Linux architecture, Linux distributions and uses of Linux operating systems. 
We will also talk about the difference between GUI and CLI. In the second part, we cover some basic commands used in Linux. We will focus on commands used for navigating the file system, viewing and manipulating files, I/O redirection etc. In the third part, we cover Linux system administration. This includes day to day tasks performed by Linux admins, like managing users/groups, managing file permissions, monitoring system performance, log files etc. In the second and third part, we will be taking examples to understand the concepts.","title":"What to expect from this course"},{"location":"level101/linux_basics/intro/#what-is-not-covered-under-this-course","text":"We are not covering advanced Linux commands and bash scripting in this course. We will also not be covering Linux internals.","title":"What is not covered under this course"},{"location":"level101/linux_basics/intro/#course-contents","text":"The following topics has been covered in this course: Introduction to Linux What are Linux Operating Systems What are popular Linux distributions Uses of Linux Operating Systems Linux Architecture Graphical user interface (GUI) vs Command line interface (CLI) Command Line Basics Lab Environment Setup What is a Command File System Organization Navigating File System Manipulating Files Viewing Files Echo Command Text Processing Commands I/O Redirection Linux system administration Lab Environment Setup User/Groups management Becoming a Superuser File Permissions SSH Command Package Management Process Management Memory Management Daemons and Systemd Logs Conclusion Applications in SRE Role Useful Courses and tutorials","title":"Course Contents"},{"location":"level101/linux_basics/intro/#what-are-linux-operating-systems","text":"Most of us are familiar with the Windows operating system used in more than 75% of the personal computers. The Windows operating systems are based on Windows NT kernel. 
A kernel is the most important part of an operating system - it performs important functions like process management, memory management, filesystem management etc. Linux operating systems are based on the Linux kernel. A Linux based operating system will consist of Linux kernel, GUI/CLI, system libraries and system utilities. The Linux kernel was independently developed and released by Linus Torvalds. The Linux kernel is free and open-source - https://github.com/torvalds/linux Linux is a kernel and not a complete operating system. The Linux kernel is combined with the GNU system to make a complete operating system. Therefore, Linux-based operating systems are also called GNU/Linux systems. GNU is an extensive collection of free software such as a compiler, debugger, C library etc. Linux and the GNU System History of Linux - https://en.wikipedia.org/wiki/History_of_Linux","title":"What are Linux operating systems"},{"location":"level101/linux_basics/intro/#what-are-popular-linux-distributions","text":"A Linux distribution (distro) is an operating system based on the Linux kernel and a package management system. A package management system consists of tools that help in installing, upgrading, configuring and removing software on the operating system. Software is usually adapted to a distribution and packaged in a distro-specific format. These packages are available through a distro-specific repository. Packages are installed and managed in the operating system by a package manager. List of popular Linux distributions: Fedora Ubuntu Debian CentOS Red Hat Enterprise Linux SUSE Arch Linux Packaging systems Distributions Package manager Debian style (.deb) Debian, Ubuntu APT Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux YUM","title":"What are popular Linux distributions"},{"location":"level101/linux_basics/intro/#linux-architecture","text":"The Linux kernel is monolithic in nature. System calls are used to interact with the Linux kernel space.
Kernel code can only be executed in the kernel mode. Non-kernel code is executed in the user mode. Device drivers are used to communicate with the hardware devices.","title":"Linux Architecture"},{"location":"level101/linux_basics/intro/#uses-of-linux-operating-systems","text":"Operating systems based on the Linux kernel are widely used in: Personal computers Servers Mobile phones - Android is based on the Linux kernel Embedded devices - watches, televisions, traffic lights etc Satellites Network devices - routers, switches etc.","title":"Uses of Linux Operating Systems"},{"location":"level101/linux_basics/intro/#graphical-user-interface-gui-vs-command-line-interface-cli","text":"A user interacts with a computer with the help of user interfaces. The user interface can be either GUI or CLI. A graphical user interface allows a user to interact with the computer using graphics such as icons and images. When a user clicks on an icon to open an application on a computer, he or she is actually using the GUI. It's easy to perform tasks using a GUI. A command line interface allows a user to interact with the computer using commands. A user types the command in a terminal and the system executes it. A new user with experience of a GUI may find it difficult to interact with a CLI as he/she needs to know the commands to perform a particular operation.","title":"Graphical user interface (GUI) vs Command line interface (CLI)"},{"location":"level101/linux_basics/intro/#shell-vs-terminal","text":"Shell is a program that takes commands from the users and gives them to the operating system for processing. Shell is an example of a CLI (command line interface). Bash is one of the most popular shell programs available on Linux servers. Other popular shell programs are zsh, ksh and tcsh. Terminal is a program that opens a window and lets you interact with the shell. Some popular examples of terminals are gnome-terminal, xterm, konsole etc.
Linux users do use the terms shell, terminal, prompt, console etc. interchangeably. In simple terms, these all refer to a way of taking commands from the user.","title":"Shell vs Terminal"},{"location":"level101/linux_basics/linux_server_administration/","text":"Linux Server Administration In this course, we will try to cover some of the common tasks that a Linux server administrator performs. We will first try to understand what a particular command does and then try to understand the commands using examples. Do keep in mind that it's very important to practice the Linux commands on your own. Lab Environment Setup Install Docker on your system - https://docs.docker.com/engine/install/ We will be running all the commands on a Red Hat Enterprise Linux (RHEL) 8 system. We will run most of the commands used in this module in the above Docker container. Multi-User Operating Systems An operating system is considered multi-user if it allows multiple people/users to use a computer without affecting each other's files and preferences. Linux-based operating systems are multi-user in nature as they allow multiple users to access the system at the same time. A typical computer will only have one keyboard and monitor but multiple users can log in via SSH if the computer is connected to the network. We will cover more about SSH later. As a server administrator, we are mostly concerned with the Linux servers which are physically present at a very large distance from us. We can connect to these servers with the help of remote login methods like SSH. Since Linux supports multiple users, we need to have a method which can protect the users from each other. One user should not be able to access and modify files of other users. User/Group Management Each user in Linux has an associated user ID called the UID. Each user also has a home directory and a login shell associated with them. A group is a collection of one or more users.
A group makes it easier to share permissions among a group of users. Each group has a group ID called GID associated with it. id command The id command can be used to find the uid and gid associated with a user. It also lists the groups to which the user belongs. The uid and gid associated with the root user are 0. A good way to find out the current user in Linux is to use the whoami command. \"root\" user or superuser is the most privileged user with unrestricted access to all the resources on the system. It has UID 0 Important files associated with users/groups /etc/passwd Stores the user name, the uid, the gid, the home directory, the login shell etc /etc/shadow Stores the hashed passwords associated with the users /etc/group Stores information about different groups on the system If you want to understand each field discussed in the above outputs, you can go through the links below: https://tldp.org/LDP/lame/LAME/linux-admin-made-easy/shadow-file-formats.html https://tldp.org/HOWTO/User-Authentication-HOWTO/x71.html Important commands for managing users Some of the commands which are used frequently to manage users/groups on Linux are the following: useradd - Creates a new user passwd - Adds or modifies passwords for a user usermod - Modifies attributes of a user userdel - Deletes a user useradd The useradd command adds a new user in Linux. We will create a new user 'shivam'. We will also verify that the user has been created by tailing the /etc/passwd file. The uid and gid are 1000 for the newly created user. The home directory assigned to the user is /home/shivam and the login shell assigned is /bin/bash. Do note that the user home directory and login shell can be modified later on. If we do not specify any value for attributes like home directory or login shell, default values will be assigned to the user. We can also override these default values when creating a new user. passwd The passwd command is used to create or modify passwords for a user.
In the above examples, we have not assigned any password for users 'shivam' or 'amit' while creating them. \"!!\" in an account entry in shadow means the account of an user has been created, but not yet given a password. Let's now try to create a password for user \"shivam\". Do remember the password as we will be later using examples where it will be useful. Also, let's change the password for the root user now. When we switch from a normal user to root user, it will request you for a password. Also, when you login using root user, the password will be asked. usermod The usermod command is used to modify the attributes of an user like the home directory or the shell. Let's try to modify the login shell of user \"amit\" to \"/bin/bash\". In a similar way, you can also modify many other attributes for a user. Try 'usermod -h' for a list of attributes you can modify. userdel The userdel command is used to remove a user on Linux. Once we remove a user, all the information related to that user will be removed. Let's try to delete the user \"amit\". After deleting the user, you will not find the entry for that user in \"/etc/passwd\" or \"/etc/shadow\" file. Important commands for managing groups Commands for managing groups are quite similar to the commands used for managing users. Each command is not explained in detail here as they are quite similar. You can try running these commands on your system. groupadd \\ Creates a new group groupmod \\ Modifies attributes of a group groupdel \\ Deletes a group gpasswd \\ Modifies password for group We will now try to add user \"shivam\" to the group we have created above. Becoming a Superuser Before running the below commands, do make sure that you have set up a password for user \"shivam\" and user \"root\" using the passwd command described in the above section. The su command can be used to switch users in Linux. Let's now try to switch to user \"shivam\". Let's now try to open the \"/etc/shadow\" file. 
The operating system didn't allow the user \"shivam\" to read the content of the \"/etc/shadow\" file. This is an important file in Linux which stores the hashed passwords of users. This file can only be accessed by root or users who have superuser privileges. The sudo command allows a user to run commands with the security privileges of the root user. Do remember that the root user has all the privileges on a system. We can also use the su command to switch to the root user and open the above file but doing that will require the password of the root user. An alternative way which is preferred on most modern operating systems is to use the sudo command for becoming a superuser. With this approach, users enter their own password and need to be a part of the sudo group. How to provide superuser privileges to other users ? Let's first switch to the root user using the su command. Do note that using the below command will need you to enter the password for the root user. In case you forgot to set a password for the root user, type \"exit\" and you will be back as the root user. Now, set up a password using the passwd command. The file /etc/sudoers holds the names of users permitted to invoke sudo . On Red Hat operating systems, this file is not present by default. We will need to install sudo. We will discuss the yum command in detail in later sections. Try to open the \"/etc/sudoers\" file on the system. The file has a lot of information. This file stores the rules that users must follow when running the sudo command. For example, root is allowed to run any commands from anywhere. One easy way of providing root access to users is to add them to a group which has permissions to run all the commands. \"wheel\" is a group in Red Hat Linux with such privileges. Let's add the user \"shivam\" to this group so that it also has sudo privileges. Let's now switch back to user \"shivam\" and try to access the \"/etc/shadow\" file.
We need to use sudo before running the command since the file can only be accessed with sudo privileges. We have already given sudo privileges to user \u201cshivam\u201d by adding him to the group \u201cwheel\u201d. File Permissions On a Linux operating system, each file and directory is assigned access permissions for the owner of the file, the members of a group of related users and everybody else. This is to make sure that one user is not allowed to access the files and resources of another user. To see the permissions of a file, we can use the ls command. Let's look at the permissions of /etc/passwd file. Let's go over some of the important fields in the output that are related to file permissions. Chmod command The chmod command is used to modify the permissions of files and directories in Linux. The chmod command accepts permissions as a numerical argument. We can think of permission as a series of bits with 1 representing True or allowed and 0 representing False or not allowed. Permission rwx Binary Decimal Read, write and execute rwx 111 7 Read and write rw- 110 6 Read and execute r-x 101 5 Read only r-- 100 4 Write and execute -wx 011 3 Write only -w- 010 2 Execute only --x 001 1 None --- 000 0 We will now create a new file and check the permission of the file. The group owner doesn't have the permission to write to this file. Let's give the group owner or root the permission to write to it using chmod command. The chmod command can also be used to change the permissions of a directory in a similar way. Chown command The chown command is used to change the owner of files or directories in Linux. Command syntax: chown \\ \\ If we do not have root privileges, we need to use the sudo command. Let's switch to user 'shivam' and try changing the owner. We have also changed the owner of the file to root before running the below command. The chown command can also be used to change the owner of a directory in a similar way.
Chgrp command The chgrp command can be used to change the group ownership of files or directories in Linux. The syntax is very similar to that of the chown command. The chgrp command can also be used to change the group ownership of a directory in a similar way. SSH Command The ssh command is used for logging into remote systems, transferring files between systems and executing commands on a remote machine. SSH stands for secure shell and is used to provide an encrypted, secure connection between two hosts over an insecure network like the internet. Reference: https://www.ssh.com/ssh/command/ We will now discuss passwordless authentication which is secure and most commonly used for ssh authentication. Passwordless Authentication Using SSH Using this method, we can ssh into hosts without entering the password. This method is also useful when we want some scripts to perform ssh-related tasks. Passwordless authentication requires the use of a public and private key pair. As the name implies, the public key can be shared with anyone but the private key should be kept private. Let's not get into the details of how this authentication works. You can read more about it here Steps for setting up a passwordless authentication with a remote host: Generating public-private key pair If we already have a key pair stored in the \\~/.ssh directory, we will not need to generate keys again. Install the openssh package which contains all the commands related to ssh. Generate a key pair using the ssh-keygen command. One can choose the default values for all prompts. After running the ssh-keygen command successfully, we should see two keys present in the \\~/.ssh directory. id_rsa is the private key and id_rsa.pub is the public key. Do note that the private key can only be read and modified by you. Transferring the public key to the remote host There are multiple ways to transfer the public key to the remote server. We will look at one of the most common ways of doing it using the ssh-copy-id command.
Install the openssh-clients package to use the ssh-copy-id command. Use the ssh-copy-id command to copy your public key to the remote host. Now, ssh into the remote host using password authentication. Our public key should be there in \\~/.ssh/authorized_keys now. \\~/.ssh/authorized_keys contains a list of public keys. The users associated with these public keys have ssh access to the remote host. How to run commands on a remote host ? General syntax: ssh \\@\\ \\ How to transfer files from one host to another host ? General syntax: scp \\ \\ Package Management Package management is the process of installing and managing software on the system. We can install the packages which we require from the Linux package distributor. Different distributors use different packaging systems. Packaging systems Distributions Debian style (.deb) Debian, Ubuntu Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux Popular Packaging Systems in Linux Command Description yum install \\ Installs a package on your system yum update \\ Updates a package to its latest available version yum remove \\ Removes a package from your system yum search \\ Searches for a particular keyword DNF is the successor to YUM which is now used in Fedora for installing and managing packages. DNF may replace YUM in the future on all RPM based Linux distributions. We did find an exact match for the keyword httpd when we searched using the yum search command. Let's now install the httpd package. After httpd is installed, we will use the yum remove command to remove the httpd package. Process Management In this section, we will study some useful commands that can be used to monitor the processes on Linux systems. ps (process status) The ps command is used to display information about a process or a list of processes. If you get an error \"ps command not found\" while running the ps command, do install the procps package. ps without any arguments is not very useful.
Let's try to list all the processes on the system by using the below command. Reference: https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux We can use an additional argument with ps command to list the information about the process with a specific process ID. We can use grep in combination with ps command to list only specific processes. top The top command is used to show information about Linux processes running on the system in real time. It also shows a summary of the system information. For each process, top lists down the process ID, owner, priority, state, cpu utilization, memory utilization and much more information. It also lists down the memory utilization and cpu utilization of the system as a whole along with system uptime and cpu load average. Memory Management In this section, we will study about some useful commands that can be used to view information about the system memory. free The free command is used to display the memory usage of the system. The command displays the total free and used space available in the RAM along with space occupied by the caches/buffers. free command by default shows the memory usage in kilobytes. We can use an additional argument to get the data in human-readable format. vmstat The vmstat command can be used to display the memory usage along with additional information about io and cpu usage. Checking Disk Space In this section, we will study about some useful commands that can be used to view disk space on Linux. df (disk free) The df command is used to display the free and available space for each mounted file system. du (disk usage) The du command is used to display disk usage of files and directories on the system. The below command can be used to display the top 5 largest directories in the root directory. Daemons A computer program that runs as a background process is called a daemon. Traditionally, the name of daemon processes ended with d - sshd, httpd etc. 
We cannot interact with daemon processes directly as they run in the background. The terms services and daemons are used interchangeably most of the time. Systemd Systemd is a system and service manager for Linux operating systems. Systemd units are the building blocks of systemd. These units are represented by unit configuration files. The below example shows the unit configuration files available at /usr/lib/systemd/system which are distributed by installed RPM packages. We are more interested in the configuration files that end with service as these are service units. Managing System Services Service units end with the .service file extension. The systemctl command can be used to start/stop/restart the services managed by systemd. Command Description systemctl start name.service Starts a service systemctl stop name.service Stops a service systemctl restart name.service Restarts a service systemctl status name.service Checks the status of a service systemctl reload name.service Reloads the configuration of a service Logs In this section, we will talk about some important files and directories which can be very useful for viewing system logs and application logs in Linux. These logs can be very useful when you are troubleshooting on the system.","title":"Server Administration"},{"location":"level101/linux_basics/linux_server_administration/#linux-server-administration","text":"In this course, we will try to cover some of the common tasks that a Linux server administrator performs. We will first try to understand what a particular command does and then try to understand the commands using examples. Do keep in mind that it's very important to practice the Linux commands on your own.","title":"Linux Server Administration"},{"location":"level101/linux_basics/linux_server_administration/#lab-environment-setup","text":"Install Docker on your system - https://docs.docker.com/engine/install/ We will be running all the commands on a Red Hat Enterprise Linux (RHEL) 8 system.
We will run most of the commands used in this module in the above Docker container.","title":"Lab Environment Setup"},{"location":"level101/linux_basics/linux_server_administration/#multi-user-operating-systems","text":"An operating system is considered multi-user if it allows multiple people/users to use a computer without affecting each other's files and preferences. Linux-based operating systems are multi-user in nature as they allow multiple users to access the system at the same time. A typical computer will only have one keyboard and monitor but multiple users can log in via SSH if the computer is connected to the network. We will cover more about SSH later. As a server administrator, we are mostly concerned with the Linux servers which are physically present at a very large distance from us. We can connect to these servers with the help of remote login methods like SSH. Since Linux supports multiple users, we need to have a method which can protect the users from each other. One user should not be able to access and modify files of other users.","title":"Multi-User Operating Systems"},{"location":"level101/linux_basics/linux_server_administration/#usergroup-management","text":"Each user in Linux has an associated user ID called the UID. Each user also has a home directory and a login shell associated with them. A group is a collection of one or more users. A group makes it easier to share permissions among a group of users. Each group has a group ID called GID associated with it.","title":"User/Group Management"},{"location":"level101/linux_basics/linux_server_administration/#id-command","text":"The id command can be used to find the uid and gid associated with a user. It also lists the groups to which the user belongs. The uid and gid associated with the root user are 0. A good way to find out the current user in Linux is to use the whoami command.
\"root\" user or superuser is the most privileged user with unrestricted access to all the resources on the system. It has UID 0","title":"id command"},{"location":"level101/linux_basics/linux_server_administration/#important-files-associated-with-usersgroups","text":"/etc/passwd Stores the user name, the uid, the gid, the home directory, the login shell etc /etc/shadow Stores the password associated with the users /etc/group Stores information about different groups on the system If you want to understand each filed discussed in the above outputs, you can go through below links: https://tldp.org/LDP/lame/LAME/linux-admin-made-easy/shadow-file-formats.html https://tldp.org/HOWTO/User-Authentication-HOWTO/x71.html","title":"Important files associated with users/groups"},{"location":"level101/linux_basics/linux_server_administration/#important-commands-for-managing-users","text":"Some of the commands which are used frequently to manage users/groups on Linux are following: useradd - Creates a new user passwd - Adds or modifies passwords for a user usermod - Modifies attributes of an user userdel - Deletes an user","title":"Important commands for managing users"},{"location":"level101/linux_basics/linux_server_administration/#useradd","text":"The useradd command adds a new user in Linux. We will create a new user 'shivam'. We will also verify that the user has been created by tailing the /etc/passwd file. The uid and gid are 1000 for the newly created user. The home directory assigned to the user is /home/shivam and the login shell assigned is /bin/bash. Do note that the user home directory and login shell can be modified later on. If we do not specify any value for attributes like home directory or login shell, default values will be assigned to the user. 
We can also override these default values when creating a new user.","title":"useradd"},{"location":"level101/linux_basics/linux_server_administration/#passwd","text":"The passwd command is used to create or modify passwords for a user. In the above examples, we have not assigned any password for users 'shivam' or 'amit' while creating them. \"!!\" in an account entry in shadow means the account of a user has been created, but not yet given a password. Let's now try to create a password for user \"shivam\". Do remember the password, as we will need it in later examples. Also, let's change the password for the root user now. When we switch from a normal user to the root user, we will be asked for this password. It will also be asked for when we log in as the root user.","title":"passwd"},{"location":"level101/linux_basics/linux_server_administration/#usermod","text":"The usermod command is used to modify the attributes of a user like the home directory or the shell. Let's try to modify the login shell of user \"amit\" to \"/bin/bash\". In a similar way, you can also modify many other attributes for a user. Try 'usermod -h' for a list of attributes you can modify.","title":"usermod"},{"location":"level101/linux_basics/linux_server_administration/#userdel","text":"The userdel command is used to remove a user on Linux. Once we remove a user, all the information related to that user will be removed. Let's try to delete the user \"amit\". After deleting the user, you will not find the entry for that user in the \"/etc/passwd\" or \"/etc/shadow\" file.","title":"userdel"},{"location":"level101/linux_basics/linux_server_administration/#important-commands-for-managing-groups","text":"Commands for managing groups are quite similar to the commands used for managing users. Each command is not explained in detail here as they are quite similar. You can try running these commands on your system. 
groupadd \\ Creates a new group groupmod \\ Modifies attributes of a group groupdel \\ Deletes a group gpasswd \\ Modifies password for group We will now try to add user \"shivam\" to the group we have created above.","title":"Important commands for managing groups"},{"location":"level101/linux_basics/linux_server_administration/#becoming-a-superuser","text":"Before running the below commands, do make sure that you have set up a password for user \"shivam\" and user \"root\" using the passwd command described in the above section. The su command can be used to switch users in Linux. Let's now try to switch to user \"shivam\". Let's now try to open the \"/etc/shadow\" file. The operating system didn't allow the user \"shivam\" to read the content of the \"/etc/shadow\" file. This is an important file in Linux which stores the passwords of users. This file can only be accessed by root or by users who have superuser privileges. The sudo command allows a user to run commands with the security privileges of the root user. Do remember that the root user has all the privileges on a system. We can also use the su command to switch to the root user and open the above file, but doing that will require the password of the root user. An alternative way, which is preferred on most modern operating systems, is to use the sudo command for becoming a superuser. With this approach, users have to enter their own password and they need to be a part of the sudo group. How to provide superuser privileges to other users? Let's first switch to the root user using the su command. Do note that the below command will ask you for the password of the root user. In case you forgot to set a password for the root user, type \"exit\" and you will be back as the root user. Now, set up a password using the passwd command. The file /etc/sudoers holds the names of users permitted to invoke sudo. In Red Hat operating systems, this file is not present by default. We will need to install sudo. 
We will discuss the yum command in detail in later sections. Try to open the \"/etc/sudoers\" file on the system. The file has a lot of information. This file stores the rules that users must follow when running the sudo command. For example, root is allowed to run any commands from anywhere. One easy way of providing root access to users is to add them to a group which has permissions to run all the commands. \"wheel\" is a group in Red Hat Linux with such privileges. Let's add the user \"shivam\" to this group so that it also has sudo privileges. Let's now switch back to user \"shivam\" and try to access the \"/etc/shadow\" file. We need to use sudo before running the command since the file can only be accessed with sudo privileges. We have already given sudo privileges to user \u201cshivam\u201d by adding the user to the group \u201cwheel\u201d.","title":"Becoming a Superuser"},{"location":"level101/linux_basics/linux_server_administration/#file-permissions","text":"On a Linux operating system, each file and directory is assigned access permissions for the owner of the file, the members of a group of related users and everybody else. This is to make sure that one user is not allowed to access the files and resources of another user. To see the permissions of a file, we can use the ls command. Let's look at the permissions of the /etc/passwd file. Let's go over some of the important fields in the output that are related to file permissions.","title":"File Permissions"},{"location":"level101/linux_basics/linux_server_administration/#chmod-command","text":"The chmod command is used to modify the permissions of files and directories in Linux. The chmod command accepts permissions as a numerical argument. We can think of each permission as a series of bits, with 1 representing True (allowed) and 0 representing False (not allowed). 
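As a rough illustration of the bit view of permissions described above, the following Python sketch converts a numeric mode into its rwx form. The function name mode_to_rwx is hypothetical and chosen only for this example.

```python
def mode_to_rwx(mode):
    # mode is an integer such as 0o644; each octal digit covers one
    # class of users: owner, group, others (in that order)
    result = ''
    for shift in (6, 3, 0):
        digit = (mode >> shift) & 0o7
        # bit 4 = read, bit 2 = write, bit 1 = execute
        result += 'r' if digit & 4 else '-'
        result += 'w' if digit & 2 else '-'
        result += 'x' if digit & 1 else '-'
    return result


if __name__ == '__main__':
    print(mode_to_rwx(0o644))  # rw-r--r--
    print(mode_to_rwx(0o750))  # rwxr-x---
```

Reading the digits of 0o750 left to right gives 7 (rwx) for the owner, 5 (r-x) for the group and 0 (---) for everybody else.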
Permission rwx Binary Decimal Read, write and execute rwx 111 7 Read and write rw- 110 6 Read and execute r-x 101 5 Read only r-- 100 4 Write and execute -wx 011 3 Write only -w- 010 2 Execute only --x 001 1 None --- 000 0 We will now create a new file and check the permissions of the file. The group owner doesn't have the permission to write to this file. Let's give the group owner or root the permission to write to it using the chmod command. The chmod command can also be used to change the permissions of a directory in a similar way.","title":"Chmod command"},{"location":"level101/linux_basics/linux_server_administration/#chown-command","text":"The chown command is used to change the owner of files or directories in Linux. Command syntax: chown \\ \\ If we are not running as root, we need to prefix the command with sudo. Let's switch to user 'shivam' and try changing the owner. We have also changed the owner of the file to root before running the below command. The chown command can also be used to change the owner of a directory in a similar way.","title":"Chown command"},{"location":"level101/linux_basics/linux_server_administration/#chgrp-command","text":"The chgrp command can be used to change the group ownership of files or directories in Linux. The syntax is very similar to that of the chown command. The chgrp command can also be used to change the group ownership of a directory in a similar way.","title":"Chgrp command"},{"location":"level101/linux_basics/linux_server_administration/#ssh-command","text":"The ssh command is used for logging into remote systems, transferring files between systems and executing commands on a remote machine. SSH stands for secure shell and is used to provide an encrypted, secure connection between two hosts over an insecure network like the internet. 
Reference: https://www.ssh.com/ssh/command/ We will now discuss passwordless authentication, which is secure and most commonly used for ssh authentication.","title":"SSH Command"},{"location":"level101/linux_basics/linux_server_administration/#passwordless-authentication-using-ssh","text":"Using this method, we can ssh into hosts without entering the password. This method is also useful when we want some scripts to perform ssh-related tasks. Passwordless authentication requires the use of a public and private key pair. As the name implies, the public key can be shared with anyone but the private key should be kept private. Let's not get into the details of how this authentication works. You can read more about it here. Steps for setting up passwordless authentication with a remote host: Generating public-private key pair If we already have a key pair stored in the ~/.ssh directory, we will not need to generate keys again. Install the openssh package, which contains all the commands related to ssh. Generate a key pair using the ssh-keygen command. One can choose the default values for all prompts. After running the ssh-keygen command successfully, we should see two keys present in the ~/.ssh directory. id_rsa is the private key and id_rsa.pub is the public key. Do note that the private key can only be read and modified by you. Transferring the public key to the remote host There are multiple ways to transfer the public key to the remote server. We will look at one of the most common ways of doing it, using the ssh-copy-id command. Install the openssh-clients package to use the ssh-copy-id command. Use the ssh-copy-id command to copy your public key to the remote host. Now, ssh into the remote host using password authentication. Our public key should be there in ~/.ssh/authorized_keys now. ~/.ssh/authorized_keys contains a list of public keys. 
The users associated with these public keys have ssh access to the remote host.","title":"Passwordless Authentication Using SSH"},{"location":"level101/linux_basics/linux_server_administration/#how-to-run-commands-on-a-remote-host","text":"General syntax: ssh \\@\\ \\","title":"How to run commands on a remote host ?"},{"location":"level101/linux_basics/linux_server_administration/#how-to-transfer-files-from-one-host-to-another-host","text":"General syntax: scp \\ \\","title":"How to transfer files from one host to another host ?"},{"location":"level101/linux_basics/linux_server_administration/#package-management","text":"Package management is the process of installing and managing software on the system. We can install the packages which we require from the Linux package distributor. Different distributors use different packaging systems. Packaging systems Distributions Debian style (.deb) Debian, Ubuntu Red Hat style (.rpm) Fedora, CentOS, Red Hat Enterprise Linux Popular Packaging Systems in Linux Command Description yum install \\ Installs a package on your system yum update \\ Updates a package to its latest available version yum remove \\ Removes a package from your system yum search \\ Searches for a particular keyword DNF is the successor to YUM which is now used in Fedora for installing and managing packages. DNF may replace YUM in the future on all RPM based Linux distributions. We did find an exact match for the keyword httpd when we searched using the yum search command. Let's now install the httpd package. 
After httpd is installed, we will use the yum remove command to remove httpd package.","title":"Package Management"},{"location":"level101/linux_basics/linux_server_administration/#process-management","text":"In this section, we will study about some useful commands that can be used to monitor the processes on Linux systems.","title":"Process Management"},{"location":"level101/linux_basics/linux_server_administration/#ps-process-status","text":"The ps command is used to know the information of a process or list of processes. If you get an error \"ps command not found\" while running ps command, do install procps package. ps without any arguments is not very useful. Let's try to list all the processes on the system by using the below command. Reference: https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux We can use an additional argument with ps command to list the information about the process with a specific process ID. We can use grep in combination with ps command to list only specific processes.","title":"ps (process status)"},{"location":"level101/linux_basics/linux_server_administration/#top","text":"The top command is used to show information about Linux processes running on the system in real time. It also shows a summary of the system information. For each process, top lists down the process ID, owner, priority, state, cpu utilization, memory utilization and much more information. It also lists down the memory utilization and cpu utilization of the system as a whole along with system uptime and cpu load average.","title":"top"},{"location":"level101/linux_basics/linux_server_administration/#memory-management","text":"In this section, we will study about some useful commands that can be used to view information about the system memory.","title":"Memory Management"},{"location":"level101/linux_basics/linux_server_administration/#free","text":"The free command is used to display the memory usage of the system. 
The command displays the total free and used space available in the RAM along with space occupied by the caches/buffers. free command by default shows the memory usage in kilobytes. We can use an additional argument to get the data in human-readable format.","title":"free"},{"location":"level101/linux_basics/linux_server_administration/#vmstat","text":"The vmstat command can be used to display the memory usage along with additional information about io and cpu usage.","title":"vmstat"},{"location":"level101/linux_basics/linux_server_administration/#checking-disk-space","text":"In this section, we will study about some useful commands that can be used to view disk space on Linux.","title":"Checking Disk Space"},{"location":"level101/linux_basics/linux_server_administration/#df-disk-free","text":"The df command is used to display the free and available space for each mounted file system.","title":"df (disk free)"},{"location":"level101/linux_basics/linux_server_administration/#du-disk-usage","text":"The du command is used to display disk usage of files and directories on the system. The below command can be used to display the top 5 largest directories in the root directory.","title":"du (disk usage)"},{"location":"level101/linux_basics/linux_server_administration/#daemons","text":"A computer program that runs as a background process is called a daemon. Traditionally, the name of daemon processes ended with d - sshd, httpd etc. We cannot interact with a daemon process as they run in the background. Services and daemons are used interchangeably most of the time.","title":"Daemons"},{"location":"level101/linux_basics/linux_server_administration/#systemd","text":"Systemd is a system and service manager for Linux operating systems. Systemd units are the building blocks of systemd. These units are represented by unit configuration files. 
The below examples shows the unit configuration files available at /usr/lib/systemd/system which are distributed by installed RPM packages. We are more interested in the configuration file that ends with service as these are service units.","title":"Systemd"},{"location":"level101/linux_basics/linux_server_administration/#managing-system-services","text":"Service units end with .service file extension. Systemctl command can be used to start/stop/restart the services managed by systemd. Command Description systemctl start name.service Starts a service systemctl stop name.service Stops a service systemctl restart name.service Restarts a service systemctl status name.service Check the status of a service systemctl reload name.service Reload the configuration of a service","title":"Managing System Services"},{"location":"level101/linux_basics/linux_server_administration/#logs","text":"In this section, we will talk about some important files and directories which can be very useful for viewing system logs and applications logs in Linux. These logs can be very useful when you are troubleshooting on the system.","title":"Logs"},{"location":"level101/linux_networking/conclusion/","text":"Conclusion With this we have traversed through the TCP/IP stack completely. We hope there will be a different perspective when one opens any website in the browser post the course. During the course we have also dissected what are common tasks in this pipeline which falls under the ambit of SRE. Post Training Exercises Setup own DNS resolver in the dev environment which acts as an authoritative DNS server for example.com and forwarder for other domains. Update resolv.conf to use the new DNS resolver running in localhost Set up a site dummy.example.com in localhost and run a webserver with a self signed certificate. 
Update the trusted CAs or pass self signed CA\u2019s public key as a parameter so that curl https://dummy.example.com -v works properly without self signed cert warning Update the routing table to use another host(container/VM) in the same network as a gateway for 8.8.8.8/32 and run ping 8.8.8.8. Do the packet capture on the new gateway to see L3 hop is working as expected(might need to disable icmp_redirect)","title":"Conclusion"},{"location":"level101/linux_networking/conclusion/#conclusion","text":"With this we have traversed through the TCP/IP stack completely. We hope there will be a different perspective when one opens any website in the browser post the course. During the course we have also dissected what are common tasks in this pipeline which falls under the ambit of SRE.","title":"Conclusion"},{"location":"level101/linux_networking/conclusion/#post-training-exercises","text":"Setup own DNS resolver in the dev environment which acts as an authoritative DNS server for example.com and forwarder for other domains. Update resolv.conf to use the new DNS resolver running in localhost Set up a site dummy.example.com in localhost and run a webserver with a self signed certificate. Update the trusted CAs or pass self signed CA\u2019s public key as a parameter so that curl https://dummy.example.com -v works properly without self signed cert warning Update the routing table to use another host(container/VM) in the same network as a gateway for 8.8.8.8/32 and run ping 8.8.8.8. Do the packet capture on the new gateway to see L3 hop is working as expected(might need to disable icmp_redirect)","title":"Post Training Exercises"},{"location":"level101/linux_networking/dns/","text":"DNS Domain Names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. 
When somebody tries to open www.linkedin.com in the browser, the browser tries to convert www.linkedin.com to an IP Address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this: ip, err = getIPAddress(domainName) if err: print(\"unknown host exception while trying to resolve: {}\".format(domainName)) Now let\u2019s try to understand what happens inside the getIPAddress function. The browser would have a DNS cache of its own where it checks if there is a mapping for the domainName to an IP Address already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the gethostbyname library function to ask the operating system to find the IP address for the given domainName def getIPAddress(domainName): resp, fail = lookupCache(domainName) if not fail: return resp, None else: resp, err = gethostbyname(domainName) if err: return None, err else: return resp, None Now let's understand what the operating system does when the gethostbyname function is called. The Linux operating system looks at the /etc/nsswitch.conf file, which usually has a line hosts: files dns This line means the OS has to look up first in files (/etc/hosts) and then use the DNS protocol to do the resolution if there is no match in /etc/hosts. The file /etc/hosts is of format IPAddress FQDN [FQDN].* 127.0.0.1 localhost.localdomain localhost ::1 localhost.localdomain localhost If a match exists for a domain in this file then that IP address is returned by the OS. Let's add a line to this file 127.0.0.1 test.linkedin.com And then do ping test.linkedin.com ping test.linkedin.com -n PING test.linkedin.com (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.047 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.037 ms As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. 
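The lookup order described above (hosts file first, DNS only when there is no match) can be sketched in Python. This is a simplified illustration, not how the system resolver is implemented; parse_hosts, lookup and SAMPLE_HOSTS are hypothetical names, and the sample text mirrors the entries shown in this section.

```python
SAMPLE_HOSTS = '''
# format: IPAddress FQDN [FQDN]...
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
127.0.0.1 test.linkedin.com
'''


def parse_hosts(text):
    # Build a mapping of hostname -> list of addresses from hosts-file text
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


def lookup(name, table):
    # None means no match in the hosts table; at that point a real
    # resolver would fall through to the DNS protocol
    return table.get(name.lower())


if __name__ == '__main__':
    hosts = parse_hosts(SAMPLE_HOSTS)
    print(lookup('test.linkedin.com', hosts))  # ['127.0.0.1']
    print(lookup('www.linkedin.com', hosts))   # None
```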
The Linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The DNS resolvers are populated by DHCP or statically configured by an administrator. dig is a userspace DNS lookup tool which creates and sends requests to DNS resolvers and prints the responses it receives to the console. #run this command in one shell to capture all DNS requests sudo tcpdump -s 0 -A -i any port 53 #make a dig request from another shell dig linkedin.com 13:19:54.432507 IP 172.19.209.122.56497 > 172.23.195.101.53: 527+ [1au] A? linkedin.com. (41) ....E..E....@.n....z...e...5.1.:... .........linkedin.com.......)........ 13:19:54.485131 IP 172.23.195.101.53 > 172.19.209.122.56497: 527 1/0/1 A 108.174.10.10 (57) ....E..U..@.|. ....e...z.5...A...............linkedin.com..............3..l. ..)........ The packet capture shows a request is made to 172.23.195.101:53 (this is the resolver in /etc/resolv.conf) for linkedin.com and a response is received from 172.23.195.101 with the IP address of linkedin.com 108.174.10.10 Now let's try to understand how the DNS resolver tries to find the IP address of linkedin.com. The DNS resolver first looks at its cache. Since many devices in the network can query for the domain name linkedin.com, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS server breaks \u201clinkedin.com\u201d to \u201c.\u201d, \u201ccom.\u201d and \u201clinkedin.com.\u201d and starts DNS resolution from \u201c.\u201d. The \u201c.\u201d is called root domain and those IPs are known to the DNS resolver software. The DNS resolver queries the root domain nameservers to find the right top-level domain (TLD) nameservers which could respond regarding details for \"com.\". The address of the TLD nameserver of \u201ccom.\u201d is returned. 
Now the DNS resolution service contacts the TLD nameserver for \u201ccom.\u201d to fetch the authoritative nameserver for \u201clinkedin.com\u201d. Once an authoritative nameserver of \u201clinkedin.com\u201d is known, the resolver contacts Linkedin\u2019s nameserver to provide the IP address of \u201clinkedin.com\u201d. This whole process can be visualized by running the following - dig +trace linkedin.com linkedin.com. 3600 IN A 108.174.10.10 This DNS response has 5 fields where the first field is the request and the last field is the response. The second field is the Time to Live which says how long the DNS response is valid in seconds. In this case this mapping of linkedin.com is valid for 1 hour. This is how the resolvers and application(browser) maintain their cache. Any request for linkedin.com beyond 1 hour will be treated as a cache miss as the mapping has expired its TTL and the whole process has to be redone. The 4th field says the type of DNS response/request. Some of the various DNS query types are A, AAAA, NS, TXT, PTR, MX and CNAME. A record returns IPV4 address of the domain name AAAA record returns the IPV6 address of the domain Name NS record returns the authoritative nameserver for the domain name CNAME records are aliases to the domain names. Some domains point to other domain names and resolving the latter domain name gives an IP which is used as an IP for the former domain name as well. Example www.linkedin.com\u2019s IP address is the same as 2-01-2c3e-005a.cdx.cedexis.net. For the brevity we are not discussing other DNS record types, the RFC of each of these records are available here . dig A linkedin.com +short 108.174.10.10 dig AAAA linkedin.com +short 2620:109:c002::6cae:a0a dig NS linkedin.com +short dns3.p09.nsone.net. dns4.p09.nsone.net. dns2.p09.nsone.net. ns4.p43.dynect.net. ns1.p43.dynect.net. ns2.p43.dynect.net. ns3.p43.dynect.net. dns1.p09.nsone.net. dig www.linkedin.com CNAME +short 2-01-2c3e-005a.cdx.cedexis.net. 
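To illustrate how a resolver or application might honour the TTL when caching answers, as described earlier in this section, here is a minimal Python sketch. TTLCache and its methods are hypothetical names used only for illustration; real resolvers are considerably more involved.

```python
import time


class TTLCache:
    def __init__(self, clock=time.monotonic):
        # clock is injectable so that expiry can be exercised without sleeping
        self._clock = clock
        self._entries = {}

    def put(self, name, address, ttl_seconds):
        # Remember the answer together with the moment it stops being valid
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: never seen
        address, expires_at = entry
        if self._clock() >= expires_at:
            # TTL elapsed: treat as a cache miss so resolution is redone
            del self._entries[name]
            return None
        return address


if __name__ == '__main__':
    cache = TTLCache()
    cache.put('linkedin.com', '108.174.10.10', 3600)
    print(cache.get('linkedin.com'))  # 108.174.10.10 (still within the TTL)
```

With a TTL of 3600 seconds, any lookup made an hour or more after the answer was stored is treated as a miss and the full resolution has to be repeated.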
Armed with these fundamentals of DNS lets see usecases where DNS is used by SREs. Applications in SRE role This section covers some of the common solutions SRE can derive from DNS Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like wiki. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn\u2019t become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and other cascading effects. DNS can also be used for discovering services. For example the hostname serviceb.internal.example.com could list instances which run service b internally in example.com company. Cloud providers provide options to enable DNS discovery( example ) DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, Load Balancers are given a CNAME instead of IPAddress. They update the IPAddress of the Loadbalancers as they scale by changing the IP Address of alias domain names. This is one of the reasons why A records of such alias domains are short lived like 1 minute. DNS can also be used to make clients get IP addresses closer to their location so that their HTTP calls can be responded faster if the company has a presence geographically distributed. SRE also has to understand since there is no verification in DNS infrastructure, these responses can be spoofed. This is safeguarded by other protocols like HTTPS(dealt later). DNSSEC protects from forged or manipulated DNS responses. Stale DNS cache can be a problem. Some apps might still be using expired DNS records for their api calls. This is something SRE has to be wary of when doing maintenance. 
DNS load balancing and service discovery also have to account for TTL: servers can be removed from the pool only after waiting for the TTL to elapse once the changes are made to the DNS records. If this is not done, a certain portion of the traffic will fail as the server is removed before the TTL.","title":"DNS"},{"location":"level101/linux_networking/dns/#dns","text":"Domain Names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. When somebody tries to open www.linkedin.com in the browser, the browser tries to convert www.linkedin.com to an IP Address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this: ip, err = getIPAddress(domainName) if err: print(\"unknown host exception while trying to resolve: {}\".format(domainName)) Now let\u2019s try to understand what happens inside the getIPAddress function. The browser would have a DNS cache of its own where it checks if there is a mapping for the domainName to an IP Address already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the gethostbyname library function to ask the operating system to find the IP address for the given domainName def getIPAddress(domainName): resp, fail = lookupCache(domainName) if not fail: return resp, None else: resp, err = gethostbyname(domainName) if err: return None, err else: return resp, None Now let's understand what the operating system does when the gethostbyname function is called. The Linux operating system looks at the /etc/nsswitch.conf file, which usually has a line hosts: files dns This line means the OS has to look up first in files (/etc/hosts) and then use the DNS protocol to do the resolution if there is no match in /etc/hosts. 
The file /etc/hosts is of format IPAddress FQDN [FQDN].* 127.0.0.1 localhost.localdomain localhost ::1 localhost.localdomain localhost If a match exists for a domain in this file then that IP address is returned by the OS. Let's add a line to this file 127.0.0.1 test.linkedin.com And then do ping test.linkedin.com ping test.linkedin.com -n PING test.linkedin.com (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.047 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.037 ms As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. The Linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The DNS resolvers are populated by DHCP or statically configured by an administrator. dig is a userspace DNS lookup tool which creates and sends requests to DNS resolvers and prints the responses it receives to the console. #run this command in one shell to capture all DNS requests sudo tcpdump -s 0 -A -i any port 53 #make a dig request from another shell dig linkedin.com 13:19:54.432507 IP 172.19.209.122.56497 > 172.23.195.101.53: 527+ [1au] A? linkedin.com. (41) ....E..E....@.n....z...e...5.1.:... .........linkedin.com.......)........ 13:19:54.485131 IP 172.23.195.101.53 > 172.19.209.122.56497: 527 1/0/1 A 108.174.10.10 (57) ....E..U..@.|. ....e...z.5...A...............linkedin.com..............3..l. ..)........ The packet capture shows a request is made to 172.23.195.101:53 (this is the resolver in /etc/resolv.conf) for linkedin.com and a response is received from 172.23.195.101 with the IP address of linkedin.com 108.174.10.10 Now let's try to understand how the DNS resolver tries to find the IP address of linkedin.com. The DNS resolver first looks at its cache. 
Since many devices in the network can query for the domain name linkedin.com, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS server breaks \u201clinkedin.com\u201d to \u201c.\u201d, \u201ccom.\u201d and \u201clinkedin.com.\u201d and starts DNS resolution from \u201c.\u201d. The \u201c.\u201d is called root domain and those IPs are known to the DNS resolver software. DNS resolver queries the root domain nameservers to find the right top-level domain (TLD) nameservers which could respond regarding details for \"com.\". The address of the TLD nameserver of \u201ccom.\u201d is returned. Now the DNS resolution service contacts the TLD nameserver for \u201ccom.\u201d to fetch the authoritative nameserver for \u201clinkedin.com\u201d. Once an authoritative nameserver of \u201clinkedin.com\u201d is known, the resolver contacts Linkedin\u2019s nameserver to provide the IP address of \u201clinkedin.com\u201d. This whole process can be visualized by running the following - dig +trace linkedin.com linkedin.com. 3600 IN A 108.174.10.10 This DNS response has 5 fields where the first field is the request and the last field is the response. The second field is the Time to Live which says how long the DNS response is valid in seconds. In this case this mapping of linkedin.com is valid for 1 hour. This is how the resolvers and application(browser) maintain their cache. Any request for linkedin.com beyond 1 hour will be treated as a cache miss as the mapping has expired its TTL and the whole process has to be redone. The 4th field says the type of DNS response/request. Some of the various DNS query types are A, AAAA, NS, TXT, PTR, MX and CNAME. A record returns IPV4 address of the domain name AAAA record returns the IPV6 address of the domain Name NS record returns the authoritative nameserver for the domain name CNAME records are aliases to the domain names. 
Some domains point to other domain names, and resolving the latter domain name gives an IP which is used as the IP for the former domain name as well. For example, www.linkedin.com\u2019s IP address is the same as that of 2-01-2c3e-005a.cdx.cedexis.net. For brevity we are not discussing other DNS record types; the RFCs for each of these records are available here. dig A linkedin.com +short 108.174.10.10 dig AAAA linkedin.com +short 2620:109:c002::6cae:a0a dig NS linkedin.com +short dns3.p09.nsone.net. dns4.p09.nsone.net. dns2.p09.nsone.net. ns4.p43.dynect.net. ns1.p43.dynect.net. ns2.p43.dynect.net. ns3.p43.dynect.net. dns1.p09.nsone.net. dig www.linkedin.com CNAME +short 2-01-2c3e-005a.cdx.cedexis.net. Armed with these fundamentals of DNS, let's see use cases where DNS is used by SREs.","title":"DNS"},{"location":"level101/linux_networking/dns/#applications-in-sre-role","text":"This section covers some of the common solutions SREs can derive from DNS. Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like wikis. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn\u2019t become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and other cascading effects. DNS can also be used for discovering services. For example, the hostname serviceb.internal.example.com could list the instances which run service b internally in the example.com company. Cloud providers provide options to enable DNS discovery (example). DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, load balancers are given a CNAME instead of an IP address. The providers update the IP addresses of the load balancers as they scale by changing the IP addresses of the alias domain names. 
This is one of the reasons why A records of such alias domains are short-lived, with TTLs as low as 1 minute. DNS can also be used to hand clients IP addresses closer to their location so that their HTTP calls are answered faster, provided the company has a geographically distributed presence. SREs also have to understand that since plain DNS does not verify responses, these responses can be spoofed. This is safeguarded by other protocols like HTTPS (dealt with later). DNSSEC protects against forged or manipulated DNS responses. Stale DNS cache can be a problem. Some apps might still be using expired DNS records for their API calls. This is something SREs have to be wary of when doing maintenance. DNS load balancing and service discovery also have to account for TTL: servers can be removed from the pool only after waiting for the TTL to elapse once the DNS records are changed. If this is not done, a certain portion of the traffic will fail as the server is removed before the cached records expire.","title":"Applications in SRE role"},{"location":"level101/linux_networking/http/","text":"HTTP Till this point we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served by HTTP protocol which the browser renders. Browser sends a HTTP request to the IP of the server determined above. 
Request has a verb GET, PUT, POST followed by a path and query parameters and lines of key value pair which gives information about the client and capabilities of the client like contents it can accept and a body (usually in POST or PUT) # Eg run the following in your container and have a look at the headers curl linkedin.com -v * Connected to linkedin.com (108.174.10.10) port 80 (#0) > GET / HTTP/1.1 > Host: linkedin.com > User-Agent: curl/7.64.1 > Accept: */* > < HTTP/1.1 301 Moved Permanently < Date: Mon, 09 Nov 2020 10:39:43 GMT < X-Li-Pop: prod-esv5 < X-LI-Proto: http/1.1 < Location: https://www.linkedin.com/ < Content-Length: 0 < * Connection #0 to host linkedin.com left intact * Closing connection 0 Here, in the first line GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key value pairs which give client capabilities and some details to the server. The server responds back with HTTP version, Status Code and Status message . Status codes 2xx means success, 3xx denotes redirection, 4xx denotes client side errors and 5xx server side errors. We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1. #On the terminal type telnet www.linkedin.com 80 #Copy and paste the following with an empty new line at last in the telnet STDIN GET / HTTP/1.1 HOST:linkedin.com USER-AGENT: curl This would get server response and waits for next input as the underlying connection to www.linkedin.com can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0 this connection will be immediately closed after the response meaning new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection but connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is we can have multiple inflight requests on the same connection. 
We are restricting our scope to generic HTTP and not jumping into the intricacies of each protocol version, but they should be straightforward to understand after the course. HTTP is called a stateless protocol. In this section we will try to understand what stateless means. Say we logged in to linkedin.com: each request to linkedin.com from the client carries no context of the user, and it makes no sense to prompt the user to log in for each page/resource. This problem of HTTP is solved by COOKIE. A session is created for the user when the user logs in. This session identifier is sent to the browser via the SET-COOKIE header. The browser stores the COOKIE till the expiry set by the server and sends the cookie with each request to linkedin.com from then on. More details on cookies are available here. Cookies are a critical piece of information, like a password, and since HTTP is a plain text protocol, any man in the middle can capture either the password or the cookies and breach the privacy of the user. Similarly, as discussed during DNS, a spoofed IP of linkedin.com can cause a phishing attack where a user gives LinkedIn\u2019s password to log in on the malicious site. To solve both problems HTTPS came into place, and HTTPS has to be mandated. HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private/public key pair and a certificate request. This certificate request has to be signed by a certificate authority, which converts the certificate request to a certificate. The server administrator has to install the certificate and private key on the webserver. The certificate has details about the server (like the domain name it serves and the expiry date) and the public key of the server. The private key is a secret of the server, and losing the private key loses the trust the server provides. When a client connects, it sends a HELLO. The server sends its certificate to the client. 
The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetrical key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody else other than the server who has the private key can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption which can be decrypted only by client and server from thereon as they only know the symmetric key and algorithm. The switch to symmetric algorithm from asymmetric encryption algorithm is to not strain the resources of client devices as symmetric encryption is generally less resource intensive than asymmetric. #Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date curl https://www.linkedin.com -v * Connected to www.linkedin.com (13.107.42.14) port 443 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/ssl/cert.pem CApath: none * TLSv1.2 (OUT), TLS handshake, Client hello (1): } [230 bytes data] * TLSv1.2 (IN), TLS handshake, Server hello (2): { [90 bytes data] * TLSv1.2 (IN), TLS handshake, Certificate (11): { [3171 bytes data] * TLSv1.2 (IN), TLS handshake, Server key exchange (12): { [365 bytes data] * TLSv1.2 (IN), TLS handshake, Server finished (14): { [4 bytes data] * TLSv1.2 (OUT), TLS handshake, Client key exchange (16): } [102 bytes data] * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.2 (OUT), TLS handshake, Finished (20): } [16 bytes data] * TLSv1.2 (IN), TLS change cipher, Change cipher spec (1): { [1 bytes data] * TLSv1.2 (IN), TLS handshake, Finished (20): { [16 bytes data] * SSL connection using 
TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 * ALPN, server accepted to use h2 * Server certificate: * subject: C=US; ST=California; L=Sunnyvale; O=LinkedIn Corporation; CN=www.linkedin.com * start date: Oct 2 00:00:00 2020 GMT * expire date: Apr 2 12:00:00 2021 GMT * subjectAltName: host \"www.linkedin.com\" matched cert's \"www.linkedin.com\" * issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA * SSL certificate verify ok. * Using HTTP2, server supports multi-use * Connection state changed (HTTP/2 confirmed) * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 * Using Stream ID: 1 (easy handle 0x7fb055808200) * Connection state changed (MAX_CONCURRENT_STREAMS == 100)! 0 82117 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 * Connection #0 to host www.linkedin.com left intact HTTP/2 200 cache-control: no-cache, no-store pragma: no-cache content-length: 82117 content-type: text/html; charset=utf-8 expires: Thu, 01 Jan 1970 00:00:00 GMT set-cookie: JSESSIONID=ajax:2747059799136291014; SameSite=None; Path=/; Domain=.www.linkedin.com; Secure set-cookie: lang=v=2&lang=en-us; SameSite=None; Path=/; Domain=linkedin.com; Secure set-cookie: bcookie=\"v=2&70bd59e3-5a51-406c-8e0d-dd70befa8890\"; domain=.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; SameSite=None set-cookie: bscookie=\"v=1&202011091050107ae9b7ac-fe97-40fc-830d-d7a9ccf80659AQGib5iXwarbY8CCBP94Q39THkgUlx6J\"; domain=.www.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; HttpOnly; SameSite=None set-cookie: lissc=1; domain=.linkedin.com; Path=/; Secure; Expires=Tue, 09-Nov-2021 10:50:10 GMT; SameSite=None set-cookie: lidc=\"b=VGST04:s=V:r=V:g=2201:u=1:i=1604919010:t=1605005410:v=1:sig=AQHe-KzU8i_5Iy6MwnFEsgRct3c9Lh5R\"; Expires=Tue, 10 Nov 2020 10:50:10 GMT; domain=.linkedin.com; Path=/; SameSite=None; Secure x-fs-txn-id: 2b8d5409ba70 x-fs-uuid: 61bbf94956d14516302567fc882b0000 expect-ct: max-age=86400, 
report-uri=\"https://www.linkedin.com/platform-telemetry/ct\" x-xss-protection: 1; mode=block content-security-policy-report-only: default-src 'none'; connect-src 'self' www.linkedin.com www.google-analytics.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://linkedin.sc.omtrdc.net/b/ss/ static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; script-src 'sha256-THuVhwbXPeTR0HszASqMOnIyxqEgvGyBwSPBKBF/iMc=' 'sha256-PyCXNcEkzRWqbiNr087fizmiBBrq9O6GGD8eV3P09Ik=' 'sha256-2SQ55Erm3CPCb+k03EpNxU9bdV3XL9TnVTriDs7INZ4=' 'sha256-S/KSPe186K/1B0JEjbIXcCdpB97krdzX05S+dHnQjUs=' platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'self' 'unsafe-inline' static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; media-src dms.licdn.com; child-src blob: *; frame-src 'self' lnkd.demdex.net linkedin.cdn.qualaroo.com; manifest-src 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=g content-security-policy: default-src *; connect-src 'self' https://media-src.linkedin.com/media/ www.linkedin.com s.c.lnkd.licdn.com m.c.lnkd.licdn.com s.c.exp1.licdn.com s.c.exp2.licdn.com m.c.exp1.licdn.com m.c.exp2.licdn.com wss://*.linkedin.com dms.licdn.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://accounts.google.com/gsi/status https://linkedin.sc.omtrdc.net/b/ss/ www.google-analytics.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com media.licdn.com media-exp1.licdn.com media-exp2.licdn.com media-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'unsafe-inline' 'self' static-src.linkedin.com *.licdn.com; script-src 'report-sample' 'unsafe-inline' 'unsafe-eval' 'self' spdy.linkedin.com static-src.linkedin.com *.ads.linkedin.com *.licdn.com static.chartbeat.com 
www.google-analytics.com ssl.google-analytics.com bcvipva02.rightnowtech.com www.bizographics.com sjs.bizographics.com js.bizographics.com d.la4-c1-was.salesforceliveagent.com slideshare.www.linkedin.com https://snap.licdn.com/li.lms-analytics/ platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com; object-src 'none'; media-src blob: *; child-src blob: lnkd-communities: voyager: *; frame-ancestors 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=l x-frame-options: sameorigin x-content-type-options: nosniff strict-transport-security: max-age=2592000 x-li-fabric: prod-lva1 x-li-pop: afd-prod-lva1 x-li-proto: http/2 x-li-uuid: Ybv5SVbRRRYwJWf8iCsAAA== x-msedge-ref: Ref A: CFB9AC1D2B0645DDB161CEE4A4909AEF Ref B: BOM02EDGE0712 Ref C: 2020-11-09T10:50:10Z date: Mon, 09 Nov 2020 10:50:10 GMT * Closing connection 0 Here my system has a list of certificate authorities it trusts in this file /etc/ssl/cert.pem. Curl validates the certificate is for www.linkedin.com by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by seeing the expire date. It also validates the signature on the certificate by using the public key of issuer Digicert in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com it negotiates cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer including first HTTP request uses the same cipher and symmetric key.","title":"HTTP"},{"location":"level101/linux_networking/http/#http","text":"Till this point we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served by HTTP protocol which the browser renders. Browser sends a HTTP request to the IP of the server determined above. 
Request has a verb GET, PUT, POST followed by a path and query parameters and lines of key value pair which gives information about the client and capabilities of the client like contents it can accept and a body (usually in POST or PUT) # Eg run the following in your container and have a look at the headers curl linkedin.com -v * Connected to linkedin.com (108.174.10.10) port 80 (#0) > GET / HTTP/1.1 > Host: linkedin.com > User-Agent: curl/7.64.1 > Accept: */* > < HTTP/1.1 301 Moved Permanently < Date: Mon, 09 Nov 2020 10:39:43 GMT < X-Li-Pop: prod-esv5 < X-LI-Proto: http/1.1 < Location: https://www.linkedin.com/ < Content-Length: 0 < * Connection #0 to host linkedin.com left intact * Closing connection 0 Here, in the first line GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key value pairs which give client capabilities and some details to the server. The server responds back with HTTP version, Status Code and Status message . Status codes 2xx means success, 3xx denotes redirection, 4xx denotes client side errors and 5xx server side errors. We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1. #On the terminal type telnet www.linkedin.com 80 #Copy and paste the following with an empty new line at last in the telnet STDIN GET / HTTP/1.1 HOST:linkedin.com USER-AGENT: curl This would get server response and waits for next input as the underlying connection to www.linkedin.com can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0 this connection will be immediately closed after the response meaning new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection but connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is we can have multiple inflight requests on the same connection. 
We are restricting our scope to generic HTTP and not jumping into the intricacies of each protocol version, but they should be straightforward to understand after the course. HTTP is called a stateless protocol. In this section we will try to understand what stateless means. Say we logged in to linkedin.com: each request to linkedin.com from the client carries no context of the user, and it makes no sense to prompt the user to log in for each page/resource. This problem of HTTP is solved by COOKIE. A session is created for the user when the user logs in. This session identifier is sent to the browser via the SET-COOKIE header. The browser stores the COOKIE till the expiry set by the server and sends the cookie with each request to linkedin.com from then on. More details on cookies are available here. Cookies are a critical piece of information, like a password, and since HTTP is a plain text protocol, any man in the middle can capture either the password or the cookies and breach the privacy of the user. Similarly, as discussed during DNS, a spoofed IP of linkedin.com can cause a phishing attack where a user gives LinkedIn\u2019s password to log in on the malicious site. To solve both problems HTTPS came into place, and HTTPS has to be mandated. HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private/public key pair and a certificate request. This certificate request has to be signed by a certificate authority, which converts the certificate request to a certificate. The server administrator has to install the certificate and private key on the webserver. The certificate has details about the server (like the domain name it serves and the expiry date) and the public key of the server. The private key is a secret of the server, and losing the private key loses the trust the server provides. When a client connects, it sends a HELLO. The server sends its certificate to the client. 
The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetrical key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody else other than the server who has the private key can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption which can be decrypted only by client and server from thereon as they only know the symmetric key and algorithm. The switch to symmetric algorithm from asymmetric encryption algorithm is to not strain the resources of client devices as symmetric encryption is generally less resource intensive than asymmetric. #Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date curl https://www.linkedin.com -v * Connected to www.linkedin.com (13.107.42.14) port 443 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/ssl/cert.pem CApath: none * TLSv1.2 (OUT), TLS handshake, Client hello (1): } [230 bytes data] * TLSv1.2 (IN), TLS handshake, Server hello (2): { [90 bytes data] * TLSv1.2 (IN), TLS handshake, Certificate (11): { [3171 bytes data] * TLSv1.2 (IN), TLS handshake, Server key exchange (12): { [365 bytes data] * TLSv1.2 (IN), TLS handshake, Server finished (14): { [4 bytes data] * TLSv1.2 (OUT), TLS handshake, Client key exchange (16): } [102 bytes data] * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.2 (OUT), TLS handshake, Finished (20): } [16 bytes data] * TLSv1.2 (IN), TLS change cipher, Change cipher spec (1): { [1 bytes data] * TLSv1.2 (IN), TLS handshake, Finished (20): { [16 bytes data] * SSL connection using 
TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 * ALPN, server accepted to use h2 * Server certificate: * subject: C=US; ST=California; L=Sunnyvale; O=LinkedIn Corporation; CN=www.linkedin.com * start date: Oct 2 00:00:00 2020 GMT * expire date: Apr 2 12:00:00 2021 GMT * subjectAltName: host \"www.linkedin.com\" matched cert's \"www.linkedin.com\" * issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA * SSL certificate verify ok. * Using HTTP2, server supports multi-use * Connection state changed (HTTP/2 confirmed) * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 * Using Stream ID: 1 (easy handle 0x7fb055808200) * Connection state changed (MAX_CONCURRENT_STREAMS == 100)! 0 82117 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 * Connection #0 to host www.linkedin.com left intact HTTP/2 200 cache-control: no-cache, no-store pragma: no-cache content-length: 82117 content-type: text/html; charset=utf-8 expires: Thu, 01 Jan 1970 00:00:00 GMT set-cookie: JSESSIONID=ajax:2747059799136291014; SameSite=None; Path=/; Domain=.www.linkedin.com; Secure set-cookie: lang=v=2&lang=en-us; SameSite=None; Path=/; Domain=linkedin.com; Secure set-cookie: bcookie=\"v=2&70bd59e3-5a51-406c-8e0d-dd70befa8890\"; domain=.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; SameSite=None set-cookie: bscookie=\"v=1&202011091050107ae9b7ac-fe97-40fc-830d-d7a9ccf80659AQGib5iXwarbY8CCBP94Q39THkgUlx6J\"; domain=.www.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; HttpOnly; SameSite=None set-cookie: lissc=1; domain=.linkedin.com; Path=/; Secure; Expires=Tue, 09-Nov-2021 10:50:10 GMT; SameSite=None set-cookie: lidc=\"b=VGST04:s=V:r=V:g=2201:u=1:i=1604919010:t=1605005410:v=1:sig=AQHe-KzU8i_5Iy6MwnFEsgRct3c9Lh5R\"; Expires=Tue, 10 Nov 2020 10:50:10 GMT; domain=.linkedin.com; Path=/; SameSite=None; Secure x-fs-txn-id: 2b8d5409ba70 x-fs-uuid: 61bbf94956d14516302567fc882b0000 expect-ct: max-age=86400, 
report-uri=\"https://www.linkedin.com/platform-telemetry/ct\" x-xss-protection: 1; mode=block content-security-policy-report-only: default-src 'none'; connect-src 'self' www.linkedin.com www.google-analytics.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://linkedin.sc.omtrdc.net/b/ss/ static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; script-src 'sha256-THuVhwbXPeTR0HszASqMOnIyxqEgvGyBwSPBKBF/iMc=' 'sha256-PyCXNcEkzRWqbiNr087fizmiBBrq9O6GGD8eV3P09Ik=' 'sha256-2SQ55Erm3CPCb+k03EpNxU9bdV3XL9TnVTriDs7INZ4=' 'sha256-S/KSPe186K/1B0JEjbIXcCdpB97krdzX05S+dHnQjUs=' platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'self' 'unsafe-inline' static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; media-src dms.licdn.com; child-src blob: *; frame-src 'self' lnkd.demdex.net linkedin.cdn.qualaroo.com; manifest-src 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=g content-security-policy: default-src *; connect-src 'self' https://media-src.linkedin.com/media/ www.linkedin.com s.c.lnkd.licdn.com m.c.lnkd.licdn.com s.c.exp1.licdn.com s.c.exp2.licdn.com m.c.exp1.licdn.com m.c.exp2.licdn.com wss://*.linkedin.com dms.licdn.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://accounts.google.com/gsi/status https://linkedin.sc.omtrdc.net/b/ss/ www.google-analytics.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com media.licdn.com media-exp1.licdn.com media-exp2.licdn.com media-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'unsafe-inline' 'self' static-src.linkedin.com *.licdn.com; script-src 'report-sample' 'unsafe-inline' 'unsafe-eval' 'self' spdy.linkedin.com static-src.linkedin.com *.ads.linkedin.com *.licdn.com static.chartbeat.com 
www.google-analytics.com ssl.google-analytics.com bcvipva02.rightnowtech.com www.bizographics.com sjs.bizographics.com js.bizographics.com d.la4-c1-was.salesforceliveagent.com slideshare.www.linkedin.com https://snap.licdn.com/li.lms-analytics/ platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com; object-src 'none'; media-src blob: *; child-src blob: lnkd-communities: voyager: *; frame-ancestors 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=l x-frame-options: sameorigin x-content-type-options: nosniff strict-transport-security: max-age=2592000 x-li-fabric: prod-lva1 x-li-pop: afd-prod-lva1 x-li-proto: http/2 x-li-uuid: Ybv5SVbRRRYwJWf8iCsAAA== x-msedge-ref: Ref A: CFB9AC1D2B0645DDB161CEE4A4909AEF Ref B: BOM02EDGE0712 Ref C: 2020-11-09T10:50:10Z date: Mon, 09 Nov 2020 10:50:10 GMT * Closing connection 0 Here my system has a list of certificate authorities it trusts in this file /etc/ssl/cert.pem. Curl validates the certificate is for www.linkedin.com by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by seeing the expire date. It also validates the signature on the certificate by using the public key of issuer Digicert in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com it negotiates cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer including first HTTP request uses the same cipher and symmetric key.","title":"HTTP"},{"location":"level101/linux_networking/intro/","text":"Linux Networking Fundamentals Prerequisites High-level knowledge of commonly used jargon in TCP/IP stack like DNS, TCP, UDP and HTTP Linux Commandline Basics What to expect from this course Throughout the course, we cover how an SRE can optimize the system to improve their web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. 
This course tries to dig through each layer of traditional TCP/IP stack and expects an SRE to have a picture beyond the bird\u2019s eye view of the functioning of the Internet. What is not covered under this course This course spends time on the fundamentals. We are not covering concepts like HTTP/2.0 , QUIC , TCP congestion control protocols , Anycast , BGP , CDN , Tunnels and Multicast . We expect that this course will provide the relevant basics to understand such concepts Birds eye view of the course The course covers the question \u201cWhat happens when you open linkedin.com in your browser?\u201d The course follows the flow of TCP/IP stack.More specifically, the course covers topics of Application layer protocols DNS and HTTP, transport layer protocols UDP and TCP, networking layer protocol IP and Data Link Layer protocol Course Contents DNS UDP HTTP TCP IP Routing","title":"Introduction"},{"location":"level101/linux_networking/intro/#linux-networking-fundamentals","text":"","title":"Linux Networking Fundamentals"},{"location":"level101/linux_networking/intro/#prerequisites","text":"High-level knowledge of commonly used jargon in TCP/IP stack like DNS, TCP, UDP and HTTP Linux Commandline Basics","title":"Prerequisites"},{"location":"level101/linux_networking/intro/#what-to-expect-from-this-course","text":"Throughout the course, we cover how an SRE can optimize the system to improve their web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. This course tries to dig through each layer of traditional TCP/IP stack and expects an SRE to have a picture beyond the bird\u2019s eye view of the functioning of the Internet.","title":"What to expect from this course"},{"location":"level101/linux_networking/intro/#what-is-not-covered-under-this-course","text":"This course spends time on the fundamentals. 
We are not covering concepts like HTTP/2.0, QUIC, TCP congestion control protocols, Anycast, BGP, CDN, Tunnels and Multicast. We expect that this course will provide the relevant basics to understand such concepts","title":"What is not covered under this course"},{"location":"level101/linux_networking/intro/#birds-eye-view-of-the-course","text":"The course covers the question \u201cWhat happens when you open linkedin.com in your browser?\u201d The course follows the flow of the TCP/IP stack. More specifically, the course covers the application layer protocols DNS and HTTP, the transport layer protocols UDP and TCP, the network layer protocol IP and the Data Link Layer protocol","title":"Birds eye view of the course"},{"location":"level101/linux_networking/intro/#course-contents","text":"DNS UDP HTTP TCP IP Routing","title":"Course Contents"},{"location":"level101/linux_networking/ipr/","text":"IP Routing and Data Link Layer We will dig into how packets that leave the client reach the server and vice versa. By the time the packet reaches the IP layer, the transport layer has populated the source port and destination port. The IP/Network layer populates the destination IP (discovered from DNS) and then looks up the route to the destination IP in the routing table. #Linux route -n command gives the default routing table route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.17.0.1 0.0.0.0 UG 0 0 0 eth0 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 Here the destination IP is bitwise AND\u2019d with the Genmask, and if the answer equals the Destination column of that row then that row's gateway and interface are picked for routing. Here linkedin.com\u2019s IP 108.174.10.10 is AND\u2019d with 255.255.0.0 and the answer we get is 108.174.0.0, which doesn\u2019t match the destination 172.17.0.0 of that row. Then Linux does an AND of the destination IP with 0.0.0.0 and we get 0.0.0.0. 
This answer matches the default row. The routing table is processed in order of the most specific genmask first (the one with the most 1 bits set), and genmask 0.0.0.0 is the default route used if nothing else matches. At the end of this operation Linux has figured out that the packet has to be sent to the next hop 172.17.0.1 via eth0. The source IP of the packet will be set to the IP of interface eth0. Now, to send the packet to 172.17.0.1, Linux has to figure out the MAC address of 172.17.0.1. The MAC address is found by looking at the internal ARP cache, which stores translations between IP addresses and MAC addresses. If there is a cache miss, Linux broadcasts an ARP request within the internal network asking who has 172.17.0.1. The owner of the IP sends an ARP response, which is cached by the kernel, and the kernel sends the packet to the gateway by setting the source MAC address to the MAC address of eth0 and the destination MAC address to that of 172.17.0.1, which we got just now. A similar routing lookup process is followed at each hop till the packet reaches the actual server. The transport layer and the layers above it come into play only at the end servers. At intermediate hops only the layers up to the IP/Network layer are involved. One weird gateway we saw in the routing table is 0.0.0.0. This gateway means no Layer 3 (Network layer) hop is needed to send the packet. Both source and destination are in the same network. The kernel has to figure out the MAC address of the destination, populate the source and destination MAC addresses appropriately and send the packet out so that it reaches the destination without any Layer 3 hop in the middle. As in other modules, let's complete this session with SRE use cases. Applications in SRE role Generally the routing table is populated by DHCP and playing around with it is not a good practice. 
There can be reasons where one has to play around with the routing table, but take that path only when it's absolutely necessary. Understanding error messages better: for example, a \u201cNo route to host\u201d error can mean that the MAC address of the destination host was not found, which can mean the destination host is down. In rare cases, looking at the ARP table can help us understand if there is an IP conflict where the same IP is assigned to two hosts by mistake and this is causing unexpected behavior","title":"Routing"},{"location":"level101/linux_networking/ipr/#ip-routing-and-data-link-layer","text":"We will dig into how packets that leave the client reach the server and vice versa. By the time the packet reaches the IP layer, the transport layer has populated the source and destination ports. The IP/Network layer populates the destination IP (discovered from DNS) and then looks up the route to the destination IP in the routing table. #Linux route -n command gives the default routing table route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.17.0.1 0.0.0.0 UG 0 0 0 eth0 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 Here the destination IP is bitwise AND\u2019d with the Genmask, and if the result equals the Destination column of a row, that row\u2019s gateway and interface are picked for routing. Here linkedin.com\u2019s IP 108.174.10.10 is AND\u2019d with 255.255.0.0 and the answer we get is 108.174.0.0, which doesn\u2019t match the destination 172.17.0.0 of that row. Then Linux does an AND of destination IP with 0.0.0.0 and we get 0.0.0.0. This answer matches the default row. The routing table is processed starting from the most specific genmask (the most bits set to 1), and genmask 0.0.0.0 is the default route if nothing else matches. At the end of this operation Linux has figured out that the packet has to be sent to next hop 172.17.0.1 via eth0. The source IP of the packet will be set as the IP of interface eth0. Now to send the packet to 172.17.0.1 Linux has to figure out the MAC address of 172.17.0.1. 
The MAC address is found by looking at the internal ARP cache, which stores translations between IP addresses and MAC addresses. If there is a cache miss, Linux broadcasts an ARP request within the internal network asking who has 172.17.0.1. The owner of the IP sends an ARP response which is cached by the kernel, and the kernel sends the packet to the gateway by setting the source MAC address to the MAC address of eth0 and the destination MAC address to that of 172.17.0.1, which we got just now. A similar routing lookup process is followed at each hop till the packet reaches the actual server. The transport layer and the layers above it come into play only at end servers. During intermediate hops, only the layers up to the IP/Network layer are involved. One weird gateway we saw in the routing table is 0.0.0.0. This gateway means no Layer 3 (Network layer) hop is needed to send the packet. Both source and destination are in the same network. The kernel has to figure out the MAC of the destination, populate the source and destination MAC appropriately and send the packet out so that it reaches the destination without any Layer 3 hop in the middle. As we followed in other modules, let's complete this session with SRE use cases","title":"IP Routing and Data Link Layer"},{"location":"level101/linux_networking/ipr/#applications-in-sre-role","text":"Generally the routing table is populated by DHCP and playing around with it is not a good practice. 
There can be reasons where one has to play around with the routing table, but take that path only when it's absolutely necessary. Understanding error messages better: for example, a \u201cNo route to host\u201d error can mean that the MAC address of the destination host was not found, which can mean the destination host is down. In rare cases, looking at the ARP table can help us understand if there is an IP conflict where the same IP is assigned to two hosts by mistake and this is causing unexpected behavior","title":"Applications in SRE role"},{"location":"level101/linux_networking/tcp/","text":"TCP TCP is a transport layer protocol like UDP, but it also provides reliability, flow control and congestion control. TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three-way handshake. In our case, the client sends a SYN packet along with the starting sequence number it plans to use; the server acknowledges the SYN packet and sends a SYN with its own sequence number. Once the client acknowledges the server\u2019s SYN packet, the connection is established. Each piece of data transferred from here on is considered delivered reliably once the acknowledgement for that sequence is received by the concerned party. #To understand handshake run packet capture on one bash session tcpdump -S -i any port 80 #Run curl on another bash session curl www.linkedin.com Here the client sends a SYN, shown by the [S] flag, with sequence number 1522264672. The server acknowledges receipt of the SYN with an ACK [.] flag and sends a SYN [S] flag for its own sequence number. The server uses the sequence number 1063230400 and tells the client it\u2019s expecting sequence number 1522264673 (client sequence+1). The client sends a zero-length acknowledgement packet to the server (server sequence+1) and the connection stands established. This is called the three-way handshake. The client sends a 76-byte packet after this and increments its sequence number by 76. The server sends a 170-byte response and closes the connection. 
This was the difference we were talking about between HTTP/1.1 and HTTP/1.0. In HTTP/1.1 this same connection can be reused, which avoids the overhead of a three-way handshake for each HTTP request. If a packet is lost between client and server, the server won\u2019t send an ACK to the client and the client will retry sending the packet till the ACK is received. This guarantees reliability. Flow control is established by the win size field in each segment. The win size gives the available TCP buffer length in the kernel which can be used to buffer received segments. A size of 0 means the receiver has a lot of backlog to clear from its socket buffer and the sender has to pause sending packets so that the receiver can catch up. This flow control protects against the slow receiver and fast sender problem. TCP also does congestion control, which determines how many segments can be in transit without an ACK. Linux provides us the ability to configure algorithms for congestion control, which we are not covering here. While closing a connection, the client or server calls the close syscall. Let's assume the client does that. The client\u2019s kernel will send a FIN packet to the server. The server\u2019s kernel can\u2019t close the connection till the close syscall is called by the server application. Once the server app calls close, the server also sends a FIN packet and the client enters the TIME_WAIT state for 2*MSL (twice the Maximum Segment Lifetime; 60s on Linux) so that this socket can\u2019t be reused for that time period, preventing any TCP state corruption due to stray stale packets. Armed with our TCP and HTTP knowledge, let's see how this is used by SREs in their role Applications in SRE role Scaling HTTP performance using load balancers needs consistent knowledge about both TCP and HTTP. There are different kinds of load balancing like L4, L7 load balancing, Direct Server Return etc. HTTPS offloading can be done on the load balancer or directly on servers based on the performance and compliance needs. 
Tweaking sysctl variables for rmem and wmem like we did for UDP can improve throughput of sender and receiver. Sysctl variables tcp_max_syn_backlog and somaxconn determine how many connections the kernel can complete the three-way handshake for before the app calls the accept syscall. This is very useful in single-threaded applications. Once the backlog is full, new connections stay in SYN_RCVD state (when you run netstat) till the application calls the accept syscall. Apps can run out of file descriptors if there are too many short-lived connections. Digging through tcp_tw_reuse and tcp_tw_recycle can help reduce time spent in the TIME_WAIT state (this has its own risks, and tcp_tw_recycle was removed in Linux 4.12). Making apps reuse a pool of connections instead of creating ad hoc connections can also help. Understanding performance bottlenecks by looking at metrics and classifying whether it's a problem on the app side or the network side. For example, too many sockets in the CLOSE_WAIT state is a problem in the application, whereas retransmissions are more likely a problem in the network or the OS stack than in the application itself. Understanding the fundamentals can help us narrow down where the bottleneck is","title":"TCP"},{"location":"level101/linux_networking/tcp/#tcp","text":"TCP is a transport layer protocol like UDP, but it also provides reliability, flow control and congestion control. TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three-way handshake. In our case, the client sends a SYN packet along with the starting sequence number it plans to use; the server acknowledges the SYN packet and sends a SYN with its own sequence number. Once the client acknowledges the server\u2019s SYN packet, the connection is established. 
Each piece of data transferred from here on is considered delivered reliably once the acknowledgement for that sequence is received by the concerned party. #To understand handshake run packet capture on one bash session tcpdump -S -i any port 80 #Run curl on another bash session curl www.linkedin.com Here the client sends a SYN, shown by the [S] flag, with sequence number 1522264672. The server acknowledges receipt of the SYN with an ACK [.] flag and sends a SYN [S] flag for its own sequence number. The server uses the sequence number 1063230400 and tells the client it\u2019s expecting sequence number 1522264673 (client sequence+1). The client sends a zero-length acknowledgement packet to the server (server sequence+1) and the connection stands established. This is called the three-way handshake. The client sends a 76-byte packet after this and increments its sequence number by 76. The server sends a 170-byte response and closes the connection. This was the difference we were talking about between HTTP/1.1 and HTTP/1.0. In HTTP/1.1 this same connection can be reused, which avoids the overhead of a three-way handshake for each HTTP request. If a packet is lost between client and server, the server won\u2019t send an ACK to the client and the client will retry sending the packet till the ACK is received. This guarantees reliability. Flow control is established by the win size field in each segment. The win size gives the available TCP buffer length in the kernel which can be used to buffer received segments. A size of 0 means the receiver has a lot of backlog to clear from its socket buffer and the sender has to pause sending packets so that the receiver can catch up. This flow control protects against the slow receiver and fast sender problem. TCP also does congestion control, which determines how many segments can be in transit without an ACK. Linux provides us the ability to configure algorithms for congestion control, which we are not covering here. While closing a connection, the client or server calls the close syscall. 
Let's assume the client does that. The client\u2019s kernel will send a FIN packet to the server. The server\u2019s kernel can\u2019t close the connection till the close syscall is called by the server application. Once the server app calls close, the server also sends a FIN packet and the client enters the TIME_WAIT state for 2*MSL (twice the Maximum Segment Lifetime; 60s on Linux) so that this socket can\u2019t be reused for that time period, preventing any TCP state corruption due to stray stale packets. Armed with our TCP and HTTP knowledge, let's see how this is used by SREs in their role","title":"TCP"},{"location":"level101/linux_networking/tcp/#applications-in-sre-role","text":"Scaling HTTP performance using load balancers needs consistent knowledge about both TCP and HTTP. There are different kinds of load balancing like L4, L7 load balancing, Direct Server Return etc. HTTPS offloading can be done on the load balancer or directly on servers based on the performance and compliance needs. Tweaking sysctl variables for rmem and wmem like we did for UDP can improve throughput of sender and receiver. Sysctl variables tcp_max_syn_backlog and somaxconn determine how many connections the kernel can complete the three-way handshake for before the app calls the accept syscall. This is very useful in single-threaded applications. Once the backlog is full, new connections stay in SYN_RCVD state (when you run netstat) till the application calls the accept syscall. Apps can run out of file descriptors if there are too many short-lived connections. Digging through tcp_tw_reuse and tcp_tw_recycle can help reduce time spent in the TIME_WAIT state (this has its own risks, and tcp_tw_recycle was removed in Linux 4.12). Making apps reuse a pool of connections instead of creating ad hoc connections can also help. Understanding performance bottlenecks by looking at metrics and classifying whether it's a problem on the app side or the network side. For example, too many sockets in the CLOSE_WAIT state is a problem in the application, whereas retransmissions are more likely a problem in the network or the OS stack than in the application itself. 
Understanding the fundamentals can help us narrow down where the bottleneck is","title":"Applications in SRE role"},{"location":"level101/linux_networking/udp/","text":"UDP UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP (most of the time). Before jumping into UDP, let's try to understand what the application and transport layers are. The DNS protocol is used by a DNS client (e.g. dig) and a DNS server (e.g. named). The transport layer makes sure the DNS request reaches the DNS server process and similarly the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any port. DNS servers usually listen on port number 53. When a client makes a DNS request, after filling in the necessary application payload, it passes the payload to the kernel via the sendto system call. The kernel picks a random port number ( >1024 ) as the source port number, puts 53 as the destination port number and sends the packet to lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process, which makes a recvfrom system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications to the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer. UDP is one of the simplest transport layer protocols and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control and congestion control. UDP is designed to be lightweight and handle communications with little overhead. So it doesn\u2019t do anything beyond multiplexing and demultiplexing. 
If applications running on top of UDP need any of the features of TCP, they have to implement that in their application. This example from python wiki covers a sample UDP client and server where \u201cHello World\u201d is an application payload sent to server listening on port number 5005. The server receives the packet and prints the \u201cHello World\u201d string from the client Applications in SRE role If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, sendto syscall from the application will hang till the kernel finds some of its buffer is freed. This can affect the throughput of the system. Increasing write memory buffer values using sysctl variables net.core.wmem_max and net.core.wmem_default provides some cushion to the application from the slow network Similarly if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can\u2019t queue due to the buffer being full. Since UDP doesn\u2019t guarantee reliability these dropped packets can cause data loss unless tracked by the application layer. Increasing sysctl variables rmem_default and rmem_max can provide some cushion to slow applications from fast senders.","title":"UDP"},{"location":"level101/linux_networking/udp/#udp","text":"UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP (most of the time). Before jumping into UDP, let's try to understand what the application and transport layers are. The DNS protocol is used by a DNS client (e.g. dig) and a DNS server (e.g. named). The transport layer makes sure the DNS request reaches the DNS server process and similarly the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any port. DNS servers usually listen on port number 53. When a client makes a DNS request, after filling in the necessary application payload, it passes the payload to the kernel via the sendto system call. 
The kernel picks a random port number ( >1024 ) as the source port number, puts 53 as the destination port number and sends the packet to lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process, which makes a recvfrom system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications to the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer. UDP is one of the simplest transport layer protocols and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control and congestion control. UDP is designed to be lightweight and handle communications with little overhead. So it doesn\u2019t do anything beyond multiplexing and demultiplexing. If applications running on top of UDP need any of the features of TCP, they have to implement that in their application. This example from python wiki covers a sample UDP client and server where \u201cHello World\u201d is an application payload sent to server listening on port number 5005. The server receives the packet and prints the \u201cHello World\u201d string from the client","title":"UDP"},{"location":"level101/linux_networking/udp/#applications-in-sre-role","text":"If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, sendto syscall from the application will hang till the kernel finds some of its buffer is freed. This can affect the throughput of the system. 
Increasing write memory buffer values using sysctl variables net.core.wmem_max and net.core.wmem_default provides some cushion to the application from the slow network Similarly if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can\u2019t queue due to the buffer being full. Since UDP doesn\u2019t guarantee reliability these dropped packets can cause data loss unless tracked by the application layer. Increasing sysctl variables rmem_default and rmem_max can provide some cushion to slow applications from fast senders.","title":"Applications in SRE role"},{"location":"level101/metrics_and_monitoring/alerts/","text":"Proactive monitoring using alerts Earlier we discussed different ways to collect key metric data points from a service and its underlying infrastructure. This data gives us a better understanding of how the service is performing. One of the main objectives of monitoring is to detect any service degradations early (reduce Mean Time To Detect) and notify stakeholders so that the issues are either avoided or can be fixed early, thus reducing Mean Time To Recover (MTTR). For example, if you are notified when resource usage by a service exceeds 90 percent, you can take preventive measures to avoid any service breakdown due to a shortage of resources. On the other hand, when a service goes down due to an issue, early detection and notification of such incidents can help you quickly fix the issue. Figure 8: An alert notification received on Slack Today most of the monitoring services available provide a mechanism to set up alerts on one or a combination of metrics to actively monitor the service health. These alerts have a set of defined rules or conditions, and when the rule is broken, you are notified. These rules can be as simple as notifying when the metric value exceeds n to as complex as a week over week (WoW) comparison of standard deviation over a period of time. 
Monitoring tools notify you about an active alert, and most of these tools support instant messaging (IM) platforms, SMS, email, or phone calls. Figure 8 shows a sample alert notification received on Slack for memory usage exceeding 90 percent of total RAM space on the host.","title":"Proactive Monitoring with Alerts"},{"location":"level101/metrics_and_monitoring/alerts/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/alerts/#proactive-monitoring-using-alerts","text":"Earlier we discussed different ways to collect key metric data points from a service and its underlying infrastructure. This data gives us a better understanding of how the service is performing. One of the main objectives of monitoring is to detect any service degradations early (reduce Mean Time To Detect) and notify stakeholders so that the issues are either avoided or can be fixed early, thus reducing Mean Time To Recover (MTTR). For example, if you are notified when resource usage by a service exceeds 90 percent, you can take preventive measures to avoid any service breakdown due to a shortage of resources. On the other hand, when a service goes down due to an issue, early detection and notification of such incidents can help you quickly fix the issue. Figure 8: An alert notification received on Slack Today most of the monitoring services available provide a mechanism to set up alerts on one or a combination of metrics to actively monitor the service health. These alerts have a set of defined rules or conditions, and when the rule is broken, you are notified. These rules can be as simple as notifying when the metric value exceeds n to as complex as a week over week (WoW) comparison of standard deviation over a period of time. Monitoring tools notify you about an active alert, and most of these tools support instant messaging (IM) platforms, SMS, email, or phone calls. 
Figure 8 shows a sample alert notification received on Slack for memory usage exceeding 90 percent of total RAM space on the host.","title":"Proactive monitoring using alerts"},{"location":"level101/metrics_and_monitoring/best_practices/","text":"Best practices for monitoring When setting up monitoring for a service, keep the following best practices in mind. Use the right metric type -- Most of the libraries available today offer various metric types. Choose the appropriate metric type for monitoring your system. Following are the types of metrics and their purposes. Gauge -- Gauge is a constant type of metric. After the metric is initialized, the metric value does not change unless you intentionally update it. Timer -- Timer measures the time taken to complete a task. Counter -- Counter counts the number of occurrences of a particular event. For more information about these metric types, see Data Types . Avoid over-monitoring -- Monitoring can be a significant engineering endeavor . Therefore, be sure not to spend too much time and resources on monitoring services, yet make sure all important metrics are captured. Prevent alert fatigue -- Set alerts for metrics that are important and actionable. If you receive too many non-critical alerts, you might start ignoring alert notifications over time. As a result, critical alerts might get overlooked. Have a runbook for alerts -- For every alert, make sure you have a document explaining what actions and checks need to be performed when the alert fires. This enables any engineer on the team to handle the alert and take necessary actions, without any help from others.","title":"Best Practices for Monitoring"},{"location":"level101/metrics_and_monitoring/best_practices/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/best_practices/#best-practices-for-monitoring","text":"When setting up monitoring for a service, keep the following best practices in mind. 
Use the right metric type -- Most of the libraries available today offer various metric types. Choose the appropriate metric type for monitoring your system. Following are the types of metrics and their purposes. Gauge -- Gauge is a constant type of metric. After the metric is initialized, the metric value does not change unless you intentionally update it. Timer -- Timer measures the time taken to complete a task. Counter -- Counter counts the number of occurrences of a particular event. For more information about these metric types, see Data Types . Avoid over-monitoring -- Monitoring can be a significant engineering endeavor . Therefore, be sure not to spend too much time and resources on monitoring services, yet make sure all important metrics are captured. Prevent alert fatigue -- Set alerts for metrics that are important and actionable. If you receive too many non-critical alerts, you might start ignoring alert notifications over time. As a result, critical alerts might get overlooked. Have a runbook for alerts -- For every alert, make sure you have a document explaining what actions and checks need to be performed when the alert fires. This enables any engineer on the team to handle the alert and take necessary actions, without any help from others.","title":"Best practices for monitoring"},{"location":"level101/metrics_and_monitoring/command-line_tools/","text":"Command-line tools Most of the Linux distributions today come with a set of tools that monitor the system's performance. These tools help you measure and understand various subsystem statistics (CPU, memory, network, and so on). Let's look at some of the tools that are predominantly used. ps/top -- The process status command (ps) displays information about all the currently running processes in a Linux system. The top command is similar to the ps command, but it periodically updates the information displayed until the program is terminated. 
An advanced version of top, called htop, has a more user-friendly interface and some additional features. These command-line utilities come with options to modify the operation and output of the command. Following are some important options supported by the ps command. -p -- Displays information about processes that match the specified process IDs. Similarly, you can use -u and -g to display information about processes belonging to a specific user or group. -a -- Displays information about other users' processes, as well as one's own. -x -- When displaying processes matched by other options, includes processes that do not have a controlling terminal. Figure 2: Results of top command ss -- The socket statistics command (ss) displays information about network sockets on the system. This tool is the successor of netstat , which is deprecated. Following are some command-line options supported by the ss command: -t -- Displays the TCP socket. Similarly, -u displays UDP sockets, -x is for UNIX domain sockets, and so on. -l -- Displays only listening sockets. -n -- Instructs the command to not resolve service names. Instead displays the port numbers. Figure 3: List of listening sockets on a system free -- The free command displays memory usage statistics on the host like available memory, used memory, and free memory. Most often, this command is used with the -h command-line option, which displays the statistics in a human-readable format. Figure 4: Memory statistics on a host in human-readable form df -- The df command displays disk space usage statistics. The -i command-line option is also often used to display inode usage statistics. The -h command-line option is used for displaying statistics in a human-readable format. Figure 5: Disk usage statistics on a system in human-readable form sar -- The sar utility monitors various subsystems, such as CPU and memory, in real time. This data can be stored in a file specified with the -o option. 
This tool helps to identify anomalies. iftop -- The interface top command ( iftop ) displays bandwidth utilization by a host on an interface. This command is often used to identify bandwidth usage by active connections. The -i option specifies which network interface to watch. Figure 6: Network bandwidth usage by active connection on the host tcpdump -- The tcpdump command is a network monitoring tool that captures network packets flowing over the network and displays a description of the captured packets. The following options are available: -i -- Interface to listen on host -- Filters traffic going to or from the specified host src/dst -- Displays one-way traffic from the source (src) or to the destination (dst) port -- Filters traffic to or from a particular port Figure 7: *tcpdump* of packets on *docker0* interface on a host","title":"Command-line Tools"},{"location":"level101/metrics_and_monitoring/command-line_tools/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/command-line_tools/#command-line-tools","text":"Most of the Linux distributions today come with a set of tools that monitor the system's performance. These tools help you measure and understand various subsystem statistics (CPU, memory, network, and so on). Let's look at some of the tools that are predominantly used. ps/top -- The process status command (ps) displays information about all the currently running processes in a Linux system. The top command is similar to the ps command, but it periodically updates the information displayed until the program is terminated. An advanced version of top, called htop, has a more user-friendly interface and some additional features. These command-line utilities come with options to modify the operation and output of the command. Following are some important options supported by the ps command. -p -- Displays information about processes that match the specified process IDs. 
Similarly, you can use -u and -g to display information about processes belonging to a specific user or group. -a -- Displays information about other users' processes, as well as one's own. -x -- When displaying processes matched by other options, includes processes that do not have a controlling terminal. Figure 2: Results of top command ss -- The socket statistics command (ss) displays information about network sockets on the system. This tool is the successor of netstat , which is deprecated. Following are some command-line options supported by the ss command: -t -- Displays the TCP socket. Similarly, -u displays UDP sockets, -x is for UNIX domain sockets, and so on. -l -- Displays only listening sockets. -n -- Instructs the command to not resolve service names. Instead displays the port numbers. Figure 3: List of listening sockets on a system free -- The free command displays memory usage statistics on the host like available memory, used memory, and free memory. Most often, this command is used with the -h command-line option, which displays the statistics in a human-readable format. Figure 4: Memory statistics on a host in human-readable form df -- The df command displays disk space usage statistics. The -i command-line option is also often used to display inode usage statistics. The -h command-line option is used for displaying statistics in a human-readable format. Figure 5: Disk usage statistics on a system in human-readable form sar -- The sar utility monitors various subsystems, such as CPU and memory, in real time. This data can be stored in a file specified with the -o option. This tool helps to identify anomalies. iftop -- The interface top command ( iftop ) displays bandwidth utilization by a host on an interface. This command is often used to identify bandwidth usage by active connections. The -i option specifies which network interface to watch. 
Figure 6: Network bandwidth usage by active connection on the host tcpdump -- The tcpdump command is a network monitoring tool that captures network packets flowing over the network and displays a description of the captured packets. The following options are available: -i -- Interface to listen on host -- Filters traffic going to or from the specified host src/dst -- Displays one-way traffic from the source (src) or to the destination (dst) port -- Filters traffic to or from a particular port Figure 7: *tcpdump* of packets on *docker0* interface on a host","title":"Command-line tools"},{"location":"level101/metrics_and_monitoring/conclusion/","text":"Conclusion A robust monitoring and alerting system is necessary for maintaining and troubleshooting a system. A dashboard with key metrics can give you an overview of service performance, all in one place. Well-defined alerts (with realistic thresholds and notifications) further enable you to quickly identify any anomalies in the service infrastructure and in resource saturation. By taking necessary actions, you can avoid any service degradations and decrease MTTD for service breakdowns. In addition to in-house monitoring, monitoring real user experience can help you to understand service performance as perceived by the users. Many modules are involved in serving the user, and most of them are out of your control. Therefore, you need to have real-user monitoring in place. Metrics give very abstract details on service performance. To get a better understanding of the system and for faster recovery during incidents, you might want to implement the other two pillars of observability: logs and tracing. Logs and trace data can help you understand what led to service failure or degradation. 
Following are some resources to learn more about monitoring and observability: Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing by Yuri Shkuro References Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing, by Yuri Shkuro Monitoring and Observability Three Pillars with Zero Answers Engineering blogs on LinkedIn , Grafana , Elastic.co , OpenTelemetry","title":"Conclusion"},{"location":"level101/metrics_and_monitoring/conclusion/#conclusion","text":"A robust monitoring and alerting system is necessary for maintaining and troubleshooting a system. A dashboard with key metrics can give you an overview of service performance, all in one place. Well-defined alerts (with realistic thresholds and notifications) further enable you to quickly identify any anomalies in the service infrastructure and in resource saturation. By taking necessary actions, you can avoid any service degradations and decrease MTTD for service breakdowns. In addition to in-house monitoring, monitoring real user experience can help you to understand service performance as perceived by the users. Many modules are involved in serving the user, and most of them are out of your control. Therefore, you need to have real-user monitoring in place. Metrics give very abstract details on service performance. To get a better understanding of the system and for faster recovery during incidents, you might want to implement the other two pillars of observability: logs and tracing. Logs and trace data can help you understand what led to service failure or degradation. 
Following are some resources to learn more about monitoring and observability: Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing by Yuri Shkuro","title":"Conclusion"},{"location":"level101/metrics_and_monitoring/conclusion/#references","text":"Google SRE book: Monitoring Distributed Systems Mastering Distributed Tracing, by Yuri Shkuro Monitoring and Observability Three Pillars with Zero Answers Engineering blogs on LinkedIn , Grafana , Elastic.co , OpenTelemetry","title":"References"},{"location":"level101/metrics_and_monitoring/introduction/","text":"Prerequisites Linux Basics Python and the Web Systems Design Linux Networking Fundamentals What to expect from this course Monitoring is an integral part of any system. As an SRE, you need to have a basic understanding of monitoring a service infrastructure. By the end of this course, you will gain a better understanding of the following topics: What is monitoring? What needs to be measured How the metrics gathered can be used to improve business decisions and overall reliability Proactive monitoring with alerts Log processing and its importance What is observability? Distributed tracing Logs Metrics What is not covered in this course Guide to setting up a monitoring infrastructure Deep dive into different monitoring technologies and benchmarking or comparison of any tools Course content Introduction Four golden signals of monitoring Why is monitoring important? Command-line tools Third-party monitoring Proactive monitoring using alerts Best practices for monitoring Observability Logs Tracing Conclusion Introduction Monitoring is a process of collecting real-time performance metrics from a system, analyzing the data to derive meaningful information, and displaying the data to the users. In simple terms, you measure various metrics regularly to understand the state of the system, including but not limited to, user requests, latency, and error rate. 
What gets measured, gets fixed ---if you can measure something, you can reason about it, understand it, discuss it, and act upon it with confidence. Four golden signals of monitoring When setting up monitoring for a system, you need to decide what to measure. The four golden signals of monitoring provide a good understanding of service performance and lay a foundation for monitoring a system. These four golden signals are Traffic Latency Error Saturation These metrics help you to understand the system performance and bottlenecks, and to create a better end-user experience. As discussed in the Google SRE book , if you can measure only four metrics of your service, focus on these four. Let's look at each of the four golden signals. Traffic -- Traffic gives a better understanding of the service demand. Often referred to as service QPS (queries per second), traffic is a measure of requests served by the service. This signal helps you to decide when a service needs to be scaled up to handle increasing customer demand and scaled down to be cost-effective. Latency -- Latency is the measure of time taken by the service to process the incoming request and send the response. Measuring service latency helps in the early detection of slow degradation of the service. Distinguishing between the latency of successful requests and the latency of failed requests is important. For example, an HTTP 5XX error triggered due to loss of connection to a database or other critical backend might be served very quickly. However, because an HTTP 500 error indicates a failed request, factoring 500s into overall latency might result in misleading calculations. Error (rate) -- Error is the measure of failed client requests. These failures can be easily identified based on the response codes ( HTTP 5XX error ). There might be cases where the response is considered erroneous due to wrong result data or due to policy violations. 
For example, you might get an HTTP 200 response, but the body has incomplete data, or response time is breaching the agreed-upon SLA s. Therefore, you need to have other mechanisms (code logic or instrumentation ) in place to capture errors in addition to the response codes. Saturation -- Saturation is a measure of the resource utilization by a service. This signal tells you the state of service resources and how full they are. These resources include memory, compute, network I/O, and so on. Service performance slowly degrades even before resource utilization is at 100 percent. Therefore, having a utilization target is important. An increase in latency is a good indicator of saturation; measuring the 99th percentile of latency can help in the early detection of saturation. Depending on the type of service, you can measure these signals in different ways. For example, you might measure queries per second served for a web server. In contrast, for a database server, transactions performed and database sessions created give you an idea about the traffic handled by the database server. With the help of additional code logic (monitoring libraries and instrumentation), you can measure these signals periodically and store them for future analysis. Although these metrics give you an idea about the performance at the service end, you need to also ensure that the same user experience is delivered at the client end. Therefore, you might need to monitor the service from outside the service infrastructure, which is discussed under third-party monitoring. Why is monitoring important? Monitoring plays a key role in the success of a service. As discussed earlier, monitoring provides performance insights for understanding service health. With access to historical data collected over time, you can build intelligent applications to address specific needs. 
Some of the key use cases follow: Reduction in time to resolve issues -- With a good monitoring infrastructure in place, you can identify issues quickly and resolve them, which reduces the impact caused by the issues. Business decisions -- Data collected over a period of time can help you make business decisions such as determining the product release cycle, which features to invest in, and geographical areas to focus on. Decisions based on long-term data can improve the overall product experience. Resource planning -- By analyzing historical data, you can forecast service compute-resource demands, and you can properly allocate resources. This allows financially effective decisions, with no compromise in end-user experience. Before we dive deeper into monitoring, let's understand some basic terminologies. Metric -- A metric is a quantitative measure of a particular system attribute---for example, memory or CPU Node or host -- A physical server, virtual machine, or container where an application is running QPS -- Queries Per Second , a measure of traffic served by the service per second Latency -- The time interval between user action and the response from the server---for example, time spent after sending a query to a database before the first response bit is received Error rate -- Number of errors observed over a particular time period (usually a second) Graph -- In monitoring, a graph is a representation of one or more values of metrics collected over time Dashboard -- A dashboard is a collection of graphs that provide an overview of system health Incident -- An incident is an event that disrupts the normal operations of a system MTTD -- Mean Time To Detect is the time interval between the beginning of a service failure and the detection of such failure MTTR -- Mean Time To Resolve is the time spent to fix a service failure and bring the service back to its normal state Before we discuss monitoring an application, let us look at the monitoring infrastructure. 
Following is an illustration of a basic monitoring system. Figure 1: Illustration of a monitoring infrastructure Figure 1 shows a monitoring infrastructure mechanism for aggregating metrics on the system, and collecting and storing the data for display. In addition, a monitoring infrastructure includes alert subsystems for notifying concerned parties during any abnormal behavior. Let's look at each of these infrastructure components: Host metrics agent -- A host metrics agent is a process running on the host that collects performance statistics for host subsystems such as memory, CPU, and network. These metrics are regularly relayed to a metrics collector for storage and visualization. Some examples are collectd , telegraf , and metricbeat . Metric aggregator -- A metric aggregator is a process running on the host. Applications running on the host collect service metrics using instrumentation . Collected metrics are sent either to the aggregator process or directly to the metrics collector over API, if available. Received metrics are aggregated periodically and relayed to the metrics collector in batches. An example is StatsD . Metrics collector -- A metrics collector process collects all the metrics from the metric aggregators running on multiple hosts. The collector takes care of decoding and stores this data on the database. Metric collection and storage might be taken care of by one single service such as InfluxDB , which we discuss next. An example is carbon daemons . Storage -- A time-series database stores all of these metrics. Examples are OpenTSDB , Whisper , and InfluxDB . Metrics server -- A metrics server can be as basic as a web server that graphically renders metric data. In addition, the metrics server provides aggregation functionalities and APIs for fetching metric data programmatically. Some examples are Grafana and Graphite-Web . 
Alert manager -- The alert manager regularly polls metric data available and, if there are any anomalies detected, notifies you. Each alert has a set of rules for identifying such anomalies. Today many metrics servers such as Grafana support alert management. We discuss alerting in detail later . Examples are Grafana and Icinga .","title":"Introduction"},{"location":"level101/metrics_and_monitoring/introduction/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/introduction/#prerequisites","text":"Linux Basics Python and the Web Systems Design Linux Networking Fundamentals","title":"Prerequisites"},{"location":"level101/metrics_and_monitoring/introduction/#what-to-expect-from-this-course","text":"Monitoring is an integral part of any system. As an SRE, you need to have a basic understanding of monitoring a service infrastructure. By the end of this course, you will gain a better understanding of the following topics: What is monitoring? What needs to be measured How the metrics gathered can be used to improve business decisions and overall reliability Proactive monitoring with alerts Log processing and its importance What is observability? Distributed tracing Logs Metrics","title":"What to expect from this course"},{"location":"level101/metrics_and_monitoring/introduction/#what-is-not-covered-in-this-course","text":"Guide to setting up a monitoring infrastructure Deep dive into different monitoring technologies and benchmarking or comparison of any tools","title":"What is not covered in this course"},{"location":"level101/metrics_and_monitoring/introduction/#course-content","text":"Introduction Four golden signals of monitoring Why is monitoring important? 
Command-line tools Third-party monitoring Proactive monitoring using alerts Best practices for monitoring Observability Logs Tracing Conclusion","title":"Course content"},{"location":"level101/metrics_and_monitoring/introduction/#_2","text":"","title":""},{"location":"level101/metrics_and_monitoring/introduction/#introduction","text":"Monitoring is a process of collecting real-time performance metrics from a system, analyzing the data to derive meaningful information, and displaying the data to the users. In simple terms, you measure various metrics regularly to understand the state of the system, including but not limited to, user requests, latency, and error rate. What gets measured, gets fixed ---if you can measure something, you can reason about it, understand it, discuss it, and act upon it with confidence.","title":"Introduction"},{"location":"level101/metrics_and_monitoring/introduction/#four-golden-signals-of-monitoring","text":"When setting up monitoring for a system, you need to decide what to measure. The four golden signals of monitoring provide a good understanding of service performance and lay a foundation for monitoring a system. These four golden signals are Traffic Latency Error Saturation These metrics help you to understand the system performance and bottlenecks, and to create a better end-user experience. As discussed in the Google SRE book , if you can measure only four metrics of your service, focus on these four. Let's look at each of the four golden signals. Traffic -- Traffic gives a better understanding of the service demand. Often referred to as service QPS (queries per second), traffic is a measure of requests served by the service. This signal helps you to decide when a service needs to be scaled up to handle increasing customer demand and scaled down to be cost-effective. Latency -- Latency is the measure of time taken by the service to process the incoming request and send the response. 
Measuring service latency helps in the early detection of slow degradation of the service. Distinguishing between the latency of successful requests and the latency of failed requests is important. For example, an HTTP 5XX error triggered due to loss of connection to a database or other critical backend might be served very quickly. However, because an HTTP 500 error indicates a failed request, factoring 500s into overall latency might result in misleading calculations. Error (rate) -- Error is the measure of failed client requests. These failures can be easily identified based on the response codes ( HTTP 5XX error ). There might be cases where the response is considered erroneous due to wrong result data or due to policy violations. For example, you might get an HTTP 200 response, but the body has incomplete data, or response time is breaching the agreed-upon SLA s. Therefore, you need to have other mechanisms (code logic or instrumentation ) in place to capture errors in addition to the response codes. Saturation -- Saturation is a measure of the resource utilization by a service. This signal tells you the state of service resources and how full they are. These resources include memory, compute, network I/O, and so on. Service performance slowly degrades even before resource utilization is at 100 percent. Therefore, having a utilization target is important. An increase in latency is a good indicator of saturation; measuring the 99th percentile of latency can help in the early detection of saturation. Depending on the type of service, you can measure these signals in different ways. For example, you might measure queries per second served for a web server. In contrast, for a database server, transactions performed and database sessions created give you an idea about the traffic handled by the database server. 
With the help of additional code logic (monitoring libraries and instrumentation), you can measure these signals periodically and store them for future analysis. Although these metrics give you an idea about the performance at the service end, you need to also ensure that the same user experience is delivered at the client end. Therefore, you might need to monitor the service from outside the service infrastructure, which is discussed under third-party monitoring.","title":"Four golden signals of monitoring"},{"location":"level101/metrics_and_monitoring/introduction/#why-is-monitoring-important","text":"Monitoring plays a key role in the success of a service. As discussed earlier, monitoring provides performance insights for understanding service health. With access to historical data collected over time, you can build intelligent applications to address specific needs. Some of the key use cases follow: Reduction in time to resolve issues -- With a good monitoring infrastructure in place, you can identify issues quickly and resolve them, which reduces the impact caused by the issues. Business decisions -- Data collected over a period of time can help you make business decisions such as determining the product release cycle, which features to invest in, and geographical areas to focus on. Decisions based on long-term data can improve the overall product experience. Resource planning -- By analyzing historical data, you can forecast service compute-resource demands, and you can properly allocate resources. This allows financially effective decisions, with no compromise in end-user experience. Before we dive deeper into monitoring, let's understand some basic terminologies. 
Metric -- A metric is a quantitative measure of a particular system attribute---for example, memory or CPU Node or host -- A physical server, virtual machine, or container where an application is running QPS -- Queries Per Second , a measure of traffic served by the service per second Latency -- The time interval between user action and the response from the server---for example, time spent after sending a query to a database before the first response bit is received Error rate -- Number of errors observed over a particular time period (usually a second) Graph -- In monitoring, a graph is a representation of one or more values of metrics collected over time Dashboard -- A dashboard is a collection of graphs that provide an overview of system health Incident -- An incident is an event that disrupts the normal operations of a system MTTD -- Mean Time To Detect is the time interval between the beginning of a service failure and the detection of such failure MTTR -- Mean Time To Resolve is the time spent to fix a service failure and bring the service back to its normal state Before we discuss monitoring an application, let us look at the monitoring infrastructure. Following is an illustration of a basic monitoring system. Figure 1: Illustration of a monitoring infrastructure Figure 1 shows a monitoring infrastructure mechanism for aggregating metrics on the system, and collecting and storing the data for display. In addition, a monitoring infrastructure includes alert subsystems for notifying concerned parties during any abnormal behavior. Let's look at each of these infrastructure components: Host metrics agent -- A host metrics agent is a process running on the host that collects performance statistics for host subsystems such as memory, CPU, and network. These metrics are regularly relayed to a metrics collector for storage and visualization. Some examples are collectd , telegraf , and metricbeat . 
Metric aggregator -- A metric aggregator is a process running on the host. Applications running on the host collect service metrics using instrumentation . Collected metrics are sent either to the aggregator process or directly to the metrics collector over API, if available. Received metrics are aggregated periodically and relayed to the metrics collector in batches. An example is StatsD . Metrics collector -- A metrics collector process collects all the metrics from the metric aggregators running on multiple hosts. The collector takes care of decoding and stores this data on the database. Metric collection and storage might be taken care of by one single service such as InfluxDB , which we discuss next. An example is carbon daemons . Storage -- A time-series database stores all of these metrics. Examples are OpenTSDB , Whisper , and InfluxDB . Metrics server -- A metrics server can be as basic as a web server that graphically renders metric data. In addition, the metrics server provides aggregation functionalities and APIs for fetching metric data programmatically. Some examples are Grafana and Graphite-Web . Alert manager -- The alert manager regularly polls metric data available and, if there are any anomalies detected, notifies you. Each alert has a set of rules for identifying such anomalies. Today many metrics servers such as Grafana support alert management. We discuss alerting in detail later . Examples are Grafana and Icinga .","title":"Why is monitoring important?"},{"location":"level101/metrics_and_monitoring/observability/","text":"Observability Engineers often use observability when referring to building reliable systems. Observability is a term derived from control theory. It is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. 
Service infrastructures used on a daily basis are becoming more and more complex; proactive monitoring alone is not sufficient to quickly resolve issues causing application failures. With monitoring, you can keep known past failures from recurring, but with a complex service architecture, many unknown factors can cause potential problems. To address such cases, you can make the service observable. An observable system provides highly granular insights into the implicit failure modes. In addition, an observable system furnishes ample context about its inner workings, which unlocks the ability to uncover deeper systemic issues. Monitoring enables failure detection; observability helps in gaining a better understanding of the system. Among engineers, there is a common misconception that monitoring and observability are two different things. Actually, observability is the superset to monitoring; that is, monitoring improves service observability. The goal of observability is not only to detect problems, but also to understand where the issue is and what is causing it. In addition to metrics, observability has two more pillars: logs and traces, as shown in Figure 9. Although these three components do not make a system 100 percent observable, these are the most important and powerful components that give a better understanding of the system. Each of these pillars has its flaws, which are described in Three Pillars with Zero Answers . Figure 9: Three pillars of observability Because we have covered metrics already, let's look at the other two pillars (logs and traces). Logs Logs (often referred to as events ) are a record of activities performed by a service during its run time, with a corresponding timestamp. Metrics give abstract information about degradations in a system, and logs give a detailed view of what is causing these degradations. 
Logs created by the applications and infrastructure components help in effectively understanding the behavior of the system by providing details on application errors, exceptions, and event timelines. Logs help you to go back in time to understand the events that led to a failure. Therefore, examining logs is essential to troubleshooting system failures. Log processing involves the aggregation of different logs from individual applications and their subsequent shipment to central storage. Moving logs to central storage helps to preserve the logs, in case the application instances are inaccessible, or the application crashes due to a failure. After the logs are available in a central place, you can analyze the logs to derive sensible information from them. For audit and compliance purposes, you archive these logs on the central storage for a certain period of time. Log analyzers fetch useful information from log lines, such as request user information, request URL (feature), and response headers (such as content length) and response time. This information is grouped based on these attributes and made available to you through a visualization tool for quick understanding. You might be wondering how this log information helps. This information gives a holistic view of activities performed on all the involved entities. For example, let's say someone is performing a DoS (denial of service) attack on a web application. With the help of log processing, you can quickly look at top client IPs derived from access logs and identify where the attack is coming from. Similarly, if a feature in an application is causing a high error rate when accessed with a particular request parameter value, the results of log analysis can help you to quickly identify the misbehaving parameter value and take further action. 
Figure 10: Log processing and analysis using ELK stack Figure 10 shows a log processing platform using ELK (Elasticsearch, Logstash, Kibana), which provides centralized log processing. Beats is a collection of lightweight data shippers that can ship logs, audit data, network data, and so on over the network. In this use case specifically, we are using filebeat as a log shipper. Filebeat watches service log files and ships the log data to Logstash. Logstash parses these logs and transforms the data, preparing it to store on Elasticsearch. Transformed log data is stored on Elasticsearch and indexed for fast retrieval. Kibana searches and displays log data stored on Elasticsearch. Kibana also provides a set of visualizations for graphically displaying summaries derived from log data. Storing logs is expensive: extensive logging of every event on the server is costly and takes up more storage space. As the number of services grows, this cost can increase proportionally. Tracing So far, we covered the importance of metrics and logging. Metrics give an abstract overview of the system, and logging gives a record of events that occurred. Imagine a complex distributed system with multiple microservices, where a user request is processed by multiple microservices in the system. Metrics and logging give you some information about how these requests are being handled by the system, but they fail to provide detailed information across all the microservices and how they affect a particular client request. If a slow downstream microservice is leading to increased response times, you need to have detailed visibility across all involved microservices to identify that microservice. The answer to this need is a request tracing mechanism. A trace is a series of spans, where each span is a record of events performed by different microservices to serve the client's request. 
In simple terms, a trace is a log of client-request serving derived from various microservices across different physical machines. Each span includes span metadata such as trace ID and span ID, and context, which includes information about transactions performed. Figure 11: Trace and spans for a URL shortener request Figure 11 is a graphical representation of a trace captured on the URL shortener example we covered earlier while learning Python. Similar to monitoring, the tracing infrastructure comprises a few modules for collecting traces, storing them, and accessing them. Each microservice runs a tracing library that collects traces in the background, creates in-memory batches, and submits them to the tracing backend. The tracing backend normalizes received trace data and stores it on persistent storage. Tracing data comes from multiple different microservices; therefore, trace storage is often organized to store data incrementally and is indexed by trace identifier. This organization helps in the reconstruction of trace data and in visualization. Figure 12 illustrates the anatomy of distributed tracing. Figure 12: Anatomy of distributed tracing Today a set of tools and frameworks are available for building distributed tracing solutions. Following are some of the popular tools: OpenTelemetry : Observability framework for cloud-native software Jaeger : Open-source distributed tracing solution Zipkin : Open-source distributed tracing solution","title":"Observability"},{"location":"level101/metrics_and_monitoring/observability/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/observability/#observability","text":"Engineers often use observability when referring to building reliable systems. Observability is a term derived from control theory. It is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. 
Service infrastructures used on a daily basis are becoming more and more complex; proactive monitoring alone is not sufficient to quickly resolve issues causing application failures. With monitoring, you can keep known past failures from recurring, but with a complex service architecture, many unknown factors can cause potential problems. To address such cases, you can make the service observable. An observable system provides highly granular insights into the implicit failure modes. In addition, an observable system furnishes ample context about its inner workings, which unlocks the ability to uncover deeper systemic issues. Monitoring enables failure detection; observability helps in gaining a better understanding of the system. Among engineers, there is a common misconception that monitoring and observability are two different things. Actually, observability is the superset to monitoring; that is, monitoring improves service observability. The goal of observability is not only to detect problems, but also to understand where the issue is and what is causing it. In addition to metrics, observability has two more pillars: logs and traces, as shown in Figure 9. Although these three components do not make a system 100 percent observable, these are the most important and powerful components that give a better understanding of the system. Each of these pillars has its flaws, which are described in Three Pillars with Zero Answers . Figure 9: Three pillars of observability Because we have covered metrics already, let's look at the other two pillars (logs and traces).","title":"Observability"},{"location":"level101/metrics_and_monitoring/observability/#logs","text":"Logs (often referred to as events ) are a record of activities performed by a service during its run time, with a corresponding timestamp. Metrics give abstract information about degradations in a system, and logs give a detailed view of what is causing these degradations. 
Logs created by the applications and infrastructure components help in effectively understanding the behavior of the system by providing details on application errors, exceptions, and event timelines. Logs help you to go back in time to understand the events that led to a failure. Therefore, examining logs is essential to troubleshooting system failures. Log processing involves the aggregation of different logs from individual applications and their subsequent shipment to central storage. Moving logs to central storage helps to preserve the logs, in case the application instances are inaccessible, or the application crashes due to a failure. After the logs are available in a central place, you can analyze the logs to derive sensible information from them. For audit and compliance purposes, you archive these logs on the central storage for a certain period of time. Log analyzers fetch useful information from log lines, such as request user information, request URL (feature), and response headers (such as content length) and response time. This information is grouped based on these attributes and made available to you through a visualization tool for quick understanding. You might be wondering how this log information helps. This information gives a holistic view of activities performed on all the involved entities. For example, let's say someone is performing a DoS (denial of service) attack on a web application. With the help of log processing, you can quickly look at top client IPs derived from access logs and identify where the attack is coming from. Similarly, if a feature in an application is causing a high error rate when accessed with a particular request parameter value, the results of log analysis can help you to quickly identify the misbehaving parameter value and take further action. 
Figure 10: Log processing and analysis using ELK stack Figure 10 shows a log processing platform using ELK (Elasticsearch, Logstash, Kibana), which provides centralized log processing. Beats is a collection of lightweight data shippers that can ship logs, audit data, network data, and so on over the network. In this use case specifically, we are using filebeat as a log shipper. Filebeat watches service log files and ships the log data to Logstash. Logstash parses these logs and transforms the data, preparing it to store on Elasticsearch. Transformed log data is stored on Elasticsearch and indexed for fast retrieval. Kibana searches and displays log data stored on Elasticsearch. Kibana also provides a set of visualizations for graphically displaying summaries derived from log data. Storing logs is expensive. And extensive logging of every event on the server is costly and takes up more storage space. With an increasing number of services, this cost can increase proportionally to the number of services.","title":"Logs"},{"location":"level101/metrics_and_monitoring/observability/#tracing","text":"So far, we covered the importance of metrics and logging. Metrics give an abstract overview of the system, and logging gives a record of events that occurred. Imagine a complex distributed system with multiple microservices, where a user request is processed by multiple microservices in the system. Metrics and logging give you some information about how these requests are being handled by the system, but they fail to provide detailed information across all the microservices and how they affect a particular client request. If a slow downstream microservice is leading to increased response times, you need to have detailed visibility across all involved microservices to identify such microservice. The answer to this need is a request tracing mechanism. 
A trace is a series of spans, where each span is a record of events performed by different microservices to serve the client's request. In simple terms, a trace is a log of client-request serving derived from various microservices across different physical machines. Each span includes span metadata such as trace ID and span ID, and context, which includes information about transactions performed. Figure 11: Trace and spans for a URL shortener request Figure 11 is a graphical representation of a trace captured on the URL shortener example we covered earlier while learning Python. Similar to monitoring, the tracing infrastructure comprises a few modules for collecting traces, storing them, and accessing them. Each microservice runs a tracing library that collects traces in the background, creates in-memory batches, and submits them to the tracing backend. The tracing backend normalizes received trace data and stores it on persistent storage. Tracing data comes from multiple different microservices; therefore, trace storage is often organized to store data incrementally and is indexed by trace identifier. This organization helps in the reconstruction of trace data and in visualization. Figure 12 illustrates the anatomy of distributed tracing. Figure 12: Anatomy of distributed tracing Today a set of tools and frameworks are available for building distributed tracing solutions. Following are some of the popular tools: OpenTelemetry : Observability framework for cloud-native software Jaeger : Open-source distributed tracing solution Zipkin : Open-source distributed tracing solution","title":"Tracing"},{"location":"level101/metrics_and_monitoring/third-party_monitoring/","text":"Third-party monitoring Today most cloud providers offer a variety of monitoring solutions. In addition, a number of companies such as Datadog offer monitoring-as-a-service. In this section, we are not covering monitoring-as-a-service in depth. 
In recent years, more and more people have access to the internet. Many services are offered online to cater to the increasing user base. As a result, web pages are becoming larger, with increased client-side scripts. Users want these services to be fast and error-free. From the service point of view, when the response body is composed, an HTTP 200 OK response is sent, and everything looks okay. But there might be errors during transmission or on the client side. As previously mentioned, monitoring services from within the service infrastructure give good visibility into service health, but this is not enough. You need to monitor user experience, specifically the availability of services for clients. A number of third-party services such as Catchpoint , Pingdom , and so on are available for achieving this goal. Third-party monitoring services can generate synthetic traffic simulating user requests from various parts of the world, to ensure the service is globally accessible. Other third-party monitoring solutions for real user monitoring (RUM) provide performance statistics such as service uptime and response time, from different geographical locations. This allows you to monitor the user experience from these locations, which might have different internet backbones, different operating systems, and different browsers and browser versions. Catchpoint Global Monitoring Network is a comprehensive 3-minute video that explains the importance of monitoring the client experience.","title":"Third-party Monitoring"},{"location":"level101/metrics_and_monitoring/third-party_monitoring/#_1","text":"","title":""},{"location":"level101/metrics_and_monitoring/third-party_monitoring/#third-party-monitoring","text":"Today most cloud providers offer a variety of monitoring solutions. In addition, a number of companies such as Datadog offer monitoring-as-a-service. In this section, we are not covering monitoring-as-a-service in depth. 
In recent years, more and more people have access to the internet. Many services are offered online to cater to the increasing user base. As a result, web pages are becoming larger, with increased client-side scripts. Users want these services to be fast and error-free. From the service point of view, when the response body is composed, an HTTP 200 OK response is sent, and everything looks okay. But there might be errors during transmission or on the client side. As previously mentioned, monitoring services from within the service infrastructure give good visibility into service health, but this is not enough. You need to monitor user experience, specifically the availability of services for clients. A number of third-party services such as Catchpoint , Pingdom , and so on are available for achieving this goal. Third-party monitoring services can generate synthetic traffic simulating user requests from various parts of the world, to ensure the service is globally accessible. Other third-party monitoring solutions for real user monitoring (RUM) provide performance statistics such as service uptime and response time, from different geographical locations. This allows you to monitor the user experience from these locations, which might have different internet backbones, different operating systems, and different browsers and browser versions. Catchpoint Global Monitoring Network is a comprehensive 3-minute video that explains the importance of monitoring the client experience.","title":"Third-party monitoring"},{"location":"level101/python_web/intro/","text":"Python and The Web Prerequisites Basic understanding of python language. Basic familiarity with flask framework. What to expect from this course This course is divided into two high level parts. In the first part, assuming familiarity with python language\u2019s basic operations and syntax usage, we will dive a little deeper into understanding python as a language. 
We will compare python with other programming languages that you might already know like Java and C. We will also explore concepts of Python objects and with help of that, explore python features like decorators. In the second part which will revolve around the web, and also assume familiarity with the Flask framework, we will start from the socket module and work with HTTP requests. This will demystify how frameworks like flask work internally. And to introduce SRE flavour to the course, we will design, develop and deploy (in theory) a URL shortening application. We will emphasize parts of the whole process that are more important as an SRE of the said app/service. What is not covered under this course Extensive knowledge of python internals and advanced python. Lab Environment Setup Have latest version of python installed Course Contents The Python Language Some Python Concepts Python Gotchas Python and Web Sockets Flask The URL Shortening App Design Scaling The App Monitoring The App The Python Language Assuming you know a little bit of C/C++ and Java, let's try to discuss the following questions in context of those two languages and python. You might have heard that C/C++ is a compiled language while python is an interpreted language. Generally, with compiled language we first compile the program and then run the executable while in case of python we run the source code directly like python hello_world.py . While Java, being an interpreted language, still has a separate compilation step and then its run. So what's really the difference? Compiled vs. Interpreted This might sound a little weird to you: python, in a way is a compiled language! Python has a compiler built-in! It is obvious in the case of java since we compile it using a separate command ie: javac helloWorld.java and it will produce a .class file which we know as a bytecode . Well, python is very similar to that. 
One difference here is that there is no separate compile command/binary needed to run a python program. What is the difference then, between java and python? Well, Java's compiler is more strict and sophisticated. As you might know Java is a statically typed language. So the compiler is written in a way that it can verify types related errors during compile time. While python being a dynamic language, types are not known until a program is run. So in a way, python compiler is dumb (or, less strict). But there indeed is a compile step involved when a python program is run. You might have seen python bytecode files with .pyc extension. Here is how you can see bytecode for a given python program. # Create a Hello World $ echo \"print('hello world')\" > hello_world.py # Making sure it runs $ python3 hello_world.py hello world # The bytecode of the given program $ python -m dis hello_world.py 1 0 LOAD_NAME 0 (print) 2 LOAD_CONST 0 ('hello world') 4 CALL_FUNCTION 1 6 POP_TOP 8 LOAD_CONST 1 (None) 10 RETURN_VALUE Read more about dis module here Now coming to C/C++, there of course is a compiler. But the output is different than what java/python compiler would produce. Compiling a C program would produce what we also know as machine code . As opposed to bytecode. Running The Programs We know compilation is involved in all 3 languages we are discussing. Just that the compilers are different in nature and they output different types of content. In case of C/C++, the output is machine code which can be directly read by your operating system. When you execute that program, your OS will know how exactly to run it. But this is not the case with bytecode. Those bytecodes are language specific. Python has its own set of bytecode defined (more in dis module) and so does java. So naturally, your operating system will not know how to run it. To run this bytecode, we have something called Virtual Machines. Ie: The JVM or the Python VM (CPython, Jython). 
These so called Virtual Machines are the programs which can read the bytecode and run it on a given operating system. Python has multiple VMs available. Cpython is a python VM implemented in C language, similarly Jython is a Java implementation of python VM. At the end of the day, what they should be capable of is to understand python language syntax, be able to compile it to bytecode and be able to run that bytecode. You can implement a python VM in any language! (And people do so, just because it can be done) The Operating System +------------------------------------+ | | | | | | hello_world.py Python bytecode | Python VM Process | | | +----------------+ +----------------+ | +----------------+ | |print(... | COMPILE |LOAD_CONST... | | |Reads bytecode | | | +--------------->+ +------------------->+line by line | | | | | | | |and executes. | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | | | | | | hello_world.c OS Specific machinecode | A New Process | | | +----------------+ +----------------+ | +----------------+ | |void main() { | COMPILE | binary contents| | | binary contents| | | +--------------->+ +------------------->+ | | | | | | | | | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | (binary contents | | runs as is) | | | | | +------------------------------------+ Two things to note for above diagram: Generally, when we run a python program, a python VM process is started which reads the python source code, compiles it to byte code and run it in a single step. Compiling is not a separate step. Shown only for illustration purpose. Binaries generated for C like languages are not exactly run as is. 
Since there are multiple types of binaries (eg: ELF), there are more complicated steps involved in order to run a binary but we will not go into that since all that is done at OS level.","title":"Introduction"},{"location":"level101/python_web/intro/#python-and-the-web","text":"","title":"Python and The Web"},{"location":"level101/python_web/intro/#prerequisites","text":"Basic understanding of python language. Basic familiarity with flask framework.","title":"Prerequisites"},{"location":"level101/python_web/intro/#what-to-expect-from-this-course","text":"This course is divided into two high level parts. In the first part, assuming familiarity with python language\u2019s basic operations and syntax usage, we will dive a little deeper into understanding python as a language. We will compare python with other programming languages that you might already know like Java and C. We will also explore concepts of Python objects and with help of that, explore python features like decorators. In the second part which will revolve around the web, and also assume familiarity with the Flask framework, we will start from the socket module and work with HTTP requests. This will demystify how frameworks like flask work internally. And to introduce SRE flavour to the course, we will design, develop and deploy (in theory) a URL shortening application. 
We will emphasize parts of the whole process that are more important as an SRE of the said app/service.","title":"What to expect from this course"},{"location":"level101/python_web/intro/#what-is-not-covered-under-this-course","text":"Extensive knowledge of python internals and advanced python.","title":"What is not covered under this course"},{"location":"level101/python_web/intro/#lab-environment-setup","text":"Have latest version of python installed","title":"Lab Environment Setup"},{"location":"level101/python_web/intro/#course-contents","text":"The Python Language Some Python Concepts Python Gotchas Python and Web Sockets Flask The URL Shortening App Design Scaling The App Monitoring The App","title":"Course Contents"},{"location":"level101/python_web/intro/#the-python-language","text":"Assuming you know a little bit of C/C++ and Java, let's try to discuss the following questions in context of those two languages and python. You might have heard that C/C++ is a compiled language while python is an interpreted language. Generally, with compiled language we first compile the program and then run the executable while in case of python we run the source code directly like python hello_world.py . While Java, being an interpreted language, still has a separate compilation step and then its run. So what's really the difference?","title":"The Python Language"},{"location":"level101/python_web/intro/#compiled-vs-interpreted","text":"This might sound a little weird to you: python, in a way is a compiled language! Python has a compiler built-in! It is obvious in the case of java since we compile it using a separate command ie: javac helloWorld.java and it will produce a .class file which we know as a bytecode . Well, python is very similar to that. One difference here is that there is no separate compile command/binary needed to run a python program. What is the difference then, between java and python? Well, Java's compiler is more strict and sophisticated. 
As you might know Java is a statically typed language. So the compiler is written in a way that it can verify types related errors during compile time. While python being a dynamic language, types are not known until a program is run. So in a way, python compiler is dumb (or, less strict). But there indeed is a compile step involved when a python program is run. You might have seen python bytecode files with .pyc extension. Here is how you can see bytecode for a given python program. # Create a Hello World $ echo \"print('hello world')\" > hello_world.py # Making sure it runs $ python3 hello_world.py hello world # The bytecode of the given program $ python -m dis hello_world.py 1 0 LOAD_NAME 0 (print) 2 LOAD_CONST 0 ('hello world') 4 CALL_FUNCTION 1 6 POP_TOP 8 LOAD_CONST 1 (None) 10 RETURN_VALUE Read more about dis module here Now coming to C/C++, there of course is a compiler. But the output is different than what java/python compiler would produce. Compiling a C program would produce what we also know as machine code . As opposed to bytecode.","title":"Compiled vs. Interpreted"},{"location":"level101/python_web/intro/#running-the-programs","text":"We know compilation is involved in all 3 languages we are discussing. Just that the compilers are different in nature and they output different types of content. In case of C/C++, the output is machine code which can be directly read by your operating system. When you execute that program, your OS will know how exactly to run it. But this is not the case with bytecode. Those bytecodes are language specific. Python has its own set of bytecode defined (more in dis module) and so does java. So naturally, your operating system will not know how to run it. To run this bytecode, we have something called Virtual Machines. Ie: The JVM or the Python VM (CPython, Jython). These so called Virtual Machines are the programs which can read the bytecode and run it on a given operating system. Python has multiple VMs available. 
Cpython is a python VM implemented in C language, similarly Jython is a Java implementation of python VM. At the end of the day, what they should be capable of is to understand python language syntax, be able to compile it to bytecode and be able to run that bytecode. You can implement a python VM in any language! (And people do so, just because it can be done) The Operating System +------------------------------------+ | | | | | | hello_world.py Python bytecode | Python VM Process | | | +----------------+ +----------------+ | +----------------+ | |print(... | COMPILE |LOAD_CONST... | | |Reads bytecode | | | +--------------->+ +------------------->+line by line | | | | | | | |and executes. | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | | | | | | hello_world.c OS Specific machinecode | A New Process | | | +----------------+ +----------------+ | +----------------+ | |void main() { | COMPILE | binary contents| | | binary contents| | | +--------------->+ +------------------->+ | | | | | | | | | | | | | | | | | | +----------------+ +----------------+ | +----------------+ | | (binary contents | | runs as is) | | | | | +------------------------------------+ Two things to note for above diagram: Generally, when we run a python program, a python VM process is started which reads the python source code, compiles it to byte code and run it in a single step. Compiling is not a separate step. Shown only for illustration purpose. Binaries generated for C like languages are not exactly run as is. 
Since there are multiple types of binaries (eg: ELF), there are more complicated steps involved in order to run a binary but we will not go into that since all that is done at OS level.","title":"Running The Programs"},{"location":"level101/python_web/python-concepts/","text":"Some Python Concepts Though you are expected to know python and its syntax at basic level, let us discuss some fundamental concepts that will help you understand the python language better. Everything in Python is an object. That includes the functions, lists, dicts, classes, modules, a running function (instance of function definition), everything. In the CPython, it would mean there is an underlying struct variable for each object. In python's current execution context, all the variables are stored in a dict. It'd be a string to object mapping. If you have a function and a float variable defined in the current context, here is how it is handled internally. >>> float_number=42.0 >>> def foo_func(): ... pass ... # NOTICE HOW VARIABLE NAMES ARE STRINGS, stored in a dict >>> locals() {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'float_number': 42.0, 'foo_func': } Python Functions Since functions too are objects, we can see what all attributes a function contains as following >>> def hello(name): ... print(f\"Hello, {name}!\") ... 
>>> dir(hello) ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] While there are a lot of them, let's look at some interesting ones globals This attribute, as the name suggests, has references of global variables. If you ever need to know what all global variables are in the scope of this function, this will tell you. See how the function start seeing the new variable in globals >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': } # adding new global variable >>> GLOBAL=\"g_val\" >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': , 'GLOBAL': 'g_val'} code This is an interesting one! As everything in python is an object, this includes the bytecode too. The compiled python bytecode is a python code object. Which is accessible via __code__ attribute here. A function has an associated code object which carries some interesting information. 
# the file in which function is defined # stdin here since this is run in an interpreter >>> hello.__code__.co_filename '' # number of arguments the function takes >>> hello.__code__.co_argcount 1 # local variable names >>> hello.__code__.co_varnames ('name',) # the function code's compiled bytecode >>> hello.__code__.co_code b't\\x00d\\x01|\\x00\\x9b\\x00d\\x02\\x9d\\x03\\x83\\x01\\x01\\x00d\\x00S\\x00' There are more code attributes which you can enlist by >>> dir(hello.__code__) Decorators Related to functions, python has another feature called decorators. Let's see how that works, keeping everything is an object in mind. Here is a sample decorator: >>> def deco(func): ... def inner(): ... print(\"before\") ... func() ... print(\"after\") ... return inner ... >>> @deco ... def hello_world(): ... print(\"hello world\") ... >>> >>> hello_world() before hello world after Here @deco syntax is used to decorate the hello_world function. It is essentially same as doing >>> def hello_world(): ... print(\"hello world\") ... >>> hello_world = deco(hello_world) What goes inside the deco function might seem complex. Let's try to uncover it. 
Function hello_world is created It is passed to deco function deco creates a new function This new function calls the hello_world function And does a couple other things deco returns the newly created function hello_world is replaced with above function Let's visualize it for better understanding BEFORE function_object (ID: 100) \"hello_world\" +--------------------+ + |print(\"hello_world\")| | | | +--------------> | | | | +--------------------+ WHAT DECORATOR DOES creates a new function (ID: 101) +---------------------------------+ |input arg: function with id: 100 | | | |print(\"before\") | |call function object with id 100 | |print(\"after\") | | | +---------------------------------+ ^ | AFTER | | | \"hello_world\" +-------------+ Note how the hello_world name points to a new function object but that new function object knows the reference (ID) of the original function. Some Gotchas While it is very quick to build prototypes in python and there are tons of libraries available, as the codebase complexity increases, type errors become more common and will get hard to deal with. (There are solutions to that problem like type annotations in python. Check out mypy .) Because python is a dynamically typed language, all types are determined at runtime. And that tends to make python run slower than statically typed languages. Python has something called GIL (global interpreter lock) which is a limiting factor for utilizing multiple CPU cores for parallel computation. Some weird things that python does: https://github.com/satwikkansal/wtfpython","title":"Some Python Concepts"},{"location":"level101/python_web/python-concepts/#some-python-concepts","text":"Though you are expected to know python and its syntax at basic level, let us discuss some fundamental concepts that will help you understand the python language better. Everything in Python is an object. 
That includes the functions, lists, dicts, classes, modules, a running function (instance of function definition), everything. In the CPython, it would mean there is an underlying struct variable for each object. In python's current execution context, all the variables are stored in a dict. It'd be a string to object mapping. If you have a function and a float variable defined in the current context, here is how it is handled internally. >>> float_number=42.0 >>> def foo_func(): ... pass ... # NOTICE HOW VARIABLE NAMES ARE STRINGS, stored in a dict >>> locals() {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'float_number': 42.0, 'foo_func': }","title":"Some Python Concepts"},{"location":"level101/python_web/python-concepts/#python-functions","text":"Since functions too are objects, we can see what all attributes a function contains as following >>> def hello(name): ... print(f\"Hello, {name}!\") ... >>> dir(hello) ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] While there are a lot of them, let's look at some interesting ones","title":"Python Functions"},{"location":"level101/python_web/python-concepts/#globals","text":"This attribute, as the name suggests, has references of global variables. If you ever need to know what all global variables are in the scope of this function, this will tell you. 
See how the function start seeing the new variable in globals >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': } # adding new global variable >>> GLOBAL=\"g_val\" >>> hello.__globals__ {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': , '__spec__': None, '__annotations__': {}, '__builtins__': , 'hello': , 'GLOBAL': 'g_val'}","title":"globals"},{"location":"level101/python_web/python-concepts/#code","text":"This is an interesting one! As everything in python is an object, this includes the bytecode too. The compiled python bytecode is a python code object. Which is accessible via __code__ attribute here. A function has an associated code object which carries some interesting information. # the file in which function is defined # stdin here since this is run in an interpreter >>> hello.__code__.co_filename '' # number of arguments the function takes >>> hello.__code__.co_argcount 1 # local variable names >>> hello.__code__.co_varnames ('name',) # the function code's compiled bytecode >>> hello.__code__.co_code b't\\x00d\\x01|\\x00\\x9b\\x00d\\x02\\x9d\\x03\\x83\\x01\\x01\\x00d\\x00S\\x00' There are more code attributes which you can enlist by >>> dir(hello.__code__)","title":"code"},{"location":"level101/python_web/python-concepts/#decorators","text":"Related to functions, python has another feature called decorators. Let's see how that works, keeping everything is an object in mind. Here is a sample decorator: >>> def deco(func): ... def inner(): ... print(\"before\") ... func() ... print(\"after\") ... return inner ... >>> @deco ... def hello_world(): ... print(\"hello world\") ... >>> >>> hello_world() before hello world after Here @deco syntax is used to decorate the hello_world function. It is essentially same as doing >>> def hello_world(): ... print(\"hello world\") ... 
>>> hello_world = deco(hello_world) What goes inside the deco function might seem complex. Let's try to uncover it. Function hello_world is created It is passed to deco function deco creates a new function This new function calls the hello_world function And does a couple other things deco returns the newly created function hello_world is replaced with above function Let's visualize it for better understanding BEFORE function_object (ID: 100) \"hello_world\" +--------------------+ + |print(\"hello_world\")| | | | +--------------> | | | | +--------------------+ WHAT DECORATOR DOES creates a new function (ID: 101) +---------------------------------+ |input arg: function with id: 100 | | | |print(\"before\") | |call function object with id 100 | |print(\"after\") | | | +---------------------------------+ ^ | AFTER | | | \"hello_world\" +-------------+ Note how the hello_world name points to a new function object but that new function object knows the reference (ID) of the original function.","title":"Decorators"},{"location":"level101/python_web/python-concepts/#some-gotchas","text":"While it is very quick to build prototypes in python and there are tons of libraries available, as the codebase complexity increases, type errors become more common and will get hard to deal with. (There are solutions to that problem like type annotations in python. Check out mypy .) Because python is a dynamically typed language, all types are determined at runtime. And that tends to make python run slower than statically typed languages. Python has something called GIL (global interpreter lock) which is a limiting factor for utilizing multiple CPU cores for parallel computation. Some weird things that python does: https://github.com/satwikkansal/wtfpython","title":"Some Gotchas"},{"location":"level101/python_web/python-web-flask/","text":"Python, Web and Flask Back in the old days, websites were simple. They were simple static html contents. 
A web server would listen on a defined port and, according to the HTTP request received, read files from disk and return them in the response. But since then, complexity has grown and websites are now dynamic. Depending on the request, multiple operations need to be performed, like reading from a database or calling another API, before finally returning some response (HTML data, JSON content etc.) Since serving web requests is no longer a simple task like reading files from disk and returning their contents, we need to process each HTTP request, perform some operations programmatically and construct a response. Sockets Though we have frameworks like flask, HTTP is still a protocol that works over TCP. So let us set up a TCP server, send it an HTTP request and inspect the request's payload. Note that this is not a tutorial on socket programming; what we are doing here is inspecting the HTTP protocol at its ground level and looking at what its contents look like. (Ref: Socket Programming in Python (Guide) on RealPython ) import socket HOST = '127.0.0.1' # Standard loopback interface address (localhost) PORT = 65432 # Port to listen on (non-privileged ports are > 1023) with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: s.bind((HOST, PORT)) s.listen() conn, addr = s.accept() with conn: print('Connected by', addr) while True: data = conn.recv(1024) if not data: break print(data) Then we open localhost:65432 in our web browser and the following would be the output: Connected by ('127.0.0.1', 54719) b'GET / HTTP/1.1\\r\\nHost: localhost:65432\\r\\nConnection: keep-alive\\r\\nDNT: 1\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\\r\\nSec-Fetch-Site: none\\r\\nSec-Fetch-Mode: 
navigate\\r\\nSec-Fetch-User: ?1\\r\\nSec-Fetch-Dest: document\\r\\nAccept-Encoding: gzip, deflate, br\\r\\nAccept-Language: en-US,en;q=0.9\\r\\n\\r\\n' Examine closely and the content will look like the HTTP protocol's format, i.e. HTTP_METHOD URI_PATH HTTP_VERSION HEADERS_SEPARATED_BY_SEPARATOR So though it's a blob of bytes, knowing the HTTP protocol specification, you can parse that string (i.e. split by \\r\\n ) and get meaningful information out of it. Flask Flask, and other such frameworks, do pretty much what we just discussed in the last section (with more sophistication added). They listen on a port on a TCP socket, receive an HTTP request, parse the data according to the protocol format and make it available to you in a convenient manner. For example, you can access headers in flask through request.headers, which is made available to you by splitting the above payload by \\r\\n , as defined in the HTTP protocol. Another example: we register routes in flask with @app.route(\"/hello\") . What flask will do is maintain a registry internally which maps /hello to the function you decorated. Now whenever a request comes in with the /hello route (the second component in the first line, split by space), flask calls the registered function and returns whatever the function returned. It is the same with web frameworks in other languages too. They all work on similar principles. What they basically do is understand the HTTP protocol, parse the HTTP request data and give us programmers a nice interface to work with HTTP requests. Not so much magic, innit?","title":"Python, Web and Flask"},{"location":"level101/python_web/python-web-flask/#python-web-and-flask","text":"Back in the old days, websites were simple. They were simple static HTML content. A web server would listen on a defined port and, according to the HTTP request received, read files from disk and return them in the response. But since then, complexity has grown and websites are now dynamic. 
Depending on the request, multiple operations need to be performed, like reading from a database or calling another API, before finally returning some response (HTML data, JSON content etc.) Since serving web requests is no longer a simple task like reading files from disk and returning their contents, we need to process each HTTP request, perform some operations programmatically and construct a response.","title":"Python, Web and Flask"},{"location":"level101/python_web/python-web-flask/#sockets","text":"Though we have frameworks like flask, HTTP is still a protocol that works over TCP. So let us set up a TCP server, send it an HTTP request and inspect the request's payload. Note that this is not a tutorial on socket programming; what we are doing here is inspecting the HTTP protocol at its ground level and looking at what its contents look like. (Ref: Socket Programming in Python (Guide) on RealPython ) import socket HOST = '127.0.0.1' # Standard loopback interface address (localhost) PORT = 65432 # Port to listen on (non-privileged ports are > 1023) with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: s.bind((HOST, PORT)) s.listen() conn, addr = s.accept() with conn: print('Connected by', addr) while True: data = conn.recv(1024) if not data: break print(data) Then we open localhost:65432 in our web browser and the following would be the output: Connected by ('127.0.0.1', 54719) b'GET / HTTP/1.1\\r\\nHost: localhost:65432\\r\\nConnection: keep-alive\\r\\nDNT: 1\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\\r\\nSec-Fetch-Site: none\\r\\nSec-Fetch-Mode: navigate\\r\\nSec-Fetch-User: ?1\\r\\nSec-Fetch-Dest: document\\r\\nAccept-Encoding: gzip, deflate, br\\r\\nAccept-Language: 
en-US,en;q=0.9\\r\\n\\r\\n' Examine closely and the content will look like the HTTP protocol's format, i.e. HTTP_METHOD URI_PATH HTTP_VERSION HEADERS_SEPARATED_BY_SEPARATOR So though it's a blob of bytes, knowing the HTTP protocol specification, you can parse that string (i.e. split by \\r\\n ) and get meaningful information out of it.","title":"Sockets"},{"location":"level101/python_web/python-web-flask/#flask","text":"Flask, and other such frameworks, do pretty much what we just discussed in the last section (with more sophistication added). They listen on a port on a TCP socket, receive an HTTP request, parse the data according to the protocol format and make it available to you in a convenient manner. For example, you can access headers in flask through request.headers, which is made available to you by splitting the above payload by \\r\\n , as defined in the HTTP protocol. Another example: we register routes in flask with @app.route(\"/hello\") . What flask will do is maintain a registry internally which maps /hello to the function you decorated. Now whenever a request comes in with the /hello route (the second component in the first line, split by space), flask calls the registered function and returns whatever the function returned. It is the same with web frameworks in other languages too. They all work on similar principles. What they basically do is understand the HTTP protocol, parse the HTTP request data and give us programmers a nice interface to work with HTTP requests. Not so much magic, innit?","title":"Flask"},{"location":"level101/python_web/sre-conclusion/","text":"Conclusion Scaling The App The design and development is just a part of the journey. We will need to set up continuous integration and continuous delivery pipelines sooner or later. And we have to deploy this app somewhere. Initially we can start by deploying this app on one virtual machine on any cloud provider. 
But this is a single point of failure, which is something we never allow as an SRE (or even as an engineer). So an improvement here can be having multiple instances of the application deployed behind a load balancer. This certainly prevents the problems of one machine going down. Scaling here would mean adding more instances behind the load balancer. But this is scalable only up to a certain point. After that, other bottlenecks in the system will start appearing, e.g. the DB will become the bottleneck, or perhaps the load balancer itself. How do you know what the bottleneck is? You need to have observability into each aspect of the application architecture. Only after you have metrics will you be able to know what is going wrong and where. What gets measured, gets fixed! Get deeper insights into scaling from School Of SRE's Scalability module and, after going through it, apply your learnings and takeaways to this app. Think about how we will make this app geographically distributed, highly available and scalable. Monitoring Strategy Once we have our application deployed, it will be working OK. But not forever. Reliability is in the title of our job and we make systems reliable by designing them in a certain way. But things will still go down. Machines will fail. Disks will behave weirdly. Buggy code will get pushed to production. And all these possible scenarios will make the system less reliable. So what do we do? We monitor! We keep an eye on the system's health and if anything is not going as expected, we want to get alerted. Now let's think in terms of the given URL shortening app. We need to monitor it. And we would want to get notified in case something goes wrong. But we first need to decide what that something is that we want to keep an eye on. Since it's a web app serving HTTP requests, we want to keep an eye on HTTP status codes and latencies. Request volume again is a good candidate: if the app is receiving an unusual amount of traffic, something might be off. 
We also want to keep an eye on the database, so depending on the database solution chosen, we would track query times, volumes, disk usage etc. Finally, there also needs to be some external monitoring which runs periodic tests from devices outside of your data centers. This emulates customers and ensures that, from the customer's point of view, the system is working as expected. Applications in SRE role In the world of SRE, Python is a widely used language for small scripts and tooling developed for various purposes. Since tooling developed by SREs works with critical pieces of infrastructure and has great power (to bring things down), it is important to know what you are doing while using a programming language and its features. It is equally important to know the language and its characteristics while debugging issues. As an SRE, having a deeper understanding of the Python language has helped me a lot to debug very sneaky bugs and be generally more aware and informed while making certain design decisions. While developing tools may or may not be part of an SRE's job, supporting tools or services is more likely to be a daily duty. Building an application or tool is just a small part of productionization. While a lot certainly goes into the design of the application itself to make it more robust, as an SRE you are responsible for its reliability and stability once it is deployed and running. And to ensure that, you\u2019d need to understand the application first and then come up with a strategy to monitor it properly and be prepared for various failure scenarios. Optional Exercises Make a decorator that will cache function return values depending on input parameters. Host the URL shortening app on any cloud provider. Set up monitoring using one of the many tools available, like Catchpoint or Datadog. Create a minimal flask-like framework on top of TCP sockets. 
Conclusion This module, in the first part, aims to make you more aware of what happens when you choose Python as your programming language and when you run a Python program. With the knowledge of how Python handles things internally as objects, a lot of seemingly magic things in Python will start to make more sense. The second part first explains how a framework like flask works, using the existing knowledge of protocols like TCP and HTTP. It then touches on the whole application development lifecycle, including the SRE parts of it. While the design and architecture areas considered are not exhaustive, it will give a good overview of things that are also important to an SRE and why they are important.","title":"Conclusion"},{"location":"level101/python_web/sre-conclusion/#conclusion","text":"","title":"Conclusion"},{"location":"level101/python_web/sre-conclusion/#scaling-the-app","text":"The design and development is just a part of the journey. We will need to set up continuous integration and continuous delivery pipelines sooner or later. And we have to deploy this app somewhere. Initially we can start by deploying this app on one virtual machine on any cloud provider. But this is a single point of failure, which is something we never allow as an SRE (or even as an engineer). So an improvement here can be having multiple instances of the application deployed behind a load balancer. This certainly prevents the problems of one machine going down. Scaling here would mean adding more instances behind the load balancer. But this is scalable only up to a certain point. After that, other bottlenecks in the system will start appearing, e.g. the DB will become the bottleneck, or perhaps the load balancer itself. How do you know what the bottleneck is? You need to have observability into each aspect of the application architecture. Only after you have metrics will you be able to know what is going wrong and where. 
What gets measured, gets fixed! Get deeper insights into scaling from School Of SRE's Scalability module and, after going through it, apply your learnings and takeaways to this app. Think about how we will make this app geographically distributed, highly available and scalable.","title":"Scaling The App"},{"location":"level101/python_web/sre-conclusion/#monitoring-strategy","text":"Once we have our application deployed, it will be working OK. But not forever. Reliability is in the title of our job and we make systems reliable by designing them in a certain way. But things will still go down. Machines will fail. Disks will behave weirdly. Buggy code will get pushed to production. And all these possible scenarios will make the system less reliable. So what do we do? We monitor! We keep an eye on the system's health and if anything is not going as expected, we want to get alerted. Now let's think in terms of the given URL shortening app. We need to monitor it. And we would want to get notified in case something goes wrong. But we first need to decide what that something is that we want to keep an eye on. Since it's a web app serving HTTP requests, we want to keep an eye on HTTP status codes and latencies. Request volume again is a good candidate: if the app is receiving an unusual amount of traffic, something might be off. We also want to keep an eye on the database, so depending on the database solution chosen, we would track query times, volumes, disk usage etc. Finally, there also needs to be some external monitoring which runs periodic tests from devices outside of your data centers. This emulates customers and ensures that, from the customer's point of view, the system is working as expected.","title":"Monitoring Strategy"},{"location":"level101/python_web/sre-conclusion/#applications-in-sre-role","text":"In the world of SRE, Python is a widely used language for small scripts and tooling developed for various purposes. 
Since tooling developed by SREs works with critical pieces of infrastructure and has great power (to bring things down), it is important to know what you are doing while using a programming language and its features. It is equally important to know the language and its characteristics while debugging issues. As an SRE, having a deeper understanding of the Python language has helped me a lot to debug very sneaky bugs and be generally more aware and informed while making certain design decisions. While developing tools may or may not be part of an SRE's job, supporting tools or services is more likely to be a daily duty. Building an application or tool is just a small part of productionization. While a lot certainly goes into the design of the application itself to make it more robust, as an SRE you are responsible for its reliability and stability once it is deployed and running. And to ensure that, you\u2019d need to understand the application first and then come up with a strategy to monitor it properly and be prepared for various failure scenarios.","title":"Applications in SRE role"},{"location":"level101/python_web/sre-conclusion/#optional-exercises","text":"Make a decorator that will cache function return values depending on input parameters. Host the URL shortening app on any cloud provider. Set up monitoring using one of the many tools available, like Catchpoint or Datadog. Create a minimal flask-like framework on top of TCP sockets.","title":"Optional Exercises"},{"location":"level101/python_web/sre-conclusion/#conclusion_1","text":"This module, in the first part, aims to make you more aware of what happens when you choose Python as your programming language and when you run a Python program. With the knowledge of how Python handles things internally as objects, a lot of seemingly magic things in Python will start to make more sense. 
The second part first explains how a framework like flask works, using the existing knowledge of protocols like TCP and HTTP. It then touches on the whole application development lifecycle, including the SRE parts of it. While the design and architecture areas considered are not exhaustive, it will give a good overview of things that are also important to an SRE and why they are important.","title":"Conclusion"},{"location":"level101/python_web/url-shorten-app/","text":"The URL Shortening App Let's build a very simple URL shortening app using flask and try to incorporate all aspects of the development process, including the reliability aspects. We will not be building the UI, and we will come up with a minimal set of APIs that will be enough for the app to function well. Design We don't jump directly to coding. The first thing we do is gather requirements. Come up with an approach. Have the approach/design reviewed by peers. Evolve, iterate, document the decisions and tradeoffs. And then finally implement. While we will not write a full-blown design document here, we will raise certain questions that are important to the design. 1. High Level Operations and API Endpoints Since it's a URL shortening app, we will need an API for generating the shortened link given an original link, and an API/endpoint which will accept the shortened link and redirect to the original URL. We are not including the user aspect of the app, to keep things minimal. These two APIs should make the app functional and usable by anyone. 2. How to shorten? Given a URL, we will need to generate a shortened version of it. One approach could be using random characters for each link. Another thing that can be done is to use some sort of hashing algorithm. The benefit here is that we will reuse the same hash for the same link, i.e. if a lot of people are shortening https://www.linkedin.com they will all get the same value, compared to multiple entries in the DB if random characters were chosen. 
What about hash collisions? Even with the random characters approach, collisions can happen, though with a lower probability. And we need to be mindful of them. In that case we might want to prepend/append some random value to the string to avoid conflict. Also, the choice of hash algorithm matters. We will need to analyze algorithms, their CPU requirements and their characteristics, and choose the one that suits best. 3. Is URL Valid? Given a URL to shorten, how do we verify if the URL is valid? Do we even verify or validate? One basic check that can be done is to see if the URL matches a regex of a URL. To go even further we can try opening/visiting the URL. But there are certain gotchas here. We need to define success criteria, e.g. HTTP 200 means it is valid. What if the URL is in a private network? What if the URL is temporarily down? 4. Storage Finally, storage. Where will we store the data that we will generate over time? There are multiple database solutions available and we will need to choose the one that suits this app the most. A relational database like MySQL would be a fair choice, but be sure to check out School of SRE's SQL database section and NoSQL databases section for deeper insights into making a more informed decision. 5. Other We are not accounting for users in our app, or for other possible features like rate limiting, customized links etc., but they will eventually come up with time. Depending on the requirements, they too might need to get incorporated. The minimal working code is given below for reference but I'd encourage you to come up with your own. from flask import Flask, redirect, request from hashlib import md5 app = Flask(\"url_shortener\") mapping = {} @app.route(\"/shorten\", methods=[\"POST\"]) def shorten(): global mapping payload = request.json if \"url\" not in payload: return \"Missing URL Parameter\", 400 # TODO: check if URL is valid hash_ = md5() hash_.update(payload[\"url\"].encode()) digest = hash_.hexdigest()[:5] # limiting to 5 chars. 
The lower the limit, the higher the chances of collision if digest not in mapping: mapping[digest] = payload[\"url\"] return f\"Shortened: r/{digest}\\n\" else: # TODO: check for hash collision return f\"Already exists: r/{digest}\\n\" @app.route(\"/r/<hash_>\") def redirect_(hash_): if hash_ not in mapping: return \"URL Not Found\", 404 return redirect(mapping[hash_]) if __name__ == \"__main__\": app.run(debug=True) \"\"\" OUTPUT: ===> SHORTENING $ curl localhost:5000/shorten -H \"content-type: application/json\" --data '{\"url\":\"https://linkedin.com\"}' Shortened: r/a62a4 ===> REDIRECTING, notice the response code 302 and the location header $ curl localhost:5000/r/a62a4 -v * Uses proxy env variable NO_PROXY == '127.0.0.1' * Trying ::1... * TCP_NODELAY set * Connection failed * connect to ::1 port 5000 failed: Connection refused * Trying 127.0.0.1... * TCP_NODELAY set * Connected to localhost (127.0.0.1) port 5000 (#0) > GET /r/a62a4 HTTP/1.1 > Host: localhost:5000 > User-Agent: curl/7.64.1 > Accept: */* > * HTTP 1.0, assume close after body < HTTP/1.0 302 FOUND < Content-Type: text/html; charset=utf-8 < Content-Length: 247 < Location: https://linkedin.com < Server: Werkzeug/0.15.4 Python/3.7.7 < Date: Tue, 27 Oct 2020 09:37:12 GMT < Redirecting...

Redirecting...

* Closing connection 0

You should be redirected automatically to target URL: https://linkedin.com. If not click the link. \"\"\"","title":"The URL Shortening App"},{"location":"level101/python_web/url-shorten-app/#the-url-shortening-app","text":"Let's build a very simple URL shortening app using flask and try to incorporate all aspects of the development process, including the reliability aspects. We will not be building the UI, and we will come up with a minimal set of APIs that will be enough for the app to function well.","title":"The URL Shortening App"},{"location":"level101/python_web/url-shorten-app/#design","text":"We don't jump directly to coding. The first thing we do is gather requirements. Come up with an approach. Have the approach/design reviewed by peers. Evolve, iterate, document the decisions and tradeoffs. And then finally implement. While we will not write a full-blown design document here, we will raise certain questions that are important to the design.","title":"Design"},{"location":"level101/python_web/url-shorten-app/#1-high-level-operations-and-api-endpoints","text":"Since it's a URL shortening app, we will need an API for generating the shortened link given an original link, and an API/endpoint which will accept the shortened link and redirect to the original URL. We are not including the user aspect of the app, to keep things minimal. These two APIs should make the app functional and usable by anyone.","title":"1. High Level Operations and API Endpoints"},{"location":"level101/python_web/url-shorten-app/#2-how-to-shorten","text":"Given a URL, we will need to generate a shortened version of it. One approach could be using random characters for each link. Another thing that can be done is to use some sort of hashing algorithm. The benefit here is that we will reuse the same hash for the same link, i.e. if a lot of people are shortening https://www.linkedin.com they will all get the same value, compared to multiple entries in the DB if random characters were chosen. What about hash collisions? 
Even with the random characters approach, collisions can happen, though with a lower probability. And we need to be mindful of them. In that case we might want to prepend/append some random value to the string to avoid conflict. Also, the choice of hash algorithm matters. We will need to analyze algorithms, their CPU requirements and their characteristics, and choose the one that suits best.","title":"2. How to shorten?"},{"location":"level101/python_web/url-shorten-app/#3-is-url-valid","text":"Given a URL to shorten, how do we verify if the URL is valid? Do we even verify or validate? One basic check that can be done is to see if the URL matches a regex of a URL. To go even further we can try opening/visiting the URL. But there are certain gotchas here. We need to define success criteria, e.g. HTTP 200 means it is valid. What if the URL is in a private network? What if the URL is temporarily down?","title":"3. Is URL Valid?"},{"location":"level101/python_web/url-shorten-app/#4-storage","text":"Finally, storage. Where will we store the data that we will generate over time? There are multiple database solutions available and we will need to choose the one that suits this app the most. A relational database like MySQL would be a fair choice, but be sure to check out School of SRE's SQL database section and NoSQL databases section for deeper insights into making a more informed decision.","title":"4. Storage"},{"location":"level101/python_web/url-shorten-app/#5-other","text":"We are not accounting for users in our app, or for other possible features like rate limiting, customized links etc., but they will eventually come up with time. Depending on the requirements, they too might need to get incorporated. The minimal working code is given below for reference but I'd encourage you to come up with your own. 
from flask import Flask, redirect, request from hashlib import md5 app = Flask(\"url_shortener\") mapping = {} @app.route(\"/shorten\", methods=[\"POST\"]) def shorten(): global mapping payload = request.json if \"url\" not in payload: return \"Missing URL Parameter\", 400 # TODO: check if URL is valid hash_ = md5() hash_.update(payload[\"url\"].encode()) digest = hash_.hexdigest()[:5] # limiting to 5 chars. The lower the limit, the higher the chances of collision if digest not in mapping: mapping[digest] = payload[\"url\"] return f\"Shortened: r/{digest}\\n\" else: # TODO: check for hash collision return f\"Already exists: r/{digest}\\n\" @app.route(\"/r/<hash_>\") def redirect_(hash_): if hash_ not in mapping: return \"URL Not Found\", 404 return redirect(mapping[hash_]) if __name__ == \"__main__\": app.run(debug=True) \"\"\" OUTPUT: ===> SHORTENING $ curl localhost:5000/shorten -H \"content-type: application/json\" --data '{\"url\":\"https://linkedin.com\"}' Shortened: r/a62a4 ===> REDIRECTING, notice the response code 302 and the location header $ curl localhost:5000/r/a62a4 -v * Uses proxy env variable NO_PROXY == '127.0.0.1' * Trying ::1... * TCP_NODELAY set * Connection failed * connect to ::1 port 5000 failed: Connection refused * Trying 127.0.0.1... * TCP_NODELAY set * Connected to localhost (127.0.0.1) port 5000 (#0) > GET /r/a62a4 HTTP/1.1 > Host: localhost:5000 > User-Agent: curl/7.64.1 > Accept: */* > * HTTP 1.0, assume close after body < HTTP/1.0 302 FOUND < Content-Type: text/html; charset=utf-8 < Content-Length: 247 < Location: https://linkedin.com < Server: Werkzeug/0.15.4 Python/3.7.7 < Date: Tue, 27 Oct 2020 09:37:12 GMT < Redirecting...

Redirecting...

* Closing connection 0

You should be redirected automatically to target URL: https://linkedin.com. If not click the link. \"\"\"","title":"5. Other"},{"location":"level101/security/conclusion/","text":"Conclusion Now that you have completed this course on Security you are now aware of the possible security threats to computer systems & networks. Not only that, but you are now better able to protect your systems as well as recommend security measures to others. This course provides fundamental everyday knowledge on security domain which will also help you keep security at the top of your priority. Other Resources Some books that would be a great resource Holistic Info-Sec for Web Developers https://holisticinfosecforwebdevelopers.com/ - Free and downloadable book series with very broad and deep coverage of what Web Developers and DevOps Engineers need to know in order to create robust, reliable, maintainable and secure software, networks and other, that are delivered continuously, on time, with no nasty surprises Docker Security - Quick Reference: For DevOps Engineers https://leanpub.com/dockersecurity-quickreference - A book on understanding the Docker security defaults, how to improve them (theory and practical), along with many tools and techniques. How to Hack Like a Legend https://amzn.to/2uWh1Up - A hacker\u2019s tale breaking into a secretive offshore company, Sparc Flow, 2018 How to Investigate Like a Rockstar https://books2read.com/u/4jDWoZ - Live a real crisis to master the secrets of forensic analysis, Sparc Flow, 2017 Real World Cryptography https://www.manning.com/books/real-world-cryptography - This early-access book teaches you applied cryptographic techniques to understand and apply security at every level of your systems and applications. 
AWS Security https://www.manning.com/books/aws-security?utm_source=github&utm_medium=organic&utm_campaign=book_shields_aws_1_31_20 - This early-access book covers common AWS security issues and best practices for access policies, data protection, auditing, continuous monitoring, and incident response. Post Training asks/ Further Reading CTF Events like : https://github.com/apsdehal/awesome-ctf Penetration Testing : https://github.com/enaqx/awesome-pentest Threat Intelligence : https://github.com/hslatman/awesome-threat-intelligence Threat Detection & Hunting : https://github.com/0x4D31/awesome-threat-detection Web Security: https://github.com/qazbnm456/awesome-web-security Building Secure and Reliable Systems : https://landing.google.com/sre/resources/foundationsandprinciples/srs-book/","title":"Conclusion"},{"location":"level101/security/conclusion/#conclusion","text":"Now that you have completed this course on Security, you are aware of the possible security threats to computer systems & networks. Not only that, but you are now better able to protect your systems as well as recommend security measures to others. 
This course provides fundamental, everyday knowledge of the security domain, which will also help you keep security at the top of your priorities.","title":"Conclusion"},{"location":"level101/security/conclusion/#other-resources","text":"Some books that would be a great resource Holistic Info-Sec for Web Developers https://holisticinfosecforwebdevelopers.com/ - Free and downloadable book series with very broad and deep coverage of what Web Developers and DevOps Engineers need to know in order to create robust, reliable, maintainable and secure software, networks and other, that are delivered continuously, on time, with no nasty surprises Docker Security - Quick Reference: For DevOps Engineers https://leanpub.com/dockersecurity-quickreference - A book on understanding the Docker security defaults, how to improve them (theory and practical), along with many tools and techniques. How to Hack Like a Legend https://amzn.to/2uWh1Up - A hacker\u2019s tale breaking into a secretive offshore company, Sparc Flow, 2018 How to Investigate Like a Rockstar https://books2read.com/u/4jDWoZ - Live a real crisis to master the secrets of forensic analysis, Sparc Flow, 2017 Real World Cryptography https://www.manning.com/books/real-world-cryptography - This early-access book teaches you applied cryptographic techniques to understand and apply security at every level of your systems and applications. 
AWS Security https://www.manning.com/books/aws-security?utm_source=github&utm_medium=organic&utm_campaign=book_shields_aws_1_31_20 - This early-access book covers common AWS security issues and best practices for access policies, data protection, auditing, continuous monitoring, and incident response.","title":"Other Resources"},{"location":"level101/security/conclusion/#post-training-asks-further-reading","text":"CTF Events like : https://github.com/apsdehal/awesome-ctf Penetration Testing : https://github.com/enaqx/awesome-pentest Threat Intelligence : https://github.com/hslatman/awesome-threat-intelligence Threat Detection & Hunting : https://github.com/0x4D31/awesome-threat-detection Web Security: https://github.com/qazbnm456/awesome-web-security Building Secure and Reliable Systems : https://landing.google.com/sre/resources/foundationsandprinciples/srs-book/","title":"Post Training asks/ Further Reading"},{"location":"level101/security/fundamentals/","text":"Part I: Fundamentals Introduction to Security Overview for SRE If you look closely, both Site Reliability Engineering and Security Engineering are concerned with keeping a system usable. Issues like broken releases, capacity shortages, and misconfigurations can make a system unusable (at least temporarily). Security or privacy incidents that break the trust of users also undermine the usefulness of a system. Consequently, system security should be top of mind for SREs. SREs should be involved in both significant design discussions and actual system changes. They play a significant role in system design and are therefore often the first line of defence. SREs help prevent bad design and implementations that can affect the overall security of the infrastructure. Successfully designing, implementing, and maintaining systems requires a commitment to the full system lifecycle. This commitment is possible only when security and reliability are central elements in the architecture of systems. 
Core Pillars of Information Security : Confidentiality \u2013 only allow access to data for which the user is permitted Integrity \u2013 ensure data is not tampered or altered by unauthorized users Availability \u2013 ensure systems and data are available to authorized users when they need it Thinking like a Security Engineer When starting a new application or re-factoring an existing application, you should consider each functional feature, and consider: Is the process surrounding this feature as safe as possible? In other words, is this a flawed process? If I were evil, how would I abuse this feature? Or more specifically failing to address how a feature can be abused can cause design flaws. Is the feature required to be on by default? If so, are there limits or options that could help reduce the risk from this feature? Security Principles By OWASP (Open Web Application Security Project) Minimize attack surface area : Every feature that is added to an application adds a certain amount of risk to the overall application. The aim of secure development is to reduce the overall risk by reducing the attack surface area. For example, a web application implements online help with a search function. The search function may be vulnerable to SQL injection attacks. If the help feature was limited to authorized users, the attack likelihood is reduced. If the help feature\u2019s search function was gated through centralized data validation routines, the ability to perform SQL injection is dramatically reduced. However, if the help feature was re-written to eliminate the search function (through a better user interface, for example), this almost eliminates the attack surface area, even if the help feature was available to the Internet at large. Establish secure defaults: There are many ways to deliver an \u201cout of the box\u201d experience for users. 
However, by default, the experience should be secure, and it should be up to the user to reduce their security \u2013 if they are allowed. For example, by default, password ageing and complexity should be enabled. Users might be allowed to turn these two features off to simplify their use of the application and increase their risk. Default Passwords of routers, IoT devices should be changed Principle of Least privilege The principle of least privilege recommends that accounts have the least amount of privilege required to perform their business processes. This encompasses user rights, resource permissions such as CPU limits, memory, network, and file system permissions. For example, if a middleware server only requires access to the network, read access to a database table, and the ability to write to a log, this describes all the permissions that should be granted. Under no circumstances should the middleware be granted administrative privileges. Principle of Defense in depth The principle of defence in depth suggests that where one control would be reasonable, more controls that approach risks in different fashions are better. Controls, when used in depth, can make severe vulnerabilities extraordinarily difficult to exploit and thus unlikely to occur. With secure coding, this may take the form of tier-based validation, centralized auditing controls, and requiring users to be logged on all pages. For example, a flawed administrative interface is unlikely to be vulnerable to an anonymous attack if it correctly gates access to production management networks, checks for administrative user authorization, and logs all access. Fail securely Applications regularly fail to process transactions for many reasons. How they fail can determine if an application is secure or not. 
``` boolean is_admin = true; /* insecure default, kept here to illustrate the flaw */ try { code_which_may_fail(); is_admin = is_user_assigned_role(\"Administrator\"); } catch (Exception err) { log.error(err.toString()); } ``` - If either code_which_may_fail() or is_user_assigned_role() fails or throws an exception, is_admin keeps its initial value of true, so the user is an admin by default. This is obviously a security risk; the safe pattern is to initialise is_admin to false and grant the privilege only after the role check has succeeded. Don\u2019t trust services Many organizations utilize the processing capabilities of third-party partners, who more than likely have different security policies and posture than you. It is unlikely that you can influence or control any external third party, whether they are home users or major suppliers or partners. Therefore, the implicit trust of externally run systems is not warranted. All external systems should be treated similarly. For example, a loyalty program provider provides data that is used by Internet Banking, providing the number of reward points and a small list of potential redemption items. However, the data should be checked to ensure that it is safe to display to end-users and that the reward points are a positive number, and not improbably large. Separation of duties The key to fraud control is the separation of duties. For example, someone who requests a computer cannot also sign for it, nor should they directly receive the computer. This prevents the user from requesting many computers and claiming they never arrived. Certain roles have different levels of trust than normal users. In particular, administrators are different from normal users. In general, administrators should not be users of the application. For example, an administrator should be able to turn the system on or off and set the password policy, but shouldn\u2019t be able to log on to the storefront as a super privileged user, such as being able to \u201cbuy\u201d goods on behalf of other users. Avoid security by obscurity Security through obscurity is a weak security control, and nearly always fails when it is the only control. 
This is not to say that keeping secrets is a bad idea, it simply means that the security of systems should not be reliant upon keeping details hidden. For example, the security of an application should not rely upon knowledge of the source code being kept secret. The security should rely upon many other factors, including reasonable password policies, defence in depth, business transaction limits, solid network architecture, and fraud, and audit controls. A practical example is Linux. Linux\u2019s source code is widely available, and yet when properly secured, Linux is a secure and robust operating system. Keep security simple Attack surface area and simplicity go hand in hand. Certain software engineering practices prefer overly complex approaches to what would otherwise be a relatively straightforward and simple design. Developers should avoid the use of double negatives and complex architectures when a simpler approach would be faster and simpler. For example, although it might be fashionable to have a slew of singleton entity beans running on a separate middleware server, it is more secure and faster to simply use global variables with an appropriate mutex mechanism to protect against race conditions. Fix security issues correctly Once a security issue has been identified, it is important to develop a test for it and to understand the root cause of the issue. When design patterns are used, the security issue is likely widespread amongst all codebases, so developing the right fix without introducing regressions is essential. For example, a user has found that they can see another user\u2019s balance by adjusting their cookie. The fix seems to be relatively straightforward, but as the cookie handling code is shared among all applications, a change to just one application will trickle through to all other applications. The fix must, therefore, be tested on all affected applications. 
Reliability & Security Reliability and security are both crucial components of a truly trustworthy system, but building systems that are both reliable and secure is difficult. While the requirements for reliability and security share many common properties, they also require different design considerations. It is easy to miss the subtle interplay between reliability and security that can cause unexpected outcomes. For example, a password management application failure was triggered by a reliability problem (poor load-balancing and load-shedding strategies), and its recovery was later complicated by multiple measures designed to increase the security of the system: an HSM that needs to be plugged into the server racks to authenticate, with the HSM token locked inside a case, and each such measure can prolong the recovery further. Authentication vs Authorization Authentication is the act of validating that users are who they claim to be. Passwords are the most common authentication factor: if a user enters the correct password, the system assumes the identity is valid and grants access. Other technologies such as One-Time Pins, authentication apps, and even biometrics can also be used to authenticate identity. In some instances, systems require the successful verification of more than one factor before granting access. This multi-factor authentication (MFA) requirement is often deployed to increase security beyond what passwords alone can provide. Authorization in system security is the process of giving the user permission to access a specific resource or function. This term is often used interchangeably with access control or client privilege. Giving someone permission to download a particular file on a server or providing individual users with administrative access to an application are good examples. 
In secure environments, authorization must always follow authentication, users should first prove that their identities are genuine before an organization\u2019s administrators grant them access to the requested resources. Common authentication flow (local authentication) The user registers using an identifier like username/email/mobile The application stores user credentials in the database The application sends a verification email/message to validate the registration Post successful registration, the user enters credentials for logging in On successful authentication, the user is allowed access to specific resources OpenID/OAuth OpenID is an authentication protocol that allows us to authenticate users without using a local auth system. In such a scenario, a user has to be registered with an OpenID Provider and the same provider should be integrated with the authentication flow of your application. To verify the details, we have to forward the authentication requests to the provider. On successful authentication, we receive a success message and/or profile details with which we can execute the necessary flow. OAuth is an authorization mechanism that allows your application user access to a provider(Gmail/Facebook/Instagram/etc). On successful response, we (your application) receive a token with which the application can access certain APIs on behalf of a user. OAuth is convenient in case your business use case requires some certain user-facing APIs like access to Google Drive or sending tweets on your behalf. Most OAuth 2.0 providers can be used for pseudo authentication. Having said that, it can get pretty complicated if you are using multiple OAuth providers to authenticate users on top of the local authentication system. 
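The local authentication flow described earlier in this section (register, store credentials, log in) can be sketched roughly as follows. This is a minimal illustration and not a production design: the USERS dictionary is a hypothetical stand-in for the application database, the function names and the iteration count are assumptions, and the email verification step is omitted. The one point it tries to show is that the application should store a salted, slow hash of the password and never the password itself.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory stand-in for the application's user database.
USERS = {}

# Assumed work factor for PBKDF2; tune it for your own hardware.
ITERATIONS = 200_000


def register(username, password):
    # Store a random salt and a salted, slow hash instead of the password.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)
    USERS[username] = (salt, digest)


def authenticate(username, password):
    # Recompute the hash with the stored salt and compare in constant time.
    record = USERS.get(username)
    if record is None:
        return False
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


register('alice', 'correct horse battery staple')
print(authenticate('alice', 'correct horse battery staple'))
print(authenticate('alice', 'wrong password'))
```

Only after authenticate succeeds would the application go on to the authorization step, i.e. deciding which resources this particular user may access.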
Cryptography Cryptography is the science and study of hiding text in such a way that only the intended recipients or authorized persons can read it; historically this has included techniques such as invisible ink and the mechanical cryptography machines of the past. Cryptography is necessary for securing critical or proprietary information and is used to encode private data messages by converting some plain text into ciphertext. At its core, there are two ways of doing this, and more advanced methods are all built upon them. Ciphers Ciphers are the cornerstone of cryptography. A cipher is a set of algorithms that performs encryption or decryption on a message. An encryption algorithm (E) takes a secret key (k) and a message (m) and produces a ciphertext (c). Similarly, a decryption algorithm (D) takes the secret key (k) and the resulting ciphertext (c) and recovers the message (m). They are represented as follows: E(k,m) = c D(k,c) = m This also means that for it to be a cipher, it must satisfy the consistency equation as follows, making it possible to decrypt. D(k,E(k,m)) = m Stream Ciphers: The message is broken into characters or bits and enciphered with a key or keystream (which should be random and generated independently of the message stream) that is as long as the plaintext bitstream. If the keystream is truly random and never reused, this scheme is unbreakable unless the keystream is acquired, making it unconditionally secure. The keystream must be provided to both parties in a secure way to prevent its release. Block Ciphers: Block ciphers process messages in blocks, each of which is then encrypted or decrypted. A block cipher is a symmetric cipher in which blocks of plaintext are treated as a whole and used to produce ciphertext blocks. The block cipher takes blocks that are b bits long and encrypts them to blocks that are also b bits long. Block sizes are typically 64 or 128 bits long. 
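The consistency equation D(k,E(k,m)) = m can be checked with a toy stream cipher. The sketch below is only an illustration under stated assumptions: the encrypt and decrypt helpers are hypothetical names, the keystream is a random byte string exactly as long as the message, and XOR is used as the combining step. It is not meant as a usable encryption scheme.

```python
import secrets


def encrypt(key, message):
    # E(k, m) = c: XOR each message byte with the matching keystream byte.
    if len(key) != len(message):
        raise ValueError('keystream must be as long as the message')
    return bytes(k ^ m for k, m in zip(key, message))


def decrypt(key, ciphertext):
    # D(k, c) = m: XOR is its own inverse, so decryption reuses the same step.
    return encrypt(key, ciphertext)


message = b'attack at dawn'
key = secrets.token_bytes(len(message))  # random keystream, used only once
ciphertext = encrypt(key, message)

# Consistency equation: decrypting the ciphertext returns the original message.
assert decrypt(key, ciphertext) == message
```

Reusing the same keystream for a second message would let an attacker XOR the two ciphertexts together and cancel the key, which is why the keystream has to be random and used only once.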
Encryption Secret Key (Symmetric Key) : the same key is used for encryption and decryption Public Key (Asymmetric Key) in an asymmetric, the encryption and decryption keys are different but related. The encryption key is known as the public key and the decryption key is known as the private key. The public and private keys are known as a key pair. Symmetric Key Encryption DES The Data Encryption Standard (DES) has been the worldwide encryption standard for a long time. IBM developed DES in 1975, and it has held up remarkably well against years of cryptanalysis. DES is a symmetric encryption algorithm with a fixed key length of 56 bits. The algorithm is still good, but because of the short key length, it is susceptible to brute-force attacks that have sufficient resources. DES usually operates in block mode, whereby it encrypts data in 64-bit blocks. The same algorithm and key are used for both encryption and decryption. Because DES is based on simple mathematical functions, it can be easily implemented and accelerated in hardware. Triple DES With advances in computer processing power, the original 56-bit DES key became too short to withstand an attacker with even a limited budget. One way of increasing the effective key length of DES without changing the well-analyzed algorithm itself is to use the same algorithm with different keys several times in a row. The technique of applying DES three times in a row to a plain text block is called Triple DES (3DES). The 3DES technique is shown in Figure. Brute-force attacks on 3DES are considered unfeasible today. Because the basic algorithm has been tested in the field for more than 25 years, it is considered to be more trustworthy than its predecessor. AES On October 2, 2000, The U.S. National Institute of Standards and Technology (NIST) announced the selection of the Rijndael cipher as the AES algorithm. This cipher, developed by Joan Daemen and Vincent Rijmen, has a variable block length and key length. 
The algorithm currently specifies how to use keys with a length of 128, 192, or 256 bits to encrypt blocks with a length of 128, 192, or 256 bits (all nine combinations of key length and block length are possible). Both block and key lengths can be extended easily to multiples of 32 bits. AES was chosen to replace DES and 3DES because they are either too weak (DES, in terms of key length) or too slow (3DES) to run on modern, efficient hardware. AES is more efficient and much faster, usually by a factor of 5 compared to DES on the same hardware. AES is also more suitable for high throughput, especially if pure software encryption is used. However, AES is a relatively young algorithm, and as the golden rule of cryptography states, \u201cA more mature algorithm is always more trusted.\u201d Asymmetric Key Algorithm In a symmetric key system, Alice first puts the secret message in a box and then padlocks the box using a lock to which she has a key. She then sends the box to Bob through regular mail. When Bob receives the box, he uses an identical copy of Alice's key (which he has obtained previously) to open the box and read the message. In an asymmetric key system, instead of opening the box when he receives it, Bob simply adds his own personal lock to the box and returns the box through public mail to Alice. Alice uses her key to remove her lock and returns the box to Bob, with Bob's lock still in place. Finally, Bob uses his key to remove his lock and reads the message from Alice. The critical advantage in an asymmetric system is that Alice never needs to send a copy of her key to Bob. This reduces the possibility that a third party (for example, an unscrupulous postmaster) can copy the key while it is in transit to Bob, allowing that third party to spy on all future messages sent by Alice. 
In addition, if Bob is careless and allows someone else to copy his key, Alice's messages to Bob are compromised, but Alice's messages to other people remain secret. NOTE : In terms of TLS key exchange, this is the common approach. Diffie-Hellman The protocol has two system parameters, p and g. They are both public and may be used by everybody. Parameter p is a prime number, and parameter g (usually called a generator) is an integer that is smaller than p, but with the following property: For every number n between 1 and p \u2013 1 inclusive, there is a power k of g such that n = g^k mod p. The Diffie-Hellman algorithm is an asymmetric algorithm used to establish a shared secret for a symmetric key algorithm. Nowadays most systems use a hybrid cryptosystem, i.e., a combination of symmetric and asymmetric encryption. Asymmetric encryption is used in the key exchange mechanism to share a secret key, and after the key is shared between sender and receiver, the communication takes place using symmetric encryption. The shared secret key will be used to encrypt the communication. Refer: https://medium.com/@akhigbemmanuel/what-is-the-diffie-hellman-key-exchange-algorithm-84d60025a30d RSA The RSA algorithm is very flexible and has a variable key length where, if necessary, speed can be traded for the level of security of the algorithm. RSA keys have historically been 512 to 2048 bits long; keys shorter than 2048 bits are no longer considered secure. RSA has withstood years of extensive cryptanalysis. Although those years neither proved nor disproved RSA's security, they attest to a confidence level in the algorithm. RSA security is based on the difficulty of factoring very large numbers. If an easy method of factoring these large numbers were discovered, the effectiveness of RSA would be destroyed. 
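To make the link between RSA and factoring concrete, here is a toy sketch with deliberately tiny, textbook-style primes. The values of p, q and e and the helper names are assumptions chosen for illustration; real keys use primes that are hundreds of digits long, together with padding schemes that are not shown here.

```python
# Toy parameters (hypothetical and far too small for real use).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # only computable if the factors p and q are known
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)


def rsa_encrypt(m):
    # Anyone holding the public key (n, e) can encrypt.
    return pow(m, e, n)


def rsa_decrypt(c):
    # Only the holder of the private exponent d can decrypt.
    return pow(c, d, n)


message = 65
ciphertext = rsa_encrypt(message)
assert rsa_decrypt(ciphertext) == message
```

The private exponent d is derived from phi, and phi is trivial to compute once n has been factored into p and q. With n this small the factoring takes no effort, which is exactly why the security of RSA depends on n being too large to factor in practice.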
Refer: https://medium.com/curiositypapers/a-complete-explanation-of-rsa-asymmetric-encryption-742c5971e0f NOTE : RSA Keys can be used for key exchange just like Diffie Hellman Hashing Algorithms Hashing is one of the mechanisms used for data integrity assurance. Hashing is based on a one-way mathematical function, which is relatively easy to compute but significantly harder to reverse. A hash function is a one-way function that takes input data and produces a fixed-length digest (fingerprint) as output. The digest is cryptographically strong; that is, it is computationally infeasible to recover input data from its digest. If the input data changes just a little, the digest (fingerprint) changes substantially in what is called an avalanche effect. More: https://medium.com/@rauljordan/the-state-of-hashing-algorithms-the-why-the-how-and-the-future-b21d5c0440de https://medium.com/@StevieCEllis/the-beautiful-hash-algorithm-f18d9d2b84fb MD5 MD5 is a one-way function with which it is easy to compute the hash from the given input data, but it is unfeasible to compute input data given only a hash. SHA-1 MD5 is considered less secure than SHA-1 because MD5 has some weaknesses. SHA-1 also uses a longer, 160-bit digest (versus 128 bits for MD5), which makes MD5 the second choice as far as hash methods are concerned. The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. This algorithm is slightly slower than MD5. NOTE : SHA-1 has also been demonstrated to be broken; the minimum current recommendation is SHA-256. Digital Certificates Digital signatures provide a means to digitally authenticate devices and individual users. In public-key cryptography, such as the RSA encryption system, each user has a key-pair containing both a public key and a private key. The keys act as complements, and anything encrypted with one of the keys can be decrypted with the other. In simple terms, a signature is formed when data is encrypted with a user's private key. 
The receiver verifies the signature by decrypting the message with the sender's public key. Key management is often considered the most difficult task in designing and implementing cryptographic systems. Businesses can simplify some of the deployment and management issues that are encountered with secured data communications by employing a Public Key Infrastructure (PKI). Because corporations often move security-sensitive communications across the Internet, an effective mechanism must be implemented to protect sensitive information from the threats presented on the Internet. PKI provides a hierarchical framework for managing digital security attributes. Each PKI participant holds a digital certificate that has been issued by a CA (either public or private). The certificate contains several attributes that are used when parties negotiate a secure connection. These attributes must include the certificate validity period, end-host identity information, encryption keys that will be used for secure communications, and the signature of the issuing CA. Optional attributes may be included, depending on the requirements and capability of the PKI. A CA can be a trusted third party, such as VeriSign or Entrust, or a private (in-house) CA that you establish within your organization. The fact that the message could be decrypted using the sender's public key means that the holder of the private key created the message. This process relies on the receiver having a copy of the sender's public key and knowing with a high degree of certainty that it really does belong to the sender and not to someone pretending to be the sender. To validate the CA's signature, the receiver must know the CA's public key. Normally, this is handled out-of-band or through an operation performed during the installation of the certificate. For instance, most web browsers are configured with the root certificates of several CAs by default. 
CA Enrollment process The end host generates a private-public key pair. The end host generates a certificate request, which it forwards to the CA. Manual human intervention is required to approve the enrollment request, which is received by the CA. After the CA operator approves the request, the CA signs the certificate request with its private key and returns the completed certificate to the end host. The end host writes the certificate into a nonvolatile storage area (PC hard disk or NVRAM on Cisco routers). Refer : https://www.ssh.com/manuals/server-zos-product/55/ch06s03s01.html Login Security SSH SSH, the Secure Shell, is a popular, powerful, software-based approach to network security. Whenever data is sent by a computer to the network, SSH automatically encrypts (scrambles) it. Then, when the data reaches its intended recipient, SSH automatically decrypts (unscrambles) it. The result is transparent encryption: users can work normally, unaware that their communications are safely encrypted on the network. In addition, SSH can use modern, secure encryption algorithms based on how it's being configured and is effective enough to be found within mission-critical applications at major corporations. SSH has a client/server architecture An SSH server program, typically installed and run by a system administrator, accepts or rejects incoming connections to its host computer. Users then run SSH client programs, typically on other computers, to make requests of the SSH server, such as \u201cPlease log me in,\u201d \u201cPlease send me a file,\u201d or \u201cPlease execute this command.\u201d All communications between clients and servers are securely encrypted and protected from modification. What SSH is not: Although SSH stands for Secure Shell, it is not a true shell in the sense of the Unix Bourne shell and C shell. It is not a command interpreter, nor does it provide wildcard expansion, command history, and so forth. 
Rather, SSH creates a channel for running a shell on a remote computer, with end-to-end encryption between the two systems. The major features and guarantees of the SSH protocol are: Privacy of your data, via strong encryption Integrity of communications, guaranteeing they haven\u2019t been altered Authentication, i.e., proof of identity of senders and receivers Authorization, i.e., access control to accounts Forwarding or tunnelling to encrypt other TCP/IP-based sessions Kerberos According to Greek mythology, Kerberos (Cerberus) was the gigantic, three-headed dog that guarded the gates of the underworld to prevent the dead from leaving. In computer science, Kerberos is a network authentication protocol and is currently the default authentication technology used by Microsoft Active Directory to authenticate users to services within a local area network. Kerberos uses symmetric-key cryptography and requires a trusted third-party authentication service to verify user identities. The name was chosen because the three heads of Kerberos represent: a client (a user or a service); a server (where the Kerberos-protected hosts reside); and a Key Distribution Center (KDC), which acts as the trusted third-party authentication service. The KDC includes the following two servers: Authentication Server (AS) that performs the initial authentication and issues ticket-granting tickets (TGT) for users. Ticket-Granting Server (TGS) that issues service tickets that are based on the initial ticket-granting tickets (TGT). Certificate Chain The certificate chain in the output of the OpenSSL command shows two certificates, numbered 0 and 1 (earlier versions of this output also listed a certificate number 2). Each certificate has a subject, s, and an issuer, i. The first certificate, number 0, is called the end-entity certificate. The subject line tells us it\u2019s valid for www.google.com because its subject CN is set to www.google.com. 
$ openssl s_client -connect www.google.com:443 -CApath /etc/ssl/certs CONNECTED(00000005) depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign verify return:1 depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1 verify return:1 depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = www.google.com verify return:1 --- Certificate chain 0 s:/C=US/ST=California/L=Mountain View/O=Google LLC/CN=www.google.com i:/C=US/O=Google Trust Services/CN=GTS CA 1O1 1 s:/C=US/O=Google Trust Services/CN=GTS CA 1O1 i:/OU=GlobalSign Root CA - R2/O=GlobalSign/CN=GlobalSign --- Server certificate The issuer line indicates it\u2019s issued by GTS CA 1O1 (Google Trust Services), which also happens to be the subject of the second certificate, number 1. What the OpenSSL command line doesn\u2019t show here is the trust store that contains the list of CA certificates trusted by the system OpenSSL runs on. The public certificate of the GlobalSign Root CA must be present in the system\u2019s trust store to close the verification chain. This is called a chain of trust, and the figure below summarizes its behaviour at a high level. High-level view of the concept of chain of trust applied to verifying the authenticity of a website. The Root CA in the Firefox trust store provides the initial trust to verify the entire chain and trust the end-entity certificate. TLS Handshake The client sends a HELLO message to the server with a list of protocols and algorithms it supports. The server says HELLO back and sends its chain of certificates. Based on the capabilities of the client, the server picks a cipher suite. If the cipher suite supports ephemeral key exchange, like ECDHE does (ECDHE stands for Elliptic Curve Diffie-Hellman Ephemeral), the server and the client negotiate a pre-master key with the Diffie-Hellman algorithm. The pre-master key is never sent over the wire. 
The client and server create a session key that will be used to encrypt the data transiting through the connection. At the end of the handshake, both parties possess a secret session key used to encrypt data for the rest of the connection. This is what OpenSSL refers to as Master-Key. NOTE This section discusses three versions of TLS: 1.0, 1.1 and 1.2 (TLS 1.3 has since been published as well). TLS 1.0 was released in 1999, making it a decades-old protocol. It has been known to be vulnerable to attacks such as BEAST and POODLE for years, in addition to supporting weak cryptography, which doesn\u2019t keep modern-day connections sufficiently secure. TLS 1.1 is the forgotten \u201cmiddle child.\u201d It also has bad cryptography like its older sibling. In most software, it was leapfrogged by TLS 1.2 and it\u2019s rare to see TLS 1.1 used. \u201cPerfect\u201d Forward Secrecy The term \u201cephemeral\u201d in the key exchange provides an important security feature mis-named perfect forward secrecy (PFS) or just \u201cForward Secrecy\u201d. In a non-ephemeral key exchange, the client sends the pre-master key to the server by encrypting it with the server\u2019s public key. The server then decrypts the pre-master key with its private key. If at a later point in time, the private key of the server is compromised, an attacker can go back to this handshake, decrypt the pre-master key, obtain the session key, and decrypt the entire traffic. Non-ephemeral key exchanges are vulnerable to attacks that may happen in the future on recorded traffic. And because people seldom change their passwords, decrypting data from the past may still be valuable for an attacker. An ephemeral key exchange like DHE, or its variant on elliptic curve, ECDHE, solves this problem by not transmitting the pre-master key over the wire. Instead, the pre-master key is computed by both the client and the server in isolation, using nonsensitive information exchanged publicly. 
Because the pre-master key can\u2019t be decrypted later by an attacker, the session key is safe from future attacks: hence, the term perfect forward secrecy. Keys are changed every X blocks along the stream. That prevents an attacker from simply sniffing the stream and applying brute force to crack the whole thing. \"Forward secrecy\" means that just because I can decrypt block M, does not mean that I can decrypt block Q Downside: The downside to PFS is that all those extra computational steps induce latency on the handshake and slow the user down. To avoid repeating this expensive work at every connection, both sides cache the session key for future use via a technique called session resumption. This is what the session-ID and TLS ticket are for: they allow a client and server that share a session ID to skip over the negotiation of a session key, because they already agreed on one previously, and go directly to exchanging data securely.","title":"Fundamentals of Security"},{"location":"level101/security/fundamentals/#part-i-fundamentals","text":"","title":"Part I: Fundamentals"},{"location":"level101/security/fundamentals/#introduction-to-security-overview-for-sre","text":"If you look closely, both Site Reliability Engineering and Security Engineering are concerned with keeping a system usable. Issues like broken releases, capacity shortages, and misconfigurations can make a system unusable (at least temporarily). Security or privacy incidents that break the trust of users also undermine the usefulness of a system. Consequently, system security should be top of mind for SREs. SREs should be involved in both significant design discussions and actual system changes. They play a big role in system design and hence are often the first line of defence. SREs help prevent bad design and implementations that can affect the overall security of the infrastructure. 
Successfully designing, implementing, and maintaining systems requires a commitment to the full system lifecycle . This commitment is possible only when security and reliability are central elements in the architecture of systems. Core Pillars of Information Security : Confidentiality \u2013 only allow access to data for which the user is permitted Integrity \u2013 ensure data is not tampered or altered by unauthorized users Availability \u2013 ensure systems and data are available to authorized users when they need it Thinking like a Security Engineer When starting a new application or re-factoring an existing application, you should consider each functional feature, and consider: Is the process surrounding this feature as safe as possible? In other words, is this a flawed process? If I were evil, how would I abuse this feature? Or more specifically failing to address how a feature can be abused can cause design flaws. Is the feature required to be on by default? If so, are there limits or options that could help reduce the risk from this feature? Security Principles By OWASP (Open Web Application Security Project) Minimize attack surface area : Every feature that is added to an application adds a certain amount of risk to the overall application. The aim of secure development is to reduce the overall risk by reducing the attack surface area. For example, a web application implements online help with a search function. The search function may be vulnerable to SQL injection attacks. If the help feature was limited to authorized users, the attack likelihood is reduced. If the help feature\u2019s search function was gated through centralized data validation routines, the ability to perform SQL injection is dramatically reduced. However, if the help feature was re-written to eliminate the search function (through a better user interface, for example), this almost eliminates the attack surface area, even if the help feature was available to the Internet at large. 
Establish secure defaults: There are many ways to deliver an \u201cout of the box\u201d experience for users. However, by default, the experience should be secure, and it should be up to the user to reduce their security \u2013 if they are allowed. For example, by default, password ageing and complexity should be enabled. Users might be allowed to turn these two features off to simplify their use of the application and increase their risk. Default Passwords of routers, IoT devices should be changed Principle of Least privilege The principle of least privilege recommends that accounts have the least amount of privilege required to perform their business processes. This encompasses user rights, resource permissions such as CPU limits, memory, network, and file system permissions. For example, if a middleware server only requires access to the network, read access to a database table, and the ability to write to a log, this describes all the permissions that should be granted. Under no circumstances should the middleware be granted administrative privileges. Principle of Defense in depth The principle of defence in depth suggests that where one control would be reasonable, more controls that approach risks in different fashions are better. Controls, when used in depth, can make severe vulnerabilities extraordinarily difficult to exploit and thus unlikely to occur. With secure coding, this may take the form of tier-based validation, centralized auditing controls, and requiring users to be logged on all pages. For example, a flawed administrative interface is unlikely to be vulnerable to an anonymous attack if it correctly gates access to production management networks, checks for administrative user authorization, and logs all access. Fail securely Applications regularly fail to process transactions for many reasons. How they fail can determine if an application is secure or not. 
``` is_admin = true; try { code_which_may_fail(); is_admin = is_user_assigned_role(\"Administrator\"); } catch (Exception err) { log.error(err.toString()); } ``` - If either code_which_may_fail() or is_user_assigned_role() fails or throws an exception, is_admin keeps its initial value of true, so the user is an admin by default. This is obviously a security risk. Don\u2019t trust services Many organizations utilize the processing capabilities of third-party partners, who more than likely have different security policies and posture than you. It is unlikely that you can influence or control any external third party, whether they are home users or major suppliers or partners. Therefore, the implicit trust of externally run systems is not warranted. All external systems should be treated similarly. For example, a loyalty program provider provides data that is used by Internet Banking, providing the number of reward points and a small list of potential redemption items. However, the data should be checked to ensure that it is safe to display to end-users and that the reward points are a positive number, and not improbably large. Separation of duties The key to fraud control is the separation of duties. For example, someone who requests a computer cannot also sign for it, nor should they directly receive the computer. This prevents the user from requesting many computers and claiming they never arrived. Certain roles have different levels of trust than normal users. In particular, administrators are different from normal users. In general, administrators should not be users of the application. For example, an administrator should be able to turn the system on or off and set password policy, but shouldn\u2019t be able to log on to the storefront as a super privileged user, such as being able to \u201cbuy\u201d goods on behalf of other users. Avoid security by obscurity Security through obscurity is a weak security control, and nearly always fails when it is the only control. 
This is not to say that keeping secrets is a bad idea, it simply means that the security of systems should not be reliant upon keeping details hidden. For example, the security of an application should not rely upon knowledge of the source code being kept secret. The security should rely upon many other factors, including reasonable password policies, defence in depth, business transaction limits, solid network architecture, and fraud, and audit controls. A practical example is Linux. Linux\u2019s source code is widely available, and yet when properly secured, Linux is a secure and robust operating system. Keep security simple Attack surface area and simplicity go hand in hand. Certain software engineering practices prefer overly complex approaches to what would otherwise be a relatively straightforward and simple design. Developers should avoid the use of double negatives and complex architectures when a simpler approach would be faster and simpler. For example, although it might be fashionable to have a slew of singleton entity beans running on a separate middleware server, it is more secure and faster to simply use global variables with an appropriate mutex mechanism to protect against race conditions. Fix security issues correctly Once a security issue has been identified, it is important to develop a test for it and to understand the root cause of the issue. When design patterns are used, the security issue is likely widespread amongst all codebases, so developing the right fix without introducing regressions is essential. For example, a user has found that they can see another user\u2019s balance by adjusting their cookie. The fix seems to be relatively straightforward, but as the cookie handling code is shared among all applications, a change to just one application will trickle through to all other applications. The fix must, therefore, be tested on all affected applications. 
Reliability & Security Reliability and security are both crucial components of a truly trustworthy system, but building systems that are both reliable and secure is difficult. While the requirements for reliability and security share many common properties, they also require different design considerations. It is easy to miss the subtle interplay between reliability and security that can cause unexpected outcomes. Example: a password management application failure was triggered by a reliability problem (poor load-balancing and load-shedding strategies), and its recovery was later complicated by multiple measures designed to increase the security of the system (for instance, an HSM that must be plugged into the server racks for authentication, with the HSM token locked inside a case).","title":"Introduction to Security Overview for SRE"},{"location":"level101/security/fundamentals/#authentication-vs-authorization","text":"Authentication is the act of validating that users are who they claim to be. Passwords are the most common authentication factor\u2014if a user enters the correct password, the system assumes the identity is valid and grants access. Other technologies such as One-Time Pins, authentication apps, and even biometrics can also be used to authenticate identity. In some instances, systems require the successful verification of more than one factor before granting access. This multi-factor authentication (MFA) requirement is often deployed to increase security beyond what passwords alone can provide. Authorization in system security is the process of giving the user permission to access a specific resource or function. This term is often used interchangeably with access control or client privilege. Giving someone permission to download a particular file on a server or providing individual users with administrative access to an application are good examples. 
In secure environments, authorization must always follow authentication, users should first prove that their identities are genuine before an organization\u2019s administrators grant them access to the requested resources.","title":"Authentication vs Authorization"},{"location":"level101/security/fundamentals/#common-authentication-flow-local-authentication","text":"The user registers using an identifier like username/email/mobile The application stores user credentials in the database The application sends a verification email/message to validate the registration Post successful registration, the user enters credentials for logging in On successful authentication, the user is allowed access to specific resources","title":"Common authentication flow (local authentication)"},{"location":"level101/security/fundamentals/#openidoauth","text":"OpenID is an authentication protocol that allows us to authenticate users without using a local auth system. In such a scenario, a user has to be registered with an OpenID Provider and the same provider should be integrated with the authentication flow of your application. To verify the details, we have to forward the authentication requests to the provider. On successful authentication, we receive a success message and/or profile details with which we can execute the necessary flow. OAuth is an authorization mechanism that allows your application user access to a provider(Gmail/Facebook/Instagram/etc). On successful response, we (your application) receive a token with which the application can access certain APIs on behalf of a user. OAuth is convenient in case your business use case requires some certain user-facing APIs like access to Google Drive or sending tweets on your behalf. Most OAuth 2.0 providers can be used for pseudo authentication. 
Having said that, it can get pretty complicated if you are using multiple OAuth providers to authenticate users on top of the local authentication system.","title":"OpenID/OAuth"},{"location":"level101/security/fundamentals/#cryptography","text":"It is the science and study of hiding any text in such a way that only the intended recipients or authorized persons can read it and that any text can even use things such as invisible ink or the mechanical cryptography machines of the past. Cryptography is necessary for securing critical or proprietary information and is used to encode private data messages by converting some plain text into ciphertext. At its core, there are two ways of doing this, more advanced methods are all built upon.","title":"Cryptography"},{"location":"level101/security/fundamentals/#ciphers","text":"Ciphers are the cornerstone of cryptography. A cipher is a set of algorithms that performs encryption or decryption on a message. An encryption algorithm (E) takes a secret key (k) and a message (m) and produces a ciphertext (c). Similarly, a Decryption algorithm (D) takes a secret key (K) and the previous resulting Ciphertext (C). They are represented as follows: E(k,m) = c D(k,c) = m This also means that for it to be a cipher, it must satisfy the consistency equation as follows, making it possible to decrypt. D(k,E(k,m)) = m Stream Ciphers: The message is broken into characters or bits and enciphered with a key or keystream(should be random and generated independently of the message stream) that is as long as the plaintext bitstream. If the keystream is random, this scheme would be unbreakable unless the keystream was acquired, making it unconditionally secure. The keystream must be provided to both parties in a secure way to prevent its release. Block Ciphers: Block ciphers \u2014 process messages in blocks, each of which is then encrypted or decrypted. 
A block cipher is a symmetric cipher in which blocks of plaintext are treated as a whole and used to produce ciphertext blocks. The block cipher takes blocks that are b bits long and encrypts them to blocks that are also b bits long. Block sizes are typically 64 or 128 bits long. Encryption Secret Key (Symmetric Key): the same key is used for encryption and decryption. Public Key (Asymmetric Key): in an asymmetric system, the encryption and decryption keys are different but related. The encryption key is known as the public key and the decryption key is known as the private key. The public and private keys are known as a key pair. Symmetric Key Encryption DES The Data Encryption Standard (DES) has been the worldwide encryption standard for a long time. IBM developed DES in 1975, and it has held up remarkably well against years of cryptanalysis. DES is a symmetric encryption algorithm with a fixed key length of 56 bits. The algorithm is still good, but because of the short key length, it is susceptible to brute-force attacks by attackers with sufficient resources. DES usually operates in block mode, whereby it encrypts data in 64-bit blocks. The same algorithm and key are used for both encryption and decryption. Because DES is based on simple mathematical functions, it can be easily implemented and accelerated in hardware. Triple DES With advances in computer processing power, the original 56-bit DES key became too short to withstand an attacker with even a limited budget. One way of increasing the effective key length of DES without changing the well-analyzed algorithm itself is to use the same algorithm with different keys several times in a row. The technique of applying DES three times in a row to a plain text block is called Triple DES (3DES). The 3DES technique is shown in Figure. Brute-force attacks on 3DES are considered unfeasible today. 
Because the basic algorithm has been tested in the field for more than 25 years, it is considered to be more trustworthy than its predecessor. AES On October 2, 2000, The U.S. National Institute of Standards and Technology (NIST) announced the selection of the Rijndael cipher as the AES algorithm. This cipher, developed by Joan Daemen and Vincent Rijmen, has a variable block length and key length. The algorithm currently specifies how to use keys with a length of 128, 192, or 256 bits to encrypt blocks with a length of 128, 192, or 256 bits (all nine combinations of key length and block length are possible). Both block and key lengths can be extended easily to multiples of 32 bits. AES was chosen to replace DES and 3DES because they are either too weak (DES, in terms of key length) or too slow (3DES) to run on modern, efficient hardware. AES is more efficient and much faster, usually by a factor of 5 compared to DES on the same hardware. AES is also more suitable for high throughput, especially if pure software encryption is used. However, AES is a relatively young algorithm, and as the golden rule of cryptography states, \u201cA more mature algorithm is always more trusted.\u201d Asymmetric Key Algorithm In a symmetric key system, Alice first puts the secret message in a box and then padlocks the box using a lock to which she has a key. She then sends the box to Bob through regular mail. When Bob receives the box, he uses an identical copy of Alice's key (which he has obtained previously) to open the box and read the message. In an asymmetric key system, instead of opening the box when he receives it, Bob simply adds his own personal lock to the box and returns the box through public mail to Alice. Alice uses her key to remove her lock and returns the box to Bob, with Bob's lock still in place. Finally, Bob uses his key to remove his lock and reads the message from Alice. 
The critical advantage in an asymmetric system is that Alice never needs to send a copy of her key to Bob. This reduces the possibility that a third party (for example, an unscrupulous postmaster) can copy the key while it is in transit to Bob, allowing that third party to spy on all future messages sent by Alice. In addition, if Bob is careless and allows someone else to copy his key, Alice's messages to Bob are compromised, but Alice's messages to other people remain secret. NOTE : In terms of TLS key exchange, this is the common approach. Diffie-Hellman The protocol has two system parameters, p and g. They are both public and may be used by everybody. Parameter p is a prime number, and parameter g (usually called a generator) is an integer that is smaller than p, but with the following property: For every number n between 1 and p \u2013 1 inclusive, there is a power k of g such that n = g^k mod p. The Diffie-Hellman algorithm is an asymmetric algorithm used to establish a shared secret for a symmetric key algorithm. Nowadays most systems use a hybrid cryptosystem, i.e., a combination of symmetric and asymmetric encryption. Asymmetric encryption is used in the key exchange mechanism to share a secret key, and after the key is shared between sender and receiver, the communication takes place using symmetric encryption. The shared secret key is used to encrypt the communication. Refer: https://medium.com/@akhigbemmanuel/what-is-the-diffie-hellman-key-exchange-algorithm-84d60025a30d RSA The RSA algorithm is very flexible and has a variable key length where, if necessary, speed can be traded for the level of security of the algorithm. RSA keys are typically 2048 to 4096 bits long; shorter keys such as 512 or 1024 bits are no longer considered secure. RSA has withstood years of extensive cryptanalysis. Although those years neither proved nor disproved RSA's security, they attest to a confidence level in the algorithm. RSA security is based on the difficulty of factoring very large numbers. 
If an easy method of factoring these large numbers were discovered, the effectiveness of RSA would be destroyed. Refer: https://medium.com/curiositypapers/a-complete-explanation-of-rsa-asymmetric-encryption-742c5971e0f NOTE : RSA Keys can be used for key exchange just like Diffie Hellman Hashing Algorithms Hashing is one of the mechanisms used for data integrity assurance. Hashing is based on a one-way mathematical function, which is relatively easy to compute but significantly harder to reverse. A hash function is a one-way function that takes input data and produces a fixed-length digest (fingerprint) as output. The digest is cryptographically strong; that is, it is computationally infeasible to recover input data from its digest. If the input data changes just a little, the digest (fingerprint) changes substantially in what is called an avalanche effect. More: https://medium.com/@rauljordan/the-state-of-hashing-algorithms-the-why-the-how-and-the-future-b21d5c0440de https://medium.com/@StevieCEllis/the-beautiful-hash-algorithm-f18d9d2b84fb MD5 MD5 is a one-way function with which it is easy to compute the hash from the given input data, but it is unfeasible to compute input data given only a hash. SHA-1 MD5 is considered less secure than SHA-1 because MD5 has some weaknesses. SHA-1 also uses a stronger, 160-bit digest, which makes MD5 the second choice as far as hash methods are concerned. The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. This algorithm is slightly slower than MD5. NOTE : SHA-1 has also been demonstrated to be broken; the minimum current recommendation is SHA-256. Digital Certificates Digital signatures provide a means to digitally authenticate devices and individual users. In public-key cryptography, such as the RSA encryption system, each user has a key-pair containing both a public key and a private key. The keys act as complements, and anything encrypted with one of the keys can be decrypted with the other. 
In simple terms, a signature is formed when data is encrypted with a user's private key. The receiver verifies the signature by decrypting the message with the sender's public key. Key management is often considered the most difficult task in designing and implementing cryptographic systems. Businesses can simplify some of the deployment and management issues that are encountered with secured data communications by employing a Public Key Infrastructure (PKI). Because corporations often move security-sensitive communications across the Internet, an effective mechanism must be implemented to protect sensitive information from the threats presented on the Internet. PKI provides a hierarchical framework for managing digital security attributes. Each PKI participant holds a digital certificate that has been issued by a CA (either public or private). The certificate contains several attributes that are used when parties negotiate a secure connection. These attributes must include the certificate validity period, end-host identity information, encryption keys that will be used for secure communications, and the signature of the issuing CA. Optional attributes may be included, depending on the requirements and capability of the PKI. A CA can be a trusted third party, such as VeriSign or Entrust, or a private (in-house) CA that you establish within your organization. The fact that the message could be decrypted using the sender's public key means that the holder of the private key created the message. This process relies on the receiver having a copy of the sender's public key and knowing with a high degree of certainty that it really does belong to the sender and not to someone pretending to be the sender. To validate the CA's signature, the receiver must know the CA's public key. Normally, this is handled out-of-band or through an operation performed during the installation of the certificate. 
For instance, most web browsers are configured with the root certificates of several CAs by default. CA Enrollment process The end host generates a private-public key pair. The end host generates a certificate request, which it forwards to the CA. Manual human intervention is required to approve the enrollment request, which is received by the CA. After the CA operator approves the request, the CA signs the certificate request with its private key and returns the completed certificate to the end host. The end host writes the certificate into a nonvolatile storage area (PC hard disk or NVRAM on Cisco routers). Refer : https://www.ssh.com/manuals/server-zos-product/55/ch06s03s01.html","title":"Ciphers"},{"location":"level101/security/fundamentals/#login-security","text":"","title":"Login Security"},{"location":"level101/security/fundamentals/#ssh","text":"SSH, the Secure Shell, is a popular, powerful, software-based approach to network security. Whenever data is sent by a computer to the network, SSH automatically encrypts (scrambles) it. Then, when the data reaches its intended recipient, SSH automatically decrypts (unscrambles) it. The result is transparent encryption: users can work normally, unaware that their communications are safely encrypted on the network. In addition, SSH can use modern, secure encryption algorithms based on how it's being configured and is effective enough to be found within mission-critical applications at major corporations. SSH has a client/server architecture An SSH server program, typically installed and run by a system administrator, accepts or rejects incoming connections to its host computer. Users then run SSH client programs, typically on other computers, to make requests of the SSH server, such as \u201cPlease log me in,\u201d \u201cPlease send me a file,\u201d or \u201cPlease execute this command.\u201d All communications between clients and servers are securely encrypted and protected from modification. 
What SSH is not: Although SSH stands for Secure Shell, it is not a true shell in the sense of the Unix Bourne shell and C shell. It is not a command interpreter, nor does it provide wildcard expansion, command history, and so forth. Rather, SSH creates a channel for running a shell on a remote computer, with end-to-end encryption between the two systems. The major features and guarantees of the SSH protocol are: Privacy of your data, via strong encryption Integrity of communications, guaranteeing they haven\u2019t been altered Authentication, i.e., proof of identity of senders and receivers Authorization, i.e., access control to accounts Forwarding or tunnelling to encrypt other TCP/IP-based sessions","title":"SSH"},{"location":"level101/security/fundamentals/#kerberos","text":"According to Greek mythology, Kerberos (Cerberus) was the gigantic, three-headed dog that guarded the gates of the underworld to prevent the dead from leaving. In computer science, Kerberos is a network authentication protocol and is currently the default authentication technology used by Microsoft Active Directory to authenticate users to services within a local area network. Kerberos uses symmetric-key cryptography and requires a trusted third-party authentication service to verify user identities. The protocol takes its name from Kerberos because the three heads represent: a client: a user or a service; a server: the Kerberos-protected host where the requested service resides; and a Key Distribution Center (KDC), which acts as the trusted third-party authentication service. The KDC includes the following two servers: Authentication Server (AS) that performs the initial authentication and issues ticket-granting tickets (TGT) for users. 
Ticket-Granting Server (TGS) that issues service tickets that are based on the initial ticket-granting tickets (TGT).","title":"Kerberos"},{"location":"level101/security/fundamentals/#certificate-chain","text":"The first part of the output of the OpenSSL command shows two certificates, numbered 0 and 1. Each certificate has a subject, s, and an issuer, i. The first certificate, number 0, is called the end-entity certificate. The subject line tells us it\u2019s valid for www.google.com because the CN of its subject is set to www.google.com. $ openssl s_client -connect www.google.com:443 -CApath /etc/ssl/certs CONNECTED(00000005) depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign verify return:1 depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1 verify return:1 depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = www.google.com verify return:1 --- Certificate chain 0 s:/C=US/ST=California/L=Mountain View/O=Google LLC/CN=www.google.com i:/C=US/O=Google Trust Services/CN=GTS CA 1O1 1 s:/C=US/O=Google Trust Services/CN=GTS CA 1O1 i:/OU=GlobalSign Root CA - R2/O=GlobalSign/CN=GlobalSign --- Server certificate The issuer line indicates it\u2019s issued by GTS CA 1O1 (Google Trust Services), which also happens to be the subject of the second certificate, number 1. What the OpenSSL command line doesn\u2019t show here is the trust store that contains the list of CA certificates trusted by the system OpenSSL runs on. The public certificate of the GlobalSign root CA must be present in the system\u2019s trust store to close the verification chain. This is called a chain of trust, and the figure below summarizes its behaviour at a high level. High-level view of the concept of chain of trust applied to verifying the authenticity of a website. 
The Root CA in the Firefox trust store provides the initial trust to verify the entire chain and trust the end-entity certificate.","title":"Certificate Chain"},{"location":"level101/security/fundamentals/#tls-handshake","text":"The client sends a HELLO message to the server with a list of protocols and algorithms it supports. The server says HELLO back and sends its chain of certificates. Based on the capabilities of the client, the server picks a cipher suite. If the cipher suite supports ephemeral key exchange, as ECDHE (Elliptic Curve Diffie-Hellman Ephemeral) does, the server and the client negotiate a pre-master key with the Diffie-Hellman algorithm. The pre-master key is never sent over the wire. The client and server create a session key that will be used to encrypt the data transiting through the connection. At the end of the handshake, both parties possess a secret session key used to encrypt data for the rest of the connection. This is what OpenSSL refers to as Master-Key. NOTE There are four versions of TLS: 1.0, 1.1, 1.2 and 1.3. TLS 1.0 was released in 1999, making it a nearly two-decade-old protocol. It has been known to be vulnerable to attacks\u2014such as BEAST and POODLE\u2014for years, in addition to supporting weak cryptography, which doesn\u2019t keep modern-day connections sufficiently secure. TLS 1.1 is the forgotten \u201cmiddle child.\u201d It also has bad cryptography like its older sibling. In most software, it was leapfrogged by TLS 1.2 and it\u2019s rare to see TLS 1.1 used.","title":"TLS Handshake"},{"location":"level101/security/fundamentals/#perfect-forward-secrecy","text":"The term \u201cephemeral\u201d in the key exchange provides an important security feature mis-named perfect forward secrecy (PFS) or just \u201cForward Secrecy\u201d. In a non-ephemeral key exchange, the client sends the pre-master key to the server by encrypting it with the server\u2019s public key. 
The server then decrypts the pre-master key with its private key. If at a later point in time, the private key of the server is compromised, an attacker can go back to this handshake, decrypt the pre-master key, obtain the session key, and decrypt the entire traffic. Non-ephemeral key exchanges are vulnerable to attacks that may happen in the future on recorded traffic. And because people seldom change their password, decrypting data from the past may still be valuable for an attacker. An ephemeral key exchange like DHE, or its variant on elliptic curve, ECDHE, solves this problem by not transmitting the pre-master key over the wire. Instead, the pre-master key is computed by both the client and the server in isolation, using nonsensitive information exchanged publicly. Because the pre-master key can\u2019t be decrypted later by an attacker, the session key is safe from future attacks: hence, the term perfect forward secrecy. Keys are changed every X blocks along the stream. That prevents an attacker from simply sniffing the stream and applying brute force to crack the whole thing. \"Forward secrecy\" means that just because I can decrypt block M, does not mean that I can decrypt block Q Downside: The downside to PFS is that all those extra computational steps induce latency on the handshake and slow the user down. To avoid repeating this expensive work at every connection, both sides cache the session key for future use via a technique called session resumption. 
This is what the session-ID and TLS ticket are for: they allow a client and server that share a session ID to skip over the negotiation of a session key, because they already agreed on one previously, and go directly to exchanging data securely.","title":"\u201cPerfect\u201d Forward Secrecy"},{"location":"level101/security/intro/","text":"Security Prerequisites Linux Basics Linux Networking What to expect from this course The course covers fundamentals of information security along with touching on subjects of system security, network & web security. This course aims to get you familiar with the basics of information security in day to day operations & then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured. What is not covered under this course The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems. The course does not deal with hacking or breaking into systems but rather an approach on how to ensure you don\u2019t get into those situations and also to make you aware of different ways a system can be compromised. Course Contents Fundamentals Network Security Threats, Attacks & Defence Writing Secure Code & More Conclusion","title":"Introduction"},{"location":"level101/security/intro/#security","text":"","title":"Security"},{"location":"level101/security/intro/#prerequisites","text":"Linux Basics Linux Networking","title":"Prerequisites"},{"location":"level101/security/intro/#what-to-expect-from-this-course","text":"The course covers fundamentals of information security along with touching on subjects of system security, network & web security. 
This course aims to get you familiar with the basics of information security in day to day operations & then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured.","title":"What to expect from this course"},{"location":"level101/security/intro/#what-is-not-covered-under-this-course","text":"The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems. The course does not deal with hacking or breaking into systems but rather an approach on how to ensure you don\u2019t get into those situations and also to make you aware of different ways a system can be compromised.","title":"What is not covered under this course"},{"location":"level101/security/intro/#course-contents","text":"Fundamentals Network Security Threats, Attacks & Defence Writing Secure Code & More Conclusion","title":"Course Contents"},{"location":"level101/security/network_security/","text":"Part II: Network Security Introduction TCP/IP is the dominant networking technology today. It is a five-layer architecture. These layers are, from top to bottom, the application layer, the transport layer (TCP), the network layer (IP), the data-link layer, and the physical layer. In addition to TCP/IP, there also are other networking technologies. For convenience, we use the OSI network model to represent non-TCP/IP network technologies. Different networks are interconnected using gateways. A gateway can be placed at any layer. The OSI model is a seven-layer architecture. The OSI architecture is similar to the TCP/IP architecture, except that the OSI model specifies two additional layers between the application layer and the transport layer in the TCP/IP architecture. 
These two layers are the presentation layer and the session layer. Figure 5.1 shows the relationship between the TCP/IP layers and the OSI layers. The application layer in TCP/IP corresponds to the application layer and the presentation layer in OSI. The transport layer in TCP/IP corresponds to the session layer and the transport layer in OSI. The remaining three layers in the TCP/IP architecture are one-to-one correspondent to the remaining three layers in the OSI model. Correspondence between layers of the TCP/IP architecture and the OSI model. Also shown are placements of cryptographic algorithms in network layers, where the dotted arrows indicate actual communications of cryptographic algorithms The functionalities of OSI layers are briefly described as follows: The application layer serves as an interface between applications and network programs. It supports application programs and end-user processing. Common application-layer programs include remote logins, file transfer, email, and Web browsing. The presentation layer is responsible for dealing with data that is formed differently. This protocol layer allows application-layer programs residing on different sides of a communication channel with different platforms to understand each other's data formats regardless of how they are presented. The session layer is responsible for creating, managing, and closing a communication connection. The transport layer is responsible for providing reliable connections, such as packet sequencing, traffic control, and congestion control. The network layer is responsible for routing device-independent data packets from the current hop to the next hop. The data-link layer is responsible for encapsulating device-independent data packets into device-dependent data frames. It has two sublayers: logical link control and media access control. The physical layer is responsible for transmitting device-dependent frames through some physical media. 
Starting from the application layer, data generated from an application program is passed down layer-by-layer to the physical layer. Data from the previous layer is enclosed in a new envelope at the current layer, where the data from the previous layer is also just an envelope containing the data from the layer before it. This is similar to enclosing a smaller envelope in a larger one. The envelope added at each layer contains sufficient information for handling the packet. Application-layer data are divided into blocks small enough to be encapsulated in an envelope at the next layer. Application data blocks are \u201cdressed up\u201d in the TCP/IP architecture according to the following basic steps. At the sending side, an application data block is encapsulated in a TCP packet when it is passed down to the TCP layer. In other words, a TCP packet consists of a header and a payload, where the header corresponds to the TCP envelope and the payload is the application data block. Likewise, the TCP packet will be encapsulated in an IP packet when it is passed down to the IP layer. An IP packet consists of a header and a payload, which is the TCP packet passed down from the TCP layer. The IP packet will be encapsulated in a device-dependent frame (e.g., an Ethernet frame) when it is passed down to the data-link layer. A frame has a header, and it may also have a trailer. For example, in addition to having a header, an Ethernet frame also has a 32-bit cyclic redundancy check (CRC) trailer. When it is passed down to the physical layer, a frame will be transformed into a sequence of media signals for transmission Flow Diagram of a Packet Generation At the destination side, the medium signals are converted by the physical layer into a frame, which is passed up to the data-link layer. The data-link layer passes the frame payload (i.e., the IP packet encapsulated in the frame) up to the IP layer. 
The IP layer passes the IP payload, namely, the TCP packet encapsulated in the IP packet, up to the TCP layer. The TCP layer passes the TCP payload, namely, the application data block, up to the application layer. When a packet arrives at a router, it only goes up to the IP layer, where certain fields in the IP header are modified (e.g., the value of TTL is decreased by 1). This modified packet is then passed back down layer-by-layer to the physical layer for further transmission. Public Key Infrastructure To deploy cryptographic algorithms in network applications, we need a way to distribute secret keys using open networks. Public-key cryptography is the best way to distribute these secret keys. To use public-key cryptography, we need to build a public-key infrastructure (PKI) to support and manage public-key certificates and certificate authority (CA) networks. In particular, PKIs are set up to perform the following functions: Determine the legitimacy of users before issuing public-key certificates to them. Issue public-key certificates upon user requests. Extend the validity period of public-key certificates upon user requests. Revoke public-key certificates upon users' requests or when the corresponding private keys are compromised. Store and manage public-key certificates. Prevent digital signature signers from denying their signatures. Support CA networks to allow different CAs to authenticate public-key certificates issued by other CAs. X.509: https://certificatedecoder.dev/ IPsec: A Security Protocol at the Network Layer IPsec is a major security protocol at the network layer. IPsec provides a potent platform for constructing virtual private networks (VPN). VPNs are private networks overlaid on public networks. The purpose of deploying cryptographic algorithms at the network layer is to encrypt or authenticate IP packets (either just the payloads or the whole packets). 
IPsec also specifies how to exchange keys. Thus, IPsec consists of authentication protocols, encryption protocols, and key exchange protocols. They are referred to, respectively, as authentication header (AH), encapsulating security payload (ESP), and Internet key exchange (IKE). PGP & S/MIME : Email Security There are several security protocols at the application layer. The most used of these protocols are email security protocols namely PGP and S/MIME. SMTP (\u201cSimple Mail Transfer Protocol\u201d) is used for sending and delivering from a client to a server via port 25: it\u2019s the outgoing server. On the contrary, POP (\u201cPost Office Protocol\u201d) allows the users to pick up the message and download it into their inbox: it\u2019s the incoming server. The latest version of the Post Office Protocol is named POP3, and it\u2019s been used since 1996; it uses port 110 PGP PGP implements all major cryptographic algorithms, the ZIP compression algorithm, and the Base64 encoding algorithm. It can be used to authenticate a message, encrypt a message, or both. PGP follows the following general process: authentication, ZIP compression, encryption, and Base64 encoding. The Base64 encoding procedure makes the message ready for SMTP transmission GPG (GnuPG) GnuPG is another free encryption standard that companies may use that is based on OpenPGP. GnuPG serves as a replacement for Symantec\u2019s PGP. The main difference is the supported algorithms. However, GnuPG plays nice with PGP by design. Because GnuPG is open, some businesses would prefer the technical support and the user interface that comes with Symantec\u2019s PGP. It is important to note that there are some nuances between the compatibility of GnuPG and PGP, such as the compatibility between certain algorithms, but in most applications such as email, there are workarounds. One such algorithm is the IDEA Module which isn\u2019t included in GnuPG out of the box due to patent issues. 
S/MIME SMTP can only handle 7-bit ASCII text messages (UTF-8 extensions can alleviate this limitation). While POP can handle other content types besides 7-bit ASCII, POP may, under a common default setting, download all the messages stored in the mail server to the user's local computer. After that, POP removes these messages from the mail server. This makes it difficult for the users to read their messages from multiple computers. The Multipurpose Internet Mail Extension protocol (MIME) was designed to support sending and receiving email messages in various formats, including nontext files generated by word processors, graphics files, sound files, and video clips. Moreover, MIME allows a single message to include mixed types of data in any combination of these formats. The Internet Mail Access Protocol (IMAP), operating on TCP port 143 (for unencrypted connections only), stores incoming email messages in the mail server until the user deletes them deliberately (this is configurable on both server and client, just like POP). This allows the users to access their mailbox from multiple machines and download messages to a local machine without deleting them from the mailbox in the mail server. SSL/TLS SSL uses a PKI to decide if a server\u2019s public key is trustworthy by requiring servers to use a security certificate signed by a trusted CA. When Netscape Navigator 1.0 was released, it trusted a single CA operated by the RSA Data Security corporation. The server\u2019s public RSA key was stored in the security certificate, which could then be used by the browser to establish a secure communication channel. The security certificates we use today still rely on the same standard (named X.509) that Netscape Navigator 1.0 used back then. Netscape intended to train users (though this didn\u2019t work out later) to differentiate secure communications from insecure ones, so they put a lock icon next to the address bar. 
When the lock is open, the communication is insecure. A closed lock means communication has been secured with SSL, which required the server to provide a signed certificate. You\u2019re obviously familiar with this icon as it\u2019s been in every browser ever since. The engineers at Netscape truly created a standard for secure internet communications. A year after releasing SSL 2.0, Netscape fixed several security issues and released SSL 3.0, a protocol that, albeit being officially deprecated since June 2015, remains in use in certain parts of the world more than 20 years after its introduction. To standardize SSL, the Internet Engineering Task Force (IETF) created a slightly modified SSL 3.0 and, in 1999, unveiled it as Transport Layer Security (TLS) 1.0. The name change between SSL and TLS continues to confuse people today. Officially, TLS is the new SSL, but in practice, people use SSL and TLS interchangeably to talk about any version of the protocol. Must See: https://tls.ulfheim.net/ https://davidwong.fr/tls13/ Network Perimeter Security Let us see how we keep a check on the perimeter i.e the edges, the first layer of protection General Firewall Framework Firewalls are needed because encryption algorithms cannot effectively stop malicious packets from getting into an edge network. This is because IP packets, regardless of whether they are encrypted, can always be forwarded into an edge network. Firewalls that were developed in the 1990s are important instruments to help restrict network access. A firewall may be a hardware device, a software package, or a combination of both. Packets flowing into the internal network from the outside should be evaluated before they are allowed to enter. One of the critical elements of a firewall is its ability to examine packets without imposing a negative impact on communication speed while providing security protections for the internal network. 
The packet inspection that is carried out by firewalls can be done using several different methods. Based on the particular method used by the firewall, it can be characterized as either a packet filter, circuit gateway, application gateway, or dynamic packet filter. Packet Filters A packet filter inspects ingress packets coming to an internal network from outside and inspects egress packets going outside from an internal network. Packet filtering only inspects IP headers and TCP headers, not the payloads generated at the application layer. A packet-filtering firewall uses a set of rules to determine whether a packet should be allowed or denied to pass through. 2 types: Stateless It treats each packet as an independent object, and it does not keep track of any previously processed packets. In other words, stateless filtering inspects a packet when it arrives and makes a decision without leaving any record of the packet being inspected. Stateful Stateful filtering, also referred to as connection-state filtering, keeps track of connections between an internal host and an external host. A connection state (or state, for short) indicates whether it is a TCP connection or a UDP connection and whether the connection is established. Circuit Gateways Circuit gateways, also referred to as circuit-level gateways, are typically operated at the transport layer. They evaluate the information of the IP addresses and the port numbers contained in TCP (or UDP) headers and use it to determine whether to allow or to disallow an internal host and an external host to establish a connection. It is common practice to combine packet filters and circuit gateways to form a dynamic packet filter (DPF). Application Gateways (ALG), aka Proxy Servers An Application Level Gateway (ALG) acts as a proxy for internal hosts, processing service requests from external clients. An ALG performs deep inspections on each IP packet (ingress or egress). 
In particular, an ALG inspects application program formats contained in the packet (e.g., MIME format or SQL format) and examines whether its payload is permitted. Thus, an ALG may be able to detect a computer virus contained in the payload. Because an ALG inspects packet payloads, it may be able to detect malicious code and quarantine suspicious packets, in addition to blocking packets with suspicious IP addresses and TCP ports. On the other hand, an ALG also incurs substantial computation and space overheads. Trusted Systems & Bastion Hosts A Trusted Operating System (TOS) is an operating system that meets a particular set of security requirements. Whether an operating system can be trusted or not depends on several elements. For example, for an operating system on a particular computer to be certified trusted, one needs to validate that, among other things, the following four requirements are satisfied: Its system design contains no defects; Its system software contains no loopholes; Its system is configured properly; and Its system management is appropriate. Bastion Hosts Bastion hosts are computers with strong defence mechanisms. They often serve as host computers for implementing application gateways, circuit gateways, and other types of firewalls. A bastion host is operated on a trusted operating system that must not contain unnecessary functionalities or programs. This measure helps to reduce error probabilities and makes it easier to conduct security checks. Only those network application programs that are necessary, for example, SSH, DNS, SMTP, and authentication programs, are installed on a bastion host. Bastion hosts are also primarily used as controlled ingress points so that the security monitoring can focus more narrowly on actions happening at a single point closely. 
Common Techniques & Scannings, Packet Capturing Scanning Ports with Nmap Nmap (\"Network Mapper\") is a free and open-source (license) utility for network discovery and security auditing. Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime. The best thing about Nmap is it\u2019s free and open-source and is very flexible and versatile Nmap is often used to determine alive hosts in a network, open ports on those hosts, services running on those open ports, and version identification of that service on that port. More at http://scanme.nmap.org/ nmap [scan type] [options] [target specification] Nmap uses 6 different port states: Open \u2014 An open port is one that is actively accepting TCP, UDP or SCTP connections. Open ports are what interests us the most because they are the ones that are vulnerable to attacks. Open ports also show the available services on a network. Closed \u2014 A port that receives and responds to Nmap probe packets but there is no application listening on that port. Useful for identifying that the host exists and for OS detection. Filtered \u2014 Nmap can\u2019t determine whether the port is open because packet filtering prevents its probes from reaching the port. Filtering could come from firewalls or router rules. Often little information is given from filtered ports during scans as the filters can drop the probes without responding or respond with useless error messages e.g. destination unreachable. Unfiltered \u2014 Port is accessible but Nmap doesn\u2019t know if it is open or closed. Only used in ACK scan which is used to map firewall rulesets. Other scan types can be used to identify whether the port is open. Open/filtered \u2014 Nmap is unable to determine between open and filtered. This happens when an open port gives no response. 
No response could mean that the probe was dropped by a packet filter or any response is blocked. Closed/filtered \u2014 Nmap is unable to determine whether a port is closed or filtered. Only used in the IP ID idle scan. Types of Nmap Scan: TCP Connect TCP Connect scan completes the 3-way handshake. If a port is open, the operating system completes the TCP three-way handshake and the port scanner immediately closes the connection to avoid causing a denial of service (DoS). This is \u201cnoisy\u201d because the services can log the sender IP address and might trigger Intrusion Detection Systems. UDP Scan This scan checks to see if any UDP ports are listening. Since UDP does not respond with a positive acknowledgement like TCP and only responds to an incoming UDP packet when the port is closed, a missing reply suggests the port is open or filtered, while an ICMP port-unreachable reply indicates that it is closed. SYN Scan SYN scan is another form of TCP scanning. This scan type is also known as \u201chalf-open scanning\u201d because it never actually opens a full TCP connection. The port scanner generates a SYN packet. If the target port is open, it will respond with a SYN-ACK packet. The scanner host responds with an RST packet, closing the connection before the handshake is completed. If the port is closed but unfiltered, the target will instantly respond with an RST packet. SYN scan has the advantage that the individual services never actually receive a connection. FIN Scan This is a stealthy scan, like the SYN scan, but sends a TCP FIN packet instead. ACK Scan ACK scanning determines whether the port is filtered or not. Null Scan Another very stealthy scan that sets all the TCP header flags to off or null. This is not normally a valid packet and some hosts will not know what to do with this. XMAS Scan Similar to the NULL scan, except that the FIN, PSH, and URG flags in the TCP header are set (in Nmap\u2019s implementation), lighting the packet up like a Christmas tree. RPC Scan This special type of scan looks for machines answering to RPC (Remote Procedure Call) services. IDLE Scan It is a super stealthy method whereby the scan packets are bounced off an external host. 
You don\u2019t need to have control over the other host but it does have to be up and meet certain requirements. You must input the IP address of the \u201czombie\u201d host and what port number to use. It is one of the more controversial options in Nmap since it only has a use for malicious attacks. Scan Techniques A couple of scan techniques can be used to gain more information about a system and its ports. You can read more at https://medium.com/infosec-adventures/nmap-cheatsheet-a423fcdda0ca OpenVAS OpenVAS is a full-featured vulnerability scanner. OpenVAS is a framework of services and tools that provides a comprehensive and powerful vulnerability scanning and management package. OpenVAS, which is an open-source program, began as a fork of the once-more-popular scanning program, Nessus. OpenVAS is made up of three main parts. These are: a regularly updated feed of Network Vulnerability Tests (NVTs); a scanner, which runs the NVTs; and an SQLite 3 database for storing both your test configurations and the NVTs\u2019 results and configurations. https://www.greenbone.net/en/install_use_gce/ WireShark Wireshark is a protocol analyzer. This means Wireshark is designed to decode not only packet bits and bytes but also the relations between packets and protocols. Wireshark understands protocol sequences. 
A simple demo of Wireshark Capture only udp packets: Capture filter = \u201cudp\u201d Capture only tcp packets: Capture filter = \u201ctcp\u201d TCP/IP 3 way Handshake Filter by IP address: displays all traffic to or from the IP, be it source or destination ip.addr == 192.168.1.1 Filter by source address: display traffic only from IP source ip.src == 192.168.0.1 Filter by destination: display traffic only to IP destination ip.dst == 192.168.0.1 Filter by IP subnet: display traffic from subnet, be it source or destination ip.addr == 192.168.0.0/24 Filter by protocol: filter traffic by protocol name dns http ftp arp ssh telnet icmp Exclude IP address: remove traffic from and to IP address !(ip.addr == 192.168.0.1) Display traffic between two specific subnets ip.addr == 192.168.0.0/24 and ip.addr == 192.168.1.0/24 Display traffic between two specific workstations ip.addr == 192.168.0.1 and ip.addr == 192.168.0.2 Filter by MAC eth.addr == 00:50:7f:c5:b6:78 Filter TCP port tcp.port == 80 Filter TCP port source tcp.srcport == 80 Filter TCP port destination tcp.dstport == 80 Find user agents http.user_agent contains Firefox !http.user_agent contains || !http.user_agent contains Chrome Filter out background traffic (ARP, ICMP, DNS) !(arp or icmp or dns) Filter IP address and port tcp.port == 80 && ip.addr == 192.168.0.1 Filter all HTTP requests http.request Filter all HTTP requests and responses http.request or http.response Filter three way handshake tcp.flags.syn==1 or (tcp.seq==1 and tcp.ack==1 and tcp.len==0 and tcp.analysis.initial_rtt) Find files by type frame matches \"(attachment|tar|exe|zip|pdf)\" Find traffic based on keyword tcp contains facebook frame contains facebook Detecting SYN Floods tcp.flags.syn == 1 and tcp.flags.ack == 0 Wireshark Promiscuous Mode - By default, Wireshark only captures packets going to and from the computer where it runs. By checking the box to run Wireshark in Promiscuous Mode in the Capture Settings, you can capture most of the traffic on the LAN. 
DumpCap Dumpcap is a network traffic dump tool. It captures packet data from a live network and writes the packets to a file. Dumpcap\u2019s native capture file format is pcapng, which is also the format used by Wireshark. By default, Dumpcap uses the pcap library to capture traffic from the first available network interface and writes the received raw packet data, along with the packets\u2019 time stamps, into a pcapng file. The capture filter syntax follows the rules of the pcap library. The Wireshark command-line utility called 'dumpcap.exe' can be used to capture LAN traffic over an extended period of time. Wireshark itself can also be used, but dumpcap does not significantly utilize the computer's memory while capturing for long periods. DaemonLogger Daemonlogger is a packet logging application designed specifically for use in Network Security Monitoring (NSM) environments. The biggest benefit Daemonlogger provides is that, like Dumpcap, it is simple to use for capturing packets. In order to begin capturing, you need only to invoke the command and specify an interface. daemonlogger -i eth1 This option, by default, will begin capturing packets and logging them to the current working directory. Packets will be collected until the capture file size reaches 2 GB, and then a new file will be created. This will continue indefinitely until the process is halted. NetSniff-NG Netsniff-NG is a high-performance packet capture utility. While the utilities we\u2019ve discussed to this point rely on Libpcap for capture, Netsniff-NG utilizes zero-copy mechanisms to capture packets. This is done with the intent to support full packet capture over high throughput links. To begin capturing packets with Netsniff-NG, we have to specify an input and output. In most cases, the input will be a network interface, and the output will be a file or folder on disk. 
netsniff-ng -i eth1 -o data.pcap Netflow NetFlow is a feature that was introduced on Cisco routers around 1996 that provides the ability to collect IP network traffic as it enters or exits an interface. By analyzing the data provided by NetFlow, a network administrator can determine things such as the source and destination of traffic, class of service, and the causes of congestion. A typical flow monitoring setup (using NetFlow) consists of three main components: Flow exporter: aggregates packets into flows and exports flow records towards one or more flow collectors. Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter. Analysis application: analyzes received flow data in the context of intrusion detection or traffic profiling, for example. Routers and switches that support NetFlow can collect IP traffic statistics on all interfaces where NetFlow is enabled, and later export those statistics as NetFlow records toward at least one NetFlow collector, typically a server that does the actual traffic analysis. IDS A security solution that detects security-related events in your environment but does not block them. IDS sensors can be software- or hardware-based and are used to collect and analyze the network traffic. These sensors are available in two varieties, network IDS and host IDS. A host IDS is a server-specific agent running on a server with a minimum of overhead to monitor the operating system. A network IDS can be embedded in a networking device, a standalone appliance, or a module monitoring the network traffic. Signature Based IDS The signature-based IDS monitors the network traffic or observes the system and sends an alarm if a known malicious event is happening. It does so by comparing the data flow against a database of known attack patterns. These signatures explicitly define what traffic or activity should be considered as malicious. 
Signature-based detection has been the bread and butter of network-based defensive security for over a decade, partially because it is very similar to how malicious activity is detected at the host level with antivirus utilities The formula is fairly simple: an analyst observes a malicious activity, derives indicators from the activity and develops them into signatures, and then those signatures will alert whenever the activity occurs again. ex: SNORT & SURICATA Policy-Based IDS The policy-based IDSs (mainly host IDSs) trigger an alarm whenever a violation occurs against the configured policy. This configured policy is or should be a representation of the security policies. This type of IDS is flexible and can be customized to a company's network requirements because it knows exactly what is permitted and what is not. On the other hand, the signature-based systems rely on vendor specifics and default settings. Anomaly Based IDS The anomaly-based IDS looks for traffic that deviates from the normal, but the definition of what is a normal network traffic pattern is the tricky part Two types of anomaly-based IDS exist: statistical and nonstatistical anomaly detection Statistical anomaly detection learns the traffic patterns interactively over a period of time. In the nonstatistical approach, the IDS has a predefined configuration of the supposedly acceptable and valid traffic patterns. Host-Based IDS & Network-Based IDS A host IDS can be described as a distributed agent residing on each server of the network that needs protection. These distributed agents are tied very closely to the underlying operating system. Network IDSs, on the other hand, can be described as intelligent sniffing devices. Data (raw packets) is captured from the network by a network IDS, whereas host IDSs capture the data from the host on which they are installed. 
Honeypots The use of decoy machines to direct intruders' attention away from the machines under protection is a major technique to preclude intrusion attacks. Any device, system, directory, or file used as a decoy to lure attackers away from important assets and to collect intrusion or abusive behaviours is referred to as a honeypot. A honeypot may be implemented as a physical device or as an emulation system. The idea is to set up decoy machines in a LAN, or decoy directories/files in a file system and make them appear important, but with several exploitable loopholes, to lure attackers to attack these machines or directories/files, so that other machines, directories, and files can evade intruders' attention. A decoy machine may be a host computer or a server computer. Likewise, we may also set up decoy routers or even decoy LANs. Chinks In The Armour (TCP/IP Security Issues) IP Spoofing In this type of attack, the attacker replaces the IP address of the sender, or in some rare cases the destination, with a different address. IP spoofing is normally used to exploit a target host. In other cases, it is used to start a denial-of-service (DoS) attack. In a DoS attack, an attacker modifies the IP packet to mislead the target host into accepting the original packet as a packet sourced at a trusted host. The attacker must know the IP address of the trusted host to modify the packet headers (source IP address) so that it appears that the packets are coming from that host. IP Spoofing Detection Techniques Direct TTL Probes In this technique, we send a packet to the host of the suspected spoofed IP address to trigger a reply, and compare the TTL of the reply with the TTL of the suspect packet. If the two TTLs differ, the suspect packet is spoofed. This technique is successful when the attacker is in a different subnet from the victim. IP Identification Number Send a probe to the host of the suspected spoofed traffic that triggers a reply, and compare the IP ID of the reply with that of the suspect traffic. 
If the IP IDs are not close in value to that of the packet being checked, the suspect traffic is spoofed. TCP Flow Control Method Attackers sending spoofed TCP packets will not receive the target\u2019s SYN-ACK packets. Attackers cannot, therefore, respond to changes in the congestion window size. When the receiver still receives traffic even after the window size is exhausted, the packets are most probably spoofed. Covert Channel A covert or clandestine channel can be best described as a pipe or communication channel between two entities that can be exploited by a process or application transferring information in a manner that violates the system's security specifications. More specifically for TCP/IP, in some instances, covert channels are established, and data can be secretly passed between two end systems. Example: ICMP resides at the Internet layer of the TCP/IP protocol suite and is implemented in all TCP/IP hosts. Based on the specifications of the ICMP Protocol, an ICMP Echo Request message should have an 8-byte header and a 56-byte payload. The payload of an ICMP Echo Request is not meant to carry any meaningful data. However, these packets are often used to carry secret information. The ICMP packets are altered slightly to carry secret data in the payload. This makes the size of the packet larger, but no control exists in the protocol stack to defeat this behaviour. The alteration of ICMP packets allows intruders to program specialized client-server pairs. These small pieces of code export confidential information without alerting the network administrator. ICMP can be leveraged for more than data exfiltration. For example, some C&C tools such as Loki used an ICMP channel to establish an encrypted interactive session back in 1996. Deep packet inspection has since come a long way. A lot of IDS/IPS detect ICMP tunnelling. 
Two common checks are: Check for echo responses that do not contain the same payload as the request. Check the volume of ICMP traffic, especially volumes beyond an acceptable threshold. IP Fragmentation Attack The TCP/IP protocol suite, or more specifically IP, allows the fragmentation of packets (this is a feature, not a bug). The IP fragmentation offset is used to keep track of the different parts of a datagram. The information or content in this field is used at the destination to reassemble the datagrams. All such fragments have the same Identification field value, and the fragmentation offset indicates the position of the current fragment in the context of the original packet. Many access routers and firewalls do not perform packet reassembly. In normal operation, IP fragments do not overlap, but attackers can create artificially fragmented packets to mislead the routers or firewalls. Usually, these packets are small and almost impractical for end systems because of data and computational overhead. A good example of an IP fragmentation attack is the Ping of Death attack. The Ping of Death attack sends fragments that, when reassembled at the end station, create a larger packet than the maximum permissible length. TCP Flags Data exchange using TCP does not happen until a three-way handshake has been completed. This handshake uses different flags to influence the way TCP segments are processed. There are 6 bits in the TCP header that are often called flags: Urgent pointer field (URG), Acknowledgment field (ACK), Push function (PSH), Reset the connection (RST), Synchronize sequence numbers (SYN), and the sender is finished with this connection (FIN). Abuse of the normal operation or settings of these flags can be used by attackers to launch DoS attacks. This causes network servers or web servers to crash or hang. 
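As a rough illustration of the six flag bits described above, the sketch below decodes a TCP flags byte and marks a segment as suspicious when SYN and FIN are set together, a combination that has no place in normal operation. The helper names are hypothetical; the bit masks are the standard positions of these flags in the TCP header.

```python
# Standard bit masks for the six classic TCP flags.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
FLAG_NAMES = [(URG, 'URG'), (ACK, 'ACK'), (PSH, 'PSH'),
              (RST, 'RST'), (SYN, 'SYN'), (FIN, 'FIN')]

def decode_flags(flags_byte):
    '''Return the names of the flags that are set in flags_byte.'''
    return [name for mask, name in FLAG_NAMES if flags_byte & mask]

def is_illegal_combination(flags_byte):
    '''In this sketch, only SYN together with FIN is treated as illegal.'''
    return bool(flags_byte & SYN) and bool(flags_byte & FIN)

print(decode_flags(SYN | ACK))
print(is_illegal_combination(SYN | FIN | PSH))
```

A packet filter or IDS could apply a check like this to every inbound segment and drop or log the offending ones.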
| SYN | FIN | PSH | RST | Validity |
|-----|-----|-----|-----|----------|
| 1 | 1 | 0 | 0 | Illegal combination |
| 1 | 1 | 1 | 0 | Illegal combination |
| 1 | 1 | 0 | 1 | Illegal combination |
| 1 | 1 | 1 | 1 | Illegal combination |

The attacker's ultimate goal is to write special programs or pieces of code that can construct these illegal combinations, resulting in an efficient DoS attack. SYN FLOOD The timers (or lack of certain timers) in the three-way handshake are often used and exploited by attackers to disable services or even to enter systems. After step 2 of the three-way handshake, no limit is set on the time the server waits for the final ACK after it has received a SYN. The attacker initiates many connection requests to the webserver of Company XYZ (almost certainly with a spoofed IP address). The SYN+ACK packets (Step 2) sent by the web server back to the originating source IP address are not replied to. This leaves a TCP session half-open on the webserver. Multiple packets cause multiple TCP sessions to stay open. Based on the hardware limitations of the server, a limited number of TCP sessions can stay open, and as a result, the webserver refuses further connection establishment attempts from any host as soon as a certain limit is reached. These half-open connections need to be completed or timed out before new connections can be established. FIN Attack In normal operation, the sender sets the TCP FIN flag indicating that no more data will be transmitted and the connection can be closed down. This is a four-way handshake mechanism, with both sender and receiver expected to send an acknowledgement on a received FIN packet. During an attack that is trying to kill connections, a spoofed FIN packet is constructed. This packet also has the correct sequence number, so the packets are seen as valid by the targeted host. These sequence numbers are easy to predict. 
This process is referred to as TCP sequence number prediction, whereby the attacker either sniffs the current Sequence and Acknowledgment (SEQ/ACK) numbers of the connection or can algorithmically predict these numbers. Connection Hijacking An authorized user (Employee X) sends HTTP requests over a TCP session with the webserver. The web server accepts the packets from Employee X only when the packet has the correct SEQ/ACK numbers. As seen previously, these numbers are important for the webserver to distinguish between different sessions and to make sure it is still talking to Employee X. Imagine that the cracker starts sending packets to the web server spoofing the IP address of Employee X, using the correct SEQ/ACK combination. The web server accepts the packet and increments the ACK number. In the meantime, Employee X continues to send packets but with incorrect SEQ/ACK numbers. As a result of sending unsynchronized packets, all data from Employee X is discarded when received by the webserver. The attacker pretends to be Employee X using the correct numbers. This finally results in the cracker hijacking the connection, whereby Employee X is completely confused and the webserver replies assuming the cracker is sending correct synchronized data. STEPS: The attacker examines the traffic flows with a network monitor and notices traffic from Employee X to a web server. The web server returns or echoes data back to the origination station (Employee X). Employee X acknowledges the packet. The cracker launches a spoofed packet to the server. The web server responds to the cracker. The cracker starts verifying SEQ/ACK numbers to double-check success. At this time, the cracker takes over the session from Employee X, which results in a session hanging for Employee X. The cracker can start sending traffic to the webserver. The web server returns the requested data to confirm delivery with the correct ACK number. 
The cracker can continue to send data (keeping track of the correct SEQ/ACK numbers) until eventually setting the FIN flag to terminate the session. Buffer Overflow A buffer is a temporary data storage area used to store program code and data. When a program or process tries to store more data in a buffer than it was originally anticipated to hold, a buffer overflow occurs. Buffers are temporary storage locations in memory (memory or buffer sizes are often measured in bytes) that can store a fixed amount of data in bytes. When more data is retrieved than can be stored in a buffer location, the additional information must go into an adjacent buffer, resulting in overwriting the valid data held in them. Mechanism: Buffer overflow vulnerabilities exist in different types. But the overall goal for all buffer overflow attacks is to take over the control of a privileged program and, if possible, the host. The attacker has two tasks to achieve this goal. First, the dirty code needs to be available in the program's code address space. Second, the privileged program should jump to that particular part of the code, which ensures that the proper parameters are loaded into memory. The first task can be achieved in two ways: by injecting the code in the right address space or by using the existing code and modifying certain parameters slightly. The second task is a little more complex because the program's control flow needs to be modified to make the program jump to the dirty code. CounterMeasure: The most important approach is to have a concerted focus on writing correct code. A second method is to make the data buffers (memory locations) address space of the program code non-executable. This type of address space makes it impossible to execute code, which might be infiltrated in the program's buffers during an attack. 
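To make the buffer overflow mechanism above more concrete, here is a small simulation. Python itself is memory-safe, so this only imitates the effect with a bytearray: an 8-byte buffer is followed by 4 bytes that stand in for adjacent valid data. The layout and function names are assumptions made for illustration.

```python
BUFFER_SIZE = 8

def make_memory():
    '''Simulated memory: an 8-byte buffer followed by 4 bytes of adjacent data.'''
    return bytearray(b'.' * BUFFER_SIZE + b'SAFE')

def unsafe_copy(memory, data):
    '''No length check, imitating an unchecked copy such as strcpy in C.'''
    memory[0:len(data)] = data

def safe_copy(memory, data):
    '''Bounds-checked copy: refuse input that does not fit in the buffer.'''
    if len(data) > BUFFER_SIZE:
        raise ValueError('input larger than buffer')
    memory[0:len(data)] = data

memory = make_memory()
unsafe_copy(memory, b'A' * BUFFER_SIZE + b'EVIL')
print(bytes(memory[BUFFER_SIZE:]))  # the adjacent data has been overwritten
```

The bounds-checked variant reflects the first countermeasure mentioned above, writing correct code that validates input length before copying.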
More Spoofing Address Resolution Protocol Spoofing The Address Resolution Protocol (ARP) provides a mechanism to resolve, or map, a known IP address to a MAC sublayer address. Using ARP spoofing, the attacker can exploit this hardware address authentication mechanism by spoofing the hardware address of Host B. Basically, the attacker can convince any host or network device on the local network that the attacker's workstation is the host to be trusted. This is a common method used in a switched environment. ARP spoofing can be prevented with the implementation of static ARP tables in all the hosts and routers of your network. Alternatively, you can implement an ARP server that responds to ARP requests on behalf of the target host. DNS Spoofing DNS spoofing is the method whereby the attacker convinces the target machine that the system it wants to connect to is the attacker's machine. The attacker modifies some records so that name entries of hosts correspond to the attacker's IP address. There have been instances in which the complete DNS server was compromised by an attack. DNS spoofing can be detected with a reverse lookup, a mechanism that verifies the IP address against a name. The IP address and name files are usually kept on different servers to make compromise much more difficult.","title":"Network Security"},{"location":"level101/security/network_security/#part-ii-network-security","text":"","title":"Part II: Network Security"},{"location":"level101/security/network_security/#introduction","text":"TCP/IP is the dominant networking technology today. It is a five-layer architecture. These layers are, from top to bottom, the application layer, the transport layer (TCP), the network layer (IP), the data-link layer, and the physical layer. In addition to TCP/IP, there also are other networking technologies. For convenience, we use the OSI network model to represent non-TCP/IP network technologies. 
Different networks are interconnected using gateways. A gateway can be placed at any layer. The OSI model is a seven-layer architecture. The OSI architecture is similar to the TCP/IP architecture, except that the OSI model specifies two additional layers between the application layer and the transport layer in the TCP/IP architecture. These two layers are the presentation layer and the session layer. Figure 5.1 shows the relationship between the TCP/IP layers and the OSI layers. The application layer in TCP/IP corresponds to the application layer and the presentation layer in OSI. The transport layer in TCP/IP corresponds to the session layer and the transport layer in OSI. The remaining three layers in the TCP/IP architecture correspond one-to-one to the remaining three layers in the OSI model. Figure: Correspondence between layers of the TCP/IP architecture and the OSI model. Also shown are placements of cryptographic algorithms in network layers, where the dotted arrows indicate actual communications of cryptographic algorithms. The functionalities of OSI layers are briefly described as follows: The application layer serves as an interface between applications and network programs. It supports application programs and end-user processing. Common application-layer programs include remote logins, file transfer, email, and Web browsing. The presentation layer is responsible for dealing with data that is formatted differently. This protocol layer allows application-layer programs residing on different sides of a communication channel with different platforms to understand each other's data formats regardless of how they are presented. The session layer is responsible for creating, managing, and closing a communication connection. The transport layer is responsible for providing reliable connections, such as packet sequencing, traffic control, and congestion control. 
The network layer is responsible for routing device-independent data packets from the current hop to the next hop. The data-link layer is responsible for encapsulating device-independent data packets into device-dependent data frames. It has two sublayers: logical link control and media access control. The physical layer is responsible for transmitting device-dependent frames through some physical media. Starting from the application layer, data generated from an application program is passed down layer-by-layer to the physical layer. Data from the previous layer is enclosed in a new envelope at the current layer, where the data from the previous layer is also just an envelope containing the data from the layer before it. This is similar to enclosing a smaller envelope in a larger one. The envelope added at each layer contains sufficient information for handling the packet. Application-layer data are divided into blocks small enough to be encapsulated in an envelope at the next layer. Application data blocks are \u201cdressed up\u201d in the TCP/IP architecture according to the following basic steps. At the sending side, an application data block is encapsulated in a TCP packet when it is passed down to the TCP layer. In other words, a TCP packet consists of a header and a payload, where the header corresponds to the TCP envelope and the payload is the application data block. Likewise, the TCP packet will be encapsulated in an IP packet when it is passed down to the IP layer. An IP packet consists of a header and a payload, which is the TCP packet passed down from the TCP layer. The IP packet will be encapsulated in a device-dependent frame (e.g., an Ethernet frame) when it is passed down to the data-link layer. A frame has a header, and it may also have a trailer. For example, in addition to having a header, an Ethernet frame also has a 32-bit cyclic redundancy check (CRC) trailer. 
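The envelope-in-envelope idea described so far can be sketched in a few lines. This is a deliberately simplified model: the header layouts below are invented for illustration and are much smaller than real TCP, IP, and Ethernet headers, and the CRC is computed with zlib.crc32 without reproducing Ethernet's exact bit ordering.

```python
import struct
import zlib

def tcp_wrap(payload, src_port, dst_port):
    '''Toy 4-byte TCP envelope: source and destination port only.'''
    return struct.pack('!HH', src_port, dst_port) + payload

def ip_wrap(segment, src_ip, dst_ip, ttl=64):
    '''Toy 9-byte IP envelope: TTL plus source and destination IPv4 address.'''
    return struct.pack('!B4s4s', ttl, bytes(src_ip), bytes(dst_ip)) + segment

def frame_wrap(packet, src_mac, dst_mac):
    '''Toy frame: 12-byte header (two MAC addresses) and a 32-bit CRC trailer.'''
    body = dst_mac + src_mac + packet
    return body + struct.pack('!I', zlib.crc32(body))

def frame_unwrap(frame):
    '''Verify the CRC trailer and return the encapsulated IP packet.'''
    body, trailer = frame[:-4], frame[-4:]
    if struct.pack('!I', zlib.crc32(body)) != trailer:
        raise ValueError('CRC mismatch')
    return body[12:]

segment = tcp_wrap(b'hello', 49152, 80)
packet = ip_wrap(segment, [192, 0, 2, 10], [198, 51, 100, 7])
frame = frame_wrap(packet, bytes.fromhex('020000000001'), bytes.fromhex('020000000002'))
print(len(segment), len(packet), len(frame))
```

Each layer only adds its own envelope around whatever it received from the layer above, which is why the lengths grow at every step.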
When it is passed down to the physical layer, a frame will be transformed into a sequence of media signals for transmission. Figure: Flow diagram of packet generation. At the destination side, the media signals are converted by the physical layer into a frame, which is passed up to the data-link layer. The data-link layer passes the frame payload (i.e., the IP packet encapsulated in the frame) up to the IP layer. The IP layer passes the IP payload, namely, the TCP packet encapsulated in the IP packet, up to the TCP layer. The TCP layer passes the TCP payload, namely, the application data block, up to the application layer. When a packet arrives at a router, it only goes up to the IP layer, where certain fields in the IP header are modified (e.g., the value of TTL is decreased by 1). This modified packet is then passed back down layer-by-layer to the physical layer for further transmission.","title":"Introduction"},{"location":"level101/security/network_security/#public-key-infrastructure","text":"To deploy cryptographic algorithms in network applications, we need a way to distribute secret keys using open networks. Public-key cryptography is the best way to distribute these secret keys. To use public-key cryptography, we need to build a public-key infrastructure (PKI) to support and manage public-key certificates and certificate authority (CA) networks. In particular, PKIs are set up to perform the following functions: Determine the legitimacy of users before issuing public-key certificates to them. Issue public-key certificates upon user requests. Extend the validity period of public-key certificates upon user requests. Revoke public-key certificates upon users' requests or when the corresponding private keys are compromised. Store and manage public-key certificates. Prevent digital signature signers from denying their signatures. Support CA networks to allow different CAs to authenticate public-key certificates issued by other CAs. 
X.509: https://certificatedecoder.dev/?gclid=EAIaIQobChMI0M731O6G6gIVVSQrCh04bQaAEAAYASAAEgKRkPD_BwE","title":"Public Key Infrastructure"},{"location":"level101/security/network_security/#ipsec-a-security-protocol-at-the-network-layer","text":"IPsec is a major security protocol at the network layer. IPsec provides a potent platform for constructing virtual private networks (VPN). VPNs are private networks overlaid on public networks. The purpose of deploying cryptographic algorithms at the network layer is to encrypt or authenticate IP packets (either just the payloads or the whole packets). IPsec also specifies how to exchange keys. Thus, IPsec consists of authentication protocols, encryption protocols, and key exchange protocols. They are referred to, respectively, as authentication header (AH), encapsulating security payload (ESP), and Internet key exchange (IKE).","title":"IPsec: A Security Protocol at the Network Layer"},{"location":"level101/security/network_security/#pgp-smime-email-security","text":"There are several security protocols at the application layer. The most used of these protocols are the email security protocols, namely PGP and S/MIME. SMTP (\u201cSimple Mail Transfer Protocol\u201d) is used for sending and delivering messages from a client to a server via port 25: it\u2019s the outgoing server. In contrast, POP (\u201cPost Office Protocol\u201d) allows the users to pick up the message and download it into their inbox: it\u2019s the incoming server. The latest version of the Post Office Protocol is named POP3, and it\u2019s been used since 1996; it uses port 110. PGP PGP implements all major cryptographic algorithms, the ZIP compression algorithm, and the Base64 encoding algorithm. It can be used to authenticate a message, encrypt a message, or both. PGP follows the following general process: authentication, ZIP compression, encryption, and Base64 encoding. 
The Base64 encoding procedure makes the message ready for SMTP transmission. GPG (GnuPG) GnuPG is another free encryption standard that companies may use that is based on OpenPGP. GnuPG serves as a replacement for Symantec\u2019s PGP. The main difference is the supported algorithms. However, GnuPG plays nice with PGP by design. Although GnuPG is open, some businesses prefer the technical support and the user interface that come with Symantec\u2019s PGP. It is important to note that there are some nuances in the compatibility of GnuPG and PGP, such as the compatibility between certain algorithms, but in most applications such as email, there are workarounds. One such algorithm is the IDEA module, which isn\u2019t included in GnuPG out of the box due to patent issues. S/MIME SMTP can only handle 7-bit ASCII text messages (UTF-8 extensions can be used to alleviate this limitation). While POP can handle other content types besides 7-bit ASCII, POP may, under a common default setting, download all the messages stored in the mail server to the user's local computer. After that, POP removes these messages from the mail server, which makes it difficult for users to read their messages from multiple computers. The Multipurpose Internet Mail Extension protocol (MIME) was designed to support sending and receiving email messages in various formats, including nontext files generated by word processors, graphics files, sound files, and video clips. Moreover, MIME allows a single message to include mixed types of data in any combination of these formats. The Internet Mail Access Protocol (IMAP), operated on TCP port 143 (for non-encrypted connections only), stores incoming email messages in the mail server until the user deletes them deliberately (this is configurable on both server and client, just like POP). This allows the users to access their mailbox from multiple machines and download messages to a local machine without deleting them from the mailbox in the mail server. 
SSL/TLS SSL uses a PKI to decide if a server\u2019s public key is trustworthy by requiring servers to use a security certificate signed by a trusted CA. When Netscape Navigator 1.0 was released, it trusted a single CA operated by the RSA Data Security corporation. The server\u2019s public RSA keys were stored in the security certificate, which could then be used by the browser to establish a secure communication channel. The security certificates we use today still rely on the same standard (named X.509) that Netscape Navigator 1.0 used back then. Netscape intended to train users (though this didn\u2019t work out later) to differentiate secure communications from insecure ones, so they put a lock icon next to the address bar. When the lock is open, the communication is insecure. A closed lock means communication has been secured with SSL, which required the server to provide a signed certificate. You\u2019re obviously familiar with this icon as it\u2019s been in every browser ever since. The engineers at Netscape truly created a standard for secure internet communications. A year after releasing SSL 2.0, Netscape fixed several security issues and released SSL 3.0, a protocol that, albeit being officially deprecated since June 2015, remains in use in certain parts of the world more than 20 years after its introduction. To standardize SSL, the Internet Engineering Task Force (IETF) created a slightly modified SSL 3.0 and, in 1999, unveiled it as Transport Layer Security (TLS) 1.0. The name change between SSL and TLS continues to confuse people today. Officially, TLS is the new SSL, but in practice, people use SSL and TLS interchangeably to talk about any version of the protocol. 
Must See: https://tls.ulfheim.net/ https://davidwong.fr/tls13/","title":"PGP & S/MIME : Email Security"},{"location":"level101/security/network_security/#network-perimeter-security","text":"Let us see how we keep a check on the perimeter, i.e. the edges, the first layer of protection.","title":"Network Perimeter Security"},{"location":"level101/security/network_security/#general-firewall-framework","text":"Firewalls are needed because encryption algorithms cannot effectively stop malicious packets from getting into an edge network. This is because IP packets, regardless of whether they are encrypted, can always be forwarded into an edge network. Firewalls that were developed in the 1990s are important instruments to help restrict network access. A firewall may be a hardware device, a software package, or a combination of both. Packets flowing into the internal network from the outside should be evaluated before they are allowed to enter. One of the critical elements of a firewall is its ability to examine packets without imposing a negative impact on communication speed while providing security protections for the internal network. The packet inspection that is carried out by firewalls can be done using several different methods. Based on the particular method used by the firewall, it can be characterized as either a packet filter, circuit gateway, application gateway, or dynamic packet filter.","title":"General Firewall Framework"},{"location":"level101/security/network_security/#packet-filters","text":"A packet filter inspects ingress packets coming to an internal network from outside and egress packets going outside from an internal network. Packet filtering only inspects IP headers and TCP headers, not the payloads generated at the application layer. A packet-filtering firewall uses a set of rules to determine whether a packet should be allowed or denied to pass through. 
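The rule-based decision described above can be sketched as a first-match lookup over header fields. The rule set, the addresses (taken from documentation ranges), and the default-deny policy below are assumptions made for this illustration, not the behaviour of any particular firewall product.

```python
import ipaddress

# Hypothetical rule set: (action, source network, destination port). First match wins.
RULES = [
    ('deny', '0.0.0.0/0', 23),      # block telnet from anywhere
    ('allow', '0.0.0.0/0', 443),    # allow HTTPS from anywhere
    ('allow', '192.0.2.0/24', 22),  # allow SSH only from one trusted network
]

def filter_packet(src_ip, dst_port, rules=RULES, default='deny'):
    '''Return the action of the first rule matching the packet headers.'''
    source = ipaddress.ip_address(src_ip)
    for action, network, port in rules:
        if source in ipaddress.ip_network(network) and dst_port == port:
            return action
    return default  # nothing matched: fall back to the default policy

print(filter_packet('198.51.100.7', 443))
print(filter_packet('198.51.100.7', 22))
```

Note that the decision uses only header fields (source address and destination port) and never looks at the payload, and that no record of earlier packets is kept.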
There are two types: Stateless Stateless filtering treats each packet as an independent object, and it does not keep track of any previously processed packets. In other words, stateless filtering inspects a packet when it arrives and makes a decision without leaving any record of the packet being inspected. Stateful Stateful filtering, also referred to as connection-state filtering, keeps track of connections between an internal host and an external host. A connection state (or state, for short) indicates whether it is a TCP connection or a UDP connection and whether the connection is established.","title":"Packet Filters"},{"location":"level101/security/network_security/#circuit-gateways","text":"Circuit gateways, also referred to as circuit-level gateways, are typically operated at the transport layer. They evaluate the information of the IP addresses and the port numbers contained in TCP (or UDP) headers and use it to determine whether to allow or to disallow an internal host and an external host to establish a connection. It is common practice to combine packet filters and circuit gateways to form a dynamic packet filter (DPF).","title":"Circuit Gateways"},{"location":"level101/security/network_security/#application-gatewaysalg","text":"Also known as proxy servers. An Application Level Gateway (ALG) acts as a proxy for internal hosts, processing service requests from external clients. An ALG performs deep inspections on each IP packet (ingress or egress). In particular, an ALG inspects application program formats contained in the packet (e.g., MIME format or SQL format) and examines whether its payload is permitted. Thus, an ALG may be able to detect a computer virus contained in the payload. Because an ALG inspects packet payloads, it may be able to detect malicious code and quarantine suspicious packets, in addition to blocking packets with suspicious IP addresses and TCP ports. 
On the other hand, an ALG also incurs substantial computation and space overheads.","title":"Application Gateways(ALG)"},{"location":"level101/security/network_security/#trusted-systems-bastion-hosts","text":"A Trusted Operating System (TOS) is an operating system that meets a particular set of security requirements. Whether an operating system can be trusted or not depends on several elements. For example, for an operating system on a particular computer to be certified trusted, one needs to validate that, among other things, the following four requirements are satisfied: Its system design contains no defects; Its system software contains no loopholes; Its system is configured properly; and Its system management is appropriate. Bastion Hosts Bastion hosts are computers with strong defence mechanisms. They often serve as host computers for implementing application gateways, circuit gateways, and other types of firewalls. A bastion host is operated on a trusted operating system that must not contain unnecessary functionalities or programs. This measure helps to reduce error probabilities and makes it easier to conduct security checks. Only those network application programs that are necessary, for example, SSH, DNS, SMTP, and authentication programs, are installed on a bastion host. Bastion hosts are also primarily used as controlled ingress points so that the security monitoring can focus more narrowly on actions happening at a single point closely.","title":"Trusted Systems & Bastion Hosts"},{"location":"level101/security/network_security/#common-techniques-scannings-packet-capturing","text":"","title":"Common Techniques & Scannings, Packet Capturing"},{"location":"level101/security/network_security/#scanning-ports-with-nmap","text":"Nmap (\"Network Mapper\") is a free and open-source (license) utility for network discovery and security auditing. 
Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime. The best thing about Nmap is that it\u2019s free, open-source, flexible, and versatile. Nmap is often used to determine alive hosts in a network, open ports on those hosts, services running on those open ports, and version identification of that service on that port. More at http://scanme.nmap.org/ nmap [scan type] [options] [target specification] Nmap uses 6 different port states: Open \u2014 An open port is one that is actively accepting TCP, UDP or SCTP connections. Open ports are what interest us the most because they are the ones that are vulnerable to attacks. Open ports also show the available services on a network. Closed \u2014 A port that receives and responds to Nmap probe packets but has no application listening on it. Useful for identifying that the host exists and for OS detection. Filtered \u2014 Nmap can\u2019t determine whether the port is open because packet filtering prevents its probes from reaching the port. Filtering could come from firewalls or router rules. Often little information is given from filtered ports during scans as the filters can drop the probes without responding or respond with useless error messages, e.g. destination unreachable. Unfiltered \u2014 Port is accessible but Nmap doesn\u2019t know if it is open or closed. Only used in ACK scan, which is used to map firewall rulesets. Other scan types can be used to identify whether the port is open. Open/filtered \u2014 Nmap is unable to determine whether a port is open or filtered. This happens when an open port gives no response. No response could mean that the probe was dropped by a packet filter or any response is blocked. Closed/filtered \u2014 Nmap is unable to determine whether a port is closed or filtered. 
Only used in the IP ID idle scan.","title":"Scanning Ports with Nmap"},{"location":"level101/security/network_security/#types-of-nmap-scan","text":"TCP Connect TCP Connect scan completes the 3-way handshake. If a port is open, the operating system completes the TCP three-way handshake and the port scanner immediately closes the connection to avoid DoS. This is \u201cnoisy\u201d because the services can log the sender IP address and might trigger Intrusion Detection Systems. UDP Scan This scan checks to see if any UDP ports are listening. Since UDP does not respond with a positive acknowledgement like TCP and only responds to an incoming UDP packet when the port is closed, the absence of a reply suggests that the port is open or filtered. SYN Scan SYN scan is another form of TCP scanning. This scan type is also known as \u201chalf-open scanning\u201d because it never actually opens a full TCP connection. The port scanner generates a SYN packet. If the target port is open, it will respond with a SYN-ACK packet. The scanner host responds with an RST packet, closing the connection before the handshake is completed. If the port is closed but unfiltered, the target will instantly respond with an RST packet. SYN scan has the advantage that the individual services never actually receive a connection. FIN Scan This is a stealthy scan, like the SYN scan, but sends a TCP FIN packet instead. ACK Scan ACK scanning determines whether the port is filtered or not. Null Scan Another very stealthy scan that sets all the TCP header flags to off or null. This is not normally a valid packet and some hosts will not know what to do with this. XMAS Scan Similar to the NULL scan, except that all the flags in the TCP header are set to on. RPC Scan This special type of scan looks for machines answering to RPC (Remote Procedure Call) services. IDLE Scan It is a super stealthy method whereby the scan packets are bounced off an external host. You don\u2019t need to have control over the other host but it does have to be set up and meet certain requirements. 
You must input the IP address of the \u201czombie\u201d host and what port number to use. It is one of the more controversial options in Nmap since it only has a use for malicious attacks. Scan Techniques A couple of scan techniques which can be used to gain more information about a system and its ports. You can read more at https://medium.com/infosec-adventures/nmap-cheatsheet-a423fcdda0ca","title":"Types of Nmap Scan:"},{"location":"level101/security/network_security/#openvas","text":"OpenVAS is a full-featured vulnerability scanner. OpenVAS is a framework of services and tools that provides a comprehensive and powerful vulnerability scanning and management package. OpenVAS, which is an open-source program, began as a fork of the once-more-popular scanning program, Nessus. OpenVAS is made up of three main parts. These are: a regularly updated feed of Network Vulnerability Tests (NVTs); a scanner, which runs the NVTs; and an SQLite 3 database for storing both your test configurations and the NVTs\u2019 results and configurations. https://www.greenbone.net/en/install_use_gce/","title":"OpenVAS"},{"location":"level101/security/network_security/#wireshark","text":"Wireshark is a protocol analyzer. This means Wireshark is designed to decode not only packet bits and bytes but also the relations between packets and protocols. Wireshark understands protocol sequences. 
A simple demo of Wireshark Capture only UDP packets: Capture filter = \u201cudp\u201d Capture only TCP packets: Capture filter = \u201ctcp\u201d TCP/IP 3 way Handshake Filter by IP address: displays all traffic to or from the IP, be it source or destination ip.addr == 192.168.1.1 Filter by source address: display traffic only from IP source ip.src == 192.168.0.1 Filter by destination: display traffic only to IP destination ip.dst == 192.168.0.1 Filter by IP subnet: display traffic from subnet, be it source or destination ip.addr == 192.168.0.1/24 Filter by protocol: filter traffic by protocol name dns http ftp arp ssh telnet icmp Exclude IP address: remove traffic from and to IP address !(ip.addr == 192.168.0.1) Display traffic between two specific subnets ip.addr == 192.168.0.1/24 and ip.addr == 192.168.1.1/24 Display traffic between two specific workstations ip.addr == 192.168.0.1 and ip.addr == 192.168.0.2 Filter by MAC eth.addr == 00:50:7f:c5:b6:78 Filter TCP port tcp.port == 80 Filter TCP port source tcp.srcport == 80 Filter TCP port destination tcp.dstport == 80 Find user agents http.user_agent contains Firefox !http.user_agent contains || !http.user_agent contains Chrome Filter out background traffic (ARP, ICMP, DNS) !(arp or icmp or dns) Filter IP address and port tcp.port == 80 && ip.addr == 192.168.0.1 Filter all HTTP requests http.request Filter all HTTP requests and responses http.request or http.response Filter three way handshake tcp.flags.syn==1 or (tcp.seq==1 and tcp.ack==1 and tcp.len==0 and tcp.analysis.initial_rtt) Find files by type frame matches \u201c(attachment|tar|exe|zip|pdf)\u201d Find traffic based on keyword tcp contains facebook frame contains facebook Detecting SYN Floods tcp.flags.syn == 1 and tcp.flags.ack == 0 Wireshark Promiscuous Mode - By default, Wireshark only captures packets going to and from the computer where it runs. 
By checking the box to run Wireshark in Promiscuous Mode in the Capture Settings, you can capture most of the traffic on the LAN.","title":"WireShark"},{"location":"level101/security/network_security/#dumpcap","text":"Dumpcap is a network traffic dump tool. It captures packet data from a live network and writes the packets to a file. Dumpcap\u2019s native capture file format is pcapng, which is also the format used by Wireshark. By default, Dumpcap uses the pcap library to capture traffic from the first available network interface and writes the received raw packet data, along with the packets\u2019 time stamps, into a pcapng file. The capture filter syntax follows the rules of the pcap library. The Wireshark command-line utility called 'dumpcap.exe' can be used to capture LAN traffic over an extended period of time. Wireshark itself can also be used, but dumpcap does not significantly utilize the computer's memory while capturing for long periods.","title":"DumpCap"},{"location":"level101/security/network_security/#daemonlogger","text":"Daemonlogger is a packet logging application designed specifically for use in Network Security Monitoring (NSM) environments. The biggest benefit Daemonlogger provides is that, like Dumpcap, it is simple to use for capturing packets. In order to begin capturing, you need only to invoke the command and specify an interface. daemonlogger -i eth1 This option, by default, will begin capturing packets and logging them to the current working directory. Packets will be collected until the capture file size reaches 2 GB, and then a new file will be created. This will continue indefinitely until the process is halted.","title":"DaemonLogger"},{"location":"level101/security/network_security/#netsniff-ng","text":"Netsniff-NG is a high-performance packet capture utility. While the utilities we\u2019ve discussed to this point rely on Libpcap for capture, Netsniff-NG utilizes zero-copy mechanisms to capture packets. 
This is done with the intent to support full packet capture over high throughput links. To begin capturing packets with Netsniff-NG, we have to specify an input and output. In most cases, the input will be a network interface, and the output will be a file or folder on disk. netsniff-ng -i eth1 -o data.pcap","title":"NetSniff-NG"},{"location":"level101/security/network_security/#netflow","text":"NetFlow is a feature that was introduced on Cisco routers around 1996 that provides the ability to collect IP network traffic as it enters or exits an interface. By analyzing the data provided by NetFlow, a network administrator can determine things such as the source and destination of traffic, class of service, and the causes of congestion. A typical flow monitoring setup (using NetFlow) consists of three main components: Flow exporter: aggregates packets into flows and exports flow records towards one or more flow collectors. Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter. Analysis application: analyzes received flow data in the context of intrusion detection or traffic profiling, for example. Routers and switches that support NetFlow can collect IP traffic statistics on all interfaces where NetFlow is enabled, and later export those statistics as NetFlow records toward at least one NetFlow collector, typically a server that does the actual traffic analysis.","title":"Netflow"},{"location":"level101/security/network_security/#ids","text":"An IDS is a security solution that detects security-related events in your environment but does not block them. IDS sensors can be software- or hardware-based and are used to collect and analyze the network traffic. These sensors are available in two varieties, network IDS and host IDS. A host IDS is a server-specific agent running on a server with a minimum of overhead to monitor the operating system. 
A network IDS can be embedded in a networking device, a standalone appliance, or a module monitoring the network traffic. Signature Based IDS The signature-based IDS monitors the network traffic or observes the system and sends an alarm if a known malicious event is happening. It does so by comparing the data flow against a database of known attack patterns These signatures explicitly define what traffic or activity should be considered as malicious. Signature-based detection has been the bread and butter of network-based defensive security for over a decade, partially because it is very similar to how malicious activity is detected at the host level with antivirus utilities The formula is fairly simple: an analyst observes a malicious activity, derives indicators from the activity and develops them into signatures, and then those signatures will alert whenever the activity occurs again. ex: SNORT & SURICATA Policy-Based IDS The policy-based IDSs (mainly host IDSs) trigger an alarm whenever a violation occurs against the configured policy. This configured policy is or should be a representation of the security policies. This type of IDS is flexible and can be customized to a company's network requirements because it knows exactly what is permitted and what is not. On the other hand, the signature-based systems rely on vendor specifics and default settings. Anomaly Based IDS The anomaly-based IDS looks for traffic that deviates from the normal, but the definition of what is a normal network traffic pattern is the tricky part Two types of anomaly-based IDS exist: statistical and nonstatistical anomaly detection Statistical anomaly detection learns the traffic patterns interactively over a period of time. In the nonstatistical approach, the IDS has a predefined configuration of the supposedly acceptable and valid traffic patterns. 
Host-Based IDS & Network-Based IDS A host IDS can be described as a distributed agent residing on each server of the network that needs protection. These distributed agents are tied very closely to the underlying operating system. Network IDSs, on the other hand, can be described as intelligent sniffing devices. Data (raw packets) is captured from the network by a network IDS, whereas host IDSs capture the data from the host on which they are installed. Honeypots The use of decoy machines to direct intruders' attention away from the machines under protection is a major technique to preclude intrusion attacks. Any device, system, directory, or file used as a decoy to lure attackers away from important assets and to collect intrusion or abusive behaviours is referred to as a honeypot. A honeypot may be implemented as a physical device or as an emulation system. The idea is to set up decoy machines in a LAN, or decoy directories/files in a file system and make them appear important, but with several exploitable loopholes, to lure attackers to attack these machines or directories/files, so that other machines, directories, and files can evade intruders' attentions. A decoy machine may be a host computer or a server computer. Likewise, we may also set up decoy routers or even decoy LANs.","title":"IDS"},{"location":"level101/security/network_security/#chinks-in-the-armour-tcpip-security-issues","text":"","title":"Chinks In The Armour (TCP/IP Security Issues)"},{"location":"level101/security/network_security/#ip-spoofing","text":"In this type of attack, the attacker replaces the IP address of the sender, or in some rare cases the destination, with a different address. IP spoofing is normally used to exploit a target host. In other cases, it is used to start a denial-of-service (DoS) attack. In a DoS attack, an attacker modifies the IP packet to mislead the target host into accepting the original packet as a packet sourced at a trusted host. 
The attacker must know the IP address of the trusted host to modify the packet headers (source IP address) so that it appears that the packets are coming from that host. IP Spoofing Detection Techniques Direct TTL Probes In this technique, we send a packet that triggers a reply to the host of the suspected spoofed IP and compare the TTL of the reply with that of the suspect packet; if the TTL in the reply is not the same as in the packet being checked, it is a spoofed packet. This technique is successful when the attacker is in a different subnet from the victim. IP Identification Number Send a probe that triggers a reply to the host of the suspected spoofed traffic and compare the IP ID of the reply with that of the suspect traffic. If the IP IDs are not close to the value in the packet being checked, the suspect traffic is spoofed. TCP Flow Control Method Attackers sending spoofed TCP packets will not receive the target\u2019s SYN-ACK packets. Attackers cannot, therefore, be responsive to changes in the congestion window size. When the receiver still receives traffic even after the window size is exhausted, most probably the packets are spoofed.","title":"IP Spoofing"},{"location":"level101/security/network_security/#covert-channel","text":"A covert or clandestine channel can be best described as a pipe or communication channel between two entities that can be exploited by a process or application transferring information in a manner that violates the system's security specifications. More specifically for TCP/IP, in some instances, covert channels are established, and data can be secretly passed between two end systems. Ex: ICMP resides at the Internet layer of the TCP/IP protocol suite and is implemented in all TCP/IP hosts. Based on the specifications of the ICMP Protocol, an ICMP Echo Request message should have an 8-byte header and a 56-byte payload. The ICMP Echo Request packet should not carry any data in the payload. However, these packets are often used to carry secret information. 
The ICMP packets are altered slightly to carry secret data in the payload. This makes the size of the packet larger, but no control exists in the protocol stack to defeat this behaviour. The alteration of ICMP packets allows intruders to program specialized client-server pairs. These small pieces of code export confidential information without alerting the network administrator. ICMP can be leveraged for more than data exfiltration. For eg. some C&C tools such as Loki used ICMP channel to establish encrypted interactive session back in 1996. Deep packet inspection has since come a long way. A lot of IDS/IPS detect ICMP tunnelling. Check for echo responses that do not contain the same payload as request Check for the volume of ICMP traffic especially for volumes beyond an acceptable threshold","title":"Covert Channel"},{"location":"level101/security/network_security/#ip-fragmentation-attack","text":"The TCP/IP protocol suite, or more specifically IP, allows the fragmentation of packets.(this is a feature & not a bug) IP fragmentation offset is used to keep track of the different parts of a datagram. The information or content in this field is used at the destination to reassemble the datagrams All such fragments have the same Identification field value, and the fragmentation offset indicates the position of the current fragment in the context of the original packet. Many access routers and firewalls do not perform packet reassembly. In normal operation, IP fragments do not overlap, but attackers can create artificially fragmented packets to mislead the routers or firewalls. Usually, these packets are small and almost impractical for end systems because of data and computational overhead. A good example of an IP fragmentation attack is the Ping of Death attack. The Ping of Death attack sends fragments that, when reassembled at the end station, create a larger packet than the maximum permissible length. 
TCP Flags Data exchange using TCP does not happen until a three-way handshake has been completed. This handshake uses different flags to influence the way TCP segments are processed. There are 6 bits in the TCP header that are often called flags: Urgent pointer field (URG), Acknowledgment field (ACK), Push function (PSH), Reset the connection (RST), Synchronize sequence numbers (SYN), and the sender is finished with this connection (FIN). Abuse of the normal operation or settings of these flags can be used by attackers to launch DoS attacks. This causes network servers or web servers to crash or hang. | SYN | FIN | PSH | RST | Validity| |------|------|-------|------|---------| | 1 |1 |0 |0 |Illegal Combination | 1 |1 |1 |0 |Illegal Combination | 1 |1 |0 |1 |Illegal Combination | 1 |1 |1 |1 |Illegal Combination The attacker's ultimate goal is to write special programs or pieces of code that can construct these illegal combinations resulting in an efficient DoS attack. SYN FLOOD The timers (or lack of certain timers) in the 3 way handshake are often used and exploited by attackers to disable services or even to enter systems. After step 2 of the three-way handshake, no limit is set on the time to wait after receiving a SYN. The attacker initiates many connection requests to the webserver of Company XYZ (almost certainly with a spoofed IP address). The SYN+ACK packets (Step 2) sent by the web server back to the originating source IP address are not replied to. This leaves a TCP session half-open on the webserver. Multiple packets cause multiple TCP sessions to stay open. Based on the hardware limitations of the server, a limited number of TCP sessions can stay open, and as a result, the webserver refuses further connection establishment attempts from any host as soon as a certain limit is reached. These half-open connections need to be completed or timed out before new connections can be established. 
FIN Attack In normal operation, the sender sets the TCP FIN flag indicating that no more data will be transmitted and the connection can be closed down. This is a four-way handshake mechanism, with both sender and receiver expected to send an acknowledgement on a received FIN packet. During an attack that is trying to kill connections, a spoofed FIN packet is constructed. This packet also has the correct sequence number, so the packets are seen as valid by the targeted host. These sequence numbers are easy to predict. This process is referred to as TCP sequence number prediction, whereby the attacker either sniffs the current Sequence and Acknowledgment (SEQ/ACK) numbers of the connection or can algorithmically predict these numbers.","title":"IP Fragmentation Attack"},{"location":"level101/security/network_security/#connection-hijacking","text":"An authorized user (Employee X) sends HTTP requests over a TCP session with the webserver. The web server accepts the packets from Employee X only when the packet has the correct SEQ/ACK numbers. As seen previously, these numbers are important for the webserver to distinguish between different sessions and to make sure it is still talking to Employee X. Imagine that the cracker starts sending packets to the web server spoofing the IP address of Employee X, using the correct SEQ/ACK combination. The web server accepts the packet and increments the ACK number. In the meantime, Employee X continues to send packets but with incorrect SEQ/ACK numbers. As a result of sending unsynchronized packets, all data from Employee X is discarded when received by the webserver. The attacker pretends to be Employee X using the correct numbers. This finally results in the cracker hijacking the connection, whereby Employee X is completely confused and the webserver replies assuming the cracker is sending correct synchronized data. 
STEPS: The attacker examines the traffic flows with a network monitor and notices traffic from Employee X to a web server. The web server returns or echoes data back to the origination station (Employee X). Employee X acknowledges the packet. The cracker launches a spoofed packet to the server. The web server responds to the cracker. The cracker starts verifying SEQ/ACK numbers to double-check success. At this time, the cracker takes over the session from Employee X, which results in a session hanging for Employee X. The cracker can start sending traffic to the webserver. The web server returns the requested data to confirm delivery with the correct ACK number. The cracker can continue to send data (keeping track of the correct SEQ/ACK numbers) until eventually setting the FIN flag to terminate the session.","title":"Connection Hijacking"},{"location":"level101/security/network_security/#buffer-overflow","text":"A buffer is a temporary data storage area used to store program code and data. When a program or process tries to store more data in a buffer than it was originally anticipated to hold, a buffer overflow occurs. Buffers are temporary storage locations in memory (memory or buffer sizes are often measured in bytes) that can store a fixed amount of data in bytes. When more data is retrieved than can be stored in a buffer location, the additional information must go into an adjacent buffer, resulting in overwriting the valid data held in them. Mechanism: Buffer overflow vulnerabilities exist in different types. But the overall goal for all buffer overflow attacks is to take over the control of a privileged program and, if possible, the host. The attacker has two tasks to achieve this goal. First, the dirty code needs to be available in the program's code address space. Second, the privileged program should jump to that particular part of the code, which ensures that the proper parameters are loaded into memory. 
The first task can be achieved in two ways: by injecting the code in the right address space or by using the existing code and modifying certain parameters slightly. The second task is a little more complex because the program's control flow needs to be modified to make the program jump to the dirty code. CounterMeasure: The most important approach is to have a concerted focus on writing correct code. A second method is to make the data buffers (memory locations) address space of the program code non-executable. This type of address space makes it impossible to execute code, which might be infiltrated in the program's buffers during an attack.","title":"Buffer Overflow"},{"location":"level101/security/network_security/#more-spoofing","text":"Address Resolution Protocol Spoofing The Address Resolution Protocol (ARP) provides a mechanism to resolve, or map, a known IP address to a MAC sublayer address. Using ARP spoofing, the cracker can exploit this hardware address authentication mechanism by spoofing the hardware address of Host B. Basically, the attacker can convince any host or network device on the local network that the cracker's workstation is the host to be trusted. This is a common method used in a switched environment. ARP spoofing can be prevented with the implementation of static ARP tables in all the hosts and routers of your network. Alternatively, you can implement an ARP server that responds to ARP requests on behalf of the target host. DNS Spoofing DNS spoofing is the method whereby the hacker convinces the target machine that the system it wants to connect to is the machine of the cracker. The cracker modifies some records so that name entries of hosts correspond to the attacker's IP address. There have been instances in which the complete DNS server was compromised by an attack. To counter DNS spoofing, the reverse lookup detects these attacks. The reverse lookup is a mechanism to verify the IP address against a name. 
The IP address and name files are usually kept on different servers to make compromise much more difficult","title":"More Spoofing"},{"location":"level101/security/threats_attacks_defences/","text":"Part III: Threats, Attacks & Defense DNS Protection Cache Poisoning Attack Since DNS responses are cached, a quick response can be provided for repeated translations. DNS negative queries are also cached, e.g., misspelt words, and all cached data periodically times out. Cache poisoning is an issue in what is known as pharming. This term is used to describe a hacker\u2019s attack in which a website\u2019s traffic is redirected to a bogus website by forging the DNS mapping. In this case, an attacker attempts to insert a fake address record for an Internet domain into the DNS. If the server accepts the fake record, the cache is poisoned and subsequent requests for the address of the domain are answered with the address of a server controlled by the attacker. As long as the fake entry is cached by the server, browsers or e-mail servers will automatically go to the address provided by the compromised DNS server. the typical time to live (TTL) for cached entries is a couple of hours, thereby permitting ample time for numerous users to be affected by the attack. DNSSEC (Security Extension) The long-term solution to these DNS problems is authentication. If a resolver cannot distinguish between valid and invalid data in a response, then add source authentication to verify that the data received in response is equal to the data entered by the zone administrator DNS Security Extensions (DNSSEC) protects against data spoofing and corruption and provides mechanisms to authenticate servers and requests, as well as mechanisms to establish authenticity and integrity. When authenticating DNS responses, each DNS zone signs its data using a private key. It is recommended that this signing be done offline and in advance. 
The query for a particular record returns the requested resource record set (RRset) and signature (RRSIG) of the requested resource record set. The resolver then authenticates the response using a public key, which is pre-configured or learned via a sequence of key records in the DNS hierarchy. The goals of DNSSEC are to provide authentication and integrity for DNS responses without confidentiality or DDoS protection. BGP BGP stands for Border Gateway Protocol. It is a routing protocol that exchanges routing information among multiple Autonomous Systems (AS). An Autonomous System is a collection of routers or networks with the same network policy usually under single administrative control. BGP tells routers which hop to use in order to reach the destination network. BGP is used for both communicating information among routers in an AS (interior) and between multiple ASes (exterior). How BGP Works BGP is responsible for finding a path to a destination router & the path it chooses should be the shortest and most reliable one. BGP itself is a path-vector protocol: each router advertises the AS paths to the prefixes it can reach and selects among the paths it learns according to policy. Within an AS, interior routing protocols such as OSPF make this decision with a link-state protocol. With the link-state protocol, each router broadcasts to all other routers in the network the state of its links and IP subnets. Each router then receives information from the other routers and constructs a complete topology view of the entire network. The next-hop routing table is based on this topology view. The link-state protocol uses a famous algorithm in the field of computer science, Dijkstra\u2019s shortest path algorithm: We start from our router considering the path cost to all our direct neighbours. The shortest path is then taken. We then re-look at all our neighbours that we can reach and update our link state table with the cost information. We then continue taking the shortest path until every router has been visited. BGP Vulnerabilities By corrupting the BGP routing table we are able to influence the direction traffic flows on the internet! 
This action is known as BGP hijacking. Injecting bogus route advertising information into the BGP-distributed routing database, whether by malicious sources, by accident, or by routers, can disrupt Internet backbone operations. Blackholing traffic: A blackhole route is a network route, i.e., routing table entry, that goes nowhere and packets matching the route prefix are dropped or ignored. Blackhole routes can only be detected by monitoring the lost traffic. Blackhole routes are the best defence against many common viral attacks where the traffic is dropped from infected machines to/from command & control hosts. Infamous BGP Injection attack on YouTube Ex: In 2008, Pakistan decided to block YouTube by creating a BGP route that led into a black hole. Instead, this routing information got transmitted to a Hong Kong ISP and from there accidentally got propagated to the rest of the world, meaning millions were routed through to this black hole and therefore unable to access YouTube. Potentially, the greatest risk to BGP occurs in a denial of service attack in which a router is flooded with more packets than it can handle. Network overload and router resource exhaustion happen when the network begins carrying an excessive number of BGP messages, overloading the router control processors, memory, routing table and reducing the bandwidth available for data traffic. Refer: https://medium.com/bugbountywriteup/bgp-the-weak-link-in-the-internet-what-is-bgp-and-how-do-hackers-exploit-it-d899a68ba5bb Route flapping is another type of attack. Route flapping refers to repetitive changes to the BGP routing table, often several times a minute. Withdrawing and re-advertising at a high rate can cause a serious problem for routers since they propagate the announcements of routes. If these route flaps happen fast enough, e.g., 30 to 50 times per second, the router becomes overloaded, which eventually prevents convergence on valid routes. 
The potential impact for Internet users is a slowdown in message delivery, and in some cases, packets may not be delivered at all. BGP Security Border Gateway Protocol Security recommends the use of BGP peer authentication since it is one of the strongest mechanisms for preventing malicious activity. The authentication mechanisms are Internet Protocol Security (IPsec) or BGP MD5. Another method, known as prefix limits, can be used to avoid filling router tables. In this approach, routers should be configured to disable or terminate a BGP peering session, and issue warning messages to administrators when a neighbour sends in excess of a preset number of prefixes. IETF is currently working on improving this space Web-Based Attacks HTTP Response Splitting Attacks HTTP response splitting attack may happen where the server script embeds user data in HTTP response headers without appropriate sanitation. This typically happens when the script embeds user data in the redirection URL of a redirection response (HTTP status code 3xx), or when the script embeds user data in a cookie value or name when the response sets a cookie. HTTP response splitting attacks can be used to perform web cache poisoning and cross-site scripting attacks. HTTP response splitting is the attacker\u2019s ability to send a single HTTP request that forces the webserver to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. Cross-Site Request Forgery (CSRF or XSRF) A Cross-Site Request Forgery attack tricks the victim\u2019s browser into issuing a command to a vulnerable web application. Vulnerability is caused by browsers automatically including user authentication data, session ID, IP address, Windows domain credentials, etc. with each request. Attackers typically use CSRF to initiate transactions such as transfer funds, login/logout user, close account, access sensitive data, and change account details. 
The vulnerability is caused by web browsers that automatically include credentials with each request, even for requests caused by a form, script, or image on another site. CSRF can also be dynamically constructed as part of a payload for a cross-site scripting attack. All sites relying on automatic credentials are vulnerable. Popular browsers cannot prevent cross-site request forgery. Logging out of high-value sites as soon as possible can mitigate CSRF risk. A high-value website should require a client to manually provide authentication data in the same HTTP request used to perform any operation with security implications. Limiting the lifetime of session cookies can also reduce the chance of them being used by other malicious sites. OWASP recommends that website developers include a required security token in HTTP requests associated with sensitive business functions in order to mitigate CSRF attacks. Cross-Site Scripting (XSS) Attacks Cross-Site Scripting occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page and then execute the script on the machine of any user that views the site. If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests that can be mistaken for those of a valid user, compromise confidential information, or execute malicious code on end-user systems. Cross-Site Scripting (XSS or CSS) attacks involve the execution of malicious scripts on the victim\u2019s browser. The victim is simply a user\u2019s host and not the server. XSS results from a failure to validate user input by a web-based application. Document Object Model (DOM) XSS Attacks The Document Object Model (DOM) based XSS does not require the webserver to receive the XSS payload for a successful attack. The attacker abuses the runtime by embedding their data on the client-side. 
An attacker can force the client (browser) to render the page with parts of the DOM controlled by the attacker. When the page is rendered and the data is processed by the page, typically by a client-side HTML-embedded script such as JavaScript, the page\u2019s code may insecurely embed the data in the page itself, thus delivering the cross-site scripting payload. There are several DOM objects which can serve as an attack vehicle for delivering malicious script to the victim\u2019s browser. Clickjacking The technique works by hiding malicious links or scripts under the cover of the content of a legitimate site. Buttons on a website actually contain invisible links, placed there by the attacker. So, an individual who clicks on an object they can visually see is actually being duped into visiting a malicious page or executing a malicious script. When mouseover is used together with clickjacking, the outcome is devastating. Facebook users were hit by a clickjacking attack that tricked people into \u201cliking\u201d a particular Facebook page, which enabled the attack to spread from Memorial Day 2010 onwards. Sites can defend against clickjacking by instructing browsers not to render their pages inside frames on other sites, using the X-Frame-Options response header or the Content-Security-Policy frame-ancestors directive. DataBase Attacks & Defenses SQL injection Attacks It exploits improper input validation in database queries. A successful exploit will allow attackers to access, modify, or delete information in the database. It permits attackers to steal sensitive information stored within the backend databases of affected websites, which may include such things as user credentials, email addresses, personal information, and credit card numbers. SELECT USERNAME,PASSWORD from USERS where USERNAME='' AND PASSWORD=''; Here the username & password are the inputs provided by the user. Suppose an attacker gives the input as \"' OR '1'='1\" in both fields. 
Therefore the SQL query will look like: SELECT USERNAME,PASSWORD from USERS where USERNAME='' OR '1'='1' AND PASSWORD='' OR '1'='1'; The WHERE clause of this query always evaluates to true, so the attacker gets logged in. This example depicts the most basic type of SQL injection. SQL Injection Attack Defenses SQL injection can be mitigated by filtering the query to eliminate malicious syntax, which involves employing tools to scan the source code. Using parameterized queries (prepared statements), which keep user input separate from the SQL text, is the most robust defence. In addition, input fields should be restricted to the absolute minimum length, typically anywhere from 7-12 characters, and all data should be validated, e.g., if a user inputs an age, make sure the input is an integer with a maximum of 3 digits. VPN A virtual private network (VPN) is a service that offers a secure, reliable connection over a shared public infrastructure such as the Internet. Cisco defines a VPN as an encrypted connection between private networks over a public network. To date, there are three types of VPNs: remote access, site-to-site, and firewall-based. Security Breach In spite of the most aggressive steps to protect computers from attacks, attackers sometimes get through. Any event that results in a violation of any of the confidentiality, integrity, or availability (CIA) security tenets is a security breach. Denial of Service Attacks Denial of service (DoS) attacks result in downtime or inability of a user to access a system. DoS attacks impact the availability tenet of information systems security. A DoS attack is a coordinated attempt to deny service by occupying a computer with large amounts of unnecessary tasks. This excessive activity makes the system unavailable to perform legitimate operations. Two common types of DoS attacks are as follows: Logic attacks\u2014Logic attacks use software flaws to crash or seriously hinder the performance of remote servers. You can prevent many of these attacks by installing the latest patches to keep your software up to date. 
Flooding attacks\u2014Flooding attacks overwhelm the victim computer\u2019s CPU, memory, or network resources by sending large numbers of useless requests to the machine. Most DoS attacks target weaknesses in the overall system architecture rather than a software bug or security flaw. One popular technique for launching a packet flood is a SYN flood. One of the best defences against DoS attacks is to use intrusion prevention system (IPS) software or devices to detect and stop the attack. Distributed Denial of Service Attacks DDoS attacks differ from regular DoS attacks in their scope. In a DDoS attack, attackers hijack hundreds or even thousands of Internet computers, planting automated attack agents on those systems. The attacker then instructs the agents to bombard the target site with forged messages. This overloads the site and blocks legitimate traffic. The key here is strength in numbers. The attacker does more damage by distributing the attack across multiple computers. Wiretapping Although the term wiretapping is generally associated with voice telephone communications, attackers can also use wiretapping to intercept data communications. Attackers can tap telephone lines and data communication lines. Wiretapping can be active, where the attacker makes modifications to the line. It can also be passive, where an unauthorized user simply listens to the transmission without changing the contents. Passive intrusion can include the copying of data for a subsequent active attack. Two methods of active wiretapping are as follows: Between-the-lines wiretapping\u2014This type of wiretapping does not alter the messages sent by the legitimate user but inserts additional messages into the communication line when the legitimate user pauses. Piggyback-entry wiretapping\u2014This type of wiretapping intercepts and modifies the original message by breaking the communications line and routing the message to another computer that acts as a host. 
Backdoors Software developers sometimes include hidden access methods, called backdoors, in their programs. Backdoors give developers or support personnel easy access to a system without having to struggle with security controls. The problem is that backdoors don\u2019t always stay hidden. When an attacker discovers a backdoor, he or she can use it to bypass existing security controls such as passwords, encryption, and so on. Where legitimate users log on through front doors using a user ID and password, attackers use backdoors to bypass these normal access controls. Malicious Attacks Birthday Attack Once an attacker compromises a hashed password file, a birthday attack is performed. A birthday attack is a type of cryptographic attack that is used to make a brute-force attack of one-way hashes easier. It is a mathematical exploit that is based on the birthday problem in probability theory. Further Reading: https://www.sciencedirect.com/topics/computer-science/birthday-attack https://www.internetsecurity.tips/birthday-attack/ Brute-Force Password Attacks In a brute-force password attack, the attacker tries different passwords on a system until one of them is successful. Usually, the attacker employs a software program to try all possible combinations of a likely password, user ID, or security code until it locates a match. This occurs rapidly and in sequence. This type of attack is called a brute-force password attack because the attacker simply hammers away at the code. There is no skill or stealth involved\u2014just brute force that eventually breaks the code. Further Reading: https://owasp.org/www-community/attacks/Brute_force_attack https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks Dictionary Password Attacks A dictionary password attack is a simple attack that relies on users making poor password choices. 
In a dictionary password attack, a simple password-cracker program takes all the words from a dictionary file and attempts to log on by entering each dictionary entry as a password. Further Reading: https://capec.mitre.org/data/definitions/16.html Replay Attacks Replay attacks involve capturing data packets from a network and retransmitting them to produce an unauthorized effect. The receipt of duplicate, authenticated IP packets may disrupt service or have some other undesired consequence. Systems can be broken through replay attacks when attackers reuse old messages or parts of old messages to deceive system users. This helps intruders to gain information that allows unauthorized access into a system. Further reading: https://study.com/academy/lesson/replay-attack-definition-examples-prevention.html Man-in-the-Middle Attacks A man-in-the-middle attack takes advantage of the multihop process used by many types of networks. In this type of attack, an attacker intercepts messages between two parties before transferring them on to their intended destination. Web spoofing is a type of man-in-the-middle attack in which the user believes a secure session exists with a particular web server. In reality, the secure connection exists only with the attacker, not the webserver. The attacker then establishes a secure connection with the webserver, acting as an invisible go-between. The attacker passes traffic between the user and the webserver. In this way, the attacker can trick the user into supplying passwords, credit card information, and other private data. Further Reading: https://owasp.org/www-community/attacks/Man-in-the-middle_attack Masquerading In a masquerade attack, one user or computer pretends to be another user or computer. Masquerade attacks usually include one of the other forms of active attacks, such as IP address spoofing or replaying. 
Attackers can capture authentication sequences and then replay them later to log on again to an application or operating system. For example, an attacker might monitor usernames and passwords sent to a weak web application. The attacker could then use the intercepted credentials to log on to the web application and impersonate the user. Further Reading: https://dl.acm.org/doi/book/10.5555/2521792 https://ieeexplore.ieee.org/document/1653228 Eavesdropping Eavesdropping, or sniffing, occurs when a host sets its network interface to promiscuous mode and copies packets that pass by for later analysis. Promiscuous mode enables a network device to intercept and read each network packet on its segment (under certain conditions), even if the packet\u2019s address doesn\u2019t match the network device. It is possible to attach hardware and software to monitor and analyze all packets on that segment of the transmission media without alerting any other users. Candidates for eavesdropping include satellite, wireless, mobile, and other transmission methods. Social Engineering Attackers often use a deception technique called social engineering to gain access to resources in an IT infrastructure. In nearly all cases, social engineering involves tricking authorized users into carrying out actions for unauthorized users. The success of social engineering attacks depends on the basic tendency of people to want to be helpful. Phreaking Phone phreaking, or simply phreaking, is a slang term that describes the activity of a subculture of people who study, experiment with, or explore telephone systems, telephone company equipment, and systems connected to public telephone networks. Phreaking is the art of exploiting bugs and glitches that exist in the telephone system. 
Phishing Phishing is a type of fraud in which an attacker attempts to trick the victim into providing private information such as credit card numbers, passwords, dates of birth, bank account numbers, automated teller machine (ATM) PINs, and Social Security numbers. Pharming Pharming is another type of attack that seeks to obtain personal or private financial information through domain spoofing. A pharming attack doesn\u2019t use messages to trick victims into visiting spoofed websites that appear legitimate, however. Instead, pharming \u201cpoisons\u201d a domain name on the domain name server (DNS), a process known as DNS poisoning. The result is that when a user enters the poisoned server\u2019s web address into his or her address bar, that user navigates to the attacker\u2019s site. The user\u2019s browser still shows the correct website, which makes pharming difficult to detect\u2014and therefore more serious. Where phishing attempts to scam people one at a time with an email or instant message, pharming enables scammers to target large groups of people at one time through domain spoofing.","title":"Threat, Attacks & Defences"},{"location":"level101/security/threats_attacks_defences/#part-iii-threats-attacks-defense","text":"","title":"Part III: Threats, Attacks & Defense"},{"location":"level101/security/threats_attacks_defences/#dns-protection","text":"","title":"DNS Protection"},{"location":"level101/security/threats_attacks_defences/#cache-poisoning-attack","text":"Since DNS responses are cached, a quick response can be provided for repeated translations. DNS negative queries are also cached, e.g., misspelt words, and all cached data periodically times out. Cache poisoning is an issue in what is known as pharming. This term is used to describe a hacker\u2019s attack in which a website\u2019s traffic is redirected to a bogus website by forging the DNS mapping. 
In this case, an attacker attempts to insert a fake address record for an Internet domain into the DNS. If the server accepts the fake record, the cache is poisoned and subsequent requests for the address of the domain are answered with the address of a server controlled by the attacker. As long as the fake entry is cached by the server, browsers or e-mail servers will automatically go to the address provided by the compromised DNS server. The typical time to live (TTL) for cached entries is a couple of hours, thereby permitting ample time for numerous users to be affected by the attack.","title":"Cache Poisoning Attack"},{"location":"level101/security/threats_attacks_defences/#dnssec-security-extension","text":"The long-term solution to these DNS problems is authentication. If a resolver cannot distinguish between valid and invalid data in a response, then source authentication must be added to verify that the data received in a response is equal to the data entered by the zone administrator. DNS Security Extensions (DNSSEC) protects against data spoofing and corruption and provides mechanisms to authenticate servers and requests, as well as mechanisms to establish authenticity and integrity. When authenticating DNS responses, each DNS zone signs its data using a private key. It is recommended that this signing be done offline and in advance. The query for a particular record returns the requested resource record set (RRset) and signature (RRSIG) of the requested resource record set. The resolver then authenticates the response using a public key, which is pre-configured or learned via a sequence of key records in the DNS hierarchy. The goals of DNSSEC are to provide authentication and integrity for DNS responses without confidentiality or DDoS protection.","title":"DNSSEC (Security Extension)"},{"location":"level101/security/threats_attacks_defences/#bgp","text":"BGP stands for border gateway protocol. 
It is a routing protocol that exchanges routing information among multiple Autonomous Systems (AS). An Autonomous System is a collection of routers or networks with the same network policy, usually under single administrative control. BGP tells routers which hop to use in order to reach the destination network. BGP is used for both communicating information among routers in an AS (interior) and between multiple ASes (exterior).","title":"BGP"},{"location":"level101/security/threats_attacks_defences/#how-bgp-works","text":"BGP is responsible for finding a path to a destination network, and the path it chooses should be the shortest and most reliable one. BGP itself is a path-vector protocol: each router advertises the sequence of ASes on the way to a prefix, and the best route is selected from attributes such as AS-path length and local policy. Inside an AS, routes are typically computed by a link-state protocol such as OSPF. With a link-state protocol, each router broadcasts to all other routers in the network the state of its links and IP subnets. Each router then receives information from the other routers and constructs a complete topology view of the entire network. The next-hop routing table is based on this topology view. The link-state protocol uses a famous algorithm in the field of computer science, Dijkstra\u2019s shortest path algorithm: We start from our router, considering the path cost to all our direct neighbours. The shortest path is then taken. We then re-look at all our neighbours that we can reach and update our link state table with the cost information. We then continue taking the shortest path until every router has been visited.","title":"How BGP Works"},{"location":"level101/security/threats_attacks_defences/#bgp-vulnerabilities","text":"By corrupting the BGP routing table we are able to influence the direction traffic flows on the internet! This action is known as BGP hijacking. Bogus route advertisements injected into the BGP-distributed routing database, whether by malicious sources or accidentally by routers, can disrupt Internet backbone operations. 
Blackholing traffic: A blackhole route is a network route, i.e., a routing table entry, that goes nowhere; packets matching the route prefix are dropped or ignored. Blackhole routes can only be detected by monitoring the lost traffic. Blackhole routes are one of the best defences against many common viral attacks, where traffic between infected machines and command & control hosts is dropped. Infamous BGP injection attack on YouTube: In 2008, Pakistan decided to block YouTube by creating a BGP route that led into a black hole. Instead, this routing information got transmitted to a Hong Kong ISP and from there was accidentally propagated to the rest of the world, meaning millions of users were routed into this black hole and were therefore unable to access YouTube. Potentially, the greatest risk to BGP occurs in a denial of service attack in which a router is flooded with more packets than it can handle. Network overload and router resource exhaustion happen when the network begins carrying an excessive number of BGP messages, overloading the router control processors, memory, and routing table, and reducing the bandwidth available for data traffic. Refer: https://medium.com/bugbountywriteup/bgp-the-weak-link-in-the-internet-what-is-bgp-and-how-do-hackers-exploit-it-d899a68ba5bb Route flapping is another type of attack. Route flapping refers to repetitive changes to the BGP routing table, often several times a minute. Withdrawing and re-advertising routes at a high rate can cause a serious problem for routers, since they propagate the route announcements. If these route flaps happen fast enough, e.g., 30 to 50 times per second, the router becomes overloaded, which eventually prevents convergence on valid routes. The potential impact for Internet users is a slowdown in message delivery, and in some cases, packets may not be delivered at all. 
BGP Security Border Gateway Protocol Security recommends the use of BGP peer authentication since it is one of the strongest mechanisms for preventing malicious activity. The authentication mechanisms are Internet Protocol Security (IPsec) or BGP MD5. Another method, known as prefix limits, can be used to avoid filling router tables. In this approach, routers should be configured to disable or terminate a BGP peering session, and issue warning messages to administrators, when a neighbour sends in excess of a preset number of prefixes. The IETF is currently working on improving this space.","title":"BGP Vulnerabilities"},{"location":"level101/security/threats_attacks_defences/#web-based-attacks","text":"","title":"Web-Based Attacks"},{"location":"level101/security/threats_attacks_defences/#http-response-splitting-attacks","text":"An HTTP response splitting attack may happen when a server script embeds user data in HTTP response headers without appropriate sanitization. This typically happens when the script embeds user data in the redirection URL of a redirection response (HTTP status code 3xx), or when the script embeds user data in a cookie value or name when the response sets a cookie. HTTP response splitting attacks can be used to perform web cache poisoning and cross-site scripting attacks. HTTP response splitting is the attacker\u2019s ability to send a single HTTP request that forces the webserver to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response.","title":"HTTP Response Splitting Attacks"},{"location":"level101/security/threats_attacks_defences/#cross-site-request-forgery-csrf-or-xsrf","text":"A Cross-Site Request Forgery attack tricks the victim\u2019s browser into issuing a command to a vulnerable web application. The vulnerability is caused by browsers automatically including user authentication data (session ID, IP address, Windows domain credentials, etc.) with each request. 
Attackers typically use CSRF to initiate transactions such as transferring funds, logging a user in or out, closing an account, accessing sensitive data, and changing account details. The vulnerability is caused by web browsers that automatically include credentials with each request, even for requests caused by a form, script, or image on another site. CSRF can also be dynamically constructed as part of a payload for a cross-site scripting attack. All sites relying on automatic credentials are vulnerable. Popular browsers cannot prevent cross-site request forgery. Logging out of high-value sites as soon as possible can mitigate CSRF risk. A high-value website should require a client to manually provide authentication data in the same HTTP request used to perform any operation with security implications. Limiting the lifetime of session cookies can also reduce the chance of them being used by other malicious sites. OWASP recommends that website developers include a required security token in HTTP requests associated with sensitive business functions in order to mitigate CSRF attacks.","title":"Cross-Site Request Forgery (CSRF or XSRF)"},{"location":"level101/security/threats_attacks_defences/#cross-site-scripting-xss-attacks","text":"Cross-Site Scripting occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page and then execute the script on the machine of any user that views the site. If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests that can be mistaken for those of a valid user, compromise confidential information, or execute malicious code on end-user systems. Cross-Site Scripting (XSS or CSS) attacks involve the execution of malicious scripts on the victim\u2019s browser. The victim is simply a user\u2019s host and not the server. 
XSS results from a failure to validate user input by a web-based application.","title":"Cross-Site Scripting (XSS) Attacks"},{"location":"level101/security/threats_attacks_defences/#document-object-model-dom-xss-attacks","text":"The Document Object Model (DOM) based XSS does not require the webserver to receive the XSS payload for a successful attack. The attacker abuses the runtime by embedding their data on the client-side. An attacker can force the client (browser) to render the page with parts of the DOM controlled by the attacker. When the page is rendered and the data is processed by the page, typically by a client-side HTML-embedded script such as JavaScript, the page\u2019s code may insecurely embed the data in the page itself, thus delivering the cross-site scripting payload. There are several DOM objects which can serve as an attack vehicle for delivering malicious script to the victim\u2019s browser.","title":"Document Object Model (DOM) XSS Attacks"},{"location":"level101/security/threats_attacks_defences/#clickjacking","text":"The technique works by hiding malicious links or scripts under the cover of the content of a legitimate site. Buttons on a website actually contain invisible links, placed there by the attacker. So, an individual who clicks on an object they can visually see is actually being duped into visiting a malicious page or executing a malicious script. When mouseover is used together with clickjacking, the outcome is devastating. Facebook users were hit by a clickjacking attack that tricked people into \u201cliking\u201d a particular Facebook page, which enabled the attack to spread from Memorial Day 2010 onwards. 
Sites can defend against clickjacking by instructing browsers not to render their pages inside frames on other sites, using the X-Frame-Options response header or the Content-Security-Policy frame-ancestors directive.","title":"Clickjacking"},{"location":"level101/security/threats_attacks_defences/#database-attacks-defenses","text":"","title":"DataBase Attacks & Defenses"},{"location":"level101/security/threats_attacks_defences/#sql-injection-attacks","text":"It exploits improper input validation in database queries. A successful exploit will allow attackers to access, modify, or delete information in the database. It permits attackers to steal sensitive information stored within the backend databases of affected websites, which may include such things as user credentials, email addresses, personal information, and credit card numbers. SELECT USERNAME,PASSWORD from USERS where USERNAME='' AND PASSWORD=''; Here the username & password are the inputs provided by the user. Suppose an attacker gives the input as \"' OR '1'='1\" in both fields. Therefore the SQL query will look like: SELECT USERNAME,PASSWORD from USERS where USERNAME='' OR '1'='1' AND PASSWORD='' OR '1'='1'; The WHERE clause of this query always evaluates to true, so the attacker gets logged in. This example depicts the most basic type of SQL injection.","title":"SQL injection Attacks"},{"location":"level101/security/threats_attacks_defences/#sql-injection-attack-defenses","text":"SQL injection can be mitigated by filtering the query to eliminate malicious syntax, which involves employing tools to scan the source code. Using parameterized queries (prepared statements), which keep user input separate from the SQL text, is the most robust defence. In addition, input fields should be restricted to the absolute minimum length, typically anywhere from 7-12 characters, and all data should be validated, e.g., if a user inputs an age, make sure the input is an integer with a maximum of 3 digits.","title":"SQL Injection Attack Defenses"},{"location":"level101/security/threats_attacks_defences/#vpn","text":"A virtual private network (VPN) is a service that offers a secure, reliable connection over a shared public infrastructure such as the Internet. 
Cisco defines a VPN as an encrypted connection between private networks over a public network. To date, there are three types of VPNs: remote access, site-to-site, and firewall-based.","title":"VPN"},{"location":"level101/security/threats_attacks_defences/#security-breach","text":"In spite of the most aggressive steps to protect computers from attacks, attackers sometimes get through. Any event that results in a violation of any of the confidentiality, integrity, or availability (CIA) security tenets is a security breach.","title":"Security Breach"},{"location":"level101/security/threats_attacks_defences/#denial-of-service-attacks","text":"Denial of service (DoS) attacks result in downtime or inability of a user to access a system. DoS attacks impact the availability tenet of information systems security. A DoS attack is a coordinated attempt to deny service by occupying a computer with large amounts of unnecessary tasks. This excessive activity makes the system unavailable to perform legitimate operations. Two common types of DoS attacks are as follows: Logic attacks\u2014Logic attacks use software flaws to crash or seriously hinder the performance of remote servers. You can prevent many of these attacks by installing the latest patches to keep your software up to date. Flooding attacks\u2014Flooding attacks overwhelm the victim computer\u2019s CPU, memory, or network resources by sending large numbers of useless requests to the machine. Most DoS attacks target weaknesses in the overall system architecture rather than a software bug or security flaw. One popular technique for launching a packet flood is a SYN flood. One of the best defences against DoS attacks is to use intrusion prevention system (IPS) software or devices to detect and stop the attack.","title":"Denial of Service Attacks"},{"location":"level101/security/threats_attacks_defences/#distributed-denial-of-service-attacks","text":"DDoS attacks differ from regular DoS attacks in their scope. 
In a DDoS attack, attackers hijack hundreds or even thousands of Internet computers, planting automated attack agents on those systems. The attacker then instructs the agents to bombard the target site with forged messages. This overloads the site and blocks legitimate traffic. The key here is strength in numbers. The attacker does more damage by distributing the attack across multiple computers.","title":"Distributed Denial of Service Attacks"},{"location":"level101/security/threats_attacks_defences/#wiretapping","text":"Although the term wiretapping is generally associated with voice telephone communications, attackers can also use wiretapping to intercept data communications. Attackers can tap telephone lines and data communication lines. Wiretapping can be active, where the attacker makes modifications to the line. It can also be passive, where an unauthorized user simply listens to the transmission without changing the contents. Passive intrusion can include the copying of data for a subsequent active attack. Two methods of active wiretapping are as follows: Between-the-lines wiretapping\u2014This type of wiretapping does not alter the messages sent by the legitimate user but inserts additional messages into the communication line when the legitimate user pauses. Piggyback-entry wiretapping\u2014This type of wiretapping intercepts and modifies the original message by breaking the communications line and routing the message to another computer that acts as a host.","title":"Wiretapping"},{"location":"level101/security/threats_attacks_defences/#backdoors","text":"Software developers sometimes include hidden access methods, called backdoors, in their programs. Backdoors give developers or support personnel easy access to a system without having to struggle with security controls. The problem is that backdoors don\u2019t always stay hidden. 
When an attacker discovers a backdoor, he or she can use it to bypass existing security controls such as passwords, encryption, and so on. Where legitimate users log on through front doors using a user ID and password, attackers use backdoors to bypass these normal access controls.","title":"Backdoors"},{"location":"level101/security/threats_attacks_defences/#malicious-attacks","text":"","title":"Malicious Attacks"},{"location":"level101/security/threats_attacks_defences/#birthday-attack","text":"Once an attacker compromises a hashed password file, a birthday attack is performed. A birthday attack is a type of cryptographic attack that is used to make a brute-force attack of one-way hashes easier. It is a mathematical exploit that is based on the birthday problem in probability theory. Further Reading: https://www.sciencedirect.com/topics/computer-science/birthday-attack https://www.internetsecurity.tips/birthday-attack/","title":"Birthday Attack"},{"location":"level101/security/threats_attacks_defences/#brute-force-password-attacks","text":"In a brute-force password attack, the attacker tries different passwords on a system until one of them is successful. Usually, the attacker employs a software program to try all possible combinations of a likely password, user ID, or security code until it locates a match. This occurs rapidly and in sequence. This type of attack is called a brute-force password attack because the attacker simply hammers away at the code. There is no skill or stealth involved\u2014just brute force that eventually breaks the code. Further Reading: https://owasp.org/www-community/attacks/Brute_force_attack https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks","title":"Brute-Force Password Attacks"},{"location":"level101/security/threats_attacks_defences/#dictionary-password-attacks","text":"A dictionary password attack is a simple attack that relies on users making poor password choices. 
In a dictionary password attack, a simple password-cracker program takes all the words from a dictionary file and attempts to log on by entering each dictionary entry as a password. Further Reading: https://capec.mitre.org/data/definitions/16.html","title":"Dictionary Password Attacks"},{"location":"level101/security/threats_attacks_defences/#replay-attacks","text":"Replay attacks involve capturing data packets from a network and retransmitting them to produce an unauthorized effect. The receipt of duplicate, authenticated IP packets may disrupt service or have some other undesired consequence. Systems can be broken through replay attacks when attackers reuse old messages or parts of old messages to deceive system users. This helps intruders to gain information that allows unauthorized access into a system. Further reading: https://study.com/academy/lesson/replay-attack-definition-examples-prevention.html","title":"Replay Attacks"},{"location":"level101/security/threats_attacks_defences/#man-in-the-middle-attacks","text":"A man-in-the-middle attack takes advantage of the multihop process used by many types of networks. In this type of attack, an attacker intercepts messages between two parties before transferring them on to their intended destination. Web spoofing is a type of man-in-the-middle attack in which the user believes a secure session exists with a particular web server. In reality, the secure connection exists only with the attacker, not the webserver. The attacker then establishes a secure connection with the webserver, acting as an invisible go-between. The attacker passes traffic between the user and the webserver. In this way, the attacker can trick the user into supplying passwords, credit card information, and other private data. 
Further Reading: https://owasp.org/www-community/attacks/Man-in-the-middle_attack","title":"Man-in-the-Middle Attacks"},{"location":"level101/security/threats_attacks_defences/#masquerading","text":"In a masquerade attack, one user or computer pretends to be another user or computer. Masquerade attacks usually include one of the other forms of active attacks, such as IP address spoofing or replaying. Attackers can capture authentication sequences and then replay them later to log on again to an application or operating system. For example, an attacker might monitor usernames and passwords sent to a weak web application. The attacker could then use the intercepted credentials to log on to the web application and impersonate the user. Further Reading: https://dl.acm.org/doi/book/10.5555/2521792 https://ieeexplore.ieee.org/document/1653228","title":"Masquerading"},{"location":"level101/security/threats_attacks_defences/#eavesdropping","text":"Eavesdropping, or sniffing, occurs when a host sets its network interface to promiscuous mode and copies packets that pass by for later analysis. Promiscuous mode enables a network device to intercept and read each network packet it sees (given certain conditions), even if the packet\u2019s address doesn\u2019t match the network device. It is possible to attach hardware and software to monitor and analyze all packets on that segment of the transmission media without alerting any other users. Candidates for eavesdropping include satellite, wireless, mobile, and other transmission methods.","title":"Eavesdropping"},{"location":"level101/security/threats_attacks_defences/#social-engineering","text":"Attackers often use a deception technique called social engineering to gain access to resources in an IT infrastructure. In nearly all cases, social engineering involves tricking authorized users into carrying out actions for unauthorized users. 
The success of social engineering attacks depends on the basic tendency of people to want to be helpful.","title":"Social Engineering"},{"location":"level101/security/threats_attacks_defences/#phreaking","text":"Phone phreaking, or simply phreaking, is a slang term that describes the activity of a subculture of people who study, experiment with, or explore telephone systems, telephone company equipment, and systems connected to public telephone networks. Phreaking is the art of exploiting bugs and glitches that exist in the telephone system.","title":"Phreaking"},{"location":"level101/security/threats_attacks_defences/#phishing","text":"Phishing is a type of fraud in which an attacker attempts to trick the victim into providing private information such as credit card numbers, passwords, dates of birth, bank account numbers, automated teller machine (ATM) PINs, and Social Security numbers.","title":"Phishing"},{"location":"level101/security/threats_attacks_defences/#pharming","text":"Pharming is another type of attack that seeks to obtain personal or private financial information through domain spoofing. A pharming attack doesn\u2019t use messages to trick victims into visiting spoofed websites that appear legitimate, however. Instead, pharming \u201cpoisons\u201d a domain name on the domain name server (DNS), a process known as DNS poisoning. The result is that when a user enters the poisoned server\u2019s web address into his or her address bar, that user navigates to the attacker\u2019s site. The user\u2019s browser still shows the correct website, which makes pharming difficult to detect\u2014and therefore more serious. 
Where phishing attempts to scam people one at a time with an email or instant message, pharming enables scammers to target large groups of people at one time through domain spoofing.","title":"Pharming"},{"location":"level101/security/writing_secure_code/","text":"PART IV: Writing Secure Code & More The first and most important step in reducing security and reliability issues is to educate developers. However, even the best-trained engineers make mistakes, security experts can write insecure code and SREs can miss reliability issues. It\u2019s difficult to keep the many considerations and tradeoffs involved in building secure and reliable systems in mind simultaneously, especially if you\u2019re also responsible for producing software. Use frameworks to enforce security and reliability while writing code A better approach is to handle security and reliability in common frameworks, languages, and libraries. Ideally, libraries only expose an interface that makes writing code with common classes of security vulnerabilities impossible. Multiple applications can use each library or framework. When domain experts fix an issue, they remove it from all the applications the framework supports, allowing this engineering approach to scale better. Common Security Vulnerabilities In large codebases, a handful of classes account for the majority of security vulnerabilities, despite ongoing efforts to educate developers and introduce code review. OWASP and SANS publish lists of common vulnerability classes Write Simple Code Try to keep your code clean and simple. Avoid Multi-Level Nesting Multilevel nesting is a common anti-pattern that can lead to simple mistakes. If the error is in the most common code path, it will likely be captured by the unit tests. However, unit tests don\u2019t always check error handling paths in multilevel nested code. 
The error might result in decreased reliability (for example, if the service crashes when it mishandles an error) or a security vulnerability (like a mishandled authorization check error). Eliminate YAGNI Smells Sometimes developers overengineer solutions by adding functionality that may be useful in the future, \u201cjust in case.\u201d This goes against the YAGNI (You Aren\u2019t Gonna Need It) principle, which recommends implementing only the code that you need. YAGNI code adds unnecessary complexity because it needs to be documented, tested, and maintained. To summarize, avoiding YAGNI code leads to improved reliability, and simpler code leads to fewer security bugs, fewer opportunities to make mistakes, and less developer time spent maintaining unused code. Repay Technical Debt It is a common practice for developers to mark places that require further attention with TODO or FIXME annotations. In the short term, this habit can accelerate the delivery velocity for the most critical functionality, and allow a team to meet early deadlines\u2014but it also incurs technical debt. Still, it\u2019s not necessarily a bad practice, as long as you have a clear process (and allocate time) for repaying such debt. Refactoring Refactoring is the most effective way to keep a codebase clean and simple. Even a healthy codebase occasionally needs to be refactored. Regardless of the reasons behind refactoring, you should always follow one golden rule: never mix refactoring and functional changes in a single commit to the code repository. Refactoring changes are typically significant and can be difficult to understand. If a commit also includes functional changes, there\u2019s a higher risk that an author or reviewer might overlook bugs. Unit Testing Unit testing can increase system security and reliability by pinpointing a wide range of bugs in individual software components before a release. 
This technique involves breaking software components into smaller, self-contained \u201cunits\u201d that have no external dependencies, and then testing each unit. Fuzz Testing Fuzz testing is a technique that complements the previously mentioned testing techniques. Fuzzing involves using a fuzzing engine to generate a large number of candidate inputs that are then passed through a fuzz driver to the fuzz target. The fuzzer then analyzes how the system handles the input. Complex inputs handled by all kinds of software are popular targets for fuzzing - for example, file parsers, compression algorithms, network protocol implementation and audio codec. Integration Testing Integration testing moves beyond individual units and abstractions, replacing fake or stubbed-out implementations of abstractions like databases or network services with real implementations. As a result, integration tests exercise more complete code paths. Because you must initialize and configure these other dependencies, integration testing may be slower and flakier than unit testing\u2014to execute the test, this approach incorporates real-world variables like network latency as services communicate end-to-end. As you move from testing individual low-level units of code to testing how they interact when composed together, the net result is a higher degree of confidence that the system is behaving as expected. Last But not the least Code Reviews Rely on Automation Don\u2019t check in Secrets Verifiable Builds","title":"Writing Secure code"},{"location":"level101/security/writing_secure_code/#part-iv-writing-secure-code-more","text":"The first and most important step in reducing security and reliability issues is to educate developers. However, even the best-trained engineers make mistakes, security experts can write insecure code and SREs can miss reliability issues. 
It\u2019s difficult to keep the many considerations and tradeoffs involved in building secure and reliable systems in mind simultaneously, especially if you\u2019re also responsible for producing software.","title":"PART IV: Writing Secure Code & More"},{"location":"level101/security/writing_secure_code/#use-frameworks-to-enforce-security-and-reliability-while-writing-code","text":"A better approach is to handle security and reliability in common frameworks, languages, and libraries. Ideally, libraries only expose an interface that makes writing code with common classes of security vulnerabilities impossible. Multiple applications can use each library or framework. When domain experts fix an issue, they remove it from all the applications the framework supports, allowing this engineering approach to scale better.","title":"Use frameworks to enforce security and reliability while writing code"},{"location":"level101/security/writing_secure_code/#common-security-vulnerabilities","text":"In large codebases, a handful of classes account for the majority of security vulnerabilities, despite ongoing efforts to educate developers and introduce code review. OWASP and SANS publish lists of common vulnerability classes","title":"Common Security Vulnerabilities"},{"location":"level101/security/writing_secure_code/#write-simple-code","text":"Try to keep your code clean and simple.","title":"Write Simple Code"},{"location":"level101/security/writing_secure_code/#avoid-multi-level-nesting","text":"Multilevel nesting is a common anti-pattern that can lead to simple mistakes. If the error is in the most common code path, it will likely be captured by the unit tests. However, unit tests don\u2019t always check error handling paths in multilevel nested code. 
The error might result in decreased reliability (for example, if the service crashes when it mishandles an error) or a security vulnerability (like a mishandled authorization check error).","title":"Avoid Multi-Level Nesting"},{"location":"level101/security/writing_secure_code/#eliminate-yagni-smells","text":"Sometimes developers overengineer solutions by adding functionality that may be useful in the future, \u201cjust in case.\u201d This goes against the YAGNI (You Aren\u2019t Gonna Need It) principle, which recommends implementing only the code that you need. YAGNI code adds unnecessary complexity because it needs to be documented, tested, and maintained. To summarize, avoiding YAGNI code leads to improved reliability, and simpler code leads to fewer security bugs, fewer opportunities to make mistakes, and less developer time spent maintaining unused code.","title":"Eliminate YAGNI Smells"},{"location":"level101/security/writing_secure_code/#repay-technical-debt","text":"It is a common practice for developers to mark places that require further attention with TODO or FIXME annotations. In the short term, this habit can accelerate the delivery velocity for the most critical functionality, and allow a team to meet early deadlines\u2014but it also incurs technical debt. Still, it\u2019s not necessarily a bad practice, as long as you have a clear process (and allocate time) for repaying such debt.","title":"Repay Technical Debt"},{"location":"level101/security/writing_secure_code/#refactoring","text":"Refactoring is the most effective way to keep a codebase clean and simple. Even a healthy codebase occasionally needs to be refactored. Regardless of the reasons behind refactoring, you should always follow one golden rule: never mix refactoring and functional changes in a single commit to the code repository. Refactoring changes are typically significant and can be difficult to understand. 
If a commit also includes functional changes, there\u2019s a higher risk that an author or reviewer might overlook bugs.","title":"Refactoring"},{"location":"level101/security/writing_secure_code/#unit-testing","text":"Unit testing can increase system security and reliability by pinpointing a wide range of bugs in individual software components before a release. This technique involves breaking software components into smaller, self-contained \u201cunits\u201d that have no external dependencies, and then testing each unit.","title":"Unit Testing"},{"location":"level101/security/writing_secure_code/#fuzz-testing","text":"Fuzz testing is a technique that complements the previously mentioned testing techniques. Fuzzing involves using a fuzzing engine to generate a large number of candidate inputs that are then passed through a fuzz driver to the fuzz target. The fuzzer then analyzes how the system handles the input. Complex inputs handled by all kinds of software are popular targets for fuzzing - for example, file parsers, compression algorithms, network protocol implementation and audio codec.","title":"Fuzz Testing"},{"location":"level101/security/writing_secure_code/#integration-testing","text":"Integration testing moves beyond individual units and abstractions, replacing fake or stubbed-out implementations of abstractions like databases or network services with real implementations. As a result, integration tests exercise more complete code paths. Because you must initialize and configure these other dependencies, integration testing may be slower and flakier than unit testing\u2014to execute the test, this approach incorporates real-world variables like network latency as services communicate end-to-end. 
As you move from testing individual low-level units of code to testing how they interact when composed together, the net result is a higher degree of confidence that the system is behaving as expected.","title":"Integration Testing"},{"location":"level101/security/writing_secure_code/#last-but-not-the-least","text":"Code Reviews Rely on Automation Don\u2019t check in Secrets Verifiable Builds","title":"Last But not the least"},{"location":"level101/systems_design/availability/","text":"HA - Availability - Common \u201cNines\u201d Availability is generally expressed as \u201cNines\u201d; the common \u2018Nines\u2019 are listed below. Availability % Downtime per year Downtime per month Downtime per week Downtime per day 99%(Two Nines) 3.65 days 7.31 hours 1.68 hours 14.40 minutes 99.5%(Two and a half Nines) 1.83 days 3.65 hours 50.40 minutes 7.20 minutes 99.9%(Three Nines) 8.77 hours 43.83 minutes 10.08 minutes 1.44 minutes 99.95%(Three and a half Nines) 4.38 hours 21.92 minutes 5.04 minutes 43.20 seconds 99.99%(Four Nines) 52.60 minutes 4.38 minutes 1.01 minutes 8.64 seconds 99.995%(Four and a half Nines) 26.30 minutes 2.19 minutes 30.24 seconds 4.32 seconds 99.999%(Five Nines) 5.26 minutes 26.30 seconds 6.05 seconds 864.0 ms Refer https://en.wikipedia.org/wiki/High_availability#Percentage_calculation HA - Availability Serial Components Components of a system operate in series if the failure of one part makes the whole combination inoperable. For example, if LB in our architecture fails, all access to app tiers will fail. LB and app tiers are connected serially. The combined availability of the system is the product of the availabilities of the individual components: A = Ax x Ay x \u2026.. Refer http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm HA - Availability Parallel Components Components of a system operate in parallel if, when one part fails, another part takes over the operations of the failed part. 
If we have more than one LB and the rest of the LBs can take over the traffic during one LB failure, then the LBs are operating in parallel. The combined availability of the system is A = 1 - ( (1-Ax) x (1-Ay) x \u2026.. ) Refer http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm HA - Core Principles Elimination of single points of failure (SPOF) This means adding redundancy to the system so that the failure of a component does not mean failure of the entire system. Reliable crossover In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover. Detection of failures as they occur If the two principles above are observed, then a user may never see a failure Refer https://en.wikipedia.org/wiki/High_availability#Principles HA - SPOF WHAT: Never implement and always eliminate single points of failure. WHEN TO USE: During architecture reviews and new designs. HOW TO USE: Identify single instances on architectural diagrams. Strive for active/active configurations. At the very least we should have a standby to take control when active instances fail. WHY: Maximize availability through multiple instances. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions. Use load balancers to balance traffic across instances of a service. Use control services with active/passive instances for patterns that require singletons. HA - Reliable Crossover WHAT: Ensure that when system components fail over, they do so reliably. WHEN TO USE: During architecture reviews, failure modeling, and designs. HOW TO USE: Identify how available a system is during the crossover and ensure it is within acceptable limits. WHY: Maximize availability and ensure data handling semantics are preserved. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions, as they carry a lower risk of an unreliable crossover. 
Use LB and the right load balancing methods to ensure reliable failover. Model and build your data systems to ensure data is correctly handled when crossover happens. Generally, DB systems follow active/passive semantics for writes. Masters accept writes, and when the master goes down, the follower is promoted to master (going from passive to active) to accept writes. We have to be careful here that the cutover never introduces more than one master. This problem is called a split brain. Applications in SRE role SREs work on deciding an acceptable SLA and make sure the system is available enough to achieve the SLA. SREs are involved in architecture design right from building the data center, to make sure the site is not affected by network switch, hardware, power, or software failures. SREs also run mock drills of failures to see how the system behaves in uncharted territory and come up with a plan to improve availability if there are misses. https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear After applying our understanding of HA, our architecture diagram looks something like the one below","title":"Availability"},{"location":"level101/systems_design/availability/#ha-availability-common-nines","text":"Availability is generally expressed as \u201cNines\u201d; the common \u2018Nines\u2019 are listed below. 
Availability % Downtime per year Downtime per month Downtime per week Downtime per day 99%(Two Nines) 3.65 days 7.31 hours 1.68 hours 14.40 minutes 99.5%(Two and a half Nines) 1.83 days 3.65 hours 50.40 minutes 7.20 minutes 99.9%(Three Nines) 8.77 hours 43.83 minutes 10.08 minutes 1.44 minutes 99.95%(Three and a half Nines) 4.38 hours 21.92 minutes 5.04 minutes 43.20 seconds 99.99%(Four Nines) 52.60 minutes 4.38 minutes 1.01 minutes 8.64 seconds 99.995%(Four and a half Nines) 26.30 minutes 2.19 minutes 30.24 seconds 4.32 seconds 99.999%(Five Nines) 5.26 minutes 26.30 seconds 6.05 seconds 864.0 ms","title":"HA - Availability - Common \u201cNines\u201d"},{"location":"level101/systems_design/availability/#refer","text":"https://en.wikipedia.org/wiki/High_availability#Percentage_calculation","title":"Refer"},{"location":"level101/systems_design/availability/#ha-availability-serial-components","text":"Components of a system operate in series if the failure of one part makes the whole combination inoperable. For example, if LB in our architecture fails, all access to app tiers will fail. LB and app tiers are connected serially. The combined availability of the system is the product of the availabilities of the individual components: A = Ax x Ay x \u2026..","title":"HA - Availability Serial Components"},{"location":"level101/systems_design/availability/#refer_1","text":"http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm","title":"Refer"},{"location":"level101/systems_design/availability/#ha-availability-parallel-components","text":"Components of a system operate in parallel if, when one part fails, another part takes over the operations of the failed part. If we have more than one LB and the rest of the LBs can take over the traffic during one LB failure, then the LBs are operating in parallel. The combined availability of the system is A = 1 - ( (1-Ax) x (1-Ay) x \u2026.. 
)","title":"HA - Availability Parallel Components"},{"location":"level101/systems_design/availability/#refer_2","text":"http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm","title":"Refer"},{"location":"level101/systems_design/availability/#ha-core-principles","text":"Elimination of single points of failure (SPOF) This means adding redundancy to the system so that the failure of a component does not mean failure of the entire system. Reliable crossover In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover. Detection of failures as they occur If the two principles above are observed, then a user may never see a failure","title":"HA - Core Principles"},{"location":"level101/systems_design/availability/#refer_3","text":"https://en.wikipedia.org/wiki/High_availability#Principles","title":"Refer"},{"location":"level101/systems_design/availability/#ha-spof","text":"WHAT: Never implement and always eliminate single points of failure. WHEN TO USE: During architecture reviews and new designs. HOW TO USE: Identify single instances on architectural diagrams. Strive for active/active configurations. At the very least we should have a standby to take control when active instances fail. WHY: Maximize availability through multiple instances. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions. Use load balancers to balance traffic across instances of a service. Use control services with active/passive instances for patterns that require singletons.","title":"HA - SPOF"},{"location":"level101/systems_design/availability/#ha-reliable-crossover","text":"WHAT: Ensure when system components failover they do so reliably. WHEN TO USE: During architecture reviews, failure modeling, and designs. HOW TO USE: Identify how available a system is during the crossover and ensure it is within acceptable limits. 
WHY: Maximize availability and ensure data handling semantics are preserved. KEY TAKEAWAYS: Strive for active/active rather than active/passive solutions, as they carry a lower risk of an unreliable crossover. Use LB and the right load balancing methods to ensure reliable failover. Model and build your data systems to ensure data is correctly handled when crossover happens. Generally, DB systems follow active/passive semantics for writes. Masters accept writes, and when the master goes down, the follower is promoted to master (going from passive to active) to accept writes. We have to be careful here that the cutover never introduces more than one master. This problem is called a split brain.","title":"HA - Reliable Crossover"},{"location":"level101/systems_design/availability/#applications-in-sre-role","text":"SREs work on deciding an acceptable SLA and make sure the system is available enough to achieve the SLA. SREs are involved in architecture design right from building the data center, to make sure the site is not affected by network switch, hardware, power, or software failures. SREs also run mock drills of failures to see how the system behaves in uncharted territory and come up with a plan to improve availability if there are misses. https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear After applying our understanding of HA, our architecture diagram looks something like the one below","title":"Applications in SRE role"},{"location":"level101/systems_design/conclusion/","text":"Conclusion Armed with these principles, we hope the course will give a fresh perspective to design software systems. It might be over-engineering to get all this on day zero. But some are really important from day 0 like eliminating single points of failure, making scalable services by just increasing replicas. As a bottleneck is reached, we can split code by services, shard data to scale. 
As the organization matures, bringing in chaos engineering to measure how systems react to failure will help in designing robust software systems.","title":"Conclusion"},{"location":"level101/systems_design/conclusion/#conclusion","text":"Armed with these principles, we hope the course will give a fresh perspective to design software systems. It might be over-engineering to get all this on day zero. But some are really important from day 0 like eliminating single points of failure, making scalable services by just increasing replicas. As a bottleneck is reached, we can split code by services, shard data to scale. As the organization matures, bringing in chaos engineering to measure how systems react to failure will help in designing robust software systems.","title":"Conclusion"},{"location":"level101/systems_design/fault-tolerance/","text":"Fault Tolerance Failures are not avoidable in any system and will happen all the time, hence we need to build systems that can tolerate failures or recover from them. In systems, failure is the norm rather than the exception. \"Anything that can go wrong will go wrong\u201d -- Murphy\u2019s Law \u201cComplex systems contain changing mixtures of failures latent within them\u201d -- How Complex Systems Fail. Fault Tolerance - Failure Metrics Common failure metrics that get measured and tracked for any system. Mean time to repair (MTTR): The average time to repair and restore a failed system. Mean time between failures (MTBF): The average operational time between one device failure or system breakdown and the next. Mean time to failure (MTTF): The average time a device or system is expected to function before it fails. Mean time to detect (MTTD): The average time between the onset of a problem and when the organization detects it. Mean time to investigate (MTTI): The average time between the detection of an incident and when the organization begins to investigate its cause and solution. 
Mean time to restore service (MTRS): The average elapsed time from the detection of an incident until the affected system or component is again available to users. Mean time between system incidents (MTBSI): The average elapsed time between the detection of two consecutive incidents. MTBSI can be calculated by adding MTBF and MTRS (MTBSI = MTBF + MTRS). Failure rate: Another reliability metric, which measures the frequency with which a component or system fails. It is expressed as a number of failures over a unit of time. Refer https://www.splunk.com/en_us/data-insider/what-is-mean-time-to-repair.html Fault Tolerance - Fault Isolation Terms Systems should have a short circuit. Say in our content sharing system, if \u201cNotifications\u201d is not working, the site should gracefully handle that failure by removing the functionality instead of taking the whole site down. Swimlane is one of the commonly used fault isolation methodologies. Swimlane adds a barrier to the service from other services so that failure on either of them won\u2019t affect the other. Say we roll out a new feature \u2018Advertisement\u2019 in our content sharing app. We can have two architectures If Ads are generated on the fly synchronously during each Newsfeed request, the faults in the Ads feature get propagated to the Newsfeed feature. Instead if we swimlane the \u201cGeneration of Ads\u201d service and use a shared storage to populate Newsfeed App, Ads failures won\u2019t cascade to Newsfeed, and worst case if Ads don\u2019t meet SLA , we can have Newsfeed without Ads. Let's take another example, we have come up with a new model for our Content sharing App. Here we roll out an enterprise content sharing App where enterprises pay for the service and the content should never be shared outside the enterprise. Swimlane Principles Principle 1: Nothing is shared (also known as \u201cshare as little as possible\u201d). 
The less that is shared within a swim lane, the more fault isolative the swim lane becomes. (as shown in Enterprise use-case) Principle 2: Nothing crosses a swim lane boundary. Synchronous (defined by expecting a request\u2014not the transfer protocol) communication never crosses a swim lane boundary; if it does, the boundary is drawn incorrectly. (as shown in Ads feature) Swimlane Approaches Approach 1: Swim lane the money-maker. Never allow your cash register to be compromised by other systems. (Tier 1 vs Tier 2 in enterprise use case) Approach 2: Swim lane the biggest sources of incidents. Identify the recurring causes of pain and isolate them. (if Ads feature is in code yellow, swim laning it is the best option) Approach 3: Swim lane natural barriers. Customer boundaries make good swim lanes. (Public vs Enterprise customers) Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch21.html#ch21 Applications in SRE role Work with the DC tech or cloud team to distribute infrastructure such that it is immune to switch or power failures by creating fault zones within a Data Center https://docs.microsoft.com/en-us/azure/virtual-machines/manage-availability#use-availability-zones-to-protect-from-datacenter-level-failures Work with the partners and design interaction between services such that one service breakdown is not amplified in a cascading fashion to all upstreams","title":"Fault Tolerance"},{"location":"level101/systems_design/fault-tolerance/#fault-tolerance","text":"Failures are not avoidable in any system and will happen all the time, hence we need to build systems that can tolerate failures or recover from them. In systems, failure is the norm rather than the exception. 
\"Anything that can go wrong will go wrong\u201d -- Murphy\u2019s Law \u201cComplex systems contain changing mixtures of failures latent within them\u201d -- How Complex Systems Fail.","title":"Fault Tolerance"},{"location":"level101/systems_design/fault-tolerance/#fault-tolerance-failure-metrics","text":"Common failure metrics that get measured and tracked for any system. Mean time to repair (MTTR): The average time to repair and restore a failed system. Mean time between failures (MTBF): The average operational time between one device failure or system breakdown and the next. Mean time to failure (MTTF): The average time a device or system is expected to function before it fails. Mean time to detect (MTTD): The average time between the onset of a problem and when the organization detects it. Mean time to investigate (MTTI): The average time between the detection of an incident and when the organization begins to investigate its cause and solution. Mean time to restore service (MTRS): The average elapsed time from the detection of an incident until the affected system or component is again available to users. Mean time between system incidents (MTBSI): The average elapsed time between the detection of two consecutive incidents. MTBSI can be calculated by adding MTBF and MTRS (MTBSI = MTBF + MTRS). Failure rate: Another reliability metric, which measures the frequency with which a component or system fails. It is expressed as a number of failures over a unit of time.","title":"Fault Tolerance - Failure Metrics"},{"location":"level101/systems_design/fault-tolerance/#refer","text":"https://www.splunk.com/en_us/data-insider/what-is-mean-time-to-repair.html","title":"Refer"},{"location":"level101/systems_design/fault-tolerance/#fault-tolerance-fault-isolation-terms","text":"Systems should have a short circuit. 
Say in our content sharing system, if \u201cNotifications\u201d is not working, the site should gracefully handle that failure by removing the functionality instead of taking the whole site down. Swimlane is one of the commonly used fault isolation methodologies. Swimlane adds a barrier to the service from other services so that failure on either of them won\u2019t affect the other. Say we roll out a new feature \u2018Advertisement\u2019 in our content sharing app. We can have two architectures If Ads are generated on the fly synchronously during each Newsfeed request, the faults in the Ads feature get propagated to the Newsfeed feature. Instead if we swimlane the \u201cGeneration of Ads\u201d service and use a shared storage to populate Newsfeed App, Ads failures won\u2019t cascade to Newsfeed, and worst case if Ads don\u2019t meet SLA , we can have Newsfeed without Ads. Let's take another example, we have come up with a new model for our Content sharing App. Here we roll out an enterprise content sharing App where enterprises pay for the service and the content should never be shared outside the enterprise.","title":"Fault Tolerance - Fault Isolation Terms"},{"location":"level101/systems_design/fault-tolerance/#swimlane-principles","text":"Principle 1: Nothing is shared (also known as \u201cshare as little as possible\u201d). The less that is shared within a swim lane, the more fault isolative the swim lane becomes. (as shown in Enterprise use-case) Principle 2: Nothing crosses a swim lane boundary. Synchronous (defined by expecting a request\u2014not the transfer protocol) communication never crosses a swim lane boundary; if it does, the boundary is drawn incorrectly. (as shown in Ads feature)","title":"Swimlane Principles"},{"location":"level101/systems_design/fault-tolerance/#swimlane-approaches","text":"Approach 1: Swim lane the money-maker. Never allow your cash register to be compromised by other systems. 
(Tier 1 vs Tier 2 in enterprise use case) Approach 2: Swim lane the biggest sources of incidents. Identify the recurring causes of pain and isolate them. (if Ads feature is in code yellow, swim laning it is the best option) Approach 3: Swim lane natural barriers. Customer boundaries make good swim lanes. (Public vs Enterprise customers)","title":"Swimlane Approaches"},{"location":"level101/systems_design/fault-tolerance/#refer_1","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch21.html#ch21","title":"Refer"},{"location":"level101/systems_design/fault-tolerance/#applications-in-sre-role","text":"Work with the DC tech or cloud team to distribute infrastructure such that it is immune to switch or power failures by creating fault zones within a Data Center https://docs.microsoft.com/en-us/azure/virtual-machines/manage-availability#use-availability-zones-to-protect-from-datacenter-level-failures Work with the partners and design interaction between services such that one service breakdown is not amplified in a cascading fashion to all upstreams","title":"Applications in SRE role"},{"location":"level101/systems_design/intro/","text":"Systems Design Prerequisites Fundamentals of common software system components: Linux Basics Linux Networking Databases RDBMS NoSQL Concepts What to expect from this course Thinking about and designing for scalability, availability, and reliability of large scale software systems. What is not covered under this course Scalability and reliability concerns of individual software components such as databases. While the same scalability principles and thinking can be applied, these individual components have their own specific nuances when scaling them and thinking about their reliability. 
More light will be shed on concepts rather than on setting up and configuring components like load balancers to achieve scalability, availability, and reliability of systems Course Contents Introduction Scalability High Availability Fault Tolerance Introduction \u201cSo, how do you go about learning to design a system?\u201d \u201cLike most great questions, it showed a level of naivety that was breathtaking. The only short answer I could give was, essentially, that you learned how to design a system by designing systems and finding out what works and what doesn\u2019t work.\u201d Jim Waldo, Sun Microsystems, On System Design As software and hardware systems have multiple moving parts, we need to think about how those parts will grow, their failure modes, their inter-dependencies, how it will impact the users and the business. There is no one-shot method or way to learn or do system design, we only learn to design systems by designing and iterating on them. This course will be a starter to make one think about scalability, availability, and fault tolerance during systems design. Backstory Let\u2019s design a simple content sharing application where users can share photos, media in our application which can be liked by their friends. 
Let\u2019s start with a simple design of the application and evolve it as we learn system design concepts","title":"Introduction"},{"location":"level101/systems_design/intro/#systems-design","text":"","title":"Systems Design"},{"location":"level101/systems_design/intro/#prerequisites","text":"Fundamentals of common software system components: Linux Basics Linux Networking Databases RDBMS NoSQL Concepts","title":"Prerequisites"},{"location":"level101/systems_design/intro/#what-to-expect-from-this-course","text":"Thinking about and designing for scalability, availability, and reliability of large scale software systems.","title":"What to expect from this course"},{"location":"level101/systems_design/intro/#what-is-not-covered-under-this-course","text":"Scalability and reliability concerns of individual software components such as databases. While the same scalability principles and thinking can be applied, these individual components have their own specific nuances when scaling them and thinking about their reliability. More light will be shed on concepts rather than on setting up and configuring components like load balancers to achieve scalability, availability, and reliability of systems","title":"What is not covered under this course"},{"location":"level101/systems_design/intro/#course-contents","text":"Introduction Scalability High Availability Fault Tolerance","title":"Course Contents"},{"location":"level101/systems_design/intro/#introduction","text":"\u201cSo, how do you go about learning to design a system?\u201d \u201cLike most great questions, it showed a level of naivety that was breathtaking. 
The only short answer I could give was, essentially, that you learned how to design a system by designing systems and finding out what works and what doesn\u2019t work.\u201d Jim Waldo, Sun Microsystems, On System Design As software and hardware systems have multiple moving parts, we need to think about how those parts will grow, their failure modes, their inter-dependencies, how it will impact the users and the business. There is no one-shot method or way to learn or do system design, we only learn to design systems by designing and iterating on them. This course will be a starter to make one think about scalability, availability, and fault tolerance during systems design.","title":"Introduction"},{"location":"level101/systems_design/intro/#backstory","text":"Let\u2019s design a simple content sharing application where users can share photos, media in our application which can be liked by their friends. Let\u2019s start with a simple design of the application and evolve it as we learn system design concepts","title":"Backstory"},{"location":"level101/systems_design/scalability/","text":"Scalability What does scalability mean for a system/service? A system is composed of services/components, each service/component scalability needs to be tackled separately, and the scalability of the system as a whole. A service is said to be scalable if, as resources are added to the system, it results in increased performance in a manner proportional to resources added An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance Refer https://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html Scalability - AKF Scale Cube The Scale Cube is a model for segmenting services, defining microservices, and scaling products. It also creates a common language for teams to discuss scale related options in designing solutions. 
The following section talks about certain scaling patterns based on our inferences from the AKF cube Scalability - Horizontal scaling Horizontal scaling stands for cloning of an application or service such that work can easily be distributed across instances with absolutely no bias. Let's see how our monolithic application improves with this principle Here DB is scaled separately from the application. This is to let you know each component\u2019s scaling capabilities can be different. Usually, web applications can be scaled by adding resources unless there is state stored inside the application. But DBs can be scaled only for Reads by adding more followers but Writes have to go to only one leader to make sure data is consistent. There are some DBs that support multi-leader writes but we are keeping them out of scope at this point. Apps should be able to differentiate between Reads and Writes to choose appropriate DB servers. Load balancers can split traffic between identical servers transparently. WHAT: Duplication of services or databases to spread transaction load. WHEN TO USE: Databases with a very high read-to-write ratio (5:1 or greater\u2014the higher the better). Because only read replicas of DBs can be scaled, not the Leader. HOW TO USE: Simply clone services and implement a load balancer. For databases, ensure that the accessing code understands the difference between a read and a write. WHY: Allows for the fast scale of transactions at the cost of duplicated data and functionality. KEY TAKEAWAYS: This is fast to implement, is a low cost from a developer effort perspective, and can scale transaction volumes nicely. However, they tend to be high cost from the perspective of the operational cost of data. The cost here means if we have 3 followers and 1 Leader DB, the same database will be stored as 4 copies in the 4 servers. 
Hence added storage cost Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html Scalability Pattern - Load Balancing Improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. A commonly used technique is load balancing traffic across identical server clusters. A similar philosophy is used to load balance traffic across network links by ECMP , disk drives by RAID ,etc Aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. In our updated architecture diagram we have 4 servers to handle app traffic instead of a single server The device or system that performs load balancing is called a load balancer, abbreviated as LB. Refer https://en.wikipedia.org/wiki/Load_balancing_(computing) https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236 https://learning.oreilly.com/library/view/load-balancing-in/9781492038009/ https://learning.oreilly.com/library/view/practical-load-balancing/9781430236801/ http://shop.oreilly.com/product/9780596000509.do Scalability Pattern - LB Tasks What does an LB do? Service discovery: What backends are available in the system? In our architecture, 4 servers are available to serve App traffic. LB acts as a single endpoint that clients can use transparently to reach one of the 4 servers. Health checking: What backends are currently healthy and available to accept requests? If one out of the 4 App servers turns bad, LB should automatically short circuit the path so that clients don\u2019t sense any application downtime Load balancing: What algorithm should be used to balance individual requests across the healthy backends? 
There are many algorithms to distribute traffic across one of the four servers. Based on observations/experience, SRE can pick the algorithm that suits their pattern Scalability Pattern - LB Methods Common Load Balancing Methods Least Connection Method directs traffic to the server with the fewest active connections. Most useful when there are a large number of persistent connections in the traffic unevenly distributed between the servers. Works if clients maintain long-lived connections Least Response Time Method directs traffic to the server with the fewest active connections and the lowest average response time. Here response time is used to provide feedback of the server\u2019s health Round Robin Method rotates servers by directing traffic to the first available server and then moves that server to the bottom of the queue. Most useful when servers are of equal specification and there are not many persistent connections. IP Hash the IP address of the client determines which server receives the request. This can sometimes cause skewness in distribution but is useful if apps store some state locally and need some stickiness More advanced client/server-side example techniques - https://docs.nginx.com/nginx/admin-guide/load-balancer/ - http://cbonte.github.io/haproxy-dconv/2.2/intro.html#3.3.5 - https://twitter.github.io/finagle/guide/Clients.html#load-balancing Scalability Pattern - Caching - Content Delivery Networks (CDN) CDNs are added closer to the client\u2019s location. If the app has static data like images, Javascript, CSS which don\u2019t change very often, they can be cached. Since our example is a content sharing site, static content can be cached in CDNs with a suitable expiry. WHAT: Use CDNs (content delivery networks) to offload traffic from your site. WHEN TO USE: When speed improvements and scale warrant the additional cost. HOW TO USE: Most CDNs leverage DNS to serve content on your site\u2019s behalf. 
Thus you may need to make minor DNS changes or additions and move content to be served from new subdomains. Eg media-exp1.licdn.com is a domain used by Linkedin to serve static content Here a CNAME points the domain to the DNS of the CDN provider dig media-exp1.licdn.com +short 2-01-2c3e-005c.cdx.cedexis.net. WHY: CDNs help offload traffic spikes and are often economical ways to scale parts of a site\u2019s traffic. They also often substantially improve page download times. KEY TAKEAWAYS: CDNs are a fast and simple way to offset the spikiness of traffic as well as traffic growth in general. Make sure you perform a cost-benefit analysis and monitor the CDN usage. If CDNs have a lot of cache misses, then we don\u2019t gain much from CDN and are still serving requests using our compute resources. Scalability - Microservices This pattern represents the separation of work by service or function within the application. Microservices are meant to address the issues associated with growth and complexity in the code base and data sets. The intent is to create fault isolation as well as to reduce response times. Microservices can scale transactions, data sizes, and codebase sizes. They are most effective in scaling the size and complexity of your codebase. They tend to cost a bit more than horizontal scaling because the engineering team needs to rewrite services or, at the very least, disaggregate them from the original monolithic application. WHAT: Sometimes referred to as scale through services or resources, this rule focuses on scaling by splitting data sets, transactions, and engineering teams along verb (services) or noun (resources) boundaries. WHEN TO USE: Very large data sets where relations between data are not necessary. Large, complex systems where scaling engineering resources requires specialization. HOW TO USE: Split up actions by using verbs, or resources by using nouns, or use a mix. 
Split both the services and the data along the lines defined by the verb/noun approach. WHY: Allows for efficient scaling of not only transactions but also very large data sets associated with those transactions. It also allows for the efficient scaling of teams. KEY TAKEAWAYS: Microservices allow for efficient scaling of transactions, large data sets, and can help with fault isolation. It helps reduce the communication overhead of teams. The codebase becomes less complex as disjoint features are decoupled and spun as new services thereby letting each service scale independently specific to its requirement. Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html Scalability - Sharding This pattern represents the separation of work based on attributes that are looked up to or determined at the time of the transaction. Most often, these are implemented as splits by requestor, customer, or client. Very often, a lookup service or deterministic algorithm will need to be written for these types of splits. Sharding aids in scaling transaction growth, scaling instruction sets, and decreasing processing time (the last by limiting the data necessary to perform any transaction). This is more effective at scaling growth in customers or clients. It can aid with disaster recovery efforts, and limit the impact of incidents to only a specific segment of customers. Here the auth data is sharded based on user names so that DBs can respond faster as the amount of data DBs have to work on has drastically reduced during queries. There can be other ways to split Here the whole data center is split and replicated and clients are directed to a data center based on their geography. This helps in improving performance as clients are directed to the closest data center and performance increases as we add more data centers. There are some replication and consistency overhead with this approach one needs to be aware of. 
This also gives fault tolerance by rolling out test features to one site and rollback if there is an impact to that geography WHAT: This is very often a split by some unique aspect of the customer such as customer ID, name, geography, and so on. WHEN TO USE: Very large, similar data sets such as large and rapidly growing customer bases or when the response time for a geographically distributed customer base is important. HOW TO USE: Identify something you know about the customer, such as customer ID, last name, geography, or device, and split or partition both data and services based on that attribute. WHY: Rapid customer growth exceeds other forms of data growth, or you have the need to perform fault isolation between certain customer groups as you scale. KEY TAKEAWAYS: Shards are effective at helping you to scale customer bases but can also be applied to other very large data sets that can\u2019t be pulled apart using the microservices methodology. Refer https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html Applications in SRE role SREs in coordination with the network team work on how to map users' traffic to a particular site. https://engineering.linkedin.com/blog/2017/05/trafficshift--load-testing-at-scale SREs work closely with the Dev team to split monoliths to multiple microservices that are easy to run and manage SREs work on improving Load Balancers' reliability, service discovery, and performance SREs work closely to split Data into shards and manage data integrity and consistency. https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store SREs work to set up, configure, and improve the CDN cache hit rate.","title":"Scalability"},{"location":"level101/systems_design/scalability/#scalability","text":"What does scalability mean for a system/service? 
A system is composed of services/components, each service/component scalability needs to be tackled separately, and the scalability of the system as a whole. A service is said to be scalable if, as resources are added to the system, it results in increased performance in a manner proportional to resources added An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance","title":"Scalability"},{"location":"level101/systems_design/scalability/#refer","text":"https://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-akf-scale-cube","text":"The Scale Cube is a model for segmenting services, defining microservices, and scaling products. It also creates a common language for teams to discuss scale related options in designing solutions. The following section talks about certain scaling patterns based on our inferences from the AKF cube","title":"Scalability - AKF Scale Cube"},{"location":"level101/systems_design/scalability/#scalability-horizontal-scaling","text":"Horizontal scaling stands for cloning of an application or service such that work can easily be distributed across instances with absolutely no bias. Let's see how our monolithic application improves with this principle Here DB is scaled separately from the application. This is to let you know each component\u2019s scaling capabilities can be different. Usually, web applications can be scaled by adding resources unless there is state stored inside the application. But DBs can be scaled only for Reads by adding more followers but Writes have to go to only one leader to make sure data is consistent. There are some DBs that support multi-leader writes but we are keeping them out of scope at this point. Apps should be able to differentiate between Reads and Writes to choose appropriate DB servers. 
Load balancers can split traffic between identical servers transparently. WHAT: Duplication of services or databases to spread transaction load. WHEN TO USE: Databases with a very high read-to-write ratio (5:1 or greater\u2014the higher the better). Because only read replicas of DBs can be scaled, not the Leader. HOW TO USE: Simply clone services and implement a load balancer. For databases, ensure that the accessing code understands the difference between a read and a write. WHY: Allows for the fast scale of transactions at the cost of duplicated data and functionality. KEY TAKEAWAYS: This is fast to implement, is a low cost from a developer effort perspective, and can scale transaction volumes nicely. However, they tend to be high cost from the perspective of the operational cost of data. The cost here means if we have 3 followers and 1 Leader DB, the same database will be stored as 4 copies in the 4 servers. Hence added storage cost","title":"Scalability - Horizontal scaling"},{"location":"level101/systems_design/scalability/#refer_1","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-pattern-load-balancing","text":"Improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. A commonly used technique is load balancing traffic across identical server clusters. A similar philosophy is used to load balance traffic across network links by ECMP , disk drives by RAID ,etc Aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. 
In our updated architecture diagram we have 4 servers to handle app traffic instead of a single server. The device or system that performs load balancing is called a load balancer, abbreviated as LB.","title":"Scalability Pattern - Load Balancing"},{"location":"level101/systems_design/scalability/#refer_2","text":"https://en.wikipedia.org/wiki/Load_balancing_(computing) https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236 https://learning.oreilly.com/library/view/load-balancing-in/9781492038009/ https://learning.oreilly.com/library/view/practical-load-balancing/9781430236801/ http://shop.oreilly.com/product/9780596000509.do","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-pattern-lb-tasks","text":"What does an LB do?","title":"Scalability Pattern - LB Tasks"},{"location":"level101/systems_design/scalability/#service-discovery","text":"What backends are available in the system? In our architecture, 4 servers are available to serve App traffic. The LB acts as a single endpoint that clients can use transparently to reach one of the 4 servers.","title":"Service discovery:"},{"location":"level101/systems_design/scalability/#health-checking","text":"What backends are currently healthy and available to accept requests? If one of the 4 App servers goes bad, the LB should automatically short circuit the path so that clients don\u2019t sense any application downtime.","title":"Health checking:"},{"location":"level101/systems_design/scalability/#load-balancing","text":"What algorithm should be used to balance individual requests across the healthy backends? There are many algorithms to distribute traffic across the four servers. 
Based on observations/experience, SREs can pick the algorithm that suits their traffic pattern","title":"Load balancing:"},{"location":"level101/systems_design/scalability/#scalability-pattern-lb-methods","text":"Common Load Balancing Methods","title":"Scalability Pattern - LB Methods"},{"location":"level101/systems_design/scalability/#least-connection-method","text":"directs traffic to the server with the fewest active connections. Most useful when there are a large number of persistent connections in the traffic unevenly distributed between the servers. Works if clients maintain long-lived connections","title":"Least Connection Method"},{"location":"level101/systems_design/scalability/#least-response-time-method","text":"directs traffic to the server with the fewest active connections and the lowest average response time. Here response time is used to provide feedback on the server\u2019s health","title":"Least Response Time Method"},{"location":"level101/systems_design/scalability/#round-robin-method","text":"rotates servers by directing traffic to the first available server and then moves that server to the bottom of the queue. Most useful when servers are of equal specification and there are not many persistent connections.","title":"Round Robin Method"},{"location":"level101/systems_design/scalability/#ip-hash","text":"the IP address of the client determines which server receives the request. This can sometimes skew the distribution but is useful if apps store some state locally and need some stickiness. More advanced client/server-side example techniques - https://docs.nginx.com/nginx/admin-guide/load-balancer/ - http://cbonte.github.io/haproxy-dconv/2.2/intro.html#3.3.5 - https://twitter.github.io/finagle/guide/Clients.html#load-balancing","title":"IP Hash"},{"location":"level101/systems_design/scalability/#scalability-pattern-caching-content-delivery-networks-cdn","text":"CDNs are added closer to the client\u2019s location. 
If the app has static data like images, Javascript, CSS which don\u2019t change very often, they can be cached. Since our example is a content sharing site, static content can be cached in CDNs with a suitable expiry. WHAT: Use CDNs (content delivery networks) to offload traffic from your site. WHEN TO USE: When speed improvements and scale warrant the additional cost. HOW TO USE: Most CDNs leverage DNS to serve content on your site\u2019s behalf. Thus you may need to make minor DNS changes or additions and move content to be served from new subdomains. Eg media-exp1.licdn.com is a domain used by Linkedin to serve static content Here a CNAME points the domain to the DNS of the CDN provider dig media-exp1.licdn.com +short 2-01-2c3e-005c.cdx.cedexis.net. WHY: CDNs help offload traffic spikes and are often economical ways to scale parts of a site\u2019s traffic. They also often substantially improve page download times. KEY TAKEAWAYS: CDNs are a fast and simple way to offset the spikiness of traffic as well as traffic growth in general. Make sure you perform a cost-benefit analysis and monitor the CDN usage. If CDNs have a lot of cache misses, then we don\u2019t gain much from CDN and are still serving requests using our compute resources.","title":"Scalability Pattern - Caching - Content Delivery Networks (CDN)"},{"location":"level101/systems_design/scalability/#scalability-microservices","text":"This pattern represents the separation of work by service or function within the application. Microservices are meant to address the issues associated with growth and complexity in the code base and data sets. The intent is to create fault isolation as well as to reduce response times. Microservices can scale transactions, data sizes, and codebase sizes. They are most effective in scaling the size and complexity of your codebase. 
They tend to cost a bit more than horizontal scaling because the engineering team needs to rewrite services or, at the very least, disaggregate them from the original monolithic application. WHAT: Sometimes referred to as scale through services or resources, this rule focuses on scaling by splitting data sets, transactions, and engineering teams along verb (services) or noun (resources) boundaries. WHEN TO USE: Very large data sets where relations between data are not necessary. Large, complex systems where scaling engineering resources requires specialization. HOW TO USE: Split up actions by using verbs, or resources by using nouns, or use a mix. Split both the services and the data along the lines defined by the verb/noun approach. WHY: Allows for efficient scaling of not only transactions but also very large data sets associated with those transactions. It also allows for the efficient scaling of teams. KEY TAKEAWAYS: Microservices allow for efficient scaling of transactions, large data sets, and can help with fault isolation. It helps reduce the communication overhead of teams. The codebase becomes less complex as disjoint features are decoupled and spun as new services thereby letting each service scale independently specific to its requirement.","title":"Scalability - Microservices"},{"location":"level101/systems_design/scalability/#refer_3","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html","title":"Refer"},{"location":"level101/systems_design/scalability/#scalability-sharding","text":"This pattern represents the separation of work based on attributes that are looked up to or determined at the time of the transaction. Most often, these are implemented as splits by requestor, customer, or client. Very often, a lookup service or deterministic algorithm will need to be written for these types of splits. 
Sharding aids in scaling transaction growth, scaling instruction sets, and decreasing processing time (the last by limiting the data necessary to perform any transaction). This is more effective at scaling growth in customers or clients. It can aid with disaster recovery efforts, and limit the impact of incidents to only a specific segment of customers. Here the auth data is sharded based on user names so that DBs can respond faster as the amount of data DBs have to work on has drastically reduced during queries. There can be other ways to split Here the whole data center is split and replicated and clients are directed to a data center based on their geography. This helps in improving performance as clients are directed to the closest data center and performance increases as we add more data centers. There are some replication and consistency overhead with this approach one needs to be aware of. This also gives fault tolerance by rolling out test features to one site and rollback if there is an impact to that geography WHAT: This is very often a split by some unique aspect of the customer such as customer ID, name, geography, and so on. WHEN TO USE: Very large, similar data sets such as large and rapidly growing customer bases or when the response time for a geographically distributed customer base is important. HOW TO USE: Identify something you know about the customer, such as customer ID, last name, geography, or device, and split or partition both data and services based on that attribute. WHY: Rapid customer growth exceeds other forms of data growth, or you have the need to perform fault isolation between certain customer groups as you scale. 
KEY TAKEAWAYS: Shards are effective at helping you to scale customer bases but can also be applied to other very large data sets that can\u2019t be pulled apart using the microservices methodology.","title":"Scalability - Sharding"},{"location":"level101/systems_design/scalability/#refer_4","text":"https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch23.html","title":"Refer"},{"location":"level101/systems_design/scalability/#applications-in-sre-role","text":"SREs in coordination with the network team work on how to map users' traffic to a particular site. https://engineering.linkedin.com/blog/2017/05/trafficshift--load-testing-at-scale SREs work closely with the Dev team to split monoliths to multiple microservices that are easy to run and manage SREs work on improving Load Balancers' reliability, service discovery, and performance SREs work closely to split Data into shards and manage data integrity and consistency. https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store SREs work to set up, configure, and improve the CDN cache hit rate.","title":"Applications in SRE role"},{"location":"level102/containerization_and_orchestration/conclusion/","text":"Conclusion In this sub-module we have toured the world of containers starting from why we use containers, how containers evolved from the virtual machine past (though they are, in no means, obsolete) and how they are different from virtual machines. We then saw how containers are implemented with emphasis on cgroups and namespaces along with some hands-on exercises. Finally we concluded our journey with container orchestration where we learnt a bit of Kubernetes with some practical examples. 
Hope this module gives you enough knowledge and interest to continue learning and applying these technologies in greater depth!","title":"Conclusion"},{"location":"level102/containerization_and_orchestration/conclusion/#conclusion","text":"In this sub-module we have toured the world of containers starting from why we use containers, how containers evolved from the virtual machine past (though they are, in no means, obsolete) and how they are different from virtual machines. We then saw how containers are implemented with emphasis on cgroups and namespaces along with some hands-on exercises. Finally we concluded our journey with container orchestration where we learnt a bit of Kubernetes with some practical examples. Hope this module gives you enough knowledge and interest to continue learning and applying these technologies in greater depth!","title":"Conclusion"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/","text":"Introduction Docker has gained huge popularity among other container engines since it was released to the public in 2013. Here are some of the reasons why Docker is so popular: Improved portability Docker containers can be shipped and run across environments be it local machine, on-prem or cloud instances in the form of Docker images. Compared to docker containers, LXC containers have more machine specifications. - Lighter weight Docker images are light weight compared to VM images. For example, an Ubuntu 18.04 VM size is about 3GB whereas the docker image is 45MB! Versioning of container images Docker supports maintaining multiple versions of images which makes it easier to look up the history of an image and even rollback. Reuse of images Since Docker images are in the form of layers, one image can be used as base on top of which new images are built. For example, Alpine is a light weight image (5MB) which is commonly used as a base image. Docker layers are managed using storage drivers . 
Community support Docker hub is a container registry where anyone logged in can upload or download a container image. Docker images of popular OS distros are regularly updated in docker hub and receive large community support. Let\u2019s look at some terms which come up during our discussion of Docker. Docker terminology Docker images Docker image contains the executable version of the application along with the dependencies (config files, libraries, binaries) required for the application to run as a standalone container. It can be understood as a snapshot of a container. Docker images are present as layers on top of the base layer. These layers are the ones that are versioned. The most recent version of layer is the one that is used on top of the base image. docker image ls lists the images present in the host machine. Docker containers Docker container is the running instance of the docker image. While images are static, containers created from the images can be executed into and interacted with. This is actually the \u201ccontainer\u201d from the previous sections of the module. docker run is the command used to instantiate containers from images. docker ps lists docker containers currently running in the host machine. Docker file It is a plain text file of instructions based on which an image is assembled by docker engine (daemon, to be precise). It contains information on base image, ENV variables to be injected. docker build is used to build images from dockerfile. Docker hub It is Docker\u2019s official container registry of images. Any user with a docker login can upload custom images to Docker hub using docker push and fetch images using docker pull . Having known the basic terminologies let\u2019s look at how docker engine works; how CLI commands are interpreted and container life-cycle is managed. Components of Docker engine Let\u2019s start with the diagram of Docker Engine to understand better: The docker engine follows a client-server architecture. 
It consists of 3 components: Docker client This is the component the user directly interacts with. When you execute docker commands which we saw earlier (push, pull, container ls, image ls) , we are actually using the docker client. A single docker client can communicate with multiple docker daemons. REST API Provides an interface for the docker client and daemon to communicate. Docker Daemon (server) This is the main component of the docker engine. It builds images from dockerfile, fetches images from docker registry, pushes images to the registry, stops, starts containers etc. It also manages networking between containers. LAB The official docker github provides labs at several levels for learning Docker. We're linking one of the labs which we found great for people beginning from scratch. Please follow the labs in this order: Setting up local environment for the labs Basics for using docker CLI Creating and containerizing a basic Flask app Here is another beginner level lab for dockerizing a MERN (Mongo + React + Express) application and it\u2019s easy to follow along. Advanced features of Docker While we have covered the basics of containerization and how a standalone application can be dockerized, processes in the real world need to communicate with each other. This need is particularly prevalent in applications which follow a microservice architecture. Docker networks Docker networks facilitate the interaction between containers running on the same hosts or even different hosts. There are several options provided through docker network command which specifies how the container interacts with the host and with other containers. 
The host option allows sharing of network stack with the host, bridge allows communication between containers running on the same host but not external to the host, overlay facilitates interaction between containers across hosts attached to the same network and macvlan which assigns a separate MAC address to a container for legacy containers are some important types of networks supported by Docker. This however is outside the scope of this module. The official documentation on docker networks itself is a good place to start. Volumes Apart from images, containers and networks, Docker also provides the option to create and mount volumes within containers. Generally, data within docker containers is non-persistent i.e once you kill the container the data is lost. Volumes are used for storing persistent data in containers. This Docker lab is a great place to start playing with volumes. In the next section we see how container deployments are orchestrated with Kubernetes.","title":"Containerization With Docker"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#introduction","text":"Docker has gained huge popularity among other container engines since it was released to the public in 2013. Here are some of the reasons why Docker is so popular: Improved portability Docker containers can be shipped and run across environments be it local machine, on-prem or cloud instances in the form of Docker images. Compared to docker containers, LXC containers have more machine specifications. - Lighter weight Docker images are light weight compared to VM images. For example, an Ubuntu 18.04 VM size is about 3GB whereas the docker image is 45MB! Versioning of container images Docker supports maintaining multiple versions of images which makes it easier to look up the history of an image and even rollback. Reuse of images Since Docker images are in the form of layers, one image can be used as base on top of which new images are built. 
For example, Alpine is a light weight image (5MB) which is commonly used as a base image. Docker layers are managed using storage drivers . Community support Docker hub is a container registry where anyone logged in can upload or download a container image. Docker images of popular OS distros are regularly updated in docker hub and receive large community support. Let\u2019s look at some terms which come up during our discussion of Docker.","title":"Introduction"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#docker-terminology","text":"Docker images Docker image contains the executable version of the application along with the dependencies (config files, libraries, binaries) required for the application to run as a standalone container. It can be understood as a snapshot of a container. Docker images are present as layers on top of the base layer. These layers are the ones that are versioned. The most recent version of layer is the one that is used on top of the base image. docker image ls lists the images present in the host machine. Docker containers Docker container is the running instance of the docker image. While images are static, containers created from the images can be executed into and interacted with. This is actually the \u201ccontainer\u201d from the previous sections of the module. docker run is the command used to instantiate containers from images. docker ps lists docker containers currently running in the host machine. Docker file It is a plain text file of instructions based on which an image is assembled by docker engine (daemon, to be precise). It contains information on base image, ENV variables to be injected. docker build is used to build images from dockerfile. Docker hub It is Docker\u2019s official container registry of images. Any user with a docker login can upload custom images to Docker hub using docker push and fetch images using docker pull . 
Having known the basic terminologies let\u2019s look at how docker engine works; how CLI commands are interpreted and container life-cycle is managed.","title":"Docker terminology"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#components-of-docker-engine","text":"Let\u2019s start with the diagram of Docker Engine to understand better: The docker engine follows a client-server architecture. It consists of 3 components: Docker client This is the component the user directly interacts with. When you execute docker commands which we saw earlier (push, pull, container ls, image ls) , we are actually using the docker client. A single docker client can communicate with multiple docker daemons. REST API Provides an interface for the docker client and daemon to communicate. Docker Daemon (server) This is the main component of the docker engine. It builds images from dockerfile, fetches images from docker registry, pushes images to the registry, stops, starts containers etc. It also manages networking between containers.","title":"Components of Docker engine"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#lab","text":"The official docker github provides labs at several levels for learning Docker. We're linking one of the labs which we found great for people beginning from scratch. Please follow the labs in this order: Setting up local environment for the labs Basics for using docker CLI Creating and containerizing a basic Flask app Here is another beginner level lab for dockerizing a MERN (Mongo + React + Express) application and it\u2019s easy to follow along.","title":"LAB"},{"location":"level102/containerization_and_orchestration/containerization_with_docker/#advanced-features-of-docker","text":"While we have covered the basics of containerization and how a standalone application can be dockerized, processes in the real world need to communicate with each other. 
This need is particularly prevalent in applications which follow a microservice architecture. Docker networks Docker networks facilitate the interaction between containers running on the same hosts or even different hosts. There are several options provided through docker network command which specifies how the container interacts with the host and with other containers. The host option allows sharing of network stack with the host, bridge allows communication between containers running on the same host but not external to the host, overlay facilitates interaction between containers across hosts attached to the same network and macvlan which assigns a separate MAC address to a container for legacy containers are some important types of networks supported by Docker. This however is outside the scope of this module. The official documentation on docker networks itself is a good place to start. Volumes Apart from images, containers and networks, Docker also provides the option to create and mount volumes within containers. Generally, data within docker containers is non-persistent i.e once you kill the container the data is lost. Volumes are used for storing persistent data in containers. This Docker lab is a great place to start playing with volumes. In the next section we see how container deployments are orchestrated with Kubernetes.","title":"Advanced features of Docker"},{"location":"level102/containerization_and_orchestration/intro/","text":"Containers and orchestration Introduction Containers, Docker and Kubernetes are \"cool\" terms that are being spoken of by everyone involved with software in some way. Let's dive into each of these pieces of technology at enough depth to understand what the whole deal is about! 
In this module we talk about the ins and outs of containers: the internals and usage of containers; how they are implemented, how to containerize your application and finally, how to deploy containerized applications on a large scale without losing your sleep. We'll also get our hands dirty by trying out a few lab exercises. Prerequisites Basic knowledge of linux will be helpful in understanding the internals of containers Basic knowledge of shell commands (will come in handy when we're containerizing applications) Knowledge of running a basic web application. You can go through our Python And Web module to gain familiarity with this. What to expect from this course This module is divided into 3 sub-modules. In the first sub module, we will cover the internals of containers and what they\u2019re used for. The second sub-module introduces Docker, a popular container engine and contains lab exercises on dockerizing a basic webapp. The last module talks about container orchestration with Kubernetes and some lab exercises to show how it makes the lives of SREs easy. What is not covered under this course We will not cover advanced docker and kubernetes concepts. However, we will be leading you to links and references from where you can pick them up as per your interest. 
Course Contents The following topics have been covered in this course: Introduction to containers What are containers Why containers Difference between virtual machines and containers How are containers implemented Namespaces Cgroups Container engines Containerization with Docker Introduction Basic docker terminology Components of Docker engine Hands-on Introduction to Advanced Docker Container orchestration with Kubernetes Introduction Motivation to use Kubernetes Kubernetes Architecture Hands-on Introduction to Advanced Kubernetes concepts Conclusion","title":"Introduction"},{"location":"level102/containerization_and_orchestration/intro/#containers-and-orchestration","text":"","title":"Containers and orchestration"},{"location":"level102/containerization_and_orchestration/intro/#introduction","text":"Containers, Docker and Kubernetes are \"cool\" terms that are being spoken of by everyone involved with software in some way. Let's dive into each of these pieces of technology at enough depth to understand what the whole deal is about! In this module we talk about the ins and outs of containers: the internals and usage of containers; how they are implemented, how to containerize your application and finally, how to deploy containerized applications on a large scale without losing your sleep. We'll also get our hands dirty by trying out a few lab exercises.","title":"Introduction"},{"location":"level102/containerization_and_orchestration/intro/#prerequisites","text":"Basic knowledge of linux will be helpful in understanding the internals of containers Basic knowledge of shell commands (will come in handy when we're containerizing applications) Knowledge of running a basic web application. You can go through our Python And Web module to gain familiarity with this.","title":"Prerequisites"},{"location":"level102/containerization_and_orchestration/intro/#what-to-expect-from-this-course","text":"This module is divided into 3 sub-modules. 
In the first sub module, we will cover the internals of containers and what they\u2019re used for. The second sub-module introduces Docker, a popular container engine and contains lab exercises on dockerizing a basic webapp. The last module talks about container orchestration with Kubernetes and some lab exercises to show how it makes the lives of SREs easy.","title":"What to expect from this course"},{"location":"level102/containerization_and_orchestration/intro/#what-is-not-covered-under-this-course","text":"We will not cover advanced docker and kubernetes concepts. However, we will be leading you to links and references from where you can pick them up as per your interest.","title":"What is not covered under this course"},{"location":"level102/containerization_and_orchestration/intro/#course-contents","text":"The following topics have been covered in this course: Introduction to containers What are containers Why containers Difference between virtual machines and containers How are containers implemented Namespaces Cgroups Container engines Containerization with Docker Introduction Basic docker terminology Components of Docker engine Hands-on Introduction to Advanced Docker Container orchestration with Kubernetes Introduction Motivation to use Kubernetes Kubernetes Architecture Hands-on Introduction to Advanced Kubernetes concepts Conclusion","title":"Course Contents"},{"location":"level102/containerization_and_orchestration/intro_to_containers/","text":"What are containers Here's a popular definition of containers according to Docker , a popular containerization engine : A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another Let's break this down. A container is your code bundled along with its entire runtime environment. That includes your system libraries, binaries and config files needed for your application to run. 
Why containers You might wonder why we need to pack your application along with its dependencies. This is where the second part of the definition comes, ...so the application runs quickly and reliably from one computing environment to another. Developers usually write code in their dev environment (or local machine), test it in one or two staging/test environments before pushing their code into production. Ideally, for reliably testing applications before pushing to production, we need all these environments to be uniform to a tee (underlying OS, system libraries etc). Of course, the ideal is hard to achieve especially when we're using a mix of on-prem (complete control) and cloud infrastructure providers (more restrictive in terms of control of hardware and security options), a scenario which is more common today. This is exactly why we need to package not only the code but also the dependencies; so that your application runs reliably irrespective of which infrastructure or environment it runs on. We can run several containers on a single host. Due to how containers are implemented, each container has its own isolated environment within the same host. This means that a monolithic application can be broken down into micro-services and packaged into containers. Each microservice runs in the host machine in isolated environments. This is another reason why containers are used: separation of concerns . Providing isolated environments does not let the failure of one application in one container affect the other. This is called fault isolation . Isolation also gives the added benefit of increased security due to restricted visibility of processes in a container. Due to how most of the containerization solutions are implemented, we also have the option to cap the amount of resources consumed by applications running within a container. This is called resource limiting . We will discuss this feature in more detail in the section on cgroups. 
Difference between virtual machines and containers Let's digress a little and go into some history. In the previous section we talked about how containers help us in achieving separation of concerns. Before the wide-spread usage of containers, virtualization was used for running applications in isolated environments in the same host (it\u2019s still being used today in some cases). In plain terms, virtualization is where we package software along with a copy of the OS on which it runs. This package is called a virtual machine (VM). The image of the OS bundled in the VM is called Guest OS. A component called Hypervisor sits between the Guest and the Host OS and is responsible for facilitating the access of the underlying OS\u2019s hardware to the Guest OS. You can learn more about hypervisors here . Similar to how multiple containers can be run in a single host machine, multiple VMs can be run on a single host and in this way, it\u2019s possible to run applications (or each microservice) in a separate VM and achieve separation of concerns. The main focus here is on the size of the VMs and containers. VMs come along with a copy of the guest operating system and therefore are heavy-weight compared to containers. If you\u2019re more interested in comparison of VMs and containers, you can check these articles from Backblaze and NetApp . While it is possible to run an operating system on a host with an incompatible kernel using hypervisors (e.g Windows 10 VM on CentOS 7), in cases where kernels can be shared (e.g Ubuntu on CentOS 7) containers are preferred over VMs due to the size factor. Sharing kernels, as you will see later, also gives containers many performance benefits over VMs like quicker boot-ups. Let\u2019s look at the diagram of how containers work. Comparing the two diagrams, we notice two things: Containers do not have a separate (guest) OS Container engine is the intermediary between containers and Host OS. 
It is used to facilitate the life-cycle of a container on the Host OS (it is not a necessity, however). The next section explains in detail how containers share the same operating system (kernel, to be precise) as the host machine and yet provide isolated environments for applications to run. How are containers implemented We\u2019ve talked about how containers, unlike virtual machines, share the same kernel as the host operating system and provide isolated environments for applications to run. This is achieved without the overhead of running a guest operating system on the host OS, thanks to two features of linux kernel called cgroups and kernel namespaces. Now that we are touching upon the internals of containers, it would be appropriate to give a more technically accurate representation of what they are. A container is a linux process or a group of linux processes which is restricted in - visibility into processes outside the container (implemented using namespace) - quantity of resources it can use (implemented using cgroups) and - system calls that can be made from the container. Refer seccomp , if interested in knowing more. These restrictions are what make a containerized application remain isolated from other processes running in the same host. Now let\u2019s talk about namespaces and cgroup in a little more detail. Namespaces Visibility of processes inside a container should be restricted within itself. This is what linux namespaces do. The idea is that processes within a namespace can\u2019t affect those which it can\u2019t \u201csee\u201d. Processes sharing a single namespace have identities, service and/or interfaces unique to the namespace they exist in. Here\u2019s a list of namespaces in linux: Mount Process groups sharing a mount namespace share a separate, private set of mount points and file system view. Any modifications made to these namespaced mount points are not visible outside the namespace. 
For example it is possible to have a /var within a mount namespace which is different from /var in the host. PID Processes in a pid namespace have process ids which are unique only within the namespace. A process can be a root process (pid 1) in its own pid namespace and have an entire tree of processes under it. Network Each network namespace will have its own network device instances that can be configured with individual network addresses. Processes in the same network namespace can have their own ports and route tables. User User namespaces can have their own users and group ids. It\u2019s possible for a process using a non-privileged user in the host machine to have a root user identity within a user namespace. Cgroup Allows creation of cgroups which can be used only within the cgroup namespace. Cgroups will be covered in more detail in the following section. UTS This namespace has its own hostname and domain name. IPC Each IPC namespace has its own System V and POSIX message queues. As complex as it seems, creating namespaces in linux is quite simple. Let\u2019s see a quick demo to create a PID namespace. You\u2019ll need a linux based OS with sudoers permission to follow along. DEMO: namespaces First we check which processes are running in the host system (output varies from system to system). Note the process with pid 1. Let\u2019s create a PID namespace with the unshare command and create a bash process in the namespace You can see that ps aux (which itself is a process launched in the PID namespace so created) can only see processes within its own namespace. Hence, the output shows only 2 processes running within the namespace. Also note, the root process (pid 1) in the namespace is not init but it is the bash shell which we specified while creating the namespace. Let\u2019s create another process in the same namespace which sleeps for 1000 seconds in the background. In my case the pid of the sleep process is 44 within the PID namespace . 
On a separate terminal, check for the process id of the sleep process as seen from the host. Note the difference in pid (23844 in the host and 44 within the namespace) though both refer to the same process (start time and all other attributes are same). It\u2019s also possible to nest namespaces i.e create a pid namespace from another pid namespace. Try out sudo nsenter -t 23844 --pid -r bash to reenter the namespace and create another pid namespace within it. It should be fun to do! Cgroups A cgroup can be defined as a set of processes whose usage of resources is metered and monitored. The resources can be memory pages, disk i/o, CPU etc. In fact, cgroups are classified based on which resource the limit is imposed on and nature of action taken when a limit is violated. The component in the cgroup which tracks resource utilization and controls the behaviour of processes in a cgroup is called resource-subsystem or resource controller. Following is the set of resource controllers and their function according to RHEL\u2019s introduction to cgroups : blkio \u2014 this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, or USB). cpu \u2014 this subsystem uses the scheduler to provide cgroup processes access to the CPU. cpuacct \u2014 this subsystem generates automatic reports on CPU resources used by processes in a cgroup. cpuset \u2014 this subsystem assigns individual CPUs (on a multicore system) and memory nodes to processes in a cgroup. devices \u2014 this subsystem allows or denies access to devices by processes in a cgroup. freezer \u2014 this subsystem suspends or resumes processes in a cgroup. memory \u2014 this subsystem sets limits on memory use by processes in a cgroup and generates automatic reports on memory resources used by those processes. Cgroups follow a hierarchical, tree-like structure for each resource controller i.e one cgroup exists for each controller. 
Each cgroup in a hierarchy inherits certain attributes (e.g limits) from its parent cgroup. Let\u2019s try out a quick demo with memory cgroups to wrap our heads around the above ideas. You\u2019ll need a linux based OS (here, RedHat) with sudo permission to follow along. DEMO: cgroups Let\u2019s start by checking if cgroup tools are installed in your machine. Execute mount | grep \"^cgroup\" . If you have the tools installed you\u2019ll see an output like this: If not, install the tools with sudo yum install libcgroup-tools -y . Now, we create a memory cgroup called mem_group with \u201croot\u201d as the owner of the cgroup. Command executed sudo cgcreate -a root -g memory:mem_group . Verify that cgroup is created. /sys/fs/cgroup/ is the pseudo filesystem where a newly created cgroup is added as a sub-group. Memory cgroup puts a limit on the memory usage of processes in the cgroup. Let\u2019s see what the limits are for mem_group. The file for checking the memory limit is memory.limit_in_bytes ( more information here , if you\u2019re interested). Note that mem_group has inherited the limit from its parent cgroup. Now, let\u2019s reduce the memory usage limit to 20KB for the purpose of our demo (the actual limit is rounded off to the nearest power of 2). This limit is too low and hence most of the processes attached to mem_group should be OOM killed. Create a new shell and attach it to the cgroup. We need sudo permissions for this. The process is OOM killed as expected. You can confirm the same with dmesg logs (mm_fault_error). If you want to try out a more in-depth exercise on cgroups, check out this tutorial from Geeks for Geeks . Let\u2019s come back to containers again. Containers share the same kernel as the underlying host operating system and provide an isolated environment of the application within. 
Cgroups help in managing resources used by processes within a container and namespaces help isolate network stack, pids, users, group ids and mount points in a container from another container running on the same host. Of course, there are more components to containers which truly make it fully functional but that discussion is out of scope of this module. Container engine Container engines ease the process of creating and managing containers in a host machine. How? The container creation workflow typically begins with a container image. A container image is a packaged, portable version of the target application bundled with all dependencies for it to run. These container images are either available on the host machine (container host) from previous builds or need to be pulled from a remote repository of images. Sometimes the container engine might need to build the container image from a set of instructions. Finally once the container image is fetched/built, the container engine unpacks the image and creates an isolated environment for the application as per the image specifications. The files in the container image are then mounted to the isolated environment to get the application up and running within the container. There are several container engines available like Docker, RKT, LXC (one of the first container engines) which require different image formats (Docker, LXD). OCI (Open Container Initiative) is a collaborative project started by Docker that aims to standardize container runtime specifications and image formats across vendors. OCI FAQ section is a good place to start if you\u2019re curious about this project. 
We will focus on Docker in the next section .","title":"Introduction To Containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#what-are-containers","text":"Here's a popular definition of containers according to Docker , a popular containerization engine : A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another Let's break this down. A container is your code bundled along with its entire runtime environment. That includes your system libraries, binaries and config files needed for your application to run.","title":"What are containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#why-containers","text":"You might wonder why we need to pack your application along with its dependencies. This is where the second part of the definition comes, ...so the application runs quickly and reliably from one computing environment to another. Developers usually write code in their dev environment (or local machine), test it in one or two staging/test environments before pushing their code into production. Ideally, for reliably testing applications before pushing to production, we need all these environments to be uniform to a tee (underlying OS, system libraries etc). Of course, the ideal is hard to achieve especially when we're using a mix of on-prem (complete control) and cloud infrastructure providers (more restrictive in terms of control of hardware and security options), a scenario which is more common today. This is exactly why we need to package not only the code but also the dependencies; so that your application runs reliably irrespective of which infrastructure or environment it runs on. We can run several containers on a single host. Due to how containers are implemented, each container has its own isolated environment within the same host. 
This means that a monolithic application can be broken down into micro-services and packaged into containers. Each microservice runs in the host machine in isolated environments. This is another reason why containers are used: separation of concerns . Providing isolated environments does not let the failure of one application in one container affect the other. This is called fault isolation . Isolation also gives the added benefit of increased security due to restricted visibility of processes in a container. Due to how most of the containerization solutions are implemented, we also have the option to cap the amount of resources consumed by applications running within a container. This is called resource limiting . We will discuss this feature in more detail in the section on cgroups.","title":"Why containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#difference-between-virtual-machines-and-containers","text":"Let's digress a little and go into some history. In the previous section we talked about how containers help us in achieving separation of concerns. Before the widespread usage of containers, virtualization was used for running applications in isolated environments in the same host (it\u2019s still being used today in some cases). In plain terms, virtualization is where we package software along with a copy of the OS on which it runs. This package is called a virtual machine (VM). The image of the OS bundled in the VM is called Guest OS. A component called Hypervisor sits between the Guest and the Host OS and is responsible for facilitating the access of the underlying OS\u2019s hardware to the Guest OS. You can learn more about hypervisors here . Similar to how multiple containers can be run in a single host machine, multiple VMs can be run on a single host and in this way, it\u2019s possible to run applications (or each microservice) in a separate VM and achieve separation of concerns. 
The main focus here is on the size of the VMs and containers. VMs come along with a copy of the guest operating system and therefore are heavy-weight compared to containers. If you\u2019re more interested in comparison of VMs and containers, you can check these articles from Backblaze and NetApp . While it is possible to run an operating system on a host with an incompatible kernel using hypervisors (e.g Windows 10 VM on CentOS 7), in cases where kernels can be shared (e.g Ubuntu on CentOS 7) containers are preferred over VMs due to the size factor. Sharing kernels, as you will see later, also gives containers many performance benefits over VMs like quicker boot-ups. Let\u2019s look at the diagram of how containers work. Comparing the two diagrams, we notice two things: Containers do not have a separate (guest) OS Container engine is the intermediary between containers and Host OS. It is used to facilitate the life-cycle of a container on the Host OS (it is not a necessity, however). The next section explains in detail how containers share the same operating system (kernel, to be precise) as the host machine and yet provide isolated environments for applications to run.","title":"Difference between virtual machines and containers"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#how-are-containers-implemented","text":"We\u2019ve talked about how containers, unlike virtual machines, share the same kernel as the host operating system and provide isolated environments for applications to run. This is achieved without the overhead of running a guest operating system on the host OS, thanks to two features of linux kernel called cgroups and kernel namespaces. Now that we are touching upon the internals of containers, it would be appropriate to give a more technically accurate representation of what they are. 
A container is a linux process or a group of linux processes which is restricted in - visibility into processes outside the container (implemented using namespace) - quantity of resources it can use (implemented using cgroups) and - system calls that can be made from the container. Refer to seccomp , if interested in knowing more. These restrictions are what make a containerized application remain isolated from other processes running in the same host. Now let\u2019s talk about namespaces and cgroup in a little more detail.","title":"How are containers implemented"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#namespaces","text":"Visibility of processes inside a container should be restricted within itself. This is what linux namespaces do. The idea is that processes within a namespace can\u2019t affect those which it can\u2019t \u201csee\u201d. Processes sharing a single namespace have identities, service and/or interfaces unique to the namespace they exist in. Here\u2019s a list of namespaces in linux: Mount Process groups sharing a mount namespace share a separate, private set of mount points and file system view. Any modifications made to these namespaced mount points are not visible outside the namespace. For example, it is possible to have a /var within a mount namespace which is different from /var in the host. PID Processes in a pid namespace have process ids which are unique only within the namespace. A process can be a root process (pid 1) in its own pid namespace and have an entire tree of processes under it. Network Each network namespace will have its own network device instances that can be configured with individual network addresses. Processes in the same network namespace can have their own ports and route tables. User User namespaces can have their own users and group ids. It\u2019s possible for a process using a non-privileged user in the host machine to have a root user identity within a user namespace. 
Cgroup Allows creation of cgroups which can be used only within the cgroup namespace. Cgroups will be covered in more detail in the following section. UTS This namespace has its own hostname and domain name. IPC Each IPC namespace has its own System V and POSIX message queues. As complex as it seems, creating namespaces in linux is quite simple. Let\u2019s see a quick demo to create a PID namespace. You\u2019ll need a linux based OS with sudoers permission to follow along.","title":"Namespaces"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#demo-namespaces","text":"First we check which processes are running in the host system (output varies from system to system). Note the process with pid 1. Let\u2019s create a PID namespace with the unshare command and create a bash process in the namespace. You can see that ps aux (which itself is a process launched in the PID namespace so created) can only see processes within its own namespace. Hence, the output shows only 2 processes running within the namespace. Also note, the root process (pid 1) in the namespace is not init but it is the bash shell which we specified while creating the namespace. Let\u2019s create another process in the same namespace which sleeps for 1000 seconds in the background. In my case the pid of the sleep process is 44 within the PID namespace . On a separate terminal, check for the process id of the sleep process as seen from the host. Note the difference in pid (23844 in the host and 44 within the namespace) though both refer to the same process (start time and all other attributes are same). It\u2019s also possible to nest namespaces i.e create a pid namespace from another pid namespace. Try out sudo nsenter -t 23844 --pid -r bash to reenter the namespace and create another pid namespace within it. 
It should be fun to do!","title":"DEMO: namespaces"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#cgroups","text":"A cgroup can be defined as a set of processes whose usage of resources is metered and monitored. The resources can be memory pages, disk i/o, CPU etc. In fact, cgroups are classified based on which resource the limit is imposed on and nature of action taken when a limit is violated. The component in the cgroup which tracks resource utilization and controls the behaviour of processes in a cgroup is called resource-subsystem or resource controller. Following is the set of resource controllers and their function according to RHEL\u2019s introduction to cgroups : blkio \u2014 this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, or USB). cpu \u2014 this subsystem uses the scheduler to provide cgroup processes access to the CPU. cpuacct \u2014 this subsystem generates automatic reports on CPU resources used by processes in a cgroup. cpuset \u2014 this subsystem assigns individual CPUs (on a multicore system) and memory nodes to processes in a cgroup. devices \u2014 this subsystem allows or denies access to devices by processes in a cgroup. freezer \u2014 this subsystem suspends or resumes processes in a cgroup. memory \u2014 this subsystem sets limits on memory use by processes in a cgroup and generates automatic reports on memory resources used by those processes. Cgroups follow a hierarchical, tree-like structure for each resource controller i.e one cgroup exists for each controller. Each cgroup in a hierarchy inherits certain attributes (e.g limits) from its parent cgroup. Let\u2019s try out a quick demo with memory cgroups to wrap our heads around the above ideas. 
You\u2019ll need a linux based OS (here, RedHat) with sudo permission to follow along.","title":"Cgroups"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#demo-cgroups","text":"Let\u2019s start by checking if cgroup tools are installed in your machine. Execute mount | grep \"^cgroup\" . If you have the tools installed you\u2019ll see an output like this: If not, install the tools with sudo yum install libcgroup-tools -y . Now, we create a memory cgroup called mem_group with \u201croot\u201d as the owner of the cgroup. Command executed sudo cgcreate -a root -g memory:mem_group . Verify that cgroup is created. /sys/fs/cgroup/ is the pseudo filesystem where a newly created cgroup is added as a sub-group. Memory cgroup puts a limit on the memory usage of processes in the cgroup. Let\u2019s see what the limits are for mem_group. The file for checking the memory limit is memory.limit_in_bytes ( more information here , if you\u2019re interested). Note that mem_group has inherited the limit from its parent cgroup. Now, let\u2019s reduce the memory usage limit to 20KB for the purpose of our demo (the actual limit is rounded off to the nearest power of 2). This limit is too low and hence most of the processes attached to mem_group should be OOM killed. Create a new shell and attach it to the cgroup. We need sudo permissions for this. The process is OOM killed as expected. You can confirm the same with dmesg logs (mm_fault_error). If you want to try out a more in-depth exercise on cgroups, check out this tutorial from Geeks for Geeks . Let\u2019s come back to containers again. Containers share the same kernel as the underlying host operating system and provide an isolated environment of the application within. Cgroups help in managing resources used by processes within a container and namespaces help isolate network stack, pids, users, group ids and mount points in a container from another container running on the same host. 
Of course, there are more components to containers which truly make it fully functional but that discussion is out of scope of this module.","title":"DEMO: cgroups"},{"location":"level102/containerization_and_orchestration/intro_to_containers/#container-engine","text":"Container engines ease the process of creating and managing containers in a host machine. How? The container creation workflow typically begins with a container image. A container image is a packaged, portable version of the target application bundled with all dependencies for it to run. These container images are either available on the host machine (container host) from previous builds or need to be pulled from a remote repository of images. Sometimes the container engine might need to build the container image from a set of instructions. Finally once the container image is fetched/built, the container engine unpacks the image and creates an isolated environment for the application as per the image specifications. The files in the container image are then mounted to the isolated environment to get the application up and running within the container. There are several container engines available like Docker, RKT, LXC (one of the first container engines) which require different image formats (Docker, LXD). OCI (Open Container Initiative) is a collaborative project started by Docker that aims to standardize container runtime specifications and image formats across vendors. OCI FAQ section is a good place to start if you\u2019re curious about this project. We will focus on Docker in the next section .","title":"Container engine"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/","text":"Introduction Now we finally arrive at the most awaited part: running and managing containers at scale. So far, we have seen how Docker facilitates managing the life-cycle of containers and provides improved portability of applications. 
Docker does provide a solution for easing the deployment of containers on a large scale (you can check out Docker Swarm, if interested) which integrates well with Docker containers. However, Kubernetes has become the de-facto tool for orchestrating the management of microservices (as containers) in large distributed environments. Let\u2019s see the points of interest for us, SREs, to use container orchestration tools and Kubernetes in particular. Motivation to use Kubernetes Ease of usage Though there is a steep learning curve associated with Kubernetes, once learnt, it can be used as a one stop tool to manage your microservices. With a single command it is possible to deploy full fledged production ready environments. The desired state of an application needs to be recorded as a YAML manifest and Kubernetes manages the application for you. Ensure optimum usage of resources We can specify limits on resources used by each container in a deployment. We can also specify our choice of nodes where Kubernetes can schedule pods to be deployed (e.g microservices with high CPU consumption can be instructed to be deployed in high compute nodes). Fault tolerance Self-healing is built into basic resource types of Kubernetes. This removes the headache of designing a fault tolerant application system from scratch. This applies especially to stateless applications. Infrastructure agnostic Kubernetes does not have vendor lock-in. It can be set up in multiple cloud environments or in on-prem data centers. Strong community support and documentation Kubernetes is open-source and many technologies like operators, service mesh etc. have been built by the community to manage and monitor Kubernetes-orchestrated applications better. Extensible and customisable We can build our custom resource definitions which fit our use case for managing applications and use Kubernetes to manage them (with custom controllers). You can check out this article if you are more interested in this topic. 
Architecture of Kubernetes Here\u2019s a diagram (from the official Kubernetes documentation ) containing different components which make Kubernetes work: Kubernetes components can be divided into two parts: control plane components and data plane components . A Kubernetes cluster consists of 1 or more host machines (called nodes) where the containers managed by Kubernetes are run. This constitutes the data plane (or node plane). The brain of Kubernetes which responds to events from the node plane (e.g create a pod, replicas mismatch) and does the main orchestration is called the control plane. All control plane components are typically installed in a master node. This master node does not run any user containers. The Kubernetes components themselves are run as containers wrapped in Pods (which is the most basic kubernetes resource object). Control plane components: kube-apiserver etcd kube-scheduler kube-controller-manager Node plane components kubelet kube-proxy This workflow might help you understand the working of the components better: An SRE installs kubectl in their local machine. This is the client which interacts with the Kubernetes control plane (and hence the cluster). They create a YAML file, called manifest which specifies the desired state of the resource (e.g a deployment named \u201cfrontend\u201d needs 3 pods to always be running) When they issue a command to create objects based on the YAML file, the kubectl CLI tool sends a REST API request to the kube-apiserver . If the manifest is valid, it is stored as key value pairs in the etcd server on the control plane. kube-scheduler chooses which nodes to put the containers on (basically schedules them) There are controller processes (managed by kube-controller-manager) which make sure the current state of the cluster is equivalent to the desired state (here, 3 pods are indeed running in the cluster -> all is fine). On the node plane side, kubelet makes sure that pods are locally kept in running state. 
LAB Prerequisites The best way to start this exercise is to use a Play with kubernetes lab . The environment gets torn down after 4 hours. So make sure that you save your files if you want to resume them. For persistent kubernetes clusters, you can set it up either in your local machine (using minikube ) or you can create a kubernetes cluster in Azure , GCP or any other cloud provider. Knowledge of YAML is nice to have for understanding the manifest files. Hands-on Lab 1: We are going to create an object called Pod which is the most basic unit for running a container in Kubernetes. Here, we will create a pod called \u201cnginx-pod\u201d which contains an nginx container called \u201cweb\u201d. We will also expose port 80 in the container so that we can interact with the nginx container. Save the below manifest in a file called nginx-pod.yaml apiVersion: v1 #[1] kind: Pod #[2] metadata: #[3] name: nginx-pod #[4] labels: #[5] app: nginx spec: #[6] containers: #[7] - name: web #[8] image: nginx #[9] ports: #[10] - name: web #[11] containerPort: 80 #[12] protocol: TCP #[13] Let\u2019s very briefly understand what\u2019s here: #[2] - kind: The \u201ckind\u201d of object that\u2019s being created. Here it is a Pod #[1] - apiVersion: The apiVersion of the \u201cPod\u201d resource. There could be minor changes in the values or keys in the yaml file if the version varies. #[3] - metadata: The metadata section of the file where pod labels and name are given #[6] - spec: This is the main part where the things inside the pod are defined These are not random key value pairs! They have to be interpretable by the kube-apiserver. You can check which key value pairs are optional/mandatory using kubectl explain pod command. Do try it out! Apply the manifest using the command kubectl apply -f nginx-pod.yaml . This creates the \u201cnginx-pod\u201d pod in the kubernetes cluster. Verify that the pod is in running state using kubectl get pod . It shows that nginx-pod is in Running state. 
1/1 indicates that 1 out of 1 container(s) inside the pod is healthy. To check if the container running in \u201cnginx-pod\u201d is indeed \u201cweb\u201d we do the kubectl describe pod/nginx-pod command. This gives a lengthy output with a detailed description of the pod and the events that happened since the pod was created. This command is very useful for debugging. The part we are concerned here is this: You can see \u201cweb\u201d under the Containers section with Image as nginx. This is what we are looking for. How do we access the welcome page of nginx \u201cweb\u201d container? In the describe command you can see the IP address of the pod. Each pod is assigned an IP address on creation. Here, this is 10.244.1.3 Issue a curl request from the host curl 10.244.1.3:80 . You will get the welcome page! Let\u2019s say we want to use a specific tag of nginx (say 1.20.1) in the same pod i.e we want to modify some property of the pod. You can try editing nginx-pod.yaml (image: nginx:1.20.1 in #[9]) and reapplying (step 2.). It will create a new container in the same pod with the new image. A container is created within the pod but the pod is the same. You can verify by checking the pod start time in describe command. It would show a much older time. What if we want to change the image to 1.20.1 for 1000 nginx pods? Stepping a little back, what if we want to create 1000 nginx pods? Of course, we can write a script but Kubernetes already offers a resource type called \u201cdeployment\u201d to manage large scale deployments better. Lab 2: We\u2019ll go a step further to see how we can create more than a single instance of the nginx pod at the same time. 
Save the below manifest in a file called nginx-deploy.yaml apiVersion: apps/v1 kind: Deployment #[1] metadata: name: nginx-deployment labels: app: nginx spec: replicas: 3 #[2] selector: matchLabels: app: nginx #[3] template: #[4] metadata: labels: app: nginx #[5] spec: containers: - name: web image: nginx ports: - name: web containerPort: 80 protocol: \"TCP\" You can see that it is similar to a pod definition till spec ( #[1] has Deployment as kind, api version is also different). Another interesting observation is that the metadata and spec parts under #[4] are almost the same as the metadata and spec section under the Pod definition in Lab 1 (do go up and cross check this). What this implies is that we are deploying 3 nginx pods similar to Lab1. Also, the labels in matchLabels should be the same as labels under #[4] . Now apply the manifest using kubectl apply -f nginx-deploy.yaml Verify that 3 pods are indeed created. If you\u2019re curious, check the output of kubectl get deploy and kubectl describe deploy nginx-deployment . Delete one of the 3 pods using kubectl delete pod . After a few seconds again do kubectl get pod . You can see that a new pod is spawned to keep the total number of pods as 3 (see AGE 15s compared to others created 27 minutes ago)! This is a demonstration of how Kubernetes does fault tolerance. This is a property of Kubernetes deployment object (kill the pod from Lab1, it won\u2019t be respawned :) ) Let\u2019s say we want to increase the number of pods to 10. Try out kubectl scale deploy --replicas=10 nginx-deployment . You can see that 3/10 pods are older than the rest. This means Kubernetes has added 7 extra pods to scale the deployment to 10. This shows how simple it is to scale up and scale down containers using Kubernetes. Let\u2019s put all these pods behind a ClusterIP service. Execute kubectl expose deployment nginx-deployment --name=nginx-service . Curl the IP corresponding to 10.96.114.184. 
This curl request reaches one of the 10 pods in the deployment \u201cnginx-deployment\u201d in a round robin fashion. What happens when we execute the expose command is that a kubernetes Service is created of type ClusterIP so that all the pods behind this service are accessible through a single local IP (10.96.114.184, here). It is possible to have a public IP instead (i.e an actual external load balancer) by creating a Service of type LoadBalancer . Do feel free to play around with it! The above exercises give a pretty good exposure to using Kubernetes to manage large scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kubernetes provides other resources like Job, Daemonset, Cronjob, Statefulset etc. to manage special use cases. Additional labs: https://kubernetes.courselabs.co/ (Huge number of free follow-along exercises to play with Kubernetes) Advanced topics More often than not, microservices orchestrated with Kubernetes contain dozens of instances of resources like deployment, services and configs. The manifests for these applications can be auto-generated with Helm templates and passed on as Helm charts. Similar to how we have PyPI for python packages there are remote repositories like Bitnami where Helm charts (e.g for setting up a production-ready Prometheus or Kafka with a single click) can be downloaded and used. This is a good place to begin . Kubernetes provides the flexibility to create our custom resources (similar to Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource with kind as SchoolOfSre you can! The only thing is that you have to write your custom resource for it. You can also build a custom operator for your custom resource to take certain actions on the resource instance. 
You can check here for more information.","title":"Orchestration With Kubernetes"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#introduction","text":"Now we finally arrive at the most awaited part: running and managing containers at scale. So far, we have seen how Docker facilitates managing the life-cycle of containers and provides improved portability of applications. Docker does provide a solution for easing the deployment of containers on a large scale (you can check out Docker Swarm, if interested) which integrates well with Docker containers. However, Kubernetes has become the de-facto tool for orchestrating the management of microservices (as containers) in large distributed environments. Let\u2019s see the points of interest for us, SREs, to use container orchestration tools and Kubernetes in particular.","title":"Introduction"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#motivation-to-use-kubernetes","text":"Ease of usage Though there is a steep learning curve associated with Kubernetes, once learnt, it can be used as a one stop tool to manage your microservices. With a single command it is possible to deploy full fledged production ready environments. The desired state of an application needs to be recorded as a YAML manifest and Kubernetes manages the application for you. Ensure optimum usage of resources We can specify limits on resources used by each container in a deployment. We can also specify our choice of nodes where Kubernetes can schedule pods to be deployed (e.g microservices with high CPU consumption can be instructed to be deployed in high compute nodes). Fault tolerance Self-healing is built into basic resource types of Kubernetes. This removes the headache of designing a fault tolerant application system from scratch. This applies especially to stateless applications. Infrastructure agnostic Kubernetes does not have vendor lock-in. 
It can be set up in multiple cloud environments or in on-prem data centers. Strong community support and documentation Kubernetes is open-source and many technologies like operators, service mesh etc. have been built by the community to manage and monitor Kubernetes-orchestrated applications better. Extensible and customisable We can build our custom resource definitions which fit our use case for managing applications and use Kubernetes to manage them (with custom controllers). You can check out this article if you are more interested in this topic.","title":"Motivation to use Kubernetes"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#architecture-of-kubernetes","text":"Here\u2019s a diagram (from the official Kubernetes documentation ) containing different components which make Kubernetes work: Kubernetes components can be divided into two parts: control plane components and data plane components . A Kubernetes cluster consists of 1 or more host machines (called nodes) where the containers managed by Kubernetes are run. This constitutes the data plane (or node plane). The brain of Kubernetes which responds to events from the node plane (e.g create a pod, replicas mismatch) and does the main orchestration is called the control plane. All control plane components are typically installed in a master node. This master node does not run any user containers. The Kubernetes components themselves are run as containers wrapped in Pods (which is the most basic kubernetes resource object). Control plane components: kube-apiserver etcd kube-scheduler kube-controller-manager Node plane components kubelet kube-proxy This workflow might help you understand the working on components better: An SRE installs kubectl in their local machine. This is the client which interacts with the Kubernetes control plane (and hence the cluster). 
They create a YAML file, called a manifest, which specifies the desired state of the resource (e.g. a deployment named \u201cfrontend\u201d needs 3 pods to always be running). When they issue a command to create objects based on the YAML file, the kubectl CLI tool sends a REST API request to the kube-apiserver. If the manifest is valid, it is stored as key-value pairs in the etcd server on the control plane. kube-scheduler chooses which nodes to put the containers on (basically schedules them). There are controller processes (managed by kube-controller-manager) which make sure the current state of the cluster is equivalent to the desired state (here, 3 pods are indeed running in the cluster -> all is fine). On the node plane side, kubelet makes sure that pods are locally kept in running state.","title":"Architecture of Kubernetes"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab","text":"","title":"LAB"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#prerequisites","text":"The best way to start this exercise is to use a Play with kubernetes lab . The environment gets torn down after 4 hours. So make sure that you save your files if you want to resume them. For persistent Kubernetes clusters, you can either set one up on your local machine (using minikube ) or create a Kubernetes cluster in Azure , GCP or any other cloud provider. Knowledge of YAML is nice to have for understanding the manifest files.","title":"Prerequisites"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#hands-on","text":"","title":"Hands-on"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab-1","text":"We are going to create an object called Pod which is the most basic unit for running a container in Kubernetes. Here, we will create a pod called \u201cnginx-pod\u201d which contains an nginx container called \u201cweb\u201d. 
We will also expose port 80 in the container so that we can interact with the nginx container. Save the below manifest in a file called nginx-pod.yaml apiVersion: v1 #[1] kind: Pod #[2] metadata: #[3] name: nginx-pod #[4] labels: #[5] app: nginx spec: #[6] containers: #[7] - name: web #[8] image: nginx #[9] ports: #[10] - name: web #[11] containerPort: 80 #[12] protocol: TCP #[13] Let\u2019s very briefly understand what\u2019s here: #[2] - kind: The \u201ckind\u201d of object that\u2019s being created. Here it is a Pod #[1] - apiVersion: The apiVersion of the \u201cPod\u201d resource. There could be minor changes in the values or keys in the yaml file if the version varies. #[3] - metadata: The metadata section of the file where pod labels and name is given #[6] - spec: This is the main part where the things inside the pod are defined These are not random key value pairs! They have to be interpretable by the kubeapiserver. You can check which key value pairs are optional/mandatory using kubectl explain pod command. Do try it out! Apply the manifest using the command kubectl apply -f nginx-pod.yaml . This creates the \u201cnginx-pod\u201d pod in the kubernetes cluster. Verify that the pod is in running state using kubectl get pod . It shows that nginx-pod is in Running state. 1/1 indicates that out of 1 out of 1 container(s) inside the pod is healthy. To check if the container running in \u201cnginx-pod\u201d is indeed \u201cweb\u201d we do the kubectl describe pod/nginx-pod command. This gives a lengthy output with a detailed description of the pod and the events that happened since the pod was created. This command is very useful for debugging. The part we are concerned here is this: You can see \u201cweb\u201d under the Containers section with Image as nginx. This is what we are looking for. How do we access the welcome page of nginx \u201cweb\u201d container? In the describe command you can see the IP address of the pod. 
Each pod is assigned an IP address on creation. Here, this is 10.244.1.3. Issue a curl request from the host: curl 10.244.1.3:80 . You will get the welcome page! Let\u2019s say we want to use a specific tag of nginx (say 1.20.1) in the same pod, i.e. we want to modify some property of the pod. You can try editing nginx-pod.yaml (image: nginx:1.20.1 in #[9]) and reapplying (step 2.). It will create a new container in the same pod with the new image. A container is created within the pod but the pod is the same. You can verify by checking the pod start time in the describe command. It would show a much older time. What if we want to change the image to 1.20.1 for 1000 nginx pods? Stepping a little back, what if we want to create 1000 nginx pods? Of course, we can write a script but Kubernetes already offers a resource type called \u201cdeployment\u201d to manage large scale deployments better.","title":"Lab 1:"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab-2","text":"We\u2019ll go a step further to see how we can create more than a single instance of the nginx pod at the same time. First, save the below manifest in a file called nginx-deploy.yaml apiVersion: apps/v1 kind: Deployment #[1] metadata: name: nginx-deployment labels: app: nginx spec: replicas: 3 #[2] selector: matchLabels: app: nginx #[3] template: #[4] metadata: labels: app: nginx #[5] spec: containers: - name: web image: nginx ports: - name: web containerPort: 80 protocol: \"TCP\" You can see that it is similar to a pod definition till spec ( #[1] has Deployment as kind, and the API version is also different). Another interesting observation is that the metadata and spec parts under #[4] are almost the same as the metadata and spec sections under the Pod definition in Lab 1 (do go up and cross check this). What this implies is that we are deploying 3 nginx pods similar to Lab 1. Also, the labels in matchLabels should be the same as the labels under #[4] . 
Now apply the manifest using kubectl apply -f nginx-deploy.yaml Verify that 3 pods are indeed created. If you\u2019re curious, check the output of kubectl get deploy and kubectl describe deploy nginx-deployment . Delete one of the 3 pods using kubectl delete pod . After a few seconds again do kubectl get pod . You can see that a new pod is spawned to keep the total number of pods as 3 (see AGE 15s compared to others created 27 minutes ago)! This is a demonstration of how Kubernetes does fault tolerance. This is a property of Kubernetes deployment object (kill the pod from Lab1, it won\u2019t be respawned :) ) Let\u2019s say we want to increase the number of pods to 10. Try out kubectl scale deploy --replicas=10 nginx-deployment . You can see that 3/10 pods are older than the rest. This means Kubernetes has added 7 extra pods to scale the deployment to 10. This shows how simple it is to scale up and scale down containers using Kubernetes. Let\u2019s put all these pods behind a ClusterIP service. Execute kubectl expose deployment nginx-deployment --name=nginx-service . Curl the IP corresponding to 10.96.114.184. This curl request reaches one of the 10 pods in the deployment \u201cnginx-deployment\u201d in a round robin fashion. What happens when we execute the expose command is that a kubernetes Service is created of type Cluster IP so that all the pods behind this service are accessible through a single local IP (10.96.114.184, here). It is possible to have a public IP instead (i.e an actual external load balancer) by creating a Service of type LoadBalancer . Do feel free to play around with it! The above exercises a pretty good exposure to using Kubernetes to manage large scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kubernetes provides other resources like Job, Daemonset, Cronjob, Statefulset etc. 
to manage special use cases. Additional labs: https://kubernetes.courselabs.co/ (Huge number of free follow-along exercises to play with Kubernetes)","title":"Lab 2:"},{"location":"level102/containerization_and_orchestration/orchestration_with_kubernetes/#advanced-topics","text":"More often than not, microservices orchestrated with Kubernetes contain dozens of instances of resources like deployments, services and configs. The manifests for these applications can be auto-generated with Helm templates and passed on as Helm charts. Similar to how we have PyPI for Python packages, there are remote repositories like Bitnami where Helm charts (e.g. for setting up a production-ready Prometheus or Kafka with a single click) can be downloaded and used. This is a good place to begin. Kubernetes provides the flexibility to create our custom resources (similar to Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource with kind as SchoolOfSre you can! The only thing is that you have to write your custom resource for it. You can also build a custom operator for your custom resource to take certain actions on the resource instance. You can check here for more information.","title":"Advanced topics"},{"location":"level102/continuous_integration_and_continuous_delivery/cicd_brief_history/","text":"The Evolution of the CI/CD Traditional development approaches have been around for a very long time. The waterfall model has been widely used in both large and small projects and has been successful. Despite the success, it has a lot of drawbacks like longer cycle and delivery times. While multiple team members are working on the project, the code changes accumulate and are not integrated until the planned build date. The build usually happens on agreed cycles that range from a month to a quarter. This results in several integration issues and build failures as the developers work on their features in silos. 
It was a nightmare situation for the operations teams/for anyone to deploy the new builds/releases to the production environment because of lack of proper documentation on every change and the configuration requirements. So, to deploy successfully, often it required hot fixes and immediate patches. Another big challenge was collaboration. It is rare that the developer meets the operation engineers and does not have a full understanding of the production environment. All these challenges have given rise to longer cycle times for the delivery of the code changes. Agile methodology prescribes the delivery of incremental delivery of features in multiple iterations. So, the developers commit their code changes in smaller increments and roll out more frequently. Every code commit triggers a new build, and the integration issues are identified much early. This has improved the build process and thereby reduced the cycle time. This process is known as continuous integration or CI . The big barrier between the developers and the operation teams has been shrunken with the emergence of the trend where organizations are adapting to the DevOps and SRE disciplines. The collaboration between the developers and the operation teams is improved. Moreover, the use of the same tools and processes by both the teams has improved coordination and avoided conflicting understanding of the process. One of the main drivers in this regard is the continuous delivery (CD) process that ensures the incremental deployment of smaller changes. There are multiple pre-production environments also called the staging environments before deploying to production environments. CI/CD and DevOps The term DevOps represents the combination of Development (Dev) and Operations (Ops) teams. That is bringing developers and operations teams together for more collaboration. 
The development team often wants to introduce more features and more changes while the operation teams are more focused on the stability of the application in production. A change is always taken as a threat by the operations team as it can shake the stability of the environment. DevOps is termed as a culture that introduces the processes to reduce the barriers between developers and operations. The collaboration between Dev and Ops allows better follow-up of end-to-end production deployments and more frequent deployments. So, thus CI/CD is a key element in the DevOps processes.","title":"Brief History"},{"location":"level102/continuous_integration_and_continuous_delivery/cicd_brief_history/#the-evolution-of-the-cicd","text":"Traditional development approaches have been around for a very long time. The waterfall model has been widely used in both large and small projects and has been successful. Despite the success, it has a lot of drawbacks like longer cycle times or delivery. While multiple team members are working on the project, the code changes get accumulated and never integrated until the planned build date. The build usually happens on agreed cycles that range from a month to a quarter. This results in several integration issues and build failures as the developers were working on their features in silos. It was a nightmare situation for the operations teams/for anyone to deploy the new builds/releases to the production environment because of lack of proper documentation on every change and the configuration requirements. So, to deploy successfully, often it required hot fixes and immediate patches. Another big challenge was collaboration. It is rare that the developer meets the operation engineers and does not have a full understanding of the production environment. All these challenges have given rise to longer cycle times for the delivery of the code changes. 
Agile methodology prescribes the delivery of incremental delivery of features in multiple iterations. So, the developers commit their code changes in smaller increments and roll out more frequently. Every code commit triggers a new build, and the integration issues are identified much early. This has improved the build process and thereby reduced the cycle time. This process is known as continuous integration or CI . The big barrier between the developers and the operation teams has been shrunken with the emergence of the trend where organizations are adapting to the DevOps and SRE disciplines. The collaboration between the developers and the operation teams is improved. Moreover, the use of the same tools and processes by both the teams has improved coordination and avoided conflicting understanding of the process. One of the main drivers in this regard is the continuous delivery (CD) process that ensures the incremental deployment of smaller changes. There are multiple pre-production environments also called the staging environments before deploying to production environments.","title":"The Evolution of the CI/CD"},{"location":"level102/continuous_integration_and_continuous_delivery/cicd_brief_history/#cicd-and-devops","text":"The term DevOps represents the combination of Development (Dev) and Operations (Ops) teams. That is bringing developers and operations teams together for more collaboration. The development team often wants to introduce more features and more changes while the operation teams are more focused on the stability of the application in production. A change is always taken as a threat by the operations team as it can shake the stability of the environment. DevOps is termed as a culture that introduces the processes to reduce the barriers between developers and operations. The collaboration between Dev and Ops allows better follow-up of end-to-end production deployments and more frequent deployments. 
Thus, CI/CD is a key element in the DevOps processes.","title":"CI/CD and DevOps"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/","text":"Applications in SRE Role Monitoring, automation and eliminating toil are some of the core pillars of the SRE discipline. As an SRE, you may need to spend about 50% of your time automating repetitive tasks and eliminating toil. CI/CD pipelines are one of the crucial tools for the SRE. They help in delivering a quality application through smaller, regular and more frequent builds. Additionally, CI/CD metrics such as deployment time, success rate, cycle time and automated test success rate are the key things to watch to improve the quality of the product, thus improving the reliability of the applications. Infrastructure-as-code is one of the standard practices followed in SRE for automating repetitive configuration tasks. Every configuration is maintained as code, so it can be deployed using CI/CD pipelines. It is important to deliver configuration changes to the production environments through CI/CD pipelines to maintain versioning and consistency of the changes across environments and to avoid manual errors. Often, as an SRE, you are required to review the application CI/CD pipelines and recommend additional stages such as static code analysis and security and privacy checks in the code to improve the security and reliability of the product. Conclusion In this chapter, we have studied CI/CD pipelines with a brief history of the challenges with the traditional build practices. We have also looked at how CI/CD pipelines augment the SRE discipline. Use of CI/CD pipelines in the software development life cycle is a modern approach in the SRE realm that helps achieve greater efficiency. We have also performed a hands-on lab activity on creating a CI/CD pipeline using Jenkins. 
References Continuous Integration(martinfowler.com) CI/CD for microservices - Azure Architecture Center | Microsoft Docs SREFoundationBlueprint_2 (devopsinstitute.com) Jenkins User Documentation","title":"Conclusion"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/#applications-in-sre-role","text":"The Monitoring, Automation and Eliminating the toil are some of the core pillars of the SRE discipline. As an SRE, you may require spending about 50% of time on automating the repetitive tasks and to eliminate the toil. CI/CD pipelines are one of the crucial tools for the SRE. They help in delivering the quality application with the smaller and regular and more frequent builds. Additionally, the CI/CD metrics such as Deployment time, Success rate, Cycle time and Automated test success rate etc. are the key things to watch to improve the quality of the product thus improving the reliability of the applications. Infrastructure-as-code is one of the standard practices followed in SRE for automating the repetitive configuration tasks. Every configuration is maintained as code, so it can be deployed using CI/CD pipelines. It is important to deliver the configuration changes to the production environments through CI/CD pipelines to maintain the versioning, consistency of the changes across environments and to avoid manual errors. Often, as an SRE, you are required to review the application CI/CD pipelines and recommend additional stages such as static code analysis and the security and privacy checks in the code to improve the security and reliability of the product.","title":"Applications in SRE Role"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/#conclusion","text":"In this chapter, we have studied the CI/CD pipelines with brief history on the challenges with the traditional build practices. We have also looked at how the CI/CD pipelines augments the SRE discipline. 
Use of CI/CD pipelines in software development life cycle is a modern approach in the SRE realm that helps achieve greater efficiency. We have also performed a hands-on lab activity on creating the CI/CD pipeline using Jenkins.","title":"Conclusion"},{"location":"level102/continuous_integration_and_continuous_delivery/conclusion/#references","text":"Continuous Integration(martinfowler.com) CI/CD for microservices - Azure Architecture Center | Microsoft Docs SREFoundationBlueprint_2 (devopsinstitute.com) Jenkins User Documentation","title":"References"},{"location":"level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline/","text":"Continuous Delivery means deploying the application builds more frequently in the non-production environments such as SIT, UAT, INT and performing the integration tests and the acceptance tests automatically. In the CD, the tests are performed on the integrated application instead of the single microservice in the cases of microservice based application. The tests must include all the functional tests and the acceptance tests that may contain the UI tests. The build must be immutable in nature, that is the same package must be deployed across all the environments including the Production. The deployment to the Production is often manual after performing additional acceptance tests such as performance tests etc. So, the fully automated deployment to the Production environments is called the Continuous Deployment (whereas CD \u2013 Continuous delivery doesn\u2019t automatically deploy to Production). The continuous deployment must have a feature toggle so that a feature can be toggled off without the need for redeploying the code. Often, the deployment involves more than one production environment, for example in blue-green environments the application is first deployed to the blue environment and then to the green environment so that the downtime is not required. 
Fig 3: Continuous Delivery Pipeline","title":"Continuous Delivery and Deployment"},{"location":"level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline/","text":"CI is a software development practice where members of a team integrate their work frequently. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Continuous integration requires that all the code changes be maintained in a single code repository where all the members can push the changes to their feature branches regularly. The code changes must be quickly integrated with the rest of the code and automated builds should happen and feedback to the member to resolve them early. There should be a CI server where it can trigger a build as soon as the code is pushed by a member. The build typically involves compiling the code and transforming it to an executable file such as JARs or DLLs etc. called packaging. It must also perform unit tests with code coverage. Optionally, the build process can have additional stages such as static code analysis and vulnerability checks etc. Jenkins , Bamboo , Travis CI , GitLab , Azure DevOps etc. are the few popular CI tools. These tools provide various plugins and integration such as ant , maven etc. for building and packaging, and Junit, selenium etc. are for performing the unit tests. SonarQube can be used for static code analysis and code security. Fig 1: Continuous Integration Pipeline Fig 2: Continuous Integration Process","title":"Continuous Integration"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/","text":"Prerequisites Software Development and Maintenance Git Docker What to expect from this course? In this course, you will learn the basics of CI/CD and how it helps drive the SRE discipline in an organization. It also discusses the various DevOps tools in CI/CD practice and a hands-on lab session on Jenkins based pipeline. 
Finally, it will conclude by explaining the role in the growing SRE philosophy. What is not covered under this course? The course does not cover DevOps elements such as Infrastructure as a code, continuous monitoring applications and infrastructure comprehensively. Table of Contents What is CI/CD? Brief History to CI/CD and DevOps Continuous Integration Continuous Delivery and Deployment Jenkins based CI/CD pipeline - Hands-on Conclusion","title":"Introduction"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#prerequisites","text":"Software Development and Maintenance Git Docker","title":"Prerequisites"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#what-to-expect-from-this-course","text":"In this course, you will learn the basics of CI/CD and how it helps drive the SRE discipline in an organization. It also discusses the various DevOps tools in CI/CD practice and a hands-on lab session on Jenkins based pipeline. Finally, it will conclude by explaining the role in the growing SRE philosophy.","title":"What to expect from this course?"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#what-is-not-covered-under-this-course","text":"The course does not cover DevOps elements such as Infrastructure as a code, continuous monitoring applications and infrastructure comprehensively.","title":"What is not covered under this course?"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction/#table-of-contents","text":"What is CI/CD? 
Brief History to CI/CD and DevOps Continuous Integration Continuous Delivery and Deployment Jenkins based CI/CD pipeline - Hands-on Conclusion","title":"Table of Contents"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction_to_cicd/","text":"Continuous Integration and Continuous Delivery, also known as CI/CD, is a set of processes that helps in faster integration of software code changes and deployment to the end user in a reliable manner. The more frequent integrations and deployments helps reduce the software development lifecycle. There are three practices in CI/CD: Continuous Integration Continuous Delivery Continuous Deployment Let\u2019s look in detail at each of these in the coming sections. The Benefits of CI/CD Significant reduction in integration problems. Teams can develop cohesive software more rapidly. Improved Collaboration between developers and operation teams can reduce the production integration issues. Faster delivery of new features with less friction Better debugging the production issues and fixing them in the next release/patch.","title":"What is CI/CD?"},{"location":"level102/continuous_integration_and_continuous_delivery/introduction_to_cicd/#the-benefits-of-cicd","text":"Significant reduction in integration problems. Teams can develop cohesive software more rapidly. Improved Collaboration between developers and operation teams can reduce the production integration issues. Faster delivery of new features with less friction Better debugging the production issues and fixing them in the next release/patch.","title":"The Benefits of CI/CD"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/","text":"Jenkins based CI/CD Pipeline Jenkins is an open-source continuous integration server for orchestrating the CI/CD pipelines. It supports integration with several components, infrastructure such as git, cloud etc. 
that help across the complete software development life cycle. In this hands-on lab, let us: * Create a build pipeline (CI) for a simple java application. * Add a Test stage to the build pipeline. This hands-on is based on Jenkins running on Docker on your local workstation, designed for Windows OS. For Linux OS, please follow the demo Note: The hands-on lab is designed with Jenkins on Docker. However, the steps are applicable for a direct Jenkins installation on your Windows workstation as well. Installing Git, Docker and Jenkins: Install the git command line tool on your workstation. (Follow this to install Git Locally) Install Docker Desktop for Windows on the workstation. Follow the instructions to install docker. Ensure that your Docker for Windows installation is configured to run Linux Containers rather than Windows Containers. See the Docker documentation for instructions to switch to Linux containers. Refer to this to run and set up Jenkins on Docker. Configure Jenkins with initial steps such as creating an admin user etc. Follow Setup wizard. If you have installed Jenkins directly on your local workstation, make sure the maven tool is installed. Follow this to install maven. Forking Sample java application: For this hands-on, let us fork a simple java application from the GitHub simple-java-maven-app . 1. Sign up for a GitHub account Join GitHub \u00b7 GitHub . Once signed up, proceed to login . 2. Open the simple-java-maven-app by clicking on this link 3. On the top right corner, click on \u2018Fork\u2019 to create a copy of the project in your GitHub account. (Refer Fork A Repo ) 4. Once forked, clone this repository to your local workstation. Create Jenkins Project: Log in to the Jenkins portal at http://localhost:8080 using the admin account created earlier during Jenkins\u2019s setup. On your first login, the following screen will appear. Click on \u201c Create a Job \u201d. 
Fig 4: Jenkins - Create a Job On the next screen, type simple-java-pipeline in the Enter an Item Name field. Select Pipeline from the list of items and click OK . Fig 5: Jenkins - Create Pipeline Click the Pipeline tab at the top of the page to scroll down to the Pipeline section. From the Definition field, choose the Pipeline script from SCM option. This option instructs Jenkins to obtain your Pipeline from Source Control Management (SCM), which will be your locally cloned Git repository. From the SCM field, choose Git . In the Repository URL field, specify the directory path of your locally cloned repository from the Forking Sample Java application section above. Screen looks like below after entering the details. Fig 6: Jenkins - Pipeline Configuration Create Build pipeline using the Jenkinsfile: Jenkinsfile is a script file containing the pipeline configuration and the stages and other instructions to Jenkins to create a pipeline from the file. This file will be saved at the root of the code repository. 1. Using your favorite text editor or IDE, create and save a new text file with the name Jenkinsfile at the root of your local simple-java-maven-app Git repository. 2. Copy the following declarative pipeline code and paste it into the empty Jenkinsfile . pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any as shown below so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } In the above Jenkinsfile: * We specified an agent where the pipeline should run. 'docker\u2019 in the agent section indicates to run a new docker container with the specified image. 
* In the stages section, we can define multiple steps as different stages. Here, we have a stage called \u2018Build\u2019, with the maven command for building the java application. Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Add initial Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now . Fig 7: Jenkins - Building the Pipeline Notice the build running under the Build History menu. Click on the build number and it shows the stages. Fig 8: Jenkins - View Running Builds We have successfully created a build pipeline with single stage and ran it. We can check the logs by clicking on the Console Output menu. Additional stages in the build pipeline: In the previous section, we have created the pipeline with a single stage. Typically, your CI pipeline contains multiple stages such as Build, Test and other optional stages such Code scanning etc. In this section, let us add a Test stage to the build pipeline and run. Go back to your text editor/IDE and open Jenkinsfile and the Test stage shown below. stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } The Jenkinsfile looks like below after adding the Test stage. pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } } } Here the stage \u2018Test\u2019 is added which runs the maven command test. The post -> always section ensures that this step is executed always after the steps are completed. The test report is available through Jenkins\u2019s interface. 
Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages {\u2026 } } Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Test stage is added to Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now. Notice the Build and Test stages are showing in the Build screen. Fig 9: Jenkins - Viewing the Running Builds with Test stage Included We have now successfully created CI pipeline with two stages: Build and Test stages.","title":"CI/CD Pipeline - Hands-on"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#jenkins-based-cicd-pipeline","text":"Jenkins is an open-source continuous integration server for orchestrating the CI/CD pipelines. It supports integration with several components, infrastructure such as git, cloud etc. that helps in complete software development life cycle. In this hands-on lab, let us: * Create a build pipeline (CI) for a simple java application. * Adding Test stage to build pipeline This hands-on is based on the Jenkins running on docker on your local workstation, designed for Windows OS. For Linux OS, please follow the demo Note: The hands-on lab is designed with Jenkins on the docker. However, the steps are applicable for the direct docker installation on your windows workstation as well.","title":"Jenkins based CI/CD Pipeline"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#installing-git-docker-and-jenkins","text":"Install git command line tool on your workstation. 
(Follow this to install Git locally.) Install Docker Desktop for Windows on the workstation. Follow the instructions to install Docker. Ensure that your Docker for Windows installation is configured to run Linux containers rather than Windows containers. See the Docker documentation for instructions to switch to Linux containers. Refer to this to run and set up Jenkins on Docker. Complete the initial Jenkins configuration, such as creating an admin user, by following the Setup wizard. If you have installed Jenkins directly on your local workstation, make sure that Maven is installed. Follow this to install Maven.","title":"Installing Git, Docker and Jenkins:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#forking-sample-java-application","text":"For this hands-on, let us fork a simple Java application from the GitHub simple-java-maven-app . 1. Sign up for a GitHub account at Join GitHub \u00b7 GitHub . Once signed up, proceed to login . 2. Open the simple-java-maven-app by clicking on this link 3. In the top right corner, click \u2018Fork\u2019 to create a copy of the project in your GitHub account. (Refer to Fork A Repo ) 4. Once forked, clone this repository to your local workstation.","title":"Forking Sample java application:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#create-jenkins-project","text":"Log in to the Jenkins portal at http://localhost:8080 using the admin account created earlier during Jenkins\u2019s setup. On your first login, the following screen will appear. Click on \u201c Create a Job \u201d. Fig 4: Jenkins - Create a Job On the next screen, type simple-java-pipeline in the Enter an Item Name field. Select Pipeline from the list of items and click OK . Fig 5: Jenkins - Create Pipeline Click the Pipeline tab at the top of the page to scroll down to the Pipeline section. 
From the Definition field, choose the Pipeline script from SCM option. This option instructs Jenkins to obtain your Pipeline from Source Control Management (SCM), which will be your locally cloned Git repository. From the SCM field, choose Git . In the Repository URL field, specify the directory path of your locally cloned repository from the Forking Sample Java application section above. Screen looks like below after entering the details. Fig 6: Jenkins - Pipeline Configuration","title":"Create Jenkins Project:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#create-build-pipeline-using-the-jenkinsfile","text":"Jenkinsfile is a script file containing the pipeline configuration and the stages and other instructions to Jenkins to create a pipeline from the file. This file will be saved at the root of the code repository. 1. Using your favorite text editor or IDE, create and save a new text file with the name Jenkinsfile at the root of your local simple-java-maven-app Git repository. 2. Copy the following declarative pipeline code and paste it into the empty Jenkinsfile . pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any as shown below so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } } } In the above Jenkinsfile: * We specified an agent where the pipeline should run. 'docker\u2019 in the agent section indicates to run a new docker container with the specified image. * In the stages section, we can define multiple steps as different stages. 
Here, we have a stage called \u2018Build\u2019, with the maven command for building the java application. Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Add initial Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now . Fig 7: Jenkins - Building the Pipeline Notice the build running under the Build History menu. Click on the build number and it shows the stages. Fig 8: Jenkins - View Running Builds We have successfully created a build pipeline with single stage and ran it. We can check the logs by clicking on the Console Output menu.","title":"Create Build pipeline using the Jenkinsfile:"},{"location":"level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/#additional-stages-in-the-build-pipeline","text":"In the previous section, we have created the pipeline with a single stage. Typically, your CI pipeline contains multiple stages such as Build, Test and other optional stages such Code scanning etc. In this section, let us add a Test stage to the build pipeline and run. Go back to your text editor/IDE and open Jenkinsfile and the Test stage shown below. stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } The Jenkinsfile looks like below after adding the Test stage. pipeline { agent { docker { image 'maven:3.8.1-adoptopenjdk-11' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn -B -DskipTests clean package' } } stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } } } Here the stage \u2018Test\u2019 is added which runs the maven command test. The post -> always section ensures that this step is executed always after the steps are completed. 
The test report is available through Jenkins\u2019s interface. Note: If you are running Jenkins on your local workstation without the docker, please change the agent to any so that it runs on the localhost. Please ensure that the maven tool is installed on your local workstation. pipeline { agent any stages {\u2026 } } Save your Jenkinsfile and commit and push to your forked repository. Run the following commands from the command prompt. cd git add . git commit -m \"Test stage is added to Jenkinsfile\" git push origin master Go to Jenkins portal on your browser and click on the Dashboard . Open the simple-java-pipeline and from the left-menu, click on Build Now. Notice the Build and Test stages are showing in the Build screen. Fig 9: Jenkins - Viewing the Running Builds with Test stage Included We have now successfully created CI pipeline with two stages: Build and Test stages.","title":"Additional stages in the build pipeline:"},{"location":"level102/linux_intermediate/archiving_backup/","text":"Archiving and Backup Introduction One of the things SREs make sure of is the services are up all the time (at least 99.99% of the time), but the amount of data generated at each server running those services are immense. This data could be logs, user data in the database, or any other kind of metadata. Hence we need to compress, archive, rotate, and Backup the data in a timely manner for data safety and to make sure we don\u2019t run out of space. Archiving We usually archive the data that are no longer needed but are kept mostly for compliance purposes. This helps in storing the data into compressed format saving a lot of space. Below section is to familiarize with the archiving tools and commands. gzip gzip is a program used to compress one or more files, it replaces the original file with a compressed version of the original file. Here we can see that the messages log file is compressed to almost one-fifth of the original size and replaced with messages.gz. 
We can uncompress this file using the gunzip command. tar The tar program is a tool for archiving files and directories into a single file (often called a tarball). It is usually used to prepare archives of files before they are transferred to a long-term backup server. tar doesn\u2019t replace the existing files and folders but creates a new file with the extension .tar . It provides many flags to choose from for archiving: Flags Description -c Creates an archive -x Extracts the archive -f Specifies the archive file name -t Lists the files in an archive -u Appends only files that are newer than the copy already in the archive -v Displays verbose information -A Appends the contents of other archives to an archive -z Compresses the archive using gzip -j Compresses the archive using bzip2 -W Verifies the archive after writing it -r Appends files or directories to an existing .tar file Create an archive with files and folder Flag c creates the archive, and flag f specifies the archive file name. Listing files in the archive We can use flag t to list what an archive contains. Extract files from the archive We can use flag x to extract the archive. Backup Backup is the process of copying/duplicating existing data. The backup can be used to restore the dataset in case of data loss. Data backup also becomes critical when the data is not needed in the day-to-day job but can be referred to as a source of truth and for compliance reasons in the future. Different types of backup are: Incremental backup An incremental backup copies only the data that has changed since the last backup of any type (full or incremental). This reduces data redundancy and improves storage efficiency. Differential backup Sometimes our data keeps getting modified. In that case we back up all the changes that have occurred since the last full backup; this is called a differential backup. Network backup Network backup refers to sending data over the network from the source to a backup destination in a client-server model. This backup destination can be centralized or decentralized. 
Decentralized backups are useful for disaster recovery scenarios. rsync is a Linux command that syncs files from one server to a destination server over the network. The syntax for rsync goes like rsync \\[options\\] followed by the source and the destination. We can locate the file at the path specified after the : (colon) in the \u201c destination\u201d . If nothing is specified, the default path is the home directory of the user used for the backup, /home/azureuser in this case. You can always look up the different options for rsync using the man rsync command. Cloud Backup There are various third parties that provide backup of data to the cloud. These cloud backups are much more reliable than backups stored on local machines or on any server without a RAID configuration, as these providers manage data redundancy and data recovery along with data security. Two of the most widely used cloud backup options are Azure backup (from Microsoft) and Amazon Glacier backup (from AWS).","title":"Archiving and Backup"},{"location":"level102/linux_intermediate/archiving_backup/#archiving-and-backup","text":"","title":"Archiving and Backup"},{"location":"level102/linux_intermediate/archiving_backup/#introduction","text":"One of the things SREs make sure of is that the services are up all the time (at least 99.99% of the time), but the amount of data generated at each server running those services is immense. This data could be logs, user data in the database, or any other kind of metadata. Hence we need to compress, archive, rotate, and back up the data in a timely manner for data safety and to make sure we don\u2019t run out of space.","title":"Introduction"},{"location":"level102/linux_intermediate/archiving_backup/#archiving","text":"We usually archive data that is no longer needed but is kept mostly for compliance purposes. This helps in storing the data in a compressed format, saving a lot of space. 
Below section is to familiarize with the archiving tools and commands.","title":"Archiving"},{"location":"level102/linux_intermediate/archiving_backup/#gzip","text":"gzip is a program used to compress one or more files, it replaces the original file with a compressed version of the original file. Here we can see that the messages log file is compressed to almost one-fifth of the original size and replaced with messages.gz. We can uncompress this file using gunzip command.","title":"gzip"},{"location":"level102/linux_intermediate/archiving_backup/#tar","text":"tar program is a tool for archiving files and directories into a single file (often called tarball). This tool is usually used to prepare archives of files before it is transferred to a long term backup server. tar doesn\u2019t replace the existing files and folders but creates a new file with extension .tar . It provides lot of flag to choose from for archiving Flags Description -c Creates archive -x Extracts the archive -f Creates archive with the given filename -t Displays or lists files in archived file -u Archives and adds to an existing archive file -v Displays verbose information -A Concatenates the archived file -z Compresses the tar file using gzip -j Compresses the tar file using bzip2 -W Verifies an archive file -r Updates or adds file or directory in already existing .tar file","title":"tar"},{"location":"level102/linux_intermediate/archiving_backup/#create-an-archive-with-files-and-folder","text":"Flag c is used for creating the archive where f is the filename.","title":"Create an archive with files and folder"},{"location":"level102/linux_intermediate/archiving_backup/#listing-files-in-the-archive","text":"We can use flag t for listing out what an archive contains.","title":"Listing files in the archive"},{"location":"level102/linux_intermediate/archiving_backup/#extract-files-from-the-archive","text":"We can use flag x to unarchive the archive.","title":"Extract files from the 
archive"},{"location":"level102/linux_intermediate/archiving_backup/#backup","text":"Backup is the process of copying/duplicating existing data. The backup can be used to restore the dataset in case of data loss. Data backup also becomes critical when the data is not needed in the day-to-day job but can be referred to as a source of truth and for compliance reasons in the future. Different types of backup are:","title":"Backup"},{"location":"level102/linux_intermediate/archiving_backup/#incremental-backup","text":"An incremental backup copies only the data that has changed since the last backup of any type (full or incremental). This reduces data redundancy and improves storage efficiency.","title":"Incremental backup"},{"location":"level102/linux_intermediate/archiving_backup/#differential-backup","text":"Sometimes our data keeps getting modified. In that case we back up all the changes that have occurred since the last full backup; this is called a differential backup.","title":"Differential backup"},{"location":"level102/linux_intermediate/archiving_backup/#network-backup","text":"Network backup refers to sending data over the network from the source to a backup destination in a client-server model. This backup destination can be centralized or decentralized. Decentralized backups are useful for disaster recovery scenarios. rsync is a Linux command that syncs files from one server to a destination server over the network. The syntax for rsync goes like rsync \\[options\\] followed by the source and the destination. We can locate the file at the path specified after the : (colon) in the \u201c destination\u201d . If nothing is specified, the default path is the home directory of the user used for the backup, /home/azureuser in this case. You can always look up the different options for rsync using the man rsync command.","title":"Network backup"},{"location":"level102/linux_intermediate/archiving_backup/#cloud-backup","text":"There are various third parties that provide backup of data to the cloud. 
These cloud backups are much more reliable than backups stored on local machines or on any server without a RAID configuration, as these providers manage data redundancy and data recovery along with data security. Two of the most widely used cloud backup options are Azure backup (from Microsoft) and Amazon Glacier backup (from AWS).","title":"Cloud Backup"},{"location":"level102/linux_intermediate/bashscripting/","text":"Bash Scripting Introduction As an SRE, the Linux system sits at the core of our day-to-day work, and so does bash scripting. It\u2019s a scripting language that is run by the Linux Bash interpreter. Until now we have covered a lot of features, mostly on the command line; now we will use this command line as an interpreter to write programs that will ease our day-to-day job as an SRE. Writing the first bash script: We will start with a simple program. We will use Vim as the editor during the whole journey. #!/bin/bash # This is my first bash script # Line starting with # is commented echo \"Hello world!\" The first line of the script, starting with \u201c#!\u201d, is called the shebang. It simply lets the system know which interpreter to use while executing the script. Any line starting with \u201c#\u201d (other than #!) is a comment and is ignored by the interpreter while executing the script. Line 6 shows the \u201cecho\u201d command that we will be running. We will save this script as \u201cfirstscript.sh\u201d and make the script executable using chmod . The next thing is to run the script with the explicit path. We can see the desired \u201cHello world!\u201d as output. Taking user input and working with variables: Taking standard input using the read command and working with variables in bash. 
#!/bin/bash #We will take standard input #Will list all files at the path #We will concatenate variable and string echo \"Enter the path\" read path echo \"How deep in directory you want to go:\" read depth echo \"All files at path $path\" du -d \"$depth\" -a -h \"$path\" We read the path into the variable \u201c path \u201d and the depth into the variable \u201c depth \u201d to list files and directories up to that depth. We concatenated strings with variables. We always use $ (dollar sign) before a variable name to reference the value it contains. We pass these variables to the du command to list all the files and directories in that path up to the desired depth. Exit status: Every command or script, when it finishes executing, returns an integer in the range 0 to 255 to the system; this is called the exit status. \u201c0\u201d denotes success, while a non-zero return code usually indicates some kind of error. We use the special shell variable $? to get the exit status of the last executed script or command. Command line arguments and understanding If \u2026 else branching: Another way to pass values to the script is to use command line arguments. Command line arguments in bash are accessed by $ followed by the index. $0 refers to the script itself, $1 to the first argument and so on. We use $# to check the count of arguments passed to the script. Making decisions is an integral part of any programming language, and to handle different conditions we use if \u2026 else statements or some nested variant of them. The script below combines multiple concepts. The aim of the script is to get some properties of a file. Lines 4 to 7 are a standard example of an \"if statement\" in bash. The syntax is as explained below: if [ condition ]; then if_block_to_execute else else_block_to_execute fi fi closes the if \u2026 else block. We check whether the count of arguments ($#) is equal to 1 or not. 
If not, we prompt for exactly one argument and exit the script with status code 1 (not a success). One or more if statements can exist without an else statement, but an else cannot exist without an if. The operator -ne is used to compare two integers, read as \u201cinteger1 not equal to integer2\u201d. Other comparison operators are: Operations Description num1 -eq num2 checks if 1st number is equal to 2nd number num1 -ge num2 checks if 1st number is greater than or equal to 2nd number num1 -gt num2 checks if 1st number is greater than 2nd number num1 -le num2 checks if 1st number is less than or equal to 2nd number num1 -lt num2 checks if 1st number is less than 2nd number #!/bin/bash # This script evaluates the status of a file if [ $# -ne 1 ]; then echo \"Please pass one file name as argument\" exit 1 fi FILE=$1 if [ -e \"$FILE\" ]; then if [ -f \"$FILE\" ]; then echo \"$FILE is a regular file.\" fi if [ -d \"$FILE\" ]; then echo \"$FILE is a directory.\" fi if [ -r \"$FILE\" ]; then echo \"$FILE is readable.\" fi if [ -w \"$FILE\" ]; then echo \"$FILE is writable.\" fi if [ -x \"$FILE\" ]; then echo \"$FILE is executable/searchable.\" fi else echo \"$FILE does not exist\" exit 2 fi exit 0 There are many file expressions for evaluating a file. For example, \u201c-e\u201d in line 10 of the script returns true if the file passed as an argument exists, false otherwise. Below are some widely used file expressions: File Operations Description -e file File exists -d file File exists and is directory -f file File exists and is regular file -L file File exists and is symbolic link -r file File exists and has readable permission -w file File exists and has writable permission -x file File exists and has executable permission -s file File exists and size is greater than zero -S file File exists and is a network socket. The exit status is 2 when the file is not found. If the file is found, the script prints out the properties it holds and exits with status 0 (success). Looping over to do a repeated task. 
We often come across tasks that are mostly repetitive; looping helps us code those repetitive tasks in a more formal manner. There are different types of loop statements we can use in bash: Loop Syntax while while [ expression ] do [ while_block_to_execute ] done for for variable in 1 2 3 ... n do [ for_block_to_execute ] done until until [ expression ] do [ until_block_to_execute ] done #!/bin/bash #Script to monitor the server hosts=`cat host_list` while true do for i in $hosts do h=\"$i\" ping -c 1 -q \"$h\" &>/dev/null if [ $? -eq 0 ] then echo `date` \"server $h alive\" else echo `date` \"server $h is dead\" fi done sleep 60 done Monitoring a server is an important part of being an SRE. The file \u201chost_list\u201d contains the list of hosts that we want to monitor. We used an infinite \u201cwhile\u201d loop that sleeps for 60 seconds after each pass. For each host in host_list we ping that host and use the exit status to check whether the ping was successful; if it was, we report the server as alive, otherwise as dead. The output of the script shows it running every minute, with the timestamp. Function Developers always try to build their applications/programs in a modular fashion so that they don\u2019t have to write the same code every time and everywhere to carry out similar tasks. Functions help us achieve this. We usually call functions with some arguments and expect a result based on those arguments. We will try to automate the backup process discussed in the earlier section using the script below, and also get familiar with some more concepts like string comparison, functions and the logical AND and OR operations. In the code below, \u201clog_backup\u201d is a function, which won\u2019t be executed until it is called. Line 37 will be executed first, where we check the number of arguments passed to the script. Bash supports logical operators such as AND, OR and NOT. Logical Operator Symbol AND && OR || NOT ! 
Passing the wrong arguments to the script \u201cbackup.sh\u201d will prompt for correct usage. We have to pass whether we want an incremental backup or a full backup, along with the path of the directory we want to back up. If we want an incremental backup, we pass an additional argument: a meta file that is used to store information about the previously backed up files (usually a meta file has the .snar extension). #!/bin/bash #Scripts to take incremental and full backup backup_dir=\"/mnt/backup/\" time_stamp=\"`date +%d-%m-%Y-%Hh-%Mm-%Ss`\" log_backup(){ if [ $# -lt 2 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path]\" exit 1; fi if [ \"$1\" == \"incremental\" ]; then if [ $# -ne 3 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path] [meta_file]\" exit 3; fi tar --create --listed-incremental=\"$3\" --verbose --verbose --file=\"${backup_dir}incremental-${time_stamp}.tar\" \"$2\" if [ $? -eq 0 ]; then echo \"Incremental backup successful at '${backup_dir}incremental-${time_stamp}.tar'\" else echo \"Incremental Backup Failure\" fi elif [ \"$1\" == \"full\" ];then tar cf \"${backup_dir}fullbackup-${time_stamp}.tar\" \"$2\" if [ $? -eq 0 ];then echo \"Full backup successful at '${backup_dir}fullbackup-${time_stamp}.tar'\" else echo \"Full Backup Failure\" fi else echo \"Unknown parameter passed\" echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 2; fi } if [ $# -lt 2 ] || [ $# -gt 3 ];then echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 1 elif [ $# -eq 2 ];then log_backup \"$1\" \"$2\" elif [ $# -eq 3 ];then log_backup \"$1\" \"$2\" \"$3\" fi exit 0 Passing all 3 arguments for an incremental backup will create the incremental backup at \u201c/mnt/backup/\u201d, with a timestamp appended to each archive name. The arguments passed to the function can be accessed via $ followed by the index. Inside a function, $0 still refers to the script itself, $1 to the function\u2019s first argument and so on. We use $# to check the count of arguments passed to the function. 
Once we pass the string \u201cincremental\u201d or \u201cfull\u201d, it gets compared inside the function and the matching block is executed. Below are some more operations that can be performed on strings. String Operations Description string1 == string2 Returns true if string1 equals string2 otherwise false. string1 != string2 Returns true if string1 does NOT equal string2 otherwise false. string1 =~ regex Returns true if string1 matches the extended regular expression (supported only inside [[ ]]). -z string Returns true if string length is zero otherwise false. -n string Returns true if string length is non-zero otherwise false.","title":"Bash Scripting"},{"location":"level102/linux_intermediate/bashscripting/#bash-scripting","text":"","title":"Bash Scripting"},{"location":"level102/linux_intermediate/bashscripting/#introduction","text":"As an SRE, the Linux system sits at the core of our day-to-day work, and so does bash scripting. It\u2019s a scripting language that is run by the Linux Bash interpreter. Until now we have covered a lot of features, mostly on the command line; now we will use this command line as an interpreter to write programs that will ease our day-to-day job as an SRE.","title":"Introduction"},{"location":"level102/linux_intermediate/bashscripting/#writing-the-first-bash-script","text":"We will start with a simple program. We will use Vim as the editor during the whole journey. #!/bin/bash # This is my first bash script # Line starting with # is commented echo \"Hello world!\" The first line of the script, starting with \u201c#!\u201d, is called the shebang. It simply lets the system know which interpreter to use while executing the script. Any line starting with \u201c#\u201d (other than #!) is a comment and is ignored by the interpreter while executing the script. Line 6 shows the \u201cecho\u201d command that we will be running. We will save this script as \u201cfirstscript.sh\u201d and make the script executable using chmod . 
The next thing is to run the script with the explicit path. We can see the desired \u201cHello world!\u201d as output.","title":"Writing the first bash script:"},{"location":"level102/linux_intermediate/bashscripting/#taking-user-input-and-working-with-variables","text":"Taking standard input using the read command and working with variables in bash. #!/bin/bash #We will take standard input #Will list all files at the path #We will concatenate variable and string echo \"Enter the path\" read path echo \"How deep in directory you want to go:\" read depth echo \"All files at path $path\" du -d \"$depth\" -a -h \"$path\" We read the path into the variable \u201c path \u201d and the depth into the variable \u201c depth \u201d to list files and directories up to that depth. We concatenated strings with variables. We always use $ (dollar sign) before a variable name to reference the value it contains. We pass these variables to the du command to list all the files and directories in that path up to the desired depth.","title":"Taking user input and working with variables:"},{"location":"level102/linux_intermediate/bashscripting/#exit-status","text":"Every command or script, when it finishes executing, returns an integer in the range 0 to 255 to the system; this is called the exit status. \u201c0\u201d denotes success, while a non-zero return code usually indicates some kind of error. We use the special shell variable $? to get the exit status of the last executed script or command.","title":"Exit status:"},{"location":"level102/linux_intermediate/bashscripting/#command-line-arguments-and-understanding-if-else-branching","text":"Another way to pass values to the script is to use command line arguments. Command line arguments in bash are accessed by $ followed by the index. $0 refers to the script itself, $1 to the first argument and so on. We use $# to check the count of arguments passed to the script. 
Making decisions in the programming language is it\u2019s integral part, and to tackle different conditions we use if \u2026 else statements or some more nested variant of it. The below script uses multiple concepts in one script. The aim of the script is to get some properties of the file. Line 4 to 7 is the standard example of \"if statement\" in bash. Syntax is as explained below: If [ condition ]; then If_block_to_execute else else_block_to_execute fi fi is to close the if \u2026 else block. We are comparing count of argument($#) if it is equal to 1 or not. If not we prompt for only one argument and exit the script with status code 1(not a success). One or more if statements can exist without else statement but vice versa doesn\u2019t make any sense. Operator -ne is used to compare two integers, read as \u201cinteger1 not equal to integer 2\u201d. Other comparison operators are: Operations Description num1 -eq num2 check if 1st number is equal to 2nd number num1 -ge num2 checks if 1st number is greater than or equal to 2nd number num1 -gt num2 checks if 1st number is greater than 2nd number num1 -le num2 checks if 1st number is less than or equal to 2nd number num1 -lt num2 checks if 1st number is less than 2nd number #!/bin/bash # This script evaluate the status of a file if [ $# -ne 1 ]; then echo \"Please pass one file name as argument\" exit 1 fi FILE=$1 if [ -e \"$FILE\" ]; then if [ -f \"$FILE\" ]; then echo \"$FILE is a regular file.\" fi if [ -d \"$FILE\" ]; then echo \"$FILE is a directory.\" fi if [ -r \"$FILE\" ]; then echo \"$FILE is readable.\" fi if [ -w \"$FILE\" ]; then echo \"$FILE is writable.\" fi if [ -x \"$FILE\" ]; then echo \"$FILE is executable/searchable.\" fi else echo \"$FILE does not exist\" exit 2 fi exit 0 There are lots of file expressions to evaluate file,like in bash script \u201c-e\u201d in line 10 returns true if the file passed as argument exist, false otherwise. 
Below are the some widely used file expressions: File Operations Description -e file File exists -d file File exists and is directory -f file File exists and is regular file -L file File exists and is symbolic link -r file File exists and has readable permission -w file File exists and has writable permission -x file File exists and has executable permission -s file File exists and size is greater than zero -S file File exists and is a network socket. Exit status is 2 when the file is not found. And if the file is found it prints out the properties it holds with exit status 0(success).","title":"Command line arguments and understanding If \u2026 else branching:"},{"location":"level102/linux_intermediate/bashscripting/#looping-over-to-do-a-repeated-task","text":"We usually come up with tasks that are mostly repetitive, looping helps us to code those repetitive tasks in a more formal manner. There are different types of loop statement we can use in bash: Loop Syntax while while [ expression ] do [ while_block_to_execute ] done for for variable in 1,2,3 .. n do [ for_block_to_execute ] done until until [ expression ] do [ until_block_to_execute ] done #!/bin/bash #Script to monitor the server hosts=`cat host_list` while true do for i in $hosts do h=\"$i\" ping -c 1 -q \"$h\" &>/dev/null if [ $? -eq 0 ] then echo `date` \"server $h alive\" else echo `date` \"server $h is dead\" fi done sleep 60 done Monitoring a server is an important part of being an SRE. The file \u201chost_list\u201d contains the list of host which we want to monitor. We used an infinite \u201cwhile\u201d loop that will sleep every 60seconds. And for each host in the host_list we want to ping that host and check if that ping was successful with its exit status, if it\u2019s successful we say server is live or it\u2019s dead. 
The output of the script shows it is running every minute with the timestamp.","title":"Looping over to do a repeated task."},{"location":"level102/linux_intermediate/bashscripting/#function","text":"Developers always try to build their applications/programs in a modular fashion so that they don\u2019t have to write the same code every time and everywhere to carry out similar tasks. Functions help us achieve this. We usually call functions with some arguments and expect a result based on those arguments. We will try to automate the backup process we discussed in the earlier section using the below script, and also get familiar with some more concepts like string comparison, functions and logical AND and OR operations. In the below code \u201clog_backup\u201d is a function which won\u2019t be executed until it is called. Line 37 will be executed first, where we check the number of arguments passed to the script. Bash provides logical operators such as AND, OR and NOT. Logical Operator Symbol AND && OR || NOT ! Passing the wrong argument to script \u201cbackup.sh\u201d will prompt for correct usage. We have to pass whether we want an incremental backup of the directory or a full backup, along with the path of the directory we want to back up. If we want the incremental backup we will pass an additional argument, a meta file which is used to store the information about previously backed up files (usually a metafile has the .snar extension). #!/bin/bash #Scripts to take incremental and full backup backup_dir=\"/mnt/backup/\" time_stamp=\"`date +%d-%m-%Y-%Hh-%Mm-%Ss`\" log_backup(){ if [ $# -lt 2 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path]\" exit 1; fi if [ $1 == \"incremental\" ]; then if [ $# -ne 3 ]; then echo \"Usage: ./backup.sh [backup_type] [log_path] [meta_file]\" exit 3; fi tar --create --listed-incremental=$3 --verbose --verbose --file=\"${backup_dir}incremental-${time_stamp}.tar\" $2 if [ $? 
-eq 0 ]; then echo \"Incremental backup successful at '${backup_dir}incremental-${time_stamp}.tar'\" else echo \"Incremental Backup Failure\" fi elif [ $1 == \"full\" ];then tar cf \"${backup_dir}fullbackup-${time_stamp}.tar\" $2 if [ $? -eq 0 ];then echo \"Full backup successful at '${backup_dir}fullbackup-${time_stamp}.tar'\" else echo \"Full Backup Failure\" fi else echo \"Unknown parameter passed\" echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 2; fi } if [ $# -lt 2 ] || [ $# -gt 3 ];then echo \"Usage: ./backup.sh [incremental|full] [log_path]\" exit 1 elif [ $# -eq 2 ];then log_backup $1 $2 elif [ $# -eq 3 ];then log_backup $1 $2 $3 fi exit 0 Passing all 3 arguments for incremental backup will take an incremental backup at \u201c/mnt/backup/\u201d, with a timestamp appended to the name of each archive. The arguments passed to the function can be accessed via $ followed by the index. Inside a function $0 still refers to the script name, $1 refers to the first argument and so on. We use $# to check the count of arguments passed to the function. Once we pass the string \u201cincremental\u201d or \u201cfull\u201d it gets compared inside the function and the matching block is executed. Below are some more operations that can be performed over strings. String Operations Description string1 == string2 Returns true if string1 equals string2, otherwise false. string1 != string2 Returns true if string1 is NOT equal to string2, otherwise false. string1 =~ regex Returns true if string1 matches the extended regular expression (works only inside [[ ]]). -z string Returns true if string length is zero otherwise false. 
-n string Returns true if string length is non-zero otherwise false.","title":"Function"},{"location":"level102/linux_intermediate/conclusion/","text":"Conclusion Understanding package management is very crucial as an SRE, we always want the right set of software with their compatible versions to work in harmony to drive the big infrastructure and organization. 
We also saw how we can configure and use storage drives and how we can have redundancy of data using RAID to avoid the data loss, how data is placed over disk and use of file systems. Archiving and Backup is also a crucial part of being an SRE, It\u2019s our responsibility to keep the data safe and in a more efficient manner. Bash is very useful to automate the day to day toil that an SRE stumbles into. The above walkthrough of bash gives us an idea to get started, but mere reading through it won\u2019t take you much further. I believe \u201ctaking action and practicing the topic\u201d would give you confidence and will help you become a better SRE.","title":"Conclusion"},{"location":"level102/linux_intermediate/conclusion/#conclusion","text":"Understanding package management is very crucial as an SRE, we always want the right set of software with their compatible versions to work in harmony to drive the big infrastructure and organization. We also saw how we can configure and use storage drives and how we can have redundancy of data using RAID to avoid the data loss, how data is placed over disk and use of file systems. Archiving and Backup is also a crucial part of being an SRE, It\u2019s our responsibility to keep the data safe and in a more efficient manner. Bash is very useful to automate the day to day toil that an SRE stumbles into. The above walkthrough of bash gives us an idea to get started, but mere reading through it won\u2019t take you much further. I believe \u201ctaking action and practicing the topic\u201d would give you confidence and will help you become a better SRE.","title":"Conclusion"},{"location":"level102/linux_intermediate/introduction/","text":"Linux-Intermediate Prerequisites Expect to have gone through the School Of SRE Linux Basics . What to expect from this course This course is divided into two sections. 
In the first section we will cover where we left off the Linux Basics, earlier in the School of SRE curriculum, we will deep dive into some of the more advanced linux commands and concepts. In this second section we will discuss how we use Bash scripting in day to day work, automation and toil reduction as an SRE with the help of real life examples of any SRE. What is not covered under this course This course aims to make you familiar with the intersection of Linux commands, shell scripting and how SRE uses it. We would not be covering Linux internals. Lab Environment Setup Install docker on your system. https://docs.docker.com/engine/install/ We would be using RedHat Enterprise Linux (RHEL) 8. We would be running most of the commands in the above docker container. __________________________________________________________________________ Course Content Package Management Package: Dependencies Repository High Level and Low-Level Package management tools Storage Media Listing the mounted storage devices Creating a FileSystem Mounting the device Unmounting the device Making it easier with /etc/fstab file? 
Checking and Repairing FS RAID RAID levels RAID 0 (Striping) RAID 1(Mirroring) RAID 5(Striping with distributed parity) RAID 6(Striping with double parity) RAID 10(RAID 1+0 : Mirroring and Striping) Commands to monitor RAID LVM Archiving and Backup Archiving gzip tar Create an archive with files and folder Listing files in the archive Extract files from the archive Backup Incremental backup Differential backup Network backup Cloud Backup Introduction to Vim Opening a file and using insert mode Saving a file Exiting the VIM editor Bash Scripting Writing the first bash script Taking user input and working with variables Exit status Command line arguments and understanding If \u2026 else branching Looping over to do a repeated task Function Conclusion","title":"Introduction"},{"location":"level102/linux_intermediate/introduction/#linux-intermediate","text":"","title":"Linux-Intermediate"},{"location":"level102/linux_intermediate/introduction/#prerequisites","text":"Expect to have gone through the School Of SRE Linux Basics .","title":"Prerequisites"},{"location":"level102/linux_intermediate/introduction/#what-to-expect-from-this-course","text":"This course is divided into two sections. In the first section we will cover where we left off the Linux Basics, earlier in the School of SRE curriculum, we will deep dive into some of the more advanced linux commands and concepts. In this second section we will discuss how we use Bash scripting in day to day work, automation and toil reduction as an SRE with the help of real life examples of any SRE.","title":"What to expect from this course"},{"location":"level102/linux_intermediate/introduction/#what-is-not-covered-under-this-course","text":"This course aims to make you familiar with the intersection of Linux commands, shell scripting and how SRE uses it. 
We would not be covering Linux internals.","title":"What is not covered under this course"},{"location":"level102/linux_intermediate/introduction/#lab-environment-setup","text":"Install Docker on your system. https://docs.docker.com/engine/install/ We would be using RedHat Enterprise Linux (RHEL) 8. We would be running most of the commands in the above docker container. __________________________________________________________________________","title":"Lab Environment Setup"},{"location":"level102/linux_intermediate/introduction/#course-content","text":"Package Management Package: Dependencies Repository High Level and Low-Level Package management tools Storage Media Listing the mounted storage devices Creating a FileSystem Mounting the device Unmounting the device Making it easier with /etc/fstab file? Checking and Repairing FS RAID RAID levels RAID 0 (Striping) RAID 1(Mirroring) RAID 5(Striping with distributed parity) RAID 6(Striping with double parity) RAID 10(RAID 1+0 : Mirroring and Striping) Commands to monitor RAID LVM Archiving and Backup Archiving gzip tar Create an archive with files and folder Listing files in the archive Extract files from the archive Backup Incremental backup Differential backup Network backup Cloud Backup Introduction to Vim Opening a file and using insert mode Saving a file Exiting the VIM editor Bash Scripting Writing the first bash script Taking user input and working with variables Exit status Command line arguments and understanding If \u2026 else branching Looping over to do a repeated task Function Conclusion","title":"Course Content"},{"location":"level102/linux_intermediate/introvim/","text":"Introduction to Vim Introduction As SREs we frequently log into servers to change config files and to edit and modify scripts, and the editor that comes in handy and is available in almost all Linux distributions is Vim. Vim is an open-source and free command line editor, widely accepted and used. 
We will see some basics of how to use vim for creating and editing files. This knowledge will help us in understanding the next section, Scripting. Opening a file and using insert mode We use the command vim filename to open a file filename . The terminal will open the editor, but if you start typing right away, nothing is inserted. This is because we are not in \"INSERT\" mode in vim. Press i to get into insert mode and start writing. You will see \u201cINSERT\u201d on the bottom left after pressing \u201c i \u201d . You can use the \u201c ESC \u201d key to get back to normal mode. Saving a file After you insert your text in INSERT mode, press the ESC (escape) key on your keyboard to get out of it. Press : (colon, i.e. Shift + ;), then press w and hit Enter; the text you entered will be written to the file. Exiting the VIM editor Exiting vim can be really challenging for beginners. There are various ways to exit Vim, such as exiting without saving the work or exiting after saving the work. Try the below commands after exiting insert mode and pressing : (colon). Vim Commands Description :q Exit the file but won\u2019t exit if file has unsaved changes :wq Write(save) and exit the file. :q! Exit without saving the changes. These are the basics we will need for bash scripting in the next section. You can always visit tutorial for learning more. For quick practice of vim commands visit: https://www.openvim.com/","title":"Introduction to Vim"},{"location":"level102/linux_intermediate/introvim/#introduction-to-vim","text":"","title":"Introduction to Vim"},{"location":"level102/linux_intermediate/introvim/#introduction","text":"As SREs we frequently log into servers to change config files and to edit and modify scripts, and the editor that comes in handy and is available in almost all Linux distributions is Vim. Vim is an open-source and free command line editor, widely accepted and used. We will see some basics of how to use vim for creating and editing files. 
This knowledge will help us in understanding the next section, Scripting.","title":"Introduction"},{"location":"level102/linux_intermediate/introvim/#opening-a-file-and-using-insert-mode","text":"We use the command vim filename to open a file filename . The terminal will open the editor, but if you start typing right away, nothing is inserted. This is because we are not in \"INSERT\" mode in vim. Press i to get into insert mode and start writing. You will see \u201cINSERT\u201d on the bottom left after pressing \u201c i \u201d . You can use the \u201c ESC \u201d key to get back to normal mode.","title":"Opening a file and using insert mode"},{"location":"level102/linux_intermediate/introvim/#saving-a-file","text":"After you insert your text in INSERT mode, press the ESC (escape) key on your keyboard to get out of it. Press : (colon, i.e. Shift + ;), then press w and hit Enter; the text you entered will be written to the file.","title":"Saving a file"},{"location":"level102/linux_intermediate/introvim/#exiting-the-vim-editor","text":"Exiting vim can be really challenging for beginners. There are various ways to exit Vim, such as exiting without saving the work or exiting after saving the work. Try the below commands after exiting insert mode and pressing : (colon). Vim Commands Description :q Exit the file but won\u2019t exit if file has unsaved changes :wq Write(save) and exit the file. :q! Exit without saving the changes. These are the basics we will need for bash scripting in the next section. You can always visit tutorial for learning more. For quick practice of vim commands visit: https://www.openvim.com/","title":"Exiting the VIM editor"},{"location":"level102/linux_intermediate/package_management/","text":"Package Management Introduction One of the main features of any operating system is the ability to run other programs and software, and this is where package management comes into the picture. 
Package management is a method of installing and maintaining software programs on any operating system. 
Package In the early days of Linux, one had to download source code of any software and compile it to install and run the software. As the Linux space became more mature, it is understood the software landscape is very dynamic and started distributing software in the form of packages. Package file is a compressed collection of files that contains software, its dependencies, installation instructions and metadata about the package. Dependencies It is rare that a software package is stand-alone, it depends on the different software, libraries and modules. These subroutines are stored and made available in the form of shared libraries which may serve more than one program. These shared resources are called dependencies. Package management does this hard job of resolving dependencies and installing them for the user along with the software. Repository Repository is a storage location where all the packages, updates, dependencies are stored. Each repository can contain thousands of software packages hosted on a remote server intended to be installed and updated on linux systems. We usually update the package information ( often referred to as metadata ) by running \u201c sudo dnf update\u201d. Try out sudo dnf repolist all to list all the repositories. We usually add repositories for installing packages from third party vendors. dnf config-manager --add-repo http://www.example.com/example.repo High Level and Low-Level Package management tools There are mainly two types of packages management tools: 1. Low-level tools : This is mostly used for installing, removing and upgrading package files. 2. High-Level tools : In addition to Low-level tools, High-level tools do metadata searching and dependency resolution as well. 
Linux Distribution Low-Level Tools High-Level tools Debian dpkg apt-get Fedora, RedHat dnf dnf","title":"Package Management"},{"location":"level102/linux_intermediate/package_management/#package-management","text":"","title":"Package Management"},{"location":"level102/linux_intermediate/package_management/#introduction","text":"One of the main features of any operating system is the ability to run other programs and softwares, and hence Package management comes into picture. Package management is a method of installing and maintaining software programs on any operating system.","title":"Introduction"},{"location":"level102/linux_intermediate/package_management/#package","text":"In the early days of Linux, one had to download source code of any software and compile it to install and run the software. As the Linux space became more mature, it is understood the software landscape is very dynamic and started distributing software in the form of packages. Package file is a compressed collection of files that contains software, its dependencies, installation instructions and metadata about the package.","title":"Package"},{"location":"level102/linux_intermediate/package_management/#dependencies","text":"It is rare that a software package is stand-alone, it depends on the different software, libraries and modules. These subroutines are stored and made available in the form of shared libraries which may serve more than one program. These shared resources are called dependencies. Package management does this hard job of resolving dependencies and installing them for the user along with the software.","title":"Dependencies"},{"location":"level102/linux_intermediate/package_management/#repository","text":"Repository is a storage location where all the packages, updates, dependencies are stored. Each repository can contain thousands of software packages hosted on a remote server intended to be installed and updated on linux systems. 
We usually update the package information ( often referred to as metadata ) by running \u201c sudo dnf update\u201d. Try out sudo dnf repolist all to list all the repositories. We usually add repositories for installing packages from third party vendors. dnf config-manager --add-repo http://www.example.com/example.repo","title":"Repository"},{"location":"level102/linux_intermediate/package_management/#high-level-and-low-level-package-management-tools","text":"There are mainly two types of packages management tools: 1. Low-level tools : This is mostly used for installing, removing and upgrading package files. 2. High-Level tools : In addition to Low-level tools, High-level tools do metadata searching and dependency resolution as well. Linux Distribution Low-Level Tools High-Level tools Debian dpkg apt-get Fedora, RedHat dnf dnf","title":"High Level and Low-Level Package management tools"},{"location":"level102/linux_intermediate/storage_media/","text":"Storage Media Introduction Storage media are devices which are used to store data and information. Linux has amazing capabilities when it comes to handling external devices including storage devices. There are many kinds of storage devices physical storage devices like hard drives, virtual storage devices like RAID or LVM, network storage and so on. In this section we will learn to work with any storage device and configure it to our needs. Listing the mounted storage devices: We can use command mount to list all the storage devices mounted to your computer. The format in which we see above output is: device on mount_point type file\\_system\\_type (options) For example in the first line the device virtual sysfs is mounted at /sys path and has a sysfs file system. Now let\u2019s see what and how a filesystem is created. 
Creating a FileSystem Imagine a disk where all the data is stored as one large chunk: there is nothing to tell where one piece of data starts and ends, or where in the chunk a given piece of data is located. This is where the File System comes into the picture. File System(fs) is responsible for data storage, indexing and retrieval on any storage device. Below are the most popularly used file systems: FS Type Description FAT File Allocation Table, initially used on DOS and Microsoft Windows and now widely used for portable USB storage NTFS (New Technology File System) Used on Microsoft\u2019s Windows based operating systems ext Extended file system, designed for Linux systems. ext4 Fourth extended filesystem, is a journaled file system that is commonly used by the Linux kernel. HFS Hierarchical File System, in use until HFS+ was introduced on Mac OS 8.1. HFS+ Supports file system journaling, enabling recovery of data after a system crash. NFS Network File System originally from Sun Microsystems is the standard in UNIX-based networks. We will try to create an ext4 file system, which is a Linux-native fs, using mkfs . Disclaimer: Run this command on an empty disk as this will wipe out the existing data. Here the device /dev/sdb1 is formatted and its filesystem is changed to ext4 . Mounting the device: In Linux systems all files are arranged in a tree structure with (/) as root. Mounting a fs simply means making that fs accessible at a certain point in the Linux directory tree. We need a mount point (location) where we want to mount the above formatted device. We created a mount point /mount and used the mount command to attach the filesystem. Here the -t flag specifies the fs type, followed by /dev/sdb1 (device name) and /mount (the mount point we created earlier). 
Unmounting the device: Now let\u2019s see how we can unmount the device, which is equally important if we have removable storage media and want to mount on another host. We use umount for unmounting the device. Our first attempt did not unmount the /sdb1 because we were inside the storage device and it was being used. Once we jumped back to the home directory we were successfully able to unmount the device. Making it easier with /etc/fstab file? In our production environment, we can have servers with many storage devices that need to be mounted, and it is not feasible to mount each device using the command every time we reboot the system. To ease this burden, we can make use of configuration table called \u201cfstab\u201d usually found in /etc/fstab on Linux systems. Here on the first line we have /dev/mapper/rootvg-rootlv (storage device ) mounted on / (root mount point) which has the xfs filesystem type followed by options. We can run mount -a to reload this file after making changes. Checking and Repairing FS Filesystems encounter issues in case of any hardware failure, power failure and sometimes due to improper shutdown. Linux usually checks and repairs the corrupted disk if any during startup. We can also manually check for filesystem corruption using the command fsck . We can repair the same filesystem using fsck -y /dev/sdb1 . There are error codes attached to each kind of file system error ,and A sum of active errors is returned. Error Codes Description 0 No errors 1 Filesystem errors corrected 2 System should be rebooted 4 Filesystem errors left uncorrected 8 Operational error 16 Usage or syntax error 32 Checking canceled by user request 128 Shared-library error In the above fs check we got return code as 12 which is the sum of error code 8(operational error) and 4(uncorrected FS error). 
RAID RAID or \u201cRedundant Arrays of Independent Disks\u201d is a technique that distributes I/O across multiple disks to achieve increased performance and data redundancy. RAID has the ability to increase overall disk performance and survive disk failures. Software RAID uses the computer\u2019s CPU to carry out RAID operations whereas hardware RAID uses specialized processors, on disk controllers, to manage the disks. Three essential features of RAID are mirroring, striping and parity. RAID levels The below section discusses the RAID levels that are commonly used. For information on all RAID levels, please refer to here . RAID 0 (Striping) Striping is the method by which data is split up into \u201cblocks\u201d and written across all the disks present in the array. Because the data is spread across multiple drives, several disks can serve the same file at once, resulting in faster read/write speeds. The first disk in the array is not reused until an equal amount of data is written to each of the other disks in the array. Advantages It can be easily implemented. Bottlenecks caused due to I/O operations from the same disk are avoided, increasing the performance of such operations. Disadvantages It does not offer any kind of redundancy. If any one of the disks fails, then the data of the entire array is lost and cannot be recovered. Use cases RAID 0 can be used for systems with non-critical data that has to be read at high speed, such as a video/audio editing station or gaming environments. RAID 1(Mirroring) Mirroring writes a copy of data to each disk which is part of the array. This means that the data is written as many times as there are disks in the array . It stores an exact replica of all data on a separate disk or disks. As expected, this results in slower write performance compared to that of a single disk. On the other hand, read operations can be done in parallel, improving read performance. 
Advantages RAID 1 offers a better read performance than a single disk. 
It can survive disk failures without the need for special data recovery algorithms, as long as at least one mirror remains intact. Disadvantages It is costly: in a two-disk mirror the effective storage capacity is only half of the total disk capacity, because the data is replicated. Use cases Applications that require low downtime but can have a slight hit on write performance. RAID 4(Striping with dedicated parity) RAID 4 uses block-level striping (data can be striped in blocks of a variety of sizes depending on the applications and data to be stored) and a dedicated drive used to store parity information. The parity information is generated by an algorithm every time data is written to an array disk. The use of a parity bit is a way of adding checksums into data that can enable the target device to determine whether the data has been received correctly. In the event of a drive failure, the algorithm can be reversed and the missing data can be regenerated from the remaining data and the parity information. Advantages Each drive in a RAID 4 array operates independently so I/O requests take place in parallel, speeding up performance over previous RAID levels. It can survive a single disk failure by rebuilding the missing data from parity. Disadvantages A minimum of 3 disks is required for setup. It needs hardware support for parity calculation. Write speeds are slow because every write must also update the parity blocks on the single dedicated parity drive. Use cases Operations dealing with really large files \u2013 when sequential read and write data process is used RAID 5(Striping with distributed parity) RAID 5 is similar to RAID 4, except that the parity information is spread across all drives in the array. This helps reduce the bottleneck inherent in writing parity information to a single drive during each write operation. RAID 5 is the most common secure RAID level. 
Advantages Read data transactions are fast, while write data transactions are somewhat slower due to the calculation of parity. Data remains accessible even after a drive failure and during replacement of the failed hard drive, because the storage controller rebuilds the data on the new drive. Disadvantages RAID 5 requires a minimum of 3 drives and can work up to a maximum of 16 drives. It needs hardware support for parity calculation. More than one drive failure causes data loss. Use cases File storage and application servers, such as email, general storage servers, etc. RAID 6(Striping with double parity) RAID 6 is similar to RAID 5 with an added advantage of double distributed parity, which provides fault tolerance up to two failed drives. Advantages Read data transactions are fast. This provides a fault tolerance up to 2 failed drives. RAID 6 is more resilient than RAID 5. Disadvantages Write data transactions are slow due to double parity. Rebuilding the RAID array takes a longer time because of complex structure. Use cases Office automation, online customer service, and applications that require very high availability. RAID 10(RAID 1+0 : Mirroring and Striping) RAID 10 is a combination of RAID 0 and RAID 1. It combines both mirroring and striping in one single RAID array. Advantages Rebuilding the RAID array is fast. Read and write performance is good. Disadvantages Just like RAID 1, only half the drive capacity is available. It can be expensive to implement RAID 10. Use cases Transactional databases with sensitive information that require high performance and high data security. Commands to monitor RAID The command cat /proc/mdstat will give the status of a software RAID. 
Let us examine the output of the command: Personalities : [raid1] md0 : active raid1 sdb1[2] sda1[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md1 : active raid1 sdb2[2] sda2[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md2 : active raid1 sdb3[2] 41909248 blocks super 1.1 [2/1] [_U] bitmap: 1/1 pages [4KB], 65536KB chunk The \u201cPersonalities\u201d line lists the RAID levels that the kernel currently supports; the level of each array is shown on its own line. In the above example, every array is configured with RAID 1. md0 : active raid1 sdb1[2] sda1[0] tells us that there is an active RAID 1 array between sdb1 (which is device 2) and sda1 (which is device 0). An inactive array generally means that one of the disks is faulty. md2 in the above example shows 41909248 blocks super 1.1 [2/1] [_U], which means that one disk is down in this particular array. The command mdadm --detail /dev/ gives detailed information about that particular array. sudo mdadm --detail /dev/md0 /dev/md0: Version : 1.1 Creation Time : Fri Nov 17 11:49:20 2019 Raid Level : raid1 Array Size : 10476544 (9.99 GiB 10.32 GB) Used Dev Size : 10476544 (9.99 GiB 10.32 GB) Raid Devices : 2 Total Devices : 2 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Sun Dec 2 01:00:53 2019 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : xxxxxxx:yyyyyy:zzzzzz:ffffff Events : 987 Number Major Minor RaidDevice State 0 8 1 0 active sync /dev/sda1 1 8 49 1 active sync /dev/sdb1 In case of a missing disk in the above example, the State of the raid would be \u2018dirty\u2019 and Active Devices and Working Devices would be reduced to one. One of the entries (either /dev/sda1 or /dev/sdb1 depending on the missing disk) would have its RaidDevice changed to faulty. LVM LVM stands for Logical Volume Management. 
In the above section we saw how we can create FS and use individual disks according to our need the traditional way but using LVM we can achieve more flexibility in storage allocation like we can stitch three 2TB disks to make one single partition of 6TB, or we can attach another physical disk of 4TB to the server and add that disk to the logical volume group to make it 10TB in total. Refer to know more about LVM: https://www.redhat.com/sysadmin/lvm-vs-partitioning","title":"Storage Media"},{"location":"level102/linux_intermediate/storage_media/#storage-media","text":"","title":"Storage Media"},{"location":"level102/linux_intermediate/storage_media/#introduction","text":"Storage media are devices which are used to store data and information. Linux has amazing capabilities when it comes to handling external devices including storage devices. There are many kinds of storage devices physical storage devices like hard drives, virtual storage devices like RAID or LVM, network storage and so on. In this section we will learn to work with any storage device and configure it to our needs.","title":"Introduction"},{"location":"level102/linux_intermediate/storage_media/#listing-the-mounted-storage-devices","text":"We can use command mount to list all the storage devices mounted to your computer. The format in which we see above output is: device on mount_point type file\\_system\\_type (options) For example in the first line the device virtual sysfs is mounted at /sys path and has a sysfs file system. Now let\u2019s see what and how a filesystem is created.","title":"Listing the mounted storage devices:"},{"location":"level102/linux_intermediate/storage_media/#creating-a-filesystem","text":"Imagine a disk where all the data stored in the disk is in the form of one large chunk, there is nothing to figure out where one piece of data starts and ends, which piece of data is located at which place of the whole chunk of data and hence the File System comes into picture. 
File System(fs) is responsible for data storage, indexing and retrieval on any storage device. Below are the most popularly used file systems: FS Type Description FAT File Allocation Table, initially used on DOS and Microsoft Windows and now widely used for portable USB storage NTFS (New Technology File System) Used on Microsoft\u2019s Windows based operating systems ext Extended file system, designed for Linux systems. ext4 Fourth extended filesystem, is a journaled file system that is commonly used by the Linux kernel. HFS Hierarchical File System, in use until HFS+ was introduced on Mac OS 8.1. HFS+ Supports file system journaling, enabling recovery of data after a system crash. NFS Network File System originally from Sun Microsystems is the standard in UNIX-based networks. We will try to create an ext4 file system, which is a Linux-native fs, using mkfs. Disclaimer: Run this command only on an empty disk, as it will wipe out any existing data. Here the device /dev/sdb1 is formatted and its filesystem is changed to ext4 .","title":"Creating a FileSystem"},{"location":"level102/linux_intermediate/storage_media/#mounting-the-device","text":"In Linux systems all files are arranged in a tree structure with (/) as root. Mounting a fs simply means making that fs accessible at a certain point in the Linux directory tree. We need a mount point (location) where we want to mount the above formatted device. We created a mount point /mount and used the mount command to attach the filesystem. Here the -t flag specifies the fs type, followed by /dev/sdb1 (device name) and /mount (the mount point we created earlier).","title":"Mounting the device:"},{"location":"level102/linux_intermediate/storage_media/#unmounting-the-device","text":"Now let\u2019s see how we can unmount the device, which is equally important if we have removable storage media and want to mount it on another host. We use umount for unmounting the device. 
Our first attempt did not unmount /dev/sdb1 because we were inside the storage device and it was being used. Once we jumped back to the home directory we were successfully able to unmount the device.","title":"Unmounting the device:"},{"location":"level102/linux_intermediate/storage_media/#making-it-easier-with-etcfstab-file","text":"In our production environment, we can have servers with many storage devices that need to be mounted, and it is not feasible to mount each device using the command every time we reboot the system. To ease this burden, we can make use of a configuration table called \u201cfstab\u201d, usually found in /etc/fstab on Linux systems. Here on the first line we have /dev/mapper/rootvg-rootlv (storage device) mounted on / (root mount point) which has the xfs filesystem type followed by options. We can run mount -a to reload this file after making changes.","title":"Making it easier with /etc/fstab file?"},{"location":"level102/linux_intermediate/storage_media/#checking-and-repairing-fs","text":"Filesystems encounter issues in case of any hardware failure, power failure and sometimes due to improper shutdown. Linux usually checks and repairs the corrupted disk, if any, during startup. We can also manually check for filesystem corruption using the command fsck . We can repair the same filesystem using fsck -y /dev/sdb1 . An exit code is associated with each kind of filesystem error, and the sum of the codes for all active errors is returned. 
Error Codes Description 0 No errors 1 Filesystem errors corrected 2 System should be rebooted 4 Filesystem errors left uncorrected 8 Operational error 16 Usage or syntax error 32 Checking canceled by user request 128 Shared-library error In the above fs check we got a return code of 12, which is the sum of error code 8 (operational error) and 4 (uncorrected FS error).","title":"Checking and Repairing FS"},{"location":"level102/linux_intermediate/storage_media/#raid","text":"RAID or \u201cRedundant Array of Independent Disks\u201d is a technique that distributes I/O across multiple disks to achieve increased performance and data redundancy. RAID has the ability to increase overall disk performance and survive disk failures. Software RAID uses the computer\u2019s CPU to carry out RAID operations whereas hardware RAID uses specialized processors, on disk controllers, to manage the disks. Three essential features of RAID are mirroring, striping and parity.","title":"RAID"},{"location":"level102/linux_intermediate/storage_media/#raid-levels","text":"The below section discusses the RAID levels that are commonly used. For information on all RAID levels, please refer to here .","title":"RAID levels"},{"location":"level102/linux_intermediate/storage_media/#raid-0-striping","text":"Striping is the method by which data is split up into \u201cblocks\u201d and written across all the disks present in the array. Spreading data across multiple drives means that multiple disks can serve a file at once, resulting in faster read/write speeds. The first disk in the array is not reused until an equal amount of data is written to each of the other disks in the array. Advantages It can be easily implemented. Bottlenecks caused by I/O operations hitting the same disk are avoided, increasing the performance of such operations. Disadvantages It does not offer any kind of redundancy. If any one of the disks fails, then the data of the entire array is lost and cannot be recovered. 
Use cases RAID 0 can be used for systems with non-critical data that has to be read at high speed, such as a video/audio editing station or gaming environments.","title":"RAID 0 (Striping)"},{"location":"level102/linux_intermediate/storage_media/#raid-1mirroring","text":"Mirroring writes a copy of data to each disk which is part of the array. This means that the data is written as many times as there are disks in the array. It stores an exact replica of all data on a separate disk or disks. As expected, this would result in a slow write performance compared to that of a single disk. On the other hand, read operations can be done in parallel, improving read performance. Advantages RAID 1 offers a better read performance than RAID 0 or single disk. It can survive multiple disk failures without the need for special data recovery algorithms. Disadvantages It is costly since the effective storage capacity is only half of the total disk capacity due to replication of data. Use cases Applications that require low downtime but can have a slight hit on write performance.","title":"RAID 1(Mirroring)"},{"location":"level102/linux_intermediate/storage_media/#raid-4striping-with-dedicated-parity","text":"RAID 4 uses block-level striping (data can be striped in blocks of a variety of sizes depending on the applications and data to be stored) and a dedicated drive to store parity information. The parity information is generated by an algorithm every time data is written to an array disk. The use of a parity bit is a way of adding checksums into data that can enable the target device to determine whether the data has been received correctly. In the event of a drive failure, the algorithm can be reversed and missing data can be generated based on the remaining data and parity information. Advantages Each drive in a RAID 4 array operates independently so I/O requests take place in parallel, speeding up performance over previous RAID levels. 
It can survive a single disk failure, since the missing data can be rebuilt from the remaining data and parity information. Disadvantages A minimum of 3 disks is required for setup. It needs hardware support for parity calculation. Write speeds are slow since all parity is stored on a single drive, whose parity blocks must be updated on every write. Use cases Operations dealing with really large files \u2013 where data is read and written sequentially.","title":"RAID 4(Striping with dedicated parity)"},{"location":"level102/linux_intermediate/storage_media/#raid-5striping-with-distributed-parity","text":"RAID 5 is similar to RAID 4, except that the parity information is spread across all drives in the array. This helps reduce the bottleneck inherent in writing parity information to a single drive during each write operation. RAID 5 is the most common secure RAID level. Advantages Reads are fast, while writes are somewhat slower due to the calculation of parity. Data remains accessible even after drive failure and during replacement of a failed hard drive because the storage controller rebuilds the data on the new drive. Disadvantages RAID 5 requires a minimum of 3 drives and can work up to a maximum of 16 drives. It needs hardware support for parity calculation. Two or more drive failures cause data loss. Use cases File storage and application servers, such as email, general storage servers, etc.","title":"RAID 5(Striping with distributed parity)"},{"location":"level102/linux_intermediate/storage_media/#raid-6striping-with-double-parity","text":"RAID 6 is similar to RAID 5 with an added advantage of double distributed parity, which provides fault tolerance up to two failed drives. Advantages Read data transactions are fast. It tolerates up to 2 failed drives. RAID 6 is more resilient than RAID 5. Disadvantages Write data transactions are slow due to double parity. 
Rebuilding the RAID array takes a longer time because of its complex structure. Use cases Office automation, online customer service, and applications that require very high availability.","title":"RAID 6(Striping with double parity)"},{"location":"level102/linux_intermediate/storage_media/#raid-10raid-10-mirroring-and-striping","text":"RAID 10 is a combination of RAID 0 and RAID 1. It combines both mirroring and striping in a single RAID array. Advantages Rebuilding the RAID array is fast. Read and write performance is good. Disadvantages Just like RAID 1, only half the drive capacity is available. It can be expensive to implement RAID 10. Use cases Transactional databases with sensitive information that require high performance and high data security.","title":"RAID 10(RAID 1+0 : Mirroring and Striping)"},{"location":"level102/linux_intermediate/storage_media/#commands-to-monitor-raid","text":"The command cat /proc/mdstat will give the status of a software RAID. Let us examine the output of the command: Personalities : [raid1] md0 : active raid1 sdb1[2] sda1[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md1 : active raid1 sdb2[2] sda2[0] 10476544 blocks super 1.1 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md2 : active raid1 sdb3[2] 41909248 blocks super 1.1 [2/1] [_U] bitmap: 1/1 pages [4KB], 65536KB chunk The \u201cPersonalities\u201d line lists the RAID levels that the kernel currently supports; the level of each array is shown on its own line. In the above example, every array is configured with RAID 1. md0 : active raid1 sdb1[2] sda1[0] tells us that there is an active RAID 1 array between sdb1 (which is device 2) and sda1 (which is device 0). An inactive array generally means that one of the disks is faulty. md2 in the above example shows 41909248 blocks super 1.1 [2/1] [_U], which means that one disk is down in this particular array. The command mdadm --detail /dev/ gives detailed information about that particular array. 
sudo mdadm --detail /dev/md0 /dev/md0: Version : 1.1 Creation Time : Fri Nov 17 11:49:20 2019 Raid Level : raid1 Array Size : 10476544 (9.99 GiB 10.32 GB) Used Dev Size : 10476544 (9.99 GiB 10.32 GB) Raid Devices : 2 Total Devices : 2 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Sun Dec 2 01:00:53 2019 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : xxxxxxx:yyyyyy:zzzzzz:ffffff Events : 987 Number Major Minor RaidDevice State 0 8 1 0 active sync /dev/sda1 1 8 49 1 active sync /dev/sdb1 In case of a missing disk in the above example, the State of the raid would be \u2018dirty\u2019 and Active Devices and Working Devices would be reduced to one. One of the entries (either /dev/sda1 or /dev/sdb1 depending on the missing disk) would have its RaidDevice changed to faulty.","title":"Commands to monitor RAID"},{"location":"level102/linux_intermediate/storage_media/#lvm","text":"LVM stands for Logical Volume Management. In the above section we saw how we can create FS and use individual disks according to our need the traditional way but using LVM we can achieve more flexibility in storage allocation like we can stitch three 2TB disks to make one single partition of 6TB, or we can attach another physical disk of 4TB to the server and add that disk to the logical volume group to make it 10TB in total. Refer to know more about LVM: https://www.redhat.com/sysadmin/lvm-vs-partitioning","title":"LVM"},{"location":"level102/networking/conclusion/","text":"This course should have given some background on deploying services in a datacentre, the various parameters to consider and the available solutions. Note that each of the solutions discussed here has its own pros and cons, so the right fit has to be identified for the specific scenario/requirement. 
As we didn't go into the depth of the various technologies/solutions in this course, it might have made the reader curious to know more about some of the topics. Here is some reference and online training content for further learning. LinkedIn engineering blog : has information about how LinkedIn datacentres are set up and how some of the key problems are solved. IPSpace blog : has a lot of articles about datacentre networking. Networking Basics course on edX. Happy learning !!","title":"Conclusion"},{"location":"level102/networking/infrastructure-features/","text":"Some of the aspects to consider are, whether the underlying data centre infrastructure supports ToR resiliency, i.e. features like link bundling (bonds), BGP, support for anycast service, load balancer, firewall, Quality of Service. As seen in previous sections, to deploy applications at scale, it will need certain capabilities to be supported from the infrastructure. This section will cover different options available, and their suitability. ToR connectivity This being one of the most frequent points of failure (considering the scale of deployment), there are different options available to connect the servers to the ToR. We are going to see them in detail below, Single ToR This is the simplest of all the options. Where a NIC of the server is connected to one ToR. The advantage of this approach is, there is a minimal number of switch ports used, allowing the DC fabric to support the rapid growth of server infrastructure (Note: Not only the ToR ports are used efficiently, but the upper switching layer in DC fabric as well, the port usage will be efficient). On the downside, the servers can be unreachable if there is an issue with the ToR, link or NIC. This will impact the stateful apps more, as the existing connections get abruptly disconnected. Fig 4: Single ToR design Dual ToR In this option, each server is connected to two ToR, of the same cabinet. 
This can be set up in active/passive mode, thereby providing resiliency during ToR/link/NIC failures. The resiliency can be achieved either in layer 2 or in layer 3. Layer 2 In this case, both the links are bundled together as a bond on the server side (with one NIC taking the active role and the other being passive). On the switch side, these two links are made part of multi-chassis lag (similar to bonding, but spread across switches). The prerequisite here is, both the ToR should be part of the same layer 2 domain. The IP addresses are configured on the bond interface on the server and SVI on the switch side. Note: In this, the ToR 2 role is only to provide resiliency. Fig 5: Dual ToR layer 2 setup Layer 3 In this case, both the links are configured as separate layer 3 interfaces. The resiliency is achieved by setting up a routing protocol (like BGP). Wherein one link is given higher preference over the other. In this case, the two ToR's can be set up independently, in layer 3 mode. The servers would need a virtual address, to which the services have to be bound. Note: In this, the ToR 2 role is only to provide resiliency. Fig 6: Dual ToR layer 3 setup Though the resiliency is better with dual ToR, the drawback is, the number of ports being used. As the access port in the ToR doubles up, the number of ports required in the Spine layer also doubles up, and this keeps cascading to higher layers. Type Single ToR Dual ToR (layer 2) Dual ToR (layer 3) Resiliency 1 No 2 Yes Yes Port usage 1:1 1:2 1:2 Cabling Less More More Cost of DC fabric Low High High ToR features required Low High Medium 1 Resiliency in terms of ToR/Link/NIC 2 As an alternative, resiliency can be addressed at the application layer. Along with the above-mentioned ones, an application might need more capabilities out of the infrastructure to deploy at scale. 
Some of them are, Anycast As seen in the previous section, of deploying at scale, anycast is one of the means to have services distributed across cabinets and still have traffic flowing to each one of the servers. To achieve this, two things are required Routing protocol between ToR and server (to announce the anycast address) Support for ECMP (Equal Cost Multi-Path) load balancing in the infrastructure, to distribute the flows across the cabinets. Load balancing Similar to Anycast, another means to achieve load balancing across servers (host a particular app), is using load balancers. These could be implemented in different ways Hardware load balancers: A LB device is placed inline of the traffic flow, and looks at the layer 3 and layer 4 information in an incoming packet. Then determine the set of real hosts, to which the connections are to be redirected. As covered in the Scale topic, these load balancers can be set up in two ways, Single-arm mode: In this mode, the load balancer handles only the incoming requests to the VIP. The response from the server goes directly to the clients. There are two ways to implement this, L2 DSR: Where the load balancer and the real servers remain in the same VLAN. Upon getting an incoming request, the load balancer identifies the real server to redirect the request and then modifies the destination mac address of that Ethernet frame. Upon processing this packet, the real server responds directly to the client. L3 DSR : In this case, the load balancer and real servers need not be in the same VLAN (does away with layer 2 complexities like running STP, managing wider broadcast domain, etc). Upon incoming request, the load balancer redirects to the real server, by modifying the destination IP address of the packet. Along with this, the DSCP value of the packet is set to a predefined value (mapped for that VIP). Upon receipt of this packet, the real server uses the DSCP value to determine the loopback address (VIP address). 
The response again goes directly to the client. Two arm mode: In this case, the load balancer is in line for incoming and outgoing traffic. DNS based load balancer: Here the DNS servers keep a check of the health of the real servers and resolve the domain in such a way that the client can connect to different servers in that cluster. This part was explained in detail in the deployment at scale section. IPVS based load balancing: This is another means, where an IPVS server presents itself as the service endpoint to the clients. Upon incoming request, the IPVS directs the request to the real servers. The IPVS can be set up to do health checks for the real servers. NAT Network Address Translation (NAT) will be required for hosts that need to connect to destinations on the Internet, but don't want to expose their configured NIC address. In this case, the address (of the internal server) is translated to a public address by a firewall. A few examples of this are proxy servers, mail servers, etc. QoS Quality of Service is a means to provide differentiated treatment to some packets over others. This could be priority in forwarding queues, or bandwidth reservations. In the data centre scenario, depending upon the bandwidth subscription ratio, the need for QoS varies, 1:1 bandwidth subscription ratio: In this case, the server to ToR connectivity (all servers in that cabinet) bandwidth should be equivalent to the ToR to Spine switch connectivity. Similarly for the upper layers as well. In this design, congestion on a link is not going to happen, as enough bandwidth will always be available. In this case, the only difference QoS can make is to provide priority treatment for certain packets in the forwarding queue. Note: Packet buffering happens when the packet moves between ports of different speeds, like 100Gbps, 10Gbps. 
Oversubscribed network: In this case, not all layers maintain a bandwidth subscription ratio, for example, the ToR uplink may be of lower bandwidth, compared to ToR to Server bandwidth (This is sometimes referred to as oversubscription ratio). In this case, there is a possibility of congestion. Here QoS might be required, to give priority as well as bandwidth reservation, for certain types of traffic flows.","title":"Infrastructure Services"},{"location":"level102/networking/infrastructure-features/#tor-connectivity","text":"This being one of the most frequent points of failure (considering the scale of deployment), there are different options available to connect the servers to the ToR. We are going to see them in detail below,","title":"ToR connectivity"},{"location":"level102/networking/infrastructure-features/#single-tor","text":"This is the simplest of all the options. Where a NIC of the server is connected to one ToR. The advantage of this approach is, there is a minimal number of switch ports used, allowing the DC fabric to support the rapid growth of server infrastructure (Note: Not only the ToR ports are used efficiently, but the upper switching layer in DC fabric as well, the port usage will be efficient). On the downside, the servers can be unreachable if there is an issue with the ToR, link or NIC. This will impact the stateful apps more, as the existing connections get abruptly disconnected. Fig 4: Single ToR design","title":"Single ToR"},{"location":"level102/networking/infrastructure-features/#dual-tor","text":"In this option, each server is connected to two ToR, of the same cabinet. This can be set up in active/passive mode, thereby providing resiliency during ToR/link/NIC failures. 
The resiliency can be achieved either in layer 2 or in layer 3.","title":"Dual ToR"},{"location":"level102/networking/infrastructure-features/#layer-2","text":"In this case, both the links are bundled together as a bond on the server side (with one NIC taking the active role and the other being passive). On the switch side, these two links are made part of multi-chassis lag (similar to bonding, but spread across switches). The prerequisite here is, both the ToR should be part of the same layer 2 domain. The IP addresses are configured on the bond interface on the server and SVI on the switch side. Note: In this, the ToR 2 role is only to provide resiliency. Fig 5: Dual ToR layer 2 setup","title":"Layer 2"},{"location":"level102/networking/infrastructure-features/#layer-3","text":"In this case, both the links are configured as separate layer 3 interfaces. The resiliency is achieved by setting up a routing protocol (like BGP). Wherein one link is given higher preference over the other. In this case, the two ToR's can be set up independently, in layer 3 mode. The servers would need a virtual address, to which the services have to be bound. Note: In this, the ToR 2 role is only to provide resiliency. Fig 6: Dual ToR layer 3 setup Though the resiliency is better with dual ToR, the drawback is, the number of ports being used. As the access port in the ToR doubles up, the number of ports required in the Spine layer also doubles up, and this keeps cascading to higher layers. Type Single ToR Dual ToR (layer 2) Dual ToR (layer 3) Resiliency 1 No 2 Yes Yes Port usage 1:1 1:2 1:2 Cabling Less More More Cost of DC fabric Low High High ToR features required Low High Medium 1 Resiliency in terms of ToR/Link/NIC 2 As an alternative, resiliency can be addressed at the application layer. Along with the above-mentioned ones, an application might need more capabilities out of the infrastructure to deploy at scale. 
Some of them are,","title":"Layer 3"},{"location":"level102/networking/infrastructure-features/#anycast","text":"As seen in the previous section, of deploying at scale, anycast is one of the means to have services distributed across cabinets and still have traffic flowing to each one of the servers. To achieve this, two things are required Routing protocol between ToR and server (to announce the anycast address) Support for ECMP (Equal Cost Multi-Path) load balancing in the infrastructure, to distribute the flows across the cabinets.","title":"Anycast"},{"location":"level102/networking/infrastructure-features/#load-balancing","text":"Similar to Anycast, another means to achieve load balancing across servers (host a particular app), is using load balancers. These could be implemented in different ways Hardware load balancers: A LB device is placed inline of the traffic flow, and looks at the layer 3 and layer 4 information in an incoming packet. Then determine the set of real hosts, to which the connections are to be redirected. As covered in the Scale topic, these load balancers can be set up in two ways, Single-arm mode: In this mode, the load balancer handles only the incoming requests to the VIP. The response from the server goes directly to the clients. There are two ways to implement this, L2 DSR: Where the load balancer and the real servers remain in the same VLAN. Upon getting an incoming request, the load balancer identifies the real server to redirect the request and then modifies the destination mac address of that Ethernet frame. Upon processing this packet, the real server responds directly to the client. L3 DSR : In this case, the load balancer and real servers need not be in the same VLAN (does away with layer 2 complexities like running STP, managing wider broadcast domain, etc). Upon incoming request, the load balancer redirects to the real server, by modifying the destination IP address of the packet. 
Along with this, the DSCP value of the packet is set to a predefined value (mapped for that VIP). Upon receipt of this packet, the real server uses the DSCP value to determine the loopback address (VIP address). The response again goes directly to the client. Two arm mode: In this case, the load balancer is in line for incoming and outgoing traffic. DNS based load balancer: Here the DNS servers keep a check of the health of the real servers and resolve the domain in such a way that the client can connect to different servers in that cluster. This part was explained in detail in the deployment at scale section. IPVS based load balancing: This is another means, where an IPVS server presents itself as the service endpoint to the clients. Upon incoming request, the IPVS directs the request to the real servers. The IPVS can be set up to do health checks for the real servers.","title":"Load balancing"},{"location":"level102/networking/infrastructure-features/#nat","text":"Network Address Translation (NAT) will be required for hosts that need to connect to destinations on the Internet, but don't want to expose their configured NIC address. In this case, the address (of the internal server) is translated to a public address by a firewall. A few examples of this are proxy servers, mail servers, etc.","title":"NAT"},{"location":"level102/networking/infrastructure-features/#qos","text":"Quality of Service is a means to provide differentiated treatment to some packets over others. This could be priority in forwarding queues, or bandwidth reservations. In the data centre scenario, depending upon the bandwidth subscription ratio, the need for QoS varies, 1:1 bandwidth subscription ratio: In this case, the server to ToR connectivity (all servers in that cabinet) bandwidth should be equivalent to the ToR to Spine switch connectivity. Similarly for the upper layers as well. In this design, congestion on a link is not going to happen, as enough bandwidth will always be available. 
In this case, the only difference QoS can bring, it provides priority treatment for certain packets in the forwarding queue. Note: Packet buffering happens, when the packet moves between ports of different speeds, like 100Gbps, 10Gbps. Oversubscribed network: In this case, not all layers maintain a bandwidth subscription ratio, for example, the ToR uplink may be of lower bandwidth, compared to ToR to Server bandwidth (This is sometimes referred to as oversubscription ratio). In this case, there is a possibility of congestion. Here QoS might be required, to give priority as well as bandwidth reservation, for certain types of traffic flows.","title":"QoS"},{"location":"level102/networking/introduction/","text":"Prerequisites It is recommended to have basic knowledge of network security, TCP and datacenter setup and the common terminologies used in them. Also, the readers are expected to go through the School of Sre contents - Linux Networking system design security What to expect from this course This part will cover how a datacenter infrastructure is segregated for different application needs as well as the consideration of deciding where to place an application. These will be broadly based on, Security, Scale, RTT (latency), Infrastructure features. Each of these topics will be covered in detail, Security - Will cover threat vectors faced by services facing external/internal clients. Potential mitigation options to consider while deploying them. This will touch upon perimeter security, DDoS protection, Network demarcation and ring-fencing the server clusters. Scale - Deploying large scale applications, require a better understanding of infrastructure capabilities, in terms of resource availability, failure domains, scaling options like using anycast, layer 4/7 load balancer, DNS based load balancing. 
RTT (latency) - Latency plays a key role in determining the overall performance of the distributed service/application, where calls are made between hosts to serve the users. Infrastructure features - Some of the aspects to consider are, whether the underlying data centre infrastructure supports ToR resiliency, i.e., features like link bundling (bonds), BGP (Border Gateway Protocol), support for anycast service, load balancer, firewall, Quality of Service. What is not covered under this course Though these parameters play a role in designing an application, we will not go into the details of the design. Each of these topics are vast, hence the objective is to introduce the terms and relevance of the parameters in them, and not to provide extensive details about each one of them. Course Contents Security Scale RTT Infrastructure features Conclusion Terminology Before discussing each of the topics, it is important to get familiar with few commonly used terms Cloud This refers to hosted solutions from different providers like Azure, AWS, GCP. Wherein enterprises can host their applications for either public or private usage. On-prem This term refers to physical Data Center(DC) infrastructure, built and managed by enterprises themselves. This can be used for private access as well as public (like users connecting over the Internet). Leaf switch (ToR) This refers to the switch, where the servers connect to, in a DC. They are called by many names, like access switch, Top of the Rack switch, Leaf switch. The term leaf switch comes from the Spine-leaf architecture , where the access switches are called leaf switches. Spine-leaf architecture is commonly used in large/hyper-scale data centres, which brings very high scalability options for the DC switching layer and is also more efficient in building and implementing these switches. Sometimes these are referred to as Clos architecture. 
Spine switch Spine switches are the aggregation point of several leaf switches, they provide the inter-leaf communication and also connect to the upper layer of DC infrastructure. DC fabric As the data centre grows, multiple Clos networks need to be interconnected, to support the scale, and fabric switches help to interconnect them. Cabinet This refers to the rack, where the servers and ToR are installed. One cabinet refers to the entire rack. BGP It is the Border Gateway Protocol, used to exchange routing information between routers and switches. This is one of the common protocols used on the Internet as well as in Data Centers. Other protocols are also used in place of BGP, like OSPF. VPN A Virtual Private Network is a tunnel solution, where two private networks (like offices, datacentres, etc) can be interconnected over a public network (internet). These VPN tunnels encrypt the traffic before sending over the Internet, as a security measure. NIC Network Interface Card refers to the module in Servers, which consists of the Ethernet port and the interconnection to the system bus. It is used to connect to the switches (commonly ToR switches). Flow Flows refer to a traffic exchange between two nodes (could be servers, switches, routers, etc), which has common parameters like source/destination IP address, source/destination port number, IP Protocol number. This helps in tracking a particular traffic exchange session, between two nodes (like a file copy session, or an HTTP connection, etc). ECMP Equal Cost Multi-Path means, a switch/router can distribute the traffic to a destination, among multiple exit interfaces. The flow information is used to build a hash value and based on that, exit interfaces are selected. Once a flow is mapped to a particular exit interface, all the packets of that flow exit via the same interface only. This helps in preventing out of order delivery of packets. 
RTT This is a measure of the time it takes for a packet from the source to reach the destination and return to the source. This is most commonly used in measuring network performance and also troubleshooting. TCP throughput This is the measure of the data transfer rate achieved between two nodes. This is impacted by many parameters like RTT, packet size, window size, etc. Unicast This refers to the traffic flow between a single source to a single destination (i.e.) like ssh sessions, where there is one to one communication. Anycast This refers to one-to-one traffic flow as above, but endpoints could be multiple (i.e.) a single source can send traffic to any one of the destination hosts in that group. This is achieved by having the same IP address configured in multiple servers and every new traffic flow is mapped to one of the servers. Multicast This refers to one-to-many traffic flow (i.e.) a single source can send traffic to multiple destinations. To make it feasible, the network routers replicate the traffic to different hosts (which register as members of that particular multicast group).","title":"Introduction"},{"location":"level102/networking/introduction/#prerequisites","text":"It is recommended to have basic knowledge of network security, TCP and datacenter setup and the common terminologies used in them. Also, the readers are expected to go through the School of Sre contents - Linux Networking system design security","title":"Prerequisites"},{"location":"level102/networking/introduction/#what-to-expect-from-this-course","text":"This part will cover how a datacenter infrastructure is segregated for different application needs as well as the consideration of deciding where to place an application. These will be broadly based on, Security, Scale, RTT (latency), Infrastructure features. Each of these topics will be covered in detail, Security - Will cover threat vectors faced by services facing external/internal clients. 
Potential mitigation options to consider while deploying them. This will touch upon perimeter security, DDoS protection, Network demarcation and ring-fencing the server clusters. Scale - Deploying large scale applications, require a better understanding of infrastructure capabilities, in terms of resource availability, failure domains, scaling options like using anycast, layer 4/7 load balancer, DNS based load balancing. RTT (latency) - Latency plays a key role in determining the overall performance of the distributed service/application, where calls are made between hosts to serve the users. Infrastructure features - Some of the aspects to consider are, whether the underlying data centre infrastructure supports ToR resiliency, i.e., features like link bundling (bonds), BGP (Border Gateway Protocol), support for anycast service, load balancer, firewall, Quality of Service.","title":"What to expect from this course"},{"location":"level102/networking/introduction/#what-is-not-covered-under-this-course","text":"Though these parameters play a role in designing an application, we will not go into the details of the design. Each of these topics are vast, hence the objective is to introduce the terms and relevance of the parameters in them, and not to provide extensive details about each one of them.","title":"What is not covered under this course"},{"location":"level102/networking/introduction/#course-contents","text":"Security Scale RTT Infrastructure features Conclusion","title":"Course Contents"},{"location":"level102/networking/introduction/#terminology","text":"Before discussing each of the topics, it is important to get familiar with few commonly used terms Cloud This refers to hosted solutions from different providers like Azure, AWS, GCP. Wherein enterprises can host their applications for either public or private usage. On-prem This term refers to physical Data Center(DC) infrastructure, built and managed by enterprises themselves. 
This can be used for private access as well as public (like users connecting over the Internet). Leaf switch (ToR) This refers to the switch, where the servers connect to, in a DC. They are called by many names, like access switch, Top of the Rack switch, Leaf switch. The term leaf switch comes from the Spine-leaf architecture , where the access switches are called leaf switches. Spine-leaf architecture is commonly used in large/hyper-scale data centres, which brings very high scalability options for the DC switching layer and is also more efficient in building and implementing these switches. Sometimes these are referred to as Clos architecture. Spine switch Spine switches are the aggregation point of several leaf switches, they provide the inter-leaf communication and also connect to the upper layer of DC infrastructure. DC fabric As the data centre grows, multiple Clos networks need to be interconnected, to support the scale, and fabric switches help to interconnect them. Cabinet This refers to the rack, where the servers and ToR are installed. One cabinet refers to the entire rack. BGP It is the Border Gateway Protocol, used to exchange routing information between routers and switches. This is one of the common protocols used on the Internet as well as in Data Centers. Other protocols are also used in place of BGP, like OSPF. VPN A Virtual Private Network is a tunnel solution, where two private networks (like offices, datacentres, etc) can be interconnected over a public network (internet). These VPN tunnels encrypt the traffic before sending over the Internet, as a security measure. NIC Network Interface Card refers to the module in Servers, which consists of the Ethernet port and the interconnection to the system bus. It is used to connect to the switches (commonly ToR switches). 
Flow Flows refer to a traffic exchange between two nodes (could be servers, switches, routers, etc), which has common parameters like source/destination IP address, source/destination port number, IP Protocol number. This helps in tracking a particular traffic exchange session, between two nodes (like a file copy session, or an HTTP connection, etc). ECMP Equal Cost Multi-Path means, a switch/router can distribute the traffic to a destination, among multiple exit interfaces. The flow information is used to build a hash value and based on that, exit interfaces are selected. Once a flow is mapped to a particular exit interface, all the packets of that flow exit via the same interface only. This helps in preventing out of order delivery of packets. RTT This is a measure of the time it takes for a packet from the source to reach the destination and return to the source. This is most commonly used in measuring network performance and also troubleshooting. TCP throughput This is the measure of the data transfer rate achieved between two nodes. This is impacted by many parameters like RTT, packet size, window size, etc. Unicast This refers to the traffic flow between a single source to a single destination (i.e.) like ssh sessions, where there is one to one communication. Anycast This refers to one-to-one traffic flow as above, but endpoints could be multiple (i.e.) a single source can send traffic to any one of the destination hosts in that group. This is achieved by having the same IP address configured in multiple servers and every new traffic flow is mapped to one of the servers. Multicast This refers to one-to-many traffic flow (i.e.) a single source can send traffic to multiple destinations. 
To make it feasible, the network routers replicate the traffic to different hosts (which register as members of that particular multicast group).","title":"Terminology"},{"location":"level102/networking/rtt/","text":"Latency plays a key role in determining the overall performance of the distributed service/application, where calls are made between hosts to serve the users. RTT is a measure of the time it takes for a packet to travel from A to B and return to A. It is measured in milliseconds. This measure plays a role in determining the performance of the services. Its impact is seen in calls made between different servers/services, to serve the user, as well as the TCP throughput that can be achieved. It is fairly common that a service makes multiple calls to servers within its cluster or to different services like authentication, logging, database, etc, to respond to each user/client request. These servers can be spread across different cabinets, at times even between different data centres in the same region. Such cases are quite possible in cloud solutions, where the deployment spreads across different sites within a region. As the RTT increases, the response time for each of the calls gets longer and thereby has a cascading effect on the end response being sent to the user. Relation of RTT and throughput RTT is inversely proportional to the TCP throughput. As RTT increases, it reduces the TCP throughput, just like packet loss. Below is a formula to estimate the TCP throughput, based on TCP mss, RTT and packet loss. These calculations matter not only within a data centre but also for communication over the Internet, where a client can connect to the DC hosted services over different telco networks and the RTT is not very stable, due to the unpredictability of the Internet routing policies.","title":"RTT"},{"location":"level102/networking/rtt/#relation-of-rtt-and-throughput","text":"RTT is inversely proportional to the TCP throughput. 
As RTT increases, it reduces the TCP throughput, just like packet loss. Below is a formula to estimate the TCP throughput, based on TCP mss, RTT and packet loss. These calculations matter not only within a data centre but also for communication over the Internet, where a client can connect to the DC hosted services over different telco networks and the RTT is not very stable, due to the unpredictability of the Internet routing policies.","title":"Relation of RTT and throughput"},{"location":"level102/networking/scale/","text":"Deploying large scale applications, require a better understanding of infrastructure capabilities, in terms of resource availability, failure domains, scaling options like using anycast, layer 4/7 load balancer, DNS based load balancing. Building large scale applications is a complex activity, which should cover many aspects in design, development and as well as operationalisation. This section will talk about the considerations to look for while deploying them. Failure domains In any infrastructure, failures due to hardware or software issues are common. Though these may be a pain from a service availability perspective, these failures do happen and a pragmatic goal would be to, try to keep these failures to the minimum. Hence while deploying any service, failures/non-availability of some of the nodes to be factored in. Server failures A server could fail, due to power or NIC or software bug. And at times it may not be a complete failure but could be an error in the NIC, which causes some packet loss. This is a very common scenario and will impact the stateful services more. While designing such services, it is important to accommodate some level of tolerance to such failures. ToR failures This is one of the common scenarios, where the leaf switch connecting the servers goes down, along with it taking down the entire cabinet. There could be more than one server of the same service that can go down in this case. 
It requires planning to decide how much server loss can be handled without overloading other servers. Based on this, the service can be distributed across many cabinets. These calculations may vary, depending upon the resiliency in the ToR design, which will be covered in ToR connectivity section. Site failures Here site failure is a generic term, which could mean, a particular service is down in a site, maybe due to new version rollout, or failures of devices like firewall, load balancer, if the service depends on them, or loss of connectivity to remote sites (which might have limited options for resiliency) or issues with critical services like DNS, etc. Though these events may not be common, they can have a significant impact. In summary, handling these failure scenarios has to be thought about while designing the application itself. That will provide the tolerance required within the application to recover from unexpected failures. This will help not only for failures, even for planned maintenance work, as it will be easier to take part of the infrastructure, out of service. Resource availability The other aspect to consider while deploying applications at scale is the availability of the required infrastructure and the features the service is dependent upon. For example, for the resiliency of a cabinet, if one decides to distribute the service to 5 cabinets, but the service needs a load balancer (to distribute incoming connections to different servers), it may become challenging if load balancers are not supported in all cabinets. Or there could be a case that there are not enough cabinets available (that meet the minimum required specification for service to be set up). The best approach in these cases is to identify the requirements and gaps and then work with the Infrastructure team to best solve them. Scaling options While distributing the application to different cabinets, the incoming traffic to these services has to be distributed across these servers. 
To achieve this, the following may be considered Anycast This is one of the quickest ways to roll out traffic distribution across multiple cabinets. In this, each server, part of the cluster (where the service is set up), advertises a loopback address (/32 IPv4 or /128 IPv6 address), to the DC switch fabric (most commonly BGP is used for this purpose). The service has to be set up to be listening to this loopback address. When the clients try to connect to the service, get resolved to this virtual address and forward their queries. The DC switch fabric distributes each flow into different available next hops (eventually to all the servers in that service cluster). Note: The DC switch computes a hash, based on the IP packet header, this could include any combination of source and destination addresses, source and destination port, mac address and IP protocol number. Based on this hash value, a particular next-hop is picked up. Since all the packets in a traffic flow, carry the same values for these headers, all the packets in that flow will be mapped to the same path. Fig 1: Anycast setup To achieve a proportionate distribution of flows across these servers, it is important to maintain uniformity in each of the cabinets and pods. But remember, the distribution happens only based on flows, and if there are any elephant (large) flows, some servers might receive a higher volume of traffic. If there are any server or ToR failures, the advertisement of loopback address to the switches will stop, and thereby the new packets will be forwarded to the remaining available servers. Load balancer Another common approach is to use a load balancer. A Virtual IP is set up in the load balancers, to which the client connects while trying to access the service. The load balancer, in turn, redirects these connections to, one of the actual servers, where the service is running. 
To verify that a server is in a serviceable state, the load balancer does periodic health checks, and if a check fails, the LB stops redirecting connections to that server. The load balancer can be deployed in single-arm mode, where the traffic to the VIP is redirected by the LB, and the return traffic from the server to the client is sent directly. The other option is the two-arm mode, where the return traffic is also passed through the LB. Fig 2: Single-arm mode Fig 3: Two-arm mode One of the cons of this approach is, at a higher scale, the load balancer can become the bottleneck, to support higher traffic volumes or concurrent connections per second. DNS based load balancing This is similar to the above approach, the only difference being that, instead of an appliance, the load balancing is done at the DNS. The clients get different IPs to connect to when they query for the DNS records of the service. The DNS server has to do a health check, to know which servers are in a good state. This approach alleviates the bottleneck of the load balancer solution. But it requires a shorter TTL for the DNS records, so that problematic servers can be taken out of rotation quickly, which means there will be far more DNS queries.","title":"Scale"},{"location":"level102/networking/scale/#failure-domains","text":"In any infrastructure, failures due to hardware or software issues are common. Though these may be a pain from a service availability perspective, these failures do happen and a pragmatic goal would be to, try to keep these failures to the minimum. Hence while deploying any service, failures/non-availability of some of the nodes to be factored in.","title":"Failure domains"},{"location":"level102/networking/scale/#server-failures","text":"A server could fail, due to power or NIC or software bug. And at times it may not be a complete failure but could be an error in the NIC, which causes some packet loss. 
This is a very common scenario and will impact the stateful services more. While designing such services, it is important to accommodate some level of tolerance to such failures.","title":"Server failures"},{"location":"level102/networking/scale/#tor-failures","text":"This is one of the common scenarios, where the leaf switch connecting the servers goes down, along with it taking down the entire cabinet. There could be more than one server of the same service that can go down in this case. It requires planning to decide how much server loss can be handled without overloading other servers. Based on this, the service can be distributed across many cabinets. These calculations may vary, depending upon the resiliency in the ToR design, which will be covered in ToR connectivity section.","title":"ToR failures"},{"location":"level102/networking/scale/#site-failures","text":"Here site failure is a generic term, which could mean, a particular service is down in a site, maybe due to new version rollout, or failures of devices like firewall, load balancer, if the service depends on them, or loss of connectivity to remote sites (which might have limited options for resiliency) or issues with critical services like DNS, etc. Though these events may not be common, they can have a significant impact. In summary, handling these failure scenarios has to be thought about while designing the application itself. That will provide the tolerance required within the application to recover from unexpected failures. This will help not only for failures, even for planned maintenance work, as it will be easier to take part of the infrastructure, out of service.","title":"Site failures"},{"location":"level102/networking/scale/#resource-availability","text":"The other aspect to consider while deploying applications at scale is the availability of the required infrastructure and the features the service is dependent upon. 
For example, for the resiliency of a cabinet, if one decides to distribute the service to 5 cabinets, but the service needs a load balancer (to distribute incoming connections to different servers), it may become challenging if load balancers are not supported in all cabinets. Or there could be a case that there are not enough cabinets available (that meet the minimum required specification for service to be set up). The best approach in these cases is to identify the requirements and gaps and then work with the Infrastructure team to best solve them.","title":"Resource availability"},{"location":"level102/networking/scale/#scaling-options","text":"While distributing the application to different cabinets, the incoming traffic to these services has to be distributed across these servers. To achieve this, the following may be considered","title":"Scaling options"},{"location":"level102/networking/scale/#anycast","text":"This is one of the quickest ways to roll out traffic distribution across multiple cabinets. In this, each server, part of the cluster (where the service is set up), advertises a loopback address (/32 IPv4 or /128 IPv6 address), to the DC switch fabric (most commonly BGP is used for this purpose). The service has to be set up to be listening to this loopback address. When the clients try to connect to the service, get resolved to this virtual address and forward their queries. The DC switch fabric distributes each flow into different available next hops (eventually to all the servers in that service cluster). Note: The DC switch computes a hash, based on the IP packet header, this could include any combination of source and destination addresses, source and destination port, mac address and IP protocol number. Based on this hash value, a particular next-hop is picked up. Since all the packets in a traffic flow, carry the same values for these headers, all the packets in that flow will be mapped to the same path. 
Fig 1: Anycast setup To achieve a proportionate distribution of flows across these servers, it is important to maintain uniformity in each of the cabinets and pods. But remember, the distribution happens only based on flows, and if there are any elephant (large) flows, some servers might receive a higher volume of traffic. If there are any server or ToR failures, the advertisement of loopback address to the switches will stop, and thereby the new packets will be forwarded to the remaining available servers.","title":"Anycast"},{"location":"level102/networking/scale/#load-balancer","text":"Another common approach is to use a load balancer. A Virtual IP is set up in the load balancers, to which the client connects while trying to access the service. The load balancer, in turn, redirects these connections to one of the actual servers, where the service is running. To verify that a server is in a serviceable state, the load balancer does periodic health checks, and if a check fails, the LB stops redirecting connections to that server. The load balancer can be deployed in single-arm mode, where the traffic to the VIP is redirected by the LB, and the return traffic from the server to the client is sent directly. The other option is the two-arm mode, where the return traffic is also passed through the LB. Fig 2: Single-arm mode Fig 3: Two-arm mode One of the cons of this approach is, at a higher scale, the load balancer can become the bottleneck, to support higher traffic volumes or concurrent connections per second.","title":"Load balancer"},{"location":"level102/networking/scale/#dns-based-load-balancing","text":"This is similar to the above approach, the only difference being that, instead of an appliance, the load balancing is done at the DNS. The clients get different IPs to connect to when they query for the DNS records of the service. The DNS server has to do a health check, to know which servers are in a good state. 
This approach alleviates the bottleneck of the load balancer solution. But it requires a shorter TTL for the DNS records, so that problematic servers can be taken out of rotation quickly, which means there will be far more DNS queries.","title":"DNS based load balancing"},{"location":"level102/networking/security/","text":"This section will cover threat vectors faced by services facing external/internal clients. Potential mitigation options to consider while deploying them. This will touch upon perimeter security, DDoS protection, Network demarcation and operational practices. Security Threat Security is one of the major considerations in any infrastructure. There are various security threats, which could amount to data theft, loss of service, fraudulent activity, etc. An attacker can use techniques like phishing, spamming, malware, DoS/DDoS, exploiting vulnerabilities, man-in-the-middle attack, and many more. In this section, we will cover some of these threats and possible mitigation. As there are numerous means to attack and secure the infrastructure, we will only focus on some of the most common ones. Phishing is mostly done via email (and other mass communication methods), where an attacker provides links to fake websites/URLs. Upon accessing these, the victim's sensitive information like login credentials or personal data is collected and can be misused. Spamming is also similar to phishing, but the attacker doesn't collect data from users but tries to spam a particular website and probably overwhelm it (to cause slowness) and use that opportunity to compromise the security of the attacked website. Malware is like a trojan horse, where an attacker manages to install a piece of code on the secured systems in the infrastructure. Using this, the hacker can collect sensitive data as well as infect the critical services of the target company. Exploiting vulnerabilities is another method by which an attacker can gain access to the systems. 
These could be bugs or misconfiguration in web servers, internet-facing routers/switches/firewalls, etc. DoS/DDoS is one of the common attacks seen on internet-based services/solutions, especially those businesses based on eyeball traffic. Here the attacker tries to overwhelm the resources of the victim by generating spurious traffic to the external-facing services. By this, primarily the services turn slow or non-responsive, during this time, the attacker could try to hack into the network, if some of the security mechanism fails to filter through the attack traffic due to overload. Securing the infrastructure The first and foremost aspect for any infrastructure administration is to identify the various security threats that could affect the business running over this infrastructure. Once different threats are known, the security defence mechanism has to be designed and implemented. Some of the common means to securing the infrastructure are Perimeter security This is the first line of defence in any infrastructure, where unwanted/unexpected traffic flows into the infrastructure are filtered/blocked. These could be filters in the edge routers, that allow expected services (like port 443 traffic for web service running on HTTPS), or this filter can be set up to block unwanted traffic, like blocking UDP ports, if the services are not dependent on UDP. Similar to the application traffic entering the network, there could be other traffic like BGP messages for Internet peers, VPN tunnels traffic, as well other services like email/DNS, etc. There are means to protect every one of these, like using authentication mechanisms (password or key-based) for peers of BGP, VPN, and whitelisting these specific peers to make inbound connections (in perimeter filters). Along with these, the amount of messages/traffic can be rate-limited to known scale or expected load, so the resources are not overwhelmed. 
DDoS mitigation Protecting against a DDoS attack is another important aspect. The attack traffic will look similar to the genuine users/client request, but with the intention to flood the externally exposed app, which could be a web server, DNS, etc. Therefore it is essential to differentiate between the attack traffic and genuine traffic, for this, there are different methods to do at the application level, one such example using Captcha on a web service, to catch traffic originating from bots. For these methods to be useful, the nodes should be capable of handling both the attack traffic and genuine traffic. It may be possible in cloud-based infrastructure to dynamically add more virtual machines/resources, to handle the sudden spike in volume of traffic, but on-prem, the option to add additional resources might be challenging. To handle a large volume of attack traffic, there are solutions available, which can inspect the packets/traffic flows and identify anomalies (i.e.) traffic patterns that don't resemble a genuine connection, like client initiating TCP connection, but fail to complete the handshake, or set of sources, which have abnormally huge traffic flow. Once this unwanted traffic is identified, these are dropped at the edge of the network itself, thereby protecting the resources of app nodes. This topic alone can be discussed more in detail, but that will be beyond the scope of this section. Network Demarcation Network demarcation is another common strategy deployed in different networks when applications are grouped based on their security needs and vulnerability to an attack. Some common demarcations are, the external/internet facing nodes are grouped into a separate zone, whereas those nodes having sensitive data are segregated into a separate zone. And any communication between these zones is restricted with the help of security tools to limit exposure to unwanted hosts/ports. 
These inter-zone communication filters are sometimes called ring-fencing. The number of zones to be created, varies for different deployments, for example, there could be a host which should be able to communicate to the external world as well as internal servers, like proxy, email, in this case, these can be grouped under one zone, say De-Militarized Zones (DMZ). The main advantage of creating zones is that, even if there is a compromised host, that doesn't act as a back door entry for the rest of the infrastructure. Node protection Be it server, router, switches, load balancers, firewall, etc, each of these devices come with certain capabilities to secure themselves, like support for filters (e.g. Access-list, Iptables) to control what traffic to process and what to drop, anti-virus software can be used in servers to check on the software installed in them. Operational practices There are numerous security threats for infrastructure, and there are different solutions to defend them. The key part to the defence, is not only identifying the right solution and the tools for it but also making sure there are robust operational procedures in place, to respond promptly, decisively and with clarity, for any security incident. Standard Operating Procedures (SOP) SOP need to be well defined and act as a reference for on-call to follow during a security incident. This SoP should cover things like, When a security incident happens, how it will be alerted, to whom it will be alerted. Identify the scale and severity of the security incident. Who are the points of escalation and the threshold/time to intimate them, there could be other concerned teams or to the management or even to the security operations in-charge. Which solutions to use (and the procedure to follow in them) to mitigate the security incident. Also the data about the security incident has to be collated for further analysis. 
Many organisations have a dedicated team focused on security, and they drive most of the activities, during an attack and even before, to come up with best practices, guidelines and compliance audits. It is the responsibility of respective technical teams, to ensure the infrastructure meets these recommendations and gaps are fixed. Periodic review Along with defining SoP's, the entire security of the infrastructure has to be reviewed periodically. This review should include, Identifying any new/improved security threat that could potentially target the infrastructure. The SoP's have to be reviewed periodically, depending upon new security threats or changes in the procedure (to implement the solutions) Ensuring software upgrades/patches are done in a timely manner. Audit the infrastructure for any non-compliance of the security standards. Review of recent security incidents and find means to improvise the defence mechanisms.","title":"Security"},{"location":"level102/networking/security/#security-threat","text":"Security is one of the major considerations in any infrastructure. There are various security threats, which could amount to data theft, loss of service, fraudulent activity, etc. An attacker can use techniques like phishing, spamming, malware, Dos/DDoS, exploiting vulnerabilities, man-in-the-middle attack, and many more. In this section, we will cover some of these threats and possible mitigation. As there are numerous means to attack and secure the infrastructure, we will only focus on some of the most common ones. Phishing is mostly done via email (and other mass communication methods), where an attacker provides links to fake websites/URLs. Upon accessing that, victim's sensitive information like login credentials or personal data is collected and can be misused. 
Spamming is also similar to phishing, but the attacker doesn't collect data from users but tries to spam a particular website and probably overwhelm them (to cause slowness) and well use that opportunity to, compromise the security of the attacked website. Malware is like a trojan horse, where an attacker manages to install a piece of code on the secured systems in the infrastructure. Using this, the hacker can collect sensitive data and as well infect the critical services of the target company. Exploiting vulnerabilities is another method an attacker can gain access to the systems. These could be bugs or misconfiguration in web servers, internet-facing routers/switches/firewalls, etc. DoS/DDoS is one of the common attacks seen on internet-based services/solutions, especially those businesses based on eyeball traffic. Here the attacker tries to overwhelm the resources of the victim by generating spurious traffic to the external-facing services. By this, primarily the services turn slow or non-responsive, during this time, the attacker could try to hack into the network, if some of the security mechanism fails to filter through the attack traffic due to overload.","title":"Security Threat"},{"location":"level102/networking/security/#securing-the-infrastructure","text":"The first and foremost aspect for any infrastructure administration is to identify the various security threats that could affect the business running over this infrastructure. Once different threats are known, the security defence mechanism has to be designed and implemented. Some of the common means to securing the infrastructure are","title":"Securing the infrastructure"},{"location":"level102/networking/security/#perimeter-security","text":"This is the first line of defence in any infrastructure, where unwanted/unexpected traffic flows into the infrastructure are filtered/blocked. 
These could be filters in the edge routers, that allow expected services (like port 443 traffic for web service running on HTTPS), or this filter can be set up to block unwanted traffic, like blocking UDP ports, if the services are not dependent on UDP. Similar to the application traffic entering the network, there could be other traffic like BGP messages for Internet peers, VPN tunnels traffic, as well other services like email/DNS, etc. There are means to protect every one of these, like using authentication mechanisms (password or key-based) for peers of BGP, VPN, and whitelisting these specific peers to make inbound connections (in perimeter filters). Along with these, the amount of messages/traffic can be rate-limited to known scale or expected load, so the resources are not overwhelmed.","title":"Perimeter security"},{"location":"level102/networking/security/#ddos-mitigation","text":"Protecting against a DDoS attack is another important aspect. The attack traffic will look similar to the genuine users/client request, but with the intention to flood the externally exposed app, which could be a web server, DNS, etc. Therefore it is essential to differentiate between the attack traffic and genuine traffic, for this, there are different methods to do at the application level, one such example using Captcha on a web service, to catch traffic originating from bots. For these methods to be useful, the nodes should be capable of handling both the attack traffic and genuine traffic. It may be possible in cloud-based infrastructure to dynamically add more virtual machines/resources, to handle the sudden spike in volume of traffic, but on-prem, the option to add additional resources might be challenging. To handle a large volume of attack traffic, there are solutions available, which can inspect the packets/traffic flows and identify anomalies (i.e.) 
traffic patterns that don't resemble a genuine connection, like client initiating TCP connection, but fail to complete the handshake, or set of sources, which have abnormally huge traffic flow. Once this unwanted traffic is identified, these are dropped at the edge of the network itself, thereby protecting the resources of app nodes. This topic alone can be discussed more in detail, but that will be beyond the scope of this section.","title":"DDoS mitigation"},{"location":"level102/networking/security/#network-demarcation","text":"Network demarcation is another common strategy deployed in different networks when applications are grouped based on their security needs and vulnerability to an attack. Some common demarcations are, the external/internet facing nodes are grouped into a separate zone, whereas those nodes having sensitive data are segregated into a separate zone. And any communication between these zones is restricted with the help of security tools to limit exposure to unwanted hosts/ports. These inter-zone communication filters are sometimes called ring-fencing. The number of zones to be created, varies for different deployments, for example, there could be a host which should be able to communicate to the external world as well as internal servers, like proxy, email, in this case, these can be grouped under one zone, say De-Militarized Zones (DMZ). The main advantage of creating zones is that, even if there is a compromised host, that doesn't act as a back door entry for the rest of the infrastructure.","title":"Network Demarcation"},{"location":"level102/networking/security/#node-protection","text":"Be it server, router, switches, load balancers, firewall, etc, each of these devices come with certain capabilities to secure themselves, like support for filters (e.g. 
Access-list, Iptables) to control what traffic to process and what to drop, anti-virus software can be used in servers to check on the software installed in them.","title":"Node protection"},{"location":"level102/networking/security/#operational-practices","text":"There are numerous security threats for infrastructure, and there are different solutions to defend them. The key part to the defence, is not only identifying the right solution and the tools for it but also making sure there are robust operational procedures in place, to respond promptly, decisively and with clarity, for any security incident.","title":"Operational practices"},{"location":"level102/networking/security/#standard-operating-procedures-sop","text":"SOP need to be well defined and act as a reference for on-call to follow during a security incident. This SoP should cover things like, When a security incident happens, how it will be alerted, to whom it will be alerted. Identify the scale and severity of the security incident. Who are the points of escalation and the threshold/time to intimate them, there could be other concerned teams or to the management or even to the security operations in-charge. Which solutions to use (and the procedure to follow in them) to mitigate the security incident. Also the data about the security incident has to be collated for further analysis. Many organisations have a dedicated team focused on security, and they drive most of the activities, during an attack and even before, to come up with best practices, guidelines and compliance audits. It is the responsibility of respective technical teams, to ensure the infrastructure meets these recommendations and gaps are fixed.","title":"Standard Operating Procedures (SOP)"},{"location":"level102/networking/security/#periodic-review","text":"Along with defining SoP's, the entire security of the infrastructure has to be reviewed periodically. 
This review should include: Identifying any new/improved security threat that could potentially target the infrastructure. Reviewing the SOPs in light of new security threats or changes in the procedure (to implement the solutions). Ensuring software upgrades/patches are done in a timely manner. Auditing the infrastructure for any non-compliance with the security standards. Reviewing recent security incidents and finding ways to improve the defence mechanisms.","title":"Periodic review"},{"location":"level102/system_calls_and_signals/conclusion/","text":"Conclusion One of the main goals of an SRE is to improve the reliability of high-scale systems. In order to achieve this, a basic understanding of the internal workings of a system is necessary. Knowing how signals work is important since they play a big role in the lifecycle of processes. We see the use of signals in a range of operations on processes: from creating a process to killing a process. Knowledge of signals is important especially when handling them in programs. If you anticipate an event that causes signals, you can define a handler function and tell the operating system to run it when that particular type of signal arrives. Understanding system calls is especially useful to SREs while debugging any Linux process. System calls provide precise knowledge of the internal functionalities of an operating system. It gives programmers an in-depth understanding of the C library functions which implement system calls at a lower level. With the strace command, one can debug slow or hung processes. 
Further Reading https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch01s06.html https://jvns.ca/blog/2021/04/03/what-problems-do-people-solve-with-strace/ https://medium.com/@akhandmishra/important-system-calls-every-programmer-should-know-8884381ceadb https://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html","title":"Conclusion"},{"location":"level102/system_calls_and_signals/conclusion/#conclusion","text":"One of the main goals of an SRE is to improve the reliability of high-scale systems. In order to achieve this, a basic understanding of the internal workings of a system is necessary. Knowing how signals work is important since they play a big role in the lifecycle of processes. We see the use of signals in a range of operations on processes: from creating a process to killing a process. Knowledge of signals is important especially when handling them in programs. If you anticipate an event that causes signals, you can define a handler function and tell the operating system to run it when that particular type of signal arrives. Understanding system calls is especially useful to SREs while debugging any Linux process. System calls provide precise knowledge of the internal functionalities of an operating system. It gives programmers an in-depth understanding of the C library functions which implement system calls at a lower level. 
With the strace command, one can debug slow or hung processes.","title":"Conclusion"},{"location":"level102/system_calls_and_signals/conclusion/#further-reading","text":"https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch01s06.html https://jvns.ca/blog/2021/04/03/what-problems-do-people-solve-with-strace/ https://medium.com/@akhandmishra/important-system-calls-every-programmer-should-know-8884381ceadb https://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html","title":"Further Reading"},{"location":"level102/system_calls_and_signals/intro/","text":"System Calls and Signals Prerequisites Linux Basics Python Basics What to expect from this course The course covers a fundamental understanding of signals and system calls. It sheds light on how the knowledge of signals and system calls can be helpful for an SRE. What is not covered under this course The course does not discuss any other interrupts or interrupt handling apart from signals. The course will not dive deep into signal handlers and the GNU C library. Course Contents Signals Introduction to interrupts and signals Types of signals Sending signals to process Handling signals Role of signals in system calls with the example of wait() System calls Introduction Types of system calls User mode, kernel mode and their transitions Working of write() system call Debugging in Linux with strace","title":"Introduction"},{"location":"level102/system_calls_and_signals/intro/#system-calls-and-signals","text":"","title":"System Calls and Signals"},{"location":"level102/system_calls_and_signals/intro/#prerequisites","text":"Linux Basics Python Basics","title":"Prerequisites"},{"location":"level102/system_calls_and_signals/intro/#what-to-expect-from-this-course","text":"The course covers a fundamental understanding of signals and system calls. 
It sheds light on how the knowledge of signals and system calls can be helpful for an SRE.","title":"What to expect from this course"},{"location":"level102/system_calls_and_signals/intro/#what-is-not-covered-under-this-course","text":"The course does not discuss any other interrupts or interrupt handling apart from signals. The course will not dive deep into signal handlers and the GNU C library.","title":"What is not covered under this course"},{"location":"level102/system_calls_and_signals/intro/#course-contents","text":"Signals Introduction to interrupts and signals Types of signals Sending signals to process Handling signals Role of signals in system calls with the example of wait() System calls Introduction Types of system calls User mode, kernel mode and their transitions Working of write() system call Debugging in Linux with strace","title":"Course Contents"},{"location":"level102/system_calls_and_signals/signals/","text":"Introduction to interrupts and signals An interrupt is an event that alters the normal execution flow of a program and can be generated by hardware devices or even by the CPU itself. When an interrupt occurs, the current flow of execution is suspended and the interrupt handler runs. After the interrupt handler runs, the previous execution flow is resumed. There are three types of events that can cause the CPU to interrupt: hardware interrupts, software interrupts, and exceptions. Signals are software interrupts that notify a process that an event has occurred. These events might be requests from users or indications that a system problem (such as a memory access error) has occurred. Every signal has a signal number and a default action defined. A process can react to them in any of the following ways: a default (OS-provided) way catch the signal and handle them in a program-defined way ignore the signal entirely Signal Groups Signals fall into two broad categories. 
The first set constitutes the traditional or standard signals, which are used by the kernel to notify processes of events. On Linux, the standard signals are numbered from 1 to 31. The other set of signals consists of the realtime signals. Linux supports both POSIX reliable signals (hereinafter \"standard signals\") and POSIX real-time signals. Realtime Signals Realtime signals were defined in POSIX.1b to remedy a number of limitations of standard signals. They have the following advantages over standard signals: Realtime signals provide an increased range of signals that can be used for application-defined purposes. Only two standard signals are freely available for application-defined purposes: SIGUSR1 and SIGUSR2. Realtime signals are queued. If multiple instances of a realtime signal are sent to a process, then the signal is delivered multiple times. By contrast, if we send further instances of a standard signal that is already pending for a process, that signal is delivered only once. When sending a realtime signal, it is possible to specify data (an integer or pointer value) that accompanies the signal. The signal handler in the receiving process can retrieve this data. The order of delivery of different realtime signals is guaranteed. If multiple different realtime signals are pending, then the lowest-numbered signal is delivered first. In other words, signals are prioritized, with lower-numbered signals having higher priority. When multiple signals of the same type are queued, they are delivered\u2014along with their accompanying data\u2014in the order in which they were sent. Standard Signals The standard signals are the classical signals that have been there since the early days of Unix. Further here, we will be discussing about standard signals. Signal Overview A signal is said to be generated by some event. Once generated, a signal is later delivered to a process, which then takes some action in response to the signal. 
Between the time it is generated and the time it is delivered, a signal is said to be pending. Normally, a pending signal is delivered to a process as soon as it is next scheduled to run, or immediately if the process is already running (e.g., if the process sent a signal to itself). Sometimes, however, we need to ensure that a segment of code is not interrupted by the delivery of a signal. To do this, we can add a signal to the process\u2019s signal mask - a set of signals whose delivery is currently blocked. If a signal is generated while it is blocked, it remains pending until it is later unblocked (removed from the signal mask). Various system calls allow a process to add and remove signals from its signal mask. Upon delivery of a signal, a process carries out one of the following default actions, depending on the signal: The signal is ignored; that is, it is discarded by the kernel and has no effect on the process. (The process never even knows that it occurred.) The process is terminated (killed). This is sometimes referred to as abnormal process termination, as opposed to the normal process termination that occurs when a process terminates using exit(). A core dump file is generated, and the process is terminated. A core dump file contains an image of the virtual memory of the process, which can be loaded into a debugger in order to inspect the state of the process at the time that it terminated. The process is stopped\u2014execution of the process is suspended. Execution of the process is resumed after previously being stopped. Instead of accepting the default for a particular signal, a program can change the action that occurs when the signal is delivered. This is known as setting the disposition of the signal. To read more about disposition, refer here . A program can set one of the following dispositions for a signal: The default action should occur. 
This is useful to undo an earlier change of the disposition of the signal to something other than its default. The signal is ignored. This is useful for a signal whose default action would be to terminate the process. A signal handler is executed. A signal handler is a function, written by the programmer, that performs appropriate tasks in response to the delivery of a signal. For example, the shell has a handler for the SIGINT signal (generated by the interrupt character, Control-C) that causes it to stop what it is currently doing and return control to the main input loop, so that the user is once more presented with the shell prompt. Notifying the kernel that a handler function should be invoked is usually referred to as installing or establishing a signal handler. When a signal handler is invoked in response to the delivery of a signal, we say that the signal has been handled or, synonymously, caught. Note that it isn\u2019t possible to set the disposition of a signal to terminate or dump core (unless one of these is the default disposition of the signal). The nearest we can get to this is to install a handler for the signal that then calls either exit() or abort(). The abort() function generates a SIGABRT signal for the process, which causes it to dump core and terminate. Types of signals To list available signals in a Linux system, you can use the command kill -l . The table below lists the signals 1 to 20. To get a full list of signals, you can refer here . 
Signal name Signal number Default Action Meaning SIGHUP 1 Terminate Hangup detected on controlling terminal or death of controlling process SIGINT 2 Terminate Interrupt from keyboard SIGQUIT 3 Core dump Quit from keyboard SIGILL 4 Core dump Illegal instruction SIGTRAP 5 Core dump Trace/breakpoint trap for debugging SIGABRT , SIGIOT 6 Core dump Abnormal termination SIGBUS 7 Core dump Bus error SIGFPE 8 Core dump Floating point exception SIGKILL 9 Terminate Kill signal (cannot be caught or ignored) SIGUSR1 10 Terminate User-defined signal 1 SIGSEGV 11 Core dump Invalid memory reference SIGUSR2 12 Terminate User-defined signal 2 SIGPIPE 13 Terminate Broken pipe; write to pipe with no readers SIGALRM 14 Terminate Timer signal from alarm SIGTERM 15 Terminate Process termination SIGSTKFLT 16 Terminate Stack fault on math co-processor SIGCHLD 17 Ignore Child stopped or terminated SIGCONT 18 Continue Continue if stopped SIGSTOP 19 Stop Stop process (cannot be caught or ignored) SIGTSTP 20 Stop Stop typed at tty Sending signals to process There are three different ways to send signals to processes: Sending signal to process using kill The kill command can be used to send signals to a process. By default a SIGTERM signal is sent, but a different signal can be sent by specifying the signal number (or signal name). For example, the command kill -9 367 sends SIGKILL to the process with PID 367. Sending signal to process via keyboard Signals can be sent to a running process by pressing some specific keys. For example, pressing Ctrl+C sends SIGINT to the process, which terminates it by default. Sending signal to process via another process A process can send a signal to another process via the kill() system call. In this use, signals can be employed as a synchronization technique, or even as a primitive form of interprocess communication (IPC). It is also possible for a process to send a signal to itself. 
The int kill(pid_t pid, int sig) system call takes two arguments: the PID of the process you wish to send the signal to and the signal number of the desired signal. Handling signals Referring to the table of signals in the previous section, you can see that there are default actions attached to all signals when the program is started. When we call signal() to attach our own handler, we override the default behaviour of the program in response to that signal. Specifically, if we attach a handler to SIGINT, the program will no longer terminate when you press Ctrl+C (or send the program a SIGINT by any other means); instead, the function specified as the handler will be invoked, and it defines the behaviour of the program in response to that signal. Let\u2019s take an example of handling the SIGINT signal, which normally terminates a program. We will use Python\u2019s signal library to achieve this. When we press Ctrl+C, the SIGINT signal is sent. From the signals table, we see that the default action for SIGINT is to terminate the process. To illustrate how the process reacts to the default action and to a signal handler, let us consider the example below. Default Action of SIGINT: Let us first run the lines below in a Python environment: while 1: continue Now let us press \"Ctrl+C\". On pressing \"Ctrl+C\", a SIGINT is sent to the process, and the default action for SIGINT, as per the table we saw in the previous section, is to terminate the process. We see that the while loop is terminated and we get the below on our console: ^CTraceback (most recent call last): File \"\", line 2, in KeyboardInterrupt The loop was terminated because the process received a SIGINT when we pressed Ctrl+C (Python reports SIGINT as a KeyboardInterrupt exception). Signal Handler for SIGINT: Let us run the lines of code below in the Python environment. 
import signal import sys #Start of signal_handler function def signal_handler(signal, frame): print ('You pressed Ctrl+C!') # End of signal_handler function signal.signal(signal.SIGINT, signal_handler) This is an example of a program that defines its own signal handler for SIGINT, overriding the default action. Now let us run the while and continue statements as we did previously. while 1: continue Do we see any changes when Ctrl+C is pressed? Does the program terminate? We see the below output: ^CYou pressed Ctrl+C! Every time we press Ctrl+C, we just see the above message and the program does not terminate. To stop the program, you can press Ctrl+Z, which sends the SIGTSTP signal, whose default action is to stop (suspend) the process. In the case of the signal handler, we define a function signal_handler() which prints \u201cYou pressed Ctrl+C!\u201d and does not terminate the program. The handler is called with two arguments, the signal number and the current stack frame (None or a frame object). signal.signal() allows defining custom handlers to be executed when a signal is received. Its two arguments are the signal number (or name) you want to trap and the handler function. Role of signals in system calls with the example of wait() The wait() system call waits for one of the children of the calling process to terminate and returns the termination status of that child in the buffer pointed to by statusPtr. If the parent process calls the wait() system call, then the execution of the parent is suspended until the child is terminated. At the termination of the child, a SIGCHLD signal is generated, which is delivered to the parent by the kernel. The SIGCHLD signal indicates to the parent that there is some information on the child that needs to be collected. The parent, on receipt of SIGCHLD, reaps the status of the child from the process table. 
Even though the child is terminated, there is an entry in the process table corresponding to the child where its PID and termination status are stored. When the parent collects the status, this entry is deleted. Thus, all the traces of the child process are removed from the system. Zombie and Orphan States If the parent decides not to wait for the child\u2019s termination and it executes its subsequent task, or fails to read the exit status of the child, there remains an entry in the process table even after the termination of the child. This state of the child process is known as the Zombie state. In order to avoid long-lasting zombies, we need to have code that calls wait() after the child process is created. It is generally good to create a signal handler for the SIGCHLD signal, calling one of the wait-family functions in a loop, until no uncollected child data remains. A child process becomes orphaned if its parent process terminates before the child. The orphaned child is adopted by init/systemd, the ancestor of all processes, whose process ID is 1. Further calls to fetch the parent PID of this process return 1.","title":"Signals"},{"location":"level102/system_calls_and_signals/signals/#introduction-to-interrupts-and-signals","text":"An interrupt is an event that alters the normal execution flow of a program and can be generated by hardware devices or even by the CPU itself. When an interrupt occurs, the current flow of execution is suspended and the interrupt handler runs. After the interrupt handler runs, the previous execution flow is resumed. There are three types of events that can cause the CPU to interrupt: hardware interrupts, software interrupts, and exceptions. Signals are software interrupts that notify a process that an event has occurred. These events might be requests from users or indications that a system problem (such as a memory access error) has occurred. Every signal has a signal number and a default action defined. 
A process can react to them in any of the following ways: a default (OS-provided) way catch the signal and handle them in a program-defined way ignore the signal entirely","title":"Introduction to interrupts and signals"},{"location":"level102/system_calls_and_signals/signals/#signal-groups","text":"Signals fall into two broad categories. The first set constitutes the traditional or standard signals, which are used by the kernel to notify processes of events. On Linux, the standard signals are numbered from 1 to 31. The other set of signals consists of the realtime signals. Linux supports both POSIX reliable signals (hereinafter \"standard signals\") and POSIX real-time signals.","title":"Signal Groups"},{"location":"level102/system_calls_and_signals/signals/#realtime-signals","text":"Realtime signals were defined in POSIX.1b to remedy a number of limitations of standard signals. They have the following advantages over standard signals: Realtime signals provide an increased range of signals that can be used for application-defined purposes. Only two standard signals are freely available for application-defined purposes: SIGUSR1 and SIGUSR2. Realtime signals are queued. If multiple instances of a realtime signal are sent to a process, then the signal is delivered multiple times. By contrast, if we send further instances of a standard signal that is already pending for a process, that signal is delivered only once. When sending a realtime signal, it is possible to specify data (an integer or pointer value) that accompanies the signal. The signal handler in the receiving process can retrieve this data. The order of delivery of different realtime signals is guaranteed. If multiple different realtime signals are pending, then the lowest-numbered signal is delivered first. In other words, signals are prioritized, with lower-numbered signals having higher priority. 
When multiple signals of the same type are queued, they are delivered\u2014along with their accompanying data\u2014in the order in which they were sent.","title":"Realtime Signals"},{"location":"level102/system_calls_and_signals/signals/#standard-signals","text":"The standard signals are the classical signals that have been there since the early days of Unix. From here on, we discuss only standard signals.","title":"Standard Signals"},{"location":"level102/system_calls_and_signals/signals/#signal-overview","text":"A signal is said to be generated by some event. Once generated, a signal is later delivered to a process, which then takes some action in response to the signal. Between the time it is generated and the time it is delivered, a signal is said to be pending. Normally, a pending signal is delivered to a process as soon as it is next scheduled to run, or immediately if the process is already running (e.g., if the process sent a signal to itself). Sometimes, however, we need to ensure that a segment of code is not interrupted by the delivery of a signal. To do this, we can add a signal to the process\u2019s signal mask - a set of signals whose delivery is currently blocked. If a signal is generated while it is blocked, it remains pending until it is later unblocked (removed from the signal mask). Various system calls allow a process to add and remove signals from its signal mask. Upon delivery of a signal, a process carries out one of the following default actions, depending on the signal: The signal is ignored; that is, it is discarded by the kernel and has no effect on the process. (The process never even knows that it occurred.) The process is terminated (killed). This is sometimes referred to as abnormal process termination, as opposed to the normal process termination that occurs when a process terminates using exit(). A core dump file is generated, and the process is terminated. 
A core dump file contains an image of the virtual memory of the process, which can be loaded into a debugger in order to inspect the state of the process at the time that it terminated. The process is stopped\u2014execution of the process is suspended. Execution of the process is resumed after previously being stopped. Instead of accepting the default for a particular signal, a program can change the action that occurs when the signal is delivered. This is known as setting the disposition of the signal. To read more about disposition, refer here . A program can set one of the following dispositions for a signal: The default action should occur. This is useful to undo an earlier change of the disposition of the signal to something other than its default. The signal is ignored. This is useful for a signal whose default action would be to terminate the process. A signal handler is executed. A signal handler is a function, written by the programmer, that performs appropriate tasks in response to the delivery of a signal. For example, the shell has a handler for the SIGINT signal (generated by the interrupt character, Control-C) that causes it to stop what it is currently doing and return control to the main input loop, so that the user is once more presented with the shell prompt. Notifying the kernel that a handler function should be invoked is usually referred to as installing or establishing a signal handler. When a signal handler is invoked in response to the delivery of a signal, we say that the signal has been handled or, synonymously, caught. Note that it isn\u2019t possible to set the disposition of a signal to terminate or dump core (unless one of these is the default disposition of the signal). The nearest we can get to this is to install a handler for the signal that then calls either exit() or abort(). 
The abort() function generates a SIGABRT signal for the process, which causes it to dump core and terminate.","title":"Signal Overview"},{"location":"level102/system_calls_and_signals/signals/#types-of-signals","text":"To list available signals in a Linux system, you can use the command kill -l . The table below lists the signals 1 to 20. To get a full list of signals, you can refer here . Signal name Signal number Default Action Meaning SIGHUP 1 Terminate Hangup detected on controlling terminal or death of controlling process SIGINT 2 Terminate Interrupt from keyboard SIGQUIT 3 Core dump Quit from keyboard SIGILL 4 Core dump Illegal instruction SIGTRAP 5 Core dump Trace/breakpoint trap for debugging SIGABRT , SIGIOT 6 Core dump Abnormal termination SIGBUS 7 Core dump Bus error SIGFPE 8 Core dump Floating point exception SIGKILL 9 Terminate Kill signal (cannot be caught or ignored) SIGUSR1 10 Terminate User-defined signal 1 SIGSEGV 11 Core dump Invalid memory reference SIGUSR2 12 Terminate User-defined signal 2 SIGPIPE 13 Terminate Broken pipe; write to a pipe with no readers SIGALRM 14 Terminate Timer signal from alarm SIGTERM 15 Terminate Process termination SIGSTKFLT 16 Terminate Stack fault on math co-processor SIGCHLD 17 Ignore Child stopped or terminated SIGCONT 18 Continue Continue if stopped SIGSTOP 19 Stop Stop process (cannot be caught or ignored) SIGTSTP 20 Stop Stop typed at terminal","title":"Types of signals"},{"location":"level102/system_calls_and_signals/signals/#sending-signals-to-process","text":"There are three different ways to send signals to processes: Sending signal to process using kill The kill command can be used to send signals to a process. By default a SIGTERM signal is sent, but a different signal can be sent to the process by specifying the signal number (or signal name). 
For example, the command kill -9 367 sends SIGKILL to the process with PID 367. Sending signal to process via keyboard Signals can be sent to a running process by pressing some specific keys. For example, pressing Ctrl+C sends SIGINT to the foreground process, which terminates it by default. Sending signal to process via another process A process can send a signal to another process via the kill() system call. In this use, signals can be employed as a synchronization technique, or even as a primitive form of interprocess communication (IPC). It is also possible for a process to send a signal to itself. The int kill(pid_t pid, int sig) system call takes two arguments: the PID of the process you wish to send the signal to and the number of the signal to send.","title":"Sending signals to process"},{"location":"level102/system_calls_and_signals/signals/#handling-signals","text":"Referring to the table of signals in the previous section, you can see that there are default actions associated with all signals when the program is started. When we invoke signal to attach our own handler, we are overriding the default behaviour of the program in response to that signal. Specifically, if we attach a handler to SIGINT, the program will no longer terminate when you press Ctrl+C (or send the program a SIGINT by any other means); rather, the function specified as the handler will be invoked instead, and it defines the behaviour of the program in response to that signal. Let\u2019s take an example of handling the SIGINT signal. We will use Python\u2019s signal library to achieve this. When we press Ctrl+C, the SIGINT signal is sent. From the signals table, we see that the default action for SIGINT is to terminate the process. To illustrate how the process reacts to the default action and a signal handler, let us consider the below example. Default Action of SIGINT: Let us first run the below lines in a python environment: while 1: continue Now let us press \"Ctrl+C\". 
On pressing \"Ctrl+C\" , a SIGINT interrupt is sent to the process and the default action for SIGINT as per the table we saw in the previous section is to terminate the process. We see that the while loop is terminated and we get the below on our console: ^CTraceback (most recent call last): File \"\", line 2, in KeyboardInterrupt The process terminated (default action) since it received a SIGINT (Keyboard Interrupt) when we pressed Ctrl+C. Signal Handler for SIGINT: Let us run the below lines of code in the Python environment. import signal import sys #Start of signal_handler function def signal_handler(signal, frame): print ('You pressed Ctrl+C!') # End of signal_handler function signal.signal(signal.SIGINT, signal_handler) This is an example of a program that defines its own signal handler for SIGINT , overriding the default action. Now let us run the while and continue statement as we did previously. while 1: continue Do we see any changes when Ctrl+C is pressed? Does the program terminate? We see the below output: ^CYou pressed Ctrl+C! Every time we press Ctrl+C, we just see the above message and the program does not terminate. To regain control of the terminal, you can press Ctrl+Z, which sends the SIGTSTP signal, whose default action is to stop (suspend) the process; the stopped process can then be terminated by sending it SIGKILL with the kill command. In the case of the signal handler, we define a function signal_handler() which prints \u201cYou pressed Ctrl+C!\u201d and does not terminate the program. The handler is called with two arguments, the signal number and the current stack frame (None or a frame object ). signal.signal() allows defining custom handlers to be executed when a signal is received. 
Its two arguments are the signal number (or name) you want to trap and the signal handler function.","title":"Handling signals"},{"location":"level102/system_calls_and_signals/signals/#role-of-signals-in-system-calls-with-the-example-of-wait","text":"The wait() system call waits for one of the children of the calling process to terminate and returns the termination status of that child in the buffer pointed to by statusPtr . If the parent process calls the wait() system call, then the execution of the parent is suspended until the child is terminated. At the termination of the child, a SIGCHLD signal is generated which is delivered to the parent by the kernel. The SIGCHLD signal indicates to the parent that there is some information on the child that needs to be collected. The parent, on receipt of SIGCHLD , reaps the status of the child from the process table. Even though the child is terminated, there is still an entry in the process table corresponding to the child, in which its PID and exit status are stored. When the parent collects the status, this entry is deleted. Thus, all the traces of the child process are removed from the system.","title":"Role of signals in system calls with the example of wait()"},{"location":"level102/system_calls_and_signals/signals/#zombie-and-orphane-states","text":"If the parent decides not to wait for the child\u2019s termination and it executes its subsequent task, or fails to read the exit status of the child, there remains an entry in the process table even after the termination of the child. This state of the child process is known as the Zombie state. In order to avoid long-lasting zombies, we need to have code that calls wait() after the child process is created. It is generally good to create a signal handler for the SIGCHLD signal, calling one of the wait-family functions in a loop, until no uncollected child data remains. 
A child process becomes orphaned if its parent process terminates before the child. The orphaned child is adopted by init/systemd, the ancestor of all processes, whose process ID is 1. Subsequent calls to fetch the parent PID of this process return 1.","title":"Zombie and Orphan States"},{"location":"level102/system_calls_and_signals/system_calls/","text":"Introduction A system call is a controlled entry point into the kernel, allowing a process to request the kernel to perform some action on the process\u2019s behalf. The kernel makes a range of services accessible to programs via the system call application programming interface (API). Application developers often do not have direct access to the system calls, but can access them through this API. These services include, for example, creating a new process, performing I/O, and creating a pipe for interprocess communication. The set of system calls is fixed. Each system call is identified by a unique number. The list of different system calls can be found here . A system call changes the processor state from user mode to kernel mode, so that the CPU can access protected kernel memory. Each system call may have a set of arguments that specify information to be transferred from user space (i.e., the process\u2019s virtual address space) to kernel space and vice versa. From a programming point of view, invoking a system call looks much like calling a C function. Types of system calls There are five main types of system calls: Process Control: These system calls are used to handle tasks related to a process such as process creation, termination, etc. File Management: These system calls are used for operations on files such as reading/writing a file. Device Management: These system calls are used to deal with devices such as reading/writing into device buffers. Information Maintenance: These system calls handle information and its transfer between the operating system and the user program. 
Communication: These system calls are useful for inter-process communication. They are also used for creating and deleting a communication connection. Types Of System Calls Examples in Linux Process Control fork(),exit(),wait() File Management open(), read(),write() Device Management ioctl(),read(),write() Information Maintenance getpid(),alarm(),sleep() Communication pipe(),shmget(),mmap() User mode, kernel mode and their transitions Modern processor architectures typically allow the CPU to operate in at least two different modes: user mode and kernel mode . Correspondingly, areas of virtual memory can be marked as being part of user space or kernel space. When running in user mode, the CPU can access only memory that is marked as being in user space; attempts to access memory in kernel space result in a hardware exception. At any given time, a process may be executing in either user mode or kernel mode. The type of instructions that can be executed depends on the mode and this is enforced at the hardware level. CPU modes (also called processor modes, CPU states, CPU privilege levels) are operating modes for the central processing unit of some computer architectures that place restrictions on the type and scope of operations that can be performed by certain processes being run by the CPU. The kernel itself is not a process but a process manager. The kernel model assumes that processes that require a kernel service use specific programming constructs called system calls. When a program is executed in user mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in kernel mode, however, these restrictions no longer apply. A program usually executes in user mode and switches to kernel mode only when requesting a service provided by the kernel. 
If an application needs access to hardware resources on the system (like peripherals, memory, or disks), it must issue a system call, which causes a switch from user mode to kernel mode. This procedure is followed when reading/writing from/to files, etc. It is only the system call itself which runs in kernel mode, not the application code. When the system call is complete, the process returns to user mode with the return value through the reverse switch. Apart from system calls, kernel routines can be activated in the below ways as well: The CPU executing the process signals an exception , which is an unusual condition such as an invalid instruction. The kernel handles the exception on behalf of the process that caused it. A peripheral device issues an interrupt signal to the CPU to notify it of an event such as a request for attention, a status change, or the completion of an I/O operation. Each interrupt signal is dealt with by a kernel program called an interrupt handler . Since peripheral devices operate asynchronously with respect to the CPU, interrupts occur at unpredictable times. A kernel thread is executed. Since it runs in kernel mode, the corresponding program must be considered part of the kernel. In the above diagram, Process 1 in User Mode issues a system call, after which the process switches to Kernel Mode and the system call is serviced. Process 1 then resumes execution in User Mode until a timer interrupt occurs and the scheduler is activated in Kernel Mode. A process switch takes place and Process 2 starts its execution in User Mode until a hardware device raises an interrupt. As a consequence of the interrupt, Process 2 switches to Kernel Mode and services the interrupt. Working of write() system call The write() system call writes data to an open file. 
Its prototype, declared in unistd.h, is ssize_t write(int fd, const void *buffer, size_t count); here, buffer is the address of the data to be written; count is the number of bytes to write from buffer; and fd is a file descriptor referring to the file to which data is to be written. The write() call writes up to count bytes from buffer to the open file referred to by fd . On success, write() returns the number of bytes actually written, which may be less than count; on error, it returns -1. When performing I/O on a disk file, a successful return from write() doesn\u2019t guarantee that the data has been transferred to disk, because the kernel performs buffering of disk I/O in order to reduce disk activity and expedite write() calls. It simply copies data between a user-space buffer and a buffer in the kernel buffer cache. At some later point, the kernel writes (flushes) its buffer to the disk. If, in the interim, another process attempts to read these bytes of the file, then the kernel automatically supplies the data from the buffer cache, rather than from (the outdated contents of) the file. The aim of this design is to allow write() to be fast, since it doesn\u2019t need to wait on a (slow) disk operation. This design is also efficient, since it reduces the number of disk transfers that the kernel must perform. Debugging in Linux with strace strace is a tool used to trace the transition between user processes and the Linux kernel. In order to use the tool, we need to ensure that it is installed in the system by running the command: $ rpm -qa | grep -i strace strace-4.12-9.el7.x86_64 If the above command does not give any output, you can install the tool via: $ yum install strace The functions which are a part of the standard C library are known as library functions. The purposes of these functions are very diverse, including such tasks as opening a file, converting a time to a human-readable format, and comparing two character strings. Some library functions are layered on top of system calls. 
Often, library functions are designed to provide a more caller-friendly interface than the underlying system call. For example, the printf() function provides output formatting and data buffering, whereas the write() system call just outputs a block of bytes.The most commonly used implementation of the standard C library on Linux is the GNU C library glibc . The C programming language gives printf() that lets the user write data in many different formats. So printf() as a function converts your data into a formatted sequence of bytes and that calls write() to write those bytes onto the output. Let us examine what happens when a printf() statement is executed with the use of strace command : strace printf %s \u201cHello world\u201d ~]$ strace printf %s \"Hello world\" execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 brk(NULL) = 0x90d000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672f000 access(\"/etc/ld.so.preload\", R_OK) = -1 ENOENT (No such file or directory) open(\"/etc/ld.so.cache\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=98854, ...}) = 0 mmap(NULL, 98854, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fc6716000 close(3) = 0 open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2156160, ...}) = 0 mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8fc6141000 mprotect(0x7f8fc6304000, 2097152, PROT_NONE) = 0 mmap(0x7f8fc6504000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f8fc6504000 mmap(0x7f8fc650a000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8fc650a000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6715000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6713000 arch_prctl(ARCH_SET_FS, 0x7f8fc6713740) = 0 mprotect(0x7f8fc6504000, 16384, PROT_READ) = 0 mprotect(0x60a000, 4096, PROT_READ) = 0 mprotect(0x7f8fc6730000, 4096, PROT_READ) = 0 munmap(0x7f8fc6716000, 98854) = 0 brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 open(\"/usr/lib/locale/locale-archive\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106075056, ...}) = 0 mmap(NULL, 106075056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fbfc17000 close(3) = 0 open(\"/usr/share/locale/locale.alias\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 read(3, \"# Locale name alias data base.\\n#\"..., 4096) = 2502 read(3, \"\", 4096) = 0 close(3) = 0 munmap(0x7f8fc672e000, 4096) = 0 open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 write(1, \"Hello world\", 11Hello world) = 11 close(1) = 0 munmap(0x7f8fc672e000, 4096) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/ 47 vars /]) = 0 The first system call made is execve() and does three things: The operating system (OS) stops the duplicated process (of the parent). OS loads up the new program (in this case: printf() ), and starts the new program. execve() replaces defining parts of the current process's memory stack with the new stuff loaded from the printf executable file. The first word of the line, execve, is the name of the system call being executed. The first parameter must be the path of a binary executable or a script. The second is an array of argument strings passed to the new program. 
By convention, the first of these strings should contain the filename associated with the file being executed. The third parameter is the array of environment variables passed to the new program. The number after the = sign (which is 0 in this case) is the value returned by the execve system call, which indicates that the call was successful. open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) In this line, the program tried to open() the file /usr/lib/locale/UTF-8/LC_CTYPE . However, the system call failed (with -1 status) with the descriptive error message No such file or directory , as the file wasn\u2019t found (ENOENT). brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 The system call brk() is used to increase or decrease the process\u2019s data segment. It returns the new address where the data segment of the process is to end. open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 In the above lines of the console output, we see that a successful open() call is made, followed by a read() system call. In open() , the first parameter is the path to the file you want to use and the second parameter defines the access mode and flags. In this example, the flags are O_RDONLY, which opens the file read-only, and O_CLOEXEC, which enables the close-on-exec flag for the opened file. This is useful to avoid race conditions in multithreaded programs where one thread opens a file descriptor at the same time as another thread calls fork() and execve(). 3 is the file descriptor returned by open() for the file. Since file descriptors 0, 1, and 2 are already taken by stdin, stdout, and stderr, the first unused file descriptor in the file descriptor table is 3. In read() , the first parameter is the file descriptor, which is 3 (the file was opened using this file descriptor by open() ). The second parameter is the buffer into which the data is read and the third parameter is the maximum number of bytes to read. 
The return value is 832, which is the number of bytes read. close(3) = 0 The close() system call is used to close a file descriptor. For most file systems, a program terminates access to a file in a filesystem using the close system call. 0 after the = sign indicates that the system call was successful. write(1, \"Hello world\", 11Hello world) = 11 In the previous section, we described the write() system call and the arguments that it takes. The output we see on the screen is written to stdout through fd 1, which here refers to the terminal. The first parameter is the file descriptor , the second parameter is the buffer containing the information to be written, and the last parameter is the number of bytes to write. On success, the number of bytes written is returned (zero indicates nothing was written), which is 11 in this example. +++ exited with 0 +++ This indicates that the program exited successfully with exit code 0. An exit code of 0 generally indicates successful execution and termination in Linux programs. You don't need to memorize all the system calls or what they do, because you can refer to documentation when you need to. Ensure the following package is installed before running the man command: $ rpm -qa | grep -i man-pages man-pages-3.53-5.el7.noarch Run the following man command with the system call name to see the documentation for that system call (for example, execve): man 2 execve Apart from system calls, strace can be used to detect the files that are being accessed by the program. In the above trace, we have a system call open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 which opens the libc shared object /lib64/libc.so.6 , the C implementation of various standard functions. It is the file where we see the printf() definition needed for printing Hello World . Strace can also be used to check if a program is hanging or stuck. 
When we have the trace, we can observe at which operation the program is stuck as well. Furthermore, as we go through the trace, we can also find errors (if there are any) that point out why the program is hung/stuck. Strace can be very helpful in finding the reason behind slow performance of a program. Although strace has the aforementioned uses, it is not a good choice for tracing in a production environment, because it introduces a substantial amount of overhead. According to a performance test conducted by Arnaldo Carvalho de Melo, a senior software engineer at Red Hat, the process traced using strace ran 173 times slower, which can be disastrous for a production environment.","title":"System Calls"},{"location":"level102/system_calls_and_signals/system_calls/#introduction","text":"A system call is a controlled entry point into the kernel, allowing a process to request the kernel to perform some action on the process\u2019s behalf. The kernel makes a range of services accessible to programs via the system call application programming interface (API). Application developers often do not have direct access to the system calls, but can access them through this API. These services include, for example, creating a new process, performing I/O, and creating a pipe for interprocess communication. The set of system calls is fixed. Each system call is identified by a unique number. The list of different system calls can be found here . A system call changes the processor state from user mode to kernel mode, so that the CPU can access protected kernel memory. Each system call may have a set of arguments that specify information to be transferred from user space (i.e., the process\u2019s virtual address space) to kernel space and vice versa. 
From a programming point of view, invoking a system call looks much like calling a C function.","title":"Introduction"},{"location":"level102/system_calls_and_signals/system_calls/#types-of-system-calls","text":"There are five main types of system calls: Process Control: These system calls are used to handle tasks related to a process such as process creation, termination, etc. File Management: These system calls are used for operations on files such as reading/writing a file. Device Management: These system calls are used to deal with devices such as reading/writing into device buffers. Information Maintenance: These system calls handle information and its transfer between the operating system and the user program. Communication: These system calls are useful for inter-process communication. They are also used for creating and deleting a communication connection. Types Of System Calls Examples in Linux Process Control fork(),exit(),wait() File Management open(), read(),write() Device Management ioctl(),read(),write() Information Maintenance getpid(),alarm(),sleep() Communication pipe(),shmget(),mmap()","title":"Types of system calls"},{"location":"level102/system_calls_and_signals/system_calls/#user-mode-kernel-mode-and-their-transitions","text":"Modern processor architectures typically allow the CPU to operate in at least two different modes: user mode and kernel mode . Correspondingly, areas of virtual memory can be marked as being part of user space or kernel space. When running in user mode, the CPU can access only memory that is marked as being in user space; attempts to access memory in kernel space result in a hardware exception. At any given time, a process may be executing in either user mode or kernel mode. The type of instructions that can be executed depends on the mode and this is enforced at the hardware level. 
CPU modes (also called processor modes, CPU states, CPU privilege levels) are operating modes for the central processing unit of some computer architectures that place restrictions on the type and scope of operations that can be performed by certain processes being run by the CPU. The kernel itself is not a process but a process manager. The kernel model assumes that processes that require a kernel service use specific programming constructs called system calls. When a program is executed in user mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in kernel mode, however, these restrictions no longer apply. A program usually executes in user mode and switches to kernel mode only when requesting a service provided by the kernel. If an application needs access to hardware resources on the system (such as peripherals, memory or disks), it must issue a system call, which causes a context switch from user mode to kernel mode. This procedure is followed when reading/writing from/to files, etc. It is only the system call itself which runs in kernel mode, not the application code. When the system call is complete, the process returns to user mode with the return value using an inverse context switch. Apart from system calls, kernel routines can be activated in the following ways as well: The CPU executing the process signals an exception, which is an unusual condition such as an invalid instruction. The kernel handles the exception on behalf of the process that caused it. A peripheral device issues an interrupt signal to the CPU to notify it of an event such as a request for attention, a status change, or the completion of an I/O operation. Each interrupt signal is handled by a kernel program called an interrupt handler. Since peripheral devices operate asynchronously with respect to the CPU, interrupts occur at unpredictable times. A kernel thread is executed. 
Since it runs in Kernel Mode, the corresponding program must be considered part of the kernel. In the above diagram, Process 1 in User Mode issues a system call, after which the process switches to Kernel Mode and the system call is serviced. Process 1 then resumes execution in User Mode until a timer interrupt occurs and the scheduler is activated in Kernel Mode. A process switch takes place and Process 2 starts its execution in User Mode until a hardware device raises an interrupt. As a consequence of the interrupt, Process 2 switches to Kernel Mode and services the interrupt.","title":"User mode, kernel mode and their transitions"},{"location":"level102/system_calls_and_signals/system_calls/#working-of-write-system-call","text":"The write() system call writes data to an open file. #include <unistd.h> ssize_t write(int fd, const void *buffer, size_t count); buffer is the address of the data to be written; count is the number of bytes to write from buffer; and fd is a file descriptor referring to the file to which data is to be written. A write() call writes up to count bytes from buffer to the open file referred to by fd. On success, write() returns the number of bytes actually written, which may be less than count; on error, it returns -1. When performing I/O on a disk file, a successful return from write() doesn\u2019t guarantee that the data has been transferred to disk, because the kernel performs buffering of disk I/O in order to reduce disk activity and expedite write() calls. It simply copies data between a user-space buffer and a buffer in the kernel buffer cache. At some later point, the kernel writes (flushes) its buffer to the disk. If, in the interim, another process attempts to read these bytes of the file, then the kernel automatically supplies the data from the buffer cache, rather than from (the outdated contents of) the file. The aim of this design is to allow write() calls to be fast, since they don\u2019t need to wait on a (slow) disk operation. 
This design is also efficient, since it reduces the number of disk transfers that the kernel must perform.","title":"Working of write() system call"},{"location":"level102/system_calls_and_signals/system_calls/#debugging-in-linux-with-strace","text":"strace is a tool used to trace the transition between user processes and the Linux kernel. In order to use the tool, we first need to check that it is installed on the system by running the command: $ rpm -qa | grep -i strace strace-4.12-9.el7.x86_64 If the above command does not give any output, you can install the tool via: $ yum install strace The functions which are a part of the standard C library are known as library functions. The purposes of these functions are very diverse, including such tasks as opening a file, converting a time to a human-readable format, and comparing two character strings. Some library functions are layered on top of system calls. Often, library functions are designed to provide a more caller-friendly interface than the underlying system call. For example, the printf() function provides output formatting and data buffering, whereas the write() system call just outputs a block of bytes. The most commonly used implementation of the standard C library on Linux is the GNU C library (glibc). The C standard library provides printf(), which lets the user write data in many different formats. printf() converts your data into a formatted sequence of bytes and then calls write() to write those bytes to the output. 
Let us examine what happens when a printf() statement is executed with the use of strace command : strace printf %s \u201cHello world\u201d ~]$ strace printf %s \"Hello world\" execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 brk(NULL) = 0x90d000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672f000 access(\"/etc/ld.so.preload\", R_OK) = -1 ENOENT (No such file or directory) open(\"/etc/ld.so.cache\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=98854, ...}) = 0 mmap(NULL, 98854, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fc6716000 close(3) = 0 open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2156160, ...}) = 0 mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8fc6141000 mprotect(0x7f8fc6304000, 2097152, PROT_NONE) = 0 mmap(0x7f8fc6504000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f8fc6504000 mmap(0x7f8fc650a000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8fc650a000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6715000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc6713000 arch_prctl(ARCH_SET_FS, 0x7f8fc6713740) = 0 mprotect(0x7f8fc6504000, 16384, PROT_READ) = 0 mprotect(0x60a000, 4096, PROT_READ) = 0 mprotect(0x7f8fc6730000, 4096, PROT_READ) = 0 munmap(0x7f8fc6716000, 98854) = 0 brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 open(\"/usr/lib/locale/locale-archive\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106075056, ...}) = 0 mmap(NULL, 106075056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8fbfc17000 close(3) = 0 open(\"/usr/share/locale/locale.alias\", O_RDONLY|O_CLOEXEC) = 3 fstat(3, 
{st_mode=S_IFREG|0644, st_size=2502, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 read(3, \"# Locale name alias data base.\\n#\"..., 4096) = 2502 read(3, \"\", 4096) = 0 close(3) = 0 munmap(0x7f8fc672e000, 4096) = 0 open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fc672e000 write(1, \"Hello world\", 11Hello world) = 11 close(1) = 0 munmap(0x7f8fc672e000, 4096) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ execve(\"/usr/bin/printf\", [\"printf\", \"%s\", \"Hello world\"], [/* 47 vars */]) = 0 The first system call made is execve(), which does three things: The operating system (OS) stops the duplicated process (of the parent). The OS loads up the new program (in this case: printf), and starts the new program. execve() replaces the defining parts of the current process's memory image with the contents loaded from the printf executable file. The first word of the line, execve, is the name of the system call being executed. The first parameter must be the path of a binary executable or a script. The second is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed. The third parameter is an array of environment strings passed to the new program, which strace abbreviates here as a count of 47 variables. The number after the = sign (which is 0 in this case) is the value returned by the execve system call, which indicates that the call was successful. open(\"/usr/lib/locale/UTF-8/LC_CTYPE\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) In this line, the program tried to open() the file /usr/lib/locale/UTF-8/LC_CTYPE. However, the system call failed (with -1 status) with the descriptive error message No such file or directory, as the file wasn\u2019t found (ENOENT). 
brk(NULL) = 0x90d000 brk(0x92e000) = 0x92e000 brk(NULL) = 0x92e000 The system call brk() is used to increase or decrease the process\u2019s data segment. It returns the new address where the data segment of the process is to end. open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 read(3, \"\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0>\\0\\1\\0\\0\\0\\20&\\2\\0\\0\\0\\0\\0\"..., 832) = 832 In the above lines of the console output, we see that a successful open() call is made, followed by a read() system call. In open(), the first parameter is the path to the file which you want to use and the second parameter specifies the access mode and flags. In this example, the flags are O_RDONLY, which opens the file read-only, and O_CLOEXEC, which enables the close-on-exec flag for the opened file. This is useful to avoid race conditions in multithreaded programs where one thread opens a file descriptor at the same time as another thread calls fork() and execve(). The return value 3 is the file descriptor assigned to the opened file. Since file descriptors 0, 1 and 2 are already taken by stdin, stdout and stderr, the first unused file descriptor in the file descriptor table is 3. If open() In read(), the first parameter is the file descriptor, which is 3 (the file was opened using this file descriptor by open()). The second parameter is the buffer into which the data is read and the third parameter is the length of the buffer. The return value is 832, which is the number of bytes read. close(3) = 0 The close system call asks the kernel to close a file descriptor. For most file systems, a program terminates access to a file in a filesystem using the close system call. 0 after the = sign indicates that the system call was successful. write(1, \"Hello world\", 11Hello world) = 11 In the previous section, we described the write() system call and the arguments that it takes. Whenever we see any output to the video screen, it\u2019s from the file named /dev/tty and written to stdout on screen through fd 1. 
The first parameter is the file descriptor, the second parameter is the buffer containing the information to be written and the last parameter contains the count of characters. On success, the number of bytes written is returned (zero indicates nothing was written), which is 11 in this example. +++ exited with 0 +++ This indicates that the program exited successfully with exit code 0. An exit code of 0 generally indicates successful execution and termination in Linux programs. You don't need to memorize all the system calls or what they do, because you can refer to documentation when you need to. Ensure the following package is installed before running the man command: $ rpm -qa | grep -i man-pages man-pages-3.53-5.el7.noarch Run the following man command with the system call name to see the documentation for that system call (e.g., execve): man 2 execve Apart from system calls, strace can be used to detect the files that are being accessed by the program. In the above trace, we have a system call open(\"/lib64/libc.so.6\", O_RDONLY|O_CLOEXEC) = 3 which is opening the libc shared object /lib64/libc.so.6, which is the C implementation of various standard functions. It is the file where we see the printf() definition needed for printing Hello World. Strace can also be used to check if a program is hanging or stuck. When we have the trace, we can observe at which operation the program is stuck as well. Furthermore, as we go through the trace, we can also find errors (if there are any) that point out why the program is hung or stuck. Strace can be very helpful in finding the reason behind slow performance of a program. Although strace has these uses, it is not a good choice for tracing in a production environment, because it introduces a substantial amount of overhead. 
According to a performance test conducted by Arnaldo Carvalho de Melo, a senior software engineer at Red Hat, the process traced using strace ran 173 times slower, which can be disastrous for a production environment.","title":"Debugging in Linux with strace"},{"location":"level102/system_design/conclusion/","text":"We have looked at designing a system from scratch, scaling it up from a single server to multiple datacenters and hundreds of thousands of users. However, you might have (rightly!) guessed that there is a lot more to system design than what we have covered so far. This course should give you a sweeping glance at the things that are fundamental to any system design process. The specific solutions implemented and the frameworks and orchestration systems used evolve rapidly. However, the guiding principles remain the same. We hope this course helped you get started in the right direction and that you have fun designing systems and solving interesting problems.","title":"Conclusion"},{"location":"level102/system_design/intro/","text":"System Design Prerequisites School of SRE - System Design - Phase I What to expect from this course The aim is to empower the reader to understand the building blocks of a well-designed system, evaluate existing systems, understand the trade-offs, come up with their own design, and to explore the various tools available to implement such a system. In phase one of this module, we talked about the fundamentals of system design including concepts like scalability, availability and reliability. We continue to build on those fundamentals in this phase. Throughout the course, there are callout sections that appear like this, and talk about things that are closely related to the system design process, but don\u2019t form a part of the system itself. They also have information about some common issues that crop up in system design. Watch out for them. 
What is not covered under this course While this course covers many aspects of system design, it does not cover the most fundamental concepts. For such topics, it is advised to go through the prerequisites. In general, this module will not go into actually implementing the architecture - we will not talk about choosing a hosting/cloud provider or an orchestration setup or a CI/CD system. Instead, we try to focus on the fundamental considerations that need to go into system design. Course Contents Introduction Large system Design Scaling Scaling beyond the datacentre Design patterns for resiliency Conclusion Introduction We talked about building a basic photo sharing application in the previous phase of this course. Our basic requirements for the application were that It should work for a reasonably large number of users Avoid service failures/cluster crash in case of any issues In other words, we wanted to build a system that was available, scalable and fault tolerant. We will continue designing that application, and cover additional concepts in the course of doing so. The photo sharing application is a web application that will handle everything from user sign up, log in, uploads, feed generation, user interaction and interaction with uploaded content. Also a database to store this information. In the simplest design, both the web app and the database can run on the same server. Recall this initial design from Phase 1. Building on that, we will talk about performance elements in system design - setting the right performance measurement metrics and using them to drive our design decisions, improving performance using caching, Content Delivery Networks (CDNs), etc. We will also talk about how to design for resilience by looking at some system design patterns - graceful degradation, time-outs and circuit breakers. Cost System design considerations like availability, scalability cannot exist in isolation. 
When operating outside the lab, we have other considerations, and the existing considerations take on a different hue. One such consideration is cost. Real world systems almost always have budget constraints. System design, implementation and continued operation need to have predictable costs per unit output. The output is usually the business problem you are trying to solve. Striking a balance between the two is very important. Understanding the capabilities of your system A well designed system requires understanding the building blocks intimately in terms of their capabilities. Not all components are created equal, and understanding what a single component can do is very important. For example, in the photo upload application it is important to know what a single database instance is capable of, in terms of read or write transactions per second, and what a reasonable expectation would be. This helps in building systems that are appropriately weighted and eliminates obvious sources of bottlenecks. On a lower level, even understanding the capabilities of the underlying hardware (or a VM instance if you are on cloud) is important. For example, not all disks perform the same, and not all disks perform the same per dollar. If we are planning to have an API that is expected to return a response in 100ms under normal circumstances, then it is important to know how much of it will be spent in which parts of the system. The following link will help in getting a sense of each component\u2019s performance, all the way from the CPU cache to the network link to our end user. 
Numbers every programmer should know","title":"Introduction"},{"location":"level102/system_design/intro/#system-design","text":"","title":"System Design"},{"location":"level102/system_design/intro/#prerequisites","text":"School of SRE - System Design - Phase I","title":"Prerequisites"},{"location":"level102/system_design/intro/#what-to-expect-from-this-course","text":"The aim is to empower the reader to understand the building blocks of a well-designed system, evaluate existing systems, understand the trade-offs, come up with their own design, and to explore the various tools available to implement such a system. In phase one of this module, we talked about the fundamentals of system design including concepts like scalability, availability and reliability. We continue to build on those fundamentals in this phase. Throughout the course, there are callout sections that appear like this, and talk about things that are closely related to the system design process, but don\u2019t form a part of the system itself. They also have information about some common issues that crop up in system design. Watch out for them.","title":"What to expect from this course"},{"location":"level102/system_design/intro/#what-is-not-covered-under-this-course","text":"While this course covers many aspects of system design, it does not cover the most fundamental concepts. For such topics, it is advised to go through the prerequisites. In general, this module will not go into actually implementing the architecture - we will not talk about choosing a hosting/cloud provider or an orchestration setup or a CI/CD system. 
Instead, we try to focus on the fundamental considerations that need to go into system design.","title":"What is not covered under this course"},{"location":"level102/system_design/intro/#course-contents","text":"Introduction Large system Design Scaling Scaling beyond the datacentre Design patterns for resiliency Conclusion","title":"Course Contents"},{"location":"level102/system_design/intro/#introduction","text":"We talked about building a basic photo sharing application in the previous phase of this course. Our basic requirements for the application were that It should work for a reasonably large number of users Avoid service failures/cluster crash in case of any issues In other words, we wanted to build a system that was available, scalable and fault tolerant. We will continue designing that application, and cover additional concepts in the course of doing so. The photo sharing application is a web application that will handle everything from user sign up, log in, uploads, feed generation, user interaction and interaction with uploaded content. Also a database to store this information. In the simplest design, both the web app and the database can run on the same server. Recall this initial design from Phase 1. Building on that, we will talk about performance elements in system design - setting the right performance measurement metrics and using them to drive our design decisions, improving performance using caching, Content Delivery Networks (CDNs), etc. We will also talk about how to design for resilience by looking at some system design patterns - graceful degradation, time-outs and circuit breakers.","title":"Introduction"},{"location":"level102/system_design/large-system-design/","text":"Designing a system usually starts out to be abstract - we have large functional blocks that need to work together and are abstracted away into frontend, backend and database layers. 
However, when it is time to implement the system, especially as an SRE, we have no other choice but to think in specific terms. Servers have a fixed amount of memory, storage capacity and processing power. So we need to think about the realistic expectations from our system, assess the requirements, and translate them into specific requirements from each component of the system like network, storage and compute. This is typically how almost all large scale systems are built. The folks over at Google have formalized this approach to designing systems as \u2018Non abstract large system design\u2019 (NALSD). According to the Google site reliability workbook, \u201cPractically, NALSD combines elements of capacity planning, component isolation, and graceful system degradation that are crucial to highly available production systems.\u201d We will be using an approach similar to this to build our system. Application requirements Let us define our application requirements in more concrete terms i.e., specific functions: Our photo sharing application must let the user Sign up to become a member, and login to the application Upload photographs, and optionally add a description and tag location and/or people Follow other users on the platform See a feed comprising photos from other users that they follow View their own profile page, and manage who they follow Let us define expectations for the application\u2019s performance for a better user experience. We also need to define the health of the system. SLIs and SLOs help us do just that. SLIs and SLOs The Google SRE book defines service level indicator (SLI) as \u201ca carefully defined quantitative measure of some aspect of the level of service that is provided.\u201d For our application, we can define multiple SLIs. One indicator can be the response time for loading the feed for our photo sharing application. 
Picking the right set of SLIs is very important since they essentially help us define the health of the system as a whole using concrete data. SLIs for an application are defined by the owners of the service, in consultation with the SREs. Service level objective (SLO) is defined as \u201ca target value or range of values for a service level that is measured by an SLI\u201d. SLO is a way for us to anchor ourselves to an optimal user experience by defining SLI boundaries. If our application takes a long time to load the feed, users might not open it very often. As a result, an example of SLO can be that at least 99% of the users should see their feed loaded within 1 second. Now that we have defined SLIs and SLOs, let us define the application\u2019s scalability, reliability and performance characteristics in terms of specific SLI and SLO levels. Defining application requirements in terms of SLIs and SLOs The following can be some of the expectations for our application: Once the user successfully uploads the image, it should be accessible to the user and their followers 100% of the time, barring user elected deletion. At least 50000 unique visitors should be able to visit the site at any given time and view their feed. 99% of the users should be able to view their feeds in less than 1 second. Upon uploading a new image, it should show up in the feed of the user\u2019s followers within 15 minutes. Users should be able to upload potentially thousands of images. (as long as they are not abusing the service) Since our ultimate aim is to learn system design, we will arbitrarily limit the functionalities of the system. This will help us keep sight of our aim, and keep us focussed. Having defined the functionalities and expectations for our system, let us quickly sketch an initial design. As of now, all the functionalities are residing on a single server, which has endpoints for all of these functions. 
We will attempt to build a system that satisfies our SLOs, is able to serve 50k concurrent users, and about a million total users. In the course of this attempt, we will discuss a string of concepts, some of which we have already seen in Phase 1 of this course. Caution Note that the numbers we have picked in the following sections are completely arbitrary. They have been chosen to demonstrate thinking about system design in a non-abstract manner. They have not been benchmarked, and bear no real world resemblance. Do not use them in any real world systems that you may be designing. You should come up with your own numbers, using the guiding principles we have relied upon here. Estimating resource requirements Single Computer If we wished to run the application on a single server, we would need to perform all the above functionalities from the diagram on this server itself. Let us perform some calculations to figure out what kind of resources we will need. Before anything else, we need to store the data about users, their uploads, follower information and any other metadata. We will choose a relational DB to store this information, like MySQL. Do note that we can also choose to use a NOSQL solution here. That would require a similar approach to calculate the requirements. Let us represent the users like so: UserID(int) UserName(varchar) DisplayName(varchar) YearOfBirth(year) Email(varchar) Photos can be represented like this: PhotoID(int) PhotoHash(varchar) Uploadtime(datetime) Location(varchar) OptionalFlag(varchar) Followers can be represented like this: Follower(int) Followee(int) Let us quickly estimate the storage needed for a hundred million total users. A single user would need 4B + 32B + 32B + 4B + 32B = 104 bytes. A hundred million users would need 10.4 GB storage. A single photo would need about 4B + 20B + 4B + 32B + 4B = 64 bytes of storage to store the metadata related to the photo. 
Assuming a million photos uploaded in one day, we would need about 64 MB of storage per day, just for the metadata. For the photo storage itself, we will need about 300GB per day, assuming 300KB average photo size. A single visitor opening our application simply hits our /get_feed endpoint upon logging in to the application. Let us quickly calculate the resources needed to serve this request. Assuming the initial feed loads 5 images (of 300 KB size on an average) and then does lazy loading to infinitely scroll, we will need to send about 1.5 megabytes of images to the user for their initial call. A 1000 Mbps* network link to the server carries 125 megabytes per second, so we can serve only about (1000/8)/1.5, or about 83, users all loading the feed in the same second before we saturate the network link. If we needed to serve 50k concurrent users every second, we would need 1.5*50000*8 = 600000 Mbps of network throughput for every 5 images sent, assuming we send out all 5 images in a single second. If we are reading all of it from disk, we would likely hit disk throughput limits far before approaching anywhere near this amount of traffic. So in order to meet our application requirements, we would need a server that has ~310GB storage for the database and the images of one day, and about a 600 Gbps link to serve 50k users concurrently, along with the CPU required to perform all this. Clearly not the task for a single server. And do note that we have severely limited the information we are storing in the database. We would likely need an order of magnitude more information to be stored. While we clearly do not have any real world server that has the resources we calculated above, this exercise provides us some valuable data points about what the resource cost is. Armed with this information, let us work on scaling our system through system design to get us as close as possible to our goals for the application. 
* Modern servers even have multi-gigabit links, but it is highly unlikely that such a huge server will be serving our application alone. Modern cloud providers have VMs that also boast several gigabits of bandwidth, but they usually end up being throttled after certain limits. References: SQL vs NoSQL databases Introducing Non-Abstract Large System Design","title":"Large System Design"},{"location":"level102/system_design/large-system-design/#application-requirements","text":"Let us define our application requirements in more concrete terms i.e., specific functions: Our photo sharing application must let the user Sign up to become a member, and login to the application Upload photographs, and optionally add a description and tag location and/or people Follow other users on the platform See a feed comprising photos from other users that they follow View their own profile page, and manage who they follow Let us define expectations for the application\u2019s performance for a better user experience. We also need to define the health of the system. SLIs and SLOs help us do just that.","title":"Application requirements"},{"location":"level102/system_design/large-system-design/#slis-and-slos","text":"The Google SRE book defines service level indicator (SLI) as \u201ca carefully defined quantitative measure of some aspect of the level of service that is provided.\u201d For our application, we can define multiple SLIs. One indicator can be the response time for loading the feed for our photo sharing application. Picking the right set of SLIs is very important since they essentially help us define the health of the system as a whole using concrete data. SLIs for an application are defined by the owners of the service, in consultation with the SREs. Service level objective (SLO) is defined as \u201ca target value or range of values for a service level that is measured by an SLI\u201d. 
SLO is a way for us to anchor ourselves to an optimal user experience by defining SLI boundaries. If our application takes a long time to load the feed, users might not open it very often. As a result, an example of SLO can be that at least 99% of the users should see their feed loaded within 1 second. Now that we have defined SLIs and SLOs, let us define the application\u2019s scalability, reliability and performance characteristics in terms of specific SLI and SLO levels.","title":"SLIs and SLOs"},{"location":"level102/system_design/large-system-design/#defining-application-requirements-in-terms-of-slis-and-slos","text":"The following can be some of the expectations for our application: Once the user successfully uploads the image, it should be accessible to the user and their followers 100% of the time, barring user elected deletion. At least 50000 unique visitors should be able to visit the site at any given time and view their feed. 99% of the users should be able to view their feeds in less than 1 second. Upon uploading a new image, it should show up in the feed of the user\u2019s followers within 15 minutes. Users should be able to upload potentially thousands of images. (as long as they are not abusing the service) Since our ultimate aim is to learn system design, we will arbitrarily limit the functionalities of the system. This will help us keep sight of our aim, and keep us focussed. Having defined the functionalities and expectations for our system, let us quickly sketch an initial design. As of now, all the functionalities are residing on a single server, which has endpoints for all of these functions. We will attempt to build a system that satisfies our SLOs, is able to serve 50k concurrent users, and about a million total users. 
In the course of this attempt, we will discuss a string of concepts, some of which we have already seen in Phase 1 of this course.","title":"Defining application requirements in terms of SLIs and SLOs"},{"location":"level102/system_design/large-system-design/#estimating-resource-requirements","text":"Single Computer If we wished to run the application on a single server, we would need to perform all the above functionalities from the diagram on this server itself. Let us perform some calculations to figure out what kind of resources we will need. Before anything else, we need to store the data about users, their uploads, follower information and any other metadata. We will choose a relational DB to store this information, like MySQL. Do note that we can also choose to use a NOSQL solution here. That would require a similar approach to calculate the requirements. Let us represent the users like so: UserID(int) UserName(varchar) DisplayName(varchar) YearOfBirth(year) Email(varchar) Photos can be represented like this: PhotoID(int) PhotoHash(varchar) Uploadtime(datetime) Location(varchar) OptionalFlag(varchar) Followers can be represented like this: Follower(int) Followee(int) Let us quickly estimate the storage needed for a hundred million total users. A single user would need 4B + 32B + 32B + 4B + 32B = 104 bytes. A hundred million users would need 10.4 GB storage. A single photo would need about 4B + 20B + 4B + 32B + 4B = 64 bytes of storage to store the metadata related to the photo. Assuming a million photos uploaded in one day, we would need about 64 MB of storage per day, just for the metadata. For the photo storage itself, we will need about 300GB per day, assuming 300KB average photo size. A single visitor opening our application simply hits our /get_feed endpoint upon logging in to the application. Let us quickly calculate the resources needed to serve this request. 
Assuming the initial feed loads 5 images (of 300 KB size on average) and then lazy-loads more as the user scrolls, we will need to send about 1.5 megabytes of images to the user for their initial call. With a 1000 Mbps* network link to the server, we can serve only (1000/8)/1.5, or about 83, users all loading the feed at the same time before we saturate the network link. If we needed to serve 50k concurrent users every second, we would need 1.5*50000*8 = 600000 Mbps of network throughput for every 5 images sent, assuming we send out all 5 images in a single second. If we are reading all of it from disk, we would likely hit disk throughput limits far before approaching anywhere near this amount of traffic. So in order to meet our application requirements, we would need a server that has ~310GB storage for the database and the images of one day, and about a 600 Gbps link to serve 50k users concurrently, along with the CPU required to perform all this. Clearly not the task for a single server. And do note that we have severely limited the information we are storing in the database. We would likely need an order of magnitude more information to be stored. While we clearly do not have any real world server that has the resources we calculated above, this exercise provides us some valuable data points about what the resource cost is. Armed with this information, let us work on scaling our system through system design to get us as close as possible to our goals for the application. * Modern servers even have multi-gigabit links, but it is highly unlikely that such a huge server will be serving our application alone. 
Modern cloud providers have VMs that also boast several gigabits of bandwidth, but they usually end up being throttled after certain limits.","title":"Estimating resource requirements"},{"location":"level102/system_design/large-system-design/#references","text":"SQL vs NoSQL databases Introducing Non-Abstract Large System Design","title":"References:"},{"location":"level102/system_design/resiliency/","text":"A resilient system is one that can keep functioning in the face of adversity. With our application, there can be numerous failures that act as adversities. There can be network level failures that take out entire data centres, there might be issues at the rack level or at the server level, or there might be something wrong with the cloud provider. We may also run out of capacity, or there might be a wrong code push that breaks the system. We will talk about a couple of such issues, and understand how we might design a system to work around such things. In some cases, a workaround might not be possible. However it is still valuable to know potential vulnerabilities to the system stability. Resilient architectures leverage system design patterns such as graceful degradation, quotas, timeouts and circuit breakers. Let us look at some of them in this section. Quotas A system may have a component or an endpoint that is consumed by multiple components and endpoints. It is important to have something in place that will prevent one consumer or client from overwhelming such a system. Quotas are one way to do this - we simply assign a specific quota for each component - by way of specifying requests per unit time. Anyone who breaches the quota is either warned or dropped, depending on the implementation. This way, one of our own systems misbehaving cannot result in denial of service to others. Quotas also help us prevent cascading failures. 
Graceful Degradation When a system with multiple dependencies encounters failure in one of the dependencies, gracefully degrading to minimum viable functionality would be a lot better than grinding the entire system to a halt. For example, let us assume there is an endpoint (a URL for a service or a specific function) in our application whose responsibility is to parse the location information in a user-uploaded image from the image's metadata and provide suggestions for location tagging to the user. If this endpoint fails, rather than failing the entire upload, it is much better to skip over this functionality and still give the user an option to manually tag a location. Graceful degradation is always better than total failure. Timeouts We sometimes call other services or resources like databases or API endpoints in our application. When calling such a resource from our application, it is important to always have a reasonable timeout. It doesn\u2019t necessarily even have to be that the resource will fail for all requests. It just might be that a specific request falls in the high tail latency category. A reasonable timeout is helpful to keep the user experience consistent - in some cases, it is better to fail than to have frustratingly long delays. Exponential back-offs When a service endpoint fails, retries are one way to see if it was a momentary failure. However, if the retry is also going to fail, there is no point in endlessly retrying. At large enough scale, the retries can compete with the new requests (which might very well be served as expected) and saturate the system. To avoid this, we can look at exponential back-off for retries. This essentially decreases the rate at which the clients retry, upon encountering consecutive failures on retries. Circuit breakers While exponential back-off is one way to deal with retry storms, circuit breakers can be another. Circuit breakers can help prevent failures from percolating through the entire system. 
Otherwise, an unmitigated failure that flows through the system may result in false alerts, worsening the mean time to detection (MTTD) and mean time to resolution (MTTR). For example, if one of the in-memory cache nodes fails, requests reach the database after the initial cache timeouts, which might end up overloading the database. If the initial connection between cache node failure and DB node failure is not made, then it might result in increased MTTD of the actual cause and consequently the MTTR. Self healing systems A traditionally load-balanced application with multiple instances might fail when more than a threshold of instances stop responding to requests - either because they are down, or because a sudden huge influx of requests has degraded performance. A self-healing system adds more instances in this scenario to replace the failed instances. Auto-scaling like this can also help when there is a sudden spike in queries. If our application runs on a public cloud, it might simply be a matter of spinning up more virtual machines. If we are running on-premise out of our data center, then we will want to think about capacity planning much more carefully. Regardless of how we handle adding additional capacity, simply adding capacity may not be enough. We should also think about additional potential failure modes that might be encountered. For example, the load balancing layer itself might need scaling up, to handle the influx of new backends. Continuous Deployment and Integration A well designed system also needs to take into account the need for a proper staging setup that can mimic the production environment as closely as possible. 
There should also be a way for us to replay production traffic in the staging environment to test changes to production thoroughly.","title":"Resiliency"},{"location":"level102/system_design/resiliency/#quotas","text":"A system may have a component or an endpoint that is consumed by multiple components and endpoints. It is important to have something in place that will prevent one consumer or client from overwhelming such a system. Quotas are one way to do this - we simply assign a specific quota for each component - by way of specifying requests per unit time. Anyone who breaches the quota is either warned or dropped, depending on the implementation. This way, one of our own systems misbehaving cannot result in denial of service to others. Quotas also help us prevent cascading failures.","title":"Quotas"},{"location":"level102/system_design/resiliency/#graceful-degradation","text":"When a system with multiple dependencies encounters failure in one of the dependencies, gracefully degrading to minimum viable functionality would be a lot better than grinding the entire system to a halt. For example, let us assume there is an endpoint (an URL for a service or a specific function) in our application whose responsibility is to parse the location information in an user uploaded image from the image's metadata and provide suggestions for location tagging to the user. Rather than failing the entire upload, it is much better to skip over this functionality and still give the user an option to manually tag a location. Gracefully degrading is always better compared to total failures.","title":"Graceful Degradation"},{"location":"level102/system_design/resiliency/#timeouts","text":"We sometimes call other services or resources like databases or API endpoints in our application. When calling such a resource from our application, it is important to always have a reasonable timeout. It doesn\u2019t necessarily even have to be that the resource will fail for all requests. 
It just might be that a specific request falls in the high tail latency category. A reasonable timeout is helpful to keep the user experience consistent - in some cases, it is better to fail than to have frustratingly long delays.","title":"Timeouts"},{"location":"level102/system_design/resiliency/#exponential-back-offs","text":"When a service endpoint fails, retries are one way to see if it was a momentary failure. However, if the retry is also going to fail, there is no point in endlessly retrying. At large enough scale, the retries can compete with the new requests (which might very well be served as expected) and saturate the system. To avoid this, we can look at exponential back-off for retries. This essentially decreases the rate at which the clients retry, upon encountering consecutive failures on retries.","title":"Exponential back-offs"},{"location":"level102/system_design/resiliency/#circuit-breakers","text":"While exponential back-off is one way to deal with retry storms, circuit breakers can be another. Circuit breakers can help prevent failures from percolating through the entire system. Otherwise, an unmitigated failure that flows through the system may result in false alerts, worsening the mean time to detection (MTTD) and mean time to resolution (MTTR). For example, if one of the in-memory cache nodes fails, requests reach the database after the initial cache timeouts, which might end up overloading the database. If the initial connection between cache node failure and DB node failure is not made, then it might result in increased MTTD of the actual cause and consequently the MTTR.","title":"Circuit breakers"},{"location":"level102/system_design/resiliency/#self-healing-systems","text":"A traditionally load-balanced application with multiple instances might fail when more than a threshold of instances stop responding to requests - either because they are down, or because a sudden huge influx of requests has degraded performance. 
A self-healing system adds more instances in this scenario to replace the failed instances. Auto-scaling like this can also help when there is a sudden spike in queries. If our application runs on a public cloud, it might simply be a matter of spinning up more virtual machines. If we are running on-premise out of our data center, then we will want to think about capacity planning much more carefully. Regardless of how we handle adding additional capacity, simply adding capacity may not be enough. We should also think about additional potential failure modes that might be encountered. For example, the load balancing layer itself might need scaling up, to handle the influx of new backends.","title":"Self healing systems"},{"location":"level102/system_design/resiliency/#continuous-deployment-and-integration","text":"A well designed system also needs to take into account the need for a proper staging setup that can mimic the production environment as closely as possible. There should also be a way for us to replay production traffic in the staging environment to test changes to production thoroughly.","title":"Continuous Deployment and Integration"},{"location":"level102/system_design/scaling-beyond-the-datacenter/","text":"Caching static assets Extending the existing caching solution a bit, we arrive at Content Delivery Networks (CDNs). CDNs are the caching layer that is closest to the user. A significant chunk of the resources served in a webpage may not change on an hourly or even a daily basis. In those cases, we would want to cache these at the CDN level, reducing our load. CDNs not only help reduce the load on our servers by removing the burden of serving static / bandwidth intensive resources, they also let us be present closer to our users, by way of points of presence (POPs). CDNs also let us do geo-load balancing, in case we have multiple data centres around the world, and would want to serve from the closest data center (DC) possible. 
Taking it a step further With the addition of caching and distributing our application into simpler services, we have solved the problem of scaling to 50000 users. However, our users may be in geographically distributed locations and may not be at the same distance from our data centre or our cloud region. Consistency in user experience is important, else we are excluding users who are far away from our location, potentially eliminating a significant chunk of potential users. However, it is impractical to have data centers all over the world, or even in more than a couple of locations in the world. This is where CDNs and POPs come into the picture. Points of Presence CDN POPs are geographically distributed data centers aimed at being close to users. POPs reduce the round trip time by delivering content from a location that is nearest to the user. POPs typically may not have all the content, but have caching servers that cache the static assets, and fetch the rest of the content from the origin server where the application actually resides. Their main function is to reduce round trip time by bringing the content closer to the website\u2019s visitor. POPs can also route traffic to one of the multiple origin DCs possible. This way, POPs can be leveraged to add resiliency as well as load-balancing. Now, with our image sharing application becoming more popular by the day, let us assume that we have hit 100,000 concurrent users. And we have built another data center, predicting this increase in traffic. Now we need to be able to route traffic to both of these data centers in a reliable manner, while also retaining the ability to fall back to a single data center in case there is an issue with one of the two DCs. This is where sticky routing comes into play. Sticky Routing When a user sends a request, there are cases in which we might want to serve that user\u2019s requests from a specific DC (if we have multiple DCs) or from a specific server inside a DC. 
We may also wish to serve all requests from a specific POP by a single data center. Sticky routing helps us do exactly that. It might be simply pinning all users to a specific DC or pinning specific users to specific servers. This is typically done from the POP, so that as soon as the user reaches our servers, we can route them to the nearest DC possible. Geo DNS When a user opens the application, the user can be directed to one of the multiple globally distributed POPs. This can be done using GeoDNS, which, simply put, gives out a different IP address (the addresses are distributed geographically) depending on the location of the user making the DNS request. GeoDNS is the first step in distributing users to different locations - it is not 100% accurate, and typically makes use of IP address allotment information for guessing the location of the user. However, it works well enough for >90% of the users. After this, we can have a sticky routing service that assigns each user to a specific DC and sets a cookie. When the user next visits, the cookie can be read at the POP to decide which data center the user\u2019s traffic must be directed to. Having multiple DCs and leveraging sticky routing has not only scaling benefits, but also adds to the resiliency of the service, albeit at the cost of additional complexity. Let us consider another use case in which a user uploads a new profile picture for themselves. If we have multiple data centres or POPs which are not synced in real time, not all of them might have the newer picture. In such a case, it would make sense to tie that user to a specific DC/region until the update has propagated to all regions. Sticky routing would enable us to do this. 
References CDNs LinkedIn's TrafficShift blog talks about sticky routing","title":"Scaling Beyond the Data Center"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#caching-static-assets","text":"Extending the existing caching solution a bit, we arrive at Content Delivery Networks (CDNs). CDNs are the caching layer that is closest to the user. A significant chunk of the resources served in a webpage may not change on an hourly or even a daily basis. In those cases, we would want to cache these at the CDN level, reducing our load. CDNs not only help reduce the load on our servers by removing the burden of serving static / bandwidth intensive resources, they also let us be present closer to our users, by way of points of presence (POPs). CDNs also let us do geo-load balancing, in case we have multiple data centres around the world, and would want to serve from the closest data center (DC) possible. Taking it a step further With the addition of caching and distributing our application into simpler services, we have solved the problem of scaling to 50000 users. However, our users may be in geographically distributed locations and may not be at the same distance from our data centre or our cloud region. Consistency in user experience is important, else we are excluding users who are far away from our location, potentially eliminating a significant chunk of potential users. However, it is impractical to have data centers all over the world, or even in more than a couple of locations in the world. This is where CDNs and POPs come into the picture.","title":"Caching static assets"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#points-of-presence","text":"CDN POPs are geographically distributed data centers aimed at being close to users. POPs reduce the round trip time by delivering content from a location that is nearest to the user. 
POPs typically may not have all the content, but have caching servers that cache the static assets, and fetch the rest of the content from the origin server where the application actually resides. Their main function is to reduce round trip time by bringing the content closer to the website\u2019s visitor. POPs can also route traffic to one of the multiple origin DCs possible. This way, POPs can be leveraged to add resiliency as well as load-balancing. Now, with our image sharing application becoming more popular by the day, let us assume that we have hit 100,000 concurrent users. And we have built another data center, predicting this increase in traffic. Now we need to be able to route traffic to both of these data centers in a reliable manner, while also retaining the ability to fall back to a single data center in case there is an issue with one of the two DCs. This is where sticky routing comes into play.","title":"Points of Presence"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#sticky-routing","text":"When a user sends a request, there are cases in which we might want to serve that user\u2019s requests from a specific DC (if we have multiple DCs) or from a specific server inside a DC. We may also wish to serve all requests from a specific POP by a single data center. Sticky routing helps us do exactly that. It might be simply pinning all users to a specific DC or pinning specific users to specific servers. This is typically done from the POP, so that as soon as the user reaches our servers, we can route them to the nearest DC possible.","title":"Sticky Routing"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#geo-dns","text":"When a user opens the application, the user can be directed to one of the multiple globally distributed POPs. This can be done using GeoDNS, which, simply put, gives out a different IP address (the addresses are distributed geographically) depending on the location of the user making the DNS request. 
GeoDNS is the first step in distributing users to different locations - it is not 100% accurate, and typically makes use of IP address allotment information for guessing the location of the user. However, it works well enough for >90% of the users. After this, we can have a sticky routing service that assigns each user to a specific DC and sets a cookie. When the user next visits, the cookie can be read at the POP to decide which data center the user\u2019s traffic must be directed to. Having multiple DCs and leveraging sticky routing has not only scaling benefits, but also adds to the resiliency of the service, albeit at the cost of additional complexity. Let us consider another use case in which a user uploads a new profile picture for themselves. If we have multiple data centres or POPs which are not synced in real time, not all of them might have the newer picture. In such a case, it would make sense to tie that user to a specific DC/region until the update has propagated to all regions. Sticky routing would enable us to do this.","title":"Geo DNS"},{"location":"level102/system_design/scaling-beyond-the-datacenter/#references","text":"CDNs LinkedIn's TrafficShift blog talks about sticky routing","title":"References"},{"location":"level102/system_design/scaling/","text":"In Phase 1 of this course, we saw the AKF scale cube and how it can help in segmenting services, defining microservices and scaling the overall application. We will use a similar strategy to scale our application - while using the estimates from the previous section, so that we can have a data driven design rather than arbitrarily choosing scaling patterns. Splitting the application Considering the huge volume of traffic that might be generated by our application, and the related resource requirements in terms of memory and CPU, let us split the application into smaller chunks. 
One of the simplest ways to do this would be to simply divide the application along the endpoints, and spin them up as separate instances. In reality, this decision would probably be a little more complicated, and you might end up having multiple endpoints running from the same instance. The images can be stored in an object store that can be scaled independently, rather than locating them on the servers where the application or the database resides. This would reduce the resource requirements for the servers. Stateful vs Stateless services A stateless process or service doesn\u2019t rely on stored data of its past invocations. A stateful service on the other hand stores its state in a datastore, and typically uses the state on every call or transaction. In some cases, there are options for us to design services in such a way that certain components can be made stateless and this helps in multiple ways. Applications can be containerized easily if they are stateless. Containerized applications are also easier to scale. Stateful services require you to scale the datastore with the state as well. However, containerizing databases or scaling databases is out of the scope of this module. The resulting design after such distribution of workloads might look something like this. You might notice that the diagram also has multiple databases. We will see more about this in the following sharding section. Now that we have split the application into smaller services, we need to look at scaling up the capacity of each of these endpoints. The popular Pareto principle states that \u201c80% of consequences come from 20% of the causes\u201d. Modifying it slightly, we can say that 80% of the traffic will be for 20% of images. The number of images uploaded vs the number of images seen by the user is going to be similarly skewed. A user is much more likely to view images on a daily basis than they are to upload new ones. 
In our simple design, generating the feed page with initial 5 images will be a matter of choosing 5 recently uploaded images from fellow users whom this user follows. While we can dynamically fetch the images from the database and generate the page on the fly once the user logs on, we might soon overwhelm the database in case a large number of users choose to login at the same time and load their feeds. There are two things we can do here, one is caching, and the other one is ahead of time generation of user feeds. An user with a million followers can potentially lead to hundreds of thousands of calls to the DB, simply to fetch the latest photoID that the user has uploaded. This can quickly overwhelm any DB, and can potentially bring down the DB itself. Sharding One way to solve the problem of DB limitation is scaling up the DB write and reads. Sharding is one way to scale the DB writes, where the DB would be split into parts that reside in different instances of the DB running on separate machines. DB reads can be scaled up similarly by using read replicas as we had seen in Phase 1 of this module. Compared to the number of images the popular user uploads, the number of views generated would be massive. In that case, we should cache the photoIDs of the user\u2019s uploads, to be returned without having to perform a potentially expensive call to the DB. Let us consider another endpoint in our application named /get_user_details . It simply returns the page an user would see upon clicking another user\u2019s name. This endpoint will return a list of posts that the user has created. Normally, a call to that endpoint will involve the application talking to the DB, fetching a list of all the posts by the user and returning the result. If someone\u2019s profile is viewed thousands of times that means there are thousands of calls to the DB - which may result in issues like hot keys and hot partitions. 
As with all other systems, an increase in load may result in worsening response times, resulting in inconsistent and potentially bad user experience. A simple solution here would be a cache layer - one that would return the user\u2019s profile with posts without having to call the DB every time. Caching A cache is used for the temporary storage of data that is likely to be accessed again, often repetitively. When the data requested is found in the cache, it is termed a \u2018cache hit\u2019. A \u2018cache miss\u2019 is the obvious complement. A well positioned cache can greatly reduce the query response time as well as improve the scalability of a system. Caches can be placed at multiple levels between the user and the application. In Phase 1, we saw how we could use caches / CDNs to serve static resources of the application, resulting in quicker response times as well as a lighter burden on the application servers. Let us look at more situations where caching can play a role. In-memory caching: In-memory caching is when the information to be cached is kept in the main memory of the server, allowing it to be retrieved much faster than from a DB residing on a disk. We cache arbitrary text (which can be HTML fragments or JSON objects) and fetch it back really fast. An in-memory cache is the quickest way to add a layer of fast cache that can optionally be persisted to disk as well. While caching can aid significantly in scaling up and improving performance, there are situations where the cache is suddenly not in place. It might be that the cache was accidentally wiped, leading to all the queries falling through to the DB layer, often with multiple calls for the same piece of information. It is important to be aware of this potential \u2018thundering herd\u2019 problem and design your system accordingly. Caching proxies: There are cases where you may want to cache entire webpages / responses of other upstream resources that you need to respond to requests. 
There are also cases where you want to let your upstream tell you what to cache and how long to cache it for. In such cases, it might be a good idea to have a caching solution that understands cache-related HTTP headers. One example for our use case can be when users search for a specific term in our application - if there is a frequent enough search for a user or a term, it might be more efficient to cache the responses for some duration rather than performing the search anew every time. Let\u2019s recap one of the goals - At least 50000 unique visitors should be able to visit the site at any given time and view their feed. With the implementation of caching, we have removed one potential bottleneck - the DB. We also decomposed the monolith into smaller chunks that provide individual services. Another step closer to our goal is to simply horizontally scale the services needed for feed viewing and put them behind a load balancer. Please recall the scaling concepts discussed in Phase 1 of this module. Cache management While caching sounds like a simple, easy solution for a hard problem, an even harder problem is to manage the cache efficiently. Like most things in your system, the cache layer is not infinite. Effective cache management means removing things from the cache at the right time, to ensure the cache hit rate remains high. There are many strategies to invalidate cache entries after a certain time period or below certain usage thresholds. It is important to keep an eye on the cache hit rate and fine tune your caching strategy accordingly. References There are many object storage solutions available. Minio is one self hosted solution. There are also vendor-specific solutions for the cloud like Azure Blob storage and Amazon S3. Microservices architecture style - Azure architecture guide There are many in-memory caching solutions. Some of the most popular ones include redis and memcached. Cloud vendors also have their managed cache solutions. 
Some of the most popular proxies include squid and Apache Traffic Server Thundering herd problem - how Instagram tackled it .","title":"Scaling"},{"location":"level102/system_design/scaling/#splitting-the-application","text":"Considering the huge volume of traffic that might be generated by our application, and the related resource requirements in terms of memory and CPU, let us split the application into smaller chunks. One of the simplest ways to do this would be to simply divide the application along the endpoints, and spin them up as separate instances. In reality, this decision would probably be a little more complicated, and you might end up having multiple endpoints running from the same instance. The images can be stored in an object store that can be scaled independently, rather than locating them on the servers where the application or the database resides. This would reduce the resource requirements for the servers.","title":"Splitting the application"},{"location":"level102/system_design/scaling/#stateful-vs-stateless-services","text":"A stateless process or service doesn\u2019t rely on stored data of its past invocations. A stateful service on the other hand stores its state in a datastore, and typically uses the state on every call or transaction. In some cases, there are options for us to design services in such a way that certain components can be made stateless and this helps in multiple ways. Applications can be containerized easily if they are stateless. Containerized applications are also easier to scale. Stateful services require you to scale the datastore with the state as well. However, containerizing databases or scaling databases is out of the scope of this module. The resulting design after such distribution of workloads might look something like this. You might notice that the diagram also has multiple databases. We will see more about this in the following sharding section. 
Now that we have split the application into smaller services, we need to look at scaling up the capacity of each of these endpoints. The popular Pareto principle states that \u201c80% of consequences come from 20% of the causes\u201d. Modifying it slightly, we can say that 80% of the traffic will be for 20% of images. The no. of images uploaded vs the no. of images seen by the user is going to be similarly skewed. An user is much more likely to view images on a daily basis than they are to upload new ones. In our simple design, generating the feed page with initial 5 images will be a matter of choosing 5 recently uploaded images from fellow users whom this user follows. While we can dynamically fetch the images from the database and generate the page on the fly once the user logs on, we might soon overwhelm the database in case a large number of users choose to login at the same time and load their feeds. There are two things we can do here, one is caching, and the other one is ahead of time generation of user feeds. An user with a million followers can potentially lead to hundreds of thousands of calls to the DB, simply to fetch the latest photoID that the user has uploaded. This can quickly overwhelm any DB, and can potentially bring down the DB itself.","title":"Stateful vs Stateless services"},{"location":"level102/system_design/scaling/#sharding","text":"One way to solve the problem of DB limitation is scaling up the DB write and reads. Sharding is one way to scale the DB writes, where the DB would be split into parts that reside in different instances of the DB running on separate machines. DB reads can be scaled up similarly by using read replicas as we had seen in Phase 1 of this module. Compared to the number of images the popular user uploads, the number of views generated would be massive. In that case, we should cache the photoIDs of the user\u2019s uploads, to be returned without having to perform a potentially expensive call to the DB. 
Let us consider another endpoint in our application named /get_user_details . It simply returns the page an user would see upon clicking another user\u2019s name. This endpoint will return a list of posts that the user has created. Normally, a call to that endpoint will involve the application talking to the DB, fetching a list of all the posts by the user and returning the result. If someone\u2019s profile is viewed thousands of times that means there are thousands of calls to the DB - which may result in issues like hot keys and hot partitions. As with all other systems, an increase in load may result in worsening response times, resulting in inconsistent and potentially bad user experience. A simple solution here would be a cache layer - one that would return the user\u2019s profile with posts without having to call the DB everytime.","title":"Sharding"},{"location":"level102/system_design/scaling/#caching","text":"A cache is used for the temporary storage of data that is likely to be accessed again, often repetitively. When the data requested is found in the cache, it is termed as a `cache hit\u2019. A \u2018cache miss\u2019 is the obvious complement. A well positioned cache can greatly reduce the query response time as well as improve the scalability of a system. Caches can be placed at multiple levels between the user and the application. In Phase 1, we saw how we could use caches / CDNs to service static resources of the application, resulting in quicker response times as well as lesser burden on the application servers. Let us look at more situations where caching can play a role.","title":"Caching"},{"location":"level102/system_design/scaling/#in-memory-caching","text":"In memory caching is when the information to be cached is kept in the main memory of the server, allowing it to be retrieved much faster than a DB residing on a disk. We cache arbitrary text (which can be HTML fragments or may be JSON objects) and fetch it back really fast. 
An in memory cache is the quickest way to add a layer of fast cache that can optionally be persisted to disk as well. While caching can aid significantly in scaling up and improving performance, there are situations where cache is suddenly not in place. It might be that the cache was accidentally wiped, leading to all the queries falling through to the DB layer, often multiple calls for the same piece of information. It is important to be aware of this potential \u2018thundering herd\u2019 problem and design your system accordingly. Caching proxies: There are cases where you may want to cache entire webpages / responses of other upstream resources that you need to respond to requests. There are also cases where you want to let your upstream tell you what to cache and how long to cache it for. In such cases, it might be a good idea to have a caching solution that understands Cache related HTTP headers. One example for our usecase can be when users search for a specific term in our application - if there is a frequent enough search for a user or a term, it might be more efficient to cache the responses for some duration rather than performing the search anew everytime. Let\u2019s recap one of the goals - Atleast 50000 unique visitors should be able to visit the site at any given time and view their feed. With the implementation of caching, we have removed one potential bottleneck - the DB. We also decomposed the monolith into smaller chunks that provide individual services. Another step closer to our goal is to simply horizontally scale the services needed for feed viewing and putting them behind a load balancer. Please recall the scaling concepts discussed in Phase 1 of this module.","title":"In-memory caching:"},{"location":"level102/system_design/scaling/#cache-managment","text":"While caching sounds like a simple, easy solution for a hard problem, an even harder problem is to manage the cache efficiently. 
Like most things in your system, the cache layer is not infinite. Effective cache management means removing things from the cache at the right time, to ensure the cache hit rate remains high. There are many strategies to invalidate cache after a certain time period or below certain usage thresholds. It is important to keep an eye on cache-hit rate and fine tune your caching strategy accordingly.","title":"Cache managment"},{"location":"level102/system_design/scaling/#references","text":"There are many object storage solutions available. Minio is one self hosted solution. There are also vendor-specific solutions for the cloud like Azure Blob storage and Amazon S3 . Microservices architecture style - Azure architecture guide There are many in-memory caching solutions. Some of the most popular ones include redis and memcached . Cloud vendors also have their managed cache solutions. Some of the most popular proxies include squid and Apache Traffic Server Thundering herd problem - how instagram tackled it .","title":"References"},{"location":"level102/system_troubleshooting_and_performance/conclusion/","text":"Complex systems have many factors which can go wrong. It can be a bad design & architecture, poorly managed code, poor policies around different caches, bad DB queries or architecture, improper use of resources, or bad OS version, poorly monitored system, datacenter issues, network faults, and many more, Any of these can go wrong. As an SRE, Knowing important tools/commands, best practices, profiling, benchmarking and scaling can help you with faster troubleshooting and performance improvement of the overall system. Further readings Here are some links from the LinkedIn Engineering Blog, as written by LinkedIn engineers, about firefighting they did, ensuring site up 24x7x365. 
Taming memory fragmentation in Venice with Jemalloc Intro: Every Day Is Monday in Operations Fixing Linux filesystem performance regressions The impact of slow NFS on data systems","title":"Conclusion"},{"location":"level102/system_troubleshooting_and_performance/conclusion/#further-readings","text":"Here are some links from the LinkedIn Engineering Blog, as written by LinkedIn engineers, about firefighting they did, ensuring site up 24x7x365. Taming memory fragmentation in Venice with Jemalloc Intro: Every Day Is Monday in Operations Fixing Linux filesystem performance regressions The impact of slow NFS on data systems","title":"Further readings"},{"location":"level102/system_troubleshooting_and_performance/important-tools/","text":"Important linux commands Having knowledge of following commands will help find issues faster. Elaborating each command in detail is out of scope, please look for man pages or online for more information and examples around the same. For logs parsing -: grep, sed, awk, cut, tail, head For network checks -: nc, netstat, traceroute/6, mtr, ping/6, route, tcpdump, ss, ip For DNS -: dig, host, nslookup For tracing system call -: strace For parallel executions over ssh -: gnu parallel, xargs + ssh. For http/s checks -: curl, wget For list of open files -: lsof For modifying attributes of the system kernel -: sysctl In case of distributed systems, some good third party tools can help to execute commands/instructions on many hosts at once, like: SSH based tools ClusterSSH : Cluster ssh can help you run a command in parallel on many hosts at once. Ansible : It allows you to write ansible playbooks which you can run on hundreds/thousands of hosts at the same time. Agent Based tools Saltstack : Is a configuration, state and remote execution framework, provides a wide variety of flexibility to users to execute modules on large numbers of hosts at once. 
Puppet : Is an automated administrative engine for your Linux, Unix, and Windows systems, performs administrative tasks. Log analysis tools These can help in writing SQL type queries for parsing, analysing logs and provide an easy UI interface to create dashboards which can render various types of charts based on defined queries. ELK : Elasticsearch, Logstash and Kibana, provide package of tools and services to allow, parse logs, index logs and analyse logs easily and quickly. Once logs/data is parsed/filtered through logstash and indexed in elasticsearch, one can create dynamic dashboards in Kibana in a matter of minutes. Such provides easy analysis and correlation on application errors/exceptions/warnings. Azure kusto : Azure kusto is a cloud based service similar to Elasticsearch and Kibana, it allows easy indexing of heavy logs, provides SQL type interface for writing queries, and an interface to create dynamic dashboards.","title":"Important Tools"},{"location":"level102/system_troubleshooting_and_performance/important-tools/#important-linux-commands","text":"Having knowledge of following commands will help find issues faster. Elaborating each command in detail is out of scope, please look for man pages or online for more information and examples around the same. For logs parsing -: grep, sed, awk, cut, tail, head For network checks -: nc, netstat, traceroute/6, mtr, ping/6, route, tcpdump, ss, ip For DNS -: dig, host, nslookup For tracing system call -: strace For parallel executions over ssh -: gnu parallel, xargs + ssh. For http/s checks -: curl, wget For list of open files -: lsof For modifying attributes of the system kernel -: sysctl In case of distributed systems, some good third party tools can help to execute commands/instructions on many hosts at once, like: SSH based tools ClusterSSH : Cluster ssh can help you run a command in parallel on many hosts at once. 
Ansible : It allows you to write ansible playbooks which you can run on hundreds/thousands of hosts at the same time. Agent Based tools Saltstack : Is a configuration, state and remote execution framework, provides a wide variety of flexibility to users to execute modules on large numbers of hosts at once. Puppet : Is an automated administrative engine for your Linux, Unix, and Windows systems, performs administrative tasks.","title":"Important linux commands"},{"location":"level102/system_troubleshooting_and_performance/important-tools/#log-analysis-tools","text":"These can help in writing SQL type queries for parsing, analysing logs and provide an easy UI interface to create dashboards which can render various types of charts based on defined queries. ELK : Elasticsearch, Logstash and Kibana, provide package of tools and services to allow, parse logs, index logs and analyse logs easily and quickly. Once logs/data is parsed/filtered through logstash and indexed in elasticsearch, one can create dynamic dashboards in Kibana in a matter of minutes. Such provides easy analysis and correlation on application errors/exceptions/warnings. Azure kusto : Azure kusto is a cloud based service similar to Elasticsearch and Kibana, it allows easy indexing of heavy logs, provides SQL type interface for writing queries, and an interface to create dynamic dashboards.","title":"Log analysis tools"},{"location":"level102/system_troubleshooting_and_performance/introduction/","text":"System troubleshooting and performance improvements Prerequisites Linux Basics System design Basic Networking Metrics and Monitoring What to expect from this course This brief course tries to provide a general introduction on how to troubleshoot system issues, like analysing api failures, resource utilization, network issues, hardware and OS issues. Course also briefs on profiling and benchmarking to measure overall system performance. 
What is not covered under this course This course does not cover the following -: System Design and Architecture. Programming practices. Metrics and Monitoring. OS basics. Course Contents Introduction Troubleshooting Troubleshooting Flowchart General Practices General Host issues Important tools to know Important linux commands Log analysis tools Performance improvements Performance analysis commands Profiling tools Benchmarking Scaling Troubleshooting Example Conclusion Further readings Introduction Troubleshooting is an important part of operations & development. It can\u2019t be learned by just reading one article or completing a course online; it\u2019s a continuous learning process that one learns during :- Daily operations and development. Finding & Fixing application bugs. Finding & Fixing system & network issues. Performance analysis and improvements. And more. From an SRE\u2019s perspective, it is expected that they are aware of certain topics upfront to be able to troubleshoot problems around single or distributed systems. Know your resources well, understand host specifications, like CPU, Memory, Network, Disk etc. Understand system design and architecture. Ensure important metrics are being collected/rendered properly. There was a famous quote by HP founders - \u201cWhat gets measured gets fixed\u201d If system components and performance metrics are captured thoroughly then there is a high chance of success in troubleshooting an issue, at its earliest. Scope There is no common approach to troubleshoot different types of applications or services; a failure can occur at any layer. We will keep the scope of this work to a web api service type only. Note -: The Linux ecosystem is wide; there are hundreds of tools and utilities which can help with system troubleshooting, and each comes with its own set of benefits and functionalities. We will cover some of the known tools, either already available with Linux or available in the open source world. 
Detailed explanation of mentioned tools in this doc is out of scope, please explore the internet or man pages for more examples and documentation around the same.","title":"Introduction"},{"location":"level102/system_troubleshooting_and_performance/introduction/#system-troubleshooting-and-performance-improvements","text":"","title":"System troubleshooting and performance improvements"},{"location":"level102/system_troubleshooting_and_performance/introduction/#prerequisites","text":"Linux Basics System design Basic Networking Metrics and Monitoring","title":"Prerequisites"},{"location":"level102/system_troubleshooting_and_performance/introduction/#what-to-expect-from-this-course","text":"This brief course tries to provide a general introduction on how to troubleshoot system issues, like analysing api failures, resource utilization, network issues, hardware and OS issues. Course also briefs on profiling and benchmarking to measure overall system performance.","title":"What to expect from this course"},{"location":"level102/system_troubleshooting_and_performance/introduction/#what-is-not-covered-under-this-course","text":"This course does not cover following -: System Design and Architecture. Programming practices. Metrics and Monitoring. OS basics.","title":"What is not covered under this course"},{"location":"level102/system_troubleshooting_and_performance/introduction/#course-contents","text":"Introduction Troubleshooting Troubleshooting Flowchart General Practices General Host issues Important tools to know Important linux commands Log analysis tools Performance improvements Performance analysis commands Profiling tools Benchmarking Scaling Troubleshooting Example Conclusion Further readings","title":"Course Contents"},{"location":"level102/system_troubleshooting_and_performance/introduction/#introduction","text":"Troubleshooting is an important part of operations & development. 
It can\u2019t be learned by just reading one article or completing a course online; it\u2019s a continuous learning process that one learns during :- Daily operations and development. Finding & Fixing application bugs. Finding & Fixing system & network issues. Performance analysis and improvements. And more. From an SRE\u2019s perspective, it is expected that they are aware of certain topics upfront to be able to troubleshoot problems around single or distributed systems. Know your resources well, understand host specifications, like CPU, Memory, Network, Disk etc. Understand system design and architecture. Ensure important metrics are being collected/rendered properly. There was a famous quote by HP founders - \u201cWhat gets measured gets fixed\u201d If system components and performance metrics are captured thoroughly then there is a high chance of success in troubleshooting an issue, at its earliest.","title":"Introduction"},{"location":"level102/system_troubleshooting_and_performance/introduction/#scope","text":"There is no common approach to troubleshoot different types of applications or services; a failure can occur at any layer. We will keep the scope of this work to a web api service type only. Note -: The Linux ecosystem is wide; there are hundreds of tools and utilities which can help with system troubleshooting, and each comes with its own set of benefits and functionalities. We will cover some of the known tools, either already available with Linux or available in the open source world. Detailed explanation of mentioned tools in this doc is out of scope, please explore the internet or man pages for more examples and documentation around the same.","title":"Scope"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/","text":"Performance tools are an important part of the development/operations lifecycle; they are highly important for understanding application behavior. 
SREs generally use such tools to evaluate how well a service will perform and make/suggest improvements accordingly. Performance analysis commands Most of these commands are a must to know for doing performance analysis of a system or service. top -: Shows a real-time view of the running system, processes, threads etc. htop -: Similar to the top command, but a bit more interactive than it. iotop -: An interactive disk I/O monitoring tool. vmstat -: Virtual memory statistics explorer. iostat -: Monitoring tool for input/output statistics for devices and partitions. free -: Shows info about physical memory and swap memory. sar -: System activity report; reports different metrics such as cpu, disk, mem, network, etc. mpstat -: Displays info about CPU utilization and performance. lsof -: Provides info about the list of open files and the processes that opened them. perf -: Performance analysing tool. Profiling tools Profiling is an important part of performance analysis of the service. There are various profiler tools available, which can help figure out the most frequent code-paths, and help with debugging, memory profiling, etc. These can generate a heatmap to understand the code performance under load. FlameGraph : Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately. Valgrind : It is a programming tool for memory debugging, memory leak detection, and profiling. Gprof : The GNU profiler tool uses a hybrid of instrumentation and sampling. Instrumentation is used to collect function call information, and sampling is used to gather runtime profiling information. To know how LinkedIn performs On-Demand Profiling on its services, read the LinkedIn blog ODP: An Infrastructure for On-Demand Service Profiling Benchmarking It is a process of measuring the best performance of the service, such as how much QPS the service can handle, its latency when load is increasing, host resource utilization, loadavg, etc. 
Regression testing (i.e. load testing) is a must before deploying the service to production. Some of the known tools -: Apache Benchmark Tool, ab : It simulates a high load on a webapp and gathers data for analysis. Httperf : It sends requests to the web server at a specified rate and gathers stats. Increase the rate till one finds the saturation point. Apache JMeter : It is a popular open-source tool to measure web application performance. JMeter is a Java based application, and it can test not only web servers but also PHP, Java, REST, etc. Wrk : It is another modern performance measurement tool to put a load on your web server and give you latency, requests per second, transfer per second, etc. details. Locust : Easy to use, scriptable and scalable performance testing tool. Limitation -: The above tools help in synthetic load or stress testing, but they do not measure actual end user experience. They can\u2019t see how end user resources will affect application performance, for example due to lack of memory, CPU, or poor connectivity to the internet. To know how LinkedIn performs load testing across its fleet, read : Eliminating toil with fully automated load testing LinkedIn also makes use of Real User Monitoring (RUM) data to overcome the limitations of load testing, and help improve overall experience for end users. Read : Monitor and Improve Web Performance Using RUM Data Visualization Scaling A system designed optimally can perform up to a certain limit only, based on availability of resources. Continuous optimization is always needed to ensure optimum use of resources at its peak. With increasing QPS, systems need to scale up. We can either scale vertically or horizontally. Vertical scalability has its limits as one can increase cpu, memory, disk, GPU and other specifications to a certain limit only, whereas horizontal scalability can grow easily and almost indefinitely, within the limitations imposed by application design and environment attributes. 
Scaling a web application will require some or all of the following -: Easing the server load by adding more hosts. Distributing the traffic across servers by using Load Balancers. Scaling up the DB by sharding the data and increasing read replicas. Here\u2019s a good read on how LinkedIn scaled its application stack: A Brief History of Scaling LinkedIn","title":"Performance Improvements"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#performance-analysis-commands","text":"Most of these commands are a must to know for doing performance analysis of a system or service. top -: Shows a real-time view of the running system, processes, threads etc. htop -: Similar to the top command, but a bit more interactive than it. iotop -: An interactive disk I/O monitoring tool. vmstat -: Virtual memory statistics explorer. iostat -: Monitoring tool for input/output statistics for devices and partitions. free -: Shows info about physical memory and swap memory. sar -: System activity report; reports different metrics such as cpu, disk, mem, network, etc. mpstat -: Displays info about CPU utilization and performance. lsof -: Provides info about the list of open files and the processes that opened them. perf -: Performance analysing tool.","title":"Performance analysis commands"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#profiling-tools","text":"Profiling is an important part of performance analysis of the service. There are various profiler tools available, which can help figure out the most frequent code-paths, and help with debugging, memory profiling, etc. These can generate a heatmap to understand the code performance under load. FlameGraph : Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately. Valgrind : It is a programming tool for memory debugging, memory leak detection, and profiling. Gprof : The GNU profiler tool uses a hybrid of instrumentation and sampling. 
Instrumentation is used to collect function call information, and sampling is used to gather runtime profiling information. To know how LinkedIn performs On-Demand Profiling on its services, read the LinkedIn blog ODP: An Infrastructure for On-Demand Service Profiling","title":"Profiling tools"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#benchmarking","text":"It is a process of measuring the best performance of the service, such as how much QPS the service can handle, its latency when load is increasing, host resource utilization, loadavg, etc. Regression testing (i.e. load testing) is a must before deploying the service to production. Some of the known tools -: Apache Benchmark Tool, ab : It simulates a high load on a webapp and gathers data for analysis. Httperf : It sends requests to the web server at a specified rate and gathers stats. Increase the rate till one finds the saturation point. Apache JMeter : It is a popular open-source tool to measure web application performance. JMeter is a Java based application, and it can test not only web servers but also PHP, Java, REST, etc. Wrk : It is another modern performance measurement tool to put a load on your web server and give you latency, requests per second, transfer per second, etc. details. Locust : Easy to use, scriptable and scalable performance testing tool. Limitation -: The above tools help in synthetic load or stress testing, but they do not measure actual end user experience. They can\u2019t see how end user resources will affect application performance, for example due to lack of memory, CPU, or poor connectivity to the internet. To know how LinkedIn performs load testing across its fleet, read : Eliminating toil with fully automated load testing LinkedIn also makes use of Real User Monitoring (RUM) data to overcome the limitations of load testing, and help improve overall experience for end users. 
Read : Monitor and Improve Web Performance Using RUM Data Visualization","title":"Benchmarking"},{"location":"level102/system_troubleshooting_and_performance/performance-improvements/#scaling","text":"System designed optimally can perform up to a certain limit only, based on availability of resources. Continuous optimization is always needed to ensure optimum use of resources at its peak. With increasing QPS, Systems need to scale up. We can either scale vertically or horizontally. Vertical scalability has its limits as one can increase cpu, memory, disk, GPU and other specifications to certain limit only, whereas horizontal scalability can grow easily and infinitely given limitations imposed by application design and environment attributes. Scaling a web application will require some or all of the following -: Ease the server load by adding more hosts. Distributing the traffic across servers by using Load Balancers. Scale up DB by sharding the data and increasing read replicas. Here\u2019s a good read how LinkedIn scaled its application stack A Brief History of Scaling LinkedIn","title":"Scaling"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting-example/","text":"In this section we will see an example of an issue and try to troubleshoot it, and at the end a few famous troubleshooting stories are shared, which were shared by LinkedIn engineers earlier. Example - Memory leak : Often memory leak issues go unnoticed until the service becomes unresponsive after running for some time (days, week or even month) until service is restarted or bug is fixed, In such cases, service memory usage will reflect in increasing order in the metric graph, something like this graph. Memory leak is mismanagement of memory allocations by application, where unneeded memory is not released, over the period of time objects continue to pile up in memory resulting in service crash. 
Generally such non-released objects get sorted by garbage collector automatically, but sometimes due to a bug it fails. Debugging helps in figuring where much of the application storage memory is being applied. Then, you start tracking and filter everything based on usage. In case, you find objects that aren\u2019t in use, but are referenced, you can get rid of them by deleting them to avoid memory leaks. In the case of python applications, it comes with inbuilt features like tracemalloc . This module can help pinpoint where an object was allocated first. Almost every language comes with a set of tools/libraries (inbuilt or external) which helps find memory issues. Similarly for Java there is a famous memory leak detection tool called Java VisualVM . Let\u2019s see how a dummy flask based web app with a memory leak bug, with every request its memory usage keeps increasing, and how we can use tracemalloc to capture the leak. Assumption -: A python virtual environment is created, and flask is installed in it. A bare minimum flask code with bug, read comments for more info Starting flask app On start, Its memory usage is around 26576 kb, i.e approx 26MB Now with every subsequent GET request, We can notice that process memory usage continues to increase slowly. Now lets try 10000 requests, to see if memory usage increases heavily. To hit a high number of requests, we use an Apache benchmarking tool called \u201cab\u201d . After 10000 hits, we can notice memory usage of flask app is jumped almost 15 times, i.e from initial 26576 KB to 419316 KB, i.e from roughly 26 MB to 419 MB , That\u2019s a huge jump for such a small webapp. Lets try the python tracemalloc module to try to understand the application memory allocations. Tracemalloc takes memory snapshots at a particular point, performing various statistics on the same. 
We add a bare minimum of code to our app.py file, with no change to the fetchuserdata.py file, which allows us to capture tracemalloc snapshots whenever we hit the /capture URI. After restarting app.py (flask run) , we will - First hit http://127.0.0.1:5000/capture - Then hit http://127.0.0.1:5000/ 10000 times, for the memory leak to take place. - Finally hit http://127.0.0.1:5000/capture again to take a snapshot and find out which line has the most allocation. In the final snapshot, we can see the exact module and line number where most of the allocation happened, i.e. fetchuserdata.py, line no 6, which after 10000 hits is holding 419 MB of memory. Summary The above example shows how a bug can lead to a memory leak, and how we can use tracemalloc to understand where it is. Real-world applications are far more complex than the above dummy example, and you must understand that using tracemalloc might degrade application performance somewhat, due to its own overhead. Be mindful about its use in production environments. If you are interested in digging deeper into Python Object Memory Allocation Internals and debugging memory leaks, have a look at an interesting talk by Sanket Patel at PyCon India 2019, Debug Memory Leak In Python Flask | Python Object Memory Allocation Internals","title":"Troubleshooting Example"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting-example/#example-memory-leak","text":"Memory leak issues often go unnoticed until the service becomes unresponsive after running for some time (days, weeks or even months), and persist until the service is restarted or the bug is fixed. In such cases, service memory usage shows up as a steadily increasing line in the metric graph, something like this graph. A memory leak is mismanagement of memory allocations by the application, where unneeded memory is not released; over time, objects continue to pile up in memory, resulting in a service crash. 
Generally, such unreleased objects are reclaimed automatically by the garbage collector, but sometimes, due to a bug, this fails. Debugging helps figure out where most of the application memory is being used. Then you start tracking and filtering everything based on usage. If you find objects that aren\u2019t in use but are still referenced, you can get rid of them by deleting them to avoid memory leaks. Python comes with inbuilt modules like tracemalloc . This module can help pinpoint where an object was first allocated. Almost every language comes with a set of tools/libraries (inbuilt or external) which help find memory issues. Similarly, for Java there is a well-known memory leak detection tool called Java VisualVM . Let\u2019s look at a dummy Flask-based web app with a memory leak bug, where memory usage keeps increasing with every request, and see how we can use tracemalloc to capture the leak. Assumption -: A Python virtual environment is created, and Flask is installed in it. A bare minimum Flask code with bug, read comments for more info Starting flask app On start, its memory usage is around 26576 KB, i.e. approx 26 MB Now with every subsequent GET request, we can notice that process memory usage continues to increase slowly. Now let\u2019s try 10000 requests, to see if memory usage increases heavily. To hit a high number of requests, we use an Apache benchmarking tool called \u201cab\u201d . After 10000 hits, we can notice that memory usage of the Flask app has jumped more than 15 times, i.e. from the initial 26576 KB to 419316 KB, or from roughly 26 MB to 419 MB . That\u2019s a huge jump for such a small webapp. Let\u2019s try the Python tracemalloc module to understand the application memory allocations. Tracemalloc takes memory snapshots at particular points and computes various statistics on them. 
We add a bare minimum of code to our app.py file, with no change to the fetchuserdata.py file, which allows us to capture tracemalloc snapshots whenever we hit the /capture URI. After restarting app.py (flask run) , we will - First hit http://127.0.0.1:5000/capture - Then hit http://127.0.0.1:5000/ 10000 times, for the memory leak to take place. - Finally hit http://127.0.0.1:5000/capture again to take a snapshot and find out which line has the most allocation. In the final snapshot, we can see the exact module and line number where most of the allocation happened, i.e. fetchuserdata.py, line no 6, which after 10000 hits is holding 419 MB of memory. Summary The above example shows how a bug can lead to a memory leak, and how we can use tracemalloc to understand where it is. Real-world applications are far more complex than the above dummy example, and you must understand that using tracemalloc might degrade application performance somewhat, due to its own overhead. Be mindful about its use in production environments. If you are interested in digging deeper into Python Object Memory Allocation Internals and debugging memory leaks, have a look at an interesting talk by Sanket Patel at PyCon India 2019, Debug Memory Leak In Python Flask | Python Object Memory Allocation Internals","title":"Example - Memory leak :"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/","text":"Troubleshooting system failures can be tricky or tedious at times. In this practice we need to examine the end-to-end flow of a service and all its downstreams, analysing logs, memory leaks, CPU usage, disk IO, network failures, host issues, etc. Knowing certain practices and tools can help identify and mitigate failures faster. Here\u2019s the high level troubleshooting flowchart -: Troubleshooting Flowchart General Practices Different systems require different approaches for finding issues. The scope of this section is limited; for a given problem, there can be many more points to look into. 
The following points cover some high level practices for finding webapp failures and their fixes. Reproduce problem Try the broken request to reproduce the issue, e.g. hit the http/s request which fails. Check the end-to-end flow of the request and look for return codes, mostly 3xx, 4xx or 5xx . 3xx are mostly about redirections, 4xx are about client-side errors such as unauthorized, bad request, forbidden, etc., and 5xx are mostly about server side issues. Based on the return code you can decide on the next step. Client side issues are mainly about missing or buggy static content, like javascript issues, a bad image, broken json from an async call, etc., which can result in incorrect page rendering in browsers. Gather Information Look for errors/exceptions in application logs, like \"Can\u2019t Allocate Memory\" or OutOfMemoryError, something like \"disk I/O error\", or a DNS resolution error. Check application and host metrics, and look for anomalies in service and host graphs: since when has CPU usage increased, since when has memory usage increased, since when has disk space decreased or disk I/O increased, when did the load average start shooting up, etc. Please read the School of SRE link for more detail around metrics and monitoring . Look for recent code or config changes which are possibly breaking the system. Understand the problem Try correlating the gathered data with recent actions, like an exception showing up in logs after a config/code deployment. Is it due to a QPS increase? Is it bad SQL queries? Do recent code changes demand better or more hardware? Find a solution and apply a fix Based on the above findings, look for a quick fix if possible, for example rolling back changes if errors/exceptions correlate with them. Try patching or hotfixing the code, preferably in a staging setup, if you want to fix forward. Try to scale up the system: if high QPS is the reason for system failure, add resources (compute, storage, memory, etc) as necessary. Optimize SQL queries if needed. 
Verify complete request flow Hit requests again and ensure the responses are successful (return code 2xx). Check logs to ensure the exceptions/errors found earlier no longer appear. Ensure metrics are back to normal. General Host issues To know whether host health is fine, look for any hardware failures or performance issues. One can try the following -: dmesg -: Shows recent errors / failures thrown by the kernel. This helps with identifying hardware failures, if any. ls commands -: lspci, lsblk, lscpu, lsscsi. These commands list PCI, block device, CPU and SCSI device information. /var/log/messages -: Shows system app/service related errors/warnings, also shows kernel issues. smartd -: Checks disk health.","title":"Troubleshooting"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/#troubleshooting-flowchart","text":"","title":"Troubleshooting Flowchart"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/#general-practices","text":"Different systems require different approaches for finding issues. The scope of this section is limited; for a given problem, there can be many more points to look into. The following points cover some high level practices for finding webapp failures and their fixes. Reproduce problem Try the broken request to reproduce the issue, e.g. hit the http/s request which fails. Check the end-to-end flow of the request and look for return codes, mostly 3xx, 4xx or 5xx . 3xx are mostly about redirections, 4xx are about client-side errors such as unauthorized, bad request, forbidden, etc., and 5xx are mostly about server side issues. Based on the return code you can decide on the next step. Client side issues are mainly about missing or buggy static content, like javascript issues, a bad image, broken json from an async call, etc., which can result in incorrect page rendering in browsers. 
Gather Information Look for errors/exceptions in application logs, like \"Can\u2019t Allocate Memory\" or OutOfMemoryError, something like \"disk I/O error\", or a DNS resolution error. Check application and host metrics, and look for anomalies in service and host graphs: since when has CPU usage increased, since when has memory usage increased, since when has disk space decreased or disk I/O increased, when did the load average start shooting up, etc. Please read the School of SRE link for more detail around metrics and monitoring . Look for recent code or config changes which are possibly breaking the system. Understand the problem Try correlating the gathered data with recent actions, like an exception showing up in logs after a config/code deployment. Is it due to a QPS increase? Is it bad SQL queries? Do recent code changes demand better or more hardware? Find a solution and apply a fix Based on the above findings, look for a quick fix if possible, for example rolling back changes if errors/exceptions correlate with them. Try patching or hotfixing the code, preferably in a staging setup, if you want to fix forward. Try to scale up the system: if high QPS is the reason for system failure, add resources (compute, storage, memory, etc) as necessary. Optimize SQL queries if needed. Verify complete request flow Hit requests again and ensure the responses are successful (return code 2xx). Check logs to ensure the exceptions/errors found earlier no longer appear. Ensure metrics are back to normal.","title":"General Practices"},{"location":"level102/system_troubleshooting_and_performance/troubleshooting/#general-host-issues","text":"To know whether host health is fine, look for any hardware failures or performance issues. One can try the following -: dmesg -: Shows recent errors / failures thrown by the kernel. This helps with identifying hardware failures, if any. ls commands -: lspci, lsblk, lscpu, lsscsi. These commands list PCI, block device, CPU and SCSI device information. 
/var/log/messages -: Shows system app/service related errors/warnings, also shows kernel issues. Smartd -: check disk health.","title":"General Host issues"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index cebc1f8b..dd6b0364 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,502 +2,502 @@ https://linkedin.github.io/school-of-sre/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/CODE_OF_CONDUCT/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/CONTRIBUTING/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/sre_community/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/big_data/evolution/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/big_data/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/big_data/tasks/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_nosql/further_reading/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_nosql/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_nosql/key_concepts/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/backup_recovery/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/concepts/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/innodb/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/lab/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/mysql/ - 
2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/operations/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/query_performance/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/replication/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/databases_sql/select_query/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/git/branches/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/git/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/git/git-basics/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/git/github-hooks/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_basics/command_line_basics/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_basics/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_basics/linux_server_administration/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_networking/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_networking/http/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_networking/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_networking/ipr/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/linux_networking/tcp/ - 2023-03-24 + 2023-09-12 daily 
https://linkedin.github.io/school-of-sre/level101/linux_networking/udp/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/alerts/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/best_practices/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/command-line_tools/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/introduction/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/observability/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/third-party_monitoring/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/python_web/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/python_web/python-concepts/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/python_web/python-web-flask/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/python_web/sre-conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/python_web/url-shorten-app/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/security/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/security/fundamentals/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/security/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/security/network_security/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/security/threats_attacks_defences/ - 
2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/security/writing_secure_code/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/systems_design/availability/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/systems_design/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/systems_design/fault-tolerance/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/systems_design/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level101/systems_design/scalability/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/cicd_brief_history/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline/ - 2023-03-24 + 2023-09-12 
daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/introduction/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/introduction_to_cicd/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/archiving_backup/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/bashscripting/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/introduction/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/introvim/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/package_management/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/linux_intermediate/storage_media/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/networking/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/networking/infrastructure-features/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/networking/introduction/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/networking/rtt/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/networking/scale/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/networking/security/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/conclusion/ - 
2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_design/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_design/intro/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_design/large-system-design/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_design/scaling-beyond-the-datacenter/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_design/scaling/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/conclusion/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/important-tools/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/introduction/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting-example/ - 2023-03-24 + 2023-09-12 daily https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting/ - 2023-03-24 + 2023-09-12 daily \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 16a07d41..c90b62ce 100644 Binary 
files a/sitemap.xml.gz and b/sitemap.xml.gz differ