diff --git a/content/understand-django/2021-01-06-serving-static-files.pt.md b/content/understand-django/2021-01-06-serving-static-files.pt.md new file mode 100644 index 0000000..984b31c --- /dev/null +++ b/content/understand-django/2021-01-06-serving-static-files.pt.md @@ -0,0 +1,320 @@ +--- +title: "Serving Static Files" +description: >- + In this Understand Django article, we'll examine static files. Static files are critical to apps, but have little to do with Python code. We'll see what they are and what they do. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - static +series: "Understand Django" + +--- + +In the previous +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +I described how Django gives us tools to run code for any request using the middleware system. Our next focus will be on static files. Static files are vital to your application, but they have little to do with Python code. We'll see what they are and what they do. + +{{< understand-django-series-pt "static" >}} + +## What Are Static Files? + +Static files are files that don't change when your application is running. + +These files do a lot to improve your application, but they aren't dynamically generated by your Python web server like a usual HTML response. In a typical web application, your most common static files will be the following types: + +* Cascading Style Sheets, CSS +* JavaScript +* Images + +Keep in mind that even though Django will serve these files statically, there may be a complex process in place to produce the files. For instance, modern JavaScript apps often use complex build tools like {{< extlink "https://webpack.js.org/" "webpack" >}} +to build the final JavaScript files that are served to users. + +Static files are crucial to your Django project because the modern web requires more than dynamically generated HTML markup. 
Do you visit any website that has *zero* styling of its HTML? These kinds of sites exist and can be awesome for making a quick tool, but most users expect websites to be aesthetically pleasing. For us, that means that we should be prepared to include some CSS styling at a minimum. + +Let's look at some configuration to see where static files live in your project, then begin to work with some examples. + +## Configuration + +To use static files in your project, you need the `django.contrib.staticfiles` app in your project's `INSTALLED_APPS` list. This is another one of the default Django applications that Django will include if you start from the `startproject` command. + +The `staticfiles` app has a handful of {{< extlink "https://docs.djangoproject.com/en/4.1/ref/settings/#settings-staticfiles" "settings" >}} that we need to consider to start. + +I'm going to make the same recommendation about static files as I did with templates. I recommend that you create a `static` directory at the root of your project to hold your static files. Similarly to templates, the `staticfiles` app will look for `static` directories within each of your Django apps to find files, but I find it easier to work with and locate static files if they are all in the same directory. + +To make that setup work, use the `STATICFILES_DIRS` setting. This setting tells Django any additional locations for static files beyond looking for a `static` directory within each app: + +```python +# project/settings.py + +... + +STATICFILES_DIRS = [BASE_DIR / "static"] +``` + +Next, we can define the URL path prefix that Django will use when it serves a static file. Let's say you have `site.css` in the root of your project's `static` directory. You probably wouldn't want the file to be accessible as `mysite.com/site.css`. To do so would mean that static files could conflict with URL paths that your app might need to direct to a view. 
The `STATIC_URL` setting lets us namespace our static files and, as the {{< extlink "https://www.python.org/dev/peps/pep-0020/" "Zen of Python" >}} says: + +> Namespaces are one honking great idea -- let's do more of those! + +```python +# project/settings.py + +... + +STATICFILES_DIRS = [BASE_DIR / "static"] +STATIC_URL = '/static/' +``` + +With `STATIC_URL` set, we can access `site.css` from `mysite.com/static/site.css`. + +There's one more crucial setting that we need to set, and it is called `STATIC_ROOT`. When we deploy our Django project, Django wants to find all static files from a single directory. The reason for this is for efficiency. It's possible for Django to search through all the app `static` directories and any directories set in `STATICFILES_DIRS` whenever it searches for a file to serve, but that would be slow. + +Instead, Django will put all static files into a single directory so that searching for a file is a search through a single file tree. We'll look more at how this happens in the deployment section later +{{< web >}} +in this article. +{{< /web >}} +{{< book >}} +in this chapter. +{{< /book >}} + +Once we set `STATIC_ROOT`, Django will have the desired output location for static files. If you set the path somewhere in your repository, don't forget to put that path in your `.gitignore` if you're using version control with {{< extlink "https://git-scm.com/" "Git" >}} (and I highly recommend that you do!). Without that addition to `.gitignore`, you'll needlessly add the generated files to version control. I happen to set my `STATIC_ROOT` to a `staticfiles` directory: + +```python +# project/settings.py + +... + +STATICFILES_DIRS = [BASE_DIR / "static"] +STATIC_ROOT = BASE_DIR / "staticfiles" +STATIC_URL = '/static/' +``` + +Now that we know how to configure static files, we're ready to see how to use them in our Django code. + +## Working With Static Files + +The primary way of working with static files is with a template tag. 
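The `static` template tag will help render the proper URL for a static file for your site.
+
+Here's an example template to consider:
+
+{{< web >}}
+```django
+{% load static %}
+<!DOCTYPE html>
+<html>
+  <head>
+    <link rel="stylesheet" type="text/css" href="{% static 'css/site.css' %}">
+  </head>
+  <body>
+    <h1>Example of static template tag!</h1>
+  </body>
+</html>
+```
+{{< /web >}}
+{{< book >}}
+```djangotemplate
+{% load static %}
+<!DOCTYPE html>
+<html>
+  <head>
+    <link rel="stylesheet" type="text/css" href="{% static 'css/site.css' %}">
+  </head>
+  <body>
+    <h1>Example of static template tag!</h1>
+  </body>
+</html>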
+ + +``` +{{< /book >}} + +In this example, I'm assuming that there is a `css` directory in my `static` directory with a `site.css` file inside. Django will render this template tag as `/static/css/site.css` in the most basic format. We should also note that I had to include `{% load static %}` to ensure that the `static` template tag was available. + +In practice, I find that this `load` requirement bites me all the time. Thankfully, the `TemplateSyntaxError` that Django will raise provides a good clue on how to fix this issue. The exception says "Did you forget to register or load this tag?" How helpful of the Django developers to tell us what we're probably missing! + +Since we know that `STATIC_URL` is `/static/` from the configuration section, why don't I hardcode the link tag path to `/static/css/site.css`? You could, and that might work, but you'll probably run into some long term problems. + +* What if you ever wanted to change `STATIC_URL`? Maybe you want to change it to something shorter like `/s/`. If you hardcode the name, now you have more than one place to change. +* Using some extra features, Django may change the name of a file to something unique by adding a hash to the file name. With a hardcoded path of `/static/css/site.css`, this may lead to a 404 response if Django expects the unique name instead. We'll see what the unique name is for in the next section. + +We should remember to use the `static` tag in the same way that we use the `url` tag when we want to resolve a Django URL path. Both of these tags help avoid hardcoding paths that can change. + +Less commonly, we can refer to a static file from Python code. You can do this by calling a `static` function defined in the same location as the `static` template tag function, but the function is not located where you might expect it. Instead of importing from the `staticfiles` app, Django defines these functions in `django.templatetags.static`. 
+
+For example, if you wanted to serve a JSON view that feeds a JavaScript client application the path to a CSS file, you might write:
+
+```python
+# application/views.py
+
+from django.http import JsonResponse
+from django.templatetags.static import (
+    static
+)
+
+def get_css(request):
+    return JsonResponse(
+        {'css': static('css/site.css')}
+    )
+```
+
+In my years of experience as a Django developer, I've only seen `static` used in views a handful of times. `static` is certainly more widely used in templates.
+
+When using static files, there are some important considerations for deploying your application for wider use on the internet. On its own, deployment is a large topic
+{{< web >}}
+that we'll cover in a future article,
+{{< /web >}}
+{{< book >}}
+that we'll cover in a future chapter,
+{{< /book >}}
+but we'll focus on static files deployment issues next.
+
+## Deployment Considerations
+
+In the configuration section, we saw the `STATIC_ROOT` option. That option names the single directory where Django collects all the static files, but *when* does the collecting happen? And how do static files work when we run in development mode and don't have all the files in the `STATIC_ROOT` location?
+
+When you deploy your application to a server, one crucial setting to disable is the `DEBUG` setting. If `DEBUG` is on, all kinds of secret data can leak from your application, so the Django developers *expect* `DEBUG` to be `False` for your live site. Because of this expectation, certain parts of Django behave differently when `DEBUG` changes, and the `staticfiles` app is one such part.
+
+When `DEBUG` is `True` and you are using the `runserver` command to run the development web server, Django will search for files using a set of "finders" whenever a user requests a static file.
These finders are defined by the `STATICFILES_FINDERS` setting, which defaults to:
+
+```python
+[
+    'django.contrib.staticfiles.finders.FileSystemFinder',
+    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
+]
+```
+
+As you might guess, the `FileSystemFinder` looks for any static files found in the file system directories that we listed in `STATICFILES_DIRS`. The `AppDirectoriesFinder` looks for static files in the `static` directory of each Django application that you have. You can see how this gets slow when you realize that Django may walk through `len(STATICFILES_DIRS) + len(INSTALLED_APPS)` directories before giving up on finding a single file.
+
+To make this whole process faster, we turn `DEBUG` to `False`. When `DEBUG` is `False`, all of the slow machinery that searches for files throughout your project for static file requests is turned off. Django only looks in the `STATIC_ROOT` directory for files.
+
+Since the finders are off when `DEBUG` is `False`, we have to make sure that `STATIC_ROOT` is filled with all the proper files. To put all the static files into place, you can use the `collectstatic` command.
+
+`collectstatic` will copy all the files it discovers from iterating through each finder and collecting files from what a finder lists. In my example below, my Django project directory is `myproject`, and I set `STATIC_ROOT` to `staticfiles`:
+
+```bash
+$ ./manage.py collectstatic
+
+42 static files copied to '/Users/matt/myproject/staticfiles'.
+```
+
+When deploying your application to your server, you would run `collectstatic` before starting the web server. By doing that, you ensure that the web server can access any static files that the Django app might request.
+
+Can we make this better? You bet!
+
+### Optimizing Performance In Django
+
+`staticfiles` has another setting worth considering.
I didn't mention it in the configuration section because it's not a critical setting to make static files work, but we're ready for the setting as we're thinking about optimization. We should really consider this setting for our projects because Django is fairly slow at serving static files compared to some of the alternative options that are available in the ecosystem. + +The last setting we'll consider is the `STATICFILES_STORAGE` setting. This setting controls how static files are stored and accessed by Django. We may want to change `STATICFILES_STORAGE` to improve the efficiency of the application. The biggest boost we can get from this setting will provide file caching. + +In an ideal world, your application would only have to serve a static file exactly *one* time to a user's browser. In that scenario, if an application needed to use the file again, then the browser would reuse the *cached* file that it already retrieved. The challenge that we have is that static files (ironically?) change over time. + +Say, for instance, you changed `site.css` to change the styling of the application. You wouldn't want a browser to reuse the old version because it's missing the latest and greatest changes that you made. How do we get the benefit of telling a browser to cache a file for a long time to be as efficient as possible while still having the flexibility to make changes and make the user's browser fetch a new version of the file? + +The "trick" is to serve a "fingerprinted" version of the file. As a part of the deployment process, we would like to uniquely identify each file with some kind of version information. An easy way for a computer to do this is to take the file's content and calculate a hash value. We can have code take `site.css`, calculate the hash, and generate a file with the same content, but with a different filename like `site.abcd1234.css` if `abcd1234` was the generated hash value. 
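
The fingerprinting idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Django or any storage backend actually implements it: the `fingerprint_name` helper, the choice of SHA-256, and the 8-character hash length are all assumptions made for this example.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint_name(filename: str, content: bytes, length: int = 8) -> str:
    """Return the filename with a content hash inserted before the extension.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "css/site.css" becomes "css/site.<digest>.css"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_css = b"body { color: black; }"
new_css = b"body { color: navy; }"

# Identical content always produces the same name, so long-lived caching is safe.
print(fingerprint_name("css/site.css", old_css))
# Changed content produces a different name, so browsers fetch the new file.
print(fingerprint_name("css/site.css", new_css))
```

Because the name depends only on the file's bytes, an unchanged file keeps its name across deployments and stays cached, while any edit produces a name the browser has never seen.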
+ +The next part of the process is to make the template rendering use the `site.abcd1234.css` name. Remember how we used the `static` template tag instead of hardcoding `/static/css/site.css`? This example is a great reason why we did that. By using the `static` tag, Django can render the filename that includes the hash instead of only using `site.css`. + +The final bit that brings this scheme together is to tell the browser to cache `site.abcd1234.css` for a very long time by sending back a certain caching header in the HTTP response. + +Now, we've got the best of both worlds: + +* If the user fetches `site.abcd1234.css`, their browser will keep it for a long time and never need to download it again. This can be reused every time the user visits a page in your app. +* If we ever change `site.css`, then the deployment process can generate a new file like `site.ef567890.css`. When the user makes a request, the HTML will include the new version, their browser won't have it in the cache, and the browser will download the new version with your new changes. + +Great! How do we get this and how much work is it going to require? The answer comes back to the `STATICFILES_STORAGE` setting and a tool called {{< extlink "http://whitenoise.evans.io/en/stable/" "WhiteNoise" >}} (get it!? "white noise" *is* "static." har har). + +WhiteNoise is a pretty awesome piece of software. The library will handle that *entire* caching scheme that I described above. + +To set up WhiteNoise, you install it with `pip install whitenoise`. Then, you need to change your `MIDDLEWARE` list and `STATICFILES_STORAGE` settings: + +```python +# project/settings.py + +... + +MIDDLEWARE = [ + 'django.middleware.security.SecurityMiddleware', + 'whitenoise.middleware.WhiteNoiseMiddleware', + # ... +] + +STATICFILES_STORAGE = \ + 'whitenoise.storage.CompressedManifestStaticFilesStorage' +``` + +That's about it! With this setup, WhiteNoise will do a bunch of work during the `collectstatic` command. 
The library will generate fingerprinted files like `site.abcd1234.css`, and it will generate compressed versions of those files using the gzip compression algorithm (and, optionally, the {{< extlink "https://en.wikipedia.org/wiki/Brotli" "brotli" >}} compression algorithm). Those extra files look like `site.abcd1234.css.gz` or `site.abcd1234.css.br`. + +When your application runs, the WhiteNoise middleware will handle which files to serve. Because files are static and don't require dynamic processing, we include the middleware high on the list to skip a lot of needless extra Python processing. In my configuration example, I left the `SecurityMiddleware` above WhiteNoise so the app can still benefit from certain security protections. + +As a user's browser makes a request for a fingerprinted file, the browser can include a request header to indicate what compressed formats it can handle. Sending compressed files is *way* faster than sending uncompressed files over a network. WhiteNoise will read the appropriate header and try to respond with the gzip or brotli version. + +The scheme that I described is not the only way to handle static files. In fact, there are some tradeoffs to think about: + +1. Building with WhiteNoise means that we only need to deploy a single app and let Python handle all of the processing. +2. Python, for all its benefits, is not the fastest programming language out there. Leaving Python to serve your static requests will run slower than some other methods. Additionally, your web server's processes must spend time serving the static files rather than being fully devoted to dynamic requests. + +### Optimizing Performance With A Reverse Proxy + +An alternative approach to using Django to serve static files is to use another program as a *reverse proxy*. This setup is more complex, but it can offer better performance if you need it. A reverse proxy is software that sits between your users and your Django application server. 
CloudFlare has a {{< extlink "https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/" "good article" >}} if you want to understand why "reverse" is in the name. + +If you set up a reverse proxy, you can instruct it to handle many things, including URL paths coming to your site's domain. This is where `STATIC_ROOT` and `collectstatic` are useful outside of Django. You can set a reverse proxy to serve all the files that Django collects into `STATIC_ROOT`. + +The process is roughly: + +1. Run `collectstatic` to put files into `STATIC_ROOT`. +2. Configure the reverse proxy to handle any URL pattern that starts with `STATIC_URL` (recall `/static/` as an example) and pass those requests to the directory structure of `STATIC_ROOT`. +3. Anything that doesn't look like a static file (e.g., `/accounts/login/`) is delegated to the app server running Django. + +In this setup, the Django app never has to worry about serving static files because the reverse proxy takes care of those requests before reaching the app server. The performance boost comes from the reverse proxy itself. Most reverse proxies are designed in very high performance languages like C because they are designed to handle a specific problem: routing requests. This flow lets Django handle the dynamic requests that it needs to and prevents the slower Python processes from doing work that reverse proxies are built for. + +If this kind of setup appeals to you, one such reverse proxy that you can consider is {{< extlink "https://www.nginx.com/" "Nginx" >}}. The configuration of Nginx is beyond the scope +{{< web >}} +of this series, +{{< /web >}} +{{< book >}} +of this book, +{{< /book >}} +but there are plenty of solid tutorials that will show how to configure a Django app with Nginx. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we covered static files. 
+ +We looked at: + +* How to configure static files +* The way to work with static files +* How to handle static files when deploying your site to the internet + +{{< web >}} +Looking ahead to the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we will learn about automated testing for your Django applications. Testing is one of my favorite topics so I'm excited to share with you about it. + +We'll cover: + +* Why would anyone want to write automated tests +* What kinds of tests are useful to a Django app +* What tools can you use to make testing easier + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2021-02-22-test-your-apps.pt.md b/content/understand-django/2021-02-22-test-your-apps.pt.md new file mode 100644 index 0000000..b2c69b7 --- /dev/null +++ b/content/understand-django/2021-02-22-test-your-apps.pt.md @@ -0,0 +1,481 @@ +--- +title: "Test Your Apps" +description: >- + How do you confirm that your website works? You could click around and check things out yourself, or you can write code to verify the site. I'll show you why you should prefer the latter. In this Understand Django article, we'll study automated tests to verify the correctness of your site. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - tests +series: "Understand Django" + +--- + +In the previous +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we saw how static files like CSS, JavaScript, and images can be incorporated into your site. 
Now we're going to focus on how to verify that your website works and continues to work by writing automated tests that check your pages and your code logic. + +{{< understand-django-series-pt "tests" >}} + +## Why Write Tests + +I'm going to assume that if you're reading this, then you've either got a Django project or are considering working with Django to build a project. If that's true, think about your project and how you would make sure it works. + +When you start out with a project, whether for a tutorial or for something real that you plan to grow, the fledgling site has very little functionality. To check that the site is working, you can start up the local web server, open your browser, navigate to the `localhost` URL, and confirm that the site is functional. How long does that take? 5 seconds? 15 seconds? 30 seconds? + +For starting out, manually checking out your site is fine. What happens, though, when you create more pages? How do you continue to confirm that all your pages are functional? You could open up the local site and start clicking around, but the time spent confirming that everything works begins to grow. Maybe your verification effort takes 3 minutes, 5 minutes, or perhaps much more. If you're not careful, your creation may start to feel like the mythical multi-headed Hydra, and what once was a fun project to work on devolves into a chore of tedious page verification. + +You can't eliminate the fact that a larger project means that there is more to check. What you *can* do is change the name of the game. You can change your page checking from something manual that may take 15 seconds to verify a page to something that a computer can do in *milliseconds*. + +This is where automated tests come into the picture. Automated tests let computers do what computers do best: run repetitive tasks repeatedly, consistently, and quickly. When we write tests, our goal is to confirm some logic or behavior in a deterministic way. 
+ +Let's look at a test for a hypothetical `add` function which functions like the `+` operator. This should give us a feel for what an automated test is like if you've never encountered tests before: + +```python +def test_does_it_add(): + assert add(40, 2) == 42 +``` + +The test works by running the code and comparing the result to whatever we expect that result to be. The test *asserts* that the equality statement is true. If the equality is false, then the assertion raises an exception and the test fails. + +This automated test would take virtually no time to run if you compared it to running the function in a Python REPL to inspect the result manually. + +Seeing a silly example of an `add` function doesn't really help you much with how you should test your Django project. Next, we'll look at some types of tests for Django. If you add these kinds of tests to your project, you'll be able to make changes to your website with more confidence that you're not breaking things. + +## Useful Types Of Django Tests + +When we explored the anatomy of a Django application, I noted that I *always* delete the `tests.py` file that comes with the `startapp` command. The reason I do this is because there are different kinds of tests, and I want those different kinds to live in separate files. My apps have those separate files in a `tests` package within the app instead of a `tests.py` module. + +My `tests` package will often mirror the structure of the application itself. The program which executes tests, which is called a "test runner," typically expects to find tests in files that start with `test_`. The package often includes: + +* `test_forms.py` +* `test_models.py` +* `test_views.py` +* etc. + +This structure hints at the kinds of tests that you'd write for your application, but I'll touch on specifics more a bit later. Broadly, when we write automated tests, there is an important dimension to consider: how much application code should my test run? 
+ +The answer to that question influences the behavior of tests. If we write a test that runs a lot of code, then we benefit by checking a lot of a system at once; however, there are some downsides: + +* Running a lot of code means more things can happen and there is a higher chance of your test breaking in unexpected ways. A test that often breaks in unexpected ways is called a "brittle" test. +* Running a lot of code means that there is a lot of code to run. That's axiomatic, but the implication is that a test with more code to execute will take longer to run. Big automated tests are still very likely to be much faster than the same test executed manually, so running time is relative. + +When we have tests that run many parts of your application that are *integrated* together, we call these tests *integration tests*. Integration tests are good at surfacing issues related to the *connections between code*. For instance, if you called a method and passed in the wrong arguments, an integration test is likely to discover that problem. + +On the other end of the spectrum are tests that run very little code. The `add` test from above is a good example. These kinds of tests check individual units of code (e.g., a Django model). For that reason, we call these *unit tests*. Unit tests are good at *checking a piece of code in isolation* to confirm its behavior. + +Unit tests have downsides too. These tests execute without a lot of context from the rest of an application. This can help you confirm the behavior of the piece, but it might not be the behavior that the larger application requires. + +In this explanation, the lesson is that both kinds of tests are good, yet have tradeoffs. Beware of anyone who tells you that you should only write one kind of test or the other. + +> A good set of automated tests will include both **unit** and **integration** tests to check behavior of the individual units and the interconnections between parts. 
+
+We have to consider another aspect to this discussion: what is the "right" amount of code for a unit test? *There's no absolutely correct answer here.* In fact, this topic is hotly debated among testers.
+
+Some people will assert that a unit test should only run the code for that unit. If you have a class that implements some pure logic and doesn't need other code, then you're in the ideal case. But what happens if you're testing a method that you added to a Django model that needs to interact with a database? Even if the only thing you're testing is the individual model method, a unit test purist would highlight that the test is actually an integration test if it interacts with a database.
+
+**I usually find this kind of discussion counterproductive.** In my experience, this sort of philosophical debate about what is a unit test doesn't typically help with testing your web app to verify its correctness. I brought all of this up because, if you're going to learn more about testing
+{{< web >}}
+after this article,
+{{< /web >}}
+{{< book >}}
+after this chapter,
+{{< /book >}}
+I caution you to avoid getting sucked into this definition trap.
+
+Here are my working definitions of unit and integration tests in Django. These definitions are imperfect (as are *any* definitions), but they should help frame the discussion
+{{< web >}}
+in this article.
+{{< /web >}}
+{{< book >}}
+in this chapter.
+{{< /book >}}
+
+* **Unit tests** - Tests that check individual units within a Django project like a model method or a form.
+* **Integration tests** - Tests that check a group of units and their interactions like checking if a view renders the expected output.
+
+Now that we have some core notion of what tests are about, let's get into the details.
+
+### Unit Tests
+
+As we get into some examples, I need to introduce a couple of tools that I use on all of my Django projects.
I'll describe these tools in more depth in a later section, but they need a brief introduction here or my examples won't make much sense. My two "must have" packages are:
+
+* `pytest-django`
+* `factory-boy`
+
+`pytest-django` is a package that makes it possible to run Django tests through the `pytest` program. pytest is an extremely popular Python testing tool with a huge ecosystem of extensions. In fact, `pytest-django` is one of those extensions.
+
+My biggest reason for using `pytest-django` is that it lets me use the `assert` keyword in all of my tests. In the Python standard library's `unittest` module and, by extension, Django's built-in test tools which subclass `unittest` classes, checking values requires methods like `assertEqual` and `assertTrue`. As we'll see, using the `assert` keyword exclusively is a very natural way to write tests.
+
+The other vital tool in my tool belt is `factory-boy`. PyPI calls this `factory-boy`, but the documentation uses `factory_boy`, so we'll use that naming from here on. `factory_boy` is a tool for building test database data. The library has fantastic Django integration and gives us the ability to generate model data with ease.
+
+Again, I'll focus on these two packages later on to cover more of their features, but you'll see them used immediately in the examples.
+
+#### Model Tests
+
+In Django projects, we use models to hold data about our app, so it's very natural to add methods to the models to interact with the data. How do we write a test that checks that the method does what we expect?
+
+I'm going to give you a mental framework for *any* of your tests, not only unit tests. This framework should help you reason through any tests that you encounter when reading and writing code. The framework is the *AAA pattern*. The AAA pattern stands for:
+
+* **Arrange** - This is the part of the test that sets up your data and any necessary preconditions for your test.
+* **Act** - This stage is when your test runs the application code that you want to test.
+* **Assert** - The last part checks that the result of your action is what you expected.
+
+For a model test, this looks like:
+
+```python
+# application/tests/test_models.py
+
+from application.models import Order
+from application.tests.factories import OrderFactory
+
+class TestOrder:
+    def test_shipped(self):
+        """After shipping an order, the status is shipped."""
+        order = OrderFactory(
+            status=Order.Status.PENDING
+        )
+
+        order.ship()
+
+        order.refresh_from_db()
+        assert order.status == Order.Status.SHIPPED
+```
+
+We can imagine a project that includes an ecommerce system. A big part of handling orders is tracking status. We could manually set the status field throughout the app, but changing status within a method gives us the chance to do other things. For instance, maybe the `ship` method also triggers sending an email.
+
+In the test above, we're checking the state transition from `PENDING` to `SHIPPED`. The test acts on the `ship` method, then refreshes the model instance from the database to ensure that the `SHIPPED` status persisted.
+
+What are some good qualities about this test?
+
+The test includes a docstring. Trust me, you *will* benefit from docstrings on your tests. There is a strong temptation to leave things at `test_shipped`, but in the future you may not have enough context.
+
+Many developers opt for long test names instead. While I have no problem with long descriptive test names, docstrings are helpful too. Whitespace is a *good* thing and, in my opinion, it's easier to read "The widget updates the game state when pushed." than `test_widget_updates_game_state_when_pushed`.
+
+The test checks one action. A test that checks a single action can fit in your head. There's no question about interaction with other parts. There's also no question about what is actually being tested. The simplicity of testing a single action makes each unit test tell a unique story.
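
To make the "one action per test" idea concrete without a database, here is a minimal, framework-free sketch. The `Order` class below is a hypothetical in-memory stand-in (not a Django model), and `cancel` is an invented second action added only for illustration. Each test arranges its own data, performs exactly one action, and asserts on the result.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELED = "canceled"


class Order:
    """Hypothetical in-memory stand-in for an order (no database involved)."""

    def __init__(self, status=Status.PENDING):
        self.status = status

    def ship(self):
        self.status = Status.SHIPPED

    def cancel(self):
        # Invented rule for this sketch: a shipped order can't be canceled.
        if self.status is Status.SHIPPED:
            raise ValueError("A shipped order cannot be canceled.")
        self.status = Status.CANCELED


def test_ship():
    """Shipping a pending order marks it as shipped."""
    order = Order(status=Status.PENDING)  # Arrange

    order.ship()  # Act

    assert order.status is Status.SHIPPED  # Assert


def test_cancel_pending():
    """Canceling a pending order marks it as canceled."""
    order = Order(status=Status.PENDING)  # Arrange

    order.cancel()  # Act

    assert order.status is Status.CANCELED  # Assert
```

pytest would collect both functions as written. If one of them fails, its name and docstring point directly at the single behavior that broke.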
+ +Conversely, you'll likely encounter tests in projects that do a lot of initial arrangement, then alternate between act and assert lines in a single test. These kinds of tests are brittle (i.e., the term to indicate that the test can break and fail easily) and are difficult to understand when there is a failure. + +The qualities in this test translate to lots of different test types. I think that's the beauty of having a solid mental model for testing. Once you see the way that tests: + +1. Set up the inputs, +2. Take action, +3. Check the outputs, + +then automated testing becomes a lot less scary and more valuable to you. Now let's see how this same pattern plays out in forms. + +#### Form Tests + +When writing tests, we often want to write a "happy path" test. This kind of test is when everything works exactly as you hope. This is a happy path form test: + +```python +# application/tests/test_forms.py + +from application.forms import SupportForm +from application.models import SupportRequest + +class TestSupportForm: + def test_request_created(self): + """A submission to the support form creates a support request.""" + email = "hello@notreal.com" + data = { + "email": email, + "message": "I'm having trouble with your product." + } + form = SupportForm(data=data) + form.is_valid() + + form.save() + + assert SupportRequest.objects.filter( + email=email + ).count() == 1 +``` + +With this test, we are synthesizing a POST request. The test: + +* Builds the POST data as `data` +* Creates a bound form (i.e., connects `data=data` in the constructor) +* Validates the form +* Saves the form +* Asserts that a new record was created + +Notice that I'm bending the AAA rules a bit for this test. Part of the Django convention for forms is that the form is valid before calling the `save` method. If that convention is not followed, then `cleaned_data` won't be populated correctly and most `save` methods depend on `cleaned_data`. 
Even though `is_valid` is an action, I view it as a setup step for form tests. + +When we work with forms, a lot of what we care about is cleaning the data to make sure that junk is not getting into your app's database. Let's write a test for an invalid form: + +```python +# application/tests/test_forms.py + +from application.forms import SupportForm +from application.models import SupportRequest + +class TestSupportForm: + # ... def test_request_created ... + + def test_bad_email(self): + """A malformed email address is invalid.""" + data = { + "email": "bogus", + "message": "Whatever" + } + form = SupportForm(data=data) + + is_valid = form.is_valid() + + assert not is_valid + assert 'email' in form.errors +``` + +The test shows the mechanics for checking an invalid form. The key elements are: + +* Set up the bad form data +* Check the validity with `is_valid` +* Inspect the output state in `form.errors` + +This test shows how to check an invalid form, but I'm less likely to write this particular test in a real project. Why? Because the test is checking functionality from Django's `EmailField` which has the validation logic to know what is a real email or not. + +Generally, I don't think it's valuable to test features from the framework itself. A good open source project like Django is already testing those features for you. When you write form tests, you should check on custom `clean_*` and `clean` methods as well as any custom `save` method that you might add. + +The patterns for both happy path and error cases are what I use for virtually all of my Django form tests. Let's move on to the integration tests to see what it looks like to test more code at once. + +### Integration Tests + +In my opinion, a good integration test won't look very different from a good unit test. An integration test can still follow the AAA pattern like other automated tests. The parts that change are the tools you'll use and the assertions you will write.
+ +My definition of an integration test in Django is a test that uses Django's test `Client`. +{{< web >}} +In previous articles, +{{< /web >}} +{{< book >}} +In previous chapters, +{{< /book >}} +I've only mentioned what a client is in passing. In the context of a web application, a client is anything that consumes the output of a web app to display it to a user. + +The most obvious client for a web app is a web browser, but there are plenty of other client types out there. Some examples that could use output from a web application: + +* A native mobile application +* A command line interface +* A programming library like Python's `requests` package that can handle HTTP requests and responses + +The Django test `Client` is like these other clients in that it can interact with your Django project to receive data from requests that it creates. The nice part about the test client is that the output is returned in a convenient way that we can assert against. The client returns the `HttpResponse` object directly! + +With that context, here's an integration test that we can discuss: + +```python +# application/tests/test_views.py + +from django.test import Client +from django.urls import reverse + +from application.tests.factories import UserFactory + +class TestProfileView: + def test_shows_name(self): + """The profile view shows the user's name.""" + client = Client() + user = UserFactory() + + response = client.get( + reverse("profile") + ) + + assert response.status_code == 200 + assert user.first_name in response.content.decode() +``` + +What is this test doing? Also, what is this test *not* doing? + +By using the Django test client, the test runs a lot of Django code. This goes through: + +* URL routing +* View execution (which will likely fetch from the database) +* Template rendering + +That's a lot of code to execute in a single test! The goal of the test is to check that all the major pieces hang together. + +Now let's observe what the test is not doing. 
Even though the test runs a ton of code, there aren't a huge number of `assert` statements. In other words, our goal with an integration test isn't to check every tiny little thing that could happen in the whole flow. Hopefully, we have unit tests that cover those little parts of the system. + +When I write an integration test, I'm mostly trying to answer the question: **does the *system* hold together without breaking?** + +Now that we've covered unit tests and integration tests, what are some tools that will help you make testing easier? + +## Tools To Help + +When testing your application, you have access to so many packages to help that it can be fairly overwhelming. If you're testing for the first time, you may be struggling with applying the AAA pattern and knowing what to test. We want to minimize the extra stuff that you have to know. + +We're going to revisit the tools that I listed earlier, `pytest-django` and `factory_boy`, to get you started. Consider these your Django testing survival kit. As you develop your testing skills, you can add more tools to your toolbox, but these two tools are a fantastic start. + +### `pytest-django` + +{{< extlink "https://docs.pytest.org/en/stable/" "pytest" >}} is a "test runner." The tool's job is to run automated tests. If you read {{< extlink "https://docs.djangoproject.com/en/4.1/topics/testing/overview/" "Writing and running tests" >}} in the Django documentation, you'll discover that Django *also* includes a test runner with `./manage.py test`. What gives? Why am I suggesting that you use `pytest`? + +I'm going to make a bold assertion: **pytest is better**. (Did I just go meta there? Yes, I did. 😆) + +I like a lot about Django's built-in test runner, but I keep coming back to pytest for one primary reason: I can use `assert` in tests. As you've seen in these test examples, the `assert` keyword makes for clear reading.
We can use all of Python's normal comparison tests (e.g., `==`, `!=`, `in`) to check the output of tests. + +Django's test runner builds off the test tools that are included with Python in the `unittest` module. With those test tools, developers must make test classes that subclass `unittest.TestCase`. The downside of `TestCase` classes is that you must use a set of `assert*` methods to check your code. + +The list of `assert*` methods is included in the {{< extlink "https://docs.python.org/3/library/unittest.html#assert-methods" "unittest" >}} documentation. You can be very successful with these methods, but I think it requires remembering an API that includes a large number of methods. Consider this. Would you rather: + +1. Use `assert`? OR +2. Use `assertEqual`, `assertNotEqual`, `assertTrue`, `assertFalse`, `assertIs`, `assertIsNot`, `assertIsNone`, `assertIsNotNone`, `assertIn`, `assertNotIn`, `assertIsInstance`, and `assertNotIsInstance`? + +Using `assert` from pytest means that you get all the benefits of the `assert*` methods, but you only need to remember a single keyword. If that wasn't enough, let's compare the readability: + +```python +self.assertEqual(my_value, 42) +assert my_value == 42 + +self.assertNotEqual(my_value, 42) +assert my_value != 42 + +self.assertIsNotNone(my_value) +assert my_value is not None + +self.assertTrue(my_value) +assert my_value +``` + +For the same reason that Python developers prefer `property` methods instead of getters and setters (e.g., `obj.value = 42` instead of `obj.set_value(42)`), I think the `assert` style syntax is far simpler to visually process. + +Outside of the awesome handling of `assert`, {{< extlink "https://pytest-django.readthedocs.io/en/latest/" "pytest-django" >}} includes a lot of other features that you might find interesting when writing automated tests.
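
To see the two styles side by side as complete tests rather than single lines, here is a small sketch. The `add` function is a hypothetical stand-in for your application code, and both tests check the same behavior:

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


class TestAddUnittest(unittest.TestCase):
    """unittest style: subclass TestCase and use the assert* methods."""

    def test_add(self):
        result = add(40, 2)

        self.assertEqual(result, 42)
        self.assertIsNotNone(result)


def test_add():
    """pytest style: a plain function and the assert keyword."""
    result = add(40, 2)

    assert result == 42
    assert result is not None
```

pytest can collect and run both of these, so an existing `TestCase` suite doesn't have to be rewritten all at once to start using plain `assert` in new tests.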
+ +### `factory_boy` + +The other test package that I think every developer should use in their Django projects is `{{< extlink "https://factoryboy.readthedocs.io/en/stable/" "factory_boy" >}}`. + +> `factory_boy` helps you build model data for your tests. + +The recommendation when writing automated tests is to use an empty test database. In fact, the common pattern provided with Django testing tools is to use an empty database for every test. Having a blank slate in the database helps each test be independent and makes it easier to assert on the state of the database. Because the test database is empty, you'll need a strategy to populate your tests with the appropriate data to check against. + +As you build up your Django project, you will have more models that help to describe the domain that your website addresses. Generating model data for your tests is an immensely valuable capability. + +You *could* use your model manager's `create` method to create a database entry for your test, but you're going to run into some limits very fast. + +The biggest challenge with using `create` comes from database constraints like foreign keys. What do you do if you want to build a record that requires a large number of non-nullable foreign key relationships? Your only choice is to create those foreign key records. + +We can imagine an app that shows information about movies. The `Movie` model could have a variety of foreign key relationships like director, producer, studio, and so on.
I'll use a few in the example, but imagine what would happen as the number of foreign key relationships increases: + +```python +def test_detail_view_show_genre(client): + """The genre is on the detail page.""" + director = Director.objects.create( + name="Steven Spielberg" + ) + producer = Producer.objects.create( + name="George Lucas" + ) + studio = Studio.objects.create( + name='Paramount' + ) + movie = Movie.objects.create( + genre='Sci-Fi', + director=director, + producer=producer, + studio=studio + ) + + response = client.get( + reverse('movie:detail', args=[movie.id]) + ) + + assert response.status_code == 200 + assert 'Sci-Fi' in response.content.decode() +``` + +On the surface, the test isn't *too* bad. I think that's mostly because I kept the modeling simple. What if the `Director`, `Producer`, or `Studio` models also had required foreign keys? We'd spend most of our effort on the Arrangement section of the test. Also, as we inspect the test, we get bogged down with unnecessary details. Did we need to know the names of the director, producer, and studio? No, we didn't need that for this test. Now, let's look at the `factory_boy` equivalent: + +```python +def test_detail_view_show_genre(client): + """The genre is on the detail page.""" + movie = MovieFactory(genre='Sci-Fi') + + response = client.get( + reverse('movie:detail', args=[movie.id]) + ) + + assert response.status_code == 200 + assert 'Sci-Fi' in response.content.decode() +``` + +`MovieFactory` seems like magic. Our test got to ignore all the other details. Now the test could focus entirely on the genre. + +Factories simplify the construction of database records. Instead of wiring the models together in the test, we move that wiring to the factory definition. The benefit is that our tests can use the plain style that we see in the second example. If we need to add a new foreign key to the model, only the factory has to be updated, not all your other tests that use that model. 
+ +What might this `Movie` factory look like? The factory might be: + +```python +# application/tests/factories.py + +import factory + +from application.models import Movie + +# Other factories defined here... + +class MovieFactory(factory.django.DjangoModelFactory): + class Meta: + model = Movie + + director = factory.SubFactory( + DirectorFactory + ) + producer = factory.SubFactory( + ProducerFactory + ) + studio = factory.SubFactory( + StudioFactory + ) + genre = 'Action' +``` + +This factory definition is very declarative. We declare what we want, and `factory_boy` figures out how to put it together. This quality leads to factories that you can reason about because you can focus on the what and not the how of model construction. + +The other noteworthy aspect is that the factories compose together. When we call `MovieFactory()`, `factory_boy` is missing data about everything so it must build all of that data. The challenge is that the `MovieFactory` doesn't know how to build a `Director` or any of the movie's foreign key relationships. Instead, the factory will delegate to *other* factories using the `SubFactory` attribute. By delegating to other factories, `factory_boy` can build the model and its entire tree of relationships with a single call. + +When we want to override the behavior of some of the generated data, we pass in the extra argument as I did in the second example by providing "Sci-Fi" as the `genre`. You can pass in other model instances to your factories too. + +`factory_boy` makes testing with database records a joy. In my experience, most of my Django tests require some amount of database data so I use factories very heavily. I think you will find that `factory_boy` is a worthy addition to your test tools. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we explored tests with Django projects. 
We focused on: + +* Why anyone would want to write automated tests +* What kinds of tests are useful to a Django app +* What tools you can use to make testing easier + +{{< web >}} +Next time, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we will dig into deployment. Deployment is getting your project into the environment where you will share your application for use. This might be the internet or it might be a private network for your company. Wherever you're putting your app, you'll want to know about: + +* Deploying your application with a Python web application server (i.e., `./manage.py runserver` isn't meant for deployed apps) +* Deployment preconditions for managing settings, migrations, and static files +* A checklist to confirm that your settings are configured with the proper security guards +* Monitoring your application for errors + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2021-03-23-deploy-site-live.pt.md b/content/understand-django/2021-03-23-deploy-site-live.pt.md new file mode 100644 index 0000000..6f9c581 --- /dev/null +++ b/content/understand-django/2021-03-23-deploy-site-live.pt.md @@ -0,0 +1,366 @@ +--- +title: "Deploy A Site Live" +description: >- + You're ready to take the site you developed and share it with the world. What steps should you take to prepare your Django project for life on the web? That's the focus of this article. 
+image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - deployment +series: "Understand Django" + +--- + +In the previous +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we looked at automated testing and how writing tests to check your Django project can be very valuable, saving you time and making sure your site works for your users. Next, we're going to look into how to share your site on the internet by understanding what it means to *deploy* a Django project. Deployment is the act of making your application live to your audience, +{{< web >}} +and this article explains the actions +{{< /web >}} +{{< book >}} +and this chapter explains the actions +{{< /book >}} +you should consider to deploy effectively. + +{{< understand-django-series-pt "deployment" >}} + +## Pick A Python Application Server + +When you begin to learn Django, the documentation will instruct you to use `./manage.py runserver` to interact with your application locally. `runserver` is a great tool for getting started because you can avoid extra software from the outset of your Django journey. + +While great, the `runserver` command is not designed for handling a lot of web traffic. `runserver` is explicitly intended for a development-only setting. Aside from a lack of performance tuning options, the server doesn't receive the same security scrutiny as other Python web application servers. + +These factors add up to make the `runserver` command unsuitable for handling your live site. What should you use instead? When you read the {{< extlink "https://docs.djangoproject.com/en/4.1/howto/deployment/wsgi/" "deployment documentation" >}}, you'll find many possible Python web application servers listed. Gunicorn, uWSGI, Apache with mod_wsgi, Daphne, Hypercorn, and Uvicorn are all presented as available options. That's way too much choice. 
+ +> Use {{< extlink "https://gunicorn.org/" "Gunicorn" >}}. + +Gunicorn (which stands for "Green Unicorn") is a very simple web application server to start using. In my experience, Gunicorn *stays* easy to use and works for projects that receive a ton of traffic. I've used some of the other options presented for Django apps, and few are as simple to use as Gunicorn. + +To use Gunicorn, we need to point the `gunicorn` command to the WSGI application that Django projects have. If you recall, WSGI is the Web Server Gateway Interface. WSGI is the protocol that permits Django apps to talk to any of these web application servers. + +If you ran the `startproject` command and named your Django project as "project," the WSGI application should be in a file like `project/wsgi.py`. Gunicorn is aware that Django conventionally calls the WSGI application in that module `application`, so your only action is to point Gunicorn to the module with Python's dotted syntax. Here's the most basic setup: + +```bash +$ gunicorn project.wsgi +``` + +Gunicorn works by starting a main process that will listen for HTTP requests on the local machine at port 8000 by default. As each request reaches the main process, the request routes to an available worker process. The worker process executes your Django app code to provide the response data to the user. + +By default, Gunicorn will only create a single worker process. The Gunicorn documentation recommends picking a number that is two to four times larger than the number of CPU cores available to your machine. + +The number of workers is a large determining factor in how many requests your Django app can handle at once. The number of requests processed is usually called *traffic* by web developers. The idea of handling more traffic by creating more processes (i.e., Gunicorn workers) is called *horizontal scaling*. In contrast, *vertical scaling* handles more traffic by using a better individual computer.
A faster processor with a single CPU can handle more requests. When thinking about performance, horizontal scaling is often a far easier approach. + +One of my projects has a small amount of traffic and runs on a single CPU on its hosting provider. In that scenario, I use two workers which looks like: + +```bash +$ gunicorn project.wsgi --workers 2 +``` + +The only other option you may require is an option to handle where logging data goes. I haven't covered logging in depth yet, +{{< web >}} +but recall from previous articles +{{< /web >}} +{{< book >}} +but recall from previous chapters +{{< /book >}} +that logging allows you to record information about what your application is doing while it's running. + +Some hosting providers expect monitoring output like logging to go to stdout or stderr. stdout stands for "standard output" and stderr is "standard error." stdout is where data appears in your terminal when you use `print`. To tell Gunicorn to log to stderr, you can use a dash as the value to the `log-file` option. My full Gunicorn command looks like: + +```bash +$ gunicorn project.wsgi \ + --workers 2 \ + --log-file - +``` + +A note about ASGI: I am assuming that your use of Django will use WSGI and its synchronous mode. In recent years, Django added support for asynchronous Python. Asynchronous Python brings the promise of higher performance with the tradeoff of some implementation complexity. For learning Django initially, you don't need to understand asynchronous Python and the Asynchronous Server Gateway Interface (ASGI). + +## Pick Your Cloud + +Once you know which application server to use and how to use it, you need to run your code somewhere. Again, you can be paralyzed by the sheer volume of choices available to you. AWS, GCP, Azure, Digital Ocean, Linode, Heroku, PythonAnywhere, and so many other cloud vendors are out there and able to run your application. + +If you're getting started, use a Platform as a Service (PaaS) option. 
Specifically, I think Heroku is a great platform for applications. A PaaS removes loads of operational complexity that you may be unequipped to handle initially if you're newer to web development. + +Think you want to run your application on a general cloud provider like AWS? You can certainly do that, but you'll potentially need to be prepared for: + +* Setting up machines +* Getting TLS certificates for https +* Doing database backups +* Using configuration management tools to automate deployment +* And loads more! + +You may be using Django to learn those skills. If so, that's awesome and good luck! But if your primary goal is to get your application out into the world, then these tasks are a huge drag on your productivity. + +Let's contrast this with Heroku. Because Heroku is a PaaS, they deal with the vast majority of the setup and coordination of machines in their cloud. Your experience as a developer primarily moves to a single command: + +```bash +$ git push heroku main +``` + +The Heroku instructions have you set up a Git remote. That remote is the place you push your code to and let Heroku handle the deployment of your application. Heroku manages this by cleverly detecting Django applications and applying the correct commands. We'll see some of those required commands when looking at the preconditions. + +To be really clear, this is not an ad for Heroku. I have personal experience with most of the cloud vendors that I listed earlier in this section. I have found that a PaaS like Heroku is far and away an easier option to apply for my own projects. That's why I recommend the service so strongly. + +## Project Preconditions + +Django has a few preconditions that it expects before running your application in a live setting. 
+{{< web >}} +If you've read the previous articles, then you've actually seen most of these preconditions by now, +{{< /web >}} +{{< book >}} +You've seen most of these preconditions by now in earlier chapters, +{{< /book >}} +but we'll group them together in this section so you can see the complete picture. + +One that we haven't discussed is the `DJANGO_SETTINGS_MODULE` environment variable. This is one critical element to your application because the variable signals to your Django application where the settings module is located. If you have different settings modules for different configurations (e.g., a live site configuration versus a unit testing configuration), then you may need to specify which settings module Django should use when running. + +{{< web >}} +In a future article, +{{< /web >}} +{{< book >}} +In a future chapter, +{{< /book >}} +we'll focus on how to manage your settings modules. At that time, you'll see how using some particular techniques diminishes the need for multiple modules. For now, keep in mind `DJANGO_SETTINGS_MODULE` for deployments, and you should be good. + +The next important precondition for your app is keeping your database in sync using migrations. As mentioned +{{< web >}} +in the models article, +{{< /web >}} +{{< book >}} +in the models chapter, +{{< /book >}} +we make migrations when making model changes. These migrations generate instructions for your relational database, so that Django can match the database schema. + +Without the migration system, Django would be unable to communicate effectively with the database. Because of that, you need to ensure that you have applied all migrations to your application before running your app.
+ +For whatever cloud you're using, you need to make sure that when you deploy, your deployment scripts run: + +```bash +$ ./manage.py migrate +``` + +For instance, with my Heroku setup, Heroku lets me define a "release" command that they guarantee to run before launching the new version of the app. Heroku uses a `Procfile` to set which machines and commands to run so my `Procfile` looks like: + +```yaml +release: python manage.py migrate +web: gunicorn project.wsgi --workers 2 --log-file - +``` + +This file tells Heroku to run migrations before launching, then run gunicorn as the web process for the application. + +Another precondition needed for your app is static files. +{{< web >}} +We saw in the static files article +{{< /web >}} +{{< book >}} +We saw in the static files chapter +{{< /book >}} +that Django looks for static files in a single directory for performance reasons. That requires running a command to put those files in the expected location: + +```bash +$ ./manage.py collectstatic +``` + +In my deployment process for Heroku, this is a step that Heroku automatically does because it can detect a Django project. + +These items are the required preconditions to run your application. Django also has steps that aren't strictly required to make your app work, but are very beneficial. Let's look at how to address those next. + +## Protecting Your Site + +"Put on your seat belt." The average person knows that it's wise to wear a seat belt in a car. The statistical data is overwhelming that a seat belt can help save your life if you're ever in a car accident. Yet, a seat belt is not strictly necessary (aside from a legal perspective) to operate a vehicle. + +Django includes a command that produces a set of instructive safety messages for important site settings and configurations. Thankfully, ignoring these messages is unlikely to affect your personal health, but the messages are valuable to help you combat the bad forces that exist on the public internet. 
+ +To view these important messages, run: + +```bash +$ ./manage.py check --deploy --fail-level WARNING +``` + +On a little sample project that I created, +{{< web >}} +the (slightly reformatted for the article) output looks like: +{{< /web >}} +{{< book >}} +the output looks like: +{{< /book >}} + +```bash +$ ./manage.py check --deploy --fail-level WARNING +SystemCheckError: System check identified some issues: + +WARNINGS: +?: (security.W004) You have not set a value for the SECURE_HSTS_SECONDS + setting. If your entire site is served only over SSL, you may want to + consider setting a value and enabling HTTP Strict Transport Security. + Be sure to read the documentation first; enabling HSTS carelessly can + cause serious, irreversible problems. +?: (security.W008) Your SECURE_SSL_REDIRECT setting is not set to True. + Unless your site should be available over both SSL and non-SSL connections, + you may want to either set this setting True or configure a load balancer + or reverse-proxy server to redirect all connections to HTTPS. +?: (security.W012) SESSION_COOKIE_SECURE is not set to True. Using a + secure-only session cookie makes it more difficult for network traffic + sniffers to hijack user sessions. +?: (security.W016) You have 'django.middleware.csrf.CsrfViewMiddleware' + in your MIDDLEWARE, but you have not set CSRF_COOKIE_SECURE to True. + Using a secure-only CSRF cookie makes it more difficult for network traffic + sniffers to steal the CSRF token. +?: (security.W018) You should not have DEBUG set to True in deployment. +?: (security.W020) ALLOWED_HOSTS must not be empty in deployment. + +System check identified 6 issues (0 silenced). +``` + +The items reported by the checklist are often about settings that could be configured better. These checks are created by the {{< extlink "https://docs.djangoproject.com/en/4.1/topics/checks/" "System check framework" >}} that comes with Django. 
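
As a rough sketch, a settings fragment that addresses the six warnings in the sample output could look like the following. The setting names are the ones named in the warnings themselves, but the values are illustrative assumptions: `www.example.com` is a placeholder hostname, and the one hour HSTS value is only a cautious starting point. This also assumes your entire site is served over HTTPS.

```python
# project/settings.py (illustrative fragment, values are assumptions)

# security.W018: never run a live site with DEBUG enabled.
DEBUG = False

# security.W020: the hostnames this site is allowed to serve.
# "www.example.com" is a placeholder for your real domain.
ALLOWED_HOSTS = ["www.example.com"]

# security.W008: redirect plain HTTP requests to HTTPS.
SECURE_SSL_REDIRECT = True

# security.W012 and security.W016: only send these cookies over HTTPS.
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True

# security.W004: enable HTTP Strict Transport Security.
# One hour is a deliberately small value to start with; as the warning
# says, read the documentation before raising it.
SECURE_HSTS_SECONDS = 3600
```

After changing settings like these, rerunning the check command is a quick way to confirm which warnings remain.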
+ +You should review each of the checks and learn about the changes that the check recommends. The items that appear with the `--deploy` flag are usually quite important and fixing them can greatly improve the safety and security of your application. + +Some of these checks are too modest about their importance. For instance, `security.W018` is the warning that tells you that `DEBUG` is set to `True` in the settings. `DEBUG = True` is *TERRIBLE* for a live site since it can *trivially* leak loads of private data. + +As a *warning*, `security.W018` will *not* fail the deploy check because `./manage.py check` defaults to failing on things that are errors. If you want to make sure that your site is sufficiently protected, I strongly encourage you to add the `--fail-level WARNING` flag so that the check will give those warnings the weight that they likely deserve. + +What do you do if the check is handled by some other part of your system? For example, maybe you've set up a secure configuration with HTTPS, and you've set HSTS headers with a reverse proxy like Nginx (this was one of the configurations that I mentioned +{{< web >}} +in the static files article). +{{< /web >}} +{{< book >}} +in the static files chapter). +{{< /book >}} +If HSTS is handled elsewhere, you could set the `SILENCED_SYSTEM_CHECKS` setting to tell Django that you took care of it: + +```python +# project/settings.py + +SILENCED_SYSTEM_CHECKS = [ + "security.W004" +] +``` + +Once you have finished the checklist, your application will be much better equipped to handle the hostile internet, but things can still go wrong. What should you do about errors that happen on your live site? Let's look at that next. + +## Prepare For Errors + +If an error happens on a live site and the site administrator (i.e., *you*) didn't hear it, did it really happen? **Yes, yes it did.** + +Dealing with a live site brings a new set of challenges. 
Try as we might to consider every possible action that our users do, we'll never get them all. There are lots of ways that a site can have errors from things that we failed to consider. Since errors *will* happen with a large enough product and large enough customer base, we need some plan to manage them. + +We can consider a few strategies: + +#### 1. Do nothing. + +While I don't recommend this strategy, you *could* wait for your customers to report errors to you. Some portion of customers might actually write to you and report a problem, but the vast majority won't. What's worse is that some of these customers may abandon your product if the errors are bad enough or frequent enough. + +> Using your customers to learn about errors makes for a poor user experience. + +#### 2. Use error emails. + +The Django deployment documentation highlights Django's ability to send error information to certain email addresses. *I don't recommend this strategy either.* Why? + +* Setting up email properly can be a very tricky endeavor that involves far more configuration than you may realize. You may need email for your service, but setting it up for error info alone is overkill. +* The error emails can include Python tracebacks to provide context, but other tools can provide much richer context information (e.g., the browser used when a customer experiences an error). +* If you have a runaway error that happens constantly on your site, you can say "bye, bye" to your email inbox. A flood of emails is a quick way to get email accounts flagged and hurt the deliverability of email. + +This brings us to the final strategy that I'll cover. + +#### 3. Use an error tracking service. + +Error tracking services are specifically designed to collect context about errors, aggregate common errors together, and generally give you tools to respond to your site's errors appropriately. 
+ +**I find that error tracking services are the best tools to understand what's going wrong on your live site** because the tools are purpose-built for detailing error behavior. + +Many of these services are not complicated to get installed and configured, and the services often have an extremely generous free tier to monitor your application. + +In the Django world, I generally hear about two of these error tracking services: {{< extlink "https://rollbar.com/" "Rollbar" >}} +and {{< extlink "https://sentry.io/welcome/" "Sentry" >}}. I've used both of these error trackers, and I think they are both great. For my personal projects, I happen to pick Rollbar by default, so I'll describe that service in this section as an example. + +The flow for installing Rollbar is: + +1. Create a Rollbar account on their site. +2. Install the `rollbar` package. +3. Set some settings in a `ROLLBAR` dictionary. + +That's it! + +My Rollbar configuration for one of my projects looks like: + +```python +# project/settings.py + +ROLLBAR = { + "enabled": env("ROLLBAR_ENABLED"), + "access_token": env("ROLLBAR_ACCESS_TOKEN"), + "environment": env("ROLLBAR_ENVIRONMENT"), + "branch": "main", + "root": BASE_DIR, +} +``` + +In this example, everything coming from `env` is from an environment variable that we'll discuss more when we focus on settings management in Django. + +The `enabled` parameter can quickly turn Rollbar on and off. This is good for local development so you're not sending data to the service when working on new features. + +The `access_token` is the secret that Rollbar will provide in your account to associate your app's error data with your account so you can see problems. + +The `environment` setting lets Rollbar split your errors into different groupings. You can use this to separate different configurations that you put on the internet. For instance, the software industry likes to call live sites "production." 
You may also have a separate site that is available privately to a team that you might call "development." + +The other settings tell Rollbar information that can help map errors back to your code repository. + +Once you set this up, how can you tell that it's working? Like a musician tapping a microphone to see if it's working, I like to add a view to my code that lets me test that my error tracking service is operational: + +```python +# application/views.py + +from django.contrib.admin.views.decorators import staff_member_required + +@staff_member_required +def boom(request): + """This is for checking error handling (like Rollbar).""" + raise Exception("Is this thing on?") +``` + +I connect this view to a URL configuration, then check it after I deploy Rollbar for the first time. Importantly, don't forget to include a `staff_member_required` decorator so that random people on the internet can't trigger errors on your server on a whim! + +With error tracking set up, you'll be in a good position to see errors when they happen on your site. In fact, a great way to win the favor of customers can be to fix errors proactively and reach out to them. Most people don't enjoy contacting support and would be surprised and delighted if you tell them that you fixed their problem immediately. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we learned the things to consider when deploying a site to the internet. 
We examined: + +* Deploying your application with a Python web application server (i.e., `./manage.py runserver` isn't meant for deployed apps) +* Running your app on a cloud vendor +* Deployment preconditions for managing settings, migrations, and static files +* A checklist to confirm that your settings are configured with the proper security guards +* Monitoring your application for errors + +{{< web >}} +In the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we'll look at Django's tools for managing shorter term user data like authentication info with Django sessions. We'll see the different modes that Django provides and how to use sessions to support your project. You'll learn about: + +* What sessions are and how they work +* Ways that Django uses sessions +* How to use sessions in your apps + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2021-05-21-sessions.pt.md b/content/understand-django/2021-05-21-sessions.pt.md new file mode 100644 index 0000000..50deca2 --- /dev/null +++ b/content/understand-django/2021-05-21-sessions.pt.md @@ -0,0 +1,241 @@ +--- +title: "Per-visitor Data With Sessions" +description: >- + How does Django know when a user is logged in? Where can the framework store data for a visitor on your app? In this article, we'll answer those questions and look at a storage concept in Django called sessions. 
+image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - sessions +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we saw what it takes to make your Django project live on the internet. Now, we'll get back to a more narrow topic and focus on a way Django can store data for visitors to your site. This is the kind of data that doesn't often fit well into your Django models and is called *session* data. + +{{< understand-django-series-pt "sessions" >}} + +## What Is A Session? + +As *I* was learning Django, I'd run into sessions occasionally and accept that I didn't really understand them. They felt like magic to me. But what is a session? + +> A session is a set of data that is available to users that Django can use over multiple requests. + +From a development perspective, the data is different from the regular data that you would store in a database with Django models. When working with session data, you don't query the database using the ORM. Instead, you can access session content via the `request.session` attribute. + +The `request.session` is a dictionary-like object. Storing data into the session is like working with any other Python dictionary: + +```python +# application/views.py + +from django.http import HttpResponse + +def a_session_view(request): + request.session['data_to_keep'] = 'store this' + return HttpResponse('') +``` + +When Django stores the session data, the framework will keep the data in a JSON format. What is JSON? I've mentioned JSON in passing +{{< web >}} +in previous articles, +{{< /web >}} +{{< book >}} +in previous chapters, +{{< /book >}} +but now is a decent time to explain it. Knowing what JSON is will help you understand what happens to session data as the data is stored. + +### The "What is JSON?" Sidebar + +JSON is a data format.
JSON is a way of describing data so that the data can be stored or transmitted. The definition of that format is listed on the official {{< extlink "https://www.json.org/json-en.html" "JSON website" >}} and can be understood in probably 10 minutes or less. + +That stored data can be parsed based on the definition of the format to recreate the data at a different time or on a different computer. In general, you can view JSON as a tool to take Python dictionaries or lists and store or transmit them for use elsewhere. + +The Python standard library includes a module for working with JSON data. Here's an example to give you an idea of what JSON output looks like: + +```python +>>> import json +>>> data = {'hello': 'world'} +>>> json.dumps(data) +'{"hello": "world"}' +>>> json_string = json.dumps(data) +>>> parsed_data = json.loads(json_string) +>>> parsed_data +{'hello': 'world'} +>>> data == parsed_data +True +``` + +The `dumps` and `loads` functions transform data to and from a string, respectively (the `s` in those function names stands for "string"). + +JSON is an extremely versatile format and is used all over the internet. Getting back to sessions, JSON is a good fit because there are multiple places where Django can store session data. Let's look at those next. + +## Session Storage + +{{< web >}} +You probably know this drill by now if you've been following this series. +{{< /web >}} +{{< book >}} +You probably know this drill by now. +{{< /book >}} +Like the template system, the ORM, and the authentication system, the session application is configurable with multiple different "engines" to store session data. + +When you start a new Django project with the `startproject` command, the session engine will be set to `django.contrib.sessions.backends.db`. This is because the `SESSION_ENGINE` setting will be unset in your settings module, and Django will fall back to the default. + +With this engine, Django will store session data in the database.
Because `startproject` includes the `django.contrib.sessions` app in `INSTALLED_APPS`, you'd probably see the following stream by when you migrate your database for the first time: + +```text +(venv) $ ./manage.py migrate + ... +Running migrations: + ... + Applying sessions.0001_initial... OK +``` + +The `Session` model stores three things: + +* A session key that uniquely identifies the session in the storage engine +* The actual session data, stored in JSON format, in a `TextField` +* An expiration date for the session data + +With these three fields, Django can handle the temporary storage needs for any of your site's visitors. + +Why is the session engine configurable? Django's session storage is configurable to manage tradeoffs. The default storage of a database engine is a safe default and the easiest to understand. The answer to "Where is my app's session data?" is "In the database with all of my other application data." + +If your site grows in popularity and usage, using the database to store sessions can become a bottleneck and limit your performance and application scaling. Additionally, the default engine creates an ever expanding set of database rows in the `Session` model's table. You can work around the second challenge by periodically running the `clearsessions` management command, but what if the performance is a problem for you? + +This is where other storage engines might be better for your application. One method to improve performance is to switch to an engine that uses caching. If you have set up the caching system with a technology like {{< extlink "https://redis.io/" "Redis" >}} or {{< extlink "https://memcached.org/" "Memcached" >}}, then a lot of session load on the database can be pushed to the cache service. 
Caching is a topic we will explore more +{{< web >}} +in a future article, +{{< /web >}} +{{< book >}} +in a future chapter, +{{< /book >}} +so if this doesn't make too much sense right now, I apologize for referencing concepts that I haven't introduced yet. For the time being, understand that caching can improve session performance. + +Another session storage engine that can remove load from a database uses the browser's cookie system. This system will certainly remove database load because the state will be stored with the browser, but this strategy comes with its own set of tradeoffs. With cookie-based storage: + +* The storage could be cleared at any time by the user. +* The storage engine is limited to a small amount of data storage by the browser, based on the maximum allowed size of a cookie (commonly, only 4 KB). + +Choosing the right session storage engine for your application depends on what the app does. If you're in doubt, start with the default of database-backed storage, and you should be fine initially. + +## How Does The Session System Identify Visitors? + +When a visitor comes to your site, Django needs to associate the session data with the visitor. To do this association, Django will store a session identifier in a cookie on the user's browser. + +On the first visit, the session storage engine will look for a cookie with the name `sessionid` (by default). If the application doesn't find that cookie, then the session storage will generate a random ID and ensure that the random ID doesn't conflict with any other session IDs that already exist. + +From there, the storage engine will store some session data via whatever mechanism that engine uses (e.g., the database engine will create a new session row in the table). + +The session ID is added to the user's browser cookies for your site's domain. Cookies are stored in a secured manner so only that browser will have access to that randomly generated value.
The session ID is very long (32 characters), and the session will expire after a given length of time. These characteristics make session IDs quite secure. + +Since sessions are secure and can uniquely identify a browser, what kind of data can we put in there? + +## What Uses Sessions? + +Sessions can store all kinds of data, but what are some real world use cases? You can look in Django's source code to find some immediate answers! + +In my estimation, the part of Django that uses sessions most heavily is the auth system. We explored the authentication and authorization system +{{< web >}} +in [User Authentication]({{< ref "/understand-django/2020-11-04-user-authentication.pt.md" >}}). +{{< /web >}} +{{< book >}} +in the User Authentication chapter. +{{< /book >}} +At the time, I mentioned in the prerequisites that sessions were required, but I noted that sessions were an internal detail. Now that you know what sessions are about, let's see how the auth system uses them. + +If you look into the session data after you've authenticated, you'll find three pieces of information: + +* The user's ID (stored in `_auth_user_id`) +* The user's hash (stored in `_auth_user_hash`) +* The string name of the auth backend used (stored in `_auth_user_backend`) + +Since we know that a session identifies a browser and does so securely, the auth system stores identity information in the session to tie that unique session to a unique user. When a user's browser makes an HTTP request, Django can determine the session associated with the request via the `sessionid` and gain access to the user auth data (i.e., the user's ID, hash, and auth backend). With these data elements, the auth system can determine if the request is valid and should be considered authenticated by checking against the associated auth session data. + +The auth system will read which backend is used and load that backend if possible.
The backend is used to load the specific user record from the ID found in the session. Finally, that user is used to check if the hash provided validates when compared to the user's hashed password (there is some extra hashing involved to ensure that the user's password hash is not stored directly in the session). If the comparison checks out, the user is authenticated and the request proceeds as an authenticated request. + +You can see that the session is vital to this flow with the auth system. Without the ability to store state in the session, the user would be unable to prove who they were. + +Another use of sessions is found with CSRF handling. The CSRF security features in Django +{{< web >}} +(which I mentioned in the forms article +{{< /web >}} +{{< book >}} +(which I mentioned in the forms chapter +{{< /book >}} +and we will explore more in a future topic) permit CSRF tokens to be stored in the session instead of a cookie when the `CSRF_USE_SESSIONS` setting is enabled. Django provides a safe default for CSRF tokens in cookies, but the session is an alternative storage place if you're not happy enough with the cookie configuration. + +As a final example, we can look at the `messages` application. The `messages` app can store "flash" messages. A flash message is the kind of message that you'd expect to see on a single page view. For instance, if you have a message that you'd like to display to a user upon some action, you might use a flash message. Perhaps your application has some "Contact Us" form to receive customer feedback. 
After the customer submits the form, you might want the application to flash "Thank you for the feedback!": + +```python +# application/views.py + +from django.contrib import messages +from django.views.generic import FormView +from django.urls import reverse_lazy + +from .forms import ContactForm + +class ContactView(FormView): + form_class = ContactForm + success_url = reverse_lazy("application:index") + + def form_valid(self, form): + messages.info( + self.request, + "Thank you for the feedback!" + ) + return super().form_valid(form) +``` + +In the default setup, Django will attempt to store the flash message in the request's cookies, but, as we saw earlier, browsers constrain the maximum cookie size. If the flash messages will not fit in the request's cookies, then the `messages` app will switch to the session as a more robust alternative. Observe that this might run into problems if you are using the session's cookie storage engine! + +I hope that these examples from Django's `contrib` package provide you with some ideas for how you might use sessions in your own projects. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we dug into Django sessions and how you use them. + +We saw: + +* What sessions are and the interface they expose as `request.session` +* How JSON is used to manage session data +* Different kinds of session storage that are available to your site +* The way that Django recognizes a user's session in the browser +* Examples within `django.contrib` of how sessions get used by Django's built-in apps. + +{{< web >}} +In the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we are going to spend time focusing on settings in Django. 
+ +You'll learn about: + +* Various strategies for managing your project's settings +* Django's tools to help with settings +* Tools in the larger Django ecosystem that can make your life easier + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2021-07-15-settings.pt.md b/content/understand-django/2021-07-15-settings.pt.md new file mode 100644 index 0000000..19caccb --- /dev/null +++ b/content/understand-django/2021-07-15-settings.pt.md @@ -0,0 +1,387 @@ +--- +title: "Making Sense Of Settings" +description: >- + All Django apps need to be configured in order to run properly. In this article, we will dig into how Django lets you configure your project using a settings module. We'll also look at ways to be extra effective with settings. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - settings +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we looked at a storage concept in Django called sessions. Sessions provide a solution to problems like "How does Django know when a user is logged in?" or "Where can the framework store data for a visitor on your app?" + +{{< web >}} +With this article, +{{< /web >}} +{{< book >}} +With this chapter, +{{< /book >}} +you'll learn about Django settings and how to manage the configuration of your application. We'll also look at tools to help you define your settings effectively. + +{{< understand-django-series-pt "settings" >}} + +## How Is Django Configured? + +To run properly, Django needs to be configured. 
We need to understand where this configuration comes from. Django has the ability to use default configuration values or values set by developers like yourself, but where does it get those from? + +Early in the process of starting a Django application, Django will internally import the following: + +```python +from django.conf import settings +``` + +This `settings` import is a package level object created in `django/conf/__init__.py`. The `settings` object has attributes added to it from two primary sources. + +The first source is a set of global default settings that come from the framework. These global settings are from `django/conf/global_settings.py` and provide a set of initial values for configuration that Django needs to operate. + +The second source of configuration settings comes from user defined values. Django will accept a Python module and apply its module level attributes to the `settings` object. To find the user module, Django searches for a `DJANGO_SETTINGS_MODULE` environment variable. + +### Sidebar: Environment Variables + +Environment variables are not a Django concept. When any program runs on a computer, the operating system makes certain data available to the running program. This set of data is called the program's "environment," and each piece of data in that set is an environment variable. + +If you're starting Django from a terminal, you can view the environment variables that Django will receive from the operating system by running the `env` command on macOS or Linux, or the `set` command on Windows. + +We can add our own environment variables to the environment with the `export` command on macOS or Linux, or the `set` command on Windows. Environment variables are typically named in all capital letters: + +```bash +$ export HELLO=world +``` + +Now that we have a basic understanding of environment variables, let's return to the `DJANGO_SETTINGS_MODULE` variable. 
The variable's value should be the location of a Python module containing any settings that a developer wants to change from Django's default values. + +If you create a Django project with `startproject` and use `project` as the name, then you will find a generated settings file with the path `project/settings.py`. When Django runs, you could explicitly instruct Django with: + +```bash +$ export DJANGO_SETTINGS_MODULE=project.settings +``` + +Instead of supplying the file path, the `DJANGO_SETTINGS_MODULE` should be in a Python module dotted notation. + +You may not actually need to set `DJANGO_SETTINGS_MODULE` explicitly. If you stick with the same settings file that is created by `startproject`, you can find a line in `wsgi.py` that looks like: + +```python +os.environ.setdefault( + 'DJANGO_SETTINGS_MODULE', + 'project.settings' +) +``` + +Because of this line, Django will attempt to read from `project.settings` (or whatever you named your project) without the need to explicitly set `DJANGO_SETTINGS_MODULE`. Feel free to adjust the default value if you have a different settings file that you prefer to use for local development. + +Once Django reads the global settings and any user defined settings, we can get any configuration from the `settings` object via attribute access. This convention of keeping all configuration in the `settings` object is a convenient pattern that the framework, third party library ecosystem, and *you* can depend on: + +```python +$ ./manage.py shell +>>> from django.conf import settings +>>> settings.SECRET_KEY +'a secret to everybody' +``` + +The `settings` object is a shared item so it is generally thought to be a Really Bad Idea™ to edit and assign to the object directly. Keep your settings in your settings module! + +That's the core of Django configuration. We're ready to focus on the user defined settings and our responsibilities as Django app developers.
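
If the two-source idea feels abstract, here is a rough sketch of how a settings object could layer a user module on top of framework defaults. This is *not* Django's actual implementation. The names `SimpleSettings`, `GLOBAL_DEFAULTS`, `fake_project_settings`, and the `SKETCH_SETTINGS_MODULE` environment variable are all made up for illustration. The one detail borrowed from Django is that only names written in all capital letters are treated as settings:

```python
import importlib
import os
import sys
import types

# Hypothetical stand-in for django/conf/global_settings.py.
GLOBAL_DEFAULTS = {"DEBUG": False, "TIME_ZONE": "America/Chicago"}


class SimpleSettings:
    """Layer a user settings module on top of a dict of defaults."""

    def __init__(self, defaults, module_name):
        # First source: the framework-wide defaults.
        for name, value in defaults.items():
            setattr(self, name, value)
        # Second source: uppercase attributes of the user's module.
        module = importlib.import_module(module_name)
        for name in dir(module):
            if name.isupper():
                setattr(self, name, getattr(module, name))


# Build a throwaway module in memory so the sketch is self-contained.
fake = types.ModuleType("fake_project_settings")
fake.DEBUG = True
fake.helper = "ignored because the name is not uppercase"
sys.modules["fake_project_settings"] = fake

# Hypothetical variable name; Django reads DJANGO_SETTINGS_MODULE instead.
os.environ.setdefault("SKETCH_SETTINGS_MODULE", "fake_project_settings")
settings = SimpleSettings(
    GLOBAL_DEFAULTS, os.environ["SKETCH_SETTINGS_MODULE"]
)

print(settings.DEBUG)      # value taken from the user module
print(settings.TIME_ZONE)  # value taken from the defaults
```

The user module wins wherever it defines an uppercase name, and everything else falls back to the defaults. That ordering is why a project's settings file only needs to list the values it wants to change.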
+ +## Settings Module Patterns + +There are multiple ways to deal with settings modules and how to populate those modules with the appropriate values for different environments. Let's look at some popular patterns. + +### Multiple Modules Per Environment + +A Django settings module is a Python module. Nothing is stopping us from using the full power of Python to configure that module the way we want. + +Minimally, you will probably have at least two environments where your Django app runs: + +* On your local machine while developing +* On the internet for your live site + +We should know by now that setting `DEBUG = True` is a terrible idea for a live Django site, so how can we get the benefits of the debug mode without having `DEBUG` set to `True` in our module? + +One technique is to use separate settings modules. With this strategy, you can pick which environment your Django app should run for by switching the `DJANGO_SETTINGS_MODULE` value to pick a different environment. You might have modules like: + +* `project.settings.dev` +* `project.settings.stage` +* `project.settings.production` + +These examples would be for a local development environment on your laptop, a staging environment (which is a commonly used pattern for testing a site that is as similar to the live site as possible without *being* the live site), and a production environment. +{{< web >}} +As a reminder from the deployment article, +{{< /web >}} +{{< book >}} +As a reminder from the deployment chapter, +{{< /book >}} +the software industry likes to call the primary site for customers "production." + +This strategy has certain challenges to consider. Should you replicate settings in each file or use some common module between them? + +If you decide to replicate the settings across modules, you'll have the advantage that the settings module shows *all* of the settings in a single place for that environment. 
The disadvantage is that keeping the common settings the same could be a challenge if you forget to update one of the modules. + +On the other hand, you could use a common module. The advantage to this form is that the common settings can be in a single location. The environment specific files only need to record the *differences* between the environments. The disadvantage is that it is harder to get a clear picture of all the settings of that environment. + +If you decide to use a common module, this style is often implemented with a `*` import. I can probably count on one hand the number of places where I'm ok with a `*` import, and this is one of them. In most cases the Python community prefers explicit over implicit, and the idea extends to the treatment of imports. Explicit imports make it clear what a module is actually using. The `*` import is very implicit, and it makes it unclear what a module uses. For the case of a common settings module, a `*` import is actually positive because we want to use *everything* in the common module. + +Let's make this more concrete. Assume that you have a `project.settings.base` module. This module would hold your common settings for your app. I'd recommend that you try to make your settings safe and secure by default. For instance, use `DEBUG = False` in the base settings and force other settings modules to opt-in to the more unsafe behavior. + +For your local development environment on your laptop, you could use `project.settings.dev`. This settings module would look like: + +```python +# project/settings/dev.py + +from project.settings.base import * + +DEBUG = True + +# Define any other settings that you want to override. +... +``` + +By using the `*` import in the `dev.py` file, all the settings from `base.py` are pulled into the module level scope. Where you want a setting to be different, you set the value in `dev.py`. 
When Django starts using `DJANGO_SETTINGS_MODULE` of `project.settings.dev`, all the values from `base.py` will be used via `dev.py`. + +This scheme gives you control to define common things once, but there is still a big challenge with this. What do we do about settings that need to be kept secret (e.g., API keys)? + +*Don't commit secret data to your code repository!* Adding secrets to your source control tool like Git is usually not a good idea. This is especially true if you have a public repository on GitHub. Think no one is paying attention to your repo? Think again! There are tools out there that scan *every public commit* made to GitHub. These tools are specifically looking for secret data to exploit. + +If you can't safely add secrets to your code repo, where can we add them instead? You can use environment variables! Let's look at another scheme for managing settings with environment variables. + +### Settings Via Environment Variables + +In Python, you can access environment variables through the `os` module. The module contains the `environ` attribute, which functions like a dictionary. + +By using environment variables, your settings module can get configuration settings from the external environment that is running the Django app. This is a solid pattern because it can accomplish two things: + +* Secret data can be kept out of your code +* Configuration differences between environments can be managed by changing environment variable values + +Here's an example of secret data management: + +```python +# project/settings.py + +import os + +SECRET_KEY = os.environ['SECRET_KEY'] + +... +``` + +Django needs a secret key for a variety of safe hashing purposes. There is a warning in the default `startproject` output that reads: + +```python +# SECURITY WARNING: keep the secret key used in production secret! 
+``` + +By moving the secret key value to an environment variable that happens to have a matching name of `SECRET_KEY`, we won't be committing the value to source control for some nefarious actor to discover. + +This pattern works really well for secrets, but it can also work well for *any* configuration that we want to vary between environments. + +For instance, on one of my projects, I use the excellent {{< extlink "https://anymail.readthedocs.io/en/stable/" "Anymail" >}} package +to send emails via an email service provider (of the ESPs, I happen to use {{< extlink "https://sendgrid.com/" "SendGrid" >}}). When I'm working with my development environment, I don't want to send real email. Because of that, I use an environment variable to set Django's `EMAIL_BACKEND` setting. This lets me switch between the Anymail backend and Django's built-in `django.core.mail.backends.console.EmailBackend` that prints emails to the terminal instead. + +If I did this email configuration with `os.environ`, it would look like: + +```python +# project/settings.py + +import os + +EMAIL_BACKEND = os.environ.get( + "EMAIL_BACKEND", + "anymail.backends.sendgrid.EmailBackend" +) + +... +``` + +I prefer to make my default settings closer to the live site context. This not only leads to safer behavior (because I have to explicitly opt-out of safer settings like switching from `DEBUG = False` to `DEBUG = True`), but it also means that my live site has less to configure. That's good because there are fewer chances to make configuration mistakes on the site that matters most: the one where my customers are. + +We need to be aware of a big gotcha with using environment variables. *Environment variables* are only available as a `str` type. This is something to be aware of because there will be times when you want a boolean settings value or some other *type* of data. In a situation where you need a different type, you have to coerce a `str` into the type you need. 
In other words, don't forget that every string except the empty string is truthy in Python: + +```python +>>> not_false = "False" +>>> bool(not_false) +True +``` + +In the next section, we will see tools that help alleviate this typing problem. + +Note: As you learn more about settings, you will probably encounter advice that says to avoid using environment variables. This is well intentioned advice that highlights that there *is* some risk with using environment variables. With this kind of advice, you may read a recommendation for secret management tools like {{< extlink "https://www.vaultproject.io/" "HashiCorp Vault" >}}. These are good tools, but consider them a more advanced topic. In my opinion, using environment variables for secret management is a reasonably low risk storage mechanism. + +## Settings Management Tools + +We can focus on two categories of tools that can help you manage your settings in Django: built-in tools and third party libraries. + +The built-in tool that is available to you is the `diffsettings` command. This tool makes it easy to see the computed settings of your module. Since settings can come from multiple files (including Django's `global_settings.py`) or environment variables, inspecting the settings output of `diffsettings` is more convenient than thinking through how a setting is set. + +By default, `diffsettings` will show a comparison of the settings module to the default Django settings. Settings that aren't in the defaults are marked with `###` after the value to indicate that they are different. + +I find that the default output is not the most useful mode. Instead, you can instruct `diffsettings` to output in a "unified" format. This format looks a lot more like a code diff. In addition, Django will colorize that output so that it's easier to see. 
Here's an example of some of the security settings by running `./manage.py diffsettings --output unified` for one of my projects: + +```diff +- SECURE_HSTS_INCLUDE_SUBDOMAINS = False ++ SECURE_HSTS_INCLUDE_SUBDOMAINS = True +- SECURE_PROXY_SSL_HEADER = None ++ SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https') +``` + +Finally, I'll note that you can actually compare two separate settings modules. Let's say you wanted to compare settings between your development mode and your live site. Assuming your settings files have names like I described earlier, you could run something like: + +```bash +$ ./manage.py diffsettings \ + --default project.settings.dev \ + --settings project.settings.production \ + --output unified +``` + +By using the `--default` flag, we instruct Django that `project.settings.dev` is the baseline for comparison. This version of the command will show where the two settings modules are different. + +Django only includes this single tool for working with settings, but I hope you can see that it's really handy. Now let's talk about a useful third party library that can help you with settings. + +{{< web >}} +Earlier in the article, +{{< /web >}} +{{< book >}} +Earlier in the chapter, +{{< /book >}} +I noted that dealing with environment variables has the pitfall of working with string data for everything. Thankfully, there is a package that can help you work with environment variables. The project is called {{< extlink "https://django-environ.readthedocs.io/en/latest/" "django-environ" >}}. +django-environ primarily does two important things that I value: + +* The package allows you to coerce strings into a desired data type. +* The package will read from a file to load environment variables into your environment. + +What does type coercion look like? 
With `django-environ`, you start with an `Env` object: + +```python +# project/settings.py + +import environ + +env = environ.Env() +``` + +The keyword arguments to `Env` describe the different environment variables that you expect the app to process. The key is the name of the environment variable. The value is a two element tuple. The first tuple element is the type you want, and the second element is a default value if the environment variable doesn't exist. + +If you want to be able to control `DEBUG` from an environment variable, the settings would be: + +```python +# project/settings.py + +import environ + +env = environ.Env( + DEBUG=(bool, False), +) + +DEBUG = env("DEBUG") +``` + +With this setup, your app will be safe by default with `DEBUG` set to `False`, but you'll be able to override that via the environment. `django-environ` works with a handful of strings that it will accept as `True` such as "on", "yes", "true", and others (see the documentation for more details). + +Once you start using environment variables, you'll want a convenient way to set them when your app runs. Manually calling `export` for all your variables before running your app is a totally unsustainable way to run apps. + +The `Env` class comes with a handy class method named `read_env`. With this method, your app can read environment variables into `os.environ` from a file. Conventionally, this file is named `.env`, and the file contains a list of key/value pairs that you want as environment variables. Following our earlier example, here's how we could set our app to be in debug mode: + +```env +# .env +DEBUG=on +``` + +Back in the settings file, you'd include `read_env`: + +```python +# project/settings.py + +import environ + +environ.Env.read_env() +env = environ.Env( + DEBUG=(bool, False), +) + +DEBUG = env("DEBUG") +``` + +If you use a `.env` file, you will occasionally find a need to put secrets into this file for testing. 
Since the file can be a source for secrets, you should add this to `.gitignore` or ignore it in whatever version control system you use. As time goes on, the list of variables and settings will likely grow, so it's also a common pattern to create a `.env.example` file that you can use as a template in case you ever need to start with a fresh clone of your repository. + +## My Preferred Settings Setup + +Now we've looked at multiple strategies and tools for managing settings. I've used many of these schemes on various Django projects, so what is my preferred setup? + +For the majority of use cases, I find that working with `django-environ` in a single file is the best pattern in my experience. + +When I use this approach, I make sure that all of my settings favor a safe default configuration. This minimizes the configuration that I have to do for a live site. + +I like the flexibility of the pattern, and I find that I can quickly set certain configurations when developing. For instance, when I want to do certain kinds of testing like checking email rendering, I'll call something like: + +```bash +$ EMAIL_TESTING=on ./manage.py runserver +``` + +My settings file has a small amount of configuration to alter the email settings to point emails to a local SMTP server tool called {{< extlink "https://github.com/mailhog/MailHog" "MailHog" >}}. Because I set an environment variable directly on my command line call, I can easily switch into a mode that sends email to MailHog for quick review. + +Overall, I like the environment variable approach, but I do use more than one settings file for one important scenario: testing. + +When I run my unit tests, I want to guarantee that certain conditions are always true. There are things that a test suite should never do in the vast majority of cases. Sending real emails is a good example. If I happen to configure my `.env` to test real emails for the local environment, I don't want my tests to send out an email accidentally. 
+ +Thus, I create a separate testing settings file and configure my test runner (pytest) to use those settings. This settings file *does* mostly use the base environment, but I'll override some settings with explicit values. Here's how I protect myself from accidental live emails: + +```python +# project/testing_settings.py + +from .settings import * + +# Make sure that tests are never sending real emails. +EMAIL_BACKEND = "django.core.mail.backends.locmem.EmailBackend" +``` + +Even though my `Env` will look for an `EMAIL_BACKEND` environment variable to configure that setting dynamically, the testing setting is hardcoded to make email sending accidents impossible. + +The combination of a single file for most settings supplemented with a testing settings file for safety is the approach that has worked the best for me. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +you learned about Django settings and how to manage the configuration of your application. + +We covered: + +* How Django is configured +* Patterns for working with settings in your projects +* Tools that help you observe and manage settings + +{{< web >}} +In the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we will look at how to handle files and media provided by users (e.g., profile pictures). + +You'll learn about: + +* How Django models maintain references to files +* How the files are managed in Django +* Packages that can store files in various cloud services + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. 
+{{< /web >}} +  diff --git a/content/understand-django/2021-09-14-media-files.pt.md b/content/understand-django/2021-09-14-media-files.pt.md new file mode 100644 index 0000000..c3c5d85 --- /dev/null +++ b/content/understand-django/2021-09-14-media-files.pt.md @@ -0,0 +1,300 @@ +--- +title: "User File Use" +description: >- + Maybe your app needs to handle files from users like profile pictures. Accepting files from others is tricky to do safely. In this article, we'll see the tools that Django provides to manage files safely. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - files +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +you learned about Django settings and how to manage the configuration of your application. We also looked at tools to help you define settings effectively. + +{{< web >}} +With this article, +{{< /web >}} +{{< book >}} +With this chapter, +{{< /book >}} +we're going to dig into file management. Unlike the static files that you create for the app yourself, you may want your app to accept files from your users. Profile pictures are a good example of user files. You'll see how Django handles those kinds of files and how to deal with them safely. + +{{< understand-django-series-pt "files" >}} + +## Files In Django Models + +{{< web >}} +As we saw in the models article, +{{< /web >}} +{{< book >}} +As we saw in the models chapter, +{{< /book >}} +model fields in a Django model map to a column in a database table. When you want to access the *data* for a model instance, Django will pull the data from a database row. + +Dealing with files in models is a bit different. While it *is* possible to store file data directly in a database, you won't see that happen often. 
The reason is that storing the data in the database usually affects the performance of the database, especially with a large number of files. + +Instead, a common pattern in database usage is to store files separately from the database itself. Within the database, a column would store some kind of *reference* to the stored file like a path if files are stored on a filesystem. This is the approach that Django takes with files. + +Now that you know that Django takes this approach, you can remember: + +1. Django models hold the *reference* to a file (e.g., a file path) +2. The file *data* (i.e., the file itself) is stored somewhere else. + +The "somewhere else" is called the "file storage," and we'll discuss storage in more depth in the next section. + +Let's focus on the first item. What do you use to reference the files? Like all other model data, we'll use a field! Django includes two fields that help with file management: + +* `FileField` +* `ImageField` + +### `FileField` + +What if you want to store a profile picture? You might do something like this: + +```python +# application/models.py + +from django.db import models + +class Profile(models.Model): + picture = models.FileField() + # Other fields like a OneToOneField to User ... +``` + +This is the most basic version of using file fields. We can use this model very directly with a Django shell to illustrate file management: + +```python +$ ./manage.py shell +>>> from django.core.files import File +>>> from application.models import Profile +>>> # Open in binary mode because an image is not text. +>>> f = open('/Users/matt/path/to/image.png', 'rb') +>>> profile = Profile() +>>> profile.picture.save( +... 'my-image.png', +... File(f) +... ) +``` + +In this example, I'm creating a profile instance manually. There are a few interesting notes: + +* The `File` class is an important wrapper that Django uses to make Python file objects (i.e., the value returned from `open`) work with the storage system. +* The names `image.png` and `my-image.png` do not have to match.
Django can store the content of `image.png` and use `my-image.png` as the name to reference within the storage system. +* Saving the picture will automatically save the parent model instance by default. + +More often than not, you won't need to use these interfaces directly because Django has form fields and other tools that manage much of this for you. + +The current model example raises questions. + +* Where does that data go? +* What if we have a name conflict between two files like "`my-image.png`"? +* What happens if we try to save something that isn't an image? + +If we make no changes to the current setup, the data will go into the root of the media file storage. Media file storage is a topic that will be covered later. For the moment, recognize that putting all the files into a single place (i.e., the root) will be a mess. This mess will be pronounced if you're trying to track many file fields, but we can fix this with the `upload_to` field keyword argument. The simplest version of `upload_to` can take a string that the storage logic will use as a directory prefix to scope content into a different area. + +We're still left with potentially conflicting filenames. Thankfully, `upload_to` can also accept a callable that gives us a chance to fix that issue. Let's rework the example: + +```python +# application/models.py + +import uuid +from pathlib import Path +from django.db import models + +def profile_pic_path( + instance, + filename + ): + path = Path(filename) + return "profile_pics/{}{}".format( + uuid.uuid4(), + path.suffix + ) + +class Profile(models.Model): + picture = models.FileField( + upload_to=profile_pic_path + ) + # Other fields like a OneToOneKey to User ... +``` + +With this new version of the profile model, all of the images will be stored in a `profile_pics` path within the file storage. + +This version also solves the duplicate filename problem. `profile_pic_path` ignores most of the original filename provided. 
If two users both happen to upload `profile-pic.jpg`, `profile_pic_path` will assign those images random IDs and ignore the `profile-pic` part of the filename. + +You can see that the function calls `uuid4()`. These are effectively random IDs called {{< extlink "https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)" "Universally Unique Identifiers (UUID)" >}}. UUIDs are likely something that you've seen before if you've worked with computers long enough, even if you didn't know their name. An example UUID would be `76ee4ae4-8659-4b50-a04f-e222df9a656a`. In the storage area, you might find a file stored as: + +```text +profile_pics/76ee4ae4-8659-4b50-a04f-e222df9a656a.jpg +``` + +Each call to `uuid4()` is nearly certain to generate a unique value. Because of this feature, we can avoid filename conflicts by storing profile pictures with a unique name. As an aside, UUIDs are not very friendly for users, so if you plan to let your users download these files, you might wish to explore alternative naming techniques. + +There's one more problem to fix in this example. How do we know that a user provided a valid image file? This is important to check, because we want to avoid storing malicious files that bad actors might upload to our apps. + +This is where the `ImageField` has value. This field type contains extra validation logic that can check the *content* of the file to check that the file is, in fact, an image. To use `ImageField`, you'll need to install the {{< extlink "https://pillow.readthedocs.io/en/latest/" "Pillow" >}} library. Pillow is a package that lets Python work with image data. 
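To make "checking the *content*" concrete, here is a deliberately tiny sketch that looks at the first bytes of a file instead of trusting its name. The helper name `looks_like_png` is hypothetical, and this check is far less thorough than what Pillow does. It only illustrates the idea that a file's name and a file's content are separate things.

```python
# A toy content check: every PNG file begins with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True when the bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# A shell script that someone might upload under the name "cat.png".
fake_upload = b"#!/bin/sh\necho not an image\n"
# The first bytes of a genuine PNG file (signature plus start of a header chunk).
real_header = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

print(looks_like_png(fake_upload))
print(looks_like_png(real_header))
```

A signature check like this is easy to fool, which is a good reason to lean on a dedicated library rather than rolling your own validation.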
+ +Our final example looks like: + +```python +# application/models.py + +import uuid +from pathlib import Path +from django.db import models + +def profile_pic_path( + instance, + filename + ): + path = Path(filename) + return "profile_pics/{}{}".format( + uuid.uuid4(), + path.suffix + ) + +class Profile(models.Model): + picture = models.ImageField( + upload_to=profile_pic_path + ) + # Other fields like a OneToOneKey to User ... +``` + +Now that we've seen how Django will track files and images in your models, let's go deeper and try to understand the file storage features. + +## Files Under The Hood + +We now know that models store references to files and not the files themselves. The file storage task is delegated to a special Python class in the system. + +This Python class must implement {{< extlink "https://docs.djangoproject.com/en/4.1/ref/files/storage/" "a specific API" >}}. Why? Like so many other parts of Django, the storage class can be swapped out for a different class. We've seen this swappable pattern already with templates, databases, authentication, static files, and sessions. + +The setting to control which type of file storage Django uses is `DEFAULT_FILE_STORAGE`. This setting is a Python module path string to the specific class. + +So, what's the default? The default is a storage class that will store files locally on the server that runs the app. This is found at `django.core.files.storage.FileSystemStorage`. The storage class uses a couple of important settings: `MEDIA_ROOT` and `MEDIA_URL`. + +The `MEDIA_ROOT` setting defines where Django should look for files in the filesystem: + +```python +MEDIA_ROOT = BASE_DIR / "media" +``` + +On my computer, with the above setting and the `Profile` class example from earlier, Django would store a file somewhere like: + +```text +# This path is split to be easier to read. 
+/Users/matt/example-app/ \ + media/profile_pics/ \ + 76ee4ae4-8659-4b50-a04f-e222df9a656a.jpg +``` + +The other setting important to `FileSystemStorage` is `MEDIA_URL`. This setting will determine how files are accessed by browsers when Django is running. Let's say `MEDIA_URL` is: + +```python +MEDIA_URL = "/media/" +``` + +Our profile picture would have a URL like: + +```python +>>> from application.models import Profile +>>> profile = Profile.objects.last() +>>> profile.picture.url +'/media/profile_pics/76ee4ae4-8659-4b50-a04f-e222df9a656a.jpg' +``` + +This is the path that we can reference in templates. An image tag template fragment would look like: + +{{< web >}} +```django +<img src="{{ profile.picture.url }}"> +``` +{{< /web >}} +{{< book >}} +```djangotemplate +<img src="{{ profile.picture.url }}"> +``` +{{< /book >}} + +The Django documentation shows how file storage is a specific interface. `FileSystemStorage` happens to be included with Django and implements this interface for the simplest storage mechanism, the file system of your server's operating system. + +We can also store files separately from the web server, and there are often really good reasons to do that. Up next, we'll look at another option for file storage aside from the provided default. + +## Recommended Package + +What is a problem that can arise if you use the built-in `FileSystemStorage` to store files for your application? There are actually many possible problems! Here are a few: + +* The web server can have too many files and run out of disk space. +* Users may upload malicious files to attempt to gain control of your server. +* Users can upload large files that can cause a Denial of Service (DoS) attack and make your site inaccessible. + +If you conclude that `FileSystemStorage` will not work for your app, is there another good option? Absolutely! + +The most popular storage package to reach for is {{< extlink "https://django-storages.readthedocs.io/en/latest/" "django-storages" >}}.
django-storages includes a set of storage classes that can connect to a variety of cloud services. These cloud services are able to store an arbitrary number of files. With django-storages, your application can connect to services like: + +* Amazon Simple Storage Service (S3) +* Google Cloud Storage +* Digital Ocean Spaces +* Services you run separately like an SFTP server + +These services would have additional cost beyond the cost of running your web server in the cloud, but the services usually have shockingly low rates and some offer a generous free tier for lower levels of data storage. + +Why use django-storages? + +* You will never need to worry about disk space. The cloud services offer effectively unlimited storage space if you're willing to pay for it. +* The files will be separated from your Django web server. This can eliminate some categories of security problems like a malicious file trying to execute arbitrary code on the web server. +* Cloud storage can offer some caching benefits and be easily connected to Content Delivery Networks to optimize how files are served to your app's users. + +As with all software choices, we have tradeoffs to consider when using different storage classes. On its face, django-storages seems to be nearly all positives. The benefits come with some setup complexity cost. + +For instance, I like to use Amazon S3 for file storage. You can see from the {{< extlink "https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html" "Amazon S3 setup" >}} documentation that there is a fair amount of work to do beyond setting a different `DEFAULT_FILE_STORAGE` class. This setup includes setting AWS private keys, access controls, regions, buckets, and a handful of other important settings. + +While the setup cost exists, you'll usually pay that cost at the beginning of a project and be mostly hands off after that. 
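For a sense of scale, a minimal sketch of S3 settings might look like the following. The setting names come from the django-storages S3 backend, but the environment variable names, the fallback bucket name, and the fallback region are placeholders made up for illustration. A real deployment will likely need more than this (access controls in particular), so treat it as a starting point rather than a complete configuration.

```python
# project/settings.py (sketch)

import os

# Swap Django's default FileSystemStorage for the S3 backend.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Credentials come from the environment so they stay out of source control.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")

# Placeholder fallbacks; use your own bucket and region.
AWS_STORAGE_BUCKET_NAME = os.environ.get(
    "AWS_STORAGE_BUCKET_NAME", "example-bucket"
)
AWS_S3_REGION_NAME = os.environ.get("AWS_S3_REGION_NAME", "us-east-1")
```

Newer Django versions configure storage backends through the `STORAGES` setting instead of `DEFAULT_FILE_STORAGE`, so check the documentation for the Django version that you run.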
+ +django-storages is a pretty fantastic package, so if your project has a lot of files to manage, you should definitely consider using it as an alternative to the `FileSystemStorage`. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +you learned about Django file management. + +We covered: + +* How Django models maintain references to files +* How the files are managed in Django +* A Python package that can store files in various cloud services + +{{< web >}} +In the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +let's explore commands. Commands are the code that you can run with `./manage.py`. + +You'll learn about: + +* Built-in commands provided by Django +* How to build custom commands +* Extra commands from the community that are useful extensions for apps + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2021-11-04-command-apps.pt.md b/content/understand-django/2021-11-04-command-apps.pt.md new file mode 100644 index 0000000..07e7faf --- /dev/null +++ b/content/understand-django/2021-11-04-command-apps.pt.md @@ -0,0 +1,411 @@ +--- +title: "Command Your App" +description: >- + With this Understand Django article, you'll learn about commands. Commands are the way to execute scripts that interact with your Django app. We'll see built-in commands and how to build your own commands. 
+image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - commands +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we dug into file management. We saw how Django handles user-uploaded files and how to deal with them safely. + +{{< web >}} +With this article, +{{< /web >}} +{{< book >}} +With this chapter, +{{< /book >}} +you'll learn about commands. Commands are the way to execute scripts that interact with your Django app. We'll see built-in commands and how to build your own. + +{{< understand-django-series-pt "commands" >}} + +## Why Commands? + +Django makes it possible to run code from a terminal with `./manage.py`, but why is this helpful or needed? Consider this script: + +```python +# product_stat.py + +from application.models import Product + +print(Product.objects.count()) +``` + +which you could try running with: + +```bash +$ python product_stat.py +``` + +The problem with this script is that Django is not ready to run yet. If you tried to run this kind of code, you would get an `ImproperlyConfigured` exception. There are a couple of modifications that you could make to get the script to run: + +* Call `django.setup()`. +* Specify the `DJANGO_SETTINGS_MODULE`. + +```python +# product_stat.py +import django + +django.setup() + +from application.models import Product + +print(Product.objects.count()) +``` + +Note that `django.setup()` must be before your Django-related imports (like the `Product` model in this example). Now the script can run if you supply where the settings are located too: + +```bash +$ DJANGO_SETTINGS_MODULE=project.settings python product_stat.py +``` + +This arrangement is less than ideal, but why else might we want a way to run commands through Django? + +Try running `./manage.py -h`.
+ +What you'll likely see is more commands than what Django provides alone. This is where we begin to see more value from the command system. Because Django provides a standard way to run scripts, other Django applications can bundle useful commands and make them easily accessible to you. + +Now that you've had a chance to see why commands exist to run scripts for Django apps, let's back up and see *what* commands are. + +## We Hereby Command + +Django gives us a tool to run commands before we've even started our project. That tool is the `django-admin` script. We saw it all the way back +{{< web >}} +in the first article +{{< /web >}} +{{< book >}} +in the first chapter +{{< /book >}} +where I provided a short set of setup instructions to get you started if you've never used Django before. + +After you've started a project, your code will have a `manage.py` file, +{{< web >}} +and the commands you've seen in most articles are in the form of: +{{< /web >}} +{{< book >}} +and the commands you've seen in most chapters are in the form of: +{{< /book >}} + +```bash +$ ./manage.py some_command +``` + +What's the difference between `django-admin` and `manage.py`? In truth, **not much!** + +`django-admin` comes from Django's Python packaging. In Python packages, package developers can create scripts by defining an entry point in the {{< extlink "https://github.com/django/django/blob/4.1/setup.cfg" "packaging configuration" >}}. 
In Django, this configuration looks like: + +```ini +[options.entry_points] +console_scripts = + django-admin = django.core.management:execute_from_command_line +``` + +Meanwhile, the entire `manage.py` of a Django project looks like: + +```python +#!/usr/bin/env python +"""Django's command-line utility for administrative tasks.""" +import os +import sys + + +def main(): + os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings') + try: + from django.core.management import execute_from_command_line + except ImportError as exc: + raise ImportError( + "Couldn't import Django. Are you sure it's installed and " + "available on your PYTHONPATH environment variable? Did you " + "forget to activate a virtual environment?" + ) from exc + execute_from_command_line(sys.argv) + + +if __name__ == '__main__': + main() +``` + +If you look closely, you can see that the different scripts are both ways to invoke the `execute_from_command_line` function to run a command. The primary difference is that the latter script will attempt to set the `DJANGO_SETTINGS_MODULE` environment variable automatically. Since Django needs to have `DJANGO_SETTINGS_MODULE` defined for most commands (note: `startproject` does *not* require that variable), `manage.py` is a more convenient way to run commands. + +`execute_from_command_line` is able to present what commands are available to a project, whether a command comes from Django itself, an installed app, or is a custom command that you created yourself. How are the commands discovered? *The command system does discovery by following some packaging conventions.* + +Let's say your project has an app named `application`. Django can find the command if you have the following packaging structure: + +```text +application +├── __init__.py +├── management +│   ├── __init__.py +│   └── commands +│       ├── __init__.py +│       └── custom_command.py +├── models.py +└── views.py +...
Other typical Django app files +``` + +With this structure, you could run: + +```bash +$ ./manage.py custom_command +``` + +Notes: + +* Django will create a command for each module found in an app's `management/commands/` directory. The module's name (without the `.py` extension) becomes the command name. +* Don't forget the `__init__.py` files! Django can only discover the commands if `management` and `commands` are proper Python package directories. +* The example uses `custom_command`, but you can name your command with any valid Python module name that you want. + +Unfortunately, we can't slap some Python code into `custom_command.py` and assume that Django will know how to run it. Within the `custom_command.py` module, Django needs to find a `Command` class that subclasses a `BaseCommand` class that is provided by the framework. Django requires this structure to give command authors a consistent way to access features of the command system. + +With the `Command` class, you can add a `help` class attribute. Adding help can give users a description of what your command does when running `./manage.py custom_command -h`. + +The `Command` class will also help you with handling arguments. If your command needs to work with user input, you'll need to parse that data. Thankfully, the class integrates with Python's built-in `argparse` module. By including an `add_arguments` method, a command can parse the data and pass the results to the command's handler method in a structured way. If you've had to write Python scripts before, then you may understand how much time this kind of parsing can save you (and for those who haven't, the answer is "a lot of time!"). + +Other smaller features exist within the `Command` class too. Perhaps you only want your command to run if your project has satisfied certain preconditions. Commands can use the `requires_migration_checks` or `requires_system_checks` attributes to ensure that the system is in the correct state before running.
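Because `add_arguments` receives an `argparse`-style parser, you can get a feel for argument handling without Django at all. In this sketch, the function body is the kind of code that you might put in a command's `add_arguments` method. The argument names (`account_ids` and `--dry-run`) are invented for illustration, and a plain `ArgumentParser` stands in for the parser that Django would supply.

```python
import argparse


def add_arguments(parser):
    """Mirror of what a Command.add_arguments method body could contain."""
    # One or more integer IDs, collected into a list.
    parser.add_argument("account_ids", nargs="+", type=int)
    # An optional flag that defaults to False.
    parser.add_argument("--dry-run", action="store_true")


# A plain parser stands in for the one Django hands to add_arguments.
parser = argparse.ArgumentParser(prog="manage.py custom_command")
add_arguments(parser)

# Equivalent to running: ./manage.py custom_command 3 7 --dry-run
options = vars(parser.parse_args(["3", "7", "--dry-run"]))
print(options)
```

In a real command, Django passes the parsed values to `handle` as keyword options, so the handler could read `options["account_ids"]` and `options["dry_run"]` instead of digging through raw strings.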
+ +I hope it's clear that the goal of the `Command` class is to help you with common actions that many commands will need to use. There is a small API to learn, but the system is a boon to making scripts quickly. + +## Command By Example + +Let's consider a powerful use case to see a command in action. When you initially start a Django app, all of your app's interaction will probably be through web pages. After all, you were trying to use Django to make a web app, right? What do you do when you need to do something that doesn't involve a browser? + +This kind of work for your app is often considered *background* work. Background work is a pretty deep topic and will often involve special background task software like {{< extlink "https://docs.celeryproject.org/en/stable/getting-started/introduction.html" "Celery" >}}. When your app is at an early stage, Celery or similar software can be overkill and far more than you need. + +A simpler alternative for some background tasks could be a command paired with a scheduling tool like {{< extlink "https://en.wikipedia.org/wiki/Cron" "cron" >}}. + +On one of my projects, I offer free trials for accounts. After 60 days, the free trial ends and users either need to pay for the service or discontinue using it. By using a command and pairing it with the {{< extlink "https://devcenter.heroku.com/articles/scheduler" "Heroku Scheduler" >}}, I can move accounts from their trial status to expired with a daily check. + +The following code is very close to what my `expire_trials` command looks like in my app. 
I've simplified things a bit, so that you can ignore the details that are specific to my service: + +```python +# application/management/commands/expire_trials.py + +import datetime + +from django.core.management.base import BaseCommand +from django.utils import timezone + +from application.models import Account + + +class Command(BaseCommand): + help = "Expire any accounts that are TRIALING beyond the trial days limit" + + def handle(self, *args, **options): + self.stdout.write( + "Search for old trial accounts..." + ) + # Give an extra day to be gracious and avoid customer complaints. + cutoff_days = 61 + trial_cutoff = timezone.now() - datetime.timedelta(days=cutoff_days) + expired_trials = Account.objects.filter( + status=Account.TRIALING, created__lt=trial_cutoff + ) + count = expired_trials.update( + status=Account.TRIAL_EXPIRED + ) + self.stdout.write( + f"Expired {count} trial(s)" + ) +``` + +I configured the scheduler to run `python manage.py expire_trials` every day in the early morning. The command checks the current time and looks for `Account` records in the trialing state that were created before the cutoff time. From that query set, the affected accounts are set to the expired account state. + +How can you test this command? There are a couple of approaches you can take when testing a command. + +If you need to simulate calling the command with command line arguments, then you can use `call_command` from `django.core.management`. Since the example command doesn't require arguments, I didn't take that approach. + +Generally, my preference is to create a command object and invoke the `handle` method directly. In my example above, you can see that the command uses `self.stdout` instead of calling `print`. Django does this so that you could check your output if desired. 
+ +Here is a test for this command: + +```python +# application/tests/test_commands.py + +import datetime +from io import StringIO + +from django.utils import timezone + +from application.management.commands.expire_trials import ( + Command +) +from application.models import Account +from application.tests.factories import AccountFactory + +def test_expires_trials(): + """Old trials are marked as expired.""" + stdout = StringIO() + account = AccountFactory( + status=Account.TRIALING, + created=timezone.now() - datetime.timedelta(days=65), + ) + command = Command(stdout=stdout) + + command.handle() + + account.refresh_from_db() + assert account.status == Account.TRIAL_EXPIRED + assert "Expired 1 trial(s)" in stdout.getvalue() +``` + +In this test, I constructed a command instance and checked the account state after the command invocation. Also, observe that the `StringIO` instance is injected into the `Command` constructor. By building the command this way, checking the output becomes a very achievable task via the `getvalue` method. + +Overall, this scheme of making a command and running it on a schedule avoids all the work of setting up a background worker process. I've been extremely satisfied with how this technique has worked for me, and I think it's a great pattern when your app doesn't have to do a lot of complex background processing. + +## Useful Commands + +Django is full of {{< extlink "https://docs.djangoproject.com/en/4.1/ref/django-admin/" "useful commands" >}} that you can use for all kinds of purposes. +{{< web >}} +Thus far in this series, +{{< /web >}} +{{< book >}} +Thus far in this book, +{{< /book >}} +we've discussed a bunch of them, including: + +* `check` - Checks that your project is in good shape. +* `collectstatic` - Collects static files into a single directory. +* `createsuperuser` - Creates a super user record. +* `makemigrations` - Makes new migration files based on model changes. +* `migrate` - Runs any unapplied migrations to your database.
+* `runserver` - Runs a development web server to check your app. +* `shell` - Starts a Django shell that allows you to use Django code on the command line. +* `startapp` - Makes a new Django app from a template. +* `startproject` - Makes a new Django project from a template. +* `test` - Executes tests that check the validity of your app. + +Here is a sampling of other commands that I find useful when working with Django projects. + +### `dbshell` + +The `dbshell` command starts a different kind of shell. The shell is a database program that will connect to the same database that your Django app uses. This shell will vary based on your choice of database. + +For instance, when using PostgreSQL, `./manage.py dbshell` +will start `{{< extlink "https://www.postgresql.org/docs/current/app-psql.html" "psql" >}}`. +From this shell, you can execute SQL statements directly to inspect the state of your database. I don't reach for this command often, but I find it very useful to connect to my database without having to remember database credentials. + +### `showmigrations` + +The `showmigrations` command has a simple job. The command shows all the migrations for each Django app in your project. Next to each migration is an indicator of whether the migration is applied to your database. + +Here is an example of the `users` app from one of my Django projects: + +```bash +$ ./manage.py showmigrations users +users + [X] 0001_initial + [X] 0002_first_name_to_150_max + [ ] 0003_profile +``` + +In my real project, I've applied all the migrations, but for this example, I'm showing the third migration as it would appear if the migration wasn't applied yet. + +`showmigrations` is a good way to show the state of your database from Django's point of view. + +### `sqlmigrate` + +The `sqlmigrate` command is *very* handy. The command will show you what SQL statements Django would run for an individual migration file. + +Let's see an example. 
In Django 3.1, the team changed the `AbstractUser` model so that the `first_name` field could have a maximum length of 150 characters. Anyone using the `AbstractUser` model (which includes me) had to generate a migration to apply that change. + +From my `showmigrations` output above, you can see that the second migration of my `users` app applied this particular framework change. + +To see the Postgres SQL statements that made the change, I can run: + +```bash +$ ./manage.py sqlmigrate users 0002 +BEGIN; +-- +-- Alter field first_name on user +-- +ALTER TABLE "users_user" ALTER COLUMN "first_name" TYPE varchar(150); +COMMIT; +``` + +From this, we can tell that Postgres executed an `ALTER COLUMN` {{< extlink "https://en.wikipedia.org/wiki/Data_definition_language" "DDL" >}} statement to modify the length of the `first_name` field. + +### `squashmigrations` + +Django migrations are a stack of separate database changes that produce a final desired schema state in your database. Over time, your Django apps will accumulate migration files, but those files have a shelf life. The `squashmigrations` command is designed to let you tidy up an app's set of migration files. + +By running `squashmigrations`, you can condense an app's migrations into a significantly smaller number. The reduced migrations can accurately represent your database schema, and make it easier to reason about what changes happened in the app's history. As a side benefit, migration squashing can make Django's migration handling faster because Django gets to process fewer files. + +## Even More Useful Commands + +The commands above come with the standard Django install. Adding in third-party libraries gives you access to even more cool stuff to help with your project development! + +A package that I often reach for with my Django projects is the {{< extlink "https://django-extensions.readthedocs.io/en/latest/index.html" "django-extensions" >}} package. 
This package is full of goodies, including some great optional commands that you can use! + +A couple of my favorites include: + +### `shell_plus` + +How often do you fire up a Django shell, import a model, then do some ORM queries to see the current state of the database? This is something I do *quite* often. + +The `shell_plus` command is like the regular shell, but the command will import all your models *automatically*. For the five extra characters of `_plus`, you can save your fingers a lot of typing to import your models and get directly to whatever you needed the shell for. + +The command will also import some commonly used Django functions and features like `reverse`, `settings`, `timezone`, and more. + +Also, if you have installed a separate REPL like {{< extlink "https://ipython.org/" "IPython" >}}, `shell_plus` will attempt to use the alternate REPL instead of the default version that comes with Python. + +### `graph_models` + +When I'm live streaming my side projects on {{< extlink "https://www.youtube.com/c/MattLayman" "my YouTube channel" >}}, I will often want to show the model relationships of my Django project. With the `graph_models` command, I can create an image of all my models and how those models relate to each other (using UML syntax). This is a great way to: + +* Remind myself of the data modeling choices in my apps. +* Orient others to what I'm doing with my project. + +This particular command requires some extra setup to install the right tools to create images, but the setup is manageable and the results are worth it. + +Aside from `shell_plus` and `graph_models`, there are 20 other commands that you can use that may be very useful to you. You should definitely check out django-extensions. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +you saw Django commands.
+ +We covered: + +* Why commands exist in the Django framework +* How commands work +* How to create your own custom command and how to test it +* Useful commands from the core framework and the django-extensions package + +{{< web >}} +In the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we're going to look into performance. + +You'll learn about: + +* How Django sites get slow +* Ways to optimize your database queries +* How to apply caching to save processing + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2022-01-19-go-fast.pt.md b/content/understand-django/2022-01-19-go-fast.pt.md new file mode 100644 index 0000000..0077d3f --- /dev/null +++ b/content/understand-django/2022-01-19-go-fast.pt.md @@ -0,0 +1,453 @@ +--- +title: "Go Fast With Django" +description: >- + How do you make your Django app fast? You measure what is slow, scale your system when necessary, and use a combination of fast database queries and strategic caching. In this Understand Django article, we'll explore those topics and more to help you get a performant Django app. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - performance + - caching +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we learned about commands. Commands are the way to execute scripts that interact with your Django app. + +{{< web >}} +With this article, +{{< /web >}} +{{< book >}} +With this chapter, +{{< /book >}} +we're going to dig into performance. How do you make your Django site faster? 
Keep reading to find out. + +{{< understand-django-series-pt "performance" >}} + +## Theory Of Performance + +{{< web >}} +>There are two ways to make a website faster: +> +> * Do **more** work. +> * Do **less** work. +{{< /web >}} +{{< book >}} +There are two ways to make a website faster: + + * Do **more** work. + * Do **less** work. +{{< /book >}} + +How we do more work *and* less work depends on the type of work that we're trying to address on the site. + +When we're talking about doing more work, what I'm really saying is that we often need to increase the ***throughput*** of a site. Throughput is a measure of work over time. By increasing throughput, a site can serve more users concurrently. + +A very natural throughput measure on a Django site is *requests per second*. A page view on a site could contain multiple requests, so requests per second isn't a perfect analog to how many users your site can handle, but it's still a useful measure to help you reason about site performance. + +On the flip side of doing more work, what does it mean to do less work to improve performance? + +> The fastest code is no code. + +Every line of code that you write must be processed by a computer when it runs. If your code is inefficient or if too much is running, that will naturally mean that it will take longer to produce a final result. + +The time from when an input happens to when an output is received is called ***latency***. If a user clicks on a link on your site, how long does it take for them to get a response? That time delay is latency. + +Less work doesn't just mean that you're running too much code! There are a lot of factors that might contribute to latency, some of them are easier to optimize than others. 
+ +* Inefficient code - mistakes in development can make a computer slower than necessary +* Data size - sending more data requires more effort to deliver to users +* Geographic location - the speed of light is a real limit on network communication +* and more! + +If you can reduce latency on your site, you can improve the experience of the people using the site. + +{{< web >}} +In the rest of this article, +{{< /web >}} +{{< book >}} +In the rest of this chapter, +{{< /book >}} +you'll learn how you can do both more and less work to make a better site that will benefit your users. + +## Measure First + +Before optimizing, we have to recognize what kind of work is impacting an app. In other words, *what is the resource constraint that is preventing an app from performing better?* + +Measuring applications, especially those that are running with live traffic, can be a tricky endeavor. I think we can look at an app from a zoomed out macro level or a zoomed in point of view. + +I would start my analysis from inspecting the system overall for patterns. Broadly, you will find that performance tends to fall into a couple of major bottleneck categories: + +* I/O bound +* CPU bound + +*I/O bound* means that the system is limited (that's the "bound" part) by the inputs and outputs of the system. Since that's still a really vague statement, let's make it more concrete. An I/O bound system is one that is waiting for work to be available. Classic examples include: + +* waiting for responses from a database +* waiting for content from a file system +* waiting for data to transfer over a network +* and so on + +Optimizing an I/O bound system is all about minimizing those waiting moments. + +Conversely, *CPU bound* systems are systems that are drowning in immediate work to calculate. The computer's **C**entral **P**rocessing **U**nit can't keep up with all that it's being asked to do. 
Classic examples of CPU bound work are: + +* Data science computations for machine learning +* Image processing and rendering +* Test suite execution + +Optimizing a CPU bound system focuses heavily on making calculations faster and cheaper. + +Since we understand that an application that is underperforming is likely to be I/O bound or CPU bound, we can start to look for these patterns within the system. The easiest initial signal to observe is the CPU load. If the processors on your production machines are running at very high CPU utilization, then it's a pretty clear indicator that your app is CPU bound. In my estimation, you'll rarely see this for web applications. **Most underperforming web applications are likely to be I/O bound.** + +The reason that web apps are often I/O bound has to do with their typical function. Most apps are fetching data from a database and displaying it to a user. There isn't a massive amount of computation (comparatively to something like machine learning) that the app needs to do. Thus, your app is probably waiting around for data from the database. + +If you observe that the system is not CPU bound, then the next action is to dig deeper and find where the application is spending its time waiting. But how can we do this? Philosophically, you should now have an ok understanding of what to be looking for, but what tools can you use to accomplish the task of measurement? + +We need to rewind a bit. A moment ago, I also made an assumption that you know how to find the CPU load of your application. That may not be the case. Let's look at some tools that help you categorize where your app's resource bottleneck is. + +The easiest way to monitor basic resource information about your app including CPU, memory, disk usage, and more may come from your hosting vendor. My preferred hosting vendor, Heroku, displays all these kinds of metrics on a single page so I can assess system performance at a glance. 
Other vendors like Digital Ocean or AWS provide tools to help you see this information too. + +If your hosting vendor doesn't provide these tools, then you'll have to use other techniques. Presumably, if you're using a Virtual Private Server (VPS) for hosting, you have access to the server itself via ssh. On the server that's running your application, you can use a program like `top`. This is a classic program for checking which processes are using the "top" amount of resources. `top` will show the list of processes ordered by what is consuming the most CPU and will refresh the order of the list every second to provide a current snapshot in time. (**Tip**: use `q` to quit `top` after you start it.) + +While `top` is useful and gets the job done to learn about CPU usage, it's not exactly the friendliest tool out there. There are alternatives to `top` that may offer a better user experience. I personally find `top` sufficient, but I know `htop` is a popular alternative. + +If you don't have tools from your hosting provider and don't want to use ssh to log into a server, there are other options to consider. Broadly, this other category of tools is called Application Performance Monitoring (APM). APM tools are vendors that will monitor your application (go figure!) if you install the tool along with your app. These tools help show *both* CPU problems and I/O issues. Application performance is very important to businesses, so the software industry is full of vendors to choose from with a wide range of features. + +To see what these tools can be like for free, you might want to check out {{< extlink "https://www.datadoghq.com/" "Datadog" >}} which has a free tier (Datadog is not a sponsor, I've just used their service, enjoyed it, and know that it's free for a small number of servers). Other popular vendors include {{< extlink "https://scoutapm.com/" "Scout APM" >}} and {{< extlink "https://newrelic.com/" "New Relic" >}}. 
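For a rough feel of the I/O bound versus CPU bound distinction without any monitoring service, you can compare CPU time against wall-clock time for a piece of work. The sketch below uses only the Python standard library. The `cpu_fraction` helper and the two sample workloads are names I made up for illustration, and `time.sleep` merely stands in for waiting on a database or network.

```python
import time


def cpu_fraction(func):
    """Return the share of elapsed wall-clock time that func spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def wait_on_io():
    # Sleeping stands in for waiting on a database or network response.
    time.sleep(0.2)


def crunch_numbers():
    # A tight arithmetic loop stands in for CPU-heavy work.
    sum(i * i for i in range(1_000_000))


io_share = cpu_fraction(wait_on_io)
cpu_share = cpu_fraction(crunch_numbers)
print(f"waiting: {io_share:.0%} CPU, crunching: {cpu_share:.0%} CPU")
```

A low share suggests the code mostly waits (I/O bound), while a share near 100% suggests the processor is the limit (CPU bound). The exact numbers will vary by machine, so treat this as a teaching aid and not a replacement for real monitoring.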
+ +Finally, we've reached a point where you can diagnose the performance constraints on your application using a wide variety of tools or services. Let's see how to fix problems you may be experiencing! + +## Do More + +We can address throughput and do more by tuning a few different knobs. + +When thinking about doing more, try to think about the system in two different scaling dimensions: + +* Horizontally +* Vertically + +Horizontal and vertical scaling are methods of describing *how* to do more in software systems. + +Let's relate this to a silly example to give you a good intuitive feel for scaling. Imagine that you need to move large bags of dirt (approximately 40 lbs / 18 kg per bag) to plant a huge garden. The job is to unload the hundreds of bags from a delivery truck to your imaginary back yard. You enlist the help of your friends to get the job done. + +One strategy is to get your *strongest* friends to help. Maybe there aren't as many of them, but their strength can make quick work of moving the bags. This is *vertical scaling*. The additional power of your friends allows them to move the bags more easily than someone with an average or weaker build. + +Another strategy is to get *lots* of friends to help. Maybe these friends can't move as many bags as the stronger ones, but many hands make light work. This is *horizontal scaling*. The increased number of people allows the group to move more bags because more individuals can do the work simultaneously. + +We can apply this same thinking to computer systems. + +### Vertical Scaling + +To achieve vertical scaling, you would run your application on a more powerful computer. Cloud vendors give you all kinds of tools to do this. Picking a faster computer is naturally going to cost you more, so vendors make many options available (check out {{< extlink "https://aws.amazon.com/ec2/instance-types/" "this page from AWS" >}} to see the dizzying array of options). + +When should you think about vertical scaling? 
One natural case is when your application is CPU bound. If the processor is struggling to process the requests from an application, a faster processor may help. With a higher clock speed from a faster individual CPU, a computer will be able to process an individual request faster. + +Moving to a larger computer is typically considered vertical scaling, but it may be possible to have horizontal effects by moving to a larger computer because of how modern computers are designed. These days, larger computers typically come with a higher number of CPUs. Each individual CPU may be faster than a smaller computer configuration *and* there will be more CPUs on the single machine. Because of this characteristic, you will likely need to change your application configuration to take advantage of the additional power supplied by the extra CPU cores. While the traditional definition of vertical scaling (i.e., a faster individual CPU can do work quicker than a slower one) still applies, the line between vertical and horizontal scaling is somewhat blurred because of the multi-CPU core paradigm of modern CPUs. + +{{< web >}} +In the Understand Django deployment article, +{{< /web >}} +{{< book >}} +In the deployment chapter, +{{< /book >}} +we discussed Gunicorn's `--workers` flag. Recall that Python application servers like Gunicorn work by creating a main process and a set of worker processes. The main process will distribute incoming network connections to the worker processes to handle the actual traffic on your website. If you vertically scale the server machine from a size that has 1 CPU to a machine that has 2, 4, or more CPUs, and you don't change the number of workers, then you'll waste available CPU capacity and won't see most of the benefits from the upgrade in server size. + +If modern vertical scaling uses more CPUs when moving to a bigger machine, then what is horizontal scaling? The difference is primarily in the number of computers needed to do the scaling. 
Vertical scaling changes a single machine to achieve more throughput. Horizontal scaling pulls multiple machines into the equation. + +### Horizontal Scaling + +Conceptually, how does horizontal scaling work? With the vertical scaling model, you can see a clear connection between users making a request to your website's domain and a single machine handling those requests (i.e., the main process from your application server distributes requests). With the horizontal model, we're now discussing multiple computers. How does a single domain name handle routing to multiple computers? With more computers! + +Like the main process that distributes requests, we need a central hub that is able to route traffic to the different machines in your horizontally scaled system. This hub is usually called a **load balancer**. A load balancer can be used for multiple things. I see load balancers used primarily to: + +* route traffic to the different application servers in a system +* handle the TLS certificate management that makes HTTPS possible + +Since the load balancer doesn't do most of the actual work of processing a request, your system can increase its throughput by increasing the number of application servers. In this setup, each application server "thinks" that it is the main server that's handling requests. The load balancer behaves like a client that's making requests on behalf of the actual user. This kind of configuration is called a proxy setup. + +If you want to learn more about horizontal scaling with a load balancer, then I suggest you check out {{< extlink "https://www.nginx.com/" "Nginx" >}} (pronounced "engine X"), {{< extlink "http://www.haproxy.org/" "HAProxy" >}} (which stands for "high availability proxy"), or {{< extlink "https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html" "AWS ALBs" >}} (for "application load balancer"). These tools are commonly used and have a reputation for being strong load balancers. 
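To make the routing idea concrete, here is a toy sketch of a load balancer that hands each request to the next application server in turn. This assumes a round-robin strategy, which is only one of several strategies that real load balancers offer. The `RoundRobinBalancer` class and the server names are hypothetical and exist only for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: hands each request to the next server in turn."""

    def __init__(self, servers):
        # cycle() repeats the server list forever, in order.
        self._servers = cycle(servers)

    def route(self, request_path):
        server = next(self._servers)
        return f"{server} handles {request_path}"


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
routed = [balancer.route(f"/page/{n}") for n in range(4)]
```

With three servers, the fourth request wraps back around to the first server. Adding a server to the list is all it takes for this toy balancer to spread the same traffic across more machines, which is the essence of horizontal scaling.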
+ +### What's Better? + +*What are the tradeoffs between horizontal scaling and vertical scaling?* + +When you add more pieces to a system, you're increasing the complexity of the system. Thus, vertical scaling can, at least initially, produce a design with lower operational complexity. Personally, if I ran a service on some VPS like Digital Ocean or AWS, I would probably reach for vertical scaling first. A bigger machine would allow me to use a higher number of concurrent worker processes to increase throughput, and I would avoid the complexity of deploying multiple application servers. + +In reality, I run my side projects on a Platform as a Service, Heroku. With my choice of Heroku, the service already includes a load balancer by default. This means that I can trivially scale horizontally by changing a setting in Heroku that will start multiple application servers. + +While vertical scaling may be a good fit if you don't have an existing load balancer, that scaling path does have downsides to consider. + +First, in a vertically scaled world, downtime on your server could mean downtime *for your service*. Whether a site is reachable or not reachable is called "availability" by the software industry. If your entire site is tied to a large vertically scaled server, then it can act as a single point of failure if there is a problem. + +Secondly, a vertically scaled service may potentially have more cost for you. In my experience, most websites have high and low periods of usage throughout the day. For instance, my current employer is a US healthcare company that provides telemedicine visits for people that need to speak with a doctor virtually. When it's the middle of the night in the US, the site utilization is naturally lower as most people are sleeping. + +One common cost optimization is to use fewer computing resources during periods of lower utilization. On a vertically scaled service, it is harder to change computer sizes quickly. 
Thus, that computing resource's usage is relatively fixed, even if no one is using your service. In contrast, a horizontally scaled service can be configured to use "auto-scaling." + +Auto-scaling is the idea that the infrastructure can be resized dynamically, depending on the use of the site. During periods of high activity, more computers can be added automatically to join the load balancer distribution and handle additional load. When activity dies down, these extra machines can be removed from use. This cost saving technique helps ensure that your system is only using what it needs. + +The truth is that if your system reaches a large enough size and scale, then picking horizontal or vertical scaling is a false choice. As a system matures and grows, you may need to have a mix of the two scaling types, so that your service has the characteristics that you need (like availability). + +I hope that I've helped equip you with some mental modeling tools. With these tools, you should have some idea of how to handle more traffic when your site becomes wildly popular. 😄 + +In this section, we focused on increasing throughput by changing your service's infrastructure to handle more load. Now let's shift from the macro point of view to the micro view and talk about how to improve throughput by doing less. + +## Do Less + +*How do you make your Django site do less work?* We should always measure, but since I believe most websites are I/O bound, let's focus on techniques to improve in that dimension. + +### Optimizing Database Queries + +The most common performance problem that I've encountered with Django applications is the N+1 query bug (some people will describe it as the 1+N query bug for reasons that may become evident in a moment). + +The N+1 bug occurs when your application code calls the database in a loop.
How many queries are in this made up example?: + +```python +from application.models import Movie + +movies = Movie.objects.all() +for movie in movies: + print(movie.director.name) +``` + +It's a bit of a trick question because you might have a custom manager (i.e., `objects`), but, in the simplest scenario, there is one query to fetch the movies, and one query for each director. + +The reason for this behavior is that Django does a lazy evaluation of the movies `QuerySet`. The ORM is not aware that it needs to do a join on the movie and director tables to fetch all the data. The first query on the movie table occurs when the iteration happens in the Python `for` loop. When the `print` function tries to access the `director` foreign key, the ORM does not have the director information cached in Python memory for the query set. Django must then fetch the director data in another database query to display the director's name. + +This adds up to: + +* 1 query for the movies table +* N queries on the director table for each iteration through the `for` loop + +Hence the name, "N+1" query bug. + +The reason that this is so bad is because calling the database is way slower than accessing data in Python memory. Also, this problem gets worse if there are more rows to iterate over (i.e., more movies to process and, thus, more directors to fetch individually). + +The way to fix this issue is to hint to Django that the code is going to access data from the deeper relationship. We can do that by hinting to the ORM with `select_related`. Let's see the previous example with this change: + +```python +from application.models import Movie + +movies = Movie.objects.select_related( + "director").all() +for movie in movies: + print(movie.director.name) +``` + +In the reworked example, the ORM will "know" that it must fetch the director data. 
Because of this extra information, the framework will fetch from both the movie and director tables *in a single query* when the `for` loop iteration starts. + +Under the hood, Django performs a more complex `SELECT` query that includes a join on the two tables. The database sends back all the data at once and Django caches the data in Python memory. Now, when execution reaches the `print` line, the `director.name` attribute can pull from memory instead of needing to trigger another database query. + +The performance savings here can be massive, especially if your code works with a lot of database rows at once. + +While `select_related` is fantastic, it doesn't work for all scenarios. Other types of relationships like a many-to-many relationship can't be fetched in a single query. For those scenarios, you can reach for `prefetch_related`. With this method, Django will issue a smaller number of queries (usually 1 per table) and blend the results together in memory. In practice, you use `prefetch_related` in much the same way as `select_related` in most circumstances. Check out the Django docs to understand more. + +### Caching Expensive Work + +If you know that a piece of work: + +* will likely run many times, +* is expensive to produce, AND +* won't need to change + +then you're looking at work that is a very good candidate to cache. With caching, Django can save the results of some expensive operation into a very fast caching tool and restore those results later. + +A good example of this might be a news site. A news site is very "read heavy," that is, users are more likely to use the site for viewing information than for writing and saving information to the site. A news site is also a good example because users will read the same article, and the content of that article is fixed in form. + +Django includes tools to make it simple to work with the cache to optimize content like our news site example. + +The simplest of these tools is the `cache_page` decorator.
This decorator can cache the results of an entire Django view for a period of time. When a page doesn't have any personalization, this can be a quick and effective way to serve HTML results from a view. You can find this decorator in `django.views.decorators.cache`. + +You may need a lower level of granularity than a whole page. For instance, your site might have some kind of logged in user and a customized navigation bar with a profile picture or something similar. In that scenario, you can't really cache the whole page and serve that to multiple users, because other users would see the customized navigation bar of the first user who made the request. If this is the kind of situation you're in, then the `cache` template tag may be the best tool for you. + +Here's a template example of the `cache` tag in use: + +{{< web >}} +```django +{% load cache %} + +Hi {{ user.username }}, this part won't be cached. + +{% cache 600 my_cache_key_name %} + Everything inside of here will be cached. The first argument to `cache` is how long this should be cached in seconds. This cache fragment will cache for 10 minutes. Cached chunks need a key name to help the cache system find the right cache chunk. + + This cache example usage is a bit silly because this is static text and there is no expensive computation in this chunk. +{% endcache %} +``` +{{< /web >}} +{{< book >}} +```djangotemplate +{% load cache %} + +Hi {{ user.username }}, this part won't be cached. + +{% cache 600 my_cache_key_name %} + Everything inside of here will be cached. The first argument to `cache` is how long this should be cached in seconds. This cache fragment will cache for 10 minutes. Cached chunks need a key name to help the cache system find the right cache chunk. + + This cache example usage is a bit silly because this is static text and there is no expensive computation in this chunk. 
+{% endcache %}
+```
+{{< /book >}}
+
+With this scheme, any expensive computation that your template does will be cached. *Be careful with this tag!* The tag is useful if computation happens during rendering, but if you're doing the evaluation and fetching *inside of your view* instead of at template rendering time, then you're unlikely to get the benefits that you want.
+
+Finally, there is the option to use the cache interface directly. Here's the basic usage pattern:
+
+```python
+# application/views.py
+from django.core.cache import cache
+
+from application.complex import calculate_expensive_thing
+
+def some_view(request):
+    expensive_result = cache.get(
+        "expensive_computation")
+    if expensive_result is None:
+        expensive_result = calculate_expensive_thing()
+        cache.set(
+            "expensive_computation",
+            expensive_result
+        )
+
+    # Handle the rest of the view.
+    ...
+```
+
+On the first request to this view, the `expensive_result` won't be in the cache, so the view will calculate the result and save it to the cache. On subsequent requests, the expensive result can be pulled from the cache. In this example, I'm using the default timeout for the cache, but you can control the timeout values when you need more control. The cache system has plenty of other cool features, so check it out in the docs.
+
+As fair warning, caching often requires more tools and configuration. Django works with very popular cache tools like Redis and Memcached, but you'll have to configure one of the tools on your own. The Django documentation will help you, but be prepared for more work on your part.
+
+Database optimization and caching are go-to techniques for optimization. When you're optimizing, how do you know that you're doing it right? What gains are you getting? Let's look at some tools next that will let you answer those questions.
+
+## Tools To Measure Change
+
+We'll look at tools at an increasing level of complexity.
This first tool is one that is massively useful while developing in Django. The other tools are more general purpose tools, but they still are worth knowing about, so that you'll know when to reach for them. + +Each of these tools helps you get real performance data. By measuring the before and after of your changes, you can learn if the changes are actually producing the gains that you expect or hope to achieve. + +### Django Debug Toolbar + +The {{< extlink "https://django-debug-toolbar.readthedocs.io/en/latest/index.html" "Django Debug Toolbar" >}} is a critical tool that I add to my projects. The toolbar acts as an overlay on your site that opens to give you a tray of different categories of diagnostic information about your views. + +Each category of information is grouped into a "panel," and you can select between the different panels to dig up information that will assist you while doing optimization work. + +You'll find panels like: + +* SQL +* Templates +* Request +* Time + +The SQL panel is where I spend nearly all of my time when optimizing. This panel will display all the queries that a page requests. For each query, you can find what code triggered the database query and you even get the exact SQL `SELECT`. You can also get an `EXPLAIN` about a query if you really need the gory details of what the database is doing. + +With a little bit of eye training, you'll learn to spot N+1 query bugs because you can see certain queries repeated over and over and "cascading" like a waterfall. + +I'll often test with the debug toolbar when I'm trying to sprinkle in `select_related` to visually confirm that I've reduced the query count on a page. The debug toolbar is open source and is a great free resource. The toolbar is totally worth the investment of configuring it for your next Django project. + +### hey / ab + +There are two tools that are very similar that I use when I need to get a crude measure of the performance of a site. 
These tools are {{< extlink "https://github.com/rakyll/hey" "hey" >}} and {{< extlink "https://httpd.apache.org/docs/2.4/programs/ab.html" "ab" >}} (Apache Bench). Both of these tools are *load generators* that are meant to benchmark a site's basic performance characteristics. + +In practice, I prefer `hey`, but I mention `ab` because it is a well known tool in this space that you are likely to encounter if you research this load generator topic. + +Operating this kind of tool is trivial: + +```bash +$ hey https://www.example.com +``` + +In this example, hey will try to open up a large number of concurrent connections to the URL and make a bunch of requests. When the tool is done, it will report how many of the requests were successful and some related timing information and statistics. Using a load generator like this lets you synthesize traffic to learn how your site is going to perform. + +I'd suggest you be careful where you tell these tools to operate. If you're not careful, *you could cause a Denial of Service attack on your own machines.* The flood of requests might make your site unavailable to other users by consuming all your server's resources. Think twice before pointing this at your live site! + +### Locust + +The previous load generator tools that I mentioned act as somewhat crude measurements because you're limited to testing a single URL at a time. What should you do if you need to simulate traffic that matches real user usage patterns? Enter {{< extlink "https://locust.io/" "Locust" >}}. Locust is not a tool that I would reach for casually, but it is super cool and worth knowing about. + +The goal of Locust is to do load testing on your site in a realistic way. This means that it's your job to model the expected behavior of your users in a machine understandable way. If you know your users well (and I hope you do), then you can imagine the flows that they might follow when using your site. 
+ +In Locust, you codify the behavior patterns that you care about, then run Locust to simulate a large number of users that will act like you expect (with randomness to boot to really make the test like reality). + +Advanced load testing is something you may never need for your site, but it's pretty cool to know that Python has you covered if you need to understand performance and the limits of your site at that deep level. + +### Application Performance Monitoring (APM) + +{{< web >}} +Earlier in this article, +{{< /web >}} +{{< book >}} +Earlier in this chapter, +{{< /book >}} +I mentioned that Application Performance Monitoring tools can show you CPU and memory utilization of your site. That's usually just the tip of the iceberg. + +An APM tool often goes far beyond hardware resource measurement. I like to think of APMs as a supercharged version of the debug toolbar. + +First, an APM is used on live sites typically. The tool will collect data about real requests. This lets you learn about the *real* performance problems on the site that affect *real* users. + +For instance, New Relic will collect data on slow requests into "traces." These traces are aggregated into a set to show you which pages on your site are the worst performers. You can drill into that list, view an individual trace, and investigate the problem. + +Maybe you've got an N+1 bug. Maybe one of your database tables is missing an index on an important field, and the database is scanning too many records during `SELECT` statements. These traces (or whatever they are called in other services) help you prioritize what to fix. + +In fact, an APM highlights the true value of measurement. If I can leave you with a parting thought about optimization, think about this: ***optimize where it counts***. + +Here's a simple thought experiment to illustrate what I mean. You have an idealized system that does two things repeatedly: + +* One task (A) is 90% of all activity on the site. 
+* The other task (B) is the remaining 10%. + +If you have to pick a target to try to optimize because your system performance is inadequate, which one do you pick? + +Let's assume that you know an optimization for each type of task that could cause the task to execute in 50% of the time. If implementing each optimization is the same level of effort, then there is a clear winner as to which task you should optimize. You could either: + +* Optimize A for 90% * 50% for a total system saving of 45%. +* Optimize B for 10% * 50% for a total system saving of 5%. + +In most circumstances, spend your optimization effort on the area that will have outsized impact (i.e., pick task A as much as you can). Sometimes the hard part is figuring out which task is A and which task is B. Monitoring tools like an APM can help you see where the worst offenders are so you can focus your limited time in the right spot. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we looked into making Django apps go fast. + +We saw: + +* A mental model for thinking about performance optimization +* Different types of performance bottlenecks +* How to get your system to do more by either horizontal or vertical scaling +* How to get your app to do less work by optimizing database queries and caching +* Tools to aid you in all of this optimization work + +{{< web >}} +In the next article, +{{< /web >}} +{{< book >}} +In the next chapter, +{{< /book >}} +we'll look into security. + +You'll learn about: + +* How Django helps you be more secure with some of its design features +* What those different warnings from `./manage.py check --deploy` mean +* Fundamental things you should consider to help keep your site secure + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. 
If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2022-03-10-secure-apps.pt.md b/content/understand-django/2022-03-10-secure-apps.pt.md new file mode 100644 index 0000000..62f9dba --- /dev/null +++ b/content/understand-django/2022-03-10-secure-apps.pt.md @@ -0,0 +1,296 @@ +--- +title: "Security And Django" +description: >- + You want to protect your users' privacy, right? The goal is noble and users demand it, but how do you do it? In this Understand Django article, we'll look at some areas that improve the security of your application. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - security +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we learned about where apps slow down. We explored techniques that help sites handle the load and provide a fast experience for users. + +{{< web >}} +With this article, +{{< /web >}} +{{< book >}} +With this chapter, +{{< /book >}} +we will look at security. How does a Django site stay safe on the big, bad internet? Let's find out. + +{{< understand-django-series-pt "security" >}} + +## A Security Confession + +I have a confession to make. Of all the topics that I've covered about Django +{{< web >}} +in this series, +{{< /web >}} +{{< book >}} +in this book, +{{< /book >}} +*this is my least favorite one.* +{{< web >}} +Perhaps that's why I've pushed the subject so far into this list of articles. +{{< /web >}} + +I have a very hard time getting excited about security because it feels like a pure cost to me. As developers, we're in this arms race against malicious people who want to steal and profit from the data of others. 
In a perfect world, everyone would respect the privacy of others and leave private data alone. Alas, the world is far from perfect.
+
+The bad actors have devised clever and tricky methods of exploiting websites to steal data. Because of this, application developers have to implement guards in an attempt to prevent these exploits. Implementing those guards detracts from the main objective of site building and often feels like a drag on efficiency.
+
+All that being said, **security is super important**. Even if you're like me and the topic doesn't naturally interest you (or actively feels like a waste of time), the security of your application matters.
+
+* Privacy matters.
+* Trust matters.
+
+If we cannot protect the information that users bring to our Django sites, then trust will rapidly erode and, most likely, our users will disappear along with it.
+
+As noted in this section, security is not my favorite topic. I'm going to describe some security topics as they relate to Django, but if you want to learn from people who *love* security, then I would recommend reading from the {{< extlink "https://owasp.org/" "Open Web Application Security Project" >}}. This popular group can teach you far more about security than I can, and do it with gusto!
+
+## The Three **C**s
+
+Learning about security involves learning a bunch of acronyms. I don't know if this is something that security researchers like to do, or if it is because the problems that the acronyms stand for are challenging to understand. Either way, let's look at three common acronyms that start with C and the problems they address.
+
+### CSRF
+
+{{< web >}}
+In a number of these Understand Django articles,
+{{< /web >}}
+{{< book >}}
+In a number of these chapters,
+{{< /book >}}
+I have discussed CSRF briefly.
+{{< web >}} +In the forms article, +{{< /web >}} +{{< book >}} +In the forms chapter, +{{< /book >}} +I did some hand waving and stated that you need a CSRF token for security reasons and basically said "trust me" at the time. + +CSRF stands for *Cross Site Request Forgery*. In simple terms, a CSRF attack allows an attacker to use someone's credentials to a different site without their permission. With a bit of imagination, you can see where this goes: + +* Attacker socially manipulates a user to click a link. +* The click activity exploits the user's credentials to a site and changes something about the user's account like their email address. +* The attacker changes the email address to something they control. +* If the original site is something like an e-commerce site, the attacker may make purchases using the user's stored credit card information. + +Django includes a capability to help thwart this kind of attack. Through the use of **CSRF tokens**, we can help prevent bad actors from performing actions without user consent. + +A CSRF token works by including a generated value that gets submitted along with the form. The template looks like: + +{{< web >}} +```django +
+ {% csrf_token %} + + +
+``` +{{< /web >}} +{{< book >}} +```djangotemplate +
+ {% csrf_token %} + + +
+``` +{{< /book >}} + +When this renders, the result would be something like: + +```html +
+ + + +
+``` + +The value would naturally be different from my example. When the form is submitted, the CSRF token gets checked for validity. A valid CSRF token is required to make a `POST` request, so this level of checking can help prevent attackers from changing a user's data on your site. + +You can learn more about CSRF with Django's {{< extlink "https://docs.djangoproject.com/en/4.1/ref/csrf/" "Cross Site Request Forgery protection" >}} reference page. + +### CORS + +Imagine that you've built `myapp.com`. For your user interface, instead of Django templates, you built a client UI using a JavaScript framework like {{< extlink "https://vuejs.org/" "Vue.js" >}}. Your application is serving static files at `myapp.com`, and you built a Django-powered API that is handling the data which gets called at `api.myapp.com`. + +In this scenario, browsers will require you to set up CORS. CORS is *Cross-Origin Resource Sharing*. The goal of CORS is to help protect a domain from undesirable access. + +In our example, your API at `api.myapp.com` may only be designed to work with the user interface at `myapp.com`. With CORS, you can configure your API so that it will reject any requests that do not come from the `myapp.com` domain. This helps prevent bad actors from using `api.myapp.com` in the browser. + +Django does *not* include tools to handle CORS configuration from the core package. To make this work, you'll need to reach for a third party package. Since CORS configuration is handled through HTTP headers, you'll find that the very appropriately named {{< extlink "https://github.com/adamchainz/django-cors-headers" "django-cors-headers" >}} package is exactly what you need. + +I won't walk through the whole setup of that package because the README does a good job of explaining the process, but I will highlight the crucial setting. With django-cors-headers, you need to set the `CORS_ALLOWED_ORIGINS` list. Anything not in that list will be blocked by CORS controls in the browser. 
Our example configuration would look like: + +```python +CORS_ALLOWED_ORIGINS = [ + "https://myapp.com", +] +``` + +As you read about CORS on the internet, you'll probably run into advice to set the HTTP header of `Access-Control-Allow-Origin: *`. This wildcard is what you'll get if you set `CORS_ALLOW_ALL_ORIGINS = True` in django-cors-headers. *This is probably not what you really want. Using this feature opts your site out of CORS protection.* Unless you have some public web API that is *designed* to work for many domains, you should try to avoid opting out of CORS. + +CORS is not a core concept that you will find in Django. If you want to learn more about the specifics of CORS, check out {{< extlink "https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS" "Cross-Origin Resource Sharing (CORS)" >}} from the Mozilla Developer Network (MDN). + +### CSP + +The final **C** in our tour is *Content Security Policy* or CSP, for short. You might roughly think of CSP as the inverse of CORS. Where CORS defines what parts of the internet can access your domain, CSP defines what your domain can access from the internet. + +The goal of CSP is to protect users on your site from running JavaScript (and other potentially harmful resources like images) from places that you don't want. + +To understand how your site can be vulnerable to these kinds of attacks, we need to understand a well-known attack vector called Cross-Site Scripting (XSS). XSS is when a bad actor finds a way to run a script on your domain. Here's a classic way that XSS can happen: + +* A site has a form that accepts text data. +* Then the site displays that text data *in its raw form*. + +At first, that seems harmless: + +```html +
+ What could possibly go wrong? +
+``` + +For an honest interaction like "What could possibly go wrong?" as user input, that is truly harmless. What about this? + +```html +
+ I am getting sneaky. +
+``` + +Now, the user added a bit of HTML markup. Again, this is fairly benign and will only add some unanticipated italics. What if the user is a bit more clever than that? + +```html +
+ +
+``` + +Here's where a site gets into trouble. If this is rendered on a page, a little alert box will appear. You can imagine this happening in a forum or some other sharing site where multiple people will see this output. That's annoying, but it's still not horrible. + +What does a really bad scenario look like? A really bad scenario is where the bad guys figure out that your site is unsafe in this way. Consider this: + +```html +
+ +
+```
+
+Now your users are really in trouble. In this final version, the bad guys won. A user's browser will download and execute whatever JavaScript is in `owned.js`. This code could do all kinds of stuff like using the `fetch` API to run AJAX requests that can change the user's account credentials and steal their account.
+
+How do we defend against this kind of attack? There isn't a single answer. In fact, multiple layers of protection are often what you really want. In security, this idea is called "defense-in-depth." If you have multiple layers to protect your site, then the site may become a less appealing target for attackers.
+
+For this particular scenario, we can use a couple of things:
+
+* HTML escaping of untrusted input
+* CSP
+
+The real problem above is that the site is rendering user input without any modification. This is a problem with HTML because the raw characters are interpreted as HTML code and not just user data.
+
+The simplest solution is to make sure that any characters that mean something specific to HTML (like `<` or `>`) are replaced with escape codes (`&lt;` or `&gt;`) that will display the character in the browser without treating it like the actual HTML code character. **Django does this auto-escaping of user data by default**. You can disable this behavior for portions of a template using template features like the `autoescape` tag and the (ironically named?) `safe` filter.
+
+Because there are ways to opt out of safe behavior from HTML escaping and because clever attackers might find other ways to inject script calls into your site, CSP is another layer of protection.
+
+Primarily, CSP is possible with a `Content-Security-Policy` HTTP header. You can read all of the gritty details on the {{< extlink "https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP" "Content Security Policy (CSP)" >}} article on MDN. Like CORS, CSP is not something that Django supports out-of-the-box.
Thankfully, Mozilla (yep, the same Mozilla from MDN) offers a {{< extlink "https://django-csp.readthedocs.io/en/latest/index.html" "django-csp" >}} package that you can use to configure an appropriate policy for your Django site.
+
+In a content security policy, you mark everything that you want to allow. This fundamentally changes which resources your pages are allowed to load. Instead of allowing everything by default, the site operates on a model that denies things by default. With a "deny by default" stance, you can then pick the resources that you deem safe for your site. Modern browsers respect the policy declared by the HTTP header and will refuse to connect to resources outside of your policy when users visit your site at your domain.
+
+There is something obvious that we should get out of the way. This setup and configuration requires *more work*. Having more secure systems requires effort, study, and plenty of frustration as you make a site more secure. The benefit to your users or customers is that their data stays safe. I think most people expect this level of protection by default.
+
+So, do you have to become a security expert to build websites? I don't think so. There is a set of fundamental issues with web application security that you should know about (and we've covered some of those issues already), but you don't need to be prepared to go to black hat conventions in order to work on the web.
+
+{{< web >}}
+Before finishing up this security article,
+{{< /web >}}
+{{< book >}}
+Before finishing up this security chapter,
+{{< /book >}}
+let's look at what Django provides so that you can be less of a security expert and use the knowledge of the community.
+
+## Check Command Revisited
+
+The Django documentation includes a good overview of the framework's security features on the {{< extlink "https://docs.djangoproject.com/en/dev/topics/security/" "Security in Django" >}} page.
+
+Aside from the content outlined on the security page, we can return to the check command
+{{< web >}}
+discussed in previous articles.
+{{< /web >}}
+{{< book >}}
+discussed in previous chapters.
+{{< /book >}}
+Recall that Django includes a `check` command that can check your site's configuration before you deploy a site live. The command looks like:
+
+```bash
+$ ./manage.py check --deploy
+```
+
+The output from this command can show where your configuration is less than ideal from a security perspective.
+
+The security warnings that come from running the `check` command are defined in `django.core.checks.security`. A more readable version of the available security checks is on the {{< extlink "https://docs.djangoproject.com/en/4.1/ref/checks/#security" "System check framework" >}} reference page.
+
+Scanning through the list of checks, you'll find that:
+
+* many checks center around configuring your site to run with HTTPS. Secure connections used to reference SSL for Secure Sockets Layer. Along the way, that layer changed names to TLS for Transport Layer Security. In practice, if you see either of those terms, think `https://`.
+* other checks confirm that your site has the kinds of {{< extlink "https://docs.djangoproject.com/en/4.1/ref/middleware/#security-middleware" "middleware" >}} installed that offer some of the protection discussed previously (like CSRF).
+* still other checks look for core settings that should be set like `DEBUG = False` and defining the `ALLOWED_HOSTS` setting.
+
+There is a good comment in the Django security checks reference docs that is worth repeating here:
+
+> The security checks do not make your site secure. They do not audit code, do intrusion detection, or do anything particularly complex. Rather, they help perform an automated, low-hanging-fruit checklist, that can help you to improve your site’s security.
+
+When you're thinking through security, do some homework and don't let your brain go on autopilot.
Remember that users develop a trust relationship with your websites. That trust is easy to break and may cause people to leave your site forever. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we explored security topics and how they relate to Django. + +We covered: + +* CSRF +* CORS +* CSP +* Cross-site scripting (XSS) +* The security checks and information available from the `check` command + +{{< web >}} +In the *last* article of the Understand Django series, +{{< /web >}} +{{< book >}} +In the *last* chapter, +{{< /book >}} +we'll get into debugging. + +You'll learn about: + +* Debugging tools like `pdb` +* Browser tools +* Strategies for finding and fixing problems + +{{< web >}} +If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +  diff --git a/content/understand-django/2022-05-31-debugging-tips-techniques.pt.md b/content/understand-django/2022-05-31-debugging-tips-techniques.pt.md new file mode 100644 index 0000000..d981f5c --- /dev/null +++ b/content/understand-django/2022-05-31-debugging-tips-techniques.pt.md @@ -0,0 +1,307 @@ +--- +title: "Debugging Tips And Techniques" +description: >- + Your Django app is up. You've got users. Your users are hitting bugs. How do you debug to fix the problems? That's the focus of this Understand Django article. +image: img/django.png +type: post +categories: + - Python + - Django +tags: + - Python + - Django + - debugging +series: "Understand Django" + +--- + +In the last +{{< web >}} +[Understand Django]({{< ref "/understand-django/_index.pt.md" >}}) article, +{{< /web >}} +{{< book >}} +chapter, +{{< /book >}} +we looked at security. How does a Django site stay safe on the big, bad internet? 
+{{< web >}} +The article explored some core elements +{{< /web >}} +{{< book >}} +The chapter explored some core elements +{{< /book >}} +for making a Django app more secure. + +{{< web >}} +With this article, +{{< /web >}} +{{< book >}} +With this chapter, +{{< /book >}} +we will investigate problem solving techniques for Django apps. The goal is to equip you with tools to fix the real problems that you'll hit when building your Django site. + +{{< understand-django-series-pt "debugging" >}} + +## Systematic Problem Solving + +When you have real users using your website and they report problems with the site, what can you do to fix the problems? Having a mental framework for how to fix problems is really useful, because it provides a repeatable process that you'll be able to use as long as you're building on the web. Let's discuss the mental framework that I use, which I think of as *systematic problem solving*. + +The first thing to drill into your head is that computers are deterministic. Given a set of inputs, they will produce a repeatable output. This is crucial to remember because the art of problem solving with computers is about sleuthing the state of your system (i.e., a Django website in this context) to determine why a given set of inputs produced an undesirable output, also known as a *bug*. + +When debugging a problem, the challenge is figuring out what all those inputs are. The inputs can vary wildly. Successful problem solvers are those who can find the inputs that caused the bad output. + +This is the core of systematic problem solving. With systematic problem solving, you want repeatable patterns that help you build up a mental model of how a problem occurred. Once you understand the problem, you'll be ready to create the solution and fix your bug. + +*How do you repeatably figure out problems with a website?* + +In my experience, the most fruitful way to understand a problem is to reproduce it. 
To reproduce a problem requires context and data about what happened in the first place. + +What are the sources of this data? + +{{< web >}} +* Error monitoring services like Rollbar or Sentry can be a fantastic source of data. +* Log data, which we'll discuss later in this article, can be another excellent source. +{{< /web >}} +{{< book >}} +* Error monitoring services like Rollbar or Sentry can be a fantastic source of data. +* Log data, which we'll discuss later in this chapter, can be another excellent source. +{{< /book >}} + +Error monitoring services help show what happened. These services will collect tracebacks of Python exceptions and display them in context of a particular request. Some of these tools can even show the sequence of events that led to the failure case. + +My favorite strategy for cases like this is to produce an automated test that can trigger the same exception as the one reported by the error monitoring service. This is often a mechanical transformation of the data provided on an error report page into the setup data of a unit test. + +The great part about having a unit test is that you now have something that you can repeat with ease. This is a lot better than clicking around on your site and trying to recreate a problem manually. Because you've captured the problem scenario in test code, you can know *exactly* when you've **fixed** the problem. You can work on the site's code and run the test over and over until you've devised a solution that overcomes your error. + +This process is a systematic approach that you can apply again and again to solve problems. The process in a nutshell looks like: + +* Collect data from the failure scenario +* Transform that data into an automated test that demonstrates the problem in a repeatable fashion. +* Fix your site's code until the test passes and the problem is resolved. + +The secondary benefit of this systematic model is that your site has a passing unit test when you're done. 
This test acts as a guard against future failures of a similar kind (i.e., a "regression" test). + +I don't want to oversell the value of these kinds of tests. Regression tests can be useful, but in my experience, I've rarely seen regression tests fail after the offending code is fixed. Nonetheless, if the test is fast enough, these kinds of tests are worth keeping in your automated test suite for those rare cases where you *do* encounter a regression failure. + +Maybe my unit testing approach doesn't appeal to you. That's fine. Do what works well for you. My larger point is that you should try to take a systematic approach to problem solving. **Find repeatable patterns that you can apply to your context.** + +Approaching problems with a process in hand helps you avoid feeling stuck. Having methods for solving problems also helps when the pressure is on, and you're trying to fix a critical and time sensitive bug for your customers. Not knowing what to do in a high pressure scenario can add even more stress to an already stressful situation. + +## `print` Without Shame + +In the previous section, I wrote "Fix your site's code until the test passes," but I didn't explain how you'd actually do that. + +When you're working with a failing test and you're trying to make it pass to solve a problem, you have to understand the scenario at a pretty deep level. + +One tool to reach for is the `print` function. You can sprinkle some calls to `print` into the code you're working with to gain an understanding of what is happening. + +In your journey in programming, you'll run across this idea that "print debugging" is a bad idea. Not everyone espouses this idea, but it exists out there. In the same breath, you'll read that you should be using a debugger instead or some other tool. + +Debuggers, which are great tools that we'll cover shortly, are not the only tool worth knowing. In fact, in many cases, debuggers are a terrible tool for the job. 
+ +What does the `print` function do that makes it so useful? **`print` provides a chronology.** + +With the `print` function, you can observe changes *over time*. The `print` function can answer questions like: + +* What is the value of my variable in each iteration through the loop? +* Is this block of code even reached by the interpreter? +* What is the state before and after a section of code executes? + +The nice part about using `print` in an automated test is that you can see all the data, all at once. The results of a handful of `print` calls start to tell a story of what happened during execution. If your "story" isn't rich enough and meaningful enough to understand what's happening, you add more `print` calls and run the test again. + +I'd estimate that I am able to use `print` for 70-80% of my debugging needs. + +What should you print? + +* One simple strategy is to print a number sequence. Adding `print(1)`, `print(2)`, or more to various lines can show what parts of code are executing or what patterns are happening in loops. +* Sometimes I'll print the values of conditionals for branches that I'm interested in. This helps me figure out when certain branches in the code are taken versus when they are not. +* Other times I'll add prints with a bunch of newlines to visually separate things. `print('\n\nHERE\n\n')` is surprisingly effective. + +The answer to "what should you print?" is really "what do you want to know?" As you use `print`, you'll develop a good feel for what information you find most useful. + +Tip: As of Python 3.8, you can print the variable name in an f-string along with its value: + +```python +>>> foo = 5 +>>> print(f'Hello world {foo=}') +Hello world foo=5 +``` + +`print` is pretty great, but it isn't always my tool of choice. For instance, `print` is not the best tool when solving problems on a live site. `print` also doesn't work as well when I need to get really detailed information about the state of my program.
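
To make the chronology idea concrete, here's a minimal sketch that combines the printing strategies listed above. The `apply_discount` function, its arguments, and the sample prices are all hypothetical; they only exist to give the `print` calls something to report on.

```python
def apply_discount(prices, threshold, rate):
    """Hypothetical function under investigation."""
    discounted = []
    print('\n\nHERE\n\n')  # visual separator so this run stands out in the output
    for index, price in enumerate(prices):
        over_threshold = price > threshold
        # Show the loop variable and the branch condition on every iteration.
        print(f'{index=} {price=} {over_threshold=}')
        if over_threshold:
            print(1)  # numbered marker: the discount branch was reached
            price = round(price * (1 - rate), 2)
        discounted.append(price)
    # Show the before and after state side by side.
    print(f'{prices=} {discounted=}')
    return discounted


result = apply_discount([10.0, 25.0, 40.0], threshold=20.0, rate=0.1)
```

Reading the output from top to bottom gives the "story" of the call: which iterations took the discount branch and how the data changed by the end.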
+ +When do I reach for something else? Let's look at debuggers next. + +## Debuggers + +If 70-80% of the time I can get away with `print`, what about the rest of the time? + +Sometimes I'll use a debugger. A debugger is a specialized tool that allows you to go through your Python code, one line at a time. + +Considering how much code we write and how much we use from other packages, this means that a debugger can be a slow tool to utilize. A debugger makes up for this slowness with an unparalleled amount of information about what is going on at the exact moment that the Python interpreter executes a line of code. + +How do you use a debugger? + +Assuming that you're following my strategy of writing an automated test when you're working on a problem, the simplest way to start a debugger is by adding `breakpoint()` before the code you want to check. + +In a standard Python installation, adding this function call will pause your program by running the Python debugger, `pdb`, starting from the call to `breakpoint`. + +If you do this, you'll be left at a prompt that starts with `(Pdb)`. From here, you'll need to know some commands to navigate within the debugger. I'll cover the primary commands I use here, but you can type `h` to see a list of the available commands with instructions for how to get more help info. + +My natural inclination in the debugger is to know where I am. The `l` command will *list* code that the interpreter is about to execute, along with an arrow showing the next line that Python will run. I may also want to know where I am in the call stack (i.e., the history of calls that go back all the way to the main function that started the Python process). By using the `w` command to show *where* I am, pdb will show the call stack with the current line of code listed last by the prompt and the oldest function listed at the top of the output. These two commands give me the context at any particular moment. 
+ +Next, I'll often want to know the values of local variables when I'm debugging. I can either use `p` to *print* the value or `pp` to *pretty print* in cases when I have a structure like a list or dictionary. + +All the previous commands orient me to where my code is and what values are in the data structures. With that context, I'm ready to work through my code with two additional commands. In a debugger, you advance the Python interpreter a line at a time. There are two styles to do this. + +One way is with the `n` command to go to the *next* line. This command is what you want when you don't really care about how a line runs. For instance, if the line calls a function that you know works, `n` is the way to pass that line to let it execute in its entirety. + +The other command to advance the interpreter is the `s` command. The `s` command lets you *step* through the code at the smallest possible increment. That means if you're on a line with a function call, the `s` command will move *into* the function call to show what's executing inside of it. I view the step command as the very fine adjustment command to run as I get close to the problem area in my code. + +Whenever I'm done debugging my code and have seen all that I need to see, I finish with the `c` command to *continue* normal execution. + +What's great about having a unit test is that I can run the debugger in a very coarse way by liberally using the `n` command. As I hit the error state (like an exception happening), I can continue, restart the test, then quickly return to right before the point in time when the error occurs. At that juncture, I switch to the `s` command until I find the exact moment when something goes wrong. This process helps me avoid wasting time by stepping through parts of the code that don't matter. 
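
Here's a rough sketch of how that workflow might look in a test file. The `average` function and its bug are hypothetical stand-ins for your own code, and the `breakpoint()` call is left commented out so that the snippet runs without pausing. The test asserts the current (buggy) behavior to keep the example runnable; in practice, you'd assert the behavior you actually want and watch the test fail until the fix is in.

```python
import unittest


def average(values):
    """Hypothetical function from a bug report: it crashes on an empty list."""
    total = sum(values)
    # breakpoint()  # uncomment to pause here and land at the (Pdb) prompt
    #
    # Commands described above that you could try at the prompt:
    #   l          list the code around the current line
    #   w          show where you are in the call stack
    #   p total    print a local variable
    #   pp values  pretty print a list or dictionary
    #   n          run the next line without stepping into calls
    #   s          step into the function call on the current line
    #   c          continue normal execution
    return total / len(values)


class TestAverage(unittest.TestCase):
    def test_empty_list(self):
        # Captures the failure scenario so the debugging session is repeatable.
        with self.assertRaises(ZeroDivisionError):
            average([])
```

Because the scenario lives in a test, restarting the debugger session is as cheap as running the test again.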
+ +The debugger has more interesting features like setting breakpoints and checking data at different parts of the call stack, but I hope this description gives you an idea of how to apply a debugger in your workflow. + +Learn more about the debugger and `pdb` by checking out the {{< extlink "https://docs.python.org/3/library/pdb.html" "pdb standard library documentation" >}}. + +## Use The Source, Luke + +At times, you're going to run into problems with code that is not part of your project. This might be code in the Django source or it might be code from some other library that you happen to use in your project. + +What I have found over time is that many less experienced developers will hit this situation and suddenly freeze. There's some kind of mental barrier preventing them from looking any further than the boundary of their own code. + +Here's my tip: **don't freeze!** + +When you wake up to the reality that it's all just code that people wrote, you cross that mental barrier and move into realms that others may fear to tread. The funny part is that this code is often no more special than anything that you might write yourself! In fact, many open source developers are fantastic programmers, and fantastic programmers often know how to write clear, maintainable, and well-documented code. + +So, listen to your inner "Obi-Wan" and use the *source*, Luke! + +* If you're hitting a problem with a package, look at the source code on GitHub (or wherever the code is hosted). You can study the modules and trace through how the code would execute. +* If that's not enough, don't be afraid to step through functions in other libraries with your debugger as you are testing out whatever problem that you are trying to fix. +* Remember `print` debugging? Who says you can't do that with other software?
If I've got a virtual environment and I need to figure out what's going on with how my software interacts with a library, I will sometimes add `print` statements directly to the library files in my virtual environment's `site-packages` directory. This is a great trick! + +There are limitations to this approach because some packages will include code sections that are compiled with C. That kind of code is harder to peek into, but I've found that the majority of Python packages that you find on PyPI are Python-only. That makes most packages great candidates for digging into the source code. + +## Logging Chronology + +Logging is a topic that we will only scratch the surface of, but the subject is important to round out a discussion of debugging tools. + +A logging system provides the ability to record messages during the execution of your application. The easiest way to think of logging is as a permanent set of `print` calls that record information every time your functions or methods call the logging system. + +When logging is used in an application, you can capture what happened within your app even if you weren't there to observe the system at the moment that it happened. The logging system forms a chronological log of events that happen within a system. + +This has some big benefits, but comes with some tradeoffs to consider. + +* **Pro**: When you have sufficient logging, you can see what is happening in the app when your users are working in the app. +* **Pro**: More advanced logging configurations can include metadata that you can filter on to look for specific kinds of activity. This can aid your diagnostic abilities. +* **Con**: Logging generates data. While this enables the points above, you now have a new challenge of *managing* this new source of data. +* **Con**: Log management is even more of a challenge as an application system grows. When you have multiple servers, where does all that log data go? 
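
As a small illustration of logging as a "permanent set of `print` calls," here's a sketch that uses Python's standard `logging` module. The `place_order` function, the `shop.orders` logger name, and the in-memory `ListHandler` are all hypothetical; a real app would more likely send records to a file or a log service instead of a list.

```python
import logging

logger = logging.getLogger("shop.orders")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collects formatted messages in memory as a stand-in for a real destination."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def place_order(order_id, quantity):
    """Hypothetical function that records what happened on every call."""
    logger.info("placing order %s with quantity %s", order_id, quantity)
    if quantity <= 0:
        logger.warning("order %s rejected: invalid quantity", order_id)
        return False
    return True


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

place_order(1, 2)
place_order(2, 0)
```

After those two calls, `handler.messages` holds a chronological record of what the function did, including the rejected order, even though nobody was watching at the time.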
+ +This concept of logging is not a Django-specific idea. In fact, you can find a `logging` module in the Python standard library that serves as the basis for logging in the Python world. Django builds its logging features on top of the `logging` module. + +The short version of logging in Django is that you can use logging by configuring the `LOGGING` settings in your Django settings module. This process is described in the documentation at {{< extlink "https://docs.djangoproject.com/en/4.1/howto/logging/" "How to configure and use logging" >}}. + +In a basic workflow, logging can be little more than writing lines of data to a log *file*. If you're running an app on a VPS like Digital Ocean and have control over your filesystem, this is a good starting option. If you choose that path, keep in mind that disk space is a limited resource. One way to tank your app is to fill your machine's disk with logging data. To prevent this, you should investigate how to use logrotate. {{< extlink "https://linux.die.net/man/8/logrotate" "logrotate" >}} is a command that can archive your log data on whatever frequency you desire and clean up old data to prevent your storage from filling completely. + +You may follow a path like I've recommended frequently and deploy your app on a Platform as a Service (PaaS) like Heroku. In that kind of environment, you have less control over the tools that are available to run your application. In the case of Heroku, the platform does not guarantee that your app will stay on the same machine. Because of that constraint, you can't rely on a log file on the machine. Instead, you would need to add an additional service to the application that will store logs for your later use. {{< extlink "https://www.papertrail.com/" "Papertrail" >}} is an example of a log aggregation service that works with Heroku. + +As you can see, logging brings some complexity to manage if you choose to use it. 
Logs can be a great tool to understand what happened when a user used your app, especially if there was an error that didn't raise an exception. For me, personally, I *don't* reach for logging right away. While logging can be useful, it can be an overwhelming amount of data that can turn debugging into a "needle in a haystack" problem. Logging is still a potentially useful tool for your toolbox as you think about how to debug your apps. + +## Summary + +{{< web >}} +In this article, +{{< /web >}} +{{< book >}} +In this chapter, +{{< /book >}} +we saw debugging tips, tools, and techniques to make you a bug crushing machine in Django. + +We discussed: + +* A systematic method for finding and fixing bugs in your app +* How `print` is an awesome debugging tool that should be used without shame +* Debugging with a "proper" debugging tool for particularly thorny problems +* Using the source code of packages you didn't write to figure out what's going on +* Logging and the ability to see a history of activity in your application + +{{< web >}} +That's the end of this series! +{{< /web >}} +{{< book >}} +That's the end of this book! +{{< /book >}} +Can you believe there are still more topics +{{< web >}} +that this series could cover? +{{< /web >}} +{{< book >}} +that this book could cover? +{{< /book >}} +Django has so much available that, +{{< web >}} +even after twenty articles, +{{< /web >}} +{{< book >}} +even after all these chapters, +{{< /book >}} +I've not covered everything. But this is the end of the line for me. + +{{< web >}} +I knew when I started this series +{{< /web >}} +{{< book >}} +I knew when I started this book +{{< /book >}} +that I was going to cover a huge number of topics. What I didn't know when I started this in January of 2020 is how wacky the world would be. 
+{{< web >}} +I thought that I might produce an article every other week +{{< /web >}} +{{< book >}} +I thought that I might produce a chapter every other week +{{< /book >}} +and be done in less than a year. *How wrong I was!* More than two years later, I'm writing the words +{{< web >}} +of this last article. +{{< /web >}} +{{< book >}} +of this last chapter. +{{< /book >}} + +{{< web >}} +As I wrap up this series, +{{< /web >}} +{{< book >}} +As I wrap up this book, +{{< /book >}} +my hope is that you, dear reader, have come to understand Django better. Django is a web framework written by people like you and me. That community, through years of collaboration, polished a tool that can bring your ideas to life on the web. + +While the framework might initially seem magical in all that it is able to do, we can see through study and examination that the code is comprehensible. The magic becomes less magical, but we can grow in our respect for those who contributed untold hours to making something so useful for others (for *free* in most cases!). + +{{< web >}} +I would like to conclude this series +{{< /web >}} +{{< book >}} +I would like to conclude this book +{{< /book >}} +by thanking all of you readers out there. Along this lengthy journey, so many of you have reached out +{{< web >}} +and told me how this series has helped you grow +{{< /web >}} +{{< book >}} +and told me how this book has helped you grow +{{< /book >}} +as a Django developer. I'm hopeful that developers have used these words to learn and create websites that help their own communities. Knowing that I've impacted some of you and, by extension, the communities that you've helped, +{{< web >}} +made writing this series worth it. +{{< /web >}} +{{< book >}} +made writing this book worth it. +{{< /book >}} + +> Thank you for reading! + +{{< web >}} +If you'd like to see more content like this series, please feel free to sign up for my newsletter where I publish all new things first. 
If you have other questions, you can reach me online on Twitter where I am {{< extlink "https://twitter.com/mblayman" "@mblayman" >}}. +{{< /web >}} +