-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package-Requirements passes along public URL constraints in packaged_requirements.txt #348
base: v2.7.2
Are you sure you want to change the base?
Conversation
Added functionality to ensure "package_requirements.txt" doesn't reference a public URL cause packages to not get installed.
Add AIRFLOW_VERSION environment variable to ensure it is usable in entrypoint.sh.
Fixed spacing.
Added contextual information to the readme, about this update and the constraints.txt file.
@url54 I had the same issue here as well. thanks for this PR. wondering if it would be cleaner/possible to add the |
Modified your code slightly and can confirm this worked for me:
|
Hey @charlielu05
No worries, I have been working with local runner and users, just noticed it one day.
That's actually a really good idea. That would make it work, regardless if the user read through all the documentation or not. Before, they would have to know the constraints.txt existed in that directory, and they would have had to move it to their S3 DAGs/ directory knowingly. So yeah, much cleaner and more user friendly, good catch! Probably need to modify the README.md, from what I suggested, as well then. Since we won't have to inform them that it exists in a directory and that they will have to upload it anymore. Can probably still mention that we are doing it, through the process? But won't be nearly as verbose as I was originally. |
@charlielu05 Hey Charlie, this is my first pull request so I am pretty inexperienced with this XD, whatever you think is best works for me. Do you need me to do anything specific from my end? |
I've opened a PR to your branch, from what I understand if you approve and merge those changes then it should be reflected here :) |
Add constraints.txt to plugins.zip, updated readme
@charlielu05 Done! =) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change, and we should port this to the other versions as well to maintain consistency.
@@ -139,6 +139,8 @@ For example usage see [Installing Python dependencies using PyPi.org Requirement | |||
- There is a directory at the root of this repository called plugins. | |||
- In this directory, create a file for your new custom plugin. | |||
- Add any Python dependencies to `requirements/requirements.txt`. | |||
- Adds a local `constraints.txt` file to the `plugins/` directory, this is zipped together into the `plugins.zip` artefact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: artefact -> artifact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed spelling to artifact
@@ -139,6 +139,8 @@ For example usage see [Installing Python dependencies using PyPi.org Requirement | |||
- There is a directory at the root of this repository called plugins. | |||
- In this directory, create a file for your new custom plugin. | |||
- Add any Python dependencies to `requirements/requirements.txt`. | |||
- Adds a local `constraints.txt` file to the `plugins/` directory, this is zipped together into the `plugins.zip` artefact. | |||
- Creates a new `packaged_requirements.txt` file with the correct configuration for `--find-links` and `--constraint`. This file should be the one you rename to `requirements.txt` and upload to S3 to be used by MWAA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming isn't strictly necessary since you can configure the MWAA environment to the requirements file regardless of name. This can be left up to the individual user. The second sentence could be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed second sentence.
docker/Dockerfile
Outdated
@@ -21,6 +21,7 @@ ARG INDEX_URL="" | |||
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME} | |||
ENV PATH="$PATH:/usr/local/airflow/.local/bin:/root/.local/bin:/usr/local/airflow/.local/lib/python3.10/site-packages" | |||
ENV PYTHON_VERSION=3.11.6 | |||
ENV AIRFLOW_VERSION=2.7.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: ENV AIRFLOW_VERSION=$AIRFLOW_VERSION
to reduce any source for error and avoid having the same magic-number defined in two places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using curly brackets to keep consistency with AIRFLOW_HOME
environment variable in line 21
docker/script/entrypoint.sh
Outdated
@@ -41,10 +41,11 @@ package_requirements() { | |||
if [[ -e "$AIRFLOW_HOME/$REQUIREMENTS_FILE" ]]; then | |||
echo "Packaging requirements.txt into plugins" | |||
pip3 download -r "$AIRFLOW_HOME/$REQUIREMENTS_FILE" -d "$AIRFLOW_HOME/plugins" | |||
wget "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION%.*}.txt" -O $AIRFLOW_HOME/plugins/constraints.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you compare the local runner images' constraints file against the one published by Apache Airflow, you will notice there are a few differences for some versions. The source of truth should be the constraints.txt file which exists in this repository. Additionally; users can make changes to this constraints file to fit their use-cases. You can copy it into the plugins directory.
I am also wondering if it's better to place the constraints file in the plugins directory in the package-requirements()
function instead of being a step in the startup script. This will ensure that the constraint file will be populated if you delete the plugins directory files before executing the package command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to fetch the constraints from aws-mwaa-local-runner
instead of airflow.
Also for consistency, would it be wise to also switch out the constraint file in requirements/requirements.txt
since it is also currently pointing to the apache airflow constraint instead of this repo?
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"
On your last point, do you mean to populate the constraint in that startup script instead of this package-requirements
function?
docker/script/entrypoint.sh
Outdated
cd "$AIRFLOW_HOME/plugins" | ||
zip "$AIRFLOW_HOME/requirements/plugins.zip" * | ||
printf '%s\n%s\n' "--no-index" "$(cat $AIRFLOW_HOME/$REQUIREMENTS_FILE)" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt" | ||
printf '%s\n%s\n' "--find-links /usr/local/airflow/plugins" "$(cat $AIRFLOW_HOME/requirements/packaged_requirements.txt)" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt" | ||
printf '%s\n%s\n%s\n%s\n' "--find-links /usr/local/airflow/plugins" "--constraint /usr/local/airflow/plugins/constraints.txt" "$(cat $AIRFLOW_HOME/requirements/packaged_requirements.txt | grep -v '^--constraint .https://*')" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could loosen the grep to just ^--constraint
. There are use-cases where a customer needs to customize the constraints file, so they will use the one present in the config/constraints.txt
, instead of the one published by Apache Airflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied this change.
@url54, I've made most of the changes that @mayushko26 commented on. Could you kindly review and merge the PR to your repo? url54#2 |
changes to reflect PR comments
hi @mayushko26, would you be able to kindly review the new merged code? |
Issue #, if available:
Currently when using
package-requirements
if your using constraints in your requirements.txt, these are also added to package_requirements.txt. If you don't modify it afterwards, PIP will fail to install in private MWAA as it won't reach public URL in constraints line.Description of changes:
Provided a means of "grep" removing the used constraints, and adding in a local
constraints.txt
file, that is created usingwget
after package-requirements has completed. Ensuring that everything is completed together, in the same directory.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.