diff --git a/DAGs.md b/DAGs.md index 02b0db93e3b..a5759a2be97 100644 --- a/DAGs.md +++ b/DAGs.md @@ -14,7 +14,6 @@ The DAGs are shown in two forms: The following are DAGs grouped by their primary tag: - 1. [Commoncrawl](#commoncrawl) 1. [Data Refresh](#data_refresh) 1. [Database](#database) 1. [Maintenance](#maintenance) @@ -22,15 +21,6 @@ The following are DAGs grouped by their primary tag: 1. [Provider](#provider) 1. [Provider Reingestion](#provider-reingestion) -## Commoncrawl - -| DAG ID | Schedule Interval | -| --- | --- | -| `commoncrawl_etl_workflow` | `0 0 * * 1` | -| `sync_commoncrawl_workflow` | `0 16 15 * *` | - - - ## Data Refresh | DAG ID | Schedule Interval | diff --git a/README.md b/README.md index 3a253882f48..25722aa753e 100644 --- a/README.md +++ b/README.md @@ -10,12 +10,25 @@ This repository contains the methods used to identify over 1.4 billion Creative Commons licensed works. The challenge is that these works are dispersed throughout the web and identifying them requires a combination of techniques. -Two approaches are currently in use: +Currently, we only pull data from APIs which serve Creative Commons licensed media. +In the past, we have also used web crawl data as a source. -1. Web crawl data -2. Application Programming Interfaces (API Data) +## API Data + +[Apache Airflow](https://airflow.apache.org/) is used to manage the workflow for +various API ETL jobs which pull and process data from a number of open APIs on +the internet. -## Web Crawl Data +### API Workflows + +To view more information about all the available workflows (DAGs) within the project, +see [DAGs.md](DAGs.md). + +See each provider API script's notes in their respective [handbook][ov-handbook] entry. + +[ov-handbook]: https://make.wordpress.org/openverse/handbook/openverse-handbook/ + +## Web Crawl Data (retired) The Common Crawl Foundation provides an open repository of petabyte-scale web crawl data. 
A new dataset is published at the end of each month comprising over @@ -31,10 +44,10 @@ The data is available in three file formats: For more information about these formats, please see the [Common Crawl documentation][ccrawl_doc]. -Openverse Catalog uses AWS Data Pipeline service to automatically create an Amazon EMR -cluster of 100 c4.8xlarge instances that will parse the WAT archives to identify +Openverse Catalog used the AWS Data Pipeline service to automatically create an Amazon EMR +cluster of 100 c4.8xlarge instances that parsed the WAT archives to identify all domains that link to creativecommons.org. Due to the volume of data, Apache -Spark is used to streamline the processing. The output of this methodology is a +Spark was used to streamline the processing. The output of this methodology was a series of parquet files that contain: - the domains and its respective content path and query string (i.e. the exact @@ -45,26 +58,13 @@ series of parquet files that contain: - the location of the webpage in the WARC file so that the page contents can be found. -The steps above are performed in [`ExtractCCLinks.py`][ex_cc_links]. +The steps above were performed in [`ExtractCCLinks.py`][ex_cc_links]. + +This method was retired in 2021. [ccrawl_doc]: https://commoncrawl.org/the-data/get-started/ [ex_cc_links]: archive/ExtractCCLinks.py -## API Data - -[Apache Airflow](https://airflow.apache.org/) is used to manage the workflow for -various API ETL jobs which pull and process data from a number of open APIs on -the internet. - -### API Workflows - -To view more information about all the available workflows (DAGs) within the project, -see [DAGs.md](DAGs.md). - -See each provider API script's notes in their respective [handbook][ov-handbook] entry.
- -[ov-handbook]: https://make.wordpress.org/openverse/handbook/openverse-handbook/ - ## Development setup for Airflow and API puller scripts There are a number of scripts in the directory @@ -224,12 +224,13 @@ openverse-catalog ├── openverse_catalog/ # Primary code directory │ ├── dags/ # DAGs & DAG support code │ │ ├── common/ # - Shared modules used across DAGs -│ │ ├── commoncrawl/ # - DAGs & scripts for commoncrawl parsing +│ │ ├── data_refresh/ # - DAGs & code related to the data refresh process │ │ ├── database/ # - DAGs related to database actions (matview refresh, cleaning, etc.) │ │ ├── maintenance/ # - DAGs related to airflow/infrastructure maintenance │ │ ├── oauth2/ # - DAGs & code for Oauth2 key management │ │ ├── providers/ # - DAGs & code for provider ingestion │ │ │ ├── provider_api_scripts/ # - API access code specific to providers +│ │ │ ├── provider_csv_load_scripts/ # - Schema initialization SQL definitions for SQL-based providers │ │ │ └── *.py # - DAG definition files for providers │ │ └── retired/ # - DAGs & code that is no longer needed but might be a useful guide for the future │ └── templates/ # Templates for generating new provider code diff --git a/docker-compose.override.yml b/docker-compose.override.yml index ffd5daf9bef..e2da3531b3b 100644 --- a/docker-compose.override.yml +++ b/docker-compose.override.yml @@ -27,7 +27,7 @@ services: MINIO_ROOT_USER: ${AWS_ACCESS_KEY} MINIO_ROOT_PASSWORD: ${AWS_SECRET_KEY} # Comma separated list of buckets to create on startup - BUCKETS_TO_CREATE: ${OPENVERSE_BUCKET},openverse-airflow-logs,commonsmapper-v2,commonsmapper + BUCKETS_TO_CREATE: ${OPENVERSE_BUCKET},openverse-airflow-logs # Create empty buckets on every container startup # Note: $0 is included in the exec because "/bin/bash -c" swallows the first # argument, so it must be re-added at the beginning of the exec call diff --git a/env.template b/env.template index c8c1023feea..48e1073069a 100644 --- a/env.template +++ b/env.template @@ -100,9 
+100,6 @@ AWS_ACCESS_KEY=test_key AWS_SECRET_KEY=test_secret # General bucket used for TSV->DB ingestion and logging OPENVERSE_BUCKET=openverse-storage -# Used only for commoncrawl parsing -S3_BUCKET=not_set -COMMONCRAWL_BUCKET=not_set # Seconds to wait before poking for availability of the data refresh pool when running a data_refresh # DAG. Used to shorten the time for testing purposes. DATA_REFRESH_POKE_INTERVAL=5 diff --git a/openverse_catalog/dags/.airflowignore b/openverse_catalog/dags/.airflowignore index b81cf3733fb..a4e1542ff45 100644 --- a/openverse_catalog/dags/.airflowignore +++ b/openverse_catalog/dags/.airflowignore @@ -1,5 +1,4 @@ # Ignore all non-DAG files common/ -commoncrawl/commoncrawl_scripts providers/provider_api_scripts retired diff --git a/openverse_catalog/dags/commoncrawl/commoncrawl_etl.py b/openverse_catalog/dags/retired/commoncrawl/commoncrawl_etl.py similarity index 100% rename from openverse_catalog/dags/commoncrawl/commoncrawl_etl.py rename to openverse_catalog/dags/retired/commoncrawl/commoncrawl_etl.py diff --git a/openverse_catalog/dags/commoncrawl/commoncrawl_scripts/commoncrawl_s3_syncer/SyncImageProviders.py b/openverse_catalog/dags/retired/commoncrawl/commoncrawl_scripts/commoncrawl_s3_syncer/SyncImageProviders.py similarity index 100% rename from openverse_catalog/dags/commoncrawl/commoncrawl_scripts/commoncrawl_s3_syncer/SyncImageProviders.py rename to openverse_catalog/dags/retired/commoncrawl/commoncrawl_scripts/commoncrawl_s3_syncer/SyncImageProviders.py diff --git a/openverse_catalog/dags/commoncrawl/commoncrawl_scripts/scripts/merge_cc_tags.py b/openverse_catalog/dags/retired/commoncrawl/commoncrawl_scripts/scripts/merge_cc_tags.py similarity index 100% rename from openverse_catalog/dags/commoncrawl/commoncrawl_scripts/scripts/merge_cc_tags.py rename to openverse_catalog/dags/retired/commoncrawl/commoncrawl_scripts/scripts/merge_cc_tags.py diff --git a/openverse_catalog/dags/commoncrawl/commoncrawl_utils.py 
b/openverse_catalog/dags/retired/commoncrawl/commoncrawl_utils.py similarity index 100% rename from openverse_catalog/dags/commoncrawl/commoncrawl_utils.py rename to openverse_catalog/dags/retired/commoncrawl/commoncrawl_utils.py diff --git a/openverse_catalog/dags/commoncrawl/sync_commoncrawl_workflow.py b/openverse_catalog/dags/retired/commoncrawl/sync_commoncrawl_workflow.py similarity index 100% rename from openverse_catalog/dags/commoncrawl/sync_commoncrawl_workflow.py rename to openverse_catalog/dags/retired/commoncrawl/sync_commoncrawl_workflow.py diff --git a/tests/dags/common/etl/__init__.py b/tests/dags/common/etl/__init__.py deleted file mode 100644 index e69de29bb2d..00000000000 diff --git a/tests/dags/common/etl/test_commoncrawl_utils.py b/tests/dags/common/etl/test_commoncrawl_utils.py deleted file mode 100644 index a1ed83bbd50..00000000000 --- a/tests/dags/common/etl/test_commoncrawl_utils.py +++ /dev/null @@ -1,40 +0,0 @@ -from unittest.mock import patch - -from commoncrawl import commoncrawl_utils - - -def test_load_file_to_s3_uses_connection_id(): - local_file = "/test/file/here.txt" - remote_key = "abc/def/here.txt" - aws_conn_id = "test_conn_id" - test_bucket_name = "test-bucket" - - with patch.object(commoncrawl_utils, "S3Hook") as mock_s3: - commoncrawl_utils.load_file_to_s3( - local_file, - remote_key, - test_bucket_name, - aws_conn_id, - ) - mock_s3.assert_called_once_with(aws_conn_id=aws_conn_id) - - -def test_load_file_to_s3_loads_file(): - local_file = "/test/file/here.txt" - remote_key = "abc/def/here.txt" - aws_conn_id = "test_conn_id" - test_bucket_name = "test-bucket" - - with patch.object(commoncrawl_utils.S3Hook, "load_file") as mock_s3_load_file: - commoncrawl_utils.load_file_to_s3( - local_file, - remote_key, - test_bucket_name, - aws_conn_id, - ) - mock_s3_load_file.assert_called_once_with( - local_file, - remote_key, - replace=True, - bucket_name=test_bucket_name, - ) diff --git 
a/tests/dags/common/loader/test_resources/new_columns_crawl.tsv b/tests/dags/common/loader/test_resources/new_columns_crawl.tsv deleted file mode 100644 index 84a0b36014b..00000000000 --- a/tests/dags/common/loader/test_resources/new_columns_crawl.tsv +++ /dev/null @@ -1,2 +0,0 @@ -7789139 https://www.behance.com/thing:2823006 https://cdn.behance.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_large.jpg https://cdn.behance.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.behance.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. But if you check out comments below, [guttyr](https://www.behance.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. 
Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.behance.com/assets/49/2e/e5/2b/26/Gimball_Test_Print.stl"} [{"name": "air", "provider": "behance"}, {"name": "Demo", "provider": "behance"}, {"name": "Gimbal", "provider": "behance"}, {"name": "Spinner", "provider": "behance"}, {"name": "Test", "provider": "behance"}, {"name": "Useless", "provider": "behance"}] f behance behance commoncrawl -8033502 https://www.behance.com/thing:2823006 https://cdn.behance.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_large.jpg https://cdn.behance.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.behance.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. But if you check out comments below, [guttyr](https://www.behance.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. 
Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.behance.com/assets/c7/e9/4a/9f/6b/Air_Spinner_2_-_Solid.stl"} [{"name": "air", "provider": "behance"}, {"name": "Demo", "provider": "behance"}, {"name": "Gimbal", "provider": "behance"}, {"name": "Spinner", "provider": "behance"}, {"name": "Test", "provider": "behance"}, {"name": "Useless", "provider": "behance"}] f behance behance commoncrawl diff --git a/tests/dags/common/loader/test_resources/new_columns_papis.tsv b/tests/dags/common/loader/test_resources/new_columns_papis.tsv deleted file mode 100644 index 7050804677a..00000000000 --- a/tests/dags/common/loader/test_resources/new_columns_papis.tsv +++ /dev/null @@ -1,2 +0,0 @@ -7789139 https://www.thingiverse.com/thing:2823006 https://cdn.thingiverse.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_large.jpg https://cdn.thingiverse.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.thingiverse.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. But if you check out comments below, [guttyr](https://www.thingiverse.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. 
Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.thingiverse.com/assets/49/2e/e5/2b/26/Gimball_Test_Print.stl"} [{"name": "air", "provider": "thingiverse"}, {"name": "Demo", "provider": "thingiverse"}, {"name": "Gimbal", "provider": "thingiverse"}, {"name": "Spinner", "provider": "thingiverse"}, {"name": "Test", "provider": "thingiverse"}, {"name": "Useless", "provider": "thingiverse"}] f thingiverse thingiverse provider_api -8033502 https://www.thingiverse.com/thing:2823006 https://cdn.thingiverse.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_large.jpg https://cdn.thingiverse.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.thingiverse.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. But if you check out comments below, [guttyr](https://www.thingiverse.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. 
Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.thingiverse.com/assets/c7/e9/4a/9f/6b/Air_Spinner_2_-_Solid.stl"} [{"name": "air", "provider": "thingiverse"}, {"name": "Demo", "provider": "thingiverse"}, {"name": "Gimbal", "provider": "thingiverse"}, {"name": "Spinner", "provider": "thingiverse"}, {"name": "Test", "provider": "thingiverse"}, {"name": "Useless", "provider": "thingiverse"}] f thingiverse thingiverse provider_api diff --git a/tests/dags/common/loader/test_resources/old_columns_crawl.tsv b/tests/dags/common/loader/test_resources/old_columns_crawl.tsv deleted file mode 100644 index c3940f3b82a..00000000000 --- a/tests/dags/common/loader/test_resources/old_columns_crawl.tsv +++ /dev/null @@ -1,2 +0,0 @@ -7789139 https://www.behance.com/thing:2823006 https://cdn.behance.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_large.jpg https://cdn.behance.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.behance.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. 
But if you check out comments below, [guttyr](https://www.behance.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.behance.com/assets/49/2e/e5/2b/26/Gimball_Test_Print.stl"} [{"name": "air", "provider": "behance"}, {"name": "Demo", "provider": "behance"}, {"name": "Gimbal", "provider": "behance"}, {"name": "Spinner", "provider": "behance"}, {"name": "Test", "provider": "behance"}, {"name": "Useless", "provider": "behance"}] f behance commoncrawl -8033502 https://www.behance.com/thing:2823006 https://cdn.behance.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_large.jpg https://cdn.behance.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.behance.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. 
But if you check out comments below, [guttyr](https://www.behance.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.behance.com/assets/c7/e9/4a/9f/6b/Air_Spinner_2_-_Solid.stl"} [{"name": "air", "provider": "behance"}, {"name": "Demo", "provider": "behance"}, {"name": "Gimbal", "provider": "behance"}, {"name": "Spinner", "provider": "behance"}, {"name": "Test", "provider": "behance"}, {"name": "Useless", "provider": "behance"}] f behance commoncrawl diff --git a/tests/dags/common/loader/test_resources/old_columns_papis.tsv b/tests/dags/common/loader/test_resources/old_columns_papis.tsv deleted file mode 100644 index 5bcb9d52a9a..00000000000 --- a/tests/dags/common/loader/test_resources/old_columns_papis.tsv +++ /dev/null @@ -1,2 +0,0 @@ -7789139 https://www.thingiverse.com/thing:2823006 https://cdn.thingiverse.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_large.jpg https://cdn.thingiverse.com/renders/cd/2f/f4/ba/85/1892c3b6344dc168a18897359a0c97a9_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.thingiverse.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two 
additional versions* I had no idea what this was for when I uploaded it. I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. But if you check out comments below, [guttyr](https://www.thingiverse.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.thingiverse.com/assets/49/2e/e5/2b/26/Gimball_Test_Print.stl"} [{"name": "air", "provider": "thingiverse"}, {"name": "Demo", "provider": "thingiverse"}, {"name": "Gimbal", "provider": "thingiverse"}, {"name": "Spinner", "provider": "thingiverse"}, {"name": "Test", "provider": "thingiverse"}, {"name": "Useless", "provider": "thingiverse"}] f thingiverse thingiverse -8033502 https://www.thingiverse.com/thing:2823006 https://cdn.thingiverse.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_large.jpg https://cdn.thingiverse.com/renders/ed/47/66/f7/43/3dee6caf0e321f796b4d34b78945a4ee_display_medium.jpg \N \N \N cc0 1.0 Walter Hsiao https://www.thingiverse.com/walter Air Spinner {"description": "*Update 2018-04-21: Added two additional versions* I had no idea what this was for when I uploaded it. 
I didn't think it would be useful for anything other than as a 3D print demo, I just made it to satisfy my curiosity. But if you check out comments below, [guttyr](https://www.thingiverse.com/guttyr/about) discovered you can spin it up by blowing on it, and it can be used by speech therapists to help special needs kids get better at breath support and control. Different parts of the spinner will spin depending where you blow on it, I'm still figuring which locations and angles work best. Maybe it'll help me too as I'm frequently out of breath when I'm at altitude or underwater. It can be printed with a 0.5mm extrusion width and 2 perimeters without infill. Printed in [Atomic Filament Granite Grey PLA](https://atomicfilament.com/products/gray-granite-pla-filament) and [Octave Precision Gold PLA*](http://amzn.to/2p7JPU0) New versions printed in [Atomic Filament Starry Night PLA](https://atomicfilament.com/collections/opaque-pla-filaments-1/products/translucent-starry-night-pla) and [3D Solutech Transparent Blue PETG*](https://amzn.to/2K3migu) **affiliate links*", "3d_model": "https://cdn.thingiverse.com/assets/c7/e9/4a/9f/6b/Air_Spinner_2_-_Solid.stl"} [{"name": "air", "provider": "thingiverse"}, {"name": "Demo", "provider": "thingiverse"}, {"name": "Gimbal", "provider": "thingiverse"}, {"name": "Spinner", "provider": "thingiverse"}, {"name": "Test", "provider": "thingiverse"}, {"name": "Useless", "provider": "thingiverse"}] f thingiverse thingiverse diff --git a/tests/dags/test_dag_parsing.py b/tests/dags/test_dag_parsing.py index 3087291a35c..37dea4e639b 100644 --- a/tests/dags/test_dag_parsing.py +++ b/tests/dags/test_dag_parsing.py @@ -18,8 +18,6 @@ "providers/provider_workflow_dag_factory.py", "maintenance/airflow_log_cleanup_workflow.py", "maintenance/pr_review_reminders/pr_review_reminders_dag.py", - "commoncrawl/sync_commoncrawl_workflow.py", - "commoncrawl/commoncrawl_etl.py", "database/recreate_popularity_calculation_dag_factory.py", 
"data_refresh/dag_factory.py", "oauth2/authorize_dag.py",