diff --git a/content/en/user-guide/aws/athena/athena-resource-browser.png b/content/en/user-guide/aws/athena/athena-resource-browser.png
new file mode 100644
index 0000000000..de18024def
Binary files /dev/null and b/content/en/user-guide/aws/athena/athena-resource-browser.png differ
diff --git a/content/en/user-guide/aws/athena/index.md b/content/en/user-guide/aws/athena/index.md
index 5493ec3003..d36d3b4c8e 100644
--- a/content/en/user-guide/aws/athena/index.md
+++ b/content/en/user-guide/aws/athena/index.md
@@ -1,69 +1,108 @@
 ---
 title: "Athena"
 linkTitle: "Athena"
-categories: ["LocalStack Pro"]
-description: >
-  Get started with AWS Athena on LocalStack
-aliases:
-  - /aws/athena/
+description: Get started with Athena on LocalStack
 ---
 
-LocalStack Pro ships with built-in support for [Athena](https://aws.amazon.com/athena), Amazon's serverless data warehouse and analytics platform.
-Athena uses [Presto](https://prestodb.github.io)/[Trino](https://trino.io) under the covers, and your Athena instance will be automatically configured with a Hive metastore that connects seamlessly to the LocalStack S3 API.
-That is, you can easily connect your local S3 buckets and query data directly from S3 via the powerful Athena query API.
+Athena is an interactive query service provided by Amazon Web Services (AWS) that enables you to analyze data stored in S3 using standard SQL queries.
+Athena lets you run ad-hoc queries to analyze, filter, aggregate, and join datasets stored in S3.
+It supports various file formats, such as JSON, Parquet, and CSV, making it compatible with a wide range of data sources.
 
-## Basic Query Execution
+LocalStack supports Athena via the Pro/Team offering, with the Athena APIs backed by a Hive metastore that connects to the local S3 API, so you can query your data directly in your local environment.
+The supported APIs are available on our [API coverage page](https://docs.localstack.cloud/references/coverage/coverage_athena/), which provides information on the extent of Athena's integration with LocalStack.
 
-The following commands illustrate how to use Athena from the command line (assuming you have [`awslocal`](https://github.com/localstack/awscli-local) installed):
+## Getting started
+
+This guide is designed for users new to Athena and assumes basic knowledge of the AWS CLI and our [`awslocal`](https://github.com/localstack/awscli-local) wrapper script.
+
+Start your LocalStack container using your preferred method.
+We will demonstrate how to create an Athena table, run a query against it, and read the results with the AWS CLI.
+
+{{< alert title="Note" >}}
+To use the Athena API, LocalStack downloads additional dependencies.
+This involves pulling a Docker image of around 1.5GB that contains Presto, Hive, and other tools.
+These components are retrieved automatically when you start the service.
+For a smooth initial setup, make sure you have a stable internet connection when these components are fetched for the first time.
+{{< /alert >}}
+
+### Create an S3 bucket
+
+You can create an S3 bucket using the [`mb`](https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html) command.
+Run the following command to create a bucket named `athena-bucket`:
+Run the following command to create a bucket named `athena-bucket`: {{< command >}} -$ awslocal athena start-query-execution --query-string 'SELECT 1, 2, 3' -{ - "QueryExecutionId": "c9f453ad" -} -$ awslocal athena list-query-executions -{ - "QueryExecutionIds": [ - "c9f453ad" - ] -} -$ awslocal athena get-query-results --query-execution-id c9f453ad +$ awslocal s3 mb s3://athena-bucket +{{< / command >}} + +You can create some sample data using the following commands: + +{{< command >}} +$ echo "Name,Service" > data.csv +$ echo "LocalStack,Athena" >> data.csv +{{< / command >}} + +You can upload the data to your bucket using the [`cp`](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) command: + +{{< command >}} +$ awslocal s3 cp data.csv s3://athena-bucket/data/ +{{< / command >}} + +### Create an Athena table + +You can create an Athena table using the [`CreateTable`](https://docs.aws.amazon.com/athena/latest/APIReference/API_CreateTable.html) API +Run the following command to create a table named `athena_table`: + +{{< command >}} +$ awslocal athena start-query-execution \ + --query-string "create external table tbl01 (name STRING, surname STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://athena-bucket/data/';" --result-configuration "OutputLocation=s3://athena-bucket/output/" +{{< / command >}} + +The following output would be retrieved: + +```bash { - "ResultSet": { - "Rows": [ - { - "Data": [ - { "VarCharValue": "_col0" }, - { "VarCharValue": "_col1" }, - { "VarCharValue": "_col2" } - ] - }, { - "Data": [ - { "VarCharValue": "1" }, - { "VarCharValue": "2" }, - { "VarCharValue": "3" } - ] - } - ], - "ResultSetMetadata": { - "ColumnInfo": [ - { - "Name": "_col0", "Type": "integer" - }, { - "Name": "_col1", "Type": "integer" - }, { - "Name": "_col2", "Type": "integer" - } - ] - } - }, - "UpdateCount": 0 + "QueryExecutionId": "593acab7" } +``` + +You can retrieve information about the query execution using the 
[`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API. +Run the following command: + +{{< command >}} +$ awslocal athena get-query-execution --query-execution-id 593acab7 {{< / command >}} -{{< alert title="Note" >}} -In order to use the Athena API, some additional dependencies have to be fetched from the network (including Presto, Hive, and other tools). These dependencies are automatically fetched when you start up the service, so please make sure you're on a decent internet connection when using Athena for the first time. -{{< /alert >}} +Replace `593acab7` with the `QueryExecutionId` returned by the [`StartQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) API. + +### Get output of the query + +You can get the output of the query using the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API. +Run the following command: + +{{< command >}} +$ awslocal athena get-query-results --query-execution-id 593acab7 +{{< / command >}} + +You can now read the data from the `tbl01` table and retrieve the data from S3 that was mentioned in your table creation statement. +Run the following command: + +{{< command >}} +$ awslocal athena start-query-execution \ + --query-string "select * from tbl01;" --result-configuration "OutputLocation=s3://athena-bucket/output/" +{{< / command >}} + +You can retrieve the execution details similarly using the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API using the `QueryExecutionId` returned by the previous step. + +You can copy the `ResultConfiguration` from the output and use it to retrieve the results of the query. +Run the following command: + +{{< command >}} +$ awslocal cp s3://athena-bucket/output/593acab7.csv . 
+$ cat 593acab7.csv
+{{< / command >}}
+
+Replace `593acab7.csv` with the file name from the `ResultConfiguration.OutputLocation` value returned in the previous step.
+You can also use the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API to retrieve the results of the query.
 
 ## Delta Lake Tables
 
@@ -95,7 +134,8 @@ $ awslocal athena start-query-execution \
         LOCATION 's3://test/' TBLPROPERTIES ('table_type'='DELTA')"
 {{< / command >}}
 
-Please note that this query may take some time to finish executing. You can observe the output in the LocalStack container (ideally with `DEBUG=1` enabled) to follow the steps of the query execution.
+Please note that this query may take some time to finish executing.
+You can observe the output in the LocalStack container (ideally with `DEBUG=1` enabled) to follow the steps of the query execution.
 
 Finally, we can now run a `SELECT` query to extract data from the Delta Lake table we've just created:
 {{< command >}}
@@ -138,3 +178,24 @@ The `SELECT` statement above currently requires us to prefix the database/table
 ## Iceberg Tables
 
 The LocalStack Athena implementation also supports [Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html) - more details and samples will be provided here soon.
+
+## Resource Browser
+
+The LocalStack Web Application provides a Resource Browser for writing and executing Athena SQL queries and visualizing their results.
+You can access the Resource Browser by opening the LocalStack Web Application in your browser, navigating to the **Resources** section, and then clicking on **Athena** under the **Analytics** section.
+
+Athena Resource Browser
+
+The Resource Browser allows you to perform the following actions:
+
+- **View Databases**: View the databases available in your Athena instance by clicking on the **Databases** tab.
+- **View Catalogs**: View the catalogs available in your Athena instance by clicking on the **Catalogs** tab.
+- **Edit Catalogs**: Edit the catalogs available in your Athena instance by clicking on the **Catalog name**, editing the catalog, and then clicking on the **Submit** button.
+- **Create Catalogs**: Create a new catalog by clicking on the **Create Catalog** button, entering the catalog details, and then clicking on the **Submit** button.
+- **Run SQL Queries**: Run SQL queries by clicking on the **SQL** button, entering the query, and then clicking on the **Execute** button.
+
+## Examples
+
+The following code snippets and sample applications provide practical examples of how to use Athena in LocalStack for various use cases:
+
+- [Query data in S3 Bucket with Amazon Athena, Glue Catalog & CloudFormation](https://github.com/localstack/query-data-s3-athena-glue-sample)
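The Getting started walkthrough drives Athena by hand. It starts a query with `StartQueryExecution`, checks `GetQueryExecution` until the query finishes, and then reads `GetQueryResults`. The Python code below is a minimal sketch that scripts the same loop. It is not LocalStack or boto3 code. It assumes a boto3-style Athena client, meaning any object that exposes `start_query_execution`, `get_query_execution`, and `get_query_results` with boto3's keyword arguments. The helper names `run_query` and `result_rows`, along with the polling interval and timeout, are hypothetical choices made for illustration. The sketch treats the first returned row as the column header, which matches the header row Athena includes in `SELECT` results.

```python
import time
from typing import Any, Dict, List, Optional


def run_query(client: Any, sql: str, output_location: str,
              poll_interval: float = 1.0, timeout: float = 300.0) -> str:
    """Start an Athena query and block until it succeeds (hypothetical helper).

    `client` is assumed to behave like a boto3 Athena client. Returns the
    QueryExecutionId; raises if the query fails, is cancelled, or times out.
    """
    started = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout
    while True:
        status = client.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]  # QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still {state} after {timeout}s")
        time.sleep(poll_interval)


def result_rows(client: Any, query_id: str) -> List[Dict[str, Optional[str]]]:
    """Collect every GetQueryResults page and return one dict per data row.

    The first row of the first page is used as the column header.
    Cells without a VarCharValue (NULLs) are mapped to None.
    """
    rows: List[List[Optional[str]]] = []
    request: Dict[str, Any] = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**request)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        next_token = page.get("NextToken")
        if not next_token:
            break
        request["NextToken"] = next_token  # follow pagination to the next page
    if not rows:
        return []
    header = [name or "" for name in rows[0]]
    return [dict(zip(header, values)) for values in rows[1:]]
```

To run this against LocalStack, you could create the client with `boto3.client("athena", endpoint_url="http://localhost:4566", region_name="us-east-1", aws_access_key_id="test", aws_secret_access_key="test")` and then call `result_rows(client, run_query(client, "select * from tbl01", "s3://athena-bucket/output/"))`. That client setup assumes a default LocalStack installation with dummy credentials. Because the client is passed in rather than imported, the helpers can also be exercised with a stub in unit tests.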