enhance docs for athena (#789)
HarshCasper authored Sep 5, 2023
1 parent 97f3d8b commit 74def0a
169 changes: 115 additions & 54 deletions content/en/user-guide/aws/athena/index.md
@@ -1,69 +1,108 @@
---
title: "Athena"
linkTitle: "Athena"
categories: ["LocalStack Pro"]
description: >
Get started with AWS Athena on LocalStack
aliases:
- /aws/athena/
description: Get started with Athena on LocalStack
---

LocalStack Pro ships with built-in support for [Athena](https://aws.amazon.com/athena), Amazon's serverless data warehouse and analytics platform.
Athena uses [Presto](https://prestodb.github.io)/[Trino](https://trino.io) under the covers, and your Athena instance will be automatically configured with a Hive metastore that connects seamlessly to the LocalStack S3 API.
That is, you can easily connect your local S3 buckets and query data directly from S3 via the powerful Athena query API.
Athena is an interactive query service provided by Amazon Web Services (AWS) that enables you to analyze data stored in S3 using standard SQL queries.
Athena lets you run ad-hoc queries to analyze, filter, aggregate, and join datasets stored in S3.
It supports various file formats, such as JSON, Parquet, and CSV, making it compatible with a wide range of data sources.

## Basic Query Execution
LocalStack supports Athena via the Pro/Team offering, allowing you to configure the Athena APIs with a Hive metastore that can connect to the S3 API and query your data directly in your local environment.
The supported APIs are available on our [API coverage page](https://docs.localstack.cloud/references/coverage/coverage_athena/), which provides information on the extent of Athena's integration with LocalStack.

The following commands illustrate how to use Athena from the command line (assuming you have [`awslocal`](https://github.com/localstack/awscli-local) installed):
## Getting started

This guide is designed for users new to Athena and assumes basic knowledge of the AWS CLI and our [`awslocal`](https://github.com/localstack/awscli-local) wrapper script.

Start your LocalStack container using your preferred method.
We will demonstrate how to create an Athena table, run a query against it, and read the results with the AWS CLI.

{{< alert title="Note" >}}
To utilize the Athena API, LocalStack will download additional dependencies.
This involves getting a Docker image of around 1.5GB, containing Presto, Hive, and other tools.
These components are retrieved automatically when you initiate the service.
For a smooth initial setup, make sure you have a stable internet connection when fetching these components for the first time.
{{< /alert >}}

### Create an S3 bucket

You can create an S3 bucket using the [`mb`](https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html) command.
Run the following command to create a bucket named `athena-bucket`:

{{< command >}}
$ awslocal athena start-query-execution --query-string 'SELECT 1, 2, 3'
{
"QueryExecutionId": "c9f453ad"
}
$ awslocal athena list-query-executions
{
"QueryExecutionIds": [
"c9f453ad"
]
}
$ awslocal athena get-query-results --query-execution-id c9f453ad
$ awslocal s3 mb s3://athena-bucket
{{< / command >}}

You can create some sample data using the following commands:

{{< command >}}
$ echo "Name,Service" > data.csv
$ echo "LocalStack,Athena" >> data.csv
{{< / command >}}

You can upload the data to your bucket using the [`cp`](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) command:

{{< command >}}
$ awslocal s3 cp data.csv s3://athena-bucket/data/
{{< / command >}}

### Create an Athena table

You can create an Athena table by running a `CREATE EXTERNAL TABLE` statement through the [`StartQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) API.
Run the following command to create a table named `tbl01`:

{{< command >}}
$ awslocal athena start-query-execution \
--query-string "create external table tbl01 (name STRING, surname STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://athena-bucket/data/';" --result-configuration "OutputLocation=s3://athena-bucket/output/"
{{< / command >}}

The command returns output similar to the following:

```bash
{
"ResultSet": {
"Rows": [
{
"Data": [
{ "VarCharValue": "_col0" },
{ "VarCharValue": "_col1" },
{ "VarCharValue": "_col2" }
]
}, {
"Data": [
{ "VarCharValue": "1" },
{ "VarCharValue": "2" },
{ "VarCharValue": "3" }
]
}
],
"ResultSetMetadata": {
"ColumnInfo": [
{
"Name": "_col0", "Type": "integer"
}, {
"Name": "_col1", "Type": "integer"
}, {
"Name": "_col2", "Type": "integer"
}
]
}
},
"UpdateCount": 0
"QueryExecutionId": "593acab7"
}
```
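Because the later steps need the `QueryExecutionId`, it can be convenient to capture it in a shell variable rather than copying it by hand. The following is a minimal sketch, assuming a POSIX-compatible shell and that `awslocal` is on your `PATH`. The `start_query` helper name is hypothetical and not part of LocalStack or the AWS CLI; `--query` and `--output text` are the standard AWS CLI output options.

```bash
# Hypothetical helper (not part of LocalStack or the AWS CLI):
# start a query and print only its QueryExecutionId.
# Arguments: $1 = SQL statement, $2 = S3 output location.
start_query() {
  awslocal athena start-query-execution \
    --query-string "$1" \
    --result-configuration "OutputLocation=$2" \
    --query 'QueryExecutionId' \
    --output text
}
```

For example, `query_id=$(start_query "select * from tbl01;" "s3://athena-bucket/output/")` would store the ID in `query_id` for use in the following commands.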

You can retrieve information about the query execution using the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API.
Run the following command:

{{< command >}}
$ awslocal athena get-query-execution --query-execution-id 593acab7
{{< / command >}}

{{< alert title="Note" >}}
In order to use the Athena API, some additional dependencies have to be fetched from the network (including Presto, Hive, and other tools). These dependencies are automatically fetched when you start up the service, so please make sure you're on a decent internet connection when using Athena for the first time.
{{< /alert >}}
Replace `593acab7` with the `QueryExecutionId` returned by the [`StartQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) API.
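Queries run asynchronously, so a script usually has to wait until the execution reaches a terminal state before reading results. The sketch below polls `get-query-execution` for the `QueryExecution.Status.State` field. It is only an illustration under stated assumptions: the `wait_for_query` function name, the `POLL_INTERVAL` variable, and the default of 30 attempts are hypothetical choices, not LocalStack settings.

```bash
# Hypothetical helper: poll until the query succeeds, fails, or we give up.
# Arguments: $1 = QueryExecutionId, $2 = maximum number of polls (default 30).
# POLL_INTERVAL (assumed name) sets the seconds to sleep between polls.
wait_for_query() {
  local query_id max_attempts attempt state
  query_id="$1"
  max_attempts="${2:-30}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Extract only the state string (QUEUED, RUNNING, SUCCEEDED, ...).
    state=$(awslocal athena get-query-execution \
      --query-execution-id "$query_id" \
      --query 'QueryExecution.Status.State' \
      --output text)
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED | CANCELLED)
        echo "query $query_id ended in state $state" >&2
        return 1
        ;;
    esac
    attempt=$((attempt + 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  echo "timed out waiting for query $query_id" >&2
  return 1
}
```

Calling `wait_for_query 593acab7` returns a zero exit status only once the query has succeeded, so it can guard the result-fetching commands in a script.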

### Get output of the query

You can get the output of the query using the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API.
Run the following command:

{{< command >}}
$ awslocal athena get-query-results --query-execution-id 593acab7
{{< / command >}}

You can now read the data from the `tbl01` table, which is backed by the S3 location specified in your table creation statement.
Run the following command:

{{< command >}}
$ awslocal athena start-query-execution \
--query-string "select * from tbl01;" --result-configuration "OutputLocation=s3://athena-bucket/output/"
{{< / command >}}

You can retrieve the execution details in the same way with the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API, passing the `QueryExecutionId` returned by the previous step.

You can copy the `OutputLocation` from the `ResultConfiguration` in the output and use it to retrieve the results of the query.
Run the following command:

{{< command >}}
$ awslocal s3 cp s3://athena-bucket/output/593acab7.csv .
$ cat 593acab7.csv
{{< / command >}}

Replace `593acab7.csv` with the path to the file that was present in the `ResultConfiguration` of the previous step.
You can also use the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API to retrieve the results of the query.
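The two manual steps above (reading the result path, then copying the file) can be combined in a script. The following is a minimal sketch, assuming the `OutputLocation` reported by `get-query-execution` is the full S3 path of the result file, as in the example above. The `print_query_results` name is hypothetical.

```bash
# Hypothetical helper: print the result file of a finished query to stdout.
# Argument: $1 = QueryExecutionId.
print_query_results() {
  local output_location
  # Look up where the result file for this execution was written.
  output_location=$(awslocal athena get-query-execution \
    --query-execution-id "$1" \
    --query 'QueryExecution.ResultConfiguration.OutputLocation' \
    --output text) || return 1
  # "-" as the destination makes "s3 cp" stream the object to stdout.
  awslocal s3 cp "$output_location" -
}
```

Running `print_query_results 593acab7` would then print the CSV without leaving a local copy behind.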

## Delta Lake Tables

@@ -95,7 +134,8 @@ $ awslocal athena start-query-execution \
LOCATION 's3://test/' TBLPROPERTIES ('table_type'='DELTA')"
{{< / command >}}

Please note that this query may take some time to finish executing. You can observe the output in the LocalStack container (ideally with `DEBUG=1` enabled) to follow the steps of the query execution.
Please note that this query may take some time to finish executing.
You can observe the output in the LocalStack container (ideally with `DEBUG=1` enabled) to follow the steps of the query execution.

Finally, we can now run a `SELECT` query to extract data from the Delta Lake table we've just created:
{{< command >}}
@@ -138,3 +178,24 @@ The `SELECT` statement above currently requires us to prefix the database/table
## Iceberg Tables

The LocalStack Athena implementation also supports [Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html) - more details and samples will be provided here soon.

## Resource Browser

The LocalStack Web Application provides a Resource Browser for Athena query execution, writing SQL queries, and visualizing query results.
You can access the Resource Browser by opening the LocalStack Web Application in your browser, navigating to the **Resources** section, and then clicking on **Athena** under the **Analytics** section.

<img src="athena-resource-browser.png" alt="Athena Resource Browser" title="Athena Resource Browser" width="900" />

The Resource Browser allows you to perform the following actions:

- **View Databases**: View the databases available in your Athena instance by clicking on the **Databases** tab.
- **View Catalogs**: View the catalogs available in your Athena instance by clicking on the **Catalogs** tab.
- **Edit Catalogs**: Edit the catalogs available in your Athena instance by clicking on the **Catalog name**, editing the catalog, and then clicking on the **Submit** button.
- **Create Catalogs**: Create a new catalog by clicking on the **Create Catalog** button, entering the catalog details, and then clicking on the **Submit** button.
- **Run SQL Queries**: Run SQL queries by clicking on the **SQL** button, entering the query, and then clicking on the **Execute** button.

## Examples

The following code snippets and sample applications provide practical examples of how to use Athena in LocalStack for various use cases:

- [Query data in S3 Bucket with Amazon Athena, Glue Catalog & CloudFormation](https://github.com/localstack/query-data-s3-athena-glue-sample)
