Dremio and all dependencies to set up self-analytics
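This guide provides step-by-step instructions for setting up a local data lakehouse environment with Docker and Docker Compose: MinIO for S3-compatible object storage, Nessie as the catalog, and Dremio as the query engine. Dremio can reach the MinIO bucket either through a Nessie source or through an Amazon S3 source; both options are covered.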
- Prerequisites
- Installation Steps
- Additional Resources
- Docker installed on your machine.
- Docker Compose installed.
- Docker: Get Docker
- Docker Compose: Install Docker Compose
Clone the repository containing the docker-compose.yml file.
git clone <repository-url>
cd <repository-directory>
Navigate to the directory containing the docker-compose.yml file and start the services using Docker Compose.
docker-compose up -d
This command will start MinIO, Nessie, and Dremio services in detached mode.
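One way to confirm that the containers came up is to compare the running Compose services against the ones you expect. The helper below is a minimal sketch and not part of the repository: `missing_services` is a hypothetical function name, and the service names `minio`, `nessie`, and `dremio` are assumptions about what docker-compose.yml defines, so adjust them to match your file.

```shell
# Hypothetical helper: read a newline-separated list of running services on
# stdin and report every expected service (given as arguments) that is absent.
# Returns 0 when nothing is missing, 1 otherwise.
missing_services() {
  running=$(cat)
  status=0
  for svc in "$@"; do
    # -x: match the whole line, -F: treat the name as a fixed string
    if ! printf '%s\n' "$running" | grep -qxF "$svc"; then
      echo "missing: $svc"
      status=1
    fi
  done
  return "$status"
}

# Example usage (service names are assumptions, see above):
#   docker-compose ps --services --filter "status=running" \
#     | missing_services minio nessie dremio
```

If a service is reported as missing, `docker-compose logs <service>` usually shows why it failed to start.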
Access the MinIO console by navigating to http://localhost:9000 in your web browser. Use the default credentials to log in:
- Username: admin
- Password: password
Create a new bucket named datalake.
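If you prefer the command line to the console, the bucket can also be created with the MinIO Client (`mc`). This is a sketch under a few assumptions: `mc` is installed on the host, the MinIO S3 API is reachable at http://localhost:9000, and the alias name `local` is arbitrary. `create_bucket` is a hypothetical wrapper, and the `MC` variable exists only so the binary can be swapped out (for example with `echo` for a dry run).

```shell
# Binary to invoke; override with MC=echo to print the commands instead.
MC=${MC:-mc}

# Hypothetical wrapper: register an alias for the MinIO endpoint, then create
# the bucket. --ignore-existing makes the call safe to repeat.
# Usage: create_bucket ALIAS ENDPOINT ACCESS_KEY SECRET_KEY BUCKET
create_bucket() {
  alias_name=$1
  endpoint=$2
  access_key=$3
  secret_key=$4
  bucket=$5
  "$MC" alias set "$alias_name" "$endpoint" "$access_key" "$secret_key" &&
    "$MC" mb --ignore-existing "$alias_name/$bucket"
}

# Example usage with the credentials from this guide:
#   create_bucket local http://localhost:9000 admin password datalake
```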
Access the Dremio UI by navigating to http://localhost:9047 in your web browser. Follow the setup wizard to complete the initial configuration.
Ensure all services are running correctly by checking their respective UIs:
- MinIO: http://localhost:9000
- Dremio: http://localhost:9047
You should be able to interact with each service without issues.
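The same check can be scripted. The sketch below polls a URL with `curl` until it answers or a retry budget runs out; `wait_for_url` is a hypothetical helper, and the default of 30 attempts with a 2-second pause is an arbitrary choice. Any HTTP response counts as "up", so this only shows that the port is answering, not that the service is fully initialized.

```shell
# Hypothetical helper: poll a URL until it responds or attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body,
    # --max-time: give up on a single probe after 5 seconds
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    # Pause only if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not reachable: $url" >&2
  return 1
}

# Example usage with the URLs from this guide:
#   wait_for_url http://localhost:9000
#   wait_for_url http://localhost:9047
```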
- To configure Nessie as a source in Dremio using MinIO, follow 7.1. Configure Nessie Source in Dremio using MinIO.
- To configure Amazon S3 as a source in Dremio, follow 7.2. Configure S3 Source in Dremio using MinIO.
7.1. Configure Nessie Source in Dremio using MinIO
- Access Dremio UI: Navigate to http://localhost:9047 and log in if you haven't already.
- Add a New Source:
  - Click on the + icon next to Sources in the left-hand menu.
  - Select Nessie from the list of available sources.
- Configure the Nessie Source:
  - Name: Enter a name for the Nessie source, e.g., NessieSource.
  - Nessie Server URL: Enter http://nessie:19120/api/v2.
  - Authentication Type: Select None (or configure as needed).
  - Go to Storage inside the Nessie configuration:
    - AWS root path: Enter datalake.
    - AWS Access Key: Enter admin.
    - AWS Secret Key: Enter password.
  - Under Other, set the following Connection Properties:
    - fs.s3a.path.style.access: Enter true
    - fs.s3a.endpoint: Enter minio:9000
    - dremio.s3.compat: Enter true
- Save the Configuration: Click Save to add the Nessie source.
- Verify the Source:
  - Navigate to the Sources section in Dremio.
  - Click on the newly created NessieSource to ensure it connects and displays the contents of the datalake bucket.
This completes the configuration of Nessie as a source in Dremio using MinIO.
7.2. Configure S3 Source in Dremio using MinIO
- Access Dremio UI: Navigate to http://localhost:9047 and log in if you haven't already.
- Add a New Source:
  - Click on the + icon next to Sources in the left-hand menu.
  - Select Amazon S3 from the list of available sources.
- Configure the S3 Source:
  - Name: Enter a name for the S3 source, e.g., S3Source.
  - Authentication Type: Select AWS Access Key.
  - AWS Access Key: Enter admin.
  - AWS Secret Key: Enter password.
  - Disable the option Encrypt connection.
  - Go to the Advanced Options tab and set the following Connection Properties:
    - fs.s3a.path.style.access: Enter true
    - fs.s3a.endpoint: Enter minio:9000
    - dremio.s3.compat: Enter true
  - In Cache Options, disable the option Enable local caching when possible.
This completes the configuration of Amazon S3 as a source in Dremio using MinIO.
To verify that writing to the source works correctly, follow these steps, replacing <source_name> with NessieSource or S3Source as appropriate:
- Access Dremio SQL Editor:
  - Navigate to http://localhost:9047 and log in if you haven't already.
  - Click on the SQL Editor tab at the top of the page.
- Create a New Table:
  - In the SQL Editor, enter the following SQL command to create a new table in the source:
    CREATE TABLE <source_name>.datalake.people (
      id INT,
      first_name VARCHAR,
      last_name VARCHAR,
      age INT
    ) PARTITION BY (truncate(1, last_name));
- Execute the Command:
  - Click the Run button to execute the SQL command.
- Verify the Table Creation:
  - Navigate to the Sources section in Dremio.
  - Click on <source_name> and then datalake to ensure the people table has been created successfully.
This step confirms that you can write to the source configured in Dremio using MinIO.
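To avoid hand-editing the <source_name> placeholder, the statement can be generated for either source. The sketch below is only a text template: `people_table_sql` is a hypothetical helper, and the trailing SELECT is an addition not found in the steps above, included on the assumption that querying the new table is a reasonable second check.

```shell
# Hypothetical helper: print the CREATE TABLE statement from the steps above
# for the given source name, followed by a SELECT against the same table.
# Usage: people_table_sql SOURCE_NAME
people_table_sql() {
  source_name=$1
  cat <<EOF
CREATE TABLE ${source_name}.datalake.people (
  id INT,
  first_name VARCHAR,
  last_name VARCHAR,
  age INT
) PARTITION BY (truncate(1, last_name));

SELECT * FROM ${source_name}.datalake.people LIMIT 10;
EOF
}

# Example usage:
#   people_table_sql NessieSource
#   people_table_sql S3Source
```

Paste the printed statements into the SQL Editor and run them one at a time.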
For more information and advanced configurations, refer to the following resources:
These resources provide deeper insights and extended functionalities that you can explore to enhance your data lakehouse setup.
This project is licensed under the MIT License. See the LICENSE file for details.