zoom 20201201

Zoom meeting 1st December

20201201 16:00 UTC

Previous meeting zoom-20201127

In progress

Wiki page to plan notebooks
- in progress wiki
Deploy a larger cluster to work with the full size data set
- in progress 239
- external access http://zeppelin.aglais.uk:8080/
User space ssh rsync access
- in progress 195
- in progress 226
- external access ssh://[email protected]

New issues

Configuration for Ansible deployment
- new 240
Experiment with scaling the Ansible deployment
- new 241
User accounts in Drupal
- in progress 242
Integration with IRIS IAM
- in progress 243
Resource booking in Drupal
- in progress 244
Automated testing for Kubernetes deployment
- in progress 245
Investigate IRIS echo S3 service for user data
- new 246

New questions

User data space

Simple implementation reserves 10G per user.
Simple implementation for now - works for small number of users.
Longer term - How do we recover unused space?
Longer term - How do we handle dormant accounts?
Longer term - Staging mechanism to push older data to an archive and recover unused space?
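
A rough sketch of why the simple implementation only works for a small number of users: with a fixed reservation, the space committed grows linearly with the number of accounts, whether or not the space is used. The 10G figure is from the notes above; the share size and user counts are hypothetical values chosen for illustration.

```python
# Sketch only: the 10G-per-user reservation comes from the notes;
# the share size and user counts below are hypothetical.
RESERVED_PER_USER_GB = 10


def reserved_gb(n_users: int) -> int:
    """Total space reserved when every user gets a fixed allocation."""
    return n_users * RESERVED_PER_USER_GB


def max_users(share_gb: int) -> int:
    """Number of users a share of the given size can hold."""
    return share_gb // RESERVED_PER_USER_GB


if __name__ == "__main__":
    share_gb = 1024  # hypothetical share size, not a real deployment value
    for n in (10, 50, 100):
        print(f"{n} users reserve {reserved_gb(n)}G of {share_gb}G")
    print(f"share is full at {max_users(share_gb)} users")
```

Dormant accounts count against the share in exactly the same way as active ones, which is what the longer term questions are about.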

Spark version

Current live system is spark-2.7.
Zeppelin Hadoop-Yarn deploy is spark-2.7.
Kubernetes deploy is spark-3.x.
Nigel's Random Forest example uses spark-2.7?
- Does it need spark-2.7?
AXS distribution is based on spark-2.7.
- Does it need spark-2.7 or can we create a spark-3.x version?

Do we stick with spark-2.7 or try to upgrade to spark-3.x?

Are there issues with Zeppelin Hadoop Yarn deployment?
Are there issues with getting AXS to work with spark-3.x?
The Kubernetes deployment probably won't work with spark-2.x.

Questions about AXS

Can we figure out how to apply AXS changes to a standard Spark distribution?
What benefits does AXS give us?
Can we create an example that demonstrates this?

AXS issues

Differences between a standard distribution and the AXS distribution.
- issue 221
Apply differences to add AXS to our deployment.
- issue 222
Tests that demonstrate that AXS is installed and working
- issue 223
Benchmark to compare performance of AXS augmented deployment
- issue 224

Actions

Create script that shows conversion from csv to parquet for Gaia, writing the results to the Ceph shares (stv)
Test out multiple concurrent users running jobs via Zeppelin
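
A possible starting point for the csv to parquet action, as a minimal PySpark sketch rather than the agreed script. The file names and directory paths are hypothetical placeholders (plain paths, not URIs), and `parquet_path_for` and `convert` are illustrative helper names. Schema inference is used to keep the example short; an explicit schema would probably be a better fit for the full Gaia data set.

```python
from pathlib import PurePosixPath


def parquet_path_for(csv_path: str, out_root: str) -> str:
    """Map '<dir>/name.csv' or '<dir>/name.csv.gz' to '<out_root>/name.parquet'."""
    name = PurePosixPath(csv_path).name
    for suffix in (".gz", ".csv"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return str(PurePosixPath(out_root) / f"{name}.parquet")


def convert(csv_path: str, out_root: str) -> str:
    """Read one CSV file with Spark and write it back out as Parquet."""
    # Imported here so the path helper can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gaia-csv-to-parquet").getOrCreate()

    # header=True takes column names from the first row;
    # inferSchema=True costs an extra pass over the data.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    target = parquet_path_for(csv_path, out_root)
    df.write.mode("overwrite").parquet(target)
    return target
```

Called as `convert("/data/gaia/csv/GaiaSource_000.csv.gz", "/data/gaia/parquet")`, where both paths are made up for the example, with the output directory pointing at wherever the Ceph share is mounted.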