This repository has been archived by the owner on Feb 21, 2023. It is now read-only.

epileptiSentry Logo

Real-time electroencephalography (EEG) signal analysis for monitoring seizure activity at scale

Table of Contents

  1. Overview
  2. Engineering Design
  3. Deployment
  4. Credits
  5. References

Overview

An estimated 3.4 million Americans have epilepsy [1]. Despite its prevalence, epilepsy is difficult to diagnose accurately: experts estimate that about 20% of people are misdiagnosed [2]. Electroencephalography (EEG) lets us visualize brain activity and understand the nature of epileptic seizures, but EEG recordings require extensive processing [3] and need to be contextualized with a detailed and reliable account of the corresponding seizure event, which is often unavailable [4]. epileptiSentry leverages scalable stream processing technologies to:

  1. identify potential seizure activity in EEG signals in real time; and
  2. notify the right people so that they can closely monitor seizure events as they happen.

Engineering Design

Processing Pipeline

Pipeline diagram

A Python script unpacks EEG data stored in S3 (see the Data Source section), replays it as a stand-in for live EEG instruments, and sends it to a cluster of EC2 instances running Kafka. A cluster with a matching number of EC2 instances running Spark consumes the messages and calculates an abnormality indicator metric for each subject and channel, which is stored in TimescaleDB. Grafana visualizes the metric and alerts observers when it exceeds a predefined threshold.
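The data flow above can be sketched in plain Python, without Kafka or Spark. This is only an illustration under stated assumptions, not the project's code: the message fields (`subject`, `channel`, `ts`, `value`), the key format, the 256 Hz sampling rate, and both function names are hypothetical.

```python
import json
from typing import Dict, Iterator, List, Tuple

SAMPLE_RATE_HZ = 256  # assumed sampling rate; check the recording header


def replay_messages(subject: str, channels: Dict[str, List[float]],
                    start_ts: float = 0.0) -> Iterator[Tuple[str, str]]:
    """Yield (key, JSON payload) pairs, one per sample and channel.

    The key combines subject and channel, so a key-based partitioner
    would route all samples of one channel to the same partition.
    """
    n_samples = min(len(v) for v in channels.values())
    for i in range(n_samples):
        ts = start_ts + i / SAMPLE_RATE_HZ  # timestamp of the i-th sample
        for name, samples in channels.items():
            key = f"{subject}:{name}"
            payload = json.dumps({"subject": subject, "channel": name,
                                  "ts": ts, "value": samples[i]})
            yield key, payload


def exceeds_threshold(metric: float, threshold: float) -> bool:
    """Alert rule: fire only when the metric is strictly above the threshold."""
    return metric > threshold
```

In a real deployment, each `(key, payload)` pair would be handed to a Kafka producer, and the threshold rule would live in the Grafana alert configuration rather than in application code.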

Processing Logic

The processing logic uses a discrete wavelet transform-based method described by Ocak [5]. For a more in-depth discussion, consult the README file in the Spark subdirectory.
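As the title of [5] indicates, that method combines a discrete wavelet transform (DWT) with approximate entropy (ApEn). The sketch below shows the general idea in dependency-free Python. It is not the project's Spark implementation: the Haar wavelet, the number of levels, and the ApEn parameters (`m=2`, tolerance of 0.2 standard deviations) are assumptions chosen for brevity, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def wavedec(signal: Sequence[float], levels: int) -> List[List[float]]:
    """Multi-level decomposition, returned as [cA_L, cD_L, ..., cD_1]."""
    coeffs: List[List[float]] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def _phi(u: List[float], m: int, r: float) -> float:
    """Mean log-fraction of length-m templates within distance r."""
    n = len(u) - m + 1
    templates = [u[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Chebyshev distance; the self-match keeps the count above zero.
        matches = sum(1 for b in templates
                      if max(abs(x - y) for x, y in zip(a, b)) <= r)
        total += math.log(matches / n)
    return total / n


def approximate_entropy(u: Sequence[float], m: int = 2,
                        r_factor: float = 0.2) -> float:
    """ApEn(m, r) with r set to r_factor times the standard deviation."""
    u = list(u)
    if len(u) <= m + 1:
        raise ValueError("series too short for the chosen m")
    mean = sum(u) / len(u)
    sd = math.sqrt(sum((x - mean) ** 2 for x in u) / len(u))
    r = r_factor * sd
    return _phi(u, m, r) - _phi(u, m + 1, r)


def band_apen(signal: Sequence[float], levels: int = 3) -> List[float]:
    """ApEn of each wavelet sub-band, ordered [cA_L, cD_L, ..., cD_1]."""
    return [approximate_entropy(c) for c in wavedec(signal, levels)]
```

ApEn is lower for more regular signals, so a per-band ApEn value can serve as one input to an abnormality metric. The pairwise template comparison is quadratic in the window length, which is why a streaming job would apply it to short windows.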

Data Source

EEG data was sourced from PhysioNet's CHB-MIT Scalp EEG Database, mirrored in S3.

This database, collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention.

Deployment

epileptiSentry was developed and deployed using Amazon Web Services' cloud computing platform.

EC2 Configuration

| Role | Signal Reproduction | Kafka | Spark | TimescaleDB | Grafana |
| --- | --- | --- | --- | --- | --- |
| Type | c5.xlarge | m5.large | c5.2xlarge | c5.2xlarge | t3.micro |
| Number | 2 | 5 | 5 | 1 | 1 |

Signal Reproduction

Launch two c5.xlarge instances running Ubuntu 18.04 LTS with roles for S3 access. Install Python 3.7.3 and required packages:

python3.7 -m pip install -r $PROJECT_HOME/config/eeg-player-requirements.txt

To start signal reproduction, run the following bash script on each instance. It starts 12 concurrent processes running produce-signals.py.

src/sig_gen/start_producer.sh  

To stop all processes from the above:

pkill -f produce-signals
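For readers who prefer to manage the producers from Python rather than bash, the start and stop steps above could look roughly like this. It is a hypothetical equivalent, not a transcription of start_producer.sh: the function names are invented, and passing a worker index as the only command-line argument is an assumption about how work might be divided.

```python
import subprocess
import sys
from typing import List


def start_producers(script: str, n_procs: int = 12,
                    python: str = sys.executable) -> List[subprocess.Popen]:
    """Start n_procs copies of script without waiting for any to finish."""
    # Each child receives its index as argv[1] (an assumed convention).
    return [subprocess.Popen([python, script, str(i)]) for i in range(n_procs)]


def stop_producers(procs: List[subprocess.Popen]) -> None:
    """Terminate every process that is still running, then reap them all."""
    for p in procs:
        if p.poll() is None:
            p.terminate()
    for p in procs:
        p.wait()
```

Keeping the `Popen` handles lets the caller stop exactly the processes it started, whereas `pkill -f` matches any process whose command line contains the pattern.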

Kafka & Zookeeper

Insight Pegasus is required to deploy this cluster. Once Pegasus is installed, run deploy_kafka_cluster_pegasus.sh in the config/kafka-cluster directory.

Spark

Download and install Spark 2.4.3 pre-built with Apache Hadoop 2.7+ on instances running Ubuntu 18.04 LTS. Install Python 3.7.3 and Python packages:

python3.7 -m pip install -r config/sparkcluster-requirements.txt

TimescaleDB

Launch a c5.2xlarge instance using the Amazon Machine Image (AMI) provided by Timescale. Then, import the following schema:

psql -U postgres -d epileptisentry -f config/tsdb-schema.sql

Grafana

Follow the instructions on the Grafana Labs website to deploy Grafana. Custom configuration key-values and the apache2 setup (HTTP-to-HTTPS redirection) are available in the `config/grafana` directory.

Credits

epileptiSentry was developed by David Lee (LinkedIn Profile). This project was a deliverable of my fellowship in the Insight Data Engineering Fellowship program in June 2019 in New York City, NY, United States.

References

[1] U.S. Centers for Disease Control and Prevention. Epilepsy Data and Statistics.
[2] Oto, M. The misdiagnosis of epilepsy: Appraising risks and managing uncertainty. Seizure (2017) 44:143-6.
[3] Bigdely-Shamlo et al. The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Front Neuroinform (2015) 9:16.
[4] Moeller et al. Electroencephalography (EEG) in the diagnosis of seizures and epilepsy. UpToDate (2018).
[5] Ocak, H. Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy. Expert Syst Appl (2009) 36(2), Part 1:2027-2036.