Real-time electroencephalography (EEG) signal analysis for monitoring seizure activity at scale
An estimated 3.4 million Americans have epilepsy [1]. Despite its prevalence, accurate diagnosis of epilepsy is difficult: experts estimate that about 20% of people are misdiagnosed [2]. Electroencephalography (EEG) lets us visualize brain activity and understand the nature of epileptic seizures, but EEG recordings require extensive processing [3] and need to be contextualized with a detailed and reliable account of the corresponding seizure event, which is often unavailable [4]. epileptiSentry leverages scalable stream processing technologies to:
- identify potential seizure activity from EEG signals in real time; and
- notify the right people to closely monitor the seizure events as they happen.
A Python script unpacks EEG data in S3 (see the Data Source section), reproduces them as stand-ins for EEG instruments, and sends them to a cluster of EC2 instances running Kafka. A cluster with a matching number of EC2 instances running Spark consumes the messages and calculates the abnormal indicator metric for each subject and channel, which is stored in TimescaleDB. Grafana is used to visualize the metric and issue alerts to observers if a predefined threshold is exceeded.
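The signal-reproduction step can be pictured with a minimal sketch. It is not the project's `produce-signals.py`: the function names, the message layout (a `subject:channel` key with a JSON payload), and the `send` callback are assumptions made for illustration. The idea is only that recorded samples are emitted at the original sampling rate, so that downstream consumers see them arrive as if from a live instrument.

```python
import json
import time
from typing import Callable, Dict, List, Tuple


def build_message(subject: str, channel: str, index: int,
                  sampling_hz: float, value: float) -> Tuple[str, str]:
    """Return a (key, JSON payload) pair for one EEG sample (hypothetical layout)."""
    key = f"{subject}:{channel}"
    payload = json.dumps({
        "subject": subject,
        "channel": channel,
        "t": index / sampling_hz,  # seconds since the start of the recording
        "value": value,
    })
    return key, payload


def replay(subject: str,
           channels: Dict[str, List[float]],
           sampling_hz: float,
           send: Callable[[str, str], None],
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Replay recorded samples at the original rate; return the message count.

    `send` is a placeholder for whatever publishes to Kafka; `sleep` is
    injectable so the pacing can be disabled in tests.
    """
    if not channels:
        return 0
    n_samples = min(len(samples) for samples in channels.values())
    sent = 0
    for i in range(n_samples):
        # Emit one sample per channel for this time step.
        for channel, samples in channels.items():
            key, payload = build_message(subject, channel, i, sampling_hz, samples[i])
            send(key, payload)
            sent += 1
        sleep(1.0 / sampling_hz)  # wait one sampling interval before the next step
    return sent
```

In a real deployment `send` would wrap a Kafka producer call. Keying messages by subject and channel is one way to keep each channel's samples ordered within a partition, but whether the project does this is not stated here.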
The processing logic uses a discrete wavelet transform-based method described by Ocak [5]. For a more in-depth discussion, consult the README file in the Spark subdirectory.
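As a rough illustration of the two ingredients named in the title of [5], the sketch below decomposes a signal with a discrete wavelet transform and computes the approximate entropy of each sub-band. It rests on several assumptions: a Haar wavelet is swapped in for simplicity, and the decomposition depth and entropy parameters (`m = 2`, `r = 0.2`) are common defaults, not values taken from this project. How the Spark job turns such values into its indicator metric is covered in the Spark README and is not reproduced here.

```python
import math
from typing import List, Tuple


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    n = len(signal) - len(signal) % 2  # ignore a trailing odd sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def approximate_entropy(x: List[float], m: int = 2, r: float = 0.2) -> float:
    """Approximate entropy with tolerance r * std(x); O(n^2) reference version."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("series too short for the chosen m")
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    tol = r * std

    def phi(k: int) -> float:
        windows = [x[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for w in windows:
            # Self-matches are counted, so count >= 1 and the log is defined.
            count = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= tol
            )
            total += math.log(count / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


def subband_entropies(signal: List[float], levels: int = 3) -> List[float]:
    """ApEn of each detail band (finest first), then of the final approximation."""
    entropies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        entropies.append(approximate_entropy(detail))
    entropies.append(approximate_entropy(approx))
    return entropies
```

Regular signals yield approximate entropy values near zero and irregular ones yield larger values, which is what makes the quantity a candidate feature for thresholding. A production implementation would more likely rely on a wavelet library than on this hand-rolled transform.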
EEG data was sourced from PhysioNet's CHB-MIT Scalp EEG Database, mirrored in S3.
This database, collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention.
epileptiSentry was developed and deployed using Amazon Web Services' cloud computing platform.
Role | Signal Reproduction | Kafka | Spark | TimescaleDB | Grafana |
---|---|---|---|---|---|
Type | c5.xlarge | m5.large | c5.2xlarge | c5.2xlarge | t3.micro |
Number | 2 | 5 | 5 | 1 | 1 |
Launch two c5.xlarge instances running Ubuntu 18.04 LTS with roles for S3 access. Install Python 3.7.3 and required packages:
python3.7 -m pip install -r $PROJECT_HOME/config/eeg-player-requirements.txt
To start signal reproduction, run the following bash script on each instance; it starts 12 concurrent processes running produce-signals.py:
src/sig_gen/start_producer.sh
To stop all of these producer processes:
pkill -f produce-signals
Insight Pegasus is required to deploy this cluster. Once Pegasus is installed, run deploy_kafka_cluster_pegasus.sh in the config/kafka-cluster directory.
Download and install Spark 2.4.3 pre-built with Apache Hadoop 2.7+ on instances running Ubuntu 18.04 LTS. Install Python 3.7.3 and Python packages:
python3.7 -m pip install -r config/sparkcluster-requirements.txt
Launch a c5.2xlarge instance using the Amazon Machine Image (AMI) provided by Timescale. Then, import the following schema:
psql -U postgres -d epileptisentry -f config/tsdb-schema.sql
Follow the instructions from the Grafana Labs website to deploy Grafana. Custom configuration key-values and the apache2 setup (HTTP-to-HTTPS redirection) are available in the `config/grafana` directory.
epileptiSentry was developed by David Lee (LinkedIn Profile). This project was a deliverable of my fellowship in the Insight Data Engineering Fellowship program in June 2019 in New York City, NY, United States.
[1] U.S. Centers for Disease Control and Prevention. Epilepsy Data and Statistics.
[2] Oto, M. The misdiagnosis of epilepsy: Appraising risks and managing uncertainty. Seizure (2017) 44:143-6.
[3] Bigdely-Shamlo et al. The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Front Neuroinform (2015) 9:16.
[4] Moeller et al. Electroencephalography (EEG) in the diagnosis of seizures and epilepsy. UpToDate (2018).
[5] Ocak, H. Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy. Expert Syst Appl (2009) 36(2), Part 1:2027-2036.