LDAF - Large Data Analysis Framework

LDAF is a Python framework that supports data scientists working with large data sets. Here, "large data" means data that still fits into RAM. Out-of-core processing may also be possible but has not been tested. Data sets of up to 10 GB have been tested successfully.

Exploring data is often an iterative process, and reloading the data on every iteration slows it down. LDAF solves this problem by separating the data from the analysis process: data is loaded once, and analysis modules can be loaded or updated dynamically at runtime.
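
The separation of data and analysis code described above can be illustrated with the standard library alone. The following is a minimal sketch, not LDAF's actual implementation: the file name `analysis_demo.py`, the `run` function and the small `data` list are hypothetical, and `importlib.reload` is only assumed to be comparable to what LDAF does internally.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc files so a quick rewrite of the module is never masked by stale bytecode.
sys.dont_write_bytecode = True

# Hypothetical stand-in for an expensive load that should happen only once.
data = list(range(10))

# Write a throwaway analysis module to a temporary folder and import it.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "analysis_demo.py"
module_file.write_text("def run(data):\n    return sum(data)\n")
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
analysis_demo = importlib.import_module("analysis_demo")
first_result = analysis_demo.run(data)   # sum of the values

# Change the analysis code while `data` stays in memory ...
module_file.write_text("def run(data):\n    return max(data) - min(data)\n")
# ... and reload the module instead of restarting the process.
analysis_demo = importlib.reload(analysis_demo)
second_result = analysis_demo.run(data)  # spread of the values
```

The data is created once, while the analysis function is replaced between the two calls without restarting the interpreter.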

To visualise data, LDAF supports matplotlib. Figures are rendered inside the matplotlib canvas with all of its features, including the pick event. This makes it possible to create interactive applications with little effort.
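
As an illustration of the matplotlib pick event mentioned above, the sketch below registers a `pick_event` handler on a figure. It uses plain matplotlib only; the helper names `on_pick` and `plot_points` are hypothetical, and how LDAF hands a figure to an analysis function is not shown here.

```python
from types import SimpleNamespace

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

picked = []  # data coordinates of every picked point


def on_pick(event):
    """Store the data coordinates of the picked point(s)."""
    xdata, ydata = event.artist.get_xdata(), event.artist.get_ydata()
    for i in event.ind:
        picked.append((float(xdata[i]), float(ydata[i])))


def plot_points(fig):
    """Draw pickable points on fig and register the pick handler."""
    ax = fig.add_subplot(111)
    (line,) = ax.plot([0, 1, 2], [10, 20, 30], "o", picker=True, pickradius=5)
    fig.canvas.mpl_connect("pick_event", on_pick)
    return line


fig = Figure()
FigureCanvasAgg(fig)  # attach a canvas; in a GUI the toolkit canvas takes this role
line = plot_points(fig)

# Without a GUI nobody can click, so hand the handler a fake event that mimics
# the two attributes it reads (artist and ind) for the middle point.
on_pick(SimpleNamespace(artist=line, ind=[1]))
```

In an interactive window, matplotlib calls `on_pick` itself whenever the user clicks within `pickradius` points of a marker.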

Custom settings can be added to provide user input for the analysis.

⚠️ This project is still in beta, and the API may change in the future.

⚠️ Any suggestions and comments are welcome.

Functionality

Load data
Load/Reload analysis modules
Custom settings
View data and interact with the figure
View data as table

Example

For a full example application please see: https://github.com/peckto/ldaf-example

Workflow

To work with LDAF, the following workflow is recommended:

1. Load data in DataSource.py (e.g. from csv, hdf5, sql, ...)
2. Optional: Add custom settings in Settings.py
3. Develop an analysis module, e.g. Module_1.py
4. Run the module
5. Go to step 3 and repeat

Usage

LDAF is designed as a library and cannot be run standalone. It can be installed like any other Python package. To start the application, at least the following abstract classes must be implemented:

ldaf.DataSource: Load the data set(s)
ldaf.Settings: Optional: Add custom settings

To run LDAF, the following main code snippet can be used:

import sys
from PyQt5.QtWidgets import QApplication

from ldaf.App import App
from Settings import Settings
from DataSource import DataSource


if __name__ == "__main__":
    app = QApplication(sys.argv)
    data_source = DataSource()
    settings = Settings()
    modules_dir = 'Modules'
    window_title = 'LDAF - Examples'

    window = App(app, data_source, modules_dir, settings, window_title)
    window.show()

    sys.exit(app.exec_())

All analysis modules must be located in one folder. Every Python file inside modules_dir is loaded as a module, and one module can provide multiple analysis functions.
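
To make the folder convention concrete, the sketch below shows one way such a folder could be scanned with the standard library. It is an assumption about the general technique, not LDAF's loader: `load_modules` is a hypothetical helper, and the demo module written to a temporary folder merely follows the `name` and `functions` attributes used in the sample module in this README.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_modules(modules_dir):
    """Import every .py file in modules_dir and return {display name: functions}."""
    loaded = {}
    for path in sorted(Path(modules_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Fall back to the file name when a module defines no display name.
        display_name = getattr(module, "name", path.stem)
        loaded[display_name] = getattr(module, "functions", {})
    return loaded


# Demo: a throwaway modules folder containing one module with two functions.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "Module_1.py").write_text(
    "name = 'Module 1'\n"
    "def mean(app, fig=None):\n"
    "    return 'mean'\n"
    "def count(app, fig=None):\n"
    "    return 'count'\n"
    "functions = {'Mean': mean, 'Count': count}\n"
)
modules = load_modules(demo_dir)
```

Each key of `modules` corresponds to one file in the folder, and each value maps button labels to the analysis functions of that module.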

A sample module with one analysis function can look as follows:

settings = {}
"Work in progress"
actions = {}
"Work in progress"

name = 'Module 2'
"Display name of module"
table = 'example2'
"is mapped to app.active_table, WIP"


def example_1(app: 'App', fig=None):
    table = app.settings.get('Table')

    df = app.data_source.get_table(table)
    df.name = '%s Data' % table
    "name attribute must be set"

    return df


functions = {
    'Example 1 Table': example_1,
}
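
While developing a module, it can help to call an analysis function outside the GUI. The sketch below does this with stub objects; `StubSettings` and `StubDataSource` are hypothetical stand-ins that provide only the two calls used by the sample module (`settings.get` and `data_source.get_table`) and are not part of LDAF.

```python
from types import SimpleNamespace

import pandas as pd


class StubSettings:
    """Hypothetical stand-in providing only the get() call used by the module."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, key):
        return self._values[key]


class StubDataSource:
    """Hypothetical stand-in providing only get_table()."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def get_table(self, name):
        return self._tables[name]


def example_1(app, fig=None):
    # Same body as the sample module function.
    table = app.settings.get('Table')
    df = app.data_source.get_table(table)
    df.name = '%s Data' % table
    return df


# Assemble a minimal "app" object and call the analysis function directly.
app = SimpleNamespace(
    settings=StubSettings({'Table': 'example2'}),
    data_source=StubDataSource({'example2': pd.DataFrame({'x': [1, 2, 3]})}),
)
result = example_1(app)
```

The returned DataFrame carries the `name` attribute that the sample module sets before returning.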

GUI

The GUI is based on PyQt5 and has been created with Qt Designer (Main.ui).

The GUI has the following main widgets:

Widget	Description
Menu (File)	Load data and reload modules
Settings	Custom settings to interact with the modules (`Settings.py`)
Loaded Tables	Shows statistics about loaded data sets
Log	Log messages from the modules
Statusbar	Shows information about the running process
Analysis	The loaded modules are represented as tabs and the analysis functions can be called via the buttons

Dependencies

The following Python packages are required to run LDAF:

numpy
pandas
PyQt5
matplotlib
