Skip to content

Latest commit

 

History

History
57 lines (27 loc) · 2.57 KB

IMPLEMENTATION.md

File metadata and controls

57 lines (27 loc) · 2.57 KB

SQL data model

To put it simply, ifsql works by creating in-memory SQLite database that is then populated with file metadata obtained from traversing the tree with os.walk. Management of this database is delegated to sqlalchemy library in order to avoid excesive use of raw queries.

To represent directory tree structure using RDBMS a pattern called Closure Table is used. Basically, there are two tables in the memory: files and relations. Each row of files table contains detailed information about a single file - its name, size, owner and so on. relations table is used to model the hierarchy. For every directory there are rows mapping relations between it and every of its descendants, including a relation to itself. For example, given following directory tree:

$ tree
.
├── DIRECTORY_1
│   └── FILE_2
├── DIRECTORY_2
│   └── DIRECTORY_3
│       └── FILE_3
└── FILE_1

We obtain following model:

example

Every arrow in this chart represents a single relation row. Besides the pair ancestor_id/descendant_id it also contain a depth value, which makes queries limited to a certain directory tree depth possible.

Both tables and database utilities can be found in ifsql/database.py.

More information about this technique can be found in a book SQL Antipatterns by Bill Karwin.

Parsing queries

After we have the model loaded into memory, we can make SQL calls. Lets say we want to call SELECT file_name, file_size FROM subdirectory. To make this work we must find out file_id of a row representing subdirectory. This is easy, since during the tree traversal all ids are stored in a cache, mapping subdirectory paths into ids from files table. Assuming that the value of this file_id is equal to 4, we need to replace subdirectory table name with a following join:

SELECT f.file_name, f.file_size
FROM files AS f
JOIN relations AS r ON f.file_id = r.descendant_id
WHERE r.ancestor_id = 4;

To do so, a parser written using lark-parser library is used. This parser can be found in ifsql/parser.py based on SQLite BNF grammar. Parser constructs a sqlalchemy.sql.select object from a string query. This object is passed down to Database class using query methods which transforms it as described above.