Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise manifest database #7

Open
open768 opened this issue Aug 21, 2024 · 7 comments
Open

Optimise manifest database #7

open768 opened this issue Aug 21, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@open768
Copy link
Owner

open768 commented Aug 21, 2024

The database contains thousands of duplicate strings for mission, sol, instrument, type.

The database would be a lot smaller if integer indexes were used

@open768
Copy link
Owner Author

open768 commented Sep 5, 2024

look at https://www.doctrine-project.org/

open768 pushed a commit that referenced this issue Sep 5, 2024
open768 pushed a commit that referenced this issue Sep 6, 2024
@open768
Copy link
Owner Author

open768 commented Sep 17, 2024

look at https://meekro.com/ for database abstraction

@open768
Copy link
Owner Author

open768 commented Nov 5, 2024

meekro doesnt perform cacheing by itself, but can be used alongside stash

Sure! To achieve this, you can modify the code to ensure that read operations fetch data from the cache, and only write operations interact with the database. Here’s how you can adjust the code:

Install Stash (if not already done):
composer require tedivm/stash

Configure Stash:

PHP

use Stash\Driver\FileSystem;
use Stash\Pool;

$driver = new FileSystem(['path' => '/path/to/cache']);
$pool = new Pool($driver);

AI-generated code. Review and use carefully. More info on FAQ.
Modify MeekroORM to use caching:
PHP

class MyORM extends MeekroORM {
    protected static $cache;

    public static function setCache(Pool $cachePool) {
        self::$cache = $cachePool;
    }

    public static function getFromCache($key) {
        $item = self::$cache->getItem($key);
        return $item->isHit() ? $item->get() : null;
    }

    public static function saveToCache($key, $data, $ttl = 3600) {
        $item = self::$cache->getItem($key);
        $item->set($data)->expiresAfter($ttl);
        self::$cache->save($item);
    }

    public static function addEntry($table, $data) {
        parent::insert($table, $data);
        // Invalidate cache after adding a new entry
        self::$cache->clear();
    }

    public static function getLookupTable($table) {
        $cacheKey = 'lookup_table_' . $table;
        $result = self::getFromCache($cacheKey);

        if (!$result) {
            $result = parent::query("SELECT * FROM %b", $table);
            self::saveToCache($cacheKey, $result);
        }

        return $result;
    }
}

// Set the cache pool
MyORM::setCache($pool);

// Example usage
$lookupTable = MyORM::getLookupTable('my_lookup_table');

// Adding a new entry
MyORM::addEntry('my_lookup_table', ['column1' => 'value1', 'column2' => 'value2']);

In this setup:

getLookupTable: Reads from the cache if available, otherwise fetches from the database and caches the result.
addEntry: Writes to the database and clears the cache to ensure consistency.
This approach ensures that read operations are served from the cache, while write operations update the database and invalidate the cache to keep it up-to-date.

@open768
Copy link
Owner Author

open768 commented Nov 12, 2024

ive added meekro to the startup.
created a placeholder page: https://dev.mars-browser.co.uk:8443/curiosity_browser/php/test/meekro_mani.php

@open768
Copy link
Owner Author

open768 commented Nov 13, 2024

Drop meekro, try propel or redbean which has table relationships

https://propelorm.org/documentation/04-relationships.html

https://redbeanphp.com/index.php

@open768
Copy link
Owner Author

open768 commented Dec 1, 2024

also consider using an ETL library such as flow which has support for Json and xml

### Updated Schema Definition (schema.xml):

<database name="default" defaultIdMethod="native">
    <table name="instrument">
        <column name="id" type="INTEGER" primaryKey="true" autoIncrement="true" />
        <column name="name" type="VARCHAR" size="255" required="true" />
        <behavior name="query_cache" />
    </table>
    <table name="images">
        <column name="id" type="INTEGER" primaryKey="true" autoIncrement="true" />
        <column name="bucket" type="VARCHAR" size="255" />
        <column name="camera_model_component_list" type="VARCHAR" size="255" />
        <column name="date_added" type="TIMESTAMP" />
        <column name="filter_name" type="VARCHAR" size="255" />
        <column name="pds_label_url" type="VARCHAR" size="255" />
        <column name="scale_factor" type="VARCHAR" size="255" />
        <column name="url_list" type="VARCHAR" size="255" />
        <column name="sclk" type="VARCHAR" size="255" />
        <column name="attitude" type="VARCHAR" size="255" />
        <column name="camera_position" type="VARCHAR" size="255" />
        <column name="drive" type="VARCHAR" size="255" />
        <column name="camera_model_type" type="VARCHAR" size="255" />
        <column name="contributor" type="VARCHAR" size="255" />
        <column name="mast_az" type="VARCHAR" size="255" />
        <column name="site" type="VARCHAR" size="255" />
        <column name="camera_vector" type="VARCHAR" size="255" />
        <column name="item_name" type="VARCHAR" size="255" />
        <column name="subframe_rect" type="VARCHAR" size="255" />
        <column name="utc" type="TIMESTAMP" />
        <column name="sol" type="INTEGER" />
        <column name="mast_el" type="VARCHAR" size="255" />
        <column name="instrument_id" type="INTEGER" required="true" />
        <foreign-key foreignTable="instrument">
            <reference local="instrument_id" foreign="id" />
        </foreign-key>
    </table>
</database>

### Updated Ingestion Script with Caching:

<?php

require_once 'propel.php';

use YourNamespace\InstrumentQuery;
use YourNamespace\Instrument;
use YourNamespace\Images;

// Load and decode JSON file
$json = file_get_contents('path/to/your/json/file.json');
$data = json_decode($json, true);

// Prepare a list of instruments to be inserted
$instrumentQuery = InstrumentQuery::create();
$instruments = [];
foreach ($data['images'] as $imageData) {
    $instrumentName = $imageData['instrument'];
    if (!isset($instruments[$instrumentName])) {
        $instrument = $instrumentQuery
            ->setQueryKey('find_instrument_by_name')
            ->findOneByName($instrumentName);

        if (!$instrument) {
            $instrument = new Instrument();
            $instrument->setName($instrumentName);
            $instrument->save();
        }
        $instruments[$instrumentName] = $instrument->getId();
    }
}

// Insert images data
foreach ($data['images'] as $imageData) {
    $image = new Images();
    $image->setBucket($imageData['bucket']);
    $image->setCameraModelComponentList($imageData['cameraModelComponentList']);
    $image->setDateAdded(new \DateTime($imageData['dateAdded']));
    $image->setFilterName($imageData['filterName']);
    $image->setPdsLabelUrl($imageData['pdsLabelUrl']);
    $image->setScaleFactor($imageData['scaleFactor']);
    $image->setUrlList($imageData['urlList']);
    $image->setSclk($imageData['sclk']);
    $image->setAttitude($imageData['attitude']);
    $image->setCameraPosition($imageData['cameraPosition']);
    $image->setDrive($imageData['drive']);
    $image->setCameraModelType($imageData['cameraModelType']);
    $image->setContributor($imageData['contributor']);
    $image->setMastAz($imageData['mastAz']);
    $image->setSite($imageData['site']);
    $image->setCameraVector($imageData['cameraVector']);
    $image->setItemName($imageData['itemName']);
    $image->setSubframeRect($imageData['subframeRect']);
    $image->setUtc(new \DateTime($imageData['utc']));
    $image->setSol($imageData['sol']);
    $image->setMastEl($imageData['mastEl']);
    $image->setInstrumentId($instruments[$imageData['instrument']]);
    $image->save();
}

echo "Data ingestion complete!";

@open768
Copy link
Owner Author

open768 commented Dec 1, 2024

theres a lot of code needed to manage the instrument table. copilot suggests using eloquent:

Here’s a simplified example using Eloquent:

  1. Install Eloquent:

    composer require illuminate/database
  2. Set Up Eloquent:

    <?php
    
    require 'vendor/autoload.php';
    
    use Illuminate\Database\Capsule\Manager as Capsule;
    
    $capsule = new Capsule;
    
    $capsule->addConnection([
        'driver'    => 'sqlite',
        'database'  => 'path/to/your/database.sqlite',
        'prefix'    => '',
    ]);
    
    $capsule->setAsGlobal();
    $capsule->bootEloquent();
  3. Define Models:

    <?php
    
    use Illuminate\Database\Eloquent\Model;
    
    class Instrument extends Model {
        protected $table = 'instrument';
        public $timestamps = false;
    }
    
    class Image extends Model {
        protected $table = 'images';
        public $timestamps = false;
    
        public function instrument() {
            return $this->belongsTo(Instrument::class);
        }
    }
  4. Create and Migrate Database Schema:

    <?php
    
    use Illuminate\Database\Capsule\Manager as Capsule;
    
    Capsule::schema()->create('instrument', function ($table) {
        $table->increments('id');
        $table->string('name')->unique();
    });
    
    Capsule::schema()->create('images', function ($table) {
        $table->increments('id');
        $table->string('bucket')->nullable();
        $table->string('camera_model_component_list')->nullable();
        $table->timestamp('date_added')->nullable();
        $table->string('filter_name')->nullable();
        $table->string('pds_label_url')->nullable();
        $table->string('scale_factor')->nullable();
        $table->string('url_list')->nullable();
        $table->string('sclk')->nullable();
        $table->string('attitude')->nullable();
        $table->string('camera_position')->nullable();
        $table->string('drive')->nullable();
        $table->string('camera_model_type')->nullable();
        $table->string('contributor')->nullable();
        $table->string('mast_az')->nullable();
        $table->string('site')->nullable();
        $table->string('camera_vector')->nullable();
        $table->string('item_name')->nullable();
        $table->string('subframe_rect')->nullable();
        $table->timestamp('utc')->nullable();
        $table->integer('sol')->nullable();
        $table->string('mast_el')->nullable();
        $table->integer('instrument_id')->unsigned();
        $table->foreign('instrument_id')->references('id')->on('instrument');
    });
  5. Ingest JSON Data:

    <?php
    
    // Load and decode JSON file
    $json = file_get_contents('path/to/your/json/file.json');
    $data = json_decode($json, true);
    
    // Ingest data
    foreach ($data['images'] as $imageData) {
        $instrument = Instrument::firstOrCreate(['name' => $imageData['instrument']]);
    
        Image::create([
            'bucket' => $imageData['bucket'],
            'camera_model_component_list' => $imageData['cameraModelComponentList'],
            'date_added' => $imageData['dateAdded'],
            'filter_name' => $imageData['filterName'],
            'pds_label_url' => $imageData['pdsLabelUrl'],
            'scale_factor' => $imageData['scaleFactor'],
            'url_list' => $imageData['urlList'],
            'sclk' => $imageData['sclk'],
            'attitude' => $imageData['attitude'],
            'camera_position' => $imageData['cameraPosition'],
            'drive' => $imageData['drive'],
            'camera_model_type' => $imageData['cameraModelType'],
            'contributor' => $imageData['contributor'],
            'mast_az' => $imageData['mastAz'],
            'site' => $imageData['site'],
            'camera_vector' => $imageData['cameraVector'],
            'item_name' => $imageData['itemName'],
            'subframe_rect' => $imageData['subframeRect'],
            'utc' => $imageData['utc'],
            'sol' => $imageData['sol'],
            'mast_el' => $imageData['mastEl'],
            'instrument_id' => $instrument->id
        ]);
    }
    
    echo "Data ingestion complete!";

Key Advantages:

  1. Concise Syntax:

    • Eloquent's fluent syntax makes it easy to interact with the database without writing extensive code.
  2. Automatic Relationship Handling:

    • Eloquent handles the foreign key relationships for you, simplifying the process.
  3. Schema Management:

    • Eloquent's schema builder allows you to define and migrate your database schema with minimal code.

Using Eloquent should reduce the amount of code required and make it easier to manage your data. If you have any other questions or need further assistance, feel free to ask! 🌟🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant