CEP 27 - |Cyclus| Database Restructuring
********************************************

:CEP: 27
:Title: |Cyclus| Database Restructuring
:Last-Modified: 2017-09-11
:Author: Jin whan Bae & Anthony Scopatz
:Status: Draft
:Type: Standards Track
:Created: 2013-09-11

Abstract
============
This CEP proposes to restructure the |cyclus| output database in order to
reduce the number of tables and the redundancy of data, and ultimately the number
of ``joins`` required for data analysis. Doing so would reduce the computing time
for end-user analysis and yield a clearer, more concise output database.


Motivation
==========
The current output database requires the user to join multiple tables to acquire
meaningful material data, such as quantity and composition. These joins lead to
long computing times for analysis and confuse users.


Rationale
=========
The proposed restructuring aims to reduce the number of tables the user has to query
for analysis. This can be done in two ways:

1. Combine redundant tables.
2. Reduce a table (the ``Compositions`` table) to a single column with a variable-type map.

Additionally, this CEP proposes to store both **Inventories** and **Transactions**
by default. Either table can be derived ("backed out") from the other, with additional
information coming from **Materials** etc. However, this backing-out process has proven
extraordinarily expensive: the number of operations needed to reconstruct the table that
is not present can explode into the millions or billions. Even for small databases, this
has proven prohibitive.

While storing both **Inventories** and **Transactions** may seem inefficient, consider
that:

* Data storage is cheap,
* Material inventories are what most analysis tasks require, and
* This is precisely double-entry bookkeeping, as applied to the nuclear fuel cycle.

Double-entry bookkeeping was a huge innovation in accounting systems. When implemented
correctly and without fraud, it leads to a self-consistent system, which enables errors
to be discovered and corrected earlier. This CEP argues that |Cyclus| should provide
the information needed to verify the mass balances, if requested.

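To make the double-entry argument concrete, the following is a minimal sketch of such a
mass-balance check in Python. It is illustrative only: the helper name
``check_mass_balance``, the tuple layouts, and the data are hypothetical. The check
assumes that agents neither create, decay, nor transmute material (true of a simple
storage facility, not of a reactor) and that each inventory is recorded after the trades
of its time step.

.. code-block:: python

    from collections import defaultdict


    def check_mass_balance(transactions, inventories, tol=1e-9):
        """Cross-check inventories against transactions (hypothetical helper).

        transactions: iterable of (time, sender_id, receiver_id, quantity)
        inventories:  dict mapping (agent_id, time) -> total inventory quantity,
                      assumed to be recorded after the trades of that time step
        Returns a sorted list of (agent_id, time, discrepancy) violations.
        """
        # Net quantity received (positive) or sent (negative) per agent and step.
        net = defaultdict(float)
        for time, sender, receiver, qty in transactions:
            net[(receiver, time)] += qty
            net[(sender, time)] -= qty

        errors = []
        for (agent, time), quantity in inventories.items():
            previous = inventories.get((agent, time - 1))
            if previous is None:
                # No earlier record to balance against (e.g. the first step).
                continue
            # Assumes no creation, decay, or transmutation inside the agent.
            expected = previous + net.get((agent, time), 0.0)
            if abs(quantity - expected) > tol:
                errors.append((agent, time, quantity - expected))
        return sorted(errors)


    # At t=1, agent 1 sends 4.0 kg to agent 2 (hypothetical values).
    transactions = [(1, 1, 2, 4.0)]
    inventories = {(1, 0): 10.0, (1, 1): 6.0, (2, 0): 0.0, (2, 1): 4.0}
    print(check_mass_balance(transactions, inventories))  # prints []

Because both ledgers are stored, a check like this is a single pass over each table
rather than a reconstruction of one table from the other.
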

Specification \& Implementation
===============================
The following tables currently in the output are considered for editing:

1. Compositions
2. Transactions
3. Recipes
4. ExplicitInventory
5. ExplicitInventoryCompact
6. Info
7. InfoExplicitInv
8. ResCreators
9. Resources
10. Products


Material and Product
--------------------

Currently, both **Material** and **Product** resources are stored in the
**Resources** table. The internal state of a **Material** is stored in the
**Compositions** table, and the internal state of a **Product** is stored in
the **Products** table. This requires the user to make joins to acquire the
internal state of the resources.

We can avoid these joins by creating separate **Materials** and
**Products** tables, each with the internal state (composition or quality)
as a column.

In short, we propose to replace the **Compositions**, **Products**, and
**Resources** tables with **Materials** and **Products** tables. In the
process, the **QualId** column would be removed.

Currently:

============ ==========
Resources
-----------------------
Column       Type
============ ==========
SimId        uuid
ResourceId   int
ObjId        int
Type         string
TimeCreated  int
Quantity     double
Units        string
QualId       int
Parent1      int
Parent2      int
============ ==========

============ ==========
Products
-----------------------
Column       Type
============ ==========
SimId        uuid
QualId       int
Quality      string
============ ==========

============ ==========
Compositions
-----------------------
Column       Type
============ ==========
SimId        uuid
QualId       int
NucId        int
MassFrac     double
============ ==========

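For illustration, the sketch below uses Python's built-in ``sqlite3`` module to build a
stripped-down, in-memory version of the current **Resources** and **Compositions** tables
and compute the mass of each nuclide in each material. The rows, the ``'Material'`` type
string, and the ``nuclide_masses`` helper are hypothetical, and ``SimId`` and other
columns are omitted. Every such query needs a join on **QualId**.

.. code-block:: python

    import sqlite3

    # Stripped-down, in-memory copy of the current layout (hypothetical rows;
    # SimId and several other columns omitted).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Resources (ResourceId INTEGER, Type TEXT,
                            Quantity REAL, QualId INTEGER);
    CREATE TABLE Compositions (QualId INTEGER, NucId INTEGER, MassFrac REAL);
    """)
    conn.executemany("INSERT INTO Resources VALUES (?, ?, ?, ?)",
                     [(1, "Material", 10.0, 7), (2, "Material", 5.0, 8)])
    conn.executemany("INSERT INTO Compositions VALUES (?, ?, ?)",
                     [(7, 922350000, 0.05), (7, 922380000, 0.95),
                      (8, 942390000, 1.0)])


    def nuclide_masses(conn):
        """Mass of each nuclide in each material; requires a join on QualId."""
        return conn.execute("""
            SELECT r.ResourceId, c.NucId, r.Quantity * c.MassFrac
            FROM Resources AS r
            JOIN Compositions AS c ON r.QualId = c.QualId
            WHERE r.Type = 'Material'
            ORDER BY r.ResourceId, c.NucId
        """).fetchall()


    for res_id, nuc_id, mass in nuclide_masses(conn):
        print(res_id, nuc_id, mass)
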
Would be restructured to:

============ ==========
Materials
-----------------------
Column       Type
============ ==========
SimId        uuid
ResourceId   int
ObjId        int
TimeCreated  int
Parent1      int
Parent2      int
Units        string
Quantity     double
Composition  map<int,double>
============ ==========

Here, the **Composition** column maps each nuclide ID (``NucId``) to its mass
fraction (``MassFrac``).

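A minimal sketch of reading the proposed **Composition** column follows. SQLite has no
native map type, so this example assumes, purely for illustration, that the map is
stored as JSON text; this may differ from how a |Cyclus| backend actually serializes
maps. The ``encode_comp``/``decode_comp`` helpers and the rows are hypothetical. The
mass of a nuclide in each material then comes from a single table, with no join.

.. code-block:: python

    import json
    import sqlite3


    # Hypothetical encoding: the map is stored as JSON text only for this sketch;
    # a real backend may serialize map columns differently.
    def encode_comp(comp):
        """Encode a {NucId: MassFrac} dict as JSON (keys become strings)."""
        return json.dumps({str(nuc): frac for nuc, frac in comp.items()})


    def decode_comp(text):
        """Decode the JSON text back into a {NucId: MassFrac} dict."""
        return {int(nuc): frac for nuc, frac in json.loads(text).items()}


    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Materials (
        ResourceId INTEGER, TimeCreated INTEGER,
        Quantity REAL, Units TEXT, Composition TEXT)""")
    conn.executemany(
        "INSERT INTO Materials VALUES (?, ?, ?, ?, ?)",
        [(1, 0, 10.0, "kg", encode_comp({922350000: 0.05, 922380000: 0.95})),
         (2, 3, 5.0, "kg", encode_comp({942390000: 0.2, 922380000: 0.8}))])


    def nuclide_mass(conn, nuc_id):
        """Mass of one nuclide in each material row, read from a single table."""
        masses = {}
        for res_id, qty, comp_text in conn.execute(
                "SELECT ResourceId, Quantity, Composition FROM Materials"):
            masses[res_id] = qty * decode_comp(comp_text).get(nuc_id, 0.0)
        return masses


    print(nuclide_mass(conn, 942390000))

One trade-off is visible here: selecting on a single nuclide now means decoding the map
(here in Python) instead of a plain ``WHERE NucId = ...`` clause on a separate table.
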
============ ==========
Products
-----------------------
Column       Type
============ ==========
SimId        uuid
ResourceId   int
ObjId        int
TimeCreated  int
Parent1      int
Parent2      int
Units        string
Quantity     double
Quality      string
============ ==========

Since **QualId** is removed, the **Recipes** table also needs to be edited:

============ ==========
Recipes
-----------------------
Column       Type
============ ==========
SimId        uuid
Recipes      string
Composition  map<int,double>
============ ==========


Transactions
------------
The **Transactions** table would be modified to have an integer flag indicating whether
the traded resource is a material or a product. This flag lets anyone inspecting
the **Transactions** table know which resource table (either **Materials** or
**Products**) to consult to find the actual concrete resource.

**Current:**

================ ==========
Transactions
---------------------------
Column           Type
================ ==========
SimId            uuid
TransactionId    int
SenderId         int
ReceiverId       int
ResourceId       int
Commodity        string
Time             int
================ ==========

**Proposed:**

================ ==========
Transactions
---------------------------
Column           Type
================ ==========
SimId            uuid
TransactionId    int
SenderId         int
ReceiverId       int
**ResourceType** **int**
ResourceId       int
Commodity        string
Time             int
================ ==========

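The sketch below shows how a reader might use the proposed **ResourceType** flag to pick
the right resource table. The integer values (``0`` for a material, ``1`` for a
product), the ``resource_for`` helper, the commodity names, and all rows are assumptions
for illustration; this CEP does not specify the encoding.

.. code-block:: python

    import sqlite3

    # Hypothetical flag values: this CEP does not fix the integer encoding.
    MATERIAL, PRODUCT = 0, 1
    TABLE_FOR_TYPE = {MATERIAL: "Materials", PRODUCT: "Products"}

    # Minimal in-memory tables with hypothetical rows (SimId omitted).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Transactions (TransactionId INTEGER, SenderId INTEGER,
        ReceiverId INTEGER, ResourceType INTEGER, ResourceId INTEGER,
        Commodity TEXT, Time INTEGER);
    CREATE TABLE Materials (ResourceId INTEGER, Quantity REAL, Composition TEXT);
    CREATE TABLE Products (ResourceId INTEGER, Quantity REAL, Quality TEXT);
    INSERT INTO Transactions VALUES (0, 10, 11, 0, 5, 'fresh_fuel', 1);
    INSERT INTO Transactions VALUES (1, 12, 13, 1, 6, 'spare_parts', 1);
    INSERT INTO Materials VALUES (5, 33.0, 'encoded composition');
    INSERT INTO Products VALUES (6, 2.0, 'spare_parts');
    """)


    def resource_for(conn, transaction_id):
        """Return (table name, resource row) for one transaction."""
        rtype, res_id = conn.execute(
            "SELECT ResourceType, ResourceId FROM Transactions "
            "WHERE TransactionId = ?", (transaction_id,)).fetchone()
        # The flag selects the table; the name comes from a fixed mapping,
        # never from user input, so formatting it into the SQL is safe here.
        table = TABLE_FOR_TYPE[rtype]
        row = conn.execute(
            f"SELECT * FROM {table} WHERE ResourceId = ?", (res_id,)).fetchone()
        return table, row


    print(resource_for(conn, 0))
    print(resource_for(conn, 1))
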
Writing this table to the database will now be optional. By default, the table is
written (the default value is true).


ResCreators
-----------
Along with **Transactions**, the **ResCreators** table would need an additional
column, **ResourceType**:

============ ==========
ResCreators
-----------------------
Column       Type
============ ==========
SimId        uuid
ResourceId   int
AgentId      int
ResourceType int
============ ==========


Merge ExplicitInventory & ExplicitInventoryCompact
----------------------------------------------------
The **ExplicitInventory** and **ExplicitInventoryCompact** tables
should be merged into a single table, called **Inventories**,
with the following columns:

================ ==========
Inventories
---------------------------
Column           Type
================ ==========
SimId            uuid
AgentId          int
Time             int
InventoryName    string
Quantity         double
Composition      map<int,double>
================ ==========

Writing this table to the database will be optional. By default, the table is
written (the default value is true).


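As a rough sketch of the merge, the function below collapses per-nuclide inventory rows
into **Inventories**-style rows with a total **Quantity** and a **Composition** map. It
assumes, hypothetically, an input layout of one row per (agent, time, inventory name,
nuclide) holding that nuclide's mass, and that **Composition** stores mass fractions, as
in **Materials**. The ``to_inventories`` name, the ``"core"`` inventory name, and the
sample data are illustrative.

.. code-block:: python

    from collections import defaultdict


    def to_inventories(per_nuclide_rows):
        """Collapse per-nuclide inventory rows into Inventories-style rows.

        per_nuclide_rows: iterable of (agent_id, time, inventory_name, nuc_id,
                          quantity), one row per nuclide (hypothetical layout).
        Returns a sorted list of (agent_id, time, inventory_name,
        total_quantity, composition), with composition = {nuc_id: mass_frac}.
        """
        grouped = defaultdict(lambda: defaultdict(float))
        for agent, time, inv_name, nuc, qty in per_nuclide_rows:
            grouped[(agent, time, inv_name)][nuc] += qty

        rows = []
        for key in sorted(grouped):
            masses = grouped[key]
            total = sum(masses.values())
            # Convert per-nuclide masses to mass fractions of the total.
            comp = {nuc: m / total for nuc, m in masses.items()} if total > 0 else {}
            rows.append((*key, total, comp))
        return rows


    rows = to_inventories([
        (7, 1, "core", 922350000, 2.0),
        (7, 1, "core", 922380000, 48.0),
        (7, 2, "core", 922380000, 50.0),
    ])
    for row in rows:
        print(row)
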
Merge Info & InfoExplicitInv
----------------------------
We saw little reason to keep the two tables separate; combining them is a matter of cleanliness.
Additionally, the single **Info** table will contain an extra column, **RecordTransactions**.
Furthermore, the **RecordInventory** column is no longer needed and will be removed.

Other informational tables may also be merged into the single table.


Backwards Compatibility
=======================
This CEP is not backwards compatible.

Document History
================
This document is released under the CC-BY 3.0 license.