SOEMPI can import a 10K-record set in about 5 seconds. That time includes reading the data from the flat file, tokenizing the strings, constructing Person objects, and persisting them. Persisting record links, however, is 10-100 times slower: persisting half a million record pairs takes about an hour, even though a record pair carries less data than a Person.
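A quick back-of-the-envelope check of the two throughputs, using only the figures above (10,000 Person records in about 5 s; 500,000 record pairs in about 3,600 s):

```math
\frac{10\,000\ \text{records}}{5\ \text{s}} = 2000\ \text{records/s}, \qquad
\frac{500\,000\ \text{links}}{3600\ \text{s}} \approx 139\ \text{links/s}, \qquad
\frac{2000}{139} \approx 14
```

So the hour-long half-million-pair run sits at the low end of the quoted 10-100× range.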
What happened so far:
From system monitoring, I can rule out CPU or I/O saturation.
For a long time now, SOEMPI has gathered read and write operations into batches, with the batch size set by Constants.PAGE_SIZE. This minimizes Hibernate flush calls: flush is called only once per PAGE_SIZE operations.
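The page-sized batching described above can be sketched as follows. This is a minimal illustration, not SOEMPI's actual code: `BatchingWriter`, `persist`, and `flushAndClear` are hypothetical names, and `pageSize` plays the role of Constants.PAGE_SIZE. In Hibernate's documented batch-processing pattern, the flush hook would typically call `session.flush()` followed by `session.clear()`, so the persistence context does not keep growing across pages; whether SOEMPI's link persistence also clears is not stated here.

```java
import java.util.function.Consumer;

// Minimal sketch of page-sized write batching (hypothetical names).
// pageSize stands in for SOEMPI's Constants.PAGE_SIZE.
final class BatchingWriter<T> {
    private final int pageSize;
    private final Consumer<T> persist;    // in Hibernate, e.g. session::persist
    private final Runnable flushAndClear; // in Hibernate, e.g. () -> { session.flush(); session.clear(); }
    private int pending = 0;

    BatchingWriter(int pageSize, Consumer<T> persist, Runnable flushAndClear) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.pageSize = pageSize;
        this.persist = persist;
        this.flushAndClear = flushAndClear;
    }

    // Hands one entity to the persistence layer; flushes only when a full page has accumulated.
    void write(T item) {
        persist.accept(item);
        if (++pending == pageSize) {
            flush();
        }
    }

    // Flushes pending entities, if any; call once more after the last write for a partial page.
    void flush() {
        if (pending > 0) {
            flushAndClear.run();
            pending = 0;
        }
    }
}
```

Note that batching flushes alone does not enable JDBC statement batching. In Hibernate that is configured separately through the `hibernate.jdbc.batch_size` property.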
I changed the system so that it no longer uses the sequence generator during mass persistence. For mass persistence operations (dataset import, match pair stat / half stat persistence, record link persistence), SOEMPI now assigns the IDs with a simple counter. This can avoid an internal DB select call for the next sequence number. The change affects all mass persistence (both Person and link), but it brought no notable speed change.
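A counter-based ID assigner along these lines could look like the sketch below. `BulkIdAssigner` is a hypothetical name, and seeding it from the current maximum ID (for example via `SELECT max(id)` on the target table) is an assumption; how SOEMPI seeds its counter is not stated. The scheme is only safe if no other writer inserts into the same table while the counter is in use.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: hands out IDs from an in-memory counter instead of
// asking the database sequence for each row. Assumes exclusive write access.
final class BulkIdAssigner {
    private final AtomicLong next;

    // currentMaxId is assumed to come from the target table before the bulk run.
    BulkIdAssigner(long currentMaxId) {
        this.next = new AtomicLong(currentMaxId + 1);
    }

    // Returns the next ID; thread-safe and needs no database round trip.
    long nextId() {
        return next.getAndIncrement();
    }

    // Reserves a contiguous block of count IDs and returns the first one.
    long reserve(int count) {
        return next.getAndAdd(count);
    }
}
```

An alternative that keeps the database sequence is a larger allocation size on the generator (JPA `@SequenceGenerator(allocationSize = ...)`), which fetches a block of values per round trip.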
I changed the textual vector columns in PersonLink/person_link from the old "text" type to varchar(65536). This was a schema-only change and brought no notable improvement.
In the case of a CBF/RBF match there is only one field to match, so the binary and continuous vector text is redundant: the weight (double) field already carries that information. In this case I no longer generate or persist those vectors.
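Skipping the redundant vectors for single-field matches might look like the sketch below. `LinkVectors` and its fields are hypothetical, not SOEMPI's PersonLink API, and the encodings are assumptions made for illustration: the binary vector is taken to be one '1'/'0' character per field (score at or above a threshold), and the continuous vector a comma-separated list of per-field scores.

```java
// Hypothetical sketch: build the textual vectors only when more than one field is compared.
final class LinkVectors {
    final double weight;
    final String binaryVector;     // null when redundant (single-field match)
    final String continuousVector; // null when redundant (single-field match)

    private LinkVectors(double weight, String binaryVector, String continuousVector) {
        this.weight = weight;
        this.binaryVector = binaryVector;
        this.continuousVector = continuousVector;
    }

    static LinkVectors of(double[] fieldScores, double weight, double threshold) {
        if (fieldScores.length == 1) {
            // Single-field match (e.g. CBF/RBF): weight already carries the information.
            return new LinkVectors(weight, null, null);
        }
        StringBuilder bin = new StringBuilder(fieldScores.length);
        StringBuilder cont = new StringBuilder();
        for (int i = 0; i < fieldScores.length; i++) {
            // Assumed encoding: '1' if the field score reaches the threshold, else '0'.
            bin.append(fieldScores[i] >= threshold ? '1' : '0');
            if (i > 0) cont.append(',');
            cont.append(fieldScores[i]);
        }
        return new LinkVectors(weight, bin.toString(), cont.toString());
    }
}
```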
The main question: why is Person persistence so much faster than link persistence?
Things to try: