Record link and match pair stat / half stat persistence is unreasonably slow #3

Open
MrCsabaToth opened this issue Jun 30, 2013 · 0 comments

@MrCsabaToth

SOEMPI can import a 10K record set in about 5 seconds, and that includes reading the data from the flat file, tokenizing the strings, constructing Person objects, and persisting them. For some reason the persistence of record links is 10-100 times slower (persisting half a million record pairs takes about an hour), even though a record pair holds less data than a Person.
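To put the two figures on the same scale, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above (10K persons in about 5 seconds, 500K pairs in about an hour); the class and method names are made up for illustration.

```java
public class ThroughputEstimate {
    /** Items processed per second for a given item count and elapsed time. */
    static double perSecond(long items, double seconds) {
        return items / seconds;
    }

    public static void main(String[] args) {
        // Figures quoted in the issue text, both approximate.
        double personRate = perSecond(10_000, 5.0);     // full import, including file parsing
        double linkRate = perSecond(500_000, 3600.0);   // link persistence only

        System.out.printf("persons per second: %.0f%n", personRate);
        System.out.printf("links per second:   %.0f%n", linkRate);
        System.out.printf("slowdown factor:    %.1f%n", personRate / linkRate);
    }
}
```

With these round numbers the slowdown comes out near the low end of the 10-100 range, and the Person figure also includes the file reading and tokenization, so the gap in pure persistence time is probably larger.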
What has been done so far:

  • From system monitoring I can rule out CPU or IO saturation.
  • SOEMPI has long gathered read and write operations into batches. The batch size is determined by Constants.PAGE_SIZE. This minimizes Hibernate flush calls: flush is called only once per PAGE_SIZE entities.
  • Changed the system so that it does not use the sequence generator during mass persistence. In case of mass persistence operations (dataset import, match pair stat / half stat persistence, record link persistence) SOEMPI assigns the ids using a simple counter. This may avoid a DB-internal select call for the next sequence number. The change affects all mass persistence (both Person and link) and did not bring a notable speed change.
  • Changed the textual vector information in PersonLink/person_link from the old "text" type to varchar(65536). This was a schema-only change and did not bring a notable improvement.
  • In case of a CBF/RBF match there is only one field to match, so the binary and continuous vector textual information is redundant: the weight (double) field already carries that information. In this case I no longer generate or persist those vectors.
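For reference, the batching and counter-based id assignment described in the list above can be sketched roughly as follows. This is not SOEMPI's actual code: `Sink`, `CountingSink`, `Link` and its fields are hypothetical stand-ins (the `Sink` plays the role of the Hibernate session), and the page size of 1000 is an assumed value rather than the real Constants.PAGE_SIZE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class BatchedPersistence {
    /** Hypothetical stand-in for the persistence session. */
    interface Sink<T> {
        void persist(T entity);
        void flush();
    }

    /** Hypothetical link entity; the fields are illustrative only. */
    static final class Link {
        long id;
        final long leftPersonId;
        final long rightPersonId;
        final double weight;

        Link(long leftPersonId, long rightPersonId, double weight) {
            this.leftPersonId = leftPersonId;
            this.rightPersonId = rightPersonId;
            this.weight = weight;
        }
    }

    /** Test double that only counts calls instead of touching a database. */
    static final class CountingSink implements Sink<Link> {
        int persisted;
        int flushed;
        @Override public void persist(Link entity) { persisted++; }
        @Override public void flush() { flushed++; }
    }

    /**
     * Assigns ids from a plain counter (no sequence lookup) and flushes
     * once per page. Returns the number of flush calls issued.
     */
    static int persistAll(List<Link> links, Sink<Link> sink, AtomicLong idCounter, int pageSize) {
        int flushes = 0;
        int pending = 0;
        for (Link link : links) {
            link.id = idCounter.incrementAndGet();
            sink.persist(link);
            if (++pending == pageSize) {
                sink.flush();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the final partial page
            sink.flush();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            links.add(new Link(i, i + 1, 0.5));
        }
        CountingSink sink = new CountingSink();
        int flushes = persistAll(links, sink, new AtomicLong(), 1000); // assumed page size
        System.out.println("persisted=" + sink.persisted + " flushes=" + flushes);
    }
}
```

Since Person and link persistence would share this same path, the sketch also illustrates why these two changes alone are unlikely to explain the speed difference between them.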

The main question: why is Person persistence so much faster than link persistence?

Things to try:
