The UpdateStream will send updates to a SolrCloud collection. It will wrap a TupleStream and, as it iterates that stream, send each Tuple to be indexed as a document in the collection. This will allow developers to build new data sets by combining and transforming TupleStreams.
Documents will be routed directly to the correct SolrCloud leader using techniques similar to CloudSolrServer. The documents themselves will be sent using ConcurrentUpdateSolrServer so that updates can be streamed rather than batched.
Because the UpdateStream can wrap any TupleStream, it can also wrap custom TupleStreams that pull data from other sources such as RDBMSs or NoSQL engines. This provides a generalized streaming ETL framework.
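The wrapping idea described above can be sketched in plain Java, without any Solr dependency. This is only an illustration of the decorator pattern under stated assumptions: `SimpleTupleStream`, `ListStream`, and `SimpleUpdateStream` are hypothetical names invented for this sketch, tuples are modeled as plain maps, and the "indexer" is any `Consumer` rather than a real SolrCloud client.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class UpdateStreamSketch {

    /** Hypothetical stand-in for a tuple stream: read() returns null at end of stream. */
    interface SimpleTupleStream {
        Map<String, Object> read();
    }

    /** Source backed by an in-memory list; a custom source could read from an RDBMS or NoSQL store instead. */
    static class ListStream implements SimpleTupleStream {
        private final Iterator<Map<String, Object>> it;

        ListStream(List<Map<String, Object>> tuples) {
            this.it = tuples.iterator();
        }

        @Override
        public Map<String, Object> read() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Decorator: hands every tuple read from the wrapped stream to an indexing sink. */
    static class SimpleUpdateStream implements SimpleTupleStream {
        private final SimpleTupleStream source;
        private final Consumer<Map<String, Object>> indexer; // assumption: stands in for a Solr client

        SimpleUpdateStream(SimpleTupleStream source, Consumer<Map<String, Object>> indexer) {
            this.source = source;
            this.indexer = indexer;
        }

        @Override
        public Map<String, Object> read() {
            Map<String, Object> tuple = source.read();
            if (tuple != null) {
                indexer.accept(tuple); // send the tuple on as a document
            }
            return tuple;
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> indexed = new ArrayList<>();
        SimpleTupleStream source = new ListStream(List.of(Map.of("id", "1"), Map.of("id", "2")));
        SimpleTupleStream update = new SimpleUpdateStream(source, indexed::add);

        // Draining the outer stream is what drives the indexing.
        while (update.read() != null) {
            // nothing to do per tuple in this sketch
        }
        System.out.println(indexed.size() + " documents sent to the indexer");
    }
}
```

Because the wrapper only depends on the stream interface, any source that can produce tuples can be placed underneath it, which is the property the ETL use case relies on.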
Added initial implementation to the helio_ustream branch. fdf85a0
It is not working yet, but it gives the basic idea. The initial code uses CloudSolrServer as the indexer.
The next step is to work on the Tuples that are returned from the read() method after each batch. These Tuples will report on the progress of the indexing. I think it makes sense to report the number of batches indexed, the number of batches in the queue, and the error count.
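One possible shape for those progress Tuples is sketched below in plain Java. It is not the code on the branch: `BatchingUpdater`, the field names `batchesIndexed` and `errorCount`, and the `Consumer`-based indexer are all assumptions made for illustration. The sketch indexes each batch synchronously, so there is no queue to measure and the queue depth is left out of the status tuple.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class BatchStatusSketch {

    /** Hypothetical updater: each read() indexes one batch and returns a status tuple. */
    static class BatchingUpdater {
        private final Iterator<Map<String, Object>> source;
        private final Consumer<List<Map<String, Object>>> indexer; // assumption: stands in for a Solr client
        private final int batchSize;
        private long batchesIndexed = 0;
        private long errorCount = 0;

        BatchingUpdater(Iterator<Map<String, Object>> source,
                        Consumer<List<Map<String, Object>>> indexer,
                        int batchSize) {
            this.source = source;
            this.indexer = indexer;
            this.batchSize = batchSize;
        }

        /** Returns cumulative counts after the next batch, or null once the source is exhausted. */
        Map<String, Object> read() {
            List<Map<String, Object>> batch = new ArrayList<>();
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            if (batch.isEmpty()) {
                return null;
            }
            try {
                indexer.accept(batch);
                batchesIndexed++;
            } catch (RuntimeException e) {
                errorCount++; // count the failed batch and keep going
            }
            Map<String, Object> status = new LinkedHashMap<>();
            status.put("batchesIndexed", batchesIndexed);
            status.put("errorCount", errorCount);
            return status;
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            docs.add(Map.of("id", i));
        }
        // The indexer here only inspects the batch; a real one would send it to the collection.
        BatchingUpdater updater = new BatchingUpdater(docs.iterator(), batch -> { }, 2);

        Map<String, Object> status;
        while ((status = updater.read()) != null) {
            System.out.println(status);
        }
    }
}
```

Returning cumulative counts rather than per-batch deltas means a caller that only looks at the last status tuple still sees the overall result.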