Commit

imilosk committed Aug 15, 2024
1 parent f2ef383 commit f510ee1
Showing 2 changed files with 24 additions and 24 deletions.
24 changes: 12 additions & 12 deletions site/2.html
@@ -147,6 +147,18 @@
</header>
<main id="main">
<section aria-label="Blog post list">
<a href="https://ayende.com/blog/201442-B/indexing-only-recent-data-adventures-with-large-datasets-archiving" target="_blank"><h1 class="title mb-6">Indexing only recent data - adventures with large datasets &amp; archiving</h1></a>
<p class="mb-2">by Oren Eini</p>
<p class="mb-6 flex gap-1.5">
<span>
<svg width="1.25rem" fill="currentColor" viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"><path
xmlns="http://www.w3.org/2000/svg"
d="M12 4C7.58172 4 4 7.58172 4 12C4 16.4183 7.58172 20 12 20C16.4183 20 20 16.4183 20 12C20 7.58172 16.4183 4 12 4ZM2 12C2 6.47715 6.47715 2 12 2C17.5228 2 22 6.47715 22 12C22 17.5228 17.5228 22 12 22C6.47715 22 2 17.5228 2 12ZM12 6C12.5523 6 13 6.44772 13 7V11.5858L15.7071 14.2929C16.0976 14.6834 16.0976 15.3166 15.7071 15.7071C15.3166 16.0976 14.6834 16.0976 14.2929 15.7071L11.2929 12.7071C11.1054 12.5196 11 12.2652 11 12V7C11 6.44772 11.4477 6 12 6Z"></path></svg>
</span>
posted on: July 26, 2024
</p>
<p class="max-w-full w-full line-clamp-5 text-justify mb-20">We recently got a support request from a user in which they had the following issue:We have an index that is using way too much disk space. We don&#x2019;t need to search the entire dataset, just the most recent documents. Can we do something like this?from d in docs.Events&#xD;&#xA;where d.CreationDate &gt;= DateTime.UtcNow.AddMonths(-3)&#xD;&#xA;select new { d.CreationDate, d.Content };The idea is that only documents from the past 3 months would be indexed, while older documents would be purged from the index but still retained. The actual problem is that this is a full-text search index, and the actual data size required to perform a full-text search across the entire dataset is higher than just storing the documents (which can be easily compressed). This is a great example of an XY problem. The request was to allow access to the current date during the indexing process so the index could filter out old documents. However, that is actually something that we explicitly&#xA0;prevent. The problem is that the current date isn&#x2019;t really meaningful when we talk about indexing. The indexing time isn&#x2019;t really relevant for filtering or operations, since it has no association with the actual data. The date of a document and the time it was indexed are completely unrelated. I might update a document (and thus re-index it) whose CreationDate is far in the past. That would filter it out from the index. However, if we didn&#x2019;t&#xA0;update the document, it would be retained indefinitely, since the filtering occurs only at indexing time.Going back to the XY problem, what is the user trying to solve? They don&#x2019;t want to index all data, but they do want to retain it forever. So how can we achieve this with RavenDB?Data Archiving in RavenDBOne of the things we aim to do with RavenDB is ensure that we have a good fit for most common scenarios, and archiving is certainly one of them. 
In RavenDB 6.0 we added explicit support for Data Archiving.When you save a document, all you need to do is add a metadata element: @archive-at&#xA0;and you are set. For example, take a look at the following document:{&#xD;&#xA; &quot;Name&quot;: &quot;Wilman Kal&quot;,&#xD;&#xA; &quot;Phone&quot;: &quot;90-224 8888&quot;,&#xD;&#xA; &quot;@metadata&quot;: {&#xD;&#xA; &quot;@archive-at&quot;: &quot;2024-11-01T12:00:00.000Z&quot;,&#xD;&#xA; &quot;@collection&quot;: &quot;Companies&quot;,&#xD;&#xA; }&#xD;&#xA;}This document is set to be archived on Nov 1st, 2024. What does that mean? From that day on, RavenDB will automatically mark it as an archived document, meaning it will be stored in a compressed format and excluded from indexing by default.In fact, this exact scenario is detailed&#xA0;in the documentation. You can decide (on a per-index basis) whether to include archived documents in the index. This gives you a very high level of flexibility without requiring much manual effort. In short, for this scenario, you can simply tell RavenDB when to archive the document and let RavenDB handle the rest. RavenDB will do the right thing for you.</p>
<a href="https://ayende.com/blog/201441-B/cryptographically-impossible-bug-hunt" target="_blank"><h1 class="title mb-6">Cryptographically impossible bug hunt</h1></a>
<p class="mb-2">by Oren Eini</p>
<p class="mb-6 flex gap-1.5">
@@ -207,18 +219,6 @@
posted on: July 22, 2024
</p>
<p class="max-w-full w-full line-clamp-5 text-justify mb-20">Learn how to integrate AI into your .NET applications with Prompty, a powerful Visual Studio Code extension.</p>
<a href="https://devblogs.microsoft.com/dotnet/introducing-core-wcf-and-wcf-client-azure-queue-storage-bindings-for-dotnet/" target="_blank"><h1 class="title mb-6">Introducing CoreWCF and WCF Client Azure Queue Storage bindings for .NET</h1></a>
<p class="mb-2">by Subhrajit Saha</p>
<p class="mb-6 flex gap-1.5">
<span>
<svg width="1.25rem" fill="currentColor" viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"><path
xmlns="http://www.w3.org/2000/svg"
d="M12 4C7.58172 4 4 7.58172 4 12C4 16.4183 7.58172 20 12 20C16.4183 20 20 16.4183 20 12C20 7.58172 16.4183 4 12 4ZM2 12C2 6.47715 6.47715 2 12 2C17.5228 2 22 6.47715 22 12C22 17.5228 17.5228 22 12 22C6.47715 22 2 17.5228 2 12ZM12 6C12.5523 6 13 6.44772 13 7V11.5858L15.7071 14.2929C16.0976 14.6834 16.0976 15.3166 15.7071 15.7071C15.3166 16.0976 14.6834 16.0976 14.2929 15.7071L11.2929 12.7071C11.1054 12.5196 11 12.2652 11 12V7C11 6.44772 11.4477 6 12 6Z"></path></svg>
</span>
posted on: July 18, 2024
</p>
<p class="max-w-full w-full line-clamp-5 text-justify mb-20">The initial beta release of the official libraries Microsoft.CoreWCF.Azure.StorageQueues and Microsoft.WCF.Azure.StorageQueues.Client library for .NET is now available.</p>
<a href="https://ayende.com/blog/201409-A/temporal-cattle-and-other-important-jargon" target="_blank"><h1 class="title mb-6">Temporal cattle and other important jargon</h1></a>
<p class="mb-2">by Oren Eini</p>
<p class="mb-6 flex gap-1.5">
24 changes: 12 additions & 12 deletions site/index.html
@@ -147,6 +147,18 @@
</header>
<main id="main">
<section aria-label="Blog post list">
<a href="https://devblogs.microsoft.com/dotnet/dotnet-conf-2024-celebrating-the-release-of-dotnet-9-save-the-date/" target="_blank"><h1 class="title mb-6">.NET Conf 2024 &#x2013; Celebrating the Release of .NET 9! &#x2013; Save the Date!</h1></a>
<p class="mb-2">by Mehul Harry</p>
<p class="mb-6 flex gap-1.5">
<span>
<svg width="1.25rem" fill="currentColor" viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"><path
xmlns="http://www.w3.org/2000/svg"
d="M12 4C7.58172 4 4 7.58172 4 12C4 16.4183 7.58172 20 12 20C16.4183 20 20 16.4183 20 12C20 7.58172 16.4183 4 12 4ZM2 12C2 6.47715 6.47715 2 12 2C17.5228 2 22 6.47715 22 12C22 17.5228 17.5228 22 12 22C6.47715 22 2 17.5228 2 12ZM12 6C12.5523 6 13 6.44772 13 7V11.5858L15.7071 14.2929C16.0976 14.6834 16.0976 15.3166 15.7071 15.7071C15.3166 16.0976 14.6834 16.0976 14.2929 15.7071L11.2929 12.7071C11.1054 12.5196 11 12.2652 11 12V7C11 6.44772 11.4477 6 12 6Z"></path></svg>
</span>
posted on: August 14, 2024
</p>
<p class="max-w-full w-full line-clamp-5 text-justify mb-20">Announcing .NET Conf 2024 - a free, three-day virtual developer event that celebrates the release of .NET 9.</p>
<a href="https://devblogs.microsoft.com/dotnet/azure-ai-model-catalog-dotnet-inference-sdk/" target="_blank"><h1 class="title mb-6">Introducing the Azure AI Inference SDK: Access More AI Models with the Azure AI Model Catalog</h1></a>
<p class="mb-2">by Luis Quintanilla</p>
<p class="mb-6 flex gap-1.5">
@@ -255,18 +267,6 @@
posted on: July 29, 2024
</p>
<p class="max-w-full w-full line-clamp-5 text-justify mb-20">Learn how to get started creating bindings with Native Library Interop by following this example binding native Chart libraries in a .NET MAUI application.</p>
<a href="https://ayende.com/blog/201442-B/indexing-only-recent-data-adventures-with-large-datasets-archiving" target="_blank"><h1 class="title mb-6">Indexing only recent data - adventures with large datasets &amp; archiving</h1></a>
<p class="mb-2">by Oren Eini</p>
<p class="mb-6 flex gap-1.5">
<span>
<svg width="1.25rem" fill="currentColor" viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"><path
xmlns="http://www.w3.org/2000/svg"
d="M12 4C7.58172 4 4 7.58172 4 12C4 16.4183 7.58172 20 12 20C16.4183 20 20 16.4183 20 12C20 7.58172 16.4183 4 12 4ZM2 12C2 6.47715 6.47715 2 12 2C17.5228 2 22 6.47715 22 12C22 17.5228 17.5228 22 12 22C6.47715 22 2 17.5228 2 12ZM12 6C12.5523 6 13 6.44772 13 7V11.5858L15.7071 14.2929C16.0976 14.6834 16.0976 15.3166 15.7071 15.7071C15.3166 16.0976 14.6834 16.0976 14.2929 15.7071L11.2929 12.7071C11.1054 12.5196 11 12.2652 11 12V7C11 6.44772 11.4477 6 12 6Z"></path></svg>
</span>
posted on: July 26, 2024
</p>
<p class="max-w-full w-full line-clamp-5 text-justify mb-20">We recently got a support request from a user in which they had the following issue:We have an index that is using way too much disk space. We don&#x2019;t need to search the entire dataset, just the most recent documents. Can we do something like this?from d in docs.Events&#xD;&#xA;where d.CreationDate &gt;= DateTime.UtcNow.AddMonths(-3)&#xD;&#xA;select new { d.CreationDate, d.Content };The idea is that only documents from the past 3 months would be indexed, while older documents would be purged from the index but still retained. The actual problem is that this is a full-text search index, and the actual data size required to perform a full-text search across the entire dataset is higher than just storing the documents (which can be easily compressed). This is a great example of an XY problem. The request was to allow access to the current date during the indexing process so the index could filter out old documents. However, that is actually something that we explicitly&#xA0;prevent. The problem is that the current date isn&#x2019;t really meaningful when we talk about indexing. The indexing time isn&#x2019;t really relevant for filtering or operations, since it has no association with the actual data. The date of a document and the time it was indexed are completely unrelated. I might update a document (and thus re-index it) whose CreationDate is far in the past. That would filter it out from the index. However, if we didn&#x2019;t&#xA0;update the document, it would be retained indefinitely, since the filtering occurs only at indexing time.Going back to the XY problem, what is the user trying to solve? They don&#x2019;t want to index all data, but they do want to retain it forever. So how can we achieve this with RavenDB?Data Archiving in RavenDBOne of the things we aim to do with RavenDB is ensure that we have a good fit for most common scenarios, and archiving is certainly one of them. 
In RavenDB 6.0 we added explicit support for Data Archiving.When you save a document, all you need to do is add a metadata element: @archive-at&#xA0;and you are set. For example, take a look at the following document:{&#xD;&#xA; &quot;Name&quot;: &quot;Wilman Kal&quot;,&#xD;&#xA; &quot;Phone&quot;: &quot;90-224 8888&quot;,&#xD;&#xA; &quot;@metadata&quot;: {&#xD;&#xA; &quot;@archive-at&quot;: &quot;2024-11-01T12:00:00.000Z&quot;,&#xD;&#xA; &quot;@collection&quot;: &quot;Companies&quot;,&#xD;&#xA; }&#xD;&#xA;}This document is set to be archived on Nov 1st, 2024. What does that mean? From that day on, RavenDB will automatically mark it as an archived document, meaning it will be stored in a compressed format and excluded from indexing by default.In fact, this exact scenario is detailed&#xA0;in the documentation. You can decide (on a per-index basis) whether to include archived documents in the index. This gives you a very high level of flexibility without requiring much manual effort. In short, for this scenario, you can simply tell RavenDB when to archive the document and let RavenDB handle the rest. RavenDB will do the right thing for you.</p>
<div class="button flex justify-between">
<span class="back invisible arrow"></span>
