Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: Simplify PartialPublish & Discard (for now) to NOT select dimensions and have custom events #5385

Merged

Conversation

mhsdesign
Copy link
Member

@mhsdesign mhsdesign commented Nov 29, 2024

Partially fixes: #4614 (as we would publish the node move that is marked as moved in some random dimension)
Partially fixes: #5387
See also slack: https://neos-project.slack.com/archives/C04PYL8H3/p1732899328791309

Why the NodeIdentifiersToPublishOrDiscard is removed from the event

Only with f2c66e0
the NodeIdentifiersToPublishOrDiscard was introduced to the WorkspaceWasPartiallyPublished and DiscardIndividualNodesFromWorkspace events. Previously they did not contain the information which "filter" was used to create the events. And strictly speaking - for the ESCR core content graph this information on the event is purely cosmetic.

The only use currently is the asset usage catchup hook that uses the node ids and their dimension if a workspace was discarded to remove usage data:

WorkspaceWasPartiallyDiscarded::class => $this->discardNodes($eventInstance->getWorkspaceName(), $eventInstance->discardedNodes),

One could also optimise the content cache flusher for the partial discard case to not flush the whole workspace but only partial entries:

https://github.com/neos/neos-development-collection/pull/5160/files#diff-008f3460b749dbf869e63ba0602a1501990fa09cc37cd7fdaca6afa3789017e4R143

As we currently flush the whole workspace on discard:

$this->scheduleCacheFlushJobForWorkspaceName($this->contentRepositoryId, $eventInstance->workspaceName);

But this is not worth to optimise as we discussed: #5175 (comment)

So as this is the only - discussable use-case - we dont want to make the event creation more complicated. Instead of gathering the information which nodes were really affected and which dimensions we just fully leave out this information in the event. (Like we do also with other events where we don't specify which nodes were all actually removed in NodeAggregateWasRemoved but rely on that the projection does what it needs).

First i tried to extract the information for the $publishedNodes and $discardedNodes as shown in this pr in extractNodeAggregateIdsAndAffectedDimensionSpacePoints #5386 but its problems are further described there.

Further treating a partial and full discard the same also simplifies the catchup hooks.

Why the distinction between full and partial publish is removed from the event

Without the need to specify the NodeIdentifiersToPublishOrDiscard (see above) there is no reason to differentiate between the events at all.

Thus we upcast the the partial publish and discard events to a publish or discard inside fromArray:

public static function fromArray(array $values): EventInterface
{
    return new WorkspaceWasPublished(
        WorkspaceName::fromString($values['sourceWorkspaceName']),
        WorkspaceName::fromString($values['targetWorkspaceName']),
        ContentStreamId::fromString($values['newSourceContentStreamId']),
        ContentStreamId::fromString($values['previousSourceContentStreamId']),
    );
}

Both events share the same properties their counter parts have.
Well keep the $publishedNodes and $discardedNodes fields in the database as they don't do harm.

WorkspaceWasPartiallyPublished -> WorkspaceWasPublished

  • WorkspaceName $sourceWorkspaceName
  • WorkspaceName $targetWorkspaceName
  • ContentStreamId $newSourceContentStreamId
  • ContentStreamId $previousSourceContentStreamId
  • ignore: NodeIdsToPublishOrDiscard $publishedNodes

WorkspaceWasPartiallyDiscarded -> WorkspaceWasDiscarded

  • WorkspaceName $workspaceName
  • ContentStreamId $newContentStreamId
  • ContentStreamId $previousContentStreamId
  • ignore: NodeIdsToPublishOrDiscard $discardedNodes

To allow to distinct what events are part of a (partial) publish or not, instead the causation id will be used in the future: #3887 (to be implemented in a followup)

Upgrade instructions

TODO:

  • adjust tests that test that partial publish is a full publish
  • check against left todos!

Review instructions

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

@kitsunet
Copy link
Member

Still don't like it, but probably best way forward for now.

@mhsdesign mhsdesign changed the title BUGFIX: Partial publish & delete always across all dimensions (for now) BUGFIX: Partial publish & discard always across all dimensions (for now) Nov 30, 2024
the logic is carried out inside `RebaseableCommands::commandMatchesAtLeastOneNode`

The reasoning behind this is that asking the commands which node aggregate id is affected _does_ work but the capabilities stop if we want to ask which dimensions are affected.

For that we must use the event instead - at least in the future if we want to reintroduce dimension based publishing again. The matching list from commands to node id will also be simplified once we use the `EmbedsNodeAggregateId` interface instead and operate on events.
…stead of `NodeIdsToPublishOrDiscard`

... as a 3rd option the followup commit will just flush the full asset usage workspace!
…arded in the events (for now)

Introduced with neos@f2c66e0 - its not important for the graph projection and only used by the asset usage catchup hook which can be adjusted.

If we want to re-introduce `$publishedNodes` to the event should be also careful that we calculate them from the _actual_ events so we can know for sure there is no gibberish in the event.
Previously for example discarding a non-existing node or a node in a not valid DSP would just save that directly into the event, which is not correct as that did not happen like that. It was merely a bad intention.
Instead of fine granular updating only the usage references on the nodes that were discarded, well throw away everything and then rely on the reapplied events what is kept.

This is more consistent to how other parts react to publishing like the graph just forks a new content stream and doesnt care about the old data, the change projection behaves similar to the graph.
And for the content cache flusher we had the discussion to use `discardedNodes` but opted against it and do a full flush there as well: neos#5175 (comment)

This change allows us to get rid of the `discardedNodes` property (with dimensions) in the `WorkspaceWasPartiallyDiscarded` event.
@mhsdesign mhsdesign force-pushed the bugfix/publish-only-across-all-dimensions branch from f8ca75f to e4ad10b Compare November 30, 2024 20:13
reasoning:

- the `publishedNodeIds` were removed on the event, as we dont need it, and it would rise a discussion if we need to keep dimension space points there too or not: see neos#5386 how i archived it.

- instead of now trying to make `WorkspaceWasPartiallyPublished` contain all the information we dont use or need, the idea is to make it dumb. Now its basically just a `WorkspaceWasPublished` and then the other remaining events are applied as if they were carried out by a user.
If we need to add more metadata to the partial publish at some point, that would need a new event id say as we woouldnt be able to mess with the `WorkspaceWasPartiallyPublished` once released.
So by just combining them into one we have the option open to create a `WorkspaceWasPartiallyPublished` again but differently which wil be breaking too but differently. Its just a new event then, no need to migrate data.
@mhsdesign
Copy link
Member Author

Soooo i stripped this pr a little futher to its needed parts. And its beautiful. +130 lines -400lines. With the last controversial commit another -200 lines.
So a lot complexity is gone ^^ the parts that were perviously in here - extracting the node ids and their affected dsp as metadata for the event - are now part of #5386
This 'feature' is unused (expect on place in the asset usage) and fully untested in the core.

Adjustment in the asset usage:

TASK: Simplify asset usage catchup hook to not optimise partial discard
Instead of fine granular updating only the usage references on the nodes that were discarded, well throw away everything and then rely on the reapplied events what is kept.

This is more consistent to how other parts react to publishing like the graph just forks a new content stream and doesnt care about the old data, the change projection behaves similar to the graph.
And for the content cache flusher we had the discussion to use discardedNodes but opted against it and do a full flush there as well: #5175 (comment)

This change allows us to get rid of the discardedNodes property (with dimensions) in the WorkspaceWasPartiallyDiscarded event.

@mhsdesign mhsdesign changed the title BUGFIX: Partial publish & discard always across all dimensions (for now) BUGFIX: Simplify PartialPublish & Discard (for now) to NOT select dimensions and have custom events Nov 30, 2024
@mhsdesign mhsdesign marked this pull request as ready for review December 1, 2024 08:46
@dlubitz
Copy link
Contributor

dlubitz commented Dec 5, 2024

I don't understand, why we need to remove the "WasPartially*" Events. This is exactly what we shouldn't do in a event sourced system. I see the point, that we don't need to handle that anymore, but this shouldn't be the way to go.

@mhsdesign
Copy link
Member Author

Its not the classy thing but were beyond that - especially as its beta id say. The point why i exactly removed it are:

  • a replay is NOT necessary and the migration is only needed for if you want to replay some time - so the current state of the db is fine
  • at some point we might reintroduce the WasPartially events with proper metdata and information, and to avoid having WasPartially_2 then its just easier to get rid of them now as we dont need them
  • its easy to forget implementing one of the events like PublishWorkspace or the partial thing, to make it simpler to write catchup hooks its just sane to have one event

i also discussed this approach with christian and Sebastian and thing its the right one? - also in terms of reducing complexity for once.

Copy link
Member

@kitsunet kitsunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as I said before I am not happy at all with this, but it's probably safer for now. The "lost" information is minimal, I don't see a scenario in which we retroactively would want, need to do something based on the partially (published|discarded) events.

Reverts d2d9faa partially

Initially this change was done to fix neos#5303
But that was a hoax see neos#5303 (comment)

we always emit the `WorkspaceWasPublished` event and this just simplifies the code-path again.

For the discard all case, the code is stil a little easier to read _with_ the condition.
…t runtime

We cannot erase history just because we want to.
Especially because this does not fix any bugs.
…d spread around

This overhauls the `NodeAggregateWasDisabled` upcasting from

neos#4659
@mhsdesign
Copy link
Member Author

Okay i give in. @dlubitz youre right we dont need to rewrite history if we dont need to. i relaxed the fromArray to allow to return other objects than self. Now its happening at runtime.

Also the content stream pruner will still select the legacy events in the query having in mind, that they will be upcasted.

@mhsdesign mhsdesign requested a review from dlubitz December 10, 2024 10:25
@bwaidelich
Copy link
Member

I'm late to the party and only skimmed through the code so far.
I fail to understand why those Partially events ought to be removed.
IMO there's a crucial domain-specific difference between a full and a partial publish/discard. And this difference should not only be reflected by some metadata in the events.
But most probably I misunderstood things

This was done do ease debugging and still preserve the intention.
@mhsdesign
Copy link
Member Author

As discussed today, i added a partial flag to the events as we want keep the main idea to reduce the complexity (only one public event)

See: TASK: Introduce "partial" flag on publish events

@kitsunet
Copy link
Member

I guess we can merge this then?

@mhsdesign mhsdesign merged commit 088edbd into neos:9.0 Dec 12, 2024
9 checks passed
@mhsdesign mhsdesign deleted the bugfix/publish-only-across-all-dimensions branch December 12, 2024 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants