Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using SetLink for search results! #1502

Closed
linas opened this issue Dec 22, 2017 · 35 comments
Closed

Stop using SetLink for search results! #1502

linas opened this issue Dec 22, 2017 · 35 comments

Comments

@linas
Copy link
Member

linas commented Dec 22, 2017

SetLink has a number of technical issues with respect to the core concept of atoms and how atoms should be used. Basically, SetLink is a bad citizen. The only answer I can find is to simply excommunicate, banish SetLink from the Kingdom. The correct solution is to use MemberLink instead. (See however, other ideas below.)

Unfortunately, BindLink and GetLink, when executed, return SetLink's containing the search results. This needs to be replaced by MemberLink, and specifically, I think the following form might work best:

MemberLink
    SomeSearchResultAtom
    AnchorNode "pattern-9a30bd4cbdcd6"

and then have cog-execute! return the AnchorNode -- all search results can then be found, attached to this anchor.

Q: Why is SetLink bad?
A: See the wiki page for it. https://wiki.opencog.org/w/SetLink

Q: Why AnchorNode?
A: It seems like the best kind of node for this job.

Q: What other benefits (besides avoiding SetLink)?
A1: Search results can be reported incrementally, as the search proceeds, instead of waiting for one big glob to appear at the end.

A2: The SetLinks that get returned glom up the atomspace, and people usually forget to delete them. They are hard to delete - you have to actually have them in your hand to delete them - searching for them later on is really hard or impossible. By contrast, finding and deleting the AnchoreNodes is easy, and can be done at any time.

A3: Its a baby-step to converting the AtomSpace itself to a special kind of MemberLink and/or ContextLink. Having to pass the AtomSpace around in various methods is a real pain, and causes a lot of programming issues. It would be nice to just get rid of the atomspace entirely, and instead declare membership as:

AtomSpaceLink
    AtomSpaceNode "primary atomspace"
    SomeAtom

This or something like this could solve a lot of technical problems, and nuking SetLink is a natural first step.

I believe that this will simplify creating a distributed atomspace. We could layer everything on top of (for example Redis or Riak and have that do all the work of distributing atoms. I don't believe this any longer. The StorageNode is a superior solution for distributed atomspaces.

A4: If we do the above or something like it, we could merge MapLink/BindLink, since they both do almost the same thing, just that MapLink does it for its arguments, while BindLink does it for the entire atomspace. Similarly GetLink/FilterLink...

Q: Is this a lot of work?
A: You bet. BindLinks are used all over the place.

@ngeiswei
Copy link
Member

ngeiswei commented Jan 3, 2018

👍 coincidentally I could use "the AtomSpace itself to a special kind of MemberLink" for implementing an elegant as well as efficient URE-based pattern miner (EDIT: along with outputting pattern matcher results with member links, of course).

@linas
Copy link
Member Author

linas commented Feb 2, 2018

List of distributed DB alternatives:

Ignite vs Redis vs Riak: https://db-engines.com/en/system/Ignite%3BRedis%3BRiak+KV

After a super-quick skim, Ignite seems to have a better feature-set, Redis is more popular and thus stable.

This now appears to be off-topic. The StorageNode provides a superior solution. See the wiki entry for it.

@linas
Copy link
Member Author

linas commented Apr 25, 2018

See also #1507

@linas
Copy link
Member Author

linas commented Dec 24, 2018

See #1967 for details on AtomSpaceLink

@linas
Copy link
Member Author

linas commented Jan 19, 2019

To avoid cluttering the atomspace with "useless" anonymous SetLinks, one could implement a SetLink so that, when it has no incoming links, i.e. has been orphaned, then it is also removed from the atomspace. This will avoid a lot of the general clutter associated with anonymous SetLinks (they are anonymous, because they are un-named. Since there is no name, there is no way to refer to them. If they don't have any incoming set, then they really are unreferenced, and thus kind-of dead.)

@noskill
Copy link
Contributor

noskill commented Feb 21, 2019

It is possible to pass child atomspace to execute_atom thus avoiding cluttering

@linas
Copy link
Member Author

linas commented Feb 21, 2019

There's no mechanism for that. If one has (cog-evaluate! (SequentialAnd ... (Put .... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ... there's no way to fiddle in some other atomspace into that sequence. If the top-level SequentialAnd is tail-recursive, its an infinite loop, and never returns. There's no chance to sneak in there and insert some other atomsapce, or clean things up.

@linas
Copy link
Member Author

linas commented Feb 21, 2019

Just to be clear: the above example actually is "realistic"; its how the robot behaviors were being done some years ago. Its actually quite natural, and fits well with time-line editors, behavior trees, drag-n-drop GUI's etc. You just "connect the dots" to get things done, and all of that nesting was just a way of flowing data through the graph.

Now that we have Values, we have an alternative way of flowing data through the graph...

@noskill
Copy link
Contributor

noskill commented Feb 22, 2019

There is mismatch between scheme and python api. Python api requires to pass atomspace to cog-execute and cog-evaluate. We are using it a lot. I made context manager to conveniently create and set as default child atomspaces to define query in temporary atomspace too:
https://github.com/singnet/semantic-vision/blob/2681f93cf12ee0e5c1a53faef28a815d9e9c857c/experiments/opencog/cog_module/module.py#L138-L148

@linas
Copy link
Member Author

linas commented Feb 22, 2019

The python api is abandonware and I'm exhausted trying to fix the infinite number of bugs in it. I want to punch the people who created it, they have caused me huge amounts of pain.

You can certainly do temporary work in child atomspaces - that is what they were created for. However, at some point, you have to move the desired results back into the main atomspace. (since the main atomspace is what you are saving to disk, when you power-off; its also typically the atomspace you share with other machines, when doing distributed processing.) ... In principle you could save child atomspaces also, but that is currently not implemented in the postgres backend. (and since the distributed atomspace depends on postgres, you also cannot have a distributed-child-atomspace)

Anyway, the whole point of the (cog-evaluate! (SequentialAnd ... (Put .... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ... example is to remind you that, in general, you should NOT be coding in python. And you should NOT be coding in scheme. You should, in general, think "how can I represent my data flow with atoms, so that all data and flows and processing happen in the atomspace". or even better: "how can I represent my data flow with atoms, in such a way that PLN and the chainers, and the pattern miner and moses can understand my data flow, learn it, adjust it, alter it". Since PLN, the chainers, the pattern miner don't know how to read/write python code, any python code that you write is "invisible" or "inaccessible" to these subsystems.

@noskill
Copy link
Contributor

noskill commented Feb 25, 2019

Just to be clear: the above example actually is "realistic"; its how the robot behaviors were being done some years ago.

Do you still use this approach? Is there example of implementing main loop in atomese? I would certainly look more into it to use this approach in the future.

Code that i linked is intended to be released for general machine learning community. It is kind-of code generator for building neural networks during inference along the lines of #1970. It's quite hard to write in atomese, at least harder than in general purpose programming language. We don't even have a language tutorial yet. That's why main loop is in python and not in atomese. There is example of training on sum of two digits from mnist https://github.com/singnet/semantic-vision/blob/2681f93cf12ee0e5c1a53faef28a815d9e9c857c/experiments/opencog/cog_module/mnist.py
atomese describes probability of getting correct sum.

bottom line
I think code generation is the way to use atomese, at least for wider audience. And perhaps i can make something like scan operator in theano http://deeplearning.net/software/theano/library/scan.html to make loops over datasets.

@noskill
Copy link
Contributor

noskill commented Feb 25, 2019

It seems we need atomese operators to manipulate atomspaces e.g. create and delete child atomspaces. Then it will be easier to port some code from python/schema to atomese

@linas
Copy link
Member Author

linas commented Feb 25, 2019

Do you still use this approach?

Yes. Hit bug opencog/opencog#3433 only a week ago, when I broke tail recursion during the cleanup.

Is there example of implementing main loop in atomese?

Yes. /examples/atomspace/recursive-loop.scm

@linas
Copy link
Member Author

linas commented Feb 25, 2019

Code that i linked is intended to be released for general machine learning community. It is kind-of code generator for building neural networks during inference along the lines of #1970.

Well, as I stated repeatedly there, I think that using the grounded schema node is a deep and fundamental mistake. Last summer, during very long and detailed email conversations with @Necr0x0Der Alexey Potapov, we worked out, in considerable detail, exactly how to map neural nets into the atomspace. I was rather unhappy that you decided to throw these plans away, without any discussion, without any warning, for no particular reason, and create something completely new, different and incompatible with any sort of long-term roadmaps. I still do not understand why the original plan for neural nets was thrown away. I think it was a decent plan, a good plan. And I'm still deeply unhappy with the GroundedSchema design, for all the reasons that I already explained.

It's quite hard to write in atomese, at least harder than in general purpose programming language.

Atomese is not supposed to be written by human beings. That's the whole point. Only code generators should read and write Atomese. Not humans.

This means that all of the code generators need to be compatible with each-other. The original neural-net design was compatible with the existing code generators (the chainers, the miner, PLN, openpsi, ghost) The grounded-schema design is incompatible with the code generators.

We don't even have a language tutorial yet

For which language? There are 27 tutorials for atomese in /examples/atomspace and 25 more in /examples/pattern-matcher If you actually go through and study all fifty tutorials for Atomese, you shoul have a pretty good idea of how things work. If you skip the tutorials .. then, of course, things will be confusing.

@linas
Copy link
Member Author

linas commented Feb 25, 2019

It seems we need atomese operators to manipulate atomspaces e.g. create and delete child atomspaces. Then it will be easier to port some code from python/schema to atomese

Can you open a new issue to discuss this topic? I don't understand why this is needed.

@noskill
Copy link
Contributor

noskill commented Feb 26, 2019

#2105 - new issue

@linas
Copy link
Member Author

linas commented Feb 26, 2019

OK, I'm thinking that now is a good time to move forward on this. (Yes, @noskill the temp-atomspace idea works too; but there are also other reasons to change the API too -- e.g. parallelism) I think I found a simple solution. Create a new link, call it QueryLink (or SearchLink ??) that is almost identical to the current BindLink. The difference is that it returns nothing.

To provide backwards-compat, every (BindLink P Q) will be treated as (QueryLink P (MemberLink Q (AnchorNode "results"))). Then, when the QueryLink completes, the results will be gathered up in a SetLink and returned in backwards-compat fashion.

New users will be encouraged to use QueryLink instead of BindLink

By running QueryLink in it's own thread i.e. (ParallelLink (QueryLink ...)) the results will accumulate in real-time, so the user does not have to wait for all results before starting work on the first one... If user wants to know when all results have been found, then (JoinLink (QueryLink ...)) can be used.

@noskill
Copy link
Contributor

noskill commented Feb 27, 2019

examples and language tutorial/introduction are different things. Otherwise you would name that directory "tutorial" and not examples. I personally look for information in opencog wiki, and update it sometimes.

And actually our design is somewhat compatible with pln. I need to rewrite Implication and fuzzy conjunction rules yet. This in turn will give up learnable weights for links. Currently there is not way to backpropagate error to these weights.

@linas
Copy link
Member Author

linas commented Feb 27, 2019

examples and language tutorial/introduction are different things

The examples directory is the de-facto best, clearest and most complete language tutorial. We could trivially rename that directory "tutorials", if that makes you happier. The "tutorials" on the wiki are in terrible condition. First, out of the 50-or-so examples/tutorials in the examples directory, the wiki pages cover maybe 3 or 4 of them. Those were written by someone who was trying to learn the atomspace for the first time ever. And worse -- they had no long-term plan to actually do anything with the atomspace, so no motivation to make the tutorials "meaningful", illustrating how certain kinds of problems can be solved in certain ways in the atomspace. The net result is that the quality of the "tutorials" on the wiki is very low.

@noskill
Copy link
Contributor

noskill commented Mar 1, 2019

Thanks Linas, we'll get in contact with Misgana for knowledge sharing.

@noskill
Copy link
Contributor

noskill commented Mar 1, 2019

I agree on what are you saying about distinction between values and atoms. We did not use any new atom types yet. Vitaly implemented ptrvalue, so that pyobject can be attached to atom. Everything neural network related is stored in values. This value type is incompatible with other atomese datatype, but that is the price of backpropagation. It can be converted to float if needed.

@linas
Copy link
Member Author

linas commented Mar 5, 2019

that is the price of backpropagation

Well, I think you understand what I've been trying to say. Perhaps its time for me to review the ptrvalue thing again. Are you saying that you're actually doing backpropagation in the atomspace itself ? I was assuming that you're just wrapping code that does computation elsewhere (i.e. on gpu's) anyway, we should have this discussion on some other github issue, or mailing list, not here.

@noskill
Copy link
Contributor

noskill commented Mar 6, 2019

All that is discussed here is related to the issue #1970, i'll update it instead of creating new one

@stellarspot
Copy link
Contributor

stellarspot commented Jul 31, 2019

I just want to consider the option where SetLink is replaced by MemeberLink with AnchorNode.
Here is an example which finds nodes using GetLink and call LambdaLink using PutLink:

(Inheritance (Concept "B") (Concept "A"))
(Inheritance (Concept "C") (Concept "A"))
(Inheritance (Concept "D") (Concept "E"))

(display
 (cog-execute!
  (Get
   (Inheritance (Variable "$X") (Concept "A")))
  ))

(display
 (cog-execute!
  (Put
   (Lambda
    (Variable "X")
    (Inheritance
     (Variable "X")
     (Concept "W")))
   (Get
    (Inheritance (Variable "$X") (Concept "A"))))))

The current result for GetLink is:

(SetLink
   (ConceptNode "B")
   (ConceptNode "C")
)

And the result of PutLink is:

(SetLink
   (InheritanceLink
      (ConceptNode "C")
      (ConceptNode "W")
   )
   (InheritanceLink
      (ConceptNode "B")
      (ConceptNode "W")
   )
)

If I understand correctly the new proposed design suggests:
GetLink:

  • returns AnchorNode , for example (AnchorNode "pattern-#12345")
  • puts MemberLinks into AtomSpace:
MemberLink
    ConceptNode "ball-1"
    AnchorNode "pattern-#12345"

MemberLink
    ConceptNode "ball-3"
    AnchorNode "pattern-#12345"

What is expected behavior of PutLink in this case?
Should it look at the argument and if it is AnchorNode then retrieve the first atom from MemberLink outgoing set with the given AnchorNode and use it as input?

Could such functionality be added in two steps:

  1. introduce something like ParallelGetLink and ParallelPutLink which do not use SetLink at all
  2. when it works replace GetLink and PutLink by their new implementation

@linas
Copy link
Member Author

linas commented Jul 31, 2019

AnchorNode

Yes, more or less. AnchorNode is also an imperfect solution; thus I'm interested in a publish-subscribe system. #1750 which is meant to be better than AnchorNodes. One problem with Anchor nodes is that if there are two users (two threads) each grabbing one atom (dettaching it from the Anchor Node, and then "processing" it) then there are race conditions where the two threads might both get the same atom. Maybe we can fix this with a LockAnchorNode ? How would that work? What if I want the threads to sleep, waiting for data to arrive at the LockAnchorNode? Do I need a SemphoreNode?

There are other possible solutions: maybe the results of a Get/Bind should be attached to a Value, somewhere, instead of a MemberLink. So maybe have a ResultsValue that provides locking, semaphores, messaging? Or is that a bad idea?

What is expected behavior of PutLink in this case?

Backwards-compatible, in some way. It should "do the right thing", whatever that is. It is not always clear what the "right thing" is, and so some careful thinking and design and prototyping is needed. I like things that are small, simple, and have little or no negative impact on the performance of existing systems. I don't want 99% of the users to run slower, just so that 1% of the users can use some new whiz-bang feature.

Introduce something like ParallelGetLink and ParallelPutLink

Maybe. If you are really interested in this, put together a detailed design (in some other github issue) that shows exactly how this might work. I'm kind-of interested in solutions that solve big problems -- i.e. not just get/put, but also bindlink, also the queueing/dequeueing problem. The SetLink alone is itself not all that terrible -- its really a symptom of the bigger problem of "how does the data flow around?" that needs to be solved, that needs to have better solutions.

@stellarspot
Copy link
Contributor

I just want to summarize problems related to SetLink to have a deeper view on this question.
GetLink and PutLink are just used below as examples of systems which produce and consumer some results.

  1. Problems with SetLink as a set concept. It is difficult to add/remove elements to/from SetLink and find a set by pattern (present/absent elements).

Possible solution could be using MemberLink. For example using AnchorNode:

MemberLink
    ConceptNode "ball-1"
    AnchorNode "pattern-#12345"

or a set can be marked by some specialized node like SetNode:

MemberLink
    ConceptNode "ball-1"
    SetNode "set-1"
  1. GetLink can generate results in parallel. PutLink can use results in parallel.
    There are several dimensions from which the problem can be viewed.

Control flows:

  • GetLink needs to notify that all results are generated
  • PutLink can cancel results generation when they are not necessary anymore

PutLink can work as busy loop which checks for new results or this can be implemented as publisher/subscriber system where GetLink notifies subscribers that a new result is available

Results of GetLink can be stored in AtomSpace as MemberLink or it can be internally implemented program objects which are directly used by PutLink.

Possible solution where results are stored in MemberLinks:
GetLink can have SetNode as a parameter where to store results:

GetLink
  SetNode "S"
  InheritanceLink
    VariableNode "X"
    ConceptNode "A"

results:

MemberLink
    ConceptNode "a-1"
    SetNode "S"

PutLink uses SetNode as argument to know where to retrieve results from:

PutLink
  LambdaLink
    ...
  SetNode "S"

(Just a curious question. Why PutLink mentions its argument as the last atom after the body? Is not more naturally to place the argument after the variable list and before the body? )

SetNode can contain its control information as values:

  • "done"
  • "canceled"
  • "subscribers"

When GetLink finishes it sets SetNode "done" value to true.
GetLink reads "canceled" value from SetNode and if it is true stops the execution.
When new value is generated GetLink notifies one or all subscribers from "subscribers" value.

MemberLink has "lock" value. If someone needs to delete an element of the set it needs to lock the MemberLink link to not allow to read/delete it by other subscribers.
If someone needs to read a set value it must obtain the read lock form the MemberLink has "lock" value.

Another option is to not put results of GetLink into AtomSpace. In this case different program primitive like promise/future, queue or publish/subscribe systems can be used.

  1. AtomSpace cleanup.
    One of the aim to get rid of SetLink is that it pollutes the AtomSpace and it is difficult to clean it up.
    It looks like it is a general problem not only related to SetLink.
    For example ListLink can pollute the AtomSpaces as well:
PutLink
   LambdaLink
      ...
  ListLink
    Concept "A"
    Concept "B"
    SomeLink ...

There should be a unified way to remove links which are used only to pass results from one place to another.

May be some ArgumentLink which means that PutLink passes values from ArgumentLink to LambdaLink and then removes the LambdaLink.

  1. Description of PutLink declares that it is a notation for beta redex. It can mean that PutLink first puts its arguments to body and makes execution after that.

The current PutLink implementation first executes its arguments and then puts it to body.
It seems as an internal property of PutLink. New design of Get/PutLink which uses MemberLinks can't avoid it. The reason is that it is not clear before substitution if result of PutLink argument execution contains only one element or many. So it is necessary to execute the PutLink argument first.

In this case PutLink description should be revisited so it is not just a beta redex.

@linas
Copy link
Member Author

linas commented Dec 26, 2019

FYI: please note: issue #1750 describes both a QueueValue and a publish-subscribe system for distributing processing results.

@linas
Copy link
Member Author

linas commented Feb 10, 2020

Pull req #2500 implements the initial idea discussed at the very top, that began this long conversation. It's simple (very simple, actually) and it works. ...

... but in some senses, its incomplete. The above discusses the need for a QueueValue where search results could be placed on a queue, and/or some kind of promises/futures system, that would allow search results to be pulled from the pattern matcher as needed. This has not been done. However, I think these are now quite doable: they can be slotted into the code in the same way that the AnchorLink is slotted in with pull req #2500

To provide more details: the pull req #2500 allows this:

(define get-link
   (Get
      (VariableList
         (TypedVariable (Variable "$x") (Type 'ConceptNode))
         (Anchor "get-results"))
      (Present
         (Evaluation (Predicate "foo")
            (List (Variable "$x") (Concept "alpha"))))))

and the search results will be attached with MemberLinks to the specified AnchorNode. In the same way, one could instead have a FIFONode or a QueueNode or a PromiseNode, instead ot the AnchorNode above, and search results could then be attached to QueueValue on that node, instead of using the rather wasteful MemberLinks.

@linas
Copy link
Member Author

linas commented May 4, 2020

FYI, The following updates are relevant:

  • Pull req Implement a parallelizable query #2117 (Feb 2019) introduces a QueryLink which is identical to BindLink except that it returns a LinkValue of results, instead of a SetLink.
  • Pull req All search functions now use QueueValue for results. #2571 (may 2020) converts the pattern matcher to use a QueueValue under the covers. The QueueValue is a thread-safe FIFO, and in principle allows one (or more) pattern-matcher threads to report results, while one (or more) other threads to process them as they arrive. So far, nothing uses this ability, but it is now at least possible.

linas added a commit to linas/atomspace that referenced this issue May 5, 2020
@linas linas pinned this issue Jul 27, 2021
@linas
Copy link
Member Author

linas commented Dec 4, 2021

Summary and state of affairs.

  • Anchor proposal -- This suggests specifying an AnchorLink in the query, and chaining on results with MemberLink as they show up. This has been implemented and documented and unit-tested. See Post search results to an AnchorLink #2500 No one uses it. I'm not sure I like it; it does not offer elegant compositionality. It does not seem to provide a way to say "the query is now done". There seems to be some kind of need for semaphores or locking or something for the anchors.
  • AtomSpaceNode -- This suggests treating the AtomSpace as a kind of mutable Link. Thus, the query would return an AtomSpace, layered on the main space, holding the search results. Partly implemented: The AtomSpacePtr is now a kind of ValuePtr. No one is using this for anything, just yet. There is no way to automatically project/collapse contents of derived atomspaces back into the main atomspace. (You'd have to do it by hand.)
  • Compositionality. -- This comment points out that the robot behavior code uses constructs like (cog-evaluate! (SequentialAnd ... (Put .... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ... with the top-level SequentialAnd written to be tail recursive. This kind of compositional use of queries needs to be supported. None of the alternate propsals here have explored this much or at all.
  • QueryLink returning QueueValue -- implemented, works, documented, unit tested. See All search functions now use QueueValue for results. #2571 How do do compositionality with this is unexplored.
  • Generic Publish-subscribe system See Design a Publish/Subscribe System (aka Futures) #1750
  • Monads -- The PutLink and other misc links currently accept either individual atoms, or sets of atoms, as input. In the case of sets, the contained atoms are automatically unwrapped and processed. This suggests that the current use of Setlink is as a kind of sloppy monad and thus what we really need is a cleaned-up monad to hold multiple-value results. The precise cleanup is unclear.

@linas
Copy link
Member Author

linas commented Dec 4, 2021

Closing. Replaced by #2911

@linas linas closed this as completed Dec 4, 2021
@linas linas unpinned this issue Dec 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants