PoS initialization / SmesherService does not work properly (got stuck?) #4288

Closed
brusherru opened this issue Apr 12, 2023 · 4 comments

@brusherru
Member

brusherru commented Apr 12, 2023

Description

If you run the Node with valid smeshing opts in the node config (i.e. it starts smeshing on launch) and you don't have any PoS data yet, the Node starts generating it. However, it does not respond to SmesherService.StopSmeshing and keeps sending PostSetupStatusStream updates with the state STATE_IN_PROGRESS.

This behavior makes the Smapp GUI look unresponsive, for example when you click the "Pause smeshing" or "Delete PoS data" button (both call SmesherService.StopSmeshing, with different values of the deleteFiles flag).

Another symptom: if you try to close the Node, shutdown takes so long that Smapp kills the process after a timeout.

Steps to reproduce

  1. Clean ~/post (remove any existing PoS data)
  2. Add the following smeshing settings to your node config:
    {
        "smeshing-coinbase": "stest1qqqqqqp8r9d6yzz8kqc3e9t0rgtlu8jwhlhyc4gm2xaut",
        "smeshing-opts": {
            "smeshing-opts-datadir": "~/post/data",
            "smeshing-opts-maxfilesize": 2147483648,
            "smeshing-opts-numunits": 4,
            "smeshing-opts-provider": 1,
            "smeshing-opts-throttle": false
        },
        "smeshing-start": true
    }
    
  3. Run the node
  4. Call:
    grpcurl --plaintext localhost:9092 spacemesh.v1.SmesherService.PostSetupStatusStream 
    
    You'll get something like:
    {
     "status": {
       "state": "STATE_IN_PROGRESS",
       "opts": {
         "dataDir": "/Users/username/post/data",
         "numUnits": 4,
         "maxFileSize": "2147483648",
         "computeProviderId": 1
       }
     }
    }
    
    Note that the response does not contain numLabelsWritten, although it should. This might be a clue :)
  5. In another shell, call:
    grpcurl -d "{\"deleteFiles\":false}" --plaintext localhost:9092 spacemesh.v1.SmesherService.StopSmeshing
    
    The expected answer is an empty status object, but instead the request hangs.
  6. Wait for a while. I couldn't determine what exactly changes, but it seems the Node moves to another "state" and then everything works. This takes about 10-20 minutes on my machine. I've also noticed that at that point it usually starts sending numLabelsWritten, and my PoS file jumps from zero bytes to 16.8 MB :)
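A possible explanation for the missing numLabelsWritten in step 4: proto3's canonical JSON mapping omits fields that hold their default value, so a counter that is still 0 does not appear in the output at all, and 64-bit integers are rendered as strings (compare "maxFileSize": "2147483648" above). The sketch below uses Go's encoding/json `omitempty` and `string` tag options purely as an analogy for that mapping; `postStatus` and `render` are hypothetical and are not the generated spacemesh API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// postStatus is a hypothetical stand-in for the status message in the grpcurl output.
// omitempty drops the field while it is zero; string quotes the 64-bit value,
// similar to how the proto3 JSON mapping renders uint64 fields.
type postStatus struct {
	State            string `json:"state"`
	NumLabelsWritten uint64 `json:"numLabelsWritten,omitempty,string"`
}

// render marshals a status the way the analogy assumes the node's response is printed.
func render(s postStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// No batch finished yet: the counter is 0, so the key is omitted.
	fmt.Println(render(postStatus{State: "STATE_IN_PROGRESS"}))
	// At least one batch written: the key appears, quoted like maxFileSize.
	fmt.Println(render(postStatus{State: "STATE_IN_PROGRESS", NumLabelsWritten: 1048576}))
}
```

Under this reading, an absent numLabelsWritten would mean "0 labels written so far" rather than a missing field.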

Interestingly, if you repeat everything but set smeshing-start: false in step 2 and then call StartSmeshing with the same opts, everything works well (although I'm not sure it actually generates PoS data). Let's call this the "workaround" for reference :)

MAYBE this happens because I'm using the CPU as the PoS data provider (go-spacemesh constantly uses about 99-103% CPU on my machine).

Actual Behavior

  • Requests to SmesherService.StopSmeshing do nothing
  • There is no numLabelsWritten in PostSetupStatusStream
  • After some time (presumably once at least some PoS data has been generated; this is my assumption), the problem resolves itself

Expected Behavior

  • StopSmeshing stops PoS generation at any time, or at least responds promptly (and the stream stops reporting the in-progress state)
  • numLabelsWritten should increase as PoS data is generated :)

Environment


  • OS: macOS 12.3 (but I suspect this issue affects other platforms as well)
  • Node Version: 0.3.0-beta.0

Additional resources

Application logs (while it is stuck):

15:44:17.369 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":2} 
15:44:22.728 › SmesherService.StopSmeshing called:   {"deleteFiles":false}
15:44:40.867 › SmesherService.StopSmeshing called:   {"deleteFiles":false}
15:44:47.369 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":2}

Application logs (expected behavior, obtained using the "workaround" described above):

15:48:34.961 › SmesherService.StartSmeshing called:   {"coinbase":{"address":"stest1qqqqqqp8r9d6yzz8kqc3e9t0rgtlu8jwhlhyc4gm2xaut"},"opts":{"dataDir":"/Users/brusher/post/eeb970d3","numUnits":4,"maxFileSize":2147483648,"computeProviderId":1,"throttle":false}}
...
15:48:35.193 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":2,"numLabelsWritten":{"low":1048576,"high":0,"unsigned":true}} 
15:48:35.917 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":2,"numLabelsWritten":{"low":1048576,"high":0,"unsigned":true}} 
15:49:06.185 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":2,"numLabelsWritten":{"low":1048576,"high":0,"unsigned":true}} 
15:49:06.909 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":2,"numLabelsWritten":{"low":1048576,"high":0,"unsigned":true}} 
...
15:49:45.574 › SmesherService.StopSmeshing called:   {"deleteFiles":false}
15:49:45.908 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":3,"numLabelsWritten":{"low":1048576,"high":0,"unsigned":true}} 
15:49:46.184 › SmesherService, in grpc PostDataCreationProgressStream, output: {"state":3,"numLabelsWritten":{"low":1048576,"high":0,"unsigned":true}}
...
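The numLabelsWritten values in the Smapp log are printed as {low, high, unsigned} objects. Assuming these are Long.js-style 64-bit integers split into two 32-bit halves (the form protobuf.js typically uses for uint64 fields), the actual count is high·2^32 + low, i.e. 1048576 labels in these samples. A minimal Go sketch of that reassembly; `longValue` and `parseLong` are hypothetical helpers, not part of Smapp or go-spacemesh:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// longValue mirrors the {"low":..,"high":..,"unsigned":..} objects in the log.
// Assumption: Long.js-style encoding with two signed 32-bit halves.
type longValue struct {
	Low      int32 `json:"low"`
	High     int32 `json:"high"`
	Unsigned bool  `json:"unsigned"`
}

// toUint64 reassembles the value as (high << 32) | low. Each half is reinterpreted
// as uint32 first so a negative word does not sign-extend. The sketch treats every
// value as unsigned, matching "unsigned":true in the log.
func (l longValue) toUint64() uint64 {
	return uint64(uint32(l.High))<<32 | uint64(uint32(l.Low))
}

// parseLong decodes one logged object into its 64-bit value.
func parseLong(s string) (uint64, error) {
	var l longValue
	if err := json.Unmarshal([]byte(s), &l); err != nil {
		return 0, err
	}
	return l.toUint64(), nil
}

func main() {
	n, err := parseLong(`{"low":1048576,"high":0,"unsigned":true}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 1048576
}
```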
@pigmej
Member

pigmej commented Apr 13, 2023

@brusherru actually, smeshing can only be stopped once the current batch is finished. While a batch is in progress, you cannot stop it (that's how gpu-post works).

What GPU was used for the test?

Cc @piersy

@piersy piersy moved this from 🔖 Next to 🏗 Doing in Dev team kanban Apr 17, 2023
@pigmej
Member

pigmej commented Apr 17, 2023

@brusherru were you able to check this with a smaller batch size? Does it behave better then?

@piersy
Contributor

piersy commented Apr 17, 2023

On my machine, using a smaller batch size fixes this. I used the smallest batch size, 8, and smeshing seemed to stop almost immediately.

@pigmej I suggest closing in favour of spacemeshos/post#125

@pigmej
Member

pigmej commented Apr 18, 2023

yes makes sense :)

@pigmej pigmej closed this as completed Apr 18, 2023
@github-project-automation github-project-automation bot moved this from 🏗 Doing to ✅ Done in Dev team kanban Apr 18, 2023