Skip to content

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
    • Help
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
N
neo
  • Project
    • Project
    • Details
    • Activity
    • Releases
    • Cycle Analytics
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Charts
  • Merge Requests 1
    • Merge Requests 1
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
    • Charts
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Charts
  • Jobs
  • Commits
  • Kirill Smelkov
  • neo
  • Merge Requests
  • !2

Merged
Opened Mar 29, 2023 by Levin Zimmermann@levin.zimmermann
  • Report abuse
Report abuse

WIP: Teach NEO/go to handle multiple master nodes

Hello Kirill,

in the process of migrating to WC2 on our WWM/wind clone instance, I realized that it's currently not yet possible to use a NEO/go client with a NEO server which has more than one master node.

There are multiple reasons why this isn't possible yet. Not all of them are related to NEO/go: I also made a mistake in nexedi/slapos@0cf70a6e when re-working the NEO URL formatting for WCFS (I forgot to split the master nodes IPv6 from each other with a comma ,).

On the NEO/go side - as far as I can see - we need to finish the following steps:

  1. We need to fix the deadlock when a node connects to another node and this other node sends a packet with a different connection id (which is documented as to-fix in the source code). This need to be fixed here, because when we have multiple masters and we try to connect to a secondary master, this secondary master sends us a NotPrimaryMaster packet, which is not an answer of RequestIdentification and therefore has a different msg id and therefore results in a deadlock. Dial should instead return an error, so that TalkMaster could try a different node.

  2. We need to finish the TODO in TalkMaster: if our initially tried master node is not the primary master, we should try the other nodes (as it's stated in the protocol definition).

  3. If I'm not mistaken, this also means we need to change the MasterAddr attribute of Node type to something like MasterAddrArray (and do the same in related code for instance NewMasteredNode).

  4. Finally we also need to split the host part of a NEO URI by , to MasterAddrArray as we do it in NEO/py.

I'm fine with working on all steps. I want to open this MR to discuss what's the best way how this should be done. I hope (2), (3) and (4) shouldn't be too difficult. But (1) seems to be more complicated.

Of course, the easiest solution would be if any answer to a request would have the same connection id, I guess this would make most sense for the NEO/go connection based model. I also guess this won't happen. So let's find a different way.

You already sketched a possible solution with

link.CloseAccept()
link.Ask1(reqID, accept)
link.Listen()

where I assume that Listen is meant to be Accept in current NEO/go/t?

When trying this I had the problem that the other nodes packet was dropped/not accepted in serveRecv, then in serveRecvs next iteration it catched an EOF error which made the NodeLink to be closed and therefore creates an error in Conn.recvPkt, which is finally propagated back to Expect -> Ask1 -> Dial.

Is there another case where we send a message to another node and the other node is expected to reply with a packet which is not the answer e.g. has a different message id (and is not a simple Error or CloseClient message)?

I can't think of any solution which stays in the isolated-connection-model as long as we can receive answers with a different connection id. If this is an exception we could introduce something like Ask1RPC. In pseudo-code this could do something like this:

func Ask1RPC(link, request, ...response):
	conn = link.newConn()
	conn.sendMsgDirect(request)
	return link.ExpectRPC(request, ...response)

func ExpectRPC(link, request, ...response):
	loop forever:
		for each connection in link:
			if connection.hasNewPackage:
				if connection.newPackage is any of response:
					return connection.newPackage index in response
                                else: return error

e.g. Ask1RPC could temporarily ignore the connection-isolation and process any incoming packet from any connection. I think in case of an initial dialing this could be ok, because there aren't so many different packet we can expected to arrive from the dialed node.

What do you think about this Kirill, do you think this could be an acceptable trial, would you choose a different approach?

Best, Levin

/cc @kirr

Check out, review, and merge locally

Step 1. Fetch and check out the branch for this merge request

git fetch https://lab.nexedi.com/levin.zimmermann/neoppod.git t-with-multiple-master-nodes
git checkout -b levin.zimmermann/neoppod-t-with-multiple-master-nodes FETCH_HEAD

Step 2. Review the changes locally

Step 3. Merge the branch and fix any conflicts that come up

git fetch origin
git checkout origin/t
git merge --no-ff levin.zimmermann/neoppod-t-with-multiple-master-nodes

Step 4. Push the result of the merge to GitLab

git push origin t

Note that pushing to GitLab requires write access to this repository.

Tip: You can also checkout merge requests locally by following these guidelines.

  • Discussion 9
  • Commits 9
  • Changes 12
Assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
0
Labels
None
Assign labels
  • View project labels
Reference: kirr/neo!2

Revert this merge request

This will create a new commit in order to revert the existing changes.

Switch branch
Cancel
A new branch will be created in your fork and a new merge request will be started.

Cherry-pick this merge request

Switch branch
Cancel
A new branch will be created in your fork and a new merge request will be started.
GitLab Nexedi Edition | About GitLab | About Nexedi | 沪ICP备2021021310号-2 | 沪ICP备2021021310号-7