go/neo/neonet: Start (first draft)

Continue NEO/go with neonet - the layer to exchange messages in between NEO
nodes.

NEO/go shifts from thinking about NEO protocol logic as RPC to thinking of it
as more general network protocol and so settles to provide general
connection-oriented message exchange service. This way neonet provides
generic connection multiplexing on top of a single TCP node-node link.

Neonet compatibility with NEO/py depends on the following small NEO/py patch:

    kirr/neo@dd3bb8b4

which adjusts message ID a bit so it behaves like stream_id in HTTP/2:

    - always even for server initiated streams
    - always odd  for client initiated streams

and is incremented by += 2, instead of += 1 to maintain above invariant.

See http://navytux.spb.ru/~kirr/neo.html#development-overview (starting from
"Then comes the link layer which provides service to exchange messages over
network...") for the rationale.

Unfortunately current NEO/py maintainer is very much against merging that patch.

This patch brings in the core of neonet. Next patches will add initial
handshaking, user-level Send/Recv + Ask/Expect and "lightweight mode".

Some neonet core history:

lab.nexedi.com/kirr/neo/commit/6b9ed46d    X neonet: Avoid integer overflow on max packet length check
lab.nexedi.com/kirr/neo/commit/8eac771c    X neo/connection: Fix race between link.shutdown() and conn.lightClose()
lab.nexedi.com/kirr/neo/commit/8021a1d5    X rxghandoff
lab.nexedi.com/kirr/neo/commit/68738036    X ... but negative impact on separate client / server processes, strange ...
lab.nexedi.com/kirr/neo/commit/b0dda9d2    X serveRecv: help Go scheduler to switch to receiving G sooner
lab.nexedi.com/kirr/neo/commit/4989918a    X remove defer from rx/tx hot paths
lab.nexedi.com/kirr/neo/commit/e055406a    X no select for acceptq - similarly for rxq path
lab.nexedi.com/kirr/neo/commit/c28ad4d0    X Conn.Recv: receive without select
lab.nexedi.com/kirr/neo/commit/496bd425    X add benchmark RTT over plain net.Conn with serveRecv-style RX handler
lab.nexedi.com/kirr/neo/commit/9fa79958    X draft how to mark RX down without reallocating .rxdown
lab.nexedi.com/kirr/neo/commit/4324c812    X restore all Conn functionality
lab.nexedi.com/kirr/neo/commit/a8e61d2f    X serveSend is not needed
lab.nexedi.com/kirr/neo/commit/9d047b36    X recvPkt via only 1 syscall
lab.nexedi.com/kirr/neo/commit/b555a507    X baseline net RTT benchmark
lab.nexedi.com/kirr/neo/commit/91be5cdd    X everyone is listening from start; CloseAccept to disable listening - works
lab.nexedi.com/kirr/neo/commit/c2a1b63a    X naming: Packet = raw data; Message = meaningful object
lab.nexedi.com/kirr/neo/commit/6fd0c9be    X connection: Adding context to errors from NodeLink and Conn operations
lab.nexedi.com/kirr/neo/commit/65b17bdc    X rework Conn acceptance to be explicit via NodeLink.Accept
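
The even/odd connection ID rule described in the message can be illustrated with a small standalone sketch. It is not the neonet code: `idAllocator`, `newIDAllocator`, `alloc` and `initiatedByPeer` are hypothetical names used only for illustration, and the sketch assumes the same scheme as stated above (server side starts at 0, client side at 1, both step by 2 and skip IDs that are still in use after wraparound).

```go
package main

import (
	"errors"
	"fmt"
)

// idAllocator is a hypothetical stand-in for the connection ID bookkeeping
// of one end of a link. It is an illustration, not the neonet type.
type idAllocator struct {
	inUse map[uint32]bool // IDs currently taken
	next  uint32          // next ID to try for a locally initiated connection
}

var errTooMany = errors.New("too many opened connections")

// newIDAllocator returns an allocator that hands out even IDs for the
// server role and odd IDs for the client role.
func newIDAllocator(server bool) *idAllocator {
	a := &idAllocator{inUse: map[uint32]bool{}}
	if !server {
		a.next = 1
	}
	return a
}

// alloc returns the next free ID of our parity. Adding 2 to a uint32 wraps
// around modulo 2^32 and therefore keeps parity, so the invariant survives
// wraparound. There are 2^31 IDs of each parity; after skipping that many
// busy slots we give up.
func (a *idAllocator) alloc() (uint32, error) {
	for skipped := uint64(0); a.inUse[a.next]; skipped++ {
		if skipped >= 1<<31 {
			return 0, errTooMany
		}
		a.next += 2
	}
	id := a.next
	a.inUse[id] = true
	a.next += 2
	return id, nil
}

// initiatedByPeer reports whether id belongs to the peer's half of the ID
// space, i.e. whether its parity differs from ours.
func (a *idAllocator) initiatedByPeer(id uint32) bool {
	return id%2 != a.next%2
}

func main() {
	server := newIDAllocator(true)
	client := newIDAllocator(false)
	for i := 0; i < 3; i++ {
		s, _ := server.alloc()
		c, _ := client.alloc()
		fmt.Println("server:", s, "client:", c)
	}
	fmt.Println("server sees 5 as peer-initiated:", server.initiatedByPeer(5))
}
```

Because the two ends never allocate from the same half of the ID space, either side can open a connection without a round trip to agree on an identifier, and the receiving side can tell from parity alone whether an unknown ID is a new incoming connection.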

64513925 · Kirill Smelkov · 5beab048 · 64513925 · 64513925 · 64513925
Commit 64513925 authored Jul 06, 2018 by Kirill Smelkov
5 changed files
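
The history entries "Conn.Recv: receive without select" and "no select for acceptq" refer to a flag-based handshake that connection.go uses on its hot paths in place of `select`. The following standalone sketch shows the idea under simplifying assumptions: one writer, a channel of string pointers where `nil` means "woken up by shutdown", and hypothetical names (`rxQueue`, `send`, `recv`, `shutdown`) that do not exist in neonet.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// rxQueue is a hypothetical, simplified model of a receive queue that can
// be shut down without the reader or writer ever using select.
type rxQueue struct {
	q       chan *string // nil element = wakeup sent by shutdown
	writing int32        // 1 while send is inside `q <- ...`
	reading int32        // +1 while recv is inside `... <- q`
	down    int32        // 1 once the queue is shut down
}

func newRxQueue() *rxQueue {
	return &rxQueue{q: make(chan *string, 1)}
}

// send queues s unless the queue is down. It reports whether s was queued.
func (r *rxQueue) send(s string) bool {
	atomic.StoreInt32(&r.writing, 1)
	down := atomic.LoadInt32(&r.down) != 0
	if !down {
		r.q <- &s
	}
	atomic.StoreInt32(&r.writing, 0)
	return !down
}

// recv blocks for the next element; ok is false if the queue is down.
func (r *rxQueue) recv() (s string, ok bool) {
	var p *string
	atomic.AddInt32(&r.reading, 1)
	if atomic.LoadInt32(&r.down) == 0 {
		p = <-r.q
	}
	atomic.AddInt32(&r.reading, -1)
	if p == nil {
		return "", false
	}
	return *p, true
}

// shutdown marks the queue down, drains what was queued and wakes up
// blocked readers. It returns the number of drained elements.
func (r *rxQueue) shutdown() int {
	atomic.StoreInt32(&r.down, 1)
	drained := 0
	// A writer outside its critical section will see down=1 next time, so
	// once writing==0 and the buffer is empty nothing more can arrive.
	for atomic.LoadInt32(&r.writing) != 0 || len(r.q) != 0 {
		select {
		case <-r.q:
			drained++
		default:
			runtime.Gosched() // let the writer make progress
		}
	}
	// Readers still inside their critical section may be blocked on q.
	for atomic.LoadInt32(&r.reading) != 0 {
		select {
		case r.q <- nil:
		default:
			runtime.Gosched() // let the reader make progress
		}
	}
	return drained
}

func main() {
	r := newRxQueue()
	fmt.Println(r.send("hello"))
	fmt.Println(r.recv())

	done := make(chan bool)
	go func() { _, ok := r.recv(); done <- ok }()
	r.shutdown()
	fmt.Println("reader got data after shutdown:", <-done)
	fmt.Println("send after shutdown accepted:", r.send("late"))
}
```

The `select` statements remain only in the rarely executed shutdown path; the per-packet `send` and `recv` paths use plain channel operations guarded by atomic flags, which is the trade-off the history entries describe.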
--- a/go/neo/neonet/connection.go
+++ b/go/neo/neonet/connection.go
+// Copyright (C) 2016-2018  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+// Package neonet provides service to exchange messages in a NEO network.
+//
+// A NEO node-node link (NodeLink) provides service for multiplexing several
+// communication connections on top of it. Connections (Conn) in turn provide
+// service to exchange NEO protocol messages.
+//
+// New connections can be created with link.NewConn(). Once connection is
+// created and a message is sent over it, on peer's side another corresponding
+// new connection can be accepted via link.Accept(), and all further communication
+// send/receive exchange will be happening in between those 2 connections.
+//
+// See also package lab.nexedi.com/kirr/neo/go/neo/proto for definition of NEO
+// messages.
+package neonet
+
+// XXX neonet compatibility with NEO/py depends on the following small NEO/py patch:
+//
+//	https://lab.nexedi.com/kirr/neo/commit/dd3bb8b4
+//
+// which adjusts message ID a bit so it behaves like stream_id in HTTP/2:
+//
+//	- always even for server initiated streams
+//	- always odd  for client initiated streams
+//
+// and is incremented by += 2, instead of += 1 to maintain above invariant.
+//
+// See http://navytux.spb.ru/~kirr/neo.html#development-overview (starting from
+// "Then comes the link layer which provides service to exchange messages over
+// network...") for the rationale.
+//
+// Unfortunately current NEO/py maintainer is very much against merging that patch.
+
+
+import (
+	"errors"
+	"fmt"
+	"io"
+	"math"
+	"net"
+	//"runtime"
+	"sync"
+	"time"
+
+	"lab.nexedi.com/kirr/neo/go/internal/packed"
+	"lab.nexedi.com/kirr/neo/go/neo/proto"
+
+	"github.com/someonegg/gocontainer/rbuf"
+
+	"lab.nexedi.com/kirr/go123/xbytes"
+)
+
+// NodeLink is a node-node link in NEO.
+//
+// A node-node link represents bidirectional symmetrical communication
+// channel in between 2 NEO nodes. The link provides service for multiplexing
+// several communication connections on top of the node-node link.
+//
+// New connection can be created with .NewConn() . Once connection is
+// created and data is sent over it, on peer's side another corresponding
+// new connection can be accepted via .Accept(), and all further communication
+// send/receive exchange will be happening in between those 2 connections.
+//
+// A NodeLink has to be explicitly closed, once it is no longer needed.
+//
+// It is safe to use NodeLink from multiple goroutines simultaneously.
+type NodeLink struct {
+	peerLink net.Conn // raw conn to peer
+
+	connMu     sync.Mutex
+	connTab    map[uint32]*Conn // connId -> Conn associated with connId
+	nextConnId uint32           // next connId to use for Conn initiated by us
+
+	serveWg sync.WaitGroup	// for serve{Send,Recv}
+	txq	chan txReq	// tx requests from Conns go via here
+				// (rx packets are routed to Conn.rxq)
+
+	acceptq  chan *Conn	// queue of incoming connections for Accept
+	axqWrite atomic32	//  1 while serveRecv is doing `acceptq <- ...`
+	axqRead  atomic32	// +1 while Accept is doing `... <- acceptq`
+	axdownFlag atomic32	//  1 when AX is marked no longer operational
+
+//	axdown  chan struct{}	// ready when accept is marked as no longer operational
+	axdown1 sync.Once	// CloseAccept may be called several times
+
+	down     chan struct{}  // ready when NodeLink is marked as no longer operational
+	downOnce sync.Once      // shutdown may be due to both Close and IO error
+	downWg   sync.WaitGroup // for activities at shutdown
+	errClose error          // error got from peerLink.Close
+
+	errMu    sync.Mutex
+	errRecv	 error		// error got from recvPkt on shutdown
+
+	axclosed atomic32	// whether CloseAccept was called
+	closed   atomic32	// whether Close was called
+
+	rxbuf    rbuf.RingBuf	// buffer for reading from peerLink
+
+	// scheduling optimization: whenever serveRecv sends to Conn.rxq
+	// receiving side must ack here to receive G handoff.
+	// See comments in serveRecv for details.
+	rxghandoff chan struct{}
+}
+
+// XXX rx handoff makes latency better for serial request-reply scenario but
+// does a lot of harm for case when there are several parallel requests -
+// serveRecv after handing off is put to tail of current cpu runqueue - not
+// receiving next requests and not spawning handlers for them, thus essentially
+// creating Head-of-line (HOL) blocking problem.
+//
+// XXX ^^^ problem reproducible on deco but not on z6001
+const rxghandoff = true // XXX whether to do rxghandoff trick
+
+// Conn is a connection established over NodeLink.
+//
+// Messages can be sent and received over it.
+// Once connection is no longer needed it has to be closed.
+//
+// It is safe to use Conn from multiple goroutines simultaneously.
+type Conn struct {
+	link      *NodeLink
+	connId    uint32
+
+	rxq	   chan *pktBuf	 // received packets for this Conn go here
+	rxqWrite   atomic32	 //  1 while serveRecv is doing `rxq <- ...`
+	rxqRead    atomic32      // +1 while Conn.Recv is doing `... <- rxq`
+	rxdownFlag atomic32	 //  1 when RX is marked no longer operational
+	// XXX ^^^ split to different cache lines?
+
+	rxerrOnce sync.Once     // rx error is reported only once - then it is link down or closed
+
+//	rxdown     chan struct{} // ready when RX is marked no longer operational
+	rxdownOnce sync.Once	 // ----//----	XXX review
+	rxclosed   atomic32	 // whether CloseRecv was called
+
+	txerr     chan error	 // transmit results for this Conn go back here
+
+	txdown     chan struct{} // ready when Conn TX is marked as no longer operational
+	txdownOnce sync.Once	 // tx shutdown may be called by both Close and nodelink.shutdown
+	txclosed   atomic32	 // whether CloseSend was called
+
+	// closing Conn is shutdown + some cleanup work to remove it from
+	// link.connTab including arming timers etc. Let this work be spawned only once.
+	// (so that it is valid to call Conn.Close several times)
+	closeOnce sync.Once
+}
+
+var ErrLinkClosed   = errors.New("node link is closed")	// operations on closed NodeLink
+var ErrLinkDown     = errors.New("node link is down")	// e.g. due to IO error
+var ErrLinkNoListen = errors.New("node link is not listening for incoming connections")
+var ErrLinkManyConn = errors.New("too many opened connections")
+var ErrClosedConn   = errors.New("connection is closed")
+
+// LinkError is returned by NodeLink operations.
+type LinkError struct {
+	Link *NodeLink
+	Op   string
+	Err  error
+}
+
+// ConnError is returned by Conn operations.
+type ConnError struct {
+	Link   *NodeLink
+	ConnId uint32 // NOTE Conn's are reused - cannot use *Conn here
+	Op     string
+	Err    error
+}
+
+// _LinkRole is a role an end of NodeLink is intended to play.
+type _LinkRole int
+const (
+	_LinkServer _LinkRole = iota // link created as server
+	_LinkClient                  // link created as client
+
+	// for testing:
+	linkNoRecvSend _LinkRole = 1 << 16 // do not spawn serveRecv & serveSend
+	linkFlagsMask  _LinkRole = (1<<32 - 1) << 16
+)
+
+// newNodeLink makes a new NodeLink from already established net.Conn .
+//
+// Role specifies how to treat our role on the link - either as client or
+// server. The difference in between client and server roles is in:
+//
+//    how connection ids are allocated for connections initiated at our side:
+//    there is no conflict in identifiers if one side always allocates them as
+//    even (server) and its peer as odd (client).
+//
+// Usually server role should be used for connections created via
+// net.Listen/net.Accept and client role for connections created via net.Dial.
+func newNodeLink(conn net.Conn, role _LinkRole) *NodeLink {
+	var nextConnId uint32
+	switch role &^ linkFlagsMask {
+	case _LinkServer:
+		nextConnId = 0 // all initiated by us connId will be even
+	case _LinkClient:
+		nextConnId = 1 // ----//---- odd
+	default:
+		panic("invalid conn role")
+	}
+
+	nl := &NodeLink{
+		peerLink:   conn,
+		connTab:    map[uint32]*Conn{},
+		nextConnId: nextConnId,
+		acceptq:    make(chan *Conn),	// XXX +buf ?
+		txq:        make(chan txReq),
+		rxghandoff: make(chan struct{}),
+//		axdown:     make(chan struct{}),
+		down:       make(chan struct{}),
+	}
+	if role&linkNoRecvSend == 0 {
+		nl.serveWg.Add(2)
+		go nl.serveRecv()
+		go nl.serveSend()
+	}
+	return nl
+}
+
+// newConn creates new Conn with id=connId and registers it into connTab.
+// must be called with connMu held.
+func (link *NodeLink) newConn(connId uint32) *Conn {
+	c := &Conn{
+		link:   link,
+		connId: connId,
+		rxq:    make(chan *pktBuf, 1), // NOTE non-blocking - see serveRecv XXX +buf ?
+		txerr:  make(chan error, 1),   // NOTE non-blocking - see Conn.Send
+		txdown: make(chan struct{}),
+//		rxdown: make(chan struct{}),
+	}
+	link.connTab[connId] = c
+	return c
+}
+
+// NewConn creates new connection on top of node-node link.
+func (link *NodeLink) NewConn() (*Conn, error) {
+	link.connMu.Lock()
+	//defer link.connMu.Unlock()
+	c, err := link._NewConn()
+	link.connMu.Unlock()
+	return c, err
+}
+
+func (link *NodeLink) _NewConn() (*Conn, error) {
+	if link.connTab == nil {
+		if link.closed.Get() != 0 {
+			return nil, link.err("newconn", ErrLinkClosed)
+		}
+		return nil, link.err("newconn", ErrLinkDown)
+	}
+
+	// nextConnId could wrap around uint32 limits - find first free slot to
+	// not blindly replace existing connection
+	for i := uint32(0); ; i++ {
+		_, exists := link.connTab[link.nextConnId]
+		if !exists {
+			break
+		}
+		link.nextConnId += 2
+
+		if i > math.MaxUint32 / 2 {
+			return nil, link.err("newconn", ErrLinkManyConn)
+		}
+	}
+
+	c := link.newConn(link.nextConnId)
+	link.nextConnId += 2
+
+	return c, nil
+}
+
+// shutdownAX marks acceptq as no longer operational and interrupts Accept.
+func (link *NodeLink) shutdownAX() {
+	link.axdown1.Do(func() {
+//		close(link.axdown)
+
+		link.axdownFlag.Set(1) // XXX cmpxchg and return if already down?
+
+		// drain all connections from .acceptq:
+		// - something could be already buffered there
+		// - serveRecv could start writing acceptq at the same time we set axdownFlag; we derace it
+		for {
+			// if serveRecv is outside `.acceptq <- ...` critical
+			// region and fully drained - we are done.
+			// see description of the logic in shutdownRX
+			if link.axqWrite.Get() == 0 && len(link.acceptq) == 0 {
+				break
+			}
+
+			select {
+			case conn := <-link.acceptq:
+				// serveRecv already put at least 1 packet into conn.rxq before putting
+				// conn into .acceptq - shutting it down will send the error to peer.
+				conn.shutdownRX(errConnRefused)
+
+				link.connMu.Lock()
+				delete(link.connTab, conn.connId)
+				link.connMu.Unlock()
+
+			default:
+				// ok - continue spinning
+			}
+		}
+
+		// wakeup Accepts
+		for {
+			// similarly to above: .axdownFlag vs .axqRead
+			// see logic description in shutdownRX
+			if link.axqRead.Get() == 0 {
+				break
+			}
+
+			select {
+			case link.acceptq <- nil:
+				// ok - woken up
+
+			default:
+				// ok - continue spinning
+			}
+		}
+	})
+}
+
+// shutdown closes raw link to peer and marks NodeLink as no longer operational.
+//
+// it also shuts down all opened connections over this node link.
+func (nl *NodeLink) shutdown() {
+	nl.shutdownAX()
+	nl.downOnce.Do(func() {
+		close(nl.down)
+
+		// close actual link to peer. this will wakeup {send,recv}Pkt
+		// NOTE we need it here so that e.g. aborting on error in serveSend wakes up serveRecv
+		nl.errClose = nl.peerLink.Close()
+
+		nl.downWg.Add(1)
+		go func() {
+			defer nl.downWg.Done()
+
+			// wait for serve{Send,Recv} to complete before shutting connections down
+			//
+			// we have to do it so that e.g. serveSend has chance
+			// to return last error from sendPkt to requester.
+			nl.serveWg.Wait()
+
+			// clear + mark down .connTab + shutdown all connections
+			nl.connMu.Lock()
+			connTab := nl.connTab
+			nl.connTab = nil
+			nl.connMu.Unlock()
+
+			// conn.shutdown() outside of link.connMu lock
+			for _, conn := range connTab {
+				conn.shutdown()
+			}
+		}()
+	})
+}
+
+// CloseAccept instructs node link to not accept incoming connections anymore.
+//
+// Any blocked Accept() will be unblocked and return error.
+// The peer will receive "connection refused" for already-queued connection
+// requests and for any connection it tries to establish afterwards.
+//
+// It is safe to call CloseAccept several times.
+func (link *NodeLink) CloseAccept() {
+	link.axclosed.Set(1)
+	link.shutdownAX()
+}
+
+// Close closes node-node link.
+//
+// All blocking operations - Accept and IO on associated connections
+// established over node link - are automatically interrupted with an error.
+// Underlying raw connection is closed.
+// It is safe to call Close several times.
+func (link *NodeLink) Close() error {
+	link.axclosed.Set(1)
+	link.closed.Set(1)
+	link.shutdown()
+	link.downWg.Wait()
+	return link.err("close", link.errClose)
+}
+
+// shutdown marks connection as no longer operational and interrupts Send and Recv.
+func (c *Conn) shutdown() {
+	c.shutdownTX()
+	c.shutdownRX(errConnClosed)
+}
+
+// shutdownTX marks TX as no longer operational and interrupts Send.
+func (c *Conn) shutdownTX() {
+	c.txdownOnce.Do(func() {
+		close(c.txdown)
+	})
+}
+
+// shutdownRX marks .rxq as no longer operational and interrupts Recv.
+func (c *Conn) shutdownRX(errMsg *proto.Error) {
+	c.rxdownOnce.Do(func() {
+//		close(c.rxdown)	// wakeup Conn.Recv
+		c.downRX(errMsg)
+	})
+}
+
+// downRX marks .rxq as no longer operational.
+//
+// used in shutdownRX.
+func (c *Conn) downRX(errMsg *proto.Error) {
+	// let serveRecv know RX is down for this connection
+	c.rxdownFlag.Set(1) // XXX cmpxchg and return if already down?
+
+	// drain all packets from .rxq:
+	// - something could be already buffered there
+	// - serveRecv could start writing rxq at the same time we set rxdownFlag; we derace it.
+	i := 0
+	for {
+		// we set .rxdownFlag=1 above.
+		// now if serveRecv is outside `.rxq <- ...` critical section we know it is either:
+		// - before it	-> it will eventually see .rxdownFlag=1 and won't send pkt to rxq.
+		// - after it	-> it already sent pkt to rxq and won't touch
+		//		   rxq until next packet (where it will hit "before it").
+		//
+		// when serveRecv stopped sending we know we are done draining when rxq is empty.
+		if c.rxqWrite.Get() == 0 && len(c.rxq) == 0 {
+			break
+		}
+
+		select {
+		case <-c.rxq:
+			c.rxack()
+			i++
+
+		default:
+			// ok - continue spinning
+		}
+	}
+
+	// if something was queued already there - reply "connection closed"
+	if i != 0 {
+		go c.link.replyNoConn(c.connId, errMsg)
+	}
+
+	// wakeup recvPkt(s)
+	for {
+		// similarly to above:
+		// we set .rxdownFlag=1
+		// now if recvPkt is outside `... <- .rxq` critical section we know that it is either:
+		// - before it	-> it will eventually see .rxdownFlag=1 and won't try to read rxq.
+		// - after it	-> it already read pkt from rxq and won't touch
+		//                 rxq until next recvPkt (where it will hit "before it").
+		if c.rxqRead.Get() == 0 {
+			break
+		}
+
+		select {
+		case c.rxq <- nil:
+			// ok - woken up
+
+		default:
+			// ok - continue spinning
+		}
+	}
+}
+
+
+// time to keep record of a closed connection so that we can properly reply
+// "connection closed" if a packet comes in with same connID.
+var connKeepClosed = 1 * time.Minute
+
+// CloseRecv closes reading end of connection.
+//
+// Any blocked Recv*() will be unblocked and return error.
+// The peer will receive "connection closed" for messages already in local rx
+// queue and for anything it tries to send afterwards.
+//
+// It is safe to call CloseRecv several times.
+func (c *Conn) CloseRecv() {
+	c.rxclosed.Set(1)
+	c.shutdownRX(errConnClosed)
+}
+
+// Close closes connection.
+//
+// Any blocked Send*() or Recv*() will be unblocked and return error.
+//
+// NOTE for Send() - once transmission was started - it will complete in the
+// background on the wire not to break node-node link framing.
+//
+// It is safe to call Close several times.
+func (c *Conn) Close() error {
+	link := c.link
+	c.closeOnce.Do(func() {
+		c.rxclosed.Set(1)
+		c.txclosed.Set(1)
+		c.shutdown()
+
+		// adjust link.connTab
+		var tmpclosed *Conn
+		link.connMu.Lock()
+		if link.connTab != nil {
+			// connection was initiated by us - simply delete - we always
+			// know if a packet comes to such connection - it is closed.
+			//
+			// XXX checking vvv should be possible without connMu lock
+			if c.connId % 2 == link.nextConnId % 2 {
+				delete(link.connTab, c.connId)
+
+			// connection was initiated by peer which we accepted - put special
+			// "closed" connection into connTab entry for some time to reply
+			// "connection closed" if another packet comes to it.
+			//
+			// ( we cannot reuse same connection since after it is marked as
+			//   closed Send refuses to work )
+			} else {
+				// delete(link.connTab, c.connId)
+				// XXX vvv was temp. disabled - costs a lot in 1req=1conn model
+
+				// c implicitly goes away from connTab
+				tmpclosed = link.newConn(c.connId)
+			}
+		}
+		link.connMu.Unlock()
+
+		if tmpclosed != nil {
+			tmpclosed.shutdownRX(errConnClosed)
+
+			time.AfterFunc(connKeepClosed, func() {
+				link.connMu.Lock()
+				delete(link.connTab, c.connId)
+				link.connMu.Unlock()
+			})
+		}
+	})
+
+	return nil
+}
+
+// ---- receive ----
+
+// errAcceptShutdownAX returns appropriate error when link.axdown is found ready in Accept.
+func (link *NodeLink) errAcceptShutdownAX() error {
+	switch {
+	case link.closed.Get() != 0:
+		return ErrLinkClosed
+
+	case link.axclosed.Get() != 0:
+		return ErrLinkNoListen
+
+	default:
+		// XXX do the same as in errRecvShutdown (check link.errRecv)
+		return ErrLinkDown
+	}
+}
+
+// Accept waits for and accepts incoming connection on top of node-node link.
+func (link *NodeLink) Accept() (*Conn, error) {
+	// semantically equivalent to the following:
+	// ( this is hot path for py compatibility mode because new connection
+	//   is established in every message and select hurts performance )
+	//
+	// select {
+	// case <-link.axdown:
+	// 	return nil, link.err("accept", link.errAcceptShutdownAX())
+	//
+	// case c := <-link.acceptq:
+	// 	return c, nil
+	// }
+
+	var conn *Conn
+	var err  error
+
+	link.axqRead.Add(1)
+	axdown := link.axdownFlag.Get() != 0
+	if !axdown {
+		conn = <-link.acceptq
+	}
+	link.axqRead.Add(-1)
+
+	// in contrast to recvPkt we can decide about error after releasing axqRead
+	// reason: link is not going to be released to a free pool.
+	if axdown || conn == nil {
+		err = link.err("accept", link.errAcceptShutdownAX())
+	}
+
+	return conn, err
+}
+
+// errRecvShutdown returns appropriate error when c.rxdown is found ready in recvPkt.
+func (c *Conn) errRecvShutdown() error {
+	switch {
+	case c.rxclosed.Get() != 0:
+		return ErrClosedConn
+
+	case c.link.closed.Get() != 0:
+		return ErrLinkClosed
+
+	default:
+		// we have to check what was particular RX error on nodelink shutdown
+		// only do that once - after reporting RX error the first time
+		// tell client the node link is no longer operational.
+		var err error
+		c.rxerrOnce.Do(func() {
+			c.link.errMu.Lock()
+			err = c.link.errRecv
+			c.link.errMu.Unlock()
+		})
+		if err == nil {
+			err = ErrLinkDown
+		}
+		return err
+	}
+}
+
+// recvPkt receives raw packet from connection
+func (c *Conn) recvPkt() (*pktBuf, error) {
+	// semantically equivalent to the following:
+	// (this is hot path and select is not used for performance reason)
+	//
+	// select {
+	// case <-c.rxdown:
+	// 	return nil, c.err("recv", c.errRecvShutdown())
+	//
+	// case pkt := <-c.rxq:
+	// 	return pkt, nil
+	// }
+
+	var pkt *pktBuf
+	var err error
+
+	c.rxqRead.Add(1)
+	rxdown := c.rxdownFlag.Get() != 0
+	if !rxdown {
+		pkt = <-c.rxq
+	}
+
+	// decide about error while under rxqRead (if after - the Conn can go away to be released)
+	if rxdown || pkt == nil {
+		err = c.err("recv", c.errRecvShutdown())
+	}
+
+	c.rxqRead.Add(-1)
+	if err == nil {
+		c.rxack()
+	}
+	return pkt, err
+}
+
+// rxack unblocks serveRecv after it handed G to us.
+// see comments about rxghandoff in serveRecv.
+func (c *Conn) rxack() {
+	if !rxghandoff {
+		return
+	}
+	//fmt.Printf("conn: rxack <- ...\n")
+	c.link.rxghandoff <- struct{}{}
+	//fmt.Printf("\tconn: rxack <- ... ok\n")
+}
+
+// serveRecv handles incoming packets routing them to either appropriate
+// already-established connection or, if node link is accepting incoming
+// connections, to new connection put to accept queue.
+func (nl *NodeLink) serveRecv() {
+	defer nl.serveWg.Done()
+	for {
+		// receive 1 packet
+		// NOTE if nl.peerLink was just closed by tx->shutdown we'll get ErrNetClosing
+		pkt, err := nl.recvPkt()
+		//fmt.Printf("\n%p recvPkt -> %v, %v\n", nl, pkt, err)
+		if err != nil {
+			// on IO error framing over peerLink becomes broken
+			// so we shut down node link and all connections over it.
+
+			nl.errMu.Lock()
+			nl.errRecv = err
+			nl.errMu.Unlock()
+
+			nl.shutdown()
+			return
+		}
+
+		// pkt.ConnId -> Conn
+		connId := packed.Ntoh32(pkt.Header().ConnId)
+		accept := false
+
+		nl.connMu.Lock()
+
+		// connTab is never nil here - because shutdown, before
+		// resetting it, waits for us to finish.
+		conn := nl.connTab[connId]
+
+		if conn == nil {
+			// message with connid for a stream initiated by peer
+			// it will be considered to be accepted (not if .axdown)
+			if connId % 2 != nl.nextConnId % 2 {
+				accept = true
+				conn = nl.newConn(connId)
+			}
+
+			// else it is message with connid that should be initiated by us
+			// leave conn=nil - we'll reply errConnClosed
+		}
+
+		nl.connMu.Unlock()
+
+		//fmt.Printf("%p\tconn: %v\n", nl, conn)
+		if conn == nil {
+			// see ^^^ "message with connid that should be initiated by us"
+			go nl.replyNoConn(connId, errConnClosed)
+			continue
+		}
+
+		// route packet to serving goroutine handler
+		//
+		// TODO backpressure when Recv is not keeping up with Send on peer side?
+		//      (not to let whole link starve because of one connection)
+		//
+		// NOTE rxq must be buffered with at least 1 element so that
+		// queuing pkt succeeds for incoming connection that is not yet
+		// there in acceptq.
+		conn.rxqWrite.Set(1)
+		rxdown := conn.rxdownFlag.Get() != 0
+		if !rxdown {
+			conn.rxq <- pkt
+		}
+		conn.rxqWrite.Set(0)
+
+		//fmt.Printf("%p\tconn.rxdown: %v\taccept: %v\n", nl, rxdown, accept)
+
+
+		// conn exists, but rx is down - "connection closed"
+		// (this cannot happen for newly accepted connection)
+		if rxdown {
+			go nl.replyNoConn(connId, errConnClosed)
+			continue
+		}
+
+		// this packet established new connection - try to accept it
+		if accept {
+			nl.axqWrite.Set(1)
+			axdown := nl.axdownFlag.Get() != 0
+			if !axdown {
+				nl.acceptq <- conn
+			}
+			nl.axqWrite.Set(0)
+
+			// we are not accepting the connection
+			if axdown {
+				go nl.replyNoConn(connId, errConnRefused)
+				nl.connMu.Lock()
+				delete(nl.connTab, conn.connId)
+				nl.connMu.Unlock()
+
+				continue
+			}
+		}
+
+		//fmt.Printf("%p\tafter accept\n", nl)
+
+		// Normally serveRecv G will continue to run, with the G woken up
+		// on rxq/acceptq only being put into the runqueue of current proc.
+		// By default proc runq will execute only when serveRecv blocks
+		// again next time deep in nl.recvPkt(), but let's force the switch
+		// now without additional waiting to reduce latency.
+
+		// XXX bad - puts serveRecv to global runq thus with high p to switch M
+		//runtime.Gosched()
+
+		// handoff execution to receiving goroutine via channel.
+		// rest of serveRecv is put to current P local runq.
+		//
+		// details:
+		// - https://github.com/golang/go/issues/20168
+		// - https://github.com/golang/go/issues/15110
+		//
+		// see BenchmarkTCPlo* - for serveRecv style RX handoff there
+		// cuts RTT from 12.5μs to 6.6μs (RTT without serveRecv style G is 4.8μs)
+		//
+		// TODO report upstream
+		if rxghandoff {
+			//fmt.Printf("serveRecv: <-rxghandoff\n")
+			<-nl.rxghandoff
+			//fmt.Printf("\tserveRecv: <-rxghandoff ok\n")
+		}
+
+/*
+		// XXX goes away in favour of .rxdownFlag; reasons
+		// - no select
+		//
+		// XXX review synchronization via flags for correctness (e.g.
+		// if both G were on the same runqueue, spinning in G1 will
+		// prevent G2 progress)
+		//
+		// XXX maybe we'll need select if we add ctx into Send/Recv.
+
+		// don't even try `conn.rxq <- ...` if conn.rxdown is ready
+		// ( else since select is picking random ready variant Recv/serveRecv
+		//   could receive something on rxdown Conn sometimes )
+		rxdown := false
+		select {
+		case <-conn.rxdown:
+			rxdown = true
+		default:
+			// ok
+		}
+
+		// route packet to serving goroutine handler
+		//
+		// TODO backpressure when Recv is not keeping up with Send on peer side?
+		//      (not to let whole link starve because of one connection)
+		//
+		// NOTE rxq must be buffered with at least 1 element so that
+		// queuing pkt succeeds for incoming connection that is not yet
+		// there in acceptq.
+		if !rxdown {
+			// XXX can avoid select here: if conn closer cares to drain rxq (?)
+			select {
+			case <-conn.rxdown:
+				rxdown = true
+
+			case conn.rxq <- pkt:
+				// ok
+			}
+		}
+
+		...
+
+		// this packet established new connection - try to accept it
+		if accept {
+			// don't even try `link.acceptq <- ...` if link.axdown is ready
+			// ( else since select is picking random ready variant Accept/serveRecv
+			//   could receive something on axdown Link sometimes )
+			axdown := false
+			select {
+			case <-nl.axdown:
+				axdown = true
+
+			default:
+				// ok
+			}
+
+			// put conn to .acceptq
+			if !axdown {
+				// XXX can avoid select here if shutdownAX cares to drain acceptq (?)
+				select {
+				case <-nl.axdown:
+					axdown = true
+
+				case nl.acceptq <- conn:
+					//fmt.Printf("%p\t.acceptq <- conn  ok\n", nl)
+					// ok
+				}
+			}
+
+			// we are not accepting the connection
+			if axdown {
+				conn.shutdownRX(errConnRefused)
+				nl.connMu.Lock()
+				delete(nl.connTab, conn.connId)
+				nl.connMu.Unlock()
+			}
+		}
+*/
+	}
+}
+
+// ---- network replies for closed / refused connections ----
+
+var errConnClosed  = &proto.Error{proto.PROTOCOL_ERROR, "connection closed"}
+var errConnRefused = &proto.Error{proto.PROTOCOL_ERROR, "connection refused"}
+
+// replyNoConn sends error message to peer when a packet was sent to closed / nonexistent connection
+func (link *NodeLink) replyNoConn(connId uint32, errMsg proto.Msg) {
+	//fmt.Printf("%s .%d: -> replyNoConn %v\n", link, connId, errMsg)
+	link.sendMsg(connId, errMsg) // ignore errors
+	//fmt.Printf("%s .%d: replyNoConn(%v) -> %v\n", link, connId, errMsg, err)
+}
+
+// ---- transmit ----
+
+// txReq is request to transmit a packet. Result error goes back to errch.
+type txReq struct {
+	pkt   *pktBuf
+	errch chan error
+}
+
+// errSendShutdown returns appropriate error when c.txdown is found ready in Send.
+func (c *Conn) errSendShutdown() error {
+	switch {
+	case c.txclosed.Get() != 0:
+		return ErrClosedConn
+
+	// the only other possible error, besides Conn being .Close()'ed, is that
+	// NodeLink itself was closed / shut down - on actual IO problems the
+	// corresponding error is delivered to the particular Send that caused it.
+
+	case c.link.closed.Get() != 0:
+		return ErrLinkClosed
+
+	default:
+		return ErrLinkDown
+	}
+}
+
+// sendPkt sends raw packet via connection.
+//
+// on success pkt is freed.
+func (c *Conn) sendPkt(pkt *pktBuf) error {
+	err := c.sendPkt2(pkt)
+	return c.err("send", err)
+}
+
+func (c *Conn) sendPkt2(pkt *pktBuf) error {
+	// connId must be set to one associated with this connection
+	if pkt.Header().ConnId != packed.Hton32(c.connId) {
+		panic("Conn.sendPkt: connId wrong")
+	}
+
+	var err error
+
+	select {
+	case <-c.txdown:
+		return c.errSendShutdown()
+
+	case c.link.txq <- txReq{pkt, c.txerr}:
+		select {
+		// tx request was sent to serveSend and is being transmitted on the wire.
+		// the transmission may block indefinitely though and
+		// we cannot interrupt it as the only way to interrupt is
+		// .link.Close() which will close all other Conns.
+		//
+		// That's why we are also checking for c.txdown while waiting
+		// for reply from serveSend (and leave pkt to finish transmitting).
+		//
+		// NOTE after we return straight here serveSend won't be later
+		// blocked on c.txerr<- because that backchannel is a non-blocking one.
+		case <-c.txdown:
+
+			// also poll c.txerr here because: when there is TX error,
+			// serveSend sends to c.txerr _and_ closes c.txdown .
+			// We still want to return actual transmission error to caller.
+			select {
+			case err = <-c.txerr:
+				return err
+			default:
+				return c.errSendShutdown()
+			}
+
+		case err = <-c.txerr:
+			return err
+		}
+	}
+}
+
+// serveSend handles requests to transmit packets from client connections and
+// serially executes them over associated node link.
+func (nl *NodeLink) serveSend() {
+	defer nl.serveWg.Done()
+	for {
+		select {
+		case <-nl.down:
+			return
+
+		case txreq := <-nl.txq:
+			// XXX if nl.peerLink was just closed by rx->shutdown we'll get ErrNetClosing
+			err := nl.sendPkt(txreq.pkt)
+			//fmt.Printf("sendPkt -> %v\n", err)
+
+			// FIXME if several goroutines call conn.Send
+			// simultaneously - c.txerr, even if buffered(1), will
+			// overflow and thus we deadlock here.
+			//
+			// -> require "Conn.Send must not be used concurrently"?
+			txreq.errch <- err
+
+			// on IO error framing over peerLink becomes broken
+			// so we shut down node link and all connections over it.
+			//
+			// XXX move to link.sendPkt?
+			if err != nil {
+				nl.shutdown()
+				return
+			}
+		}
+	}
+}
+
+// ---- raw IO ----
+
+const dumpio = false
+
+// sendPkt sends raw packet to peer.
+//
+// tx error, if any, is returned as is and is analyzed in serveSend.
+//
+// XXX pkt should be freed always or only on error?
+func (nl *NodeLink) sendPkt(pkt *pktBuf) error {
+	if dumpio {
+		// XXX -> log
+		fmt.Printf("%v > %v: %v\n", nl.peerLink.LocalAddr(), nl.peerLink.RemoteAddr(), pkt)
+		//defer fmt.Printf("\t-> sendPkt err: %v\n", err)
+	}
+
+	// NOTE Write either writes data in full or returns an error
+	_, err := nl.peerLink.Write(pkt.data)
+	pkt.Free()
+	return err
+}
+
+var ErrPktTooBig = errors.New("packet too big")
+
+// recvPkt receives raw packet from peer.
+//
+// rx error, if any, is returned as is and is analyzed in serveRecv
+func (nl *NodeLink) recvPkt() (*pktBuf, error) {
+	// FIXME if rxbuf is non-empty - first look there for header and then if
+	// we know size -> allocate pkt with that size.
+	pkt := pktAlloc(4096)
+	// len=4K but cap can be more since pkt is from pool - use all space to buffer reads
+	// XXX vvv -> pktAlloc() ?
+	data := pkt.data[:cap(pkt.data)]
+
+	n := 0 // number of pkt bytes obtained so far
+
+	// next packet could be already prefetched in part by previous read
+	if nl.rxbuf.Len() > 0 {
+		δn, _ := nl.rxbuf.Read(data[:proto.PktHeaderLen])
+		n += δn
+	}
+
+	// first read to read pkt header and hopefully rest of packet in 1 syscall
+	if n < proto.PktHeaderLen {
+		δn, err := io.ReadAtLeast(nl.peerLink, data[n:], proto.PktHeaderLen - n)
+		if err != nil {
+			return nil, err
+		}
+		n += δn
+	}
+
+	pkth := pkt.Header()
+
+	msgLen := packed.Ntoh32(pkth.MsgLen)
+	if msgLen > proto.PktMaxSize - proto.PktHeaderLen {
+		return nil, ErrPktTooBig
+	}
+	pktLen := int(proto.PktHeaderLen + msgLen) // whole packet length
+
+	// resize data if we don't have enough room in it
+	data = xbytes.Resize(data, pktLen)
+	data = data[:cap(data)]
+
+	// we might have more data already prefetched in rxbuf
+	if nl.rxbuf.Len() > 0 {
+		δn, _ := nl.rxbuf.Read(data[n:pktLen])
+		n += δn
+	}
+
+	// read rest of pkt data, if we need to
+	if n < pktLen {
+		δn, err := io.ReadAtLeast(nl.peerLink, data[n:], pktLen - n)
+		if err != nil {
+			return nil, err
+		}
+		n += δn
+	}
+
+	// put overread data into rxbuf for next reader
+	if n > pktLen {
+		nl.rxbuf.Write(data[pktLen:n])
+	}
+
+	// fixup data/pkt: trim to exactly one packet
+	// (overread bytes, if any, were already moved to rxbuf above)
+	data = data[:pktLen]
+	pkt.data = data
+
+	if dumpio {
+		// XXX -> log
+		fmt.Printf("%v < %v: %v\n", nl.peerLink.LocalAddr(), nl.peerLink.RemoteAddr(), pkt)
+	}
+
+	return pkt, nil
+}
+
+
+// ---- for convenience: Conn -> NodeLink & local/remote link addresses  ----
+
+// LocalAddr returns local address of the underlying link to peer.
+func (link *NodeLink) LocalAddr() net.Addr {
+	return link.peerLink.LocalAddr()
+}
+
+// RemoteAddr returns remote address of the underlying link to peer.
+func (link *NodeLink) RemoteAddr() net.Addr {
+	return link.peerLink.RemoteAddr()
+}
+
+// Link returns underlying NodeLink of this connection.
+func (c *Conn) Link() *NodeLink {
+	return c.link
+}
+
+// ConnID returns connection identifier used for the connection.
+func (c *Conn) ConnID() uint32 {
+	return c.connId
+}
+
+
+// ---- for convenience: String / Error / Cause ----
+
+func (link *NodeLink) String() string {
+	s := fmt.Sprintf("%s - %s", link.LocalAddr(), link.RemoteAddr())
+	return s	// XXX add "(closed)" if link is closed ?
+			// XXX other flags e.g. (down) ?
+}
+
+func (c *Conn) String() string {
+	s := fmt.Sprintf("%s .%d", c.link, c.connId)
+	return s	// XXX add "(closed)" if c is closed ?
+}
+
+func (e *LinkError) Error() string {
+	return fmt.Sprintf("%s: %s: %s", e.Link, e.Op, e.Err)
+}
+
+func (e *ConnError) Error() string {
+	return fmt.Sprintf("%s .%d: %s: %s", e.Link, e.ConnId, e.Op, e.Err)
+}
+
+func (e *LinkError) Cause() error { return e.Err }
+func (e *ConnError) Cause() error { return e.Err }
+
+func (nl *NodeLink) err(op string, e error) error {
+	if e == nil {
+		return nil
+	}
+	return &LinkError{Link: nl, Op: op, Err: e}
+}
+
+func (c *Conn) err(op string, e error) error {
+	if e == nil {
+		return nil
+	}
+	return &ConnError{Link: c.link, ConnId: c.connId, Op: op, Err: e}
+}
+
+
+// ---- exchange of messages ----
+
+// msgPack allocates pktBuf and encodes msg into it.
+func msgPack(connId uint32, msg proto.Msg) *pktBuf {
+	l := msg.NEOMsgEncodedLen()
+	buf := pktAlloc(proto.PktHeaderLen + l)
+
+	h := buf.Header()
+	h.ConnId = packed.Hton32(connId)
+	h.MsgCode = packed.Hton16(msg.NEOMsgCode())
+	h.MsgLen = packed.Hton32(uint32(l)) // XXX casting: think again
+
+	msg.NEOMsgEncode(buf.Payload())
+	return buf
+}
+
+// TODO msgUnpack
+
+// sendMsg sends message with specified connection ID.
+//
+// it encodes message into packet, sets header appropriately and sends it.
+//
+// it is ok to call sendMsg in parallel with serveSend.
+func (link *NodeLink) sendMsg(connId uint32, msg proto.Msg) error {
+	buf := msgPack(connId, msg)
+	return link.sendPkt(buf) // XXX more context in err? (msg type)
+	// FIXME ^^^ shutdown whole link on error
+}
--- a/go/neo/neonet/connection_test.go
+++ b/go/neo/neonet/connection_test.go
+// Copyright (C) 2016-2018  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+package neonet
+
+import (
+	"bytes"
+	"context"
+	"io"
+	"net"
+	"runtime"
+	"testing"
+	"time"
+
+	"golang.org/x/sync/errgroup"
+
+	"lab.nexedi.com/kirr/go123/exc"
+	"lab.nexedi.com/kirr/go123/xerr"
+
+	"lab.nexedi.com/kirr/neo/go/internal/packed"
+
+	"lab.nexedi.com/kirr/neo/go/neo/proto"
+
+	"github.com/kylelemons/godebug/pretty"
+)
+
+func xclose(c io.Closer) {
+	err := c.Close()
+	exc.Raiseif(err)
+}
+
+func xnewconn(nl *NodeLink) *Conn {
+	c, err := nl.NewConn()
+	exc.Raiseif(err)
+	return c
+}
+
+func xaccept(nl *NodeLink) *Conn {
+	c, err := nl.Accept()
+	exc.Raiseif(err)
+	return c
+}
+
+func xsendPkt(c interface{ sendPkt(*pktBuf) error }, pkt *pktBuf) {
+	err := c.sendPkt(pkt)
+	exc.Raiseif(err)
+}
+
+func xrecvPkt(c interface{ recvPkt() (*pktBuf, error) }) *pktBuf {
+	pkt, err := c.recvPkt()
+	exc.Raiseif(err)
+	return pkt
+}
+
+func xwait(w interface{ Wait() error }) {
+	err := w.Wait()
+	exc.Raiseif(err)
+}
+
+func gox(wg interface{ Go(func() error) }, xf func()) {
+	wg.Go(exc.Funcx(xf))
+}
+
+// xlinkError verifies that err is *LinkError and returns err.Err .
+func xlinkError(err error) error {
+	le, ok := err.(*LinkError)
+	if !ok {
+		exc.Raisef("%#v is not *LinkError", err)
+	}
+	return le.Err
+}
+
+// xconnError verifies that err is *ConnError and returns err.Err .
+func xconnError(err error) error {
+	ce, ok := err.(*ConnError)
+	if !ok {
+		exc.Raisef("%#v is not *ConnError", err)
+	}
+	return ce.Err
+}
+
+// Prepare pktBuf with content.
+func _mkpkt(connid uint32, msgcode uint16, payload []byte) *pktBuf {
+	pkt := &pktBuf{make([]byte, proto.PktHeaderLen+len(payload))}
+	h := pkt.Header()
+	h.ConnId = packed.Hton32(connid)
+	h.MsgCode = packed.Hton16(msgcode)
+	h.MsgLen = packed.Hton32(uint32(len(payload)))
+	copy(pkt.Payload(), payload)
+	return pkt
+}
+
+func (c *Conn) mkpkt(msgcode uint16, payload []byte) *pktBuf {
+	// Conn.sendPkt requires the header connid to match the connection - set it from c.connId
+	return _mkpkt(c.connId, msgcode, payload)
+}
+
+// Verify pktBuf is as expected.
+func xverifyPkt(pkt *pktBuf, connid uint32, msgcode uint16, payload []byte) {
+	errv := xerr.Errorv{}
+	h := pkt.Header()
+	// TODO include caller location
+	if packed.Ntoh32(h.ConnId) != connid {
+		errv.Appendf("header: unexpected connid %v  (want %v)", packed.Ntoh32(h.ConnId), connid)
+	}
+	if packed.Ntoh16(h.MsgCode) != msgcode {
+		errv.Appendf("header: unexpected msgcode %v  (want %v)", packed.Ntoh16(h.MsgCode), msgcode)
+	}
+	if packed.Ntoh32(h.MsgLen) != uint32(len(payload)) {
+		errv.Appendf("header: unexpected msglen %v  (want %v)", packed.Ntoh32(h.MsgLen), len(payload))
+	}
+	if !bytes.Equal(pkt.Payload(), payload) {
+		errv.Appendf("payload differ:\n%s",
+			pretty.Compare(string(payload), string(pkt.Payload())))
+	}
+
+	exc.Raiseif(errv.Err())
+}
+
+// Verify pktBuf to match expected message.
+func xverifyPktMsg(pkt *pktBuf, connid uint32, msg proto.Msg) {
+	data := make([]byte, msg.NEOMsgEncodedLen())
+	msg.NEOMsgEncode(data)
+	xverifyPkt(pkt, connid, msg.NEOMsgCode(), data)
+}
+
+// delay a bit.
+//
+// needed e.g. to test Close interaction with waiting read or write
+// (we cannot easily sync and make sure e.g. read is started and became asleep)
+//
+// XXX JM suggested to really wait till syscall starts this way:
+// - via polling get traceback for thread that is going to call syscall and eventually block
+// - if from that traceback we can see that blocking syscall is already called
+//   -> this way we can know that it is already blocking and thus sleep-hack can be avoided
+// this can be done via runtime/pprof -> "goroutine" predefined profile
+func tdelay() {
+	time.Sleep(1 * time.Millisecond)
+}
+
+// create NodeLinks connected via net.Pipe
+func _nodeLinkPipe(flags1, flags2 _LinkRole) (nl1, nl2 *NodeLink) {
+	node1, node2 := net.Pipe()
+	nl1 = newNodeLink(node1, _LinkClient|flags1)
+	nl2 = newNodeLink(node2, _LinkServer|flags2)
+	return nl1, nl2
+}
+
+func nodeLinkPipe() (nl1, nl2 *NodeLink) {
+	return _nodeLinkPipe(0, 0)
+}
+
+func TestNodeLink(t *testing.T) {
+	// TODO catch exception -> add proper location from it -> t.Fatal (see git-backup)
+
+	// Close vs recvPkt
+	nl1, nl2 := _nodeLinkPipe(linkNoRecvSend, linkNoRecvSend)
+	wg := &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(nl1)
+	})
+	pkt, err := nl1.recvPkt()
+	if !(pkt == nil && err == io.ErrClosedPipe) {
+		t.Fatalf("NodeLink.recvPkt() after close: pkt = %v  err = %v", pkt, err)
+	}
+	xwait(wg)
+	xclose(nl2)
+
+	// Close vs sendPkt
+	nl1, nl2 = _nodeLinkPipe(linkNoRecvSend, linkNoRecvSend)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(nl1)
+	})
+	pkt = &pktBuf{[]byte("data")}
+	err = nl1.sendPkt(pkt)
+	if err != io.ErrClosedPipe {
+		t.Fatalf("NodeLink.sendPkt() after close: err = %v", err)
+	}
+	xwait(wg)
+	xclose(nl2)
+
+	// {Close,CloseAccept} vs Accept
+	nl1, nl2 = _nodeLinkPipe(linkNoRecvSend, linkNoRecvSend)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(nl2)
+	})
+	c, err := nl2.Accept()
+	if !(c == nil && xlinkError(err) == ErrLinkClosed) {
+		t.Fatalf("NodeLink.Accept() after close: conn = %v, err = %v", c, err)
+	}
+	gox(wg, func() {
+		tdelay()
+		nl1.CloseAccept()
+	})
+	c, err = nl1.Accept()
+	if !(c == nil && xlinkError(err) == ErrLinkNoListen) {
+		t.Fatalf("NodeLink.Accept() after CloseAccept: conn = %v, err = %v", c, err)
+	}
+	xwait(wg)
+	// nl1 is now not accepting connections - because it was CloseAccept'ed
+	// check further Accept behaviour.
+	c, err = nl1.Accept()
+	if !(c == nil && xlinkError(err) == ErrLinkNoListen) {
+		t.Fatalf("NodeLink.Accept() on non-listening node link: conn = %v, err = %v", c, err)
+	}
+	xclose(nl1)
+
+	// Close vs recvPkt on another side
+	nl1, nl2 = _nodeLinkPipe(linkNoRecvSend, linkNoRecvSend)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(nl2)
+	})
+	pkt, err = nl1.recvPkt()
+	if !(pkt == nil && err == io.EOF) { // NOTE io.EOF on Read per io.Pipe
+		t.Fatalf("NodeLink.recvPkt() after peer shutdown: pkt = %v  err = %v", pkt, err)
+	}
+	xwait(wg)
+	xclose(nl1)
+
+	// Close vs sendPkt on another side
+	nl1, nl2 = _nodeLinkPipe(linkNoRecvSend, linkNoRecvSend)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(nl2)
+	})
+	pkt = &pktBuf{[]byte("data")}
+	err = nl1.sendPkt(pkt)
+	if err != io.ErrClosedPipe { // NOTE io.ErrClosedPipe on Write per io.Pipe
+		t.Fatalf("NodeLink.sendPkt() after peer shutdown: pkt = %v  err = %v", pkt, err)
+	}
+	xwait(wg)
+	xclose(nl1)
+
+	// raw exchange
+	nl1, nl2 = _nodeLinkPipe(linkNoRecvSend, linkNoRecvSend)
+
+	wg, ctx := errgroup.WithContext(context.Background())
+	gox(wg, func() {
+		// send ping; wait for pong
+		pkt := _mkpkt(1, 2, []byte("ping"))
+		xsendPkt(nl1, pkt)
+		pkt = xrecvPkt(nl1)
+		xverifyPkt(pkt, 3, 4, []byte("pong"))
+	})
+	gox(wg, func() {
+		// wait for ping; send pong
+		pkt := xrecvPkt(nl2)
+		xverifyPkt(pkt, 1, 2, []byte("ping"))
+		pkt = _mkpkt(3, 4, []byte("pong"))
+		xsendPkt(nl2, pkt)
+	})
+
+	// close nodelinks either when checks are done, or upon first error
+	wgclose := &errgroup.Group{}
+	gox(wgclose, func() {
+		<-ctx.Done()
+		xclose(nl1)
+		xclose(nl2)
+	})
+
+	xwait(wg)
+	xwait(wgclose)
+
+	// ---- connections on top of nodelink ----
+
+	// Close vs recvPkt
+	nl1, nl2 = _nodeLinkPipe(0, linkNoRecvSend)
+	c = xnewconn(nl1)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(c)
+	})
+	pkt, err = c.recvPkt()
+	if !(pkt == nil && xconnError(err) == ErrClosedConn) {
+		t.Fatalf("Conn.recvPkt() after close: pkt = %v  err = %v", pkt, err)
+	}
+	xwait(wg)
+	xclose(nl1)
+	xclose(nl2)
+
+	// Close vs sendPkt
+	nl1, nl2 = _nodeLinkPipe(0, linkNoRecvSend)
+	c = xnewconn(nl1)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		tdelay()
+		xclose(c)
+	})
+	pkt = c.mkpkt(0, []byte("data"))
+	err = c.sendPkt(pkt)
+	if xconnError(err) != ErrClosedConn {
+		t.Fatalf("Conn.sendPkt() after close: err = %v", err)
+	}
+	xwait(wg)
+
+	// NodeLink.Close vs Conn.sendPkt/recvPkt
+	c11 := xnewconn(nl1)
+	c12 := xnewconn(nl1)
+	wg = &errgroup.Group{}
+	gox(wg, func() {
+		pkt, err := c11.recvPkt()
+		if !(pkt == nil && xconnError(err) == ErrLinkClosed) {
+			exc.Raisef("Conn.recvPkt() after NodeLink close: pkt = %v  err = %v", pkt, err)
+		}
+	})
+	gox(wg, func() {
+		pkt := c12.mkpkt(0, []byte("data"))
+		err := c12.sendPkt(pkt)
+		if xconnError(err) != ErrLinkClosed {
+			exc.Raisef("Conn.sendPkt() after NodeLink close: err = %v", err)
+		}
+	})
+	tdelay()
+	xclose(nl1)
+	xwait(wg)
+	xclose(c11)
+	xclose(c12)
+	xclose(nl2)
+
+	// NodeLink.Close vs Conn.sendPkt/recvPkt and Accept on another side
+	nl1, nl2 = _nodeLinkPipe(linkNoRecvSend, 0)
+	c21 := xnewconn(nl2)
+	c22 := xnewconn(nl2)
+	c23 := xnewconn(nl2)
+	wg = &errgroup.Group{}
+	var errRecv error
+	gox(wg, func() {
+		pkt, err := c21.recvPkt()
+		want1 := io.EOF           // if recvPkt wakes up due to peer close
+		want2 := io.ErrClosedPipe // if sendPkt wakes up first and shuts down nl2
+		cerr := xconnError(err)
+		if !(pkt == nil && (cerr == want1 || cerr == want2)) {
+			exc.Raisef("Conn.recvPkt after peer NodeLink shutdown: pkt = %v  err = %v", pkt, err)
+		}
+
+		errRecv = cerr
+	})
+	gox(wg, func() {
+		pkt := c22.mkpkt(0, []byte("data"))
+		err := c22.sendPkt(pkt)
+		want := io.ErrClosedPipe // always this: either due to peer close, or due to recvPkt waking up first and closing nl2
+		if xconnError(err) != want {
+			exc.Raisef("Conn.sendPkt after peer NodeLink shutdown: %v", err)
+		}
+
+	})
+	gox(wg, func() {
+		conn, err := nl2.Accept()
+		if !(conn == nil && xlinkError(err) == ErrLinkDown) {
+			exc.Raisef("Accept after peer NodeLink shutdown: conn = %v  err = %v", conn, err)
+		}
+	})
+	tdelay()
+	xclose(nl1)
+	xwait(wg)
+
+	// XXX denoise vvv
+
+	// NewConn after NodeLink shutdown
+	c, err = nl2.NewConn()
+	if xlinkError(err) != ErrLinkDown {
+		t.Fatalf("NewConn after NodeLink shutdown: %v", err)
+	}
+
+	// Accept after NodeLink shutdown
+	c, err = nl2.Accept()
+	if xlinkError(err) != ErrLinkDown {
+		t.Fatalf("Accept after NodeLink shutdown: conn = %v  err = %v", c, err)
+	}
+
+	// recvPkt/sendPkt on another Conn
+	pkt, err = c23.recvPkt()
+	if !(pkt == nil && xconnError(err) == errRecv) {
+		t.Fatalf("Conn.recvPkt 2 after peer NodeLink shutdown: pkt = %v  err = %v", pkt, err)
+	}
+	err = c23.sendPkt(c23.mkpkt(0, []byte("data")))
+	if xconnError(err) != ErrLinkDown {
+		t.Fatalf("Conn.sendPkt 2 after peer NodeLink shutdown: %v", err)
+	}
+
+	// recvPkt/sendPkt error on second call
+	pkt, err = c21.recvPkt()
+	if !(pkt == nil && xconnError(err) == ErrLinkDown) {
+		t.Fatalf("Conn.recvPkt after NodeLink shutdown: pkt = %v  err = %v", pkt, err)
+	}
+	err = c22.sendPkt(c22.mkpkt(0, []byte("data")))
+	if xconnError(err) != ErrLinkDown {
+		t.Fatalf("Conn.sendPkt after NodeLink shutdown: %v", err)
+	}
+
+	xclose(c23)
+	// recvPkt/sendPkt on closed Conn but not closed NodeLink
+	pkt, err = c23.recvPkt()
+	if !(pkt == nil && xconnError(err) == ErrClosedConn) {
+		t.Fatalf("Conn.recvPkt after close but only stopped NodeLink: pkt = %v  err = %v", pkt, err)
+	}
+	err = c23.sendPkt(c23.mkpkt(0, []byte("data")))
+	if xconnError(err) != ErrClosedConn {
+		t.Fatalf("Conn.sendPkt after close but only stopped NodeLink: %v", err)
+	}
+
+	xclose(nl2)
+	// recvPkt/sendPkt NewConn/Accept error after NodeLink close
+	pkt, err = c21.recvPkt()
+	if !(pkt == nil && xconnError(err) == ErrLinkClosed) {
+		t.Fatalf("Conn.recvPkt after NodeLink close: pkt = %v  err = %v", pkt, err)
+	}
+	err = c22.sendPkt(c22.mkpkt(0, []byte("data")))
+	if xconnError(err) != ErrLinkClosed {
+		t.Fatalf("Conn.sendPkt after NodeLink close: %v", err)
+	}
+
+	c, err = nl2.NewConn()
+	if xlinkError(err) != ErrLinkClosed {
+		t.Fatalf("NewConn after NodeLink close: %v", err)
+	}
+	c, err = nl2.Accept()
+	if xlinkError(err) != ErrLinkClosed {
+		t.Fatalf("Accept after NodeLink close: %v", err)
+	}
+
+	xclose(c21)
+	xclose(c22)
+	// recvPkt/sendPkt error after Close & NodeLink shutdown
+	pkt, err = c21.recvPkt()
+	if !(pkt == nil && xconnError(err) == ErrClosedConn) {
+		t.Fatalf("Conn.recvPkt after close and NodeLink close: pkt = %v  err = %v", pkt, err)
+	}
+	err = c22.sendPkt(c22.mkpkt(0, []byte("data")))
+	if xconnError(err) != ErrClosedConn {
+		t.Fatalf("Conn.sendPkt after close and NodeLink close: %v", err)
+	}
+
+
+	saveKeepClosed := connKeepClosed
+	connKeepClosed = 10 * time.Millisecond
+
+	// Conn accept + exchange
+	nl1, nl2 = nodeLinkPipe()
+	nl1.CloseAccept()
+	wg = &errgroup.Group{}
+	closed := make(chan int)
+	gox(wg, func() {
+		c := xaccept(nl2)
+
+		pkt := xrecvPkt(c)
+		xverifyPkt(pkt, c.connId, 33, []byte("ping"))
+
+		// change pkt a bit and send it back
+		xsendPkt(c, c.mkpkt(34, []byte("pong")))
+
+		// one more time
+		pkt = xrecvPkt(c)
+		xverifyPkt(pkt, c.connId, 35, []byte("ping2"))
+		xsendPkt(c, c.mkpkt(36, []byte("pong2")))
+
+		xclose(c)
+		closed <- 1
+
+		// once again as ^^^ but finish only with CloseRecv
+		c2 := xaccept(nl2)
+		pkt = xrecvPkt(c2)
+		xverifyPkt(pkt, c2.connId, 41, []byte("ping5"))
+		xsendPkt(c2, c2.mkpkt(42, []byte("pong5")))
+
+		c2.CloseRecv()
+		closed <- 2
+
+		// "connection refused" when trying to connect to not-listening peer
+		c = xnewconn(nl2) // XXX should get error here?
+		xsendPkt(c, c.mkpkt(38, []byte("pong3")))
+		pkt = xrecvPkt(c)
+		xverifyPktMsg(pkt, c.connId, errConnRefused)
+		xsendPkt(c, c.mkpkt(40, []byte("pong4"))) // once again
+		pkt = xrecvPkt(c)
+		xverifyPktMsg(pkt, c.connId, errConnRefused)
+
+		xclose(c)
+
+	})
+
+	c1 := xnewconn(nl1)
+	xsendPkt(c1, c1.mkpkt(33, []byte("ping")))
+	pkt = xrecvPkt(c1)
+	xverifyPkt(pkt, c1.connId, 34, []byte("pong"))
+	xsendPkt(c1, c1.mkpkt(35, []byte("ping2")))
+	pkt = xrecvPkt(c1)
+	xverifyPkt(pkt, c1.connId, 36, []byte("pong2"))
+
+	// "connection closed" after peer closed its end
+	<-closed
+	xsendPkt(c1, c1.mkpkt(37, []byte("ping3")))
+	pkt = xrecvPkt(c1)
+	xverifyPktMsg(pkt, c1.connId, errConnClosed)
+	xsendPkt(c1, c1.mkpkt(39, []byte("ping4"))) // once again
+	pkt = xrecvPkt(c1)
+	xverifyPktMsg(pkt, c1.connId, errConnClosed)
+	// XXX also should get EOF on recv
+
+	// one more time but now peer does only .CloseRecv()
+	c2 := xnewconn(nl1)
+	xsendPkt(c2, c2.mkpkt(41, []byte("ping5")))
+	pkt = xrecvPkt(c2)
+	xverifyPkt(pkt, c2.connId, 42, []byte("pong5"))
+	<-closed
+	xsendPkt(c2, c2.mkpkt(41, []byte("ping6")))
+	pkt = xrecvPkt(c2)
+	xverifyPktMsg(pkt, c2.connId, errConnClosed)
+
+	xwait(wg)
+
+	// make sure entry for closed nl2.1 stays in nl2.connTab
+	nl2.connMu.Lock()
+	if cnl2 := nl2.connTab[1]; cnl2 == nil {
+		t.Fatal("nl2.connTab[1] == nil  ; want \"closed\" entry")
+	}
+	nl2.connMu.Unlock()
+
+	// make sure "closed" entry goes away after its time
+	time.Sleep(3*connKeepClosed)
+	nl2.connMu.Lock()
+	if cnl2 := nl2.connTab[1]; cnl2 != nil {
+		t.Fatalf("nl2.connTab[1] == %v after close time window  ; want nil", cnl2)
+	}
+	nl2.connMu.Unlock()
+
+	xclose(c1)
+	xclose(c2)
+	xclose(nl1)
+	xclose(nl2)
+	connKeepClosed = saveKeepClosed
+
+	// test 2 channels with replies coming in reversed time order
+	nl1, nl2 = nodeLinkPipe()
+	wg = &errgroup.Group{}
+	replyOrder := map[uint16]struct { // "order" in which to process requests
+		start chan struct{} // processing starts when start chan is ready
+		next  uint16        // after processing this switch to next
+	}{
+		2: {make(chan struct{}), 1},
+		1: {make(chan struct{}), 0},
+	}
+	close(replyOrder[2].start)
+
+	gox(wg, func() {
+		for range replyOrder {
+			c := xaccept(nl2)
+
+			gox(wg, func() {
+				pkt := xrecvPkt(c)
+				n := packed.Ntoh16(pkt.Header().MsgCode)
+				x := replyOrder[n]
+
+				// wait before it is our turn & echo pkt back
+				<-x.start
+				xsendPkt(c, pkt)
+
+				xclose(c)
+
+				// tell next it can start
+				if x.next != 0 {
+					close(replyOrder[x.next].start)
+				}
+			})
+		}
+	})
+
+	c1 = xnewconn(nl1)
+	c2 = xnewconn(nl1)
+	xsendPkt(c1, c1.mkpkt(1, []byte("")))
+	xsendPkt(c2, c2.mkpkt(2, []byte("")))
+
+	// replies must be coming in reverse order
+	xechoWait := func(c *Conn, msgCode uint16) {
+		pkt := xrecvPkt(c)
+		xverifyPkt(pkt, c.connId, msgCode, []byte(""))
+	}
+	xechoWait(c2, 2)
+	xechoWait(c1, 1)
+	xwait(wg)
+
+	xclose(c1)
+	xclose(c2)
+	xclose(nl1)
+	xclose(nl2)
+}
+
+
+// ---- benchmarks ----
+
+// rtt over chan - for comparison as base.
+func benchmarkChanRTT(b *testing.B, c12, c21 chan byte) {
+	go func() {
+		for {
+			c, ok := <-c12
+			if !ok {
+				break
+			}
+
+			c21 <- c
+		}
+	}()
+
+	for i := 0; i < b.N; i++ {
+		c := byte(i)
+		c12 <- c
+		cc := <-c21
+		if cc != c {
+			b.Fatalf("sent %q != got %q", c, cc)
+		}
+	}
+
+	close(c12)
+}
+
+func BenchmarkSyncChanRTT(b *testing.B) {
+	benchmarkChanRTT(b, make(chan byte), make(chan byte))
+}
+
+func BenchmarkBufChanRTT(b *testing.B) {
+	benchmarkChanRTT(b, make(chan byte, 1), make(chan byte, 1))
+}
+
+// rtt over (acceptq, rxq) & ack channels - base comparison for link.Accept + conn.Recv .
+func BenchmarkBufChanAXRXRTT(b *testing.B) {
+	axq := make(chan chan byte)
+	ack := make(chan byte)
+	go func() {
+		for {
+			// accept
+			rxq, ok := <-axq
+			if !ok {
+				break
+			}
+
+			// recv
+			c := <-rxq
+
+			// send back
+			ack <- c
+		}
+	}()
+
+	rxq := make(chan byte, 1) // buffered
+	for i := 0; i < b.N; i++ {
+		c := byte(i)
+		axq <- rxq
+		rxq <- c
+		cc := <-ack
+		if cc != c {
+			b.Fatalf("sent %q != got %q", c, cc)
+		}
+	}
+
+	close(axq)
+}
+
+
+var gosched = make(chan struct{})
+
+// GoschedLocal is like runtime.Gosched but queues current goroutine on P-local
+// runqueue instead of global runqueue.
+// FIXME does not work - in the end goroutines appear on different Ps/Ms
+func GoschedLocal() {
+	go func() {
+		gosched <- struct{}{}
+	}()
+	<-gosched
+}
+
+// rtt over net.Conn Read/Write
+// if serveRecv is true - do RX path with additional serveRecv-style goroutine
+func benchmarkNetConnRTT(b *testing.B, c1, c2 net.Conn, serveRecv bool, ghandoff bool) {
+	buf1 := make([]byte, 1)
+	buf2 := make([]byte, 1)
+
+	// make func to recv from c into buf via selected rx strategy
+	mkrecv := func(c net.Conn, buf []byte) func() (int, error) {
+		var recv func() (int, error)
+		if serveRecv {
+			type rx struct {
+				n   int
+				erx error
+			}
+			rxq := make(chan rx, 1)
+			rxghandoff := make(chan struct{})
+			var serveRx func()
+			serveRx = func() {
+				for {
+					n, erx := io.ReadFull(c, buf)
+					//fmt.Printf("(go) %p rx -> %v %v\n", c, n, erx)
+					rxq <- rx{n, erx}
+
+					// good: reduce switch to receiver G latency
+					// see comment about rxghandoff in serveRecv
+					// in case of TCP/loopback saves ~5μs
+					if ghandoff {
+						<-rxghandoff
+					}
+
+					// stop on first error
+					if erx != nil {
+						return
+					}
+
+					if false {
+						// bad - puts G in global runq and so it changes M
+						runtime.Gosched()
+					}
+					if false {
+						// bad - same as runtime.Gosched
+						GoschedLocal()
+					}
+
+					if false {
+						// bad - in the end Gs appear on different Ms
+						go serveRx()
+						return
+					}
+				}
+			}
+
+			go serveRx()
+
+			recv = func() (int, error) {
+				r := <-rxq
+				if ghandoff {
+					rxghandoff <- struct{}{}
+				}
+				return r.n, r.erx
+			}
+
+		} else {
+			recv = func() (int, error) {
+				return io.ReadFull(c, buf)
+			}
+		}
+		return recv
+	}
+
+	recv1 := mkrecv(c1, buf1)
+	recv2 := mkrecv(c2, buf2)
+
+
+	b.ResetTimer()
+
+	go func() {
+		defer func() {
+			//fmt.Printf("2: close\n")
+			xclose(c2)
+		}()
+
+		for {
+			n, erx := recv2()
+			//fmt.Printf("2: rx %q\n", buf2[:n])
+			if n > 0 {
+				if n != len(buf2) {
+					b.Fatalf("read -> %d bytes  ; want %d", n, len(buf2))
+				}
+
+				//fmt.Printf("2: tx %q\n", buf2)
+				_, etx := c2.Write(buf2)
+				if etx != nil {
+					b.Fatal(etx)
+				}
+			}
+
+			switch erx {
+			case nil:
+				// ok
+
+			case io.ErrClosedPipe, io.EOF: // net.Pipe, TCP
+				return
+
+			default:
+				b.Fatal(erx) // XXX cannot call b.Fatal from non-main goroutine?
+			}
+		}
+	}()
+
+	for i := 0; i < b.N; i++ {
+		c := byte(i)
+		buf1[0] = c
+		//fmt.Printf("1: tx %q\n", buf1)
+		_, err := c1.Write(buf1)
+		if err != nil {
+			b.Fatal(err)
+		}
+
+		n, err := recv1()
+		//fmt.Printf("1: rx %q\n", buf1[:n])
+		if !(n == len(buf1) && err == nil) {
+			b.Fatalf("read back: n=%v  err=%v", n, err)
+		}
+
+		if buf1[0] != c {
+			b.Fatalf("sent %q != got %q", c, buf1[0])
+		}
+	}
+
+	//fmt.Printf("1: close\n")
+	xclose(c1)
+}
+
+// rtt over net.Pipe - for comparison as base.
+func BenchmarkNetPipeRTT(b *testing.B) {
+	c1, c2 := net.Pipe()
+	benchmarkNetConnRTT(b, c1, c2, false, false)
+}
+
+func BenchmarkNetPipeRTTsr(b *testing.B) {
+	c1, c2 := net.Pipe()
+	benchmarkNetConnRTT(b, c1, c2, true, false)
+}
+
+func BenchmarkNetPipeRTTsrho(b *testing.B) {
+	c1, c2 := net.Pipe()
+	benchmarkNetConnRTT(b, c1, c2, true, true)
+}
+
+// xtcpPipe creates two TCP connections connected to each other via loopback.
+func xtcpPipe() (*net.TCPConn, *net.TCPConn) {
+	// NOTE go sets TCP_NODELAY by default for TCP sockets
+	l, err := net.Listen("tcp", "localhost:")
+	exc.Raiseif(err)
+
+	c1, err := net.Dial("tcp", l.Addr().String())
+	exc.Raiseif(err)
+
+	c2, err := l.Accept()
+	exc.Raiseif(err)
+
+	xclose(l)
+	return c1.(*net.TCPConn), c2.(*net.TCPConn)
+}
+
+// rtt over TCP/loopback - for comparison as base.
+func BenchmarkTCPlo(b *testing.B) {
+	c1, c2 := xtcpPipe()
+	benchmarkNetConnRTT(b, c1, c2, false, false)
+}
+
+func BenchmarkTCPlosr(b *testing.B) {
+	c1, c2 := xtcpPipe()
+	benchmarkNetConnRTT(b, c1, c2, true, false)
+}
+
+func BenchmarkTCPlosrho(b *testing.B) {
+	c1, c2 := xtcpPipe()
+	benchmarkNetConnRTT(b, c1, c2, true, true)
+}
--- a/go/neo/neonet/misc.go
+++ b/go/neo/neonet/misc.go
+// Copyright (C) 2016-2018  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+package neonet
+// syntactic sugar for atomic load/store, to raise the signal-to-noise ratio of the logic
+
+import "sync/atomic"
+
+type atomic32 struct {
+	v int32	// struct member so `var a atomic32; if a == 0 ...` does not work
+}
+
+func (a *atomic32) Get() int32 {
+	return atomic.LoadInt32(&a.v)
+}
+
+func (a *atomic32) Set(v int32) {
+	atomic.StoreInt32(&a.v, v)
+}
+
+func (a *atomic32) Add(δ int32) int32 {
+	return atomic.AddInt32(&a.v, δ)
+}
--- a/go/neo/neonet/pkt.go
+++ b/go/neo/neonet/pkt.go
+// Copyright (C) 2016-2018  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+package neonet
+// packets and packet buffers management
+
+import (
+	"fmt"
+	"reflect"
+	"sync"
+	"unsafe"
+
+	"lab.nexedi.com/kirr/go123/xbytes"
+
+	"lab.nexedi.com/kirr/neo/go/internal/packed"
+	"lab.nexedi.com/kirr/neo/go/neo/proto"
+)
+
+// pktBuf is a buffer with full raw packet (header + payload).
+//
+// Allocate pktBuf via pktAlloc() and free via pktBuf.Free().
+type pktBuf struct {
+	data []byte // whole packet data including all headers
+}
+
+// Header returns pointer to packet header.
+func (pkt *pktBuf) Header() *proto.PktHeader {
+	// NOTE no need to check len(.data) < PktHeaderLen:
+	// buffers obtained via pktAlloc always have len(.data) >= PktHeaderLen.
+	return (*proto.PktHeader)(unsafe.Pointer(&pkt.data[0]))
+}
+
+// Payload returns []byte representing packet payload.
+func (pkt *pktBuf) Payload() []byte {
+	return pkt.data[proto.PktHeaderLen:]
+}
+
+// ---- pktBuf freelist ----
+
+// pktBufPool is sync.Pool<pktBuf>.
+var pktBufPool = sync.Pool{New: func() interface{} {
+	return &pktBuf{data: make([]byte, 0, 4096)}
+}}
+
+// pktAlloc allocates pktBuf with len=n.
+//
+// n must be >= sizeof(proto.PktHeader).
+func pktAlloc(n int) *pktBuf {
+	if n < proto.PktHeaderLen {
+		panic("pktAlloc: n < sizeof(PktHeader)")
+	}
+	pkt := pktBufPool.Get().(*pktBuf)
+	pkt.data = xbytes.Realloc(pkt.data, n)
+	return pkt
+}
+
+// Free marks pkt as no longer needed and returns it to the pool for reuse.
+//
+// pkt must not be used after Free.
+func (pkt *pktBuf) Free() {
+	pktBufPool.Put(pkt)
+}
+
+
+// ---- pktBuf dump ----
+
+// String dumps a packet in human-readable form.
+func (pkt *pktBuf) String() string {
+	if len(pkt.data) < proto.PktHeaderLen {
+		return fmt.Sprintf("(! < PktHeaderLen) % x", pkt.data)
+	}
+
+	h := pkt.Header()
+	s := fmt.Sprintf(".%d", packed.Ntoh32(h.ConnId))
+
+	msgCode := packed.Ntoh16(h.MsgCode)
+	msgLen  := packed.Ntoh32(h.MsgLen)
+	data    := pkt.Payload()
+	msgType := proto.MsgType(msgCode)
+	if msgType == nil {
+		s += fmt.Sprintf(" ? (%d) #%d [%d]: % x", msgCode, msgLen, len(data), data)
+		return s
+	}
+
+	// XXX dup wrt Conn.Recv
+	msg := reflect.New(msgType).Interface().(proto.Msg)
+	n, err := msg.NEOMsgDecode(data)
+	if err != nil {
+		// decoding failed: dump raw payload, msg contents are not meaningful
+		s += fmt.Sprintf(" (%s) %v; #%d [%d]: % x", msgType, err, msgLen, len(data), data)
+		return s
+	}
+
+	s += fmt.Sprintf(" %s %v", msgType.Name(), msg) // XXX or %+v better?
+
+	if n < len(data) {
+		tail := data[n:]
+		s += fmt.Sprintf(" ;  [%d]tail: % x", len(tail), tail)
+	}
+
+	return s
+}
+
+// Dump dumps a packet in raw form.
+func (pkt *pktBuf) Dump() string {
+	if len(pkt.data) < proto.PktHeaderLen {
+		return fmt.Sprintf("(! < PktHeaderLen) % x", pkt.data)
+	}
+
+	h := pkt.Header()
+	data := pkt.Payload()
+	return fmt.Sprintf(".%d (%d) #%d [%d]: % x",
+		packed.Ntoh32(h.ConnId), packed.Ntoh16(h.MsgCode), packed.Ntoh32(h.MsgLen), len(data), data)
+}
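The allocation scheme in pkt.go (a sync.Pool of buffers that are resized on reuse) can be sketched standalone as follows. This is an illustration under assumptions, not the neonet code: `headerLen`, `buf`, `alloc` and `resize` are hypothetical names, `headerLen = 10` is assumed for the example instead of proto.PktHeaderLen, and `resize` is a simple stand-in for xbytes.Realloc whose exact growth policy is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// headerLen is an assumed header size used only for this example.
const headerLen = 10

// buf holds a whole packet: header followed by payload.
type buf struct {
	data []byte
}

// pool keeps released buffers so that their storage can be reused.
var pool = sync.Pool{New: func() interface{} {
	return &buf{data: make([]byte, 0, 4096)}
}}

// resize returns a slice of length n, reusing b's storage when its capacity
// allows. Previous content is not preserved when a new slice is allocated.
func resize(b []byte, n int) []byte {
	if cap(b) >= n {
		return b[:n]
	}
	return make([]byte, n)
}

// alloc returns a buffer with len(data) == n; n must cover the header.
func alloc(n int) *buf {
	if n < headerLen {
		panic("alloc: n < headerLen")
	}
	b := pool.Get().(*buf)
	b.data = resize(b.data, n)
	return b
}

// free returns b to the pool; b must not be used afterwards.
func (b *buf) free() {
	pool.Put(b)
}

// payload returns the part of the packet after the header.
func (b *buf) payload() []byte {
	return b.data[headerLen:]
}

func main() {
	b := alloc(headerLen + 5)
	fmt.Println(len(b.data), len(b.payload())) // 15 5
	b.free()
}
```

Checking `n >= headerLen` once at allocation time is what lets accessors such as `payload` slice the buffer without a bounds check of their own.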
--- a/go/neo/proto/proto.go
+++ b/go/neo/proto/proto.go
@@ -31,7 +31,8 @@
 // A message type can be looked up by message code with MsgType.
 //
 // The proto packages provides only message definitions and low-level
-// primitives for their marshalling.
+// primitives for their marshalling. Package lab.nexedi.com/kirr/neo/go/neo/neonet
+// provides the actual service for exchanging messages over the network.
 package proto

 // This file defines everything that relates to messages on the wire.