Commit 573514c6 authored by Kirill Smelkov's avatar Kirill Smelkov

X go/neo: Add SSL support

parent cc4854b9
......@@ -615,12 +615,56 @@ func openClientByURL(ctx context.Context, u *url.URL, opt *zodb.DriverOptions) (
return nil, zodb.InvalidTid, fmt.Errorf("cluster name not specified")
}
qv, err := url.ParseQuery(u.RawQuery)
if err != nil {
return nil, zodb.InvalidTid, err
}
q := map[string]string{}
for k, vv := range qv {
if len(vv) == 0 {
return nil, zodb.InvalidTid, fmt.Errorf("parameter %q without value", k)
}
if len(vv) != 1 {
return nil, zodb.InvalidTid, fmt.Errorf("duplicate parameter %q ", k)
}
q[k] = vv[0]
}
qpop := func(k string) string {
v := q[k]
delete(q, k)
return v
}
ssl := false
ca := qpop("ca")
cert := qpop("cert")
key := qpop("key")
if len(q) != 0 {
return nil, zodb.InvalidTid, fmt.Errorf("invalid query: %v", q)
  • @kirr do you remember why this if-clause was added? What's the purpose of returning an error if the query part of the zuri is not empty?

    I mean, at this moment of time, I guess it was assumed, that if there are more options than the SSL options - which at this time were provided in the URL query part - the user provided some faulty SSL options?

    But then this error could have removed with 8c974485, couldn't it?

    The URI scheme // neo(s)://[credentials@]master1,master2,...,masterN/name?options explicitly allows options in the URI query part. Even already before nexedi/slapos@0cf70a6e we added zstorage options to the query part of the zurl (see here). If there are any options and they are passed to NEO/go client, this currently raises invalid query error. From the URI scheme I would rather guess that this is a problem of NEO/go than of SlapOS, do I miss something there?

  • @levin.zimmermann, I do not remember exactly, but when I look at this code today what comes to mind is that unsupported options are rejected similarly to how calling a function with unsupported arguments is rejected as well. For example:

    In [1]: def f(x): pass
    
    In [2]: f(x=1)
    
    In [3]: f(y=1)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Cell In [3], line 1
    ----> 1 f(y=1)
    
    TypeError: f() got an unexpected keyword argument 'y'

    the logic here is that we do not ignore arguments that we don't know about and instead report back about it.

    So if neo:// would support an option, we would extract those option names from query part, and then check that what is left is empty and complain if not. But here it is that we do not handle any options yet, and that's why it simply checks that original query is completely empty.

    Another reason not to ignore unknown options is this: imagine if we actually ignore unknowns, then e.g. neo://...&someoption=aaa would be treated exactly as neo://.... But if, in the future, we add support to the code to handle that someoption, the behaviour will become different without any requesting action from the user. And also originally, if &someoption=aaa was used, the user was assuming that the behaviour of opened storage would correspond to the meaning related to option passed, but it was not the case. From this point of view it makes sense not to ignore unknown options to avoid behaviour changing surprises and falsely reporting "ok" for particular request.

    Another reason not to ignore unknowns is related: in WCFS we compute mountpoint as md5(zurl) and assume that if two ZODB users use the same storage, they would use the same WCFS mountpoint. This works only if md5(zurl) is the same for both, which in practice means that zurl should be also the same for both. Now if we ignore options and user A uses zurl without option and user B use same zurl, but with appended option, this will break.

    Do you see?

    I prepared this reply quickly and so it might be not well structured and clear, but hopefully it could still help a bit,

    Kirill

    P.S. please ask if you have further questions or something is not clear.

  • P.P.S. what is the particular option that gets passed and rejected?

  • Thanks for your fast reply @kirr, everything is very well understandable and clear.

    Currently, SlapOS passes to the query parts of the zurl any option passed to the storage (from the storage-dict parameter) which is not consumed by explicit z.pop calls in instance-wcfs.cfg.in. In case of our WWM/wind clone this is _ca, _cert, _key and compress.

    I guess, we either need to stop SlapOS from passing any option provided in storage-dict to the query part of the zurl, or we allow unused options in the query part in NEO/go (we could also do this in explicit ways, popping out any option we would expect, to avoid receiving undefined parameters, and if then is still any option left, return the error).

    I just very quickly checked respective NEO/py code. Here any option is initially processed, but I don't know if an error is just raised later.

    If I understand zodburi correctly, the zurl doesn't only specifies how we can connect to a storage, but also how we can initialize a storage. Then two zurl are indeed only equal if both zurl specify the same options. On the other hand, these options aren't necessarily needed to connect to a storage (for instance with NEO we only need the master nodes url, the cluster name, and maybe the certificates to connect).

  • Hello @levin.zimmermann,

    Thanks for feedback - I see the context now. It is true that zodburi mixes a bit meaning of parameters and pushes everything - both on server parameters and parameters specific to client behaviour - into query. It is not very correct to do though. I've created kirr/neo!3 (closed) to fix this for NEO. Please note that neo/client/zodburi.py was implemented completely by me from scratch. Only in 2017 I was not so conscious about which part of the URL should be used for what, and there was no WCFS use-case that highlighted the differences. With the proposed fix SlapOS should be able to pass compress option in fragment. Do you think that might work for you?

    Regarding _ca, _cert and _key - I'm not sure why we pass them down in options at all: those defines certificate data and after processing we should refer to the files with that data via ca=...&cert=...&key=... in credential without using them in the options, isn't it?

    In general I would prefer not to ignore unknown options that control server-side behaviour.

    Kirill

  • Hello @kirr,

    thanks for your feedback and your proposed solution! I'll reply more on this in the respective MR.

    Regarding _ca, _cert and _key - I'm not sure why we pass them down in options at all: those defines certificate data and after processing we should refer to the files with that data via ca=...&cert=...&key=... in credential without using them in the options, isn't it?

    Yes we don't need them for our WCFS options. I think I need to patch SlapOS to only forward a limited set of the storage-dict content (to either fragment or query part in order to differentiate between client/server options).

    When asking, why we use _ca in our storage-dict at all, I think the answer to this is here: SlapOS doesn't deploy SSL files. In order to still ensure an instance works with only using SlapOS commands and no manual copy-paste of files those _ca etc. parameters are used.

    The ca, cert and key options (as file paths) don't exist on SlapOS UI level. It's hard-coded that they must be at ~/etc/...

    In general I would prefer not to ignore unknown options that control server-side behaviour.

    I understand, this makes sense.

    Best, Levin

  • Hello @levin.zimmermann,

    Thanks for feedback and you are welcome. Good to hear that proposed patch might work for you.

    Regarding _ca and friends - I looked around to better understand what you describe and my reading is this:

    "ssl": { "description": "Enable SSL. All nodes look for 3 files in ~/etc: ca.crt, neo.crt, neo.key. Waiting that SlapOS provides a way to manage certificates, the user must deploy them manually, or use the temporary _ca/_cert/_key parameters."

    (emphasis mine)

    So if _ca, _cert and _key are given, ERP5 and NEO SRs deploy ~/etc/*.crt from the data provided in those parameters. This was initially added in nexedi/slapos@706801f2 and is still present as of today. So the part, that deploys certificate files from those parameters should kind of "consume" them. I mean we should not pass them down to take part in constructed zurl. This can be achieved in several way. One way, for example, is to pop those parameters from storage-dict, after the cert files are deployed, so that for the rest of the code it looks like there is no such storage parameters at all. The other way is to explicitly ignore them in instance-wcfs.cfg . Probably that's what we'll need to do as anyway we'll need to distinguish ? parameters from # ones somehow, and that logic is specific to zurl construction. So it might be convenient to have 3 sets in that logic:

    1. parameters to ignore (e.g. _ca)
    2. parameters to treat as local ones (e.g. #compress)
    3. the rest is parameters to append as ? options

    That's what you probably meant with

    I think I need to patch SlapOS to only forward a limited set of the storage-dict content (to either fragment or query part in order to differentiate between client/server options)

    so it looks we are on the common track on how to solve this.

    Kirill

  • Hi @kirr, thanks for your further explanations, this is also my understanding of _ca & friends. I indeed also had the same vision for the solution on SlapOS level, I applied it in nexedi/slapos!1400 (closed). Best, Levin

  • Thanks, Levin. I will follow-up with my feedback there.

Please register or sign in to reply
}
if ca != "" || cert != "" || key != "" {
if !(ca != "" && cert != "" && key != "") {
return nil, zodb.InvalidTid, fmt.Errorf("incomplete ca/cert/key provided")
}
ssl = true
}
if !opt.ReadOnly {
return nil, zodb.InvalidTid, fmt.Errorf("TODO write mode not implemented")
}
// XXX check/use other url fields
net := xnet.NetPlain("tcp") // TODO + TLS; not only "tcp" ?
net := xnet.NetPlain("tcp") // TODO not only "tcp" ?
if ssl {
tlsCfg, err := tlsForSSL(ca, cert, key)
if err != nil {
return nil, zodb.InvalidTid, err
}
net = xnet.NetTLS(net, tlsCfg)
}
c := NewClient(u.User.Username(), u.Host, net)
c.watchq = opt.Watchq
......
......@@ -26,6 +26,7 @@ import (
"net/url"
"os"
"os/exec"
"strings"
"testing"
"time"
......@@ -38,8 +39,10 @@ import (
// NEOSrv represents running NEO server.
type NEOSrv interface {
ClusterName() string // name of the cluster
MasterAddr() string // address of the master
// XXX kill or restore?
//ClusterName() string // name of the cluster
//MasterAddr() string // address of the master
ZUrl() string // zurl to access this NEO server
Bugs() []string // list of known server bugs
}
......@@ -57,6 +60,11 @@ type NEOPySrv struct {
errExit error // error from Wait
masterAddr string // address of master in spawned cluster
// CA/Cert/Key used for SSL exchange. Empty if SSL is not used
CA string
Cert string
Key string
}
func (_ *NEOPySrv) Bugs() []string {
......@@ -71,6 +79,8 @@ type NEOPyOptions struct {
// nreplica
// name
SSL bool // whether to use SSL for node-node exchange
}
// StartNEOPySrv starts NEO/py server for clusterName NEO database located in workdir/.
......@@ -90,8 +100,20 @@ func StartNEOPySrv(workdir, clusterName string, opt NEOPyOptions) (_ *NEOPySrv,
}
n := &NEOPySrv{workdir: workdir, clusterName: clusterName, cancel: cancel, done: make(chan struct{})}
if opt.SSL {
npytests := "../../neo/tests/"
n.CA = npytests + "ca.crt"
n.Cert = npytests + "node.crt"
n.Key = npytests + "node.key"
}
// XXX $PYTHONPATH to top, so that `import neo` works?
n.pysrv = xexec.Command("./py/runneo.py", workdir, clusterName) // XXX +opt
n.pysrv = xexec.Command("./py/runneo.py", workdir, clusterName)
if opt.SSL {
n.pysrv.Args = append(n.pysrv.Args, "ca=" +n.CA)
n.pysrv.Args = append(n.pysrv.Args, "cert="+n.Cert)
n.pysrv.Args = append(n.pysrv.Args, "key=" +n.Key)
}
n.opt = opt
// $TEMP -> workdir (else NEO/py creates another one for e.g. coverage)
n.pysrv.Env = append(os.Environ(), "TEMP="+workdir)
......@@ -152,6 +174,23 @@ func (n *NEOPySrv) MasterAddr() string {
return n.masterAddr
}
func (n *NEOPySrv) ZUrl() string {
zurl := fmt.Sprintf("neo://%s@%s", n.ClusterName(), n.MasterAddr())
argv := []string{}
if n.opt.SSL {
argv = append(argv, "ca=" +url.QueryEscape(n.CA))
argv = append(argv, "cert="+url.QueryEscape(n.Cert))
argv = append(argv, "key=" +url.QueryEscape(n.Key))
}
if len(argv) != 0 {
zurl += "?"
zurl += strings.Join(argv, "&")
}
return zurl
}
func (n *NEOPySrv) Close() (err error) {
defer xerr.Contextf(&err, "stopneo %s", n.workdir)
......@@ -194,16 +233,22 @@ func withNEOSrv(t *testing.T, f func(t *testing.T, nsrv NEOSrv), optv ...tOption
f(work)
}
// TODO + all variants with nreplic=X, npartition=Y, nmaster=Z, ... ?
// TODO + all variants with nreplica=X, npartition=Y, nmaster=Z, ... ?
for _, ssl := range []bool{false, true} {
kind := ""
if ssl { kind = "ssl" } else { kind = "!ssl" }
// NEO/py
t.Run("py", func(t *testing.T) {
t.Run("py/"+kind, func(t *testing.T) {
t.Helper()
// XXX needpy
inWorkDir(t, func(workdir string) {
X := xtesting.FatalIf(t)
npy, err := StartNEOPySrv(workdir, "1", NEOPyOptions{}); X(err)
npy, err := StartNEOPySrv(workdir, "1", NEOPyOptions{
SSL: ssl,
}); X(err)
defer func() {
err := npy.Close(); X(err)
}()
......@@ -213,8 +258,16 @@ func withNEOSrv(t *testing.T, f func(t *testing.T, nsrv NEOSrv), optv ...tOption
"from neo.scripts.neomigrate import main; main()",
"-q",
"-c", npy.ClusterName(),
)
if ssl {
cmd.Args = append(cmd.Args, "--ca", npy.CA)
cmd.Args = append(cmd.Args, "--cert", npy.Cert)
cmd.Args = append(cmd.Args, "--key", npy.Key)
}
cmd.Args = append(cmd.Args,
opt.Preload,
npy.MasterAddr())
npy.MasterAddr(),
)
cmd.Stdin = nil
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
......@@ -227,6 +280,7 @@ func withNEOSrv(t *testing.T, f func(t *testing.T, nsrv NEOSrv), optv ...tOption
// TODO NEO/go
}
}
// withNEO tests f on all kinds of NEO servers connected to by NEO client.
......@@ -235,7 +289,7 @@ func withNEO(t *testing.T, f func(t *testing.T, nsrv NEOSrv, ndrv *Client), optv
withNEOSrv(t, func(t *testing.T, nsrv NEOSrv) {
t.Helper()
X := xtesting.FatalIf(t)
ndrv, _, err := neoOpen(fmt.Sprintf("neo://%s@%s", nsrv.ClusterName(), nsrv.MasterAddr()),
ndrv, _, err := neoOpen(nsrv.ZUrl(),
&zodb.DriverOptions{ReadOnly: true}); X(err)
defer func() {
err := ndrv.Close(); X(err)
......@@ -271,7 +325,7 @@ func TestLoad(t *testing.T) {
func TestWatch(t *testing.T) {
withNEOSrv(t, func(t *testing.T, nsrv NEOSrv) {
xtesting.DrvTestWatch(t, fmt.Sprintf("neo://%s@%s", nsrv.ClusterName(), nsrv.MasterAddr()), openClientByURL)
xtesting.DrvTestWatch(t, nsrv.ZUrl(), openClientByURL)
})
}
......
......@@ -20,7 +20,7 @@
# See https://www.nexedi.com/licensing for rationale and options.
"""runneo.py runs NEO/py cluster for NEO/go testing.
Usage: runneo.py <workdir> <cluster-name> XXX + (**kw for NEOCluster)
Usage: runneo.py <workdir> <cluster-name> [k1=v1] [k2=v2] ...
<workdir>/ready is created with address of master after spawned cluster becomes
operational.
......@@ -40,7 +40,14 @@ def main():
clusterName = sys.argv[2]
readyf = workdir + "/ready"
def sinfo(msg): return "I: runneo.py: %s/%s: %s" % (workdir, clusterName, msg)
kw = {}
for arg in sys.argv[3:]:
k, v = arg.split('=')
kw[k] = v
flags = ''
def sinfo(msg): return "I: runneo.py: %s/%s%s: %s" % (workdir, clusterName, flags, msg)
def info(msg): print(sinfo(msg))
# SIGTERM -> exit gracefully, so that defers are run
......@@ -48,7 +55,21 @@ def main():
raise SystemExit(sinfo("terminated"))
signal(SIGTERM, _)
cluster = NEOCluster([clusterName], adapter='SQLite', name=clusterName, temp_dir=workdir) # XXX +kw
flags = ' !ssl'
ca = kw.pop('ca', None)
cert = kw.pop('cert', None)
key = kw.pop('key', None)
if ca or cert or key:
if not (ca and cert and key):
raise RuntimeError(sinfo("incomplete ca/cert/key provided"))
# neo/py does `NEOCluster.SSL = neo.test.SSL` (= (ca.crt, node.crt, node.key) )
flags = ' ssl'
NEOCluster.SSL = (ca, cert, key)
if kw:
raise RuntimeError(sinfo("unexpected flags: %s" % kw))
cluster = NEOCluster([clusterName], adapter='SQLite', name=clusterName, temp_dir=workdir)
cluster.start()
defer(cluster.stop)
......
// Copyright (C) 2017 Nexedi SA and Contributors.
// Copyright (C) 2017-2020 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
......@@ -21,7 +21,13 @@ package neo
import (
"context"
"crypto/tls"
"crypto/x509"
"fmt"
"io"
"io/ioutil"
"lab.nexedi.com/kirr/go123/xerr"
"lab.nexedi.com/kirr/neo/go/zodb"
"lab.nexedi.com/kirr/neo/go/internal/log"
......@@ -64,3 +70,82 @@ func before2At(before zodb.Tid) (at zodb.Tid) {
return zodb.TidMax
}
}
// tlsForSSL builds tls.Config from ca/cert/key files that should be interoperable with NEO/py.
//
// see https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-61-gc1c26894/neo/lib/app.py#L74-90
func tlsForSSL(ca, cert, key string) (_ *tls.Config, err error) {
defer xerr.Contextf(&err, "tls setup")
caData, err := ioutil.ReadFile(ca)
if err != nil {
return nil, err
}
CA := x509.NewCertPool()
ok := CA.AppendCertsFromPEM(caData)
if !ok {
return nil, fmt.Errorf("invalid CA")
}
crt, err := tls.LoadX509KeyPair(cert, key)
if err != nil {
return nil, err
}
tlsCfg := &tls.Config{
Certificates: []tls.Certificate{crt}, // (cert, key) as loaded
RootCAs: CA, // (ca,) as loaded
// a server also verifies cient (but also see verifyPeerCert below)
ClientAuth: tls.RequireAndVerifyClientCert,
ClientCAs: CA,
PreferServerCipherSuites: true,
}
// TODO only accept TLS >= 1.2 ?
// tls docs say we should parse Certificate[0] into Leaf ourselves
leaf, err := x509.ParseCertificate(crt.Certificate[0])
if err != nil {
return nil, err
}
crt.Leaf = leaf
// NEO/py does not verify CommonName (ssl.check_hostname=False implicitly).
// Match that behaviour with custom VerifyPeerCertificate because Go
// does not provide functionality to skip only CN verification out of the box.
// https://github.com/golang/go/issues/21971#issuecomment-332693931
// https://stackoverflow.com/questions/44295820
verifyPeerCert := func(rawCerts [][]byte, _ [][]*x509.Certificate) (err error) {
defer xerr.Contextf(&err, "verify peer cert")
certv := []*x509.Certificate{}
for _, certData := range rawCerts {
cert, err := x509.ParseCertificate(certData)
if err != nil {
return err
}
certv = append(certv, cert)
}
vopt := x509.VerifyOptions{
DNSName: "", // means "don't verify name"
Roots: tlsCfg.RootCAs,
Intermediates: x509.NewCertPool(),
}
for _, cert := range certv[1:] {
vopt.Intermediates.AddCert(cert)
}
_, err = certv[0].Verify(vopt)
return err
}
tlsCfg.InsecureSkipVerify = true // disables all verifications including for ServerName
tlsCfg.VerifyPeerCertificate = verifyPeerCert
return tlsCfg, nil
}
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment