1. 22 Nov, 2012 6 commits
    • tipc: eliminate an unnecessary cast of node variable · 4cb7d55a
      Ying Xue authored
      As the variable 'node' is already defined as a u32, it is
      unnecessary to cast it to u32 again when using it.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: introduce message to synchronize broadcast link · c64f7a6a
      Jon Maloy authored
      Upon establishing a first link between two nodes, there is
      currently a risk that the two endpoints will disagree on exactly
      which sequence number reception and acknowledging of broadcast
      packets should start from.
      
      The following scenarios may happen:
      
      1: Node A sends an ACTIVATE message to B, telling it to start acking
         packets from sequence number N.
      2: Node A sends out broadcast N, but does not expect an acknowledgement
         from B, since B is not yet in A's broadcast receivers list.
      3: Node A receives ACK for N from all nodes except B, and releases
         packet N.
      4: Node B receives the ACTIVATE, activates its link endpoint, and
         stores the value N as sequence number of first expected packet.
      5: Node B sends a NAME_DISTR message to A.
      6: Node A receives the NAME_DISTR message, and activates its endpoint.
         At this moment B is added to A's broadcast receiver's set.
         Node A also sets sequence number 0 as the first broadcast packet
         to be received from B.
      7: Node A sends broadcast N+1.
      8: B receives N+1, determines there is a gap in the sequence, since
         it is expecting N, and sends a NACK for N back to A.
      9: Node A has already released N, so no retransmission is possible.
         The broadcast link in direction A->B is stale.
      
      In addition to, or instead of, 7-9 above, the following may happen:
      
      10: Node B sends broadcast M > 0 to A.
      11: Node A receives M, falsely decides there must be a gap, since
          it is expecting packet 0, and asks for retransmission of packets
          [0,M-1].
      12: Node B has already released these packets, so the broadcast
          link is stale in direction B->A.
      
      We solve this problem by introducing a new unicast message type,
      BCAST_PROTOCOL/STATE, to convey the sequence number of the next
      sent broadcast packet to the other endpoint, at exactly the moment
      that endpoint is added to the own node's broadcast receivers list,
      and before any other unicast messages are permitted to be sent.
      
      Furthermore, we don't allow any node to start receiving and
      processing broadcast packets until this new synchronization
      message has been received.
      
      To maintain backwards compatibility, we still open up for
      broadcast reception if we receive a NAME_DISTR message without
      any preceding broadcast sync message. In this case, we must
      assume that the other end has an older code version, and will
      never send out the new synchronization message. Hence, for mixed
      old and new nodes, the issue arising in 7-12 of the above may
      happen with the same probability as before.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
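
The receive-side gating described in this commit message can be sketched roughly as below. This is a userspace model under stated assumptions, not the kernel code: the struct `bc_peer`, the functions `bc_sync_rcv()`, `bc_name_distr_rcv()` and `bc_rcv()`, and the verdict enum are all hypothetical names, and sequence number wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer broadcast receive state (not the kernel's struct). */
struct bc_peer {
    bool recv_permitted;  /* may we accept broadcasts from this peer yet? */
    uint32_t next_in;     /* sequence number of the next expected broadcast */
};

enum bc_verdict { BC_DROP, BC_DELIVER, BC_GAP };

/* The peer sent its sync message: it names the next broadcast it will send. */
static void bc_sync_rcv(struct bc_peer *p, uint32_t next_sent)
{
    if (p->recv_permitted)
        return;               /* already synchronized, ignore repeats */
    p->next_in = next_sent;
    p->recv_permitted = true;
}

/*
 * Backwards compatibility: a NAME_DISTR message arrived without a preceding
 * sync message, so assume an older peer and open up with whatever sequence
 * number the legacy handshake left us with.
 */
static void bc_name_distr_rcv(struct bc_peer *p, uint32_t legacy_seqno)
{
    if (!p->recv_permitted) {
        p->next_in = legacy_seqno;
        p->recv_permitted = true;
    }
}

/* Classify an incoming broadcast packet (wraparound deliberately ignored). */
static enum bc_verdict bc_rcv(struct bc_peer *p, uint32_t seqno)
{
    if (!p->recv_permitted)
        return BC_DROP;       /* not synchronized yet: do not process */
    if (seqno == p->next_in) {
        p->next_in++;
        return BC_DELIVER;
    }
    if (seqno < p->next_in)
        return BC_DROP;       /* duplicate of something already seen */
    return BC_GAP;            /* a real gap: a NACK would be sent here */
}
```

In this model a broadcast that arrives before the sync message is simply dropped instead of being misread as a gap, which is the failure described in steps 8 and 11.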
    • tipc: rename supported flag to recv_permitted · 389dd9bc
      Ying Xue authored
      Rename the "supported" flag in bclink structure to "recv_permitted"
      to better reflect what it is used for. When this flag is set for a
      given node, we are permitted to receive and acknowledge broadcast
      messages from that node.  Convert it to a bool at the same time,
      since it is not used to store any numerical values.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: remove supportable flag from bclink structure · 818f4da5
      Ying Xue authored
      The "supportable" flag in bclink structure is a compatibility flag
      indicating whether a peer node is capable of receiving TIPC broadcast
      messages. However, all TIPC versions since tipc-1.5, and after the
      inclusion in the upstream Linux kernel in 2006, support this capability.
      It is highly unlikely that anybody is still using such an old
      version of TIPC, let alone that they want to mix it with TIPC-2.0
      nodes. Therefore, we now remove the "supportable" flag.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: remove the bearer congestion mechanism · 3c294cb3
      Ying Xue authored
      Currently at the TIPC bearer layer there is the following congestion
      mechanism:
      
      Once sending packets has failed via that bearer, the bearer will be
      flagged as being in congested state at once. During bearer congestion,
      all packets arriving at link will be queued on the link's outgoing
      buffer.  When we detect that the state of bearer congestion has
      relaxed (e.g. some packets are received from the bearer) we will try
      our best to push all packets in the link's outgoing buffer until the
      buffer is empty, or until the bearer is congested again.
      
      However, in fact the TIPC bearer never receives any feedback from the
      device layer whether a send was successful or not, so it must always
      assume it was successful. Therefore, the bearer congestion mechanism
      as it exists currently is of no value.
      
      But the bearer blocking state is still useful for us. For example,
      when the physical media goes down/up, we need to change the state of
      the links bound to the bearer.  So the code maintaining the state
      information is not removed.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: wake up all waiting threads at socket shutdown · 75031151
      Ying Xue authored
      When a socket is shut down, we should wake up all threads sleeping on
      it, instead of just one of them. Otherwise, when several threads are
      polling the same socket, and one of them does shutdown(), the
      remaining threads may end up sleeping forever.

      Also, to align socket usage with common practice in other stacks, we
      use one of the common socket callback handlers, sk_state_change(),
      to wake up pending users. This is similar to the usage in e.g.
      inet_shutdown() [net/ipv4/af_inet.c].
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  2. 21 Nov, 2012 2 commits
    • tipc: return POLLOUT for sockets in an unconnected state · c4fc298a
      Erik Hugne authored
      If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
      socket during link congestion, the connect message will be discarded
      and sendmsg will return EAGAIN. This is normal behavior, and the
      application is expected to poll the socket until POLLOUT is set,
      after which the connection attempt can be retried.
      However, the POLLOUT flag is never set for unconnected sockets and
      poll() always returns a zero mask. The application is then left without
      a trigger for when it can make another attempt at sending the message.
      
      The solution is to check if we're polling on an unconnected socket
      and set the POLLOUT flag if the TIPC port owned by this socket
      is not congested. The TIPC ports waiting on a specific link will be
      marked as 'not congested' when the link congestion has abated.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
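
As an illustration of the fix described in this commit message, the poll mask computation might look something like the following userspace sketch. The state enum, the `sk_model` struct and `sk_poll_mask()` are hypothetical, and only the unconnected branch comes from the commit text; the other branches are assumptions added to make the example complete.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical socket states for this model only. */
enum sk_state { SK_UNCONNECTED, SK_CONNECTED, SK_DISCONNECTING };

/* Hypothetical snapshot of the state a poll handler would inspect. */
struct sk_model {
    enum sk_state state;
    bool port_congested;  /* is the port owned by this socket congested? */
    bool rx_queue_empty;  /* nothing queued for reading? */
};

/* Compute the event mask that poll() would report for this socket. */
static short sk_poll_mask(const struct sk_model *sk)
{
    short mask = 0;

    switch (sk->state) {
    case SK_UNCONNECTED:
        /* The fix: report writability so a failed implied connect
         * can be retried once congestion has abated. */
        if (!sk->port_congested)
            mask |= POLLOUT;
        break;
    case SK_CONNECTED:            /* assumed behaviour, not from the commit */
        if (!sk->port_congested)
            mask |= POLLOUT;
        if (!sk->rx_queue_empty)
            mask |= POLLIN;
        break;
    case SK_DISCONNECTING:        /* assumed behaviour, not from the commit */
        mask |= POLLIN | POLLHUP;
        break;
    }
    return mask;
}
```

Before the fix, the unconnected case in this model would always return a zero mask, leaving the application with nothing to wait on after EAGAIN.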
    • tipc: fix race/inefficiencies in poll/wait behaviour · f288bef4
      Ying Xue authored
      When an application blocks at poll/select on a TIPC socket
      while requesting a specific event mask, both the filter_rcv() and
      wakeupdispatch() cases will wake it up unconditionally whenever
      the state changes (i.e. an incoming message arrives, or congestion
      has subsided).  No mask is used.
      
      To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
      with tipc_data_ready and tipc_write_space respectively, which makes
      tipc more in alignment with the rest of the networking code.  These
      pass the exact set of possible events to the waker in fs/select.c
      hence avoiding waking up blocked processes unnecessarily.
      
      In doing so, we uncover another issue -- that there needs to be a
      memory barrier in these poll/receive callbacks, otherwise we are
      subject to the same race as documented above wq_has_sleeper()
      [in commit a57de0b4 "net: adding memory barrier to the poll and
      receive callbacks"].  So we need to replace poll_wait() with
      sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
      in these two new functions.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
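
The idea of passing an event key to the waker, so that only interested sleepers are woken, can be modelled in a few lines of userspace C. Everything here is an assumption made for illustration: the `waiter` struct and the `wake_up_keyed()`, `model_data_ready()` and `model_write_space()` functions are hypothetical, and the memory barrier and RCU aspects mentioned in the commit are not modelled at all.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical sleeper: the events it asked for, and whether it was woken. */
struct waiter {
    short events;
    int woken;
};

/* Wake only the waiters whose requested events overlap the key. */
static int wake_up_keyed(struct waiter *w, size_t n, short key)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (w[i].events & key) {
            w[i].woken = 1;
            count++;
        }
    }
    return count;
}

/* Model of a data-ready callback: a message arrived, readers care. */
static int model_data_ready(struct waiter *w, size_t n)
{
    return wake_up_keyed(w, n, POLLIN);
}

/* Model of a write-space callback: congestion subsided, writers care. */
static int model_write_space(struct waiter *w, size_t n)
{
    return wake_up_keyed(w, n, POLLOUT);
}
```

With an unconditional wakeup, both kinds of waiter would be woken on every state change; here a waiter that only asked for POLLOUT stays asleep when data arrives.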
  3. 20 Nov, 2012 11 commits
  4. 19 Nov, 2012 21 commits
    • Shan Wei's avatar
      ae4b46e9
    • net: core: use this_cpu_ptr per-cpu helper · 1f743b07
      Shan Wei authored
      flush_tasklet is a struct, not a pointer, in the percpu variable,
      so use this_cpu_ptr to get a pointer to the member.
      Signed-off-by: Shan Wei <davidshan@tencent.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: Remove duplicate inclusion of linux/vhost.h · 91aa7651
      Sachin Kamat authored
      linux/vhost.h was included twice.
      Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/macb: move to circ_buf macros and fix initial condition · 909a8583
      Nicolas Ferre authored
      Move to the circular buffer management macros and correct an error
      with the circular buffer initial condition.

      Without this patch, the macb_tx_ring_avail() function was
      not reporting the proper ring availability at startup:
      macb macb: eth0: BUG! Tx Ring full when queue awake!
      macb macb: eth0: tx_head = 0, tx_tail = 0
      And hanging forever...

      I remove the macb_tx_ring_avail() function and use the
      proven macros from circ_buf.h. CIRC_CNT() is used in the
      "consumer" part of the driver: macb_tx_interrupt() to match
      advice from Documentation/circular-buffers.txt.
      Reported-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Tested-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
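
To see why the circ_buf.h macros give the right answer at startup, here is a small userspace sketch. The two macros are copied from include/linux/circ_buf.h; the `tx_ring` struct, the helper functions and the ring size of 128 are assumptions made for this example and are not taken from the driver.

```c
/* As defined in include/linux/circ_buf.h; size must be a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

#define TX_RING_SIZE 128  /* assumed value for illustration */

/* Hypothetical minimal ring state: producer index and consumer index. */
struct tx_ring {
    unsigned int head;  /* next slot the producer will fill */
    unsigned int tail;  /* next slot the consumer will reclaim */
};

/* Free slots available to the producer (the transmit path). */
static unsigned int tx_ring_space(const struct tx_ring *r)
{
    return CIRC_SPACE(r->head, r->tail, TX_RING_SIZE);
}

/* Filled slots waiting for the consumer (the tx interrupt handler). */
static unsigned int tx_ring_pending(const struct tx_ring *r)
{
    return CIRC_CNT(r->head, r->tail, TX_RING_SIZE);
}
```

With head and tail both 0, as in the log lines quoted above, `tx_ring_space()` yields 127 and `tx_ring_pending()` yields 0, so an empty ring is not mistaken for a full one. One slot is always kept free, which is how these macros tell "empty" from "full".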
    • netfilter: Remove the spurious \ in __ip_vs_lblc_init · e5ef39ed
      Eric W. Biederman authored
      In (464dc801 net: Don't export sysctls to unprivileged users)
      I typoed and introduced a spurious backslash.  Delete it.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qeth: Remove BUG_ONs · 18af5c17
      Stefan Raspl authored
      Remove BUG_ONs or convert to WARN_ON_ONCE/WARN_ONs since a failure within a
      networking device driver is no reason to shut down the entire machine.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qeth: Consolidate tracing of card features · 395672e0
      Stefan Raspl authored
      Trace all supported and enabled card features to s390dbf.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qeth: Clarify card type naming for virtual NICs · 7096b187
      Stefan Raspl authored
      So far, virtual NICs, whether attached to a VSWITCH or a guest LAN, were
      always displayed as guest LANs in the device driver attributes and messages,
      while in fact they are virtual NICs.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • claw: remove BUG_ONs · 569743e4
      Ursula Braun authored
      Remove BUG_ON's in claw driver, since the checked error conditions
      are null pointer accesses.
      Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ctcm: remove BUG_ONs · bfd2eb3b
      Ursula Braun authored
      Remove BUG_ON's in ctcm driver, since the checked error conditions
      are null pointer accesses.
      Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qeth: Remove unused variable · 7bf9bcff
      Stefan Raspl authored
      Eliminate a variable that is never modified.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root to control tun and tap devices · c260b772
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) calls to
      ns_capable(net->user_ns,CAP_NET_ADMIN) calls.

      Allow setting of the tun iff flags.
      Allow creating of tun devices.
      Allow adding a new queue to a tun device.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
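
The difference between a global capable() check and a namespace-relative ns_capable() check can be illustrated with a simplified userspace model. This is only a sketch loosely modelled on the kernel's behaviour: the structs, `model_init_ns`, `model_capable()` and `model_ns_capable()` are hypothetical names, uid mappings are ignored, and capabilities are reduced to a bitmask.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAP_NET_ADMIN 12  /* same number as CAP_NET_ADMIN */

/* Hypothetical user namespace: a parent link plus the uid that created it. */
struct user_ns_model {
    const struct user_ns_model *parent;
    unsigned int owner_uid;
};

/* Hypothetical credentials: home namespace, euid, effective capability bits. */
struct cred_model {
    const struct user_ns_model *user_ns;
    unsigned int euid;
    unsigned long caps;
};

/* The initial user namespace: the root of the hierarchy in this model. */
static const struct user_ns_model model_init_ns = { NULL, 0 };

/* Does the credential hold 'cap' with respect to the 'target' namespace? */
static bool model_ns_capable(const struct user_ns_model *target,
                             const struct cred_model *c, int cap)
{
    for (const struct user_ns_model *ns = target; ns != NULL; ns = ns->parent) {
        /* Inside our own namespace: the effective set decides. */
        if (ns == c->user_ns)
            return ((c->caps >> cap) & 1UL) != 0;
        /* We created this namespace from its parent: full rights over it. */
        if (ns->parent == c->user_ns && ns->owner_uid == c->euid)
            return true;
    }
    return false;
}

/* The old global check: capability relative to the initial namespace. */
static bool model_capable(const struct cred_model *c, int cap)
{
    return model_ns_capable(&model_init_ns, c, cap);
}
```

A task that is root only inside a user namespace it created passes `model_ns_capable()` for a network namespace owned by that user namespace, yet still fails `model_capable()`, which is the distinction these commits rely on.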
    • net: Make CAP_NET_BIND_SERVICE per user namespace · 3594698a
      Eric W. Biederman authored
      Allow privileged users in any user namespace to bind to
      privileged sockets in network namespaces they control.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Enable a userns root rtnl calls that are safe for unprivilged users · b51642f6
      Eric W. Biederman authored
      - Only allow moving network devices to network namespaces you have
        CAP_NET_ADMIN privileges over.
      
      - Enable creating/deleting/modifying interfaces
      - Enable adding/deleting addresses
      - Enable adding/setting/deleting neighbour entries
      - Enable adding/removing routes
      - Enable adding/removing fib rules
      - Enable setting the forwarding state
      - Enable adding/removing ipv6 address labels
      - Enable setting bridge parameter
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Enable some sysctls that are safe for the userns root · c027aab4
      Eric W. Biederman authored
      - Enable the per device ipv4 sysctls:
         net/ipv4/conf/<if>/forwarding
         net/ipv4/conf/<if>/mc_forwarding
         net/ipv4/conf/<if>/accept_redirects
         net/ipv4/conf/<if>/secure_redirects
         net/ipv4/conf/<if>/shared_media
         net/ipv4/conf/<if>/rp_filter
         net/ipv4/conf/<if>/send_redirects
         net/ipv4/conf/<if>/accept_source_route
         net/ipv4/conf/<if>/accept_local
         net/ipv4/conf/<if>/src_valid_mark
         net/ipv4/conf/<if>/proxy_arp
         net/ipv4/conf/<if>/medium_id
         net/ipv4/conf/<if>/bootp_relay
         net/ipv4/conf/<if>/log_martians
         net/ipv4/conf/<if>/tag
         net/ipv4/conf/<if>/arp_filter
         net/ipv4/conf/<if>/arp_announce
         net/ipv4/conf/<if>/arp_ignore
         net/ipv4/conf/<if>/arp_accept
         net/ipv4/conf/<if>/arp_notify
         net/ipv4/conf/<if>/proxy_arp_pvlan
         net/ipv4/conf/<if>/disable_xfrm
         net/ipv4/conf/<if>/disable_policy
         net/ipv4/conf/<if>/force_igmp_version
         net/ipv4/conf/<if>/promote_secondaries
         net/ipv4/conf/<if>/route_localnet
      
      - Enable the global ipv4 sysctl:
         net/ipv4/ip_forward
      
      - Enable the per device ipv6 sysctls:
         net/ipv6/conf/<if>/forwarding
         net/ipv6/conf/<if>/hop_limit
         net/ipv6/conf/<if>/mtu
         net/ipv6/conf/<if>/accept_ra
         net/ipv6/conf/<if>/accept_redirects
         net/ipv6/conf/<if>/autoconf
         net/ipv6/conf/<if>/dad_transmits
         net/ipv6/conf/<if>/router_solicitations
         net/ipv6/conf/<if>/router_solicitation_interval
         net/ipv6/conf/<if>/router_solicitation_delay
         net/ipv6/conf/<if>/force_mld_version
         net/ipv6/conf/<if>/use_tempaddr
         net/ipv6/conf/<if>/temp_valid_lft
         net/ipv6/conf/<if>/temp_prefered_lft
         net/ipv6/conf/<if>/regen_max_retry
         net/ipv6/conf/<if>/max_desync_factor
         net/ipv6/conf/<if>/max_addresses
         net/ipv6/conf/<if>/accept_ra_defrtr
         net/ipv6/conf/<if>/accept_ra_pinfo
         net/ipv6/conf/<if>/accept_ra_rtr_pref
         net/ipv6/conf/<if>/router_probe_interval
         net/ipv6/conf/<if>/accept_ra_rt_info_max_plen
         net/ipv6/conf/<if>/proxy_ndp
         net/ipv6/conf/<if>/accept_source_route
         net/ipv6/conf/<if>/optimistic_dad
         net/ipv6/conf/<if>/mc_forwarding
         net/ipv6/conf/<if>/disable_ipv6
         net/ipv6/conf/<if>/accept_dad
         net/ipv6/conf/<if>/force_tllao
      
      - Enable the global ipv6 sysctls:
         net/ipv6/bindv6only
         net/ipv6/icmp/ratelimit
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow the userns root to control vlans. · 276996fd
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.

      Allow the vlan ioctls:
      SET_VLAN_INGRESS_PRIORITY_CMD
      SET_VLAN_EGRESS_PRIORITY_CMD
      SET_VLAN_FLAG_CMD
      SET_VLAN_NAME_TYPE_CMD
      ADD_VLAN_CMD
      DEL_VLAN_CMD
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root to control the network bridge code. · cb990503
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.

      Allow setting bridge parameters via sysfs.

      Allow all of the bridge ioctls:
      BRCTL_ADD_IF
      BRCTL_DEL_IF
      BRCTL_SET_BRIDGE_FORWARD_DELAY
      BRCTL_SET_BRIDGE_HELLO_TIME
      BRCTL_SET_BRIDGE_MAX_AGE
      BRCTL_SET_BRIDGE_AGING_TIME
      BRCTL_SET_BRIDGE_STP_STATE
      BRCTL_SET_BRIDGE_PRIORITY
      BRCTL_SET_PORT_PRIORITY
      BRCTL_SET_PATH_COST
      BRCTL_ADD_BRIDGE
      BRCTL_DEL_BRIDGE
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm · df008c91
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.
      
      Allow creation of af_key sockets.
      Allow creation of llc sockets.
      Allow creation of af_packet sockets.
      
      Allow sending xfrm netlink control messages.
      
      Allow binding to netlink multicast groups.
      Allow sending to netlink multicast groups.
      Allow adding and dropping netlink multicast groups.
      Allow sending to all netlink multicast groups and port ids.
      
      Allow reading the netfilter SO_IP_SET socket option.
      Allow sending netfilter netlink messages.
      Allow setting and getting ip_vs netfilter socket options.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root to control ipv6 · af31f412
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.

      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is a hardware NIC
      that has been explicitly moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed while
      resource control is left unchanged.
      
      Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
      Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
      Allow the SIOCADDRT ioctl to add ipv6 routes.
      Allow the SIOCDELRT ioctl to delete ipv6 routes.
      
      Allow creation of ipv6 raw sockets.
      
      Allow setting the IPV6_JOIN_ANYCAST socket option.
      Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
      socket option.
      
      Allow setting the IPV6_TRANSPARENT socket option.
      Allow setting the IPV6_HOPOPTS socket option.
      Allow setting the IPV6_RTHDRDSTOPTS socket option.
      Allow setting the IPV6_DSTOPTS socket option.
      Allow setting the IPV6_IPSEC_POLICY socket option.
      Allow setting the IPV6_XFRM_POLICY socket option.
      
      Allow sending packets with the IPV6_2292HOPOPTS control message.
      Allow sending packets with the IPV6_2292DSTOPTS control message.
      Allow sending packets with the IPV6_RTHDRDSTOPTS control message.
      
      Allow setting the multicast routing socket options on non multicast
      routing sockets.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
      setting up, changing and deleting tunnels over ipv6.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
      setting up, changing and deleting ipv6 over ipv4 tunnels.
      
      Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
      deleting, and changing the potential router list for ISATAP tunnels.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.

      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is a hardware NIC
      that has been explicitly moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAGS ioctl to allow setting network device flags.
      Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
      arbitrary ip options.
      
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root control of the core of the network stack. · 5e1fccc0
      Eric W. Biederman authored
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.

      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is a hardware NIC
      that has been explicitly moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow ethtool ioctls.
      
      Allow binding to network devices.
      Allow setting the socket mark.
      Allow setting the socket priority.
      
      Allow setting the network device alias via sysfs.
      Allow setting the mtu via sysfs.
      Allow changing the network device flags via sysfs.
      Allow setting the network device group via sysfs.
      
      Allow the following network device ioctls.
      SIOCGMIIPHY
      SIOCGMIIREG
      SIOCSIFNAME
      SIOCSIFFLAGS
      SIOCSIFMETRIC
      SIOCSIFMTU
      SIOCSIFHWADDR
      SIOCSIFSLAVE
      SIOCADDMULTI
      SIOCDELMULTI
      SIOCSIFHWBROADCAST
      SIOCSMIIREG
      SIOCBONDENSLAVE
      SIOCBONDRELEASE
      SIOCBONDSETHWADDR
      SIOCBONDCHANGEACTIVE
      SIOCBRADDIF
      SIOCBRDELIF
      SIOCSHWTSTAMP
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>