1. 08 Feb, 2013 1 commit
    • Ian Campbell's avatar
      xen/netback: shutdown the ring if it contains garbage. · 48856286
      Ian Campbell authored
      A buggy or malicious frontend should not be able to confuse netback.
      If we spot anything which is not as it should be then shut down the
      device and don't try to continue with the ring in a potentially
      hostile state. Well behaved and non-hostile frontends will not be
      penalised.
      
      As well as making the existing checks for such errors fatal also add a
      new check that ensures that there isn't an insane number of requests
      on the ring (i.e. more than would fit in the ring). If the ring
      contains garbage then previously it was possible to loop over this
      insane number, getting an error each time and therefore not generating
      any more pending requests and therefore not exiting the loop in
      xen_netbk_tx_build_gops for an extended period.
      
      Also turn various netdev_dbg calls which now precipitate a fatal error
      into netdev_err; they are rate limited because the device is shut down
      afterwards.
      
      This fixes at least one known DoS/softlockup of the backend domain.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
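
The new "insane number of requests" check described in the xen/netback commit above can be modelled in a few lines. This is only a sketch of the idea, not the netback code: the function names, the 256-slot ring size and the assumption of free-running 32-bit producer/consumer indices are illustrative.

```python
# Illustrative model of a ring sanity check; not the actual netback code.
RING_SIZE = 256          # assumed number of slots in the ring
U32_MASK = 0xFFFFFFFF    # indices are modelled as free-running 32-bit counters


def pending_requests(req_prod: int, req_cons: int) -> int:
    """Number of outstanding requests, computed with 32-bit wrap-around."""
    return (req_prod - req_cons) & U32_MASK


def ring_is_sane(req_prod: int, req_cons: int, ring_size: int = RING_SIZE) -> bool:
    """A ring can never legitimately hold more requests than it has slots."""
    return pending_requests(req_prod, req_cons) <= ring_size


if __name__ == "__main__":
    print(ring_is_sane(10, 5))    # a handful of requests: fine
    print(ring_is_sane(0, 5))     # producer "behind" consumer: garbage
```

With a check like this, a garbage producer index is caught once, before the processing loop starts, instead of producing one error per bogus request.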
  2. 04 Feb, 2013 3 commits
    • Bjørn Mork's avatar
      net: usbnet: fix tx_dropped statistics · bf414b36
      Bjørn Mork authored
      It is normal for minidrivers accumulating frames to return NULL
      from their tx_fixup function. We do not want to count this as a
      drop, or log any debug messages.  A different exit path is
      therefore chosen for such drivers, skipping the debug message
      and the tx_dropped increment.
      
      The test for accumulating drivers was however completely bogus,
      making the exit path selection depend on whether the user had
      enabled tx_err logging or not. This would arbitrarily mess up
      accounting for both accumulating and non-accumulating minidrivers,
      and would result in unwanted debug messages for the accumulating
      drivers.
      
      Fix by testing for FLAG_MULTI_PACKET instead, which probably was
      the intention from the beginning.  This usage matches the documented
      behaviour of this flag:
      
       Indicates to usbnet, that USB driver accumulates multiple IP packets.
       Affects statistic (counters) and short packet handling.
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
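
The exit-path selection described in the usbnet commit above can be sketched as follows. This is a simplified model, not the usbnet driver code: the `Stats` class, the `account_tx_fixup` helper and the flag's bit value are assumptions made for illustration.

```python
from dataclasses import dataclass

# Illustrative bit value; only the flag's meaning matters for this sketch.
FLAG_MULTI_PACKET = 0x2000


@dataclass
class Stats:
    tx_dropped: int = 0


def account_tx_fixup(fixup_result, driver_flags: int, stats: Stats) -> bool:
    """Return True if a frame should be handed on for transmission.

    A None result from an accumulating (multi-packet) minidriver is normal
    and is not counted; a None result from any other minidriver is a drop.
    """
    if fixup_result is not None:
        return True
    if driver_flags & FLAG_MULTI_PACKET:
        return False              # frame was accumulated, not dropped
    stats.tx_dropped += 1         # genuine drop
    return False
```

The decision depends only on the driver's flags, never on whether the user has enabled error logging.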
    • Vijay Subramanian's avatar
      tcp: ipv6: Update MIB counters for drops · 5f1e942c
      Vijay Subramanian authored
      This patch updates LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS in
      tcp_v6_conn_request() and tcp_v6_err(). tcp_v6_conn_request() in particular can
      drop SYNs for various reasons which are not currently tracked.
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vijay Subramanian's avatar
      tcp: Update MIB counters for drops · 848bf15f
      Vijay Subramanian authored
      This patch updates LINUX_MIB_LISTENDROPS in tcp_v4_conn_request() and
      tcp_v4_err(). tcp_v4_conn_request() in particular can drop SYNs for various
      reasons which are not currently tracked.
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 03 Feb, 2013 6 commits
  4. 01 Feb, 2013 3 commits
  5. 31 Jan, 2013 6 commits
  6. 30 Jan, 2013 2 commits
  7. 29 Jan, 2013 14 commits
    • Neil Horman's avatar
      vmxnet3: set carrier state properly on probe · 6cdd20c3
      Neil Horman authored
      vmxnet3 fails to set netif_carrier_off on probe, meaning that when an interface
      is opened the __LINK_STATE_NOCARRIER bit is already cleared, and so
      /sys/class/net/<ifname>/operstate remains in the unknown state.  Correct this by
      setting netif_carrier_off on probe, like other drivers do.
      
      Also, while we're at it, let's remove the netif_carrier_ok checks from the
      link_state_update function, as that check is atomically contained within the
      netif_carrier_[on|off] functions anyway.
      
      Tested successfully by myself
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Bruce Allan's avatar
      e1000e: enable ECC on I217/I218 to catch packet buffer memory errors · 28600304
      Bruce Allan authored
      In rare instances, memory errors have been detected in the internal packet
      buffer memory on I217/I218 when stressed under certain environmental
      conditions.  Enable Error Correcting Code (ECC) in hardware to catch both
      correctable and uncorrectable errors.  Correctable errors will be handled
      by the hardware.  Uncorrectable errors in the packet buffer will cause the
      packet to be received with an error indication in the buffer descriptor
      causing the packet to be discarded.  If the uncorrectable error is in the
      descriptor itself, the hardware will stop and interrupt the driver
      indicating the error.  The driver will then reset the hardware in order to
      clear the error and restart.
      
      Both types of errors will be accounted for in statistics counters.
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Cc: <stable@vger.kernel.org> # 3.5.x & 3.6.x
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Milos Vyletel's avatar
      bonding: unset primary slave via sysfs · eb492f74
      Milos Vyletel authored
      When the bonding module is loaded with the primary parameter and one decides
      to unset the primary slave using sysfs, the setting is not preserved across a
      bond device restart. The primary slave is only unset once and the change is
      not remembered in the bond->params structure. Below is an example reproduction.
      
       grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond0
      BONDING_OPTS="mode=active-backup miimon=100 primary=eth01"
       grep "Primary Slave" /proc/net/bonding/bond0
      Primary Slave: eth01 (primary_reselect always)
      
       echo "" > /sys/class/net/bond0/bonding/primary
       grep "Primary Slave" /proc/net/bonding/bond0
      Primary Slave: None
      
       sed -i -e 's/primary=eth01//' /etc/sysconfig/network-scripts/ifcfg-bond0
       grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond0
      BONDING_OPTS="mode=active-backup miimon=100 "
       ifdown bond0 && ifup bond0
      
      without patch:
       grep "Primary Slave" /proc/net/bonding/bond0
      Primary Slave: eth01 (primary_reselect always)
      
      with patch:
       grep "Primary Slave" /proc/net/bonding/bond0
      Primary Slave: None
      Reviewed-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Nivedita Singhvi's avatar
      tcp: Increment LISTENOVERFLOW and LISTENDROPS in tcp_v4_conn_request() · 2aeef18d
      Nivedita Singhvi authored
      We drop a connection request if the accept backlog is full and there are
      sufficient packets in the syn queue to warrant starting drops. Increment the
      appropriate counters so this isn't silent, for accurate stats and help in
      debugging.
      
      This patch assumes LINUX_MIB_LISTENDROPS is a superset of/includes the
      counter LINUX_MIB_LISTENOVERFLOWS.
      Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
      Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
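
The counter relationship assumed by the tcp commit above (every listen overflow is also a listen drop) can be sketched as below. This is a toy model rather than the kernel code: the `conn_request` helper, its parameters and the `young_threshold` default are hypothetical; only the two counter names come from the commit message.

```python
from collections import Counter


def conn_request(mib: Counter, acceptq_full: bool, young_syns: int,
                 young_threshold: int = 1) -> bool:
    """Return True if the SYN is accepted, False if it is dropped.

    When the accept backlog is full and enough young SYNs are queued,
    the request is dropped and both counters are incremented, so
    LISTENDROPS always includes LISTENOVERFLOWS.
    """
    if acceptq_full and young_syns > young_threshold:
        mib["LINUX_MIB_LISTENOVERFLOWS"] += 1
        mib["LINUX_MIB_LISTENDROPS"] += 1
        return False
    return True
```

Under this model LISTENDROPS can never be smaller than LISTENOVERFLOWS, which is the superset property the patch relies on.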
    • YOSHIFUJI Hideaki / 吉藤英明's avatar
      ipv6 addrconf: Fix interface identifiers of 802.15.4 devices. · 5e98a36e
      YOSHIFUJI Hideaki / 吉藤英明 authored
      The "Universal/Local" (U/L) bit must be complemented according to RFC4944
      and RFC2464.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
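
Complementing the U/L bit when forming an interface identifier from a 64-bit link-layer address amounts to flipping bit 0x02 of the first octet. The helper below is a minimal sketch of that one step; the function name is hypothetical and it is not the addrconf code.

```python
def eui64_to_interface_id(eui64: bytes) -> bytes:
    """Form an IPv6 interface identifier from an EUI-64 address by
    complementing the Universal/Local bit (0x02 in the first octet)."""
    if len(eui64) != 8:
        raise ValueError("an EUI-64 address is exactly 8 octets")
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]


if __name__ == "__main__":
    print(eui64_to_interface_id(bytes.fromhex("0011223344556677")).hex())
```

Applying the function twice returns the original address, since the operation is a single-bit XOR.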
    • Sarveshwar Bandi's avatar
    • Jason Wang's avatar
      tuntap: allow polling/writing/reading when detached · 9e85722d
      Jason Wang authored
      We forbid polling, writing and reading when the file is detached, which can
      complicate things for the user in several cases:
      
      - when a guest passes some buffers to vhost/qemu and then disables some queues,
        host/qemu needs to do its own cleanup on those buffers, which is sometimes
        complex. We can simplify this by allowing a user to still write to a
        disabled queue. A write to a disabled queue passes the packet to the
        kernel, and a read gets nothing.
      - align the polling behavior with macvtap, which never fails when the queue is
        created. This can simplify the polling error handling of its users (e.g. vhost).
      
      We can achieve this simply by not assigning NULL to tfile->tun when detached.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jason Wang's avatar
      vhost_net: handle polling errors when setting backend · 2b8b328b
      Jason Wang authored
      Currently, polling errors are ignored, which can lead to the following issues:
      
      - vhost removes itself unconditionally from the waitqueue when stopping the
        poll; this may crash the kernel since the previous attempt to start polling
        may have failed to add it to the waitqueue
      - userspace may think the backend was successfully set even when the polling
        failed.
      
      Solve this by:
      
      - checking poll->wqh before trying to remove from the waitqueue
      - reporting polling errors in vhost_poll_start() and tx_poll_start(); the
        return value is checked and returned when userspace wants to set the backend
      
      After this fix, there could still be a polling failure after the backend is set;
      it will be addressed by the next patch.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jason Wang's avatar
      vhost_net: correct error handling in vhost_net_set_backend() · 692a998b
      Jason Wang authored
      Currently, when vhost_init_used() fails the sock refcnt and ubufs are
      leaked. Correct this by calling vhost_init_used() before assigning ubufs and
      restoring the oldsock when it fails.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Michael S. Tsirkin's avatar
      tun: fix carrier on/off status · af668b3c
      Michael S. Tsirkin authored
      Commit c8d68e6b removed carrier off call
      from tun_detach since it's now called on queue disable and not only on
      tun close.  This confuses userspace which used this flag to detect a
      free tun. To fix, put this back but under if (clean).
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Tested-by: Toralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Cong Wang's avatar
      pktgen: correctly handle failures when adding a device · 604dfd6e
      Cong Wang authored
      The return value of pktgen_add_device() is not checked, so
      even if we fail to add some device, for example a nonexistent one,
      we still see "OK:...". This patch fixes it.
      
      After this patch, I got:
      
      	# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
      	-bash: echo: write error: No such device
      	# cat /proc/net/pktgen/kpktgend_0
      	Running:
      	Stopped:
      	Result: ERROR: can not add device non-exist
      	# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
      	# cat /proc/net/pktgen/kpktgend_0
      	Running:
      	Stopped: eth0
      	Result: OK: add_device=eth0
      
      (Candidate for -stable)
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Johannes Naab's avatar
      netem: fix delay calculation in rate extension · a13d3104
      Johannes Naab authored
      The delay calculation with the rate extension introduced in v3.3 does
      not work properly if other packets are still queued for transmission.
      For the delay calculation to work, both delay types (latency and delay
      introduced by rate limitation) have to be handled differently. The
      latency delay for a packet can overlap with the delay of other packets.
      The delay introduced by the rate however is separate, and can only
      start once all other rate-introduced delays have finished.
      
      Latency delay is from the same distribution for each packet; rate delay
      depends on the packet size.
      
      .: latency delay
      -: rate delay
      x: additional delay we have to wait since another packet is currently
         transmitted
      
        .....----                    Packet 1
          .....xx------              Packet 2
                     .....------     Packet 3
          ^^^^^
          latency stacks
               ^^
               rate delay doesn't stack
                     ^^
                     latency stacks
      
        -----> time
      
      When a packet is enqueued, we first consider the latency delay. If other
      packets are already queued, we can reduce the latency delay until the
      last packet in the queue is sent; however, the latency delay cannot be
      <0, since this would mean that the rate is overcommitted.  The new
      reference point is the time at which the last packet will be sent. To
      find the time when the packet should be sent, the rate-introduced delay
      has to be added on top of that.
      Signed-off-by: Johannes Naab <jn@stusta.de>
      Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
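
The enqueue rule described in the netem commit above can be checked against its own diagram with a small model. This is a sketch under assumptions, not the qdisc code: the `RateQueue` class is hypothetical, time is in abstract ticks, and the arrival times (0, 2, 13), latency (5) and rate delays (4, 6, 6) are read off the ASCII diagram.

```python
class RateQueue:
    """Toy model of the send-time calculation with latency and rate delay."""

    def __init__(self):
        self.pending = []  # send times of packets still queued, ascending

    def enqueue(self, now: int, latency: int, rate_delay: int) -> int:
        # Packets whose send time has passed are no longer in the queue.
        self.pending = [t for t in self.pending if t > now]
        delay = latency
        if self.pending:
            last = self.pending[-1]
            # Latency overlaps with the wait for the last queued packet,
            # but it cannot become negative.
            delay = max(0, delay - (last - now))
            # The new reference point is when the last packet is sent.
            now = last
        # Rate delay never overlaps: it is added on top.
        delay += rate_delay
        send_time = now + delay
        self.pending.append(send_time)
        return send_time


if __name__ == "__main__":
    q = RateQueue()
    print(q.enqueue(0, 5, 4))    # Packet 1
    print(q.enqueue(2, 5, 6))    # Packet 2
    print(q.enqueue(13, 5, 6))   # Packet 3
```

With those inputs the three packets are sent at ticks 9, 15 and 24, matching the ends of the three bars in the diagram.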
    • Tom Parkin's avatar
      l2tp: prevent l2tp_tunnel_delete racing with userspace close · 80d84ef3
      Tom Parkin authored
      If a tunnel socket is created by userspace, l2tp hooks the socket destructor
      in order to clean up resources if userspace closes the socket or crashes.  It
      also caches a pointer to the struct sock for use in the data path and in the
      netlink interface.
      
      While it is safe to use the cached sock pointer in the data path, where the
      skb references keep the socket alive, it is not safe to use it elsewhere as
      such access introduces a race with userspace closing the socket.  In
      particular, l2tp_tunnel_delete is prone to oopsing if a multithreaded
      userspace application closes a socket at the same time as sending a netlink
      delete command for the tunnel.
      
      This patch fixes this oops by forcing l2tp_tunnel_delete to explicitly look up
      a tunnel socket held by userspace using sockfd_lookup().
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linus Torvalds's avatar
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · fc16e884
      Linus Torvalds authored
      Pull powerpc fixes from Benjamin Herrenschmidt:
       "Whenever you have a chance between two dives, you might want to
        consider pulling my merge branch to pickup a few fixes for 3.8 that
        have been accumulating for the last couple of weeks (I was myself
        travelling then on vacation).
      
        Nothing major, just a handful of powerpc bug fixes that I consider
        worth getting in before 3.8 goes final."
      
      And I'll have everybody know that I'm not diving for several days yet.
      Snif.
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Max next_tb to prevent from replaying timer interrupt
        powerpc: kernel/kgdb.c: Fix memory leakage
        powerpc/book3e: Disable interrupt after preempt_schedule_irq
        powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function
        powerpc/pasemi: Fix crash on reboot
        powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning for ppc32
  8. 28 Jan, 2013 5 commits
    • Tiejun Chen's avatar
      powerpc: Max next_tb to prevent from replaying timer interrupt · 689dfa89
      Tiejun Chen authored
      With lazy interrupts, we always call __check_irq_replay with
      decrementers_next_tb to check whether we need to replay the timer interrupt.
      So in the hotplug case we also need to set decrementers_next_tb to MAX
      to make sure __check_irq_replay doesn't replay the timer interrupt
      on return, as we expect; otherwise we'll trap here infinitely.
      Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Cong Ding's avatar
      powerpc: kernel/kgdb.c: Fix memory leakage · fefd9e6f
      Cong Ding authored
      The variable backup_current_thread_info isn't freed before exiting the
      function.
      Signed-off-by: Cong Ding <dinggnu@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Tiejun Chen's avatar
      powerpc/book3e: Disable interrupt after preempt_schedule_irq · 572177d7
      Tiejun Chen authored
      In the preempt case, the current arch_local_irq_restore() from
      preempt_schedule_irq() may enable hard interrupts, but we really
      should disable interrupts when we return from the interrupt,
      so that we don't get interrupted after loading SRR0/1.
      Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Carl E. Love's avatar
      powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function · 46ed7a76
      Carl E. Love authored
      The calculation for the left shift of the mask OPROFILE_PM_PMCSEL_MSK has an
      error.  The calculation should be to shift left by (max_cntrs - cntr) times
      the width of the pmsel field.  However, the #define OPROFILE_MAX_PMC_NUM
      was used instead of OPROFILE_PMSEL_FIELD_WIDTH.  This patch fixes the
      calculation.
      Signed-off-by: Carl Love <cel@us.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
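
The effect of multiplying by the wrong constant in the oprofile commit above is easy to see numerically. The sketch below is illustrative only: the three constant values are assumptions chosen for the example (they are not stated in the commit message), and the two helper functions are hypothetical.

```python
# Assumed values, for illustration only.
OPROFILE_PM_PMCSEL_MSK = 0xFF       # mask covering one pmsel field
OPROFILE_MAX_PMC_NUM = 3            # highest counter index in the register
OPROFILE_PMSEL_FIELD_WIDTH = 8      # width of one pmsel field in bits


def pmcsel_mask_buggy(cntr: int) -> int:
    """Shift count wrongly scaled by the number of counters."""
    return OPROFILE_PM_PMCSEL_MSK << ((OPROFILE_MAX_PMC_NUM - cntr)
                                      * OPROFILE_MAX_PMC_NUM)


def pmcsel_mask_fixed(cntr: int) -> int:
    """Shift count scaled by the field width, as the commit describes."""
    return OPROFILE_PM_PMCSEL_MSK << ((OPROFILE_MAX_PMC_NUM - cntr)
                                      * OPROFILE_PMSEL_FIELD_WIDTH)


if __name__ == "__main__":
    for cntr in range(OPROFILE_MAX_PMC_NUM + 1):
        print(cntr, hex(pmcsel_mask_buggy(cntr)), hex(pmcsel_mask_fixed(cntr)))
```

Under these assumed values the two versions agree only for the last counter, where the shift is zero; for every other counter the buggy mask straddles field boundaries.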
    • Steven Rostedt's avatar
      powerpc/pasemi: Fix crash on reboot · 72640d88
      Steven Rostedt authored
      commit f96972f2 "kernel/sys.c: call disable_nonboot_cpus() in
      kernel_restart()"
      
      added a call to disable_nonboot_cpus() on kernel_restart(), which tries
      to shut down all the CPUs except the first one. The issue with the PA
      Semi is that it does not support CPU hotplug.
      
      When the call is made to __cpu_down(), it calls the notifiers
      CPU_DOWN_PREPARE, and then tries to take the CPU down.
      
      One of the notifiers to the CPU hotplug code is cpufreq. The
      DOWN_PREPARE will call __cpufreq_remove_dev() which calls
      cpufreq_driver->exit. The PA Semi exit handler unmaps regions of I/O
      that are used by an interrupt that goes off constantly
      (system_reset_common, but it goes off during normal system operations
      too). I'm not sure exactly what this interrupt does.
      
      Running a simple function trace, you can see it goes off quite a bit:
      
      # tracer: function
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
                <idle>-0     [001]  1558.859363: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [000]  1558.860112: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [000]  1558.861109: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [001]  1558.861361: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [000]  1558.861437: .pasemi_system_reset_exception <-.system_reset_exception
      
      When the region is unmapped, the system crashes with:
      
      Disabling non-boot CPUs ...
      Error taking CPU1 down: -38
      Unable to handle kernel paging request for data at address 0xd0000800903a0100
      Faulting instruction address: 0xc000000000055fcc
      Oops: Kernel access of bad area, sig: 11 [#1]
      PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
      Modules linked in: shpchp
      NIP: c000000000055fcc LR: c000000000055fb4 CTR: c0000000000df1fc
      REGS: c0000000012175d0 TRAP: 0300   Not tainted  (3.8.0-rc4-test-dirty)
      MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24000088  XER: 00000000
      SOFTE: 0
      DAR: d0000800903a0100, DSISR: 42000000
      TASK = c0000000010e9008[0] 'swapper/0' THREAD: c000000001214000 CPU: 0
      GPR00: d0000800903a0000 c000000001217850 c0000000012167e0 0000000000000000
      GPR04: 0000000000000000 0000000000000724 0000000000000724 0000000000000000
      GPR08: 0000000000000000 0000000000000000 0000000000000001 0000000000a70000
      GPR12: 0000000024000080 c00000000fff0000 ffffffffffffffff 000000003ffffae0
      GPR16: ffffffffffffffff 0000000000a21198 0000000000000060 0000000000000000
      GPR20: 00000000008fdd35 0000000000a21258 000000003ffffaf0 0000000000000417
      GPR24: 0000000000a226d0 c000000000000000 0000000000000000 0000000000000000
      GPR28: c00000000138b358 0000000000000000 c000000001144818 d0000800903a0100
      NIP [c000000000055fcc] .set_astate+0x5c/0xa4
      LR [c000000000055fb4] .set_astate+0x44/0xa4
      Call Trace:
      [c000000001217850] [c000000000055fb4] .set_astate+0x44/0xa4 (unreliable)
      [c0000000012178f0] [c00000000005647c] .restore_astate+0x2c/0x34
      [c000000001217980] [c000000000054668] .pasemi_system_reset_exception+0x6c/0x88
      [c000000001217a00] [c000000000019ef0] .system_reset_exception+0x48/0x84
      [c000000001217a80] [c000000000001e40] system_reset_common+0x140/0x180
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>