1. 06 Mar, 2013 22 commits
    • net/ipv4: Timestamp option cannot overflow with prespecified addresses · fa2b04f4
      David Ward authored
      When a router forwards a packet that contains the IPv4 timestamp option,
      if there is no space left in the option for the router to add its own
      timestamp, then the router increments the Overflow value in the option.
      
      However, if the addresses of the routers are prespecified in the option,
      then the overflow condition cannot happen: the option is structured so
      that each prespecified router has a place to write its timestamp. Other
      routers do not add a timestamp, so there will never be a lack of space.
      
      This fix ensures that the Overflow value in the IPv4 timestamp option is
      not incremented when the addresses of the routers are prespecified, even
      if the Pointer value is greater than the Length value.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
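      The rule described in this commit can be illustrated with a minimal userspace sketch in C. It is not the kernel's code: struct ts_opt and ts_note_full() are hypothetical names invented here, and only the flag values (0, 1, 3) come from RFC 791. For simplicity the sketch saturates the 4-bit counter at 15, whereas RFC 791 treats an overflow of the counter itself as an error.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Flag values of the IPv4 timestamp option, as defined in RFC 791. */
      enum {
          TS_TSONLY    = 0, /* timestamps only */
          TS_TSANDADDR = 1, /* each timestamp preceded by the router's address */
          TS_PRESPEC   = 3  /* addresses prespecified by the sender */
      };

      /* Hypothetical, simplified view of the option header fields. */
      struct ts_opt {
          uint8_t len;      /* total option length in bytes */
          uint8_t ptr;      /* 1-based offset of the next free slot */
          uint8_t overflow; /* 4-bit overflow counter */
          uint8_t flags;    /* 4-bit flag field */
      };

      /*
       * Called by a router that wants to add its timestamp.
       * Returns true if the Overflow counter was incremented.
       */
      static bool ts_note_full(struct ts_opt *o)
      {
          assert(o != NULL);

          if (o->ptr <= o->len)
              return false; /* there is still room in the option */
          if (o->flags == TS_PRESPEC)
              return false; /* every listed router has its own slot */
          if (o->overflow == 15)
              return false; /* sketch only: saturate the 4-bit counter */

          o->overflow++;
          return true;
      }
      ```

      With Pointer greater than Length, a timestamps-only option has its counter incremented, while an option with prespecified addresses is left untouched.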
    • net: reduce net_rx_action() latency to 2 HZ · d1f41b67
      Eric Dumazet authored
      We should use time_after_eq() to get maximum latency of two ticks,
      instead of three.
      
      Bug added in commit 24f8b238 (net: increase receive packet quantum)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
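      The off-by-one can be seen in a small userspace sketch. The two macros below are simplified copies of the wrap-safe comparisons in <linux/jiffies.h> (without the kernel's type checks); ticks_until_exit() is a hypothetical model of a deadline check polled once per tick, not the actual net_rx_action() code.

      ```c
      #include <assert.h>
      #include <limits.h>

      /* Simplified userspace copies of the kernel's wrap-safe comparisons. */
      #define time_after(a, b)    ((long)((b) - (a)) < 0)
      #define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

      /*
       * Hypothetical model: the deadline is start + 2 ticks and the check
       * runs once per tick. Returns how many ticks elapse before it fires.
       */
      static unsigned int ticks_until_exit(unsigned long start, int use_eq)
      {
          unsigned long limit = start + 2; /* may wrap, which is fine */
          unsigned long now = start;
          unsigned int ticks = 0;

          for (;;) {
              int expired = use_eq ? time_after_eq(now, limit)
                                   : time_after(now, limit);
              if (expired)
                  return ticks;
              now++;
              ticks++;
              assert(ticks < 16); /* guard against a runaway loop */
          }
      }
      ```

      In this model time_after() only fires on the third tick, while time_after_eq() fires on the second, and both keep working when the tick counter wraps around.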
    • net: fix new kernel-doc warnings in net core · 691b3b7e
      Randy Dunlap authored
      Fix new kernel-doc warnings in net/core/dev.c:
      
      Warning(net/core/dev.c:4788): No description found for parameter 'new_carrier'
      Warning(net/core/dev.c:4788): Excess function parameter 'new_carries' description in 'dev_change_carrier'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
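      A kernel-doc comment has to name each parameter exactly as the function prototype does; a misspelled tag produces both a "no description" and an "excess parameter" warning, as in the output above. A hypothetical illustration follows (demo_change_carrier() and struct demo_dev are invented for this sketch and are not the code in net/core/dev.c):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical stand-in for a network device. */
      struct demo_dev {
          bool carrier;
      };

      /**
       * demo_change_carrier - change the carrier state of a demo device
       * @dev: device to update
       * @new_carrier: new carrier state
       *
       * Each @name tag above matches a parameter name in the prototype.
       * Writing "@new_carries:" instead would leave 'new_carrier'
       * undescribed and add an excess description for 'new_carries'.
       *
       * Return: 0 on success.
       */
      static int demo_change_carrier(struct demo_dev *dev, bool new_carrier)
      {
          assert(dev != NULL);
          dev->carrier = new_carrier;
          return 0;
      }
      ```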
    • reset nf before xmit vxlan encapsulated packet · 88c4c066
      Zang MingJie authored
      We should reset the nf settings bound to the skb, as ipip/ipgre do.
      
      If not, the conntrack/nat info bound to the original packet may continually
      redirect the packet to the vxlan interface, causing a routing loop.
      
      This is the scenario:
      
           VTEP     VXLAN Gateway
          /----\  /---------------\
          |    |  |               |
          |  vx+--+vx --NAT-> eth0+--> Internet
          |    |  |               |
          \----/  \---------------/
      
      When any packet comes from the Internet to the VTEP, lots of garbage
      packets come out of the gateway's vxlan interface, but none are actually
      sent to the physical interface, because they are redirected back to the
      vxlan interface in the postrouting chain of the NAT rule, and dmesg complains:
      
          Mar  1 21:52:53 debian kernel: [ 8802.997699] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:54 debian kernel: [ 8804.004907] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:55 debian kernel: [ 8805.012189] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:56 debian kernel: [ 8806.020593] Dead loop on virtual device vxlan0, fix it urgently!
      
      This patch should fix the problem.
      Signed-off-by: Zang MingJie <zealot0630@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible · 76e4cb0d
      Paolo Valente authored
      QFQ+ can select for service only 'eligible' aggregates, i.e.,
      aggregates that would have started to be served also in the emulated
      ideal system.  As a consequence, for QFQ+ to be work conserving, at
      least one of the active aggregates must be eligible when it is time to
      choose the next aggregate to serve.
      
      The set of eligible aggregates is updated through the function
      qfq_update_eligible(), which does guarantee that, after its
      invocation, at least one of the active aggregates is eligible.
      Because of this property, this function is invoked in
      qfq_deactivate_agg() to guarantee that at least one of the active
      aggregates is still eligible after an aggregate has been deactivated.
      In particular, the critical case is when there are other active
      aggregates, but the aggregate being deactivated happens to be the only
      one eligible.
      
      However, this precaution is not needed for QFQ+ to be work conserving,
      because qfq_update_eligible() is also always invoked at the beginning
      of qfq_choose_next_agg(). This patch removes the additional invocation
      of qfq_update_eligible() in qfq_deactivate_agg().
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service · 40dd2d54
      Paolo Valente authored
      By definition of (the algorithm of) QFQ+, the system virtual time must
      be pushed up only if there is no 'eligible' aggregate, i.e. no
      aggregate that would have started to be served also in the ideal
      system emulated by QFQ+.  QFQ+ serves only eligible aggregates, hence
      the aggregate currently in service is eligible.  As a consequence, to
      decide whether there is no eligible aggregate, QFQ+ must also check
      whether there is no aggregate in service.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue · a0143efa
      Paolo Valente authored
      Aggregate budgets are computed so as to guarantee that, after an
      aggregate has been selected for service, that aggregate has enough
      budget to serve at least one maximum-size packet for the classes it
      contains. For this reason, after a new aggregate has been selected
      for service, its next packet is immediately dequeued, without any
      further control.
      
      The maximum packet size for a class, lmax, can be changed through
      qfq_change_class(). In case the user sets lmax to a lower value than
      the size of some of the still-to-arrive packets, QFQ+ will
      automatically push up lmax as it enqueues these packets.  This
      automatic push up is likely to happen with TSO/GSO.
      
      In any case, if lmax is assigned a lower value than the size of some
      of the packets already enqueued for the class, then the following
      problem may occur: the size of the next packet to dequeue for the
      class may happen to be larger than lmax, after the aggregate to which
      the class belongs has been just selected for service. In this case,
      even the budget of the aggregate, which is an unsigned value, may be
      lower than the size of the next packet to dequeue. After dequeueing
      this packet and subtracting its size from the budget, the latter would
      wrap around.
      
      This fix prevents the budget from wrapping around after any packet
      dequeue.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
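      The wraparound described here is ordinary unsigned arithmetic. A minimal sketch with hypothetical function names; clamping at zero is one way to avoid the problem and is assumed here for illustration rather than quoted from the patch:

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Naive update: wraps around to a huge value when len > budget. */
      static uint32_t charge_naive(uint32_t budget, uint32_t len)
      {
          return budget - len;
      }

      /* Guarded update: clamps the remaining budget at zero. */
      static uint32_t charge_clamped(uint32_t budget, uint32_t len)
      {
          uint32_t left = budget < len ? 0 : budget - len;

          assert(left <= budget); /* the budget can only shrink */
          return left;
      }
      ```

      For example, with a budget of 1500 and a 9000-byte packet, the naive version yields a value far above 1500, while the clamped version yields 0.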
    • pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty · 2f3b89a1
      Paolo Valente authored
      If no aggregate is in service, then the function qfq_dequeue() does
      not dequeue any packet. For this reason, to guarantee QFQ+ to be work
      conserving, a just-activated aggregate must be set as in service
      immediately if it happens to be the only active aggregate.
      This is done by the function qfq_enqueue().
      
      Unfortunately, the function qfq_add_to_agg(), used to add a class to
      an aggregate, does not perform this important additional operation.
      In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
      of a class from a source aggregate, becoming, for this move, inactive,
      to a destination aggregate, becoming instead active, and 2) the
      destination aggregate becomes the only active aggregate, then this
      aggregate is not however set as in service. QFQ+ remains then in a
      non-work-conserving state until a new invocation of qfq_enqueue()
      recovers the situation.
      
      This fix solves the problem by moving the logic for setting an
      aggregate as in service directly into the function qfq_activate_agg().
      Hence, from whatever point qfq_activate_agg() is invoked, QFQ+
      remains work conserving.  Since the more complex logic of this new
      version of qfq_activate_agg() is not needed in qfq_dequeue() to
      reschedule an aggregate that exhausts its budget, the aggregate is
      now rescheduled there by directly invoking the functions needed.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: fix the update of eligible-group sets · 624b85fb
      Paolo Valente authored
      Between two invocations of make_eligible, the system virtual time may
      happen to grow enough that, in its binary representation, a bit with
      higher order than 31 flips. This happens especially with
      TSO/GSO. Before this fix, the mask used in make_eligible was computed
      as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
      a 64-bit architecture, because index_of_last_flipped_bit <= 63, but is in
      general undefined on a 32-bit architecture if index_of_last_flipped_bit > 31.
      The fix just replaces 1UL with 1ULL.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
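      The difference between the two constants can be sketched in portable C. The helper name mask_below() is hypothetical; only the expression (1ULL << bit) - 1 mirrors the mask described in the commit message.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /*
       * Mask with every bit below position `bit` set.
       * 1ULL is at least 64 bits wide on every platform, so the shift is
       * well defined for bit <= 63. With 1UL the shift would be undefined
       * for bit >= 32 wherever unsigned long is 32 bits wide.
       */
      static uint64_t mask_below(unsigned int bit)
      {
          assert(bit < 64); /* a shift by the full width is undefined as well */
          return (1ULL << bit) - 1;
      }
      ```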
    • pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service · 9b99b7e9
      Paolo Valente authored
      QFQ+ schedules the active aggregates in a group using a bucket list
      (one list per group). The bucket in which each aggregate is inserted
      depends on the aggregate's timestamps, and the number
      of buckets in a group is enough to accommodate the possible (range of)
      values of the timestamps of all the aggregates in the group. For this
      property to hold, timestamps must however be computed correctly.  One
      necessary condition for computing timestamps correctly is that the
      number of bits dequeued for each aggregate, while the aggregate is in
      service, does not exceed the maximum budget budgetmax assigned to the
      aggregate.
      
      For each aggregate, budgetmax is proportional to the number of classes
      in the aggregate. If the number of classes of the aggregate is
      decreased through qfq_change_class(), then budgetmax is decreased
      automatically as well.  Problems may occur if the aggregate is in
      service when budgetmax is decreased, because the current remaining
      budget of the aggregate and/or the service already received by the
      aggregate may happen to be larger than the new value of budgetmax.  In
      this case, when the aggregate is eventually deselected and its
      timestamps are updated, the aggregate may happen to have received an
      amount of service larger than budgetmax.  This may cause the aggregate
      to be assigned a higher virtual finish time than the maximum
      acceptable value for the last bucket in the bucket list of the group.
      
      This fix introduces a cap that addresses this issue.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
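      One way to picture such a cap: the service charged to an aggregate when its timestamps are updated is limited to the current budgetmax. The sketch below works under that assumption; the helper name and its parameters are hypothetical, and the timestamp computation itself is not shown.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /*
       * Service to charge when the aggregate is deselected: the amount
       * actually consumed, but never more than the current budgetmax.
       */
      static uint32_t capped_service(uint32_t initial_budget, uint32_t budget,
                                     uint32_t budgetmax)
      {
          uint32_t used;

          assert(budget <= initial_budget); /* budget only decreases in service */
          used = initial_budget - budget;
          return used < budgetmax ? used : budgetmax;
      }
      ```

      If budgetmax drops from 6000 to 3000 while an aggregate that has already consumed 5000 is in service, the charged service is 3000 rather than 5000.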
    • net/irda: Raise dtr in non-blocking open · f74861ca
      Peter Hurley authored
      DTR/RTS need to be raised, regardless of the open() mode, but not
      if the port has already shut down.
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/irda: Use barrier to set task state · 0b176ce3
      Peter Hurley authored
      Without a memory and compiler barrier, the task state change
      can migrate relative to the condition testing in a blocking loop.
      However, the task state change must be visible across all cpus
      prior to testing those conditions. Failing to do this can result
      in the familiar 'lost wakeup' and this task will hang until killed.
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/irda: Hold port lock while bumping blocked_open · 2f7c069b
      Peter Hurley authored
      Although tty_lock() already protects concurrent update to
      blocked_open, that fails to meet the separation-of-concerns between
      tty_port and tty.
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/irda: Fix port open counts · a4ed2e73
      Peter Hurley authored
      Saving the port count bump is unsafe. If the tty is hung up while
      this open was blocking, the port count is zeroed.
      
      Explicitly check if the tty was hung up while blocking, and correct
      the port count if not.
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net into intel · 0305d068
      David S. Miller authored
      Jeff Kirsher says:
      
      ===================
      This series contains fixes to e1000e and igb.
      
      The e1000e fix resolves an issue at 1000Mbps link speed, where one of the
      MAC's internal clocks can be stopped for up to 4us when entering K1 (a
      power mode of the MAC-PHY interconnect).  If the MAC is waiting for
      completion indications for 2 DMA write requests into Host memory
      (e.g. descriptor writeback or Rx packet writing) and the
      indications occur while the clock is stopped, both indications will be
      missed by the MAC causing the MAC to wait for the completion indications
      and be unable to generate further DMA write requests.  This results in an
      apparent hardware hang.  The patch works around the issue by disabling
      the de-assertion of the clock request when 1000Mbps link is acquired (K1
      must be disabled while doing this).
      
      The igb fix to drop BUILD_BUG_ON check from igb_build_rx_buffer resolves
      a build error on s390 devices.  The igb driver was throwing a build error
      due to the fact that a frame built using build_skb would be larger than 2K.
      Since this is not likely to change at any point in the future we are better
      off just dropping the check since we already had a check in
      igb_set_rx_buffer_len that will just disable the usage of build_skb anyway.
      
      The igb fix for i210 link setup changes the setup copper link function
      to use a switch statement, so that the appropriate setup link function
      is called for the given PHY types.
      
      Lastly, the igb fix for a lockdep issue in igb_get_i2c_client resolves
      the issue by re-factoring the initialization and usage of the i2c_client.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 9f225788
      Linus Torvalds authored
      Pull powerpc fixes from Ben Herrenschmidt:
       "Here are a few powerpc bits & fixes for rc1.  A couple of str*cpy
        fixes, some fixes in handling the FSCR register on Power8 (controls
        the enabling of processor features), a 32-bit build fix and a couple
        more nits."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Set DSCR bit in FSCR setup
        powerpc: Add DSCR FSCR register bit definition
        powerpc: Fix setting FSCR for HV=0 and on secondary CPUs
        powerpc: Wireup the kcmp syscall to sys_ni
        powerpc: Remove unused BITOP_LE_SWIZZLE macro
        powerpc: Avoid link stack corruption in MMU on syscall entry path
        drivers/tty/hvc: Use strlcpy instead of strncpy
        powerpc/pseries/hvcserver: Fix strncpy buffer limit in location code
        powerpc: Fix compile of sha1-powerpc-asm.S on 32-bit
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · d7b815d4
      Linus Torvalds authored
      Pull virtio hwrng fix from Rusty Russell:
       "Nasty side-effect of vmalloc'ing modules: their static vars cannot be
        put into scatterlists.  Jens has a check queued for this, so it
        shouldn't happen again.
      
        We could fix this in virtio_rng, but it's actually far easier to just
        do it in the core"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
        hw_random: make buffer usable in scatterlist.
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9da060d0
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "A moderately sized pile of fixes, some specifically for merge window
        introduced regressions although others are for longer standing items
        and have been queued up for -stable.
      
        I'm kind of tired of all the RDS protocol bugs over the years, to be
        honest, it's way out of proportion to the number of people who
        actually use it.
      
         1) Fix missing range initialization in netfilter IPSET, from Jozsef
            Kadlecsik.
      
         2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
            Berg.
      
         3) Fix DMA syncing in SFC driver, from Ben Hutchings.
      
         4) Fix regression in BOND device MAC address setting, from Jiri
            Pirko.
      
         5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.
      
         6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
            fix from Dmitry Kravkov.
      
         7) Missing cfgspace_lock initialization in BCMA driver.
      
         8) Validate parameter size for SCTP assoc stats getsockopt(), from
            Guenter Roeck.
      
         9) Fix SCTP association hangs, from Lee A Roberts.
      
        10) Fix jumbo frame handling in r8169, from Francois Romieu.
      
        11) Fix phy_device memory leak, from Petr Malat.
      
        12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
            Mehrtens.
      
        13) Missing socket refcount release in L2TP, from Guillaume Nault.
      
        14) sctp_endpoint_init should respect passed in gfp_t, rather than use
            GFP_KERNEL unconditionally.  From Dan Carpenter.
      
        15) Add ASIX AX88179 USB driver, from Freddy Xin.
      
        16) Remove MAINTAINERS entries for drivers deleted during the merge
            window, from Cesar Eduardo Barros.
      
        17) RDS protocol can try to allocate huge amounts of memory, check
            that the user's request length makes sense, from Cong Wang.
      
        18) SCTP should use the provided KMALLOC_MAX_SIZE instead of its own,
            bogus, definition.  From Cong Wang.
      
        19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
            from Frank Li.  Also, fix a build error introduced in the merge
            window.
      
        20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.
      
        21) Don't double count RTT measurements when we leave the TCP receive
            fast path, from Neal Cardwell."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
        tcp: fix double-counted receiver RTT when leaving receiver fast path
        CAIF: fix sparse warning for caif_usb
        rds: simplify a warning message
        net: fec: fix build error in no MXC platform
        net: ipv6: Don't purge default router if accept_ra=2
        net: fec: put tx to napi poll function to fix dead lock
        sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
        rds: limit the size allocated by rds_message_alloc()
        MAINTAINERS: remove eexpress
        MAINTAINERS: remove drivers/net/wan/cycx*
        MAINTAINERS: remove 3c505
        caif_dev: fix sparse warnings for caif_flow_cb
        ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
        sctp: use the passed in gfp flags instead GFP_KERNEL
        ipv[4|6]: correct dropwatch false positive in local_deliver_finish
        l2tp: Restore socket refcount when sendmsg succeeds
        net/phy: micrel: Disable asymmetric pause for KSZ9021
        bgmac: omit the fcs
        phy: Fix phy_device_free memory leak
        bnx2x: Fix KR2 work-around condition
        ...
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e3b59518
      Linus Torvalds authored
      Pull irq fixes and cleanups from Thomas Gleixner:
       "Commit e5ab012c ("nohz: Make tick_nohz_irq_exit() irq safe") is
        the first commit in the series and the minimal necessary bugfix, which
        needs to go back into stable.
      
        The remaining commits enforce irq disabling in irq_exit(), sanitize
        the hardirq/softirq preempt count transition and remove a bunch of no
        longer necessary conditionals."
      
      I personally love getting rid of the very subtle and confusing
      IRQ_EXIT_OFFSET thing.  Even apart from the whole "more lines removed
      than added" thing.
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irq: Don't re-enable interrupts at the end of irq_exit
        irq: Remove IRQ_EXIT_OFFSET workaround
        Revert "nohz: Make tick_nohz_irq_exit() irq safe"
        irq: Sanitize invoke_softirq
        irq: Ensure irq_exit() code runs with interrupts disabled
        nohz: Make tick_nohz_irq_exit() irq safe
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6516ab6f
      Linus Torvalds authored
      Pull smpboot bugfix from Thomas Gleixner:
       "A single bugfix for a regression introduced with the conversion of the
        stop machine threads to the generic smpboot thread management
        facility"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        stop_machine: Mark per cpu stopper enabled early
    • Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux · 06e79d3b
      Linus Torvalds authored
      Pull second round of GPIO changes from Grant Likely:
       "This branch contains a few bug fixes that I missed the first time
        around and updates to the gpio_desc series included in the first pull
        request.  This tag has been retagged to drop the 2 head commits
        because one of them caused a build failure."
      
      * tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux:
        gpio/gpio-ich: fix ichx_gpio_check_available() return what callers expect
        gpiolib: move comment to right function
        gpiolib: use const parameters when possible
        gpiolib: check descriptors validity before use
    • Merge tag 'md-3.9' of git://neil.brown.name/md · a5e0d731
      Linus Torvalds authored
      Pull md updates from NeilBrown:
       "Mostly little bugfixes.
      
        Only "feature" is a new RAID10 layout which slightly improves the
        number of sets of devices that can concurrently fail, without data
        loss."
      
      * tag 'md-3.9' of git://neil.brown.name/md:
        md: expedite metadata update when switching  read-auto -> active
        md: remove CONFIG_MULTICORE_RAID456
        md/raid1,raid10: fix deadlock with freeze_array()
        md/raid0: improve error message when converting RAID4-with-spares to RAID0
        md: raid0: fix error return from create_stripe_zones.
        md: fix two bugs when attempting to resize RAID0 array.
        DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
        MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
        MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
        MD RAID10: Minor non-functional code changes
        md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
        md: protect against crash upon fsync on ro array
  2. 05 Mar, 2013 13 commits
  3. 04 Mar, 2013 5 commits