1. 01 Oct, 2012 22 commits
    • Jack Morgenstein's avatar
      IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes · 3806d08c
      Jack Morgenstein authored
      When we have VFs and PFs on same host, the VFs are activated within
      the mlx4_core module before the mlx4_ib kernel module is loaded.
      
      When the mlx4_ib module initializes the PF (master), it now creates
      MAD paravirtualization contexts for any VFs that already active.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      3806d08c
    • Jack Morgenstein's avatar
      mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9
      Jack Morgenstein authored
      Previously, the structure of a guest's proxy QPs followed the
      structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
      qp1 port 2, ...).  The guest then did offset calculations on the
      sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
      
      This is now changed so that the guest does no offset calculations
      regarding proxy or tunnel QPs to use.  This change frees the PPF from
      needing to adhere to a specific order in allocating proxy and tunnel
      QPs.
      
      Now QUERY_FUNC_CAP provides each port individually with its proxy
      qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
      used directly where required (with no offset calculations).
      
      To accomplish this change, several fields were added to the phys_caps
      structure for use by the PPF and by non-SR-IOV mode:
      
          base_sqpn -- in non-sriov mode, this was formerly sqp_start.
          base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
          base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
      
      The current code in the PPF still adheres to the previous layout of
      sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
      layout without affecting VF or (paravirtualized) PF code.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      47605df9
    • Jack Morgenstein's avatar
      mlx4: Paravirtualize Node Guids for slaves · afa8fd1d
      Jack Morgenstein authored
      This is necessary in order to support > 1 VF/PF in a VM for software
      that uses the node guid as a discriminator, such as librdmacm.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      afa8fd1d
    • Jack Morgenstein's avatar
      mlx4: Activate SR-IOV mode for IB · 026149cb
      Jack Morgenstein authored
      Remove the error returns for IB ports from mlx4_ib_add,
      mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.
      
      Currently, SRIOV is supported only for devices for which the
      link layer is IB on all ports; RoCE support will be added later.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      026149cb
    • Jack Morgenstein's avatar
      IB/mlx4: Miscellaneous adjustments for SR-IOV IB support · 992e8e6e
      Jack Morgenstein authored
      1. Allow only master to change node description.
      2. Prevent AH leakage in send mads.
      3. Take device part number from PCI structure, so that guests see the
         VF part number (and not the PF part number).
      4. Place the device revision ID into caps structure at startup.
      5. SET_PORT in update_gids_task needs to go through wrapper on master.
      6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work
         queue on the master, since it propagates events to slaves using
         GEN_EQE.
      7. Do not support FMR on slaves.
      8. Add spinlock to slave_event(), since it is called both in interrupt
         context and in process context (due to 6 above, and also if
         smp_snoop is used).  This fix was found and implemented by Saeed
         Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      992e8e6e
    • Jack Morgenstein's avatar
      mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode · 980e9001
      Jack Morgenstein authored
      Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
      transitions to RTR, or transitions to ERR/RESET respectively.
      
      In SR-IOV mode, however, the master is also paravirtualized.  This in
      turn requires that we not do INIT_PORT until the entire QP0 path (real
      QP0 and proxy QP0) is ready to receive.  When the real QP0 goes down,
      we should indicate that the port is not active.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      980e9001
    • Jack Morgenstein's avatar
      net/mlx4_core: Adjustments to SET_PORT for IB SR-IOV · efcd235d
      Jack Morgenstein authored
      1. Slaves may not set the IS_SM capability for the port.
      2. DEV_MGMT may not be set in multifunction mode.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      efcd235d
    • Jack Morgenstein's avatar
      IB/mlx4: Add iov directory in sysfs under the ib device · c1e7e466
      Jack Morgenstein authored
      This directory is added only for the master -- slaves do not have it.
      
      The sysfs iov directory is used to manage and examine the port P_Key
      and guid paravirtualization.
      
      Under iov/ports, the administrator may examine the gid and P_Key tables
      as they are present in the device (and as are seen in the "network
      view" presented to the SM).
      
      Under the iov/<pci slot number> directories, the admin may map the
      index numbers in the physical tables (as under iov/ports) to the
      paravirtualized index numbers that guests see.
      
      For example, if the administrator, for port 1 on guest 2 maps physical
      pkey index 10 to virtual index 1, then that guest, whenever it uses
      its pkey index 1, will actually be using the real pkey index 10.
      
      Based on patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      c1e7e466
    • Jack Morgenstein's avatar
      IB/mlx4: Propagate P_Key and guid change port management events to slaves · 2a4fae14
      Jack Morgenstein authored
      P_Key change and guid change events are not of interest to all slaves,
      but only to those slaves which "see" the table slots whose contents
      have change.
      
      For example, if the guid at port 1, index 5 has changed in the PPF, we
      wish to propagate the gid-change event only to the function which has
      that guid index mapped to its port/guid table (in this case it is
      slave #5). Other functions should not get the event, since the event
      does not affect them.
      
      Similarly with P_Keys -- P_Key change events are forwarded only to
      slaves which have that P_Key index mapped to their virtual P_Key table.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      2a4fae14
    • Jack Morgenstein's avatar
      mlx4: Add alias_guid mechanism · a0c64a17
      Jack Morgenstein authored
      For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
      GUID at index 0 seen by a slave is the actual GUID occupying the GUID
      table at the slave-id index.
      
      The driver, by default, requests at startup time that subnet manager
      populate its entire guid table with GUIDs. These guids are then mapped
      (paravirtualized) to the slaves, and appear for each slave as its GUID
      at index 0.
      
      Until each slave has such a guid, its port status is DOWN.
      
      The guid table is cached to support special QP paravirtualization, and
      event propagation to slaves on guid change (we test to see if the guid
      really changed before propagating an event to the slave).
      
      To support this caching, add capability to __mlx4_ib_query_gid() to
      obtain the network view (i.e., physical view) gid at index X, not just
      the host (paravirtualized) view.
      
      Based on a patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      a0c64a17
    • Jack Morgenstein's avatar
      mlx4_core: Add IB port-state machine and port mgmt event propagation · 993c401e
      Jack Morgenstein authored
      For an IB port, a slave should not show port active until that slave
      has a valid alias-guid (provided by the subnet manager).  Therefore
      the port-up event should be passed to a slave only after both the port
      is up, and the slave's alias-guid has been set.
      
      Also, provide the infrastructure for propagating port-management
      events (client-reregister, etc) to slaves.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      993c401e
    • Amir Vadai's avatar
      IB/mlx4: Add CM paravirtualization · 3cf69cc8
      Amir Vadai authored
      In CM para-virtualization:
      
      1. Incoming requests are steered to the correct vHCA according to the
         embedded GID.
      2. Communication IDs on outgoing requests are replaced by a globally
         unique ID, generated by the PPF, since there is no synchronization
         of ID generation between guests (and so these IDs are not
         guaranteed to be globally unique).  The guest's comm ID is stored,
         and is returned to the response MAD when it arrives.
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.co.il>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      3cf69cc8
    • Oren Duer's avatar
      IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV · b9c5d6a6
      Oren Duer authored
      MCG paravirtualization support includes:
      - Creating multicast groups by VFs, and keeping accounting of them
      - Leaving multicast groups by VFs
      - Updating SM only with real changes in the overall picture of MCGs status
      - Creation of MGID=0 groups (let SM choose MGID)
      
      Note that the MCG module maintains its own internal MCG object
      reference counts.  The reason for this is that the IB core is used to
      track only the multicast groups joins generated by the PF it runs
      over.  The PF IB core layer is unaware of slaves, so it cannot be used
      to keep track of MCG joins they generate.
      Signed-off-by: default avatarOren Duer <oren@mellanox.co.il>
      Signed-off-by: default avatarEli Cohen <eli@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      b9c5d6a6
    • Jack Morgenstein's avatar
      mlx4: MAD_IFC paravirtualization · 0a9a0188
      Jack Morgenstein authored
      The MAD_IFC firmware command fulfills two functions.
      
      First, it is used in the QP0/QP1 MAD-handling flow to obtain
      information from the FW (for answering queries), and for setting
      variables in the HCA (MAD SET packets).
      
      For this, MAD_IFC should provide the FW (physical) view of the data.
      This is the view that OpenSM needs.  We call this the "network view".
      
      In the second case, MAD_IFC is used by various verbs to obtain data
      regarding the local HCA (e.g., ib_query_device()).  We call this the
      "host view".
      
      This data needs to be paravirtualized.
      
      MAD_IFC therefore needs a wrapper function, and also needs another
      flag indicating whether it should provide the network view (when it is
      called by ib_process_mad in special-qp packet handling), or the host
      view (when it is called while implementing a verb).
      
      There are currently 2 flag parameters in mlx4_MAD_IFC already:
      ignore_bkey and ignore_mkey.  These two parameters are replaced by a
      single "mad_ifc_flags" parameter, with different bits set for each
      flag.  A third flag is added: "network-view/host-view".
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      0a9a0188
    • Jack Morgenstein's avatar
      IB/mlx4: SR-IOV multiplex and demultiplex MADs · 37bfc7c1
      Jack Morgenstein authored
      Special QPs are paravirtualized.
      
      vHCAs are not given direct access to QP0/1. Rather, these QPs are
      operated by a special context hosted by the PF, which mediates access
      to/from vHCAs.  This is done by opening a "tunnel" per vHCA port per
      QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
      PF-context and a "Proxy QP" in the vHCA.  All vHCA MAD traffic must
      pass through the corresponding tunnel.  vHCA QPs cannot be assigned to
      VL15 and are denied of the well-known QKey.
      
      Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
      the real special QP).
      
      Incoming messages are "multiplexed" (i.e. steered by the PPF to the
      correct VF or to the PF)
      
      QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
      QP0s, but they never receive any SMPs and all SMPs sent are discarded.
      QP1 traffic is allowed for all vHCAs, but special care is required to
      bridge the gap between the host and network views.
      
      Specifically:
      - Transaction IDs are mapped to guarantee uniqueness among vHCAs
      - CM para-virtualization
        o   Incoming requests are steered to the correct vHCA according to the embedded GID
        o   Local communication IDs are mapped to ensure uniqueness among vHCAs
        (see the patch that adds CM paravirtualization.)
      - Multicast para-virtualization
        o   The PF context aggregates membership state from all vHCAs
        o   The SA is contacted only when the aggregate membership changes
        o   If the aggregate does not change, the PF context will provide the
            requesting vHCA with the proper response.
        (see the patch that adds multicast group paravirtualization)
      
      Incoming MADs are steered according to:
      - the DGID If a GRH is present
      - the mapped transaction ID for response MADs
      - the embedded GID in CM requests
      - the remote communication ID in other CM messages
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      37bfc7c1
    • Jack Morgenstein's avatar
      mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Jack Morgenstein authored
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Secondly, initialize the P_Key virtualization table to its default
      values: Master virtualized table is 1-1 with the real P_Key table,
      guest virtualized table has P_Key index 0 mapped to the real P_Key
      index 0, and all the other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      54679e14
    • Jack Morgenstein's avatar
      IB/mlx4: Initialize SR-IOV IB support for slaves in master context · fc06573d
      Jack Morgenstein authored
      Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
      on the master.
      
      This has two parts.  The first part is to initialize the structures to
      contain the contexts.  This is done at master startup time in
      mlx4_ib_init_sriov().
      
      The second part is to actually create the tunneling resources required
      on the master to support a slave.  This is performed the master
      detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
      generated when a slave initializes its comm channel).
      
      For the master, there is no such startup event, so it creates its own
      tunneling resources when it starts up.  In addition, the master also
      creates the real special QPs.  The ib_core layer on the master causes
      creation of proxy special QPs, since the master is also
      paravirtualized at the ib_core layer.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      fc06573d
    • Jack Morgenstein's avatar
      mlx4_core: Add proxy and tunnel QPs to the reserved QP area · e2c76824
      Jack Morgenstein authored
      In addition, pass the proxy and tunnel QP numbers to slaves so the
      driver can perform special QP paravirtualization.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      e2c76824
    • Jack Morgenstein's avatar
      IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support · 1ffeb2eb
      Jack Morgenstein authored
      1. Introduce the basic SR-IOV parvirtualization context objects for
         multiplexing and demultiplexing MADs.
      2. Introduce support for the new proxy and tunnel QP types.
      
      This patch introduces the objects required by the master for managing
      QP paravirtualization for guests.
      
      struct mlx4_ib_sriov is created by the master only.
      It is a container for the following:
      
      1. All the info required by the PPF to multiplex and de-multiplex MADs
         (including those from the PF). (struct mlx4_ib_demux_ctx demux)
      2. All the info required to manage alias GUIDs (i.e., the GUID at
         index 0 that each guest perceives.  In fact, this is not the GUID
         which is actually at index 0, but is, in fact, the GUID which is at
         index[<VF number>] in the physical table.
      3. structures which are used to manage CM paravirtualization
      4. structures for managing the real special QPs when running in SR-IOV
         mode.  The real SQPs are controlled by the PPF in this case.  All
         SQPs created and controlled by the ib core layer are proxy SQP.
      
      struct mlx4_ib_demux_ctx contains the information per port needed
      to manage paravirtualization:
      
      1. All multicast paravirt info
      2. All tunnel-qp paravirt info for the port.
      3. GUID-table and GUID-prefix for the port
      4. work queues.
      
      struct mlx4_ib_demux_pv_ctx contains all the info for managing the
      paravirtualized QPs for one slave/port.
      
      struct mlx4_ib_demux_pv_qp contains the info need to run an individual
      QP (either tunnel qp or real SQP).
      
      Note:  We made use of the 2 most significant bits in enum
      mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h).
      We need these bits in the low-level driver for internal purposes.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      1ffeb2eb
    • Jack Morgenstein's avatar
      IB/core: Add ib_find_exact_cached_pkey() · 73aaa741
      Jack Morgenstein authored
      When P_Key tables potentially contain both full and partial membership
      copies for the same P_Key, we need a function to find the index for an
      exact (16-bit) P_Key.
      
      This is necessary when the master forwards QP1 MADs sent by guests.
      If the guest has sent the MAD with a limited membership P_Key, we need
      to to forward the MAD using the same limited membership P_Key.  Since
      the master may have both the limited and the full member P_Keys in its
      table, we must make sure to retrieve the limited membership P_Key in
      this case.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      73aaa741
    • Jack Morgenstein's avatar
      IB/core: Handle table with full and partial membership for the same P_Key · ff7166c4
      Jack Morgenstein authored
      Extend the cached and non-cached P_Key table lookups to handle limited
      and full membership of the same P_Key to co-exist in the P_Key table.
      
      This is necessary for SR-IOV, to allow for some guests would to have
      the full membership P_Key in their virtual P_Key table, while other
      guests on the same physical HCA would have the limited one.
      To support this, we need both the limited and full membership P_Keys
      to be present in the master's (hypervisor physical port) P_Key table.
      
      The algorithm for handling P_Key tables which contain both the limited
      and the full membership versions of the same P_Key works as follows:
      
      When scanning the P_Key table for a 15-bit P_Key:
      
      A. If there is a full member version of that P_Key anywhere in the
          table, return its index (even if a limited-member version of the
          P_Key exists earlier in the table).
      
      B. If the full member version is not in the table, but the
         limited-member version is in the table, return the index of the
         limited P_Key.
      Signed-off-by: default avatarLiran Liss <liranl@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      ff7166c4
    • Jack Morgenstein's avatar
      IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use · d2b57063
      Jack Morgenstein authored
      Reserve bits 26-31 for internal use by low-level drivers. Two such
      bits are used in the mlx4_b driver SR-IOV implementation.
      
      These enum additions guarantee that the core layer will never use
      these bits, so that low level drivers may safely make use of them.
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      d2b57063
  2. 24 Sep, 2012 1 commit
  3. 23 Sep, 2012 8 commits
    • Linus Torvalds's avatar
      Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 56bae802
      Linus Torvalds authored
      Pull kbuild fixes from Michal Marek:
       "There are two more kbuild fixes for 3.6.
      
        One fixes a race between x86's archscripts target and the rule
        (re)building scripts/basic/fixdep.  The second is a fix for the
        previous attempt at fixing make firmware_install with make 3.82.
        This new solution should work with any version of GNU make"
      
      * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        x86/kbuild: archscripts depends on scripts_basic
        firmware: fix directory creation rule matching with make 3.80
      56bae802
    • Linus Torvalds's avatar
      Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 0737c8d7
      Linus Torvalds authored
      Pull hwmon subsystem fixes from Jean Delvare.
      
      * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        hwmon: (fam15h_power) Tweak runavg_range on resume
        hwmon: (coretemp) Use get_online_cpus to avoid races involving CPU hotplug
        hwmon: (via-cputemp) Use get_online_cpus to avoid races involving CPU hotplug
      0737c8d7
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 0bf7a705
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a set of four essential fixes: two oops related (bnx2i,
        virtio-scsi), one data corruption related (hpsa) and one failure to
        boot due to interrupt routing issues (mpt2ss).
      
        Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] hpsa: fix handling of protocol error
        [SCSI] mpt2sas: Fix for issue - Unable to boot from the drive connected to HBA
        [SCSI] bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload
        [SCSI] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
      0bf7a705
    • Shaun Ruffell's avatar
      edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. · faa2ad09
      Shaun Ruffell authored
      Fix potential NULL pointer dereference in edac_unregister_sysfs() on
      system boot introduced in 3.6-rc1.
      
      Since commit 7a623c03 ("edac: rewrite the sysfs code to use struct
      device") edac_mc_alloc() no longer initializes embedded kobjects in
      struct mem_ctl_info.  Therefore edac_mc_free() can no longer simply
      decrement a kobject reference count to free the allocated memory unless
      the memory controller driver module had also called edac_mc_add_mc().
      
      Now edac_mc_free() will check if the newly embedded struct device has
      been registered with sysfs before using either the standard device
      release functions or freeing the data structures itself with logic
      pulled out of the error path of edac_mc_alloc().
      
      The BUG this patch resolves for me:
      
        BUG: unable to handle kernel NULL pointer dereference at   (null)
        EIP is at __wake_up_common+0x1a/0x6a
        Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
        Call Trace:
          complete_all+0x3f/0x50
          device_pm_remove+0x23/0xa2
          device_del+0x34/0x142
          edac_unregister_sysfs+0x3b/0x5c [edac_core]
          edac_mc_free+0x29/0x2f [edac_core]
          e7xxx_probe1+0x268/0x311 [e7xxx_edac]
          e7xxx_init_one+0x56/0x61 [e7xxx_edac]
          local_pci_probe+0x13/0x15
        ...
      
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: default avatarShaun Ruffell <sruffell@digium.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      faa2ad09
    • Fengguang Wu's avatar
      edac_mc: fix messy kfree calls in the error path · ef6e7816
      Fengguang Wu authored
      coccinelle warns about:
      
      + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429
      
         421         if (mci->csrows) {
       > 422                 for (chn = 0; chn < tot_channels; chn++) {
         423                         csr = mci->csrows[chn];
         424                         if (csr) {
       > 425                                 for (chn = 0; chn < tot_channels; chn++)
         426                                          kfree(csr->channels[chn]);
         427                                  kfree(csr);
         428                          }
       > 429                          kfree(mci->csrows[i]);
         430                  }
         431                  kfree(mci->csrows);
         432          }
      
      and that code block seem to mess things up in several ways (double free, memory
      leak, out-of-bound reads etc.):
      
      L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be
            "row" and "tot_csrows" respectively. Which means either memory leak, or
            out-of-bound reads (which if does not trigger an immediate page fault
            error, will further lead to kfree() on random addresses).
      
      L425: The inner loop is reusing the same iterator "chn" as the outer loop,
            which could lead to premature end of the outer loop, and hence memory leak.
      
      L429: The array index 'i' in mci->csrows[i] is a temporary value used in
            previous loops, and won't change at all in the current loop. Which
            means either out-of-bound read and possibly kfree(random number), or the
            same mci->csrows[i] get freed once and again, and possibly double free
            for the kfree(csr) in L427.
      
      L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.
      
      The buggy code was introduced by commit de3910eb ("edac: change the mem
      allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
      merge window. Fix it by freeing up resources in this order:
      
        free csrows[i]->channels[j]
        free csrows[i]->channels
        free csrows[i]
        free csrows
      
      CC: Mauro Carvalho Chehab <mchehab@redhat.com>
      CC: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ef6e7816
    • Andreas Herrmann's avatar
      hwmon: (fam15h_power) Tweak runavg_range on resume · 5f0ecb90
      Andreas Herrmann authored
      The quirk introduced with commit
      00250ec9 (hwmon: fam15h_power: fix
      bogus values with current BIOSes) is not only required during driver
      load but also when system resumes from suspend. The BIOS might set the
      previously recommended (but unsuitable) initilization value for the
      running average range register during resume.
      Signed-off-by: default avatarAndreas Herrmann <andreas.herrmann3@amd.com>
      Tested-by: default avatarAndreas Hartmann <andihartmann@01019freenet.de>
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      Cc: stable@vger.kernel.org # 3.0+
      5f0ecb90
    • Silas Boyd-Wickizer's avatar
      hwmon: (coretemp) Use get_online_cpus to avoid races involving CPU hotplug · 641f1456
      Silas Boyd-Wickizer authored
      coretemp_init loops with for_each_online_cpu, adding platform_devices
      and sysfs interfaces, then calls register_hotcpu_notifier.  There is a
      race if a CPU is offlined or onlined after the loop, but before
      register_hotcpu_notifier.  The race might result in the absence of a
      platform_device+sysfs interface for an online CPU, or the presence of
      a platform_device+sysfs interface for an offline CPU.  A similar race
      occurs during coretemp_exit, after the module calls
      unregister_hotcpu_notifier, but before it unregisters all devices, a
      CPU might offline and a device for an offline CPU will exist for a
      short while.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier
      with get_online_cpus+put_online_cpus; and surrounds
      unregister_hotcpu_notifier and device unregistering with
      get_online_cpus+put_online_cpus.
      
      Build tested.
      Signed-off-by: default avatarSilas Boyd-Wickizer <sbw@mit.edu>
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      641f1456
    • Silas Boyd-Wickizer's avatar
      hwmon: (via-cputemp) Use get_online_cpus to avoid races involving CPU hotplug · 1ec3ddfd
      Silas Boyd-Wickizer authored
      via_cputemp_init loops with for_each_online_cpu, adding
      platform_devices, then calls register_hotcpu_notifier.  If a CPU is
      offlined between the loop and register_hotcpu_notifier, then later
      onlined, via_cputemp_device_add will attempt to add platform devices
      with the same ID.  A similar race occurs during via_cputemp_exit,
      after the module calls unregister_hotcpu_notifier, a CPU might offline
      and a device will exist for a CPU that is offline.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier
      with get_online_cpus+put_online_cpus; and surrounds
      unregister_hotcpu_notifier and device unregistering with
      get_online_cpus+put_online_cpus.
      
      Build tested.
      Signed-off-by: default avatarSilas Boyd-Wickizer <sbw@mit.edu>
      Acked-by: default avatarHarald Welte <laforge@gnumonks.org>
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      1ec3ddfd
  4. 22 Sep, 2012 6 commits
  5. 21 Sep, 2012 3 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · abef3bd7
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "More bug fixes, nothing gets past these guys"
      
       1) More kernel info leaks found by Mathias Krause, this time in the
          IPSEC configuration layers.
      
       2) When IPSEC policies change, we do not properly make sure that cached
          routes (which could now be stale) throughout the system will be
          revalidated.  Fix this by generalizing the generation count
          invalidation scheme used by ipv4.  From Nicolas Dichtel.
      
       3) When repairing TCP sockets, we need to allow to restore not just the
          send window scale, but the receive one too.  Extend the existing
          interface to achieve this in a backwards compatible way.  From
          Andrey Vagin.
      
       4) A fix for FCOE scatter gather feature validation erroneously caused
          scatter gather to be disabled for things like AOE too.  From Ed L
          Cashin.
      
       5) Several cases of mishandling of error pointers, from Mathias Krause,
          Wei Yongjun, and Devendra Naga.
      
       6) Fix gianfar build, from Richard Cochran.
      
       7) CAP_NET_* failures should return -EPERM not -EACCES, from Zhao
          Hongjiang.
      
       8) Hardware reset fix in janz-ican3 CAN driver, from Ira W Snyder.
      
       9) Fix oops during rmmod in ti_hecc CAN driver, from Marc Kleine-Budde.
      
      10) The removal of the conditional compilation of the clk support code
          in the stmmac driver broke things.  This is because the interfaces
          used are the ones that don't also perform the enable/disable of the
          clk.  Fix from Stefan Roese.
      
      11) The QFQ packet scheduler can record out of range virtual start
          times, resulting later in misbehavior and even crashes.  Fix from
          Paolo Valente.
      
      12) If MSG_WAITALL is used with IOAT DMA under TCP, we can wedge the
          receiver when the advertised receive window goes to zero.  Detect
          this case and force the processing of the IOAT DMA queue when it
          happens to avoid getting stuck.  Fix from Michal Kubecek.
      
      13) batman-adv assumes that test_bit() returns only 0 or 1, but this is
          not true for x86 (which returns -1 or 0, via the 'sbb' instruction).
          Fix from Linus Lussing.
      
      14) Fix small packet corruption in e1000, from Tushar Dave.
      
      15) make_blackhole() in the IPSEC policy code can do one read unlock too
          many, fix from Li RongQing.
      
      16) The new tcp_try_coalesce() code introduced a bug in TCP URG
          handling, fix from Eric Dumazet.
      
      17) Fix memory leak in __netif_receive_skb() when doing zerocopy and
          when hit an OOM condition.  From Michael S Tsirkin.
      
      18) netxen blindly deferences pdev->bus->self, which is not guarenteed
          to be non-NULL.  Fix from Nikolay Aleksandrov.
      
      19) Fix a performance regression caused by mistakes in ipv6 checksum
          validation in the bnx2x driver, fix from Michal Schmidt.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
        net/stmmac: Use clk_prepare_enable and clk_disable_unprepare
        net: change return values from -EACCES to -EPERM
        net/irda: sh_sir: fix return value check in sh_sir_set_baudrate()
        stmmac: fix return value check in stmmac_open_ext_timer()
        gianfar: fix phc index build failure
        ipv6: fix return value check in fib6_add()
        bnx2x: remove false warning regarding interrupt number
        can: ti_hecc: fix oops during rmmod
        can: janz-ican3: fix support for older hardware revisions
        net: do not disable sg for packets requiring no checksum
        aoe: assert AoE packets marked as requiring no checksum
        at91ether: return PTR_ERR if call to clk_get fails
        xfrm_user: don't copy esn replay window twice for new states
        xfrm_user: ensure user supplied esn replay window is valid
        xfrm_user: fix info leak in copy_to_user_tmpl()
        xfrm_user: fix info leak in copy_to_user_policy()
        xfrm_user: fix info leak in copy_to_user_state()
        xfrm_user: fix info leak in copy_to_user_auth()
        net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200
        tcp: restore rcv_wscale in a repair mode (v2)
        ...
      abef3bd7
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 6219844e
      Linus Torvalds authored
      Pull sparc updates from David Miller:
      
      1) Debugging builds on 32-bit sparc need to handle the R_SPARC_DISP32
         relocation, not just 64-bit sparc.  From Andreas Larsson.
      
      2) Wei Yongjun noticed that module_alloc() on sparc can return an
         error pointer, but that's not allowed.  module_alloc() should
         return only a valid pointer, or NULL.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: fix the return value of module_alloc()
        sparc32: Enable the relocation target R_SPARC_DISP32 for sparc32
      6219844e
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9d108907
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Small fixlets"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm/init.c: Fix devmem_is_allowed() off by one
        x86/kconfig: Remove outdated reference to Intel CPUs in CONFIG_SWIOTLB
      9d108907