Commits · 118ce7ab9785846e1c673f6130bee526c127206c · Kirill Smelkov / linux

27 Feb, 2014 1 commit

atm: nicstar: remove interruptible_sleep_on_timeout · 118ce7ab

Arnd Bergmann authored Feb 27, 2014

We are trying to finally kill off interruptible_sleep_on_timeout.
the two uses in the nicstar driver can be trivially replaced
with wait_event_interruptible_lock_irq_timeout, which prevents the
wake-up race and is able to check the buffer state with scq->lock
held.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

118ce7ab

26 Feb, 2014 31 commits

tcp: switch rtt estimations to usec resolution · 740b0f18

Eric Dumazet authored Feb 26, 2014

Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example ss command displays a srtt of 32 usecs (10Gbit link)

lpk51:~# ./ss -i dst lpk52
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer
Address:Port
tcp    ESTAB      0      1         10.246.11.51:42959
10.246.11.52:64614
         cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

Updated iproute2 ip command displays :

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51

Old binary displays :

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

740b0f18

net: add skb_mstamp infrastructure · 363ec392

Eric Dumazet authored Feb 26, 2014

ktime_get() is too expensive on some cases, and we'd like to get
usec resolution timestamps in TCP stack.

This patch adds a light weight facility using a combination of
local_clock() and jiffies samples.

Instead of :

        u64 t0, t1;

        t0 = ktime_get();
        // stuff
        t1 = ktime_get();
        delta_us = ktime_us_delta(t1, t0);

use :
        struct skb_mstamp t0, t1;

        skb_mstamp_get(&t0);
        // stuff
        skb_mstamp_get(&t1);
        delta_us = skb_mstamp_us_delta(&t1, &t0);

Note : local_clock() might have a (bounded) drift between cpus.

Do not use this infra in place of ktime_get() without understanding the
issues.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

363ec392

phy: micrel: add of configuration for LED mode · 20d8435a

Ben Dooks authored Feb 26, 2014

Add support for the led-mode property for the following PHYs
which have a single LED mode configuration value.

KSZ8001 and KSZ8041 which both use register 0x1e bits 15,14 and
KSZ8021, KSZ8031 and KSZ8051 which use register 0x1f bits 5,4
to control the LED configuration.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

20d8435a

isdn: fix multiple sleep_on races · 94fcf696

Arnd Bergmann authored Feb 26, 2014

The isdn core code uses a couple of wait queues with
interruptible_sleep_on, which is racy and about to get
removed from the kernel. Fortunately, we know for each case
what we are waiting for, so they can all be converted to
the better wait_event_interruptible interface.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

94fcf696

isdn: divert, hysdn: fix interruptible_sleep_on race · c11da83b

Arnd Bergmann authored Feb 26, 2014

These two drivers use identical code for their procfs status
file handling, which contains a small race against status
data becoming available while reading the file.

This uses wait_event_interruptible instead to fix this
particular race and eventually get rid of all sleep_on
instances. There seems to be another race involving
multiple concurrent readers of the same procfs file, which
I don't try to fix here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

c11da83b

isdn: hisax/elsa: fix sleep_on race in elsa FSM · c728cc88

Arnd Bergmann authored Feb 26, 2014

The state machine code in the elsa driver uses interruptible_sleep_on
to wait for state changes, which is racy. A closer look at the possible
states reveals that it is always used to wait for getting back into
ARCOFI_NOP, so we can use wait_event_interruptible instead.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

c728cc88

isdn: pcbit: fix interruptible_sleep_on race · e5b3fa15

Arnd Bergmann authored Feb 26, 2014

interruptible_sleep_on is racy and going away. In case of pcbit,
the driver would run into a timeout if the card is initialized
before we start waiting for it. This uses wait_event to fix the
race. In order to do this, the state machine handling for the
timeout case has to get trivially reorganized so we actually know
whether the timeout has occorred or not.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

e5b3fa15

atm: firestream: fix interruptible_sleep_on race · c73b1f6a

Arnd Bergmann authored Feb 26, 2014

interruptible_sleep_on is racy and going away. This replaces the one use
in the firestream driver with the appropriate wait_event_interruptible
variant.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

c73b1f6a

Merge branch 'intel-next' · 53e0b080

David S. Miller authored Feb 26, 2014

Aaron Brown says:

====================
Intel Wired LAN Driver Updates

This series contains updates to ixgbe, igb and documentation.  The
first four have been sent up as part of other series where 1 or more
in the series were rejected and either dropped or still being worked
on for reasons unrelated to these patches.

Don makes recovery from a HW ECC error just schedule a reset as it turns
out the previous behaviour of forcing the user to reload is not necessary.
Mark adds WoL support to port 0 of a new device.  Jacob removes a magic
number from the ptp_caps.name and updates the SubmittingPatches
documentation with details on the Fixed: tag.  And Carolyn updates igb
files to remove the FSF physical mail address.

[ DaveM Note: SubmittingPatches change omitted, will go via LKML ]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

53e0b080

igb: Update license text to remove FSF address and update copyright. · 74cfb2e1

Carolyn Wyborny authored Feb 25, 2014

This patch updates the license text to remove address of Free Software
Foundation and refer  users to www.gnu.org instead. This patch also updates
the copyright dates in appropriate igb driver files.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74cfb2e1

igb: make local functions static and remove dead code · 167f3f71

Jeff Kirsher authored Feb 25, 2014

Based on Stephen Hemminger's original patch.
Make local functions static, and remove unused functions.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

167f3f71

ixgbe: Add WoL support for a new device · 87557440

Mark Rustad authored Feb 25, 2014

Add WoL support for port 0 of a new 82599-based device.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87557440

ixgbe: don't use magic size number to assign ptp_caps.name · ca324099

Jacob Keller authored Feb 25, 2014

Rather than using a magic size number, just use sizeof since that will
work and is more robust to future changes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca324099

ixgbe: modify behavior on receiving a HW ECC error. · d773ce2d

Don Skidmore authored Feb 25, 2014

Currently when we noticed a HW ECC error we would request the use reload
the driver to force a reset of the part. This was done due to the mistaken
believe that a normal reset would not be sufficient. Well it turns out it
would be so now we just schedule a reset upon seeing the ECC.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d773ce2d

ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT · 0b95227a

Hannes Frederic Sowa authored Feb 26, 2014

This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.

Fixes: 93b36cf3 ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b95227a

ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT · 1b346576

Hannes Frederic Sowa authored Feb 26, 2014

IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.

This patch adds yet another new IP_MTU_DISCOVER option to not honor any
path mtu information and not accepting new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.

As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.

The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.

Fixes: 482fc609 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b346576

ipv4: use ip_skb_dst_mtu to determine mtu in ip_fragment · 69647ce4

Hannes Frederic Sowa authored Feb 26, 2014

ip_skb_dst_mtu mostly falls back to ip_dst_mtu_maybe_forward if no socket
is attached to the skb (in case of forwarding) or determines the mtu like
we do in ip_finish_output, which actually checks if we should branch to
ip_fragment. Thus use the same function to determine the mtu here, too.

This is important for the introduction of IP_PMTUDISC_OMIT, where we
want the packets getting cut in pieces of the size of the outgoing
interface mtu. IPv6 already does this correctly.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

69647ce4

neigh: probe application via netlink in NUD_PROBE · a960ff81

Timo Teräs authored Feb 26, 2014

iproute2 arpd seems to expect this as there's code and comments
to handle netlink probes with NUD_PROBE set. It is used to flush
the arpd cached mappings.

opennhrp instead turns off unicast probes (so it can handle all
neighbour discovery). Without this change it will not see NUD_PROBE
probes and cannot reconfirm the mapping. Thus currently neigh entry
will just fail and can cause few packets dropped until broadcast
discovery is restarted.

Earlier discussion on the subject:
http://marc.info/?t=139305877100001&r=1&w=2Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

a960ff81

ieee802154: fix new function declaration · 44a6bd86

Jean Sacren authored Feb 25, 2014

The commit 8fad346f ("eee802154: add basic support for RF212 to
at86rf230 driver") introduced the new function is_rf212() with some
minor issues in declaration:

1) Fix the function type by changing it to bool as the function
   definition returns a boolean value. Additionally both callers of
   is_rf212() are expected to return a boolean value.

2) Fix the function specifier by deleting the inline keyword as the
   compiler takes care of that.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

44a6bd86

ipv6: log src and dst along with "udp checksum is 0" · 84a3e72c

Bjørn Mork authored Feb 25, 2014

These info messages are rather pointless without any means to identify
the source of the bogus packets.  Logging the src and dst addresses and
ports may help a bit.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

84a3e72c

vxlan: remove unused port variable in vxlan_udp_encap_recv() · 86c3f0f8
Pablo Neira Ayuso authored Feb 25, 2014
```
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
86c3f0f8

Merge branch 'mlx4' · 941ee45c

David S. Miller authored Feb 26, 2014

Amir Vadai says:

====================
net, net/mlx4: Add sysfs file for port number

Modern distro's are using biosdevname to rename interface to a name based on
slot/port number.
biosdevname can't get the port number of devices that have multiple ports that
share the same PCI function.
This patch adds a sysfs file under: /sys/devices/.../net/<interface>/dev_port,
that contains the port number (0 based) - to be used by biosdevname.
Also, dev_id was wrongly used in mlx4_en driver - added a patch that fix it.

This patch was tested and applied over commit 51adfcc "net: bcmgenet: remove
unused bh_lock member"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

941ee45c

net/mlx4_en: Fix bad use of dev_id · ca9f9f70

Amir Vadai authored Feb 25, 2014

dev_id should be set for multiple netdev's sharing the same MAC, which
is not the case here.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca9f9f70

net/mlx4_en: Expose port number through sysfs · 76a066f2

Amir Vadai authored Feb 25, 2014

Initialize dev_port with port number (0 based) to be accessed through
sysfs from user space.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76a066f2

net: Add sysfs file for port number · 3f85944f

Amir Vadai authored Feb 25, 2014

Add a sysfs file to enable user space to query the device
port number used by a netdevice instance. This is needed for
devices that have multiple ports on the same PCI function.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f85944f

Merge branch 'bnx2x' · f2ea0cfd

David S. Miller authored Feb 26, 2014

Michal Schmidt says:

====================
bnx2x: minimize RAM usage in kdump

kdump kernels usually have only a small amount of memory reserved.
bnx2x can be memory-hungry. Let's minimize its memory usage when
running in kdump.

I detect kdump by looking at the "reset_devices" flag. A couple of
storage drivers (cciss, hpsa) use it for the same purpose. I am not sure
this is the best way to solve the problem, but it works.

Should it be made more generic by, say, looking at the total amount
of lowmem instead? Not using TPA by default when lowmem is small and/or
defaulting to fewer queues would help 32bit systems where a driver for
a multi-function multi-queue NIC can consume a significant amount
of available memory. Or do we want no such heuristics?

Is this something to consider doing for other network drivers too?
====================
Acked-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2ea0cfd

bnx2x: save RAM in kdump kernel by disabling TPA · 94d9de3c

Michal Schmidt authored Feb 25, 2014

When running in a kdump kernel, disable TPA. This saves memory, which
tends to be scarce in kdump.

TPA, being a receive acceleration, is unlikely to be useful for kdump,
whose purpose is to send the memory image out.

This saves additional 5 MB in the kdump environment.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

94d9de3c

bnx2x: save RAM in kdump kernel by using a single queue · ff2ad307

Michal Schmidt authored Feb 25, 2014

When running in a kdump kernel, make sure to use only a single ethernet
queue even if a num_queues option in /etc/modprobe.d/*.conf would specify
otherwise. This saves memory, which tends to be scarce in kdump.

This saves about 40 MB in the kdump environment on a setup with
num_queues=8 in the config file.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff2ad307

bnx2x: clamp num_queues to prevent passing a negative value · 7d0445d6

Michal Schmidt authored Feb 25, 2014

Use the clamp() macro to make the calculation of the number of queues
slightly easier to understand. It also avoids a crash when someone
accidentally passes a negative value in num_queues= module parameter.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d0445d6

net: tcp: add mib counters to track zero window transitions · 8e165e20

Florian Westphal authored Feb 25, 2014

Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window,
  but can't because we would shrink current window.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e165e20

net: order MPLS ethertypes numerically · 2ebe21fd

Neil Jerram authored Feb 25, 2014

All ethertypes other than ETH_P_MPLS_UC, ETH_P_MPLS_MC and
ETH_P_ATMMPOA were already ordered numerically.  This commit moves
those three ETH_P_... values into correct numerical order too.
Signed-off-by: Neil Jerram <Neil.Jerram@metaswitch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ebe21fd

25 Feb, 2014 8 commits

bnx2x: Remove hidden flow control goto from BNX2X_ALLOC macros · cd2b0389

Joe Perches authored Feb 20, 2014

BNX2X_ALLOC macros use "goto alloc_mem_err"
so these labels appear unused in some functions.

Expand these macros in-place via coccinelle and
some typing.

Update the macros to use statement expressions
and remove the BNX2X_ALLOC macro.

This adds some > 80 char lines.

$ cat bnx2x_pci_alloc.cocci
@@
expression e1;
expression e2;
expression e3;
@@
-	BNX2X_PCI_ALLOC(e1, e2, e3);
+	e1 = BNX2X_PCI_ALLOC(e2, e3); if (!e1) goto alloc_mem_err;

@@
expression e1;
expression e2;
expression e3;
@@
-	BNX2X_PCI_FALLOC(e1, e2, e3);
+	e1 = BNX2X_PCI_FALLOC(e2, e3); if (!e1) goto alloc_mem_err;

@@
expression e1;
expression e2;
@@
-	BNX2X_ALLOC(e1, e2);
+	e1 = kzalloc(e2, GFP_KERNEL); if (!e1) goto alloc_mem_err;

@@
expression e1;
expression e2;
expression e3;
@@
-	kzalloc(sizeof(e1) * e2, e3)
+	kcalloc(e2, sizeof(e1), e3)

@@
expression e1;
expression e2;
expression e3;
@@
-	kzalloc(e1 * sizeof(e2), e3)
+	kcalloc(e1, sizeof(e2), e3)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd2b0389

net: bcmgenet: remove unused bh_lock member · 51adfcc3

Florian Fainelli authored Feb 24, 2014

bh_lock spinlock is unused, remove it from the private driver structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51adfcc3

net: bcmgenet: remove commented code in bcmgenet_xmit() · da56bbf7

Florian Fainelli authored Feb 24, 2014

This code is commented since it is unused, left-over from the very first
time this driver was merged.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da56bbf7

net: bcmgenet: drop checks on priv->phydev · 80d8e96d

Florian Fainelli authored Feb 24, 2014

Drop all the checks on priv->phydev since we will refuse probing the
driver if we cannot attach to a PHY device. Drop all checks on
priv->phydev. This also fixes some smatch issues reported by Dan
Carpenter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80d8e96d

Merge branch 'gianfar' · 432c5b3a

David S. Miller authored Feb 24, 2014

Claudiu Manoil says:

====================
gianfar: Device reset and reconfig fixes

These patches end up fixing some notable device reset & reconfig
related problems.  One issue is on-the-fly (Rx/Tx on) programming
of interrupt coalescing (IC) registers on the processing path,
against HW recommendation.  This is an old issue that became visible
after BQL introduction, as under certain conditions (low traffic)
one TX interrupt gets lost and BQL fires Tx timeout as a result.
Another notable issue is a race on the Tx path (xmit, clean_tx)
during device reset (i.e. during Tx timeout watchdog firing)
that leads to NULL access.
Fixing the problematic on-thy-fly register writes (i.e. the IC regs)
required the implementation of a MAC soft reset procedure.
The race leading to NULL access was addressed by fixing the
stop_gfar()/startup_gfar() pair (disable/enable napi a.s.o.)
and adding the device state DOWN to sync with the TX path.

v2: Refactored if() clauses from gfar_set_features(), PATCH 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

432c5b3a

gianfar: Fix Tx int miss, dont write IC on-the-fly · f19015ba

Claudiu Manoil authored Feb 24, 2014

Programming the interrupt coalescing (IC) registers while
the controller/DMA is on may incur the loss of one Tx
confirmation interrupt, under certain conditions.  This is
a subtle hw race because it does not occur during a burst
of Tx packets.  It has been observed on p2020 devices that,
if just one packet is being xmit'ed, the Tx confirmation
doesn't trigger and BQL evetually blocks the Tx queues,
followed by Tx timeout and an un-responsive device.
This issue was not apparent prior to introducing BQL
support, as a late Tx confirmation was not an issue back then
and the next burst of Tx frames would have triggered the
Tx confirmation/ Tx ring cleanup anyway.

Bottom line, the hw specifications state that the IC registers
should not be programmed while the Rx/Tx blocks (the DMA) are
enabled. Further more, these registers are currently re-written
with the same values on the processing path, over and over again.
To fix this, rewriting the IC registers has been removed from
the processing path (napi poll).  A complete MAC reset procedure
has been implemented for the ethtool -c option instead, to
reliably update these registers while the controller is stopped.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f19015ba

gianfar: Fix device reset races (oops) for Tx · 0851133b

Claudiu Manoil authored Feb 24, 2014

The device reset procedure, stop_gfar()/startup_gfar(), has
concurrency issues.
"Kernel access of bad area" oopses show up during Tx timeout
device reset or other reset cases (like changing MTU) that
happen while the interface still has traffic. The oopses
happen in start_xmit and clean_tx_ring when accessing tx_queue->
tx_skbuff which is NULL. The race comes from de-allocating the
tx_skbuff while transmission and napi processing are still
active. Though the Tx queues get temoprarily stopped when Tx
timeout occurs, they get re-enabled as a result of Tx congestion
handling inside the napi context (see clean_tx_ring()). Not
disabling the napi during reset is also a bug, because
clean_tx_ring() will try to access tx_skbuff while it is being
de-alloc'ed and re-alloc'ed.

To fix this, stop_gfar() needs to disable napi processing
after stopping the Tx queues. However, in order to prevent
clean_tx_ring() to re-enable the Tx queue before the napi
gets disabled, the device state DOWN has been introduced.
It prevents the Tx congestion management from re-enabling the
de-congested Tx queue while the device is brought down.
An additional locking state, RESETTING, has been introduced
to prevent simultaneous resets or to prevent configuring the
device while it is resetting.
The bogus 'rxlock's (for each Rx queue) have been removed since
their purpose is not justified, as they don't prevent nor are
suited to prevent device reset/reconfig races (such as this one).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0851133b

gianfar: Don't free/request irqs on device reset · 80ec396c

Claudiu Manoil authored Feb 24, 2014

Resetting the device (stop_gfar()/startup_gfar()) should
be fast and to the point, in order to timely recover
from an error condition (like Tx timeout) or during
device reconfig.  The irq free/ request routines are just
redundant here, and they should be part of the device
close/ open routines instead.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80ec396c