1. 03 Nov, 2020 8 commits
    • can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames · ed3320ce
      Oliver Hartkopp authored
      The can_get_echo_skb() function returns the number of received bytes to
      be used for netdev statistics. In the case of RTR frames we get a valid
      (potentially non-zero) data length value which has to be passed for further
      operations. But on the wire RTR frames have no payload length. Therefore
      the value to be used in the statistics has to be zero for RTR frames.
      Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20201020064443.80164-1-socketcan@hartkopp.net
      Fixes: cf5046b3 ("can: dev: let can_get_echo_skb() return dlc of CAN frame")
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
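The RTR rule in the commit above can be sketched outside the kernel. This is a minimal Python model under stated assumptions, not the kernel code: `CanFrame` and `stats_len()` are hypothetical names, and only the `CAN_RTR_FLAG` bit value follows the Linux CAN UAPI header.

```python
from dataclasses import dataclass

CAN_RTR_FLAG = 0x40000000  # remote transmission request bit in can_id


@dataclass
class CanFrame:
    """Hypothetical stand-in for struct can_frame."""
    can_id: int
    length: int  # data length value carried in the frame header


def stats_len(frame: CanFrame) -> int:
    """Return the byte count to add to the netdev statistics."""
    if frame.can_id & CAN_RTR_FLAG:
        # RTR frames carry a length value but no payload on the wire.
        return 0
    return frame.length
```

The length field itself is left untouched, so later processing still sees the requested length; only the value used for statistics is forced to zero.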
    • can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context · 2283f79b
      Vincent Mailhol authored
      If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but
      not always, the case), the 'WARN_ON(in_irq)' in
      net/core/skbuff.c#skb_release_head_state() might be triggered, under network
      congestion circumstances, together with the potential risk of a NULL pointer
      dereference.
      
      The root cause of this issue is the call to kfree_skb() instead of
      dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog().
      
      This patch prevents the skb from being freed within the call to netif_rx() by
      incrementing its reference count with skb_get(). The skb is finally freed by
      one of the in-irq-context safe functions: dev_consume_skb_any() or
      dev_kfree_skb_any(). The "any" version is used because some drivers might call
      can_get_echo_skb() in a normal context.
      
      This issue occurs because, in the core network stack, loopback skbs were
      initially not supposed to be received in hardware IRQ context.
      The CAN stack is an exception.
      
      This bug was previously reported back in 2017 in [1] but the proposed patch
      never got accepted.
      
      While [1] directly modifies net/core/dev.c, we try to propose here a
      smoother modification local to the CAN network stack (the underlying
      assumption is that only CAN devices are affected by this issue).
      
      [1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@cetitec.com

      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@wanadoo.fr
      Fixes: 39549eef ("can: CAN Network device driver and Netlink interface")
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
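The reference-count idea in the commit above can be illustrated with a small model. This is a sketch under stated assumptions, not kernel code: the `Skb` class and the `_put()` helper are hypothetical, and the function names only mirror the kernel functions the commit message mentions.

```python
class Skb:
    """Hypothetical stand-in for struct sk_buff with a reference count."""

    def __init__(self):
        self.users = 1
        self.freed_by = None  # name of the call that dropped the last reference


def skb_get(skb):
    """Take an extra reference."""
    skb.users += 1
    return skb


def _put(skb, who):
    """Drop one reference; record who released the last one."""
    skb.users -= 1
    if skb.users == 0:
        skb.freed_by = who


def netif_rx_congested(skb):
    """Model of netif_rx() dropping the skb because the backlog is full."""
    _put(skb, "kfree_skb")  # unsafe if this is the last reference in hard IRQ
    return "NET_RX_DROP"


def get_echo_skb(skb):
    """Hold an extra reference so the final free uses an IRQ-safe call."""
    skb_get(skb)
    if netif_rx_congested(skb) == "NET_RX_SUCCESS":
        _put(skb, "dev_consume_skb_any")
    else:
        _put(skb, "dev_kfree_skb_any")
```

With the extra reference, the drop inside the modelled `netif_rx()` only decrements the counter, and the last reference is always released by one of the "any" variants.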
    • can: rx-offload: don't call kfree_skb() from IRQ context · 2ddd6bfe
      Marc Kleine-Budde authored
      A CAN driver using the rx-offload infrastructure reads CAN frames
      (usually in IRQ context) from the hardware and places them into the rx-offload
      queue to be delivered to the networking stack via NAPI.
      
      In case the rx-offload queue is full, trying to add more skbs results in the
      skbs being dropped using kfree_skb(). If done from hard-IRQ context this
      results in the following warning:
      
      [  682.552693] ------------[ cut here ]------------
      [  682.557360] WARNING: CPU: 0 PID: 3057 at net/core/skbuff.c:650 skb_release_head_state+0x74/0x84
      [  682.566075] Modules linked in: can_raw can coda_vpu flexcan dw_hdmi_ahb_audio v4l2_jpeg imx_vdoa can_dev
      [  682.575597] CPU: 0 PID: 3057 Comm: cansend Tainted: G        W         5.7.0+ #18
      [  682.583098] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
      [  682.589657] [<c0112628>] (unwind_backtrace) from [<c010c1c4>] (show_stack+0x10/0x14)
      [  682.597423] [<c010c1c4>] (show_stack) from [<c06c481c>] (dump_stack+0xe0/0x114)
      [  682.604759] [<c06c481c>] (dump_stack) from [<c0128f10>] (__warn+0xc0/0x10c)
      [  682.611742] [<c0128f10>] (__warn) from [<c0129314>] (warn_slowpath_fmt+0x5c/0xc0)
      [  682.619248] [<c0129314>] (warn_slowpath_fmt) from [<c0b95dec>] (skb_release_head_state+0x74/0x84)
      [  682.628143] [<c0b95dec>] (skb_release_head_state) from [<c0b95e08>] (skb_release_all+0xc/0x24)
      [  682.636774] [<c0b95e08>] (skb_release_all) from [<c0b95eac>] (kfree_skb+0x74/0x1c8)
      [  682.644479] [<c0b95eac>] (kfree_skb) from [<bf001d1c>] (can_rx_offload_queue_sorted+0xe0/0xe8 [can_dev])
      [  682.654051] [<bf001d1c>] (can_rx_offload_queue_sorted [can_dev]) from [<bf001d6c>] (can_rx_offload_get_echo_skb+0x48/0x94 [can_dev])
      [  682.666007] [<bf001d6c>] (can_rx_offload_get_echo_skb [can_dev]) from [<bf01efe4>] (flexcan_irq+0x194/0x5dc [flexcan])
      [  682.676734] [<bf01efe4>] (flexcan_irq [flexcan]) from [<c019c1ec>] (__handle_irq_event_percpu+0x4c/0x3ec)
      [  682.686322] [<c019c1ec>] (__handle_irq_event_percpu) from [<c019c5b8>] (handle_irq_event_percpu+0x2c/0x88)
      [  682.695993] [<c019c5b8>] (handle_irq_event_percpu) from [<c019c64c>] (handle_irq_event+0x38/0x5c)
      [  682.704887] [<c019c64c>] (handle_irq_event) from [<c01a1058>] (handle_fasteoi_irq+0xc8/0x180)
      [  682.713432] [<c01a1058>] (handle_fasteoi_irq) from [<c019b2c0>] (generic_handle_irq+0x30/0x44)
      [  682.722063] [<c019b2c0>] (generic_handle_irq) from [<c019b8f8>] (__handle_domain_irq+0x64/0xdc)
      [  682.730783] [<c019b8f8>] (__handle_domain_irq) from [<c06df4a4>] (gic_handle_irq+0x48/0x9c)
      [  682.739158] [<c06df4a4>] (gic_handle_irq) from [<c0100b30>] (__irq_svc+0x70/0x98)
      [  682.746656] Exception stack(0xe80e9dd8 to 0xe80e9e20)
      [  682.751725] 9dc0:                                                       00000001 e80e8000
      [  682.759922] 9de0: e820cf80 00000000 ffffe000 00000000 eaf08fe4 00000000 600d0013 00000000
      [  682.768117] 9e00: c1732e3c c16093a8 e820d4c0 e80e9e28 c018a57c c018b870 600d0013 ffffffff
      [  682.776315] [<c0100b30>] (__irq_svc) from [<c018b870>] (lock_acquire+0x108/0x4e8)
      [  682.783821] [<c018b870>] (lock_acquire) from [<c0e938e4>] (down_write+0x48/0xa8)
      [  682.791242] [<c0e938e4>] (down_write) from [<c02818dc>] (unlink_file_vma+0x24/0x40)
      [  682.798922] [<c02818dc>] (unlink_file_vma) from [<c027a258>] (free_pgtables+0x34/0xb8)
      [  682.806858] [<c027a258>] (free_pgtables) from [<c02835a4>] (exit_mmap+0xe4/0x170)
      [  682.814361] [<c02835a4>] (exit_mmap) from [<c01248e0>] (mmput+0x5c/0x110)
      [  682.821171] [<c01248e0>] (mmput) from [<c012e910>] (do_exit+0x374/0xbe4)
      [  682.827892] [<c012e910>] (do_exit) from [<c0130888>] (do_group_exit+0x38/0xb4)
      [  682.835132] [<c0130888>] (do_group_exit) from [<c0130914>] (__wake_up_parent+0x0/0x14)
      [  682.843063] irq event stamp: 1936
      [  682.846399] hardirqs last  enabled at (1935): [<c02938b0>] rmqueue+0xf4/0xc64
      [  682.853553] hardirqs last disabled at (1936): [<c0100b20>] __irq_svc+0x60/0x98
      [  682.860799] softirqs last  enabled at (1878): [<bf04cdcc>] raw_release+0x108/0x1f0 [can_raw]
      [  682.869256] softirqs last disabled at (1876): [<c0b8f478>] release_sock+0x18/0x98
      [  682.876753] ---[ end trace 7bca4751ce44c444 ]---
      
      This patch fixes the problem by replacing kfree_skb() with
      dev_kfree_skb_any(), as rx-offload might be called from threaded IRQ handlers
      as well.
      
      Fixes: ca913f1a ("can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak")
      Fixes: 6caf8a6d ("can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak")
      Link: http://lore.kernel.org/r/20201019190524.1285319-3-mkl@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
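The drop path in the commit above can be modelled as a bounded queue whose overflow handling depends on the calling context. This is a rough sketch under stated assumptions: the `RxOffload` class, its fields, and the `in_hard_irq` parameter are hypothetical; it only mimics the idea that an "any" free defers the release when called from hard-IRQ context.

```python
from collections import deque


class RxOffload:
    """Hypothetical model of a bounded rx-offload queue."""

    def __init__(self, max_len):
        self.queue = deque()
        self.max_len = max_len
        self.deferred = []  # frees postponed to a safe context
        self.freed = []     # frees performed immediately

    def free_any(self, skb, in_hard_irq):
        """Model of an any-context free: defer when in hard IRQ."""
        if in_hard_irq:
            self.deferred.append(skb)
        else:
            self.freed.append(skb)

    def queue_tail(self, skb, in_hard_irq):
        """Queue an skb; drop it safely if the queue is full."""
        if len(self.queue) >= self.max_len:
            self.free_any(skb, in_hard_irq)
            return False
        self.queue.append(skb)
        return True
```

The same call works from a hard IRQ handler and from a threaded one, which is the reason the commit message gives for choosing the "any" variant.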
    • can: proc: can_remove_proc(): silence remove_proc_entry warning · 3accbfdc
      Zhang Changzhong authored
      If can_init_proc() fails to create the /proc/net/can directory, can_remove_proc()
      will trigger a warning:
      
      WARNING: CPU: 6 PID: 7133 at fs/proc/generic.c:672 remove_proc_entry+0x17b0
      Kernel panic - not syncing: panic_on_warn set ...
      
      Fix this by returning early from can_remove_proc() if the CAN proc_dir does
      not exist.

      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Link: https://lore.kernel.org/r/1594709090-3203-1-git-send-email-zhangchangzhong@huawei.com
      Fixes: 8e8cda6d ("can: initial support for network namespaces")
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
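The early-return guard in the commit above can be sketched as follows. This is an illustrative model only: `CanNetNs`, the `remove_entry` callback, and the entry names used in the example are assumptions, not the kernel's data structures.

```python
class CanNetNs:
    """Hypothetical stand-in for the per-namespace CAN proc state."""

    def __init__(self, proc_dir=None, entries=()):
        self.proc_dir = proc_dir      # None models a failed directory creation
        self.entries = list(entries)


def can_remove_proc(net, remove_entry):
    """Remove all proc entries and return how many files were removed."""
    if net.proc_dir is None:
        # Nothing was created, so there is nothing to remove.
        return 0
    removed = 0
    for name in net.entries:
        remove_entry(name, net.proc_dir)
        removed += 1
    net.entries.clear()
    remove_entry(net.proc_dir, "/proc/net")
    net.proc_dir = None
    return removed
```

When the directory was never created, the function returns before any removal call is made, so the warning path is never reached.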
    • dt-bindings: can: flexcan: convert fsl,*flexcan bindings to yaml · e5ab9aa7
      Oleksij Rempel authored
      In order to automate the verification of DT nodes, convert
      fsl-flexcan.txt to fsl,flexcan.yaml.

      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20201022075218.11880-3-o.rempel@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • dt-bindings: can: add can-controller.yaml · 1f923440
      Oleksij Rempel authored
      For now, the node name is the only rule common to all CAN controllers.

      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20201022075218.11880-2-o.rempel@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • sfp: Fix error handing in sfp_probe() · 96216181
      YueHaibing authored
      gpiod_to_irq() never returns 0, but returns a negative value in
      case of error; check for that and set gpio_irq to 0.
      
      Fixes: 73970055 ("sfp: add SFP module support")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20201031031053.25264-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
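The error handling in the commit above boils down to normalising a negative result. A minimal sketch, assuming that a stored value of 0 means "no interrupt available"; the function name `resolve_gpio_irq()` is hypothetical.

```python
def resolve_gpio_irq(irq_result: int) -> int:
    """Map a gpiod_to_irq()-style result to the value stored in gpio_irq.

    A negative result is an error code; it is stored as 0, which this
    sketch assumes means that no interrupt is available.
    """
    if irq_result < 0:
        return 0
    return irq_result
```

For example, an error code such as -6 becomes 0, while a valid interrupt number such as 42 is stored unchanged.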
    • powerpc/vnic: Extend "failover pending" window · 1d850493
      Sukadev Bhattiprolu authored
      Commit 5a18e1e0 introduced the 'failover_pending' state to track
      the "failover pending window" - where we wait for the partner to become
      ready (after a transport event) before actually attempting to failover.
      i.e. the window lies between the following two events:
      
              a. we get a transport event due to a FAILOVER
      
              b. later, we get CRQ_INITIALIZED indicating the partner is
                 ready, at which point we schedule a FAILOVER reset.
      
      and ->failover_pending is true during this window.
      
      If during this window, we attempt to open (or close) a device, we pretend
      that the operation succeeded and let the FAILOVER reset path complete the
      operation.
      
      This is fine, except if the transport event ("a" above) occurs during the
      open and after open has already checked whether a failover is pending. If
      that happens, we fail the open, which can cause the boot scripts to leave
      the interface down, requiring the administrator to manually bring up the device.
      
      This fix "extends" the failover pending window till we are _actually_
      ready to perform the failover reset (i.e. until after we get the RTNL
      lock). Since open() holds the RTNL lock, we can be sure that we either
      finish the open or if the open() fails due to the failover pending window,
      we can again pretend that open is done and let the failover complete it.
      
      We could try and block the open until failover is completed but a) that
      could still time out the application and b) existing code "pretends" that
      failover occurred "just after" open succeeded, so marks the open successful
      and lets the failover complete the open. So, mark the open successful even
      if the transport event occurs before we actually start the open.
      
      Fixes: 5a18e1e0 ("ibmvnic: Fix failover case for non-redundant configuration")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Acked-by: Dany Madden <drt@linux.ibm.com>
      Link: https://lore.kernel.org/r/20201030170711.1562994-1-sukadev@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
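The open-versus-failover race described above can be sketched as a small state model. This is an assumption-laden illustration, not the driver code: the `Adapter` class, the string states, and the `transport_event` and `hw_error` parameters are hypothetical ways to inject the two kinds of failure.

```python
class Adapter:
    """Hypothetical stand-in for the adapter state."""

    def __init__(self):
        self.failover_pending = False
        self.state = "CLOSED"


def try_open(adapter, transport_event=False, hw_error=False):
    """Model of open(): returns 0 on (real or pretended) success."""
    if adapter.failover_pending:
        # Failover already pending: pretend success, reset finishes the open.
        adapter.state = "OPEN"
        return 0

    rc = 0
    if transport_event:
        # The transport event arrives after the check above.
        adapter.failover_pending = True
        rc = -1
    elif hw_error:
        rc = -1

    if rc and adapter.failover_pending:
        # Open failed only because of the pending failover: pretend success.
        adapter.state = "OPEN"
        rc = 0
    elif rc == 0:
        adapter.state = "OPEN"
    return rc
```

A failure unrelated to the failover is still reported to the caller; only a failure that coincides with a pending failover is converted into success.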
  2. 02 Nov, 2020 7 commits
  3. 01 Nov, 2020 3 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 859191b2
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Incorrect netlink report logic in flowtable and genID.
      
      2) Add a selftest to check that wireguard passes the right sk
         to ip_route_me_harder, from Jason A. Donenfeld.
      
      3) Pass the actual sk to ip_route_me_harder(), also from Jason.
      
      4) Missing expression validation of updates via nft --check.
      
      5) Update byte and packet counters regardless of whether they
         match, from Stefano Brivio.
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags · 20149e9e
      wenxu authored
      Tunnel devices such as vxlan, bareudp and geneve in lwt mode set
      the outer DF only based on TUNNEL_DONT_FRAGMENT.
      This was also the behavior of the gre device before it switched to
      ip_md_tunnel_xmit in commit 962924fa ("ip_gre: Refactor collect
      metatdata mode tunnel xmit to ip_md_tunnel_xmit").

      When ip_gre in lwt mode started to transmit with ip_md_tunnel_xmit, the rule
      changed, creating a discrepancy in how different tunnels handle DF. So
      ip_md_tunnel_xmit should follow the same rule as the other tunnels.
      
      Fixes: cfc7381b ("ip_tunnel: add collect_md mode to IPIP tunnel")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
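The rule the commit above restores can be written as a one-line decision. This is a sketch, not the kernel implementation: `outer_df()` is a hypothetical helper, and the flag values are given in host byte order for illustration.

```python
IP_DF = 0x4000                 # "don't fragment" bit of the IPv4 frag_off field
TUNNEL_DONT_FRAGMENT = 0x0100  # tunnel flag, host byte order in this sketch


def outer_df(tun_flags: int) -> int:
    """Return the DF bits for the outer header.

    Only the tunnel flag decides; the DF bit of the inner packet is
    deliberately not consulted in this model.
    """
    return IP_DF if tun_flags & TUNNEL_DONT_FRAGMENT else 0
```

Without the flag, the outer header carries no DF bit, so an over-MTU packet can be fragmented instead of failing to send.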
    • cadence: force nonlinear buffers to be cloned · 403dc167
      Mark Deneen authored
      In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
      a second device with usb ethernet (r8152) sending ICMP packets.  If the packet
      was larger than about 220 bytes, the SAMA5 device would "oops" with the
      following trace:
      
      kernel BUG at net/core/skbuff.c:1863!
      Internal error: Oops - BUG: 0 [#1] ARM
      Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii option usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel ohci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
      CPU: 0 PID: 0 Comm: swapper Tainted: G           O      5.9.1-prox2+ #1
      Hardware name: Atmel SAMA5
      PC is at skb_put+0x3c/0x50
      LR is at macb_start_xmit+0x134/0xad0 [macb]
      pc : [<c05258cc>]    lr : [<bf0ea5b8>]    psr: 20070113
      sp : c0d01a60  ip : c07232c0  fp : c4250000
      r10: c0d03cc8  r9 : 00000000  r8 : c0d038c0
      r7 : 00000000  r6 : 00000008  r5 : c59b66c0  r4 : 0000002a
      r3 : 8f659eff  r2 : c59e9eea  r1 : 00000001  r0 : c59b66c0
      Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 10c53c7d  Table: 2640c059  DAC: 00000051
      Process swapper (pid: 0, stack limit = 0x75002d81)
      
      <snipped stack>
      
      [<c05258cc>] (skb_put) from [<bf0ea5b8>] (macb_start_xmit+0x134/0xad0 [macb])
      [<bf0ea5b8>] (macb_start_xmit [macb]) from [<c053e504>] (dev_hard_start_xmit+0x90/0x11c)
      [<c053e504>] (dev_hard_start_xmit) from [<c0571180>] (sch_direct_xmit+0x124/0x260)
      [<c0571180>] (sch_direct_xmit) from [<c053eae4>] (__dev_queue_xmit+0x4b0/0x6d0)
      [<c053eae4>] (__dev_queue_xmit) from [<c05a5650>] (ip_finish_output2+0x350/0x580)
      [<c05a5650>] (ip_finish_output2) from [<c05a7e24>] (ip_output+0xb4/0x13c)
      [<c05a7e24>] (ip_output) from [<c05a39d0>] (ip_forward+0x474/0x500)
      [<c05a39d0>] (ip_forward) from [<c05a13d8>] (ip_sublist_rcv_finish+0x3c/0x50)
      [<c05a13d8>] (ip_sublist_rcv_finish) from [<c05a19b8>] (ip_sublist_rcv+0x11c/0x188)
      [<c05a19b8>] (ip_sublist_rcv) from [<c05a2494>] (ip_list_rcv+0xf8/0x124)
      [<c05a2494>] (ip_list_rcv) from [<c05403c4>] (__netif_receive_skb_list_core+0x1a0/0x20c)
      [<c05403c4>] (__netif_receive_skb_list_core) from [<c05405c4>] (netif_receive_skb_list_internal+0x194/0x230)
      [<c05405c4>] (netif_receive_skb_list_internal) from [<c0540684>] (gro_normal_list.part.0+0x14/0x28)
      [<c0540684>] (gro_normal_list.part.0) from [<c0541280>] (napi_complete_done+0x16c/0x210)
      [<c0541280>] (napi_complete_done) from [<bf14c1c0>] (r8152_poll+0x684/0x708 [r8152])
      [<bf14c1c0>] (r8152_poll [r8152]) from [<c0541424>] (net_rx_action+0x100/0x328)
      [<c0541424>] (net_rx_action) from [<c01012ec>] (__do_softirq+0xec/0x274)
      [<c01012ec>] (__do_softirq) from [<c012d6d4>] (irq_exit+0xcc/0xd0)
      [<c012d6d4>] (irq_exit) from [<c0160960>] (__handle_domain_irq+0x58/0xa4)
      [<c0160960>] (__handle_domain_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
      Exception stack(0xc0d01ef0 to 0xc0d01f38)
      1ee0:                                     00000000 0000003d 0c31f383 c0d0fa00
      1f00: c0d2eb80 00000000 c0d2e630 4dad8c49 4da967b0 0000003d 0000003d 00000000
      1f20: fffffff5 c0d01f40 c04e0f88 c04e0f8c 30070013 ffffffff
      [<c0100b0c>] (__irq_svc) from [<c04e0f8c>] (cpuidle_enter_state+0x7c/0x378)
      [<c04e0f8c>] (cpuidle_enter_state) from [<c04e12c4>] (cpuidle_enter+0x28/0x38)
      [<c04e12c4>] (cpuidle_enter) from [<c014f710>] (do_idle+0x194/0x214)
      [<c014f710>] (do_idle) from [<c014fa50>] (cpu_startup_entry+0xc/0x14)
      [<c014fa50>] (cpu_startup_entry) from [<c0a00dc8>] (start_kernel+0x46c/0x4a0)
      Code: e580c054 8a000002 e1a00002 e8bd8070 (e7f001f2)
      ---[ end trace 146c8a334115490c ]---
      
      The solution was to force nonlinear buffers to be cloned.  This was previously
      reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
      but never formally submitted as a patch.
      
      This is the third revision, hopefully the formatting is correct this time!

      Suggested-by: Klaus Doth <krnl@doth.eu>
      Fixes: 653e92a9 ("net: macb: add support for padding and fcs computation")
      Signed-off-by: Mark Deneen <mdeneen@saucontech.com>
      Link: https://lore.kernel.org/r/20201030155814.622831-1-mdeneen@saucontech.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
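The decision behind "force nonlinear buffers to be cloned" can be sketched as a predicate. This is a hedged model, not the driver code: the `Skb` fields and `must_copy_before_append()` are hypothetical, and the assumption that shared (cloned) buffers were already copied before the fix comes from this sketch, not from the commit message. The trace above ends in skb_put(), which requires a linear buffer.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    """Hypothetical stand-in for the sk_buff properties that matter here."""
    cloned: bool = False         # data shared with another skb
    header_cloned: bool = False  # header shared with another skb
    data_len: int = 0            # bytes held in paged fragments


def is_nonlinear(skb: Skb) -> bool:
    """A buffer is nonlinear when part of its data lives in fragments."""
    return skb.data_len > 0


def must_copy_before_append(skb: Skb) -> bool:
    """Decide whether a private linear copy is needed before appending bytes."""
    return skb.cloned or skb.header_cloned or is_nonlinear(skb)
```

A forwarded packet that arrives with paged data now takes the copy path, so the bytes are appended to a linear buffer.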
  4. 31 Oct, 2020 5 commits
    • Merge branch 'ipv6-reply-icmp-error-if-fragment-doesn-t-contain-all-headers' · 72a41f95
      Jakub Kicinski authored
      Hangbin Liu says:
      
      ====================
      IPv6: reply ICMP error if fragment doesn't contain all headers
      
      When our engineer ran the latest IPv6 Core Conformance test, test v6LC.1.3.6:
      First Fragment Doesn’t Contain All Headers[1] failed. The purpose of the test
      is to verify that the node (Linux, for example) properly processes IPv6 packets
      that don’t include all the headers through the Upper-Layer header.
      
      Based on RFC 8200, Section 4.5 Fragment Header
      
        -  If the first fragment does not include all headers through an
           Upper-Layer header, then that fragment should be discarded and
           an ICMP Parameter Problem, Code 3, message should be sent to
           the source of the fragment, with the Pointer field set to zero.
      
      The first patch adds a definition for ICMPv6 Parameter Problem, code 3.
      The second patch adds a check for the first fragment to make sure the
      Upper-Layer header exists.
      
      [1] Page 68, v6LC.1.3.6: First Fragment Doesn’t Contain All Headers part A, B,
      C and D at https://ipv6ready.org/docs/Core_Conformance_5_0_0.pdf
      [2] My reproducer:
      
      import sys
      from scapy.all import *

      def send_frag_dst_opt(src_ip6, dst_ip6):
          # nh = 44: the IPv6 header is followed by a Fragment header
          ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)

          # First fragment: only a Destination Options header (60); the
          # ICMPv6 upper-layer header (58) it announces is missing
          frag_1 = IPv6ExtHdrFragment(nh = 60, m = 1)
          dst_opt = IPv6ExtHdrDestOpt(nh = 58)

          # Second fragment: carries the ICMPv6 echo request
          frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
          icmp_echo = ICMPv6EchoRequest(seq = 1)

          pkt_1 = ip6/frag_1/dst_opt
          pkt_2 = ip6/frag_2/icmp_echo

          send(pkt_1)
          send(pkt_2)

      def send_frag_route_opt(src_ip6, dst_ip6):
          ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)

          # Same test with a Routing header (43) in the first fragment
          frag_1 = IPv6ExtHdrFragment(nh = 43, m = 1)
          route_opt = IPv6ExtHdrRouting(nh = 58)

          frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
          icmp_echo = ICMPv6EchoRequest(seq = 2)

          pkt_1 = ip6/frag_1/route_opt
          pkt_2 = ip6/frag_2/icmp_echo

          send(pkt_1)
          send(pkt_2)

      if __name__ == '__main__':
          # Usage: <script> <source IPv6> <destination IPv6> <interface>
          src = sys.argv[1]
          dst = sys.argv[2]
          conf.iface = sys.argv[3]
          send_frag_dst_opt(src, dst)
          send_frag_route_opt(src, dst)
      ====================
      
      Link: https://lore.kernel.org/r/20201027123313.3717941-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • IPv6: reply ICMP error if the first fragment don't include all headers · 2efdaaaf
      Hangbin Liu authored
      Based on RFC 8200, Section 4.5 Fragment Header:
      
        -  If the first fragment does not include all headers through an
           Upper-Layer header, then that fragment should be discarded and
           an ICMP Parameter Problem, Code 3, message should be sent to
           the source of the fragment, with the Pointer field set to zero.
      
      Checking each packet header in the IPv6 fast path would have a performance
      impact, so I put the check in ipv6_frag_rcv().
      
      As the packet may be any kind of L4 protocol, I only checked some common
      protocols' header length and handle others by (offset + 1) > skb->len.
      Also use !(frag_off & htons(IP6_OFFSET)) to catch atomic fragments
      (fragmented packet with only one fragment).
      
      When sending the ICMP error message, if the first truncated fragment is an
      ICMP message, icmp6_send() will bail out because is_ineligible() returns true.
      So I added a check in is_ineligible() to let a fragment with nexthdr ICMP but
      no ICMP header return false.

      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
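The check described in the commit above can be sketched in a few lines. This is a simplified model, not the kernel function: the helper names are hypothetical, `frag_off` is taken in host byte order, and the minimum header sizes are the standard fixed header lengths for TCP, UDP and ICMPv6.

```python
NEXTHDR_TCP, NEXTHDR_UDP, NEXTHDR_ICMPV6 = 6, 17, 58
IP6_OFFSET = 0xFFF8  # fragment-offset bits of the Fragment header field

# Reply to send when the check fails (type 4, code 3, pointer 0)
ICMPV6_PARAMPROB_TYPE = 4
ICMPV6_CODE_INCOMPLETE_CHAIN = 3

# Minimum upper-layer header sizes in bytes for the common protocols
MIN_HDR_LEN = {NEXTHDR_TCP: 20, NEXTHDR_UDP: 8, NEXTHDR_ICMPV6: 8}


def is_first_fragment(frag_off: int) -> bool:
    """Offset zero marks the first fragment (this includes atomic fragments)."""
    return (frag_off & IP6_OFFSET) == 0


def first_frag_incomplete(frag_off, nexthdr, upper_offset, pkt_len) -> bool:
    """True if a first fragment ends before its upper-layer header does."""
    if not is_first_fragment(frag_off):
        return False
    # Unknown protocols only need one byte beyond the header chain.
    need = MIN_HDR_LEN.get(nexthdr, 1)
    return upper_offset + need > pkt_len
```

Here `upper_offset` is the byte offset at which the upper-layer header would start; for the reproducer's first packet that offset equals the packet length, so the check fires.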
    • ICMPv6: Add ICMPv6 Parameter Problem, code 3 definition · b59e286b
      Hangbin Liu authored
      Based on RFC7112, Section 6:
      
         IANA has added the following "Type 4 - Parameter Problem" message to
         the "Internet Control Message Protocol version 6 (ICMPv6) Parameters"
         registry:
      
            CODE     NAME/DESCRIPTION
             3       IPv6 First Fragment has incomplete IPv6 Header Chain

      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: atm: fix update of position index in lec_seq_next · 2f71e006
      Colin Ian King authored
      The position index in lec_seq_next is not updated when the next
      entry is fetched and no more entries are available. This causes
      seq_file to report the following error:
      
      "seq_file: buggy .next function lec_seq_next [lec] did not update
       position index"
      
      Fix this by always updating the position index.
      
      [ Note: this is an ancient 2002 bug, the sha is from the
        tglx/history repo ]
      
      Fixes: 4aea2cbf ("[ATM]: Move lan seq_file ops to lec.c [1/3]")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Link: https://lore.kernel.org/r/20201027114925.21843-1-colin.king@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
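The iterator contract behind the fix above can be shown with a tiny model: the position index advances on every call, including the one that finds no further entry. `seq_next()` here is a hypothetical Python stand-in, not the kernel callback signature.

```python
def seq_next(entries, pos):
    """Return (next_entry_or_None, new_pos); the position always advances."""
    pos += 1  # update the index even when the end has been reached
    entry = entries[pos] if pos < len(entries) else None
    return entry, pos
```

Calling it at the last valid position returns no entry but still moves the index past the end, which is the condition the seq_file warning checks for.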
    • netfilter: ipset: Update byte and packet counters regardless of whether they match · 7d10e62c
      Stefano Brivio authored
      In ip_set_match_extensions(), for sets with counters, we take care of
      updating counters themselves by calling ip_set_update_counter(), and of
      checking if the given comparison and values match, by calling
      ip_set_match_counter() if needed.
      
      However, if a given comparison on counters doesn't match the configured
      values, that doesn't mean the set entry itself isn't matching.
      
      This fix restores the behaviour we had before commit 4750005a
      ("netfilter: ipset: Fix "don't update counters" mode when counters used
      at the matching"), without reintroducing the issue fixed there: back
      then, mtype_data_match() first updated counters in any case, and then
      took care of matching on counters.
      
      Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
      ip_set_update_counter() will anyway skip counter updates if desired.
      
      The issue observed is illustrated by this reproducer:
      
        ipset create c hash:ip counters
        ipset add c 192.0.2.1
        iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP
      
      If we now send packets from 192.0.2.1, bytes and packets counters
      for the entry as shown by 'ipset list' are always zero, and, no
      matter how many bytes we send, the rule will never match, because
      counters themselves are not updated.

      Reported-by: Mithil Mhatre <mmhatre@redhat.com>
      Fixes: 4750005a ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
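The ordering the commit above restores (update counters first, then compare) can be sketched as follows. This is a simplified model under stated assumptions: `Counter`, `match_extensions()`, and the `skip_update` parameter are hypothetical, and only a "bytes greater than" comparison is modelled.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical per-entry byte and packet counters."""
    bytes: int = 0
    packets: int = 0


def match_extensions(counter, pkt_len, bytes_gt, skip_update=False):
    """Update the counters, then evaluate the counter comparison."""
    if not skip_update:
        # Counters are updated whether or not the comparison matches.
        counter.bytes += pkt_len
        counter.packets += 1
    return counter.bytes > bytes_gt
```

With an 800-byte threshold as in the reproducer, two 500-byte packets are enough for the second one to match, because the first packet was counted even though it did not match.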
  5. 30 Oct, 2020 17 commits