- 04 Dec, 2015 16 commits
-
-
David S. Miller authored
Bjørn Mork says: ==================== net: qmi_wwan: MDM9x30 support We add new device IDs all the time, often without any testing on actual hardware. This is usually OK as long as the device is similar to already supported devices, using the same chipset and firmware basis. But the Sierra Wireless MC7455 is an example of a new chipset generation. Adding it based on assumed similarity with its ancestors proved too optimistic. This series adds the missing bits and pieces necessary to support LTE Advanced modems based on the Qualcomm MDM9x30 chipset. A big thanks to Sierra Wireless for providing MC7455 samples for testing. The most important change is the "raw-ip" support. The series also adds a necessary control request, removes an unsupported device ID, and adds a driver specific entry in MAINTAINERS. A few random notes about "raw-ip": "I rather have these all running in raw IP mode. The 802.3 framing is utterly stupid." - Marcel Holtmann in Jan 2012 [1] Marcel was right. I should have listened to him. What more can I say? The 802.3 framing has provided a steady supply of firmware bugs for many years. We've added driver workarounds for many of these, but there are still known bugs where the workaround is so yucky that we have refused to apply it. But all that is over now. The latest generation Qualcomm chips no longer support 802.3 framing at all. I had two open questions regarding the "raw-ip" userspace API: 1) Should we continue faking an ethernet device, even if we don't use the L2 headers on the USB link anymore? There was a vote in favour of the "headerless" device. This is the honest representation of the hardware/firmware interface. 2) What input should the driver base its framing on? Snooping or directly manipulating QMI is considered out of the question. We delegated all QMI handling to userspace from the beginning. We have so far required userspace to configure the firmware for "802.3" framing, or fail if that proved impossible. 
This requirement is now changed. Userspace must now inform the driver if it negotiates "raw-ip" framing. Two alternative interfaces were proposed: - ethtool private driver flag, or - sysfs file The NetworkManager/ModemManager developers were in favour of the sysfs alternative. These questions (or any other you might have :) are of course still open. This patch set presents the solutions I currently prefer, considering the above. All comments are appreciated, even simple '+1' ones. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
QMI wwan devices have traditionally emulated ethernet devices by default. But they have always had the capability of operating without any L2 header at all, transmitting and receiving "raw" IP packets over the USB link. This firmware feature used to be configurable through the QMI management protocol. Traditionally there was no way to verify the firmware mode without attempting to change it. And the firmware would often disallow changes anyway, e.g. due to a session already being established. In some cases, this could be a hidden firmware internal session, completely outside host control. For these reasons, sticking with the "well known" default mode was safest. But newer generations of QMI hardware and firmware have moved towards defaulting to "raw IP" mode instead, followed by an increasing number of bugs in the already buggy "802.3" firmware implementation. At the same time, the QMI management protocol gained the ability to detect the current mode. This has enabled the userspace QMI management application to verify the current firmware mode without trying to modify it. Following this development, the latest QMI hardware and firmware (the MDM9x30 generation) has dropped support for "802.3" mode entirely. Support for "raw IP" framing in the driver is therefore necessary for these devices, and to a certain degree to work around problems with the previous generation. This patch adds support for "raw IP" framing for QMI devices, changing the netdev from an ethernet device to an ARPHRD_NONE p-t-p device when "raw IP" framing is enabled. The firmware setup is fully delegated to the QMI userspace management application, through simple tunneling of the QMI protocol. The driver will therefore not know which mode has been "negotiated" between firmware and userspace. Allowing userspace to inform the driver of the result through a sysfs switch is considered a better alternative than to change the well established clean delegation of firmware management to userspace. 
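Without an L2 header, the receive path has to work out the network protocol from the packet itself. A minimal sketch of that idea follows, assuming the protocol is derived from the IP version nibble in the first byte. The function name raw_ip_protocol is hypothetical and this is an illustration, not the driver's code.

```python
from typing import Optional

ETH_P_IP = 0x0800    # EtherType value for IPv4
ETH_P_IPV6 = 0x86DD  # EtherType value for IPv6


def raw_ip_protocol(packet: bytes) -> Optional[int]:
    """Return the EtherType implied by a headerless ("raw IP") packet.

    The upper four bits of the first byte of an IP packet carry the IP
    version. Returns None when the packet is empty or the version is
    neither 4 nor 6.
    """
    if not packet:
        return None
    version = packet[0] >> 4
    if version == 4:
        return ETH_P_IP
    if version == 6:
        return ETH_P_IPV6
    return None
```

Anything that is neither IPv4 nor IPv6 is reported as unknown here; what a real driver does with such packets is a separate policy decision.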
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
Assume the minidriver has taken care of all L2 header parsing if it sets skb->protocol. This allows the minidriver to support non-ethernet L2 headers, and even operate without any L2 header at all. Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
This turned out to be a bootloader device ID. No need for that in this driver. It will only provide a single serial function. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
MDM9x30 based modems appear to go into a deeper sleep when suspended without "Remote Wakeup" enabled. The QMI interface will not respond unless a "set DTR" control request is sent on resume. The effect is similar to a QMI_CTL SYNC request, resetting (some of) the firmware state. We allow userspace sessions to span multiple character device open/close sequences. This means that userspace can depend on firmware state while both the netdev and the character device are closed. We have disabled "needs_remote_wakeup" at this point to allow devices without remote wakeup support to be auto-suspended. To make sure the MDM9x30 keeps firmware state, we need to keep "needs_remote_wakeup" always set. We also need to issue a "set DTR" request to enable the QMI interface. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Salil Mehta says: ==================== net:hns: Add support of Hip06 SoC to the Hisilicon Network Subsystem This PATCH V7 addresses the TAB formatting comments by Sergei Shtylyov. Missing TABs at some other places have also been corrected. PATCH V6: This addresses the review comments provided by David Miller over the existing use of ENABLE/DISABLE hash defines with the code. These hash defines are doing a similar job as implicit type bool would do. So these are kind of duplicate and are redundant. PATCH V5: This PATCH addresses the review comments by Yuval Mintz <Yuval.Mintz@qlogic.com>. This rework of comments is basically related to: 1) styling of the code, 2) RSS default Key initialization related code 3) redundant code removal PATCH V4: This addresses the review comment provided by Sergei Shtylyov. The changelog of every patch has also been modified. PATCH V3: Addresses the review comment floated by David Miller PATCH V2: 1) Bug Fixes and Clean-up: Internally identified 2) Addresses internal review comments by Kenneth Lee and by Huang Daode 3) Addresses the review comment from "Yisen.Zhuang(Zhuangyuzeng)" 4) Adds fix from Fengguang Wu for an error generated from "kbuild test robot" from Intel 5) Ethtool support for TSO set option from Lisheng PATCH V1: Adds initial support of Hip06 SoC with below changes: This patch-set adds support of new Hisilicon Hip06 SoC to the existing (already part of net-next) HNS ethernet driver for Hip05 SoC. Hip06 is a multi-core SoC and is a derivative of Hip05 SoC with lots of new hardware features supported like RSS, TSO, hardware VLAN assist etc. The changes in the driver are mainly due to following: 1) changes in the DMA descriptor provided by the Hip06 ethernet hardware. These changes need to co-exist with already present Hip05 DMA descriptor and its operating functions. The decision to choose the correct type of DMA descriptor is taken dynamically depending upon the version of the hardware (i.e. 
V1/hip05 or V2/hip06, see already existing hisilicon-hns-nic.txt binding file for the detailed description version and naming). 2) To support new features added to the Hip06 ethernet hardware: a. RSS (Receive Side Scaling) b. TSO (TCP Segmentation Offload) c. Hardware VLAN support (currently we are initializing hardware to not assist in stripping the vlan tag at hardware level. Proper support of this feature and ethtool would come after these patches have been accepted) Kindly note that this patchset has been based on latest net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Salil authored
This patch adds the initialization code to disable the hardware vlan support for VLAN Tag stripping by default for now. Proper support of the "hardware VLAN assistance" feature will follow in upcoming patches. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Salil authored
This patch adds support for the ethtool TSO option to HNS, for the Hip06 SoC. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Salil authored
This patch adds the support of "TSO (TCP Segmentation Offload)" feature provided by the Hip06 ethernet hardware to the HNS ethernet driver. Enabling this feature would help offload the TCP Segmentation process to the Hip06 ethernet hardware. This eventually would help in saving precious cpu cycles. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
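To illustrate what is being offloaded, here is a rough sketch of TCP segmentation itself: one large payload is cut into MSS-sized segments, each carrying the sequence number of its first byte. The function name and return shape are hypothetical, and header replication and checksums are left out. It shows the general technique, not the Hip06 hardware or the HNS driver code.

```python
from typing import List, Tuple


def segment_payload(payload: bytes, mss: int,
                    initial_seq: int) -> List[Tuple[int, bytes]]:
    """Split a TCP payload into (sequence number, data) segments.

    Each segment carries at most `mss` bytes. Sequence numbers advance
    by the number of payload bytes and wrap at 2**32, as TCP sequence
    numbers do.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        seq = (initial_seq + offset) % 2**32
        segments.append((seq, payload[offset:offset + mss]))
    return segments
```

With TSO enabled the host hands the large buffer to the NIC once and the loop above runs in hardware, which is where the CPU saving comes from.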
-
Salil authored
This patch adds the support of "RSS (Receive Side Scaling)" feature provided by the Hip06 ethernet hardware to the HNS ethernet driver. This feature helps in distributing the different flows (mapped as hash by hardware using Toeplitz Hash) to different Queues associated with the processor cores. The mapping of flow-hash values to the different queues is stored in an indirection table (which is per Packet-Parse-Engine/PPE). This patch also provides the changes to re-program the (flow-hash<->Qid) mapping using the ethtool. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
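The flow-hash to queue mapping described above can be sketched in software. The example below is a generic Toeplitz hash plus an indirection-table lookup. The table size, the round-robin default and all function names are assumptions for illustration and do not describe the Hip06 hardware layout.

```python
from typing import List


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Compute the 32-bit Toeplitz hash of `data` under `key`.

    For every set bit i of the input (counted from the most significant
    bit of the first byte), XOR in the 32-bit window of the key that
    starts at key bit i.
    """
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than data")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def build_indirection_table(size: int, num_queues: int) -> List[int]:
    """Spread queues round-robin over the table (an assumed default)."""
    return [i % num_queues for i in range(size)]


def select_queue(flow_hash: int, table: List[int]) -> int:
    """Use the low bits of the hash as an index into the table."""
    return table[flow_hash % len(table)]
```

Re-programming the mapping, as the patch allows through ethtool, corresponds to rewriting entries of the table while the hash function and key stay the same.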
-
Salil authored
This patchset adds support of Hisilicon Hip06 SoC to the existing HNS ethernet driver. The changes in the driver are mainly due to changes in the DMA descriptor provided by the Hip06 ethernet hardware. These changes need to co-exist with already present Hip05 DMA descriptor and its operating functions. The decision to choose the correct type of DMA descriptor is taken dynamically depending upon the version of the hardware (i.e. V1/hip05 or V2/hip06, see already existing hisilicon-hns-nic.txt binding file for detailed description). Other changes are included in the SBM, DSAF and PPE modules as well. Changes affecting the driver related to the newly added ethernet hardware features in Hip06 would be added as separate patches over this and subsequent patches. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
yzhu1 authored
It is not necessary to use two brackets. As such, the redundant brackets are removed. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller authored
Conflicts: drivers/net/ethernet/renesas/ravb_main.c kernel/bpf/syscall.c net/ipv4/ipmr.c All three conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller: "A lot of Thanksgiving turkey leftovers accumulated, here goes: 1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg. 2) IDs for some new iwlwifi chips, from Oren Givon. 3) Fix rtlwifi lockups on boot, from Larry Finger. 4) Fix memory leak in fm10k, from Stephen Hemminger. 5) We have a route leak in the ipv6 tunnel infrastructure, fix from Paolo Abeni. 6) Fix buffer pointer handling in arm64 bpf JIT, from Zi Shen Lim. 7) Wrong lockdep annotations in tcp md5 support, fix from Eric Dumazet. 8) Work around some middle boxes which prevent proper handling of TCP Fast Open, from Yuchung Cheng. 9) TCP repair can do huge kmalloc() requests, build paged SKBs instead. From Eric Dumazet. 10) Fix msg_controllen overflow in scm_detach_fds, from Daniel Borkmann. 11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from Nikolay Aleksandrov. 12) Fix use after free in epoll with AF_UNIX sockets, from Rainer Weikusat. 13) Fix double free in VRF code, from Nikolay Aleksandrov. 14) Fix skb leaks on socket receive queue in tipc, from Ying Xue. 15) Fix ifup/ifdown crash in xgene driver, from Iyappan Subramanian. 16) Fix clearing of persistent array maps in bpf, from Daniel Borkmann. 17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq early enough. From Eric Dumazet. 18) Fix out of bounds accesses in bpf array implementation when updating elements, from Daniel Borkmann. 19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric Dumazet. 20) When dumping proxy neigh entries, we have to accommodate NULL device pointers properly, from Konstantin Khlebnikov. 21) SCTP doesn't release all ipv6 socket resources properly, fix from Eric Dumazet. 22) Prevent underflows of sch->q.qlen for multiqueue packet schedulers, also from Eric Dumazet. 23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey Huang and Michael Chan. 
24) Don't actively scan radar channels, from Antonio Quartulli" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits) net: phy: reset only targeted phy bnxt_en: Setup uc_list mac filters after resetting the chip. bnxt_en: enforce proper storing of MAC address bnxt_en: Fixed incorrect implementation of ndo_set_mac_address net: lpc_eth: remove irq > NR_IRQS check from probe() net_sched: fix qdisc_tree_decrease_qlen() races openvswitch: fix hangup on vxlan/gre/geneve device deletion ipv4: igmp: Allow removing groups from a removed interface ipv6: sctp: implement sctp_v6_destroy_sock() arm64: bpf: add 'store immediate' instruction ipv6: kill sk_dst_lock ipv6: sctp: add rcu protection around np->opt net/neighbour: fix crash at dumping device-agnostic proxy entries sctp: use GFP_USER for user-controlled kmalloc sctp: convert sack_needed and sack_generation to bits ipv6: add complete rcu protection around np->opt bpf: fix allocation warnings in bpf maps and integer overflow mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0 net: mvneta: enable setting custom TX IP checksum limit net: mvneta: fix error path for building skb ...
-
- 03 Dec, 2015 24 commits
-
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull block fixes from Jens Axboe: "A collection of fixes from this series. The most important here is a regression fix for an issue that some folks would hit in blk-merge.c, and the NVMe queue depth limit for the screwed up Apple "nvme" controller. In more detail, this pull request contains: - a set of fixes for null_blk, including a fix for a few corner cases where we could hang the device. From Arianna and Paolo. - lightnvm: - A build improvement from Keith. - Update the qemu pci id detection from Matias. - Error handling fixes for leaks and other little fixes from Sudip and Wenwei. - fix from Eric where BLKRRPART would not return EBUSY for whole device mounts, only when partitions were mounted. - fix from Jan Kara, where EOF O_DIRECT reads would return negatively. - remove check for rq_mergeable() when checking limits for cloned requests. The check doesn't make any sense. It's assuming that since NOMERGE is set on the request that we don't have to recalculate limits since the request didn't change, but that's not true if the request has been redirected. From Hannes. - correctly get the bio front segment value set for single segment bio's, fixing a BUG() in blk-merge. From Ming" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: temporary fix for Apple controller reset null_blk: change type of completion_nsec to unsigned long null_blk: guarantee device restart in all irq modes null_blk: set a separate timer for each command blk-merge: fix computing bio->bi_seg_front_size in case of single segment direct-io: Fix negative return from dio read beyond eof block: Always check queue limits for cloned requests lightnvm: missing nvm_lock acquire lightnvm: unconverted ppa returned in get_bb_tbl lightnvm: refactor and change vendor id for qemu lightnvm: do device max sectors boundary check first lightnvm: fix ioctl memory leaks lightnvm: free memory when gennvm register fails lightnvm: Simplify config when disabled Return EBUSY from BLKRRPART for mounted whole-dev fs
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Linus Torvalds authored
Pull tracing fix from Steven Rostedt: "During the merge window I added a new file that is used to filter trace events on pids. It filters all events where only tasks with their pid in that file exists. It also handles the sched_switch and sched_wakeup trace events where the current task does not have its pid in the file, but the task either being switched to or awaken does. Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both of these tracepoints use the same class as the sched_wakeup tracepoint, and they too should be included in what gets filtered by the set_event_pid file" * tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
-
David S. Miller authored
Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A small set of fixes for 4.4: * fix scanning in mac80211 to not actively scan radar channels (from Antonio) * fix uninitialized variable in remain-on-channel that could lead to treating frame TX as remain-on-channel and not sending the frame at all * remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it was broken and needs more work, we'll enable it later * fix call_rcu() induced use-after-reset/free in mesh (that was suddenly causing issues in certain tests) * always request block-ack window size 64 as we found some APs will otherwise crash (really ...) * fix P2P-Device teardown sequence to avoid restarting with uninitialized data ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Better to just warn the user that something really odd is going on and continue to run. Suggested-by: Or Gerlitz <gerlitz.or@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jérôme Pouiller authored
It is possible to address another chip on same MDIO bus. The case is correctly handled for media advertising. It is taken into account only if mii_data->phy_id == phydev->addr. However, this condition was missing for reset case. Signed-off-by: Jérôme Pouiller <jezz@sysmic.org> Signed-off-by: David S. Miller <davem@davemloft.net>
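The missing condition can be pictured as a small predicate: a register write should trigger re-initialisation of the local PHY only when it is a reset of the basic control register and it targets this PHY's own address. The sketch below is illustrative. The function and parameter names are hypothetical, while MII_BMCR and BMCR_RESET carry the standard MII register values.

```python
MII_BMCR = 0x00      # basic mode control register
BMCR_RESET = 0x8000  # software reset bit in BMCR


def should_reinit_phy(target_phy_id: int, local_phy_addr: int,
                      reg_num: int, value: int) -> bool:
    """Return True only when the write resets *this* PHY.

    A write addressed to another chip on the same MDIO bus must not
    cause the local PHY state to be re-initialised.
    """
    return (target_phy_id == local_phy_addr
            and reg_num == MII_BMCR
            and bool(value & BMCR_RESET))
```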
-
Stephen Boyd authored
Typically we return error pointers when we want to use those pointers in the non-error case, but this function is just returning error pointers or NULL for success. Change the style to plain int to follow normal kernel coding styles. Cc: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
Commit 5405ff6e ("tipc: convert node lock to rwlock") introduced a bug to the node reference counter handling. When a message is successfully sent in the function tipc_node_xmit(), we return directly after releasing the node lock, instead of continuing and decrementing the node reference counter as we should do. This commit fixes this bug. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
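The class of bug described here, an early return that skips the matching reference drop, can be shown with a toy model. All names below (Node, node_get, node_put, xmit_buggy, xmit_fixed) are hypothetical and only mimic the shape of the problem; this is not the TIPC code.

```python
class Node:
    """Toy object with a manual reference counter."""

    def __init__(self) -> None:
        self.refcount = 1


def node_get(node: Node) -> None:
    node.refcount += 1


def node_put(node: Node) -> None:
    node.refcount -= 1


def xmit_buggy(node: Node, send_ok: bool) -> int:
    """Leaks a reference: the success path returns before node_put()."""
    node_get(node)
    if send_ok:
        return 0          # early return skips the put below
    node_put(node)
    return -1


def xmit_fixed(node: Node, send_ok: bool) -> int:
    """Every exit path drops the reference taken at the top."""
    node_get(node)
    try:
        return 0 if send_ok else -1
    finally:
        node_put(node)
```

In C the same effect is usually achieved by routing all exits through one cleanup label rather than with try/finally.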
-
David S. Miller authored
Stas Sergeev says: ==================== mvneta: implement ethtool autonegotiation control These 2 patches add an ability to control the autonegotiation via ethtool. For example:
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
This is needed if you want to connect the mvneta's MII to different switches or PHYs: the ones that do support the in-band status, and the ones that do not. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stas Sergeev authored
This patch allows running "ethtool -s eth0 autoneg off" and "ethtool -s eth0 autoneg on" to disable or enable autonegotiation at run-time. Without that functionality, the only way to control the autonegotiation is to modify the device tree. This is needed if you plan to use the same kernel with different ethernet switches, the ones that support the in-band status and the ones that do not. CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stas Sergeev authored
This moves autoneg-related bit manipulations to the single place. CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thierry Reding authored
These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thierry Reding authored
These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thierry Reding authored
These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thierry Reding authored
These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
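The behaviour these helper commits describe, registering several drivers and unwinding on failure, follows a common pattern. A minimal sketch is given below, with register and unregister passed in as callables. The function name and signature are hypothetical and are not the kernel helpers' API.

```python
from typing import Callable, List, Sequence


def register_drivers(drivers: Sequence[str],
                     register: Callable[[str], None],
                     unregister: Callable[[str], None]) -> List[str]:
    """Register every driver, or none of them.

    If registering one driver raises, all drivers registered so far are
    unregistered in reverse order and the error is re-raised.
    """
    registered: List[str] = []
    for driver in drivers:
        try:
            register(driver)
        except Exception:
            for done in reversed(registered):
                unregister(done)
            raise
        registered.append(driver)
    return registered
```

Unwinding in reverse order mirrors the usual convention that later registrations may depend on earlier ones.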
-
Guillaume Nault authored
* Register PF_PPPOX with pppox module rather than with pppoe, so that pppoe doesn't get loaded for any PF_PPPOX socket. * Register PX_PROTO_* with standard MODULE_ALIAS_NET_PF_PROTO() instead of using pppox's own naming scheme. * While there, add auto-loading feature for pptp. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
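For orientation, the standard alias scheme referred to here produces module names of the form net-pf-<family>-proto-<protocol>. The sketch below builds such strings. The numeric values are the ones commonly defined for AF_PPPOX and the PX_PROTO_* constants and should be treated as assumptions to check against the kernel headers. The helper name is hypothetical.

```python
AF_PPPOX = 24        # assumed address family number for PPPoX sockets
PX_PROTO_OE = 0      # assumed: PPPoE
PX_PROTO_OL2TP = 1   # assumed: L2TP
PX_PROTO_PPTP = 2    # assumed: PPTP


def net_pf_proto_alias(family: int, protocol: int) -> str:
    """Build the module alias used to auto-load a protocol module."""
    return f"net-pf-{family}-proto-{protocol}"
```

With this scheme, a request for an unregistered PF_PPPOX protocol can be resolved by module alias lookup rather than by a naming convention private to pppox.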
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: set mac address and uc_list bug fixes. Fix ndo_set_mac_address() for PF and VF. Re-apply uc_list after chip reset. v2: Fix compile error if CONFIG_BNXT_SRIOV is not set. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and mc_list mac address filters. Before the patch, uc_list is not setup again after chip reset (such as ethtool ring size change) and macvlans don't work any more after that. Modify bnxt_cfg_rx_mode() to return error codes appropriately so that the init chip sequence can detect any failures. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jeffrey Huang authored
For PF, the bp->pf.mac_addr always holds the permanent MAC addr assigned by the HW. For VF, the bp->vf.mac_addr always holds the administrator assigned VF MAC addr. The randomly generated VF MAC addr should never get stored to bp->vf.mac_addr. This way, when the VF wants to change the MAC address, we can tell if the administrator has already set it and disallow the VF from changing it. v2: Fix compile error if CONFIG_BNXT_SRIOV is not set. Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
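The policy can be sketched as a single check on the stored administrator address. The sketch assumes that "not set by the administrator" is represented by an all-zero address. That representation and the function name are assumptions for illustration, not taken from the driver.

```python
def vf_may_change_mac(admin_assigned_mac: bytes) -> bool:
    """Return True if the VF is allowed to pick its own MAC address.

    Because a randomly generated address is never stored in the
    administrator slot, a non-zero value there always means the
    administrator chose it, and the VF must not override it.
    """
    if len(admin_assigned_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return admin_assigned_mac == bytes(6)
```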
-
Jeffrey Huang authored
The existing ndo_set_mac_address only copied the new MAC addr and didn't set the new MAC addr to the HW. The correct way is to delete the existing default MAC filter from HW and add the new one. Because RFS filters also depend on the default mac filter l2 context, the driver must go thru close_nic() to delete the default MAC and RFS filters, then open_nic() to set the default MAC address to HW. Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Stefan Hajnoczi says: ==================== Add virtio transport for AF_VSOCK v2: * Rebased onto Linux v4.4-rc2 * vhost: Refuse to assign reserved CIDs * vhost: Refuse guest CID if already in use * vhost: Only accept correctly addressed packets (no spoofing!) * vhost: Support flexible rx/tx descriptor layout * vhost: Add missing total_tx_buf decrement * virtio_transport: Fix total_tx_buf accounting * virtio_transport: Add virtio_transport global mutex to prevent races * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown semantics) * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock) * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants * common: Fix peer_buf_alloc inheritance on child socket This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/). AF_VSOCK is designed for communication between virtual machines and hypervisors. It is currently only implemented for VMware's VMCI transport. This series implements the proposed virtio-vsock device specification from here: http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855 Most of the work was done by Asias He and Gerd Hoffmann a while back. I have picked up the series again. The QEMU userspace changes are here: https://github.com/stefanha/qemu/commits/vsock Why virtio-vsock? ----------------- Guest<->host communication is currently done over the virtio-serial device. This makes it hard to port sockets API-based applications and is limited to static ports. virtio-vsock uses the sockets API so that applications can rely on familiar SOCK_STREAM and SOCK_DGRAM semantics. Applications on the host can easily connect to guest agents because the sockets API allows multiple connections to a listen socket (unlike virtio-serial). 
This simplifies the guest<->host communication and eliminates the need for extra processes on the host to arbitrate virtio-serial ports. Overview -------- This series adds 3 pieces: 1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko 2. virtio_transport.ko - guest driver 3. drivers/vhost/vsock.ko - host driver Howto ----- The following kernel options are needed: CONFIG_VSOCKETS=y CONFIG_VIRTIO_VSOCKETS=y CONFIG_VIRTIO_VSOCKETS_COMMON=y CONFIG_VHOST_VSOCK=m Launch QEMU as follows: # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3 Guest and host can communicate via AF_VSOCK sockets. The host's CID (address) is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to bind to it). Status ------ There are a few design changes I'd like to make to the virtio-vsock device: 1. The 3-way handshake isn't necessary over a reliable transport (virtqueue). Spoofing packets is also impossible so the security aspects of the 3-way handshake (including syn cookie) add nothing. The next version will have a single operation to establish a connection. 2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple clients can transmit to the same listen socket. There is no way for the clients to coordinate buffer space with each other fairly. The next version will drop credit-based flow control for SOCK_DGRAM and only rely on best-effort delivery. SOCK_STREAM still has guaranteed delivery. 3. In the next version only the host will be able to establish connections (i.e. to connect to a guest agent). This is for security reasons since there is currently no ability to provide host services only to certain guests. This also matches how AF_VSOCK works on modern VMware hypervisors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
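The credit-based flow control mentioned in the status notes can be sketched as simple counter arithmetic: a sender may transmit only as many bytes as the peer's advertised buffer minus the bytes still in flight. The counter names follow the peer_buf_alloc style used in the changelog, but the exact formula, the 32-bit wrap-around and the function names are assumptions for illustration.

```python
def peer_free_space(peer_buf_alloc: int, tx_cnt: int,
                    peer_fwd_cnt: int) -> int:
    """Bytes the sender may still transmit to the peer.

    tx_cnt counts bytes sent so far and peer_fwd_cnt counts bytes the
    peer reports as consumed. Both are treated as free-running 32-bit
    counters, so the difference is taken modulo 2**32.
    """
    in_flight = (tx_cnt - peer_fwd_cnt) % 2**32
    return peer_buf_alloc - in_flight


def can_send(length: int, peer_buf_alloc: int, tx_cnt: int,
             peer_fwd_cnt: int) -> bool:
    """True if a packet of `length` bytes fits in the remaining credit."""
    return length <= peer_free_space(peer_buf_alloc, tx_cnt, peer_fwd_cnt)
```

The SOCK_DGRAM problem noted in item 2 is visible here: each sender tracks its own tx_cnt, so several senders sharing one receive buffer cannot derive a fair share from these counters alone.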
-
Asias He authored
Enable virtio-vsock and vhost-vsock. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Asias He authored
VM sockets vhost transport implementation. This module runs in host kernel. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Asias He authored
VM sockets virtio transport implementation. This module runs in guest kernel. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Asias He authored
This module contains the common code and header files for the following virtio-vsock and virtio-vhost kernel modules. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-