What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.
There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)
common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear. I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.
What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.
How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device. Backend is also configured by userspace, including vlan/mac
etc.
Status: This works for me, and I haven't see any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.
Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use
Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item. The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.
(Includes fixes by Alan Cox <alan@linux.intel.com>,
David L Stevens <dlstevens@us.ibm.com>,
Chris Wright <chrisw@redhat.com>)
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tun device looks similar to a packet socket
in that both pass complete frames from/to userspace.
This patch fills in enough fields in the socket underlying tun driver
to support sendmsg/recvmsg operations, and message flags
MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
to modules. Regular read/write behaviour is unchanged.
This way, code using raw sockets to inject packets
into a physical device, can support injecting
packets into host network stack almost without modification.
First user of this interface will be vhost virtualization
accelerator.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds error checking of ctrlmode values for CAN devices. As
an example all availabe bits are implemented in the mcp251x driver.
Signed-off-by: Christian Pellegrin <chripell@fsfe.org>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GNU General Public License is in file "COPYING".
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before sending Interrupt coalesce parameters to device,
convert them in little endian.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netxen_read_mac_addr, mac_addr should be declared
u64 instead of __le64, used by host only.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert code away from ->read_proc/->write_proc interfaces. Switch to
proc_create()/proc_create_data() which make addition of proc entries
reliable wrt NULL ->proc_fops, NULL ->data and so on.
Problem with ->read_proc et al is described here commit
786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in
/proc entries"
[akpm@linux-foundation.org: CONFIG_PROC_FS=n build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set default BLKT values for new OSA/3 hardware.
Signed-off-by: Einar Lueck <elelueck@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a qeth device is set online, several initialisation steps are
performed. If a failure in one of these steps occurs, the qeth
device is reset into DOWN state. If due to the failure a qeth recovery
is scheduled and started in another thread, this might cause all kinds
of conflicts, even a kernel panic. The patch forbids scheduling of a
qeth recovery while online processing is performed till the card is in
state SOFTSETUP.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New feature to trace HiperSockets network traffic for debugging
purposes.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make updating the multicast address list generic for all families and
enforce the requirement to update the entire multicast table array all at
once instead of piecemeal which causes problems on some parts.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide MAC-specific function pointer to determine the LAN ID (PCI func).
The LAN ID is used internally by the driver to determine which h/w lock
to use to protect accessing the PHY on ESB2 as well as help to determine
the alternate MAC address on some parts.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to 82571/2/3 parts that already do this, if ESB2/80003es2lan parts
have an alternate MAC address provided in the EEPROM use it instead of the
default MAC address. This patch makes the the actual code that does this
generic so that it can be better used by both MAC families.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prevent the CAN drivers to operate on invalid socketbuffers the skbs are
now checked and silently dropped at the xmit-function consistently.
Also the netdev stats are consistently using the CAN data length code (dlc)
for [rx|tx]_bytes now.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci_dma_mapping_error should be used to test return value of
pci_map_single or pci_map_page.
Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip MAC loopback test when the adapter is set to a VT mode such as SR-IOV
or VMDq
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds SR-IOV features supported by the 82599 controller to the main driver
module. If the CONFIG_PCI_IOV kernel option is selected then the SR-IOV
features are enabled. Use the max_vfs module option to allocate up to 63
virtual functions per physical port.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds code to the core 82599 module to support SR-IOV features of the 82599
network controller
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the mailbox and SR-IOV feature modules to the ixgbe driver Makefile.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module and header file add functions to support SR-IOV features of the
82599 10Gbe controller.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds register definitions, bit definitions and structures used by
the driver to support SR-IOV features of the 82599 controller.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 82599 virtual function device and the master 82599 physical function
device implement a mailbox utility for communication between the devices
using some SRAM scratch memory and a doorbell/answering mechanism enabled
via interrupt and/or polling. This C module and accompanying header
file implement the base functions for use of this feature.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modifications for the Kconfig and network device Makefile to add the ixgbevf
driver module to the kernel plus basic driver documentation.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
82599 Virtual Function Device Driver Makefile
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These modules and header contain the Linux OS network interface code and core
interrupt and network send/receive handlers.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 82599 virtual function device and the master 82599 physical function
device implement a mailbox utility for communication between the devices
using some SRAM scratch memory and a doorbell/answering mechanism enabled
via interrupt and/or polling. This C module and accompanying header
file implement the base functions for use of this feature.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module and header file contain the core functions for the 82599
virtual function device.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two headers define the commonly used macros, data structures,
register bits and register offsets used by the ixgbevf driver on the
82599 virtual function device
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 18c0019102228875cb0f6f252dad5148491e96b2 ("drivers/net/mac8390.c:
Convert printk(KERN_<level> to pr_<level>(") broke the build:
| drivers/net/mac8390.c:306: error: expected ')' before 'version'
as "version" is not a string literal, but a variable.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the global device reset interrupt. Without
this change the drivers will report tx hangs on all ports when a global
device reset occurs.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main differences compared to the MSCAN on the MPC5200 are:
- More flexibility in choosing the CAN source clock and frequency:
Three different clock sources can be selected: "ip", "ref" or "sys".
For the latter two, a clock divider can be defined as well. If the
clock source is not specified by the device tree, we first try to
find an optimal CAN source clock based on the system clock. If that
is not possible, the reference clock will be used.
- The behavior of bus-off recovery is configurable:
To comply with the usual handling of Socket-CAN bus-off recovery,
"recovery on request" is selected (instead of automatic recovery).
Note that only MPC5121 Rev. 2 and later is supported.
Signed-off-by: Wolfgang Grandegger <wg@denx.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The start_xmit function of the MSCAN Driver did return improperly if
the CAN dlc check failed (skb not freed and invalid return code). This
patch adds a proper check of the frame lenght and data size and returns
now correctly. The invalid skb packets are dropped silently as suggested
by David Miller in the thread "[RFC] ndo_validate_skb: Let the netdev
check a valid skb content" on the netdev mailing list.
Furthermore, a typo has been fixed.
Signed-off-by: Wolfgang Grandegger <wg@denx.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
General cleanup of the ep93xx_eth driver.
1) Use pr_fmt() to prefix the module name and __func__ to the error
messages.
2) <linux/io.h> instead of <asm/io.h>
3) <mach/hardware.h> instead of <mach/ep93xx-regs.h> and <mach/platform.h>
4) Move the ep93xx_mdio_read (and ep93xx_mdio_write) function to eliminate
the function prototype.
5) Change all the printk(<level> messages to pr_<level> and remove the
__func__ argument.
6) Use platform_get_{resource/irq} to get the platform resources and add
an error check.
7) Use resource_size() for request_mem_region() and ioremap().
8) Use %pM to print the MAC address at the end of the probe.
9) Use dev->dev_addr not data->dev_addr for the MAC argument because a
random address could be used if the platform does not supply one.
The message at the end of the probe is left as a printk since it displays
cleaner without the function name that would be displayed with pr_info().
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Lennert Buytenhek <kernel@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MSI-X table size needs to be properly set before pci_enable_msix()
is called. But on certain machines, the writes are delayed and the
MSI-X table size is incorrectly read. By reading the
BNX2_PCI_MSIX_CONTROL register, the writes are flushed and now
ensure that the MSI-X table is set correctly before MSI-X
is enable on the device.
This patch was originally diagnosed and authored by
Kalyan Ram Chintalapati <kalyanc@vmware.com>.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Kalyan Ram Chintalapati <kalyanc@vmware.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix e1000e_rar_set() to flush consecutive register writes to avoid write
combining which some parts cannot handle. Update e1000e_init_rx_addrs()
to call the fixed e1000e_rar_set() instead of duplicating code.
Also change e1000e_rar_set() to _not_ set the Address Valid bit if the MAC
address is all zeros.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1000e_enable_tx_pkt_filtering() will return a non-zero value if the
driver fails to enable the manageability interface on the host for
any reason; instead it should retun zero to indicate filtering has been
disabled. Also provide a single exit point for the function.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adaptive IFS which involves writing to the Adaptive IFS Throttle register
was being done for all devices supported by the driver even though it is
not supported (i.e. the register doesn't even exist) on some devices. The
feature is supported on 8257x/82583 and ICH/PCH based devices, but not
on ESB2.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a change in pci_restore_state()[1] which clears the saved_state
flag, the driver should call pci_save_state() to set the flag once again
to avoid issues with EEH (same fix that recently was submitted for ixgbe).
[1] commmit 4b77b0a2ba27d64f58f16d8d4d48d8319dda36ff
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o If tx and rx resources are not available, during set mac request.
Then this request wont be passed to firmware and it will be added to
driver mac list and will never make it to firmware.
So if resources are not available, don't add it to driver mac list.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o While unloading driver or resetting the context, tx ring was not
getting free.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch implements a firmware command to fetch the eeprom data.
Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use DEFINE_PCI_DEVICE_TABLE() so we get place PCI ids table into correct section
in every case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the driver probe function the emac module clock needs to
be enabled before calling register_netdev(). As soon as the
device is registered the driver get_stats function can be invoked
by the core - the module clock must be switched on to be able to
read from stats registers. Also explicitly call matching clk_disable
for failure conditions in probe function.
Signed-off-by: Sriramakrishnan <srk@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Davicom DM9100 and DM9102 chips are used on the motherboards of
some SPARC systems (supported by the tulip driver) and also in PCI
expansion cards (supported by the dmfe driver). There is no
difference in the PCI device ids for the two different configurations,
so these drivers both claim the device ids. However, it is possible
to distinguish the two configurations by the presence of Open Firmware
properties for them, so we do that.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>