diff options
Diffstat (limited to 'Documentation/PCI')
| -rw-r--r-- | Documentation/PCI/00-INDEX | 4 | ||||
| -rw-r--r-- | Documentation/PCI/MSI-HOWTO.txt | 435 | ||||
| -rw-r--r-- | Documentation/PCI/pci-iov-howto.txt | 56 | ||||
| -rw-r--r-- | Documentation/PCI/pci.txt | 36 |
4 files changed, 394 insertions, 137 deletions
diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX index 812b17fe3ed..147231f1613 100644 --- a/Documentation/PCI/00-INDEX +++ b/Documentation/PCI/00-INDEX @@ -2,12 +2,12 @@ - this file MSI-HOWTO.txt - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ. -PCI-DMA-mapping.txt - - info for PCI drivers using DMA portably across all platforms PCIEBUS-HOWTO.txt - a guide describing the PCI Express Port Bus driver pci-error-recovery.txt - info on PCI error recovery +pci-iov-howto.txt + - the PCI Express I/O Virtualization HOWTO pci.txt - info on the PCI subsystem for device driver authors pcieaer-howto.txt diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt index dcf7acc720e..10a93696e55 100644 --- a/Documentation/PCI/MSI-HOWTO.txt +++ b/Documentation/PCI/MSI-HOWTO.txt @@ -45,7 +45,7 @@ arrived in memory (this becomes more likely with devices behind PCI-PCI bridges). In order to ensure that all the data has arrived in memory, the interrupt handler must read a register on the device which raised the interrupt. PCI transaction ordering rules require that all the data -arrives in memory before the value can be returned from the register. +arrive in memory before the value may be returned from the register. Using MSIs avoids this problem as the interrupt-generating write cannot pass the data writes, so by the time the interrupt is raised, the driver knows that all the data has arrived in memory. @@ -86,61 +86,143 @@ device. int pci_enable_msi(struct pci_dev *dev) -A successful call will allocate ONE interrupt to the device, regardless -of how many MSIs the device supports. The device will be switched from +A successful call allocates ONE interrupt to the device, regardless +of how many MSIs the device supports. The device is switched from pin-based interrupt mode to MSI mode. The dev->irq number is changed -to a new number which represents the message signaled interrupt. -This function should be called before the driver calls request_irq() -since enabling MSIs disables the pin-based IRQ and the driver will not -receive interrupts on the old interrupt. +to a new number which represents the message signaled interrupt; +consequently, this function should be called before the driver calls +request_irq(), because an MSI is delivered via a vector that is +different from the vector of a pin-based interrupt. -4.2.2 pci_enable_msi_block +4.2.2 pci_enable_msi_range -int pci_enable_msi_block(struct pci_dev *dev, int count) +int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec) -This variation on the above call allows a device driver to request multiple -MSIs. The MSI specification only allows interrupts to be allocated in -powers of two, up to a maximum of 2^5 (32). +This function allows a device driver to request any number of MSI +interrupts within specified range from 'minvec' to 'maxvec'. -If this function returns 0, it has succeeded in allocating at least as many -interrupts as the driver requested (it may have allocated more in order -to satisfy the power-of-two requirement). In this case, the function -enables MSI on this device and updates dev->irq to be the lowest of -the new interrupts assigned to it. The other interrupts assigned to -the device are in the range dev->irq to dev->irq + count - 1. +If this function returns a positive number it indicates the number of +MSI interrupts that have been successfully allocated. In this case +the device is switched from pin-based interrupt mode to MSI mode and +updates dev->irq to be the lowest of the new interrupts assigned to it. +The other interrupts assigned to the device are in the range dev->irq +to dev->irq + returned value - 1. Device driver can use the returned +number of successfully allocated MSI interrupts to further allocate +and initialize device resources. If this function returns a negative number, it indicates an error and the driver should not attempt to request any more MSI interrupts for -this device. If this function returns a positive number, it will be -less than 'count' and indicate the number of interrupts that could have -been allocated. In neither case will the irq value have been -updated, nor will the device have been switched into MSI mode. - -The device driver must decide what action to take if -pci_enable_msi_block() returns a value less than the number asked for. -Some devices can make use of fewer interrupts than the maximum they -request; in this case the driver should call pci_enable_msi_block() -again. Note that it is not guaranteed to succeed, even when the -'count' has been reduced to the value returned from a previous call to -pci_enable_msi_block(). This is because there are multiple constraints -on the number of vectors that can be allocated; pci_enable_msi_block() -will return as soon as it finds any constraint that doesn't allow the -call to succeed. - -4.2.3 pci_disable_msi +this device. + +This function should be called before the driver calls request_irq(), +because MSI interrupts are delivered via vectors that are different +from the vector of a pin-based interrupt. + +It is ideal if drivers can cope with a variable number of MSI interrupts; +there are many reasons why the platform may not be able to provide the +exact number that a driver asks for. + +There could be devices that can not operate with just any number of MSI +interrupts within a range. See chapter 4.3.1.3 to get the idea how to +handle such devices for MSI-X - the same logic applies to MSI. + +4.2.1.1 Maximum possible number of MSI interrupts + +The typical usage of MSI interrupts is to allocate as many vectors as +possible, likely up to the limit returned by pci_msi_vec_count() function: + +static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) +{ + return pci_enable_msi_range(pdev, 1, nvec); +} + +Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive, +the value of 0 would be meaningless and could result in error. + +Some devices have a minimal limit on number of MSI interrupts. +In this case the function could look like this: + +static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) +{ + return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, nvec); +} + +4.2.1.2 Exact number of MSI interrupts + +If a driver is unable or unwilling to deal with a variable number of MSI +interrupts it could request a particular number of interrupts by passing +that number to pci_enable_msi_range() function as both 'minvec' and 'maxvec' +parameters: + +static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) +{ + return pci_enable_msi_range(pdev, nvec, nvec); +} + +Note, unlike pci_enable_msi_exact() function, which could be also used to +enable a particular number of MSI-X interrupts, pci_enable_msi_range() +returns either a negative errno or 'nvec' (not negative errno or 0 - as +pci_enable_msi_exact() does). + +4.2.1.3 Single MSI mode + +The most notorious example of the request type described above is +enabling the single MSI mode for a device. It could be done by passing +two 1s as 'minvec' and 'maxvec': + +static int foo_driver_enable_single_msi(struct pci_dev *pdev) +{ + return pci_enable_msi_range(pdev, 1, 1); +} + +Note, unlike pci_enable_msi() function, which could be also used to +enable the single MSI mode, pci_enable_msi_range() returns either a +negative errno or 1 (not negative errno or 0 - as pci_enable_msi() +does). + +4.2.3 pci_enable_msi_exact + +int pci_enable_msi_exact(struct pci_dev *dev, int nvec) + +This variation on pci_enable_msi_range() call allows a device driver to +request exactly 'nvec' MSIs. + +If this function returns a negative number, it indicates an error and +the driver should not attempt to request any more MSI interrupts for +this device. + +By contrast with pci_enable_msi_range() function, pci_enable_msi_exact() +returns zero in case of success, which indicates MSI interrupts have been +successfully allocated. + +4.2.4 pci_disable_msi void pci_disable_msi(struct pci_dev *dev) -This function should be used to undo the effect of pci_enable_msi() or -pci_enable_msi_block(). Calling it restores dev->irq to the pin-based -interrupt number and frees the previously allocated message signaled -interrupt(s). The interrupt may subsequently be assigned to another -device, so drivers should not cache the value of dev->irq. +This function should be used to undo the effect of pci_enable_msi_range(). +Calling it restores dev->irq to the pin-based interrupt number and frees +the previously allocated MSIs. The interrupts may subsequently be assigned +to another device, so drivers should not cache the value of dev->irq. + +Before calling this function, a device driver must always call free_irq() +on any interrupt for which it previously called request_irq(). +Failure to do so results in a BUG_ON(), leaving the device with +MSI enabled and thus leaking its vector. -A device driver must always call free_irq() on the interrupt(s) -for which it has called request_irq() before calling this function. -Failure to do so will result in a BUG_ON(), the device will be left with -MSI enabled and will leak its vector. +4.2.4 pci_msi_vec_count + +int pci_msi_vec_count(struct pci_dev *dev) + +This function could be used to retrieve the number of MSI vectors the +device requested (via the Multiple Message Capable register). The MSI +specification only allows the returned value to be a power of two, +up to a maximum of 2^5 (32). + +If this function returns a negative number, it indicates the device is +not capable of sending MSIs. + +If this function returns a positive number, it indicates the maximum +number of MSI interrupt vectors that could be allocated. 4.3 Using MSI-X @@ -155,72 +237,213 @@ struct msix_entry { }; This allows for the device to use these interrupts in a sparse fashion; -for example it could use interrupts 3 and 1027 and allocate only a +for example, it could use interrupts 3 and 1027 and yet allocate only a two-element array. The driver is expected to fill in the 'entry' value -in each element of the array to indicate which entries it wants the kernel -to assign interrupts for. It is invalid to fill in two entries with the +in each element of the array to indicate for which entries the kernel +should assign interrupts; it is invalid to fill in two entries with the same number. -4.3.1 pci_enable_msix +4.3.1 pci_enable_msix_range -int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec) +int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, + int minvec, int maxvec) -Calling this function asks the PCI subsystem to allocate 'nvec' MSIs. +Calling this function asks the PCI subsystem to allocate any number of +MSI-X interrupts within specified range from 'minvec' to 'maxvec'. The 'entries' argument is a pointer to an array of msix_entry structs -which should be at least 'nvec' entries in size. On success, the -function will return 0 and the device will have been switched into -MSI-X interrupt mode. The 'vector' elements in each entry will have -been filled in with the interrupt number. The driver should then call -request_irq() for each 'vector' that it decides to use. +which should be at least 'maxvec' entries in size. + +On success, the device is switched into MSI-X mode and the function +returns the number of MSI-X interrupts that have been successfully +allocated. In this case the 'vector' member in entries numbered from +0 to the returned value - 1 is populated with the interrupt number; +the driver should then call request_irq() for each 'vector' that it +decides to use. The device driver is responsible for keeping track of the +interrupts assigned to the MSI-X vectors so it can free them again later. +Device driver can use the returned number of successfully allocated MSI-X +interrupts to further allocate and initialize device resources. If this function returns a negative number, it indicates an error and the driver should not attempt to allocate any more MSI-X interrupts for -this device. If it returns a positive number, it indicates the maximum -number of interrupt vectors that could have been allocated. See example -below. +this device. -This function, in contrast with pci_enable_msi(), does not adjust +This function, in contrast with pci_enable_msi_range(), does not adjust dev->irq. The device will not generate interrupts for this interrupt -number once MSI-X is enabled. The device driver is responsible for -keeping track of the interrupts assigned to the MSI-X vectors so it can -free them again later. +number once MSI-X is enabled. Device drivers should normally call this function once per device during the initialization phase. -It is ideal if drivers can cope with a variable number of MSI-X interrupts, +It is ideal if drivers can cope with a variable number of MSI-X interrupts; there are many reasons why the platform may not be able to provide the -exact number a driver asks for. +exact number that a driver asks for. + +There could be devices that can not operate with just any number of MSI-X +interrupts within a range. E.g., an network adapter might need let's say +four vectors per each queue it provides. Therefore, a number of MSI-X +interrupts allocated should be a multiple of four. In this case interface +pci_enable_msix_range() can not be used alone to request MSI-X interrupts +(since it can allocate any number within the range, without any notion of +the multiple of four) and the device driver should master a custom logic +to request the required number of MSI-X interrupts. + +4.3.1.1 Maximum possible number of MSI-X interrupts -A request loop to achieve that might look like: +The typical usage of MSI-X interrupts is to allocate as many vectors as +possible, likely up to the limit returned by pci_msix_vec_count() function: static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) { - while (nvec >= FOO_DRIVER_MINIMUM_NVEC) { - rc = pci_enable_msix(adapter->pdev, - adapter->msix_entries, nvec); - if (rc > 0) - nvec = rc; - else - return rc; + return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + 1, nvec); +} + +Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive, +the value of 0 would be meaningless and could result in error. + +Some devices have a minimal limit on number of MSI-X interrupts. +In this case the function could look like this: + +static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) +{ + return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + FOO_DRIVER_MINIMUM_NVEC, nvec); +} + +4.3.1.2 Exact number of MSI-X interrupts + +If a driver is unable or unwilling to deal with a variable number of MSI-X +interrupts it could request a particular number of interrupts by passing +that number to pci_enable_msix_range() function as both 'minvec' and 'maxvec' +parameters: + +static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) +{ + return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + nvec, nvec); +} + +Note, unlike pci_enable_msix_exact() function, which could be also used to +enable a particular number of MSI-X interrupts, pci_enable_msix_range() +returns either a negative errno or 'nvec' (not negative errno or 0 - as +pci_enable_msix_exact() does). + +4.3.1.3 Specific requirements to the number of MSI-X interrupts + +As noted above, there could be devices that can not operate with just any +number of MSI-X interrupts within a range. E.g., let's assume a device that +is only capable sending the number of MSI-X interrupts which is a power of +two. A routine that enables MSI-X mode for such device might look like this: + +/* + * Assume 'minvec' and 'maxvec' are non-zero + */ +static int foo_driver_enable_msix(struct foo_adapter *adapter, + int minvec, int maxvec) +{ + int rc; + + minvec = roundup_pow_of_two(minvec); + maxvec = rounddown_pow_of_two(maxvec); + + if (minvec > maxvec) + return -ERANGE; + +retry: + rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + maxvec, maxvec); + /* + * -ENOSPC is the only error code allowed to be analized + */ + if (rc == -ENOSPC) { + if (maxvec == 1) + return -ENOSPC; + + maxvec /= 2; + + if (minvec > maxvec) + return -ENOSPC; + + goto retry; } - return -ENOSPC; + return rc; } -4.3.2 pci_disable_msix +Note how pci_enable_msix_range() return value is analized for a fallback - +any error code other than -ENOSPC indicates a fatal error and should not +be retried. + +4.3.2 pci_enable_msix_exact + +int pci_enable_msix_exact(struct pci_dev *dev, + struct msix_entry *entries, int nvec) + +This variation on pci_enable_msix_range() call allows a device driver to +request exactly 'nvec' MSI-Xs. + +If this function returns a negative number, it indicates an error and +the driver should not attempt to allocate any more MSI-X interrupts for +this device. + +By contrast with pci_enable_msix_range() function, pci_enable_msix_exact() +returns zero in case of success, which indicates MSI-X interrupts have been +successfully allocated. + +Another version of a routine that enables MSI-X mode for a device with +specific requirements described in chapter 4.3.1.3 might look like this: + +/* + * Assume 'minvec' and 'maxvec' are non-zero + */ +static int foo_driver_enable_msix(struct foo_adapter *adapter, + int minvec, int maxvec) +{ + int rc; + + minvec = roundup_pow_of_two(minvec); + maxvec = rounddown_pow_of_two(maxvec); + + if (minvec > maxvec) + return -ERANGE; + +retry: + rc = pci_enable_msix_exact(adapter->pdev, + adapter->msix_entries, maxvec); + + /* + * -ENOSPC is the only error code allowed to be analyzed + */ + if (rc == -ENOSPC) { + if (maxvec == 1) + return -ENOSPC; + + maxvec /= 2; + + if (minvec > maxvec) + return -ENOSPC; + + goto retry; + } else if (rc < 0) { + return rc; + } + + return maxvec; +} + +4.3.3 pci_disable_msix void pci_disable_msix(struct pci_dev *dev) -This API should be used to undo the effect of pci_enable_msix(). It frees -the previously allocated message signaled interrupts. The interrupts may +This function should be used to undo the effect of pci_enable_msix_range(). +It frees the previously allocated MSI-X interrupts. The interrupts may subsequently be assigned to another device, so drivers should not cache the value of the 'vector' elements over a call to pci_disable_msix(). -A device driver must always call free_irq() on the interrupt(s) -for which it has called request_irq() before calling this function. -Failure to do so will result in a BUG_ON(), the device will be left with -MSI enabled and will leak its vector. +Before calling this function, a device driver must always call free_irq() +on any interrupt for which it previously called request_irq(). +Failure to do so results in a BUG_ON(), leaving the device with +MSI-X enabled and thus leaking its vector. 4.3.3 The MSI-X Table @@ -229,18 +452,32 @@ MSI-X Table. This address is mapped by the PCI subsystem, and should not be accessed directly by the device driver. If the driver wishes to mask or unmask an interrupt, it should call disable_irq() / enable_irq(). +4.3.4 pci_msix_vec_count + +int pci_msix_vec_count(struct pci_dev *dev) + +This function could be used to retrieve number of entries in the device +MSI-X table. + +If this function returns a negative number, it indicates the device is +not capable of sending MSI-Xs. + +If this function returns a positive number, it indicates the maximum +number of MSI-X interrupt vectors that could be allocated. + 4.4 Handling devices implementing both MSI and MSI-X capabilities If a device implements both MSI and MSI-X capabilities, it can -run in either MSI mode or MSI-X mode but not both simultaneously. +run in either MSI mode or MSI-X mode, but not both simultaneously. This is a requirement of the PCI spec, and it is enforced by the -PCI layer. Calling pci_enable_msi() when MSI-X is already enabled or -pci_enable_msix() when MSI is already enabled will result in an error. -If a device driver wishes to switch between MSI and MSI-X at runtime, -it must first quiesce the device, then switch it back to pin-interrupt -mode, before calling pci_enable_msi() or pci_enable_msix() and resuming -operation. This is not expected to be a common operation but may be -useful for debugging or testing during development. +PCI layer. Calling pci_enable_msi_range() when MSI-X is already +enabled or pci_enable_msix_range() when MSI is already enabled +results in an error. If a device driver wishes to switch between MSI +and MSI-X at runtime, it must first quiesce the device, then switch +it back to pin-interrupt mode, before calling pci_enable_msi_range() +or pci_enable_msix_range() and resuming operation. This is not expected +to be a common operation but may be useful for debugging or testing +during development. 4.5 Considerations when using MSIs @@ -251,10 +488,10 @@ the MSI-X facilities in preference to the MSI facilities. As mentioned above, MSI-X supports any number of interrupts between 1 and 2048. In constrast, MSI is restricted to a maximum of 32 interrupts (and must be a power of two). In addition, the MSI interrupt vectors must -be allocated consecutively, so the system may not be able to allocate +be allocated consecutively, so the system might not be able to allocate as many vectors for MSI as it could for MSI-X. On some platforms, MSI -interrupts must all be targetted at the same set of CPUs whereas MSI-X -interrupts can all be targetted at different CPUs. +interrupts must all be targeted at the same set of CPUs whereas MSI-X +interrupts can all be targeted at different CPUs. 4.5.2 Spinlocks @@ -281,7 +518,7 @@ disabled to enabled and back again. Using 'lspci -v' (as root) may show some devices with "MSI", "Message Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities -has an 'Enable' flag which will be followed with either "+" (enabled) +has an 'Enable' flag which is followed with either "+" (enabled) or "-" (disabled). @@ -298,7 +535,7 @@ The PCI stack provides three ways to disable MSIs: Some host chipsets simply don't support MSIs properly. If we're lucky, the manufacturer knows this and has indicated it in the ACPI -FADT table. In this case, Linux will automatically disable MSIs. +FADT table. In this case, Linux automatically disables MSIs. Some boards don't include this information in the table and so we have to detect them ourselves. The complete list of these is found near the quirk_disable_all_msi() function in drivers/pci/quirks.c. @@ -317,7 +554,7 @@ Some bridges allow you to enable MSIs by changing some bits in their PCI configuration space (especially the Hypertransport chipsets such as the nVidia nForce and Serverworks HT2000). As with host chipsets, Linux mostly knows about them and automatically enables MSIs if it can. -If you have a bridge which Linux doesn't yet know about, you can enable +If you have a bridge unknown to Linux, you can enable MSIs in configuration space using whatever method you know works, then enable MSIs on that bridge by doing: @@ -327,7 +564,7 @@ where $bridge is the PCI address of the bridge you've enabled (eg 0000:00:0e.0). To disable MSIs, echo 0 instead of 1. Changing this value should be -done with caution as it can break interrupt handling for all devices +done with caution as it could break interrupt handling for all devices below this bridge. Again, please notify linux-pci@vger.kernel.org of any bridges that need @@ -336,7 +573,7 @@ special handling. 5.3. Disabling MSIs on a single device Some devices are known to have faulty MSI implementations. Usually this -is handled in the individual device driver but occasionally it's necessary +is handled in the individual device driver, but occasionally it's necessary to handle this with a quirk. Some drivers have an option to disable use of MSI. While this is a convenient workaround for the driver author, it is not good practise, and should not be emulated. @@ -350,10 +587,10 @@ for your machine. You should also check your .config to be sure you have enabled CONFIG_PCI_MSI. Then, 'lspci -t' gives the list of bridges above a device. Reading -/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1) +/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1) or disabled (0). If 0 is found in any of the msi_bus files belonging to bridges between the PCI root and the device, MSIs are disabled. It is also worth checking the device driver to see whether it supports MSIs. -For example, it may contain calls to pci_enable_msi(), pci_enable_msix() or -pci_enable_msi_block(). +For example, it may contain calls to pci_enable_msi_range() or +pci_enable_msix_range(). diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt index fc73ef5d65b..2d91ae25198 100644 --- a/Documentation/PCI/pci-iov-howto.txt +++ b/Documentation/PCI/pci-iov-howto.txt @@ -2,6 +2,9 @@ Copyright (C) 2009 Intel Corporation Yu Zhao <yu.zhao@intel.com> + Update: November 2012 + -- sysfs-based SRIOV enable-/disable-ment + Donald Dutile <ddutile@redhat.com> 1. Overview @@ -24,10 +27,21 @@ real existing PCI device. 2.1 How can I enable SR-IOV capability -The device driver (PF driver) will control the enabling and disabling -of the capability via API provided by SR-IOV core. If the hardware -has SR-IOV capability, loading its PF driver would enable it and all -VFs associated with the PF. +Multiple methods are available for SR-IOV enablement. +In the first method, the device driver (PF driver) will control the +enabling and disabling of the capability via API provided by SR-IOV core. +If the hardware has SR-IOV capability, loading its PF driver would +enable it and all VFs associated with the PF. Some PF drivers require +a module parameter to be set to determine the number of VFs to enable. +In the second method, a write to the sysfs file sriov_numvfs will +enable and disable the VFs associated with a PCIe PF. This method +enables per-PF, VF enable/disable values versus the first method, +which applies to all PFs of the same device. Additionally, the +PCI SRIOV core support ensures that enable/disable operations are +valid to reduce duplication in multiple drivers for the same +checks, e.g., check numvfs == 0 if enabling VFs, ensure +numvfs <= totalvfs. +The second method is the recommended method for new/future VF devices. 2.2 How can I use the Virtual Functions @@ -40,20 +54,25 @@ requires device driver that is same as a normal PCI device's. 3.1 SR-IOV API To enable SR-IOV capability: +(a) For the first method, in the driver: int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); 'nr_virtfn' is number of VFs to be enabled. +(b) For the second method, from sysfs: + echo 'nr_virtfn' > \ + /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs To disable SR-IOV capability: +(a) For the first method, in the driver: void pci_disable_sriov(struct pci_dev *dev); - -To notify SR-IOV core of Virtual Function Migration: - irqreturn_t pci_sriov_migration(struct pci_dev *dev); +(b) For the second method, from sysfs: + echo 0 > \ + /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs 3.2 Usage example Following piece of code illustrates the usage of the SR-IOV API. -static int __devinit dev_probe(struct pci_dev *dev, const struct pci_device_id *id) +static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id) { pci_enable_sriov(dev, NR_VIRTFN); @@ -62,7 +81,7 @@ static int __devinit dev_probe(struct pci_dev *dev, const struct pci_device_id * return 0; } -static void __devexit dev_remove(struct pci_dev *dev) +static void dev_remove(struct pci_dev *dev) { pci_disable_sriov(dev); @@ -88,12 +107,29 @@ static void dev_shutdown(struct pci_dev *dev) ... } +static int dev_sriov_configure(struct pci_dev *dev, int numvfs) +{ + if (numvfs > 0) { + ... + pci_enable_sriov(dev, numvfs); + ... + return numvfs; + } + if (numvfs == 0) { + .... + pci_disable_sriov(dev); + ... + return 0; + } +} + static struct pci_driver dev_driver = { .name = "SR-IOV Physical Function driver", .id_table = dev_id_table, .probe = dev_probe, - .remove = __devexit_p(dev_remove), + .remove = dev_remove, .suspend = dev_suspend, .resume = dev_resume, .shutdown = dev_shutdown, + .sriov_configure = dev_sriov_configure, }; diff --git a/Documentation/PCI/pci.txt b/Documentation/PCI/pci.txt index 6148d4080f8..9518006f667 100644 --- a/Documentation/PCI/pci.txt +++ b/Documentation/PCI/pci.txt @@ -123,8 +123,10 @@ initialization with a pointer to a structure describing the driver The ID table is an array of struct pci_device_id entries ending with an -all-zero entry; use of the macro DEFINE_PCI_DEVICE_TABLE is the preferred -method of declaring the table. Each entry consists of: +all-zero entry. Definitions with static const are generally preferred. +Use of the deprecated macro DEFINE_PCI_DEVICE_TABLE should be avoided. + +Each entry consists of: vendor,device Vendor and device ID to match (or PCI_ANY_ID) @@ -183,12 +185,6 @@ Please mark the initialization and cleanup functions where appropriate initializes. __exit Exit code. Ignored for non-modular drivers. - - __devinit Device initialization code. - Identical to __init if the kernel is not compiled - with CONFIG_HOTPLUG, normal function otherwise. - __devexit The same for __exit. - Tips on when/where to use the above attributes: o The module_init()/module_exit() functions (and all initialization functions called _only_ from these) @@ -196,20 +192,6 @@ Tips on when/where to use the above attributes: o Do not mark the struct pci_driver. - o The ID table array should be marked __devinitconst; this is done - automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE(). - - o The probe() and remove() functions should be marked __devinit - and __devexit respectively. All initialization functions - exclusively called by the probe() routine, can be marked __devinit. - Ditto for remove() and __devexit. - - o If mydriver_remove() is marked with __devexit(), then all address - references to mydriver_remove must use __devexit_p(mydriver_remove) - (in the struct pci_driver declaration for example). - __devexit_p() will generate the function name _or_ NULL if the - function will be discarded. For an example, see drivers/net/tg3.c. - o Do NOT mark a function if you are not sure which mark to use. Better to not mark the function than mark the function wrong. @@ -314,7 +296,7 @@ from the PCI device config space. Use the values in the pci_dev structure as the PCI "bus address" might have been remapped to a "host physical" address by the arch/chip-set specific kernel support. -See Documentation/IO-mapping.txt for how to access device registers +See Documentation/io-mapping.txt for how to access device registers or device memory. The device driver needs to call pci_request_region() to verify @@ -545,8 +527,9 @@ corresponding register block for you. 6. Other interesting functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -pci_find_slot() Find pci_dev corresponding to given bus and - slot numbers. +pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain, + bus and slot and number. If the device is + found, its reference count is increased. pci_set_power_state() Set PCI Power Management state (0=D0 ... 3=D3) pci_find_capability() Find specified capability in device's capability list. @@ -602,7 +585,8 @@ having sane locking. pci_find_device() Superseded by pci_get_device() pci_find_subsys() Superseded by pci_get_subsys() -pci_find_slot() Superseded by pci_get_slot() +pci_find_slot() Superseded by pci_get_domain_bus_and_slot() +pci_get_slot() Superseded by pci_get_domain_bus_and_slot() The alternative is the traditional PCI device driver that walks PCI |
