-rw-r--r--  docs/system/device-emulation.rst    1
-rw-r--r--  docs/system/devices/cxl.rst        302
2 files changed, 303 insertions, 0 deletions
diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index ae8dd23..3b729b9 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -84,6 +84,7 @@ Emulated Devices
devices/can.rst
devices/ccid.rst
+ devices/cxl.rst
devices/ivshmem.rst
devices/net.rst
devices/nvme.rst
diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
new file mode 100644
index 0000000..9293cbf
--- /dev/null
+++ b/docs/system/devices/cxl.rst
@@ -0,0 +1,302 @@
+Compute Express Link (CXL)
+==========================
+From the view of a single host, CXL is an interconnect standard that
+targets accelerators and memory devices attached to a CXL host.
+This description will focus on those aspects visible either to
+software running on a QEMU emulated host or to the internals of
+functional emulation. As such, it will skip over many of the
+electrical and protocol elements that are of more interest
+for real hardware and that dominate more general introductions to CXL.
+It will also completely ignore the fabric management aspects of CXL
+by considering only a single host and a static configuration.
+
+CXL shares many concepts and much of the infrastructure of PCI Express:
+CXL Host Bridges have CXL Root Ports, which may be directly
+attached to CXL or PCI Endpoints. Alternatively there may be CXL Switches
+with CXL and PCI Endpoints attached below them. In many cases additional
+control and capabilities are exposed via PCI Express interfaces.
+This sharing of interfaces, and hence of emulation code, is reflected
+in how the devices are emulated in QEMU. In most cases the various
+CXL elements are built upon equivalent PCIe devices.
+
+CXL devices support the following interfaces:
+
+* Most conventional PCIe interfaces
+
+ - Configuration space access
+ - BAR mapped memory accesses used for registers and mailboxes.
+ - MSI/MSI-X
+ - AER
+ - DOE mailboxes
+ - IDE
+  - Many other PCI Express defined interfaces.
+
+* Memory operations
+
+ - Equivalent of accessing DRAM / NVDIMMs. Any access / feature
+ supported by the host for normal memory should also work for
+ CXL attached memory devices.
+
+* Cache operations. These are mostly irrelevant to QEMU emulation as
+ QEMU is not emulating a coherency protocol. Any emulation related
+ to these will be device specific and is out of the scope of this
+ document.
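+
+As an illustration, many of the PCIe level interfaces listed above can
+be inspected from a Linux guest with standard pciutils tooling. The BDF
+below is a hypothetical example, and the amount of DVSEC decoding shown
+depends on the lspci version::
+
+  # Dump extended configuration space, including any CXL DVSECs,
+  # of a (hypothetical) CXL device at 0d:00.0 inside the guest.
+  lspci -vvv -s 0d:00.0
+  # Raw hex dump of configuration space for manual decoding.
+  lspci -xxxx -s 0d:00.0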
+
+CXL 2.0 Device Types
+--------------------
+CXL 2.0 Endpoints are often categorized into three types.
+
+**Type 1:** These support coherent caching of host memory. An example
+might be a crypto accelerator. They may also have device private memory,
+accessible via means such as PCI memory reads and writes to BARs.
+
+**Type 2:** These support coherent caching of host memory and host
+managed device memory (HDM) for which the coherency protocol is managed
+by the host. This is a complex topic, so for more information on CXL
+coherency see the CXL 2.0 specification.
+
+**Type 3 Memory devices:** These devices act as a means of attaching
+additional memory (HDM) to a CXL host, including both volatile and
+persistent memory. The CXL topology may support interleaving across a
+number of Type 3 memory devices using HDM Decoders in the host, host
+bridge, switch upstream port and endpoints.
+
+Scope of CXL emulation in QEMU
+------------------------------
+The focus of CXL emulation is CXL revision 2.0 and later. Earlier CXL
+revisions defined a smaller set of features, leaving much of the control
+interface as implementation defined or device specific. That makes generic
+emulation challenging, as host specific firmware is responsible
+for setup and the endpoints are presented to operating systems
+as Root Complex Integrated Endpoints. CXL rev 2.0 looks a lot
+more like PCI Express, with fully specified discoverability
+of the CXL topology.
+
+CXL System components
+----------------------
+A CXL system is made up of a Host with a number of 'standard components',
+the control and capabilities of which are discoverable by system software
+using means described in the CXL 2.0 specification.
+
+CXL Fixed Memory Windows (CFMW)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A CFMW consists of a particular range of Host Physical Address space
+which is routed to particular CXL Host Bridges. At the time of generic
+software initialization it will have a particular interleaving
+configuration and an associated Quality of Service Throttling Group (QTG).
+This information is available to system software when making
+decisions about how to configure interleave across available CXL
+memory devices. It is provided as CFMW Structures (CFMWS) in
+the CXL Early Discovery Table (CEDT), an ACPI table.
+
+Note: QTG 0 is the only one currently supported in QEMU.
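+
+For reference, CFMWs are declared on the QEMU command line; the
+following single target and dual target windows are taken verbatim from
+the example command lines later in this document::
+
+  # One window routed only to the host bridge with id cxl.1
+  -cxl-fixed-memory-window targets.0=cxl.1,size=4G
+  # One window interleaved across two host bridges at 8k granularity
+  -cxl-fixed-memory-window targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k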
+
+CXL Host Bridge (CXL HB)
+~~~~~~~~~~~~~~~~~~~~~~~~
+A CXL host bridge is similar to the PCIe equivalent, but with a
+specification defined register interface called CXL Host Bridge
+Component Registers (CHBCR). The location of this CHBCR MMIO
+space is described to system software via a CXL Host Bridge
+Structure (CHBS) in the CEDT ACPI table. The actual interfaces
+are identical to those used for other parts of the CXL hierarchy
+as CXL Component Registers in PCI BARs.
+
+Interfaces provided include:
+
+* Configuration of HDM Decoders to route CXL Memory accesses with
+  a particular Host Physical Address range to the target port
+ below which the CXL device servicing that address lies. This
+ may be a mapping to a single Root Port (RP) or across a set of
+ target RPs.
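+
+In QEMU, a CXL Host Bridge is created with the pxb-cxl device, as in
+the example command lines later in this document::
+
+  # PCI expander bridge providing CXL Host Bridge cxl.1 on root bus pcie.0
+  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1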
+
+CXL Root Ports (CXL RP)
+~~~~~~~~~~~~~~~~~~~~~~~
+A CXL Root Port serves the same purpose as a PCIe Root Port.
+There are a number of CXL specific Designated Vendor Specific
+Extended Capabilities (DVSEC) in PCIe Configuration Space
+and associated component register access via PCI BARs.
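+
+The corresponding QEMU device is cxl-rp; for example, taken from the
+example command lines later in this document::
+
+  # CXL Root Port below host bridge cxl.1
+  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2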
+
+CXL Switch
+~~~~~~~~~~
+Not yet implemented in QEMU.
+
+Here we consider a simple CXL switch with only a single
+virtual hierarchy. Whilst more complex devices exist, their
+visibility to a particular host is generally the same as for
+a simple switch design. Hosts often have no awareness
+of complex rerouting and device pooling; they simply see
+devices being hot added or hot removed.
+
+A CXL switch has a similar architecture to those in PCIe,
+with a single upstream port, internal PCI bus and multiple
+downstream ports.
+
+Both the CXL upstream and downstream ports have CXL specific
+DVSECs in configuration space, and component registers in PCI
+BARs. The Upstream Port has the configuration interfaces for
+the HDM decoders which route incoming memory accesses to the
+appropriate downstream port.
+
+CXL Memory Devices - Type 3
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+CXL type 3 devices use a PCI class code and are intended to be supported
+by a generic operating system driver. They have HDM decoders,
+though in these endpoint devices the decoder is responsible not for
+routing but for translation of the incoming Host Physical Address (HPA)
+into a Device Physical Address (DPA).
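+
+In QEMU, a Type 3 device is backed by one host memory backend for the
+persistent memory itself and one for the Label Storage Area (LSA), as
+in the example command lines later in this document::
+
+  # Backing file for the persistent memory and another for the LSA
+  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
+  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  # The Type 3 device itself, attached below CXL Root Port root_port13
+  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0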
+
+CXL Memory Interleave
+---------------------
+To understand the interaction of different CXL hardware components which
+are emulated in QEMU, let us consider a memory read in a fully configured
+CXL topology. Note that system software is responsible for configuration
+of all components with the exception of the CFMWs. It is also
+responsible for allocating appropriate ranges from within the CFMWs
+and exposing those via normal memory configurations as would be done
+for system RAM.
+
+Example system topology. x marks the match at each decoder level::
+
+ |<------------------SYSTEM PHYSICAL ADDRESS MAP (1)---------------->|
+ |    __________   __________________________________   __________   |
+ |   |          | |                                  | |          |  |
+ |   | CFMW 0   | | CXL Fixed Memory Window 1        | | CFMW 2   |  |
+ |   | HB0 only | | Configured to interleave memory  | | HB1 only |  |
+ |   |          | | accesses across HB0/HB1          | |          |  |
+ |   |__________| |_____x____________________________| |__________|  |
+ |                      |                                            |
+ |                      |                                            |
+ |                      |                                            |
+ |  Interleave Decoder  |                                            |
+ |   Matches this HB    |                                            |
+  \______________|                                  |_______________/
+       __________|__________                   _____|_______________
+      |                     |                 |                     |
+  (2) | CXL HB 0            |                 | CXL HB 1            |
+      | HB IntLv Decoders   |                 | HB IntLv Decoders   |
+      | PCI/CXL Root Bus 0c |                 | PCI/CXL Root Bus 0d |
+      |                     |                 |                     |
+      |___x_________________|                 |_____________________|
+          |              |                       |             |
+          |              |                       |             |
+  A HB 0 HDM Decoder     |                       |             |
+   matches this Port     |                       |             |
+          |              |                       |             |
+     _____|_________    _|___________    ________|___    ______|______
+ (3)| Root Port 0   |  | Root Port 1 |  | Root Port 2|  | Root Port 3 |
+    | Appears in    |  | Appears in  |  | Appears in |  | Appears in  |
+    | PCI topology  |  | PCI topology|  | PCI topo   |  | PCI topo    |
+    | as 0c:00.0    |  | as 0c:01.0  |  | as de:00.0 |  | as de:01.0  |
+    |_______________|  |_____________|  |____________|  |_____________|
+          |                   |                |               |
+          |                   |                |               |
+     _____|_________     _____|_______    _____|______    _____|________
+ (4)| x             |  |             |  |            |  |              |
+    | CXL Type3 0   |  | CXL Type3 1 |  | CXL Type3 2|  | CXL Type3 3  |
+    |               |  |             |  |            |  |              |
+    | PMEM0(Vol LSA)|  | PMEM1 (...) |  | PMEM2 (...)|  | PMEM3 (...)  |
+    | Decoder to go |  |             |  |            |  |              |
+    | from host PA  |  | PCI 0e:00.0 |  | PCI df:00.0|  | PCI e0:00.0  |
+    | to device PA  |  |             |  |            |  |              |
+    | PCI as 0d:00.0|  |             |  |            |  |              |
+    |_______________|  |_____________|  |____________|  |______________|
+
+Notes:
+
+(1) **3 CXL Fixed Memory Windows (CFMW)** corresponding to different
+ ranges of the system physical address map. Each CFMW has
+    a particular interleave setup across the CXL Host Bridges (HB).
+    CFMW0 provides uninterleaved access to HB0, CFMW2 provides
+    uninterleaved access to HB1, and CFMW1 provides interleaved memory
+    access across HB0 and HB1.
+
+(2) **Two CXL Host Bridges**. Each of these has 2 CXL Root Ports and
+ programmable HDM decoders to route memory accesses either to
+ a single port or interleave them across multiple ports.
+    A complex configuration here might be to use the following HDM
+    decoders in HB0. HDM0 routes CFMW0 requests to RP0 and hence
+    part of CXL Type3 0. HDM1 routes CFMW0 requests from a
+    different region of the CFMW0 PA range to RP1 and hence part
+    of CXL Type3 1. HDM2 routes yet another PA range from within
+    CFMW0 to be interleaved across RP0 and RP1, providing 2 way
+    interleave of part of the memory provided by CXL Type3 0 and
+    CXL Type3 1. HDM3 routes those interleaved accesses from
+    CFMW1 that target HB0 to RP0 and another part of the memory of
+    CXL Type3 0 (as part of a 2 way interleave at the system level
+    across, for example, CXL Type3 0 and CXL Type3 2).
+    HDM4 is used to enable system wide 4 way interleave across all
+    the present CXL Type3 devices, by interleaving those (already
+    interleaved) requests that HB0 receives from CFMW1 across RP0
+    and RP1, and hence to yet more regions of the memory of the
+ attached Type3 devices. Note this is a representative subset
+ of the full range of possible HDM decoder configurations in this
+ topology.
+
+(3) **Four CXL Root Ports.** In this case the CXL Type 3 devices are
+ directly attached to these ports.
+
+(4) **Four CXL Type3 memory expansion devices.** These will each have
+ HDM decoders, but in this case rather than performing interleave
+ they will take the Host Physical Addresses of accesses and map
+ them to their own local Device Physical Address Space (DPA).
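+
+To make the translation in note (4) concrete, the following is a
+minimal shell arithmetic sketch (not QEMU code) of the standard
+power-of-2 modulo decode, assuming a 2 way interleave at 8KiB
+granularity; the example offset is illustrative::
+
+  WAYS=2
+  GRANULARITY=$((8 * 1024))
+  HPA_OFFSET=$((0x6000))   # offset of the access within the decoded range
+  # Which of the interleaved targets services this access.
+  POSITION=$(( (HPA_OFFSET / GRANULARITY) % WAYS ))
+  # Offset within that target's Device Physical Address space.
+  DPA=$(( (HPA_OFFSET / (GRANULARITY * WAYS)) * GRANULARITY + HPA_OFFSET % GRANULARITY ))
+  echo "position $POSITION, DPA offset $DPA"   # position 1, DPA offset 8192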
+
+Example command lines
+---------------------
+A very simple setup with just one directly attached CXL Type 3 device::
+
+ qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+ ...
+ -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
+ -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
+ -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+ -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
+ -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
+ -cxl-fixed-memory-window targets.0=cxl.1,size=4G
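+
+The backing files referenced by the memory backends must exist and be
+of the expected size before QEMU is started. One way to create them
+(an assumption about the host setup, not part of the example itself)::
+
+  truncate --size 256M /tmp/cxltest.raw
+  truncate --size 256M /tmp/lsa.raw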
+
+A setup suitable for 4 way interleave. Only one fixed window is provided,
+enabling 2 way interleave across the 2 CXL host bridges. Each host bridge
+has 2 CXL Root Ports, with a CXL Type3 device directly attached to each
+(no switches)::
+
+ qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+ ...
+ -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
+ -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
+ -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
+ -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M \
+ -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
+ -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
+ -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
+ -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M \
+ -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+ -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
+ -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
+ -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
+ -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
+ -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
+ -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
+ -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
+ -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
+ -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
+ -cxl-fixed-memory-window targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k
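+
+Once a guest kernel with the options below has enumerated the devices,
+the topology can be sanity checked from within the guest. A minimal
+sketch, assuming a Linux guest where the CXL core has bound to the
+emulated devices::
+
+  # Devices registered on the CXL bus; the four Type 3 devices
+  # should appear as memdevs (mem0..mem3).
+  ls /sys/bus/cxl/devices/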
+
+Kernel Configuration Options
+----------------------------
+
+In Linux 5.18 the following options are necessary to make use of
+OS management of CXL memory devices as described here.
+
+* CONFIG_CXL_BUS
+* CONFIG_CXL_PCI
+* CONFIG_CXL_ACPI
+* CONFIG_CXL_PMEM
+* CONFIG_CXL_MEM
+* CONFIG_CXL_PORT
+* CONFIG_CXL_REGION
+
+References
+----------
+
+ - Consortium website for specifications etc.:
+   http://www.computeexpresslink.org
+ - Compute Express Link (CXL) Specification, Revision 2.0, October 2020
+ - CEDT CFMWS & QTG _DSM ECN, May 2021