================================
Writing an LLVM Compiler Backend
================================

.. sectionauthor:: Mason Woo <http://www.woo.com> and Misha Brukman <http://misha.brukman.net>

.. contents::
   :local:

Introduction
============

This document describes techniques for writing compiler backends that convert
the LLVM Intermediate Representation (IR) to code for a specified machine or
other languages. Code intended for a specific machine can take the form of
either assembly code or binary code (usable for a JIT compiler).

The backend of LLVM features a target-independent code generator that may
create output for several types of target CPUs --- including X86, PowerPC,
ARM, and SPARC. The backend may also be used to generate code targeted at SPUs
of the Cell processor or GPUs to support the execution of compute kernels.

This document draws on existing examples found in subdirectories of
``llvm/lib/Target`` in a downloaded LLVM release. In particular, it focuses on
the example of creating a static compiler (one that emits text assembly) for a
SPARC target, because SPARC has fairly standard characteristics, such as a
RISC instruction set and straightforward calling conventions.

Audience
--------

The audience for this document is anyone who needs to write an LLVM backend to
generate code for a specific hardware or software target.

Prerequisite Reading
--------------------

Read these essential documents before reading this document:

* `LLVM Language Reference Manual <LangRef.html>`_ --- a reference manual for
  the LLVM assembly language.

* :doc:`CodeGenerator` --- a guide to the components (classes and code
  generation algorithms) for translating the LLVM internal representation into
  machine code for a specified target. Pay particular attention to the
  descriptions of code generation stages: Instruction Selection, Scheduling and
  Formation, SSA-based Optimization, Register Allocation, Prolog/Epilog Code
  Insertion, Late Machine Code Optimizations, and Code Emission.

* :doc:`TableGenFundamentals` --- a document that describes the TableGen
  (``tblgen``) application that manages domain-specific information to support
  LLVM code generation. TableGen processes input from a target description
  file (``.td`` suffix) and generates C++ code that can be used for code
  generation.

* `Writing an LLVM Pass <WritingAnLLVMPass.html>`_ --- the assembly printer is
  a ``FunctionPass``, as are several SelectionDAG processing steps.

To follow the SPARC examples in this document, have a copy of `The SPARC
Architecture Manual, Version 8 <http://www.sparc.org/standards/V8.pdf>`_ for
reference. For details about the ARM instruction set, refer to the `ARM
Architecture Reference Manual <http://infocenter.arm.com/>`_. For more about
the GNU Assembler format (``GAS``), see `Using As
<http://sourceware.org/binutils/docs/as/index.html>`_, especially for the
assembly printer; it contains a list of target-machine-dependent features.

Basic Steps
-----------

To write a compiler backend for LLVM that converts the LLVM IR to code for a
specified target (machine or other language), follow these steps:

* Create a subclass of the ``TargetMachine`` class that describes
  characteristics of your target machine. Copy an existing target's
  ``TargetMachine`` class and header files; for example, start with
  ``SparcTargetMachine.cpp`` and ``SparcTargetMachine.h``, but change the file
  names for your target.
  Similarly, change code that references "``Sparc``" to reference your target.

* Describe the register set of the target. Use TableGen to generate code for
  register definitions, register aliases, and register classes from a
  target-specific ``RegisterInfo.td`` input file. You should also write
  additional code for a subclass of the ``TargetRegisterInfo`` class that
  represents the register file data used for register allocation and also
  describes the interactions between registers.

* Describe the instruction set of the target. Use TableGen to generate code
  for target-specific instructions from target-specific versions of
  ``TargetInstrFormats.td`` and ``TargetInstrInfo.td``. You should write
  additional code for a subclass of the ``TargetInstrInfo`` class to represent
  machine instructions supported by the target machine.

* Describe the selection and conversion of the LLVM IR from a Directed Acyclic
  Graph (DAG) representation of instructions to native target-specific
  instructions. Use TableGen to generate code that matches patterns and
  selects instructions based on additional information in a target-specific
  version of ``TargetInstrInfo.td``. Write code for ``XXXISelDAGToDAG.cpp``,
  where ``XXX`` identifies the specific target, to perform pattern matching and
  DAG-to-DAG instruction selection. Also write code in ``XXXISelLowering.cpp``
  to replace or remove operations and data types that are not supported
  natively in a SelectionDAG.

* Write code for an assembly printer that converts LLVM IR to a GAS format for
  your target machine. You should add assembly strings to the instructions
  defined in your target-specific version of ``TargetInstrInfo.td``. You
  should also write code for a subclass of ``AsmPrinter`` that performs the
  LLVM-to-assembly conversion and a trivial subclass of ``TargetAsmInfo``.

* Optionally, add support for subtargets (i.e., variants with different
  capabilities).
  You should also write code for a subclass of the
  ``TargetSubtarget`` class, which allows you to use the ``-mcpu=`` and
  ``-mattr=`` command-line options.

* Optionally, add JIT support and create a machine code emitter (subclass of
  ``TargetJITInfo``) that is used to emit binary code directly into memory.

In the ``.cpp`` and ``.h`` files, initially stub up these methods and then
implement them later. Initially, you may not know which private members the
class will need and which components will need to be subclassed.

Preliminaries
-------------

To actually create your compiler backend, you need to create and modify a few
files. The absolute minimum is discussed here. But to actually use the LLVM
target-independent code generator, you must perform the steps described in the
:doc:`LLVM Target-Independent Code Generator <CodeGenerator>` document.

First, you should create a subdirectory under ``lib/Target`` to hold all the
files related to your target. If your target is called "Dummy", create the
directory ``lib/Target/Dummy``.

In this new directory, create a ``Makefile``. It is easiest to copy a
``Makefile`` of another target and modify it. It should at least contain the
``LEVEL``, ``LIBRARYNAME`` and ``TARGET`` variables, and then include
``$(LEVEL)/Makefile.common``. The library can be named ``LLVMDummy`` (for
example, see the MIPS target). Alternatively, you can split the library into
``LLVMDummyCodeGen`` and ``LLVMDummyAsmPrinter``, the latter of which should be
implemented in a subdirectory below ``lib/Target/Dummy`` (for example, see the
PowerPC target).

Note that these two naming schemes are hardcoded into ``llvm-config``. Using
any other naming scheme will confuse ``llvm-config`` and produce a lot of
(seemingly unrelated) linker errors when linking ``llc``.

To make your target actually do something, you need to implement a subclass of
``TargetMachine``.
This implementation should typically be in the file
``lib/Target/Dummy/DummyTargetMachine.cpp``, but any file in the
``lib/Target/Dummy`` directory will be built and should work. To use LLVM's
target-independent code generator, you should do what all current machine
backends do: create a subclass of ``LLVMTargetMachine``. (To create a target
from scratch, create a subclass of ``TargetMachine``.)

To get LLVM to actually build and link your target, you need to add it to the
``TARGETS_TO_BUILD`` variable. To do this, modify the configure script so that
it recognizes your target when parsing the ``--enable-targets`` option. Search
the configure script for ``TARGETS_TO_BUILD``, add your target to the lists
there (some creativity required), and then reconfigure. Alternatively, you can
change ``autoconf/configure.ac`` and regenerate configure by running
``./autoconf/AutoRegen.sh``.

Target Machine
==============

``LLVMTargetMachine`` is designed as a base class for targets implemented with
the LLVM target-independent code generator. The ``LLVMTargetMachine`` class
should be specialized by a concrete target class that implements the various
virtual methods. ``LLVMTargetMachine`` is defined as a subclass of
``TargetMachine`` in ``include/llvm/Target/TargetMachine.h``. The
``TargetMachine`` class implementation (``TargetMachine.cpp``) also processes
numerous command-line options.

To create a concrete target-specific subclass of ``LLVMTargetMachine``, start
by copying an existing ``TargetMachine`` class and header. You should name the
files that you create to reflect your specific target. For instance, for the
SPARC target, name the files ``SparcTargetMachine.h`` and
``SparcTargetMachine.cpp``.

For a target machine ``XXX``, the implementation of ``XXXTargetMachine`` must
have access methods to obtain objects that represent target components.
These methods are named ``get*Info`` and are intended to obtain the instruction
set (``getInstrInfo``), register set (``getRegisterInfo``), stack frame layout
(``getFrameInfo``), and similar information. ``XXXTargetMachine`` must also
implement the ``getDataLayout`` method to access an object with target-specific
data characteristics, such as data type size and alignment requirements.

For instance, for the SPARC target, the header file ``SparcTargetMachine.h``
declares prototypes for several ``get*Info`` and ``getDataLayout`` methods that
simply return a class member.

.. code-block:: c++

  namespace llvm {

  class Module;

  class SparcTargetMachine : public LLVMTargetMachine {
    const DataLayout DL;       // Calculates type size & alignment
    SparcSubtarget Subtarget;
    SparcInstrInfo InstrInfo;
    TargetFrameInfo FrameInfo;

  protected:
    virtual const TargetAsmInfo *createTargetAsmInfo() const;

  public:
    SparcTargetMachine(const Module &M, const std::string &FS);

    virtual const SparcInstrInfo *getInstrInfo() const { return &InstrInfo; }
    virtual const TargetFrameInfo *getFrameInfo() const { return &FrameInfo; }
    virtual const TargetSubtarget *getSubtargetImpl() const { return &Subtarget; }
    virtual const TargetRegisterInfo *getRegisterInfo() const {
      return &InstrInfo.getRegisterInfo();
    }
    virtual const DataLayout *getDataLayout() const { return &DL; }
    static unsigned getModuleMatchQuality(const Module &M);

    // Pass Pipeline Configuration
    virtual bool addInstSelector(PassManagerBase &PM, bool Fast);
    virtual bool addPreEmitPass(PassManagerBase &PM, bool Fast);
  };

  } // end namespace llvm

Every target must implement the following methods:

* ``getInstrInfo()``
* ``getRegisterInfo()``
* ``getFrameInfo()``
* ``getDataLayout()``
* ``getSubtargetImpl()``

For some targets, you also need to support the following methods:

* ``getTargetLowering()``
* ``getJITInfo()``

In addition, the ``XXXTargetMachine`` constructor should specify a
``TargetDescription`` string that determines the data layout for the target
machine, including characteristics such as pointer size, alignment, and
endianness. For example, the constructor for ``SparcTargetMachine`` contains
the following:

.. code-block:: c++

  SparcTargetMachine::SparcTargetMachine(const Module &M, const std::string &FS)
    : DL("E-p:32:32-f128:128:128"),
      Subtarget(M, FS), InstrInfo(Subtarget),
      FrameInfo(TargetFrameInfo::StackGrowsDown, 8, 0) {
  }

Hyphens separate portions of the ``TargetDescription`` string.

* An upper-case "``E``" in the string indicates a big-endian target data model.
  A lower-case "``e``" indicates little-endian.

* "``p:``" is followed by pointer information: size, ABI alignment, and
  preferred alignment. If only two figures follow "``p:``", then the first
  value is pointer size, and the second value is both ABI and preferred
  alignment.

* Then a letter for numeric type alignment: "``i``", "``f``", "``v``", or
  "``a``" (corresponding to integer, floating point, vector, or aggregate).
  "``i``", "``f``", and "``v``" are followed by the size of the type in bits,
  then its ABI alignment and preferred alignment. For example,
  "``f128:128:128``" gives a 128-bit floating-point type (such as a 128-bit
  ``long double``) 128-bit ABI and preferred alignment. "``a``" specifies the
  ABI and preferred alignment of aggregate types.

Target Registration
===================

You must also register your target with the ``TargetRegistry``, which other
LLVM tools use to look up and use your target at runtime. The
``TargetRegistry`` can be used directly, but for most targets there are helper
templates that take care of the work for you.

All targets should declare a global ``Target`` object, which is used to
represent the target during registration. Then, in the target's ``TargetInfo``
library, the target should define that object and use the ``RegisterTarget``
template to register the target. For example, the Sparc registration code
looks like this:

.. code-block:: c++

  Target llvm::TheSparcTarget;

  extern "C" void LLVMInitializeSparcTargetInfo() {
    RegisterTarget<Triple::sparc, /*HasJIT=*/false>
      X(TheSparcTarget, "sparc", "Sparc");
  }

This allows the ``TargetRegistry`` to look up the target by name or by target
triple. In addition, most targets also register additional features that are
available in separate libraries. These registration steps are separate because
some clients may wish to link in only some parts of the target; for example,
the JIT code generator does not require the assembly printer. Here is an
example of registering the Sparc assembly printer:

.. code-block:: c++

  extern "C" void LLVMInitializeSparcAsmPrinter() {
    RegisterAsmPrinter<SparcAsmPrinter> X(TheSparcTarget);
  }

For more information, see "`llvm/Target/TargetRegistry.h
</doxygen/TargetRegistry_8h-source.html>`_".

Register Set and Register Classes
=================================

You should describe a concrete target-specific class that represents the
register file of a target machine. This class is called ``XXXRegisterInfo``
(where ``XXX`` identifies the target) and represents the register file data
that is used for register allocation. It also describes the interactions
between registers.

You also need to define register classes to categorize related registers. A
register class should be added for groups of registers that are all treated the
same way for some instruction. Typical examples are register classes for
integer, floating-point, or vector registers. An instruction operand that is
constrained to a register class may use any register in that class.
Instructions are first given virtual registers drawn from these classes, and
the target-independent register allocator then automatically chooses the actual
physical registers.
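
To make this division of labor concrete, here is a minimal standalone C++
sketch that models a register class as an ordered list of physical registers
and hands out the first one that is not already in use. ``RegClassModel`` and
``pickPhysReg`` are hypothetical names invented for this illustration, not LLVM
APIs; a real allocator also accounts for liveness, register aliases, and spill
costs.

.. code-block:: c++

  #include <cassert>
  #include <optional>
  #include <set>
  #include <string>
  #include <vector>

  // Hypothetical, simplified model of a register class: a name plus the
  // physical registers it contains, listed in allocation order.
  // This is NOT LLVM's TargetRegisterClass API.
  struct RegClassModel {
    std::string Name;
    std::vector<std::string> Members;  // allocation order
  };

  // Returns the first member of RC that is not in InUse, or std::nullopt
  // when every register in the class is taken (a real allocator would
  // then have to spill).
  std::optional<std::string> pickPhysReg(const RegClassModel &RC,
                                         const std::set<std::string> &InUse) {
    assert(!RC.Members.empty() && "a register class must not be empty");
    for (const std::string &Reg : RC.Members)
      if (InUse.count(Reg) == 0)
        return Reg;
    return std::nullopt;
  }

For example, with a class ``{L0, L1, L2}`` and ``L0`` already in use,
``pickPhysReg`` returns ``L1``; once all three registers are taken, it returns
an empty ``std::optional``.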

Much of the code for registers, including register definition, register
aliases, and register classes, is generated by TableGen from
``XXXRegisterInfo.td`` input files and placed in ``XXXGenRegisterInfo.h.inc``
and ``XXXGenRegisterInfo.inc`` output files. Some of the code in the
implementation of ``XXXRegisterInfo`` requires hand-coding.

Defining a Register
-------------------

The ``XXXRegisterInfo.td`` file typically starts with register definitions for
a target machine. The ``Register`` class (specified in ``Target.td``) is used
to define an object for each register. The specified string ``n`` becomes the
``Name`` of the register. The basic ``Register`` object does not have any
subregisters and does not specify any aliases.

.. code-block:: llvm

  class Register<string n> {
    string Namespace = "";
    string AsmName = n;
    string Name = n;
    int SpillSize = 0;
    int SpillAlignment = 0;
    list<Register> Aliases = [];
    list<Register> SubRegs = [];
    list<int> DwarfNumbers = [];
  }

For example, in the ``X86RegisterInfo.td`` file, there are register definitions
that utilize the ``Register`` class, such as:

.. code-block:: llvm

  def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>;

This defines the register ``AL`` and assigns it values (with ``DwarfRegNum``)
that are used by ``gcc``, ``gdb``, or a debug information writer to identify a
register. For register ``AL``, ``DwarfRegNum`` takes an array of 3 values
representing 3 different modes: the first element is for X86-64, the second for
exception handling (EH) on X86-32, and the third is generic. -1 is a special
DWARF number that indicates the gcc number is undefined, and -2 indicates the
register number is invalid for this mode.

From the previously described line in the ``X86RegisterInfo.td`` file, TableGen
generates this code in the ``X86GenRegisterInfo.inc`` file:

.. code-block:: c++

  static const unsigned GR8[] = { X86::AL, ... };

  const unsigned AL_AliasSet[] = { X86::AX, X86::EAX, X86::RAX, 0 };

  const TargetRegisterDesc RegisterDescriptors[] = {
    ...
    { "AL", "AL", AL_AliasSet, Empty_SubRegsSet, Empty_SubRegsSet, AL_SuperRegsSet }, ...

From the register info file, TableGen generates a ``TargetRegisterDesc`` object
for each register. ``TargetRegisterDesc`` is defined in
``include/llvm/Target/TargetRegisterInfo.h`` with the following fields:

.. code-block:: c++

  struct TargetRegisterDesc {
    const char     *AsmName;      // Assembly language name for the register
    const char     *Name;         // Printable name for the reg (for debugging)
    const unsigned *AliasSet;     // Register Alias Set
    const unsigned *SubRegs;      // Sub-register set
    const unsigned *ImmSubRegs;   // Immediate sub-register set
    const unsigned *SuperRegs;    // Super-register set
  };

TableGen uses the entire target description file (``.td``) to determine text
names for the register (in the ``AsmName`` and ``Name`` fields of
``TargetRegisterDesc``) and the relationships of other registers to the defined
register (in the other ``TargetRegisterDesc`` fields). In this example, other
definitions establish the registers "``AX``", "``EAX``", and "``RAX``" as
aliases for one another, so TableGen generates a zero-terminated array
(``AL_AliasSet``) for this register alias set.

The ``Register`` class is commonly used as a base class for more complex
classes. In ``Target.td``, the ``Register`` class is the base for the
``RegisterWithSubRegs`` class that is used to define registers that need to
specify subregisters in the ``SubRegs`` list, as shown here:

.. code-block:: llvm

  class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> {
    let SubRegs = subregs;
  }

In ``SparcRegisterInfo.td``, additional TableGen classes are defined for SPARC
registers: a ``Register`` subclass, ``SparcReg``, and further subclasses:
``Ri``, ``Rf``, and ``Rd``.
SPARC registers are identified by 5-bit ID numbers, which is a
feature common to these subclasses. Note the use of "``let``" expressions to
override values that are initially defined in a superclass (such as the
``SubRegs`` field in the ``Rd`` class).

.. code-block:: llvm

  class SparcReg<string n> : Register<n> {
    field bits<5> Num;
    let Namespace = "SP";
  }
  // Ri - 32-bit integer registers
  class Ri<bits<5> num, string n> : SparcReg<n> {
    let Num = num;
  }
  // Rf - 32-bit floating-point registers
  class Rf<bits<5> num, string n> : SparcReg<n> {
    let Num = num;
  }
  // Rd - Slots in the FP register file for 64-bit floating-point values.
  class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> {
    let Num = num;
    let SubRegs = subregs;
  }

In the ``SparcRegisterInfo.td`` file, there are register definitions that
utilize these subclasses of ``Register``, such as:

.. code-block:: llvm

  def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>;
  def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>;
  ...
  def F0 : Rf< 0, "F0">, DwarfRegNum<[32]>;
  def F1 : Rf< 1, "F1">, DwarfRegNum<[33]>;
  ...
  def D0 : Rd< 0, "F0", [F0, F1]>, DwarfRegNum<[32]>;
  def D1 : Rd< 2, "F2", [F2, F3]>, DwarfRegNum<[34]>;

The last two registers shown above (``D0`` and ``D1``) are double-precision
floating-point registers that are aliases for pairs of single-precision
floating-point sub-registers. In addition to aliases, the sub-register and
super-register relationships of the defined register are in fields of a
register's ``TargetRegisterDesc``.

Defining a Register Class
-------------------------

The ``RegisterClass`` class (specified in ``Target.td``) is used to define an
object that represents a group of related registers and also defines the
default allocation order of the registers. A target description file
``XXXRegisterInfo.td`` that uses ``Target.td`` can construct register classes
using the following class:

.. code-block:: llvm

  class RegisterClass<string namespace,
                      list<ValueType> regTypes, int alignment, dag regList> {
    string Namespace = namespace;
    list<ValueType> RegTypes = regTypes;
    int Size = 0;  // spill size, in bits; zero lets tblgen pick the size
    int Alignment = alignment;

    // CopyCost is the cost of copying a value between two registers
    // default value 1 means a single instruction
    // A negative value means copying is extremely expensive or impossible
    int CopyCost = 1;
    dag MemberList = regList;

    // for register classes that are subregisters of this class
    list<RegisterClass> SubRegClassList = [];

    code MethodProtos = [{}];  // to insert arbitrary code
    code MethodBodies = [{}];
  }

To define a ``RegisterClass``, use the following four arguments:

* The first argument of the definition is the name of the namespace.

* The second argument is a list of ``ValueType`` register type values that are
  defined in ``include/llvm/CodeGen/ValueTypes.td``. Defined values include
  integer types (such as ``i16``, ``i32``, and ``i1`` for Boolean),
  floating-point types (``f32``, ``f64``), and vector types (for example,
  ``v8i16`` for an ``8 x i16`` vector). All registers in a ``RegisterClass``
  support the same set of value types, but that set may contain more than one
  type. For example, a register that can process a 128-bit vector may be able
  to hold 16 8-bit integer elements, 8 16-bit integers, 4 32-bit integers, and
  so on.

* The third argument of the ``RegisterClass`` definition specifies the
  alignment required of the registers when they are stored to or loaded from
  memory.

* The final argument, ``regList``, specifies which registers are in this class.
  If an alternative allocation order method is not specified, then ``regList``
  also defines the order of allocation used by the register allocator. Besides
Besides + simply listing registers with ``(add R0, R1, ...)``, more advanced set + operators are available. See ``include/llvm/Target/Target.td`` for more + information. + +In ``SparcRegisterInfo.td``, three ``RegisterClass`` objects are defined: +``FPRegs``, ``DFPRegs``, and ``IntRegs``. For all three register classes, the +first argument defines the namespace with the string "``SP``". ``FPRegs`` +defines a group of 32 single-precision floating-point registers (``F0`` to +``F31``); ``DFPRegs`` defines a group of 16 double-precision registers +(``D0-D15``). + +.. code-block:: llvm + + // F0, F1, F2, ..., F31 + def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>; + + def DFPRegs : RegisterClass<"SP", [f64], 64, + (add D0, D1, D2, D3, D4, D5, D6, D7, D8, + D9, D10, D11, D12, D13, D14, D15)>; + + def IntRegs : RegisterClass<"SP", [i32], 32, + (add L0, L1, L2, L3, L4, L5, L6, L7, + I0, I1, I2, I3, I4, I5, + O0, O1, O2, O3, O4, O5, O7, + G1, + // Non-allocatable regs: + G2, G3, G4, + O6, // stack ptr + I6, // frame ptr + I7, // return address + G0, // constant zero + G5, G6, G7 // reserved for kernel + )>; + +Using ``SparcRegisterInfo.td`` with TableGen generates several output files +that are intended for inclusion in other source code that you write. +``SparcRegisterInfo.td`` generates ``SparcGenRegisterInfo.h.inc``, which should +be included in the header file for the implementation of the SPARC register +implementation that you write (``SparcRegisterInfo.h``). In +``SparcGenRegisterInfo.h.inc`` a new structure is defined called +``SparcGenRegisterInfo`` that uses ``TargetRegisterInfo`` as its base. It also +specifies types, based upon the defined register classes: ``DFPRegsClass``, +``FPRegsClass``, and ``IntRegsClass``. + +``SparcRegisterInfo.td`` also generates ``SparcGenRegisterInfo.inc``, which is +included at the bottom of ``SparcRegisterInfo.cpp``, the SPARC register +implementation. 
The code below shows only the generated integer registers and
associated register classes. The order of registers in ``IntRegs`` reflects
the order in the definition of ``IntRegs`` in the target description file.

.. code-block:: c++

  // IntRegs Register Class...
  static const unsigned IntRegs[] = {
    SP::L0, SP::L1, SP::L2, SP::L3, SP::L4, SP::L5,
    SP::L6, SP::L7, SP::I0, SP::I1, SP::I2, SP::I3,
    SP::I4, SP::I5, SP::O0, SP::O1, SP::O2, SP::O3,
    SP::O4, SP::O5, SP::O7, SP::G1, SP::G2, SP::G3,
    SP::G4, SP::O6, SP::I6, SP::I7, SP::G0, SP::G5,
    SP::G6, SP::G7,
  };

  // IntRegsVTs Register Class Value Types...
  static const MVT::ValueType IntRegsVTs[] = {
    MVT::i32, MVT::Other
  };

  namespace SP {   // Register class instances
    DFPRegsClass    DFPRegsRegClass;
    FPRegsClass     FPRegsRegClass;
    IntRegsClass    IntRegsRegClass;
  ...
    // IntRegs Sub-register Classes...
    static const TargetRegisterClass* const IntRegsSubRegClasses [] = {
      NULL
    };
  ...
    // IntRegs Super-register Classes...
    static const TargetRegisterClass* const IntRegsSuperRegClasses [] = {
      NULL
    };
  ...
    // IntRegs Register Class sub-classes...
    static const TargetRegisterClass* const IntRegsSubclasses [] = {
      NULL
    };
  ...
    // IntRegs Register Class super-classes...
    static const TargetRegisterClass* const IntRegsSuperclasses [] = {
      NULL
    };

    IntRegsClass::IntRegsClass() : TargetRegisterClass(IntRegsRegClassID,
      IntRegsVTs, IntRegsSubclasses, IntRegsSuperclasses, IntRegsSubRegClasses,
      IntRegsSuperRegClasses, 4, 4, 1, IntRegs, IntRegs + 32) {}
  }

The register allocators will avoid using reserved registers, and callee-saved
registers are not used until all the volatile registers have been used. That
is usually good enough, but in some cases it may be necessary to provide custom
allocation orders.
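
The default policy described above can be pictured as a function that derives
an effective allocation order from a class's declared order: reserved registers
are dropped, and callee-saved registers are moved behind the volatile ones.
This is a hypothetical sketch (``effectiveOrder`` and the register names are
assumptions made for illustration), not a description of how LLVM's register
allocators are implemented internally.

.. code-block:: c++

  #include <cassert>
  #include <set>
  #include <string>
  #include <vector>

  // Hypothetical sketch: reserved registers are never handed out, and
  // callee-saved registers come after every volatile register. Within each
  // group, the declared order of the register class is preserved.
  std::vector<std::string>
  effectiveOrder(const std::vector<std::string> &Declared,
                 const std::set<std::string> &Reserved,
                 const std::set<std::string> &CalleeSaved) {
    std::vector<std::string> Volatile, Saved;
    for (const std::string &R : Declared) {
      if (Reserved.count(R) != 0)
        continue;                 // reserved: skip entirely
      if (CalleeSaved.count(R) != 0)
        Saved.push_back(R);       // used only after the volatile ones
      else
        Volatile.push_back(R);
    }
    std::vector<std::string> Order = Volatile;
    Order.insert(Order.end(), Saved.begin(), Saved.end());
    assert(Order.size() <= Declared.size());
    return Order;
  }

With a declared order ``R0, R1, R2, R3, SP``, ``SP`` reserved, and ``R0`` and
``R1`` callee-saved, the result is ``R2, R3, R0, R1``. A target that needs a
different policy is the case where a custom allocation order becomes necessary.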

Implement a subclass of ``TargetRegisterInfo``
----------------------------------------------

The final step is to hand-code portions of ``XXXRegisterInfo``, which
implements the interface described in ``TargetRegisterInfo.h`` (see
:ref:`TargetRegisterInfo`). These functions return ``0``, ``NULL``, or
``false``, unless overridden. Here is a list of functions that are overridden
for the SPARC implementation in ``SparcRegisterInfo.cpp``:

* ``getCalleeSavedRegs`` --- Returns a list of callee-saved registers in the
  order of the desired callee-save stack frame offset.

* ``getReservedRegs`` --- Returns a bitset indexed by physical register
  numbers, indicating if a particular register is unavailable.

* ``hasFP`` --- Returns a Boolean indicating if a function should have a
  dedicated frame pointer register.

* ``eliminateCallFramePseudoInstr`` --- If call frame setup or destroy pseudo
  instructions are used, this can be called to eliminate them.

* ``eliminateFrameIndex`` --- Eliminates abstract frame indices from
  instructions that may use them.

* ``emitPrologue`` --- Inserts prologue code into the function.

* ``emitEpilogue`` --- Inserts epilogue code into the function.

.. _instruction-set:

Instruction Set
===============

During the early stages of code generation, the LLVM IR code is converted to a
``SelectionDAG`` with nodes that are instances of the ``SDNode`` class
containing target instructions. An ``SDNode`` has an opcode, operands, type
requirements, and operation properties (for example, whether an operation is
commutative or whether it loads from memory). The various operation node
types are described in the ``include/llvm/CodeGen/SelectionDAGNodes.h`` file
(values of the ``NodeType`` enum in the ``ISD`` namespace).
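
As a rough mental model only, the following standalone C++ sketch shows a node
that carries an opcode, operands, and a result type, together with property
queries such as commutativity. ``ToyNode``, ``Opcode``, ``ValueType``, and the
helper functions are hypothetical and far simpler than LLVM's ``SDNode``; the
sketch uses commutativity to decide whether two binary nodes compute the same
value, which is one way such a property can be put to use.

.. code-block:: c++

  #include <cassert>
  #include <vector>

  // Hypothetical, heavily simplified stand-in for a SelectionDAG node.
  enum class Opcode { Add, Sub, Load, Constant };
  enum class ValueType { i32, f32 };

  struct ToyNode {
    Opcode Op;
    ValueType VT;                          // type requirement of the result
    std::vector<const ToyNode *> Operands; // edges to the operand nodes
    long long Imm = 0;                     // used only by Constant
  };

  // Operation properties of the kind described in the text.
  bool isCommutative(Opcode Op) { return Op == Opcode::Add; }
  bool mayLoad(Opcode Op) { return Op == Opcode::Load; }

  // Two binary nodes compute the same value if they match exactly, or if
  // the opcode is commutative and the operands are swapped.
  bool sameBinaryOp(const ToyNode &A, const ToyNode &B) {
    assert(A.Operands.size() == 2 && B.Operands.size() == 2);
    if (A.Op != B.Op || A.VT != B.VT)
      return false;
    if (A.Operands == B.Operands)
      return true;
    return isCommutative(A.Op) && A.Operands[0] == B.Operands[1] &&
           A.Operands[1] == B.Operands[0];
  }

Under this model, ``add x, y`` and ``add y, x`` are recognized as the same
value, while ``sub x, y`` and ``sub y, x`` are not.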

TableGen uses the following target description (``.td``) input files to
generate much of the code for instruction definition:

* ``Target.td`` --- Where the ``Instruction``, ``Operand``, ``InstrInfo``, and
  other fundamental classes are defined.

* ``TargetSelectionDAG.td`` --- Used by ``SelectionDAG`` instruction selection
  generators, contains ``SDTC*`` classes (selection DAG type constraint),
  definitions of ``SelectionDAG`` nodes (such as ``imm``, ``cond``, ``bb``,
  ``add``, ``fadd``, ``sub``), and pattern support (``Pattern``, ``Pat``,
  ``PatFrag``, ``PatLeaf``, ``ComplexPattern``).

* ``XXXInstrFormats.td`` --- Patterns for definitions of target-specific
  instructions.

* ``XXXInstrInfo.td`` --- Target-specific definitions of instruction templates,
  condition codes, and instructions of an instruction set. For architecture
  modifications, a different file name may be used. For example, for Pentium
  with SSE instructions, this file is ``X86InstrSSE.td``, and for Pentium with
  MMX, this file is ``X86InstrMMX.td``.

There is also a target-specific ``XXX.td`` file, where ``XXX`` is the name of
the target. The ``XXX.td`` file includes the other ``.td`` input files, but
its contents are only directly important for subtargets.

You should describe a concrete target-specific class ``XXXInstrInfo`` that
represents machine instructions supported by a target machine.
``XXXInstrInfo`` contains an array of ``XXXInstrDescriptor`` objects, each of
which describes one instruction. An instruction descriptor defines:

* Opcode mnemonic
* Number of operands
* List of implicit register definitions and uses
* Target-independent properties (such as whether it accesses memory or is
  commutable)
* Target-specific flags

The ``Instruction`` class (defined in ``Target.td``) is mostly used as a base
for more complex instruction classes.

.. code-block:: llvm

  class Instruction {
    string Namespace = "";
    dag OutOperandList;    // A dag containing the MI def operand list.
    dag InOperandList;     // A dag containing the MI use operand list.
    string AsmString = ""; // The .s format to print the instruction with.
    list<dag> Pattern;     // Set to the DAG pattern for this instruction.
    list<Register> Uses = [];
    list<Register> Defs = [];
    list<Predicate> Predicates = [];  // predicates turned into isel match code
    // ... remainder not shown for space ...
  }

A ``SelectionDAG`` node (``SDNode``) should contain an object representing a
target-specific instruction that is defined in ``XXXInstrInfo.td``. The
instruction objects should represent instructions from the architecture manual
of the target machine (such as the SPARC Architecture Manual for the SPARC
target).

A single instruction from the architecture manual is often modeled as multiple
target instructions, depending upon its operands. For example, a manual might
describe an add instruction that takes a register or an immediate operand. An
LLVM target could model this with two instructions named ``ADDri`` and
``ADDrr``.

You should define a class for each instruction category and define each opcode
as a subclass of the category with appropriate parameters such as the fixed
binary encoding of opcodes and extended opcodes. You should map the register
bits to the bits of the instruction in which they are encoded (for the JIT).
Also, you should specify how the instruction should be printed when the
automatic assembly printer is used.

As described in the SPARC Architecture Manual, Version 8, there are three
major 32-bit formats for instructions. Format 1 is only for the ``CALL``
instruction. Format 2 is for branch on condition codes and ``SETHI`` (set high
bits of a register) instructions. Format 3 is for other instructions.

Each of these formats has corresponding classes in ``SparcInstrFormats.td``.
``InstSP`` is a base class for other instruction classes. Additional base
classes are specified for more precise formats: for example, in
``SparcInstrFormats.td``, ``F2_1`` is for ``SETHI``, and ``F2_2`` is for
branches. There are three other base classes: ``F3_1`` for register/register
operations, ``F3_2`` for register/immediate operations, and ``F3_3`` for
floating-point operations. ``SparcInstrInfo.td`` also adds the base class
``Pseudo`` for synthetic SPARC instructions.

``SparcInstrInfo.td`` largely consists of operand and instruction definitions
for the SPARC target. In ``SparcInstrInfo.td``, the following target
description file entry, ``LDrr``, defines the Load Integer instruction for a
Word (the ``LD`` SPARC opcode) from a memory address to a register. The first
parameter, the value 3 (``11``\ :sub:`2`), is the operation value for this
category of operation. The second parameter (``000000``\ :sub:`2`) is the
specific operation value for ``LD``/Load Word. The third parameter is the
output destination, which is a register operand defined in the ``Register``
target description file (``IntRegs``).

.. code-block:: llvm

  def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr),
                   "ld [$addr], $dst",
                   [(set IntRegs:$dst, (load ADDRrr:$addr))]>;

The fourth parameter is the input source, which uses the address operand
``MEMrr`` that is defined earlier in ``SparcInstrInfo.td``:

.. code-block:: llvm

  def MEMrr : Operand<i32> {
    let PrintMethod = "printMemOperand";
    let MIOperandInfo = (ops IntRegs, IntRegs);
  }

The fifth parameter is a string that is used by the assembly printer and can be
left as an empty string until the assembly printer interface is implemented.
The sixth and final parameter is the pattern used to match the instruction
during the SelectionDAG Select Phase described in :doc:`CodeGenerator`.
This parameter is detailed in the next section, :ref:`instruction-selector`.
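
As a rough illustration of what the numeric parameters and operand fields turn
into, the following C++ sketch packs Format 3 fields into a 32-bit instruction
word. It is not LLVM code: ``encodeF3_1``, ``encodeF3_2``, and ``sparcFormat``
are hypothetical helpers. The bit layout (``op`` in bits 31-30, ``rd`` in
29-25, ``op3`` in 24-19, ``rs1`` in 18-14, the ``i`` flag in bit 13, then
either ``rs2`` in bits 4-0 or a 13-bit signed immediate in bits 12-0) follows
the SPARC V8 Format 3 layout, and the register numbers assume the usual SPARC
numbering in which ``%g1``, ``%g2``, and ``%g3`` are registers 1, 2, and 3.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical helpers (not part of LLVM) that pack SPARC V8 Format 3
  // fields into a 32-bit instruction word, mirroring Inst{hi-lo} bindings.

  // F3_1-style encoding: register/register form (i = 0).
  constexpr std::uint32_t encodeF3_1(std::uint32_t op, std::uint32_t op3,
                                     std::uint32_t rd, std::uint32_t rs1,
                                     std::uint32_t rs2) {
    return ((op & 0x3u) << 30)     // Inst{31-30} = op
         | ((rd & 0x1Fu) << 25)    // Inst{29-25} = rd
         | ((op3 & 0x3Fu) << 19)   // Inst{24-19} = op3
         | ((rs1 & 0x1Fu) << 14)   // Inst{18-14} = rs1
                                   // Inst{13} = i = 0; Inst{12-5} left as 0
         | (rs2 & 0x1Fu);          // Inst{4-0}   = rs2
  }

  // F3_2-style encoding: register/immediate form (i = 1).
  constexpr std::uint32_t encodeF3_2(std::uint32_t op, std::uint32_t op3,
                                     std::uint32_t rd, std::uint32_t rs1,
                                     std::int32_t simm13) {
    return ((op & 0x3u) << 30)
         | ((rd & 0x1Fu) << 25)
         | ((op3 & 0x3Fu) << 19)
         | ((rs1 & 0x1Fu) << 14)
         | (1u << 13)                                       // Inst{13} = i = 1
         | (static_cast<std::uint32_t>(simm13) & 0x1FFFu);  // Inst{12-0}, two's complement
  }

  // The op field (bits 31-30) selects the format: 1 = Format 1 (CALL),
  // 0 = Format 2 (SETHI and branches), 2 or 3 = Format 3 (everything else).
  constexpr int sparcFormat(std::uint32_t inst) {
    return (inst >> 30) == 1u ? 1 : (inst >> 30) == 0u ? 2 : 3;
  }

  // ld [%g1 + %g2], %g3  (LDrr: op = 3, op3 = 0b000000)
  static_assert(encodeF3_1(3, 0x00, 3, 1, 2) == 0xC6004002u, "LDrr");
  // add %g1, 5, %g2      (ADDri: op = 2, op3 = 0b000000)
  static_assert(encodeF3_2(2, 0x00, 2, 1, 5) == 0x84006005u, "ADDri");

Under these assumptions, the ``i`` bit is what separates an ``F3_2``
(register/immediate) instruction such as ``ADDri`` from its ``F3_1``
(register/register) counterpart; the ``op`` and ``op3`` values are the same
for both.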

Instruction class definitions are not overloaded for different operand types,
so separate versions of instructions are needed for register, memory, or
immediate value operands. For example, to load a word into a register from an
address formed by a register plus an immediate offset, the following
instruction class is defined:

.. code-block:: llvm

  def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr),
                   "ld [$addr], $dst",
                   [(set IntRegs:$dst, (load ADDRri:$addr))]>;

Writing these definitions for so many similar instructions can involve a lot of
cut and paste. In ``.td`` files, the ``multiclass`` directive enables the
creation of templates to define several instruction classes at once (using the
``defm`` directive). For example, in ``SparcInstrInfo.td``, the ``multiclass``
pattern ``F3_12`` is defined to create two instruction classes each time
``F3_12`` is invoked:

.. code-block:: llvm

  multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> {
    def rr  : F3_1 <2, Op3Val,
                   (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
                   !strconcat(OpcStr, " $b, $c, $dst"),
                   [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>;
    def ri  : F3_2 <2, Op3Val,
                   (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c),
                   !strconcat(OpcStr, " $b, $c, $dst"),
                   [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>;
  }

So when the ``defm`` directive is used for the ``XOR`` and ``ADD``
instructions, as seen below, it creates four instruction objects: ``XORrr``,
``XORri``, ``ADDrr``, and ``ADDri``.

.. code-block:: llvm

  defm XOR   : F3_12<"xor", 0b000011, xor>;
  defm ADD   : F3_12<"add", 0b000000, add>;

``SparcInstrInfo.td`` also includes definitions for condition codes that are
referenced by branch instructions. The following definitions in
``SparcInstrInfo.td`` indicate the bit location of the SPARC condition code.
For example, the 10\ :sup:`th` bit represents the "greater than" condition for
integers, and the 22\ :sup:`nd` bit represents the "greater than" condition for
floats.

.. code-block:: llvm

  def ICC_NE  : ICC_VAL< 9>;  // Not Equal
  def ICC_E   : ICC_VAL< 1>;  // Equal
  def ICC_G   : ICC_VAL<10>;  // Greater
  ...
  def FCC_U   : FCC_VAL<23>;  // Unordered
  def FCC_G   : FCC_VAL<22>;  // Greater
  def FCC_UG  : FCC_VAL<21>;  // Unordered or Greater
  ...

(Note that ``Sparc.h`` also defines enums that correspond to the same SPARC
condition codes. Care must be taken to ensure the values in ``Sparc.h``
correspond to the values in ``SparcInstrInfo.td``. That is,
``SPCC::ICC_NE = 9``, ``SPCC::FCC_U = 23``, and so on.)

Instruction Operand Mapping
---------------------------

The code generator backend maps instruction operands to fields in the
instruction. Operands are assigned to unbound fields in the instruction in the
order they are defined. Fields are bound when they are assigned a value. For
example, the SPARC target defines the ``XNORrr`` instruction as an ``F3_1``
format instruction having three operands.

.. code-block:: llvm

  def XNORrr  : F3_1<2, 0b000111,
                     (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
                     "xnor $b, $c, $dst",
                     [(set IntRegs:$dst, (not (xor IntRegs:$b, IntRegs:$c)))]>;

The instruction templates in ``SparcInstrFormats.td`` show that the base class
for ``F3_1`` is ``InstSP``.

.. code-block:: llvm

  class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction {
    field bits<32> Inst;
    let Namespace = "SP";
    bits<2> op;
    let Inst{31-30} = op;
    dag OutOperandList = outs;
    dag InOperandList = ins;
    let AsmString   = asmstr;
    let Pattern = pattern;
  }

``InstSP`` leaves the ``op`` field unbound.

.. code-block:: llvm

  class F3<dag outs, dag ins, string asmstr, list<dag> pattern>
      : InstSP<outs, ins, asmstr, pattern> {
    bits<5> rd;
    bits<6> op3;
    bits<5> rs1;
    let op{1} = 1;   // Op = 2 or 3
    let Inst{29-25} = rd;
    let Inst{24-19} = op3;
    let Inst{18-14} = rs1;
  }

``F3`` binds bit 1 of the ``op`` field (so ``op`` is 2 or 3) and defines the
``rd``, ``op3``, and ``rs1`` fields. ``F3`` format instructions will bind the
operands ``rd``, ``op3``, and ``rs1`` field