author | Sean Callanan <scallanan@apple.com> | 2009-12-19 02:59:52 +0000 |
---|---|---|
committer | Sean Callanan <scallanan@apple.com> | 2009-12-19 02:59:52 +0000 |
commit | 8ed9f51663bc5533f36ca62e5668ae08e9a1313f | |
tree | 3054645839caee367e9403507d8487538819ed5b /utils | |
parent | e9ec6ad1ba5fd9ad70f5d0c059c5a5aa44f501f7 | |
Table-driven disassembler for the X86 architecture (16-, 32-, and 64-bit
incarnations), integrated into the MC framework.
The disassembler is table-driven, using a custom TableGen backend to
generate hierarchical tables optimized for fast decode. The disassembler
consumes MemoryObjects and produces arrays of MCInsts, adhering to the
abstract base class MCDisassembler (llvm/MC/MCDisassembler.h).
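A minimal sketch of the innermost level of that hierarchy, under simplified assumptions. The emitter in this patch compacts each 256-entry ModR/M decision into one entry (every ModR/M byte decodes to the same instruction), two entries (only whether mod is 0b11 matters), or all 256 entries. The compaction test below mirrors getDecisionType() in X86DisassemblerTables.cpp, and the two-entry layout follows emitModRMDecision(). The ModRMDecision struct here is a simplified stand-in for the real one, and lookup() is an assumption about how a runtime would index a compacted table, since the runtime decoder is outside the utils/ part of this diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the decoder types; the real definitions live in
// X86DisassemblerDecoderCommon.h and differ in detail.
using InstrUID = uint16_t;
enum ModRMDecisionType { MODRM_ONEENTRY, MODRM_SPLITRM, MODRM_FULL };

struct ModRMDecision {
  ModRMDecisionType type;
  std::vector<InstrUID> ids; // 1, 2, or 256 entries, depending on type
};

// Same test as getDecisionType(): ONEENTRY if every ModR/M byte maps to the
// same UID; SPLITRM if all register forms (mod == 0b11) agree with each other
// and all memory forms (mod != 0b11) agree with each other; FULL otherwise.
// The SPLITRM layout (ids[0x00] first, ids[0xc0] second) follows
// emitModRMDecision().
ModRMDecision compact(const std::array<InstrUID, 256> &full) {
  bool oneEntry = true;
  bool splitRM = true;
  for (unsigned modRM = 0; modRM < 256; ++modRM) {
    if (full[modRM] != full[0x00])
      oneEntry = false;
    bool isRegForm = (modRM & 0xc0) == 0xc0;
    if (full[modRM] != full[isRegForm ? 0xc0 : 0x00])
      splitRM = false;
  }
  if (oneEntry)
    return {MODRM_ONEENTRY, {full[0x00]}};
  if (splitRM)
    return {MODRM_SPLITRM, {full[0x00], full[0xc0]}};
  return {MODRM_FULL, std::vector<InstrUID>(full.begin(), full.end())};
}

// Assumed decode-time lookup into a compacted decision.
InstrUID lookup(const ModRMDecision &d, uint8_t modRM) {
  switch (d.type) {
  case MODRM_ONEENTRY:
    assert(d.ids.size() == 1);
    return d.ids[0];
  case MODRM_SPLITRM:
    assert(d.ids.size() == 2);
    return d.ids[(modRM & 0xc0) == 0xc0 ? 1 : 0];
  case MODRM_FULL:
    assert(d.ids.size() == 256);
    return d.ids[modRM];
  }
  return 0; // not reached for valid decision types
}
```

For example, an opcode whose register form and memory form are different instructions, but whose reg and r/m fields do not otherwise affect decoding, compacts to two entries instead of 256.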
The disassembler is documented in detail in:
- lib/Target/X86/Disassembler/X86Disassembler.cpp (disassembler runtime)
- utils/TableGen/DisassemblerEmitter.cpp (table emitter)
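To make one of the emitted tables concrete, the following sketch mirrors how DisassemblerTables::emitContextTable() in X86DisassemblerTables.cpp collapses each of the 256 possible attribute masks into an instruction context. Only the order of the tests is taken from the patch; the ATTR_* bit values are hypothetical, because their real definitions live in X86DisassemblerDecoderCommon.h, which is not part of this diff. contextForAttrs() and buildContextTable() are illustrative names.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical attribute bit values. The real ATTR_* constants come from
// X86DisassemblerDecoderCommon.h, which is not part of this diff.
enum : uint8_t {
  ATTR_64BIT = 0x01,
  ATTR_XS = 0x02,
  ATTR_XD = 0x04,
  ATTR_REXW = 0x08,
  ATTR_OPSIZE = 0x10,
};

// Mirrors the if-chain in DisassemblerTables::emitContextTable(): the first
// matching rule wins, so the more specific contexts are tested first.
std::string contextForAttrs(uint8_t mask) {
  const bool is64 = mask & ATTR_64BIT;
  const bool rexw = mask & ATTR_REXW;
  const bool xs = mask & ATTR_XS;
  const bool xd = mask & ATTR_XD;
  const bool opsize = mask & ATTR_OPSIZE;

  if (is64 && rexw && xs) return "IC_64BIT_REXW_XS";
  if (is64 && rexw && xd) return "IC_64BIT_REXW_XD";
  if (is64 && rexw && opsize) return "IC_64BIT_REXW_OPSIZE";
  if (is64 && xs) return "IC_64BIT_XS";
  if (is64 && xd) return "IC_64BIT_XD";
  if (is64 && opsize) return "IC_64BIT_OPSIZE";
  if (is64 && rexw) return "IC_64BIT_REXW";
  if (is64) return "IC_64BIT";
  if (xs) return "IC_XS";
  if (xd) return "IC_XD";
  if (opsize) return "IC_OPSIZE";
  return "IC";
}

// One entry per possible attribute mask, analogous to the emitted
// CONTEXTS table.
std::array<std::string, 256> buildContextTable() {
  std::array<std::string, 256> table;
  for (unsigned mask = 0; mask < 256; ++mask)
    table[mask] = contextForAttrs(static_cast<uint8_t>(mask));
  return table;
}
```

Because XS and XD are tested before OPSIZE, a 64-bit mask carrying both OPSIZE and XS lands in IC_64BIT_XS, and a REX.W bit without the 64-bit attribute falls through to plain IC.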
You can test the disassembler by running llvm-mc -disassemble for i386
or x86_64 targets. Please let me know if you encounter any problems
with it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91749 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'utils')
-rw-r--r-- | utils/TableGen/CMakeLists.txt | 2
-rw-r--r-- | utils/TableGen/DisassemblerEmitter.cpp | 99
-rw-r--r-- | utils/TableGen/X86DisassemblerShared.h | 37
-rw-r--r-- | utils/TableGen/X86DisassemblerTables.cpp | 603
-rw-r--r-- | utils/TableGen/X86DisassemblerTables.h | 291
-rw-r--r-- | utils/TableGen/X86ModRMFilters.h | 197
-rw-r--r-- | utils/TableGen/X86RecognizableInstr.cpp | 959
-rw-r--r-- | utils/TableGen/X86RecognizableInstr.h | 237
8 files changed, 2425 insertions, 0 deletions
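The emitter comment in DisassemblerEmitter.cpp lists the rules for resolving collisions between instructions that land in the same decode slot; in the patch they are implemented by DisassemblerTables::setTableFields() together with outranks(). The sketch below is a simplified, hypothetical model of that per-slot logic, intended as a reading aid when tracking down a "Primary decode conflict". Spec, Outcome, placeInSlot(), and the rank vector are illustrative names; the real ranks come from the INSTRUCTION_CONTEXTS macro, which is not part of this diff.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using InstrUID = uint16_t;

// Simplified stand-in for InstructionSpecifier: only the fields that the
// conflict check in setTableFields() reads.
struct Spec {
  std::string name;
  unsigned context; // index into the hypothetical rank table
  bool filtered;    // set when filter() rejects the instruction
};

enum class Outcome { Unchanged, Written, Conflict };

// Models one ModR/M slot update in DisassemblerTables::setTableFields().
// 'rank' stands in for the ranks that outranks() reads from
// INSTRUCTION_CONTEXTS; its values are assumptions. UID 0 means "empty".
Outcome placeInSlot(InstrUID &slot, InstrUID uid,
                    const std::vector<Spec> &specs,
                    const std::vector<int> &rank) {
  if (slot == uid)
    return Outcome::Unchanged;

  if (slot != 0) {
    const Spec &newInfo = specs[uid];
    const Spec &prevInfo = specs[slot];
    if (newInfo.filtered)
      return Outcome::Unchanged; // filtered instructions get lowest priority
    if (prevInfo.name == "NOOP")
      return Outcome::Unchanged; // special case carried over from the patch
    if (rank[prevInfo.context] > rank[newInfo.context])
      return Outcome::Unchanged; // higher-ranked context keeps the slot
    if (prevInfo.context == newInfo.context && !prevInfo.filtered) {
      slot = uid; // the patch reports the conflict but still overwrites
      return Outcome::Conflict;
    }
  }
  slot = uid;
  return Outcome::Written;
}
```

In the patch, a same-context collision prints the offending instruction names, ModR/M byte, opcode, and context, sets HasConflicts, and still overwrites the slot; DisassemblerEmitter::run() then throws the "Primary decode conflict" TGError once every instruction has been processed.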
diff --git a/utils/TableGen/CMakeLists.txt b/utils/TableGen/CMakeLists.txt index daf8676826..ce9b66f8c3 100644 --- a/utils/TableGen/CMakeLists.txt +++ b/utils/TableGen/CMakeLists.txt @@ -23,6 +23,8 @@ add_executable(tblgen TGValueTypes.cpp TableGen.cpp TableGenBackend.cpp + X86DisassemblerTables.cpp + X86RecognizableInstr.cpp ) target_link_libraries(tblgen LLVMSupport LLVMSystem) diff --git a/utils/TableGen/DisassemblerEmitter.cpp b/utils/TableGen/DisassemblerEmitter.cpp index cc131257cf..61b9b1583b 100644 --- a/utils/TableGen/DisassemblerEmitter.cpp +++ b/utils/TableGen/DisassemblerEmitter.cpp @@ -10,7 +10,86 @@ #include "DisassemblerEmitter.h" #include "CodeGenTarget.h" #include "Record.h" +#include "X86DisassemblerTables.h" +#include "X86RecognizableInstr.h" using namespace llvm; +using namespace llvm::X86Disassembler; + +/// DisassemblerEmitter - Contains disassembler table emitters for various +/// architectures. + +/// X86 Disassembler Emitter +/// +/// *** IF YOU'RE HERE TO RESOLVE A "Primary decode conflict", LOOK DOWN NEAR +/// THE END OF THIS COMMENT! +/// +/// The X86 disassembler emitter is part of the X86 Disassembler, which is +/// documented in lib/Target/X86/X86Disassembler.h. +/// +/// The emitter produces the tables that the disassembler uses to translate +/// instructions. The emitter generates the following tables: +/// +/// - One table (CONTEXTS_SYM) that contains a mapping of attribute masks to +/// instruction contexts. Although for each attribute there are cases where +/// that attribute determines decoding, in the majority of cases decoding is +/// the same whether or not an attribute is present. For example, a 64-bit +/// instruction with an OPSIZE prefix and an XS prefix decodes the same way in +/// all cases as a 64-bit instruction with only OPSIZE set. (The XS prefix +/// may have effects on its execution, but does not change the instruction +/// returned.) This allows considerable space savings in other tables. 
+/// - Four tables (ONEBYTE_SYM, TWOBYTE_SYM, THREEBYTE38_SYM, and +/// THREEBYTE3A_SYM) contain the hierarchy that the decoder traverses while +/// decoding an instruction. At the lowest level of this hierarchy are +/// instruction UIDs, 16-bit integers that can be used to uniquely identify +/// the instruction and correspond exactly to its position in the list of +/// CodeGenInstructions for the target. +/// - One table (INSTRUCTIONS_SYM) contains information about the operands of +/// each instruction and how to decode them. +/// +/// During table generation, there may be conflicts between instructions that +/// occupy the same space in the decode tables. These conflicts are resolved as +/// follows in setTableFields() (X86DisassemblerTables.cpp) +/// +/// - If the current context is the native context for one of the instructions +/// (that is, the attributes specified for it in the LLVM tables specify +/// precisely the current context), then it has priority. +/// - If the current context isn't native for either of the instructions, then +/// the higher-priority context wins (that is, the one that is more specific). +/// That hierarchy is determined by outranks() (X86DisassemblerTables.cpp) +/// - If the current context is native for both instructions, then the table +/// emitter reports a conflict and dies. +/// +/// *** RESOLUTION FOR "Primary decode conflict"S +/// +/// If two instructions collide, typically the solution is (in order of +/// likelihood): +/// +/// (1) to filter out one of the instructions by editing filter() +/// (X86RecognizableInstr.cpp). This is the most common resolution, but +/// check the Intel manuals first to make sure that (2) and (3) are not the +/// problem. +/// (2) to fix the tables (X86.td and its subsidiaries) so the opcodes are +/// accurate. Sometimes they are not. 
+/// (3) to fix the tables to reflect the actual context (for example, required +/// prefixes), and possibly to add a new context by editing +/// lib/Target/X86/X86DisassemblerDecoderCommon.h. This is unlikely to be +/// the cause. +/// +/// DisassemblerEmitter.cpp contains the implementation for the emitter, +/// which simply pulls out instructions from the CodeGenTarget and pushes them +/// into X86DisassemblerTables. +/// X86DisassemblerTables.h contains the interface for the instruction tables, +/// which manage and emit the structures discussed above. +/// X86DisassemblerTables.cpp contains the implementation for the instruction +/// tables. +/// X86ModRMFilters.h contains filters that can be used to determine which +/// ModR/M values are valid for a particular instruction. These are used to +/// populate ModRMDecisions. +/// X86RecognizableInstr.h contains the interface for a single instruction, +/// which knows how to translate itself from a CodeGenInstruction and provide +/// the information necessary for integration into the tables. +/// X86RecognizableInstr.cpp contains the implementation for a single +/// instruction. void DisassemblerEmitter::run(raw_ostream &OS) { CodeGenTarget Target; @@ -25,6 +104,26 @@ void DisassemblerEmitter::run(raw_ostream &OS) { << " *===---------------------------------------------------------------" << "-------===*/\n"; + // X86 uses a custom disassembler. + if (Target.getName() == "X86") { + DisassemblerTables Tables; + + std::vector<const CodeGenInstruction*> numberedInstructions; + Target.getInstructionsByEnumValue(numberedInstructions); + + for (unsigned i = 0, e = numberedInstructions.size(); i != e; ++i) + RecognizableInstr::processInstr(Tables, *numberedInstructions[i], i); + + // FIXME: As long as we are using exceptions, might as well drop this to the + // actual conflict site. 
+ if (Tables.hasConflicts()) + throw TGError(Target.getTargetRecord()->getLoc(), + "Primary decode conflict"); + + Tables.emit(OS); + return; + } + throw TGError(Target.getTargetRecord()->getLoc(), "Unable to generate disassembler for this target"); } diff --git a/utils/TableGen/X86DisassemblerShared.h b/utils/TableGen/X86DisassemblerShared.h new file mode 100644 index 0000000000..9003cbfbde --- /dev/null +++ b/utils/TableGen/X86DisassemblerShared.h @@ -0,0 +1,37 @@ +//===- X86DisassemblerShared.h - Emitter shared header ----------*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// + +#ifndef X86DISASSEMBLERSHARED_H +#define X86DISASSEMBLERSHARED_H + +#include <string> + +#define INSTRUCTION_SPECIFIER_FIELDS \ + bool filtered; \ + InstructionContext insnContext; \ + std::string name; \ + \ + InstructionSpecifier() { \ + filtered = false; \ + insnContext = IC; \ + name = ""; \ + modifierType = MODIFIER_NONE; \ + modifierBase = 0; \ + bzero(operands, sizeof(operands)); \ + } + +#define INSTRUCTION_IDS \ + InstrUID instructionIDs[256]; + +#include "../../lib/Target/X86/Disassembler/X86DisassemblerDecoderCommon.h" + +#undef INSTRUCTION_SPECIFIER_FIELDS +#undef INSTRUCTION_IDS + +#endif diff --git a/utils/TableGen/X86DisassemblerTables.cpp b/utils/TableGen/X86DisassemblerTables.cpp new file mode 100644 index 0000000000..83284a77ba --- /dev/null +++ b/utils/TableGen/X86DisassemblerTables.cpp @@ -0,0 +1,603 @@ +//===- X86DisassemblerTables.cpp - Disassembler tables ----------*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. 
+// +//===----------------------------------------------------------------------===// +// +// This file is part of the X86 Disassembler Emitter. +// It contains the implementation of the disassembler tables. +// Documentation for the disassembler emitter in general can be found in +// X86DisasemblerEmitter.h. +// +//===----------------------------------------------------------------------===// + +#include "X86DisassemblerShared.h" +#include "X86DisassemblerTables.h" + +#include "TableGenBackend.h" +#include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/Format.h" + +#include <string> + +using namespace llvm; +using namespace X86Disassembler; + +/// inheritsFrom - Indicates whether all instructions in one class also belong +/// to another class. +/// +/// @param child - The class that may be the subset +/// @param parent - The class that may be the superset +/// @return - True if child is a subset of parent, false otherwise. +static inline bool inheritsFrom(InstructionContext child, + InstructionContext parent) { + if (child == parent) + return true; + + switch (parent) { + case IC: + return true; + case IC_64BIT: + return(inheritsFrom(child, IC_64BIT_REXW) || + inheritsFrom(child, IC_64BIT_OPSIZE) || + inheritsFrom(child, IC_64BIT_XD) || + inheritsFrom(child, IC_64BIT_XS)); + case IC_OPSIZE: + return(inheritsFrom(child, IC_64BIT_OPSIZE)); + case IC_XD: + return(inheritsFrom(child, IC_64BIT_XD)); + case IC_XS: + return(inheritsFrom(child, IC_64BIT_XS)); + case IC_64BIT_REXW: + return(inheritsFrom(child, IC_64BIT_REXW_XS) || + inheritsFrom(child, IC_64BIT_REXW_XD) || + inheritsFrom(child, IC_64BIT_REXW_OPSIZE)); + case IC_64BIT_OPSIZE: + return(inheritsFrom(child, IC_64BIT_REXW_OPSIZE)); + case IC_64BIT_XD: + return(inheritsFrom(child, IC_64BIT_REXW_XD)); + case IC_64BIT_XS: + return(inheritsFrom(child, IC_64BIT_REXW_XS)); + case IC_64BIT_REXW_XD: + return false; + case IC_64BIT_REXW_XS: + return false; + case IC_64BIT_REXW_OPSIZE: + return false; + default: 
+ return false; + } +} + +/// outranks - Indicates whether, if an instruction has two different applicable +/// classes, which class should be preferred when performing decode. This +/// imposes a total ordering (ties are resolved toward "lower") +/// +/// @param upper - The class that may be preferable +/// @param lower - The class that may be less preferable +/// @return - True if upper is to be preferred, false otherwise. +static inline bool outranks(InstructionContext upper, + InstructionContext lower) { + assert(upper < IC_max); + assert(lower < IC_max); + +#define ENUM_ENTRY(n, r, d) r, + static int ranks[IC_max] = { + INSTRUCTION_CONTEXTS + }; +#undef ENUM_ENTRY + + return (ranks[upper] > ranks[lower]); +} + +/// stringForContext - Returns a string containing the name of a particular +/// InstructionContext, usually for diagnostic purposes. +/// +/// @param insnContext - The instruction class to transform to a string. +/// @return - A statically-allocated string constant that contains the +/// name of the instruction class. +static inline const char* stringForContext(InstructionContext insnContext) { + switch (insnContext) { + default: + llvm_unreachable("Unhandled instruction class"); +#define ENUM_ENTRY(n, r, d) case n: return #n; break; + INSTRUCTION_CONTEXTS +#undef ENUM_ENTRY + } +} + +/// stringForOperandType - Like stringForContext, but for OperandTypes. +static inline const char* stringForOperandType(OperandType type) { + switch (type) { + default: + llvm_unreachable("Unhandled type"); +#define ENUM_ENTRY(i, d) case i: return #i; + TYPES +#undef ENUM_ENTRY + } +} + +/// stringForOperandEncoding - like stringForContext, but for +/// OperandEncodings. 
+static inline const char* stringForOperandEncoding(OperandEncoding encoding) { + switch (encoding) { + default: + llvm_unreachable("Unhandled encoding"); +#define ENUM_ENTRY(i, d) case i: return #i; + ENCODINGS +#undef ENUM_ENTRY + } +} + +void DisassemblerTables::emitOneID(raw_ostream &o, + uint32_t &i, + InstrUID id, + bool addComma) const { + if (id) + o.indent(i * 2) << format("0x%hx", id); + else + o.indent(i * 2) << 0; + + if (addComma) + o << ", "; + else + o << " "; + + o << "/* "; + o << InstructionSpecifiers[id].name; + o << "*/"; + + o << "\n"; +} + +/// emitEmptyTable - Emits the modRMEmptyTable, which is used as a ID table by +/// all ModR/M decisions for instructions that are invalid for all possible +/// ModR/M byte values. +/// +/// @param o - The output stream on which to emit the table. +/// @param i - The indentation level for that output stream. +static void emitEmptyTable(raw_ostream &o, uint32_t &i) +{ + o.indent(i * 2) << "InstrUID modRMEmptyTable[1] = { 0 };" << "\n"; + o << "\n"; +} + +/// getDecisionType - Determines whether a ModRM decision with 255 entries can +/// be compacted by eliminating redundant information. +/// +/// @param decision - The decision to be compacted. +/// @return - The compactest available representation for the decision. 
+static ModRMDecisionType getDecisionType(ModRMDecision &decision) +{ + bool satisfiesOneEntry = true; + bool satisfiesSplitRM = true; + + uint16_t index; + + for (index = 0; index < 256; ++index) { + if (decision.instructionIDs[index] != decision.instructionIDs[0]) + satisfiesOneEntry = false; + + if (((index & 0xc0) == 0xc0) && + (decision.instructionIDs[index] != decision.instructionIDs[0xc0])) + satisfiesSplitRM = false; + + if (((index & 0xc0) != 0xc0) && + (decision.instructionIDs[index] != decision.instructionIDs[0x00])) + satisfiesSplitRM = false; + } + + if (satisfiesOneEntry) + return MODRM_ONEENTRY; + + if (satisfiesSplitRM) + return MODRM_SPLITRM; + + return MODRM_FULL; +} + +/// stringForDecisionType - Returns a statically-allocated string corresponding +/// to a particular decision type. +/// +/// @param dt - The decision type. +/// @return - A pointer to the statically-allocated string (e.g., +/// "MODRM_ONEENTRY" for MODRM_ONEENTRY). +static const char* stringForDecisionType(ModRMDecisionType dt) +{ +#define ENUM_ENTRY(n) case n: return #n; + switch (dt) { + default: + llvm_unreachable("Unknown decision type"); + MODRMTYPES + }; +#undef ENUM_ENTRY +} + +/// stringForModifierType - Returns a statically-allocated string corresponding +/// to an opcode modifier type. +/// +/// @param mt - The modifier type. +/// @return - A pointer to the statically-allocated string (e.g., +/// "MODIFIER_NONE" for MODIFIER_NONE). 
+static const char* stringForModifierType(ModifierType mt) +{ +#define ENUM_ENTRY(n) case n: return #n; + switch(mt) { + default: + llvm_unreachable("Unknown modifier type"); + MODIFIER_TYPES + }; +#undef ENUM_ENTRY +} + +DisassemblerTables::DisassemblerTables() { + unsigned i; + + for (i = 0; i < 4; i++) { + Tables[i] = new ContextDecision; + bzero(Tables[i], sizeof(ContextDecision)); + } + + HasConflicts = false; +} + +DisassemblerTables::~DisassemblerTables() { + unsigned i; + + for (i = 0; i < 4; i++) + delete Tables[i]; +} + +void DisassemblerTables::emitModRMDecision(raw_ostream &o1, + raw_ostream &o2, + uint32_t &i1, + uint32_t &i2, + ModRMDecision &decision) + const { + static uint64_t sTableNumber = 0; + uint64_t thisTableNumber = sTableNumber; + ModRMDecisionType dt = getDecisionType(decision); + uint16_t index; + + if (dt == MODRM_ONEENTRY && decision.instructionIDs[0] == 0) + { + o2.indent(i2) << "{ /* ModRMDecision */" << "\n"; + i2++; + + o2.indent(i2) << stringForDecisionType(dt) << "," << "\n"; + o2.indent(i2) << "modRMEmptyTable"; + + i2--; + o2.indent(i2) << "}"; + return; + } + + o1.indent(i1) << "InstrUID modRMTable" << thisTableNumber; + + switch (dt) { + default: + llvm_unreachable("Unknown decision type"); + case MODRM_ONEENTRY: + o1 << "[1]"; + break; + case MODRM_SPLITRM: + o1 << "[2]"; + break; + case MODRM_FULL: + o1 << "[256]"; + break; + } + + o1 << " = {" << "\n"; + i1++; + + switch (dt) { + default: + llvm_unreachable("Unknown decision type"); + case MODRM_ONEENTRY: + emitOneID(o1, i1, decision.instructionIDs[0], false); + break; + case MODRM_SPLITRM: + emitOneID(o1, i1, decision.instructionIDs[0x00], true); // mod = 0b00 + emitOneID(o1, i1, decision.instructionIDs[0xc0], false); // mod = 0b11 + break; + case MODRM_FULL: + for (index = 0; index < 256; ++index) + emitOneID(o1, i1, decision.instructionIDs[index], index < 255); + break; + } + + i1--; + o1.indent(i1) << "};" << "\n"; + o1 << "\n"; + + o2.indent(i2) << "{ /* struct 
ModRMDecision */" << "\n"; + i2++; + + o2.indent(i2) << stringForDecisionType(dt) << "," << "\n"; + o2.indent(i2) << "modRMTable" << sTableNumber << "\n"; + + i2--; + o2.indent(i2) << "}"; + + ++sTableNumber; +} + +void DisassemblerTables::emitOpcodeDecision( + raw_ostream &o1, + raw_ostream &o2, + uint32_t &i1, + uint32_t &i2, + OpcodeDecision &decision) const { + uint16_t index; + + o2.indent(i2) << "{ /* struct OpcodeDecision */" << "\n"; + i2++; + o2.indent(i2) << "{" << "\n"; + i2++; + + for (index = 0; index < 256; ++index) { + o2.indent(i2); + + o2 << "/* 0x" << format("%02hhx", index) << " */" << "\n"; + + emitModRMDecision(o1, o2, i1, i2, decision.modRMDecisions[index]); + + if (index < 255) + o2 << ","; + + o2 << "\n"; + } + + i2--; + o2.indent(i2) << "}" << "\n"; + i2--; + o2.indent(i2) << "}" << "\n"; +} + +void DisassemblerTables::emitContextDecision( + raw_ostream &o1, + raw_ostream &o2, + uint32_t &i1, + uint32_t &i2, + ContextDecision &decision, + const char* name) const { + o2.indent(i2) << "struct ContextDecision " << name << " = {" << "\n"; + i2++; + o2.indent(i2) << "{ /* opcodeDecisions */" << "\n"; + i2++; + + unsigned index; + + for (index = 0; index < IC_max; ++index) { + o2.indent(i2) << "/* "; + o2 << stringForContext((InstructionContext)index); + o2 << " */"; + o2 << "\n"; + + emitOpcodeDecision(o1, o2, i1, i2, decision.opcodeDecisions[index]); + + if (index + 1 < IC_max) + o2 << ", "; + } + + i2--; + o2.indent(i2) << "}" << "\n"; + i2--; + o2.indent(i2) << "};" << "\n"; +} + +void DisassemblerTables::emitInstructionInfo(raw_ostream &o, uint32_t &i) + const { + o.indent(i * 2) << "struct InstructionSpecifier "; + o << INSTRUCTIONS_STR << "["; + o << InstructionSpecifiers.size(); + o << "] = {" << "\n"; + + i++; + + uint16_t numInstructions = InstructionSpecifiers.size(); + uint16_t index, operandIndex; + + for (index = 0; index < numInstructions; ++index) { + o.indent(i * 2) << "{ /* " << index << " */" << "\n"; + i++; + + o.indent(i * 2) 
<< + stringForModifierType(InstructionSpecifiers[index].modifierType); + o << "," << "\n"; + + o.indent(i * 2) << "0x"; + o << format("%02hhx", (uint16_t)InstructionSpecifiers[index].modifierBase); + o << "," << "\n"; + + o.indent(i * 2) << "{" << "\n"; + i++; + + for (operandIndex = 0; operandIndex < X86_MAX_OPERANDS; ++operandIndex) { + o.indent(i * 2) << "{ "; + o << stringForOperandEncoding(InstructionSpecifiers[index] + .operands[operandIndex] + .encoding); + o << ", "; + o << stringForOperandType(InstructionSpecifiers[index] + .operands[operandIndex] + .type); + o << " }"; + + if (operandIndex < X86_MAX_OPERANDS - 1) + o << ","; + + o << "\n"; + } + + i--; + o.indent(i * 2) << "}," << "\n"; + + o.indent(i * 2) << "\"" << InstructionSpecifiers[index].name << "\""; + o << "\n"; + + i--; + o.indent(i * 2) << "}"; + + if (index + 1 < numInstructions) + o << ","; + + o << "\n"; + } + + i--; + o.indent(i * 2) << "};" << "\n"; +} + +void DisassemblerTables::emitContextTable(raw_ostream &o, uint32_t &i) const { + uint16_t index; + + o.indent(i * 2) << "InstructionContext "; + o << CONTEXTS_STR << "[256] = {" << "\n"; + i++; + + for (index = 0; index < 256; ++index) { + o.indent(i * 2); + + if ((index & ATTR_64BIT) && (index & ATTR_REXW) && (index & ATTR_XS)) + o << "IC_64BIT_REXW_XS"; + else if ((index & ATTR_64BIT) && (index & ATTR_REXW) && (index & ATTR_XD)) + o << "IC_64BIT_REXW_XD"; + else if ((index & ATTR_64BIT) && (index & ATTR_REXW) && + (index & ATTR_OPSIZE)) + o << "IC_64BIT_REXW_OPSIZE"; + else if ((index & ATTR_64BIT) && (index & ATTR_XS)) + o << "IC_64BIT_XS"; + else if ((index & ATTR_64BIT) && (index & ATTR_XD)) + o << "IC_64BIT_XD"; + else if ((index & ATTR_64BIT) && (index & ATTR_OPSIZE)) + o << "IC_64BIT_OPSIZE"; + else if ((index & ATTR_64BIT) && (index & ATTR_REXW)) + o << "IC_64BIT_REXW"; + else if ((index & ATTR_64BIT)) + o << "IC_64BIT"; + else if (index & ATTR_XS) + o << "IC_XS"; + else if (index & ATTR_XD) + o << "IC_XD"; + else if (index & 
ATTR_OPSIZE) + o << "IC_OPSIZE"; + else + o << "IC"; + + if (index < 255) + o << ","; + else + o << " "; + + o << " /* " << index << " */"; + + o << "\n"; + } + + i--; + o.indent(i * 2) << "};" << "\n"; +} + +void DisassemblerTables::emitContextDecisions(raw_ostream &o1, + raw_ostream &o2, + uint32_t &i1, + uint32_t &i2) + const { + emitContextDecision(o1, o2, i1, i2, *Tables[0], ONEBYTE_STR); + emitContextDecision(o1, o2, i1, i2, *Tables[1], TWOBYTE_STR); + emitContextDecision(o1, o2, i1, i2, *Tables[2], THREEBYTE38_STR); + emitContextDecision(o1, o2, i1, i2, *Tables[3], THREEBYTE3A_STR); +} + +void DisassemblerTables::emit(raw_ostream &o) const { + uint32_t i1 = 0; + uint32_t i2 = 0; + + std::string s1; + std::string s2; + + raw_string_ostream o1(s1); + raw_string_ostream o2(s2); + + emitInstructionInfo(o, i2); + o << "\n"; + + emitContextTable(o, i2); + o << "\n"; + + emitEmptyTable(o1, i1); + emitContextDecisions(o1, o2, i1, i2); + + o << o1.str(); + o << "\n"; + o << o2.str(); + o << "\n"; + o << "\n"; +} + +void DisassemblerTables::setTableFields(ModRMDecision &decision, + const ModRMFilter &filter, + InstrUID uid, + uint8_t opcode) { + unsigned index; + + for (index = 0; index < 256; ++index) { + if (filter.accepts(index)) { + if (decision.instructionIDs[index] == uid) + continue; + + if (decision.instructionIDs[index] != 0) { + InstructionSpecifier &newInfo = + InstructionSpecifiers[uid]; + InstructionSpecifier &previousInfo = + InstructionSpecifiers[decision.instructionIDs[index]]; + + if(newInfo.filtered) + continue; // filtered instructions get lowest priority + + if(previousInfo.name == "NOOP") + continue; // special case for XCHG32ar and NOOP + + if (outranks(previousInfo.insnContext, newInfo.insnContext)) + continue; + + if (previousInfo.insnContext == newInfo.insnContext && + !previousInfo.filtered) { + errs() << "Error: Primary decode conflict: "; + errs() << newInfo.name << " would overwrite " << previousInfo.name; + errs() << "\n"; + errs() << 
"ModRM " << index << "\n"; + errs() << "Opcode " << (uint16_t)opcode << "\n"; + errs() << "Context " << stringForContext(newInfo.insnContext) << "\n"; + HasConflicts = true; + } + } + + decision.instructionIDs[index] = uid; + } + } +} + +void DisassemblerTables::setTableFields(OpcodeType type, + InstructionContext insnContext, + uint8_t opcode, + const ModRMFilter &filter, + InstrUID uid) { + unsigned index; + + ContextDecision &decision = *Tables[type]; + + for (index = 0; index < IC_max; ++index) { + if (inheritsFrom((InstructionContext)index, + InstructionSpecifiers[uid].insnContext)) + setTableFields(decision.opcodeDecisions[index].modRMDecisions[opcode], + filter, + uid, + opcode); + } +} diff --git a/utils/TableGen/X86DisassemblerTables.h b/utils/TableGen/X86DisassemblerTables.h new file mode 100644 index 0000000000..08eba019c0 --- /dev/null +++ b/utils/TableGen/X86DisassemblerTables.h @@ -0,0 +1,291 @@ +//===- X86DisassemblerTables.h - Disassembler tables ------------*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +// +// This file is part of the X86 Disassembler Emitter. +// It contains the interface of the disassembler tables. +// Documentation for the disassembler emitter in general can be found in +// X86DisasemblerEmitter.h. +// +//===----------------------------------------------------------------------===// + +#ifndef X86DISASSEMBLERTABLES_H +#define X86DISASSEMBLERTABLES_H + +#include "X86DisassemblerShared.h" +#include "X86ModRMFilters.h" + +#include "llvm/Support/raw_ostream.h" + +#include <vector> + +namespace llvm { + +namespace X86Disassembler { + +/// DisassemblerTables - Encapsulates all the decode tables being generated by +/// the table emitter. 
Contains functions to populate the tables as well as +/// to emit them as hierarchical C structures suitable for consumption by the +/// runtime. +class DisassemblerTables { +private: + /// The decoder tables. There is one for each opcode type: + /// [0] one-byte opcodes + /// [1] two-byte opcodes of the form 0f __ + /// [2] three-byte opcodes of the form 0f 38 __ + /// [3] three-byte opcodes of the form 0f 3a __ + ContextDecision* Tables[4]; + + /// The instruction information table + std::vector<InstructionSpecifier> InstructionSpecifiers; + + /// True if there are primary decode conflicts in the instruction set + bool HasConflicts; + + /// emitOneID - Emits a table entry for a single instruction entry, at the + /// innermost level of the structure hierarchy. The entry is printed out + /// in the format "nnnn, /* MNEMONIC */" where nnnn is the ID in decimal, + /// the comma is printed if addComma is true, and the menonic is the name + /// of the instruction as listed in the LLVM tables. + /// + /// @param o - The output stream to print the entry on. + /// @param i - The indentation level for o. + /// @param id - The unique ID of the instruction to print. + /// @param addComma - Whether or not to print a comma after the ID. True if + /// additional items will follow. + void emitOneID(raw_ostream &o, + uint32_t &i, + InstrUID id, + bool addComma) const; + + /// emitModRMDecision - Emits a table of entries corresponding to a single + /// ModR/M decision. Compacts the ModR/M decision if possible. ModR/M + /// decisions are printed as: + /// + /// { /* struct ModRMDecision */ + /// TYPE, + /// modRMTablennnn + /// } + /// + /// where nnnn is a unique ID for the corresponding table of IDs. + /// TYPE indicates whether the table has one entry that is the same + /// regardless of ModR/M byte, two entries - one for bytes 0x00-0xbf and one + /// for bytes 0xc0-0xff -, or 256 entries, one for each possible byte. 
+ /// nnnn is the number of a table for looking up these values. The tables + /// are writen separately so that tables consisting entirely of zeros will + /// not be duplicated. (These all have the name modRMEmptyTable.) A table + /// is printed as: + /// + /// InstrUID modRMTablennnn[k] = { + /// nnnn, /* MNEMONIC */ + /// ... + /// nnnn /* MNEMONIC */ + /// }; + /// + /// @param o1 - The output stream to print the ID table to. + /// @param o2 - The output stream to print the decision structure to. + /// @param i1 - The indentation level to use with stream o1. + /// @param i2 - The indentation level to use with stream o2. + /// @param decision - The ModR/M decision to emit. This decision has 256 + /// entries - emitModRMDecision decides how to compact it. + void emitModRMDecision(raw_ostream &o1, + raw_ostream &o2, + uint32_t &i1, + uint32_t &i2, + ModRMDecision &decision) const; + + /// emitOpcodeDecision - Emits an OpcodeDecision and all its subsidiary ModR/M + /// decisions. An OpcodeDecision is printed as: + /// + /// { /* struct OpcodeDecision */ + /// /* 0x00 */ + /// { /* struct ModRMDecision */ + /// ... + /// } + /// ... + /// } + /// + /// where the ModRMDecision structure is printed as described in the + /// documentation for emitModRMDecision(). emitOpcodeDecision() passes on a + /// stream and indent level for the UID tables generated by + /// emitModRMDecision(), but does not use them itself. + /// + /// @param o1 - The output stream to print the ID tables generated by + /// emitModRMDecision() to. + /// @param o2 - The output stream for the decision structure itself. + /// @param i1 - The indent level to use with stream o1. + /// @param i2 - The indent level to use with stream o2. + /// @param decision - The OpcodeDecision to emit along with its subsidiary + /// structures. 
+ void emitOpcodeDecision(raw_ostream &o1, + raw_ostream &o2, + uint32_t &i1, + uint32_t &i2, + OpcodeDecision &decision) const; + + /// emitContextDecision - Emits a ContextDecision and all its subsidiary + /// Opcode and ModRMDecisions. A ContextDecision is printed as: + /// + /// struct ContextDecision NAME = { + /// { /* OpcodeDecisions */ + /// /* IC */ + /// { /* struct OpcodeDecision */ + /// ... + /// }, + /// ... + /// } + /// } + /// + /// NAME is the name of the ContextDecision (typically one of the four names + /// ONEBYTE_SYM, TWOBYTE_SYM, THREEBYTE38_SYM, and THREEBYTE3A_SYM from + /// X86DisassemblerDecoderCommon.h). + /// IC is one of the contexts in InstructionContext. There is an opcode + /// decision for each possible context. + /// The OpcodeDecision structures are printed as described in the + /// documentation for emitOpcodeDecision. + /// + /// @param o1 - The output stream to print the ID tables generated by + /// emitModRMDecision() to. + /// @param o2 - The output stream to print the decision structure to. + /// @param i1 - The indent level to use with stream o1. + /// @param i2 - The indent level to use with stream o2. + /// |