path: root/lib
author     Chandler Carruth <chandlerc@gmail.com>   2012-03-31 12:42:41 +0000
committer  Chandler Carruth <chandlerc@gmail.com>   2012-03-31 12:42:41 +0000
commit     f2286b0152f0b942e82d8e809186e5cc0d247131 (patch)
tree       033db72a90a2d25b807c089b32829cda0aed7189 /lib
parent     7384530c7cb1e0f747afa0a076dc7a9c13106518 (diff)
Initial commit for the rewrite of the inline cost analysis to operate on a per-callsite walk of the called function's instructions, in breadth-first order over the potentially reachable set of basic blocks.

This is a major shift in how inline cost analysis works, intended to improve the accuracy and rationality of inlining decisions. A brief outline of the algorithm this moves to (a rough sketch in code follows the message below):

- Build a simplification mapping based on the callsite arguments to the function arguments.
- Push the entry block onto a worklist of potentially-live basic blocks.
- Pop the first block off of the *front* of the worklist (for breadth-first ordering) and walk its instructions using a custom InstVisitor.
- For each instruction's operands, re-map them based on the simplification mappings available for the given callsite.
- Compute any simplification possible of the instruction after re-mapping, and store that back into the simplification mapping.
- Compute any bonuses, costs, or other impacts of the instruction on the cost metric.
- When the terminator is reached, replace any conditional value in the terminator with any simplifications from the mapping we have, and add any successors which are not proven to be dead from these simplifications to the worklist.
- Pop the next block off of the front of the worklist, and repeat.
- As soon as the cost of inlining exceeds the threshold for the callsite, stop analyzing the function in order to bound cost.

The primary goal of this algorithm is to perfectly handle dead code paths. We do not want any code in trivially dead code paths to impact inlining decisions. The previous metric was *extremely* flawed here: it would always subtract the average cost of the two successors of a conditional branch when that branch was proven to become an unconditional branch at the callsite. There was no handling of wildly different costs between the two successors, which would cause inlining when the path actually taken was too large, and no inlining when the path actually taken was trivially simple. There was also no handling of the code *path*, only the immediate successors. These problems vanish completely now. See the added regression tests for the shiny new features -- we skip recursive function calls, SROA-killing instructions, and high-cost complex CFG structures when dead at the callsite being analyzed.

Switching to this algorithm required refactoring the inline cost interface to accept the actual threshold rather than simply returning a single cost. The resulting interface is pretty bad, and I'm planning to do lots of interface cleanup after this patch.

Several other refactorings fell out of this, but I've tried to minimize them for this patch. =/ There is still more cleanup that can be done here; please point out anything that you see in review.

I've worked really hard to try to mirror at least the spirit of all of the previous heuristics in the new model. It's not clear that they are all correct any more, but I wanted to minimize the change in this single patch; it's already a bit ridiculous. One heuristic that is *not* yet mirrored is to allow inlining of functions with a dynamic alloca *if* the caller has a dynamic alloca. I will add this back, but I think the most reasonable way requires changes to the inliner itself rather than just the cost metric, and so I've deferred it for a subsequent patch. The test case is XFAIL-ed until then.

As mentioned in the review mail, this seems to make Clang run about 1% to 2% faster in -O0, but makes its binary size grow by just under 4%. I've looked into the 4% growth, and it can be fixed, but that requires changes to other parts of the inliner.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153812 91177308-0d34-0410-b5e6-96231b3b80d8
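For illustration only: below is a minimal, self-contained C++ sketch of the breadth-first, threshold-bounded worklist loop outlined in the message above. It deliberately uses hypothetical stand-in types and helpers (Block, Instr, trySimplify, instructionCost) rather than the real LLVM classes, so it shows only the shape of the analysis, not the actual CallAnalyzer implementation in the diff below.

#include <deque>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for IR types; only the control flow of the analysis
// is meant to be faithful here.
struct Instr { int Opcode; };
struct Block {
  std::vector<Instr> Insts;
  std::vector<Block *> Succs; // successors named by the block's terminator
};

// Assumed helpers, stubbed out so the sketch is self-contained: fold an
// instruction against the callsite-specific simplification map, and price an
// instruction that survives simplification.
static bool trySimplify(const Instr &, std::map<const Instr *, int> &) {
  return false;
}
static int instructionCost(const Instr &) { return 5; }

// Walk the callee breadth-first from its entry block, accumulating cost and
// bailing out as soon as the threshold is exceeded.
static bool worthInlining(Block *Entry, int Threshold) {
  int Cost = 0;
  std::map<const Instr *, int> Simplified; // per-callsite simplifications
  std::deque<Block *> Worklist;            // FIFO => breadth-first order
  std::set<Block *> Visited;

  Worklist.push_back(Entry);
  Visited.insert(Entry);
  while (!Worklist.empty()) {
    Block *BB = Worklist.front();
    Worklist.pop_front(); // pop from the *front* of the worklist

    for (const Instr &I : BB->Insts) {
      if (trySimplify(I, Simplified))
        continue; // simplified away at this callsite: free
      Cost += instructionCost(I);
      if (Cost > Threshold)
        return false; // bound the analysis cost by stopping early
    }

    // The real analysis re-evaluates the terminator against the simplification
    // map here and skips successors proven dead; this sketch simply enqueues
    // every not-yet-visited successor.
    for (Block *Succ : BB->Succs)
      if (Visited.insert(Succ).second)
        Worklist.push_back(Succ);
  }
  return Cost <= Threshold;
}

The key design point, as described above, is that successors proven dead never enter the worklist, so code on trivially dead paths never contributes to the accumulated cost.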
Diffstat (limited to 'lib')
-rw-r--r--  lib/Analysis/CodeMetrics.cpp            88
-rw-r--r--  lib/Analysis/InlineCost.cpp           1441
-rw-r--r--  lib/Transforms/IPO/InlineAlways.cpp      5
-rw-r--r--  lib/Transforms/IPO/InlineSimple.cpp      5
-rw-r--r--  lib/Transforms/IPO/Inliner.cpp          53
5 files changed, 971 insertions, 621 deletions
diff --git a/lib/Analysis/CodeMetrics.cpp b/lib/Analysis/CodeMetrics.cpp
index 6c93f78629..316e7bc934 100644
--- a/lib/Analysis/CodeMetrics.cpp
+++ b/lib/Analysis/CodeMetrics.cpp
@@ -50,6 +50,52 @@ bool llvm::callIsSmall(const Function *F) {
return false;
}
+bool llvm::isInstructionFree(const Instruction *I, const TargetData *TD) {
+ if (isa<PHINode>(I))
+ return true;
+
+ // If a GEP has all constant indices, it will probably be folded with
+ // a load/store.
+ if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
+ return GEP->hasAllConstantIndices();
+
+ if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
+ switch (II->getIntrinsicID()) {
+ default:
+ return false;
+ case Intrinsic::dbg_declare:
+ case Intrinsic::dbg_value:
+ case Intrinsic::invariant_start:
+ case Intrinsic::invariant_end:
+ case Intrinsic::lifetime_start:
+ case Intrinsic::lifetime_end:
+ case Intrinsic::objectsize:
+ case Intrinsic::ptr_annotation:
+ case Intrinsic::var_annotation:
+ // These intrinsics don't count as size.
+ return true;
+ }
+ }
+
+ if (const CastInst *CI = dyn_cast<CastInst>(I)) {
+ // Noop casts, including ptr <-> int, don't count.
+ if (CI->isLosslessCast() || isa<IntToPtrInst>(CI) || isa<PtrToIntInst>(CI))
+ return true;
+ // trunc to a native type is free (assuming the target has compare and
+ // shift-right of the same width).
+ if (TD && isa<TruncInst>(CI) &&
+ TD->isLegalInteger(TD->getTypeSizeInBits(CI->getType())))
+ return true;
+ // Result of a cmp instruction is often extended (to be used by other
+ // cmp instructions, logical or return instructions). These are usually
+ // nop on most sane targets.
+ if (isa<CmpInst>(CI->getOperand(0)))
+ return true;
+ }
+
+ return false;
+}
+
/// analyzeBasicBlock - Fill in the current structure with information gleaned
/// from the specified block.
void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,
@@ -58,27 +104,11 @@ void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,
unsigned NumInstsBeforeThisBB = NumInsts;
for (BasicBlock::const_iterator II = BB->begin(), E = BB->end();
II != E; ++II) {
- if (isa<PHINode>(II)) continue; // PHI nodes don't count.
+ if (isInstructionFree(II, TD))
+ continue;
// Special handling for calls.
if (isa<CallInst>(II) || isa<InvokeInst>(II)) {
- if (const IntrinsicInst *IntrinsicI = dyn_cast<IntrinsicInst>(II)) {
- switch (IntrinsicI->getIntrinsicID()) {
- default: break;
- case Intrinsic::dbg_declare:
- case Intrinsic::dbg_value:
- case Intrinsic::invariant_start:
- case Intrinsic::invariant_end:
- case Intrinsic::lifetime_start:
- case Intrinsic::lifetime_end:
- case Intrinsic::objectsize:
- case Intrinsic::ptr_annotation:
- case Intrinsic::var_annotation:
- // These intrinsics don't count as size.
- continue;
- }
- }
-
ImmutableCallSite CS(cast<Instruction>(II));
if (const Function *F = CS.getCalledFunction()) {
@@ -115,28 +145,6 @@ void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,
if (isa<ExtractElementInst>(II) || II->getType()->isVectorTy())
++NumVectorInsts;
- if (const CastInst *CI = dyn_cast<CastInst>(II)) {
- // Noop casts, including ptr <-> int, don't count.
- if (CI->isLosslessCast() || isa<IntToPtrInst>(CI) ||
- isa<PtrToIntInst>(CI))
- continue;
- // trunc to a native type is free (assuming the target has compare and
- // shift-right of the same width).
- if (isa<TruncInst>(CI) && TD &&
- TD->isLegalInteger(TD->getTypeSizeInBits(CI->getType())))
- continue;
- // Result of a cmp instruction is often extended (to be used by other
- // cmp instructions, logical or return instructions). These are usually
- // nop on most sane targets.
- if (isa<CmpInst>(CI->getOperand(0)))
- continue;
- } else if (const GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(II)){
- // If a GEP has all constant indices, it will probably be folded with
- // a load/store.
- if (GEPI->hasAllConstantIndices())
- continue;
- }
-
++NumInsts;
}
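As a hedged illustration (not code from this patch): with the "free instruction" checks now consolidated in isInstructionFree, a client could approximate a function's billable size as shown below. This assumes the new helper is declared in llvm/Analysis/CodeMetrics.h; the header change is outside this lib/-only diff, and the helper names here are otherwise illustrative.

// Hypothetical usage sketch, not part of this patch: count the instructions in
// a function that the metrics would actually charge for.
#include "llvm/Analysis/CodeMetrics.h"   // assumed to declare isInstructionFree
#include "llvm/Function.h"
#include "llvm/Target/TargetData.h"

static unsigned countChargedInstructions(const llvm::Function &F,
                                         const llvm::TargetData *TD) {
  unsigned Charged = 0;
  for (llvm::Function::const_iterator BB = F.begin(), BE = F.end();
       BB != BE; ++BB)
    for (llvm::BasicBlock::const_iterator I = BB->begin(), E = BB->end();
         I != E; ++I)
      if (!llvm::isInstructionFree(I, TD)) // PHIs, debug intrinsics, no-op
        ++Charged;                         // casts, constant GEPs are skipped
  return Charged;
}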
diff --git a/lib/Analysis/InlineCost.cpp b/lib/Analysis/InlineCost.cpp
index dedbfebea7..bc6c1687fd 100644
--- a/lib/Analysis/InlineCost.cpp
+++ b/lib/Analysis/InlineCost.cpp
@@ -11,659 +11,1014 @@
//
//===----------------------------------------------------------------------===//
+#define DEBUG_TYPE "inline-cost"
#include "llvm/Analysis/InlineCost.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Support/CallSite.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/InstVisitor.h"
+#include "llvm/Support/GetElementPtrTypeIterator.h"
+#include "llvm/Support/raw_ostream.h"
#include "llvm/CallingConv.h"
#include "llvm/IntrinsicInst.h"
+#include "llvm/Operator.h"
+#include "llvm/GlobalAlias.h"
#include "llvm/Target/TargetData.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/SmallPtrSet.h"
using namespace llvm;
-unsigned InlineCostAnalyzer::FunctionInfo::countCodeReductionForConstant(
- const CodeMetrics &Metrics, Value *V) {
- unsigned Reduction = 0;
- SmallVector<Value *, 4> Worklist;
- Worklist.push_back(V);
- do {
- Value *V = Worklist.pop_back_val();
- for (Value::use_iterator UI = V->use_begin(), E = V->use_end(); UI != E;++UI){
- User *U = *UI;
- if (isa<BranchInst>(U) || isa<SwitchInst>(U)) {
- // We will be able to eliminate all but one of the successors.
- const TerminatorInst &TI = cast<TerminatorInst>(*U);
- const unsigned NumSucc = TI.getNumSuccessors();
- unsigned Instrs = 0;
- for (unsigned I = 0; I != NumSucc; ++I)
- Instrs += Metrics.NumBBInsts.lookup(TI.getSuccessor(I));
- // We don't know which blocks will be eliminated, so use the average size.
- Reduction += InlineConstants::InstrCost*Instrs*(NumSucc-1)/NumSucc;
- continue;
- }
+namespace {
+
+class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
+ typedef InstVisitor<CallAnalyzer, bool> Base;
+ friend class InstVisitor<CallAnalyzer, bool>;
+
+ // TargetData if available, or null.
+ const TargetData *const TD;
+
+ // The called function.
+ Function &F;
+
+ int Threshold;
+ int Cost;
+ const bool AlwaysInline;
+
+ bool IsRecursive;
+ bool ExposesReturnsTwice;
+ bool HasDynamicAlloca;
+ unsigned NumInstructions, NumVectorInstructions;
+ int FiftyPercentVectorBonus, TenPercentVectorBonus;
+ int VectorBonus;
+
+ // While we walk the potentially-inlined instructions, we build up and
+ // maintain a mapping of simplified values specific to this callsite. The
+ // idea is to propagate any special information we have about arguments to
+ // this call through the inlinable section of the function, and account for
+ // likely simplifications post-inlining. The most important aspect we track
+ // is CFG altering simplifications -- when we prove a basic block dead, that
+ // can cause dramatic shifts in the cost of inlining a function.
+ DenseMap<Value *, Constant *> SimplifiedValues;
+
+ // Keep track of the values which map back (through function arguments) to
+ // allocas on the caller stack which could be simplified through SROA.
+ DenseMap<Value *, Value *> SROAArgValues;
+
+ // The mapping of caller Alloca values to their accumulated cost savings. If
+ // we have to disable SROA for one of the allocas, this tells us how much
+ // cost must be added.
+ DenseMap<Value *, int> SROAArgCosts;
+
+ // Keep track of values which map to a pointer base and constant offset.
+ DenseMap<Value *, std::pair<Value *, APInt> > ConstantOffsetPtrs;
+
+ // Custom simplification helper routines.
+ bool isAllocaDerivedArg(Value *V);
+ bool lookupSROAArgAndCost(Value *V, Value *&Arg,
+ DenseMap<Value *, int>::iterator &CostIt);
+ void disableSROA(DenseMap<Value *, int>::iterator CostIt);
+ void disableSROA(Value *V);
+ void accumulateSROACost(DenseMap<Value *, int>::iterator CostIt,
+ int InstructionCost);
+ bool handleSROACandidate(bool IsSROAValid,
+ DenseMap<Value *, int>::iterator CostIt,
+ int InstructionCost);
+ bool isGEPOffsetConstant(GetElementPtrInst &GEP);
+ bool accumulateGEPOffset(GEPOperator &GEP, APInt &Offset);
+ ConstantInt *stripAndComputeInBoundsConstantOffsets(Value *&V);
+
+ // Custom analysis routines.
+ bool analyzeBlock(BasicBlock *BB);
+
+ // Disable several entry points to the visitor so we don't accidentally use
+ // them by declaring but not defining them here.
+ void visit(Module *); void visit(Module &);
+ void visit(Function *); void visit(Function &);
+ void visit(BasicBlock *); void visit(BasicBlock &);
+
+ // Provide base case for our instruction visit.
+ bool visitInstruction(Instruction &I);
+
+ // Our visit overrides.
+ bool visitAlloca(AllocaInst &I);
+ bool visitPHI(PHINode &I);
+ bool visitGetElementPtr(GetElementPtrInst &I);
+ bool visitBitCast(BitCastInst &I);
+ bool visitPtrToInt(PtrToIntInst &I);
+ bool visitIntToPtr(IntToPtrInst &I);
+ bool visitCastInst(CastInst &I);
+ bool visitUnaryInstruction(UnaryInstruction &I);
+ bool visitICmp(ICmpInst &I);
+ bool visitSub(BinaryOperator &I);
+ bool visitBinaryOperator(BinaryOperator &I);
+ bool visitLoad(LoadInst &I);
+ bool visitStore(StoreInst &I);
+ bool visitCallSite(CallSite CS);
+
+public:
+ CallAnalyzer(const TargetData *TD, Function &Callee, int Threshold)
+ : TD(TD), F(Callee), Threshold(Threshold), Cost(0),
+ AlwaysInline(F.hasFnAttr(Attribute::AlwaysInline)),
+ IsRecursive(false), ExposesReturnsTwice(false), HasDynamicAlloca(false),
+ NumInstructions(0), NumVectorInstructions(0),
+ FiftyPercentVectorBonus(0), TenPercentVectorBonus(0), VectorBonus(0),
+ NumConstantArgs(0), NumConstantOffsetPtrArgs(0), NumAllocaArgs(0),
+ NumConstantPtrCmps(0), NumConstantPtrDiffs(0),
+ NumInstructionsSimplified(0), SROACostSavings(0), SROACostSavingsLost(0) {
+ }
- // Figure out if this instruction will be removed due to simple constant
- // propagation.
- Instruction &Inst = cast<Instruction>(*U);
-
- // We can't constant propagate instructions which have effects or
- // read memory.
- //
- // FIXME: It would be nice to capture the fact that a load from a
- // pointer-to-constant-global is actually a *really* good thing to zap.
- // Unfortunately, we don't know the pointer that may get propagated here,
- // so we can't make this decision.
- if (Inst.mayReadFromMemory() || Inst.mayHaveSideEffects() ||
- isa<AllocaInst>(Inst))
- continue;
+ bool analyzeCall(CallSite CS);
- bool AllOperandsConstant = true;
- for (unsigned i = 0, e = Inst.getNumOperands(); i != e; ++i)
- if (!isa<Constant>(Inst.getOperand(i)) && Inst.getOperand(i) != V) {
- AllOperandsConstant = false;
- break;
- }
- if (!AllOperandsConstant)
- continue;
+ int getThreshold() { return Threshold; }
+ int getCost() { return Cost; }
- // We will get to remove this instruction...
- Reduction += InlineConstants::InstrCost;
+ // Keep a bunch of stats about the cost savings found so we can print them
+ // out when debugging.
+ unsigned NumConstantArgs;
+ unsigned NumConstantOffsetPtrArgs;
+ unsigned NumAllocaArgs;
+ unsigned NumConstantPtrCmps;
+ unsigned NumConstantPtrDiffs;
+ unsigned NumInstructionsSimplified;
+ unsigned SROACostSavings;
+ unsigned SROACostSavingsLost;
- // And any other instructions that use it which become constants
- // themselves.
- Worklist.push_back(&Inst);
- }
- } while (!Worklist.empty());
- return Reduction;
-}
+ void dump();
+};
-static unsigned countCodeReductionForAllocaICmp(const CodeMetrics &Metrics,
- ICmpInst *ICI) {
- unsigned Reduction = 0;
+} // namespace
- // Bail if this is comparing against a non-constant; there is nothing we can
- // do there.
- if (!isa<Constant>(ICI->getOperand(1)))
- return Reduction;
+/// \brief Test whether the given value is an Alloca-derived function argument.
+bool CallAnalyzer::isAllocaDerivedArg(Value *V) {
+ return SROAArgValues.count(V);
+}
- // An icmp pred (alloca, C) becomes true if the predicate is true when
- // equal and false otherwise.
- bool Result = ICI->isTrueWhenEqual();
+/// \brief Lookup the SROA-candidate argument and cost iterator which V maps to.
+/// Returns false if V does not map to a SROA-candidate.
+bool CallAnalyzer::lookupSROAArgAndCost(
+ Value *V, Value *&Arg, DenseMap<Value *, int>::iterator &CostIt) {
+ if (SROAArgValues.empty() || SROAArgCosts.empty())
+ return false;
- SmallVector<Instruction *, 4> Worklist;
- Worklist.push_back(ICI);
- do {
- Instruction *U = Worklist.pop_back_val();
- Reduction += InlineConstants::InstrCost;
- for (Value::use_iterator UI = U->use_begin(), UE = U->use_end();
- UI != UE; ++UI) {
- Instruction *I = dyn_cast<Instruction>(*UI);
- if (!I || I->mayHaveSideEffects()) continue;
- if (I->getNumOperands() == 1)
- Worklist.push_back(I);
- if (BinaryOperator *BO = dyn_cast<BinaryOperator>(I)) {
- // If BO produces the same value as U, then the other operand is
- // irrelevant and we can put it into the Worklist to continue
- // deleting dead instructions. If BO produces the same value as the
- // other operand, we can delete BO but that's it.
- if (Result == true) {
- if (BO->getOpcode() == Instruction::Or)
- Worklist.push_back(I);
- if (BO->getOpcode() == Instruction::And)
- Reduction += InlineConstants::InstrCost;
- } else {
- if (BO->getOpcode() == Instruction::Or ||
- BO->getOpcode() == Instruction::Xor)
- Reduction += InlineConstants::InstrCost;
- if (BO->getOpcode() == Instruction::And)
- Worklist.push_back(I);
- }
- }
- if (BranchInst *BI = dyn_cast<BranchInst>(I)) {
- BasicBlock *BB = BI->getSuccessor(Result ? 0 : 1);
- if (BB->getSinglePredecessor())
- Reduction
- += InlineConstants::InstrCost * Metrics.NumBBInsts.lookup(BB);
- }
- }
- } while (!Worklist.empty());
+ DenseMap<Value *, Value *>::iterator ArgIt = SROAArgValues.find(V);
+ if (ArgIt == SROAArgValues.end())
+ return false;
- return Reduction;
+ Arg = ArgIt->second;
+ CostIt = SROAArgCosts.find(Arg);
+ return CostIt != SROAArgCosts.end();
}
-/// \brief Compute the reduction possible for a given instruction if we are able
-/// to SROA an alloca.
+/// \brief Disable SROA for the candidate marked by this cost iterator.
///
-/// The reduction for this instruction is added to the SROAReduction output
-/// parameter. Returns false if this instruction is expected to defeat SROA in
-/// general.
-static bool countCodeReductionForSROAInst(Instruction *I,
- SmallVectorImpl<Value *> &Worklist,
- unsigned &SROAReduction) {
- if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
- if (!LI->isSimple())
- return false;
- SROAReduction += InlineConstants::InstrCost;
+/// This marks the candidate as no longer viable for SROA, and adds the cost
+/// savings associated with it back into the inline cost measurement.
+void CallAnalyzer::disableSROA(DenseMap<Value *, int>::iterator CostIt) {
+ // If we're no longer able to perform SROA we need to undo its cost savings
+ // and prevent subsequent analysis.
+ Cost += CostIt->second;
+ SROACostSavings -= CostIt->second;
+ SROACostSavingsLost += CostIt->second;
+ SROAArgCosts.erase(CostIt);
+}
+
+/// \brief If 'V' maps to a SROA candidate, disable SROA for it.
+void CallAnalyzer::disableSROA(Value *V) {
+ Value *SROAArg;
+ DenseMap<Value *, int>::iterator CostIt;
+ if (lookupSROAArgAndCost(V, SROAArg, CostIt))
+ disableSROA(CostIt);
+}
+
+/// \brief Accumulate the given cost for a particular SROA candidate.
+void CallAnalyzer::accumulateSROACost(DenseMap<Value *, int>::iterator CostIt,
+ int InstructionCost) {
+ CostIt->second += InstructionCost;
+ SROACostSavings += InstructionCost;
+}
+
+/// \brief Helper for the common pattern of handling a SROA candidate.
+/// Either accumulates the cost savings if the SROA remains valid, or disables
+/// SROA for the candidate.
+bool CallAnalyzer::handleSROACandidate(bool IsSROAValid,
+ DenseMap<Value *, int>::iterator CostIt,
+ int InstructionCost) {
+ if (IsSROAValid) {
+ accumulateSROACost(CostIt, InstructionCost);
return true;
}
- if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
- if (!SI->isSimple())
+ disableSROA(CostIt);
+ return false;
+}
+
+/// \brief Check whether a GEP's indices are all constant.
+///
+/// Respects any simplified values known during the analysis of this callsite.
+bool CallAnalyzer::isGEPOffsetConstant(GetElementPtrInst &GEP) {
+ for (User::op_iterator I = GEP.idx_begin(), E = GEP.idx_end(); I != E; ++I)
+ if (!isa<Constant>(*I) && !SimplifiedValues.lookup(*I))
return false;
- SROAReduction += InlineConstants::InstrCost;
- return true;
- }
- if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
- // If the GEP has variable indices, we won't be able to do much with it.
- if (!GEP->hasAllConstantIndices())
+ return true;
+}
+
+/// \brief Accumulate a constant GEP offset into an APInt if possible.
+///
+/// Returns false if unable to compute the offset for any reason. Respects any
+/// simplified values known during the analysis of this callsite.
+bool CallAnalyzer::accumulateGEPOffset(GEPOperator &GEP, APInt &Offset) {
+ if (!TD)
+ return false;
+
+ unsigned IntPtrWidth = TD->getPointerSizeInBits();
+ assert(IntPtrWidth == Offset.getBitWidth());
+
+ for (gep_type_iterator GTI = gep_type_begin(GEP), GTE = gep_type_end(GEP);
+ GTI != GTE; ++GTI) {
+ ConstantInt *OpC = dyn_cast<ConstantInt>(GTI.getOperand());
+ if (!OpC)
+ if (Constant *SimpleOp = SimplifiedValues.lookup(GTI.getOperand()))
+ OpC = dyn_cast<ConstantInt>(SimpleOp);
+ if (!OpC)
return false;
- // A non-zero GEP will likely become a mask operation after SROA.
- if (GEP->hasAllZeroIndices())
- SROAReduction += InlineConstants::InstrCost;
- Worklist.push_back(GEP);
- return true;
- }
+ if (OpC->isZero()) continue;
- if (BitCastInst *BCI = dyn_cast<BitCastInst>(I)) {
- // Track pointer through bitcasts.
- Worklist.push_back(BCI);
- SROAReduction += InlineConstants::InstrCost;
- return true;
+ // Handle a struct index, which adds its field offset to the pointer.
+ if (StructType *STy = dyn_cast<StructType>(*GTI)) {
+ unsigned ElementIdx = OpC->getZExtValue();
+ const StructLayout *SL = TD->getStructLayout(STy);
+ Offset += APInt(IntPtrWidth, SL->getElementOffset(ElementIdx));
+ continue;
+ }
+
+ APInt TypeSize(IntPtrWidth, TD->getTypeAllocSize(GTI.getIndexedType()));
+ Offset += OpC->getValue().sextOrTrunc(IntPtrWidth) * TypeSize;
}
+ return true;
+}
+
+bool CallAnalyzer::visitAlloca(AllocaInst &I) {
+ // FIXME: Check whether inlining will turn a dynamic alloca into a static
+ // alloca, and handle that case.
- // We just look for non-constant operands to ICmp instructions as those will
- // defeat SROA. The actual reduction for these happens even without SROA.
- if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
- return isa<Constant>(ICI->getOperand(1));
-
- if (SelectInst *SI = dyn_cast<SelectInst>(I)) {
- // SROA can handle a select of alloca iff all uses of the alloca are
- // loads, and dereferenceable. We assume it's dereferenceable since
- // we're told the input is an alloca.
- for (Value::use_iterator UI = SI->use_begin(), UE = SI->use_end();
- UI != UE; ++UI) {
- LoadInst *LI = dyn_cast<LoadInst>(*UI);
- if (LI == 0 || !LI->isSimple())
+ // We will happily inline static alloca instructions or dynamic alloca
+ // instructions in always-inline situations.
+ if (AlwaysInline || I.isStaticAlloca())
+ return Base::visitAlloca(I);
+
+ // FIXME: This is overly conservative. Dynamic allocas are inefficient for
+ // a variety of reasons, and so we would like to not inline them into
+ // functions which don't currently have a dynamic alloca. This simply
+ // disables inlining altogether in the presence of a dynamic alloca.
+ HasDynamicAlloca = true;
+ return false;
+}
+
+bool CallAnalyzer::visitPHI(PHINode &I) {
+ // FIXME: We should potentially be tracking values through phi nodes,
+ // especially when they collapse to a single value due to deleted CFG edges
+ // during inlining.
+
+ // FIXME: We need to propagate SROA *disabling* through phi nodes, even
+ // though we don't want to propagate its bonuses. The idea is to disable
+ // SROA if it *might* be used in an inappropriate manner.
+
+ // Phi nodes are always zero-cost.
+ return true;
+}
+
+bool CallAnalyzer::visitGetElementPtr(GetElementPtrInst &I) {
+ Value *SROAArg;
+ DenseMap<Value *, int>::iterator CostIt;
+ bool SROACandidate = lookupSROAArgAndCost(I.getPointerOperand(),
+ SROAArg, CostIt);
+
+ // Try to fold GEPs of constant-offset call site argument pointers. This
+ // requires target data and inbounds GEPs.
+ if (TD && I.isInBounds()) {
+ // Check if we have a base + offset for the pointer.
+ Value *Ptr = I.getPointerOperand();
+ std::pair<Value *, APInt> BaseAndOffset = ConstantOffsetPtrs.lookup(Ptr);
+ if (BaseAndOffset.first) {
+ // Check if the offset of this GEP is constant, and if so accumulate it
+ // into Offset.
+ if (!accumulateGEPOffset(cast<GEPOperator>(I), BaseAndOffset.second)) {
+ // Non-constant GEPs aren't folded, and disable SROA.
+ if (SROACandidate)
+ disableSROA(CostIt);
return false;
+ }
+
+ // Add the result as a new mapping to Base + Offset.
+ ConstantOffsetPtrs[&I] = BaseAndOffset;
+
+ // Also handle SROA candidates here, we already know that the GEP is
+ // all-constant indexed.
+ if (SROACandidate)
+ SROAArgValues[&I] = SROAArg;
+
+ return true;
}
- // We don't know whether we'll be deleting the rest of the chain of
- // instructions from the SelectInst on, because we don't know whether
- // the other side of the select is also an alloca or not.
+ }
+
+ if (isGEPOffsetConstant(I)) {
+ if (SROACandidate)
+ SROAArgValues[&I] = SROAArg;
+
+ // Constant GEPs are modeled as free.
return true;
}
- if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
- switch (II->getIntrinsicID()) {
- default:
- return false;
- case Intrinsic::memset:
- case Intrinsic::memcpy:
- case Intrinsic::memmove:
- case Intrinsic::lifetime_start:
- case Intrinsic::lifetime_end:
- // SROA can usually chew through these intrinsics.
- SROAReduction += InlineConstants::InstrCost;
+ // Variable GEPs will require math and will disable SROA.
+ if (SROACandidate)
+ disableSROA(CostIt);
+ return false;
+}
+
+bool CallAnalyzer::visitBitCast(BitCastInst &I) {
+ // Propagate constants through bitcasts.
+ if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
+ if (Constant *C = ConstantExpr::getBitCast(COp, I.getType())) {
+ SimplifiedValues[&I] = C;
return true;
}
+
+ // Track base/offsets through casts
+ std::pair<Value *, APInt> BaseAndOffset
+ = ConstantOffsetPtrs.lookup(I.getOperand(0));
+ // Casts don't change the offset, just wrap it up.
+ if (BaseAndOffset.first)
+ ConstantOffsetPtrs[&I] = BaseAndOffset;
+
+ // Also look for SROA candidates here.
+ Value *SROAArg;
+ DenseMap<Value *, int>::iterator CostIt;
+ if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt))
+ SROAArgValues[&I] = SROAArg;
+
+ // Bitcasts are always zero cost.
+ return true;
+}
+
+bool CallAnalyzer::visitPtrToInt(PtrToIntInst &I) {
+ // Propagate constants through ptrtoint.
+ if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
+ if (Constant *C = ConstantExpr::getPtrToInt(COp, I.getType())) {
+ SimplifiedValues[&I] = C;
+ return true;
+ }
+
+ // Track base/offset pairs when converted to a plain integer provided the
+ // integer is large enough to represent the pointer.
+ unsigned IntegerSize = I.getType()->getScalarSizeInBits();
+ if (TD && IntegerSize >= TD->getPointerSizeInBits()) {
+ std::pair<Value *, APInt> BaseAndOffset
+ = ConstantOffsetPtrs.lookup(I.getOperand(0));
+ if (BaseAndOffset.first)
+ ConstantOffsetPtrs[&I] = BaseAndOffset;
}
- // If there is some other strange instruction, we're not going to be
- // able to do much if we inline this.
- return false;
+ // This is really weird. Technically, ptrtoint will disable SROA. However,
+ // unless that ptrtoint is *used* somewhere in the live basic blocks after
+ // inlining, it will be nuked, and SROA should proceed. All of the uses which
+ // would block SROA would also block SROA if applied directly to a pointer,
+ // and so we can just add the integer in here. The only places where SROA is
+ // preserved either cannot fire on an integer, or won't in-and-of themselves
+ // disable SROA (ext) w/o some later use that we would see and disable.
+ Value *SROAArg;
+ DenseMap<Value *, int>::iterator CostIt;
+ if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt))
+ SROAArgValues[&I] = SROAArg;
+
+ // A ptrtoint cast is free so long as the result is large enough to store the
+ // pointer, and a legal integer type.
+ return TD && TD->isLegalInteger(IntegerSize) &&
+ IntegerSize >= TD->getPointerSizeInBits();
}
-unsigned InlineCostAnalyzer::FunctionInfo::countCodeReductionForAlloca(
- const CodeMetrics &Metrics, Value *V) {
- if (!V->getType()->isPointerTy()) return 0; // Not a pointer
- unsigned Reduction = 0;
- unsigned SROAReduction = 0;
- bool CanSROAAlloca = true;
+bool CallAnalyzer::visitIntToPtr(IntToPtrInst &I) {
+ // Propagate constants through inttoptr.
+ if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
+ if (Constant *C = ConstantExpr::getIntToPtr(COp, I.getType())) {
+ SimplifiedValues[&I] = C;
+ return true;
+ }
- SmallVector<Value *, 4> Worklist;
- Worklist.push_back(V);
- do {
- Value *V = Worklist.pop_back_val();
- for (Value::use_iterator UI = V->use_begin(), E = V->use_end();
- UI != E; ++UI){
- Instruction *I = cast<Instruction>(*UI);
+ // Track base/offset pairs when round-tripped through a pointer without
+ // modifications provided the integer is not too large.
+ Value *Op = I.getOperand(0);
+ unsigned IntegerSize = Op->getType()->getScalarSizeInBits();
+ if (TD && IntegerSize <= TD->getPointerSizeInBits()) {
+ std::pair<Value *, APInt> BaseAndOffset = ConstantOffsetPtrs.lookup(Op);
+ if (BaseAndOffset.first)
+ ConstantOffsetPtrs[&I] = BaseAndOffset;
+ }
- if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
- Reduction += countCodeReductionForAllocaICmp(Metrics, ICI);
+ // "Propagate" SROA here in the same manner as we do for ptrtoint above.
+ Value *SROAArg;
+ DenseMap<Value *, int>::iterator CostIt;
+ if (lookupSROAArgAndCost(Op, SROAArg, CostIt))
+ SROAArgValues[&I] = SROAArg;
- if (CanSROAAlloca)
- CanSROAAlloca = countCodeReductionForSROAInst(I, Worklist,
- SROAReduction);
+ // An inttoptr cast is free so long as the input is a legal integer type
+ // which doesn't contain values outside the range of a pointer.
+ return TD && TD->isLegalInteger(IntegerSize) &&
+ IntegerSize <= TD->getPointerSizeInBits();
+}
+
+bool CallAnalyzer::visitCastInst(CastInst &I) {
+ // Propagate constants through casts.
+ if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
+ if (Constant *C = ConstantExpr::getCast(I.getOpcode(), COp, I.getType())) {
+ SimplifiedValues[&I] = C;
+ return true;
}
- } while (!Worklist.empty());
- return Reduction + (CanSROAAlloca ? SROAReduction : 0);
+ // Disable SROA in the face of arbitrary casts we don't whitelist elsewhere.
+ disableSROA(I.getOperand(0));
+
+ // No-op casts don't have any cost.
+ if (I.isLosslessCast())
+ return true;
+
+ // trunc to a native type is free (assuming the target has compare and
+ // shift-right of the same width).
+ if (TD && isa<TruncInst>(I) &&
+ TD->isLegalInteger(TD->getTypeSizeInBits(I.getType())))
+ return true;
+
+ // Result of a cmp instruction is often extended (to be used by other
+ // cmp instructions, logical or return instructions). These are usually
+ // no-ops on most sane targets.
+ if (isa<CmpInst>(I.getOperand(0)))
+ return true;
+
+ // Assume the rest of the casts require work.
+ return false;
}
-void InlineCostAnalyzer::FunctionInfo::countCodeReductionForPointerPair(
- const CodeMetrics &Metrics, DenseMap<Value *, unsigned> &PointerArgs,
- Value *V, unsigned ArgIdx) {
- SmallVector<Value *, 4> Worklist;
- Worklist.push_back(V);
- do {
- Value *V = Worklist.pop_back_val();
- for (Value::use_iterator UI = V->use_begin(), E = V->use_end();
- UI != E; ++UI){
- Instruction *I = cast<Instruction>(*UI);
-
- if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
- // If the GEP has variable indices, we won't be able to do much with it.
- if (!GEP->hasAllConstantIndices())
- continue;
- // Unless the GEP is in-bounds, some comparisons will be non-constant.
- // Fortunately, the real-world cases where this occurs uses in-bounds
- // GEPs, and so we restrict the optimization to them here.
- if (!GEP->isInBounds())
- continue;
+bool CallAnalyzer::visitUnaryInstruction(UnaryInstruction &I) {
+ Value *Operand = I.getOperand(0);
+ Constant *Ops[1] = { dyn_cast<Constant>(Operand) };
+ if (Ops[0] || (Ops[0] = SimplifiedValues.lookup(Operand)))
+ if (Constant *C = ConstantFoldInstOperands(I.getOpcode(), I.getType(),
+ Ops, TD)) {
+ SimplifiedValues[&I] = C;
+ return true;
+ }
- // Constant indices just change the constant offset. Add the resulting
- // value both to our worklist for this argument, and to the set of
- // viable paired values with future arguments.
- PointerArgs[GEP] = ArgIdx;
- Worklist.push_back(GEP);
- continue;
+ // Disable any SROA on the argument to arbitrary unary operators.
+ disableSROA(Operand);
+
+ return false;
+}
+
+bool CallAnalyzer::visitICmp(ICmpInst &I) {
+ Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
+ // First try to handle simplified comparisons.
+ if (!isa<Constant>(LHS))
+ if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
+ LHS = SimpleLHS;
+ if (!isa<Constant>(RHS))
+ if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
+ RHS = SimpleRHS;
+ if (Constant *CLHS = dyn_cast<Constant>(LHS))
+ if (Constant *CRHS = dyn_cast<Constant>(RHS))
+ if (Constant *C = ConstantExpr::getICmp(I.getPredicate(), CLHS, CRHS)) {
+ SimplifiedValues[&I] = C;
+ return true;
}
- // Track pointer through casts. Even when the result is not a pointer, it
- // remains a constant relative to constants derived from other constant
- // pointers.
- if (CastInst *CI = dyn_cast<CastInst>(I)) {
- PointerArgs[CI] = ArgIdx;
- Worklist.push_back(CI);
- continue;
+ // Otherwise look for a comparison between constant offset pointers with
+ // a common base.
+ Value *LHSBase, *RHSBase;
+ APInt LHSOffset, RHSOffset;
+ llvm::tie(LHSBase, LHSOffset) = ConstantOffsetPtrs.lookup(LHS);
+ if (LHSBase) {
+ llvm::tie(RHSBase, RHSOffset) = ConstantOffsetPtrs.lookup(RHS);
+ if (RHSBase && LHSBase == RHSBase) {
+ // We have common bases, fold the icmp to a constant based on the
+ // offsets.
+ Constant *CLHS = ConstantInt::get(LHS->getContext(), LHSOffset);
+ Constant *CRHS = ConstantInt::get(RHS->getContext(), RHSOffset);
+ if (Constant *C = ConstantExpr::getICmp(I.getPredicate(), CLHS, CRHS)) {
+ SimplifiedValues[&I] = C;
+ ++NumConstantPtrCmps;
+ return true;
}
+ }
+ }
- // There are two instructions which produce a strict constant value when
- // applied to two related pointer values. Ignore everything else.
- if (!isa<ICmpInst>(I) && I->getOpcode() != Instruction::Sub)
- continue;
- assert(I->getNumOperands() == 2);
-
- // Ensure that the two operands are in our set of potentially paired