From e00906fbc222c19b7ab84a817b2be46b87484e99 Mon Sep 17 00:00:00 2001 From: Reid Spencer Date: Thu, 10 Aug 2006 20:15:58 +0000 Subject: Answer the most frequently asked question, about GEPs. The answer is sufficiently long that I placed it in a separate file but it links from the FAQ page. More might need to be added to GetElementPtr.html to address additional confusion surrounding GEP. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29594 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/GetElementPtr.html | 249 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 249 insertions(+) create mode 100644 docs/GetElementPtr.html (limited to 'docs/GetElementPtr.html') diff --git a/docs/GetElementPtr.html b/docs/GetElementPtr.html new file mode 100644 index 0000000000..13b5138ab2 --- /dev/null +++ b/docs/GetElementPtr.html @@ -0,0 +1,249 @@ + + + + + The Often Misunderstood GEP Instruction + + + + +
+ The Often Misunderstood GEP Instruction +
+ +
    +
  1. Introduction
  2. The Questions
    1. Why is the extra 0 index required?
    2. What is dereferenced by GEP?
    3. Why can you index through the first pointer but not subsequent ones?
    4. Why don't GEP x,0,0,1 and GEP x,1 alias?
    5. Why do GEP x,1,0,0 and GEP x,1 alias?
  3. Summary
+ +
+

Written by: Reid Spencer.

+
+ + + +
Introduction
+ +
+

This document seeks to dispel the mystery and confusion surrounding LLVM's GetElementPtr (GEP) instruction. Questions about the wily GEP instruction are probably the most frequently occurring questions once a developer gets down to coding with LLVM. Here we lay out the sources of confusion and show that the GEP instruction is really quite simple.

+
+ + +
The Questions
+ +
+

When people are first confronted with the GEP instruction, they tend to relate it to known concepts from other programming paradigms, most notably C array indexing and field selection. However, GEP is a little different, and this leads to the following questions, all of which are answered in the sections below.

+
    +
  1. Why is the extra 0 index required?
  2. What is dereferenced by GEP?
  3. Why can you index through the first pointer but not subsequent ones?
  4. Why don't GEP x,0,0,1 and GEP x,1 alias?
  5. Why do GEP x,1,0,0 and GEP x,1 alias?
+
+ + +
+ Why is the extra 0 index required? +
+ +
+

Quick answer: there are no superfluous indices.

+

This question arises most often when the GEP instruction is applied to a global variable, which is always of pointer type. For example, consider this:

+  %MyStruct = uninitialized global { float*, int }
+  ...
+  %idx = getelementptr { float*, int }* %MyStruct, long 0, ubyte 1
+

The GEP above yields an int* by indexing the int-typed field of the structure %MyStruct. When people first look at it, they wonder why the long 0 index is needed. However, a closer inspection of how globals and GEPs work reveals the need. Becoming aware of the following facts will dispel the confusion:

+
    +
  1. The type of %MyStruct is not { float*, int } but rather { float*, int }*. That is, %MyStruct is a pointer to a structure containing a pointer to a float and an int.
  2. Point #1 is evidenced by noticing the type of the first operand of the GEP instruction (%MyStruct), which is { float*, int }*.
  3. The first index, long 0, is required to step through the pointer %MyStruct to the structure it points to. No memory is read to do this; the index only scales the pointer value.
  4. The second index, ubyte 1, selects the second field of the structure (the int).
+
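The role of the two indices can be sketched as plain address arithmetic. The following is a minimal illustration in Python's ctypes, not LLVM code: the class `MyStruct`, the field names `f0` and `f1`, and the helper `gep_struct_field` are hypothetical, and the layout is whatever the host C ABI gives, which is only an assumption about the target LLVM would compile for.

```python
import ctypes

class MyStruct(ctypes.Structure):
    # Mirrors { float*, int }; c_int32 stands in for LLVM's 32-bit int.
    _fields_ = [("f0", ctypes.POINTER(ctypes.c_float)),
                ("f1", ctypes.c_int32)]

def gep_struct_field(base_addr, first_index, field_name):
    """Hypothetical helper: the arithmetic behind
    `getelementptr MyStruct* base, long first_index, ubyte field`."""
    # The first index steps over whole MyStruct objects from the base pointer.
    step = first_index * ctypes.sizeof(MyStruct)
    # The second index selects a field inside the structure reached so far.
    field_offset = getattr(MyStruct, field_name).offset
    return base_addr + step + field_offset

obj = MyStruct()
base = ctypes.addressof(obj)            # the pointer value, like %MyStruct
idx = gep_struct_field(base, 0, "f1")   # like: long 0, ubyte 1
print(idx - base)                       # byte offset of the int field
```

Dropping the first index would leave no way to say which structure (the 0th one at the base address) the field selection applies to, which is why it is not superfluous.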
+ + +
+ What is dereferenced by GEP? +
+
+

Quick answer: nothing.

+

The GetElementPtr instruction dereferences nothing. That is, it doesn't access memory in any way. That's what the Load instruction is for. GEP is only involved in the computation of addresses. For example, consider this:

+
+  %MyVar = uninitialized global { [40 x int ]* }
+  ...
+  %idx = getelementptr { [40 x int]* }* %MyVar, long 0, ubyte 0, long 0, long 17
+

In this example, we have a global variable, %MyVar, that is a pointer to a structure containing a pointer to an array of 40 ints. The GEP instruction seems to be accessing the 18th integer of the structure's array of ints. However, this is actually an illegal GEP instruction. It won't compile. The reason is that the pointer in the structure must be dereferenced in order to index into the array of 40 ints. Since the GEP instruction never accesses memory, this GEP is illegal.

+

In order to access the 18th integer in the array, you would need to do the following:

+
+  %fld = getelementptr { [40 x int]* }* %MyVar, long 0, ubyte 0
+  %arr = load [40 x int]** %fld
+  %idx = getelementptr [40 x int]* %arr, long 0, long 17
+

In this case, we have to load the pointer in the structure with a load instruction before we can index into the array. If the example were changed to:

+
+  %MyVar = uninitialized global { [40 x int ] }
+  ...
+  %idx = getelementptr { [40 x int] }* %MyVar, long 0, ubyte 0, long 17
+

then everything works fine. In this case, the structure does not contain a pointer and the GEP instruction can index through the global variable pointer, into the first field of the structure and access the 18th int in the array there.
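The difference between the two layouts can be sketched with Python's ctypes. This is only an illustration under the host C layout, not LLVM code, and the names `Inline`, `Indirect`, and `storage` are hypothetical. With the array stored inline, the address of element 17 is pure arithmetic; with a pointer field, the pointer value has to be read from memory (the load) before the second address computation can happen.

```python
import ctypes

Arr40 = ctypes.c_int32 * 40
INT_SIZE = ctypes.sizeof(ctypes.c_int32)

class Inline(ctypes.Structure):      # models { [40 x int] }
    _fields_ = [("arr", Arr40)]

class Indirect(ctypes.Structure):    # models { [40 x int]* }
    _fields_ = [("arr", ctypes.POINTER(Arr40))]

# Inline array: one address computation, no memory access.
inline = Inline()
inline_base = ctypes.addressof(inline)
inline_elem17 = inline_base + Inline.arr.offset + 17 * INT_SIZE

# Pointer field: address of the field, then a load, then more arithmetic.
storage = Arr40()
storage[17] = 99
indirect = Indirect(ctypes.pointer(storage))
field_addr = ctypes.addressof(indirect) + Indirect.arr.offset   # first GEP
loaded = ctypes.c_void_p.from_address(field_addr).value         # the load
indirect_elem17 = loaded + 17 * INT_SIZE                        # second GEP

print(inline_elem17 - inline_base)                            # offset within the struct
print(ctypes.c_int32.from_address(indirect_elem17).value)     # value stored at element 17
```

The `from_address(...).value` read is the step a single GEP cannot perform, since it is the only line here that touches memory.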

+
+ + +
+ Why can you index through the first pointer but not subsequent ones? +
+
+

Quick answer: because it's already present.

+

Once the previous question is understood, a new question arises:

+
Why is it okay to index through the first pointer, but subsequent pointers won't be dereferenced?
+

The answer is simply that memory does not have to be accessed to perform the computation. The first operand to the GEP instruction must be a value of a pointer type. The value of the pointer is provided directly to the GEP instruction without any need for accessing memory. It must, therefore, be indexed like any other operand. Consider this example:

+
+  %MyVar = uninitialized global int
+  ...
+  %idx1 = getelementptr int* %MyVar, long 0
+  %idx2 = getelementptr int* %MyVar, long 1
+  %idx3 = getelementptr int* %MyVar, long 2
+

These GEP instructions are simply making address computations from the base address of MyVar. They compute the following (using C syntax):

+  idx1 = (char*) &MyVar + 0
+  idx2 = (char*) &MyVar + 4
+  idx3 = (char*) &MyVar + 8

Since the type int is known to be four bytes long, the indices 0, 1 and 2 translate into memory offsets of 0, 4, and 8, respectively. No memory is accessed to make these computations because the address of %MyVar is passed directly to the GEP instructions.

+

Note that the cases of %idx2 and %idx3 are a bit silly. They are computing addresses of something of unknown type (and thus potentially breaking type safety) because %MyVar is only one integer long.
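The same scaling can be checked with a short Python sketch. It is an illustration only: `my_var` is a hypothetical stand-in for the global, and `ctypes.c_int32` is assumed to match the four-byte int described above.

```python
import ctypes

my_var = ctypes.c_int32(0)                  # stands in for the global int %MyVar
base = ctypes.addressof(my_var)             # the pointer value handed to GEP
int_size = ctypes.sizeof(ctypes.c_int32)    # 4 bytes

# Each GEP scales its single index by the size of the pointee type.
# Nothing is read from or written to memory to produce these addresses.
idx1, idx2, idx3 = (base + i * int_size for i in (0, 1, 2))

print(idx1 - base, idx2 - base, idx3 - base)   # 0 4 8
```

Only `idx1` refers to storage that belongs to `my_var`; the other two are valid arithmetic results that point past the single integer.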

+
+ + +
+ Why don't GEP x,0,0,1 and GEP x,1 alias? +
+
+

Quick Answer: They compute different address locations.

+

If you look at the first indices in these GEP instructions you find that they are different (0 and 1), therefore the address computation diverges with that index. Consider this example:

+
+  %MyVar = global { [10 x int ] }
+  %idx1 = getelementptr { [10 x int ] }* %MyVar, long 0, ubyte 0, long 1
+  %idx2 = getelementptr { [10 x int ] }* %MyVar, long 1
+

In this example, idx1 computes the address of the second integer in the array that is in the structure in %MyVar, that is MyVar+4. The type of idx1 is int*. However, idx2 computes the address of the next structure after %MyVar. The type of idx2 is { [10 x int] }* and its value is equivalent to MyVar + 40 because it indexes past the ten 4-byte integers in MyVar. Obviously, in such a situation, the pointers don't alias.
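The two offsets can be reproduced with a small Python sketch. It is an illustration under the host C layout, not LLVM code; the class name `MyVar` and the field name `arr` are hypothetical.

```python
import ctypes

class MyVar(ctypes.Structure):           # models { [10 x int] }
    _fields_ = [("arr", ctypes.c_int32 * 10)]

INT_SIZE = ctypes.sizeof(ctypes.c_int32)

# GEP x, 0, 0, 1: stay at structure 0, enter field 0, move to array element 1.
off_idx1 = 0 * ctypes.sizeof(MyVar) + MyVar.arr.offset + 1 * INT_SIZE

# GEP x, 1: step over one whole structure.
off_idx2 = 1 * ctypes.sizeof(MyVar)

print(off_idx1, off_idx2)   # 4 40
```

The offsets diverge at the first index, so the two results name different addresses.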

+
+ + +
+ Why do GEP x,1,0,0 and GEP x,1 alias? +
+
+

Quick Answer: They compute the same address location.

+

These two GEP instructions will compute the same address because indexing through the 0th element does not change the address. However, it does change the type. Consider this example:

+
+  %MyVar = global { [10 x int ] }
+  %idx1 = getelementptr { [10 x int ] }* %MyVar, long 1, ubyte 0, long 0
+  %idx2 = getelementptr { [10 x int ] }* %MyVar, long 1
+

In this example, the value of %idx1 is %MyVar+40 and its type is int*. The value of %idx2 is also MyVar+40 but its type is { [10 x int] }*.
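To see both the equal addresses and the differing types at once, here is a toy model of the index walk in plain Python. Everything in it is an assumption made for illustration: the tuple encoding of types, the helpers `size_of` and `gep`, a 4-byte int, and fields packed back to back with no padding. It is not LLVM's API or its real layout rules.

```python
# Toy type model (hypothetical, for illustration only):
#   "int"                    -> 4-byte integer
#   ("array", n, elem)       -> n contiguous elements
#   ("struct", [fields...])  -> fields laid out back to back, no padding

def size_of(ty):
    if ty == "int":
        return 4
    if ty[0] == "array":
        return ty[1] * size_of(ty[2])
    if ty[0] == "struct":
        return sum(size_of(f) for f in ty[1])
    raise ValueError(f"unknown type: {ty!r}")

def gep(pointee, indices):
    """Return (byte offset, type pointed to) for a GEP on a pointer to `pointee`."""
    first, rest = indices[0], indices[1:]
    offset = first * size_of(pointee)    # first index steps over whole pointees
    ty = pointee
    for i in rest:
        if ty[0] == "struct":
            offset += sum(size_of(f) for f in ty[1][:i])   # skip earlier fields
            ty = ty[1][i]
        elif ty[0] == "array":
            offset += i * size_of(ty[2])                   # skip earlier elements
            ty = ty[2]
        else:
            raise TypeError("cannot index into a scalar")
    return offset, ty

my_var = ("struct", [("array", 10, "int")])   # { [10 x int] }
idx1 = gep(my_var, [1, 0, 0])
idx2 = gep(my_var, [1])
print(idx1)   # (40, 'int')
print(idx2)   # (40, ('struct', [('array', 10, 'int')]))
```

The trailing zero indices add nothing to the offset, so the addresses match, but each one descends a level in the type, so the resulting pointer types differ.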

+
+ + +
Summary
+ + +
+

In summary, here are some things to always remember about the GetElementPtr instruction:

+
    +
  1. The GEP instruction never accesses memory; it only provides pointer computations.
  2. The first operand to the GEP instruction is always a pointer and it must be indexed.
  3. There are no superfluous indices for the GEP instruction.
  4. Trailing zero indices are superfluous for pointer aliasing, but not for the types of the pointers.
  5. Leading zero indices are not superfluous for either pointer aliasing or the types of the pointers.
+
+ + + +
+
+ The LLVM Compiler Infrastructure
+ Last modified: $Date$ +