# Pointers---Cower In Fear! { #pointers} Pointers are one of the most feared things in the C language. In fact, they are the one thing that makes this language challenging at all. But why? Because they, quite honestly, can cause electric shocks to come up through the keyboard and physically _weld_ your arms permanently in place, cursing you to a life at the keyboard in this language from the 70s! Well, not really. But they can cause huge headaches if you don't know what you're doing when you try to mess with them. Depending on what language you came from, you might already understand the concept of _references_, where a variable refers to an object of some type. This is very much the same, except we have to be more explicit with C about when we're talking about the reference or the thing it refers to. ## Memory and Variables {#ptmem} Computer memory holds data of all kinds, right? It'll hold `float`s, `int`s, or whatever you have. To make memory easy to cope with, each byte of memory is identified by an integer. These integers increase sequentially as you move up through memory. You can think of it as a bunch of numbered boxes, where each box holds a byte^[A byte is a number made up of no more than 8 binary digits, or _bits_ for short. This means in decimal digits just like grandma used to use, it can hold an unsigned number between 0 and 255, inclusive.] of data. Or like a big array where each element holds a byte, if you come from a language with arrays. The number that represents each box is called its _address_. Now, not all data types use just a byte. For instance, an `int` is often four bytes, as is a `float`, but it really depends on the system. You can use the `sizeof` operator to determine how many bytes of memory a certain type uses. ``` {.c} // %zu is the format specifier for type size_t ("t" is for "type", but // it's pronounced "size tee"), which is what is returned by sizeof. // More on size_t later. printf("an int uses %zu bytes of memory\n", sizeof(int)); // That prints "4" for me, but can vary by system. ``` When you have a data type that uses more than a byte of memory, the bytes that make up the data are always adjacent to one another in memory. Sometimes they're in order, and sometimes they're not^[The order that bytes come in is referred to as the _endianess_ of the number. Common ones are _big endian_ and _little endian_. This usually isn't something you need to worry about.], but that's platform-dependent, and often taken care of for you without you needing to worry about pesky byte orderings. So _anyway_, if we can get on with it and get a drum roll and some forboding music playing for the definition of a pointer, _a pointer is a variable that holds an address_. Imagine the classical score from 2001: A Space Odessey at this point. Ba bum ba bum ba bum BAAAAH! Ok, so maybe a bit overwrought here, yes? There's not a lot of mystery about pointers. They are the address of data. Just like an `int` variable can hold the value `12`, a pointer variable can hold the address of data. This means that all these things mean the same thing, i.e. a number that represents a point in memory: * Index into memory (if you're thinking of memory like a big array) * Address * Location I'm going to use these interchangeably. And yes, I just threw _location_ in there because you can never have enough words that mean the same thing. And a pointer variable holds that address number. Just like a `float` variable might hold `3.14159`. Often, we like to store the address of some data that we have stored in a variable, as opposed to any old random address out in memory wherever. When we have that, we say we have a "pointer to" some data. So if we have an `int`, say, and we want a pointer to it, what we want is some way to get the address of that `int`, right? After all, the pointer just holds the _address of_ the data. What operator do you suppose we'd use to find the _address of_ the `int`? Well, by a shocking surprise that must come as something of a shock to you, gentle reader, we use the `address-of` operator (which happens to be an ampersand: "`&`") to find the address of the data. Ampersand. So for a quick example, we'll introduce a new _format specifier_ for `printf()` so you can print a pointer. You know already how `%d` prints a decimal integer, yes? Well, `%p` prints a pointer. Now, this pointer is going to look like a garbage number (and it might be printed in hexadecimal^[That is, base 16 with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.] instead of decimal), but it is merely the index into memory the data is stored in. (Or the index into memory that the first byte of data is stored in, if the data is multi-byte.) In virtually all circumstances, including this one, the actual value of the number printed is unimportant to you, and I show it here only for demonstration of the `address-of` operator. ``` {.c .numberLines} #include int main(void) { int i = 10; printf("The value of i is %d, and its address is %p\n", i, &i); } ``` On my computer, this prints: ```shell The value of i is 10, and its address is 0x7ffda2546fc4 ``` If you're curious, that hexadecimal number is 140,727,326,896,068 in decimal (base 10 just like Grandma used to use). That's the index into memory where the variable `i`'s data is stored. It's the address of `i`. It's the location of `i`. It's a pointer to `i`. It's a pointer because it lets you know where `i` is in memory. Like a home address written on a scrap of paper tells you where you can find a particular house, this number indicates to us where in memory we can find the value of `i`. It points to `i`. Again, we don't really care what the address's exact number is, generally. We just care that it's a pointer to `i`. ## Pointer Types {#pttypes} Well, this is all well and good. You can now successfully take the address of a variable and print it on the screen. There's a little something for the ol' resume, right? Here's where you grab me by the scruff of the neck and ask politely what the frick pointers are good for. Excellent question, and we'll get to that right after these messages from our sponsor. > `ACME ROBOTIC HOUSING UNIT CLEANING SERVICES. YOUR HOMESTEAD WILL BE > DRAMATICALLY IMPROVED OR YOU WILL BE TERMINATED. MESSAGE ENDS.` Welcome back to another installment of Beej's Guide to Whatever. When we met last we were talking about how to make use of pointers. Well, what we're going to do is store a pointer off in a variable so that we can use it later. You can identify the _pointer type_ because there's an asterisk (`*`) before the variable name and after its type: ``` {.c .numberLines} int main(void) { int i; // i's type is "int" int *p; // p's type is "pointer to an int", or "int-pointer" } ``` Hey, so we have here a variable that is a pointer itself, and it can point to other `int`s. That it, is can hold the address of other `int`s. We know it points to `int`s, since it's of type `int*` (read "int-pointer"). When you do an assignment into a pointer variable, the type of the right hand side of the assignment has to be the same type as the pointer variable. Fortunately for us, when you take the `address-of` a variable, the resultant type is a pointer to that variable type, so assignments like the following are perfect: ``` {.c} int i; int *p; // p is a pointer, but is uninitialized and points to garbage p = &i; // p is assigned the address of i--p now "points to" i ``` On the left of the assignment, we have a variable of type pointer-to-`int` (`int*`), and on the right side, we have expression of type pointer-to-`int` since `i` is an `int` (because address-of `int` gives you a pointer to `int`). The address of a thing can be stored in a pointer to that thing. Get it? I know is still doesn't quite make much sense since you haven't seen an actual use for the pointer variable, but we're taking small steps here so that no one gets lost. So now, let's introduce you to the anti-address-of, operator. It's kind of like what `address-of` would be like in Bizarro World. ## Dereferencing {#deref} A pointer variable can be thought of as _referring_ to another variable by pointing to it. It's rare you'll hear anyone in C land talking about "referring" or "references", but I bring it up just so that the name of this operator will make a little more sense. When you have a pointer to a variable (roughly "a reference to a variable"), you can use the original variable through the pointer by _dereferencing_ the pointer. (You can think of this as "de-pointering" the pointer, but no one ever says "de-pointering".) What do I mean by "get access to the original variable"? Well, if you have a variable called `i`, and you have a pointer to `i` called `p`, you can use the dereferenced pointer `p` _exactly as if it were the original variable `i`_! You almost have enough knowledge to handle an example. The last tidbit you need to know is actually this: what is the dereference operator? It is the asterisk, again: `*`. Now, don't get this confused with the asterisk you used in the pointer declaration, earlier. They are the same character, but they have different meanings in different contexts^[That's not all! It's used in `/*comments*/` and multiplication and in function prototypes with variable length arrays! It's all the same `*`, but the context gives it different meaning.]. Here's a full-blown example: ``` {.c .numberLines} #include int main(void) { int i; int *p; // this is NOT a dereference--this is a type "int*" p = &i; // p now points to i, p holds address of i i = 10; // i is now 10 *p = 20; // i (yes i!) is now 20!! printf("i is %d\n", i); // prints "20" printf("i is %d\n", *p); // "20"! dereference-p is the same as i! } ``` Remember that `p` holds the address of `i`, as you can see where we did the assignment to `p` on line 8. What the dereference operator does is tells the computer to _use the object the pointer points to_ instead of using the pointer itself. In this way, we have turned `*p` into an alias of sorts for `i`. Great, but _why_? Why do any of this? ## Passing Pointers as Parameters {#ptpass} Right about now, you're thinking that you have an awful lot of knowledge about pointers, but absolutely zero application, right? I mean, what use is `*p` if you could just simply say `i` instead? Well, my feathered friend, the real power of pointers comes into play when you start passing them to functions. Why is this a big deal? You might recall from before that you could pass all kinds of parameters to functions and they'd be dutifully copied onto the stack, and then you could manipulate local copies of those variables from within the function, and then you could return a single value. What if you wanted to bring back more than one single piece of data from the function? I mean, you can only return one thing, right? What if I answered that question with another question? ...Er, two questions? What happens when you pass a pointer as a parameter to a function? Does a copy of the pointer get put on the stack? _You bet your sweet peas it does._ Remember how earlier I rambled on and on about how _EVERY SINGLE PARAMETER_ gets copied onto the stack and the function uses a copy of the parameter? Well, the same is true here. The function will get a copy of the pointer. But, and this is the clever part: we will have set up the pointer in advance to point at a variable...and then the function can dereference its copy of the pointer to get back to the original variable! The function can't see the variable itself, but it can certainly dereference a pointer to that variable! This is analogous to writing a home address on a piece of paper, and then copying that onto another piece of paper. You now have _two_ pointers to that house. In the case of a function call. one of the copies is stored in a pointer variable out in the calling scope, and the other is stored in a pointer variable that is the parameter of the function. Example! ``` {.c .numberLines} #include void increment(int *p) // note that it accepts a pointer to an int { *p = *p + 1; // add one to the thing p points to } int main(void) { int i = 10; int *j = &i; // note the address-of; turns it into a pointer to i printf("i is %d\n", i); // prints "10" printf("i is also %d\n", *j); // prints "10" increment(j); // j is an int*--to i printf("i is %d\n", i); // prints "11"! } ``` Ok! There are a couple things to see here...not the least of which is that the `increment()` function takes an `int*` as a parameter. We pass it an `int*` in the call by changing the `int` variable `i` to an `int*` using the `address-of` operator. (Remember, a pointer holds an address, so we make pointers to variables by running them through the `address-of` operator.) The `increment()` function gets a copy of the pointer on the stack. Both the original pointer `j` (in `main()`) and the copy of that pointer `p` (the parameter in `increment()`) point to the same address, namely the one holding the value `i`. (Again, by analogy, like two pieces of paper with the same home address written on them.) Dereferencing either will allow you to modify the original variable `i`! The function can modify a variable in another scope! Rock on! The above example is often more concisely written in the call just by using address-of right in the argument list: ``` {.c} printf("i is %d\n", i); // prints "10" increment(&i); printf("i is %d\n", i); // prints "11"! ``` Pointer enthusiasts will recall from early on in the guide, we used a function to read from the keyboard, `scanf()`...and, although you might not have recognized it at the time, we used the `address-of` to pass a pointer to a value to `scanf()`. We had to pass a pointer, see, because `scanf()` reads from the keyboard (typically) and stores the result in a variable. The only way it can see that variable that is local to that calling function is if we pass a pointer to that variable: ``` {.c} int i = 0; scanf("%d", &i); // pretend you typed "12" printf("i is %d\n", i); // prints "i is 12" ``` See, `scanf()` dereferences the pointer we pass it in order to modify the variable it points to. And now you know why you have to put that pesky ampersand in there! ## The `NULL` Pointer Any pointer variable of any pointer type can be set to a special value called `NULL`. This indicates that this pointer doesn't point to anything. ``` {.c} int *p; p = NULL; ``` Since it doesn't point to a value, dereferencing it is undefined behavior, and probably will result in a crash: ``` {.c} int *p = NULL; *p = 12; // CRASH or SOMETHING PROBABLY BAD. BEST AVOIDED. ``` Despite being called [flw[the billion dollar mistake by its creator|Null_pointer#History]], the `NULL` pointer is a good [flw[sentinel value|Sentinel_value]] and general indicator that a pointer hasn't yet been initialized. (Of course, the pointer points to garbage unless you explicitly assign it to point to an address or `NULL`.) ## A Note on Declaring Pointers The syntax for declaring a pointer can get a little weird. Let's look at this example: ``` {.c} int a; int b; ``` We can condense that into a single line, right? ``` {.c} int a, b; // Same thing ``` So `a` and `b` are both `int`s. No problem. But what about this? ``` {.c} int a; int *p; ``` Can we make that into one line? We can. But where does the `*` go? The rule is that the `*` goes in front of any variable that is a pointer type. That is. the `*` is _not_ part of the `int` in this example. it's a part of variable `p`. With that in mind, we can write this: ``` {.c} int a, *p; // Same thing ``` It's important to note that the following line does _not_ declare two pointers: ``` {.c} int *p, q; // p is a pointer to an int; q is just an int. ``` This can be particularly insidious-looking if the programmer writes this following (valid) line of code which is functionally identical to the one above. ``` {.c} int* p, q; // p is a pointer to an int; q is just an int. ``` So take a look at this and determine which variables are pointers and which are not: ``` {.c} int *a, b, c, *d, e, *f, g, h, *i; ``` I'll drop the answer in a footnote^[The pointer type variables are `a`, `d`, `f`, and `i`, because those are the ones with `*` in front of them.]. ## `sizeof` and Pointers Just a little bit of syntax here that might be confusing and you might see from time to time. Recall that `sizeof` operates on the _type_ of the expression. ``` {.c} int *p; sizeof(int); // Returns size of an `int` sizeof p // p is type int*, so returns size of `int*` sizeof *p // *p is type int, so returns size of `int` ``` You might see code with that last `sizeof` in there. Just remember that `sizeof` is all about the type of the expression, not the variables in the expression themselves.