# `struct`s II: More Fun with `struct`s Turns out there's a lot more you can do with `struct`s than we've talked about, but it's just a big pile of miscellaneous things. So we'll throw them in this chapter. If you're good with `struct` basics, you can round out your knowledge here. ## Initializers of Nested `struct`s and Arrays Remember how you could [initialize structure members along these lines](#struct-initializers)? ``` {.c} struct foo x = {.a=12, .b=3.14}; ``` Turns out we have more power in these initializers than we'd originally shared. Exciting! For one thing, if you have a nested substructure like the following, you can initialize members of that substructure by following the variable names down the line: ``` {.c} struct foo x = {.a.b.c=12}; ``` Let's look at an example: ``` {.c .numberLines} #include struct cabin_information { int window_count; int o2level; }; struct spaceship { char *manufacturer; struct cabin_information ci; }; int main(void) { struct spaceship s = { .manufacturer="General Products", .ci.window_count = 8, // <-- NESTED INITIALIZER! .ci.o2level = 21 }; printf("%s: %d seats, %d%% oxygen\n", s.manufacturer, s.ci.window_count, s.ci.o2level); } ``` Check out lines 16-17! That's where we're initializing members of the `struct cabin_information` in the definition of `s`, our `struct spaceship`. And here is another option for that same initializer---this time we'll do something more standard-looking, but either approach works: ``` {.c .numberLines startFrom="15"} struct spaceship s = { .manufacturer="General Products", .ci={ .window_count = 8, .o2level = 21 } }; ``` Now, as if the above information isn't spectacular enough, we can also mix in array initializers in there, too. Let's change this up to get an array of passenger information in there, and we can check out how the initializers work in there, too. ``` {.c .numberLines} #include struct passenger { char *name; int covid_vaccinated; // Boolean }; #define MAX_PASSENGERS 8 struct spaceship { char *manufacturer; struct passenger passenger[MAX_PASSENGERS]; }; int main(void) { struct spaceship s = { .manufacturer="General Products", .passenger = { // Initialize a field at a time [0].name = "Gridley, Lewis", [0].covid_vaccinated = 0, // Or all at once [7] = {.name="Brown, Teela", .covid_vaccinated=1}, } }; printf("Passengers for %s ship:\n", s.manufacturer); for (int i = 0; i < MAX_PASSENGERS; i++) if (s.passenger[i].name != NULL) printf(" %s (%svaccinated)\n", s.passenger[i].name, s.passenger[i].covid_vaccinated? "": "not "); } ``` ## Anonymous `struct`s These are "the `struct` with no name". We also mention these in the [`typedef`](#typedef-struct) section, but we'll refresh here. Here's a regular `struct`: ``` {.c} struct animal { char *name; int leg_count, speed; }; ``` And here's the anonymous equivalent: ``` {.c} struct { // <-- No name! char *name; int leg_count, speed; }; ``` Okaaaaay. So we have a `struct`, but it has no name, so we have no way of using it later? Seems pretty pointless. Admittedly, in that example, it is. But we can still make use of it a couple ways. One is rare, but since the anonymous `struct` represents a type, we can just put some variable names after it and use them. ``` {.c} struct { // <-- No name! char *name; int leg_count, speed; } a, b, c; // 3 variables of this struct type a.name = "antelope"; c.leg_count = 4; // for example ``` But that's still not that useful. Far more common is use of anonymous `struct`s with a `typedef` so that we can use it later (e.g. to pass variables to functions). ``` {.c} typedef struct { // <-- No name! char *name; int leg_count, speed; } animal; // New type: animal animal a, b, c; a.name = "antelope"; c.leg_count = 4; // for example ``` Personally, I don't use many anonymous `struct`s. I think it's more pleasant to see the entire `struct animal` before the variable name in a declaration. But that's just, like, my opinion, man. ## Self-Referential `struct`s For any graph-like data structure, it's useful to be able to have pointers to the connected nodes/vertices. But this means that in the definition of a node, you need to have a pointer to a node. It's chicken and eggy! But it turns out you can do this in C with no problem whatsoever. For example, here's a linked list node: ``` {.c} struct node { int data; struct node *next; }; ``` It's important to note that `next` is a pointer. This is what allows the whole thing to even build. Even though the compiler doesn't know what the entire `struct node` looks like yet, all pointers are the same size. Here's a cheesy linked list program to test it out: ``` {.c .numberLines} #include #include struct node { int data; struct node *next; }; int main(void) { struct node *head; // Hackishly set up a linked list (11)->(22)->(33) head = malloc(sizeof(struct node)); head->data = 11; head->next = malloc(sizeof(struct node)); head->next->data = 22; head->next->next = malloc(sizeof(struct node)); head->next->next->data = 33; head->next->next->next = NULL; // Traverse it for (struct node *cur = head; cur != NULL; cur = cur->next) { printf("%d\n", cur->data); } } ``` Running that prints: ``` 11 22 33 ``` ## Flexible Array Members Back in the good old days, when people carved C code out of wood, some folks thought would be neat if they could allocate `struct`s that had variable length arrays at the end of them. I want to be clear that the first part of the section is the old way of doing things, and we're going to do things the new way after that. For example, maybe you could define a `struct` for holding strings and the length of that string. It would have a length and an array to hold the data. Maybe something like this: ``` {.c} struct len_string { int length; char data[8]; }; ``` But that has `8` hardcoded as the maximum length of a string, and that's not much. What if we did something _clever_ and just `malloc()`d some extra space at the end after the struct, and then let the data overflow into that space? Let's do that, and then allocate another 40 bytes on top of it: ``` {.c} struct len_string *s = malloc(sizeof *s + 40); ``` Because `data` is the last field of the `struct`, if we overflow that field, it runs out into space that we already allocated! For this reason, this trick only works if the short array is the _last_ field in the `struct`. ``` {.c} // Copy more than 8 bytes! strcpy(s->data, "Hello, world!"); // Won't crash. Probably. ``` In fact, there was a common compiler workaround for doing this, where you'd allocate a zero length array at the end: ``` {.c} struct len_string { int length; char data[0]; }; ``` And then every extra byte you allocated was ready for use in that string. Because `data` is the last field of the `struct`, if we overflow that field, it runs out into space that we already allocated! ``` {.c} // Copy more than 8 bytes! strcpy(s->data, "Hello, world!"); // Won't crash. Probably. ``` But, of course, actually accessing the data beyond the end of that array is undefined behavior! In these modern times, we no longer deign to resort to such savagery. Luckily for us, we can still get the same effect with C99 and later, but now it's legal. Let's just change our above definition to have no size for the array^[Technically we say that it has an _incomplete type_.]: ``` {.c} struct len_string { int length; char data[]; }; ``` Again, this only works if the flexible array member is the _last_ field in the `struct`. And then we can allocate all the space we want for those strings by `malloc()`ing larger than the `struct len_string`, as we do in this example that makes a new `struct len_string` from a C string: ``` {.c} struct len_string *len_string_from_c_string(char *s) { int len = strlen(s); // Allocate "len" more bytes than we'd normally need struct len_string *ls = malloc(sizeof *ls + len); ls->length = len; // Copy the string into those extra bytes memcpy(ls->data, s, len); return ls; } ``` ## Padding Bytes Beware that C is allowed to add padding bytes within or after a `struct` as it sees fit. You can't trust that they will be directly adjacent in memory^[Though some compilers have options to force this to occur---search for `__attribute__((packed))` to see how to do this with GCC.]. Let's take a look at this program. We output two numbers. One is the sum of the `sizeof`s the individual field types. The other is the `sizeof` the entire `struct`. One would expect them to be the same. The size of the total is the size of the sum of its parts, right? ``` {.c .numberLines} #include struct foo { int a; char b; int c; char d; }; int main(void) { printf("%zu\n", sizeof(int) + sizeof(char) + sizeof(int) + sizeof(char)); printf("%zu\n", sizeof(struct foo)); } ``` But on my system, this outputs: ``` 10 16 ``` They're not the same! The compiler has added 6 bytes of padding to help it be more performant. Maybe you got different output with your compiler, but unless you're forcing it, you can't be sure there's no padding. ## `offsetof` In the previous section, we saw that the compiler could inject padding bytes at will inside a structure. What if we needed to know where those were? We can measure it with `offsetof`, defined in ``. Let's modify the code from above to print the offsets of the individual fields in the `struct`: ``` {.c .numberLines} #include #include struct foo { int a; char b; int c; char d; }; int main(void) { printf("%zu\n", offsetof(struct foo, a)); printf("%zu\n", offsetof(struct foo, b)); printf("%zu\n", offsetof(struct foo, c)); printf("%zu\n", offsetof(struct foo, d)); } ``` For me, this outputs: ``` 0 4 8 12 ``` indicating that we're using 4 bytes for each of the fields. It's a little weird, because `char` is only 1 byte, right? The compiler is putting 3 padding bytes after each `char` so that all the fields are 4 bytes long. Presumably this will run faster on my CPU. ## Fake OOP There's a slightly abusive thing that's sort of OOP-like that you can do with `struct`s. Since the pointer to the `struct` is the same as a pointer to the first element of the `struct`, you can freely cast a pointer to the `struct` to a pointer to the first element. What this means is that we can set up a situation like this: ``` {.c} struct parent { int a, b; }; struct child { struct parent super; // MUST be first int c, d; }; ``` Then we are able to pass a pointer to a `struct child` to a function that expects either that _or_ a pointer to a `struct parent`! Because `struct parent super` is the first item in the `struct child`, a pointer to any `struct child` is the same as a pointer to that `super` field^[`super` isn't a keyword, incidentally. I'm just stealing some OOP terminology.]. Let's set up an example here. We'll make `struct`s as above, but then we'll pass a pointer to a `struct child` into a function that needs a pointer to a `struct parent`... and it'll still work. ``` {.c .numberLines} #include struct parent { int a, b; }; struct child { struct parent super; // MUST be first int c, d; }; // Making the argument `void*` so we can pass any type into it // (namely a struct parent or struct child) void print_parent(void *p) { // Expects a struct parent--but a struct child will also work // because the pointer points to the struct parent in the first // field: struct parent *self = p; printf("Parent: %d, %d\n", self->a, self->b); } void print_child(struct child *self) { printf("Child: %d, %d\n", self->c, self->d); } int main(void) { struct child c = {.super.a=1, .super.b=2, .c=3, .d=4}; print_child(&c); print_parent(&c); // Also works even though it's a struct child! } ``` See what we did on the last line of `main()`? We called `print_parent()` but passed a `struct child*` as the argument! Even though `print_parent()` needs the argument to point to a `struct parent`, we're _getting away with it_ because the first field in the `struct child` is a `struct parent`. Again, this works because a pointer to a `struct` has the same value as a pointer to the first field in that `struct`. ## Bit-Fields In my experience, these are rarely used, but you might see them out there from time to time, especially in lower-level applications that pack bits together into larger spaces. Let's take a look at some code to demonstrate a use case: ``` {.c .numberLines} #include struct foo { unsigned int a; unsigned int b; unsigned int c; unsigned int d; }; int main(void) { printf("%zu\n", sizeof(struct foo)); } ``` For me, this prints `16`. Which makes sense, since `unsigned`s are 4 bytes on my system. But what if we knew that all the values that were going to be stored in `a` and `b` could be stored in 5 bits, and the values in `c`, and `d` could be stored in 3 bits? That's only a total 16 bits. Why have 128 bits reserved for them if we're only going to use 16? Well, we can tell C to pretty-please try to pack these values in. We can specify the maximum number of bits that values can take (from 1 up the size of the containing type). We do this by putting a colon after the field name, followed by the field width in bits. ``` {.c .numberLines startFrom="3"} struct foo { unsigned int a:5; unsigned int b:5; unsigned int c:3; unsigned int d:3; }; ``` Now when I ask C how big my `struct foo` is, it tells me 4! It was 16 bytes, but now it's only 4. It has "packed" those 4 values down into 4 bytes, which is a four-fold memory savings. The tradeoff is, of course, that the 5-bit fields can only hold values from 0-31 and the 3-bit fields can only hold values from 0-7. But life's all about compromise, after all. ### Non-Adjacent Bit-Fields A gotcha: C will only combine **adjacent** bit-fields. If they're interrupted by non-bit-fields, you get no savings: ``` {.c} struct foo { // sizeof(struct foo) == 16 (for me) unsigned int a:1; // since a is not adjacent to c. unsigned int b; unsigned int c:1; unsigned int d; }; ``` In that example, since `a` is not adjacent to `c`, they are both "packed" in their own `int`s. So we have one `int` each for `a`, `b`, `c`, and `d`. Since my `int`s are 4 bytes, that's a grand total of 16 bytes. A quick rearrangement yields some space savings from 16 bytes down to 12 bytes (on my system): ``` {.c} struct foo { // sizeof(struct foo) == 12 (for me) unsigned int a:1; unsigned int c:1; unsigned int b; unsigned int d; }; ``` And now, since `a` is next to `c`, the compiler puts them together into a single `int`. So we have one `int` for a combined `a` and `c`, and one `int` each for `b` and `d`. For a grand total of 3 `int`s, or 12 bytes. Put all your bitfields together to get the compiler to combine them. ### Signed or Unsigned `int`s If you just declare a bit-field to be `int`, the different compilers will treat it as `signed` or `unsigned`. Just like the situation with `char`. Be specific about the signedness when using bit-fields. ### Unnamed Bit-Fields In some specific circumstances, you might need to reserve some bits for hardware reasons, but not need to use them in code. For example, let's say you have a byte where the top 2 bits have a meaning, the bottom 1 bit has a meaning, but the middle 5 bits do not get used by you^[Assuming 8-bit `char`s, i.e. `CHAR_BIT == 8`.]. We _could_ do something like this: ``` {.c} struct foo { unsigned char a:2; unsigned char dummy:5; unsigned char b:1; }; ``` And that works---in our code we use `a` and `b`, but never `dummy`. It's just there to eat up 5 bits to make sure `a` and `b` are in the "required" (by this contrived example) positions within the byte. C allows us a way to clean this up: _unnamed bit-fields_. You can just leave the name (`dummy`) out in this case, and C is perfectly happy for the same effect: ``` {.c} struct foo { unsigned char a:2; unsigned char :5; // <-- unnamed bit-field! unsigned char b:1; }; ``` ### Zero-Width Unnamed Bit-Fields Some more esoterica out here... Let's say you were packing bits into an `unsigned int`, and you needed some adjacent bit-fields to pack into the _next_ `unsigned int`. That is, if you do this: ``` {.c} struct foo { unsigned int a:1; unsigned int b:2; unsigned int c:3; unsigned int d:4; }; ``` the compiler packs all those into a single `unsigned int`. But what if you needed `a` and `b` in one `int`, and `c` and `d` in a different one? There's a solution for that: put an unnamed bit-field of width `0` where you want the compiler to start anew with packing bits in a different `int`: ``` {.c} struct foo { unsigned int a:1; unsigned int b:2; unsigned int :0; // <--Zero-width unnamed bit-field unsigned int c:3; unsigned int d:4; }; ``` It's analogous to an explicit page break in a word processor. You're telling the compiler, "Stop packing bits in this `unsigned`, and start packing them in the next one." By adding the zero-width unnamed bit field in that spot, the compiler puts `a` and `b` in one `unsigned int`, and `c` and `d` in another `unsigned int`. Two total, for a size of 8 bytes on my system (`unsigned int`s are 4 bytes each). ## Unions These are basically just like `struct`s, except the fields overlap in memory. The `union` will be only large enough for the largest field, and you can only use one field at a time. It's a way to reuse the same memory space for different types of data. You declare them just like `struct`s, except it's `union`. Take a look at this: ``` {.c} union foo { int a, b, c, d, e, f; float g, h; char i, j, k, l; }; ``` Now, that's a lot of fields. If this were a `struct`, my system would tell me it took 36 bytes to hold it all. But it's a `union`, so all those fields overlap in the same stretch of memory. The biggest one is `int` (or `float`), taking up 4 bytes on my system. And, indeed, if I ask for the `sizeof` the `union foo`, it tells me 4! The tradeoff is that you can only portably use one of those fields at a time. If you try to read from a field that was not the last one written to, the behavior is unspecified. Let's take that crazy `union` and first store an `int` in it, then a `float`. Then we'll print out the `int` again to see what's in there---even though, since it wasn't the last value we stored, the result is unspecified. ``` {.c .numberLines} #include union foo { int a, b, c, d, e, f; float g, h; char i, j, k, l; }; int main(void) { union foo x; x.a = 12; printf("%d\n", x.a); // OK--x.a was the last thing we stored into x.g = 3.141592; printf("%f\n", x.g); // OK--x.g was the last thing we stored into printf("%d\n", x.a); // Unspecified behavior! } ``` On my machine, this prints: ``` 12 3.141592 1078530008 ``` Probably deep down the decimal value `1078530008` is probably the same pattern of bits as `3.141592`, but the spec makes no guarantees about this. ### Pointers to `union`s If you have a pointer to a `union`, you can cast that pointer to any of the types of the fields in that `union` and get the values out that way. In this example, we see that the `union` has `int`s and `float`s in it. And we get pointers to the `union`, but we cast them to `int*` and `float*` types (the cast silences compiler warnings). And then if we dereference those, we see that they have the values we stored directly in the `union`. ``` {.c .numberLines} #include union foo { int a, b, c, d, e, f; float g, h; char i, j, k, l; }; int main(void) { union foo x; int *foo_int_p = (int *)&x; float *foo_float_p = (float *)&x; x.a = 12; printf("%d\n", x.a); // 12 printf("%d\n", *foo_int_p); // 12, again x.g = 3.141592; printf("%f\n", x.g); // 3.141592 printf("%f\n", *foo_float_p); // 3.141592, again } ``` The reverse is also true. If we have a pointer to a type inside the `union`, we can cast that to a pointer to the `union` and access its members. ``` {.c} union foo x; int *foo_int_p = (int *)&x; // Pointer to int field union foo *p = (union foo *)foo_int_p; // Back to pointer to union p->a = 12; // This line the same as... x.a = 12; // this one. ``` All this just lets you know that, under the hood, all these values in a `union` start at the same place in memory, and that's the same as where the entire `union` is.