# The Outside Environment

When you run a program, it's actually you talking to the shell, saying, "Hey, please run this thing." And the shell says, "Sure," and then tells the operating system, "Hey, could you please make a new process and run this thing?" And if all goes well, the OS complies and your program runs.

But there's a whole world outside your program in the shell that can be interacted with from within C. We'll look at a few parts of it in this chapter.

## Command Line Arguments

Many command line utilities accept _command line arguments_. For example, if we want to see all files that end in `.txt`, we can type something like this on a Unix-like system:

```
ls *.txt
```

(or `dir` instead of `ls` on a Windows system).

In this case, the command is `ls`, but its arguments are all the files that end with `.txt`^[Historically, MS-DOS and Windows programs would do this differently than Unix. In Unix, the shell would _expand_ the wildcard into all matching files before your program saw it, whereas the Microsoft variants would pass the wildcard expression into the program to deal with. In any case, there are arguments that get passed into the program.].

So how can we see what is passed into the program from the command line?

Say we have a program called `add` that adds all numbers passed on the command line and prints the result:

```
./add 10 30 5
45
```

That's gonna pay the bills for sure!

But seriously, this is a great tool for seeing how to get those arguments from the command line and break them down.

First, let's see how to get them at all. For this, we're going to need a new `main()`!

Here's a program that prints out all the command line arguments. For example, if we name the executable `foo`, we can run it like this:

```
./foo i like turtles
```

and we'll see this output:

```
arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles
```

It's a little weird, because the zeroth argument is the name of the executable, itself. But that's just something to get used to.
The arguments themselves follow directly.

Source:

``` {.c .numberLines}
#include <stdio.h>

int main(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++) {
        printf("arg %d: %s\n", i, argv[i]);
    }
}
```

Whoa! What's going on with the `main()` function signature? What are `argc` and `argv`^[Since they're just regular parameter names, you don't actually have to call them `argc` and `argv`. But it's so very idiomatic to use those names, if you get creative, other C programmers will look at you with a suspicious eye, indeed!] (pronounced _arg-c_ and _arg-v_)?

Let's start with the easy one first: `argc`. This is the _argument count_, including the program name, itself. If you think of all the arguments as an array of strings, which is exactly what they are, then you can think of `argc` as the length of that array, which is exactly what it is.

And so what we're doing in that loop is going through all the `argv`s and printing them out one at a time, so for a given input:

```
./foo i like turtles
```

we get a corresponding output:

```
arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles
```

With that in mind, we should be good to go with our adder program.

Our plan:

* Look at all the command line arguments (past `argv[0]`, the program name)
* Convert them to integers
* Add them to a running total
* Print the result

Let's get to it!

``` {.c .numberLines}
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int total = 0;

    for (int i = 1; i < argc; i++) {  // Start at 1, the first argument
        int value = atoi(argv[i]);    // Use strtol() for better error handling

        total += value;
    }

    printf("%d\n", total);
}
```

Sample runs:

```
$ ./add
0
$ ./add 1
1
$ ./add 1 2
3
$ ./add 1 2 3
6
$ ./add 1 2 3 4
10
```

Of course, it might misbehave if you pass in a non-integer (`atoi()` quietly returns `0` for input it can't parse), but hardening against that is left as an exercise to the reader.

### The Last `argv` is `NULL`

One bit of fun trivia about `argv` is that after the last string comes a `NULL` pointer.

That is:

``` {.c}
argv[argc] == NULL
```

is always true!
This might seem pointless, but it turns out to be useful in a couple places; we'll take a look at one of those right now.

### The Alternate: `char **argv`

Remember that in a function's parameter list, C doesn't differentiate between array notation and pointer notation.

That is, these are the same:

``` {.c}
void foo(char a[])
void foo(char *a)
```

Now, it's been convenient to think of `argv` as an array of strings, i.e. an array of `char*`s, so this made sense:

``` {.c}
int main(int argc, char *argv[])
```

but because of the equivalence, you could also write:

``` {.c}
int main(int argc, char **argv)
```

Yeah, that's a pointer to a pointer, all right! If it makes it easier, think of it as a pointer to a string. But really, it's a pointer to a value that points to a `char`.

Also recall that these are equivalent:

``` {.c}
argv[i]
*(argv + i)
```

which means you can do pointer arithmetic on `argv`.

So an alternate way to consume the command line arguments might be to just walk along the `argv` array by bumping up a pointer until we hit that `NULL` at the end.

Let's modify our adder to do that:

``` {.c .numberLines}
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int total = 0;

    // Cute trick to get the compiler to stop warning about the
    // unused variable argc:
    (void)argc;

    // Note: this walk starts at argv[0], the program name, which
    // atoi() turns into 0 for a name like "./add"
    for (char **p = argv; *p != NULL; p++) {
        int value = atoi(*p);  // Use strtol() for better error handling

        total += value;
    }

    printf("%d\n", total);
}
```

Personally, I use array notation to access `argv`, but have seen this style floating around, as well.

### Fun Facts

Just a few more things about `argc` and `argv`.

* Some environments might not set `argv[0]` to the program name. If it's not available, `argv[0]` will be an empty string. I've never seen this happen.

* The spec is actually pretty liberal with what an implementation can do with `argv` and where those values come from. But every system I've been on works the same way, as we've discussed in this section.
* You can modify `argc`, `argv`, or any of the strings that `argv` points to. (Just don't make those strings longer than they already are!)

* On some Unix-like systems, modifying the string `argv[0]` results in the output of `ps` changing^[`ps`, Process Status, is a Unix command to see what processes are running at the moment.].

  Normally, if you have a program called `foo` that you've run with `./foo`, you might see this in the output of `ps`:

  ```
  4078 tty1 S 0:00 ./foo
  ```

  But if you modify `argv[0]` like so, being careful that the new string `"Hi!  "` is the same length as the old one `"./foo"`:

  ``` {.c}
  strcpy(argv[0], "Hi!  ");
  ```

  and then run `ps` while the program `./foo` is still executing, we'll see this instead:

  ```
  4079 tty1 S 0:00 Hi!
  ```

  This behavior is not in the spec and is highly system-dependent.

## Exit Status {#exit-status}

Did you notice that the function signatures for `main()` have it returning type `int`? What's that all about? It has to do with a thing called the _exit status_, which is an integer that can be returned to the program that launched yours to let it know how things went.

Now, there are a number of ways a program can exit in C, including `return`ing from `main()`, or calling one of the `exit()` variants.

All of these methods accept an `int` as an argument.

Side note: did you see that in basically all my examples, even though `main()` is supposed to return an `int`, I don't actually `return` anything? In any other function that returns a value, this would be a mistake, but there's a special case in C: if execution reaches the end of `main()` without finding a `return`, it automatically does a `return 0`.

But what does the `0` mean? What other numbers can we put there? And how are they used?

The spec is both clear and vague on the matter, as is common. Clear because it spells out what you can do, but vague in that it doesn't particularly limit it, either.

Nothing for it but to _forge ahead_ and figure it out!
Let's get [flw[Inception|Inception]] for a second: turns out that when you run your program, _you're running it from another program_.

Usually this other program is some kind of [flw[shell|Shell_(computing)]] that doesn't do much on its own except launch other programs.

But this is a multi-phase process, especially visible in command-line shells:

1. The shell launches your program
2. The shell typically goes to sleep (for command-line shells)
3. Your program runs
4. Your program terminates
5. The shell wakes up and waits for another command

Now, there's a little piece of communication that takes place between steps 4 and 5: the program can return a _status value_ that the shell can interrogate. Typically, this value is used to indicate the success or failure of your program, and, if a failure, what type of failure.

This value is what we've been `return`ing from `main()`. That's the status.

Now, the C spec allows for two different status values, which have macro names defined in `<stdlib.h>`:

|Status|Description|
|-|-|
|`EXIT_SUCCESS` or `0`|Program terminated successfully.|
|`EXIT_FAILURE`|Program terminated with an error.|

Let's write a short program that multiplies two numbers from the command line. We'll require that you specify exactly two values. If you don't, we'll print an error message, and exit with an error status.

``` {.c .numberLines}
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        printf("usage: mult x y\n");
        return EXIT_FAILURE;   // Indicate to shell that it didn't work
    }

    printf("%d\n", atoi(argv[1]) * atoi(argv[2]));

    return 0;  // same as EXIT_SUCCESS, everything was good.
}
```

Now if we try to run this, we get the usage message until we specify exactly the right number of command-line arguments:

```
$ ./mult
usage: mult x y

$ ./mult 3 4 5
usage: mult x y

$ ./mult 3 4
12
```

But that doesn't really show the exit status that we returned, does it? We can get the shell to print it out, though.
Assuming you're running Bash or another POSIX shell, you can use `echo $?` to see it^[In Windows `cmd.exe`, type `echo %errorlevel%`. In PowerShell, type `$LastExitCode`.].

Let's try:

```
$ ./mult
usage: mult x y
$ echo $?
1

$ ./mult 3 4 5
usage: mult x y
$ echo $?
1

$ ./mult 3 4
12
$ echo $?
0
```

Interesting! We see that on my system, `EXIT_FAILURE` is `1`. The spec doesn't spell this out, so it could be any number. But try it; it's probably `1` on your system, too.

### Other Exit Status Values

The status `0` most definitely means success, but what about all the other integers, even negative ones?

Here we're going off the C spec and into Unix land. In general, while `0` means success, a positive number means failure. So you can only have one type of success, and multiple types of failure. Bash says the exit code should be between 0 and 255, though a number of codes are reserved.

In short, if you want to indicate different error exit statuses in a Unix environment, you can start with `1` and work your way up.

On Linux, if you try any code outside the range 0-255, it will bitwise AND the code with `0xff`, effectively wrapping it around into that range (so `256` comes out as `0`).

You can script the shell to later use these status codes to make decisions about what to do next.

## Environment Variables {#env-var}

Before I get into this, I need to warn you that C doesn't specify what an environment variable is. So I'm going to describe the environment variable system that works on every major platform I'm aware of.

Basically, the environment is the program that's going to run your program, e.g. the bash shell. And it might have some bash variables defined. In case you didn't know, the shell can make its own variables. Each shell is different, but in bash you can just type `set` and it'll show you all of them.
Here's an excerpt from the 61 variables that are defined in my bash shell:

```
HISTFILE=/home/beej/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/beej
HOSTNAME=FBILAPTOP
HOSTTYPE=x86_64
IFS=$' \t\n'
```

Notice they are in the form of key/value pairs. For example, one key is `HOSTTYPE` and its value is `x86_64`. From a C perspective, all values are strings, even if they're numbers^[If you need a numeric value, convert the string with something like `atoi()` or `strtol()`.].

So, _anyway_! Long story short, it's possible to get these values from inside your C program.

Let's write a program that uses the standard `getenv()` function to look up a value that you set in the shell.

`getenv()` will return a pointer to the value string, or else `NULL` if the environment variable doesn't exist.

``` {.c .numberLines}
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *val = getenv("FROTZ");  // Try to get the value

    // Check to make sure it exists
    if (val == NULL) {
        printf("Cannot find the FROTZ environment variable\n");
        return EXIT_FAILURE;
    }

    printf("Value: %s\n", val);
}
```

If I run this directly, I get this:

```
$ ./foo
Cannot find the FROTZ environment variable
```

which makes sense, since I haven't set it yet.

In bash, I can set it to something with^[In Windows CMD.EXE, use `set FROTZ=value`. In PowerShell, use `$Env:FROTZ=value`.]:

```
$ export FROTZ="C is awesome!"
```

Then if I run it, I get:

```
$ ./foo
Value: C is awesome!
```

In this way, you can set up data in environment variables, and you can get it in your C code and modify your behavior accordingly.

### Setting Environment Variables

This isn't standard, but a lot of systems provide ways to set environment variables.

If you're on a Unix-like system, look up the documentation for `putenv()`, `setenv()`, and `unsetenv()`. On Windows, see `_putenv()`.

### Unix-like Alternative Environment Variables

If you're on a Unix-like system, odds are you have another couple ways of getting access to environment variables.
Note that although the spec points this out as a common extension, it's not truly part of the C standard. It is, however, part of the POSIX standard.

One of these is a variable called `environ` that must be declared like so:

``` {.c}
extern char **environ;
```

It's an array of strings terminated with a `NULL` pointer.

You should declare it yourself before you use it, or you might find it in the non-standard `<unistd.h>` header file.

Each string is in the form `"key=value"` so you'll have to split it and parse it yourself if you want to get the keys and values out.

Here's an example of looping through and printing out the environment variables a couple different ways:

``` {.c .numberLines}
#include <stdio.h>

extern char **environ;  // MUST be extern AND named "environ"

int main(void)
{
    for (char **p = environ; *p != NULL; p++) {
        printf("%s\n", *p);
    }

    // Or you could do this:
    for (int i = 0; environ[i] != NULL; i++) {
        printf("%s\n", environ[i]);
    }
}
```

For a bunch of output that looks like this:

```
SHELL=/bin/bash
COLORTERM=truecolor
TERM_PROGRAM_VERSION=1.53.2
LOGNAME=beej
VSCODE_GIT_ASKPASS_NODE=/home/beej/.vscode-server/bin/ea3859d4ba2f3e577a159bc91e3074c5d85c0523/node
HOME=/home/beej
... etc ...
```

Use `getenv()` if at all possible because it's more portable. But if you have to iterate over environment variables, using `environ` might be the way to go.

Another non-standard way to get the environment variables is as a parameter to `main()`. It works much the same way, but you avoid needing to add your `extern` `environ` variable. [flw[Not even the POSIX spec supports this|https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html]] as far as I can tell, but it's common in Unix land.

``` {.c .numberLines}
#include <stdio.h>

int main(int argc, char **argv, char **env)  // <-- env!
{
    (void)argc; (void)argv;  // Suppress unused warnings

    for (char **p = env; *p != NULL; p++) {
        printf("%s\n", *p);
    }

    // Or you could do this:
    for (int i = 0; env[i] != NULL; i++) {
        printf("%s\n", env[i]);
    }
}
```

Just like using `environ` but _even less portable_. It's good to have goals.