🐧 bsantanad.com

Files in c

Introduction

I've been working in python so long that I have been spoiled on how easy things are. I have to write a compiler for a class of the last semester in college, and thought well, I can do it in C right? I have not use it in a long way, but hey there are things once you've learnt, you'll never forget? right?

Well apparently I never learned about Files in C, or even how to write good C at all. I only saw it in my first semesters where we did simple stuff. The homework is due for tomorrow so, I ended up writing a quick token detector in python. Nevertheless I went to sleep surprised how little I know about C, and how I thought I knew it. So I woke up the next day, and said, let's write this in C.

I have a really big book C Primer Plus that goes into big depth on the topic, so let's go over it and write here my notes. This way, my me from the future -- if ever encounter with the same problem I had -- can go back here and read all about files in c.

Without further to say, let's start.

what is a file?

Let's start with the basics, a file is just a named section of storage. C makes the distinction between binary and text files. Internally all file content is in binary form, C makes the distinction based on if the file primarly uses the binary codes for character that represent text (ASCII or Unicode).

C opens 3 files by default when you run a program, stdout, stdin, stderr. standard output, standard input, and standard error. These represent the normal input, output, and error output of your system.

simple program

let's make a program that counts the characters in a file

/** program that counts the characters in file **/
#include<stdio.h>
#include<stdlib.h>

int 
main(int argc, char *argv[])
{
        char *prog = argv[0];  /** program name for errors **/

        int c;
        FILE *fp;
        unsigned long count = 0;
        if (argc != 2) {
                printf("usage: %s filename\n", prog);
                exit(-1);
        }
        if ((fp = fopen(argv[1], "r")) == NULL){
                printf("can not open: %s \n", argv[1]);
                exit(-1);
        }
        while((c = getc(fp)) != EOF)
        {
                putc(c, stdout);
                count++;
        }

        fclose(fp);
        printf("file %s has %lu chars\n", argv[1], count);

        return 0;
}

okay, so what is going on, lets see.

Since we are using args passed in the command line, we use *argv[], this is an array of string with all the args you passed after the binary name. argv[0] being the name of the actual binary. So first we check if the user sent at least two params (the binary name, and a filename).

If he did then we proceed to try and open the file, this is where the function fopen comes in handy. Note that it receives two args, first the filename, and then how you want to open the file. Since we you want to read it now we used r, but you can also use w for writing, a for appending something to the file. If you want to see all the options go here, or do man fopen in a unix machine.

This function will return NULL if the file can not be opened. If it works it will return the FILE pointer to the variable you assigned, in this case fp. A curiose fact is that it will not point to the actual file in memory, it poitn to a data object containing info about the file.

Anyway, then we use getc and putc, this gets a char of put a char, in the file specified. So getc(fp), will get a char from the fp file. And putc(c, stdout) will put a char in the standard output (remember that those are files as well). We do the getc until we get the EOF character, this will tell us that the file has ended, so we need to stop.

Then we use fclose to close the file, and call it a day. Note that we are using exit() instead of return -1 if the code fails. This is becuase this function will close all files open by the program in case it fails, while return will not.

simple cat implementation

/** simple cat implemetation **/
#include<stdio.h>
#include<stdlib.h>

int
main(int argc, char *argv[])
{
        char *prog = argv[0]; /** program name for error **/

        char c;
        FILE *fp; 

        if (argc < 2){
                printf("%s: you need to pass at least one file to cat \n",
                       prog);
                exit(-1);
        }

        /** we need to move the argv array by one, since we dont want to print 
         * the binary program, argv[0] = prog (the actual binary) **/
        *argv++; 

        while(argc-- > 1){
            if ((fp = fopen(*argv++, "r")) == NULL) {
                    printf("%s: can not open file\n", prog);
                    exit(-1);
            }
            while((c = getc(fp)) != EOF) {
                    putc(c, stdout);
            }
            fclose(fp);
        }
        return 0;
}

This basically does the same as the program above, but now we can send any number of files to the program, and it will print them to the stdout one after the other.

The way we do this, is by iterating the argv array, first we skip the fist element since it's the actual binary name, and we do not want to print that. then we iterate thru the next file and print it, until we are left out of files.