CGI with Harp and C - QDecoder and Apache

Welcome to round 2 of "Let's create a chat server!" If you missed the previous post, you'll want to read it to get up to speed. If you just want to grab the source code from that tutorial, you can find it here on GitHub.

In this tutorial we're going to take the work we did last time (what I refer to as internals) and use the qdecoder library to make it web facing. I'll show you the necessary Apache configuration for using CGI and give a brief explanation of some of the basics of CGI. So let's get started:

Install Apache and qdecoder

First off, you'll need Apache (or some other web server) set up. If you're running a Debian-based Linux distribution, this is normally as easy as:

sudo apt-get install apache2

If you're on Windows, you can download a bundle like XAMPP or WAMP and install the binary by following the instructions on their pages. Most Macs come preinstalled with some type of web server, but I don't own a Mac so you're on your own.

To install the qdecoder library you'll need to follow their instructions: download the source and build the static library. It's not difficult to do. For me the steps were something like this:

Download the latest qdecoder package from here
run ./configure to set up the library for your system
compile the library with make
install the library with make install

Those are literally the steps out of the INSTALL.md file in the repository. I've only installed qdecoder on Linux, so if you're trying to do this on Windows or Mac you'll want to get your Google-fu on to figure it out. I believe (don't quote me on this) that the Mac install should be exactly the same as what I've described above, since it's a Unix-based operating system.

Once you've installed the library, try building some of the examples to make sure everything is working. Once you've got that come back here. For the purpose of my tutorial I'll assume you built it from source and installed it into a folder called lib within your working directory.

Using qdecoder

The library has some great examples and if you haven't looked at them, seriously do so. If you've ever done web programming before, chances are some of this stuff will look and feel familiar to you. And even if it doesn't you'll probably be able to take a good guess. Let's take an overview of the functions we're going to use:

FCGI_Accept

If you've configured qdecoder with FastCGI, then you'll be using this function to keep your script running (as opposed to starting up and shutting down for each request). The library provides a simple define to check for the configuration, so you'll see this in a few places as a fallback in case the runtime doesn't support it:

#ifdef ENABLE_FASTCGI
    while(FCGI_Accept() >= 0) {
#endif

//code ...

#ifdef ENABLE_FASTCGI
    }
#endif

Note: If you're unfamiliar with the C Preprocessor, all you need to know is that the code between the ifdef and endif will only be used if the ENABLE_FASTCGI constant is defined. If it's not, just think of the code as being commented out.

qcgireq_parse

The qcgireq_parse function handles the hard part: taking the request data that Apache hands to the script through environment variables (and stdin) and parsing it into a useful form. If you've used PHP before, you can think of this as what your server does when it puts variables into the superglobals $_GET, $_POST, $_COOKIE, and so on. qDecoder parses the request in COOKIE, POST, then GET order. If you want to change that ordering you can find out how in the documentation. The qcgireq_parse function takes two parameters: a qentry_t pointer and a flag that selects which request methods to parse. If you pass NULL as the first parameter you'll be given back a new qentry_t with all the variables stored in it, which is what we'll be doing. So far, the only time I've seen a non-NULL value passed is for getting the parameters in a specific order.
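To get a feel for what the library is sparing you from, here's a rough sketch of looking up one value in a raw query string by hand. The helper name find_param is made up for this example and is not part of qDecoder; it also skips URL-decoding, which a real parser has to do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of qDecoder: look up `key` in a raw
 * query string such as "date=1&u=bob" and return a malloc'd copy of
 * its value, or NULL if the key is absent. No URL-decoding is done. */
char *find_param(const char *query, const char *key) {
    size_t keylen = strlen(key);
    const char *p = query;

    while (p != NULL && *p != '\0') {
        /* Each pair runs up to the next '&' or the end of the string. */
        const char *end = strchr(p, '&');
        size_t pairlen = end ? (size_t)(end - p) : strlen(p);

        /* A match is the key followed directly by '='. */
        if (pairlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vallen = pairlen - keylen - 1;
            char *value = malloc(vallen + 1);
            if (value == NULL) return NULL;
            memcpy(value, p + keylen + 1, vallen);
            value[vallen] = '\0';
            return value; /* caller frees */
        }
        p = end ? end + 1 : NULL;
    }
    return NULL;
}
```

In a plain CGI script you could feed it getenv("QUERY_STRING"), which is where the server puts the part of the URL after the question mark. qcgireq_parse does this and a lot more (POST bodies, cookies, decoding), so this is only meant to show the idea.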

qcgires_setcontenttype

This function allows you to set the HTTP header for the content type. qDecoder does not allow you to set any headers you want. Rather, it provides methods to set the content type and to send redirect headers. Most applications really don't need much more than this, so it's a pragmatic choice that keeps the library code less complex and the API simple. We'll be using 2 content types in our scripts: text/plain and application/JSON.

qentry_t->getstr, qentry_t->free

These are two of the ways we can interact with a request object. When qcgireq_parse returns a pointer to a qentry_t struct, you can use ->getstr to retrieve a variable from the parsed values and ->free to release your request object.

Here's some equivalent code in PHP and qDecoder for clarification:

PHP

$myvar = $_GET['theVarInTheUrl'];
$mypvar = $_POST['theVarInThePostedData'];

and in C

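qentry_t *req = qcgireq_parse(NULL, 0);
/* true: we get our own copy and must free it ourselves */
char * myvar = req->getstr(req, "theVarInTheUrl", true);
/* false: the string belongs to req and is only valid until req is freed */
char * mypvar = req->getstr(req, "theVarInThePostedData", false);
/* ... use myvar and mypvar here ... */
free(myvar);
req->free(req);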

Not much difference besides having to parse the request. Also, you'll notice I used true in one call to ->getstr and false in the other. The third parameter to the ->getstr method says whether or not the caller is responsible for freeing the string returned to them. If you're in a single-threaded environment then you're safe to use false, and you won't need to worry about freeing the strings you get from the request (so long as you free the request itself). If you're working in a multithreaded application or FastCGI, you'll want to use true and free them when you can.

Another difference to note is that we do not have separate calls for GET and POST. In this case, because we've passed 0 as the second parameter to qcgireq_parse, we get ALL values sent to us. If you wanted to make a couple of global variables for yourself you could do this:

qentry_t *_POST = qcgireq_parse(NULL, Q_CGI_POST);
qentry_t *_GET = qcgireq_parse(NULL, Q_CGI_GET);

and then use each one accordingly.
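To make the ownership rule behind ->getstr's third parameter a bit more concrete, here's a tiny sketch of the same idea with a one-slot store. Everything in it (store_t, store_getstr, store_free) is invented for illustration and says nothing about how qDecoder is implemented internally.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot store standing in for a parsed request. */
typedef struct {
    char *value; /* owned by the store */
} store_t;

/* newmem != 0: return a malloc'd copy that the caller must free.
 * newmem == 0: return the store's own pointer; it dies with the store. */
char *store_getstr(store_t *s, int newmem) {
    if (s->value == NULL) return NULL;
    if (!newmem) return s->value;

    size_t len = strlen(s->value) + 1;
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s->value, len);
    return copy;
}

/* Release the store's string; any borrowed pointer is now dangling. */
void store_free(store_t *s) {
    free(s->value);
    s->value = NULL;
}
```

The copy stays usable after store_free, while the borrowed pointer must not be touched again. That's the trade-off you're choosing between when you pass true or false.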

The last thing I want to touch on is an error message you might see in your logs when attempting to debug your scripts:

Premature end of script headers

This means that the script ended before Apache received a complete set of headers, typically because you tried to output content before you set the content type of the response. If you're looking at your code saying "no way, I set it right there!", then there's a good chance you're dealing with some undefined behavior in your scripts: a variable isn't initialized properly, you've written somewhere you shouldn't, or you've read somewhere you definitely shouldn't have. Any of these can crash the script before the headers go out and cause that error. If you're lucky, you'll get a memory dump in your error log to decipher. This is why we tested our internal code in the last tutorial, and it's always a good idea to do so. Also, use valgrind to test everything you do! (More on that in a bit!)
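For reference, the shape the server expects from a CGI script is a header line, then an empty line, then the body. Here's a minimal sketch that builds that shape into a buffer; format_cgi_response is a name I made up for the example, and in the real scripts qcgires_setcontenttype takes care of the header for you.

```c
#include <stdio.h>

/* Hypothetical helper: write a minimal CGI response into buf.
 * Returns what snprintf returns: the number of characters the full
 * response needs, or a negative value on an encoding error. */
int format_cgi_response(char *buf, size_t size,
                        const char *content_type, const char *body) {
    /* Header line, then an empty line, then the body. */
    return snprintf(buf, size, "Content-Type: %s\n\n%s", content_type, body);
}
```

If anything reaches stdout before that header line, or the script dies before the empty line is written, the server has no valid headers to work with and you get the error above.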

Your first qdecoder script

We're going to make a heartbeat script. This is a standard API trick: it's simple to make, and when you want to know if your server is still functioning it's a good way to check. Some heartbeats can be complicated, but ours is going to be super simple. We'll simply spit out the time and make sure the chat is initialized. Check it out:

src/heartbeat.c

#include "config.h"
#include "chatfile.h"
#include "load_qdecoder.h"

int main(void){
#ifdef ENABLE_FASTCGI
    while(FCGI_Accept() >= 0) {
#endif
    qentry_t *req = qcgireq_parse(NULL, Q_CGI_GET);
    qcgires_setcontenttype(req, "application/JSON");

    int initialized = chatInit();

    printf("{ \"heartbeat\" : %ld, \"initialized\" : %s }", (long)time(0), initialized ? "true" : "false");

    // De-allocate memories
    req->free(req);
#ifdef ENABLE_FASTCGI
    }
#endif
    return 0;
}

In order to compile this program we'll need to link against the qdecoder library, which means updating our Makefile a little bit. Add the following to the top of your Makefile, underneath LINKFLAGS:

LIBS = lib/wolkykim-qdecoder-63888fc/src/libqdecoder.a

Note: If you've installed qdecoder somewhere other than lib, you'll need to reflect that in the definition above. I downloaded the source, untarred it, and placed it into a lib folder; you might have done something else.

And then change the line that looks like this:

${CC} ${LINKFLAGS} -o $@ $(patsubst bin/%.cgi, obj/%.o, $@ ) $(patsubst %, obj/%.o, $(INTERNAL))

to this:

${CC} ${LINKFLAGS} -o $@ $(patsubst bin/%.cgi, obj/%.o, $@ ) $(patsubst %, obj/%.o, $(INTERNAL)) ${LIBS}

Lastly, we need to define the header file load_qdecoder.h which we've mentioned at the top of the heartbeat script. Here's the header:

load_qdecoder.h

/* Names containing a double underscore are reserved for the implementation */
#ifndef LOAD_QDECODER_H
#define LOAD_QDECODER_H

#ifdef ENABLE_FASTCGI
    #include "fcgi_stdio.h"
#else
    #include <stdio.h>
#endif
#include "qdecoder.h"

#endif

The only thing special about this include is that we are once again checking if we're using fast CGI or not, and if we are, then we need to include the appropriate standard I/O library.

Now run make, and if you've got everything set up correctly you'll be rewarded with output like this:

make
cc -std=gnu99 -pedantic -Wall -Wextra -Werror -g -I./headers -c src/internal/chatfile.c -o obj/chatfile.o 
cc -std=gnu99 -pedantic -Wall -Wextra -Werror -g -I./headers  -o bin/heartbeat.cgi  obj/heartbeat.o  obj/chatfile.o lib/wolkykim-qdecoder-63888fc/src/libqdecoder.a

If you run your compiled file you should receive a heartbeat response:

./bin/heartbeat.cgi
Content-Type: application/JSON

{ "heartbeat" : 1408487034, "initialized" : true }

And now that we're alive, we can get to the fun stuff: wiring your internal functions from the previous tutorial into CGI scripts!

Polling, Reading, and Writing with CGI and Internals

We now have enough knowledge to implement every function which our chat server needs:

Polling: Check if the user needs to refresh their copy of the conversation
Writing: Send a message to the chat server
Reading: Retrieve the current chat history

Polling

Let's define our contract: The user will send us an epoch timestamp of when they last retrieved the history for the chat. If this timestamp is less than the last modification time of our history file, we know that the history has been updated. Sounds like the perfect opportunity to make use of our function fileLastModifiedAfter!

If you recall, the function has a signature like so:

int fileLastModifiedAfter(const char * filename, time_t lastCheckedTime);

We know the filename (it's a constant), and now we just need a time_t value. The C standard only says that time_t is an arithmetic type, so it could be int, long int, or whatever your system/compiler feels like, which means we need to be sure we store the parsed result into something big enough. Then we'll convert it to the proper type and use it. This will also give us the chance to perform some data validation (it is user input after all).
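If you skipped the previous post, a check like this can be built on top of stat(). The sketch below is not the tutorial's fileLastModifiedAfter; modified_after is a hypothetical stand-in with my own choice of return values, and it assumes a POSIX system.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical stand-in, NOT the tutorial's fileLastModifiedAfter:
 * returns 1 if `filename` was modified after `last_checked`,
 * 0 if it was not, and -1 if the file could not be inspected. */
int modified_after(const char *filename, time_t last_checked) {
    struct stat info;
    if (stat(filename, &info) != 0) {
        return -1;
    }
    return info.st_mtime > last_checked ? 1 : 0;
}
```

The comparison is the whole contract: a client timestamp older than the file's modification time means there is something new to fetch.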

For ease of use later on, let's say we'll send back a JSON object that looks like this:

{"updated": true /* or false */}

And with that, our contract is defined and signed with the outside world. We can easily translate the above specification into the following CGI script:

poll.c

#include "config.h"
#include "chatfile.h"
#include "load_qdecoder.h"

static void printUpdated(int updated){
    printf("{\"updated\": %s}", updated ? "true" : "false");
}

int main(void){
    chatInit();
#ifdef ENABLE_FASTCGI
    while(FCGI_Accept() >= 0) {
#endif
    qentry_t *req = qcgireq_parse(NULL, Q_CGI_GET);
    char * sentTime = NULL; 
    long long intermediateTime = 0L;
    time_t parsedTime = 0;
    qcgires_setcontenttype(req, "application/json");

    sentTime = req->getstr(req, "date", true);
    if(sentTime == NULL){
        /* They did not send us a proper request. */
        printUpdated(0);
        goto end;
    }

    int scanned = sscanf(sentTime, "%lld", &intermediateTime);
    if(scanned != 1){
        /* Incorrect format likely since we couldn't parse it out */
        printUpdated(0);
        free(sentTime);
        goto end;
    }

    parsedTime = (time_t)intermediateTime;

    int updated = fileLastModifiedAfter(DATA_FILE, parsedTime);
    printUpdated(updated);
    
    free(sentTime);
    // De-allocate memories
    end:
    req->free(req);
#ifdef ENABLE_FASTCGI
    }
#endif
    return 0;
}

This source code is a bit longer than before, but simple. First, we ensure that the chat server is initialized with chatInit. I do this before the FastCGI accept loop because we know that our initialization is just creating a file, which only really needs to be done once. Next, we parse our request for GET variables using the Q_CGI_GET flag to the qcgireq_parse function. We then ensure that the expected URL parameter of date was sent to us, convert it to a time_t type, and finally use our internal function fileLastModifiedAfter to check whether or not the file's been updated since the parsedTime.

You might be wondering: how do I test this script, since it takes a URL parameter? We don't have any URLs after all! It's simple, my friends! CGI is really nothing more than passing information along in environment variables. Because of this, it's easy to specify a variable on the command line and "pass" it to the script. For example, here are some tests of our poll script:
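One thing worth knowing about the sscanf approach: %lld stops at the first character that isn't part of a number, so a value like 123abc is happily accepted as 123. If you want to be stricter, here's a sketch of an alternative using strtoll. The helper name parse_epoch and the decision to reject negative values are my own assumptions, not something the poll script does.

```c
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: parse a decimal epoch timestamp strictly.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_epoch(const char *text, time_t *out) {
    char *end = NULL;
    long long value;

    if (text == NULL || *text == '\0') return 0;

    errno = 0;
    value = strtoll(text, &end, 10);

    if (errno == ERANGE) return 0;             /* does not fit in long long */
    if (end == text || *end != '\0') return 0; /* no digits, or trailing junk */
    if (value < 0) return 0;                   /* negative timestamps rejected here */

    *out = (time_t)value;
    return 1;
}
```

Whether the extra strictness is worth it is a judgment call. For this chat server, treating 123abc as 123 is harmless, but it's good to know which behavior you're getting.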

make
QUERY_STRING="date=no" ./bin/poll.cgi 
    Content-Type: application/json

    {"updated": false}

QUERY_STRING="date2=100000000000" ./bin/poll.cgi 
    Content-Type: application/json

    {"updated": false}

QUERY_STRING="date=1" ./bin/poll.cgi 
    Content-Type: application/json
        
    {"updated": true}

You can see that it works correctly: a non-integral date or no date parameter at all means we get false, and if we send in an actual time that is older than the history file we'll get true. Running these scripts from the command line and verifying their correctness is something you should always try to do, and we can update our Makefile to perform these tests for us:

Makefile

#...previous code above...
test-poll:
    QUERY_STRING="date=no" ${valgrind} ./bin/poll.cgi
    QUERY_STRING="date=1" ${valgrind} ./bin/poll.cgi
    QUERY_STRING="date=100000000000" ${valgrind} ./bin/poll.cgi
    QUERY_STRING="" ${valgrind} ./bin/poll.cgi

And then running make test-poll will run each of the scenarios we want to test our poll script against under valgrind. I highly recommend using valgrind when testing for undefined behavior, as it is immensely helpful in pretty much all circumstances. With this code we can now poll our chat server's file! Now we need to write to it:

Writing

In order for a chat server to work, people need to be able to chat! So we'll need to define a protocol for user A to talk to user B by sending a message of some kind. Easy enough, let's say that with each submission the chatter sends their username and their message to the server within a POST request. We'll use the standard web format of parameter=value&param2=value2 to do this.

Specifically we'll send the parameter u for user and m for message. This means that within our script we'll parse the data like so:

char * user = req->getstr(req, "u", true);
char * msg  = req->getstr(req, "m", true);

And then we'll need to somehow store it. Lucky for us, we have updateConversation from our internals library to work with. The signature looks like this:

int updateConversation(const char * user, const char * addendum);

Gosh, it's like we designed it this way or something!

Enough chatter, let's get to the code:

chat.c

#include "config.h"
#include "chatfile.h"
#include "load_qdecoder.h"

/* Don't pass msg with newlines or double quotes; it is printed into the JSON unescaped! */
static void printSuccess(int updated, const char * msg){
    printf("{\"success\": %s, \"message\" : \"%s\"}", updated ? "true" : "false", msg);
}

int main(void){
    chatInit();
#ifdef ENABLE_FASTCGI
    while(FCGI_Accept() >= 0) {
#endif
    qentry_t *req = qcgireq_parse(NULL, Q_CGI_POST);
    char * user = NULL; 
    char * msg = NULL;
    qcgires_setcontenttype(req, "application/json");

    user = req->getstr(req, "u", true);
    if(user == NULL){
        /* They did not send us a proper request. */
        printSuccess(0, "Invalid Request");
        goto end;
    }
    /* Limit the user name length */
    int i = 0;
    int maxlength = 21;
    for (i = 0; i < maxlength && user[i] != '\0'; ++i)
        ;
    if(i == maxlength){
        printSuccess(0, "Username too long");
        free(user);
        goto end;
    }

    msg = req->getstr(req, "m", true);
    if(msg == NULL){
        printSuccess(0, "Invalid Request");
        free(user);
        goto end;
    }

    int updated = updateConversation(user, msg);
    printSuccess(updated, "Message has been sent");
    
    free(user);
    free(msg);
    // De-allocate memories
    end:
    req->free(req);
#ifdef ENABLE_FASTCGI
    }
#endif
    return 0;
}

You'll notice this is exceptionally similar to the polling process, except that we do our validations a little differently. First off, there is none for the chat message itself. Why? Because going into the minutiae of what we would actually have to watch for is WAY too much for this blog post. Second, we are limiting the length of the username. Why? Because I figured we had to do some type of validation for this script. And this code, in my estimation, is more protective than using strlen. Why? Because strlen relies on the string being ended properly, and we're dealing with user input. So we assume nothing and simply count characters while checking for the end of the string.
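The counting loop from the script can be pulled out into a small function of its own. This is just a sketch of the same idea; bounded_len is a name I picked, and on POSIX systems strnlen does the same job if you'd rather not write it yourself.

```c
#include <stddef.h>

/* Count characters in s, but never look past the first `max` bytes.
 * Returns max when no terminator was found within that window. */
size_t bounded_len(const char *s, size_t max) {
    size_t i = 0;
    while (i < max && s[i] != '\0') {
        ++i;
    }
    return i;
}
```

With max set to 21, a return value of 21 means "too long" and anything smaller is the real length, which is exactly the check the script performs on the username.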

By this point you're probably wondering: "Why does he keep using `goto`?"

And the answer is: because it makes my code cleaner. Now before you declare a crusade on me for bad practice and so on, let me explain why goto is good in this use case:

goto is a local jump, it's not an actual long jump statement
we prevent a lot of conditional branching and repeated code by using it
it's very readable in my opinion since all the scripts are small enough to view at once
goto is great for error handling since C has no exceptions

The script above can be rewritten to not use goto, if you're willing to repeat code and nest a bunch of if conditionals inside one another. With goto, the flow of the code stays logically structured as well.
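Here's the pattern boiled down to its essentials, as a sketch outside of any CGI code. The names line_t, dup_string, and line_init are all invented for this example. The point is that every failure path jumps to one label, and that label is the only place cleanup happens.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type: an owned copy of a user and a message. */
typedef struct {
    char *user;
    char *msg;
} line_t;

/* Small local replacement for strdup so the sketch stays standard C. */
static char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Returns 0 on success, -1 on failure. On failure nothing is leaked. */
int line_init(line_t *line, const char *user, const char *msg) {
    int status = -1;
    char *u = NULL;
    char *m = NULL;

    if (user == NULL || msg == NULL) goto end;

    u = dup_string(user);
    if (u == NULL) goto end;

    m = dup_string(msg);
    if (m == NULL) goto end;

    /* Success: hand both copies over and forget our local pointers. */
    line->user = u;
    line->msg = m;
    u = NULL;
    m = NULL;
    status = 0;

end:
    free(u); /* free(NULL) is a no-op, so this is safe on every path */
    free(m);
    return status;
}
```

Without the label, each early return would need its own copy of the cleanup code, and the copies tend to drift apart as the function grows.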

Next up: how do we test the above script? When you send a POST request to a server, a few environment variables are set pertaining to the Content-Length, the Request Method, and various other things. The most important thing to remember is that the data comes in on stdin. So to test it, we need to set the proper variables and then pipe data into our script. You can do so by running the following:

CONTENT_TYPE="application/x-www-form-urlencoded" REQUEST_METHOD=POST CONTENT_LENGTH=11 ./bin/chat.cgi <<< "u=test&m=hi"

This will send a message of "hi" from the user "test" into the chat history in the DATA_FILE. The CONTENT_LENGTH is extremely important for the inner workings of qdecoder, so if you're testing your scripts out make sure to set it right. Also, when testing this out, I found that only the right CONTENT_TYPE (application/x-www-form-urlencoded) would allow qdecoder to work from the command line for POSTs. You can add the following to your Makefile in order to automate some tests of your script:

test-chat:
    CONTENT_TYPE="application/x-www-form-urlencoded" REQUEST_METHOD=POST CONTENT_LENGTH=7 ${valgrind} ./bin/chat.cgi <<< "u=12345"
    CONTENT_TYPE="application/x-www-form-urlencoded" REQUEST_METHOD=POST CONTENT_LENGTH=23 ${valgrind} ./bin/chat.cgi <<< "u=123456789012345678901"
    CONTENT_TYPE="application/x-www-form-urlencoded" REQUEST_METHOD=POST CONTENT_LENGTH=11 ${valgrind} ./bin/chat.cgi <<< "u=test&m=hi"

Note: The <<< here-string is a bash feature. If you have problems running make test-chat, you might need to set SHELL := /bin/bash at the top of the Makefile, or change the symlink of /bin/sh to point to /bin/bash instead of /bin/dash.
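To see why CONTENT_LENGTH matters so much, here's a sketch of reading a POST body by hand: you read exactly as many bytes from stdin as the variable announces and no more. The helper read_post_body is hypothetical and only illustrates the idea; it is not how qdecoder is written.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly the number of bytes announced in
 * content_length from `in`. Returns a malloc'd, NUL-terminated string,
 * or NULL on a bad length, allocation failure, or a short read. */
char *read_post_body(FILE *in, const char *content_length) {
    char *end = NULL;
    long length;
    char *body;

    if (content_length == NULL) return NULL;
    length = strtol(content_length, &end, 10);
    if (end == content_length || *end != '\0' || length < 0) return NULL;

    body = malloc((size_t)length + 1);
    if (body == NULL) return NULL;

    if (fread(body, 1, (size_t)length, in) != (size_t)length) {
        free(body);
        return NULL;
    }
    body[length] = '\0';
    return body; /* caller frees */
}
```

This also explains a detail of the tests above: a bash here-string appends a newline to the data, but with CONTENT_LENGTH=11 the reader stops after u=test&m=hi and never sees it. Set the length too high and the read comes up short; set it too low and your message gets cut off.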

Reading

And last but not least, we have the script for reading the history file out to the world:

#include "config.h"
#include "chatfile.h"
#include "load_qdecoder.h"

int main(void){
#ifdef ENABLE_FASTCGI
    while(FCGI_Accept() >= 0) {
#endif
    qentry_t *req = qcgireq_parse(NULL, Q_CGI_GET);
    qcgires_setcontenttype(req, "text/plain");

    FILE * fp =  getChatFile();
    if(fp == NULL){
        printf("%s\n", "Could not retrieve chat history. Please try again later");
        goto end;
    }

    int cOrEOF;
    char c;
    while( (cOrEOF = fgetc(fp)) != EOF){
        c = (char)cOrEOF;
        printf("%c", c);
    }
    fclose(fp);

    end:
    req->free(req);
#ifdef ENABLE_FASTCGI
    }
#endif
    return 0;
}

This script is straightforward: we retrieve our chat history with the internal function getChatFile and then output it to the world as plain text. If we can't read the file we simply print out an error message. An observant reader will notice that we're not calling chatInit anywhere. We know that chatInit simply creates our history file, which we're going to check for anyway when we try to read it. So there's no point in checking twice, and we skip the call to initialize.

Since we're storing the chat in the tmp directory (if you're using the defaults from the last tutorial and are on a *nix system), the chat will be cleared at least whenever you shut off your computer, so people need to either poll or check the heartbeat of your server to make sure it's initialized.
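The read loop prints one character per printf call, which is fine for a small history file. If the file grows, copying in chunks is a common alternative. Here's a sketch with plain stdio; copy_stream is a made-up name, and I haven't tried it against the fcgi_stdio.h wrappers, so treat that combination as untested.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out` in chunks.
 * Returns 0 on success, -1 if a read or write error occurred. */
int copy_stream(FILE *in, FILE *out) {
    char buffer[4096];
    size_t got;

    while ((got = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, got, out) != got) {
            return -1;
        }
    }
    return ferror(in) ? -1 : 0;
}
```

In the read script this would replace the fgetc loop with a single call like copy_stream(fp, stdout).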

Since the read script is stateless, you don't need to worry about sending any environmental variables when trying to test it and can simply run it with ./bin/read.cgi after a make command.

And that's it for the CGI scripts! Now we just need to set up a server:

Apache Configuration

So now all that we have to do is set up a new virtual host in our Apache configuration to wrap our CGI scripts. First off, if you're working locally, add this line to your hosts file (/etc/hosts for *nix, %systemroot%\system32\drivers\etc\hosts for Windows):

/etc/hosts

127.0.0.1 www.chat.dev

And next, in your Apache configuration file:

/sites-available/default

<VirtualHost *:80>
        ServerAdmin webmaster@localhost
        ServerName www.chat.dev
        DocumentRoot /path/to/this/repository/tutorialchat/www
        <Directory />
                Options Indexes
                AllowOverride None
        </Directory>
        Alias /chat /path/to/this/repository/tutorialchat/bin
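        <Directory /path/to/this/repository/tutorialchat/bin>
                AddHandler cgi-script .cgi
                AllowOverride None
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                # Apache 2.2 syntax; on Apache 2.4 use "Require all granted" instead
                Order allow,deny
                Allow from all
        </Directory>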

        ErrorLog /path/to/this/repository/tutorialchat/error.log

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn

</VirtualHost>

Note that you'll need to change the paths inside the configuration to match your own server, but it's pretty easy to do. What we've done is say that when someone goes to /chat/<file>.cgi, Apache is allowed to execute the script there (ExecCGI). This means that anything in that directory ending in .cgi can be run from the outside world.

Before you start or restart your web server, we need to make the document root exist:

mkdir www
echo "<html><body><h1>I'm alive. Yay.</h1></body></html>" > www/index.html

Now start or restart Apache and navigate to http://www.chat.dev and you'll see the words "I'm alive. Yay." on the screen. If you navigate to http://www.chat.dev/chat/heartbeat.cgi you should be greeted with the familiar:

{ "heartbeat" : 1408623802, "initialized" : true }

which lets you know that your chat server is up and ready for an interface.

What's next

With that, you have a fully operational chat server. Kind of. It still needs a web interface, but that will come in the next tutorial. For now, if you want to be sure that everything is working (because a heartbeat wasn't enough), make your index file use this markup instead:

<html><body><h1>I'm alive. Yay.</h1>
<iframe src="/chat/read.cgi"></iframe>
</body></html>

Run the following commands to get something to appear in the chat box:

make test-internal
./bin/test-chat.out
refresh your browser page

We'll get into HTML, JavaScript, and a small amount of CSS for the front end in the next tutorial! I'll show you how to set up a Harp project and we'll test the full application. See you then!