Protocol

I wasn't about to give up on my pager just yet. How hard would it be to simulate a POCSAG signal? Turned out it's not that hard. (Not what she said.) Documentation on POCSAG is vague, but after reading through the wiki, a few sites and some PDFs I pieced everything together. Below you will find my interpretation.
The receiver (pager) waits for a training sequence of alternating bits (at the correct baudrate!), and as soon as it recognizes one, it keeps listening until it receives a sync code word. The training sequence needs to be long enough that the receiver, which stays switched off most of the time, still catches it during one of its brief wake-ups and can lock on to the baudrate. Specifications say a minimum of 576 bits, so we'd better respect that. Longer is allowed though! Longer is safer, but it takes slightly longer for the page to arrive. In a busy real-life scenario, it would also mean the channel stays occupied a little longer. I don't think that matters nowadays though, nobody uses pagers anymore :)

After the training sequence a 32-bit syncword is transmitted. The usual syncword is 0x7CD215D8. It turned out to be the same syncword callmax used or uses. After the syncword, a batch of 16 words of 32 bits each is transmitted. After this batch, the syncword is transmitted again, followed by a new batch of 16 words. The 16 words of a batch are subdivided into 8 frames of 2 words each.
Words can be of 3 types: idle words (usually 0x7A89C197), address words or message words. Any unused word slot is filled with the idle word.
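To make the framing a bit more concrete, here is a minimal Python sketch of how the bit stream could be assembled. The function names are made up for illustration. It assumes the training sequence starts with a 1 and that every word goes out most significant bit first; neither is stated above, so treat both as assumptions.

```python
PREAMBLE_BITS = 576        # minimum training sequence length
SYNC_WORD = 0x7CD215D8
IDLE_WORD = 0x7A89C197
WORDS_PER_BATCH = 16       # 8 frames of 2 words each

def preamble_bits(length=PREAMBLE_BITS):
    """Alternating 1,0,1,0,... (assumption: the sequence starts with a 1)."""
    return [(i + 1) % 2 for i in range(length)]

def word_to_bits(word):
    """32-bit word as a list of bits (assumption: MSB goes out first)."""
    return [(word >> shift) & 1 for shift in range(31, -1, -1)]

def empty_batch():
    """A batch in which every slot holds the idle word."""
    return [IDLE_WORD] * WORDS_PER_BATCH

def transmission_bits(batches):
    """Training sequence, then a syncword in front of every 16-word batch."""
    bits = preamble_bits()
    for batch in batches:
        bits += word_to_bits(SYNC_WORD)
        for word in batch:
            bits += word_to_bits(word)
    return bits
```

A single all-idle batch comes out at 576 + 32 + 16 * 32 = 1120 bits.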

A pager's unique ID under POCSAG is its RIC (Radio Identity Code). It's a 21-bit value, which means the system can address 2097152 different RICs (0 through 2097151). A pager's RIC can usually be found on the back of the device. The RIC is transmitted in the address word. The address word starts with '0' (message words start with '1'), followed by the 18 most significant bits of the RIC.

After the address bits, 2 function bits appear. These can be used to send pages to different mailboxes, or change the beeper pattern. The remaining 11 bits protect the word: 10 BCH check bits, calculated by running a polynomial over the 21 bits before them, and 1 even parity bit over the whole word.

[0][18 bits MSB of RIC][2 Function bits][10 bits BCH check][1 bit parity]
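Here is a rough Python sketch of how such an address word could be put together. It follows the layout as I understand it from the POCSAG standard: flag bit 0 for an address word, 10 BCH check bits and a single even parity bit. The generator polynomial 0x769 is not mentioned anywhere above; it is the value commonly quoted for POCSAG's BCH(31,21) code. The function names are my own.

```python
BCH_GENERATOR = 0x769  # x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1 (assumed, see above)

def bch_check_bits(data21):
    """Remainder of (data21 * x^10) divided by the generator, modulo 2."""
    reg = data21 << 10
    for bit in range(30, 9, -1):
        if reg & (1 << bit):
            # Line the generator's top bit up with the current bit and subtract.
            reg ^= BCH_GENERATOR << (bit - 10)
    return reg & 0x3FF

def encode_codeword(data21):
    """21 data bits -> 32-bit word: data, 10 check bits, 1 even parity bit."""
    word31 = (data21 << 10) | bch_check_bits(data21)
    parity = bin(word31).count("1") & 1  # makes the total number of ones even
    return (word31 << 1) | parity

def address_word(ric, function=0):
    """Flag bit 0 (implicit top bit), 18 MSB of the RIC, 2 function bits."""
    data21 = (((ric >> 3) & 0x3FFFF) << 2) | (function & 0x3)
    return encode_codeword(data21)
```

A handy sanity check: feeding the top 21 bits of the idle word back in, `encode_codeword(0x7A89C197 >> 11)`, should reproduce 0x7A89C197, since the idle word is itself a valid code word.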

The 3 least significant bits of the RIC determine the frame in which the address word appears. Say, for instance, the RIC is 10234. The binary value would be 0b10011111111010. The last 3 bits are 0b010, so the address word should appear in frame 2. In this example:

Batch number:              [                             0                                ][                1           ].
Frame number:                        [  0 ][  1 ][  2 ][  3 ][  4 ][  5   ][  6   ][  7   ]
Data:      [01010101010101][Syncword][0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][Syncword][0][1][2][3][4].....
Receiver enable: ____----------------___________-----------------------------------

The receiver will start listening somewhere during the training sequence, and keeps listening until the syncword is received. It then switches off its receiver, and starts listening again just before it expects its address frame. If it receives its address word and the word passes the CRC, the pager will leave the radio on until it receives an idle code word. In this case, for RIC 10234, the address word would be expected in frame 2, which starts at word 4 (counting from 0, as in the diagram). Word 5 and all further words will be filled with the message. After the message has been transmitted, the rest of the words will be the idle code word. A message should always be terminated with an idle word. If the last part of the message is sent in word 15, a syncword will appear before the idle word in frame 0 of the next batch.
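The placement rules above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `layout_page` is a hypothetical helper, it takes words that are already fully encoded as 32-bit integers, and it leaves the syncwords to whatever code sends the batches.

```python
IDLE_WORD = 0x7A89C197
WORDS_PER_BATCH = 16

def layout_page(ric, address_word, message_words):
    """Return a list of 16-word batches carrying one page."""
    frame = ric & 0x7          # 3 LSB of the RIC pick the frame
    slot = 2 * frame           # each frame holds 2 words
    words = [IDLE_WORD] * slot + [address_word] + list(message_words)
    words.append(IDLE_WORD)    # a message is terminated by an idle word
    while len(words) % WORDS_PER_BATCH:
        words.append(IDLE_WORD)  # fill the rest of the last batch with idle words
    return [words[i:i + WORDS_PER_BATCH]
            for i in range(0, len(words), WORDS_PER_BATCH)]
```

For RIC 10234 the address word ends up in slot 4. With 11 message words the message runs up to word 15, so the terminating idle word spills over into a second batch, right after the next syncword.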

This is what it looks like on a scope:

[Image: scope capture]

The big red blob on the left is the training sequence, followed by the sync word. The receiver then switches itself off, and switches back on just before the part where its address frame should appear. It doesn't see its RIC there, so it switches itself off again until right before it expects the next sync word.

If addressing a tone-only pager, just the address word and an idle word are enough. It can't do much with message data anyway.

Not that hard, right? We're just getting started though. Messages for pagers are encoded in the following manner (note the leading '1', which marks a message word; address words start with '0'):

[1][20 bits message][10 bits BCH check][1 bit parity]

More than 20 bits to send? Use the next word! Messages for numeric pagers use 4 bits per character, so 5 characters fit in one word. Using 4 bits, 16 different characters can be encoded. This means that, in addition to the normal digits (0x0 to 0x9), some extras like brackets and spaces can be sent to numeric pagers:

    0xA Reserved (possibly used for address extension)
    0xB Character U (urgency)
    0xC " ", Space (blank)
    0xD "-", Hyphen (or dash)
    0xE ")", Closing (right) bracket
    0xF "(", Opening (left) bracket
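As a sketch, packing a numeric message into 20-bit message fields could look like this in Python. Two things here are my own assumptions and not something stated above: each 4-bit value is sent least significant bit first (the same convention alphanumeric characters use), and an incomplete last word is padded with spaces (0xC). The names are made up for illustration.

```python
# Digits map to their own value; the extras come from the table above.
NUMERIC_CODES = {str(d): d for d in range(10)}
NUMERIC_CODES.update({"U": 0xB, " ": 0xC, "-": 0xD, ")": 0xE, "(": 0xF})
PAD = 0xC  # assumption: pad the last word with spaces

def reverse_bits(value, width):
    """Mirror the lowest `width` bits, so the LSB ends up being sent first."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def numeric_payloads(text):
    """Return the 20-bit message fields (5 characters each) for `text`."""
    nibbles = [NUMERIC_CODES[ch] for ch in text]
    while len(nibbles) % 5:
        nibbles.append(PAD)
    payloads = []
    for i in range(0, len(nibbles), 5):
        field = 0
        for nibble in nibbles[i:i + 5]:
            field = (field << 4) | reverse_bits(nibble, 4)
        payloads.append(field)
    return payloads
```

Each returned value still needs the leading flag bit, the check bits and the parity bit before it is a complete 32-bit word.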

Alphanumeric messages use 7-bit ASCII to encode characters. Each character is sent LSB first. As you can probably figure out, 7-bit characters don't align all that well in a 20-bit word. The solution is to fill every message word completely and carry the leftover bits over into the next word. Example: the first message word holds 2 complete characters (14 bits) plus the 6 LSB of the 3rd character. The second message word starts with the remaining bit of the 3rd character, followed by 2 complete characters. That leaves 5 bits in this second message word, which hold the 5 LSB of the 6th character (and so on). Seems like we're ready to send POCSAG messages!
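The bit shuffling is easier to see in code. Below is a minimal Python sketch that turns a string into 20-bit message fields as described above. The name `alpha_payloads` is made up, and padding the unused bits of the last word with zeros is my own assumption.

```python
def alpha_payloads(text):
    """Return the 20-bit message fields for an alphanumeric message."""
    bits = []
    for ch in text:
        code = ord(ch) & 0x7F                            # keep the 7 LSB
        bits.extend((code >> i) & 1 for i in range(7))   # LSB first
    while len(bits) % 20:
        bits.append(0)                                   # assumption: zero padding
    payloads = []
    for i in range(0, len(bits), 20):
        field = 0
        for bit in bits[i:i + 20]:
            field = (field << 1) | bit                   # first bit sent ends up on top
        payloads.append(field)
    return payloads
```

Three characters already need 21 bits, so "ABC" spills into a second word, exactly like the 3rd character in the example above.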