Gmail MySQL Database Design:

This is the new design as of 0.4.8pre1. It has been stable for a long time now. There are still some minor improvements to be made, but the basics are good.

See below for a list of possible changes I am saving up. Changing the database format is a real chore because it is pretty tough (but not impossible) to export all the data and import it again, so I would like to make all these changes in one big modification one day.

The basic design is that the mail data is stored not in one table but in two. One of them is a fixed length table, and the other is a dynamic length table. Fixed length tables are MUCH faster than dynamic ones, so it is worth splitting them up. Everything needed to display the contents of a vfolder (eg: date, subject, fromfield, matched) is in the fixed length table.

The name of the fixed length table is 'display' and the name of the dynamic length table is 'details'.
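A rough sketch of how the split gets used, in Python. It runs against an in-memory SQLite database purely as a stand-in for MySQL (an assumption made so the example runs anywhere; SQLite has no fixed/dynamic row format distinction). Only a subset of the columns is modelled, and the function names and sample row are hypothetical. The point is that listing a vfolder reads only 'display', and 'details' is touched only when a message is opened.

```python
import sqlite3

# SQLite stands in for MySQL here; only a subset of the columns is modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE display (id INTEGER PRIMARY KEY, date TEXT, subject TEXT,
                          fromfield TEXT, readstatus TEXT);
    CREATE TABLE details (id INTEGER PRIMARY KEY, headers TEXT, message TEXT);
""")

def store_message(conn, date, subject, fromfield, headers, message):
    """Insert one message into both tables under the same id."""
    cur = conn.execute(
        "INSERT INTO display (date, subject, fromfield, readstatus) "
        "VALUES (?, ?, ?, 'Unread')", (date, subject, fromfield))
    conn.execute("INSERT INTO details (id, headers, message) VALUES (?, ?, ?)",
                 (cur.lastrowid, headers, message))
    return cur.lastrowid

def list_folder(conn):
    """Folder listing: served entirely from the small 'display' table."""
    return conn.execute(
        "SELECT id, date, subject, fromfield FROM display ORDER BY date"
    ).fetchall()

def open_message(conn, msg_id):
    """Opening a message: the only time the large 'details' row is read."""
    return conn.execute(
        "SELECT headers, message FROM details WHERE id = ?", (msg_id,)
    ).fetchone()

msg_id = store_message(conn, "2001-03-04 10:00:00", "Hello", "alice@example.com",
                       "Subject: Hello\nFrom: alice@example.com\n\n", "Hi there.\n")
```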

The basic email message can be reconstructed by concatenating details.headers and details.message. All the other fields are redundant data that has been parsed out for use in vfolder queries.
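A minimal sketch of that reconstruction in Python. It assumes the stored headers field ends with the blank line that separates headers from body (I have not verified this against the actual data), and the sample values are made up. Python's standard email parser is used only to check that the result is a well-formed message.

```python
from email import message_from_string

def reconstruct_message(headers: str, message: str) -> str:
    """Rebuild the original mail from the two stored fields."""
    return headers + message

# Hypothetical stored values; the header block is assumed to end with the
# blank line that separates headers from body.
headers = "From: alice@example.com\nSubject: Lunch?\n\n"
body = "Are you free at noon?\n"

raw = reconstruct_message(headers, body)
parsed = message_from_string(raw)  # sanity check: parse the rebuilt mail
```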

In the design of the tables there is a tradeoff between parsing out special fields at download time and parsing them at runtime/display time. The advantage of parsing out special fields at download time is that we have those fields available to use in vfolder queries. The disadvantage is that we get lots of new fields, which complicates the database design.

eg: we could parse the subject at runtime (from details.headers) but then we could not use the subject in vfolder queries (not easily anyway). We do not parse attachments at download time, as you would never search on them.
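To make the download-time option concrete, here is a sketch in Python of pulling the 'display' columns out of a raw header block once, when the mail arrives. It uses Python's standard email module, not the parser gmail itself uses, and the function name and sample headers are hypothetical. A missing Date header falls back to the column default from the table definition.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def display_fields(raw_headers: str) -> dict:
    """Extract the searchable columns from a raw header block, once, at download time."""
    msg = message_from_string(raw_headers)
    date_header = msg["Date"]
    if date_header is None:
        date = "0000-00-00 00:00:00"  # column default when no Date header exists
    else:
        # Formats the time as written in the header; no timezone conversion is done.
        date = parsedate_to_datetime(date_header).strftime("%Y-%m-%d %H:%M:%S")
    return {
        "date": date,
        "subject": msg.get("Subject", ""),
        "fromfield": msg.get("From", ""),
    }

raw_headers = (
    "Date: Sun, 04 Mar 2001 10:00:00 +0000\n"
    "From: alice@example.com\n"
    "Subject: Lunch?\n"
    "\n"
)
row = display_fields(raw_headers)
```

After this, a vfolder query can match on subject or fromfield directly instead of scanning the headers text of every message.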


Display:
This is a fixed length table.
+------------+---------------------------------------+------+-----+---------------------+----------------+
| Field      | Type                                  | Null | Key | Default             | Extra          |
+------------+---------------------------------------+------+-----+---------------------+----------------+
| id         | mediumint(11) unsigned                |      | PRI | 0                   | auto_increment |
| date       | datetime                              |      | MUL | 0000-00-00 00:00:00 |                |
| subject    | char(255)                             |      | MUL |                     |                |
| fromfield  | char(255)                             |      | MUL |                     |                |
| readstatus | enum('Read','Unread','Queued','Sent') |      | MUL | Read                |                |
| direction  | enum('Incoming','Outgoing')           |      | MUL | Incoming            |                |
| uid        | char(255)                             |      | MUL |                     |                |
| matched    | set(' ')                              |      | MUL | False               |                |
+------------+---------------------------------------+------+-----+---------------------+----------------+

The matched field gets built around the names of the user's vfolders. When you add/remove vfolders you should rebuild the matched index (see advanced menu). That is, gmail will modify the 'set' to contain the names of all your vfolders. The set is stored in a binary way, so we can use the 'matched' field for fast caching of vfolders. Read the online help for more info, then look at the code.
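The "binary way" can be pictured with a small Python sketch. MySQL stores a SET value as an integer with one bit per member, in definition order, and allows at most 64 members. The helper names and vfolder names below are hypothetical, and this models only the storage idea, not gmail's actual caching code.

```python
def build_masks(vfolder_names):
    """Assign one bit per vfolder name, in definition order (first name = bit 0)."""
    if len(vfolder_names) > 64:
        raise ValueError("a MySQL SET column holds at most 64 members")
    return {name: 1 << i for i, name in enumerate(vfolder_names)}

def encode(matched_names, masks):
    """Pack the vfolders a message matches into a single integer."""
    value = 0
    for name in matched_names:
        value |= masks[name]
    return value

def is_matched(value, name, masks):
    """Cached membership test: one bitwise AND instead of re-running the vfolder query."""
    return (value & masks[name]) != 0

# Hypothetical vfolder names.
masks = build_masks(["Inbox", "Work", "Lists"])
value = encode(["Inbox", "Lists"], masks)
```

Because each name is tied to a bit position, adding or removing a vfolder changes what the bits mean, which is presumably why the matched index has to be rebuilt afterwards.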


Details:
This is a dynamic length table.
+-------------+--------------+------+-----+---------------------+----------------+
| Field       | Type         | Null | Key | Default             | Extra          |
+-------------+--------------+------+-----+---------------------+----------------+
| id          | int(11)      |      | PRI | 0                   | auto_increment |
| headers     | text         |      |     | NULL                |                |
| message     | longtext     |      |     | NULL                |                |
| tofield     | text         |      |     | NULL                |                |
| ccfield     | text         | YES  |     | NULL                |                |
| bccfield    | text         | YES  |     | NULL                |                |
| attachments | text         | YES  |     | NULL                |                |
+-------------+--------------+------+-----+---------------------+----------------+
The id field has a unique index.

The attachments field was a stupid idea and it has never been used. The whole body of the message, including MIME parts, just goes into details.message. That is why it is longtext: it can get very large. Testing showed that we were maxing out the 'text' type at about 50k, but 'longtext' seems to handle quite large attachments, at least 1 MB.
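For reference, the documented maximum lengths of the MySQL text types can be put into a small Python helper that picks the smallest type a given message fits in. The function name is hypothetical, and the limits are in bytes, so a multi-byte character set lowers the number of characters that fit.

```python
# Documented maximum lengths, in bytes, of the MySQL text column types.
TEXT_LIMITS = [
    ("tinytext", 2**8 - 1),     # 255
    ("text", 2**16 - 1),        # 65,535
    ("mediumtext", 2**24 - 1),  # 16,777,215
    ("longtext", 2**32 - 1),    # 4,294,967,295
]

def smallest_text_type(message: bytes) -> str:
    """Return the smallest text type whose limit can hold the message."""
    size = len(message)
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError("message is larger than longtext allows")
```

By these limits a 1 MB message would already fit in 'mediumtext'. 'longtext' simply leaves far more headroom for big attachments.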

NOTES/FUTURE MODIFICATIONS:
We don't use the attachments field, as attachments are parsed at run time. Drop it.

We don't need to split the headers and message fields now that we are using the GMime library. Not splitting would allow us to keep a perfect copy of the message in the database. The current code we use to split the message into headers/message is a bit dodgy too. The only advantage in keeping them separate is that one can do vfolder searches on 'headers', which would become slower if they were done on everything. We should test this. The user can always do fast searches by searching on the to, cc, and from fields. Maybe the solution to this problem is to put the full RFC 822 message (with headers) into the message field and still split off the headers into the headers field. Then the code could always work from the message field and the user could still search on headers. We would have to write a function to convert the old style to the new (use the mbox output?). This would work well too because GMime gives us a nice function to rip out the headers...
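A sketch of what that conversion function might look like, in Python rather than with GMime (the function names are hypothetical, and no GMime API is shown here). It assumes the old headers field ends with the blank separator line, so that headers plus message gives back the full mail. The header block is taken to be everything up to and including the first blank line.

```python
import re

# First blank line in the mail, with either LF or CRLF line endings.
_BLANK_LINE = re.compile(r"\r?\n\r?\n")

def split_headers(full_message: str):
    """Return (headers, body); headers keep the trailing blank separator line."""
    match = _BLANK_LINE.search(full_message)
    if match is None:
        return full_message, ""  # no blank line: treat everything as headers
    return full_message[:match.end()], full_message[match.end():]

def migrate_row(old_headers: str, old_message: str):
    """Old layout -> proposed layout: 'message' holds the complete mail and
    'headers' keeps a searchable copy of the header block."""
    full = old_headers + old_message
    new_headers, _ = split_headers(full)
    return new_headers, full

# Hypothetical old-style row.
old_headers = "From: a@example.com\nSubject: Hi\n\n"
old_message = "Body text\n"
new_headers, new_message = migrate_row(old_headers, old_message)
```

With this layout the code can always work from the message field alone, and the headers copy exists only to keep header searches fast.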

Change the status fields (like direction) to be BOOLEAN types. eg: rename direction to 'incoming', which will be either true or false. The expectation is that this uses less space than an enum and runs faster, but that should be measured: MySQL stores an enum with up to 255 values in one byte, the same size as a BOOLEAN (an alias for TINYINT(1)).
Other status fields to add:
	- Split read status into 'read' (boolean), and a sentstatus enum ('Received', 'Queued', 'Sent').
	- Add another date field for 'datedownloaded' ??
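One possible mapping from the current enum values to the proposed fields, sketched in Python. The mapping is my assumption, not something settled above: 'Queued' and 'Sent' messages are treated as already read, and 'Read'/'Unread' messages get a sentstatus of 'Received'. The function name is hypothetical.

```python
READ_STATUSES = ("Read", "Unread", "Queued", "Sent")
DIRECTIONS = ("Incoming", "Outgoing")

def convert_status(readstatus: str, direction: str) -> dict:
    """Map the current enum columns onto the proposed boolean/enum columns."""
    if readstatus not in READ_STATUSES:
        raise ValueError("unknown readstatus: %r" % readstatus)
    if direction not in DIRECTIONS:
        raise ValueError("unknown direction: %r" % direction)
    return {
        "incoming": direction == "Incoming",
        # Assumption: anything that is not explicitly 'Unread' counts as read.
        "read": readstatus != "Unread",
        # Assumption: 'Read' and 'Unread' both become 'Received'.
        "sentstatus": readstatus if readstatus in ("Queued", "Sent") else "Received",
    }
```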

We need a 'deleted' field, so we can implement a trashcan.
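A rough sketch of how a 'deleted' flag could drive a trashcan, in Python with an in-memory SQLite database standing in for MySQL. The column type, function names, and sample rows are all assumptions for illustration. Only 'display' is modelled, and a real version would also remove the matching 'details' rows when the trash is emptied.

```python
import sqlite3

# SQLite stands in for MySQL; only the columns needed for the idea are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE display (id INTEGER PRIMARY KEY, subject TEXT, "
             "deleted INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO display (subject) VALUES (?)",
                 [("Keep me",), ("Trash me",)])

def move_to_trash(conn, msg_id):
    """Deleting a message only flags it, so it can still be restored."""
    conn.execute("UPDATE display SET deleted = 1 WHERE id = ?", (msg_id,))

def restore(conn, msg_id):
    """Take a message back out of the trashcan."""
    conn.execute("UPDATE display SET deleted = 0 WHERE id = ?", (msg_id,))

def visible_subjects(conn):
    """Normal folder views skip anything flagged as deleted."""
    rows = conn.execute("SELECT subject FROM display WHERE deleted = 0 ORDER BY id")
    return [subject for (subject,) in rows]

def empty_trash(conn):
    """Emptying the trashcan is the only step that really removes rows."""
    conn.execute("DELETE FROM display WHERE deleted = 1")

move_to_trash(conn, 2)
```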
