Email database design (schema)

We are currently developing a fairly large application that needs to handle a huge number of records.

The idea is that emails are saved (with attachments) and users can search their saved emails through the web API. Users should be able to search (within the emails they have exported to the database / repository) on at least the following fields:

  • from
  • to
  • subject
  • date (range)
  • attachments (names and types only)
  • message content
  • (optional) mailbox / folder structure

The app has to work with a large number of users and with an extreme number of emails (easily growing from millions to billions). Users should be able to download all original messages (with attachments) so they can import them into their email client.
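The download requirement could be met by writing the stored raw messages into an mbox file, a format several mail clients can import. This is only a minimal sketch using Python's standard `mailbox` module; the function name `export_to_mbox` and the idea of passing the raw messages in as a list of bytes are assumptions made for illustration.

```python
import mailbox


def export_to_mbox(raw_messages, path):
    """Write raw RFC 5322 messages (bytes) into one mbox file.

    `raw_messages` is assumed to come from wherever the original
    emails are stored; here it is just an iterable of bytes.
    Returns the number of messages written.
    """
    box = mailbox.mbox(path)  # creates the file if it does not exist
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)      # mbox.add accepts bytes, str or Message
            count += 1
        box.flush()           # make sure everything is written to disk
    finally:
        box.close()
    return count
```

For very large mailboxes the export would probably need to be streamed or split into several files rather than built in one pass.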

I was thinking of indexing the emails in a database and saving the complete email, with attachments, as a package under a unique key in a separate store. That way I should be able to keep the database load as low as possible and hence keep searching as fast as possible.
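A minimal sketch of that split, assuming the unique key is a SHA-256 hash of the raw message and using a plain dict and list as stand-ins for the separate store and the database index. The function name `index_email` and the record layout are hypothetical; only the standard library is used.

```python
import hashlib
from email import message_from_bytes, policy
from email.utils import parsedate_to_datetime


def index_email(raw, blob_store, index):
    """Save the raw message under a unique key and index its metadata.

    blob_store: dict standing in for the separate store (assumption).
    index: list standing in for the searchable database (assumption).
    """
    key = hashlib.sha256(raw).hexdigest()  # content-derived unique key
    blob_store[key] = raw                  # original stays untouched for download

    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    date_header = msg["Date"]

    index.append({
        "key": key,
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": parsedate_to_datetime(str(date_header)) if date_header else None,
        # names and types only, as in the requirements
        "attachments": [
            {"names": part.get_filename(), "types": part.get_content_type()}
            for part in msg.iter_attachments()
        ],
        "content": body.get_content() if body is not None else "",
    })
    return key
```

Only the small metadata record goes into the index; the full message is fetched from the store by its key when a user downloads it.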

I found several database schemas for handling email, but I could not find a database design capable of handling hundreds of millions and perhaps even billions of records (emails).

Is this the best way to keep it simple, efficient and fast, or am I forgetting something?

Edit: The idea is to run this on the Amazon cloud (any suggestions related to this?)

+3




2 answers


You can use a MongoDB database for this amount of data. See MongoDB in detail here: http://www.mongodb.org/

In MongoDB, the equivalent of a MySQL table is called a collection, and a row is called a document.

MongoDB stores data in a JSON-like document format (BSON).



Here is one possible way to design the schema:

from : string
to : string
subject: string
date: datetime (searched by range)
attachments (names & types only): array of objects
message contents : string
(optional) mailbox / folder structure: string

For example:
{
  from: "from@gmail.com",
  to: "to@gmail.com",
  subject: "test subject",
  date: "current date",
  attachments: [
    { names: "attachments1", types: "text" },
    { names: "attachments2", types: "pdf" }
  ]
}
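A search over documents shaped like this could be expressed as a MongoDB filter document. The sketch below only builds that filter as a plain Python dict, so it runs without a database; the result could be passed to a driver's `find` call. The `user_id` field is an assumption (the schema above has no owner field, but the question restricts users to their own emails), and the function name `build_search_filter` is hypothetical. `$gte`, `$lte` and `$elemMatch` are standard MongoDB query operators.

```python
from datetime import datetime


def build_search_filter(user_id, sender=None, date_from=None,
                        date_to=None, attachment_type=None):
    """Build a MongoDB filter dict for one user's emails."""
    query = {"user_id": user_id}  # assumed owner field, not in the schema above

    if sender is not None:
        query["from"] = sender

    # date range: include only the bounds that were given
    date_range = {}
    if date_from is not None:
        date_range["$gte"] = date_from
    if date_to is not None:
        date_range["$lte"] = date_to
    if date_range:
        query["date"] = date_range

    # match emails that have at least one attachment of the given type
    if attachment_type is not None:
        query["attachments"] = {"$elemMatch": {"types": attachment_type}}

    return query


example = build_search_filter(
    "user-1",
    sender="from@gmail.com",
    date_from=datetime(2024, 1, 1),
    attachment_type="pdf",
)
```

At the volumes mentioned in the question, filters like this would only stay fast with indexes on the fields being searched, for example a compound index starting with the owner field.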

      

+3




You don't want to store this information in a DBMS. Rather, you want to build on something like Lucene. For email, Solr has an email index. Hope it helps.
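To illustrate why a search library fits message-content search better than a table scan, here is a toy inverted index, the core data structure behind engines such as Lucene. This is not Lucene's or Solr's API, only a small sketch of the idea; the class name `TinyInvertedIndex` is hypothetical.

```python
import re
from collections import defaultdict


class TinyInvertedIndex:
    """Maps each word to the set of email ids whose text contains it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, email_id, text):
        # tokenize on word characters and lower-case for case-insensitive search
        for token in re.findall(r"\w+", text.lower()):
            self._postings[token].add(email_id)

    def search(self, query):
        """Return ids of emails containing every word of the query."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


idx = TinyInvertedIndex()
idx.add("mail-1", "Quarterly report attached")
idx.add("mail-2", "Lunch on Friday?")
idx.add("mail-3", "Report for Friday meeting")
```

A lookup touches only the posting sets for the query words, so its cost depends on how many emails contain those words rather than on the total number of stored emails.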



0

