Database design theory for multiple application instances

I am working on a SaaS project in which each client will have an instance of the application (customer1.application.com, customer2.application.com, etc.) and, ideally, each client will have their "own" space in the DB. The current plan is to create a database for each client and deploy an instance of the application to a web farm. The idea is that every customer can opt out of an update in order to maintain the status quo (something one of our investors REALLY wanted, partly because he hates how Facebook keeps changing the way it works).

Last night I tried rolling out an update that changed the database schema to two of my test accounts. While the resulting errors were caused by my own mistake (forgetting about a small but apparently very important DDL change), I am starting to worry about my general approach, because one missing ALTER COLUMN statement can turn the whole update cycle into hell. So, after dwelling on this for a while, my questions are:

1) Is there a way to diff two databases (a "test" production database and an actual production database) that will accurately capture every change made between them?

2) Is there another database (and / or application) model I should consider? I know that if I drop support for multiple versions of the application, I will remove a lot of long-term support headaches.

+2




5 answers


Food for thought:

Code updates are more frequent than DB schema updates. Make sure you have a really good SCM in place to handle the code updates. We use git with great success.

Code is easy to manage; databases are not (in comparison), because they are mutable and their data changes all the time. They are also really hard to roll back (possible, but time-consuming and with downtime). So you need an easy way to keep track of schema updates (along with the corresponding data changes) and to apply them later to other, similar databases.


Each version of the database schema must be assigned a unique, sequential integer version number. Start at 100, say.
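A minimal sketch of how that version number might be recorded, assuming SQL Server (which later answers also assume) and a hypothetical dbo.SchemaVersion table; none of these names come from the answer itself:

    -- Hypothetical bookkeeping table: one row per schema version ever applied.
    -- The highest version tells you exactly where this database stands.
    CREATE TABLE dbo.SchemaVersion
    (
        version     int           NOT NULL PRIMARY KEY,
        applied_at  datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
        script_name nvarchar(260) NOT NULL
    );

    -- Seed the starting point (100, as suggested above).
    INSERT INTO dbo.SchemaVersion (version, script_name)
    VALUES (100, N'baseline');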

Every time you need to update the schema, write SQL scripts like:



  • 100-101.sql

  • 101-102.sql

  • 102-103.sql

Each script is responsible for upgrading the schema to that specific version. It can be as simple as adding a table, or as complex as rebuilding foreign keys. Either way, each script reliably does exactly what it was designed to do.

You can apply any given script many times during testing (on fresh copies of the data) to ensure it works as expected.


So when you need to upgrade a client from version 130 to 180, you can safely apply SQL scripts (IN ORDER) and you will arrive at your desired destination.
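As an illustration, one of those scripts (say, 101-102.sql) might look roughly like this, assuming the hypothetical dbo.SchemaVersion table sketched above; the ALTER TABLE is only an example:

    -- 101-102.sql: upgrade the schema from version 101 to 102.
    -- The guard makes the script safe to run against any copy: it refuses to
    -- do anything unless the database is at exactly the expected version.
    BEGIN TRANSACTION;

    IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE version = 101)
       OR EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE version >= 102)
    BEGIN
        RAISERROR('Database is not at version 101; aborting 101-102.sql.', 16, 1);
        ROLLBACK TRANSACTION;
        RETURN;
    END;

    -- The actual change for this step (example only).
    ALTER TABLE dbo.Customer ADD PreferredLanguage nvarchar(10) NULL;

    -- Record the new version so later scripts (and people) know where we are.
    INSERT INTO dbo.SchemaVersion (version, script_name)
    VALUES (102, N'101-102.sql');

    COMMIT TRANSACTION;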

+2




  • You should never change the database manually. Do it with a script that applies all the DDL changes, etc.

    Ideally, there should be a generic DB release script that takes the DDL version as config / input (a sketch of such a driver appears at the end of this answer).

    (DDL changes should also be tagged with a specific label in source control.)

  • You can go the Microsoft route regarding multi-version support being a headache: just mark all versions prior to X (say, 2 versions back) as unsupported. That way you support the latest 2-3 versions without wasting resources on anything older, while still giving clients a great deal of flexibility.

  • You should carefully weigh the pros and cons of the versioned application / DB scheme you are proposing.

    List the pros (e.g. placating the investor; sparing clients the negative experience of unexpected version changes that you mentioned, which translates into a marginally better chance of keeping / adding clients who require such a feature; plus an easy way to do BETA / UAT testing; plus a fail-safe way to roll back botched schema changes by loading client data into the previous version's database schema).

    List the cons (cost of DB space, cost of your implementation time, cost of support).



Compare the two and decide what's best for your business.
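To make the first bullet concrete, here is a sketch of what a generic release driver could do, again assuming the hypothetical dbo.SchemaVersion table from the first answer: given a target version (the config/input), it lists the upgrade scripts that still need to be applied, in order; a wrapper such as sqlcmd, PowerShell, or your deployment tool would then execute them one by one:

    -- Generic release driver (sketch): compute which upgrade scripts a given
    -- client database still needs, based on its recorded schema version.
    DECLARE @current int = (SELECT ISNULL(MAX(version), 100) FROM dbo.SchemaVersion);
    DECLARE @target  int = 105;  -- the version this release ships (config/input)

    ;WITH steps AS
    (
        SELECT @current AS v
        UNION ALL
        SELECT v + 1 FROM steps WHERE v + 1 < @target
    )
    SELECT CONCAT(v, '-', v + 1, '.sql') AS script_to_apply
    FROM steps
    WHERE v < @target
    ORDER BY v
    OPTION (MAXRECURSION 1000);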

+2




Redgate SQL Compare does a pretty good job of comparing and diffing two SQL Server databases (warning: commercial third-party product). I believe there are also free tools out there that do the same thing.

If you want to keep some clients on older versions of your product, it may make more sense to maintain a one-database-per-client model, with the scripts that create each version of the database kept in source control. This isolates your clients from each other and even lets you switch database vendors (e.g. from SQL Server to Oracle) or versions (e.g. from SQL Server 2000 to SQL Server 2005) for some clients while keeping others on older versions.
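One small convention worth adding here (my suggestion, not part of this answer): have each versioned create script record, as its last step, the schema version it produces, so the N-to-N+1 upgrade scripts from the first answer know where each client database starts:

    -- Tail of a hypothetical per-version create script, e.g. create-130.sql.
    -- Recording the baseline lets later upgrade scripts pick up from here.
    INSERT INTO dbo.SchemaVersion (version, script_name)
    VALUES (130, N'create-130.sql');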

+2




Manually running scripts will not work. Neither will diff tools. Diff works for 2, 4, maybe 10 databases, but it doesn't scale, because you need reliability in the presence of failures (offline databases, server restarts, all of that).

You deploy by scheduling upgrade scripts. For example, look at how MySpace does it for over 1000 databases: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. The key point is that they use a guaranteed, reliable delivery mechanism (SSB) to deploy schema maintenance scripts. You need an asynchronous, reliable mechanism for running the scripts because the destination databases may be offline, undergoing scheduled maintenance, unreachable, etc., and a reliable delivery mechanism like Service Broker handles all the retries and related problems (duplicates, acknowledgements, etc.). You can also look at Asynchronous procedure execution for an example of how to handle script execution over SSB.
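For a rough idea of what reliable delivery of a maintenance script looks like in Service Broker terms, here is a minimal sketch with made-up object names; it is not the MySpace implementation, just the shape of the sending side:

    -- One-time plumbing (names are invented for this sketch).
    CREATE QUEUE dbo.SchemaUpgradeQueue;
    CREATE MESSAGE TYPE [//app/SchemaUpgradeScript] VALIDATION = NONE;
    CREATE CONTRACT [//app/SchemaUpgradeContract]
        ([//app/SchemaUpgradeScript] SENT BY INITIATOR);
    CREATE SERVICE [//app/SchemaUpgradeService]
        ON QUEUE dbo.SchemaUpgradeQueue ([//app/SchemaUpgradeContract]);

    -- Sending side: queue the body of 101-102.sql for a client database.
    -- Service Broker guarantees delivery even if that database is offline,
    -- restarting, or in maintenance right now; an activation procedure on the
    -- receiving queue would execute the script and acknowledge it.
    DECLARE @h uniqueidentifier;
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [//app/SchemaUpgradeService]
        TO SERVICE '//app/SchemaUpgradeService'
        ON CONTRACT [//app/SchemaUpgradeContract]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @h
        MESSAGE TYPE [//app/SchemaUpgradeScript]
        (CAST(N'/* contents of 101-102.sql */' AS varbinary(max)));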

As for the scripts themselves, I recommend you start treating your database schema and configuration data as a versioned resource. I have addressed this issue several times already; for example, see Are you putting your static database data in source control? How?
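As a small illustration of treating static data as a versioned, re-runnable script (the table and values here are hypothetical):

    -- Idempotent script for a lookup table: safe to re-run on any copy, and
    -- the file itself lives in source control next to the schema scripts.
    MERGE dbo.OrderStatus AS target
    USING (VALUES (1, N'Open'), (2, N'Shipped'), (3, N'Closed')) AS source (id, name)
        ON target.id = source.id
    WHEN MATCHED AND target.name <> source.name THEN
        UPDATE SET name = source.name
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (id, name) VALUES (source.id, source.name)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;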

Update

I think I owe some explanation of why I consider the diff-based approach wrong. Just to keep things clear, I'm talking about deployments of hundreds of servers and thousands of databases. The original post compares itself to Facebook, and I wish them to reach that size, but it also asks about design principles, so I'd say a discussion of cloud-level scale is appropriate.

I see two problems with diff tools:

  • Availability

    All diff tools work by connecting to both the "master" and the "copy", so they can only do their job when both are online. This creates a hotspot, a single point of failure: the "master", whose availability becomes critical for deploying updates. High availability always comes at a cost. It also leaves "copy" availability as a minor implementation detail; the update scheme has to handle retries, timeouts, and client disconnects on its own (not a trivial problem by any means).

  • Atomicity

    Diff tools expect a stable "master" schema, which effectively freezes the "master" for the duration of the upgrade. While this can be managed on a small scale, it becomes a problem at large scale, since upgrading the master itself to v. N+1 turns into a race against thousands of databases, some of which may still be being updated from v. N-1.

Schemes that ship the upgrade script to the "copy" solve both of these problems. Likewise, comparison tools that work from an offline schema model, such as vsdbcmd.exe with a VSDB .dbschema file, are better than "live" diff tools, because the "master" .dbschema file can be delivered to the "copy" machine, turning the entire update into a local operation.

Overall, I also believe that a script-based upgrade using metadata versioning is superior to a diff-based upgrade, for the testing and source-control reasons I mentioned in the link to Q1525591.

+2




if I drop support for multiple versions of the application, I will remove a lot of long-term support headaches

Any change, no matter how small, has a chance to break something important for someone.

So if you have multiple clients, deploying a patch for client 1 may break things for client 2. It doesn't even have to be a bug; it might just be a change in behavior that they disagree with. For most clients, an uncontrolled release schedule is simply unacceptable.

So I would suggest that you keep a separate codebase for each client and deploy fixes only after consulting with that client.

There is a number of clients beyond which this approach breaks down (think Yahoo Mail), but reading your question, I think you are safely below that number. And as for a comparison tool, I can't disagree with the posts suggesting Redgate SQL Compare.

0








