Name of Value Parameters and Fact Tables

I am working on an asterisk schema for parsing posted form data. The site that will host the form data is actually external to the site that hosts the form, so only the data in the form will be available. I'll give the option to include additional useful information with hidden fields, original referee, session id, etc.

I will be able to use regex to match certain datatypes and pull them out to certain sizes for example. Postal codes.

I have a solution to deal with the arbitrary nature of the dimensions, but it's not very good, but it will work.

The problem I have is that I have no idea what will be in my fact table, but that is not how there is a nice numeric value that I can fill in. In addition to being "yes, there is a form post" that meets these criteria.

I am wondering if I approached this the right way? Am I using the wrong tool for the job? Or am I just missing something?

Simon.

Additional Information:

There are two areas of functionality that filter form messages based on criteria, for example. between two timestamps. But in most cases, it all depends on what happens to the filtering. The selected form messages will then be used to generate a csv file for export.

Another main area is analytics, and studying the conversion of ad spend into leads is an obvious starting point. Also somewhat open and dependent on form data.

+1


source to share


2 answers


You are not designing a star schema. You are developing an Entity-Attribute-Value table that has all the issues that you identify.

If you really have no idea what your data will look like, that is, what form fields exist and what data types should be used for each of them, a relational database is not the right tool for storing information. Try using XML or YAML or JSON. They are structured, but dynamic, formats. You can set metadata on the fly. You can store the entire form instance in a file or in a BLOB in your database.



Another new technology that can manage dynamic metadata, RDF , with the SPARQL query language . Sesame is an example of a Semantic Data Engine.

+2


source


It is good to have fact tables without any dimensions - they are simply called "actual fact tables". But you usually put the row_count column anyway, although it will always have the value one - to easily add pivot tables. And you may end up adding other dimensions later - like the sense dimension of a term.

And I wouldn't worry too much about it not like Warehouse 101 example - there are a lot of corner cases where strange things happen. You probably have field_name and field_value as columns, or even just field_value if you don't have field_name. It works. And this provides a subtlety of flexibility.

But you are missing some important features. Since a given item or object is indeed split across multiple lines, typical sql filtering will not work. Usually you need to pull all the rows into a small application that can evaluate them altogether, or write a very complex multi-step sql where you insert the boolean results of each row evaluation into a temporary table, then group the session_id (or whatever the equiv), then finally appreciate and / or logic.



Another option is to go this route, but gradually develop your ETL analysis functionality so that over time you can pull some of this stuff into more traditional dimensions. This might become your staging or raw table, but you are trying to get most of the reports to fall into your more traditional star schema.

The last option is to consider a non-relational database. Something more document oriented may provide you with better functionality.

0


source







All Articles