How are OLAP, OLTP, data warehousing, analytics, analysis and data mining related?
I am trying to understand what OLAP, OLTP, data mining, analytics, etc. are all about, and I feel like my understanding of some of these concepts is still a little vague. Information about these subjects is usually explained in a very complex way on the Internet.
I feel like a question like this will most likely be closed as it is very broad, so I will try to narrow it down to two questions:
After researching, I understand the following about these concepts, is it correct?
- Analysis decomposes something complex in order to better understand the inner workings.
- Analytics is the predictive analysis of information that requires a lot of mathematics and statistics.
- There are many kinds of databases, but they broadly fall into OLTP (transactional) or OLAP (analytical) systems.
- OLTP databases are modeled with ER diagrams and kept in normalized form, which makes them easier to update.
- In contrast, OLAP uses a denormalized star schema, which makes it easier to query.
- OLAP is used for mining, while OLTP is used for day-to-day operational work, since it avoids redundancy.
- Data warehouses are a type of OLAP database and are usually composed of several other databases.
- Data mining is a tool used in analytics, where you use computer software to figure out relationships between data so that you can predict things (like customer behavior).
I am especially confused by the difference between analytics and analysis. They say that analytics is multidimensional analysis, but what does that mean?
I will try to explain to you from the top of the pyramid:
Business intelligence (which you didn't mention) is the IT umbrella term for the whole complex system that turns a company's data into useful insights.
So BI systems have one goal: clean, accurate and meaningful information. Clean means no technical problems (missing keys, incomplete data, etc.). Accurate means exactly that; BI systems are also used for error checking against the production database (logical errors, e.g. an account balance that is too high, or an inactive partner being used), which is done with rules. Meaningful is harder to explain, but in plain English it means that all your data (even the Excel tables from the last meeting) is presented the way you want it.
So a BI system has a back-end: a data warehouse. A DWH is nothing more than a database (an instance, not a piece of software). It can be stored in an RDBMS, an analytic DB (column stores or document stores), or a NoSQL database.
Data warehouse is the term commonly used for the whole database I mentioned above. It may consist of a number of data marts (the Kimball model, which is more common), or of a relational system in third normal form (the Inmon model), called an enterprise data warehouse.
A data mart is a set of linked tables inside the DWH (star schema or snowflake schema): a fact table (a business process in denormalized form) and dimension tables.
Each data mart represents one business process. Example: a DWH has 3 data marts. One is retail sales, the second is exports, and the third is imports. In retail you can see total sales, selling price, import price, and profit (the measures) by SKU, date, store, city, etc. (the dimensions).
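As a minimal sketch of that retail-sales data mart (table and column names are illustrative, not from the answer), a star schema with one fact table and two dimension tables can be built in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: the descriptive attributes you slice by.
cur.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE dim_sku (sku_id INTEGER PRIMARY KEY, name TEXT)")
# Fact table: one row per business event, holding measures plus dimension keys.
cur.execute("""CREATE TABLE fact_sales (
    store_id INTEGER, sku_id INTEGER, sale_date TEXT,
    quantity INTEGER, revenue REAL)""")

cur.executemany("INSERT INTO dim_store VALUES (?, ?)",
                [(1, "Zagreb"), (2, "Split")])
cur.executemany("INSERT INTO dim_sku VALUES (?, ?)",
                [(10, "Widget"), (11, "Gadget")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                [(1, 10, "2023-01-01", 3, 30.0),
                 (1, 11, "2023-01-01", 1, 25.0),
                 (2, 10, "2023-01-02", 2, 20.0)])

# A typical OLAP-style query: total revenue (a measure) by city (a dimension).
rows = cur.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales f JOIN dim_store s ON f.store_id = s.store_id
    GROUP BY s.city ORDER BY s.city""").fetchall()
print(rows)  # [('Split', 20.0), ('Zagreb', 55.0)]
```

The point of the denormalized shape is that any measure can be grouped by any dimension with one join per dimension table.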
Loading data into DWH is called ETL (Extract, Transform, Load).
Extract: get data from multiple sources (ERP DB, CRM DB, Excel, web services, ...)
Transform: clean the raw data, combine data from the different sources, match keys, cleanse values
Load: load the transformed data into the specific data marts
edit due to comment: the ETL process is usually built with an ETL tool, or manually with some programming language (Python, C#, etc.) and APIs.
An ETL process is a group of SQL, procedures, scripts and rules linked and divided into 3 parts (see above) controlled by metadata. It's either scheduled (every night, every few hours) or live (change data capture, triggers, transactions).
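The three steps above can be sketched in plain Python. This is only an illustration under made-up assumptions: the source records, field names, and the list standing in for the target data mart are all invented here, not from the answer.

```python
# Extract: pull raw records from multiple sources (hard-coded stand-ins
# for an ERP export and a CRM export).
erp_rows = [{"cust": "A01", "amount": "100.5"}, {"cust": "B02", "amount": "x"}]
crm_rows = [{"id": "A01", "name": "Acme"}, {"id": "B02", "name": "Bolt"}]

# Transform: clean the raw data, match keys across the two sources,
# and reshape the rows into the target structure.
names = {r["id"]: r["name"] for r in crm_rows}
clean = []
for r in erp_rows:
    try:
        amount = float(r["amount"])   # drop rows with unparseable values
    except ValueError:
        continue
    clean.append({"customer": names[r["cust"]], "amount": amount})

# Load: write the transformed rows into the target data mart (a list here,
# standing in for an INSERT into a fact table).
data_mart = []
data_mart.extend(clean)
print(data_mart)  # [{'customer': 'Acme', 'amount': 100.5}]
```

A real ETL tool adds scheduling, logging, and metadata on top, but the extract / transform / load split is the same.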
OLTP and OLAP are types of data processing. OLTP is used for transactional purposes, between a database and application software (usually a single path in and out for the data). OLAP is designed for analytical purposes, which means multiple sources, historical data, and high query performance on the output data.
edit due to comment: data processing is the way data is stored in, and accessed from, a database. So, depending on your needs, the database is set up in different ways.
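To make the OLTP side concrete (a sketch with an invented accounts table, not anything from the answer): an OLTP workload is many small, atomic read/write transactions, like a transfer where both updates commit together or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")

# The connection used as a context manager commits the transaction on
# success and rolls it back if an exception is raised inside the block.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(70.0,), (80.0,)]
```

Compare this with the star-schema query style: normalized rows and small writes here, denormalized facts and big aggregating reads there.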
Image from http://datawarehouse4u.info/ :
Data mining is the computational process of discovering patterns in large datasets. The patterns you mine can give you a deeper insight into the business process, or even let you make predictions.
Analysis is the verb: in the BI world it means the ease of getting information out of data. Multidimensional analysis actually describes how the system slices your data (by dimensions within a cube). Wikipedia says that data analysis is the process of inspecting data in order to find useful information.
Analytics is a noun and is the result of a process of analysis.
Don't worry about those two words.
I can tell you about data mining, as I worked on a data mining project. Data mining is not a tool; it is a method, and various tools such as WEKA and RapidMiner are used for it. When mining data, many algorithms are used that are built into tools like Weka and RapidMiner: clustering algorithms, association algorithms, etc.

A simple example from data mining: a teacher teaches a science subject in class using different teaching methods, such as a chalkboard, presentations, and hands-on work. Our goal is to find which method suits the students best. We conduct a survey and collect the students' opinions: 40 students prefer the chalkboard, 30 the presentation, and 20 the hands-on method. From this data we can derive rules, for example that the science subject should be taught using the chalkboard method. For the different algorithms, you can use Google. :D
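The survey example boils down to a tiny frequency count, which is the simplest possible "mining" of the data (the vote counts below are the ones from the example; real tools like WEKA apply much richer algorithms to the same idea):

```python
# Survey results: number of students preferring each teaching method.
votes = {"chalkboard": 40, "presentation": 30, "hands-on": 20}

# The derived "rule" is simply the method with the most votes.
best = max(votes, key=votes.get)
print(best)  # chalkboard
```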