Skip to content

Data Analysis & Warehousing

A Data Warehouse (DW) is a centralized repository that stores integrated data from multiple heterogeneous sources such as databases, transactional systems, and external files. It is designed specifically for querying and analysis, not for transaction processing.

  • To support decision-making and business intelligence (BI).
  • To provide a historical, subject-oriented, integrated, and time-variant view of data.
PropertyDescription
Subject-orientedOrganized around major business subjects like sales, finance, or customers.
IntegratedData is collected from various sources and standardized.
Time-variantMaintains historical data (e.g., yearly, quarterly trends).
Non-volatileOnce data is stored, it is not updated or deleted frequently.
  1. Data Sources: Operational databases, flat files, web logs, etc.

  2. ETL Process (Extract, Transform, Load):

    • Extract: Retrieve data from various sources.
    • Transform: Clean, filter, and format data.
    • Load: Store transformed data into the warehouse.
  3. Data Storage Area: The central repository storing both current and historical data.

  4. OLAP Engine (Online Analytical Processing): Supports complex queries, multidimensional analysis, and data summarization.

  5. Front-end Tools: Dashboards, reports, and data visualization tools for business analysis.

SchemaStructureDescription
Star SchemaCentral fact table linked to dimension tables.Simplest and widely used.
Snowflake SchemaDimension tables are normalized into multiple related tables.Reduces redundancy.
Galaxy SchemaMultiple fact tables share dimension tables.Used for complex data marts.
  • Enhances data analysis and reporting.
  • Improves decision-making and forecasting.
  • Provides data consistency across the organization.
  • Supports historical and trend analysis.


Data Mining is the process of extracting hidden patterns, trends, and knowledge from large datasets using statistical, mathematical, and machine learning techniques.

It is often considered as “Knowledge Discovery in Databases (KDD)”.

  1. Data Cleaning: Remove noise and inconsistencies.
  2. Data Integration: Combine data from multiple sources.
  3. Data Selection: Retrieve relevant data for analysis.
  4. Data Transformation: Convert data into appropriate forms for mining.
  5. Data Mining: Apply algorithms to extract patterns (e.g., clustering, classification).
  6. Pattern Evaluation: Identify interesting and useful patterns.
  7. Knowledge Presentation: Visualize the results for decision-making.
TechniqueDescriptionExample
ClassificationAssigns data into predefined categories.Spam vs. Non-spam email.
ClusteringGroups similar data items together.Customer segmentation.
Association Rule MiningDiscovers relationships between items.“If customer buys bread, they also buy butter.”
RegressionPredicts continuous values.Predicting sales revenue.
Anomaly DetectionIdentifies outliers or unusual data points.Fraud detection.
  • Business Intelligence: Market basket analysis, sales forecasting.
  • Finance: Credit scoring, risk analysis.
  • Healthcare: Disease prediction, patient profiling.
  • Web Mining: User behavior analysis, recommendation systems.

Difference between Data Warehousing and Data Mining

Section titled “Difference between Data Warehousing and Data Mining”
AspectData WarehousingData Mining
PurposeStore and organize dataAnalyze and discover patterns
Process TypeData storageData analysis
Tools UsedOLAP, ETLML algorithms, statistics
OutcomeHistorical data repositoryActionable insights and predictions