
CS614 - Data Warehousing

17 Topics 71 Posts
  • 0 Votes
    2 Posts
    152 Views
    zareenZ

    Please share your idea solution

  • 0 Votes
    4 Posts
    99 Views
    zareenZ

    @zareen said in CS614 Assignment No.2 Solution and Discussion:

    Question No. 2
    Consider the following normalized tables for a telecommunication company showing the daily call record details of customers:
    Customer_ID Customer Phone No. Balance
    1 033XXXXX 300
    2 033YYYYY 250
    3 033ZZZZZZ 300
    4 033AAAAA 1000
    5 033BBBBB 80
    6 033CCCCC 554

    … …

    Call_ID Customer_ID Dialed Phone Number Duration Call Charges
    1 1 032ABCVD 1 minute 2 RS
    2 1 032ABCVG 2 minutes 4 RS
    3 1 032ABCVD 1 minute 2 RS
    4 2 032ANNNN 3 minutes 6 RS
    5 2 032AMMM 4 minutes 8 RS
    6 3 033RRRRR 1 minute 2 RS
    … … … … …

    Due to certain performance factors, the company wants to de-normalize the tables using the pre-joining technique.
    Table Information is given below:
    • Assume 1:4 record count ratio between customer Info (master) and Call record detail (detail).
    • Assume 15 million customers.
    • Assume 10 byte Customer_ID.
    • Assume 50 byte header for customer Info (master) and 80 byte header for Call record detail (detail) tables.
    You are required to perform the following tasks:
    • Calculate the Total space in GBs used with normalization.
    • Calculate the Total space in GBs used after de-normalization.

    Solution:
    With normalization:
    Total space used = 15,000,000 x 50 + 60,000,000 x 80 = 750,000,000 + 4,800,000,000 = 5,550,000,000 bytes ≈ 5.55 GB

    After de-normalization:
    Total space used = (80 + 50 – 10) x 60,000,000 = 120 x 60,000,000 = 7,200,000,000 bytes = 7.2 GB
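    The figures above can be checked with a short script. A minimal sketch, assuming the 1:4 master-to-detail ratio yields 60 million call records for 15 million customers and that "GB" means decimal gigabytes, as the solution's arithmetic implies:

    ```python
    # Space estimate for the telecom example in the post above.
    GB = 1_000_000_000              # decimal gigabytes, as in the solution

    customers = 15_000_000          # master (Customer Info) records
    calls = customers * 4           # 1:4 master-to-detail ratio -> 60,000,000 detail records
    master_rec = 50                 # bytes per Customer Info record
    detail_rec = 80                 # bytes per Call record detail record
    key = 10                        # bytes for the shared Customer_ID

    normalized = customers * master_rec + calls * detail_rec
    # Pre-joining copies the master columns into every detail record,
    # but only one copy of the join key survives.
    denormalized = (master_rec + detail_rec - key) * calls

    print(normalized / GB)    # 5.55
    print(denormalized / GB)  # 7.2
    ```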

  • 0 Votes
    5 Posts
    227 Views
    zaasmiZ

    @Faaizaa said in CS614 Assignment 2 Solution and Discussion Spring 2020:

    Re: CS614 Assignment 2 Solution and Discussion

    Assignment No. 2
    Semester: Spring 2020
    CS614 – Data Warehousing
    Total Marks: 15

    Due Date:
    June 17, 2020

    Objectives:
    After completing this assignment, the students will be able to:
    • De-Normalize the given table using horizontal splitting technique
    • Calculate the Total space used with normalization.
    • Calculate the Total space used after de-normalization.

    Instructions
    Please read the following instructions carefully before submitting assignment:
    It should be clear that your assignment will not get any credit if:

    • Assignment is submitted after due date.
    • Submitted assignment does not open or file is corrupt.
    • Assignment is copied (from the internet or from other students).
    • Assignment is submitted other than word format (.doc, .docx).

    Assignment

    Question No. 1

    Consider the following table having the information of students of a university:

    Student ID  Student Name  Campus ID  Student Age  Degree Program
    1  Ali  VLHR01  27  MS
    2  Kamran  VISB01  24  BS
    3  Akmal  VRWP01  24  BS
    4  Ahmad  VLHR01  26  MS
    5  Rehan  VISB01  23  BS
    6  Rizwan  VRWP01  29  MS
    7  Umer  VISB01  25  BS
    8  Javed  VLHR01  26  MS

    You are required to completely de-normalize the above table using “horizontal splitting” on the basis of Degree Program.
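    Horizontal splitting partitions a table's rows into separate tables that all keep the same columns. A minimal SQL sketch of the idea (table and column names are illustrative, not taken from the assignment's marking scheme):

    ```sql
    -- Horizontal splitting on Degree Program: same columns, rows divided by a predicate.
    -- Assumes a source table named Students with the columns shown above.
    CREATE TABLE Students_MS AS
      SELECT Student_ID, Student_Name, Campus_ID, Student_Age, Degree_Program
      FROM Students
      WHERE Degree_Program = 'MS';

    CREATE TABLE Students_BS AS
      SELECT Student_ID, Student_Name, Campus_ID, Student_Age, Degree_Program
      FROM Students
      WHERE Degree_Program = 'BS';
    ```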

    Question No. 2
    Consider the following normalized tables for a telecommunication company showing the daily call record details of customers:

    Customer Info
    Customer_ID  Customer Phone No.  Balance
    1  033XXXXX  300
    2  033YYYYY  250
    3  033ZZZZZZ  300
    4  033AAAAA  1000
    5  033BBBBB  80
    6  033CCCCC  554


    … …

    Call record detail
    Call_ID  Customer_ID  Dialled Phone Number  Duration  Call Charges
    1  1  032ABCVD  1 minute  2 RS
    2  1  032ABCVG  2 minutes  4 RS
    3  1  032ABCVD  1 minute  2 RS
    4  2  032ANNNN  3 minutes  6 RS
    5  2  032AMMM  4 minutes  8 RS
    6  3  033RRRRR  1 minute  2 RS

    … … … … …

    Due to certain performance factors, the company wants to de-normalize the tables using the pre-joining technique.

    Table Information is given below:

    • Assume 1:4 record count ratio between customer Info (master) and Call record detail (detail).
    • Assume 15 million customers.
    • Assume 10 byte Customer_ID.
    • Assume 50 byte header for customer Info (master) and 80 byte header for Call record detail (detail) tables.
    You are required to perform the following tasks:

    • Calculate the Total space in GBs used with normalization.
    • Calculate the Total space in GBs used after de-normalization.

    Deadline:
    Your assignment must be uploaded on VULMS on or before June 17, 2020. June 18, 2020 will be a bonus day for assignment submission; after the bonus day, no assignment will be entertained via email.

    Solution file link

  • 0 Votes
    2 Posts
    446 Views
    zaasmiZ

    CS614 Assignment No. 01 Solution

    CS614 assignment Solution spring 2020.xlsx

  • 0 Votes
    3 Posts
    509 Views
    zareenZ


  • 0 Votes
    1 Posts
    93 Views
    No one has replied
  • 0 Votes
    2 Posts
    192 Views
    zareenZ

    @zareen said in CS614 Assignment 2 Solution and Discussion:

    Calculate the total space reserved in memory using normalization

    Q. 1. Calculate the total space reserved in memory using normalization

    Q. 1 Solution:
    Total space used
    = 10,000,000 (records) x 40 (bytes header) + 110,000,000 (records) x 70 (bytes header)
    = 400,000,000 + 7,700,000,000
    = 8,100,000,000 bytes
    = 8.1 GB

    Q. 2. Calculate the total space reserved in memory after de-normalization using pre-joining technique

    Q. 2 Solution:
    Total space used
    = (70 + 40 – 10) (bytes header) x 110,000,000 (records)
    = 100 x 110,000,000
    = 11,000,000,000 bytes
    = 11 GB
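    The same arithmetic can be parameterized so it works for any master/detail sizes. A small sketch, assuming decimal gigabytes as in the solution above:

    ```python
    def prejoin_space(master_records, detail_records, master_bytes, detail_bytes, key_bytes):
        """Return (normalized, denormalized) size in bytes for the pre-joining technique."""
        normalized = master_records * master_bytes + detail_records * detail_bytes
        # After pre-joining, every detail record carries the master columns,
        # minus the duplicated join key.
        denormalized = (master_bytes + detail_bytes - key_bytes) * detail_records
        return normalized, denormalized

    norm, denorm = prejoin_space(10_000_000, 110_000_000, 40, 70, 10)
    print(norm)    # 8100000000 -> 8.1 GB
    print(denorm)  # 11000000000 -> 11 GB
    ```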

  • 0 Votes
    3 Posts
    163 Views
    zareenZ

    @zareen said in CS614 Assignment 1 Solution and Discussion:

    Assignment No. 1
    Semester: Fall 2019
    CS614 – Data Warehousing
    Total Marks: 15

    Due Date:
    November 14, 2019

    Objectives:
    After completing this assignment the students will be able to:
    • Identify Database entities from a given scenario
    • Understand the database table structure
    • Normalize a database table up to 2nd normal form
    • De-normalize relationships using collapsing table technique

    Instructions
    Please read the following instructions carefully before submitting assignment:
    It should be clear that your assignment will not get any credit if:

    o Assignment is submitted after due date.
    o Submitted assignment does not open or file is corrupt.
    o Assignment is copied (from the internet or from other students).
    o Assignment is submitted other than word format (.doc, .docx).

    Assignment

    Question No. 1

    Consider the following schema related to a Social Media website named as ‘userPosts’. You have to perform following tasks related to the provided schema:

    1- Identify appropriate keys for following structure (Primary and/or foreign key(s))
    2- Convert this schema into 2 NF
    userPosts (userID, userName, password, address, postId, postDate, postContent)

    Question No. 2

    Consider the following schemas relevant to a hotel booking website. You are required to de-normalize the given schemas using the Collapsing Tables technique.

    roomVisitor (roomID, visitorCNIC, dateTime)

    roomCharges (roomID, spentDays, roomRent)

    Deadline:
    Your assignment must be uploaded on VULMS on or before November 14, 2019. November 15, 2019 will be a bonus day for assignment submission; after the bonus day, no assignment will be entertained via email.

    Q.1 Answer:

    userDetails (userID, userName, password, address)
    posts(postID, postDate, postContent)
    userPosts (userID, postID)

    • userID and postID in schema / table ‘userPosts’ are foreign keys

    Q.2 Answer:
    roomVisitorCharges (roomID, visitorCNIC, dateTime, spentDays, roomRent)
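    The collapsed schema in the answer could be produced with a join. A sketch assuming roomID is the shared key (the join type is an assumption; with a one-to-one relationship an inner join is safe):

    ```sql
    -- Collapsing Tables: merge two tables related on roomID into one.
    CREATE TABLE roomVisitorCharges AS
      SELECT v.roomID, v.visitorCNIC, v.dateTime, c.spentDays, c.roomRent
      FROM roomVisitor v
      JOIN roomCharges c ON c.roomID = v.roomID;
    ```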

  • CS614 Quiz No. 2 Solution and Discussion

    Solved
    30
    1 Votes
    30 Posts
    7k Views
    zareenZ

    1_ As per Bill Inmon, a data warehouse, in contrast with classical applications, is:

    Data driven _ pg285

    2_ Which of the following is NOT one of the three parallel tracks in Kimball’s approach?
    Lifecycle Maintenance track

    3_ Bill Inmon argues that requirements are well understood only after
    Data warehouse is populated _ pg285

    4_ Goal driven approach of data warehouse development was result of ______ work
    Böhnlein and Ulbrich-vom Ende _ pg285

    5_ Identify the TRUE statement:
    Clustering is unsupervised learning and classification is supervised learning _ pg 270

    6_ Normally the term “DWH face to the business user” refers to:
    Lifecycle Analytical Applications track _ pg 306

    7_ In ________ learning you don’t know the number of clusters and have no idea about their attributes.
    Unsupervised learning
    https://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

    8_ Waterfall model is appropriate when
    Requirements are clearly defined _ pg 284

    9_ Implementation of a data warehouse requires ________ activities.
    None of the above

    10_ Normally the input data structure (a database table) for a data mining algorithm:
    Has more records than attributes (not sure)

  • 0 Votes
    5 Posts
    346 Views
    zaasmiZ

    The answer will be the same after conversion.


  • CS614 GDB Solution and Discussion

    Solved
    3
    3 Votes
    3 Posts
    570 Views
    M

    Introduction to Parallel Execution
    Parallel execution enables the application of multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases typically associated with a decision support system (DSS) and data warehouses. You can also implement parallel execution on an online transaction processing (OLTP) system for batch processing or schema maintenance operations such as index creation. Parallel execution is sometimes called parallelism.

    Parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time. An example is four processes combining to calculate the total sales for a year: each process handles one quarter of the year instead of a single process handling all four quarters by itself. The improvement in performance can be quite significant. Parallel execution improves processing for:

    Queries requiring large table scans, joins, or partitioned index scans

    Creation of large indexes

    Creation of large tables (including materialized views)

    Bulk insertions, updates, merges, and deletions

    You can also use parallel execution to access object types within Oracle Database. For example, you can use parallel execution to access large objects (LOBs).

    If the necessary parallel server processes are not available for parallel execution, a SQL statement is queued when the parallel degree policy is set to automatic. After the necessary resources become available, the SQL statement is dequeued and allowed to execute. The parallel statement queue operates as a first-in, first-out queue by default. If the query in front of the queue cannot be scheduled, none of the queries in the queue can be scheduled even if resources are available in the system to ensure that the query at the head of the queue has adequate resources. However, if you configure and set up a resource plan, then you can control the order in which parallel statements are dequeued and the number of parallel servers used by each workload or consumer group. For information, refer to “Managing Parallel Statement Queuing with Resource Manager”.

    This section contains the following topics:

    When to Implement Parallel Execution

    When Not to Implement Parallel Execution

    Fundamental Hardware Requirements

    Operations That Can Use Parallel Execution

    When to Implement Parallel Execution
    Parallel execution benefits systems with all of the following characteristics:

    Symmetric multiprocessors (SMPs), clusters, or massively parallel systems

    Sufficient I/O bandwidth

    Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)

    Sufficient memory to support additional memory-intensive processes, such as sorting, hashing, and I/O buffers

    If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.

    The benefits of parallel execution can be seen in DSS and data warehouse environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple DML or SELECT statements that characterize OLTP applications would not experience any benefit from being executed in parallel.

    When Not to Implement Parallel Execution
    Parallel execution is not typically useful for:

    Environments in which the typical query or transaction is very short (a few seconds or less).

    This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.

    Environments in which the CPU, memory, or I/O resources are heavily utilized.

    Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution does not yield any benefits and indeed may be detrimental to performance.

    Fundamental Hardware Requirements
    Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly. It is very I/O intensive by nature. To achieve optimal performance, each component in the hardware configuration must be sized to sustain the same level of throughput: from the CPUs and the Host Bus Adapters (HBAs) in the compute nodes, to the switches, and on into the I/O subsystem, including the storage controllers and the physical disks. If the system is an Oracle Real Application Clusters (Oracle RAC) system, then the interconnect also has to be sized appropriately. The weakest link limits the performance and scalability of operations in a configuration.

    It is recommended to measure the maximum I/O performance a hardware configuration can achieve without the Oracle database. This measurement can be used as a baseline for future system performance evaluations. Remember, it is not possible for parallel execution to achieve better I/O throughput than the underlying hardware can sustain. Oracle Database provides a free calibration tool called Orion, which is designed to measure the I/O performance of a system by simulating Oracle I/O workloads. Parallel execution typically performs large random I/Os.

    Operations That Can Use Parallel Execution
    You can use parallel execution for any of the following:

    Access methods

    Some examples are table scans, index fast full scans, and partitioned index range scans.

    Join methods

    Some examples are nested loop, sort merge, hash, and star transformation.

    DDL statements

    Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION.

    You can typically use parallel DDL where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns.

    All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution.

    The CREATE TABLE statement for an index-organized table can be run with parallel execution either with or without an AS SELECT clause.

    Different parallelism is used for different operations. Parallel CREATE (partitioned) TABLE AS SELECT and parallel CREATE (partitioned) INDEX statements run with a degree of parallelism (DOP) equal to the number of partitions.
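    For illustration, a parallel CREATE TABLE AS SELECT and a parallel index build might look like the following (table and index names are placeholders, not from the excerpt above):

    ```sql
    -- Parallel DDL: build a copy of a large table with a requested DOP of 8,
    -- in NOLOGGING mode as the text above describes.
    CREATE TABLE sales_copy
      PARALLEL 8
      NOLOGGING
      AS SELECT * FROM sales;

    -- Parallel index creation on the new table.
    CREATE INDEX sales_copy_ix ON sales_copy (sale_date) PARALLEL 8;
    ```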

    DML statements

    Some examples are INSERT AS SELECT, UPDATE, DELETE, and MERGE operations.

    Parallel DML (parallel insert, update, merge, and delete operations) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT … SELECT statements to insert rows into multiple tables as part of a single DML statement. You can ordinarily use parallel DML where you use regular DML.

    Although data manipulation language usually includes queries, the term parallel DML refers only to inserts, updates, merges, and deletes done in parallel.

    Parallel query

    You can run queries and subqueries in parallel in SELECT statements, plus the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).
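    A sketch of parallel query and parallel DML using the standard PARALLEL hint (object names are placeholders; note that parallel DML must be enabled per session before the statement runs):

    ```sql
    -- Parallel query: request a DOP of 4 for the scan of a large fact table.
    SELECT /*+ PARALLEL(s, 4) */ prod_id, SUM(amount_sold)
    FROM   sales s
    GROUP  BY prod_id;

    -- Parallel DML: enable it for the session first.
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+ APPEND PARALLEL(sales_hist, 4) */ INTO sales_hist
    SELECT * FROM sales WHERE sale_date < DATE '2019-01-01';

    COMMIT;
    ```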

    Miscellaneous SQL operations

    Some examples are GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, plus aggregate and table functions.

    SQL*Loader

    You can use SQL*Loader in parallel execution where large amounts of data are routinely encountered. To speed up your load operations, you can use a parallel direct-path load as in the following example:

    sqlldr CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE
    sqlldr CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE
    sqlldr CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE
    You provide your user ID and password on the command line. You can also use a parameter file to achieve the same result.

    An important point to remember is that indexes are not maintained during a parallel load.

    Source

  • How aggregates awareness helps the users?

    Solved
    2
    0 Votes
    2 Posts
    101 Views
    cyberianC

    The mere existence of aggregate tables does not, by itself, help user queries. Queries must actually access the summary tables, ideally automatically, with awareness built into the BI tool or the RDBMS that forces the query to be processed in the most efficient way. In recent years, BI vendors have debated whether aggregate awareness belongs in the BI tool or in the RDBMS. Business Objects is the only BI tool that has built-in functionality to take advantage of RDBMS aggregate tables in a way that is seamless to users.
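    On the RDBMS side, Oracle's materialized views with query rewrite are one way to give the database this awareness. A sketch with placeholder object names:

    ```sql
    -- An aggregate table the optimizer may substitute transparently for
    -- queries that group detail-level sales by product.
    CREATE MATERIALIZED VIEW sales_by_prod
      ENABLE QUERY REWRITE
      AS SELECT prod_id, SUM(amount_sold) AS total_sold
         FROM sales
         GROUP BY prod_id;

    -- With query rewrite enabled, this query can be answered from the
    -- materialized view without the user ever referencing it.
    SELECT prod_id, SUM(amount_sold) FROM sales GROUP BY prod_id;
    ```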

  • Normalization effects performance?

    Solved
    2
    0 Votes
    2 Posts
    174 Views
    M

    Full normalisation will generally not improve performance; in fact, it can often make it worse, but it will keep your data free of duplicates. In some special cases I’ve denormalised specific data in order to get a performance increase. The term “normalization” comes from the mathematical concept of being “normal.”
    Source

  • 0 Votes
    1 Posts
    429 Views
    No one has replied
  • CS614 Quiz No. 1 Solution and Discussion

    2
    0 Votes
    2 Posts
    3k Views
    M

    Today Quiz Shared by someone (Thanks)
    History is excellent predictor of the __________. CS614

    • History is an excellent predictor of the future.

    In Extract, Load, Transform (ELT) process, data transformation _____ CS614


    The relation R will be in 2nd Normal Form if CS614

    Taken jointly, the extract programs or naturally evolving systems formed a spider web, also known as CS614

    Distributed Systems Architecture
    Legacy Systems Architecture
    Online Systems Architecture
    Intranet Systems Architecture

    As dimensions get less detailed (e.g., year vs. day) cubes get ____ CS614


    Suppose the size of the attribute “Computerized National Identity Card (CNIC) no” is changed in NADRA database. This transformation refers to: CS614

    Format revision
    Field splitting
    Field decoding
    Calculation of derived value


    OLAP is: CS614


    _________ provides a combination of “relational database access” and “cube” data structures within a single framework. CS614

    HOLAP (Page No. 78)
    DOLAP
    MOLAP
    ROLAP

    HOLAP provides a combination of relational database access and “cube” data structures
    within a single framework. The goal is to get the best of both MOLAP and ROLAP:
    scalability (via relational structures) and high performance (via pre-built cubes).


    The data has to be checked, cleansed and transformed into a ________ format to allow easy and fast access CS614

    The data has to be checked, cleansed and transformed into a unified format to allow easy and fast access.

    unified
    predefined

    Robotic libraries are needed for ___________ CS614

    Cubes
    Data marts
    Data warehouse
    Aggregates


  • CS614 Assignment No.2 Solution and Discussion

    2
    0 Votes
    2 Posts
    185 Views
    M

    CS614 Assignment Solution

    DENORMALIZED Table

    De-normalization using pre-joining

    In pre-joining, all columns of the master table are moved into the detail table.

    Detail table
    2) Calculate the total space used with normalization

    SOL:

    Total space with normalization
    = (H.S of master table x records in master table) + (H.S of detail table x records in detail table)
    = (30 x 10,000,000) + (60 x 120,000,000)
    = 300,000,000 + 7,200,000,000
    = 7,500,000,000 bytes
    = 7.5 GB

    3) Calculate the total space used after de-normalization
    Total space after de-normalization
    = (H.S of detail table + H.S of master table – size of the common join column) x (records in detail table)
    = (60 + 30 – 10) x 120,000,000
    = 80 x 120,000,000
    = 9,600,000,000 bytes
    = 9.6 GB

    CS614-Assignment-Solution(72G6dF).docx