UTPA STEM/CBI Courses/Databases/Partitioning

From Wikiversity
Jump to navigation Jump to search

Course Title: Database Design and Implementation

Lecture Topic: Database Tuning

Instructor: Artem Chebotko

Institution: University of Texas - Pan American

Backwards Design[edit | edit source]

Course Objectives

  • Primary Objectives- By the next class period students will be able to:
    • Talk about potential performance issues in large relational databases
    • Identify situations when applying horizontal and vertical partitioning techniques can be used
    • Explain advantages and disadvantages of relation partitioning
    • Apply horizontal and vertical partitioning techniques to relations
  • Sub Objectives- The objectives will require that students are familiar with notions:
    • Response time
    • Throughput
    • Denormalization
    • Schema redesign
  • Difficulties
    • Once students learned and applied database schema normalization techniques throughout the course, it may be confusing to learn that certain cases may require denormalization. Examples are:
  • Real-World Contexts- There are many ways that students can use this material in the real-world.
    • You work as a database architect/administrator at ABC, Inc. For a few weeks, you have been getting increasingly many emails from the Customer Service department about slow performance of some database queries that retrieve information about customers of the company. You checked the Customer table in the database and realized that your company is doing very well: there are several million customers in the table. Customers are in general served by branches based on their geographical location. The Customer table can be partitioned both horizontally and vertically.
    • Relational RDF databases use table Triples to store RDF triples. Horizontal partitioning of this table based on a predicate column value results in tables known as property relations. Vertical partitioning of table Triples can also be beneficial for column-based relational databases.

Model of Knowledge

  • Concept Map
    • Normalization
    • Denormalization
    • Horizontal Partitioning
    • Vertical Partitioning
  • Content Priorities
    • Enduring Understanding
      • Estimate of response time and throughput for a particular schema design
      • Conditions that suggest horizontal or vertical partitioning
      • Application of horizontal and vertical partitioning
    • Important to Do and Know
      • Normalization vs. denormalization
      • Cons and pros of partitioning
      • Cons and pros of normalization and denormalization
    • Worth Being Familiar with
      • Cost estimation that involves indexes
      • RDF databases: partitioning based on instance or instance type
      • Redundancy techniques to improve performance

Assessment of Learning

  • Formative Assessment
    • In Class (groups)
      • Example of table Triples from RDF databases and its horizontal partitioning into property tables based on values in the predicate column
      • Estimate cost of a query with one relational join over table Triple vs. property tables
    • Homework (individual)
      • Repeat cost estimation for a different queries and different indices
      • Explore vertical partitioning of table Triples and explain a rational for such partitioning when a column-based relational database is used
  • Summative Assessment
    • Quiz
    • Exam

Legacy Cycle[edit | edit source]

OBJECTIVE

By the next class period, students will be able to:

  • Talk about potential performance issues in large relational databases
  • Identify situations when to apply horizontal and vertical partitioning techniques
  • Explain advantages and disadvantages of relation partitioning
  • Apply horizontal and vertical partitioning techniques to relations

The objectives will require that students be able to:

  • Estimate cost of a given schema design for particular SQL queries
  • Choose an "optimal" query execution plan for a logical plan
  • Understand how different storage structures (heap, sorted file, integrated with an index file) work
  • Understand how indices work (B+ tree, hash)
  • Understand how algorithms for join, sorting, selection and projection work
  • Understand how pipelining works
  • Understand how index covering works

THE CHALLENGE

What is the most efficient database schema that can store RDF data?

GENERATE IDEAS

Start with table Triples. Choose indexes. Estimate cost of a query. How can we do better? Brainstorming ... Horizontal partitioning!

MULTIPLE PERSPECTIVES

Vertical partitioning and column-based stored. Native RDF stores.

RESEARCH & REVISE

Estimate cost of several queries between property tables vs. table Triple. Choose the best indexing scheme to support those queries.

TEST YOUR METTLE

Written and oral feedback from students

GO PUBLIC

Groups agree and present the results of their schema design, cost estimation, and indexing analysis. Discussion and conclusion.


Pre-Lesson Quiz[edit | edit source]

  1. What is a logical query plan?
  2. How to choose a good execution plan for a query?
  3. How to locate index entries in a B+ tree index? Cost?
  4. How to locate index entries in a hash index? Cost?
  5. How to retrieve tuples using an integrated B+ tree index? Cost?
  6. How to retrieve tuples using an integrated hash index? Cost?
  7. How to retrieve tuples using a clustered unintegrated B+ tree index? Cost?
  8. How to retrieve tuples using a clustered unintegrated hash index? Cost?
  9. How to retrieve tuples using an unclustered B+ tree index? Cost?
  10. How to retrieve tuples using an unclustered hash index? Cost?
  11. How to compute a join using the block nested-loops join algorithm? Cost?
  12. How to compute a join using the indexed-based block nested-loops join algorithm? Cost?
  13. When pipelining can be applied? Cost?
  14. When pipelining cannot be applied?
  15. When to choose a clustered index?
  16. When to choose an unclustered index?
  17. What is normalization?
  18. What anomalies exist in relations that are not normalized?
  19. How to evaluate a database schema design?
  20. What are cons and pros of indices?

Test Your Mettle Quiz[edit | edit source]

  1. What is horizontal partitioning?
  2. What is vertical partitioning?
  3. When to apply horizontal partitioning?
  4. When to apply vertical partitioning?
  5. What are pros and cons of horizontal partitioning?
  6. What are pros and cons of vertical partitioning?
  7. What are RDF databases?
  8. What are relational RDF databases?
  9. Is horizontal partitioning used in RDF databases?
  10. Is vertical partitioning used in RDF databases?
  11. What is a query response time?
  12. What is a database throughput?
  13. How to improve a query response time?
  14. How to improve a database throughput?
  15. How can views be used to hide some of the complexity of horizontal partitioning?
  16. How can views be used to hide some of the complexity of vertical partitioning?
  17. What is a good indexing scheme for table Triple? Why?
  18. What is a good indexing scheme for property tables? Why?
  19. What are relationships among schema redesign, normalization, denormalization, and partitioning?
  20. Can the same relation be a subject to both horizontal and vertical partitioning?