Data Management Foundations - D426

icon picker
Unit 1: Introduction to Databases

Last edited 255 days ago by Makiel [Muh-Keel].

1.1 Data

Data is numeric, textual, visual, or audio information that describes real-world systems.
Data is collected and processed to aid in a variety of tasks, such as forecasting weather, analyzing financial investments, and tracking the global spread of pandemics.
Data can vary in several important ways:
Scope. The amount of data produced and collected can vary.
Ex: A small business might track an inventory of a few thousand items, but a large commerce website might track billions of items.
Format. Data may be produced as numbers, text, image, audio, or video.
Ex: A phone's proximity sensor generates raw numbers, and a satellite captures images.
Access. Some data sources are private while others are made publicly available.
Ex: A retail company may use private customer data to discover purchasing behavior patterns, but a government may be required by law to share certain data sets.

1.1 Databases

A database is a collection of data in a structured format.

Database Management System

A database system, also known as a database management system or DBMS, is software that reads and writes data in a database.
Database systems ensure data is secure, internally consistent, and available at all times.

Query

A query is a request to retrieve or change data in a database.
A query language is a specialized programming language, designed specifically for database systems.
Query languages read and write data efficiently, and differ significantly from general-purpose languages such as Python, Java, and C++.

Database Application

A database application is software that helps business users interact with database systems. Many databases are complex, and most users are not familiar with query languages.
Direct database access is usually not feasible, so programmers write applications to simplify the user experience and ensure data access is efficient and secure.
An information management system is a software application that manages corporate data for a specific business function.
An information management system usually includes a database system as well as other components, such as a user interface, business logic, and interfaces to other systems.

Database Roles

A database administrator is responsible for securing the database system against unauthorized users.
A database administrator enforces procedures for user access and database system availability.
A database designer determines the format of each data element and the overall database structure.
Database designers must balance several priorities, including storage, response time, and support for rules that govern the data.
Since these priorities often conflict, database design is technically challenging.
A database programmer develops computer programs that utilize a database.
Database programmers write applications that combine database query languages and general-purpose programming languages.
Query languages and general-purpose languages have significant differences, so database programming is a specialized challenge.
A database user is a consumer of data in a database. Database users request, update, or use stored data to generate reports or information.
Database users usually access the database via applications but can also submit queries directly to the database system.

1.2 Database Systems

Smaller databases only shared by 1-2 users can be managed by using a text file or spreadsheet.
Large, complex databases that are shared by many users have special requirements:
Performance. When many users and applications simultaneously access large databases, query response time degrades rapidly.
Database systems maintain fast response times by structuring data properly on storage media and processing queries efficiently.
Authorization. Many database users should have limited access to specific tables, columns, or rows of a database.
Database systems authorize individual users to access specific data.
Security. Database systems ensure authorized users only access permissible data.
Database systems also protect against hackers by encrypting data and restricting access.
Rules. Database systems ensure data is consistent with structural and business rules.
Ex: When multiple copies of data are stored in different locations, copies must be synchronized as data is updated.
Ex: When a course number appears in a student registration record, the course must exist in the course catalog.
Recovery. Computers, database systems, and individual transactions occasionally fail.
Database systems must recover from failures and restore the database to a consistent state without loss of data.

Transaction

A transaction is a group of queries that must be either completed or rejected as a whole. Execution of some, but not all, queries results in inconsistent or incorrect data.
Ex: A debit-credit transaction transfers funds from one bank account to another.
The first query removes $100 from one account and the second query deposits $100 in another account.
If the first query succeeds but the second fails, $100 is mysteriously lost.
The transaction must process either both queries or neither query.
When processing transactions, database systems must:
Ensure transactions are processed completely or not at all.
A computer or application might fail while processing a transaction.
When failing to process a transaction, the database system must reverse partial results and restore the database to the values prior to the transaction.
Prevent conflicts between concurrent transactions.
When multiple transactions access the same data at the same time, a conflict may occur.
Ex: Sam selects a seat on a flight. Maria purchases the same seat in a separate transaction before Sam completes his transaction.
When Sam clicks the 'purchase' button, his seat is suddenly unavailable.
Ensure transaction results are never lost.
Once a transaction completes, transaction results must always be saved on storage media, regardless of application or computer failures.

Architecture

The architecture of a database system describes the internal components and the relationships between components.
At a high level, the components of most database systems are similar:

Query Processor

The query processor interprets queries, creates a plan to modify the database or retrieve data, and returns query results to the application.
The query processor performs query optimization to ensure the most efficient instructions are executed on the data.

Storage Manager

The storage manager translates the query processor instructions into low-level file-system commands that modify or retrieve data.
Database sizes range from megabytes to many terabytes, so the storage manager uses indexes to quick(ly locate data.

Transaction Manager

The transaction manager ensures transactions are properly executed.
The transaction manager prevents conflicts between concurrent transactions.
The transaction manager also restores the database to a consistent state in the event of a transaction or system failure.

Log

The log is a file containing a complete record of all inserts, updates, and deletes processed by the database.
The transaction manager writes log records before applying changes to the database.
In the event of a failure, the transaction manager uses log records to restore the database.

Catalog

The catalog, also known as a data dictionary, is a directory of tables, columns, indexes, and other database objects.
Other components use catalog information to process and execute queries.

Products

A relational database stores data in tables, columns, and rows, similar to a spreadsheet.
Ideal for databases that require an accurate record of every transaction, such as banking, airline reservation systems, and student records.
Column: All data in a column has the same format.
Row: All data in a row represents a single object, such as a person, place, product, or activity.
Structured Query Language
SQL supports all relational systems using the SQL query language.
SQL includes statements that:
Read/Write Data
Create/Delete tables
Administer the database system.
NoSQL (Not Only SQL) - optimized and designed for big data.
Catalyst for the Open-Source software where anyone can inspect, copy, and modify with no licensing fee.
image.png

1.3 Query Languages

Common Queries

A query is a command for a database that typically inserts new data, retrieves data, updates data, or deletes data from a database.
A query language is a computer programming language for writing database queries.

CRUD

The four common queries are sometimes referred to as CRUD operations, an acronym for Create, Read, Update, and Delete data.

Writing Queries with SQL

The SQL standard is sponsored by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
An SQL statement is a database command, such as a query that inserts, selects, updates, or deletes data:
INSERT inserts rows into a table.
SELECT retrieves data from a table.
UPDATE modifies data in a table.
DELETE deletes rows from a table.

Examples

image.png
Values are always in the order of the table’s columns.
Name is the 1st column.
Capital is the 2nd column.
Area is the 3rd column.
Independence Year is the 4th column.

Creating tables with SQL

The SQL CREATE TABLE statement creates a new table by specifying the table and column names.
Each column is assigned a data type that indicates the format of column values. Data types can be numeric, textual, or complex.
INT stores integer values.
DECIMAL stores fractional numeric values.
VARCHAR stores textual values.
DATE stores year, month, and day.
image.png

1.4 Database Design and Programming

A database design is a specification of database objects such as tables, columns, data types, and indexes.
Database design also refers to the process used to develop the specification.

Analysis

Analysis - A database design is a specification of database objects such as tables, columns, data types, and indexes.
Database design also refers to the process used to develop the specification.
This phase has many names: conceptual design, entity-relationship modeling, and requirements definition.
ER Diagram - used to represent Entities, relationships, and attributes.
Rectangles represent entities. Entity names appear at the top of rectangles.
Lines between rectangles represent relationships.
Text inside rectangles and below entity names represent attributes.
image.png

Logical Design

Logical Design - implements database requirements in a specific database system.
For relational database systems, logical design converts entities, relationships, and attributes into tables, keys, and columns.
Tables, keys, and columns are specified in SQL with CREATE TABLE statements.
Key is a column used to identify individual rows of a table.
Table Designs include the following:
Rectangles represent tables. Table names appear at the top of rectangles.
Text within rectangles and below table names represents columns.
Solid bullets (●) indicate key columns.
Empty bullets (○) and arrows indicate columns that refer to keys.
image.png

Physical Design

Physical Design phase adds indexes and specifies how tables are organized on storage media.
Physical design is specified with SQL statements such as CREATE INDEX and, like logical design, is specific to a database system.
Logical design in more important for database users and programmers, so physical design diagrams are not commonly used.
In Relational Database , logical and physical design affects queries differently.
Database Design is the overall process of determining and implementing database requirements.
Logical Design affects the actual query result.
Physical Design affects query processing speed, but never the query result.
Data Independence (information independence) is the principle that physical design never affects the actual query results.
Allows database designers to tune query performance without changes to application programs.
When database designers modify indexes or row order, applications run faster or slower but always generate the same results.
Data independence is a major advantage of relational databases and contributed to the rapid adoption of relational technology in the 1980s.
image.png

Programming

API ( Application Programming Interface)
To simplify the use of SQL with a general-purpose language, database programs typically use an application programming interface.
An application programming interface, or API, is a library of procedures or classes that links a host programming language to a database.
Most SQL languages are not object-oriented.
To write a database program, SQL is usually combined with a general-purpose programming language such as C++, Java, or Python.

1.5 MySQL

MySQL is a leading relational database system sponsored by Oracle.
MySQL is relatively easy to install and use, yet has many advanced capabilities.
Specific examples of SQL syntax are taken from MySQL, but the material is generally applicable to all relational database systems.
MySQL runs on all major operating systems, including Linux, Unix, Mac OS, and Windows.
For these reasons, MySQL is one of the most popular database systems.
Root Account is the administrative account, which may be used to create user accounts with various permissions.

MySQL is available in two editions:

MySQL Community

MySQL Community, commonly called MySQL Server, is a free edition.
MySQL Server includes a complete set of database services and tools, and is suitable for non-commercial applications such as education.
To install MySQL Server, the user must enter a password the root account.
After installation, MySQL Server runs as a service in the background.
Automatically starts/stops when the OS system starts and stops.

MySQL Enterprise

MySQL Enterprise is a paid edition for managing commercial databases.
MySQL Enterprise includes MySQL Server and additional administrative applications.

MySQL Command-Line Client

The MySQL Command-Line Client is a text interface included in the MySQL Server download.
The Command-Line Client allows developers to connect to the database server, perform administrative functions, and execute SQL statements.
When MySQL Command-Line Client is started with the root account, the user is prompted to enter the root account password.
The root account password is set when first installing MySQL.
Then Command-Line Client attempts to connect to the database server running on the local machine.
The database server runs as a service in the background and does not normally need to be started or stopped manually.

MySQL Workbench

MySQL Workbench is a way for developers to interact with MySQL Server via a graphical user interface.
MySQL Workbench is installed with MySQL Server and allows developers to execute SQL commands using an editor.
When MySQL Workbench is started, the user can connect to MySQL Server running on the local machine or on the network.
image.png
image.png
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.