Basic Fundamentals of Data Base Management Systems
In this article I will discuss data models, defining the concepts of schema and instances, which are fundamental to the study of database systems. As well, I will discuss Database Management system architecture, data independence, different types of interfaces and languages provided by DBMS. Database systems software environment and classification of Database Management Systems.
Data Models, Schema, and Instances
One fundamental characteristic of the database approach is that it provides some level of data abstractions by hiding details of data storage that are not needed by most database users. A data model is the main tool that provides this abstraction. A data model is a set of concepts that can be used to describe the structures of a database. By structureof a database, we mean the data types, relationships, and constraints that should hold for the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database. It is gradually becoming a common practice to include concepts in the data model to specify behavior; this refers to specifying a set of valid user-defined operations that are allowed on the data base in addition to the basic operations provided by the data model. An example of a user-defined operation is COMPUTE_GPA, which can be applied to a student object. On the other hand, generic operations to insert, delete, modify, or retrieve an object are often included in the basic model operations.
Categories of Data Models
Many data models have been proposed. We can categorize data models based on the types of concepts they provide to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data, wheres low-level or physical data models provides concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for for computer specialists, not for typical users. Between these two extremes is a class of representational (or implementation ) data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Representational data models hide some details of data storage but can be implemented on a computer system in a direct way.
High-level data models use concepts such as entities, attributes, and relationships. An entity represents a real-world object or concept, such as an employee, or a project, that is stored in the database. An attribute represents some property of interest that further describes an entity, such as the employees name, or salary. A relationship among two or more entities represents an interaction among the entities; for example, a a works on relationships between an employee and a project.
Representational or implementation data models are the ones used most frequently in current commercial DBMS, and they include the three most widely used data models, relational, network, and hierarchical. They represent data by using record structures and hence are sometimes called record-based data models. One can regard object-oriented data models as a new family of high-level implementation data models that are close to conceptual data models. Object-oriented models are frequently utilized as a high-level conceptual models, particularly in the software engineering domain.
Physical data models describe how data is stored in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records efficient.
Schema and Instances
In any data model it is important to distinguish between the description of the database and the data base itself. The description of a database is called the database schema. A database schema is specified during database design and is not expected to change frequently. Most data base models have certain conventions for diagrammatically displaying the schema. A displayed schema is called a schema diagram. The diagram displays the structure of each record type but not the actual instances of records. Each object of the schema is called a schema construct.
A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not shown in the schema diagram: for example it shows neither data type of each data item nor the relationship among the various files. Many types of constraints are not shown in the schema diagrams; for example constraints such as “students majoring in computer science must take COSC11310 before the end of their sophomore year” is quite difficult to represent.
The actual data in a database may change quite frequently; for example, the databse changes eevery time one adds a new data for a given database schema. That data in the database is called a database state (or set of occurrences or instances). In a given database state, each schema construct will contain the set of individual student entries as its instances. Many database states can be constructed to correspond to a particular database schema. Every time one delete or insert a record or change value of a data item, one changes one state of the database in to another state.
DBMS Architecture and Data Independence
Three important characteristics of the database approach are as follows:
(a) Insulation of programs and data
(b) Support of multiple user views
(c ) Use of catalog to store the database descriptions (schema)
The Three Schema Architecture
The goal of three-schema architecture is to separate the user applications and the physical database . In this architecture, schema can be defined at the following three levels:
The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
The conceptual level, has the conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. A high-level data model or an implementation data model can be used at this level.
The external or view level includes a number of external schema or user views. Each external schema describes the part of database that a particular user group in interested in an hides the rest of the data base from that user group. A high-level data model or an implementation data model can be used at this level.
Most DBMS do not separate the three levels completely, but several of them support the there schema architecture to some extent. Some DBMS include physical-level details in the conceptual schema. In most DBMS the support user views, external schema are specified in the same data model that describes the conceptual elvel information. Some DBMS allow different data models to be used at the conceptual and external levels.
Notice that the three schema are only descriptions of the data; the only date that accurately exists is at the physical level. In a DBMS based on the three-schema architecture, each user group refers only to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the users external view. The process of transforming requests and results between levels are called mapping. These mapping may be time consuming so some DBMS – especially those that are meant to support small databases – do not support external views. Even in such systems, however, a certain amount of mapping is necessary to transform requests between conceptual and internal levels.
The three-schema architecture can be used to expain the concept of data independence, which can be defined as a capacity to change the schema at one level of a database system without having to change the schema at the next higher level. One can define two types of data independence:
Logical data independence is the capacity to change the conceptual schema without having to change external schema or application program. We may change the conceptual schema to expand the database (by adding a new record type or data item), or to reduce the data base (by removing a data type or data item). In the late case, external schema that refers only to the remaining data should not be affected. For example, the external schema should not be affected by changing say a grade report file of a database of student records. Only the view definition and mapping need be changed in a DBMS that supports logical data independence. Application programs that refers the external schema must work as before after the conceptual schema undergoes a logical reorganization. Changes to constraints can also be applied to the conceptual schema without affecting the external schema.
Physical data independence is the capacity to cahnge the internal schema without having to change the conceptual (or external) schema. Changes to the internal schema may be needs because some physical files had to be reorganized – for example, by creating additional access structures – to improve the performance of retrieval or update. If the same data as before remains in the database, one should not have to change the conceptual schema. For example, providing an access path to improve retrieval of section records by semestar and year should not require a query such as “ list all section offered in Fall 1991” to be changes, although the query can be executed more efficiently by the DBMS by utilizing the new access path. Because physical data independence refers to the insulation of an application from the physical data storage structure only, it is easier to achieve than logical independence.
Whenever one has multiple-level DBMS, its catalog must be expanded to include information on how to map requests and data among the various levels. The DBMS uses additional software to accomplish these mappings by referring to the mapping information in the catalog. Data independence is accomplished because, when the schema mapping between the two levels is changed. Hence, the three-schema architecture can make it easier to achieve true data independence, both physical and logical. However, the two levels of mapping create an overhead during compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because of this, few DBMS have implemented the full three-schema architecture.
Database Language and Interface
The DBMS must provide appropriate languages and interfaces for each category of users. I will discuss the types of languages and interfaces provided by a DBMS and user categories targeted by each interface.
Once the design of ta database is completed and a DBMS is chosen to implement the database, the first order of the day is to specify conceptual and internal schema for the database and any mapping between the two. In may DBMS where no strict separation of levels is maintained, one language called the database definition language (DDL), is used by the DBA and by the database designers to define both schema. The DBMS will have a DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and store the schema descriptions in DBMS catalog.
In DBMS where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only. Another language, the storage definition language (SDL), is used to specify the internal schema. The mapping between the two schema may be specified in either one of these languages. For a true three-shema-architecture, one would need a third language, the view definition language (VDL), to specify user views and their mappings to the conceptual schema. Once the the database schema is compiled and the database is populated with data, users must have some means for manipulating the database. Typical manipulations include retrieval, insertions, deletion, and modification of data. The DBMS provide data manipulation language (DML) for these purposes.
It is common in current DBMS not to identify the preceding types of languages as distinct; rather, a comprehensive integrated languages can be used that includes constructs for conceptual schema definition, view definition, data manipulation and storage definition. A typical example is SQL relational database language, which represnts a combination of DDL, VDL, DML and Sdl, although the SDL component is now being removed from the language.
There are two main types of DML. A high-level or non procedural DML can be used on its own to specify complex database operations in a concise manner. Many DBMS allow high-level DML statements either to be entered interactively from a terminal or to be embedded in a general-purpose programming language. In the latter case, DML statement must be identified within the program so that hey can be processed by the DBMS. A Low-level or procedural DML must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records from the database and processes each record separately. Hence, it needs to make use of programming language constructs, such as looping, to retrieve and process each individual records from a set of records. Low level DML, such as SQL, can specify and retrieve many records in a single DML statement and are hence called set-at-a time or set-oriented DML. A query in a high-level DML, often specify what is to be retrieved rather than how to retrieve the data; hence, such languages are also called declarative.
Whatever DML commands, whether high-level or low-level, are embedded ina general-purpose programming language, that language is called the host language and the DML is called the data sub-language. In newer DBMS, such as object-oriented systems, the host and data sub-languages typically form one integrated language such as C++. On the other hand, a high-level DML used in a stand-alone interactive manner is called a query language. In general both retrieval and update commands of a high-level DML may be used interactively and are hence considered part of the query language.
Causal end users typically use a high-level query language to specify their requests, wheres programmers use DML in its embedded form. For naïve and parametric users, there usually are user friendly interfaces for interacting with the data base; these can also be used by casual users or others who do not want to learn the details of a high-level query language.