udmlogos.gif (4490 bytes)

The Data Model Resource Book
Chapter 1: Introduction


Why Is There a Need for This Book?

On many of our data modeling consulting engagements, our clients ask us the same question: "Where can we find a book showing a standard way to model this structure? Surely, we are not the first company to model company and address information."

Based upon our consulting experiences, most companies develop their data models or data warehouse designs with very little outside reference materials. There is a large cost associated with either hiring experienced consultants or using internal staff to develop this critical component of the system design. Often there is no objective reference material which they can use to validate their data models or data warehouse designs, or seek alternate options for database structures.

In general, one third of a data model (corporate or logical) consists of common constructs that are applicable to most organizations and the other two thirds of the model are either industry- or enterprise-specific. This means that most data modeling efforts are recreating data modeling constructs that have already been created many times before in other organizations.

With this in mind, doesn't it make sense to have a source upon which to get a head start on your data model so you are not "reinventing the wheel" each time a company develops a new system? Organizations can save time and money by leveraging the use of common or universal database structures. Even if a company has data models from its previous systems development efforts, it is very helpful to be able to check the designs against an unbiased source in order to evaluate alternative options.

Although there are a large number of publications which describe how to model data, there are very few compilations of data model examples in published form. This book provides both a starting point and a source for validating data models. It can assist data modelers to minimize design costs and develop more effective database designs.

Who Can Benefit from Reading This Book?
This book can assist many different systems development professionals: data administrators, data modelers, data analysts, database designers, data warehouse administrators, data warehouse designers, data stewards, and corporate data integrators. Systems professionals can use the database constructs contained within this book to increase their productivity and provide a checkpoint for quality designs.

The Need for Universal Data Models
Data modeling has been an art that first gained recognition since Dr. Peter Chen's 1976 article which illustrated his new-found approach called "Entity-Relationship Modeling." Since then it has become the standard approach used towards designing databases. By properly modeling an organization's data, the database designer can eliminate data redundancies which are a key source for inaccurate information and ineffective systems.

Currently, data modeling is a well-known and accepted method for designing effective databases. Therefore, there is a great need to provide standard templates to enterprises (the term enterprise is used to describe the organizations for whom the models and systems are being developed) so they can refine and customize their data models instead of starting from scratch.

Although many standards exist for data modeling, there is a great need to take data modeling to the next step: providing accessibility to libraries of common data model examples in a convenient format. These libraries of models should be able to be used across many different organizations and industries. Such universal data models can help save tremendous amounts of time and money in the systems development process.

A Holistic Approach to Systems Development

One of the largest challenges to building effective systems is integration. Systems are often built separately since there are particular needs at different times within each enterprise. Enterprises have needs to build many systems: contact management systems, sales order systems, project management systems, accounting systems, budgeting systems, purchase order systems, and human resources systems, to name a few.

When systems are built separately, there are separate pools of information for each system. Many of these systems will use common information about organizations, people, geographic locations, or products. This means that each separate system will build and use its own source of information. A huge problem with this approach is that it is almost impossible to maintain accurate up-to-date information since the same type of information is stored redundantly across many systems. In large organizations, it is not uncommon to see information about customers, employees, organizations, products, and locations stored in dozens of separate systems. How is it possible to know which source of information is most current or accurate?

Another way to approach systems development is from a perspective that an enterprise's systems are connected and, in fact, may be viewed as one interconnected system. From this perspective, there are tremendous benefits to building an enterprise-wide framework so that systems can work together more effectively. Part of this enterprise-wide framework should include a corporate data model which can assist the enterprise in maintaining one of its most valued assets: information. Since each system or application may use similar information about people, organizations, products, and geographic locations, a shared information architecture can be invaluable.

The IS (information systems) industry has recognized the need for integrated designs and this is why many corporate data modeling and corporate data warehouse modeling efforts have taken place. Unfortunately, the IS track record for building and implementing corporate data models has been very poor. Enterprises have realized that it takes a tremendous amount of time and resources to build these models.

Enter CASE (Computer-Aided Systems Engineering) tools. These tools claimed tremendous productivity and time savings when used for corporate-wide modeling efforts. While these tools help document the models, unfortunately they do not reduce the time to develop good corporate models.

Many enterprises have stopped building corporate data models because of their time constraints. They are looking at the track record of corporate data modeling and CASE efforts and choosing other alternatives.

Enter data warehousing. Finally, here is an approach to provide executives with the management information they need, without all the time and expense of corporate data modeling. Enterprises are now extracting the various pieces of information they need directly from their operational systems in order to build decision support systems.

The only problem with this approach is that the same problem exists! First of all, the information in the data warehouse may be extracted from several different, inconsistent sources. If there are multiple places that customer information is being held, which system represents the most accurate source of information?

According to data warehousing principles, the transformation routines are responsible for consolidating and cleansing the data. However, if different departments have different needs for various pieces of data, then each department may build its own extracts from the operational systems. One department may transform the information using one algorithm while a different department may use another algorithm. For example, if two departments are extracting sales analysis information, one department may use the order entry system as its source and another department may use the invoicing system as its source. A high-level manager may view information from both data warehouses and see inconsistent results, thus questioning the credibility of any of the information. This type of scenario actually compounds the initial problem of many data sources by creating even more slices of data.

This is not to say that data warehousing is the wrong approach. It is an ingenious approach which can be used extremely effectively not only to create decision support systems but also to build a migration path to an integrated environment. The data warehouse transformation process helps to identify where there are data inconsistencies and data redundancies in the operational environment. However, it is imperative to use this information to migrate to new integrated data structures.

The answer is still to build integrated data structures in order to provide good, accurate information. It is also necessary to understand the nature of the data in order to build effective systems. Instead of saying that corporate data modeling or CASE is the wrong approach because it just takes too long, the IS community needs to find a way to make it work effectively. By building common reusable data structures, the IS community can produce quicker results and move toward integrated structures in both the transaction processing and data warehouse environments.

What Is the Intent of This Book and These Models?
Most data modeling books focus on the techniques and methodologies behind data modeling. The approach behind this book is dramatically different. This book assumes that the reader knows how to model data. Data modeling has been around long enough that most information systems professionals are familiar with this concept and will be able to understand this book. Therefore, this book makes no efforts to teach data modeling principles, except by example. Data modelers can use this book, and their previous experience, to build upon and refine the data model examples contained within the book in order to develop more customized data models. Essentially, it is providing the modeler with fundamental tools and building blocks which can be reused. Therefore, the modeler can be more productive and save a great deal of time by starting with standard data models instead of building data models from scratch.

Furthermore, the reader can also benefit from the data warehouse models which are applicable to decision support environments. This book not only presents examples of data warehouse designs, but also explains in detail how to convert the logical data models to an enterprise-wide data warehouse, then to departmental data marts. The logical data models and data warehouse models presented here are applicable across a wide variety of enterprises.

These models are intended to be a starting point for developing logical and data warehouse data models for an enterprise. Each enterprise will have its own detailed requirements; the models will need to be modified and customized in order to be implemented for a specific enterprise. Since the data warehouse data models reflect actual database designs (as opposed to logical data models), they are even more dependent on the business needs of the specific enterprise wishing to use these models. In addition, the models in this book can be used to validate an enterprise's existing data models.

The models presented in the first part of this book (Chapters 2 through 8) are logical data models, not physical database designs. Therefore, these models are normalized and may require some denormalization when designing the physical database. Methodologies for physical database design are not discussed in this book. Consistent with this point, the logical data models do not include any derived attributes since derived attributes do not add anything to the information requirements of a business. They merely serve to enhance performance of the physical database.

These logical data models represent possible data requirements for enterprises. They do not include many of the business processing rules that may accompany data models. The data models generally provide all the information needed to enforce business rules; however, the reader is advised in many cases that additional business rules may need to be developed to supplement the data models. Examples of the need for business rules are provided throughout this book.

These data models were designed to benefit many different industries and enterprises. They were picked specifically because they represent very common data constructs that appear in most organizations. Within these models, whenever there was a data modeling decision which may have been dependent on a specific enterprise, the most flexible data modeling option was chosen in order to accommodate many different enterprises.

Conventions and Standards Used in This Book
The following section describes the naming standards and diagramming conventions used for presenting the models within this book. Details are described for entities, sub-types, attributes, relationships, foreign keys, physical models, and illustration tables.

Entities
An entity is something of significance about which the enterprise wishes to store information. Whenever entities are referenced throughout the book, they are shown in capital letters. For example, ORDER represents an entity which stores information about a commitment between parties to purchase products. When the name of an entity is used in a sentence in order to illustrate concepts and business rules, it may be shown in normal text. For example, "Many enterprises have mechanisms such as a sales order form to record sales order information."

The naming conventions for an entity include using a singular noun that is as meaningful as possible to reflect the information it is maintaining. Additionally, the suffix TYPE is added to the entity name if the entity represents a classification of information such as an ORDER TYPE (i.e., sales versus purchase order) rather than a specific instance of a real thing such as an ORDER ("order #23987").

The data models in this book include TYPE entities on the diagrams, even though they usually only have a code and description. These entities are included for completeness and to show where allowable values or look-ups are stored.

Entities are included in the data model if it is a requirement of the enterprise to maintain the information included in the entity. For example, if an enterprise doesn't really care about tracking the tasks associated with a shipment, then, even though this information exists in the real world, the data model should not incorporate this information since it does not add value. Entities are represented by rounded boxes. Figure 1.1 shows an example of the entity ORDER.

Sub-types and Super-types
A sub-type, sometimes referred to as a sub-entity, is a classification of an entity which has characteristics such as attributes or relationships in common with the more general entity. INTERNAL ORGANIZATION and EXTERNAL ORGANIZATION are for example, sub-types of ORGANIZATION.

Sub-types are represented in the data modeling diagrams by entities inside other entities. The common attributes and relationships between sub-types are shown in the outside entity which is known as the super-type. The attributes and relationships of the super-type are therefore inherited by the sub-type. Figure 1.2 shows the super-type ORGANIZATION and its sub-types INTERNAL ORGANIZATION and EXTERNAL ORGANIZATION. Notice that the name and federal tax ID apply to both sub-types and are therefore shown at the super-type level of ORGANIZATION. Anything specific to an internal organization would be tied to only that sub-type.

The sub-types within an entity should represent a complete set of classifications (meaning that the sum of the sub-types covers the super-type in its entirety) and at the same time be mutually exclusive of each other. Many times the data model includes an OTHER . . . sub-type to provide for other possible classifications of the entity which may be defined by the enterprise using the model.

While the sub-types represent a complete set of possible classifications, there may be more detailed sub-types which are not included in the data model; instead, they may be included in a TYPE entity. In this case, sub-types are shown in two places on a model: as a sub-type and in a TYPE entity which shows the domain of allowed types for the entity.

Attributes
An attribute holds a particular piece of information about an entity, such as the order date on an order. Attributes are identified in the text of the book by boldface, lowercase letters such as the previous order date example.

Attributes may be either part of the unique identifier of an entity (also referred to as a primary key), mandatory, or optional. The primary key attribute(s) is identified by a '#' sign preceding the attribute name on the diagram. Mandatory attributes are signified by a '*' before the attribute name. Optional attributes have a 'o' before the attribute. Figure 1.3 shows that the ORDER entity has order ID as a primary key attribute, order date as a mandatory attribute, and entry date and shipping instructions as optional attributes.

Certain strings included in an attribute's name have meanings based upon the conventions in Table 1.1.

Relationships
Relationships define how two entities are associated with each other. When relationships are used in the text, they are usually shown in lowercase as a normal part of the text. In some situations, where they are specifically highlighted, they are identified by boldface lowercase letters. For example, manufactured by would be the way a relationship may appear in the text of this book.

Relationships may be either optional or mandatory. A dotted relationship line means that the relationship is optional and a continuous line means that the relationship is mandatory (the relationship is present in all occurrences of each entity). Figure 1.4 shows a relationship that "each PRODUCT must be manufactured by one and only one ORGANIZATION." This means that the manufacturer for each product must be specified. The same relationship has an optional aspect when read in the other direction: "Each ORGANIZATION may be the producer of one or more PRODUCTs."

Relationships may be one-to-one, one-to-many, or many-to-many. This is generally known as the cardinality of the relationship. The presence of a crowsfoot (a three-pronged line which looks like a crow's foot) defines if an entity points to more than one occurrence of another entity. Figure 1.5 shows that "each ORDER may be composed of one or more ORDER LINE ITEMs," since the crowsfoot is at the ORDER LINE ITEM side. The other relationship side states that "each ORDER LINE ITEM must be part of one and only one ORDER." A one-to-one relationship doesn't have any crowsfeet on the relationship and a many-to-many relationship has crowsfeet at both ends of the relationship. Sometimes, one-to-many relationships are referred to as parent-child relationships.

Sometimes the term "over time" needs to be added to the relationship sentence to verify whether the relationship is one-to-many. For instance, an ORDER may only appear to have one ORDER STATUS. However, if status history is required, then each ORDER may be described by one or more ORDER STATUS, over time.

The data models in the book have very few one-to-one relationships, since most of the time one-to-one relationships can be grouped together into a single entity when normalized. The data model diagrams show very few many-to-many relationships, since most of the time many-to-many-relationships are broken out into intersection entities.

Intersection entities are also known as associative entities or cross-reference entities. They are used to resolve many-to-many relationships by cross-referencing one entity to another. Often they include additional attributes which may further delineate the relationship. Figure 1.6 shows a many-to-many relationship between a PRODUCT and an ORGANIZATION which is resolved in this way. The diagram indicates that a PRODUCT may be supplied through more than one ORGANIZATION and an ORGANIZATION may be the seller of more than one PRODUCT. This many-to-many relationship is resolved by the intersection entity PRODUCT SUPPLIER.

Notice that in all the examples given, each relationship has two relationship names associated with it that describe the relationship in both directions. The relationship names should be combined so that they read as a complete sentence, as shown in the following format: "Each ENTITY {must be/may be} relationship name {one and only one/one or more} ENTITY, over time," where the appropriate choices are filled in.

In the models presented, the crowsfeet on the relationships generally point up and to the left in order to provide a consistent mechanism for reading the diagrams. This tends to organize the data models in a more understandable format.

Exclusive arcs are relationships where an entity is related to two or more other entities, but only one relationship can exist for a specific entity occurrence. The exclusive arc is represented by a curved line going through two or more relationship lines with little circles identifying the relationships it covers. Figure 1.7 shows an example of an exclusive arc. The relationships are read as "Each AGREEMENT LINE ITEM PRICE may be either a price specified for one and only one PARTY ADDRESS or a price specified for one and only one GEOGRAPHIC BOUNDARY, but not both."

Recursive relationships are either modeled via a relationship pointing from an entity to itself or via a many-to-many-relationship. Figure 1.8 shows an example of a one-to-many recursion around the PRODUCT entity and a many-to-many recursion which is resolved by the intersection entity PRODUCT COMPONENT.

Foreign Key Relationships
A foreign key is defined as the presence of another entity's (or table's) primary key in an entity (or table). For example, in Figure 1.5 the order ID from the ORDER entity is part of the ORDER LINE ITEM entity; therefore, it is a foreign key. Any one-to-many relationship indicates that the primary key of the entity on the one side of the relationship is brought into the entity on the many side of the relationship. Some data modelers show this foreign key as an attribute of the entity (this is sometimes known as key migration). The data models in this book do not show the foreign keys of entities as attributes since this is redundant. Instead, the relationship itself identifies the foreign key. In Figure 1.5, the order ID is not shown as an attribute in the ORDER LINE ITEM entity since the one-to-many nature of the relationship reveals it is a foreign key. Another diagramming convention in this book is to use a crossed relationship line to indicate that the inherited foreign key is part of the primary key of the child entity. The small horizontal line across the relationship in Figure 1.5 indicates that the horizontal order ID is part of the ORDER LINE ITEM entity primary key.

Physical Models
The data warehouse models and diagrams (Chapters 9 through 12) represent physical database designs for decision support environments; therefore, the notation is slightly different. Since these models represent physical database designs, each square (not rounded as in the entity) box represents a table and the field names are columns. All the lines connecting tables in the model are solid lines and the foreign keys of the relationships are now shown in each appropriate table as columns since these columns will ultimately be implemented.

The naming convention for tables is to keep them plural and the column names have underscores instead of blanks between them. (The entities had singular names and attributes allowed blank spaces between the names.)

Conventions Used for Illustration Tables
Many parts of the data models are illustrated via tables which contain possible values for attributes. Each illustration table is normally defined to show a specific entity and the relevant information from related entities. For instance, there may be a table illustrating the ORDER LINE ITEM entity as shown in Table 1.2.

In order to illustrate the details of an ORDER LINE ITEM, Table 1.2 brings in some attribute information from the ORDER entity. Notice that there is an asterisk on the order date column of the table. The asterisk indicates that even though the table is designed to illustrate ORDER LINE ITEM information, the order date information resides in an entity other than the main entity being illustrated and is included for context.

Whenever data from each illustration table is referenced in the text of this book, it is surrounded by double quotes. For instance, the text may refer to specific order "12930", line item seq "1" which has a comment of "Need this item urgently".

The Companion CD-ROM
Within this book and its appendices, are very detailed descriptions of the models discussed. The diagrams lay out all the relationships, the mandatory attributes and columns, the primary keys, and even include some optional attributes. The appendices include the physical details for the attributes and columns, such as the datatype and size. With this information, it would be possible for a data modeler or database designer to recreate these models in the tool of their choice or write the SQL code to build them in a database.

This, however, would take a substantial amount of time and opens the possibility of data entry errors. To assist those interested in quickly implementing the models discussed in the following pages, a companion CD-ROM is available for sale separately. On the CD-ROM is a series of SQL scripts derived directly from the models in the book. All the entities, attributes, tables, and columns discussed are implemented with this code. Scripts are provided for Oracle, Informix, Sybase SQL Server, and Microsoft SQL Server. Each of these scripts has been tested and verified against the above databases. Since these are standard SQL scripts, they should work with not only the current versions of these database management systems but also with future versions (i.e. Oracle8). There are also ANSI standard SQL scripts which could be used with other ANSI compliant databases.

Since the CR-ROM includes standard SQL scripts, they should work with not only the current versions of these databse management systems but also with future versions. This includes object-relational databases (i.e. Oracle8, Informix Universal Server) which should continue to support relational designs. The constructs in the book are, of course, also generally applicable to any relational or object-relational database.

Use of the scripts on the CD-ROM will allow an enterprise to more rapidly deploy the models presented in this book. In addition to the time savings, there is obviously a cost savings as well (nobody has to type in all the definitions or write SQL scripts). Once the scripts have been run, the models could be reverse-engineered into the enterprise's favorite CASE tool (most popular CASE tools provide a reverse engineering feature). Once the models have been brought into a repository, they are easily accessible and may be customized for a specific enterprise's needs. Additionally, they can be used to jump-start the development of corporate data models, new applications, data warehouse designs, or decision support systems.

The remainder of this book will provide many examples of universal data models and data warehouse designs which can assist in increasing the productivity of system development efforts.

Contact Universal Data Models, info@univdata.com or call (303) 688-1412 to find out how your organization can save time and money by "jump starting" your data modeling and data warehousing efforts.

Data Model Jump-Start ProgramData Warehouse Jump-Start Program | The Data Model Resource Book
Publications | Seminars and Training
Universal Data Models |
About Universal Data Models | Contact Us