Using
"Universal Data Models" to by Len Silverston & Kent Graziano |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Abstract On many of our data modeling consulting engagements, clients often ask us the same question: "Where can we find a book or paper showing a standard way to model this structure? Surely we are not the first company to model company and address information!" In general, one third of a data model (corporate or logical) consists of common constructs that are applicable to most organizations and the other two thirds of the model are either industry or enterprise specific. This means that most data modeling efforts are at some point re-creating data modeling constructs that have already been built many times before in other organizations. Doesnt it make sense then to have a source from which you can get a head start on your data model so you are not "re-inventing the wheel" each time you are asked to develop a new system? This paper will illustrate some examples of common or "Universal Data Models" related to one subject area and explain how they can be used as a starting point for most data modeling efforts. By using these constructs, both time and money can be saved in systems development efforts. Note that there are many other "Universal Data Models" currently available for other subject data areas and applications in The Data Model Resource Book (see reference at the end of this paper). The need for "Universal Data Models" Currently, data modeling is a well known and accepted method for designing effective databases. Therefore, there is a great need to provide standard templates to enterprises (the term enterprise is used to describe the organizations for whom the models and systems are being developed) so they can refine and customize their data models instead of starting from scratch. Although many standards exist for data modeling, there is an great need to take data modeling to the next step: providing accessibility to libraries of common data model examples in a convenient format. These libraries of models should be able to be used across many different organizations and industries. These "universal data models" can help save tremendous amounts of time and money in the systems development process. A Holistic Approach to Systems Development When systems are built separately, there are separate pools of information for each system. Many of these systems will use common information such as information about organizations, people, geographic locations or products. This means that each separate effort will build and use their own source of information. A huge problem with this approach is that it is almost impossible to maintain accurate up-to-date information since the same type of information is stored redundantly across many systems. In large organizations, it is not uncommon to see information about customers, employees, organizations, products and locations stored in dozens of separate systems. How is it possible to know which source of information is most current or accurate? Another way to approach systems development, is from a perspective that an enterprises systems are connected and, in fact, may be viewed as one interconnected system. From this perspective, there are tremendous benefits to building an enterprise wide framework so that systems can work together more effectively. Part of this enterprise wide framework should include a corporate data model which can assist the enterprise in maintaining one of its most valued assets: information. Since each system or application may use similar information about people, organizations, products and geographic locations, a shared information architecture can be invaluable. The IS (information systems) industry has recognized the need for integrated designs and this is why many corporate data modeling and corporate data warehouse modeling efforts have taken place. Unfortunately, the IS track record for building and implementing corporate data models has been very poor. Enterprises have realized that it takes a tremendous amount of time and resources to build these models. Enter CASE (Computer Aided Systems Engineering) tools. These tools claimed tremendous productivity and time savings when used for corporate wide modeling efforts. While these tools help document the models, unfortunately they do not reduce the time to develop good corporate models. Many enterprises have stopped building corporate data models because of their time constraints. Enterprises are looking at the track record of corporate data modeling and CASE efforts and choosing other alternatives. Enter "data warehousing". Finally, an approach to provide executives with the management information they need, without all the time and expense of corporate data modeling. Enterprises are now extracting the various pieces of information they need directly from their operational systems in order to build decision support systems. The only problem with this approach is that the same problem exists! First of all, the information in the data warehouse may be extracted from several different inconsistent sources. If there are multiple places that customer information is being held, which system represents the most accurate source of information? According to data warehousing principles, the transformation routines are responsible for consolidating and "cleansing" the data. However, if different departments have different needs for various pieces of data, then each department may build their own extracts from the operational systems. One department may transform the information using one algorithm while a different department may use another algorithm. For example, if two departments are extracting sales analysis information, one department may use the order entry system as its source and another department may use the invoicing system as its source. A high level manager may view information from both data warehouses and see inconsistent results, thus questioning the credibility of any of the information. This type of scenario actually compounds the initial problem of many data sources by creating even more "slices of data". Not to say that data warehousing is the wrong approach. It is an ingenious approach which can be used extremely effectively not only to create decision support systems but to build a migration path to an integrated environment. The data warehouse transformation process helps to identify where there are data inconsistencies and data redundancies in the operational environment. However, it is imperative to use this information to migrate to new integrated data structures. The answer is still to build integrated data structures in order to provide good, accurate information. It is also necessary to understand the nature of the data in order to build effective systems. Instead of saying that corporate data modeling or CASE is the wrong approach because it just takes too long, the IS community needs to find a way to make it work effectively. By building common re-usable data structures, the IS community can produce quicker results and move towards integrated structures in both the transaction processing and data warehouse environments. What is the intent of this paper and these models? These models are intended to be a starting point for developing logical data models for an enterprise. Each enterprise will have their own detailed requirements and the models will need to be modified and customized in order to be implemented for a specific enterprise. In addition, the models in this paper can be used to validate an enterprises existing data models. Note that the models presented in this paper are logical data models and not physical database designs. Therefore these model are normalized and may require some denormalization when designing the physical database. This paper does not discuss methodologies for physical database design. Consistent with this point, the logical data models do not include any derived attributes since derived attributes do not add anything to the information requirements of a business. They merely serve to enhance performance of the physical database. The logical data models in this paper represent possible data requirements for enterprises. The models do not include business processing rules that may accompany data models. The data models generally provide all the information that business rules would need, however the reader is advised in many cases that additional business rules need to be developed to supplement their data models. Examples of the need for business rules are provided throughout this paper. The following data models were designed to be of benefit to many different industries and enterprises. These models were picked specifically because they represent very common data constructs that appear (or should appear) in most organizations. Within these models, whenever there was a data modeling decision which may have been dependent on a specific enterprise, the most flexible data modeling option was chosen in order to accommodate many different enterprises. Samples of Common Models for People and Organizations Almost all business applications track information about people and organizations, recording information about customers, suppliers, subsidiaries, departments, employees and contractors redundantly in many different systems. For this reason, it is very difficult to keep key information such as client contact data consistent and accurate. Examples of applications which store information about people and organization include sales, marketing, purchasing, order entry, invoicing, project management and accounting. The following sections discuss some standard data constructs for both organizations and people as well as their related data. Organizations The problem is that an organization may play many roles, depending on the particular circumstance. For instance, in larger companies, internal organizations sell to each other. The property management division may be a supplier to the product sales division. The property management division may also be a customer of the product sales division. In this case, there would normally be both a customer and supplier record, with redundant data, for each of these divisions. Not only could there be a customer and supplier record, but there could be many additional records for the organization depending on how many roles the organization plays within the enterprise. When an organizations information changes, such as a change in address, the information might be updated in only one of the many systems where organization information is stored. This, of course, results in inconsistent information within the enterprise. It may also result in major frustration on the part of managers, customers, suppliers and anyone who might want to get out a correct mailing list! The solution to this redundancy problem is to model an entity called ORGANIZATION which stores information about a group of people with a common purpose such as a corporation, department, division, government agency, or non-profit organization. Basic organizational information such as its name and federal tax ID is stored once within this entity, reducing redundancy of information and eliminating possible update discrepancies. Figure 1 shows the data model for organization information. The ORGANIZATION entity is sub-typed into INTERNAL ORGANIZATION and EXTERNAL ORGANIZATION. An INTERNAL ORGANIZATION is one that is part of the enterprise for whom the data model is being developed and an EXTERNAL ORGANIZATION is not part of that enterprise.
Figure 1 Organization. This model reduces redundancy since the organization name is stored only once, as opposed to storing this information redundantly in a customer entity, supplier entity, department entity or any other entity storing organization information. Note that organizations include not only businesses but other groups of individuals such as departments. For example the accounting and information systems departments would be included as organizations. People For example, John Smith was a good customer of ABC Corporation. John then decided to perform contract labor for ABC Corporation. ABC Corporation liked his work so much that they then hired him as an employee. For most systems, there would be a separate record for John Smith as a customer contact, then as a contractor and then as an employee. However much of John Smiths information has remained the same such as his name, sex, birth date, other demographics and skills. Because John Smiths information is stored in several locations, many systems would have trouble keeping his information accurate and consistent. Another problem is that the same person may have many different roles at the same time. For instance, ABC Corporation is a large company with many divisions. Shirley Jones is an employee and manager of the transportation division. She is also considered a customer for the supplies division. At the same time, she is the supplier for the publishing division who needs her services to transport catalogues. Shirley is therefore an employee of one division, a customer contact of another division, and a supplier contact of another division. Rather than have three separate records for Shirley with redundant information, there should only be one record for Shirley. To address this issue, Figure 2 shows a PERSON entity which stores a particular persons information independent of their jobs or roles. Attributes of the PERSON entity could include sex, birth date, height, weight, and any characteristics which describe the person.
Figure 2 Person.
Just as the ORGANIZATION entity is sub-typed, the PERSON entity is sub-typed into EMPLOYEE and NON-EMPLOYEE. An EMPLOYEE is a person that is employed by the enterprise for whom the data model is being developed and a NON-EMPLOYEE is not employed by that enterprise (employed means that the person is an employee whose federal tax is withheld from their pay check by the enterprise). This model has again helped reduce redundancy since the persons base information is only maintained once, even though the person may play many different roles. The "Party Relationships" section later in this paper will describe how to model the various roles each person and organization can play. Party Definition If person and organization were modeled as separate entities, the data model would be more complex. Each contract, sales order, membership, or transaction that involved either a person or an organization would need two relationships: one to the person and one to the company. Furthermore, these relationships are mutually exclusive and thus form an exclusive arc. For instance, a sales order could either be placed by a person or an organization but a single sales order can not be placed by both a person and an organization at the same time. Therefore, Figure 3 shows a super-entity named PARTY which has as its two sub-types, PERSON and ORGANIZATION. This PARTY entity will enable storage of some of the common characteristics and relationships which people and organizations share.
Figure 3 Party definition. Parties are classified into various categories using the entity PARTY DEFINITION which stores each category into which parties may belong. The possible values for categories are maintained in the PARTY TYPE entity. For example, a type of party may be "minority owned business", "8A business", "woman owned business" "government institute" or "manufacturer". The categorizations of parties can be used to determine if there are any special business considerations for parties, special pricing arrangements, or special terms based upon the type of party. It is also a mechanism for classifying businesses into types of industries for market segmentation. A from date and thru date are included so history can also be tracked since it is possible for the definition to change over time (e.g., businesses may "graduate" from the 8A program). Table 1 shows several party occurrences. This single entity allows other data models to refer to either a person or organization as a party to a transaction. Table 1 Party Definition Data
* The value of this attribute in the entity described is actually a numeric ID. Instead, a description is provided for ease of understanding. Party Relationship Instead of modeling just the roles of the party, there is a need to model the relationship between parties. For example, there is a need to know not only that ACME Company is a customer but that ACME Company is a customer of the ABC subsidiary. By default, this fact also implies that the ABC subsidiary is a supplier of the ACME Company. A relationship is comprised of two parties and their respective roles. For example, customer/supplier, parent/subsidiary and division/department are possible organization relationships. The PARTY RELATIONSHIP entity shown in Figure 4 allows parties to be related to other parties and maintains the respective roles in the relationship. The PARTY RELATIONSHIP entity has attributes of from date and thru date in order to show the valid time frames of the relationship.
Figure 4 Party relationship. The PARTY RELATIONSHIP TYPE entity in Figure 4 consists of a pair of roles which are used to define the nature of a PARTY RELATIONSHIP. Customer/Supplier is a valid pair of roles, while the combination of Customer/Sales Agent roles would not be valid because these roles do not compliment each other (Authorizor/Sales Agent would make more sense). The description attribute describes the nature of a specific relationship. For example, a customer/supplier relationship description may be "where the customer has purchased or is planning on purchasing items or services from the supplier". The PARTY ROLE TYPE entity is a list of possible roles that can be played by the parties within a PARTY RELATIONSHIP TYPE. The two relationships from PARTY ROLE TYPE to PARTY RELATIONSHIP TYPE define the nature of the relationship. To form a customer/supplier relationship there would be two relationships to "customer" and "supplier" instances in the PARTY ROLE TYPE entity. The PARTY RELATIONSHIP STATUS entity defines the current state of the relationship. Examples include "active", "inactive" or "pursuing more involvement". The PARTY PRIORITY entity establishes the relative importance of the relationship to the enterprise. Examples may include "very high", "high", "medium" and "low". Alternatively, an enterprise may choose to use "first", second", "third", and so forth, to prioritize the importance of various relationships.
Figure 5 Party relationship hierarchy example. Figure 5 illustrates the relationships for the organization, ABC Subsidiary. Table 2 shows the data which is stored in the PARTY RELATIONSHIP entity to represent these relationships. The internal organizations are the ABC Corporation, ABC Subsidiary, and ABCs Customer Service Division. The first row shows that ABC subsidiary is a subsidiary of the parent corporation, ABC Corporation. The second row shows that the Customer Service Division is a division of the ABC Subsidiary. The third row shows that ACME Company is a customer of ABC Subsidiary. Notice that the fifth row shows that ABC Subsidiary is a customer of Fantastic Supplies, or in other words, Fantastic Supplies is a supplier for ABC Subsidiary. If Fantastic Supplies was a supplier for all of ABC Corporation, there would be a relationship to the parent company, ABC Corporation instead of to the subsidiary. Table 2 Organization to Organization Party Relationships
Just as organizations have relationships with other organizations, people have relationships with other people. Examples of person to person relationships include peoples reporting structures, peoples mentors, peoples family structures and peoples previous managers. Table 3 shows person to person relationship examples. These relationships are stored in the same entity (PARTY RELATIONSHIP) as organization to organization relationships, however Table 3 breaks out the person to person relationships for ease of understanding. In Table 3, John Smith reported to Harry Johnson in 1995 and currently reports to Jim Biggs. John Smith has as a mentor, Barry Goldstein. Judy Smith is John Smiths daughter. Joe Schmidt is the customer representative who Nancy Barry calls upon to sell her companys products. John Smith is also Barry Cunninghams customer (contact). Table 3 Person to Person Party Relationships
Finally, a person may play any number of roles within an organization. The person may be a employee of an organization, a supplier contact, a customer contact, and so on. Table 4 shows examples of peoples roles within organizations. For Example Nancy Barry, John Smith and William Jones are all employees of ABC Subsidiary. William Jones is not only an employee of ABC Subsidiary but also contracts to Hughes Cargo. Barry Cunningham is a supplier representative for Fantastic Supplier and therefore people can contact him to purchase items from Fantastic Supplies. Joe Schmidt is the customer representative for ACME Company and therefore represents their interests as a customer.
Table 4 Person to Organization Party Relationships
Address Definition
Figure 6 Address definition. Address The ADDRESS entity stores attributes to identify the specific location within the geographic boundary. The address1 and address2 attributes provide a mechanism for two text lines of an address. There may be a need for more address line attributes depending on the needs of the enterprise. The postal code identifies the mailing code that is used for delivery. The directions attribute provides instructions on what roads to travel on and what turns to take in order to arrive at that address. Party Address A particular address may have many people residing there such as when many employees work at the same facility. And, of course, people generally have many addresses; their home address, work address, vacation address, and so on. So there is also a many-to-many relationship between PERSON and ADDRESS. Instead of two separate relationships for people and organizations, the model shows a many-to-many relationship between PARTY and ADDRESS. The many-to-many relationship is resolved via an intersection entity (sometimes referred to as an associative or cross reference entity) named PARTY ADDRESS in Figure 6. Notice that PARTY ADDRESS has a from and thru date which allows the ability to track the address history of parties. With this model, addresses are only stored once, thus eliminating redundant data problems, and can be reused many times in relationship to many parties. Address Role Another way this could be modeled is to include the role in the PARTY ADDRESS entity and have additional cross reference records for each of the address roles. For example, if the same partys address served as a mailing, headquarters and service address, it would be stored as three instances in the PARTY ADDRESS entity. Each instance would have the same party and address ID but would have a different role. The disadvantage of this model is that the PARTY ADDRESS entity has significance on its own. For instance, each PARTY ADDRESS may have telephone and fax numbers associated with it. For this reason, our model shows separate PARTY ADDRESS and PARTY ADDRESS ROLE entities. Table 5 illustrates that ABC Corporations address can be used as the corporate headquarters, central mailing address and legal office. The PARTY ADDRESS ROLE entity provides for the storage of a party address only once with many roles for that partys address. Table 5 Party Address Role
Contact Mechanism Definition The CONTACT MECHANISM entity in Figure 7 stores access numbers for parties. Each CONTACT MECHANISM may be the way to contact either a particular PARTY or PARTY ADDRESS. The intersection entity PARTY CONTACT MECHANISM shows which contact mechanisms are related to which parties or addresses. The model also shows valid roles available via the relationship from PARTY CONTACT MECHANISM to PARTY CONTACT MECHANISM ROLE. The CONTACT MECHANISM TYPE and CONTACT MECHANISM ROLE TYPE are entities which maintain allowable values.
Figure 7 Contact mechanism definition. Contact Mechanism The CONTACT MECHANISM TYPE entity shows all the possible values for types of contact mechanism. Examples include office phone, home phone, office fax, modem, cellular, Internet Address, and other electronic addresses. With technology growing so quickly, it is very likely that there will be many ways to get in touch with someone. The data structure in Figure 7 provides an easy method for adding any new contact mechanisms by simply inserting and using new CONTACT MECHANISM TYPEs. Contact Mechanism Relationships to Party
and Party Address Table 6 gives examples of party contact mechanisms. The first two entries show the phone and fax numbers which are tied to the PARTY ADDRESS for ABC Corporation at 100 Main Street. The third and fourth rows show that John Smiths number at 100 Main Street is (212) 234-9856 but he has another office phone at 345 Hamlet Place. These are both tied to a PARTY ADDRESS. John Smith also has a cellular number which is tied directly to his PARTY instance. Barry Goldstein has an office phone which is tied to one PARTY ADDRESS (his work address) and a home phone which is tied to a different PARTY ADDRESS (his home address). He also has an Internet address which is tied directly to his PARTY instance.
Contact Mechanism Role An example of a party contact mechanism role is that a telephone number may be playing roles as the "primary business contact number" and as the "general information number". Other possible party contact mechanism roles include "customer service number" or "invoicing questions line". In the complex world of today, since there are usually many contact mechanisms, it is very useful to identify the purposes of each contact mechanism. Since the purposes of various contact mechanisms change over time, the from date and thru date identify when the purposes are valid. Summary These "Universal Data Models" can be used to:
While care should be taken when modifying these models, as stated before, one purpose of these models is to provide a starting point for data modelers to work from. If the models are used for this purpose, modifications to the models should be expected and encouraged in order to meet the information requirements of each enterprise. We hope that this is just the beginning of efforts towards "Universal Data Models" and that the Information Systems (IS) industry will continue to develop re-usable models. If the IS industry can develop more common sharable models over time, systems development professionals will be able to shorten systems development cycles and produce higher quality information systems at a reduced cost for the user community at large. References Much of the material in this paper was excerpted by permission from the John Wiley and Sons publication The Data Model Resource Book: A Library of Logical Data Models and Data Warehouse Designs by Len Silverston, W.H. Inmon, and Kent Graziano. The book contains many other common data models (and data warehouse designs) regarding products, orders, shipping, professional services, invoicing, work order management, budgeting, accounting, sales analysis, and human resources, to name a few. There is a CD-ROM which may be ordered to supplement the
book which includes a physical database design of the books logical data models
(including those in this paper) using a very straightforward conversion process. No
denormalization has taken place and each super-types with its sub-types are generally
implemented as a single table. This should provide a good starting point for the physical
database design of these models. The CD-ROM contains SQL DDL generated from Designer/2000.
In addition, Quest has all these models (and more) available in a Designer/2000
repository. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Contact Universal Data Models, info@univdata.com or call (303) 688-1412 to find out how your organization can save time and money by "jump starting" your data modeling and data warehousing efforts. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Data
Model Jump-Start Program | Data Warehouse Jump-Start Program
| The Data Model Resource Book |