MatBase algorithm for translating (E)MDM schemes into E-R data models
1. Introduction
All subuniverses of discourse are governed by business rules (e.g., ”nobody may die before birth”, ”nobody may get married before birth”, ”people’s height is between 30 and 225 cm”, etc.). A database (db) instance is useful only if it is plausible, i.e., only if its data does not violate any business rule governing the corresponding subuniverse of discourse. In dbs and software
engineering, business rules are formalized as constraints, i.e., closed formulas of the first-order predicate calculus with equality.
Our (Elementary) Mathematical Data Model ((E)MDM) [1] was introduced as an intermediate conceptual design level between the Entity-Relational Data Model (E-RDM) [2, 3, 4] and the Relational Data Model (RDM) [4, 5, 6]. The E-RDM includes the very powerful E-R Diagrams (E-RDs) that may be easily understood even by customers without computer science background but, despite its many extensions, it has a very limited set of constraints. The RDM has only six constraint types (domain/range, not null, default values, unique keys, foreign keys, and tuple/check). The (E)MDM has 76 constraint types, thus allowing for far more accurate conceptual data modeling and db design; they include, explicitly or implicitly, all 6 relational type ones; the non-relational type ones should be enforced by the software applications (sa) that manage the corresponding dbs.
In [4], we defined E-R data models as triples <ERDS, ARS, ICSD>, where ERDS is a set of E-RDs, ARS is an Associated Restriction Set, and ICSD is an Informal Corresponding Sub universe Description. The associated restrictions, which correspond to the business rules governing the modeled sub-universes, are of the following five types: inclusions between object sets (e.g., EMPLOYEES ⊆ PERSONS), ranges of the attributes (e.g., Month between 1 and 12), compulsory (not null) attribute values (e.g., BirthDate compulsory), minimal uniqueness of attributes (e.g., SSN unique) and attribute concatenations (e.g., Room • Weekday • StartHour unique), and other restriction types (e.g., “no teacher may be simultaneously present in two classrooms”).
An E-RD may be single, i.e., consisting only of an entity or relationship type set and its attributes, or structural, i.e., containing any number of sets, the structural functions (i.e., binary functional relationships) relating them, and no attributes. The single ones for relationship-type sets also contain all the sets over which the relationship is defined.
2
MatBase [7, 8] is our intelligent data and knowledge base management system prototype, based on both (E)MDM, E-RDM, and RDM. It includes 3 graphic user interfaces (GUI), one for each data model, and automatically translates between them, generates corresponding sa GUIs and code for enforcing a vast majority of the (E)MDM constraint types.
In our previously published paper [9], we presented and discussed the forward MatBase algorithm for translating E-R data models into (E)MDM schemes. The algorithm presented in this paper is its dual, a reverse engineering type one. Its necessity arises from the following couple of reasons:
✓ Reflecting updates of a(n) (E)MDM schema performed after obtaining it from an E-R data model.
✓ Extracting only an E-R data model subset.
The algorithm has two optional input parameters: an object set name of the current db, and a natural radius. If the first one is not null and the radius is either null or 0, then the algorithm outputs the E-R data model of the corresponding object set (whose E-RD is a single one); for any other natural value n of the radius, it outputs the E-R data sub-model centered in the desired object set and having radius at most n (i.e., whose E-RD is a structural one having length at most n for at least one path starting from the desired object set); if the first parameter is null, then it outputs the whole db E-R data model (and the radius is ignored if it is not null).
After the literature review, the third Section introduces and characterizes the pseudocode algorithm used by MatBase to translate (E)MDM schemes into E-R data models, then presents and discusses the results of applying this algorithm to an (E)MDM scheme from [10, 11]. The paper ends with conclusions, recommendations, and a reference list.
2. Related work
Only MatBase manages (E)MDM schemes.
Quest has over 30 years of its erwin Data Modeler[12] success. For example, it may reverse engineer both relational (e.g., MS SQL Server, Oracle Database, IBM Db2, Sybase SQL Anywhere, SQLite) and noSQL (e.g., mongoDB, cassandra, MariaDB, neo4j) schemas into corresponding E-RDs.
Related algorithms are, however, used by all major commercial RDBMSes for translating relational schemas into E-R data models-like ones.
For example, the MS SQL Server Management Studio [13] provides a Database Diagram menu, part of the Visual Database Tools. You can create, update, and delete relational diagrams corresponding to the tables of a db. These diagrams are not “standard” E-RDs but are equivalent to them: nodes are rectangles corresponding to underlying db tables that can be thought of as single E-RDs, as their attributes are listed inside them; edges are the relations between tables, as established by the foreign keys, so that they are equivalent to structural functions and E
RDs. As the RDM is a syntactical one, no relationship-type nodes are present, all of them being of entity-type. This is consistent with our Theorem stating that the only fundamental mathematical relations for conceptual data modeling are the functions [14].
Similarly, the Oracle SQL Developer provides a Data Modeler tool [15]. The IBM Db2 Data Management Console [16], which is the equivalent of both MS SQL Server Management Studio and Oracle SQL Developer does not provide graphical tools. They are available from third parties, such as Dataedo [17].
Other related approaches to MatBase are based on business rules management (BRM) [18, 19] and their corresponding implemented systems (BRMS) and decision managers (e.g., [20 – 22]). From this perspective, (E)MDM is also a formal BRM, and MatBase is an automatically code generating BRMS.
3. Research Methodology
Any (E)MDM schema [1] is a quadruple DKS = <S, M, C; P>, where S is a finite non empty poset of sets ordered by inclusion, M is a finite non-empty set of mappings defined on and taking values from the sets of S, C is a finite non-empty set of constraints (i.e. closed Horn clauses of the first-order predicate logic with equality) over sets in S and/or mappings in M , and P is a finite set of Datalog¬ programs, also over sets in S and mappings in M. Whenever P is empty, DKS is a db scheme, otherwise it is a knowledge base one. In the context of this paper, we are only interested in db schemes.
(S, ⊆) is a poset of sets, with S = Ω ⊕ V ⊕ *S ⊕ SysS, where Ω = E ⊕ R (the non-empty collection of object sets), where E is a non-empty collection of atomic entity-type sets (e.g., PEOPLE, COUNTRIES, PRODUCTS), R is a collection of relationship-type sets (e.g., NEIGHBOUR_ COUNTRIES ⊂ COUNTRIES × COUNTRIES, STOCKS ⊂
PRODUCTS×WAREHOUSES); V is a non-empty collection of value sets (e.g., SEXES = {“F”,”M”}, [0,16] ⊂ NAT(2), ASCII(32) ⊂ ASCII(n), [1/1/100, Today()] ⊂ DATETIME, with NAT(n) being the subset of naturals of at most n digits, ASCII(n) the subset of the freely generated monoid over the ASCII alphabet only including strings of maximum length n, etc.); *S is a collection of computed sets (e.g., MALES, FEMALES, UNPAID_BILLS), and SysS is a collection of system sets (e.g., ∅ (the empty set), NULLS (the distinguished countable set of null values), BOOLE = {true, false}, NAT(n), ASCII(n), DATETIME (the set of date and time values)).
In RDM schemata, generally, the object sets are tables, the computed sets are views (queries), the system sets are corresponding db management system (RDBMS) data types, and the value sets are their needed subsets.
The set of mappings (functions) is M = A ⊕ F ⊕ *M ⊕ SysM, where A ⊂ Hom(S – SysS ⊕V, V) is the non-empty set of attributes(e.g., x : PEOPLE ↔ NAT(10), FirstName : PEOPLE
5
→ ASCII(64), BirthDate : PEOPLE → [1/1/-6000, Today()], Sex : PEOPLE → {“F”, “M”), Amount : UNPAID_BILLS → (0, 100000], where ↔ denotes injections (one-to-one functions) and x is our notation for object identifiers, i.e., functions to be implemented as autonumber surrogate primary keys); F ⊂ Hom(S – SysS ⊕V, S – SysS ⊕ V) is the non-empty set of structural functions (e.g., BirthPlace : PEOPLE → CITIES, Capital : COUNTRIES ↔ CITIES); *M is the set of computed mappings (e.g., BirthCountry = Country ° BirthPlace : PEOPLE → COUNTRIES, Mother • Father : PEOPLE → (PEOPLE ∪NULLS)2), and SysM is the set of system mappings (e.g., 1S (the unity function of a set S), card(S) (the cardinal of a set S), Im(f ) (the image of a function f, i.e., the set of the values it takes), dom(f) and codom(f) (the domain and codomain of f, respectively), isNull(x,y) (which returns x if it is not null, and y otherwise)).
In RDM schemata, the attributes, structural functions, and computed mappings are implemented as table or/and view (query) columns, with structural functions being foreign keys.
The set of constraints is C = SC ⊕ MC ⊕ OC ⊕ SysC, where the set constraints SC contains both general ones (e.g., inclusion, equality, disjointness) and dyadic relation ones (e.g., reflexivity, symmetry, transitivity); the mapping constraints MC contains the general ones (e.g., totality, one-to-oneness, ontoness), dyadic-type self-map ones, general mapping products ones (e.g., minimal one-to-oneness, variable geometry keys and their subkeys, existence), dyadic-type homogeneous binary product ones, function diagram ones (e.g., commutativity, anti-commutativity, representative systems); the set OC of object constraints contains any other closed Horn clauses not among the above, and the system constraints SysC includes all needed mathematical constraints (e.g., card(∅) = 0, ∅ ⊆ S, ∅ ∩ S = ∅, S ∩ S = S ∪ S = S , ∀S∈S).
Constraints are not absolute, but relative to the corresponding subuniverses: for example, constraint C1 from subsection 3.2 goes not generally hold (e.g., there are several cities called “Paris” in the U.S.).
Structural E-RDs are directed graphs having as nodes any type of sets except for the value type ones, and structural functions as edges. Our version of E-RDs [4] uses arrows (i.e., the standard mathematical convention for mappings) instead of diamonds for any binary functional relationship.
In (E)MDM, as a shorthand, the domain of attributes may be omitted if they are listed immediately after their domain set name, and one-to-one mappings use double arrows (e.g., see attribute Country ↔ ASCII(255) from subsection 3.2).
The complexity of an algorithm is denoted by O(e), where e is an algebraic expression and means that the algorithm never loops infinitely and ends in a number of steps that is a multiple of e. Generally, an algorithm is sound if it returns only true answers, complete if it accepts any valid inputs, and optimal (efficient) if it performs in the minimal number of steps possible for any input. Moreover, we say that an algorithm is semi-optimal when, even if it visits more than once an object, it processes any object only once.
In our context, soundness means that all object and computed sets are translated into same type of sets (i.e., rectangles or diamonds, respectively), attributes into ellipses, structural functions into arrows (all of the above being dotted for computed objects), constraints into restrictions, and comments into informal description texts, while completeness means that all valid (E)MDM schemes are correspondingly translated.
In this paper, “mapping” and “function” are used interchangeably. Mapping totality is translated into RDM by a corresponding not null constraint.
4. The REA2 Algorithm
The reverse engineering pseudocode algorithm REA2 used by MatBase for translating (E)MDM schemes into E-R data models is shown in Figures 1 to 10. The dependencies between its 10 methods are presented in Figure 11.
Obviously, value-type sets are not figured in E-RDs: only object or computed ones (which are drawn in dotted lines) are.
5. Results: two reverse engineering examples
Here is an example of an (E)MDM schema – a subset of the GENEALOGIES sub-universe studied in [10, 11]:
COUNTRIES
x ↔ NAT(3) total
Country ↔ ASCII(255) total (There may not be 2 countries having same name.)
CITIES
x ↔ NAT(6) total
City → ASCII(255) total
Country : CITIES → COUNTRIES total
Capital : COUNTRIES ↔ CITIES (No city may simultaneously be the capital of 2 countries.)
C1: City • Country key (There may not be 2 cities with a same name in a same country.) C2: Country ° Capital reflexive (The capital city of any country must belong to that country.)
DYNASTIES
x ↔ NAT(8) total
Dynasty ↔ ASCII(255) total (There may not be 2 dynasties having same name.) Country : DYNASTIES → COUNTRIES total
TITLES
x ↔ NAT(2) total
Title ↔ ASCII(32) total (There may not be 2 titles having same name.) RULERS
x ↔ NAT(16) total
Name → ASCII(255) total
Sex → {‘M’, ‘F’, ‘N’} total (‘N’ is for non-persons)
BirthYear → [-6500, CurrentYear()]
PassedAwayYear → [-6500, CurrentYear()]
URL → ASCII(255)
Age = isNull(PassedAwayYear, CurrentYear()) – BirthYear
Mother : RULERS→ RULERS acyclic (Nobody may be his/her maternal ancestor or descendant.)
Father : RULERS→ RULERS acyclic (Nobody may be his/her paternal ancestor or descendant.)
KilledBy : RULERS→ RULERS
Dynasty : RULERS→ DYNASTIES
Title : RULERS→ TITLES
Founder : DYNASTIES ↔ RULERS (Nobody founds more than one dynasty.) BirthPlace : RULERS→ CITIES
Nationality : RULERS→ COUNTRIES
PassedAwayPlace : RULERS → CITIES
C3: Name • Dynasty • BirthYear key (There may not be two persons of a same dynasty born in a same year and having a same name.)
C4: Dynasty ° Founder reflexive (The founder of any dynasty must belong to that dynasty.) C5: (∀x∈RULERS)(Dynasty(x) ∉ NULLS ∧ Founder(Dynasty(x)) ∉ NULLS ∧ BirthYear(Founder(Dynasty(x))) ∉ NULLS ⇒ PassedAwayYear(x) ∈ NULLS ∨ PassedAwayYear(x) > BirthYear(Founder(Dynasty(x))))
(Nobody may belong to a dynasty that was founded after his/her death). C6: (∀x∈RULERS)(0 ≤ Age(x) ≤ 140) (Anybody’s age must be a natural at most equal to 140.) C7: (∀x∈RULERS)(Sex(Mother(x)) = ‘F’) (Mothers’ sex must be ‘F’.)
C8: (∀x∈RULERS)(Sex(Father(x)) = ‘M’) (Fathers’ sex must be ‘M’.)
C9: (∀x∈RULERS)(Sex(x) = ‘N’ ⇒ Mother(x) ∈ NULLS ∧ Father(x) ∈ NULLS ∧ Dynasty(x) ∈ NULLS ∧ KilledBy(x) ∈ NULLS) (Non-persons may not have parents or killers or belong to dynasties.)
C10: (∀x,y∈RULERS)(x ≠ y ∧ Mother(x) = Mother(y) ⇒ Name(x) ≠ Name(y)) (No mother gives a same name to 2 of her children.)
C11: (∀x,y∈RULERS)(x ≠ y ∧ Father(x) = Father(y) ⇒ Name(x) ≠ Name(y)) (No father gives a same name to 2 of his children.)
C12: (∀x∈RULERS)(BirthYear(x) ∉ NULLS ∧ Mother(x) ∉ NULLS ∧ BirthYear(Mother(x))
∉ NULLS ⇒ 5 ≤ BirthYear(x) – BirthYear(Mother(x)) ≤ 75 ∧
(PassedAwayYear(Mother(x)) ∉ NULLS ⇒ (BirthYear(x) ≤ PassedAwayYear(Mother(x)))) (Women may give birth only between 5 and 75 years, and before passing away.) C13: (∀x∈RULERS)(BirthYear(x) ∉ NULLS ∧ Father(x) ∉ NULLS ∧ BirthYear(Father(x)) ∉ NULLS ⇒ 9 ≤ BirthYear(x) – BirthYear(Father(x)) ≤ 100 ∧ (PassedAwayYear(Father(x)) ∉ NULLS ⇒ (BirthYear(x) ≤ PassedAwayYear(Father(x)) + 1)) (Men may have children only between 9 and 100 years, and at most one year after passing away.) C14: (∀x∈RULERS)(PassedAwayYear(x) ∉ NULLS ∧ KilledBy(x) ∉ NULLS ∧ BirthYear(KilledBy(x)) ∉ NULLS ⇒ BirthYear(KilledBy(x)) + 3 ≤ PassedAwayYear(x) ≤ isNull(PassedAwayYear(KilledBy(x)), BirthYear(KilledBy(x)) + 140)) (Any killer of a person must have been alive and at least 3 years old when his/her victim was killed.) MARRIAGES
x ↔ NAT(16) total
MarriageYear → [-6500, CurrentYear()]
DivorceYear → [-6500, CurrentYear()]
Husband : MARRIAGES → RULERS
Wife : MARRIAGES → RULERS
C15: Husband • Wife • MarriageYear key (Nobody may marry a same person twice in a same year.)
C16: Husband • Wife • DivorceYear key (Nobody may divorce a same person twice in a same year.)
C17: (∀x∈MARRIAGES)(MarriageYear(x) ∉ NULLS ⇒ DivorceYear(x) ∉ NULLS ∨ DivorceYear(x) ≥ MarriageYear(x)) (Nobody may divorce somebody before marrying him/her.)
C18: (∀x∈MARRIAGES)(Sex(Wife(x)) = ‘F’) (Wives’ sex must be ‘F’.)
C19: (∀x∈MARRIAGES)(Sex(Husband(x)) = ‘M’) (Husbands’ sex must be ‘M’.) C20: (∀x∈MARRIAGES)(MarriageYear(x) ∉ NULLS ⇒ ((BirthYear(Husband(x)) ∈ NULLS ∨ BirthYear(Husband(x)) ≤ MarriageYear(x)) ∧ (PassedAwayYear(Husband(x)) ∈ NULLS ∨ PassedAwayYear(Husband(x)) ≥ MarriageYear(x))) ∧ (BirthYear(Wife(x)) ∈ NULLS ∨ BirthYear(Wife(x)) ≤ MarriageYear(x)) ∧ (PassedAwayYear(Wife(x)) ∈ NULLS ∨ PassedAwayYear(Wife(x)) ≥ MarriageYear(x)))) (Nobody may marry before being born or after death.)
C21: (∀x∈MARRIAGES)(DivorceYear(x) ∉ NULLS ⇒ ((BirthYear(Husband(x)) ∈ NULLS ∨ BirthYear(Husband(x)) ≤ DivorceYear(x)) ∧ (PassedAwayYear (Husband(x)) ∈ NULLS ∨ PassedAwayYear(Husband(x)) ≥ DivorceYear (x))) ∧ (BirthYear(Wife(x)) ∈ NULLS ∨ BirthYear(Wife(x)) ≤ DivorceYear(x)) ∧ (PassedAwayYear(Wife(x)) ∈ NULLS ∨ PassedAwayYear(Wife(x)) ≥ DivorceYear(x)))) (Nobody may divorce before being born or after death.)
REIGNS
x ↔ NAT(20) total
FromY → [-6500, CurrentYear()] total
ToY → [-6500, CurrentYear()]
Ruler : REIGNS → RULERS total
Country : REIGNS → COUNTRIES total
Title : RULERS→ TITLES
C22: Ruler • Country • FromY key (It does not make sense to store twice that a ruler began ruling a country during a year.)
C23: Ruler • Country • ToY key (It does not make sense to store twice that a ruler ended ruling
a country during a year.)
C24: (∀x∈REIGNS)(ToY(x) ∈ NULLS ∨ ToY(x) ≥ FromY(x)) (No reign may end before starting.)
C25: (∀x∈REIGNS)((BirthYear(Ruler(x)) ∉ NULLS ⇒ BirthYear(Ruler(x)) ≤ FromY(x)) ∧ (PassedAwayYear(Ruler(x)) ∉ NULLS ⇒ ToY(x) ∉ NULLS ∧ PassedAwayYear(Ruler(x)) ≥ ToY(x)) (Nobody may reign before being born or after death.)
C26: (∀x,y∈REIGNS)(x ≠ y ∧ Country(x) = Country(y) ∧ (FromY(y) ≥ FromY(x) ∧ FromY(y) ≤ isNull(ToY(x), CurrentYear() ∨ FromY(x) ≥ FromY(y) ∧ FromY(x) ≤ isNull(ToY(y), CurrentYear()) ⇒ (Father(Ruler(y)) = Ruler(x)) ∨ Father(Ruler(x)) = Ruler(y) ∨ Mother(Ruler(y)) = Ruler(x)) ∨ Mother(Ruler(x)) = Ruler(y))) ∨ (∃z∈MARRIAGES) (Husband(z) = Ruler(x) ∧ Wife(z) = Ruler(y) ∨ Husband(z) = Ruler(y) ∧ Wife(z) = Ruler(x))) (No country may be simultaneously ruled by 2 persons, except for cases where the two were married or parent and child.)
Figure 12 presents the corresponding single E-RD obtained by applying algorithm REA2 to this (E)MDM schema with parameter values S = “RULERS” and r = 0 (in fact, the self-maps Mother, Father and KilledBy are not generated for this E-RD, but for the one in Figure 13: we moved them here in this paper as Figure 13 would otherwise become too complicated).
The corresponding generated Restriction Set is the following:
a. max(card(RULERS) = 1016
b. Data ranges:
Name: ASCII(255)
Sex: {‘M’, ‘F’, ‘N’}
BirthYear: [-6500, CurrentYear()]
PassedAwayYear: [-6500, CurrentYear()]
URL: ASCII(255)
c. Compulsory data: x, Name, Sex
d. Uniqueness:
Name • Dynasty • BirthYear (There may not be two persons of a same dynasty born in a same year and having a same name.)
Founder (Nobody founds more than one dynasty.)
e. Other types of restrictions:
f. Age = isNull(PassedAwayYear, CurrentYear()) – BirthYear
g. Mother acyclic (Nobody may be his/her maternal ancestor or descendant.) h. Father acyclic (Nobody may be his/her paternal ancestor or descendant.) i. C3 to C14 (that we do not duplicate here, to obey paper length limitations). The corresponding informal description is trivial: “The set of RULERS stores properties x,
its object integer identifier, Name, an ASCII string of at most 255 characters, etc.”. Figure 13 shows the structural E-RD obtained when running REA2 with S = “RULERS” and r > 0 or with null values for both S and r. The corresponding restriction set for RULERS is exactly the one given above; for the rest of the sets, it is similar. This goes the same for the corresponding informal description.
6. Discussion
Proposition
REA2 is:
(i) linear, having complexity O(n + a + m + c), where n is the total number of the (E)MDM schema non-value sets, a is the total number of their attributes, m of their structural functions (including the roles of the relationship-type sets), and c is the total number of its constraints; (ii) sound;
(iii) complete;
(iv) semi-optimal.
Proof
(i) For r = 0, trivially, n ≤ 1, m = 0, while a and c are the number of attributes defined on S and the one of the constraints involving S, respectively (added by the only one executions of methods addSet, addAttributes, and addFunctRestrictions, when S is a known set; otherwise, they are 0 as well); for S known and r > 0, the number of the corresponding non-value-type sets of the sub-model is s ≤ n = card(Nodes(M)), the one of structural functions is f ≤ m = card(Edges(M)), while a and c are the sums of the number of attributes defined on the sets of the corresponding sub-model and the sum of the constraints involving these sets, respectively;
finally, when S is null, s = n, f =m, while a and c are the sums of the number of attributes defined on the sets of the corresponding (E)MDM schema and the sum of its constraints , respectively. It is very easy to verify that the only loops of REA2 and of its method computeCardinal are executed exactly n times, the one from method addStructuralFunctions exactly for the number of such functions defined on its parameter, the one from addAttributes, similarly, exactly for the number of such functions defined on its parameter, and the one from addSet exactly for the number of constraints that involve its parameter and that were not previously converted to the corresponding restrictions.
The methods otherSet, addFunctRestrictions, addRelationshipType and addEntityType have no loops.
Finally, the outer while loop of method subModel is executed at most n times; before its first execution, the S_A array is initialized with the set name S and its level 0 (in the above example, S = “RULERS”); in each iteration, its inner while loop is executed exactly the number of times that there are sets on the previous level; in the first iteration it is executed exactly once, for S; its first inner loop is executed, for each iteration, exactly the number of times the current set has structural functions defined on it (e.g., for “RULERS”, 5 times, discovering and storing in S_A the sets named “DYNASTIES”, “CITIES”, “COUNTRIES”, and “TITLES”); its second inner loop is executed, for each iteration, exactly the number of times the current set has structural functions taking values from it (e.g., for “RULERS”, 4 times, discovering and storing in S_A the sets named “MARRIAGES” and “REIGNS”); as, its final loop is executed at most n times, method subModel might need at most 2n steps to finish.
Consequently, REA2 never loops infinitly and is linear in the sum n + a + m + c. (ii) As any entity-type set is translated into a rectangle, any relationship-type one into a diamond, any attribute into an ellipsis, any structural function into an arrow, any constraint into a corresponding restriction, and any comment into an informal description text, REA2 is sound.
(iii) As any well-formed (E)MDM schema (i.e., only consisting of sets, mappings, constraints, and comments) is translated into corresponding E-R data models, REA2 is complete.
(iv) For S null and S not-null but r = 0 or null, REA2 is optimal, as it reads and processes each non-value-type set, function, and constraint only once. For S not-null and r > 0, it is only semi-optimal, as it processes them only once, but reads the corresponding sub-model non-value sets twice (once for discovering and storing them into the S_A array, and a second time to translate them), some of the structural functions twice as well (once as defined on the current set in the inner while loop and a second time as taking values from it), and each constraint the number of times equal to the number of sets involved. Q.E.D.
As it may be seen from the above example, for such small (E)MDM schemas, it is not optimal to ask for sub-models: all 7 object sets are discovered for r = 1, so that it would be simpler to ask for the complete model instead, as even this smallest sub-model is equal to the whole one, which can be obtained faster.
However, for commercial dbs with hundreds of object and computed sets, this feature is essential, as, on one hand, updates or/and extensions of current models are generally involving only a handful of related sets and, on the other, printing and analyzing the whole structural E RD for them is almost impossible.
Please note that method subModel stops execution of its outer while as soon as no more sets are discovered in an iteration, which is discovered whenever at the beginning of the following one the variables j and oldj have same value. This means, for example, that running the algorithm with S = “RULERS” and r having any other natural value greater than 1 will stop exactly when the one for r = 1 stops.
The actual implementations of REA2 in MatBase are smarter and faster than the pseudocode algorithm presented in subsection 3.1, by using embedded SQL over its meta-catalog. For
example, in method subModel the S_A array is replaced by a temporary table with same name and the two inner loops are replaced by the execution of the following MS VBA statement (for the structure of the meta-catalog tables SETS and FUNCTIONS, please see [8, 23]): DoCmd.RunSQL “INSERT INTO S_A SELECT SetName, “ & i & “ FROM SETS INNER JOIN FUNCTIONS ON SETS.[#S] = FUNCTIONS.Codomain WHERE SetType <> ‘V’ AND Domain IN (SELECT [#S] FROM SETS INNER JOIN S_A ON SETS.SetName = S_A.Set WHERE len = “ & i – 1 & “) AND SetName NOT IN (SELECT Set FROM S_A) UNION SELECT SetName, “ & i & “ FROM SETS INNER JOIN FUNCTIONS ON SETS.[#S] = FUNCTIONS. Domain WHERE Codomain IN (SELECT [#S] FROM SETS INNER JOIN S_A ON SETS.SetName = S_A.Set WHERE len = “ & i – 1 & “) AND SetName NOT IN (SELECT Set FROM S_A)”;
7. Conclusions and further work
To sum up, we presented a linear, sound, complete, and semi-optimal pseudocode algorithm for translating E-R data models into (E)MDM schemes, used by both versions of our intelligent DBMS prototype MatBase. Obviously, this algorithm may be also manually used by db and/or software architects.
We provided an example of applying it to a genealogical tree sub-universe. We also described some powerful additional features of its actual implementations that are aimed at obtaining the fastest possible execution speed.
We successfully used this algorithm for years during our classes of Advanced Database and Software Application Architecture and Design for M.Sc. students with both Computer Science Taught in English stream of the Department of Engineering in Foreign Languages from the Bucharest Polytechnic University and Informatics of the Department of Mathematics and Informatics from the Ovidius University at Constanta, Romania.
We would warmly recommend using our (E)MDM during both db scheme design, db software application architecture, and their maintenance and extensions, as it guarantees the highest possible db data quality. We are convinced that even this small example from subsection 3.2 proves its power and elegance, as these twin fields are, in fact, applied mathematical naïve set, relation, and function theory, plus first-order predicate calculus with equality.
Our next paper will provide the algorithm used by MatBase to translate (E)MDM schemas into RDM ones and associated sets of non-relational constraints –that must be enforced by the software applications managing those dbs–, which is the next step towards guaranteeing the highest possible db stored data quality. For the last step, i.e., the architecture and design of db software applications, we would warmly recommend our db constraint-driven approach [11] .
Acknowledgement
Nobody other than its authors contributed to this paper.
Funding Support
This work was not sponsored by anybody.
Ethical Statement
This study does not contain any studies with human, or animal subjects performed by any of the authors.
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
References
[1] Mancas, C. (2024). The (Elementary) Mathematical Data Model revisited. Primera Scientific Engineering, 5(4), 78–91. https://doi.org/10.48550/arXiv.2408.08367 [2] Chen, P.P. (1976). The entity-relationship model. Toward a unified view of data. ACM TODS, 1(1), 9–36. 10.1145/320434.320440
[3] Thalheim, B. (2000). Entity-Relationship Modeling: Foundations of Database Technology. Springer-Verlag.
[4] Mancas, C. (2015). Conceptual Data Modeling and Database Design: A Completely Algorithmic Approach. Volume 1: The Shortest Advisable Path. Apple Academic Press. [5] Codd, E.F. (1970). A relational model for large shared data banks. CACM 13(6), 377–387. 10.1145/362384.362685
[6] Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundation of Databases. Addison-Wesley. [7] Mancas, C. MatBase – a tool for transparent programming while modeling data at conceptual levels. Proc. Int. Conf. on Comp. Sci. & Inf. Techn. CSITEC 2019, Vienna, Austria, (2019), 15–27. . https://aircconline.com/csit/papers/vol9/csit91102.pdf [8] Mancas, C. (2020). MatBase metadata catalog management. Acta Scientific Computer Sciences, 2(4), 25–29. https://doi.org/10.48550/arXiv.2504.07243
[9] Mancas, C. & Mancas, D.C. (2025). MatBase Algorithm for Translating Entity Relationship Data Models into (Elementary) Mathematical Data Model Schemes. Primera Scientific Engineering 7(6), 04–11. https://10.56831/PSEN-07-237
[10] Mancas, D.C. (2023). Design and development of a database software application for managing genealogical trees [M.Sc. dissertation, Ovidius University at Constanta, Romania, Mathematics and Informatics Department].
[11] Mancas, C., Serban, C., and Mancas, D.C. (2023). On Software Application Database Constraint-driven Design and Development. J. of Comp. Sci. Res. 5(1), 31–45. https://doi.org/10.30564/jcsr.v5i1.5476
[12] Quest. (2025, November 19). erwin Data Modeler by Quest. https://www.quest.com/documents/erwin-data-modeler-datasheet-147769.pdf [13] Microsoft SQL Documentation. (2025, November 19). SQL Server Management Studio. https://learn.microsoft.com/pdf?url=https%3A%2F%2Flearn.microsoft.com%2Fen us%2Fssms%2Ftoc.json
[14] Mancas, C. & Dragomir, S. On the Equivalence between Entity-Relationship and Functional Data Modeling. Proc. IASTED Software Eng. & App. SEA 2003, Marina del Rey, U.S.A., (2003), 335–340. https://www.actapress.com/Abstract.aspx?paperId=14533
[15] Oracle Corp. (2025, November 19). Oracle SQL Developer Data Modeler User’s Guide. https://docs.oracle.com/en/database/oracle/sql-developer-data
modeler/24.3/dmdug/data-modeler-concepts-usage.html#GUID-69B12E47-6075-4471- A205-BBB04113EE24
[16] IBM Software Hub. (2025, November 19). IBM Db2 Data Management Console. https://www.ibm.com/docs/en/software-hub/5.2.x?topic=services-db2-data management-console
[17] Dataedo. (2025, November 19). Create diagram for IBM Db2 database. https://dataedo.com/tutorials/create-diagram-for-ibm-db2-database
[18] von Halle, B. & Goldberg, L. (2006). The Business Rule Revolution. Running Businesses the Right Way. Happy About.
[19] Taylor, J. (2019). Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics. IBM Press.
[20] IBM Corp. (2025, November 19). IBM Business Automation Manager Open Editions. https://www.ibm.com/docs/en/ibamoe/9.3.0
[21] Red Hat Documentation. (2025, November 19). Designing your decision management architecture with Red Hat Decision Manager. https://docs.redhat.com/en/documentation/red_hat_decision_manager/7.13/html/designi ng_your_decision_management_architecture_for_red_hat_decision_manager/index
[22] DecisionRules.io. (2025, November 19). DecisionRules Documentation. https://docs.decisionrules.io/doc/ [23] Mancas, C. (2025). On Enforcing Satisfiable, Coherent, and Minimal Sets of Self-map Constraints in MatBase. Primera Scientific Engineering 6(2), 31–49. https://10.56831/PSEN-06-179
