EDF data model and its representation using UML

The ALMA Export Data Format (EDF) is structured according to the data model underlying the MeasurementSet (Kemball and Wieringa 2000). This model is built using concepts from the domain of relational data bases. The present note gives a number of remarks that need to be taken into consideration when representing the EDF data model in UML. Suggestions are made for a clear and consistent nomenclature for the names of attributes involving relationships between different entities in the model.

2 Context and Definitions

The structure of the data model is presented as a set of tables whose names are listed in the first column of Tab. 1.
Each table is divided into sections that distinguish between categories of attributes: the key, the non-key, the data-description and the data sections. This categorization underlies the relationships between the tables, since the role of each attribute implies an association defined in a precise manner. The network of all these relationships forms a conceptual schema, built according to rules that provide the capabilities of a relational data base.

To make this presentation easier to follow and to avoid ambiguities, I define below some words used in this document.

Entity: This word denotes a table, the table reflecting a relation among a number of attributes. The entity is named after the table. In object-oriented terms it may be considered a class.

Attribute: A column. The relation reflected by the entity is defined by the set of its columns.

Tuple: A given row in the table. Each row provides an instantiation of the set of attributes of the entity. In object-oriented terms it is an object.

Key: If the values of a minimal set of attributes identify a unique tuple in the relation, a key can be assigned to this set to identify it.
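The definition of a key can be checked mechanically: a set of attributes qualifies if no two tuples share the same values for it (uniqueness) and no proper subset already has that property (minimality). The sketch below assumes, for illustration only, that a table is held as a list of Python dicts; the rows and the VALUE column are hypothetical.

```python
from itertools import combinations

Row = dict[str, object]


def identifies_uniquely(rows: list[Row], attrs: tuple[str, ...]) -> bool:
    """True if no two rows share the same values for `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows: list[Row], attrs: tuple[str, ...]) -> bool:
    """True if `attrs` identifies tuples uniquely and no proper subset does."""
    if not identifies_uniquely(rows, attrs):
        return False
    return not any(
        identifies_uniquely(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical rows of a time-dependent table keyed by antenna and time.
rows = [
    {"ANTENNA_ID": 0, "TIME": 100.0, "VALUE": 1.5},
    {"ANTENNA_ID": 1, "TIME": 100.0, "VALUE": 1.7},
    {"ANTENNA_ID": 0, "TIME": 200.0, "VALUE": 1.6},
]

print(is_key(rows, ("ANTENNA_ID", "TIME")))           # True
print(is_key(rows, ("ANTENNA_ID",)))                  # False: antenna 0 appears twice
print(is_key(rows, ("ANTENNA_ID", "TIME", "VALUE")))  # False: VALUE alone is unique, so not minimal
```

The last case shows why minimality matters: a superset of a key always identifies tuples uniquely, but only the minimal set deserves the name key.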

Association: An association is represented by a relation of the same name whose attributes are the keys of the entities that participate in the association, together with its own non-key attributes. The term comes from UML. Just as an entity is similar to a class, an association is also similar to a class. In UML an instance of such a class is called a link.
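As an illustration of this definition, the following minimal sketch models an association as a class whose attributes are its key attributes followed by its own non-key attributes, and creates a link as an instance of it. The class name `Association` and its methods are hypothetical; the WEATHER key attributes are taken from Tab. 1, while TEMPERATURE is an assumed non-key attribute used only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """An association relation: its key attributes (keys of the participating
    entities, plus TIME/INTERVAL where Tab. 1 lists them) followed by its own
    non-key attributes. Hypothetical helper, for illustration only."""
    name: str
    key_attributes: tuple[str, ...]
    own_attributes: tuple[str, ...] = ()

    @property
    def attributes(self) -> tuple[str, ...]:
        return self.key_attributes + self.own_attributes

    def link(self, **values: object) -> dict[str, object]:
        """Create a link, i.e. an instance of the association, after checking
        that exactly the association's attributes are supplied."""
        expected, given = set(self.attributes), set(values)
        if given != expected:
            raise ValueError(
                f"{self.name}: missing {sorted(expected - given)}, "
                f"unexpected {sorted(given - expected)}"
            )
        return dict(values)


# WEATHER keys as listed in Tab. 1; TEMPERATURE is a hypothetical non-key attribute.
weather = Association("WEATHER", ("ANTENNA_ID", "TIME", "INTERVAL"), ("TEMPERATURE",))
link = weather.link(ANTENNA_ID=3, TIME=4.5e9, INTERVAL=1.0, TEMPERATURE=270.0)
print(weather.attributes)  # ('ANTENNA_ID', 'TIME', 'INTERVAL', 'TEMPERATURE')
```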

Identifier: For associations to exist, each entity that participates in an association needs a primary key. A primary key may be explicit or implicit. It is implicit when it corresponds to the position of a tuple in a sequence or, in other words, to the row number in the table. We define an identifier as the primary key of an entity.
For convenience, these identifiers are given names of the form entityName_ID and are of type int.
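The difference between implicit and explicit identifiers, and the referential constraint they support, can be sketched as follows. This assumes, for illustration, that tables are lists of dicts; the NAME column and all values are hypothetical, and the FEED rows are simplified to only two of their key attributes.

```python
# Entity with an implicit identifier: ANTENNA_ID is simply the row number.
antenna_rows = [
    {"NAME": "A001"},  # ANTENNA_ID = 0 (hypothetical values)
    {"NAME": "A002"},  # ANTENNA_ID = 1
]

# Association with an explicit identifier: FEED_ID is stored in each row,
# alongside the identifier of the ANTENNA tuple it refers to.
feed_rows = [
    {"FEED_ID": 0, "ANTENNA_ID": 0},
    {"FEED_ID": 0, "ANTENNA_ID": 1},
    {"FEED_ID": 1, "ANTENNA_ID": 5},  # dangling reference: there is no antenna row 5
]


def resolve_implicit(rows: list[dict], ident: int) -> dict:
    """Look up a tuple by its implicit identifier, i.e. its row number."""
    if not 0 <= ident < len(rows):
        raise LookupError(f"no row {ident}")
    return rows[ident]


def dangling_references(rows: list[dict], id_column: str, target_rows: list[dict]) -> list[dict]:
    """Rows whose `id_column` does not resolve to a row of `target_rows`."""
    return [r for r in rows if not 0 <= r[id_column] < len(target_rows)]


print(resolve_implicit(antenna_rows, 1)["NAME"])  # A002
print(dangling_references(feed_rows, "ANTENNA_ID", antenna_rows))
# [{'FEED_ID': 1, 'ANTENNA_ID': 5}]
```

One consequence of implicit identifiers is visible here: because the identifier is the row position, deleting or reordering rows of the referenced table silently changes what every referencing identifier points to.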

Collection: It may be useful to consider a group of tuples all belonging to the same entity. Such a group defines a collection. In this respect a table is a collection, but a collection may also correspond to only a subset of the rows in a table. When several values have to be assigned to an attribute in a tuple, this attribute refers to a collection. There are several types of collections, in particular the set, the bag, the list and the array. When collections are used, the type of each one must be stated; this is mandatory to fully describe the structure of the data base. In UML, the cardinality (multiplicity) indicates the number of members in a collection.
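The four collection types differ in whether order matters and whether duplicates are kept. A minimal Python sketch of these semantics follows; the member values are hypothetical, and modelling the array as a fixed-length tuple (matching the [2] pairs of Tab. 1) is an assumption of this example. The `satisfies_multiplicity` helper is likewise hypothetical, mimicking a UML lower..upper multiplicity.

```python
from collections import Counter
from collections.abc import Sized
from typing import Optional

members = [3, 1, 3, 2]  # hypothetical identifiers of tuples in a collection

as_set = frozenset(members)  # set: unordered, duplicates removed
as_bag = Counter(members)    # bag (multiset): unordered, duplicates counted
as_list = list(members)      # list: ordered, duplicates kept, may grow or shrink
baseline = (0, 1)            # array, modelled here as a fixed-length tuple,
                             # e.g. an ANTENNA_ID[2] pair as in Tab. 1


def satisfies_multiplicity(collection: Sized, lower: int, upper: Optional[int]) -> bool:
    """UML-style multiplicity lower..upper; upper=None stands for '*'."""
    n = len(collection)
    return n >= lower and (upper is None or n <= upper)


print(sorted(as_set))                            # [1, 2, 3]
print(as_bag[3])                                 # 2
print(satisfies_multiplicity(baseline, 2, 2))    # True
print(satisfies_multiplicity(as_list, 0, None))  # True
```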

3 EDF table properties to describe their inter-relations

With these definitions in mind, I now describe the consequences, in terms of relations, for a given EDF table with respect to the other tables of the EDF data model, depending on which section of the table contains a given identifier. Each of these possibilities defines the type of relation between the association under consideration and the entity in which the identifier is the primary key.
In the context of relational data bases, two tuples may not have the same values for their full set of key attributes: by the definition given above, a key identifies a unique tuple. For this reason it is preferable to avoid implicit keys for associations. Hence, for associations, the key section contains the identifier of the tuple in addition to the primary keys of the entities that participate in the association. This can be seen, for example, in the FEED or DOPPLER tables, which both reflect associations. Note, however, that tables such as SYSCAL or CALWIDGET do not need a SYSCAL_ID or CALWIDGET_ID identifier in their key section: for a given combination of the other key attributes, these tables cannot contain two or more tuples with the same value of the key attribute TIME.
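The argument for an explicit identifier in the key section of an association can be illustrated with two FEED tuples that share antenna, spectral window and time range: without FEED_ID their key values collide. In contrast, successive SYSCAL tuples differ in TIME, so no SYSCAL_ID is needed. The key attribute names follow Tab. 1; all values, and the helper `has_duplicate_keys`, are hypothetical.

```python
def has_duplicate_keys(rows: list[dict], key: tuple[str, ...]) -> bool:
    """True if two rows share the same values for all attributes in `key`."""
    values = [tuple(r[k] for k in key) for r in rows]
    return len(values) != len(set(values))


# Two hypothetical FEED tuples for the same antenna, spectral window and time range.
feed_rows = [
    {"FEED_ID": 0, "ANTENNA_ID": 4, "SPECTRAL_WINDOW_ID": 1, "TIME": 10.0, "INTERVAL": 5.0},
    {"FEED_ID": 1, "ANTENNA_ID": 4, "SPECTRAL_WINDOW_ID": 1, "TIME": 10.0, "INTERVAL": 5.0},
]
without_id = ("ANTENNA_ID", "SPECTRAL_WINDOW_ID", "TIME", "INTERVAL")
with_id = ("FEED_ID",) + without_id

print(has_duplicate_keys(feed_rows, without_id))  # True: the two tuples collide
print(has_duplicate_keys(feed_rows, with_id))     # False: FEED_ID separates them

# Hypothetical SYSCAL tuples: successive measurements differ in TIME, so the
# participating keys plus TIME already identify each tuple.
syscal_key = ("ANTENNA_ID", "FEED_ID", "SPECTRAL_WINDOW_ID", "TIME", "INTERVAL")
syscal_rows = [
    {"ANTENNA_ID": 4, "FEED_ID": 0, "SPECTRAL_WINDOW_ID": 1, "TIME": 10.0, "INTERVAL": 5.0},
    {"ANTENNA_ID": 4, "FEED_ID": 0, "SPECTRAL_WINDOW_ID": 1, "TIME": 15.0, "INTERVAL": 5.0},
]
print(has_duplicate_keys(syscal_rows, syscal_key))  # False
```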

To represent the EDF tables in a UML class diagram, one must consider which section each explicit identifier belongs to. The keys participating in an association provide the referential constraint for that association. The non-key attributes of an association may include optional identifiers; in that case the association is also an aggregation. An association is a composite if its non-key attributes include at least one mandatory identifier. A pure composite is an association in which all the attributes that are identifiers are mandatory; an example is the DATA_DESCRIPTION table. More commonly, EDF tables contain both mandatory and optional identifiers. Such tables must be represented as composites, because their existence depends on that of one or more other entities.
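The classification rules of the preceding paragraph can be written as a small decision function. This is a sketch of one reading of those rules, assuming each identifier among the non-key attributes is flagged True when mandatory and False when optional; the function name and the generic A_ID/B_ID identifiers are hypothetical, while the DATA_DESCRIPTION case follows the text.

```python
def classify_association(non_key_identifiers: dict[str, bool]) -> str:
    """Classify an association from the identifiers among its non-key
    attributes, each mapped to True if mandatory and False if optional."""
    if not non_key_identifiers:
        return "association"     # no identifiers outside the key section
    mandatory = list(non_key_identifiers.values())
    if all(mandatory):
        return "pure composite"  # every identifier is mandatory
    if any(mandatory):
        return "composite"       # mixed mandatory and optional identifiers
    return "aggregation"         # only optional identifiers


print(classify_association({"SPECTRAL_WINDOW_ID": True, "POLARIZATION_ID": True}))
# pure composite (the DATA_DESCRIPTION case)
print(classify_association({"A_ID": True, "B_ID": False}))  # composite
print(classify_association({"A_ID": False}))                # aggregation
```

Testing the all-mandatory case first matters: a pure composite is also a composite, so checking `any` before `all` would never return "pure composite".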

In UML it is necessary to distinguish whether a table is an atomic entity (a static or quasi-static table with a single implicit identifier), an entity extended by a specialisation as either an aggregation or a composition, or an association with or without the role of an aggregation or a composition. The status of each table of the data model, following this classification, is given in Tab. 1.
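Once each table is classified, its relationships can be emitted as UML class-diagram edges. The sketch below produces PlantUML relationship lines (`--` association, `o--` aggregation, `*--` composition). Two assumptions are made: the diamond is drawn on the table holding the identifier (treating that table as the whole, as the text treats it as the composite), and entity names are recovered naively from the entityName_ID convention. Identifiers that do not follow that convention exactly (e.g. EXECUTE_ID, ASSOC_SPW_ID, DEP_SOURCE_PAR_ID) would need an explicit lookup table instead. The function names are hypothetical.

```python
def is_identifier(name: str) -> bool:
    """True for names of the form entityName_ID, optionally followed by [n]."""
    return name.split("[")[0].endswith("_ID")


def entity_of(identifier: str) -> str:
    """Naive entity name from an identifier: drop any [n] marker and '_ID'."""
    name = identifier.split("[")[0]
    if not name.endswith("_ID"):
        raise ValueError(f"{identifier} is not an identifier")
    return name[: -len("_ID")]


def plantuml_edges(table: str, association: list[str],
                   aggregation: list[str], composition: list[str]) -> list[str]:
    """PlantUML relationship lines for one table, diamond drawn on `table`.
    Non-identifier keys such as TIME and INTERVAL are skipped."""
    edges = []
    for arrow, names in (("--", association), ("o--", aggregation), ("*--", composition)):
        for name in names:
            if is_identifier(name):
                edges.append(f"{table} {arrow} {entity_of(name)}")
    return edges


lines = plantuml_edges("DATA_DESCRIPTION", [], [], ["SPECTRAL_WINDOW_ID", "POLARIZATION_ID"])
lines += plantuml_edges("WEATHER", ["ANTENNA_ID", "TIME", "INTERVAL"], [], [])
print("\n".join(["@startuml", *lines, "@enduml"]))
```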

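Table 1: EDF tables with their inter-connections. Type abbreviations: Ent = entity, Ass = association, Agg = aggregation, Com = composition. A suffix [2] denotes a pair of identifiers (e.g. the two antennas of a baseline).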


EDF table name	Type	Identifier (primary key)	Visibility	Keys in association	Keys in aggregation	Keys in composition
MAIN	Ass/Com	-	-	ANTENNA_ID[2] FEED_ID[2] DATA_DESCRIPTION_ID PROCESSOR_ID SWITCH_PHASE_ID FIELD_ID	-	EXECUTE_ID STATE_ID
ANTENNA	Ent	ANTENNA_ID	implicit	-	PHASE_ARRAY_ID	-
BEAM	Ent	BEAM_ID	implicit	-	-	-
CALWIDGET	Ass	-	-	ANTENNA_ID FEED_ID SPECTRAL_WINDOW_ID TIME INTERVAL	-	-
DATA_DESCRIPTION	Com	DATA_DESCRIPTION_ID	implicit	-	-	SPECTRAL_WINDOW_ID POLARIZATION_ID
DOPPLER	Ass/Com	DOPPLER_ID	explicit	SOURCE_ID	-	TRANSITION_ID
EXECUTE_SUMMARY	Ass/Com	EXECUTE_ID	implicit	-	-	SCHEDULE_ID MAIN_ID[2] ANTENNALIST
FEED	Ass/Com	FEED_ID	explicit	ANTENNA_ID SPECTRAL_WINDOW_ID TIME INTERVAL	RECEIVER_ID BEAM_ID	-
FIELD	Agg	FIELD_ID	implicit	-	SOURCE_ID FIELD_ID EPHEMERIS_ID	-
FOCUS	Ass/Agg	-	-	ANTENNA_ID FEED_ID TIME INTERVAL	-	FOCUS_MODEL_ID
FREQ_OFFSET	Ass/Com	-	-	ANTENNA_ID[2] FEED_ID SPECTRAL_WINDOW_ID TIME INTERVAL	-	FIELD_ID
HISTORY	Ass	-	-	EXECUTE_ID TIME	-	-
PHASE_TRACKING	Ass	-	-	ANTENNA_ID FEED_ID SPECTRAL_WINDOW_ID TIME INTERVAL	-	-
POINTING	Ass/Com	-	-	ANTENNA_ID TIME INTERVAL	-	POINTING_MODEL_ID
POLARIZATION	Ent	POLARIZATION_ID	implicit	-	-	-
PROCESSOR	Ent	PROCESSOR_ID	implicit	-	-	-
PWVM	Ass/Com	-	-	ANTENNA_ID FEED_ID DATA_DESCRIPTION_ID PROCESSOR_ID SWITCH_PHASE_ID FIELD_ID	-	EXECUTE_ID STATE_ID
PWVMCAL	Ass	-	-	ANTENNA_ID SPECTRAL_WINDOW_ID TIME INTERVAL	-	-
RECEIVER	Ent	RECEIVER_ID	implicit	TIME INTERVAL	-	-
SEEING	Ent	-	-	TIME INTERVAL	-	-
SOURCE	Ass/Agg	SOURCE_ID	explicit	SPECTRAL_WINDOW_ID TIME INTERVAL	SOURCE_PARAMETER_ID	-
SOURCE_PARAMETER	Ent/Agg	SOURCE_PARAMETER_ID	implicit	TIME INTERVAL	DEP_SOURCE_PAR_ID	-
SPECTRAL_WINDOW	Ent/Agg	SPECTRAL_WINDOW_ID	implicit	-	DOPPLER_ID ASSOC_SPW_ID	-
STATE	Ent	STATE_ID	implicit	-	-	-
SYSCAL	Ass	-	-	ANTENNA_ID FEED_ID SPECTRAL_WINDOW_ID TIME INTERVAL	-	-
WEATHER	Ass	-	-	ANTENNA_ID TIME INTERVAL	-	-
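Held as data, Tab. 1 can be queried for the inter-connections it describes, for instance to list every table that refers to a given identifier. The sketch below transcribes only a few rows of Tab. 1 into a dictionary; the `EdfTable` type and `tables_referencing` function are hypothetical names introduced for the example.

```python
from typing import NamedTuple


class EdfTable(NamedTuple):
    kind: str                          # Type column of Tab. 1
    association: tuple[str, ...] = ()  # keys participating in the association
    aggregation: tuple[str, ...] = ()
    composition: tuple[str, ...] = ()


# A few rows transcribed from Tab. 1.
TABLES = {
    "DATA_DESCRIPTION": EdfTable("Com", composition=("SPECTRAL_WINDOW_ID", "POLARIZATION_ID")),
    "FEED": EdfTable("Ass/Com", association=("ANTENNA_ID", "SPECTRAL_WINDOW_ID", "TIME", "INTERVAL"),
                     aggregation=("RECEIVER_ID", "BEAM_ID")),
    "SOURCE": EdfTable("Ass/Agg", association=("SPECTRAL_WINDOW_ID", "TIME", "INTERVAL"),
                       aggregation=("SOURCE_PARAMETER_ID",)),
    "SYSCAL": EdfTable("Ass", association=("ANTENNA_ID", "FEED_ID", "SPECTRAL_WINDOW_ID", "TIME", "INTERVAL")),
    "WEATHER": EdfTable("Ass", association=("ANTENNA_ID", "TIME", "INTERVAL")),
}


def tables_referencing(identifier: str) -> list[str]:
    """Names of tables whose association, aggregation or composition keys
    contain `identifier`; array markers such as [2] are ignored."""
    return sorted(
        name for name, t in TABLES.items()
        if any(k.split("[")[0] == identifier
               for k in t.association + t.aggregation + t.composition)
    )


print(tables_referencing("SPECTRAL_WINDOW_ID"))
# ['DATA_DESCRIPTION', 'FEED', 'SOURCE', 'SYSCAL']
print(tables_referencing("ANTENNA_ID"))
# ['FEED', 'SYSCAL', 'WEATHER']
```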

Notes:
It is also useful to highlight some characteristics of the tables that are closely related to associations, in particular when collections are used.