About the shape of the correlator data matrices.
FV: Nov 9, 2003
Objectives:

Dimensionality: ______________


Axis       Size   Range of sizes   Description
FREQUENCY  Nf     64 to 8192       Nb of frequency channels per baseband
BASELINE   Nbl    1 to 2080        Nb of baselines
APC        Nphc   1 to 2           Atmospheric phase correction
POL        Np     1 to 4           Nb of polar. cross products
BIN        Nbin   1 to 4           Nb of bins (e.g. 2 or 4 for wobbling, 2 for sideband separation)
BASEBAND   Nbb    1 to 8 (or 4?)   Correlator baseband
TIME              always large
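
As a quick check on the upper bound of Nbl, the following sketch counts baselines for a given number of antennas. It assumes (this is not stated in the table) that the value 2080 corresponds to 64 antennas with the auto-correlations counted as baselines; the function name n_baselines is purely illustrative.

```python
def n_baselines(num_ant: int, include_auto: bool = True) -> int:
    """Count baselines for num_ant antennas.

    Assumption: auto-correlations are counted as baselines when
    include_auto is True (one per antenna).
    """
    cross = num_ant * (num_ant - 1) // 2  # unordered antenna pairs
    return cross + num_ant if include_auto else cross


if __name__ == "__main__":
    print(n_baselines(64))          # cross + auto
    print(n_baselines(64, False))   # cross only
```

Under that assumption, 64 antennas give 2016 cross-correlations plus 64 auto-correlations, which matches the 2080 quoted in the table.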

Order for these axes:

The correlator sub-system publishes data matrices one by one for each baseband (and for each subarray, unless they are synchronized?). These data are published in two different streams: the correlator data stream, at a time interval corresponding to the INTEGRATION time, and the channel average stream, at a shorter interval most likely corresponding to the sub-integrations. The current version of the correlator ICD seems to indicate the order given in the table above, with the frequency varying first and the bin axis last. For the second stream there is the power (dimensionality 1, size numAnt) and the visibility data matrix with axes FREQUENCY, POL, BASELINE, BIN, the frequency varying first. With the proposed EDF these two streams would be logically concatenated as they arrive sequentially in time.
Jim Pisano said (Oct 03) that the order of the axes in the published matrices can still be modified if it turns out not to be optimal for the archive and the users of the archive.

Which order is desired?

The time should be identical or increasing from one row to the next in the EDF MAIN table. Since the sub-integrations are published before each integration, this non-decreasing time order is satisfied. The BIN axis corresponds to the different switching phases of a cycle. Since the time interval between two phases is smaller than the sub-integration time and the data are combined according to a certain switching scheme, the sequentiality in time is preserved.

Going backward from the last dimension, it is desirable to have the BASELINE axis right after the TIME axis. The reason is important: the data need to be accessed sorted in time, the native order, for the calibration, but then need to be sorted in baseline coordinates for imaging. With BASELINE contiguous to TIME, each data cell can preserve its content when sorting in baseline coordinates.
After this, BASEBAND comes as the most natural. Notice that the size of the data cell in the MAIN table may differ from baseband to baseband, since each baseband has its own set of sizes Nf, Nphc, Np. Among the 4 dimensions Nf, Nphc, Np, Nbin, the frequency axis will always be the one with the largest size; furthermore, it could be the one with the highest probability of changing from one baseband to the next (spectral zooms on a line in addition to low resolution for continuum). This suggests putting the FREQUENCY axis there, after TIME and BASEBAND when decrementing the axis number.

Location of the BIN axis:
The BIN axis could be the one with the highest probability of being common to all basebands, possibly together with the POL axis. However, the BIN axis of size Nbin is not in the matrices of the EDF data cell: the different phases of the switching appear as different rows sharing the same time stamps but associated with different keys (DATA_DESCRIPTION_ID and/or FIELD_ID). BIN and BASEBAND are tightly associated in some cases (sideband separation, frequency switching), and it seems adequate to have those two axes contiguous.

This ensemble of remarks leads to the following order:

1: POL, 2: APC, 3: FREQ, 4: BIN, 5: BASEBAND, 6: BASELINE, 7: SUB-INTEG, 8: INTEG
where POL varies first and TIME has been split to differentiate SUB-INTEG and INTEG. Then come OBSERVATION, SCAN and EXECUTE, but these do not need to be considered here.
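
To make "POL varies first" concrete, here is a minimal sketch of how a flat offset could be computed when the first axis varies fastest. The sizes used are arbitrary illustrative values within the ranges of the table, and the sketch assumes a single rectangular block, which is a simplification since the cell sizes may differ from baseband to baseband. The names strides and flat_offset are hypothetical.

```python
# Axis order 1..6 from the list above; the first axis varies fastest.
AXES = ("POL", "APC", "FREQ", "BIN", "BASEBAND", "BASELINE")


def strides(sizes):
    """Stride of each axis, in elements, with the first axis fastest."""
    out = []
    step = 1
    for n in sizes:
        out.append(step)
        step *= n
    return out


def flat_offset(index, sizes):
    """Flat position of a multi-dimensional index (first axis fastest)."""
    for i, n in zip(index, sizes):
        if not 0 <= i < n:
            raise IndexError("index out of range")
    return sum(i * s for i, s in zip(index, strides(sizes)))


if __name__ == "__main__":
    # Illustrative sizes only: Np=2, Nphc=2, Nf=64, Nbin=1, Nbb=4, Nbl=3
    sizes = (2, 2, 64, 1, 4, 3)
    for name, stride in zip(AXES, strides(sizes)):
        print(name, stride)
```

With these sizes, stepping along POL moves by 1 element, along APC by 2, along FREQ by 4, and along BASELINE by 1024, so the last axes delimit the largest contiguous blocks.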

Location of the APC axis:
It may be questionable to put APC before FREQ! If the system imposes an APC axis of common size for all basebands, then APC could be put as the first axis, and the data filtering at retrieval (keeping the corrected or uncorrected data according to some criterion such as the maximum amplitude) would be very simple, allowing one to ignore the fact that POL and BIN could differ from BASEBAND to BASEBAND. There is however one complication: both the auto- and the cross-correlations will always be stored, and Nphc will always be 1 for the auto-correlations. Hence, even when both the corrected and uncorrected data are present, Nphc will not be 2 everywhere. For that reason it is not justified to put APC as the first axis. An alternate order is the following:

1: POL, 2: FREQ, 3: APC, 4: BIN, 5: BASEBAND, 6: BASELINE, 7: SUB-INTEG, 8: INTEG
which swaps FREQ and APC compared to the previous order. I suspect this latter order, with APC as the 3rd axis, to be more efficient when the selection between the corrected and uncorrected data has to be done.
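
One possible way to quantify this suspicion is to look at the length of the contiguous runs obtained when a single APC value is selected. The sketch below assumes a first-axis-fastest layout and a rectangular block (ignoring the auto-correlation case where Nphc is 1); the sizes are illustrative and the name run_length is hypothetical.

```python
from math import prod

ORDER_A = ("POL", "APC", "FREQ", "BIN", "BASEBAND", "BASELINE")
ORDER_B = ("POL", "FREQ", "APC", "BIN", "BASEBAND", "BASELINE")

# Illustrative sizes only, chosen within the ranges of the table.
SIZES = {"POL": 2, "APC": 2, "FREQ": 64, "BIN": 1, "BASEBAND": 4, "BASELINE": 3}


def run_length(order, sizes, axis):
    """Contiguous elements read per run when one index on `axis` is fixed.

    With the first axis varying fastest, this is the product of the
    sizes of all axes that come before `axis` in the order.
    """
    k = order.index(axis)
    return prod(sizes[name] for name in order[:k])


if __name__ == "__main__":
    print(run_length(ORDER_A, SIZES, "APC"))  # APC as 2nd axis
    print(run_length(ORDER_B, SIZES, "APC"))  # APC as 3rd axis
```

With APC as the 2nd axis each run is only Np elements long, whereas with APC as the 3rd axis each run covers Np x Nf elements, i.e. a whole spectrum for all polarization products. Fewer, longer runs is one plausible reason for the expected gain, though it is not necessarily the only consideration.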

Conclusion:___________

The order of the axes in the matrices published by the correlator sub-system, as currently given in the ICD, should be revised unless there are technical reasons against it, in which case the archive sub-system would have the responsibility for the transpositions. If there are no such limitations, the most convenient order for the matrices published by the correlator sub-system would be:

1: POL, 2: FREQ, 3: APC, 4: BIN, 5: BASEBAND, 6: BASELINE.
However, if for some reason it is not possible to put all basebands together in a single block, the archive sub-system will need to interleave the different basebands while receiving matrices, if possible of the form
1: POL, 2: FREQ, 3: APC, 4: BIN, 5: BASELINE.
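
The interleaving mentioned above could look like the following sketch. It assumes that each baseband arrives as a flat sequence with BASELINE as the slowest axis and that, within one baseband, every baseline has the same cell size (which ignores the case where Nphc differs between auto- and cross-correlations). The function name interleave_basebands and the data layout are hypothetical, not taken from the ICD.

```python
def interleave_basebands(per_baseband, n_baselines):
    """Merge per-baseband matrices into one block with BASEBAND
    placed just before BASELINE.

    per_baseband: list of flat sequences, one per baseband, each with
                  BASELINE as its slowest-varying axis.
    Cell sizes may differ from one baseband to the next.
    """
    cell_sizes = []
    for flat in per_baseband:
        if len(flat) % n_baselines != 0:
            raise ValueError("matrix length is not a multiple of n_baselines")
        cell_sizes.append(len(flat) // n_baselines)

    combined = []
    for bl in range(n_baselines):
        for flat, size in zip(per_baseband, cell_sizes):
            # Copy the cell of this baseband for the current baseline.
            combined.extend(flat[bl * size:(bl + 1) * size])
    return combined


if __name__ == "__main__":
    bb0 = ["a0.0", "a0.1", "a1.0", "a1.1"]  # 2 elements per baseline
    bb1 = ["b0", "b1"]                      # 1 element per baseline
    print(interleave_basebands([bb0, bb1], n_baselines=2))
```

Each baseline then holds the cells of all basebands side by side, which is the layout of the 6-axis order given above.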