
Data import

When the user clicks the "Start" button in the "Overview" tab, the "Import" tab is displayed (Figure 2).


Figure 2: Import tab

Import TXT file button

The "Import TXT file" button imports a formatted dataset into the interface. For the imported data to be accepted, it must follow these rules:

  • the data must be sampled at a constant time step;
  • the data must be stored in a .txt file (use "Save as" '.txt' in your spreadsheet);
  • each column corresponds to a parameter, and the first line contains the parameter names;
  • the decimal separator must be the '.' character;
  • missing values must be labelled 'NA';
  • the dates of the observations must be entered in a column named 'Dates';
  • these dates must be in 'YYYY-MM-DD' format (ISO 8601);
  • the times of the observations must be entered in a column named 'Hours';
  • these times must be in 'HH:MM:SS' format (ISO 8601).
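Before importing, a file can be checked against these rules outside the interface. The Python function below is a minimal sketch and is not part of the package: the name `check_import_text` is hypothetical, and it assumes tab-separated columns (the rules above do not name a column separator) and that every column other than 'Dates' and 'Hours' holds a numeric parameter.

```python
import csv
import io
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD
TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}")  # HH:MM:SS


def check_import_text(text, sep="\t"):
    """Return a list of rule violations found in the content of a .txt dataset.

    Assumptions (not stated in the guide): fields are tab-separated, and every
    column other than 'Dates' and 'Hours' is a numeric parameter.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = [f"missing column '{c}'" for c in ("Dates", "Hours") if c not in header]
    if problems:
        return problems
    i_date, i_hour = header.index("Dates"), header.index("Hours")
    stamps = []
    for line_no, row in enumerate(data, start=2):  # line 1 holds the parameter names
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        d, t = row[i_date], row[i_hour]
        if DATE_RE.fullmatch(d) and TIME_RE.fullmatch(t):
            try:
                stamps.append(datetime.strptime(f"{d} {t}", "%Y-%m-%d %H:%M:%S"))
            except ValueError:  # right shape but impossible value, e.g. month 13
                problems.append(f"line {line_no}: invalid date or time")
        else:
            problems.append(f"line {line_no}: date/time not in YYYY-MM-DD / HH:MM:SS format")
        for j, value in enumerate(row):
            if j in (i_date, i_hour) or value == "NA":
                continue
            try:
                float(value)  # a decimal comma such as '7,94' fails here
            except ValueError:
                problems.append(f"line {line_no}, column '{header[j]}': non-numeric value {value!r}")
    # Constant sampling: all consecutive time differences must be equal.
    if len({b - a for a, b in zip(stamps, stamps[1:])}) > 1:
        problems.append("observations are not sampled at a constant time step")
    return problems
```

An empty list means no violation was detected. For a file on disk, the text can be read with something like `open("mydata.txt", encoding="utf-8").read()`; the file name and encoding are placeholders.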

When data is imported, the corresponding file name is displayed next to the button.

The package also contains the dataset MarelCarnot[1], which can be used as an example file. This dataset is composed of 16 variables and 131,487 observations: one observation every 20 minutes from January 1st, 2005 to December 31st, 2009. It comes from measurements of physico-chemical parameters and nutrient concentrations made by the MAREL-Carnot station[2], developed by IFREMER and installed in the roadstead of Boulogne-sur-Mer.

All illustrations in this document are based on this dataset. The observations from 2005 to 2008 formed the learning sample and those from 2009 formed the validation sample. Nine variables were included in the model (see Table 2 in the appendix for details), and the variable ECHL1 (fluorescence) was used to validate the model[3] and check its quality.
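The year-based split described above can be sketched in Python as follows. This is only an illustration under assumptions: the function `split_by_year` is hypothetical, observations are represented as `(datetime, values)` pairs, and it does not reproduce how the interface itself builds the two samples.

```python
from datetime import datetime


def split_by_year(observations, validation_years=(2009,)):
    """Split (timestamp, values) pairs into a learning and a validation sample.

    Hypothetical helper: observations whose year is in `validation_years`
    go to the validation sample; all the others go to the learning sample.
    """
    learning, validation = [], []
    for stamp, values in observations:
        target = validation if stamp.year in validation_years else learning
        target.append((stamp, values))
    return learning, validation
```

With the default argument, observations from 2005 to 2008 end up in the learning sample and those from 2009 in the validation sample.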

Select your save directory button

The "Select your save directory" button lets the user choose a save directory. Once it is selected, its path is displayed next to the button.

Four sub-directories are automatically created in this directory:

  • "/DonneesBrutes/", which contains the raw data;
  • "/Classification/", which contains the classification results;
  • "/Modelisation/", which contains the modelling results;
  • and "/Prediction/", which contains the prediction results.

In each of these four directories, three sub-directories are created: "/Figures/", "/Tableaux/" (tables) and "/FichiersR/" (R files).
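The interface creates this tree itself. The sketch below only reproduces the same layout with Python's `pathlib`, for instance to prepare or inspect a save directory by hand; the function name `create_save_tree` is hypothetical.

```python
from pathlib import Path

STEP_DIRS = ("DonneesBrutes", "Classification", "Modelisation", "Prediction")
OUTPUT_DIRS = ("Figures", "Tableaux", "FichiersR")


def create_save_tree(save_dir):
    """Create the 4 x 3 sub-directory layout under save_dir and return the paths.

    Hypothetical helper mirroring the layout described in this guide.
    """
    root = Path(save_dir)
    created = []
    for step in STEP_DIRS:
        for output in OUTPUT_DIRS:
            path = root / step / output
            path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created
```

Because `exist_ok=True` is used, calling the function on an existing save directory leaves its contents untouched.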

Summary button

The "Summary" button displays, in a new window, summary statistics for each variable of the imported dataset. These statistics are also saved in the file "./DonneesBrutes/Tableaux/summaryData.xls".

Figure 3: Output example obtained with the Summary button
Figure 3 is an example of output obtained with the "Summary" button. We can see that the variable C_O21 has a minimum value of 5, a 1st quartile of 7.06, a median of 7.94, a mean of 8.256, a 3rd quartile of 9.38, a maximum value of 16.38 and has 4,501 missing values.
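The statistics listed above (minimum, quartiles, median, mean, maximum and number of missing values) can be computed for a single variable as sketched below. This is an illustration, not the package's code: `summarize` is a hypothetical name, `None` stands for a missing value, and the quartiles use linear interpolation (`method="inclusive"`), which we believe matches R's default quantile definition. The interface's own figures may be computed differently.

```python
import statistics


def summarize(values):
    """Min, quartiles, median, mean, max and NA count for one variable.

    Hypothetical helper: None marks a missing value (NA); at least two
    non-missing values are needed to compute the quartiles.
    """
    present = [v for v in values if v is not None]
    # Linear interpolation between order statistics ("inclusive" method).
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "Min.": min(present),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(present),
        "3rd Qu.": q3,
        "Max.": max(present),
        "NA's": len(values) - len(present),
    }
```

For example, `summarize([1.0, 2.0, 3.0, 4.0, None])` gives a 1st quartile of 1.75, a median and a mean of 2.5, a 3rd quartile of 3.25 and one NA.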

Fix data button

If the imported data do not fully conform to the format required by the interface, they can be edited using the "Fix Data" button (Figure 4).

It is possible to rename variables, convert character variables into numeric variables, or edit individual values. Once the edits are done, clicking the "Quit" button saves the corrected dataset in "./DonneesBrutes/Tableaux/" as a .txt file whose name combines the original file name, "corrige" (corrected) and the date; this file is then automatically loaded by the interface.

If major corrections are needed (dates in the wrong format, incorrectly labelled missing values, etc.), it is better to make them outside the interface and then import the corrected file.
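As an example of such an outside correction, the Python sketch below converts dates from a day/month/year layout to ISO 8601, relabels missing values as 'NA' and replaces decimal commas with points. Everything here is assumed for illustration: the source date format (`%d/%m/%Y`), the missing-value codes in `MISSING_TOKENS`, and the helper names `to_iso_date`, `fix_value` and `fix_rows` are hypothetical and should be adapted to the actual file.

```python
from datetime import datetime

# Hypothetical codes used for missing values in the original file.
MISSING_TOKENS = {"", "-", "NaN", "-999"}


def to_iso_date(value, source_format="%d/%m/%Y"):
    """Convert a date such as '31/12/2009' (assumed layout) to '2009-12-31'."""
    return datetime.strptime(value.strip(), source_format).strftime("%Y-%m-%d")


def fix_value(value):
    """Relabel missing values as 'NA' and turn a decimal comma into a point."""
    value = value.strip()
    if value in MISSING_TOKENS:
        return "NA"
    return value.replace(",", ".")


def fix_rows(header, rows, source_format="%d/%m/%Y"):
    """Return corrected rows: 'Dates' converted to ISO, other fields cleaned.

    The 'Hours' column, if present, is only stripped of surrounding spaces.
    """
    i_date = header.index("Dates")
    i_hour = header.index("Hours") if "Hours" in header else None
    fixed = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            if j == i_date:
                new_row.append(to_iso_date(value, source_format))
            elif j == i_hour:
                new_row.append(value.strip())
            else:
                new_row.append(fix_value(value))
        fixed.append(new_row)
    return fixed
```

Values that use a comma as a thousands separator would be mangled by `fix_value`, so the source file should be checked before applying it.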

Figure 4: Fix data window example

Next steps buttons

Once the data has been imported and the save directory selected, the user has three options:

  • When using the interface for the first time, or to create a new classification that detects states in the imported data, click the "Classification" button.
  • If a state classification has already been performed on the imported dataset and the user now wants to estimate the parameters of the associated MMC-NS model, click the "Time series modeling" button.
  • If an MMC-NS model has already been estimated on data with the same structure as the imported data and the user wants to predict the states of the imported data, click the "Prediction" button.

  1. http://www.seanoe.org/data/00286/39754/: this dataset has been cleaned (temporal alignment, correction of outliers, replacement of outliers, replacement of some missing values...). 

  2. MAREL = Mesures Automatisées en Réseau pour l’Environnement Littoral (Networked Automated Measurements for the Coastal Environment) 

  3. Fluorescence is a parameter that can be used to assess the quality of the environment under study.