Exploratory Data Analysis

Workflow context
For the workflow context, see 📘 Data Center · 📘 Aquifer Attributes (data sources)
  1. Graphical Analyses
  2. Local Filter
  3. Statistics Panel
  4. Plot Options

1. Graphical Analyses

Histogram

The first (default) graph shown is the histogram, a bar chart of frequency versus parameter value. The number of bars (or bins) is controlled by the PDF Number of Lags value under Plot Options.
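As a rough illustration of what the bin setting does, the sketch below counts observations in equal-width bins. The function name `histogram_counts` and the sample values are hypothetical, and equal-width binning between the minimum and maximum is an assumption; the exact binning rule used by IGW-NET is not documented here.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # The maximum value is assigned to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical parameter values, for illustration only.
values = [1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.0, 10.0]
edges, counts = histogram_counts(values, 3)
print(edges, counts)
```

Increasing `n_bins` gives narrower bars and a more detailed but noisier picture of the same data.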

PDF

The PDF graph is the probability density function (PDF), or the relationship between observations and their probability (expressed as a fraction of 1 in IGW-NET). Data densities are shown as bars. The number of bars (or bins) is controlled by the PDF Number of lags value under Plot Options.

Check the box next to 'Normal PDF' to add a Gaussian curve to the plot, then click Update to draw the curve.
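The relationship between the bars and the Gaussian curve can be sketched as follows. This is a minimal example under stated assumptions: bar heights are taken as the fraction of observations per bin, and the curve uses the sample mean and sample standard deviation, scaled by the bin width so it is comparable to the bars. The function names and data are hypothetical, and IGW-NET may scale the curve differently.

```python
import math
import statistics


def relative_frequencies(values, n_bins):
    """Return bin centers, the fraction of observations per bin, and the bin width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centers, [c / len(values) for c in counts], width


def normal_pdf(x, mean, std):
    """Gaussian probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical parameter values, for illustration only.
values = [2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 4.9, 5.6, 6.2, 7.0]
centers, fractions, width = relative_frequencies(values, 5)
mean, std = statistics.mean(values), statistics.stdev(values)
# Density times bin width approximates the fraction expected in each bin.
curve = [normal_pdf(c, mean, std) * width for c in centers]
```

Comparing `fractions` with `curve` bin by bin gives a quick sense of how closely the data follow a normal distribution.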

CDF

The CDF graph is the cumulative distribution function: for each specified value, the probability that an observation is less than or equal to that value. Data densities are shown as bars. The number of bars (or bins) is controlled by the PDF Number of Lags value under Plot Options.
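A binned cumulative distribution can be sketched by accumulating the bin counts and dividing by the number of observations. The function name `binned_cdf` and the data are hypothetical, and equal-width bins are an assumption rather than documented IGW-NET behavior.

```python
from itertools import accumulate


def binned_cdf(values, n_bins):
    """Return upper bin edges and the fraction of observations at or below each edge."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    upper_edges = [lo + (i + 1) * width for i in range(n_bins)]
    # Running total of counts, expressed as a fraction of all observations.
    cumulative = [c / len(values) for c in accumulate(counts)]
    return upper_edges, cumulative


# Hypothetical parameter values, for illustration only.
values = [1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.0, 10.0]
upper_edges, cumulative = binned_cdf(values, 3)
print(upper_edges, cumulative)
```

The cumulative values never decrease and the last one is always 1, since every observation is at or below the maximum.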

h-ScatterPlot

This is a bivariate plot in which the paired values z_i and z_{i+h}, taken at locations separated by the distance h, are plotted against each other on the two axes. The shape and tightness of the cloud of points are related to the value of the variogram for h.

The separation distance interval, or lag, can be adjusted using the 'Scatterplot Lag h' slider control under Plot Options. The Scatterplot Tolerance specifies how much the distance between a pair of points can differ from the exact lag distance for the pair to still be included in the lag calculations.
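The role of the lag and tolerance can be illustrated with a small pair-selection routine. This is a sketch under assumptions: the search is omnidirectional in two dimensions, each pair is reported once, and a pair is kept when its separation is within the lag plus or minus the tolerance. The function name `h_scatter_pairs` and the points are hypothetical.

```python
import math


def h_scatter_pairs(points, lag, tolerance):
    """Return (z_i, z_j) for every pair whose separation is within lag +/- tolerance.

    points is a list of (x, y, value) tuples.
    """
    pairs = []
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            distance = math.hypot(xj - xi, yj - yi)
            if abs(distance - lag) <= tolerance:
                pairs.append((zi, zj))
    return pairs


# Hypothetical sample locations and values, for illustration only.
points = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.2), (20.0, 0.0, 2.0), (0.0, 11.0, 0.9)]
pairs = h_scatter_pairs(points, lag=10.0, tolerance=1.5)
print(pairs)
```

A larger tolerance admits more pairs per lag, which helps with sparse or irregularly spaced data at the cost of blurring the distance the plot represents.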

X-Spatial

This graph shows the concentration as a function of distance along the west-to-east central line through the data (spatial) domain.

Y-Spatial

This graph shows the concentration as a function of distance along the north-to-south central line through the data (spatial) domain.

Temporal

This graph shows the concentration as a function of time (sampling date) if temporal data is available.

Z-Spatial

This graph shows the concentration as a function of elevation if elevation data is available.

2. Local Filter

The top-right panel is used to create subsets of the input data for graphical analysis.

Check the box next to Data Filter and specify the minimum ('From') and maximum ('To') parameter values of the data subset.

Check the box next to Date Filter and specify a starting date ('From') and ending date ('To') of the data subset (Temporal plot only).
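The effect of the two filters can be sketched as a simple range selection. This is an illustration only: the function name `apply_filters`, the record layout, and the sample data are hypothetical, and treating both 'From' and 'To' as inclusive bounds is an assumption.

```python
from datetime import date


def apply_filters(records, value_range=None, date_range=None):
    """Keep records whose value and date fall inside the given inclusive ranges.

    records is a list of (sample_date, value) tuples.
    A range of None means that filter is unchecked.
    """
    kept = []
    for sample_date, value in records:
        if value_range and not (value_range[0] <= value <= value_range[1]):
            continue
        if date_range and not (date_range[0] <= sample_date <= date_range[1]):
            continue
        kept.append((sample_date, value))
    return kept


# Hypothetical observations, for illustration only.
records = [
    (date(2020, 1, 15), 0.8),
    (date(2020, 6, 1), 3.5),
    (date(2021, 3, 10), 12.0),
    (date(2021, 9, 30), 4.2),
]
by_value = apply_filters(records, value_range=(1.0, 5.0))
by_both = apply_filters(
    records,
    value_range=(1.0, 5.0),
    date_range=(date(2021, 1, 1), date(2021, 12, 31)),
)
```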

3. Statistics Panel

The middle panel on the right-hand side of the interface provides descriptive statistics of the selected data, including:
- the number of points in data (Npts)
- maximum value
- minimum value
- mean value
- median value
- variance
- standard deviation
- skewness
- coefficient of variation
- lower quartile (25%)
- upper quartile (75%)
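The statistics listed above can be reproduced approximately with Python's standard library. This is a sketch, not the IGW-NET implementation: the function name `describe` is hypothetical, and the choices of sample variance (n - 1 denominator), moment-based skewness, and the default quartile interpolation of `statistics.quantiles` are assumptions, so small differences from the panel values are possible.

```python
import statistics


def describe(values):
    """Return descriptive statistics matching the entries of the Statistics Panel."""
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)        # sample standard deviation (assumed)
    pop_std = statistics.pstdev(values)   # used only for the skewness moment
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "Npts": n,
        "max": max(values),
        "min": min(values),
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std": std,
        # Third central moment divided by the cubed population standard deviation.
        "skewness": sum((v - mean) ** 3 for v in values) / n / pop_std ** 3,
        "cv": std / mean,                 # coefficient of variation
        "q25": q1,
        "q75": q3,
    }


# Hypothetical parameter values, for illustration only.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```

A positive skewness, as in this sample, indicates a longer tail toward high values, which is common for concentration data.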

4. Plot Options

The 'PDF Number of Lags' option controls the number of bars (bins) used in the Histogram, PDF, and CDF plots (see above).

Check the 'Normal PDF' option to add a Gaussian curve (normalized frequency distribution).

The Num-time Interval option controls how many intervals are used for the x-axis of the Temporal plot (Date).

The Scatterplot Lag and Scatterplot Tolerance options are used for the h-ScatterPlot (see above). The separation distance interval, or lag, can be adjusted using the 'Scatterplot Lag h' slider control. The Scatterplot Tolerance specifies how much the distance between a pair of points can differ from the exact lag distance for the pair to still be included in the lag calculations.