Tutorial 28: Point Data Processing & Regression

1. Extract Water Well Data

Step 1 — Extract Wells from the Data Center

Zoom to Lansing, Michigan. Go to 'Analysis Tools' → 'Data Explorer' → 'Data Analysis'. In the Scatter Data Spatial Analysis interface, select 'Wellogic' as the Data Source. Follow the prompt to draw a polygon along a stretch of the Grand River in southwest Lansing. Click 'SaveShape' to extract all water well records within the polygon.

The Wellogic database is Michigan's comprehensive water well record system — every permitted well in the state, with driller's logs, construction details, and static water levels. MAGNET's Data Center provides instant access to these records through a spatial query: draw a boundary, get the wells.
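The "draw a boundary, get the wells" idea can be sketched as a point-in-polygon test. The snippet below is a minimal illustration, not MAGNET's implementation: the record layout (`id`, `lon`, `lat`), the polygon coordinates, and the function names are all hypothetical, and the test used is the standard ray-casting method.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex
    connects back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def extract_wells(wells, polygon):
    """Keep the records whose (lon, lat) falls inside the polygon."""
    return [w for w in wells if point_in_polygon(w["lon"], w["lat"], polygon)]


# Hypothetical polygon and well records (illustrative values only)
polygon = [(-84.62, 42.68), (-84.56, 42.68), (-84.56, 42.72), (-84.62, 42.72)]
wells = [
    {"id": "A", "lon": -84.60, "lat": 42.70},
    {"id": "B", "lon": -84.50, "lat": 42.70},
    {"id": "C", "lon": -84.58, "lat": 42.69},
]
selected = extract_wells(wells, polygon)
print([w["id"] for w in selected])
```

Treating longitude and latitude as planar coordinates is a simplification that is reasonable for a polygon a few kilometers across.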

📸 Figures 1–2: Screenshots pending — Lansing area zoom and extracted well data markers on map

Millions of Wells, Integrated and Ready

Wellogic is Michigan's system — but every state has one: This tutorial uses Michigan's Wellogic database, but the MAGNET Data Center has integrated water well records from across the USA and Canada into a single, spatially queryable layer. Draw a polygon anywhere — Michigan, Texas, Alberta, Florida — and extract every well record in that area. The Data Center handles the state-by-state format differences, coordinate systems, and field naming conventions. You get a unified dataset regardless of which agency collected it.

What each record contains: Location (lat/lon or geocoded address), well depth, static water level (SWL — depth to water at time of completion), driller's lithology log (materials encountered at each depth), well use (domestic, irrigation, municipal), construction details (casing, screen), pumping rate, and in many cases specific capacity. The DEM-derived land surface elevation (ELEV_DEM) is added by MAGNET during integration — giving you the elevation reference that drillers don't provide.

Density is the advantage: In the Lansing area, a typical polygon might contain hundreds of water well records. A dedicated monitoring network for the same area might have 5–10 wells. What the water well data lacks in per-record precision, it compensates for with sheer density. Statistical techniques — filtering, regression, moving averages, confidence intervals — are designed precisely for this situation: extracting reliable patterns from many noisy observations. Density defeats noise when you process the data correctly.
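A rough way to see why density offsets per-record noise: if the errors were independent and unbiased (an idealization that well records only partly satisfy), the uncertainty of an average would shrink with the square root of the number of records. The sketch below shows only that arithmetic. The ±10 ft figure is an illustrative assumption, not a property of Wellogic.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean of n independent observations,
    each with standard deviation sigma."""
    return sigma / math.sqrt(n)


# Illustrative assumption: each record carries about 10 ft of random error
for n in (1, 10, 100, 400):
    print(n, round(standard_error(10.0, n), 2))
```

Systematic effects such as drought-year completions or nearby pumping do not average out this way, which is why the filtering steps that follow still matter.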

2. Filter Invalid Values

Step 2 — Apply Universal Filters to SWL Data

Click the 'Universal' filter option to open the Universal Filters interface. Under 'Spatial Analysis – Numeric Fields', select 'SWL' — the depth-to-water measurements in feet below ground surface.

Check 'Unique Values' to list all SWL values in the dataset. Scroll to the bottom and double-click '999' and '99999' to add them to the Invalid Values list. These are placeholder codes meaning "not measured" — common in water well databases. Select 'Skip all' under Invalid value filter, then click 'Apply'.
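The same filter is easy to reproduce outside the interface. This is a minimal sketch assuming the extracted records are available as a list of dictionaries with an `SWL` key. That layout and the function names are hypothetical, not MAGNET's export format.

```python
INVALID_SWL = {999, 99999}  # placeholder codes flagged in Step 2


def unique_values(records, field="SWL"):
    """Sorted unique values of a field, like the 'Unique Values' list."""
    return sorted({r[field] for r in records if r.get(field) is not None})


def skip_invalid(records, field="SWL", invalid=INVALID_SWL):
    """Drop records whose field is missing or equals an invalid code."""
    return [
        r for r in records
        if r.get(field) is not None and r[field] not in invalid
    ]


# Hypothetical records (illustrative values only)
records = [
    {"SWL": 12.0},
    {"SWL": 999},
    {"SWL": 35.5},
    {"SWL": 99999},
    {"SWL": 0.0},
]
print(unique_values(records))
clean = skip_invalid(records)
print([r["SWL"] for r in clean])
```

The record with an SWL of 0 is kept on purpose: whether it is a flowing well or an unrecorded value is a judgment the filter cannot make for you.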

📸 Figures 3–4: Screenshots pending — Universal Filters interface with SWL field selected and invalid values flagged

Filtering Noisy Well Data — Systematic, Not Arbitrary

Sentinel values are the easy part: Removing the 999 and 99999 codes is straightforward because they are obviously invalid. But real data cleaning for water well records goes much deeper. Is an SWL of 0 feet a flowing artesian well or a "not measured" code? Is an SWL of 500 feet a deep bedrock well or a typo? The Universal Filters interface lets you see every unique value in the dataset, identify the distribution, and make informed decisions about what to keep.

Sources of noise in water well data: (1) Driller variability — different drillers measure SWL differently, at different times after completion, with different accuracy. (2) Temporal variability — a well completed during a 1988 drought has a different SWL than one completed in a 2019 wet spring. (3) Pumping effects — if a nearby well was pumping during measurement, the SWL is artificially depressed. (4) Depth sampling bias — domestic wells are shallow (50–150 ft), municipal wells are deep (300+ ft) — they sample different aquifer zones. (5) Location uncertainty — older wells geocoded from legal descriptions can be off by hundreds of meters; GPS-located wells are accurate to meters.

The statistical approach: You cannot fix individual records — you don't know which driller was careful and which wasn't. But with hundreds of records, statistical tools reveal the signal through the noise. That's why this tutorial leads to regression analysis: the regression line, confidence intervals, and moving averages extract the underlying trend while the scatter around the line quantifies the noise. The R² tells you how much of the variation is signal (topography-driven water table) vs. noise (driller error, temporal effects, pumping). This is how you use lots of noisy data responsibly.

3. Calculate Water Level Elevations

Step 3 — Compute SWL Elevation from Depth and DEM

Re-open the Universal Filters interface. Click 'Name Operation' to apply a mathematical filter. Build the equation:

Water Table Elevation = ELEV_DEM − SWL

Click 'ELEV_DEM' in the Numeric Fields list (land surface elevation from DEM), type '−', then click 'SWL' (depth to water). Click 'Done' then 'Apply'.

A new field called 'CalculatedV' is created — the water table elevation at each well. This is automatically selected as the mapped parameter. Click 'Color Ramp for Marker' to visualize the spatial distribution of water table elevations across the study area.
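For reference, the calculated field amounts to one subtraction per record. The sketch below assumes records held as dictionaries and assumes ELEV_DEM and SWL are in the same units (feet here). Check the units in your own data before subtracting. The function name is hypothetical.

```python
def add_water_table_elevation(records):
    """Return copies of the records with CalculatedV = ELEV_DEM - SWL."""
    out = []
    for r in records:
        row = dict(r)  # copy so the input records are left untouched
        row["CalculatedV"] = row["ELEV_DEM"] - row["SWL"]
        out.append(row)
    return out


# Hypothetical records, both fields assumed to be in feet
records = [
    {"ELEV_DEM": 860.0, "SWL": 25.0},
    {"ELEV_DEM": 845.5, "SWL": 12.5},
]
result = add_water_table_elevation(records)
print([r["CalculatedV"] for r in result])
```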

📸 Figures 5–6: Screenshots pending — Name Operation equation builder and color-coded water level elevation markers

4. Linear Regression Analysis

Step 4 — Build and Run the Regression

Re-open the Universal Filters interface. Expand 'Simple/Multiple Linear Regression Analysis >>>'. Construct the regression equation:

ELEV_DEM = A0 + A1 × CalculatedV

Name 1 (right-hand side): Click 'Add Name' → build A0+A1*[CalculatedV] → click 'Done'
Name 2 (left-hand side): Click 'Add Name' → select [ELEV_DEM] → click 'Done'

Click 'Do Regression'. A scatter plot of ELEV_DEM vs. CalculatedV appears with the regression line, along with complete statistical results: regression coefficients with standard errors, standard deviation, coefficient of determination (R²), ANOVA table, and individual coefficient t-tests.
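To make the reported statistics concrete, here is a minimal ordinary least squares sketch for the same form, ELEV_DEM = A0 + A1 × CalculatedV. It uses the textbook formulas for the coefficients, their standard errors, R², and t-statistics. It is not MAGNET's code, and the data values are invented for illustration.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares fit of y = a0 + a1 * x with basic statistics."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    a1 = sxy / sxx
    a0 = my - a1 * mx
    # Sum of squared residuals around the fitted line
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    se_a1 = s / math.sqrt(sxx)
    se_a0 = s * math.sqrt(1.0 / n + mx ** 2 / sxx)
    return {
        "a0": a0, "a1": a1,
        "se_a0": se_a0, "se_a1": se_a1,
        "t_a0": a0 / se_a0, "t_a1": a1 / se_a1,
        "r2": 1.0 - sse / syy,
        "s": s,
    }


# Invented values: x is CalculatedV, y is ELEV_DEM (feet)
calculated_v = [830.0, 835.0, 840.0, 845.0, 850.0]
elev_dem = [851.0, 858.0, 861.0, 869.0, 871.0]
fit = simple_regression(calculated_v, elev_dem)
for name, value in fit.items():
    print(name, round(value, 4))
```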

📸 Figures 7–8: Screenshots pending — Regression setup interface and results chart with statistics

What the Regression Reveals About Your Aquifer

The land surface–water table relationship: In unconfined aquifers with topography-driven flow, the water table is a subdued replica of the land surface. Hills have deeper water tables, valleys have shallow ones, but the absolute elevation rises and falls with the terrain. A strong linear regression (high R²) from hundreds of noisy water well records confirms this pattern — even though individual records are imprecise. This is the power of statistical analysis applied to dense, noisy data: the regression line reveals what no single well measurement can.

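The slope tells a story: A slope near 1.0 means the water table is nearly parallel to the ground surface, which is typical of thin, low-conductivity surficial aquifers where water follows topography closely. A water table that is flatter than the terrain, as in high-conductivity aquifers that smooth out topographic variations, shows up as a slope significantly less than 1.0 when water table elevation is regressed on land surface elevation. With the regression written as in Step 4 (ELEV_DEM = A0 + A1 × CalculatedV, land surface as the dependent variable), the relationship is inverted and the same flattening tends to appear as a slope A1 greater than 1.0. Either way, the slope is a first-order indicator of aquifer properties before you've run a single flow simulation.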

Confidence intervals quantify the noise: The dashed confidence lines and moving-window averages on the regression chart are not decoration — they're the answer to "how much should I trust this relationship given my noisy data?" Tight confidence bands mean the water well data, despite its imprecision, produces a reliable regional trend. Wide bands mean too much noise for the data density — you may need a smaller study area, more filtering, or supplemental monitoring data. This is the quantitative answer to the noisy-data question.

Multiple regression for testing hypotheses: The interface supports multiple independent variables. For the Lansing area, you could test: does distance to the Grand River improve the prediction beyond topography alone? Does well depth matter (indicating confined vs. unconfined conditions)? Does the year of construction introduce a temporal trend (declining water levels over decades)? Each added variable tests a hydrogeologic hypothesis using the same noisy but dense water well data. This is exploratory data analysis: hypothesis-driven, data-rich, and done without running a flow simulation.
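A multiple regression of this kind can be sketched with the normal equations. The example below is illustrative only. It assumes water table elevation as the dependent variable with land surface elevation and distance to the river as predictors, a setup chosen here to match the hypotheses above. It uses noise-free synthetic values so the recovered coefficients can be checked. All names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_regression(columns, y):
    """Least-squares fit y = A0 + A1*x1 + A2*x2 + ...

    Predictors are centered before solving to keep the system
    well conditioned. Returns ([A0, A1, ...], R2).
    """
    n, k = len(y), len(columns)
    means = [sum(c) / n for c in columns]
    my = sum(y) / n
    xc = [[v - mu for v in c] for c, mu in zip(columns, means)]
    yc = [v - my for v in y]
    xtx = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(xc[i], yc)) for i in range(k)]
    slopes = solve(xtx, xty)
    a0 = my - sum(s * mu for s, mu in zip(slopes, means))
    pred = [a0 + sum(s * c[i] for s, c in zip(slopes, columns))
            for i in range(n)]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum(v * v for v in yc)
    return [a0] + slopes, 1.0 - sse / sst


# Synthetic data built from wt = 100 + 0.5 * elev + 0.01 * dist
elev = [850.0, 860.0, 870.0, 880.0, 855.0, 875.0]
dist = [100.0, 500.0, 300.0, 900.0, 700.0, 200.0]
wt = [100.0 + 0.5 * e + 0.01 * d for e, d in zip(elev, dist)]
coeffs, r2 = multiple_regression([elev, dist], wt)
print([round(c, 4) for c in coeffs], round(r2, 4))
```

With real records the fit will not be exact. Comparing R² and the coefficient t-tests with and without the extra variable is what tests the hypothesis.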

5. Customize and Export Results

Step 5 — Add Confidence Intervals and Export

Check 'Show Std' and 'Add Band-mean' to add confidence intervals (dashed lines at ±1.5 standard deviations) and moving-window averages (yellow triangles, 10 bins by default) to the regression chart. Click 'OK' to update. Adjust symbol size as needed (default: 3 pixels per data point).
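The two overlays can be approximated as follows. This sketch rests on two assumptions not confirmed by the tutorial: that the dashed lines sit ±1.5 residual standard deviations from the regression line, and that the band means are averages of y within 10 equal-width bins along x. The function names are hypothetical.

```python
import math


def residual_std(x, y, a0, a1):
    """Standard deviation of residuals around the line y = a0 + a1 * x."""
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


def band_means(x, y, bins=10):
    """Mean of y within equal-width bins along x.

    Returns a list of (bin_center, mean_y) pairs, skipping empty bins.
    """
    lo, hi = min(x), max(x)
    if hi == lo:  # all points share one x value
        return [(lo, sum(y) / len(y))]
    width = (hi - lo) / bins
    sums = [0.0] * bins
    counts = [0] * bins
    for xi, yi in zip(x, y):
        i = min(int((xi - lo) / width), bins - 1)  # clamp the max point
        sums[i] += yi
        counts[i] += 1
    return [(lo + (i + 0.5) * width, sums[i] / counts[i])
            for i in range(bins) if counts[i]]


# Synthetic points scattered by +1 / -1 around the line y = 2x
x = [float(i) for i in range(20)]
y = [2.0 * xi + (1.0 if i % 2 == 0 else -1.0) for i, xi in enumerate(x)]
s = residual_std(x, y, a0=0.0, a1=2.0)
half_width = 1.5 * s  # dashed lines at line +/- half_width
means = band_means(x, y, bins=10)
print(round(half_width, 3), len(means))
```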

To export the statistical results: select all text in the results box, press Ctrl+C to copy, paste into a text editor, and save as a local file. The exported results include everything needed for a technical report: coefficients, standard errors, R², ANOVA table, and t-test results.

📸 Figures 9–10: Screenshots pending — Regression chart with confidence bands and exported statistics text

Key Concepts

Noisy data, used correctly, beats sparse precise data for regional studies: A hundred water well records with ±10 ft uncertainty, processed through systematic filtering and statistical analysis, produce a more reliable regional water table map than five monitoring wells with ±0.01 ft precision. The monitoring wells are essential for calibration and validation — but they can't cover a county. Water well data can. The tools in this tutorial — Universal Filters, calculated fields, regression — are how you bridge from noisy abundance to reliable regional understanding.

This workflow scales: The same extraction → filter → calculate → analyze workflow works at any scale. A small polygon in Lansing yields hundreds of wells. A county-scale polygon yields thousands. A statewide query yields hundreds of thousands. The statistical tools scale with the data — more records means tighter confidence intervals, more robust regression, and more reliable regional patterns. Michigan EGLE used exactly this approach to model all remaining community groundwater systems statewide (Case Study: $30M Saved).

Data processing is Part I of the modeling workflow: Tutorials 26–28 form the complete data analysis toolkit: (26) import, visualize, and explore point data in 2D and 3D; (27) integrate DataNET federated layers into a model; (28) process, filter, compute derived fields, and run statistical analysis. These are not optional prerequisites — they are the foundation. Every insight from data analysis (depth trends, spatial patterns, regression relationships) feeds directly into conceptual model decisions. The model is only as good as the data understanding behind it.

Lansing — real data under your feet: This tutorial uses water well records from Lansing, Michigan — the Grand River corridor, the glacial drift aquifer, the Wellogic database. The regression result describes the hydrogeology where many MAGNET users live and work. That's the point: the Data Center has already integrated the data. You don't need to request files from agencies, reformat spreadsheets, or geocode addresses. Draw a polygon, extract, filter, analyze. The data is there — everywhere.

6. What's Next

You've completed all 28 IGW-NET Quick Tutorials — from your first 2D steady-state model to data processing and regression analysis. The full progression:

Fundamentals (1–5): Domain, flow, particles, water balance, transport
3D & Transient (6–10): Layers, time-stepping, calibration, stochastic
Advanced Modeling (11–18): Hierarchy, profiles, shapefiles, post-analysis, Monte Carlo
Specialized Tools (19–23): Parameter estimation, Theis, unstructured grids, MODFLOW
3D & Data (24–28): T-PROGS geology, 3D visualization, point data, DataNET, regression

Each tutorial built on the last. You now have the skills to build, calibrate, analyze, and communicate groundwater models using the full power of the MAGNET4WATER platform.
