MAE 127:  Lecture 1

Introduction to the course

Handouts:  syllabus and reading reference list (pdf format)
                   survey (pdf format)

Professor Sarah Gille:  I'm a physical oceanographer.  At UCSD I'm split between Scripps Institution of Oceanography and the department of Mechanical and Aerospace Engineering.  Since I can't be in two places at once, I usually sit at Scripps, though I'll be in my MAE office before class.  If you need to reach me at any other time, call or send e-mail.

Among other things, I also teach data analysis to grad students at SIO.  When I'm not teaching, my research focuses on the ocean's role in climate, and I work mostly on the Southern Ocean (that's the part of the global ocean that encircles Antarctica).  My work depends on analyzing data, and the methods that we'll cover in this class come up in climate research all of the time.  That's not to say that they only matter for climate studies, or even that they're only relevant for research in a university setting.

The class: nominally aimed at ME, EnvE, Earth Science, and ESYS majors.
The formal prerequisite is Math 20C (or equivalent); Math 20F is recommended.

Why study statistical methods?
Statistical methods are the tools that help us interpret observations of the natural environment or measurements from the lab.  Thus statistics is fundamental to science and engineering.  Environmental sciences differ from pure physics or laboratory engineering work because we often observe the world as it exists, rather than performing controlled experiments.  As a result, our data can be noisy and imperfect, and we have to analyze them carefully.

Course objectives:
(1) To teach you fundamental statistics and basic techniques for analyzing data.
(2) To make sure you learn not only how to treat data but also how to assess uncertainties.
(3) To emphasize problem-solving skills.

Course schedule:
Roughly three segments:  basic statistics, least-squares fitting, and spectral analysis, with empirical orthogonal functions at the end.  Details are subject to revision, so check the course web site for updates.  This is a new course, and feedback is welcome.

See the syllabus for course requirements.  I try to start class promptly and to finish on time, so please plan to arrive on time.

Comments on texts:
This year the course has no assigned text.  That's because the class is new, and nothing that I checked out in advance seemed a perfect choice in terms of both content and cost.  Instead, I'll post detailed notes on the web.  I've also put a dozen or so books on reserve, and I'll have key chapters made available through electronic reserves.  Your feedback will help in deciding whether we can assign a single textbook next year.

Taylor's book is quite introductory but clearly written and comparatively inexpensive, so I've made it available at the bookstore as an optional text.  We won't cover all of it.

We will make use of some basics of linear algebra.  I've put a few basic textbooks (by Strang and by Noble) on reserve.  There are also some good Matlab tutorials on linear algebra. 

The other books are mostly upper-level undergraduate or graduate texts on data analysis.  I'll point out appropriate references as they come up.

Comments on software:
We will use Matlab for this course.  Matlab is good for statistical methods, because it's built around a linear algebra package.  (The name Matlab comes from Matrix Laboratory.)  It does the mathematical operations that we need, and it lets you make plots easily.  Some of you are probably very familiar with Matlab, but some of you probably haven't seen it, so I'll spend a couple of class sessions going over the basics that you need for this course.

What is data? 
Data can be any measurements, whether from field observations, laboratory experiments, or computer simulations.  Some examples include temperatures at the Scripps Pier or output from a computer simulation of ocean circulation in the tropical Pacific.

What do you do with it?
(see lecture1.pdf)
Start by looking at it, making plots.
time-series:  graphs of a variable versus time
maps:  variable versus latitude and longitude
sections:  variable versus depth and, for example, longitude.
Hovmöller diagrams:  variable versus time and, for example, longitude.

What do we learn through data analysis? (see lecture1.pdf)
Example 1:  Long-term climate trends are tracked by testing whether conditions at present differ from conditions in the past.  But what does "different" mean?  We'll have to define a statistical standard for determining when one measurement differs from another.  In some cases, when many data are averaged together, error bars are clearly small.  On the other hand, measurements from single points can be useful too, if we can figure out how to interpret their uncertainties.
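
As a rough illustration of what a "statistical standard" for a difference might look like, here is a minimal Python sketch (the course itself uses Matlab).  It compares two sample means against their combined standard error.  The function names, the 2-standard-error cutoff, and the temperature values are all assumptions made up for this illustration, and the sketch assumes the samples are independent.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of that mean."""
    n = len(values)
    mean = statistics.mean(values)
    # statistics.stdev uses the n-1 (sample) normalization.
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, std_err


def means_differ(past, present, n_sigma=2.0):
    """Flag a difference when the gap between the two means exceeds
    n_sigma combined standard errors (an illustrative cutoff, not a rule
    taken from the lecture)."""
    mean_past, se_past = mean_and_standard_error(past)
    mean_present, se_present = mean_and_standard_error(present)
    combined_se = math.sqrt(se_past ** 2 + se_present ** 2)
    return abs(mean_present - mean_past) > n_sigma * combined_se


# Made-up temperatures in degrees C, not real observations.
past = [15.1, 14.8, 15.3, 15.0, 14.9, 15.2]
present = [15.9, 16.1, 15.7, 16.0, 15.8, 16.2]
print(means_differ(past, present))
```

Averaging more points shrinks the standard error by a factor of the square root of the sample size, which is why error bars on heavily averaged data tend to be small.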

Example 2:  Evaluating Southern California air quality:  Federal and state air quality standards require monitoring for levels of ozone, carbon monoxide, NOx, and particulate matter that exceed a threshold.  The law is strict:  one day of violation is cause for concern.  These threshold standards are implemented because they're easy to set up, and because the thresholds are considered minimum requirements for human health.  However, natural variability can lead to occasional extreme events, so a threshold standard can be a very stringent requirement.
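
To see why a one-day threshold rule is stringent, consider a minimal Python sketch (the course itself uses Matlab).  It assumes, purely for illustration, that each day independently exceeds the threshold with the same small probability; real pollution data are correlated from day to day, so this is a simplification.  The function name and the 1% daily probability are made up.

```python
def prob_at_least_one_exceedance(p_daily, n_days):
    """Probability of one or more exceedances in n_days, assuming each day
    is independent and exceeds the threshold with probability p_daily."""
    # The complement of "no exceedance on any of the n_days".
    return 1.0 - (1.0 - p_daily) ** n_days


# Hypothetical numbers: a 1% chance per day, over one year.
print(prob_at_least_one_exceedance(0.01, 365))
```

Under these assumptions, even a rare daily event becomes very likely to occur at least once over a full year.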

(On the other hand, if you design a satellite to monitor pollution, for example, any failure in the rocket that launches your satellite or in the measurement equipment would prevent you from collecting any data, so you might want a stringent engineering design standard.)

Example 3:  Identifying the annual cycle:  Because of the seasons, almost everything on Earth undergoes a natural annual cycle.  Thus, before we do any other analysis, we often want to remove the annual cycle.  For the moment, we'll look at examples for ozone in the troposphere (which varies because it undergoes a photochemical reaction, and that of course depends on the available sunlight), temperature in the upper ocean at the BATS site, and the Keeling curve.
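
One common way to remove an annual cycle is to subtract the average for each calendar month.  The minimal Python sketch below (the course itself uses Matlab) does this for a synthetic series.  The function name and the data are made up for illustration, and the sketch assumes one value per month, starting in January, with no gaps.

```python
import math


def remove_annual_cycle(monthly_values):
    """Subtract the mean for each calendar month (a monthly climatology).

    Assumes one value per month, starting in January, with no gaps.
    Returns the anomalies and the 12-element climatology.
    """
    climatology = []
    for month in range(12):
        same_month = monthly_values[month::12]  # every January, every February, ...
        climatology.append(sum(same_month) / len(same_month))
    anomalies = [v - climatology[i % 12] for i, v in enumerate(monthly_values)]
    return anomalies, climatology


# Synthetic data, not observations: 10 years of monthly values with a
# mean of 20, an annual cosine of amplitude 5, and a weak linear trend.
n_years = 10
series = [
    20.0 + 5.0 * math.cos(2 * math.pi * (i % 12) / 12) + 0.01 * i
    for i in range(12 * n_years)
]
anomalies, climatology = remove_annual_cycle(series)
print(anomalies[0], anomalies[-1])
```

In this synthetic case the seasonal swing disappears from the anomalies while the slow trend survives, which is the point of removing the cycle first.  Note that this approach also removes the overall mean.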