Intro to Scientific Computing, PHYS 27/193, Physics Department, University of the Pacific
This goes under a very old branch of mathematics called
"Linear Regression" (when fitting linear functions), or
"Least Squares" fitting, or sometimes "Data Modelling".
In its simplest form, the procedure works like this. We'll consider a very
simple linear fit to three data points.
Suppose your experiment gives you these three data points:
1.0 0.7 # x1, y1
2.0 0.8 # x2, y2
3.0 1.7 # x3, y3
You define a function f(x) = ax + b which has some parameters that you can change, a and b in this case.
But again, how do you know which line is the best fit (comes closest to the data)?
For each data point (labeled by i = 1,2,3), we compute the
difference between our guess function
f(xi) = a*xi + b
and the measured value yi.
This difference is called the Residual at
xi. The residuals of each data point are shown as the
little vertical lines ( | ) in the figure above.
Since these can be
either positive or negative, we square them to get a positive number
regardless of whether the data is above or below the line.
Then
we sum the squares of all the residuals; call this sum R.
The best fit is the line whose a and b make R as small as possible.
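As a minimal sketch of this bookkeeping (written in Python rather than gnuplot, and using a guessed line with a = 1, b = 0 that is purely illustrative), the residuals and the sum of their squares for the three data points could be computed like this:

```python
# The three data points from the example above.
xs = [1.0, 2.0, 3.0]
ys = [0.7, 0.8, 1.7]

def f(x, a, b):
    """The guess function f(x) = a*x + b."""
    return a * x + b

def sum_sq_residuals(a, b):
    """Sum of the squared residuals for parameters a and b.

    The sign convention (guess minus data) is an assumption;
    squaring makes the choice irrelevant.
    """
    residuals = [f(x, a, b) - y for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals)

# Illustrative guess only: a = 1, b = 0 (not the best fit).
R_guess = sum_sq_residuals(1.0, 0.0)
print(R_guess)
```

Trying a few other values of a and b by hand shows how the sum shrinks as the line gets closer to the points.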
For those who know calculus, what we are really doing is taking the derivative of R with
respect to a and b and setting those expressions to zero; this finds where R has a
minimum as we vary a and b.
This gives two equations that we have to solve simultaneously.
These are actually easy to solve (email me if you need help).
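For reference, here is a sketch of those two equations for the straight-line case, taking R to be the sum of the squared residuals over the N = 3 data points. The sign convention inside the parentheses is an assumption and does not affect the result.

```latex
\begin{align}
R(a,b) &= \sum_{i=1}^{N} \left( a x_i + b - y_i \right)^2 \\
\frac{\partial R}{\partial a} &= 2 \sum_{i=1}^{N} x_i \left( a x_i + b - y_i \right) = 0 \\
\frac{\partial R}{\partial b} &= 2 \sum_{i=1}^{N} \left( a x_i + b - y_i \right) = 0 \\
\intertext{Rearranged into two linear equations for $a$ and $b$:}
a \sum_i x_i^2 + b \sum_i x_i &= \sum_i x_i y_i \\
a \sum_i x_i + b N &= \sum_i y_i \\
\intertext{with the solution}
a &= \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2},
\qquad
b = \frac{\sum_i y_i - a \sum_i x_i}{N}
\end{align}
```

For the three data points above, the sums are 6 (x), 3.2 (y), 7.4 (xy) and 14 (x squared), so a = (22.2 - 19.2)/(42 - 36) = 0.5 and b = (3.2 - 3.0)/3 = 0.0667.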
However, gnuplot doesn't actually solve them!
It just varies a and b and
sees how much R changes. If increasing a decreases R, then gnuplot
increases a a little more.
Once changing the parameters by a little amount always increases R,
gnuplot has found the minimum and stops. It then prints a report like this:
After 4 iterations the fit converged.
final sum of squares of residuals : 0.106667
rel. change during last iteration : -3.71111e-09

degrees of freedom    (FIT_NDF)                        : 1
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.326599
variance of residuals (reduced chisquare) = WSSR/ndf   : 0.106667

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

a               = 0.5              +/- 0.2309       (46.19%)
b               = 0.0666667        +/- 0.4989       (748.3%)
This gives the parameters of the line that is the "Best" fit to the data,
as well as some information about how good the fit is (how big was the sum of the squares of the residuals).
This Best line looks like this. It's the one that's closest to the data.
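The "vary a and b and see how R changes" idea can be sketched in a few lines of Python. This is a deliberately crude illustration, not gnuplot's actual algorithm: gnuplot's fit command uses a more refined scheme (its documentation describes a Marquardt-Levenberg method). The starting guess, step size and stopping tolerance below are arbitrary assumptions.

```python
# The three data points from the example above.
xs = [1.0, 2.0, 3.0]
ys = [0.7, 0.8, 1.7]

def R(a, b):
    """Sum of squared residuals for the line a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def crude_fit(a=1.0, b=0.0, step=0.1, tol=1e-6):
    """Nudge a and b while R keeps decreasing; halve the step when stuck."""
    best = R(a, b)
    while step > tol:
        improved = False
        # Try a small change in each parameter, in each direction.
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = R(a + da, b + db)
            if trial < best:
                a, b, best = a + da, b + db, trial
                improved = True
        if not improved:
            step /= 2  # every nudge of this size makes R worse; look closer
    return a, b, best

a, b, r_min = crude_fit()
print(f"a = {a:.4f}, b = {b:.4f}, R = {r_min:.6f}")
```

The result should land close to the values gnuplot reported (a = 0.5, b = 0.0667, sum of squares 0.106667), though this simple search takes many more steps than gnuplot's four iterations.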
As you can see from the figure below, we need not limit ourselves to
linear functions--you could do the same with almost any function and data.
Your job as a scientist is to have insight (usually from theory) as
to which function is correct.
In this case, you want a quadratic function.
You would do the following
gnuplot> |
Do you see that we've defined a quadratic function, ax^2 + bx + c, above?
Here's my fit.
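The same least-squares idea carries over to the quadratic: setting the derivatives of the sum of squared residuals with respect to a, b and c to zero gives three linear equations. Here is a minimal Python sketch (not gnuplot) that solves them directly. The data points are synthetic, generated from y = 2x^2 - 3x + 1 purely for illustration, and are not the data in the figure.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations."""
    # Power sums S[k] = sum(x**k) and moments T[k] = sum(y * x**k).
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for (a, b, c), from dR/da = dR/db = dR/dc = 0.
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(3):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [v - factor * p for v, p in zip(M[row], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))

# Synthetic, noise-free data from y = 2x^2 - 3x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x ** 2 - 3 * x + 1 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```

Because the synthetic data lie exactly on a parabola, the fit should recover a, b and c very close to 2, -3 and 1; real data with scatter would give a nonzero sum of squared residuals.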
Before we look at it in gnuplot however, we will have to edit it slightly. It will look something like this (right?):
Year Total US Total California
1900 76212168 1485053
1910 92228496 2377549
...
We have to make the first line (or lines) into a comment, otherwise gnuplot will
bail trying to plot the number "Year".
Remember how to make a line
into a comment? If not, see here: page 4.
Now our file looks like this:
#Year Total US Total California
1900 76212168 1485053
1910 92228496 2377549
...
We can do a normal plot:
plot "calpop.dat"
which plots column 1 and 2 on the x and y axes.
However, look at what this command does:
gnuplot> |
While the first plot is the normal default one--columns 1:2 on x:y, the second part
gnuplot> plot "calpop.dat",
plots the data in "calpop.dat" again, but this time using columns 1 and 3. You can simplify this with the abbreviated version:
gnuplot> plot "calpop.dat", |
Of course you could do more if you had more columns, such as
plot "datafile" u 1:4, "" u 1:3, "" u 3:2
etc. Notice that the last instance, u 3:2, plots the 3rd column as the x values and 2nd column as y values of the points.
Very nice, huh?
But now--Check this out!
You can modify the data on the fly with using!
Try these examples:
gnuplot> |
u 1:(10*$3) -> plots 10 times the value in column 3
u 1:(1/$2) -> plots 1/y for column 2 values
u 1:($2/$3) -> plots the ratio of the values in columns 2 and 3
Here's the syntax for modifying data with using: put the expression in parentheses, and refer to the value in column N as $N.
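To make concrete what these expressions do to each row, here is a small Python sketch (not gnuplot) that mimics the effect of using on the first two rows of the population table. The helper names using and col are hypothetical, introduced only for this illustration.

```python
# First two rows of the population table: year, US total, California total.
rows = [
    (1900, 76212168, 1485053),
    (1910, 92228496, 2377549),
]

def using(rows, x_of, y_of):
    """Build (x, y) pairs from each row, loosely mimicking gnuplot's using."""
    return [(x_of(r), y_of(r)) for r in rows]

def col(n):
    """Return a function that picks column n of a row (1-based, like $n)."""
    return lambda r: r[n - 1]

plain = using(rows, col(1), col(3))                          # like u 1:3
scaled = using(rows, col(1), lambda r: 10 * col(3)(r))       # like u 1:(10*$3)
inverse = using(rows, col(1), lambda r: 1 / col(2)(r))       # like u 1:(1/$2)
ratio = using(rows, col(1), lambda r: col(2)(r) / col(3)(r)) # like u 1:($2/$3)

for name, pairs in (("plain", plain), ("scaled", scaled),
                    ("inverse", inverse), ("ratio", ratio)):
    print(name, pairs)
```

The data file itself is never changed; only the values handed to the plot are transformed.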
You can make complicated functions. Suppose, for example, the data in your file was
1 1
2 10
3 100
the using command
u 1:(2*log10($2)+1)
would plot your data as if it were
1 2*0+1 = 1
2 2*1+1 = 3
3 2*2+1 = 5
This is a really nice feature.
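The arithmetic in that example can be checked with a short Python sketch (Python here stands in for gnuplot's expression evaluator; math.log10 plays the role of log10):

```python
import math

# The three rows of the example file: (column 1, column 2).
data = [(1, 1), (2, 10), (3, 100)]

# Apply 2*log10($2)+1 to the second column, leaving the first untouched.
transformed = [(x, 2 * math.log10(y) + 1) for x, y in data]
print(transformed)  # [(1, 1.0), (2, 3.0), (3, 5.0)]
```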
Here is another of my favorite handy using examples. Make sure you understand what it does.
gnuplot> |
The IF/THEN
This is a handy way to suppress plotting
or fitting of some part of your data.
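The general idea of suppressing part of the data can be sketched in Python (this is not gnuplot syntax). The cutoff x <= 2 and the helper name keep_if are hypothetical, chosen only to illustrate the idea on the three data points from the linear fit.

```python
# The three data points from the linear-fit example.
points = [(1.0, 0.7), (2.0, 0.8), (3.0, 1.7)]

def keep_if(points, condition):
    """Return only the points for which condition(x, y) is true."""
    return [(x, y) for x, y in points if condition(x, y)]

# Hypothetical cutoff: ignore everything beyond x = 2.
selected = keep_if(points, lambda x, y: x <= 2.0)
print(selected)  # [(1.0, 0.7), (2.0, 0.8)]
```

Any plot or fit made from selected would then see only the points that passed the test.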