Year    Total US  Total California
1900  76,212,168         1,485,053
1910  92,228,496         2,377,549
1920 106,021,537         3,426,861
1930 123,202,624         5,677,251
1940 132,164,569         6,907,387
1950 151,325,798        10,586,223
1960 179,323,175        15,717,204
1970 203,302,031        19,971,069
1980 226,542,199        23,667,764
1990 248,709,872        29,760,021
For gnuplot to read this file, the column-title line must be commented out by starting it with #, and the commas must be removed from the numbers. The latter can be done with a "Find/Replace All" in jEdit, replacing every comma with nothing.
Then your data file should look like this.
#Year Total US Total California
1900 76212168 1485053
1910 92228496 2377549
1920 106021537 3426861
1930 123202624 5677251
1940 132164569 6907387
1950 151325798 10586223
1960 179323175 15717204
1970 203302031 19971069
1980 226542199 23667764
1990 248709872 29760021
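If you would rather not edit the file by hand, the same two changes can be scripted. Below is a minimal Python sketch. It assumes the original file has one record per line, whitespace between the columns, and commas used only as thousands separators, just as the jEdit Replace All approach assumes. The function names and file paths are hypothetical.

```python
from pathlib import Path


def to_gnuplot_format(text: str) -> str:
    """Comment out the header line and strip the thousands separators."""
    lines = text.strip().splitlines()
    header, rows = lines[0], lines[1:]
    # gnuplot skips data-file lines that start with '#', so the titles become a comment.
    out = ["#" + header]
    # Removing every comma turns 76,212,168 into 76212168 (like jEdit's Replace All).
    out.extend(row.replace(",", "") for row in rows)
    return "\n".join(out) + "\n"


def convert_file(src: str, dst: str) -> None:
    """Read the raw table from src and write the gnuplot-ready version to dst."""
    Path(dst).write_text(to_gnuplot_format(Path(src).read_text()))


if __name__ == "__main__":
    sample = "Year Total US Total California\n1900 76,212,168 1,485,053\n"
    print(to_gnuplot_format(sample), end="")
    # #Year Total US Total California
    # 1900 76212168 1485053
```

For example, convert_file("CalPop.txt", "MYNAMECalPop.dat") would write the cleaned file, where "CalPop.txt" stands for whatever your raw file is called.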
Now we can plot the data to have a look:
gnuplot> plot "MYNAMECalPop.dat"
Now do a fit to the cubic function: f(x) = a3*x**3 + a2*x**2 + a1*x + a0
gnuplot> f(x) = a3*x**3 + a2*x**2 + a1*x + a0
gnuplot> fit f(x) "MYNAMECalPop.dat" via a3,a2,a1,a0
...
a3 = 4.11344 +/- 72.22 (1756%)
a2 = -15056.3 +/- 4.214e+05 (2799%)
a1 = 1.37986e+07 +/- 8.196e+08 (5940%)
a0 = 210087 +/- 5.313e+11 (2.529e+08%)
...
gnuplot> plot "MYNAMECalPop.dat", f(x)
gnuplot> print f(2050)
-34883376032.2589
Ackkk! This can't be right. What happened? As many computer science students will recognize, this looks like an integer overflow. Because 2050 is written without a decimal point, gnuplot treats it as an integer and computes x**3 with integer arithmetic. A program can only store integers up to a fixed size, set by the number of bits it uses for them (32 or 64, for example). A 32-bit signed integer holds at most 2,147,483,647, but 2050**3 = 8,615,125,000. Anything larger "wraps around" and starts counting up again from the most negative integer. That's what happened here.
Fortunately, this is easy to fix. If we ask for the population in the year 2050.0 instead, the decimal point forces gnuplot to do floating-point calculations. Floating-point numbers are stored as a mantissa and an exponent, so they can represent much larger values, up to roughly 10^308.
So now we try:
gnuplot> print f(2050.0)
450788175.078704
Much better. It's a good idea to check that this value looks reasonable too. Plot all the way out past 2050 (for example, set xrange [1900:2060] and then replot) and read the value off the graph. You can also click on the curve f(x) at x = 2050 and read the value shown at the bottom left of the plot window.
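To see where both numbers come from, the sketch below redoes the arithmetic in Python, whose own integers never overflow. It assumes gnuplot's integer powers wrap around like 32-bit two's-complement integers. That is an assumption about the gnuplot build used here, chosen because it reproduces the output above. The sketch also uses the coefficients as rounded in the printout, so it only matches gnuplot's values approximately. The helper names are our own.

```python
# Coefficients of the US cubic fit above, rounded to the digits gnuplot printed.
A3, A2, A1, A0 = 4.11344, -15056.3, 1.37986e7, 210087.0


def wrap_int32(n: int) -> int:
    """Map n into the signed 32-bit range, as a wrapping 32-bit integer would."""
    return (n + 2**31) % 2**32 - 2**31


def f_integer_year(x: int) -> float:
    """Evaluate f with x**3 and x**2 computed as wrapping 32-bit integers."""
    x3 = wrap_int32(x**3)  # 2050**3 = 8,615,125,000 does not fit, so it wraps
    x2 = wrap_int32(x**2)  # 2050**2 = 4,202,500 fits and is unchanged
    return A3 * x3 + A2 * x2 + A1 * x + A0


def f_float_year(x: float) -> float:
    """Evaluate f entirely in double-precision floating point."""
    return A3 * x**3 + A2 * x**2 + A1 * x + A0


if __name__ == "__main__":
    print(2050**3, wrap_int32(2050**3))  # 8615125000 25190408
    print(f_integer_year(2050))          # about -3.49e10, like f(2050)
    print(f_float_year(2050.0))          # about 4.51e8, like f(2050.0)
```

After wrapping, the cubic term uses about 25 million instead of 8.6 billion. That leaves the large negative a2 term to dominate, so the result goes negative.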
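The fit above used column 2, the total US population. To fit the California population (column 3) instead, add u 1:3 to the fit and plot commands:
gnuplot> fit f(x) "MYNAMECalPop.dat" u 1:3 via a3,a2,a1,a0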
gnuplot> plot "MYNAMECalPop.dat" u 1:3, f(x)
You get
a3 = 1.51395 +/- 12.98 (857.7%)
a2 = -5730.7 +/- 7.577e+04 (1322%)
a1 = 5.42348e+06 +/- 1.474e+08 (2717%)
a0 = 517195 +/- 9.551e+10 (1.847e+07%)
and
gnuplot> print f(2050)
-12926489582.5194
Integer overflow again. Use a floating-point year instead:
gnuplot> print f(2050.0)
78222450.2121233
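The using (u) option might be confusing, so let me reiterate. A plain number after u selects a column as it is, so u 2:5 plots column 5 against column 2. To plot a value computed from the columns, write an expression in parentheses and refer to the value in column n as $n. For example, the second command below plots $5/$2 multiplied by log($2) against column 2: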
gnuplot> plot "MYDATA.dat" u 2:5
gnuplot> plot "MYDATA.dat" u 2:($5/$2*log($2))
So for this problem we need to plot the ratio of the value in column 3 (California) to the value in column 2 (total US). Thus we use the expression ($3/$2), with parentheses and $ signs:
gnuplot> plot "MYNAMEcalpop.dat" u 1:($3/$2)
We can fit a straight line to the ratio in the same way:
gnuplot> f(x) = a*x + b
gnuplot> fit f(x) "MYNAMEcalpop.dat" u 1:($3/$2) via a,b
...
a = 0.00116628 +/- 4.927e-05 (4.225%)
b = -2.20282 +/- 0.09584 (4.351%)
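As a cross-check on the fit, the sketch below computes the same ratio column that u 1:($3/$2) produces and fits a straight line to it. gnuplot's fit uses the iterative Marquardt-Levenberg algorithm. For a straight line, however, unweighted least squares also has a closed-form solution, and this sketch uses that closed form instead. Its a and b should therefore agree with gnuplot's to the printed digits. The data come from the table above; the function name is our own.

```python
# (year, total US, total California) from the census table above.
DATA = [
    (1900, 76212168, 1485053),
    (1910, 92228496, 2377549),
    (1920, 106021537, 3426861),
    (1930, 123202624, 5677251),
    (1940, 132164569, 6907387),
    (1950, 151325798, 10586223),
    (1960, 179323175, 15717204),
    (1970, 203302031, 19971069),
    (1980, 226542199, 23667764),
    (1990, 248709872, 29760021),
]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form unweighted least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Column 1 on x and ($3/$2) on y, as in: u 1:($3/$2)
years = [float(year) for year, _, _ in DATA]
ratios = [ca / us for _, us, ca in DATA]
a, b = fit_line(years, ratios)

if __name__ == "__main__":
    print(f"a = {a:.6g}, b = {b:.6g}")
```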
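So the slope is
a = 0.00116628 per year,
meaning that California's share of the US population grows by about 0.117 percentage points each year.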
We can find the year when the ratio reaches 15% by evaluating f(x) for successive years, multiplying by 100 to get a percentage:
gnuplot> print f(2015)*100
14.7229558955575
gnuplot> print f(2017)*100
14.9562117708167
gnuplot> print f(2018)*100
15.072839708446
The share is just under 15% at 2017 and just over it at 2018. So sometime in 2017, the population of California will reach 15% of the US population.
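Because f(x) is a straight line, the crossing year can also be found directly. Instead of stepping year by year, solve a*x + b = 0.15 for x. Here is a minimal Python sketch using the rounded a and b from the fit above (the function names are our own):

```python
# Slope and intercept from the linear fit above (rounded to the printed digits).
A, B = 0.00116628, -2.20282


def share(year: float) -> float:
    """California's fitted share of the US population in a given year."""
    return A * year + B


def year_for_share(target: float) -> float:
    """Invert share(year) = target to get the year."""
    return (target - B) / A


if __name__ == "__main__":
    print(round(year_for_share(0.15), 2))  # 2017.37
```

At the gnuplot prompt, right after the fit, the same inversion is simply print (0.15 - b)/a.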
To plot the data together with the fitted line:
gnuplot> set key left box
gnuplot> plot "MYNAMEcalpop.dat" u 1:($3/$2), f(x)