Solution for Homework 4

There are 5 parts to this homework.
  1. From HW 2, you have the following text file with US and California population data.

    Year       Total US            Total California
    1900       76,212,168          1,485,053
    1910       92,228,496          2,377,549
    1920      106,021,537          3,426,861
    1930      123,202,624          5,677,251
    1940      132,164,569          6,907,387
    1950      151,325,798         10,586,223
    1960      179,323,175         15,717,204
    1970      203,302,031         19,971,069
    1980      226,542,199         23,667,764
    1990      248,709,872         29,760,021
    

    In order to be used by gnuplot, the column title line must be commented out by putting a # in front of it, and the commas must be removed. The latter can be accomplished using a "Find/Replace All" in jEdit.

    Then your data file should look like this.

    #Year       Total US            Total California
    1900       76212168          1485053
    1910       92228496          2377549
    1920      106021537          3426861
    1930      123202624          5677251
    1940      132164569          6907387
    1950      151325798         10586223
    1960      179323175         15717204
    1970      203302031         19971069
    1980      226542199         23667764
    1990      248709872         29760021
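
    The same cleanup can also be scripted instead of done by hand in jEdit. The following is a minimal Python sketch, not part of the assignment. The function name clean_for_gnuplot is made up for illustration, and the sketch assumes that the first non-blank line is the column title line and that commas appear only as thousands separators.

```python
def clean_for_gnuplot(text: str) -> str:
    """Comment out the title line and strip thousands separators."""
    cleaned = []
    header_done = False
    for line in text.splitlines():
        if not header_done and line.strip():
            # Assumption: the first non-blank line is the column title line.
            cleaned.append("#" + line.lstrip())
            header_done = True
        else:
            # Assumption: commas occur only inside the numbers.
            cleaned.append(line.replace(",", ""))
    return "\n".join(cleaned) + "\n"


sample = (
    "Year       Total US            Total California\n"
    "1900       76,212,168          1,485,053\n"
    "1910       92,228,496          2,377,549\n"
)
print(clean_for_gnuplot(sample), end="")
```

    To process a whole file, read its contents into a string, pass the string to the function, and write the result back out under the name used in the fit commands.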
    

    Now, we can plot the data to have a look.
    Now do a fit to the cubic function: f(x) = a3*x**3 + a2*x**2 + a1*x + a0

    gnuplot> f(x) = a3*x**3 + a2*x**2 + a1*x + a0
    gnuplot> fit f(x) "MYNAMECalPop.dat" via a3,a2,a1,a0
    ...
    a3              = 4.11344          +/- 72.22        (1756%)
    a2              = -15056.3         +/- 4.214e+05    (2799%)
    a1              = 1.37986e+07      +/- 8.196e+08    (5940%)
    a0              = 210087           +/- 5.313e+11    (2.529e+08%)
    ...
    
    gnuplot> plot "MYNAMECalPop.dat", f(x)
    



  2. Finding the population prediction in 2050 is easy. Just print the value of f(x) when x = 2050.

    gnuplot> print f(2050)
    -34883376032.2589
    
    Ackkk! This can't be right. What's happened? As many computer science students will recognize, this looks like an integer overflow. A computer can only represent integers up to a certain size, determined by how many bits the program uses to store them (32 vs. 64, for example). Here the culprit is x**3: with the integer argument 2050, gnuplot computes 2050**3 = 8,615,125,000 in integer arithmetic, which exceeds the largest signed 32-bit integer, 2,147,483,647. Anything larger "wraps around" and starts counting up again from the most negative integer. That's what happened here.

    Fortunately this is easy to fix. If we ask for the population in the year 2050.0, gnuplot is forced to do the calculation in floating point (a mantissa plus an exponent), which allows much larger numbers, on the order of 10^300.

    So now we try:

    gnuplot> print f(2050.0)
    450788175.078704
    

    Much better. It's good to check that this looks reasonable too, by plotting all the way out past 2050 and confirming on the graph that the value fits the trend. You can also click on the curve f(x) at x = 2050 and read the value shown at the bottom left.
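
    The overflow can be reproduced outside gnuplot as a cross-check. The Python sketch below is illustrative only. It assumes that gnuplot stored integers as signed 32-bit values with two's-complement wraparound (wrap_int32 is a made-up helper that imitates this), and it uses the rounded coefficients printed by the fit in part 1, so the results agree with gnuplot's only to about three significant digits.

```python
def wrap_int32(n: int) -> int:
    """Reduce n to the range of a signed 32-bit integer (two's complement)."""
    return (n + 2**31) % 2**32 - 2**31


# Coefficients as printed by the fit in part 1 (rounded to 6 digits).
a3, a2, a1, a0 = 4.11344, -15056.3, 1.37986e7, 210087.0

x = 2050
cube_exact = x**3                # 8615125000, too large for 32 bits
cube_wrapped = wrap_int32(x**3)  # what 32-bit integer arithmetic would yield

# x**2 = 4202500 still fits in 32 bits, so only the cubic term is affected.
f_wrapped = a3 * cube_wrapped + a2 * x**2 + a1 * x + a0
f_exact = a3 * cube_exact + a2 * x**2 + a1 * x + a0

print(cube_exact, cube_wrapped)
print(f"wrapped: {f_wrapped:.3e}   exact: {f_exact:.3e}")
```

    The wrapped cube is small and positive, so the large negative a2*x**2 term dominates and the total goes negative, which is the same pattern as the value printed by f(2050).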

  3. For the California data, you have to use the third column, so you have to do

    gnuplot> fit f(x) "MYNAMECalPop.dat" u 1:3 via a3,a2,a1,a0
    gnuplot> plot "MYNAMECalPop.dat" u 1:3, f(x)
         

    You get

    a3              = 1.51395          +/- 12.98        (857.7%)
    a2              = -5730.7          +/- 7.577e+04    (1322%)
    a1              = 5.42348e+06      +/- 1.474e+08    (2717%)
    a0              = 517195           +/- 9.551e+10    (1.847e+07%)
    
    and evaluating f at the integer 2050 gives
         
    gnuplot> print f(2050)
    -12926489582.5194
    
    which is the integer overflow again. Use a floating point argument instead:
    
    gnuplot> print f(2050.0)
    78222450.2121233
    


  4. For the ratio of CA/US population, we want to plot the values in column 3 divided by the values in column 2. To do this we use the $n notation: inside a parenthesized expression in the "using" specifier, $n stands for the value in column n of the current row.

    So for this problem we use ($3/$2), with the $ signs and the enclosing parentheses.

    gnuplot> plot "MYNAMECalPop.dat" u 1:($3/$2)
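
    To make concrete what the expression ($3/$2) computes for each row, here is a small Python sketch that does the same column arithmetic on three rows of the data file. It is only an illustration; the function name column_ratio is invented here.

```python
# Three rows of the cleaned data file from part 1 (commas already removed).
rows = """\
#Year       Total US            Total California
1900       76212168          1485053
1950      151325798         10586223
1990      248709872         29760021
"""


def column_ratio(text: str) -> list[tuple[int, float]]:
    """Return (column 1, column 3 / column 2) for every data row."""
    pairs = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and commented lines
        cols = line.split()
        year, us, ca = int(cols[0]), float(cols[1]), float(cols[2])
        pairs.append((year, ca / us))
    return pairs


for year, ratio in column_ratio(rows):
    print(year, f"{ratio:.4f}")
```

    Each printed pair corresponds to one point of the ratio plot: the year on the x axis and the California share of the US population on the y axis.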
    

    We can fit the data in the same way.

    gnuplot> f(x) = a*x + b
    gnuplot> fit f(x) "MYNAMECalPop.dat" u 1:($3/$2) via a,b
    ...
    a               = 0.00116628       +/- 4.927e-05    (4.225%)
    b               = -2.20282         +/- 0.09584      (4.351%)
    

    So the slope is

    a = 0.00116628
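
    As a sanity check on the fit, the slope and intercept of a straight line can also be computed with the closed-form least-squares formulas. gnuplot's fit uses an iterative nonlinear least-squares algorithm, but for an unweighted straight line it should converge to the same answer. The Python sketch below is a cross-check under that assumption, not the method gnuplot uses; the function name linear_least_squares is invented for illustration.

```python
# (year, total US, total California) from the data file in part 1.
data = [
    (1900, 76212168, 1485053),
    (1910, 92228496, 2377549),
    (1920, 106021537, 3426861),
    (1930, 123202624, 5677251),
    (1940, 132164569, 6907387),
    (1950, 151325798, 10586223),
    (1960, 179323175, 15717204),
    (1970, 203302031, 19971069),
    (1980, 226542199, 23667764),
    (1990, 248709872, 29760021),
]


def linear_least_squares(xs, ys):
    """Return (slope, intercept) of the unweighted least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


years = [row[0] for row in data]
ratios = [row[2] / row[1] for row in data]  # same quantity as ($3/$2)
a, b = linear_least_squares(years, ratios)
print(f"a = {a:.6g}, b = {b:.6g}")
```

    The printed a and b should agree with the fit output to the digits shown there.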

    We can find the year when the ratio becomes 15% by printing f(x) year after year:

    gnuplot> print f(2015)*100
    14.7229558955575
    gnuplot> print f(2017)*100
    14.9562117708167
    gnuplot> print f(2018)*100
    15.072839708446
    

    According to this linear fit, the population of California will reach 15% of the US population some time in 2017.
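
    Because f(x) is a straight line, the crossing year can also be found directly by solving a*x + b = 0.15 for x rather than by trial and error. A short Python sketch of that arithmetic follows, using the rounded parameters printed by the fit; the function name year_reaching is made up for illustration.

```python
# Fit parameters as printed by gnuplot (rounded to 6 digits).
a = 0.00116628
b = -2.20282


def year_reaching(target_ratio: float) -> float:
    """Solve a*x + b = target_ratio for x."""
    return (target_ratio - b) / a


crossing = year_reaching(0.15)
print(f"{crossing:.1f}")
```

    The result falls between 2017 and 2018, consistent with the printed values of f(2017) and f(2018). The fractional part depends on the rounding of a and b, so it should not be read too precisely.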

    A plot of the data looks like this:

    gnuplot> set key left box 
    gnuplot> plot "MYNAMECalPop.dat" u 1:($3/$2), f(x)