Lab (1) - Computing and Visualization Basics in Excel

Microsoft Excel and other spreadsheets such as Calc in OpenOffice are widely
used to manage data and more importantly to perform computations and analysis.
One of the most attractive features of these spreadsheets is their ability and
ease of use for data visualization. In this workshop, we will present several
examples where you will have a chance to learn about and try some of these
features in Excel.

**Excel Basics & Conventions**

Open an Excel Spreadsheet by double clicking on its icon in the list of
programs on your PC. Often the main Spreadshet "Sheet1" is the
default page that will be opened. As you can see at the bottom this sheet,
there might be several other Sheets available (usually 3 others). In the screen
that is shown below, you can see three Sheets (Sheet1, Sheet2, and Sheet3).

A Spreadsheet like the one shown below has many cells on it. Each of the cells
has an address, which is based on the Column and Row where it is located. For
example, the cell in Column B and Row 4, has the address of B4. Of course, we
haven't done anything with these cell yet, so it may not be clear why we need
to know this. To illustrate this addressing format, we have marked four of
these cells A8, B1, C4, and D10. It is very important to know how this
addressing is done as we will use this convention throughout this lab. Just for
the warm-up, write the cell numbers for the two cells that have the arrows
pointed to them.

To enter data in each cell click in the cell and type the data in that cell and press enter to store the content of the cell. You can format and/or change format of each cell using Right-Click on each cell then Format option to choose the desired format.

**Entering Data and Building Formulas**

Let's enter some data and start manipulating them in
Excel. The following spreadsheet holds the salaries for 4 employees in a
company:

John Doe -- $45000

Jane Doe -- $55000

Jim Jungleboy -- $61300

Roger Tallman -- $42400

As you can see in the image below, the
name of each employee is stored in Column A in Rows 6:9 (6 through 9) and the
salary of each employee is stored in Column B in Rows 6:9 of the
spreadsheet. For example, in cell B7 you
will find the salary of Jane Doe which is $55,000. Also, you may have noticed
that we had some other things above the data. For example, we have put a title
and then in B3 we have defined a percentage.

We want to increase the salary of these employees by 3%. To do so we need a
formula:

To enter a formula you will go to the cell where you want to store the result and will start typing beginning with =. For example to compute the new salary for John Doe you will go to cell C6 and will type (note '=' is critical):

= B6 + 0.03*B6

You can see this in the spreadsheet below. Note that in this formula we have used B6 that holds John Doe’s current salary. Repeat this for the rest of employees. Remember to press enter after each entry.

After you typed (or copied) the formulas for all employees, you will get something that looks like this:

Let’s compute the total salary for all the employees before
and after the raise.
A pre-defined function we want to use to compute the total salary is called **sum**. To compute the sum we need the range of cells
for which we want to compute the sum. In
our spreadsheet, the range of salaries before the raise is B6 through B9, i.e.,
B6, B7, B8, and B9. So go to cell A11
type Sum (this is a label), then go to cell B11 where you will compute the Sum
of salaries before the raise and type:

= sum(B6:B9)

The **B6 : B9 **will give us a range of cells starting
from B6 and ending at B9.
Note that we used the pre-defined function **sum**
to compute the sum of salaries. There are many pre-defined functions in Excel that help you compute different
things. To find out what these functions are click in a cell and then under
Formulas -> Insert Functiont to see a list of available built-in funstions.

For the next sum, we want to do something a bit different. In cell C11 type:

=sum(

then left click on the cell C6, and hold the button and drag
down to C9 as you are still holding the button, then at C9 release the button.
The C11 cell gets filled automatically due to **fill automatic **feature of
Excel. The cell looks like this:

And if you press enter the sum of salaries after the raise will be computed as well.

**Naming Cells**

In the above example we use cell B3 to keep the percent of the raise, i.e., 3%. But in our salary calculations we didn’t use it because we used a fix 0.03. Now we want to use that cell in the calculation of the new salary. If we use:

C6 = B6 + B3*B6

and then use the automatic copy and fill option to copy the formula to other cells B3 will change as you move to different cells due to relative cell addressing. Let’s try this so you can see what will happen. Click in C6 and type:

= B6 + B3*B6

and press enter. You will get the same value as before. Now, click on that cell and drag the bottom-right of the cell down on cell C7. Once done, click in a blank cell, then double click on cell C7, you will see:

= B7 + B4*B7

Did you notice that B3 was not used and Excel automatically has used the next cell which is B4. To avoid this problem we can fix the cell B3 by placing a $ in front of it in the formula in cell C6. So let’s try this. Click on cell C6 and type:

= B6 + $B$3 * B6

Press enter. The result for C6 is the same. But this time when you drag the bottom-right of the cell on C7, you will see:

= B7 + $B$3*B7

But there is another way to do this which is very powerful and often more popular. Using this approach we name the cell that holds the rate. So, we name B3 as raise and will use word raise in the calculations. How do we do this?

Click on cell B3

At the top-left, right below Home, Click on where it says B3

Type raise instead of B3, then press Enter.

This will name the cell B3 as raise. To make sure that this has been saved, click on another cell, and then back on B3, the spce right below Home should say raise if the naming is done correctly. From now on we can use raise iin all formulas instead of B3.

Let’s try this (see the screen below):

Now when we use the automatic fill option we no longer need to be worried about raise being changed as that will always stay as variable Raise.

What is interesting with this approach is that now we can change the value in cell B3 where we stored the percentage of the raise and that affects the calculations automatically without the need of copying the formula again. Here is an example:

Compute the average salary for both the current and the salaries after 3% raise. Label this calculation as Average Salary.

Compute the Min and Max of the original salaries and those for after 3% raise.

In the above example we calculated 3% or 4% raise and the sum of salaries was computed at the bottom. Suppose we are given $240,000 to spend on new salaries and we want to give a raise such that the sum of salaries after the raise is $240,000. One way to do this is to play with B3,

What
parameter do you want to change to make it happen? In our case that is **Raise,
B3**

What will
change as a result of the change in the Raise? In our case sum of salaries, **C11**

What target
value do we want to reach for sum? In our case, **$240,000**

Ok we are ready.
Click on cell C11 where you want to set the target sum of salaries.

* Click on Data at the top

* Locate the **What if Analysis **item. From the What if Analysis menu,
select **Goal Seek**

* The following dialogue box appears. Enter the values as shown and pay
attention on why you are entering the values in the fields this way.

Now click on Ok. Also, click on the next screen. Your calculations are done. You will get:

So you need 18% raise to get the sum of the salaries to $240,000. What a great raise in such a tough time.

Suppose we want to include a tax penalty rate in the salary calculation. This rate is 3.2% that will be applied to the portion of the salary that is higher than $40,000. Thus, in case of John Doe, the new salary is:

NewSalary = CurrentSalary + Raise*CurrentSalary – PenaltyRate*(CurrentSalary – 40000)

Note that the sum of salaries after the raise and penalty has to stay $240,000. So you need to run Goal Seek.

Most scientific graphs show the relationship between
two quantities. In Excel, such a relationship is graphed using a scatter plot.
Visualization of the data will help us better understand the data.

Let's work on a problem together.

Now we will download a file from the Internet that contains some data and then
will get that data into an Excel sheet. To get the data which is stored in two
columns in a file called unknowData.txt right click on this file name unknownData.txt and use Save Link As (Or Save File As)
to save the file on your PC, perhaps on your Desktop or a directory you may
have created for the workshop.

You can double click on the link above which will display the content of the
file and then use Save As, to save the file.

**How to get this data into our excel sheet? **

Getting the data to an Excel sheet involves several steps:

1) Open a New Excel Sheet, by going to the top-left corner and choosing New.

2) Then, on the top-left of the sheet, find the Open, and click on it

3) Browse to the location where you had saved the unknownData.txt file. Once
you are in that directory, at the bottom where you have **Files of type**,
choose All Files (*.*). This way you will be able to see the .txt file. Click
on the unknownData.txt file and then on Open at the bottom.

4) You will get a new form called "Text Import Wizard -- ". In that
screen, for the Original data type, Choose Delimited. I am sure you know why.
The data you just downloaded contains two columns that are separated by one
space. Thus, the identifier of each column is that space. Then click on Next.

4) The two column data are stored a Tab apart, so you can choose Tab as
Delimiters, and then click on Next.

5) In the next screen, choose General for the Data type format, and click on
Finish.

Now you should have two columns of data in your sheet. Excel will assume that
the left column is intended for the horizontal axis. Graph a scatter plot of
these points by highlighting the data, then clicking on Insert, Scatter. This
produces a scatter plot of the two columns of data.

You should get a graph like the one shown below. Since you have only one set of
data, click on Series1 on the right-hand-side of your plot and then press
Delete to delete that caption.

To label axis click on the scatter plot and then go to Design -> Chart
Layouts and choose the left most option (or really whichever one you like but
this will do). You can now right click on the labels to edit them.

Often scientists look at scatter plots like the one
above to determine the trend of the data by finding a function that fits the
data well. Excel provides an option for Curve Fitting. Here is a document that
explain the process in details Curve
Fitting in Excel . We will go through some steps below to fit a curve to
the data we have plotted above.

First be sure that your chart is highlighted (left click on the chart anywhere
if it isn't). Go to Layout -> Trendline. Be sure to choose more Trendline
options. You have 6 options and note that when selecting polynomial you need to
set the order of the polynomial (2 is quadratic, 3 is cubic). At the bottom of
the Format Trendline box choose Display Equation and Display R squared. Note
also that you can choose to give the equation a name, choose to have the equation
extrapolated forward or backward, and set a fixed intercept value. From the
look of our plot, I guessed that this might be a polynomial of order 2, so I
selected that one. I also played with the Line Color and Width and selected
color Red for the Trendline and width of 2Pts.

The curve will appear on the graph along with the
data.

As
always, you should use caution when interpreting the *r-squared* value. Mathematically this value must increase for
“curvier” functions such as polynomials of higher degree. Choosing between
different functions should be done visually, not relying solely on the *r-squared* value.

Can you guess what this formula is for in Physics? You are right. It is
actually the formula for the height of a projectile defined as:

Height = -(1/2)*g*t^2 + v0_{y}*t

So you can see that g = 9.8 m/s^2 and v0_{y} = 28.284. Note that v0_{y}
is actually v0_{y} = v0*cos(angle), where v0 is the initial velocity of
the projectile and angle is the angle at which it was sent off.

The projectile motion is a two-dimensional motion where the projectile moves in x (distance) and in y (height) directions. In this motion the projectile is usually sent of from its initial location (x0 and y0) with an initial speed (v0) at an initial angle (angle).

Assuming our initial velocity is v0 = 40 (m/s) and the initial angle is angle = 35 degree. The initial horizontal and vertical speeds are computed using:

InitialHorizontalSpeed = v0 * cos(angle)

InitialVerticalSpeed = v0 * sin(angle)

On Earth gravitational constant g = 9.8 (m/s^2).

We want to create 4 different plots to:

- Visualize the
**horizontal distance**of the projectile from its initial location at different times. - Visualize the
**height**of the projectile from its initial location at different times. - Visualize the
**horizontal speed**at different times. - Visualize the
**vertical speed**at different times.

Note that horizontal speed will never change and it is always the same as InitialHorizontalVelocity. The above values are computed as following (note three are time-dependent):

**distance** =
InitialHorizontalSpeed*time

**height **= -(1/2)*g*time^2 +
InitialVerticalSpeed * time + initialHeight

**verticalSpeed** = -g*time +
IntialVerticalSpeed

**Please note that in Excel you have to type angle by pi/180 to convert to
Radian. you will type angle*PI()/180. for example:
=v0*COS(angle*PI( )/180)**

**horizontalSpeed**
= InitialHorizontalSpeed

Once again note that the last one is always constant because it is not time-dependent.

We need to compute these values at some time instances. We will use the total time that the projectile is traveling before it comes to stop and will create equally spaced time instances between 0 to TMAX. TMAX is computed as:

TMAX = 2*v0*sin(angle)/g

So you need to start the time from 0 and end it at
TMAX. Use an Excel spreadsheet to solve
this problem. Please use **cell naming **for
**angle, v0, Initial height, and g.**

Create three plots.

Make observations based on the plots you have created.

What will happen to the graph if we repeat the same experiment on planets:

moon: g = 1.63333 m/s^{2}

Mars: g = 3.26667 m/s^{2}

Europa (moon
of Jupiter): g = 1.61 m/s^{2}

Create all the graphs for these cases and make your observations.

**Some Sample
Data and Assignment you can do on Your Own:**

(graphing 1) Use linear regression to find a formula relating the amount of the specimen to the output from the gas chromatograph in the dataset.

http://lib.stat.cmu.edu/DASL/Stories/Chromatography.html

(graphing 2) The inverse of Hubble’s constant can be used as
an estimate of the time since the big bang. Use the data to fit a curve to find
Hubble’s constant (note that the relationship has no intercept).

http://lib.stat.cmu.edu/DASL/Stories/Hubble'sConstant.html

(graphing 3) Investigate Gompertz’ law for population
growth.

http://lib.stat.cmu.edu/DASL/Stories/Gompertz'sLaw.html

(graphing 4) Investigate the relationship between mortality
from breast cancer and average annual temperature..