Data Analysis with Excel (MacOS Version)

Lavi Singh; Abigail Graetz; Benjamin Schwessinger

Mar 26, 2026

Data Analysis with Excel (MacOS Version)

This protocol is a draft, published without a DOI.

¹Australian National University;
²ANU

BIOL3106/6106

Lavi Singh

Australian National University

Protocol Citation: Lavi Singh, Abigail Graetz, Benjamin Schwessinger 2026. Data Analysis with Excel (MacOS Version). protocols.io https://dx.doi.org/

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: March 25, 2026

Last Modified: March 26, 2026

Protocol Integer ID: 313933

Keywords: steps for 16s excel analysis, macos excel version, data analysis with excel, excel analysis, step excel analysis, excel, sequencing data, batch blast search, step excel analysis of the class, stacked bar chart, data, data analysis

Abstract

This specific protocol describes the step-by-step Excel analysis of the class ITS and 16S sequencing data. We have performed a batch BLAST search and will use the output files to generate and interpret a stacked bar chart. 

This protocol uses ITS data only for demonstration purposes. The steps for 16S Excel analysis will be identical, aside from the input files. There are two versions of this protocol- Windows and MacOS version. Please use the relevant version. 

MacOS Excel Version 16.107.1

Data Import into Excel

The BLAST output files will be made available to you on the course page. Download output files from OneDrive onto device. Save the files in a location you can access at your convenience. 

Open a new Excel workbook on your device and save it in a location you can access at your convenience. We recommend saving the file with naming convention “YYYYMMDD_ITSAnalysis”.  You will have two excel workbooks. One for 16S and one for ITS, so label accordingly.

In the header bar, select File > Import. 

A window will open and prompt you to select an import format. Select CSV file, and navigate to the location where you have saved the sequencing data shared with you on OneDrive. 

Select the file and press “Get Data”. 

Once you’ve selected the file to import, another window will appear. Accept the default settings (as in the image below) and press ‘Finish’. 

Finally, select where you want the data to import – either in a cell on the current sheet, or a new sheet. Press ‘Import’. 

Your data should import into the spreadsheet from the cell you’ve selected, with the appropriate headings (column titles). It will look like this. 

Pivot Tables

We will again convert the data to a table format so we can interact with it. Select the data by clicking column A, holding the shift key, and clicking column H. In the Excel header bar, navigate to Insert > Table.

When the Create Table window appears, make sure the box ‘My data has headers’ is ticked. Excel will alert you about losing connections between data – this doesn’t matter for this analysis, so click ‘Yes’.

Your data should now appear in blue, with interactive column headers. Select your table as before (by clicking column A, holding the shift key, and clicking column H.), and navigate again to Insert > Pivot Table.

A window will appear and prompt you to select the data range, and where you want
to create your pivot table. Ensure that the appropriate cells are selected. We
recommend creating the pivot table in the same sheet as your data, to avoid
confusion. In the same vein, it is recommended that you rename your Excel
workbook, or tab, to help you remember which samples your data belongs to. 

A blank pivot table will be created for you, and an option bar will open on the
right hand side of the screen to build your pivot table. We recommend using the
‘stitle’ field as your row values for this analysis. Tick this box and it
should appear in the “Rows” column If not, click and drag it from ‘Field Name’
to ‘Rows’. 

Think about what other fields you might want to add, to help you analyse this data. Below is a suggested selection. 

Your selected fields will appear in your pivot table. But, we need to make sure these are useful values. Think about the best way to represent this data – count data might make sense for the stitle field, but is the sum of all pident values for a given species useful? In our case, averages will be more helpful.

To adjust the way values are presented, click on the information icon next to the value column headers. A new window will pop up as shown in the image below. Click on ‘Average’. Do this for pident, qcovs and alen.

Now, we have a pivot table with our averages and counts. However, pivot tables are tricky to manipulate so we will copy over the pivot table into a new sheet. In the Excel header bar, navigate to Home and click on Paste. In the dropdown, select ‘Paste Values’.

We suggest renaming this new sheet to keep track, e.g. ‘Analysis_G5_TG5’. Also enter the research and treatment group name next to the ‘Row Labels’ column. 

Next, we will create a new column called ‘Relative Abundance’. Relative abundance is the ‘Count of stitle’/ Total Count of stitle * 100. Enter this formula into the first cell and click the drag down option to calculate abundance for all. In this example, the total count is 5572.

We will again convert the data to a table format so we can interact with it.
Select the data by clicking column A, holding the shift key, and clicking column H. In the Excel header bar, navigate to Insert > Table.

Now that the data is in an interactive table format, click on the arrow next to the Relative Abundance column. A new window will pop up. Under the Sort section, click on ‘Descending’. This will sort your data according to the largest to smallest relative abundance. 

However, relative abundance as a metric is only useful if you have a reliable match i.e. query coverage. Just relying on relative abundance without considering query coverage can inflate false positives. We will discuss this in class.

To filter by query coverage, click on the arrow next to the Average of qcovs column. A new window will pop up. Under the Filter drop down menu, select ‘Greater than’. In the field next to it, enter 25. You should now have a shorter table. 

Create a new Excel sheet and name it ‘Analysis_allTG’. Copy over the table you created in Step 22 to this sheet.

Repeat Step 3 till 22 for all treatment groups. Copy over the tables you create in Step 22 to the sheet named “Analysis_allTG”. An example of what your sheet will look like at this stage is shown below.

We will manipulate this data to make it easier to plot. Copy over all the Row labels onto the right side as shown here.

You may notice that there are a few repeats of the same name. To get rid of these, we will use an Excel function called UNIQUE. Select a cell on the right side of these labels. In that cell, enter ‘=unique’ and select all the row labels. Hit Enter and you should now have a shorter table.

Copy these new labels and paste them again as Values.

Delete the column containing the Row Labels with repeats. We will not be needing those now. Our sheet will look like this.

Next, we will enter the relative abundances for all treatment groups. Create five new columns labelled TG1-TG5 beside the Row Labels. 

We will use an Excel function called VLOOKUP. If you specify a value, this function will look it up in a selected range and return a corresponding value from a specified column.

In the first empty cell of the table, type ‘=vlookup’. The function will prompt you to select the value you want to look up. Select the first Row Label and enter a comma beside it.

Then, select the range of cells to search. The column is for TG1, so we will select the range of data for TG1.

We want the function to look up relative abundance, which is on the 6th column. Enter the column number, in this case it is ‘6’. Make sure to add a comma after each value.

Enter ‘FALSE’ in the function to return exact matches.

Your formula will look like the image below.

As we will be dragging the function down to other cells, we will lock the array to always search for the specified TG. To do this, Excel uses “$” signs. Enter ‘$’ signs in front of your table array values e.g. $A$12:$F$15. Hit Enter.

Drag down the formula to give you relative abundances for each Row Label. If there isn’t a match, Excel will return ‘N/A’.

Repeat Steps 31 – 37 for each TG. Change the range of cells you search through for each TG accordingly as they will be different.

Stacked Bar Chart

Once you have the table ready, copy and paste it onto the same sheet again as Values. This makes manipulating the data smoother.

Select the entire table and navigate to the Excel header. Press Insert3e Column Charts. In the dropdown menu, select 100% stacked bar chart.

The bar chart will have Row Labels on the y axis and TG as the key. Click on ‘Switch Row/Column’ to swap these.

Your graphs should be mostly ready at this stage. We recommend adding a descriptive chart title and axis labels. You can also simplify the Row Labels to make them clear. For example, ‘NR_156348.1 Cladosporium domesticum CPC 22307 ITS region; from TYPE material’ can be changed to ‘Cladosporium domesticum’.

Feel free to customize the colours and have fun interpreting your graphs!