Mar 27, 2026

Public workspaceData Analysis with Excel (Windows Version)

This protocol is a draft, published without a DOI.
  • Lavi Singh1,
  • Abigail Graetz2,
  • Benjamin Schwessinger1
  • 1Australian National University;
  • 2ANU
  • BIOL3106/6106
Icon indicating open access to content
QR code linking to this content
Protocol CitationLavi Singh, Abigail Graetz, Benjamin Schwessinger 2026. Data Analysis with Excel (Windows Version). protocols.io https://protocols.io/view/data-analysis-with-excel-windows-version-jwvbcpe2p
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License,  which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: March 27, 2026
Last Modified: March 27, 2026
Protocol Integer ID: 313987
Keywords: steps for 16s excel analysis, data analysis with excel, excel analysis, step excel analysis, excel, windows excel version, sequencing data, batch blast search, step excel analysis of the class, stacked bar chart, data analysis, data
Abstract
This specific protocol describes the step-by-step Excel analysis of the class ITS and 16S sequencing data. We have performed a batch BLAST search and will use the output files to generate and interpret a stacked bar chart.

This protocol uses ITS data only for demonstration purposes. The steps for 16S Excel analysis will be identical, aside from the input files. There are two versions of this protocol- Windows and MacOS version. Please use the relevant version.

Windows Excel Version 2603 (Build 16.0.19822.20086)
Troubleshooting
Data Import into Excel
The BLAST output files will be made available to you on the course page. Download output files from OneDrive onto device. Save the files in a location you can access at your convenience.
Open a new Excel workbook on your device and save it in a location you can access at your convenience. We recommend saving the file with naming convention “YYYYMMDD_ITSAnalysis”.  You will have two excel workbooks. One for 16S and one for ITS, so label accordingly.
In the Data bar, go to the Get & Transform Data section. Select 'From Text/CSV'.


A window will open and prompt you to select a file. Navigate to the location where you have saved the sequencing data. The folder will initially appear blank. Change the format to 'All Files'.


Select the file and press “Import”.


Once you’ve selected the file to import, another window will appear. Accept the default settings (as in the image below) and press ‘Load’.


Your data should import into the spreadsheet from the cell you’ve selected, with the appropriate headings (column titles). It will look like this.


Pivot Tables
Navigate to Insert > Pivot Table. Select 'From Table/Range'.


A window will appear and prompt you to select the data range, and where you want to create your pivot table. Ensure that the appropriate cells are selected. We recommend creating the pivot table in the same sheet as your data, to avoid confusion. In the same vein, it is recommended that you rename your Excel workbook, or tab, to help you remember which samples your data belongs to.




A blank pivot table will be created for you, and an option bar will open on the right hand side of the screen to build your pivot table. We recommend using the ‘stitle’ field as your row values for this analysis. Tick this box and it should appear in the “Rows” column If not, click and drag it from ‘Field Name’ to ‘Rows’.


Think about what other fields you might want to add, to help you analyse this data. Below is a suggested selection.




Your selected fields will appear in your pivot table. But, we need to make sure these are useful values. Think about the best way to represent this data – count data might make sense for the stitle field, but is the sum of all pident values for a given species useful? In our case, averages will be more helpful.
To adjust the way values are presented, click on the drop-down arrow next to the value column headers. Select 'Value Field Settings'. A new window will pop up as shown in the image below. Click on ‘Average’. Do this for pident, qcovs and alen.


Now, we have a pivot table with our averages and counts. However, pivot tables are tricky to manipulate so we will copy over the pivot table into a new sheet. In the Excel header bar, navigate to Home and click on Paste. In the dropdown, select ‘Paste Values’.




We suggest renaming this new sheet to keep track, e.g. ‘Analysis_G5_TG5’. Also enter the research and treatment group name next to the ‘Row Labels’ column.




Next, we will create a new column called ‘Relative Abundance’. Relative abundance is the ‘Count of stitle’/ Total Count of stitle * 100. Enter this formula into the first cell and click the drag down option to calculate abundance for all. In this example, the total count is 5572.




We will convert the data to a table format so we can interact with it.
Select the data and navigate to Insert > Table. Press 'OK'.


Now that the data is in an interactive table format, click on the arrow next to the Relative Abundance column. A new window will pop up. Click on ‘Sort Largest to Smallest’.


However, relative abundance as a metric is only useful if you have a reliable match i.e. query coverage. Just relying on relative abundance without considering query coverage can inflate false positives. We will discuss this in class.
To filter by query coverage, click on the arrow next to the Average of qcovs column. A new window will pop up. Clock on the arrow next to 'Number Filters' and select ‘Greater than’. In the window that pops up, enter 25. You should now have a shorter table.




Create a new Excel sheet and name it ‘Analysis_allTG’. Copy over the table you created in Step 22 to this sheet.
Repeat Step 3 till 22 for all treatment groups. Copy over the tables you create in Step 22 to the sheet named “Analysis_allTG”. An example of what your sheet will look like at this stage is shown below.


We will manipulate this data to make it easier to plot. Copy over all the Row labels onto the right side as shown here.


You may notice that there are a few repeats of the same name. To get rid of these, we will use an Excel function called UNIQUE. Select a cell on the right side of these labels. In that cell, enter ‘=unique’ and select all the row labels. Hit Enter and you should now have a shorter table.


Copy these new labels and paste them again as Values.
Delete the column containing the Row Labels with repeats. We will not be needing those now. Our sheet will look like this.


Next, we will enter the relative abundances for all treatment groups. Create five new columns labelled TG1-TG5 beside the Row Labels.


We will use an Excel function called VLOOKUP. If you specify a value, this function will look it up in a selected range and return a corresponding value from a specified column.
In the first empty cell of the table, type ‘=vlookup’. The function will prompt you to select the value you want to look up. Select the first Row Label and enter a comma beside it.


Then, select the range of cells to search. The column is for TG1, so we will select the range of data for TG1.


We want the function to look up relative abundance, which is on the 6th column. Enter the column number, in this case it is ‘6’. Make sure to add a comma after each value.
Enter ‘FALSE’ in the function to return exact matches.
Your formula will look like the image below.


As we will be dragging the function down to other cells, we will lock the array to always search for the specified TG. To do this, Excel uses “$” signs. Enter ‘$’ signs in front of your table array values e.g. $A$12:$F$15. Hit Enter.




Drag down the formula to give you relative abundances for each Row Label. If there isn’t a match, Excel will return ‘N/A’.




Repeat Steps 31 – 37 for each TG. Change the range of cells you search through for each TG accordingly as they will be different.
Stacked Bar Chart
Once you have the table ready, copy and paste it onto the same sheet again as Values. This makes manipulating the data smoother.
Select the entire table and navigate to the Excel header. Press Insert3e Column Charts. In the dropdown menu, select 100% stacked bar chart.


The bar chart will have Row Labels on the y axis and TG as the key. Click on ‘Switch Row/Column’ to swap these.


Your graphs should be mostly ready at this stage. We recommend adding a descriptive chart title and axis labels. You can also simplify the Row Labels to make them clear. For example, ‘NR_156348.1 Cladosporium domesticum CPC 22307 ITS region; from TYPE material’ can be changed to ‘Cladosporium domesticum’.
Feel free to customize the colours and have fun interpreting your graphs!