Managing Data Support Outline and TOC

On this page find a table of contents and outline. See also the example data files and downloads page.

The book is split into two main parts covering broadly topics related to preparing your data and using your data respectively. There are nine chapters in total.

The appendix contains answers to the end-of-chapter exercises and notes about the differences between versions of Excel.

Buy this book online

The book is split into two main parts covering broadly topics related to preparing your data and using your data respectively. There are nine chapters in total. The appendix contains answers to the end-of-chapter exercises and notes about the differences between versions of Excel. Click on a heading to go to a more detailed description:

Part 1: Preparing your dataset.

  1. Arranging your data.
  2. Managing your data: building your dataset.
  3. Managing your data: checking your dataset.

Part 2: Using your dataset.

  1. Making sense of your data.
  2. Exploring regression data.
  3. Exploring time-related data.
  4. Exploring association data.
  5. Exploring differences data.
  6. Sharing your data.

Introduction

Data are important if you are a scientist. You need data to answer research questions, test hypotheses and to undertake the process of science itself. Managing your data effectively is fundamentally important to making the scientific process run smoothly.

If you can manage your data effectively you can save time and effort, allowing you to focus on what is important, the patterns in your data. If you manage your data effectively you can explore your data more fully and produce high-quality graphs to help you and others understand your data. Managing your data more effectively allows you to share your data more easily and communicate with others, a cornerstone of the scientific process.

Microsoft’s Excel spreadsheet is virtually ubiquitous and familiar to most computer users. It is a robust program that makes an excellent storage and manipulation system for many kinds of data, including scientific data. Excel is able to act like a database and this ability gives it great power and flexibility. Using Excel you can explore your data and produce summary information necessary to help you in your research. You can also produce high-quality graphs, which can help you and others gain more insight into the patterns in your data. Using Excel you can save your data in a variety of formats, allowing you to share your data with other researchers and specialist analytical software as necessary.

Who this book is for

The book is aimed at scientists at all levels. It is especially aimed at university-level students, from undergraduates to postdoctoral researchers. However, high school students should find the IT skills easy to acquire and the data management skills learnt will be a good foundation for any branch of further study.

Although aimed at scientists, the data examples are all “scientific”, any student who needs to use data should find this book helpful.

What you will learn in this book

This book is intended to give you some insights into the processes involved in managing scientific data, allowing you to focus on your research questions. In particular you will learn:

  • How to maximize the usefulness of your data by maintaining a scientific recording format.
  • How to use your spreadsheet like a database.
  • How to manage your data, checking for errors and maintaining the most useful dataset possible.
  • How to summarize your data visually and numerically.
  • How to share your data and summaries (numerical and graphical).
  • How to ready your data for further analyses that Excel is unable to conduct.

In short, you’ll learn everything you need to deal with data.

How this book is arranged

This book is split into two main parts, which cover issues surrounding the preparation and use of your data respectively. The first part is concerned with the preparation of your dataset and deals with the arrangement and management of your data. The second part deals with the use of your dataset and is concerned with ways to explore and share your data.

The first chapter deals mostly with showing you how to set out your data in a logical and useful manner. This systematic approach to data layout (let’s call it Scientific Recording Format) underpins everything else.

The second chapter deals with the building of your dataset and covers a range of data management issues, including:

  • Adding to your data.
  • Editing your data.
  • Re-arranging your data.
  • Saving your data.
  • Keeping notes and ancillary material.

The third chapter deals with management issues relating to error checking, an important but often overlooked task.

The last few chapters deal with ways of using your data, such as:

  • Visualizing your data – creating summary graphs and charts is important for you to make sense of your data and also to convey information about your data to others.
  • Summarizing your data – lots of numbers are meaningless without interpretation and summary information. This summary information includes averages and correlations.
  • Sharing your data – it is important to be able to share your data with others. You may also need to share your data with other computer programs so that you can carry out statistical analyses that Excel is unsuited for.

Throughout the book you will see example exercises that are intended for you to try out. In fact, they are expressly aimed at helping you on a practical level – reading how to do something is fine but you need to do it for yourself to learn it properly. The Have a Go exercises are hard to miss.

At the end of each chapter there is a summary table to help give you an overview of the material in that chapter. There are also some self-assessment exercises for you to try out. The answers are in the Appendix.

Book Outline

Managing Data Using Excel contains 9 chapters, split into two main parts.

Part 1. Preparing your dataset

Arranging and Managing Your Data

Before you can undertake any scientific analyses you need to prepare your dataset. Your data need to be in an appropriate arrangement so that you can manage them more easily. Part of this data management includes editing, rearranging and checking for errors.

The main topics are:

  • How to arrange your data in scientific recording format.
  • How to use your dataset like a database.
  • How to add new variables to your dataset.
  • How to make and use index variables.
  • How to edit your data.
  • Using Filter tools.
  • Using Sort tools.
  • Using Find & Replace tools.
  • How to rearrange your data.
  • How to check your data for errors:
  • Using Sort and Filter tools.
  • Using Pivot Tables.
  • Using dot charts to help check your data for errors.
  • Using data validation tools.

Essentially this part covers topics concerned with the setting out (arranging), building and checking of your dataset.

1. Arranging your data

This chapter shows how the layout and management of your data fits into the scientific process. You’ll see some examples of data layout that could be used but which do not necessarily maximize the usefulness of the dataset.

You’ll see how a scientific recording layout can help by setting out data in a more logical and useful manner. There are notes about different sorts of variables and how you can use things like dates and times to maximum effect.

The last part of the chapter deals with ancillary information, such as notebooks and recording sheets.

1.1 SYSTEMS FOR DATA LAYOUT
1.1.1 Common ways to layout data
1.1.2 A standard layout for data
1.2 Recording Format
1.3 Turn your Data into a Database
1.4 Planning your Data Layout
1.4.1 Index variables
1.4.2 Using dates
1.5 Using Recording Sheets and Notebooks
1.5.1 Data from equipment
1.5.2 Data in notebooks
1.5.3 Data in recording sheets
1.6 SUMMARY
1.7 EXERCISES

2. Managing your data: building your dataset

This chapter deals with the business of building a dataset. You’ll see how to add different sorts of variable, including index and date variables. Later on you’ll see how to edit and rearrange your data. The editing process involves the use of Excel tools such as Find, Sort and Filter, which are generally useful in managing your dataset.

2.1 ADDING TO YOUR DATASET
2.1.1 Adding new variables
2.2 Editing your Data
2.2.1 Find and replace
2.2.2 Sorting data
2.2.3 Filtering data
2.3 Rearranging your Data
2.3.1 Rearranging data by row
2.3.2 Rearranging data by column
2.4 Saving your Data
2.5 SUMMARY
2.6 EXERCISES

3. Managing your data: checking your dataset

You want the “best” dataset you can get and this chapter looks at how you can check over your data for errors. You’ll also see how to use Excel Data Validation tools. These can help you minimize errors as data is entered as well as afterwards.

Later sections look at graphical methods to look over your data as a way to spot potential errors.

3.1 TYPOGRAPHICAL ERRORS
3.1.1 Sort tools in error checking
3.1.2 Pivot Tables in error checking
3.1.3 Filter tools in error checking
3.2 Validating Entries
3.2.1 Validation during entry
3.2.2 Validation after data entry
3.3 Numerical Errors
3.3.1 Errors in sample data
3.3.2 Errors in correlation data
3.4 SUMMARY
3.5 EXERCISES

Part 2. Using your dataset

Summarizing, visualizing and sharing your data

This part is concerned with how to use your data. This involves using data exploration tools to summarize and visualize your data. You also need to be able to share your data with others.

The main topics are:

  • How to tell what kind of dataset you have.
  • How to summarize numbers:
    • Using averages.
    • Using measures of dispersion.
    • Using the Analysis ToolPak.
  • How to look at the shape of your data:
    • Using data frequency and “shape” statistics.
    • Using histograms.
  • How to look for patterns in your data – data mining.
  • How to summarize your data numerically:
    • Using correlation matrices.
    • Using contingency tables.
    • Using Pivot Tables.
  • How to choose the appropriate graph for your data.
  • How to use Sparklines for quick visual summaries.
  • How to make different sorts of graph and Pivot Chart:
    • Using scatter plots for correlation and regression data.
    • Using line plots for time-series data.
    • Using bar charts for association data.
    • Using bar charts for differences data.
      • Adding error bars.
    • Using box-whisker plots for differences data.
  • How to share your data with others:
    • Exporting data in various file formats.
    • Saving Pivot Tables.
    • Saving charts and Sparklines.

Essentially this part covers topics concerned with using the dataset you’ve created. Chapter 4 deals with some general ways to summarize and visualize data as well as how to recognize different sorts of dataset.

Chapters 5–8 deal with the methods of summary for different sorts of dataset. The final chapter deals with aspects of sharing your data.

4. Making sense of your data

Getting your data into a sensible arrangement, managing your data and checking over them for errors are important tasks. However, they are simply the forerunner to the most important phase of all, making sense of your data. In this chapter you’ll see some of the basics of data summary and ways to visualize data, using graphs.

You’ll also see how to recognize the kind of dataset that you are dealing with, as this affects the approach to data summary and visualization that you’ll need to use. The later chapters each deal with a specific kind of dataset.

4.1 TYPES OF DATASET
4.1.1 Correlation and regression datasets
4.1.2 Association datasets
4.1.3 Differences datasets
4.2 Ways to Summarise your Data
4.2.1 Averages
4.2.2 Variability
4.2.3 Replication
4.2.4 Data distribution
4.2.5 Using the Analysis ToolPak
4.3 Ways to Visualise your Data
4.3.1 Types of graph
4.3.2 Pivot Charts
4.3.3 Sparklines
4.4 SUMMARY
4.5 EXERCISES

5. Exploring regression data

This chapter focusses on datasets that are primarily concerned with correlations and regressions. These datasets are designed to help you find links between variables. You’ll see how to explore the correlations between variables using correlation matrices.

You’ll also see how to explore the relationships between variables visually using scatter plots.

5.1 FINDING CORRELATIONS IN YOUR DATASET
5.1.1 Correlation matrices using all-numeric data
5.1.2 Correlation matrices using grouping variables
5.1.3 Highlighting correlations
5.2 Visualizing the Correlations
5.2.1 Sparklines
5.2.2 Scatter plots
5.2.3 Checking assumptions of normality
5.3 SUMMARY
5.4 EXERCISES

6. Exploring time-related data

The general importance of the time-related element is that you can follow measurements from one time period to the next, this leads to a particular way of visualizing your data, the line plot. In this chapter you’ll see how to explore time-related data using Pivot Tables and line plots.

6.1 SUMMARIZING TIME-RELATED DATA
6.1.1 Using Pivot Tables
6.2 Visualizing Time-Related Data
6.2.1 Sparklines
6.2.2 Line plots
6.3 SUMMARY
6.4 EXERCISES

7. Exploring association data

When you are looking for associations your data generally take a particular form. You have counts of items that fall into particular categories. In this chapter you’ll see how to build contingency tables from basic frequency data.

You’ll also see how to visualize the data using various sorts of chart. In generall the bar chart (column chart) is the most useful but pie charts are sometimes helpful and you’ll see those used too.

7.1 CONTINGENCY TABLES FROM YOUR DATA
7.2 Visualizing your Contingency Tables
7.2.1 Column and bar charts
7.2.2 Pie charts
7.3 SUMMARY
7.4 EXERCISES

8. Exploring differences data

When you have differences data you are looking to split your data into chunks, that is samples based on certain groupings. You may have multiple grouping variables that split the dataset into an hierarchy of samples or you may have a single grouping variable.

In this chapter you’ll see how to summarize your sample groups numerically, using several methods including Pivot Tables and the Analysis ToolPak.

You’ll also see how to visualize the data using graphs. The bar chart is the most commonly used graph and you’ll see how to make bar charts and add error bars to show sample variability. You’ll also see how to produce box-whisker plots, which allow you to present a lot of information about each sample in a compact manner.

8.1 SUMMARIZING DIFFERENCES DATA
8.1.1 Using Pivot tables
8.1.2 Using the Analysis ToolPak
8.2 Visualizing Differences Data
8.2.1 Sparklines
8.2.2 Histograms
8.2.3 Bar charts
8.2.4 Box-whisker plots
8.3 SUMMARY
8.4 EXERCISES

9. Sharing your data

Sharing your data is an important part of the scientific process. In a general way this is how scientific knowledge is gained and our understanding of the world advances. Ideas are tested by collecting data and exploring the results. These results are then shared with the scientific community (or general public) and the process of science moves onward.

In this chapter you’ll see how to share data with other computer programs, that may need a diferent format to carry out statistical analyses. You’ll also see how to export selective parts of your dataset, thus using your data like a database.

In later sections you’ll see how to share graphs and learn about the elements needed to make a useful chart.

9.1 EXPORTING YOUR DATA
9.1.1 Sharing your entire dataset
9.1.2 Sharing portions of your dataset
9.2 Sharing your Summaries
9.2.1 Sharing Pivot Tables
9.2.2 Sharing summary results
9.3 Sharing your Graphs
9.3.1 Sharing Sparklines
9.3.2 Sharing charts
9.4 SUMMARY
9.5 EXERCISES

Appendix

APPENDIX 1: ANSWERS TO EXERCISES

Each chapter ends with a summary and several self-assessment questions. The answers are here.

APPENDIX 2: DIFFERENCES BETWEEN VERSIONS OF EXCEL

A few notes about different versions of Excel. I used Excel 2010 for Windows throughout the production of this book. If you use another version of Excel then this section should help you to spot where things are different.

In general Excel 2013 and 2007 are quite similar to 2010. Version 2013 has a few alterations in the appearence of some of the menu commands but overall the functionality is very similar. Excel 2007 is unable to make Sparklines but that is the only major difference in functionality.

If you use the Mac version of Excel the menu items will appear a bit different. The Analysis ToolPak is not available and there are a few other minor issues.

 

If you use Open Office then you are somewhat more limited. The Pivot Table options are rather more restricted and you cannot use Sparklines or the Analysis ToolPak. However, much of the material in the book will be accessible to you and you ought to be able to get the hang of most of the general ideas.

My Publications

I have written several books on ecology and data analysis

An Introduction to R
Data Analysis and Visualisation
£35.00
Beginning R: The Statistical
Programming Language
£26.99
Statistics for Ecologists
Using R and Excel
£34.99
The Essential R
Reference
£44.99
Community
Ecology
£39.99
Managing Data
Using Excel
£24.99

Register your interest for our Training Courses

We run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. Courses will be held at one of our training centres in London. Alternatively we can come to you and provide the training at your workplace. Training Courses are also available via an online platform.




    Get In Touch Now

    for any information regarding our training courses, publications or help with a data project