# Sport research/Data analysis

To find out whether there is a difference in the data you have collected, we use the discipline known as statistics. Statistics can do much more than tell you whether a difference exists; it can also help you plan your study. And if you think statistics is boring, check out the enthusiastic video The Joy of Statistics.

If you are quite **new to statistics**, a great place to start is Will Hopkins' website (an excellent resource created to help researchers and students in the exercise and sport sciences better understand statistics). Start with the basics page and click the 'next' button to progress through it. Wikipedia also has a good introduction to statistics that is worth reading.

Hopkins tends to present alternatives to many traditional statistical methods: he explains the traditional approach as well as his view on its limitations. A recent article in The Conversation also argues that hypothesis and significance tests ask the wrong questions. Both are worth thinking about as part of your research design.

If you are **more comfortable with the basics**, you will probably want to get straight to answering your research questions. The data analysis you undertake will depend on the type of research, the research design, and the type of data you are working with. Revisit the research design section if you are unsure about the power of your study, subject numbers or randomisation. Once you have all your data, you will need to:

- Clean your data
- Analyse for differences/probability/associations
- Represent the data (visualisation).

If you are looking for the **answers to specific questions**, a number of resources are available online.

## Cleaning your data

From Data Cleaning in Statistics wikibook:

- 'Cleaning' refers to the process of removing invalid data points from a dataset.

- Many statistical analyses try to find a pattern in a data series, based on a hypothesis or assumption about the nature of the data. 'Cleaning' is the process of removing those data points which are either (a) obviously unrelated to the effect or assumption we are trying to isolate, because of some other factor that applies only to those particular data points, or (b) obviously erroneous, i.e. some external error is reflected in that particular data point, due to a mistake during data collection, reporting etc.

- In the process we ignore these particular data points, and conduct our analysis on the remaining data.

For the rest, see the data cleaning wikibook entry.

You should always be able to justify any data cleaning you do, and you may often be able to avoid it by using an alternative analysis. Remember that the methods section of your report should give enough information for your study to be replicated, and this includes any data cleaning and statistical analysis.
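The advice to justify and record every exclusion can be made concrete with a short Python sketch. It is a minimal example under stated assumptions: the function `clean_heart_rates`, the heart-rate readings and the plausible range of 25 to 230 beats per minute are all hypothetical, chosen for illustration rather than as recommended cut-offs. The point is that each excluded value is kept together with the reason it was removed, so the cleaning can be reported in your methods.

```python
from dataclasses import dataclass


@dataclass
class Exclusion:
    """One removed data point and the justification for removing it."""
    subject_id: str
    value: float
    reason: str


def clean_heart_rates(readings, low=25.0, high=230.0):
    """Split (subject_id, heart_rate) pairs into kept readings and logged exclusions.

    The plausible range [low, high] in beats per minute is a hypothetical
    threshold for illustration; justify your own limits in the methods section.
    """
    kept, excluded = [], []
    for subject_id, hr in readings:
        if hr is None:
            # Missing value, e.g. a failed recording
            excluded.append(Exclusion(subject_id, float("nan"), "missing value"))
        elif not low <= hr <= high:
            # Physiologically implausible, most likely a recording or entry error
            excluded.append(
                Exclusion(subject_id, hr, f"outside plausible range {low}-{high} bpm")
            )
        else:
            kept.append((subject_id, hr))
    return kept, excluded


# Hypothetical heart rates (bpm)
readings = [("S01", 72.0), ("S02", 0.0), ("S03", 158.0), ("S04", None), ("S05", 412.0)]
kept, excluded = clean_heart_rates(readings)
for e in excluded:
    print(e.subject_id, e.value, e.reason)  # keep this log for your methods section
```

Keeping the log separate from the cleaned data also makes it easy to rerun the analysis with the excluded points included, which is one way to check whether the cleaning changed your conclusions.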

## Data Analysis

Typically, you will want to use your data to:

- Provide descriptive summaries
- Investigate whether differences exist, or better, describe (the likelihood of) differences or changes
- Investigate the strength of associations between the measures you have taken

### Descriptive statistics

Methods used to summarise or describe a collection of data are referred to as descriptive statistics. In descriptive statistics we generally focus on the central tendency of the data (the average) and the variation in the data. When describing a group, we want to know the value around which most of the data points cluster. There are several types of average, most commonly the mean, but also the median and the mode. The mean tells us something about the data, but sometimes the variation in the data tells us much more about the behaviour of our results. Variation is most commonly reported as the standard deviation (SD), but the standard error (SE) or the range are also often reported. There are other ways to describe variation, which we won't go into here. Hopkins also has a view on whether you should report the mean and SD or the mean and SE.
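The measures above can be computed with Python's built-in `statistics` module. This is a sketch assuming a small set of hypothetical jump heights; the helper name `describe` is illustrative.

```python
import math
import statistics


def describe(values):
    """Summarise the centre and spread of a sample."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        "range": (min(values), max(values)),
    }


# Hypothetical countermovement jump heights (cm) for eight athletes
jumps = [38.0, 41.5, 41.5, 44.0, 36.5, 47.0, 40.0, 45.5]
for name, value in describe(jumps).items():
    print(name, value)
```

Note that the SE here is the standard error of the mean (SD divided by √n). It describes how precisely the mean has been estimated rather than how spread out the data are, which is why it shrinks as the sample grows.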

As with all data that you present, remember that you don't simply report every digit you get; you have to consider how many significant figures to report. See How many digits should you report?
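Rounding to a given number of significant figures is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the convention used in `format_mean_sd` (SD to two significant figures, mean to the same number of decimal places) is one possible convention assumed for illustration; check the guidance linked above or your target journal's requirements.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Decimal places needed to keep `sig` significant digits
    # (a negative value rounds to tens, hundreds, ...)
    decimals = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, decimals)


def format_mean_sd(mean, sd, sd_sig=2):
    """Format 'mean ± SD': SD to `sd_sig` significant figures, mean to the same decimals.

    Assumed convention for illustration; sd must be non-zero. When the SD has
    more integer digits than `sd_sig`, it is shown to the nearest whole number
    and so keeps extra figures.
    """
    decimals = max(0, sd_sig - 1 - math.floor(math.log10(abs(sd))))
    return f"{mean:.{decimals}f} ± {sd:.{decimals}f}"


print(round_sig(0.012345))              # 0.012
print(round_sig(1234.5))                # 1200.0
print(format_mean_sd(41.7625, 3.6155))  # 41.8 ± 3.6
```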

### Inferential statistics

When we want to draw a conclusion about differences, changes or relationships in the data we have collected, we use inferential statistics. The type of conclusion we want to make depends upon the research design. The type of conclusion we want to make depends upon the research design. Yes, that is written twice, for a reason.

Generally, we want either to detect a significant difference between groups or time points, to describe the likelihood of differences, or to find out the relationship between variables.

If we want to know how one variable varies in relation to another, we generally use a correlation statistic. Note that a correlation is not evidence of a causal relationship, i.e. that one variable causes the other to vary in a certain way; you need to show this through appropriate research design.
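For two continuous measures, the Pearson correlation coefficient is the most common correlation statistic. It is the sum of cross-products of deviations from each mean divided by the square root of the product of the two sums of squares, as in this Python sketch. The data and the function name `pearson_r` are hypothetical; Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and the two sums of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: weekly training hours and 5 km run time (minutes)
hours = [2, 4, 5, 7, 9]
run_time = [28.5, 26.0, 25.5, 23.0, 22.0]
r = pearson_r(hours, run_time)
print(f"r = {r:.2f}")  # prints r = -0.99: a strong negative association, not proof of cause
```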

If we want to know whether groups differ, or change, the statistics we use depend heavily on the research design. Hopkins offers solutions, and his website helps you decide what to use. It's important to get this right and to justify what you do. **Importantly**, there is not necessarily one correct statistical method for your data; you just need to be able to justify what you have done. Make sure you talk to colleagues, supervisors and other researchers about these decisions. Below are some of the resources, including spreadsheets, that Hopkins has created to help you analyse your data.

- Spreadsheets for Analysis of Controlled Trials, with Adjustment for a Subject Characteristic
- A Spreadsheet for Analysis of Straightforward Controlled Trials
- Making Meaningful Inferences About Magnitudes
- Progressive Statistics
- Linear Models and Effect Magnitudes for Research, Clinical and Practical Applications
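In a simple controlled trial, the quantity of interest is usually the difference between the mean change in the experimental group and the mean change in the control group. The Python sketch below computes that difference with an approximate confidence interval. It is not a reproduction of Hopkins' spreadsheets: the function names and data are hypothetical, and the interval uses a normal approximation (via `statistics.NormalDist`), which is narrower than the t-based interval appropriate for small samples.

```python
import math
import statistics
from statistics import NormalDist


def change_scores(pre, post):
    """Post-test minus pre-test for each subject (lists in the same subject order)."""
    return [after - before for before, after in zip(pre, post)]


def compare_changes(exp_pre, exp_post, con_pre, con_post, level=0.90):
    """Difference in mean change (experimental minus control) with an approximate CI.

    Uses a normal approximation for the interval; with small samples a
    t-based interval would be wider and is more appropriate.
    """
    exp_change = change_scores(exp_pre, exp_post)
    con_change = change_scores(con_pre, con_post)
    diff = statistics.mean(exp_change) - statistics.mean(con_change)
    # Unpooled (Welch-style) standard error of the difference between two means
    se = math.sqrt(
        statistics.variance(exp_change) / len(exp_change)
        + statistics.variance(con_change) / len(con_change)
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    return diff, (diff - z * se, diff + z * se)


# Hypothetical VO2max values (ml/kg/min) before and after a training block
exp_pre, exp_post = [50, 52, 48, 55, 51], [53, 55, 50, 58, 53]
con_pre, con_post = [49, 53, 50, 54, 52], [50, 53, 51, 54, 53]
diff, (lower, upper) = compare_changes(exp_pre, exp_post, con_pre, con_post)
print(f"Difference in change: {diff:.1f} (90% CI {lower:.1f} to {upper:.1f})")
```

Reporting the difference with its confidence interval, rather than only a p value, describes the likely magnitude of the effect, in keeping with the emphasis on magnitudes in the resources listed above.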

## Qualitative Analysis

Most of the above deals with quantitative research, but much of it also applies to qualitative work once the data have been coded. There are a few things to bear in mind, though, because the data tend to be categorical (a response either falls into a category or it does not) rather than continuous. James Neill offers an excellent overview of qualitative statistics with links to other worthwhile resources. Content analysis is a methodology in the social sciences for studying the content of communication. Earl Babbie defines it as "the study of recorded human communications, such as books, websites, paintings and laws."[1]
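Once qualitative responses have been coded, a common first step (for example in content analysis) is to tabulate how often each code occurs. A minimal Python sketch, assuming hypothetical codes assigned to interview excerpts; the function name `code_frequencies` is illustrative:

```python
from collections import Counter


def code_frequencies(coded_responses):
    """Count how often each code occurs and its proportion of all coded units."""
    counts = Counter(coded_responses)
    total = sum(counts.values())
    # most_common() orders codes from most to least frequent
    return {code: (n, n / total) for code, n in counts.most_common()}


# Hypothetical codes assigned to interview excerpts about training motivation
codes = ["enjoyment", "health", "enjoyment", "competition", "social",
         "enjoyment", "health", "social"]
for code, (n, prop) in code_frequencies(codes).items():
    print(f"{code:<12} {n:>2} {prop:6.1%}")
```

Counts like these are categorical data, so they are summarised with frequencies and proportions rather than means and SDs.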

If you are doing qualitative research, it is worth checking out the links from the Qualitative analysis Wikiversity page for much more information on conducting and analysing the research (although most of it is designed for psychology students, the tutorials and lectures are useful in many fields).

## Visualisation

The most appropriate format for presenting your results will depend on the circumstances, and there may be many alternatives available. The three most common options in research are listed below.

- In text
- In a table
- In a figure. This may be a type of graph or another type of illustration

In text

I advocate telling the reader as much as you can about the data: present the numbers, effect sizes, means and standard deviations, whatever you have. Although you will interpret the results in your discussion, you should give readers enough information to make up their own minds about them. If you have something to hide, you should reconsider publishing, or openly acknowledge the limitations that exist.

In 2011 the Journal of Physiology published a perspective entitled "Show the data, don't conceal them", with advice on presenting figures and tables. Altman and Bland (two well-known statisticians) also offer advice on presenting numerical data, such as giving the actual p value; see the British Medical Journal article Statistics Notes: Presentation of numerical data. My last comment on presenting data in text, and a pet hate of mine, concerns how many decimal places to use: see Troublesome decimals; a hidden problem in the sports medicine literature, which also offers some other tips.
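Reporting the actual p value rather than only "p < 0.05" is easy to automate. In this Python sketch the function name `format_p`, the two-significant-figure rounding and the cut-off of 0.001 (below which values are shown as "p < 0.001") are all assumed conventions, not rules taken from the articles above.

```python
def format_p(p, threshold=0.001):
    """Format an exact p value, e.g. 'p = 0.043', instead of only 'p < 0.05'.

    Two significant figures and the 0.001 cut-off are assumed conventions.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p < threshold:
        return f"p < {threshold}"
    return f"p = {p:.2g}"


for p in (0.0432, 0.27, 0.0004):
    print(format_p(p))
```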

A single statistic tells only a small part of the story. Measures of variation and differences between groups tell you a little more, but a well-designed statistical graphic can help us explore, and perhaps understand, these relationships much better.

- Tips on using graphs from Statistics Canada
- Using Graphs and Tables on Presentation Slides by Dave Paradi
- Charts and Graphs: Choosing the right format by Mindtools
- Decision tree for deciding which graph to use, by LabWrite, North Carolina State University

Many of the examples above do not contain the detail a scientific figure should contain, but they are useful for showing the types of graphs available and their uses. In scientific presentation, the parts of the graph matter too. Parts of graphs worth considering for scientific publication include:

- title
- indicators of significant differences
- axis labels
- the axis scale, and whether each axis is continuous or broken
- legend
- error bars
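Several of these parts (title, axis labels, legend and error bars) can be seen in a short Python sketch that plots individual data points alongside each group's mean and SD error bar, in the spirit of showing the data. The simulated data, group names and output filename are hypothetical, and matplotlib is a third-party plotting library assumed here for illustration, not a tool this page recommends.

```python
import random
import statistics

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

random.seed(1)  # simulated, hypothetical data
groups = {
    "Control": [random.gauss(45, 4) for _ in range(10)],
    "Training": [random.gauss(50, 4) for _ in range(10)],
}

fig, ax = plt.subplots(figsize=(4, 4))
for i, (name, values) in enumerate(groups.items()):
    # Individual data points, jittered horizontally so they do not overlap
    xs = [i + random.uniform(-0.08, 0.08) for _ in values]
    ax.scatter(xs, values, alpha=0.6, label=f"{name} (individuals)")
    # Group mean with an SD error bar, offset to the right of the points
    ax.errorbar(i + 0.25, statistics.mean(values), yerr=statistics.stdev(values),
                fmt="o", color="black", capsize=4,
                label="Mean ± SD" if i == 0 else "_nolegend_")

ax.set_xticks(range(len(groups)))
ax.set_xticklabels(list(groups))
ax.set_xlim(-0.5, len(groups) - 0.5)
ax.set_xlabel("Group")
ax.set_ylabel("VO2max (ml/kg/min)")
ax.set_title("Hypothetical VO2max by group")
ax.legend(loc="lower right")
fig.tight_layout()
fig.savefig("vo2max.png", dpi=300)
```

Plotting the individual points as well as the summary lets readers see the spread and any outliers that a bar and error bar alone would hide.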

MIT provides a very good summary of the importance of graphics, as well as what to include (and what not to include) on your graphics.

Traditional graphical forms of representing data are presented here, and are generally most appropriate for traditional scientific publications. To really get the most out of your data, however, especially online or on video, it may be worth employing a graphic design consultant. The emerging opportunities for data visualisation are remarkable and worth exploring.

## Activity

Activities are mini-tasks that give you some practice with the concepts of each section. Activities should appear here soon; if not, feel free to add some open-access ones yourself.

## Task

Before data collection

- Decide how you think you will analyse your data.
- Try to predict any issues you might have with your data

After data collection

- Conduct any data cleaning and record what you did and why
- Carry out the analysis. Record everything you did
- Write out all the methods for your data analysis
- Present the data in a suitable format for your audience (it may be a specific scientific journal, or on a blog)

## Resources

Will Hopkins' A New View of Statistics is a resource you will keep coming back to time and time again.

### Data Analysis

McGraw-Hill Companies (2007). Educational research and statistics: valuable web sites. Available at http://fms.wsd.wednet.edu/TechLab/educationallinks.htm

Quick R - a website on R for experienced users of statistical packages such as SAS, SPSS, Stata, and Systat.

Research Methods in the Social and Natural Sciences offers a basic tutorial-style program that covers experimental, correlation, naturalistic observation, survey and case study methods. 'The Lab' also has revision tests.

School of Library, Archival and Information Studies, The University of British Columbia (2004). General survey research. Research Methods resources on the WWW is an extensive site covering a wide range of research methods.

Simple R: notes by John Verzani that were turned into the book Using R for Introductory Statistics, published in fall 2004 by Chapman & Hall/CRC Press. Available at http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html

#### Tools

Data analysis relies on the tools used to undertake it, and the range of tools is immense, from statistical software packages and spreadsheets to pen and paper and dedicated graphics packages. A selection of common tools is listed below.

SPSS

Information on SPSS can be found from its website http://www.spss.com/au/index.htm

R

R is an open-source statistical package. It can be downloaded free from http://www.r-project.org

Advantages of R:

- Multi-platform
- Extremely powerful
- Lots of free documentation is available

A summary of survey analysis software is available from http://www.hcp.med.harvard.edu/statistics/survey-soft/

There is an R users group in the ACT (Canberra); its website is http://www.meetup.com/Canberra-R-Users-Group/

### Visualisation

Some tips on preparing graphics for publication from Will Hopkins.

Links to a number of visualisation web resources, courtesy of gsiemens.

#### Tools

Many of the data analysis tools also produce graphical representations of data. Some are more useful than others. Some more specific tools are listed below, some of which have statistical capabilities (often limited) themselves.

GraphPad Prism (commercial software).

Microsoft Excel and alternatives.

### References

1. Wikipedia: Content analysis. http://en.wikipedia.org/wiki/Content_analysis (accessed 12/01/2011)