by Pen Lister, on May 06, 2014

Analysing Your Research Data

Snapshot: A broad overview covering a few ways to approach analysing research data, especially suitable for social science research areas

Spotting your Outlier | Credit: iwanbeijes/freeimages

Analysing your data can be the most daunting aspect of your research project. It is especially difficult if you've gathered lots of data without thinking about how you will analyse it, how it fits together, how you can compare sets of data, what the data sets even are, and how you might draw conclusions from them. These questions are the fundamentals of data analysis: without answers to them you will struggle to analyse effectively, even with simple methods.

What counts as data?

All sorts of things can count as data for your project. It depends on what your research questions are, and how you have approached your research design. We often use several types and sources of data in research, for comparison purposes, to be able to 'triangulate' our findings. Data from several different sources that appear to point to a similar conclusion make for a much more powerful argument than only one type of data or one type of data source. 

 

It's helpful to think about types and sources of data BEFORE you do any data gathering, to help you design the research approach in a useful and methodical way. Visualising how you can analyse data from different sources in terms of how they might fit, complement or duplicate each other really helps to decide which data you want to gather, and which might be unnecessary extra work.

Sources and Analysis

The table below is a guide for a variety of research data sources, how you might analyse them, and which type of data they are generally considered to be - qualitative or quantitative.

Research Data Source | Analysis approaches and methods | Type of Data
Literature review findings | The review itself, categorisation tables, comparison tables, dates, sources (journal types), ... | qualitative
Observation - notes or diary (recorded in systematic or non systematic way) | Fixed questions answered, or open non specific topic common occurrences, categorisations, generated by observer/participant, ... | qualitative or quantitative
Informal discussions online or face-to-face (recorded or evidenced in some way) | Categorisation of common topics, dates and times, types of people involved, samples compared, ... | qualitative
Formal interviews or focus group discussions | Sets of criteria to be covered with each interviewee/group, with corresponding frequency of responses, negative/positive comments, ... | qualitative
Image or video evidence | Number of occurrences, common approaches (content, themes etc), locations, sources ... | quantitative or qualitative
Questionnaire responses | Numbers counted, compared, averaged, measured, noted outliers ... | quantitative
Comparisons observed and recorded | Any set of factors relevant as evidence to research; tabled, compared, measured ... | quantitative
Experiential testimony (yours, or other people's) | Participatory notes, journals, categorisations, topic number of occurrences, times created, author type (sample) ... | qualitative
Digital sources | Web analytics, user experience behaviour, task tracing, usability testing ... | quantitative or qualitative

Table 1: data source, analysis approach and type

 

Any or all of the data sources in Table 1 could be used in the same piece of research - though often you may not need quite so many different types in order to establish reasonable conclusions in relation to your research question. The table shows how each source might be analysed, along with whether it's quantitative or qualitative. 

We also think about primary and secondary data sources. Primary Data (aka Primary Research) is the research data you produce from first hand enquiry: your own interviews, questionnaires or observation notes would be primary data. Secondary Data is 'other data' (i.e. not generated by you), which can come from a variety of sources. For example, literature reviews or re-analysed data from old journal papers would be secondary data. This can all be used by you to contribute to your findings, discussion and any relevant conclusions.

We know the difference between qualitative and quantitative data - the table indicates which source might fall into which category. So how do we analyse each of these, and can we use both types of data in our findings to help draw conclusions? The short answer is yes, and we can also use quantitative data to inform qualitative research, or vice-versa. For example, we might carry out an introductory questionnaire to decide who we want to interview about different topics, thereby creating 'criteria' for our 'sample' respondent groups. Planning is therefore essential to understand sources and types of data and how they can work together as a whole.

 

Data Analysis Planning

Planning makes a huge difference to what you can find out, and to how you approach the iterations of both data gathering and data analysis. An example of this can be seen in a diagram of data analysis architecture from my MA dissertation (2013). This demonstrates fairly clearly how planned stages of analysis then form iterations of findings, and establish areas for discussion. This approach can also help to highlight what is most effective in answering the research question, what might show up 'confounding variables' and what is simply ineffective or irrelevant.

Robust Analysis

It's worth noting that the quality of your analysis hinges on how robust your methods are. In practical terms this usually means two things: can the methods and results be replicated, and can the analysis be carried out by someone else (i.e. not you)? The rigour with which you apply these principles to your research is up to you, and is currently generating some debate amongst top academics. See Allan Dafoe's research paper (PDF) and Tom Pepinsky's blog post as examples; it's worth further Googling if you're interested.

More Information, reading and further detail

Included below are some useful PDF downloads on qualitative content analysis approaches, which are likely to be some of the most popular methods by which data will be analysed in a social sciences and education setting.

Further advice in the tabbed content sections which follow outlines a variety of issues and useful things to remember, lists a number of other useful resources and clarifies some of the more difficult terminology used in research.

DOWNLOADS:


Analysis Techniques

Overview of a simple research design

This webpage provides a good concise overview to give ideas on how to approach your research design and what you need to 'do'. 

Content Analysis

There are a variety of content analysis methodologies. See the PDF downloads in the main section above, and also refer to the following webpages for help:

Observation Analysis

Observation analysis is how we measure what we observe about our research sample or samples. We can do this in all kinds of ways and choices will be dependent on what your research question is and who your samples are. Here are some useful resources to get you started on how to use this method:

Categorisation of Content

Categorisation can be used in all sorts of settings. In relation to interview key quotes, or categorising other types of transcript text (for example, what happens in a video), here's a useful page outlining 10 steps for content analysis, from Surrey University, which describes the categorisation process.

A Table of Comparisons

Here's an example of a comparison table. These can be useful for all sorts of purposes and are especially useful in a findings section to illustrate or evidence a point, not just for raw data analysis.

Number of people in each sample who agreed or disagreed with the statement "I really love cheese":

Number agreeing or disagreeing, by sample group

Sample Group | Strongly Agree | Strongly Disagree | Neither agree nor disagree
Sample 1     | 7              | 3                 | 2
Sample 2     | 5              | 6                 | 1
Sample 3     | 2              | 9                 | 1
Sample 4     | 0              | 7                 | 5
Sample 5     | 4              | 8                 | 0

Table showing a grid of comparisons
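
As a rough illustration of how counts like these can be turned into comparable figures, the Python sketch below converts each row of the table into percentages and adds up each column. The counts are taken from the table; the data layout and the function names are assumptions made for the example, not part of any particular tool.

```python
# Counts copied from the "I really love cheese" comparison table.
# Column order: strongly agree, strongly disagree, neither.
COLUMNS = ("strongly_agree", "strongly_disagree", "neither")
COUNTS = {
    "Sample 1": (7, 3, 2),
    "Sample 2": (5, 6, 1),
    "Sample 3": (2, 9, 1),
    "Sample 4": (0, 7, 5),
    "Sample 5": (4, 8, 0),
}


def row_percentages(row):
    """Return each count in a row as a percentage of that row's total."""
    total = sum(row)
    return tuple(round(100 * value / total, 1) for value in row)


def column_totals(counts):
    """Add up each response column across all sample groups."""
    return tuple(sum(column) for column in zip(*counts.values()))


if __name__ == "__main__":
    for group, row in COUNTS.items():
        print(group, dict(zip(COLUMNS, row_percentages(row))))
    print("Totals", dict(zip(COLUMNS, column_totals(COUNTS))))
```

Percentages make groups of different sizes easier to compare, although in this table every sample happens to contain 12 people.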

 

Some other examples of tables of comparisons:

The Expert Review

An Expert Review is a usability technique of looking at a lot of data offered by different comparable sources, and measuring differences and similarities or the frequency of occurrences of the factors being measured. It uses expert knowledge to gauge the differences or measure preferences or requirements. This technique has been adapted and used in a variety of other knowledge provision fields.

Heuristic Evaluations

Another technique that began in usability research is the heuristic evaluation. '...A group of experts is asked to assess a particular design using a given rubric (set of heuristics)'. This is extremely effective in gathering useful (measurable) feedback on learning design(s). Two links below provide further insight:

Ranking and scaling

Ranking and/or scaling is a way of demonstrating strengths, weaknesses, frequencies, significance or other factors in relation to data. It involves developing strata for ranking your data, for example by order of time, location, occurrences, age etc. Put simply, it is often at the core of any data analysis system.
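
One possible way to rank data is to sort groups by a single measure. The Python sketch below ranks the five sample groups from the "I really love cheese" comparison table earlier on this page by the share of each group that strongly agreed. The choice of measure and the function name are assumptions for illustration; your own ranking criteria will depend on your research question.

```python
# (number strongly agreeing, group size) for each sample,
# taken from the comparison table earlier on this page.
AGREEMENT = {
    "Sample 1": (7, 12),
    "Sample 2": (5, 12),
    "Sample 3": (2, 12),
    "Sample 4": (0, 12),
    "Sample 5": (4, 12),
}


def rank_by_agreement(data):
    """Return group names ordered from highest to lowest share agreeing."""
    return sorted(data, key=lambda group: data[group][0] / data[group][1],
                  reverse=True)


if __name__ == "__main__":
    for position, group in enumerate(rank_by_agreement(AGREEMENT), start=1):
        agreed, size = AGREEMENT[group]
        print(position, group, f"{agreed}/{size}")
```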

Interviewing

Recording 

Obtain a good audio recorder for any face to face interviews or focus group discussions - you can either hire one, or use your smartphone if it has good audio quality available. Set the recorder down close to the respondent(s), on a soft base (a cushion, or jumper) with the mic aimed towards the respondent(s). Do a test beforehand to make sure you are getting the recording! Also make sure that you have lots of battery power or are plugged into the mains if available. You may wish to edit the recording files later, in which case, get Audacity (free) and use that - it's quite straightforward, and there are lots of tutorials online.

Précis

You need to précis the recordings. To précis an interview or discussion means to write a fairly full summary - not perhaps word for word but certainly to create a written version of all pertinent information, who said it (respondent code) and when it was said in relation to the timecode of the recording, or topic being discussed and the traceable stage of the interview.

Key Quotes

Pull out the key quotes of each interview/discussion - you may wish to use these in your findings, discussion and conclusions, and having them clearly available is a must. They may also form part of the material being categorised in your open ended qualitative data. They are in the précis, but it's useful to have them separate too, as a short summary. You may wish to create measurable criteria for selecting key quotes (for example words mentioned, frequency of occurrence of words or topics, in response to specific questions etc.). Selecting key quotes is a skill, so try it out with other text to see what you end up with. Social media discussions are a great source for practising this!
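
If you decide to use word frequency as one of your criteria for selecting key quotes, a short script can do the counting. The Python sketch below is only an illustration: apart from Respondent A's quote, which is the example used on this page, the précis lines, the respondent codes and the list of ignored words are invented for the example.

```python
import re
from collections import Counter

# Hypothetical précis lines: (respondent code, what was said).
PRECIS = [
    ("A", "I never liked cheese as it was never given to us as children."),
    ("B", "Cheese was always on the table at home, so I love cheese."),
    ("C", "I only eat cheese at parties."),
]

# Very common words that say little about the topic (an assumed list).
IGNORED = {"i", "as", "it", "was", "to", "us", "the", "at", "on", "so", "a"}


def word_frequencies(lines):
    """Count how often each non-ignored word appears across all lines."""
    counts = Counter()
    for _code, text in lines:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in IGNORED)
    return counts


def respondents_mentioning(lines, word):
    """List the codes of respondents whose lines contain the given word."""
    return [code for code, text in lines
            if word in re.findall(r"[a-z']+", text.lower())]


if __name__ == "__main__":
    print(word_frequencies(PRECIS).most_common(3))
    print(respondents_mentioning(PRECIS, "never"))
```

Counting words is only a starting point for finding candidate quotes; deciding which ones matter is still a judgement you make as the researcher.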

Referring to specific responses or quotes

It is important to NEVER reveal the name of the respondents. If you need to refer to a specific quote and it is useful to state 'who' said it, always use their code. For example:

'Respondent A stated "I never liked cheese as it was never given to us as children". Respondent A grew up in Peru, where cheese is not part of the staple diet, and this therefore may have a bearing on the validity of the statement in the context of the research...'

Sometimes the use of actual quotes can work well in a comparison table setting, to help your reader compare responses or frame their understanding better. It's also useful to refer to which set of interviews or questionnaires the quotes are sourced from, or which Sample group, to provide background.

More information

More information on analysing data from transcripts is in the Analysis Techniques/Categorisation of Content section on this page.

Questionnaires

Question order and sample groups

Question order can be extremely important in obtaining accurate responses, both from individual respondents and from distinct sample groups. Ordering questions according to the logic of responses (response path guidance) is perhaps the more obvious consideration, but order also matters for question fatigue, and for avoiding negative reactions early on which might prevent later responses from being accurate, or from being given at all. For this reason demographic questions of a more personal nature (e.g. age, address) are often left until the end. A common way of guarding against question fatigue is to ask the most important questions in a random order to each respondent. This should be done with a robust, controlled method, not a 'shuffle and see' approach.

Randomisation

Randomisation, when done in a controlled manner, is a way of removing 'confounding variables', i.e. factors which might be influencing your data, and that you either may not be able to control, or had not thought to remove. Confounding variables, in the case of questionnaire fatigue, might be the order of questions. To control this, you would create a randomisation table, thereby controlling the order that questions were presented, but randomly associating that with any individual respondent. Here's an example:

Respondent Number | First Question | Second Question | Third Question
R1                | 1              | 2               | 3
R2                | 2              | 3               | 1
R3                | 3              | 1               | 2

Table showing fixed random order method
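
A table like this can be generated instead of written by hand. The Python sketch below builds the fixed orders by rotating the question numbers (a simple Latin square, which is one way of producing the pattern in the table above, and an assumption about the method) and then pairs respondents with those orders at random. The function names and the `seed` parameter are hypothetical.

```python
import random


def rotated_orders(n_questions):
    """Build every cyclic rotation of the question numbers 1..n."""
    return [
        [(start + offset) % n_questions + 1 for offset in range(n_questions)]
        for start in range(n_questions)
    ]


def assign_orders(respondent_codes, n_questions, seed=None):
    """Randomly pair each respondent with one of the fixed question orders."""
    orders = rotated_orders(n_questions)
    codes = list(respondent_codes)
    # A seed makes the allocation repeatable, so it can be documented.
    random.Random(seed).shuffle(codes)
    # Cycle through the orders so each one is used about equally often.
    return {code: orders[i % len(orders)] for i, code in enumerate(codes)}


if __name__ == "__main__":
    for code, order in sorted(assign_orders(["R1", "R2", "R3"], 3, seed=1).items()):
        print(code, order)
```

Because the set of orders is fixed in advance and only the allocation is random, every question appears in every position equally often, which is the controlled part of the randomisation.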

 

Assigning respondent codes

It's important to assign codes to your respondents. This is for a variety of reasons:

  • It allows for anonymity and assured privacy of all respondents (even from each other)
  • It allows you to trace sets of responses according to a given respondent
  • It allows for comparisons between respondents/responses according to criteria such as demographic characteristics
  • It makes your data more suitable for further sharing (for other research projects)

Remember to record each questionnaire with a Unique Identifier which corresponds to your respondent code - if you don't do that, you have no record of who said what or when it was done.

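
As a minimal sketch of how respondent codes might be assigned, the Python example below maps each name to a code and keeps that key separate from the responses themselves. The names, the "R" prefix and the function name are all invented for the illustration.

```python
import random


def assign_codes(names, prefix="R", seed=None):
    """Map each name to an anonymous code such as R1, R2, R3."""
    shuffled = list(names)
    # Shuffle first so the code numbers do not reveal sign-up order.
    random.Random(seed).shuffle(shuffled)
    return {name: f"{prefix}{i}" for i, name in enumerate(shuffled, start=1)}


if __name__ == "__main__":
    # Hypothetical respondents. Store this key securely and separately
    # from the response data, which should only ever carry the codes.
    key = assign_codes(["Alex", "Sam", "Jo"], seed=0)
    responses = [{"respondent": key["Sam"], "question": "Q1", "answer": "Agree"}]
    print(responses)
```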
Response data, Respondent data

Response data is the absolute crux of your research. It is the primary data which will be used as evidence to challenge or confirm any hypothesis or points in findings and discussion. Response data can be grouped in as many ways as you can think of to achieve this, and may even throw up ways of interpretation you hadn't thought of, once you see results. You may iterate your analysis methods and document your iterations; this is all useful in a research project as it sheds light on what is going on.

When we think of Respondent data, we think of something different to response data, as it allows us to see who is responding, and how they respond to different sets of questions. This can tell us different things than just looking at groups of response data. By building context and background of respondents (in whichever way is relevant to your project) more can be understood in relation to responses and possible implications and conclusions.
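
To make the distinction concrete, the Python sketch below keeps respondent data and response data in separate structures, linked only by respondent code, and then groups answers by a respondent characteristic. All of the data, the 1 to 5 answer scale and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical respondent data: who is responding.
RESPONDENTS = {
    "R1": {"age_band": "18-30"},
    "R2": {"age_band": "31-50"},
    "R3": {"age_band": "18-30"},
}

# Hypothetical response data: (respondent code, question, answer from 1 to 5).
RESPONSES = [
    ("R1", "Q1", 4),
    ("R2", "Q1", 2),
    ("R3", "Q1", 5),
]


def mean_answer_by(attribute, respondents, responses):
    """Average the answers within each value of a respondent attribute."""
    grouped = defaultdict(list)
    for code, _question, answer in responses:
        grouped[respondents[code][attribute]].append(answer)
    return {group: sum(values) / len(values) for group, values in grouped.items()}


if __name__ == "__main__":
    print(mean_answer_by("age_band", RESPONDENTS, RESPONSES))
```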

Using software

SPSS

SPSS software (Statistical Package for the Social Sciences) is available for Windows and Mac OS. It is one of the most widely used software packages in academia for analysing sometimes very complex sets of data. It is usually available free from your institution, so ask your IT department.

It can seem hard to use, but it is worth becoming at least familiar with the basics if you intend to continue with research in your academic career. There is an enormous number of tutorials and guides available online to help you get started, as well as a variety of useful books. The download PDF in this section is very comprehensive and useful. Here are some other resources to get you started:

nVivo

nVivo (v10) is a platform for 'analysing all forms of unstructured data' and is widely used in the academic community. It can search, query or visualise data, and offers a 30 day free trial download for Windows or Mac. It can be purchased at student rates (6 month Windows licence is £55, 12 month Mac licence £49); other rates are also available.

There are comprehensive tutorials and guides on the nVivo website (link below). 

Excel (using formulae)

Microsoft Excel may be the easiest way for you to analyse data, and with some Googling for tutorials, you will be able to create great visualisations as well as do good analysis. Here are a few to get you started:

Online tools for surveys

Online tools such as SurveyMonkey, Typeform or Wufoo Forms will all provide you with downloadable xls files (Excel files) that you can load up into Excel. This cuts out the need to enter your responses first and saves time. SurveyMonkey also provides its own visualisations of responses, so can be doubly useful. Typeform and Wufoo provide excellent form interfaces which enhance your respondents' experience when they take part in your research.

DOWNLOADS:
  • [pdf : 3 MB] downloads 'Getting started with nVivo 10', by QSR Int (makers of nVivo)
  • [pdf : 4 MB] downloads 'Analyzing Data Using Excel', by Scott Sample, for Microsoft

Glossary of Terms

Common terms used in research analysis

  • Statistics
    Generating numbers which correspond to factors you are researching. Often analysed using software like SPSS or Excel.
  • The Hypothesis
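    More common in scientific research. The 'null' hypothesis is the default assumption you test against (usually that there is no effect or no difference), and the alternative hypothesis is the one you are researching. For example: "everyone likes cheese" (the null hypothesis) and "not everyone likes cheese" (the one you are researching).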
  • The 'P' Value
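    Closely related to statistical significance. The P value (probability value) is the probability of obtaining results at least as extreme as those you observed if the null hypothesis were true. A threshold of 0.05 (5%) is commonly used: a P value below it is conventionally treated as evidence against the null hypothesis, although it does not prove or disprove it.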
  • Standard Deviation
    A measure of how spread out a set of data results is around its mean. A small standard deviation means the values cluster close to the mean; a large one means they are widely scattered.
  • Mean, Median and Mode
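    The three main types of 'averages': the mean is the sum of the values divided by how many there are, the median is the middle value when the values are sorted, and the mode is the most frequently occurring value.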
  • Participants/Subjects/Respondents
    The people who are part of your research - who you are watching, or testing, or interviewing. They may also be referred to as being from a specific sample if applicable.
  • Within subject/between subject
    Within subject means research (or sets of research) done within a single subject group, with comparisons made about the data within that group. Between subject is research carried out with distinctly different groups of subjects, with comparisons made between those groups.
  • Independent and Dependent Variable 
    The Dependent Variable is the behaviour or other factor in which you want to bring about an effect or change; the Independent Variable is how you intend to achieve that change. For example: a group of people smoke (their smoking is the dependent variable), and by introducing some independent variable (e.g. an anti-smoking campaign) you are attempting to change their behaviour (e.g. to cut down their smoking after being exposed to the campaign).
  • Confounding Variables
    Those variables which 'confound' your data or findings. They are different to just uncontrolled variables, as they coincide with what is changing in your research. For example, people who smoke a lot in pubs but perhaps not as much elsewhere. In pubs/not in pubs might be a confounding variable on amount of smoking when exposed to anti-smoking campaigns.
  • Outliers
    An Outlier is a data point (for example, one respondent's answer) that unduly skews your results, i.e. influences the mean or standard deviation with extreme values. Screening your data for outliers before you do any serious analysis is useful, but if you remove any, be sure to report this very early in your results section to be clear about what you have done.
  • Causal Inference
    Making a connection between data to infer a cause or effect. This can be fraught with issues, some of which are covered in Causal Inference with Observational Data (PDF), (Nichols, A, 2007, Stata Journal).
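
Several of the terms above (mean, median, mode, standard deviation and outliers) can be tried out with Python's built-in statistics module. The sketch below uses invented scores, and flags as outliers any values more than two standard deviations from the mean. That cut-off is a common rule of thumb and an assumption here, not a fixed standard.

```python
import statistics

# Hypothetical questionnaire scores; the final value is deliberately extreme.
SCORES = [4, 5, 5, 6, 5, 4, 6, 5, 30]


def summarise(values):
    """Return the three 'averages' and the sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
    }


def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return [value for value in values if abs(value - mean) > threshold * spread]


if __name__ == "__main__":
    print(summarise(SCORES))
    print("Possible outliers:", flag_outliers(SCORES))
```

In this example the single extreme score pulls the mean well above the median, which is why it helps to look at more than one kind of average before deciding how to treat an outlier.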