Snapshot: A broad overview covering a few ways to approach analysing research data, especially suitable for social science research areas
Analysing your data can be the most daunting aspect of your research project. It is especially difficult if you've gathered lots of data without first thinking about how you will analyse it: how it fits together, how you can compare sets of data, what the data sets even are, and how you might draw conclusions from them. These questions are the fundamentals of data analysis; without these basics you'll never be able to analyse effectively, even with simple methods of analysis.
All sorts of things can count as data for your project. It depends on what your research questions are, and how you have approached your research design. We often use several types and sources of data in research, for comparison purposes, to be able to 'triangulate' our findings. Data from several different sources that appear to point to a similar conclusion make for a much more powerful argument than only one type of data or one type of data source.
It's helpful to think about types and sources of data BEFORE you do any data gathering, to help you design the research approach in a useful and methodical way. Visualising how data from different sources might fit together, complement or duplicate each other really helps you decide which data you want to gather, and which might be unnecessary extra work.
The table below is a guide for a variety of research data sources, how you might analyse them, and which type of data they are generally considered to be - qualitative or quantitative.
| Research Data Source | Analysis approaches and methods | Type of Data |
| --- | --- | --- |
| Literature review findings | The review itself, categorisation tables, comparison tables, dates, sources (journal types), ... | qualitative |
| Observation: notes or diary (recorded in a systematic or non-systematic way) | Fixed questions answered, or open non-specific topics: common occurrences, categorisations, generated by observer/participant, ... | qualitative or quantitative |
| Informal discussions online or face-to-face (recorded or evidenced in some way) | Categorisation of common topics, dates and times, types of people involved, samples compared, ... | qualitative |
| Formal interviews or focus group discussions | Sets of criteria to be covered with each interviewee/group, with corresponding frequency of responses, negative/positive comments, ... | qualitative |
| Image or video evidence | Number of occurrences, common approaches (content, themes etc.), locations, sources, ... | quantitative or qualitative |
| | Numbers counted, compared, averaged, measured, noted outliers, ... | quantitative |
| Comparisons observed and recorded | Any set of factors relevant as evidence to research: tabled, compared, measured, ... | quantitative |
| Experiential testimony (yours, or other people's) | Participatory notes, journals, categorisations, number of topic occurrences, times created, author type (sample), ... | qualitative |
| Digital sources | Web analytics, user experience behaviour, task tracing, usability testing, ... | quantitative or qualitative |

Table 1: data source, analysis approach and type of data
Any or all of the data sources in Table 1 could be used in the same piece of research - though often you may not need quite so many different types in order to establish reasonable conclusions in relation to your research question. The table shows how each source might be analysed, along with whether it's quantitative or qualitative.
We also think about primary and secondary data sources. Primary data (aka primary research) is the research data you produce from first-hand enquiry; your own interviews, questionnaires or observation notes would be primary data. Secondary data is 'other data' (i.e. not generated by you), which can come from a variety of sources: literature review findings, for example, or re-analysing data from older journal papers. Both can be used to contribute to your findings, discussion and any relevant conclusions.
We know the difference between qualitative and quantitative data, and the table indicates which source might fall into which category. So how do we analyse each of these, and can we use both types of data in our findings to help draw conclusions? The short answer is yes, and we can also use quantitative data to inform qualitative research, or vice versa. For example, we might carry out an introductory questionnaire to establish who we want to interview about different topics, thereby creating 'criteria' for our 'sample' respondent groups. Planning is therefore essential to understand sources and types of data and how they can work together as a whole.
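As a minimal sketch of using quantitative screening data to build a qualitative interview sample, the snippet below filters questionnaire responses against a criterion. All field names, respondent IDs and the criterion itself are invented for illustration:

```python
# Sketch: select a qualitative interview sample from quantitative
# screening-questionnaire responses. Fields and criteria are hypothetical.

screening_responses = [
    {"id": "R01", "age": 34, "uses_social_media": True,  "hours_per_week": 12},
    {"id": "R02", "age": 61, "uses_social_media": False, "hours_per_week": 0},
    {"id": "R03", "age": 27, "uses_social_media": True,  "hours_per_week": 3},
    {"id": "R04", "age": 45, "uses_social_media": True,  "hours_per_week": 20},
]

# Criterion defining the 'heavy user' sample group.
def is_heavy_user(r):
    return r["uses_social_media"] and r["hours_per_week"] >= 10

interview_sample = [r["id"] for r in screening_responses if is_heavy_user(r)]
print(interview_sample)  # ['R01', 'R04']
```

The same pattern extends to any number of criteria, each defining a distinct sample group to be interviewed.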
This makes a huge difference to what you can find out, and how you approach the iterations of both data gathering and data analysis. An example of this can be seen in a diagram of data analysis architecture from my MA dissertation (2013). This demonstrates fairly clearly how planned stages of analysis then form iterations of findings, and establish areas for discussion. This approach can also help to highlight what is most effective in answering the research question, what might show up 'confounding variables', and what is simply ineffective or irrelevant.
It's worth noting that the quality of your analysis hinges on how robust your methods are. In practical terms this usually means two things: can the methods and results be replicated, and could the analysis be carried out by someone else (i.e. not you)? The rigour with which you apply these principles to your research is up to you, and is currently generating some debate amongst academics; see Allan Defoe's research paper (PDF) and Tom Pepinsky's blog post as examples, and it's worth further Googling if you're interested.
Included below are some useful PDF downloads on qualitative content analysis approaches, which are likely to be some of the most popular methods by which data will be analysed in a social sciences and education setting.
Further advice in the tabbed content sections which follow outlines a variety of issues and useful things to remember, lists a number of other useful resources, and clarifies some of the more difficult terminology used in research.
This webpage provides a good concise overview to give ideas on how to approach your research design and what you need to 'do'.
There are a variety of content analysis methodologies; see the PDF downloads in the main section above, and also refer to the following webpages for help:
Observation analysis is how we measure what we observe about our research sample or samples. We can do this in all kinds of ways and choices will be dependent on what your research question is and who your samples are. Here are some useful resources to get you started on how to use this method:
Categorisation can be used in all sorts of settings. In relation to interview key quotes, or categorising other types of transcript text (for example, what happens in a video), here's a useful page outlining 10 steps for content analysis, from Surrey University, which describes the categorisation process.
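As a minimal sketch of the counting stage of categorisation, once transcript segments have been coded into categories their frequencies can be tallied. The category labels below are invented purely for illustration:

```python
from collections import Counter

# Sketch: tally how often each category (code) appears across coded
# transcript segments. Labels are illustrative only.
coded_segments = [
    "cost", "trust", "cost", "usability", "trust", "cost",
]

frequencies = Counter(coded_segments)
for category, count in frequencies.most_common():
    print(category, count)
# cost 3
# trust 2
# usability 1
```

In practice the coding itself (deciding which category each segment belongs to) is the skilled, qualitative step; the tally simply makes the result measurable.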
Here's an example of a comparison table. These can be useful for all sorts of purposes and are especially useful in a findings section to illustrate or evidence a point, not just for raw data analysis.
Number of people in each sample who agreed or disagreed with the statement "I really love cheese":
| Number agreeing or disagreeing | Strongly Agree | Strongly Disagree | Neither agree nor disagree |
| --- | --- | --- | --- |

Table showing a grid of comparisons
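A comparison grid like this can also be tallied programmatically from raw responses. The sketch below uses invented sample groups and answers purely to show the counting step:

```python
from collections import defaultdict

# Sketch: count responses to a statement by sample group, producing a
# comparison grid (group x response). Data is invented for illustration.
raw_answers = [
    ("Sample A", "Strongly Agree"),
    ("Sample A", "Strongly Disagree"),
    ("Sample B", "Strongly Agree"),
    ("Sample B", "Strongly Agree"),
]

grid = defaultdict(lambda: defaultdict(int))
for sample, response in raw_answers:
    grid[sample][response] += 1

print(grid["Sample B"]["Strongly Agree"])  # 2
```

Each inner dictionary is one row of the grid; writing the rows out as a table gives exactly the kind of comparison shown above.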
Some other examples of tables of comparisons:
An Expert Review is a usability technique of looking at a lot of data offered by different comparable sources, and measuring differences, similarities, or the frequency of occurrence of the factors being measured. It uses expert knowledge to gauge the differences or to measure preferences or requirements. This technique has been adapted and used in a variety of other knowledge-provision fields.
Another technique that began in usability research is the heuristic evaluation: '...A group of experts is asked to assess a particular design using a given rubric (set of heuristics)'. This is extremely effective in gathering useful (measurable) feedback on learning design(s). The two links below provide further insight:
Ranking and/or scaling is a way of demonstrating strengths, weaknesses, frequencies, significance or other factors in relation to data. It involves developing strata for ranking your data, for example by order of time, location, number of occurrences, or age. Put simply, it is often at the core of any data analysis system.
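As a minimal sketch, ranking by frequency of occurrence might look like the following. The labels and counts are invented for illustration:

```python
# Sketch: rank categories by frequency of occurrence, descending.
# Labels and counts are illustrative only.
occurrences = {"location A": 4, "location B": 9, "location C": 1}

ranked = sorted(occurrences.items(), key=lambda kv: kv[1], reverse=True)
for rank, (label, count) in enumerate(ranked, start=1):
    print(rank, label, count)
# 1 location B 9
# 2 location A 4
# 3 location C 1
```

Swapping the sort key (time, age, location, and so on) gives a different stratification of the same data.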
Obtain a good audio recorder for any face-to-face interviews or focus group discussions - you can either hire one, or use your smartphone if it has good audio quality available. Set the recorder down close to the respondent(s), on a soft base (a cushion, or jumper) with the mic aimed towards the respondent(s). Do a test beforehand to make sure you are getting the recording! Also make sure that you have plenty of battery power, or are plugged into the mains if available. You may wish to edit the recording files later, in which case get Audacity (free) and use that - it's quite straightforward, and there are lots of tutorials online.
You need to précis the recordings. To précis an interview or discussion means to write a fairly full summary - not necessarily word for word, but certainly a written version of all pertinent information: who said it (respondent code), when it was said in relation to the timecode of the recording, the topic being discussed, and the traceable stage of the interview.
Pull out the key quotes of each interview/discussion - you may wish to use these in your findings, discussion and conclusions, and having them clearly available is a must. They may also form part of the material being categorised in your open-ended qualitative data. They are in the précis, but it's useful to have them separate too, as a short summary. You may wish to create measurable criteria for selecting key quotes (for example, words mentioned, frequency of occurrence of words or topics, or responses to specific questions). Selecting key quotes is a skill, so try it out with other text to see what you end up with. Social media discussions are a great source for practising this!
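One way to make quote selection measurable is a simple keyword criterion. The sketch below is illustrative only, with invented keywords and quotes:

```python
# Sketch: select 'key quotes' by a measurable criterion -- here, the
# presence of topic keywords. Keywords and quotes are invented examples.
keywords = {"cheese", "diet"}

quotes = [
    ("Respondent A", "I never liked cheese as a child"),
    ("Respondent B", "The weather was lovely"),
    ("Respondent C", "Cheese is not part of our diet"),
]

key_quotes = [
    (who, text) for who, text in quotes
    if keywords & {word.strip(".,").lower() for word in text.split()}
]
print([who for who, _ in key_quotes])  # ['Respondent A', 'Respondent C']
```

A criterion like this won't replace your judgement, but it does make the selection replicable by someone else, which matters for the robustness points discussed above.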
It is important to NEVER reveal the name of the respondents. If you need to refer to a specific quote and it is useful to state 'who' said it, always use their code. For example:
'Respondent A stated "I never liked cheese as it was never given to us as children". Respondent A grew up in Peru, where cheese is not part of the staple diet, and this therefore may have a bearing on the validity of the statement in the context of the research..."
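A simple sketch of assigning respondent codes, keeping the name-to-code lookup separate from the analysed data (the names here are fictitious):

```python
# Sketch: map real names to anonymous respondent codes. The lookup table
# should be stored securely and kept apart from the analysis files.
participants = ["Ana P.", "Ben Q.", "Cara R."]

codes = {name: f"Respondent {chr(ord('A') + i)}"
         for i, name in enumerate(participants)}

print(codes["Cara R."])  # Respondent C
```

Only the codes then appear in your précis, tables and quoted material; the lookup exists solely so you can trace a response back if you need to.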
Sometimes use of actual quotes can work well in a comparison table setting, to help your reader compare responses or frame their understanding better. It's also useful to refer to which set of interviews or questionnaires the quotes are sourced from, or which sample group, to provide background.
More information on analysing data from transcripts is in the Analysis Techniques/Categorisation of Content section on this page.
Question order can be extremely important in obtaining accurate responses, both from individual respondents and from distinct sample groups. Ordering according to the logic of responses is perhaps the more obvious concern (response path guidance), but you should also allow for question fatigue, or possible negative reactions early on which might prevent further responses from being accurate, or from being given at all. Demographic questions of a more personal nature (e.g. age, address) are often left until the end for this reason. A common way of guarding against question fatigue is to ask the most important questions in a random order to each respondent. This is done with a robust method, not just a 'shuffle and see' approach.
Randomisation, when done in a controlled manner, is a way of removing 'confounding variables', i.e. factors which might be influencing your data, and that you either may not be able to control, or had not thought to remove. Confounding variables, in the case of questionnaire fatigue, might be the order of questions. To control this, you would create a randomisation table, thereby controlling the order that questions were presented, but randomly associating that with any individual respondent. Here's an example:
| Respondent Number | First Question | Second Question | Third Question |
| --- | --- | --- | --- |

Table showing the fixed random order method
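A fixed randomisation table like this could be generated rather than shuffled by hand. The sketch below uses Python's `random.sample` with a fixed seed, so the table itself is controlled and reproducible:

```python
import random

# Sketch: build a fixed randomisation table assigning each respondent a
# controlled random order of the three core questions.
questions = ["Q1", "Q2", "Q3"]
random.seed(42)  # fixed seed: the same table is produced every run

table = {f"Respondent {n}": random.sample(questions, k=len(questions))
         for n in range(1, 5)}

for respondent, order in table.items():
    print(respondent, order)
```

Each respondent gets a full permutation of the questions, and because the table is generated once and recorded, the order presented to any individual respondent is random but controlled.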
It's important to assign codes to your respondents. This is for a variety of reasons:
Response data is the absolute crux of your research. It is the primary data which will be used as evidence to challenge or confirm any hypothesis or points in your findings and discussion. Response data can be grouped in as many ways as you can think of to achieve this, and may even throw up ways of interpretation you hadn't thought of once you see the results. You may iterate your analysis methods and document your iterations; this is all useful in a research project, as it sheds light on what is going on.
Respondent data is something different from response data: it allows us to see who is responding, and how they respond to different sets of questions. This can tell us different things from looking at groups of response data alone. By building up the context and background of respondents (in whichever way is relevant to your project), more can be understood in relation to responses and their possible implications and conclusions.
SPSS (Statistical Package for the Social Sciences) is available for Windows and Mac OS. It is the most widely used software in academia for analysing sometimes very complex sets of data. It is usually available free from your institution, so ask your IT department.
It can seem hard to use, but it is recommended to become at least familiar with the basics if you intend to go on with your research in your academic career. There are an enormous number of tutorials and guides available online to help you get started, as well as a variety of useful books. The PDF download in this section is very comprehensive and useful. Here are some other resources to get you started:
NVivo (v10) is a platform for 'analysing all forms of unstructured data' and is widely used in the academic community. It has the ability to search, query or visualise data, and offers a 30-day free trial download for Windows or Mac. It can be purchased at student rates (a 6-month Windows licence is £55, a 12-month Mac licence £49); other rates are also available.
There are comprehensive tutorials and guides on the NVivo website (link below).
Microsoft Excel may be the easiest way for you to analyse data, and with some Googling for tutorials, you will be able to create great visualisations as well as do good analysis. Here are a few to get you started:
Online tools such as SurveyMonkey, Typeform or Wufoo Forms will all provide you with downloadable xls files (Excel files) that you can load into Excel. This cuts out the need to enter your responses first, and saves time. SurveyMonkey also provides its own visualisations of responses, so it can be doubly useful. Typeform and Wufoo provide excellent form interfaces which enhance your respondents' experience when they take part in your research.