Research Methods and Analysing Data



Introduction

My research question is “What is the best technique for extracting structured data from recipe websites?”

To answer this question, I will use a quantitative approach to data gathering and analysis. A quantitative research method is one that “deal[s] with numbers” (Urban and Eeden-Moorefield, 2018) rather than with thoughts and preferences. This definition fits the analysis that I will be performing.

Analysing the Data

To answer my research question, I will take a sample of recipe sites and then analyse sample recipes from each site. This will enable me to determine the best way of extracting structured recipe data from a website using web scraping.

This raises the question of how many recipes and recipe sites I need to analyse. There are 14 different allergen groups (UK Food Standards Agency, 2023), so as a minimum I will analyse 14 different recipes from each site, one per allergen group. This minimum, however, would not give very representative figures, so I intend to analyse 5 recipes per allergen per site. This means analysing 70 recipes per site (5*14). I aim to analyse data from 4 sites, making a total of 280 recipes (5*14*4). The recipe sites I aim to analyse are:

  • BBC Food Recipes
  • All Recipes
  • Sainsbury’s Recipes
  • Delicious Magazine
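
The sample-size arithmetic above can be checked with a short Python sketch. The constant and variable names are illustrative only; the figures are the ones given in the text.

```python
# Figures taken from the sampling plan described above.
ALLERGEN_GROUPS = 14
RECIPES_PER_ALLERGEN = 5
SITES = [
    "BBC Food Recipes",
    "All Recipes",
    "Sainsbury's Recipes",
    "Delicious Magazine",
]

# Minimum plan: one recipe per allergen group per site.
minimum_per_site = ALLERGEN_GROUPS

# Intended plan: five recipes per allergen group per site.
recipes_per_site = RECIPES_PER_ALLERGEN * ALLERGEN_GROUPS
total_recipes = recipes_per_site * len(SITES)

print(f"Minimum per site: {minimum_per_site}")
print(f"Intended per site: {recipes_per_site}")
print(f"Total across {len(SITES)} sites: {total_recipes}")
```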

One of the first stages of analysis will be to identify the recipes that I can use. If any site has too few suitable recipes, I will investigate further sites.

I will use several techniques for analysing data on these recipe pages, namely:

  • Regular expressions (Python Regular Expressions) – a technique for finding patterns in strings
  • DOM querying (MozDevNet) – this technique allows any node within the Document Object Model (DOM) to be queried, along with its immediate neighbours, their neighbours, and so on
  • HTML querying – this technique is similar to DOM querying, but searches for HTML elements within a string rather than in a DOM tree
  • XPath (MozDevNet, Xpath) – XPath is similar to DOM querying but was designed for XML; it can also be used for querying HTML
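
To make the differences between these four techniques concrete, here is a minimal sketch that pulls the same ingredient list out of one small fragment in four ways. It rests on several assumptions: the fragment is hypothetical and well-formed (real recipe pages are larger and often are not valid XML), only the Python standard library is used, and the "HTML querying" function is one possible reading of that technique, namely scanning tags in a string without building a tree. The function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.dom import minidom

# Hypothetical, well-formed recipe fragment (an assumption for illustration).
SAMPLE = (
    '<div class="recipe">'
    "<h1>Peanut Noodles</h1>"
    '<ul class="ingredients">'
    "<li>200g noodles</li>"
    "<li>2 tbsp peanut butter</li>"
    "</ul>"
    "</div>"
)


def by_regex(markup):
    """Regular expression: match the text between <li> and </li> in the raw string."""
    return re.findall(r"<li>(.*?)</li>", markup)


def by_dom(markup):
    """DOM querying: build a node tree, then move from the list node to its items."""
    document = minidom.parseString(markup)
    ingredient_list = document.getElementsByTagName("ul")[0]
    return [li.firstChild.data for li in ingredient_list.getElementsByTagName("li")]


class ListItemCollector(HTMLParser):
    """HTML querying: scan the string tag by tag without building a tree."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._inside_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside_li = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current item.
        if self._inside_li:
            self.items[-1] += data


def by_html(markup):
    collector = ListItemCollector()
    collector.feed(markup)
    collector.close()
    return collector.items


def by_xpath(markup):
    """XPath: select nodes with a path expression (ElementTree supports a subset)."""
    root = ET.fromstring(markup)
    return [li.text for li in root.findall(".//ul[@class='ingredients']/li")]


for extract in (by_regex, by_dom, by_html, by_xpath):
    print(extract.__name__, extract(SAMPLE))
```

On this fragment all four functions return the same two ingredients. The interesting differences only show up on real pages, where the regular expression is the most sensitive to small markup changes, and the DOM and XPath versions as written here require input that parses as XML.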

For each recipe, I will determine the best technique and whether this technique can be used for other recipes on the same site with minor (or preferably no) modifications. Where more than one technique is appropriate, the best will be the one that is the most accurate, the easiest to use and applicable to the greatest number of recipes.

For example, if I find that using a regular expression works on every recipe, but the regular expression needs to be customised for each recipe, this will not rank as highly as a DOM query that works on every recipe without being modified.
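
The ranking in that example could be expressed as a simple sort. This is only a sketch of one possible scoring rule, not a fixed part of the method: the `TechniqueResult` class, the `rank` function and the counts below are all hypothetical, chosen to mirror the example for a single site of 70 recipes. Ease of use is not modelled here because it is harder to reduce to a count.

```python
from dataclasses import dataclass


@dataclass
class TechniqueResult:
    name: str
    recipes_extracted: int   # recipes where extraction succeeded
    recipes_unmodified: int  # of those, recipes needing no per-recipe changes


def rank(results):
    """Most recipes handled without modification first, then most extracted."""
    return sorted(
        results,
        key=lambda r: (r.recipes_unmodified, r.recipes_extracted),
        reverse=True,
    )


# Hypothetical counts for illustration only; these are not measured results.
results = [
    TechniqueResult("regular expression", recipes_extracted=70, recipes_unmodified=0),
    TechniqueResult("DOM query", recipes_extracted=70, recipes_unmodified=70),
]

ranked = rank(results)
for position, result in enumerate(ranked, start=1):
    print(position, result.name)
```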



References

Urban, J.B. and Eeden-Moorefield, V.B. (2018) ‘4’, in Designing and proposing your research project. Washington, DC: American Psychological Association. 

UK Food Standards Agency (2023) Allergen guidance for food businesses. Available at: https://www.food.gov.uk/business-guidance/allergen-guidance-for-food-businesses#allergens (Accessed: 17 August 2023). 

MozDevNet (no date) Introduction to the DOM - web apis: MDN, Web APIs | MDN. Available at: https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction (Accessed: 18 September 2023). 

W3Schools.com (no date) Python Regular Expressions, Python regex. Available at: https://www.w3schools.com/python/python_regex.asp (Accessed: 17 September 2023). 

MozDevNet (no date) Xpath, MDN. Available at: https://developer.mozilla.org/en-US/docs/Web/XPath (Accessed: 17 August 2023). 

Credits

Photo by Stephen Dawson on Unsplash
