How Web Scraping of Zomato Can Be Done with the BeautifulSoup Library in Python

Introduction

Web scraping, also known as data scraping, is a form of data extraction used to gather information from websites. Web scraping software accesses these websites through a web browser or directly over HTTP. A user can perform web scraping manually, but the term generally refers to automated procedures carried out by bots or web crawlers. In this process, specific data is copied from websites and stored in a local database or spreadsheet so it can be retrieved later.

Here, we will scrape Zomato to gather information on the best restaurants in Bengaluru, India, accessing and reading the information from the site's HTML pages.

Scraping the Website Content


When a web address is typed into a browser, an HTTP request is made to visit the web page. If the request completes successfully, the browser displays the page; otherwise, it shows an error. The same kind of request is made to access a Zomato web page.

Python provides tools that help us access a web page programmatically. We begin by importing two libraries:

import requests
from bs4 import BeautifulSoup

Let us understand what these libraries and their functions do before using them to access a web page.

Making a Request


Requests is built to be human-friendly. It eliminates the need to add query strings to URLs manually or to form-encode POST data by hand. Requests lets you send HTTP/1.1 requests from Python, attaching material such as headers, multipart files, form data, and query parameters through simple Python dictionaries, and it exposes the response data just as simply.
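
For example, here is a minimal sketch of how Requests encodes query parameters and headers for you (the URL and parameters are placeholders for illustration, not a real endpoint):

import requests

# Placeholder URL; Requests builds and encodes ?q=pizza&city=bengaluru automatically
response = requests.get(
    "https://example.com/search",
    params={"q": "pizza", "city": "bengaluru"},
    headers={"Accept": "text/html"},
)
print(response.url)          # the fully encoded URL Requests built
print(response.status_code)  # 200 indicates success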

BeautifulSoup (BS4)


BeautifulSoup4 is a Python package for extracting data from HTML and XML files. It integrates with your preferred parser to offer navigation, search, and modification of the parse tree, commonly saving programmers hours or even days of effort.
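
As a tiny self-contained sketch (the HTML fragment here is made up for illustration), this is how BeautifulSoup parses markup and finds a tag:

from bs4 import BeautifulSoup

html = "<div class='title'>Truffles</div>"  # a made-up HTML fragment
soup = BeautifulSoup(html, "html.parser")
print(soup.find("div", attrs={"class": "title"}).text)  # prints: Truffles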

Now that we know the tools, let us access the Zomato web page.
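
A sketch of this step is shown below. The collection URL is illustrative, since Zomato's actual page addresses and markup change over time, and the site may reject requests that lack a browser-like User-Agent header:

import requests
from bs4 import BeautifulSoup

# Illustrative URL of a Zomato "best restaurants" collection page for Bengaluru
url = "https://www.zomato.com/bangalore/top-restaurants"
headers = {"User-Agent": "Mozilla/5.0"}  # sent so the site does not reject the request

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")  # parsed HTML of the page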

The HTML of Zomato's page listing the best restaurants is now held in the soup variable. However, in this raw form it is readable only to programmers, so let us see how to put the scraped data to use.

Here, we are looking for each restaurant's name, address, and cuisine category. To extract these attributes, we first need to locate the HTML elements that contain the data.

By looking at the BeautifulSoup content above, or by using Inspect Element in a web browser such as Chrome, we can check which tag holds the collection of best restaurants, as well as which additional tags carry more information.

top_rest = soup.find_all("div", attrs={"class": "bb0 collections-grid col-l-16"})  # the grid holding the collection
list_tr = top_rest[0].find_all("div", attrs={"class": "col-s-8 col-l-1by3"})  # one div per restaurant card


The first line of the preceding code finds the div with class "bb0 collections-grid col-l-16", which holds the grid of collected restaurants, and the second finds every div with class "col-s-8 col-l-1by3" inside it, one per restaurant. We then loop over these list items, handling one restaurant's information at a time, to extract the additional details.
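
Before running that loop, a quick, purely illustrative sanity check confirms that the selectors matched anything at all; if either count is zero, the class names on the live page have probably changed:

print(len(top_rest))  # non-zero if the collection grid was found
print(len(list_tr))   # number of restaurant cards found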

list_rest = []
for tr in list_tr:
    dataframe = {}
    # Each detail lives in its own div, identified by class; newlines are replaced with spaces
    dataframe["rest_name"] = tr.find("div", attrs={"class": "res_title zblack bold nowrap"}).text.replace('\n', ' ')
    dataframe["rest_address"] = tr.find("div", attrs={"class": "nowrap grey-text fontsize5 ttupper"}).text.replace('\n', ' ')
    dataframe["cuisine_type"] = tr.find("div", attrs={"class": "nowrap grey-text"}).text.replace('\n', ' ')
    list_rest.append(dataframe)  # one dictionary per restaurant
list_rest

The tr variable in the preceding code holds various details about each restaurant, such as its name, cuisine, address, prices, reviews, and menu. Each piece of information sits in its own tag, which can be identified by inspecting the HTML of each item.
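
One illustrative way to identify those tags is to print the HTML of a single card and read the class names off it:

# Print the first restaurant card's HTML (truncated) to see which tag holds which detail
print(list_tr[0].prettify()[:500])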

Before looking for tags in the HTML, we should take a look at how the restaurant's menu appears on the website.


As you can see in the images above, the data we want to scrape appears in several formats. Returning to the HTML content, we find that the data is kept inside div tags whose classes define the formats and fonts used.

For each restaurant, we build a dictionary to collect the necessary information, going through the details one by one and saving each under its own column name. Because the HTML text contains newline characters ('\n'), which do not store cleanly in a DataFrame, we use a string function to replace each '\n' with a space and avoid spacing issues.
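
One caveat worth noting: tr.find() returns None when a tag is missing, and the .text call above would then fail. A small helper function (hypothetical, not part of the original code) makes the extraction safer:

def safe_text(tag):
    # Return the tag's text with newlines replaced by spaces, or '' if the tag is missing
    return tag.text.replace('\n', ' ').strip() if tag else ''

Each line of the loop could then read, for example, dataframe["rest_name"] = safe_text(tr.find("div", attrs={"class": "res_title zblack bold nowrap"})).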

Running the extraction loop produces a list of dictionaries, one per restaurant, each holding the name, address, and cuisine type.

Saving Data in a Readable Format

Suppose you need to deliver this data to someone who is not familiar with Python; they would not be able to read it in its current form. To make it accessible, we will save the DataFrame in a readable format such as CSV.

import pandas
df = pandas.DataFrame(list_rest)  # convert the list of dictionaries into a DataFrame
df.to_csv("zomato_res.csv", index=False)  # write to CSV without the index column

The code above will generate the zomato_res.csv file.
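
As a quick, optional check, the file can be read back with pandas to confirm it was written correctly:

df_check = pandas.read_csv("zomato_res.csv")  # load the saved file back
print(df_check.head())  # show the first few rows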

Conclusion

In this blog, we learned to use Requests to access a web page from Python and BeautifulSoup4 to extract data from the HTML content. The data was then arranged in a DataFrame and saved in CSV format.

Looking for a web scraping service to scrape Zomato data? Contact Web Screen Scraping now and request a quote!

