how-to-scrape-linkedin-job-posting-data-for-competitive-analysis

LinkedIn is a beneficial social network for job hunters and for everyone who wants to expand their contacts and connections. It has millions of people and, therefore, stands as one of the best platforms for looking for job vacancies. However, finding job posts by browsing the internet is a time-consuming process. This guide will help you understand how to collect LinkedIn job postings so you can easily view all the jobs and search for specific ones.

What is LinkedIn Data Scraping?

LinkedIn data scraping is the act of extracting information from LinkedIn profiles, jobs, and other data accessible on the LinkedIn platform through the use of automated tools or scripts. Scraping tools can acquire this information en masse, unlike traditional data mining, where one has to search for specific information such as job titles, company names, or contact information on LinkedIn. This method is commonly employed in creating databases for employment, advertising, or client procurement; it helps evaluate trends or in searching for better candidates.

Nevertheless, data scraping is not allowed on LinkedIn. This is against the platform's terms of service, as it may lead to invasion of privacy and misuse of users' information. The legal consequences of scraping data from LinkedIn include an account suspension or even a lawsuit against the companies or individuals involved. So, we must work with LinkedIn's official tools, such as API or paid services, to obtain the data legally.

What is the Need to Scrape LinkedIn Job Postings?

Using LinkedIn job postings can be beneficial for job seekers and companies. Here's a detailed and easy-to-understand explanation of why it's useful:

Saves Time

On this site, you can spend a lot of time, especially if you are searching for a job and going through the hundreds of job offers. Scraping tools can help by pulling in job postings quickly and in large numbers. A simple click-through method with one page is designed to make the whole process faster and easier to find the jobs of your dreams.

Gathers Critical Job Information

Job seekers who wish to get more details about the job, such as the job title, the company that provides the job, the job location, and perhaps even the job description or salary. This is a gathering job by scraping LinkedIn job postings. This information is then captured and arranged by computers in an auto-case scenario. Consequently, you do not need to waste your time copying and pasting that information separately. Using this collection mechanism, you obtain the information you can easily compare to find the most appropriate option.

Comprehend Job Market Trends

Job market behaviors, however, can be better understood by capturing large amounts of job postings. You can realize, for instance, that some skills are better in demand or that specific job titles are being frequently published if you scrape many job postings. This may be helpful for job seekers who need to determine which skills they should master. Companies can also utilize the information to revise the hiring process to stay in the market.

Competitor Analysis

Scraping job postings from the internet is an insult to any given company. It can be used as a tracking tool against the other companies that are the company's competitors. If a competitor aggressively hires these roles, they could probably be growing or introducing new products. This will give them a significant edge and help them plan their strategies more efficiently.

Improvement in the hiring process

Employers and recruiters can use scraping to track job postings in their industry and build a pool of potential candidates. Scraping then becomes an instrument of connecting recruiters with high-demand jobs, and consequently, they visit the right places to conduct their recruitment activities.

Be Informed About Job Updates

LinkedIn's regularly updated job postings automate the process. Scraping tools are used to stay ahead of the curve and avoid missing out on possible job openings. This is especially useful if you're actively looking for a job and want to apply when a new position becomes available.

However, it's important to remember that LinkedIn has rules against scraping without permission. If done improperly, scraping could lead to your account being suspended or even legal action. To avoid this, using LinkedIn's official tools or getting permission if needed is best.

Challenges of Scraping Job Postings from LinkedIn

Some difficulties arise when scraping data from LinkedIn. Here are some of the main challenges:

Legal and Ethical Issues

LinkedIn's terms of service prohibit unauthorized scraping. If caught, people or companies can have their accounts suspended, arrested, or fined. Unauthorized scraping can also raise ethical problems, especially regarding the privacy of LinkedIn users whose data is being extracted without permission.

Technical Barriers

LinkedIn has implemented measures to combat scraping, including CAPTCHA tests, IP locking, and rate limiting. These barriers can affect scraping tools' reliability and sustainability because they may be locked after sending many requests in a certain amount of time.

Data Accuracy

Scraping tools may not always get data properly because LinkedIn changes often with time or design. Since LinkedIn can change the structure of its website periodically or the format of its job postings, web scraping can sometimes retrieve wrong or limited information.

Data Volume and Management

It is easy to scrape large quantities of information from LinkedIn, creating vast volumes of information that must be addressed. Dealing with this data can be a process, and putting in proper systems to sort, scrub, and structure it is crucial. But, it becomes difficult to manage if the right tools are not in place.

Account Risk

If you use scraping tools with your LinkedIn account, there are some dangers, such as getting banned by LinkedIn's system. If your account is affiliated with such activities, then you will be limited from using the platform for network connections or seeking a job.

Limited Access to Some Data

Some data could not be scraped from LinkedIn; individual profiles are set to private or information behind subscription paywalls (LinkedIn Premium). This restricts the quantity of information accessible for scraping and, therefore, the broader collections of data that can be accrued.

How to Scrape Job Postings from LinkedIn

Install the below-mentioned libraries

pip install requests
pip install beautifulsoup4

Determine how LinkedIn job search works.

2_Determine-how-LinkedIn-job-search-works

For a Python job in the Las Vegas, visit this page. Let's have a look at the page's URL, which is as follows: https://www.linkedin.com/jobs/search?keywords=Python (Programming Language)&location=Las Vegas, Nevada, United States&geoId=100293800¤tJobId=3415227738&position=1&pageNum=0

Let me explain it to you.

  • keywords: Python (Programming Language)
  • location: Las Vegas, Nevada, United States
  • geoId: 100293800
  • currentJobId: 3415227738
  • position: 1
  • pageNum: 0

There are 118 jobs total on this page, but the pageNum remains unchanged as I navigate to the next page (this page has unlimited scrolling). The challenge then becomes, how can we scrape every job?

A Selenium web driver can be used to fix the problem mentioned above. To extract every page, we can utilize the.execute_script() method to scroll down the page.

The second issue is figuring out how to retrieve data highlighted on the right side of the page. Additional information about each chosen job, such as pay and duration, will be shown in the box.

We can use the Selenium.click() function. Using that logic, you must use a for loop to review each job displayed and click on each one to get information in the corresponding box.

Although this approach is accurate, it takes too much time. Clicking and scrolling will strain our hardware, preventing large-scale scraping.

Discovering the answer into devtool

1_Determine-how-LinkedIn-job-search-works

Let's launch our development tool and reload our target page. Let's examine the contents of our network tab.

As we know, LinkedIn loads the second page by using infinite scrolling. By scrolling down to the second, let's see if anything appears in our network tab.

2_Discovering-the-answer-into-devtool

We already know that LinkedIn loads the second page through infinite scrolling. Let's see if anything appears in our network tab by scrolling down to the second.

3_Discovering-the-answer-into-devtool

Let's launch our browser and open this URL.

4_Discovering-the-answer-into-devtool

We can now conclude that LinkedIn will send a GET request to the aforementioned URL to load all of the jobs that are posted each time you scroll and the site loads a new page.

To further understand how the URL functions, let's dissect it.

Alright, we now have a way to obtain all of the listed.

https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Python (Programming Language)&location=Las Vegas, Nevada, United States&geoId=100293800¤tJobId=3415227738&position=1&pageNum=0&start=25

  • keywords: Python (Programming Language)
  • location: Las Vegas, Nevada, United States
  • geoId: 100293800
  • currentJobId: 3415227738
  • position: 1
  • pageNum: 0
  • start: 25

The start parameter is the only one that varies in every page. The start value will change to 50 when you reach the 3rd page. For each additional page, the initial value will rise by 25. Another thing you'll notice is that the last job will get buried if you increase the value of start by 1.

Indeed, we now understand how to get any of the listed jobs. The information that appears when you click on any job on the right—what about it? How do I get that?

5_Discovering-the-answer-into-devtool

Every time a job is clicked, LinkedIn sends a request- GET to this URL. However, the URL contains an excessive amount of noise. This is how the URL will appear in its most basic form: https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/3415227738 \

The current JobId, which appears in the li tag of every job posted on the platform, is 3415227738.

We now know how to go around Selenium and improve the scalability and dependability of our scraper. Now, we can use the requests library to extract all of this data with a straightforward GET request.

What are we Aiming to scrape?

Selecting the precise information you wish to extract from a website in advance is always preferable. Three items will be scraped for this tutorial.

  • Name of the company
  • Job position
  • Seniority Level
What-are-we-Aiming-to-scrape

We will scrape all of the jobs using BeautifulSoup's.find_all() method. Next, we will extract each job's jobid. We will then extract job data from this API.

Scraping Jobs IDs from Linkedin

First, let's import every library.

import requests
from bs4 import BeautifulSoup

This page lists 117 Python jobs available in Las Vegas.

2_Scraping-Jobs-IDs-from-Linkedin

Since each page displays 25 jobs, here explains how our approach will help us scrape every position.

117 divided by 25 We will use the math.ceil() method to the value if it is a float number or whole number.

import requests
from bs4 import BeautifulSoup
import math


target_url='https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Python%20%28Programming%20Language%29&location=Las%20Vegas%2C%20Nevada%2C%20United%20States&geoId=100293800¤tJobId=3415227738&start={}'
number_of_loops=math.ceil(117/25)

Let's look at where job IDs are located in DOM.

4_Scraping-Jobs-IDs-from-Linkedin

Under the div tag, next to the class base-card, is the ID. You need to find the data-entity-urn attribute inside this element in order to get the ID.

We need to use nested for loops to get the Job Ids for every job. The first loop will change the page, while the second loop will go over each job on each page.

target_url='https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Python%20%28Programming%20Language%29&location=Las%20Vegas%2C%20Nevada%2C%20United%20States&geoId=100293800¤tJobId=3415227738&start={}'
for i in range(0,math.ceil(117/25)):

    res = requests.get(target_url.format(i))
    soup=BeautifulSoup(res.text,'html.parser')
    alljobs_on_this_page=soup.find_all("li")

    for x in range(0,len(alljobs_on_this_page)):
        jobid = alljobs_on_this_page[x].find("div",{"class":"base-card"}).get('data-entity-urn').split(":")[3]
        l.append(jobid)

This is a detailed breakdown of the code mentioned above.

We have specified a target URL that contains jobs.

After that, a for loop is run through to the last page.

We then sent a GET request to the page.

We are using BS4 to generate a parse tree constructor.

Because all jobs are contained within li tags, we can use the.find_all() method to find every li tag.

We then started a fresh loop that would keep going until the finished product showed up on any page.

The location of the job ID is being ascertained.

Each and every ID has been added to an array.

All of the ids for every location will eventually be in array l.

Scraping Job Details

Let's locate where the Firm name appears in the DOM.

6_Scraping-Jobs-IDs-from-Linkedin

The value of the alt tag, which is located inside the div tag with class top-card-layout__card, contains the firm name.

7_Scraping-Jobs-IDs-from-Linkedin

The div tag with class top-card-layout__entity-info contains the job title. The text is included within this div element's first a tag.

8_Scraping-Jobs-IDs-from-Linkedin

The first li tag of the ul tag with class description__job-criteria-list contains the seniority level.

A GET request will now be sent to the specific URL of the job page. The information we are trying to pull from Linkedin will be available on this page. We will look for these specific items inside BS4 using the DOM element positions listed above.

target_url='https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/{}'
for j in range(0,len(l)):

    resp = requests.get(target_url.format(l[j]))
    soup=BeautifulSoup(resp.text,'html.parser')

    try:
        o["company"]=soup.find("div",{"class":"top-card-layout__card"}).find("a").find("img").get('alt')
    except:
        o["company"]=None

    try:
        o["job-title"]=soup.find("div",{"class":"top-card-layout__entity-info"}).find("a").text.strip()
    except:
        o["job-title"]=None

    try:
        o["level"]=soup.find("ul",{"class":"description__job-criteria-list"}).find("li").text.replace("Seniority level","").strip()
    except:
        o["level"]=None



    k.append(o)
    o={}

print(k)

We have established a URL that has the specific Linkedin job URL for each organization.

A for loop will be used to determine how many IDs are in the array l.

We then submitted a GET request to the LinkedIn page.

again produced a BS4 parse tree.

Then, we are using try/except statements to extract all of the data.

Array k now contains object o.

To store data from a different URL, declare the object empty.

The array k is finally printed.

After printing, this is the result.

10_Scraping-Jobs-IDs-from-Linkedin

Saving the data to a CSV file

The pandas library will be used for this process. It only takes two lines of code to save our array to a CSV file.

Steps to install

pip install pandas

Import this library in our main Python file.

import pandas as pd

We will now transform our list to a column and row structure using the DataFrame technique. Next, we will transform a DataFrame to a CSV file using the.to_csv() method.

df = pd.DataFrame(k)
df.to_csv('linkedinjobs.csv', index=False, encoding='utf-8')

You can add these two lines when your list is complete with all the data. Following the completion of the application, a CSV file named linkedinjobs.csv will be placed in your root folder.

4_Steps-to-install

Complete Code

import requests
from bs4 import BeautifulSoup
import math
import pandas as pd
l=[]
o={}
k=[]
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}
target_url='https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Python%20%28Programming%20Language%29&location=Las%20Vegas%2C%20Nevada%2C%20United%20States&geoId=100293800¤tJobId=3415227738&start={}'
for i in range(0,math.ceil(117/25)):

    res = requests.get(target_url.format(i))
    soup=BeautifulSoup(res.text,'html.parser')
    alljobs_on_this_page=soup.find_all("li")
    print(len(alljobs_on_this_page))
    for x in range(0,len(alljobs_on_this_page)):
        jobid = alljobs_on_this_page[x].find("div",{"class":"base-card"}).get('data-entity-urn').split(":")[3]
        l.append(jobid)

target_url='https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/{}'
for j in range(0,len(l)):

    resp = requests.get(target_url.format(l[j]))
    soup=BeautifulSoup(resp.text,'html.parser')

    try:
        o["company"]=soup.find("div",{"class":"top-card-layout__card"}).find("a").find("img").get('alt')
    except:
        o["company"]=None

    try:
        o["job-title"]=soup.find("div",{"class":"top-card-layout__entity-info"}).find("a").text.strip()
    except:
        o["job-title"]=None

    try:
        o["level"]=soup.find("ul",{"class":"description__job-criteria-list"}).find("li").text.replace("Seniority level","").strip()
    except:
        o["level"]=None



    k.append(o)
    o={}

df = pd.DataFrame(k)
df.to_csv('linkedinjobs.csv', index=False, encoding='utf-8')
print(k)

Conclusion

LinkedIn job listings are immensely valuable for not only job seekers but for recruitment professionals and can dramatically transform the whole employment seeking and selection process. There exist methods for the collection of job information that should be used and an understanding of the benefits of doing so should be gained. Regardless of the programming languages , Python or tools used by ReviewGators is to provide efficient data extraction of the details like job title, company name, location etc.

Once you’re done scraping the data, it can be employed in job search as well as industry analysis. This lets you predict what fields are popular, that is, which skills are sought after, the typical salaries, and employers offering jobs. Having this data in your possession provides you with a competitive advantage in landing jobs, career transitions, or hiring the best candidates. It enables you to defend yourself and make a wise decision in the current ever changing job marketplace.


Post Comments

Get A Quote