Google Answer Box Strategy

Jul 23, 2020
10 min read

Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper

All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, the content marketing industry is expected to grow by an incremental $269.24 billion.

Typically, in companies large and small, there is a challenge around ‘what content to produce.’ In large corporations, this often comes down to who is paying for the content, and the content takes a slant towards the specific business unit funding the content initiative. Smaller firms are typically stuck thinking about what material to create.

In early 2020, Ahrefs, an SEO tool provider, analyzed over 1 billion pages on the web and concluded that 90.63% of those pages get 0 traffic from Google, and 5.29% get less than ten visits a month. Here is a link to their complete findings. A visual depiction sourced from Ahrefs is provided below.

So for any business, the challenge to produce engaging content that ranks well on Google and gathers a respectable amount of traffic is quite significant.

It would be a mistake to think that a single strategy would be able to solve this challenge. Depending on your business, your USP (Unique Selling Proposition), your competition, and many other factors, a unique strategy would be required to produce content that engages prospects and customers.

One strategy that brands can look to is the PAA (People Also Ask) element on a Google Search Page. As an example, I have used ‘red shoes’ as a seed term, which Google chose to associate with the 1948 movie ‘The Red Shoes.’

What is PAA?

PAA is a Google Answer Box that you may have come across when conducting a Google search.

Here is a screenshot of the Google Answer Box.

Typically, the PAA Answer Box contains four questions, with the critical parts of each answer pulled in from the source website for the user.

And if a user clicks on a question, the answer, along with other related questions that Google thinks are relevant, surfaces to the fore, as shown in the screenshot above.

If you believe that Google has a sufficiently large data set of what people ask for, then the questions associated with your target search keyword could be an excellent starting point for content production.

Details on how to scrape these questions for a keyword using a Python script are provided below. A word of caution – scraping Google results is against Google’s Terms of Service, so execute this script at your own risk. Here is a link to Google’s Terms of Service and the specific section around scraping of their results.

From an ROI perspective, we should also consider how prevalent these PAA question-and-answer boxes are. To get a sense of that, we can refer to MozCast, which tracks the Google results for a set of 1,000 keywords every 24 hours to show how the search results page changes.

The graph below from MozCast clearly shows how prevalent the PAA Q&A box is on the Google search results page. Although we only have a 30-day view, this high incidence of the PAA has been common for some time now.

So now that we have established the powerful presence of PAA and Google’s vast data set of questions that users are interested in, it is time to move to the next stage: harnessing these questions for content creation for your website. The steps below describe how you can use Python to do this for yourself.

At the very end, we have bonus resources for you too.

Setting Up the Python Script to Scrape the Google PAA Answer Box (thanks to a freelancer who assisted in creating this script)

Step 1

Import the required libraries.

# import block
import requests                  # HTTP for humans
from bs4 import BeautifulSoup    # HTML parsing
import pandas as pd              # CSV reading and dataframes
import time                      # polite delays between requests

These libraries form the foundation of the rest of the program.

Step 2

Create a csv file that has all the words for which you wish to scrape the questions in the PAA Google Answer Box. In my case, I named the csv file paa-searcha.csv, and the contents of the file are a set of keywords in a single column with the header “query” (which the script reads).
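If you prefer to generate the seed file programmatically rather than by hand, a minimal sketch looks like this (the keywords here are just illustrative examples; the column header must be "query" because the script reads that column):

```python
import pandas as pd

# Example seed keywords - replace with your own target terms.
seed_keywords = ["red shoes", "blue suede shoes", "running shoes"]

# Write a one-column csv with the header "query", which the script expects.
pd.DataFrame({"query": seed_keywords}).to_csv("paa-searcha.csv", index=False)
```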

Step 3

Have Python open a Google.com page and conduct a search for the keyword of your choice.

# specify the file name and location
file_path = "paa-searcha.csv"

# read the "query" column into a Series, then turn it into a list of keywords
df = pd.read_csv(file_path, usecols=["query"], squeeze=True)
key_search = list(df)

# dictionary that will hold the output rows, keyed by a running counter
output_csv = {}
key = 1

# track the largest number of questions seen, to size the output columns later
max_ques_val = 0

Step 4

Retrieve the Search Results Page into Python.

# loop through each keyword one by one, fetch its results page, and parse it
for searchKey in key_search:
    # url to be searched
    page = 'https://www.google.com/search?q=' + searchKey

    # send a browser-like User-Agent header to avoid being rejected as a bot
    req = requests.get(page, headers={'User-Agent': 'Mozilla/5.0'})

    # soup object to parse the response
    soup = BeautifulSoup(req.content, 'html5lib')
    # print(soup)
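One caveat with building the URL by string concatenation: keywords containing spaces or punctuation should be percent-encoded before they go into the query string. A small helper (the function name is my own, not part of the original script) handles this with the standard library:

```python
from urllib.parse import quote_plus

def build_search_url(keyword):
    # Percent-encode the keyword so spaces and special characters
    # survive the query string (spaces become "+").
    return "https://www.google.com/search?q=" + quote_plus(keyword)
```

Alternatively, passing `params={'q': searchKey}` to `requests.get` achieves the same encoding automatically.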

Step 5

Parse the HTML for the PAA Answer Box Questions.

When I did my first search on red shoes, I used the following approach to find the div class tag in which the PAA questions were wrapped in the HTML. Chances are, this tag will be different for you.

To find the div tag for your own search, follow these steps:

A. Right-click on the page and choose ‘Inspect’ (I am assuming you are using the Chrome browser).

B. Click on the “select an element” icon as shown in the screenshot below.

C. Move the cursor to the first PAA.

This will highlight the HTML for the first PAA.

D. Look for the div class ID in here:

In my case, the div class id was Lt3Tzc, which I saw only the first time while putting this post together. On a subsequent visit, the div class id still existed in the HTML but was hidden from view.

How do I know this?

I looked for the div class in the Python output after uncommenting the print(soup) line at the end of Step 4 above and re-running the program.

This div class id is missing from the view in Chrome highlighted below:

E. Store everything within the target div class id in an empty list.

    # collect the text of every div with the target class into paa_list
    paa_list = []
    div_tag = soup.find_all("div", {"class": "Lt3Tzc"})
    for tags in div_tag:
        paa_list.append(tags.text)

    # count the questions found for this keyword
    count = len(paa_list)
    # print(count)

    # print the searched word and the questions fetched on the console
    print("searched word:", searchKey)
    print("Questions fetched:\n", paa_list)
    print("\n")
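Because Google’s obfuscated class names (such as Lt3Tzc) change without notice, a fallback that matches question-shaped text rather than a hard-coded class id can be more resilient. This is a sketch under my own assumptions (the helper name and the length/punctuation heuristics are mine, not part of the original script):

```python
from bs4 import BeautifulSoup

def extract_questions(html):
    """Fallback: pull out any leaf div whose text looks like a question.

    Heuristic, not exact: keeps short strings ending in "?" from divs
    that contain no nested divs, and de-duplicates the results.
    """
    soup = BeautifulSoup(html, "html.parser")
    questions = []
    for div in soup.find_all("div"):
        if div.find("div"):
            continue  # skip container divs; only inspect leaf divs
        text = div.get_text(strip=True)
        if text.endswith("?") and len(text) < 120 and text not in questions:
            questions.append(text)
    return questions
```

This will pick up some non-PAA questions on a real results page, so treat its output as a candidate list to review rather than a clean extraction.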

F. Store the PAA questions for the keyword, handling up to 10 questions in case Google ever shows that many.

    # store the keyword followed by its questions; the original script spelled
    # out one branch per possible count (and the count == 8 branch duplicated
    # paa_list[4]), but a single list concatenation covers any count correctly
    if count <= 10:
        output_csv[key] = [searchKey] + paa_list
    else:
        print("more than 10 questions for a particular searched word...")

    # increment key...
    key += 1

    # remember the largest question count, to size the output columns later
    if count > max_ques_val:
        max_ques_val = count

    # sleep for 5 seconds between requests
    time.sleep(5)

# block that creates the output dataframe; the column count follows the widest
# row seen, rather than one hard-coded branch per possible question count
if max_ques_val <= 10:
    columns = ['Searched words'] + ['Question ' + str(i) for i in range(1, max_ques_val + 1)]
    output_file_create = pd.DataFrame.from_dict(output_csv, orient='index', columns=columns)
else:
    print("... more than 10 questions for a particular searched word ...")

Step 6

Store the output in a csv file with a name of your choosing.

In my case, I named the file output_file.csv.

# preview the first rows of the output dataframe
output_file_create.head()

# export the dataframe to a csv file
output_file_create.to_csv('output_file.csv', encoding='utf-8-sig')

print("Output file created successfully")
print("...script ended...")

Once the program has run, the output in your file should look something like the following:

Step 7

Once you have all the appropriate PAA Answer Box questions, you should check the search volume on these questions and associated phrases in a keyword research tool like https://ahrefs.com/.

Step 8

Once you have a list of questions with decent search volume, create content that offers relevant searchers something over and above what the currently ranking content does.

Step 9

Promote the content created. A passive approach, where you wait for the content to be picked up by others, is unlikely to get you the traffic or provide Google with any signals that such content is worthy of ranking highly. Around 26% of clicks go to the first result on a search page, so you need to make sure yours is at the top.

Steps 8 and 9 are the hardest parts of the process, and neglecting them often causes good content to lie undiscovered.

Bonus Resources:

A. If you want to scrape and visualize the questions that become visible once you click on each question, you may be interested in this great script written by Alessio Nittoli.

B. If you wish to account for any regional variations, where Google may change the questions based on the city or country you are searching from, you may want to use a commercial scraping service such as https://serpapi.com/, https://zenserp.com/, https://serpstack.com/, or https://serpproxy.com/, among many others.

And with these commercial providers, you eliminate the risk of violating Google’s Terms of Service yourself.

Happy PAA scraping.

You can get the code above from https://github.com/hkamboe/Google-PAA-People-Also-Ask-Scraping