HTTP Error exception handling

In web scraping, two things commonly stop a program from running: either the page is not found on the server, or the server itself cannot be found (for example, because the host name in the URL is mistyped).

In the first case, urlopen raises an HTTPError exception. In the second case, it raises a URLError exception.
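
To make the relationship between the two exceptions concrete, here is a minimal sketch using only the standard library. The helper name classify is hypothetical and exists only for illustration; the class hierarchy it relies on (HTTPError derives from URLError) is part of urllib.error.

```python
from urllib.error import HTTPError, URLError

# HTTPError derives from URLError, which in turn derives from OSError.
print(issubclass(HTTPError, URLError))
print(issubclass(URLError, OSError))


def classify(error):
    """Return a short label for an exception raised by urlopen.

    `classify` is a hypothetical helper for this sketch, not part of urllib.
    """
    # Check the more specific class first, because every HTTPError
    # is also a URLError.
    if isinstance(error, HTTPError):
        return "http"   # the server answered with an error status
    if isinstance(error, URLError):
        return "url"    # the server could not be reached
    return "other"
```

Because every HTTPError is also a URLError, the more specific check has to come first; the same ordering applies to except clauses.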

In this post, I will show you how to handle HTTP errors, using a page that does not exist as an example. The page "https://docs.python.org/3/tutoal/index.html" does not exist on the server docs.python.org because the URL is mistyped; the correct address is "https://docs.python.org/3/tutorial/index.html". First, we will run the program without handling the HTTPError exception. Below is the code with its output.

from urllib.request import urlopen
from bs4 import BeautifulSoup
page = urlopen("https://docs.python.org/3/tutoal/index.html")
bs = BeautifulSoup(page.read(),'html.parser')
print(bs.title)

Output

urllib.error.HTTPError: HTTP Error 404: Not Found

We can run the same code with the correct page address passed to the urlopen function.

from urllib.request import urlopen
from bs4 import BeautifulSoup
page = urlopen("https://docs.python.org/3/tutorial/index.html")
bs = BeautifulSoup(page.read(),'html.parser')
print(bs.title)

Output

<title>The Python Tutorial — Python 3.10.6 documentation</title>

Below is the code with the incorrect page address again. This time we handle the HTTP error by importing HTTPError and catching it.

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
try:
    page = urlopen("https://docs.python.org/3/tutoal/index.html")
except HTTPError as error:
    print(error)
else:
    print('No error detected')

Output

HTTP Error 404: Not Found
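
Printing the exception gives the message above, but the same information is also available as attributes, which is handy when a scraper needs to react differently to different status codes. The sketch below is an illustration only: the helper describe is a hypothetical name, and the HTTPError is constructed by hand (an assumption made so the example runs without a network connection) rather than raised by urlopen.

```python
from urllib.error import HTTPError


def describe(error):
    """Summarise an HTTPError using its `code` and `reason` attributes.

    `describe` is a hypothetical helper for this sketch.
    """
    # error.code is the numeric HTTP status; error.reason is the explanation text.
    return f"status={error.code}, reason={error.reason}"


# Built by hand so the sketch runs offline; in a scraper this exception
# would be raised by urlopen instead.
sample = HTTPError(
    "https://docs.python.org/3/tutoal/index.html", 404, "Not Found", None, None
)
print(describe(sample))
print(sample)
```

In a real scraper, the same attributes would be read inside the except HTTPError block.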

Now we will run the same code with the correct page address and look at the output.

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
try:
    page = urlopen("https://docs.python.org/3/tutorial/index.html")
except HTTPError as error:
    print(error)
else:
    print('No error detected')

Output

No error detected

The second thing that can go wrong in web scraping is that the server cannot be found, for example because the URL is incorrectly typed. In that case, urlopen raises a URLError instead of an HTTPError.
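
Both failure modes can be handled in one place. The following is a rough sketch, not a definitive implementation: the function fetch and its opener parameter are hypothetical names introduced here, and opener exists only so the function can be tried without a network connection (it defaults to the real urlopen).

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return the response body as bytes, or None if the request fails.

    `fetch` and `opener` are hypothetical names for this sketch;
    by default the real urlopen is used.
    """
    try:
        with opener(url) as response:
            return response.read()
    except HTTPError as error:
        # The server was reached but answered with an error status.
        print(f"HTTP error: {error.code} {error.reason}")
    except URLError as error:
        # The server could not be reached (bad host name, no connection, ...).
        print(f"URL error: {error.reason}")
    return None
```

HTTPError is listed before URLError because it is a subclass of URLError; with the order reversed, the URLError clause would catch HTTP errors as well. Calling fetch("https://docs.python.org/3/tutoal/index.html") should take the HTTPError branch and return None.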
