In web scraping, two things can stop a program from running: the first is that the page is not found on the server (or the server reports an error while retrieving it), and the second is that the server itself cannot be reached, for example because the domain name is mistyped.
In the first case, urlopen raises an HTTPError exception. In the second case, it raises a URLError exception.
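The two exception classes are related, which matters when both are caught in the same program. The short check below uses only the standard library and needs no network connection; the URL and message passed to HTTPError are made-up values for illustration.

```python
from urllib.error import HTTPError, URLError

# HTTPError derives from URLError, so "except URLError" also catches HTTPError.
is_subclass = issubclass(HTTPError, URLError)
print(is_subclass)

# Build an HTTPError by hand (no request is sent) to inspect its attributes.
# The URL and message here are illustrative values only.
sample = HTTPError("https://example.com/missing", 404, "Not Found", None, None)
print(sample.code)    # numeric HTTP status code
print(sample.reason)  # short text describing the failure
```

Because of this relationship, an except clause for HTTPError has to appear before one for URLError if the two cases should be told apart.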
In this blog, I will show you how to handle HTTP errors. As an example, we will request a page that does not exist. The page "https://docs.python.org/3/tutoal/index.html" is not on the server docs.python.org because, as you can see, we have mistyped the path. The actual address is "https://docs.python.org/3/tutorial/index.html". First, we will run the program without handling the HTTPError exception. Below is the code with the output.
from urllib.request import urlopen
from bs4 import BeautifulSoup
page = urlopen("https://docs.python.org/3/tutoal/index.html")
bs = BeautifulSoup(page.read(),'html.parser')
print(bs.title)
Output
urllib.error.HTTPError: HTTP Error 404: Not Found
We can run the above code again, this time passing the correct page address to the urlopen function.
from urllib.request import urlopen
from bs4 import BeautifulSoup
page = urlopen("https://docs.python.org/3/tutorial/index.html")
bs = BeautifulSoup(page.read(),'html.parser')
print(bs.title)
Output
<title>The Python Tutorial — Python 3.10.6 documentation</title>
Below is the code with the incorrect page address again. This time we handle the HTTP error by importing HTTPError and catching it in a try/except block.
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
try:
    page = urlopen("https://docs.python.org/3/tutoal/index.html")
except HTTPError as error:
    print(error)
else:
    print('No error detected')
Output
HTTP Error 404: Not Found
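The caught exception holds more than its printed message. As a minimal sketch, the status code in error.code can be used to decide what to report. Here describe_http_error is a hypothetical helper and the wording of its messages is an assumption. The error is built by hand so the example runs without a network connection.

```python
from urllib.error import HTTPError

def describe_http_error(error):
    """Return a short note for an HTTPError (hypothetical helper)."""
    if error.code == 404:
        return "Page not found: check the URL path for typos"
    if 500 <= error.code < 600:
        return "Server-side problem: retrying later may help"
    return f"Request failed with status {error.code}"

# Hand-built error that mimics the 404 from the mistyped address above.
fake = HTTPError(
    "https://docs.python.org/3/tutoal/index.html", 404, "Not Found", None, None
)
print(describe_http_error(fake))
```

In a real scraper the same helper could be called inside the except HTTPError block instead of print(error).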
Now we will run the same code with the correct page address and look at the output.
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
try:
    page = urlopen("https://docs.python.org/3/tutorial/index.html")
except HTTPError as error:
    print(error)
else:
    print('No error detected')
Output
No error detected
The second thing that can go wrong in web scraping is that the server cannot be found, for example because the domain part of the URL is incorrectly typed.
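One way to cover this case is to catch URLError alongside HTTPError. The sketch below is an illustration, not a definitive implementation: try_open, its opener parameter, and the unreachable stand-in are hypothetical names, added so the failure path can be exercised without a real network lookup.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def try_open(url, opener=urlopen):
    """Return (response, message); response is None when the request fails.

    try_open and its opener parameter are hypothetical, used here so the
    error paths can be triggered without a network connection.
    """
    try:
        response = opener(url)
    except HTTPError as error:
        # Must come first: HTTPError is a subclass of URLError.
        return None, f"The server returned an error: {error}"
    except URLError as error:
        return None, f"The server could not be reached: {error.reason}"
    else:
        return response, "No error detected"

def unreachable(url):
    # Stand-in opener that simulates a host that cannot be found.
    raise URLError("simulated lookup failure")

_, message = try_open("https://no-such-host.invalid/", opener=unreachable)
print(message)
```

Calling try_open(url) with the default opener sends a real request through urlopen, so the same function can be used unchanged in a scraper.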