In this blog post, we will do some simple web scraping using the BeautifulSoup library.
We will start by importing urlopen from urllib.request. This module is part of the Python standard library and contains functions for requesting data across the web and handling cookies. urlopen opens a URL and returns a response object, so it can be used to read HTML files, image files, or any other files with ease.
from urllib.request import urlopen
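Requests can fail, for example when the page does not exist or the network is down. Below is a minimal sketch of wrapping urlopen so that such failures are reported instead of stopping the script. The helper name fetch_bytes and the 10-second timeout are assumptions made for illustration, not part of the original walkthrough.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Return the raw response body for url, or None if the request fails.

    fetch_bytes is a hypothetical helper name used only for this sketch.
    """
    try:
        # The response object works as a context manager, so it is closed
        # automatically once the body has been read.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # No usable answer at all: bad host name, no network, unknown scheme.
        print(f"Could not open {url}: {err.reason}")
    return None


# A data: URL is used here so the demonstration does not need network access.
print(fetch_bytes("data:text/plain,hello"))
```

HTTPError is a subclass of URLError, which is why it is caught first. The same function can be called with a regular http or https address.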
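Then import the BeautifulSoup class from the bs4 package. bs4 is a third-party package; if it is not installed yet, install it with pip install beautifulsoup4.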
from bs4 import BeautifulSoup
Specify the address of the website to scrape.
html = urlopen("https://en.wikipedia.org/wiki/India")
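Create a BeautifulSoup object from the downloaded HTML. The second argument selects the parser; 'html.parser' is the one built into Python.
bs = BeautifulSoup(html.read(), 'html.parser')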
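BeautifulSoup does not need a live web page to work with; it also accepts a plain string of markup. The sketch below parses a small made-up document, which is a convenient way to experiment without sending requests. The variable names sample_html and soup, and the markup itself, are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
sample_html = """
<html>
  <head><title>Sample Page</title></head>
  <body><h1 id="main-heading">Hello</h1></body>
</html>
"""

# Same call as in the walkthrough, but with a string instead of html.read().
soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title)  # <title>Sample Page</title>
print(soup.h1)     # <h1 id="main-heading">Hello</h1>
```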
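Now, specify the tags that we want to display. Accessing a tag name as an attribute, such as bs.title, returns the first matching tag in the document.
print(bs.title)
print(bs.h1)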
Output
<title>India - Wikipedia</title>
<h1 class="firstHeading mw-first-heading" id="firstHeading">India</h1>
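The output shows whole tags, including the angle brackets and attributes. Often only the text or a single attribute is wanted. Here is a small sketch of how that might look, using the two tags from the output as inline markup so that it runs without a network connection. The variable names markup and tags are made up for this example.

```python
from bs4 import BeautifulSoup

# The two tags printed in the output, used as offline sample markup.
markup = (
    "<title>India - Wikipedia</title>"
    '<h1 class="firstHeading mw-first-heading" id="firstHeading">India</h1>'
)
tags = BeautifulSoup(markup, "html.parser")

print(tags.title.get_text())  # India - Wikipedia
print(tags.h1.get_text())     # India
print(tags.h1["id"])          # firstHeading
print(tags.h1["class"])       # ['firstHeading', 'mw-first-heading']

# A tag that is not present comes back as None, so check before using it.
print(tags.h2)                # None
```

Note that class is returned as a list, because an HTML element can carry several classes at once.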
Complete Code
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://en.wikipedia.org/wiki/India")
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.title)
print(bs.h1)