Simple web scraping using the Beautiful Soup library

In this blog post, we will do some simple web scraping with the Beautiful Soup library: we will download a Wikipedia page and print two of its HTML tags.

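We will start by importing urlopen from urllib.request. urllib.request is a module in Python's standard library that contains functions and classes for requesting data across the web, including support for handling cookies. Its urlopen function opens a URL and returns a response object whose read() method gives back the raw content as bytes, so it works the same way for HTML pages, images or any other kind of file.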

from urllib.request import urlopen
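To see what urlopen hands back, here is a minimal sketch; the helper name fetch_bytes is hypothetical. It uses data: URLs, which urllib.request also understands, so the example runs without a network connection. The same calls work for ordinary http:// and https:// addresses.

```python
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Open a URL and return (content type, raw body as bytes)."""
    # urlopen returns a response object; using it in a with-block
    # closes the underlying connection or file when we are done.
    with urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        return content_type, response.read()


# data: URLs embed the content in the URL itself, so these calls work offline.
ctype, body = fetch_bytes("data:text/html,<title>Hello</title>")
print(ctype, body)  # text/html b'<title>Hello</title>'

# The same call returns binary data too, here the 8-byte PNG file signature.
ctype, body = fetch_bytes("data:image/png;base64,iVBORw0KGgo=")
print(ctype, body)  # image/png b'\x89PNG\r\n\x1a\n'
```

Whatever the content type, read() returns bytes; turning HTML bytes into something searchable is the job of Beautiful Soup.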

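Then import the BeautifulSoup class from the bs4 package. Beautiful Soup is not part of the standard library; it can be installed with pip install beautifulsoup4.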

from bs4 import BeautifulSoup

Open the web page that we want to scrape by passing its address to urlopen

html = urlopen("https://en.wikipedia.org/wiki/India")
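Fetching a live page can fail. urlopen raises urllib.error.HTTPError when the server answers with an error status and urllib.error.URLError when no usable response arrives, for example because the host cannot be reached. The sketch below wraps the call in a hypothetical helper, open_page, that catches both; HTTPError is a subclass of URLError, so it has to be caught first.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def open_page(url, timeout=10):
    """Return the body of url as bytes, or None if it cannot be fetched."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # No usable response, e.g. an unreachable host or an unsupported scheme.
        print(f"Could not open {url}: {err.reason}")
    return None
```

With this helper, html_bytes = open_page("https://en.wikipedia.org/wiki/India") gives either the page's bytes or None, and the result can be checked before it is handed to BeautifulSoup.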

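Create a BeautifulSoup object from the downloaded HTML. The second argument names the parser to use; 'html.parser' is the parser built into Python.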

bs = BeautifulSoup(html.read(), 'html.parser')
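BeautifulSoup does not care where the markup comes from: besides the bytes returned by html.read(), it also accepts a plain string. A quick way to experiment without downloading anything is to parse a small document; the markup below is made up for illustration.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document standing in for a downloaded page.
sample_html = """<html>
<head><title>Sample page</title></head>
<body><h1 id="main">Hello</h1><p>First paragraph.</p></body>
</html>"""

# The first argument may be a str or bytes (html.read() returns bytes);
# "html.parser" is the parser that ships with Python's standard library.
bs = BeautifulSoup(sample_html, "html.parser")
print(bs.title)  # <title>Sample page</title>
print(bs.h1)     # <h1 id="main">Hello</h1>
```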

Now, access the tags that we want to display. bs.title returns the first title tag in the page and bs.h1 returns the first h1 tag.

print(bs.title)
print(bs.h1)

Output

<title>India - Wikipedia</title>
<h1 class="firstHeading mw-first-heading" id="firstHeading">India</h1>
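Printing a tag shows its full HTML, but often we only want its text or one of its attributes. The sketch below parses markup modelled on the output above, plus a made-up second h1 tag, to show get_text(), attribute lookup with square brackets, and find_all(), which returns every matching tag rather than only the first.

```python
from bs4 import BeautifulSoup

# Markup modelled on the output above; the second <h1> is made up.
markup = """<title>India - Wikipedia</title>
<h1 class="firstHeading mw-first-heading" id="firstHeading">India</h1>
<h1>A second heading</h1>"""

bs = BeautifulSoup(markup, "html.parser")

print(bs.title.get_text())  # India - Wikipedia
print(bs.h1["id"])          # firstHeading
# "class" can hold several values, so Beautiful Soup returns it as a list.
print(bs.h1["class"])       # ['firstHeading', 'mw-first-heading']
# bs.h1 only gives the first match; find_all collects all of them.
print(len(bs.find_all("h1")))  # 2
```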

Complete Code

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://en.wikipedia.org/wiki/India")
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.title)
print(bs.h1)
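The same steps can also be wrapped in a reusable function. This is a sketch, not the only way to do it: the function name scrape_title_and_heading and the User-Agent string are hypothetical. Sending a descriptive User-Agent header through urllib.request.Request is a common courtesy, and some sites may refuse urllib's default one. The with block closes the connection, and the None checks cover pages that lack a title or h1 tag.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical, descriptive User-Agent value; replace it with your own.
USER_AGENT = "simple-scraper-example/0.1"


def scrape_title_and_heading(url, timeout=10):
    """Return the text of the page's first <title> and <h1> tags."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    # The with-block closes the connection once the body has been read.
    with urlopen(request, timeout=timeout) as response:
        bs = BeautifulSoup(response.read(), "html.parser")
    # A page may lack either tag, so guard against None before reading text.
    title = bs.title.get_text() if bs.title else None
    heading = bs.h1.get_text() if bs.h1 else None
    return title, heading
```

Calling scrape_title_and_heading("https://en.wikipedia.org/wiki/India") should return the text of the same two tags shown in the output above, as plain strings instead of HTML.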