Web Scraping in Python using Beautiful Soup

Web scraping is the automated process of extracting data from websites. One of the most popular Python libraries for this task is BeautifulSoup, which parses HTML and XML documents so you can search them and extract the data you need. BeautifulSoup does not download pages itself; that part is usually handled by the requests library.



Step 1: Installing:

pip install requests beautifulsoup4

 (If the packages are already installed, pip simply reports that the requirement is already satisfied.)

Step 2: Importing:

Import the BeautifulSoup class from the bs4 package in your script. Note that the library is installed as beautifulsoup4 but imported as bs4.
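A minimal sketch of the import, followed by a one-line parse of a tiny made-up HTML string to confirm that the package works. "html.parser" is the parser that ships with Python, so no extra installation is needed.

```python
# The package is installed as "beautifulsoup4" but imported from "bs4".
from bs4 import BeautifulSoup

# Parse a tiny HTML string to check that the import works.
soup = BeautifulSoup("<p>Hello, soup!</p>", "html.parser")
print(soup.p.text)  # Hello, soup!
```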

Step 3: Using:

To scrape an HTML page with BeautifulSoup, we first need its HTML. We can either read an HTML file stored on our computer or download the HTML of a webpage with the requests module (which we discussed in a separate post).

Either way, we pass the HTML to the BeautifulSoup constructor to create a BeautifulSoup object (usually named soup). Its methods let us search the page and extract the data we need.
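Both approaches can be sketched as follows, assuming the built-in "html.parser" (lxml is a faster, optional alternative). The helper names soup_from_file and soup_from_url and the temporary file page.html are illustrative only. requests is imported inside soup_from_url so that the file-based part runs even where requests is not installed.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def soup_from_file(path):
    """Build a BeautifulSoup object from an HTML file stored on disk."""
    with open(path, encoding="utf-8") as f:
        return BeautifulSoup(f, "html.parser")


def soup_from_url(url):
    """Download a page with requests and build a BeautifulSoup object from it."""
    import requests  # imported here so the file-based demo works without requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return BeautifulSoup(response.text, "html.parser")


# Demo with a throwaway local file (the file name is just an example).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html><head><title>Local page</title></head></html>")
    soup = soup_from_file(path)

print(soup.title.text)  # Local page

# A live page works the same way (needs network access):
# soup = soup_from_url("https://github.com")
```

Passing the timeout and calling raise_for_status() is a defensive choice: without them, a slow server can hang the script, and an error page would be parsed as if it were the real content.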

Step 4: Methods:

The BeautifulSoup library has many methods for web scraping; here we cover only the most useful ones, followed by examples and small projects.

To find the desired tag and its attributes on an HTML page, you can use the "Inspect Element" tool in your browser. This way, you can easily locate the tag you need and use it in your code.

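  • soup.find("tag_name", class_="...", id="...", string="...")

Finds the first tag on the page that matches the given name and attributes (any other attribute can be passed the same way), or returns None if nothing matches. Note the underscore in class_: class is a reserved word in Python.

  • soup.find_all("tag_name", class_="...", id="...", string="...")

Finds all the tags on the page that match the given name and attributes and returns them as a list.

  • tag.text

Returns the text inside a tag. tag.get_text(strip=True) also trims the surrounding whitespace.

  • tag["attribute"]

Returns the value of one of the tag's attributes, for example link["href"]. It raises KeyError if the attribute is missing, whereas tag.get("attribute") returns None instead.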
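The four methods above can be tried on a small HTML snippet. The tags, ids, classes, and links below are made up for illustration and do not come from a real site.

```python
from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <h1 id="title">Sample page</h1>
    <a class="nav" href="/home">Home</a>
    <a class="nav" href="/about">About</a>
    <a class="external" href="https://example.com">Example</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find(): the first matching tag, or None if nothing matches.
title = soup.find("h1", id="title")

# find_all(): every matching tag. class_ has an underscore because
# "class" is a reserved word in Python.
nav_links = soup.find_all("a", class_="nav")

# string= matches on the text inside the tag.
about = soup.find("a", string="About")

# .text: the text inside a tag.
print(title.text)                        # Sample page

# tag["attribute"] raises KeyError if the attribute is missing;
# tag.get("attribute") returns None instead.
print([a["href"] for a in nav_links])    # ['/home', '/about']
print(about.get("href"))                 # /about
print(soup.a["href"])                    # /home  (soup.a is the first <a> tag)
print(soup.find("a", class_="missing"))  # None
```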




Step 5: Examples:



  • Find the first h1 tag on GitHub along with its text:
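A minimal sketch of this example. The helper names fetch_html and first_h1_text are hypothetical. Downloading and parsing are kept separate so the parsing logic can be tried on any HTML string, and requests is imported inside fetch_html for the same reason. The heading on GitHub's homepage changes over time, so no fixed output is shown for the live call.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Return the HTML of a page (requires the requests package)."""
    import requests  # imported lazily so first_h1_text() can be tried offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def first_h1_text(html):
    """Return the stripped text of the first <h1> tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return h1.get_text(strip=True) if h1 is not None else None


# Live usage (needs network access):
# print(first_h1_text(fetch_html("https://github.com")))
```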

  • Display a user's name on GitHub (deftincomputer):
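A sketch of this example under one loud assumption: that GitHub's profile page marks the display name with a span whose classes include p-name. That class name is based on GitHub's markup at some point in time and may have changed, so confirm it with "Inspect Element" before relying on it. The function name profile_name is hypothetical.

```python
from bs4 import BeautifulSoup

PROFILE_URL = "https://github.com/deftincomputer"


def profile_name(html):
    """Return the display name from a GitHub profile page, or None.

    ASSUMPTION: the name sits in <span class="p-name ...">. class_="p-name"
    matches any tag whose class list contains "p-name".
    """
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("span", class_="p-name")
    return name.get_text(strip=True) if name is not None else None


# Live usage (needs `pip install requests` and network access):
# import requests
# html = requests.get(PROFILE_URL, timeout=10).text
# print(profile_name(html))
```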

  • The URL of the profile picture of the user 'deftincomputer' on GitHub:
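A sketch that reads the src attribute of the avatar image, assuming (as an unverified guess about GitHub's current markup) that the profile picture is an img tag whose classes include avatar-user. The function name avatar_url is hypothetical; tag.get("src") is used so a missing attribute yields None rather than an exception.

```python
from bs4 import BeautifulSoup

PROFILE_URL = "https://github.com/deftincomputer"


def avatar_url(html):
    """Return the profile picture URL from a GitHub profile page, or None.

    ASSUMPTION: the avatar is <img class="avatar avatar-user ..." src="...">.
    """
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", class_="avatar-user")
    return img.get("src") if img is not None else None


# Live usage (needs `pip install requests` and network access):
# import requests
# html = requests.get(PROFILE_URL, timeout=10).text
# print(avatar_url(html))
```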

  • List of repositories belonging to the user 'deftincomputer' on GitHub:
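A sketch using find_all(), under two assumptions about GitHub that should be checked with "Inspect Element": that the repositories tab is at ?tab=repositories, and that each repository link carries itemprop="name codeRepository". The function name repositories is hypothetical. It returns each repository's name together with its absolute URL, built from the href attribute with urljoin. Only the first page of results is parsed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

REPOS_URL = "https://github.com/deftincomputer?tab=repositories"  # assumed URL


def repositories(html, base_url="https://github.com"):
    """Return (name, url) pairs for the repository links on the page.

    ASSUMPTION: each repository link is <a href="/user/repo" itemprop="name codeRepository">.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", itemprop="name codeRepository")
    return [(link.get_text(strip=True), urljoin(base_url, link["href"])) for link in links]


# Live usage (needs `pip install requests` and network access):
# import requests
# html = requests.get(REPOS_URL, timeout=10).text
# for name, url in repositories(html):
#     print(name, url)
```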
