Web Scraping Databases with Mechanical Soup and SQLite




Hi everyone! In this step-by-step tutorial, we will extract a large table of data from the web and store it in an SQLite database!
To keep things simple, I’ve chosen a Wikipedia table, but I highly encourage you to apply the same principles to data that updates more frequently (for example, weather forecasts) 😃

If you’re curious about my IDE – I’m using Wayscript, which is now available to the general public! You no longer need an invitation; you can simply sign up with the following link: https://app.wayscript.com

⭐clone complete tutorial code⭐
https://app.wayscript.com/lairs/517c9eb3-a662-41ec-9fe8-c09b2a7559bc/public

⏰ TIMESTAMPS ⏰
***************************************
00:00 – intro
00:34 – imports and installs
01:42 – web scraping with mechanical soup
02:20 – select HTML table elements
03:47 – extract element attributes
06:11 – find the index value of a list item
07:13 – extract multiple columns of table data
09:44 – organize extracted columns
12:44 – enumerate function
14:02 – dictionary to data frame
14:53 – create SQLite database
15:36 – create SQLite table
16:35 – insert Pandas data frame into SQLite table
17:26 – save data permanently inside database file
18:49 – thanks for watching!
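
The scraping steps listed in the timestamps can be sketched roughly as follows. This is a minimal illustration, not the exact code from the video: the helper names `clean_cells` and `scrape_cells` are made up for this example, and selecting every `td` element is an assumption about which cells are of interest. It relies on MechanicalSoup's `StatefulBrowser`, whose `page` attribute is a Beautiful Soup object.

```python
URL = "https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions"


def clean_cells(elements):
    """Return each element's text with newlines removed and whitespace trimmed."""
    return [element.text.replace("\n", "").strip() for element in elements]


def scrape_cells(url=URL):
    """Open the page and return the cleaned text of every <td> cell (hypothetical helper)."""
    # Imported here so clean_cells() can be used without the third-party package.
    import mechanicalsoup

    browser = mechanicalsoup.StatefulBrowser()
    browser.open(url)
    # browser.page is the Beautiful Soup object for the page that was just opened.
    cells = browser.page.find_all("td")
    return clean_cells(cells)


# Usage (needs network access and `pip install mechanicalsoup`):
#     cells = scrape_cells()
#     print(len(cells), "cells found")
```

Once the cell text is in a plain Python list, `list.index(value)` returns the position of a known value, which helps when deciding where to slice off cells that belong to other tables on the page.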

💻 CODE AND IMPORTANT LINKS 💻
***************************************
⭐ URL used in the tutorial:
https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions

⭐ complete code repository on Github:
https://github.com/MariyaSha/WebscrapingDatabases

⭐ install SQLite on Linux:
sudo apt install sqlite3

⭐ install SQLite on Windows:
Download the Precompiled Binaries for Windows zip file from SQLite docs:
https://www.sqlite.org/download.html

⭐ install SQLite on Mac or Anaconda:
no need to install – you already have it! 😁
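
If you are unsure whether SQLite is usable from Python on your machine, a quick check like the one below can help. It is only a sketch: it queries the SQLite library that Python's built-in `sqlite3` module is linked against, which is separate from the `sqlite3` command-line tool installed by the steps above.

```python
import sqlite3

# Version of the SQLite library bundled with (or linked to) this Python build.
print("SQLite library version:", sqlite3.sqlite_version)

# Open a throwaway in-memory database and ask SQLite itself for its version.
connection = sqlite3.connect(":memory:")
version = connection.execute("SELECT sqlite_version()").fetchone()[0]
connection.close()

print("Reported by the database engine:", version)
```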

⭐ code used in the tutorial:
column_names = ["Founder",
                "Maintainer",
                "Initial_Release_Year",
                "Current_Stable_Version",
                "Security_Updates",
                "Release_Date",
                "System_Distribution_Commitment",
                "Forked_From",
                "Target_Audience",
                "Cost",
                "Status"]
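
To show how a list of column names like this can drive the "organize extracted columns" and SQLite steps, here is a small self-contained sketch. Several things in it are assumptions made for illustration: the column list is shortened to two names, the cell values and row labels are made-up placeholders, the helpers `organize` and `save` are hypothetical, and the table name `linux` is arbitrary. It also uses plain dictionaries and `sqlite3` instead of a Pandas data frame, so that it runs without extra packages.

```python
import sqlite3

# Shortened column list and placeholder data, for illustration only.
column_names = ["Founder", "Maintainer"]
row_labels = ["Distro A", "Distro B", "Distro C"]
flat_cells = ["Alice", "Team A", "Bob", "Team B", "Carol", "Team C"]


def organize(row_labels, flat_cells, column_names):
    """Split a flat, row-by-row list of cells into one list per column."""
    table = {"Distribution": list(row_labels)}
    width = len(column_names)
    for idx, name in enumerate(column_names):
        # Start at this column's offset and take every `width`-th cell.
        table[name] = flat_cells[idx::width]
    return table


def save(table, connection, table_name="linux"):
    """Create a table with one column per dictionary key and insert all rows."""
    columns = list(table)
    column_sql = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'CREATE TABLE "{table_name}" ({column_sql})')
    # zip(*...) turns the per-column lists back into per-row tuples.
    rows = zip(*(table[name] for name in columns))
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
    )
    # Without commit() the inserted rows are not saved permanently.
    connection.commit()


table = organize(row_labels, flat_cells, column_names)
connection = sqlite3.connect(":memory:")  # use a file path to keep the data
save(table, connection)
stored = connection.execute(
    "SELECT * FROM linux ORDER BY Distribution"
).fetchall()
print(stored)
```

With Pandas installed, the same dictionary can be passed to `pandas.DataFrame(table)`, and `DataFrame.to_sql` is one way to write the result into an SQLite connection.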

📽️ RELATED TUTORIALS📽️
***************************************
🌞 Much Better HTML table Web Scraping with Pandas:
https://youtu.be/oF-EMiPZQGA

🌞 SQLite Databases for Beginners:
https://youtu.be/Ohj-CqALrwk

🌞 Web Scraping Images with Mechanical Soup:
https://youtu.be/drDdb1MBBfI

🌞 Web Scraping Text with Beautiful Soup:
https://youtu.be/ySNSY7iiBDY
