{"id":28,"date":"2023-06-15T19:43:00","date_gmt":"2023-06-15T16:43:00","guid":{"rendered":"https:\/\/fsaeed.blog\/?p=28"},"modified":"2023-06-15T19:43:00","modified_gmt":"2023-06-15T16:43:00","slug":"a-beginners-guide-web-scraping-and-storing-data-in-excel-with-python","status":"publish","type":"post","link":"https:\/\/fsaeed.blog\/index.php\/2023\/06\/15\/a-beginners-guide-web-scraping-and-storing-data-in-excel-with-python\/","title":{"rendered":"A Beginner&#8217;s Guide: Web Scraping and Storing Data in Excel with Python"},"content":{"rendered":"\n<p>Hello tech enthusiasts at FSAEED.BLOG! Today we\u2019re focusing on a fundamental skill in the world of data and automation: web scraping and storing the scraped data in an Excel sheet. This tutorial is perfect for business users who may not have the privileges to install a database on their computer but still want to work with structured data.<\/p>\n\n\n\n<p>Before we dive in, let&#8217;s introduce the main tools we&#8217;ll use:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Python<\/strong>: A user-friendly language that&#8217;s excellent for beginners and a powerful tool for data manipulation and analysis.<\/li>\n\n\n\n<li><strong>requests<\/strong>: A Python library for sending HTTP requests, which we&#8217;ll use to download the web page.<\/li>\n\n\n\n<li><strong>BeautifulSoup<\/strong>: A Python library that parses HTML and XML documents so you can extract data from them.<\/li>\n\n\n\n<li><strong>pandas<\/strong>: A Python library for data manipulation and analysis. It&#8217;s particularly well-suited to working with structured, table-like data.<\/li>\n<\/ol>\n\n\n\n<p><strong>Step 1: Installing Required Libraries<\/strong><\/p>\n\n\n\n<p>If you haven&#8217;t already, you&#8217;ll need to install the three libraries listed above, since they don&#8217;t come with Python. You can do this with pip, Python&#8217;s package installer. 
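Open your command prompt (Windows) or terminal (Mac\/Linux) and type the following commands. Besides BeautifulSoup and pandas, they install requests, which downloads the web page in Step 3, and openpyxl, one of the packages pandas can use to write .xlsx files in Step 6:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install beautifulsoup4\npip install requests\npip install pandas\npip install openpyxl<\/code><\/pre>\n\n\n\n<p><strong>Step 2: Importing Libraries<\/strong><\/p>\n\n\n\n<p>Once the libraries are installed, import them at the top of your script. You don&#8217;t need to import openpyxl yourself; pandas uses it behind the scenes when it writes the Excel file.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from bs4 import BeautifulSoup\nimport requests\nimport pandas as pd<\/code><\/pre>\n\n\n\n<p><strong>Step 3: Fetch and Parse Website Content<\/strong><\/p>\n\n\n\n<p>Now, let&#8217;s fetch the web page with requests, a library for sending HTTP requests, and parse the HTML it returns with BeautifulSoup. Replace the placeholder URL with the page you want to scrape. The timeout stops the script from hanging if the server never answers, and raise_for_status() raises an error if the server returns an error status (such as 404 Not Found) instead of the page.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>url = \"http:\/\/www.website.com\"  # placeholder: replace with the page you want to scrape\nresponse = requests.get(url, timeout=10)  # wait at most 10 seconds for a response\nresponse.raise_for_status()  # raise an error for 4xx\/5xx status codes\nsoup = BeautifulSoup(response.text, 'html.parser')<\/code><\/pre>\n\n\n\n<p><strong>Step 4: Extracting the Data<\/strong><\/p>\n\n\n\n<p>Assuming the data is in an HTML table, you can find the table and then iterate over its rows, collecting the text of each cell.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>table = soup.find('table')  # first table on the page; adjust for your site\nif table is None:\n    raise ValueError(\"No table found on the page\")\n\ndata = &#91;]\nfor row in table.find_all('tr'):\n    # Text of each data cell (header cells are th, not td)\n    cells = &#91;td.text.strip() for td in row.find_all('td')]\n    if cells:  # skip header rows and empty rows\n        data.append(cells)<\/code><\/pre>\n\n\n\n<p>This code creates a list of lists, where each sub-list is one row of the table. Rows without any td cells (typically the header row) are skipped, and empty cells are kept as empty strings so that every value stays in its own column.<\/p>\n\n\n\n<p><strong>Step 5: Creating a DataFrame<\/strong><\/p>\n\n\n\n<p>Now that we have the data, we can put it into a pandas DataFrame, which is a 2-dimensional labeled data structure with columns of potentially different types. Because the header cells aren&#8217;t collected, pandas numbers the columns 0, 1, 2, and so on; you can supply your own names through the columns argument of pd.DataFrame.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df = pd.DataFrame(data)<\/code><\/pre>\n\n\n\n<p><strong>Step 6: Writing to an Excel File<\/strong><\/p>\n\n\n\n<p>Finally, we can use pandas to write this DataFrame to an Excel 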
file.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># index=False leaves out the row numbers; writing .xlsx needs an engine such as openpyxl\ndf.to_excel('output.xlsx', index=False)<\/code><\/pre>\n\n\n\n<p>And there you have it! You&#8217;ve scraped data from a website and saved it into an Excel file, all with a few lines of Python code. Since every website is structured differently, you will usually need to adapt the extraction code in Step 4 for each site you scrape.<\/p>\n\n\n\n<p>Keep exploring and enhancing your tech skills with FSAEED.BLOG. Don&#8217;t forget to subscribe and join our tech community. Happy coding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello tech enthusiasts at FSAEED.BLOG! Today we\u2019re focusing on a fundamental skill in the world of data and automation: web scraping and storing the scraped data in an Excel sheet. This tutorial is perfect for business users who may not have the privileges to install a database on their computer but still want to work [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-28","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/posts\/28","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/comments?post=28"}],"version-history":[{"count":1,"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/posts\/28\/revisions"}],"predecessor-version":[{"id":29,"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/posts\/28\/revisions\/29"}],"wp:attachment":[{"href":"https:\/
\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/media?parent=28"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/categories?post=28"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fsaeed.blog\/index.php\/wp-json\/wp\/v2\/tags?post=28"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}