Python Programming
Use the Python requests library and the Beautiful Soup library to create a Python script that “scrapes” and displays the HTML links and images from the home page of the Smithsonian Institution (si.edu).
What I have so far:
# HTTP GET a file and save it in a Python string variable
# check the HTTP code
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

# HTTP response from the site
resp = urllib.request.urlopen('https://www.si.edu/')
soup = BeautifulSoup(resp, "html.parser")

# get a list of anchor <a> tags
tags = soup('a')
print(type(tags))
for item in tags:
    print(item.get('href', None))

# print only the links whose tag mentions "art"
for item in tags:
    if "art" in str(item).lower():
        print(item.get('href', None))

# save the downloaded file to disk
try:
    resp = urllib.request.urlopen('https://www.si.edu/')
    bytesToWrite = resp.read()
    # write as binary so the page's original byte encoding is preserved
    with open("weblinks.txt", 'wb') as myFile:
        myFile.write(bytesToWrite)
except Exception as exc:
    print('An error occurred: ' + str(exc))
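
The code above uses urllib and only prints the anchor tags, while the assignment asks for requests and for images as well. Here is a minimal sketch of one way to cover both, not a definitive solution. The function names `extract_links_and_images` and `show_links_and_images` are made up for illustration, and the sketch assumes that the `src` attribute of each `<img>` tag is what counts as an "image".

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_links_and_images(html, base_url):
    """Return (links, images) as lists of absolute URLs found in html.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # href=True / src=True skip tags that do not carry the attribute.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]
    return links, images


def show_links_and_images(url):
    """Fetch url with requests, then print its links and image sources."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    links, images = extract_links_and_images(resp.text, url)
    print("Links:")
    for link in links:
        print(" ", link)
    print("Images:")
    for src in images:
        print(" ", src)
```

To run it against the page from the question, call `show_links_and_images("https://www.si.edu/")` at the bottom of the script. Parsing is kept separate from fetching so the extraction can be tried on a small HTML string without network access. `urljoin` turns relative paths such as `/visit` into full URLs.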