
All posts (216)
Crawling café branch names, addresses, and phone numbers

from selenium import webdriver as wb
from bs4 import BeautifulSoup as bs
import time
import pandas as pd

url = 'http://www.istarbucks.co.kr/store/store_map.do'
driver = wb.Chrome()
driver.get(url)

# Click the "search by region" button
btn_search = driver.find_element_by_class_name('loca_search')
#btn_search = driver.find_element_by_xpath('//*[@id="container"]/div/form/fieldset/div/section/article[1]/article/header[2]/h3/..
Practice: auto-searching for '사과' (apple)

!pip install selenium
import requests as req
from bs4 import BeautifulSoup as bs
import pandas as pd
from selenium import webdriver as wb
from selenium.webdriver.common.keys import Keys

driver = wb.Chrome()
# Open the web page
url = 'https://www.google.com/'  # target site
driver.get(url)

input_search = driver.find_element_by_class_name('gLFyf')  # search-box class found in developer tools
input_search.send_keys('사과')
in..
Crawling a lunch-box menu

from selenium import webdriver as wb
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
import time
import pandas as pd

url = 'https://www.hsd.co.kr/menu/menu_list'
driver = wb.Chrome()
driver.get(url)

# Exception handling (try/except)
# Click the "more" button element repeatedly
btn_more = driver.find_element_by_class_name('c_05')
try:
    for index in range(50):
        btn_more.click()
        time.sleep(2)  # pause for 2 seconds
e..
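The pattern above clicks "more" inside a try/except so the loop stops cleanly once the button disappears. A minimal sketch of that control flow, with the Selenium element replaced by a hypothetical stub class so it runs stand-alone:

```python
# Sketch of the "click 'more' until it fails" pattern. MoreButton is a
# stand-in for the Selenium element; a real run would click in the browser.
import time

class MoreButton:
    """Raises once the menu list is fully expanded, like a vanished button."""
    def __init__(self, pages):
        self.pages = pages

    def click(self):
        if self.pages == 0:
            raise Exception("element is no longer clickable")
        self.pages -= 1

btn_more = MoreButton(pages=3)
clicks = 0
try:
    for _ in range(50):       # generous upper bound, as in the post
        btn_more.click()
        clicks += 1
        time.sleep(0.01)       # the post sleeps 2 s between clicks
except Exception:
    pass                       # button gone: every item is loaded

print(clicks)  # → 3
```

The upper bound of 50 just guards against an infinite loop; the except branch is the expected exit.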
The Selenium module

!pip install selenium
from selenium import webdriver as wb
from selenium.webdriver.common.keys import Keys

#driver.implicitly_wait(5)  # wait up to 5 seconds
# Launch the web browser
driver = wb.Chrome()
# Open the URL in the browser
url = 'https://www.naver.com'
driver.get(url)
# Find the search-box tag (element)
input_search = driver.find_element_by_id('query')
# Type a query into the search box
input_search.send_keys('날씨')
# Two ways to submit a search
# 1. Click the button
# Find the search-button tag (element) ..
Practice: crawling the contents of an iframe

# In the browser developer tools, find the iframe and request its src URL directly.
import requests as req
from bs4 import BeautifulSoup as bs
import pandas as pd

url = 'https://movie.naver.com'
url_sub = '/movie/bi/mi/pointWriteFormList.nhn?code=181381&type=after&isActualPointWriteExecute=false&isMileageSubscriptionAlready=false&isMileageSubscriptionReject=false'
url_final = url + url_sub
res = req.get(url_final)
soup = bs(res.con..
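The key idea above is that content inside an iframe lives at its own URL, so you scrape the iframe's src instead of the outer page. A small sketch of locating that src with BeautifulSoup; the HTML snippet and class name are illustrative, not Naver's real markup:

```python
# Find an <iframe> in the outer document and build the URL to fetch directly.
from bs4 import BeautifulSoup as bs

outer_html = """
<html><body>
  <iframe class="ifr_area" src="/movie/bi/mi/pointWriteFormList.nhn?code=181381"></iframe>
</body></html>
"""

soup = bs(outer_html, "html.parser")
iframe = soup.find("iframe")
url_final = "https://movie.naver.com" + iframe["src"]
print(url_final)
```

A real crawl would then pass `url_final` to `requests.get` and parse the response, as the post does.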
tqdm: showing loop progress

from tqdm import tqdm_notebook

movie_date = []
movie_title = []
movie_rate = []
for day in tqdm_notebook(days):
    url = "https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=cur&date=" + day
    res = req.get(url)
    soup = bs(res.content, 'lxml')
    title = soup.select('div.tit5 > a')
    rate = soup.find_all('td', class_='point')
    for index in range(len(title)):
        movie_date.append(day)
        movie_title.append(title[in..
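tqdm wraps any iterable and prints a live progress bar as the loop runs; `tqdm_notebook` is the Jupyter-widget variant (newer tqdm releases expose it as `tqdm.notebook.tqdm`). A minimal console sketch with sample date strings in place of the real crawl:

```python
# Wrap an iterable with tqdm to get a progress bar; the loop body is where
# each page would be fetched and parsed in the real crawler.
from tqdm import tqdm

days = ["20191201", "20191202", "20191203"]  # sample date strings
collected = []
for day in tqdm(days, desc="scraping"):
    collected.append(day)  # placeholder for fetch-and-parse work

print(len(collected))  # → 3
```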
Collecting a month of movie ratings

import requests as req
from bs4 import BeautifulSoup as bs
import pandas as pd

movie_date = []
movie_title = []
movie_rate = []
for day in range(20191201, 20191226, 1):
    url = "https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=cur&tg=0&date=" + str(day)
    res = req.get(url)
    soup = bs(res.content, 'lxml')
    title_list = soup.select('div.tit5 > a')
    rate_list = soup.find_all('td', class_='point')
    for ind..
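One caveat with `range(20191201, 20191226)`: treating dates as plain integers only works while the sequence stays inside a single month; crossing into January would produce invalid dates like 20191232. A datetime-based sketch that is safe across month boundaries:

```python
# Build YYYYMMDD date strings with datetime instead of integer arithmetic,
# so the sequence stays valid across month and year boundaries.
from datetime import date, timedelta

start, end = date(2019, 12, 1), date(2019, 12, 25)
days = []
d = start
while d <= end:
    days.append(d.strftime("%Y%m%d"))
    d += timedelta(days=1)

print(days[0], days[-1], len(days))  # → 20191201 20191225 25
```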
Collecting titles and ratings from the movie-ranking page

import requests as req
from bs4 import BeautifulSoup as bs
import pandas as pd

url = 'https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=cur&date=20191228'
res = req.get(url)
# Parser options: lxml, html.parser, html5lib
soup = bs(res.content, 'lxml')
name = soup.select('div.tit5 > a')
rate = soup.find_all('td', class_='point')
len(name), len(rate)
# Collect rank, title, and rating
rank_list = []
name_list = []
rating_l..
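After the parallel lists are filled, the usual last step is to assemble them into a pandas DataFrame. A sketch with sample data; the column names and the index-as-rank convention are assumptions, not the post's exact code:

```python
# Combine parallel title/rating lists into a DataFrame, using the index as rank.
import pandas as pd

name_list = ["Movie A", "Movie B"]
rating_list = [9.1, 8.7]

df = pd.DataFrame({"title": name_list, "rating": rating_list})
df.index = df.index + 1  # ranks start at 1, not 0
print(df)
```

From here the post's pattern would continue with `df.to_csv(...)` or similar to persist the results.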