[Python] Python Crawling: Web Crawling


Romeo's Blog


romeoh 2020. 2. 18. 16:27


Installing beautifulsoup and requests

$ pip install requests beautifulsoup4
$ touch index.py
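The install step above pulls in requests, although the examples in this post fetch pages with urllib. As a minimal sketch of the requests-based equivalent, the helper below downloads a page and hands it to BeautifulSoup. The name `fetch_soup` and its `session` and `timeout` parameters are assumptions made for illustration, not part of the original post.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None, timeout=10):
    """Download url and return it parsed as a BeautifulSoup object.

    session is a hypothetical hook: any object with a requests-style
    get() method (for example requests.Session()) can be passed in.
    """
    http = session or requests
    response = http.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

Calling `fetch_soup('http://www.naver.com')` should give an object that can be used like `bsObject` in the examples that follow.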

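Printing the entire document

## Print the entire document
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it with Python's built-in HTML parser
html = urlopen('http://www.naver.com')
bsObject = BeautifulSoup(html, 'html.parser')

print(bsObject)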

$ python index.py
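Printing the parsed object directly emits the markup much as it was received, which can be hard to read. As a small sketch, `prettify()` returns the same document with one tag per line and indentation. The `SAMPLE_HTML` string is a made-up stand-in so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a live page
SAMPLE_HTML = "<html><head><title>Sample</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# prettify() returns the document as an indented, multi-line string
pretty = soup.prettify()
print(pretty)
```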

Printing the document title

## Print the document title
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.naver.com')
bsObject = BeautifulSoup(html, 'html.parser')

# Navigate head -> title; this prints the whole <title> tag
print(bsObject.head.title)

$ python index.py
<title>NAVER</title>
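The output above includes the surrounding `<title>` tag. If only the text is wanted, one option is `get_text()`, sketched below on a made-up sample string rather than the live page. The `None` check is an added precaution for pages that have no title.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the output shown above
SAMPLE_HTML = "<html><head><title>NAVER</title></head><body></body></html>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# soup.title is the first <title> tag in the document, or None if absent
title_tag = soup.title
title_text = title_tag.get_text(strip=True) if title_tag else None
print(title_text)
```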

Getting the content of every meta tag

## Get the content of every meta tag
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.naver.com')
bsObject = BeautifulSoup(html, 'html.parser')

# get('content') returns None for meta tags without a content attribute
for meta in bsObject.head.find_all('meta'):
    print(meta.get('content'))

$ python index.py
None
origin
text/javascript
.....
https://s.pstatic.net/static/www/mobile/edit/2016/0705/mobile_212852414260.png
네이버 메인에서 다양한 정보와 유용한 컨텐츠를 만나 보세요
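The `None` in the output above comes from meta tags that have no `content` attribute, such as a charset declaration. A possible refinement, sketched here on made-up sample markup, is to skip those tags and key each value by its `name`, `property`, or `http-equiv` attribute. The helper name `collect_meta` is an assumption for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page has many more meta tags
SAMPLE_HTML = """
<head>
  <meta charset="utf-8">
  <meta name="referrer" content="origin">
  <meta name="description" content="Sample description">
  <meta property="og:title" content="Sample">
</head>
"""


def collect_meta(soup):
    """Map each meta tag's identifying attribute to its content value."""
    result = {}
    # content=True keeps only tags that actually have a content attribute
    for meta in soup.find_all("meta", content=True):
        key = meta.get("name") or meta.get("property") or meta.get("http-equiv")
        if key:
            result[key] = meta["content"]
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
meta_by_key = collect_meta(soup)
print(meta_by_key)
```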

Getting every link and its address

## Get every link and its address
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.naver.com')
bsObject = BeautifulSoup(html, 'html.parser')

# Print each anchor's visible text followed by its href attribute
for link in bsObject.find_all('a'):
    print(link.text.strip(), link.get('href'))

$ python index.py
연합뉴스 바로가기 #news_cast
주제별캐스트 바로가기 #themecast
....
고객센터 https://help.naver.com/
NAVER Corp. https://www.navercorp.com/
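The output above mixes fragment-only links such as `#news_cast` with absolute URLs. A crawler that wants to follow links usually needs absolute addresses, so the sketch below resolves each `href` against the page URL with `urljoin` from the standard library. The sample markup, the `/about` path, and the helper name `extract_links` are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.naver.com/"

# Hypothetical sample markup with fragment, relative, and absolute hrefs
SAMPLE_HTML = """
<a href="#news_cast">News</a>
<a href="/about">About</a>
<a href="https://help.naver.com/">Help</a>
<a>No href</a>
"""


def extract_links(soup, base_url):
    """Return (text, absolute_url) pairs for every anchor that has an href."""
    links = []
    # href=True skips anchors that have no href attribute at all
    for anchor in soup.find_all("a", href=True):
        absolute = urljoin(base_url, anchor["href"])
        links.append((anchor.get_text(strip=True), absolute))
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = extract_links(soup, BASE_URL)
for text, url in links:
    print(text, url)
```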

Adding conditions to the tag lookup

## Add conditions to the tag lookup
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.naver.com')
bsObject = BeautifulSoup(html, 'html.parser')

# Find the first meta tag whose name attribute is "description"
print(bsObject.head.find('meta', {'name': 'description'}).get('content'))

$ python index.py
네이버 메인에서 다양한 정보와 유용한 컨텐츠를 만나 보세요
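The lookup above raises an AttributeError when no matching tag exists, because `find` returns `None` in that case. As a hedged alternative, the same condition can be written as a CSS selector with `select_one`, with a guard for the missing case. The sample markup and the helper name `meta_content` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with a single description meta tag
SAMPLE_HTML = '<head><meta name="description" content="Sample description"></head>'


def meta_content(soup, name):
    """Return the content of <meta name=...>, or None if there is no such tag."""
    # select_one takes a CSS selector and returns the first match or None
    tag = soup.select_one(f'meta[name="{name}"]')
    return tag.get("content") if tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(meta_content(soup, "description"))
print(meta_content(soup, "keywords"))
```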
