With BeautifulSoup, the lxml parser is faster than the default html.parser.
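A rough way to check this on your own machine is to time both parsers on the same document. This is only a sketch: the sample HTML and the `time_parser` helper are made up for illustration, and the numbers will vary with the document and environment. If lxml is not installed, it is simply skipped.

```python
import time

from bs4 import BeautifulSoup, FeatureNotFound

# Synthetic page invented for this sketch: 2,000 small <div><a> rows.
HTML = "<html><body>" + "".join(
    f'<div class="item"><a href="/p/{i}">item {i}</a></div>' for i in range(2000)
) + "</body></html>"


def time_parser(parser, repeats=3):
    """Return (average seconds per parse, number of <a> tags found)."""
    start = time.perf_counter()
    for _ in range(repeats):
        soup = BeautifulSoup(HTML, parser)
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, len(soup.find_all("a"))


results = {}
for parser in ("html.parser", "lxml"):
    try:
        results[parser] = time_parser(parser)
    except FeatureNotFound:
        # Raised by BeautifulSoup when the requested parser is not installed.
        print(f"{parser}: not installed, skipped")

for parser, (seconds, links) in results.items():
    print(f"{parser}: {seconds * 1000:.1f} ms per parse, {links} links")
```

Both parsers should report the same link count, so the only thing that differs is the time per parse.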
I asked ChatGPT how to speed up BeautifulSoup.
Its answer:
- Use lxml: lxml is faster than the default parser (html.parser) in BeautifulSoup. Install it with pip: pip install lxml. To use it, pass 'lxml' as the second argument to the BeautifulSoup constructor.
- Use find instead of select: select has to interpret a CSS selector, so for simple lookups find is often faster. find returns the first element that matches a tag name or attribute.
- Limit the search scope: If you're looking for a specific element, first locate its parent element and search inside it. That way BeautifulSoup doesn't have to walk the entire HTML tree.
- Choose the parser deliberately: BeautifulSoup supports lxml, html5lib, and html.parser. html5lib is the most lenient but also the slowest, so try another parser if parsing is the bottleneck.
- Avoid regex where possible: Matching with a regular expression is usually slower than matching an exact tag name or attribute value, so prefer exact matches when they are enough.
- Use a cache: If you process the same page more than once, keep the parsed BeautifulSoup object instead of parsing the page again.
- Parallelize with concurrent.futures: If you're processing multiple pages, use the concurrent.futures module to parse them in parallel, for example in multiple processes.
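Several of the tips above (find with a limited scope, caching, and concurrent.futures) can be sketched together as follows. This is a minimal sketch under my own assumptions, not a benchmarked recipe: the sample page and the helper names (`parse_cached`, `post_links`, `count_links`, `count_links_parallel`) are all hypothetical. It uses html.parser so it runs without extra installs; swap in 'lxml' if you have it.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import lru_cache

from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this sketch.
SAMPLE_HTML = """
<html><body>
  <div id="nav"><a href="/home">Home</a><a href="/about">About</a></div>
  <div id="content">
    <a class="post" href="/post/1">First post</a>
    <a class="post" href="/post/2">Second post</a>
  </div>
</body></html>
"""


@lru_cache(maxsize=32)
def parse_cached(html):
    """Parse once per distinct HTML string; repeated calls reuse the same soup."""
    return BeautifulSoup(html, "html.parser")


def post_links(html):
    """Return the hrefs of post links, searching only inside div#content."""
    soup = parse_cached(html)
    # find() returns the first match; narrowing to the container first means
    # find_all() below walks a small subtree instead of the whole page.
    content = soup.find("div", id="content")
    if content is None:
        return []
    return [a["href"] for a in content.find_all("a", class_="post")]


def count_links(html):
    """Worker for the pool: parse one page and count its <a> tags."""
    return len(BeautifulSoup(html, "html.parser").find_all("a"))


def count_links_parallel(pages, max_workers=2):
    """Parse several pages in separate processes; results keep input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_links, pages))


if __name__ == "__main__":
    print(post_links(SAMPLE_HTML))
    print(count_links_parallel([SAMPLE_HTML] * 4))
```

The cached soup is a shared, mutable object, so this only makes sense if you read from it without modifying it. Processes are used because parsing is CPU-bound. For only a handful of small pages, the cost of starting the pool can outweigh the gain.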