single ওয়েব পেজ স্ক্রাপ করি - পাইথন বাংলা টিউটোরিয়াল

লিংকের ওয়েবসাইট এ যাই

রিকোয়েস্ট পাঠাই

import requests
from bs4 import BeautifulSoup
url = 'https://subslikescript.com/movie/Titanic-120338'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
response = requests.get(url,headers=headers) 
soup = BeautifulSoup(response.content,'html.parser')

ব্যাখ্যা :

প্রথমে লাইব্রেরি ইম্পোর্ট করি।

import requests
from bs4 import BeautifulSoup

স্ক্র্যাপিং ওয়েবসাইট url টি নির্ধারণ করি

url = 'https://subslikescript.com/movie/Titanic-120338'

রিকোয়েস্ট পাঠানোর সময় আমরা একটি ব্রাউজার যেভাবে রিকোয়েস্ট পাঠায় সেভাবে রিকোয়েস্ট পাঠাবো এজন্য রিকোয়েস্ট হিসাবে header এ কিছু পাঠাবো যাতে মনে করে এটি একটি ব্রাউজার থেকে রিকোয়েস্ট এসেছে।

header বানাই

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

রিকোয়েস্ট পাঠাই

response = requests.get(url,headers=headers)

এখন soup ভ্যারিয়েবল এর মধ্যে html গুলো চলে এসেছে এইরকম html আউটপুট পাবো

html কে soup ভ্যারিয়েবল এ স্টোর করেছি এখন soup ভ্যারিয়েবল কে ব্যবহার করে html এলিমেন্টকে সিলেক্ট করবো

আমাদের ওয়েবসাইট এর html টি নিচের মত

এখন আমরা নিচের নিয়মে স্ক্রাপ করবো :

import requests
from bs4 import BeautifulSoup
# Send a request to the webpage
url = 'https://subslikescript.com/movie/Titanic-120338'
response = requests.get(url)
# Parse the HTML content of the webpage using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
article = soup.find('article',class_='main-article')
h1 = soup.find('h1').get_text()
p = article.find('p',class_='plot').get_text()
transscript = article.find('div',class_='full-script').get_text() 
# Print the parsed HTML content
print(h1)
print(p)
print(transscript)

ব্যাখ্যা :

রিকোয়েস্ট পাঠাই

import requests
from bs4 import BeautifulSoup
# Send a request to the webpage
url = 'https://subslikescript.com/movie/Titanic-120338'
response = requests.get(url)
# Parse the HTML content of the webpage using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

article কে ধরি

article = soup.find('article',class_='main-article')

article এর মধ্যে h1 কে ধরে প্রিন্ট করি

h1 = article.find('h1').get_text()
print(h1)

article এর মধ্যে p কে ধরে প্রিন্ট করি

p = article.find('p',class_='plot').get_text()
print(p)

article এর মধ্যে div class_=’full-script কে ধরে প্রিন্ট করি

transscript = article.find('div',class_='full-script').get_text() 
print(transscript)

এক্সপোর্ট করি

import requests
from bs4 import BeautifulSoup
# Send a request to the webpage
url = 'https://subslikescript.com/movie/Titanic-120338'
response = requests.get(url)
# Parse the HTML content of the webpage using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
article = soup.find('article',class_='main-article')
h1 = soup.find('h1').get_text()
p = article.find('p',class_='plot').get_text()
transscript = article.find('div',class_='full-script').get_text() 

with open(f'{h1}.txt','w',encoding='utf-8') as file:
    file.write(transscript)

ওয়েব স্ক্র্যাপিং

রিকোয়েস্ট পাঠাই

ব্যাখ্যা :

How can we help?