Automating Article Summarization and Posting to Telegram using Python
In this blog post, we will walk through a Python script that automates the process of extracting text from a web page, summarizing it into key points using the Groq API, and posting the summary to a Telegram channel. This can be particularly useful for curating and sharing tech content efficiently.
Overview of the Script
The script is built around three functions, one for each step:
- Extracting text from a URL using BeautifulSoup
- Summarizing the text with the Groq API
- Posting the summary to a Telegram channel
Let’s dive into each part of the script.
Importing Necessary Libraries
import requests
from groq import Groq
from bs4 import BeautifulSoup
We start by importing the necessary libraries: requests for making HTTP requests, BeautifulSoup for parsing HTML content, and Groq for interacting with the Groq API.
Setting Up Variables
channel_id = "@ArticleDigestTech"
token = "<<TOKEN>>"
url = "https://techcommunity.microsoft.com/t5/educator-developer-blog/local-development-using-azure-cosmos-db-emulator-at-no-cost/ba-p/4153822"
title = "Local Development using Azure Cosmos DB Emulator at no Cost"
client = Groq(api_key="<<API_KEY>>")
Here, we define variables for the Telegram channel ID, bot token, article URL, and title. We also initialize the Groq API client with an API key.
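Hardcoding the bot token and API key is convenient for a demo, but secrets are usually kept out of source files. Here is a minimal sketch of reading them from environment variables instead. The helper name load_settings and the variable names TELEGRAM_BOT_TOKEN and GROQ_API_KEY are assumptions made for this example, not part of the original script.

```python
import os


def load_settings(env=os.environ):
    """Collect secrets from environment variables instead of source code."""
    # The variable names are assumptions for this sketch; use whatever you export.
    required = ("TELEGRAM_BOT_TOKEN", "GROQ_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"token": env["TELEGRAM_BOT_TOKEN"], "api_key": env["GROQ_API_KEY"]}
```

The returned values could then be used in place of the literal strings, for example token = load_settings()["token"].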
Defining the Question Template
question_template = (
    "Please extract the main points of the article as bullet list. Article is an extraction from html page as a text. "
    "Provide only valuable information for senior dotnet developer. Do not add any non-valuable information from software development perspective. "
    "Return ONLY bullet points WITHOUT any explanation or comments. Don't be verbose. "
    "Expected result: "
    "- point 1 "
    "- point 2 "
    "- point 3 "
    "- point N "
    "INPUT:\n{}"
)
The question_template is used to instruct the Groq API on how to summarize the article. It specifies that only valuable information for senior .NET developers should be included.
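Long articles may not fit into the model's context window, so it can help to trim the article before filling in the template. The sketch below shows one way to do that. The helper name build_prompt and the max_chars budget of 12000 are assumptions for illustration; a character count is only a rough stand-in for the token count the model actually uses.

```python
def build_prompt(template, article_text, max_chars=12000):
    """Fill the template, trimming the article to a rough character budget."""
    # max_chars is an assumed budget; characters only approximate tokens.
    trimmed = article_text[:max_chars]
    # str.format inserts the argument as-is, so braces in the article are safe.
    return template.format(trimmed)
```

With the script's own template this would be called as build_prompt(question_template, article_text).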
Extracting Text from the URL
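def extract_text_from_url():
    # A timeout keeps the script from hanging on an unresponsive server.
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        text = soup.get_text()
        # Strip leading and trailing whitespace from every line.
        lines = (line.strip() for line in text.splitlines())
        # Break phrases separated by two spaces into separate chunks.
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        # Drop empty chunks and join the rest, one per line.
        text = '\n'.join(chunk for chunk in chunks if chunk)
        return text
    else:
        return f"Failed to retrieve the webpage. Status code: {response.status_code}"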
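This function sends a GET request to the specified URL, parses the HTML content with BeautifulSoup, and extracts the text, stripping extra whitespace and dropping empty lines along the way. If the response status is not 200, it returns an error message instead of the article text.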
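To see what the whitespace cleanup does in isolation, here is a minimal sketch that applies the same three steps to a plain string. It assumes the splitter is a two-space string, as in the common BeautifulSoup cleanup idiom; clean_text is a hypothetical helper name used only for this example.

```python
def clean_text(raw):
    """Normalize whitespace in text extracted from an HTML page."""
    # Strip leading and trailing whitespace from every line.
    lines = (line.strip() for line in raw.splitlines())
    # Treat runs of two or more spaces as a boundary between phrases.
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop empty chunks and put each remaining phrase on its own line.
    return "\n".join(chunk for chunk in chunks if chunk)


sample = "  Title  \n\n  First column    Second column\n"
print(clean_text(sample))
```

The sample prints three lines: Title, First column, and Second column. Words separated by a single space stay together on one line.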
Summarizing the Text with Groq API
def get_answer_from_groqcloud(article_text):
    question = question_template.format(article_text)
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": question,
            }
        ],
        model="llama3-8b-8192"
    )
    return chat_completion.choices[0].message.content
This function formats the extracted article text with the question_template and sends it to the Groq API. The response, which contains the summarized points, is returned.
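Even when the prompt asks for bullet points only, a model may still add an introduction or a closing remark. A small post-processing step can filter those out before posting. This is a sketch under that assumption; keep_bullets is a hypothetical helper, and the set of bullet markers it accepts is a guess at what the model might produce.

```python
def keep_bullets(answer):
    """Return only the lines of the model output that look like bullet points."""
    bullets = []
    for line in answer.splitlines():
        stripped = line.strip()
        # Accept a few common bullet markers and normalize them to "- ".
        if stripped.startswith(("- ", "* ", "• ")):
            bullets.append("- " + stripped[2:].strip())
    return "\n".join(bullets)
```

It could be applied to the return value of get_answer_from_groqcloud before the summary is sent on.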
Posting the Summary to Telegram
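def send_telegram_message(message):
    telegram_url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = {
        "chat_id": channel_id,
        # The title becomes a Markdown link to the article, followed by the summary.
        "text": f"[{title}]({url})\n{message}",
        "parse_mode": "Markdown"
    }
    # A timeout keeps the script from hanging if the Telegram API is unreachable.
    response = requests.post(telegram_url, data=payload, timeout=30)
    return response.json()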
This function sends the summarized points to the specified Telegram channel using the Telegram bot API. The message is formatted in Markdown.
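Telegram limits a single message to 4096 characters, so a long summary may need to be sent in several parts. The sketch below splits text on line boundaries where possible. The helper name split_message is an assumption for this example, and it does not try to keep Markdown entities intact across chunks.

```python
def split_message(text, limit=4096):
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size pieces.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to send_telegram_message in turn.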
Main Function
if __name__ == "__main__":
    try:
        article = extract_text_from_url()
        if "Failed to retrieve the webpage" in article:
            raise ValueError(article)
        print("Extracted Article Text:")
        print(article)
    except (requests.exceptions.RequestException, ValueError) as e:
        print("Error extracting text from URL:", e)
        raise SystemExit(1)
    try:
        answer = get_answer_from_groqcloud(article)
        print("Extracted Main Points:")
        print(answer)
    except Exception as e:
        # The Groq client does not raise requests exceptions, so catch broadly here.
        print("Error calling GroqCloud API:", e)
        raise SystemExit(1)
    try:
        response = send_telegram_message(answer)
        print("Response from Telegram API:")
        print(response)
    except requests.exceptions.RequestException as e:
        print("Error posting message to Telegram:", e)
        raise SystemExit(1)
The main block orchestrates the entire process: extracting the article text, summarizing it, and posting the summary to Telegram. Each step is wrapped in its own try/except block, so a failure is reported and the script exits with a non-zero status instead of continuing with bad data.
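One case the exception handling does not cover: when Telegram rejects a message, the HTTP call itself still succeeds and the failure is reported in the JSON body, where the ok field is false and description explains why. A minimal sketch of checking for that follows; check_telegram_response is a hypothetical helper name.

```python
def check_telegram_response(result):
    """Raise if the Bot API reported a failure in its JSON body."""
    if not result.get("ok"):
        description = result.get("description", "no description provided")
        raise RuntimeError(f"Telegram API error: {description}")
    # On success, the sent message is available under the "result" key.
    return result["result"]
```

Calling it on the dictionary returned by send_telegram_message turns a silent rejection, such as a Markdown parsing error, into a visible failure.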
Conclusion
This script automates the extraction, summarization, and sharing of technical articles. By combining BeautifulSoup for text extraction with the Groq API for summarization, it condenses an article into the points most relevant to your audience, saving time and effort. Whether you’re a content curator or a tech enthusiast, this script can be a valuable tool in your workflow.