Mathematical Model and Algorithm for Accurate Main Content Extraction From News Websites
Irrelevant elements like ads, menus, and footers in web pages hinder data extraction and reduce the performance of Retrieval-Augmented Generation (RAG) systems in Large Language Models (LLMs). This paper tackles the challenge of accurately identifying and extracting the main content from web pages t...
| Field | Value |
|---|---|
| Main Authors | Hamza Salem, Hadi Salloum, Manuel Mazzara |
| Format | Article |
| Language | English |
| Published | IEEE, 2025-01-01 |
| Series | IEEE Access |
| Online Access | https://ieeexplore.ieee.org/document/10819347/ |
Similar Items
- Enhancing Environmental Control in Broiler Production: Retrieval-Augmented Generation for Improved Decision-Making with Large Language Models
  by: Marcus Vinicius Leite, et al.
  Published: (2025-01-01)
- IDAS: Intelligent Driving Assistance System Using RAG
  by: Hernandez-Salinas Bernardo, et al.
  Published: (2024-01-01)
- Crafting the Path: Robust Query Rewriting for Information Retrieval
  by: Ingeol Baek, et al.
  Published: (2025-01-01)
- Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
  by: Yosua Setyawan Soekamto, et al.
  Published: (2025-01-01)
- Impact of Chatbots on User Experience and Data Quality on Citizen Science Platforms
  by: Akasha-Leonie Kessel, et al.
  Published: (2025-01-01)