HIN-MELM-AE AND DePori-BASED AUTOMATIC TEXT SUMMARIZATION FOR MULTI-TEXT DOCUMENTS AND MULTI-LINGUAL SUMMARIES VIA ENSEMBLE LEARNING

Authors

  • Sunil Upadhyay, Amity University
  • Hemant K Soni

DOI:

https://doi.org/10.6977/IJoSI.202512_9(6).0003

Keywords:

Hyperfan-IN Multilayer Extreme Learning Machine Auto Encoder (HIN-MLELM-AE), Sentence Bidirectional Encoder Representations from Transformers (SBERT), Info-Squared Fuzzy C Means Clustering (InS-FCM), Latent Dirichlet Allocation (LDA), Sememe Similarity induced Hidden Markov Model (SemSim-HMM), Parts Of Speech (POS), Term Frequency-Inverse Document Frequency (TF-IDF), and Variational Auto Encoder (VAE).

Abstract

Automatic Text Summarization (ATS) emerged from the need to manage the growing volume of textual information. ATS is the process of creating a short and accurate summary of a longer text document. Prior studies did not perform ATS for multi-document and multi-lingual summaries. This paper presents an improved ensemble learning-based ATS approach with slang filtering, using HIN-MELM-AE and the Dehghani Poor and Rich Optimization algorithm (DePori). First, the text document is taken as input and pre-processed. Slang identification and filtering are then performed on the pre-processed text using DePori. Next, the slang-filtered text is transformed through InS-FCM-based clustering, LDA-based topic modeling, TF-IDF analysis, and frequent term selection. POS tagging is then performed on the transformed data using SemSim-HMM, and the significant entities are extracted from the transformed data and the POS-tagged text. SBERT is then employed to vectorize the entities. The summarization itself is carried out by an ensemble of models comprising HIN-MELM-AE, AE, VAE, and SBERT. The outputs of the ensemble models are evaluated by cosine similarity, followed by voting-based fusion, re-ranking, and optimal sentence selection, which yields the summarized text. The results show that the proposed model achieved an accuracy of 98.72%, outperforming conventional methods.
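The final stages of the pipeline (cosine similarity evaluation, voting-based fusion, and sentence selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the voting scheme, so a Borda count is assumed here, and scoring each sentence against the centroid of all sentence vectors is likewise an assumption. The function names (cosine, centroid, rank_sentences, borda_fuse) are hypothetical, and the per-model sentence vectors stand in for the outputs of HIN-MELM-AE, AE, VAE, and SBERT.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rank_sentences(vectors):
    """Rank sentence indices by similarity to the centroid, best first.

    Assumption: centroid similarity is used as the salience score.
    Ties are broken by the lower sentence index.
    """
    c = centroid(vectors)
    scores = [cosine(v, c) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: (-scores[i], i))

def borda_fuse(rankings, k):
    """Fuse per-model rankings with a Borda count (assumed voting scheme).

    Each model awards n-1 points to its top sentence, n-2 to the next,
    and so on. Returns the k best sentence indices in document order.
    """
    n = len(rankings[0])
    points = [0] * n
    for ranking in rankings:
        for pos, idx in enumerate(ranking):
            points[idx] += n - 1 - pos
    top = sorted(range(n), key=lambda i: (-points[i], i))[:k]
    return sorted(top)
```

In use, each ensemble member would supply one vector per sentence; rank_sentences would be applied to each member's vectors, and borda_fuse would combine the resulting rankings into the indices of the k sentences kept for the summary.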

Published

2025-12-31