Publication: An Online Multi-Source Summarization Algorithm for Text Readability in Topic-Based Search

Date
2021
Publisher
Computer Speech and Language
Abstract
Web search users are likely to face problems caused by the sheer amount of available data. As the quantity of online content grows, the risk of missing relevant information during search can only increase. Moreover, external variables such as the users' reading proficiency level can further complicate the task.
This article proposes an online multi-document summarization algorithm for text readability as a means to simplify web search. The algorithm is designed to work over collections of topic-related documents, such as those returned in response to a web query. Unlike most modern approaches, it requires no prior training.
The algorithm was tested on both English- and Spanish-language documents, using different metrics of term and sentence relevance. The results were compared against summaries created by human summarizers and by third-party automatic text summarization (ATS) systems in terms of two variables: readability and information content.
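As an illustration only, the following Python sketch shows what a training-free, online extractive summarizer over a collection of topic-related documents can look like. It is not the algorithm proposed in the article, and it does not model readability. The term-relevance metric (fraction of documents in the collection that contain a term), the sentence-relevance metric (mean term relevance), the naive sentence splitter, the tiny stop-word list, and the names `OnlineSummarizer`, `add_document` and `summarize` are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (English and Spanish);
# a real system would use full language-specific lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
              "el", "la", "de", "y", "en", "los", "las", "un", "una"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


class OnlineSummarizer:
    """Hypothetical summarizer: statistics are updated as each document
    arrives, so there is no training phase."""

    def __init__(self):
        self.doc_freq = Counter()  # number of documents containing each term
        self.num_docs = 0
        self.sentences = []        # candidate sentences, in arrival order

    def add_document(self, text):
        sentences = split_sentences(text)
        terms_in_doc = set()
        for s in sentences:
            terms_in_doc.update(tokenize(s))
        self.doc_freq.update(terms_in_doc)  # count each term once per document
        self.num_docs += 1
        self.sentences.extend(sentences)

    def term_relevance(self, term):
        # Assumed metric: terms shared by many topic-related results score higher.
        return self.doc_freq[term] / self.num_docs

    def sentence_relevance(self, sentence):
        # Assumed metric: mean relevance of the sentence's content terms.
        terms = tokenize(sentence)
        if not terms:
            return 0.0
        return sum(self.term_relevance(t) for t in terms) / len(terms)

    def summarize(self, k=3):
        # Drop exact duplicates, take the k most relevant sentences,
        # and return them in their original arrival order.
        unique = list(dict.fromkeys(self.sentences))
        top = sorted(unique, key=self.sentence_relevance, reverse=True)[:k]
        return [s for s in unique if s in top]


if __name__ == "__main__":
    summarizer = OnlineSummarizer()
    summarizer.add_document("Solar panels convert sunlight into electricity. They are installed on roofs.")
    summarizer.add_document("Solar panels reduce electricity bills. Prices fell last year.")
    summarizer.add_document("Many homes use solar panels. Electricity from the sun is clean.")
    print(summarizer.summarize(k=2))
```

With the three sample documents, "solar", "panels" and "electricity" appear in every document, so the two sentences that combine them with few other terms rank highest: "Solar panels convert sunlight into electricity." and "Solar panels reduce electricity bills." Because statistics are only accumulated, each new search result costs time proportional to its own length, which is what makes this kind of sketch usable online.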
For both variables, the results show widespread gains over both the human summarizers and the third-party ATS systems. Furthermore, the algorithm achieved these results with a time complexity strictly lower than […], well below that of traditional machine learning approaches.