Publication: A Parallel Approach to Text Data Augmentation for Sentiment Analysis Using the POS Wise Synonym Substitution Algorithm

Date
2023
Publisher
IEEE Conferences
Abstract
Over the last decade, the use of social media as a mass communication medium has given people a tool to express their opinions. There, people write their thoughts and feelings about many topics, generating large amounts of data that can be analyzed by companies and researchers. Emotion analysis and sentiment analysis are both natural language processing tasks: emotion analysis focuses on extracting the underlying emotions in a text, while sentiment analysis focuses on extracting its polarity. To accomplish these two tasks, traditional machine learning and deep learning techniques are used. However, to reach good generalization performance, these techniques require large labeled datasets for training. This is an issue for researchers because labeled datasets are scarce in languages like Spanish. To address this, data augmentation techniques are used to generate larger labeled datasets from a small labeled dataset. This work presents an OpenMP version, for shared-memory systems, of a data augmentation technique called POS Wise Synonym Substitution, which replaces some of the words of a sentence with synonyms extracted from WordNet to create new sentences. With the parallel approach, we reduced the execution time compared to the original version, reaching a speedup of up to 17.5x.
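The idea summarized in the abstract can be illustrated with a minimal C++ sketch. It is not the authors' implementation. The `Token` struct, the `SynonymTable` type, and the functions `augment` and `augment_all` are hypothetical names introduced here, and a small in-memory table stands in for WordNet. The abstract does not say how the words to replace are chosen or how the work is divided among threads, so the sketch assumes that every word with a synonym under the same part-of-speech tag is replaced and that sentences are processed independently in an OpenMP `parallel for` loop.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A word paired with its part-of-speech tag (assumed to come from a POS tagger).
struct Token {
    std::string word;
    std::string pos;
};

// Hypothetical synonym table keyed by (word, POS tag); a stand-in for WordNet.
using SynonymTable = std::map<std::pair<std::string, std::string>, std::string>;

// Build one augmented sentence: a token is replaced only when the table holds
// a synonym for the same word under the same POS tag; other tokens are kept.
std::string augment(const std::vector<Token>& sentence, const SynonymTable& table) {
    std::ostringstream out;
    for (std::size_t i = 0; i < sentence.size(); ++i) {
        const Token& t = sentence[i];
        auto it = table.find(std::make_pair(t.word, t.pos));
        if (i > 0) out << ' ';
        out << (it != table.end() ? it->second : t.word);
    }
    return out.str();
}

// Augment every sentence of a corpus. Sentences are independent, so the loop
// iterations can run on different threads: the table is only read, and each
// iteration writes to its own slot of the result vector.
std::vector<std::string> augment_all(const std::vector<std::vector<Token>>& corpus,
                                     const SynonymTable& table) {
    std::vector<std::string> result(corpus.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(corpus.size()); ++i) {
        result[i] = augment(corpus[i], table);
    }
    return result;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` on GCC or Clang), the loop is shared among threads; without it, the pragma is ignored and the code runs serially with the same result. Keying the table on the POS tag as well as the word is what keeps a substitution from crossing grammatical categories.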
Keywords
Text Data Augmentation, Sentiment Analysis, OpenMP, Natural Language Processing, Emotion Analysis