Other Documents
Browsing Other Documents by author "ALEJANDRO MAURICIO VALDÉS JIMÉNEZ"
Showing 1 - 2 of 2
- Publication: A Parallel Approach to Text Data Augmentation for Sentiment Analysis Using the POS Wise Synonym Substitution Algorithm (IEEE CONFERENCIAS, 2023)
  RODRIGO ANDRÉS GUTIÉRREZ BENÍTEZ; ALEJANDRO MAURICIO VALDÉS JIMÉNEZ; ALEJANDRA ANDREA SEGURA NAVARRETE
  Over the last decade, the use of social media as a mass communication medium has given people a tool to express their opinions. There, people write their thoughts and feelings about many topics, generating large amounts of data that can be analyzed by companies and researchers. Within natural language processing, emotion analysis focuses on extracting the underlying emotions in text, while sentiment analysis focuses on extracting its polarity. Traditional machine learning and deep learning techniques are used to accomplish these two tasks. However, to reach good generalization performance, these techniques require large labeled datasets for training. This is an issue for researchers because labeled datasets are scarce in languages like Spanish. To address this, data augmentation techniques are used to generate larger labeled datasets from a small labeled dataset. This work presents an OpenMP version, for shared memory systems, of a data augmentation technique called POS Wise Synonym Substitution, which replaces some of the words in a sentence with synonyms extracted from WordNet to create new sentences. The parallel approach reduced the execution time compared to the original version, reaching a speedup of up to 17.5×.
- Publication: Improving the Discovery and Clustering of Three-Dimensional Protein Patterns with OpenMP (IEEE CONFERENCIAS, 2023)
  ALEJANDRO MAURICIO VALDÉS JIMÉNEZ
  The discovery of conserved three-dimensional (3D) amino-acid patterns among a set of protein structures can be useful, for instance, to predict the functions of unknown proteins or for the rational design of multi-target drugs. Several applications perform a three-dimensional search for patterns in protein structures. However, discovering conserved 3D patterns in a set of proteins with no other baseline patterns is a challenge. In this paper, we analyze and improve a state-of-the-art algorithm, 3D-PP, that implements this discovery. In this algorithm, the 3D patterns are detected and clustered using the root mean square deviation (RMSD) value, measured between each pair of 3D patterns, as a topological variability indicator. Although 3D-PP handles this task, the simultaneous processing of large numbers of proteins becomes a computational challenge as the size and number of proteins to be evaluated grow. In this work, we present and analyze different shared memory parallel strategies for 3D-PP, using OpenMP. These strategies improve the overall performance of the original implementation by reducing load imbalance among threads and increasing overall parallelism. The results show significant performance improvements compared to the original version, achieving up to 13× speedup for a small number of proteins and 17.7× for a larger set.
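
The second abstract mentions RMSD measured between each pair of 3D patterns and load imbalance among OpenMP threads. The following is a minimal sketch of how such a pairwise comparison might be parallelized. It is not the 3D-PP code. The `Pattern` layout, the function names, and the choice of `schedule(dynamic)` are assumptions made for illustration, and the patterns are assumed to be already superimposed (a real structural comparison would normally align them first).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: one pattern is a flat array x0,y0,z0,x1,y1,z1,...
using Pattern = std::vector<double>;

// RMSD between two equally sized coordinate sets.
// Assumption: the two patterns are already superimposed.
double rmsd(const Pattern& a, const Pattern& b) {
    assert(a.size() == b.size() && !a.empty() && a.size() % 3 == 0);
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    // Divide by the number of points (atoms), not the number of coordinates.
    return std::sqrt(sum / static_cast<double>(a.size() / 3));
}

// Returns the symmetric n x n matrix of pairwise RMSD values, row-major.
std::vector<double> pairwise_rmsd(const std::vector<Pattern>& patterns) {
    const long n = static_cast<long>(patterns.size());
    std::vector<double> out(static_cast<std::size_t>(n) * n, 0.0);

    // Row i has n - i - 1 comparisons, so later rows are cheaper.
    // Dynamic scheduling hands rows to threads as they become free,
    // which is one common way to reduce that imbalance.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        for (long j = i + 1; j < n; ++j) {
            const double v = rmsd(patterns[i], patterns[j]);
            out[i * n + j] = v;  // only the thread that owns row i
            out[j * n + i] = v;  // writes these two cells
        }
    }
    return out;
}
```

Built with an OpenMP-enabled compiler (for example `-fopenmp` with GCC or Clang), the outer loop is shared among threads. Without that flag the pragma is ignored and the same code runs serially, which makes it easy to compare timings.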