The Department of Informatics at Diponegoro University once again held the Undip Global Classroom (UGC) as part of its effort to strengthen international academic collaboration and develop students' global insight. On this occasion, the UGC session was held for the Natural Language Processing (NLP) course and featured an international speaker, Dr. Suraya Alias from Universiti Malaysia Sabah.
Carrying the theme "Beyond Text Summarization: Then and Now", the guest lecture discussed the development of text summarization technology from classical rule-based approaches to modern approaches based on Large Language Models (LLMs) and the Transformer architecture.
In the presentation, Dr. Suraya explained that the current explosion of digital information makes it difficult for humans to process all of this text manually. Automatic text summarization is therefore an important solution in the field of NLP for producing summaries that are short and accurate while still preserving the core information of the original document.
Students were introduced to the two main approaches to text summarization, namely:
Extractive Summarization, which selects important sentences directly from the source text without modification, and
Abstractive Summarization, which produces new sentences by paraphrasing the source and interpreting its context in depth.
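As a rough illustration of the extractive approach described above, the following minimal sketch scores each sentence by the average frequency of its words and keeps the top-scoring ones unchanged. The function name `extractive_summary`, the scoring rule, and the sample text are assumptions made for this example; they are not taken from the lecture materials.

```python
import re
from collections import Counter

def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document (lowercased, letters only).
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Pick the indices of the top-k sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

# Made-up sample text for illustration only.
doc = ("Summarization shortens text. "
       "Extractive summarization copies sentences from the text. "
       "The weather is nice.")
summary = extractive_summary(doc, k=1)
print(summary)
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. A paraphrasing summarizer would instead need a generative model.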
In addition to explaining the basic concepts, the session traced the evolution of NLP architectures, from statistical methods such as TF-IDF and vector space models, through deep learning based on Seq2Seq and LSTM, to the Transformer revolution, whose self-attention mechanism is the foundation of modern models such as BERT, GPT, T5, and various current LLMs.
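To give a sense of the statistical end of that timeline, here is a minimal sketch of TF-IDF weighting in plain Python. It assumes one common variant (term frequency normalised by document length, multiplied by the natural log of the inverse document frequency); the function name `tf_idf` and the three sample documents are invented for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            # (count / document length) * log(N / document frequency)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Made-up documents for illustration only.
docs = ["the model summarizes text", "the model translates text", "the cat sleeps"]
w = tf_idf(docs)
print(w[0])
```

A word that occurs in every document, such as "the" here, receives a weight of zero, while a word unique to one document receives the highest weight. Library implementations often add smoothing, so their numbers can differ from this sketch.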
One part that particularly drew the participants' attention was the discussion of the Transformer's attention mechanism, which allows the model to relate all the words in a sentence to one another simultaneously, resulting in a much better understanding of language than earlier NLP approaches.
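The idea of relating every word to every other word at once can be sketched with scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The example below is a simplification under stated assumptions: it uses plain Python lists, and it feeds the same toy vectors in as queries, keys, and values, whereas a real Transformer first applies learned projection matrices. The function names and the input `X` are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Three toy "token" vectors (assumption: no learned projections, Q = K = V = X).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(X, X, X)
for row in attn:
    print([round(a, 3) for a in row])
```

Each row of `attn` sums to 1 and shows how strongly one token attends to every token in the sequence, all computed in a single pass rather than word by word.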
This UGC activity was also complemented by two interactive demonstration sessions. The first demo featured a direct comparison between extractive and abstractive summarization using English text on the theme of Artificial Intelligence. In the demonstration, participants were able to see how the extractive approach only selected important parts of the original text, while the abstractive approach produced a new summary that was more natural and concise.
The second demo showed the process of fine-tuning a Transformer model for the text summarization task using the Airline Review Dataset. In this session, students were introduced to the stages of model training, performance evaluation using metrics such as ROUGE and BERTScore, and the analysis of the summaries produced by the BART model after fine-tuning.
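For readers unfamiliar with ROUGE, the sketch below computes ROUGE-1 (unigram overlap) precision, recall, and F1 between a reference summary and a generated one. It is a simplified illustration that assumes whitespace tokenisation and no stemming; the function name `rouge_1` and the two sentences are made up and do not come from the Airline Review Dataset or the demo itself.

```python
from collections import Counter

def rouge_1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 (whitespace tokens, no stemming)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical reference and model output, for illustration only.
scores = rouge_1("the flight was delayed by two hours", "the flight was delayed")
print(scores)
```

Here every word of the candidate appears in the reference, so precision is 1.0, but the candidate covers only four of the seven reference words, so recall is lower. BERTScore, by contrast, compares contextual embeddings rather than exact word matches and requires a pretrained model.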
Dr. Suraya also shared research experience from developing the Malay Text Summarizer Framework, which combines sentence compression and sentence clustering techniques to improve the quality of automatic summaries in Malay.
Through this activity, students not only gained a theoretical understanding of the development of modern NLP, but also gained practical experience in applying generative AI and Transformer technology to text summarization. This Undip Global Classroom activity is expected to broaden students' international horizons and strengthen academic networks between Diponegoro University and Universiti Malaysia Sabah in the field of artificial intelligence and natural language processing.