Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 192 (2021) 3432–3439
www.elsevier.com/locate/procedia

25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

Wordhyve: A context-aware language learning app for vocabulary enhancement through images and learning contexts

Mohammad Nehal Hasnine a,*, Junji Wu a

a Research Center for Computing and Multimedia Studies, Hosei University, 3-7-2 Kajinocho, Koganei city, Tokyo 184-8584, Japan

Abstract

Vocabulary acquisition is an essential component of mastering any language, as words are the building blocks of a language. In informal learning, foreign language learners often struggle to memorize new vocabulary, and therefore new tools need to be developed to facilitate vocabulary acquisition. In computer-assisted learning environments, images are often used as annotations to represent words because images convey the essence of a word more effectively than verbal descriptions. Understanding the contexts in which learning happens is also crucial for any computer-assisted learning environment. From this standpoint, this research develops a context-aware language learning app called Wordhyve. Wordhyve, a native Android app, is built to support foreign language learners in memorizing foreign vocabulary using multimedia annotations, including images, texts, translation, voices, and contextual clues. Wordhyve allows language learners to capture and record lifelogs. Later, the app uses those learning experiences as triggers to enhance foreign vocabulary. Wordhyve's analytics use those logs to recommend incidental vocabularies and assist learners in memorizing the recommended vocabularies using various learning contexts.
Wordhyve uses image analytics on the learner-captured images to recommend those incidental vocabularies.

© 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of KES International.

Keywords: context-aware application, image analytics, incidental vocabulary, language learning, learning context, vocabulary, Wordhyve

* Mohammad Nehal Hasnine (Corresponding author.) Tel.: +81-(0)42-387-6070; fax: +81-(0)42-387-6085.
E-mail address: nehal.hasnine.79@hosei.ac.jp

1877-0509
10.1016/j.procs.2021.09.116

1. Introduction

1.1. Importance of vocabulary enhancement

Vocabulary is an inseparable component of language development as it is impossible to read, write, and communicate without having much command of it. Vocabulary can be acquired either in traditional classroom settings with continuous support from the instructors or using language learning applications.
In classroom settings, instructors traditionally prepare the set of vocabulary items unit by unit. Then, the instructors urge the students to use the items in various contexts, acquire associated words through dialogs, or read textbooks. In technology-assisted learning, technologies guide the entire learning process. This type of learning is referred to as informal learning, where dedicated language learning applications provide a considerable amount of support as an alternative to human instructors.

The process of vocabulary learning involves four stages: Discrimination, Understanding Meaning, Remembering, and Consolidation and Extension of Meaning [1]. According to Grauberg [1], discrimination involves a learner's ability to separate sounds and characters from those next to them and from the sounds and characters of similar words when listening and reading, and to keep them distinct when speaking and writing. Understanding meaning refers to grasping the concept behind foreign words and phrases. Remembering refers to retention: ensuring that, a certain period after a new word has been introduced and explained, the word stays in our short-term and long-term memory. The consolidation and extension of meaning stage is associated with relearning, as word learning is not instantaneous. The discrimination and understanding meaning stages are the basic steps; they are closely associated with learning style and have been much researched. In contrast, the remembering and consolidation stages are more related to the cognitive processes of our brains. After finding out the meaning of a word, a learner who has no reason to keep practicing it will forget it [2].

1.2. Association between images and human memory for vocabulary enhancement

As vocabulary learning is not an instantaneous process, new words often take time to be absorbed and to remain in our permanent memory.
After this slow learning process, once words become fully integrated into a learner's permanent memory, they can be used with the same sort of fluency as native speakers [1], [2]. As newly learned words are prone to being forgotten quickly, retention remains a critical challenge for learning via mobile applications. So far, several strategies and techniques, such as the multimedia effect [3], the spacing effect, concept maps [4], the picture superiority effect, and interactive imagery, have been used in designing context-aware systems.

In computer-assisted learning, one of the most common vocabulary acquisition strategies is the use of images. It is said that one image is worth a thousand words because a single still image conveys its meaning or essence more effectively than verbal descriptions. With smartphone technology at our fingertips, capturing many photos in authentic contexts has become more common. It is reported that 1.4 trillion photos were taken in 2020, and it is predicted that about 1.6 trillion photos will be captured by 2022 [5]. This vast number of images is mainly shared through social networks. While scholars use images for various research purposes such as deep learning, image recognition, and social network analysis, using image analytics to facilitate vocabulary acquisition remains a largely unexplored area. Images are not just visual representations; they can embed text and other non-visual information that could be used to design new learning systems.

In vocabulary acquisition, images have several roles to play. Scholars have suggested that vocabulary acquisition with both labels and pictures is more effective than vocabulary acquisition with labels only [6]. According to the dual-coding theory introduced by Paivio [7], visual and verbal information are processed in different parts of the brain. The visual channel of our brain processes visual information and produces pictorial representations.
In contrast, the verbal channel of our brain processes verbal information and produces verbal representations. In our brain, both visual and verbal information are selected and held in visual and working memory. Then, a learner establishes mental connections that organize information into cause-and-effect chains. Finally, the visual model, verbal mental model, and prior knowledge are merged by constructing referential connections among them [8]. Scholars have also suggested that remembering information can be significantly enhanced after such associations are formed in our brain [9]. For this reason, scholars frequently use images as a visual aid for vocabulary acquisition.

1.3. Purpose of this project

This project aims to assist foreign language learners in enhancing foreign vocabulary using images and learning contexts. In other words, it researches incidental and intentional vocabulary learning through images. Furthermore, it aims to investigate the contextual clues associated with language learning. In this study, Wordhyve, a language learning app, supports foreign language learners in intentional and incidental vocabulary learning. Wordhyve is a native Android app developed using Kotlin. It is designed so that learners can record their own learning experiences in the app. The app uses image analytics and learning analytics to understand a learner's learning context and to recommend incidental vocabularies using an intentional vocabulary log. In this paper, we introduce the app and discuss its development.

2. Literature review

2.1. Survey on research systems

In mobile learning, Alemi et al. investigated academic vocabulary acquisition and retention using a sophisticated short-message service [10].
This study collected vocabulary retention data from university students and concluded that the short-message service has more effect on vocabulary retention than a traditional dictionary. A study by Agca et al. examined the effect of multimedia content delivered through barcode technology on vocabulary learning [11]. This study found that their mobile-assisted learning environment increased vocabulary knowledge in a foreign language. Two systems, namely PSI [12] and MultiPod [13], were developed on a mobile platform to support English vocabulary learning. For each English word, these systems create a 5-second-long learning material consisting of the spelling, the meaning, and a short video clip together with the pronunciation data, to aid learners of English as a second language. These studies suggest that learning vocabulary with this kind of learning material is more effective for long-term memory retention than pen-and-paper-based learning.

In context-aware ubiquitous learning, Chen et al. proposed a personalized context-aware ubiquitous learning system called PCULS [14]. When a learner learns an English word in this system, the system detects the location and learning time using wireless positioning technology. The PCULS system has been implemented on PDA devices and tested in a school environment to support effective situational English vocabulary learning; the results indicated that learning with context-awareness was superior to learning without it. Ogata et al. introduced a context-aware system [15] to capture contextual information such as location, time, and lifelog images. The system has been used for various purposes, including task-based learning, business Japanese vocabulary acquisition, authentic learning, and contextual image recommendation [16]. Later, Huang et al. introduced another ubiquitous learning system, UEVL (Ubiquitous English Vocabulary Learning) [17].
Their research looked into the systematic vocabulary learning process in various learning contexts using the UEVL system. As learning in context is crucial in foreign vocabulary development, several studies have looked into this research aspect.

In web-based learning scenarios, AIVAS (Appropriate Image-based Vocabulary Acquisition System) was proposed to provide learning material creation support in foreign vocabulary learning [18]–[20]. The IU Ecosystem (Image Understanding Ecosystem) [21] and IVLS (Incidental Vocabulary Learning System) [22] also support learners in web-based learning.

2.2. Survey on commercial apps

In order to understand the state of the art in technology-enhanced language learning research, this study surveyed popular apps in the marketplace. The objective was to understand the features of the existing apps and their limitations. The results presented in Table 1 are based on four criteria: first, the types of multimedia annotations (such as image, video, animation, game, phrase cards, and breakthrough games) used in the app; second, the strategies followed to create learning material and deliver it to the learner; third, the kind of quiz the app generates; and lastly, the development platform used to build and release the app.

Table 1. Technology-enhanced language learning apps

| Name | Platform | Pros | Cons |
| --- | --- | --- | --- |
| Duolingo | Android | 1. User-friendly user interfaces; 2. Breakthrough game is exciting and inspiring; 3. Achievement system to monitor the user's learning; 4. Content of gradually increasing difficulty is recommended; 5. Duplicate wrong question detection; 6. Pre-test saves the user's time; 7. Uniqueness in question creation | 1. Lacks a systematic tutorial and instead puts the main learning process into a well-designed breakthrough game; 2. Question sets are manually designed; 3. All components of a language are put into the quiz, so users without basic knowledge do not know what to do |
| Lingvist | Android | 1. Uses AI to teach language in an efficient way; 2. Focuses on vocabulary; 3. AI precisely recommends themes and questions based on a knowledge graph and incorrectly answered quizzes; 4. Users can check the studying curve; 5. Unique conversation challenge | 1. Does not use many multimedia annotations such as image and video; 2. Has different levels of support for different target languages |
| SuperMemo | Android | 1. Makes full use of multimedia; 2. Organizes content into themes | 1. Customized course is complicated; 2. Study process is more like exercise, which can be dull |
| Beelinguapp | iOS | 1. Uses all kinds of text material, including news, stories, articles, and music; 2. Users can add words to the glossary and practice pronunciation; 3. Users can choose from various text categories such as history, culture, mystery, science, and technology; 4. Rich audio related to the text | 1. Only uses texts and related audio; 2. Users' levels are only roughly divided into beginner, intermediate, and advanced; 3. Puts the primary language and the target language together, which makes learning harder |

3. Design guidelines

Following the guidelines for context-aware learning environments, the Wordhyve system is designed to understand:

• who, what, when, and where a learner has lifelogged a learning experience; and
• how a captured learning experience is utilized for learning vocabulary using the functions of the app.

In this study, who refers to the learner; what refers to the vocabulary in the form of lifelogging; when refers to the time lifelogging takes place; where refers to the location; and how refers to the process of learning. To collect the data required to meet the objective of the app, the app is designed to have the sign-up, log-in, lifelogging, assess knowledge, learning technique, and learning progress functions. The sign-up and log-in functions answer who.
The lifelogging function provides the details on what, when, and where. For assessing knowledge and learning technique, additional functions are to be built that reflect how the learning process takes place for each of the lifelogged vocabularies. In the dashboard, the learning progress function lets a learner monitor learning history and outcomes.

4. Development of Wordhyve

In this section, we discuss the development of the app. In 4.1, we emphasize user registration, the log-in process, and image analytics for incidental vocabulary generation. In 4.2, we provide the details of the technical specifications.

4.1. User interfaces

To use the app, a user must sign up by entering demographic information. During sign-up, personal information such as name, gender, age, nationality, native language, and languages of interest needs to be provided. An email address and a corresponding password are required for a new user to sign up for the service. A valid email address and the correct password are required to log in to the system. Figure 1 shows the UIs for user registration, log creation, and incidental vocabulary recommendation.

Fig. 1. Wordhyve system

For example, when a learner creates a log in the app for the intentional vocabulary 'train', the system analyzes the image using image analytics and generates a new vocabulary, 'electric locomotive', which we refer to as incidental vocabulary. Wordhyve recommends such incidental vocabularies instantly and lets the learner decide whether or not to learn them.

4.2. Technical specification

Table 2 presents the technical specification of the app.

Table 2. Technical specification of the Wordhyve

| Spec. | Details |
| --- | --- |
| Image analysis model 1 | Microsoft cognitive vision services |
| Image analysis model 2 | Megvii's deep learning APIs |
| Platform (OS) | Android |
| Programming language | Kotlin |
| Database | Firebase |

4. Data

In Wordhyve, data on various aspects are collected and analyzed to make the app context-aware. Demographic data captured using the app are presented in Table 3.

Table 3. Data captured using the app for building analytical features

| Data type | Description | Example |
| --- | --- | --- |
| User name | First and last name of the user | John Smith |
| Email | Email address of the user | wordhyve@wordhyve.com |
| Password | Combination of numbers and strings | AAA111 |
| Age | Calculated from date-of-birth | 25 |
| Nationality | Nationality | China |
| Occupation | Occupation | Student |
| Native language | The language a learner is the most familiar with | Chinese |
| Target language(s) | The languages that a learner wishes to learn | Japanese, English, Spanish |
| Place | Primary learning location | Home, Library, Café |
| Latitude | Latitude of a learner according to the smartphone's location | 21.148689 |
| Longitude | Longitude of a learner according to the smartphone's location | 79.040802 |

In Wordhyve, data associated with a particular learning context are also captured. A list of the data that is logged to understand a learning context is presented in Table 4.

Table 4.
Log data captured using the Wordhyve for building analytical features

| Data type | Description | Example |
| --- | --- | --- |
| Word | The word or the phrase input by the learner | Train |
| Image | The image uploaded by the learner to memorize a particular word or phrase | train.jpg |
| Learning context | The memo taken by the learner to describe a learning context | The dark color train is very classy |
| Reco_inci_word | A list of incidental vocabularies recommended by the Wordhyve | Electric locomotive |
| Secleted_inci_word | The list of the selected words from the recommended words | Electric locomotive |
| Time | Time of learning (in ISO standard) | 2021-05-01 11:04:45 JST |
| Latitude | Latitude of a learner according to the smartphone's location | 21.148689 |
| Longitude | Longitude of a learner according to the smartphone's location | 79.040802 |
| EXIF | Metadata of an image including place where the picture was taken | Hosei University, Koganei campus |

5. Summary

Vocabulary learning is a complex task, as instructors do not focus much on vocabulary learning in the classroom. In general, language instructors expect learners to learn vocabulary through informal learning. While learning vocabulary through informal learning methods, learners use intentional and incidental learning approaches. To acquire vocabulary using these approaches, learners use language learning apps that have vocabulary learning features. Most language learning apps use various multimedia annotations, such as images, texts, audio, and animations, in creating learning materials. In addition, language learning apps use analytical functions to understand a learner's contextual information, as contexts play a crucial role in vocabulary memorization.

In this paper, Wordhyve, a context-aware language learning app, is introduced to the language learning literature. Wordhyve allows learners to create learning logs in authentic contexts.
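To make the shape of such a learning log concrete, the following Python sketch models one log entry with fields loosely following Table 4 and derives candidate incidental vocabulary from image tags. It is illustrative only and is not the Wordhyve implementation: the app itself is written in Kotlin, and the `LearningLog` class, its field names, the `recommend_incidental` helper, the confidence threshold, and the example tags are all assumptions made for this sketch. The tag dictionary merely stands in for the output of an image-analysis service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class LearningLog:
    """One lifelog entry; fields loosely follow Table 4 (names are hypothetical)."""
    word: str                # intentional vocabulary entered by the learner
    image: str               # file name of the learner-captured image
    learning_context: str    # learner's memo describing the context
    time: datetime           # when the log was created
    latitude: float
    longitude: float
    recommended: List[str] = field(default_factory=list)
    selected: List[str] = field(default_factory=list)


def recommend_incidental(log: LearningLog,
                         tags: Dict[str, float],
                         min_confidence: float = 0.5) -> List[str]:
    """Return image tags that differ from the logged word, best first.

    `tags` maps a tag to a confidence score and stands in for the output
    of an image-analysis service. The threshold is an assumed value.
    """
    candidates = [
        (tag, score) for tag, score in tags.items()
        if score >= min_confidence and tag.lower() != log.word.lower()
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in candidates]


JST = timezone(timedelta(hours=9))
log = LearningLog(
    word="Train",
    image="train.jpg",
    learning_context="The dark color train is very classy",
    time=datetime(2021, 5, 1, 11, 4, 45, tzinfo=JST),
    latitude=21.148689,
    longitude=79.040802,
)

# Made-up tags for train.jpg; a real service would supply these.
example_tags = {"train": 0.98, "electric locomotive": 0.91, "sky": 0.30}
log.recommended = recommend_incidental(log, example_tags)

# The learner decides which recommended words to keep.
log.selected = [w for w in log.recommended if w == "electric locomotive"]
print(log.recommended)
```

In this sketch the learner's accept-or-reject decision is modelled simply as filling `selected` from `recommended`, which mirrors the separation between recommended and selected incidental words in Table 4.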
A learner can capture images in the app and use them to create an intentional vocabulary learning material. After that, Wordhyve's analytical function uses the intentional learning log to generate incidental vocabularies that the learner could learn. This analytical function performs image analytics using the cognitive vision APIs listed in Table 2. The app also uses various ubiquitous learning logs (described in Table 3 and Table 4) to support vocabulary learning using learning contexts.

The current work has some limitations. For example, this paper does not include an evaluation experiment measuring the efficiency of the approach or the usability of the system. To measure the system's effectiveness, we plan to conduct a user study in which we collect data from foreign language learners. The experiment would measure i) short-term and long-term memory retention and ii) the ratio of Wordhyve-recommended incidental vocabularies accepted by the learner. Another potential limitation is the application behavior of Wordhyve. At present, the system requires a picture to be uploaded together with the word to be learned. However, a foreign language learner may not know the name of the object he/she wishes to learn in a new language. A feature called scene analysis AI (refer to Fig. 2) would be implemented to address this issue. This feature would leverage modern image recognition technologies to infer the objects in an image and name them in the target language.

Acknowledgements

This project is supported by JSPS Grant-in-Aid for Young Scientists 21K13651.

References

[1] W. Grauberg, The elements of foreign language teaching, vol. 7. Multilingual Matters, 1997.
[2] R. Rohmatillah, “A study on students’ difficulties in learning vocabulary,” English Education: Jurnal Tadris Bahasa Inggris, vol. 6, no. 1, pp. 75–93, 2014.
[3] Y. Yeh and C. Wang, “Effects of multimedia vocabulary annotations and learning styles on vocabulary learning,” Calico Journal, pp. 131–144, 2003.
[4] C.-C. Chiou, L.-C. Tien, and L.-T. Lee, “Effects on learning of multimedia animation combined with multidimensional concept maps,” Computers & Education, vol. 80, pp. 211–223, 2015.
[5] “How Many Photos Will Be Taken in 2020? - Life In Focus.” https://focus.mylio.com/tech-today/how-many-photos-will-be-taken-in-2020 (accessed Apr. 29, 2021).
[6] C.-C. Lin and Y.-C. Yu, “Effects of presentation modes on mobile-assisted vocabulary learning and cognitive load,” Interactive Learning Environments, vol. 25, no. 4, pp. 528–542, 2017.
[7] A. Paivio, Mental representations: A dual coding approach. Oxford University Press, 1990.
[8] R. Shadiev, T.-T. Wu, and Y.-M. Huang, “Using image-to-text recognition technology to facilitate vocabulary acquisition in authentic contexts,” ReCALL, vol. 32, no. 2, pp. 195–212, 2020.
[9] H. Peker, M. Regalla, and T. D. Cox, “Teaching and learning vocabulary in context: Examining engagement in three prekindergarten French classrooms,” Foreign Language Annals, vol. 51, no. 2, pp. 472–483, 2018.
[10] M. Alemi, M. R. A. Sarab, and Z. Lari, “Successful learning of academic word list via MALL: Mobile Assisted Language Learning,” International Education Studies, vol. 5, no. 6, pp. 99–109, 2012.
[11] R. K. Agca and S. Özdemir, “Foreign language vocabulary learning with mobile technologies,” Procedia-Social and Behavioral Sciences, vol. 83, pp. 781–785, 2013.
[12] K. Hasegawa, S. Amemiya, M. Ishikawa, K. Kaneko, H. Miyakoda, and W. Tsukahara, “PSI: A system for creating english vocabulary materials based on short movies,” The Journal of Information and Systems in Education, vol. 6, no. 1, pp. 26–33, 2007.
[13] K. Hasegawa, S. Amemiya, K. Kaneko, H. Miyakoda, and W. Tsukahara, “MultiPod: A multilinguistic word learning system based on iPods,” 2007.
[14] C.-M. Chen and Y.-L. Li, “Personalised context-aware ubiquitous learning system for supporting effective English vocabulary learning,” Interactive Learning Environments, vol. 18, no. 4, pp. 341–364, 2010.
[15] H. Ogata, M. Li, B. Hou, N. Uosaki, M. M. El-Bishouty, and Y. Yano, “SCROLL: Supporting to share and reuse ubiquitous learning log in the context of language learning,” Research & Practice in Technology Enhanced Learning, vol. 6, no. 2, 2011.
[16] H. Ogata, N. Uosaki, K. Mouri, M. N. Hasnine, V. Abou-Khalil, and B. Flanagan, “SCROLL Dataset in the context of ubiquitous language learning,” in Workshop Proceedings of the 26th International Conference on Computer in Education, pp. 418–423, 2018.
[17] Y.-M. Huang, Y.-M. Huang, S.-H. Huang, and Y.-T. Lin, “A ubiquitous English vocabulary learning system: Evidence of active/passive attitudes vs. usefulness/ease-of-use,” Computers & Education, vol. 58, no. 1, pp. 273–282, 2012.
[18] M. N. Hasnine, Y. Hirai, M. Ishikawa, H. Miyakoda, and K. Kaneko, “A vocabulary learning system by on-demand creation of multilinguistic materials based on appropriate images,” Proceedings of the 2014 e-Case & e-Tech, pp. 343–356, 2014.
[19] M. N. Hasnine, M. Ishikawa, Y. Hirai, H. Miyakoda, and K. Kaneko, “An algorithm to evaluate appropriateness of still images for learning concrete nouns of a new foreign language,” IEICE Transactions on Information and Systems, vol. 100, no. 9, pp. 2156–2164, 2017.
[20] M. N. Hasnine, Y. Hirai, M. Ishikawa, H. Miyakoda, and K. Kaneko, “Learning effects investigation of an on-demand vocabulary learning materials creation system based on appropriate images,” Proceedings of the 2015 International Conference on 4th ICT-ISPC, 2015.
[21] M. N. Hasnine, G. Akçapınar, K. Mouri, and H. Ueda, “An Intelligent Ubiquitous Learning Environment and Analytics on Images for Contextual Factors Analysis,” Applied Sciences, vol. 10, no. 24, p. 8996, 2020.
[22] M. N. Hasnine, K. Mouri, G. Akcapinar, M. M. H. Ahmed, and H. Ueda, “A New Technology Design for Personalized Incidental Vocabulary Learning using Lifelog Image Analysis,” Proceedings of the 28th International Conference on Computers in Education (ICCE2020), pp. 516–521, 2020.