International Conference on Data Mining, Big Data and Machine Learning (DBML 2023)

Accepted Papers

Improving CNN-based Stock Trading by Considering Data Heterogeneity and Burst

Keer Yang¹, Guanqun Zhang², Chuan Bi³, Qiang Guan⁴, Hailu Xu⁵ and Shuai Xu⁶, ¹Case Western Reserve University, Cleveland, OH, ²Nankai University, Tianjin, China, ³National Institute of Health, Baltimore, USA, ⁴Kent State University, Kent, USA, ⁵California State University, Long Beach, USA, ⁶Case Western Reserve University, Cleveland, OH

ABSTRACT

In recent years, there have been quite a few attempts to apply intelligent techniques to financial trading, i.e., constructing automatic and intelligent trading frameworks based on historical stock prices. Due to the unpredictable, uncertain, and volatile nature of the financial market, researchers have also resorted to deep learning to construct intelligent trading frameworks. In this paper, we propose to use a CNN as the core of such a framework, because it is able to learn the spatial dependency (i.e., between rows and columns) of the input data. However, unlike existing deep learning-based trading frameworks, we develop a novel normalization process to prepare the stock data. In particular, we first empirically observe that stock data is intrinsically heterogeneous and bursty, and then validate the heterogeneous and bursty nature of stock data from a statistical perspective. Next, we design the data normalization method so that data heterogeneity is preserved and bursty events are suppressed. We verify our CNN-based trading framework together with our new normalization method on 29 stocks. Experimental results show that our approach outperforms the approaches it is compared against.

KEYWORDS

data normalization, intelligent stock trading, CNN.
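
The abstract above states two goals for the normalization (preserve heterogeneity across stocks, suppress bursty events) but does not describe the method itself. The following is only a hypothetical sketch of one way to meet both goals, not the authors' procedure: bursts are clipped per stock using a median/MAD band, and all stocks are then divided by a single shared scale so that differences in volatility between stocks survive. The function names, the `k` parameter, and the use of daily returns as input are assumptions made for illustration.

```python
from statistics import median


def clip_bursts(returns, k=3.0):
    """Clip values lying more than k median-absolute-deviations from the median.

    Assumption: a "burst" is treated as an outlier relative to the stock's own history.
    """
    med = median(returns)
    mad = median([abs(r - med) for r in returns])
    if mad == 0:
        return list(returns)  # constant series: nothing to clip
    lo, hi = med - k * mad, med + k * mad
    return [min(max(r, lo), hi) for r in returns]


def normalize_panel(panel, k=3.0):
    """panel: dict mapping a ticker to its list of daily returns.

    Bursts are clipped per stock, then every stock is divided by ONE shared scale,
    so a calm stock stays visibly calmer than a volatile one after normalization.
    """
    clipped = {t: clip_bursts(r, k) for t, r in panel.items()}
    scale = max(abs(v) for r in clipped.values() for v in r) or 1.0
    return {t: [v / scale for v in r] for t, r in clipped.items()}
```

A per-stock z-score would instead rescale every stock to unit variance, which removes exactly the cross-stock differences the abstract says should be preserved; that is the reason for the shared scale in this sketch.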

Review of Class Imbalance Dataset Handling Techniques for Depression Prediction and Detection

Simisani Ndaba, Department of Computer Science, Faculty of Science, University of Botswana

ABSTRACT

Depression is a prevalent mental disturbance affecting an individual’s thinking and mental development. Many studies have demonstrated effective automated prediction and detection of Depression. The majority of the datasets used suffer from class imbalance, where samples of a dominant class outnumber those of the minority class that is to be detected. This review paper uses the PRISMA review methodology to list the different class imbalance handling techniques used in Depression prediction and detection research. The articles were taken from information technology databases. The results revealed that the most common data-level technique is SMOTE used as a single method, and that the most common ensemble approach combines SMOTE, oversampling, and undersampling techniques. The model level consists of various algorithms that can be used to tackle the class imbalance problem. The research gap identified is that few studies used undersampling methods for predicting and detecting Depression, and that regression modelling could be considered in future research.

KEYWORDS

Depression prediction, Depression detection, Class Imbalance, Sampling, Data Level and Model Level.
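
For readers unfamiliar with SMOTE, the data-level technique named in the abstract above, the following is a minimal sketch of its core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name, parameters, and pure-Python implementation are illustrative assumptions; the reviewed studies may use library implementations with different defaults.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors.

    Assumes at least two minority samples and purely numeric features.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist2(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because the new points are interpolated rather than duplicated, the minority class is enlarged without exact copies, which is the property that distinguishes SMOTE from plain random oversampling.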

Volatility Association Research Based on the Stock Markets of China and ASEAN Countries: An Empirical Analysis Based on Complex Networks

Yu Wangke^a, Liu Shuhua^b, Pan Ruoqi^c, Huang Ke^d and Deng Linyun^a, ^aSchool of Management, Nanning University, 8 Long-ting Road, Nanning, Guangxi, China, ^bGuangxi Academy of Social Sciences, 5 Xin-zu Road, Nanning, Guangxi, China and ^c,^dSchool of Digital Economic, Nanning University, 8 Long-ting Road, Nanning, Guangxi, China

ABSTRACT

By constructing the volatility network of stock market indexes in China and ASEAN countries, we analyze the mechanism of transnational market risk transmission and the characteristics of key nodes. We find that the volatility network describes well the linkage and tightness of volatility across the share indexes. The COVID-19 pandemic led to a significant increase in the convergence of behavior patterns of the major countries' share indexes, and to significant differences in node changes and topological features of the volatility network. Dynamic analysis shows that the overall risk of the share index volatility network changes with time, that the information link structure of the market changes with time, and that major emergencies break the original structure and trigger information connections in the market. The findings of this paper have important implications for understanding the characteristics of transnational risk transmission between the stock markets of China and ASEAN countries.

KEYWORDS

China and ASEAN, Stock market, Share index volatility network, Complex networks
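
The abstract above does not say how the volatility network is built. As a rough illustration of what such a network can look like, the sketch below uses a common generic construction that is assumed here, not taken from the paper: absolute returns serve as a volatility proxy, two indexes are linked when the Pearson correlation of their volatility series reaches a threshold, and node degree is used as a simple indicator of key nodes. All names and the threshold value are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def volatility_network(returns, threshold=0.5):
    """returns: dict mapping an index name to its return series.

    Returns a dict of edges {(name_a, name_b): correlation} for index pairs
    whose volatility proxies correlate at or above the threshold.
    """
    vol = {name: [abs(r) for r in series] for name, series in returns.items()}
    edges = {}
    for a, b in combinations(sorted(vol), 2):
        rho = pearson(vol[a], vol[b])
        if rho >= threshold:
            edges[(a, b)] = rho
    return edges


def degree(edges):
    """Number of links per node; high-degree nodes are candidate key nodes."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```

Rebuilding the network over successive time windows and comparing edge counts or degrees would give a simple view of the kind of structural change over time that the abstract describes.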

Exposing the Opportunities and Benefits Provided by the Machine Learning Platform

Agun Olusola Olumuyiwa

ABSTRACT

Machine learning is a buzzword in the technology world right now, and for good reason: it represents a major step forward in how computers can learn. Machine learning is an umbrella term for a set of techniques and tools that help computers learn and adapt on their own. Machine learning algorithms help artificial intelligence learn without being explicitly programmed to perform the desired action. By learning a pattern from sample inputs, a machine learning algorithm predicts and performs tasks based solely on the learned pattern rather than on predefined program instructions. Machine learning is a lifesaver in the many cases where applying strict algorithms is not possible: it learns the new process from previous patterns and applies that knowledge. One machine learning application we are all familiar with is the way our email providers help us deal with spam. Spam filters use an algorithm to identify incoming junk email and move it to the spam folder. Several e-commerce companies also use machine learning algorithms in conjunction with other IT security tools to prevent fraud and to improve the performance of their recommendation engines. Machine learning professionals are in high demand, a surge driven by evolving technology and the generation of huge amounts of data, also known as Big Data. This paper addresses the major practical applications of machine learning in our present world, with a view to embracing the opportunities it provides. It further describes the types of machine learning algorithms and their applications.

KEYWORDS

Machine Learning, Algorithm, Big Data

Math Function Entity Recognition with Fine-tuning Pre-trained Models

Fatimah Alshamari¹,² and Abdou Youssef¹, ¹Department of Computer Science, The George Washington University, Washington D.C., USA, ²Department of Computer Science, Taibah University, Medina, KSA

ABSTRACT

The Mathematical Function Entity Recognition (MFER) task is a specialized domain of the Named Entity Recognition (NER) task. In this work, 11 mathematical functions have been selected and grouped into five categories as domain-specific entities. We propose a model for MFER based on fine-tuning a set of pre-trained language models to identify the mathematical function groups. The proposed model is intended to help map mathematical representations to natural language, and to enable meaningful math information to be recognized and used by downstream tasks such as math Information Retrieval, Knowledge Extraction, and Question Answering. Our contributions include: (1) a state-of-the-art result achieved by fine-tuning pre-trained models for the MFER task, and (2) an annotated MFER dataset that can be used by future researchers.

KEYWORDS

Named entity recognition, Math information retrieval, Math language processing, Pre-trained language models.
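
NER models are commonly fine-tuned as token classifiers, with each token carrying a BIO tag (Begin, Inside, Outside). The sketch below shows how span annotations could be converted into such tags as a data-preparation step. It is an illustration only: the abstract above does not describe the annotation format, and the category labels used in the usage example (`TRIG`, `LOG`) are hypothetical placeholders, since the paper's five categories are not named here.

```python
def bio_tags(tokens, spans):
    """Convert entity spans into one BIO tag per token.

    spans: list of (start, end, label) token-index ranges, with end exclusive.
    Assumes spans do not overlap and lie within the token list.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining tokens of the entity
    return tags


# Usage with hypothetical category labels:
tokens = ["the", "sine", "of", "x", "plus", "natural", "logarithm", "of", "y"]
spans = [(1, 2, "TRIG"), (5, 7, "LOG")]
print(list(zip(tokens, bio_tags(tokens, spans))))
```

The resulting tag sequence is the per-token target that a pre-trained language model with a token-classification head would be trained to predict.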