Unstructured Big Data Threat Intelligence Parallel Mining Algorithm

To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Initially, open-source cybersecurity analysis reports are collected and converted into...

Bibliographic Details
Main Authors:	Zhihua Li, Xinye Yu, Tao Wei, Junhao Qian
Format:	Article
Language:	English
Published:	Tsinghua University Press 2024-06-01
Series:	Big Data Mining and Analytics
Subjects:	unstructured big data mining; parallel deep forest; multi-label classification algorithm; threat intelligence
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2023.9020032

collection	DOAJ
id	doaj-art-ebd1e2149746426cad7fb6a1ec547934
institution	Kabale University
issn	2096-0654
volume	7
issue	2
pages	531-546
doi	10.26599/BDMA.2023.9020032
affiliation	Zhihua Li, Xinye Yu, Tao Wei: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
affiliation	Junhao Qian: School of IoT Engineering, Jiangnan University, Wuxi 214122, China
description	To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Initially, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Subsequently, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. Addressing the limitations of low execution efficiency and scalability in the sequential deep forest algorithm, our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ratio. Furthermore, our proposed PDFMLC algorithm incorporates label mutual information from the established dataset as input features. This captures latent label associations, significantly improving classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution efficiency. Simultaneously, the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat intelligence. As a result, successfully formatted STIX2.1 threat intelligence is established.
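
The description states that PDFMLC uses label mutual information from the multi-label dataset as input features. As a rough illustration only, and not the authors' implementation, the Python sketch below computes pairwise mutual information between the columns of a binary label matrix. The function names and the toy label matrix are hypothetical, and this record does not say how PDFMLC turns these values into classifier features.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information in bits between two equal-length label columns."""
    n = len(a)
    joint = Counter(zip(a, b))   # counts of each (a_i, b_i) pair
    count_a = Counter(a)         # marginal counts for column a
    count_b = Counter(b)         # marginal counts for column b
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_a[x] / n
        p_y = count_b[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


def label_mi_matrix(label_rows):
    """Pairwise mutual information between all label columns.

    label_rows holds one row per report and one 0/1 entry per label.
    """
    columns = list(zip(*label_rows))
    k = len(columns)
    return [
        [mutual_information(columns[i], columns[j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical toy data: 4 reports, 3 labels (assumption, not from the paper).
toy_labels = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
mi_matrix = label_mi_matrix(toy_labels)
```

In this toy matrix the first two labels always co-occur, so their mutual information equals the entropy of either label, while weakly related labels score closer to zero. A high value is the kind of latent label association the description refers to.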


Similar Items