Distributed Storage and Parallel Processing Technology of Financial Big Data under Cloud Computing Platform
Financial big data is massive, multi-source, and heterogeneous, and traditional storage and processing methods can no longer cope with it. This paper combines distributed storage with parallel computing to improve the efficiency and reliability of data processing. First, financial data of various kinds are collected, covering multi-dimensional information such as accounting records, and the collected data are analyzed in terms of format and other aspects; this sorting process gradually clarifies the correlations within the data and their complexity. Then, based on the characteristics of financial big data, AbiCloud is selected to build a distributed storage architecture that combines the Hadoop Distributed File System (HDFS) with the HBase distributed database: the collected financial data are divided into blocks by time and stored across multiple storage nodes of the cloud computing platform. On top of this storage architecture, the parallel processing framework Apache Spark is introduced to decompose each analysis task into multiple subtasks that can be executed in parallel. Throughout distributed storage and parallel processing, symmetric encryption protects the sensitive financial data stored on the cloud computing platform. The proposed method surpasses the traditional method in data read and write speed. In terms of task execution time, the proposed method takes between 206 and 392 milliseconds, with a reported average of about 205 milliseconds, while the traditional method takes up to 597 milliseconds, indicating faster data processing. In terms of resource utilization, the rate under the proposed method is generally high, exceeding 90% at multiple data points.
This paper provides more solid data support for financial decision-making and enhances the competitiveness and adaptability of enterprises in a rapidly changing market environment. [ABSTRACT FROM AUTHOR]
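The two core steps described in the abstract, dividing data into blocks by time and decomposing an analysis task into parallel subtasks, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, with a thread pool standing in for Apache Spark and an in-memory dictionary standing in for HDFS/HBase storage nodes. The record layout, the monthly block size, and the names `RECORDS`, `partition_by_month`, `summarize_block`, and `parallel_totals` are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical accounting records as (booking date, amount).
# These values are illustrative only and do not come from the paper.
RECORDS = [
    (date(2023, 1, 5), 120.0),
    (date(2023, 1, 20), 80.0),
    (date(2023, 2, 3), 50.0),
    (date(2023, 2, 17), 25.5),
    (date(2023, 3, 9), 300.0),
]


def partition_by_month(records):
    """Divide records into time-based blocks keyed by (year, month).

    Assumption: one block per calendar month; the abstract only says
    the data are divided "based on time".
    """
    blocks = defaultdict(list)
    for day, amount in records:
        blocks[(day.year, day.month)].append(amount)
    return dict(blocks)


def summarize_block(item):
    """Subtask: total a single block, independently of all other blocks."""
    key, amounts = item
    return key, sum(amounts)


def parallel_totals(records, workers=4):
    """Split the analysis into one subtask per block and run them concurrently.

    A thread pool is used here as a stand-in for a Spark job.
    """
    blocks = partition_by_month(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(summarize_block, blocks.items()))


totals = parallel_totals(RECORDS)
```

Because each block is summarized without reference to any other, the subtasks can run in any order or on different workers, which is the property a framework such as Spark relies on when it distributes work across storage nodes.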