
Amazon: Big Data Analysis Case Study

Updated: Sep 12, 2021

Big data is one of the advanced technologies that companies use to evaluate and integrate the data they collect. Its use is increasing, and many companies rely on its key features to improve the performance of their businesses and systems. Amazon is one of the leading e-commerce organizations and provides various services and products to its consumers [1]. The purpose of this article is to examine the significance of big data at Amazon and to evaluate its critical aspects in the context of Amazon's e-commerce business. The article focuses on three sections: contemporary elements of big data, characteristics of data, and big data preparation.

Security Issues And Challenges

Amazon is one of the largest e-commerce companies and provides many services to its customers, and big data is an effective technology that Amazon uses to improve its performance and the effectiveness of its data collection processes. Two aspects are selected from the contemporary elements: security issues and challenges [5]. Big data can reduce complexity and errors in the business, and Amazon uses this technology to improve its performance. Security is one of the significant issues associated with the big data approach used by Amazon. Criminals send unwanted signals to the central server and try to collect the data stored on it [1]. A recent study identified that data security is challenging for Amazon because of unauthorized access to its servers and computing networks.

Three major risk factors linked with big data increase security-related issues at Amazon: lack of security controls, use of unauthenticated networks, and misconfiguration of servers. Therefore, Amazon should focus on security while using big data technology in its business. According to Bello-Orgaz, Jung, and Camacho (2016), big data technology faces various security issues and threats, such as DDoS attacks, malware activity, fake data generation, data breaches, and phishing attacks [1]. DDoS and malware are pervasive security threats associated with big data; they directly affect the security of Amazon and reduce the performance of its computing networks.

A previous investigation found that most criminals use unauthenticated networks and malicious tools to produce large volumes of traffic. This traffic moves from one network to another and affects the privacy of data [6]. Big data uses internet connections and computing networks to perform data integration activities. Still, users sometimes rely on third-party applications to obtain data from consumers, which directly affects privacy and can lead to cyber-security attacks. Moreover, controlling and managing data privacy while using big data technology is another challenge faced by Amazon. According to Bowker (2014), misconfiguration of networks is a key issue that increases the chances of a data breach at Amazon. Developers cannot always identify the key elements that produce challenges and risks in the workplace [2].
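To make the idea of abnormal traffic volumes concrete, the following is a minimal Python sketch of a sliding-window request counter that flags sources sending unusually many requests. It is an illustration only and is not a description of how Amazon detects attacks. The function name `flag_heavy_sources`, the window size, the threshold, and the sample addresses are all assumptions introduced here.

```python
from collections import defaultdict, deque


def flag_heavy_sources(events, window_seconds=10, max_requests=5):
    """Return the sources that send more than max_requests
    within any sliding window of window_seconds.

    events: iterable of (timestamp_seconds, source_address), sorted by time.
    The threshold values are illustrative assumptions, not recommendations.
    """
    recent = defaultdict(deque)  # source -> timestamps still inside the window
    flagged = set()
    for timestamp, source in events:
        window = recent[source]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while window and timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(source)
    return flagged


# Hypothetical sample traffic: one noisy source and one ordinary source.
sample_events = [(t * 0.5, "203.0.113.7") for t in range(20)]
sample_events += [(t * 5.0, "198.51.100.2") for t in range(4)]
sample_events.sort()

print(flag_heavy_sources(sample_events))
```

A real system would combine such counters with many other signals; a fixed threshold alone is easy to evade and can also flag legitimate bursts.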

Controlling and managing malware activity and signals in the IT systems used by Amazon is another challenge; when it is not handled well, criminals can enter the main server and collect Amazon's data without permission. Moreover, Amazon faces several other challenges associated with big data, for example, unauthorized access to networks, data integration, data protection, confidentiality of data, and detection of fake data in the system. Therefore, while using the key features of big data, Amazon should focus on privacy-related concerns and develop proper security plans for protecting the collected data. Moreover, data integration issues can be addressed by providing proper training and education to employees and by evaluating effective data integration techniques when using big data analysis approaches [9].

Data Source and Data Format

From the characteristics of data, two major factors are selected in the context of Amazon's e-commerce business: data source and data format. Big data is an effective computing technology that can integrate both structured and unstructured data. A recent study identified that Amazon now provides cloud-based services to consumers, through which small businesses can enhance the performance of their computing systems and networks. Data format is an essential part of the big data technology that Amazon uses to evaluate data collected from various sources [10].

Moreover, Amazon uses various file formats in its business, such as text files, sequence files, and Avro data files. With big data technology, data can be formatted effectively, so Amazon can reduce system complexity and improve network performance. Such data formats help developers collect data from numerous sources and evaluate it using big data analysis approaches. According to Chen and Zhang (2014), big data can improve the performance of data integration processes and help Amazon improve its productivity by monitoring data formats effectively [3].

With big data analysis techniques, Amazon can effectively evaluate semi-structured and unstructured data and reduce system complexity. While storing data in Hadoop, Amazon needs to identify fake data and unwanted signals, which may directly affect the performance of computing devices and servers. Three major data file formats are used with big data technology at Amazon: Optimized Row Columnar (ORC), Parquet, and Avro [11]. All of these formats can store Amazon's data files and help management reduce format-related issues. In big data processing, files recorded in these formats can be split across multiple disks, which helps Amazon improve the scalability and availability of data.

All these data formats carry data in computer files and provide a platform on which Amazon can evaluate the collected data. The major issue with data formats is that management requires effective methods for storing data in the data centers, which adds complexity to the systems [12]. Amazon mainly uses a column-oriented data format for storing and recording consumer and employee data. Still, it is suggested that management also include row-based storage systems while using big data technology because of their ability to reduce system errors.
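The difference between row-based and column-oriented storage can be sketched in a few lines of Python. Of the formats named above, Avro is row-based, while Parquet and ORC are columnar. The example below does not use those formats; it only imitates the two layouts with in-memory lists and dictionaries, and the order records and the helper `to_columns` are hypothetical.

```python
# Hypothetical order records, used only for illustration.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]


def to_columns(records):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Column layout: an aggregate over one field reads a single list.
total_from_columns = sum(columns["amount"])

# Row layout: the same aggregate has to visit every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Row layout: fetching one complete record is a single lookup.
second_order = rows[1]

print(total_from_columns, total_from_rows, second_order)
```

In this toy form, the column layout favors analytical scans over a few fields, while the row layout favors reading or writing whole records, which is one reason a system might keep both.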

The data source is a vital part of big data because it provides a platform on which Amazon can collect and evaluate data related to customers and stakeholders. Chong, Ch'ng, Liu, and Li (2017) identified that Amazon mainly uses third-party applications and social networking sites to obtain consumer data and stores it in file formats using big data networks [4]. Moreover, AWS is one of the effective services provided by Amazon, and it helps small businesses. Big data technology helps Amazon collect data in very little time. The management team uses various data sources, including Facebook, Twitter, online communities, data-driven systems, and other IT sources.

Social interaction is a common source used by Amazon and other e-commerce companies because it provides many data sets in very little time, including both structured and unstructured data [13]. Today, most consumers use social networks and applications while searching on the internet. Amazon uses social networks as a data source to collect huge amounts of data and then integrates that data using big data techniques. In conclusion, several kinds of electronic files can be used as data sources at Amazon and help it obtain reliable data, including information related to consumers and stakeholders. Therefore, the data sources adopted by Amazon help it obtain effective and reliable data, but hackers sometimes produce fake data, which may create conflicts in the system and cause errors in large-scale data analysis processes.

De-Normalization and Data Integration

Big data preparation has numerous aspects, for example, de-normalization, data integration, aggregation, and data cleansing. This essay explains two of them, de-normalization and data integration, in the context of Amazon's e-commerce business. Normalization in big data is a process that Amazon uses to normalize and evaluate collected data in order to enhance the business's overall performance. Erevelles, Fukawa, and Swayne (2016) argued that de-normalization is a way of enhancing the read performance of a database and reducing errors in the systems [5]. Amazon is a leading e-commerce company that also provides AWS services to consumers. Its management requires a de-normalization process along with big data for evaluating previously obtained data sets.

This kind of data preparation can enhance the performance of the data warehouse used by Amazon and provide a platform on which the organization can store a large number of data sets. The major issue linked with de-normalization is that it may harm Amazon's performance if the de-normalized design does not work properly within the database systems [14]. Moreover, by developing and implementing such a process in the business, Amazon can build an effective relationship between the data model and the query model. With a careful de-normalization process, Amazon can keep integrity-related problems in big data systems manageable. Still, because de-normalized data is duplicated, this requires proper communication between the servers and data sets.

In the last few years, the management team of Amazon changed its database systems and included the key aspects of de-normalization to improve the effectiveness of the proposed systems. Recent reports indicate that Amazon adopted big data technology and a de-normalization process in its current systems, which provided better platforms for reducing the problems and complexity faced by developers [15]. A lack of proper resources and of communication between the networks and data sets may increase the rate of degradation, which affects the performance of the developed systems.

Therefore, while using de-normalization along with big data, Amazon should pay attention to its networks and to the issue of degradation, and it should identify the risk factors that may lead to breaches and data loss. Hashem et al. (2015) suggested that, in the context of big data, de-normalization is the better option for data modeling because it takes less time and reduces wasted time in the systems [6]. Because of such benefits, Amazon uses the key aspects of de-normalization and big data technology.
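The read-performance benefit of de-normalization, and its cost, can be illustrated with a small Python sketch. The tables, field names, and helper functions below are hypothetical and are not taken from any Amazon system; they only show that a de-normalized record can be read without a join, while an update must then touch every duplicated copy.

```python
# Normalized model: customers and orders are kept separately (hypothetical data).
customers = {
    101: {"name": "Alice", "city": "Seattle"},
    102: {"name": "Bob", "city": "Austin"},
}
orders = [
    {"order_id": 1, "customer_id": 101, "item": "book"},
    {"order_id": 2, "customer_id": 102, "item": "lamp"},
    {"order_id": 3, "customer_id": 101, "item": "pen"},
]


def join_order(order, customers):
    """Look up the customer for one order (the join a normalized read needs)."""
    customer = customers[order["customer_id"]]
    return {
        **order,
        "customer_name": customer["name"],
        "customer_city": customer["city"],
    }


def denormalize(orders, customers):
    """Copy the customer fields into every order once, at write time."""
    return [join_order(order, customers) for order in orders]


orders_wide = denormalize(orders, customers)

# Reads need no join now: each record already carries the customer fields.
print(orders_wide[0]["customer_city"])


def update_city(orders_wide, customer_id, new_city):
    """The trade-off: one logical change must be applied to every copy."""
    changed = 0
    for record in orders_wide:
        if record["customer_id"] == customer_id:
            record["customer_city"] = new_city
            changed += 1
    return changed
```

For example, `update_city(orders_wide, 101, "Portland")` has to rewrite two records here, which is the kind of consistency work that de-normalized designs take on in exchange for faster reads.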

According to Kim, Trimi, and Chung (2014), data integration is a part of big data that combines business and technical processes for evaluating and integrating stored data effectively [7]. Amazon mainly uses data integration approaches and big data to evaluate both unstructured and semi-structured data. Such a process offers enterprise-level services to Amazon and helps scale the obtained data across various data sets, so that management can store data effectively.

Recent literature identified that data integration provides a wide range of data quality capabilities to Amazon. Management can monitor and improve the quality of data collected from various sources. Data integration gives Amazon and other companies a platform on which they can integrate and evaluate a large number of data sets and monitor the effectiveness of the gathered data. With the help of big data technology, Amazon now performs data integration activities. Still, fake data generation by criminals is a key problem that cannot be solved easily, and Amazon may suffer from security-related issues as a result. Big data integration plays a major role in the e-commerce industry and is also used as a data protection tool at Amazon. Generally, such a process combines data from different origins and software formats and then delivers a unified view of the accumulated data to consumers; if the sources are inconsistent, that view may produce problems for Amazon.
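The idea of combining data from different origins and formats into one view can be sketched with Python's standard library. The two sources below, one in CSV and one in JSON, and the function `integrate` are hypothetical examples and do not describe Amazon's actual pipelines.

```python
import csv
import io
import json

# Two hypothetical sources describing customers in different formats.
csv_source = "customer_id,email\n1,alice@example.com\n2,bob@example.com\n"
json_source = '[{"id": 1, "last_order": "book"}, {"id": 3, "last_order": "lamp"}]'


def integrate(csv_text, json_text):
    """Merge both sources into one record per customer id."""
    unified = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        customer_id = int(row["customer_id"])
        record = unified.setdefault(customer_id, {"customer_id": customer_id})
        record["email"] = row["email"]
    for item in json.loads(json_text):
        customer_id = item["id"]
        record = unified.setdefault(customer_id, {"customer_id": customer_id})
        record["last_order"] = item["last_order"]
    # Fill fields missing from one source so every record has the same shape.
    for record in unified.values():
        record.setdefault("email", None)
        record.setdefault("last_order", None)
    return [unified[key] for key in sorted(unified)]


unified_view = integrate(csv_source, json_source)
for record in unified_view:
    print(record)
```

Customers that appear in only one source end up with a `None` field, which makes gaps between sources visible instead of hiding them.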

Controlling and managing the integration of data is a complex step for Amazon because it requires better decision-making approaches and reliable systems that evaluate the collected data appropriately. According to Kiran, Murphy, Monga, Dugan, and Baveja (2015), Amazon faces various challenges while using the data integration process and big data. These challenges include data uncertainty, finding insights, availability of data, and syncing across data resources [8]. Uncertainty of data is a major problem that directly affects the databases and systems used by Amazon. Criminals often use unauthenticated networks and fake data to get into Amazon's data sets and collect consumers' sensitive data.

Therefore, de-normalization and data integration are effective data preparation aspects used by Amazon, and they help it derive reliable and effective data sets from its data sources.


Conclusion

After reviewing big data aspects in the context of Amazon, it is concluded that big data is an effective technology that has improved Amazon's performance and allowed it to monitor a large amount of data in very little time. This study critically reviewed the role of big data aspects and networks in Amazon's e-commerce business and highlighted the security issues associated with big data and Amazon's networks. DDoS and malware are widespread security threats to big data techniques, and they affect both the data held by Amazon and its performance. By using the key aspects of big data, Amazon can enhance its performance and evaluate both unstructured and semi-structured data effectively. Still, this requires advanced computing networks and a large amount of space for storing consumer data.


References

[1]. G. Bello-Orgaz, J.J. Jung, and D. Camacho, "Social big data: Recent achievements and new challenges," Information Fusion, vol. 28, no. 6, pp. 45-59, 2016.

[2]. G.C. Bowker, "Big data, big questions| the theory/data thing," International Journal of Communication, vol. 8, no. 7, p. 5, 2014.

[3]. C.P. Chen and C.Y. Zhang, "Data-intensive applications, challenges, techniques and technologies: A survey on Big Data," Information Sciences, vol. 275, no. 9, pp. 314-347, 2014.

[4]. A.Y.L. Chong, E. Ch'ng, M.J. Liu, and B. Li, "Predicting consumer product demands via Big Data: the roles of online promotional marketing and online reviews," International Journal of Production Research, vol. 55, no. 17, pp. 5142-5156, 2017.

[5]. S. Erevelles, N. Fukawa, and L. Swayne, "Big Data consumer analytics and the transformation of marketing," Journal of Business Research, vol. 69, no. 2, pp. 897-904, 2016.

[6]. I.A.T. Hashem, I. Yaqoob, N.B. Anuar, S. Mokhtar, A. Gani, and S.U. Khan, "The rise of 'big data' on cloud computing: Review and open research issues," Information Systems, vol. 47, no. 7, pp. 98-115, 2015.

[7]. G.H. Kim, S. Trimi, and J.H. Chung, "Big-data applications in the government sector," Communications of the ACM, vol. 57, no. 3, pp. 78-85, 2014.

[8]. M. Kiran, P. Murphy, I. Monga, J. Dugan, and S.S. Baveja, "Lambda architecture for cost-effective batch and speed big data processing," in 2015 IEEE International Conference on Big Data (Big Data), vol. 7, no. 6, pp. 2785-2792, 2015.

[9]. G. Manogaran, C. Thota, and M.V. Kumar, "MetaCloudDataStorage architecture for big data security in cloud computing," Procedia Computer Science, vol. 87, no. 7, pp. 128-133, 2016.

[10]. G. Manogaran, R. Varatharajan, D. Lopez, P.M. Kumar, R. Sundarasekar, and C. Thota, "A new architecture of Internet of Things and big data ecosystem for secured smart healthcare monitoring and alerting system," Future Generation Computer Systems, vol. 82, no. 6, pp. 375-387, 2018.

[11]. M.J. Mazzei and D. Noble, "Big data dreams: A framework for corporate strategy," Business Horizons, vol. 60, no. 3, pp. 405-414, 2017.

[12]. A. Oussous, F.Z. Benjelloun, A.A. Lahcen, and S. Belkin, "Big Data technologies: A survey," Journal of King Saud University-Computer and Information Sciences, vol. 30, no. 4, pp. 431-448, 2018.

[13]. H. Özköse, E.S. Arı, and C. Gencer, "Yesterday, today and tomorrow of big data," Procedia-Social and Behavioral Sciences, vol. 195, no. 7, pp. 1042-1050, 2015.

[14]. R. Ranjan, "Streaming big data processing in datacenter clouds," IEEE Cloud Computing, vol. 1, no. 1, pp. 78-83, 2014.

[15]. D.A. Reed and J. Dongarra, "Exascale computing and big data," Communications of the ACM, vol. 58, no. 7, pp. 56-68, 2015.

[16]. H.J. Watson, "Tutorial: Big data analytics: Concepts, technologies, and applications," Communications of the Association for Information Systems, vol. 34, no. 1, p. 65, 2014.

