A research report projects that the big data security market will grow at a compound annual growth rate (CAGR) of 17.1 per cent, from USD 12.22 billion in 2017 to USD 26.85 billion by 2022. Key drivers of this growth include the ever-evolving regulatory landscape, the increasing volume of business data generated from a variety of sources, and a greater threat from cyber-attacks, which calls for scalable, high-security solutions.
Big Data is typically stored in the Hadoop Distributed File System (HDFS), which by default provides only a very basic form of security that is not enough to protect business interests. Authentication based on nothing more than a user name string cannot cope with the kinds of breaches that cloud deployment and exposure to the World Wide Web make possible.
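As a quick sanity check on the figures quoted above, the growth rate can be recomputed from the two market sizes. This is only a sketch of the standard CAGR formula; the function names `cagr` and `project` are illustrative and do not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.17 means 17 per cent)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once a year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the report: USD 12.22 billion (2017) to USD 26.85 billion (2022).
implied_rate = cagr(12.22, 26.85, years=5)
projected_2022 = project(12.22, 0.171, years=5)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"2022 value at 17.1 per cent: USD {projected_2022:.2f} billion")
```

The implied rate comes out at about 17 per cent, so the two market sizes and the quoted CAGR are consistent with each other to within rounding.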
The challenges to big data storage, retrieval and use come at multiple levels and can be broadly classified as:
- Generation of fake data
- Untrusted mappers accessing the system
- Insufficient cryptographic protection
- Mining of sensitive information by unauthorised sources
- Challenges to granular access control
- Difficulties in tracking data provenance
- Lack of focus on security in high speed NoSQL databases
- Neglecting security audits
To counter these challenges, a big data security solution needs to combine encryption, access control, security intelligence, data governance and data masking.
The Big Data security world is still evolving and is expected to mature as new challenges emerge. In real-world systems, organisations may have their entire solution built on Big Data, or have apps that talk to legacy systems. This determines the complexity of the security solution and the levels of security that have to be built in. Designing entirely for the Big Data environment is relatively easy, as a compact solution can address all likely internal and external breaches. But with diligence and proper assessment, a hybrid environment can be protected effectively too.
Customers who want to build a platform on top of a Big Data ecosystem run into security concerns even when the architecture has been implemented successfully. Unfortunately, most applications built on top of Big Data ecosystem components weren’t designed to address them. As a result, many applications lack encryption, policies for user-level access control lists (ACLs), and the compliance and risk management needed to handle emergencies or breaches. To secure their data and environment, organisations have to build those features themselves using Big Data security components.
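To make one of these measures concrete, here is a minimal sketch of data masking on a single record. The field names, the sample record and the key are all made up for illustration, and the two helpers (`mask_value`, `pseudonymise`) are hypothetical rather than part of any Big Data product.

```python
import hashlib
import hmac


def mask_value(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters; very short values are fully hidden."""
    if visible <= 0 or len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash so records stay joinable without exposing it."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up record and key, for illustration only.
record = {"customer_id": "C-10293", "card_number": "4111111111111111", "city": "Chennai"}
key = b"demo-key-not-for-production"  # in practice, load this from a secret store

masked = {
    "customer_id": pseudonymise(record["customer_id"], key),
    "card_number": mask_value(record["card_number"]),
    "city": record["city"],  # treated as non-sensitive in this example
}
print(masked)
```

Masking of this kind is usually applied before data reaches analysts or test environments, so that the raw values never leave the protected store.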
One of Indium Software’s clients was developing a mobile app and needed a security solution for its Big Data. The client also had legacy systems with which the HDFS had to communicate. The security solution needed to provide access controls for internal and external users, as well as assign privileges.
- As a first step, the security level of the existing legacy systems managed by the client had to be assessed, and the Big Data system protected appropriately from any possible breaches at the point where the two connected.
- Secondly, it had to ensure authenticated access for the internal teams based on their requirements.
- Thirdly, it had to protect the Big Data system from external threats.
Based on its evaluation of the security of the existing legacy systems, Indium Software developed a blueprint to ensure robustness and designed the solution accordingly. Second, it provided access control for different teams, based on their needs, using the open source tool Kerberos. Through this, it was able to define privileges, ensuring authenticated and authorised user access to data at multiple levels. This was based on the client's list of users and their privileges at the various levels, giving each team access to the data its development goals required.
With Kerberos as deployed here, privileges could be assigned only at the group level and not to individuals. Indium Software worked around this limitation by creating single-member groups through which it could grant the appropriate authorisations. Thirdly, it secured the web protocols using Knox to guard against external breaches. This was especially important because the client had given its customers access, which had to be allowed only after authentication.
Hadoop also provides audit logs, which are monitored and maintained as part of the SLAs.
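The single-member group workaround can be pictured with a toy model in which privileges attach only to groups. This is a sketch of the idea, not the client's actual configuration: the `GroupAcl` class, the `solo-` naming convention and the user names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GroupAcl:
    """Toy model where privileges attach only to groups, never directly to users."""
    members: Dict[str, Set[str]] = field(default_factory=dict)     # group -> users
    privileges: Dict[str, Set[str]] = field(default_factory=dict)  # group -> privileges

    def add_group(self, group: str, users: Set[str], privileges: Set[str]) -> None:
        self.members[group] = set(users)
        self.privileges[group] = set(privileges)

    def add_user_group(self, user: str, privileges: Set[str]) -> str:
        """Workaround: wrap one user in a dedicated group to grant individual privileges."""
        group = f"solo-{user}"
        self.add_group(group, {user}, privileges)
        return group

    def allowed(self, user: str, privilege: str) -> bool:
        """A user holds a privilege if any group containing them grants it."""
        return any(
            user in users and privilege in self.privileges[group]
            for group, users in self.members.items()
        )


acl = GroupAcl()
acl.add_group("analytics-team", {"asha", "ravi"}, {"read"})
acl.add_user_group("asha", {"write"})  # only asha gains write access

print(acl.allowed("asha", "write"))
print(acl.allowed("ravi", "write"))
```

The trade-off is bookkeeping: every individually privileged user adds one more group to maintain, so a clear naming convention helps keep such groups easy to audit.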
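Monitoring of such logs can be partly automated. The sketch below counts denied operations per user, assuming audit lines made of tab-separated key=value fields (such as `allowed`, `ugi` and `cmd`) after an `audit:` marker. The sample lines are made up, and the exact layout should be checked against the logs of the cluster in question.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_audit_line(line: str) -> Dict[str, str]:
    """Split one audit line into its key=value fields (assumed tab-separated)."""
    _, _, payload = line.partition("audit: ")
    fields: Dict[str, str] = {}
    for part in payload.strip().split("\t"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that are not key=value pairs
            fields[key] = value
    return fields


def denied_by_user(lines: Iterable[str]) -> Counter:
    """Count denied operations per user, taking the first token of the ugi field."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") == "false":
            user = fields.get("ugi", "unknown").split(" ")[0]
            counts[user] += 1
    return counts


# Made-up sample lines in the assumed layout.
sample = [
    "2018-03-09 12:00:01,123 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:KERBEROS)\tip=/10.0.0.5\tcmd=open\tsrc=/data/sales.csv\tdst=null\tperm=null",
    "2018-03-09 12:00:02,456 INFO FSNamesystem.audit: allowed=false\tugi=bob (auth:KERBEROS)\tip=/10.0.0.9\tcmd=open\tsrc=/data/payroll.csv\tdst=null\tperm=null",
    "2018-03-09 12:00:03,789 INFO FSNamesystem.audit: allowed=false\tugi=bob (auth:KERBEROS)\tip=/10.0.0.9\tcmd=delete\tsrc=/data/payroll.csv\tdst=null\tperm=null",
]

denied = denied_by_user(sample)
print(denied.most_common())
```

A repeated run of denials for one user or one path is the kind of signal an SLA-driven review would want surfaced early.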
The Evolving World
The advantage of the ecosystem being predominantly open source is that there is a community of developers, and as soon as a patch is developed it becomes available to all. This will help address limitations, be they in HDFS, Kerberos or any other security solution. However, the threats are expected to become just as sophisticated. It is therefore essential that businesses have a clear security strategy, define their goals and implement a good security solution to protect not just their data, but their business as well.
Alex is a Big Data Evangelist and a Certified Big Data Engineer with many years of experience. He has helped clients to optimize custom Big Data implementations, migrate legacy systems to the Big Data ecosystem, and build integrated Big Data and Analytics solutions that help business leaders generate custom analytics without needing IT.