Leveraging BigID to implement an enterprise data platform

What is BigID?

BigID is a data privacy and protection platform designed to help organizations manage, protect, and govern their sensitive data across their entire data landscape. The company was founded in 2016 by Dimitri Sirota and Nimrod Vax, and its headquarters are in New York City, United States.

BigID leverages advanced machine learning and data intelligence techniques to discover, classify, and catalog sensitive information, such as personally identifiable information (PII) and other types of sensitive data. This enables organizations to comply with data privacy regulations like the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and other similar laws around the world.

Some key features of BigID include:

Data discovery and classification – BigID helps organizations find and categorize sensitive information across various data sources, including structured and unstructured data.
Data mapping and cataloging – The platform creates a comprehensive data inventory that provides a holistic view of an organization’s data landscape, facilitating data governance and management.
Data privacy and protection – BigID helps organizations monitor and control access to sensitive data, identify potential privacy risks, and implement measures to protect the data in accordance with regulatory requirements.
Data minimization and retention – The platform aids in identifying and deleting unnecessary or outdated data, ensuring compliance with data minimization and retention policies.
Incident response and breach notifications – BigID enables organizations to quickly respond to data breaches or incidents by identifying affected data subjects and providing the necessary information for regulatory notifications.
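The data minimization and retention feature can be pictured as a policy check that flags records held past their retention period. The sketch below is plain Python and is not BigID's API. The `RETENTION_DAYS` table, the category names, and the `Record` layout are hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in days, keyed by data category.
RETENTION_DAYS = {
    "marketing_contact": 365,
    "support_ticket": 730,
}


@dataclass
class Record:
    record_id: str
    category: str
    created: date


def overdue_for_deletion(records, today):
    """Return the IDs of records held longer than their category's retention period."""
    overdue = []
    for rec in records:
        limit = RETENTION_DAYS.get(rec.category)
        if limit is None:
            continue  # no policy for this category: leave it for manual review
        if today - rec.created > timedelta(days=limit):
            overdue.append(rec.record_id)
    return overdue


records = [
    Record("r1", "marketing_contact", date(2022, 1, 10)),
    Record("r2", "support_ticket", date(2023, 6, 1)),
    Record("r3", "uncategorized", date(2015, 3, 3)),
]
print(overdue_for_deletion(records, today=date(2024, 1, 1)))  # ['r1']
```

Records without a matching policy are skipped rather than deleted, so an incomplete policy table cannot trigger accidental deletions.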

Overall, BigID aims to empower organizations to manage their sensitive data more effectively while ensuring compliance with data privacy regulations and reducing the risk of data breaches.

BigID masking functionality

BigID provides data masking functionality as a part of its data privacy and protection capabilities. Data masking is a technique used to obfuscate, anonymize, or pseudonymize sensitive information by replacing the original data with fictional or scrambled data, rendering it unreadable or unrecognizable to unauthorized users. This process helps organizations protect sensitive data, such as personally identifiable information (PII), while still allowing it to be used for testing, analytics, or other purposes where the actual sensitive data is not needed.
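To make the idea concrete, here are two common masking transformations in plain Python: full masking, which hides every character, and partial masking, which keeps a short suffix visible (as on a printed receipt). These helper names are hypothetical and only illustrate the technique; they are not BigID functions.

```python
def full_mask(value: str, mask_char: str = "*") -> str:
    """Replace every character; the original value cannot be recovered."""
    return mask_char * len(value)


def partial_mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Keep only the last `visible` characters, e.g. of a card or account number."""
    if visible <= 0 or len(value) <= visible:
        return full_mask(value, mask_char)  # too short to reveal anything safely
    return mask_char * (len(value) - visible) + value[-visible:]


print(full_mask("Jane Doe"))             # ********
print(partial_mask("4111111111111111"))  # ************1111
```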

BigID’s data masking features include:

Static Data Masking – This technique permanently replaces sensitive values in copies of data used in non-production environments, such as development or testing, so that sensitive information is not exposed to unauthorized users.
Dynamic Data Masking – BigID can also provide real-time, on-the-fly masking of sensitive data, ensuring that unauthorized users cannot access the original data while still allowing authorized users to view or use the unmasked data.
Customizable masking policies – BigID enables organizations to create and implement custom masking policies based on their specific data privacy requirements and business needs.
Integration with existing systems – The platform can be integrated with existing data storage, processing, and analytics systems, allowing organizations to apply data masking policies across their entire data landscape.
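A customizable masking policy is essentially a lookup from a data classification to a masking rule. The sketch below assumes a hypothetical policy table (`MASKING_POLICY`) and hypothetical column labels (`COLUMN_LABELS`) of the kind a discovery scan might produce. It does not reflect BigID's actual policy format.

```python
from typing import Callable, Dict, Optional


def redact(value: str) -> str:
    return "[REDACTED]"


def keep_email_domain(value: str) -> str:
    """Hide the local part of an email address: jane@example.com -> ****@example.com."""
    _, sep, domain = value.partition("@")
    return "****@" + domain if sep else redact(value)


# Hypothetical policy: classification label -> masking function.
MASKING_POLICY: Dict[str, Callable[[str], str]] = {
    "EMAIL": keep_email_domain,
    "SSN": redact,
}

# Hypothetical column labels, as a discovery scan might assign them.
COLUMN_LABELS: Dict[str, Optional[str]] = {"email": "EMAIL", "ssn": "SSN", "country": None}


def apply_policy(row: dict) -> dict:
    """Mask each column whose label has a rule; pass other columns through unchanged."""
    masked = {}
    for column, value in row.items():
        label = COLUMN_LABELS.get(column)
        rule = MASKING_POLICY.get(label) if label else None
        masked[column] = rule(value) if rule else value
    return masked


print(apply_policy({"email": "jane@example.com", "ssn": "123-45-6789", "country": "US"}))
# {'email': '****@example.com', 'ssn': '[REDACTED]', 'country': 'US'}
```

Keeping the policy separate from the data makes it possible to change a rule, for example from partial masking to full redaction, without touching the code that reads the data.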

By providing data masking functionality, BigID helps organizations protect sensitive information, reduce the risk of data breaches, and comply with data privacy regulations, all while ensuring that their data can still be used for various non-sensitive purposes.

BigID’s integration with other tools and services

BigID can be integrated with various tools and services to help organizations effectively manage their sensitive data while maintaining compliance with data privacy regulations. Integration with other tools and services can be achieved through APIs, connectors, and native platform support. Some of the ways BigID integrates with other tools and services include:

Data Storage and Databases – BigID can be integrated with various data storage solutions, such as relational databases (e.g., MySQL, PostgreSQL, Oracle), NoSQL databases (e.g., MongoDB, Cassandra), and data warehouses (e.g., Snowflake, Amazon Redshift). This allows organizations to apply data masking policies directly to the data stored in these systems.
Cloud Services – BigID supports integration with popular cloud service providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). By integrating with these services, organizations can extend their data masking capabilities to data stored in the cloud, ensuring consistent data protection across different environments.
Data Integration and ETL (Extract, Transform, Load) Tools – BigID can be integrated with data integration and ETL tools (e.g., Apache NiFi, Talend, Informatica) to ensure that masked data is used during data processing and transformation operations. This helps maintain data privacy while transferring data between different systems or environments.
Analytics and Business Intelligence Tools – BigID’s data masking functionality can also be integrated with analytics and business intelligence platforms (e.g., Tableau, Power BI, Looker) to provide masked data for analysis, ensuring that sensitive data is protected even when used for analytics purposes.
Data Governance and Data Catalog Tools – Integrating BigID with data governance and catalog tools (e.g., Collibra, Alation) can help organizations to enforce data masking policies consistently across their entire data landscape, thereby enhancing their overall data protection and privacy posture.
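The ETL integration point amounts to masking sensitive columns in the transform step, before data is loaded into a downstream system. The following minimal extract-transform-load pipeline uses Python's `csv` module. It is not NiFi, Talend, Informatica, or BigID code, and `SENSITIVE_COLUMNS` is a hypothetical stand-in for columns flagged by a discovery scan.

```python
import csv
import io

SENSITIVE_COLUMNS = {"email", "phone"}  # hypothetical: columns flagged by a discovery scan


def mask_value(value: str) -> str:
    return "*" * len(value)


def extract(csv_text: str):
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(rows):
    # Mask sensitive columns so that raw values never reach the load step.
    for row in rows:
        yield {k: mask_value(v) if k in SENSITIVE_COLUMNS else v for k, v in row.items()}


def load(rows) -> str:
    out = io.StringIO()
    writer = None
    for row in rows:
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)
    return out.getvalue()


source = "id,email,phone\n1,jane@example.com,5551234\n"
print(load(transform(extract(source))))
```

Because the stages are generators, rows are masked one at a time as they stream through, and the unmasked values are never written to the target.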

By integrating its data masking functionality with various tools and services, BigID enables organizations to build a unified data protection strategy that effectively safeguards sensitive data across multiple platforms and environments while ensuring compliance with data privacy regulations.

BigID and on-premises systems

BigID can be used with on-premises systems as well as cloud-based and hybrid environments. The platform is designed to be flexible and adaptable, allowing organizations to deploy it in a way that best suits their specific data privacy and protection needs.

BigID supports integration with various on-premises data storage solutions, including relational databases (e.g., MySQL, PostgreSQL, Oracle), NoSQL databases (e.g., MongoDB, Cassandra), and data warehouses. It can also be integrated with on-premises data processing, analytics, and data governance tools.

To deploy BigID in an on-premises environment, organizations typically install and run the BigID software on their own hardware infrastructure, ensuring that all data processing, discovery, and classification operations take place within the organization’s secure network. This can help organizations maintain control over their sensitive data and meet specific compliance requirements that may not allow for data processing to be conducted in the cloud.

In summary, BigID can be effectively used with on-premises systems, offering organizations the flexibility to choose a deployment model that aligns with their specific data privacy, protection, and compliance needs.

BigID real-time dynamic masking

You can incorporate BigID into your interfaces to provide masked data by default to users who do not have permissions to see unmasked data. This can be achieved through the use of BigID’s dynamic data masking functionality, which allows for real-time, on-the-fly masking of sensitive data based on user permissions.

To implement this, you would need to integrate BigID with your interfaces, applications, or APIs, so that data requests are routed through BigID’s masking engine. When a user requests data through an interface, BigID will determine the user’s access permissions and apply the appropriate data masking policies. Users without the necessary permissions to view sensitive data will receive masked data, while authorized users will have access to the unmasked data.

Here are some steps you can follow to achieve this:

Configure BigID’s dynamic data masking policies based on your organization’s data privacy requirements and user roles.
Integrate BigID with your interfaces, applications, or APIs, either directly or by using a middleware solution.
Set up access controls to define which users or user groups have permissions to view unmasked data.
When a user requests data through an interface, route the request through BigID’s masking engine, which will apply the appropriate masking policies based on the user’s permissions.
Return the masked or unmasked data to the user based on their access permissions.
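The steps above can be sketched as a thin data-access layer that masks by default and returns unmasked data only to authorized roles. This is an illustrative assumption and not BigID's masking engine. `fetch_customer` stands in for your data store, and the role and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


# Hypothetical access rule: only these roles may see unmasked sensitive fields.
UNMASK_ROLES = {"privacy_officer", "support_lead"}
SENSITIVE_FIELDS = {"ssn", "email"}


def can_view_unmasked(user: User) -> bool:
    return bool(user.roles & UNMASK_ROLES)


def mask(value: str) -> str:
    return "*" * len(value)


def fetch_customer(customer_id: str) -> dict:
    # Stand-in for the real data store; always returns unmasked data.
    return {"id": customer_id, "email": "jane@example.com", "ssn": "123-45-6789"}


def get_customer(customer_id: str, user: User) -> dict:
    """Masked by default; unmasked only when the caller holds an authorized role."""
    record = fetch_customer(customer_id)
    if can_view_unmasked(user):
        return record
    return {k: mask(v) if k in SENSITIVE_FIELDS else v for k, v in record.items()}


analyst = User("ana", frozenset({"analyst"}))
officer = User("omar", frozenset({"privacy_officer"}))
print(get_customer("c42", analyst)["ssn"])  # ***********
print(get_customer("c42", officer)["ssn"])  # 123-45-6789
```

Masking is the default branch, so a user with no recognized role, or a misconfigured role, receives masked data rather than raw values.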

By incorporating BigID in this manner, you can ensure that sensitive data is protected by default while still allowing authorized users to access the unmasked data when necessary. This approach can help enhance your organization’s data privacy and protection posture while maintaining compliance with data privacy regulations.

Masking data at rest with BigID

BigID can help mask data at rest by applying static data masking techniques to sensitive information stored in databases, data warehouses, or other storage systems. Static data masking involves replacing sensitive data with fictional or scrambled data in non-production environments, such as development, testing, or staging environments, rendering the information unreadable or unrecognizable to unauthorized users.

To mask data at rest using BigID, you would need to follow these steps:

Discover and classify sensitive data – Use BigID’s data discovery and classification features to identify sensitive data in your organization’s storage systems, such as personally identifiable information (PII) or other types of sensitive data.
Define masking policies – Configure BigID’s data masking policies based on your organization’s data privacy requirements and desired masking techniques (e.g., anonymization, pseudonymization, encryption, or tokenization).
Apply masking to data at rest – Execute the masking policies on the identified sensitive data in non-production environments, ensuring that the original sensitive data is replaced with masked data.
Verify the masking process – After the masking process is complete, validate that the sensitive data has been masked appropriately and that unauthorized users cannot access the original information.
Use masked data in non-production environments – When conducting development, testing, or other non-sensitive activities, use the masked data instead of the original sensitive data to minimize the risk of data breaches and maintain compliance with data privacy regulations.
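The copy, mask, and verify sequence can be illustrated with Python's built-in `sqlite3` module. Two in-memory databases stand in for production and a non-production copy. The table, the fictional replacement values, and the helper names are assumptions for this sketch, not BigID behavior.

```python
import sqlite3


def build_source(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, email) VALUES (?, ?)",
        [("Jane Doe", "jane@example.com"), ("John Roe", "john@example.com")],
    )
    conn.commit()


def mask_in_place(conn: sqlite3.Connection) -> None:
    """Overwrite PII columns with fictional values derived from the row id."""
    conn.execute(
        "UPDATE customers SET name = 'Customer ' || id, "
        "email = 'user' || id || '@example.invalid'"
    )
    conn.commit()


def pii_values(conn: sqlite3.Connection) -> set:
    return {v for row in conn.execute("SELECT name, email FROM customers") for v in row}


def verify(conn: sqlite3.Connection, originals: set) -> bool:
    """True if none of the original sensitive values survive in this database."""
    return pii_values(conn).isdisjoint(originals)


prod = sqlite3.connect(":memory:")
build_source(prod)
originals = pii_values(prod)

test_db = sqlite3.connect(":memory:")  # stands in for a non-production copy
prod.backup(test_db)                   # copy production data, then mask the copy
mask_in_place(test_db)

print(verify(test_db, originals))  # True
print(verify(prod, originals))     # False: production data is untouched
```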

By masking data at rest, BigID helps organizations protect sensitive information while still allowing it to be used for non-sensitive purposes, such as testing, development, or analytics.

Converting masked data to its original state

Whether masked data can be converted back to the original data depends on the masking technique used: some techniques are reversible, while others are not.

Reversible (two-way) techniques:

Pseudonymization – This technique replaces sensitive data with pseudonyms or tokens, maintaining a reference to the original data. Pseudonymized data can be reverted to the original data using the mapping between the pseudonyms and the original data, typically stored in a separate and secured location. This technique is useful when you need to maintain relationships between the data points for analysis or processing but want to limit exposure of sensitive data.
Encryption – Encryption involves transforming the original data using a specific encryption algorithm and a secret key. Encrypted data can be decrypted and reverted to the original data using the correct decryption algorithm and secret key. While encryption provides strong protection, managing and securing encryption keys is crucial to maintain data privacy.
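Pseudonymization with a separate mapping can be sketched as a small token vault. In this sketch, an in-memory dictionary stands in for the separate, secured mapping store. `TokenVault` and its methods are hypothetical names and do not reflect how BigID stores tokens. Encryption is not shown, because Python's standard library has no general-purpose cipher.

```python
import secrets


class TokenVault:
    """Reversible pseudonymization backed by a token-to-value lookup table."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def pseudonymize(self, value: str) -> str:
        # Repeat values get the same token, so joins and counts still work.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            while token in self._value_by_token:  # guard against a (very unlikely) collision
                token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def reidentify(self, token: str) -> str:
        # Restrict this path to authorized callers; unknown tokens raise KeyError.
        return self._value_by_token[token]


vault = TokenVault()
t1 = vault.pseudonymize("jane@example.com")
t2 = vault.pseudonymize("jane@example.com")
print(t1 == t2)              # True
print(vault.reidentify(t1))  # jane@example.com
```

The tokens are random rather than derived from the value, so anyone holding only the pseudonymized data cannot reverse them. Reversal depends entirely on access to the vault.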

Non-reversible (one-way) techniques:

Anonymization – Anonymization techniques, such as data aggregation, generalization, or noise addition, permanently alter the sensitive data in such a way that it cannot be linked back to the original data. Once the data is anonymized, it cannot be reverted to the original data. Anonymization provides strong privacy protection but may reduce the utility of the data for certain use cases.
Masking (scrambling) – This technique involves replacing the original data with fictional or random data, making it unreadable or unrecognizable. Depending on the specific masking technique used, it may not be possible to reverse the process and recover the original data. For example, full masking, which replaces the entire data value with a random value, is not reversible.
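Two of the anonymization techniques named above, generalization and noise addition, are simple to illustrate. This is a sketch with hypothetical helper names and parameters (band width, retained postal-code digits, noise scale), not BigID's implementation.

```python
import random


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep only the leading digits of a postal code, e.g. '94107' -> '941**'."""
    return zip_code[:keep] + "*" * max(len(zip_code) - keep, 0)


def add_noise(value: float, scale: float, rng: random.Random) -> float:
    """Perturb a numeric value with zero-mean Gaussian noise."""
    return value + rng.gauss(0.0, scale)


rng = random.Random()
record = {"age": 37, "zip": "94107", "salary": 85000.0}
anonymized = {
    "age_band": generalize_age(record["age"]),
    "zip_prefix": generalize_zip(record["zip"]),
    "salary": round(add_noise(record["salary"], scale=2000.0, rng=rng), -2),
}
print(anonymized)
```

None of these functions keeps any link to the exact input. This is the utility trade-off mentioned above: the age band and noisy salary remain useful for aggregate analysis but not for individual lookups.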

When choosing a masking technique, consider your organization’s specific data privacy requirements, the need to maintain relationships between data points, and the level of protection required.

BigID supports various data masking techniques, including scrambling, anonymization, and pseudonymization, to help organizations protect their sensitive data while maintaining compliance with data privacy regulations.

Scrambling – BigID allows you to scramble or obfuscate sensitive data by replacing it with random characters or values. This technique is useful when you want to render the data unreadable or unrecognizable without the need to maintain any relationship between the original and masked data.
Anonymization – BigID supports anonymization techniques such as data aggregation, generalization, and noise addition, which permanently alter sensitive data so that it cannot be linked back to the original values, at some cost to the data’s utility for certain use cases.
Pseudonymization – With BigID, you can replace sensitive data with pseudonyms or tokens while keeping the mapping to the original values in a separate, secured location, so that authorized processes can re-identify the data when needed. This preserves relationships between data points for analysis or processing while limiting exposure of the sensitive values.
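Scrambling, as described above, keeps no relationship between the original and masked values. The sketch below is a plain-Python illustration, not BigID code. It replaces each letter or digit with a random character of the same kind, so the masked value keeps the format that downstream validation might expect. The function name is hypothetical.

```python
import random
import string
from typing import Optional


def scramble(value: str, rng: Optional[random.Random] = None) -> str:
    """Replace each letter or digit with a random one of the same kind; keep separators."""
    if rng is None:
        rng = random.SystemRandom()  # not seedable, so output cannot be replayed
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


print(scramble("AB-1234-xy"))  # same shape as the input, random characters
```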

You can combine these techniques in custom masking policies tailored to your organization’s data privacy requirements, which lets you protect sensitive data with BigID while maintaining compliance with data privacy regulations.

Data discovery with BigID

BigID discovers data by leveraging advanced machine learning, data intelligence techniques, and predefined discovery rules to scan and analyze data across a wide range of data sources and types. The data discovery process in BigID consists of several steps:

Connect to data sources – BigID supports integration with various data sources, such as databases, file systems, cloud storage, data warehouses, and big data platforms. By establishing connections to these data sources, BigID can access and scan the data stored within them.
Data scanning – BigID scans the connected data sources to identify and collect metadata, such as file names, file types, creation dates, and other relevant information. This process helps to create a comprehensive data inventory and provides insights into the organization’s data landscape.
Data classification – Using machine learning algorithms and predefined rules, BigID automatically classifies the scanned data based on various attributes, such as data types, formats, and patterns. The platform can identify sensitive information, such as personally identifiable information (PII), protected health information (PHI), and other types of sensitive data.
Data correlation and mapping – BigID correlates the classified data to build relationships between data points, allowing for a deeper understanding of the data and its context. The platform can also create data maps that provide a visual representation of the relationships between different data elements and their sources.
Data cataloging – The discovered and classified data is organized into a data catalog, which serves as a centralized repository for managing and governing the organization’s data. The data catalog helps in maintaining an up-to-date inventory of sensitive data and simplifies data governance tasks.
Continuous monitoring – BigID continuously monitors the connected data sources to ensure that any new or updated data is discovered and classified, maintaining an accurate and current view of the organization’s data landscape.
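The rule-based part of the classification step can be sketched as pattern matching over column values. In this sketch, a column receives a label when most of its values match a rule. The two regular expressions and the 0.8 threshold are hypothetical assumptions. The machine-learning classification that BigID describes is not modeled here.

```python
import re

# Hypothetical detection rules; real classifiers add validation and ML-based signals.
RULES = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values, threshold=0.8):
    """Label a column when at least `threshold` of its non-empty values match a rule."""
    non_empty = [v for v in values if v]
    labels = []
    if not non_empty:
        return labels
    for label, pattern in RULES.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            labels.append(label)
    return labels


def scan_table(table):
    """Build a simple catalog entry: column name -> detected labels."""
    return {column: classify_column(values) for column, values in table.items()}


customers = {
    "contact": ["jane@example.com", "john@example.org", ""],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "notes": ["called twice", "prefers email"],
}
print(scan_table(customers))
# {'contact': ['EMAIL'], 'tax_id': ['US_SSN'], 'notes': []}
```

The threshold means a single stray match does not label a whole column, which helps reduce false positives such as an email address mentioned in a free-text note.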

By employing a combination of machine learning, data intelligence techniques, and predefined discovery rules, BigID enables organizations to discover, classify, and catalog sensitive data across their entire data landscape, facilitating data governance, protection, and compliance with data privacy regulations.

BigID and enterprise data catalogs

Integrating BigID with enterprise data catalogs such as Collibra and Alation can help organizations consolidate their data governance and data privacy efforts, providing a more comprehensive view of their data landscape. To integrate the BigID data catalog with other data catalogs, you can follow these steps:

Establish connections between BigID and the catalog – To enable data exchange between BigID and other data catalogs, you may need to use APIs or connectors provided by both platforms. This will ensure that data discovered and classified by BigID can be shared with other data catalogs.
Configure data sharing policies – Define the data sharing policies that determine which data elements, classifications, and metadata from the BigID data catalog should be shared with the other data catalog. This ensures that only relevant information is exchanged between the two platforms.
Synchronize data between the catalogs – Set up a synchronization process that regularly updates the other data catalog with the latest data discoveries, classifications, and metadata from the BigID data catalog. This can be done through scheduled synchronization tasks or real-time updates using APIs or connectors.
Map data elements and classifications – Once the data from the BigID data catalog is shared with the other catalog, map the data elements, classifications, and metadata to the corresponding elements in the other data catalog. This helps in maintaining consistency and coherence between the two catalogs.
Implement data governance workflows – With the integration in place, you can leverage the combined capabilities of BigID and your preferred data catalog to implement data governance workflows, such as data stewardship, data quality management, and data lineage tracking. This will help in effectively managing and governing your organization’s data landscape.
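The mapping and synchronization steps can be sketched as a function that translates discovery labels into a target catalog's business terms and reports any labels it cannot map. In this sketch, an in-memory dictionary stands in for the target catalog. The label names, the term hierarchy, and `CatalogAsset` are all hypothetical. Collibra, Alation, and BigID each expose their own APIs, and those APIs are not shown here.

```python
from dataclasses import dataclass

# Hypothetical mapping from discovery labels to the target catalog's business terms.
LABEL_TO_TERM = {
    "EMAIL": "Personal Data > Contact Information",
    "US_SSN": "Personal Data > Government Identifier",
}


@dataclass
class CatalogAsset:
    qualified_name: str
    terms: set


def sync_classifications(findings, target):
    """Merge discovery findings (asset name -> labels) into target catalog assets.

    Returns the labels that have no mapping, so a data steward can review them.
    """
    unmapped = set()
    for name, labels in findings.items():
        asset = target.setdefault(name, CatalogAsset(name, set()))
        for label in labels:
            term = LABEL_TO_TERM.get(label)
            if term is None:
                unmapped.add(label)
            else:
                asset.terms.add(term)
    return unmapped


target = {}
findings = {"crm.customers.email": ["EMAIL"], "hr.staff.passport": ["PASSPORT"]}
unmapped = sync_classifications(findings, target)
print(sorted(target["crm.customers.email"].terms))  # ['Personal Data > Contact Information']
print(unmapped)                                     # {'PASSPORT'}
```

Because terms are stored as sets, running the synchronization again with the same findings does not create duplicates, so it is safe to schedule repeatedly.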

By integrating the BigID data catalog with other enterprise data catalogs, organizations can benefit from a unified data governance and data privacy framework, allowing them to efficiently manage, protect, and govern their sensitive data across various data sources and environments.

Conclusion

BigID is a data privacy platform that helps organizations discover, manage, and protect their sensitive data. The platform’s capabilities include data discovery, classification, and correlation, as well as automated privacy management, access controls, and data retention policies. BigID uses machine learning and advanced analytics to identify sensitive data across structured and unstructured data sources, enabling organizations to comply with privacy regulations such as GDPR, CCPA, and HIPAA. With BigID, organizations can gain visibility into their data landscape, manage risks, and improve their overall data governance practices. Together, these capabilities help organizations maintain data privacy and security in an increasingly regulated and data-driven world.