Amazon DataZone provides secure, purpose-driven access to data that aligns with organizational security policies without relying on individual credentials. It gives transparency into asset usage, supports a governed workflow for approving data subscriptions, and offers usage auditing to monitor data assets across domains and projects.
Key Features:
- Domains:
- Scalable building blocks that organize resources according to business teams or lines of business (LOBs).
- Allow creation of business-specific taxonomies using metadata forms and glossaries.
- Govern data and control access through a domain’s associated resources.
- Workflows for Publishing and Subscribing:
- Facilitate decentralized data ownership and federated governance for data sharing.
- Data producers publish and govern their data assets and configure subscription rules for consumers.
- Data consumers access desired data after completing an approval workflow with data owners.
- Fulfilling Grants on AWS Data:
- Automatically manages permissions for AWS Lake Formation-managed AWS Glue tables and Amazon Redshift tables and views.
- For other asset types, emits standard events tied to user actions (e.g., subscription requests or approvals).
- Integrates with other AWS services or third-party solutions using these events for custom workflows.
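The publishing and subscribing workflow described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Catalog`, `SubscriptionRequest`, `Status`, and every method name are hypothetical and are not part of the Amazon DataZone API. It shows the order of operations (publish, request, owner review, grant) rather than how the service implements them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SubscriptionRequest:
    """Hypothetical record of a consumer asking for access to an asset."""
    asset: str
    consumer_project: str
    status: Status = Status.PENDING


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a domain's catalog (not a real API)."""
    owners: dict = field(default_factory=dict)  # asset name -> owning project
    grants: set = field(default_factory=set)    # (asset, consumer project) pairs

    def publish(self, asset, owner_project):
        # A producer publishes an asset and becomes its owner.
        self.owners[asset] = owner_project

    def request(self, asset, consumer_project):
        # A consumer can only request assets that have been published.
        if asset not in self.owners:
            raise KeyError(f"{asset} is not published")
        return SubscriptionRequest(asset, consumer_project)

    def review(self, req, reviewer_project, approve):
        # Only the owning project may approve or reject, and only once.
        if self.owners[req.asset] != reviewer_project:
            raise PermissionError("only the owning project can review")
        if req.status is not Status.PENDING:
            raise ValueError("request already reviewed")
        req.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.grants.add((req.asset, req.consumer_project))
        return req

    def can_read(self, asset, project):
        # Access exists only after an approved subscription.
        return (asset, project) in self.grants


catalog = Catalog()
catalog.publish("sales_orders", owner_project="sales")
req = catalog.request("sales_orders", consumer_project="marketing")
print(catalog.can_read("sales_orders", "marketing"))  # False
catalog.review(req, reviewer_project="sales", approve=True)
print(catalog.can_read("sales_orders", "marketing"))  # True
```

In this sketch the grant is recorded only when the owner approves, which mirrors the idea that consumers gain access after the approval workflow completes.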
Use Cases:
- Control Data Sharing:
- Abstracts the data sharing process between producers and consumers.
- Domains manage access control through a subscription approval process that works across AWS accounts and AWS Regions.
- Understand Access Rights:
- Allows creating business use case–based groupings for teams, tools, and data.
- Enables self-service access to data and analytics tools, while administrators manage access centrally.
- Organize by Business Units:
- Securely organizes resources according to business-driven domains (e.g., LOBs).
- Domains serve as scalable containers for Amazon DataZone objects like data assets, projects, and associated AWS accounts.
- Provides a mechanism for organizational discipline in data governance and cataloging.
Amazon DataZone Capabilities:
- Automate Catalog Hydration Using LLMs:
- Uses large language models to automate the curation and hydration of the data catalog.
- Auto-generates business names for structured data to facilitate data discovery.
- Start Small and Scale Quickly:
- Allows LOBs or teams to control their domains and share assets, fostering quick adoption and scalability across the organization.
- Increase Productivity of Data People:
- Promotes collaboration by enabling seamless switching between tools and integration with customized tools.
Integrations:
- Producer Data Sources:
- Publishes data from built-in sources like AWS Glue Data Catalog and Amazon Redshift.
- Supports other data sources through custom asset types and public APIs.
- Integrates with AWS Glue Data Quality for scheduled data quality scores.
- Analytics Tools:
- Works with Amazon Athena and Amazon Redshift Query Editor for direct data querying.
- Extensible via APIs for third-party tool integration.
- Shares project access context with these tools.
- Access Fulfillment:
- Automatically manages permissions for AWS Lake Formation-managed assets.
- For other asset types, emits standard events that AWS services or third-party solutions can use for integration.
- Machine Learning (ML) Tools:
- Integrates with Amazon SageMaker to enable easy access to data and ML assets.
- Supports ML governance and facilitates the publication of new ML assets to the business data catalog.
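The event-based integration mentioned under Access Fulfillment can be sketched as a small event router. The example assumes events arrive as dictionaries with the `source`, `detail-type`, and `detail` keys used by the Amazon EventBridge envelope. The source value `aws.datazone`, the two detail-type strings, and the `requestId` field are assumptions for illustration and should be checked against the service documentation; the handler names are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]
_HANDLERS: Dict[str, Handler] = {}


def on(detail_type: str):
    """Register a handler for one event detail-type."""
    def register(fn: Handler) -> Handler:
        _HANDLERS[detail_type] = fn
        return fn
    return register


# Detail-type strings below are assumed, not confirmed by the source text.
@on("Subscription Request Created")
def notify_owner(detail: dict) -> str:
    # Hypothetical custom step: tell the data owner a review is pending.
    return f"notify owner: request {detail.get('requestId', '?')} needs review"


@on("Subscription Request Accepted")
def fulfill_access(detail: dict) -> str:
    # Hypothetical custom step: grant access in a system that is not managed automatically.
    return f"grant access for request {detail.get('requestId', '?')}"


def route(event: dict, expected_source: str = "aws.datazone") -> str:
    """Dispatch an event to its handler, ignoring anything unrecognized."""
    if event.get("source") != expected_source:
        return "ignored: unexpected source"
    handler = _HANDLERS.get(event.get("detail-type"))
    if handler is None:
        return "ignored: no handler"
    return handler(event.get("detail", {}))


sample = {
    "source": "aws.datazone",
    "detail-type": "Subscription Request Accepted",
    "detail": {"requestId": "req-123"},
}
print(route(sample))  # grant access for request req-123
```

Keeping the routing table separate from the handlers means a new custom workflow step can be added by registering one more function, without touching the dispatch logic.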