Amazon CloudSearch is a fully managed AWS search service designed for high-throughput, low-latency search.
- Features: language-specific text processing for 34 languages, free-text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete, and user-configurable scaling and availability.
Getting Started with Amazon CloudSearch:
- Steps to use:
  - Create a search domain.
  - Configure indexing options for your data.
  - Upload data for indexing.
  - Submit search requests from your website or application.
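The first two steps go through the Configuration Service. Below is a minimal sketch, assuming the boto3 SDK's `cloudsearch` client, whose `create_domain` and `define_index_field` methods take the keyword arguments shown; the helper names, the domain name `movies`, and the fields are hypothetical. Building the calls as plain data keeps the plan inspectable before anything touches AWS.

```python
from typing import Any, Dict, List, Tuple

# A plan is an ordered list of (boto3 method name, keyword arguments).
Plan = List[Tuple[str, Dict[str, Any]]]


def build_setup_plan(domain_name: str, fields: Dict[str, str]) -> Plan:
    """Return Configuration Service calls that create a domain and its fields.

    `fields` maps a field name to an IndexFieldType such as "text",
    "literal", or "int". Fields are defined in the dict's insertion order.
    """
    plan: Plan = [("create_domain", {"DomainName": domain_name})]
    for name, field_type in fields.items():
        plan.append((
            "define_index_field",
            {
                "DomainName": domain_name,
                "IndexField": {"IndexFieldName": name, "IndexFieldType": field_type},
            },
        ))
    return plan


def apply_plan(client: Any, plan: Plan) -> None:
    """Invoke each planned method, in order, on a CloudSearch config client."""
    for method, kwargs in plan:
        getattr(client, method)(**kwargs)
```

With credentials configured, `apply_plan(boto3.client("cloudsearch"), build_setup_plan("movies", {"title": "text"}))` would issue the calls. If you change index fields on a domain that already holds data, the new options take effect only after you rebuild the index with `index_documents`.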
Free Trial:
- Offers 750 free hours of fully functional search instances for 30 days.
- To start, sign in to AWS, access the CloudSearch Console, and create/configure a search domain.
Search Instances and Domains:
- A search domain encapsulates the data you want to make searchable, together with the search instances that index that data and process search requests.
- A search instance is a single search engine with a finite amount of RAM and CPU for indexing data and processing queries.
- Amazon CloudSearch scales the number of search instances automatically: it adds instances as data volume or search traffic grows and removes them when either decreases.
Scaling and Performance:
- Auto-scaling responds to index size and search traffic: as the index grows, CloudSearch moves the domain to a larger instance type, and as search traffic grows, it adds replicas.
- If the index outgrows the largest instance type, CloudSearch partitions it across multiple instances.
- You can also scale manually ahead of an expected spike in query traffic or data volume.
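Manual scaling is a Configuration Service call. The sketch below assumes boto3's `update_scaling_parameters`, which takes a `ScalingParameters` structure; the helper name `scaling_request` and the domain name are hypothetical, and you should check which instance types are currently offered before setting `instance_type`.

```python
from typing import Any, Dict, Optional


def scaling_request(domain_name: str,
                    instance_type: Optional[str] = None,
                    replicas: Optional[int] = None,
                    partitions: Optional[int] = None) -> Dict[str, Any]:
    """Build kwargs for update_scaling_parameters.

    Only the settings you pass are included in the request.
    """
    params: Dict[str, Any] = {}
    if instance_type is not None:
        params["DesiredInstanceType"] = instance_type
    if replicas is not None:
        if replicas < 1:
            raise ValueError("replicas must be at least 1")
        params["DesiredReplicationCount"] = replicas
    if partitions is not None:
        if partitions < 1:
            raise ValueError("partitions must be at least 1")
        params["DesiredPartitionCount"] = partitions
    return {"DomainName": domain_name, "ScalingParameters": params}
```

For example, `boto3.client("cloudsearch").update_scaling_parameters(**scaling_request("movies", replicas=3))` would request three replicas before a traffic spike. Per the API reference, a desired partition count is valid only with the largest instance type.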
Data Capacity of Search Instances:
- Actual capacity depends on document size and index configuration, not just document count.
- Instance types:
  - Small: up to 2 million documents.
  - Large: up to 8 million documents.
  - Extra Large: up to 16 million documents.
  - Double Extra Large: up to 32 million documents.
- For larger datasets, indexes are partitioned across multiple Double Extra Large instances.
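As a rough sizing aid, the document limits above can be turned into arithmetic: pick the smallest type that fits, otherwise divide by the Double Extra Large limit and round up. This is a back-of-the-envelope sketch keyed on document count only, not CloudSearch's actual scaling algorithm, which also weighs document size and index configuration; `estimate_layout` is a hypothetical helper.

```python
import math
from typing import List, Tuple

# Document limits per instance type, taken from the list above.
CAPACITY: List[Tuple[str, int]] = [
    ("Small", 2_000_000),
    ("Large", 8_000_000),
    ("Extra Large", 16_000_000),
    ("Double Extra Large", 32_000_000),
]


def estimate_layout(num_docs: int) -> Tuple[str, int]:
    """Return (instance type, partition count) for a document count.

    Picks the smallest type whose limit covers num_docs; beyond the largest
    type, splits the index into partitions on Double Extra Large instances.
    """
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    for name, limit in CAPACITY:
        if num_docs <= limit:
            return name, 1
    name, limit = CAPACITY[-1]
    return name, math.ceil(num_docs / limit)
```

For example, 100 million documents works out to ceil(100 / 32) = 4 Double Extra Large partitions.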
Amazon CloudSearch Architecture:
- You interact with CloudSearch through three services:
  - Configuration Service: Set up and configure search domains, indexing options, text analysis, availability, scaling, suggesters, and expressions.
  - Document Service: Upload and manage searchable data as batches formatted in JSON or XML.
  - Search Service: Handles search and suggestion requests, returning results in JSON or XML.
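The Document Service accepts batches in the Search Data Format (SDF): a JSON array of `add` and `delete` operations, where adding a document whose `id` already exists replaces it. Below is a minimal sketch that splits operations into batches under a size cap; the 5 MB default reflects the documented batch limit, but treat it as an assumption and confirm it against the current limits. The helper names and sample fields are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed 5 MB batch limit


def add_op(doc_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """SDF operation that adds (or replaces) a document."""
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_op(doc_id: str) -> Dict[str, Any]:
    """SDF operation that deletes a document by id."""
    return {"type": "delete", "id": doc_id}


def build_batches(ops: Iterable[Dict[str, Any]],
                  max_bytes: int = MAX_BATCH_BYTES) -> List[bytes]:
    """Serialize operations into JSON arrays of at most max_bytes each."""
    batches: List[bytes] = []
    current: List[bytes] = []
    size = 2  # bytes for the enclosing "[" and "]"
    for op in ops:
        encoded = json.dumps(op, separators=(",", ":")).encode("utf-8")
        extra = len(encoded) + (1 if current else 0)  # plus a comma if needed
        if current and size + extra > max_bytes:
            # Close the current batch and start a new one with this op.
            batches.append(b"[" + b",".join(current) + b"]")
            current, size = [], 2
            extra = len(encoded)
        if size + extra > max_bytes:
            raise ValueError(f"operation for id {op.get('id')!r} exceeds max_bytes")
        current.append(encoded)
        size += extra
    if current:
        batches.append(b"[" + b",".join(current) + b"]")
    return batches
```

Each batch can be POSTed to `https://<doc endpoint>/2013-01-01/documents/batch` with `Content-Type: application/json`, or passed to boto3's `cloudsearchdomain` client as `upload_documents(documents=batch, contentType="application/json")`.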
Configuration Options:
- Indexing Options: Define which fields to include in the index and how each is configured (e.g., searchable, facet-enabled, returnable, sortable).
- Text Analysis Schemes: Language-specific processing, stopwords, synonyms, and stemming options.
- Availability Options: Enable the Multi-AZ option to deploy a domain into a second Availability Zone in the same region for high availability.
- Scaling Options: Set instance types, replication count, and partition count for handling large datasets or query spikes.
- Suggesters: Provide autocomplete suggestions based on partial input.
- Expressions: Customize search result rankings using numeric expressions, combining relevance scores with other document factors.
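Each option above maps to a Configuration Service call. The builders below assume boto3's `update_availability_options`, `define_analysis_scheme`, `define_suggester`, and `define_expression`; the domain, scheme, suggester, and expression names and the `popularity` field are hypothetical. Stopwords and synonyms are passed as JSON-encoded strings.

```python
import json
from typing import Any, Dict, List


def availability_request(domain: str, multi_az: bool = True) -> Dict[str, Any]:
    """kwargs for update_availability_options."""
    return {"DomainName": domain, "MultiAZ": multi_az}


def analysis_scheme_request(domain: str, name: str, language: str,
                            stopwords: List[str],
                            synonym_groups: List[List[str]]) -> Dict[str, Any]:
    """kwargs for define_analysis_scheme; options are JSON strings."""
    return {"DomainName": domain, "AnalysisScheme": {
        "AnalysisSchemeName": name,
        "AnalysisSchemeLanguage": language,
        "AnalysisOptions": {
            "Stopwords": json.dumps(stopwords),
            # Each group lists terms treated as synonyms of one another.
            "Synonyms": json.dumps({"groups": synonym_groups}),
        },
    }}


def suggester_request(domain: str, name: str, source_field: str,
                      fuzzy: str = "none") -> Dict[str, Any]:
    """kwargs for define_suggester; fuzzy is "none", "low", or "high"."""
    return {"DomainName": domain, "Suggester": {
        "SuggesterName": name,
        "DocumentSuggesterOptions": {"SourceField": source_field,
                                     "FuzzyMatching": fuzzy},
    }}


def expression_request(domain: str, name: str, value: str) -> Dict[str, Any]:
    """kwargs for define_expression; value is a numeric ranking formula."""
    return {"DomainName": domain,
            "Expression": {"ExpressionName": name, "ExpressionValue": value}}
```

For instance, `expression_request("movies", "boosted", "_score + popularity * 0.5")` blends the relevance score with a numeric field, which is the kind of ranking customization the Expressions option describes.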
Document and Search Services:
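- Document Service: Each domain has its own document service HTTP endpoint, to which you post batches that add (or replace) and delete documents.
- Search Service: Executes search and suggestion requests at the domain's search endpoint; it supports a rich query language and four query parsers: simple (the default), structured, lucene, and dismax.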
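Search and suggest requests are HTTP GETs against the domain's search endpoint under the `2013-01-01` API version. The sketch below builds such URLs; the endpoint hostname, field names, and helper names are placeholders, and whether unsigned requests are accepted depends on the domain's access policy.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

API_VERSION = "2013-01-01"
PARSERS = {"simple", "structured", "lucene", "dismax"}


def search_url(endpoint: str, query: str, parser: str = "simple",
               size: int = 10, facets: Optional[Dict[str, str]] = None) -> str:
    """Build a search URL; facets maps a field to its facet options string."""
    if parser not in PARSERS:
        raise ValueError(f"unknown query parser: {parser}")
    params = {"q": query, "q.parser": parser, "size": str(size)}
    for field, options in (facets or {}).items():
        params[f"facet.{field}"] = options  # e.g. "{}" for default faceting
    return f"https://{endpoint}/{API_VERSION}/search?{urlencode(params)}"


def suggest_url(endpoint: str, prefix: str, suggester: str, size: int = 5) -> str:
    """Build a suggest URL for autocomplete on a partial input."""
    params = {"q": prefix, "suggester": suggester, "size": str(size)}
    return f"https://{endpoint}/{API_VERSION}/suggest?{urlencode(params)}"
```

Alternatively, boto3's `cloudsearchdomain` client (created with `endpoint_url` set to the search endpoint) exposes the same requests as `search(query=..., queryParser=...)` and `suggest(query=..., suggester=...)` and signs them for you.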
Security and Management:
- Uses AWS Identity and Access Management (IAM) policies to control access to the configuration service and to each domain's document, search, and suggest services.
- Domains can be monitored and scaled through the AWS Management Console, the AWS CLI, or the AWS SDKs.
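An IAM policy that grants only search and suggest access to one domain might be built as below. The region, account ID, and domain are placeholders and `domain_policy` is a hypothetical helper; `cloudsearch:search`, `cloudsearch:suggest`, and `cloudsearch:document` are the action names for the domain services, and the resource is the domain ARN.

```python
from typing import Any, Dict, List


def domain_policy(region: str, account_id: str, domain: str,
                  actions: List[str]) -> Dict[str, Any]:
    """Allow the given domain-service actions on a single search domain."""
    arn = f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [f"cloudsearch:{action}" for action in actions],
            "Resource": arn,
        }],
    }
```

Serialized with `json.dumps`, `domain_policy("us-east-1", "111122223333", "movies", ["search", "suggest"])` can be attached to a user or role that should query the domain but not upload documents.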