Amazon CloudSearch

Amazon CloudSearch is designed for high-throughput, low-latency search.

It supports language-specific text processing for 34 languages, free text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete, and user-configurable scaling and availability options.

Getting Started with Amazon CloudSearch:

To use Amazon CloudSearch:
1. Create a search domain.
2. Configure indexing options for your data.
3. Upload data for indexing.
4. Submit search requests via your website or application.
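The four steps above can be sketched with the AWS SDK for Python. This is a minimal sketch under stated assumptions: `config_client` is taken to be a `boto3.client("cloudsearch")` object, while `doc_client` and `search_client` are taken to be `boto3.client("cloudsearchdomain", endpoint_url=...)` objects pointed at the domain's document and search endpoints. The function name, the "title" field, and the sample document are hypothetical. The clients are passed in as parameters so the flow can be exercised without an AWS account.

```python
import json


def run_getting_started(config_client, doc_client, search_client, domain_name):
    """Walk through the four getting-started steps with boto3-style clients."""
    # Step 1: create a search domain.
    config_client.create_domain(DomainName=domain_name)

    # Step 2: configure an index field (hypothetical "title" text field),
    # then ask the service to build the index with the new configuration.
    config_client.define_index_field(
        DomainName=domain_name,
        IndexField={"IndexFieldName": "title", "IndexFieldType": "text"},
    )
    config_client.index_documents(DomainName=domain_name)

    # Step 3: upload a batch containing one hypothetical document.
    batch = [{"type": "add", "id": "doc1", "fields": {"title": "Hello search"}}]
    doc_client.upload_documents(
        documents=json.dumps(batch).encode("utf-8"),
        contentType="application/json",
    )

    # Step 4: submit a search request and return the service's response.
    return search_client.search(query="hello", size=10)
```

In practice a new domain takes some time to become active, and its document and search endpoints are only usable after that, so real code would wait between steps 1 and 3 rather than call them back to back.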

Free Trial:

The free trial offers 750 free hours of fully functional search instances for 30 days.
To start, sign in to AWS, open the Amazon CloudSearch console, and create and configure a search domain.

Search Instances and Domains:

A search domain comprises the collection of data you want to make searchable, along with the resources needed to index and search it.
Search instances are server instances with allocated RAM and CPU that index data and process queries.
Amazon CloudSearch automatically scales the number of search instances with data volume and search traffic: it adds instances as data or traffic grows and removes them as they decline.

Scaling and Performance:

Auto-scaling adjusts the number of search instances and replicas based on index size and search traffic.
Large datasets are handled by partitioning the index across multiple instances when needed.
Manual scaling options let you prepare for expected spikes in query traffic or data volume.
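Manual scaling is expressed as a set of desired values rather than an exact instance count. The following is a minimal sketch, assuming the `ScalingParameters` shape accepted by boto3's `update_scaling_parameters` call. The helper name is hypothetical, and the instance type string and counts are illustrative values, not recommendations.

```python
def build_scaling_parameters(instance_type, replication_count=None, partition_count=None):
    """Build a ScalingParameters-style dict, omitting values left unset."""
    if replication_count is not None and replication_count < 1:
        raise ValueError("replication_count must be at least 1")
    if partition_count is not None and partition_count < 1:
        raise ValueError("partition_count must be at least 1")

    params = {"DesiredInstanceType": instance_type}
    if replication_count is not None:
        params["DesiredReplicationCount"] = replication_count
    if partition_count is not None:
        params["DesiredPartitionCount"] = partition_count
    return params


# Pre-scale ahead of an expected traffic spike: two replicas of each partition.
spike_params = build_scaling_parameters("search.m3.large", replication_count=2)
```

The resulting dict could then be passed as `ScalingParameters=spike_params` alongside `DomainName=...`. Which instance type names are accepted depends on the service and should be checked against the current documentation.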

Data Capacity of Search Instances:

Capacity depends on document size and index configuration, so the figures below are rough guidelines rather than fixed limits.
Instance types:
- Small: Supports up to 2 million documents.
- Large: Supports up to 8 million documents.
- Extra Large: Supports up to 16 million documents.
- Double Extra Large: Supports up to 32 million documents.
For larger datasets, indexes are partitioned across multiple Double Extra Large instances.
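As a worked example of the arithmetic, the sketch below picks the smallest instance size that fits a given document count and, beyond the largest size, estimates how many partitions would be needed. It uses only the approximate capacities listed above. The function name is hypothetical, and real capacity varies with document size and index configuration.

```python
import math

# Approximate per-instance document capacities from the list above.
CAPACITY = [
    ("Small", 2_000_000),
    ("Large", 8_000_000),
    ("Extra Large", 16_000_000),
    ("Double Extra Large", 32_000_000),
]


def estimate_layout(doc_count):
    """Return (instance_label, partition_count) for a document count."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    # Use the smallest instance size whose capacity covers the data.
    for label, capacity in CAPACITY:
        if doc_count <= capacity:
            return label, 1
    # Otherwise partition across instances of the largest size.
    label, capacity = CAPACITY[-1]
    return label, math.ceil(doc_count / capacity)
```

For instance, 5 million documents fit on one Large instance, while 100 million documents would need `ceil(100 / 32) = 4` Double Extra Large partitions under these figures.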

Amazon CloudSearch Architecture:

You interact with Amazon CloudSearch through three main services:
1. Configuration Service: Set up and configure search domains, indexing options, text analysis, availability, scaling, suggesters, and expressions.
2. Document Service: Upload and manage searchable data formatted in JSON or XML.
3. Search Service: Handles search and suggestion requests, returning results in JSON or XML.

Configuration Options:

Indexing Options: Define fields to include in the index and their configuration (e.g., searchable, facet enabled).
Text Analysis Schemes: Language-specific processing, stopwords, synonyms, and stemming options.
Availability Options: Deploy domains across multiple Availability Zones for high availability.
Scaling Options: Set instance types, replication count, and partition count for handling large datasets or query spikes.
Suggesters: Provide autocomplete suggestions based on partial input.
Expressions: Customize search result rankings using numeric expressions, combining relevance scores with other document factors.
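Several of these options can be pictured as small configuration payloads. The sketch below builds dicts in the shapes that boto3's `define_index_field`, `define_analysis_scheme`, `define_expression`, and `define_suggester` calls are assumed to accept. The helper names, field names ("title", "genre", "popularity"), and scheme names are hypothetical and chosen only for illustration.

```python
import json


def text_field(name, analysis_scheme="_en_default_"):
    """Indexing option: a returnable, highlightable text field."""
    return {
        "IndexFieldName": name,
        "IndexFieldType": "text",
        "TextOptions": {
            "AnalysisScheme": analysis_scheme,
            "ReturnEnabled": True,
            "HighlightEnabled": True,
        },
    }


def facet_field(name):
    """Indexing option: a literal field enabled for search and faceting."""
    return {
        "IndexFieldName": name,
        "IndexFieldType": "literal",
        "LiteralOptions": {"SearchEnabled": True, "FacetEnabled": True},
    }


def analysis_scheme(name, stopwords, synonym_groups, language="en"):
    """Text analysis scheme: stopwords and synonyms are JSON-encoded strings."""
    return {
        "AnalysisSchemeName": name,
        "AnalysisSchemeLanguage": language,
        "AnalysisOptions": {
            "Stopwords": json.dumps(stopwords),
            "Synonyms": json.dumps({"groups": synonym_groups}),
            "AlgorithmicStemming": "light",
        },
    }


# Expression: blend a hypothetical "popularity" field with the relevance score.
ranking_expression = {
    "ExpressionName": "blended_rank",
    "ExpressionValue": "(0.3*popularity)+(0.7*_score)",
}

# Suggester: autocomplete suggestions drawn from the "title" field.
title_suggester = {
    "SuggesterName": "title_suggester",
    "DocumentSuggesterOptions": {"SourceField": "title", "FuzzyMatching": "low"},
}
```

Availability is simpler: it is assumed here to be a single Multi-AZ on/off setting per domain, so it needs no payload builder.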

Document and Search Services:

Document Service: Lets you add, update, and delete searchable data through an HTTP endpoint unique to each domain.
Search Service: Executes search and suggestion requests, and supports a rich query language with multiple query parsers (simple, structured, Lucene, DisMax).
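To make the two request types concrete, the sketch below builds a JSON document batch and a search request path using only the standard library. It assumes the 2013-01-01 API version, where batches are posted to `/2013-01-01/documents/batch` on the document endpoint and searches go to `/2013-01-01/search` on the search endpoint. The helper names and sample data are hypothetical.

```python
import json
from urllib.parse import urlencode

API_VERSION = "2013-01-01"  # assumed API version


def build_batch(adds, deletes=()):
    """Serialize add and delete operations as a JSON document batch."""
    ops = [
        {"type": "add", "id": doc_id, "fields": fields}
        for doc_id, fields in adds.items()
    ]
    ops += [{"type": "delete", "id": doc_id} for doc_id in deletes]
    return json.dumps(ops)


def build_search_path(query, parser="simple", size=10, return_fields=None):
    """Build the path and query string for a search request."""
    params = {"q": query, "q.parser": parser, "size": size}
    if return_fields:
        params["return"] = ",".join(return_fields)
    return f"/{API_VERSION}/search?{urlencode(params)}"


batch = build_batch({"tt0076759": {"title": "Star Wars"}}, deletes=["tt0000001"])
path = build_search_path("star wars", parser="dismax", size=5)
```

Here `path` comes out as `/2013-01-01/search?q=star+wars&q.parser=dismax&size=5`, which would be appended to the domain's search endpoint host.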

Security and Management:

Amazon CloudSearch uses AWS Identity and Access Management (IAM) policies to control access to its services and to individual domains.
Domains can be monitored and scaled through the AWS Management Console, the AWS CLI, or the AWS SDKs.
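As an illustration of access control, the sketch below builds an identity-based IAM policy document that allows search and suggest requests against a single domain. It assumes the `cloudsearch:search` and `cloudsearch:suggest` action names and the `arn:aws:cloudsearch:region:account:domain/name` ARN format. The helper name, account ID, region, and domain name are placeholders.

```python
import json


def build_search_policy(account_id, region, domain_name,
                        actions=("cloudsearch:search", "cloudsearch:suggest")):
    """Return an IAM policy dict allowing the given actions on one domain."""
    resource = f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource,
            }
        ],
    }


# Placeholder account, region, and domain values for illustration only.
policy = build_search_policy("111122223333", "us-east-1", "movies")
policy_json = json.dumps(policy, indent=2)
```

Granting upload rights would mean adding a document action to the list. Keeping search and upload permissions in separate policies makes it easier to give read-only access to front-end applications.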