
Amazon CloudSearch

AWS

Amazon CloudSearch is a managed AWS search service designed for high-throughput, low-latency search.

  • Features include language-specific text processing for 34 languages, free-text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete, and user-configurable scaling and availability.

Getting Started with Amazon CloudSearch:

  • Steps to use:
    1. Create a search domain.
    2. Configure indexing options for your data.
    3. Upload data for indexing.
    4. Submit search requests via your website or application.
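
The four steps above can be sketched in Python. This is a minimal sketch, not an official recipe: the function names, the `movies` domain, and the sample document are hypothetical, and `RecordingClient` is a stand-in so the code runs without AWS credentials. In real use the two clients would be `boto3.client("cloudsearch")` and `boto3.client("cloudsearchdomain", endpoint_url=...)`, and a new domain needs time to become active before it accepts uploads.

```python
import json


class RecordingClient:
    """Stand-in for a boto3 client: records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any method name is accepted; the call is logged and an empty dict returned.
        def record(**kwargs):
            self.calls.append((name, kwargs))
            return {}
        return record


def set_up_domain(config_client, domain_name, index_fields):
    """Steps 1 and 2: create the domain and configure its index fields."""
    config_client.create_domain(DomainName=domain_name)
    for field in index_fields:
        config_client.define_index_field(DomainName=domain_name, IndexField=field)
    # Ask CloudSearch to build the index with the new configuration.
    config_client.index_documents(DomainName=domain_name)


def upload_and_search(domain_client, documents, query):
    """Steps 3 and 4: upload a JSON document batch, then submit a search."""
    batch = json.dumps(documents).encode("utf-8")
    domain_client.upload_documents(documents=batch, contentType="application/json")
    return domain_client.search(query=query, queryParser="simple")


config = RecordingClient()  # real use: boto3.client("cloudsearch")
domain = RecordingClient()  # real use: boto3.client("cloudsearchdomain", endpoint_url=...)

set_up_domain(config, "movies", [{"IndexFieldName": "title", "IndexFieldType": "text"}])
upload_and_search(
    domain,
    [{"type": "add", "id": "tt0001", "fields": {"title": "Example"}}],
    "example",
)
print([name for name, _ in config.calls + domain.calls])
```

Passing the clients in as arguments keeps the sequence of calls testable without a live domain.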

Free Trial:

  • Offers 750 free hours of fully functional search instances for 30 days.
  • To start, sign in to AWS, open the CloudSearch Console, and create and configure a search domain.

Search Instances and Domains:

  • A search domain represents a collection of data you want searchable, along with the necessary resources.
  • Search instances are server instances with allocated RAM and CPU to index data and process queries.
  • Amazon CloudSearch automatically scales the number of search instances with data volume and search traffic: it adds instances as either grows and removes them when either declines.

Scaling and Performance:

  • Auto-scaling adds index partitions as the index grows and replicas as search traffic grows, and scales both back down when they are no longer needed.
  • Handles large datasets by partitioning the index across multiple instances if needed.
  • Allows manual scaling to handle expected spikes in query traffic or data volume.

Data Capacity of Search Instances:

  • The document size and index configuration influence capacity.
  • Instance types:
    • Small: approximately 2 million documents.
    • Large: approximately 8 million documents.
    • Extra Large: approximately 16 million documents.
    • Double Extra Large: approximately 32 million documents.
  • For larger datasets, indexes are partitioned across multiple Double Extra Large instances.
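
The capacity figures above lend themselves to a quick back-of-the-envelope estimate. The sketch below is only an illustration: `estimate_layout` is a hypothetical helper, the figures are the rough per-instance guidelines listed above, and actual capacity depends on document size and index configuration. CloudSearch makes this choice itself when auto-scaling.

```python
import math

# Rough per-instance document capacities, smallest instance type first.
CAPACITY = {
    "small": 2_000_000,
    "large": 8_000_000,
    "extra_large": 16_000_000,
    "double_extra_large": 32_000_000,
}


def estimate_layout(doc_count):
    """Return (instance_type, partition_count) for a given document count."""
    # Pick the smallest instance type that holds the whole index.
    for instance_type, capacity in CAPACITY.items():
        if doc_count <= capacity:
            return instance_type, 1
    # Otherwise partition the index across the largest instance type.
    largest = "double_extra_large"
    return largest, math.ceil(doc_count / CAPACITY[largest])


print(estimate_layout(5_000_000))    # fits on one large instance
print(estimate_layout(100_000_000))  # needs several partitions
```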

Amazon CloudSearch Architecture:

  • You interact with Amazon CloudSearch through three main services:
    1. Configuration Service: Set up and configure search domains, indexing options, text analysis, availability, scaling, suggesters, and expressions.
    2. Document Service: Upload and manage searchable data formatted in JSON or XML.
    3. Search Service: Handles search and suggestion requests, returning results in JSON or XML.
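
As an illustration of the data the Document Service accepts, the sketch below builds a JSON document batch: a list of `add` and `delete` operations, each with a document `id`. The helper name `build_batch` and the sample documents are hypothetical, and the field names must match the index fields configured for your domain.

```python
import json


def build_batch(adds, delete_ids=()):
    """Build a JSON document batch from documents to add and ids to delete.

    adds: mapping of document id -> dict of field values.
    delete_ids: iterable of document ids to remove from the index.
    """
    operations = [
        {"type": "add", "id": doc_id, "fields": fields}
        for doc_id, fields in adds.items()
    ]
    operations += [{"type": "delete", "id": doc_id} for doc_id in delete_ids]
    return json.dumps(operations)


batch = build_batch(
    {"tt0001": {"title": "Example Movie", "year": 2001}},
    delete_ids=["tt0002"],
)
print(batch)
```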

Configuration Options:

  • Indexing Options: Define fields to include in the index and their configuration (e.g., searchable, facet enabled).
  • Text Analysis Schemes: Language-specific processing, stopwords, synonyms, and stemming options.
  • Availability Options: Deploy domains across multiple Availability Zones for high availability.
  • Scaling Options: Set instance types, replication count, and partition count for handling large datasets or query spikes.
  • Suggesters: Provide autocomplete suggestions based on partial input.
  • Expressions: Customize search result rankings using numeric expressions, combining relevance scores with other document factors.
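
To make the indexing options more concrete, the sketch below builds index field definitions in the shape the configuration API expects (for example, the `IndexField` argument of boto3's `define_index_field`). The helper `index_field` and the field names `title`, `genre`, and `year` are hypothetical, and only a subset of field types is mapped here.

```python
# Maps an index field type to the key that carries its per-type options.
OPTIONS_KEY = {
    "text": "TextOptions",
    "literal": "LiteralOptions",
    "int": "IntOptions",
    "double": "DoubleOptions",
    "date": "DateOptions",
    "latlon": "LatLonOptions",
}


def index_field(name, field_type, **options):
    """Build one index field definition with its type-specific options."""
    if field_type not in OPTIONS_KEY:
        raise ValueError(f"unsupported field type in this sketch: {field_type}")
    field = {"IndexFieldName": name, "IndexFieldType": field_type}
    if options:
        field[OPTIONS_KEY[field_type]] = options
    return field


fields = [
    # Full-text field that can be returned and highlighted in results.
    index_field("title", "text", ReturnEnabled=True, HighlightEnabled=True),
    # Exact-match field that is searchable and usable as a facet.
    index_field("genre", "literal", FacetEnabled=True, SearchEnabled=True),
    # Numeric field usable for facets and sorting.
    index_field("year", "int", FacetEnabled=True, SortEnabled=True),
]
print(fields)
```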

Document and Search Services:

  • Document Service: Each domain has its own document endpoint, which accepts HTTP requests to add, update, and delete searchable documents.
  • Search Service: Executes search and suggestion requests, and supports a rich query language with four query parsers (simple, structured, lucene, and dismax).
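
Search and suggestion requests are HTTP GET requests against the domain's search endpoint. The sketch below only builds the request URLs, assuming the 2013-01-01 API version. The endpoint host, the suggester name `title_suggester`, and the helper names are hypothetical placeholders for your own domain's values.

```python
from urllib.parse import urlencode

API_VERSION = "2013-01-01"


def search_url(endpoint, query, parser="simple", size=10, extra=None):
    """Build a search request URL; `extra` holds further parameters such as `return`."""
    params = {"q": query, "q.parser": parser, "size": size}
    params.update(extra or {})
    return f"https://{endpoint}/{API_VERSION}/search?{urlencode(params)}"


def suggest_url(endpoint, partial_query, suggester, size=5):
    """Build an autocomplete request URL for a configured suggester."""
    params = {"q": partial_query, "suggester": suggester, "size": size}
    return f"https://{endpoint}/{API_VERSION}/suggest?{urlencode(params)}"


# Hypothetical endpoint; use the search endpoint shown for your own domain.
ENDPOINT = "search-movies-EXAMPLE.us-east-1.cloudsearch.amazonaws.com"

print(search_url(ENDPOINT, "star wars", extra={"return": "title,year"}))
print(suggest_url(ENDPOINT, "sta", "title_suggester"))
```

Choosing the parser per request (`q.parser`) lets one application mix simple keyword queries with structured or Lucene-style queries against the same domain.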

Security and Management:

  • Uses AWS Identity and Access Management (IAM) policies to control access to services and domains.
  • Provides monitoring and scaling options through the AWS Management Console, CLI, or SDKs.
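
As an example of IAM-based access control, the sketch below builds an identity-based policy that allows only search and suggestion requests against a single domain. The helper name, account ID, region, and domain name are hypothetical. Granting upload rights would additionally require the `cloudsearch:document` action.

```python
import json


def search_only_policy(region, account_id, domain_name):
    """Return an IAM policy document allowing search and suggest on one domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["cloudsearch:search", "cloudsearch:suggest"],
                "Resource": (
                    f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain_name}"
                ),
            }
        ],
    }


# Hypothetical account and domain values.
policy = search_only_policy("us-east-1", "111122223333", "movies")
print(json.dumps(policy, indent=2))
```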
