Open Commerce Search Stack

The Documentation

Configuration

Configuration is essential in OCSS because it controls how the simple OCS API is translated into the complex Elasticsearch API.

By default, the standard Spring mechanisms are used to provide the configuration for each individual index. That standard is quite powerful, because you can enable Spring Cloud Config, which comes with many features and supported backends.

If that is not suitable for your use case (for example, because you want to integrate the OCSS configuration into your own backend), you can customize configuration retrieval by adding implementations of IndexerConfigurationProvider and SearchConfigurationProvider. They are called whenever the corresponding configuration needs to be loaded. The Javadoc describes all existing configuration options. All of those settings must be provided by the respective implementation.

This documentation uses the Spring YAML configuration to explain the different settings.

Indexer

All configuration properties are prefixed with ‘ocs’, or in YAML, nested below that key. In the following examples the prefix is shown only once, in the first example.

Connection Configuration

The connection to Elasticsearch must be part of the Spring application configuration.

ocs:
  connection-configuration:
    hosts: http://localhost:9200
    # optional if auth is necessary
    auth: "username:password"

Plugin Configuration

With ‘disabled-plugins’ and ‘prefered-plugins’ you can specify which plugins to ignore and which plugins to prefer when several plugin classes are available for the same service. All classes must be specified by their full canonical names.

  disabled-plugins:
    - my.fancy.ExperimentalPlugin
  prefered-plugins:
    "[de.cxp.ocs.spi.indexer.IndexerConfigurationProvider]": "my.fancy.IndexConfigurationProviderV2"

Default and Specific Index Configuration

All of the following index-specific configuration can be defined per index or once as ‘default-index-config’. The default configuration is only used if there is no index-specific configuration.

  default-index-config:
    ...
  index-config:
    my-index-1:
      ...
    my-other-index:
      ...
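
The fallback rule described above can be illustrated with a minimal Python sketch. It is not OCSS code: the function name `resolve_index_config` and the use of plain dictionaries for the parsed YAML are assumptions made for illustration. The default is returned as a whole and is not merged with a specific configuration.

```python
def resolve_index_config(ocs_config, index_name):
    """Return the configuration that applies to `index_name`.

    `ocs_config` stands for the parsed content below the 'ocs' key
    (hypothetical representation as nested dicts).
    """
    specific = ocs_config.get("index-config", {}).get(index_name)
    if specific is not None:
        # an index-specific configuration replaces the default completely
        return specific
    # only used if no index-specific configuration exists
    return ocs_config.get("default-index-config", {})
```

With a configuration that defines `my-index-1` explicitly, a lookup for `my-index-1` returns that block, while a lookup for any other index name returns the content of `default-index-config`.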

Data Processor Configuration

With the data-processor-configuration you can list the data processors that should be used to transform data. These can be standard processors shipped with OCSS or custom data processors.

For data processors that expect some configuration, it can be specified as a key-value map below a key with the processor’s class name. See the Javadoc of each data processor for the configuration details.

    data-processor-configuration:
      processors:
        # some data processors don't need any configuration
        - ExtractCategoryLevelDataProcessor
        - FlagFieldDataProcessor
      configuration:
        # for a data processor's configuration,
        # specify the full canonical name as config key and
        # below it the required settings as a key-value map
        "[de.cxp.ocs.preprocessor.impl.FlagFieldDataProcessor]":
          group_1_brand: "Fancy Brand"
          group_1_brand_match: 1
          group_1_noMatch: 0
          group_1_destination: "myBrand"

Index Settings

These settings are applied to the corresponding Elasticsearch index after the indexing process. They configure how the index data is replicated and how quickly data updates become visible. See the Elasticsearch docs for details.

    index-settings:
      replica-count: 2
      refresh-interval: 10s

Field Configuration

The indexer needs to know which data fields should be indexed and how. Learn more about it in the Indexer docs.

This config is split into two parts: the specific fields and the dynamic fields. The specific fields map to an explicit list of data source fields, whereas the dynamic fields can use wildcard matching or type matching to map a certain data field. You can think of dynamic fields as a kind of template, because internally each match produces a specific field configuration.

Dynamic fields are checked in the order they are defined and only in case no specific field was found.

    field-configuration:
      # explicit fields must have a unique field name,
      # that's why they are defined in a map
      # Unfortunately the name needs to be defined twice
      # but this is also validated internally
      fields:
        myfield:
          name: ...
          type: ...
          ...
        myfield2:
          ...
      # this is a list of fields, where the order
      # of the defined fields matters.
      dynamic-fields:
        - name: ...
          type: ...
          ...

Each single field has the following properties:

Have a look at the “preset configuration” for a full example of the index configuration.
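
The lookup order between specific and dynamic fields can be sketched in Python as follows. This is only an illustration of the precedence rule, not the OCSS implementation. It assumes that specific fields are found by their map key and that a dynamic field's `name` is a glob-style wildcard pattern; the real matching syntax (and type matching) may differ. The type values used in the usage note are placeholders.

```python
from fnmatch import fnmatchcase


def resolve_field(field_configuration, source_field):
    """Return the field configuration that applies to `source_field`, or None."""
    # 1. explicitly configured fields always win
    fields = field_configuration.get("fields", {})
    if source_field in fields:
        return fields[source_field]
    # 2. dynamic fields are tried in the order they are defined
    for template in field_configuration.get("dynamic-fields", []):
        # assumption: 'name' holds a glob pattern such as "price_*"
        if fnmatchcase(source_field, template["name"]):
            # each match produces a concrete field configuration
            return {**template, "name": source_field}
    return None
```

Given dynamic fields `price_*` followed by `*`, a data field `price_eur` is resolved by the first template, and any other unknown field falls through to the catch-all template. Swapping the two entries would make the catch-all template win for every field, which is why the order of the list matters.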

Search Service

To understand the search service configuration, it is recommended to read the Configuration paragraph of the Search Service docs first; that information is not duplicated here.

Only the configuration “language” is documented here, not every detail of the resulting behaviour.

As an example we use the Spring YAML configuration style to describe the settings. There, all settings are prefixed with ocs. See the preset configuration for a full example.

Connection and Plugin Configuration

These settings are identical to the ones for the Indexer service.

Default and Specific Tenant Configuration

It is possible to have a default configuration that is used for all tenants for which no specific configuration exists.

Additionally, a tenant’s search configuration can reference parts of the default configuration. This way you can reuse certain configuration blocks.

ocs:
  default-tenant-config:
    plugin-configuration:
      ...
    query-processing:
      ...
    query-configuration:
      ...
    scoring-configuration:
      ...
    rescorers:
      ...
    facet-configuration:
      ...
    sort-configuration:
      ...
  tenant-config:
    my-tenant:
      # Similar to the structure of 'default-tenant-config'.
      # Additionally, the following options are possible:
      # These boolean properties can be used to reuse the corresponding
      # default configuration instead of defining the same config again
      use-default-query-config: [true|false]
      use-default-scoring-config: [true|false]
      use-default-facet-config: [true|false]
      use-default-sort-config: [true|false]
      # There can be different tenant configurations for the same index
      index-name: "my-index"
      ...
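
How the `use-default-*` switches could combine a tenant configuration with the default one is sketched below in Python. This is an illustration only: the function name, the dictionary representation and the mapping from each flag to a section name are assumptions derived from the key names shown above, not taken from the OCSS sources.

```python
# assumption: each flag refers to the section with the matching name
SECTION_FLAGS = {
    "use-default-query-config": "query-configuration",
    "use-default-scoring-config": "scoring-configuration",
    "use-default-facet-config": "facet-configuration",
    "use-default-sort-config": "sort-configuration",
}


def resolve_tenant_config(ocs_config, tenant):
    """Return the effective search configuration for `tenant`."""
    default = ocs_config.get("default-tenant-config", {})
    specific = ocs_config.get("tenant-config", {}).get(tenant)
    if specific is None:
        # no tenant-specific configuration: the default applies as a whole
        return dict(default)
    # start with the tenant's own settings, without the switch flags
    resolved = {k: v for k, v in specific.items() if k not in SECTION_FLAGS}
    for flag, section in SECTION_FLAGS.items():
        if specific.get(flag) and section in default:
            # reuse the block from the default configuration
            resolved[section] = default[section]
    return resolved
```

A tenant that sets `use-default-query-config: true` and defines its own `facet-configuration` would end up with the default query configuration and its own facet configuration.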

Plugin Configuration

The plugin configuration is a data map that contains custom key-value data for activated search plugins. Plugins are always activated in their respective context, e.g. in the “query-processing” config.

As a key the full canonical class-name of the plugin must be used. The expected data depends on the plugin’s implementation.

      plugin-configuration:
        "[my.example.FancyCustomization]":
          "key1": "value1"
          "key2": "value2"
        # real example:
        "[de.cxp.ocs.elasticsearch.query.analyzer.QuerqyQueryExpander]":
          "common_rules_url": "rules/querqy_rules.my_index.txt"

Query Processing

This section configures how a user query is processed before the Elasticsearch query is built.

      query-processing:
        user-query-preprocessors:
          - "my.example.FancyCustomization"
        user-query-analyzer: "de.cxp.ocs.elasticsearch.query.analyzer.QuerqyQueryExpander"

Query Configuration

This part of the configuration is an ordered map of one or more named query configuration objects. They configure the Query Relaxation logic.

Each query configuration consists of a name, a strategy, a condition, the weighted-fields that are searched, and some strategy specific settings:

    query-configuration:
      <name>:
        strategy: "<strategy-name>"
        condition: 
          matchingRegex: "<regex>"
          maxTermCount: <int>
        weightedFields:
          "<field-name>": <float>
          ...
        settings:
          ...
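
To make the role of the `condition` block more concrete, here is a small Python sketch. The exact semantics are documented with the Search Service, so everything here is an assumption: that `matchingRegex` must match the whole user query, that `maxTermCount` counts whitespace-separated terms, and that a missing condition accepts every query. The function names are hypothetical.

```python
import re


def condition_matches(condition, user_query):
    """Check a query configuration's 'condition' block against a user query."""
    regex = condition.get("matchingRegex")
    # assumption: the regular expression has to match the complete query
    if regex is not None and re.fullmatch(regex, user_query) is None:
        return False
    max_terms = condition.get("maxTermCount")
    # assumption: terms are separated by whitespace
    if max_terms is not None and len(user_query.split()) > max_terms:
        return False
    return True


def applicable_queries(query_configuration, user_query):
    """Names of the configured queries that accept `user_query`, in configured order."""
    return [
        name
        for name, cfg in query_configuration.items()
        if condition_matches(cfg.get("condition") or {}, user_query)
    ]
```

Because the query configuration is an ordered map, the returned names keep the order in which the relaxation steps were configured.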

Scoring Configuration

With this configuration you can influence the final scores of the result hits. It maps directly onto the Elasticsearch Function Score Query, so see the Elasticsearch docs for details about the configured behaviour.

Options:

    scoring:
      boost-mode: [multiply|avg|sum|min|max|replace]
      score-mode: [multiply|avg|sum|min|max|first]
      score-functions:
        - field: "<field-name>"
          type: [weight|random_score|field_value_factor|script_score|decay_gauss|decay_linear|decay_exp]
          weight: <float>
          options:
            "<key>": "<value>"
        - ... 
          

Each function must have at least a type property and depending on that one or more of the following properties:

Rescorers

A list of canonical class names of custom implementations of the de.cxp.ocs.spi.search.RescorerProvider interface.

This customization allows the use of the Query Rescorer API.

      rescorers:
        - "my.custom.LearningToRankRescorerProvider"
        - "my.custom.FancyRescorer"

Facet Configuration

With the facet configuration you can control how facets are generated. It contains the following properties:

    facet-configuration:
      max-facets: <int>
      default-facet-configuration:
        type: [term|hierarchical|interval|range|ignore|<custom>]
        order: <int>
        value-order: [COUNT|ALPHANUM_ASC|ALPHANUM_DESC]
        optimal-value-count: <int>
        exclude-from-facet-limit: <boolean>
        show-unselected-options: <boolean>
        multi-select: <boolean>
        prefer-variant-on-filter: <boolean>
        min-facet-coverage: <double>
        min-value-count: <int>
        filter-dependencies:
          - <url-parameters>
        meta-data:
          "<key>": "<value>"
      facets:
      - source-field: "<field-name>"
        label: "<label>"
        type: [term|hierarchical|interval|range|ignore|<custom>]
        order: <int>
        value-order: [COUNT|ALPHANUM_ASC|ALPHANUM_DESC]
        optimal-value-count: <int>
        exclude-from-facet-limit: <boolean>
        show-unselected-options: <boolean>
        multi-select: <boolean>
        prefer-variant-on-filter: <boolean>
        min-facet-coverage: <double>
        min-value-count: <int>
        meta-data:
          "<key>": "<value>"
      - ...

Each individual facet config may contain the following properties:

Special configuration

Mandatory Facet: If both options exclude-from-facet-limit: true and min-facet-coverage: 0 are set, a facet is considered mandatory and OCS tries to retrieve the necessary facet values, even if they are very sparse and would otherwise not show up as a facet.
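
The mandatory-facet rule can be expressed as a simple check, sketched here in Python for illustration. The function names and the dictionary representation of a facet entry are assumptions; only the two property names come from the configuration above.

```python
def is_mandatory_facet(facet):
    """True if both options that mark a facet as mandatory are set."""
    return (
        facet.get("exclude-from-facet-limit") is True
        and facet.get("min-facet-coverage") == 0
    )


def mandatory_source_fields(facet_configuration):
    """Source fields of all facets configured as mandatory."""
    return [
        facet["source-field"]
        for facet in facet_configuration.get("facets", [])
        if is_mandatory_facet(facet)
    ]
```

Setting only one of the two options is not enough: a facet that is excluded from the facet limit but keeps a positive minimum coverage is still dropped when its values are too sparse.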

Sort Configuration

This section controls which sorting options appear in the response and how they behave. If this part of the configuration is missing, the sorting options are generated from all fields that are indexed for sorting, with the label being fieldName.order.

Each sorting option configuration supports the following properties:

Example:

      sort-configuration:
        - label: "Cheapest first"
          field: price
          order: ASC
          # consider documents without price to be sorted to the end
          missing: 1000

Misc Configuration

Some more stand-alone setting options on tenant level:

Suggest Service

Similar to the other configuration providers, there is also a SuggestConfigProvider interface that can be implemented and added to the suggest-service classpath. It allows different suggest configurations per index.

The configuration can also be provided through Java system properties or by placing a suggest.properties file on the classpath; the Suggest Service will load them accordingly.

For missing system properties the Suggest Service tries to look up an environment variable in which each dot . and dash - is replaced by an underscore _ and all letters are uppercase. (Example: if the property suggest.index-folder is undefined, the SUGGEST_INDEX_FOLDER environment variable is looked up.)
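
The name mapping for the environment variable fallback can be written down in a few lines. The following Python sketch only illustrates the rule stated above; the function names `env_name` and `lookup` are made up for this example and the Suggest Service itself is implemented differently.

```python
import os


def env_name(property_name):
    """Derive the environment variable name for a property name."""
    # dots and dashes become underscores, all letters become uppercase
    return property_name.replace(".", "_").replace("-", "_").upper()


def lookup(property_name, system_properties, environ=os.environ):
    """Return the system property, or the environment variable as fallback."""
    value = system_properties.get(property_name)
    if value is None:
        value = environ.get(env_name(property_name))
    return value
```

For example, `env_name("suggest.update-rate")` yields `SUGGEST_UPDATE_RATE`. Note that a dotted and a dashed spelling of the same property end up with the same environment variable name.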

Default Config

For simplicity, and to provide a usable blueprint, the properties are presented as a properties file, with all explanations as comments and all default values already set.

Legacy Support: In previous versions of the suggest-service, some properties had different names, for example suggest.preload.indexes or suggester.max.idle.minutes. Those names are still supported by internal fallback mappers, even though they are no longer documented. This means it is possible to end up with conflicting property names; in such cases the properties documented here are always preferred over the legacy ones.

# global setting for the service
# server listening settings
suggest.server.port=8080
suggest.server.address=0.0.0.0

# global setting for all indexes
# how often (in seconds) the data providers are asked if they have new data
suggest.update-rate=60

# Normally the data for an index is loaded when the first request comes in.
# With this setting, you can name the indexes that should be loaded directly at the start.
# Values should be comma-separated - index names MUST NOT contain commas.
# Example: suggest.preload-indexes=myindex1,myindex2
# (global setting since only read initially)
# 
#suggest.preload-indexes=

# Specify where lucene puts the indexes. If not specified, the temporary 
# directory will be used.
# (global setting)
#
#suggest.index-folder=

# If several suggest-data-providers are used, they are indexed into separate indexes by default. This option
# activates a merging logic, so that all provided data is merged into one index.
#
# This could reduce load and improve performance since a single Lucene suggester is asked for results.
# However, in such a case the weights should be in a similar range to ensure a proper ranking.
#
# Default: false
#
#suggest.data-source-merger=false

# If this property is set, it will be used to extract the payload value with
# this key and group the suggestions accordingly.
# It's recommended to specify 'suggest.group.share-conf' or
# 'suggest.group.cutoff-conf' as well, otherwise the default limiter will
# be used after grouping.
#
#suggest.group.key=

# Depends on a configured `suggest.group.key` property
# This property changes how the result list is truncated (limited).
# Expects the property in the format 'group1=0.x,group2=0.x' to be used as 
# group-share configuration for the 'ConfigurableShareLimiter'
# See the [java doc](javadoc.html#apidocs/de/cxp/ocs/smartsuggest/limiter/ConfigurableShareLimiter.html)
# for more details.
# Basically these values configure, which group of suggestions should get which
# share in the result (e.g. keywords=0.5 (50%), brand=0.3 (30%), category=0.2 (20%)).
#
# The ConfigurableShareLimiter also reads environment variables. They can
# also be configured here directly, but all in upper case, like this:
# SUGGEST_GROUP_SHARE_BRAND=
#
#suggest.group.share-conf=

# Depends on a configured `suggest.group.key` property
# Expects the property to be specified in the format 'group1=N,group2=M'
# with the group names that exist in your suggestion data and integer values.
# The values are considered as absolute limits.
#
#suggest.group.cutoff-conf=

# If grouping and limiting is configured by a key that comes from a single or merged data-provider, then this value
# can be used to increase the internal amount of fetched suggestions.
# This is useful to increase the likelihood of getting the desired group counts.
#
# Default: 1
#
#suggest.group.prefetch-limit-factor=1

# If this property is set, the returned values will be deduplicated. As a value
# a comma separated list of the group-values can be specified. It's used as
# a priority order: suggestions of the groups defined first will be
# preferred over suggestions from other groups. Example: a value
# "brand,keyword" will be used to remove a keyword suggestion if there is
# a similar brand suggestion. Comparison is done on normalized values
# (lowercase+trim). Defining the property without a value will enable
# deduplication, but without any prioritization.
#
#suggest.group.deduplication-order=

# Optional path prefix for the '/health' and '/metrics' endpoint.
#suggest.service.mgmt-path-prefix=

# Optional Limit for the amount of queries that should be injected by a full query match. 
# Such sharpened queries must be provided by at least one of the used Suggest-Data-Providers
#suggest.max-sharpened-queries=12

# If a suggest index is not requested for this number of minutes, it will be unloaded.
# A new request to that index will return an empty list, but restart the loading
# of that index.
# (global setting)
suggest.service.max-idle-minutes=30
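
The behaviour described for `suggest.group.deduplication-order` can be sketched in Python as follows. This is an illustration under assumptions, not the service's implementation: suggestions are modelled as `(text, group)` pairs in ranking order, "similar" is taken to mean equal after lowercase and trim, and a preferred duplicate takes over the position of the entry it replaces. The function name is hypothetical.

```python
def deduplicate(suggestions, group_order=()):
    """Remove duplicate suggestions, preferring groups listed first in `group_order`."""
    priority = {group: rank for rank, group in enumerate(group_order)}
    lowest = len(priority)  # rank for groups that are not listed
    position = {}           # normalized text -> index in `result`
    result = []
    for text, group in suggestions:
        key = text.strip().lower()  # normalization: trim + lowercase
        if key not in position:
            position[key] = len(result)
            result.append((text, group))
            continue
        kept_group = result[position[key]][1]
        if priority.get(group, lowest) < priority.get(kept_group, lowest):
            # the duplicate belongs to a preferred group: it replaces the kept one
            result[position[key]] = (text, group)
    return result
```

With an empty `group_order` (the property defined without a value), duplicates are still removed, but the first occurrence always wins because no group has priority over another.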

Index specific settings

Apart from the properties documented as “global” and the ones starting with “suggest.service”, all properties can be set for an individual index by putting the index name directly after the suggest. prefix.

Example:

With the properties suggest.group.prefetch-limit-factor=1 and suggest.my-special-index.group.prefetch-limit-factor=3, all indexes will use the configured default prefetch limit factor of 1; only the index with the name “my-special-index” will use a factor of 3.
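
The lookup behind this example can be sketched in a few lines of Python. The function name `index_property` and the plain dictionary of properties are assumptions for illustration; properties marked as global or starting with `suggest.service` are not meant to be resolved this way.

```python
def index_property(properties, index_name, key):
    """Resolve a property for one index.

    `key` is the property name without the 'suggest.' prefix,
    e.g. 'group.prefetch-limit-factor'.
    """
    specific = properties.get(f"suggest.{index_name}.{key}")
    if specific is not None:
        # an index-specific value overrides the general one
        return specific
    return properties.get(f"suggest.{key}")
```

Applied to the two properties from the example, the lookup for “my-special-index” returns 3 and the lookup for any other index returns 1.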
