
Open Commerce Search Stack

The Documentation



Quick Start Demo

This guide gives you a short overview of how to start OCSS locally and index sample data from Kaggle. This can be useful, for example, for frontend developers who want to develop their frontend against the search API. This guide is NOT about how to configure OCSS in the best possible way.

Prerequisites

To follow this guide you need docker and docker-compose to run the stack, curl and jq for the helper script and the API examples, and the sample dataset (reddit_vm.csv) from Kaggle.

Start the stack

Before we can index any data into OCSS we have to start the stack. For this purpose there is a docker-compose folder in the operations folder of this repository. It contains the files needed to start the stack locally, among them the docker-compose.yml and the configuration files for the services (such as the application.indexer-service.yml used later in this guide).

The docker-compose.yml defines 3 services:

indexer

The indexer service transforms the data and indexes it into Elasticsearch. This is where, for example, the right master/child structure is created out of the product data. The service provides REST endpoints for a normal indexing process: you can create an index, add product data, and set the indexing state to done through a simple API.
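To give a feeling for that flow, a full import against the indexer (exposed on port 8535 in the setup below) could look roughly like the following three calls: open an import session, add documents, and mark the import as done. The endpoint paths and payload files shown here are only placeholders for illustration; please check the indexer's API documentation for the actual routes and request bodies:

# curl -X POST "http://localhost:8535/indexer-api/v1/full/start/my-index"
# curl -X POST -H "Content-Type: application/json" -d @products.json "http://localhost:8535/indexer-api/v1/full/add"
# curl -X POST -H "Content-Type: application/json" -d @import-session.json "http://localhost:8535/indexer-api/v1/full/done"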

searcher

The search service performs the search, filtering, etc. on top of the search engine. In this repository this is done on top of Elasticsearch as the reference implementation.

elasticsearch

Elasticsearch in this case is the search engine we use for our reference implementation of the OCSS search and indexing API.

To start the stack, change your working directory to the one containing the docker-compose.yml and run the following command:

# docker-compose up -d
Starting ocs_elasticsearch ... done
Starting ocs_searcher      ... done
Starting ocs_indexer       ... done

This command starts up all services defined in the docker-compose.yml. To confirm that all three services are up and running you can use the docker ps command:

# docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                              NAMES
e367877eb384        commerceexperts/ocs-indexer-service:latest            "java -server -cp /a…"   4 minutes ago       Up 40 seconds       0.0.0.0:8535->8535/tcp             ocs_indexer
ebaa75aa2a20        commerceexperts/ocs-search-service:latest             "java -server -cp /a…"   4 minutes ago       Up 39 seconds       0.0.0.0:8534->8534/tcp             ocs_searcher
433e52146c9c        docker.elastic.co/elasticsearch/elasticsearch:7.9.2   "/tini -- /usr/local…"   4 minutes ago       Up 41 seconds       0.0.0.0:9200->9200/tcp, 9300/tcp   ocs_elasticsearch

As you can see, there are now three ports exposed on localhost: 8535 (indexer), 8534 (searcher) and 9200 (elasticsearch).
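If you want to make sure the services really respond, a quick request against Elasticsearch is the easiest check; it answers with its cluster and version information:

# curl http://localhost:9200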


Index data

For indexing data from a CSV file there is a helper script ocs-index-data.sh. After we have downloaded the sample dataset from Kaggle, we have to unpack the .zip file to get the reddit_vm.csv. Now let us inspect the CSV file to get an overview of the data. In the first step the field separator, the quote character, the number of header rows and the number of the ID column (counting from 0) are interesting for us. To get this information, just print the first two lines of the CSV file:

# head -n 2 reddit_vm.csv
title,score,id,url,comms_num,created,body,timestamp
Health Canada approves AstraZeneca COVID-19 vaccine,7,lt74vw,https://www.canadaforums.ca/2021/02/health-canada-approves-astrazeneca.html,0,1614400425.0,,2021-02-27 06:33:45

The sample dataset contains a collection of vaccine myths posted on the subreddit r/VaccineMyths.

From this output we can read off the information we need: the fields are separated by commas, no quote character is used, there is one header row, and the id is in column 2 (counting from 0).

Let’s call the script and see what we get (for more options, call the script with -h to get an overview):

# ./ocs-index-data.sh -f /tmp/reddit_vm.csv -s "," -q "" -k 1 -i 2 -n quick-start -l en
ERROR: no mapping defined. These are the columns of the given file:
     0	title
     1	score
     2	id
     3	url
     4	comms_num
     5	created
     6	body
     7	timestamp

The script now tells us that we have to describe how the individual columns of the CSV file should be mapped to the search configuration. This is done the jq way; to make this easier for us, the script lists the column numbers of the headers. As we don’t have a field configuration yet, we have to create one. As described above, this can be done in the application.indexer-service.yml. The following field configuration (the complete application.indexer-service.yml) would fit our data feed:

logging:
  level:
    de.cxp.ocs: DEBUG

ocs:
  index-config:
    quick-start:
      field-configuration:
        fields:
          id:
            name: id
            type: id
            usage:
              - Result
          title:
            name: title
            type: string
            usage:
              - Search
              - Result
              - Sort
          score:
            name: score
            type: number
            usage:
              - Facet
              - Result
              - Sort
          url:
            name: url
            type: string
            usage:
              - Result
          number_of_comments:
            name: number_of_comments
            type: string
            source-names:
              - comms_num
            usage:
              - Result
              - Facet
              - Sort
          created:
            name: created
            type: number
            usage:
              - Result
              - Sort
          comment:
            name: comment
            source-names:
              - body 
            usage:
              - Search
              - Result
          timestamp:
            name: timestamp
            usage:
              - Result

Please note that whenever you change the field configuration you have to restart the indexer service. This can be done with the following command in the directory containing the docker-compose.yml:

# docker-compose restart indexer
Restarting ocs_indexer ... done
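If you want to verify that the indexer came back up with the new configuration, you can follow its logs after the restart (optional; stop following with Ctrl+C):

# docker-compose logs -f indexer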

Once we have the field configuration, the column mapping for ocs-index-data.sh is clear; in our case the value for the -m parameter would be:

'{title:.[0],score:.[1],url:.[3],comms_num:.[4],created:.[5],body:.[6],timestamp:.[7]}'
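If you want to see what such a mapping produces, you can apply it to a single data line yourself with jq. The following one-liner is only a simplified sketch: it splits the line naively on commas (so it would break on quoted fields containing commas), while the helper script may handle quoting differently:

# head -n 2 reddit_vm.csv | tail -n 1 | jq -R 'split(",") | {title:.[0],score:.[1],url:.[3],comms_num:.[4],created:.[5],body:.[6],timestamp:.[7]}'
{
  "title": "Health Canada approves AstraZeneca COVID-19 vaccine",
  "score": "7",
  "url": "https://www.canadaforums.ca/2021/02/health-canada-approves-astrazeneca.html",
  "comms_num": "0",
  "created": "1614400425.0",
  "body": "",
  "timestamp": "2021-02-27 06:33:45"
}

Note that column 2 (the id) is intentionally missing from the mapping, because it is already passed to the script via the -i 2 parameter.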

Okay, so let’s run the script again and index the data (the indexing process can take a while):

# ./ocs-index-data.sh -f /tmp/reddit_vm.csv -s "," -q "" -k 1 -i 2 -m '{title:.[0],score:.[1],url:.[3],comms_num:.[4],created:.[5],body:.[6],timestamp:.[7]}' -n quick-start -l en -v

After the indexing process is done, check Elasticsearch to see whether the index is there:

# curl http://localhost:9200/_cat/indices                       
green open ocs-2-quick-start-en mH0OHNsVS5K4L3yvp_gqaA 1 0 3937 2256 1.9mb 1.9mb

As we can see, the index was created and the data was indexed successfully.
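To double-check how many documents ended up in the new index, you can also ask Elasticsearch's count API directly:

# curl http://localhost:9200/ocs-2-quick-start-en/_count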


Use search-api

Now that we have created an index, it’s time to try a search against the search API.

# curl "http://localhost:8534/search-api/v1/search/quick-start?q=beer" | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10288    0 10288    0     0   401k      0 --:--:-- --:--:-- --:--:--  401k
{
  "tookInMillis": 19,
  "inputURI": "",
  "slices": [
    {
      "label": null,
      "matchCount": 1323,
      "nextOffset": 12,
      "nextLink": null,
      "resultLink": "",
      "hits": [
        {
          "index": "ocs-2-quick-start-en",
          "document": {
            "id": "lt67lb",
            "data": {
              "score": "1",
              "search_combi": "Beer after corona vaccination",
              "created": "1614397967.0",
              "comment": "\"Hello hello people",
              "title": "Beer after corona vaccination",
              "number_of_comments": "0",
              "url": "https://www.reddit.com/r/VaccineMyths/comments/lt67lb/beer_after_corona_vaccination/"
            },
            "attributes": null,
            "categories": null
          },
          "matchedQueries": [
            "_match_all"
          ]
          ...
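Since the full response is quite verbose, jq is also handy for picking out just the parts you are interested in, for example the titles of the returned hits (based on the response structure shown above):

# curl -s "http://localhost:8534/search-api/v1/search/quick-start?q=beer" | jq '.slices[0].hits[].document.data.title'
"Beer after corona vaccination"
...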
