---
products:
- Alauda Application Services
kind:
- Solution
---

# How to Install the IK Analyzer Plugin for OpenSearch Using opensearch-operator

:::info
Applicable Version: OpenSearch Operator ~= 2.8.x, OpenSearch ~= 2.19.3 / 3.3.1
:::

This document explains how to deploy an OpenSearch cluster with the [IK Analyzer](https://github.com/infinilabs/analysis-ik) plugin pre-installed using the opensearch-operator. The IK Analyzer is the most widely used Chinese text analysis plugin for OpenSearch/Elasticsearch, providing smart and maximum-granularity tokenization for Chinese text.

## How Plugin Installation Works

The opensearch-operator installs plugins by passing each entry in `pluginsList` to the `opensearch-plugin install` command during node startup. You need to configure `pluginsList` in two places:

| Field | Purpose |
| :--- | :--- |
| `spec.general.pluginsList` | Installs the plugin on all OpenSearch data/master nodes |
| `spec.bootstrap.pluginsList` | Installs the plugin on the bootstrap pod used for initial cluster formation |

Both must be configured. If the bootstrap pod is missing the plugin while `additionalConfig` references it, cluster initialization may fail.

:::note
Adding or modifying `pluginsList` on a running cluster will trigger a **rolling restart** of all nodes to install the new plugin.
:::
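
As a minimal sketch, the two fields sit side by side in the `OpenSearchCluster` CR (the `<plugin-zip-url>` placeholder stands for the version-specific URL listed below; full working manifests follow later in this document):

```yaml
spec:
  general:
    pluginsList:
      - "<plugin-zip-url>"   # installed on every node in all node pools
  bootstrap:
    pluginsList:
      - "<plugin-zip-url>"   # installed on the bootstrap pod as well
```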

## IK Analyzer Plugin Download URLs

| OpenSearch Version | Plugin Download URL |
| :--- | :--- |
| **2.19.3** | `https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip` |
| **3.3.1** | `https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-3.3.1.zip` |

:::note
Before applying, verify that the plugin URL for your OpenSearch version is available. Check the [Infinilabs releases page](https://github.com/infinilabs/analysis-ik/releases) to confirm the file exists. If the URL returns a 404, the cluster will fail to start.
:::

:::warning Air-Gapped Environments
If your Kubernetes cluster does not have external network access, download the plugin zip files first and host them on an internal HTTP server (e.g., Nexus, Artifactory, or Nginx). Then replace the download URLs in the configurations below with your internally accessible URLs.
:::

## Deploy OpenSearch with IK Analyzer

### For OpenSearch 2.19.3

```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: <namespace>
spec:
  general:
    serviceName: my-cluster
    version: 2.19.3
    setVMMaxMapCount: true
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip"
  bootstrap:
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip"
  security:
    tls:
      transport:
        generate: true
        perNode: true
      http:
        generate: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      persistence:
        pvc:
          storageClass: "<your-storage-class>"
          accessModes:
            - ReadWriteOnce
      roles:
        - "cluster_manager"
        - "data"
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
  dashboards:
    enable: true
    version: 2.19.3
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
```

### For OpenSearch 3.3.1

```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: <namespace>
spec:
  general:
    serviceName: my-cluster
    version: 3.3.1
    setVMMaxMapCount: true
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-3.3.1.zip"
  bootstrap:
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-3.3.1.zip"
  security:
    tls:
      transport:
        generate: true
        perNode: true
      http:
        generate: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      persistence:
        pvc:
          storageClass: "<your-storage-class>"
          accessModes:
            - ReadWriteOnce
      roles:
        - "cluster_manager"
        - "data"
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
  dashboards:
    enable: true
    version: 3.3.0 # Dashboards 3.3.0 is the latest release compatible with OpenSearch 3.3.1
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
```

Apply the configuration:

```bash
kubectl apply -f cluster.yaml
```

## Verify the Plugin is Installed

After the cluster is running, verify the IK plugin is installed on a node:

```bash
kubectl -n <namespace> exec my-cluster-masters-0 -- bin/opensearch-plugin list
```

The output should include `analysis-ik`.

## Test IK Analyzer

Port-forward the OpenSearch service and run a quick tokenization test:

```bash
kubectl -n <namespace> port-forward svc/my-cluster 9200
```

**Test `ik_max_word` analyzer** (maximum granularity, splits text into all possible tokens):

```bash
# The operator generates a self-signed cert; -k skips local certificate validation
curl -k -u admin:admin -X POST "https://localhost:9200/_analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "analyzer": "ik_max_word",
    "text": "自然语言处理技术在人工智能领域的应用越来越广泛"
  }'
```

Expected output:

```json
{
  "tokens": [
    { "token": "自然语言", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "自然", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "语言", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 },
    { "token": "处理", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3 },
    { "token": "技术", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 4 },
    { "token": "在", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR", "position": 5 },
    { "token": "人工智能", "start_offset": 9, "end_offset": 13, "type": "CN_WORD", "position": 6 },
    { "token": "人工", "start_offset": 9, "end_offset": 11, "type": "CN_WORD", "position": 7 },
    { "token": "智能", "start_offset": 11, "end_offset": 13, "type": "CN_WORD", "position": 8 },
    { "token": "领域", "start_offset": 13, "end_offset": 15, "type": "CN_WORD", "position": 9 },
    { "token": "的", "start_offset": 15, "end_offset": 16, "type": "CN_CHAR", "position": 10 },
    { "token": "应用", "start_offset": 16, "end_offset": 18, "type": "CN_WORD", "position": 11 },
    { "token": "越来越", "start_offset": 18, "end_offset": 21, "type": "CN_WORD", "position": 12 },
    { "token": "越来", "start_offset": 18, "end_offset": 20, "type": "CN_WORD", "position": 13 },
    { "token": "越", "start_offset": 20, "end_offset": 21, "type": "CN_CHAR", "position": 14 },
    { "token": "广泛", "start_offset": 21, "end_offset": 23, "type": "CN_WORD", "position": 15 }
  ]
}
```

**Test `ik_smart` analyzer** (coarse-grained, splits into the fewest tokens):

```bash
# The operator generates a self-signed cert; -k skips local certificate validation
curl -k -u admin:admin -X POST "https://localhost:9200/_analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "analyzer": "ik_smart",
    "text": "自然语言处理技术在人工智能领域的应用越来越广泛"
  }'
```

Expected output:

```json
{
  "tokens": [
    { "token": "自然语言", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "处理", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 1 },
    { "token": "技术", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 2 },
    { "token": "在", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR", "position": 3 },
    { "token": "人工智能", "start_offset": 9, "end_offset": 13, "type": "CN_WORD", "position": 4 },
    { "token": "领域", "start_offset": 13, "end_offset": 15, "type": "CN_WORD", "position": 5 },
    { "token": "的", "start_offset": 15, "end_offset": 16, "type": "CN_CHAR", "position": 6 },
    { "token": "应用", "start_offset": 16, "end_offset": 18, "type": "CN_WORD", "position": 7 },
    { "token": "越来越", "start_offset": 18, "end_offset": 21, "type": "CN_WORD", "position": 8 },
    { "token": "广泛", "start_offset": 21, "end_offset": 23, "type": "CN_WORD", "position": 9 }
  ]
}
```

## Use IK Analyzer in an Index Mapping

When creating an index, you can reference `ik_max_word` or `ik_smart` directly as the analyzer for Chinese text fields. If you need to combine IK tokenization with additional filters, wrap the `ik_max_word` or `ik_smart` tokenizer in a `custom` analyzer under `settings.analysis`:

```bash
curl -k -u admin:admin -X PUT "https://localhost:9200/my-index" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "analysis": {
        "analyzer": {
          "ik_max_word_analyzer": {
            "type": "custom",
            "tokenizer": "ik_max_word"
          },
          "ik_smart_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart"
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }'
```

:::note
Using `ik_max_word` for indexing and `ik_smart` for search is a common pattern: it maximizes recall at index time while keeping search queries precise.
:::
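
This relationship is visible in the two sample outputs above: every coarse `ik_smart` token also appears in the finer `ik_max_word` token list, so a field indexed with `ik_max_word` has a posting for any token an `ik_smart`-analyzed query can produce. A small Python check over the tokens shown above (copied verbatim from the expected outputs) illustrates this:

```python
# Tokens copied from the two expected _analyze outputs above.
ik_max_word = ["自然语言", "自然", "语言", "处理", "技术", "在", "人工智能",
               "人工", "智能", "领域", "的", "应用", "越来越", "越来", "越", "广泛"]
ik_smart = ["自然语言", "处理", "技术", "在", "人工智能", "领域",
            "的", "应用", "越来越", "广泛"]

# ik_smart tokens missing from the ik_max_word output -- empty set means
# every possible ik_smart query token is indexed by ik_max_word.
missing = set(ik_smart) - set(ik_max_word)
print(sorted(missing))  # → []
```

Note this holds for this sample sentence; it is the general design intent of the two modes rather than a formal guarantee for every input.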

## (Optional) Mount a Custom Dictionary

The IK Analyzer supports custom word dictionaries and stop-word lists via `IKAnalyzer.cfg.xml`. To mount a custom dictionary into the cluster, use `additionalVolumes` with a ConfigMap.

### Step 1: Create the ConfigMap

Prepare your custom dictionary files and create a ConfigMap. The following example adds a custom word list:

```bash
# custom_dict.dic — one word per line
cat > custom_dict.dic << 'EOF'
云原生
容器编排
服务网格
EOF

# IKAnalyzer.cfg.xml — reference the custom dictionary
cat > IKAnalyzer.cfg.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer Extended Configuration</comment>
  <!-- Custom extended dictionary; separate multiple files with ; -->
  <entry key="ext_dict">custom_dict.dic</entry>
  <!-- Custom stop-word dictionary; separate multiple files with ; -->
  <entry key="ext_stopwords"></entry>
</properties>
EOF

kubectl -n <namespace> create configmap ik-custom-dict \
  --from-file=custom_dict.dic \
  --from-file=IKAnalyzer.cfg.xml
```

### Step 2: Mount the ConfigMap via additionalVolumes

Add the `additionalVolumes` section to `spec.general` in your `OpenSearchCluster` CR:

```yaml
spec:
  general:
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip"
    additionalVolumes:
      - name: ik-custom-dict
        path: /usr/share/opensearch/plugins/analysis-ik/config
        restartPods: true # Restart pods when ConfigMap content changes
        configMap:
          name: ik-custom-dict
```

After applying, pods will restart and pick up the new dictionary. Verify by running an `_analyze` request with your custom terms.

## References

- [opensearch-operator: Adding Plugins](https://github.com/opensearch-project/opensearch-k8s-operator/blob/v2.8.0/docs/userguide/main.md#adding-plugins)
- [opensearch-operator: Additional Volumes](https://github.com/opensearch-project/opensearch-k8s-operator/blob/v2.8.0/docs/userguide/main.md#additional-volumes)
- [IK Analyzer for OpenSearch (Infinilabs)](https://github.com/infinilabs/analysis-ik)