Search Blog

I mainly write about search.

Configuring Elasticsearch so that indices are not overwritten

This post describes how to configure Elasticsearch so that indices are not overwritten.

I followed the official documentation, URL-based access control. www.elastic.co

Test environment: Elasticsearch 6.0.0-rc1

Elasticsearch basically accesses indices through URLs.
By default, indices can be overwritten. However, adding rest.action.multi.allow_explicit_index to the elasticsearch.yml file with the value false should configure it so that they are not overwritten.
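
As a minimal sketch of adding that setting from the shell: the helper name add_setting and the use of a scratch file are illustrative assumptions, not part of the original session. YAML reads a line as a key and a value only when a space follows the colon, and elasticsearch.yml is read when the node starts, so the node has to be restarted afterwards.

```shell
# Append the setting to a config file. The path is passed in so the function
# can be tried on a scratch copy first; "add_setting" is a hypothetical helper.
add_setting() {
  config_file="$1"
  # Note the space after the colon: "key: value" is a YAML mapping entry.
  printf '\nrest.action.multi.allow_explicit_index: false\n' >> "$config_file"
}

# Try it on a scratch file rather than on the real config.
scratch="$(mktemp)"
add_setting "$scratch"
cat "$scratch"
```

For a real node the argument would be the config/elasticsearch.yml file under the install directory.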

Below is the default elasticsearch.yml file (before the rest.action.multi.allow_explicit_index setting is added).

bash-3.2$ pwd
/Users/sakura818uuu/elasticsearch-6.0.0-rc1
bash-3.2$ ls
LICENSE.txt README.textile  bin     data        logs        plugins
NOTICE.txt  accounts.json   config      lib     modules
bash-3.2$ cd config/
bash-3.2$ ls
elasticsearch.yml   jvm.options     log4j2.properties
bash-3.2$ cat elasticsearch.yml 
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Let's actually try it.

bash-3.2$ vim elasticsearch.yml 
bash-3.2$ cat elasticsearch.yml 
(omitted: identical to the default file shown above)

# Add

rest.action.multi.allow_explicit_index:false
bash-3.2$ curl -XPUT 'localhost:9200/indexusertest?pretty&pretty'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "indexusertest"
}
bash-3.2$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer      tl6qvdROTfuL380eLOxH0Q   5   1          2            0      8.3kb          8.3kb
yellow open   indexusertest EO9eYTNoT-i7_L3PBIb7dQ   5   1          0            0      1.1kb          1.1kb
yellow open   bank          yaVaZLiLT2G0RyA-vBn5nw   5   1       1000            0    488.3kb        488.3kb
bash-3.2$ curl -XPUT 'localhost:9200/customer/doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> > {
> >   "name": "John Doe"
> > }
> 
bash-3.2$ curl -XPUT 'localhost:9200/indexusertest/doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "name": "John Doe"
> }
> '
{
  "_index" : "indexusertest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
bash-3.2$ curl -XGET 'localhost:9200/indexusertest/doc/1?pretty&pretty'
{
  "_index" : "indexusertest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
bash-3.2$ curl -XPUT 'localhost:9200/indexusertest/doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "name": "Jane Doe"
> }
> '
{
  "_index" : "indexusertest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
bash-3.2$ curl -XGET 'localhost:9200/indexusertest/doc/1?pretty&pretty'
{
  "_index" : "indexusertest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "Jane Doe"
  }
}

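Contrary to my expectation, the index could still be overwritten as usual even after the setting was added to elasticsearch.yml. Elasticsearch had been running the whole time during the steps above, so I thought elasticsearch.yml might be picked up properly after a restart and tried the same thing again.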

^C[2017-10-29T18:35:29,817][INFO ][o.e.n.Node               ] [Xj840__] stopping ...
[2017-10-29T18:35:29,881][INFO ][o.e.n.Node               ] [Xj840__] stopped
[2017-10-29T18:35:29,882][INFO ][o.e.n.Node               ] [Xj840__] closing ...
[2017-10-29T18:35:29,900][INFO ][o.e.n.Node               ] [Xj840__] closed
bash-3.2$ ./bin/elasticsearch
Exception in thread "main" 2017-10-29 18:35:36,902 main ERROR No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
ElasticsearchParseException[malformed, expected settings to start with 'object', instead was [VALUE_STRING]]
    at org.elasticsearch.common.settings.loader.XContentSettingsLoader.load(XContentSettingsLoader.java:73)
    at org.elasticsearch.common.settings.loader.XContentSettingsLoader.load(XContentSettingsLoader.java:52)
    at org.elasticsearch.common.settings.loader.YamlSettingsLoader.load(YamlSettingsLoader.java:50)
    at org.elasticsearch.common.settings.Settings$Builder.loadFromStream(Settings.java:1069)
    at org.elasticsearch.common.settings.Settings$Builder.loadFromPath(Settings.java:1058)
    at org.elasticsearch.node.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:99)
    at org.elasticsearch.cli.EnvironmentAwareCommand.createEnv(EnvironmentAwareCommand.java:78)
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:69)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
    at org.elasticsearch.cli.Command.main(Command.java:90)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85)

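Startup now failed with an error that mentions log4j2; the stack trace also shows an ElasticsearchParseException thrown while the settings file was being loaded. The only thing I had changed was the rest.action.multi.allow_explicit_index:false line in elasticsearch.yml, so, although it defeats the purpose, I deleted that line and tried again, and the node started normally.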
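
The message "expected settings to start with 'object', instead was [VALUE_STRING]" is consistent with YAML reading key:value (no space after the colon) as a single string rather than as a key and a value. A rough sketch for spotting such lines follows; the helper name find_missing_space and the scratch file are illustrative assumptions, and the pattern is a simple heuristic, not a YAML parser.

```shell
# Print uncommented lines that look like "key:value" with no space after the
# first colon. "find_missing_space" is a hypothetical helper name.
find_missing_space() {
  grep -nE '^[^#[:space:]][^:[:space:]]*:[^[:space:]]' "$1"
}

# Try it on a scratch file that mimics the edited config.
scratch="$(mktemp)"
printf '%s\n' \
  '#cluster.name: my-application' \
  'rest.action.multi.allow_explicit_index:false' > "$scratch"
find_missing_space "$scratch"
```

Commented lines and lines that already have a space after the colon are not reported.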

bash-3.2$ ./bin/elasticsearch
[2017-10-29T18:38:29,327][INFO ][o.e.n.Node               ] [] initializing ...
(omitted)
[2017-10-29T18:38:36,999][INFO ][o.e.n.Node               ] [Xj840__] started

Perhaps the way I had written the setting in elasticsearch.yml was wrong. I added rest.action.multi.allow_explicit_index:false again, this time formatted to look more like the default entries in elasticsearch.yml, and restarted Elasticsearch.

bash-3.2$ vim elasticsearch.yml 
bash-3.2$ cat elasticsearch.yml 
(omitted: identical to the default file shown above)
#
# --------------------------------- Add ----------------------------------------
#
# rest.action.multi.allow_explicit_index:false
#
bash-3.2$ ./bin/elasticsearch
[2017-10-29T18:44:56,179][INFO ][o.e.n.Node               ] [] initializing ...
(omitted)
[2017-10-29T18:45:04,328][INFO ][o.e.n.Node               ] [Xj840__] started
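
One thing worth checking at this point: in the file above, the added line begins with "# ", so YAML treats it as a comment. That would explain why the node starts cleanly, and it probably also means the setting is not applied at all. A small sketch for listing only the lines that are actually in effect follows; the helper name active_settings and the scratch file are illustrative assumptions.

```shell
# Print the lines of a config file that are neither blank nor comments.
# "active_settings" is a hypothetical helper name.
active_settings() {
  grep -vE '^[[:space:]]*(#|$)' "$1"
}

# Try it on a scratch file that mimics the tail of the edited config.
scratch="$(mktemp)"
printf '%s\n' \
  '#action.destructive_requires_name: true' \
  '#' \
  '# rest.action.multi.allow_explicit_index:false' \
  '#' > "$scratch"
active_settings "$scratch" || echo '(no active settings)'
```

If the setting does not appear in this output, the node is running with its default value.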

The node now starts properly, so I check once more whether the original goal, not overwriting the index, has been achieved.

bash-3.2$ curl -XPUT 'localhost:9200/noindextest/doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "name": "John Doe"
> }
> '
{
  "_index" : "noindextest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
bash-3.2$ curl -XGET 'localhost:9200/noindextest/doc/1?pretty&pretty'
{
  "_index" : "noindextest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
bash-3.2$  curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer      tl6qvdROTfuL380eLOxH0Q   5   1          2            0      8.3kb          8.3kb
yellow open   bank          yaVaZLiLT2G0RyA-vBn5nw   5   1       1000            0    488.3kb        488.3kb
yellow open   indexusertest EO9eYTNoT-i7_L3PBIb7dQ   5   1          1            0      4.7kb          4.7kb
yellow open   noindextest   LseoEEkrSiGuMeWXJNEMrA   5   1          1            0      4.5kb          4.5kb
bash-3.2$ curl -XPUT 'localhost:9200/noindextest/doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "name": "Jane Doe"
> }
> '
{
  "_index" : "noindextest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
bash-3.2$ curl -XGET 'localhost:9200/noindextest/doc/1?pretty&pretty'
{
  "_index" : "noindextest",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "Jane Doe"
  }
}

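As before, the document is still indexed over the existing one.
In the end, although I followed the official documentation, I could not stop the index from being overwritten on Elasticsearch 6.0.0-rc1.
If I find a way to do it, or it becomes possible, I will write about it on this blog again.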

Update, 2017/10/30

I asked about this on the official Elasticsearch Japanese discussion forum.

sakura818uuu.hatenadiary.com

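Jun Ohtani kindly replied as follows.

As the documentation for the setting mentioned in the blog says, this is a feature that rejects a request when an index name is given in the request body. With Bulk and similar APIs, a request that specifies an index name in the request body results in an error. I could not quite tell from the blog what you have in mind by "overwrite", but the "overriding" referred to here is "overriding" of the index name.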
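
To illustrate the kind of request the reply describes, here is a sketch of a bulk request whose URL names one index while the action line in the body names another. The helper names bulk_body and send_bulk are illustrative assumptions, the index names are the ones used earlier in this post, and I have not run this against a node with the setting enabled.

```shell
# Build a two-line NDJSON bulk body whose action line names an index
# explicitly ("customer"). "bulk_body" is a hypothetical helper name.
bulk_body() {
  printf '%s\n' \
    '{"index":{"_index":"customer","_type":"doc","_id":"1"}}' \
    '{"name":"Jane Doe"}'
}

# Send the body to the _bulk endpoint of a different index ("indexusertest").
# Assumes a node listening on localhost:9200; call it by hand when one is up.
send_bulk() {
  bulk_body | curl -s -XPOST 'localhost:9200/indexusertest/_bulk?pretty' \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
}

# Show the body that would be sent.
bulk_body
```

With rest.action.multi.allow_explicit_index set to false, a request like send_bulk should be rejected because of the "_index" in the body, whereas the plain PUT requests in this post carry the index only in the URL and are unaffected.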