Elasticsearch In-Depth Series: REST APIs / Document APIs / Bulk API

Published: 2023-06-09 14:36:14 | Author: 左扬

REST APIs / Document APIs / Bulk API

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-bulk.html#docs-bulk

Performs multiple index, create, update, or delete operations in a single API call. This reduces per-request overhead and can greatly increase indexing speed.

curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'
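The request body above is newline-delimited JSON: each action line is followed by its source line, except for delete, which has none, and the last line must end with a newline. A minimal Python sketch of assembling the same body might look like the following. The helper name `build_bulk_body` and the tuple layout are assumptions made for illustration, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    Hypothetical helper: `source` is None for actions without a source
    line (delete); every other action contributes two lines.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The final line of a bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# The same four operations as the curl example.
operations = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_id": "1", "_index": "test"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(operations)
print(body, end="")
```

Building the body from data structures rather than by hand makes it harder to forget the trailing newline or to leave a source line after a delete.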

1. Request

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-bulk.html#docs-bulk-api-request

        POST /_bulk

        POST /<target>/_bulk
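As a rough illustration of the two request forms, the sketch below builds the URL with and without a target and wraps it in a standard-library `Request` object without sending it. The function names `bulk_url` and `bulk_request` and the base URL are assumptions for the example. When a target appears in the path, it serves as the default for actions that do not set `_index`.

```python
from urllib.parse import quote
from urllib.request import Request


def bulk_url(base_url, target=None):
    """Return the /_bulk URL, or /<target>/_bulk when a target is given."""
    base = base_url.rstrip("/")
    if target is None:
        return f"{base}/_bulk"
    return f"{base}/{quote(target, safe='')}/_bulk"


def bulk_request(base_url, body, target=None):
    """Build (but do not send) a POST request carrying a bulk body."""
    return Request(
        bulk_url(base_url, target),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With a target in the path, the action line can omit "_index".
body = '{ "index" : { "_id" : "1" } }\n{ "field1" : "value1" }\n'
req = bulk_request("http://localhost:9200", body, target="test")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a reachable cluster at the assumed address.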

2. Prerequisites

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-bulk.html#docs-bulk-api-prereqs

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

  • To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
  • To use the index action, you must have the create, index, or write index privilege.
  • To use the delete action, you must have the delete or write index privilege.
  • To use the update action, you must have the index or write index privilege.
  • To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
  • To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled. See Set up a data stream.
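The per-action privilege rules listed above can be written down as a small lookup table. The sketch below is a client-side illustration only: `REQUIRED_PRIVILEGES`, `can_perform`, and `denied_actions` are hypothetical names, and the check ignores role resolution and any broader privileges not named in the list.

```python
# Any one of the listed index privileges is enough for the action.
REQUIRED_PRIVILEGES = {
    "create": {"create_doc", "create", "index", "write"},
    "index": {"create", "index", "write"},
    "delete": {"delete", "write"},
    "update": {"index", "write"},
}


def can_perform(action, granted):
    """True if at least one granted privilege covers the bulk action."""
    return bool(REQUIRED_PRIVILEGES[action] & set(granted))


def denied_actions(actions, granted):
    """Return the actions that the granted privileges do not cover."""
    return [a for a in actions if not can_perform(a, granted)]


# A role holding only create_doc can run create, but nothing else.
print(denied_actions(["index", "delete", "create", "update"], {"create_doc"}))
# prints ['index', 'delete', 'update']
```

A table like this is useful for a quick pre-flight check, but the cluster remains the authority on what a user may actually do.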

3. Description

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-desc

In an update by query request, you can specify the query criteria in the request URI or the request body, using the same syntax as the Search API.

When you submit an update by query request, Elasticsearch takes a snapshot of the data stream or index as it begins processing the request and updates matching documents using internal versioning. When the versions match, the document is updated and its version number is incremented. If a document changes between the time the snapshot is taken and the time the update is processed, a version conflict occurs and the operation fails. To count version conflicts instead of halting and returning, set conflicts to proceed. In that case, the operation may attempt to update more than max_docs documents from the source: it continues until it has successfully updated max_docs documents or has gone through every document in the source query.
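The interaction between the snapshot, version conflicts, and max_docs is easier to see in a toy model. The sketch below is not how Elasticsearch is implemented: it assumes documents are plain id-to-version mappings, and the function name `update_by_query` and its parameters are invented for the illustration.

```python
def update_by_query(snapshot, live, max_docs, conflicts="abort"):
    """Toy model of version-checked updates.

    snapshot: doc id -> version seen when the request started
    live:     doc id -> current version (mutated in place)
    """
    updated = 0
    version_conflicts = 0
    for doc_id, seen_version in snapshot.items():
        if updated >= max_docs:
            break
        if live[doc_id] != seen_version:
            # The document changed after the snapshot was taken.
            version_conflicts += 1
            if conflicts != "proceed":
                raise RuntimeError(f"version conflict on {doc_id}")
            continue
        live[doc_id] += 1  # a successful update increments the version
        updated += 1
    return {"updated": updated, "version_conflicts": version_conflicts}


snapshot = {"a": 1, "b": 1, "c": 1, "d": 1}
live = dict(snapshot)
live["b"] = 2  # modified by someone else after the snapshot
result = update_by_query(snapshot, live, max_docs=2, conflicts="proceed")
print(result)
# prints {'updated': 2, 'version_conflicts': 1}
```

In this run three documents are attempted even though max_docs is 2, because the conflict on `b` does not count toward the limit. With the default `conflicts="abort"`, the same input raises on `b` instead.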