Elasticsearch In-Depth Series: What's new in 8.7?

Published: 2023-05-01 17:28:45 | Author: 左扬

What's new in 8.7?

https://www.elastic.co/guide/en/elasticsearch/reference/8.7/release-highlights.html, other versions: 8.6 | 8.5 | 8.4 | 8.3 | 8.2 | 8.1 | 8.0

Time series (TSDS) GA

Time Series Data Stream (TSDS) is a feature for optimizing Elasticsearch indices for time series data. This involves sorting the indices to achieve better compression and using synthetic _source to reduce index size. As a result, TSDS indices are significantly smaller than non-time_series indices that contain the same data. TSDS is particularly useful for managing time series data with high volume.
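
As a rough illustration of how such a data stream might be configured, the sketch below builds an index template body that enables `index.mode: time_series`, marks dimension fields, and marks gauge metrics. The helper name `tsds_index_template`, the pattern `metrics-demo-*`, and the field names `host` and `cpu_usage` are assumptions made for this example; the body would be sent to `PUT _index_template/<name>`.

```python
import json


def tsds_index_template(pattern, dimensions, gauges):
    """Build an index template body for a time series data stream.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    properties = {"@timestamp": {"type": "date"}}
    for name in dimensions:
        # Dimension fields identify a single time series.
        properties[name] = {"type": "keyword", "time_series_dimension": True}
    for name in gauges:
        # Metric fields hold the measured values.
        properties[name] = {"type": "double", "time_series_metric": "gauge"}
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "time_series",
                # Documents are routed to shards by their dimension values.
                "index.routing_path": list(dimensions),
            },
            "mappings": {"properties": properties},
        },
    }


# Illustrative names only; none of them come from the release notes.
template = tsds_index_template("metrics-demo-*", ["host"], ["cpu_usage"])
print(json.dumps(template, indent=2))
```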

Downsampling GA

Downsampling is a feature that reduces the number of stored documents in Elasticsearch time series indices, resulting in smaller indices and improved query latency. This optimization is achieved by pre-aggregating time series indices, using the time_series index schema to identify the time series. Downsampling is configured as an action in ILM, making it a useful tool for managing large volumes of time series data in Elasticsearch.
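
Since downsampling is configured as an ILM action, a policy body might look roughly like the sketch below. The helper name `downsample_policy`, the one-day rollover, and the chosen ages and intervals are assumptions for illustration; the body would be sent to `PUT _ilm/policy/<name>`.

```python
import json


def downsample_policy(warm_after, fixed_interval):
    """Build an ILM policy body that downsamples indices in the warm phase.

    Hypothetical helper: the phase layout and rollover age are example choices.
    """
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        # Pre-aggregate each time series into one document
                        # per fixed_interval bucket.
                        "downsample": {"fixed_interval": fixed_interval}
                    },
                },
            }
        }
    }


policy = downsample_policy(warm_after="7d", fixed_interval="1h")
print(json.dumps(policy, indent=2))
```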

#92913

Geohex aggregations on both geo_point and geo_shape fields

Elasticsearch 8.1.0 expanded geo_grid aggregation support from rectangular tiles (geotile and geohash) to hexagonal tiles, but only for geo_point fields. Elasticsearch 8.7.0 adds support for Geohex aggregations over geo_shape fields as well, which fulfils the long-standing need to perform hexagonal aggregations on spatial data.
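
A search request using the hexagonal grid could be assembled along the lines of the sketch below, which builds a `geohex_grid` aggregation body. The helper name, the aggregation name `hex_cells`, and the field name `geometry` are assumptions for this example; the same body shape applies whether the field is mapped as geo_point or geo_shape.

```python
import json


def geohex_grid_search(field, precision):
    """Build a search body with a geohex_grid aggregation.

    Hypothetical helper; H3 resolutions run from 0 (coarsest) to 15 (finest).
    """
    if not 0 <= precision <= 15:
        raise ValueError("geohex precision must be between 0 and 15")
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {
            "hex_cells": {
                "geohex_grid": {"field": field, "precision": precision}
            }
        },
    }


# "geometry" is an assumed field name, not one taken from the release notes.
body = geohex_grid_search("geometry", precision=4)
print(json.dumps(body, indent=2))
```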

Figure: Kibana map with geohex aggregation including polygons and lines

In 2018 Uber announced they had open sourced their H3 library, enabling hexagonal tiling of the planet for much better analytics of their traffic and regional pricing models. The use of hexagonal tiles for analytics has become increasingly popular, due to the fact that each tile represents a very similar geographic area on the planet, as well as the fact that the distance between tile centers is very similar in all directions, and consistent across the map. These benefits are now available to all Elasticsearch users.

#91956

Allow more than one KNN search clause

Some vector search scenarios require relevance ranking using a few kNN clauses, e.g. when ranking based on several fields, each with its own vector, or when a document includes a vector for the image and another vector for the text. The user may want to obtain relevance ranking based on a combination of all of these kNN clauses.
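
The image-plus-text scenario above might translate into a request body like the one sketched here, where `knn` is given as a list of clauses. The helper names, the field names `image_vector` and `title_vector`, the three-dimensional vectors, and the boost values are all assumptions for illustration.

```python
import json


def knn_clause(field, query_vector, k=10, num_candidates=100, boost=1.0):
    """Build one kNN clause (hypothetical helper)."""
    if k > num_candidates:
        raise ValueError("k must not exceed num_candidates")
    return {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
        "boost": boost,  # weight of this clause in the combined score
    }


def multi_knn_search(clauses, size=10):
    """Combine several kNN clauses into one search body."""
    return {"knn": list(clauses), "size": size}


# Assumed field names and toy vectors, one clause per vector field.
body = multi_knn_search([
    knn_clause("image_vector", [0.1, 0.9, 0.3], boost=0.4),
    knn_clause("title_vector", [0.7, 0.2, 0.5], boost=0.6),
])
print(json.dumps(body, indent=2))
```

Each clause targets its own vector field, and the per-clause `boost` is one way to weight how much each field contributes to the final ranking.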

#92118

Make natural language processing GA

From 8.7, NLP model management, model allocation, and support for inference against third party models are generally available. (The new text_embedding extension to knn search is still in technical preview.)
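
For orientation, the text_embedding extension lets a kNN search supply text and a deployed model instead of a precomputed vector. The sketch below builds such a body using `query_vector_builder`; the helper name, the field name `text_vector`, and the model id `my-text-embedding-model` are placeholders assumed for this example.

```python
import json


def text_embedding_knn(field, model_id, model_text, k=10, num_candidates=100):
    """Build a kNN search body that embeds the query text at search time.

    Hypothetical helper; model_id must refer to a deployed text embedding model.
    """
    return {
        "knn": {
            "field": field,
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": model_text,
                }
            },
        }
    }


# Placeholder names, not taken from the release notes.
body = text_embedding_knn(
    "text_vector", "my-text-embedding-model", "fast ingest pipelines"
)
print(json.dumps(body, indent=2))
```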

#92213

Speed up ingest geoip processors

The geoip ingest processor is significantly faster.

Previous versions of the geoip library needed special permission to execute databinding code, requiring an expensive permissions check and AccessController.doPrivileged call. The current version of the geoip library no longer requires that, however, so the expensive code has been removed, resulting in better performance for the ingest geoip processor.

#92372

Speed up ingest set and append processors

The set and append ingest processors that use mustache templates are significantly faster.

#92395

Improved downsampling performance

Several improvements were made to the performance of downsampling. All hashmap lookups were removed. Also metrics/label producers were modified so that they extract the doc_values directly from the leaves. This allows for extra optimizations for cases such as labels/counters that do not extract doc_values unless they are consumed. Those changes yielded a 3x-4x performance improvement of the downsampling operation, as measured by our benchmarks.

#92494

The Health API is now generally available

Elasticsearch introduces a new Health API designed to report the health of the cluster. The new API provides both a high level overview of the cluster health, and a very detailed report that can include a precise diagnosis and a resolution.
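
To make the two levels of detail concrete, the sketch below builds the request path for the health report endpoint and extracts the suggested actions for any indicator that is not green. The helper names and the sample report are assumptions; the sample is hand-written for illustration and is not real cluster output.

```python
from urllib.parse import urlencode


def health_report_path(indicator=None, verbose=True):
    """Build the request path for the health report (hypothetical helper)."""
    path = "/_health_report"
    if indicator:
        path += "/" + indicator  # restrict the report to one indicator
    return path + "?" + urlencode({"verbose": "true" if verbose else "false"})


def unhealthy_indicators(report):
    """Map each non-green indicator to the actions listed in its diagnosis."""
    problems = {}
    for name, info in report.get("indicators", {}).items():
        if info.get("status") != "green":
            problems[name] = [d.get("action", "") for d in info.get("diagnosis", [])]
    return problems


# Hand-written sample for illustration only; not real cluster output.
sample_report = {
    "status": "yellow",
    "indicators": {
        "master_is_stable": {"status": "green", "symptom": "Stable master."},
        "shards_availability": {
            "status": "yellow",
            "symptom": "One replica shard is unavailable.",
            "diagnosis": [{"cause": "Too few nodes.", "action": "Add a data node."}],
        },
    },
}

print(health_report_path("shards_availability"))
print(unhealthy_indicators(sample_report))
```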

#92879

Improved performance for get, mget and indexing with explicit `_id`s

The false positive rate for the bloom filter on the _id field was reduced from ~10% to ~1%, reducing the I/O load if a term is not present in a segment. This improves performance when retrieving documents by _id, which happens when performing get or mget requests, or when issuing _bulk requests that provide explicit `_id`s.
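
A back-of-the-envelope calculation shows why the lower false positive rate matters. The sketch assumes a shard with 30 segments that do not contain the requested `_id` and that each false positive costs one unnecessary term lookup; both the segment count and this cost model are illustrative assumptions, not figures from the release notes.

```python
def expected_wasted_lookups(segments_without_id, false_positive_rate):
    """Expected number of segments searched needlessly for one _id lookup.

    Simplified model: every segment that lacks the _id passes the bloom
    filter by mistake with probability false_positive_rate.
    """
    return segments_without_id * false_positive_rate


SEGMENTS = 30  # assumed number of segments that do not hold the _id

before = expected_wasted_lookups(SEGMENTS, 0.10)  # ~10% false positives
after = expected_wasted_lookups(SEGMENTS, 0.01)   # ~1% false positives

print(f"wasted lookups per request before: {before:.2f}")
print(f"wasted lookups per request after:  {after:.2f}")
print(f"reduction factor: {before / after:.1f}x")
```

Under this simplified model the number of unnecessary segment lookups falls by roughly a factor of ten, in line with the drop in false positive rate.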

#93283

Speed up ingest processing with multiple pipelines

Processing documents with both a request/default and a final pipeline is significantly faster.

Rather than marshalling a document from and to json once per pipeline, a document is now marshalled from json before any pipelines execute and then back to json after all pipelines have executed.
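
The difference between the two strategies can be sketched in a few lines of Python. This is a toy model and not Elasticsearch's implementation: pipelines are plain functions over dictionaries, and the names `run_per_pipeline`, `run_marshal_once`, `default_pipeline`, and `final_pipeline` are invented for the example.

```python
import json


def run_per_pipeline(raw, pipelines):
    """Earlier strategy: parse and serialize the document once per pipeline."""
    conversions = 0
    for pipeline in pipelines:
        doc = json.loads(raw)
        conversions += 1
        doc = pipeline(doc)
        raw = json.dumps(doc)
        conversions += 1
    return raw, conversions


def run_marshal_once(raw, pipelines):
    """Newer strategy: parse once, run every pipeline, serialize once."""
    doc = json.loads(raw)
    for pipeline in pipelines:
        doc = pipeline(doc)
    return json.dumps(doc), 2


def default_pipeline(doc):
    # Toy stand-in for a request or default pipeline.
    doc["env"] = "prod"
    return doc


def final_pipeline(doc):
    # Toy stand-in for a final pipeline.
    doc["processed"] = True
    return doc


raw_doc = json.dumps({"message": "hello"})
old_result, old_conversions = run_per_pipeline(raw_doc, [default_pipeline, final_pipeline])
new_result, new_conversions = run_marshal_once(raw_doc, [default_pipeline, final_pipeline])
print(old_conversions, new_conversions, old_result == new_result)
```

Both strategies produce the same document, but the second performs a fixed two conversions no matter how many pipelines run.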

#93329

Support geo_grid ingest processor

The geo_grid ingest processor supports creating indexable geometries from geohash, geotile and H3 cells.

There already exists a circle ingest processor that creates a polygon from a point and radius definition. This concept is useful when there is a need to apply spatial operations that work with indexable geometries to geometric objects that are not defined spatially (or at least are not indexable by Lucene). In this case, the string 4/8/5 has no spatial meaning until we interpret it as the address of a rectangular geotile and save the bounding box defining its border for further use. Likewise, we can interpret geohash strings like u0 as a tile, and H3 strings like 811fbffffffffff as a hexagonal cell, saving the cell border as a polygon.
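
To show what interpreting 4/8/5 as a geotile address means in practice, the sketch below computes the tile's bounding box with the standard Web Mercator tile formulas and then builds a pipeline definition that uses the geo_grid processor. The function name `geotile_bounds` and the field name `geocell` are assumptions for this example, and the Python arithmetic only approximates what the processor stores.

```python
import math


def geotile_bounds(address):
    """Return the bounding box of a geotile address given as 'zoom/x/y'."""
    zoom, x, y = (int(part) for part in address.split("/"))
    n = 2 ** zoom  # number of tiles along each axis at this zoom level

    def lon(col):
        return col / n * 360.0 - 180.0

    def lat(row):
        # Inverse Web Mercator projection for the tile's row edge.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return {"west": lon(x), "east": lon(x + 1), "north": lat(y), "south": lat(y + 1)}


bounds = geotile_bounds("4/8/5")
print(bounds)

# Pipeline body using the geo_grid processor; "geocell" is an assumed field name.
pipeline = {
    "description": "Convert geotile addresses into indexable shapes",
    "processors": [{"geo_grid": {"field": "geocell", "tile_type": "geotile"}}],
}
```

For the tile 4/8/5 this yields a box spanning longitudes 0 to 22.5 and latitudes of roughly 40.98 to 55.78 degrees.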

Kibana map with three H3 layers: cell

#93370

Make frequent_item_sets aggregation GA

The frequent_item_sets aggregation has been moved from technical preview to general availability.

#93421

Release time_series and rate (on counter fields) aggregations as tech preview

Make time_series aggregation and rate aggregation (on counter fields) available without using the time series feature flag. This change makes these aggregations available as tech preview.

Currently there is no documentation about the time_series aggregation. This will be added in a followup change.
