Knowledge Graph Miscellany

Published: 2023-09-01 01:22:13, Author: 鱼市口

WordNet

WordNet is a lexical database of English that describes how words relate to one another. It groups English nouns, verbs, adjectives, and adverbs into sets of synonyms, and links these sets through semantic relations that help determine word meanings.

As its name suggests, WordNet builds a network of words and enriches that network with semantic information. The 1995 paper describing it set out six semantic relations between words:

1. Synonymy: The basic relation of WordNet. It links synonyms and is symmetric.
2. Antonymy: A symmetric relation that links antonyms.
3. Hyponymy & Hypernymy: Subordinate and superordinate relations between nouns, which organize them into a hierarchy.
4. Meronymy & Holonymy: Part-whole relations, covering cases such as components and members.
5. Troponymy: The counterpart of hyponymy for verbs, organizing them into a hierarchy.
6. Entailment: Implication relations, mainly between verbs.
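The six relations above can be pictured as labelled edges between words. What follows is a minimal Python sketch over a toy vocabulary; the names `SYMMETRIC`, `INVERSE`, and `add_relation` are hypothetical and are not part of WordNet's data format or any WordNet API.

```python
from collections import defaultdict

# Relations that hold in both directions (taken from the list above).
SYMMETRIC = {"synonym", "antonym"}

# Relations that come in inverse pairs: storing one direction implies the other.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "holonym": "meronym",
    "meronym": "holonym",
}

def add_relation(graph, head, relation, tail):
    """Store (head, relation, tail) plus the edge implied by symmetry or inversion."""
    graph[head].add((relation, tail))
    if relation in SYMMETRIC:
        graph[tail].add((relation, head))
    elif relation in INVERSE:
        graph[tail].add((INVERSE[relation], head))

graph = defaultdict(set)
add_relation(graph, "hot", "antonym", "cold")       # symmetric
add_relation(graph, "dog", "hypernym", "animal")    # "animal" is a hypernym of "dog"
add_relation(graph, "car", "meronym", "wheel")      # "wheel" is a part of "car"
add_relation(graph, "stroll", "troponym_of", "walk")  # directed, verb hierarchy
add_relation(graph, "snore", "entails", "sleep")    # directed, no implied reverse edge

for word in sorted(graph):
    print(word, sorted(graph[word]))
```

Symmetric and inverse edges are materialized here only to make the difference between relation types visible. A real system might store one direction and derive the other on demand.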

This definition of semantic relations was a milestone. It has shaped not only how relations between words are described but also how semantic relations are defined and structured in ontologies in general.

Recent research still uses WordNet, but usually not the original version described in 1995. Instead, it relies on subsets such as WN18 and WN18RR.

WN18 (2013) is a subset of WordNet that mainly contains symmetric, asymmetric, and inversion relations. The relation types present in a dataset affect tasks such as knowledge extraction and representation learning and therefore influence model design: the same algorithm may perform differently on datasets with different relation types.
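To make the relation types concrete, here is a rough Python sketch that labels relations in a small set of (head, relation, tail) triples. The function name `relation_patterns` and the toy triples are assumptions made for illustration. The checks are exact, whereas real datasets are incomplete and would more likely need a threshold.

```python
from collections import defaultdict

def relation_patterns(triples):
    """Classify relations as symmetric or asymmetric and find inversion pairs."""
    pairs = defaultdict(set)  # relation -> set of (head, tail)
    for h, r, t in triples:
        pairs[r].add((h, t))

    symmetric, asymmetric = set(), set()
    for r, ht in pairs.items():
        reversed_ht = {(t, h) for h, t in ht}
        if ht == reversed_ht:
            symmetric.add(r)       # every edge also appears reversed
        elif not (ht & reversed_ht):
            asymmetric.add(r)      # no edge appears reversed

    inversions = set()
    for r1 in pairs:
        for r2 in pairs:
            # r1 < r2 reports each unordered pair once and skips r1 == r2.
            if r1 < r2 and pairs[r1] == {(t, h) for h, t in pairs[r2]}:
                inversions.add((r1, r2))
    return symmetric, asymmetric, inversions

triples = [
    ("hot", "antonym", "cold"), ("cold", "antonym", "hot"),
    ("dog", "hypernym", "animal"), ("animal", "hyponym", "dog"),
    ("cat", "hypernym", "animal"), ("animal", "hyponym", "cat"),
]
symmetric, asymmetric, inversions = relation_patterns(triples)
print(symmetric, asymmetric, inversions)
```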

WN18RR (2017) is a subset of WN18. It keeps mostly symmetric, asymmetric, and composition relations and removes the inversion relations.
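One way to see why inversion relations matter: if a test triple is simply the reverse of a training triple under an inverse relation, a trivial rule can answer it without learning anything. The sketch below illustrates this on toy data. The helper names `inverse_rule_hits` and `drop_relations` are hypothetical, and this is not the procedure actually used to build WN18RR.

```python
def inverse_rule_hits(train, test, inverse_of):
    """Count test triples already answered by flipping a training triple."""
    train_set = set(train)
    hits = 0
    for h, r, t in test:
        inv = inverse_of.get(r)
        if inv is not None and (t, inv, h) in train_set:
            hits += 1
    return hits

def drop_relations(triples, relations_to_drop):
    """Remove every triple whose relation is in relations_to_drop."""
    return [(h, r, t) for h, r, t in triples if r not in relations_to_drop]

# Toy data (assumption): "hyponym" is the inverse of "hypernym".
inverse_of = {"hyponym": "hypernym", "hypernym": "hyponym"}
train = [("dog", "hypernym", "animal"), ("cat", "hypernym", "animal")]
test = [("animal", "hyponym", "dog"), ("hot", "antonym", "cold")]

hits_before = inverse_rule_hits(train, test, inverse_of)

# Keep one relation from the inverse pair and drop the other.
filtered_test = drop_relations(test, {"hyponym"})
hits_after = inverse_rule_hits(train, filtered_test, inverse_of)
print(hits_before, hits_after)
```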

Freebase

Freebase contains more than 125,000,000 tuples, more than 4,000 types, and more than 7,000 properties. It supports collaborative creation and maintenance of data at a massive scale, which makes it easier to link pieces of information together and put them to use.

Data in Freebase covers a wide range of topics and kinds of knowledge, including people, media, geographic locations, and more. Freebase is not only a dataset or database; it also provides convenient ways to access the data. It supports the Metaweb Query Language (MQL), in which queries are written as structured, object-oriented query objects. It also offers an HTTP-based API that exchanges data in JSON format.
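MQL queries are JSON templates: filled-in fields act as constraints, and empty values mark what the service should fill in. The Python sketch below only builds and serializes such a query object. The type and property identifiers are given as an illustration, and no HTTP request is made, since the Freebase service has been retired.

```python
import json

# A query-by-example template: find the music artist named "The Police"
# and ask for the list of albums (the empty list is the slot to fill in).
query = [{
    "type": "/music/artist",   # illustrative Freebase type identifier
    "name": "The Police",      # constraint: match this name
    "album": [],               # empty list: "return all values here"
}]

# Serialize the template to JSON, the format the API exchanged.
payload = json.dumps(query, indent=2)
print(payload)
```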

Freebase introduced a new paradigm for structuring human knowledge, playing a significant role in guiding subsequent research and implementation of knowledge graphs and knowledge engineering.

Because the Freebase database is so large and complex, recent research has mostly used subsets such as FB15k and FB15k-237:

FB15k (2013): A knowledge graph containing a substantial amount of common-sense knowledge. It mainly includes symmetric, asymmetric, and inversion relations.
FB15k-237 (2017): A more recent dataset, on which methods may not perform as well as they do on other datasets. It is a subset of FB15k that mainly keeps symmetric, asymmetric, and composition relations and removes the inversion relations.
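A composition relation is one that can be derived by chaining two others: if x is linked to y by r1 and y to z by r2, then x is linked to z by r3. The following Python sketch counts how often such a pattern holds in a set of triples. The function name `composition_support` and the family-relation triples are made up for illustration and do not come from FB15k-237.

```python
from collections import defaultdict

def composition_support(triples, r1, r2, r3):
    """Return (confirmed, total): the number of paths x -r1-> y -r2-> z
    that also have a direct edge x -r3-> z, and the number of all such paths."""
    by_rel = defaultdict(set)  # relation -> set of (head, tail)
    for h, r, t in triples:
        by_rel[r].add((h, t))

    r2_tails = defaultdict(set)  # y -> all z with (y, r2, z)
    for y, z in by_rel[r2]:
        r2_tails[y].add(z)

    paths = {(x, z) for x, y in by_rel[r1] for z in r2_tails.get(y, ())}
    confirmed = len(paths & by_rel[r3])
    return confirmed, len(paths)

triples = [
    ("alice", "mother", "beth"), ("beth", "husband", "carl"),
    ("alice", "father", "carl"),
    ("dave", "mother", "erin"), ("erin", "husband", "fred"),
]
confirmed, total = composition_support(triples, "mother", "husband", "father")
print(confirmed, total)
```

The unconfirmed path (dave to fred) is the kind of missing fact a link-prediction model would be expected to infer from the composition pattern.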