Databases: DuckDB Concepts and Development Trends - 526互联

Broad categories of databases

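 Client/server (C/S) relational database engines
      High concurrency, very large data volumes, database server separated from the application over the network
     MariaDB, MySQL, Oracle, PostgreSQL, or SQL Server
     MySQL: a client/server architecture built around a multithreaded SQL server; scalability and security
     PostgreSQL uses a technique called multiversion concurrency control (MVCC) to keep data consistent under concurrent access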
      
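 Local (embedded) storage
     Embedded devices and IoT, data analysis, data transfer, file archives / data containers, replacing custom data file formats, education and training
     Desktop GUI applications
       SQLite and DuckDB are designed around the local file system, and both have a complete database stack (client, SQL parser, SQL optimizer, storage engine, and so on)
            So SQLite is good for OLTP and DuckDB is better for OLAP.
         No user management or security features; cannot be accessed directly by remote clients; best suited to access from a single process, not to many concurrent client connections or high-concurrency writes
         No fine-grained access control and no security features beyond encrypting the database file itself, so usually not favored for multi-user or multi-tenant applications.
         Historically lacked database-as-a-service (DBaaS) offerings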

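From the user's perspective

 For an individual working with a database, a classic CLI or a GUI SQL client such as Navicat is enough.
 Teams need more: integration with the code repository, solid project collaboration, organization-level management, more stable and efficient database changes, and data security and compliance within the enterprise.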

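Resolving naming conflicts

 Catalog and Schema: Database, Catalog and Schema
 A database system contains multiple catalogs, each catalog contains multiple schemas, and each schema contains multiple database objects (tables, views, columns, and so on).
 Fully qualified name: catalog_name.schema_name.table_name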
 select * from information_schema.schemata;
 SELECT name, lat, lon FROM mytest.main.cities;
 database_name :
                  system
 				  temp
 				  mytest
  schema_name: 
      information_schema 
	  pg_catalog 
	  main
      The CREATE SCHEMA statement creates a schema in the catalog. The default schema is main
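
 A minimal sketch of how catalog and schema qualification keeps two tables with the same name apart. The file name mytest.db and the schema name staging are assumptions for illustration; the cities columns follow the query above.

```sql
-- Assumption: mytest.db is a hypothetical file; ATTACH makes it the catalog "mytest".
ATTACH 'mytest.db' AS mytest;
USE mytest;

-- Without a schema prefix the table is created in the default schema, main.
CREATE TABLE cities (name VARCHAR, lat DOUBLE, lon DOUBLE);

-- "staging" is a hypothetical second schema holding a table with the same name.
CREATE SCHEMA staging;
CREATE TABLE staging.cities (name VARCHAR, lat DOUBLE, lon DOUBLE);

-- Fully qualified names (catalog.schema.table) remove the ambiguity.
SELECT name, lat, lon FROM mytest.main.cities;
SELECT name, lat, lon FROM mytest.staging.cities;
```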

Querying metadata

 The views in the information_schema are SQL-standard views that describe the catalog entries of the database. 
    
 meta-data:
   information_schema.schemata  information_schema.tables information_schema.columns
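
 As a rough illustration, the views listed above can be queried like ordinary tables. The table name cities is an assumption carried over from the earlier example.

```sql
-- List the tables and views in the default schema.
SELECT table_catalog, table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'main';

-- Describe the columns of one table ("cities" is assumed to exist).
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'main' AND table_name = 'cities'
ORDER BY ordinal_position;
```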
   
 implicit schemas
 
 
  USE statement selects a database and optional schema to use as the default
   USE memory;
   USE duck.main;
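
 A small sketch of switching the default database and schema. It assumes DuckDB was started without a database file (so the default in-memory catalog is named memory) and that duck.db is a hypothetical file attached under the name duck.

```sql
-- Assumption: duck.db is a hypothetical file; it becomes the catalog "duck".
ATTACH 'duck.db' AS duck;

-- Make duck.main the default target for unqualified names.
USE duck.main;
CREATE TABLE t (i INTEGER);   -- created as duck.main.t

-- Check which database and schema are currently the default.
SELECT current_database(), current_schema();

-- Switch back to the in-memory database.
USE memory;
```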

Performance analysis (profiling)

  EXPLAIN statement 
        logical_plan  Physical Plan	  
 Run-Time Profiling: EXPLAIN ANALYZE
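
 A hedged sketch of the two profiling modes. The table numbers and the query are made up for illustration; explain_output is the DuckDB setting that controls which plans EXPLAIN prints.

```sql
-- Hypothetical test data: one million integers.
CREATE TABLE numbers AS SELECT range AS n FROM range(1000000);

-- Show the optimized logical plan as well as the physical plan.
SET explain_output = 'all';
EXPLAIN SELECT n % 10 AS bucket, count(*) FROM numbers GROUP BY bucket;

-- Run-time profiling: executes the query and reports per-operator timings and row counts.
EXPLAIN ANALYZE SELECT n % 10 AS bucket, count(*) FROM numbers GROUP BY bucket;
```

 EXPLAIN only plans the query, while EXPLAIN ANALYZE actually runs it, so the latter is slower on large inputs.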
 
Database version vs. storage version
 DuckDB versions   Storage version	
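
 To check which DuckDB version a given binary is (and, from the release notes, which storage version it reads and writes), something like the following should work:

```sql
PRAGMA version;      -- library_version and source_id of the running build
SELECT version();    -- version string only
```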
 
Upgrading database storage
  Open the database file with the older duckdb and run: EXPORT DATABASE 'tmp';
  Then load the export with the newer duckdb: IMPORT DATABASE 'tmp';
    /older/duckdb mydata.db     -c "EXPORT DATABASE 'tmp';"
    /newer/duckdb mydata.new.db -c "IMPORT DATABASE 'tmp';"

Database development

 Vector formats: Flat Vectors, Constant Vectors, Dictionary Vectors, Sequence Vectors; Unified Vector Format
    Complex types: String Vectors, List Vectors, Struct Vectors, Map Vectors, Union Vectors
 DataChunk:

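Tabular data

   Deep learning has made great progress on text and image datasets, but it currently shows no clear advantage on tabular data.
  Snowflake separates compute from storage, whereas MotherDuck brings compute to the storage.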

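Evolution of software interaction paradigms: from the command line to GUI

The shift from graphical (GUI) interaction to chat-based, conversational natural-language interaction
CLI --> GUI
   CLI: popular roughly from the 1970s until 1984
   GUI: developed in three stages: desktop (1984-1993), Web (1993-2007), and mobile (2007 to the present)
       Navicat, DBeaver
   CUI: SQL Chat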

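 e.g.: bringing DevOps to the database
    Command-line clients (CLI): mysql / psql
    Graphical user interfaces (GUI): phpMyAdmin and pgAdmin are long-established, classic SQL clients.
                      DBeaver already supports translating natural language into SQL
    Database-as-Code: bring the code-change workflow to database changes
         File naming conventions control schema migration behavior (convention over configuration)
    Atlas is a database schema management tool

 Code (the stateless part), data (the stateful part), and how code interacts with data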

References

 Database schema and catalog: https://www.cnblogs.com/ECNB/p/4611309.html
 Why do tree-based models still outperform deep learning on tabular data? https://arxiv.org/abs/2207.08815
 Tad, a viewer for CSV, Parquet, SQLite and DuckDB database files: https://github.com/antonycourtney/tad
 酷表ChatExcel: https://chatexcel.com/
 SQL Chat: https://sqlchat.ai/
