# Databases | Data: What is it?

## Basic Concepts

Data is a collection of recorded symbols describing things; it conveys meaning (information) only when it is interpreted.

Information:

• factors: "meaning", "humans", "assign to", "data"
• concept: Information is a concept / notion that holds only if there is a recipient.

Database: A database is a collection of logically related data stored together, in a structured and organized way, in a computing system.

Put another way, it is a collection of data that is managed by a DBMS.

Database management system (DBMS):

A database management system is a software system running on a computing system that provides a systematic approach to manage data stored in a database and control access to the database.

Database system:

• A database system consists of a database and a database management system managing it. (DBS = DB + DBMS)
• Management of data involves both defining structures for the storage of data (metadata) and providing mechanisms for the manipulation of data. (DDL + DML)
• The database system must ensure the safety and security of the stored data, despite system crashes or attempts at unauthorized access. (Permission management; several properties of the data.)
• Note: Managing different databases with a DBMS may require creating and using different database systems.

Database application:

A database application is simply a program that interacts with a database system at some point in its execution.

File-based system:

• Based on the connection of the files
• Data redundancy (duplication) and inconsistency
• Data separation and isolation
• Data dependence
• Integrity problems
• Atomicity problems
• Concurrent-access problems
• Security problems

## Problems of the file-based system

Two intrinsic factors of the file-based approach:

• The definition of the data is embedded in the application programs, rather than being stored separately and independently. [The data is managed by the programs.]
• There is no control over the access and manipulation of data beyond that imposed by the application programs. [The data cannot be operated on as a whole.]

Data redundancy (duplication) and inconsistency:

The same data may be duplicated in several places, which may lead to data inconsistency; that is, the various copies of the same data may no longer agree.
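
A minimal sketch of how duplicated data drifts apart. The two dictionaries stand in for files owned by two separate programs; the names `billing_file`, `shipping_file`, and `update_billing_address` are hypothetical and chosen only for illustration.

```python
# Two hypothetical application "files", each keeping its own copy
# of the same customer's address.
billing_file = {"alice": {"address": "1 Old Road"}}
shipping_file = {"alice": {"address": "1 Old Road"}}


def update_billing_address(customer, new_address):
    # The billing program only knows about its own file.
    billing_file[customer]["address"] = new_address


update_billing_address("alice", "9 New Street")

# The two copies of the same fact no longer agree.
inconsistent = (
    billing_file["alice"]["address"] != shipping_file["alice"]["address"]
)
print(inconsistent)
```

Nothing in either program is wrong on its own; the inconsistency comes from storing the same fact twice with no shared control.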

Data separation and isolation: (data is hard to access across files)

Data scattered across various files may be in different formats, making it difficult for programs to retrieve.

Data dependence: (the coupling between data and programs)

The physical structure and storage of the data files and records are defined in the application code. (program-data dependence)
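
To make program-data dependence concrete, here is a small sketch in which the record layout lives inside the application code. The layout (a 10-byte name plus a 4-byte integer salary) and all names are assumptions made up for this example.

```python
import struct

# Hypothetical record layout hard-coded in the application:
# a 10-byte name followed by a 4-byte little-endian unsigned salary.
RECORD_V1 = struct.Struct("<10sI")


def write_record(name, salary):
    return RECORD_V1.pack(name.encode("ascii"), salary)


def read_record(raw):
    name, salary = RECORD_V1.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii"), salary


raw = write_record("Ann", 5000)
print(read_record(raw))

# Suppose the file format changes: the name field grows to 20 bytes.
# Every program that embeds the old layout now fails on new records.
RECORD_V2 = struct.Struct("<20sI")
new_raw = RECORD_V2.pack(b"Ann", 5000)
try:
    read_record(new_raw)
    old_reader_still_works = True
except struct.error:
    old_reader_still_works = False
print(old_reader_still_works)
```

Because the format is defined in the code, a change to the file structure means finding and changing every program that reads it.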

Integrity problems: (validity, constraints)

The data values stored in the database must satisfy certain types of consistency constraints, which are hard to implement and modify when they live in program code.
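
For contrast, a sketch of a constraint declared once in the schema instead of in every program. It uses Python's built-in `sqlite3` module as a stand-in DBMS; the `account` table and the rule `balance >= 0` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is part of the data definition, not of any one program.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")

try:
    # Violates the CHECK constraint, so the DBMS refuses the row.
    conn.execute("INSERT INTO account (id, balance) VALUES (2, -50)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(rejected, row_count)
```

In a file-based system the same rule would have to be re-implemented, and kept in sync, in each program that writes the file.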

Atomicity problem: (the progress of an operation can only be 0 or 1, that is, all or nothing)

It is difficult to ensure atomicity in a conventional file-processing system.
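
The all-or-nothing behavior can be sketched with a transaction. This example again uses `sqlite3` as a stand-in DBMS; the `account` table, the `transfer` helper, and the simulated failure are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(amount, fail_midway):
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'a'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'b'",
                (amount,),
            )
    except RuntimeError:
        pass


transfer(30, fail_midway=True)
after_failure = dict(conn.execute("SELECT name, balance FROM account"))

transfer(30, fail_midway=False)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(after_failure, after_success)
```

After the failed transfer neither update is visible; with plain files, the first write would typically remain on disk while the second never happened.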

Concurrent-access problems:

Interaction of concurrent updates may result in inconsistent data if the different application programs have not been coordinated previously.
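
A deterministic sketch of the classic "lost update". No real threads are used; the interleaving of two uncoordinated programs is written out by hand, and the `shared` dictionary is a hypothetical stand-in for a shared file.

```python
# Hypothetical shared "file" holding one counter.
shared = {"seats_left": 10}


def read_value():
    return shared["seats_left"]


def write_value(value):
    shared["seats_left"] = value


# Programs A and B each sell one seat, interleaved as:
# read A, read B, write A, write B.
seen_by_a = read_value()
seen_by_b = read_value()
write_value(seen_by_a - 1)
write_value(seen_by_b - 1)

# Two seats were sold, but only one decrement survived.
print(shared["seats_left"])
```

Both programs are individually correct; the error appears only because nothing coordinates their access to the same data.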

Security problem: (external attacks; internal leaks)

Not every user of the system should be able to access all of the data.

## The Conceptual Layers of a Database Implementation

• Handling of low-level details of data access, such as database permissions
• Finding the data and its actual type
• Mapping

## DBMS

### DBMS as an Interface

• Users and application programs do not directly manipulate the database
• The actual manipulation of the database is performed by the DBMS
• It allows for the construction and use of abstract tools
• The details of how the database is actually stored are isolated within the DBMS, hence the design of an application software can be greatly simplified.

### Functions of DBMS

• Services for users
  • Data-Definition Language (DDL) [defines the database, yielding empty tables]
  • Data-Manipulation Language (DML) [operates on data: insert, delete, query, update]
  • Query Language (QL $\subsetneqq$ DML)
• Management for DBs
  • Support the storage of very large amounts of data over a long period of time, and allow efficient access to the data for queries and DB modifications.
  • Enable durability (recovery)
  • Prevent unexpected interactions among users (called isolation)
  • Prevent actions on the data from being performed partially but not completely (called atomicity)
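
The split between DDL, DML, and the query language can be sketched with SQL statements issued through Python's built-in `sqlite3` module. The `student` table and its rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure; the result is an empty table.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# DML: insert, update, and delete rows.
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO student (id, name) VALUES (2, 'Bob')")
conn.execute("UPDATE student SET name = 'Robert' WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 1")

# Query: the retrieval part of the DML.
rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)
```

The application never touches the storage format; it only states what structure and what changes it wants, and the DBMS carries them out.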

Control of data redundancy:

Reduce redundancy by integrating the files, so that multiple copies of the same data are not stored. Redundancy is not eliminated entirely; its amount is controlled, as a trade-off against complexity.

Data consistency:

By eliminating or controlling redundancy, we improve consistency.

"Derive information from the same data"

Improve data integrity:

Integrity refers to the validity and consistency of the stored data. [constraints]

Sharing of data:

New applications can build on the existing data in the DB and add only data that is not currently stored. They can also rely on the functions provided by the DBMS, such as data definition and manipulation, and concurrency and recovery control.

Improve security:

Protect the DB from unauthorized users.

Enforcement of standards:

​ Including data format, naming conventions, documentation standards, update procedures and access rules.

Economy of scale: cost savings (a set of applications works on the same data source)

Balance of conflicting requirements:

The DBA can make decisions about the design and operational use of the DB that provide the best use of resources for the organization as a whole.

Improve data accessibility and responsiveness:

Data from many sources is directly accessible to end users, allowing them to ask ad hoc questions and obtain the required information almost immediately.

Increase productivity:

The DBMS provides many of the standard functions, helping programmers concentrate on application-specific functionality without worrying about low-level implementation details.

Improve maintenance through data independence:

Separate the data description from the applications, thereby making applications immune to changes in the data descriptions.

Increase concurrency:

​ Manage concurrent DB access.

Improve backup and recovery services:

Provide facilities to minimize the amount of processing that is lost following a failure.

### Disadvantages of DBMS

Complexity:

Users must understand its functionality to take full advantage of it. (Poor understanding can have serious consequences.)

Size:

Complexity and breadth of functionality make the DBMS an extremely large piece of software.

Cost of DBMS:

Depends on the environment and functionality provided.

The size of the DBMS requires the purchase of additional storage space; to achieve the required performance, it may be necessary to purchase a larger machine, perhaps even more machines.

Cost of conversion:

The cost of converting existing applications to run on the new DBMS and hardware.

Performance:

A DBMS is written to be more general, so its performance can be lower than that of a file-based system.

Greater impact of failure:

Many applications and users rely on the availability of the DBMS.

(Remaining items omitted.)

## Notes

### Difference between information and data

• The existence of the recipient:
  • Information: requires a recipient; it is strongly linked to, and dependent on, the values of the recipient.
  • Data: no recipient required.
• Interpretation:
  • Information: its meaning depends on the values of the recipient; it has no meaning if the recipient cannot understand it.
  • Data: has no meaning unless it is interpreted, and may have different meanings under different interpretations.

Plain text → Encryption → Ciphertext → Decryption → Original plaintext

• Plain text: Information
• Ciphertext: Data
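
The encryption example can be sketched in a few lines. This uses a toy repeating-key XOR, chosen only because it is short; it is not a secure cipher, and the message and key are made up.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying it twice restores the input.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plaintext = b"meet at noon"  # meaningful to a recipient: information
key = b"k3y"                 # hypothetical key

ciphertext = xor_bytes(plaintext, key)  # symbols without an interpretation: data
recovered = xor_bytes(ciphertext, key)  # interpreting with the key restores meaning

print(ciphertext != plaintext, recovered == plaintext)
```

The ciphertext carries the same symbols to everyone, but it becomes information only for a recipient who can interpret it, here the holder of the key.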