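MongoDB Getting Started Guide: Basic Concepts of Documents

1. Introduction to MongoDB Documents

In MongoDB, a document is the basic unit of data. It is comparable to a row in a relational database, but its structure is far more flexible. A document stores data as key-value pairs: each key is a string, and each value can be a basic type (string, number, boolean, and so on) or a complex type (array, embedded document, and so on). This flexibility lets MongoDB handle data of varying shapes without requiring a strict table schema to be defined in advance, as a relational database does.

1.1 How Documents Are Represented

MongoDB documents are usually written in JSON (JavaScript Object Notation); internally they are stored as BSON, a binary encoding of JSON-like documents that supports additional types. For example, the following simple document describes a person: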

{
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com",
    "hobbies": ["reading", "painting"],
    "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345"
    }
}

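In the document above, name, age, and email are keys whose values are a string, a number, and a string respectively. The value of hobbies is an array, and the value of address is an embedded document.

1.2 Documents and Collections

Documents are stored in collections. A collection is comparable to a table in a relational database, but it does not need a predefined structure. A collection can hold many documents, and those documents may differ in structure. For example, we can create a collection named users and insert user documents of different shapes into it:

// Using the MongoDB Node.js driver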
const { MongoClient } = require('mongodb');

async function insertDocuments() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const user1 = {
            "name": "Bob",
            "age": 25,
            "email": "bob@example.com"
        };

        const user2 = {
            "name": "Charlie",
            "email": "charlie@example.com",
            "phone": "123-456-7890"
        };

        const result1 = await usersCollection.insertOne(user1);
        const result2 = await usersCollection.insertOne(user2);

        console.log(`Inserted user1 with _id: ${result1.insertedId}`);
        console.log(`Inserted user2 with _id: ${result2.insertedId}`);
    } finally {
        await client.close();
    }
}

insertDocuments().catch(console.error);

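The code above uses the MongoDB Node.js driver to connect to a local MongoDB instance, then inserts two user documents with slightly different structures into the users collection of the test database.

2. Document Data Types

MongoDB supports many data types, which lets documents store a rich variety of data. Understanding these types is essential for using MongoDB correctly.

2.1 Basic Data Types

2.1.1 String

Strings are among the most common data types and store text. In MongoDB, strings must be UTF-8 encoded. For example: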

{
    "name": "David",
    "department": "Engineering"
}

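2.1.2 Number

MongoDB supports several numeric types, including 32-bit integers (Int32), 64-bit integers (Int64), 64-bit floating-point numbers (Double), and 128-bit decimals (Decimal128). Which type a plain number receives depends on the client: the legacy mongo shell stores every number as a Double by default, while mongosh and the Node.js driver store whole numbers that fit in 32 bits as Int32 and other numbers as Double. For example: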

{
    "age": 28,
    "salary": 5000.5
}

If a specific integer type is required, specify it explicitly. For example, in the MongoDB shell:

db.users.insertOne({
    "age": NumberInt(28)
});
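The rule of thumb for plain JavaScript numbers can be sketched in a few lines. This is only an approximation of how mongosh and the Node.js driver are documented to behave (the legacy shell treats every number as a Double), and the helper name guessBsonNumberType is hypothetical, not part of any MongoDB API:

```javascript
// Hypothetical helper: guesses which BSON numeric type a plain JavaScript
// number would likely be stored as. An approximation for illustration only.
const INT32_MIN = -(2 ** 31);
const INT32_MAX = 2 ** 31 - 1;

function guessBsonNumberType(value) {
    if (typeof value !== 'number') {
        throw new TypeError('expected a JavaScript number');
    }
    // Whole numbers that fit in 32 bits can be stored as Int32.
    if (Number.isInteger(value) && value >= INT32_MIN && value <= INT32_MAX) {
        return 'Int32';
    }
    // Everything else (fractions, very large values) falls back to Double.
    return 'Double';
}

console.log(guessBsonNumberType(28));      // Int32
console.log(guessBsonNumberType(5000.5));  // Double
console.log(guessBsonNumberType(2 ** 40)); // Double
```

When the stored type matters, for example for values beyond the 32-bit range that must stay integers, an explicit wrapper such as NumberInt in the shell removes the guesswork.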

2.1.3 Boolean

A boolean represents true or false. For example:

{
    "isActive": true,
    "isAdmin": false
}

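2.1.4 Date

The Date type stores date and time information. MongoDB stores a date in UTC, as the number of milliseconds since the Unix epoch. A JavaScript Date object can be used to create a date value. For example, in the MongoDB shell: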

const today = new Date();
db.orders.insertOne({
    "orderDate": today
});
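Because a date is kept as a UTC millisecond count, the same instant reads identically regardless of the server's or client's time zone. The sketch below uses only the standard JavaScript Date API to show the two views of one value; the sample date and the helper name describeDate are made up for illustration:

```javascript
// A Date wraps a count of milliseconds since the Unix epoch (1970-01-01T00:00:00Z).
// Sample value: 15 January 2024, 09:30:00 UTC (month index 0 means January).
const orderDate = new Date(Date.UTC(2024, 0, 15, 9, 30, 0));

// Hypothetical helper that exposes both representations of the same instant.
function describeDate(date) {
    return {
        millisSinceEpoch: date.getTime(), // time-zone-independent number
        utc: date.toISOString(),          // always rendered in UTC
    };
}

console.log(describeDate(orderDate));
// { millisSinceEpoch: 1705311000000, utc: '2024-01-15T09:30:00.000Z' }
```

Any conversion to local time is therefore the application's job when displaying the value, not something the stored date carries.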

2.1.5 null

The null type represents an empty or missing value. For example:

{
    "middleName": null
}

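2.2 Complex Data Types

2.2.1 Array

Arrays are very useful in MongoDB documents: they can hold multiple values or multiple documents. For example, a user may have several phone numbers: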

{
    "name": "Eve",
    "phoneNumbers": ["123-456-7890", "098-765-4321"]
}

An array can also contain embedded documents. For example, an order may contain several order items:

{
    "orderId": "12345",
    "orderItems": [
        {
            "product": "Laptop",
            "quantity": 1,
            "price": 1000
        },
        {
            "product": "Mouse",
            "quantity": 2,
            "price": 50
        }
    ]
}

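2.2.2 Embedded Document

As in the address example earlier, an embedded document lets one document contain another document structure. This is useful for representing data with complex relationships. For example, an employee document can include information about the employee's manager as an embedded document: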

{
    "employeeName": "Frank",
    "manager": {
        "name": "Grace",
        "department": "Management"
    }
}

2.2.3 ObjectId

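ObjectId is the default type of the _id field, which uniquely identifies each document in a collection. An ObjectId is a 12-byte value, usually displayed as a 24-character hexadecimal string. It consists of the following parts:

Timestamp (4 bytes): the creation time of the ObjectId, in seconds since the Unix epoch.
Random value (5 bytes): generated once per process. Older MongoDB versions split these bytes into a 3-byte machine identifier and a 2-byte process identifier.
Counter (3 bytes): an incrementing counter that keeps ObjectIds generated within the same second unique.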
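The byte layout listed above can be read directly from the hexadecimal form. The following sketch is a plain JavaScript illustration that does not use the driver (with the driver, an ObjectId's getTimestamp() method returns the creation time); the helper name parseObjectIdHex is hypothetical, and the sample id is the placeholder value used later in this guide:

```javascript
// Illustrative only: splits the 24-character hex form of an ObjectId into
// its documented parts without needing the MongoDB driver.
function parseObjectIdHex(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error('expected a 24-character hexadecimal string');
    }
    const seconds = parseInt(hex.slice(0, 8), 16);   // bytes 0-3: timestamp in seconds
    const middle = hex.slice(8, 18);                 // bytes 4-8: machine/process ids or random value
    const counter = parseInt(hex.slice(18, 24), 16); // bytes 9-11: counter
    return { createdAt: new Date(seconds * 1000), seconds, middle, counter };
}

const parsed = parseObjectIdHex('5f4f4f4f4f4f4f4f4f4f4f4f');
console.log(parsed.createdAt.toISOString()); // 2020-09-02T07:52:47.000Z
console.log(parsed.counter);                 // 5197647
```

Since the timestamp occupies the leading bytes, ObjectIds sort roughly by creation time, which is why sorting on _id often approximates insertion order.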

When a document is inserted without an _id field, an ObjectId is generated for it automatically. For example:

const { MongoClient } = require('mongodb');

async function insertDocument() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const collection = database.collection('documents');

        const document = {
            "title": "Sample Document"
        };

        const result = await collection.insertOne(document);
        console.log(`Inserted document with _id: ${result.insertedId}`);
    } finally {
        await client.close();
    }
}

insertDocument().catch(console.error);

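The code above inserts a document; an ObjectId is generated for it automatically and printed to the console.

3. Document Operations

Operations on documents are at the core of MongoDB development: inserting, querying, updating, and deleting.

3.1 Inserting Documents

3.1.1 The insertOne Method

insertOne inserts a single document into a collection. In the MongoDB shell: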

db.users.insertOne({
    "name": "George",
    "age": 35,
    "email": "george@example.com"
});

In Node.js, using the MongoDB driver:

const { MongoClient } = require('mongodb');

async function insertOneDocument() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const user = {
            "name": "George",
            "age": 35,
            "email": "george@example.com"
        };

        const result = await usersCollection.insertOne(user);
        console.log(`Inserted document with _id: ${result.insertedId}`);
    } finally {
        await client.close();
    }
}

insertOneDocument().catch(console.error);

3.1.2 The insertMany Method

insertMany inserts multiple documents into a collection. In the MongoDB shell:

db.users.insertMany([
    {
        "name": "Hank",
        "age": 22,
        "email": "hank@example.com"
    },
    {
        "name": "Ivy",
        "age": 27,
        "email": "ivy@example.com"
    }
]);

In Node.js:

const { MongoClient } = require('mongodb');

async function insertManyDocuments() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const users = [
            {
                "name": "Hank",
                "age": 22,
                "email": "hank@example.com"
            },
            {
                "name": "Ivy",
                "age": 27,
                "email": "ivy@example.com"
            }
        ];

        const result = await usersCollection.insertMany(users);
        console.log(`Inserted ${result.insertedCount} documents`);
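        // insertedIds is an object keyed by each document's position, not an array
        Object.entries(result.insertedIds).forEach(([index, id]) => {
            console.log(`Inserted document ${Number(index) + 1} with _id: ${id}`);
        });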
    } finally {
        await client.close();
    }
}

insertManyDocuments().catch(console.error);

3.2 Querying Documents

3.2.1 find Basics

find queries documents in a collection. In its simplest form it returns every document in the collection. In the MongoDB shell:

db.users.find();

This returns all documents in the users collection. In Node.js:

const { MongoClient } = require('mongodb');

async function findAllDocuments() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const cursor = usersCollection.find({});
        const users = await cursor.toArray();
        console.log(users);
    } finally {
        await client.close();
    }
}

findAllDocuments().catch(console.error);

3.2.2 Conditional Queries

Pass a query filter to find to select only matching documents. For example, to find users older than 30, in the MongoDB shell:

db.users.find({ "age": { "$gt": 30 } });

In Node.js:

const { MongoClient } = require('mongodb');

async function findUsersAboveAge() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const query = { "age": { "$gt": 30 } };
        const cursor = usersCollection.find(query);
        const users = await cursor.toArray();
        console.log(users);
    } finally {
        await client.close();
    }
}

findUsersAboveAge().catch(console.error);
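To make the semantics of the filter concrete without a running database, the following sketch applies the same "greater than" condition to an in-memory array. It is a simplified model and not how MongoDB evaluates queries internally; the sample data and the helper name matchesGt are made up for illustration:

```javascript
// In-memory stand-in for a collection (sample data, not from a real database).
const sampleUsers = [
    { name: 'George', age: 35 },
    { name: 'Hank', age: 22 },
    { name: 'Charlie' },        // no age field at all
    { name: 'Ivy', age: 30 },   // exactly 30, so not "greater than" 30
];

// Hypothetical helper that mimics { field: { $gt: threshold } } for numbers.
function matchesGt(doc, field, threshold) {
    const value = doc[field];
    // Documents without a numeric value in the field never match.
    return typeof value === 'number' && value > threshold;
}

const olderThan30 = sampleUsers.filter((user) => matchesGt(user, 'age', 30));
console.log(olderThan30.map((user) => user.name)); // [ 'George' ]
```

Note that the document without an age field is simply skipped rather than causing an error, which matters in collections whose documents have different shapes.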

3.2.3 Compound Conditions

Logical operators such as $and and $or combine several conditions. For example, to find users older than 30 whose city is "New York", in the MongoDB shell:

db.users.find({
    "$and": [
        { "age": { "$gt": 30 } },
        { "address.city": "New York" }
    ]
});

In Node.js:

const { MongoClient } = require('mongodb');

async function findUsersWithComplexCondition() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const query = {
            "$and": [
                { "age": { "$gt": 30 } },
                { "address.city": "New York" }
            ]
        };
        const cursor = usersCollection.find(query);
        const users = await cursor.toArray();
        console.log(users);
    } finally {
        await client.close();
    }
}

findUsersWithComplexCondition().catch(console.error);
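The query above relies on two ideas: dot notation ("address.city") to reach into an embedded document, and $and to require every condition at once. A simplified in-memory model of both is sketched below. The helpers getPath and matchesAll are hypothetical, the data is made up, and real dot notation also traverses arrays, which this sketch ignores:

```javascript
// Hypothetical helper: resolves a dotted path such as "address.city".
// Returns undefined when any step along the path is missing.
function getPath(doc, path) {
    return path.split('.').reduce(
        (current, key) =>
            current !== null && typeof current === 'object' ? current[key] : undefined,
        doc
    );
}

// Mimics $and: every predicate must hold for the same document.
function matchesAll(doc, predicates) {
    return predicates.every((predicate) => predicate(doc));
}

const people = [
    { name: 'George', age: 35, address: { city: 'New York' } },
    { name: 'Hank', age: 22, address: { city: 'New York' } },
    { name: 'Ivy', age: 41, address: { city: 'Boston' } },
    { name: 'Charlie', age: 50 }, // no address document
];

const conditions = [
    (doc) => typeof doc.age === 'number' && doc.age > 30,
    (doc) => getPath(doc, 'address.city') === 'New York',
];

const result = people.filter((doc) => matchesAll(doc, conditions));
console.log(result.map((doc) => doc.name)); // [ 'George' ]
```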

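3.3 Updating Documents

3.3.1 The updateOne Method

updateOne updates a single document in a collection: the first one that matches the filter. For example, to increase the age of the user named "Alice" by 1, in the MongoDB shell: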

db.users.updateOne(
    { "name": "Alice" },
    { "$inc": { "age": 1 } }
);

In Node.js:

const { MongoClient } = require('mongodb');

async function updateOneUser() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const filter = { "name": "Alice" };
        const update = { "$inc": { "age": 1 } };
        const result = await usersCollection.updateOne(filter, update);
        console.log(`Matched ${result.matchedCount} documents`);
        console.log(`Modified ${result.modifiedCount} documents`);
    } finally {
        await client.close();
    }
}

updateOneUser().catch(console.error);
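The effect of the $inc operator on a single document can be modelled on a plain object. The sketch below is an approximation for illustration, not the server's implementation, and the helper name applyInc is hypothetical. It reflects two documented behaviors: a missing field is created with the increment value, and a non-numeric field is rejected:

```javascript
// Hypothetical helper mimicking { $inc: { field: amount } } on a plain object.
function applyInc(doc, increments) {
    const updated = { ...doc }; // shallow copy so the input is left untouched
    for (const [field, amount] of Object.entries(increments)) {
        if (!(field in updated)) {
            updated[field] = amount; // missing field: created with the increment value
        } else if (typeof updated[field] === 'number') {
            updated[field] += amount;
        } else {
            throw new TypeError(`cannot increment non-numeric field "${field}"`);
        }
    }
    return updated;
}

console.log(applyInc({ name: 'Alice', age: 30 }, { age: 1 })); // { name: 'Alice', age: 31 }
console.log(applyInc({ name: 'Charlie' }, { age: 1 }));        // { name: 'Charlie', age: 1 }
```

The second call shows why running $inc across documents of mixed shape deserves care: documents that lacked the field end up with it.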

3.3.2 The updateMany Method

updateMany updates every document that matches the filter. For example, to increase the age of all users by 1, in the MongoDB shell:

db.users.updateMany(
    {},
    { "$inc": { "age": 1 } }
);

In Node.js:

const { MongoClient } = require('mongodb');

async function updateManyUsers() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const filter = {};
        const update = { "$inc": { "age": 1 } };
        const result = await usersCollection.updateMany(filter, update);
        console.log(`Matched ${result.matchedCount} documents`);
        console.log(`Modified ${result.modifiedCount} documents`);
    } finally {
        await client.close();
    }
}

updateManyUsers().catch(console.error);

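3.4 Deleting Documents

3.4.1 The deleteOne Method

deleteOne deletes a single document from a collection: the first one that matches the filter. For example, to delete the user named "Bob", in the MongoDB shell: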

db.users.deleteOne({ "name": "Bob" });

In Node.js:

const { MongoClient } = require('mongodb');

async function deleteOneUser() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const filter = { "name": "Bob" };
        const result = await usersCollection.deleteOne(filter);
        console.log(`Deleted ${result.deletedCount} document`);
    } finally {
        await client.close();
    }
}

deleteOneUser().catch(console.error);

3.4.2 The deleteMany Method

deleteMany deletes every document that matches the filter. For example, to delete all users younger than 20, in the MongoDB shell:

db.users.deleteMany({ "age": { "$lt": 20 } });

In Node.js:

const { MongoClient } = require('mongodb');

async function deleteManyUsers() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const filter = { "age": { "$lt": 20 } };
        const result = await usersCollection.deleteMany(filter);
        console.log(`Deleted ${result.deletedCount} documents`);
    } finally {
        await client.close();
    }
}

deleteManyUsers().catch(console.error);

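4. Document Indexes

Indexes are essential for query performance in MongoDB. Like the index at the back of a book, an index helps MongoDB locate the documents that satisfy a query without scanning the whole collection.

4.1 Creating Indexes

4.1.1 Single-Field Indexes

A single-field index is built on one field. For example, to create an ascending index on the email field of the users collection (the value 1 means ascending order), in the MongoDB shell: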

db.users.createIndex({ "email": 1 });

In Node.js:

const { MongoClient } = require('mongodb');

async function createIndex() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const result = await usersCollection.createIndex({ "email": 1 });
        console.log(`Index created: ${result}`);
    } finally {
        await client.close();
    }
}

createIndex().catch(console.error);

4.1.2 Compound Indexes

A compound index is built on multiple fields. For example, to create a compound index on the age and name fields of the users collection, in the MongoDB shell:

db.users.createIndex({ "age": 1, "name": 1 });

In Node.js:

const { MongoClient } = require('mongodb');

async function createCompoundIndex() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const result = await usersCollection.createIndex({ "age": 1, "name": 1 });
        console.log(`Compound index created: ${result}`);
    } finally {
        await client.close();
    }
}

createCompoundIndex().catch(console.error);

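4.2 Using Indexes

When a query's conditions match the indexed fields, MongoDB can use the index to speed up the query. For example, once the email index exists, looking up a user by a specific email is faster. In the MongoDB shell: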

db.users.find({ "email": "alice@example.com" });

In Node.js:

const { MongoClient } = require('mongodb');

async function findUserByEmail() {
    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);

    try {
        await client.connect();
        const database = client.db('test');
        const usersCollection = database.collection('users');

        const query = { "email": "alice@example.com" };
        const cursor = usersCollection.find(query);
        const user = await cursor.toArray();
        console.log(user);
    } finally {
        await client.close();
    }
}

findUserByEmail().catch(console.error);
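Why an index helps can be illustrated with a toy comparison between scanning every document and consulting a prebuilt lookup structure. This is only an analogy: MongoDB's indexes are B-tree structures maintained by the server, whereas the sketch uses a JavaScript Map as a stand-in. The sample data and function names are made up for illustration:

```javascript
// Sample data; a real collection would be far larger.
const accounts = [
    { name: 'Alice', email: 'alice@example.com' },
    { name: 'Bob', email: 'bob@example.com' },
    { name: 'Charlie', email: 'charlie@example.com' },
];

// Without an index: inspect documents one by one (a collection scan).
function findByScan(docs, email) {
    let examined = 0;
    for (const doc of docs) {
        examined += 1;
        if (doc.email === email) return { doc, examined };
    }
    return { doc: null, examined };
}

// With an "index": a lookup structure built once, mapping email -> position.
function buildEmailIndex(docs) {
    const index = new Map();
    docs.forEach((doc, position) => index.set(doc.email, position));
    return index;
}

function findByIndex(docs, index, email) {
    const position = index.get(email);
    return position === undefined ? null : docs[position];
}

const emailIndex = buildEmailIndex(accounts);
console.log(findByScan(accounts, 'charlie@example.com').examined);              // 3
console.log(findByIndex(accounts, emailIndex, 'charlie@example.com').name);     // Charlie
```

The scan examines every document up to the match, while the indexed lookup goes straight to it. The trade-off is that the lookup structure must be kept up to date on every write, which is also true of real indexes.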

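5. Embedding and Referencing Documents

MongoDB offers two common ways to model relationships between data: embedding documents and referencing documents.

5.1 Embedding

Embedding places related data directly inside a document. For example, an order document can embed its order items: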

{
    "orderId": "67890",
    "customer": "John Doe",
    "orderItems": [
        {
            "product": "T-Shirt",
            "quantity": 2,
            "price": 20
        },
        {
            "product": "Jeans",
            "quantity": 1,
            "price": 50
        }
    ]
}

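This approach suits data that is tightly related and whose number of subdocuments stays small. Its advantage is that a single query retrieves all the related data. Its drawback is that if subdocuments are added frequently and the embedded data keeps growing, the document may exceed MongoDB's 16 MB document size limit.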
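The "one read gets everything" advantage can be seen by working with the order document above as a plain object: once it is loaded, derived values such as the order total need no further lookups. A minimal sketch, with the helper name orderTotal made up for illustration:

```javascript
// The embedded order from the example above, as a plain JavaScript object.
const embeddedOrder = {
    orderId: '67890',
    customer: 'John Doe',
    orderItems: [
        { product: 'T-Shirt', quantity: 2, price: 20 },
        { product: 'Jeans', quantity: 1, price: 50 },
    ],
};

// Hypothetical helper: sums quantity * price over the embedded items.
function orderTotal(order) {
    return order.orderItems.reduce(
        (sum, item) => sum + item.quantity * item.price,
        0
    );
}

console.log(orderTotal(embeddedOrder)); // 90
```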

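5.2 Referencing

Referencing establishes a relationship by storing another document's _id. For example, an order document can reference the _id of a customer document: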

{
    "orderId": "13579",
    "customerId": ObjectId("5f4f4f4f4f4f4f4f4f4f4f4f"),
    "orderItems": [
        {
            "product": "Book",
            "quantity": 3,
            "price": 15
        }
    ]
}

The customer document can then be looked up by customerId. In the MongoDB shell:

const order = db.orders.findOne({ "orderId": "13579" });
const customer = db.customers.findOne({ "_id": order.customerId });

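This approach suits data that is loosely related or whose number of subdocuments may grow large. Its advantage is that data is more flexible to update and maintain; its drawback is that retrieving related data may take multiple queries.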
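The two-step lookup can be modelled in memory to show what the application has to do when data is referenced rather than embedded. In this sketch the collections are plain JavaScript structures, the ids are simplified strings rather than ObjectIds, and the function name findOrderWithCustomer is hypothetical:

```javascript
// In-memory stand-ins for two collections (sample data with simplified ids).
const customersById = new Map([
    ['c1', { _id: 'c1', name: 'John Doe' }],
]);

const ordersList = [
    { orderId: '13579', customerId: 'c1', orderItems: [{ product: 'Book', quantity: 3, price: 15 }] },
];

// Resolves a reference the way application code would: two separate lookups.
function findOrderWithCustomer(orderId) {
    const order = ordersList.find((o) => o.orderId === orderId); // lookup 1: the order
    if (!order) return null;
    const customer = customersById.get(order.customerId) ?? null; // lookup 2: the customer
    return { ...order, customer };
}

console.log(findOrderWithCustomer('13579').customer.name); // John Doe
```

The customer's data lives in exactly one place, so changing it requires a single update, at the cost of the extra lookup on every read.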

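6. Normalization and Denormalization

When designing a MongoDB data model, consider how far to normalize or denormalize documents.

6.1 Normalization

Normalization splits data into several related collections to reduce redundancy. In an e-commerce system, for example, customer, order, and product information can be stored in separate collections and linked by _id. This helps keep data consistent but can make queries more complex, because related data must be joined (in MongoDB, through multiple queries or the $lookup aggregation stage).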

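6.2 Denormalization

Denormalization deliberately introduces some redundancy into documents to improve query performance. For example, an order document can repeat part of the customer's information (such as name and address), so that reading an order does not require a second query against the customer collection. Denormalization can cause consistency problems, however: when the customer's information changes, every document that holds a copy must be updated as well.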
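The consistency cost of denormalization can be made concrete with a small in-memory model: renaming a customer means touching every order that carries a copy of the name. The data, the field name customerName, and the function renameCustomer are all made up for illustration:

```javascript
// Sample data: each order keeps a copy of the customer's name.
const customerRecord = { _id: 'c1', name: 'John Doe' };

const denormalizedOrders = [
    { orderId: 'A', customerId: 'c1', customerName: 'John Doe' },
    { orderId: 'B', customerId: 'c1', customerName: 'John Doe' },
    { orderId: 'C', customerId: 'c2', customerName: 'Jane Roe' },
];

// Updates the customer and every order that holds a copy of the name.
// Returns how many order documents had to be rewritten.
function renameCustomer(customer, orders, newName) {
    customer.name = newName;
    let touched = 0;
    for (const order of orders) {
        if (order.customerId === customer._id) {
            order.customerName = newName;
            touched += 1;
        }
    }
    return touched;
}

console.log(renameCustomer(customerRecord, denormalizedOrders, 'John Q. Doe')); // 2
```

One logical change turned into three writes here (the customer plus two orders). If any of those writes is skipped, reads start returning stale names, which is the consistency risk described above.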

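In practice, the balance between normalization and denormalization depends on the specific business and performance requirements. If query performance is critical and consistency requirements are not especially strict, some denormalization is appropriate; if consistency is very important and the added query complexity is acceptable, lean toward normalization.

With a solid understanding of documents, their data types, operations, indexes, embedding and referencing, and normalization and denormalization, developers can use MongoDB more effectively to build flexible, high-performance data storage and management systems. This knowledge applies equally to simple business data and to complex, large-scale data scenarios.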