Mongo 3.0 学习

16 Mar 2015 • Wenbin Zhang

Mongo在15年3月发布了, 更改了存储引擎, 并支持引擎插拔. 优化了索引和读写控制. 正好最近项目有需求想要试水 Mongo3.0, 那么就开始吧!

Mongo和RDBMS的对比:

SQL对比

MongoDB的锁:

锁Concurrency

提供读写锁, 3.0 MMAPv1引擎支持Collection Level的锁, WiredTiger引擎提供document level的锁; 2.2 ~ 3.0 只能提供database级别的锁

读写锁会对其他锁让步. 长时间的读/写操作都会做出让步. MongoDB的启发式算法会判断数据是否在物理内存, 非物理内存的操作会让步于从内存读取的操作, 一旦数据load到内存, 则会重新获得锁.

当查询索引数据时, 无论是否在物理内存中, Mongo不会释放锁.

读写锁的场景

Operation	Lock Type
Issue a query	Read lock
Get more data from a cursor	Read lock
Insert data	Write lock
Remove data	Write lock
Update data	Write lock
Map-reduce	Read lock and write lock, unless operations are specified as non-atomic. Portions of map-reduce jobs can run concurrently.
Create an index	Building an index in the foreground, which is the default, locks the database for extended periods of time.

DDL操作:

建表/表结构变更/删表

建表:

Mongo直接使用insert即可, 如未指定, _id作为unique key自动加入表结构中, 也可以单纯建表

db.users.insert( {
    user_id: "abc123",
    age: 55,
    status: "A"
 } )
 
db.createCollection("users") 

表结构变更:

无强制schema要求, 如果想为之前的数据增减字段, 则可用update. 文档

db.users.update(
    { },
    { $set: { join_date: new Date() } },
    { multi: true }
)

删表: 删掉collection, 同时删掉关联的相关索引. 此操作会锁整个database

db.users.drop()

建索引

相关文档: http://docs.mongodb.org/manual/reference/method/db.collection.createIndex/#additional-information http://docs.mongodb.org/manual/core/indexes-introduction/

Mongo支持单列索引, 复合索引, 还支持multi-key index, geospatial index. 也支持唯一索引, 稀疏索引, hash索引, 也支持索引TTL
所有的索引在Mongo都是B-Tree结构, 对于范围查找和精确查找比较高效
索引是排序的, 在索引建立时需要指定排序顺序. 对于单列索引 顺序没有关系; 但是对于复合索引 顺序起到了关键作用
索引前导列, mongo也是支持复合索引基于前导列进行查询的
索引默认是非后台执行的, 也就是说默认会锁整个database

db.collection.createIndex(keys, options)

//唯一索引
db.collection.createIndex( { "a.b": 1 }, { unique: true } )

//稀疏索引, 适合存在null的字段, 不会将null字段所在行建立在索引中
//使用sort的时候, 需要使用hint()来使用到稀疏索引
db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )

//复合索引  user_id asc, age desc
db.users.createIndex( { user_id: 1, age: -1 } 

db.users.find().sort( { user_id: 1, age: -1 } ) //Ok

db.users.find().sort( { user_id: -1, age: 1 } ) //OK

db.users.find().sort( { user_id: 1, age: 1 } ) //无法使用索引支持这种查询

DML

* 读操作(select)

Mongo本身对SQL中的语义支持较全.

默认查询返回的是前20条记录的游标(可使用DBQuery.shellBatchSize), 10分钟无操作则会关闭.

支持读隔离, snapshot

sort操作没有index的话, server会将全表load到内存中

索引优化, 覆盖索引, 查看执行计划, Mongo都可以支持文档 , 执行计划

支持各种聚合操作, mapR, 文档

db.collection.find(<criteria>, <projection>)

db.users.find() //查询全表, 默认值返回前20行的cursor

//查询user_id和status字段, 默认包含_id字段
db.users.find(
    { },
    { user_id: 1, status: 1 }
) 

//查询user_id和status字段, 不包含_id字段
db.users.find(
    { },
    { user_id: 1, status: 1, _id: 0 }
)

//where status = A
db.users.find(
    { status: "A" }
)

//where status != A
db.users.find(
    { status: { $ne: "A" } }
)

//status = A and age = 50
db.users.find(
    { status: "A",
      age: 50 }
)

//status = A or age = 50
db.users.find(
    { $or: [ { status: "A" } ,
             { age: 50 } ] }
)

//WHERE user_id like "%bc%"
db.users.find( { user_id: /bc/ } )

//WHERE user_id like "bc%"
db.users.find( { user_id: /^bc/ } )

//WHERE status = "A" ORDER BY user_id ASC
db.users.find( { status: "A" } ).sort( { user_id: 1 } )

//select count(*)
db.users.count()
db.users.find().count()

//SELECT COUNT(user_id)
db.users.count( { user_id: { $exists: true } } )
db.users.find( { user_id: { $exists: true } } ).count()

//SELECT DISTINCT(status) FROM users
db.users.distinct( "status" )

//SELECT * FROM users LIMIT 1
db.users.findOne()
db.users.find().limit(1)

//SELECT * FROM users LIMIT 5 SKIP 10
db.users.find().limit(5).skip(10)

//EXPLAIN SELECT * FROM users WHERE status = "A"
db.users.find( { status: "A" } ).explain()

* 写操作(insert/update/remove)

写操作的Concern(ACK): http://docs.mongodb.org/manual/core/write-concern/

读一致性: Mongo可读到其他client未提交/新插入的数据, 不管写入控制和日志配置, Read uncommitted

isolate: MongoDB更新每单行是隔离的, client不会看到中间状态的. 但多行数据操作, Mongo未提供事务和隔离, 在集群中无效

w Option: 表示写通知的数目. 1为默认值: 表示单节点或集群主节点; 0表示: 关闭默认的, 但如果同时开启j, 则可通知; 大于1: 表示除主节点外还有N-1个需要写入成功, 但N-1必须>0, 否则可能陷入死循环; majority: MongoDB自行决定多数写入则成功, 避免hardcode

j Option: 表示Mongo需要写磁盘日志: true为开启, 开启写日志配置到disk, 则可在Mongo宕机重启后, 看到数据. 否则数据重启后可能丢失

wtimeout: 表示写入等待, 超时则返回error

insert

_id字段未指定则使用12byte的BSON类型来保证唯一性

db.collection.insert(
   <document or array of documents>,
   {
     writeConcern: <document>,
     ordered: <boolean> //出错后是否还继续插入其余行
   }
)
//多行插入
db.products.insert(
   [
     { _id: 11, item: "pencil", qty: 50, type: "no.2" },
     { item: "pen", qty: 20 },
     { item: "eraser", qty: 25 }
   ]
) 

update

更新默认是更新单条记录(自带limit 1), 可设置multi: true来修改多行

如果未查到配置数据, upsert:true则可插入新的记录, 类似insert or update语法, 为防止插入多行数据, 请在使用upsert:true时, 确保查询条件的唯一性

使用默认的写入策略

db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)

//更新多行age>25的将status设置为C
db.users.update(
   { age: { $gt: 25 } },
   { $set: { status: "C" } },
   { multi: true }
)

//更新多行将status=A的age=age+3
db.users.update(
   { status: "A" } ,
   { $inc: { age: 3 } },
   { multi: true }
)

//

delete

默认删除所有的符合条件的行, 如果想删掉单行并排序, 使用findAndModify()

db.bios.remove( { } ) //移除全部行
db.products.remove( { qty: { $gt: 20 } } ) //移除全部大于20的
db.products.remove(
    { qty: { $gt: 20 } },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
)//使用自定义ack
db.products.remove( { qty: { $gt: 20 } }, true )//只移除单行, limit 1
db.products.remove( { qty: { $gt: 20 }, $isolated: 1 } ) //事务隔离, 只在单机有效

技术与产品