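Linux Kernel Driver Analysis - UBI Bad-Block Reserve Size
While debugging a UBIFS-related bug, I studied how the UBI driver handles the PEBs it reserves for bad blocks. These are my notes.
Background Concepts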
mtd
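MTD stands for Memory Technology Device. It is the Linux subsystem for accessing memory devices (RAM, flash), and it provides an abstraction layer between the hardware and user space.
On an embedded Linux device, /dev/ contains many /dev/mtdxx files, each of which corresponds to a memory device. For example, when the device's NAND flash is divided into several partitions, each partition gets its own /dev/mtdxx file.
In the listing below, /dev/mtd0 through /dev/mtd10 are the U-Boot partitions, mtd21 is the firmware partition, and mtd20 is the data partition used throughout this article.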
root:/dev# ls mtd* |grep -v block
mtd0 mtd16ro mtd23ro mtd30ro
mtd0ro mtd17 mtd24 mtd31
mtd1 mtd17ro mtd24ro mtd31ro
mtd10 mtd18 mtd25 mtd3ro
mtd10ro mtd18ro mtd25ro mtd4
mtd11 mtd19 mtd26 mtd4ro
mtd11ro mtd19ro mtd26ro mtd5
mtd12 mtd1ro mtd27 mtd5ro
mtd12ro mtd2 mtd27ro mtd6
mtd13 mtd20 mtd28 mtd6ro
mtd13ro mtd20ro mtd28ro mtd7
mtd14 mtd21 mtd29 mtd7ro
mtd14ro mtd21ro mtd29ro mtd8
mtd15 mtd22 mtd2ro mtd8ro
mtd15ro mtd22ro mtd3 mtd9
mtd16 mtd23 mtd30 mtd9ro
root:/dev#
root:/dev# cat /proc/mtd
dev: size erasesize name
mtd0: 00100000 00020000 "0:SBL1"
mtd1: 00100000 00020000 "0:MIBIB"
mtd2: 00100000 00020000 "0:BOOTCONFIG"
...
mtd7: 00080000 00020000 "0:BOOTCONFIG1"
mtd8: 00080000 00020000 "0:APPSBLENV"
mtd9: 00200000 00020000 "0:APPSBL"
mtd10: 00200000 00020000 "0:APPSBL_1"
mtd11: 00080000 00020000 "0:ART"
mtd12: 00080000 00020000 "0:ART.bak"
mtd13: 00100000 00020000 "config"
mtd14: 00080000 00020000 "data1"
mtd15: 00040000 00020000 "data2"
...
mtd20: 01e00000 00020000 "mtddata"
mtd21: 02800000 00020000 "firmware"
...
mtd25: 02780000 00020000 "reserved"
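The size and erasesize columns of /proc/mtd are hexadecimal byte counts, so the number of eraseblocks in a partition follows directly from one row. The sketch below is a user-space illustration only; struct mtd_entry, parse_proc_mtd_line and mtd_eraseblocks are hypothetical names made up for this example, and the row format is assumed to be exactly the one shown in the listing above.

```c
#include <stdio.h>

/* One parsed row of /proc/mtd (hypothetical helper type, not a kernel struct). */
struct mtd_entry {
	int index;               /* N in "mtdN" */
	unsigned long long size; /* partition size in bytes */
	unsigned int erasesize;  /* eraseblock size in bytes */
	char name[64];           /* partition name without the quotes */
};

/*
 * Parse a row such as: mtd20: 01e00000 00020000 "mtddata"
 * Returns 0 on success, -1 if the line does not match (e.g. the header row).
 */
int parse_proc_mtd_line(const char *line, struct mtd_entry *e)
{
	int n = sscanf(line, "mtd%d: %llx %x \"%63[^\"]\"",
		       &e->index, &e->size, &e->erasesize, e->name);

	return n == 4 ? 0 : -1;
}

/* Number of eraseblocks in the partition (0 if erasesize is 0). */
unsigned long long mtd_eraseblocks(const struct mtd_entry *e)
{
	return e->erasesize ? e->size / e->erasesize : 0;
}
```

For the mtd20 row, size 0x01e00000 is 31457280 bytes (30 MiB) and erasesize 0x00020000 is 131072 bytes (128 KiB), which gives 240 eraseblocks.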
ubi
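UBI stands for Unsorted Block Images, and UBIFS for the Unsorted Block Image File System. UBI is built on top of MTD and can manage large NAND flash devices; UBIFS is the file system that runs on UBI volumes.
The relationship between NAND flash, MTD, UBI and UBIFS can be summarized as follows: NAND flash is the hardware; MTD sits between the hardware and user space and provides an abstract interface; UBI manages volumes and bad blocks on top of an MTD partition; UBIFS is the file system built on UBI, which makes reading and writing NAND flash convenient.
The relevant UBI concepts are:
- PEB: physical eraseblock, 128 KiB (131072 bytes) on this device
- LEB: logical eraseblock, 124 KiB (126976 bytes) on this device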
UBI Headers
UBI stores 2 small 64-byte headers at the beginning of each non-bad physical eraseblock:
- erase counter header (or EC header) which contains the erase counter of the physical eraseblock (PEB) plus other information;
- volume identifier header (or VID header) which stores the volume ID and the logical eraseblock (LEB) number to which this PEB belongs.
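As the names suggest, a LEB is a logical block and a PEB is a physical block. A LEB is the data area of a PEB, and on this device it is 4 KiB smaller than the PEB: with a 2 KiB minimum I/O unit, the EC header occupies the first 2 KiB page, the VID header sits at offset 2048 in the second page, and data starts at offset 4096.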
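The 4 KiB difference can be reproduced with a little arithmetic. The sketch below is a user-space illustration, not driver code: it assumes the data area starts at the first min-I/O-unit boundary after the 64-byte VID header, which is consistent with the offsets reported in the attach log in this article. The helper names (align_up, ubi_data_offset, ubi_leb_size) are made up for this example.

```c
/* Size of each UBI header in bytes, as stated in the quoted documentation. */
#define UBI_HDR_SIZE 64

/* Round x up to a multiple of a (a must be a power of two). */
static int align_up(int x, int a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Offset of the data area: first I/O-unit boundary after the VID header. */
int ubi_data_offset(int vid_hdr_offset, int min_io_size)
{
	return align_up(vid_hdr_offset + UBI_HDR_SIZE, min_io_size);
}

/* LEB size: whatever is left of the PEB after the header pages. */
int ubi_leb_size(int peb_size, int vid_hdr_offset, int min_io_size)
{
	return peb_size - ubi_data_offset(vid_hdr_offset, min_io_size);
}
```

With peb_size = 131072, vid_hdr_offset = 2048 and min_io_size = 2048, the data offset is 4096 and the LEB size is 126976 bytes (124 KiB).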
console log
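With the concepts in place, let's look at the UBI information in the device's console log. The log strings make it easy to search the user-space and kernel sources and locate the relevant code.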
UBI attach
First, here is what is printed while the UBI device is attached during boot on the embedded Linux device:
Info: init ubi volumes on mtddata raw partition
UBI: attaching mtd20 to ubi0
random: procd: uninitialized urandom read (4 bytes read, 60 bits of entropy available)
UBI: scanning is finished
UBI: attached mtd20 (name "mtddata", size 30 MiB) to ubi0
UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
UBI: good PEBs: 240, bad PEBs: 0, corrupted PEBs: 0
UBI: user volume: 5, internal volumes: 1, max. volumes count: 128
UBI: max/mean erase counter: 2/1, WL threshold: 4096, image sequence number: 860068978
UBI: available PEBs: 3, total reserved PEBs: 237, PEBs reserved for bad PEB handling: 20
UBI: background thread "ubi_bgt0d" started, PID 115
UBI device number 0, total 240 LEBs (30474240 bytes, 29.1 MiB), available 3 LEBs (380928 bytes, 372.0 KiB), LEB size 126976 bytes (124.0 KiB)
Info: attach ubi device on mtddata success!
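Most of these messages are printed by the ubi_attach_mtd_dev function in build.c of the kernel UBI driver, which is covered in the kernel-space section below.
The main lines are annotated here: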
# Initialize UBI volumes on the raw mtddata partition
Info: init ubi volumes on mtddata raw partition
# Start attaching mtd20 to ubi0
UBI: attaching mtd20 to ubi0
# mtd20 has been attached to ubi0
UBI: attached mtd20 (name "mtddata", size 30 MiB) to ubi0
# PEB is 128 KiB, LEB is 124 KiB
UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
# Minimum/maximum I/O unit: 2048/2048 bytes; sub-page size: 2048 bytes (2 KiB)
UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
# VID header offset is 2 KiB (page-aligned); data starts at offset 4 KiB
UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
# 240 good PEBs, no bad blocks
UBI: good PEBs: 240, bad PEBs: 0, corrupted PEBs: 0
# 3 PEBs still available; 237 PEBs reserved in total (in use or set aside); 20 PEBs reserved for bad PEB handling (the focus of this article)
UBI: available PEBs: 3, total reserved PEBs: 237, PEBs reserved for bad PEB handling: 20
# UBI device number 0: 240 LEBs in total (29.1 MiB), 3 LEBs available, each LEB is 124 KiB
UBI device number 0, total 240 LEBs (30474240 bytes, 29.1 MiB), available 3 LEBs (380928 bytes, 372.0 KiB), LEB size 126976 bytes (124.0 KiB)
# The UBI device was attached on mtddata successfully
Info: attach ubi device on mtddata success!
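The log reveals a lot: UBI is attached to the partition named mtddata, which is mtd20; a PEB is 128 KiB and a LEB is 124 KiB; ubi0 has 240 LEBs/PEBs in total, with 3 still available and no bad blocks; and 20 PEBs are reserved for bad-block handling. The rest of this article explains where that figure of 20 PEBs comes from.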
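The numbers in the attach log can be cross-checked against each other. The sketch below is a small user-space sanity check, not anything the driver runs; it assumes that available plus reserved PEBs account for every good, non-corrupted PEB, which holds for the log above. struct ubi_attach_counts and both helpers are hypothetical names for this example.

```c
#include <stdint.h>

/* Counters copied by hand from the attach log (hypothetical helper type). */
struct ubi_attach_counts {
	int good_pebs;  /* "good PEBs" */
	int corr_pebs;  /* "corrupted PEBs" */
	int avail_pebs; /* "available PEBs" */
	int rsvd_pebs;  /* "total reserved PEBs" */
};

/* Returns 1 if available + reserved equals good - corrupted, else 0. */
int ubi_counts_add_up(const struct ubi_attach_counts *c)
{
	return c->avail_pebs + c->rsvd_pebs == c->good_pebs - c->corr_pebs;
}

/* Capacity in bytes of a given number of LEBs. */
int64_t ubi_leb_bytes(int lebs, int leb_size)
{
	return (int64_t)lebs * leb_size;
}
```

For this device, 3 + 237 = 240, 240 LEBs of 126976 bytes give 30474240 bytes, and 3 LEBs give 380928 bytes, all matching the log.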
ubinfo -a
The UBI information is printed automatically at boot. To get it by hand afterwards, use the UBI utilities, which include:
root:/# ubi
ubiattach ubidetach ubinfo ubirmvol
ubiblock ubiformat ubinize ubirsvol
ubicrc32 ubimkvol ubirename ubiupdatevol
Among these, ubinfo displays UBI information:
root:/# ubinfo -a
UBI version: 1
Count of UBI devices: 1
UBI control device major/minor: 10:60
Present UBI devices: ubi0
ubi0
Volumes count: 5
Logical eraseblock size: 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks: 240 (30474240 bytes, 29.1 MiB)
Amount of available logical eraseblocks: 3 (380928 bytes, 372.0 KiB)
Maximum count of volumes 128
Count of bad physical eraseblocks: 0
Count of reserved physical eraseblocks: 20
Current maximum erase counter value: 2
Minimum input/output unit size: 2048 bytes
Character device major/minor: 249:0
Present volumes: 0, 1, 2, 3, 4
...
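ubi0 contains 5 volumes out of a maximum of 128, and the rest of the basic information matches what the kernel printed at boot. The line this article cares about is: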
Count of reserved physical eraseblocks: 20
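That covers the most visible log output. Next, we trace where the 20 PEBs come from, first in user space and then in kernel space.
User Space
Searching the user-space ubi-utils source for "Count of reserved physical eraseblocks" leads to the print_dev_info function in ubinfo.c.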
print_dev_info
static int print_dev_info(libubi_t libubi, int dev_num, int all)
{
	int i, err, first = 1;
	struct ubi_dev_info dev_info;
	struct ubi_vol_info vol_info;

	err = ubi_get_dev_info1(libubi, dev_num, &dev_info);
	if (err)
		return sys_errmsg("cannot get information about UBI device %d", dev_num);

	printf("ubi%d\n", dev_info.dev_num);
	printf("Volumes count: %d\n", dev_info.vol_count);
	printf("Logical eraseblock size: ");
	util_print_bytes(dev_info.leb_size, 0);
	printf("\n");
	printf("Total amount of logical eraseblocks: %d (", dev_info.total_lebs);
	util_print_bytes(dev_info.total_bytes, 0);
	printf(")\n");
	printf("Amount of available logical eraseblocks: %d (", dev_info.avail_lebs);
	util_print_bytes(dev_info.avail_bytes, 0);
	printf(")\n");
	printf("Maximum count of volumes %d\n", dev_info.max_vol_count);
	printf("Count of bad physical eraseblocks: %d\n", dev_info.bad_count);
	printf("Count of reserved physical eraseblocks: %d\n", dev_info.bad_rsvd);
	printf("Current maximum erase counter value: %lld\n", dev_info.max_ec);
	printf("Minimum input/output unit size: %d %s\n",
	       dev_info.min_io_size, dev_info.min_io_size > 1 ? "bytes" : "byte");
	printf("Character device major/minor: %d:%d\n",
	       dev_info.major, dev_info.minor);

	if (dev_info.vol_count == 0)
		return 0;

	printf("Present volumes: ");
	for (i = dev_info.lowest_vol_id;
	     i <= dev_info.highest_vol_id; i++) {
		err = ubi_get_vol_info1(libubi, dev_info.dev_num, i, &vol_info);
		if (err == -1) {
			if (errno == ENOENT)
				continue;
			return sys_errmsg("libubi failed to probe volume %d on ubi%d",
					  i, dev_info.dev_num);
		}
		if (!first)
			printf(", %d", i);
		else {
			printf("%d", i);
			first = 0;
		}
	}
	printf("\n");

	if (!all)
		return 0;

	first = 1;
	printf("\n");
	for (i = dev_info.lowest_vol_id;
	     i <= dev_info.highest_vol_id; i++) {
		if (!first)
			printf("-----------------------------------\n");
		err = ubi_get_vol_info1(libubi, dev_info.dev_num, i, &vol_info);
		if (err == -1) {
			if (errno == ENOENT)
				continue;
			return sys_errmsg("libubi failed to probe volume %d on ubi%d",
					  i, dev_info.dev_num);
		}
		first = 0;
		err = print_vol_info(libubi, dev_info.dev_num, i);
		if (err)
			return err;
	}
	return 0;
}
The reserved size is printed by the following line, from the variable dev_info.bad_rsvd:
printf("Count of reserved physical eraseblocks: %d\n", dev_info.bad_rsvd);
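Tracing bad_rsvd
Starting from dev_info.bad_rsvd, we can work backwards step by step to the source of the value.
Following the call chain down from ubi_get_dev_info1 shows that, after a long detour, the code simply reads a value from a file: /sys/class/ubi/ubi0/reserved_for_bad.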
root:/sys/devices/virtual/ubi/ubi0# ls
avail_eraseblocks max_ec reserved_for_bad ubi0_2
bad_peb_count max_vol_count subsystem ubi0_3
bgt_enabled min_io_size total_eraseblocks ubi0_4
dev mtd_num ubi0_0 uevent
eraseblock_size power ubi0_1 volumes_count
root:/sys/devices/virtual/ubi/ubi0# cat reserved_for_bad
20
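The directory /sys/class/ubi/ubi0 (which resolves to the /sys/devices/virtual/ubi/ubi0 path in the listing above) exposes other UBI attributes as well, such as avail_eraseblocks (available blocks) and bad_peb_count (number of bad blocks).
That is all for user space. ubinfo -a gets its information from files under sysfs, and those files are produced by the kernel, specifically by the UBI driver.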
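Because the value is just the contents of a sysfs file, it can also be read without libubi. Here is a minimal sketch, assuming the attribute holds a single decimal integer as the cat output above suggests; read_int_attr is a hypothetical helper written for this example, not a libubi function.

```c
#include <stdio.h>

/*
 * Read one decimal integer from a sysfs-style attribute file.
 * Returns 0 and stores the number in *value on success, -1 otherwise.
 */
int read_int_attr(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

On the device in this article, calling read_int_attr("/sys/class/ubi/ubi0/reserved_for_bad", &n) would be expected to store 20 in n, the same value that cat printed.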
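Kernel Space
Now for the UBI driver in the kernel, to find out how the driver computes the bad-block reserve.
Taking the boot-time UBI log as a starting point, grep the kernel's drivers/mtd/ubi/ directory for a string such as "PEBs reserved for bad PEB handling". This leads to the function that prints these messages, ubi_attach_mtd_dev.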
ubi_attach_mtd_dev
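This function attaches an MTD device to UBI and assigns @ubi_num to the newly created UBI device. During the attach it prints information about the UBI device, which is the console log shown in the UBI attach section.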
int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num,
		       int vid_hdr_offset, int max_beb_per1024)
{
	struct ubi_device *ubi;
	int i, err, ref = 0;

	/* ... code omitted ... */

	ubi_msg("attached mtd%d (name \"%s\", size %llu MiB) to ubi%d",
		mtd->index, mtd->name, ubi->flash_size >> 20, ubi_num);
	ubi_msg("PEB size: %d bytes (%d KiB), LEB size: %d bytes",
		ubi->peb_size, ubi->peb_size >> 10, ubi->leb_size);
	ubi_msg("min./max. I/O unit sizes: %d/%d, sub-page size %d",
		ubi->min_io_size, ubi->max_write_size, ubi->hdrs_min_io_size);
	ubi_msg("VID header offset: %d (aligned %d), data offset: %d",
		ubi->vid_hdr_offset, ubi->vid_hdr_aloffset, ubi->leb_start);
	ubi_msg("good PEBs: %d, bad PEBs: %d, corrupted PEBs: %d",
		ubi->good_peb_count, ubi->bad_peb_count, ubi->corr_peb_count);
	ubi_msg("user volume: %d, internal volumes: %d, max. volumes count: %d",
		ubi->vol_count - UBI_INT_VOL_COUNT, UBI_INT_VOL_COUNT,
		ubi->vtbl_slots);
	ubi_msg("max/mean erase counter: %d/%d, WL threshold: %d, image sequence number: %u",
		ubi->max_ec, ubi->mean_ec, CONFIG_MTD_UBI_WL_THRESHOLD,
		ubi->image_seq);
	ubi_msg("available PEBs: %d, total reserved PEBs: %d, PEBs reserved for bad PEB handling: %d",
		ubi->avail_pebs, ubi->rsvd_pebs, ubi->beb_rsvd_pebs);

	/* ... code omitted ... */
}
ref: https://elixir.bootlin.com/linux/v3.14.77/source/drivers/mtd/ubi/build.c#L867
The statement that prints the bad-block reserve is:
ubi_msg("available PEBs: %d, total reserved PEBs: %d, PEBs reserved for bad PEB handling: %d",
ubi->avail_pebs, ubi->rsvd_pebs, ubi->beb_rsvd_pebs);
Combine this with the following field descriptions:
/**
 * struct ubi_device - UBI device description structure
 * ...
 * @rsvd_pebs: count of reserved physical eraseblocks
 * @avail_pebs: count of available physical eraseblocks
 * @beb_rsvd_pebs: how many physical eraseblocks are reserved for bad PEB
 *                 handling
 * @beb_rsvd_level: normal level of PEBs reserved for bad PEB handling
 * ...
 */
ref: https://elixir.bootlin.com/linux/v3.14.77/source/drivers/mtd/ubi/ubi.h#L383
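Here beb_rsvd_pebs is the number of PEBs reserved for bad blocks, and beb_rsvd_level is the normal level of that reserve. How are the two related? As in user space, we trace backwards to see where each value comes from.
Tracing beb_rsvd_pebs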
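The call relationships are summarized below. Some links are indirect: the two functions are connected through a field of struct ubi_device rather than by a direct call.
- ubi_eba_init calls ubi_calculate_reserved to compute beb_rsvd_level
  - ubi_calculate_reserved reads bad_peb_limit, which was produced by get_bad_peb_limit (an indirect link through ubi->bad_peb_limit)
    - get_bad_peb_limit calls three other functions to compute bad_peb_limit
- ubi_eba_init assigns beb_rsvd_level to beb_rsvd_pebs
- ubi_attach_mtd_dev prints beb_rsvd_pebs to the console
This is a little convoluted, so the sections below follow the chain one function at a time.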
ubi_eba_init
ubi_eba_init initializes the EBA sub-system from the attach information. Most of it is not relevant here; the part that matters is this short excerpt:
int ubi_eba_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
{
	/* ... code omitted ... */

	if (ubi->bad_allowed) {
		ubi_calculate_reserved(ubi);

		if (ubi->avail_pebs < ubi->beb_rsvd_level) {
			/* No enough free physical eraseblocks */
			ubi->beb_rsvd_pebs = ubi->avail_pebs;
			print_rsvd_warning(ubi, ai);
		} else
			ubi->beb_rsvd_pebs = ubi->beb_rsvd_level;

		ubi->avail_pebs -= ubi->beb_rsvd_pebs;
		ubi->rsvd_pebs += ubi->beb_rsvd_pebs;
	}

	/* ... code omitted ... */
}
When bad blocks are allowed and enough PEBs are available, beb_rsvd_pebs equals beb_rsvd_level:
ubi->beb_rsvd_pebs = ubi->beb_rsvd_level;
So the next question is where beb_rsvd_level comes from. Read on.
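The reservation step in ubi_eba_init can be modelled in isolation. The sketch below is a user-space model of the excerpt above, not the kernel function itself; struct peb_counts and reserve_for_bad_pebs are hypothetical names, and the starting counters used in the note after the code are chosen only so that the result reproduces the attach log.

```c
/* The three counters touched by the excerpt (hypothetical model type). */
struct peb_counts {
	int avail_pebs;    /* PEBs not yet assigned to anything */
	int rsvd_pebs;     /* PEBs reserved in total */
	int beb_rsvd_pebs; /* PEBs reserved for bad PEB handling */
};

/*
 * Reserve up to beb_rsvd_level PEBs for bad PEB handling.
 * Returns 1 if fewer PEBs were available than requested, else 0.
 */
int reserve_for_bad_pebs(struct peb_counts *c, int beb_rsvd_level)
{
	int shortfall = c->avail_pebs < beb_rsvd_level;

	/* Take the whole level if possible, otherwise whatever is left. */
	c->beb_rsvd_pebs = shortfall ? c->avail_pebs : beb_rsvd_level;
	c->avail_pebs -= c->beb_rsvd_pebs;
	c->rsvd_pebs += c->beb_rsvd_pebs;
	return shortfall;
}
```

Starting from an assumed 23 available and 217 reserved PEBs with a level of 20, the model ends at 3 available, 237 reserved and 20 reserved for bad PEB handling, the same figures as the log. If only 5 PEBs were available, all 5 would be taken and the function would report a shortfall.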
ubi_calculate_reserved
As outlined above, beb_rsvd_level is computed by the following function:
/**
 * ubi_calculate_reserved - calculate how many PEBs must be reserved for bad
 * eraseblock handling.
 * @ubi: UBI device description object
 */
void ubi_calculate_reserved(struct ubi_device *ubi)
{
	/*
	 * Calculate the actual number of PEBs currently needed to be reserved
	 * for future bad eraseblock handling.
	 */
	ubi->beb_rsvd_level = ubi->bad_peb_limit - ubi->bad_peb_count;
	if (ubi->beb_rsvd_level < 0) {
		ubi->beb_rsvd_level = 0;
		ubi_warn("number of bad PEBs (%d) is above the expected limit (%d), not reserving any PEBs for bad PEB handling, will use available PEBs (if any)",
			 ubi->bad_peb_count, ubi->bad_peb_limit);
	}
}
The essence of this function is a single line: beb_rsvd_level equals the bad-block limit bad_peb_limit minus the number of bad blocks detected so far, bad_peb_count.
ubi->beb_rsvd_level = ubi->bad_peb_limit - ubi->bad_peb_count;
The number of detected bad blocks depends on the actual hardware, so we will not dig into it. Next, we trace where bad_peb_limit comes from.
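The clamping behaviour of ubi_calculate_reserved is easy to try out on its own. This is a minimal user-space sketch of the same subtraction and clamp, with the warning message left out; calc_beb_rsvd_level is a hypothetical name for this example.

```c
/*
 * Reserve level = bad-block limit minus bad blocks already found,
 * never below zero.
 */
int calc_beb_rsvd_level(int bad_peb_limit, int bad_peb_count)
{
	int level = bad_peb_limit - bad_peb_count;

	return level < 0 ? 0 : level;
}
```

With a limit of 20 and no bad blocks the level is 20, as on the device in this article. Each bad block that appears lowers the level by one (7 bad blocks leave 13), and once the count exceeds the limit the level stays at 0.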
get_bad_peb_limit
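get_bad_peb_limit is the function that computes the bad-block limit. A comment in the function explains that bad blocks are not guaranteed to be spread evenly across the flash chip; in the worst case, all of the chip's bad blocks fall inside the MTD partition that UBI is attached to. The limit is therefore computed from the size of the whole flash chip, not just the partition.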
static int get_bad_peb_limit(const struct ubi_device *ubi, int max_beb_per1024)
{
	int limit, device_pebs;
	uint64_t device_size;

	if (!max_beb_per1024)
		return 0;

	/*
	 * Here we are using size of the entire flash chip and
	 * not just the MTD partition size because the maximum
	 * number of bad eraseblocks is a percentage of the
	 * whole device and bad eraseblocks are not fairly
	 * distributed over the flash chip. So the worst case
	 * is that all the bad eraseblocks of the chip are in
	 * the MTD partition we are attaching (ubi->mtd).
	 */
	device_size = mtd_get_device_size(ubi->mtd);
	device_pebs = mtd_div_by_eb(device_size, ubi->mtd);
	limit = mult_frac(device_pebs, max_beb_per1024, 1024);

	/* Round it up */
	if (mult_frac(limit, 1024, max_beb_per1024) < device_pebs)
		limit += 1;

	return limit;
}
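Three functions are used here:
- mtd_get_device_size: returns the size of the whole flash chip
- mtd_div_by_eb: converts the flash size into a number of eraseblocks, i.e. from bytes to PEBs
- mult_frac: multiplies by a fraction, here scaling the PEB count by the bad-block ratio max_beb_per1024/1024
The first two are straightforward. mult_frac deserves a closer look; it is actually a macro for multiplying an integer by a fraction: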
/*
 * Multiplies an integer by a fraction, while avoiding unnecessary
 * overflow or loss of precision.
 */
#define mult_frac(x, numer, denom)( \
{ \
	typeof(x) quot = (x) / (denom); \
	typeof(x) rem = (x) % (denom); \
	(quot * (numer)) + ((rem * (numer)) / (denom)); \
} \
)
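As an example, assume the flash is 128 MiB (134,217,728 bytes). The max_beb_per1024 argument of get_bad_peb_limit comes from the kernel config (CONFIG_MTD_UBI_BEB_LIMIT) and defaults to 20, meaning at most 20 bad blocks are allowed per 1024 PEBs. The limit then works out as follows: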
device_size = 134217728; /* flash size: 128 MiB */
device_pebs = 134217728 / (128 * 1024) = 1024; /* eraseblock size: 128 KiB */
limit = mult_frac(device_pebs, max_beb_per1024, 1024) = 1024 * 20 / 1024 = 20;
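The resulting bad_peb_limit is 20 PEBs. With no bad blocks detected, beb_rsvd_level and beb_rsvd_pebs are also 20, which matches the output of ubinfo -a.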
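The worked example can be checked with a user-space version of the same arithmetic. This is a sketch under stated assumptions rather than the kernel code: mult_frac_int is a plain-function stand-in for the mult_frac macro, calc_bad_peb_limit is a hypothetical name, and the chip size and eraseblock size are passed in directly instead of being read from an mtd_info.

```c
#include <stdint.h>

/* Plain-function stand-in for mult_frac(): x * numer / denom without
 * forming the full product x * numer first. */
static int mult_frac_int(int x, int numer, int denom)
{
	int quot = x / denom;
	int rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/* Bad-block limit for a chip of device_size bytes with peb_size-byte
 * eraseblocks, allowing max_beb_per1024 bad blocks per 1024 PEBs. */
int calc_bad_peb_limit(uint64_t device_size, uint32_t peb_size,
		       int max_beb_per1024)
{
	int device_pebs, limit;

	if (!max_beb_per1024 || !peb_size)
		return 0;

	device_pebs = (int)(device_size / peb_size);
	limit = mult_frac_int(device_pebs, max_beb_per1024, 1024);

	/* Round up when the division above truncated. */
	if (mult_frac_int(limit, 1024, max_beb_per1024) < device_pebs)
		limit += 1;

	return limit;
}
```

For a 128 MiB chip with 128 KiB eraseblocks and a ratio of 20, the result is 20. Feeding in only the 30 MiB mtddata partition (240 PEBs) would give 5 instead, which shows why the driver's use of the whole chip size matters for the 20 seen on this device.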
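Summary
This article started from the console log to examine the UBI configuration, then traced the source of that information and the computation of the bad-block reserve through user space and kernel space. The computation of the bad-block reserve, beb_rsvd_pebs, boils down to: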
/* get_bad_peb_limit */
device_size = mtd_get_device_size(ubi->mtd);
device_pebs = mtd_div_by_eb(device_size, ubi->mtd);
limit = mult_frac(device_pebs, max_beb_per1024, 1024);
ubi->bad_peb_limit = get_bad_peb_limit(ubi, max_beb_per1024);
/* ubi_calculate_reserved */
ubi->beb_rsvd_level = ubi->bad_peb_limit - ubi->bad_peb_count;
/* ubi_eba_init */
ubi->beb_rsvd_pebs = ubi->beb_rsvd_level;
Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY-NC 4.0. Please credit litreily's blog when reposting.