Revisit iOS Autorelease, Part 2

Revisit iOS Autorelease (Part 2): Why did the generated optimization disappear?

In Revisit iOS Autorelease (Part 1), my example showed that the following piece of code interferes with the TLS-based optimization:

// Debug mode
for (Model *m in self.models) {      
}

Why would such an unremarkable piece of code defeat the optimization? Once again, let's look at it from the assembly level:

0x100011e90 <+568>: ldr    x0, [sp, #0x10]

// [Note] Normally this would be: b objc_autoreleaseReturnValue
0x100011e94 <+572>: bl     0x100012904               ; symbol stub for: objc_autoreleaseReturnValue
0x100011e98 <+576>: adrp   x8, 3
0x100011e9c <+580>: ldr    x8, [x8, #0x8]
0x100011ea0 <+584>: ldr    x8, [x8]
0x100011ea4 <+588>: ldur   x9, [x29, #-0x18]
0x100011ea8 <+592>: cmp    x8, x9
0x100011eac <+596>: str    x0, [sp, #0x8]
0x100011eb0 <+600>: b.ne   0x100011ec8               ; <+624> at Container.m
0x100011eb4 <+604>: ldr    x0, [sp, #0x8]
0x100011eb8 <+608>: ldp    x29, x30, [sp, #0x170]
0x100011ebc <+612>: ldp    x28, x27, [sp, #0x160]
0x100011ec0 <+616>: add    sp, sp, #0x180            ; =0x180 
0x100011ec4 <+620>: ret    
0x100011ec8 <+624>: bl     0x1000128d4               ; symbol stub for: __stack_chk_fail
0x100011ecc <+628>: brk    #0x1

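For this optimization to kick in, the address in the LR register has to point into the external caller that is fetching the Model. Normally, then, the function should simply b to objc_autoreleaseReturnValue. Here the assembly uses bl instead, which means that after objc_autoreleaseReturnValue returns, execution continues from 0x100011e98 <+576>: adrp x8, 3 onward.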
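The difference between b and bl can be sketched with a toy model in C. This is only an illustration under assumptions: ReturnSite, autorelease_return_value_model and the two getter functions are hypothetical names invented here, and the enum merely stands in for the return address that LR holds when objc_autoreleaseReturnValue starts running.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "where LR points" when
   objc_autoreleaseReturnValue starts running. */
typedef enum {
    RETURN_TO_EXTERNAL_CALLER, /* reached via `b`: LR is left untouched */
    RETURN_TO_GETTER_EPILOGUE  /* reached via `bl`: LR was overwritten  */
} ReturnSite;

/* Toy model: the fast path is only possible when the return address
   belongs to the external caller that receives the object. */
bool autorelease_return_value_model(ReturnSite lr) {
    return lr == RETURN_TO_EXTERNAL_CALLER;
}

/* `b` (a tail call) does not touch LR, so the callee still sees the
   external caller's address. */
bool getter_with_tail_call(void) {
    return autorelease_return_value_model(RETURN_TO_EXTERNAL_CALLER);
}

/* `bl` sets LR to the instruction after the call (the adrp at +576),
   because the stack check still has to run before `ret`. */
bool getter_with_stack_check(void) {
    return autorelease_return_value_model(RETURN_TO_GETTER_EPILOGUE);
}
```

In this model getter_with_tail_call() reports that the fast path is available, while getter_with_stack_check() does not, which mirrors what the listing above shows.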

It is hard to work out what every instruction in such a long listing is for, but a few key points are enough to sketch an outline:

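  • cmp x8, x9 is clearly checking some condition.
  • b.ne 0x100011ec8: if the two values are equal, execution falls through to (1); otherwise it branches to (2).
  • (1) ends with restoring the stack and ret, so this is the normal path.
  • (2) contains an unfamiliar symbol, __stack_chk_fail, which we set aside for now. It is immediately followed by brk, and brk, simply put, raises a crash or an exception.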

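With the outline in place, let's see what __stack_chk_fail actually is. From the word "stack" it is easy to guess that it is some kind of stack-related check. Why would such a check exist? Mainly to guard against the damage a stack overflow can cause. The figure below gives a rough explanation.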

The figure is borrowed from armv7, but the general idea is the same.

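Local variables, along with the LR and FP that save the calling context, all live on the stack. Suppose our local variable is an array of size 2, and I carelessly write *(addr + 3) = 5. That is an out-of-bounds write, and it corrupts whatever sits next to the array on the stack. If that happens to be important context information, we are done for.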
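The scenario can be reproduced safely with a small model in C. This is a sketch under assumptions: FrameModel, its slot layout, write_local and lr_was_clobbered are hypothetical names made up for illustration, and the "frame" is an ordinary struct so that the out-of-bounds write stays well-defined instead of touching a real stack.

```c
#include <stddef.h>

/* Hypothetical, simplified frame layout: the two-element local array
   comes first, followed by the saved FP and the saved LR. */
enum { LOCAL0, LOCAL1, SAVED_FP, SAVED_LR, FRAME_SLOTS };

typedef struct {
    long slots[FRAME_SLOTS];
} FrameModel;

/* Store through the local array's base address with no bounds check,
   like *(addr + index) = value on a real stack. */
void write_local(FrameModel *frame, size_t index, long value) {
    long *addr = &frame->slots[LOCAL0];
    *(addr + index) = value;
}

/* Returns 1 when the saved return address no longer matches. */
int lr_was_clobbered(const FrameModel *frame, long expected_lr) {
    return frame->slots[SAVED_LR] != expected_lr;
}
```

Starting from a frame such as { 0, 0, 0x1000, 0x2000 }, write_local(&frame, 1, 5) stays inside the array, whereas write_local(&frame, 3, 5) lands on the SAVED_LR slot of this model and replaces the return address with 5.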

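So what does a concrete case of stack overflow look like? Heh, you are welcome to join Alibaba and read my analysis of the xxx issue on the intranet to find out.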

That is why LLVM's CodeGen performs this kind of stack overflow check for us (although it cannot fully protect many dynamic arrays). The code lives in StackProtector.cpp:

bool StackProtector::InsertStackProtectors() {
   // Loop through the basic blocks that have return instructions. Convert this:
   //
   //   return:
   //     ...
   //     ret ...
   //
   // into this:
   //
   //   return:
   //     ...
   //     %1 = load __stack_chk_guard
   //     %2 = load <stored stack guard>
   //     %3 = cmp i1 %1, %2
   //     br i1 %3, label %SP_return, label %CallStackCheckFailBlk
   //
   //   SP_return:
   //     ret ...
   //
   //   CallStackCheckFailBlk:
   //     call void @__stack_chk_fail()
   //     unreachable
   //
   BasicBlock *FailBB = 0;       // The basic block to jump to if check fails.
   AllocaInst *AI = 0;           // Place on stack that stores the stack guard.
   Constant *StackGuardVar = 0;  // The stack guard variable.

   for (Function::iterator I = F->begin(), E = F->end(); I != E; ) {
     BasicBlock *BB = I;

     if (ReturnInst *RI = dyn_cast<ReturnInst>(BB->getTerminator())) {
       if (!FailBB) {
         // Insert code into the entry block that stores the __stack_chk_guard
         // variable onto the stack.
         PointerType *PtrTy = PointerType::getUnqual(Type::Int8Ty);
         StackGuardVar = M->getOrInsertGlobal("__stack_chk_guard", PtrTy);

         BasicBlock &Entry = F->getEntryBlock();
         Instruction *InsPt = &Entry.front();

         AI = new AllocaInst(PtrTy, "StackGuardSlot", InsPt);
         LoadInst *LI = new LoadInst(StackGuardVar, "StackGuard", false, InsPt);

         Value *Args[] = { LI, AI };
         CallInst::
           Create(Intrinsic::getDeclaration(M, Intrinsic::stackprotector_create),
                  &Args[0], array_endof(Args), "", InsPt);

         // Create the basic block to jump to when the guard check fails.
         FailBB = CreateFailBB();
       }

       ++I; // Skip to the next block so that we don't resplit the return block.

       // Split the basic block before the return instruction.
       BasicBlock *NewBB = BB->splitBasicBlock(RI, "SP_return");

       // Move the newly created basic block to the point right after the old
       // basic block so that it's in the "fall through" position.
       NewBB->removeFromParent();
       F->getBasicBlockList().insert(I, NewBB);

       // Generate the stack protector instructions in the old basic block.
       LoadInst *LI1 = new LoadInst(StackGuardVar, "", false, BB);
       CallInst *CI = CallInst::
         Create(Intrinsic::getDeclaration(M, Intrinsic::stackprotector_check),
                AI, "", BB);
       ICmpInst *Cmp = new ICmpInst(CmpInst::ICMP_EQ, CI, LI1, "", BB);
       BranchInst::Create(NewBB, FailBB, Cmp, BB);
     } else {
       ++I;
     }
   }

   // Return if we didn't modify any basic blocks. I.e., there are no return
   // statements in the function.
   if (!FailBB) return false;

   return true;
 }

 /// CreateFailBB - Create a basic block to jump to when the stack protector
 /// check fails.
 BasicBlock *StackProtector::CreateFailBB() {
   BasicBlock *FailBB = BasicBlock::Create("CallStackCheckFailBlk", F);
   Constant *StackChkFail =
     M->getOrInsertFunction("__stack_chk_fail", Type::VoidTy, NULL);
   CallInst::Create(StackChkFail, "", FailBB);
   new UnreachableInst(FailBB);
   return FailBB;
 }

The code is fairly easy to follow, so I won't explain it further here.
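For readers who prefer to see the effect rather than the pass, the prologue and epilogue it inserts can be imitated by hand in C. This is a minimal sketch under assumptions, not what the compiler emits: guard_model, the slot layout, the Epilogue enum and run_protected are hypothetical, and the failure path returns a status instead of calling __stack_chk_fail so that the outcome can be observed.

```c
#include <stddef.h>

/* Hypothetical guard value; the real __stack_chk_guard is a runtime
   global, not a compile-time constant. */
static const long guard_model = 0x5ca1ab1e;

/* Simplified frame: the guard slot sits between the local buffer and
   the saved context. */
enum { BUF0, BUF1, GUARD_SLOT, FP_SLOT, LR_SLOT, SLOT_COUNT };

typedef enum { SP_RETURN, CALL_STACK_CHECK_FAIL } Epilogue;

/* Runs a "function body" that stores value at buf[index], wrapped in
   the prologue/epilogue pair described in the pass's comment. */
Epilogue run_protected(size_t index, long value) {
    long frame[SLOT_COUNT] = {0};

    /* Prologue: copy the guard into its stack slot. */
    frame[GUARD_SLOT] = guard_model;

    /* Body: unchecked store relative to the local buffer. */
    if (index < SLOT_COUNT) { /* keeps the model itself in bounds */
        frame[BUF0 + index] = value;
    }

    /* Epilogue: load the guard, load the stored copy, compare, branch. */
    return frame[GUARD_SLOT] == guard_model ? SP_RETURN
                                            : CALL_STACK_CHECK_FAIL;
}
```

In this model run_protected(1, 5) takes the SP_return path, while run_protected(2, 5) overwrites the guard slot and ends in the failure block. run_protected(3, 5) skips over the guard and goes unnoticed, which is one way to see why the protection is not complete.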

Afterword

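There is no shortage of articles about autorelease online, but most of them are much the same and only walk through the implementation in libobjc. Look closely at the autorelease behavior in our everyday code, though, and there are plenty of overlooked details worth digging into. (Skip the digging and it is easy to fall into a pit.)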

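Finally, as is customary, a poem in tribute to Y帝, the great number-one iOS developer of the post-90s generation:

Our generation has Y帝, whose skills are seriously awesome.

With China in his heart, he takes on the American empire alone.

Google's servers, Y帝 reverse-engineers with ease.

Apple's apps, he patches every single day.

Microsoft's programs keep blue-screening at his hands.

Trump's phones, tapping them is so easy.

To hide from adoring fans, he switched to writing programs.

Number one of the post-90s, the Fong Sai-yuk of our time!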