1
Hardware debugging presentation 
2
Don’t be scared of hardware-only bugs
  • Everyone has horror stories
    •  full of embellished gory details.
    • They are like veterans talking about war wounds!
  • Lack of documentation about tools and techniques.
    • Further mystifies the black art.
  • There will be some difficult hardware-only bugs.
    • But the majority are quite easy to progress using systematic methods.



3
So
  • Let’s start with what you know already
    • The emulator and Metrowerks CodeWarrior
4
Using the emulator
  • What happens when a thread panics?
    • A breakpoint is hit, causing the emulator to stop at the line that caused the failure.
    • A source-level call stack is shown.
    • Objects and variables in all functions of the call stack can be examined.


  • You are spoilt! Always try to reproduce problems on the emulator first - it is a good debugging environment.
5
What does a panic look like?
6
What does a panic look like?
  • Find the line of code which calls User::Panic
7
What does a panic look like?
  • Or an access violation
    • below - trying to call Cancel() on a NULL pointer

8
Tips
  • Make sure Just In Time debugging is enabled
    • Set the following registry value:

      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
      "UserDebuggerHotKey"=dword:00000000
      "Debugger"="\"C:\\apps\\Metrowerks\\bin\\IDE.exe\" -p %ld -e %ld"
      "Auto"="0"


    • Also ensure that the following line is removed from \epoc32\data\epoc.ini:

      JustInTime 0


  • Debug messages also appear in %TEMP%\epocwind.out
9
Tips
  • Enable Logging of System messages…
    • From the "Target Settings" panel, go to the "Debugger | Debugger Settings" options and tick the box labelled "Log System Messages"


10
What is a panic?
  • A panic is a Symbian term used to denote an unexpected exit of a thread
    • A thread is the unit of execution on Symbian OS
    • Processes must have at least one thread to begin executing code
  • A panic denotes a serious coding error.
    • Either the caller of a function has violated an API contract (e.g. calling a function with invalid parameters)
    • or an object or memory structure has moved into a bad internal state, violating an invariant
  • Panics are helpful
    •  They aim to inform you about the exact nature of the problem during development
11
What does a panic look like?

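      TReal PercentageToDecimal(TInt aPercentage)
          {
          // Precondition: panic the caller if the input is out of range
          __ASSERT_ALWAYS(aPercentage>=0 && aPercentage<=100, Panic(EInvalidInput));
          // Divide by a TReal so the fraction is not truncated by integer division
          TReal result = aPercentage/100.0;
          // Postcondition, checked in debug builds only
          ASSERT(result>=0.0 && result<=1.0);
          return result;
          }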
12
Call Stack
  • What is a call stack?
13
Call Stack
  • The cascade of function calling functions which resulted in the panic.
  • Shows some history of the current operation.
    • This often gives a pretty good idea of the chain of events leading to a panic.
  • Essential for tracking down problems and knowing where to put breakpoints.
  • Also a good way of identifying duplicate defects.
14
Debugging Memory Leaks
15
Using Hook Logger
  • Provides logging for:
    • memory allocations
    • process and thread creation
    • leaves
    • more in the future?
  • Main use is pinpointing the source of leaked memory
  • To use this tool you need to:
    • Install it on your machine: download from Symbian DevNet
    • Attach the hooks to EUSER.DLL
    • Run HookLogger.EXE
    • Run the code to be hooked
16
1. Attach the Hooks
  • Run “HookEUser.cmd” from the “emulator” drive.
    • x:\> HookEUser WINSCW
      • ‘x’ is the drive containing the epoc32 folder
    • Replaces EUSER with a hook "parasite" DLL
    • Undo by using the “-r” (remove) option
17
2. Start the UI
  • Run “HookLogger.exe”
    • Connection status shown in title bar
    • Set the options for monitoring heaps and threads


18
3. Reproduce the leak
  • Start the emulator and reproduce the memory leak
    • the emulator will panic
  • Break into CodeWarrior
  • Walk back up the stack to User::__DbgMarkEnd
    • Take a note of the leaked memory location (badCell) and the thread id.

19
3. Reproduce the leak
20
4. Find the bad thread
  • Go to the Threads tab in the hook logger
    • find the thread that leaked memory

21
5. Show heap allocations
  • Right-click and select “Show allocations”
    • may take 10 to 20 seconds to respond
22
6. Find the bad allocation
  • Order list by “Ptr”
  • Find the address indicated by “badCell” in step 3
  • Double click to get a nice callstack
23
Panics on Hardware
24
Hardware situation
  • What happens when a thread panics?
    • Either a panic dialog appears
    • or the device reboots
    • No context is stored
  • Oh dear - no wonder it’s scary.
    • But with the right tools you can get the same information that the emulator gives so easily.
25
Why are there two kinds?
  • Marking a thread or process as “system critical” means that it is an integral and essential part of the system
    • e.g. the file server
    • The thread or process is being declared necessary for correct functioning of the device
    • If a system critical thread exits or panics the device will reboot
  • This is why panics in some threads cause the device to reset
26
Here’s where it happens
  • \src\cedar\generic\base\e32\kernel\sthread.cpp
      void DThread::Exit()
          {
          if (iExitType!=EExitKill && (iFlags & (KThreadFlagSystemPermanent|KThreadFlagSystemCritical)))
              K::Fault(K::ESystemThreadPanic);
          <snip>
          }
27
Need some more information!
  • The most important information to get hold of
    • Which thread panicked or caused an access violation?
    • What was the panic reason and number?
    • What was the call stack of the thread when it panicked?
28
Hardware Panics
  • There are two kinds
  • Application panic
    • Where a Panic dialog appears
    • not critical - device carries on working


  • System thread panic
    • critical - the device halts and resets.
    • Or the device may enter a special debug mode (called the crash debugger or debug monitor)
29
Application panic
  • Dialog will tell you
    • Thread which panicked
    • Panic reason
  • What else do we need?
    • The call stack.

  • A tool called D_EXC can provide the call stack.
30
System panic
  • Must enable a tool called the debug monitor (or crash debugger) to get more info
  • Crash debugger tells you
    • which thread panicked
    • the category and number of the panic
    • where the stack for the panicked thread is located in memory
  • The crash debugger can be coaxed to dump the call stack
31
Tackling a hardware panic
32
Use the OS Library to look up Panic codes
  • E.g. if the dialog says “KERN-EXEC 3”:
    • Type “KERN-EXEC” into the search
    • The panic description will help you understand what to look for in the code
33
Useful call stacks from Hardware
  • To get a useful call stack two things are always needed.
    • A hex dump of the memory used by the stack of the thread which panicked
    • A ROM symbol file for the software flashed onto the device.
  • With this information a Symbian Perl script can decode a human-readable call stack (similar to the call stack seen in the emulator).


34
How do I get a call stack?
Application panic
  • Run the d_exc tool on the device first.
  • Reproduce the panic.
  • The d_exc dialog pops up
    • telling you some information about the panic. Press OK to save the stack to disk.
  • d_exc will have dumped 2 files to disk
    • a binary .stk file containing the thread’s stack
    • a .txt file detailing the panic code and category
  • get those files onto a PC and have your symbol file at hand.
35
How do I get a call stack?
Application panic
36
Stack.txt
  • Open the output in notepad.
  • Do a find for “>>>>”.
    • This takes you to the top of the decoded stack!


  •  >>>> current stack pointer >>>>


  • r00=80007204 00000000 80000368 80000003
  • r04=00801bb0 00000001 00000000 00802bc4
  • r08=00000002 50340f15 00802bc4 00000000
  • r12=8041b36c 00801bb0 50160ff8 5000b34c
  • PC = 5000b34c L..P  __ArmVectorSwi(void) + 0x124
  • LR = 50160ff8 ...P  SvSendReceive(int, void *) + 0x1c


  •  >>>> current stack pointer >>>>
37
What next
  • Scroll down the text. Sometimes you may see this familiar fingerprint for a panic:


  •  >>>> current stack pointer >>>>
  • r00=80007204 00000000 80000368 80000003
  • r04=00801bb0 00000001 00000000 00802bc4
  • r08=00000002 50340f15 00802bc4 00000000
  • r12=8041b36c 00801bb0 50160ff8 5000b34c
  • PC = 5000b34c L..P  __ArmVectorSwi(void) + 0x124
  • LR = 50160ff8 ...P  SvSendReceive(int, void *) + 0x1c
  •  >>>> current stack pointer >>>>


  • 1bb0  80000001 ....
  • 1bb4  00000082 ....
  • 1bb8  50161018 ...P  SvSendReceiveCheck(int, void *) + 0x8
  • 1bbc  5016594c LY.P  RThread::Panic(TDesC16 const &, int) + 0x24
  • 1bc0  ffff8001 ....
  • 1bc4  00000082 ....
  • 1bc8  00801bdc ....  Stack + 0x1bdc
  • 1bcc  0000003c <...
  • 1bd0  50162024 $ .P  User::Panic(TDesC16 const &, int) + 0x24
  • 1bd4  ffff8001 ....
  • 1bd8  5016ce20  ..P  Panic(TCdtPanic) + 0x24
  • 1bdc  10000004 ....
  • 1be0  50178000 ...P  TUnicode::CjkWidthFoldTable + 0x5408
38
And then?
  • Look at all the functions that follow
  • In my case, after cutting out the lines that looked garbled, I got:


  • 1e18  50650031 1.eP  CBaLockChangeNotifier::DoRunL(void) + 0x5d
  • 1e24  5064fdef ..dP  RBaBackupSession::GetBackupOperationEvent(..
  • 1e5c  5064ff65 e.dP  CBaLockChangeNotifier::RunL(void) + 0x19


  • That was enough to tell me to look at the code for DoRunL(), and to put some logging in there to see what is going on.


  • That’s the basics for d_exc
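The manual "scroll down and look for the fingerprint" step can be sketched as a small script run on the PC. This is only an illustration, assuming the decoded stack text has the column layout shown on the slides above (stack offset, 32-bit value, four ASCII characters, then an optional symbol). The names `code_symbols` and `has_panic_fingerprint` are hypothetical and are not part of d_exc or printstk.pl.

```python
import re

# One decoded stack line: offset, 32-bit value, 4 ASCII chars, optional symbol.
# The layout is an assumption based on the sample output in this presentation.
STACK_LINE = re.compile(r"^\s*([0-9a-f]{4})\s+([0-9a-f]{8}) .{4}\s+(\S.*?)\s*$")

SAMPLE = """\
1bb0  80000001 ....
1bb4  00000082 ....
1bb8  50161018 ...P  SvSendReceiveCheck(int, void *) + 0x8
1bbc  5016594c LY.P  RThread::Panic(TDesC16 const &, int) + 0x24
1bc8  00801bdc ....  Stack + 0x1bdc
1bcc  0000003c <...
1bd0  50162024 $ .P  User::Panic(TDesC16 const &, int) + 0x24
1bd8  5016ce20  ..P  Panic(TCdtPanic) + 0x24
1be0  50178000 ...P  TUnicode::CjkWidthFoldTable + 0x5408
"""

def code_symbols(decoded_text):
    """Return (offset, value, symbol) for lines that decoded to a symbol.

    Lines with no symbol, and values that merely point back into the
    stack itself ("Stack + 0x..."), are skipped.
    """
    entries = []
    for line in decoded_text.splitlines():
        match = STACK_LINE.match(line)
        if match is None:
            continue
        offset, value, symbol = match.groups()
        if symbol.startswith("Stack +"):
            continue
        entries.append((int(offset, 16), int(value, 16), symbol))
    return entries

def has_panic_fingerprint(entries):
    """True if User::Panic appears among the decoded symbols."""
    return any(symbol.startswith("User::Panic(") for _, _, symbol in entries)
```

Calling `code_symbols(SAMPLE)` keeps only the lines worth reading, and `has_panic_fingerprint` flags whether the familiar User::Panic entry is present. The functions listed after that entry are the ones to investigate.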
39
But what about system panics?
  • Same idea - we want to get panic reason and call stack:
    • First, ask the base porting people to show you how to enable the crash debugger build.
    • Reproduce the problem - if the device enters the crash debugger, you can get more information.
    • You use a terminal program on the PC to “talk” to the crashed device.
40
What do I ask it?
  • Same as always
    • Which thread caused a panic or access violation
    • What is the panic reason and number
    • What is the callstack of the thread
41
Connect the crash debugger
  • Launch a terminal emulator (e.g. HyperTerminal) on your PC
  • Connect the PC serial port to the device serial port which provides debug tracing
  • The terminal window should show a “password” prompt
  • Type in “replacement” and you have entered the debug monitor prompt
    • the kernel is frozen, allowing you to interrogate its current state
42
Find the fault
  • Type ‘f’ into the crash debugger to get the Fault information
    • If the category is KERN 4 then you are in business.
    • KERN 4 simply says that a panic happened in a system thread
    • The actual panic, such as KERN-EXEC 3, is hidden
  • Type ‘i’ into the crash debugger to get information about the real panic reason
    • Sometimes even this doesn’t work: a thread which is not system critical can still cause its process to exit when it crashes, if it is process critical (e.g. the main thread)
    • If another thread in that process is marked as system critical, the process exit will take down the platform.
  • A foolproof method of finding the real panic is to look at the output from the KPanic debug tracing


43
Find the fault
  • Type ‘r’ to get the values of all the registers
    • the ones to look at depend on the processor mode!
  • Type ‘c0’ to get the details of all the threads
    • ‘C0’ will pause after each screenful

44
Some background - APCS
  • The ARM Procedure Call Standard (APCS)
    • Imposes conventions on the use of registers
    • So we always know the important registers to look at

45
Finding the panicked thread
  • If you have KPanic debug tracing enabled use that to identify the panic


    • RLibrary::Load - aFileName: BMPANSRV.DLL, -aPath:  threadName: Wserv
    • RLibrary::Load - OK
    • RLibrary::Load ......1
    • RLibrary::Load ......2
    • RLibrary::Load ......3
    • RLibrary::Load ......4
    • RLibrary::Load ......5
    • RLibrary::Load Init() - OK
    • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
    •  R0=00614a40  R1=806a21d7  R2=006029c4  R3=006029c4
    •  R4=0060f448  R5=0060f4c8  R6=006126b8  R7=0060f484
    •  R8=00000012  R9=00000040 R10=c8087d78 R11=00000000
    • R12=8009fced R13=004060e0 R14=8108b0f8 R15=8144710c
    • R13Svc=c924c000 R14Svc=80020108 SpsrSvc=00000010
    • Thread 37, KernCSLocked=0
    • FAULT: KERN 00000004
    • Password: replacement
46
What do all those numbers mean?
  • The type of exception or why the processor was unhappy
    • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
47
What do all those numbers mean?
  • The processor mode or which registers are valid
    • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801

48
What do all those numbers mean?
  • The Fault Address Register (FAR) indicates the dodgy address that was accessed
    • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801

  • The least significant 4 bits of the Fault Status Register (FSR) indicate the type of MMU fault
    • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
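The three "What do all those numbers mean?" slides can be combined into one small decoder. This is a sketch, not a Symbian tool, and the function name `decode_exc_line` is hypothetical. The CPSR mode and Thumb bits are standard ARM. The fault status table is a subset of the classic ARM MMU encodings and is an assumption here, so check it against the reference manual for your core.

```python
import re

# Mode field of the ARM CPSR (bits 0-4).
CPSR_MODES = {
    0x10: "User", 0x11: "FIQ", 0x12: "IRQ", 0x13: "Supervisor",
    0x17: "Abort", 0x1B: "Undefined", 0x1F: "System",
}

# Low 4 bits of the FSR. Subset of the classic ARM MMU fault encodings;
# treat as an assumption and verify against your processor's manual.
FSR_STATUS = {
    0x1: "alignment fault",
    0x3: "alignment fault",
    0x5: "translation fault (section)",
    0x7: "translation fault (page)",
    0x9: "domain fault (section)",
    0xB: "domain fault (page)",
    0xD: "permission fault (section)",
    0xF: "permission fault (page)",
}

EXC_LINE = re.compile(
    r"Exc (\d+) Cpsr=([0-9a-fA-F]{8}) FAR=([0-9a-fA-F]{8}) FSR=([0-9a-fA-F]{8})"
)

def decode_exc_line(line):
    """Split an 'Exc ...' trace line into its interesting fields."""
    match = EXC_LINE.search(line)
    if match is None:
        raise ValueError("not an Exc line: %r" % line)
    cpsr = int(match.group(2), 16)
    fsr = int(match.group(4), 16)
    return {
        "exc_id": int(match.group(1)),               # exception type
        "mode": CPSR_MODES.get(cpsr & 0x1F, "unknown"),
        "thumb": bool(cpsr & 0x20),                  # T bit: Thumb code was running
        "far": int(match.group(3), 16),              # address that was accessed
        "fault": FSR_STATUS.get(fsr & 0xF, "other"),
    }
```

For the line on this slide the mode decodes as User and the fault as an alignment fault, which fits the odd address in the FAR.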
49
Finding the panicked thread
  • The “i” command gives you a lot more information, but all you are interested in is finding a fingerprint.


    • <snip>
    • THREAD at c8084ef0 VPTR=00000000 AccessCount=6 Owner=c80848a8
    • Full name apprun.exe::Calcsoft
    • Thread MState READY
    • Default priority 16 WaitLink Priority 16
    • ExitInfo 2,3,KERN-EXEC
    • Flags 00000002, Handles c8084a70
    • Supervisor stack base c9208000 size 4000
    • User stack base 00402000 size 5000
    • Id=29, Alctr=00600000, Created alctr=00600000, Frame=00406e1c
    • <snip>
    • R13_USR 8005e414 R14_USR 000002a8 SPSR_SVC c8084ef0
    •  R4 c8085198  R5 00000000  R6 00000000  R7 00000001
    •  R8 00000000  R9 8005e834 R10 000002a8 R11 c8085198
    •  PC 8005e81c


    • TheCurrentProcess=c80848a8
    • PROCESS at c80848a8 VPTR=00000000 AccessCount=7 Owner=00000000
    • Full name apprun.exe
    • ExitInfo 3,0,
    • <snip>
50
What do all those numbers mean?
  • The Exit Type
    • ExitInfo 2,3,KERN-EXEC
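As a rough illustration of reading the ExitInfo fingerprint, the sketch below splits the line into exit type, reason and category. The numeric exit types follow the Symbian TExitType enumeration (0 kill, 1 terminate, 2 panic, 3 pending). The function name `parse_exit_info` is hypothetical.

```python
# Values of the Symbian TExitType enumeration.
EXIT_TYPES = {0: "kill", 1: "terminate", 2: "panic", 3: "pending"}

def parse_exit_info(line):
    """Parse 'ExitInfo <type>,<reason>,<category>' from crash debugger output."""
    prefix = "ExitInfo "
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError("not an ExitInfo line: %r" % line)
    # The category may be empty (e.g. 'ExitInfo 3,0,'), so split at most twice.
    exit_type, reason, category = text[len(prefix):].split(",", 2)
    return EXIT_TYPES.get(int(exit_type), "unknown"), int(reason), category
```

With the thread entry from the previous slide, `parse_exit_info("ExitInfo 2,3,KERN-EXEC")` reports a panic with reason 3 in category KERN-EXEC. The process entry `ExitInfo 3,0,` decodes as pending, meaning the process itself had not exited.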


51
What next? - The Program counter
  • From the same information as the previous slide
    •  look at R15 and copy that number.
    •  This is the PC - the address of the last instruction to execute in the thread which panicked
    • Be careful to get the right version, depending on the “mode” of the processor

    • Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
    •  R0=00600080  R1=00405ee0  R2=08cc014c  R3=00000000
    •  R4=00619ae8  R5=00000000  R6=00ffffff  R7=ffffffff
    •  R8=00000012  R9=00000040 R10=c808a2e0 R11=00000000
    • R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
    • R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
    • Thread 37, KernCSLocked=0
    • Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
    • Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
    • Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
    • Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
    • Exec::ThreadId
    • Exec::SemaphoreWait
    • Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
    • FAULT: KERN 00000004
52
Program counter
  • Look up the PC to find the “top” of the call stack
    • Either look at the symbol file directly or use printsym to decode the address
  • You were probably half way through a function
    • so you may have to look for the closest match, e.g. where R15=80728562

    • 80728498    0000    CAknViewAppUi::~CAknViewAppUi__sub_object()  avkon.in(.text)
    • 80728584    0010    CAknViewAppUi::~CAknViewAppUi__deallocating()  avkon.in(.text)

  • R14 (the link register – lr) may also give you a clue
    • e.g. where R14=810b1b59

    • 810b1b56    001c    CCoeEnv::CreateResourceReaderLC(TResourceReader&, int) const  CONE.in(.text)
    • 810b1b72    0074    CCoeEnv::ReadResourceAsDes16(TDes16&, int) const  CONE.in(.text)
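The "closest match" lookup can be sketched in a few lines. This is not printsym itself, only an illustration of the idea: sort the symbol addresses and pick the last one that is not above the register value. The function names are hypothetical and the symbol lines are the ones quoted on this slide.

```python
import bisect

SYMBOL_LINES = [
    "80728498    0000    CAknViewAppUi::~CAknViewAppUi__sub_object()  avkon.in(.text)",
    "80728584    0010    CAknViewAppUi::~CAknViewAppUi__deallocating()  avkon.in(.text)",
    "810b1b56    001c    CCoeEnv::CreateResourceReaderLC(TResourceReader&, int) const  CONE.in(.text)",
    "810b1b72    0074    CCoeEnv::ReadResourceAsDes16(TDes16&, int) const  CONE.in(.text)",
]

def parse_symbols(lines):
    """Return a sorted list of (address, name) from symbol file lines."""
    symbols = []
    for line in lines:
        parts = line.split(None, 2)      # address, second column, name
        if len(parts) < 3:
            continue
        try:
            address = int(parts[0], 16)
        except ValueError:
            continue                     # not a symbol line
        symbols.append((address, parts[2].strip()))
    symbols.sort()
    return symbols

def lookup(symbols, address):
    """Find the closest symbol at or below address.

    Returns (name, offset into the function), or None if the address
    lies below every known symbol.
    """
    addresses = [start for start, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return name, address - start
```

Looking up R15=80728562 lands 0xca bytes into the first CAknViewAppUi entry, and R14=810b1b59 lands just inside CCoeEnv::CreateResourceReaderLC.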
53
What next? - The call stack
  • From the same information as the previous slide
    •  look at R13 and copy that number.
    •  This is the stack pointer - the address of the top of the thread’s stack.

    • Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
    •  R0=00600080  R1=00405ee0  R2=08cc014c  R3=00000000
    •  R4=00619ae8  R5=00000000  R6=00ffffff  R7=ffffffff
    •  R8=00000012  R9=00000040 R10=c808a2e0 R11=00000000
    • R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
    • R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
    • Thread 37, KernCSLocked=0
    • Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
    • Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
    • Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
    • Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
    • Exec::ThreadId
    • Exec::SemaphoreWait
    • Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
    • FAULT: KERN 00000004
54
Yuk! Hex
  • All you need to do now is
    • Type the ‘m’ command into the crash debugger with the address of the stack from R13
    • and dump about 200 bytes of stack - that should be plenty.
    • You can dump the stacks of all threads by using the ‘S’ command
  • Command is…
    • m 00405ee0+200
55
More hex!
  • That will dump some hex and text to your terminal:


    • 00405ee0: 00 00 00 00 00 00 00 00 44 04 77 80 c6 56 00 10 ........D.w..V..
    • 00405ef0: 4c 01 cc 08 e8 9a 61 00 40 04 77 80 50 2a 60 00 L.....a.@.w.P*`.
    • 00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....
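To see how the raw bytes relate to the addresses that later get decoded, the sketch below turns one dump line into four 32-bit words. It assumes each line carries exactly 16 bytes and that the words are stored little-endian, as they are in the sample output. The function name `dump_line_to_words` is hypothetical.

```python
import struct

def dump_line_to_words(line):
    """Convert one 'address: 16 hex bytes ascii' dump line to 32-bit words.

    Returns (line address, [four little-endian words]).
    """
    address_text, _, rest = line.partition(":")
    byte_tokens = rest.split()[:16]       # ignore the trailing ASCII column
    if len(byte_tokens) != 16:
        raise ValueError("expected 16 bytes in: %r" % line)
    data = bytes(int(token, 16) for token in byte_tokens)
    words = list(struct.unpack("<4I", data))
    return int(address_text.strip(), 16), words
```

For the third line of the dump above this yields ffffffff, 807278e3, 00600000 and 00000018. Any word that looks like a code address is a candidate return address to look up in the symbol file.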
56
Decoding the data using printsym
  • Type the following into a Windows command prompt:
57
Warning: stack overflow == KE3
  • Always check whether you’re suffering from a stack overflow
    • A stack overflow will cause otherwise unexplainable KERN-EXEC 3 errors
    • If you can’t get hold of the stack (you see a line like that shown below), it may indicate a stack overflow


      • .m 00414ff8 00415fff
      • Exception: Type 1 Code 80073280 Data 00414ff8 Extra 00000007
58
Checking for stack overflow
  • You need the value of R13 and the thread id
    • Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
    •  R0=00415190  R1=00415190  R2=800e19bc  R3=00000038
    •  R4=7fffffff  R5=00000000  R6=00415028  R7=00000100
    •  R8=00000000  R9=00000040 R10=c808a2e8 R11=00000000
    • R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
    • R13Svc=c931c000 R14Svc=8002014c SpsrSvc=08000010
    • Thread 58, KernCSLocked=0
  • Look up the details of the thread from the output of the ‘i’ or ‘c0’ commands to get the stack base
    • THREAD at c809ae88 VPTR=00000000 AccessCount=3 Owner=c8089e30
    • Full name eiksrvs.exe::KeySoundServerThread
    • <snip>
    • Supervisor stack base c9318000 size 4000
    • User stack base 00415000 size 1000
    • Id=58, Alctr=00600000, Created alctr=00600000, Frame=00415bd4
    • <snip>


59
Checking for stack overflow
  • The stack has overflowed if R13 < stack base
    • Plus: the exception id will indicate a data abort


    • Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
    • R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
    • User stack base 00415000 size 1000
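The comparison can be sketched as a small check over the two pieces of crash debugger output. The function names are hypothetical. The sketch assumes the stack size is printed in hex, which is consistent with the sample (base 00415000 plus size 1000 ends at 00415fff).

```python
import re

def read_r13(register_dump):
    """Extract the user-mode R13 (stack pointer) from the register trace."""
    match = re.search(r"\bR13=([0-9a-fA-F]{8})", register_dump)
    if match is None:
        raise ValueError("no R13 value found")
    return int(match.group(1), 16)

def user_stack_range(thread_info):
    """Return (base, end) of the user stack; size is assumed to be hex."""
    match = re.search(
        r"User stack base ([0-9a-fA-F]+) size ([0-9a-fA-F]+)", thread_info
    )
    if match is None:
        raise ValueError("no user stack line found")
    base = int(match.group(1), 16)
    return base, base + int(match.group(2), 16)

def stack_overflowed(register_dump, thread_info):
    """The stack grows downwards, so R13 below the base means overflow."""
    base, _end = user_stack_range(thread_info)
    return read_r13(register_dump) < base
```

With the values on this slide, R13=00414ff8 is below the stack base 00415000, so the check reports an overflow.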




60
Decoding the stack dump output
  • The output will be similar to the decoded d_exc stack except
    • The top of the printout represents the function which called Panic() (with d_exc you have to find the top)
  • So your output may start with something like this:


    • 00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....


    • = ffffffff ....
    • = 807278e3 .xr.  CAknNoteAttributes::ConstructFromResourceL(TResourceReader&)  avkon.in(.text) + 0x1a1
    • = 00600000 ..`.
    • = 00000018 ....


    • 00405f10: 00 00 60 00 a1 40 0d 80 a8 60 40 00 ce 7b 61 00 ..`..@...`@..{a.


    • = 00600000 ..`.
    • = 800d40a1 .@..  RHeap::Alloc(int)                         euser.in(.text) + 0x8b
    • = 004060a8 .`@.
    • = 00617bce .{a.
61
What if the program is not in ROM?
  • D_EXC output can also be decoded for RAM-based symbols
    • The D_EXC .txt file lists any DLLs which were loaded into RAM (and hence are not present in the ROM symbol file)
    • You have to place the .MAP file of every RAM DLL you are interested in into the same directory as the d_exc trace
    • Run printstk.pl as usual, and it should pick up the addresses correctly
62
Anything else?
  • Now that you know the thread, panic code and have the stack for both application panics and system panics:
    • It gives you a good idea of what functions to put logging in
    • It quickly allows you to see if a defect is a duplicate (if the callstack has already been posted on a previous defect)


63
Debugging on hardware is hard?
  • But with practice it’s a systematic method - not a black art
  • Application and System thread panics cover 90% of application side hardware crashes
    • So learn how to diagnose these first, then worry about more advanced debugging facilities
    • Print out the documentation as you work. It will help you with other kinds of problems and is a good reference
    • Read the other debugging guides to help you understand what is going on under the hood
64
Tips
  • Try these techniques out on a panic you put in the code yourself
    • so that you are confident about the result
    • make sure the call stack matches what you know to be the problem
  • When looking into a defect, it is often enough to find the component which panicked
    • This is what triage may do
    • The component owner may then be able to take over and apply some knowledge, logging etc.
  • Sometimes it can be helpful to make every thread a system thread
    • so all panics go to the debug monitor

65
ROM Symbol file format
  • 5055ced4    004c    CMmPhoneTsy::UpdatePhoneIndicator(RMobileCall::TMobileCallStatus
  • 5055cf20    0054    CMmPhoneTsy::UpdatePhoneIndicator(RMobilePhone::TMobilePhoneRegistr
  • 5055cf74    001c    CMmPhoneTsy::GetSubscriberIdL(TBuf<15> &)
  • 5055cf90    0038    CMmPhoneTsy::CompleteReadNamData(int, TPtrC8)
  • 5055cfc8    0058    CMmPhoneTsy::CompleteProductInfoNumId(TBuf8<50> &)
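A minimal sketch of reading this format follows. It treats the second column as the symbol's size in hex. That reading is inferred from the sample above, where each address plus its second column equals the next address, so treat it as an assumption. The function names are hypothetical.

```python
def parse_symbol_line(line):
    """Split a ROM symbol file line into (address, size, name).

    Both numeric columns are read as hex. Interpreting the second
    column as a size is an inference from the sample data.
    """
    address_text, size_text, name = line.split(None, 2)
    return int(address_text, 16), int(size_text, 16), name.strip()

def contains(entry, address):
    """True if address falls inside the symbol described by entry."""
    start, size, _name = entry
    return start <= address < start + size
```

For example, `parse_symbol_line("5055cf74    001c    CMmPhoneTsy::GetSubscriberIdL(TBuf<15> &)")` gives a start of 5055cf74 and a size of 1c, so the symbol ends exactly where the next entry (5055cf90) begins.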