Toyota distributes its firmware in an undocumented format. My customer, who owns a car of this brand, showed me a firmware file. It begins with the metadata shown below, followed by lines of 32 hexadecimal digits. The owner and other tinkerers would like to be able to check what is inside before installing the firmware: load it into a disassembler and see what it does.
CALIBRATIONΓͺXi ΒΊ
attach.att
ΓΓ[Format]
Version=4
[Vehicle]
Number=0
DateOfIssue=2019-08-26
VehicleType=GUN1**
EngineType=1GD-FTV,2GD-FTV
VehicleName=IMV
ModelYear=15-
ContactType=CAN
KindOfECU=0
NumberOfCalibration=1
[CPU01]
CPUImageName=3F0S7300.xxz
FlashCodeName=
NewCID=3F0S7300
LocationID=0002000100070720
CPUType=87
NumberOfTargets=3
01_TargetCalibration=3F0S7200
01_TargetData=3531464734383B3A
02_TargetCalibration=3F0S7100
02_TargetData=3747354537494A39
03_TargetCalibration=3F0S7000
03_TargetData=3732463737463B4A
3F0S7300forIMV.txt ΒΈNiΒΆm5A56001000820EE13FE2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133E2030133E2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133E2030133E2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133E20911381959FAB0EE9000
81C9E03ADE35CEEEEFC5CF8DE9AC0910
38C2E031DE35CEEEEFC8CF87E95C0920
...
For this particular firmware, he also had a dump of its contents:
0000: 80 07 80 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0010: 80 07 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0030: 80 07 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0040: 80 07 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0050: 80 07 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0070: 80 07 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
0080: E0 07 60 01 2A 06 00 FF | 00 00 0A 58 EA FF 20 00
0090: FF 57 40 00 EB 51 B2 05 | 80 07 48 01 E0 FF 20 00
...
As you can see, the dump contains nothing even close to the strings of hexadecimal digits in the firmware file. The question is: in what format is the firmware distributed, and how can it be decrypted? The owner of the car entrusted me with this task.
Repeating fragments
Let's take a closer look at those hexadecimal lines:
5A56001000820EE13FE2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133E2030133E2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133E2030133E2030133E20301
33E2030133C20EF13FE2030133E20301
33E2030133E20911381959FAB0EE9000
81C9E03ADE35CEEEEFC5CF8DE9AC0910
38C2E031DE35CEEEEFC8CF87E95C0920
...
We see eight repetitions of a run of three E2030133 blocks, which are very similar to the first eight lines of the dump, each ending in 12 zero bytes. Three conclusions can be drawn immediately:
- The first five bytes, 5A56001000, are some kind of header that does not affect the contents of the dump.
- The rest of the content is encrypted in 4-byte blocks, and identical blocks in the dump always correspond to identical blocks in the file (file block → dump block):
E2030133 → 00000000
820EE13F → 80078000
C20EF13F → 80070000
E2091138 → E0076001
1959FAB0 → 2A0600FF
EE900081 → 00000A58
C9E03ADE → EAFF2000
- This is clearly not a simple XOR but something more complex. Even so, similar blocks in the dump correspond to similar blocks in the file: for example, the one-bit change 80078000 → 80070000 in the dump corresponds to a change of only two bits in the file, 820EE13F → C20EF13F.
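The last conclusion can be checked with a few lines of arithmetic. The sketch below uses only block pairs quoted in the list above; the names `PAIRS` and `popcount` are mine and are not part of the original analysis. If the scheme were a fixed XOR mask, every pair would produce the same key.

```python
# (file block, dump block) pairs quoted in the list above
PAIRS = [
    (0xE2030133, 0x00000000),
    (0x820EE13F, 0x80078000),
    (0xC20EF13F, 0x80070000),
    (0xE2091138, 0xE0076001),
]

def popcount(x):
    """Number of set bits in x."""
    return bin(x).count("1")

# A fixed XOR key would give the same value for every pair.
xor_keys = {file_block ^ dump_block for file_block, dump_block in PAIRS}
print(sorted(hex(k) for k in xor_keys))  # several distinct values

# Similar dump blocks still map to similar file blocks.
dump_diff = 0x80078000 ^ 0x80070000
file_diff = 0x820EE13F ^ 0xC20EF13F
print(popcount(dump_diff), popcount(file_diff), hex(file_diff))
```

The four pairs give four different XOR values, which rules out a single fixed mask, while the small file-side difference shows that the transformation is far from a well-diffusing block cipher.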
Correspondences between blocks
Let's get a list of all pairs (file block, dump block), and look for patterns in it:
$ xxd -r -p firmware.txt decoded
$ python
>>> f = open('decoded','rb')
>>> data=f.read()
>>> words=[data[i:i+4] for i in range(0,4096,4)]
>>> f = open('dump','rb')
>>> data=f.read()[:4096]
>>> reference=[data[i:i+4] for i in range(0,4096,4)]
>>> list(zip(words,reference))[:3]
[(b'\x82\x0e\xe1?', b'\x80\x07\x80\x00'), (b'\xe2\x03\x013', b'\x00\x00\x00\x00'), (b'\xe2\x03\x013', b'\x00\x00\x00\x00')]
>>> dict(zip(words,reference))
{b'\x82\x0e\xe1?': b'\x80\x07\x80\x00', b'\xe2\x03\x013': b'\x00\x00\x00\x00', b'\xc2\x0e\xf1?': b'\x80\x07\x00\x00', ...}
>>> decode=dict(zip((w.hex() for w in words), (r.hex() for r in reference)))
>>> decode
{'820ee13f': '80078000', 'e2030133': '00000000', 'c20ef13f': '80070000', ...}
>>> sorted(decode.items())
[('00beb5ff', '4c07a010'), ('02057139', '0000f00f'), ('03ef5ed0', '50ff710f'), ...]
This is what the first pairs look like in the sorted list:
00beb5ff → 4c07a010
02057139 → 0000f00f
03ef5ed0 → 50ff710f \  change in bit 24 in the dump changes bits 8, 10, 24-27 in the file
04ef5bd0 → 51ff710f <
0408ed38 → 14002d06  \
05f92ed7 → ffffd087  |
0a5d22bb → f602dffe   > change in bit 25 in the dump changes bits 11, 25-27 in the file
0a62f9a9 → e10f5761  |
0acdc6e4 → a25d2c06  /
0aef53d0 → 53ff710f <
0aef5cd0 → 52ff710f /  change in bit 24 in the dump changes bits 8-11 in the file
0bdebd6f → 4c57a410
0d0c7fec → 0064ffff
0d0fe57f → 18402c57
0d8fa4d0 → bfff88ff
0ee882d7 → eafd7f00
1001c5c6 → 6c570042 \
1008d238 → 42003e06  > change in bit 1 in the dump changes bits 0, 3, 16-19 in the file
100ec5cf → 6c570040 /
109ec58f → 6c070050
10e1ebdf → 62ff6008
10ec4cdd → dafd4c07
119f0f8f → 08006d57
11c0feee → 2c5f0500
120ff07e → 20420452
125ef13e → 20f600c8
125fc14e → 60420032
126f02af → 02006d67
1281d09f → 400f3488
1281d19f → 400f3088
12a6d0bb → 40073498
12a6d1bb → 40073098 \
12aed0bf → 40073490  > change in bit 3 in the dump changes bits 2 and 19 in the file
12aed1bf → 40073090 /> change in bit 10 in the dump changes bit 8 in the file
12c3f1ea → 20560001 \
12c9f1ea → 20560002 /  changes in bits 0 and 1 in the dump change bits 17 and 19 in the file
...
Indeed, the following patterns are visible:
- Changes to bits 0-3 in the dump change bits 0-3 and 16-19 in the file (mask 000F000F)
- Changes to bits 24-25 in the dump change bits 8-11 and 24-27 in the file (mask 0F000F00)
This suggests a hypothesis: each group of 4 bits in the dump affects only the same 4-bit positions in both 16-bit halves of the corresponding 32-bit block in the file.
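Before moving on, the hypothesis can be tested mechanically on the neighbouring pairs quoted so far. The following is a minimal sketch: `COLUMN_MASKS`, `touched_columns` and `NEIGHBOURS` are illustrative names of my own, and the data is copied from the listings above. For each pair of entries it checks that the difference on the dump side and the difference on the file side fall into the same 4-bit columns.

```python
# Each mask selects one 4-bit column in both 16-bit halves of a block.
COLUMN_MASKS = (0xF000F000, 0x0F000F00, 0x00F000F0, 0x000F000F)

def touched_columns(diff):
    """Return the set of column masks that overlap the set bits of diff."""
    return {m for m in COLUMN_MASKS if diff & m}

# ((file block A, dump block A), (file block B, dump block B)),
# copied from the pairs quoted earlier in the text.
NEIGHBOURS = [
    ((0x820EE13F, 0x80078000), (0xC20EF13F, 0x80070000)),
    ((0x03EF5ED0, 0x50FF710F), (0x04EF5BD0, 0x51FF710F)),
    ((0x04EF5BD0, 0x51FF710F), (0x0AEF53D0, 0x53FF710F)),
    ((0x0AEF53D0, 0x53FF710F), (0x0AEF5CD0, 0x52FF710F)),
    ((0x1001C5C6, 0x6C570042), (0x100EC5CF, 0x6C570040)),
    ((0x12A6D1BB, 0x40073098), (0x12AED1BF, 0x40073090)),
    ((0x12AED0BF, 0x40073490), (0x12AED1BF, 0x40073090)),
    ((0x12C3F1EA, 0x20560001), (0x12C9F1EA, 0x20560002)),
]

results = []
for (file_a, dump_a), (file_b, dump_b) in NEIGHBOURS:
    dump_diff, file_diff = dump_a ^ dump_b, file_a ^ file_b
    ok = touched_columns(dump_diff) == touched_columns(file_diff)
    results.append(ok)
    verdict = "same column" if ok else "MISMATCH"
    print(f"dump {dump_diff:08x} -> file {file_diff:08x}: {verdict}")
```

Every pair in this small sample stays within a single column, which is consistent with the hypothesis but of course does not prove it for the whole file.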
To check, let's "cut off" the most significant 4 bits in each half-block, and see what pairs we get:
>>> ints=[int.from_bytes(w, 'big') for w in words]
>>> [hex(i) for i in ints][:3]
['0x820ee13f', '0xe2030133', '0xe2030133']
>>> scrambled=[((i & 0xf000f000) >> 12, (i & 0x0f000f00) >> 8, (i & 0x00f000f0) >> 4, (i & 0x000f000f)) for i in ints]
>>> scrambled=[tuple(((i >> 16) << 4) | (i & 15) for i in q) for q in scrambled]
>>> scrambled[:3]
[(142, 33, 3, 239), (224, 33, 3, 51), (224, 33, 3, 51)]
>>> [tuple(hex(i) for i in q) for q in scrambled][:3]
[('0x8e', '0x21', '0x3', '0xef'), ('0xe0', '0x21', '0x3', '0x33'), ('0xe0', '0x21', '0x3', '0x33')]
>>> [b''.join(bytes([i]) for i in q) for q in scrambled][:3]
[b'\x8e!\x03\xef', b'\xe0!\x033', b'\xe0!\x033']
>>> decode=dict(zip((b''.join(bytes([i]) for i in q).hex() for q in scrambled), (r.hex() for r in reference)))
>>> sorted(decode.items())
[('025efd97', 'ffffd087'), ('02a25bdb', 'f602dffe'), ('053eedf0', '50ff710f'), ...]
>>> decode=dict(zip((b''.join(bytes([i]) for i in q[1:]).hex() for q in scrambled), (r.hex()[1:4]+r.hex()[5:8] for r in reference)))
>>> sorted(decode.items())
[('018d90', '0f63ff'), ('020388', '200e06'), ('050309', 'c03000'), ...]
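The bit manipulation in the session above is dense, so here is the same regrouping written out as a pair of functions. This is only a sketch; the names `split_columns` and `join_columns` are mine. Byte k of the result combines nibble k of the upper 16-bit half with nibble k of the lower half, which is what the `scrambled` list holds.

```python
def split_columns(word):
    """Regroup a 32-bit word into four bytes, one per nibble column.

    Byte k holds nibble k of the upper 16-bit half in its high 4 bits
    and nibble k of the lower 16-bit half in its low 4 bits.
    """
    hi, lo = word >> 16, word & 0xFFFF
    return tuple(
        (((hi >> shift) & 0xF) << 4) | ((lo >> shift) & 0xF)
        for shift in (12, 8, 4, 0)
    )

def join_columns(cols):
    """Inverse of split_columns: rebuild the 32-bit word."""
    word = 0
    for shift, c in zip((12, 8, 4, 0), cols):
        word |= ((c >> 4) << (16 + shift)) | ((c & 0xF) << shift)
    return word

print([hex(c) for c in split_columns(0x820EE13F)])
# ['0x8e', '0x21', '0x3', '0xef']
print(hex(join_columns(split_columns(0x820EE13F))))
# 0x820ee13f
```

The first printed line matches the first entry of `scrambled` in the session, and the round trip confirms that the regrouping loses no information.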
After regrouping the 4-bit subblocks in the sort key, the correspondences between pairs of subblocks become even more apparent:
018d90 → 0f63ff
020388 → 200e06 \
050309 → c03000 \ | xx0xxx0x xx0xxx3x
05030e → c0f000 | |
05036e → c06000 | /
050c16 → c57042 |
050cef → c57040 |
05971e → c88007 > xCxxx0xx x0xxx5xx
0598ef → c07050 |
05bfef → c07010 |
05db59 → c9000f |
05ed0e → cff000 <
060ecc → 264fff |
065ba7 → 205fff |
0bed1f → 2ff008 <|
0bfd15 → 2ff086 |
0cedcd → afdc07 <|
10f2e7 → e06a7e > xxFxxx0x xxExxxDx
118d5a → 9fdfff | \
13032b → 40010a | > xxFxxxFx xx8xxxDx
148d3d → fff6fc | /
16b333 → f00e30 |
16ed15 → fffe06 /
1b63e6 → 52e883
1c98ff → 400b57 \
1d4d97 → aff1b7 | xx00xx57 xx9Fxx8F
1ece0e → c5f500 |
1f98ff → 800d57 /
20032f → 00e400 \
200398 → 007401 |
2007fe → 042452 |
2020ef → 057490 |
206284 → 067463 > x0xxx4xx x2xxx0xx
20891f → 00f488 |
20ab6b → 007498 | \
20abef → 007490 | / xx0xxx9x xxAxxxBx
20ed1d → 0ff404 |
20fb6e → 0064c0 /
21030e → 00f000 \
21032a → 00b008 |
210333 → 000000 |
210349 → 00c008 |
21034b → 003007 |
210359 → 00000f |
210388 → 000006 > x00xx00x x20xx13x
21038b → 00300b |
210398 → 007001 |
2103c6 → 007004 |
2103d2 → 008000 |
2103e1 → 008009 |
2103ef → 007000 /
...
Correspondences between subblocks
The list above shows the following matches:
- For the mask 0F000F00:
  - x0xxx0xx in dump -> x2xxx1xx in file
  - x0xxx4xx in dump -> x2xxx0xx in file
  - xCxxx0xx in dump -> x0xxx5xx in file
- For the mask 00F000F0:
  - xx0xxx0x in dump -> xx0xxx3x in file
  - xx0xxx5x in dump -> xx9xxx8x in file
  - xx0xxx9x in dump -> xxAxxxBx in file
  - xxFxxx0x in dump -> xxExxxDx in file
  - xxFxxxFx in dump -> xx8xxxDx in file
- For the mask 000F000F:
  - xxx0xxx7 in dump -> xxxFxxxF in file
  - xxx7xxx0 in dump -> xxxExxxF in file
  - xxx7xxx1 in dump -> xxx9xxx8 in file
We can conclude that each 32-bit block in the dump is split into four 8-bit values, and that these values are replaced using lookup tables, one per mask. The contents of these four tables appear fairly random, but let's try to extract all of them from our file:
>>> ref_ints=[int.from_bytes(w, 'big') for w in reference]
>>> ref_scrambled=[((i & 0xf000f000) >> 12, (i & 0x0f000f00) >> 8, (i & 0x00f000f0) >> 4, (i & 0x000f000f)) for i in ref_ints]
>>> ref_scrambled=[tuple(((i >> 16) << 4) | (i & 15) for i in q) for q in ref_scrambled]
>>> decode=dict(zip((b''.join(bytes([i]) for i in q).hex() for q in scrambled), (b''.join(bytes([i]) for i in q).hex() for q in ref_scrambled)))
>>> sorted(decode.items())
[('025efd97', 'fdf0f8f7'), ('02a25bdb', 'fd6f0f2e'), ('053eedf0', '5701f0ff'), ...]
>>> decode=[dict(zip((bytes([q[byte]]).hex() for q in scrambled), (bytes([q[byte]]).hex() for q in ref_scrambled))) for byte in range(4)]
>>> decode
[{'8e': '88', 'e0': '00', 'cf': '80', 'e1': 'e6', '1f': '20', 'c3': 'e2', ...}, {'21': '00', '9a': 'a0', 'e0': '0a', '5e': 'f0', '5d': 'b2', 'c0': '08', ...}, {'03': '00', '5b': '0f', '98': '05', 'ed': 'f0', 'ce': '50', 'd6': '51', ...}, {'ef': '70', '33': '00', '98': '71', '90': '6f', '01': '08', '0e': 'f0', ...}]
>>> decode=[dict(zip((q[byte] for q in scrambled), (q[byte] for q in ref_scrambled))) for byte in range(4)]
>>> decode
[{142: 136, 224: 0, 207: 128, 225: 230, 31: 32, 195: 226, 62: 244, 200: 235, ...}, {33: 0, 154: 160, 224: 10, 94: 240, 93: 178, 192: 8, 135: 2, 62: 1, 120: 26, ...}, {3: 0, 91: 15, 152: 5, 237: 240, 206: 80, 214: 81, 113: 16, 185: 2, 179: 3, ...}, {239: 112, 51: 0, 152: 113, 144: 111, 1: 8, 14: 240, 249: 21, 110: 96, 241: 47, ...}]
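The `dict(zip(...))` one-liner silently keeps the last value seen for each key, so it would hide any block pair that contradicts the lookup-table model. As a cross-check, here is a sketch of a stricter variant that fails loudly on a conflict and reports how much of each 256-entry table the known pairs actually cover. The names `split_columns`, `build_tables` and `PAIRS` are my own, and the sample data is limited to four pairs quoted earlier in the article.

```python
def split_columns(word):
    """Regroup a 32-bit word into four bytes, one per nibble column."""
    hi, lo = word >> 16, word & 0xFFFF
    return tuple(
        (((hi >> shift) & 0xF) << 4) | ((lo >> shift) & 0xF)
        for shift in (12, 8, 4, 0)
    )

def build_tables(pairs):
    """Build four file-byte -> dump-byte tables from (file, dump) word pairs.

    Raises ValueError if two pairs disagree, which would mean the
    per-column lookup-table model does not fit the data.
    """
    tables = [{} for _ in range(4)]
    for file_word, dump_word in pairs:
        columns = zip(tables, split_columns(file_word), split_columns(dump_word))
        for column, (table, src, dst) in enumerate(columns):
            if table.setdefault(src, dst) != dst:
                raise ValueError(
                    f"column {column}: {src:#04x} maps to both "
                    f"{table[src]:#04x} and {dst:#04x}"
                )
    return tables

# A few (file block, dump block) pairs quoted earlier in the article.
PAIRS = [
    (0x820EE13F, 0x80078000),
    (0xE2030133, 0x00000000),
    (0xC20EF13F, 0x80070000),
    (0xE2091138, 0xE0076001),
]

tables = build_tables(PAIRS)
for column, table in enumerate(tables):
    print(f"column {column}: {len(table)} of 256 entries known")
```

With the full 4096-byte sample the coverage figure matters: any table entry that never occurs in the known dump cannot be recovered this way, and blocks that need it would have to be flagged rather than decoded.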
When the lookup tables are ready, the decryption code is quite simple:
>>> def _decode(x):
...     scrambled = ((x & 0xf000f000) >> 12, (x & 0x0f000f00) >> 8, (x & 0x00f000f0) >> 4, (x & 0x000f000f))
...     decoded = tuple(decode[i][((v >> 16) << 4) | (v & 15)] for i, v in enumerate(scrambled))
...     unscrambled = tuple(((i >> 4) << 16) | (i & 15) for i in decoded)
...     return (unscrambled[0] << 12) | (unscrambled[1] << 8) | (unscrambled[2] << 4) | (unscrambled[3])
...
>>> hex(_decode(0x00beb5ff))
'0x4c07a010'
>>> hex(_decode(0x12aed1bf))
'0x40073090'
Firmware header
At the very beginning, before the encrypted data, there was a five-byte header: 5A56001000. The first two bytes, the signature 'ZV', indicate that the LZF format is used; they are followed by the compression method (0x00, no compression) and the length (0x1000 bytes).
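Reading that header takes only a few lines. The sketch below is an illustration, not the tool the owner used: the function name `parse_lzf_header` is hypothetical, and the layout assumed for compressed blocks (type 0x01 followed by a 2-byte compressed length and a 2-byte uncompressed length, all big-endian) is my recollection of the block format written by the `lzf` command-line utility from liblzf, which this article does not confirm. Only the type 0x00 case is backed by the header shown above.

```python
import struct

def parse_lzf_header(buf):
    """Parse one 'ZV' block header.

    Returns (block_type, payload_len, unpacked_len, header_size).
    """
    if buf[:2] != b"ZV":
        raise ValueError("missing 'ZV' signature")
    block_type = buf[2]
    if block_type == 0:
        # Stored block: a single big-endian length, as in 5A56001000.
        (length,) = struct.unpack(">H", buf[3:5])
        return block_type, length, length, 5
    if block_type == 1:
        # Compressed block: layout assumed, see the note above.
        packed, unpacked = struct.unpack(">HH", buf[3:7])
        return block_type, packed, unpacked, 7
    raise ValueError(f"unknown block type {block_type:#04x}")

print(parse_lzf_header(bytes.fromhex("5A56001000")))
# (0, 4096, 4096, 5)
```

The returned header size tells the caller where the encrypted payload starts, which for this file is byte 5.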
The owner of the car, who gave me the files for analysis, confirmed that LZF-compressed data is also found in the firmware. Fortunately, the LZF implementation is open source and fairly simple, so, combined with my analysis, it let him satisfy his curiosity about the contents of the firmware. Now he can make changes to the code, for example to start the engine automatically when the temperature drops below a preset level, so that the car can be used in the harsh Russian winter.