In this post I will cover the structure of GoPro's GPMF standard, how to create a GPMF binary, and how to inject it into an mp4 video file.
To understand what will be shown at the end of this post, we need to understand the structure of GPMF. Luckily, GoPro have written extensively on the subject.
A recap on GPMF
What GoPro say:
GPMF – GoPro Metadata Format or General Purpose Metadata Format – is a modified Key, Length, Value solution, with a 32-bit aligned payload, that is both compact, fully extensible and somewhat human readable in a hex editor. GPMF allows for independent creation of new FourCC tags, without requiring central registration to define the contents and whether the data is in a nested structure. GPMF is optimized as a time of capture storage format for the collection of sensor data as it happens.
OK, that’s quite a mouthful. I promise by the end of this post it will all make sense.
Let us start by picking up where the last post left off, the box specification of GPMF:
```
ftyp [type 'mp41']
mdat [all the data for all tracks are interleaved]
moov [all the header/index info]
  'trak' subtype 'vide', name "GoPro AVC", H.264 video data
  'trak' subtype 'soun', name "GoPro AAC", AAC audio data
  'trak' subtype 'tmcd', name "GoPro TCD", starting timecode (time of day as frame since midnight)
  'trak' subtype 'meta', name "GoPro MET", GPMF telemetry
```
Raw telemetry (inside `mdat`)
To begin with, it is worth familiarising yourself with the data that GPMF supports (put another way, the types and values of telemetry that can be recorded).
Above is a snippet of what telemetry the GoPro MAX writes into GPMF.
Note how the first column is titled "FourCC". For example, the data for the 3-axis accelerometer is recorded under "ACCL" in the format Z, X, Y.
Now let's start thinking about representing this as binary…
Here we can see the FourCC value is the Key (so in the previous example, `ACCL`). Each key has a Length and a Value.
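As a first sketch of that Key-Length-Value layout in code (a minimal sketch: `gpmf_klv` and its arguments are my own naming, not part of any GoPro library; the 'ACCL' example assumes the 's' signed 16-bit type character):

```python
import struct

def gpmf_klv(fourcc: bytes, type_char: bytes, sample_size: int, repeat: int, payload: bytes) -> bytes:
    """Pack one GPMF Key-Length-Value item.

    The 8-byte header is: 4-byte FourCC key, 1-byte type character,
    1-byte sample size, and a 2-byte big-endian sample count (together
    the "Length"), followed by the payload zero-padded to 32-bit alignment.
    """
    header = struct.pack(">4scBH", fourcc, type_char, sample_size, repeat)
    padding = (-len(payload)) % 4  # pad the payload to a 4-byte boundary
    return header + payload + b"\x00" * padding

# One 3-axis accelerometer sample ('s' = signed 16-bit, so a 6-byte sample):
accl = gpmf_klv(b"ACCL", b"s", 6, 1, struct.pack(">3h", 1, 2, 3))
```

Note how the 6-byte payload gets 2 bytes of padding, keeping the whole item 32-bit aligned.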
So let's look at an example, starting with GPS data. In GPMF this data is recorded under the key `GPS5`, which can be used to record the values: latitude (degrees), longitude (degrees), altitude (meters), 2D ground speed (meters/second), and 3D speed (meters/second).
Let us assume we start with a GPX file with the following:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="https://github.com/juanirache/gopro-telemetry">
  <trk>
    <name>undefined</name>
    <desc>GPS (Lat., Long., Alt., 2D speed, 3D speed) - [deg,deg,m,m/s,m/s]</desc>
    <src>GoPro Max</src>
    <trkseg>
      <trkpt lat="51.2725595" lon="-0.8459936">
        <ele>84.067</ele>
        <time>2021-09-04T07:25:17.352Z</time>
        <fix>3d</fix>
        <hdop>139</hdop>
        <cmt>altitude system: MSLV; 2dSpeed: 0.094; 3dSpeed: 0.12</cmt>
      </trkpt>
```
If this looks familiar, it’s the output from GoPro Telemetry.
It gives us all the values needed for the `GPS5` payload:
- latitude = 51.2725595 degrees
- longitude = -0.8459936 degrees
- altitude = 84.067 meters
- 2dSpeed = 0.094 meters/second
- 3dSpeed = 0.12 meters/second
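These five values can be packed into a single `GPS5` sample. A minimal sketch, with two assumptions flagged up front: `GPS5` values are stored as signed 32-bit integers ('l' type), and the scale factors below are the divisors typically carried in a sibling `SCAL` item in GoPro files – treat both the helper name `pack_gps5` and the scales as illustrative, not definitive:

```python
import struct

# Assumed scale factors (lat, lon, alt, 2D speed, 3D speed) – the divisors
# commonly seen in the SCAL item accompanying GPS5 in GoPro files.
GPS5_SCALE = (10_000_000, 10_000_000, 1_000, 1_000, 100)

def pack_gps5(lat, lon, alt, speed2d, speed3d):
    """Pack one GPS5 sample: 8-byte KLV header ('l' = int32, 20-byte
    sample size, 1 repeat) followed by the five scaled integers."""
    ints = [round(v * s) for v, s in zip((lat, lon, alt, speed2d, speed3d), GPS5_SCALE)]
    return struct.pack(">4scBH", b"GPS5", b"l", 20, 1) + struct.pack(">5l", *ints)

# The values from the GPX snippet above:
sample = pack_gps5(51.2725595, -0.8459936, 84.067, 0.094, 0.12)
```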
We know from the above that in GPMF these values are recorded under the `GPS5` key.
Telemetry metadata (inside `moov`)
Inside the meta `trak` box you'll see more nested boxes:
```
'moov'
  'trak'
    'tkhd' < track header data >
    'mdia'
      'mdhd' < media header data >
      'hdlr' < ... Component type = 'mhlr', Component subtype = 'meta', ... Component name = "GoPro MET" ... >
      'minf'
        'gmhd'
          'gmin' < media info data >
          'gpmd' < the type for GPMF data >
        'dinf' < data information >
        'stbl' < sample table within >
          'stsd' < sample description with data format 'gpmd', the type used for GPMF >
          'stts' < GPMF sample duration for each payload >
          'stsz' < GPMF byte size for each payload >
          'stco' < GPMF byte offset within the MP4 for each payload >
```
I won't go over `trak`s again; this was covered last week. Today I will focus exclusively on the meta `trak`.
But let us recap just one thing: the boxes in the meta `trak` describe data inside the `mdat` (e.g. duration, track type etc.) and what you can find where.
Let's have a look at our sample video:
* `tkhd` (track header) box: this is the track header data for this track
* `mdia` (track media structure) box: describes the main information of the media data sample of this audio and video track/stream (`trak`), and is a very important box for the player…
  - a description of data inside the `mdat` (e.g. duration, track type etc.) and what you can find where (e.g. `co64`, which provides the offset value to fetch the right data from the `mdat`)
Turning back to the books (aka the mp4 specification) for a second, let's look at the boxes nested in the GPMF meta `trak`.
A deeper look at the `minf` (media info) box
Almost all boxes contain a concept of time (or a stream). The most obvious is the content in the `mdat` box – the audio and imagery in the video is streamed over time. This is the same for other boxes too, including those of interest to this exercise for the metadata streams (more on that next week when I talk a bit more about what is in "a box").
* `mdhd` (media header) box: the overall information of the current audio/video track/stream (`trak`). The box has a duration field and a timescale field. The value of duration/timescale is the duration of the current stream.
* `hdlr` (handler) box: used to specify the type of the stream (in this case always `GoPro MET`)
* `minf` (media info) box: media information container. This is where the main telemetry is housed...
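The duration/timescale arithmetic in the `mdhd` bullet is easy to verify in code. A minimal sketch, assuming a version-0 `mdhd` body (field offsets per the mp4 specification; `mdhd_duration_seconds` is my own helper name):

```python
import struct

def mdhd_duration_seconds(mdhd_body: bytes) -> float:
    """Duration of a stream from a version-0 'mdhd' box body:
    1-byte version + 3-byte flags, two 4-byte timestamps, then the
    4-byte timescale and 4-byte duration fields."""
    assert mdhd_body[0] == 0, "64-bit (version 1) mdhd not handled in this sketch"
    timescale, duration = struct.unpack(">II", mdhd_body[12:20])
    return duration / timescale

# A fabricated body: zeroed version/flags and timestamps,
# timescale 1000 ticks/second, duration 10000 ticks = 10 seconds.
body = bytes(12) + struct.pack(">II", 1000, 10_000)
seconds = mdhd_duration_seconds(body)  # 10.0
```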
In the case of GPMF, `minf` contains 3 nested boxes:
In the sample table box `stbl`, you can find the corresponding sample description information (`stsd`), timing information (`stts`), sample size information (`stsz`), chunk location information (`stco`), and GoPro metadata (`gpmd`). `stsd`, `stts`, `stsz` and `stco` are all box types in the mp4 specification; `gpmd` is custom.
Writing values into boxes
Let's start with a simple example:
```
'stbl' < sample table within >
  'stsd' < sample description with data format 'gpmd', the type used for GPMF >
  'stts' < GPMF sample duration for each payload >
  'stsz' < GPMF byte size for each payload >
  'stco'
```
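To make "writing values into boxes" concrete, here is one way the `stsz` and `stco` boxes could be assembled. A minimal sketch following the version-0, 32-bit layouts in the mp4 specification; `full_box`, `make_stsz`, and `make_stco` are my own helper names:

```python
import struct

def full_box(box_type: bytes, body: bytes) -> bytes:
    """Wrap a body as a version-0 'full box': 4-byte size, 4-byte type,
    1-byte version + 3-byte flags (all zero here), then the body."""
    body = b"\x00\x00\x00\x00" + body
    return struct.pack(">I4s", 8 + len(body), box_type) + body

def make_stsz(sample_sizes):
    # A sample_size field of 0 signals that a per-sample size table follows.
    body = struct.pack(">II", 0, len(sample_sizes))
    return full_box(b"stsz", body + struct.pack(f">{len(sample_sizes)}I", *sample_sizes))

def make_stco(chunk_offsets):
    # Entry count, then one 32-bit file offset per chunk of payload data.
    body = struct.pack(">I", len(chunk_offsets))
    return full_box(b"stco", body + struct.pack(f">{len(chunk_offsets)}I", *chunk_offsets))
```

For GPMF, each `stsz` entry would hold the byte size of one telemetry payload, and each `stco` entry the byte offset of that payload within the mp4.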
32-bit aligned payload
If you open up the binary file (the example in the last post), you will see each line is 32 characters long.
The starting point for our script will be the end of the video/audio binary data, as we append the telemetry binary to the end of this data.
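That step might look like this (a minimal sketch: the helper name and approach are my own, and real injection also needs the `moov` metadata – notably the `stco` offsets – updated to point at the appended bytes):

```python
import os

def append_payload(mp4_path: str, payload: bytes) -> int:
    """Append the telemetry binary to the end of the file and return the
    byte offset it landed at – the offset the meta trak's 'stco'
    entries would later need to point to."""
    with open(mp4_path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)  # current end of the file
        f.write(payload)
    return offset
```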
The GPMF-defined `trak`s in the `moov` box map to streams (e.g. 0:0, 0:1, 0:2) in ffprobe – each stream in the `mdat` has its own corresponding metadata `trak`. The `tmcd` `trak` shown in the GPMF spec is missing from the sample video (as it is not applicable to this video type – I've only seen it in unprocessed GoPro .360 video files).
There can be a varying number of `trak` boxes depending on how the video is created. For example, if there is only video and no audio/metadata track then there would be only one `trak` box present.
Another point to note is that there might be more than one stream of the same type depending on the video mode. For example, here is the ffprobe output for a .360 file (note, .360s follow the mp4 standard, and thus are mp4s in all but name), which contains two video streams in the `mdat`:
```
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 3072x1536 [SAR 1:1 DAR 2:1], 38042 kb/s, 59.94 fps, 59.94 tbr, 600 tbn (default)
Stream #0:1[0x2](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 3072x1536 [SAR 1:1 DAR 2:1], 38042 kb/s, 59.94 fps, 59.94 tbr, 600 tbn (default)
```
Thus this video will have two `trak` boxes in the `moov` box to describe the metadata of the two video streams in the `mdat`.
I will go into GPMF in the next post, but back to the main point I am trying to make: the metadata for the telemetry (not the telemetry itself) is housed in the `moov` box. In the `moov` box are nested `trak` boxes for the streams held in the `mdat`:
```
├── b'mdat'
└── b'moov'
    ├── b'mvhd'
    ├── b'trak' (vide)
    │   └── ...
    ├── b'trak' (soun)
    │   └── ...
    └── b'trak' (meta)
        └── ...
```
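A tree like the one above can be produced by walking the 4-byte size / 4-byte type header that starts every box (a minimal sketch: `walk_boxes` is my own helper and only lists top-level boxes, not the nested ones):

```python
import struct

def walk_boxes(buf: bytes):
    """Yield (type, offset, size) for each top-level box: every box
    starts with a 4-byte big-endian size followed by a 4-byte type."""
    offset = 0
    while offset + 8 <= len(buf):
        size, box_type = struct.unpack(">I4s", buf[offset:offset + 8])
        if size == 1:  # 64-bit "largesize" follows the type field
            size = struct.unpack(">Q", buf[offset + 8:offset + 16])[0]
        if size < 8:   # size 0 ("to end of file") or corrupt – stop the sketch here
            break
        yield box_type, offset, size
        offset += size

# Two fabricated, empty boxes to demonstrate:
data = struct.pack(">I4s", 8, b"mdat") + struct.pack(">I4s", 8, b"moov")
types = [t for t, _, _ in walk_boxes(data)]  # [b'mdat', b'moov']
```

Recursing into container boxes like `moov` works the same way, since children are just boxes packed inside the parent's payload.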
Writing data into 1) nested boxes inside the `trak` (meta) box and 2) raw telemetry in the `mdat` is what we're interested in for this exercise.
I'll leave the raw telemetry (`mdat`) aside for a minute.
As noted, more than one `trak` box (aka atom) can exist – in our sample video there are three.