GPX is a popular XML format for running or cycling tracks with geocoordinates. This is a how-to for cleaning up a GPX file by removing unwanted or privacy-sensitive information.
Many apps that record workout routes and can export them as GPX files include more data than the plain GPS coordinates. For instance, a GPX file from my favorite recording app, Guru Maps, looks like this:
<?xml version="1.0" encoding="utf-8"?>
<gpx version="1.1" creator="Guru Maps/4.5.2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.topografix.com/GPX/1/1"
    xmlns:gom="https://gurumaps.app/gpx/v2"
    xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd https://gurumaps.app/gpx/v2 https://gurumaps.app/gpx/v2/schema.xsd">
  <trk>
    <name>Barnimer Dörferweg</name>
    <type>TrackStyle_FF7F00C8</type>
    <trkseg>
      <trkpt lat="52.6254614634" lon="13.4092010169">
        <ele>54.238586451</ele>
        <time>2020-05-10T05:30:38.997Z</time>
        <hdop>4.6875</hdop>
        <vdop>3.375</vdop>
        <extensions>
          <gom:speed>5.5661926282</gom:speed>
          <gom:course>329.1938658731</gom:course>
        </extensions>
      </trkpt>
      …
      <!-- 1000s of track points -->
This track includes the following properties for each track point:
- Geocoordinates (latitude and longitude)
- Horizontal and vertical dilution of precision (hdop/vdop)
- Current speed
- Current course/heading
In addition, Guru Maps uses the track's <type> element to encode the color of the track as displayed in the app in a non-standardized format.
Some apps also include heart rate or other fitness measurements.
All this data is useful for archiving tracks or importing them into another app. But before sharing this track publicly, I'd want to clean the data up first:
- The only truly essential pieces of information are the coordinates and possibly the elevation.
- Timestamps are private data. I don't want to share these.
- The other measurements are mostly irrelevant.
GPX files can become quite large (1000s of track points is common), so reducing the amount of data is also good for file sizes and parsing performance.
One optional processing step uses xmllint, which comes preinstalled on macOS.
XSLT file for removing unused namespaces
Original source: Dimitre Novatchev on Stack Overflow.
Running the command
Assuming your source file is named input.gpx and the XSLT file you downloaded above is in the current directory, this is the full command to process the GPX file and save the result to output.gpx:
xmlstarlet ed \
    -d "//_:extensions" \
    -d "/_:gpx/_:metadata/_:time" \
    -d "/_:gpx/_:trk/_:type" \
    -d "//_:trkpt/_:time" \
    -d "//_:trkpt/_:hdop" \
    -d "//_:trkpt/_:vdop" \
    -d "//_:trkpt/_:pdop" \
    -u "/_:gpx/@creator" -v "Shell script" \
    input.gpx \
  | xmlstarlet tr remove-unused-namespaces.xslt - \
  | xmlstarlet ed \
    -u "/_:gpx/@xsi:schemaLocation" \
    -v "http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" \
  | xmllint --c14n11 --pretty 2 - \
  > output.gpx
This sequence performs the following steps:
- Delete all <extensions> elements.
- Delete the timestamp from the file's <metadata> section if present.
- Delete the <type> element from the track.
- Delete the <time>, <hdop>, <vdop>, and <pdop> elements from all track points.
- Set the file's creator attribute to "Shell script".
- Now that extension fields are gone, remove all unused XML namespaces from the file header.
- Delete all xsi:schemaLocation entries except the one for the GPX schema.
- Run the file through xmllint for formatting. The --c14n11 option performs XML Canonicalization (C14N). Among many other things, canonicalization replaces numeric character entities in the XML with their normal Unicode characters, which is important for my use case.
  For example, the text “D&#246;rferweg” in the source would become “Dörferweg”. I found that some of the tools I use insert non-ASCII characters as numeric codes and other tools don't display these correctly.
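To see the effect of canonicalization in isolation, here is a minimal sketch that pipes a tiny document through xmllint. The function name canonicalize_xml and the sample element are made up for this illustration; the --c14n11 option and reading from standard input via - are taken from the command above.

```shell
# Canonicalize an XML document read from standard input.
# Numeric character references such as &#246; are written out
# as literal UTF-8 characters in the canonical form.
canonicalize_xml() {
  xmllint --c14n11 -
}
```

Running `printf '<name>D&#246;rferweg</name>' | canonicalize_xml` should print the element with a literal “ö” in place of the numeric code.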
The processed GPX file looks like this:
<gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" creator="Shell script" version="1.1" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <trk>
    <name>Barnimer Dörferweg</name>
    <trkseg>
      <trkpt lat="52.6254614634" lon="13.4092010169">
        <ele>54.238586451</ele>
      </trkpt>
      <trkpt lat="52.6255090307" lon="13.4091548326">
        <ele>53.9600219977</ele>
      </trkpt>
      …
The processing steps above are the ones that work for me given the apps I use. Your mileage may vary if your tools add other data to your GPX files. Feel free to edit the command accordingly. XmlStarlet uses XPath syntax to select which elements to operate on. The xmlstarlet sel command is useful for inspecting a source file and trying out the required XPath incantations.
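As a rough sketch of that kind of inspection, the two helper functions below wrap queries that are handy before editing the command. The function names are hypothetical; the `_` prefix is the same shortcut for the document's default namespace that the command above relies on, and xmlstarlet el is a second subcommand (not used in the command above) that lists the structure of a file.

```shell
# List every distinct element path in a GPX file,
# to see which fields an app has exported.
gpx_elements() {
  xmlstarlet el -u "$1"
}

# Count the track points in a GPX file.
gpx_count_trackpoints() {
  xmlstarlet sel -t -v "count(//_:trkpt)" -n "$1"
}
```

For example, `gpx_elements input.gpx` shows whether a file contains paths such as gpx/trk/trkseg/trkpt/extensions that you may want to add to the list of deletions.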
Finally, it's a good idea to validate the processed GPX file against the official GPX schema:
xmlstarlet val --quiet --err --xsd http://www.topografix.com/GPX/1/1/gpx.xsd output.gpx
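Schema validation confirms the file is well-formed GPX, but it does not tell you whether the private data is really gone. A quick extra check, sketched here under the assumption that timestamps and extension fields are what you want removed, is to count the leftover elements. The function name gpx_is_clean is hypothetical.

```shell
# Succeed only if the file contains no <time> or <extensions> elements.
gpx_is_clean() {
  leftover=$(xmlstarlet sel -t -v "count(//_:time | //_:extensions)" "$1")
  [ "$leftover" = "0" ]
}
```

You could then run `gpx_is_clean output.gpx && echo "safe to share"` before publishing the file.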