TrapTagger was designed around the most-common camera trap use case – where cameras are placed perpendicular to a game trail. In such a setup, groups of animals tend to pass by the camera and trigger a discrete set of images. These images are then grouped together into an image cluster for easy reference and annotation. Moreover, most animals tend to be relatively close to the camera, where they can be more easily identified in the low-resolution imagery generated by these devices. Finally – and most significantly – these image clusters tend (>99% of the time) to contain only a single species. This allows us to make a powerful simplifying assumption in the name of precision when applying our AI species classifications: there should only be one species per cluster, and any additional species should go to human review because they are likely to be errors.
However, alternative camera-trap setups tend to violate these assumptions – such as cameras set up to monitor waterholes and bait stations. In these scenarios a number of things occur:
- Groups of animals can spend extended periods in front of the camera, triggering vast quantities of images.
- These scenes regularly contain multiple species, with their relative mix waxing and waning as different groups arrive and depart.
- Due to differing group sizes and their position relative to the camera, some species can be eclipsed by others, making it difficult to differentiate their detections from false positives.
- Events are often blurred together due to the overlap of species coming and going.
- Animals can be very far away from the camera – such as on the opposite side of a waterhole.
- In arid locations, the scenes can often be flooded with large quantities of non-focal species like birds.

Figure 1: How many species can you see? (Photo credit: Ongava Research Centre)
Additionally, the animal detector that we use regularly mistakes rocks, man-made objects and branches for animals – especially with our intentionally-conservative thresholds. These false static detections have a number of negative effects: they reduce species-classification accuracy, require users to manually view empty images, and inflate image-level animal counts. To combat this, we introduced automated static-detection flagging and manual human review. This allows us to use a more-conservative threshold that over-suggests potential false detections, with the intention of identifying and removing them more thoroughly. However, in these alternative setups, animals are regularly detected in the same locations – at the water’s edge, or at a bait trap – which in turn triggers significant quantities of false alarms, requiring a lot of human verification.
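To make the idea concrete, here is a minimal sketch of this kind of flagging: a detection is treated as a potential static if a near-identical box recurs across many images from the same camera. The IoU and recurrence thresholds are hypothetical illustrations only – TrapTagger’s actual criteria and thresholds are not described here.

```python
from collections import defaultdict

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2) in normalised coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def flag_static_candidates(detections, iou_threshold=0.8, min_recurrences=10):
    """Flag detections whose box recurs almost unmoved across many images at one camera.

    `detections` is a list of (camera_id, image_id, box) tuples. Both thresholds are
    hypothetical: a stricter setting (higher min_recurrences) flags fewer boxes, which
    is roughly what the waterhole/baited-trap modes described below aim for.
    """
    by_camera = defaultdict(list)
    for camera_id, image_id, box in detections:
        by_camera[camera_id].append((image_id, box))

    flagged = set()
    for camera_id, items in by_camera.items():
        for image_id, box in items:
            recurrences = sum(
                1 for other_id, other_box in items
                if other_id != image_id and iou(box, other_box) >= iou_threshold
            )
            if recurrences >= min_recurrences:
                flagged.add((camera_id, image_id, tuple(box)))
    return flagged
```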
TrapTagger’s latest update was primarily aimed at addressing these alternative camera setups, but a lot of the updates were chosen so as to also benefit the majority of the platform’s users – such as the new clustering rules and the related-cluster workflow. These updates can be summarised as follows, with detailed discussions below:
- The data type for a survey can now be specified at upload time – waterholes, baited traps, or general.
- Waterholes and baited traps use more-strict criteria for identifying false static detections, resulting in less manual review.
- Waterholes can have multiple species automatically classified in a cluster without human review.
- The threshold for the manual review of potential additional species in a cluster for waterholes has also been lowered.
- The maximum cluster size has been changed from 50 images to a 15-minute span.
- The breakdown of long clusters has been switched from being species-classification based to purely based on the presence/absence of animals (up to a maximum of 15 minutes).
- To facilitate the potentially enormous 15 minute clusters that can result (in terms of images and detections/boxes), a number of optimisations were carried out in the various annotation and exploration interfaces.
- A dedicated interface was added for cross-checking clusters from the same site with different labels that occurred within 10 minutes of each other – aiding in the identification of additional species that might have been missed in a cluster.
1. Data-Source Selection
At survey-creation time, users can now select the source of their data:
- General:
  - This category applies to all camera trap setups that don’t fall under the other two categories. It treats data in the same way it has always been treated on the platform, and should be the (conservative) default if you’re unsure.
  - Species Classification: only a single species can be automatically identified. Additional species must be verified manually in the AI check workflow.
  - Static Detections: less-strict criteria, so that static detections are identified more thoroughly.
- Waterholes:
  - This category applies to all camera trap setups where multiple species are regularly expected and anticipated to hang around for extended periods.
  - Species Classification: multiple species can be automatically identified. The threshold for additional species that require manual verification is also reduced.
  - Static Detections: more-strict criteria to reduce the number of false alarms and hence reduce the human verification workload, in exchange for less thorough removal of these detections.
- Baited Traps:
  - This category applies to all camera trap setups where a single species is still anticipated, but animals are expected to hang around in specific areas in the scene.
  - Species Classification: only a single species can be automatically identified. Additional species must be verified manually in the AI check workflow.
  - Static Detections: more-strict criteria to reduce the number of false alarms and hence reduce the human verification workload, in exchange for less thorough removal of these detections.
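As a rough mental model, the three options can be thought of as presets over a handful of pipeline settings. The dictionary below is purely illustrative – the field names and values are hypothetical and are not TrapTagger’s actual configuration.

```python
# Hypothetical summary of how the three data-source options might translate into
# pipeline settings. Field names and values are illustrative only.
DATA_SOURCE_PROFILES = {
    "general": {
        "multi_species_auto_classification": False,  # one species auto-labelled; extras go to AI check
        "additional_species_review_threshold": "default",
        "static_detection_criteria": "lenient",      # over-suggests statics for thorough removal
    },
    "waterhole": {
        "multi_species_auto_classification": True,   # several species can be auto-labelled per cluster
        "additional_species_review_threshold": "lowered",
        "static_detection_criteria": "strict",       # fewer false alarms, less manual review
    },
    "baited_trap": {
        "multi_species_auto_classification": False,
        "additional_species_review_threshold": "default",
        "static_detection_criteria": "strict",
    },
}
```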

Figure 2: The new options in the survey-creation form.
Additionally, many users will notice the addition of the “advanced options” in the survey-creation form. These features themselves are not new, but the choice to prioritise them on this form is. The idea here is that these features are particularly useful for handling waterhole-type data, but previously they had to be enabled in the survey-edit form before the creation of your annotation set – which users often forgot to do. Including them here simply makes the process easier and less error prone for users that make use of them. For those who are unfamiliar with these options:
- Ignore Small Detections: this option hides all detections/boxes that are less than 0.25% of the total image area. This has the effect of hiding all animals that are either too small (birds, rodents etc.) or too far away, allowing you to focus on the large mammals in the foreground of your images.
- Ignore Sky Detections: this option hides all detections/boxes whose bottom occurs in the top third of the image. This has the effect of hiding birds.
Both these options were specifically added to aid in the masking of birds, which can be quite a significant false detection source (in a mammal study) in arid-region waterhole surveys. Significantly, one can switch these options on/off at any time for a particular survey, allowing you to perform a quick pass over the survey with them switched on to generate rapid results, before then switching them off and examining the potential animal clusters that were missed.
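For illustration, both rules can be expressed as a simple filter over a detection’s bounding box, assuming normalised coordinates with the origin at the top-left. This is only a sketch of the behaviour described above, not TrapTagger’s own code.

```python
def is_masked(box, ignore_small=True, ignore_sky=True,
              min_area_fraction=0.0025, sky_fraction=1 / 3):
    """Return True if a detection should be hidden under the advanced options.

    `box` is (x1, y1, x2, y2) in normalised image coordinates, with y increasing
    downwards. The 0.25% area rule and top-third rule come from the post; the
    function itself is just an illustrative sketch.
    """
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if ignore_small and area < min_area_fraction:
        return True   # too small or too far away (birds, rodents, distant animals)
    if ignore_sky and y2 < sky_fraction:
        return True   # bottom of the box sits in the top third of the image: likely a bird
    return False
```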

Figure 3: Birds?! (Photo credit: Ongava Research Centre)
2. Clustering
As before, clusters are simply defined as images taken within a minute of one another at the same site (across all the cameras there). In theory, these images all represent a single trigger event, where somebody only needs to look at a single image to know what is contained within. However, in certain circumstances camera traps can persistently take images over an extended period. To prevent excessively-long clusters in which important information (such as additional species) can be easily hidden away, long clusters have always been split up.
Previously, a cluster was considered too long if it contained more than 50 images (which is pretty rare in normal camera-trapping scenarios). It was then split up based on the species classified within. In other words, if the first set of images appeared to contain impala, then there was a short break in which no animals were detected, and then finally a set of images appeared to contain lions – the cluster would be broken up into those three separate events (up to a maximum of 50 images each). However, this technique tends to struggle when there are many species in each image. It also suffers in line with classifier performance – where there are species that our AI is weak on, regions where we have a less-developed classifier, or simply cases with many distant (small) detections in which a lack of resolution reduces classification performance. Moreover, since cluster splitting was not independent of species classification, the classifier could misclassify the species in a single image and split that image out on its own – and auto-label it incorrectly.
Generally speaking, this technique tended to over-split clusters, which is desirable as it errs on the side of caution (preventing additional species from being missed in the middle of a long cluster). However, in situations where animals mill about in front of the camera for extended periods (like waterholes), this over-splitting created a lot more manual annotation work and failed to define useful trigger events. To better handle these sorts of events, and to decouple species classification from the clustering process, we have chosen to instead perform long-cluster splitting purely on the presence/absence of detections/boxes, where a break in animal continuity results in the cluster being split. Significantly, in the above example, the impala-lion cluster would still be split into three clusters in the same way.
Further, we also changed the definition of a long cluster from one containing more than 50 images to one spanning more than 15 minutes. This results in more consistent clustering despite differences in camera-trap sensitivity and settings. In other words, if a herd of animals mills about at a waterhole for an hour, it will much more consistently be broken down into four 15-minute clusters – which is much more conducive to annotation.
Finally, this approach also aids in the handling of cameras operating in timelapse mode – where images are taken at a regular time interval. In these cases, the images are simply clustered into useful trigger events based on the presence/absence of animals (assuming a time interval of one minute or less).
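Putting the pieces together, the clustering and splitting rules described above could look roughly like the following sketch, which operates on a single site’s chronologically sorted images. It is an illustrative simplification, not TrapTagger’s actual implementation.

```python
from datetime import timedelta

GAP = timedelta(minutes=1)        # images within a minute of each other share a cluster
MAX_SPAN = timedelta(minutes=15)  # a cluster spanning longer than this gets broken up

def cluster_images(images):
    """Group one site's images into clusters and break up long ones.

    `images` is a chronologically sorted list of (timestamp, has_animal) tuples.
    This is only an illustrative sketch of the rules described above.
    """
    # 1. Basic rule: images taken within a minute of one another belong together.
    clusters, current = [], []
    for img in images:
        if current and img[0] - current[-1][0] > GAP:
            clusters.append(current)
            current = []
        current.append(img)
    if current:
        clusters.append(current)

    # 2. Long clusters are split on breaks in animal presence (an empty image
    #    followed by an image containing an animal), and capped at 15 minutes.
    result = []
    for cluster in clusters:
        if cluster[-1][0] - cluster[0][0] <= MAX_SPAN:
            result.append(cluster)
            continue
        sub = [cluster[0]]
        for prev, curr in zip(cluster, cluster[1:]):
            presence_break = (not prev[1]) and curr[1]
            too_long = curr[0] - sub[0][0] > MAX_SPAN
            if presence_break or too_long:
                result.append(sub)
                sub = []
            sub.append(curr)
        if sub:
            result.append(sub)
    return result
```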
3. Related-Cluster Annotation Interface
In normal camera-trap setups, clusters at a particular site/camera tend to be rather sparse in the time domain and hence independent. However, in these waterhole-type scenarios – with constant activity over long unbroken periods, where we have had to rely on the cluster-splitting algorithm to divide up hours of images – the clusters tend to be closely related. As such, we can take advantage of this fact to audit the species labels of such clusters. For example, suppose a 45-minute period of activity was split into three clusters, where the first and third were identified as containing zebra and wildebeest, whilst the middle one was identified as containing only zebra. In such a case, there is a very good chance that the middle cluster also contains wildebeest – and this should be manually checked.

Figure 4: The new related-cluster workflow has been added to the manual-annotation launch menu.
Therefore we have added a new related-clusters workflow to the manual-annotation options available on an annotation set. In this workflow, you are shown all clusters that have a related cluster with differing labels – where a related cluster is one that occurred within 10 minutes at the same site. More specifically, we have used the same interface as the AI check workflow: you are shown the cluster, its current labels and all the labels that it is potentially missing – which you can then individually accept or reject. You are offered all the cluster images to look at, but this can be quite infeasible for particularly large clusters. As such, we pre-determine which images you should look at (and have your browser pre-load them when they are ahead in your work queue), making that decision for you. In particular, these pre-selected images will either be those determined to potentially contain the suggested species (based on the species classifier), or, in the case that no such images exist, 5 equally spaced chronological images from across the cluster. So in theory you only need to look at the pre-selected images – no more and no less (unless you immediately see the suggested species and accept the suggestion). However, should you choose to trudge through the entire cluster, we have also added the option to skip 5 images at a time to ease the process a little.
Although this annotation workflow was added for waterhole-type datasets, it can be quite useful for other types of data (although the number of related clusters will be significantly lower). In particular, it can aid in identifying cases where annotators were forced to label a cluster as unknown, by offering the labels from related clusters where the animal was perhaps better visible. It can also assist in identifying additional species that had been missed in a cluster – especially where the secondary species is in the background (relative to another species in the foreground).
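For reference, the rule for surfacing suggestions in this workflow can be sketched roughly as follows: for each cluster, collect the labels of any cluster at the same site that starts or ends within 10 minutes of it, and offer whichever of those labels the cluster itself is missing. The data structure and function below are hypothetical illustrations of that logic, not TrapTagger’s implementation.

```python
from datetime import timedelta

WINDOW = timedelta(minutes=10)  # related clusters: same site, within 10 minutes

def related_cluster_suggestions(clusters):
    """Suggest labels that each cluster might be missing, based on its neighbours.

    `clusters` is a list of dicts with "site", "start", "end" (datetimes) and
    "labels" (a set of species names). Purely an illustrative sketch.
    """
    suggestions = {}
    for i, cluster in enumerate(clusters):
        missing = set()
        for j, other in enumerate(clusters):
            if i == j or other["site"] != cluster["site"]:
                continue
            # Related if the two clusters occur within 10 minutes of each other.
            gap = max(other["start"] - cluster["end"], cluster["start"] - other["end"])
            if gap <= WINDOW:
                missing |= other["labels"] - cluster["labels"]
        if missing:
            suggestions[i] = missing  # labels the annotator is asked to accept or reject
    return suggestions
```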

Figure 5: The new related-clusters annotation interface is based on the already-existing AI-check workflow and suggests additional species that you might have missed based on related clusters that are from the same site within a 10 minute window. (Photo credit: WildCRU)