Over the years, the CSV-generation code became increasingly bloated as we squeezed in extra fields and features, or made changes to accommodate edge cases. Eventually, the CSV files started taking unreasonably long to generate, sometimes on the order of hours for large datasets. Moreover, the code was never optimised for memory usage, since the surveys we were handling when it was written were small by today’s standards. As a result, particularly large surveys could cause trouble for our low-cost parallel-processing servers.

As such, we made the call to strip down, simplify, and completely rewrite the CSV-generation functionality with a focus on speed and memory efficiency. In particular, we chose to remove a number of under-utilised aggregation fields. One example is the list of species detected at a particular camera or site, which is essentially an aggregation of the species labels across all the clusters associated with that camera or site. Users can easily derive this information themselves from the camera ID and species label fields, so it does not make sense for us to provide it explicitly.
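
For users who still need a per-camera species list, the aggregation might look something like the sketch below. It is only an illustration: the column names `camera_id` and `species` are assumptions, as is the layout of one species label per row, so substitute the field names you selected in your own CSV format.

```python
import csv
import io
from collections import defaultdict


def species_per_camera(csv_file, camera_col="camera_id", species_col="species"):
    """Return {camera: sorted list of unique species} from an open CSV file.

    The column names are hypothetical defaults, not the tool's actual headers.
    """
    seen = defaultdict(set)
    for row in csv.DictReader(csv_file):
        label = row[species_col].strip()
        if label:  # skip rows without a species label
            seen[row[camera_col]].add(label)
    return {camera: sorted(labels) for camera, labels in seen.items()}


# Small in-memory sample standing in for a downloaded CSV file.
sample = io.StringIO(
    "camera_id,species\n"
    "cam01,impala\n"
    "cam01,zebra\n"
    "cam02,impala\n"
    "cam01,impala\n"
)
result = species_per_camera(sample)
```

The same function gives a per-site list if `camera_col` is pointed at a site column instead, assuming one is present in the file.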

This philosophy was further extended to the custom-format interface, where we replaced the dual dropdown-menu approach (one menu for the data level, and one for the related information) with a single dropdown menu listing fields with simpler, clearer names. This should significantly speed up the process of specifying a custom CSV format, whilst also making things easier for new users.

Figure 1: The new custom-CSV form.

Finally, we used to provide three different ways to handle images/clusters with multiple species labels: list, columns and rows, with the multiple-column option as the default. However, this option presented a number of roadblocks in the code-optimisation and parallelisation process, because the number of columns it produces varies from one dataset to the next. Moreover, upon reflection we determined that the multi-column format is more difficult for our users to process automatically with scripts, and that it is not in line with best practice for these sorts of files. As such, we dropped support for this option, leaving just the list and row options, with the former as our default. Again, we believe that generating a multi-column format from either of the other two formats is well within reach of our user base, so this should not present a major challenge.
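
If you do still need the old multi-column layout, it can be rebuilt from the list format along the lines of the sketch below. It rests on several assumptions: that the labels sit in a single column called `species`, that they are separated by `;`, and that the output columns should be named `species_1`, `species_2` and so on. All of these are hypothetical, so check them against your own files.

```python
import csv
import io


def list_to_columns(rows, species_col="species", sep=";"):
    """Expand a delimited species list into species_1..species_N columns.

    `species_col` and `sep` are assumed values, not the tool's documented ones.
    """
    rows = list(rows)
    split = [
        [label.strip() for label in row[species_col].split(sep) if label.strip()]
        for row in rows
    ]
    # The column count depends on the longest list in the whole file,
    # so every row has to be read before any row can be written.
    width = max((len(labels) for labels in split), default=0)

    expanded = []
    for row, labels in zip(rows, split):
        new_row = {key: value for key, value in row.items() if key != species_col}
        for i in range(width):
            new_row[f"{species_col}_{i + 1}"] = labels[i] if i < len(labels) else ""
        expanded.append(new_row)
    return expanded


# Small in-memory sample standing in for a list-format CSV file.
sample = io.StringIO(
    "image_id,species\n"
    "img1,impala;zebra\n"
    "img2,lion\n"
)
columns = list_to_columns(csv.DictReader(sample))
```

The need to scan every row before the column count is known is the same variable-width property that made this format awkward to generate in parallel.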

Overall, we believe that most users will benefit from the improved ease of use of the custom CSV generator, along with the significantly reduced generation times, which should be on the order of just a few minutes for most surveys. If you are affected by these changes and find that you are unable to perform the aggregation needed to convert the new CSV files into your organisation’s format, please reach out to us for assistance.