Home

Overview

cydr is an R package that helps automate the cleaning of agricultural yield data. Through four core functions, it lets users identify five common errors in yield data: narrow passes, pass-end turns, speed errors, stop-piles, and residual outliers.

As an open-source R package with predefined defaults and optional customization, cydr lowers the barrier to entry for new users while remaining flexible for experienced ones.

Motivation

Decision-making in agriculture increasingly relies on quantitative analyses of large-scale data generated by sensors on farming machinery such as seeders, sprayers, and combine harvesters. Good decision-making requires valid data, yet automated collection often introduces systematic errors. Yield data is a case in point: it is useful but especially error prone.

Agricultural researchers have developed algorithms for identifying common problems and cleaning data, but these algorithms are implemented in GUI applications, most commonly as Excel macros, and are often one-off implementations. Although this approach has some advantages, it hinders the development of reproducible research in precision agriculture and introduces unnecessary costs. cydr was developed to facilitate reproducible research in precision agriculture and to simplify the cleaning of yield data by identifying and removing some common problems. It aims to be accessible to new users while offering powerful functionality to experienced ones.

To Do

Convenience functions to:

  • Verify compatibility of data frames with cydr
  • Perform all core functions at once
  • Aggregate errors
  • Produce summary statistics
  • Create yield maps

Create a reference dataset

A dataset that records whether each data point is truly erroneous would improve our ability to evaluate cydr’s effects and to investigate the prevalence of false positives and false negatives. This would be especially valuable for errors associated with narrow passes and stop-piles, both of which are easy to identify when observing the operation. Methods of creating this reference dataset may include operator recording of events such as narrow passes and beaver huts, image or video analysis to identify physical features such as stop-piles, or extensive manual data analysis and consultation with producers to determine which observations are valid and which are erroneous.
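Once a reference dataset of this kind exists, false positives and false negatives could be tallied roughly as follows. This is a minimal sketch in base R and is not part of cydr: the data frame, its column names (`truly_erroneous`, `flagged`), and its values are all invented for illustration.

```r
# Hypothetical reference data (all values invented for illustration).
# `truly_erroneous` is the ground truth, e.g. recorded by the operator;
# `flagged` is what a cleaning step reported for the same observation.
reference <- data.frame(
  truly_erroneous = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  flagged         = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Count each of the four possible outcomes.
true_pos  <- sum(reference$flagged & reference$truly_erroneous)
false_pos <- sum(reference$flagged & !reference$truly_erroneous)
false_neg <- sum(!reference$flagged & reference$truly_erroneous)
true_neg  <- sum(!reference$flagged & !reference$truly_erroneous)

# False positive rate: share of valid observations that were wrongly flagged.
false_pos_rate <- false_pos / (false_pos + true_neg)

# False negative rate: share of true errors that were missed.
false_neg_rate <- false_neg / (false_neg + true_pos)

print(c(false_positive_rate = false_pos_rate,
        false_negative_rate = false_neg_rate))
```

Computing the rates separately for each error type (for example, narrow passes versus stop-piles) would show which checks are most in need of tuning.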

Contribute

If you find issues with the current implementation of cydr, or have ideas for new features and functionality, let us know through cydr’s GitHub page.