In PlantCV v2, a new metadata processing system was added to allow for flexibility in file naming both within and between experiments and systems. If a white color standard is visible within the image, the user can specify a region of interest. Alternatively, gaussian_blur determines the value of the central pixel by multiplying its own and neighboring pixel values by a normalized kernel and then averaging these weighted values (i.e., image convolution) (Kaehler & Bradski, 2016). PlantCV v2 has added new functions for image white balancing, auto-thresholding, size marker normalization, multi-plant detection, combined image processing, watershed segmentation, landmarking, and a trainable naive Bayes classifier for image segmentation (machine learning). Each channel of the image is scaled relative to the reference maximum. The triangle threshold method uses the histogram of pixel intensities to differentiate the target object (plant) from background by generating a line from the peak pixel intensity (Duarte, 2015) to the last pixel value and then finding the point (i.e., the threshold value) on the histogram that maximizes the distance to that line. The function uses the input mask to calculate a Euclidean distance map (Liberti et al., 2014). Continuous integration provides a safeguard against code updates that break existing functionality by providing a report that shows which tests passed or failed for each build (Wilson et al., 2014). The Otsu, mean, and Gaussian threshold functions in PlantCV are implemented using the OpenCV library (Bradski, 2000). The rotate_img and shift_img functions allow the image to be adjusted so objects are better aligned to a grid pattern. If there is no clustered object in a grid cell, no image is output. The PlantCV metadata processing system is part of the parallelization tool and works by using a user-provided template to process filenames. New functions have been added to PlantCV v2 that enable individual plants from images containing multiple plants to be analyzed. For example, an analysis of leaf data might utilize a larger window size to identify the tips of lobes, whereas smaller window sizes would be able to capture more minute patterns such as individual leaf serrations. The location of landmark points can be used to examine multidimensional growth curves for a broad variety of study systems and tissue types and can be used to compare properties of plant shape throughout development or in response to differences in plant growth environment. (B) The cluster_contours_split_img function was used to split the full image into individual plants. Graphs were produced using Matplotlib v2.0.2 (Hunter, 2007) and ggplot2 v2.2.1 (Wickham, 2009). Consequently, plant phenotyping is widely recognized as a major bottleneck in crop improvement (Furbank & Tester, 2011). Additions or revisions to the PlantCV code or documentation are submitted for review using pull requests via GitHub. When the angle score is calculated for each position along the length of a contour, clusters of acute points can be identified, which can be segmented out by applying an angle threshold. Preliminary evidence from a water limitation experiment performed using a Setaria recombinant inbred population indicates that vertical distance from rescaled leaf tip points identified by the acute_vertex function to the centroid is decreased in response to water limitation and thus may provide a proximity measurement of plant turgor pressure (Figs.
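To make the smoothing and auto-thresholding steps concrete, the short sketch below applies a Gaussian blur and the triangle method using OpenCV directly on a synthetic grayscale image; it is illustrative only and does not reproduce the PlantCV gaussian_blur or triangle_auto_threshold implementations.

    # Sketch: Gaussian smoothing followed by triangle auto-thresholding (OpenCV).
    import cv2
    import numpy as np

    # Synthetic grayscale image standing in for a VIS camera frame
    # (dark background plus a brighter "plant" region).
    rng = np.random.default_rng(0)
    gray = rng.normal(50, 10, size=(200, 200))
    gray[60:140, 60:140] += 100
    gray = np.clip(gray, 0, 255).astype(np.uint8)

    # Convolve with a normalized 5x5 Gaussian kernel (kernel size must be odd).
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Triangle method: a line is drawn from the histogram peak to the last
    # populated bin, and the threshold is the intensity that maximizes the
    # distance to that line.
    thresh_value, binary_mask = cv2.threshold(
        blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE
    )
    print("Automatically selected threshold:", thresh_value)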
The modular structure of the PlantCV package makes it easier for members of the community to become contributors. Second, PlantCV was written in Python, a high-level language widely used for both teaching and bioinformatics (Mangalam, 2002; Dudley & Butte, 2009), to facilitate contribution from both biologists and computer scientists. Steven T. Callen analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, reviewed drafts of the paper. To assess how well the two-class naive Bayes method identifies plant material in comparison to thresholding methods, we reanalyzed Setaria images (Fahlgren et al., 2015) using the naive Bayes classifier and compared the pixel area output to pipelines that utilize thresholding steps (Fig. For example, two classes of features in an image may be visually distinct but similar enough in color that simple thresholding is not sufficient to separate the two groups. As an example, we used images of wheat leaves infected with wheat rust to collect pixel samples from four classes: non-plant background, unaffected leaf tissue, rust pustule, and chlorotic leaf tissue, and then used the naive Bayes classifier to segment the images into each class simultaneously (Fig. However, fully automated segmentation of individual organs such as leaves remains a challenge, due to issues such as occlusion (Scharr et al., 2016). The PlantCV SQLite database schema was simplified so that new tables do not need to be added for every new camera system (Fig. For this module to function correctly, we assume that the size marker stays in frame, is unobstructed, and is relatively consistent in position throughout a dataset, though some movement is allowed as long as the marker remains within the defined marker region of interest. For Type III landmarks, the x_axis_pseudolandmarks and y_axis_pseudolandmarks functions identify homologous points along a single dimension of an object (x-axis or y-axis) based on equidistant point locations within an object contour. Each function has a debugging option to allow users to view and evaluate the output of a single step and adjust parameters as necessary. Finally, the use of a permissive, open-source license (MIT) allows PlantCV to be used, reused, or repurposed with limited restrictions, for both academic and proprietary applications. We would like to thank Melinda Darnell, Leonardo Chavez, Kevin Reilly, and the staff of both the Danforth Center Facilities and Support Services group and the Plant Growth Facility for careful maintenance of the Danforth Center phenotyping facilities. With the cluster_contours_split_img function, a text file with genotype names can be included to add them to image names. The focus of the paper associated with the original release of PlantCV v1.0 (Fahlgren et al., 2015) was not the structure and function of PlantCV for image analysis, but rather an example of the type of biological question that can be answered with high-throughput phenotyping hardware and software platforms. In PlantCV v2, several service integrations were added to automate common tasks during pull requests and updates to the code repository. (B) Example of a classified image. As noted above for the two-class approach, it is important to adequately capture the variation in the image dataset for each class when generating the training text file to improve pixel classification.
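The pixel-area comparison summarized above can be outlined in a few lines of NumPy/SciPy; the sketch below is a hedged illustration with synthetic placeholder masks standing in for the naive Bayes and thresholding pipeline outputs, not the analysis script used for the published comparison.

    # Sketch: compare whole-plant pixel areas from two segmentation approaches.
    import numpy as np
    from scipy import stats

    def pixel_area(mask):
        # Count nonzero (plant) pixels in a binary mask.
        return int(np.count_nonzero(mask))

    # Placeholder masks standing in for per-image outputs of the two pipelines.
    rng = np.random.default_rng(0)
    bayes_masks = [rng.integers(0, 2, size=(100, 100)) for _ in range(20)]
    threshold_masks = [m ^ (rng.random(m.shape) < 0.01) for m in bayes_masks]

    bayes_areas = np.array([pixel_area(m) for m in bayes_masks])
    threshold_areas = np.array([pixel_area(m) for m in threshold_masks])

    # Linear fit between the two area estimates; r**2 summarizes agreement.
    fit = stats.linregress(threshold_areas, bayes_areas)
    print("R^2 between methods:", fit.rvalue ** 2)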
Jupyter compatibility allows users to immediately visualize output and to iteratively rerun single steps in a multi-step PlantCV pipeline, which makes parameters like thresholds or regions of interest much easier to adjust. John G. Hodge and Andrew N. Doust contributed to the research described while working at Oklahoma State University. If images are captured in a greenhouse, growth chamber, or other situation where light intensity is variable, image segmentation based on global thresholding of image intensity values can become unreliable. Malia A. Gehan, Noah Fahlgren, Arash Abbasi, Jeffrey C. Berry, Steven T. Callen, Leonardo Chavez, Max J. Feldman, Kerrigan B. Gilbert, Steen Hoyer, Andy Lin, César Lizárraga, Michael Miller and Monica Tessman contributed to the research described while working at the Donald Danforth Plant Science Center, a 501(c)(3) nonprofit research institute. Additionally, PlantCV v2 includes the acute_vertex function that uses the same chain code-based pseudo-landmark identification algorithm used in the acute function except that it uses an adjustable local search space criterion to reduce the number of angle calculations, which speeds up landmark identification. Kernel density estimation (KDE) is used to calculate a probability density function (PDF) from a vector of values for each HSV channel from each class. There is growing interest among the PlantCV user community to process images with multiple plants grown in flats or trays, but PlantCV v1.0 was built to process images containing single plants. For PlantCV v2, the parallelization framework was completely rewritten in Python using a multiprocessing framework, and the use of Matplotlib was updated to mitigate the issues and processor constraints in v1.0. There does not need to be an object in each of the grid cells. Department of Plant Biology, Ecology, and Evolution, Oklahoma State University; Computational and Systems Biology Program, Washington University in St. Louis; Arkansas Biosciences Institute, Arkansas State University; Arkansas Biosciences Institute, Department of Chemistry and Physics, Arkansas State University; Missouri University of Science and Technology; Department of Plant Biology, University of Georgia; Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska - Lincoln. In other cases, PlantCV can be combined with new and existing tools. The number of rows and columns approximates the desired size of the grid cells. Standards and interoperability: Improved interoperability of PlantCV with data providers and downstream analysis tools will require adoption of community-based standards for data and metadata (e.g., Minimum Information About a Plant Phenotyping Experiment; Ćwiek-Kupczyńska et al., 2016). While segmentation and analysis of whole plants in images provides useful information about plant size and growth, a more detailed understanding of plant growth and development can be obtained by measuring individual plant organs. PlantCV v1.0 required pipeline development to be done using the command line, where debug mode is used to write intermediate image files to disk for each step. If there is a conflict in the number of names and objects, a warning is printed and a correction is attempted.
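The KDE training step can be sketched with SciPy's gaussian_kde; the example below uses synthetic hue samples for two hypothetical classes and is not a reproduction of the plantcv-train.py script.

    # Sketch: estimate per-channel probability density functions (PDFs) for each
    # class from sampled HSV pixel values using kernel density estimation.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(1)
    # Placeholder training samples: hue values (0-179, OpenCV convention).
    samples = {
        "plant": rng.normal(60, 10, size=500).clip(0, 179),
        "background": rng.normal(20, 15, size=500).clip(0, 179),
    }

    # Fit one KDE per class and evaluate it over the full channel range.
    channel_values = np.arange(180)
    pdfs = {label: gaussian_kde(values)(channel_values)
            for label, values in samples.items()}

    # pdfs["plant"][h] approximates P(hue = h | class = plant), which a
    # classifier can later combine across channels.
    print(pdfs["plant"].shape, pdfs["plant"].sum())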
Furthermore, even with methods that adjust for inconsistencies between images (e.g., white balancing and auto-thresholding functions), inconsistent lighting conditions in a growth chamber, greenhouse, or field can still make bulk processing of images with a single workflow difficult. For complex backgrounds (or non-targeted objects), several classes may be required to capture all of the variation. Core PlantCV developers do not filter additions of new functions in terms of perceived impact or number of users but do check that new functions follow the PlantCV contribution guide (see the sections on contributing in the online documentation). After objects are clustered, the cluster_contours_split_img function splits images into the individual grid cells and outputs each as a new image so that there is a single clustered object per image. The naive_bayes_classifier function uses these PDFs to calculate the probability (using Bayes' theorem) that a given pixel is in each class. Kerrigan B. Gilbert prepared figures and/or tables, reviewed drafts of the paper. The pixel area of the marker is returned as a value that can be used to normalize measurements to the same scale. Image blurring, while reducing detail, can help remove or reduce signal from background noise (e.g., edges in imaging cabinets), generally with minimal impact on larger structures of interest. (C) Example of a merged pseudocolored image with pixels classified by the naive_bayes_classifier as background (black), unaffected leaf tissue (green), chlorotic leaf tissue (blue), and pustules (red). This function estimates the vertical distance, horizontal distance, Euclidean distance, and angle of landmark points relative to two reference landmarks (the centroid of the plant object and a centroid localized to the base of the plant). Several updates to PlantCV v2 addressed the need to increase the flexibility of PlantCV to analyze data from other plant phenotyping systems. The size marker function allows users to either detect a size marker within a user-defined region of interest or to select a specific region of interest to use as the size marker. PlantCV functions can be assembled into simple sequential or branching/merging pipelines. The pull request mechanism is essential to protect against merge conflicts, which are sections of code that have been edited by multiple users in potentially incompatible ways. Jeffrey C. Berry, Leonardo Chavez, Andy Lin, César Lizárraga, Michael Miller, Eric Platon, Monica Tessman and Tony Sax contributed reagents/materials/analysis tools, reviewed drafts of the paper. The landmark functions in PlantCV output untransformed point values that can either be directly input into morphometric R packages such as shapes (Dryden & Mardia, 2016) or Morpho (Schlager, 2017), or uniformly rescaled to a 0-1 coordinate system using the PlantCV scale_features function. Combining VIS and NIR camera pipelines also has the added benefit of decreasing the number of steps necessary to process images from both camera types, thus increasing image processing throughput. To develop PlantCV as a sustainable project, we have adopted an open, community-based development framework using GitHub as a central service for the organization of developer activities and the dissemination of information to users.
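Under the naive independence assumption, per-pixel classification amounts to multiplying per-channel likelihoods and selecting the class with the highest posterior; the sketch below illustrates this with made-up PDF lookup tables and is a simplified stand-in for, not a copy of, the naive_bayes_classifier step.

    # Sketch: classify each pixel by combining per-channel class likelihoods
    # under the naive independence assumption (uniform priors). The PDF tables
    # below are synthetic placeholders for KDE-derived PDFs from training.
    import numpy as np

    def classify_pixels(hsv, pdfs):
        # hsv: (H, W, 3) uint8 array; pdfs: {class: {channel: length-256 array}}.
        classes = list(pdfs)
        h, w, _ = hsv.shape
        scores = np.zeros((len(classes), h, w))
        for i, cls in enumerate(classes):
            likelihood = np.ones((h, w))
            for ch in range(3):
                # Look up P(channel value | class) for every pixel at once.
                likelihood *= pdfs[cls][ch][hsv[:, :, ch]]
            scores[i] = likelihood
        # With uniform priors, maximizing the likelihood maximizes the posterior.
        return np.array(classes)[scores.argmax(axis=0)]

    # Synthetic PDFs: the "plant" class favors mid-range hue, "background" low hue.
    bins = np.arange(256)
    plant_hue = np.exp(-((bins - 60) ** 2) / (2 * 15 ** 2))
    bg_hue = np.exp(-((bins - 20) ** 2) / (2 * 15 ** 2))
    uniform = np.full(256, 1 / 256)
    pdfs = {
        "plant": {0: plant_hue / plant_hue.sum(), 1: uniform, 2: uniform},
        "background": {0: bg_hue / bg_hue.sum(), 1: uniform, 2: uniform},
    }

    hsv = np.random.default_rng(2).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
    print(classify_pixels(hsv, pdfs))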
Resizing values are determined by measuring the same reference object in an example image taken from both VIS and NIR cameras (for example, the width of the pot or pot carrier in each image). Furthermore, to decentralize the computational resources needed for parallel processing and prepare for future integration with high-throughput computing resources that use file-in-file-out operations, results from PlantCV pipeline scripts (one per image) are now written out to temporary files that are aggregated by the parallelization tool after all image processing is complete. Machine learning: Our goal is to develop additional tools for machine learning and collection of training data. The inputs required are an image, an object mask, and a minimum distance to separate object peaks. It is important for the training dataset to be representative of the larger dataset. Version 1.0 of PlantCV (PlantCV v1.0) was released in 2015 alongside the introduction of the Bellwether Phenotyping Facility at the Donald Danforth Plant Science Center (Fahlgren et al., 2015). The Plant Image Analysis database currently lists over 150 tools that can be used for plant phenotyping (http://www.plant-image-analysis.org/; Lobet, Draye & Périlleux, 2013). In addition to overall improvements in the organization of the PlantCV project, new functionality includes a set of new image processing and normalization tools, support for analyzing images that include multiple plants, leaf segmentation, landmark identification tools for morphometrics, and modules for machine learning. As noted throughout, we see great potential for modular tools such as PlantCV and we welcome community feedback. Marker peaks calculated from the distance map that meet the minimum distance setting are used in a watershed segmentation algorithm (Van der Walt et al., 2014) to segment and count the objects. For the three example images, the watershed segmentation function was used to estimate the number of leaves. (A) Automatic identification of leaf tip landmarks using the acute and acute_vertex functions (blue dots). The following are short descriptions and sample applications of new PlantCV functions. Therefore, software tools used to process high-throughput image data need to be flexible and amenable to community input. Once images are split, they can be processed like single plant images using additional PlantCV tools (Fig. PlantCV can be used to generate binary masks for the training set using the standard image processing methods and the new output_mask function. This method can likely be used for a variety of applications, such as identifying a plant under variable lighting conditions or quantifying specific areas of stress on a plant. We used 99 training images (14 top view and 85 side view images) from a total of 6,473 images. Otsu's binarization (otsu_auto_threshold; Otsu, 1979) is best implemented when a grayscale image histogram has two peaks, since the Otsu method selects a threshold value that minimizes the weighted within-class variance. We currently use the Pixel Inspection Tool in ImageJ (Schneider, Rasband & Eliceiri, 2012) to collect samples of pixel RGB values used to generate the training text file. An example VIS/NIR dual pipeline to follow can be accessed online (http://plantcv.readthedocs.io/en/latest/vis_nir_tutorial/). Additionally, the identification of landmark points should be repeatable and reliable across subjects while not altering their topological positions relative to other landmark positions (Bookstein, 1991).
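The distance-map and marker-based watershed approach described here can be sketched with SciPy and scikit-image; the example below uses a synthetic mask of two overlapping disks and is not the PlantCV watershed_segmentation implementation.

    # Sketch: count touching objects (e.g., leaves) by watershed segmentation
    # of a Euclidean distance map.
    import numpy as np
    from scipy import ndimage as ndi
    from skimage.feature import peak_local_max
    from skimage.segmentation import watershed

    # Placeholder binary mask: two overlapping disks standing in for two leaves.
    yy, xx = np.mgrid[0:120, 0:120]
    mask = ((yy - 60) ** 2 + (xx - 45) ** 2 < 30 ** 2) | \
           ((yy - 60) ** 2 + (xx - 80) ** 2 < 30 ** 2)

    # Euclidean distance from every foreground pixel to the background.
    distance = ndi.distance_transform_edt(mask)

    # Marker peaks separated by at least `min_distance` pixels.
    peaks = peak_local_max(distance, min_distance=20, labels=mask)
    markers = np.zeros(mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

    # Flood the inverted distance map from the markers, constrained to the mask.
    labels = watershed(-distance, markers, mask=mask)
    print("Estimated object count:", labels.max())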
In addition to the median_blur function included in PlantCV v1.0, we have added a Gaussian blur smoothing function to reduce image noise and detail. Pixel-level segmentation of images into two or more classes is not always straightforward using traditional image processing techniques. An example of one such application is the landmark_reference_pt_dist function. Here we present the details and rationale for major developments in the second major release of PlantCV. For example, identification of petals can be used to measure flowering time, but petal color can vary by species. We found that the plant pixel area calculated by naive Bayes was highly correlated with that calculated from pipelines that use thresholding for both side-view images (R² = 0.99; Fig. 5A) and top-view images (R² = 0.96; Fig. 5B). The Bellwether Phenotyping Facility has both RGB visible light (VIS) and near-infrared (NIR) cameras, and images are captured 1 min apart (Fahlgren et al., 2015). The scale that might be considered high-throughput for root phenotyping might not be the same for shoot phenotyping, which can be technically easier to collect depending on the trait and species. To extend PlantCV beyond quantification of size-based morphometric features, we developed several landmarking functions. (B) Geometrically homologous semi/pseudo-landmarks across both the, Correlation between plant area in pixels (px) detected using thresholding pipelines (. A pipeline can be as long or as short as it needs to be, allowing for maximum flexibility for users working with different imaging systems and analyzing features of seed, shoot, root, or other plant systems. PlantCV contains a library of modular Python functions that can be assembled into simple sequential or branching/merging processing pipelines. Further segmentation can also be done using the average pixel values output (pt_vals) for each pseudo-landmark, which estimates the mean pixel intensity within the convex hull of each acute region based on the binary mask used in the analysis. The field of digital plant phenotyping is at an exciting stage of development where it is beginning to shift from being a bottleneck to having a positive impact on plant research, especially in agriculture. Here, we focus on the software tools required to nondestructively measure plant traits through images. The middle position within each cluster of acute points is then identified for use as a pseudo-landmark (Fig. While creating multiple regions of interest (ROI) to demarcate each area containing an individual plant/target is an option, we developed two modules, cluster_contours and cluster_contours_split_img, that allow contours to be clustered and then parsed into multiple images without having to manually create multiple ROIs (Fig. The crop_position_mask function is then used to adjust the placement of the VIS mask over the NIR image and to crop/adjust the VIS mask so it is the same size as the NIR image. The major challenge with analyzing multiple plants in an image is successfully identifying individual whole plants as distinct objects. We thank Katie Liberatore and Shahryar Kianian for images of wheat (Triticum aestivum L.). Suggestions on how to approach image analysis with PlantCV, in addition to specific tutorials, are available through online documentation (http://plantcv.readthedocs.io/en/latest/analysis_approach/).
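The mask resizing and repositioning described for VIS-to-NIR transfer can be illustrated with OpenCV and NumPy; the scale factor, offsets, and image sizes below are placeholders, and the code is a sketch rather than the crop_position_mask implementation.

    # Sketch: rescale a binary mask made from a VIS image and position it over
    # a smaller NIR image.
    import cv2
    import numpy as np

    vis_mask = np.zeros((480, 640), dtype=np.uint8)   # placeholder VIS-resolution mask
    cv2.circle(vis_mask, (320, 240), 100, 255, -1)    # fake plant region

    nir_shape = (240, 320)          # placeholder NIR image size (rows, cols)
    scale = 0.5                     # determined from a shared reference object
    x_offset, y_offset = 10, 5      # shift needed to align the two camera views

    # Resize the mask, then paste it into an NIR-sized frame at the given offset,
    # cropping anything that falls outside the NIR field of view.
    small = cv2.resize(vis_mask, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_NEAREST)
    nir_mask = np.zeros(nir_shape, dtype=np.uint8)
    h = min(small.shape[0], nir_shape[0] - y_offset)
    w = min(small.shape[1], nir_shape[1] - x_offset)
    nir_mask[y_offset:y_offset + h, x_offset:x_offset + w] = small[:h, :w]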
Here we define high-throughput as thousands or hundreds of thousands of images per dataset. An alternative approach to using a fixed, global threshold for image segmentation is to use an auto-thresholding technique that either automatically selects an optimal global threshold value or introduces a variable threshold for different regions in an image. The vocabulary used can be easily updated to accommodate future community standards. Triangle, Otsu, mean, and Gaussian auto-thresholding functions were added to PlantCV to further improve object detection when image light sources are variable. The plant object is divided into twenty equidistant bins, and the minimum and maximum extent of the object along the axis and the centroid of the object within each bin are calculated. For mean adaptive thresholding, the threshold of a pixel location is calculated as the mean of surrounding pixel values; for Gaussian adaptive thresholding, the threshold value of a pixel is the weighted sum of neighborhood values using a Gaussian window (Gonzalez & Woods, 2002; Kaehler & Bradski, 2016). Suxing Liu and Argelia Lorence contributed to the research described while working at Arkansas State University. Compared to VIS images, NIR images are grayscale with much less contrast between object and background. Systems for collecting image data in conjunction with computer vision techniques are a powerful tool for increasing the temporal resolution at which plant phenotypes can be measured non-destructively. Future releases of PlantCV may incorporate additional strategies for detection and identification of plants, such as arrangement-independent K-means clustering approaches (Minervini, Abdelsamea & Tsaftaris, 2014). PlantCV v2.1 is archived on Zenodo at https://doi.org/10.5281/zenodo.1035894. Both the median and Gaussian blur methods are implemented using the OpenCV library (Bradski, 2000) and are typically used to smooth a grayscale image or a binary image that has been previously thresholded. The term high-throughput is relative to the difficulty of collecting the measurement. To improve the pipeline and function development process in PlantCV v2, the debugging system was updated to allow for seamless integration with the Jupyter Notebook system (http://jupyter.org/; Kluyver et al., 2016). We aim to periodically publish updates, such as the work presented here, to highlight the work of contributors to the PlantCV project. The cluster_contours_split_img function also checks that there are the same number of names as objects. (A) An image produced by cluster_contours in debug mode highlights plants by their cluster group with unique colors on a sequential scale. Mean and Gaussian thresholding are executed by indicating the desired threshold type in the function adaptive_threshold. The mean and Gaussian methods will produce a variable local threshold where the threshold value of a pixel location depends on the intensities of neighboring pixels. A random sample of 10% of the foreground pixels and the same number of background pixels are used to build the PDFs. First, GitHub was used as a platform to organize the community by integrating version control, code distribution, documentation, issue tracking, and communication between users and contributors (Perez-Riverol et al., 2016). Because standards for data collection and management for plant phenotyping data are still being developed (Pauli et al., 2016), image metadata is often stored in a variety of formats on different systems.
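Mean and Gaussian adaptive thresholding are available directly in OpenCV, which is what the sketch below uses; the neighborhood size and offset constant are illustrative choices, not PlantCV defaults.

    # Sketch: mean and Gaussian adaptive thresholding, plus a global Otsu
    # threshold for comparison.
    import cv2
    import numpy as np

    # Placeholder grayscale image.
    gray = (np.random.default_rng(3).random((200, 200)) * 255).astype(np.uint8)

    # Threshold each pixel against a statistic of its 11x11 neighborhood minus a
    # small constant C; the local threshold therefore tracks uneven illumination.
    mean_mask = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, blockSize=11, C=2
    )
    gauss_mask = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blockSize=11, C=2
    )

    # A global Otsu threshold, by contrast, picks one value for the whole image.
    otsu_value, otsu_mask = cv2.threshold(gray, 0, 255,
                                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)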
Computational tools that are flexible and extendable are needed to address the diversity of plant phenotyping problems. When specified a priori, landmarks should be assigned to provide adequate coverage of the shape morphology across a single dimensional plane (Bookstein, 1991). (A) Probability density functions (PDFs) from the plantcv-train.py script that show hue, saturation, and value color channel distributions of four classes estimated from training data. Such semi/pseudo-landmarking strategies have been utilized in cases where traditional homologous landmark points are difficult to assign or poorly represent the features of object shape (Bookstein, 1997; Gunz, Mitteroecker & Bookstein, 2005; Gunz & Mitteroecker, 2013). The cluster_contours function takes as input an image, the contours to be clustered, a number of rows, and a number of columns. (B) Overview of the structure of the SQLite database. For example, if there are large fluctuations in light intensity throughout the day or in plant color throughout the experiment, the training dataset should cover the range of variation. Unit test coverage of the PlantCV Python package is monitored through the Coveralls service (https://coveralls.io/), which provides a report on which parts of the code are covered by existing unit tests. In addition to the code, the PlantCV documentation was enhanced to use a continuous documentation framework using the Read the Docs service (https://readthedocs.org/), which allows documentation to be updated automatically and versioned in parallel with updates to PlantCV. In PlantCV v1.0, image analysis parallelization was achieved using a Perl-based multi-threading system that was not thread-safe, which occasionally resulted in issues with data output that had to be manually corrected. tritici) were acquired with a flatbed scanner. Photo credit: Katie Liberatore and Shahryar Kianian. The triangle_auto_threshold function implements the method developed by Zack, Rogers & Latt (1977). Segmented objects are visualized in different colors, and the number of segmented objects is reported (Fig. To address this, PlantCV v2 contains functions to identify anatomical landmarks based upon the mathematical properties of object contours (Type II) and non-anatomical pseudo-landmarks/semilandmarks (Type III), as well as functions to rescale and analyze biologically relevant shape properties (Bookstein, 1991; Bookstein, 1997; Gunz, Mitteroecker & Bookstein, 2005; Gunz & Mitteroecker, 2013). The watershed_segmentation function can be used to estimate the number of leaves for plants where leaves are distinctly separate from other plant structures (e.g., A. thaliana leaves are separated by thin petioles; Fig. Further documentation for using PlantCV can be found at the project website (http://plantcv.danforthcenter.org/). Scripts used for image and statistical analysis are available on GitHub at https://github.com/danforthcenter/plantcv-v2-paper. The extent of image blurring can be modified by increasing the kernel size (for greater blur) or decreasing it (kernel sizes must be odd; a 3 × 3 kernel is common) or by changing the standard deviation in the X and/or Y directions. Therefore, several functions were added to allow the plant binary mask that results from VIS image processing pipelines to be resized and used as a mask for NIR images.
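The row/column clustering idea can be sketched by assigning contour centroids to grid cells; the example below is a simplified illustration using OpenCV contours and is not the PlantCV cluster_contours implementation.

    # Sketch: assign detected contours to grid cells by centroid position,
    # in the spirit of clustering plants grown in a tray into rows and columns.
    import cv2
    import numpy as np

    def cluster_by_grid(contours, image_shape, nrow, ncol):
        # Return a dict mapping (row, col) grid cells to lists of contour indices.
        height, width = image_shape[:2]
        cells = {}
        for i, cnt in enumerate(contours):
            m = cv2.moments(cnt)
            if m["m00"] == 0:
                continue  # skip degenerate contours
            cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
            cell = (min(int(cy * nrow / height), nrow - 1),
                    min(int(cx * ncol / width), ncol - 1))
            cells.setdefault(cell, []).append(i)
        return cells

    # Placeholder binary image with two blobs in different grid cells.
    mask = np.zeros((200, 300), dtype=np.uint8)
    cv2.circle(mask, (60, 50), 20, 255, -1)
    cv2.circle(mask, (220, 150), 20, 255, -1)
    # OpenCV 4.x return signature: (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    print(cluster_by_grid(contours, mask.shape, nrow=2, ncol=3))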
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. These sixty points located along each axis possess the properties of semi/pseudo-landmark points (an equal number of reference points that are approximately geometrically homologous between subjects to be compared) that approximate the contour and shape of the object (Fig.
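The binning scheme behind these sixty points can be sketched as follows, assuming a binary object mask; this is an illustration of the idea, not the x_axis_pseudolandmarks or y_axis_pseudolandmarks implementation.

    # Sketch: split a binary object mask into twenty equal slices along one axis
    # and record the minimum extent, maximum extent, and center of the object in
    # each slice (20 x 3 = 60 points).
    import numpy as np

    def axis_pseudolandmarks(mask, bins=20):
        rows, cols = np.nonzero(mask)
        top, bottom = rows.min(), rows.max()
        edges = np.linspace(top, bottom + 1, bins + 1)
        points = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (rows >= lo) & (rows < hi)
            if not in_bin.any():
                continue  # empty slice (e.g., a gap in the object)
            xs = cols[in_bin]
            y_mid = (lo + hi) / 2.0
            points.extend([(xs.min(), y_mid), (xs.max(), y_mid), (xs.mean(), y_mid)])
        return np.array(points)

    # Placeholder mask: a filled triangle standing in for a plant silhouette.
    mask = np.tri(100, 100, dtype=bool)
    print(axis_pseudolandmarks(mask).shape)  # up to (60, 2)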