Min Joh (a), Surjith Kumaran (a), Younseo Shin (a), Hyunji Cha (a), Euna Oh (a), Kyu Hyoung Lee (b) and Hyo-Jick Choi* (a)
(a) Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada. E-mail: hyojick@ualberta.ca
(b) Department of Materials Science and Engineering, Yonsei University, Seoul 03722, Republic of Korea
First published on 20th January 2025
Non-destructive color sensors are widely applied for rapid analysis in various biological and healthcare point-of-care applications. However, existing red, green, blue (RGB)-based color sensor systems, which rely on conversion to human-perceptible color spaces such as hue, saturation, lightness (HSL); hue, saturation, value (HSV); cyan, magenta, yellow, key (CMYK); and CIE L*a*b* (CIELAB), exhibit limitations compared to spectroscopic methods. The integration of machine learning (ML) techniques presents an opportunity to enhance data analysis and interpretation, enabling insight discovery, prediction, process automation, and decision-making. In this study, we integrated four different regression models with an RGB sensor for colorimetric analysis. Colorimetric protein concentration assays, such as the bicinchoninic acid (BCA) assay and the Bradford assay, were chosen as model studies to evaluate the performance of the ML-based color sensor. Leveraging regression models, the sensor effectively interprets and processes color data, facilitating precise color detection and analysis. Furthermore, the incorporation of diverse color spaces enhances the sensor's adaptability to various color perception models, promising precise measurement and analysis capabilities for a range of applications.
Biologists and chemists often utilize color to track reactions of interest. However, due to the lower accuracy and precision of visual colorimetry, instruments such as spectrophotometers are typically needed for quantitative data. Spectrophotometers provide more accurate color change resolution and are preferred for concentration determination applications.8 These colorimetric tests use spectroscopic absorbance measurements and a calibration curve to determine analyte concentration accurately.9 However, visual colorimetry methods encounter significant operational limitations on-site, including user interpretation errors and environmental inconsistencies, leading to unreliable outcomes. Currently, digital cameras, smartphones, and scanners can be used for image capture, but the accuracy of their color information is affected by ambient light changes and built-in automatic image correction.10,11 Traditional human visual assessment is limited, as the human eye can struggle to discern subtle changes accurately and consistently, especially when the changes indicate the presence of small amounts of an analyte. Human error, combined with external factors like lighting, temperature, and sample distance, complicates the reliability of these readings. Converting these color changes into quantitative data is challenging because the derived color indices are subjective and error-prone.12
Using controller boards is deemed a simpler alternative for building automated systems for rapid execution.13 Color sensors offer greater quality, portability, do-it-yourself (DIY) capabilities, and cost-effectiveness compared to spectrophotometers, encouraging their adoption over more expensive spectrometers.14 In many colorimetric procedures, color information is often described using color spaces like red, green, blue (RGB).15 However, RGB has been found to be complex in terms of human perception.16 RGB lacks perceptual uniformity and intuitive control over hue, saturation, and brightness. Additionally, since RGB values depend on device characteristics, colors displayed on various devices can differ, even when using identical RGB values. Accurate readings and reproducible assessment of colors, both qualitatively and quantitatively, are crucial.17 Perceptually uniform spaces like cyan, magenta, yellow, key (CMYK), CIELAB (L*a*b*) and approximately uniform spaces such as hue, saturation, lightness (HSL) and hue, saturation, value (HSV) offer more intuitive color representation, where measured differences reflect human perception. HSV and HSL separate hue from intensity, aiding in color recognition, while CIELAB provides high accuracy for colorimetric analysis. Combining RGB's practicality with the perceptual advantages of other color spaces ensures efficient and accurate color quantification. Colorimetric sensors have individual sensors for red, green, and blue, each detecting a specific range of light wavelengths. These sensors operate within a frequency range of 2 Hz to 500 kHz and convert the detected values into a scale from 0 to 255. By merging these values, the appropriate color code can be obtained.18 To mitigate uncertainties in human vision, digital colorimetry requires image calibration algorithms for the visible color space.
Converting the RGB system to human-perceptible color spaces like HSL or HSV addresses this limitation by ensuring linear changes in chroma or color intensity.19 Additionally, the implementation of machine learning (ML) techniques can enhance data analysis and interpretation by uncovering insights, making predictions, automating processes, and facilitating decision-making, especially with non-linear data.20 It has been reported that ML algorithms excel in classifying, discriminating, and predicting unknown samples by uncovering latent patterns within voluminous, noisy, or intricate datasets.21 Thus, by leveraging ML approaches, colorimetric sensor devices have been devised to offer competitive accuracy, low-cost, convenient, non-destructive methods, and enhanced colorimetric assays for chemical and biological applications. The advantages of ML algorithms include adaptability to different settings, making them applicable to sensors for real-time analysis.22,23 Despite advancements using ML technologies, color sensors still fall short of human color processing capabilities. Key drawbacks include slow speed, low identification efficiency, poor real-time performance, and limited regression models. Hence, achieving optimal performance in color sensor devices requires prioritizing precise color recognition, as the accuracy and reliability of their data depend on it.
In this work, we used an ML algorithm to replicate the human ability to recognize patterns and applied it to various color models, some based on human perception, including RGB, HSL, and HSV, as well as CMYK and CIELAB. Considering the closer alignment of the HSL model with human perception, it is logical to explore whether ML algorithms could benefit from adopting this color model. By utilizing the HSL model, the fabricated color sensor device takes an intuitive, human-aligned approach to identifying subtle color changes. To demonstrate the effectiveness of the HSL model in detecting saturation and hue differences, it was tested for accurate prediction of protein concentration. To this end, conventional protein assays such as the bicinchoninic acid (BCA) and Bradford assays were compared with image-based colorimetric measurement in the HSL color space using ML algorithms. Currently, widely employed colorimetric protein assay techniques require spectrophotometers, limiting their versatility. However, image-based colorimetric detection has emerged as a cost-effective alternative for field applications compared to traditional methods such as spectrophotometry, colorimetry, and fluorometry.24 Recognized for its ability to perform both qualitative and quantitative protein analysis, this method is widely considered one of the most promising approaches in protein assays. Given their biological significance, accurate methods for detecting, identifying, and quantifying proteins are routinely employed for diagnostic purposes in clinical settings, including proteomics, UV-vis spectrometry, electrophoresis, and immunoblotting.25 Paving the way for advancements in ML, this approach holds the potential to extend beyond the conventional RGB model, aligning more closely with human perception and interpretation of color.
ML models offer a significant advantage in image-based colorimetric detection: resilience against unwanted variations.26 Current methods for detecting color changes often rely on identifying a single type of change using regression models like linear or logistic regression. In this study, we utilize four machine learning models, namely random forest regressor (RFR), gradient boosting regressor (GBR), support vector regressor (SVR), and multi-layer perceptron (MLP), tailored for different datasets, with the aim of enhancing prediction accuracy by capturing linear correlations between dependent variables. MLPs, a class of artificial neural networks (ANN), have been utilized to model the CIELAB color space due to their capacity to manage non-linear relationships and complex interactions.27 RFR combines decision trees for high accuracy but can be resource intensive; it has achieved over 90% accuracy in predicting peroxide.28 GBR sequentially corrects errors, offering high accuracy for complex patterns, with prediction errors of 10–20% in dye concentration estimation.29 SVR excels at modeling nonlinear relationships in color spaces like CIELAB, achieving mean absolute percentage error (MAPE) below 12% and low RMSE for chromium(VI) and iron(III).30 MLPs use deep neural layers to model complex relationships, delivering near-human precision in color reconstruction and outperforming traditional methods.31 Thus, multiple machine learning models help reduce prediction error, while lower ensemble complexity aids in reducing computational demands.32 Additionally, the ML framework introduces an end-to-end processing pipeline for color detection using non-RGB color spaces such as HSL, HSV, CMYK, and CIELAB, enabling rapid detection and adaptation to diverse colorimetric applications, including detecting chemical analytes and biological assays.
The schematic of the Raspberry Pi-connected RGB device and color palette for BCA and Bradford assays is in Fig. S1.† The sensor's response to light was evaluated by fixing an LED light source to a 3D-printed support for the 96-well plate, ensuring direct radiation onto the sensor. The RGB output values from the sensor, derived from colored albumin solutions, were used to develop an algorithm for reporting RGB sensor outputs. Subsequently, coding tasks were performed, and the generated code was applied to the device. The resulting data was then compared with spectrophotometry readings for validation.
For the Bradford assay, 50 μL of standard containing BSA was prepared and placed into a 96-well plate. Then, 150 μL of Bradford assay reagent (Bio-Rad, Hercules, CA) was added to the protein standard and incubated for 10 minutes at room temperature. The absorbance was measured at 595 nm using a plate reader. The Bradford assay mechanism relies on the binding of Coomassie Brilliant Blue G-250 dye to proteins. The dye (absorption maximum at 465 nm) interacts with proteins through hydrophobic interactions (involving residues such as phenylalanine and tryptophan) and electrostatic interactions, specifically between the sulfonic groups of the dye and positively charged groups such as the guanidino group of arginine. Upon protein binding, the dye's absorption maximum shifts from 465 nm to 595 nm. The intensity of the absorption at 595 nm is directly proportional to the protein concentration in the solution, making it a reliable method for protein quantification. The schematic Bradford assay mechanism is shown in Fig. S3.†
Color analyses involved a database of protein concentrations using 216 BCA assay samples (0 to 200 μg mL−1). Results were evaluated with calibration curves (see Fig. S5†). The TCS3200 sensor collected RGB frequencies of samples with known and unknown protein concentrations, which were automatically stored in a database. To build the regression models, we utilized a set of machine learning algorithms from the scikit-learn library. Specifically, we constructed four models: RFR, GBR, SVR, and MLP. These models were organized into a Python dictionary to facilitate efficient management, model comparison, and iteration during training. By iterating over combinations of parameters for each of the four machine learning models, the program automatically finds the best parameter set to use for each model. Using 20% of the dataset as test samples for each protein concentration measurement, the program returns metric scores, which include mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R²) scores, to interpret the results and select the best model. The program also generates prediction and residual plots to visualize the results.
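The workflow described above (a dictionary of four scikit-learn regressors, an automatic parameter search, a 20% hold-out test set, and MAE/MSE/RMSE/R² scoring) can be sketched as follows. This is a minimal illustration and not the program used in this work: the data are synthetic stand-ins, the hyperparameter grids are hypothetical, and the use of `GridSearchCV` with 3-fold cross-validation and `StandardScaler` pipelines are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data, NOT the measured dataset: three color channels
# that darken linearly with concentration, plus Gaussian noise.
rng = np.random.default_rng(0)
conc = np.tile(np.arange(0.0, 201.0, 4.0), 4)  # 4 x 51 = 204 samples
X = np.column_stack([220 - 0.5 * conc, 200 - 0.7 * conc, 230 - 0.2 * conc])
X = X + rng.normal(0.0, 3.0, X.shape)

# Hold out 20% of the samples for testing, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, conc, test_size=0.2, random_state=0
)

# Model dictionary: name -> (estimator, hypothetical hyperparameter grid).
models = {
    "RFR": (RandomForestRegressor(random_state=0),
            {"n_estimators": [50, 100], "max_depth": [None, 5]}),
    "GBR": (GradientBoostingRegressor(random_state=0),
            {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}),
    "SVR": (make_pipeline(StandardScaler(), SVR()),
            {"svr__C": [10, 100], "svr__epsilon": [0.1, 1.0]}),
    "MLP": (make_pipeline(StandardScaler(),
                          MLPRegressor(max_iter=2000, random_state=0)),
            {"mlpregressor__hidden_layer_sizes": [(16,), (32, 16)]}),
}

results = {}
for name, (estimator, grid) in models.items():
    # Search the grid on the training split only, then score on the test split.
    search = GridSearchCV(estimator, grid, cv=3, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[name] = {
        "best_params": search.best_params_,
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {k: v for k, v in scores.items() if k != "best_params"})
```

Keeping the estimators and their grids in one dictionary lets the same loop train, tune, and score every model, which is the kind of comparison and iteration the text describes.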
In Fig. 1b, frequencies for both black-and-white and color samples are measured using red, green, blue, and clear filters. A Kalman filter, a technique commonly used in precise measurements such as GPS or telecommunications, is used to improve frequency reading accuracy and reduce noise by consolidating ten post-calibration measurements into a single frequency reading.26 Five raw RGB readings are averaged during data creation, ensuring deviations between consecutive readings remain below three units for each color channel, thereby refining the precision of the RGB readings. The method is also efficient enough for real-time mobile applications: its dominant cost is a matrix inversion with cubic complexity, O(n³), where n is the state vector dimension, which remains inexpensive when n is small.
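As an illustration of how ten noisy frequency measurements can be consolidated into one reading, the following is a minimal sketch of a one-dimensional Kalman filter for a constant signal. The function name, the noise variances, and the sample readings are hypothetical and are not taken from the device firmware. With a scalar state (n = 1), the matrix inversion reduces to a single division.

```python
def kalman_consolidate(readings, process_var=1e-4, meas_var=4.0):
    """Fuse repeated readings of a nearly constant frequency into one estimate.

    process_var and meas_var are assumed noise variances (hypothetical values).
    """
    estimate = float(readings[0])  # initial state: the first reading
    error_var = meas_var           # initial uncertainty of that state
    for z in readings[1:]:
        # Predict: the state is assumed constant; uncertainty grows slightly.
        error_var += process_var
        # Update: blend prediction and new measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
    return estimate


# Ten hypothetical post-calibration frequency readings (Hz) for one filter.
ten_readings = [1502.0, 1498.5, 1501.2, 1499.8, 1500.4,
                1497.9, 1503.1, 1500.0, 1499.1, 1500.9]
fused = kalman_consolidate(ten_readings)
print(f"consolidated frequency: {fused:.2f} Hz")
```

With a very small process variance the filter behaves almost like a running mean; a larger value would let the estimate follow drift in the light source more quickly at the cost of less smoothing.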
The embedded color reading function converts raw frequency readings into RGB values, scaling them between 0 and 255. Frequencies below the minimum are set to 0, and those above the maximum are set to 255, with intermediate values interpolated. This ensures precise color data representation. Fig. S6a† shows a linear relationship between the sum of the RGB frequencies and the corresponding RGB values, confirming sensor reliability. Fig. S6b† illustrates the ratio of individual colors (R, G, B) to the sum of RGB with increasing protein concentration, highlighting the sensitivity of the colorimetric method to changes in protein levels.
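The clamping and interpolation step can be sketched as below. The function name and the calibration frequencies are hypothetical; the sketch assumes that the minimum and maximum frequencies for each channel come from black and white calibration readings and that the interpolation between them is linear.

```python
def frequency_to_channel(freq_hz, f_min, f_max):
    """Map a raw sensor frequency to an 8-bit channel value (0 to 255).

    f_min and f_max are the calibration frequencies for this channel
    (assumed here to come from black and white reference readings).
    """
    if f_max <= f_min:
        raise ValueError("f_max must be greater than f_min")
    if freq_hz <= f_min:
        return 0      # at or below the minimum: clamp to 0
    if freq_hz >= f_max:
        return 255    # at or above the maximum: clamp to 255
    # In between: linear interpolation onto the 0 to 255 scale.
    return round(255 * (freq_hz - f_min) / (f_max - f_min))


# Hypothetical calibration range and raw readings for one channel (Hz).
F_MIN, F_MAX = 1000.0, 20000.0
for raw in (500.0, 4800.0, 25000.0):
    print(raw, "->", frequency_to_channel(raw, F_MIN, F_MAX))
```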
Data was collected and saved in five color spaces (RGB, HSL, HSV, CMYK, and CIELAB) for ML procedures. Four regression models – RFR, GBR, SVR, and MLP – were trained using optimal parameters and evaluated on a test dataset.38,39 Regression models deliver accurate color predictions by reducing color calibration errors and are suitable for real-time application on devices with limited computational resources.40,41 Those models are chosen for color sensing devices due to their efficiency in handling continuous data outputs, which is crucial for capturing subtle differences in color shades.
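For readers who want to reproduce the color space step, the following sketch converts one 8-bit RGB triple into HSL, HSV, CMYK, and CIELAB. It is an illustration under stated assumptions rather than the conversion code used in this work: the function names are hypothetical, the CMYK conversion is the simple device-independent formula, and the CIELAB conversion assumes the sensor RGB values can be treated as sRGB with a D65 white point.

```python
import colorsys


def rgb_to_hsl(r, g, b):
    # colorsys returns (hue, lightness, saturation); reorder to H, S, L.
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return h * 360, s * 100, l * 100


def rgb_to_hsv(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s * 100, v * 100


def rgb_to_cmyk(r, g, b):
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)
    if k == 1:  # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 100.0
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return c * 100, m * 100, y * 100, k * 100


def rgb_to_cielab(r, g, b):
    # Assumption: inputs are sRGB values; white point is D65.
    def linearize(u):
        u = u / 255
        return u / 12.92 if u <= 0.04045 else ((u + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


sample_rgb = (120, 90, 200)  # hypothetical reading
print("HSL   ", rgb_to_hsl(*sample_rgb))
print("HSV   ", rgb_to_hsv(*sample_rgb))
print("CMYK  ", rgb_to_cmyk(*sample_rgb))
print("CIELAB", rgb_to_cielab(*sample_rgb))
```

Each converted tuple can then be stored alongside the RGB reading so that every regression model can be trained once per color space.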
When transitioning to other colorimetry models, distinct trends emerge: minimal hue variations, increased saturation, and decreased lightness, aligning with expected outcomes. The relationship between RGB values and protein concentration is crucial in Bradford colorimetric analysis, affecting color intensity and distribution. The HSV color space is employed instead of RGB for better detection accuracy, as it separates light effects from color information.
The regression model was tested under ambient light conditions during the BCA assay (Fig. 2). As protein concentration increased, RGB values decreased, resulting in a darker blue color (Fig. 2a). In the HSL model, hue remained consistent while saturation increased and lightness decreased, indicating a darker color (Fig. 2b). Similarly, the HSV model showed consistent hue with increased saturation and decreased value (Fig. 2c). In Fig. 2d, the CMYK model showed consistent cyan with increasing magenta, yellow, and black values, resulting in a more intense purple (Fig. S6c†). Finally, Fig. 2e demonstrates that in the CIELAB model, lightness decreases, a* shifts positively, and b* shifts negatively, indicating a transition to a darker blue with increasing protein concentration (Fig. S7a†). These results compare the protein concentration-dependent RGB components, showing significant variance due to uneven ambient lighting. In the HSL and HSV models (Fig. S7b and c†), the hue shows minimal variation, while saturation increases and lightness decreases, as expected. Periodic dimming can introduce variances in ML predictions, especially at higher protein concentrations, where subtle color shifts cause trend divergence. Additionally, the transparency of the 96-well plate may lead to minor reading inaccuracies due to the influence of neighboring colors.
The relationship between RGB values (Fig. S8a†) and protein concentration is crucial for Bradford colorimetric analysis (Fig. 3). As protein concentration increases, changes occur in the intensity and distribution of colors captured in the RGB, HSL, HSV, CMYK, and CIELAB color models. The HSV model is preferred for its better detection accuracy in image analysis, as it separates light effects from color information.
In Fig. 3a, as protein concentration increases, red decreases and blue increases, indicating a color shift from yellow to blue (Fig. S8b†). Fig. 3b shows HSL values vs. protein concentration, where hue sharply increases, indicating a yellow-to-blue shift, with saturation first decreasing and then increasing as the blue intensifies around 50 μg mL−1. This pattern is reflected in the HSV graph (Fig. 3c), where hue, lightness, and value increase around 50 μg mL−1 (Fig. S7a†), indicating a color shift. The CMYK values vs. protein concentration (Fig. 3d) show cyan increasing from 50 μg mL−1 (Fig. S9b†), magenta first decreasing and then increasing, and yellow decreasing and stabilizing around 60 μg mL−1, indicating the yellow-to-blue transition. Finally, Fig. 3e shows CIELAB vs. protein concentration, where the b* channel decreases, signifying a transition from yellow to blue.
Comparing various color models, periodic dimming is noticeable in the RGB readings of the Bradford assay dataset (Fig. 3). Lower protein concentrations show a distinctive brown hue (higher red, lower blue values), shifting to a predominant blue shade (higher blue, lower red values) at higher concentrations. Notably, the a* channel of CIELAB exhibits a consistent trend ranging approximately from 10 to −10 (Fig. S9c†), potentially enhancing model performance. Similar to the BCA assay results, divergences in trends become more pronounced beyond approximately 200 μg mL−1, likely due to subtle variations in blue color intensity in the Bradford assay.
During the BCA assay, the sensor captured color readings for concentrations ranging from 0 to 200 μg mL−1 in 4 μg mL−1 increments. The dataset included 204 measurements from four 96-well trays, each containing 51 samples. The RGB values dimmed periodically, nine times across the 96-well tray setup, indicating inconsistent ambient lighting affecting each row. However, a clear trend emerged with increasing protein concentration: all three RGB values decreased. This decline led to higher saturation levels in HSV and HSL, increased magenta and black values in CMYK, and reduced lightness in the CIELAB, HSL, and HSV systems.
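The sample count follows directly from the concentration grid, as the short check below illustrates (variable names are hypothetical).

```python
# Concentration grid: 0 to 200 ug/mL in 4 ug/mL steps, endpoints included.
concentrations = list(range(0, 201, 4))
samples_per_tray = len(concentrations)
n_trays = 4
total_measurements = samples_per_tray * n_trays
print(samples_per_tray, total_measurements)  # prints: 51 204
```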
Notably, the consistent rises in HSV saturation or CMYK magenta are expected to influence effective model training. Periodic dimming introduces prediction variances when identifying influential features for ML. The two predominant trends begin to diverge at higher protein concentrations, likely due to subtle color shifts at elevated levels. Additionally, the transparency of the 96-well tray may cause minor reading inaccuracies influenced by neighboring colors. Regression models including MLP, GBR, SVR, and RFR were trained on the dataset shown in Fig. 4. Evaluating these algorithms on the test dataset revealed strong performance for protein volumes ranging from 0 μL up to approximately 70 μL. Particularly in HSL and HSV color metrics, predictions closely matched actual values. However, for protein volumes exceeding 70 μL, result consistency diminished, likely due to factors such as high variability in the BCA assay, occasional quantification of different proteins with identical concentrations, and sensitivity to substances like salt, detergents, and reducing agents.42–44 Notably, the neural network model struggled and did not provide accurate predictions within this higher range, possibly due to nuanced color changes occurring with increasing protein volumes.
Fig. 4 Scatter plots of the predicted protein concentrations from regression models (RFR, GBR, SVR, and MLP) in five different color systems: RGB (a), HSL (b), HSV (c), CMYK (d), and CIELAB (e).
The RGB, HSL, HSV, CMYK, and CIELAB responses were plotted using various regression models to predict the hue coordinate for the mixed indicator, ranging between 0 and 45 (Fig. 4). This broad range indicates these color coordinates provide high-resolution measurements suitable for diverse regression models. Experimental data were fitted with a fourth-order polynomial curve, enabling concentration determination of unknown solutions based on RGB, HSL, HSV, CMYK, and CIELAB values obtained from the device.
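A fourth-order polynomial calibration of this kind can be sketched with NumPy as below. The data are synthetic and the choice of HSL saturation as the single predictor is an assumption made for illustration; the coefficients do not correspond to any fit reported in this work.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic calibration standards (NOT measured data): a hypothetical HSL
# saturation reading for each standard and its known concentration (ug/mL).
rng = np.random.default_rng(1)
saturation = np.linspace(20.0, 60.0, 51)
true_conc = 0.05 * (saturation - 20.0) ** 2 + 3.0 * (saturation - 20.0)
measured_conc = true_conc + rng.normal(0.0, 1.0, saturation.size)

# Fit concentration as a fourth-order polynomial of the color coordinate.
# Polynomial.fit rescales the x-domain internally for numerical stability.
calibration = Polynomial.fit(saturation, measured_conc, deg=4)

# Estimate the concentration of an unknown sample from its saturation value.
unknown_saturation = 40.0
estimated_conc = float(calibration(unknown_saturation))
print(f"estimated concentration: {estimated_conc:.1f} ug/mL")
```

A polynomial of this order should only be evaluated inside the calibrated range, since it can diverge quickly outside the fitted interval.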
Fig. 4a illustrates protein prediction using the RGB model with various regression techniques. Regression models trained on assay data achieved accurate predictions of protein concentrations. During cross-validation, RFR, GBR, and SVR demonstrated optimal performance within the RGB model, while HSL, HSV, CMYK, and CIELAB (Fig. 4b–e) also showed high performance. However, MLP deviated in fitting compared to other models. MLP, a nonlinear neural network, captures complex relationships due to its layers and activation functions but requires extensive data for effective training. Performance variations may arise from its flexibility, sensitivity to hyperparameters, and differences in feature scaling and data interpretation. These results highlight the color sensor's potential for accurately predicting protein concentrations. The performance across different color models and ML techniques underscores their applicability in various color-to-concentration applications. The TCS3200 color sensor detects red, green, blue, and overall light using respective filters, and its readings are influenced by factors like ambient color temperature, reflections, surface colors, finishes, and sensor angle relative to the light source. While these factors minimally alter hue values, they noticeably affect saturation and lightness values, emphasizing the need for controlled lighting systems to enhance dataset quality. Addressing signal nonlinearity within sensor devices is critical to minimizing measurement errors, and is often handled through artificial neural network models.21 Furthermore, ensuring linearity between observed and expected protein concentrations enhances the reliability of bioanalytical approaches.
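The four evaluation metrics (MAE, MSE, RMSE, and R²) are defined as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{1}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4}$$

where $y_i$ is the observed protein concentration of test sample $i$, $\hat{y}_i$ is the predicted concentration, $\bar{y}$ is the mean of the observed concentrations, and $n$ is the number of test samples.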
In protein concentration assays, RFR and GBR consistently outperform other models based on metric scores. For instance, using the RGB colorimetry model (Fig. 5a), RFR achieves an MAE of 8.1, GBR records 8.95, and SVR achieves 8.21. In comparison, MLP shows a higher MAE of 11.02. This highlights the superior performance of tree-based models like RFR and GBR in accurately detecting protein concentrations, particularly in capturing subtle color differences and intricate dataset patterns. In terms of MSE (Fig. 5b), tree-based models demonstrate exceptional accuracy with the HSL and HSV color models. For HSL, RFR and GBR achieve MSE values of 93.1 and 96.65, respectively. In the HSV model, RFR and GBR record MSE values of 85.35 and 102.49, respectively. These errors consistently remain below ±5, indicating robust performance. Similarly, the RMSE graph (Fig. 5c) reveals the lowest errors for HSL and HSV using RFR (9.64 for HSL and 9.23 for HSV) and GBR (9.83 for HSL and 10.12 for HSV). In contrast, MLP shows the highest errors (31.32 for HSV and 35.49 for HSL). The R² score (Fig. 5d) further confirms the models' accuracy, with RFR and GBR both achieving a score of 0.96 in the HSV model, indicating that these models explain over 95% of the variance in the dataset. SVR also demonstrates strong performance, particularly with the CMYK color model, achieving an MAE of 9.51, closely comparable to RFR's 9.54. This underscores SVR's effectiveness in capturing inherent patterns in the CMYK color space. Conversely, MLP excels in the CIELAB model with an R² of 0.96, the best among all models for this color system, highlighting the neural network's capability. Detailed error evaluations are provided in ESI Table S1.†
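Because RMSE is the square root of MSE, the RMSE values quoted above can be cross-checked against the quoted MSE values. The short check below uses only the numbers reported in this paragraph; the dictionary layout and the 0.01 tolerance (to allow for rounding or truncation to two decimals) are choices made for the example.

```python
import math

# (MSE, RMSE) pairs as reported in the text for the tree-based models.
reported = {
    ("HSL", "RFR"): (93.1, 9.64),
    ("HSL", "GBR"): (96.65, 9.83),
    ("HSV", "RFR"): (85.35, 9.23),
    ("HSV", "GBR"): (102.49, 10.12),
}

consistent = {}
for key, (mse, rmse) in reported.items():
    # RMSE should equal sqrt(MSE) up to two-decimal rounding.
    consistent[key] = abs(math.sqrt(mse) - rmse) < 0.01
    print(key, f"sqrt(MSE) = {math.sqrt(mse):.3f}", f"reported RMSE = {rmse}")
```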
In our study, we used 20% of the input datasets to generate performance metrics. While increasing data volume can enhance accuracy, it also demands more resources and time. Our goal was to achieve robust performance indicators with a minimal number of features. We employed a combination of ML models and various color spaces to optimize protein concentration predictions, ensuring adaptability across diverse datasets. Prioritizing regression models over classifiers allowed us to achieve precise quantification of protein concentrations. Unlike existing devices relying on a single standard concentration, our method considers a wide range of concentrations using ML techniques. Our approach offers improved detection accuracy, distinguishing our color sensor from traditional absorbance-based analyzers.
The color sensor device facilitates colorimetric-based BCA and Bradford assays, providing direct protein quantification crucial for assessing protein levels in biological samples. This can significantly contribute to disease diagnosis, monitoring, and treatment. Our sensor measures RGB signals from the BCA assay plate, demonstrating its potential to replace multiplexed analyses such as commercial microplate readers (see Table S2† for the comparison with previous work).
To assess the colorimetric sensor's performance, we conducted an evaluation using the BCA protein assay and compared protein estimation across multiple models. The sensor detected BSA concentrations ranging from 0 to 160 μg mL−1 during BCA assays and captured RGB frequencies in just 10 seconds, faster than traditional plate readers. The integrated machine learning program enabled precise measurement of RGB intensity in the 96-well plate assay, facilitating quantitative protein analysis.
Converting RGB frequency to protein concentration involves an ML process. By correlating RGB readings with known protein concentrations, a linear regression curve is established, allowing protein concentration estimation from RGB values. The software's ML algorithms ensure precise RGB intensity measurements for each well in the 96-well plate assay, generating quantitative data for protein concentration analysis. Accuracy depends on the calibration curve quality and experimental consistency. Hold-out cross-validation assesses model performance, with Fig. 6 illustrating the relationship between the number of principal components and R², aiding in selecting the optimal components for accurate predictions. The model predictions were then compared with the protein concentrations determined by the BCA and Bradford assays over a concentration range of 0 to 160 μg mL−1. To determine the linear range of these assays, least-squares linear regression equations were computed based on the experimental data, yielding R² of 0.989 and 0.957 for the BCA and Bradford assays, respectively. A high R² indicates effectiveness in fitting the model to the observed data, reflecting how well the model captures the underlying trend. The sensitivity of the assays is determined by the slope of the regression lines, yielding 1.20 ± 0.02 for BCA and 0.8673 ± 0.02 for Bradford. The predicted vs. observed plot (Fig. 6) visually represents the model's accuracy and the range of values covered by the experimental data at a 95% confidence level. Finally, using the color sensor technology, we conducted a BCA assay experiment to determine lysozyme protein concentration (see Fig. S10†). This study demonstrates that the proposed machine learning-based color sensor technology can be broadly applied to predict the concentrations of various proteins and monitor biochemical reactions.
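The least-squares line through a predicted vs. observed plot, together with its slope (used above as the sensitivity) and R², can be computed as in the sketch below. The data are synthetic and the generating slope and noise level are arbitrary; they are not the values reported for the BCA or Bradford assays.

```python
import numpy as np

# Synthetic example (NOT the reported data): observed concentrations (ug/mL)
# and model predictions that follow a noisy straight line.
rng = np.random.default_rng(2)
observed = np.arange(0.0, 161.0, 20.0)
predicted = 1.1 * observed + 2.0 + rng.normal(0.0, 3.0, observed.size)

# Least-squares straight line: predicted = slope * observed + intercept.
slope, intercept = np.polyfit(observed, predicted, 1)

# Coefficient of determination of the fitted line.
fitted = slope * observed + intercept
ss_res = np.sum((predicted - fitted) ** 2)
ss_tot = np.sum((predicted - predicted.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R2 = {r2:.3f}")
```

A slope close to 1 with a small intercept means the predictions track the reference assay one-to-one, while R² summarizes how tightly the points scatter around the fitted line.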
Footnote
† Electronic supplementary information (ESI) available: Image of manufactured prototype of the color sensor; protein quantitation curve of absorbance at 560 nm vs. protein concentration in H2O of BCA assay and Bradford assay; RGB color component values vs. frequency of color sensor output values; calibration curves for CIELAB, HSL and HSV values; sum of RGB color component values vs. frequency of color sensor outputs, ratio of individual R, G, B colors to the sum of RGB, and ratio of individual C, M, Y, K colors to the sum of CMYK with increasing protein concentration for the Bradford assay test; calibration curves for HSL values; summary of regression model error matrices of various color models; summary of previous color sensor and regression model work reported. See DOI: https://doi.org/10.1039/d4ra07510b
This journal is © The Royal Society of Chemistry 2025