Most camera reviews and benchmarks on the Web analyse data in different ways, usually to argue that one camera sensor performs better than another. While their effort is appreciated, there is still a gap in the current evaluations that should be filled. Cameras and similar imaging devices are complicated systems whose performance depends on many intrinsic and extrinsic factors. At this stage, this short article focuses on the sensor, which can be metaphorically described as the retina: on its own, without a lens, it would not produce what we call an “image”. While lenses are interchangeable (on most cameras), sensors so far cannot be interchanged on conventional cameras; focusing on this part of the camera anatomy is therefore proper and sensible. There are certainly other built-in components involved in different ways in the chain that produces the image; however, they all depend on the quality of the signal coming from the sensor, which is examined further below using multivariate statistical analysis.
Current sensor comparisons and benchmarks are mostly based on scatter plots and one-to-one relationships, e.g. comparing sensor characteristics by isolating one or two sensor variables and regressing X against Y (bivariate), for example dynamic range vs EV, or sometimes putting images from different cameras side by side and trusting the human eye to judge between them. These approaches can hinder the explanation of correlation in complex systems and large data matrices, and are likely to miss a great deal of hidden variability in the data. The limitation stems from the multiple factors that may or may not co-vary and that are all involved at the same time when a sensor functions. As such, conclusions based on regression plots are likely to be inaccurate, whereas a multivariate approach can help to reveal the latent data structure and the association and interplay between different factors all at the same time. In other words, it summarises the possible bivariate correlations within a multivariate data set in a 2D space.
Materials and methods
In this article the actual dataset has been scaled for the purpose of this statistical analysis and the corresponding data presentation. The data were initially extracted from Photonstophotos at my request, with the permission of its author, Mr. William J. Claff; they are raw sensor data from measured Heatmap attributes. I added some descriptive parameters (production year, sensor area, etc.) to the dataset before the analysis to help clarify and visualize the sensor data.
Statistical analysis of the measured raw sensor data (Table 1) was performed using a multivariate data analysis approach called Principal Component Analysis (PCA). PCA is a pattern recognition and exploratory data analysis method that is used to reduce data dimensionality and to investigate possible latent structure in the data. Prior to performing PCA, the data were mean-centered and scaled by the standard deviation, i.e. (x − x̅)/σ, and the model was cross-validated using the leave-one-out (LOO) method.
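To make the preprocessing and the PCA step concrete, here is a minimal sketch in Python. It is not the software used for this article: it assumes that the scaling is ordinary autoscaling (mean-centering, then division by the sample standard deviation), it extracts the components by power iteration with deflation purely to stay within the standard library, and it leaves out the leave-one-out cross-validation. The function names and the small example_rows matrix are hypothetical and are not values from Table 1.

```python
import math
import statistics


def autoscale(rows):
    """Mean-center each column and divide it by its sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(z):
    """Covariance matrix of already-centered data (rows are samples)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(c, iters=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    p = len(c)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigval = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigval, v


def pca(rows, n_components=2):
    """Return (scores, components, explained variance ratios)."""
    z = autoscale(rows)
    c = covariance(z)
    p = len(c)
    total_variance = sum(c[i][i] for i in range(p))
    components, ratios = [], []
    for _ in range(n_components):
        eigval, vec = leading_eigenpair(c)
        components.append(vec)
        ratios.append(eigval / total_variance)
        # Deflation: remove the component just found before searching again.
        c = [[c[i][j] - eigval * vec[i] * vec[j] for j in range(p)]
             for i in range(p)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in z]
    return scores, components, ratios


# Hypothetical 5 x 3 matrix (samples x variables), for illustration only.
example_rows = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.7],
    [6.0, 12.0, 1.2],
]
scores, components, ratios = pca(example_rows)
print("explained variance ratios:", [round(r, 3) for r in ratios])
```

Each row of scores gives the position of one sample on the PC1 and PC2 axes, which is what a scores plot displays. In practice a numerical library would replace the hand-written eigen-solver.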
Three camera brands, Canon, Nikon and Sony, which are highly debated in the professional photography world, especially for their image quality and their sensor similarities, were selected. The cameras are listed in Table 2.
Results and discussion
Applying PCA to the available dataset of 9 sensor variables, with RAW values acquired at base ISO (Table 1), enabled separation between the CCD, CMOS and BI-CMOS sensor groups. The PCA of the sensor data is presented in Figure 1a. The first two PCs (PC1 = x axis and PC2 = y axis), which give the score of each camera in the plot, explain a cumulative 70% of the sensor data variability among all camera models. This is a large portion of the information in the data, so the sensor relationships can be interpreted with a high degree of certainty.
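As a worked illustration of what a cumulative explained variance means, the sketch below divides the running sum of the PCA eigenvalues by their total. The eigenvalues are made up for the example and are not results from this analysis; they were chosen only so that nine standardized variables (total variance 9) give 70% for the first two components.

```python
from itertools import accumulate


def cumulative_explained_variance(eigenvalues):
    """Running share of the total variance captured by the first k components."""
    total = sum(eigenvalues)
    return [running / total for running in accumulate(eigenvalues)]


# Hypothetical eigenvalues for 9 standardized variables (they sum to 9).
eigenvalues = [4.1, 2.2, 1.0, 0.7, 0.4, 0.3, 0.15, 0.1, 0.05]
cumulative = cumulative_explained_variance(eigenvalues)
print("PC1 + PC2 explain", round(100 * cumulative[1]), "% of the variance")
```

The remaining share is spread over the higher components, which a 2D scores plot does not display.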
The first two PC axes are divided into negative and positive areas, which also reflect correlation or anti-correlation between sample scores. For example, there is very high correlation between the Nikon D500 and Nikon D7500 sensor characteristics in the positive PC-1 area (Figure 3a). A distinct clustering of sensor form factors was made possible using the 9 sensor attributes (Figure 1a).
The correlation and significance of the nine variables (Table 1) and their influence on the PC data structure (Figure 1a) were examined with a variable loadings plot, presented in Figure 2, in which all loading vectors run between -1 and +1. The correlation loading plot represents the leverage and influence of each sensor parameter on the clustering of the sensor data. Overlaying this plot on Figure 1 shows which parameters or features best explain each camera or cluster of cameras, and why they are clustered together. This becomes clearer in Figure 3, where a bi-plot presents both the loadings and the scores at the same time. The same principle of negative and positive PC areas also applies to the loading plot.
As an example, the proximity of the loadings of Low Light EV (LLEV) and PDRmax shows that a high degree of correlation exists between these parameters; in other words, these two features vary together. This also demonstrates the unique characteristic of the clustered high-end full-frame cameras (lower right part of the figure) in the PC-1 region where PDR and LLEV are located. The flagship cameras of this region are the Nikon D5, SONY ILCE-9 and SONY ILCE-7RM3, with the Canon 1DX Mark II as runner-up.
PRNU is also inversely correlated with Pitch, FWC and Sensor size, which indicates that larger PRNU values are characteristic of a sensor with smaller FWC, Pitch and sensor size (Figures 1a and 5). Read Noise variations seem to be independent of Pitch, judging by the PC-2 area; Read Noise also shows a positive correlation with DSNU.
Figures 1 and 5 clearly show that the further a sensor’s score shifts to the left, the lower the quality of its image output. A trend can also be seen between the negative and positive PC-2 areas, where Read Noise is anti-correlated with QE. The two extremes of the QE sensor characteristic are the Nikon D2X, Nikon D70 and Canon EOS 300D versus the Nikon D500, Nikon D7500, Canon PowerShot G7X and Sony DSC-RX100 M5. Although the latter two cameras score highly on QE, they are not among the best high-end cameras. This is also shown in the loading plot, where QE is not correlated with PDRmax, LLEV and Pitch; in other words, a greater QE score does not seem to indicate that a sensor also has improved dynamic range, good low-light characteristics or a larger size.
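The idea of a correlation loading can be sketched in a few lines of Python: it is the Pearson correlation between an original variable and the scores on a component, which is why it always lies between -1 and +1. The example below is an illustration under assumptions, not the analysis behind Figure 2. It uses two hypothetical variables, x1 and x2, and relies on the fact that for two standardized variables the principal axes are the sum and difference directions, so no eigen-solver is needed.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den


def standardize(col):
    """Mean-center a column and scale it to unit sample standard deviation."""
    mean, std = statistics.fmean(col), statistics.stdev(col)
    return [(x - mean) / std for x in col]


def correlation_loadings(variables, score_columns):
    """Map each variable name to its correlation with every score column."""
    return {name: [pearson(col, pc) for pc in score_columns]
            for name, col in variables.items()}


# Hypothetical variables, for illustration only (not values from Table 1).
x1 = [2.0, 3.1, 3.9, 5.2, 6.0, 7.1]
x2 = [10.0, 18.0, 25.0, 31.0, 45.0, 52.0]
z1, z2 = standardize(x1), standardize(x2)

# With two standardized variables the PCs lie along (1, 1) and (1, -1).
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]

loadings = correlation_loadings({"x1": x1, "x2": x2}, [pc1, pc2])
for name, (on_pc1, on_pc2) in loadings.items():
    print(name, round(on_pc1, 3), round(on_pc2, 3))
```

Variables whose loadings sit close together on such a plot vary together, while variables on opposite sides of the origin are anti-correlated.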
It might be proper to say that Nikon’s next APS-C sensor would appear somewhere between the Nikon D7500 and the Sony ILCE-7RM2, where there is a gap with no competitor camera within this population (technological challenges in developing the new sensor aside). With the variables analysed in this article, the D7500 sensor would need a tweak to LLEV, PDRmax and FWC, but not to QE (Figure 4).
Figures 5 and 6 present a different view of the data, by sensor form factor and bit depth. It is clear that CCD sensors are among the worst for Read Noise and PRNU relative to CMOS and BI-CMOS sensors. Example sensors in this range are those of the Nikon D70 and the Canon PowerShot G11 and G12.
Hierarchical cluster analysis is an alternative approach that is more user-friendly for readers who do not want to look into much technical information and data relationships but want to compare cameras side by side in one frame. It provides a visualization of the proximity of camera sensors to each other. The algorithm uses an agglomerative clustering approach, whose output is analogous to a diagram of the relationship between the leaves, branches and trunk of a tree, and it is based on the same components used for the PCA plot.
Figure 7 shows the camera similarities by their relative distance; the less distant their branches are from each other (smaller forks!), the more similar their sensor profiles are. This becomes especially helpful when one wants to compare one camera with another or with a group of other cameras. The information that this diagram provides can save time otherwise spent going through different reviews and long pages that usually offer a one-on-one comparison or some scattered, incoherent comparisons.
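The agglomerative procedure can be sketched as follows: every camera starts as its own cluster, and the two closest clusters are merged repeatedly until a single tree remains. This is a minimal illustration, not the routine used for Figure 7. The distance measure (Euclidean) and the linkage rule (average linkage) are assumptions, since neither is stated here, and the 2D points are hypothetical stand-ins for PC scores.

```python
import math
from itertools import combinations


def average_linkage(points, cluster_a, cluster_b):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points):
    """Merge the two closest clusters until one is left.

    Returns a list of (members_a, members_b, distance) tuples in merge
    order, which is the information a dendrogram draws.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(points, *pair))
        merges.append((a, b, average_linkage(points, a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical PC1/PC2 scores for five cameras, for illustration only.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (20.0, 0.0)]
merges = agglomerate(points)
for a, b, dist in merges:
    print(a, "+", b, "at distance", round(dist, 3))
```

Early merges at small distances correspond to the small forks in the diagram, i.e. the most similar sensor profiles; the last merge joins the most dissimilar groups.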
This article demonstrates the ability of multivariate classification to identify latent sensor classes; the approach can also be expanded, using supervised methods, to place new sensors’ attributes in relation to existing sensors. This classification might also be useful for trend analysis of new image sensor characteristics and of the market. For example, based on the dataset and sensor variables tested in this work, it appears that the sensor manufacturing trend is towards improving PDR (Figure 8). One reason (apart from resolving more shadow detail) might be to produce cameras that are more capable of shooting in difficult illumination conditions without having to couple them with expensive fast lenses.
Disregarding other camera components, this approach to sensor classification can also help individuals or companies decide which camera brand or sensor type to buy for their particular purpose, or whether the extra money makes a technical difference. It may also be helpful for applying similar RAW image post-processing strategies in image processing software to similar cameras.
It is worth pointing out that measurements of more sensor variables would probably help to better explain the source of variability of each sensor; in this study the model and its precision relied on the number of measurements made available by Photonstophotos. Furthermore, the accuracy of the measurements definitely influences the quality of the output data and the subsequent interpretation; in this instance the available Heatmap data derived from Photonstophotos helped to classify camera sensors and to determine the impact of each individual sensor parameter on this classification with good confidence.
I would like to thank Mr. William J. Claff for his permission to use his Heatmaps and other data for this work.
I intend to expand this preliminary work in the near future to other camera sensors, to investigate whether the same loading variability (fingerprint) in the data structure holds. An extension of this research could also be to apply this methodology to more sensor data from other companies and databases, and to involve further statistical methods for both image sensor classification and predictive tests.
Emmett E. Rad / January 2018