Multivariate Analysis on Image Sensor Classification and Variability (a case study on Canon, Nikon and Sony)

Please find complementary figures of this post via this link

Introduction

Most camera reviews and benchmarks on the Web use data analytics in one way or another to argue that one camera sensor is better than another. Their effort is appreciated, but there is still a gap in the current evaluations that should be filled. Cameras and similar imaging devices are complicated systems whose performance depends on many intrinsic and extrinsic factors. This short article focuses on the sensor, which can be metaphorically described as the retina: it cannot produce what we call an “image” without a lens. While lenses are interchangeable on most cameras, sensors so far cannot be swapped on conventional cameras, so focusing on this part of the camera anatomy is sensible. Other built-in components are certainly involved in the chain that produces the image, but they all depend on the quality of the signal coming from the sensor, which is examined below with multivariate statistical analysis.

Current sensor comparisons and benchmarks are mostly based on scatter plots and one-to-one relationships: they isolate one or two sensor variables and regress X against Y (bivariate analysis), for example dynamic range vs EV, or they put images from different cameras side by side and trust the human eye to judge between them. These approaches struggle to explain correlation in complex systems and large data matrices, and they are likely to miss a great deal of hidden variability in the data. The limitation stems from the many factors, which may or may not co-vary, that are involved at the same time when a sensor functions. Conclusions based on regression plots alone are therefore likely to be inaccurate. A multivariate approach can help to reveal the latent data structure, that is, the association and interplay between different factors all at the same time; in other words, it summarises the main correlation structure of a multivariate data set in a 2D space.

Material and methods

The dataset used in this article has been scaled for the purpose of the statistical analysis and the corresponding data presentation. The data were extracted from Photonstophotos at my request, with the permission of its author, Mr. William J. Claff; they are raw sensor data from measured Heatmap attributes. Before the analysis I added some descriptive parameters (production year, sensor area, etc.) to the dataset to help clarify and visualise the sensor data.

Table-1.png

Statistical analysis of the measured raw sensor data (Table 1) was performed using a multivariate data analysis approach called Principal Component Analysis (PCA). PCA is a pattern recognition and exploratory data analysis method that is used to reduce data dimensionality and to investigate possible latent structure in the data. Prior to performing PCA, each variable was scaled by its standard deviation (1/σ) and mean centered; the model was cross validated using the leave-one-out (LOO) method.
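The scaling and centering step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the software used for the analysis; the function name `autoscale` and the toy values are hypothetical and are not taken from Table 1.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Mean center each column and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # sample standard deviation (n - 1)
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical values: three sensors (rows) and two variables (columns)
# measured on very different numeric ranges.
toy = [
    [4.0, 20000.0],
    [6.0, 50000.0],
    [8.0, 80000.0],
]
scaled = autoscale(toy)
for row in scaled:
    print(row)
```

After this step both columns have mean 0 and standard deviation 1, so a variable with large raw numbers cannot dominate the PCA simply because of its units.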

Three camera brands, Canon, Nikon and Sony, were selected because they are highly debated in the professional photography world, especially for their image quality and their sensor similarities. The cameras are listed in Table 2.

 

Table-2

Results and discussion

Applying PCA to the available dataset of 9 sensor variables, whose RAW values were acquired at base ISO (Table 1), enabled separation between the CCD, CMOS and BI-CMOS sensor groups. The PCA of the sensor data is presented in Figure 1a. The first two PCs (PC1 on the x axis and PC2 on the y axis), which give the score of each camera in the plot, explain a cumulative 70% of the sensor data variability among all camera models. This is a large portion of the information in the data, so the sensor relationships can be interpreted with a high degree of certainty.
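To illustrate how the share of variability explained by the first PCs is obtained, here is a minimal sketch in plain Python. It assumes the data are already mean centered, finds the leading eigenvalues of the covariance matrix by power iteration (a simple stand-in for whatever solver the actual PCA software uses), and divides each eigenvalue by the total variance. The toy rows and all function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of rows that are already mean centered."""
    n, p = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def leading_eigenpair(matrix, iterations=1000):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.

    The uniform start vector is a simplification: it fails if it happens to be
    exactly orthogonal to the dominant eigenvector.
    """
    p = len(matrix)
    vector = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in image))
        if norm < 1e-12:  # nothing left to explain
            return 0.0, vector
        vector = [x / norm for x in image]
        eigenvalue = norm
    return eigenvalue, vector

def explained_variance_ratios(rows, n_components=2):
    """Fraction of the total variance captured by each of the first PCs."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    return ratios

# Hypothetical mean-centered data: four samples, two variables.
toy = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
ratios = explained_variance_ratios(toy)
print([round(r, 3) for r in ratios], "cumulative:", round(sum(ratios), 3))
```

For these toy rows the two ratios come out at about 0.8 and 0.2. The cumulative figure quoted for the sensor data is the same kind of sum taken over PC1 and PC2.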

Each of the first two PC axes is divided into a negative and a positive area, and the position of the sample scores in these areas indicates how strongly the samples are correlated or anti-correlated. For example, the sensor characteristics of the Nikon D500 and Nikon D7500 are very highly correlated; both sit in the positive PC-1 area (Figure 3a). The 9 sensor attributes also made a distinct clustering of sensor form factors possible (Figure 1a).

PCA on Sensor Format Classification

Figure 1a – PCA on Canon, Nikon and Sony camera sensors

Swarm.png

Figure 1b – 3D schematic of the PCA: the best-fit line is drawn through the swarm of points, and the points are then projected onto a 2-dimensional plane (PC-1 and PC-2, Figure 1a) chosen so that it represents the maximum variability between data points.

The correlation and significance of the nine variables (Table 1) and their influence on the PC data structure (Figure 1a) were examined with a variable loadings plot, presented in Figure 2, in which all loading vectors run between -1 and +1. The correlation loadings plot represents the leverage and influence of each sensor parameter on the clustering of the sensor data. Overlaying this plot on Figure 1a shows which parameters or features best explain each camera or cluster of cameras, and why they are clustered together. This becomes clearer in Figure 3, where a bi-plot presents both the loadings and the scores at the same time. The same principle of negative and positive PC areas applies to the loadings plot.
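The correlation loadings can be written out as follows. The notation is assumed here for illustration: $x_j$ is the j-th scaled sensor variable, $t_a$ is the score vector of PC $a$, $n$ is the number of cameras, and $r_{ja}$ is the correlation loading plotted in Figure 2.

```latex
r_{ja} = \operatorname{corr}(x_j, t_a)
       = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(t_{ia} - \bar{t}_a)}
              {\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,
               \sqrt{\sum_{i=1}^{n} (t_{ia} - \bar{t}_a)^2}},
\qquad -1 \le r_{ja} \le 1

R_j^2 = r_{j1}^2 + r_{j2}^2
```

Because the score vectors of different PCs are uncorrelated, $R_j^2$ is the fraction of the variance of variable $j$ explained by the first two PCs. A variable at distance $\sqrt{0.5}$ from the origin is therefore 50% explained, and one at distance 1 is fully explained, which is what the two ellipses in Figure 2 mark.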

As an example, the proximity of the loadings of Low Light EV (LLEV) and PDRmax shows that these parameters are highly correlated; in other words, the two features vary together. It also demonstrates the unique characteristic of the clustered high-end full frame cameras (lower right part of the figure) in the PC-1 region where PDR and LLEV are located. The flagship cameras of this region are the Nikon D5, SONY ILCE-9 and SONY ILCE-7RM3, with the Canon 1DX Mark II as runner-up.

PRNU is inversely correlated with Pitch, FWC and sensor size, which indicates that larger PRNU values are characteristic of sensors with smaller FWC, Pitch and sensor size (Figures 1a and 5). Reading along PC-2, Read Noise variations seem to be independent of Pitch; Read Noise also shows a positive correlation with DSNU.

Figures 1 and 5 clearly show that the further a sensor’s score shifts towards the left, the lower the quality of its image output. A trend can also be seen between the negative and positive PC-2 areas, where Read Noise is anti-correlated with QE. The two extremes of the QE characteristic are the Nikon D2X, Nikon D70 and Canon EOS 300D on one side, and the Nikon D500, Nikon D7500, Canon Power Shot G7X and Sony DSC-RX100 M5 on the other. Although the latter two cameras score high on QE, they are not among the best high-end cameras. This is also shown in the loadings plot, where QE is not correlated with PDRmax, LLEV or Pitch; in other words, a greater QE score does not seem to indicate improved dynamic range, good low light characteristics or a larger sensor.

 

Correlation Loadings Plot

Figure 2 – Correlation loadings (correlation coefficients) of the sensor parameters. Diagonally opposite variables are anti-correlated (inverse correlation). Closely located variables are highly correlated (co-vary). The inner and outer ellipses mark 50% and 100% explained variance (r² = 0.5 and 1).

Loading bar PC1.jpg

Figure 2a – Sensor variables’ correlation loadings in PC-1, which relates to improved sensor characteristics.

Loading bar PC2.jpg

Figure 2b –  Sensor variables’ correlation loadings in PC-2 which mostly describes the noise characteristics of the camera sensors.

 

Sensor Score and Loadings Bi-Plot

Figure 3 – Bi-plot of the camera sensors (blue) and the discriminating influence of the sensor characteristics (red) on resolving the sensor classification (loading magnitudes are greater the farther they are from the plot origin). Figures 3a, 3b, 3c and 3d are enlargements of this figure.

Fig3_1

Figure 3a

Fig3_2

Figure 3b

Fig3_3

Figure 3c

Fig3_4

Figure 3d

It might be fair to say that Nikon’s next APS-C sensor would appear somewhere between the Nikon D7500 and the Sony ILCE-7RM2, where there is a gap with no competitor camera within this population (setting aside the technological challenges of developing the new sensor). With the variables analysed in this article, a successor to the D7500 sensor would need a tweak on LLEV, PDRmax and FWC, and not on QE (Figure 4).

 

Figure 4.jpg

Figure 4 – Nikon’s possible next move for its next APS-C sensor, towards the Sony ILCE-7RM2 sensor characteristics (see also the temporal trend in Figure 8).

Figures 5 and 6 present a different view of the data, by sensor form factor and by bit depth. It is clear that CCD sensors are among the worst for Read Noise and PRNU relative to CMOS and BI-CMOS sensors. Example sensors in this range are those of the Nikon D70 and the Canon Power Shot G11 and G12.

 

Figure 5

Figure 5 – Sensor form factor distribution influenced by sensor parameters.

 

 

Scores Plot

Figure 6 – Sensor bit depth distribution influenced by the leverage of the sensor parameters. Note that the larger leverage of PDR, LLEV, FWC and sensor size is the major source of the clustering of 14-bit sensors in Full F. cameras in the PC-1 area (see also Figure 2).

 

Hierarchical cluster analysis is an alternative approach that is more user friendly for readers who do not want to go into the technical details and data relationships but would like to compare cameras side by side in one frame. It provides a visualisation of the proximity of camera sensors to each other. The algorithm uses an agglomerative clustering approach, which produces a diagram analogous to the relationship between the leaves, branches and trunk of a tree, and it is based on the same components used for the PCA plot.

Figure 7 shows the camera similarities by their relative distance: the closer their branches are to each other (smaller forks!), the more similar their sensor profiles. This is especially helpful when comparing one camera with another or with a group of other cameras. The information in this diagram can save the time spent going through different reviews and long pages that usually do a one-on-one comparison or present scattered, incoherent comparisons.
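The agglomerative procedure can be sketched in a few lines of plain Python. The article does not state which distance or linkage was used, so Euclidean distance and average linkage are assumptions here, and the labels A to D with their PC scores are hypothetical placeholders rather than real cameras.

```python
import math
from itertools import product

def average_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = {name: [name] for name in points}  # every sample starts as its own leaf
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pairs = list(product(clusters[a], clusters[b]))
                # Average Euclidean distance between all members of the two clusters.
                dist = sum(math.dist(points[x], points[y]) for x, y in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((a, b, dist))
        # Join the two closest clusters into one branch of the tree.
        clusters[f"{a}+{b}"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical (PC-1, PC-2) scores for four unnamed sensors.
scores = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (5.0, 0.0), "D": (5.0, 2.0)}
merges = average_linkage(scores)
for a, b, dist in merges:
    print(f"merge {a} and {b} at distance {dist:.2f}")
```

Sensors A and B merge first at the smallest distance, then C and D, and the two pairs join last. These merge distances play the role of the fork sizes in a tree such as Figure 7.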

 

Figure 7.png

Figure 7_b.png

Figure 7_c.png

Figure 7 – Correlation tree of camera sensors. Figure (a), at the top, depicts the whole cluster relationship; Figures (b) and (c) are enlargements of different clusters from Figure (a). (Open the saved images for better image quality.)

Conclusion

This article demonstrates the ability of multivariate classification to identify latent sensor classes. The approach can also be expanded, by using supervised methods, to identify the attributes of new sensors in relation to existing ones. The classification might also be useful for trend analytics on the development of new image sensor characteristics and for market trend analysis. For example, based on the dataset and sensor variables tested in this work, it appears that the trend in sensor manufacture is towards improving PDR (Figure 8). One reason (apart from resolving more shadow detail) might be to produce cameras that are more capable of shooting in difficult illumination conditions without having to couple them with expensive fast lenses.

 

Figure 8.png

Figure 8 – Trend showing how sensor attributes have changed over recent years of sensor manufacture (see also Figure 1a for the actual camera names of each data point).

Disregarding other camera components, this approach to sensor classification can also help individuals or companies decide which camera brand or sensor type to buy for their particular purpose, or whether the extra money makes a technical difference. It may also help in applying similar RAW post-processing strategies in image processing software to similar cameras.

It is worth pointing out that measurements of other sensor variables would probably help to better explain the source of variability of each sensor; in this study the model and its precision relied on the available measurements carried out by Photonstophotos. Furthermore, the accuracy of the measurements influences the quality of the output data and the subsequent interpretation. In this instance the available Heatmap data derived from Photonstophotos helped to classify the camera sensors and to determine the impact of each individual sensor parameter on this classification with good confidence.

Acknowledgement

I would like to thank Mr. William J. Claff for his permission to use his Heatmaps and other data for this work.

Future work

I endeavour to expand this preliminary work in the near future to other camera sensors, to investigate whether the same loading variability (fingerprint) in the data structure holds. An extension of this research could also apply this methodology to more sensor data from other companies and databases, and involve further statistical methods for both image sensor classification and predictive tests.

Emmett E. Rad / January 2018

 