April 11, 2020
This article is very much unlike any other article you can find on this website. Photography plays only a secondary role here, because this one is about metadata and statistics.
I wanted to do this project for a while now but I was missing the capabilities. The basic idea is to gather all the metadata of all the images I have on my hard drive together and then process them to get some deeper insights into properties that normally remain unseen. Common image editing software unfortunately lacks the option to export all the metadata into a single file so I wasn't sure how to accomplish this.
A few days ago I got the idea to write a little Python script that would find all the images on my drive, extract the data and consolidate it into a single CSV file. After a few hours of programming the script was ready, and it created a single file containing the metadata of all my 7500 images.
In this article I'd like to show some of the insight I gained from analyzing this file, because I find the results to be quite interesting and worth sharing.
Data structure and methodology
Before we delve into the results I'd like to briefly show what the CSV file looks like and what kind of information it contains.
File Name;Capture Time;Camera;Lens;Focal Length;Shutter Speed;Aperture;ISO;Exp. Bias;Exposure Program;Flash Mode;Drive Mode;White Balance;Metering Mode;Self Timer;Live View;Macro Magnification;Keywords;Latitude;Longitude;Altitude;Rating
5D3_67972.CR2;07.09.2018 17:33;EOS 5D Mark III;EF24-70mm f/2.8L II USM;39;0.008;5.6;100;0.666666667;Aperture Priority;Flash Not Fired;Single Or Timer;Manual;Pattern;0;Off;100;Landscape, Seaside, Summer, Travel;38.77804833;-9.497573167;138.5;
5D3_67990.CR2;07.09.2018 18:18;EOS 5D Mark III;EF24-70mm f/2.8L II USM;47;30;8;100;0;Manual;Flash Not Fired;Single Or Timer;Manual;Pattern;0;On;99;Landscape, Seaside, Summer, Travel;38.77743981;-9.496986231;66;
5D3_68013.CR2;10.09.2018 08:05;EOS 5D Mark III;EF16-35mm f/2.8L II USM;20;0.02;9;100;0.666666667;Aperture Priority;Flash Not Fired;Single Or Timer;Manual;Pattern;0;Off;99;Travel, Urban;38.57242348;-7.9071135;365.5;
5D3_68228.CR2;11.09.2018 07:59;EOS 5D Mark III;EF500mm f/4L IS II USM +2x III;1000;0.0025;8;200;1;Shutter Priority;Flash Not Fired;Unknown;Auto;Pattern;0;Off;85;Animals, BIF, Birds, Travel, Wildlife;37.02459814;-8.011784857;62;
The above example shows the extracted fields and the data for four example images. While this looks rather confusing in this form, it is easily imported into a spreadsheet and then analyzed. The fields/properties I'm interested in are the ones named in the header row.
The first few properties originate directly from the raw image file that is written by the camera (EXIF data), while the keywording, position information and rating were extracted from the XMP file (editing information) that is generated by my image editing software (IPTC data).
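A minimal sketch of how a file in this layout could be read in Python. This is not the script used for this article; it assumes the semicolon separator shown above and that the capture time is written as day.month.year, and it only uses a shortened, illustrative set of columns.

```python
import csv
import io
from datetime import datetime

# Two rows in the same layout as the sample above, shortened to a few columns.
SAMPLE = """File Name;Capture Time;Lens;Focal Length;ISO;Keywords
5D3_67972.CR2;07.09.2018 17:33;EF24-70mm f/2.8L II USM;39;100;Landscape, Seaside, Summer, Travel
5D3_68013.CR2;10.09.2018 08:05;EF16-35mm f/2.8L II USM;20;100;Travel, Urban
"""

def load_rows(handle):
    """Parse semicolon-separated metadata into a list of dicts with typed values."""
    rows = []
    for row in csv.DictReader(handle, delimiter=";"):
        # Assumption: dates are written as DD.MM.YYYY HH:MM.
        row["Capture Time"] = datetime.strptime(row["Capture Time"], "%d.%m.%Y %H:%M")
        row["Focal Length"] = float(row["Focal Length"])
        row["ISO"] = int(row["ISO"])
        # Keywords are a comma-separated list inside a single field.
        row["Keywords"] = [k.strip() for k in row["Keywords"].split(",") if k.strip()]
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["Capture Time"].year, rows[1]["Keywords"])
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle; empty fields such as a missing rating would need their own handling.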
Data overview table
|Total amount of data sets/images:|7031|
|Data extraction date:|07.02.2020|
|Number of images with keywords:|6825|
|Number of images with location data:|6759|
Most photographers are gearheads - always wondering what glass to buy next. That's why I'm very interested in what lenses and focal lengths I actually use.
Unsurprisingly the general purpose standard zoom lens takes first place in my overall lens use. In general, wide angle lenses seem to be more important to me than telephotos, which can probably be explained by the fact that I do predominantly landscape photography. I was surprised that my 100mm Macro gets so little use overall since it is one of my absolute favorite lenses. The flexibility of the 70-200 usually wins over the image quality and wider aperture of the 100 Macro. The nifty fifty is a nice lens if you want to go light but it doesn't really hold a candle to L glass, so unsurprisingly I leave it at home most of the time.
Even more interesting is the lens use over time:
Looking only at 2012, over half of my images were taken with my only wide angle lens at the time - the 16-35. After my 2014 24-70 purchase, that lens took over most of the wide angle segment and was responsible for about 60% of all images taken in 2019. I used to cover the wide angle to standard focal length range with the 16-35 and the 50, and I remember that I wasn't really missing anything, but the comfort of a standard zoom is simply decisive.
The 70-200 is just a workhorse for any needs in the light to medium telephoto range. This shows in its constant use over the years.
I was also surprised to see that I didn't use the 500 at all in 2019. I didn't take it on any trips and obviously that shows. I'm planning to use it more again this year.
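The per-year lens shares discussed above can be computed with a simple grouped count. The following is only a sketch: the `records` list is made-up illustration data, not the real numbers, and the short lens labels are hypothetical stand-ins for the full lens names in the CSV.

```python
from collections import Counter, defaultdict

# Hypothetical (year, lens) pairs standing in for the real CSV rows.
records = [
    (2012, "16-35"), (2012, "16-35"), (2012, "70-200"),
    (2019, "24-70"), (2019, "24-70"), (2019, "24-70"),
    (2019, "70-200"), (2019, "16-35"),
]

def lens_share_by_year(records):
    """Return {year: {lens: fraction of that year's images}}."""
    counts = defaultdict(Counter)
    for year, lens in records:
        counts[year][lens] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {lens: n / total for lens, n in counter.items()}
    return shares

shares = lens_share_by_year(records)
for year in sorted(shares):
    print(year, shares[year])
```

A lens that was not used in a given year simply does not appear in that year's dictionary.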
This diagram shows quite clearly why I was happy with only the 16-35 and the 70-200. Excluding the super telephoto range, almost 90% of my images fall into the focal range of those lenses. It is just the convenience and the great image quality that make me use the 24-70 so much. If you want to know more about the gear I'm using please refer to my gear page.
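A coverage figure like this could be derived as sketched below. The sample focal lengths are invented, and the 300mm cut-off used to exclude the super telephoto range is an assumption made for the example, not a value taken from the analysis.

```python
def coverage(focal_lengths, ranges=((16, 35), (70, 200)), super_tele_from=300):
    """Share of images whose focal length lies in one of the given ranges.

    Focal lengths at or above super_tele_from (an assumed threshold) are
    left out of the calculation entirely.
    """
    considered = [f for f in focal_lengths if f < super_tele_from]
    if not considered:
        return 0.0
    covered = sum(any(lo <= f <= hi for lo, hi in ranges) for f in considered)
    return covered / len(considered)

# Hypothetical focal lengths in mm; 1000 is excluded as super telephoto.
sample = [20, 24, 35, 39, 47, 100, 200, 1000]
print(f"{coverage(sample):.0%} of the non-super-telephoto images are covered")
```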
The next thing I'd like to look at is the holy trinity of camera settings: aperture, shutter speed and ISO.
The aperture value is usually the single most important setting for slow moving and non moving subjects. It has the largest influence on image sharpness and depth of field (DOF). The leading apertures I use are the maximum apertures of my lenses. This is especially true for the 70-200 f/4 and the 500 f/4 since they do not noticeably improve when stopped down. My f/2.8 lenses do improve (especially the 16-35!) so I use them stopped down if the light levels allow and if I do not try to achieve shallow DOF.
A special aperture for the 5D Mark III is its diffraction limited aperture - f/11. This means that the diffraction disks after f/11 become larger than the 5D3's pixel size and therefore decrease image quality. For that reason only about 1% of all my images were taken at an aperture narrower than f/11. If you want to know more about diffraction, there is a great resource available here.
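To get a feeling for the numbers, the size of the diffraction (Airy) disk can be estimated as 2.44 x wavelength x f-number and compared with the pixel pitch. The sketch below assumes green light of 550nm and a 36mm wide sensor with 5760 horizontal pixels; both values are assumptions for the example and do not come from the analysis above. Where exactly an aperture counts as diffraction limited depends on how many pixel widths the disk is allowed to cover, so this only illustrates the trend.

```python
def airy_disk_diameter_um(f_number, wavelength_um=0.55):
    """Diameter of the Airy disk (out to the first dark ring) in micrometres."""
    return 2.44 * wavelength_um * f_number

# Assumed sensor geometry: 36 mm sensor width spread over 5760 pixels.
PIXEL_PITCH_UM = 36_000 / 5760

for n in (4, 5.6, 8, 11, 16, 22):
    d = airy_disk_diameter_um(n)
    print(f"f/{n}: Airy disk {d:.1f} um = {d / PIXEL_PITCH_UM:.1f} pixel widths")
```

The disk grows linearly with the f-number, which is why stopping down further and further eventually costs sharpness.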
Now that's a really nice and insightful graph. In general, best possible image quality is achieved through using the longest possible shutter speed (= maximum amount of light/information) before overexposing the image and therefore losing highlights. While the maximum aperture of a lens is limited, the shutter speed can be as long as desired. Practical considerations limit the shutter speed however and those limits are all visible in the diagram above. Let's take a closer look.
Shutter speeds shorter than 1/4000s are more of an experimental nature or for extremely bright ambient conditions, since it is possible to freeze almost any action with 1/4000s. The first real 'longest possible shutter speed' is 1/2000s - that is my go-to for freezing most fast action such as birds in flight. At 1/200s, slow movement can be photographed very well - I'm thinking of people as a prime example here. After 1/30s a huge drop-off in quantity occurs. This is, as you may have guessed, caused by the limitations of the human body - namely the ability to handhold the camera steadily.
Most of the images from 1/25 to 0.3s came to be when I was able to stabilize the camera in at least one axis - by resting my elbows on a stable surface or leaning against a wall or pillar for example. After that, it is pretty much stable platform only, which in most cases means a tripod. Now, if you have the opportunity to use the camera on a stable platform, the shutter speed, in many cases, turns into a non-issue and the 'as long as possible' rule dictates going towards 30s. That is the maximum shutter speed most cameras (including mine) allow you to set. Of course it is possible to go even longer with a bulb exposure but that comes with its very own set of trade-offs - long exposure noise being the most important one.
The selection of the correct ISO is really no science - just use the minimum number possible. As we just saw, the shutter speed and aperture are limited for practical or physical reasons respectively. If the image will still be too dark when those are maxed out, higher ISO needs to come to the rescue. Most of the time a longer shutter speed will work, which explains the overwhelming number of ISO 100 images in my portfolio. Reduced image quality is always the consequence of choosing anything other than that, but in reality this becomes noticeable only at ISO 800 and above (considering an eight-year-old camera and highest image quality expectations - your limit may differ).
The full ISO stops (100;200;400;800;...) are used much more often than the in-betweeners since I only set full stops manually. All in-betweeners were set by the camera in Auto-ISO mode. Looking at the above figure it is clear to see that the full ISO stops follow an asymptotic curve towards zero with an outlier at ISO 400. That outlier is explained by the preferred use of ISO 400 for flash photography.
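Separating full stops from in-between values is a small check: a full stop is the base ISO multiplied by a power of two. The sketch below assumes a base of ISO 100 and uses invented sample values.

```python
from collections import Counter

def is_full_stop(iso, base=100):
    """True if iso equals base * 2**n for a whole number n >= 0."""
    quotient, remainder = divmod(iso, base)
    # A positive integer is a power of two if it has exactly one bit set.
    return remainder == 0 and quotient >= 1 and quotient & (quotient - 1) == 0

# Hypothetical ISO values as they might appear in the ISO column.
isos = [100, 100, 125, 200, 320, 400, 400, 640, 800]
counts = Counter("full stop" if is_full_stop(i) else "in-between" for i in isos)
print(counts)
```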
When attaching a flash, the camera always switches to ISO 400 in Auto-ISO mode. That happens for good reason since the ISO and flash power/range have a very special relationship. Doubling the ISO doubles the power of the flash, so using ISO 400 means having four times the flash power (or having four flashes for that matter) while sacrificing virtually no image quality. This is also true with the aperture, but of course the maximum aperture is limited. The shutter speed in contrast has basically no impact on flash power. The flash is only active for a very short period of time (think 1/10,000s - 1/1000s) so using a longer shutter speed only means exposing into darkness which doesn't impact the part of the image primarily illuminated by flash. This can (or rather should) be used as an advantage, since it is possible to precisely set the ratio of foreground (flash) to background (ambient) exposure by changing the shutter speed in flash photography.
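The relationship between ISO and flash reach can be illustrated with the guide number rule (guide number = distance x f-number at ISO 100), where the guide number scales with the square root of the ISO ratio. The guide number of 60 used below is a hypothetical value, not the specification of any particular flash.

```python
import math

def guide_number(gn_iso100, iso):
    """Guide number at the given ISO, derived from the ISO 100 guide number."""
    return gn_iso100 * math.sqrt(iso / 100)

def flash_range_m(gn_iso100, f_number, iso=100):
    """Maximum flash-to-subject distance in metres for a correct exposure."""
    return guide_number(gn_iso100, iso) / f_number

GN = 60  # hypothetical guide number in metres at ISO 100
for iso in (100, 200, 400):
    reach = flash_range_m(GN, 8, iso)
    print(f"ISO {iso}: reach {reach:.1f} m at f/8, like {iso // 100}x the flash power")
```

Because light falls off with the square of the distance, four times the effective power corresponds to twice the reach.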
But the intricacies of flash photography could be a topic for a different article and are not really what I want to talk about here. What I would like to show is my use of exposure compensation.
If the term exposure compensation doesn't ring a bell for you, let me quickly explain what it's about. Every camera has some way of determining the required exposure for each scene - you could say it 'wants' to use a certain amount of exposure. By setting an exposure compensation value the photographer tells the camera to offset the automatically calculated exposure by that value. If an EV of +1 is set, for example, the camera will use an exposure twice as bright (+1 stop) as it normally would.
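In numbers, each stop of compensation doubles or halves the amount of light, so the factor is 2 raised to the EV value. A small sketch, assuming aperture and ISO stay fixed and only the shutter time changes (the function names are illustrative):

```python
def exposure_factor(ev):
    """Relative amount of light for a compensation of ev stops."""
    return 2.0 ** ev

def compensated_shutter(metered_seconds, ev):
    """Shutter time after compensation if aperture and ISO stay fixed."""
    return metered_seconds * exposure_factor(ev)

for ev in (-1, 0, 2 / 3, 1):
    print(f"EV {ev:+.2f}: {exposure_factor(ev):.2f}x light, "
          f"1/125s becomes {compensated_shutter(1 / 125, ev):.4f}s")
```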
Expose to the right (ETTR) is one of my core photographic principles. I find it interesting how the above figure almost follows a Gaussian distribution around the center of +2/3 with a big outlier at 0. My standard EV setting is indeed +2/3, and only if that doesn't work do I correct it in either direction. The Canon metering algorithm is very conservative and makes sure not to blow out highlights. Most of the time this wastes a lot of valuable information through underexposure and that should be corrected accordingly. In the end, the photographer is responsible for a correct exposure, not the camera algorithm.
As already mentioned, the aperture is probably the most important single setting. That's why it does not come as a surprise that Aperture Priority is by far my most used exposure program. For those action situations where a fast shutter speed is what counts, Shutter Priority is the go-to. Oftentimes it is easiest to determine the optimal exposure and then just use this specific exposure without relying on the exposure meter at all - time to put the camera into Manual. Manual is also a great option when time is not an issue but reproducibility is key, as is often the case with tripod shooting.
Now this one is just for fun. I really like having all the GPS data of my photos available for a variety of reasons. Being able to create this diagram is more of a side effect. A more straightforward way to use the location information would be like in the following pictures.
Another property I found interesting to explore was the time of day at which pictures are taken:
This diagram shows that if you enjoy a good night's sleep and a relaxing breakfast, landscape photography is probably not for you. Photographing humans is in that case probably more up your alley since that seems to happen more in the afternoon hours.
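A time-of-day breakdown per subject can be built by counting capture hours for all images that carry a given keyword. The records below are invented for illustration, and "People" is a hypothetical keyword name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (capture time, keywords) pairs; not taken from the real data set.
records = [
    (datetime(2018, 9, 7, 6, 12), ["Landscape", "Travel"]),
    (datetime(2018, 9, 7, 6, 48), ["Landscape"]),
    (datetime(2018, 9, 7, 20, 5), ["Landscape", "Seaside"]),
    (datetime(2018, 9, 8, 15, 30), ["People"]),
    (datetime(2018, 9, 8, 16, 10), ["People", "Travel"]),
]

def images_per_hour(records, keyword):
    """Count images carrying the given keyword for each hour of the day (0-23)."""
    return Counter(taken.hour for taken, keywords in records if keyword in keywords)

landscape_hours = images_per_hour(records, "Landscape")
people_hours = images_per_hour(records, "People")
for hour in range(24):
    if landscape_hours[hour] or people_hours[hour]:
        print(f"{hour:02d}h  landscape={landscape_hours[hour]}  people={people_hours[hour]}")
```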
It is possible to assign a subjective rating to each image. I am using this feature predominantly to decide which images go into the 'Best of' category and which ones I'd like to show in my galleries.
I think the secret of appearing as a competent photographer is to only show your best images. I currently consider slightly more than 4% of my images to be actually good.
The amount of data and insight that can be extracted from this metadata file is of course much greater than the few examples I have presented here. If you have any particular ideas what else would be interesting to evaluate, let me know.
I will make it a point to update this article every once in a while when a sufficiently large number of new photos has been added to my database.
I had a lot of fun exploring the data and creating this piece - I hope you gained some insight or at least entertainment reading it. If you'd like to analyze your own images in a similar way I am happy to share my Python script (a certain level of familiarity with Python will still be required) - just shoot me a mail.
Thanks for reading!