Getting total size occupied by files by extension.

I needed to find out which files were taking up the most space on my hard disk, so I wrote the following program to do it. I thought I'd share it here in case others find it useful.

The code is pure standard C++17 (it relies on std::filesystem), so it should also build and run on Mac and Linux if you want.

#include <chrono>
#include <filesystem>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Create alias for std::filesystem
namespace fs = std::filesystem;
// Define the container used to store the stats. Extension, Size, Count
typedef std::map< std::string, std::pair< std::size_t, std::size_t > > StatsMap;
// Enum of decimal size units (1K = 1000 bytes) to make the code easier to read
enum Sizes : std::size_t { KiloByte = 1000, MegaByte = 1000000, GigaByte = 1000000000 };

// Helper method to print a formatted size
std::string printSize(const std::size_t& size) {
	std::stringstream ss;
	if (size >= GigaByte) {
		ss << (size / GigaByte) << 'G';
	} else if (size >= MegaByte) {
		ss << (size / MegaByte) << 'M';
	} else if (size >= KiloByte) {
		ss << size / KiloByte << 'K';
	} else {
		ss << size << 'B';
	}
	return ss.str();
}

// Method used to populate the map with the file stats
StatsMap getStats(const fs::path& path) {
	StatsMap map;
	try {
		for (const auto& entry : fs::recursive_directory_iterator(path)) {
			const fs::path p{ entry.path() };
			const fs::file_status status{ fs::status(p) };
			// If it's a dir ignore it and move on
			if (fs::is_directory(status)) { continue; }
			// If there's no file extension, ignore and move on
			if (!p.extension().string().length()) { continue; }
			// Get the information on the file
			auto&[size_acc, count] = map[p.extension().string()];
			// Add this file's size to the entry for its extension
			size_acc += fs::file_size(p);
			// Increment the count of files with this extension
			++count;
		}
	} catch (const fs::filesystem_error& fe) {
		std::cout << "Oops! " << fe.what() << " in " << __func__ << std::endl;
	}
	return map;
}

// Main point of execution
int main(int argc, char ** argv) {
	// See if there's a path on the command line, using . as a default
	fs::path dir{ argc > 1 ? argv[1] : "." };
	// Check the location actually exists
	if (!fs::exists(dir)) {
		std::cout << "Path at " << dir << " doesn't exist!" << std::endl;
	} else {
		try {
			// Start the timer to see how long the operation takes
			std::chrono::system_clock::time_point started{ std::chrono::system_clock::now() };
			// Tell the user we're busy as it may take some time
			std::cout << "Busy..." << std::endl;
			// Fill the map with file stats
			StatsMap map = getStats(dir);
			// Iterate the map and print the information on the console.
			for (const auto&[ext, stats] : map) {
				const auto&[acc_size, count] = stats;
				std::cout << std::setw(20) << std::left << ext << ": " << std::setw(7) << std::right << count << " files, size " << std::setw(7) << printSize(acc_size) << " avg " << printSize(acc_size / count) << " per file" << std::endl;
			}
			// Job done, how long did it take?
			std::chrono::milliseconds ms{ std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - started) };
			std::cout << " Processing took " << ms.count() << "ms with " << map.size() << " distinct extensions" << std::endl;
		} catch (std::exception const& e) {
			std::cout << "Ouch " << e.what() << " in " << __func__ << std::endl;
		}
	}
	return 0;
}

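One thing to be aware of in printSize is that it uses integer division, so a 1999 byte file is reported as 1K. If you'd rather keep one decimal place, something along these lines might do. It's only a sketch: formatSize is a name I made up for illustration, it isn't part of the program above, and it keeps the same decimal units (1K = 1000 bytes).

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical variant of printSize (not part of the original listing).
// Keeps one decimal place instead of truncating; 1K is still 1000 bytes.
std::string formatSize(std::uintmax_t size) {
	constexpr std::uintmax_t kilo = 1000, mega = 1000 * kilo, giga = 1000 * mega;
	std::ostringstream ss;
	// Fixed notation with one digit after the decimal point for scaled values
	ss << std::fixed << std::setprecision(1);
	if (size >= giga) {
		ss << static_cast<double>(size) / giga << 'G';
	} else if (size >= mega) {
		ss << static_cast<double>(size) / mega << 'M';
	} else if (size >= kilo) {
		ss << static_cast<double>(size) / kilo << 'K';
	} else {
		// Plain byte counts are integers, so the precision setting doesn't apply
		ss << size << 'B';
	}
	return ss.str();
}
```

Values are rounded to the nearest tenth, so a size just under a unit boundary can print as, say, 1000.0K rather than rolling over to 1.0M. For a quick overview that seemed acceptable to me.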
Should you want to build this yourself, you first need to compile it from the command line (the /c switch tells cl to compile only, without linking):

cl /c /GA /Gz /EHsc /MD /favor:INTEL64 /Os /std:c++17 FileSizer.cpp

Once it’s compiled you need to link it to make an .exe from the .obj file produced above:

link /OUT:FileSizer.exe /MACHINE:X64 /SUBSYSTEM:CONSOLE FileSizer.obj

If you don’t have Visual Studio installed you can get it for free here:

Visual Studio Community Edition (Free)

or you can just install the free Microsoft SDK Toolchain from here:

Windows 10 SDK Toolchain

Once built, it's ready to run. From a Windows Command Prompt you can enter:

FileSizer

to process the current directory or you can provide a specific path on the command line, for example:

FileSizer ./MyStuff

When complete the output will be something like this:

.win32              :       1 files, size    887B avg 887B per file
.xaml               :       2 files, size      2K avg 1K per file
.xdc                :       4 files, size     37K avg 9K per file
.xlsm               :       2 files, size     26K avg 13K per file
.xlsx               :       2 files, size     15M avg 7M per file
.xml                :      33 files, size     42K avg 1K per file
.xsd                :      56 files, size      1M avg 22K per file
.y                  :       1 files, size      8K avg 8K per file
.yml                :      11 files, size     37K avg 3K per file
.zip                :      75 files, size     36M avg 491K per file
 Processing took 7852ms with 242 distinct extensions
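
The listing comes out ordered by extension because std::map keeps its keys sorted. If what you really want is the biggest space hogs at the top, one option is to copy the map into a vector and sort that by total size. A rough sketch, assuming the same StatsMap shape as in the program; sortedBySize and StatsEntry are names I invented for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Same shape as the StatsMap in the program: extension -> (total size, file count)
using StatsMap = std::map<std::string, std::pair<std::size_t, std::size_t>>;
// Hypothetical alias for one copied map entry
using StatsEntry = std::pair<std::string, std::pair<std::size_t, std::size_t>>;

// Hypothetical helper: returns the entries ordered by total size, largest first.
std::vector<StatsEntry> sortedBySize(const StatsMap& stats) {
	// Copy the map entries into a vector so they can be reordered
	std::vector<StatsEntry> entries(stats.begin(), stats.end());
	std::sort(entries.begin(), entries.end(),
		[](const StatsEntry& a, const StatsEntry& b) {
			return a.second.first > b.second.first;
		});
	return entries;
}
```

In main you would then loop over sortedBySize(map) instead of map; the structured bindings in the printing loop should work unchanged.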


The code stops scanning at the first error it encounters, such as Access Denied, and only reports what it had gathered up to that point, so running it from the root of C: is not recommended. I'd suggest the best place to run it from would be your home directory.


Re: Getting total size occupied by files by extension.

does the same thing with quite a good interface


Re: Getting total size occupied by files by extension.

That's a bit overkill for my needs, @daveplus ;)