Getting total size occupied by files by extension.

Anonymous
Not applicable

I needed to find out which files were taking up the most space on my hard disk, so I wrote the following program to do it. I thought I'd share it here in case others find it useful.

The code uses only the C++17 standard library (std::filesystem), so it should also build and run on Mac and Linux if you want.

#include <chrono>
#include <filesystem>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Create alias for std::filesystem
namespace fs = std::filesystem;
// Define the container used to store the stats. Extension, Size, Count
typedef std::map< std::string, std::pair< std::size_t, std::size_t > > StatsMap;
// Enum to make the code easier to read (decimal units: 1K = 1000 bytes)
enum Sizes : std::size_t { KiloByte = 1000, MegaByte = 1000000, GigaByte = 1000000000 };

// Helper method to print a formatted size
std::string printSize(const std::size_t& size) {
	std::stringstream ss;
	if (size >= GigaByte) {
		ss << (size / GigaByte) << 'G';
	} else if (size >= MegaByte) {
		ss << (size / MegaByte) << 'M';
	} else if (size >= KiloByte) {
		ss << size / KiloByte << 'K';
	} else { ss << size << 'B';}
	return ss.str();
}

// Method used to populate the map with the file stats
StatsMap getStats(const fs::path& path) {
	StatsMap map;
	try {
		for (const auto& entry : fs::recursive_directory_iterator(path)) {
			const fs::path p{ entry.path() };
			const fs::file_status status{ fs::status(p) };
			// Only regular files have a size we can query; skip directories,
			// broken links, sockets and other special entries
			if (!fs::is_regular_file(status)) { continue; }
			// If there's no file extension, ignore and move on
			if (p.extension().empty()) { continue; }
			// Get the information on the file
			auto&[size_acc, count] = map[p.extension().string()];
			// Add this file's size to the entry for its extension
			size_acc += fs::file_size(p);
			// Increment the count of files for the extension
			count++;
		}
	} catch (fs::filesystem_error const& fe) {
		std::cout << "Oops! " << fe.what() << " in " << __func__ << std::endl;
	}
	return map;
}

// Main point of execution
int main(int argc, char ** argv) {
	// See if there's a path on the command line, using . as a default
	fs::path dir{ argc > 1 ? argv[1] : "." };
	// Check the location actually exists
	if (!fs::exists(dir)) {
		std::cout << "Path at " << dir << " doesn't exist!" << std::endl;
	} else {
		try {
			// Start the timer to see how long the operation takes
			// (steady_clock is monotonic, so it is the right clock for measuring intervals)
			const auto started = std::chrono::steady_clock::now();
			// Tell the user we're busy as it may take some time
			std::cout << "Busy..." << std::endl;
			// Fill the map with file stats
			StatsMap map = getStats(dir);
			// Iterate the map and print the information on the console.
			for (const auto&[ext, stats] : map) {
				const auto&[acc_size, count] = stats;
				std::cout << std::setw(20) << std::left << ext << ": "
					<< std::setw(7) << std::right << count << " files, size "
					<< std::setw(7) << printSize(acc_size)
					<< " avg " << printSize(acc_size / count) << " per file" << std::endl;
			}
			// Job done, how long did it take
			const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - started);
			std::cout << " Processing took " << ms.count() << "ms with " << map.size() << " distinct extensions" << std::endl;
		}
		catch (std::exception const& e) {
			std::cout << "Ouch " << e.what() << " in " << __func__ << std::endl;
		}
	}
}
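
One thing to be aware of: printSize uses integer division, so sizes are truncated to whole units (1999 bytes prints as 1K). If you would rather see one decimal place, something along these lines could be swapped in. This is only a sketch; the name formatSize is made up for illustration and is not part of the program above.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical alternative to printSize: keeps one decimal place
// instead of truncating to a whole number of units (decimal units, 1K = 1000 bytes).
std::string formatSize(std::uintmax_t bytes) {
	constexpr std::uintmax_t kilo = 1000;
	constexpr std::uintmax_t mega = 1000 * kilo;
	constexpr std::uintmax_t giga = 1000 * mega;
	std::ostringstream ss;
	ss << std::fixed << std::setprecision(1);
	if (bytes >= giga) {
		ss << static_cast<double>(bytes) / giga << 'G';
	} else if (bytes >= mega) {
		ss << static_cast<double>(bytes) / mega << 'M';
	} else if (bytes >= kilo) {
		ss << static_cast<double>(bytes) / kilo << 'K';
	} else {
		// Plain byte counts are integers, so no decimal point is printed
		ss << bytes << 'B';
	}
	return ss.str();
}
```

Values are rounded to the nearest tenth rather than truncated, so the output can differ slightly from printSize near unit boundaries.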

Should you want to build this for yourself you first need to compile it using the command line:

cl /GA /Gz /EHsc /MD /favor:INTEL64 /Os /std:c++17 FileSizer.cpp

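Without the /c switch, cl runs the linker itself and produces FileSizer.exe, so a separate link step is only needed if you compile with /c. In that case you can make an .exe from the .obj file like this: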

link /OUT:FileSizer.exe /MACHINE:X64 /SUBSYSTEM:CONSOLE FileSizer.obj

If you don’t have Visual Studio installed you can get it for free here:

Visual Studio Community Edition (Free)

or you can just install the free Microsoft SDK Toolchain from here:

Windows 10 SDK Toolchain

Once built, it's ready to run. Using a Windows Command Prompt you can enter:

FileSizer

to process the current directory or you can provide a specific path on the command line, for example:

FileSizer ./MyStuff


When complete the output will be something like this:

…
.win32              :       1 files, size    887B avg 887B per file
.xaml               :       2 files, size      2K avg 1K per file
.xdc                :       4 files, size     37K avg 9K per file
.xlsm               :       2 files, size     26K avg 13K per file
.xlsx               :       2 files, size     15M avg 7M per file
.xml                :      33 files, size     42K avg 1K per file
.xsd                :      56 files, size      1M avg 22K per file
.y                  :       1 files, size      8K avg 8K per file
.yml                :      11 files, size     37K avg 3K per file
.zip                :      75 files, size     36M avg 491K per file
 Processing took 7852s with 242 distinct extensions
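
Because the stats live in a std::map, the listing comes out in alphabetical order of extension. If what you really want is the biggest space hogs first, the map could be copied into a vector and sorted by total size. A rough sketch follows; StatsRow and sortedBySize are names I've made up for illustration, and StatsMap is repeated here only so the snippet stands on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Same shape as the StatsMap in the program: extension -> (total size, file count)
using StatsMap = std::map<std::string, std::pair<std::size_t, std::size_t>>;
// Hypothetical row type: a copy of one map entry with a non-const key, so it can be sorted
using StatsRow = std::pair<std::string, std::pair<std::size_t, std::size_t>>;

// Returns the entries ordered by total size, largest first
std::vector<StatsRow> sortedBySize(const StatsMap& map) {
	std::vector<StatsRow> rows(map.begin(), map.end());
	std::sort(rows.begin(), rows.end(), [](const StatsRow& a, const StatsRow& b) {
		return a.second.first > b.second.first;
	});
	return rows;
}
```

The printing loop in main could then iterate over sortedBySize(map) instead of map; the structured bindings work the same way on the vector's rows.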

Caveat:

The scan stops at the first filesystem error it encounters, such as Access Denied, and reports only what it has counted up to that point, so it's not recommended to run this from the root of C:. I'd suggest the best place to run it from would be your home directory.

 

daveplus

Re: Getting total size occupied by files by extension.

https://windirstat.net/ does the same thing with quite a good interface

Dave

Anonymous
Not applicable

Re: Getting total size occupied by files by extension.

That's a bit overkill for my needs @daveplus ;)