Getting total size occupied by files by extension.
22-04-2019 4:15 PM
I needed to find out which files were taking up the most space on my hard disk, so I wrote the following program to do it. I thought I'd share it here in case others find it useful.
The code uses only the C++17 standard library, so it should also build and run on Mac and Linux if you want.
#include <chrono>
#include <filesystem>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Create alias for std::filesystem
namespace fs = std::filesystem;

// Define the container used to store the stats. Extension, Size, Count
typedef std::map< std::string, std::pair< std::size_t, std::size_t > > StatsMap;

// Enum to make the code easier to read (decimal units, so 1K is 1000 bytes)
enum Sizes : std::size_t { KiloByte = 0x3e8, MegaByte = 0xf4240, GigaByte = 0x3b9aca00 };

// Helper method to print a formatted size (whole units, rounded down)
std::string printSize(const std::size_t& size)
{
    std::stringstream ss;
    if (size >= GigaByte) { ss << (size / GigaByte) << 'G'; }
    else if (size >= MegaByte) { ss << (size / MegaByte) << 'M'; }
    else if (size >= KiloByte) { ss << (size / KiloByte) << 'K'; }
    else { ss << size << 'B'; }
    return ss.str();
}

// Method used to populate the map with the file stats
StatsMap getStats(const fs::path& path)
{
    StatsMap map;
    try
    {
        for (const auto& entry : fs::recursive_directory_iterator(path))
        {
            const fs::path p{ entry.path() };
            const fs::file_status status{ fs::status(p) };

            // Only regular files have a size; skip directories, broken links and the like
            if (!fs::is_regular_file(status)) { continue; }

            // If there's no file extension, ignore and move on
            if (p.extension().empty()) { continue; }

            // Get the information on the file
            auto& [size_acc, count] = map[p.extension().string()];

            // Add this file's size to the entry for its extension
            size_acc += fs::file_size(p);

            // Increment the count of files with this extension
            count++;
        }
    }
    catch (const fs::filesystem_error& fe)
    {
        // The scan stops here; whatever was collected so far is still returned
        std::cout << "Oops! " << fe.what() << " in " << __func__ << std::endl;
    }
    return map;
}

// Main point of execution
int main(int argc, char ** argv)
{
    // See if there's a path on the command line, using . as a default
    fs::path dir{ argc > 1 ? argv[1] : "." };

    // Check the location actually exists
    if (!fs::exists(dir))
    {
        std::cout << "Path at " << dir << " doesn't exist!" << std::endl;
    }
    else
    {
        try
        {
            // Start the timer to see how long the operation takes
            // (steady_clock is not affected by system clock adjustments)
            const auto started = std::chrono::steady_clock::now();

            // Tell the user we're busy as it may take some time
            std::cout << "Busy..." << std::endl;

            // Fill the map with file stats
            StatsMap map = getStats(dir);

            // Iterate the map and print the information on the console.
            for (const auto& [ext, stats] : map)
            {
                const auto& [acc_size, count] = stats;
                std::cout << std::setw(20) << std::left << ext << ": "
                          << std::setw(7) << std::right << count << " files, size "
                          << std::setw(7) << printSize(acc_size)
                          << " avg " << printSize(acc_size / count) << " per file" << std::endl;
            }

            // Job done, how long did it take (in milliseconds)
            const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - started);
            std::cout << " Processing took " << ms.count() << "ms with "
                      << map.size() << " distinct extensions" << std::endl;
        }
        catch (std::exception const& e)
        {
            std::cout << "Ouch " << e.what() << " in " << __func__ << std::endl;
        }
    }
}
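One thing to be aware of is that printSize uses integer division, so a total of 15.9 million bytes is shown as 15M. If that matters to you, a variant along these lines keeps one decimal place. This is only a sketch: formatSize is a made-up name and is not part of the program above, and it assumes the same decimal units (1K = 1000 bytes).

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical variant of printSize (the name formatSize is an assumption).
// Same decimal units as the program, but keeps one decimal place
// instead of rounding down to a whole number of units.
std::string formatSize(std::uintmax_t size)
{
    constexpr std::uintmax_t kilo = 1000, mega = kilo * 1000, giga = mega * 1000;

    std::ostringstream ss;
    ss << std::fixed << std::setprecision(1);   // only affects the floating-point branches

    if (size >= giga)      { ss << static_cast<double>(size) / giga << 'G'; }
    else if (size >= mega) { ss << static_cast<double>(size) / mega << 'M'; }
    else if (size >= kilo) { ss << static_cast<double>(size) / kilo << 'K'; }
    else                   { ss << size << 'B'; }   // plain bytes stay a whole number

    return ss.str();
}
```

It could be dropped in wherever printSize is called, at the cost of slightly wider columns.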
Should you want to build this for yourself, first compile it from the command line (a Visual Studio developer command prompt has cl on the path):
cl /GA /Gz /EHsc /MD /favor:INTEL64 /Os /std:c++17 FileSizer.cpp
Once it's compiled, link the .obj file produced above to make the .exe (cl normally runs the linker itself unless it is given /c, so this step mainly makes the link options explicit):
link /OUT:FileSizer.exe /MACHINE:X64 /SUBSYSTEM:CONSOLE FileSizer.obj
If you don’t have Visual Studio installed you can get it for free here:
Visual Studio Community Edition (Free)
or you can just install the free Microsoft SDK Toolchain from here:
Windows 10 SDK Toolchain
Once built, it's ready to run. From a Windows command prompt you can enter:
FileSizer
to process the current directory, or you can provide a specific path on the command line, for example:
FileSizer ./MyStuff
When complete, the output will be something like this (the processing time on the last line is a count of milliseconds):
…
.win32              :       1 files, size    887B avg 887B per file
.xaml               :       2 files, size      2K avg 1K per file
.xdc                :       4 files, size     37K avg 9K per file
.xlsm               :       2 files, size     26K avg 13K per file
.xlsx               :       2 files, size     15M avg 7M per file
.xml                :      33 files, size     42K avg 1K per file
.xsd                :      56 files, size      1M avg 22K per file
.y                  :       1 files, size      8K avg 8K per file
.yml                :      11 files, size     37K avg 3K per file
.zip                :      75 files, size     36M avg 491K per file
Processing took 7852s with 242 distinct extensions
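The listing comes out in alphabetical order of extension because std::map keeps its keys sorted. If what you really want is the biggest space hogs at the top, one option is to copy the map into a vector and sort that by total size. The following is a rough sketch only; sortBySize and StatsRow are names I've made up for illustration, and the StatsMap alias is assumed to match the one in the program.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed to match the program's container: extension -> (total size, file count)
using StatsMap = std::map<std::string, std::pair<std::size_t, std::size_t>>;
using StatsRow = std::pair<std::string, std::pair<std::size_t, std::size_t>>;

// Hypothetical helper: returns the rows ordered by total size, largest first.
// stable_sort keeps extensions with equal totals in alphabetical order.
std::vector<StatsRow> sortBySize(const StatsMap& stats)
{
    std::vector<StatsRow> rows(stats.begin(), stats.end());
    std::stable_sort(rows.begin(), rows.end(),
        [](const StatsRow& a, const StatsRow& b)
        {
            return a.second.first > b.second.first;
        });
    return rows;
}
```

The print loop in main could then iterate over sortBySize(map) instead of over the map itself.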
Caveat:
The scan stops at the first error it encounters, such as Access Denied, and only reports what it had counted up to that point, so it's not recommended to run this from the root of the C: drive. I'd suggest the best place to run it from is your home directory.
Re: Getting total size occupied by files by extension.
22-04-2019 5:15 PM
https://windirstat.net/ does the same thing with quite a good interface
Dave

Re: Getting total size occupied by files by extension.
22-04-2019 5:44 PM
That's a bit overkill for my needs @daveplus