Printing a PDF version of a complete website

shermans
Pro
Posts: 1,272
Thanks: 94
Fixes: 3
Registered: 07-09-2007

I would be most grateful for advice on how to make a PDF file of a website which consists of 150 MB of documents. The documents are thoroughly structured, starting with an Index of 8 Sections. Each Section consists of individual PDF files, all contained within sub-folders, each sub-folder having its own Index to the PDF files concerned.

As a website, it is extremely easy to navigate to each PDF file. Some sub-folders have perhaps five PDF files in them, others as many as 50. I guess the average would be about 10 per sub-folder.

The only way that I can think of doing it is to open every Index and print the page.  Then open the Index and save all the individual PDF files subject by subject.  Having worked through the entire website, I would then have to combine all these documents into one large PDF file.  It would be a fairly laborious job.

It is so easy to navigate on screen, but there is a requirement for a consolidated version as a single PDF file, which is portable and easily transferable without using the internet.

There must be an easier way to achieve this, but at present all I can think of is to make paper prints of everything and then scan all the paper into a PDF file. That is not really a solution, because the PDF file is also required to be searchable, and a paper scan would just be 'photographs' and the file size would be unimaginably large!

Any suggestions would be most appreciated.

Mook
Aspiring Champion
Posts: 716
Thanks: 498
Fixes: 1
Registered: 27-12-2019

@shermans Do you have physical access to the site?

Mook

If you do have access to the machine, @shermans, and you have Python installed:

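import glob
import os
# PyPDF2's PdfFileMerger was removed in PyPDF2 3.0; its successor, pypdf
# (pip install pypdf), provides PdfWriter, which can append whole files
from pypdf import PdfWriter

directory = "."
output = "master.pdf"
# "**" with recursive=True matches PDFs in this folder and every sub-folder
pathname = os.path.join(directory, "**", "*.pdf")
# sorted() gives a predictable order; skip a master.pdf left by an earlier run
pdfs = sorted(p for p in glob.glob(pathname, recursive=True)
              if os.path.basename(p) != output)
if pdfs:
    print("Found %d files, merging..." % len(pdfs))
    merger = PdfWriter()

    for pdf in pdfs:
        merger.append(pdf)  # adds every page of this file to the end

    merger.write(output)
    merger.close()
else:
    print("No files found matching %s" % pathname)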

This should do what you want, but it hasn't been tested.

shermans

Thanks for that idea. I do have access to the machine. I have not downloaded Python yet, but there is no reason why I should not do so.

However, would this just consolidate all the PDFs, or would it also include the directory structure, which is of course HTML? I could of course print the HTML pages for the directory structure as PDFs to be included, but that would make it rather more complicated.

Mook

This would just create a consolidated PDF. How could it include the directory structure? That's not part of the PDFs being merged. So I'm not really sure what you mean by including the directory structure; could you elaborate, please?
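If the aim is for the merged PDF to follow the website's structure, one possible approach (a sketch, not tested against the real site) is to read each HTML index and collect its PDF links in the order they appear on the page, instead of relying on the alphabetical order from the glob in the script above. It uses only Python's standard library; the helper names and sample file names are hypothetical, and it assumes the indexes use ordinary <a href="..."> links with relative paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collects href values of <a> tags that point at .pdf files, in page order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore anchors without an href and links to non-PDF pages
        if href and href.lower().split("#")[0].endswith(".pdf"):
            self.links.append(href)


def pdf_links_in_order(index_html, base=""):
    """Return the PDF links from one index page, resolved against base."""
    parser = PdfLinkCollector()
    parser.feed(index_html)
    parser.close()
    return [urljoin(base, href) for href in parser.links]


if __name__ == "__main__":
    # Hypothetical section index, as it might appear in section1/index.html
    sample = """
    <h1>Section 1</h1>
    <a href="exhibit-02.pdf">Letter</a>
    <a href="notes.html">Notes</a>
    <a href="exhibit-01.pdf">Contract</a>
    """
    for link in pdf_links_in_order(sample, base="section1/"):
        print(link)
```

The list it returns could be handed to the merging loop in place of the glob result, one section index at a time, so the PDF pages come out in the same order as the website's indexes.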

VileReynard
Hero
Posts: 12,535
Thanks: 615
Fixes: 19
Registered: 01-09-2007

Actually, searching a PDF may not work well, at least outside of a PDF viewer.

So your collection of PDFs isn't searchable at present, unless you search each one separately!

Displaying and searching a 150 MB PDF may take a little while.

"In The Beginning Was The Word, And The Word Was Aardvark."

daveplus
Pro
Posts: 554
Thanks: 122
Fixes: 7
Registered: 25-08-2010

@shermans 

Hi. Why not take a mirror of the whole site?

Dave

shermans

That sounds interesting. How do I take a mirror? Would this mirror work like the website, and would it be easily shareable?

What I want to do is consolidate everything from my own website so that it can be shared offline in a simple file envelope. Obviously, I could just copy all the code onto a USB stick and it could be run from there. However, as this is destined for a Court, they require any digital submissions to be sent by email so that they can be virus-checked before use. So a single file, like a PDF, is required rather than HTML files.

The advantage of a single consolidated PDF file is that it could not only be searched, but also printed off if necessary and put into a folder; some judges are not very digitally literate. The pages of the consolidated file, in whatever format, would also have to be numbered sequentially from beginning to end, so that the judge could be directed to a specific page number rather than using the index.
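For the sequential page numbering, the arithmetic is a running total: each document starts on the page after the previous one ends. A minimal sketch, assuming the page count of each exhibit is already known (in practice it would be read from each PDF, for example with a PDF library); the titles and counts below are made up, and stamping the numbers onto the pages themselves is a separate step not shown here.

```python
from itertools import accumulate


def bundle_index(exhibits, first_page=1):
    """Turn (title, page_count) pairs, in bundle order, into (title, start, end) rows."""
    counts = [pages for _, pages in exhibits]
    # Pages that come before each exhibit: 0, then the running total of earlier counts
    offsets = [0] + list(accumulate(counts))[:-1]
    rows = []
    for (title, pages), offset in zip(exhibits, offsets):
        start = first_page + offset
        rows.append((title, start, start + pages - 1))
    return rows


if __name__ == "__main__":
    # Made-up titles and page counts, purely for illustration
    exhibits = [("Index", 2), ("Section 1: contract", 12), ("Section 2: letters", 3)]
    for title, start, end in bundle_index(exhibits):
        print(f"{title:<25} pages {start}-{end}")
```

The printed rows could double as the bundle index, so the judge can be pointed straight to "pages 3-14" rather than navigating the HTML.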

The structure of the website is very simple: HTML files which link to about 100 PDF exhibits. It is broken down into about 8 sections, and each section has an HTML index for that section; there are a couple of sub-sections as well. The coding is very low-level DIY, but it allows documents to be viewed simultaneously in a remote telephone Court hearing. In normal times, the hearing would be in open Court with paper bundles provided for the judge and the two parties. But due to Covid-19, it is going to be heard remotely by telephone, initially at least; there may be a subsequent live hearing in Court for concluding addresses, but I am trying to avoid the need for that to save time.

This is understandably confidential, so I cannot share the website publicly. I am a litigant in person taking on a multinational company in a claim due to be heard by the Court on 17th March; this has been ongoing since 2018. It really is a case of Sampson and Goliath!

However, if anyone thinks they might be able to help me bundle this, please send me a private message. I would be willing to share the website privately with log-in details etc. and would be indebted to you.

I really would be most grateful for any help.

VileReynard

It is very easy to take a mirror if it is a static website, i.e. one with no server-side or client-side scripts etc.

Think of a basic web site as a hierarchy of directories - which is what it mostly is.
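To see that hierarchy for a local copy of the site, a short standard-library Python sketch can walk the folders and list the HTML and PDF files by their path within the site. The set of file types is an assumption based on the site being plain HTML pages plus PDF exhibits; a mirror of a static site is essentially a copy of this same tree.

```python
from pathlib import Path

# File types assumed to make up this kind of simple static site
SITE_SUFFIXES = {".html", ".htm", ".pdf"}


def site_files(root):
    """Return the HTML and PDF files under root as site-relative paths, sorted.

    Sorting on the relative path keeps everything under the same folder
    together, which follows the layout of the site's sections and sub-folders.
    """
    root = Path(root)
    found = (
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SITE_SUFFIXES
    )
    return sorted(p.relative_to(root).as_posix() for p in found)
```

Calling site_files on the folder holding the site (the folder name is up to you) would return entries such as "index.html" and "section1/index.html", one per page or exhibit.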

Note: you can't send a 150 MB file by email, although you could send a link to one...
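If email is the only route, one option is to split the bundle into several attachments while keeping the documents in order. A minimal sketch of that grouping, assuming a hypothetical 20 MB per-email limit (the real limit depends on the Court's mail system, so it would need checking) and that each file's size in bytes is known; a single file larger than the limit is left in a volume of its own.

```python
MB = 1024 * 1024


def split_into_volumes(files, limit_bytes):
    """Group (name, size_in_bytes) pairs into volumes of at most limit_bytes each.

    Files stay in their original order, so page numbering can run on from one
    volume to the next. A single file larger than the limit gets a volume of
    its own (and that volume will still be over the limit).
    """
    volumes, current, current_size = [], [], 0
    for name, size in files:
        # Start a new volume if adding this file would push the current one over
        if current and current_size + size > limit_bytes:
            volumes.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        volumes.append(current)
    return volumes


if __name__ == "__main__":
    # Hypothetical file sizes and a hypothetical 20 MB limit per email
    exhibits = [("index.pdf", 1 * MB), ("section1.pdf", 14 * MB),
                ("section2.pdf", 9 * MB), ("section3.pdf", 6 * MB)]
    for number, volume in enumerate(split_into_volumes(exhibits, 20 * MB), start=1):
        print(f"Email {number}: {', '.join(volume)}")
```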

If a multi-national doesn't settle out of court, you have probably lost already.

"Sampson and Goliath"

Do you mean 'Samson & Delilah' or 'David & Goliath'? 😀

shermans

Probably a mixture of both! It was a malapropism!

The file could be put on a USB stick and would probably be accepted by the Court officers.  I would need to ask.

The next hearing will not be the final one, but it will give us a good idea of what the outcome will be. I would not have gone this far if I had not believed I had a strong case. But making it as easy as possible for the judge to digest all the documentation will no doubt play a significant role. Being faced with hundreds of pages of paper (which will have to be printed anyway) will make it much more difficult for the judge, who may therefore not bother, whereas if everything is available digitally on a computer, there is a much better chance that the important documents will be found and looked at. I use the website myself all the time to refer to things because it is so easy and convenient to use.

So what does mirroring involve? Is it like an image, or is it just transferring all the HTML and PDF files into a folder (which of course I already have)? Unless that folder can be consolidated into one 'file', I fear the Court office will not accept it, which is why I have not done it before.
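On the 'one file' point: a folder can be packed into a single .zip archive that keeps the folder structure intact. A minimal standard-library sketch, with hypothetical folder and file names. The recipient would need to unzip it before opening the index page; relative links between the pages should still work after extraction (absolute links to the live site would not); and whether the Court office accepts .zip attachments is something only they can confirm. PDFs are usually already compressed, so the archive is unlikely to be much smaller than 150 MB.

```python
import zipfile
from pathlib import Path


def zip_site(site_dir, zip_path):
    """Pack every file under site_dir into one .zip archive, keeping the folder structure.

    Returns the number of files added.
    """
    site_dir = Path(site_dir)
    zip_path = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site_dir.rglob("*")):
            # Skip folders (they are recreated from the file paths) and the archive itself
            if not path.is_file() or path.resolve() == zip_path:
                continue
            # arcname is the path stored inside the archive, relative to the site root
            zf.write(str(path), arcname=path.relative_to(site_dir).as_posix())
            count += 1
    return count
```

Usage would be along the lines of zip_site("my-website", "court-bundle.zip"), where both names are placeholders for the local copy of the site and the archive to create.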