Picture Pump (c) 2000-2011 by Zmey Petroff
 
Online Help and FAQ

Help
Getting started. Creating a project.

Before you start preparing a project for a mass graphics download, you should find out the following:

  • Full URL of the site from which you want to download pictures.
  • The method used to number pictures on the site.

Full URL of the site. To start downloading files from a web server, you need to know the full URL of the items you wish to retrieve. The full URL is more than the name or address of the web server: it also includes the full path to the files (or to the scripts that fetch them) and any parameters the server-side scripts require. To find the full URL of a web page that contains the graphics you wish to download, display the page in your browser and view its properties; the full URL is listed there. In Microsoft Internet Explorer, page properties are under File->Properties.

You should note the following:

  • If the web page uses frames, you will see the URL of the frames page, not the URL of the page that has graphics on it. To find out the URL of the page(s) with pictures, you should view the HTML source code of the frames page.
  • The URLs of pictures might be generated by JavaScript code located somewhere within the page. As Picture Pump cannot run JavaScript, there is only one option left: view the JavaScript code yourself, find out what URLs it generates, and create a project that downloads pictures directly from those URLs.
  • Picture Pump cannot (yet) work using protocols other than HTTP.

The method used to number pictures on the site. For large collections of images, the easiest method for the webmaster, and also the fastest and most reliable, is to assign *numbers* to pictures or to the HTML pages that contain them. This means that to reach the next picture in a collection, you change a number in the URL of the picture you are currently viewing. In most cases you simply increase or decrease that number (also called the "counter"):

http://www.server.com/images/cool/zoom.asp?id=15 - URL of the current picture,

http://www.server.com/images/cool/zoom.asp?id=16 - URL of the next picture.

As you can see in this example, the counter is the number after "id=". In most cases it is quite safe to set the initial value of the counter to 1; this way you may also download "hidden" images that are not directly accessible via links on the web pages.

When you know the full URL of the site and the method used to number pictures on the server, you are ready to create a project:

  • Enter the full URL of the site into the corresponding field, replacing the changeable part (i.e., the counter) with a single substitute character. By default, the substitute character is @. In the above example the counter is located after "id=", so you should enter http://www.server.com/images/cool/zoom.asp?id=@ into the Site URL field.
  • If @ is used as a part of the site URL, you can select another substitute character from the list or enter one of your own choice.
  • Enter the initial value of the counter. Usually it is 1.
  • Enter the increment for the counter. If you want the counter to increase by one for each next generated URL, set increment to 1.
  • Enter the counter template if necessary.

Well, that is all! You can now start downloading.
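
The substitution described in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not Picture Pump's actual code: the function name generate_urls and its parameters are hypothetical, and the check that the substitute character appears exactly once is an assumption of this sketch.

```python
def generate_urls(site_url, start=1, increment=1, count=10, substitute="@"):
    """Return `count` URLs with the substitute character replaced by the counter."""
    # Assumption of this sketch: the Site URL holds exactly one substitute character.
    if site_url.count(substitute) != 1:
        raise ValueError("Site URL must contain the substitute character exactly once")
    urls = []
    counter = start
    for _ in range(count):
        urls.append(site_url.replace(substitute, str(counter)))
        counter += increment
    return urls


# The example from the text: the counter sits after "id=".
for url in generate_urls("http://www.server.com/images/cool/zoom.asp?id=@",
                         start=15, count=2):
    print(url)
```

With a start value of 15 and an increment of 1, the two generated URLs end in id=15 and id=16, matching the example above.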

Debugging the project and correcting errors.

Using Counter Template to fine tune your project.

Since Picture Pump version 1.3, you can use a template for your counter. Though using a template may seem an unnecessary complication for a novice user, in some cases it is very useful, and sometimes unavoidable. Let us consider the following situation:

Pictures are numbered using a three-digit counter from 001 up to 999 (e.g., www.server.com/images/pic_001_.jpg, www.server.com/images/pic_002_.jpg, etc). If you do not use a template, you will have to create three virtually identical projects:

The first one will download the first 9 pictures numbered from 1 to 9:

www.server.com/images/pic_00@_.jpg, counter starts at 1 and ends at 9;

The second one will download pictures numbered 10 to 99:

www.server.com/images/pic_0@_.jpg, counter from 10 to 99;

And the third one will download all pictures from 100 to 999:

www.server.com/images/pic_@_.jpg, counter from 100 to 999;

You will have to watch each project so that it doesn't exceed the upper limit of the counter, and manually stop each project after all pictures are downloaded.

Using the Counter Template lets you avoid this tedious work. With it, you can specify the width of the counter in digits: each substitute character placed in the template field marks one digit position (the default substitute character is @). For the example project described above, specify three numeric positions for the counter by entering three @'s into the template field: "@@@". So:

www.server.com/images/pic_@_.jpg, set the counter to 1, template to @@@.

Insertion of the counter will then be performed in two steps:

  1. First the template will be applied to the current counter value (if the counter is 12, the result will be 012 - with leading zero).
  2. Second, the result will be inserted into the site URL in place of the substitute character (www.server.com/images/pic_@_.jpg -> www.server.com/images/pic_012_.jpg).

The template is convenient not only because it lets you specify leading zeros in the counter. When the counter needs more digits than the template provides, the project is stopped automatically; in the above example, the project will be stopped when the counter reaches 1000.

If you wish, you can enter characters other than the substitute character in the template. Such characters do not specify a position, but are directly copied into the Site URL:

Using the Template "@@-@" and Counter 12, the following substring will be inserted into the Site URL: "01-2" (as you see, the hyphen is copied from the template as-is).
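
The two-step insertion and the automatic stop can be illustrated with a short Python sketch. It reproduces only the behaviour described in this section (leading zeros, literal characters copied as-is, stop on overflow); the function names apply_template and build_url are hypothetical, and the sketch assumes a non-negative counter.

```python
def apply_template(template, counter, substitute="@"):
    """Step 1: format a non-negative `counter` according to `template`.

    Returns the formatted string, or None when the counter has more digits
    than the template has positions (the point at which the project stops).
    """
    width = template.count(substitute)
    digits = str(counter)
    if len(digits) > width:
        return None
    # Pad with leading zeros, then hand out one digit per substitute character;
    # every other template character is copied unchanged.
    padded = iter(digits.zfill(width))
    return "".join(next(padded) if ch == substitute else ch for ch in template)


def build_url(site_url, template, counter, substitute="@"):
    """Step 2: insert the formatted counter into the Site URL."""
    formatted = apply_template(template, counter, substitute)
    if formatted is None:
        return None  # counter no longer fits the template
    return site_url.replace(substitute, formatted)


print(apply_template("@@@", 12))    # 012
print(apply_template("@@-@", 12))   # 01-2
print(build_url("www.server.com/images/pic_@_.jpg", "@@@", 12))
```

A single project with the template "@@@" therefore covers the whole range from 001 to 999 that otherwise needed three separate projects.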

When you fill in the project configuration yourself, I recommend that you periodically check your Site URL, Counter and Template settings. Just press the corresponding button in the main window (the one with the red checkmark on it), and it will warn you of any errors or potential problems in your settings (to a limited extent, of course). You will also see a detailed report about your current template (if you use one) and the next ten URLs that will be generated when you start the project.

Getting rid of banners and ads.

On most web pages, banners and advertisements make up the bulk of the graphics. To get rid of them, you should fill in the list of Exclusion Masks. Excluded items (those whose URLs match one or more masks) are not put into the retrieval queue, and thus are never downloaded.

To add an exclusion mask, I usually do the following:

  1. Start the project. Click the "block queue" button (a small button with an "X" on it) so that it stays pressed. Now wait until the queue grows to 10-20 items, then stop the project. You can disconnect from the Internet while setting up the Exclusion Masks.
  2. Now open the Project Configuration -> Retrieval Queue and view the list of items in the queue. As a rule, banners and counters have "complex" URLs like this: "a10.bb.com/cgi-bin/banner.cgi?id=45367257". Select one of the banners and press "Copy to Clipboard".
  3. Now open the "Exclusions" tab. Press "Add mask" and, in the window that appears, press "Paste from clipboard". You will see the full URL of the banner. Now edit the URL a little: replace all parts that may change in other similar banners with asterisk(s). As a rule, you can leave only the name of the server and replace all the rest with an asterisk: "http://a10.bb.com/cgi-bin/banner.cgi?id=45367257" will become "http://a10.bb.com/*", or even shorter, "http://*.bb.com/*". Press "Add".
  4. Again open the "Retrieval Queue" tab. Press "Apply Exclusions" - this will apply our newly created exclusion mask to the items in the Retrieval Queue. If everything is okay, banners that match the mask will be removed from the queue. Now check the queue once again for banners - if there still are some, go to step 2. If not, close the Project Configuration dialogue, connect to the Internet and start downloading.
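
To get a feel for how such masks filter the queue, here is a minimal Python sketch. It is not Picture Pump's implementation: the names mask_to_regex and apply_exclusions are hypothetical, and the sketch assumes that the asterisk is the only wildcard, that it matches any run of characters, and that matching is case-sensitive.

```python
import re


def mask_to_regex(mask):
    """Turn an exclusion mask into a compiled regular expression."""
    # Escape everything literally, then let each asterisk match any run of characters.
    return re.compile(re.escape(mask).replace(r"\*", ".*"))


def apply_exclusions(queue, masks):
    """Return the queue without the URLs that match at least one mask."""
    patterns = [mask_to_regex(mask) for mask in masks]
    return [url for url in queue
            if not any(pattern.fullmatch(url) for pattern in patterns)]


queue = [
    "http://a10.bb.com/cgi-bin/banner.cgi?id=45367257",
    "http://www.server.com/images/cool/pic_012_.jpg",
]
print(apply_exclusions(queue, ["http://*.bb.com/*"]))
```

Only the picture URL remains in the result; the banner URL matches the mask and is dropped, just as "Apply Exclusions" removes it from the Retrieval Queue.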

Instead of filling in the list of Exclusion Masks, you can simply exclude files that are shorter than some arbitrary limit (Project Configuration -> Reply -> Exclude files shorter than XX bytes in length). Specify some "reasonable" lower limit, so that banners and advertisements are not downloaded (hint: in some banner exchange networks banners cannot exceed 15 Kb). Regrettably, this method has some major drawbacks:

  • You can block quality images, too (especially if you set the limit too high);
  • You lose time while Picture Pump gets the file size of a picture and decides if it's okay to download it (when you block banners using exclusion masks, Picture Pump does not even attempt to retrieve them);
  • Some servers may not return the size information for pictures, so blocking by size will not work with them.
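
The size check and its last drawback can be sketched as follows. This is a hedged illustration only: the function name passes_size_filter is hypothetical, and it assumes the size is taken from the HTTP Content-Length response header, which the text above does not state explicitly.

```python
def passes_size_filter(headers, min_bytes):
    """Return True if a file should be downloaded under the size limit."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("content-length", "").strip()
    if not raw.isdigit():
        # The server did not report a usable size: the filter cannot apply.
        return True
    return int(raw) >= min_bytes


print(passes_size_filter({"Content-Length": "2048"}, 15000))    # False
print(passes_size_filter({"Content-Length": "250000"}, 15000))  # True
print(passes_size_filter({}, 15000))                            # True
```

The last call shows why blocking by size fails on servers that return no size information: with nothing to compare, the file is downloaded anyway.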

Project optimization hints.

  • Where possible, set up your projects so that the Site URL points directly to the pictures rather than to the pages containing them. Then you do not need any exclusions (because you never encounter banners), and downloading is faster.
  • Include only the minimum required fields in the HTTP Request. This reduces the amount of information sent to the server and (theoretically) speeds up downloading a little.

Frequently Asked Questions.

Downloaded files are in some unknown format. What is wrong?

You may have specified the wrong type of Site URL ("Site URL points directly to pictures" instead of "Site URL points to HTML pages with pictures"). If you open such files in a text editor, you will clearly see HTML tags (e.g. <html>, which marks an HTML document). To correct this problem, set the correct type of Site URL (pointing to HTML pages).
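
If you have many such files, the text-editor check can be automated. The following Python sketch is a rough heuristic, not a feature of Picture Pump; the function name looks_like_html is hypothetical, and looking only at the first 512 bytes is an arbitrary choice of this sketch.

```python
def looks_like_html(data):
    """Guess whether the first bytes of a downloaded file are HTML, not a picture."""
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or b"<html" in head


print(looks_like_html(b"<HTML><body>Picture page</body></HTML>"))  # True
print(looks_like_html(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))        # False
```

To test a saved file, read it in binary mode (for example with open(path, "rb")) and pass the bytes to the function.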

When I start a project, nothing is saved to disk.

There are several possible causes. The most obvious (and the most frequent) is a wrong or misspelled Site URL: the server cannot find the requested pages and pictures, and reports errors. To correct the problem, fill in the Site URL field correctly. Other possible reasons:

  • The server redirects Picture Pump to another web site (this is an error if you use Picture Pump 1.3 or earlier). Update your copy of Picture Pump.
  • HTML pages have no direct links to pictures; pictures are loaded by JavaScript code. Try to modify your project to download the pictures directly.

In general, to determine the cause of download failures, you should do the following:

  • View the error log file (just press the button with a notepad on it). The log contains a detailed description of each error, and thus is the first place you should look if you have problems.
  • View the exact URLs that are generated by your project (by pressing the "test" button). Enter one of the URLs into your browser and see if it works. You may have to correct the Site URL, counter or template settings.

How do I view *.ppp files?

Wow, my favorite question. :)

People, *.ppp files are not pictures! They are not packs of pictures, either. These are simply Picture Pump Project files, containing all sorts of information about the project (Site URL, request settings, counter template, counter's value, the contents of retrieval queue, and so on). The pictures themselves are put (dumped, huh) into the target folder for pictures, the one that's specified in the project. By the way, there is a button in the main window of Picture Pump to open the contents of this folder in Explorer.

To view a picture, just find it in Explorer's window and click on it (or double-click if you still use Windows 95). If you still cannot see anything, I recommend that you install a picture viewer program.

Designed by Zmey Petroff © 2000-2011