
OnPage.org Crawler Settings – Part 2


OnPage.org’s extended settings allow you to configure the crawler individually so that it best suits your projects. New additions include the Boilerplate settings and live testing of your settings.

In Part 1 of this post, we showed you how the basic settings can be adapted to get even better analyses of your website. In this second part, we’ll cover the extended settings, which give you even more control over the crawler. Let’s start right away!

Extended Crawler Settings

Apart from the Basic Settings, you can customise further options to tailor the crawling of your project. In the following, we’ll explain each of these settings in depth.

Crawler Settings in OnPage.org

Parallel requests

When OnPage.org crawls a website, it puts additional load on your server, as the server has to process more requests than usual. The Parallel Requests setting lets you control this load: if your server isn’t very powerful, meaning it can’t “cope” with the crawling, slows down and returns timeouts, you should lower the value; on a powerful server, raising it lets us crawl your site considerably faster. We use 10 parallel requests by default – however, up to 100 Parallel Requests are possible.

Unfortunately, the unit “Parallel Requests” is often misinterpreted: some think of it as “requests per second”. It’s easiest to picture the setting as “people clicking through my site”. With Parallel Requests set to 1, a single user is on your website, waiting for a page to load; once it has loaded, he follows the first link, then the next, and so on. Increasing the value simply adds that many “simulated” users clicking through your site. That’s why the value can be raised without fear of bringing the website to a standstill.
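To make that mental model concrete, here is a minimal Python sketch of it: a fixed pool of “users”, each fetching one page at a time. The URLs and the fetch logic are purely illustrative and not OnPage.org’s actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

PARALLEL_REQUESTS = 10  # the default; up to 100 are possible

def simulated_user(url: str) -> int:
    # Each "user" waits for the page to finish loading before moving on.
    response = requests.get(url, timeout=30)
    return response.status_code

# Hypothetical list of pages discovered so far.
urls = [f"https://www.example.com/page-{i}" for i in range(100)]

# At most PARALLEL_REQUESTS pages are in flight at any moment
# (parallel users, not requests per second).
with ThreadPoolExecutor(max_workers=PARALLEL_REQUESTS) as pool:
    for status in pool.map(simulated_user, urls):
        pass  # a real crawler would collect statuses and follow new links
```

Note that a fast server lets each “user” finish sooner, so the same setting produces more requests per second on a fast site than on a slow one.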

Login data

The OnPage.org crawler can crawl your site even if it isn’t “public” and is password protected. If your website requires such a login, simply enter your username and password; if you don’t need this feature, just leave the fields blank. Please note that this feature only works for .htaccess-protected sections (HTTP Basic Authentication) – self-built login systems are not yet supported.
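.htaccess protection is plain HTTP Basic Authentication, so what the crawler sends is roughly the following (a hedged Python sketch – the staging URL and the credentials are placeholders):

```python
import requests
from requests.auth import HTTPBasicAuth

# HTTP Basic Auth: the username and password are sent with every request.
response = requests.get(
    "https://staging.example.com/",
    auth=HTTPBasicAuth("username", "password"),  # your login data
    timeout=30,
)
print(response.status_code)  # 200 instead of 401 once the login works
```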

Crawler User Agent

While crawling, our crawler by default pretends to be the Googlebot. That way, we can simulate Google’s crawling and try to look at your website through Google’s eyes. In rare cases it may be necessary to use a different user agent for the crawler; it can be set separately here. You can also set the user agent to “OnPage Crawler” so that your server knows it’s us.

Watch out, though: scripts may be running, intentionally or not, that display content exclusively to the Googlebot (cloaking). That’s why we recommend crawling the site as Googlebot in order to reveal problems of that kind.
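In practice, “looking at a website with Google’s eyes” comes down to sending Googlebot’s User-Agent header. A small Python sketch (the URL is a placeholder; the header string is the one Google publishes for its desktop crawler):

```python
import requests

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

response = requests.get(
    "https://www.example.com/",
    headers={"User-Agent": GOOGLEBOT_UA},
    timeout=30,
)

# Comparing this response with one fetched as "OnPage Crawler"
# is a quick way to spot cloaking.
```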


Additional Request Header: X-Requested-With

Some webservers use IP-based blocking systems. In such scenarios, the “X-Requested-With” header can be used to keep crawling with the Googlebot user agent while telling the webserver that the requests come from the OnPage Crawler. You can insert any value here, which we will then send with every request and which can, for instance, be used for whitelisting.
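In request terms, that combination looks roughly like this sketch (the whitelist value “OnPage Crawler” is just an example; any token your server checks for will do):

```python
import requests

headers = {
    # Keep simulating Google ...
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                  "+http://www.google.com/bot.html)",
    # ... while identifying the real sender for the server's whitelist.
    "X-Requested-With": "OnPage Crawler",
}

response = requests.get("https://www.example.com/", headers=headers, timeout=30)
```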

Remove the Boilerplate *new*

The Boilerplate settings are brand new! By default, our text statistics are based on the entire page content. If you prefer the reports to be based on the “Main Content” only, you can remove the so-called boilerplate: we’ll then try to ignore your site’s header, footer and sidebar, so that only your site’s main content is interpreted. However, the quality of this algorithm strongly depends on the quality of your website’s code – that’s why this feature should be treated with caution.

The Boilerplate settings
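Conceptually, boilerplate removal works roughly like the following sketch (using BeautifulSoup; the selectors are an assumption for illustration, not OnPage.org’s actual algorithm):

```python
from bs4 import BeautifulSoup

html = open("page.html").read()
soup = BeautifulSoup(html, "html.parser")

# Drop typical boilerplate containers before computing text statistics.
# Whether these selectors match anything depends entirely on how cleanly
# the site's templates are coded.
for selector in ("header", "footer", "aside", "nav"):
    for element in soup.select(selector):
        element.decompose()

main_content_text = soup.get_text(separator=" ", strip=True)
```

If a site puts its sidebar in an unlabelled div, a heuristic like this has nothing to hold on to – hence the caution above.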

Adapting the detection of headers

Due to certain content management systems, or for other reasons, the H1 tag may not be used as the main heading tag. Here you can specify a different tag; CSS classes can be used as well. Valid inputs would be ‘h2’, ‘h3’ or CSS selectors such as ‘div.headline’.
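As a quick illustration of what such a selector does, here is a Python sketch using BeautifulSoup (‘div.headline’ is just the example value from above):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("page.html").read(), "html.parser")

# Treat the configured selector, rather than <h1>, as the main headline.
HEADLINE_SELECTOR = "div.headline"  # other valid inputs: "h2", "h3", ...

headline = soup.select_one(HEADLINE_SELECTOR)
print(headline.get_text(strip=True) if headline else "no headline found")
```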

Subdirectory mode

With the Business, Agency or Enterprise package, you can also restrict the crawl to a particular subfolder. With this setting, you insert the respective URL (relative path) and we will ONLY examine that subfolder.
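The scope check behind this is simple. A hypothetical Python sketch, with “/blog/” standing in for whatever relative path you enter:

```python
from urllib.parse import urlparse

SUBFOLDER = "/blog/"  # the configured relative path (placeholder)

def in_scope(url: str) -> bool:
    # Only URLs whose path starts with the subfolder are examined.
    return urlparse(url).path.startswith(SUBFOLDER)

print(in_scope("https://www.example.com/blog/post-1"))   # True
print(in_scope("https://www.example.com/shop/item-42"))  # False
```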

Accept Language

If your server displays content based on the Accept-Language request header, you can set your desired value here. Leave the box empty to use the standard behaviour. Valid values are ISO language codes, such as “de” for German or “en-us” for American English. By default, we don’t send an Accept-Language header at all and let the webserver decide on the language.
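The setting simply adds one header to every request, roughly like this (example.com is a placeholder):

```python
import requests

# Ask for the German version of a language-negotiated page; omitting the
# header entirely reproduces the default behaviour (server decides).
response = requests.get(
    "https://www.example.com/",
    headers={"Accept-Language": "de"},  # ISO codes, e.g. "de", "en-us"
    timeout=30,
)
```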

Analyse sitemaps

Would you like the crawler to download and analyse your sitemap.xml file(s)? This option is needed for the “sitemap.xml” report. If your website has a lot of sitemaps (20+), deactivating this option can improve the crawler’s performance.


Sitemap URLs

If you’re not using the standard file name for your sitemap.xml and haven’t linked it in the robots.txt, you can enter the URL of your sitemap.xml here. Alternatively, you can use this option to point the crawler at a different sitemap.xml than the one other bots see (for instance, to test a new sitemap.xml). You can also specify several sitemaps here (such as a video sitemap or an image sitemap). Of course, we also support sitemap index files and gzip-compressed sitemap files.
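For the curious, handling sitemap index files and gzip compression boils down to something like this simplified Python sketch (an illustration, not OnPage.org’s crawler code):

```python
import gzip
import io
import requests
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_sitemap(url: str) -> ET.Element:
    # Handles both plain and gzip-compressed sitemap files.
    raw = requests.get(url, timeout=30).content
    if url.endswith(".gz"):
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return ET.fromstring(raw)

def sitemap_urls(url: str):
    root = fetch_sitemap(url)
    if root.tag.endswith("sitemapindex"):
        # A sitemap index only points at further sitemap files.
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            yield from sitemap_urls(loc.text.strip())
    else:
        for loc in root.findall("sm:url/sm:loc", NS):
            yield loc.text.strip()
```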

Crawling the sitemap files

If the OnPage.org crawler comes across a sitemap, not all of the URLs contained in it are crawled automatically: we only crawl URLs that are also reachable via links or redirects. Tick the box if you want the URLs from the sitemap to be crawled anyway.

Testing the Crawler Settings

The OnPage.org crawler settings can be tested directly and live in the “Test Settings” tab. Insert any URL and we’ll show you the crawling result – live! This lets you check, for instance, whether your login data or your adjusted heading detection work properly.

Live testing of the Crawler Settings
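If you want to reproduce such a check outside the tool, a rough Python equivalent of the live test could look like this (URL, credentials and header values are all placeholders):

```python
import requests
from requests.auth import HTTPBasicAuth

# Fetch one URL with the configured settings and inspect the result.
response = requests.get(
    "https://staging.example.com/some-page",
    auth=HTTPBasicAuth("username", "password"),  # login data
    headers={"Accept-Language": "de"},           # accept language
    timeout=30,
)
print(response.status_code, len(response.text))
```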

As you can see, there are various options for improving and individualising your reports. Make use of the Crawler Settings to get a feel for the way Google crawls your site and how this affects your website.

Keep on optimising!
