
Perl regular expressions, can you help?


Say I write a Perl regular expression to capture any URL on the net, save it in a file with a .pl extension, and upload that file to a directory in my web hosting account as part of my Perl-based web page.

Is that it? As soon as I run the file, does it start collecting?

Thanks.

Eskwayrd, thank you very much indeed. How wonderful that you have imagined every possible situation in which I could have used Perl regular expressions.

As a matter of fact, I have just finished a course in Perl. I was exposed to regular expressions and was fascinated by the idea. I asked the instructor whether I could use regular expressions for collecting things from the web, but he did not answer my question. I did not insist, in case he did not have the answer :-)

This paragraph in particular helped my understanding:

"If you were expecting your script to visit other web sites and try to match URLs, then you'd need to do something to make it scan the internet, such as give it a starting page, and write the code to scan for all the URLs on that page and follow them. This is called spidering."

Spidering, that's the missing link then. Let me do a bit of online research.

Thanks also for the link to the forum; I may well need that.

Thanks a lot.

No, that's not it.

Regular expressions allow you to match patterns within strings. Having a regular expression says nothing about where those strings come from.
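To make that concrete, here is a minimal sketch of a pattern being applied to a string that the script itself supplies. The pattern is deliberately simplistic and is not a complete URL matcher; the variable names and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# A deliberately simple pattern: "http://" or "https://" followed by
# anything that is not whitespace, a quote, or an angle bracket.
# It is an illustration, not a full URL grammar.
my $url_re = qr{https?://[^\s"'<>]+};

# The regex does nothing until it is applied to a string. Here the
# string is hard-coded sample text.
my $text = 'See http://www.perlmonks.org/ and <a href="https://example.com/page">this</a>.';

# In list context, a /g match returns every match in the string.
my @urls = $text =~ /$url_re/g;

print "$_\n" for @urls;
```

The script only ever sees what is in $text. Where that text comes from (a form field, a log file, a downloaded page) is a separate problem that the regular expression does not solve.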

If you have a regular expression in a Perl script on a web server, the script runs only when a user specifically requests it (or when it is invoked via server-side includes on other pages). In other words, it executes on demand. Which URL would you like to match? Your own? Perhaps the referring page (if supplied by the user's browser)? You won't get very many URLs that way, and it would be more thorough to use Perl to extract the data from your web server's log files.
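As a rough sketch of the log-file approach, the following counts referring URLs. It assumes the server writes Apache-style "combined" log lines; your host's format may differ. The function name count_referrers and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Assumes Apache-style "combined" log lines, where the referrer is the
# first quoted field after the status code and response size.
sub count_referrers {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # "request" status size "referrer" "user agent"
        next unless $line =~ /"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"\s*$/;
        my $ref = $1;
        next if $ref eq '-' or $ref eq '';   # no referrer supplied
        $count{$ref}++;
    }
    return %count;
}

# Made-up sample lines; a real script would read them from the log file.
my @sample = (
    '10.0.0.1 - - [01/Jan/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "http://example.com/a" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2010:10:00:05 +0000] "GET /x HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Jan/2010:10:00:09 +0000] "GET / HTTP/1.1" 200 512 "http://example.com/a" "Mozilla/5.0"',
);

my %count = count_referrers(@sample);
print "$count{$_}  $_\n" for sort keys %count;
```

Within a CGI script, the referring page for the current request, when the browser sends one, is available as $ENV{HTTP_REFERER}. That gives you one URL per request, which is why the log file is the more thorough source.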

If you were expecting your script to visit other web sites and try to match URLs, then you'd need to do something to make it scan the internet, such as give it a starting page, and write the code to scan for all the URLs on that page and follow them. This is called spidering.

Spidering, if that's what you're trying to do, has a number of risks. The internet is rather large; the chance that you have sufficient storage and bandwidth to collect every URL on it is effectively zero. And that assumes you manage to avoid circular linking and infinite linking (where a site gives you new, unique URLs to index every time you ask).
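A minimal sketch of the idea, with two common guards: a record of URLs already seen, so that circular links are not followed twice, and a hard cap on the number of pages, so that infinite link generators cannot keep the crawl running forever. The function name spider, its parameters, and the in-memory "web" are all made up for illustration; the page fetcher is passed in as a code reference so the example runs without network access.

```perl
use strict;
use warnings;

# Breadth-first crawl from $start. $fetch is a code ref that takes a URL
# and returns the page text (or undef); $max_pages caps the crawl.
sub spider {
    my ($start, $fetch, $max_pages) = @_;
    my %seen  = ($start => 1);   # every URL queued so far
    my @queue = ($start);
    my @visited;

    while (@queue and @visited < $max_pages) {
        my $url = shift @queue;
        push @visited, $url;
        my $page = $fetch->($url);
        next unless defined $page;

        # Simplistic link extraction: absolute http(s) URLs only.
        for my $link ($page =~ m{https?://[^\s"'<>]+}g) {
            next if $seen{$link}++;   # skip URLs already queued
            push @queue, $link;
        }
    }
    return @visited;
}

# A made-up three-page "web" held in memory; a.example and b.example
# link to each other, which would loop forever without %seen.
my %fake_web = (
    'http://a.example/' => '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    'http://b.example/' => '<a href="http://a.example/">a</a>',
    'http://c.example/' => 'no links here',
);

my @order = spider('http://a.example/', sub { $fake_web{ $_[0] } }, 10);
print "$_\n" for @order;
```

A real spider would replace the anonymous sub with something that downloads the page over HTTP, and would also need to resolve relative links, which this sketch ignores. The page cap is the part that matters most against infinite linking: a seen-list alone cannot help when every URL the site hands out is new.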

Perhaps you should ask another question where you state what you are trying to accomplish, and then more specific help could be provided.

You might also consider asking your Perl questions in a Perl-specific forum, such as at http://www.perlmonks.org/
