
Out of 10,000 internet links, how can I find out which ones are technical websites?


I have a database of 10,000 internet links. I want a mechanism that tells me whether each link is a technical website or not, and it should be an automated process. I know how a whois server works: it returns data about a given website. What I need is a similar server that tells me whether a given site is technical or non-technical. Please help me with this.
Thanks in advance.

You're talking about text classification. This is fairly easy: it more or less comes down to word counting and some probability. You'll also need some example technical web pages and some example non-technical web pages, preferably ones that are not in your set, to train the classifier.
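The word-counting step might look like the following minimal Python sketch. It assumes each link's HTML has already been downloaded (for example with `urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")`); `TextExtractor` and `word_counts` are hypothetical names, and the crude letters-and-digits tokenizer is an assumption, not a standard.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def word_counts(html: str) -> Counter:
    """Return lowercase word frequencies for the visible text of a page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    # Crude tokenizer (an assumption): runs of ASCII letters and digits.
    return Counter(re.findall(r"[a-z0-9]+", text))


page = "<html><body><p>Python compiler tutorial: the Compiler</p></body></html>"
print(word_counts(page)["compiler"])  # 2
```

Skipping script and style content keeps JavaScript and CSS keywords from being counted as if they were the page's topic.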

You might be able to find an open-source or commercial classifier, but they aren't hard to write. Bayesian methods (see references) are also used in spam detection. Your problem should be easier than spam filtering: spammers deliberately obfuscate their text to fool filters, whereas people are unlikely to obfuscate their technical websites so badly that the sites stop being useful to human readers.
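One common Bayesian approach is multinomial naive Bayes. Below is a minimal, hypothetical Python sketch (not any particular library's API): training counts words per class, and classification scores a page by its log prior plus the sum of its log word likelihoods, with add-one (Laplace) smoothing so a word missing from one class doesn't force that class's probability to zero. The four training snippets are made-up toy examples; in practice you would train on the visible text of real labelled pages.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Same crude tokenizer assumption: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocab = set()            # every word seen during training

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def scores(self, text: str) -> dict[str, float]:
        """log P(label) + sum of log P(word | label), for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[w] + 1) / (total_words + vocab_size))
            result[label] = score
        return result

    def classify(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


nb = NaiveBayes()
nb.train("python compiler source code tutorial linux kernel", "technical")
nb.train("database server network protocol programming", "technical")
nb.train("celebrity gossip fashion recipes holiday", "other")
nb.train("football scores movie reviews travel", "other")
print(nb.classify("linux programming tutorial"))  # technical
```

Working in log space avoids floating-point underflow when multiplying many small probabilities, which matters for long web pages.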

Why don't you check whether any of those sites are already classified in the Google Directory? Thousands of volunteers categorize internet sites there. You can also check with alexa.com.

Enjoy.

Related information
  • VLC can't play .avi files?

    Try using KMPlayer; it is the best player out there anyway. VLC is slow and resource-consuming software ...

  • How are those online virtual tours created?

    They usually use equipment such as that on the following page. QuickTime is almost always the codec used to make the "video." ...

  • Is there any site for learning BlueJ and Java for a beginner?

    you can check out ...

  • How to determine the base address of the given array?

    I am finding it a little difficult to understand what you are asking here. What do you mean "location of x[5] is 151"? And what does "w" represent?

    ...
  • I have a folder with 10,000 files. Is there an automatic way to put them into 100 subfolders of 100 files each?

    Hi, as a programmer, I think writing a program in any high-level language will help you solve this problem. -- Solution Provider: Eroz Awari (SGM School, Navsari)

    ...
  • The Binary # 11 Would Have The Decimal Equivalent Of...?

    Hi, to convert a binary number into decimal, follow the steps below. For example, to convert 110111 into decimal: 110111 (base 2) = 1 x 2^5 + 1 x 2^4 + 0 x 2^3 + 1 x 2^2 + 1 x 2^1 + 1x2...

  • Operating system question?

    Hello, most software today is intelligent software. What I mean by intelligent is that it won't allow you to run/execute/install it once it finds out that the OS is not the suited OS for...

  • How to protect/copyright websites?

    Copyright is protected automatically. You may want to add the word Copyright (or the © symbol) and the year. "(C)" is bogus and "All rights reserved" no longer has any legal mea...
