05.24.05
The Machine Readable Web
By Ron Tower
The vast majority of the Web is intended for human readers. The goal has been to create an online experience for human beings. It is an open and ever growing body of information.
This is all great, but it does present some problems. There is just too much there. We aren't sure what information to trust. We can get lost in the Web and waste a lot of time. So we need some software tools to help us, but the information itself is not structured in a way that software can easily deal with. Enter the machine readable Web.
The most basic way for software to deal with information on the Web is to simply read the HTML of the pages and "analyze" it. This is what search engines do. They have software agents called spiders that walk the Web and index the pages. They then use various techniques to give us the "best" pages for the search queries we enter.
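As a rough illustration of the indexing step, here is a minimal sketch of an inverted index, assuming the pages have already been fetched. The URLs and page contents are hypothetical, the tag stripping is deliberately crude, and real search engines do far more (ranking, link analysis, and so on).

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, sketch only
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for documents a spider has fetched.
pages = {
    "http://example.com/a": "<html><body><h1>RSS feeds</h1>"
                            "<p>Syndication made simple.</p></body></html>",
    "http://example.com/b": "<html><body><p>Screen scrapers read HTML pages."
                            "</p></body></html>",
}
index = build_index(pages)
print(search(index, "rss feeds"))
```

The index only says which pages mention the words. You still have to visit those pages to find the bit of information you were after.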
This is helpful and essential, but you still have to go to the pages (many pages) and try to find what you want. And you need to know when to go back to get updated information. You may even know that a page has the information you want and that it will be updated regularly, but you don't want to go back again and again to get that bit of information off that page.
There are tools called "screen scrapers" or Web page extractors that can read the pages and extract just the information you want, but the pages are unstructured and changing. The rules you describe for extracting the information may be complex and may not work as the page changes.
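To make the fragility concrete, here is a minimal sketch of a page extractor built on Python's standard `html.parser` module. The extraction rule ("take the text of the element with a given id"), the `quote` id, and the sample pages are all assumptions made up for illustration. They are not taken from any particular tool.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_by_id(html, target_id):
    """Return the text of the element with the given id, or None if absent."""
    parser = IdTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return text or None


# Hypothetical page, before and after a redesign that renames the id.
page_v1 = ('<html><body><div id="quote">ACME <b>42.10</b></div>'
           '<p>ads</p></body></html>')
page_v2 = page_v1.replace('id="quote"', 'id="stock-quote"')

print(extract_by_id(page_v1, "quote"))
print(extract_by_id(page_v2, "quote"))
```

The rule works on the first version of the page and silently returns nothing on the second, which is exactly the maintenance problem with scraping pages that were never meant to be read by software.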
And content providers often don't want you to use their page that way. They want you to look at the whole page, so that you will get the other messages they have on the page (like marketing messages), not just the bit you want. They try to put up a "no droids allowed" sign, in this case, "no robots, we want human eyeballs only".
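The usual form of that sign is a robots.txt file, which a well-behaved software agent checks before fetching a page. The sketch below uses Python's standard `urllib.robotparser`. The rules, the agent name `ExampleBot`, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every agent is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/data.html"))
```

Note that this is a convention, not an enforcement mechanism: it only works if the agent chooses to honor it.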
Some content providers realize that you can't always come to their site, and that if they give you a useful summary of what is on their site, you might come more often to see the details (and the other stuff you really don't want to see, but live with to get the content you want). A very useful way of doing this is with RSS feeds. RSS (Really Simple Syndication) provides the summary in an XML file that a software agent can easily process. RSS news readers or information aggregators go and get the summary for you, and then you can decide whether to click through to see the details. (See http://www.w3schools.com/rss/default.asp for more on RSS.)
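Here is a minimal sketch of how a software agent might read such a summary, using Python's standard `xml.etree.ElementTree`. The feed below is a hypothetical RSS 2.0 document embedded as a string. A real reader would download the XML from the provider's feed URL and cope with missing or extra elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
      <description>Short summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
      <description>Short summary of the second story.</description>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return a list of dicts with the title, link, and summary of each item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Compared with scraping, nothing here depends on the layout of the provider's Web pages: the structure of the feed is the contract.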
RSS is the first really successful example of the machine readable Web. The RSS XML file has a well known structure and is easy to produce and to process. It has also been successful because it is a win-win situation for content providers and consumers. Consumers get the summary information they want, making their Web browsing more effective and enjoyable. And providers get what they want, more traffic to their site.
Content providers had started down this path by providing HTML fragments that Web site authors could add to their sites. There are also a few tools that let individuals use these fragments. Commented HTML can be used to allow Web page extractors to more easily extract dynamic HTML fragments. HTML fragments, like RSS feeds, are useful for consumers of information and helpful for content providers in attracting traffic.
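A sketch of how commented HTML can help an extractor: if the provider wraps the dynamic fragment in a pair of marker comments, the extractor only has to find the markers. The `BEGIN name` / `END name` marker syntax here is a hypothetical convention chosen for the example, not an established standard.

```python
import re


def extract_fragment(html, name):
    """Return the HTML between the BEGIN/END comment markers for name, or None."""
    marker = re.escape(name)
    pattern = re.compile(
        r"<!--\s*BEGIN\s+" + marker + r"\s*-->(.*?)<!--\s*END\s+" + marker + r"\s*-->",
        re.DOTALL,
    )
    match = pattern.search(html)
    return match.group(1).strip() if match else None


# Hypothetical page with one marked fragment.
page = """<html><body>
<div class="banner">Marketing message</div>
<!-- BEGIN headlines -->
<ul><li><a href="http://example.com/first">First story</a></li></ul>
<!-- END headlines -->
</body></html>"""

print(extract_fragment(page, "headlines"))
```

The page layout around the markers can change freely without breaking the extractor, as long as the provider keeps the markers in place.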
This brings us to a fundamental point. The content providers need to have a relatively easy way to provide the machine readable content and it has to fit in with their mission. And the consumers won't use it unless they get something useful from it. So we need the win-win for the machine readable Web to get off the ground. At this point, according to a Pew Research report (http://www.pewinternet.org/PPF/r/144/report_display.asp), 5% of internet users are using RSS. Most of these people are classic early adopters. But it seems like RSS is moving quickly to being more widely adopted.
But even this relatively simple standard was not easy to arrive at. There was a lot of conflict between the "keep it simple" crowd and the "more features" crowd (see http://diveintomark.org/archives/2002/09/06). And RSS is just scratching the surface. After all, it just provides a title, a link, and a short summary for each item. Richer information will require a richer structure.
There is a community of researchers looking to provide the approach for this richer structure under the tag phrase "semantic Web". This is largely a vision and research project at this point. See http://www.w3.org/2001/sw/. One criticism of this work is that it is "too complex", and if you try to read some of it, you might be pardoned for developing a similar opinion. Of course, tools will be provided to hide the complexity from users, but the issue is whether it is too complex for the typical content provider and for tool developers. See http://www.snipsnap.org/space/RDF+too+complex if you are interested in exploring this issue.
A more near-term approach is "Web services". This uses the Web infrastructure for application-to-application communication. It is not as easy as RSS, but it builds on a similar foundation, with XML as the data format. At this point it is mostly used for business-to-business applications, and there are hardly any Web services that provide public information. A content provider also needs to define an interface for each kind of information it wants to serve. See http://www.w3.org/2002/ws/ if you are interested in exploring this.
Another intriguing use of the Web is machine-to-machine. The idea here is that many machines exist that have embedded computers. If they could hook into the Web, they could provide a lot of useful information. Some might be sensors of various kinds. Others might be cars, toasters, or washing machines. Wireless companies are interested in providing devices similar to cell phones to allow these machines to be accessible over the Web. See http://itpapers.zdnet.com/whitepaper.aspx.
So a machine readable Web is starting to become a reality with RSS and Web services and may progress even further with something like machine-to-machine or the semantic Web. Early adopter consumers are starting to adopt the idea via RSS. The key will be for content providers to adopt a richer set of machine readable formats, as they have started to do with RSS, and to keep those formats as simple as possible so that a wide variety of software developers can provide tools for end users. This may be what makes the Web even more useful.
About the Author: Ron Tower is the President of Sugarloaf Software and is the developer of Personal Watchkeeper, an information aggregator supporting a variety of ways to summarize the Web. http://www.sugarloafsw.com