Web Mining for Innovation

Automatically Combing User Reviews and Patent Databases Turns up Plenty of Innovative Ideas.

Andrew Kusiak is a professor of industrial engineering in the College of Engineering at the University of Iowa.

Joseph Engler is a senior software engineer at Rockwell Collins Inc. and an industrial engineering Ph.D. student at the University of Iowa.

Mechanical Engineering 130(11), 38-40 (Nov 01, 2008) (3 pages) doi:10.1115/1.2008-NOV-1

This article reviews a way to automatically mine the Web for innovative requirements. Websites now commonly host user reviews filled with opinions about a product’s strengths and weaknesses. Publication of user reviews is so ingrained in the Web that it has spawned an entire field of study known as collaborative intelligence. Collaborative intelligence gathers the collective reasoning of multiple users to achieve some goal. While user and expert reviews offer a wealth of information about the needs and desires of the market, they are not the only source of requirements for innovation. Much the same method can be used to search patent databases for the same type of product attributes and requirements for innovation. Patent databases provide both complete and summary descriptions of already-envisioned inventions and offer a great deal of information about trends in current innovation. Requirements for innovation change over time. To keep up with these changes and to aid in the market acceptance of an idea, the mining of current requirements for innovation is crucial.

Think of the Web as a massive search space. What if companies could mine its myriad data to discover customer and expert reviews of products and services, and then use the feedback from those reviews to drive innovation when designing new products or refining existing ones?

From focus groups to market research, companies spend vast amounts of money and time to ensure that a new product becomes a marketplace success. But determining market acceptability of a new concept during its early development stage is difficult and time-consuming. No well-established method or tool exists to guarantee success.

Focus groups and market research are often limited to a select group chosen either randomly or with market segmentation in mind. Because the sample is small, this process may give companies a false sense of security.

What if the focus group and market research could be expanded to take in the entire World Wide Web?

We have researched a way to automatically mine the Web for innovative requirements. By calling upon these data mining methods, a company may be better positioned for market success.

Web sites now commonly host user reviews filled with opinions about a product's strengths and weaknesses. Publication of user reviews is so ingrained in the Web that it has spawned an entire field of study known as collaborative intelligence. Collaborative intelligence gathers the collective reasoning of multiple users to achieve some goal.

Engineering companies can call upon collaborative intelligence (in this case, gleaned from user reviews) to get an overall look at what consumers think about the products they use. Engineers can use this feedback to drive innovation when creating or updating a product. When combing user reviews of products similar to the ones they'd like to design or have already designed, engineers can get a collective interpretation of the attributes that make the existing product a success or failure.

Before companies can extract user reviews from the Web, they need a way to find the sites that contain reviews. There are two main methods for automated searching of the Web: unfocused or focused crawling. Standard, or unfocused, crawling of the Web doesn't consider a specific topic; rather, its job is to index all pages available on the Internet. Focused crawling seeks only those pages related to a specific query string.

Even with the assistance of algorithms such as PageRank, developed by Google, successful standard crawling requires massive hardware and bandwidth. This drawback prevents most corporations from performing this type of crawl internally.

Focused crawling requires far less hardware and bandwidth, but does require some sophisticated algorithms to weed out links that are irrelevant to the given query.
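As a rough illustration of the idea, focused crawling can be sketched as a breadth-first traversal that expands only pages judged relevant to the query. The sketch below is not the system described in this article: it uses a hypothetical in-memory set of pages (`PAGES`) instead of real HTTP requests, and a deliberately crude relevance score (the fraction of query terms found on the page) in place of the sophisticated algorithms mentioned above.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
PAGES = {
    "a.example/home": ("mp3 player reviews and ratings",
                       ["a.example/r1", "b.example/news"]),
    "a.example/r1": ("review: this mp3 player has long battery life",
                     ["a.example/r2"]),
    "a.example/r2": ("mp3 player review: flimsy case", []),
    "b.example/news": ("local weather and sports scores", ["b.example/more"]),
    "b.example/more": ("more sports scores", []),
}

def relevance(text, query_terms):
    """Fraction of query terms that appear as whole words in the page text."""
    words = set(text.lower().replace(":", " ").split())
    return sum(term in words for term in query_terms) / len(query_terms)

def focused_crawl(seed, query, threshold=0.5):
    """Breadth-first crawl that only follows links found on relevant pages."""
    query_terms = query.lower().split()
    seen, relevant = {seed}, []
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        text, links = PAGES.get(url, ("", []))
        if relevance(text, query_terms) < threshold:
            continue  # prune: off-topic pages are not expanded
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return relevant
```

Calling `focused_crawl("a.example/home", "mp3 review")` visits the three review-related pages and never expands the off-topic news page, which is where the savings in hardware and bandwidth come from.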

Search results returned by crawling next need to be filtered and classified.

The first step is to filter out those reviews that offer little or no value while gathering attributes from user reviews. We call the favorable attributes "requirements for innovation." For filtering, we call upon a classification system using a simple decision-tree classifier to remove the unwanted reviews.
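To make the filtering step concrete, here is a minimal sketch of a two-level decision tree that discards low-value reviews. Everything in it is an assumption made for illustration: the features (review length and a count of opinion words), the word list, and the thresholds are hand-picked, whereas a practical classifier would learn its splits from labeled reviews.

```python
# Hypothetical opinion vocabulary; a real system would use a far larger one.
OPINION_WORDS = {"good", "great", "sturdy", "flimsy", "poor",
                 "love", "hate", "long", "short"}

def features(review):
    """Compute the two illustrative features the tree splits on."""
    words = review.lower().split()
    return {
        "length": len(words),
        "opinion_count": sum(w.strip(".,!?") in OPINION_WORDS for w in words),
    }

def keep_review(review):
    """Hand-built decision tree; thresholds are illustrative, not learned."""
    f = features(review)
    if f["length"] < 4:          # first split: too short to describe a product
        return False
    if f["opinion_count"] == 0:  # second split: no opinion expressed
        return False
    return True
```

A one-word review such as "Great!" is rejected at the first split, and a purely factual sentence with no opinion words is rejected at the second.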

The second step, then, is to perform a semantic analysis of each review to classify the individual attributes. Attributes, or requirements, can include words like "looks good," "sturdy," or "flimsy."
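Full semantic analysis is beyond a short example, but the flavor of the attribute classification step can be shown with a much simpler stand-in: looking up known phrases in a small lexicon. The lexicon below, its attribute names, and its polarity values are hypothetical.

```python
# Hypothetical lexicon: phrase -> (attribute name, polarity).
LEXICON = {
    "looks good": ("appearance", +1),
    "sturdy": ("durability", +1),
    "flimsy": ("durability", -1),
}

def extract_attributes(review):
    """Return the (attribute, polarity) pairs whose phrases occur in the review."""
    text = review.lower()
    found = []
    for phrase, (attribute, polarity) in LEXICON.items():
        if phrase in text:
            found.append((attribute, polarity))
    return found
```

For example, `extract_attributes("It looks good but the case is flimsy.")` yields a positive appearance attribute and a negative durability attribute. Plain substring matching like this ignores negation and context, which is exactly what a real semantic analysis would need to handle.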

Once the reviews are segmented and the attributes formed, we need a way to store and analyze the feedback. Thus, the attributes discovered while crawling are fed into a transactional database. A transactional database contains a field for each requirement and a row for each review. Large companies commonly use such databases to classify and rank information.
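The layout described here, one field per requirement and one row per review, can be sketched as a simple 0/1 table. The function name and the sample reviews are assumptions made for illustration; a production system would presumably use a real database rather than Python lists.

```python
def build_transaction_table(review_attributes):
    """Return (columns, rows): one column per requirement, one 0/1 row per review."""
    columns = sorted({a for attrs in review_attributes for a in attrs})
    rows = [[1 if c in attrs else 0 for c in columns]
            for attrs in review_attributes]
    return columns, rows

# Each set holds the requirements extracted from one (hypothetical) review.
reviews = [
    {"long battery life", "tough case"},
    {"good sound quality"},
    {"long battery life", "good sound quality"},
]
columns, rows = build_transaction_table(reviews)
```

Each row then records which requirements a given review mentioned, which is the form that frequent-pattern mining expects.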

It's from this database that companies mine customer requirements, which they can then use for new product innovation.

While user and expert reviews offer a wealth of information about the needs and desires of the market, they're not the only source of requirements for innovation. Much the same method can be used to search patent databases for the same type of product attributes and requirements for innovation.

Patent databases provide both complete and summary descriptions of already-envisioned inventions and offer a great deal of information about trends in current innovation.

Mining Web-based patent databases is similar to the mining of user reviews, given that both may be viewed as text documents with html markup added.
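Because both kinds of source are text with HTML markup, a common first step is to strip the markup and keep only the readable text. A minimal sketch using Python's standard `html.parser` module follows; the class and function names are our own, and the sample markup in the usage note is invented rather than taken from any real patent database.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Given `<h1>Abstract</h1><p>A portable <b>audio</b> player.</p>`, the function returns `Abstract A portable audio player.`, and the same routine applies unchanged to a review page or a patent summary.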

Patent databases are often more effective for requirements gathering than publications and thesis information are. While patent gazettes reveal over 90 percent of research results for the patents, more than 80 percent of that information isn't available in academic theses or publications, according to Guihui Wen (Rotating Dynamics for Computational Creativity, National Defense Industry Press, Beijing, 2005).

This wealth of information can be tapped to formulate true requirements for innovation.

Wen and his colleagues have devised a patent-trend monitoring system that automatically searches for patents, dissects them through semantic analysis, and compares them to show recent trends in specific technological areas.

Having such a road map of requirements is especially helpful during the idea-gathering phase of innovation when new, possibly innovative, ideas are being put forth.

Requirements discovered in the patent documents are stored in the same type of transactional database used to store user reviews. Additionally, engineers may place patent requirements in a creativity database for use during the idea-forming phase of a project design.

So how do engineers best call upon their databases? By mining them to find the most frequently mentioned requirements for innovation.

We mine the database using an Apriori-style market basket analysis. This returns the requirements that show up most frequently. Other requirements, mentioned only once or twice, are unlikely to be good indicators of market success. Frequent requirements gathered from reviews of MP3 players may include long battery life, good sound quality, tough case, and acceptance of standard accessories.
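For readers unfamiliar with market basket analysis, the following is a compact sketch of the classic Apriori procedure for finding frequent itemsets, here applied to requirements instead of shopping items. It is a textbook-style illustration under our own assumptions (support measured as a raw count, a tiny invented data set), not the implementation used in the work described here.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    # Frequent single requirements.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical requirements extracted from four MP3 player reviews.
reviews = [
    {"long battery life", "good sound quality", "tough case"},
    {"long battery life", "good sound quality"},
    {"long battery life", "tough case"},
    {"good sound quality", "pink color"},
]
frequent = apriori(reviews, min_support=2)
```

With a minimum support of two reviews, "pink color" is dropped because it appears only once, while "long battery life" and its pairings with "good sound quality" and "tough case" survive.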

Product ideas not meeting the requirements mined from the Web can be viewed as risky and should undergo some evolution to increase the likelihood of success.
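One simple way to picture this risk check is to compare an idea's planned features against the mined requirements and report what is missing. The scoring rule below (the fraction of requirements covered) and the function name are assumptions for illustration only; the article does not specify how risk is quantified.

```python
def assess_idea(idea_features, frequent_requirements):
    """Return (coverage, missing) for a product idea against mined requirements."""
    required = set(frequent_requirements)
    missing = sorted(required - set(idea_features))
    coverage = 1 - len(missing) / len(required) if required else 1.0
    return coverage, missing

requirements = ["long battery life", "good sound quality",
                "tough case", "standard accessories"]
idea = {"long battery life", "tough case", "touch screen"}
coverage, missing = assess_idea(idea, requirements)
```

Here the idea covers half of the mined requirements, and the `missing` list points to where it might evolve.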

Keep in mind that, while mining frequently mentioned user requirements from the Web takes a large step forward in determining the best attributes for a particular product, it doesn't offer a complete understanding of how those attributes interact.

Also remember that incremental innovation is an evolutionary process. Requirements for innovation change over time. To keep up with these changes and to aid in the market acceptance of an idea, the mining of current requirements for innovation is crucial.

The use of an and-or tree offers more than just visualization. With the requirements for innovation formed in such a structure, evolutionary computation algorithms could be called upon to compare and evolve current innovation ideas into those that would be more market-acceptable.
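As a sketch of how requirements in an and-or tree might be checked against an idea, the example below represents a tree as nested tuples and evaluates it recursively. The tree shape, the tuple encoding, and the "protective sleeve" alternative are all hypothetical; the article does not describe the actual structure, and an evolutionary algorithm could use a check like this as one ingredient of a fitness measure.

```python
def satisfies(node, idea_features):
    """Evaluate an and-or tree.

    A node is either a requirement string (a leaf) or a tuple
    ("and" | "or", [child nodes]).
    """
    if isinstance(node, str):
        return node in idea_features
    op, children = node
    results = (satisfies(child, idea_features) for child in children)
    return all(results) if op == "and" else any(results)

# Hypothetical tree: battery life is mandatory, plus either form of protection.
tree = ("and", [
    "long battery life",
    ("or", ["tough case", "protective sleeve"]),
])
```

An idea with long battery life and a protective sleeve satisfies this tree, whereas one with only a tough case does not.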

We are planning to commercialize the system after further development, so the public will have access to it at some point in the near future.

Knowing the current requirements for innovation can not only guide engineers to ideas for new products but also increase those products' chances of success.

A Closer Look

In a companion article exclusive to Mechanical Engineering Online, "A Search Engine for Product Design That Clicks," authors Joseph Engler and Andrew Kusiak go into greater detail describing the tools and methods they apply to data mining on the World Wide Web.

Copyright © 2008 by ASME