
James Wang

  • Professor of Information Sciences and Technology
  • Affiliate Professor of Integrative Biosciences, The Huck Institutes of the Life Sciences
  • Affiliate Professor of Computer Science and Engineering, College of Engineering
313 C Information Sciences and Technology Building
University Park, PA 16802
Office Phone: (814) 865-7889


  1. Ph.D., Medical Information Sciences, Stanford University, 2000
  2. M.S., Computer Science, Stanford University, 1997
  3. M.S., Mathematics, Stanford University, 1997
  4. B.S., Mathematics and Computer Science, Summa Cum Laude, University of Minnesota, 1994


Dr. Wang is the author or coauthor of two monographs and more than 30 journal articles.

Research interests of his group include automatic image tagging, semantics-sensitive image retrieval, image security, biomedical informatics, computational aesthetics, story picturing, art image retrieval, and computer vision.

Wang has received an NSF Career award and the endowed PNC Technologies Career Development Professorship (created in 1999 through part of a $1 million gift to Penn State from The PNC Foundation).

He has served as the lead guest editor of IEEE Transactions on Pattern Analysis and Machine Intelligence Special Issue on Real-world Image Annotation and Retrieval, the chair of ACM Multimedia Information Retrieval MIR 2006 and MIR 2007, and as an invited speaker at more than 70 institutions.

From 2007 to 2008, Wang was a visiting professor at the Robotics Institute of the School of Computer Science, Carnegie Mellon University. He has also held visiting positions at IBM Almaden Research Center, SRI International, NEC Research, and Academia Sinica.

Wang’s research has been featured in science media including PBS NOVA TV, Discovery News, Scientific American, National Public Radio, and MIT Technology Review, as well as wired news agencies. His publications are widely cited.

Research and Teaching

Wang is an expert in visual database search and retrieval. Formerly with the Biomedical Informatics Group and the Computer Science Database Group at Stanford, he undertakes work that makes possible the retrieval of specific images from databanks of images.


Among other contributions, he has co-developed the SIMPLIcity semantics-sensitive image retrieval system and the ALIPR real-time computerized image tagging system. These systems have been applied to several domains including biomedical image analysis, satellite imaging, and art and cultural imaging.


Wang’s studies also have involved retrieval from large-scale genome databases through pattern recognition. His research has been primarily funded by the National Science Foundation.


At Penn State, Wang teaches theoretical foundations of information science, techniques related to the organization of data, and medical informatics. He also guides a group of both graduate and undergraduate researchers. More about his research can be found at



For publications information, see



Book: Integrated Region-Based Image Retrieval

By James Z. Wang
The need for efficient content-based image retrieval has increased tremendously in areas such as biomedicine, the military, commerce, education, and Web image classification and searching. In the biomedical domain, content-based image retrieval can be used in patient digital libraries, clinical diagnosis, searching of 2-D electrophoresis gels, and pathology slides. Integrated Region-Based Image Retrieval presents a wavelet-based approach for feature extraction, combined with integrated region matching. An image in the database, or a portion of an image, is represented by a set of regions, roughly corresponding to objects, which are characterized by color, texture, shape, and location. A measure for the overall similarity between images is developed as a region-matching scheme that integrates properties of all the regions in the images. The advantage of using this "soft matching" is that it makes the metric robust to poor segmentation, an important property that previous research has not solved. Integrated Region-Based Image Retrieval demonstrates an experimental image retrieval system called SIMPLIcity (Semantics-sensitive Integrated Matching for Picture LIbraries). This system validates these methods on various image databases, proving that such methods perform much better and much faster than existing ones. The system is exceptionally robust to image alterations such as intensity variation, sharpness variation, intentional distortions, cropping, shifting, and rotation. These features are extremely important to biomedical image databases since visual features in the query image are not exactly the same as the visual features in the images in the database. Integrated Region-Based Image Retrieval is an excellent reference for researchers in the fields of image retrieval, multimedia, computer vision and image processing.
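
The "soft matching" idea described above can be illustrated with a minimal sketch. It is not taken from the book: the region representation (a feature vector plus an area-fraction weight), the Euclidean region distance, the greedy closest-pair-first assignment, and the names `region_distance` and `soft_match_distance` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical region representation: (feature vector, weight), where the
# weight is the fraction of the image area covered by the region and the
# weights of one image sum to 1.
Region = Tuple[Sequence[float], float]


def region_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two region feature vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def soft_match_distance(image_a: List[Region], image_b: List[Region]) -> float:
    """Image-to-image distance in which every region may match several regions.

    Region pairs are visited from most to least similar. Each pair receives
    as much matching credit as both regions still have available, so a region
    that was split by poor segmentation can still be matched piece by piece.
    """
    remaining_a = [weight for _, weight in image_a]
    remaining_b = [weight for _, weight in image_b]

    # All region pairs, closest first.
    pairs = sorted(
        (region_distance(feat_a, feat_b), i, j)
        for i, (feat_a, _) in enumerate(image_a)
        for j, (feat_b, _) in enumerate(image_b)
    )

    total = 0.0
    for dist, i, j in pairs:
        credit = min(remaining_a[i], remaining_b[j])
        if credit <= 0:
            continue  # one of the two regions is already fully matched
        total += credit * dist
        remaining_a[i] -= credit
        remaining_b[j] -= credit
    return total


# Two regions in one image; the same content over-segmented into three
# regions in the other image.
picture = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
oversegmented = [((0.0, 0.0), 0.25), ((0.0, 0.0), 0.25), ((1.0, 1.0), 0.5)]
print(soft_match_distance(picture, oversegmented))
```

Because matching credit is spread over several region pairs, the over-segmented copy in this sketch is still at distance zero from the original, which is the kind of robustness to segmentation errors that the description attributes to soft matching.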

Book: Machine Learning And Statistical Modeling Approaches To Image Retrieval

by Yixin Chen, Jia Li, and James Z. Wang
In the early 1990s, the establishment of the Internet brought forth a revolutionary viewpoint of information storage, distribution, and processing: the World-Wide Web is becoming an enormous and expanding distributed digital library. Along with the development of the Web, image indexing and retrieval have grown into research areas sharing a vision of intelligent agents: computer programs capable of making "meaningful interpretations" of images based on automatically extracted imagery features. Far beyond Web searching, image indexing and retrieval can potentially be applied to many other areas, including biomedicine, space science, biometric identification, digital libraries, the military, education, commerce, culture, and entertainment. Although much research effort has been put into image indexing and retrieval, we are still very far from having computer programs with even a modest level of human intelligence. Decades of research have shown that designing a generic computer algorithm for object recognition, scene understanding, and automatically translating the content of images to linguistic terms is a highly challenging task. However, a series of successes have been achieved in recognizing a relatively small set of objects or concepts within specific domains based on learning and statistical modeling techniques. This motivates many researchers to use recently developed machine learning and statistical modeling methods for image indexing and retrieval. Some results are quite promising. The topics of this book reflect our personal biases and experiences of machine learning and statistical modeling based image indexing and retrieval. A significant portion of the book is built upon material from articles we have written, our unpublished reports, and talks we have presented at several conferences and workshops.
In particular, the book presents five different techniques of integrating machine learning and statistical modeling into image indexing and retrieval systems: a similarity measure defined over region-based image features; an image clustering and retrieval scheme based on dynamic graph partitioning; an image categorization method based on the information of regions contained in the images; modeling semantic concepts of photographic images by stochastic processes; and the characterization of ancient paintings using a mixture of stochastic models. The first two techniques are within the scope of image retrieval. The remaining three techniques are closely related to automatic linguistic image indexing. The book will be of value to faculty seeking a textbook that covers some of the most recent advances in the areas of automated image indexing, retrieval, and annotation. Researchers and graduate students interested in exploring state-of-the-art research in the related areas will find in-depth treatments of the covered topics. Demonstrations of some of the techniques presented in the book are available at