Unsighted Cite: The Highly Cited Paper That Was Never Written
In a recent post, E. Garcia points out a great article, “The Most Influential Paper Gerard Salton Never Wrote”, written by David Dubin. According to the article's abstract:
Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled “A Vector Space Model for Information Retrieval” (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which were overviews of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.
The phantom paper by Salton that Dubin mentions in his article has more than 216 citations in Google Scholar. Unsurprisingly, it has been cited by big names in the field of information retrieval, despite the fact that no such paper exists. Apparently no one bothered to read the cited paper; even some of Salton’s own colleagues propagated the mistake on several occasions:
The paper is even cited in a few of the very last articles on which Salton is listed as a coauthor (Singhal, Salton, Mitra, & Buckley, 1996; Singhal & Salton, 1995). These papers were published close to or shortly after the time of his death, and so the errors cannot be blamed on Salton (remembered by his colleagues as a very careful and meticulous writer).
What exactly happened, and how did it continue for such a long time? According to Dubin,
What is surprising, however, is that there is evidence that the VSM evolved over a much longer period of time than is usually acknowledged and that Salton did not publish an articulation of the model and its assumptions until several years after criticisms of those assumptions had been leveled and alternative models proposed.
and
[T]he real evolution of the VSM (as people conceived it) is even more fascinating than citation errors for which Dr. Salton bears none of the blame. What began as a growing comfort in using vector spaces to explain computations led to the use of language that suggested the VSM was a retrieval model in its own right. When Salton and his colleagues were challenged on the implications of taking that language seriously, they joined their critics in reinterpreting their earlier writings.
Since I am no information retrieval specialist, I will cite William Webber’s commentary on the whole issue:
[V]ector spaces were first introduced by Salton and collaborators as a handy way of visualising and describing similarity computations; that other papers misunderstood Salton as proposing the vector space model as a formal model of information retrieval, and criticised it on that basis; and that it was in response to those criticisms that Salton finally, in the 1980s, explicitly described the vector space model as it is now understood.
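To make "similarity computations" concrete, here is a minimal sketch of the kind of calculation a vector space describes: documents become term-count vectors and are compared by the cosine of the angle between them. The function names and the use of raw, unweighted term counts are my own assumptions for illustration; this is not Salton's formulation or weighting scheme.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a text into a bag-of-words vector (raw term counts, no weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


query = term_vector("vector space model")
doc = term_vector("a vector space model for information retrieval")
score = cosine_similarity(query, doc)
```

A score near 1 means the two texts use the same terms in similar proportions, and 0 means they share no terms at all.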
Is this an isolated example, or are there more hidden phantom citations? It is certainly not isolated. In their article “Read before you cite!”, M.V. Simkin and V.P. Roychowdhury show that the distribution of misprints in citations follows a Zipf law, and estimate that only about 20% of citers read the original. They further suggest:
In the past decade with the advent of the Internet, the ease with which would-be non-readers can copy from unreliable sources, as well as would-be readers can access the original has become equally convenient, but there is no increased incentive for those who read the original to also make verbatim copies, especially from unreliable resources.
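The copying mechanism is easy to play with in a toy simulation. The sketch below is my own simplification, not Simkin and Roychowdhury's actual model: each citer either reads the original (with probability `p_read`, set to the roughly 20% figure mentioned above) and types the citation afresh, occasionally introducing a brand new misprint, or copies the citation verbatim from a randomly chosen earlier citer. The `p_typo` value and all names are assumptions made purely for illustration.

```python
import random
from collections import Counter


def simulate_citations(n_citers, p_read=0.2, p_typo=0.1, seed=0):
    """Toy model: citers either read the original or copy an earlier citation."""
    rng = random.Random(seed)
    citations = []
    next_misprint = 0
    for _ in range(n_citers):
        # The first citer has nobody to copy from, so they must read the original.
        if not citations or rng.random() < p_read:
            if rng.random() < p_typo:
                # A reader who mistypes creates a new, unique misprint.
                next_misprint += 1
                citations.append(f"misprint-{next_misprint}")
            else:
                citations.append("correct")
        else:
            # A non-reader copies an earlier citation, errors included.
            citations.append(rng.choice(citations))
    return citations


def misprint_counts(citations):
    """Misprints ordered from most to least frequently repeated."""
    return Counter(c for c in citations if c != "correct").most_common()


cites = simulate_citations(500, seed=1)
for misprint, count in misprint_counts(cites):
    print(misprint, count)
```

Because copied errors get copied again, a misprint introduced early can end up repeated many times while later ones appear only once or twice, which is the kind of skewed pattern the article describes.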
In my opinion, several factors work together to create this kind of situation:
1. Authors lack the motivation to read the papers they cite
2. Citing without context or reason seems to be a major cause of citation errors
3. Lack of originality in the opinions, thoughts and research reported
4. Unintentional typos are propagated by the copycat habits of later authors
5. Even the small percentage of people who do read the papers they cite often, out of laziness, just copy the citation from an unreliable reference list
6. Publishers are equally to blame for this kind of problem, especially when there are more than 100 citation styles, many of which look similar but differ in detail
One thing is sure: we need better tools to create, manage and process citations during the manuscript writing process. As Dubin notes:
Another irony—one representing a more fitting tribute to Salton’s legacy—is that locating papers containing the mistaken citation is very difficult using conventional citation databases such as the Web of Science. But discovery of the errors is greatly aided by search engines such as Google and CiteSeer—systems that employ techniques similar to those that Salton himself refined and recommended.
Making the whole citation management workflow easy and user-friendly can provide the motivation needed to create a citation list from scratch rather than just copying it from unreliable sources. As a matter of fact, tools like CiteULike, Connotea and others can handle this kind of problem (check out my previous post on collaborative scientific writing and sharing using freely available tools).


















