

The NASA Astrophysics Data System: Data Holdings

C. Grant, A. Accomazzi, G. Eichhorn, M. J. Kurtz, and S. S. Murray

Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138

Received / Accepted

Abstract. Since its inception in 1993, the ADS Abstract Service has become an indispensable research tool for astronomers and astrophysicists worldwide. In those seven years, much effort has been directed toward improving both the quantity and the quality of references in the database. From the original database of approximately 160,000 astronomy abstracts, our dataset has grown almost tenfold to approximately 1.5 million references covering astronomy, astrophysics, planetary sciences, physics, optics, and engineering. We collect and standardize data from approximately 200 journals and present the resulting information in a uniform, coherent manner. With the cooperation of journal publishers worldwide, we have been able to place scans of full journal articles on-line back to the first volumes of many astronomical journals, and we are able to link to current versions of articles, abstracts, and datasets for essentially all of the current astronomy literature. The trend toward electronic publishing in the field, the use of electronic submission of abstracts for journal articles and conference proceedings, and the increasingly prominent use of the World Wide Web to disseminate information have enabled the ADS to build a database unparalleled in other disciplines. The ADS can be accessed at http://adswww.harvard.edu

Key words: methods: data analysis – astronomical bibliography – astronomical sociology

Send offprint requests to: C. Grant
Correspondence to: C. Grant

due to a combination of factors: growth in professional society membership, an increase in papers by multiple authors, the launching of new spacecraft, and increased competition for jobs and PIs in the field (since candidate evaluation is partially based on publication history). As the number of papers in the field grows, so does the need for tools which astronomers can use to locate that fraction of papers which pertain to their specific interests.

The ADS Abstract Service is one of several bibliographic services which provide this function for astronomy, but due to the broad scope of our coverage and the simplicity of access to our data, astronomers now rely extensively on the ADS, and other bibliographic services not only link to us, but some have built their bibliographic search capabilities on top of the ADS system. The International Society for Optical Engineering (SPIE) and the NASA Technical Report Service (NTRS) are two such services.

The evolution of the Astrophysics Data System (ADS) has been largely data-driven. Our search tools and indexing routines have been modified to maximize speed and efficiency based on the content of our dataset. As new types of data (such as electronic versions of articles) became available, the Abstract Service quickly incorporated that new feature. The organization and standardization of the database content is the very core upon which the Abstract Service has been built.

This paper contains a description of the ADS Abstract Service from a “data” point of view, specifically descriptions of our holdings and of the processes by which we ingest new data into the system. Details are provided on the organization of the databases (section 2), the description of the data in the databases (section 3), the creation of bibliographic records (section 4), the procedures for updating the database (section 5), and on the scanned articles in the Astronomy database (section 6). We discuss the interaction between the ADS and the journal publishers (section 7) and analyze some of the numbers corresponding to the datasets (section 8). In conjunction with three other ADS papers in this volume, this paper is intended to offer details on the entire Abstract Service with the hopes that astronomers will have a better understanding of the reference data upon which they rely for their research. In addition, we hope that researchers in other disciplines may be able to benefit from some of the details described herein.

As is often the case with descriptions of active Internet resources, what follows is a description of the present situation with the ADS Abstract Service. New features are always being added, some of which necessitate changes in our current procedures. Furthermore, with the growth of electronic publishing, some of our core ideas about bibliographic tools and requirements must be reconsidered in order to be able to take full advantage of new publishing technologies for a new millennium.

2. The Databases

The ADS Abstract Service was originally conceived of in the mid 1980’s as a way to provide on-line access to bibliographies of astronomers which were previously available only through expensive librarian search services or through the A&A Abstracts series (Schmadel 1979, Schmadel 1982, Schmadel 19), published by the Astronomisches Rechen-Institut in Heidelberg. While the ideas behind the Abstract Service search engine were being developed (see Kurtz et al. 2000, hereafter OVERVIEW), concurrent efforts were underway to acquire a reliable data source on which to build the server. In order to best develop the logistics of the search engine it was necessary to have access to real literature data from the past and present, and to set up a mechanism for acquiring data in the future.

An electronic publishing meeting in the spring of 1991 brought together a number of organizations whose ultimate cooperation would be necessary to make the system a reality (see OVERVIEW for details). NASA’s Scientific and Technical Information Program (STI) offered to provide abstracts to the ADS. STI’s abstracts were a rewritten version of the original abstracts, categorized and keyworded by professional editors. They not only abstracted the astronomical literature, but many other scientific disciplines as well. With STI agreeable to providing the past and present literature, and the journals committed to providing the future literature, the data behind the system fell into place. The termination of the journal abstracting by the STI project several years later was unfortunate, but did not cause the collapse of the ADS Abstract Service because of the commitment of the journal publishers to distribute their information freely.

The STI abstracting approximately covered the period from 1975 to 1995. With the STI data alone, we estimated the completeness of the Astronomy database to be better than 90% for the core astronomical journals. Fortunately, with the additional data supplied by the journals, by SIMBAD (Set of Identifications, Measurements, and Bibliographies for Astronomical Data, Egret & Wenger 1988) at the CDS (Centre de Données Astronomiques de Strasbourg), and by performing Optical Character Recognition (OCR) on the scanned tables of contents (see section 6 below), we are now closer to 99% complete for that period. In the period since then we are 100% complete for those journals which provide us with data, and significantly less complete for those which do not (e.g. many observatory publications and non-U.S. journals). The data prior to 1975 are also significantly incomplete, although we are currently working to improve the completeness of the early data, primarily through scanning the tables of contents for journal volumes as they are placed on-line. We are 100% complete for any journal volume which we have scanned and put on-line, since we verify that we have all bibliographic entries during the procedure of putting scans on-line.

Since the STI data were divided into categories, it was easy to create additional databases with non-astronomical data which were still of interest to astronomers. The creation of an Instrumentation database has enabled us to provide a database for literature related to astronomical instrumentation, of particular interest to those scientists building astronomical telescopes and satellite instruments. We were fortunate to get the cooperation of the SPIE very quickly after releasing the Instrumentation database. SPIE has become our major source of abstracts for the Instrumentation database now that STI no longer supplies us with data.

Our Physics and Geophysics database, the third database to go on-line, is intended for scientists working in physics-related fields. We add authors and titles from all of the physics journals of the American Institute of Physics (AIP), the Institute of Physics (IOP), and the American Physical Society (APS), as well as many physics journals from publishers such as Elsevier and Academic Press (AP).

The fourth database in the system, the Preprint database, contains a subset of the Los Alamos National Laboratory’s (LANL) Preprint Archive (Los Alamos National Laboratory 1991). Our database includes the LANL astro-ph preprints which are retrieved from LANL and indexed nightly through an automated procedure. That dataset includes preprints from astronomical journals submitted directly by authors.

3. Description of the Data

The original set of data from STI contained several basic fields of data (author, title, keywords, and abstracts) to be indexed and made available for searching. All records were keyed on STI’s accession number, a nine-character code consisting of a letter prefix (A or N) followed by a two-digit publication year, followed by a five-digit identifier (e.g. A95-12345). Data were stored in files named by accession number.

With the inclusion of data from other sources, primarily the journal publishers and SIMBAD, we extended STI’s concept of the accession number to handle other abstracts as well. Since the ADS may receive the same abstract from multiple sources, we originally adopted a system of using a different prefix letter with the remainder of the accession number being the same to describe abstracts received from different sources. Thus, the same abstract for the above accession number from STI would be listed as J95-12345 from the journal publisher and S95-12345 from SIMBAD. This allowed the indexing routines to consider only one instance of the record when indexing. Recently, limitations in the format of accession numbers and the desire to index data from multiple sources (rather than just STI’s version) have prompted us to move to a data storage system based entirely on the bibliographic code.

3.1. Bibliographic Codes

The concept of a unique bibliographic code used to identify an article was originally conceived of by SIMBAD and NED (NASA’s Extragalactic Database, Helou & Madore 1988). The original specification is detailed in Schmitz et al. 1995. In the years since, the ADS has adopted and expanded their definition to be able to describe references outside of the scope of those projects.

The bibliographic code is a 19-character string comprised of several fields which usually enables a user to identify the full reference from that string. It is defined as follows:

YYYYJJJJJVVVVMPPPPA

where the fields are defined in Table 1.

The journal field is left-justified and the volume and page fields are right-justified. Blank spaces and leading zeroes are replaced by periods. For articles with page numbers greater than 9999, the M field contains the first digit of the page number. The A field contains a colon (“:”) if there is no author listed.
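The padding rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ADS implementation: the function name `make_bibcode` and its parameters are hypothetical, page numbers are assumed to be plain integers, and only the first letter of the author argument is used.

```python
def make_bibcode(year, journal, volume, page, first_author=None, qualifier="."):
    """Assemble a 19-character code laid out as YYYYJJJJJVVVVMPPPPA.

    Hypothetical helper: the name and signature are illustrative only.
    """
    page_str = str(page)
    if len(page_str) > 4:
        # Page numbers above 9999: the first digit moves into the M field.
        qualifier, page_str = page_str[0], page_str[1:]
    code = (
        f"{year:04d}"
        + journal.ljust(5, ".")       # journal: left-justified, padded with periods
        + str(volume).rjust(4, ".")   # volume: right-justified, padded with periods
        + qualifier                   # M: qualifier for publication
        + page_str.rjust(4, ".")      # page: right-justified, padded with periods
        + (first_author[0].upper() if first_author else ":")  # ":" when no author
    )
    if len(code) != 19:
        raise ValueError(f"fields do not fit the 19-character layout: {code!r}")
    return code


print(make_bibcode(1999, "PASP", 111, 438, "F"))
print(make_bibcode(1997, "ApJ", 480, 129, "N", qualifier="L"))
```

The length check at the end is a design choice of this sketch: it rejects inputs (for example a six-character journal abbreviation) that cannot be represented in the fixed-width layout rather than silently truncating them.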

Creating bibliographic codes for the astronomical journals is uncontroversial. Each journal typically has a commonly-used abbreviation, and the volume and page are easily assigned (e.g. 1999PASP..111..438F). Each volume tends to have individual page numbering, and in those cases where more than one article appears on a page (such as errata), a “Q”, “R”, “S”, etc. is used as the qualifier for publication to make bibliographic codes unique. When page numbering is not continuous across issue numbers (such as Sky & Telescope), the issue number is represented by a lower case letter as the qualifier for publication (e.g. “a” for issue 1). This is because there may be multiple articles in a volume starting on the same page number.

Creating bibliographic codes for the “grey” literature such as conference proceedings and technical reports is a more difficult task. The expansion into these additional types of data included in the ADS required us to modify the original prototype bibliographic code definition in order to present identifiers which are easily recognizable to the user. The prototype definition of the bibliographic code suggested using a single letter in the second place of the volume field to identify non-standard references (catalogs, PhD theses, reports, preprints, etc.) and using the third and fourth place of that field to unduplicate and report volume numbers (e.g. 1981CRJS..R.3...14W). Since we felt that this created codes unidentifiable to the typical user and since NED and SIMBAD did not feel that users needed to be able to identify books directly from their bibliographic codes, the ADS adopted different rules for creating codes to identify the grey literature.

It is straightforward to create bibliographic codes for conference proceedings which are part of a series. For example, the IAU Symposia Series (IAUS) contains volume numbers and therefore fits the journal model for bibliographic codes. Other conference proceedings, books, colloquia, and reports in the ADS typically contain a four letter word in the volume field such as “conf”, “proc”, “book”, “coll”, or “rept”. When this is the case with a bibliographic code, the journal field typically consists of the first letter from important words in the title. This can give the user the ability to identify a conference proceeding at a glance (e.g. “ioda.book” for “Information and On-Line Data in Astronomy”). We will often leave the fifth place of the journal field as a dot for “readability” (e.g. 1995ioda.book..175M). For most proceedings which are also published as part of a series (e.g. ASP Conference Series, IAU Colloquia, AIP Conference Series), we include in the system two bibliographic codes, one as described above and one which contains the series name and the volume (see section 5.1). We do this so that users can see, for example, that a paper published in one of the “Astronomical Data Analysis Software and Systems” series is clearly labelled as “adass” whereas a typical user might not remember which volume of ASPC contained those ADASS papers. This makes the bibliographic codes more readable to the user.

With the STI data, the details were often unclear as to whether an article was from a conference proceeding, a meeting, a colloquium, etc. We assigned those codes as best we could, making no significant distinction between them. For conference abstracts submitted by the editors of a proceedings prior to publication, we often do not have page numbers. In this case, we use a counter in lieu of a page number and use an “E” (for “Electronic”) in the fourteenth column, the qualifier for publication. If these conference abstracts are then published, their bibliographic codes are replaced by a bibliographic code complete with page number. If the conference abstracts are published only on-line, they retain their electronic bibliographic code with its E and counter number.


Table 1. Bibliographic Code Definition (e.g. 1996A&AS..115....1S)

Field   Definition                                    Example
YYYY    Publication Year                              1997
JJJJJ   Journal Abbreviation                          ApJ, A&A, MNRAS, etc.
VVVV    Volume Number                                 480
M       Qualifier for Publication                     L (for Letter), P (for Pink Page); Q, R, S, etc. for unduplicating; a, b, c, etc. for issue number
PPPP    Page Number                                   129
A       First Letter of the First Author’s Surname    N


all keywords in the system and equates those from different keyword systems which are similar (Lee et al. 1999).

The current non-searchable fields in the ADS databases include the journal field, author affiliation, category, abstract copyright, and abstract origin. Although we may decide to create an index and search interface for some of these entities (such as category), others will continue to remain unsearchable since searching them is not useful to the typical user. In particular, author affiliations would be useful to search, however this information is inconsistently formatted so it is virtually impossible to collect all variations of a given institution for indexing coherently. Furthermore, we have the author affiliations for only about half of the entries in the Astronomy database so we have decided to keep this field non-searchable. For researchers wishing to analyze affiliations on a large scale, we can provide this information on a collaborative basis.

3.3. Data Sources

The ADS currently receives abstracts or table of contents (ToC) references from almost two hundred journal sources. Tables 2, 3, and 4 list these journals, along with their bibliographic code abbreviation, source, frequency with which we receive the data, what data are received, and any links we can create to the data. ToC references typically contain only author and title, although sometimes keywords are included as well. The data are contributed via email, ftp, or retrieved from web sites around the world at a frequency ranging from once a week to approximately once a year. The term “often” used in the frequency column implies that we get them more frequently than once a month, but not necessarily on a regular basis. The term “occasionally” is used for those journals which submit data to us infrequently.

Updates to the Astronomy and Instrumentation databases occur approximately every two weeks, or more often if logistically possible, in order to keep the database current. Recent enhancements to the indexing software have enabled us to perform instantaneous updates, triggered by an email containing new data (see ARCHITECTURE). Updates to the Physics database occur approximately once every two months. As stated earlier, the Preprint database is updated nightly.

3.4. Data Formats

The ADS is able to benefit from certain standards which are adhered to in the writing and submission practices of astronomical literature. The journals share common abbreviations and text formatting routines which are used by the astronomers as well. The use of TeX (Knuth 1984) and LaTeX (Lamport 1986), and their extension to BibTeX (Lamport 1986) and AASTeX (American Astronomical Society 1999) results in common formats among some of our data sources. This enables the reuse of parsing routines to convert these formats to our standard format. Other variations of TeX used by journal publishers also allow us to use common parsing routines, which greatly facilitates data loading.

TeX is a public domain typesetting program designed especially for math and science. It is a markup system, which means that formatting commands are interspersed with the text in the TeX input file. In addition to commands for formatting ordinary text, TeX includes many special symbols and commands with which you can format mathematical formulae with both ease and precision. Because of its extraordinary capabilities, TeX has become the leading typesetting system for science, mathematics, and engineering. It was developed by Donald Knuth at Stanford University.

LaTeX is a simplified document preparation system built on TeX. Because LaTeX is available for just about any type of computer and because LaTeX files are ASCII, scientists are able to send their papers electronically to colleagues around the world in the form of LaTeX input. This is also true for other variants of TeX, although the astronomical publishing community has largely centered their publishing standards on LaTeX or one of the software packages based on LaTeX, such as BibTeX or AASTeX. BibTeX is a program and file format designed by Oren Patashnik and Leslie Lamport in 1985 for the LaTeX document preparation system, and AASTeX is a LaTeX-based package that can be used to mark up manuscripts specifically for American Astronomical Society (AAS) journals.

Similar to the widespread acceptance of TeX and its variants, the extensive use of SGML (Standard Generalized Markup Language, Goldfarb & Rubinsky 1991) by the members of the publishing community has given us the ability to standardize many of our parsing routines. All data gleaned off the World Wide Web share features due to the use of HTML (HyperText Markup Language, Powell & Whitworth 1998), an example of SGML. Furthermore, the trend towards using XML (Extensible Markup Language, Harold 1999) to describe text documents will enable us to share standard document attributes with other members of the astronomical community. XML is a subset of SGML which is intended to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

The ADS parsing routines benefit from these standards in several ways: we can reuse routines designed around these systems; we are able to preserve original text representations of entities such as embedded accents so these entities are displayed correctly in the user’s browser; and we are able to capture value-added features such as electronic URLs and email addresses for use elsewhere in our system.


Table 2. The ADS Astronomy Database

See accompanying text file ADS

a Letter codes describing what data are available
b Astronomische Gesellschaft
c University of Chicago Press
d American Institute of Physics
e Overseas Publishers Association
f American Geophysical Union
g Central Bureau for Astronomical Telegrams
h Academic Press
i Universidad Nacional Autonoma de Mexico
j Astronomisches Rechen-Institut

Table 3. The ADS Instrumentation Database

See accompanying text file ADS

a Letter codes describing what data are available
b Optical Society of America
c The International Society for Optical Engineering (SPIE)
d Institute of Physics

Table 4. The ADS Physics Database

See accompanying text file ADS

a Letter codes describing what data are available

In order to facilitate data exchange between different parts of the ADS, we make use of a tagged format similar to the “Refer” format (Jacobsen 1996). Refer is a preprocessor for the word processors nroff and troff which finds and formats references. While our tagged formats share some common fields (%A, %T, %J, %D), the Refer format is not specific enough to be used for our purposes. Items such as objects, URLs and copyright notices are beyond the scope of the Refer syntax. Details on our tagged format are provided in Table 5. Reading and writing routines for this format are shared by loading and indexing routines, and a number of our data sources submit abstracts to us in this format.
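As a rough illustration of how a reading routine for such a tagged format might look, the Python sketch below parses one record into a dictionary keyed by tag letter. The paper does not spell out the file syntax, so several points are assumptions borrowed from Refer conventions: each field starts on a line beginning with `%` and a single tag letter, and untagged lines continue the previous field. The function name, the sample record, and the choice of %R, %T, %A and %D as the required tags are likewise hypothetical.

```python
REQUIRED_TAGS = ("R", "T", "A", "D")  # assumed to be the four required fields


def parse_tagged_record(text):
    """Parse one tagged record into a {tag letter: value} dictionary."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line[:1] == "%" and line[1:2].isalpha():
            current = line[1]                      # start of a new field
            fields[current] = line[2:].strip()
        elif current is not None and line.strip():
            fields[current] += " " + line.strip()  # continuation line
    missing = [tag for tag in REQUIRED_TAGS if tag not in fields]
    if missing:
        raise ValueError("missing required tags: " + ", ".join(missing))
    return fields


# Entirely made-up record, for illustration only.
SAMPLE = """%R 2000Test....1....1A
%T An example title
   continued on a second line
%A Author, A.; Other, B.
%D 04/2000
%K example keyword"""

record = parse_tagged_record(SAMPLE)
print(record["T"])
```

Keeping the parser ignorant of what each tag means, and validating only the required tags, mirrors the principle of deferring interpretation of the data to a later stage.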

4. Creating the Bibliographic Records

One of the basic principles in the parsing and formatting of the bibliographic data incorporated into the ADS database over the years has been to preserve as much of the original information as possible and delay any syntactic or semantic interpretation of the data until a later stage. From the implementation point of view, this means that bibliographic records provided to the ADS by publishers or other data sources typically are saved as files which are tagged with their origin, entry date, and any other ancillary information relevant to their contents (e.g. if the fields in the record contain data which was transliterated or converted to ASCII).

For instance, the records provided to the ADS by the University of Chicago Press (the publisher of several major U.S. astronomical journals) are SGML documents which contain a unique manuscript identifier assigned to the paper during the electronic publishing process. This identifier is saved in the file created by the ADS system for this bibliographic entry.

Because data about a particular bibliographic entry may be provided to the ADS by different sources and at different times, we adopted a multi-step procedure in the creation and management of bibliographic records:

1) Tokenization: Parsing input data into a memory-resident data structure using procedures which are format- and source-specific.


Table 5. Tagged Format Definitions

Tag   Name                                      Comment
%R    Bibliographic Code                        required
%T    Title                                     required
%A    Author List                               required
%D    Publication Date                          required
%B    Abstract Text
%C    Abstract Copyright
%E    URL for Electronic Data Table
%F    Author Affiliation
%G    Origin
%H    Email
%J    Journal Name, Volume, and Page Range
%K    Keywords
%L    Last Page of Article
%O    Object Name
%Q    Category
%U    URL for Electronic Document
%V    Language
%W    Database (AST, PHY, INST)
%X    Comment
%Y    Identifiers
%Z    References


The assessment of what constitutes a multiple word last name as opposed to a middle name is non-trivial since some names, such as Davis, can be a first name (Davis Hartman), a middle name (A. G. Davis Philip), a last name (Robert Davis), or some combination (Davis S. Davis). Another example is how to determine when the name “Van” is a first name (Van Nguyen), a middle name (W. Van Dyke Dixon), or part of a last name (J. van Allen). Handling all of these cases correctly requires not only familiarity with naming conventions worldwide, but an intimate familiarity with the names of astronomers who publish in the field. We are continually amassing the latter as we incorporate increasing amounts of data into the system, and as we get feedback from our users.

– Spell checking: Since many of the historical records entered in the ADS have been generated by typesetting tables of contents, typographical errors can often be flagged in an automated way using spell-checking software. We have developed a PERL software driver for the international ispell program, a UNIX utility, which can be used as a spell-checking filter on all input to be considered textual information. A custom dictionary containing terms specific to astronomy and space sciences is used to increase the recognition capabilities of the software module. Any corrections suggested by the spell-checker module are reviewed by a human before the data are actually updated.

– Language recognition: Extending the capability of the spell-checker, we have implemented a software module which attempts to guess the language of an input text buffer based on the percentage of words that it can recognize in one of several languages: English, German, French, Spanish, or Italian. This module is used to flag records to be entered in our database in a language other than English. Knowledge of the language of an abstract allows us to create accurate synonyms for those words (see ARCHITECTURE).
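The language-recognition idea described in the last item can be sketched as follows. This is a toy Python illustration, not the PERL/ispell module itself: the function name is hypothetical, and the tiny word lists stand in for the full per-language dictionaries that a real spell-checker would supply.

```python
import re

# Toy vocabularies; a real module would rely on full spell-checker dictionaries.
TOY_DICTIONARIES = {
    "English": {"the", "of", "and", "spectrum", "galaxy"},
    "French": {"le", "la", "de", "et", "spectre", "galaxie"},
}


def guess_language(text, dictionaries):
    """Return (language, fraction of recognized words) for the best match."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not words:
        return None, 0.0
    best_language, best_fraction = None, 0.0
    for language, vocabulary in dictionaries.items():
        fraction = sum(word in vocabulary for word in words) / len(words)
        if fraction > best_fraction:
            best_language, best_fraction = language, fraction
    return best_language, best_fraction


print(guess_language("Le spectre de la galaxie", TOY_DICTIONARIES))
```

Returning the recognized fraction alongside the guess lets a caller apply its own threshold before flagging a record as non-English, which seems prudent given how many proper names and technical terms astronomical titles contain.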

4.2. Identification

We call identification the activity of mapping the tokens extracted from the parsing of a bibliographic record into a unique identifier. The ADS adopted the use of bibliographic codes as the identifier for bibliographic entries shortly after its inception, in order to facilitate communication between the ADS and SIMBAD. The advantage of using bibliographic codes as unique identifiers is that they can most often be created in a straightforward way from the information given in the list of references published in the astronomical literature, namely the publication year, journal name, volume, and page numbers, and first author’s name (see section 3.1 for details).

4.3. Instantiation

“Instantiation” of a bibliographic entry consists of the creation of a record for it in the ADS database. The ADS must handle receipt of the same data from multiple sources. We have created a hierarchy of data sources so that we always know the preferred data source. A reference for which we have received records from STI, the journal publisher, SIMBAD, and NED, for example, must be in the system only once with the best information from each source preserved. When we load a reference into the system, we check whether a text file already exists for that reference. If there is no text file, it is a new reference and a text file is created. If there already is a text file, we append the new information to the current text file, creating a “merged” text file. This merged text file lists every instance of every field that we have received.

4.4. Extraction

By “extraction” of a bibliographic entry we mean the procedure used to create a unique representation of the bibliography from the available records. This is essentially an activity of data fusion and unification, which removes redundancies in the bibliographic records obtained by the ADS and properly labels fields by their characteristics. The extraction algorithm has been designed with our prior experience as to the quality of the data to select the best fields from each data source, to cross-correlate the fields as necessary, and to create a “canonical” text file which contains a unique instance of each field. Since the latter is created through software, only one version of the text file must be maintained; when the merged text file is appended, the canonical text file is automatically recreated.

The extraction routine selects the best pieces of information from each source and combines them into one reference which is more complete than the individual references. For example, author lists received from STI were often truncated after five or ten authors. Whenever we have a longer author list from another source, that author list is used instead. This not only recaptures missing authors, it also provides full author names instead of author initials whenever possible. In addition, our journal sources sometimes omit the last page number of the reference, but SIMBAD usually includes it, so we are able to preserve this information in our canonical text file.
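A minimal Python sketch of this selection step is given below. It assumes a merged record held as a dictionary of per-source field dictionaries and a simple priority ordering of sources; the ordering shown, the field names, and the function name are all hypothetical, since the actual hierarchy and field-level rules are not listed in the text. The one rule taken directly from the description is that the longest available author list wins.

```python
# Hypothetical ordering of sources, most preferred first.
SOURCE_PRIORITY = ["journal", "SIMBAD", "STI", "NED"]


def extract_canonical(merged):
    """Build a canonical record with one instance of each field."""
    canonical = {}
    for source in SOURCE_PRIORITY:
        for name, value in merged.get(source, {}).items():
            canonical.setdefault(name, value)  # keep the most preferred instance
    # Truncated author lists are replaced by the longest list available.
    author_lists = [rec["authors"] for rec in merged.values() if "authors" in rec]
    if author_lists:
        canonical["authors"] = max(author_lists, key=len)
    return canonical


# Made-up merged record with placeholder values.
merged_record = {
    "STI": {"authors": ["Author A", "Author B"], "keywords": ["example"]},
    "journal": {"authors": ["Author A", "Author B", "Author C"], "title": "Example"},
    "SIMBAD": {"last_page": 20},
}
canonical_record = extract_canonical(merged_record)
print(canonical_record)
```

Because the canonical record is derived entirely from the merged one, it can be regenerated whenever a new source record is appended, which matches the single-maintained-file approach described above.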

Some fields need to be labelled by their characteristics so that they are properly indexed and displayed. The keywords, for example, need to be attributed to a specific keyword system. The system designation allows for multiple keyword sets to be displayed (e.g. NASA/STI Keywords and AAS Keywords) and will be used in the keyword synonym table currently under development (Lee et al. 1999).

We also attempt to cross-correlate authors with their affiliations wherever possible. This is necessary for records where the preferred author field is from one source and the affiliations are from another source. We attempt to assign the proper affiliation based on the last name and do not assume that the author order is accurate since we are aware of ordering discrepancies in some of the STI records.
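The matching by last name, independent of author order, might look like the following Python sketch. It assumes names are written as “Surname, Given names” and leaves an affiliation unassigned when a surname is ambiguous; both choices, like the function names and the sample data, are assumptions made for illustration.

```python
def last_name(author):
    """Surname of an author written as 'Surname, Given names' (assumed form)."""
    return author.split(",")[0].strip().lower()


def match_affiliations(preferred_authors, other_source):
    """Attach affiliations from another source by surname, ignoring order."""
    by_surname = {}
    for name, affiliation in other_source:
        by_surname.setdefault(last_name(name), []).append(affiliation)
    matched = []
    for author in preferred_authors:
        candidates = by_surname.get(last_name(author), [])
        # Assign only when the surname identifies exactly one affiliation.
        matched.append((author, candidates[0] if len(candidates) == 1 else None))
    return matched


# Made-up names; the second source lists the authors in a different order.
authors = ["Smith, John", "Jones, Mary"]
second_source = [("Jones, M.", "Institute B"), ("Smith, J.", "Institute A")]
print(match_affiliations(authors, second_source))
```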

Through these four steps in the procedure of creating and managing bibliographic records, we are able to take advantage of receiving the same reference from multiple sources. We standardize the various records and present to the user a combination of the most reliable fields from each data source in one succinct text file.

5. Updating the Database

The software to update bibliographic records in the database consists of a series of PERL scripts, typically one per data source, which read in the data, perform any special processing particular to that data source, and write out the data to text files. The loading routines perform three fundamental tasks: 1) they add new bibliographic codes to the current master list of bibliographic codes in the system; 2) they create and organize the text files containing the reference data; and 3) they maintain the lists of bibliographic codes used to indicate what items are available for a given reference.

5.1. The Master List

The master list is a table containing bibliographic codes together with their publication dates (YYYYMM) and entry dates into the system (YYYYMMDD). There is one master list per database with one line per reference. The most important aspect of the master list is that it retains information about “alternative” bibliographic codes and matches them to their corresponding preferred bibliographic code. An alternative bibliographic code is usually a reference which we receive from another source (primarily SIMBAD or NED) which has been assigned a different bibliographic code from the one used by the ADS. Sometimes this is due to the different rules used to build bibliographic codes for non-standard publications (see section 3.1), but often it is just an incorrect year, volume, page, or author initial in one of the databases (SIMBAD or NED or the ADS). In either case, the ADS must keep the alternative bibliographic code in the system so that it can be found when referenced by the other source (e.g. when SIMBAD sends back a list of their codes related to an object). The ADS matches the alternative bibliographic code to our corresponding one and replaces any instances of the alternative code when referenced by the other data source. Alternative bibliographic codes in the master list are prepended with an identification letter (S for SIMBAD, N for NED, J for Journal) so that their origin is retained.
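The role of the master list can be illustrated with a small in-memory Python sketch. The on-disk layout of the real table is not described here, so the class below models only its behaviour: preferred codes carry a publication date and an entry date, and alternative codes carry an origin letter and point to a preferred code. The class name, method names, and the two bibliographic codes are hypothetical.

```python
ORIGINS = {"S": "SIMBAD", "N": "NED", "J": "Journal"}


class MasterList:
    """In-memory stand-in for one database's master list."""

    def __init__(self):
        self.preferred = {}   # bibcode -> (publication date YYYYMM, entry date YYYYMMDD)
        self.alternates = {}  # alternative bibcode -> (origin letter, preferred bibcode)

    def add(self, bibcode, pub_date, entry_date):
        self.preferred[bibcode] = (pub_date, entry_date)

    def add_alternate(self, origin, alt_bibcode, preferred_bibcode):
        if origin not in ORIGINS:
            raise ValueError(f"unknown origin letter: {origin!r}")
        if preferred_bibcode not in self.preferred:
            raise KeyError(preferred_bibcode)
        self.alternates[alt_bibcode] = (origin, preferred_bibcode)

    def resolve(self, bibcode):
        """Return the preferred code for any known code, or None if unknown."""
        if bibcode in self.preferred:
            return bibcode
        if bibcode in self.alternates:
            return self.alternates[bibcode][1]
        return None


# Made-up codes differing only in the author initial.
master = MasterList()
master.add("1998ChA&A..22..100Z", "199801", "19980315")
master.add_alternate("S", "1998ChA&A..22..100W", "1998ChA&A..22..100Z")
print(master.resolve("1998ChA&A..22..100W"))
```

Resolving every incoming code through a single lookup of this kind is what allows the other components to work with preferred codes only.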

While we make every effort to propagate corrections back to our data sources, sometimes there is simply a valid discrepancy. For example, alternative bibliographic codes are often different from the ADS bibliographic code due to ambiguous differences such as which name is the surname of a Chinese author. Since Americans tend to invert Chinese names one way (Zheng, Wei) and Europeans another (Wei, Zheng), this results in two different, but equally valid codes. Similarly, discrepancies in journal names such as BAAS (for the published abstracts in the Bulletin of the American Astronomical Society) and AAS (for the equivalent abstract with meeting and session number, but no volume or page number) need different codes to refer to the same paper. Russian and Chinese translation journals (Astronomicheskii Zhurnal vs. Soviet Astronomy and Acta Astronomica Sinica vs. Chinese Astronomy and Astrophysics) share the same problem. These papers appear once in the foreign journal and once in the translation journal (usually with different page numbers), but are actually the same paper which should be in the system only once. The ADS must therefore maintain multiple bibliographic codes for the same article since each journal has its own abbreviation, and queries for either one must be able to be recognized. The master list is the source of this correlation and enables the indexing procedures and search engine to recognize alternative bibliographic codes.

5.2. The Text Files

Text files in the ADS are stored in a directory tree by bibliographic code. The top level of directories is divided into directories with four-digit names by publication year (characters 1 through 4 of the bibliographic code). The next level contains directories with five-character names according to journal (characters 5 through 9), and the text files are named by full bibliographic code under these journal directories. Thus, a sample pathname is 1998/MNRAS/1998MNRAS.295...75E. Alternative bibliographic codes do not have a text file named by that code, since the translation to the equivalent preferred bibliographic code is done prior to accessing the text file.
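The mapping from bibliographic code to pathname follows directly from the character positions given above. A short Python sketch, with a hypothetical function name and a relative path standing in for wherever the tree is actually rooted:

```python
from pathlib import PurePosixPath


def text_file_path(bibcode):
    """Relative path of the text file for a preferred bibliographic code."""
    if len(bibcode) != 19:
        raise ValueError(f"expected a 19-character bibliographic code: {bibcode!r}")
    year = bibcode[0:4]     # characters 1 through 4
    journal = bibcode[4:9]  # characters 5 through 9
    return PurePosixPath(year) / journal / bibcode


print(text_file_path("1998MNRAS.295...75E"))  # 1998/MNRAS/1998MNRAS.295...75E
```

Alternative codes would first be translated to their preferred equivalents before this function is called, since only preferred codes have text files.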

A sample text file is given in the appendices. Appendix B shows the full bibliographic entry, including all records as received from STI, MNRAS, and SIMBAD. It contains XML-tagged fields from each source, showing all instances of every field. Appendix C shows the extracted canonical version of the bibliographic entry which contains only selected information from the merged text file. This latter version is displayed to the user through the user interface (see SEARCH).

5.3. The Codes Files

The third basic function of the loading procedures is to modify and maintain the listings for available items. The ADS displays the availability of resources or information related to bibliographic entries as letter codes in the results list of queries and as more descriptive hyperlinks in the page displaying the full information available for a bibliographic entry. A full listing of the available item codes and their meaning is given in SEARCH.

The loading routines maintain lists of bibliographic codes for each letter code in the system which are converted to URLs by the indexing routines (see ARCHITECTURE). Bibliographic codes are appended to the lists either during the loading process or as post-processing work depending on the availability of the resource. When electronic availability of data coincides with our receipt of the data, the bibliographic codes can be appended to the lists by the loading procedures. When we receive the data prior to electronic availability, post-processing routines must be run to update the bibliographic code lists after we are notified that we may activate the links.

6. The Articles

The ADS is able to scan and provide free access to past issues of the astronomical journals because of the willing collaboration of the journal publishers. The primary reason that the journal publishers have agreed to allow the scanning of their old volumes is that the loss of individual subscriptions does not pose a threat to their livelihood. Unlike many disciplines, most astronomy journals are able to pay for their publications through the cost of page charges to astronomers who write the articles and through library subscriptions which are unlikely to be cancelled in spite of free access to older volumes through the ADS. The journal publishers continue to charge for access to the current volumes, which is paid for by most institutional libraries. This arrangement places astronomers in a fortunate position for electronic accessibility of astronomy articles.

The original electronic publishing plans for the astronomical community called for STELAR (STudy of Electronic Literature for Astronomical Research, van Steenberg 1992, van Steenberg et al. 1992, Warnock et al. 1992, Warnock et al. 1993) to handle the scanning and dissemination of the full journal articles. However, when the STELAR project was terminated in 1993, the ADS assumed responsibility for providing scanned full journal articles to the astronomical community. The first test journal to be scanned was the ApJ Letters, which was scanned in January 1995 at 300 dots per inch (dpi). It should be noted that those scans were intended to be 600 dpi, and we will soon rescan them at the higher 600 dpi resolution. Complications in the journal publishing format (plates at the end of some volumes and in the middle of others) were noted, and detailed instructions were provided to the scanning company so that the resulting scans would be named properly by page or plate number.

All of the scans since the original test batch have been made at 600 dpi using a high speed scanner, generating a 1 bit/pixel monochrome image for each page. The files created are then automatically processed in order to de-skew and center the text in each page, resize images to a standard U.S. Letter size (8.5 x 11 inches), and add a copyright notice at the bottom of each page. For each original scanned page, two separate image files of different resolutions are generated and stored on disk. The availability of different resolutions allows users the flexibility of downloading either high or medium quality documents, depending on the speed of their internet connection. The image formats and compression used were chosen based on the available compression algorithms and browser capabilities. The high resolution files currently used are 600 dpi, 1 bit/pixel TIFF (Tagged Image File Format) files, compressed using the CCITT Group 4 facsimile encoding algorithm. The medium resolution files are 200 dpi, 1 bit/pixel TIFF files, also with CCITT Group 4 facsimile compression.
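To give a feeling for why two resolutions and Group 4 compression are worthwhile, the short calculation below works out the pixel dimensions and the uncompressed size of a U.S. Letter page at the two stored resolutions. It uses only the figures quoted above (8.5 x 11 inches, 1 bit/pixel, 600 and 200 dpi); the function names are ours, and the actual compressed file sizes depend on page content and are not computed here.

```python
def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel dimensions of a U.S. Letter page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(dpi):
    """Size of a 1 bit/pixel bitmap of the page before any compression."""
    width, height = page_pixels(dpi)
    return width * height // 8  # 8 pixels per byte


for dpi in (600, 200):
    width, height = page_pixels(dpi)
    print(dpi, "dpi:", width, "x", height, "pixels,", uncompressed_bytes(dpi), "bytes")
# 600 dpi: 5100 x 6600 pixels, 4207500 bytes
# 200 dpi: 1700 x 2200 pixels, 467500 bytes
```

The medium resolution bitmap holds one ninth as many pixels as the high resolution one, which is what makes it the practical choice on a slow connection.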

Conversion to printing formats (PDF, PCL, and PostScript) is done on demand, as requested by the user. Similarly, conversion from the TIFF files to a low resolution GIF (Graphics Interchange Format) file (75, 100, or 150 dpi, depending on user preferences) for viewing on the computer screen is done on demand, then cached so that the most frequently accessed pages do not need to be created every time. A procedure run nightly deletes the GIF files with the oldest access time stamps so that the total size of the disk cache is kept under a pre-defined limit. With the 10 GByte cache currently in use at the SAO Article Server, only files which have not been accessed for about a month are deleted. Like the full-screen GIF images, thumbnail images of the article pages are also cached; these provide users with the capability of viewing the entire article at a glance.
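The nightly cache clean-up described above amounts to evicting the least recently accessed files until the cache fits under its limit. A minimal sketch of that selection step is shown below; it works on a plain list of (path, size, access time) tuples rather than on a real directory, and the function name and data layout are assumptions made for the example.

```python
def files_to_delete(entries, limit_bytes):
    """Pick cached files to remove, oldest access time first.

    entries: list of (path, size_bytes, last_access_time) tuples.
    Returns the paths to delete so that the remaining total is <= limit_bytes.
    """
    total = sum(size for _path, size, _atime in entries)
    doomed = []
    for path, size, _atime in sorted(entries, key=lambda entry: entry[2]):
        if total <= limit_bytes:
            break  # cache is already under the limit
        doomed.append(path)
        total -= size
    return doomed


cache = [("p1.gif", 40, 100), ("p2.gif", 30, 300), ("p3.gif", 50, 200)]
print(files_to_delete(cache, limit_bytes=90))  # ['p1.gif']
```

In a real deployment the tuples could be gathered with `os.stat`, whose `st_size` and `st_atime` fields provide the size and last access time of each file.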

The ADS uses Optical Character Recognition (OCR) software to gain additional data from TIFF files of article scans. The OCR software is not yet adequate for accurate reproduction of the scanned pages. Greek symbols, equations, charts, and tables do not translate accurately enough to remain true to the original printed page. For this reason, we have chosen not to display to the user anything rendered by the OCR software in an unsupervised fashion. However, we are still able to take advantage of the OCR software for several purposes.

First, we are able to identify and extract the abstract paragraph(s) for use when we do not have the abstract from another source. In these cases, the OCR’d text is indexed so that it is searchable, and the extracted image of the abstract paragraph is displayed in lieu of an ASCII version of the abstract. Extracting the abstract from the scanned pages is somewhat tedious, as it requires establishing different sets of parameters for each journal, as well as for different fonts used over the years by the same journal. The OCR software can be taught how to determine where the abstract ends, but it does not work for every article due to oddities such as author lists which extend beyond the first page of an article, and articles which are in a different format from others in the same volume (e.g. no keywords or multiple columns). The ADS currently contains approximately 25,000 of these abstract images, and more will be added as we continue to scan the historical literature.

We are also currently using the OCR software to render electronic versions of the entire scanned articles for indexing purposes. We will not use this for display to the users, but hope to be able to index it to provide the possibility of full text searching at some future date. We estimate that the indexing of our almost one million scanned pages with our current hardware and software will take approximately two years of dedicated CPU time.
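As a quick sanity check on this estimate, the per-page cost it implies can be worked out directly. The calculation below rounds the page count to one million and assumes a 365-day year; both are simplifications for illustration only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def cpu_seconds_per_page(pages, cpu_years):
    """Average CPU time per page implied by a total processing estimate."""
    return cpu_years * SECONDS_PER_YEAR / pages


print(round(cpu_seconds_per_page(1_000_000, 2.0)))  # 63
```

In other words, the estimate corresponds to roughly one minute of dedicated CPU time per scanned page.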

The last benefit that we gain from the OCR software is the conversion of the reference list at the end of articles. We use parsed reference lists from the scanned articles to build citation and reference lists for display through the C and R links of the available items. Since reference lists are typically in one of several standard formats, we parse each reference for author, journal, volume, and page number for most journal articles, and conference name, author, and page number for many conference proceedings. This enables us to build bibliographic code lists for references contained in that article (R links) and invert these lists to build bibliographic code lists of articles which cite this paper (C links). We are able to use this process to identify and therefore add commonly-cited articles which are currently missing from the ADS. These are usually articles published prior to 1975 or astronomy-related articles published in non-astronomy journals.
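The two steps described above, turning a parsed reference into a bibliographic code and inverting reference lists into citation lists, can be sketched as follows. The sketch assumes the usual 19-character bibliographic code layout (4-character year, 5-character journal abbreviation padded on the right with dots, 4-character volume padded on the left, a 1-character qualifier, 4-character page padded on the left, and the initial of the first author's last name); the function names are ours, and special cases such as Letters page numbers are deliberately not handled.

```python
from collections import defaultdict


def build_bibcode(year, journal, volume, page, first_author, qualifier="."):
    """Assemble a 19-character bibliographic code from parsed reference fields."""
    return (
        f"{year:04d}"                 # publication year
        f"{journal:.<5}"              # journal abbreviation, dot-padded on the right
        f"{volume:.>4}"               # volume, dot-padded on the left
        f"{qualifier}"                # qualifier column ('.' when unused)
        f"{page:.>4}"                 # page, dot-padded on the left
        f"{first_author[0].upper()}"  # initial of first author's last name
    )


def invert_references(references):
    """Turn {citing: [cited, ...]} (R links) into {cited: [citing, ...]} (C links)."""
    citations = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            if citing not in citations[cited]:
                citations[cited].append(citing)
    return dict(citations)


abt = build_bibcode(1995, "ApJ", 455, 407, "Abt")
print(abt)  # 1995ApJ...455..407A
r_links = {build_bibcode(2000, "A&AS", 143, 111, "Grant"): [abt]}
print(invert_references(r_links))
```

Because the citation lists are derived purely by inversion, any reference that fails to parse is missing from both the R links of the citing paper and the C links of the cited one.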

The Article Service currently contains 250 GBytes of scans, which consists of 1,128,955 article pages comprising 138,7 articles. These numbers increase on a regular basis, both as we add more articles from the older literature and as we scan new journals.

7. ADS/Journal Interaction

A description of the data in the ADS would be incomplete without a discussion of the interaction between the ADS and the electronic journals. The data available on-line from the journal publishers are an extension of the data in the ADS and vice versa. This interaction is greatly facilitated by the acceptance of the bibliographic code by many journal publishers as a means for accessing their on-line articles.

Access through the ADS to articles currently on-line at the journal sites comprises a significant percentage of the on-line journal access (see OVERVIEW). The best model for interaction between the ADS and a journal publisher is the University of Chicago Press (hereafter UCP), publisher of ApJ, ApJL, ApJS, AJ, and PASP. When a new volume appears on-line at UCP, the ADS is notified by email and an SGML header file for each of those articles is simultaneously transferred to our site. The data are parsed and loaded into the system and appropriate links are created. However, prior to this, the UCP has made use of the ADS to build their electronic version through the use of our bibliographic code reference resolver.

Our bibliographic code reference resolver (Accomazzi et al. 1999) was developed to provide the capability to automatically parse, identify, and verify citations appearing in astronomical literature. By verifying the existence of a reference through the ADS, journal and conference proceedings editors are able to publish documents containing hyperlinks pointing to stable, unique URLs. Increasingly, journals are linking to the ADS in their reference sections, providing users with the ability to read referenced articles with the click of a mouse button.

During the copy editing phase, UCP editors query the ADS reference resolver and determine if each reference exactly matches a bibliographic code in the ADS. If there is a match, a link to the ADS is established for this entry in their reference section. If there is not a match, one of several scenarios takes place. First, if it is a valid reference not yet included in the ADS (most often the case for “fringe” articles, those peripherally associated with astronomy), our reference resolver captures the information necessary to add it to our database during the next update. Second, if it is a valid reference unable to be parsed by the resolver (sometimes the case for conference proceedings or PhD theses), no action is taken and no link is listed in the reference section. Third, if there is an error in the reference as determined by the reference resolver, the UCP editors may ask for a correction or clarification from the authors.
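The decision flow above can be summarized in a few lines of code. The sketch below is not the resolver itself: the `parse` callable, the set of known bibliographic codes, and the set of journals whose ADS coverage is considered complete are all hypothetical stand-ins, and the rule used to flag a probable error (a parsed code that is absent from a journal with complete coverage) is our reading of how completeness information could be used.

```python
from enum import Enum


class Outcome(Enum):
    LINK = "link to the ADS entry"
    QUEUE_FOR_ADS = "valid but missing: capture for the next database update"
    NO_LINK = "could not be parsed: no action, no link"
    QUERY_AUTHOR = "probable error: ask the authors for a correction"


def classify_reference(text, parse, ads_bibcodes, complete_journals):
    """Classify one reference string according to the four scenarios in the text.

    parse: hypothetical callable returning a 19-character bibcode or None.
    ads_bibcodes: set of bibcodes known to the ADS.
    complete_journals: journal abbreviations with complete ADS coverage.
    """
    bibcode = parse(text)
    if bibcode is None:
        return Outcome.NO_LINK
    if bibcode in ads_bibcodes:
        return Outcome.LINK
    journal = bibcode[4:9].rstrip(".")  # journal field of the bibcode
    if journal in complete_journals:
        # The journal is fully covered, so a miss points to a faulty reference.
        return Outcome.QUERY_AUTHOR
    return Outcome.QUEUE_FOR_ADS


# Toy parser for illustration: a lookup table instead of real reference parsing.
toy_parse = {
    "Abt 1995, ApJ, 455, 407": "1995ApJ...455..407A",
    "Abt 1995, ApJ, 455, 470": "1995ApJ...455..470A",
}.get
known = {"1995ApJ...455..407A"}
print(classify_reference("Abt 1995, ApJ, 455, 470", toy_parse, known, {"ApJ"}))
```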

The last option demonstrates the power of the reference resolver, which has been taught on a journal-by-journal basis how complete the coverage of that journal is in the ADS. Before the implementation of the reference resolver, UCP was able to match 72% of references in ApJ articles (E. Owens, private communication). Early results from the use of the reference resolver show that we are now able to match conference proceedings, so this number should become somewhat larger. It is unlikely that we will ever match more than 90% of references in an article due to references such as “private communication”, “in press”, and preprints, as well as author errors (see section 8). Our own reference resolving of OCR’d reference lists shows that we can match approximately 86

The ADS provides multiple ways for authors and journal publishers to link to the ADS (see SEARCH). We make every effort to facilitate individuals and organizations linking to us. This is easily done for simple searches such as the verification of a bibliographic code or an author search for a single spelling. However, given the complexity of the system, these automated searches can quickly become complicated. Details for conference proceedings editors or journal publishers who are interested in establishing or improving links to the ADS are available upon request. In particular, those who have individual TeX macros incorporated in their references can use our bibliographic code resolver to facilitate linking to the ADS.

8. Discussion and Summary

As of this writing (12/1999), there are 524,304 references in the Astronomy database, 523,498 references in the Instrumentation database, 443,858 references in the Physics database, and 3467 references in the Preprint database, for a total of almost 1.5 million references in the system. Astronomers currently write approximately 18,000 journal articles annually, and possibly that many additional conference proceedings papers per year. More than half of the journal papers appear in peer-reviewed journals. These numbers are more than double what they were in 1975, in spite of an increase in the number of words per page in most of the major journals (Abt 1995), and an increase in number of pages per article (Schulman et al. 1997). At the current rate of publication, astronomers could be writing 25,000 journal papers per year by 2001 and an additional 20,000 conference proceedings papers.

Figure 1 shows the total number of papers for each year in the Astronomy database since 1975, divided into refereed journal papers, non-refereed journal papers, and conferences (including reports and theses). There are three features worth noting. First, the increase in total references in 1980 is due to the inclusion of Helen Knudsen’s Monthly Astronomy and Astrophysics Index, a rich source of data for both journals and conference proceedings which began coverage in late 1979 and continued until 1995. Second, the recent increase in conferences included in the Astronomy database (starting around 1996) is due to the inclusion of conference proceedings tables of contents provided by collaborating librarians and typed in by our contractors. Last, the decrease in numbers for 1999 is due to coverage for that year not yet being complete in the ADS.

The growth rate of the Instrumentation and Physics databases is difficult to estimate, primarily because we do not have datasets which are as complete as astronomy. In any case, the organization and maintenance of this large volume of data is clearly important to every research astronomer. Fortunately, the ADS was designed to be able to handle this large quantity of data and to be able to grow with new kinds of data. New available item links have been added for new types of data as they became available (e.g. the links to complete book entries at the Library of Congress), and future datasets (e.g. from future space missions) should be able to be added in the same fashion.

As with any dataset of this magnitude, there is some fraction of references in the system which are incorrect. This is unavoidable given the large number of data sources, errors in indices and tables of contents as originally published, and human error. In addition, many authors do not give full attention to verifying all references in a paper, resulting in the introduction of errors in many places. In a systematic study of more than 1000 references contained in a single issue of the Astrophysical Journal, Abt (1992) found that more than 12% of those contained errors. This number should be significantly reduced with the integration of the ADS reference resolver in the electronic publishing process. However, any mistakes in the ADS can and will get propagated, so steps are being taken by us to maximize accuracy of our entries.

Fig. 1. Histogram showing the number of refereed journal papers, non-refereed journal papers, and conferences (including reports and theses) for each year in the Astronomy database since 1975. (Horizontal axis: Year; vertical axis: # of references.)

Locating and identifying correlations between multiple bibliographic codes which describe the same article is a time-consuming and sometimes subjective task, as many pairs of bibliographic codes need to be verified by manually looking up papers in the library. We use the Abstract Service itself for gross matching of bibliographic codes, submitting a search with author and title, and considering any resulting matches with a score of 1.0 as a potential match. These matches are only potential matches which require verification, since authors can submit the same paper to more than one publication source (e.g. BAAS and a refereed journal), and since errata published with the same title and author list will perfectly match the original paper.

When a volume or year is mismatched, it is usually obvious which of a pair of matched bibliographic codes is correct, but if a page number is off, the decision as to which code is correct cannot always be automated. We also need to consider matches with very high scores less than 1.0, since these are the matches where an author name may be incorrect. The correction of errors of this sort is ongoing work which is carried out as often as time and resources permit.
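The triage of candidate duplicates described in the last two paragraphs might look roughly like the sketch below. It takes pairs of bibliographic codes together with the score returned by an author and title search, separates perfect matches from near matches, and reports which fields of the two codes disagree so that a person can decide what to verify. The field positions assume the usual 19-character bibliographic code layout, and the 0.95 cut-off for "very high" scores is an arbitrary value chosen for illustration, not one used by the ADS.

```python
# Character ranges of the fields within a 19-character bibliographic code
# (assumed layout: year, journal, volume, qualifier, page, author initial).
FIELDS = {
    "year": slice(0, 4),
    "journal": slice(4, 9),
    "volume": slice(9, 13),
    "page": slice(14, 18),
}


def differing_fields(bibcode_a, bibcode_b):
    """Names of the bibcode fields in which the two codes disagree."""
    return [name for name, part in FIELDS.items() if bibcode_a[part] != bibcode_b[part]]


def triage(matches, near_threshold=0.95):
    """Split scored pairs into perfect matches and near matches.

    matches: list of (bibcode_a, bibcode_b, score) tuples.
    Both result lists still require manual verification.
    """
    perfect, near = [], []
    for code_a, code_b, score in matches:
        entry = (code_a, code_b, differing_fields(code_a, code_b))
        if score >= 1.0:
            perfect.append(entry)   # same authors and title
        elif score >= near_threshold:
            near.append(entry)      # possibly a misspelled author name
    return perfect, near


pairs = [("1995ApJ...455..407A", "1995ApJ...455..470A", 1.0)]
print(triage(pairs))  # the two codes differ only in the page field
```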


The evolution of the Internet and the World Wide Web, along with the explosion of astronomical services on the Web, has enabled the ADS to provide access to our databases in an open and uniform environment. We have been able to hyperlink both to our own resources and to other on-line resources such as the journal bibliographies (Boyce & Biemesderfer 1996). As part of the international collaboration Urania (Universal Research Archive of Networked Information in Astronomy, Boyce 1998), the ADS enables a fully functioning distributed digital library of astronomical information which provides power and utility previously unavailable to the researcher.

Perhaps the largest factor which has contributed to the success of the ADS is the willing cooperation of the AAS, CDS, and all the journal publishers. The ADS has largely become the means for linking together smaller pieces of a bigger picture, making an elaborate digital library for astronomers a reality. We currently collaborate with over fifty groups in creating and maintaining cross-links among data centers. These additional collaborations with individuals and institutions worldwide allow us to provide many value-added features to the system such as object information, author email addresses, mail order forms for articles, citations, article scans, and more. A listing of these collaborations is provided in Table 6. Any omissions from this table are purely unintentional, as the ADS values all of our colleagues, and the users benefit not only from the major collaborators but from the minor ones as well, as these are often more difficult for users to learn about independently. Most of the abbreviations are listed in Tables 2, 3, and 4.

The successful coordination of data exchanges with each of our collaborators and the efforts which went into establishing them in the first place have been key to the success of the ADS. Establishing links to and from the journal publishers, changing these links due to revisions at publisher websites, and tracking and fixing broken links are all considered routine data maintenance for the system. Since it is necessary for us to maintain connectivity to external sites, routine checks of sample links are performed on a regular basis to verify that the links are still active.

Usage statistics for the Abstract Service (see OVERVIEW) indicate that astronomers and librarians at scientific institutions are eager to take advantage of the information that the ADS provides. The widespread acceptance of the ADS by the astronomical community is changing how astronomers do research, placing extensive bibliographic information at their fingertips. This enables researchers to increase their productivity and to improve the quality of their work.
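The routine checks of sample links mentioned above could be organized along the following lines. This is a sketch under stated assumptions rather than a description of the ADS procedure: the function name, the sampling strategy, and the `is_alive` callable (which would wrap whatever HTTP request is appropriate) are all introduced here for illustration.

```python
import random


def check_sample(links, sample_size, is_alive, seed=None):
    """Test a random sample of external links and return the broken ones.

    links: list of URLs pointing to external sites.
    is_alive: callable taking a URL and returning True if it still resolves.
    seed: optional value that makes the sample reproducible.
    """
    rng = random.Random(seed)
    sample = rng.sample(links, min(sample_size, len(links)))
    return [url for url in sample if not is_alive(url)]


# Illustration with a stand-in checker; example.org URLs are placeholders.
links = [f"http://publisher.example.org/article/{n}" for n in range(100)]
moved = {links[3], links[57]}
broken = check_sample(links, sample_size=100, is_alive=lambda url: url not in moved)
print(sorted(broken) == sorted(moved))  # True when every link is sampled
```

Passing the checker in as a parameter keeps the sampling logic independent of the network, so it can be exercised without contacting any publisher site.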

A number of improvements to the data in the ADS are planned for the near future. As always, we will continue our efforts to increase the completeness of coverage, particularly for the data prior to 1975. We have collected most of the major journals back to the first issue for scanning and adding to the Astronomy database. In addition, we are scanning and OCR’ing tables of contents for conference proceedings to improve our coverage in that area. We are currently OCR’ing full journal articles to provide full text searching and to improve the completeness of our reference and citation coverage. Finally, as the ADS becomes commonplace for all astronomers, valuable feedback from our users about missing papers, errors in the database, and suggested improvements to the system serves to guide the future of the ADS and to ensure that the ADS continues to evolve into a more valuable research tool for the scientific community.

9. Acknowledgments

The other ADS Team members, Markus Demleitner, Elizabeth Bohlen, and Donna Thompson, contribute much on a daily basis. Funding for this project has been provided by NASA under NASA Grant NCC5-1.

References

Abt, H.A. 1994, PASP, 106, 1015
Abt, H.A. 1995, ApJ, 455, 407
Accomazzi, A., Eichhorn, G., Grant, C.S., Kurtz, M.J., & Murray, S.S. 2000, (this issue)
Accomazzi, A., Eichhorn, G., Kurtz, M.J., Grant, C.S., & Murray, S.S. 1998, ASP Conf. Ser. 172: Astronomical Data Analysis Software and Systems VIII, 291
American Astronomical Society Manuscript Preparation, 1999, http://www.journals.uchicago.edu/AAS/AASTeX
Boyce, Peter B. 1998, ASP Conf. Ser. 153: Library and Information Services in Astronomy III, 107
Boyce, P.B. & Biemesderfer, C. 1996, ASP Conf. Ser. 101: Astronomical Data Analysis Software and Systems V, 7
Egret, D. & Wenger, M. 1988, ESO Conf. #28: Astronomy from Large Databases, eds F. Murtagh and A. Heck, 323
Eichhorn, G., Kurtz, M.J., Accomazzi, A., Grant, C.S., & Murray, S.S. 2000, (this issue)
Goldfarb, Charles F. & Rubinsky, Yuri, 1991, The SGML Handbook (Clarendon Press)
Harold, Elliotte Rusty, 1998, XML: Extensible Markup Language (IDG Books Worldwide)
Helou, G. & Madore, B. 1988, ESO Conf. #28: Astronomy from Large Databases, eds F. Murtagh and A. Heck, 335
International Organization for Standardization 1987, Information and Processing: 8-bit Single-byte Coded Graphic Character Sets (Geneva)
Jacobsen, Dana, 1996, http://www.ecst.csuchico.edu/~jacobsd/bib/formats/refer.html
Knuth, Donald E. 1984, The TeXbook (Addison-Wesley Publishing Co.)
Kurtz, M.J., Eichhorn, G., Accomazzi, A., Grant, C.S., Murray, S.S., & Watson, J.M. 2000, (this issue)
Lamport, Leslie 1986, LaTeX: A Document Preparation System (Addison-Wesley Publishing Co.)
Lee, J., Dubin, D.S., Kurtz, M.J. 1999, ASP Conf. Ser. 172: Astronomical Data Analysis Software and Systems VIII, 287


Table 6. Collaborators

Collaborator                                              Nature of Collaboration
A.G. Davis Philip                                         Scanning of Conference Proceedings
Academic Press (AP)                                       Scanning of Icarus
American Astronomical Society (AAS)                       Citations, Scanning of AJ, ApJ, ApJL, ApJS, AASPB, BAAS
American Institute of Physics                             Scanning of SvAL
Andre Heck                                                StarHeads (Author Home Pages)
Annual Reviews, Inc.                                      Scanning of ARA&A
Astronomical Data Center (ADC)                            D links to data
Astronomical Institute of Czechoslovakia                  Scanning of BAICz
Astronomical Institute of the Slovak Academy of Sciences  Scanning of CoSka
Astronomical Society of Australia                         Scanning of PASA
Astronomical Society of India                             Scanning of BASI
Astronomical Society of Japan                             Scanning of PASJ
Astronomical Society of the Pacific (ASP)                 Scanning of PASP and Conference Proceedings
Astronomische Gesellschaft                                Scanning of RvMA
Astronomische Nachrichten                                 Scanning of AN
Baltic Astronomy                                          Scanning of BaltA
British Astronomical Association                          Scanning of JBAA
Cambridge University Press                                M links to order forms, Scanning
Central Bureau for Astronomical Telegrams (CBAT)          Object searches
Chris Benn                                                Astropersons.lis (Author Email)
EDP Sciences                                              Scanning of A&AS
Elsevier Publishers                                       E links to articles
General Catalogue of Photometric Data (GCPD)              D links to data
Institute for Scientific Information (ISI)                Citations
International Society for Optical Engineering (SPIE)      M links to order forms
Korean Astronomical Society                               Scanning of JKAS
Kluwer Publishers                                         M links to order forms, Scanning of SoPh
Library of Congress (LOC)                                 Z39.50 interface, L links to data
Los Alamos National Laboratory (LANL)                     Preprint Archive
Lunar and Planetary Science Institute (LPI)               Scanning, Object searches
Meteoritical Society                                      Scanning of M&PS
NED                                                       N links to objects, Object searches
The Observatory                                           Scanning
Royal Astronomical Society                                Scanning of MNRAS
SIMBAD                                                    S links to objects, D links to data, Object searches
Springer Verlag                                           Scanning of A&A, ZA
Universidad Nacional Autonoma de Mexico (UNAM)            Scanning of RMxAA, RMxAC
University of Chicago Press (UCP)                         Reference Resolving


Warnock, A., Gass, J., Brotzman, L.E., van Steenberg, M.E., Kovalsky, D., & Giovane, F. 1992, Newsletter of the American Astronomical Society, 62, 10
Warnock, A., van Steenberg, M.E., Brotzman, L.E., Gass, J.E., Kovalsky, D., & Giovane, F. 1993, ASP Conf. Ser. 52: Astronomical Data Analysis Software and Systems II, 137


Appendix A:

Version 1.0 of the XML DTD describing text files in the ADS Abstract Service.

Document Type Definition for the ADS bibliographic records

Syntax policy
=============

- The element names are in uppercase in order to help the reading.

- The attribute names are preferably in lowercase.

- The attribute values are allowed to be of type CDATA to allow more flexibility for additional values; however, attributes typically may only assume one of a well-defined set of values.

- Cross-referencing among elements such as AU, AF, and EM is accomplished through the use of attributes of type IDREFS (for AU) and ID (for AF and EM).

TITLE?, AUTHORS?, AFFILIATIONS?, EMAILS?, FOOTNOTES?, BIBCODE, MSTRING, MONOGRAPH?, SERIES?, PAGE?, LPAGE?, COPYRIGHT?, PUBDATE, CATEGORIES*, COMMENTS*, ANOTE?, BIBTYPE?, IDENTIFIERS?, ORIGINS, OBJECTS*, KEYWORDS*, ABSTRACT*)>

origin CDATA #REQUIRED
lang CDATA #IMPLIED>

CREATOR, CDATE, EDATE)>

AF IDREFS #IMPLIED
EM IDREFS #IMPLIED
FN IDREFS #IMPLIED>

-->

FNAME?,

addresses with the individual author records.

This is the only exception of attributes in

Typically this is created by parsing the text in the MSTRING element. Example: The Astrophysical Journal 526 2

SERVOL?, SEREDITORS?, SERBIBCODE?)>

-->

VOLUME?, MNOTE?, EDITORS?, EDITION?,

Formatting is as for AUTHORS and AU elements

COPYRIGHT (#PCDATA)>

FNAME?, LNAME, SUFF?)>

assigned. STI/RECON has always assigned a category for each entry in their system, but otherwise there is little else in our database.

The attributes origin and system

offerings. The "system" attribute characterizes the system used to express the identifier -->

system NMTOKEN #IMPLIED>

Conference Series). Note that there may be several SERIES elements, since some

a series. -->

sources (e.g. STI, which keyed abstracts in most cases). Therefore we allow several ABSTRACT elements within each record, each with a separate origin or language.

The attribute type is used to keep track of how the abstract data was generated. For instance, abstract text generated by our OCR software will have:

origin="ADS" type="OCR" lang="en" -->

type NMTOKEN #IMPLIED
lang CDATA #IMPLIED

>

>>

preformatted text. -->

Examples: 38426A90-12345

In our content model, they are allowed to contain additional SUP and SUB elements,

type NMTOKEN #REQUIRED>

PCDATA at some point -->

Appendix B:

origin NMTOKEN #IMPLIED
system NMTOKEN #REQUIRED>
