
E2372 - 2020 NDL Digital Library Cafe <Report>

Current Awareness-E

No.411 2021.04.22

E2372

2020 NDL Digital Library Cafe

Haruka Suzuki, Michiko Takahashi (Electronic Information Distribution Division, Electronic Information Department)

On December 10, 2020 and January 15, 2021, the National Diet Library (NDL) held the 2020 NDL Digital Library Cafe. The event is a series of talks for the general public: each session takes up a theme from research or recent trends related to digital libraries and invites experts in the field as speakers for an informal, enjoyable discussion (see E2081). This was the first time the event was held online, and about 20 people took part in each session, including participants from distant locations.

● The 1st "Utilization and Challenges of Web Archives: From WARP and Cases in Japan and Overseas"

The theme of the first session was "Utilization and Challenges of Web Archives: From WARP and Cases in Japan and Overseas". NDL staff introduced the Web Archiving Project (WARP) and examples of web archive services in Japan and overseas. After talks by Mr. Kunihiko Ueshima of Japan Data Exchange Co., Ltd. and Mr. Masayuki Asahara of the Center for Corpus Development, National Institute for Japanese Language and Linguistics, a discussion was held with the participants.

The Electronic Library Division of the NDL Kansai-kan reported on the current status and development of WARP. It introduced the recent expansion of collection targets, including the collection of private websites with permission, and the institutional archive, which is an example of the use of WARP's persistent identifiers (PIDs). It also noted that the challenges of web archiving are said to be shifting from technological development to utilization in academic research (see CA1893), and introduced three overseas cases: Perma.cc, a service in the legal field that preserves cited documents, as an example of dealing with broken links and content changes in cited sources; the UK Web Archive of the British Library (BL), as an example of providing easy-to-use secondary data sets; and the Archives Unleashed project, led by the Internet Archive and researchers at Canadian universities, as a development project for data set creation tools.

On the market value of web archives, Mr. Ueshima stated that, in addition to the public data that is the final product, the data created at each stage of the process (selection of targets, collection, organization, and storage) each has its own utility value. He also described WARP as rich in data content and said there is room for development in expanding the types of data sets provided and in offering custom-made aggregates, which are not currently provided.

Drawing on his experience developing the "Kokugoken Japanese Web Corpus", Mr. Asahara stated that the usefulness of WARP for academic research lies in the quality it maintains by controlling what is collected and in its provision of large-scale text data. He also commented that the "Parliamentary Minutes Search System" (see E2240) is often used in linguistic research, and that what continues to be archived will become important data for future linguistic researchers.

In the discussion, participants suggested two effective ways to expand the use of the data. One is to provide general-purpose open data sets that can be used in a variety of ways regardless of trends and current affairs. The other, so that the data can be used in integrated analysis of huge and diverse data, is to create and publish a correspondence table between the metadata items output from WARP and standard vocabularies such as the Data Catalog Vocabulary (DCAT).
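As a rough illustration of what such a correspondence table could look like in practice, the following Python sketch maps record fields to DCAT and Dublin Core Terms properties. The property names (`dct:title`, `dcat:keyword`, and so on) come from the DCAT vocabulary, but the WARP item names on the left-hand side are hypothetical placeholders, since the report does not list the actual output items.

```python
import json

# Correspondence table: hypothetical WARP output items (assumed names,
# not taken from WARP itself) -> DCAT / Dublin Core Terms properties.
CORRESPONDENCE = {
    "title": "dct:title",
    "publisher": "dct:publisher",
    "original_url": "dcat:landingPage",
    "collected_date": "dct:issued",
    "subject": "dcat:keyword",
}

# Namespace prefixes used by the property names above.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def to_dcat(record):
    """Rename the keys of one record according to the correspondence table.

    Values are copied as plain literals; a real mapping would also need
    to type URLs and dates properly.
    """
    dataset = {"@context": CONTEXT, "@type": "dcat:Dataset"}
    for item, value in record.items():
        prop = CORRESPONDENCE.get(item)
        if prop is not None:  # items without a mapping are skipped
            dataset[prop] = value
    return dataset


if __name__ == "__main__":
    sample = {
        "title": "Example municipal website",
        "original_url": "https://example.org/",
        "collected_date": "2020-12-10",
        "internal_id": "not-mapped",
    }
    print(json.dumps(to_dcat(sample), ensure_ascii=False, indent=2))
```

Publishing the table itself, rather than only the converted records, would let users of other data sets apply the same mapping in their own tools.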

● The 2nd "New Year's Project: Humanities in 2021"

The theme of the second session was "New Year's Project: Humanities in 2021". After Mr. Yuta Hashimoto of the National Museum of Japanese History, Mr. Naoki Kofu of Chiba University, Mr. Akihiro Kameda of Chiba University, and Ms. Natsuko Yoshiga of Saga University introduced their respective projects, a discussion was held with the participants.

Mr. Hashimoto took up "Minna de Honkoku" ("Reprinting Together"; see E2353). He showed that supporting IIIF (see CA1989) in 2019 improved interoperability with digital archives and had various positive effects, such as regional archives providing materials to the platform and a greater possibility of access to Japan-related materials held overseas.

Mr. Kofu introduced a project to encode the "Engi-shiki" as text using the Text Encoding Initiative (TEI). It is an attempt to turn the "Engi-shiki", a historical source on the administration of ancient Japan, into machine-analyzable data by transcribing it and marking it up with TEI. He pointed out that, while the resulting data makes it possible to verify previous research and to conduct new research, securing specialized knowledge and manpower and managing the project as a whole will be challenges for anyone carrying out a similar project.
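To give a sense of what "machine-analyzable" means here, the following Python sketch parses a small TEI fragment and pulls out structured values. The elements used (`placeName`, `measure` with `quantity` and `unit` attributes) are part of TEI, but the sample text is an invented placeholder, not a passage from the "Engi-shiki", and the encoding choices are assumptions rather than the project's actual schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Invented placeholder text; the markup choices are illustrative only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Placeholder title</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Placeholder source</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="article" n="1">
        <p><placeName>Province A</placeName> delivers
           <measure quantity="10" unit="bolt">ten bolts</measure> of cloth.</p>
      </div>
      <div type="article" n="2">
        <p><placeName>Province B</placeName> delivers
           <measure quantity="5" unit="bolt">five bolts</measure> of cloth.</p>
      </div>
    </body>
  </text>
</TEI>"""


def extract_measures(xml_text):
    """Return (place, quantity, unit) for each paragraph in the text body."""
    root = ET.fromstring(xml_text)
    body = root.find(".//tei:body", NS)
    rows = []
    # Iterate only over paragraphs in the body, not those in the header.
    for p in body.iter(f"{{{TEI_NS}}}p"):
        place = p.find("tei:placeName", NS)
        measure = p.find("tei:measure", NS)
        if place is None or measure is None:
            continue  # skip paragraphs without both annotations
        rows.append((place.text, int(measure.get("quantity")), measure.get("unit")))
    return rows


if __name__ == "__main__":
    for row in extract_measures(SAMPLE):
        print(row)
```

Once quantities and places are marked up in this way, they can be tabulated or compared across the whole text, which is the kind of analysis that is hard to do on plain transcriptions.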

Mr. Kameda introduced the potential of Wikidata, which is Linked Data (see CA1746), taking as an example the linkage between Wikidata and khirin, a research database of the National Museum of Japanese History. He concluded that Wikidata is useful as a hub for connecting various data, although reservations about its reliability remain and links between databases need to be designed carefully to ensure persistence.
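The "hub" idea can be sketched in a few lines of Python: if two independent databases each record a Wikidata identifier, their records can be joined without either database knowing about the other. Everything below is hypothetical (the record layouts, local IDs, and QIDs are placeholders and do not describe khirin); only the Wikidata entity URI prefix is real.

```python
# Base of Wikidata concept URIs; a QID is appended to form an entity URI.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"

# Hypothetical records from two unrelated databases (placeholder QIDs).
museum_db = [
    {"local_id": "m-001", "label": "Person A", "wikidata": "Q1000001"},
    {"local_id": "m-002", "label": "Person B", "wikidata": None},
]
library_db = [
    {"local_id": "l-900", "label": "A, Person", "wikidata": "Q1000001"},
    {"local_id": "l-901", "label": "Person C", "wikidata": "Q1000003"},
]


def link_via_wikidata(left, right):
    """Pair up records from two databases that share a Wikidata QID."""
    index = {rec["wikidata"]: rec for rec in right if rec.get("wikidata")}
    links = []
    for rec in left:
        qid = rec.get("wikidata")
        if qid in index:  # records without a QID never match
            links.append((rec["local_id"], WIKIDATA_ENTITY + qid, index[qid]["local_id"]))
    return links


if __name__ == "__main__":
    for link in link_via_wikidata(museum_db, library_db):
        print(link)
```

The join is only as good as the identifiers, which is where the caveats about reliability and persistence come in: a wrong or merged QID silently produces a wrong link.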

The "Ogi Domain Diary Database", which Ms. Yoshiga helped to build, turns into data the "Diary Catalog", a summary of the domain's administrative diaries from the Edo period, and can be searched by catalog text or by keyword. Extracting the vocabulary specific to the region and the period that appears frequently in the text for use as search keywords was a major challenge, but she explained that it was solved through the enthusiastic participation of local citizens who can read the local materials.

In the discussion, the speakers shared the problem that Japanese text data is difficult to obtain, whether for modern writing or for historical documents, as well as their experience that linking with Wikidata and with HuTime, software for analyzing temporal information, proved useful. Many opinions on project management were also raised: that a project is easier to set up when the scope of the target materials is narrowed down, that securing manpower and expertise is important, and that it is also important to develop people who can take part in projects and people who can oversee a project as a whole.

Throughout the discussions, many examples of collaboration among data providers, experts, and the citizens who use the data came up, and the event was an opportunity to recognize the importance of building mechanisms that connect "people" in their respective positions.

Ref:
"2020 NDL Digital Library Cafe". NDL Lab. https://lab.ndl.go.jp/event/digicafe2020/
"National Diet Library Data URI". NDL. https://www.ndl.go.jp/jp/dlib/standards/lod/uri.html
Perma.cc. https://perma.cc/
"More than 9 million broken links on Wikipedia are now rescued". Internet Archive Blogs. 2018-10-01. http://blog.archive.org/2018/10/01/more-than-9-million-broken-links-on-wikipedia-are-now-rescued/
UK Web Archive. https://www.webarchive.org.uk/ukwa/
The Archives Unleashed Project. https://archivesunleashed.org/
"Use of custom-made aggregates". Statistics Bureau, Ministry of Internal Affairs and Communications. https://www.stat.go.jp/info/tokumei/order.html
Kokugoken Japanese Web Corpus. https://bonten.ninjal.ac.jp/
"Data Catalog Vocabulary (DCAT) - Version 2". W3C. https://www.w3.org/TR/vocab-dcat/
Naoki Kofu, Makoto Goto. Application of TEI to "Engi-shiki" and sharing and distribution of text data of Japanese history materials. Bulletin of the National Museum of Japanese History. 2019, (218), p. 315-327. https://www.rekihaku.ac.jp/outline/publication/ronbun/ronbun9/pdf/218005.pdf
Text Encoding Initiative. https://tei-c.org/
Wikidata. https://www.wikidata.org/wiki/Wikidata:Main_Page
khirin. https://khirin-ld.rekihaku.ac.jp/
Ogi Domain Diary Database. https://crch.dl.saga-u.ac.jp/nikki/
HuTime. http://www.hutime.jp/
Toru Aoike. 2018 NDL Digital Library Cafe. Current Awareness-E. 2018, (358), E2081. https://current.ndl.go.jp/e2081
Research and Legislative Examination Bureau Parliamentary Office Materials Division. Renewal of four search services such as NDL and Diet Library. Current Awareness-E. 2020, (387), E2240. https://current.ndl.go.jp/e2240
Yuta Hashimoto, Yasuyuki Kano. Reprinted by Everyone: Citizen-participation reprinting platform for historical materials. Current Awareness-E. 2021, (408), E2353. https://current.ndl.go.jp/e2353
Naotoshi Maeda. Movement toward Utilization of Web Archives -World Trends and WARP Initiatives-. Current Awareness. 2017, (331), CA1893, p. 9-13. https://doi.org/10.11501/10317594
Kennobu Nagasaki. Overview of IIIF and release of major API version 3.0. Current Awareness. 2020, (346), CA1989, p. 13-16. https://doi.org/10.11501/11596735
Hideaki Takeda. Trends in Linked Data. Current Awareness. 2011, (308), CA1746, p. 8-11. https://doi.org/10.11501/3192158
