BIM-224 Research Infrastructures 23

From Wikiversity

Materials and tasks for the module "BIM-224, SoSe 2023, Blümel/Rossenova" for students at Hochschule Hannover. The materials were prepared with several colleagues from the Open Science Lab at TIB Hannover.

Session 1: Data harvesting interfaces / data collection

Slides are available here: https://docs.google.com/presentation/d/1IxRTQhTY8nwFaijHq78m0NvW6Qw_YAj3YQtyO9Nn6dg/edit?usp=sharing

Student homework task pages

Group task 1

Platform list
  • Radar4Culture
  • GNM catalog
  • Forschungsbibliothek Gotha der Universität Erfurt
  • Datenportal des MfN Berlin
  • Herbarium Berolinense
  • Sketchfab
  • Porta Fontium
  • Coding da Vinci
Type of API list

Group task 2

Session 2: Data cleaning, reconciliation and enrichment

Slides are available here: https://docs.google.com/presentation/d/1HpXUXYcs-LDOQYuQzFv1SYYutKUYP8qG0mR5BG3fLyw/edit?usp=sharing

OpenRefine official documentation:

https://openrefine.org/docs/manual/facets

https://openrefine.org/docs/manual/transforming

OpenRefine video tutorial:

https://youtu.be/jyUlT8ohlG4

Homework presentations:

Session 3: Data in Wikidata

Slides are available here: https://docs.google.com/presentation/d/1bCilgycOApKcFjzelntD6zRf5WBU9804t_9Fb-Lc1E8/edit?usp=sharing

Homework presentations:

Session 4: Data upload and querying (26.05)

Slides are available here: https://docs.google.com/presentation/d/1ebFJXSKikUSyjjPIsXFwTqVV2-igku6ra5Vm83h5SWQ/edit?usp=sharing

Additional tutorials:

Complete upload pipeline tutorial: https://en.wikiversity.org/wiki/OpenRefine_to_Wikibase%3A_Data_Upload_Pipeline

Upload tutorial for media files in Wikimedia Commons: https://en.wikiversity.org/wiki/Uploading_media_files_to_a_Wikibase_with_OpenRefine

Homework presentations:

Workshop: Fermenting Data (02.06)

Slides are available here: https://docs.google.com/presentation/d/1BHlO17nTTXccoPMgqXZBx46zuDvhnM52h5Wj9X8p36M/edit?usp=sharing

Wikibase instance:

https://fermentingdata.wikibase.cloud/w/index.php?title=Special:CreateAccount&returnto=Main+Page

Session 5: Data upload and querying (cont.) / Data visualisation and presentation (09.06)

Video recording of the lecture: https://drive.google.com/file/d/1q94LdQauMPErzK5Yp2jD1zq0_MjWWgCX/view?usp=sharing

Slides are available here: https://docs.google.com/presentation/d/1T1fPDI2jSQJ1Q6rAaARIgxTbmST5Py_C8pmCBBAlXWQ/edit?usp=sharing

Book an individual feedback session - 15 mins per person:

  • 15:00: Lisa Sommer
  • 15:15: -
  • 15:30: Ahmad Aroud
  • 15:45: -
  • 16:00: Anna Rahr
  • 16:15: Gizem Ergün
  • 16:30: Memo Loran Tuku
  • 16:45: Josef Debase
  • 17:00: Ahmad Hasan Ahmad
  • 17:15: Jana Cornelius
  • 17:30: -

Session 6: Data publication and review

In this session we will review homework and discuss the requirements for the final assignment submission.

The final submission deadline is July 7th.

Final assignment submission instructions

1) Spreadsheet with the data you uploaded to Wikidata
2) Spreadsheet with the data you downloaded from the SPARQL endpoint using your main data query
3) Publication on GitHub Pages containing:

  • your custom query results
  • customized title / author / cover image
  • customized additional text and optionally embedded data visualization as .svg and/or live results in an iframe.
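If you want to include live results in an iframe, one option (an illustrative sketch, not part of the official instructions) is the Wikidata Query Service embed view: run your query at query.wikidata.org and place the generated embed link in an iframe on your page. The query in the `src` below is a hypothetical placeholder; substitute the URL-encoded version of your own main query after `embed.html#`.

```html
<!-- Illustrative only: replace the src with the embed link for your own
     query, generated at https://query.wikidata.org -->
<iframe style="width: 80vw; height: 50vh; border: none;"
        src="https://query.wikidata.org/embed.html#SELECT%20%3Fitem%20WHERE%20%7B%20%3Fitem%20wdt%3AP31%20wd%3AQ3305213%20%7D%20LIMIT%2010"
        referrerpolicy="origin"
        sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
```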

Information discussed during today's session

1) Adding proper Wikitext to images in Commons when uploading via OpenRefine

- A more detailed tutorial page, if you want to go more in-depth (esp. pages 5 & 6): https://docs.google.com/document/d/1ENpZBOHvMESOst4Phh5gSRWlnAdBs-OMZt5j_cL-YGA/edit?usp=sharing

- For quick reference, check the screenshot here and try to replicate it in your schema builder when uploading. Make sure you have all of these statements for the images, in addition to the Wikitext. The Depicts / Main subject statements link your image to the main object / artwork you uploaded to Wikidata.

- If you have photos of objects, you can use this simple Wikitext for all your photos (in addition to the statements shown in the screenshot):

== {{int:filedesc}} ==
{{Art photo}}
== {{int:license-header}} ==
{{CC-BY-4.0}}

Note: check the license; the above is just an example!

Note that if you copy my screenshot schema, you will need to update the museum to match the one you're working with, and the license, too.
- If you have photos of paintings / artworks, you can use this simple Wikitext for all your photos (in addition to the statements shown in the screenshot):

== {{int:filedesc}} ==
{{Artwork}}
== {{int:license-header}} ==
{{CC-BY-4.0}}

- More details are available in the Google Doc I shared above, but these instructions should be sufficient, too.

2) Using OpenRefine online

- There is actually an online version of OpenRefine! It is a bit old and does not have all the newest functionality (e.g. you can't upload images with it), but it can still be helpful when you can't use OpenRefine on a personal or institutional computer for technical or other reasons. Go to hub-paws.wmcloud.org, log in with your Wikimedia account, and then select OpenRefine from the set of available tools.

3) Issues with SPARQL queries, e.g. removing duplicate rows for the same item

- You can use a GROUP_CONCAT clause (together with GROUP BY) to concatenate multiple values into a single column, avoiding duplication of the same item over multiple lines; see this example: https://w.wiki/6qbP
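As a sketch of the GROUP_CONCAT pattern, a query of this shape collapses multiple creators into one comma-separated cell per item. The class and properties used (painting, creator) are illustrative placeholders; adapt them to your own data.

```sparql
# Illustrative example: one row per painting, creators concatenated.
SELECT ?item ?itemLabel
       (GROUP_CONCAT(DISTINCT ?creatorLabel; separator=", ") AS ?creators)
WHERE {
  ?item wdt:P31 wd:Q3305213 .     # instance of: painting (adapt to your data)
  ?item wdt:P170 ?creator .       # creator
  ?creator rdfs:label ?creatorLabel .
  FILTER(LANG(?creatorLabel) = "en")
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?item ?itemLabel
LIMIT 20
```

Every variable in the SELECT that is not wrapped in an aggregate must appear in the GROUP BY clause, otherwise the query service will reject the query.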

- If you need more help customizing your queries, you can ask your peers or ChatGPT (though do not rely on it too much: it is still not very good with SPARQL, and you have to craft your prompts carefully to get everything correct). You can also consult trusted sources like StackOverflow and this very helpful SPARQL examples page on Wikidata: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples

Final updates regarding publications:

For reference, you can have a look at your peers' publications, or double-check my own publication, which exemplifies the different parts of the assignment:
- published view: https://lozanaross.github.io/catalogue-003/
- GitHub code view: https://github.com/lozanaross/catalogue-003

FINAL SUBMISSION

Send the spreadsheets to the instructor via email.

Add your name & link to your publication below: