Building Solid Apps which use Public Data

One of the early Solid apps which both looked nice and was definitely useful was Noel de Martin's Media Kraken. Media Kraken allows you to track the movies you have watched or would like to watch. It does this by making a simple database in the user's pod, and tying that data to the Internet Movie Database, IMDB. To add another movie to your collection, you start typing, and it pulls in from IMDB candidate movies which match what you are typing, as you type the name of a movie. When you have selected one, it is represented by IMDB's miniature picture of the DVD cover. It is neat for the base functionality it gives, but also because, if in the future we all share our media collections, we can easily see when you and I have watched the same thing, as it will be captured by the same IMDB URI.

Stepping back, this is a great example of how a user's world is made up of data from across what the ODI calls the Data Spectrum. Some of it is public, like the properties of movies; some of it is shared, like playlists and photos; and some of it is private, like passwords.

Meanwhile, I've been thinking about the public part of a Solid user's profile, and felt it would be great to put in some details of what one has been up to in one's life as well as the bare contact details it has (as of 2020-12 anyway). Adding this résumé information mainly involves capturing the various projects, companies and organizations one has been involved in, as student, volunteer, employee, and so on. And just as with the movies I am tracking, it is useful to tie these organizations into public data. For several reasons, in fact.

Data sources and ontologies: Organization

A couple of public databases came to mind: DBpedia and Wikidata. They each have SPARQL endpoints which seemed to work. Wikidata has a really nice playground where you can tweak your SPARQL query and test it instantly. Meanwhile, as Solid policy is to use schema.org terms whenever possible, I looked at the class structure of these organizations. I'm using the term 'organization' as a generic one to include companies, universities, projects and so on, partly because that matches both schema.org's Organization and Wikidata's Organization, which it gives a number to rather than a name. Unfortunately Wikidata, the main source of public data, and schema.org, the main source of terms people will be interoperable with, don't completely agree on the structure of subclasses of organization. Schema says more specific classes include "Airline, Consortium, Corporation, EducationalOrganization, FundingScheme, GovernmentOrganization, LibrarySystem, LocalBusiness, MedicalOrganization, NGO, NewsMediaOrganization, PerformingGroup, Project, SportsOrganization, and WorkersUnion", which covers lots of things but not Non-Profit. I made a sketch of a little part of the Wikidata classes.

From the UX point of view it's nice to give the user a choice of mutually distinct classes even when life may be more complex. The colored ones seem to be fairly easy to understand and seem to partition the space fairly cleanly.
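As a rough sketch of how one might explore that class structure, the snippet below builds a SPARQL query listing the direct subclasses of Wikidata's organization class, plus a URL for running it against the public endpoint. The item Q43229 ("organization") and the property P279 ("subclass of") are Wikidata's own identifiers; the function names are mine and purely illustrative, and the code only builds strings rather than contacting the service.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
WIKIDATA_ORGANIZATION = "Q43229"  # Wikidata's numbered item for "organization"


def subclass_query(class_id: str, language: str = "en", limit: int = 50) -> str:
    """Build a SPARQL query listing the direct subclasses of a Wikidata class.

    The wd: and wdt: prefixes are predefined by the Wikidata query service.
    """
    return (
        "SELECT ?subclass ?label WHERE {\n"
        f"  ?subclass wdt:P279 wd:{class_id} .\n"  # P279 is "subclass of"
        "  ?subclass rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{language}")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(subclass_query(WIKIDATA_ORGANIZATION))
```

Pasting the printed query into the Wikidata playground is probably the quickest way to see how mixed the subclasses are before deciding which ones to offer the user.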

I made an executive decision that for now we would assume:

solid:InterestingOrganization owl:disjointUnionOf(
) .

In other words, be prepared to mint a Solid-specific concept of Organization which would involve just those 7 subclasses.

That allows us to make a form field like

:OrgClassifier a ui:Classifier; ui:label "What sort of organization?"@en;
 ui:category solid:InterestingOrganization .

which looks something like this:

Data sources and ontologies: Occupation

What about the connections between people and organizations? Schema's Role class looks interesting, though apparently it is to be used in a strange way, with the incoming and outgoing arcs both labelled "athlete", which is weak. What existing work is there out there? A quick search reveals Uldis Bojārs's 2007 ResumeRDF and of course the LinkedIn API. The latter mentions some enumeration types, like:

Clearly, as LinkedIn is very much the dominant player in the CV space, being able to interoperate with them would be valuable. We imagine that they have their own carefully curated databases of schools, companies and job types, which are not public and which they may regard as crown jewels not to be shared. In an ideal future world one would be able to sync both ways between one's Solid CV and LinkedIn, and their competitors. So a deeper investigation of the LinkedIn API would be a good idea.

A CV entry is basically a relationship between people and organizations. The address book vCard gives us people, augmented/grounded using their WebIDs, and organizations, augmented/grounded by their public IDs. For the start and end dates there are a few contenders (iCal events, ResumeRDF), but as schema.org has them, let's use that. The Solid world is like a bag of chips -- many different languages. If a vocabulary you like or are already using has a term, then use it. Don't try to get everything into the same ontology.
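To make the shape of such an entry concrete, here is a minimal sketch, not a settled design. It assumes the dates use schema.org's startDate and endDate, as suggested above; the ex: namespace and its person and organization predicates are hypothetical placeholders for whatever terms are eventually chosen, and the class and method names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

PREFIXES = (
    "@prefix schema: <http://schema.org/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <https://example.org/cv#> .\n"  # hypothetical namespace
)


@dataclass
class CVEntry:
    person: str                 # WebID of the person
    organization: str           # public ID (URI) of the organization
    start: date
    end: Optional[date] = None  # None means the involvement is ongoing

    def to_turtle(self, subject: str) -> str:
        """Serialize the entry as Turtle, omitting the end date if ongoing."""
        lines = [
            f"<{subject}> ex:person <{self.person}> ;",
            f"    ex:organization <{self.organization}> ;",
            f'    schema:startDate "{self.start.isoformat()}"^^xsd:date',
        ]
        if self.end is not None:
            lines[-1] += " ;"
            lines.append(f'    schema:endDate "{self.end.isoformat()}"^^xsd:date')
        lines[-1] += " ."
        return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    entry = CVEntry(
        person="https://alice.example/profile/card#me",
        organization="https://example.org/org/acme",
        start=date(2015, 9, 1),
        end=date(2018, 6, 30),
    )
    print(entry.to_turtle("https://alice.example/cv#entry1"))
```

Keeping the person and the organization as plain URIs means the entry stays a thin link between things described elsewhere, which is the point of grounding them in WebIDs and public IDs.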

Now the remaining thing we need to model is the relationship between the two: the job, the exact role. The ultimate find here would be an up-to-date list of occupations translated into many languages. Then I could fill in my CV in English and you could set your browser to Arabic and read my CV in Arabic.
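The reading side of that idea is simple enough to sketch: given an occupation's labels keyed by language tag, pick the one that best matches the reader's language preferences. The function name and the fallback rule are my assumptions, and the labels in the example are hand-written illustrations, not copied from any dataset.

```python
from typing import Mapping, Sequence


def pick_label(labels: Mapping[str, str], preferred: Sequence[str], fallback: str = "en") -> str:
    """Return the label in the first preferred language that is available.

    `labels` maps lowercase language tags to labels; `preferred` is the
    reader's ordered list of language tags, e.g. from the browser settings.
    """
    for tag in preferred:
        # Try the full tag first ("fr-ca"), then its primary language ("fr").
        for candidate in (tag.lower(), tag.lower().split("-")[0]):
            if candidate in labels:
                return labels[candidate]
    if fallback in labels:
        return labels[fallback]
    raise KeyError("no label available in any acceptable language")


if __name__ == "__main__":
    # Illustrative labels only.
    occupation_labels = {
        "en": "software developer",
        "fr": "développeur de logiciels",
        "ar": "مطور برمجيات",
    }
    print(pick_label(occupation_labels, ["ar", "en"]))
```

The CV itself would store only the occupation's URI; the label is looked up at display time, so the same entry reads naturally in whatever language the viewer prefers.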

Wikipedia quotes the UK ONS as listing 27,966 job titles. (Check it out: no Quarterback, but Quartermaster, Putter-on (glue) and Putter-together (scissors).) Clearly a great resource, but UK-culture-specific. Does the ILO have something more general? Yup: material about labour surveys, and pointers to the ICLS, whose 1992 report mentions "Some 30 countries have already either established, or are in the process of creating, a national standard classification of occupations similar to ISCO-88." The latter has a spreadsheet with 500 posts in 10 categories in English, French and German. There is also ESCO, the EU equivalent, which is ISCO-compatible and more recent, and which notes "ESCO does not adapt the ISCO occupation groups, but the Commission manages translations of ISCO labels in the official languages of the European Union." Not to mention "The full ESCO dataset can be downloaded in turtle format from the ESCO Service Portal", "and each module is available in 26 European languages and in Arabic" (my emphasis). There is a RESTful API ... including search by name substring. Examples:

A SPARQL endpoint would be nicer. There is the EU Open Data Portal SPARQL service. Does it have ESCO?

Is this ESCO stuff in fact available in Wikidata? There are 251 mentions of ESCO IDs in Wikidata, not typically in 27 languages. Can we use Wikidata's Profession? It has 126 subclasses, pretty mixed, and 105k subclasses in the transitive closure, including things like Mayor of Halifax! So nothing like as clean, international, or complete. The ESCO occupations seem to be the ones to pick.

Looking through the open HR work, I found the JDX work on job opening exchange, which mentions, when talking about job classification, "Industry code identifying the primary activity at the location of employment (e.g., NAICS in the U.S. and ESCO in the EU)", but NAICS seems to have codes for industry types, not job types.

Now we have to figure out the shape for the résumé entry itself, and its discovery. A discussion in Gitter suggested that a CV should be a different resource from one's profile. Down the road, we may have a different CV shared within each community, but for now let's focus on a primary public one which everyone will be able to see with your profile.

The LinkedIn APIs for position, organization etc. use LinkedIn-defined URNs for things like language codes.

Adding to the SolidOS UX

Where to add this to SolidOS? The plan is to extend the contacts management in the Address Book to be much smarter about Organizations. Already we are in a good place in a way, in that we use the vCard ontology for storing stuff about Individuals, Organizations, and groups of them. So a new contact is already either an Individual or an Organization, and it looks as though we can treat the vcard:Organization class as being just the same as the schema one and the Wikidata one.
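In code, treating the three classes as the same could be as simple as the sketch below. The three class URIs are the published vCard, schema.org and Wikidata ones; the helper name is hypothetical, and a real implementation might prefer to state the equivalence in the ontology rather than in application code.

```python
from typing import Iterable

# Class URIs we treat as meaning "organization" for the address book.
ORGANIZATION_CLASSES = frozenset({
    "http://www.w3.org/2006/vcard/ns#Organization",
    "http://schema.org/Organization",
    "http://www.wikidata.org/entity/Q43229",
})


def is_organization(rdf_types: Iterable[str]) -> bool:
    """True if any of a contact's rdf:type values is one of the organization classes."""
    return any(t in ORGANIZATION_CLASSES for t in rdf_types)


if __name__ == "__main__":
    print(is_organization(["http://schema.org/Organization"]))
```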

The plan is to make a user workflow in which, when you add an organization to your contacts, the handling of it is much smarter, tying it to the public data where we can. Should we include projects which are public GitHub, GitLab, etc. projects? Yes, but later. Then add a résumé form on the profile editor and a résumé rendering on the profile. How about an "Add this to my CV" button any time we are looking at an organization? Maybe.

Yet to be written up:

Yet to be done:

To be continued


CC BY-NC 4.0