The Map Is Not The Territory

A blog by Christian Willmes.

The daily kindergardening of OSGeo wiki spammers

| categories: osgeo, semantic mediawiki |

I regret to bother you with this topic, but I need to write something about my frustration with the increasing spam activity in the OSGeo wiki. It is really unbelievable how much human time these spammers invest to place a few links and upload a few documents into the wiki.

For some time now I have been doing some voluntary work helping to maintain the OSGeo wiki. I do this because I have Mediawiki and Semantic Mediawiki knowledge from my other research and work projects that I am happy to share with the OSGeo community.

Originally the OSGeo wiki was linked to the central OSGeo LDAP directory for identity and account management, so at that time user account management was carried out not through the wiki but through that LDAP directory. For about two years now, the LDAP integration with the wiki has been broken, because the extension we used was not updated to work with newer versions of Mediawiki.

Meanwhile, because I did not feel knowledgeable enough about LDAP, and Martin Spott tried but did not succeed in getting another LDAP extension to work, we had to manage the user accounts from within the wiki. Because the standard account request/creation procedure of Mediawiki is not well protected against abuse (it is actually really simple to let bots create huge numbers of spam accounts), we first disabled account registration altogether and had new users request accounts via email to the OSGeo SAC mailing list. After this proved to be unhandy, we decided to install the ConfirmAccount extension to handle account requests.

This extension requires new users, in addition to providing a valid and confirmed email address, to supply a short biography about themselves. This biography is then reviewed by SAC volunteers to check whether the requester is a spammer. The volunteer has the options to Accept, Reject, Hold, or qualify the request as Spam. On Reject, the requester is informed with a standard note that the request was denied. On Hold, the volunteer can ask the requester for additional information on which to base the decision. On Spam, the request is denied, but the requester is not informed; furthermore, the email address is blocked from further requests. On Accept, the user account is created with a random password and the requester is notified about it by email.

So far so good, but from here it gets messy, because we receive about ten account requests a day, of which roughly 99% are fraudulent and/or spam. And the spammers are actual humans, from SEO companies I guess. They make up all kinds of things that leave me certain they have human agents pasting this into the requests. Here are some nice example biographies I got to read:

User:Maleshwar: Born as a princess into a royal family of Kingdom of Dagbon, in the Northern Region of Ghana, Gunu has been interested in dancing and music since she was young. She competed in regional and national dance competitions, winning the dance championship for the northern Region and second place in the 1998 National Dance Championship. She took second place in the Hiplife dance championship in 2003, where she met King Ayisoba and Terry Bonchaka, who subsequently become collaborators.

Or:

User:Marshrobin088: Hi my name is Robin Marsh and I've been in the digital design industry for 3 years. As a kid, art and technology always interested me. I could lose track of time doing art or messing around with computers.The way I approach web development is keeping in mind scalability, organisation, and clean syntax. As for the message or purpose is the nucleus,Self learner,highly interested in Geospatial development activities using open source tools. Having knowledge of GIS,vector graphics programming and data bases. Involved in teaching Geology, web and geospatial development. I am proficient in HTML/HTML5, CSS/CSS3, LESS, SASS, XML, JavaScript, jQuery, AJAX, and SQL/MySQL/PostgreSQL, to name a few. I am also proficient in many non-web-based languages, including but not limited to Java, Scheme/Racket, C, ACL2 (LISP), and MIPS Assembly. I have also worked on some smaller Python projects, and have used the language to create one-time use tools for data processing and similar purposes.

For the two requests above, for example, I asked the requesters back with a standard phrase like “Can you please elaborate on your relation/interest in OSGeo?”, and never heard back. Some requests are easy to identify as spam, like the following:

User:Baarishi: baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi

Or:

User:Mekee4444: im a person that need this web id to produce my business in whole world

And here are two example bios of spammers that got through, because I thought that these were valid requests:

User:Ehalu2016: Hi my name is Eahul and I've been in the digital design industry for 5 years. As a kid, art and technology always interested me. I could lose track of time doing art or messing around with computers.The way I approach web development is keeping in mind scalability, organisation, and clean syntax. As for the message or purpose is the nucleus,Self learner,highly interested in Geospatial development activities using open source tools. Having knowledge of GIS,vector graphics programming and data bases. Involved in teaching Geology, web and geospatial development

Or:

User:Mayerjohntec: A web developer and software engineer by profession, An open source enthusiast and a maker by heart. Honored to be sharing space among the Leaders we look up to and admire. I love contributing my best to take the Open Source Mission and OPEN WEB forward. I hold a Masters in Computers degree and have been working and contributing towards the open source community in all ways I can. Love Code, Privacy and Advocacy, learning, teaching and Community Building. I support Open data and Open Knowledge. As am a social person, and love interacting with new people,traveling, reading books, history, museums and listening to all kinds of music. Thanks John Mayer

As you can see from the examples above, I have to read a lot of BS on a daily basis, fighting spam requests and cleaning up behind the spammers that got through. And in some cases it is really not easy to decide whether a request is spam or not. Right now I tend to accept requests where I am not sure, because it is really easy to block a user and delete/revert all of his or her edits to the wiki as soon as I see them spamming.

But in the end it is already more than half an hour of work per day, and it seemingly will not get less...



A 32c3 blog post, from someone who was not there

| categories: conference, open source, research |

As it is by now almost a tradition for me to follow the CCC Congress via video streaming between the years, during the big year-end home cleanup and holiday chill, I want to take note of some interesting talks and recommend that you watch their recordings, embedded in this post.

First off, the video streaming, recording and web publication delivered by the CCC Video Operation Center is top notch! It is possible to follow every talk, some of them even simultaneously translated and subtitled in several languages. The videos were live streamed in several formats, as embedded HTML5 video or as video streams to play in video player applications such as VLC. Later on, the videos are published on the media.ccc.de servers, without ads and without tracking. Good services for embedding and sharing the videos are provided too. It's really awesome, and most importantly it's rock solid and professional. I am really looking forward to FOSS4G 2016 in Bonn, where these guys will be doing the video recordings for the FOSS4G conference.

#dieselgate

The second part of this talk is very interesting: its presenter was able to reverse engineer, or at least successfully trace, how the engine control variables are tuned in the Volkswagen engines to cheat the emissions measurements on the test stands.

A pity that the first speaker took so much time, because he did not tell anything new. If you have read, for example, the Spiegel Online headlines about #dieselgate, you know everything he said. Sadly, the talk was cut short by the moderator at the end, just when it was getting most interesting, because they ran out of time.

Let's Encrypt

Let's Encrypt is a great new service for anyone running a web server or website. This service makes it possible to get TLS/SSL certificates, accepted by all major browsers, for free! I tried it on this very website and server, and had it installed in about 5 minutes. Really easy and a great service indeed!

Onion Tor Talks

The first talk, State of the Onion, is becoming a tradition at the CCC Congress. Here the lead people of the Tor project deliver the latest news about the project to the community. An absolute must-see for anyone interested in security, privacy and decency on the internet.

The second Tor talk is more technical, but it also shows interesting statistics and has some stories about, for example, how Facebook uses Tor and onion sites to deliver its services in repressive regimes such as Saudi Arabia, Russia or China.

A new kid on the block

Here, Katharina Nocun delivers THE "anti Facebook" talk of this year's Congress. The new kid on the block is in this case the open source, decentralized Diaspora* social network. Whether Diaspora* is really new is debatable, but she has some very valid observations to share and reminds us why it is not a good idea to entrust a central commercial entity with too much of your private information and data. There is also a good article in German about her and this talk.

One year of securitarian drift in France

A very interesting talk, in the aftermath of the Paris attacks and the Charlie Hebdo attack in France, was given by two French activists from La Quadrature du Net, the French CCC (in some sense). The French authorities and politicians are strengthening surveillance and diminishing freedom of speech, because they think this increases security. The two activists tell a quite shocking story about the current situation in France.

There were of course a lot more very interesting talks, but I leave it to the crowds of the interwebs to report on those. You can find all recordings on the CCC media servers, so you can virtually participate in the Congress like I did.

A happy new year to you all and of course have fun!



New Semantic Mediawiki based OSGeo Member Map

| categories: webdev, semantic web, geospatial, osgeo, semantic mediawiki |

In this post I give some background on the new Semantic Mediawiki based OSGeo Members map that replaced the userMap. Starting with the Mediawiki update and the introduction of Semantic Mediawiki, I give some words about the history of the userMap and, most importantly, an overview of the new implementation and of possible additional applications of Semantic Mediawiki in the OSGeo Wiki.

The introduction of Semantic Mediawiki into the OSGeo Wiki

Recently, thanks to an effort by OSGeo SAC (namely by Martin Spott), the Mediawiki software underlying the OSGeo Wiki was upgraded from an ancient version (I think it was 1.12) to the current 1.25.3. Additionally, the Semantic Mediawiki (SMW) extension, including Semantic Maps, was installed to enhance the OSGeo Wiki with its features.

SMW is a Mediawiki extension that allows structuring wiki content (as data) and provides tools for querying, exporting and visualizing this structured data. The Semantic Maps extension adds the capability to visualize SMW content containing data of the special type "Geographic coordinate" on maps. SMW even offers an API that allows external applications to query the structured data stored in the wiki and to export data based on queries. SMW is a mature project running on many large Mediawiki installations by well known organizations such as NASA, OLPC, the Free Software Directory and semanticweb.org, to name just a few.
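
To give an idea, here is a minimal sketch in Python of how an external application could query SMW structured data through that API, using the requests library. The category name "Member" and the property name "Has location" are illustrative assumptions; the actual names depend on the wiki's data model.

    import requests

    # Hypothetical SMW "ask" query against the OSGeo wiki API. The category
    # ("Member") and property ("Has location") names are assumptions for
    # illustration; the real names depend on the wiki's data model.
    API_URL = "https://wiki.osgeo.org/api.php"
    params = {
        "action": "ask",
        "query": "[[Category:Member]]|?Has location|limit=10",
        "format": "json",
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    for title, page in response.json()["query"]["results"].items():
        print(title, page["printouts"].get("Has location"))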

The OSGeo Wiki userMap

The original OSGeo Wiki userMap, implemented by me in 2008 during an internship at WhereGroup, is now broken because it depends on a no longer supported Mediawiki extension called Simple Forms. That extension implemented a parser hook that allowed storing the spatial locations of users in a PostGIS database. Parser hooks for including OpenLayers based maps into wiki pages, displaying a single user's location as well as a map of all users, were also implemented in this first version of the userMap. The now deprecated documentation is, for now, still available in the wiki to get an overview.

SMW based OSGeo Members map

The SMW data model was developed using a tool called mobo. Thanks to mobo, it is possible to develop and maintain an SMW data model from a central point in a consistent manner, which enhances maintainability, eases collaboration and also allows the schema to grow to additional applications and scopes over time. Mobo is a command line toolset that helps build Semantic Mediawiki structures in an agile, model driven engineering (MDE) way. The schema is formulated according to the JSON-Schema specification, in JSON or YAML notation, in a defined folder structure with file naming conventions, somewhat similar to how some MVC frameworks model a web application's domain. The documentation of the mobo toolkit, including a tutorial and examples, can be found here. A hypothetical example of such a field definition is sketched below.
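
Just to give a flavor of the approach, here is a hypothetical field definition, written here as a Python dict in JSON-Schema style. The key names and values are assumptions for illustration only, not mobo's actual conventions; those are described in the mobo documentation.

    import json

    # Hypothetical mobo-style field definition in JSON-Schema notation.
    # Key names and values are illustrative assumptions, not mobo's actual
    # conventions; see the mobo documentation for those.
    location_field = {
        "title": "Location",
        "description": "Where an OSGeo member is based",
        "type": "string",
        "format": "Geographic coordinate",  # the intended SMW datatype
    }
    print(json.dumps(location_field, indent=2))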

The development files of the mobo model are stored and published in a GitHub repository for community review, allowing anyone to send pull requests to help improve the SMW based capabilities of the OSGeo Wiki.

It was even possible to salvage the locations that had been entered through the previous userMap implementation into the mentioned PostGIS table. This was done by exporting the data from the PostGIS table as CSV, applying some Python foo to the CSV (especially to the WKB notation of the geometry, using Shapely) and importing the data into the wiki as CSV, using the Mediawiki DataTransfer extension.
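
Roughly, that conversion looked like the following sketch. The column names ("name", "geom"), the file names and the target template field ("Member[Location]") are assumptions for illustration; the real export used whatever the old userMap table provided.

    import csv
    from shapely import wkb

    # Sketch of the migration step: read the PostGIS CSV export, decode the
    # hex-encoded WKB geometry with Shapely, and write a CSV suitable for
    # the DataTransfer extension. Column and field names are assumptions.
    with open("usermap_export.csv", newline="") as src, \
            open("datatransfer_import.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["Title", "Member[Location]"])
        for row in reader:
            point = wkb.loads(row["geom"], hex=True)  # decode the WKB point
            # SMW's Geographic coordinate type accepts "lat, lon" strings
            writer.writerow([row["name"], "%s, %s" % (point.y, point.x)])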

Conclusion and Outlook

The application of SMW technology in the OSGeo wiki has, with the introduction of the OSGeo Members model, created a valuable directory that gives a nice overview of the OSGeo community. It is possible to extend the model in the future to a directory of Charter Members or of OSGeo Advocates. This would yield sortable tables and of course maps of these contacts.

It is even possible to develop models for the Service Providers, to replace the sometimes hard to maintain current Service Provider directory, or, for example, a model of the Geo4All laboratories to generate a directory and a corresponding map. But one of my favorite possible models would be one for an Open Geo Data directory in the OSGeo wiki.

All these models and the emerging directories would be collaboratively created and maintained by the OSGeo community, simply by editing the wiki. And that is not even to speak of what is possible with the Mediawiki API for querying the structured data and getting the results nicely in JSON format, let alone of enabling the SPARQL endpoint that comes with Semantic Mediawiki.

So, the OSGeo Wiki has a bright future, if we want it to. I will do my best for this goal.

Have fun!



Gridcoin it is!

| categories: gridcoin, research |

Recently I started mining Gridcoins. In the euphoria of successfully being awarded ~100 Gridcoin for BOINC computations today, I want to share some insights I gained along the way. And yes, I also want to advertise this quite new and innovative crypto-currency a bit in this post, because I believe it is a good thing to direct the computing energy spent on mining towards actual scientific research computations, instead of computing mere hashes as in the case of Bitcoin, which produces no original wealth and only costs energy.


What is GridCoin and what is BOINC?

The short answer is that GridCoin uses BOINC for mining a virtual math-based digital asset (a crypto-currency), comparable to the more famous Bitcoin. In the following, a more detailed answer is given by first explaining BOINC, and then how GridCoin works and how it relates to and applies the Berkeley Open Infrastructure for Network Computing (BOINC).

BOINC

BOINC is open source software that facilitates volunteer based grid computing. The project was originally developed to support the SETI@home project, but it is now used for all kinds of computation intensive scientific research projects.

BOINC is pretty easy to set up on any common operating system (Linux, Mac, Windows). The BOINC manager software is quite easy to use and makes participation in volunteer grid computing very user friendly. You just need to run the installer and then follow some GUI based steps, which include choosing a project you want to volunteer for and providing an email address and password. These credentials are needed by the project's server to keep track of how much work your computer has done; in turn, the project awards you credits accordingly. To ensure that credit is granted fairly, most BOINC projects work as follows: each task may be sent to two computers. When a computer reports a result, it claims a certain amount of credit, based on how much CPU time was used. When at least two results have been returned, the server compares them, and if the results agree, the users are granted the smaller of the claimed credits. Note that between claiming credit and being awarded it, a significant amount of time (several days) can pass until two results for the same task have been received.
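
In toy Python, this credit granting rule boils down to something like the following (a sketch of the scheme as described above, not actual BOINC server code):

    # Toy sketch of BOINC's credit validation rule as described above;
    # this is not actual BOINC server code.
    def grant_credit(result_a, claim_a, result_b, claim_b):
        """Return the granted credit, or None if the results disagree
        and the task has to be re-issued."""
        if result_a == result_b:
            return min(claim_a, claim_b)
        return None

    print(grant_credit("f3a9", 42.0, "f3a9", 40.5))  # prints 40.5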

GridCoin

Gridcoin is the first blockchain protocol that delivered a working algorithm which equally rewards and cryptographically proves the solving of BOINC hosted work, which can be virtually any kind of distributed computing process.

So, mining currency in Gridcoin is based on an algorithm called Proof-of-Research (PoR), whose basic idea is: the more a user researches, the more Gridcoins the user will get.

The Gridcoin-Research software and its wallet system manage the process of staking in Gridcoin, and thus of getting Gridcoins awarded for BOINC computations. They also take care of your Gridcoin "savings account" of sorts, which stores completed research activities for up to 14 days. So the wallet should be run AT LEAST every 14 days for a few hours. To be on the safe side, leave the wallet open for as long as possible; this has the additional benefit of supporting overall network security.

You will find a lot of sometimes confusing terms within the GridCoin software; here is a nice glossary where you can look up most of them.

GridCoin setup

I normally work on an Ubuntu system, so I aimed for the Linux release of the gridcoin-research implementation. But it turned out that the Linux setup is a bit tricky, and I did not manage to compile the software from source, because some dependencies could not be resolved without circumventing the library package versions that Ubuntu otherwise needs. It seems there will be a packaged version of the software sometime in the future; the project exists on Launchpad, but at the time of writing there were no packages available.

So I went on to install the Windows version in a virtual machine, which is a no-brainer: just download the installer and follow the simple installation steps.
After the basic setup, you should try to receive some starter Gridcoins into your wallet, so that you have a positive balance and are thus able to stake. For now it is possible to grab some so-called faucets (free Gridcoins), for example from Gridcoin Asia or from Gridcoin Pool. On the main Gridcoin page you can watch a nice video explaining the setup process and also how to gain some first coins.
Take care that you have joined all BOINC projects with the same email address and that you have joined the gridcoin team in each project. And double check that the projects you compute for are whitelisted for Gridcoin.

Computing on multiple computers and BOINC projects

To be able to compute on multiple devices and get rewarded through one Gridcoin-Research client and its wallet, you need to take care of the following constraints. First, it is important that you run a BOINC manager instance on the device on which the Gridcoin-Research client is running. This manager instance should join all projects that you are computing from the other devices (of course using the same account credentials), so that the Gridcoin client knows about these projects. I additionally added the other devices for remote management from this BOINC manager instance, but this is not necessary (at least as I read the documentation).
You can verify the correct setup by listing your CPIDs in the Gridcoin-Research client and checking whether every project is listed. This is done by opening the "Debug console" in the Gridcoin client and typing: "list cpids".

Now waiting...

Actually starting to be rewarded takes quite a notable amount of time. In my case it took 8 days from the initial setup, with BOINC computing on three devices (a VPS server, a desktop PC and a laptop), through adding some coins to the wallet to be able to stake, until gaining the first mined PoR reward. So be patient in the beginning; it will pay off.

Look at how the projects perform on your devices; this depends heavily on the hardware configuration. Project A can be more suitable for device X, while project B may perform better on device Y with a dedicated gaming GPU, for example. So keep an eye on this and play around with the setup to achieve the most with your hardware. It is possible to track the performance of your devices and the corresponding project computations via the BOINC manager statistics interface.

Happy Gridcoin mining!



Misleading Chromium and Google Chrome warning message for self-signed SSL certificates

| categories: webdev, ubuntu, open source, server |

For some time now I have been aware of a new Chromium and Google Chrome web browser warning message concerning SSL certificates that are not "trusted", shown for SSL secured websites.

I guess that this new warning message is part of Google's (basically good) campaign to support and promote the use of SSL. Google announced this campaign under the title HTTPS Everywhere at Google I/O 2014. They talk about things like good citizenship of the web in the context of SSL.


Screenshot of the Chromium "Privacy error" warning, shown on accessing my own server via HTTPS.

The problem with that message is that colleagues and friends with whom I want to share data through my server get scared by this misleading message from Chromium and Chrome; they come back to me saying that something might be broken or wrong with my server. Then I have to explain SSL certificates and HTTPS to people who mostly barely know the difference between a website and a server, and convince them to trust me and not that serious-looking message... This basically only works with people who know me fairly well. Some people with whom I need to work, but who do not know me, will most probably not trust me and will be scared away by this message if they do not know enough about SSL encrypted HTTPS. And sorry Google, this is not good.


An even more misleading message shown by Chrome/Chromium if the user proceeds through the "Advanced" option.

In my view this warning message is not just very suggestive, in a way that compromises the trust in accessing data and web applications on my server through HTTPS; it is also wrong in what it claims. It says that accessing my server is unsafe. Which is not the case! And anybody who thinks that it is the case when using a self-signed certificate, please comment on this post and educate me.

I have now issued a free SSL certificate from StartSSL for the HTTPS configuration of my web server, to get rid of this wrong and annoying warning by Chrome/Chromium. I am very uncomfortable with this, because I do not trust that company in any way. And why the heck should I, or anyone? I do not know anything about the people behind it. And why the heck should I care? I just want a minimum of protection for entering passwords and data into my web application, by providing an HTTPS connection to my server. Since the Snowden revelations it is clear that SSL can be decrypted by a knowledgeable enough "agency" anyway... Nonetheless, I am forced to trust some company which sells trust (which is plain wrong on so many different levels of implementation and from so many different angles of view on the matter). And I also need to force my colleagues and friends to trust this company from which I got some trust... I gained this trust by receiving and confirming an email sent to an address on my domain name. The fact that I do not host my email on the server the domain is registered for, and where I use that certificate, does not matter to that company when trusting me... :P.

On a side note, the warnings issued by Firefox or IE are way more polite and do not scare people away from accessing my server (with its self-signed certificate); they just accept the "assumed" and much less severe risk warnings of those browsers' notifications.

Finally, I have a question for you all. Please tell me how a self-signed certificate is in any way less secure than a "certified" and "trusted" one? The connection itself is neither more nor less secure; it is just about the trust. And as said, I am not comfortable with trusting companies that can grant (sell) trust... This trust should come from the provider of the application and the maintainer of the server that is to be accessed, I think.
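
If you control both ends, a self-signed certificate can even be verified without any CA at all, by pinning its fingerprint. Here is a minimal sketch in Python; the hostname and the pinned fingerprint are placeholders, and the fingerprint would have to be obtained once from the server operator over a trusted channel.

    import hashlib
    import ssl

    # Fingerprint pinning: check that the server presents exactly the
    # self-signed certificate we expect, with no CA involved. Hostname
    # and pinned fingerprint are placeholders.
    HOST = "example.org"
    PINNED_SHA256 = "0" * 64  # put the known certificate fingerprint here

    def cert_fingerprint(host, port=443):
        """Fetch the server certificate and return its SHA-256 fingerprint."""
        pem = ssl.get_server_certificate((host, port))
        der = ssl.PEM_cert_to_DER_cert(pem)
        return hashlib.sha256(der).hexdigest()

    if cert_fingerprint(HOST) == PINNED_SHA256:
        print("Certificate matches the pinned fingerprint; trusted.")
    else:
        print("Unexpected certificate; do not proceed.")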

Have fun and a good start into 2015!


