Qri: A bleeding-edge open source data science tool on the distributed web
Thu, Feb 7 at 01:00
Our meetup exists to promote useful exchange between academia and industry, and Qri is a tool that can serve both worlds. Cancer research generates lots of data from various sources. How can we manage this data across different organizations? We can use the open source tool Qri! Qri is like Git/GitHub for large datasets. It is useful for fields that have to manage, integrate, and clean large amounts of data in a decentralized manner, such as bio-informatics, finance, data science, and climatology. Academic researchers and business professionals who deal with large datasets will find this tool useful.
What is Qri ("query")?
Qri is a free and open source dataset version control system built on IPFS (the distributed web): a kind of Git/GitHub just for datasets. Think:
Git/GitHub -> version control + network for distributed community code.
Qri -> version control for large datasets + distributed community data
Qri scales better than Git when dealing with massive datasets. See: http://qri.io/
Qri lives at the intersection of open data and dataset versioning. Qri's founder, Brendan, established Qri after volunteering full time with a recent data rescue effort. Unlike general version control systems, all datasets stored in Qri can interoperate because they share the same composition. Datasets are stored and transmitted in standard formats, so outside systems can bypass Qri entirely and interact directly with the datasets Qri produces and consumes.
Datasets You Can Actually Use:
Every dataset change is tracked and attributed to an author, so you can audit whether the data you're looking at meets your standards and track changes as they happen. What GitHub did for code, Qri will do for data.
Who is Qri best for?
open source software engineers/hackers
open data publishers/users
Bio-informatics researchers
Use cases:
Publishing: Put your data directly into the hands of those who want to see it and use it on the distributed web.
Data Sync: The data you rely on may change in an instant, often, or both. Use Qri to automatically ingest web content into a structured, auto-updating, and versioned dataset, keeping your data fresh.
Automated Munging: Design automated transformations to clean a dataset and prepare it for analysis. When the source data is updated, Qri will clean it and incorporate it into your dataset.
Collaboration: What are the chances someone has already done the same analysis you thought about doing yourself? Because datasets on Qri are public, you can easily discover who has played with similar data before, and you can know exactly what they did to it. You can then take that data and 'fork' it, keep it, build on it, or propose changes, just like GitHub.
What We'll Demo / What they'll learn:
How to create a profile, & find, copy, create and publish a dataset
How to write a script to generate and auto-update a dataset
How to commit to (edit) a dataset and track versions
How to build a visualization
For more on Qri:
Docs & Tutorials: http://qri.io/docs/
Twitter: https://twitter.com/qri_io
Target Audience: ~35-40 data scientists, open source enthusiasts/hackers, data analysts, researchers, and open/civic data people. Right now Qri has command line tools and a Mac-supported app (Windows support is coming!).
What else?: Qri will provide food and drink for anyone who attends!