Dropbox Crawler


Revision as of 16:45, 7 January 2013

Personal cloud storage is becoming increasingly popular, and Dropbox is certainly the best-known example. Cloud storage generates a huge amount of Internet traffic, so understanding how people interact with such applications is essential for designing efficient cloud storage systems.

We have been doing research on the usage of Dropbox (see our results here). As a next step, we need to know what types of files people store in the service. This would allow us to understand, among other things, the impact of some technologies on system performance and on network traffic. For that, we need volunteers to provide us with basic statistics (size, type, etc.) about the files stored in their folders.

Be part of the crowd

All you need to do is run a Java application on your PC. This application will read your Dropbox folder, calculate basic statistics, show everything for your approval and, only after that, send the statistics to us.

  • Most people will be able to run the application by clicking here
  • If your browser does not support that, you can download the package and run it: just double-click it!


What will be captured?

For each file/folder in your Dropbox, the program will collect:

* Size in bytes
* Last modification time
* MIME type found by the Mime Type Detection Utility
* File extension - the sub-string after the last "." in the file name
* Hash (MD5) of both the initial and the final 8 bytes of the file
* Hash (MD5) of the file name
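
The per-file statistics listed above can be illustrated with a minimal sketch. It is not the actual client code: the class and method names are hypothetical, it assumes the first and last 8 bytes are hashed separately, and it leaves out the MIME type, which the real program obtains from the Mime Type Detection Utility.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the per-file statistics; not the project's real class.
public class FileStatsSketch {

    /** Hex-encoded MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Sub-string after the last "." in the file name, or "" if there is none. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** Reads at most n bytes starting at offset (fewer if the file is shorter). */
    static byte[] readBytes(Path file, long offset, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int len = (int) Math.max(0, Math.min(n, raf.length() - offset));
            byte[] buf = new byte[len];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }

    /**
     * One comma-separated record: size, modification time (ms), extension,
     * MD5 of the first 8 bytes, MD5 of the last 8 bytes, MD5 of the file name.
     * Assumption: the two 8-byte chunks are hashed separately.
     */
    static String describe(Path file) throws IOException, NoSuchAlgorithmException {
        long size = Files.size(file);
        String name = file.getFileName().toString();
        byte[] head = readBytes(file, 0, 8);
        byte[] tail = readBytes(file, Math.max(0, size - 8), 8);
        return String.join(",",
                Long.toString(size),
                Long.toString(Files.getLastModifiedTime(file).toMillis()),
                extension(name),
                md5Hex(head),
                md5Hex(tail),
                md5Hex(name.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java FileStatsSketch <file>");
            return;
        }
        System.out.println(describe(Paths.get(args[0])));
    }
}
```

In this sketch at most 16 bytes of each file are read, and only their MD5 digests leave the method; the file name itself is likewise replaced by its digest.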

The program will also send us:

* Hash (MD5) of your Dropbox configuration files (or a hash of your MAC address if we cannot read the former)
* Hash (MD5) of the path of your Dropbox home folder
* Your IP address
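
As a rough illustration of the hashed identifiers above, the following sketch hashes a configuration file and falls back to the MAC address of a network interface when the file cannot be read. The class name, method names, and the single `configFile` parameter are assumptions made for this example; the actual client may read several configuration files and choose the interface differently.

```java
import java.net.NetworkInterface;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical sketch of the hashed identifiers; not the project's real class.
public class ClientIdSketch {

    /** Hex-encoded MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) throws Exception {
        StringBuilder hex = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** MD5 of the absolute path of the Dropbox home folder. */
    static String homePathHash(Path dropboxHome) throws Exception {
        String path = dropboxHome.toAbsolutePath().toString();
        return md5Hex(path.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * MD5 of the configuration file if it is readable; otherwise MD5 of the
     * first available MAC address. Empty if neither can be obtained.
     */
    static Optional<String> installationId(Path configFile) throws Exception {
        if (Files.isRegularFile(configFile) && Files.isReadable(configFile)) {
            return Optional.of(md5Hex(Files.readAllBytes(configFile)));
        }
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics != null) {
            for (NetworkInterface nic : Collections.list(nics)) {
                byte[] mac = nic.getHardwareAddress();
                if (mac != null && mac.length > 0) {
                    return Optional.of(md5Hex(mac));
                }
            }
        }
        return Optional.empty();
    }
}
```

In both cases only a digest is produced, so neither the folder path nor the configuration content is transmitted in clear text.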


How will we use this information?

The only information that may reveal your identity is your IP address, which we will anonymize on the server side. None of the other statistics can be linked to the person who owns the files.

In the future, we will release a summary of our data on this Web site, so that anyone will be able to use our data in further research.

What will this program NOT do?

  • Copy any file or folder out of your computer
  • Copy any information other than what is stated above
  • Install or store anything on your computer

If you have any doubts about the program, you can also take a look at its source code.

Client source code

Download the source code by clicking here. You can compile the project using the Ant tool or any Java IDE (we use NetBeans v7.2.1).


More information

  • You can find more information about our work in this paper:

Drago, I., Mellia, M., Munafò, M. M., Sperotto, A., Sadre, R., and Pras, A. (2012). Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM Internet Measurement Conference (IMC'12), Boston, Nov. 2012.

  • This page has more information about the data we used in our research so far.

External Links

These institutes are running this research: