Dropbox Crawler

Revision as of 16:54, 7 January 2013

Personal cloud storage is becoming more and more popular, and Dropbox is certainly the best-known example. Cloud storage generates a huge amount of Internet traffic. Because of that, understanding how people interact with such applications is essential for designing efficient cloud storage systems.

We have been doing research on the usage of Dropbox (see our results here). As a next step, we need to know what types of files people store in the service. This would allow us to understand the impact of some technologies on system performance and on network traffic, among other things. For that, we need volunteers to provide us with basic statistics (size, type, etc.) about the files stored in their folders.

Be part of the crowd

All you need to do is run a Java application on your PC. This application will read your Dropbox folder, calculate basic statistics, show everything for your approval and, only after that, send the statistics to us.

  • Most people will be able to run the application by clicking here


What will be captured?

For each file/folder in your Dropbox, the program will collect:

* Size in bytes
* Last modification time
* MIME type found by the Mime Type Detection Utility
* File extension - the substring after the last "." in the file name
* Hash (MD5) of both the initial and the final 8 bytes of the file
* Hash (MD5) of the file name
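
As an illustration of the list above, here is a minimal Java sketch of how such per-file statistics could be computed. It is not the actual client code: the class name `FileStatsSketch` and all of its helper methods are hypothetical, the file name is assumed to be hashed as UTF-8 text, the first and last 8 bytes are assumed to be hashed separately, and MIME type detection is left out because it depends on the external utility.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the real crawler.
final class FileStatsSketch {

    // MD5 digest of a byte array, rendered as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Substring after the last "." in the file name; empty if there is no ".".
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // Reads at most `count` bytes starting at `offset`; fewer if the file is short.
    static byte[] readAt(Path file, long offset, int count) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long available = Math.max(0L, raf.length() - offset);
            byte[] buffer = new byte[(int) Math.min(count, available)];
            raf.seek(offset);
            raf.readFully(buffer);
            return buffer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collects the statistics for one file into a single printable line.
    static String describe(Path file) {
        try {
            long size = Files.size(file);
            long modified = Files.getLastModifiedTime(file).toMillis();
            String name = file.getFileName().toString();
            byte[] head = readAt(file, 0, 8);
            byte[] tail = readAt(file, Math.max(0L, size - 8), 8);
            return "size=" + size
                    + " modified=" + modified
                    + " ext=" + extension(name)
                    + " head=" + md5Hex(head)
                    + " tail=" + md5Hex(tail)
                    + " name=" + md5Hex(name.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstrates the helpers on a throwaway temporary file.
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "hello dropbox crawler".getBytes(StandardCharsets.UTF_8));
        System.out.println(describe(tmp));
        Files.delete(tmp);
    }
}
```

Note that only digests of the file name and of 16 bytes of content appear in the output; the name itself and the remaining content are never part of the record.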

The program will also send us:

* Hash (MD5) of your Dropbox configuration files (or a hash of your MAC address if we cannot read the former)
* Hash (MD5) of the path of your Dropbox home folder
* Your IP address


How will we use this information?

The only information that could reveal your identity is your IP address, which we will anonymize. None of the other statistics can be related to the person owning the files.
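
This page does not say how the IP address is anonymized. Purely as an illustration, one common approach is to discard the last octet of an IPv4 address, so that only the network prefix remains. The Java sketch below assumes that scheme; the class `IpAnonymizeSketch` and the method `truncateIpv4` are hypothetical and may differ from what is actually done with the collected data.

```java
// Hypothetical sketch of one possible anonymization scheme (IPv4 only).
final class IpAnonymizeSketch {

    // Keeps the first three octets and replaces the last one with 0.
    static String truncateIpv4(String address) {
        String[] octets = address.split("\\.", -1);
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not a dotted-quad IPv4 address: " + address);
        }
        for (String octet : octets) {
            int value = Integer.parseInt(octet); // throws on non-numeric input
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Octet out of range: " + octet);
            }
        }
        return octets[0] + "." + octets[1] + "." + octets[2] + ".0";
    }

    public static void main(String[] args) {
        // 192.0.2.0/24 is a range reserved for documentation examples.
        System.out.println(truncateIpv4("192.0.2.57")); // prints 192.0.2.0
    }
}
```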

In the future, we will release a summary of our data on this Web site, so that anyone will be able to use our data sources for further research. We will, however, take extra measures to ensure that no sensitive information ends up in these data sets.

What will this program NOT do?

  • Copy any file or folder from your computer
  • Copy any information other than what is listed above
  • Install or store anything on your computer

You can also take a look at the source code of the program if you have any doubts about it.

Client source code

Download the source code by clicking here. You can compile the project using the Ant tool or any Java IDE (we use NetBeans v7.2.1).


More information

  • You can find more information about our work in this paper:

Drago, I. and Mellia, M. and Munafò, M. M. and Sperotto, A. and Sadre, R. and Pras, A. (2012) Inside Dropbox: Understanding Personal Cloud Storage Services. Proceedings of the 12th ACM Internet Measurement Conference - IMC'12, Boston, Nov. 2012

  • This page has more information about the data we used in our research so far.

External Links

These institutes are running this research: