33. VLDB 2007: Vienna, Austria
Christoph Koch, Johannes Gehrke, Minos N. Garofalakis, Divesh Srivastava, Karl Aberer, Anand Deshpande, Daniela Florescu, Chee Yong Chan, Venkatesh Ganti, Carl-Christian Kanne, Wolfgang Klas, Erich J. Neuhold (Eds.): Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007. ACM 2007 ISBN 978-1-59593-649-3
Cover Page.
Sponsors.
Copyright Notice.
Welcome Message from the General Chairs.
Foreword from the PC Chairs.
Conference Officers.
Program Committees and External Reviewers.
VLDB Endowment Board of Trustees.
Table of Contents.
Keynotes
Werner Vogels: Data Access Patterns in The Amazon.com Technology Platform. 1
Eric A. Brewer: Technology for Developing Regions. 2
10 Year Best Paper Award
Research Sessions
Uncertain and Probabilistic Data
Douglas Burdick, AnHai Doan, Raghu Ramakrishnan, Shivakumar Vaithyanathan: OLAP over Imprecise Data with Domain Constraints. 39-50
Christopher Re, Dan Suciu: Materialized Views in Probabilistic Databases for Information Exchange and Query Optimization. 51-62
XML Query Processing
Shirish Tatikonda, Srinivasan Parthasarathy, Matthew Goyder: LCS-TRIM: Dynamic Programming Meets XML Indexing and Querying. 63-74
Irina Botan, Peter M. Fischer, Daniela Florescu, Donald Kossmann, Tim Kraska, Rokas Tamosevicius: Extending XQuery with Window Functions. 75-86
Andrei Arion, Véronique Benzaken, Ioana Manolescu, Yannis Papakonstantinou: Structured Materialized Views for XML Queries. 87-98
Outsourcing and Authentication
Wai Kit Wong, David W. Cheung, Edward Hung, Ben Kao, Nikos Mamoulis: Security in Outsourcing of Association Rule Mining. 111-122
Sabrina De Capitani di Vimercati, Sara Foresti, Sushil Jajodia, Stefano Paraboschi, Pierangela Samarati: Over-encryption: Management of Access Control Evolution on Outsourced Data. 123-134
Stavros Papadopoulos, Yin Yang, Dimitris Papadias: CADS: Continuous Authentication on Data Streams. 135-146
Feifei Li, Ke Yi, Marios Hadjieleftheriou, George Kollios: Proof-Infused Streams: Enabling Authentication of Sliding Window Queries On Streams. 147-158
Data Stream Processing
Nesime Tatbul, Ugur Çetintemel, Stanley B. Zdonik: Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing. 159-170
Aiyou Chen, Jin Cao, Tian Bu: A Simple and Efficient Estimation Method for Stream Expression Cardinalities. 171-182
Gautam Das, Dimitrios Gunopulos, Nick Koudas, Nikos Sarkas: Ad-hoc Top-k Query Answering for Data Streams. 183-194
Text Databases
Hongrae Lee, Raymond T. Ng, Kyuseok Shim: Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance. 195-206
Witold Litwin, Riad Mokadem, Philippe Rigaux, Thomas J. E. Schwarz: Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures. 207-218
Relational Models and Views
Skyline Query Processing
Michael D. Morse, Jignesh M. Patel, H. V. Jagadish: Efficient Skyline Computation over Low-Cardinality Domains. 267-278
Ken C. K. Lee, Baihua Zheng, Huajing Li, Wang-Chien Lee: Approaching the Skyline in Z Order. 279-290
Data Quality
Chen Li, Bin Wang, Xiaochun Yang: VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams. 303-314
Gao Cong, Wenfei Fan, Floris Geerts, Xibei Jia, Shuai Ma: Improving Data Quality: Consistency and Accuracy. 315-326
Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Raghav Kaushik: Example-driven design of efficient record matching queries. 327-338
Novel Architectures
Ryan Johnson, Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos, Kivanc Sabirli, Anastassia Ailamaki, Babak Falsafi: To Share or Not To Share? 351-362
Web Data Management and Search
Junghoo Cho, Uri Schonfeld: RankMass Crawler: A Crawler with High PageRank Coverage Guarantee. 375-386
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang: EntityRank: Searching Entities Directly and Holistically. 387-398
Pedro DeRose, Warren Shen, Fei Chen, AnHai Doan, Raghu Ramakrishnan: Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach. 399-410
Daniel J. Abadi, Adam Marcus, Samuel Madden, Katherine J. Hollenbach: Scalable Semantic Web Data Management Using Vertical Partitioning. 411-422
Time-Series Data Mining
Wook-Shin Han, Jinsoo Lee, Yang-Sae Moon, Haifeng Jiang: Ranked Subsequence Matching in Time-Series Databases. 423-434
Qiuxia Chen, Lei Chen, Xiang Lian, Yunhao Liu, Jeffrey Xu Yu: Indexable PLA for Efficient Similarity Search. 435-446
Xiaolei Li, Jiawei Han: Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data. 447-458
Spiros Papadimitriou, Feifei Li, George Kollios, Philip S. Yu: Time Series Compressibility and Privacy. 459-470
Top-k Queries and Ranking I
Man Lung Yiu, Nikos Mamoulis: Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data. 483-494
Reza Akbarinia, Esther Pacitti, Patrick Valduriez: Best Position Algorithms for Top-k Queries. 495-506
Yan Qi, K. Selçuk Candan, Maria Luisa Sapino: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs. 507-518
Private and Secure Databases
Vibhor Rastogi, Sungho Hong, Dan Suciu: The Boundary Between Privacy and Utility in Data Publishing. 531-542
Raymond Chi-Wing Wong, Ada Wai-Chee Fu, Ke Wang, Jian Pei: Minimality Attack in Privacy Preserving Data Publishing. 543-554
Qihua Wang, Ting Yu, Ninghui Li, Jorge Lobo, Elisa Bertino, Keith Irwin, Ji-Won Byun: On the Correctness Criteria of Fine-Grained Access Control in Relational Databases. 555-566
Spatial Databases
Daniel Zinn, Jim Bosch, Michael Gertz: Modeling and Querying Vague Spatial Objects Using Shapelets. 567-578
Raymond Chi-Wing Wong, Yufei Tao, Ada Wai-Chee Fu, Xiaokui Xiao: On Efficient Spatial Matching. 579-590
Laurynas Biveinis, Simonas Saltenis, Christian S. Jensen: Main-Memory Operation Buffering for Efficient R-Tree Update. 591-602
Business and Web Services
Catriel Beeri, Anat Eyal, Tova Milo, Alon Pilberg: Monitoring Business Processes with Queries. 603-614
Marko Vrhovnik, Holger Schwarz, Oliver Suhre, Bernhard Mitschang, Volker Markl, Albert Maier, Tobias Kraft: An Approach to Optimize Data Processing in Business Processes. 615-626
Dumitru Roman, Michael Kifer: Reasoning about the Behavior of Semantic Web Services with Concurrent Transaction Logic. 627-638
Information Integration I
Fei Xu, Chris Jermaine: Randomized Algorithms for Data Reconciliation in Wide Area Aggregate Query Processing. 639-650
Garrett Wolf, Hemal Khatri, Bhaumik Chokshi, Jianchun Fan, Yi Chen, Subbarao Kambhampati: Query Processing over Incomplete Autonomous Databases. 651-662
Marcos Antonio Vaz Salles, Jens-Peter Dittrich, Shant Kirakos Karakashian, Olivier René Girard, Lukas Blunschi: iTrails: Pay-as-you-go Information Integration in Dataspaces. 663-674
Information Integration II
Todd J. Green, Grigoris Karvounarakis, Zachary G. Ives, Val Tannen: Update Exchange with Mappings and Provenance. 675-686
Shui-Lung Chuang, Kevin Chen-Chuan Chang, ChengXiang Zhai: Context-Aware Wrapping: Synchronized Data Extraction. 699-710
Query Processing
Marcin Zukowski, Sándor Héman, Niels Nes, Peter A. Boncz: Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS. 723-734
Surajit Chaudhuri, Raghav Kaushik, Ravishankar Ramamurthy, Abhijit Pol: Stop-and-Restart Style Execution for Long Running Decision Support Queries. 735-745
Data Privacy, Anonymization, and Outsourcing
Tochukwu Iwuchukwu, Jeffrey F. Naughton: K-Anonymization as Spatial Indexing: Toward Scalable and Incremental Anonymization. 746-757
Gabriel Ghinita, Panagiotis Karras, Panos Kalnis, Nikos Mamoulis: Fast Data Anonymization with Low Information Loss. 758-769
Bee-Chung Chen, Raghu Ramakrishnan, Kristen LeFevre: Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge. 770-781
Novel Data Mining Applications
Hector Gonzalez, Jiawei Han, Xiaolei Li, Margaret Myslinska, John Paul Sondag: Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach. 794-805
Nilesh Bansal, Fei Chiang, Nick Koudas, Frank Wm. Tompa: Seeking Stable Clusters in the Blogosphere. 806-817
Cuiping Li, Anthony K. H. Tung, Wen Jin, Martin Ester: On Dominating Your Neighborhood Profitably. 818-829
Peter J. Haas, Fabian Hueske, Volker Markl: Detecting Attribute Dependencies from Query Feedback. 830-841
Sensor Networks and Information Dissemination
Adam Silberstein, Alan E. Gelfand, Kamesh Munagala, Gavino Puggioni, Jun Yang: Making Sense of Suppressions and Failures in Sensor Data: A Bayesian Approach. 842-853
Arnab Bhattacharya, Anand Meka, Ambuj K. Singh: MIST: Distributed Indexing and Querying in Sensor Networks using Statistical Models. 854-865
Mirella Moura Moro, Petko Bakalov, Vassilis J. Tsotras: Early Profile Pruning on XML-aware Publish/Subscribe Systems. 866-877
Badrish Chandramouli, Jeff Phillips, Jun Yang: Value-Based Notification Conditions in Large-Scale Publish/Subscribe Systems. 878-889
Top-k Queries and Ranking II
Ming Hua, Jian Pei, Ada Wai-Chee Fu, Xuemin Lin, Ho-fung Leung: Efficiently Answering Top-k Typicality Queries on Large Databases. 890-901
Karl Schnaitter, Joshua Spiegel, Neoklis Polyzotis: Depth Estimation for Ranking Query Optimization. 902-913
Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Nick Koudas: Anytime Measures for Top-k Algorithms. 914-925
Indexing and Search
Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, Dong-Qing Zhang, Xiaohui Gu: Towards Graph Containment Search and Indexing. 926-937
Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li: Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. 950-961
Distributed Data Management
Navendu Jain, Michael Dahlin, Yin Zhang, Dmitry Kit, Prince Mahajan, Praveen Yalagandula: STAR: Self-Tuning Aggregation for Scalable Monitoring. 962-973
Jorge-Arnulfo Quiané-Ruiz, Philippe Lamarre, Patrick Valduriez: SQLB: A Query Allocation Framework for Autonomous Consumers and Providers. 974-985
Christos Doulkeridis, Akrivi Vlachou, Yannis Kotidis, Michalis Vazirgiannis: Peer-to-Peer Similarity Search in Metric Spaces. 986-997
Schema and Structure Management
Geert Jan Bex, Frank Neven, Stijn Vansummeren: Inferring XML Schema Definitions from XML Data. 998-1009
Sven Helmer: Measuring the Structural Similarity of Semistructured Documents Using Entropy. 1022-1032
Information Extraction and Text
Warren Shen, AnHai Doan, Jeffrey F. Naughton, Raghu Ramakrishnan: Declarative Information Extraction Using Datalog with Embedded Extraction Predicates. 1033-1044
Eric Chu, Akanksha Baid, Ting Chen, AnHai Doan, Jeffrey F. Naughton: A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data. 1045-1056
Feng Shao, Lin Guo, Chavdar Botev, Anand Bhaskar, Muthiah Chettiar, Fan Yang, Jayavel Shanmugasundaram: Efficient Keyword Search over Virtual XML Views. 1057-1068
Query Optimization for Novel Applications
Michael Gibas, Ning Zheng, Hakan Ferhatosmanoglu: A General Framework for Modeling and Processing Optimization Queries. 1069-1080
Harish D., Pooja N. Darera, Jayant R. Haritsa: On the Production of Anorexic Plan Diagrams. 1081-1092
Stratos Papadomanolakis, Debabrata Dash, Anastassia Ailamaki: Efficient Use of the Query Optimizer for Automated Database Design. 1093-1104
Industrial, Application, and Experience Sessions
Decision Support
Stefan Krompass, Umeshwar Dayal, Harumi A. Kuno, Alfons Kemper: Dynamic Workload Management for Very Large Data Warehouses: Juggling Feathers and Bowling Balls. 1105-1115
Mingwu Zhang, Xiangyu Zhang, Xiang Zhang, Sunil Prabhakar: Tracing Lineage Beyond Relational Operators. 1116-1127
Fabio Casati, Malú Castellanos, Umeshwar Dayal, Norman Salazar: A Generic solution for Warehousing Business Process Data. 1128-1137
Meikel Pöss, Raghunath Othayoth Nambiar, David Walrath: Why You Should Run TPC-DS: A Workload Analysis. 1138-1149
Invited Talks
Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, Pat Helland: The End of an Architectural Era (It's Time for a Complete Rewrite). 1150-1160
Michael L. Brodie: Computer Science 2.0: A New World of Data Management. 1161
Data Streams
Sankar Subramanian, Srikanth Bellamkonda, Hua-Gang Li, Vince Liang, Lei Sheng, Wayne Smith, James Terry, Tsae-Feng Yu, Andrew Witkowski: Continuous Queries in Oracle. 1173-1184
Kun-Lung Wu, Philip S. Yu, Bugra Gedik, Kirsten Hildrum, Charu C. Aggarwal, Eric Bouillet, Wei Fan, David George, Xiaohui Gu, Gang Luo, Haixun Wang: Challenges and Experience in Prototyping a Multi-Modal Stream Analytic and Monitoring Application on System S. 1185-1196
Query Processing Engines
Bishwaranjan Bhattacharjee, Timothy Malkemus, Sherman Lau, Sean Mckeough, Jo-Anne Kirton, Robin Von Boeschoten, John Kennedy: Efficient Bulk Deletes for Multi Dimensionally Clustered Tables in DB2. 1197-1206
Ying Hu, Seema Sundara, Jagannathan Srinivasan: Supporting Time-Constrained SQL Queries in Oracle. 1207-1218
Rubao Lee, Minghong Zhou, Huaming Liao: Request Window: an Approach to Improve Throughput of RDBMS-based Data Integration System by Utilizing Data Sharing Across Concurrent Distributed Queries. 1219-1230
Nicola Onose, Vinayak R. Borkar, Michael J. Carey: Inverse Functions in the AquaLogic Data Services Platform. 1231-1242
Profiling
Hardik Bati, Leo Giakoumakis, Steve Herbert, Aleksandras Surna: A genetic approach for random testing of database systems. 1243-1251
Surajit Chaudhuri, Vivek R. Narasayya, Manoj Syamala: Bridging the Application and DBMS Profiling Divide for Database Application Developers. 1252-1262
Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, S. Sudarshan: Automating the Detection of Snapshot Isolation Anomalies. 1263-1274
Engine Infrastructure
Bugra Gedik, Rajesh Bordawekar, Philip S. Yu: CellSort: High Performance Sorting on the Cell Processor. 1286-1297
Christian A. Lang, Bishwaranjan Bhattacharjee, Timothy Malkemus, Kwai Wong: Increasing Buffer-Locality for Multiple Index Based Scans through Intelligent Placement and Index Scan Speed Control. 1298-1309
Demo Sessions
Demo Group I
Fusheng Wang, Pierre-Emmanuel Bourgue, Georg Hackenberg, Mo Wang, David Kaltschmidt, Peiya Liu, Cornelius Rabsch, Patrick Kling, Gerald Madlmayr, John Pearson, Joe Carpinelli: SciPort: An Adaptable Scientific Data Integration Platform for Collaborative Scientific Research. 1310-1313
Tianyi Wu, Xiaolei Li, Dong Xin, Jiawei Han, Jacob Lee, Ricardo Redder: DataScope: Viewing Database Contents in Google Maps' Way. 1314-1317
Fabien Duchateau, Zohra Bellahsene, Ela Hunt: XBenchMatch: a Benchmark for XML Schema Matching Tools. 1318-1321
David Kensche, Christoph Quix, Xiang Li, Yong Li: GeRoMeSuite: A System for Holistic Generic Model Management. 1322-1325
Laura Chiticariu, Mauricio A. Hernández, Phokion G. Kolaitis, Lucian Popa: Semi-Automatic Schema Integration in Clio. 1326-1329
Philippe Cudré-Mauroux, Suchit Agarwal, Adriana Budura, Parisa Haghani, Karl Aberer: Self-Organizing Schema Mappings in the GridVine Peer Data Management System. 1334-1337
Ullas Nambiar, Himanshu Gupta, Mukesh K. Mohania: CallAssist: Helping Call Center Agents in Preference Elicitation. 1338-1341
Radu Sion, Sumeet Bajaj, Bogdan Carbunar, Stefan Katzenbeisser: NS2: Networked Searchable Store with Correctness. 1342-1345
Christophe Salperwyck, Nicolas Anciaux, Mehdi Benzine, Luc Bouganim, Philippe Pucheral, Dennis Shasha: GhostDB: Hiding Data from Prying Eyes. 1346-1349
Demo Group II
Jens Bleiholder, Karsten Draba, Felix Naumann: FuSem - Exploring Different Semantics of Data Fusion. 1350-1353
Christoph Brochhaus, Thomas Seidl: IndeGS: Index Supported Graphics Data Server for CFD Data Postprocessing. 1354-1357
Michael Minock: A STEP Towards Realizing Codd's Vision of Rendezvous with the Casual User. 1358-1361
Olivier Biton, Sarah Cohen Boulakia, Susan B. Davidson: Zoom*UserViews: Querying Relevant Provenance in Workflow Systems. 1366-1369
Mehmet Altinel, Paul Brown, Susan Cline, Rajesh Kartha, Eric Louie, Volker Markl, Louis Mau, Yip-Hing Ng, David E. Simmen, Ashutosh Singh: DAMIA - A Data Mashup Fabric for Intranet Applications. 1370-1373
Heng Tao Shen, Xiaofang Zhou, Zi Huang, Jie Shao, Xiangmin Zhou: UQLIPS: A Real-time Near-duplicate Video Clip Detection System. 1374-1377
Christoph Koch, Stefanie Scherzinger, Michael Schmidt: The GCX System: Dynamic Buffer Minimization in Streaming XQuery Evaluation. 1378-1381
Tobias Kraft: A Cost-Estimation Component for Statement Sequences. 1382-1385
Demo Group III
Serge Abiteboul, Itay Dar, Radu Pop, Gabriel Vasile, Dan Vodislav, Nicoleta Preda: Large Scale P2P Distribution of Open-Source Software. 1390-1393
Tobias Scholl, Bernhard Bauer, Benjamin Gufler, Richard Kuntschke, Daniel Weber, Angelika Reiser, Alfons Kemper: HiSbase: Histogram-based P2P Main Memory Data Management. 1394-1397
Josiane Xavier Parreira, Sebastian Michel, Matthias Bender, Tom Crecelius, Gerhard Weikum: P2P Authority Analysis for Social Communities. 1398-1401
Sandeep Tata, Willis Lang, Jignesh M. Patel: Periscope/SQ: Interactive Exploration of Biological Sequence Databases. 1406-1409
Nilesh Bansal, Nick Koudas: BlogScope: A System for Online Analysis of High Volume Text Streams. 1410-1413
Klaus Berberich, Srikanta J. Bedathur, Thomas Neumann, Gerhard Weikum: FluxCapacitor: Efficient Time-Travel Text Search. 1414-1417
Wen-Syan Li, Dengfeng Gao, Rafae Bhatti, Inderpal Narang, Hirofumi Matsuzawa, Masayuki Numao, Masahiro Ohkawa, Takeshi Fukuda: Deadline and QoS Aware Data Warehouse. 1418-1421
Lyublena Antova, Christoph Koch, Dan Olteanu: Query language support for incomplete information in the MayBMS system. 1422-1425
Tutorials
Zachary G. Ives, Amol Deshpande, Vijayshankar Raman: Adaptive query processing: Why, How, When, and What Next? 1426-1427
Peter Baumann: Raster databases. 1428-1429
Ling Liu: From Data Privacy to Location Privacy: Models and Algorithms. 1429-1430
Radu Sion: Secure Data Outsourcing. 1431-1432
Amol Deshpande, Sunita Sarawagi: Probabilistic Graphical Models and their Role in Databases. 1435-1436
Mariano P. Consens, Ricardo A. Baeza-Yates, Mounia Lalmas, Sihem Amer-Yahia: XML retrieval: db/ir in theory, web in practice. 1437-1438
Philip A. Bernstein, Howard Ho: Model Management and Schema Mappings: Theory and Practice. 1439-1440
Panels
Ioana Manolescu, Stefan Manegold: Performance Evaluation and Experimental Assessment - Conscience or Curse of Database Research? 1441-1442