With a busy fall full of development and some travel, I wasn’t able to keep up with monthly or bi-monthly reports, so this report covers highlights of the Ursa Labs team’s work from August until now. With 2019 nearly behind us, this also gives us a moment to reflect on everything that’s been accomplished this year and look forward to next year. We hope to write a blog post soon discussing the big picture plan ahead for 2020.
Development Highlights
Reading and Writing Data
- C++ Datasets API: read multi-file, partitioned datasets as a stream/sequence of Arrow columnar batches with minimal exposure of physical details. Enable execution of rudimentary queries (involving scan, filter, and projection) against Arrow Datasets.
- C++ Filesystem API: added support for S3-compatible data stores and ported the HDFS interface to use the common virtual filesystem API.
- Parquet improvements: direct-to-DictionaryArray reads, faithful reading and writing of pandas Categorical data. See the blog post for more.
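To make the scan/filter/project idea concrete, here is a toy sketch in plain Python. It is not the Arrow C++ Datasets API: the function names, the in-memory "files" mapping, and the hive-style key=value directory convention are all assumptions made for illustration. It shows how partition keys parsed from file paths can prune whole files before any data is read, and how a row filter and a column projection are then applied batch by batch.

```python
from typing import Callable, Dict, Iterator, List

# A "row" is a dict of column name -> value; a batch is a list of rows (hypothetical model).
Row = Dict[str, object]

def parse_partition_keys(path: str) -> Dict[str, str]:
    """Extract key=value pairs from directory segments, e.g. 'year=2019/part-0'."""
    keys: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            key, value = segment.split("=", 1)
            keys[key] = value
    return keys

def scan(files: Dict[str, List[List[Row]]],
         partition_filter: Callable[[Dict[str, str]], bool],
         row_filter: Callable[[Row], bool],
         columns: List[str]) -> Iterator[List[Row]]:
    """Yield filtered, projected batches, skipping files whose partition keys fail the filter."""
    for path in sorted(files):
        keys = parse_partition_keys(path)
        if not partition_filter(keys):
            continue  # prune the whole file without touching its batches
        for batch in files[path]:
            # Partition keys become virtual columns alongside the stored data.
            rows = [{**keys, **row} for row in batch]
            kept = [{c: r[c] for c in columns} for r in rows if row_filter(r)]
            if kept:
                yield kept

files = {
    "data/year=2019/month=08/part-0": [[{"x": 1}, {"x": 5}]],
    "data/year=2019/month=09/part-0": [[{"x": 7}], [{"x": 2}]],
    "data/year=2018/month=12/part-0": [[{"x": 9}]],
}

batches = list(scan(files,
                    partition_filter=lambda k: k.get("year") == "2019",
                    row_filter=lambda r: r["x"] > 1,
                    columns=["month", "x"]))
```

Partition values come back as strings in this sketch; a real implementation would also need to infer or declare their types.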
We also worked with the Arrow community to come up with a 1.0.0 version plan and the backward and forward compatibility guarantees for the Arrow columnar format that will apply from then on.
End-user Features
- R bindings include convenience methods for reading Parquet and other file types, as well as lower-level classes and methods that wrap the C++ objects. See the package vignette for details. dplyr methods for querying Arrow Tables and Datasets in R will be included in the next release.
- Integration between Arrow “extension types” and pandas’s extension (custom) arrays.
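One piece of the pandas integration is the __arrow_array__ protocol added in ARROW-3829 (listed in the changelog below): when pyarrow converts an object that defines this method, it lets the object produce its own Arrow array. The sketch below imitates only that dispatch logic without depending on pyarrow; to_columnar and PointArray are hypothetical names, and the dict it returns stands in for a real pyarrow.Array.

```python
from typing import Any, Dict, List, Optional, Tuple

def to_columnar(obj: Any, type: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical converter: defer to the object's __arrow_array__ hook if present."""
    hook = getattr(obj, "__arrow_array__", None)
    if hook is not None:
        return hook(type=type)
    # Fallback: treat the object as a plain sequence of Python values.
    return {"type": type or "inferred", "values": list(obj)}

class PointArray:
    """Stand-in for a third-party (e.g. pandas extension) array of (x, y) points."""

    def __init__(self, points: List[Tuple[int, int]]):
        self._xs = [p[0] for p in points]
        self._ys = [p[1] for p in points]

    def __arrow_array__(self, type=None):
        # The array picks its own storage layout: here, a struct of two child columns.
        return {"type": "struct<x, y>", "children": {"x": self._xs, "y": self._ys}}

    def __iter__(self):
        return iter(zip(self._xs, self._ys))

converted = to_columnar(PointArray([(1, 2), (3, 4)]))
plain = to_columnar([1, 2, 3])
```

The point of the design is that the converter never needs to know about PointArray: the third-party class opts in by implementing the hook, and everything else falls back to generic sequence conversion.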
Packaging and Testing
- R package delivery: now available for download on CRAN; documentation website available; nightly binary packages for macOS and Windows published
- Ported the project’s continuous integration to GitHub Actions for better maintainability and turnaround time (see ARROW-7101). As part of this, all of our Linux CI tasks have been migrated to use Docker Compose for better local reproducibility. Reproducing macOS or Windows builds locally still requires some effort; we may improve this in the future.
- Implemented a nightly e-mail summary of failing test jobs that are run once a day (like package builds) instead of on every commit. This has significantly improved our awareness of failing jobs.
- Numerous improvements to the C++ build system to enable a “zero build dependency” core build: most optional project components are now disabled by default, yielding a simpler, faster default build with no external third-party dependencies. The project’s dependence on Boost has been significantly reduced.
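The nightly summary boils down to collecting each scheduled job’s final status and reporting only the failures. Here is a minimal sketch, assuming a hypothetical job-name-to-status mapping and a plain-text output format; the actual Crossbow report format and job names are not reproduced here.

```python
from collections import Counter
from typing import Dict

def summarize(results: Dict[str, str], date: str) -> str:
    """Build a plain-text digest of one night's job results (status strings are hypothetical)."""
    counts = Counter(results.values())
    # Anything other than "success" is treated as a failure worth reporting.
    failing = sorted(name for name, status in results.items() if status != "success")
    lines = [f"Arrow nightly build report for {date}",
             f"{counts['success']} succeeded, {len(failing)} failed"]
    lines += [f"  FAILED: {name} ({results[name]})" for name in failing]
    return "\n".join(lines)

report = summarize({"wheel-osx": "success", "conda-linux": "failure",
                    "r-windows": "error", "deb-buster": "success"}, "2019-09-10")
print(report)
```

Listing only failures keeps the e-mail short enough to read every morning, which is what makes it useful for awareness.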
Releases
- 0.15.0 major release on October 5, 2019 (announcement)
- 0.15.1 patch release on November 1, 2019
Talks and Blog Posts
Here are some talks and publications from the team during this period:
- Introducing Apache Arrow Flight: A Framework for Fast Data Transport
- R package initial CRAN release
- Parquet improvements and benchmarks blog posts
- VLDB 2019 Apache Arrow Tutorial: Materials and Slides
- OmniSci Converge: Slides
Arrow Maintenance
There were 949 overall commits to Apache Arrow during this period. Ursa Labs was responsible for merging 606 of them, or about 64% of the project’s overall patch maintenance.
Team Changelog
The team had 468 commits merged into Apache Arrow during this period. That is 49% of the project’s overall commits.
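For reference, the shares quoted above work out as follows; the three inputs are taken directly from the text, and nothing else is assumed.

```python
total_commits = 949     # all commits to Apache Arrow during the period
merged_by_ursa = 606    # commits merged by Ursa Labs
authored_by_team = 468  # commits authored by the team

merge_share = merged_by_ursa / total_commits
author_share = authored_by_team / total_commits
print(f"{merge_share:.1%} merged, {author_share:.1%} authored")
```

This prints 63.9% merged and 49.3% authored.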
Here are the most frequently occurring categories of issues:
- C++: 220 issues
- CI: 49 issues
- Parquet: 20 issues
- Python: 134 issues
- R: 54 issues
You can click on the ASF JIRA links to learn more about the discussion on a particular issue, or on the commit hash to see each patch.
- 2019-08-01: ARROW-6000: [Python] Add support for LargeString and LargeBinary types (eb73b9 by pitrou)
- 2019-08-01: ARROW-6077: [C++][Parquet] Build Arrow schema tree from Parquet schema to help with nested data implementation (06fd2d by wesm)
- 2019-08-01: ARROW-5974: [C++] Support reading concatenated compressed streams (a39f7b by pitrou)
- 2019-08-01: ARROW-6068: [C++] Allow passing Field instances to StructArray::Make (d44f03 by pitrou)
- 2019-08-01: ARROW-5414: [C++] default to release build on windows (248f2c by bkietz)
- 2019-08-02: ARROW-6002: [C++][Gandiva] test casting int64 to decimal (4f520a by bkietz)
- 2019-08-02: ARROW-5527: [C++] Uses Buffer/Builder in HashTable and MemoTable (6f9880 by fsaintjacques)
- 2019-08-05: ARROW-6108: [C++] Workaround Windows CRT crash on invalid locale (134e9c by pitrou)
- 2019-08-05: ARROW-3325: [Python][Parquet] Add read_dictionary argument to parquet.read_table, ParquetDataset to enable direct-to-DictionaryArray reads (7aefa5 by wesm)
- 2019-08-05: ARROW-3204: [R] Enable R package to be made available on CRAN (60c935 by nealrichardson)
- 2019-08-05: ARROW-6135: [C++] Make KeyValueMetadata::Equals() order-insensitive (cf90ea by pitrou)
- 2019-08-05: ARROW-3325: [Python][FOLLOWUP] In Python 2.7, a class’s __doc__ member is not writable (#5018) (42f4f3 by wesm)
- 2019-08-06: ARROW-6084: [Python] Support LargeList (2774cf by pitrou)
- 2019-08-07: ARROW-6060: [C++] ChunkedBinaryBuilder should only grow when necessary, address runaway memory use in Parquet binary column read (49badd by bkietz)
- 2019-08-08: ARROW-6082: [Python] check type of the index_type passed to pa.dictionary() (c675df by jorisvandenbossche)
- 2019-08-08: ARROW-6152: [C++][Parquet] Add parquet::ColumnWriter::WriteArrow method, refactor (b4c176 by wesm)
- 2019-08-08: ARROW-6041: [Website] Blog post announcing R library availability on CRAN (d63fe6 by nealrichardson)
- 2019-08-08: ARROW-6132: [Python] validate result in ListArray.from_arrays (3e6d75 by jorisvandenbossche)
- 2019-08-08: ARROW-6167: [R] macOS binary R packages on CRAN don’t have arrow_available (13f5e9 by nealrichardson)
- 2019-08-08: ARROW-6142: [R] Install instructions on linux could be clearer (908b05 by nealrichardson)
- 2019-08-08: ARROW-6121: [Tools] Improve merge tool ergonomics (9b3e69 by fsaintjacques)
- 2019-08-12: ARROW-5977: [C++] [Python] Allow specifying which columns to include (93688e by pitrou)
- 2019-08-13: ARROW-5746: [Website] Move website source out of apache/arrow (84254c by nealrichardson)
- 2019-08-13: ARROW-6181: [R] Only allow R package to install without libarrow on linux (cb9f71 by nealrichardson)
- 2019-08-13: ARROW-517: [C++] array comparison, uses D**2 space Myers (cba9c7 by bkietz)
- 2019-08-14: ARROW-6177: [C++] Add Array::Validate() (13851d by pitrou)
- 2019-08-14: ARROW-5559: [C++] Add an IpcOptions structure (cef7e8 by pitrou)
- 2019-08-14: ARROW-6224: [Python] fix deprecated usage of .data (previously Column.data) (c7b937 by jorisvandenbossche)
- 2019-08-15: ARROW-6038: [C++] Faster type equality (91e33d by pitrou)
- 2019-08-15: ARROW-6237: [R] Allow compilation flags to be passed for R package with ARROW_R_CXXFLAGS (26666d by wesm)
- 2019-08-15: ARROW-6180: [C++][Parquet] Add RandomAccessFile::GetStream that returns InputStream that reads a file segment independent of the file’s state, fix concurrent buffered Parquet column reads (2c808a by wesm)
- 2019-08-15: ARROW-6259: [C++] Add -Wno-extra-semi-stmt when compiling with clang 8 to work around Flatbuffers bug, suppress other new LLVM 8 warnings (fb8cb8 by wesm)
- 2019-08-15: ARROW-5952: [Python] fix conversion of chunked dictionary array with 0 chunks (5479d3 by jorisvandenbossche)
- 2019-08-16: ARROW-6170: [R] Faster docker-compose build (ea9106 by pitrou)
- 2019-08-16: ARROW-3246: [C++][Python][Parquet] Direct writing of DictionaryArray to Parquet columns, automatic decoding to Arrow (2ba056 by wesm)
- 2019-08-17: ARROW-5028: [Python] Avoid malformed ListArray types caused by reaching StringBuilder capacity when converting from Python sequence (7cc6f7 by wesm)
- 2019-08-17: ARROW-5085: [C++][Parquet][Python] Do not allow reading to dictionary type unless we have implemented support for it (abeb7a by wesm)
- 2019-08-19: ARROW-4648: [C++] Use underscores in source file names (438a14 by pitrou)
- 2019-08-19: ARROW-6258: [R] Add macOS build scripts (040511 by nealrichardson)
- 2019-08-19: ARROW-5480: [Python] Add unit test asserting specifically that pandas.Categorical roundtrips to Parquet format without special options (c4b8cb by wesm)
- 2019-08-19: ARROW-3652: [Python][Parquet] Add unit test exhibiting that pandas.CategoricalIndex survives roundtrip to Parquet format (6efba8 by wesm)
- 2019-08-19: PARQUET-1640: [C++] Fix crash in parquet-encoding-benchmark (747b47 by pitrou)
- 2019-08-20: ARROW-5985: [Developer] Do not suggest setting Fix Version for patch releases by default (586ef2 by wesm)
- 2019-08-20: ARROW-6182: [R] Add note to README about r-arrow conda installation (6d4948 by nealrichardson)
- 2019-08-20: ARROW-5134: [R][CI] Run nightly tests against multiple R versions (84374c by nealrichardson)
- 2019-08-20: ARROW-6095: [C++] Fix unit test build when only building static libraries, add cpp-static-only to tests.yml (4042d7 by bkietz)
- 2019-08-20: ARROW-6049: [C++] Support view from one dictionary type to another in Array::View (a40872 by wesm)
- 2019-08-20: ARROW-6161: [C++][Dataset] Implements ParquetFragment (721e6f by fsaintjacques)
- 2019-08-20: ARROW-5992: [C++][Python] Support String->Binary in Array::View. Add Python bindings for Array::View (1a085f by wesm)
- 2019-08-20: ARROW-6067: [Python] Fix failing large memory Python tests (5da8ae by wesm)
- 2019-08-20: ARROW-6048: [C++] Add ChunkedArray::View method that dispatches to Array::View (f8b742 by wesm)
- 2019-08-20: ARROW-5966: [Python] Also use ChunkedStringBuilder when converting NumPy string types to Arrow StringType (66e2b8 by wesm)
- 2019-08-20: ARROW-6046: [C++] Do not write excess varbinary offsets in IPC messages from sliced BinaryArray (7f6e6a by wesm)
- 2019-08-20: ARROW-6125: [Python] Remove Python APIs deprecated in 0.14.x and prior (980269 by wesm)
- 2019-08-21: ARROW-5888: [C++][Parquet][Python] Restore timezone metadata when original Arrow schema has been stored in Parquet metadata (c01c1c by wesm)
- 2019-08-21: ARROW-6278: [R] Read parquet files from raw vector (277f79 by nealrichardson)
- 2019-08-21: ARROW-6058: [C++][Parquet] Validate whole ColumnChunk raw data reads so that underlying filesystem issues are caught earlier (527df7 by wesm)
- 2019-08-21: ARROW-6159: [C++] Properly indent first line of PrettyPrint with Schema (a98e82 by wesm)
- 2019-08-21: ARROW-6174: [C++] Validate chunks in ChunkedArray::Validate. Fix validation of sliced ListArray, values null checks (7eb9a1 by wesm)
- 2019-08-21: ARROW-6092: [Python] Fix C++ arrow-python-test on Python 2.7 (e0910a by wesm)
- 2019-08-22: ARROW-6291: [C++] Do not override ARROW_PARQUET if other PARQUET options are enabled (62fd70 by wesm)
- 2019-08-22: ARROW-6183: [R] Document that you don’t have to use tidyselect if you don’t want (3c9236 by nealrichardson)
- 2019-08-22: ARROW-6178: [Developer] Keep prompting for authors in merge script for multi-author PRs if given bad input (ad568a by wesm)
- 2019-08-22: ARROW-3531: [Python] add Schema.field() method / deprecate field_by_name (6cbaf7 by jorisvandenbossche)
- 2019-08-23: ARROW-4848: [C++] Static libparquet not compiled with -DARROW_STATIC on Windows (c05024 by nealrichardson)
- 2019-08-23: ARROW-6227: [Python] Apply from_pandas option in pyarrow.array consistently across types (a3da22 by wesm)
- 2019-08-23: ARROW-6126: [C++] Return error when an IPC stream terminates in the middle of receiving dictionaries (1cf984 by wesm)
- 2019-08-23: ARROW-5686: [R] Review R Windows CI build (dfe34c by nealrichardson)
- 2019-08-23: ARROW-6325: [Python] fix conversion of strided boolean arrays (16bf32 by jorisvandenbossche)
- 2019-08-23: [Developer] Fix merge script regression, use default primary author if empty input (b31f22 by wesm)
- 2019-08-24: ARROW-5910: [Python] Support non-seekable streams in ipc.read_tensor, ipc.read_message, add Message.serialize_to method (03186b by wesm)
- 2019-08-24: ARROW-6279: [Python] Add Table.slice, __getitem__ support to match RecordBatch, Array, others (5a53e3 by wesm)
- 2019-08-25: ARROW-5906: [CI] Turn off ARROW_VERBOSE_THIRDPARTY_BUILD by default in Docker builds (c2e906 by wesm)
- 2019-08-25: ARROW-6238: [C++][Dataset] Implement SimpleDataSource, SimpleDataFragment and SimpleScanTask (e0fa3d by fsaintjacques)
- 2019-08-26: ARROW-6301: [C++][Python] Prevent ExtensionType-related race condition in Python process teardown by exposing shared_ptr to global ExtensionTypeRegistry (c9bd6d by wesm)
- 2019-08-27: ARROW-6323: [R] Expand file paths when passing to readers (6a7f07 by nealrichardson)
- 2019-08-27: ARROW-6338: [R] Type function names don’t match type names (cef5f3 by nealrichardson)
- 2019-08-27: ARROW-6363: [R] segfault in Table__from_dots with unexpected schema (3690fb by nealrichardson)
- 2019-08-27: ARROW-6229: [C++][Dataset] implement FileSystemBasedDataSource (17a070 by bkietz)
- 2019-08-27: ARROW-6364: [R] Handling unexpected input to time64() et al: (31bdc0 by nealrichardson)
- 2019-08-27: ARROW-3829: [Python] add __arrow_array__ protocol to support third-party array classes in conversion to Arrow (38401a by jorisvandenbossche)
- 2019-08-28: ARROW-6376: [Developer] Use target ref of PR when merging instead of hard-coding master (63dbc1 by wesm)
- 2019-08-28: ARROW-4648: [Doc] Add documentation about C++ file naming (443ac0 by pitrou)
- 2019-08-28: ARROW-6263: [Python] Use RecordBatch::Validate in RecordBatch.from_arrays. Normalize API vs. Table.from_arrays. Add record_batch factory function (05bc63 by wesm)
- 2019-08-28: ARROW-4511: [Format][Docs] Revamp Format documentation, consolidate columnar format docs into a more coherent single document. Add Versioning/Stability page (67d46c by wesm)
- 2019-08-28: ARROW-6354: [C++] Fix failing build when ARROW_PARQUET=OFF (a1dbba by fsaintjacques)
- 2019-08-28: ARROW-5522: [Packaging][Documentation] Comments out of date in python/manylinux1/build_arrow.sh (e5ccef by kszucs)
- 2019-08-29: ARROW-453: [C++] Filesystem implementation for Amazon S3 (7ec173 by pitrou)
- 2019-08-29: ARROW-6381: [C++] BufferOutputStream::Write does extra work that slows down small writes (b9d8cd by wesm)
- 2019-08-29: ARROW-6384: [C++] Bump dependency versions (ab712d by pitrou)
- 2019-08-29: ARROW-6348: [R] arrow::read_csv_arrow namespace error when package not loaded (bcf589 by nealrichardson)
- 2019-08-30: ARROW-6231: [C++] Allow generating CSV column names (beea8f by pitrou)
- 2019-08-30: ARROW-4095: [C++] Optimize DictionaryArray::Transpose() for trivial transpositions (5a8285 by pitrou)
- 2019-08-31: ARROW-2769: [Python] Deprecate and rename add_metadata methods (c2762a by kszucs)
- 2019-08-31: ARROW-6397: [C++][CI] Generate minio server connect string (0b41e5 by fsaintjacques)
- 2019-08-31: ARROW-5300: [C++] Remove the ARROW_NO_DEFAULT_MEMORY_POOL macro (a1b477 by fsaintjacques)
- 2019-08-31: ARROW-4398: [C++][Python][Parquet] Improve BYTE_ARRAY PLAIN encoding write performance. Add BYTE_ARRAY write benchmarks (2164e3 by wesm)
- 2019-09-01: ARROW-6406: [C++] Fix jemalloc URL for offline build in thirdparty/versions.txt (7d63df by wesm)
- 2019-09-02: ARROW-6411: [Python][Parquet] Improve performance of DictEncoder::PutIndices (ab908c by wesm)
- 2019-09-03: ARROW-6424: [C++] Fix IPC fuzzing test name (327057 by pitrou)
- 2019-09-03: ARROW-6269: [C++] check decimal precision in IPC code (9517ad by pitrou)
- 2019-09-03: ARROW-5610: [Python] define extension types in Python (c39e35 by jorisvandenbossche)
- 2019-09-03: ARROW-6412: [C++] Improve TCP port allocation in tests (561f86 by pitrou)
- 2019-09-04: ARROW-6423: [C++] Fix crash when trying to instantiate Snappy CompressedOutputStream (6b714a by pitrou)
- 2019-09-04: ARROW-6415: [R] Remove usage of R CMD config CXXCPP (4a7dd4 by nealrichardson)
- 2019-09-04: ARROW-6358: [C++] Add FileSystem::DeleteDirContents (d81829 by pitrou)
- 2019-09-04: ARROW-4836: [C++] Support Tell() on compressed streams (26f631 by pitrou)
- 2019-09-04: ARROW-6450: [C++] Use 2x reallocation strategy in BufferBuilder instead of 1.5x (ea309d by wesm)
- 2019-09-04: ARROW-6432: [CI][Crossbow] Remove alpine nightly crossbow jobs (c0656a by kszucs)
- 2019-09-04: ARROW-6454: [LICENSE] Add LLVM’s license due to static linkage (131ae4 by kszucs)
- 2019-09-04: ARROW-6431: [Python] Test suite fails without pandas installed (243d48 by kszucs)
- 2019-09-04: ARROW-6457: [C++] Always set CMAKE_BUILD_TYPE if it is not defined (314e9f by wesm)
- 2019-09-05: ARROW-6242: [C++][Dataset] Implement Dataset, Scanner and ScannerBuilder (2620ed by fsaintjacques)
- 2019-09-05: ARROW-6385: [C++] Use xxh3 instead of custom hashing code for non-tiny strings (b829f5 by pitrou)
- 2019-09-05: ARROW-6433: [Java][CI] Fix java docker image (40d08a by fsaintjacques)
- 2019-09-05: ARROW-5558: [C++] Support Array::View on arrays with non-zero offset (4a5d10 by wesm)
- 2019-09-05: ARROW-6417: [C++][Parquet] Miscellaneous optimizations yielding slightly better Parquet binary read performance (45e41c by wesm)
- 2019-09-05: ARROW-6453: [C++] More informative error messages with S3 (d2be6a by pitrou)
- 2019-09-05: ARROW-6443: [CI][Crossbow] Nightly conda osx builds fail (5931d5 by kszucs)
- 2019-09-05: ARROW-6447: [C++] Allow rest of arrow_objlib to build in parallel while memory_pool.cc is waiting on jemalloc_ep (8cfa16 by wesm)
- 2019-09-06: ARROW-6171: [R][CI] Fix R library search path (d7ef11 by fsaintjacques)
- 2019-09-06: ARROW-5292: [C++] Work around symbol visibility issues so building static libraries is not necessary when building unit tests on WIN32 platform (1137de by wesm)
- 2019-09-06: ARROW-6476: [Java][CI] Fix java docker build script (a89300 by fsaintjacques)
- 2019-09-06: ARROW-6369: [C++] Handle Array.to_pandas case for type=list (c0dbf7 by wesm)
- 2019-09-06: ARROW-6478: [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues (53c5af by wesm)
- 2019-09-06: ARROW-6475: [C++] Don’t try to dictionary encode dictionary arrays (b8ebc9 by bkietz)
- 2019-09-06: ARROW-6120: [C++] Forbid use of in public header files (200e30 by wesm)
- 2019-09-06: ARROW-6435: [Python] Use pandas null coding consistently on List and Struct types (e29e26 by wesm)
- 2019-09-07: ARROW-4880: [Python] Rehabilitate ASV benchmark build scripts (03e6c0 by wesm)
- 2019-09-07: ARROW-3933: [C++][Parquet] Handle non-nullable struct children when reading Parquet file, better error messages (1f893a by wesm)
- 2019-09-08: ARROW-6477: [Packaging][Crossbow] Use Azure Pipelines to build linux packages (6ed87b by kszucs)
- 2019-09-08: ARROW-6446: [OSX][Python][Wheel] Turn off ORC feature in the wheel building scripts (0158ae by kszucs)
- 2019-09-09: ARROW-5743: [C++] Add cmake option and macros for enabling large memory tests (c0baff by bkietz)
- 2019-09-09: ARROW-6292: [C++] Add option to use the mimalloc allocator (a32112 by pitrou)
- 2019-09-09: ARROW-6300: [C++] Add Abort() method to streams (dd29b0 by pitrou)
- 2019-09-09: ARROW-3651: [Python] Handle ‘datetime’ logical type when reconstructing pandas columns from custom metadata (c97c64 by wesm)
- 2019-09-09: ARROW-6413: [R] Support autogenerating column names (4f7ead by nealrichardson)
- 2019-09-09: ARROW-6492: [Python] Handle pandas_metadata created by fastparquet with missing field_name (44e7f1 by jorisvandenbossche)
- 2019-09-09: ARROW-5374: [Python][C++] Improve ipc.read_record_batch docstring, fix IPC message type error messages generated in C++ (9c2694 by wesm)
- 2019-09-09: ARROW-6368: [C++][Dataset] Add interface for projecting RecordBatch from one schema to another, inserting null values where needed (92f16e by bkietz)
- 2019-09-09: ARROW-3762: [Python] Add large_memory unit test exercising BYTE_ARRAY overflow edge cases from ARROW-3762 (74d829 by wesm)
- 2019-09-10: ARROW-6480: [Crossbow] Summary report e-mailer with polling logic (3f2a33 by kszucs)
- 2019-09-10: ARROW-6481: [C++] Avoid copying large ConvertOptions (b1025c by pitrou)
- 2019-09-10: ARROW-5505: [R] Normalize file and class names, stop masking base R functions, add vignette, improve documentation (9dec79 by nealrichardson)
- 2019-09-10: ARROW-5646: [Crossbow][Documentation] Move the user guide to the Sphinx documentation (6d25df by kszucs)
- 2019-09-11: ARROW-6243: [C++][Dataset] Filter expressions (d2b6d1 by bkietz)
- 2019-09-11: ARROW-6015: [Python] Add note to python/README.md about installing Visual C++ Redistributable on Windows when using pip (2cedd6 by wesm)
- 2019-09-11: ARROW-6522: [Python] Fix failing pandas tests on older pandas / older python (a6b118 by jorisvandenbossche)
- 2019-09-11: ARROW-6506: [C++] Fix validation of ExtensionArray with struct storage type (1d2738 by jorisvandenbossche)
- 2019-09-11: ARROW-6524: [Developer][Packaging] Nightly build report’s subject should contain Arrow (3437c9 by kszucs)
- 2019-09-11: ARROW-5450: [Python] Always return datetime.datetime in TimestampValue.as_py for units other than nanoseconds (6f7245 by wesm)
- 2019-09-12: ARROW-6488: [Python] fix equality with pyarrow.NULL to return NULL (40718b by jorisvandenbossche)
- 2019-09-12: ARROW-6531: [Python] Add detach() method to buffered streams (0c3915 by pitrou)
- 2019-09-12: ARROW-4220: [Python] Add buffered IO benchmarks with simulated high latency, allow duck-typed files in input_stream/output_stream (8c2177 by wesm)
- 2019-09-12: ARROW-6530: [CI][Crossbow][R] Nightly R job doesn’t install all dependencies (c3a687 by nealrichardson)
- 2019-09-12: ARROW-6357: [C++] Issue S3 file writes in the background by default (c2f726 by pitrou)
- 2019-09-12: ARROW-6252: [C++][Python] Add Array::Diff in C++ and Array.diff in Python to return diff as string (3ea70b by wesm)
- 2019-09-12: ARROW-6525: [C++] Avoid aborting in CloseFromDestructor() (d1466a by pitrou)
- 2019-09-12: ARROW-5682: [Python] Raise error when trying to convert non-string dtype to string (765686 by jorisvandenbossche)
- 2019-09-12: ARROW-5853: [Python] Expose boolean filter kernel on Array (c12a25 by jorisvandenbossche)
- 2019-09-12: ARROW-6518: [Packaging][Python] Flight failing in OSX Python wheel builds (239347 by kszucs)
- 2019-09-12: ARROW-6504: [Python][Packaging] Add mimalloc to conda packages for better performance (4b72c4 by kszucs)
- 2019-09-12: ARROW-6526: [C++] Poison data in debug mode (5a1d98 by pitrou)
- 2019-09-13: ARROW-6557: [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas. Add mechanism to preserve column names from RecordBatch, Table as Series.name (c4671b by wesm)
- 2019-09-13: ARROW-6314: [C++] Implement IPC message format alignment changes, provide backwards compatibility and legacy option to emit old message format (f0d776 by wesm)
- 2019-09-13: ARROW-6541: [Format][C++] Update Columnar.rst for two-part EOS, update C++ implementation (7f4d50 by wesm)
- 2019-09-13: ARROW-6509: [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3, disable Gandiva JNI unit tests temporarily (5b783f by wesm)
- 2019-09-13: ARROW-5220: [Python] Specified schema in from_pandas also includes the index (a1eb81 by jorisvandenbossche)
- 2019-09-13: ARROW-6556: [Python] Handle future removal of pandas SparseDataFrame (3e6f8d by jorisvandenbossche)
- 2019-09-14: ARROW-1741: [C++] Add DictionaryArray::CanCompareIndices (5fa694 by bkietz)
- 2019-09-15: ARROW-6560: [Python] Fix nopandas integration tests (f77c24 by jorisvandenbossche)
- 2019-09-15: ARROW-6559: [Developer][C++] Add option to pass ARROW_PACKAGE_PREFIX when using ‘archery benchmark’ (7749c4 by wesm)
- 2019-09-15: ARROW-6561: [Python] Fix python tests to pass on pandas master (8d9ba8 by jorisvandenbossche)
- 2019-09-16: ARROW-6549: [C++] Switch to jemalloc 5.2.x (1e6a58 by pitrou)
- 2019-09-16: ARROW-6572: [C++] Fix Parquet decoding returning uninitialized data (3bf4d8 by pitrou)
- 2019-09-16: ARROW-5562: [C++][Parquet] Write negative zero or small epsilons as positive zero when computing Parquet statistics (879cf3 by wesm)
- 2019-09-17: ARROW-6584: [Python][Wheel] Bundle zlib again with the windows wheels (6d4f25 by kszucs)
- 2019-09-17: ARROW-6568: [C++] ChunkedArray constructor needs type when chunks is empty (6d7445 by bkietz)
- 2019-09-17: ARROW-5494: [Python] Create FileSystem bindings (ae20ce by kszucs)
- 2019-09-17: ARROW-6391: [Python][Flight] Add built-in methods on FlightServerBase to start server and wait for it to be available (5926ac by kszucs)
- 2019-09-17: ARROW-2490: [C++] Normalize input stream concurrency (7fb6cb by pitrou)
- 2019-09-17: ARROW-6558: [C++] Refactor Iterator to type erased handle (48df57 by bkietz)
- 2019-09-17: ARROW-6253: [Python] Expose enable_buffered_stream option from parquet::ReaderProperties in pyarrow.parquet.read_table (7f4247 by kszucs)
- 2019-09-17: ARROW-6362: [C++] Allow customizing S3 credentials provider (accb29 by pitrou)
- 2019-09-17: ARROW-5630: [C++][Parquet] Fix RecordReader accounting for repeated fields with non-nullable leaf (7dea1d by wesm)
- 2019-09-18: ARROW-2317: [Python] Fix C linkage warning with Cython (32abe6 by pitrou)
- 2019-09-18: ARROW-6573: [Python] Add test case to probe additional behavior in schema-data mismatch in Table.from_pydict (28d3f9 by wesm)
- 2019-09-18: ARROW-6590: [C++] Do not require ARROW_JSON to build ARROW_IPC when unit tests are off (1f60b4 by wesm)
- 2019-09-18: ARROW-5220: [Python] Follow-up to improve error messages and docs for from_pandas schema argument (0fb4ca by jorisvandenbossche)
- 2019-09-18: ARROW-6589: [C++] Error propagation, tests for /MakeArray(OfNulls|FromScalar)/ (de27f1 by bkietz)
- 2019-09-18: ARROW-6474: [Python] Add option to use legacy / pre-0.15 IPC message format and to set the default using PYARROW_LEGACY_IPC_FORMAT environment variable (176adf by wesm)
- 2019-09-18: ARROW-6527: [C++] Add OutputStream::Write(Buffer) (825c79 by pitrou)
- 2019-09-18: ARROW-6520: [Python] More consistent handling of specified schema when creating Table (2d8cf1 by jorisvandenbossche)
- 2019-09-18: ARROW-6570: [Python] Use Arrow’s allocators for creating NumPy array instead of leaving it to NumPy (19545f by wesm)
- 2019-09-18: ARROW-5344: [C++] Use ArrayDataVisitor in dict-to-anything cast (95afd4 by pitrou)
- 2019-09-18: ARROW-4841: [C++] Add arrowOptions.cmake with options used to build arrow (d6b057 by bkietz)
- 2019-09-18: ARROW-6597: [Python] Sanitize Python datetime handling (329c99 by pitrou)
- 2019-09-18: ARROW-6564: [Python] Do not require pandas for invoking Array.__array__ (1f8856 by jorisvandenbossche)
- 2019-09-18: ARROW-5870: [C++][Docs] Refine source build instructions, do not tell people to install flex/bison if they don’t need them (5ddefd by wesm)
- 2019-09-19: ARROW-6336: [Python] Add notes to pyarrow.serialize/deserialize to clarify that these functions do not read or write the standard IPC protocol (86dc95 by wesm)
- 2019-09-19: ARROW-5343: [C++] Refactor dictionary unification to incremental interface, and use Buffer for transpose map allocations (00a3c4 by wesm)
- 2019-09-19: ARROW-6618: [Python] Fix read_message() segfault on end of stream (dbacce by pitrou)
- 2019-09-19: ARROW-6609: [C++] Add Dockerfile for minimal C++ build (4e1178 by wesm)
- 2019-09-19: ARROW-5935: [C++] ArrayBuilder::type() should be kept accurate (d4e489 by bkietz)
- 2019-09-19: ARROW-5086: [Python][Parquet] Opt in to file memory-mapping when reading Parquet files rather than opting out (7ef8e0 by wesm)
- 2019-09-19: ARROW-6214: [R] Add R sanitizer docker image (c2e832 by fsaintjacques)
- 2019-09-19: ARROW-6244: [C++][Dataset] Add partition key to DataSource interface (19d1d0 by bkietz)
- 2019-09-19: ARROW-6556: [Python] Fix warning for pandas SparseDataFrame removal (9736d3 by jorisvandenbossche)
- 2019-09-20: ARROW-6544: [R] Documentation/polishing for 0.15 release (36ce1c by nealrichardson)
- 2019-09-20: ARROW-6379: [C++] Write no IPC buffer metadata for NullType (58f679 by wesm)
- 2019-09-20: ARROW-5216: [CI] Add Appveyor badge to README (dd20e9 by nealrichardson)
- 2019-09-21: ARROW-4649: [C++/CI/R] Add nightly job that tests the homebrew formula (b2785d by nealrichardson)
- 2019-09-21: ARROW-6651: Fix conda R job (982a4a by nealrichardson)
- 2019-09-21: ARROW-6652: [Python] Fix Array.to_pandas to retain timezone (b95c9b by jorisvandenbossche)
- 2019-09-21: ARROW-6642: [Python] Link parent objects in Parquet’s metadata and statistics objects (a5bedf by jorisvandenbossche)
- 2019-09-22: ARROW-5717: [Python] Unify variable dictionaries when converting to pandas (565371 by wesm)
- 2019-09-23: ARROW-3817: [R] Extract methods for RecordBatch and Table (f74716 by nealrichardson)
- 2019-09-23: ARROW-6664: [C++] Add CMake option to build without SSE4.2 instructions (3129e3 by wesm)
- 2019-09-23: ARROW-6605: [C++][Filesystem] Add recursion depth control to fs::Selector (06dc86 by fsaintjacques)
- 2019-09-23: ARROW-6670: [CI][R] Fix fixes for R nightly jobs (98b4ad by nealrichardson)
- 2019-09-24: ARROW-6652: [Python] Fix ChunkedArray.to_pandas to retain timezone (61637d by jorisvandenbossche)
- 2019-09-24: ARROW-3777: [C++] Add Slow input streams and slow filesystem (5a918c by pitrou)
- 2019-09-24: ARROW-6187: [C++] Fallback to storage type when writing ExtensionType to Parquet (b780c5 by jorisvandenbossche)
- 2019-09-24: ARROW-6629: [Doc] [C++] Add filesystem docs (2c7fb2 by pitrou)
- 2019-09-24: ARROW-6649: [R] print methods for Array, ChunkedArray, Table, RecordBatch (a89c80 by nealrichardson)
- 2019-09-24: ARROW-6674: [Python] Fix or ignore the test warnings (232cde by jorisvandenbossche)
- 2019-09-24: ARROW-6158: [C++/Python] Validate child array types with type fields of StructArray (199d3c by jorisvandenbossche)
- 2019-09-24: ARROW-6678: [C++][Parquet] Binary data stored in Parquet metadata must be base64-encoded to be UTF-8 compliant (4fe330 by wesm)
- 2019-09-25: ARROW-6622: [R] Normalize paths for filesystem API on Windows (0d0e4c by nealrichardson)
- 2019-09-25: ARROW-6679: [RELEASE] Add license info for the autobrew scripts (883d9e by nealrichardson)
- 2019-09-25: ARROW-6630: [Doc] Document C++ file formats (196fac by pitrou)
- 2019-09-26: ARROW-6606: [C++] Add PathTree tree structure (dec0cf by fsaintjacques)
- 2019-09-26: ARROW-6683: [Python] Test for fastparquet <-> pyarrow cross-compatibility (df2791 by jorisvandenbossche)
- 2019-09-27: ARROW-6701: [C++][R] Lint failing on R cpp code (cf3990 by nealrichardson)
- 2019-09-27: ARROW-6714: [R] Fix untested RecordBatchWriter case (7fb6b7 by nealrichardson)
- 2019-09-29: ARROW-6115: [Python] Support LargeBinary and LargeString in conversion to python (9cfb53 by jorisvandenbossche)
- 2019-09-29: ARROW-6725: [CI] Disable 3rdparty fuzzit nightly builds (ee5965 by kszucs)
- 2019-09-30: [Release] Update .deb/.rpm changelogs for 0.15.0 (4460fe by kszucs)
- 2019-09-30: [Release] Update versions for 0.15.0 (6b1374 by kszucs)
- 2019-09-30: [maven-release-plugin] prepare release apache-arrow-0.15.0 (40d468 by kszucs)
- 2019-09-30: [maven-release-plugin] prepare for next development iteration (c14761 by kszucs)
- 2019-09-30: [Release] Update versions for 1.0.0-SNAPSHOT (3a50db by kszucs)
- 2019-09-30: [Release] Update .deb package names for 1.0.0 (586259 by kszucs)
- 2019-09-30: [Release] Update CHANGELOG.md for 0.15.0 (fe5813 by kszucs)
- 2019-10-05: ARROW-6770: [CI][Travis] Download Minio quietly (f2fab6 by kszucs)
- 2019-10-05: ARROW-6773: [C++] Fix filter kernel when filtering with a boolean Array slice (99db64 by nealrichardson)
- 2019-10-05: ARROW-6762: [C++] Support reading JSON files with no newline at end (b9c154 by pitrou)
- 2019-10-05: ARROW-6613: [C++] Minimize usage of boost::filesystem (134b65 by pitrou)
- 2019-10-05: ARROW-3808: [R] Array extract, including Take method (cad5c4 by nealrichardson)
- 2019-10-05: ARROW-6771: [Packaging][Python] Missing pytest dependency from conda and wheel builds (740231 by kszucs)
- 2019-10-05: ARROW-6686: [CI] Pull and push docker images to speed up the nightly builds (273a84 by kszucs)
- 2019-10-05: ARROW-6760: [C++] More informative error messages for JSON parsing errors (633045 by bkietz)
- 2019-10-05: ARROW-6437: [R] Add AWS SDK to Homebrew formulae (04b08d by nealrichardson)
- 2019-10-05: ARROW-6494: [C++][Dataset] Implement PartitionSchemes (a6787c by bkietz)
- 2019-10-05: ARROW-6581: [C++] Fix fuzzit job submission (075712 by pitrou)
- 2019-10-05: ARROW-6688: [Packaging] Include s3 support in the conda packages (3f687e by kszucs)
- 2019-10-05: ARROW-6755: [Release] Improve Windows release verification script (345b01 by wesm)
- 2019-10-05: ARROW-6614: [C++][Dataset] Add DataSourceDiscovery class (83654f by fsaintjacques)
- 2019-10-05: ARROW-6610: [C++] Add cmake option to disable filesystem layer (ced37b by pitrou)
- 2019-10-05: ARROW-6564: [Python] Do not require pandas for invoking ChunkedArray.__array__ (768ba4 by jorisvandenbossche)
- 2019-10-05: ARROW-6685: [C++] Ignore trailing slashes in S3FS (95fe0e by pitrou)
- 2019-10-05: ARROW-6740: [C++] Unmap MemoryMappedFile as soon as possible (321896 by pitrou)
- 2019-10-05: ARROW-6634: [C++] Vendor Flatbuffers and check in compiled sources (bef9a1 by wesm)
- 2019-10-05: ARROW-6655: [Python] Filesystem bindings for S3 (8bbb29 by kszucs)
- 2019-10-05: ARROW-5831: [Release] Add Python program to download binary artifacts in parallel, allow abort/resume (2ca4ed by wesm)
- 2019-10-05: ARROW-6751: [CI] Fix ccache setup on Travis-CI (902148 by pitrou)
- 2019-10-05: ARROW-6730: [CI] Use GitHub Actions for "C++ with clang 7" docker image (1ce956 by fsaintjacques)
- 2019-10-05: ARROW-6708: [C++] Fix hardcoded boost library names (c8e91c by pitrou)
- 2019-10-05: ARROW-6750: [Python] Silence S3 error logs by default (c2f389 by pitrou)
- 2019-10-06: ARROW-6797: [Release] Use a separately cloned arrow-site repository in the website post release script (e27d3c by kszucs)
- 2019-10-06: ARROW-6634: [C++][FOLLOWUP] Remove Flatbuffers EP remnants from C++ Dockerfiles (759442 by wesm)
- 2019-10-06: ARROW-6578: [C++] Allow casting number to string (bd3331 by pitrou)
- 2019-10-06: ARROW-412: [Format][Documentation] Clarify that Buffer.size in Flatbuffers should reflect the actual memory size rather than the padded size (26c56d by wesm)
- 2019-10-07: ARROW-6806: [C++] [Python] Fix crash validating an IPC-originating empty array (4fa044 by pitrou)
- 2019-10-07: ARROW-6468: [C++] Remove unused hashing routines (7d2866 by pitrou)
- 2019-10-07: ARROW-6378: [C++][Dataset] Implement recursive TreeDataSource (524104 by fsaintjacques)
- 2019-10-07: ARROW-6787: [CI] [C++] Decommission "C++ with clang 7 and system packages" Travis CI job (44bd85 by pitrou)
- 2019-10-07: ARROW-6754: [C++] Merge allocator.h into stl.h (dcd685 by pitrou)
- 2019-10-07: ARROW-6804: [CI] [Rust] Migrate Travis job to Github Actions (793c60 by pitrou)
- 2019-10-08: ARROW-5855: [Python] Support for Duration (timedelta) type (c805b5 by jorisvandenbossche)
- 2019-10-08: ARROW-6811: [R] Assorted post-0.15 release cleanups (f6760f by nealrichardson)
- 2019-10-08: ARROW-5655: [Python] Table.from_pydict/from_arrays not using types in specified schema correctly (b67dd4 by kszucs)
- 2019-10-08: ARROW-6768: [C++][Dataset] Add method to convert from Scanner to Table (3d5512 by fsaintjacques)
- 2019-10-08: ARROW-5802: [CI][Archery] Dockerify lint utilities (583fb7 by fsaintjacques)
- 2019-10-08: ARROW-6321: [Python] Ability to create ExtensionBlock on conversion to pandas (a8936d by jorisvandenbossche)
- 2019-10-09: ARROW-6764: [C++] Create a readahead iterator (d80899 by pitrou)
- 2019-10-09: ARROW-6778: [C++] Support cast for DurationType (1b02af by jorisvandenbossche)
- 2019-10-09: ARROW-6466: [Integration][CI] Move integration test code to `archery integration` command. Dockerize integration tests (5ca859 by wesm)
- 2019-10-09: ARROW-6631: [C++] Do not build any compression libraries by default in C++ build (ad335f by wesm)
- 2019-10-09: ARROW-6834: [C++][TRIAGE] Pin gtest version 1.8.1 to unblock Appveyor builds (6ea984 by wesm)
- 2019-10-09: ARROW-6782: [C++] Do not require Boost for minimal C++ build (b3629d by wesm)
- 2019-10-10: ARROW-6833: [R][CI] Add crossbow job for full R autobrew macOS build (59a678 by nealrichardson)
- 2019-10-10: ARROW-6831: [R] Update R macOS/Windows builds for change in cmake compression defaults (d1f872 by nealrichardson)
- 2019-10-10: ARROW-6832: [R] Implement Codec::IsAvailable (16bd62 by nealrichardson)
- 2019-10-11: ARROW-6835: [Archery][CMake] Restore ARROW_LINT_ONLY cmake option (d5ba83 by fsaintjacques)
- 2019-10-11: ARROW-6711: [C++] Consolidate Filter and Expression (929c9f by bkietz)
- 2019-10-12: ARROW-6859: [CI][Nightly] Disable docker layer caching for CircleCI tasks (1fc101 by kszucs)
- 2019-10-12: ARROW-6860: [Python][C++] Do not link shared libraries monolithically to pyarrow.lib, add libarrow_python_flight.so (102acc by wesm)
- 2019-10-13: ARROW-6864: [C++] Add compression-related compile definitions before adding any unit tests (07128f by wesm)
- 2019-10-14: ARROW-6857: [C++] Fix DictionaryEncode for zero-chunk ChunkedArray (40c971 by pitrou)
- 2019-10-14: ARROW-6882: [C++] Ensure the DictionaryArray indices has no dictionary data (0cb737 by jorisvandenbossche)
- 2019-10-15: ARROW-6877: [C++] Add additional Boost versions to support 1.71 and the presumed next 2 future versions (018e1f by wesm)
- 2019-10-15: ARROW-6885: [Python] Remove superfluous skipped timedelta test (9f0650 by jorisvandenbossche)
- 2019-10-15: ARROW-6844: [C++][Parquet] Fix regression in reading List types with item name that is not "item" (2f183a by wesm)
- 2019-10-15: ARROW-6789: [Python] Improve ergonomics by automatically boxing Action and Result in do_action RPC (9acd0f by wesm)
- 2019-10-16: ARROW-6874: [Python] Fix memory leak when converting to Pandas object data (3572af by pitrou)
- 2019-10-16: ARROW-6847: [C++] Add range_expression adapter to Iterator (e08076 by bkietz)
- 2019-10-16: ARROW-6876: [C++][Parquet] Use shared_ptr to avoid copying ReaderContext struct, fix performance regression with reading many columns (2ce62d by wesm)
- 2019-10-16: ARROW-6903: [Python] Attempt to fix Python wheels with introduction of libarrow_python_flight, disabling of pyarrow.orc (560867 by wesm)
- 2019-10-17: ARROW-6884: [Python] Format friendlier message in Python when a server-side RPC handler fails (3ae655 by wesm)
- 2019-10-17: ARROW-6913: [R] Potential bug in compute.cc (6f2c90 by nealrichardson)
- 2019-10-17: ARROW-6861: [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder (83ed35 by wesm)
- 2019-10-17: ARROW-6878: [Python] Fix creating array from list of dicts with bytes keys (1714fb by pitrou)
- 2019-10-17: ARROW-6918: [R] Make docker-compose setup faster (8e1982 by pitrou)
- 2019-10-17: ARROW-6916: [Developer] Sort tasks by name in Crossbow e-mail report (3207ac by wesm)
- 2019-10-18: ARROW-6869: [C++] Do not return invalid arrays from DictionaryBuilder::Finish when reusing builder. Add "FinishDelta" method and "ResetFull" method (99aa62 by wesm)
- 2019-10-18: ARROW-6937: [Packaging][Python] Fix conda linux and OSX wheel nightly builds (5465c1 by kszucs)
- 2019-10-18: ARROW-6922: [Python] Compat with pandas for MultiIndex.levels.names (3adea3 by jorisvandenbossche)
- 2019-10-18: ARROW-2863: [Python] Add context manager APIs to RecordBatch*Writer/Reader classes (0ac429 by kszucs)
- 2019-10-18: ARROW-6938: [Packaging][Python] Disable bz2 in Windows wheels and build ZSTD in bundled mode to triage linking issues (32a1e5 by wesm)
- 2019-10-21: ARROW-6936: [Python] Improve error message when unwrapping object fails (6efa9f by pitrou)
- 2019-10-22: ARROW-6769: [Dataset][C++] End to end test (c02495 by fsaintjacques)
- 2019-10-22: ARROW-6910: [C++][Python] Set jemalloc default configuration to release dirty pages more aggressively back to the OS dirty_decay_ms and muzzy_decay_ms to 0 by default, add C++ / Python option to configure this (1ae946 by wesm)
- 2019-10-23: ARROW-6962: [C++] [CI] Stop compiling with -Weverything (c02a57 by pitrou)
- 2019-10-24: ARROW-6704: [C++] Check for out of bounds timestamp in unsafe cast (af6fa2 by jorisvandenbossche)
- 2019-10-24: ARROW-6977: [C++] Disable jemalloc background_thread on macOS (a33bd3 by pitrou)
- 2019-10-24: ARROW-6983: [C++] Fix ThreadedTaskGroup lifetime issue (caefc7 by pitrou)
- 2019-10-24: ARROW-6963: [Packaging][Wheel][OSX] Use crossbow’s command to deploy artifacts from travis builds (2bf344 by kszucs)
- 2019-10-24: ARROW-6964: [C++][Dataset] Add multithread support to Scanner::ToTable (619cfb by fsaintjacques)
- 2019-10-24: ARROW-6969: [C++][Dataset] ParquetScanTask defer memory usage (54a54c by fsaintjacques)
- 2019-10-25: ARROW-6986: [R] Add basic Expression class (5e5e52 by nealrichardson)
- 2019-10-28: ARROW-6980: [R] dplyr backend for RecordBatch/Table (f9bde6 by nealrichardson)
- 2019-10-29: ARROW-7024: [CI][R] Update R dependencies for Conda build (cea8a0 by nealrichardson)
- 2019-10-29: ARROW-7016: [Developer][Python] Add Windows batch script to test Python wheels for release candidate (e2a302 by wesm)
- 2019-10-29: ARROW-6758: [Developer] Install local NodeJS via nvm when running release verification (37434f by wesm)
- 2019-10-29: ARROW-7013: [C++] arrow-dataset pkgconfig is incomplete (89b5a2 by nealrichardson)
- 2019-10-29: ARROW-7014: [Developer][Release] Add "wheels" verification option to verify-release-candidate.sh for Linux and macOS (5a2154 by wesm)
- 2019-10-30: ARROW-7027: [Python] Correctly raise error in pa.table(..) on invalid input (102410 by jorisvandenbossche)
- 2019-10-31: ARROW-6950: [C++][Dataset] Add dataset benchmark example (4334a0 by fsaintjacques)
- 2019-11-01: ARROW-7034: [CI][Crossbow] Skip known nightly failures (d2ed30 by nealrichardson)
- 2019-11-01: ARROW-7039: [Python] Fix pa.table/record_batch typecheck to work without pandas (6a127e by jorisvandenbossche)
- 2019-11-01: ARROW-6989: [Python] Check for out of range precision decimals in python conversion (e73793 by jorisvandenbossche)
- 2019-11-02: ARROW-6784: [C++][R] Move filter and take for ChunkedArray, RecordBatch, and Table from Rcpp to C++ library (7fd9ba by nealrichardson)
- 2019-11-04: ARROW-6825: [C++] Rework CSV reader IO around readahead iterator (21ca13 by pitrou)
- 2019-11-04: ARROW-2428: [Python] Support pandas ExtensionArray in Table.to_pandas conversion (7f4165 by jorisvandenbossche)
- 2019-11-04: ARROW-7057: [C++] Add API to parse URI query strings (2f5f26 by pitrou)
- 2019-11-04: ARROW-7052: [C++] Fix linking of datasets example when ARROW_BUILD_SHARED=OFF (e0cc9c by wesm)
- 2019-11-05: ARROW-7031: [Python] Expose the offsets of a ListArray in python (09c304 by jorisvandenbossche)
- 2019-11-05: ARROW-7060: [R] Post-0.15.1 cleanup (adb2c7 by nealrichardson)
- 2019-11-05: ARROW-6999: [Python] Fix unnamed index when specifying schema in Table.from_pandas (997bbd by jorisvandenbossche)
- 2019-11-05: ARROW-7022, ARROW-7023: [Python] fix handling of pandas Index and Period/Interval extension arrays in pa.array (e0e8e5 by jorisvandenbossche)
- 2019-11-05: ARROW-6277: [C++][Parquet] Support direct DictionaryArray write of all parquet types (1afbcb by bkietz)
- 2019-11-06: ARROW-6984: [C++] Update LZ4 to 1.9.2 for CVE-2019-17543 (7f0871 by pitrou)
- 2019-11-06: ARROW-7067: [CI] Disable code coverage on Travis-CI (d7d4c5 by pitrou)
- 2019-11-06: ARROW-7058: [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to its base dir (d22f80 by bkietz)
- 2019-11-06: ARROW-6743: [C++] Remove usage of boost::filesystem (21f610 by pitrou)
- 2019-11-06: ARROW-7031: [Python] Correct LargeListArray.offsets attribute (75e909 by jorisvandenbossche)
- 2019-11-06: ARROW-7054: [Docs] Enable overriding project version with environment variable when building Sphinx docs (10a3b7 by wesm)
- 2019-11-06: ARROW-7007: [C++] Add use_mmap option to LocalFS (44e8d9 by pitrou)
- 2019-11-07: ARROW-3408: [C++] Add CSV option to automatically attempt dict encoding (b71d28 by pitrou)
- 2019-11-08: ARROW-7103: [R] Various minor cleanups (8219a8 by nealrichardson)
- 2019-11-08: ARROW-6951: [C++][Dataset] Column projection in ParquetFragment (fd552e by bkietz)
- 2019-11-08: ARROW-6952: [C++][Dataset] Implement predicate pushdown with ParquetFileFragment (ad1fc6 by fsaintjacques)
- 2019-11-08: ARROW-7062: [C++][Dataset] Ensure ParquetFileFormat::Open catch parqu… (9510b7 by fsaintjacques)
- 2019-11-08: ARROW-7074: [C++] ASSERT_OK_AND_ASSIGN should use ASSERT_OK instead of EXPE… (7a33d3 by fsaintjacques)
- 2019-11-08: ARROW-7097: [Rust][CI] Apply rustfmt nightly (aa9f5c by fsaintjacques)
- 2019-11-08: ARROW-6340: [R] Implements low-level bindings to Dataset classes (21ad7a by nealrichardson)
- 2019-11-12: ARROW-7101: [CI] Refactor docker-compose setup and use it with GitHub Actions (7bc2b0 by kszucs)
- 2019-11-12: ARROW-6396: [C++] Add overloads of Boolean kernels implementing Kleene logic (76cebf by bkietz)
- 2019-11-12: ARROW-3444: [Python] Add Array/ChunkedArray/Table nbytes attribute (417feb by jorisvandenbossche)
- 2019-11-12: ARROW-7128: [CI] Use proper version for fedora tests in GitHub actions cron jobs (cde24b by kszucs)
- 2019-11-12: ARROW-7133: [CI] Allow GH Actions to run on all branches (e9a860 by pitrou)
- 2019-11-12: ARROW-7066: [Python] Allow returning ChunkedArray in __arrow_array__ (3bc6db by jorisvandenbossche)
- 2019-11-14: ARROW-6635: [C++] Disable glog integration by default (78d52a by pitrou)
- 2019-11-14: ARROW-7157: [R] Add validation, helpful error message to Object$new() (b4b633 by nealrichardson)
- 2019-11-14: ARROW-6749: [Python] Let Array.to_numpy use general conversion code with zero_copy_only=True (85a9ae by jorisvandenbossche)
- 2019-11-14: ARROW-6636: [C++] Do not build command line tools by default (acf34b by pitrou)
- 2019-11-14: ARROW-7160: [C++] Update string_view backport (f82ca9 by pitrou)
- 2019-11-14: ARROW-7162: [C++] Cleanup warnings in cmake_modules/SetupCxxFlags.cmake (553663 by pitrou)
- 2019-11-14: ARROW-7142: [C++] GCC compilation failures in nightlies (f3f7bd by bkietz)
- 2019-11-14: ARROW-7164: [CI] Dev cron github action is failing every 15 minutes (b0f384 by nealrichardson)
- 2019-11-15: ARROW-7186: [R] Add inline comments to document the dplyr code (e86958 by nealrichardson)
- 2019-11-15: ARROW-7187: [C++][Doc] doxygen broken on master because of @ (12ecf8 by nealrichardson)
- 2019-11-15: ARROW-7105: [CI][Crossbow] Nightly homebrew-cpp job fails (18ca5f by nealrichardson)
- 2019-11-15: ARROW-7183: [CI][Crossbow] Re-skip r-sanitizer nightly tests (169805 by nealrichardson)
- 2019-11-15: ARROW-7047: [C++] Insert implicit casts in ScannerBuilder::Finish (91bb9a by bkietz)
- 2019-11-15: ARROW-7167: [CI][Python] Add nightly tests for additional pandas versions to Github Actions (83bd39 by jorisvandenbossche)
- 2019-11-15: ARROW-6967: [C++][Dataset] IN, IS_VALID filter expressions (a04fe2 by bkietz)
- 2019-11-15: PARQUET-1693: [C++] Fix parquet examples with compression define guards (b3773a by fsaintjacques)
- 2019-11-15: ARROW-6633: [C++] Vendor double-conversion library (767c95 by pitrou)
- 2019-11-15: ARROW-7180: [CI] Java builds are not triggered on the master branch (b6aa40 by kszucs)
- 2019-11-15: ARROW-7061: [C++][Dataset] Add ignore file options to FileSystemDataSourceDiscovery (4d8685 by fsaintjacques)
- 2019-11-18: ARROW-7148: [C++][Dataset] Major API cleanup (8df0f5 by fsaintjacques)
- 2019-11-18: ARROW-1900: [C++] Add kernel for min / max (26d4e4 by jorisvandenbossche)
- 2019-11-19: ARROW-5859: [Python] Support ExtensionArray.to_numpy using storage array (aae5e6 by jorisvandenbossche)
- 2019-11-19: ARROW-7170: [C++] Fix linking with bundled ORC (fb29a2 by pitrou)
- 2019-11-19: ARROW-7172: [C++][Dataset] Improve format of Expression::ToString (8685da by bkietz)
- 2019-11-19: ARROW-7185: [R][Dataset] Add bindings for IN, IS_VALID expressions (9becd4 by nealrichardson)
- 2019-11-20: ARROW-7169: [C++] Vendor uriparser library (5ae4f9 by pitrou)
- 2019-11-20: ARROW-7217: [CI][Python] Use correct python version in Github Actions (980e1d by kszucs)
- 2019-11-20: ARROW-6720: [C++] Add HDFS implementation to filesystem layer (d2ca1f by pitrou)
- 2019-11-20: ARROW-7214: [Python] Fix pickling of DictionaryArray (6f0499 by jorisvandenbossche)
- 2019-11-21: ARROW-6975: [C++] Put make_unique in its own header (ea75df by pitrou)
- 2019-11-21: ARROW-7168: [Python] Respect the specified dictionary type for pd.Categorical conversion (09eac5 by jorisvandenbossche)
- 2019-11-21: ARROW-7161: [C++] Migrate filesystem APIs from Status to Result (c20aaa by pitrou)
- 2019-11-21: ARROW-7225: [C++] Fix `*std::move(Result<T>)` for move-only T (ee3f10 by pitrou)
- 2019-11-25: ARROW-7116: [CI] Use the docker repository provided by apache organization (a974ba by kszucs)
- 2019-11-26: ARROW-6954: [Python] [CI] Add Python 3.8 to CI matrix (7adcb7 by pitrou)
- 2019-11-27: ARROW-7056: [Python] Fix test_fs failures when S3 not enabled (26d6be by pitrou)
- 2019-11-27: ARROW-7271: [C++][Flight] Use the single parameter version of SetTotalBytesLimit (61c8b1 by kszucs)
- 2019-11-27: ARROW-7117: [C++][CI] Fix the hanging C++ tests in Windows 2019 (9f4017 by pitrou)
- 2019-11-27: ARROW-7149: [C++] Remove experimental status on filesystem APIs (54d81b by pitrou)
- 2019-11-28: ARROW-7209: [Python] Fix tests on pandas master related to extension dtype conversion (25d3f1 by jorisvandenbossche)
- 2019-11-29: ARROW-6157: [C++] Array data validation (420442 by pitrou)
- 2019-11-29: ARROW-6515: [C++] Clean type_traits.h definitions (df613b by fsaintjacques)
- 2019-12-02: ARROW-7292: [CI] [C++] Add ASAN / UBSAN run (c3db09 by pitrou)
- 2019-12-02: ARROW-7236: [C++] Add `Result<T>` APIs to arrow/csv (bb96b7 by pitrou)
- 2019-12-02: ARROW-7240: [C++] Add `Result<T>` to APIs to arrow/util (b218a7 by pitrou)
- 2019-12-02: ARROW-6926: [Python] Support __sizeof__ protocol for Python objects (c6bec1 by jorisvandenbossche)
- 2019-12-02: ARROW-7295: [R] Fix bad test that causes failure on R < 3.5 (10333c by nealrichardson)
- 2019-12-03: ARROW-7298: [C++] Fix thirdparty dependency downloader script (644c17 by wesm)
- 2019-12-04: ARROW-7050: [R] Fix compiler warnings in R bindings (c2fb1c by nealrichardson)
- 2019-12-04: ARROW-7269: [Python] Add ORC to api documentation (511f0d by jorisvandenbossche)
- 2019-12-04: ARROW-7279: [C++] Rename UnionArray::type_ids to type_codes (e902b2 by pitrou)
- 2019-12-05: ARROW-7314: [Python] Fix compiler warning in pyarrow.union (1f7e7f by jorisvandenbossche)
- 2019-12-05: ARROW-7235: [C++] Add `Result<T>` APIs to IO layer (6758b2 by pitrou)
- 2019-12-05: ARROW-7293: [Dev] [C++] Persist ccache in docker-compose build volumes (fc35d7 by pitrou)
- 2019-12-05: ARROW-7303: [C++] Refactor CSV benchmarks to use Result APIs (0bfec1 by bkietz)
- 2019-12-05: ARROW-7159: [CI] Run HDFS tests as cron task (6cfe3e by pitrou)
- 2019-12-05: ARROW-6637: [C++] Further streamline default build, add ARROW_CSV CMake option (d6caca by wesm)
- 2019-12-05: ARROW-7322: [CI][Python] Fall back to arrowdev dockerhub organization for manylinux images (10ba28 by kszucs)
- 2019-12-05: ARROW-7077: [C++] Casting dictionary to unrelated value type shouldn’t crash (864eb3 by pitrou)
- 2019-12-06: ARROW-6637: [Packaging][FOLLOWUP] Enable necessary components in Autobrew build for R (456076 by wesm)
- 2019-12-06: ARROW-7146: [R][CI] Various fixes and speedups for the R docker-compose setup (e76e1f by nealrichardson)
- 2019-12-06: ARROW-7341: [CI] Unbreak nightly Conda R job (a9114b by nealrichardson)
- 2019-12-06: ARROW-6957: [CI][Crossbow] Nightly R with sanitizers build fails installing dependencies (2e3185 by nealrichardson)
- 2019-12-07: ARROW-7340: [CI] Prune defunct appveyor build setup (b16a3b by nealrichardson)
- 2019-12-08: ARROW-7346: [CI] Explicit usage of ccache across the builds (7102d7 by kszucs)
- 2019-12-09: ARROW-7354: [C++] Fix crash in test-io-hdfs (be2dcb by pitrou)
- 2019-12-10: ARROW-6965: [C++][Dataset] Optionally expose partition keys as columns (5b9dee by bkietz)
- 2019-12-10: ARROW-7261: [Python] Add Python support for Fixed Size List type (1500d3 by jorisvandenbossche)
- 2019-12-10: ARROW-7353: [C++] Ignore -Wmissing-braces when building with clang (e65c2f by wesm)
- 2019-12-10: ARROW-7351: [Developer] Only suggest cpp-* versions by default for PARQUET issues in merge tool (1a1bfd by wesm)
- 2019-12-10: ARROW-7355: [CI] Environment variables are defined twice for the fuzzit builds (eb752e by kszucs)
- 2019-12-10: ARROW-7344: [Packaging][Python] Build manylinux2014 wheels (a6bc52 by nealrichardson)
- 2019-12-11: ARROW-7361: [Rust] Build directory is not passed to ci/scripts/rust_test.sh (2ab72c by kszucs)
- 2019-12-11: ARROW-5333: [C++] Clamp build option summary width to 90 (29569e by bkietz)
- 2019-12-11: ARROW-7210: [C++][R] Allow Numeric <-> Temporal Scalar casts (1accdc by fsaintjacques)
- 2019-12-11: ARROW-7317: [C++] Migrate Iterator to a Result API (3d3ecf by bkietz)
- 2019-12-11: ARROW-7360: [R] Can’t use dplyr filter() with variables defined in parent scope (2287ce by nealrichardson)
- 2019-12-12: ARROW-7310: [Python] Expose HDFS implementation for pyarrow.fs (54cb9c by kszucs)
- 2019-12-13: ARROW-7381: [C++] Unbreak manylinux1 wheels after Iterator refactor (f4cfbc by bkietz)
- 2019-12-13: ARROW-6341: [Python] Implement low-level bindings for Dataset (9cb49f by kszucs)
- 2019-12-14: ARROW-5523: [Python] [Packaging] Use HTTPS consistently for downloading wheel dependencies (0c25fc by nealrichardson)
- 2019-12-14: ARROW-7388: [Python] Skip HDFS tests if libhdfs cannot be located (ef8025 by kszucs)
- 2019-12-14: ARROW-7389: [Python][Packaging] Remove pyarrow.s3fs import check from the recipe (75048e by kszucs)
- 2019-12-16: ARROW-7392: [Packaging] Add conda packaging tasks for python 3.8 (f7f5bf by kszucs)
- 2019-12-16: ARROW-6463: [C++][Python] Rename arrow::fs::Selector to FileSelector (7a6ae2 by kszucs)
- 2019-12-16: ARROW-843: [C++][Dataset] Ensure Schemas are unified in DataSourceDiscovery (860796 by fsaintjacques)
- 2019-12-17: ARROW-7374: [Dev] [C++] Fix cuda-cpp docker build (bce089 by pitrou)
- 2019-12-17: ARROW-7377: [C++][Dataset] Add ScanOptions::MaterializedFields (769c4d by fsaintjacques)
- 2019-12-17: ARROW-7282: [Python] IO functions should raise the right exceptions (28f498 by pitrou)
- 2019-12-18: ARROW-7416: [R][Nightly] Fix macos-r-autobrew build on R 3.6.2 (43066c by nealrichardson)
- 2019-12-18: ARROW-7408: [C++] Fix compilation of reference benchmarks (bdc126 by pitrou)
- 2019-12-18: ARROW-7410: [Doc] [Python] Document filesystem API (ad21a3 by pitrou)
- 2019-12-19: ARROW-6742: [C++] Remove boost::filesystem dependency in hdfs_internal.cc (e12d28 by pitrou)
- 2019-12-19: ARROW-7266: [C++] Fix ArrayDataVisitor on sliced binary-like array (dbf637 by pitrou)
- 2019-12-19: ARROW-7436: [Archery] Enable more benchmark binaries in archery benchmark (344ed4 by fsaintjacques)